Robots.txt Generator

Generate robots.txt files with allow and disallow rules, sitemap references, and user-agent targeting to control search engine crawling. Free and private: all processing happens in your browser.

This tool is coming soon. Check back later!

The Robots.txt Generator creates a valid robots.txt file for your website with allow/disallow rules, user-agent targeting, sitemap references, and crawl-delay directives. Robots.txt is the file search engines check first when crawling your site to determine which pages they can or cannot access. Used correctly, it prevents crawlers from wasting resources on admin pages, filter URLs, and other content that shouldn’t appear in search. Used incorrectly, it can inadvertently hide your entire site from search engines.

Configure rules via structured forms: specify user agents (* for all, or specific ones such as Googlebot or Bingbot), add disallow patterns for paths to block and allow patterns for exceptions, set a crawl-delay (mostly relevant to Bing), and reference your sitemap URL. The tool outputs a standards-compliant robots.txt ready to upload to your site root. Common presets are included: a standard WordPress robots.txt, e-commerce with blocked checkout and cart, and a single-page app that avoids crawling query variations. All generation runs in your browser.
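As a rough illustration of what turning structured rules into a file involves, here is a minimal Python sketch. It is not the tool's actual implementation: the `Group` class and `render_robots_txt` function are hypothetical names invented for this example, and the output layout simply mirrors the examples shown further down this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One User-agent block. Hypothetical structure, for illustration only."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def render_robots_txt(groups: list[Group], sitemaps: Optional[list[str]] = None) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        if not group.disallow and not group.allow:
            # An empty Disallow line means "everything may be crawled".
            lines.append("Disallow:")
        for path in group.disallow:
            lines.append(f"Disallow: {path}")
        for path in group.allow:
            lines.append(f"Allow: {path}")
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps or []:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    wordpress = Group(["*"], disallow=["/wp-admin/"], allow=["/wp-admin/admin-ajax.php"])
    print(render_robots_txt([wordpress], ["https://example.com/sitemap.xml"]), end="")
```

Keeping the rules in a structure like this, rather than as free text, is what makes validation and presets straightforward: each preset is just a predefined list of groups.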

Robots.txt Generator — key features

Rule builder

Configure allow/disallow rules via structured forms rather than hand-writing syntax.

User-agent targeting

Different rules for Googlebot, Bingbot, or all crawlers.

Common templates

Presets for WordPress, e-commerce, SPA, news site, and other common patterns.

Sitemap reference

Automatic inclusion of Sitemap: line with your sitemap URL.

Wildcard support

Wildcards * and $ supported with validation that the pattern matches as intended.

Validation

Checks syntax, detects common mistakes (blocking entire site, conflicting rules), and warns.

Preview

See the full robots.txt output before downloading.

Client-side only

Configuration data stays in your browser.

How to use the Robots.txt Generator

  1. Pick a template or start blank

    Use a common preset or configure from scratch.

  2. Add user-agent blocks

    For each user-agent (* for all, or specific), define allow and disallow rules.

  3. Add sitemap reference

    Include your sitemap URL so crawlers discover it via robots.txt.

  4. Review and validate

    Check the generated robots.txt for syntax issues and unexpected blocks.

  5. Upload

    Download robots.txt and upload to your site root (https://example.com/robots.txt).

Common use cases for the Robots.txt Generator

SEO administration

  • Block admin sections: Prevent /admin/, /wp-admin/, or similar paths from being crawled.
  • Block duplicate content: Disallow filter, sort, and search URLs that generate many variants of the same content.
  • Reference sitemap: Include Sitemap: directive so crawlers auto-discover your sitemap without submission.

Site development

  • Staging site protection: Use robots.txt to signal that staging shouldn’t be indexed (though HTTP auth is more reliable).
  • Block crawl waste: Prevent crawlers from hammering dynamic URL patterns that generate infinite variations.
  • API endpoint exclusion: Block /api/ paths from crawling — they’re not meant for search engines.

Compliance

  • AI crawler blocking: Disallow GPTBot, CCBot, Google-Extended for sites that don’t want content used in AI training.
  • Archive.org control: Disallow ia_archiver if you want your site excluded from the Wayback Machine.
  • Social crawler exceptions: Allow specific bots (facebookexternalhit, Twitterbot) even when blocking others.

Robots.txt Generator — examples

Basic WordPress

Standard WordPress site.

Input
WP template
Output
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

E-commerce

Block cart, checkout, search.

Input
e-commerce template
Output
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?

Sitemap: https://example.com/sitemap.xml

Allow all

Permissive robots.txt.

Input
allow all
Output
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Block AI crawlers

Prevent content use in AI training.

Input
block GPTBot and others
Output
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
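One way to sanity-check output like this locally is Python's standard-library `urllib.robotparser`. The sketch below parses the rules above and asks whether a given crawler may fetch a URL; the helper name `agent_can_fetch` and the example URL are assumptions made for illustration. Note that this parser uses plain prefix matching and, as far as I know, does not implement the * and $ wildcards, so it suits simple rules like these rather than wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def agent_can_fetch(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Google-Extended", "Googlebot"):
        verdict = "allowed" if agent_can_fetch(agent, "https://example.com/article") else "blocked"
        print(f"{agent}: {verdict}")
```

Because the file has no `User-agent: *` group, crawlers that are not named in it remain unrestricted.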

File type block

Prevent crawlers from fetching PDFs.

Input
disallow PDFs
Output
User-agent: *
Disallow: /*.pdf$

Technical details

Robots.txt follows the Robots Exclusion Protocol, which dates from 1994. Google submitted it to the IETF as a draft standard in 2019, and it was published as RFC 9309 in 2022. The file must be placed at the site root: https://example.com/robots.txt.

Basic syntax:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml

Directives:
- User-agent: identifies which crawler the following rules apply to. * means all. Googlebot, Bingbot, Slurp, DuckDuckBot are specific.
- Disallow: path prefix or pattern to block
- Allow: exception to Disallow (specific path inside a disallowed directory)
- Crawl-delay: seconds between crawler requests (ignored by Google; respected by Bing, Yandex)
- Sitemap: URL of sitemap file (reference only, not rule)

Path matching:
- Disallow: /admin/ blocks /admin/ and everything inside
- Disallow: /page blocks /page, /pages, /page.html (prefix match)
- Disallow: /*.pdf$ blocks URLs ending with .pdf (wildcard)
- Disallow: / blocks everything (emergency use only)
- Disallow: (empty) allows everything for this user-agent

Google supports * and $ wildcards. Older crawlers may not.
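The matching rules above can be sketched in a few lines of Python. This is an approximation of the documented behavior (prefix match, * for any character sequence, a trailing $ for end-of-URL), not any crawler's real implementation; `pattern_to_regex` and `path_matches` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    # Without a trailing '$' the pattern is a prefix match, so leave the end open.
    return re.compile(regex + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if `path` is covered by `pattern`. Matching is case-sensitive."""
    if pattern == "":
        return False  # an empty Disallow value matches nothing
    # re.match anchors at the start of the path, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(path_matches("/admin/", "/admin/users"))            # prefix match
    print(path_matches("/page", "/pages"))                    # prefix match is broad
    print(path_matches("/*.pdf$", "/docs/file.pdf"))          # wildcard plus end anchor
    print(path_matches("/*.pdf$", "/docs/file.pdf?page=2"))   # query string defeats '$'
```

The last call is worth noting: a `$`-anchored pattern stops matching as soon as a query string follows the extension.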

Rule precedence:
- Most specific rule wins
- Allow beats Disallow when paths are equally specific
- Different User-agent blocks are independent; a crawler follows only the block that names it most specifically and falls back to the * block only when no named block matches
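A minimal sketch of the precedence rules within a single User-agent block, assuming "most specific" is measured by pattern length (which is how Google documents it). The function names and the `(kind, pattern)` tuple format are assumptions made for this example.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Prefix match with '*' wildcards and an optional trailing '$' anchor."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: ("allow" | "disallow", pattern) pairs from one User-agent block."""
    best_length = -1
    allowed = True  # if no rule matches, the path may be crawled
    for kind, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue
        length = len(pattern)
        # Longer pattern wins; on a tie, Allow beats Disallow.
        if length > best_length or (length == best_length and kind == "allow"):
            best_length = length
            allowed = kind == "allow"
    return allowed


if __name__ == "__main__":
    rules = [("disallow", "/wp-admin/"), ("allow", "/wp-admin/admin-ajax.php")]
    print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # the longer Allow rule applies
    print(is_allowed(rules, "/wp-admin/options.php"))     # only the Disallow rule matches
```

Since the decision depends only on pattern length, reordering the rules in the file does not change the result.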

Common patterns:
- Allow all: User-agent: * / Disallow: (empty)
- Block all: User-agent: * / Disallow: /
- Block admin: User-agent: * / Disallow: /admin/
- Block search parameters: Disallow: /*?s= (block internal search URLs)
- Block file types: Disallow: /*.pdf$ (block crawling of PDF files)

Important caveats:
- Robots.txt is a request, not an enforcement. Well-behaved crawlers respect it; malicious ones ignore it
- Blocking URLs in robots.txt does NOT deindex them. Google may keep previously indexed URLs. For deindexing, use noindex meta tag or X-Robots-Tag header
- Robots.txt blocks affect crawling but may not affect indexing if the page is linked from elsewhere
- Don’t use robots.txt to hide sensitive content: it reveals the existence of paths. Use authentication or IP restriction
- Syntax errors can silently break robots.txt — always validate

Size limits: Google parses first 500 KiB of robots.txt. Anything beyond is ignored.
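A quick check against that limit might look like the following. The constant and function names are made up for this sketch, and it assumes the file is served as UTF-8, so the size is measured in encoded bytes rather than characters.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit stated above


def exceeds_google_limit(robots_txt: str) -> bool:
    """Return True if the UTF-8 encoded file is larger than 500 KiB."""
    return len(robots_txt.encode("utf-8")) > GOOGLE_LIMIT_BYTES


if __name__ == "__main__":
    print(exceeds_google_limit("User-agent: *\nDisallow:\n"))
```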

Case sensitivity: paths are case-sensitive on most servers. /Admin/ is different from /admin/ on Unix-based hosts.

Common problems and solutions

Accidentally blocking whole site

Disallow: / blocks everything. A common mistake is a staging robots.txt that gets pushed to production. After every change, verify the live file, for example with the robots.txt report in Google Search Console.
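A pre-deploy check for this mistake can be quite small. The sketch below scans a robots.txt string and reports whether the `User-agent: *` block contains `Disallow: /`. It is deliberately simplistic: the function name is hypothetical, and it does not account for Allow rules that might re-open parts of the site.

```python
def blocks_entire_site(robots_txt: str) -> bool:
    """Return True if a 'User-agent: *' block contains 'Disallow: /'."""
    agents: list[str] = []
    in_rules = False  # True once the current block has seen an Allow/Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if in_rules:
                # A User-agent line after rules starts a new block.
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if key == "disallow" and value == "/" and "*" in agents:
                return True
    return False


if __name__ == "__main__":
    print(blocks_entire_site("User-agent: *\nDisallow: /\n"))
    print(blocks_entire_site("User-agent: *\nDisallow: /admin/\n"))
```

Running a check like this in a deployment pipeline, with production failing on `True`, is one way to catch a staging file before it ships.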

Robots.txt doesn’t deindex

Blocking a URL via robots.txt doesn’t remove it from Google’s index if it was already indexed. Use noindex meta tag (and let Google crawl the page to see it) for deindexing.

Case sensitivity

Disallow: /Admin/ won’t block /admin/ on Unix servers. Match exact case of your actual URLs.

Pattern too broad

Disallow: /page blocks /page, /pages, /pagecount, /page.html. Use /page/ or /page$ for more specific matching.

Allow and Disallow conflicts

With Allow: /public/ and Disallow: /public/private.html, private.html is blocked because the Disallow rule is more specific. Order doesn’t matter; specificity does. Allow wins only when both rules are equally specific.

Syntax error silently breaks file

Robots.txt parsers are tolerant of errors: broken lines are simply ignored. This can lead to rules silently not applying as expected. Validate the file after every change, for example with the robots.txt report in Google Search Console.

Revealing secret paths

Disallow: /secret-admin/ reveals the existence of that path to anyone who reads robots.txt. Use proper authentication, not obscurity via robots.txt.

Robots.txt Generator — comparisons and alternatives

Compared to writing robots.txt by hand, this tool provides structured forms and validates against common mistakes. Hand-writing is possible but error-prone.

Compared to CMS auto-generated robots.txt (WordPress, Shopify), this tool gives manual control for sites without auto-generation or when CMS defaults don’t fit.

Compared to Google Search Console’s robots.txt report (which replaced the older robots.txt tester), this tool is for generation; the report is for validation. Generate here, validate there.

Frequently asked questions about the Robots.txt Generator

What is robots.txt?

A text file at your site root (example.com/robots.txt) that tells search engine crawlers which URLs they can or cannot access. Standard since 1994. Used for controlling crawl behavior, not for hiding content (since malicious bots can ignore it).

Does robots.txt affect rankings?

Not directly. It affects what gets crawled. If important pages are accidentally blocked, they won’t be indexed and therefore won’t rank. A correct robots.txt matters for SEO mainly because it avoids that kind of accidental blocking.

What’s the difference between robots.txt and noindex?

Robots.txt blocks crawling (page may still be indexed via external links). Noindex meta tag blocks indexing (page is still crawled). Use robots.txt to save crawl budget; use noindex to prevent pages from appearing in search results.

Can I use wildcards in robots.txt?

Yes. * matches any sequence of characters; $ anchors to end. Google supports both. Disallow: /*.pdf$ blocks all .pdf URLs. Not all crawlers support wildcards, so check your target crawlers.

Should I block AI crawlers?

Depends on your preference. GPTBot (OpenAI), CCBot (Common Crawl, used by many AI companies), and Google-Extended (Gemini/Bard training) are common targets. Blocking them prevents your content from being used in AI training but may limit future AI features that reference your site.

How do I test my robots.txt?

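Google Search Console provides a robots.txt report (the successor to the retired robots.txt tester) that shows the file Google fetched along with any parsing warnings or errors. To check whether a specific URL is blocked, use the URL Inspection tool. Always test after changes.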

What if I don’t have a robots.txt?

No robots.txt means crawlers can access everything. This is fine for most small sites. A robots.txt is only needed when you want to block specific paths or target specific crawlers.

Is my configuration saved?

No. All generation runs in your browser. Your rules aren’t logged or stored after you close the tab.
