
Robots.txt Generator

Generate robots.txt files with allow and disallow rules, sitemap references, and user-agent targeting to control search engine crawling. Free and private: all processing happens in your browser.

Preset

Disallowed paths (one per line)

Crawl-delay (seconds, optional)

Block common AI crawlers (GPTBot, ClaudeBot, CCBot…)

Sitemap URL (optional)

Generated output

User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /checkout
Disallow: /*?

Sitemap: https://example.com/sitemap.xml

The Robots.txt Generator creates a valid robots.txt file for your website with allow/disallow rules, user-agent targeting, sitemap references, and crawl-delay directives. Robots.txt is the file search engines check first when crawling your site to determine which pages they can or cannot access. Used correctly, it prevents crawlers from wasting resources on admin pages, filter URLs, and other content that shouldn’t appear in search. Used incorrectly, it can inadvertently hide your entire site from search engines.

Configure rules via structured forms: specify user agents (* for all, or specific like Googlebot, Bingbot), add disallow patterns for paths to block, allow patterns for exceptions, set crawl-delay (mostly for Bing), and reference your sitemap URL. The tool outputs standards-compliant robots.txt ready to upload to your site root. Common presets included: standard WordPress robots.txt, e-commerce with blocked checkout and cart, single-page app avoiding crawl of query variations. All generation runs in your browser.
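As a rough illustration of what the form-to-file step involves, here is a minimal Python sketch. It is not the tool's actual implementation: the `Group` class, its field names, and `build_robots_txt` are hypothetical names chosen for this example. It reproduces the sample output shown near the top of the page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One User-agent group. Field names are illustrative, not the tool's."""
    user_agent: str = "*"
    disallow: list = field(default_factory=list)
    allow: list = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups, sitemap=None):
    """Render groups (and an optional sitemap URL) as robots.txt text."""
    blocks = []
    for g in groups:
        lines = [f"User-agent: {g.user_agent}"]
        lines += [f"Disallow: {path}" for path in g.disallow]
        lines += [f"Allow: {path}" for path in g.allow]
        if not g.disallow and not g.allow:
            lines.append("Disallow:")  # empty Disallow means "allow everything"
        if g.crawl_delay is not None:
            lines.append(f"Crawl-delay: {g.crawl_delay}")
        blocks.append("\n".join(lines))
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


ecommerce = Group(disallow=["/admin", "/cart", "/checkout", "/*?"])
print(build_robots_txt([ecommerce], sitemap="https://example.com/sitemap.xml"))
```

Keeping each group as structured data until the final render makes it easy to validate rules before any text is produced.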

Robots.txt Generator — key features

Rule builder

Configure allow/disallow rules via structured forms rather than hand-writing syntax.

User-agent targeting

Different rules for Googlebot, Bingbot, or all crawlers.

Common templates

Presets for WordPress, e-commerce, SPA, news site, and other common patterns.

Sitemap reference

Automatic inclusion of Sitemap: line with your sitemap URL.

Wildcard support

Wildcards * and $ supported with validation that the pattern matches as intended.

Validation

Checks syntax, detects common mistakes (blocking entire site, conflicting rules), and warns.

Preview

See the full robots.txt output before downloading.

Client-side only

Configuration data stays in your browser.

Under the hood

Robots.txt follows the Robots Exclusion Protocol, which dates from 1994. Google submitted it to the IETF as a draft standard in 2019, and it was published as RFC 9309 in 2022. The file is placed at the site root: https://example.com/robots.txt.

Basic syntax:

User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml

Directives:
- User-agent: identifies which crawler the following rules apply to. * means all. Googlebot, Bingbot, Slurp, DuckDuckBot are specific.
- Disallow: path prefix or pattern to block
- Allow: exception to Disallow (specific path inside a disallowed directory)
- Crawl-delay: seconds between crawler requests (ignored by Google; respected by Bing, Yandex)
- Sitemap: URL of sitemap file (reference only, not rule)

Path matching:
- Disallow: /admin/ blocks /admin/ and everything inside
- Disallow: /page blocks /page, /pages, /page.html (prefix match)
- Disallow: /*.pdf$ blocks URLs ending with .pdf (wildcard)
- Disallow: / blocks everything (emergency use only)
- Disallow: (empty) allows everything for this user-agent

Google supports * and $ wildcards. Older crawlers may not.
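The matching rules above can be sketched in a few lines of Python. This is an illustrative approximation of the prefix-plus-wildcard behavior described here, not code taken from any crawler; `pattern_to_regex` and `matches` are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # A trailing '$' pins the match to the end of the URL path.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    # re.match anchors at the start only, which gives prefix semantics.
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/admin/", "/admin/users"))      # prefix match
print(matches("/page", "/pages"))              # also a prefix match
print(matches("/*.pdf$", "/docs/report.pdf"))  # wildcard plus end anchor
print(matches("/*.pdf$", "/docs/report.pdf?download=1"))
```

The last call shows why the `$` anchor matters: a query string after `.pdf` means the URL no longer ends with `.pdf`, so the rule does not apply.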

Rule precedence:
- Most specific rule wins
- Allow beats Disallow when paths are equally specific
- Each User-agent group is independent. A crawler follows only the most specific group that names it and falls back to the * group when no group does
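A minimal sketch of the precedence rules within a single user-agent group, assuming that "most specific" means the longest matching pattern, which is how Google documents it. The function names are hypothetical, and other crawlers may resolve conflicts differently.

```python
import re


def _matches(pattern, path):
    """Prefix match with '*' wildcards and an optional trailing '$' anchor."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules, path):
    """rules: list of ("allow" | "disallow", pattern) pairs from one group."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue  # an empty pattern (e.g. "Disallow:") matches nothing
        length = len(pattern)
        # Longer pattern wins; on a tie, Allow beats Disallow.
        if length > best_len or (length == best_len and kind == "allow"):
            best_len, allowed = length, kind == "allow"
    return allowed


wordpress = [("disallow", "/wp-admin/"), ("allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(wordpress, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(wordpress, "/wp-admin/options.php"))     # False
```

Because the decision depends only on pattern length and rule type, reordering the rules in the file does not change the result.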

Common patterns:
- Allow all: User-agent: * / Disallow: (empty)
- Block all: User-agent: * / Disallow: /
- Block admin: User-agent: * / Disallow: /admin/
- Block search parameters: Disallow: /*?s= (block internal search URLs)
- Block file types: Disallow: /*.pdf$ (block crawling of PDF files)

Important caveats:
- Robots.txt is a request, not an enforcement. Well-behaved crawlers respect it; malicious ones ignore it
- Blocking URLs in robots.txt does NOT deindex them. Google may keep previously indexed URLs. For deindexing, use noindex meta tag or X-Robots-Tag header
- Robots.txt blocks affect crawling but may not affect indexing if the page is linked from elsewhere
- Don’t use robots.txt to hide sensitive content: the file is public, so it reveals that the paths exist. Use authentication or IP restriction
- Syntax errors can silently break robots.txt — always validate
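For the deindexing case, the X-Robots-Tag header mentioned above can be attached at the application layer. The following is a minimal WSGI sketch under stated assumptions: `noindex_middleware`, `demo_app`, and the path prefixes are hypothetical, and most sites would set this header in their web server or framework configuration instead. The affected paths must stay crawlable in robots.txt, otherwise crawlers never see the header.

```python
def noindex_middleware(app, prefixes):
    """Wrap a WSGI app so responses under `prefixes` carry X-Robots-Tag: noindex."""
    prefixes = tuple(prefixes)

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            if path.startswith(prefixes):
                # Copy the header list rather than mutating the app's own list.
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)

    return wrapped


def demo_app(environ, start_response):
    """Stand-in application used only for this example."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<p>hello</p>"]


# Hypothetical prefixes: internal search results and unpublished drafts.
application = noindex_middleware(demo_app, ["/search", "/drafts/"])
```

The wrapped `application` object can be served by any WSGI server, for example `wsgiref.simple_server.make_server` from the standard library.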

Size limits: Google parses only the first 500 KiB of robots.txt; anything beyond that is ignored.
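A quick way to see whether a generated file would be cut off is to measure its encoded size. This sketch assumes UTF-8 encoding and uses the 500 KiB figure stated above; `split_at_limit` is a hypothetical helper name.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit cited above


def split_at_limit(content, limit=GOOGLE_LIMIT_BYTES):
    """Return (parsed, ignored) byte strings for a robots.txt body."""
    data = content.encode("utf-8")
    return data[:limit], data[limit:]


small = "User-agent: *\nDisallow: /admin/\n"
parsed, ignored = split_at_limit(small)
print(len(parsed), len(ignored))  # the whole file fits, so nothing is ignored
```

If `ignored` is non-empty, rules near the end of the file may never take effect, so it is worth consolidating patterns with wildcards.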

Case sensitivity: paths are case-sensitive on most servers. /Admin/ is different from /admin/ on Unix-based hosts.

When to use the Robots.txt Generator

SEO administration

→Block admin sections: Prevent /admin/, /wp-admin/, or similar paths from being crawled.
→Block duplicate content: Disallow filter, sort, and search URLs that generate many variants of the same content.
→Reference sitemap: Include Sitemap: directive so crawlers auto-discover your sitemap without submission.

Site development

→Staging site protection: Use robots.txt to signal that staging shouldn’t be indexed (though HTTP auth is more reliable).
→Block crawl waste: Prevent crawlers from hammering dynamic URL patterns that generate infinite variations.
→API endpoint exclusion: Block /api/ paths from crawling — they’re not meant for search engines.

Compliance

→AI crawler blocking: Disallow GPTBot, CCBot, Google-Extended for sites that don’t want content used in AI training.
→Archive.org control: Disallow ia_archiver if you want your site excluded from the Wayback Machine.
→Social crawler exceptions: Allow specific bots (facebookexternalhit, Twitterbot) even when blocking others.

Worked examples

Basic WordPress

Standard WordPress site.

Input

WP template

Output

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

E-commerce

Block cart, checkout, search.

Input

e-commerce template

Output

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?

Sitemap: https://example.com/sitemap.xml

Allow all

Permissive robots.txt.

Input

allow all

Output

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Block AI crawlers

Prevent content use in AI training.

Input

block GPTBot and others

Output

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
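One way to sanity-check this output locally is Python's standard-library `urllib.robotparser`. It is a sketch rather than a full validation: that parser uses simple prefix matching and does not implement the `*` and `$` wildcard extensions, so it suits plain rules like these but not wildcard patterns. The test URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The three named AI crawlers are blocked; a crawler with no matching
# group and no '*' group to fall back on is allowed.
for agent in ("GPTBot", "CCBot", "Google-Extended", "Googlebot"):
    print(agent, parser.can_fetch(agent, "https://example.com/article"))
```

Since this file has no `User-agent: *` group, ordinary search crawlers remain unaffected.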

File type block

Keep crawlers from fetching PDF files.

Input

disallow PDFs

Output

User-agent: *
Disallow: /*.pdf$

Troubleshooting

⚠Accidentally blocking whole site

Disallow: / blocks everything. Common mistake on staging sites that gets pushed to production. Always verify with Google Search Console robots.txt tester after changes.

⚠Robots.txt doesn’t deindex

Blocking a URL via robots.txt doesn’t remove it from Google’s index if it was already indexed. Use noindex meta tag (and let Google crawl the page to see it) for deindexing.

⚠Case sensitivity

Disallow: /Admin/ won’t block /admin/ on Unix servers. Match exact case of your actual URLs.

⚠Pattern too broad

Disallow: /page blocks /page, /pages, /pagecount, /page.html. Use /page/ or /page$ for more specific matching.

⚠Allow vs. Disallow conflicts

With Allow: /public/ and Disallow: /public/private.html, private.html is blocked because the Disallow rule is longer and therefore more specific. Order doesn’t matter; specificity does.

⚠Syntax error silently breaks file

Robots.txt is tolerant of errors — broken lines are ignored. This can lead to rules not applying as expected. Validate with Google’s testing tool.

⚠Revealing secret paths

Disallow: /secret-admin/ reveals the existence of that path to anyone who reads robots.txt. Use proper authentication, not obscurity via robots.txt.

Alternatives and comparisons

Compared to writing robots.txt by hand, this tool provides structured forms and validates against common mistakes. Hand-writing is possible but error-prone.

Compared to CMS auto-generated robots.txt (WordPress, Shopify), this tool gives manual control for sites without auto-generation or when CMS defaults don’t fit.

Compared to Google’s robots.txt tester, this tool is for generation; the tester is for validation. Generate here, test there.

Frequently asked questions about the Robots.txt Generator

▶What is robots.txt?

A text file at your site root (example.com/robots.txt) that tells search engine crawlers which URLs they can or cannot access. Standard since 1994. Used for controlling crawl behavior, not for hiding content (since malicious bots can ignore it).

▶Does robots.txt affect rankings?

Not directly. It affects what gets crawled. If important pages are accidentally blocked, they won’t be indexed and therefore won’t rank. A correct robots.txt matters for SEO mainly because it avoids that kind of accidental blocking.

▶What’s the difference between robots.txt and noindex?

Robots.txt blocks crawling (page may still be indexed via external links). Noindex meta tag blocks indexing (page is still crawled). Use robots.txt to save crawl budget; use noindex to prevent pages from appearing in search results.

▶Can I use wildcards in robots.txt?

Yes. * matches any sequence of characters; $ anchors to end. Google supports both. Disallow: /*.pdf$ blocks all .pdf URLs. Not all crawlers support wildcards, so check your target crawlers.

▶Should I block AI crawlers?

Depends on your preference. GPTBot (OpenAI), CCBot (Common Crawl, used by many AI companies), and Google-Extended (Gemini/Bard training) are common targets. Blocking them prevents your content from being used in AI training but may limit future AI features that reference your site.

▶How do I test my robots.txt?

Google Search Console has a robots.txt tester that validates syntax and lets you test whether specific URLs are blocked or allowed. Always test after changes.

▶What if I don’t have a robots.txt?

No robots.txt means crawlers can access everything. This is fine for most small sites. robots.txt is needed when you want to block specific paths.

▶Is my configuration saved?

No. All generation runs in your browser. Your rules aren’t logged or stored after you close the tab.

Additional resources

Google on robots.txt — Google’s authoritative guide to robots.txt creation and semantics.
Robots Exclusion Protocol RFC — Formalized 2022 RFC for robots.txt specification.
Robots.txt tester in Search Console — Google’s validation tool for checking your robots.txt behavior.
robotstxt.org — Community reference site for robots.txt history and conventions.
Ahrefs robots.txt guide — Practical SEO guidance on robots.txt best practices.