Robots.txt Generator
Generate robots.txt files with allow and disallow rules, sitemap references, and user-agent targeting to control search engine crawling. Free and private: all processing happens in your browser.
This tool is coming soon. Check back later!
The Robots.txt Generator creates a valid robots.txt file for your website with allow/disallow rules, user-agent targeting, sitemap references, and crawl-delay directives. Robots.txt is the file search engines check first when crawling your site to determine which pages they can or cannot access. Used correctly, it prevents crawlers from wasting resources on admin pages, filter URLs, and other content that shouldn’t appear in search. Used incorrectly, it can inadvertently hide your entire site from search engines.
Configure rules via structured forms: specify user agents (* for all, or specific ones like Googlebot or Bingbot), add disallow patterns for paths to block, add allow patterns for exceptions, set crawl-delay (mostly for Bing), and reference your sitemap URL. The tool outputs standards-compliant robots.txt ready to upload to your site root. Common presets are included: a standard WordPress robots.txt, an e-commerce file with checkout and cart blocked, and a single-page app file that avoids crawling query variations. All generation runs in your browser.
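As a rough illustration of what form-based generation amounts to, the sketch below turns structured rules into robots.txt text. It is not the tool's actual implementation; the names `Group` and `build_robots_txt` are hypothetical and chosen only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """One user-agent block: the crawlers it targets and its path rules."""
    user_agents: list[str]
    disallow: list[str] = field(default_factory=list)
    allow: list[str] = field(default_factory=list)
    crawl_delay: int | None = None  # seconds; ignored by Google


def build_robots_txt(groups: list[Group], sitemaps: list[str] | None = None) -> str:
    """Render groups and sitemap URLs as robots.txt text."""
    lines: list[str] = []
    for group in groups:
        lines += [f"User-agent: {agent}" for agent in group.user_agents]
        lines += [f"Disallow: {path}" for path in group.disallow]
        lines += [f"Allow: {path}" for path in group.allow]
        if not group.disallow and not group.allow:
            lines.append("Disallow:")  # empty Disallow means "allow everything"
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # blank line separates groups
    lines += [f"Sitemap: {url}" for url in (sitemaps or [])]
    return "\n".join(lines).rstrip("\n") + "\n"


# Usage: reproduce the WordPress preset described in the examples section.
wordpress = Group(["*"], disallow=["/wp-admin/"], allow=["/wp-admin/admin-ajax.php"])
print(build_robots_txt([wordpress], ["https://example.com/sitemap.xml"]))
```

Keeping the rules in a small data structure and rendering them in one place makes it harder to produce a malformed line than hand-editing the file.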
Robots.txt Generator — key features
Rule builder
Configure allow/disallow rules via structured forms rather than hand-writing syntax.
User-agent targeting
Different rules for Googlebot, Bingbot, or all crawlers.
Common templates
Presets for WordPress, e-commerce, SPA, news site, and other common patterns.
Sitemap reference
Automatic inclusion of Sitemap: line with your sitemap URL.
Wildcard support
Wildcards * and $ supported with validation that the pattern matches as intended.
Validation
Checks syntax, detects common mistakes (blocking entire site, conflicting rules), and warns.
Preview
See the full robots.txt output before downloading.
Client-side only
Configuration data stays in your browser.
How to use the Robots.txt Generator
1. Pick a template or start blank
Use a common preset or configure from scratch.
2. Add user-agent blocks
For each user-agent (* for all, or specific), define allow and disallow rules.
3. Add sitemap reference
Include your sitemap URL so crawlers discover it via robots.txt.
4. Review and validate
Check the generated robots.txt for syntax issues and unexpected blocks.
5. Upload
Download robots.txt and upload to your site root (https://example.com/robots.txt).
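Crawlers look for the file only at the root of each host, so a copy placed in a subdirectory is not consulted. The sketch below derives the expected location from any page URL; the helper name `robots_txt_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected an absolute URL, got {page_url!r}")
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://example.com/blog/post?id=1#top"))
```

Note that each subdomain and port is a separate host, so shop.example.com needs its own robots.txt.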
Common use cases for the Robots.txt Generator
SEO administration
- Block admin sections: Prevent /admin/, /wp-admin/, or similar paths from being crawled.
- Block duplicate content: Disallow filter, sort, and search URLs that generate many variants of the same content.
- Reference sitemap: Include Sitemap: directive so crawlers auto-discover your sitemap without submission.
Site development
- Staging site protection: Use robots.txt to signal that staging shouldn’t be indexed (though HTTP auth is more reliable).
- Block crawl waste: Prevent crawlers from hammering dynamic URL patterns that generate infinite variations.
- API endpoint exclusion: Block /api/ paths from crawling, since they’re not meant for search engines.
Compliance
- AI crawler blocking: Disallow GPTBot, CCBot, Google-Extended for sites that don’t want content used in AI training.
- Archive.org control: Disallow ia_archiver if you want your site excluded from the Wayback Machine.
- Social crawler exceptions: Allow specific bots (facebookexternalhit, Twitterbot) even when blocking others.
Robots.txt Generator — examples
Basic WordPress
Standard WordPress site.
WP template
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml
E-commerce
Block cart, checkout, search.
e-commerce template
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
Allow all
Permissive robots.txt.
allow all
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml
Block AI crawlers
Prevent content use in AI training.
block GPTBot and others
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
File type block
Prevent crawling of PDF files.
disallow PDFs
User-agent: *
Disallow: /*.pdf$
Technical details
Robots.txt follows the Robots Exclusion Protocol, which dates from 1994. Google proposed it as an internet standard in 2019, and it was published as RFC 9309 in 2022. The file is placed at the site root: https://example.com/robots.txt.
Basic syntax:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
Directives:
- User-agent: identifies which crawler the following rules apply to. * means all. Googlebot, Bingbot, Slurp, DuckDuckBot are specific.
- Disallow: path prefix or pattern to block
- Allow: exception to Disallow (specific path inside a disallowed directory)
- Crawl-delay: seconds between crawler requests (ignored by Google; respected by Bing, Yandex)
- Sitemap: URL of sitemap file (reference only, not rule)
Path matching:
- Disallow: /admin/ blocks /admin/ and everything inside
- Disallow: /page blocks /page, /pages, /page.html (prefix match)
- Disallow: /*.pdf$ blocks URLs ending with .pdf (wildcard)
- Disallow: / blocks everything (emergency use only)
- Disallow: (empty) allows everything for this user-agent
Google supports * and $ wildcards. Older crawlers may not.
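The matching rules above can be sketched as a small translation from a robots.txt pattern to a regular expression. This is a minimal illustration of Google-style semantics, not any crawler's actual code; `pattern_to_regex` and `path_matches` are hypothetical names, and an empty Disallow value (which means "no rule") is assumed to be filtered out by the caller.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile a robots.txt path pattern with * and trailing-$ semantics."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape each literal piece, then join the pieces with "any characters".
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """True if path matches pattern starting at its first character."""
    return pattern_to_regex(pattern).match(path) is not None


# Usage: the cases listed under "Path matching".
for pattern, path in [
    ("/admin/", "/admin/users"),
    ("/page", "/pages"),
    ("/*.pdf$", "/docs/report.pdf"),
    ("/*.pdf$", "/docs/report.pdf?download=1"),
]:
    print(pattern, path, path_matches(pattern, path))
```

Because matching is anchored only at the start, a pattern without $ behaves as a prefix, which is why /page also covers /pages and /page.html.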
Rule precedence:
- Most specific rule wins
- Allow beats Disallow when paths are equally specific
- Different User-agent blocks are independent; a specific user-agent matches first if named
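To make the precedence rules concrete, here is a minimal sketch that resolves a path against the rules of a single user-agent group. It assumes that "most specific" means the longest matching pattern, measured in characters, which is how Google documents it; `is_allowed` and `_matches` are hypothetical names.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Prefix match with * (any characters) and a trailing $ (end of URL)."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide access for path given (directive, pattern) pairs from one group.

    The longest matching pattern wins; on a tie, Allow beats Disallow.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not _matches(pattern, path):
            continue  # an empty pattern is not a rule
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


# Usage: the WordPress preset keeps admin-ajax.php reachable.
wordpress = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(wordpress, "/wp-admin/admin-ajax.php"))
print(is_allowed(wordpress, "/wp-admin/options.php"))
```

The result does not depend on the order of the rules, only on pattern length and directive type. Some older parsers instead apply the first matching rule in file order, so results may differ for crawlers that have not adopted these semantics.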
Common patterns:
- Allow all: User-agent: * / Disallow: (empty)
- Block all: User-agent: * / Disallow: /
- Block admin: User-agent: * / Disallow: /admin/
- Block search parameters: Disallow: /*?s= (block internal search URLs)
- Block file types: Disallow: /*.pdf$ (block crawling of PDFs)
Important caveats:
- Robots.txt is a request, not an enforcement. Well-behaved crawlers respect it; malicious ones ignore it
- Blocking URLs in robots.txt does NOT deindex them. Google may keep previously indexed URLs. For deindexing, use noindex meta tag or X-Robots-Tag header
- Robots.txt blocks affect crawling but may not affect indexing if the page is linked from elsewhere
- Don’t use robots.txt to hide sensitive content, because it reveals the existence of paths. Use authentication or IP restriction
- Syntax errors can silently break robots.txt — always validate
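For the deindexing caveat, the header form of noindex can be sent by the server itself. The following is a minimal sketch using a plain WSGI callable; the app name `noindex_app` and the page body are made up for illustration, and a real site would normally set the header in its web server or framework configuration.

```python
def noindex_app(environ, start_response):
    """Minimal WSGI app that serves a page crawlers may fetch but not index."""
    body = b"<!doctype html><title>Internal report</title><p>Not for search.</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Header form of noindex; it also works for non-HTML files such as PDFs.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, noindex_app).serve_forever()` will serve it. The URL must not also be disallowed in robots.txt, otherwise crawlers never fetch the response and never see the header.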
Size limits: Google parses only the first 500 KiB of robots.txt. Anything beyond that is ignored.
Case sensitivity: paths are case-sensitive on most servers. /Admin/ is different from /admin/ on Unix-based hosts.
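A generated file can be checked against the size limit before upload. This small sketch assumes the file is saved as UTF-8 and takes 500 KiB to mean 500 × 1024 bytes; `bytes_over_limit` is a hypothetical helper name.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB, the limit quoted above


def bytes_over_limit(robots_txt: str, limit: int = GOOGLE_LIMIT_BYTES) -> int:
    """Return how many UTF-8 bytes fall past the limit (0 if the file fits)."""
    return max(0, len(robots_txt.encode("utf-8")) - limit)


print(bytes_over_limit("User-agent: *\nDisallow:\n"))
```

The size is measured in bytes rather than characters, so non-ASCII paths count for more than one unit each.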
Common problems and solutions
Accidentally blocking whole site
Disallow: / blocks everything. Common mistake on staging sites that gets pushed to production. Always verify with Google Search Console robots.txt tester after changes.
Robots.txt doesn’t deindex
Blocking a URL via robots.txt doesn’t remove it from Google’s index if it was already indexed. Use noindex meta tag (and let Google crawl the page to see it) for deindexing.
Case sensitivity
Disallow: /Admin/ won’t block /admin/ on Unix servers. Match exact case of your actual URLs.
Pattern too broad
Disallow: /page blocks /page, /pages, /pagecount, /page.html. Use /page/ or /page$ for more specific matching.
A more specific Disallow beats Allow
With Allow: /public/ and Disallow: /public/private.html, private.html is blocked because the Disallow rule is more specific. Order doesn’t matter; specificity does.
Syntax error silently breaks file
Robots.txt is tolerant of errors — broken lines are ignored. This can lead to rules not applying as expected. Validate with Google’s testing tool.
Revealing secret paths
Disallow: /secret-admin/ reveals the existence of that path to anyone who reads robots.txt. Use proper authentication, not obscurity via robots.txt.
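Several of the problems above can be caught mechanically before upload. The sketch below is a hypothetical linter, not the validator this tool uses: it flags lines without a colon, misspelled directives, rules that appear before any User-agent line, paths that do not start with / or *, and a site-wide Disallow: / for all crawlers.

```python
KNOWN = {"user-agent", "disallow", "allow", "crawl-delay", "sitemap"}


def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for a robots.txt string."""
    warnings: list[str] = []
    agents: list[str] = []  # user-agents of the group being read
    in_rules = False        # True once the current group has a rule line
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {number}: missing ':' so the line is ignored")
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in KNOWN:
            warnings.append(f"line {number}: unknown directive {key!r}")
        elif key == "user-agent":
            if in_rules:  # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            in_rules = True
            if not agents:
                warnings.append(f"line {number}: rule appears before any User-agent")
            elif key == "disallow" and value == "/" and "*" in agents:
                warnings.append(
                    f"line {number}: 'Disallow: /' blocks the whole site for all crawlers"
                )
            if value and not value.startswith(("/", "*")):
                warnings.append(f"line {number}: path {value!r} should start with '/'")
    return warnings


# Usage: a staging file that was pushed to production by mistake.
print(lint_robots_txt("User-agent: *\nDisallow: /\n"))
```

A check like this complements, rather than replaces, testing specific URLs against the live file.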
Robots.txt Generator — comparisons and alternatives
Compared to writing robots.txt by hand, this tool provides structured forms and validates against common mistakes. Hand-writing is possible but error-prone.
Compared to CMS auto-generated robots.txt (WordPress, Shopify), this tool gives manual control for sites without auto-generation or when CMS defaults don’t fit.
Compared to Google’s robots.txt tester, this tool is for generation; the tester is for validation. Generate here, test there.
Frequently asked questions about the Robots.txt Generator
What is robots.txt?
A text file at your site root (example.com/robots.txt) that tells search engine crawlers which URLs they can or cannot access. Standard since 1994. Used for controlling crawl behavior, not for hiding content (since malicious bots can ignore it).
Does robots.txt affect rankings?
Not directly. It affects what gets crawled. If important pages are accidentally blocked, they won’t be indexed and therefore won’t rank. So correct robots.txt is important for SEO by preventing accidental blocking.
What’s the difference between robots.txt and noindex?
Robots.txt blocks crawling (page may still be indexed via external links). Noindex meta tag blocks indexing (page is still crawled). Use robots.txt to save crawl budget; use noindex to prevent pages from appearing in search results.
Can I use wildcards in robots.txt?
Yes. * matches any sequence of characters; $ anchors to end. Google supports both. Disallow: /*.pdf$ blocks all .pdf URLs. Not all crawlers support wildcards, so check your target crawlers.
Should I block AI crawlers?
Depends on your preference. GPTBot (OpenAI), CCBot (Common Crawl, used by many AI companies), and Google-Extended (Gemini/Bard training) are common targets. Blocking them prevents your content from being used in AI training but may limit future AI features that reference your site.
How do I test my robots.txt?
Google Search Console has a robots.txt tester that validates syntax and lets you test whether specific URLs are blocked or allowed. Always test after changes.
What if I don’t have a robots.txt?
No robots.txt means crawlers can access everything. This is fine for most small sites. A robots.txt file is only needed when you want to block specific paths.
Is my configuration saved?
No. All generation runs in your browser. Your rules aren’t logged or stored after you close the tab.
Additional resources
- Google on robots.txt — Google’s authoritative guide to robots.txt creation and semantics.
- Robots Exclusion Protocol RFC — Formalized 2022 RFC for robots.txt specification.
- Robots.txt tester in Search Console — Google’s validation tool for checking your robots.txt behavior.
- robotstxt.org — Community reference site for robots.txt history and conventions.
- Ahrefs robots.txt guide — Practical SEO guidance on robots.txt best practices.