Real-World Robots.txt Examples

These robots.txt templates follow patterns commonly used by popular websites. Each one shows how to handle a different use case.

WordPress Blog

Protect admin areas while allowing content crawling

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /*/trackback/
Disallow: /*/feed/
Allow: /wp-content/uploads/

Sitemap: https://example.com/sitemap.xml

Blocks: Admin panel, core includes, plugins, themes, trackbacks, feeds
Allows: Uploaded media files (images, PDFs)
Best for: WordPress blogs, news sites
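The trackback and feed rules appear twice because a plain path only matches URLs that start with it, while the wildcard form catches the same segment deeper in the URL. Here is a minimal sketch of that matching in Python, assuming the wildcard syntax from RFC 9309 (`*` for any run of characters, a trailing `$` for end of URL). The function names and sample paths are illustrative, and the wildcard rule is written with a leading slash.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' pins the end of the URL.
    Everything else is literal, and a pattern without '$' is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def matches(pattern: str, path: str) -> bool:
    # re.match anchors at the start of the path, as robots.txt rules do.
    return pattern_to_regex(pattern).match(path) is not None

print(matches("/feed/", "/feed/"))                     # True
print(matches("/feed/", "/2024/hello/feed/"))          # False
print(matches("/*/feed/", "/2024/hello/feed/"))        # True
print(matches("/*/feed/", "/feed/"))                   # False
print(matches("/wp-admin/", "/wp-admin/options.php"))  # True
```

The second and fourth results show why both forms are listed: `/feed/` covers only the site-wide feed, and `/*/feed/` covers only feeds nested under another path.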

E-commerce Store

Protect checkout and user accounts

User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /my-account/
Disallow: /order-tracking/
Disallow: /wishlist/
Disallow: /compare/
Disallow: /*?*
Allow: /*?p=*

Sitemap: https://example.com/sitemap.xml
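Sitemap: https://example.com/product-sitemap.xml

Blocks: Checkout, cart, user accounts, order tracking, wishlist, compare, and any other URL with a query string
Allows: URLs whose query string starts with p= (for example /products?p=2)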
Best for: Online stores, marketplaces
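The two query-string rules overlap, so the result depends on which one takes precedence. Under RFC 9309, which Google follows, the matching rule with the longest pattern wins, and Allow wins a tie. Other crawlers may resolve conflicts differently. The sketch below applies that precedence to a subset of the store rules. `is_allowed`, `STORE_RULES`, and the `/shoes` paths are made up for illustration.

```python
import re

def _regex(pattern: str) -> re.Pattern:
    # '*' is a wildcard, a trailing '$' anchors the end; otherwise prefix match.
    end = "$" if pattern.endswith("$") else ""
    core = pattern[:-1] if end else pattern
    return re.compile(".*".join(map(re.escape, core.split("*"))) + end)

def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, pattern) pairs.

    The matching rule with the longest pattern wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not _regex(pattern).match(path):
            continue
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed

STORE_RULES = [
    ("Disallow", "/cart/"),
    ("Disallow", "/*?*"),
    ("Allow", "/*?p=*"),
]

print(is_allowed(STORE_RULES, "/shoes?p=2"))             # True
print(is_allowed(STORE_RULES, "/shoes?sort=price"))      # False
print(is_allowed(STORE_RULES, "/shoes?sort=price&p=2"))  # False
print(is_allowed(STORE_RULES, "/cart/"))                 # False
print(is_allowed(STORE_RULES, "/shoes"))                 # True
```

The third result is the one to watch: the Allow rule only matches when `p` is the first parameter, because the pattern contains the literal `?p=`.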

SaaS Application

Index marketing pages only

User-agent: *
Disallow: /app/
Disallow: /dashboard/
Disallow: /api/
Disallow: /auth/
Disallow: /login
Disallow: /signup
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml

Blocks: App, dashboard, API, authentication
Allows: Marketing pages, documentation, blog
Best for: SaaS products, web apps
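Whether a rule ends in a slash changes what it blocks: `/app/` covers only URLs inside that directory, while `/login` covers every path that begins with those characters. You can check this with Python's standard `urllib.robotparser`, as in the sketch below. It uses a shortened copy of the rules, and `ExampleBot` is a placeholder user agent. Note that this parser applies rules in file order and does not understand wildcards. That is fine here, because the rules have no wildcards and the Disallow lines come before `Allow: /`.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /app/
Disallow: /login
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def crawlable(path: str) -> bool:
    # "ExampleBot" is a placeholder user agent; it falls under the * group.
    return parser.can_fetch("ExampleBot", "https://example.com" + path)

print(crawlable("/app/settings"))  # False
print(crawlable("/app"))           # True  (no trailing slash, so /app/ does not match)
print(crawlable("/login"))         # False
print(crawlable("/login-help"))    # False (shares the /login prefix)
print(crawlable("/pricing"))       # True
```

If a public page such as `/login-help` must stay crawlable, a more specific Allow rule for it would be needed.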

News & Media Site

Maximize content discovery

User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /search/
Disallow: /*?s=
Allow: /

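# A crawler with its own group ignores the * group, so the blocks are repeated here
User-agent: Googlebot-News
Disallow: /admin/
Disallow: /login/
Disallow: /search/
Disallow: /*?s=
Allow: /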

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml

Blocks: Admin, search results
Allows: All articles, especially for Google News
Best for: News sites, magazines, blogs
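A crawler that finds a group naming it follows only that group and skips the `*` group entirely. Any Disallow rule that Googlebot-News should honour therefore has to appear in its own group. The sketch below compares two hypothetical files using Python's standard `urllib.robotparser`. `OtherBot` and the `/admin/users` URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

def build(robots_txt: str) -> RobotFileParser:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

# The named group only allows; the Disallow rules live in the * group.
PERMISSIVE = build("""\
User-agent: *
Disallow: /admin/
Disallow: /search/

User-agent: Googlebot-News
Allow: /
""")

# The named group repeats the Disallow rules it should also honour.
REPEATED = build("""\
User-agent: *
Disallow: /admin/
Disallow: /search/

User-agent: Googlebot-News
Disallow: /admin/
Disallow: /search/
Allow: /
""")

url = "https://example.com/admin/users"
print(PERMISSIVE.can_fetch("Googlebot-News", url))  # True
print(PERMISSIVE.can_fetch("OtherBot", url))        # False
print(REPEATED.can_fetch("Googlebot-News", url))    # False
```

In the first file the news crawler may fetch the admin URL even though every other bot is blocked from it.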

Development/Staging

Block all search engines

User-agent: *
Disallow: /

# This blocks all search engines
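# Use for development or staging sites

Blocks: Everything, for all compliant crawlers
Warning: Don't use on production sites! robots.txt is advisory and does not restrict access, so protect private staging sites with authentication as well.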
Best for: Staging, development, test sites
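One way to keep the blocking file off production is to choose the robots.txt body from the deployment environment instead of copying files by hand. The sketch below is one possible approach, not a standard recipe. `robots_txt_for` and the `APP_ENV` variable are hypothetical names, and the production body is a placeholder.

```python
import os

STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

PRODUCTION_ROBOTS = (
    "User-agent: *\n"
    "Allow: /\n"
    "\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)

def robots_txt_for(environment: str) -> str:
    """Return the robots.txt body for a deployment environment.

    Only an explicit "production" value gets the permissive file, so an unset
    or misspelled environment name fails closed (everything blocked).
    """
    if environment.strip().lower() == "production":
        return PRODUCTION_ROBOTS
    return STAGING_ROBOTS

# APP_ENV is a hypothetical variable name; use whatever your deployment sets.
print(robots_txt_for(os.environ.get("APP_ENV", "staging")))
```

Failing closed means a misconfigured production deploy gets blocked instead of a staging site being exposed. Monitoring should therefore confirm that the live file is the permissive one.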

Portfolio/Agency

Simple and permissive

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Allows: Everything - full site crawling
Simple: Minimal configuration
Best for: Portfolios, agencies, small business sites

💡 Tips for Using These Examples

1. Customize URLs: Replace example.com with your actual domain and adjust paths to match your site structure.
2. Test Before Deployment: Use our URL Tester to verify rules work as expected.
3. Update Sitemap: Make sure your sitemap URL is correct and accessible.
4. Monitor Results: Check Google Search Console to see how crawlers interact with your robots.txt.
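The sitemap check can be partly automated before deployment. The sketch below reads the Sitemap lines with `RobotFileParser.site_maps()` (Python 3.8+) and flags entries that are not absolute http(s) URLs or that point at another host. `sitemap_problems` and the sample file are illustrative. The host check is an assumption that suits single-domain sites, since sitemaps hosted elsewhere can be valid. This sketch does not fetch the URLs, so checking that they are reachable is a separate step.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def sitemap_problems(robots_txt: str, expected_host: str) -> list[str]:
    """List problems with the Sitemap lines in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    sitemaps = parser.site_maps() or []  # site_maps() returns None when there are none
    if not sitemaps:
        return ["no Sitemap line found"]
    problems = []
    for url in sitemaps:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            problems.append(f"{url}: not an absolute http(s) URL")
        elif parts.hostname != expected_host:
            problems.append(f"{url}: host is not {expected_host}")
    return problems

SAMPLE = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: /news-sitemap.xml
Sitemap: https://staging.example.com/sitemap.xml
"""

for problem in sitemap_problems(SAMPLE, "example.com"):
    print(problem)
```

For the sample file this reports the relative `/news-sitemap.xml` entry and the leftover staging host, and accepts the first line.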