Robots.txt: The 2026 Best Practices Guide

Robots.txt is the file crawlers such as Googlebot check before they crawl anything else on your site. A single misplaced character, such as a stray "Disallow: /", can block crawlers from your entire site. This guide covers the safe defaults and the rules you should actually use.

Never block CSS or JS

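Google renders your page the way a browser does, so it needs to fetch the same CSS and JavaScript a user would. Disallowing /js/ or /css/ can break rendering, which can hurt how your pages are indexed and ranked. An explicit Allow is redundant when nothing is disallowed, but it matters as soon as a broader Disallow covers the directories that hold these files.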
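A minimal sketch of the case where an explicit Allow matters: a hypothetical site keeps stylesheets and scripts under /assets/, next to files it does not want crawled. The paths and example.com URLs are made up for illustration, and the check uses Python's standard urllib.robotparser, which approximates crawler behavior rather than replicating Googlebot's matcher.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: CSS and JS live under /assets/, next to raw exports
# that should not be crawled. The Allow lines come first so that parsers
# using first-match and parsers using longest-match reach the same verdict.
ROBOTS_TXT = """\
User-agent: *
Allow: /assets/css/
Allow: /assets/js/
Disallow: /assets/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


css_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/assets/css/site.css")
js_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/assets/js/app.js")
export_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/assets/raw/export.csv")
print(css_ok, js_ok, export_ok)  # True True False
```

The rendering resources stay crawlable while the rest of /assets/ remains blocked.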

Use Disallow for crawl budget, not security

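Robots.txt blocks crawling, not indexing. URLs blocked in robots.txt can still appear in search results if other sites link to them; they'll just have no snippet. The file is also publicly readable, so listing a path in it hides nothing from people. To keep a page out of the index, use a noindex meta tag (or an X-Robots-Tag response header) and leave the page crawlable: a crawler that is blocked from fetching the page never sees the noindex. Content that must stay private belongs behind authentication.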
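One mistake that follows from this is combining both mechanisms on the same URL: if robots.txt blocks a page, a crawler cannot fetch it and so never reads its noindex tag. The sketch below flags that conflict. The function names, the sample pages, and the idea of passing HTML in as strings are all assumptions for illustration; a real audit would fetch the pages and also look at X-Robots-Tag headers.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a page carries <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


def hidden_noindex(robots_txt, pages, user_agent="Googlebot"):
    """Return paths whose noindex tag is unreachable because robots.txt blocks them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path
        for path, html in pages.items()
        if has_noindex(html) and not parser.can_fetch(user_agent, path)
    ]


# Hypothetical site: one page is both disallowed and noindexed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
PAGES = {
    "/private/report.html": '<html><head><meta name="robots" content="noindex"></head></html>',
    "/thanks.html": '<html><head><meta name="robots" content="noindex, follow"></head></html>',
    "/index.html": "<html><head><title>Home</title></head></html>",
}

conflicts = hidden_noindex(ROBOTS_TXT, PAGES)
print(conflicts)  # ['/private/report.html']
```

For a page like /private/report.html, the usual fix is to drop the Disallow so the noindex can be read, or to keep the Disallow and accept that the URL may still be indexed without a snippet.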

Reference your sitemap

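Add 'Sitemap: https://example.com/sitemap.xml' to the file. The URL must be absolute, and the line is independent of any User-agent group, so it can go anywhere; the bottom is a common convention. It helps crawlers that support the directive (Google, Bing, and others) find your URL inventory.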
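A quick way to confirm the line is picked up is to parse the file and list the sitemaps it declares. This is a sketch using Python's standard library (site_maps() needs Python 3.8 or later); the robots.txt content and the helper names are illustrative assumptions.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Illustrative file: one rule group, then the sitemap line at the bottom.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""


def sitemap_urls(robots_txt):
    """Return the sitemap URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


def is_absolute(url):
    """A sitemap reference should be a full http(s) URL, not a bare path."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


sitemaps = sitemap_urls(ROBOTS_TXT)
all_absolute = all(is_absolute(url) for url in sitemaps)
print(sitemaps, all_absolute)  # ['https://example.com/sitemap.xml'] True
```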

Test before you deploy

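Google retired the standalone robots.txt Tester in 2023. Search Console's robots.txt report now shows the robots.txt files Google found, when it last fetched them, and any warnings or errors, but it reports on the file that is already live. To catch mistakes before they hit production, run the new file through a parser against a list of URLs that must stay crawlable. One typo can cost you weeks of traffic.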
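A pre-deploy check can be as small as a table of URLs and the verdict you expect for each. The sketch below is one way to do it with Python's standard urllib.robotparser; the function name, the sample rules, and the expectation table are assumptions for illustration. Python's parser is not a replica of Googlebot's matcher and may treat rule precedence and wildcards differently, so a pass here is a sanity check rather than proof.

```python
from urllib.robotparser import RobotFileParser


def failed_expectations(robots_txt, expectations, user_agent="Googlebot"):
    """Return the URLs whose crawl verdict differs from what was expected.

    expectations maps a URL or path to True (must be crawlable)
    or False (must be blocked).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        url
        for url, should_allow in expectations.items()
        if parser.can_fetch(user_agent, url) != should_allow
    ]


# Hypothetical expectations for a small site.
EXPECTATIONS = {
    "/": True,
    "/blog/post": True,
    "/admin/users": False,
}

INTENDED = """\
User-agent: *
Disallow: /admin/
"""

# The same file with "admin/" accidentally deleted from the rule.
TYPO = """\
User-agent: *
Disallow: /
"""

print(failed_expectations(INTENDED, EXPECTATIONS))  # []
print(failed_expectations(TYPO, EXPECTATIONS))      # ['/', '/blog/post']
```

Run as part of a deploy script, a non-empty result is a reason to stop the release.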

FAQ

Should I block the AI crawlers (GPTBot, ClaudeBot)?

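Depends on your strategy. Blocking them keeps your content out of future model training, but it can also keep it from being cited in AI answers. Some vendors use separate user agents for training and for search or retrieval, so check each vendor's crawler documentation before writing a blanket rule. Many publishers let them crawl but watch attribution patterns.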
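If you do decide to block, each crawler gets its own User-agent group. The sketch below shows one possible file and checks it with Python's standard parser. The tokens GPTBot and ClaudeBot are assumptions based on the names the vendors have published; tokens change over time, so confirm the current ones in each vendor's documentation. The paths are illustrative.

```python
from urllib.robotparser import RobotFileParser

# One group per AI crawler, then a default group for everyone else.
# A blank line separates groups.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

gptbot_ok = parser.can_fetch("GPTBot", "https://example.com/blog/post")
claudebot_ok = parser.can_fetch("ClaudeBot", "https://example.com/blog/post")
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/blog/post")
print(gptbot_ok, claudebot_ok, googlebot_ok)  # False False True
```

Robots.txt is advisory: it only affects crawlers that choose to honor it.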

Is robots.txt case-sensitive?

Paths are case-sensitive (Disallow: /Admin won't block /admin). Directives (User-agent, Disallow) are not.
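Both halves of that answer can be checked with Python's standard parser, as a rough sketch: the directive names below are written in mixed case on purpose, and the /Admin path is the example from above.

```python
from urllib.robotparser import RobotFileParser

# Directive names in unusual casing still parse; the path keeps its case.
ROBOTS_TXT = """\
user-agent: *
DISALLOW: /Admin
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

upper_allowed = parser.can_fetch("Googlebot", "/Admin/users")
lower_allowed = parser.can_fetch("Googlebot", "/admin/users")
print(upper_allowed, lower_allowed)  # False True
```

If both spellings exist on your server, each needs its own Disallow line.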
