robots.txt
0 articles tagged with robots.txt
The robots.txt file is a plain text file at the root of your domain that controls which crawlers can access which parts of your site. It is the primary mechanism for managing AI crawler access and a common source of AI visibility failures.
[ Coming soon ]
Articles with this tag are in progress. Follow @MattQR on X to be notified when they publish.
Many sites run robots.txt configurations that were written for traditional search engines and block AI crawlers by accident. A common failure pattern is a catch-all "User-agent: *" section whose Disallow rules reach well beyond paths like /admin/, or that disallows everything for any crawler not explicitly named. Major AI crawlers honor robots.txt strictly: if GPTBot, PerplexityBot, or ClaudeBot is blocked, that platform cannot index your content regardless of how excellent it is. Audit your robots.txt specifically for AI crawler access and add an explicit Allow section for each major AI crawler you want to permit.
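As a hypothetical sketch of that failure pattern: a file like the one below was written to admit known search bots, but its catch-all fallback shuts out every AI crawler that is not listed by name.

```text
# Search engines are allowed explicitly...
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

# ...but the catch-all section blocks everything else, which
# silently includes GPTBot, PerplexityBot, and ClaudeBot.
User-agent: *
Disallow: /
```

Because crawlers obey the most specific matching User-agent section, the fix is to add an explicit section per AI crawler rather than loosening the wildcard rules for everyone.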
Common questions
How does robots.txt affect AI visibility?
Robots.txt controls which crawlers can access which pages on your site. AI crawlers like GPTBot, PerplexityBot, and ClaudeBot follow robots.txt rules. If these crawlers are blocked, the corresponding AI platforms cannot index or cite your content. A misconfigured robots.txt is one of the most common and easiest-to-fix AI visibility failures.
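One way to sanity-check this locally is Python's standard-library urllib.robotparser, which applies robots.txt rules roughly the way a well-behaved crawler does. The rules and URL below are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks everything by default
# but grants GPTBot an explicit section of its own.
rules = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot matches its explicit section and is allowed.
print(parser.can_fetch("GPTBot", "https://example.com/article"))
# PerplexityBot has no section of its own, so it falls through
# to the wildcard section and is blocked.
print(parser.can_fetch("PerplexityBot", "https://example.com/article"))
```

Running the same check for each AI user agent you care about is a quick way to audit a robots.txt before deploying it.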
What AI crawlers should I allow in robots.txt?
Allow GPTBot (OpenAI/ChatGPT), PerplexityBot (Perplexity), ClaudeBot (Anthropic/Claude), and Google-Extended (Google's token controlling AI use of crawled content). Add each as a separate User-agent entry with explicit Allow directives for the paths you want indexed, and review your existing Disallow rules to ensure none of them accidentally blocks these crawlers.
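Put together, an allow-list along these lines covers the four tokens above; the paths are placeholders, and a stricter wildcard default for unlisted crawlers can still follow it.

```text
# Explicit sections for the major AI crawlers. Each crawler uses
# the most specific User-agent section that matches it, so these
# entries take precedence over any wildcard section in the file.
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /
```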