AI Crawling
How AI crawlers like GPTBot, ClaudeBot, and PerplexityBot work, what they index, and how your robots.txt and rendering choices affect what they see.
AI crawling refers to how AI companies like OpenAI, Anthropic, and Perplexity collect web content using their own crawlers. Understanding which crawlers exist, what they can access, and how to configure your site for them is foundational to AI visibility.
[ Coming soon ]
Articles in this category are in progress. Follow @MattQR on X to be notified when they publish.
The major AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google's control token for AI training, honored by Googlebot rather than a separate crawler). Each indexes content for its respective AI system: GPTBot gathers training data for OpenAI's models (ChatGPT Search is served by the separate OAI-SearchBot crawler), and PerplexityBot feeds Perplexity search results directly. Blocking any of these crawlers prevents the corresponding AI system from including your content.
AI crawlers generally do not execute JavaScript. This is a critical distinction from Googlebot, which has improved JavaScript rendering over time. A React or Next.js application that renders entirely on the client side will return empty HTML to AI crawlers, making the content invisible regardless of how good it is. Server-side rendering or static generation is required for AI crawl visibility.
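One way to approximate what a non-JavaScript crawler sees is to fetch a page's raw server response and check whether your key content is present in it. A minimal Python sketch; the user-agent string, example URL, and test phrases are illustrative placeholders, not the crawlers' exact identifiers:

```python
import urllib.request

def raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch the server-rendered HTML exactly as a non-JS crawler would."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def content_visible(html: str, phrase: str) -> bool:
    """True if the phrase appears in the raw HTML, i.e. a crawler can see it."""
    return phrase.lower() in html.lower()

# A client-rendered SPA typically returns only an empty mount point:
spa_html = '<html><body><div id="root"></div></body></html>'
# A server-rendered page includes the actual content in the response:
ssr_html = '<html><body><article>How AI crawlers work</article></article></body></html>'.replace("</article></article>", "</article>")

print(content_visible(spa_html, "AI crawlers"))  # False: content arrives via JS
print(content_visible(ssr_html, "AI crawlers"))  # True: content is in the HTML
```

If the phrase you care about only appears after JavaScript runs in a browser, most AI crawlers will never see it.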
Your robots.txt file is the primary control mechanism for AI crawler access. A blanket rule such as User-agent: * with Disallow: / blocks every crawler that lacks its own group, AI bots included. Each AI crawler you want to treat differently should get its own explicit group with appropriate rules. Review your robots.txt to confirm that all major AI crawlers can reach your public content.
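A robots.txt that keeps public content open to the major AI crawlers while restricting a private area might look like the following. This is an illustrative sketch, not a drop-in file; the Disallow path is a placeholder for your own private sections:

```
# Explicit groups for AI crawlers: each bot follows its own
# group instead of the global * rules below.
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Global rules for every other crawler
User-agent: *
Disallow: /admin/
```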
Common questions
What are the main AI crawlers and what do they do?
The main AI crawlers are GPTBot (OpenAI, collects training data for ChatGPT's underlying models), PerplexityBot (Perplexity search), ClaudeBot (Anthropic, feeds Claude), and Google-Extended (a robots.txt token, honored by Googlebot, that controls whether Google may use your content for its AI models). Each discovers and indexes your content for its respective AI platform; blocking any of them makes your content invisible to that platform.
Can AI crawlers render JavaScript?
Most AI crawlers cannot execute JavaScript. They receive and index the raw HTML response from your server. If your site relies on client-side JavaScript rendering, the HTML sent to AI crawlers will be empty or incomplete. Server-side rendering (SSR) or static site generation (SSG) is required for AI crawlers to see your actual content.
How do I check if my robots.txt blocks AI crawlers?
Review your robots.txt file at yourdomain.com/robots.txt. Look for User-agent: GPTBot, User-agent: PerplexityBot, and User-agent: ClaudeBot groups. If these are absent, the global User-agent: * rules apply to those bots, so a Disallow: / under User-agent: * will block them along with everything else. Add an explicit group with permissive rules for each AI crawler you want to allow.
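You can also check access programmatically with Python's standard-library robots.txt parser. A sketch using a hypothetical rules file that blocks only GPTBot; swap in your own robots.txt and URLs:

```python
import urllib.robotparser

# Hypothetical rules: block GPTBot site-wide, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    allowed = rp.can_fetch(bot, "https://example.com/some-page")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To test your live file instead of an inline string, call rp.set_url("https://yourdomain.com/robots.txt") followed by rp.read() before querying can_fetch.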
Can I allow some AI crawlers but not others?
Yes. You can configure robots.txt to allow specific AI crawlers and block others. Use User-agent: GPTBot with Disallow: / to block only GPTBot, while leaving other crawlers accessible. This gives you granular control over which AI platforms can index and cite your content.
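As a sketch, the following robots.txt blocks GPTBot while leaving ClaudeBot, PerplexityBot, and all other crawlers unaffected; the rules shown are illustrative:

```
# Block only OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# No groups for ClaudeBot or PerplexityBot, so they
# fall back to the global rules below.
User-agent: *
Allow: /
```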