AI Crawling
How AI crawlers like GPTBot, ClaudeBot, and PerplexityBot work, what they index, and how your robots.txt and rendering choices affect what they see.
AI crawling refers to how AI companies like OpenAI, Anthropic, and Perplexity collect web content using their own crawlers. Understanding which crawlers exist, what they can access, and how to configure your site for them is foundational to AI visibility.
[ Coming soon ]
Articles in this category are in progress. Follow @MattQR on X to be notified when they publish.
The major AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google's control token for AI training, honored by Googlebot rather than a separate crawler). Each indexes content for its respective AI system: GPTBot gathers training data for OpenAI's models (ChatGPT Search is served by the separate OAI-SearchBot crawler), and PerplexityBot feeds Perplexity search results directly. Blocking any of these crawlers prevents the corresponding AI system from including your content.
AI crawlers generally do not execute JavaScript. This is a critical distinction from Googlebot, which has improved JavaScript rendering over time. A React or Next.js application that renders entirely on the client side will return empty HTML to AI crawlers, making the content invisible regardless of how good it is. Server-side rendering or static generation is required for AI crawl visibility.
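One way to approximate what a non-JavaScript crawler sees is to fetch a page's raw server response and check whether your key content is present in it. A minimal Python sketch; the user-agent string, example URL, and test phrases are illustrative placeholders, not the crawlers' exact identifiers:

```python
import urllib.request

def raw_html(url: str, user_agent: str = "GPTBot") -> str:
    """Fetch the server-rendered HTML exactly as a non-JS crawler would."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def content_visible(html: str, phrase: str) -> bool:
    """True if the phrase appears in the raw HTML, i.e. a crawler can see it."""
    return phrase.lower() in html.lower()

# A client-rendered SPA typically returns only an empty mount point:
spa_html = '<html><body><div id="root"></div></body></html>'
# A server-rendered page includes the actual content in the response:
ssr_html = '<html><body><article>How AI crawlers work</article></article></body></html>'.replace("</article></article>", "</article>")

print(content_visible(spa_html, "AI crawlers"))  # False: content arrives via JS
print(content_visible(ssr_html, "AI crawlers"))  # True: content is in the HTML
```

If the phrase you care about only appears after JavaScript runs in a browser, most AI crawlers will never see it.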
Your robots.txt file is the primary control mechanism for AI crawler access. A blanket rule such as User-agent: * with Disallow: / blocks every crawler that lacks its own group, AI bots included. Each AI crawler you want to treat differently should get its own explicit group with appropriate rules. Review your robots.txt to confirm that all major AI crawlers can reach your public content.
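A robots.txt that keeps public content open to the major AI crawlers while restricting a private area might look like the following. This is an illustrative sketch, not a drop-in file; the Disallow path is a placeholder for your own private sections:

```
# Explicit groups for AI crawlers: each bot follows its own
# group instead of the global * rules below.
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Global rules for every other crawler
User-agent: *
Disallow: /admin/
```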
Common questions
What are the main AI crawlers and what do they do?
The main AI crawlers are GPTBot (OpenAI, collects training data for ChatGPT's underlying models), PerplexityBot (Perplexity search), ClaudeBot (Anthropic, feeds Claude), and Google-Extended (a robots.txt token, honored by Googlebot, that controls whether Google may use your content for its AI models). Each discovers and indexes your content for its respective AI platform; blocking any of them makes your content invisible to that platform.
Can AI crawlers render JavaScript?
Most AI crawlers cannot execute JavaScript. They receive and index the raw HTML response from your server. If your site relies on client-side JavaScript rendering, the HTML sent to AI crawlers will be empty or incomplete. Server-side rendering (SSR) or static site generation (SSG) is required for AI crawlers to see your actual content.
How do I check if my robots.txt blocks AI crawlers?
Review your robots.txt file at yourdomain.com/robots.txt. Look for User-agent: GPTBot, User-agent: PerplexityBot, and User-agent: ClaudeBot groups. If these are absent, the global User-agent: * rules apply to those bots, so a Disallow: / under User-agent: * will block them along with everything else. Add an explicit group with permissive rules for each AI crawler you want to allow.
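You can also check access programmatically with Python's standard-library robots.txt parser. A sketch using a hypothetical rules file that blocks only GPTBot; swap in your own robots.txt and URLs:

```python
import urllib.robotparser

# Hypothetical rules: block GPTBot site-wide, leave everything else open.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    allowed = rp.can_fetch(bot, "https://example.com/some-page")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

To test your live file instead of an inline string, call rp.set_url("https://yourdomain.com/robots.txt") followed by rp.read() before querying can_fetch.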
Can I allow some AI crawlers but not others?
Yes. You can configure robots.txt to allow specific AI crawlers and block others. Use User-agent: GPTBot with Disallow: / to block only GPTBot, while leaving other crawlers accessible. This gives you granular control over which AI platforms can index and cite your content.
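As a sketch, the following robots.txt blocks GPTBot while leaving ClaudeBot, PerplexityBot, and all other crawlers unaffected; the rules shown are illustrative:

```
# Block only OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# No groups for ClaudeBot or PerplexityBot, so they
# fall back to the global rules below.
User-agent: *
Allow: /
```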