
GPTBot

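GPTBot is OpenAI's web crawler for collecting publicly available content that may be used to train OpenAI's generative AI models. Allowing GPTBot lets your content contribute to that training. According to OpenAI's crawler documentation, appearing in ChatGPT Search citations depends on a separate crawler, OAI-SearchBot, which sites can allow or block independently.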

Articles with this tag are in progress. Follow @MattQR on X to be notified when they publish.

GPTBot identifies itself with the user-agent token "GPTBot" and respects robots.txt rules. Many sites block it by accident through an overly broad Disallow rule under User-agent: *, which applies to every crawler that has no group of its own. To allow GPTBot, add a dedicated User-agent: GPTBot group to your robots.txt with the Allow directives you need. GPTBot is reported not to execute JavaScript, so content must be present in the server-rendered HTML to be read in full. Being accessible, server-rendered, and clearly structured makes a page readable to GPTBot, though it does not by itself guarantee a citation in ChatGPT Search.
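
A crawler that does not execute JavaScript reads only the HTML your server returns. One rough way to check whether a key sentence survives without JavaScript is to extract the text from the raw HTML while skipping script and style contents, as in this Python sketch. The names RawTextExtractor and visible_without_js are hypothetical, and the check is only an approximation: it assumes you already have the raw HTML (for example from a plain HTTP fetch), and it does not reproduce how GPTBot itself parses pages.

```python
from html.parser import HTMLParser


class RawTextExtractor(HTMLParser):
    """Hypothetical helper: collects text from raw HTML, skipping <script>/<style>.

    This approximates what a crawler that does not run JavaScript can read.
    It is NOT GPTBot's actual parser.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # text fragments found outside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_without_js(html: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the page text without running any JS."""
    parser = RawTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return phrase in " ".join(parser.chunks)


# Same content, rendered on the server vs. injected by a script in the browser.
ssr_page = "<html><body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>"
csr_page = (
    "<html><body><div id='root'></div>"
    "<script>document.getElementById('root').textContent = 'Plans start at $10.';</script>"
    "</body></html>"
)

print(visible_without_js(ssr_page, "Plans start at $10."))  # True
print(visible_without_js(csr_page, "Plans start at $10."))  # False
```

The client-rendered page only fills in its text from a script, so the sentence is missing from what a non-JavaScript reader sees, while the server-rendered version passes.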

Common questions

What is GPTBot and what does it do?

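GPTBot is OpenAI's web crawler for collecting web content that may be used to train OpenAI's generative AI models. Blocking GPTBot tells OpenAI that your content should not be used for that training. It does not by itself remove your pages from ChatGPT Search: OpenAI documents a separate crawler, OAI-SearchBot, for surfacing and citing sites there, so a site can block GPTBot while still allowing OAI-SearchBot.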

How do I allow or block GPTBot?

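Control GPTBot access in your site's robots.txt file. Add a "User-agent: GPTBot" line followed by "Allow: /" to permit full access, or by "Disallow: /" to block it entirely. You can also allow or block specific paths, such as "Disallow: /account/". If robots.txt has no GPTBot group, GPTBot follows the User-agent: * group; if it does have one, GPTBot follows only that group and ignores the * rules.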
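
To see how these rules resolve, the sketch below evaluates three example robots.txt files with Python's standard-library parser, urllib.robotparser. This is a stand-in, not GPTBot's own parser, and the example.com URLs and /account/ path are hypothetical. One caveat: Python's parser applies the first matching Allow/Disallow line in a group, whereas RFC 9309 uses the longest match, so the rules below are ordered so that both interpretations give the same answer.

```python
from urllib.robotparser import RobotFileParser


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Evaluate robots.txt rules for the GPTBot user-agent token (stdlib parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


# 1. Overly broad rule: there is no GPTBot group, so the * group applies.
broad = """\
User-agent: *
Disallow: /
"""

# 2. Dedicated GPTBot group: GPTBot follows this group and ignores the * group.
allow_gptbot = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

# 3. Path-specific rules: block /account/, allow everything else.
#    Disallow comes first because Python's parser uses the first matching line.
paths = """\
User-agent: GPTBot
Disallow: /account/
Allow: /
"""

url = "https://example.com/blog/post"
print(gptbot_can_fetch(broad, url))         # False
print(gptbot_can_fetch(allow_gptbot, url))  # True
print(gptbot_can_fetch(paths, url))         # True
print(gptbot_can_fetch(paths, "https://example.com/account/settings"))  # False
```

The first case is the accidental block described above: a blanket Disallow under User-agent: * shuts GPTBot out until a dedicated GPTBot group is added.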
