26 February 2026 | Derek Chua | 8 min read

Should You Block AI Crawlers? What Claude, ChatGPT, and Perplexity Are Actually Doing on Your Website

Anthropic now runs three separate crawlers, and blocking one doesn't block the others. Here's what each AI bot does and how to decide what to allow.

[Image: A robots.txt file on a laptop screen with AI company logos in the background]

Back in 2023, a lot of website owners made a quick, confident decision: add Disallow: / for GPTBot to their robots.txt file. Done. AI can't touch our content. Problem solved.

Here's the issue. That single line doesn't actually block "AI." It blocks one specific crawler from one specific company for one specific purpose: OpenAI's training data collection. Three separate Anthropic bots, OpenAI's search crawler, Perplexity's indexer, and a handful of others were entirely unaffected.

Anthropic updated its crawler documentation this week. They now officially list three distinct bots. Each one does something different, and each one requires a separate robots.txt decision.

If you haven't revisited your AI crawler strategy since the initial GPTBot panic, you're probably not blocking what you think you're blocking. You might also be opting out of AI search visibility without realising it. For Singapore SMEs that depend on organic discovery to generate leads, that's a bigger deal than it sounds.

Let's sort through the actual landscape.

Why This Matters More Now Than in 2023

In 2023, blocking AI training data made intuitive sense. Companies were hoovering up web content, authors and publishers were upset, and the robots.txt block felt like a reasonable boundary.

But 2026 is a different situation. ChatGPT has a search feature that cites sources. Claude searches the web in real time. Perplexity is essentially a search engine. People are increasingly asking AI assistants the same questions they used to type into Google.

That changes the calculus. Blocking AI from your website isn't just a philosophical stance on training data anymore. It's a decision about whether you appear in AI-powered search results.

The two questions are now separate: "Can AI train on my content?" and "Will AI surface my business when someone searches?" Treating them as the same decision is how you accidentally opt out of the next generation of search.

Anthropic's Three Crawlers (Verified Feb 26, 2026)

This is fresh as of this week. From Anthropic's official support documentation:

ClaudeBot: Training data collection. Crawls public web content that may contribute to AI model training. Block this one if you don't want your content used for training Anthropic's models. It's the equivalent of GPTBot on the OpenAI side.

Claude-User: Real-time retrieval at a user's request. When someone asks Claude a question and Claude decides to browse the web for the answer, this is the bot that visits your site. Block it, and Claude won't retrieve your content when users ask about your industry, your products, or problems you solve. Your site becomes invisible to Claude's live web search.

Claude-SearchBot: Search indexing. Crawls the web to improve the quality of Claude's search results. Block it, and Anthropic can't index your content for search, which may reduce how often and how accurately your site appears in Claude's search results.

The robots.txt syntax for each is straightforward:

# Block Anthropic training only
User-agent: ClaudeBot
Disallow: /

# Block Claude live retrieval
User-agent: Claude-User
Disallow: /

# Block Claude search indexing
User-agent: Claude-SearchBot
Disallow: /

These are independent. Blocking ClaudeBot has no effect on Claude-SearchBot.
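
One way to confirm that independence is to parse a rule set with Python's standard-library urllib.robotparser and ask it about each bot. This is a minimal sketch: the example.com URL is a placeholder, and the parser only models how a standards-following crawler would read the file, not what any company's bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Rules that block only Anthropic's training crawler.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; substitute a page from your own site.
url = "https://example.com/services"

# True means the bot may fetch the URL, False means it is disallowed.
results = {
    bot: parser.can_fetch(bot, url)
    for bot in ("ClaudeBot", "Claude-User", "Claude-SearchBot")
}
for bot, allowed in results.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With this rule set, only ClaudeBot is reported as blocked; the other two bots fall through to the default, which is to allow.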

OpenAI's Three Crawlers

OpenAI has had a similar three-way split for a while, though it's less widely understood:

GPTBot: Training data. The original one everyone blocked in 2023. Disallowing it signals that your content shouldn't be used in OpenAI's model training. It does nothing to your ChatGPT search visibility.

OAI-SearchBot: ChatGPT search results. This is the crawler that determines whether your site appears as a cited source when someone uses ChatGPT's search feature. Block GPTBot and leave OAI-SearchBot allowed, and you can appear in ChatGPT search while keeping the training opt-out. Block OAI-SearchBot, and your site disappears from ChatGPT's search citations entirely.

One nuance from OpenAI's documentation worth knowing: sites that opt out of OAI-SearchBot "will not be shown in ChatGPT search answers, though can still appear as navigational links." The distinction matters: a block removes you from the answers themselves, not necessarily from every link ChatGPT displays.

ChatGPT-User: User-initiated browsing. Similar to Claude-User, this handles cases where ChatGPT or a custom GPT browses a specific webpage at a user's direction. OpenAI notes that because these actions are initiated by users, robots.txt rules may not apply. It's not used for automatic web crawling or training data.

# Block OpenAI training only (ChatGPT search still works)
User-agent: GPTBot
Disallow: /

# Block ChatGPT search indexing
User-agent: OAI-SearchBot
Disallow: /

Perplexity: Simpler, But Still Worth Knowing

Perplexity operates with a single primary crawler: PerplexityBot. Unlike Anthropic and OpenAI, Perplexity doesn't publicly draw the same training-vs-search distinction. Its crawler primarily serves Perplexity's search and answer generation. Block it, and you're opting out of Perplexity results entirely.

# Block Perplexity
User-agent: PerplexityBot
Disallow: /

Perplexity is smaller than ChatGPT and Claude in total user numbers, but it's disproportionately popular with technical and research-oriented users. For B2B businesses especially, it's worth thinking about before reflexively blocking.

The Decision Framework: What Are You Actually Choosing?

There are really three separate choices here.

1. AI training opt-out (ClaudeBot, GPTBot)

This is the original concern. If your content is proprietary, you're in a regulated industry, or you simply don't want your writing used to train AI models, block these two. There's no meaningful downside to search visibility: they handle training only.

2. AI search visibility (Claude-SearchBot, OAI-SearchBot, PerplexityBot)

These crawlers determine whether your business appears when someone searches with an AI assistant. Blocking them is a significant tradeoff: you're saying "I don't want to appear in AI search results." For most businesses, that's not the intention. The whole point of a website is to be found.

3. Live AI retrieval (Claude-User, ChatGPT-User)

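These bots visit your site in real time when a user asks an AI assistant a question. Block Claude-User, and Claude can't pull your content to answer a live query. ChatGPT-User is less clear-cut: as noted above, OpenAI says robots.txt rules may not apply to user-initiated requests. Either way, for content you want discovered (service pages, guides, case studies), blocking live retrieval works against you.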

The recommended starting position for most SMEs:

Allow Claude-SearchBot, OAI-SearchBot, and PerplexityBot so you appear in AI search
Allow Claude-User and ChatGPT-User for live retrieval
Block ClaudeBot and GPTBot if you want to opt out of training data use

This gives you AI search visibility while maintaining training data opt-out. It's the most commercially sensible default unless you have a specific reason to stay invisible to AI assistants.
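
The three choices can be sketched as a small generator that turns a set of "purposes to block" into robots.txt text. This is an illustrative sketch only: the purpose labels and the build_rules helper are hypothetical names, and the bot grouping simply mirrors the framework described in this article.

```python
# Bot names come from the article; the grouping mirrors its three choices.
BOTS_BY_PURPOSE = {
    "training": ["ClaudeBot", "GPTBot"],
    "search": ["Claude-SearchBot", "OAI-SearchBot", "PerplexityBot"],
    # Per OpenAI, robots.txt rules may not apply to ChatGPT-User.
    "retrieval": ["Claude-User", "ChatGPT-User"],
}


def build_rules(blocked_purposes):
    """Return robots.txt text that disallows every bot in the given purposes."""
    unknown = set(blocked_purposes) - set(BOTS_BY_PURPOSE)
    if unknown:
        raise ValueError(f"Unknown purposes: {sorted(unknown)}")
    blocks = []
    for purpose, bots in BOTS_BY_PURPOSE.items():
        if purpose in blocked_purposes:
            for bot in bots:
                blocks.append(f"User-agent: {bot}\nDisallow: /\n")
    # Separate groups with a blank line, as robots.txt files conventionally do.
    return "\n".join(blocks)


# The starting position described above: opt out of training only.
print(build_rules({"training"}))
```

Calling build_rules with an empty set returns an empty string, which corresponds to allowing everything by default.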

What Your robots.txt Should Actually Look Like

If you want the recommended configuration (training opt-out, search visibility maintained):

# Block AI training crawlers
User-agent: ClaudeBot
Disallow: /

User-agent: GPTBot
Disallow: /

# AI search and live-retrieval bots are allowed by default (no entry needed,
# as long as no "User-agent: *" rule blocks them):
# Claude-SearchBot, OAI-SearchBot, PerplexityBot, Claude-User, ChatGPT-User
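
"Allowed by default" depends on the rest of the file. Under the robots.txt standard, a crawler with no group of its own falls back to the "User-agent: *" group, so an existing catch-all rule still applies to search bots you never named. A minimal sketch with urllib.robotparser, using a hypothetical rule set and placeholder URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a training opt-out plus a catch-all rule for /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# OAI-SearchBot has no group of its own, so the "*" group decides.
public_ok = parser.can_fetch("OAI-SearchBot", "https://example.com/guides/")
private_ok = parser.can_fetch("OAI-SearchBot", "https://example.com/private/report")
print(public_ok, private_ok)
```

Here the search bot can reach the public guide but not the /private/ path, even though the file never mentions it by name.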

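If you want to block every AI crawler that robots.txt can control (ChatGPT-User is omitted here; as OpenAI notes, robots.txt rules may not apply to it):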

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Claude-SearchBot
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

One important note from Anthropic's documentation: blocking by IP address doesn't work reliably for their bots, because they use service provider public IPs that change. The robots.txt file is the correct mechanism.

How to Check Your Current robots.txt

Before you make any changes, check what your site currently does. Visit yourdomain.com/robots.txt in a browser. Most sites either have nothing AI-related (which means everything is allowed by default) or a single GPTBot block added in 2023.
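
If you'd rather not read the file by eye, the check can be scripted. The sketch below is one possible approach, not an official tool: audit and fetch_robots are hypothetical helper names, example.com is a placeholder, and the result reflects how Python's standard parser interprets the rules, which may differ in edge cases from a given crawler.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# The seven bots discussed in this article.
AI_BOTS = [
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
]


def audit(robots_txt, site="https://example.com"):
    """Map each AI bot to True (may fetch the home page) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, f"{site}/") for bot in AI_BOTS}


def fetch_robots(site):
    """Download robots.txt from a site root such as https://example.com."""
    with urlopen(f"{site}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demo using the common 2023-era file: a single GPTBot block.
sample = "User-agent: GPTBot\nDisallow: /\n"
for bot, allowed in audit(sample).items():
    print(f"{bot:18} {'allowed' if allowed else 'BLOCKED'}")
```

To check a live site, pass the downloaded text in, for example audit(fetch_robots("https://yourdomain.com"), "https://yourdomain.com"). A site with no robots.txt makes urlopen raise an HTTPError, which in practice means nothing is blocked.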

If you're on WordPress, you can edit robots.txt through the file editor in Yoast SEO's Tools menu, or by uploading a robots.txt file to your site's root directory. On Shopify, you edit it through the robots.txt.liquid theme template. On Squarespace, you'll need to go through the platform settings.

Make changes one bot at a time. Don't clear the file and start over unless you know exactly what each existing rule does.
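
As an illustration of the one-bot-at-a-time approach, the sketch below appends a single block to an existing file and leaves every existing rule in place. The add_block helper is a hypothetical name, and the code only handles the simple case of adding a full-site Disallow for a bot that isn't already named.

```python
import re


def add_block(existing, bot):
    """Append a 'Disallow: /' group for one bot, keeping existing rules as they are."""
    # Skip the change if the file already names this bot in a User-agent line.
    pattern = rf"^\s*User-agent:\s*{re.escape(bot)}\s*$"
    if re.search(pattern, existing, flags=re.IGNORECASE | re.MULTILINE):
        return existing
    # Normalise trailing newlines, then add a blank line before the new group.
    text = existing.rstrip("\n")
    prefix = f"{text}\n\n" if text else ""
    return f"{prefix}User-agent: {bot}\nDisallow: /\n"


current = "User-agent: GPTBot\nDisallow: /\n"
updated = add_block(current, "ClaudeBot")
print(updated)
```

Running it a second time for the same bot returns the text unchanged, so repeated edits don't pile up duplicate groups.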

The Bigger Picture

Two years ago, blocking AI crawlers was mostly symbolic. The AI search products were primitive, and the practical stakes were low.

That's changed. ChatGPT search has tens of millions of users. Claude is increasingly used for research and information retrieval. Perplexity is growing fast among professional users. For businesses that depend on being discovered online, appearing in these results is a legitimate consideration.

The blanket GPTBot block was an understandable response to a real concern. But the landscape has been reorganised. Training and search are now separate surfaces, operated by separate bots, controlled by separate robots.txt rules.

The smart move isn't "block all AI" or "allow all AI." It's understanding what each bot actually does, and making a deliberate choice about each one. That's a 15-minute task. It's probably worth more than another month of wondering why your traffic isn't growing.

Magnified helps SMEs build websites and SEO strategies that stay visible across both traditional and AI-powered search. If you want to make sure your site is set up correctly for the current landscape, talk to us.
