Comprehensive List of AI Search Engine Crawlers

This is a comprehensive list of AI-related crawler user agents, including search engine crawlers known to feed AI systems.

Major AI Company Crawlers

OpenAI

  • GPTBot
  • ChatGPT-User
  • OAI-SearchBot

Anthropic

  • ClaudeBot
  • Claude-Web
  • anthropic-ai

Google

  • Googlebot
  • GoogleOther
  • Google-CloudVertexBot
  • Google-Extended

Microsoft/Bing

  • bingbot
  • msnbot

Meta

  • FacebookBot
  • Meta-ExternalAgent
  • Meta-ExternalFetcher

ByteDance

  • Bytespider

Other Commercial AI Crawlers

Amazon

  • Amazonbot

Apple

  • Applebot-Extended

Huawei

  • PetalBot

Cohere

  • cohere-ai

Perplexity

  • PerplexityBot

Research & Data Collection Crawlers

Common Crawl

  • CCBot

Data Collection

  • DataForSeoBot
  • img2dataset
  • ImagesiftBot

Additional AI Crawlers

Search & Analysis

  • AwarioRssBot
  • AwarioSmartBot
  • Diffbot
  • magpie-crawler
  • Seekr
  • YouBot
  • omgili
  • omgilibot
  • peer39_crawler

User Agent Strings

<code>

# OpenAI: robots.txt tokens
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
# OpenAI: full User-Agent header sent by OAI-SearchBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0

# Microsoft/Bing: full User-Agent headers (W.X.Y.Z is a placeholder version number)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0) Chrome/W.X.Y.Z Safari/537.36

# Google: full User-Agent header, followed by robots.txt tokens
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
User-agent: Googlebot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: Google-CloudVertexBot

# Anthropic
User-agent: anthropic-ai
User-agent: ClaudeBot
User-agent: Claude-Web

# Meta/Facebook
User-agent: FacebookBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher

# Others
User-agent: Bytespider
User-agent: CCBot
User-agent: cohere-ai
User-agent: PerplexityBot
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
User-agent: Diffbot
User-agent: YouBot
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: DataForSeoBot
User-agent: magpie-crawler
User-agent: peer39_crawler
User-agent: Seekr
</code>
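
The `User-agent:` lines above use robots.txt syntax, so one way to put them to work is to generate a robots.txt group from a chosen subset of tokens. The following is a minimal sketch, not a recommended policy: the helper name `build_robots_txt`, the `AI_TOKENS` subset, the `Disallow: /` rule, and the `example.com` URL are all assumptions made for illustration and do not come from the list itself. The result is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical subset of the tokens listed above; pick whichever apply.
AI_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]


def build_robots_txt(tokens, disallow="/"):
    """Return robots.txt text with one group covering every token.

    The Disallow rule is an assumption for this sketch, not part of the list.
    """
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append(f"Disallow: {disallow}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_TOKENS)
print(robots_txt)

# Sanity check with the standard-library parser (example.com is a placeholder).
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://example.com/page"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

Note that robots.txt is advisory: it only affects crawlers that choose to honor it.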

Most active AI-specific crawlers based on website access share:

Crawler         Website Access Share
Bytespider      40.40%
GPTBot          35.46%
ClaudeBot       11.17%
ImagesiftBot    8.75%
CCBot           2.14%
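
To see which of these crawlers reach a particular site, one option is to match the listed tokens against the User-Agent headers in that site's access logs and tally the hits. The sketch below assumes a hypothetical list of header values (`sample_agents`) and a hypothetical helper (`match_ai_crawler`). It counts requests per crawler, which may not be the metric behind the shares above, since the source does not say how those were measured.

```python
from collections import Counter

# A subset of the tokens listed in this document, for brevity.
AI_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
             "Bytespider", "CCBot", "ImagesiftBot", "PerplexityBot"]


def match_ai_crawler(user_agent):
    """Return the first listed token found in the header, or None."""
    lowered = user_agent.lower()
    for token in AI_TOKENS:
        if token.lower() in lowered:  # case-insensitive substring match
            return token
    return None


# Hypothetical header values; real ones would come from access logs.
sample_agents = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0",
    "ExampleFetcher/1.0 (compatible; GPTBot/1.0)",
    "ExampleFetcher/1.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0",  # ordinary browser
]

counts = Counter()
for ua in sample_agents:
    token = match_ai_crawler(ua)
    if token is not None:
        counts[token] += 1

total = sum(counts.values())
for token, n in counts.most_common():
    print(f"{token}: {n / total:.2%}")
# GPTBot: 66.67%
# OAI-SearchBot: 33.33%
```

Substring matching is simple but trusts the header as sent; a User-Agent value can be spoofed, so these counts are only indicative.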