Comprehensive List of AI Search Engine Crawlers

This is a comprehensive list of AI-related crawler user agents, including search engine crawlers known to feed AI systems.

Major AI Company Crawlers

OpenAI

  • GPTBot
  • ChatGPT-User
  • OAI-SearchBot

Anthropic

  • ClaudeBot
  • Claude-Web
  • anthropic-ai

Google

  • Googlebot
  • GoogleOther
  • Google-CloudVertexBot
  • Google-Extended

Microsoft/Bing

  • bingbot
  • msnbot

Meta

  • FacebookBot
  • Meta-ExternalAgent
  • Meta-ExternalFetcher

ByteDance

  • Bytespider

Other Commercial AI Crawlers

Amazon

  • Amazonbot

Apple

  • Applebot-Extended

Huawei

  • PetalBot

Cohere

  • cohere-ai

Perplexity

  • PerplexityBot

Research & Data Collection Crawlers

Common Crawl

  • CCBot

Data Collection

  • DataForSeoBot
  • img2dataset
  • ImagesiftBot

Additional AI Crawlers

Search & Analysis

  • AwarioRssBot
  • AwarioSmartBot
  • Diffbot
  • magpie-crawler
  • Seekr
  • YouBot
  • omgili
  • omgilibot
  • peer39_crawler

User Agent Strings

<code>

# OpenAI
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0

# Microsoft/Bing
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0) Chrome/W.X.Y.Z Safari/537.36

# Google
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
User-agent: Googlebot
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: Google-CloudVertexBot

# Anthropic
User-agent: anthropic-ai
User-agent: ClaudeBot
User-agent: Claude-Web

# Meta/Facebook
User-agent: FacebookBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher

# Others
User-agent: Bytespider
User-agent: CCBot
User-agent: cohere-ai
User-agent: PerplexityBot
User-agent: ImagesiftBot
User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
User-agent: Diffbot
User-agent: YouBot
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: DataForSeoBot
User-agent: magpie-crawler
User-agent: peer39_crawler
User-agent: Seekr
</code>
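
The `User-agent:` lines above are robots.txt tokens, and they only take effect once a rule such as `Disallow` follows them. The sketch below is one way to turn a subset of those tokens into a robots.txt group and check the result with Python's standard `urllib.robotparser`. The helper name `build_robots_txt`, the chosen subset of tokens, and the `/pricing` path are illustrative assumptions, not part of the original list. Keep in mind that robots.txt is advisory, so a crawler may ignore it.

```python
from urllib.robotparser import RobotFileParser

# Tokens copied from the list above; this subset is only an example.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Google-Extended"]


def build_robots_txt(tokens, disallow_path="/"):
    """Return one robots.txt group that applies a single Disallow rule to every token."""
    lines = [f"User-agent: {token}" for token in tokens]
    lines.append(f"Disallow: {disallow_path}")
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(AI_CRAWLER_TOKENS)

# Verify the generated rules with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
gptbot_allowed = parser.can_fetch("GPTBot", "/pricing")        # listed token
browser_allowed = parser.can_fetch("Mozilla/5.0", "/pricing")  # not listed
```

Grouping several `User-agent` lines over one rule keeps the file short. Crawlers that should stay allowed, such as a search crawler you rely on for traffic, are simply left out of the token list.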

Most active AI-specific crawlers based on website access share:

Crawler         Website Access Share
Bytespider      40.40%
GPTBot          35.46%
ClaudeBot       11.17%
ImagesiftBot    8.75%
CCBot           2.14%
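
The figures above do not say how access share was measured. As a rough way to produce a comparable breakdown for your own site, the sketch below counts requests per crawler from a list of User-Agent header values and converts the counts to percentages. It assumes share means share of AI-crawler requests. The function names and the sample strings are hypothetical, and substring matching is a simplification, since User-Agent headers can be spoofed.

```python
from collections import Counter

# The five crawlers from the table above.
AI_CRAWLER_TOKENS = ["Bytespider", "GPTBot", "ClaudeBot", "ImagesiftBot", "CCBot"]


def match_crawler(user_agent):
    """Return the first known crawler token found in the header, or None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def crawler_share(user_agents):
    """Map each matched crawler to its percentage of all matched requests."""
    counts = Counter(filter(None, map(match_crawler, user_agents)))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * n / total for name, n in counts.items()}


# Hypothetical User-Agent values, as they might appear in an access log.
sample_user_agents = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; Bytespider)",
    "CCBot/2.0",
    "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0",
    "GPTBot",
]
shares = crawler_share(sample_user_agents)
```

Requests from ordinary browsers are excluded from the denominator, so the percentages describe the mix among AI crawlers only, not their share of total traffic.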

How to Improve Rankings in Gen AI Search Engines

AI-powered search engines such as ChatGPT, Perplexity, and Google Gemini favor contextual understanding over static keyword matching, which changes how users find information. Generative Engine Optimization (GEO) is the practice of adapting content to that shift. This section explains why GEO matters for businesses and how it can improve visibility and engagement in AI-driven search environments.

How GEO Drives Visibility

Semantic Alignment

GEO aims to align content with how AI systems interpret queries by focusing on:

  • Entity recognition (e.g., products or services).
  • Semantic relevance (e.g., aligning with conversational intent).

Schema Markup for Enhanced AI Comprehension

Structured data, such as FAQ schema, states explicitly what a page contains. This can make content easier for AI engines to parse and may help it surface in generative search results.
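
As an illustration of FAQ schema, the sketch below builds a schema.org `FAQPage` object as JSON-LD and wraps it in a script tag for embedding in a page. The helper names `faq_jsonld` and `as_script_tag` and the sample question are assumptions made for this example. Whether a given AI engine reads the markup is not guaranteed.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage dict from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data):
    """Serialize the dict and wrap it in a JSON-LD script tag."""
    payload = json.dumps(data, indent=2)
    # Escape "</" so answer text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Hypothetical FAQ entry.
data = faq_jsonld([("What is GEO?", "Generative Engine Optimization.")])
tag = as_script_tag(data)
```

The markup should mirror questions and answers that are visible on the page. Generating it from the same source as the page copy keeps the two in sync.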

Integrating llms.txt for Optimal GEO Performance

The llms.txt proposal by Jeremy Howard introduces a standardized way to guide LLMs in interpreting website content. The file is a curated, Markdown-formatted overview of a site, placed at the site root, that points AI systems to the content that matters most. It is a proposal rather than an enforced standard, so adoption by AI systems varies.

Intended advantages of llms.txt:

  • Clarifies which content sections should be prioritized in AI summaries.
  • Boosts alignment with user intents captured by conversational AI.
  • Facilitates faster integration of generative capabilities with existing GEO efforts.
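
A file of this kind is plain Markdown. The sketch below assumes the layout described in the public llms.txt proposal: an H1 title, a blockquote summary, then H2 sections containing link lists with optional notes. The function name `build_llms_txt`, the site name, and the example.com URLs are hypothetical placeholders.

```python
def build_llms_txt(title, summary, sections):
    """Render a minimal llms.txt document.

    sections maps a heading to a list of (name, url, note) tuples;
    an empty note is omitted from the link line.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content.
text = build_llms_txt(
    "Example Site",
    "Docs for a hypothetical product.",
    {
        "Docs": [
            ("Quick start", "https://example.com/start.md", "Setup steps"),
            ("API", "https://example.com/api.md", ""),
        ]
    },
)
```

The output would be saved as `llms.txt` at the site root. Listing only the pages you most want summarized is what lets the file express priority.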

By combining the principles of GEO with tools like llms.txt, businesses can adapt to the changing landscape of AI-powered search engines and maintain a competitive edge.