Ask ChatGPT about your product category and watch who gets cited. If it's never you, there are exactly three places the failure can live: access (the crawler can't reach you), extraction (it reaches you but sees nothing), or grounding (it sees you but won't cite you). This guide shows you how to test all three in under ten minutes — by hand, or in one automated AEO scan.
Every major answer engine announces itself with a user agent and obeys robots.txt:
| Engine | Crawler user agents |
|---|---|
| ChatGPT / OpenAI | GPTBot, OAI-SearchBot, ChatGPT-User |
| Claude / Anthropic | ClaudeBot, Claude-SearchBot, Claude-User |
| Perplexity | PerplexityBot, Perplexity-User |
| Google AI / Gemini | Google-Extended (a robots.txt token; requests still arrive from Google's regular crawlers) |
| Microsoft Copilot | Bingbot (shared with Bing search) |
| Meta AI | Meta-ExternalAgent |
| Mistral | MistralAI-User |
Open https://yourdomain.com/robots.txt and look for any of those names under a Disallow: / rule. Watch for the sneaky version — a template or "privacy-focused" robots.txt you copied in 2024 that blocks everything except Googlebot.
The fix: delete the block, or scope it to paths you genuinely want private. Blocking AI crawlers is a legitimate choice — but it should be a decision, not an inherited accident.
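The manual robots.txt check above can also be scripted. What follows is a minimal sketch built on Python's standard `urllib.robotparser`. The agent list is copied from the table, `blocked_agents` is a hypothetical helper name, and `SNEAKY_TEMPLATE` is an invented example of the "everything except Googlebot" pattern. Note that `robotparser` applies the first matching rule in a group, which may differ slightly from how an individual crawler resolves Allow/Disallow conflicts.

```python
from urllib.robotparser import RobotFileParser

# Tokens copied from the table above.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-SearchBot", "Claude-User",
    "PerplexityBot", "Perplexity-User",
    "Google-Extended", "Bingbot", "Meta-ExternalAgent", "MistralAI-User",
]


def blocked_agents(robots_txt: str, url: str = "/", agents=AI_AGENTS) -> list[str]:
    """Return the agents that this robots.txt text forbids from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network access
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Invented example of a template that allows Googlebot and blocks everyone else.
SNEAKY_TEMPLATE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""
```

Calling `blocked_agents(SNEAKY_TEMPLATE)` returns every name in `AI_AGENTS`, because none of them match the Googlebot group and all fall through to the wildcard block. For a live site, download the robots.txt text yourself and pass it in.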
robots.txt is only half of access. Cloudflare, Vercel's firewall, AWS WAF, and most bot-protection products ship one-click "block AI bots" toggles — and some enable them by default on new plans.
Test it by fetching your site as an AI crawler:
curl -A "GPTBot/1.2 (+https://openai.com/gptbot)" -I https://yourdomain.com
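A 403, a 429, or a challenge page means OpenAI's crawler gets stonewalled no matter what robots.txt says. Because -I shows headers only, rerun without it when the status looks fine but you suspect a challenge page in the body. Repeat with ClaudeBot and PerplexityBot; WAF rules often treat each one differently.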
The fix: in your WAF dashboard, find the AI/bot-blocking rule and allow the specific crawlers you want. Keep blocking the scrapers you don't.
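To repeat the curl test for several crawlers in one go, something like the sketch below may help. It uses only the standard library. The GPTBot string is copied from the curl example, while the ClaudeBot and PerplexityBot strings are shortened placeholders (an assumption), so replace them with each vendor's published user agent. The function names are hypothetical.

```python
import urllib.error
import urllib.request

# GPTBot string copied from the curl example; the other two are placeholder
# strings (assumption) and should be replaced with the vendors' published values.
CRAWLER_UAS = {
    "GPTBot": "GPTBot/1.2 (+https://openai.com/gptbot)",
    "ClaudeBot": "ClaudeBot/1.0",
    "PerplexityBot": "PerplexityBot/1.0",
}


def classify(status: int) -> str:
    """Map an HTTP status code to a rough verdict for a crawler probe."""
    if status in (403, 429):
        return "blocked"
    if 200 <= status < 300:
        return "reachable"  # still inspect the body for a challenge page
    return "inconclusive"


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with the given User-Agent; redirects are followed."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the status code is the result


def probe_all(url: str) -> dict[str, str]:
    """Probe `url` once per crawler identity (performs real network requests)."""
    return {name: classify(probe(url, ua)) for name, ua in CRAWLER_UAS.items()}
```

`probe_all("https://yourdomain.com")` returns one verdict per crawler name, which makes it easy to spot a WAF rule that lets one bot through and blocks another.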
Most AI crawlers do not execute JavaScript. If your site is client-side rendered, the bot may receive an empty <div id="root"></div> where your content should be.
Test it:
curl -s https://yourdomain.com | grep -i "your headline text"
If your headline isn't in the raw HTML, answer engines can't read it. (Googlebot can — which is exactly why this failure hides behind good Google rankings.)
The fix: server-side render or statically generate every page you want cited. Next.js, Astro, SvelteKit, and Remix all do this by default — but client-only useEffect data fetching, auth-gated shells, and "loading…" skeletons still leak into marketing pages constantly. Vibe-coded apps are especially prone to this.
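The grep test can be made a little stricter in code. The sketch below is one possible approach, not a definitive extraction model: it parses the raw HTML with Python's standard `html.parser`, ignores anything inside `<script>` and `<style>`, and checks whether the headline appears in what is left. Unlike plain grep, a headline that exists only inside a JavaScript bundle or an embedded JSON blob counts as missing. The names `_VisibleText` and `headline_in_raw_html` are hypothetical.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def headline_in_raw_html(html: str, headline: str) -> bool:
    """Case-insensitive, whitespace-tolerant search of the non-script text."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split()).casefold()
    needle = " ".join(headline.split()).casefold()
    return bool(needle) and needle in text
```

Feed it the body returned by a plain HTTP fetch (no browser, no JavaScript execution). A client-rendered shell such as `<div id="root"></div>` followed by a script tag yields `False`, while a server-rendered `<h1>` containing the headline yields `True`.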
Access gets you read. Structure gets you quoted. Answer engines lift passages, so audit your key pages for:
The full playbook is in our AEO explainer, and the structural overlap with classic search is covered in SEO vs AEO.
Everything above can regress with one robots.txt edit, one WAF update, or one framework migration — and nothing will alert you. There is no Search Console for ChatGPT.
CheckVibe's AEO scanner runs all of these tests automatically:
Run a free scan and you'll know in about a minute whether the AI engines your customers use can actually see you.
Check your server logs for requests whose user agent contains GPTBot, OAI-SearchBot, or ChatGPT-User and whose source IP falls inside OpenAI's published IP ranges. No hits in weeks usually means you're blocked at the robots.txt or WAF level, or your site has too little linked, extractable content to attract crawls.
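The log check described above could be sketched as follows. It assumes an access log where the first whitespace-separated field is the client IP (as in the common and combined log formats), and it takes the allowed networks as a parameter rather than hard-coding any. The `192.0.2.0/24` range used in the usage note is a documentation placeholder, not one of OpenAI's real ranges, and `count_crawler_hits` is a hypothetical name.

```python
import ipaddress
from collections import Counter

OPENAI_AGENTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")


def count_crawler_hits(log_lines, networks, agents=OPENAI_AGENTS) -> Counter:
    """Count log lines per agent, keeping only hits from the given CIDR ranges."""
    nets = [ipaddress.ip_network(cidr) for cidr in networks]
    hits: Counter = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if not fields:
            continue
        try:
            ip = ipaddress.ip_address(fields[0])  # assumption: IP is the first field
        except ValueError:
            continue  # not a line in the expected format
        if not any(ip in net for net in nets):
            continue  # user agent strings are easy to spoof, so ignore outside IPs
        lowered = line.lower()
        for agent in agents:
            if agent.lower() in lowered:
                hits[agent] += 1
                break
    return hits
```

For example, `count_crawler_hits(open("access.log"), ["192.0.2.0/24"])` would tally verified hits per crawler once the placeholder range is replaced with the ranges OpenAI publishes. An empty result over several weeks is the signal the paragraph above describes.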
Because Googlebot renders JavaScript and is allowlisted by practically every WAF, while AI crawlers mostly don't render JS and are frequently blocked by default bot rules. Your site can be page one on Google and a blank 403 to GPTBot simultaneously. Test each engine separately.
If you sell content itself (journalism, research, courses), blocking training crawlers can be rational. For products and services, blocking answer engines mostly means your competitors get recommended instead of you. You can also split the difference: allow search-time crawlers like OAI-SearchBot while blocking training-time crawlers like GPTBot.
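One way to express that split, and to confirm it behaves as intended before deploying it, is sketched below. The robots.txt text in `SPLIT_POLICY` is an illustrative policy built from the two crawler names mentioned above, and `allowed` is a hypothetical helper around Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block the training crawler, allow the search-time crawler.
SPLIT_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str = "/") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Under this policy `allowed(SPLIT_POLICY, "GPTBot")` is `False` and `allowed(SPLIT_POLICY, "OAI-SearchBot")` is `True`. Agents with no matching group and no wildcard group, such as ChatGPT-User here, remain allowed.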
llms.txt is an emerging convention — a markdown file at /llms.txt that gives language models a curated index of your site's most important content. It's cheap to add and several tools already read it. It complements rather than replaces robots.txt: robots.txt controls access, llms.txt aids understanding.
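As a rough illustration, the helper below renders a small llms.txt file. It follows the layout of the public llms.txt proposal as best understood here (an H1 title, a blockquote summary, then H2 sections of annotated links), so treat the exact structure as an assumption and check the proposal before relying on it. The function name, the section name, and the example URL are all hypothetical.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render markdown: '# title', '> summary', then '## section' link lists.

    Each link is a (name, url, note) tuple rendered as '- [name](url): note'.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"
```

Calling it with a title, a one-sentence summary, and a mapping such as `{"Docs": [("Pricing", "https://example.com/pricing", "Plans and limits")]}` produces text you can serve at /llms.txt.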
Search-time crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot) re-fetch popular pages within days. Training-data inclusion moves on model-release timescales — months. Fix access today and the citation benefits compound from the next crawl onward.