Check if ChatGPT Can See Your Website

Ask ChatGPT about your product category and watch who gets cited. If it's never you, there are exactly three places the failure can live: access (the crawler can't reach you), extraction (it reaches you but sees nothing), or grounding (it sees you but won't cite you). This guide shows you how to test all three in under ten minutes — by hand, or in one automated AEO scan.

Step 1: Check robots.txt for AI crawler blocks

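Every major answer engine identifies its crawlers with a user-agent token, and those crawlers generally obey robots.txt. The usual exception is user-triggered fetchers: Perplexity, for example, documents that Perplexity-User generally ignores robots.txt because a person asked for the page.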

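| Engine | Crawler user-agent tokens |
|---|---|
| ChatGPT / OpenAI | `GPTBot`, `OAI-SearchBot`, `ChatGPT-User` |
| Claude / Anthropic | `ClaudeBot`, `Claude-SearchBot`, `Claude-User` |
| Perplexity | `PerplexityBot`, `Perplexity-User` |
| Google AI / Gemini | `Googlebot` (does the actual crawling, including for AI Overviews); `Google-Extended` (a robots.txt-only token that controls Gemini use of your content; it never appears in request headers) |
| Microsoft Copilot | `Bingbot` (shared with Bing search) |
| Meta AI | `Meta-ExternalAgent` |
| Mistral | `MistralAI-User` |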

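Open https://yourdomain.com/robots.txt and look for a `User-agent:` line naming any of those crawlers, followed by `Disallow: /`. Watch for the sneaky version too: a `User-agent: *` group with `Disallow: /` plus a lone exception for Googlebot, typically from a template or "privacy-focused" robots.txt you copied in 2024. It blocks every AI crawler without naming a single one.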
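Reading robots.txt by eye gets error-prone once groups and wildcards pile up. The bash sketch below applies a simplified version of the RFC 9309 matching rules to the tokens in the table: each crawler obeys the group(s) that name it, otherwise the `*` group. It is an approximation, not a full parser. It only flags a bare `Disallow: /` and ignores `Allow` overrides and path-specific rules. The names `check_robots` and `AI_AGENTS` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: report which AI crawler tokens a robots.txt blocks site-wide.
# Simplified RFC 9309 matching: a crawler obeys the group(s) naming it and
# falls back to the "*" group. Only a bare "Disallow: /" counts as a block;
# Allow overrides and path-specific rules are ignored.
AI_AGENTS="GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-SearchBot Claude-User PerplexityBot Perplexity-User Google-Extended Bingbot Meta-ExternalAgent MistralAI-User"

# Reads robots.txt on stdin; prints "<agent>: blocked" or "<agent>: allowed".
check_robots() {
  awk -v agents="$AI_AGENTS" '
    BEGIN { n = split(agents, list, " ") }
    {
      line = $0
      sub(/#.*/, "", line)          # drop comments
      gsub(/\r/, "", line)          # tolerate CRLF files
      if (line !~ /:/) next         # skip blank and malformed lines
      key = tolower(substr(line, 1, index(line, ":") - 1))
      gsub(/[ \t]/, "", key)
      val = substr(line, index(line, ":") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "user-agent") {
        # Consecutive User-agent lines share one group; a new run starts a new group.
        if (!in_header) { split("", current); in_header = 1 }
        current[tolower(val)] = 1
        seen[tolower(val)] = 1
      } else {
        in_header = 0
        if (key == "disallow" && val == "/")
          for (a in current) blocked[a] = 1
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        a = tolower(list[i])
        group = (a in seen) ? a : "*"
        print list[i] ": " ((group in blocked) ? "blocked" : "allowed")
      }
    }'
}
```

To check your live file, run `curl -sL https://yourdomain.com/robots.txt | check_robots`. Anything reported as `blocked` deserves a second look by hand, since the sketch deliberately skips the finer precedence rules.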

The fix: delete the block, or scope it to paths you genuinely want private. Blocking AI crawlers is a legitimate choice — but it should be a decision, not an inherited accident.

Step 2: Check your WAF and bot protection

robots.txt is only half of access. Cloudflare, Vercel's firewall, AWS WAF, and most bot-protection products ship one-click "block AI bots" toggles — and some enable them by default on new plans.

Test it by fetching your site as an AI crawler:

curl -I -L -A "GPTBot/1.2 (+https://openai.com/gptbot)" https://yourdomain.com

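A 403, 429, or challenge page means OpenAI's crawler gets stonewalled no matter what robots.txt says. Repeat with ClaudeBot and PerplexityBot, since WAF rules often treat each one differently. Keep one limitation in mind: this request comes from your own IP with a borrowed user agent, and WAFs that verify crawlers by IP range or reverse DNS may treat it differently from the real bot. Confirm any surprising result in your WAF's event log.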
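Repeating that request per crawler gets tedious, so here is a minimal bash loop that sends a GET with each token and prints the final status code. Assumptions to keep in mind: the user-agent strings are simplified (only the crawler token is realistic; the version numbers are guesses), the verdict mapping is rough, and a spoofed request from your machine is not proof of what the real crawler sees. `PROBE_UAS`, `probe_agents`, and `classify_status` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: probe a URL with several AI-crawler user agents and report HTTP status.
# Only the crawler tokens are meaningful; the version numbers after "/" are
# simplified assumptions, not the exact strings each vendor sends.
PROBE_UAS=(
  "GPTBot/1.2 (+https://openai.com/gptbot)"
  "OAI-SearchBot/1.0"
  "ChatGPT-User/1.0"
  "ClaudeBot/1.0"
  "PerplexityBot/1.0"
  "Meta-ExternalAgent/1.0"
  "MistralAI-User/1.0"
  "bingbot/2.0"
)

# Maps a status code to a rough verdict.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    3??) echo "redirect" ;;
    403|429) echo "blocked" ;;
    000) echo "no-response" ;;   # curl reports 000 when nothing came back
    *) echo "check-manually" ;;
  esac
}

# Usage: probe_agents https://yourdomain.com
probe_agents() {
  local url="$1" ua code
  for ua in "${PROBE_UAS[@]}"; do
    # -L follows redirects so the final status is reported; a GET rather than
    # HEAD, because some servers and WAFs answer HEAD requests differently.
    code=$(curl -s -L -o /dev/null -w '%{http_code}' --max-time 15 -A "$ua" "$url")
    printf '%-20s %s %s\n' "${ua%%/*}" "$code" "$(classify_status "$code")"
  done
}
```

Run `probe_agents https://yourdomain.com` and compare the rows. A `blocked` next to GPTBot while `bingbot` gets `ok` usually points to an AI-specific WAF rule rather than a general outage.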

The fix: in your WAF dashboard, find the AI/bot-blocking rule and allow the specific crawlers you want. Keep blocking the scrapers you don't.

Step 3: Check what the crawler actually sees

Most AI crawlers do not execute JavaScript. If your site is client-side rendered, the bot may receive an empty <div id="root"></div> where your content should be.

Test it:

curl -sL https://yourdomain.com | grep -i "your headline text"

If your headline isn't in the raw HTML, answer engines can't read it. (Googlebot can — which is exactly why this failure hides behind good Google rankings.)
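To make the check repeatable across pages, this bash sketch wraps the same grep in a function that also flags an empty `<div id="root"></div>` (or Vue-style `#app`) mount point. It fetches nothing itself; you pipe HTML in. Pick a phrase without apostrophes or ampersands, because server-rendered HTML often escapes those characters and an exact match will then miss. `check_raw_html` is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: check raw (unrendered) HTML for a phrase and for an empty SPA mount point.
# Reads HTML on stdin; $1 is text that should be present without JavaScript.
check_raw_html() {
  local phrase="$1" html
  html=$(cat)
  # -F: literal match, -i: case-insensitive
  if grep -qiF -- "$phrase" <<<"$html"; then
    echo "found"
  else
    echo "missing"
  fi
  # An empty root/app div is the classic client-side-rendering signature.
  if grep -qiE '<div id="(root|app)">[[:space:]]*</div>' <<<"$html"; then
    echo "warning: empty root div (likely client-side rendered)"
  fi
}
```

For example: `curl -sL -A "GPTBot/1.2 (+https://openai.com/gptbot)" https://yourdomain.com | check_raw_html "your headline text"`. Sending the crawler's user agent matters because some sites and prerendering services serve different HTML to bots.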

The fix: server-side render or statically generate every page you want cited. Next.js, Astro, SvelteKit, and Remix all do this by default — but client-only useEffect data fetching, auth-gated shells, and "loading…" skeletons still leak into marketing pages constantly. Vibe-coded apps are especially prone to this.

Step 4: Check whether your content is worth citing

Access gets you read. Structure gets you quoted. Answer engines lift passages, so audit your key pages for:

A plain definition near the top — "X is Y" in the first hundred words
Question-style headings with direct answers underneath
Lists and tables — the most liftable content formats
Schema.org markup — Article, FAQPage, and Organization schema give models machine-readable grounding
Trust signals — visible author, publish date, and an about page
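Of those five, structured data is the easiest to audit from the command line. The bash sketch below lists the schema.org `@type` values a page declares in JSON-LD and reports which of Article, FAQPage, and Organization are absent. It is crude text matching rather than a JSON parser: an array-valued `@type` is missed, and a subtype such as BlogPosting is not credited as Article. Treat a `missing:` line as a prompt to check, not a verdict. `jsonld_types` and `missing_schema` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: list schema.org @type values declared in a page's JSON-LD.
# Crude text matching, not a JSON parser: array-valued "@type" is missed,
# and subtypes such as BlogPosting are not mapped back to Article.

# Reads HTML on stdin; prints each distinct @type value, one per line.
jsonld_types() {
  tr -d '\n' \
    | grep -oE '"@type"[[:space:]]*:[[:space:]]*"[^"]+"' \
    | sed -E 's/.*"([^"]+)"$/\1/' \
    | sort -u
}

# Reads HTML on stdin; prints "missing: <Type>" for each recommended type not found.
missing_schema() {
  local types t
  types=$(jsonld_types)
  for t in Article FAQPage Organization; do
    grep -qx "$t" <<<"$types" || echo "missing: $t"
  done
}
```

Usage, with a placeholder path: `curl -sL https://yourdomain.com/your-key-page | missing_schema`.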

The full playbook is in our AEO explainer, and the structural overlap with classic search is covered in SEO vs AEO.

Step 5: Automate it — because this breaks silently

Everything above can regress with one robots.txt edit, one WAF update, or one framework migration — and nothing will alert you. There is no Search Console for ChatGPT.

CheckVibe's AEO scanner runs all of these tests automatically:

Per-engine access matrix — we probe your site as ChatGPT, Claude, Perplexity, Google AI, Copilot, Meta AI, and Mistral crawlers and show exactly who gets in and who gets blocked
45 AEO checks across access, extractability, readability, structured data, and trust signals
68 SEO checks in the same pass — because classic indexability feeds both
A fix prompt on every finding — paste into Claude, Cursor, or Windsurf, redeploy, rescan

Run a free scan and you'll know in about a minute whether the AI engines your customers use can actually see you.

FAQ

How do I know if ChatGPT has crawled my site?

Check your server logs for the user agents GPTBot, OAI-SearchBot, or ChatGPT-User, then confirm the requesting IPs fall within OpenAI's published IP ranges, since anyone can fake a user-agent string. No hits in weeks usually means you're blocked at the robots.txt or WAF level, or your site has too little linked, extractable content to attract crawls.
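If you have shell access to raw access logs, a rough per-crawler tally takes one pipeline. This sketch counts user-agent token matches only; it does not verify IPs, so spoofed requests are counted too. `count_ai_hits` is a hypothetical name, and the log path in the usage line is an assumption (a common nginx default), so adjust it for your server or host.

```shell
#!/usr/bin/env bash
# Sketch: count requests per AI crawler token in access-log lines read from stdin.
# Matches user-agent tokens only; IP verification is NOT performed.
count_ai_hits() {
  grep -oE 'GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|Claude-SearchBot|Claude-User|PerplexityBot|Perplexity-User|Meta-ExternalAgent|MistralAI-User' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'   # "<token> <count>"
}
```

Usage: `count_ai_hits < /var/log/nginx/access.log`. An empty result over a few weeks of logs is the "no hits" case described above.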

Why does my site rank on Google but never get cited by AI?

Because Googlebot renders JavaScript and is allowed by default in virtually every WAF, while most AI crawlers don't render JS and are frequently caught by default bot rules. Your site can sit on page one of Google while GPTBot gets a blank 403. Test each engine separately.

Should I block AI crawlers instead?

If you sell content itself (journalism, research, courses), blocking training crawlers can be rational. For products and services, blocking answer engines mostly means your competitors get recommended instead of you. You can also split the difference: allow search-time crawlers like OAI-SearchBot while blocking training-time crawlers like GPTBot.
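Split rules might look like this sketch, written as a shell heredoc. It assumes the vendors' documented roles for each token: OpenAI describes OAI-SearchBot as its search crawler and GPTBot as its training crawler, and Anthropic draws the same line between Claude-SearchBot and ClaudeBot. Check the list against each vendor's current crawler documentation before deploying.

```shell
#!/usr/bin/env bash
# Sketch: robots.txt that allows search-time crawlers and blocks training-time ones.
# Writes ./robots.txt, overwriting any existing file in the current directory.
cat > robots.txt <<'EOF'
# Search-time crawlers: allowed, so answers can cite you
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Training-time crawlers: blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# Everyone else
User-agent: *
Allow: /
EOF
```

Deploy the file at your site root, then re-run the Step 1 check to confirm the result matches your intent.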

What is llms.txt and do I need it?

llms.txt is an emerging convention — a markdown file at /llms.txt that gives language models a curated index of your site's most important content. It's cheap to add and several tools already read it. It complements rather than replaces robots.txt: robots.txt controls access, llms.txt aids understanding.
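A minimal llms.txt, following the layout proposed at llmstxt.org (an H1 name, a one-line blockquote summary, then H2 sections of annotated links), could be generated like this. Every name, URL, and description here is a placeholder to replace with your own.

```shell
#!/usr/bin/env bash
# Hypothetical llms.txt: product name, URLs, and descriptions are placeholders.
# Layout follows the llmstxt.org proposal: H1 title, blockquote summary,
# then H2 sections of "- [title](url): note" links. Serve it at /llms.txt.
cat > llms.txt <<'EOF'
# Example Product

> Example Product is a placeholder one-sentence definition of what you sell and who it is for.

## Key pages

- [Pricing](https://yourdomain.com/pricing): plans, limits, and billing
- [Docs](https://yourdomain.com/docs): setup guide and API reference

## Optional

- [Blog](https://yourdomain.com/blog): announcements and tutorials
EOF
```

The proposal treats a section titled "Optional" as skippable when a model's context is tight, so keep your most important pages above it.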

How long until AI engines pick up my fixes?

Search-time crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot) re-fetch popular pages within days. Training-data inclusion moves on model-release timescales — months. Fix access today and the citation benefits compound from the next crawl onward.

Is your app vulnerable?

Paste your URL and get a security report in 30 seconds — 100+ automated checks with AI-ready fix prompts.

Scan your site free
