robots.txt
A plain-text file at the root of your site that tells crawlers which paths they may or may not request, acting as the gatekeeper before any page is read.
One line in a file most people forget exists can make your whole site invisible to AI crawlers. No error, no warning. The crawler just reads the rule, turns around, and leaves.
That file is robots.txt, and a stale rule in it is one of the quietest ways to disappear from AI answers.
What robots.txt is
A robots.txt is a plain-text file at the root of your site, at yoursite.com/robots.txt. It tells crawlers which paths they may request and which they should leave alone. Rules are grouped by user-agent, so different crawlers can get different instructions.
A crawler checks it before fetching. So robots.txt sits in front of everything else: before a page can be read, used, or cited, the crawler has to be allowed past this gate.
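To make the gate concrete, here is a minimal Python sketch built on the standard library's urllib.robotparser. It parses a hypothetical robots.txt (the rules and the yoursite.com URLs are illustrative) and asks, for each user-agent, whether a URL may be fetched. This only approximates a crawler's decision, because each real crawler uses its own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one AI crawler is shut out entirely,
# and every other crawler is kept out of /admin/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def gate(parser: RobotFileParser, agent: str, url: str) -> str:
    """Return the verdict a crawler identifying as `agent` would get for `url`."""
    return "allowed" if parser.can_fetch(agent, url) else "blocked"


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "Googlebot"):
        for url in ("https://yoursite.com/blog/post", "https://yoursite.com/admin/"):
            print(agent, url, gate(parser, agent, url))
```

Run as a script, GPTBot is blocked on both URLs, while Googlebot is allowed on /blog/post and blocked on /admin/. The same URL gets a different answer depending on who asks, which is how a rule aimed at one crawler can go unnoticed.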
Old way, new way
The old way: you tuned robots.txt to steer Google away from admin pages and duplicate junk, so it spent its crawl on pages that mattered for search.
The new way: many of the crawlers that read this file now feed AI engines. A rule that once just trimmed Google's crawl can now block the fetch that would have put your page into an AI answer. The same file now carries higher stakes.
How robots.txt quietly blocks you
Common ways a forgotten file walls off content you want read:
- A broad disallow written for a staging site and carried over to production.
- A block on a specific AI crawler's user-agent that you no longer mean to keep.
- A disallowed path that happens to contain pages you now want surfaced.
- A robots.txt request that fails with a server or access error, which some crawlers treat as "block everything".
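The first three failure modes can be reproduced with the same standard-library parser. This sketch assumes three hypothetical robots.txt files, one per case, and a hypothetical page URL; PerplexityBot is used only as an example of an AI crawler's user-agent token. The error case is not modeled, because each crawler decides for itself how to treat an HTTP error on robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_text: str, agent: str, url: str) -> bool:
    """True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Failure mode 1 (hypothetical): a staging-era catch-all that reached production.
STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"

# Failure mode 2 (hypothetical): a block on one AI crawler nobody meant to keep.
STALE_AI_BLOCK = "User-agent: PerplexityBot\nDisallow: /\n"

# Failure mode 3 (hypothetical): rules match by path prefix, so "/blog"
# also covers "/blog-guides/..." and any other path starting with "/blog".
BROAD_PREFIX = "User-agent: *\nDisallow: /blog\n"

# A page you want read (hypothetical URL).
PAGE = "https://yoursite.com/blog-guides/robots-txt"

if __name__ == "__main__":
    cases = [
        ("staging leftover", STAGING_LEFTOVER),
        ("stale AI block", STALE_AI_BLOCK),
        ("broad prefix", BROAD_PREFIX),
    ]
    for name, rules in cases:
        for agent in ("PerplexityBot", "Googlebot"):
            verdict = "allowed" if is_allowed(rules, agent, PAGE) else "blocked"
            print(f"{name:17} {agent:14} {verdict}")
```

The third case is the easiest to miss: rules match by path prefix, so Disallow: /blog also turns away /blog-guides/ and anything else whose path begins with /blog.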
Each of these reads as a closed door to a machine, no matter how good the page behind it is.
The damaging admission
Opening the gate does not make a page worth reading. robots.txt only decides whether a crawler is allowed to fetch, not whether what it fetches is legible. Citedon checks both, but they are separate problems.
And robots.txt is a request, not a lock. Well-behaved crawlers respect it, but it is not a security control, and we will not pretend it is one.
How to check yours
Open yoursite.com/robots.txt and read it as a crawler would. Does any rule block a path that holds pages you want engines to read? Is there a leftover disallow from an older setup? If a door you meant to leave open is shut, the best page behind it still cannot be read.
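Reading the file by eye works when it is short. For longer files, a small script can list every crawler and page pair the rules turn away. This is a sketch under stated assumptions: the robots.txt contents and URLs are hypothetical placeholders to swap for your own, and the AI crawler tokens are examples to check against each vendor's current documentation. Python's urllib.robotparser applies the first matching rule in file order and treats paths as plain prefixes, while RFC 9309 and Google's parser use the most specific match and support * and $ wildcards, so treat its verdicts as a first pass rather than a final answer.

```python
from urllib.robotparser import RobotFileParser

# Example AI crawler user-agent tokens (assumed; vendors add and rename them).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"]

# Paste your own robots.txt here; these contents are hypothetical.
SAMPLE_ROBOTS = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /docs/
"""

# Pages you want AI engines to read and cite (hypothetical URLs).
MUST_BE_READABLE = [
    "https://yoursite.com/pricing",
    "https://yoursite.com/docs/getting-started",
]


def find_blocked(robots_text: str, agents: list[str], urls: list[str]) -> list[tuple[str, str]]:
    """List every (agent, url) pair the robots.txt rules would turn away."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [
        (agent, url)
        for agent in agents
        for url in urls
        if not parser.can_fetch(agent, url)
    ]


if __name__ == "__main__":
    for agent, url in find_blocked(SAMPLE_ROBOTS, AI_CRAWLERS, MUST_BE_READABLE):
        print(f"BLOCKED: {agent} cannot fetch {url}")
```

With the sample file, the script reports five blocked pairs: CCBot on both pages, and the other three crawlers on the /docs/ page. To check the live file instead of pasted text, RobotFileParser also offers set_url() and read(), which fetch robots.txt over HTTP; its handling of error responses may not match any particular AI crawler's.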
Run a free scan on a URL to see whether engines can reach and read it, or read the guide on how to fix crawlability issues.