What is a robots.txt file?

It is a plain-text file at yoursite.com/robots.txt that tells crawlers which paths they may or may not request. It is the gatekeeper a crawler checks before fetching, so it decides what gets read in the first place.

Can robots.txt block AI engines from reading my site?

Yes. A disallow rule, or a block on a specific AI crawler's user-agent, can keep engines from fetching pages. One overly broad line can hide content you actually want read, often without anyone noticing.

Is robots.txt the same as llms.txt?

No. robots.txt tells crawlers what they may not access. An llms.txt does the opposite, highlighting the pages you most want read. One restricts, the other invites.



robots.txt

A plain-text file at the root of your site that tells crawlers which paths they may or may not request, acting as the gatekeeper before any page is read.

One line in a file most people forget exists can make your whole site invisible to AI crawlers. No error, no warning. The crawler just reads the rule, turns around, and leaves.

That file is robots.txt, and a stale rule in it is one of the quietest ways to disappear from AI answers.

What robots.txt is

A robots.txt is a plain-text file at the root of your site, at yoursite.com/robots.txt. It tells crawlers which paths they may request and which they should leave alone. Rules are grouped by user-agent, so different crawlers can get different instructions.

A well-behaved crawler checks it before fetching anything else. So robots.txt sits in front of everything: before a page can be read, used, or cited, the crawler has to be allowed past this gate.
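Here is a minimal sketch of that gate check in Python, using the standard library's `urllib.robotparser`. That parser covers basic allow and disallow prefix rules, not every engine-specific extension. The robots.txt content and the `ExampleBot` user-agent are hypothetical, and the file is parsed from a string rather than downloaded.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for yoursite.com. A real crawler would download
# https://yoursite.com/robots.txt before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def gate(agent: str, url: str) -> str:
    """Return what a well-behaved crawler does with `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    if parser.can_fetch(agent, url):
        return "fetch"  # allowed past the gate; the page can now be read
    return "leave"      # disallowed; no error is raised, the crawler just moves on


print(gate("ExampleBot", "https://yoursite.com/blog/post"))    # fetch
print(gate("ExampleBot", "https://yoursite.com/admin/login"))  # leave
```

Note that a disallowed URL produces no error at all. The crawler simply never requests it.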

Old way, new way

The old way: you tuned robots.txt to steer Google away from admin pages and duplicate junk, so it spent its crawl on pages that mattered for search.

The new way: the crawlers reading the gate also feed AI engines. A rule that once just trimmed Google's crawl can now block the fetch that would have put your page into an AI answer. The same file now carries higher stakes.
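To see why the stakes changed, here is a hypothetical file tuned the old way, checked with Python's `urllib.robotparser`. None of the crawlers below has a group of its own, so the `User-agent: *` rules written with Google in mind apply to all of them. GPTBot, ClaudeBot, and PerplexityBot are used only as example user-agent tokens; check each vendor's documentation for current names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file tuned "the old way": the wildcard group was written for Google.
OLD_STYLE = """\
User-agent: *
Disallow: /admin/
Disallow: /archive/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# No crawler below has its own group, so all of them fall back to `User-agent: *`.
for agent in ("Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"):
    print(agent, is_allowed(OLD_STYLE, agent, "https://yoursite.com/archive/2019-guide"))
```

Every line prints `False`: the rule that once only kept Google off old archive pages also keeps the AI crawlers out.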

How robots.txt quietly blocks you

Common ways a forgotten file walls off content you want read:

A broad disallow left over from a staging site that shipped to production.
A block on a specific AI crawler's user-agent that you no longer mean to keep.
A disallowed path that happens to contain pages you now want surfaced.
A robots.txt that returns a server error, which some crawlers treat as "block everything."

Each of these reads as a closed door to a machine, no matter how good the page behind it is.
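The first three failure modes can be reproduced with Python's `urllib.robotparser`. The fourth depends on how a crawler reacts to the HTTP status of robots.txt itself, so it is modeled as a separate, hypothetical `treat_status` helper that loosely follows Google's published handling (other 4xx as "no rules", 5xx and 429 as "fully disallowed"). Other crawlers may behave differently. All file contents and paths are made up, and this parser applies the first matching rule in file order using simple prefix matching, which can differ from a given engine's matching rules.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# 1. Staging leftover: a single "/" disallows every path for every crawler.
STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"

# 2. A block on one AI crawler's user-agent that nobody meant to keep.
AGENT_BLOCK = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"

# 3. A disallowed path that now holds pages you want surfaced.
PATH_BLOCK = "User-agent: *\nDisallow: /resources/\n"


# 4. Hypothetical helper: how a crawler might react to the robots.txt HTTP status.
#    Loosely modeled on Google's published handling; other crawlers may differ.
def treat_status(status: int) -> str:
    if 200 <= status < 300:
        return "parse rules"
    if 300 <= status < 400:
        return "follow redirect"
    if status == 429 or 500 <= status < 600:
        return "block everything"   # server error: treated as a closed door
    if 400 <= status < 500:
        return "allow everything"   # other 4xx: treated as if no file exists
    raise ValueError(f"unexpected HTTP status {status}")


page = "https://yoursite.com/resources/guide"
print(allowed(STAGING_LEFTOVER, "ClaudeBot", page))  # False
print(allowed(AGENT_BLOCK, "GPTBot", page))          # False
print(allowed(AGENT_BLOCK, "Googlebot", page))       # True
print(allowed(PATH_BLOCK, "PerplexityBot", page))    # False
print(treat_status(503))                             # block everything
```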

The damaging admission

Opening the gate does not make a page worth reading. robots.txt only decides whether a crawler is allowed to fetch, not whether what it fetches is legible. Citedon checks both, but they are separate problems.

And robots.txt is a request, not a lock. Well-behaved crawlers respect it, but it is not a security control, and we will not pretend it is one.

How to check yours

Open yoursite.com/robots.txt and read it as a crawler would. Does any rule block a path that holds pages you want engines to read? Is there a leftover disallow from an older setup? If a door you meant to leave open is shut, even the best page behind it cannot be read.
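If you would rather script that read-through, this sketch tests a list of pages against a few AI crawler user-agents and reports every blocked pair. `blocked_pairs` and `fetch_robots` are hypothetical helpers built on Python's standard library; the agent list is an example, not a complete inventory, and the sample file stands in for your real robots.txt.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# Example AI crawler tokens; check each vendor's documentation for current names.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_pairs(robots_txt: str, urls: list[str], agents: list[str]) -> list[tuple[str, str]]:
    """Return every (agent, url) pair that the robots.txt text disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(agent, url) for agent in agents for url in urls
            if not parser.can_fetch(agent, url)]


def fetch_robots(site: str) -> str:
    """Download robots.txt from `site`, e.g. "https://yoursite.com".

    urlopen raises urllib.error.HTTPError on a 4xx or 5xx response.
    """
    with urlopen(f"{site.rstrip('/')}/robots.txt", timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Sample file standing in for a real one (no network access needed here).
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /guides/
"""
pages = ["https://yoursite.com/pricing", "https://yoursite.com/guides/setup"]
for agent, url in blocked_pairs(SAMPLE, pages, AI_AGENTS):
    print(f"{agent} is blocked from {url}")
```

To audit a live site, pass `fetch_robots("https://yoursite.com")` in place of `SAMPLE`. An `HTTPError` from that call is itself worth investigating, given how some crawlers treat a robots.txt that returns a server error.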

Run a free scan on a URL to see whether engines can reach and read it, or read the guide on how to fix crawlability issues.

See whether your robots.txt is letting engines in, free.

Run a free scan. No signup. You get a readiness score and the gaps to fix, in about a minute.
