April 16, 2026 · 5 min read

How AI Engines Decide What to Recommend

Nobody outside the labs knows the exact ranking. But the input signals an engine can read are visible and controllable. Here is what they are.

Figure: Same page, made readable to engines.
Hard for engines to read: <p>We help businesses grow their online presence with solutions tailored to their needs.</p>
Machine-readable: { "@type": "FAQPage", "name": "How we help", "description": "..." }

Ask an AI engine why it recommends one site over another and it will give you a confident answer.

That answer is a guess. The engine is describing general principles, not the ranking math that actually made the call, which the model itself cannot see.

So let us be honest up front: no one outside the labs knows the exact formula. But that does not leave you blind, because the inputs an engine reads are visible. This post is about those inputs.

The ranking is hidden. The inputs are not.

Here is the useful distinction. The ranking (how an engine weights everything and picks a winner) is a black box that shifts with every model update.

The inputs are not a black box. They are the signals on your page that an engine can actually read before it decides anything. You can see them, check them, and change them.

So you do not optimize the ranking, because you cannot. You optimize the input the ranking reads, because you can. That is the whole strategy.

Chasing the hidden formula is the old way: guess, tweak, hope. Controlling the readable input is the new way.

The old way reads like a horoscope. Someone claims to know the secret sauce, you rearrange your page on faith, and nothing tells you whether it worked. When the engine updates, the supposed secret is stale and you start guessing again.

The new way is boring by comparison, which is the point. You make the page readable, you measure across the four engines (ChatGPT, Perplexity, Gemini, and Claude), and you re-measure when something moves. No mysticism, just inputs and a readout.

The signals an engine can actually read

There are a handful of them, and they are concrete. Walk down the list.

Clear structure

A clean heading hierarchy that tells a machine what the page is about and where each idea lives. A wall of undifferentiated text gives an engine nothing to anchor on.
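To make "clean heading hierarchy" concrete, here is a minimal Python sketch that lists a page's headings and flags gaps in the outline. It uses only the standard library's html.parser. The function name heading_problems and the one-h1 rule are our own illustrative assumptions, a common convention rather than a documented engine requirement.

```python
from html.parser import HTMLParser
from typing import List


class HeadingCollector(HTMLParser):
    """Records heading levels (h1..h6) in the order they appear."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: List[int] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> List[str]:
    """Return readable issues with a page's heading hierarchy (empty list = clean)."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    if not levels:
        return ["no headings at all"]
    problems = []
    # Assumption: one h1 per page, a common convention.
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # A jump of more than one level (h1 -> h3) leaves a gap in the outline.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            problems.append(f"skipped level: h{prev} -> h{cur}")
    return problems


if __name__ == "__main__":
    print(heading_problems("<h1>Pricing</h1><h3>Plans</h3><p>...</p>"))
    # ['skipped level: h1 -> h3']
```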

A direct answer in plain text

The engine wants a sentence it can lift: "X costs $Y," "the three options are A, B, and C." A page that circles its point in a story gives the model nothing to quote.
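The "sentence it can lift" test can be approximated as well. This hypothetical sketch measures how many words into the page the first sentence matching an answer pattern begins. The naive sentence split and the 100-word cutoff standing in for "the first screen" are assumptions, not a known engine threshold.

```python
import re
from typing import Optional

# Naive sentence boundary: ., ! or ? followed by whitespace.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def answer_offset(text: str, pattern: str) -> Optional[int]:
    """Words of text before the first sentence matching `pattern`, or None if absent."""
    words_before = 0
    for sentence in _SENTENCE_SPLIT.split(text.strip()):
        if re.search(pattern, sentence):
            return words_before
        words_before += len(sentence.split())
    return None


def answer_is_liftable(text: str, pattern: str, max_words: int = 100) -> bool:
    """True if a quotable sentence with the answer starts within max_words words.

    max_words=100 is an illustrative stand-in for "the first screen", not an
    engine-documented limit.
    """
    offset = answer_offset(text, pattern)
    return offset is not None and offset < max_words
```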

Schema that labels the page

Schema is the machine-readable tag that says "this is a product, this is its price, this is the FAQ." Without it, the engine has to infer everything from prose, and it competes against pages that did not make it guess.
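Here is a sketch of generating that label for an FAQ. Unlike the abbreviated snippet at the top of this post, schema.org's FAQPage nests each question under mainEntity as a Question with an acceptedAnswer. The helper names and the sample question are illustrative assumptions, not part of any library.

```python
import json
from typing import List, Tuple


def faq_jsonld(pairs: List[Tuple[str, str]]) -> str:
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the standard script element for embedding it in HTML."""
    # Escape "</" so answer text containing "</script>" cannot close the tag early;
    # "<\/" is still valid JSON for the same characters.
    safe = jsonld.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{safe}\n</script>'


if __name__ == "__main__":
    print(as_script_tag(faq_jsonld([
        ("How do you help?", "We build and maintain websites for small businesses."),
    ])))
```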

Crawlability

If your robots.txt, a firewall, or a bot setting returns a 403 to AI fetchers, none of the other signals matter. The engine never reads the page at all.
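Of the three blockers named here, robots.txt is the one you can check offline. This sketch uses Python's urllib.robotparser to report which AI user agents a robots.txt disallows for a given URL. The agent list is an assumption: names change, and some tokens (Google-Extended, for instance) are control tokens rather than separate crawlers, so check each vendor's current documentation. A firewall or bot setting that answers 403 only shows up in a live request, which this sketch does not make.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Assumption: a sample of AI-related user-agent tokens. Names change, and some
# (Google-Extended) are control tokens rather than separate crawlers, so check
# each vendor's current documentation.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended")


def blocked_agents(robots_txt: str, url: str, agents: Iterable[str] = AI_AGENTS) -> List[str]:
    """Return the agents that this robots.txt disallows from fetching `url`.

    Only covers robots.txt; a firewall or bot setting that answers 403 needs a
    live request to detect.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]
```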

Corroboration

Engines lean toward pages whose claims line up with other sources. A page that agrees with the broader record is easier to trust than one making a lonely assertion.

These signals compound. A page with clean structure but no schema is half-legible. A page with schema but a 403 to AI crawlers is invisible despite the markup. The engine reads the whole picture, so a single broken signal can sink the rest.

That is why a scan checks the full list, not one item. Fixing your schema does not help if the crawler still cannot fetch the page, and you would never know which one was the blocker without checking both.
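The compounding point can be written as a rule: if the page cannot be fetched, nothing else counts; otherwise, report every failing signal rather than stopping at the first. A minimal sketch, with hypothetical signal names standing in for whatever a real scan checks:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SignalResult:
    name: str      # hypothetical signal id, e.g. "crawlable", "schema"
    passed: bool


def readiness_report(results: List[SignalResult]) -> str:
    """Summarize a scan. Crawlability gates everything; otherwise list every failure."""
    passed = {r.name: r.passed for r in results}
    # If the fetch is blocked, the engine never sees the other signals at all.
    if not passed.get("crawlable", False):
        return "invisible: crawler cannot fetch the page, so other signals are moot"
    failing = [r.name for r in results if not r.passed]
    if not failing:
        return "all checked signals pass"
    # Report all failures, not just the first, so no blocker stays hidden.
    return "partially legible, fix: " + ", ".join(failing)
```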

Why this means you optimize for readability

Notice what every signal on that list has in common. Each one is about whether a machine can read and parse your page, not about tricking it.

That is the controllable input: readability. You cannot reach into the ranking, but you can make sure the page sends clean, complete, readable signals into it.

Scan: find gaps -> Fix: apply fixes -> Watch: monitor drift -> Re-prove: confirm ready -> start again
Get ready, then stay ready as the models change.

And readability is not a one-time state, which is why the work is a loop. Scan to see what signals your page sends today. Fix the ones that are missing or broken. Watch as the engines change how they read pages and as your content grows. Re-prove by re-scanning to confirm the signals improved.

You can read how the full loop works on the How it works page.

The watch matters because the things that move are not yours to freeze. Your pages change every time you publish, and the engines change how they parse and weight signals. A page that sent clean signals in spring can fall behind by summer without you touching it.
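Watching for drift amounts to diffing two scans of the same page. This sketch assumes each scan is stored as a mapping from signal name to pass or fail (a hypothetical format) and reports what regressed and what a re-scan proved fixed. A signal missing from a scan is treated as failing.

```python
from typing import Dict, List

Scan = Dict[str, bool]  # hypothetical format: signal name -> passed?


def regressions(before: Scan, after: Scan) -> List[str]:
    """Signals that passed earlier but fail (or are missing) in the later scan."""
    return sorted(name for name, ok in before.items() if ok and not after.get(name, False))


def fixes(before: Scan, after: Scan) -> List[str]:
    """Signals that failed (or were missing) earlier and pass in the later scan."""
    return sorted(name for name, ok in after.items() if ok and not before.get(name, False))
```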

A concrete example

Say an engine recommends a competitor's comparison page instead of yours, and you cannot see why.

A scan fetches both pages as a machine would. Yours emits a generic "article" with the answer buried on line 80. Theirs emits a structured comparison with FAQ schema and the answer in the first screen of text.

Now the why is visible. It is not mysterious ranking favoritism; it is that their page sent a cleaner, more complete signal. That is an input gap you can close.
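The schema half of that comparison is easy to reproduce. This sketch pulls every JSON-LD block out of a page's HTML with the standard library and collects the schema.org @type values, so you can diff yours against a competitor's. It handles a top-level object, a list, or an @graph. The function names are hypothetical, and the sketch says nothing about answer placement or the other signals.

```python
import json
from html.parser import HTMLParser
from typing import List, Set


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script text may arrive in several chunks


def schema_types(html: str) -> Set[str]:
    """Return every schema.org @type declared in the page's JSON-LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    types: Set[str] = set()
    for block in extractor.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is a finding in itself; skipped here
        if isinstance(data, dict):
            items = data.get("@graph", [data])
        elif isinstance(data, list):
            items = data
        else:
            items = []
        for item in items:
            if not isinstance(item, dict):
                continue
            declared = item.get("@type")
            if isinstance(declared, str):
                types.add(declared)
            elif isinstance(declared, list):
                types.update(t for t in declared if isinstance(t, str))
    return types
```

Usage: `schema_types(their_html) - schema_types(your_html)` lists the types their page declares and yours does not.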

The scan shows you the readable difference and which engine named whom. It does not pretend to show you the secret math, because no honest tool can.

The honest part

Here is the damaging admission. We do not know the exact ranking, and neither does anyone outside the model labs.

So we do not optimize for a number we cannot see. We optimize for the readable input, the part you control, and then we measure the result across the four engines so you are working from data, not from a blog post's hunch.

We also do not promise a recommendation. Better signals make you eligible and competitive; the engine still chooses, and it can choose someone else. We report the citation rate as proof of readiness, never as a guarantee.

And the automated fix layer is WordPress-only. The scan reads the signals on any site, but on Shopify, Wix, Webflow, or a headless setup you would apply the fixes yourself.

The cost of optimizing the wrong thing

Most effort here goes to the thing you cannot control. People rewrite a page chasing a recommendation, see no change, and assume the page is just unlucky.

The page was rarely unlucky. It was usually sending a broken signal the writer never checked: an empty HTML shell whose content only appeared after JavaScript ran, a generic article tag where a comparison should be, a crawler quietly turned away.
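The empty-shell case is checkable without a browser: count the words in the raw HTML outside script, style, and template tags. A minimal sketch; the 50-word threshold is an arbitrary assumption, and whether a given AI fetcher executes JavaScript varies by engine, so treat a hit as a prompt to look closer, not a verdict.

```python
from html.parser import HTMLParser


class VisibleWordCounter(HTMLParser):
    """Counts words in raw HTML, ignoring script, style and template contents."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def looks_like_js_shell(raw_html: str, min_words: int = 50) -> bool:
    """True if the HTML as served (before any JavaScript runs) has almost no text.

    min_words=50 is an arbitrary illustrative threshold.
    """
    counter = VisibleWordCounter()
    counter.feed(raw_html)
    counter.close()
    return counter.words < min_words
```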

Every hour spent guessing at the hidden ranking is an hour not spent fixing a readable input you could have seen in a scan. That is the real cost: not a lost citation, but motion in the wrong place.

The fix is to stop optimizing the black box and start optimizing the part with a readout. Make the signal clean, measure it, and move on to the next page instead of relitigating the last one.

Where to start

Pick the page where an engine keeps choosing someone else, and find out what signal it is missing.

Scan the URL, read how many of ChatGPT, Perplexity, Gemini, and Claude name it, and see which of the readable signals are failing. That turns a hidden ranking into a fixable input.
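The "how many of the four engines name it" readout is simple arithmetic once you have each engine's answer to the same prompt. This hypothetical sketch counts the answers that mention any of your brand names or domains. How the answers are collected is out of scope, and plain substring matching will miss paraphrases.

```python
from typing import Dict, Iterable, Tuple

ENGINES = ("ChatGPT", "Perplexity", "Gemini", "Claude")


def citation_rate(answers: Dict[str, str], names: Iterable[str]) -> Tuple[int, float]:
    """How many of the four engines' answers mention any of `names`, and the share.

    `answers` maps engine name -> that engine's answer text for one prompt
    (a hypothetical input format). Matching is plain case-insensitive substring
    search, so paraphrases and misspellings are missed.
    """
    needles = [n.lower() for n in names]
    hits = sum(
        1
        for engine in ENGINES
        if any(needle in answers.get(engine, "").lower() for needle in needles)
    )
    return hits, hits / len(ENGINES)
```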

For the underlying definitions, start with what answer engine optimization actually is. And to go deeper on the markup itself, read schema that makes pages machine-readable.


Written by

Alex

AI Engineer at Citedon

Alex is an AI engineer at Citedon, where they work on the scan engine that measures how readable a site is to ChatGPT, Perplexity, Gemini, and Claude, and on the fixes that make a site agent-ready and keep it that way as the models change. Alex writes about answer engine optimization, structured data, and the practical work of staying readable to AI engines.