Where Models Agree Is Not Where Truth Is

Paul Stephen

On AI

Where Models Agree Is Not Where Truth Is

Why Consensus Among AI Systems Is a Measurement of Convergence, Not Correctness

Paul StephenApatheia LabsMay 15, 2026 · 4 min read

When everything was trained on the same internet and tuned toward the same preferences, agreement is not corroboration. It is an echo with several voices.

On this page4 sections

The Comforting Convergence

A practice has quietly become common: ask the same question of several AI models, and treat the places they agree as the reliable parts of the answer. The intuition behind it is the intuition behind every panel of experts — independent sources converging on the same conclusion is strong evidence that the conclusion is right. The intuition is sound. Its application here is not, because the premise it depends on — independence — is precisely the thing large models do not have with respect to each other.

This is a specific, growing failure mode, and it is worth stating plainly because it is so easy to fall into and feels so much like rigour while you are doing it. Agreement among models is real and measurable. What it measures is convergence. Whether convergence tracks correctness is a separate question, and in the cases that matter most, it does not.

Why Models Agree

Independent confirmation requires the confirmers to be independent. Large models are built in ways that make them systematically correlated, for at least three reasons that have nothing to do with the truth of any particular answer.

They are trained on heavily overlapping data. The web, the same large text corpora, much of the same reference material. A claim widely repeated across that shared substrate will be reproduced by all of them — not because each independently verified it, but because each absorbed the same upstream consensus, including the same upstream errors. This is authority laundering with a new set of intermediaries: a claim repeated often enough in the training data is echoed in unison by systems that never checked it, and the unison feels like confirmation.

They are optimised against similar objectives. The dominant training and tuning methods reward outputs that humans rate as helpful, plausible, and agreeable. Models converge on what is persuasive to human raters, which overlaps with truth but is not the same set — and where the two diverge, the optimisation pressure points the same way for all of them, toward the plausible-and-pleasing rather than the correct.

They are tuned toward similar preferences. The behavioural shaping that makes models safe and acceptable is broadly similar across the field, which means they also share characteristic blind spots, characteristic hedges, and characteristic reluctances. Where one model declines to see something for reasons of tuning, the others tend to decline in the same place, for the same reasons. The agreement there is not a signal about the world. It is a shared property of the training regime.

The Test

The discipline is to treat model agreement as a hypothesis about the world's clarity, not as a verdict on it — and then to ask the question that actually discriminates: would these models agree here even if the claim were false? Where the answer is yes — because the false claim is widely repeated in shared training data, or is the plausible-sounding answer human raters reward, or sits in a shared blind spot — the agreement is uninformative regardless of how unanimous it is, and the only thing that resolves the question is the primary evidence the models are standing in for.

Where the answer is no — where a false version of the claim would actually produce visible disagreement, contradiction, or incoherence across models — the convergence carries some weight, though still less than genuinely independent human verification against sources. The test is not "do they agree?" It is "is this a question where agreement could occur without the answer being true?" For most consequential, contested, or sparsely-evidenced questions, it can. That is exactly the domain where the convergence feels most reassuring and means least.

Why This Has to Be Said Now

The practice is spreading precisely because it is convenient and feels disciplined. Polling several models and trusting the overlap has the form of triangulation — multiple sources, independent queries, convergent result — while lacking the property that gives triangulation its power. It is, structurally, the same error as accepting a claim because four institutions repeat it: authority counted instead of evidence, with the institutions replaced by models and the comfort increased because the models are articulate and fast.

Naming it restores the missing question. Consensus among models is a measurement. It measures how strongly the systems were pulled toward the same output by shared data, shared objectives, and shared tuning. That number is real and sometimes useful and never, by itself, a measurement of truth. Where models agree is where they were built to agree. Whether that is also where the truth is remains, every time, a separate question — and the only honest answer to it is the one the models were standing in for: the evidence, checked.

About the author

Paul Stephen

Founder, Apatheia Labs

Evidence-governed research publication — Prosoche applied in the open.

Where Models Agree Is Not Where Truth Is

The Comforting Convergence

Why Models Agree

The Test

Why This Has to Be Said Now

AI as Infrastructure for Human Agency

Memento Mori

The Illusion of Aligned Progress