Statistical Significance as an Accountability Tool

On Method

Turning "It Just Feels One-Sided" Into a Number That Can Be Wrong

Paul Stephen · Apatheia Labs · May 15, 2026 · 4 min read

"It feels one-sided" can be dismissed. "If selection were neutral, this asymmetry would arise by chance about once in ten thousand times" cannot be — it can only be checked.

The Accusation That Always Loses

"This coverage is one-sided." "This selection is skewed." "This isn't balanced." These are among the most common accusations made of institutional output, and among the weakest, because they are almost always staged as an impression against an impression. One person feels the asymmetry; the institution feels, or says, that it does not; there is no third thing to appeal to, and the exchange ends where it began. The accusation loses not because it is wrong but because it was never put in a form that could be tested.

It can be put in that form. Directional asymmetry (which way the close calls went, how often the discretionary choice favoured one side) is countable, and once counted it is governed by ordinary probability. If the choices were genuinely neutral and independent of one another, the departures from balance would follow known distributions, and a large enough departure would have a calculable, small probability of having arisen by chance. That calculation converts "it feels one-sided" into a claim with a number attached: a claim that can be checked, and therefore a claim that can be wrong, and therefore a claim worth making. This essay hands over the elementary version of the instrument and, more importantly, the discipline that stops it being abused.

The Instrument, Plainly

The logic requires no advanced statistics, only a clearly stated null hypothesis and the willingness to count.

Begin with the neutral expectation. If a set of discretionary choices (which items to include, which way to resolve an ambiguous call, whom to scrutinise) were made without directional preference, you would expect them to split roughly evenly between the directions, the way a fair coin splits across many tosses. Not exactly evenly; chance produces lopsided runs. But the distribution of possible splits, under neutrality, is known. A fair process making, say, forty independent binary choices almost never produces a thirty-five-to-five split: the probability of a departure that large or larger in either direction, by chance alone, is roughly one in seven hundred thousand, and it is exactly computable with an elementary binomial calculation.
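As a rough sketch of that binomial calculation, the tail probability can be computed with the Python standard library alone. It assumes each choice is independent and that a neutral process favours either direction with probability exactly one half. The function name `two_sided_binomial_p` is an illustrative choice, not part of any library.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Probability that a fair process of n independent binary choices
    yields a split at least as lopsided as k versus n - k, in either direction.

    Assumes a neutral probability of exactly 0.5, so the distribution is
    symmetric and the two tails are equal.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    larger = max(k, n - k)
    # Count the outcomes in the upper tail: larger, larger + 1, ..., n.
    upper_tail = sum(comb(n, i) for i in range(larger, n + 1))
    # Double for the mirror-image tail; cap at 1 for near-even splits.
    return min(1.0, 2 * upper_tail / 2 ** n)


# The forty-choice example from the text: a thirty-five-to-five split.
p = two_sided_binomial_p(35, 40)
print(f"p = {p:.3g}, about one in {round(1 / p):,}")
```

Counting both tails is a deliberate choice here: the null hypothesis is neutrality, so a lopsided split in either direction counts as a departure from it.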

So you count. You define the choice precisely and in advance — what counts as a directional call, decided by a rule fixed before you look, not after. You tally how the calls actually fell. You compute the probability that a neutral process would have produced a split at least this lopsided. If that probability is tiny, you have a finding: not "it feels skewed" but "under the assumption of neutral selection, this degree of one-directional outcome would occur by chance about once in N times." The same logic, in slightly different dress, is the chi-square test for whether a distribution departs from expectation and the proportion test for whether two rates differ by more than sampling noise. The arithmetic is secondary. The move that matters is replacing the impression with a null hypothesis and a count.
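The two tests named above can be sketched in the same spirit. What follows is a minimal version, assuming independent calls and counts large enough for the usual normal approximation to hold; both functions return approximate p-values, not exact ones. The function names are illustrative, and the counts passed to `two_proportion_z` are hypothetical figures chosen only to show the call.

```python
from math import erfc, sqrt


def chi_square_even_split(a: int, b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of counts a and b against an even split.

    Returns (statistic, approximate p-value) with one degree of freedom.
    """
    expected = (a + b) / 2
    stat = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # With one degree of freedom the statistic is a squared standard normal,
    # so its upper tail equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test.

    Returns (z, approximate two-sided p-value). Undefined when the pooled
    rate is exactly 0 or 1, because the standard error is then zero.
    """
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, erfc(abs(z) / sqrt(2))


# A thirty-five-to-five split tested against an even expectation.
stat, p_chi = chi_square_even_split(35, 5)
print(f"chi-square = {stat:.1f}, p = {p_chi:.2g}")

# Hypothetical counts: 30 of 50 calls one way in one group, 20 of 50 in another.
z, p_z = two_proportion_z(30, 50, 20, 50)
print(f"z = {z:.2f}, p = {p_z:.3f}")
```

Because these are approximations, an exact binomial count is the safer choice when the number of calls is small.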

(An illustrative shape, with invented figures: a corpus offers sixty occasions where an ambiguous framing choice could fall either way; a neutral process is expected near thirty-thirty; the actual split is forty-two to eighteen. The two-sided binomial probability of a departure that extreme or greater from a fair process is about 0.0027, on the order of one in four hundred. The numbers are hypothetical. The structure of the argument is the transferable thing.)
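The illustrative figures can be checked two ways: by exact count, and by simulating a neutral process many times and seeing how often it is at least as lopsided. The sketch below does both, assuming independent calls and a fair one-half probability; the helper names, the trial count, and the seed are arbitrary choices made for illustration. Both routes should land a little under 0.003.

```python
import random
from math import comb

N_CALLS = 60   # occasions where the framing could fall either way (invented)
OBSERVED = 42  # calls that fell in the favoured direction (invented)


def exact_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial probability under a fair process."""
    larger = max(k, n - k)
    tail = sum(comb(n, i) for i in range(larger, n + 1))
    return min(1.0, 2 * tail / 2 ** n)


def simulated_two_sided_p(k: int, n: int, trials: int, seed: int = 0) -> float:
    """Share of simulated neutral processes at least as lopsided as k of n."""
    rng = random.Random(seed)
    threshold = abs(k - n / 2)
    hits = 0
    for _ in range(trials):
        # n fair coin tosses at once: count the 1-bits of an n-bit random integer.
        heads = bin(rng.getrandbits(n)).count("1")
        if abs(heads - n / 2) >= threshold:
            hits += 1
    return hits / trials


exact = exact_two_sided_p(OBSERVED, N_CALLS)
approx = simulated_two_sided_p(OBSERVED, N_CALLS, trials=200_000)
print(f"exact: {exact:.4f}  simulated: {approx:.4f}")
```

The simulation adds nothing the exact count lacks, but it makes the phrase "once in N times" literal: it is the fraction of imagined neutral worlds that look at least this skewed.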

Why It Belongs in the Open

The reason to hand this over is that its entire value is adversarial. A statistic computed by an accuser and kept private is no better than the impression it replaced. A statistic computed against a pre-registered rule, with the count published, is the opposite: it invites the accused to re-run it, challenge the category, show the dependence, or supply the neutral explanation — and if it survives that, it has earned a standing the impression never could.

That is why the method, not just the result, is published here, and why a piece in this publication that makes a significance claim must expose its rule and its tally for exactly this treatment. The point is not that counting proves the accusers right. It is that counting moves the dispute off the terrain where it is unwinnable — feeling against feeling — and onto terrain where someone can be shown to be wrong, including the person holding the calculator. An accountability tool that could not be turned on its user was never an accountability tool. This one can be, and is meant to be.

About the author

Paul Stephen

Founder, Apatheia Labs

Evidence-governed research publication — Prosoche applied in the open.
