On Method
Statistical Significance as an Accountability Tool
Turning "It Just Feels One-Sided" Into a Number That Can Be Wrong
"It feels one-sided" can be dismissed. "If selection were neutral, this asymmetry would arise by chance about once in ten thousand times" cannot be — it can only be checked.
The Accusation That Always Loses
"This coverage is one-sided." "This selection is skewed." "This isn't balanced." These are among the most common accusations made of institutional output, and among the weakest, because they are almost always staged as an impression against an impression. One person feels the asymmetry; the institution feels, or says, that it does not; there is no third thing to appeal to, and the exchange ends where it began. The accusation loses not because it is wrong but because it was never put in a form that could be tested.
It can be put in that form. Directional asymmetry (which way the close calls went, how often the discretionary choice favoured one side) is countable, and once counted it is governed by ordinary probability. If the choices were genuinely neutral, the departures from balance would follow known distributions, and a large enough departure would have a calculable, small probability of having arisen by chance. That calculation converts "it feels one-sided" into a claim with a number attached: a claim that can be checked, and therefore a claim that can be wrong, and therefore a claim worth making. This essay hands over the elementary version of the instrument and, more importantly, the discipline that stops it being abused.
The Instrument, Plainly
The logic requires no advanced statistics, only a clearly stated null hypothesis and the willingness to count.
Begin with the neutral expectation. If a set of discretionary choices — which items to include, which way to resolve an ambiguous call, whom to scrutinise — were made without directional preference, you would expect them to split roughly evenly between the directions, the way a fair coin splits across many tosses. Not exactly evenly; chance produces lopsided runs. But the distribution of possible splits, under neutrality, is known. A fair process making, say, forty independent binary choices almost never produces a thirty-five-to-five split; the probability of a departure that large or larger, by chance alone, is vanishingly small and is exactly computable with an elementary binomial calculation.
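The forty-choice case can be checked by hand or in a few lines. What follows is a minimal sketch in Python, assuming a fair process (probability one half each way) and fully independent choices; the function name upper_tail is an illustrative choice, not anything the essay prescribes.

```python
from fractions import Fraction
from math import comb


def upper_tail(n: int, k: int) -> Fraction:
    """Exact P(X >= k) for X ~ Binomial(n, 1/2): a fair, independent process."""
    # Count every way to get k, k+1, ..., n calls in the same direction,
    # then divide by the 2**n equally likely sequences of n fair choices.
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)


# Forty independent binary choices, thirty-five or more falling one way.
one_direction = upper_tail(40, 35)

# A fair process is symmetric, so "at least this lopsided in either
# direction" is simply twice the one-directional tail.
either_direction = 2 * one_direction

print(f"one direction:    {float(one_direction):.2e}")
print(f"either direction: {float(either_direction):.2e}")
```

Both figures come out on the order of one in a million. Using exact fractions rather than floating point means the result can be reproduced digit for digit by anyone re-running the count.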
So you count. You define the choice precisely and in advance — what counts as a directional call, decided by a rule fixed before you look, not after. You tally how the calls actually fell. You compute the probability that a neutral process would have produced a split at least this lopsided. If that probability is tiny, you have a finding: not "it feels skewed" but "under the assumption of neutral selection, this degree of one-directional outcome would occur by chance about once in N times." The same logic, in slightly different dress, is the chi-square test for whether a distribution departs from expectation and the proportion test for whether two rates differ by more than sampling noise. The arithmetic is secondary. The move that matters is replacing the impression with a null hypothesis and a count.
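The two tests named above can be sketched with nothing beyond the Python standard library. This is an illustration under stated assumptions, not a prescribed implementation: the function names and all figures are hypothetical, the calls are assumed independent, and both tests are large-sample approximations that should give way to the exact binomial calculation when counts are small.

```python
from math import erfc, sqrt


def chi_square_even_split(a: int, b: int) -> tuple:
    """Chi-square goodness-of-fit of a two-way tally against a 50/50 expectation."""
    expected = (a + b) / 2
    stat = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # With one degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2))


def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple:
    """Two-sided z-test for whether two rates differ by more than sampling noise."""
    pooled = (hits_a + hits_b) / (n_a + n_b)  # common rate under the null
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (hits_a / n_a - hits_b / n_b) / std_err
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical tally: 100 directional calls, 70 one way and 30 the other.
chi2_stat, chi2_p = chi_square_even_split(70, 30)

# Hypothetical rates: 30 of 100 cases scrutinised in one group, 15 of 100 in another.
z_stat, z_p = two_proportion_z(30, 100, 15, 100)

print(f"chi-square = {chi2_stat:.1f}, p = {chi2_p:.2e}")
print(f"z = {z_stat:.2f}, p = {z_p:.3f}")
```

In each case the rule for what counts as a call has to be fixed before the tally is made; the code only does the arithmetic that follows.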
(An illustrative shape, with invented figures: a corpus offers sixty occasions where an ambiguous framing choice could fall either way; a neutral process is expected near thirty-thirty; the actual split is forty-two to eighteen. The exact two-sided binomial probability of a departure that extreme or greater from a fair process is about 0.0027, roughly one in 370. The numbers are hypothetical. The structure of the argument is the transferable thing.)
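The illustration's figure can be re-derived rather than taken on trust. Here is a short sketch in Python, assuming a fair process and independent occasions; the helper name two_sided_fair_p is hypothetical, and the inputs are the invented figures from the illustration.

```python
from math import comb


def two_sided_fair_p(n: int, k: int) -> float:
    """Exact two-sided p-value for k of n calls one way under a fair process.

    Sums the probability of every split at least as far from n/2 as the
    observed one, in either direction.
    """
    distance = abs(2 * k - n)  # twice the distance from n/2, kept as an integer
    extreme = sum(
        comb(n, i) for i in range(n + 1) if abs(2 * i - n) >= distance
    )
    return extreme / 2 ** n


# Sixty occasions, observed split forty-two to eighteen (invented figures).
p = two_sided_fair_p(60, 42)
print(f"p = {p:.4f}, about one in {round(1 / p)}")
```

The result is a few parts in a thousand, the same order as the figure quoted in the illustration. Passing 18 instead of 42 gives the same answer, since the two-sided test does not care which direction the asymmetry ran.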
Why It Belongs in the Open
The reason to hand this over is that its entire value is adversarial. A statistic computed by an accuser and kept private is no better than the impression it replaced. A statistic computed against a pre-registered rule, with the count published, is the opposite: it invites the accused to re-run it, challenge the category, show the dependence, or supply the neutral explanation — and if it survives that, it has earned a standing the impression never could.
That is why the method, not just the result, is published here, and why a piece in this publication that makes a significance claim must expose its rule and its tally for exactly this treatment. The point is not that counting proves the accusers right. It is that counting moves the dispute off the terrain where it is unwinnable — feeling against feeling — and onto terrain where someone can be shown to be wrong, including the person holding the calculator. An accountability tool that could not be turned on its user was never an accountability tool. This one can be, and is meant to be.
About the author
Paul Stephen
Founder, Apatheia Labs
Forensic analysis of institutional behavior.
The Greeks had several words for what English flattens into one: thinking. Practical judgment, unconcealment, deliberate reason, the test that refutes — each names a distinct epistemic operation. Collapsing them into a single undifferentiated 'analysis' is not a translation problem. It is the exact category error that produces unreliable institutional reasoning. Naming an instrument precisely is the first act of building it honestly.