Entity Resolution
Canonical Identity Mapping
Creates a unified identity registry across your document corpus. Resolves aliases, tracks role changes, and maps relationships between individuals.
The Problem
The same person appears five different ways across 500 pages. "Dr. J. Smith," "Jane Smith," "Ms Smith," "the psychologist," and "JS" are all one actor — but fragmented identity means fragmented analysis. Manual tracking breaks down at scale, and missed connections mean missed accountability.
How It Works
1. Extract named entities with NLP
2. Cluster by phonetic similarity and context
3. Resolve clusters using document proximity and co-reference chains
4. Build canonical identity with all known aliases
5. Map relationships from co-occurrence and explicit mentions
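As a rough illustration of steps 2 and 5, the sketch below groups mentions by a Soundex key on the surname and counts how often resolved names share a document. It is a simplified sketch under stated assumptions, not the engine's implementation: the function names (`soundex`, `candidate_clusters`, `co_occurrence`), the "last token is the surname" rule, and the sample inputs are all hypothetical, and phonetic keys alone only produce candidates that context and co-reference would still need to confirm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Standard American Soundex digit groups.
_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex key for a name ("" if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not break a run of identical codes
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so a code repeated after a vowel is kept
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def candidate_clusters(mentions):
    """Group mentions by the Soundex key of their last token (assumed surname)."""
    clusters = defaultdict(set)
    for mention in mentions:
        tokens = mention.replace(".", " ").split()
        if tokens:
            clusters[soundex(tokens[-1])].add(mention)
    return dict(clusters)


def co_occurrence(doc_entities):
    """Count, per pair of canonical names, the documents in which both appear."""
    edges = Counter()
    for names in doc_entities.values():
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)] += 1
    return edges
```

Grouping on the surname key puts "Jane Smith" and a misspelt "Jane Smyth" in the same candidate cluster, but it would also merge two unrelated Smiths, which is why the later resolution steps matter.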
Inputs
- Document corpus
- Named entity extraction
- Co-reference resolution
Outputs
- Identity registry
- Relationship graph
- Conflict of interest flags
What You Get
Canonical Identity Card
Canonical Name: Dr. Jane Smith
Aliases: J. Smith, Jane Smith, Ms Smith, "the psychologist," JS
Role: Clinical Psychologist (2019–present)
Organisation: Regional Assessment Service
Documents: 47 appearances across 23 documents
First Seen: Doc 3, p.2 (14 March 2021)
Last Seen: Doc 156, p.8 (9 October 2024)
Relationships: Supervised by Prof. R. Williams; assessed Client A (12 sessions); authored reports E4.1, E4.3, E4.7
Flags: Role change: "Independent Expert" → "Trust Employee" at Doc 89 (no disclosure in subsequent reports)
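One way such a card could be represented in code is sketched below, along with a check that surfaces role changes like the one flagged above. The class and field names (`CanonicalIdentity`, `RoleRecord`, `doc_index`) are hypothetical and chosen for illustration only; the real registry schema is not described here. The sketch assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class RoleRecord:
    role: str        # role as stated in the document
    doc_index: int   # position of that document in the corpus ordering


@dataclass
class CanonicalIdentity:
    canonical_name: str
    aliases: set[str] = field(default_factory=set)
    roles: list[RoleRecord] = field(default_factory=list)

    def role_changes(self):
        """Return (old_role, new_role, doc_index) for each change in stated role."""
        ordered = sorted(self.roles, key=lambda r: r.doc_index)
        return [
            (prev.role, cur.role, cur.doc_index)
            for prev, cur in zip(ordered, ordered[1:])
            if prev.role != cur.role
        ]


# Illustrative data loosely based on the card above.
card = CanonicalIdentity(
    canonical_name="Dr. Jane Smith",
    aliases={"J. Smith", "Jane Smith", "Ms Smith", "JS"},
    roles=[
        RoleRecord("Independent Expert", 3),
        RoleRecord("Trust Employee", 89),
    ],
)
print(card.role_changes())
```

Keeping every stated role with its document position, rather than only the latest role, is what lets a change be pinned to the document where it first appears.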
Works With
Uses resolved entities to build per-person timelines and detect when someone’s role or involvement changed.
Relies on canonical identities to determine whether two conflicting statements came from the same or different sources.
Maps statutory duties to resolved individuals, ensuring breach findings attach to the right person regardless of how they were named.
Use Cases
Family court proceedings
A social worker is referred to by name, job title, and pronouns across 40 reports from different agencies. Entity Resolution unifies these references so their assessments can be tracked as a single professional narrative.
Corporate investigations
Traces beneficial ownership across shell companies whose directors use variations of their names. The engine surfaces hidden connections between entities that appeared unrelated.
Multi-agency reviews
Identifies witnesses and professionals across police, health, and education records, where different agencies use different naming conventions for the same individuals.
Technical Approach
- Named Entity Recognition using spaCy transformer models, tuned for UK institutional documents (titles, honorifics, role-based references)
- Phonetic clustering via Soundex and Double Metaphone algorithms to catch spelling variations and transliterations
- Co-reference resolution using neural co-reference chains to link pronouns and descriptions back to named entities
- Manual verification layer — ambiguous merges (confidence < 0.85) are flagged for human review before finalisation
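The manual verification rule in the last item could be sketched as a simple routing step. This is a minimal sketch under assumptions: the tuple shape `(mention_a, mention_b, confidence)` and the names `route_merges` and `REVIEW_THRESHOLD` are hypothetical, and only the 0.85 cut-off comes from the description above.

```python
REVIEW_THRESHOLD = 0.85  # merges scoring below this go to human review


def route_merges(proposals, threshold=REVIEW_THRESHOLD):
    """Split proposed merges into auto-accepted and needs-review lists.

    proposals: iterable of (mention_a, mention_b, confidence) tuples.
    """
    accepted, needs_review = [], []
    for mention_a, mention_b, confidence in proposals:
        if confidence < threshold:
            needs_review.append((mention_a, mention_b, confidence))
        else:
            accepted.append((mention_a, mention_b, confidence))
    return accepted, needs_review


# Illustrative scores only; real confidences would come from the resolver.
accepted, needs_review = route_merges([
    ("Jane Smith", "Ms Smith", 0.93),
    ("JS", "Jane Smith", 0.61),
])
```

Note that a score of exactly 0.85 is accepted here, since the rule flags only merges strictly below the threshold.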