AL | Apatheia Labs

Entity Resolution

Canonical Identity Mapping

Creates a unified identity registry across your document corpus. Resolves aliases, tracks role changes, and maps relationships between individuals.

Alias Detection · Role Tracking · Relationship Mapping · Temporal Context · Conflict Detection

The Problem

The same person appears five different ways across 500 pages. "Dr. J. Smith," "Jane Smith," "Ms Smith," "the psychologist," and "JS" are all one actor — but fragmented identity means fragmented analysis. Manual tracking breaks down at scale, and missed connections mean missed accountability.

How It Works

  1. Extract named entities with NLP
  2. Cluster by phonetic similarity and context
  3. Resolve clusters using document proximity and co-reference chains
  4. Build canonical identity with all known aliases
  5. Map relationships from co-occurrence and explicit mentions
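
Steps 3 to 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the engine's actual implementation: it assumes mentions have already been extracted and that co-reference links arrive as pairs of surface forms. The names `UnionFind`, `build_registry`, and `co_occurrence` are hypothetical, and picking the longest alias as the canonical name is a placeholder heuristic.

```python
from collections import defaultdict
from itertools import combinations


class UnionFind:
    """Minimal disjoint-set structure used to merge linked mentions."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps later lookups short.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_registry(mentions, links):
    """mentions: (surface_form, doc_id) pairs; links: pairs judged co-referent."""
    uf = UnionFind()
    for surface, _ in mentions:
        uf.find(surface)
    for a, b in links:
        uf.union(a, b)

    clusters = defaultdict(set)
    for surface, _ in mentions:
        clusters[uf.find(surface)].add(surface)

    # Assumption: the longest alias stands in for the canonical name.
    registry = {}
    for aliases in clusters.values():
        canonical = max(aliases, key=lambda s: (len(s), s))
        registry[canonical] = sorted(aliases)
    return registry


def co_occurrence(mentions, registry):
    """Count how many documents each pair of canonical identities shares."""
    alias_to_canonical = {a: c for c, aliases in registry.items() for a in aliases}
    docs = defaultdict(set)
    for surface, doc_id in mentions:
        docs[doc_id].add(alias_to_canonical[surface])
    counts = defaultdict(int)
    for entities in docs.values():
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return dict(counts)


# Made-up example data, loosely following the aliases used on this page.
mentions = [
    ("Dr. J. Smith", "doc3"), ("Jane Smith", "doc3"),
    ("JS", "doc7"), ("Prof. R. Williams", "doc7"),
    ("Ms Smith", "doc9"), ("Prof. R. Williams", "doc9"),
]
links = [("Dr. J. Smith", "Jane Smith"), ("Jane Smith", "JS"), ("Ms Smith", "Jane Smith")]

registry = build_registry(mentions, links)
graph = co_occurrence(mentions, registry)
```

Here the four Smith variants collapse into one registry entry, and the co-occurrence count between that entry and Prof. R. Williams becomes a candidate edge in the relationship graph. A real pipeline would also weigh explicit mentions ("supervised by"), which this sketch leaves out.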

Inputs

  • Document corpus
  • Named entity extraction
  • Co-reference resolution

Outputs

  • Identity registry
  • Relationship graph
  • Conflict of interest flags

What You Get

Canonical Identity Card

Canonical Name: Dr. Jane Smith
Aliases: J. Smith, Jane Smith, Ms Smith, "the psychologist," JS
Role: Clinical Psychologist (2019–present)
Organisation: Regional Assessment Service
Documents: 47 appearances across 23 documents
First Seen: Doc 3, p.2 (14 March 2021)
Last Seen: Doc 156, p.8 (9 October 2024)
Relationships: Supervised by Prof. R. Williams; assessed Client A (12 sessions); authored reports E4.1, E4.3, E4.7
Flags: Role change: "Independent Expert" → "Trust Employee" at Doc 89 (no disclosure in subsequent reports)
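
One way to picture the card as a data structure is sketched below. The class and field names (`IdentityCard`, `RoleObservation`, `detect_role_changes`) are hypothetical and chosen for illustration, and the role observations are made-up sample data. The sketch only covers detecting a role change; checking whether later reports disclosed it would need further logic.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoleObservation:
    doc: int   # document number where the role was stated
    role: str  # role as described in that document


@dataclass
class IdentityCard:
    canonical_name: str
    aliases: List[str] = field(default_factory=list)
    roles: List[RoleObservation] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)

    def detect_role_changes(self) -> List[str]:
        """Append a flag for every point where the stated role changes."""
        ordered = sorted(self.roles, key=lambda r: r.doc)
        for prev, curr in zip(ordered, ordered[1:]):
            if prev.role != curr.role:
                self.flags.append(
                    f'Role change: "{prev.role}" → "{curr.role}" at Doc {curr.doc}'
                )
        return self.flags


# Illustrative data only; the role stated at Doc 3 is an assumption.
card = IdentityCard(
    canonical_name="Dr. Jane Smith",
    aliases=["J. Smith", "Jane Smith", "Ms Smith", "the psychologist", "JS"],
    roles=[
        RoleObservation(3, "Independent Expert"),
        RoleObservation(89, "Trust Employee"),
        RoleObservation(156, "Trust Employee"),
    ],
)
flags = card.detect_role_changes()
```

Sorting observations by document number before comparing neighbours means the flag lands on the first document where the new role appears, which matches the "at Doc 89" style of the card above.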

Use Cases

Family court proceedings

A social worker is referred to by name, job title, and pronouns across 40 reports from different agencies. Entity Resolution unifies these references so their assessments can be tracked as a single professional narrative.

Corporate investigations

Trace beneficial ownership across shell companies whose directors use variations of their names. The engine surfaces hidden connections between entities that appeared unrelated.

Multi-agency reviews

Identify witnesses and professionals across police, health, and education records, where different agencies use different naming conventions for the same individuals.

Technical Approach

  • Named Entity Recognition using spaCy transformer models, tuned for UK institutional documents (titles, honorifics, role-based references)
  • Phonetic clustering via Soundex and Double Metaphone algorithms to catch spelling variations and transliterations
  • Co-reference resolution using neural co-reference chains to link pronouns and descriptions back to named entities
  • Manual verification layer — ambiguous merges (confidence < 0.85) are flagged for human review before finalisation
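
To make the phonetic clustering and review threshold concrete, here is a minimal sketch, assuming standard American Soundex and a confidence score supplied by an upstream step. Double Metaphone is left out because the Python standard library has no implementation of it. The function names `soundex` and `triage` are hypothetical, and routing merges at or above 0.85 to automatic acceptance is an assumption; the text only says that merges below 0.85 go to human review.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

REVIEW_THRESHOLD = 0.85  # threshold stated in the list above


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        # Skip repeats of the previous code (adjacent or separated by H/W).
        if digit and digit != prev:
            code += digit
        # H and W do not reset the previous code; vowels do.
        if c not in "HW":
            prev = digit
    return (code + "000")[:4]


def triage(confidence: float) -> str:
    """Route a candidate merge based on its confidence score."""
    return "human_review" if confidence < REVIEW_THRESHOLD else "auto_merge"


# Spelling variants of the same surname share a code.
same_key = soundex("Smith") == soundex("Smyth")
decision = triage(0.72)
```

A shared Soundex key only nominates a pair as a merge candidate. The confidence score, however it is computed, then decides whether the pair is merged directly or queued for a reviewer.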