Pre-analysis gate. A dataset cannot enter ACTIVE TEST until all five core criteria are satisfied: actor identity, case context, explicit decision, observable outcome, and repeated observations per actor.
Warning · Do not interpret actor main effects as execution-fit. Anthrocentrix proof requires actor × case interaction and ideally downstream causal improvement.
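The pre-analysis gate above can be sketched as a simple check over the five core criteria. This is a minimal illustration, not the platform's implementation: the function names are hypothetical, and mapping the five criteria onto the `*_present` qualification fields used on the evidence cards is an assumption.

```python
# Hypothetical sketch of the pre-analysis gate. The keys mirror the
# qualification fields shown on the evidence cards; pairing them with the
# five core criteria is an assumption made for illustration.
CORE_CRITERIA = (
    ("actor_present", "actor identity"),
    ("case_context_present", "case context"),
    ("decision_present", "explicit decision"),
    ("downstream_outcome_present", "observable outcome"),
    ("repeated_actors_present", "repeated observations per actor"),
)


def missing_criteria(fields: dict) -> list:
    """Return the names of core criteria that are absent or False."""
    return [label for key, label in CORE_CRITERIA if not fields.get(key, False)]


def can_enter_active_test(fields: dict) -> bool:
    """A dataset may enter ACTIVE TEST only if no core criterion is missing."""
    return not missing_criteria(fields)


# Example: a candidate with actor, case context and decision, but no
# observable outcome and no repeated observations per actor.
candidate = {
    "actor_present": True,
    "case_context_present": True,
    "decision_present": True,
    "downstream_outcome_present": False,
    "repeated_actors_present": False,
}
print(can_enter_active_test(candidate), missing_criteria(candidate))
```

Note that `assignment_random_or_quasi_random` is deliberately left out of the check: it appears on the cards as a qualification field, but it is not one of the five core criteria named by the gate.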
Evidence axes — project-wide
Every dataset result is reported on four independent axes. A result on one axis is never treated as a result on another — actor main effects are not execution-fit, and modeled policy lift is not a causal proof.
Actor Main Effect
Does actor identity add information beyond case context? A positive result here is necessary but NOT sufficient for execution-fit.
Actor main effects are proven across crossed actor×case datasets (EOIR, STAR, LaborSupply, Grunfeld, Arizona Open Policing).
STRONGLY SUPPORTED
Actor × Case Interaction (Execution-Fit)
Does the actor's contribution depend on the case? This is the execution-fit claim. A main effect alone does not establish it.
Execution-fit is proven to exist. MLB Statcast Umpires (1.47M called pitches, 124 umpires) shows an interaction share of 67.8% (95% CI 58.8%–75.9%), classified INTERACTION_DOMINATED. EOIR shows a weak interaction; Arizona is main-effect dominated. Allocation strategy must be measured per domain, not assumed.
SUPPORTED
Policy Gain (actor-aware vs actor-blind)
Does an actor-aware routing/conditioning policy beat the actor-blind baseline in offline/modeled evaluation? Modeled lift only — not yet a causal claim.
Actor-aware allocation produces measurable downstream gains (e.g. Arizona Open Policing: +1.3 pts recall over the actor-blind baseline). Observed across multiple domains; magnitude is domain-dependent.
SUPPORTED
Causal Confidence
Is the policy gain backed by random or quasi-random assignment so it can be read causally? Without this, lift can be confounded by selection.
Causal proof is domain-dependent. MLB variance decomposition supports a high-confidence interaction read in that domain; EOIR and Arizona remain associational pending quasi-random assignment.
WEAKLY SUPPORTED
Observed domain types
Allocation strategy must be measured, not assumed. The platform classifies each domain empirically by decomposition.
Main-Effect Dominated
e.g. Arizona Open Policing
Actor quality drives most of the observed actor value. Allocate on actor quality; execution-fit layer stays OFF.
Interaction-Dominated
e.g. MLB Statcast Umpires
Actor × case interaction drives most of the observed actor value. Activate execution-fit allocation.
Benchmark leaderboard
Dataset | Verdict | Interaction share | Note
MLB Statcast Umpires | INTERACTION DOMINATED | 67.8% · CI 58.8%–75.9% | 1.47M called pitches · 124 umpires · variance decomposition
Arizona Open Policing | MAIN EFFECT DOMINATED | 9% | +1.3 pts recall; 91% actor main effect / 9% execution-fit
EOIR Judges | WEAK INTERACTION | ≈30% | Interaction statistically significant (p<0.001) but below practical-meaningfulness threshold
Dataset evidence cards
MLB Statcast Umpires (2021–2024)
Sports · human decision making · ball/strike calls
INTERACTION DOMINATED · Actor × case interaction confirmed
Execution-fit dominates actor main effects in a multidimensional decision environment. Variance decomposition of calibration residuals over 1.47M called pitches across 124 umpires yields an interaction share of 67.8% (95% CI 58.8%–75.9%) — the first INTERACTION_DOMINATED benchmark in the Anthrocentrix evidence base.
Actor × case interaction is large and replicated; downstream causal benefit not yet demonstrated.
Evidence axes
Actor Main Effect
Umpire identity contributes a stable main-effect component, but it is the minority share of total actor value in this domain.
SUPPORTED
Actor × Case Interaction (Execution-Fit)
Interaction share 67.8% (95% CI 58.8%–75.9%). Umpire × pitch-context interaction materially exceeds main effects and clears the deployment threshold.
STRONGLY SUPPORTED
Policy Gain (actor-aware vs actor-blind)
Decomposition implies large headroom for execution-fit-aware allocation in pitch-call contexts.
SUPPORTED
Causal Confidence
Causal confidence is HIGH for the decomposition claim: 1.47M called pitches and 272,978 contested pitches across 124 actors yield tight CIs, and the variance decomposition is identified within the calibration-residual frame.
Called pitches: 1,469,350
Umpires: 124
Contested pitches: 272,978
Method: Variance decomposition of calibration residuals
Interaction share: 67.8%
Interaction 95% CI: 58.8%–75.9%
Verdict: INTERACTION DOMINATED
Causal confidence: HIGH
Evidence status: YES
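To make the idea of an interaction share concrete, here is a minimal sketch of a two-way decomposition on a toy grid of mean residuals (actors in rows, case contexts in columns). It is not the pipeline behind the 67.8% figure. The function name, the balanced fully crossed layout, and the definition of the share as interaction sum of squares over actor-main plus interaction sum of squares are all assumptions made for illustration, and the toy numbers are not MLB data.

```python
def interaction_share(cell_means):
    """Share of actor-related variation attributed to actor x case interaction.

    cell_means[i][j] is the mean residual for actor i in case context j.
    Assumes a balanced, fully crossed layout (every actor seen in every context).
    """
    n_actors = len(cell_means)
    n_cases = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (n_actors * n_cases)
    actor_mean = [sum(row) / n_cases for row in cell_means]
    case_mean = [
        sum(row[j] for row in cell_means) / n_actors for j in range(n_cases)
    ]

    main_ss = 0.0
    inter_ss = 0.0
    for i in range(n_actors):
        for j in range(n_cases):
            # Actor main effect: how far the actor's average sits from the grand mean.
            main = actor_mean[i] - grand
            # Interaction: what is left after removing actor and case main effects.
            inter = cell_means[i][j] - actor_mean[i] - case_mean[j] + grand
            main_ss += main ** 2
            inter_ss += inter ** 2

    total = main_ss + inter_ss
    return inter_ss / total if total > 0 else 0.0


# Toy grids (illustrative numbers only).
additive = [[3.0, 1.0], [-1.0, -3.0]]   # actors differ, but equally in every context
crossed = [[1.0, -1.0], [-1.0, 1.0]]    # actors only differ by context
mixed = [[4.0, 0.0], [0.0, 0.0]]        # both components present

print(interaction_share(additive), interaction_share(crossed), interaction_share(mixed))
```

In the additive grid the share is 0 (a pure actor main effect), in the crossed grid it is 1 (pure interaction), and the mixed grid splits evenly. A real analysis would also need to separate noise from interaction and handle unbalanced cells, which this sketch ignores.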
Qualification fields
✓actor_present
✓case_context_present
✓decision_present
✓downstream_outcome_present
✓repeated_actors_present
✓assignment_random_or_quasi_random
proof_eligibility_score: 92/100
commercial_relevance_score: 7/10
current_status: INTERACTION_CONFIRMED
First INTERACTION_DOMINATED benchmark. Demonstrates that some domains are genuinely execution-fit-led; allocation strategy must be measured per domain, not assumed.
Arizona Open Policing
Traffic stops · officer × incident
MAIN EFFECT DOMINATED · Actor × case interaction detected
Arizona is the first dataset where an actor-aware allocation policy materially improved a downstream outcome. The bulk of the gain (~91%) is actor main effect; execution-fit contributes the remaining ~9%. Result is associational, not causal.
Statistically significant actor × case interaction exists, but magnitude is modest and/or downstream causal effect is unproven.
Evidence axes
Actor Main Effect
Officer identity carries strong, stable performance signal independent of incident type — drives ~91% of observed policy gain.
STRONGLY SUPPORTED
Actor × Case Interaction (Execution-Fit)
Officer × incident execution-fit is real but small; contributes only ~9% of total actor value.
WEAKLY SUPPORTED
Policy Gain (actor-aware vs actor-blind)
Actor-aware allocation produced a measurable downstream improvement (+1.3 points recall) over the actor-blind baseline.
WEAKLY SUPPORTED
Causal Confidence
Assignment is not random or quasi-random; gain is associational and cannot yet be read as a causal interventional claim.
NOT YET ESTABLISHED
Actor Main Effect: 91% of policy gain
Execution-Fit: 9% of policy gain
Policy Gain: +1.3 points recall vs actor-blind baseline
Fit Layer: OFF
Execution-fit contributes only 9% of total actor value and does not clear the 25% deployment threshold.
Decomposition gate
BLOCKED
Execution-fit is real (statistically significant) but contributes <25% of total actor gain.
Volume gate
PASS
Repeated officers across many stops; volume gate cleared.
Causal confidence gate
ASSOCIATIONAL
Officer assignment is not random/quasi-random; observed gain cannot yet be read causally.
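The decomposition gate can be sketched as two threshold checks. The 25% deployment threshold and the 0.01 AUC practical-meaningfulness threshold are the values quoted on the Arizona and EOIR cards; how the platform combines them, the function names, and the `PASS` label for this gate are assumptions.

```python
# Thresholds quoted on the evidence cards.
DEPLOYMENT_SHARE_THRESHOLD = 0.25   # minimum execution-fit share (Arizona card)
PRACTICAL_AUC_THRESHOLD = 0.01      # minimum interaction AUC lift (EOIR card)


def decomposition_gate(execution_fit_share, interaction_auc_lift=None):
    """Return "PASS" or "BLOCKED" for the decomposition gate.

    Assumed rule: the share must reach the deployment threshold and, when an
    AUC lift is reported, the lift must reach the practical threshold too.
    """
    if execution_fit_share < DEPLOYMENT_SHARE_THRESHOLD:
        return "BLOCKED"
    if interaction_auc_lift is not None and interaction_auc_lift < PRACTICAL_AUC_THRESHOLD:
        return "BLOCKED"
    return "PASS"


def fit_layer(gate_result):
    """The execution-fit layer is switched on only when the gate passes."""
    return "ON" if gate_result == "PASS" else "OFF"


# Values taken from the three evidence cards.
arizona = decomposition_gate(0.09)
eoir = decomposition_gate(0.30, interaction_auc_lift=0.0036)
mlb = decomposition_gate(0.678)
print(arizona, eoir, mlb)
```

Under this reading Arizona is blocked on share alone, EOIR clears the share threshold but is blocked by the practical-meaningfulness check, and MLB passes.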
Policy gain: +1.3 pts recall
Actor main effect share: 91%
Execution-fit share: 9%
Verdict: MAIN EFFECT DOMINATED
Causal confidence: ASSOCIATIONAL
Fit layer: OFF
Qualification fields
✓actor_present
✓case_context_present
✓decision_present
✓downstream_outcome_present
✓repeated_actors_present
✗assignment_random_or_quasi_random
proof_eligibility_score: 78/100
commercial_relevance_score: 7/10
current_status: INTERACTION_DETECTED
Headline result: actor-aware allocation works. Mechanism is dominated by actor quality, not execution-fit. Anthrocentrix must report this as an Actor Quality Allocation win, not an execution-fit win.
U.S. Immigration Judges (EOIR)
Asylum adjudication
MAIN EFFECT DOMINATED · Actor × case interaction detected
EOIR shows statistically significant judge × case execution-fit in decision behavior, but the effect is modest and downstream causal improvement is not proven.
Statistically significant actor × case interaction exists, but magnitude is modest and/or downstream causal effect is unproven.
Evidence axes
Actor Main Effect
Actor identity adds information beyond context (actor lift +0.0082 AUC over context-only).
STRONGLY SUPPORTED
Actor × Case Interaction (Execution-Fit)
Judge × case interaction is statistically significant (AUC +0.0036, 95% CI [0.0026, 0.0046], p<0.001) but below the 0.01 practical-meaningfulness threshold.
WEAKLY SUPPORTED
Policy Gain (actor-aware vs actor-blind)
Actor-aware policy beats the actor-blind baseline in modeled offline evaluation, but the lift is small.
WEAKLY SUPPORTED
Causal Confidence
Judge assignment is not quasi-random in EOIR; Phase 3 downstream reversal/remand test did not establish a causal interventional claim.
NOT YET ESTABLISHED
Actor Main Effect: 70% of policy gain
Execution-Fit: 30% of policy gain
Policy Gain: small modeled lift; not validated downstream (vs actor-blind baseline)
Fit Layer: OFF
Execution-fit is statistically significant but below the practical-meaningfulness threshold; does not clear deployment gate.
Decomposition gate
BLOCKED
Interaction is statistically significant but practically below threshold.
Volume gate
PASS
300,000 modeled decisions across many judges.
Causal confidence gate
ASSOCIATIONAL
Judge assignment is not quasi-random; no clean interventional contrast.
Merits decisions scanned: 6,485,038
Modeled sample: 300,000
Context-only AUC: 0.8947
Actor + context AUC: 0.9029
Actor × case AUC: 0.9065
Actor main lift: +0.0082
Interaction lift: +0.0036
Interaction 95% CI: [0.0026, 0.0046]
p-value: < 0.001
Interaction as % of main effect: 43.5%
Verdict: MAIN EFFECT DOMINATED
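The lift figures above follow from the three nested AUCs by subtraction. The short calculation below reproduces them from the rounded values on this card. Reading the roughly 70% / 30% split as each lift's share of the total lift is an assumption, and the variable names are illustrative.

```python
# AUCs as reported on the EOIR card (rounded to four decimals).
context_only_auc = 0.8947
actor_context_auc = 0.9029
actor_case_auc = 0.9065

# Each lift is the gain from adding one more layer to the model.
actor_main_lift = round(actor_context_auc - context_only_auc, 4)
interaction_lift = round(actor_case_auc - actor_context_auc, 4)

# Assumed reading of the split: each lift as a share of the total lift.
total_lift = actor_main_lift + interaction_lift
main_share = actor_main_lift / total_lift
interaction_share = interaction_lift / total_lift

# Practical-meaningfulness check against the 0.01 AUC threshold.
PRACTICAL_AUC_THRESHOLD = 0.01
clears_practical_threshold = interaction_lift >= PRACTICAL_AUC_THRESHOLD

print(f"actor main lift:   {actor_main_lift:+.4f}")
print(f"interaction lift:  {interaction_lift:+.4f}")
print(f"shares (main / interaction): {main_share:.1%} / {interaction_share:.1%}")
print(f"clears practical threshold: {clears_practical_threshold}")
```

With the rounded inputs the shares come out close to the reported 70% / 30% split, and the interaction lift sits well under the practical threshold, which is why the fit layer stays OFF despite the significant p-value.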
Qualification fields
✓actor_present
✓case_context_present
✓decision_present
✓downstream_outcome_present
✓repeated_actors_present
✗assignment_random_or_quasi_random
proof_eligibility_score: 72/100
commercial_relevance_score: 6/10
current_status: INTERACTION_DETECTED
Phase 3 downstream reversal/remand test did NOT prove the interventional claim — judge assignment was not quasi-random and the downstream actor × case interaction was practically negligible. Phase 4 decision-level execution-fit test detected a statistically significant but practically modest interaction (below the 0.01 practical-meaningfulness threshold).
Claim ladder
Actor effects exist
Actor main effects proven across crossed actor×case datasets — EOIR, STAR, LaborSupply, Grunfeld, and Arizona Open Policing all show actor identity adds information beyond context.
SUPPORTED
Actor × case fit exists
MLB Statcast Umpires (1.47M called pitches, 124 umpires): interaction share 67.8% (95% CI 58.8%–75.9%). Execution-fit is proven to exist in at least one production-scale domain.
Arizona Open Policing: actor-aware allocation produced +1.3 points recall over the actor-blind baseline. Observed; magnitude is domain-dependent.
SUPPORTED
Execution-fit can be the dominant mechanism in some domains
MLB Statcast: INTERACTION_DOMINATED (67.8% execution-fit share). Arizona: MAIN_EFFECT_DOMINATED (9%). EOIR: WEAK_INTERACTION. Allocation strategy must be measured per domain, not assumed.
Causal proof is domain-dependent. MLB variance decomposition is high-confidence within its frame; EOIR and Arizona remain associational pending quasi-random assignment.
Customer Service / Contact Center · Inbound/outbound calls · 91,706 transcripts · 10,448 hours
Proof Candidate: 52 / 100
✗Actor
✗Case context
✗Decision
✗Outcome
✗Repeated actors
Sample size: 91,706
Domain: Inbound/outbound calls
Commercial relevance: 9/10
Proposed state: DISCOVERED
Gate blocked: cannot enter ACTIVE TEST. Missing: actor identity, case context, explicit decision, observable outcome, repeated observations per actor.
Customer Service / Contact Center · Conversational speech · 260 hours, ~2,400 conversations
Proof Candidate: 47 / 100
✓Actor
✓Case context
✓Decision
✗Outcome
✗Repeated actors
Sample size: 2,400
Domain: Conversational speech
Commercial relevance: 4/10
Proposed state: DISCOVERED
Gate blocked: cannot enter ACTIVE TEST. Missing: observable outcome, repeated observations per actor.
State: DISCOVERED
Lifecycle states: DISCOVERED → SCREENING → (REJECTED · PARKED) → ELIGIBLE → ACTIVE TEST → (PROVEN · FAILED). ACTIVE TEST is gated server-side by the five core criteria; all other transitions are operator-driven and persisted locally. See also: discovery registry.
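The lifecycle above can be sketched as a small state machine with a single gated transition. This is an illustration under stated assumptions: the names are hypothetical, REJECTED and PARKED are read as alternative exits from SCREENING, and all states without listed exits (REJECTED, PARKED, PROVEN, FAILED) are treated as terminal, which the lifecycle line does not spell out.

```python
from enum import Enum


class State(Enum):
    DISCOVERED = "DISCOVERED"
    SCREENING = "SCREENING"
    REJECTED = "REJECTED"
    PARKED = "PARKED"
    ELIGIBLE = "ELIGIBLE"
    ACTIVE_TEST = "ACTIVE TEST"
    PROVEN = "PROVEN"
    FAILED = "FAILED"


# Assumed reading of the lifecycle arrows; states absent from this map
# are treated as terminal in this sketch.
ALLOWED = {
    State.DISCOVERED: {State.SCREENING},
    State.SCREENING: {State.REJECTED, State.PARKED, State.ELIGIBLE},
    State.ELIGIBLE: {State.ACTIVE_TEST},
    State.ACTIVE_TEST: {State.PROVEN, State.FAILED},
}


def transition(current, target, core_criteria_met):
    """Move to `target` if the lifecycle allows it and the gate permits it."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"{current.value} -> {target.value} is not a lifecycle step")
    # Only entry into ACTIVE TEST is gated by the five core criteria.
    if target is State.ACTIVE_TEST and not core_criteria_met:
        raise PermissionError("Gate blocked: cannot enter ACTIVE TEST")
    return target


# Example: an operator can move a discovered dataset into screening even
# when the core criteria are not yet met.
state = transition(State.DISCOVERED, State.SCREENING, core_criteria_met=False)
print(state.value)
```

Keeping the gate inside the transition function mirrors the split described above: operators drive every other step freely, while entry into ACTIVE TEST fails regardless of who requests it unless the criteria are met.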