Commercial Prototype

Annotation QA — Decision-Quality Routing

Route AI-annotation work to QA review by behavioral decision-quality risk. Catch more bad labels in fewer reviewer hours.

Buyer · Head of Data Quality
prototype v0.1

Random or uniform QA sampling gives every annotation the same chance of review, so it misses the systematic patterns in human labeling mistakes. Anthrocentrix scores each annotation decision from its behavioral telemetry (time on task, revisions, reopens, skip-and-return, self-confidence, fatigue, and disagreement history) and routes only the high-risk decisions to human review.

Decisions Modeled: 3,000
Labelers: 40
Base Error Rate: 42.2%
PR-AUC (errors), Anthrocentrix model: 0.590
Errors Caught @ 20% Review: 391 (random: 260)
Recall @ 20% Review: 30.9% (random: 20.5%)
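The headline cards can be reconciled with simple arithmetic. A minimal Python sketch, using only the card values (3,000 decisions, a 42.2% base error rate, and 391 vs. 260 errors caught at a 20% review budget); the helper name `budget_metrics` is illustrative, not part of any Anthrocentrix API:

```python
# Reconcile the headline cards using only their published values.
N_DECISIONS = 3000
BASE_ERROR_RATE = 0.422
REVIEW_FRACTION = 0.20


def budget_metrics(errors_caught: int, total_errors: int, n_reviews: int) -> tuple:
    """Return (recall, errors found per 100 reviews) for one review budget."""
    recall = errors_caught / total_errors
    errors_per_100 = 100 * errors_caught / n_reviews
    return recall, errors_per_100


total_errors = round(N_DECISIONS * BASE_ERROR_RATE)  # true errors in the dataset
n_reviews = round(N_DECISIONS * REVIEW_FRACTION)     # decisions sent to review

routed_recall, routed_yield = budget_metrics(391, total_errors, n_reviews)
random_recall, random_yield = budget_metrics(260, total_errors, n_reviews)

print(f"routed: recall {routed_recall:.1%}, {routed_yield:.1f} errors/100 reviews")
print(f"random: recall {random_recall:.1%}, {random_yield:.1f} errors/100 reviews")
```

Both results match the 20% row of the frontier table. For context, a random ranking's expected PR-AUC is roughly the base rate (about 0.422 here), so the model's 0.590 is best read against that reference rather than against 0.5.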

Workflow

labeler → telemetry → risk → routing
  1. Labeler decides: submits the annotation.
  2. Telemetry captured: time, revisions, fatigue.
  3. Risk scored: Anthrocentrix model.
  4. High-risk routed → human QA.
  5. Low-risk bypassed → auto-accept.
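The five workflow steps can be sketched end to end as below. This is a toy under loud assumptions: the logistic form, the coefficients in `WEIGHTS`, the `BIAS`, and the fixed 0.5 threshold are all hypothetical, because the Anthrocentrix model itself is not specified here. Only the field names come from the upload schema.

```python
import math
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Step 2: behavioral signals for one labeling decision (schema field names)."""
    time_on_task_s: float
    revision_count: int
    reopened: bool
    skipped_then_returned: bool
    self_confidence: float       # 0..1, self-reported
    session_fatigue: float       # 0..1, fraction of session elapsed
    disagreement_history: float  # 0..1, labeler's historical disagreement rate


# HYPOTHETICAL coefficients, chosen only so the sketch runs. They are not the
# Anthrocentrix model. time_on_task_s is carried through but unweighted here.
BIAS = -2.0
WEIGHTS = {
    "revision_count": 0.30,
    "reopened": 0.80,
    "skipped_then_returned": 0.50,
    "self_confidence": -1.20,
    "session_fatigue": 1.50,
    "disagreement_history": 2.00,
}


def risk_score(t: Telemetry) -> float:
    """Step 3: toy logistic score in (0, 1); higher means more likely an error."""
    z = BIAS + sum(w * float(getattr(t, name)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def route(t: Telemetry, threshold: float = 0.5) -> str:
    """Steps 4 and 5: high-risk decisions go to human QA, the rest are auto-accepted."""
    return "review" if risk_score(t) >= threshold else "auto-accept"


# Two hypothetical decisions.
late_reworked = Telemetry(44.2, 5, True, False, 0.0, 0.9, 0.3)
quick_confident = Telemetry(12.0, 0, False, False, 0.9, 0.1, 0.05)
print(route(late_reworked), round(risk_score(late_reworked), 3))
print(route(quick_confident), round(risk_score(quick_confident), 3))
```

A fixed threshold keeps the sketch short; the routing preview instead sends a fixed share (the top 20%) of decisions to review, which bounds reviewer workload regardless of how scores are distributed.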

Review-Efficiency Frontier

Anthrocentrix routing vs. random QA sampling
[Chart: recall (0 to 100%) vs. review fraction (5% to 100%), Anthrocentrix routing and random sampling]
Review %  Recall (A)  Recall (R)  Lift      Errors/100 reviews (A)  Errors/100 reviews (R)
5%        8.3%        4.7%        +3.6 pp   70.0                    40.0
10%       16.0%       9.7%        +6.3 pp   67.7                    41.0
15%       23.3%       15.6%       +7.7 pp   65.6                    44.0
20%       30.9%       20.5%       +10.3 pp  65.2                    43.3
30%       43.1%       29.8%       +13.3 pp  60.7                    41.9
40%       53.6%       39.8%       +13.8 pp  56.6                    42.0
50%       63.6%       49.9%       +13.7 pp  53.7                    42.1
75%       85.8%       75.5%       +10.3 pp  48.3                    42.5
100%      100.0%      100.0%      +0.0 pp   42.2                    42.2
(A) = Anthrocentrix routing, (R) = random sampling; lift is in percentage points.
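Each row of the frontier table comes from ranking decisions by risk and reviewing only the top slice. A minimal sketch of that computation, assuming per-decision risk scores and a gold `is_error` flag (as in the optional schema column); the function name and the toy data are hypothetical:

```python
from typing import List, Sequence, Tuple


def frontier(
    scores: Sequence[float],
    is_error: Sequence[int],
    fractions: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """For each review fraction, review only the highest-risk slice and return
    (fraction, recall, errors found per 100 reviews)."""
    if len(scores) != len(is_error):
        raise ValueError("scores and is_error must have the same length")
    n = len(scores)
    total_errors = sum(is_error)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)  # riskiest first
    rows = []
    for f in fractions:
        k = max(1, round(f * n))  # decisions sent to review at this budget
        caught = sum(is_error[i] for i in ranked[:k])
        recall = caught / total_errors if total_errors else 0.0
        rows.append((f, recall, 100.0 * caught / k))
    return rows


# Hypothetical 10-decision example with 4 true errors.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_error = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
for f, recall, per_100 in frontier(scores, is_error, [0.2, 0.5, 1.0]):
    # Random sampling's expected recall is f itself, so lift is recall - f.
    print(f"{f:.0%}: recall {recall:.0%}, lift {100 * (recall - f):+.1f} pp, "
          f"{per_100:.1f} errors/100 reviews")
```

Because random sampling catches errors in proportion to the share reviewed, its expected recall equals the review fraction and its expected yield equals the base rate. The table's random column is an observed sample, so it sits close to, but not exactly at, those values (for example 20.5% recall at 20% review).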

ROI Calculator

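Result · monthly
Target recall 75%: Anthrocentrix selected a 75% review fraction to reach it.
Baseline cost: $182,292
Routed cost: $546,875
Hours saved: -10,417
Reviews avoided: -250,000
Errors caught Δ: +129,500
Error exposure reduction: $518,000
Net monthly savings: $153,417
At this target the routed plan performs 250,000 more reviews (10,417 more reviewer hours) than the baseline, which is why hours saved and reviews avoided are negative. The net savings come from the additional errors caught: $518,000 of avoided error exposure minus $364,583 of added review cost ($546,875 - $182,292) leaves $153,417.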
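The calculator's headline figures follow from two small rules, sketched below under two assumptions: that it picks the smallest tabulated review fraction meeting the target recall (consistent with 75% being chosen, since 50% reaches only 63.6% recall), and that net savings equal avoided error exposure minus added review cost. The function names are hypothetical.

```python
from typing import Sequence, Tuple

# Anthrocentrix recall by review fraction, copied from the frontier table.
FRONTIER = [
    (0.05, 0.083), (0.10, 0.160), (0.15, 0.233), (0.20, 0.309), (0.30, 0.431),
    (0.40, 0.536), (0.50, 0.636), (0.75, 0.858), (1.00, 1.000),
]


def choose_review_fraction(rows: Sequence[Tuple[float, float]], target_recall: float) -> float:
    """Smallest tabulated review fraction whose recall meets the target.
    ASSUMPTION: the calculator picks from this grid rather than interpolating."""
    for fraction, recall in sorted(rows):
        if recall >= target_recall:
            return fraction
    return 1.0  # reviewing everything always reaches full recall


def net_monthly_savings(baseline_cost: float, routed_cost: float, exposure_reduction: float) -> float:
    """Value of the extra errors caught, minus the extra review spend."""
    return exposure_reduction - (routed_cost - baseline_cost)


print(choose_review_fraction(FRONTIER, 0.75))
print(net_monthly_savings(182_292, 546_875, 518_000))
print(518_000 / 129_500)  # implied exposure per additional error caught
```

Dividing $518,000 by 129,500 shows that the configuration implies $4 of error exposure per additional error caught.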

Routing Preview

top 20% by risk score → human QA (highest-scoring decisions shown)
Task     Labeler  Time (s)  Revisions  Reopened  Self-conf.  Fatigue  Risk   Route
t_31_54  lbl_032  44.2      5          yes       0.00        0.70     0.781  review
t_1_64   lbl_002  44.5      5          yes       0.33        0.86     0.776  review
t_11_63  lbl_012  37.9      5          yes       0.11        0.83     0.776  review
t_2_72   lbl_003  46.1      5          yes       0.03        0.91     0.763  review
t_9_66   lbl_010  46        4          yes       0.00        0.91     0.762  review
t_29_74  lbl_030  43.4      4          yes       0.00        1.00     0.762  review
t_26_72  lbl_027  46        4          yes       0.00        0.94     0.758  review
t_12_67  lbl_013  44        4          yes       0.00        0.85     0.755  review
t_30_70  lbl_031  47.5      5          yes       0.39        0.98     0.754  review
t_9_70   lbl_010  40        4          yes       0.00        0.92     0.752  review
t_36_71  lbl_037  47        5          yes       0.37        0.91     0.750  review
t_25_73  lbl_026  39.4      4          yes       0.04        0.94     0.747  review
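Routing by rank rather than by a fixed score threshold keeps the review queue at a predictable size. A minimal sketch, assuming risk scores are already computed per `task_id`; the function name, batch, and scores are hypothetical:

```python
from typing import Dict


def route_top_fraction(scores: Dict[str, float], fraction: float = 0.20) -> Dict[str, str]:
    """Send the highest-risk `fraction` of decisions to human QA and auto-accept
    the rest. Ties are broken by task_id so the routing is deterministic."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ranked = sorted(scores, key=lambda task_id: (-scores[task_id], task_id))
    k = round(fraction * len(ranked))  # size of the review queue
    to_review = set(ranked[:k])
    return {t: ("review" if t in to_review else "auto-accept") for t in scores}


# Hypothetical batch of 10 scored decisions; two fill the 20% review queue.
batch = {
    "t_a": 0.78, "t_b": 0.12, "t_c": 0.55, "t_d": 0.31, "t_e": 0.76,
    "t_f": 0.05, "t_g": 0.40, "t_h": 0.22, "t_i": 0.60, "t_j": 0.10,
}
routes = route_top_fraction(batch)
print(sorted(t for t, r in routes.items() if r == "review"))
```

Breaking ties by `task_id` is a choice made for reproducibility; any stable tie-break would work.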

Uploadable Dataset Schema

CSV or JSON
Annotation QA Dataset Schema
============================
Columns (CSV or JSON); all are required unless marked optional:

  task_id               string   unique per labeling decision
  labeler_id            string   stable labeler identifier
  task_index            int      position in labeler's session (0-based)
  time_on_task_s        number   seconds spent on the task
  revision_count        int      number of edits before submission
  reopened              0|1      labeler reopened the task after submitting
  skipped_then_returned 0|1      task was skipped and later completed
  self_confidence       0..1     optional self-reported confidence
  session_fatigue       0..1     fraction of session elapsed
  disagreement_history  0..1     labeler's historical disagreement rate
  label                 string   submitted label (any taxonomy)

Optional (for evaluation only):
  is_error              0|1      ground-truth flag from gold review
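Before upload, a CSV file can be checked against this schema with the standard library alone. A minimal sketch; the range checks follow the types listed above, the function name `validate_csv` is hypothetical, and treating a blank `self_confidence` or an absent `is_error` as acceptable is an assumption about how optional fields are handled:

```python
import csv
import io

# Column groups follow the schema; self_confidence and is_error are optional
# (ASSUMPTION: an absent column or blank value means "not provided").
REQUIRED = [
    "task_id", "labeler_id", "task_index", "time_on_task_s", "revision_count",
    "reopened", "skipped_then_returned", "session_fatigue",
    "disagreement_history", "label",
]
OPTIONAL = {"self_confidence", "is_error"}
NON_NEGATIVE_INTS = ["task_index", "revision_count"]
FLAGS = ["reopened", "skipped_then_returned", "is_error"]
UNIT_INTERVAL = ["self_confidence", "session_fatigue", "disagreement_history"]


def _in_range(value, lo=None, hi=None, integer=False):
    """True if value parses as a number (an int if requested) within [lo, hi]."""
    try:
        x = int(value) if integer else float(value)
    except (TypeError, ValueError):
        return False
    return (lo is None or x >= lo) and (hi is None or x <= hi)


def validate_csv(text):
    """Return human-readable problems in a CSV upload; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED if c not in header]
    if problems:
        return problems
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["task_id"] in seen:
            problems.append(f"line {line_no}: duplicate task_id {row['task_id']!r}")
        seen.add(row["task_id"])
        bad_cols = [c for c in NON_NEGATIVE_INTS if not _in_range(row[c], lo=0, integer=True)]
        if not _in_range(row["time_on_task_s"], lo=0):
            bad_cols.append("time_on_task_s")
        for col in FLAGS:
            value = row.get(col)
            if not (col in OPTIONAL and not value) and value not in ("0", "1"):
                bad_cols.append(col)
        for col in UNIT_INTERVAL:
            value = row.get(col)
            if not (col in OPTIONAL and not value) and not _in_range(value, lo=0, hi=1):
                bad_cols.append(col)
        problems.extend(f"line {line_no}: invalid {c}={row.get(c)!r}" for c in bad_cols)
    return problems


SAMPLE = (
    "task_id,labeler_id,task_index,time_on_task_s,revision_count,reopened,"
    "skipped_then_returned,self_confidence,session_fatigue,disagreement_history,label\n"
    "t_0_0,lbl_001,0,12.5,1,0,0,0.9,0.01,0.10,cat\n"
    "t_0_1,lbl_001,1,30,4,1,0,,0.02,0.10,dog\n"
    "t_0_1,lbl_001,2,-3,1,2,0,0.5,1.5,0.10,dog\n"
)
problems = validate_csv(SAMPLE)
for p in problems:
    print(p)
```

In the sample, the first two rows pass (the second leaves `self_confidence` blank), while the third reuses a `task_id` and has an out-of-range time, flag, and fatigue value.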