Commercial Prototype
Annotation QA — Decision-Quality Routing
Route AI-annotation work to QA review by behavioral decision-quality risk. Catch more bad labels in fewer reviewer hours.
Buyer · Head of Data Quality
prototype v0.1
Random (uniform) QA sampling gives every annotation the same chance of review, so it ignores the systematic patterns in human labeling mistakes. Anthrocentrix scores each annotation decision from its telemetry (time on task, revisions, reopens, skip-and-return, self-confidence, fatigue, and disagreement history) and routes only the high-risk decisions to human review.
| Metric | Value | Note |
|---|---|---|
| Decisions modeled | 3,000 | |
| Labelers | 40 | |
| Base error rate | 42.2% | |
| PR-AUC (errors) | 0.590 | Anthrocentrix model |
| Errors @ 20% review | 391 | random: 260 |
| Recall @ 20% review | 30.9% | random: 20.5% |
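The headline metrics can be recomputed from per-decision risk scores plus the optional `is_error` gold flag. This is a minimal sketch, assuming PR-AUC is estimated as step-wise average precision (the page does not say which estimator it uses) and that "@ 20% review" means the top 20% of decisions by risk; the function names are hypothetical. As a reference point, a random scorer's average precision is roughly the base error rate (0.422 here), which is the baseline the 0.590 should be read against.

```python
from typing import Sequence


def _rank_by_risk(scores: Sequence[float]) -> list[int]:
    # Indices sorted by descending risk; Python's sort is stable, so ties keep input order.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(scores: Sequence[float], is_error: Sequence[int]) -> float:
    """Step-wise PR-AUC estimate: mean precision at the rank of each true error."""
    total_errors = sum(is_error)
    if total_errors == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, i in enumerate(_rank_by_risk(scores), start=1):
        if is_error[i]:
            hits += 1
            ap += hits / rank  # precision among the top `rank` decisions
    return ap / total_errors


def errors_at_review(scores: Sequence[float], is_error: Sequence[int],
                     review_frac: float) -> tuple[int, float]:
    """(errors caught, recall) when the top `review_frac` of decisions by risk go to QA."""
    k = round(review_frac * len(scores))
    caught = sum(is_error[i] for i in _rank_by_risk(scores)[:k])
    total = sum(is_error)
    return caught, (caught / total if total else 0.0)
```

On the full 3,000-decision set, `errors_at_review(scores, is_error, 0.20)` would be the quantity behind the "Errors @ 20% review" and "Recall @ 20% review" cards, assuming the top-k reading above.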
Workflow
labeler → telemetry → risk → routing

- Step 1 · Labeler decides: submits annotation
- Step 2 · Telemetry captured: time, revisions, fatigue
- Step 3 · Risk scored: Anthrocentrix model
- Step 4 · High-risk routed → human QA
- Step 5 · Low-risk bypassed: auto-accept
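The five steps can be sketched as a scoring function followed by a threshold router. The Anthrocentrix model itself is not described on this page, so the scorer below is a hypothetical stand-in: a logistic function over telemetry fields from the upload schema, with made-up weights (`HYPOTHETICAL_WEIGHTS`, `HYPOTHETICAL_BIAS`) that only illustrate the direction each signal might push risk. The 0.5 threshold is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class Decision:
    # Step 2: telemetry fields taken from the upload schema (subset).
    task_id: str
    labeler_id: str
    time_on_task_s: float        # carried but left unweighted in this illustration
    revision_count: int
    reopened: int                # 0 or 1
    skipped_then_returned: int   # 0 or 1
    self_confidence: float       # 0..1
    session_fatigue: float       # 0..1
    disagreement_history: float  # 0..1


# HYPOTHETICAL weights for illustration only; not the Anthrocentrix model.
HYPOTHETICAL_WEIGHTS = {
    "revision_count": 0.3,
    "reopened": 0.8,
    "skipped_then_returned": 0.5,
    "self_confidence": -1.0,
    "session_fatigue": 1.2,
    "disagreement_history": 1.5,
}
HYPOTHETICAL_BIAS = -2.0


def risk_score(d: Decision) -> float:
    """Step 3: map telemetry to a 0..1 risk with a logistic function."""
    z = HYPOTHETICAL_BIAS + sum(w * getattr(d, name) for name, w in HYPOTHETICAL_WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def route(decisions: list[Decision],
          threshold: float = 0.5) -> tuple[list[Decision], list[Decision]]:
    """Steps 4-5: high-risk decisions go to human QA, the rest are auto-accepted."""
    review, auto_accept = [], []
    for d in decisions:
        (review if risk_score(d) >= threshold else auto_accept).append(d)
    return review, auto_accept
```

A fixed threshold lets review volume float with the risk mix; a fixed review budget (top k% by risk, as in the frontier and preview) keeps reviewer load constant instead.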
Review-Efficiency Frontier
Anthrocentrix routing vs. random QA sampling

| Review % | Anthrocentrix recall | Random recall | Lift | Errors per 100 reviews (Anthrocentrix) | Errors per 100 reviews (random) |
|---|---|---|---|---|---|
| 5% | 8.3% | 4.7% | +3.6 pp | 70.0 | 40.0 |
| 10% | 16.0% | 9.7% | +6.3 pp | 67.7 | 41.0 |
| 15% | 23.3% | 15.6% | +7.7 pp | 65.6 | 44.0 |
| 20% | 30.9% | 20.5% | +10.3 pp | 65.2 | 43.3 |
| 30% | 43.1% | 29.8% | +13.3 pp | 60.7 | 41.9 |
| 40% | 53.6% | 39.8% | +13.8 pp | 56.6 | 42.0 |
| 50% | 63.6% | 49.9% | +13.7 pp | 53.7 | 42.1 |
| 75% | 85.8% | 75.5% | +10.3 pp | 48.3 | 42.5 |
| 100% | 100.0% | 100.0% | +0.0 pp | 42.2 | 42.2 |
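Each frontier row can be derived from per-decision risk scores and gold error flags. This is a minimal sketch, assuming the top fraction `f` of decisions by risk is reviewed. Its random columns use the expected value of uniform sampling (recall equal to the review fraction, hit rate equal to the base error rate), whereas the table's random column appears to be a single simulated draw (for example 4.7% rather than 5.0% at a 5% review rate). Errors per 100 reviews follows from recall as 100 × base rate × recall ÷ review fraction; for the 20% row, 100 × 0.422 × 0.309 ÷ 0.20 ≈ 65.2.

```python
from typing import Sequence


def frontier(scores: Sequence[float], is_error: Sequence[int],
             fractions: Sequence[float]) -> list[dict]:
    """One row per review fraction, mirroring the columns of the frontier table."""
    n = len(scores)
    total_errors = sum(is_error)
    if n == 0 or total_errors == 0:
        raise ValueError("need at least one decision and one known error")
    base_rate = total_errors / n
    # Error flags re-ordered from highest to lowest risk.
    ranked = [is_error[i] for i in sorted(range(n), key=lambda i: scores[i], reverse=True)]
    rows = []
    for f in fractions:
        k = round(f * n)  # number of decisions sent to review
        caught = sum(ranked[:k])
        recall = caught / total_errors
        rows.append({
            "review_pct": 100 * f,
            "model_recall": recall,
            "random_recall": f,  # expected recall of uniform random sampling
            "lift_pp": 100 * (recall - f),
            "errors_per_100_model": 100 * caught / k if k else 0.0,
            "errors_per_100_random": 100 * base_rate,  # expected hit rate of random review
        })
    return rows
```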
ROI Calculator
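Result · monthly

To hit a target recall of 75%, Anthrocentrix selected a 75% review fraction (85.8% recall on the frontier above).

| Metric | Monthly value |
|---|---|
| Baseline cost | $182,292 |
| Routed cost | $546,875 |
| Hours saved | -10,417 |
| Reviews avoided | -250,000 |
| Errors caught Δ | 129,500 |
| Error exposure ↓ | $518,000 |
| Net monthly savings | $153,417 |

Negative hours saved and reviews avoided mean the routed plan reviews more decisions than the baseline; the net savings come from the value of the extra errors caught outweighing the added review cost.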
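The calculator's outputs are consistent with simple per-review arithmetic: net monthly savings equals the error-exposure reduction minus the extra review spend, $518,000 - ($546,875 - $182,292) = $153,417, and the exposure figure implies $4.00 per caught error ($518,000 ÷ 129,500). The sketch below reproduces every displayed figure, but its inputs are assumptions inferred to fit them, not the calculator's actual settings: 500,000 decisions per month, a 25% baseline review rate, 150 seconds per review, and a $35/hour reviewer rate. Errors caught Δ is passed in as given because the page does not show how it is derived, and the review-fraction rule (smallest frontier row whose recall meets the target) is likewise an assumption that happens to match the 75% chosen.

```python
# Frontier rows from the table above: (review fraction, Anthrocentrix recall).
FRONTIER = [(0.05, 0.083), (0.10, 0.160), (0.15, 0.233), (0.20, 0.309),
            (0.30, 0.431), (0.40, 0.536), (0.50, 0.636), (0.75, 0.858), (1.00, 1.000)]


def pick_review_fraction(target_recall: float) -> float:
    """Assumed rule: smallest frontier review fraction whose recall meets the target."""
    for frac, recall in FRONTIER:
        if recall >= target_recall:
            return frac
    return 1.0


def monthly_roi(decisions: int, baseline_frac: float, routed_frac: float,
                secs_per_review: float, hourly_rate: float,
                errors_caught_delta: float, cost_per_error: float) -> dict:
    """Monthly cost comparison; all inputs are hypothetical stand-ins for the calculator's."""
    baseline_reviews = baseline_frac * decisions
    routed_reviews = routed_frac * decisions
    cost_per_review = secs_per_review / 3600 * hourly_rate
    baseline_cost = baseline_reviews * cost_per_review
    routed_cost = routed_reviews * cost_per_review
    exposure_reduction = errors_caught_delta * cost_per_error
    return {
        "baseline_cost": baseline_cost,
        "routed_cost": routed_cost,
        "hours_saved": (baseline_reviews - routed_reviews) * secs_per_review / 3600,
        "reviews_avoided": baseline_reviews - routed_reviews,
        "exposure_reduction": exposure_reduction,
        # Extra review spend is netted against the value of the extra errors caught.
        "net_savings": exposure_reduction - (routed_cost - baseline_cost),
    }


result = monthly_roi(500_000, 0.25, pick_review_fraction(0.75), 150, 35, 129_500, 4.0)
print({name: round(value) for name, value in result.items()})
```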
Routing Preview
Top 20% by risk → human QA (rows sorted by risk, highest first)

| Task | Labeler | Time (s) | Revisions | Reopened | Self-confidence | Fatigue | Risk | Route |
|---|---|---|---|---|---|---|---|---|
| t_31_54 | lbl_032 | 44.2 | 5 | yes | 0.00 | 0.70 | 0.781 | review |
| t_1_64 | lbl_002 | 44.5 | 5 | yes | 0.33 | 0.86 | 0.776 | review |
| t_11_63 | lbl_012 | 37.9 | 5 | yes | 0.11 | 0.83 | 0.776 | review |
| t_2_72 | lbl_003 | 46.1 | 5 | yes | 0.03 | 0.91 | 0.763 | review |
| t_9_66 | lbl_010 | 46.0 | 4 | yes | 0.00 | 0.91 | 0.762 | review |
| t_29_74 | lbl_030 | 43.4 | 4 | yes | 0.00 | 1.00 | 0.762 | review |
| t_26_72 | lbl_027 | 46.0 | 4 | yes | 0.00 | 0.94 | 0.758 | review |
| t_12_67 | lbl_013 | 44.0 | 4 | yes | 0.00 | 0.85 | 0.755 | review |
| t_30_70 | lbl_031 | 47.5 | 5 | yes | 0.39 | 0.98 | 0.754 | review |
| t_9_70 | lbl_010 | 40.0 | 4 | yes | 0.00 | 0.92 | 0.752 | review |
| t_36_71 | lbl_037 | 47.0 | 5 | yes | 0.37 | 0.91 | 0.750 | review |
| t_25_73 | lbl_026 | 39.4 | 4 | yes | 0.04 | 0.94 | 0.747 | review |
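Budget-based routing, as in the preview, can be sketched as ranking decisions by risk and sending a fixed fraction to review while auto-accepting the rest. This is a minimal sketch; the function name, the use of `round` to turn the fraction into a decision count, and tie-breaking by `task_id` are all assumptions rather than documented behavior.

```python
def route_top_fraction(risks: dict[str, float], review_frac: float = 0.20) -> dict[str, str]:
    """Label each task 'review' if it falls in the top `review_frac` by risk, else 'auto-accept'."""
    # Highest risk first; ties broken by task_id so the split is reproducible.
    ranked = sorted(risks, key=lambda t: (-risks[t], t))
    budget = round(review_frac * len(ranked))  # review capacity, in decisions
    return {t: ("review" if i < budget else "auto-accept") for i, t in enumerate(ranked)}


# First five preview rows with a 40% budget: two tasks go to review.
preview = {"t_31_54": 0.781, "t_1_64": 0.776, "t_11_63": 0.776,
           "t_2_72": 0.763, "t_9_66": 0.762}
print(route_top_fraction(preview, 0.4))
```

The displayed risks are rounded to three decimals, so apparent ties such as the two 0.776 rows may not be ties in the underlying scores; the tie-break only matters when scores are exactly equal.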
Uploadable Dataset Schema
Accepted formats: CSV or JSON

Required columns (CSV or JSON):

| Column | Type | Description |
|---|---|---|
| task_id | string | unique per labeling decision |
| labeler_id | string | stable labeler identifier |
| task_index | int | position in labeler's session (0-based) |
| time_on_task_s | number | seconds spent on the task |
| revision_count | int | number of edits before submission |
| reopened | 0 or 1 | labeler reopened after submit |
| skipped_then_returned | 0 or 1 | task was skipped and later completed |
| self_confidence | 0..1 | optional self-reported confidence |
| session_fatigue | 0..1 | fraction of session elapsed |
| disagreement_history | 0..1 | labeler's historical disagreement rate |
| label | string | submitted label (any taxonomy) |

Optional column (for evaluation only):

| Column | Type | Description |
|---|---|---|
| is_error | 0 or 1 | ground-truth flag from gold review |
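A minimal validator for the CSV form of the schema, using only the standard library. It checks that the required columns are present, that `task_id` is unique, and that flag and 0..1 fields are in range. Treating a blank `self_confidence` as allowed follows the schema's "optional" note; the function name `validate_csv` is hypothetical.

```python
import csv
import io

REQUIRED = ["task_id", "labeler_id", "task_index", "time_on_task_s", "revision_count",
            "reopened", "skipped_then_returned", "self_confidence", "session_fatigue",
            "disagreement_history", "label"]
FLAGS = ["reopened", "skipped_then_returned"]
UNIT_INTERVAL = ["self_confidence", "session_fatigue", "disagreement_history"]


def validate_csv(text: str) -> list[str]:
    """Return a list of schema problems; an empty list means the CSV passed these checks."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems, seen = [], set()
    for n, row in enumerate(reader, start=1):  # n counts data records, not the header
        tid = row["task_id"]
        if tid in seen:
            problems.append(f"record {n}: duplicate task_id {tid}")
        seen.add(tid)
        try:
            if int(row["task_index"]) < 0 or int(row["revision_count"]) < 0:
                problems.append(f"record {n}: task_index and revision_count must be >= 0")
            if float(row["time_on_task_s"]) < 0:
                problems.append(f"record {n}: time_on_task_s must be >= 0")
            for c in FLAGS:
                if row[c] not in ("0", "1"):
                    problems.append(f"record {n}: {c} must be 0 or 1")
            for c in UNIT_INTERVAL:
                if c == "self_confidence" and row[c] == "":
                    continue  # optional self-reported value may be blank
                if not 0.0 <= float(row[c]) <= 1.0:
                    problems.append(f"record {n}: {c} must be in 0..1")
            if "is_error" in row and row["is_error"] not in ("0", "1"):
                problems.append(f"record {n}: is_error must be 0 or 1")
        except (TypeError, ValueError) as exc:  # unparsable or missing numeric field
            problems.append(f"record {n}: {exc}")
    return problems
```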