Benchmarks
Comprehensive performance evaluations of The Token Company compression API. Each benchmark provides detailed methodology, statistical analysis, and reproducible results.
Making LLMs understand financial documents better
Compression improved financial QA accuracy by 2.7 percentage points on 150 SEC filing questions — while reducing input tokens by up to 20%.
February 2026
Reducing LLM response times through compression
Up to 37% faster on Claude Opus 4.6 and up to 30% faster on GPT-5.2 — saving seconds per request across 5 input sizes with sub-120ms compression overhead.
February 2026
Improving LLM reading comprehension with compression
Compression improved SQuAD 2.0 accuracy by 4.0 percentage points on 150 reading comprehension questions — while reducing input tokens by 17%.
March 2026
Zero accuracy loss on conversational QA with 14% fewer tokens
Compression maintained 87.3% accuracy on 150 multi-turn CoQA questions across 4 domains — while reducing input tokens by 14%.
March 2026
We are updating these benchmarks to cover more models and domains using our next-generation compression models.
Why benchmark?
Token compression must balance efficiency with quality. Removing too many tokens risks degrading model performance, while removing too few limits cost savings.
Every result is reproducible. We publish the exact configurations, datasets, and evaluation criteria so you can verify our claims.
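The tradeoff above comes down to two numbers per benchmark: how many input tokens compression removes, and how accuracy changes as a result. Here is a minimal sketch of that bookkeeping; the function names and all counts are illustrative assumptions, not values or code from the published reports.

```python
# Sketch of the two metrics reported on each benchmark card.
# All numbers below are hypothetical, not taken from the reports.

def token_reduction(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of input tokens removed by compression."""
    return 1 - compressed_tokens / original_tokens

def accuracy_delta_pp(baseline_correct: int, compressed_correct: int,
                      total_questions: int) -> float:
    """Change in accuracy after compression, in percentage points."""
    return 100 * (compressed_correct - baseline_correct) / total_questions

# Hypothetical run: 150 questions, 10,000 input tokens compressed to 8,300.
print(f"{token_reduction(10_000, 8_300):.0%} fewer input tokens")
print(f"{accuracy_delta_pp(125, 131, 150):+.1f} pp accuracy change")
```

A positive accuracy delta at a meaningful token reduction is what the benchmarks above test for; a large reduction with a negative delta would mean the compressor cut too aggressively.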