# The Token Company

> LLM input compression API. Reduce tokens by 66%, cut AI costs by 3x, improve accuracy.

## About

The Token Company builds compression models that remove low-signal tokens from LLM prompts before they reach the model. The compression is semantic and deterministic: it processes tokens based on context and intent, not simple truncation.

Backed by Y Combinator, with investors from Supercell, Wolt, Hugging Face, Silo AI, and SVA.

Website: https://thetokencompany.com

## Models

### bear-1

The first LLM input compression model. Compresses input tokens by 66% without sacrificing accuracy. Released November 2025.

### bear-1.1

Improved version of bear-1 with better accuracy preservation and faster compression speeds. Released February 2026.

### bear-1.2

Latest compression model. Removes low-signal tokens from prompts before they hit the LLM.

## How It Works

1. Send text to The Token Company compression API
2. Receive compressed text back
3. Pass the compressed text to any LLM (GPT, Claude, Gemini, etc.)

That's the entire integration: one API call before your LLM call.

## Pricing

Free tier: up to 1B processed tokens per month, bear-1 & bear-1.2 models, API access.

Enterprise: custom pricing for unlimited tokens, custom compression models, on-premise hosting, dedicated support, SLA guarantees, and more.

Details: https://thetokencompany.com/pricing

## Use Cases

- **LLM Entertainment & Gaming**: Longer memories, richer worlds, same budget
- **Meeting Transcription Analysis**: Distill hours of calls into signal-dense context
- **Web Scraping**: Strip boilerplate from crawled pages before ingest
- **Document Analysis**: Fit more PDFs and reports into one context window

## Case Study: Pax Historia

Pax Historia processes 193 billion tokens per month on OpenRouter, making it one of the biggest token consumers globally. It ran a 268K-vote blind model arena with bear-1.1 compression.
Results:

- +4.9% improvement in Sonnet 4.5 score
- +15% improvement in Gemini 3 Flash score
- +5% purchase amount lift in A/B tests

Compressed models scored higher than uncompressed models: compression improved quality by removing context bloat.

Full case study: https://thetokencompany.com/blog/pax-historia

## Benchmarks

### FinanceBench

Accuracy evaluation on real-world financial documents, not synthetic benchmarks. Detailed methodology, statistical analysis, and reproducible results.

Details: https://thetokencompany.com/benchmarks/financebench

## Key Links

- [Home](https://thetokencompany.com)
- [Pricing](https://thetokencompany.com/pricing)
- [Blog](https://thetokencompany.com/blog)
- [Benchmarks](https://thetokencompany.com/benchmarks)
- [Contact](https://thetokencompany.com/contact)
- [Careers](https://thetokencompany.com/careers)
- [Privacy Policy](https://thetokencompany.com/privacy)
- [Data Residency](https://thetokencompany.com/data-residency)

## Blog Posts

- [Pax Historia Case Study](https://thetokencompany.com/blog/pax-historia): One of the biggest token consumers globally improved quality by removing context bloat
- [bear-1.1 Release](https://thetokencompany.com/blog/bear-1-1): Improved accuracy preservation and faster compression speeds
- [bear-1 Launch](https://thetokencompany.com/blog/bear-1): First LLM input compression model — 66% compression, 3x cost reduction
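## Integration Sketch

The three-step flow described under "How It Works" (compress first, then call any LLM) can be sketched as a simple pipeline. This is a minimal illustration only: the function names and stub logic below are assumptions for this sketch, not the documented client or API, and the stubs stand in for the real HTTP calls to the compression endpoint and to an LLM provider.

```python
# Sketch of the "one API call before your LLM call" pattern.
# Everything here is illustrative; consult the official API docs
# for the real endpoint, payload shape, and model names.

def compress_prompt(text: str, compress_fn) -> str:
    """Steps 1-2: send the raw prompt to the compressor, get compressed text back."""
    return compress_fn(text)

def run_pipeline(text: str, compress_fn, llm_fn) -> str:
    """Step 3: pass the compressed prompt to any LLM."""
    return llm_fn(compress_prompt(text, compress_fn))

# Stub compressor: a crude stand-in for the real semantic compression,
# which would be an HTTPS call to the compression API (hypothetical).
def stub_compressor(text: str) -> str:
    low_signal = {"the", "a", "of"}
    return " ".join(w for w in text.split() if w not in low_signal)

# Stub LLM: stands in for a call to GPT, Claude, Gemini, etc.
def stub_llm(prompt: str) -> str:
    return f"LLM saw {len(prompt.split())} tokens"

print(run_pipeline("the history of the bear", stub_compressor, stub_llm))
```

Swapping the stubs for real HTTP calls keeps the structure unchanged: the compressor runs once per prompt, and the LLM only ever sees the compressed text.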