Backed by Y Combinator

Supercharge LLM performance
by removing context bloat

The bear-1.1 compression model removes context bloat from your prompts before they hit your LLM. Drop-in API middleware that integrates in minutes (we measured it 🙂).

Intelligent semantic processing

The bear-1 and bear-1.1 models process tokens based on context and semantic intent. bear-1.1 is the latest version with improved accuracy.

In its most fundamental sense, compression is the process of encoding information using fewer bits or resources than the original representation by identifying and eliminating statistical redundancies or irrelevant data within a dataset. Whether applied to digital media, text, or the high-dimensional vector spaces of Large Language Models, compression relies on the principle that most raw information contains noise or repeating patterns that do not contribute new meaning. By applying an algorithm, or in your case an ML-based model, to map the input data into a more compact form, you essentially distil the signal from the noise. In the context of ML inputs, this means transforming long-form text into a dense, mathematically efficient representation that preserves the original semantic intent and logical relationships while significantly reducing the physical token count, thereby allowing a system to process more information within the same fixed computational window or budget.
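The redundancy argument above can be illustrated with a classical byte-level analogue (this is plain zlib on repetitive text, not the bear model itself): the more a pattern repeats, the fewer resources the encoded form needs.

```python
import zlib

# Repetitive text contains the statistical redundancy described above,
# which a generic compressor can exploit.
text = ("The quick brown fox jumps over the lazy dog. " * 50).encode("utf-8")
compressed = zlib.compress(text)

# The encoded form is far smaller than the original representation.
assert len(compressed) < len(text)
print(f"{len(text)} bytes -> {len(compressed)} bytes")
```

Semantic compression of prompts works at the token level rather than the byte level, but the underlying principle, removing what carries no new meaning, is the same.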

Featured · New

Pax Historia · 193B tokens/mo on OpenRouter

One of the biggest token consumers globally found that compressed prompts outperformed uncompressed in a 268K-vote blind arena.

+4.9% · Sonnet 4.5 score

+15% · Gemini 3 Flash score

+5% · Purchase amount lift

Read the full case study →

One API call

Send text in, get compressed text back. Drop it in before your LLM call. That's the entire integration.

POST api.thetokencompany.com/v1/compress

{
  "model": "bear-1.1",
  "input": "Your long text to compress..."
}

Response:

{
  "output": "Compressed text...",
  "original_input_tokens": 1284,
  "output_tokens": 436
}

$0.05 per 1M compressed tokens · Available models: bear-1, bear-1.1

Benchmarks

Measured on real-world financial documents, not synthetic benchmarks.

Use cases

Backed by

the founders and operators of

Silo
Wolt
Y Combinator
Supercell
Hugging Face
SVA

Ready to compress?

Access the compression API.