Backed by Y Combinator

Supercharge LLM performance
by removing context bloat

The bear-1.1 compression model removes context bloat from your prompts before they hit your LLM. Drop-in API middleware that integrates in minutes (we measured it 🙂).

Intelligent semantic processing

The bear-1 and bear-1.1 models process tokens based on context and semantic intent. bear-1.1 is the latest version with improved accuracy.

In its most fundamental sense, compression is the process of encoding information using fewer bits or resources than the original representation by identifying and eliminating statistical redundancies or irrelevant data within a dataset. Whether applied to digital media, text, or the high-dimensional vector spaces of Large Language Models, compression relies on the principle that most raw information contains noise or repeating patterns that do not contribute new meaning. By applying an algorithm, or in your case an ML-based model, to map the input data into a more compact form, you essentially distil the signal from the noise. In the context of ML inputs, this means transforming long-form text into a dense, mathematically efficient representation that preserves the original semantic intent and logical relationships while significantly reducing the physical token count, thereby allowing a system to process more information within the same fixed computational window or budget.
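The redundancy argument above can be illustrated with a classical byte-level analogue (this is plain zlib on repetitive text, not the bear model itself): the more a pattern repeats, the fewer resources the encoded form needs.

```python
import zlib

# Repetitive text contains the statistical redundancy described above,
# which a generic compressor can exploit.
text = ("The quick brown fox jumps over the lazy dog. " * 50).encode("utf-8")
compressed = zlib.compress(text)

# The encoded form is far smaller than the original representation.
assert len(compressed) < len(text)
print(f"{len(text)} bytes -> {len(compressed)} bytes")
```

Semantic compression of prompts works at the token level rather than the byte level, but the underlying principle, removing what carries no new meaning, is the same.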

Featured · New

Pax Historia · 193B tokens/mo on OpenRouter

One of the biggest token consumers globally found that compressed prompts outperformed uncompressed in a 268K-vote blind arena.

+4.9% · Sonnet 4.5 score

+15% · Gemini 3 Flash score

+5% · Purchase amount lift

Read the full case study →

One API call

Send text in, get compressed text back. Drop it in before your LLM call. That's the entire integration.

POST api.thetokencompany.com/v1/compress

{
  "model": "bear-1.1",
  "input": "Your long text to compress..."
}

Response:

{
  "output": "Compressed text...",
  "original_input_tokens": 1284,
  "output_tokens": 436
}

$0.05 per 1M compressed tokens · Available models: bear-1, bear-1.1

Benchmarks

Measured on real-world financial documents, not synthetic benchmarks.

Use cases

Backed by

the founders and operators of

Silo
Wolt
Y Combinator
Supercell
Hugging Face
SVA

Ready to compress?

Access the compression API.