Optimize LLM context by removing input bloat
Bear-1.2 compression removes low-signal tokens from your prompts before they hit your LLM.
Save tokens and improve accuracy on your agent's background knowledge
Bear-1.2 compresses your agent's background knowledge before it enters the context window.
Featured
Compressed prompts outperformed uncompressed in a 268K-vote blind arena across all models: +4.9% on Sonnet 4.5, +15% on Gemini 3 Flash, and a +5% purchase lift.
Read the case study →
Long-running agents analyzing construction drawings with near-million-token prompts: 4.7% token reduction, roughly 47K tokens saved per request, across agent runs lasting hours.
Read the case study →
Intelligent semantic processing
The bear-1 and bear-1.2 models process tokens based on context and semantic intent. Compression is deterministic and low-latency.
One API call
Send text in, get compressed text back. Drop it in before your LLM call. That's the entire integration.
"model": "bear-1.1",
"input": "Your long text to compress..."
}
"output": "Compressed text...",
"original_input_tokens": 1284,
"output_tokens": 436
}
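
A minimal sketch of that integration in Python. The endpoint URL, environment variable, and file name are hypothetical; only the request and response fields ("model", "input", "output", and the token counts) come from the example above.

import os
import requests

# Hypothetical endpoint and API key variable; not the documented values.
COMPRESS_URL = "https://api.example.com/v1/compress"

def compress(text: str) -> str:
    # One API call: send text in, get compressed text back.
    resp = requests.post(
        COMPRESS_URL,
        headers={"Authorization": f"Bearer {os.environ['BEAR_API_KEY']}"},
        json={"model": "bear-1.2", "input": text},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    # e.g. "1284 -> 436 tokens", as in the sample response
    print(f"{data['original_input_tokens']} -> {data['output_tokens']} tokens")
    return data["output"]

# Drop it in before your LLM call: compress background knowledge first,
# then build the prompt as usual.
background = open("agent_background.txt").read()
prompt = compress(background) + "\n\nUser question: ..."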
Benchmarks
More benchmarks coming soon
We are evaluating compression across additional domains and model families. Results will be published here as they are completed.
Start compressing
Use cases
LLM Entertainment & Gaming
Longer memories, richer worlds, same budget.
Meeting Transcription
Distill hours of calls into signal-dense context.
Web Scraping
Strip boilerplate from crawled pages before ingest.
Document Analysis
Fit more PDFs and reports into one context window.
Ready to compress?
Access the compression API.