
November 2025

bear-1: First LLM Input Compression Model

bear-1 is a compression model that reduces LLM input tokens by 66% without sacrificing accuracy. It integrates into your existing pipeline as a single API call made before your LLM request.

What is bear-1?

bear-1 is The Token Company's first LLM input compression model. It uses semantic compression to intelligently remove redundant tokens from prompts before they reach your language model. The result: you send fewer tokens to your LLM, pay less, and get the same (or better) results.

Unlike simple truncation or summarization, bear-1 understands the semantic structure of your input. It identifies and removes tokens that don't contribute to the meaning of the text — what we call “context bloat” — while preserving the information your LLM needs to generate accurate responses.

Performance

- Token reduction: 66%
- Cost reduction: 3x
- Accuracy gain: +1.1%
- Compression latency: <0.1s per 10K tokens

Benchmarked on GPT-4o-mini using LongBench v2, bear-1 achieves a 66% reduction in input tokens while actually improving accuracy by up to 1.1%. This means you're not just saving money — your LLM performs better with compressed input because the noise has been removed.

How it works

Integration is a single API call. Send your text to the bear-1 compression endpoint, get compressed text back, and pass it to your LLM. That's the entire integration.
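As a sketch of that flow, here is what the integration could look like in Python. The endpoint URL, request fields, header names, and response field are assumptions for illustration, not the documented bear-1 API; check the official API reference for the real contract.

```python
# Hypothetical bear-1 integration sketch.
# ASSUMPTIONS: the endpoint URL, the {"model", "input"} payload fields,
# and the "compressed_text" response field are all illustrative.
import json
import urllib.request

BEAR_ENDPOINT = "https://api.example.com/v1/compress"  # placeholder URL


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the compression request; field names are illustrative."""
    body = json.dumps({"model": "bear-1", "input": text}).encode("utf-8")
    return urllib.request.Request(
        BEAR_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def compress(text: str, api_key: str) -> str:
    """Send text to the compression endpoint, return the compressed text."""
    with urllib.request.urlopen(build_request(text, api_key)) as resp:
        return json.load(resp)["compressed_text"]  # assumed response field
```

After that, the compressed string goes to your LLM exactly where the original prompt would have: `answer = my_llm(compress(long_prompt, API_KEY))`.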

Pricing

$0.05 per 1M compressed (removed) tokens. You pay only for the tokens bear-1 removes; tokens that pass through unchanged are free.
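Taking the pricing above (you pay $0.05 per million for the tokens bear-1 removes) together with the benchmarked 66% reduction rate, a quick cost sketch; the reduction rate on your own prompts is an assumption and will vary.

```python
# Cost sketch under the stated pricing: $0.05 per 1M removed tokens.
# The 66% default reduction rate comes from the LongBench v2 benchmark
# figure and is an assumption for any specific workload.
PRICE_PER_M_REMOVED = 0.05


def compression_cost(input_tokens: int, reduction: float = 0.66) -> float:
    """Estimate the bear-1 fee for a given input size, in dollars."""
    removed = input_tokens * reduction
    return removed / 1_000_000 * PRICE_PER_M_REMOVED


# A 10M-token workload at 66% reduction removes 6.6M tokens,
# costing 6.6 * $0.05 = $0.33.
```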

What's next

bear-1 laid the foundation for LLM input compression. We've since released bear-1.1, an improved version with better accuracy preservation and faster compression. If you're starting fresh, we recommend bear-1.1.

Start compressing with bear-1

Create an account and get your API key.