Compression Statistics
Track per-turn and aggregate compression metrics.
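Every with_compression() (Python) and withCompression() (TypeScript) wrapper exposes a compression object (client.compression on the Python clients, model.compression in the Vercel AI SDK) that records stats for each turn and in aggregate. Use it to monitor savings, debug compression behavior, and build dashboards.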
OpenAI
from openai import OpenAI
from thetokencompany.openai import with_compression
client = with_compression(OpenAI(), compression_api_key="ttc-...")
# Make some calls
client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Explain quantum computing..."}],
)
client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Now explain it simply..."}],
)
# Access compression stats
stats = client.compression
print(f"Total calls: {stats.calls}")
print(f"Total tokens saved: {stats.total_tokens_saved}")
print(f"Overall ratio: {stats.ratio:.1f}x")
# Per-turn history
for i, turn in enumerate(stats.history):
    print(f"Turn {i+1}: {turn.input_tokens} → {turn.output_tokens} "
          f"({turn.tokens_saved} saved, {turn.messages_compressed} messages)")
Anthropic
from anthropic import Anthropic
from thetokencompany.anthropic import with_compression
client = with_compression(Anthropic(), compression_api_key="ttc-...")
client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt..."}],
)
stats = client.compression
print(f"Tokens saved: {stats.total_tokens_saved}")
Vercel AI SDK
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { withCompression } from "the-token-company/ai-sdk";
const model = withCompression(openai("gpt-5.4"), {
  compressionApiKey: "ttc-...",
});
const { text } = await generateText({
  model,
  messages: [{ role: "user", content: "Your prompt..." }],
});
// Stats available via model.compression
console.log(`Tokens saved: ${model.compression.totalTokensSaved}`);
Per-turn stats
Each entry in stats.history represents one create() call:
input_tokens (int, required): Total tokens across all messages before compression.
output_tokens (int, required): Total tokens after compression.
tokens_saved (int, required): Tokens removed in this turn.
messages_compressed (int, required): Number of messages that were compressed in this turn.
ratio (float, required): Compression ratio for this turn (e.g. 2.5x).
timestamp (float, required): Unix timestamp when the turn was processed.
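The per-turn fields are related by simple arithmetic. The following is a minimal sketch of that relationship, assuming tokens_saved is input_tokens minus output_tokens and ratio is input_tokens divided by output_tokens; the page does not state these formulas explicitly. TurnStats is a hypothetical stand-in for illustration, not the library's own class.

```python
from dataclasses import dataclass
import time


@dataclass
class TurnStats:
    """Hypothetical stand-in for one entry in stats.history (not the library's class)."""

    input_tokens: int
    output_tokens: int
    messages_compressed: int
    timestamp: float

    @property
    def tokens_saved(self) -> int:
        # Assumption: saved tokens are the difference before and after compression.
        return self.input_tokens - self.output_tokens

    @property
    def ratio(self) -> float:
        # Assumption: ratio is tokens before / tokens after compression.
        if self.output_tokens == 0:
            return float("inf")  # avoid dividing by zero
        return self.input_tokens / self.output_tokens


turn = TurnStats(
    input_tokens=1000,
    output_tokens=400,
    messages_compressed=3,
    timestamp=time.time(),
)
print(f"{turn.tokens_saved} saved, {turn.ratio:.1f}x")  # 600 saved, 2.5x
```

Under these assumptions, a turn that shrinks 1000 tokens to 400 reports 600 tokens saved and a ratio of 2.5.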
Aggregate stats
The compression object also provides aggregate properties across all turns:
total_tokens_saved (int, required): Sum of all tokens saved across all calls.
total_input_tokens (int, required): Sum of all input tokens.
total_output_tokens (int, required): Sum of all output tokens.
calls (int, required): Number of create() calls made.
ratio (float, required): Overall compression ratio across all calls.
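To see how the aggregate properties could follow from the per-turn history, here is a rough sketch. It assumes the overall ratio is total input tokens divided by total output tokens, which weights each turn by its size, rather than the average of the per-turn ratios; the library may compute it differently. The aggregate() helper and the sample history are hypothetical.

```python
from types import SimpleNamespace


def aggregate(history):
    """Recompute aggregate stats from per-turn records (hypothetical helper)."""
    total_in = sum(t.input_tokens for t in history)
    total_out = sum(t.output_tokens for t in history)
    if total_in == 0:
        ratio = 1.0  # nothing was sent, so nothing was compressed
    elif total_out == 0:
        ratio = float("inf")  # avoid dividing by zero
    else:
        ratio = total_in / total_out  # assumption: token-weighted ratio
    return {
        "calls": len(history),
        "total_input_tokens": total_in,
        "total_output_tokens": total_out,
        "total_tokens_saved": total_in - total_out,
        "ratio": ratio,
    }


# Hypothetical records standing in for entries of stats.history.
history = [
    SimpleNamespace(input_tokens=1000, output_tokens=400),  # 2.5x
    SimpleNamespace(input_tokens=200, output_tokens=200),   # 1.0x
]
summary = aggregate(history)
mean_of_ratios = sum(t.input_tokens / t.output_tokens for t in history) / len(history)
print(summary["ratio"], mean_of_ratios)  # 2.0 1.75
```

The two numbers differ because the large turn dominates the token-weighted figure. When comparing a dashboard value against client.compression.ratio, it helps to know which definition is in use.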
Logging
import logging
# Enable debug logging to see per-request compression details
logging.basicConfig(level=logging.DEBUG)
# Or log stats after each call
stats = client.compression
for turn in stats.history:
    logging.info(
        "Compressed %d → %d tokens (%.0f%% reduction, %d messages)",
        turn.input_tokens,
        turn.output_tokens,
        (1 - turn.output_tokens / turn.input_tokens) * 100,
        turn.messages_compressed,
    )
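The percentage in the log line above divides by turn.input_tokens, so it would raise ZeroDivisionError if a turn ever recorded zero input tokens. Whether the library can produce such a turn is not stated here. A defensive sketch, using a hypothetical percent_reduction() helper and stand-in turn records, might look like this:

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.INFO)


def percent_reduction(input_tokens: int, output_tokens: int) -> float:
    """Percentage of tokens removed; 0.0 when there was nothing to compress."""
    if input_tokens <= 0:
        return 0.0
    return (input_tokens - output_tokens) / input_tokens * 100


# Hypothetical records standing in for entries of stats.history.
history = [
    SimpleNamespace(input_tokens=1000, output_tokens=400, messages_compressed=3),
    SimpleNamespace(input_tokens=0, output_tokens=0, messages_compressed=0),
]

for turn in history:
    logging.info(
        "Compressed %d → %d tokens (%.0f%% reduction, %d messages)",
        turn.input_tokens,
        turn.output_tokens,
        percent_reduction(turn.input_tokens, turn.output_tokens),
        turn.messages_compressed,
    )
```

In real use, iterate over client.compression.history instead of the stand-in list.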