Compression Statistics

Track per-turn and aggregate compression metrics.

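Every with_compression() (Python) or withCompression() (TypeScript) wrapper exposes a compression object, available as client.compression or, in the Vercel AI SDK, as model.compression. It records stats for each turn and in aggregate. Use it to monitor savings, debug compression behavior, and build dashboards.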

OpenAI

from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(OpenAI(), compression_api_key="ttc-...")

# Make some calls
client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Explain quantum computing..."}],
)
client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Now explain it simply..."}],
)

# Access compression stats
stats = client.compression

print(f"Total calls: {stats.calls}")
print(f"Total tokens saved: {stats.total_tokens_saved}")
print(f"Overall ratio: {stats.ratio:.1f}x")

# Per-turn history
for i, turn in enumerate(stats.history):
    # Tokens before -> after compression for this turn
    print(f"Turn {i+1}: {turn.input_tokens} -> {turn.output_tokens} "
          f"({turn.tokens_saved} saved, {turn.messages_compressed} messages)")

Anthropic

from anthropic import Anthropic
from thetokencompany.anthropic import with_compression

client = with_compression(Anthropic(), compression_api_key="ttc-...")

client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Your prompt..."}],
)

stats = client.compression
print(f"Tokens saved: {stats.total_tokens_saved}")

Vercel AI SDK

import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { withCompression } from "the-token-company/ai-sdk";

const model = withCompression(openai("gpt-5.4"), {
  compressionApiKey: "ttc-...",
});

const { text } = await generateText({
  model,
  messages: [{ role: "user", content: "Your prompt..." }],
});

// Stats available via model.compression
console.log(`Tokens saved: ${model.compression.totalTokensSaved}`);

Per-turn stats

Each entry in stats.history represents one create() call:

input_tokens (int, required)
Total tokens across all messages before compression.

output_tokens (int, required)
Total tokens after compression.

tokens_saved (int, required)
Tokens removed in this turn.

messages_compressed (int, required)
Number of messages that were compressed in this turn.

ratio (float, required)
Compression ratio for this turn (e.g. 2.5x).

timestamp (float, required)
Unix timestamp when the turn was processed.
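
To make the relationship between these fields concrete, here is a minimal sketch. TurnStats and make_turn are hypothetical stand-ins, not part of the library. The sketch assumes that tokens_saved equals input_tokens minus output_tokens and that ratio is input_tokens divided by output_tokens; the library may define either differently.

```python
from dataclasses import dataclass


@dataclass
class TurnStats:
    """Hypothetical stand-in mirroring the documented per-turn fields."""
    input_tokens: int
    output_tokens: int
    tokens_saved: int
    messages_compressed: int
    ratio: float
    timestamp: float


def make_turn(input_tokens: int, output_tokens: int,
              messages_compressed: int, timestamp: float) -> TurnStats:
    """Derive tokens_saved and ratio under the assumptions stated above."""
    # Assumed: tokens saved = tokens before compression - tokens after.
    tokens_saved = input_tokens - output_tokens
    # Assumed: ratio = before / after; use 1.0 when there is nothing to divide by.
    ratio = input_tokens / output_tokens if output_tokens else 1.0
    return TurnStats(input_tokens, output_tokens, tokens_saved,
                     messages_compressed, ratio, timestamp)


turn = make_turn(input_tokens=1000, output_tokens=400,
                 messages_compressed=3, timestamp=1_700_000_000.0)
print(f"{turn.tokens_saved} saved, {turn.ratio:.1f}x")  # 600 saved, 2.5x
```

Under these assumptions, a turn that shrinks 1000 tokens to 400 reports 600 tokens saved and a ratio of 2.5.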

Aggregate stats

The compression object also provides aggregate properties across all turns:

total_tokens_saved (int, required)
Sum of all tokens saved across all calls.

total_input_tokens (int, required)
Sum of all input tokens.

total_output_tokens (int, required)
Sum of all output tokens.

calls (int, required)
Number of create() calls made.

ratio (float, required)
Overall compression ratio across all calls.
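
As a cross-check, the aggregate values can be recomputed from the per-turn history. The sketch below uses a hypothetical helper, summarize, with stand-in turn objects. It assumes the overall ratio is total input tokens divided by total output tokens, which the library may compute differently (for example, as an average of per-turn ratios).

```python
from types import SimpleNamespace


def summarize(history):
    """Recompute aggregate totals from per-turn entries (hypothetical helper)."""
    total_in = sum(t.input_tokens for t in history)
    total_out = sum(t.output_tokens for t in history)
    return {
        "calls": len(history),  # one history entry per create() call
        "total_input_tokens": total_in,
        "total_output_tokens": total_out,
        "total_tokens_saved": sum(t.tokens_saved for t in history),
        # Assumed definition: total input / total output.
        "ratio": total_in / total_out if total_out else 1.0,
    }


# Stand-in turns for illustration; real entries would come from
# client.compression.history.
history = [
    SimpleNamespace(input_tokens=1000, output_tokens=400, tokens_saved=600),
    SimpleNamespace(input_tokens=500, output_tokens=350, tokens_saved=150),
]
summary = summarize(history)
print(summary)
```

With a real client, passing client.compression.history to a helper like this and comparing the result against the aggregate properties is one way to confirm how the overall ratio is defined.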

Logging

import logging

# Enable debug logging to see per-request compression details
logging.basicConfig(level=logging.DEBUG)

# Or log stats after each call
stats = client.compression
for turn in stats.history:
    logging.info(
        "Compressed %d -> %d tokens (%.0f%% reduction, %d messages)",
        turn.input_tokens,
        turn.output_tokens,
        (1 - turn.output_tokens / turn.input_tokens) * 100,
        turn.messages_compressed,
    )
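
For dashboards, it can be more convenient to export the history as structured records than to parse log lines. The following is a minimal sketch that serializes each turn as one JSON object per line. history_to_jsonl is a hypothetical helper, not a library function, and it assumes each history entry exposes the documented per-turn fields as attributes.

```python
import json
from types import SimpleNamespace

# Field names taken from the per-turn stats reference above.
TURN_FIELDS = ("input_tokens", "output_tokens", "tokens_saved",
               "messages_compressed", "ratio", "timestamp")


def history_to_jsonl(history) -> str:
    """Serialize each turn as one JSON object per line (hypothetical helper)."""
    lines = []
    for turn in history:
        record = {name: getattr(turn, name) for name in TURN_FIELDS}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Stand-in entry for illustration; with a real client, pass
# client.compression.history instead.
sample = [
    SimpleNamespace(input_tokens=1000, output_tokens=400, tokens_saved=600,
                    messages_compressed=3, ratio=2.5, timestamp=1700000000.0),
]
print(history_to_jsonl(sample))
```

One JSON object per line keeps the output appendable and easy to ingest with most log and metrics pipelines.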