OpenAI

Automatic compression for OpenAI API calls.

The with_compression() wrapper automatically compresses all non-assistant messages before sending them to OpenAI. Your existing code stays the same: just wrap your client.

Setup

from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(
    OpenAI(),
    compression_api_key="ttc-...",
)

# Use OpenAI exactly as before - compression happens automatically
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your long prompt text here..."},
    ],
)

print(response.choices[0].message.content)

Info
Assistant messages pass through unchanged, so the LLM cache is fully preserved. Only system, user, and tool messages are compressed.

Per-role aggressiveness

Set different compression levels per message role. Roles not in the dictionary are not compressed.

client = with_compression(
    OpenAI(),
    compression_api_key="ttc-...",
    aggressiveness={
        "system": 0.1,  # light - preserve instructions
        "user": 0.4,    # moderate - compress user messages
        "tool": 0.6,    # aggressive - compress tool results
    },
)
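
To make the lookup rule concrete, here is a minimal sketch of how a per-role dictionary could be resolved for each message. It is not the library's implementation: the helper name level_for is hypothetical, and the sketch assumes that assistant messages are skipped even if the dictionary happens to list them.

```python
from typing import Any, Optional

Message = dict[str, Any]


def level_for(message: Message, aggressiveness: dict[str, float]) -> Optional[float]:
    """Return the compression level for a message, or None to leave it untouched.

    Hypothetical helper for illustration only.
    """
    role = message.get("role")
    if role == "assistant":
        # Assumption: assistant messages always pass through unchanged.
        return None
    # Roles missing from the dictionary are not compressed.
    return aggressiveness.get(role)


aggressiveness = {"system": 0.1, "user": 0.4, "tool": 0.6}
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your long prompt text here..."},
    {"role": "assistant", "content": "An earlier reply."},
    {"role": "tool", "content": "A long tool result..."},
]

levels = [level_for(m, aggressiveness) for m in messages]
tool_only = [level_for(m, {"tool": 0.6}) for m in messages]
print(levels)
print(tool_only)
```

In the second call only the tool message gets a level, because the other roles are absent from the dictionary.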

How it works

  1. You call client.chat.completions.create() as normal
  2. The wrapper intercepts the request and compresses all non-assistant messages
  3. Compressed messages are sent to OpenAI
  4. You receive the response as usual - no changes needed
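
The steps above can be sketched as a small proxy object. This is an illustration under stated assumptions, not the wrapper's actual code: CompressingCompletions, RecordingBackend, and squeeze_whitespace are hypothetical names, the compressor is a toy that only collapses whitespace, and the backend is a fake that echoes what it receives instead of calling OpenAI.

```python
from typing import Any, Callable

Message = dict[str, Any]


def squeeze_whitespace(text: str) -> str:
    """Toy stand-in for a real compressor: collapses runs of whitespace."""
    return " ".join(text.split())


def compress_messages(
    messages: list[Message], compress: Callable[[str], str]
) -> list[Message]:
    """Return a copy of the messages with non-assistant string content compressed."""
    result = []
    for message in messages:
        content = message.get("content")
        if message.get("role") != "assistant" and isinstance(content, str):
            result.append({**message, "content": compress(content)})
        else:
            # Assistant messages (and non-string content) pass through unchanged.
            result.append(message)
    return result


class CompressingCompletions:
    """Hypothetical proxy around any object exposing create(messages=..., **kwargs)."""

    def __init__(self, inner: Any, compress: Callable[[str], str]) -> None:
        self._inner = inner
        self._compress = compress

    def create(self, *, messages: list[Message], **kwargs: Any) -> Any:
        # Steps 2 and 3: compress the request, then forward it to the wrapped endpoint.
        compressed = compress_messages(messages, self._compress)
        return self._inner.create(messages=compressed, **kwargs)


class RecordingBackend:
    """Fake endpoint used only for this demo; it echoes what it was sent."""

    def create(self, *, messages: list[Message], **kwargs: Any) -> dict[str, Any]:
        return {"sent": messages, "kwargs": kwargs}


completions = CompressingCompletions(RecordingBackend(), squeeze_whitespace)
reply = completions.create(
    model="demo-model",
    messages=[
        {"role": "user", "content": "Please   summarise\n\nthis  text."},
        {"role": "assistant", "content": "Sure,   here  it is."},
    ],
)
print(reply["sent"][0]["content"])
```

The caller still invokes create() with the same arguments as before; only the message content that reaches the backend differs, which is why no changes to calling code are needed.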