OpenAI
Automatic compression for OpenAI API calls.
The with_compression() wrapper compresses all non-assistant messages before they are sent to OpenAI. Your existing code stays the same: just wrap your client.
Setup
from openai import OpenAI
from thetokencompany.openai import with_compression

client = with_compression(
    OpenAI(),
    compression_api_key="ttc-...",
)

# Use OpenAI exactly as before - compression happens automatically
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your long prompt text here..."},
    ],
)
print(response.choices[0].message.content)
Info
Assistant messages pass through unchanged so the LLM cache is fully preserved. Only system, user, and tool messages are compressed.
Per-role aggressiveness
Set different compression levels per message role. Roles not in the dictionary are not compressed.
client = with_compression(
    OpenAI(),
    compression_api_key="ttc-...",
    aggressiveness={
        "system": 0.1,  # light - preserve instructions
        "user": 0.4,    # moderate - compress user messages
        "tool": 0.6,    # aggressive - compress tool results
    },
)
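To make the "roles not in the dictionary are not compressed" rule concrete, here is a minimal sketch of how a per-role lookup could behave. The helper name `level_for` and the `plan` list are hypothetical and only illustrate the documented behavior; they are not part of the library, and the real wrapper may resolve levels differently.

```python
from typing import Optional

# Per-role levels, mirroring the aggressiveness dictionary in the example above.
AGGRESSIVENESS = {"system": 0.1, "user": 0.4, "tool": 0.6}


def level_for(role: str, levels: dict[str, float]) -> Optional[float]:
    """Return the compression level for a role, or None to skip compression.

    Hypothetical helper: assumes assistant messages are never compressed
    and that any role missing from the dictionary is left untouched.
    """
    if role == "assistant":
        return None
    return levels.get(role)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your long prompt text here..."},
    {"role": "assistant", "content": "An earlier reply."},
    {"role": "tool", "content": "A long tool result..."},
]

# Pair each message's role with the level that would apply to it.
plan = [(m["role"], level_for(m["role"], AGGRESSIVENESS)) for m in messages]
```

Under these assumptions, a role that is absent from the dictionary resolves to `None`, which a caller would treat as "send unchanged".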
How it works
- You call client.chat.completions.create() as normal
- The wrapper intercepts the request and compresses all non-assistant messages
- Compressed messages are sent to OpenAI
- You receive the response as usual - no changes needed
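The steps above can be sketched as a thin wrapper around a `create` callable. This is an illustration under stated assumptions, not the library's implementation: `toy_compress`, `compress_messages`, `CompressingCompletions`, and `fake_create` are hypothetical names, the "compression" here only collapses whitespace, and a fake client stands in for OpenAI so the sketch runs without network access.

```python
from typing import Any, Callable

Message = dict[str, Any]


def toy_compress(text: str) -> str:
    """Stand-in compressor that collapses whitespace (not the real algorithm)."""
    return " ".join(text.split())


def compress_messages(
    messages: list[Message],
    compress: Callable[[str], str] = toy_compress,
) -> list[Message]:
    """Compress string content of every non-assistant message."""
    compressed = []
    for message in messages:
        content = message.get("content")
        if message.get("role") != "assistant" and isinstance(content, str):
            # Copy the message so the caller's list is not mutated.
            compressed.append({**message, "content": compress(content)})
        else:
            # Assistant messages (and non-string content) pass through unchanged.
            compressed.append(message)
    return compressed


class CompressingCompletions:
    """Hypothetical wrapper that intercepts create() and compresses messages."""

    def __init__(self, create: Callable[..., Any]) -> None:
        self._create = create

    def create(self, *, messages: list[Message], **kwargs: Any) -> Any:
        # Forward every other argument untouched; only messages are rewritten.
        return self._create(messages=compress_messages(messages), **kwargs)


# Fake client that records what it would have sent upstream.
sent: list[dict[str, Any]] = []


def fake_create(**kwargs: Any) -> str:
    sent.append(kwargs)
    return "response"


completions = CompressingCompletions(fake_create)
result = completions.create(
    model="example-model",
    messages=[
        {"role": "user", "content": "Your   long   prompt"},
        {"role": "assistant", "content": "Earlier   reply"},
    ],
)
```

Because the wrapper keeps the same `create(...)` call shape and returns the upstream response as-is, calling code would not need to change, which matches the last step in the list.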