The TTC Router is currently in private preview. Request access to get started.
TTC Router
The Token Company's AI router — OpenAI-compatible endpoint with built-in compression, powered by Stripe AI Gateway and DeepInfra.
The TTC Router is a drop-in replacement for the OpenAI /chat/completions endpoint. Every message is automatically compressed before being forwarded to your chosen LLM provider — no extra code needed.
Usage
Point the OpenAI SDK at our base URL and prefix your model with the provider name.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your long prompt goes here..."},
    ],
)

print(response.choices[0].message.content)
Supported models
Prices per 1M tokens — no markup. Effective input prices are estimated based on typical compression savings.
OpenAI, Anthropic, and Google models are routed via Stripe AI Gateway. DeepSeek and Qwen models are routed via DeepInfra.
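The effective input price mentioned above can be estimated from a model's list price and a compression savings rate. The sketch below shows only the arithmetic. The function name, the $2.00 list price, and the 25% savings rate are illustrative assumptions, not published figures.

```python
def effective_input_price(list_price_per_1m: float, savings_rate: float) -> float:
    """Estimate the effective input price per 1M original (uncompressed) tokens.

    savings_rate is the fraction of input tokens removed by compression,
    so 0.25 means 25% fewer input tokens are billed. Both arguments are
    illustrative assumptions; real savings depend on the prompt.
    """
    if not 0.0 <= savings_rate < 1.0:
        raise ValueError("savings_rate must be in [0, 1)")
    return list_price_per_1m * (1.0 - savings_rate)


# Hypothetical numbers: $2.00 per 1M input tokens, 25% of tokens removed.
print(effective_input_price(2.00, 0.25))  # 1.5
```

Because there is no markup, the saving in this model comes entirely from sending fewer input tokens to the provider; output tokens are billed at the list price.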
Streaming
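Server-sent events (SSE) streaming is fully supported and works the same way as streaming with the OpenAI SDK: set stream=True and iterate over the returned chunks.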
from openai import OpenAI

client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

stream = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Your long prompt..."}],
    stream=True,
)

# Print each content delta as it arrives; skip chunks without choices or content.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
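When the complete reply is needed after streaming, the deltas can be accumulated while they are printed. This is a minimal sketch, assuming chunks follow the OpenAI chat completions streaming shape. collect_stream is a hypothetical helper, not part of the OpenAI SDK or the TTC Router.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Consume streamed chunks and return the concatenated text.

    Hypothetical helper: each chunk is assumed to expose
    choices[0].delta.content, as in the OpenAI streaming format.
    """
    parts = []
    for chunk in chunks:
        # Some chunks (for example a usage-only chunk) may carry no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    return "".join(parts)
```

With the stream object from the example above, `full_text = collect_stream(stream)` would print the reply incrementally and keep the full text for later use.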
Info
The response includes a compressed_input_tokens field in the usage object so you can see exactly how many tokens were sent after compression.
How it works
- You send a standard chat completions request
- All messages are compressed automatically
- The compressed request is forwarded to the LLM provider
- The response is returned with token usage stats
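The token usage stats returned in the last step can be turned into a savings figure. The sketch below works on a plain dict and uses made-up numbers. It assumes that prompt_tokens reports the input count before compression, which this page does not confirm; compressed_input_tokens is the field described in the Info note, and compression_summary is a hypothetical helper.

```python
def compression_summary(usage: dict) -> dict:
    """Summarize compression savings from a usage payload.

    Assumption: usage["prompt_tokens"] is the pre-compression input count.
    usage["compressed_input_tokens"] is the count sent after compression.
    """
    original = usage["prompt_tokens"]
    compressed = usage["compressed_input_tokens"]
    saved = original - compressed
    # Avoid dividing by zero for an empty prompt.
    savings_rate = saved / original if original else 0.0
    return {
        "original": original,
        "compressed": compressed,
        "saved": saved,
        "savings_rate": savings_rate,
    }


# Made-up numbers for illustration only.
example_usage = {
    "prompt_tokens": 1000,
    "compressed_input_tokens": 750,
    "completion_tokens": 120,
}
print(compression_summary(example_usage))
# {'original': 1000, 'compressed': 750, 'saved': 250, 'savings_rate': 0.25}
```

With the OpenAI Python SDK, `response.usage.model_dump()` should produce a dict suitable for this helper, assuming the SDK version in use preserves the extra field.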
Features
- OpenAI SDK compatible — works with the Python and Node SDKs
- All messages compressed — system, user, and assistant messages are all optimized
- Streaming — full SSE streaming support
- Multimodal — text parts are compressed; images pass through unchanged
- Tools & function calling — fully supported
- Automatic caching — repeated messages are served from cache
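The multimodal and tool-calling features listed above use the standard OpenAI request shape. The sketch below builds such a request as a plain dict. build_multimodal_request and the get_weather tool are hypothetical, and it is an assumption that the chosen model accepts image input.

```python
def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Build chat completions arguments with a text part, an image part, and one tool.

    Hypothetical helper: per the feature list, the text part would be
    compressed by the router and the image part passed through unchanged.
    """
    return {
        "model": "openai/gpt-5.4",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }
```

With the client from the Usage section, the request could be sent as `client.chat.completions.create(**build_multimodal_request("Describe this image.", "https://example.com/photo.png"))`, where the image URL is a placeholder.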