The TTC Router is currently in private preview. Request access to get started.

TTC Router

The Token Company's AI router — OpenAI-compatible endpoint with built-in compression, powered by Stripe AI Gateway and DeepInfra.

The TTC Router is a drop-in replacement for the OpenAI /chat/completions endpoint. Every message is automatically compressed before being forwarded to your chosen LLM provider — no extra code needed.

Usage

Point the OpenAI SDK at our base URL and prefix the model name with its provider, for example openai/gpt-5.4. The accepted identifiers are listed under Supported models.

from openai import OpenAI

# Use the TTC Router base URL and a TTC API key instead of the OpenAI defaults.
client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

# The model name carries the provider prefix.
response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your long prompt goes here..."},
    ],
)

print(response.choices[0].message.content)

Supported models

Prices are per 1M tokens, with no markup. Each input and cached-input price is shown twice: the list price, then an estimated effective price that assumes a typical compression saving of 20%.

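| Model | Input (list) | Input (effective) | Cached input (list) | Cached input (effective) | Output |
|---|---|---|---|---|---|
| anthropic/claude-opus-4-8 | $5.00 | $4.00 | $0.50 | $0.40 | $25.00 |
| anthropic/claude-opus-4-7 | $5.00 | $4.00 | $0.50 | $0.40 | $25.00 |
| anthropic/claude-opus-4-6 | $5.00 | $4.00 | $0.50 | $0.40 | $25.00 |
| anthropic/claude-sonnet-4-6 | $3.00 | $2.40 | $0.30 | $0.24 | $15.00 |
| anthropic/claude-opus-4-5 | $5.00 | $4.00 | $0.50 | $0.40 | $25.00 |
| anthropic/claude-sonnet-4-5 | $3.00 | $2.40 | $0.30 | $0.24 | $15.00 |
| anthropic/claude-haiku-4-5 | $1.00 | $0.80 | $0.10 | $0.08 | $5.00 |

| Model | Input (list) | Input (effective) | Cached input (list) | Cached input (effective) | Output |
|---|---|---|---|---|---|
| openai/gpt-5.4 | $2.50 | $2.00 | $0.25 | $0.20 | $15.00 |
| openai/gpt-5.2 | $1.75 | $1.40 | $0.18 | $0.14 | $14.00 |
| openai/gpt-5.1 | $1.25 | $1.00 | $0.13 | $0.10 | $10.00 |
| openai/gpt-5.1-codex | $1.25 | $1.00 | $0.13 | $0.10 | $10.00 |
| openai/gpt-5.1-codex-max | $1.25 | $1.00 | $0.13 | $0.10 | $10.00 |
| openai/gpt-5.1-codex-mini | $0.25 | $0.20 | $0.03 | $0.02 | $2.00 |
| openai/gpt-5 | $1.25 | $1.00 | $0.13 | $0.10 | $10.00 |
| openai/gpt-5-codex | $1.25 | $1.00 | $0.13 | $0.10 | $10.00 |
| openai/gpt-5-mini | $0.25 | $0.20 | $0.03 | $0.02 | $2.00 |
| openai/gpt-5-nano | $0.05 | $0.04 | $0.01 | $0.008 | $0.40 |

| Model | Input (list) | Input (effective) | Cached input (list) | Cached input (effective) | Output |
|---|---|---|---|---|---|
| gemini/gemini-3.1-pro-preview | $2.00 | $1.60 | $0.20 | $0.16 | $12.00 |
| gemini/gemini-3.1-flash-lite-preview | $0.25 | $0.20 | $0.03 | $0.02 | $1.50 |
| gemini/gemini-3-pro-preview | $2.00 | $1.60 | $0.20 | $0.16 | $12.00 |
| gemini/gemini-3-flash | $0.50 | $0.40 | $0.05 | $0.04 | $3.00 |
| gemini/gemini-2.5-pro | $1.25 | $1.00 | $0.31 | $0.25 | $10.00 |
| gemini/gemini-2.5-flash | $0.30 | $0.24 | $0.08 | $0.06 | $2.50 |
| gemini/gemini-2.5-flash-lite | $0.10 | $0.08 | $0.03 | $0.02 | $0.40 |
| gemini/gemini-2.5-flash-image | $0.30 | $0.24 | $0.08 | $0.06 | $30.00 |

| Model | Input (list) | Input (effective) | Cached input (list) | Cached input (effective) | Output |
|---|---|---|---|---|---|
| deepseek-ai/DeepSeek-V4-Pro | $1.30 | $1.04 | $0.10 | $0.08 | $2.60 |
| deepseek-ai/DeepSeek-V4-Flash | $0.10 | $0.08 | $0.02 | $0.02 | $0.20 |

| Model | Input (list) | Input (effective) | Cached input (list) | Cached input (effective) | Output |
|---|---|---|---|---|---|
| Qwen/Qwen3.7-Max | $2.50 | $2.00 | $0.50 | $0.40 | $7.50 |
| Qwen/Qwen3-Max | $1.20 | $0.96 | $0.24 | $0.19 | $6.00 |
| Qwen/Qwen3-Max-Thinking | $1.20 | $0.96 | $0.24 | $0.19 | $6.00 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo | $0.30 | $0.24 | $0.10 | $0.08 | $1.00 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | $0.23 | $0.18 | $0.20 | $0.16 | $2.30 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | $0.09 | $0.07 | - | - | $0.10 |
| Qwen/Qwen3-Next-80B-A3B-Instruct | $0.09 | $0.07 | - | - | $1.10 |
| Qwen/Qwen3-VL-235B-A22B-Instruct | $0.20 | $0.16 | $0.11 | $0.09 | $0.88 |
| Qwen/Qwen3-VL-30B-A3B-Instruct | $0.15 | $0.12 | - | - | $0.60 |
| Qwen/Qwen3-32B | $0.08 | $0.06 | - | - | $0.28 |
| Qwen/Qwen3-30B-A3B | $0.12 | $0.10 | - | - | $0.50 |
| Qwen/Qwen3-14B | $0.12 | $0.10 | - | - | $0.24 |
| Qwen/Qwen2.5-72B-Instruct | $0.36 | $0.29 | - | - | $0.40 |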
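
The effective prices listed above are consistent with the list price reduced by a flat 20% saving (for example, $5.00 × (1 − 0.20) = $4.00). The sketch below reproduces that arithmetic so you can plug in your own numbers. The function names and the default saving are illustrative assumptions, not part of the TTC API, and real savings depend on the prompt.

```python
def effective_input_price(list_price_per_million: float, savings: float = 0.20) -> float:
    """Estimated input price per 1M tokens after compression.

    `savings` is the assumed fraction of input tokens removed (hypothetical default).
    """
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in the range [0, 1)")
    return list_price_per_million * (1.0 - savings)


def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    savings: float = 0.20,
) -> float:
    """Rough cost in dollars; only input tokens benefit from compression."""
    input_cost = input_tokens / 1_000_000 * effective_input_price(
        input_price_per_million, savings
    )
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost
```

For instance, `estimate_request_cost(1_000_000, 100_000, 2.50, 15.00)` uses the openai/gpt-5.4 list prices with the assumed 20% saving.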

OpenAI, Anthropic, and Google models are routed via Stripe AI Gateway. DeepSeek and Qwen models are routed via DeepInfra.
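
The routing rule above amounts to a lookup on the provider prefix of the model name. The following is an illustrative sketch only: `UPSTREAM_BY_PREFIX` and `upstream_for` are hypothetical names, and the router applies this rule server-side, so client code does not need it.

```python
# Provider prefixes as they appear in the Supported models tables.
UPSTREAM_BY_PREFIX = {
    "openai": "Stripe AI Gateway",
    "anthropic": "Stripe AI Gateway",
    "gemini": "Stripe AI Gateway",
    "deepseek-ai": "DeepInfra",
    "Qwen": "DeepInfra",
}


def upstream_for(model: str) -> str:
    """Return the upstream gateway for a 'prefix/model-name' identifier."""
    prefix, sep, name = model.partition("/")
    if not sep or not name:
        raise ValueError(f"expected 'provider/model', got {model!r}")
    if prefix not in UPSTREAM_BY_PREFIX:
        raise ValueError(f"unknown provider prefix: {prefix!r}")
    return UPSTREAM_BY_PREFIX[prefix]
```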

Streaming

Server-sent events (SSE) streaming is fully supported. Pass stream=True and iterate over the chunks, the same way you would with the OpenAI API.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

stream = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Your long prompt..."}],
    stream=True,
)

# Print each text delta as it arrives; skip chunks that carry no choices or no text.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
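
When the complete text is needed after streaming finishes, the deltas can be accumulated as they arrive. A small sketch: `collect_stream` is a hypothetical helper, and it assumes each chunk has the same shape used in the loop above (`chunk.choices[0].delta.content`).

```python
from typing import Iterable


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Consume a chat-completions stream and return the concatenated text."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices (for example, a final usage chunk).
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            if echo:
                print(content, end="", flush=True)
            parts.append(content)
    return "".join(parts)
```

Calling `text = collect_stream(stream)` in place of the `for` loop prints the output incrementally and also returns it as one string.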
Info: The usage object in the response includes a compressed_input_tokens field, which reports how many input tokens were sent to the provider after compression.
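
One way to use this field is to compute the fraction of input tokens saved on a request. The sketch below makes an assumption that this page does not confirm: that `prompt_tokens` in the same usage object holds the pre-compression count. `compression_savings` is a hypothetical helper name.

```python
from typing import Optional


def compression_savings(usage: dict) -> Optional[float]:
    """Fraction of input tokens removed by compression, or None if unknown.

    ASSUMPTION: usage["prompt_tokens"] is the token count before compression.
    """
    original = usage.get("prompt_tokens")
    compressed = usage.get("compressed_input_tokens")
    if not original or compressed is None:
        return None
    return 1.0 - compressed / original
```

With the OpenAI Python SDK, a plain dict can be obtained with `response.usage.model_dump()`, which should carry extra fields such as compressed_input_tokens along with the standard ones.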

How it works

  1. You send a standard chat completions request
  2. All messages are compressed automatically
  3. The compressed request is forwarded to the LLM provider
  4. The response is returned with token usage stats
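
The four steps above can be sketched as a small pipeline. This is a conceptual illustration, not TTC's implementation: `toy_compress` only collapses whitespace as a stand-in for the real compressor, `call_provider` is a hypothetical callable representing the upstream LLM, and the character-count statistics are placeholders for real token counts.

```python
import re
from typing import Callable


def toy_compress(text: str) -> str:
    """Stand-in compressor: collapse whitespace runs. NOT the TTC algorithm."""
    return re.sub(r"\s+", " ", text).strip()


def handle_request(request: dict, call_provider: Callable[[dict], dict]) -> dict:
    # Step 1: `request` is a standard chat completions payload.
    messages = request["messages"]

    # Step 2: compress every message (plain string content only in this sketch).
    compressed = [
        {**m, "content": toy_compress(m["content"])}
        if isinstance(m.get("content"), str)
        else m
        for m in messages
    ]

    # Step 3: forward the compressed request to the provider.
    response = call_provider({**request, "messages": compressed})

    # Step 4: return the response with usage stats attached
    # (hypothetical field names; characters stand in for tokens).
    def total_chars(msgs):
        return sum(len(m["content"]) for m in msgs if isinstance(m.get("content"), str))

    usage = response.setdefault("usage", {})
    usage["original_input_chars"] = total_chars(messages)
    usage["compressed_input_chars"] = total_chars(compressed)
    return response
```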

Features

  • OpenAI SDK compatible — works with the Python and Node SDKs
  • All messages compressed — system, user, and assistant messages are all optimized
  • Streaming — full SSE streaming support
  • Multimodal — text parts are compressed; images pass through unchanged
  • Tools & function calling — fully supported
  • Automatic caching — repeated messages are served from cache
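
To illustrate the multimodal behaviour listed above, the sketch below applies a compressor to the text parts of a message and leaves every other part untouched. It assumes the OpenAI content-part format (`{"type": "text", ...}` and `{"type": "image_url", ...}`); `compress_content` is a hypothetical name, and the `compress` argument stands in for the router's real compressor.

```python
from typing import Callable, Union


def compress_content(
    content: Union[str, list], compress: Callable[[str], str]
) -> Union[str, list]:
    """Compress text while passing non-text parts (such as images) through."""
    if isinstance(content, str):
        return compress(content)
    result = []
    for part in content:
        if part.get("type") == "text":
            # Copy the part so the caller's message is not mutated.
            result.append({**part, "text": compress(part["text"])})
        else:
            # Images and any other part types are forwarded unchanged.
            result.append(part)
    return result
```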