The TTC Router is currently in private preview. Request access to get started.

TTC Router

The Token Company's AI router — OpenAI-compatible endpoint with built-in compression, powered by Stripe AI Gateway and DeepInfra.

The TTC Router is a drop-in replacement for the OpenAI /chat/completions endpoint. Every message is automatically compressed before being forwarded to your chosen LLM provider — no extra code needed.

Usage

Point the OpenAI SDK at our base URL and prefix your model with the provider name.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

response = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your long prompt goes here..."},
    ],
)

print(response.choices[0].message.content)

Supported models

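Prices are in USD per 1M tokens, with no markup. Effective Input and Effective Cached Input prices are estimates that assume a typical 20% compression saving on input tokens. Output tokens are not compressed, so output prices are unchanged.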

Anthropic
Model	Input	Effective Input	Cached Input	Effective Cached Input	Output
`anthropic/claude-opus-4-8`	$5.00	$4.00	$0.50	$0.40	$25.00
`anthropic/claude-opus-4-7`	$5.00	$4.00	$0.50	$0.40	$25.00
`anthropic/claude-opus-4-6`	$5.00	$4.00	$0.50	$0.40	$25.00
`anthropic/claude-sonnet-4-6`	$3.00	$2.40	$0.30	$0.24	$15.00
`anthropic/claude-opus-4-5`	$5.00	$4.00	$0.50	$0.40	$25.00
`anthropic/claude-sonnet-4-5`	$3.00	$2.40	$0.30	$0.24	$15.00
`anthropic/claude-haiku-4-5`	$1.00	$0.80	$0.10	$0.08	$5.00

OpenAI
Model	Input	Effective Input	Cached Input	Effective Cached Input	Output
`openai/gpt-5.4`	$2.50	$2.00	$0.25	$0.20	$15.00
`openai/gpt-5.2`	$1.75	$1.40	$0.18	$0.14	$14.00
`openai/gpt-5.1`	$1.25	$1.00	$0.13	$0.10	$10.00
`openai/gpt-5.1-codex`	$1.25	$1.00	$0.13	$0.10	$10.00
`openai/gpt-5.1-codex-max`	$1.25	$1.00	$0.13	$0.10	$10.00
`openai/gpt-5.1-codex-mini`	$0.25	$0.20	$0.03	$0.02	$2.00
`openai/gpt-5`	$1.25	$1.00	$0.13	$0.10	$10.00
`openai/gpt-5-codex`	$1.25	$1.00	$0.13	$0.10	$10.00
`openai/gpt-5-mini`	$0.25	$0.20	$0.03	$0.02	$2.00
`openai/gpt-5-nano`	$0.05	$0.04	$0.01	$0.008	$0.40

Google
Model	Input	Effective Input	Cached Input	Effective Cached Input	Output
`gemini/gemini-3.1-pro-preview`	$2.00	$1.60	$0.20	$0.16	$12.00
`gemini/gemini-3.1-flash-lite-preview`	$0.25	$0.20	$0.03	$0.02	$1.50
`gemini/gemini-3-pro-preview`	$2.00	$1.60	$0.20	$0.16	$12.00
`gemini/gemini-3-flash`	$0.50	$0.40	$0.05	$0.04	$3.00
`gemini/gemini-2.5-pro`	$1.25	$1.00	$0.31	$0.25	$10.00
`gemini/gemini-2.5-flash`	$0.30	$0.24	$0.08	$0.06	$2.50
`gemini/gemini-2.5-flash-lite`	$0.10	$0.08	$0.03	$0.02	$0.40
`gemini/gemini-2.5-flash-image`	$0.30	$0.24	$0.08	$0.06	$30.00

DeepSeek
Model	Input	Effective Input	Cached Input	Effective Cached Input	Output
`deepseek-ai/DeepSeek-V4-Pro`	$1.30	$1.04	$0.10	$0.08	$2.60
`deepseek-ai/DeepSeek-V4-Flash`	$0.10	$0.08	$0.02	$0.02	$0.20

Qwen
Model	Input	Effective Input	Cached Input	Effective Cached Input	Output
`Qwen/Qwen3.7-Max`	$2.50	$2.00	$0.50	$0.40	$7.50
`Qwen/Qwen3-Max`	$1.20	$0.96	$0.24	$0.19	$6.00
`Qwen/Qwen3-Max-Thinking`	$1.20	$0.96	$0.24	$0.19	$6.00
`Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo`	$0.30	$0.24	$0.10	$0.08	$1.00
`Qwen/Qwen3-235B-A22B-Thinking-2507`	$0.23	$0.18	$0.20	$0.16	$2.30
`Qwen/Qwen3-235B-A22B-Instruct-2507`	$0.09	$0.07	—	—	$0.10
`Qwen/Qwen3-Next-80B-A3B-Instruct`	$0.09	$0.07	—	—	$1.10
`Qwen/Qwen3-VL-235B-A22B-Instruct`	$0.20	$0.16	$0.11	$0.09	$0.88
`Qwen/Qwen3-VL-30B-A3B-Instruct`	$0.15	$0.12	—	—	$0.60
`Qwen/Qwen3-32B`	$0.08	$0.06	—	—	$0.28
`Qwen/Qwen3-30B-A3B`	$0.12	$0.10	—	—	$0.50
`Qwen/Qwen3-14B`	$0.12	$0.10	—	—	$0.24
`Qwen/Qwen2.5-72B-Instruct`	$0.36	$0.29	—	—	$0.40
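To see how these prices translate into a per-request bill, here is a small estimator. It is a sketch under stated assumptions: the 20% default mirrors the estimate behind the effective columns, real savings depend on your prompts, compression is assumed to shrink uncached and cached input tokens by the same fraction, and `Prices` and `estimate_cost` are hypothetical helpers, not part of any TTC SDK. As a sanity check, 1M input tokens on a $5.00 model come out at $4.00, matching the Effective Input column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prices:
    """List prices in USD per 1M tokens, copied from the pricing tables."""
    input: float
    cached_input: Optional[float]  # None for models with no cached-input price
    output: float


def estimate_cost(
    prices: Prices,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
    compression: float = 0.20,  # assumed typical saving, not a guarantee
) -> float:
    """Estimate the USD cost of one request sent through the router.

    Assumes compression removes the same fraction of uncached and cached
    input tokens, and that output tokens are billed unchanged.
    """
    if not 0.0 <= compression < 1.0:
        raise ValueError("compression must be in [0, 1)")
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if cached_tokens and prices.cached_input is None:
        raise ValueError("this model has no cached-input price")

    keep = 1.0 - compression  # fraction of input tokens left after compression
    cost = (input_tokens - cached_tokens) * keep * prices.input
    if cached_tokens:
        cost += cached_tokens * keep * prices.cached_input
    cost += output_tokens * prices.output
    return cost / 1_000_000


if __name__ == "__main__":
    opus = Prices(input=5.00, cached_input=0.50, output=25.00)
    # 100k input tokens (80k after compression) plus 2k output tokens.
    print(round(estimate_cost(opus, 100_000, 2_000), 4))  # 0.45
```

Passing `compression=0.0` gives the cost at list prices, which is a useful upper bound when you have not yet measured savings on your own traffic.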

OpenAI, Anthropic, and Google models are routed via Stripe AI Gateway. DeepSeek and Qwen models are routed via DeepInfra.
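The model string therefore doubles as the routing key: the part before the `/` names the provider, and that prefix decides which backend serves the request. The sketch below mirrors the rule in the note above and is an illustration, not the router's actual code; `ROUTES` and `route` are hypothetical names, and the prefixes are matched exactly as they are written in the tables (including the capitalized `Qwen`), which may be stricter or looser than the real router.

```python
from typing import Dict, Tuple

STRIPE_AI_GATEWAY = "Stripe AI Gateway"
DEEPINFRA = "DeepInfra"

# Provider prefix -> backend, following the routing note.
ROUTES: Dict[str, str] = {
    "openai": STRIPE_AI_GATEWAY,
    "anthropic": STRIPE_AI_GATEWAY,
    "gemini": STRIPE_AI_GATEWAY,
    "deepseek-ai": DEEPINFRA,
    "Qwen": DEEPINFRA,
}


def route(model: str) -> Tuple[str, str, str]:
    """Split '<provider>/<model>' and look up the backend for the provider."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected '<provider>/<model>', got {model!r}")
    try:
        backend = ROUTES[provider]
    except KeyError:
        raise ValueError(f"unknown provider prefix {provider!r}") from None
    return provider, name, backend


if __name__ == "__main__":
    print(route("anthropic/claude-haiku-4-5"))
    # ('anthropic', 'claude-haiku-4-5', 'Stripe AI Gateway')
```

A bare model name such as `gpt-5.4` is rejected here, which is why the usage example passes `openai/gpt-5.4` rather than the provider's own model ID.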

Streaming

Full SSE streaming support — works exactly like the OpenAI SDK.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.thetokencompany.com/v1",
    api_key="ttc-...",
)

stream = client.chat.completions.create(
    model="openai/gpt-5.4",
    messages=[{"role": "user", "content": "Your long prompt..."}],
    stream=True,
)

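for chunk in stream:
    # Skip chunks that carry no choices (e.g. a usage-only final chunk)
    # and chunks whose delta has no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()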
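If you need the full reply as one string instead of printing tokens as they arrive, you can accumulate the deltas. `collect_stream` is a hypothetical helper that relies only on the `choices[0].delta.content` shape used in the loop above. It skips chunks with an empty `choices` list, which the OpenAI API sends as the final chunk when usage reporting is enabled through `stream_options`. The `fake_chunk` stand-ins exist only so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a usage-only final chunk
            continue
        text = chunk.choices[0].delta.content
        if text:  # skip deltas with no text (None or "")
            parts.append(text)
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical offline stand-in for an SDK chunk object."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    # With the real client this would be: text = collect_stream(stream)
    chunks = [fake_chunk(None), fake_chunk("Hello"), fake_chunk(", world"),
              SimpleNamespace(choices=[])]
    print(collect_stream(chunks))  # Hello, world
```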

Note: the usage object in the response includes a compressed_input_tokens field, so you can see exactly how many input tokens were sent to the provider after compression.
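To turn that field into a savings figure, compare it with the token count of your original prompt. This page does not say which usage field, if any, reports the pre-compression count, so the sketch below takes that number as an argument (for example, from your own tokenizer). `compression_savings` is a hypothetical helper. It reads the field with `getattr`, which should work on the OpenAI Python SDK's response objects because the SDK exposes response fields it does not define as plain attributes.

```python
from types import SimpleNamespace
from typing import Any, Dict


def compression_savings(usage: Any, original_input_tokens: int) -> Dict[str, float]:
    """Compare the router's compressed_input_tokens with your own uncompressed count."""
    compressed = getattr(usage, "compressed_input_tokens", None)
    if compressed is None:
        raise ValueError("usage object has no compressed_input_tokens field")
    if original_input_tokens <= 0:
        raise ValueError("original_input_tokens must be positive")
    saved = original_input_tokens - compressed
    return {
        "compressed": compressed,
        "saved": saved,
        "savings_ratio": saved / original_input_tokens,
    }


if __name__ == "__main__":
    # With the real client: compression_savings(response.usage, my_token_count)
    usage = SimpleNamespace(compressed_input_tokens=800)  # offline stand-in
    print(compression_savings(usage, original_input_tokens=1_000))
    # {'compressed': 800, 'saved': 200, 'savings_ratio': 0.2}
```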

How it works

1. You send a standard chat completions request.
2. All messages are compressed automatically.
3. The compressed request is forwarded to the LLM provider.
4. The response is returned with token usage stats.
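The four steps can be sketched as a single function. This is a hypothetical illustration, not the router's code: the real compressor and tokenizer are not described on this page, so `compress`, `forward`, and `count_tokens` are injected callables, and the demo plugs in whitespace collapsing and a word count purely as stand-ins. For brevity, the sketch compresses only plain-string message content.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]


def handle_chat_completion(
    request: Request,
    compress: Callable[[str], str],
    forward: Callable[[Request], Request],
    count_tokens: Callable[[str], int],
) -> Request:
    # 1. A standard chat completions request arrives.
    messages = request["messages"]

    # 2. Compress every message that has string content, whatever its role.
    compressed = [
        {**m, "content": compress(m["content"])} if isinstance(m.get("content"), str) else m
        for m in messages
    ]

    # 3. Forward the compressed request; model and other parameters are kept as sent.
    response = forward({**request, "messages": compressed})

    # 4. Return the provider's response with the post-compression token count added.
    sent = sum(count_tokens(m["content"]) for m in compressed if isinstance(m.get("content"), str))
    usage = {**response.get("usage", {}), "compressed_input_tokens": sent}
    return {**response, "usage": usage}


if __name__ == "__main__":
    def echo_provider(req: Request) -> Request:
        # Stand-in provider: echoes the last message it received.
        last = req["messages"][-1]["content"]
        return {"choices": [{"message": {"role": "assistant", "content": last}}],
                "usage": {"completion_tokens": 2}}

    result = handle_chat_completion(
        {"model": "openai/gpt-5.4", "messages": [{"role": "user", "content": "Your   long\n\nprompt"}]},
        compress=lambda s: " ".join(s.split()),  # stand-in compressor
        forward=echo_provider,
        count_tokens=lambda s: len(s.split()),   # stand-in tokenizer
    )
    print(result["choices"][0]["message"]["content"])  # Your long prompt
    print(result["usage"])  # {'completion_tokens': 2, 'compressed_input_tokens': 3}
```

Injecting the provider call keeps the sketch testable offline and makes it clear that compression happens before the request leaves the router.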

Features

OpenAI SDK compatible — works with the Python and Node SDKs
All messages compressed — system, user, and assistant messages are all optimized
Streaming — full SSE streaming support
Multimodal — text parts are compressed; images pass through unchanged
Tools & function calling — fully supported
Automatic caching — repeated messages are served from cache
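Multimodal messages use OpenAI's content-part format, so "text parts are compressed; images pass through unchanged" comes down to rewriting only the `text` parts of a message. The sketch below shows that per-message step, assuming the router follows the standard part shapes (`{"type": "text", ...}` and `{"type": "image_url", ...}`). `compress_message` is hypothetical and `compress` stands in for the real compressor. Messages whose `content` is `None`, such as an assistant turn that only carries `tool_calls`, are returned untouched, and the input message is never mutated.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


def compress_message(message: Message, compress: Callable[[str], str]) -> Message:
    """Compress the text of one chat message, leaving non-text parts untouched."""
    content = message.get("content")
    if isinstance(content, str):
        # Plain string content (system, user, or assistant).
        return {**message, "content": compress(content)}
    if isinstance(content, list):
        # Content parts: rewrite text parts, pass image and other parts through.
        parts = [
            {**part, "text": compress(part["text"])} if part.get("type") == "text" else part
            for part in content
        ]
        return {**message, "content": parts}
    # No content (e.g. an assistant message that only has tool_calls).
    return message


if __name__ == "__main__":
    msg = {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe   this   image"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }
    print(compress_message(msg, lambda s: " ".join(s.split()))["content"][0]["text"])
    # Describe this image
```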