How It Works

Add the Content-Encoding: gzip header and send the JSON body gzip-compressed. The API automatically decompresses the request and gzip-compresses the response. This is fully backwards compatible: requests without the header work exactly as before.
We recommend gzip for every request. Client-side compression takes under 1 ms for most payloads and is never slower end-to-end. In our tests, a 100K-token request completed in under 100 ms when gzip shrank the payload by over 90%.
If you're using the Python SDK or the npm package, gzip is enabled by default, so no extra setup is needed.
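To check how much gzip would save on your own payloads, a minimal sketch along these lines measures the size reduction and the client-side compression time. The helper name gzip_stats and the sample text are hypothetical and not part of the API. The sample is highly repetitive, so it compresses far better than typical prose; substitute representative content before drawing conclusions.

```python
import gzip
import json
import time


def gzip_stats(payload: dict) -> dict:
    """Report raw size, gzip size, fraction saved, and compression time."""
    raw = json.dumps(payload).encode("utf-8")
    start = time.perf_counter()
    compressed = gzip.compress(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "raw_bytes": len(raw),
        "gzip_bytes": len(compressed),
        "saved_fraction": 1 - len(compressed) / len(raw),
        "compress_ms": elapsed_ms,
    }


# Hypothetical sample input: repetitive text, so the savings are optimistic.
sample = {
    "model": "bear-1.2",
    "input": "The quick brown fox jumps over the lazy dog. " * 2000,
}
stats = gzip_stats(sample)
print(stats)
```

The saved_fraction value corresponds to the "over 90%" figure quoted above; timings depend on your machine and payload.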

Examples

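import gzip
import json
import requests

# Placeholder: replace with the text you want to compress.
very_long_text = "..."

# Serialize the request body to JSON bytes before compressing it.
payload = json.dumps({
    "model": "bear-1.2",
    "input": very_long_text,
    "compression_settings": {"aggressiveness": 0.1}
}).encode("utf-8")

response = requests.post(
    "https://api.thetokencompany.com/v1/compress",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
        "Content-Encoding": "gzip"  # tells the API the body is gzip-compressed
    },
    data=gzip.compress(payload)
)
response.raise_for_status()  # surface HTTP errors before parsing the body

# requests decompresses a gzip-encoded response transparently.
print(response.json()["output"])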
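If you would rather not depend on requests, the same call can be sketched with the standard library alone. Unlike requests, urllib.request does not decompress gzip responses for you, so the sketch decodes the body by hand. The function names build_gzip_request, decode_body, and compress_text are hypothetical, and sending Accept-Encoding: gzip is an assumption (the page above only says the response is gzip-compressed automatically).

```python
import gzip
import json
import urllib.request
from typing import Optional

# Endpoint taken from the example above.
API_URL = "https://api.thetokencompany.com/v1/compress"


def build_gzip_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build a POST request whose JSON body is gzip-compressed."""
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            # Assumption: advertising gzip support is harmless either way.
            "Accept-Encoding": "gzip",
        },
    )


def decode_body(raw: bytes, content_encoding: Optional[str]) -> dict:
    """Decompress the response body if it is gzip-encoded, then parse JSON."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw)


def compress_text(text: str, api_key: str) -> str:
    """Send one request and return the "output" field, as in the example above."""
    request = build_gzip_request({"model": "bear-1.2", "input": text}, api_key)
    with urllib.request.urlopen(request) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)["output"]
```

Calling compress_text(very_long_text, "YOUR_API_KEY") performs the network request; the two helper functions can be exercised offline.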

Throughput Benchmarks

Real end-to-end measurements against the live API. Each data point is the average of the middle 50% of 20 runs. Throughput depends on how much gzip shrinks your content, since a smaller payload spends less time on the network. Measured over the internet:
Tokens       Raw           Gzip           Speedup
10,000       97K tok/s     182K tok/s     1.9x
100,000      471K tok/s    887K tok/s     1.9x
1,000,000    647K tok/s    1.44M tok/s    2.2x
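To reproduce this kind of measurement on your own traffic, the aggregation might look like the sketch below. It is one reading of "middle 50% averaged" (sort the runs, drop the lowest and highest quarter, average the rest); the exact trimming used for the figures above is not stated, and all function names here are hypothetical.

```python
def middle_half_mean(samples: list) -> float:
    """Average the middle 50% of the samples after sorting."""
    ordered = sorted(samples)
    drop = len(ordered) // 4  # number of runs trimmed from each end
    middle = ordered[drop:len(ordered) - drop]
    return sum(middle) / len(middle)


def throughput(tokens: int, seconds: float) -> float:
    """Tokens processed per second for one end-to-end request."""
    return tokens / seconds


def speedup(raw_tok_per_s: float, gzip_tok_per_s: float) -> float:
    """Ratio of gzip throughput to raw throughput, rounded to one decimal."""
    return round(gzip_tok_per_s / raw_tok_per_s, 1)


# Speedups recomputed from the throughput figures in the table above.
print(speedup(97_000, 182_000))     # 10,000 tokens
print(speedup(471_000, 887_000))    # 100,000 tokens
print(speedup(647_000, 1_440_000))  # 1,000,000 tokens
```

Trimming both ends makes the average less sensitive to a single slow run caused by a network hiccup, which matters when measuring over the internet.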