How It Works
Add theContent-Encoding: gzip header and send the gzip-compressed JSON body. The API will automatically decompress your request and gzip-compress the response. Fully backwards compatible — requests without the header work exactly as before.
If you’re using the Python SDK or the npm package, gzip is enabled by default — no extra setup needed.
Examples
Throughput Benchmarks
Real end-to-end measurements against the live API, 20 runs per data point (middle 50% averaged). Throughput varies based on how much gzip can compress your content and reduce network latency. Over the internet:| Tokens | Raw | Gzip | Speedup |
|---|---|---|---|
| 10,000 | 97K tok/s | 182K tok/s | 1.9x |
| 100,000 | 471K tok/s | 887K tok/s | 1.9x |
| 1,000,000 | 647K tok/s | 1.44M tok/s | 2.2x |