The Token Company · March 2026

How input compression enabled world-class performance for long-running agents

Helonic runs AI on construction drawings at scale. bear-1.2 trims tokens while keeping every critical detail intact.

Helonic case study

4.7% · Safe compression

~47K · Tokens saved / request

Hours · Agent run duration

The challenge

Helonic (by Articulate, YC F25) uses AI to analyze construction drawings, detecting coordination conflicts, code compliance issues, and design errors across architectural, structural, and MEP plans. A single analysis can approach 1 million tokens of input context.

At that size, prompts fill with OCR artifacts, repeated headers, and filler that pull the model's attention away from the structural relationships that matter.

Helonic

AI construction drawing analysis · Articulate YC F25

Where compression fits

bear-1.2 sits between extraction and reasoning. It compresses assembled prompts before they hit the expensive reasoning model.

01 OCR
02 Vision
03 bear-1.2
04 Reasoning
05 Detection
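
The ordering above can be sketched in Python as a chain of stages. This is an illustration only: every function name is a hypothetical placeholder, and the compression step is a toy stand-in because the source does not show bear-1.2's actual API. The point is the placement: the reasoning stage only ever sees the compressed prompt.

```python
from typing import Callable, List

# All names below are hypothetical placeholders for illustration;
# none of them is Helonic's code or the real bear-1.2 API.
Stage = Callable[[str], str]


def build_pipeline(stages: List[Stage]) -> Stage:
    """Compose stages so each one receives the previous stage's output."""
    def run(payload: str) -> str:
        for stage in stages:
            payload = stage(payload)
        return payload
    return run


def assemble_prompt(extracted: str) -> str:
    # Stand-in for steps 01-02: OCR and vision output merged into one prompt.
    return "DRAWING CONTEXT\n" + extracted


def compress(prompt: str) -> str:
    # Stand-in for step 03. A real deployment would call the compressor here;
    # this toy version only collapses runs of blank lines into one.
    kept: List[str] = []
    for line in prompt.split("\n"):
        if line.strip() == "" and kept and kept[-1].strip() == "":
            continue
        kept.append(line)
    return "\n".join(kept)


def reason(prompt: str) -> str:
    # Stand-in for steps 04-05: the expensive model sees only the compressed prompt.
    return f"analysed {len(prompt)} characters"


# Compression sits between prompt assembly and reasoning.
pipeline = build_pipeline([assemble_prompt, compress, reason])
```

Keeping each stage as a plain string-to-string callable means the compressor can be swapped or removed without touching the extraction or reasoning code.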

“Just having a prompt be less bloated is always a good thing. Agents perform better when they have more clear directions and less instructions. Having compression when we're running at the scale of hours is a very big deal.”

Manas Gandhi, Co-Founder

Why 4.7% matters at scale

Construction data is domain-critical. Every token could be a structural dimension, a code reference, a load value, or a callout on a drawing. Drop one and the agent silently loses context that a downstream reasoning step needed to flag a real-world conflict.

So Helonic doesn't pick the aggressiveness setting that maximises savings. They pick the one with a margin of safety. At 0.05 aggressiveness, bear-1.2 strips only the most clearly redundant tokens: OCR noise, repeated headers, page numbers, boilerplate legends, and formatting artifacts. This is the kind of content that a human reading the drawing would skip past automatically, and none of it contributes to the structural reasoning.
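
To make "clearly redundant" concrete, here is a toy rule-based filter in Python. It is not how bear-1.2 works (the source does not describe its internals); the function name, the page-line pattern, and the sample strings are all assumptions made up for illustration. It drops page-number lines and repeats of known sheet headers, and deliberately leaves everything else alone, including repeated callouts.

```python
import re
from typing import Iterable, List, Set

# Hypothetical pattern: matches lines such as "Page 3" or "Page 3 of 40".
# It requires the word "page" so a bare number (possibly a dimension) is kept.
PAGE_LINE = re.compile(r"^page\s+\d+(\s+of\s+\d+)?$", re.IGNORECASE)


def strip_redundant_lines(lines: Iterable[str], known_headers: Set[str]) -> List[str]:
    """Drop page-number lines and repeats of known headers; keep all else."""
    seen: Set[str] = set()
    kept: List[str] = []
    for line in lines:
        text = line.strip()
        if PAGE_LINE.match(text):
            continue  # page numbering carries no drawing content
        if text in known_headers:
            if text in seen:
                continue  # keep only the first copy of a repeated header
            seen.add(text)
        kept.append(line)
    return kept


# Made-up sample input, not real drawing data.
sample = [
    "ACME SHEET A-101",
    "W12X26 BEAM",
    "Page 1 of 2",
    "ACME SHEET A-101",
    "W12X26 BEAM",
]
cleaned = strip_redundant_lines(sample, {"ACME SHEET A-101"})
```

The repeated beam callout survives because it is not in the known-header set. That mirrors the conservative stance described above: when in doubt, keep the token.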

A 4.7% trim sounds small in isolation. It isn't. Helonic's prompts run near a million tokens, agents loop for hours, and every loop pays for the full context again. At that volume, 4.7% is ~47K tokens off every request. Multiply by every loop, every drawing, every project, and the same conservative setting that protects the structural data also pays for itself many times over.
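
The arithmetic can be checked with a short Python sketch. The 1,000,000-token prompt and the 4.7% rate come from the figures above; the loop count is a hypothetical value chosen only to show how the saving scales, not a number from Helonic.

```python
def tokens_saved_per_request(prompt_tokens: int, compression_rate: float) -> int:
    """Tokens removed from one request at a given compression rate."""
    return round(prompt_tokens * compression_rate)


def tokens_saved_per_run(prompt_tokens: int, compression_rate: float, loops: int) -> int:
    """Total saving when every agent loop resends the full context."""
    return tokens_saved_per_request(prompt_tokens, compression_rate) * loops


PROMPT_TOKENS = 1_000_000   # "near a million tokens" per prompt
RATE = 0.047                # 4.7% trim at the conservative setting
HYPOTHETICAL_LOOPS = 100    # assumption for illustration only

per_request = tokens_saved_per_request(PROMPT_TOKENS, RATE)               # 47000
per_run = tokens_saved_per_run(PROMPT_TOKENS, RATE, HYPOTHETICAL_LOOPS)   # 4700000
```

Under that assumed loop count, a single agent run avoids sending about 4.7 million tokens to the reasoning model.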

4.7% · conservative setting, no structural risk

Tunable up to 20%+ for natural language content where the consequences of a wrong cut are lower

Ready to try it?

Book a 30-minute call and we'll get you set up.