Case Study
How input compression enabled world-class performance for long-running agents
Helonic runs AI on construction drawings at scale. bear-1.2 trims tokens while keeping every critical detail intact.
4.7%
Safe compression
~47K
Tokens saved / request
Hours
Agent run duration
The challenge
Helonic (by Articulate, YC F25) uses AI to analyze construction drawings — detecting coordination conflicts, code compliance issues, and design errors across architectural, structural, and MEP plans. A single analysis can approach 1 million tokens of input context.
At that size, prompts fill with OCR artifacts, repeated headers, and filler that draw the model's attention away from the structural relationships that matter.
Helonic
AI construction drawing analysis · Articulate YC F25
Where compression fits
bear-1.2 sits between extraction and reasoning — compressing assembled prompts before they hit the expensive reasoning model.
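As a rough sketch of that ordering, the pipeline can be written as four steps: extract, assemble, compress, reason. Everything named here (`run_analysis`, `extract`, `compress`, `reason`, and the toy stand-ins) is hypothetical and chosen for illustration; none of it is Helonic's code or the bear-1.2 API. The toy compressor only collapses whitespace, which is not how bear-1.2 works.

```python
from typing import Callable, List


def run_analysis(
    sheets: List[str],
    extract: Callable[[str], str],
    compress: Callable[[str], str],
    reason: Callable[[str], str],
) -> str:
    """Hypothetical pipeline: compression runs on the assembled prompt,
    after extraction and before the reasoning model is called."""
    # 1. Extraction: turn each drawing sheet into text.
    extracted = [extract(sheet) for sheet in sheets]
    # 2. Assembly: join the extracted text into a single prompt.
    prompt = "\n\n".join(extracted)
    # 3. Compression: shrink the assembled prompt (stand-in for bear-1.2).
    compressed = compress(prompt)
    # 4. Reasoning: only the compressed prompt reaches the expensive model.
    return reason(compressed)


# Toy stand-ins that record the order in which the stages run.
calls: List[str] = []


def toy_extract(sheet: str) -> str:
    calls.append("extract")
    return sheet.strip()


def toy_compress(prompt: str) -> str:
    calls.append("compress")
    return " ".join(prompt.split())  # collapse whitespace only


def toy_reason(prompt: str) -> str:
    calls.append("reason")
    return prompt  # echo the prompt so the result can be inspected


result = run_analysis(
    ["  A-101 floor plan  ", "  S-201   framing plan "],
    toy_extract,
    toy_compress,
    toy_reason,
)
print(calls)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular extractor, compressor, or model client.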
“Just having a prompt be less bloated is always a good thing. Agents perform better when they have more clear directions and less instructions. Having compression when we're running at the scale of hours is a very big deal.”
— Manas Gandhi, Co-Founder
Why 4.7% matters at scale
In construction data, any token could be a structural dimension or a code reference, so Helonic runs bear-1.2 at conservative settings that prioritize safety over savings. Even so, with prompts approaching one million tokens, a 4.7% reduction removes about 47K tokens per request, and the savings compound across analyses that run for hours.
Tunable to 20% or more for natural-language content
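The ~47K figure follows directly from the two numbers above. A minimal check, assuming a prompt of exactly one million tokens (the case study only says prompts approach that size); the function name `tokens_saved` is chosen for illustration:

```python
def tokens_saved(prompt_tokens: int, compression_ratio: float) -> int:
    """Number of tokens removed when a prompt shrinks by compression_ratio."""
    return round(prompt_tokens * compression_ratio)


prompt_tokens = 1_000_000  # assumed upper bound for a single analysis
ratio = 0.047              # the conservative 4.7% setting

saved = tokens_saved(prompt_tokens, ratio)
remaining = prompt_tokens - saved
print(saved, remaining)  # 47000 953000
```

Smaller prompts save proportionally less, so the per-request figure is best read as an upper estimate.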