How input compression enabled world-class performance for long-running agents
Helonic runs AI on construction drawings at scale. bear-1.2 trims tokens while keeping every critical detail intact.

4.7% · Safe compression
~47K · Tokens saved / request
Hours · Agent run duration
The challenge
Helonic (by Articulate, YC F25) uses AI to analyze construction drawings, detecting coordination conflicts, code compliance issues, and design errors across architectural, structural, and MEP plans. A single analysis can approach 1 million tokens of input context.
At that size, prompts fill with OCR artifacts, repeated headers, and filler that dilute the model's attention away from the structural relationships that matter.
Helonic
AI construction drawing analysis · Articulate YC F25
Where compression fits
bear-1.2 sits between extraction and reasoning. It compresses assembled prompts before they hit the expensive reasoning model.
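As a rough illustration of that placement, the sketch below wires a compression step between prompt assembly and the reasoning call. It does not use the bear-1.2 API: the `Compressor` signature, the `build_reasoning_prompt` helper, and the toy `drop_repeated_lines` stand-in are all assumptions made up for this example, and the sample page text is invented.

```python
from typing import Callable, List

# Hypothetical signature: (prompt, aggressiveness) -> compressed prompt.
Compressor = Callable[[str, float], str]


def drop_repeated_lines(prompt: str, aggressiveness: float) -> str:
    """Toy stand-in for a compressor: drops exact repeats of earlier lines.

    This is NOT bear-1.2. It ignores `aggressiveness` and exists only so the
    sketch runs end to end.
    """
    seen = set()
    kept = []
    for line in prompt.splitlines():
        key = line.strip()
        if key and key in seen:
            continue  # skip a repeated header or legend line
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


def build_reasoning_prompt(
    pages: List[str], compress: Compressor, aggressiveness: float = 0.05
) -> str:
    """Assemble extracted page text, then compress it before the reasoning step."""
    assembled = "\n".join(pages)
    return compress(assembled, aggressiveness)


# Invented sample pages standing in for extraction output.
pages = [
    "PROJECT 1042 - STRUCTURAL\nBeam B12: W16x40, span 24 ft",
    "PROJECT 1042 - STRUCTURAL\nColumn C3: HSS 8x8x1/2",
]
prompt = build_reasoning_prompt(pages, drop_repeated_lines)
print(prompt)
```

Passing the compressor in as a callable keeps the extraction and reasoning stages unchanged, so a real compression client could replace the stand-in without touching the rest of the pipeline.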
“Just having a prompt be less bloated is always a good thing. Agents perform better when they have more clear directions and less instructions. Having compression when we're running at the scale of hours is a very big deal.”
Manas Gandhi, Co-Founder
Why 4.7% matters at scale
Construction data is domain-critical. Every token could be a structural dimension, a code reference, a load value, or a callout on a drawing. Drop one and the agent silently loses context that a downstream reasoning step needed to flag a real-world conflict.
So Helonic doesn't pick the aggressiveness setting that maximises savings. They pick one with a margin of safety. At 0.05 aggressiveness, bear-1.2 strips only the most clearly redundant tokens: OCR noise, repeated headers, page numbers, boilerplate legends, and formatting artifacts. This is the kind of content that a human reading the drawing would skip past automatically, and none of it contributes to the structural reasoning.
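One way a team might check that a given setting leaves domain-critical tokens alone is to compare what survives compression against what was in the original. The sketch below is only an illustration of that idea, not Helonic's or bear-1.2's method: the regex patterns, the helper names, and the sample text are all hypothetical.

```python
import re

# Hypothetical patterns for tokens that must survive compression.
CRITICAL_PATTERNS = [
    # Dimensions and load values, e.g. "24 ft", "50 psf".
    re.compile(r"\b\d+(?:\.\d+)?\s?(?:ft|in|mm|psf|kips)\b"),
    # Code references, e.g. "ASCE 7-16".
    re.compile(r"\b(?:IBC|ASCE|ACI)\s?[\d.\-]+"),
]


def critical_tokens(text: str) -> set:
    """Collect every substring that matches one of the critical patterns."""
    found = set()
    for pattern in CRITICAL_PATTERNS:
        found.update(match.group(0) for match in pattern.finditer(text))
    return found


def lost_tokens(original: str, compressed: str) -> set:
    """Return critical tokens present in the original but missing after compression."""
    return critical_tokens(original) - critical_tokens(compressed)


# Invented sample text: page markers are noise, the middle line is critical.
original = (
    "Page 3 of 40\n"
    "Beam B12 span 24 ft, live load 50 psf per ASCE 7-16\n"
    "Page 3 of 40"
)
safe_cut = "Beam B12 span 24 ft, live load 50 psf per ASCE 7-16"
unsafe_cut = "Beam B12 span, live load 50 psf per ASCE 7-16"

print(lost_tokens(original, safe_cut))    # nothing critical was dropped
print(lost_tokens(original, unsafe_cut))  # the span dimension was dropped
```

A check like this only catches the patterns it knows about, which is one reason to prefer a conservative setting over a tuned-to-the-limit one.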
A 4.7% trim sounds small in isolation. It isn't. Helonic's prompts run near a million tokens, agents loop for hours, and every loop pays for the full context again. At that volume, 4.7% is ~47K tokens off every request. Multiply by every loop, every drawing, every project, and the same conservative setting that protects the structural data also pays for itself many times over.
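The arithmetic behind that figure can be worked through directly. The prompt size and the 4.7% rate come from the text above; the number of loops per run is a hypothetical placeholder, since the source only says agents run for hours.

```python
def tokens_saved_per_request(prompt_tokens: int, compression_rate: float) -> int:
    """Tokens removed from one request at a given compression rate."""
    return round(prompt_tokens * compression_rate)


PROMPT_TOKENS = 1_000_000   # "near a million tokens", per the text
COMPRESSION_RATE = 0.047    # 4.7% at 0.05 aggressiveness, per the text
LOOPS_PER_RUN = 100         # hypothetical: the source gives no loop count

per_request = tokens_saved_per_request(PROMPT_TOKENS, COMPRESSION_RATE)
per_run = per_request * LOOPS_PER_RUN

print(f"Saved per request: {per_request:,} tokens")
print(f"Saved per run ({LOOPS_PER_RUN} loops, assumed): {per_run:,} tokens")
```

Because every loop resends the full context, the per-request saving scales linearly with the number of loops, drawings, and projects.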
Tunable up to 20%+ for natural-language content, where the consequences of a wrong cut are lower.