Case Study

How input compression enabled world-class performance for long-running agents

Helonic runs AI on construction drawings at scale. bear-1.2 trims tokens while keeping every critical detail intact.

March 2026 · Long-duration AI agents · bear-1.2

4.7% · Safe compression

~47K · Tokens saved / request

Hours · Agent run duration

The challenge

Helonic (by Articulate, YC F25) uses AI to analyze construction drawings — detecting coordination conflicts, code compliance issues, and design errors across architectural, structural, and MEP plans. A single analysis can approach 1 million tokens of input context.

At that size, prompts fill with OCR artifacts, repeated headers, and filler that dilute the model's attention away from the structural relationships that matter.

Helonic

AI construction drawing analysis · Articulate YC F25

Where compression fits

bear-1.2 sits between extraction and reasoning — compressing assembled prompts before they hit the expensive reasoning model.

01 OCR → 02 Vision → 03 bear-1.2 → 04 Reasoning → 05 Detection
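The stage ordering above can be sketched in a few lines of Python. This is a minimal illustration, not Helonic's code and not the bear-1.2 API: `placeholder_compress`, `assemble_prompt`, and `run_pipeline` are hypothetical names, and the placeholder only drops blank and exactly repeated lines to stand in for the real compression step.

```python
from typing import Callable


def placeholder_compress(prompt: str) -> str:
    """Hypothetical stand-in for the compression stage (NOT bear-1.2).

    Drops blank lines and exact repeats of earlier lines, such as a
    sheet header that OCR emits on every page.
    """
    seen = set()
    kept = []
    for line in prompt.splitlines():
        stripped = line.strip()
        if not stripped or stripped in seen:
            continue
        seen.add(stripped)
        kept.append(stripped)
    return "\n".join(kept)


def assemble_prompt(ocr_text: str, vision_notes: str) -> str:
    """Combine the outputs of the extraction stages (01 OCR, 02 Vision)."""
    return ocr_text + "\n" + vision_notes


def run_pipeline(
    ocr_text: str,
    vision_notes: str,
    compress: Callable[[str], str],
    reason: Callable[[str], object],
) -> object:
    """Compress the assembled prompt before the reasoning stage sees it."""
    prompt = assemble_prompt(ocr_text, vision_notes)
    compact = compress(prompt)   # stage 03: compression
    return reason(compact)       # stage 04: reasoning (05 consumes its result)


# Toy inputs; the "reasoning model" here just counts the lines it receives.
ocr_text = "SHEET A-101\nBeam B1 spans 6 m\nSHEET A-101\n\nColumn C3 at grid 4"
vision_notes = "Duct crosses beam B1"
lines_seen = run_pipeline(
    ocr_text, vision_notes, placeholder_compress, lambda p: len(p.splitlines())
)
```

The point of the sketch is the placement: compression runs once on the fully assembled prompt, so the reasoning stage never sees the uncompressed text.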

“Just having a prompt be less bloated is always a good thing. Agents perform better when they have more clear directions and less instructions. Having compression when we're running at the scale of hours is a very big deal.”
— Manas Gandhi, Co-Founder

Why 4.7% matters at scale

Construction data is domain-critical: every token could be a structural dimension or a code reference. Helonic therefore uses conservative settings that prioritize safety. Even so, with prompts near one million tokens, a 4.7% reduction removes roughly 47K tokens per request, and the savings compound across hour-scale analyses.
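The arithmetic behind the ~47K figure can be checked directly. In the sketch below, the prompt size and the 4.7% rate come from the case study; the number of requests per run is a made-up value used only to show how the savings add up, and `tokens_saved` is a hypothetical helper name.

```python
def tokens_saved(prompt_tokens: int, compression_rate: float) -> int:
    """Tokens removed from one request at a given compression rate."""
    return round(prompt_tokens * compression_rate)


PROMPT_TOKENS = 1_000_000   # "near million-token prompts" (from the case study)
SAFE_RATE = 0.047           # 4.7% conservative setting (from the case study)
REQUESTS_PER_RUN = 100      # assumption for illustration only

per_request = tokens_saved(PROMPT_TOKENS, SAFE_RATE)
per_run = per_request * REQUESTS_PER_RUN

print(per_request)  # 47000
print(per_run)      # 4700000
```

Under that assumed request count, a single hour-scale run would avoid sending about 4.7 million tokens to the reasoning model.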

4.7% guaranteed-safe compression

Tunable up to 20%+ for natural language content
