Course · UIUC CS 598 · PACT

Agents over-deliberate—pruning calls beats bigger models

LLM agents · DPO · Phi-3-mini

Multi-step agents often burn minutes on reasoning loops that add little value. I explored decoupling 'thinking' from execution with a fine-tuned speculator that predicts when another full LLM call is unnecessary.

Problem

Agent pipelines stack planner, critic, and executor calls. Each hop adds latency and cost—even when the next action is obvious from context.
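A minimal sketch of where a speculator could sit in such a pipeline. It assumes each role is a plain function standing in for one full LLM call, and that the speculator returns the probability that the critic hop is unnecessary. The names `GatedPipeline`, `speculator`, and `skip_threshold` (and its 0.8 default) are hypothetical illustrations, not the project's code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each "call" stands in for one full LLM request (hypothetical interface).
LLMCall = Callable[[str], str]


@dataclass
class GatedPipeline:
    planner: LLMCall
    critic: LLMCall
    executor: LLMCall
    # Hypothetical speculator: estimated P(critic hop is unnecessary | context + plan).
    speculator: Callable[[str], float]
    skip_threshold: float = 0.8

    def step(self, context: str) -> Tuple[str, List[str]]:
        """Run one agent step; return the action and the list of calls actually made."""
        calls: List[str] = []
        plan = self.planner(context)
        calls.append("planner")
        # Pay for the critic hop only when the speculator is not confident it is redundant.
        if self.speculator(context + "\n" + plan) < self.skip_threshold:
            plan = self.critic(plan)
            calls.append("critic")
        action = self.executor(plan)
        calls.append("executor")
        return action, calls


# Usage with stub calls in place of real LLM requests.
pipe = GatedPipeline(
    planner=lambda ctx: f"plan({ctx})",
    critic=lambda p: p + "+reviewed",
    executor=lambda p: f"exec({p})",
    speculator=lambda ctx: 0.95,  # confident the critic would add nothing
)
action, calls = pipe.step("open file")
print(action, calls)  # exec(plan(open file)) ['planner', 'executor']
```

Gating on a threshold keeps the planner and executor on the critical path while making the critic hop optional; raising `skip_threshold` makes skips rarer, trading some latency savings for more conservative behavior.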

Approach

A Phi-3-mini speculator learns to flag over-deliberation; DPO preference alignment balances speed against accuracy on held-out agent traces.
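The sketch below shows the standard per-pair DPO loss, along with one hypothetical way to turn agent traces into preference pairs that encode a speed-versus-accuracy trade-off: a successful trace beats a failed one, and among equally successful traces the one with fewer LLM calls is preferred. `Trace`, `preference_pair`, the response format, and the `beta=0.1` default are illustrative assumptions; the project's actual pairing rule and hyperparameters may differ. Log-probabilities are plain floats here, whereas a real run would compute them from Phi-3-mini and a frozen reference copy.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    """
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    return neg_log_sigmoid(beta * (chosen_ratio - rejected_ratio))


@dataclass
class Trace:  # hypothetical record of one agent run
    prompt: str
    response: str  # e.g. the speculator's "skip"/"call" decision (assumed format)
    succeeded: bool
    llm_calls: int


def preference_pair(a: Trace, b: Trace) -> Optional[Tuple[Trace, Trace]]:
    """Return (chosen, rejected): success first, then fewer LLM calls; None if tied."""
    def key(t: Trace) -> Tuple[bool, int]:
        return (t.succeeded, -t.llm_calls)
    if key(a) == key(b):
        return None
    return (a, b) if key(a) > key(b) else (b, a)


fast = Trace("ctx", "skip", succeeded=True, llm_calls=3)
slow = Trace("ctx", "call", succeeded=True, llm_calls=5)
chosen, rejected = preference_pair(fast, slow)
print(chosen.response)  # skip
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931 (log 2: policy matches reference)
```

Ranking by success before call count means a cheaper trace can never win a pair by failing, which is one way to express "speed, but not at the expense of accuracy" as preferences.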

Early insight

Many failures weren't reasoning errors—they were call-pattern errors. Pruning the call graph recovered most of the latency budget before touching base model size.
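One hypothetical way to surface call-pattern errors from logs: for each call type, measure how often its output actually changed the agent's next action, and flag types that rarely do as pruning candidates. The log format (`Hop` as a call name plus a changed-action flag) and the 5% cutoff are assumptions for illustration, not the project's pruning criterion.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log record: (call type, did this call change the agent's next action?)
Hop = Tuple[str, bool]


def prune_candidates(traces: List[List[Hop]],
                     max_change_rate: float = 0.05) -> Dict[str, float]:
    """Return call types whose output changed the next action at most max_change_rate of the time."""
    seen: Dict[str, int] = defaultdict(int)
    changed: Dict[str, int] = defaultdict(int)
    for trace in traces:
        for call_type, changed_action in trace:
            seen[call_type] += 1
            if changed_action:
                changed[call_type] += 1
    rates = {name: changed[name] / seen[name] for name in seen}
    return {name: rate for name, rate in rates.items() if rate <= max_change_rate}


logs = [
    [("planner", True), ("critic", False), ("executor", True)],
    [("planner", True), ("critic", False), ("executor", True)],
]
print(prune_candidates(logs))  # {'critic': 0.0}
```

Because it only reads call logs, a check like this needs no change to the base model.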