Course · UIUC CS 598 · PACT
Agents over-deliberate—pruning calls beats bigger models
LLM agents · DPO · Phi-3-mini
Multi-step agents often burn minutes on reasoning loops that add little value. I explored decoupling 'thinking' from execution with a fine-tuned speculator that predicts when another full LLM call is unnecessary.
Problem
Agent pipelines stack planner, critic, and executor calls. Each hop adds latency and cost—even when the next action is obvious from context.
Approach
A fine-tuned Phi-3-mini speculator learns to flag over-deliberation; preference alignment with DPO trades off speed against accuracy, evaluated on held-out agent traces.
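For readers unfamiliar with DPO, here is a minimal sketch of the standard per-pair DPO loss in plain Python. How the preference pairs were built is not stated above; the sketch assumes, purely for illustration, that the "chosen" response is the skip-or-call decision that kept the trace correct with fewer calls. The function name, argument names, and the beta value are illustrative, not taken from the project.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model. beta=0.1 is an assumed value.
    """
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

When the policy matches the reference model the margin is zero and the loss equals log 2; the loss falls as the policy shifts probability toward the chosen decision.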
Early insight
Many failures weren't reasoning errors—they were call-pattern errors. Pruning the call graph recovered most of the latency budget before touching base model size.
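One way to picture call-graph pruning is a simple gate over a recorded trace, sketched below. Everything here is hypothetical: the `Call` record, the `skip_prob` field standing in for the speculator's score, the 0.8 threshold, and the rule that executor calls are never skipped are assumptions for illustration, not the project's actual design.

```python
from dataclasses import dataclass


@dataclass
class Call:
    role: str          # e.g. "planner", "critic", "executor"
    latency_s: float   # wall-clock cost of the call, in seconds
    skip_prob: float   # hypothetical speculator score: P(call is unnecessary)


def prune(calls, threshold=0.8, never_skip=("executor",)):
    """Keep a call if its role is protected or the speculator is not confident it can be skipped."""
    return [c for c in calls
            if c.role in never_skip or c.skip_prob < threshold]


def total_latency(calls):
    """Sum of per-call latencies in seconds."""
    return sum(c.latency_s for c in calls)


# Made-up trace, for illustration only.
trace = [
    Call("planner", 2.0, 0.1),
    Call("critic", 1.5, 0.9),
    Call("executor", 0.5, 0.95),
]
kept = prune(trace)
```

In this made-up trace only the critic call is dropped: the planner scores below the threshold and the executor is protected, so the gate removes latency without changing which action is executed.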