Foundations

Reasoning Models

Reasoning models such as OpenAI's o1/o3 and Anthropic's Claude with extended thinking spend more computational effort at inference time, trading latency and cost for significantly improved accuracy on complex tasks. Unlike standard large language models, which generate output in a single forward pass, reasoning models run chain-of-thought internally, sometimes generating thousands of thinking tokens before producing a response. The routing decision is more specific than "use reasoning for complex tasks": reach for a reasoning model when the problem requires multi-step inference in which intermediate conclusions depend on earlier ones, because that is the structure that benefits from extended thinking time. If the task is hard but self-contained in a single step, the extra cost buys little.
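The routing rule above can be sketched as a small dispatch function. This is a minimal, hypothetical illustration: the `Task` fields, the step-count threshold, and the model labels are all assumptions made up for this sketch, not part of any real SDK.

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    dependent_steps: int      # estimated number of intermediate conclusions that feed later ones
    latency_sensitive: bool   # does the caller need a fast response?

def choose_model(task: Task) -> str:
    """Route to a reasoning model only when extended thinking is likely to pay off."""
    # Extended thinking helps when intermediate conclusions depend on earlier ones;
    # a threshold of 3 dependent steps is an illustrative choice, not a benchmark.
    if task.dependent_steps >= 3 and not task.latency_sensitive:
        return "reasoning-model"   # e.g. o1/o3 or Claude with extended thinking
    return "standard-model"        # hard-but-single-step work stays on a standard model

print(choose_model(Task("derive the closed form", dependent_steps=5, latency_sensitive=False)))
```

In practice the step estimate would come from a classifier or heuristics over the prompt rather than a hand-set field, but the shape of the decision is the same: dispatch on the dependency structure of the task, not on its raw difficulty.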