Anthropic released Claude Sonnet 4.6 in early 2026. It's the current recommended model for most production workloads — faster than Opus, more capable than the previous Sonnet, and with a set of architectural improvements that matter if you're building systems on the Claude API.
Note: The certification exam tests architecture patterns and API usage, not model-specific details. Model updates like Sonnet 4.6 don't change the concepts tested — but understanding model capabilities helps you make better architecture decisions.
Adaptive reasoning
The headline feature is adaptive reasoning — the model dynamically allocates thinking tokens based on task complexity. Previous models used extended thinking with a fixed budget_tokens parameter. You'd set a ceiling and the model would use up to that amount regardless of whether the problem needed it.
With adaptive reasoning, the model scales its thinking effort automatically. A simple classification task might use minimal reasoning tokens, while a multi-step code generation task triggers deeper analysis. The practical result: you pay for the thinking you actually need.
From an API perspective, you still pass budget_tokens as a maximum, but the model now treats it as a ceiling rather than a target:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000,  # ceiling, not target
    },
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
```
For a simple classification, the model might use 200 thinking tokens. For a complex architectural review, it might use the full 5,000. You get the same interface with better cost efficiency.
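If you want to see how much reasoning a given request actually triggered, the response content includes thinking blocks alongside text blocks when extended thinking is enabled. A minimal sketch, assuming each block is a plain dict shaped like the API's thinking and text content blocks:

```python
def split_content(blocks):
    """Separate thinking from visible text in a response's content blocks.

    Assumes each block is a dict with a "type" key, mirroring the shape of
    the Messages API's thinking/text content blocks.
    """
    thinking = "".join(b.get("thinking", "") for b in blocks if b["type"] == "thinking")
    text = "".join(b.get("text", "") for b in blocks if b["type"] == "text")
    return thinking, text


# Hand-built example blocks (not a real API response):
blocks = [
    {"type": "thinking", "thinking": "The ticket mentions a refund request..."},
    {"type": "text", "text": "Category: billing"},
]
thinking, text = split_content(blocks)
# len(thinking) is a rough proxy for how much reasoning this request spent.
```

Logging that split per request type is a quick way to confirm that simple tasks really are using minimal reasoning before you rely on it in cost projections.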
1M context window beta
Sonnet 4.6 supports a 1 million token context window in beta. The standard context remains 200K, but you can opt into the extended window for workloads that need it — large codebase analysis, long document processing, or multi-agent orchestration where context accumulates quickly.
This doesn't change how you architect context management. The fundamentals still apply: trim tool results, use persistent case facts for critical data, and put key findings at the top of aggregated inputs to avoid the lost-in-the-middle effect. A larger window gives you more room, but it doesn't eliminate the need for disciplined context budgeting.
If anything, the 1M window makes context management skills more important. More room means more temptation to dump everything into context without trimming. The cost of processing 800K tokens of mostly irrelevant data is real, both in latency and spend.
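One concrete habit the larger window does not excuse is trimming oversized tool results before they enter context. A minimal sketch, using a crude character-count heuristic as a stand-in for a real token budget (a tokenizer would give exact counts):

```python
TRIM_MARKER = "\n...[trimmed]...\n"

def trim_tool_result(text: str, max_chars: int = 4000) -> str:
    """Keep the head and tail of an oversized tool result, dropping the middle.

    Head-heavy split, since key findings usually appear early; max_chars is
    a rough character-based proxy for a token budget.
    """
    if len(text) <= max_chars:
        return text
    head = int(max_chars * 0.7)
    tail = max_chars - head
    return text[:head] + TRIM_MARKER + text[-tail:]
```

Keeping both ends rather than truncating blindly preserves headers and summaries at the top and any totals or conclusions at the bottom, which is usually where the signal lives.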
Improved agentic search performance
Sonnet 4.6 shows measurably better performance on agentic search tasks — workflows where the model needs to plan a search strategy, execute multiple tool calls, synthesise results, and decide whether to search further or return an answer.
This matters for architectures that use Claude as a research agent or retrieval-augmented generation (RAG) orchestrator. The model is better at:
- Deciding when it has enough information to stop searching
- Formulating targeted queries rather than broad sweeps
- Cross-referencing results from multiple sources before answering
For architects, the takeaway is that Sonnet 4.6 is a stronger default for agent loops that involve iterative tool use. You may find that retry logic and forced re-search patterns you built for earlier models are no longer necessary.
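The shape of such an agent loop can be sketched as follows. Here `search` and `decide` are stand-ins for a real tool call and a real model call, and the stop condition is the model's own decision to answer rather than a hard-coded retry pattern:

```python
def agentic_search(question, search, decide, max_rounds=4):
    """Iterative search loop: the model chooses the next action each round.

    `decide` returns either {"action": "search", "query": ...} or
    {"action": "answer", "text": ...}; `search` executes one query.
    """
    findings = []
    for _ in range(max_rounds):
        step = decide(question, findings)
        if step["action"] == "answer":
            return step["text"]
        findings.append(search(step["query"]))
    # Budget exhausted: force a final answer from what we have.
    return decide(question, findings, force_answer=True)["text"]
```

The `max_rounds` cap stays even with a model that stops well on its own; it bounds worst-case latency and spend.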
Fewer tokens consumed
Across benchmarks, Sonnet 4.6 produces equivalent or better quality output while consuming fewer tokens. This is a model-level efficiency gain — the same prompt produces a tighter response. For high-volume production workloads, this translates directly to lower API costs.
Combined with adaptive reasoning, the token savings compound. You're spending less on thinking and less on output. If you're running cost projections for a production deployment, benchmark with Sonnet 4.6 directly rather than extrapolating from older model pricing.
Claude Haiku 3 deprecation
Anthropic has announced that Claude Haiku 3 will be deprecated on 19 April 2026. If you're running production workloads on Haiku 3, you need to migrate before that date.
The migration path depends on your use case:
- Latency-sensitive classification and routing: Move to Claude Haiku 4 (the current Haiku generation). It's faster and more accurate on structured tasks.
- General-purpose workloads previously on Haiku 3: Consider whether Sonnet 4.6 with adaptive reasoning is a better fit. For tasks that needed Haiku 3's speed but were limited by its accuracy, Sonnet 4.6's dynamic thinking allocation can deliver better results at a comparable cost on simple tasks.
Update your model strings in API calls, test thoroughly, and deploy before the cutoff. After 19 April, Haiku 3 API calls will return errors.
```python
# Before: Haiku 3
response = client.messages.create(
    model="claude-3-haiku-20240307",  # deprecated 19 April 2026
    ...
)

# After: Haiku 4 (for speed) or Sonnet 4.6 (for capability)
response = client.messages.create(
    model="claude-haiku-4-20250514",  # speed-optimised
    ...
)
```
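A small guard in deployment code can catch deprecated model strings before the cutoff rather than at runtime. This sketch covers only the two model IDs used in this section; extend the mapping for your own inventory:

```python
DEPRECATED_MODELS = {
    # deprecated ID -> suggested replacement
    "claude-3-haiku-20240307": "claude-haiku-4-20250514",
}

def resolve_model(model_id: str, strict: bool = False) -> str:
    """Swap deprecated model IDs for their replacements.

    With strict=True, raise instead, so deprecated IDs fail loudly in CI
    before they fail silently in production after the cutoff.
    """
    if model_id in DEPRECATED_MODELS:
        replacement = DEPRECATED_MODELS[model_id]
        if strict:
            raise ValueError(f"{model_id} is deprecated; use {replacement}")
        return replacement
    return model_id
```

Running `resolve_model(..., strict=True)` over every model string in your config as a CI check turns a deprecation deadline into a failing test instead of a production incident.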
Model selection in multi-agent systems
Sonnet 4.6's adaptive reasoning opens up a cleaner pattern for multi-agent architectures. Instead of using different models for different agent roles (Haiku for routing, Sonnet for analysis, Opus for synthesis), you can standardise on Sonnet 4.6 and let adaptive reasoning handle the complexity scaling.
A routing agent that just classifies user intent uses minimal thinking tokens. An analysis agent working through a complex problem uses more. Same model, different compute profiles, one deployment to manage.
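One way to express that pattern is a per-role request profile on a single model. The budget values here are illustrative, not recommendations; the 1,024 floor reflects the API's documented minimum thinking budget at the time of writing:

```python
# Illustrative budgets: same model, different compute profiles per role.
ROLE_PROFILES = {
    "router":  {"max_tokens": 2_048, "budget_tokens": 1_024},
    "analyst": {"max_tokens": 16_000, "budget_tokens": 8_000},
}

def build_request(role: str, messages: list,
                  model: str = "claude-sonnet-4-6-20250514") -> dict:
    """Build kwargs for client.messages.create() from a role profile."""
    profile = ROLE_PROFILES[role]
    return {
        "model": model,
        "max_tokens": profile["max_tokens"],
        "thinking": {"type": "enabled", "budget_tokens": profile["budget_tokens"]},
        "messages": messages,
    }
```

Because the budget is a ceiling under adaptive reasoning, the analyst profile only pays for deep thinking on requests that need it; the profiles mostly exist to bound worst-case spend per role.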
This isn't always the right call — Opus still outperforms on the most demanding tasks, and Haiku 4 is cheaper for pure speed — but it simplifies your architecture when the workload distribution is broad.
What this means for the exam
The exam doesn't test model version numbers or release-specific features. It tests whether you understand how to select models for different roles in a system, how to manage context effectively, and how to design agent architectures that handle tool use reliably.
That said, adaptive reasoning reinforces a core exam concept: right-sizing compute for each task in a multi-agent pipeline. The exam tests this in Domain 1 — Agentic Architecture, specifically around orchestrator-worker patterns and model selection strategy. Context window management — regardless of window size — is covered in Domain 5 — Context Management.