Anthropic released Claude Sonnet 4.6 in early 2026. It's the current recommended model for most production workloads — faster than Opus, more capable than the previous Sonnet, and with a set of architectural improvements that matter if you're building systems on the Claude API.
Note: The certification exam tests architecture patterns and API usage, not model-specific details. Model updates like Sonnet 4.6 don't change the concepts tested — but understanding model capabilities helps you make better architecture decisions.
Adaptive reasoning
The headline feature is adaptive reasoning — the model dynamically allocates thinking tokens based on task complexity. Previous models used extended thinking with a fixed budget_tokens parameter. You'd set a ceiling and the model would use up to that amount regardless of whether the problem needed it.
With adaptive reasoning, the model scales its thinking effort automatically. A simple classification task might use minimal reasoning tokens, while a multi-step code generation task triggers deeper analysis. The practical result: you pay for the thinking you actually need.
From an API perspective, you still pass budget_tokens as a maximum, but the model now treats it as a ceiling rather than a target:
```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=8192,
    thinking={
        "type": "enabled",
        "budget_tokens": 5000,  # ceiling, not target
    },
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
```
For a simple classification, the model might use 200 thinking tokens. For a complex architectural review, it might use the full 5,000. You get the same interface with better cost efficiency.
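If you want to see how much reasoning a given request actually triggered, the response content includes thinking blocks alongside text blocks when extended thinking is enabled. A minimal sketch, assuming each block is a plain dict shaped like the API's thinking and text content blocks:

```python
def split_content(blocks):
    """Separate thinking from visible text in a response's content blocks.

    Assumes each block is a dict with a "type" key, mirroring the shape of
    the Messages API's thinking/text content blocks.
    """
    thinking = "".join(b.get("thinking", "") for b in blocks if b["type"] == "thinking")
    text = "".join(b.get("text", "") for b in blocks if b["type"] == "text")
    return thinking, text


# Hand-built example blocks (not a real API response):
blocks = [
    {"type": "thinking", "thinking": "The ticket mentions a refund request..."},
    {"type": "text", "text": "Category: billing"},
]
thinking, text = split_content(blocks)
# len(thinking) is a rough proxy for how much reasoning this request spent.
```

Logging that split per request type is a quick way to confirm that simple tasks really are using minimal reasoning before you rely on it in cost projections.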
1M context window beta
Sonnet 4.6 supports a 1 million token context window in beta. The standard context remains 200K, but you can opt into the extended window for workloads that need it — large codebase analysis, long document processing, or multi-agent orchestration where context accumulates quickly.
This doesn't change how you architect context management. The fundamentals still apply: trim tool results, use persistent case facts for critical data, and put key findings at the top of aggregated inputs to avoid the lost-in-the-middle effect. A larger window gives you more room, but it doesn't eliminate the need for disciplined context budgeting.
If anything, the 1M window makes context management skills more important. More room means more temptation to dump everything into context without trimming. The cost of processing 800K tokens of mostly irrelevant data is real, both in latency and spend.
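One concrete habit the larger window does not excuse is trimming oversized tool results before they enter context. A minimal sketch, using a crude character-count heuristic as a stand-in for a real token budget (a tokenizer would give exact counts):

```python
TRIM_MARKER = "\n...[trimmed]...\n"

def trim_tool_result(text: str, max_chars: int = 4000) -> str:
    """Keep the head and tail of an oversized tool result, dropping the middle.

    Head-heavy split, since key findings usually appear early; max_chars is
    a rough character-based proxy for a token budget.
    """
    if len(text) <= max_chars:
        return text
    head = int(max_chars * 0.7)
    tail = max_chars - head
    return text[:head] + TRIM_MARKER + text[-tail:]
```

Keeping both ends rather than truncating blindly preserves headers and summaries at the top and any totals or conclusions at the bottom, which is usually where the signal lives.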
Improved agentic search performance
Sonnet 4.6 shows measurably better performance on agentic search tasks — workflows where the model needs to plan a search strategy, execute multiple tool calls, synthesise results, and decide whether to search further or return an answer.
This matters for architectures that use Claude as a research agent or retrieval-augmented generation (RAG) orchestrator. The model is better at:
- Deciding when it has enough information to stop searching
- Formulating targeted queries rather than broad sweeps
- Cross-referencing results from multiple sources before answering
For architects, the takeaway is that Sonnet 4.6 is a stronger default for agent loops that involve iterative tool use. You may find that retry logic and forced re-search patterns you built for earlier models are no longer necessary.
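The shape of such an agent loop can be sketched as follows. Here `search` and `decide` are stand-ins for a real tool call and a real model call, and the stop condition is the model's own decision to answer rather than a hard-coded retry pattern:

```python
def agentic_search(question, search, decide, max_rounds=4):
    """Iterative search loop: the model chooses the next action each round.

    `decide` returns either {"action": "search", "query": ...} or
    {"action": "answer", "text": ...}; `search` executes one query.
    """
    findings = []
    for _ in range(max_rounds):
        step = decide(question, findings)
        if step["action"] == "answer":
            return step["text"]
        findings.append(search(step["query"]))
    # Budget exhausted: force a final answer from what we have.
    return decide(question, findings, force_answer=True)["text"]
```

The `max_rounds` cap stays even with a model that stops well on its own; it bounds worst-case latency and spend.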
Fewer tokens consumed
Across benchmarks, Sonnet 4.6 produces equivalent or better quality output while consuming fewer tokens. This is a model-level efficiency gain — the same prompt produces a tighter response. For high-volume production workloads, this translates directly to lower API costs.
Combined with adaptive reasoning, the token savings compound. You're spending less on thinking and less on output. If you're running cost projections for a production deployment, benchmark with Sonnet 4.6 directly rather than extrapolating from older model pricing.
Claude Haiku 3 deprecation
Anthropic has announced that Claude Haiku 3 will be deprecated on 19 April 2026. If you're running production workloads on Haiku 3, you need to migrate before that date.
The migration path depends on your use case:
- Latency-sensitive classification and routing: Move to Claude Haiku 4 (the current Haiku generation). It's faster and more accurate on structured tasks.
- General-purpose workloads previously on Haiku 3: Consider whether Sonnet 4.6 with adaptive reasoning is a better fit. For tasks that needed Haiku 3's speed but were limited by its accuracy, Sonnet 4.6's dynamic thinking allocation can deliver better results at a comparable cost on simple tasks.
Update your model strings in API calls, test thoroughly, and deploy before the cutoff. After 19 April, Haiku 3 API calls will return errors.
```python
# Before: Haiku 3
response = client.messages.create(
    model="claude-3-haiku-20240307",  # deprecated 19 April 2026
    ...
)

# After: Haiku 4 (for speed) or Sonnet 4.6 (for capability)
response = client.messages.create(
    model="claude-haiku-4-20250514",  # speed-optimised
    ...
)
```
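A small guard in deployment code can catch deprecated model strings before the cutoff rather than at runtime. This sketch covers only the two model IDs used in this section; extend the mapping for your own inventory:

```python
DEPRECATED_MODELS = {
    # deprecated ID -> suggested replacement
    "claude-3-haiku-20240307": "claude-haiku-4-20250514",
}

def resolve_model(model_id: str, strict: bool = False) -> str:
    """Swap deprecated model IDs for their replacements.

    With strict=True, raise instead, so deprecated IDs fail loudly in CI
    before they fail silently in production after the cutoff.
    """
    if model_id in DEPRECATED_MODELS:
        replacement = DEPRECATED_MODELS[model_id]
        if strict:
            raise ValueError(f"{model_id} is deprecated; use {replacement}")
        return replacement
    return model_id
```

Running `resolve_model(..., strict=True)` over every model string in your config as a CI check turns a deprecation deadline into a failing test instead of a production incident.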
Model selection in multi-agent systems
Sonnet 4.6's adaptive reasoning opens up a cleaner pattern for multi-agent architectures. Instead of using different models for different agent roles (Haiku for routing, Sonnet for analysis, Opus for synthesis), you can standardise on Sonnet 4.6 and let adaptive reasoning handle the complexity scaling.
A routing agent that just classifies user intent uses minimal thinking tokens. An analysis agent working through a complex problem uses more. Same model, different compute profiles, one deployment to manage.
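One way to express that pattern is a per-role request profile on a single model. The budget values here are illustrative, not recommendations; the 1,024 floor reflects the API's documented minimum thinking budget at the time of writing:

```python
# Illustrative budgets: same model, different compute profiles per role.
ROLE_PROFILES = {
    "router":  {"max_tokens": 2_048, "budget_tokens": 1_024},
    "analyst": {"max_tokens": 16_000, "budget_tokens": 8_000},
}

def build_request(role: str, messages: list,
                  model: str = "claude-sonnet-4-6-20250514") -> dict:
    """Build kwargs for client.messages.create() from a role profile."""
    profile = ROLE_PROFILES[role]
    return {
        "model": model,
        "max_tokens": profile["max_tokens"],
        "thinking": {"type": "enabled", "budget_tokens": profile["budget_tokens"]},
        "messages": messages,
    }
```

Because the budget is a ceiling under adaptive reasoning, the analyst profile only pays for deep thinking on requests that need it; the profiles mostly exist to bound worst-case spend per role.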
This isn't always the right call — Opus still outperforms on the most demanding tasks, and Haiku 4 is cheaper for pure speed — but it simplifies your architecture when the workload distribution is broad.
What this means for the exam
The exam doesn't test model version numbers or release-specific features. It tests whether you understand how to select models for different roles in a system, how to manage context effectively, and how to design agent architectures that handle tool use reliably.
That said, adaptive reasoning reinforces a core exam concept: right-sizing compute for each task in a multi-agent pipeline. The exam tests this in Domain 1 — Agentic Architecture, specifically around orchestrator-worker patterns and model selection strategy. Context window management — regardless of window size — is covered in Domain 5 — Context Management.