Context Window Management in Claude: Avoiding the Summarisation Trap

Claude Certification Guide · 9 min read
Context Management & Reliability

Every message you send to Claude includes the full conversation history. The longer the conversation runs, the bigger that payload gets, and eventually you hit the context limit. So you summarise older turns to free up space. Sounds reasonable. The problem is that summarisation quietly destroys the exact data your system needs to do its job.

The patterns below are how you manage context without losing critical information. They come up repeatedly on the Claude Certified Architect exam because they separate working prototypes from production systems that actually hold up.

The progressive summarisation trap

Picture this. A customer support agent handles a session with two open issues. The conversation grows, the system summarises older turns to stay within the context limit, and after summarisation:

Before: "Customer wants a refund of $247.83 for order #8891 placed on 3 March"

After: "Customer wants a refund for a recent order"

The amount, the order number, the date. Gone. The agent starts referring to "your recent refund request" instead of the specific transaction. The customer notices. Trust evaporates.

That's the progressive summarisation trap. When you condense conversation history, numerical values, dates, percentages, and customer-stated expectations get compressed into vague summaries. Summarisation models optimise for semantic compression, not factual preservation. They'll keep the gist and drop the specifics every time.

The fix: persistent case facts

Pull transactional facts into a structured block that lives outside the summarised history and gets included in every prompt:

python
case_facts = {
    "customer_id": "CUST-44891",
    "issues": [
        {
            "order_id": "#8891",
            "amount": "$247.83",
            "date": "2026-03-03",
            "status": "refund_requested",
            "product": "Wireless headphones"
        },
        {
            "order_id": "#9102",
            "amount": "$89.99",
            "date": "2026-03-10",
            "status": "delivery_delayed",
            "expected_delivery": "2026-03-15"
        }
    ]
}

This block gets prepended to the system prompt on every turn. You can summarise the conversation history as aggressively as you like because the hard facts live separately:

python
import json

system_prompt = f"""You are a customer support agent.

CASE FACTS (always reference these exact values):
{json.dumps(case_facts, indent=2)}

CONVERSATION SUMMARY:
{summarised_history}
"""

Now the model always has the exact amounts, dates, and order numbers, no matter how much you compress the conversation history. This pattern is the core of Task 5.1 — Context Window Management.
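One way to keep the case facts block populated is to extract transactional values from each turn before it ever reaches the summariser. The sketch below is a minimal, hypothetical helper (`extract_facts` is not part of any SDK) using simple patterns for order numbers, dollar amounts, and ISO dates; a production system would likely use tool calls or stricter parsing.

```python
import re

def extract_facts(turn_text: str) -> dict:
    """Pull order numbers, dollar amounts, and ISO dates out of a
    conversation turn before it gets summarised away."""
    return {
        "order_ids": re.findall(r"#\d{4,}", turn_text),
        "amounts": re.findall(r"\$\d+(?:\.\d{2})?", turn_text),
        "dates": re.findall(r"\d{4}-\d{2}-\d{2}", turn_text),
    }

facts = extract_facts(
    "Customer wants a refund of $247.83 for order #8891 placed on 2026-03-03"
)
```

Run this on every inbound turn and merge the results into `case_facts`; the summariser can then be as lossy as it likes.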

Tool result trimming

Your order lookup API returns 40+ fields. Shipping address, billing address, warehouse ID, internal tracking codes, tax breakdowns, audit timestamps. Your agent needs five of them: order ID, status, amount, product name, and expected delivery date.

If you append the full API response to conversation history, you're burning tokens on irrelevant data and crowding out the information that actually matters.

python
def trim_order_result(raw_result: dict) -> dict:
    """Extract only the fields the agent needs."""
    return {
        "order_id": raw_result["order_id"],
        "status": raw_result["status"],
        "total_amount": raw_result["total_amount"],
        "product_name": raw_result["items"][0]["name"] if raw_result.get("items") else None,
        "expected_delivery": raw_result.get("expected_delivery"),
    }

Do this trimming before you append tool results to conversation history. The model gets clean, focused data instead of a wall of fields it doesn't need.
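To show where the trimming sits in the loop, here is a sketch of the trimmed payload going back as a `tool_result` block in the Messages API format. The `raw_result` fields and the `tool_use_id` value are illustrative; the trim is inlined (same fields as `trim_order_result` above) so the sketch stands alone.

```python
import json

# What the order API might return (dozens of fields in practice)
raw_result = {
    "order_id": "#8891",
    "status": "refund_requested",
    "total_amount": "$247.83",
    "items": [{"name": "Wireless headphones", "sku": "WH-220"}],
    "warehouse_id": "W-12",
    "internal_audit_ts": "2026-03-04T09:11:00Z",
}

# Same fields trim_order_result keeps, inlined so the sketch stands alone
trimmed = {
    "order_id": raw_result["order_id"],
    "status": raw_result["status"],
    "total_amount": raw_result["total_amount"],
    "product_name": raw_result["items"][0]["name"],
    "expected_delivery": raw_result.get("expected_delivery"),
}

# Only the trimmed payload goes back into the conversation
tool_result_message = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": "toolu_01_example",  # id from the assistant's tool_use block
        "content": json.dumps(trimmed),
    }],
}
```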

When to trim aggressively

Trim when:

  • The tool returns structured data with dozens of fields
  • You only care about a handful of those fields for the current task
  • The same tool gets called multiple times per conversation (each call eats context)

Keep the full output when:

  • The user might ask follow-up questions about any field
  • The tool returns unstructured text you can't reliably summarise
  • You're debugging or auditing and completeness matters

The lost-in-the-middle effect

Language models are good at processing information near the beginning and end of long inputs. Content buried in the middle? It gets less attention. This is well-documented in the research literature, and it matters a lot when you're aggregating outputs from multiple tools or agents.

What not to do: Dump all tool results sequentially into one long context block.

What works: Put a summary of key findings at the top, then include the detailed results below:

python
aggregated_input = f"""KEY FINDINGS SUMMARY:
- Order #8891: Refund approved, processing in 3-5 business days
- Order #9102: Delivery delayed, new ETA 2026-03-20

DETAILED RESULTS:
{detailed_tool_output_1}

{detailed_tool_output_2}

{detailed_tool_output_3}
"""

The summary at the top means the model hits the important stuff first. Even if it pays less attention to the detailed sections in the middle, the key facts have already been processed.

Explicit section headers help too. They act as retrieval cues that let the model jump to specific information instead of scanning through a wall of text.
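The summary-first layout with section headers can be generated rather than hand-written. `build_aggregated_input` below is a hypothetical helper that takes (header, one-line finding, detailed output) tuples and emits both layers.

```python
def build_aggregated_input(results: list[tuple[str, str, str]]) -> str:
    """results: (section header, one-line key finding, detailed output)."""
    summary = "\n".join(f"- {finding}" for _, finding, _ in results)
    details = "\n\n".join(
        f"=== {header} ===\n{detail}" for header, _, detail in results
    )
    return f"KEY FINDINGS SUMMARY:\n{summary}\n\nDETAILED RESULTS:\n{details}"

prompt = build_aggregated_input([
    ("Order #8891", "Refund approved, 3-5 business days", "...full refund log..."),
    ("Order #9102", "Delivery delayed, new ETA 2026-03-20", "...full shipping log..."),
])
```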

Upstream agent optimisation

In multi-agent systems, upstream agents love to return everything: full reasoning chains, intermediate calculations, raw data dumps. If a downstream agent has a limited context budget, all that verbosity is wasted tokens.

The fix is straightforward. Require upstream agents to return structured output with only the fields the downstream agent needs:

python
# Instead of: verbose reasoning + raw data
upstream_output_bad = """
I searched through the database and found several matching records.
After comparing the dates and cross-referencing with the shipping
logs, I determined that order #8891 was shipped on March 4th from
warehouse W-12 via carrier DHL with tracking number 1234567890...
(500 more words of reasoning)
"""

# Return structured data with metadata
upstream_output_good = {
    "order_id": "#8891",
    "ship_date": "2026-03-04",
    "carrier": "DHL",
    "tracking": "1234567890",
    "status": "in_transit",
    "confidence": "verified",
    "source": "shipping_database"
}

The structured version uses a fraction of the tokens and preserves every fact. The downstream agent gets exactly what it needs without inheriting 500 words of reasoning it'll never use.
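Requirements like this are easier to enforce with a contract check at the boundary. A minimal sketch, assuming the two agents have agreed a fixed field list (`REQUIRED_FIELDS` is illustrative): reject incomplete output and strip anything extra before it reaches the downstream context.

```python
REQUIRED_FIELDS = {"order_id", "ship_date", "carrier", "tracking", "status"}

def validate_upstream_output(payload: dict) -> dict:
    """Reject incomplete upstream output and strip verbose extras
    before they reach the downstream agent's context."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"upstream output missing fields: {sorted(missing)}")
    # Drop anything the downstream agent did not ask for
    return {k: payload[k] for k in REQUIRED_FIELDS}

clean = validate_upstream_output({
    "order_id": "#8891", "ship_date": "2026-03-04", "carrier": "DHL",
    "tracking": "1234567890", "status": "in_transit",
    "reasoning": "I searched through the database and ...",  # discarded
})
```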

This is covered in Task 5.3 — Long Conversations.

Prompt caching

Your system prompt, tool definitions, and reference data don't change between requests. So why reprocess them every time? Claude's prompt caching lets you mark content as cacheable so it's processed once and reused:

python
message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": long_system_prompt,
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)

Cache the system prompt, tool descriptions, and any reference data that stays constant. Conversation history changes on every turn, so don't cache that.

The result is lower latency and lower cost on production workloads. See Task 5.2 — Prompt Caching for the full breakdown.

Exam patterns

The exam tests context management through applied scenarios. Here's what to watch for:

  1. "The agent says 'your recent order' instead of the specific order number." That's the progressive summarisation trap. Fix it with a persistent case facts block.

  2. "Context fills up after 8 tool calls." Verbose tool results eating tokens. Trim to relevant fields before appending.

  3. "Findings from the third data source are missing from the final report." Lost-in-the-middle effect. Put a key findings summary at the top.

  4. "Increase context window size" as a distractor. This just postpones the problem. The exam wants structural fixes (case facts, trimming, placement), not capacity increases.

  5. "Instruct the model to preserve numbers during summarisation." Don't fall for this one. Prompt-based summarisation instructions don't guarantee factual preservation. Extract facts structurally instead.

Work through the complete Context Management domain for all six task statements, or test yourself with the practice questions.