Domain 5
Task 5.1

Context Window Management


What You Need to Know

Context window management is the foundation of reliable Claude-based systems. Every multi-turn conversation, every multi-agent pipeline, and every long-document extraction task depends on how well you manage what goes into the context window. Get this wrong and your customer support agent forgets refund amounts, your research pipeline drops citations, and your extraction system loses precision on the very fields that matter.

The Progressive Summarisation Trap

When conversations grow long, a common strategy is to summarise earlier turns to free up token budget. This is a trap. Progressive summarisation systematically destroys the most critical information in customer-facing and data-processing systems: numerical values, dates, percentages, and customer-stated expectations.

Consider a real example. A customer contacts support about a refund:

Turn 3: "I'd like a refund of $247.83 for order #8891 placed on March 3rd"

After summarisation, this becomes:

Summary: "Customer wants a refund for a recent order"

The amount, order number, and date — the three facts the agent needs to process the refund — are gone. This is not a hypothetical failure mode; it is the default behaviour of summarisation applied to transactional data.

The fix: persistent case facts blocks. Extract transactional facts (amounts, dates, order numbers, statuses) into a structured block that is included in every prompt, outside the summarised history. This block is never summarised. It persists across every turn regardless of what happens to the conversation history.

```json
{
  "caseFactsBlock": {
    "customerId": "C-4421",
    "issues": [
      {
        "orderId": "#8891",
        "orderDate": "2024-03-03",
        "refundAmount": "$247.83",
        "status": "pending_refund",
        "itemDescription": "Wireless headphones — defective"
      }
    ]
  }
}
```

For multi-issue sessions where a customer raises several problems in one conversation, extract and persist structured issue data into a separate context layer. Each issue gets its own entry with order IDs, amounts, and statuses. This prevents cross-contamination between issues during summarisation.
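A multi-issue layer can be maintained with a small helper that appends each new issue as its own structured entry. This is a sketch: the field names mirror the `caseFactsBlock` example above, and `add_issue` itself is an illustrative name, not part of any SDK.

```python
def add_issue(case_facts, order_id, order_date, amount, status, description):
    """Append a structured issue entry to the persistent case facts block.

    Each issue keeps its own identifiers, so summarising the chat history
    can never blend one issue's amount or status with another's.
    """
    case_facts.setdefault("issues", []).append({
        "orderId": order_id,
        "orderDate": order_date,
        "refundAmount": amount,
        "status": status,
        "itemDescription": description,
    })
    return case_facts

facts = {"customerId": "C-4421"}
add_issue(facts, "#8891", "2024-03-03", "$247.83", "pending_refund",
          "Wireless headphones - defective")
add_issue(facts, "#9002", "2024-03-10", "$59.00", "open",
          "Charging cable - wrong item shipped")
```

Because each issue lives in its own entry, the agent can reference "the #8891 refund" and "the #9002 replacement" unambiguously, no matter how the surrounding conversation is compressed.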

The "Lost in the Middle" Effect

Models process information at the beginning and end of long inputs reliably. Findings buried in the middle of a long context may be missed or given less weight. This is a well-documented phenomenon in large language models and it directly affects how you structure aggregated inputs.

The fix is structural, not prompt-based. Place key findings summaries at the beginning of aggregated inputs. Organise detailed results with explicit section headers throughout. If you are feeding a synthesis agent the output of three research subagents, start with a "Key Findings Summary" section, then provide the detailed outputs with clear section boundaries.

```
## Key Findings Summary

- Source A: 12% market growth in renewable sector (2023)
- Source B: Patent filings increased 34% year-on-year
- Source C: Regulatory framework delayed until Q3 2025

## Detailed Findings

### Source A: Market Analysis Report
[Full details here...]

### Source B: Patent Database Analysis
[Full details here...]

### Source C: Regulatory Review
[Full details here...]
```
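An aggregate like this can be built programmatically. The sketch below assumes each subagent returns a dict with `title`, `summary`, and `details` keys; that shape, and the section titles, are illustrative.

```python
def aggregate_findings(sources):
    """Combine subagent outputs with a key-findings summary placed first.

    `sources` is a list of dicts with 'title', 'summary', and 'details'.
    The summary section goes at the top, where the model attends reliably.
    """
    lines = ["## Key Findings Summary"]
    for s in sources:
        lines.append(f"- {s['title']}: {s['summary']}")
    lines.append("\n## Detailed Findings")
    for s in sources:
        lines.append(f"\n### {s['title']}\n{s['details']}")
    return "\n".join(lines)
```

The point of the function is ordering: every detail also appears in miniature at the top, so nothing critical sits only in the middle of the context.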

Tool Result Trimming

Tool results are a silent context budget killer. An order lookup might return 40+ fields: internal audit timestamps, warehouse codes, shipping carrier IDs, fulfilment centre identifiers, and dozens of other fields irrelevant to the customer's refund request. You need 5 fields. Those other 35 fields consume tokens in every subsequent turn as the conversation history grows.

Trim verbose tool outputs to only relevant fields before they accumulate in context. This is not optional optimisation — it is essential for multi-turn systems where tool results stack up across the conversation.

```python
def trim_order_result(raw_result, relevant_fields=None):
    """Keep only the fields relevant to the current task.

    An order lookup may return 40+ fields; the refund flow needs five.
    """
    if relevant_fields is None:
        relevant_fields = [
            "order_id", "order_date", "total_amount",
            "return_eligible", "item_description",
        ]
    return {k: v for k, v in raw_result.items() if k in relevant_fields}
```

This trimming should happen in a PostToolUse hook or in the tool implementation itself, before the result enters the conversation history. Once verbose data is in the context, it stays there for every subsequent turn.
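If your framework does not expose hooks, the same effect can come from wrapping the tool itself so that untrimmed data never reaches the model. This is a minimal sketch: `with_trimming` and `lookup_order` are hypothetical names, and the raw fields shown stand in for a real 40-field response.

```python
def with_trimming(tool_fn, relevant_fields):
    """Wrap a tool so its result is trimmed before it enters context."""
    def wrapped(*args, **kwargs):
        raw = tool_fn(*args, **kwargs)
        return {k: v for k, v in raw.items() if k in relevant_fields}
    return wrapped

def lookup_order(order_id):
    # Hypothetical raw tool: returns internal fields the agent never needs.
    return {"order_id": order_id, "total_amount": "$247.83",
            "warehouse_code": "WH-7", "audit_ts": "2024-03-04T02:11:09Z"}

lookup_order_trimmed = with_trimming(lookup_order, ["order_id", "total_amount"])
```

Registering the wrapped version as the agent's tool guarantees the trimming happens exactly once, at the boundary, rather than relying on later cleanup.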

Full Conversation History

The Claude API is stateless. Each request must include the complete conversation history. If you omit earlier messages, the model loses conversational coherence. There is no session state on the server side — every turn must include everything the model needs to understand the full conversation.

This creates a tension with context limits: you need the full history for coherence, but the history grows with every turn. The persistent case facts block resolves this by separating critical facts from summarisable narrative, letting you summarise the conversation flow while preserving every transactional detail.
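A prompt builder that applies this separation might prepend the never-summarised block before the (possibly summarised) history. A sketch follows; the section delimiters and function name are assumptions, not an API.

```python
import json

def build_prompt(case_facts, summarised_history, current_turn):
    """Assemble a turn: case facts first, then history, then the new message.

    The case facts block survives verbatim no matter how aggressively the
    history between it and the current turn is compressed.
    """
    return "\n\n".join([
        "## Case Facts (do not summarise)\n" + json.dumps(case_facts, indent=2),
        "## Conversation So Far\n" + summarised_history,
        "## Current Message\n" + current_turn,
    ])
```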

Upstream Agent Optimisation

In multi-agent systems, upstream agents often return verbose reasoning chains and raw content that downstream agents do not need. When a research subagent sends its full thought process to a synthesis agent with a limited context budget, the synthesis agent wastes tokens on reasoning it cannot use.

Modify upstream agents to return structured data — key facts, citations, relevance scores — instead of verbose content and reasoning chains. Require subagents to include metadata (dates, source locations, methodological context) in structured outputs to support accurate downstream synthesis.

```json
{
  "findings": [
    {
      "claim": "Renewable energy investment grew 12% in 2023",
      "source": "IEA World Energy Report 2024",
      "sourceUrl": "https://example.com/report",
      "relevanceScore": 0.92,
      "publicationDate": "2024-01-15"
    }
  ]
}
```

This is not just about saving tokens. Structured outputs from upstream agents enable downstream agents to process findings without re-parsing verbose prose.
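For example, a synthesis agent can filter and rank these structured findings directly, with no prose parsing. The helper below is a sketch; the 0.8 threshold is illustrative.

```python
def select_findings(findings, min_score=0.8):
    """Keep only findings relevant enough for synthesis, highest score first.

    Works on the structured output shape above: each finding carries a
    'relevanceScore' the upstream agent assigned.
    """
    kept = [f for f in findings if f["relevanceScore"] >= min_score]
    return sorted(kept, key=lambda f: f["relevanceScore"], reverse=True)
```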

Key Concept

The persistent case facts block is the single most important pattern in context window management. Extract transactional facts (amounts, dates, order numbers) into a structured block that is included in every prompt and never summarised. This is the fix for progressive summarisation and the foundation for reliable multi-turn systems.

Exam Traps

EXAM TRAP

Thinking progressive summarisation is safe for transactional data

Summarisation systematically destroys numerical values, dates, and specific identifiers. A persistent case facts block must hold these outside summarised history.

EXAM TRAP

Assuming the 'lost in the middle' effect is solved by telling the model to pay attention to everything

The fix is structural: place key findings at the beginning of inputs and use explicit section headers. Prompt-based reminders are unreliable for position effects.

EXAM TRAP

Keeping full tool results in context because 'the model might need them later'

Untrimmed tool results from 40+ field lookups exhaust the token budget across turns. Trim to relevant fields before results enter the conversation history.

EXAM TRAP

Believing conversation history can be selectively truncated without consequences

The API is stateless. Each request needs complete conversation history. Selective truncation breaks conversational coherence. Use case facts blocks and summarisation instead of truncation.

Practice Scenario

A customer support agent handles a multi-issue session. After several turns, the agent refers to 'your recent refund request' instead of the specific $247.83 refund for order #8891. The conversation history is being summarised between turns to manage context length. What is the most effective fix?

Build Exercise

Build a Persistent Case Facts Context Manager

Intermediate
45 minutes

What you'll learn

  • Implement the persistent case facts block pattern to protect transactional data from summarisation
  • Trim verbose tool results to relevant fields before they accumulate in context
  • Recognise and mitigate the progressive summarisation trap for numerical values, dates, and identifiers
  • Apply the lost-in-the-middle mitigation by placing key findings at the beginning of aggregated inputs
  • Understand that the Claude API is stateless and each request must include complete conversation history
  1. Create a case facts extractor that identifies transactional data (amounts, dates, order numbers, statuses) from tool results

    Why: The persistent case facts block is the single most important pattern in context window management. Extracting transactional facts into a structured block that is never summarised prevents the progressive summarisation trap from destroying critical numerical values and identifiers.

    You should see: A function that takes raw tool output and returns a structured object containing only the transactional facts: customer ID, order numbers, amounts, dates, and statuses. Non-transactional narrative content should be excluded.

  2. Implement a persistent case facts block that is prepended to every prompt, outside summarised history

    Why: The case facts block must persist across every turn regardless of what happens to the conversation history. It sits outside the summarised portion of the context, ensuring amounts, dates, and order numbers survive even when earlier conversation turns are compressed.

    You should see: A prompt construction function that always includes the case facts block at the top of every message, followed by any summarised history, followed by the current turn. The case facts block should be clearly delimited with a section header.

  3. Build a tool result trimmer that filters order lookup responses from 40+ fields to only the 5 relevant return-related fields

    Why: Untrimmed tool results are a silent context budget killer. An order lookup returning 40+ fields consumes tokens in every subsequent turn as conversation history grows. Trimming to relevant fields before results enter context is essential, not optional.

    You should see: A trimming function that takes a raw tool result object and returns only the fields needed for the current task. The trimmed result should be 80-90% smaller than the original.

  4. Test with a multi-turn conversation where summarisation occurs and verify that transactional facts survive intact across all turns

    Why: This validates that the persistent case facts pattern actually works. The exam tests whether you understand that progressive summarisation destroys specific amounts and dates, and the case facts block is the fix. You need to verify this empirically.

    You should see: A 6-8 turn conversation where summarisation occurs after turn 4. After summarisation, the agent should still reference the exact refund amount ($247.83), order number (#8891), and date (March 3rd) from the case facts block. Without the block, these values would be lost to summarisation.

  5. Add key findings placement logic that positions summaries at the beginning of aggregated inputs to mitigate the lost-in-the-middle effect

    Why: Models process information at the beginning and end of long inputs reliably, but findings buried in the middle may be missed. Placing key findings summaries at the start of aggregated inputs is a structural fix for this well-documented phenomenon.

    You should see: An aggregation function that places a Key Findings Summary section at the top of combined inputs, followed by detailed results with explicit section headers. The key findings should be concise bullet points drawn from the detailed content.
