Domain 1
Task 1.4

Claude Agent SDK

What You Need to Know

Task Statement 1.4 draws a hard line between two approaches to controlling agent behaviour: prompt-based guidance and programmatic enforcement. The exam tests this distinction repeatedly, and getting it wrong on high-stakes scenarios will cost you marks.

The Enforcement Spectrum

There are two fundamentally different ways to enforce workflow ordering in an agentic system:

Prompt-based guidance means including instructions in the system prompt. For example: "Always verify the customer's identity before processing a refund." This works most of the time — perhaps 90-95% of cases. But it has a non-zero failure rate. The model is probabilistic. Sometimes it will skip steps, reorder them, or interpret instructions loosely. For low-stakes operations, this failure rate is acceptable.

Programmatic enforcement means implementing hooks, prerequisite gates, or code-level checks that physically block downstream tools until prerequisites complete. For example: the process_refund tool cannot execute until get_customer has returned a verified customer ID. This works every time. It is deterministic, not probabilistic. No matter what the model decides to do, the gate prevents the wrong execution order.

Key Concept

Prompt-based guidance is probabilistic — it works most of the time. Programmatic enforcement is deterministic — it works every time. The exam decision rule: if a single failure would cause financial loss, security breach, or compliance violation, use programmatic enforcement.

The Exam Decision Rule

The exam applies a consistent decision rule across multiple scenarios:

  • Financial operations (refunds, transfers, payments): programmatic enforcement. A single unverified refund to the wrong account is a financial loss.
  • Security operations (identity verification, access control): programmatic enforcement. A single bypass of identity verification is a security breach.
  • Compliance operations (AML checks, regulatory requirements): programmatic enforcement. A single missed compliance check can result in legal penalties.
  • Low-stakes operations (formatting preferences, style guidelines, output ordering): prompt-based guidance is acceptable. A formatting inconsistency is not a business risk.
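As a minimal sketch, the decision rule above can be expressed as a lookup. The category names and the `required_enforcement` helper are illustrative, not part of any SDK:

```python
# Hypothetical helper illustrating the exam decision rule: map an
# operation's risk category to the required enforcement mechanism.
HIGH_STAKES = {"financial", "security", "compliance"}

def required_enforcement(category: str) -> str:
    """High-stakes categories demand programmatic enforcement;
    everything else may use prompt-based guidance."""
    return "programmatic" if category in HIGH_STAKES else "prompt"
```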

The exam will present prompt-based solutions as answer options for high-stakes scenarios. Reject them. Enhanced system prompts, few-shot examples, and stronger instructions all improve accuracy, but none provides a deterministic guarantee. When the scenario involves money, security, or compliance, the answer is always programmatic enforcement.

Exam Trap

The exam consistently presents "add stronger instructions to the system prompt" or "include few-shot examples showing the correct workflow" as distractors for high-stakes scenarios. These answers improve probability but do not eliminate the failure rate. For financial, security, and compliance operations, only programmatic enforcement is correct.

Prerequisite Gates in Practice

A prerequisite gate is a programmatic check that blocks a tool from executing until a prior condition is met. In a customer support agent:

  1. The agent has access to get_customer, lookup_order, and process_refund tools.
  2. A prerequisite gate checks: has get_customer returned a verified customer ID for this session?
  3. If yes, process_refund executes normally.
  4. If no, process_refund returns an error message: "Cannot process refund — customer identity not verified. Please call get_customer first."

The gate is code, not a prompt instruction. The model cannot bypass it by deciding to skip verification. Even if the model attempts to call process_refund directly, the gate blocks the call and returns an error that forces the model to verify identity first.
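A minimal sketch of such a gate in Python, assuming a simple session object. `SessionState`, the stubbed `get_customer`, and the tool handlers are illustrative, not SDK APIs:

```python
# Sketch of a prerequisite gate: process_refund is blocked in code until
# get_customer has verified the customer in the current session.
class SessionState:
    def __init__(self):
        self.verified_customer_id = None  # set by get_customer on success

def get_customer(session, email):
    # A real agent would query a CRM here; this stub simulates verification.
    session.verified_customer_id = "cust_123"
    return {"customer_id": session.verified_customer_id, "verified": True}

def process_refund(session, amount):
    # The gate: a code-level check the model cannot bypass.
    if session.verified_customer_id is None:
        return {"error": ("Cannot process refund — customer identity not "
                          "verified. Please call get_customer first.")}
    return {"status": "refunded",
            "customer_id": session.verified_customer_id,
            "amount": amount}
```

Calling `process_refund` before `get_customer` returns the blocking error every time; after verification the same call succeeds. The gate's reliability comes from the `if` statement, not from anything the model decides.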

Multi-Concern Request Handling

Customers frequently submit requests with multiple issues: "I want to return my order, update my shipping address, and ask about my loyalty points." The exam tests how agents should handle these compound requests.

The correct approach:

  1. Decompose the request into distinct items (return, address update, loyalty inquiry).
  2. Investigate each in parallel using shared context (the customer's account information is relevant to all three).
  3. Synthesise a unified resolution that addresses all items in a single response.

The wrong approach is to handle them sequentially with separate conversations, or to address only the first item and forget the rest.
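The decompose, investigate-in-parallel, synthesise flow can be sketched with `asyncio`. The `investigate` stub and the function names are hypothetical:

```python
import asyncio

# Sketch of multi-concern handling: decompose the request into items,
# investigate each concurrently with shared customer context, then
# synthesise one unified resolution.
async def investigate(item: str, customer: dict) -> str:
    await asyncio.sleep(0)  # stand-in for a tool call using shared context
    return f"{item}: resolved for {customer['id']}"

async def handle_request(items, customer):
    # Every concern is investigated concurrently, none is dropped.
    results = await asyncio.gather(*(investigate(i, customer) for i in items))
    return "\n".join(results)  # unified resolution covering all items

findings = asyncio.run(handle_request(
    ["return", "address update", "loyalty inquiry"], {"id": "cust_123"}))
```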

Structured Handoff Protocols

When an agent cannot resolve an issue and must escalate to a human agent, the handoff must follow a structured protocol. The critical constraint: the human agent does NOT have access to the conversation transcript. They cannot scroll through the chat history to understand the issue.

A proper handoff summary must be self-contained and include:

  • Customer ID — so the human agent can pull up the account.
  • Conversation summary — what the customer asked for and what has been attempted.
  • Root cause analysis — the agent's assessment of the underlying issue.
  • Refund amount (if applicable) — the specific financial figure, not a vague reference.
  • Recommended action — what the agent believes the human agent should do.

This summary is the only information the human agent receives. If it is incomplete, the human agent must ask the customer to repeat everything, creating a poor experience.
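One way to make the five-field requirement concrete is a small dataclass with validation. This is a sketch; the class and field names are illustrative:

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Sketch of a self-contained handoff summary mirroring the five required
# fields; refund_amount is optional because not every escalation is a refund.
@dataclass
class HandoffSummary:
    customer_id: str
    conversation_summary: str
    root_cause: str
    recommended_action: str
    refund_amount: Optional[float] = None

    def validate(self):
        # Reject empty fields: the human agent sees nothing but this object.
        for name, value in asdict(self).items():
            if name != "refund_amount" and not value:
                raise ValueError(f"handoff field '{name}' must not be empty")
        return self
```

The `validate` call is the programmatic analogue of the gate idea: an incomplete handoff is rejected in code rather than trusted to prompt instructions.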

Practical Example: The 8% Failure Rate

Production data shows a customer support agent processes refunds without verifying account ownership in 8% of cases. The system prompt instructs: "Always verify the customer's identity before processing any refund." The prompt works 92% of the time but fails 8% of the time.

The 8% failure rate has already resulted in refunds processed on wrong accounts. This is a financial operation with real monetary consequences.

The fix is a programmatic prerequisite gate. Before process_refund can execute, the system checks that get_customer has returned a verified customer ID in the current session. This eliminates the 8% failure rate entirely — not by improving the prompt, but by physically preventing the incorrect execution order.

Exam Traps

EXAM TRAP

Enhanced system prompt instructions as the fix for high-stakes compliance failures

If the current prompt already instructs the correct workflow but fails 8% of the time, a stronger prompt might reduce failures to 3-4% but will never reach 0%. Financial, security, and compliance operations require programmatic enforcement for deterministic guarantees.

EXAM TRAP

Few-shot examples as sufficient for guaranteed compliance

Few-shot examples improve model behaviour but are still probabilistic. They cannot provide the 100% enforcement required for financial and compliance operations. Use programmatic prerequisite gates.

EXAM TRAP

Routing classifiers proposed to fix per-agent compliance issues

A routing classifier determines which agent handles a request. The compliance failure occurs within the agent execution sequence, not at the routing level. Classifiers handle routing, not per-agent workflow enforcement.

EXAM TRAP

Handoff summaries that omit critical fields like customer ID or recommended action

Human agents do not have access to the conversation transcript. The handoff summary must be self-contained with all required fields: customer ID, conversation summary, root cause analysis, refund amount, and recommended action.

Practice Scenario

Production data reveals that in 8% of cases, a customer support agent processes refunds without verifying account ownership, occasionally leading to refunds on wrong accounts. The system prompt clearly states "always verify customer identity before processing refunds." What is the most appropriate fix?

Build Exercise

Build a Prerequisite Gate for Financial Operations

Advanced
60 minutes

What you'll learn

  • Why programmatic enforcement is required for financial operations instead of prompt-based guidance
  • How prerequisite gates physically block tool execution until preconditions are met
  • The difference between the 8% prompt failure rate and 0% gate failure rate
  • How to implement structured handoff protocols with all required fields
  • How multi-concern requests should be decomposed and handled in parallel
  1. Create a customer support agent with three tools: get_customer (returns customer ID and verification status), lookup_order (returns order details), and process_refund (processes a refund for a given amount)

    Why: These three tools create the exact scenario the exam uses for the 8% failure rate question. The workflow dependency between get_customer and process_refund is where programmatic enforcement becomes essential.

    You should see: Three tool definitions with proper JSON Schema input_schema. get_customer accepts a name or email, lookup_order accepts an order ID, and process_refund accepts a customer ID and amount.

  2. Implement a programmatic prerequisite gate that blocks process_refund from executing until get_customer has returned a verified customer ID in the current session

    Why: This is the core exam concept: prompt instructions work 92% of the time but fail 8%. A prerequisite gate provides 100% deterministic enforcement. The exam always rejects prompt-based solutions for financial operations.

    You should see: A session-level state tracker that records whether get_customer has returned a verified customer. The process_refund handler checks this state before executing and returns an error if verification has not occurred.

  3. Test that the gate works by prompting the agent to skip verification and process a refund directly — verify the gate blocks the attempt

    Why: Testing the bypass attempt demonstrates the difference between prompt-based and programmatic enforcement. Even when the model decides to skip verification, the gate blocks the action — which is the entire point of deterministic enforcement.

    You should see: The agent attempts to call process_refund without prior verification. The gate returns a blocked error message. The agent then calls get_customer before retrying the refund successfully.

  4. Implement a structured handoff protocol: when the agent cannot resolve an issue, it compiles a self-contained summary with customer ID, conversation summary, root cause analysis, refund amount, and recommended action

    Why: Human agents do NOT have access to the conversation transcript. The handoff summary is the only information they receive. The exam tests whether you include all five required fields: customer ID, summary, root cause, amount, and recommended action.

    You should see: A handoff function that produces a structured object with all five fields populated. No field should be empty or contain placeholder text.

  5. Test the handoff with a multi-concern request (return plus billing dispute plus account update) and verify the handoff summary is complete and self-contained

    Why: Multi-concern requests test whether the agent decomposes the request into distinct items and addresses all of them. The exam expects decomposition, parallel investigation, and unified resolution — not sequential handling or forgetting items.

    You should see: The agent identifies all three concerns, investigates each one, and produces a handoff summary that covers all three issues with specific details for each. No concern is omitted.
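Step 1's tool definitions might look like this, assuming the Anthropic Messages API tool format of `name`, `description`, and `input_schema`. The parameter names are illustrative:

```python
# Sketch of the three tool definitions from step 1. Each tool declares a
# JSON Schema input_schema; the prerequisite gate from step 2 would live
# in the handler for process_refund, not in these declarations.
TOOLS = [
    {
        "name": "get_customer",
        "description": "Look up and verify a customer by name or email.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
    {
        "name": "lookup_order",
        "description": "Return order details for an order ID.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
    {
        "name": "process_refund",
        "description": "Process a refund for a verified customer.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "amount": {"type": "number"},
            },
            "required": ["customer_id", "amount"],
        },
    },
]
```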
