Domain 4
Task 4.3

Prompt Chaining


What You Need to Know

Production extraction systems fail. Documents have unexpected formats, numerical values do not add up, and fields end up in the wrong places. The question is not whether failures occur but how your system responds to them. This task statement covers the validation-retry pattern that turns extraction failures into self-correcting workflows.

Retry-with-Error-Feedback

The correct retry pattern sends three pieces of information back to the model:

  1. The original document — so the model has the source to re-examine
  2. The failed extraction — so the model can see what it produced
  3. The specific validation error — so the model knows exactly what went wrong
```typescript
// Retry with error feedback
const retryMessages = [
  {
    role: "user",
    content: `Original document:\n${originalDocument}\n\n` +
      `Your extraction:\n${JSON.stringify(failedExtraction)}\n\n` +
      `Validation error: Line items sum to £450 but stated_total is £500. ` +
      `Please re-extract, ensuring all line items are captured.`
  }
];
```

This is dramatically more effective than naive retries. Without the specific error, the model has no guidance for what to fix and typically produces the same mistake. With the error, the model can target its self-correction — re-examining the document for missed line items, checking field placement, or recalculating totals.
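A minimal sketch of wrapping this feedback in a bounded retry loop. The `callModel` and `validate` functions are injected placeholders, not a specific SDK's API:

```typescript
// Bounded validation-retry loop. `callModel` and `validate` are injected
// so the pattern stays independent of any particular SDK.
type Extraction = Record<string, unknown>;
type Message = { role: string; content: string };
type ModelCall = (messages: Message[]) => Extraction;
type Validator = (extraction: Extraction) => string[]; // empty array = valid

function buildRetryMessage(
  originalDocument: string,
  failedExtraction: Extraction,
  validationError: string
): Message {
  // All three pieces of feedback: source, failed output, specific error.
  return {
    role: "user",
    content:
      `Original document:\n${originalDocument}\n\n` +
      `Your extraction:\n${JSON.stringify(failedExtraction)}\n\n` +
      `Validation error: ${validationError}\n` +
      `Please re-extract, correcting this error.`,
  };
}

function extractWithRetry(
  document: string,
  callModel: ModelCall,
  validate: Validator,
  maxRetries = 2
): { extraction: Extraction; errors: string[] } {
  let messages: Message[] = [{ role: "user", content: `Extract from:\n${document}` }];
  let extraction = callModel(messages);
  let errors = validate(extraction);

  for (let attempt = 0; attempt < maxRetries && errors.length > 0; attempt++) {
    messages = [buildRetryMessage(document, extraction, errors.join("; "))];
    extraction = callModel(messages);
    errors = validate(extraction);
  }
  // Non-empty errors after the loop means: stop retrying, escalate to review.
  return { extraction, errors };
}
```

Capping retries matters: a failure that survives two targeted corrections is usually unfixable and should go to human review rather than burn further model calls.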

The Retry Effectiveness Boundary

This is the concept the exam tests most aggressively in this task statement. Retries have a clear effectiveness boundary:

Retries ARE effective for:

  • Format mismatches (wrong date format, inconsistent currency notation)
  • Structural output errors (values in wrong fields, incorrect nesting)
  • Misplaced values (data that exists in the document but was extracted into the wrong field)
  • Mathematical errors (the model missed a line item affecting the total)

Retries are NOT effective for:

  • Information genuinely absent from the source document
  • Data that exists only in an external document not provided to the model
  • Fields requiring knowledge the model does not have

The exam presents both scenarios and expects you to identify which is fixable. If a document genuinely does not contain a department name, no amount of retrying will produce a correct value. The correct action is to flag the extraction for human review or return null (if the schema allows it).
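That triage step can be made explicit in code. The sketch below is illustrative (the error categories and function names are not exam-prescribed), but it captures the boundary: only errors the model can fix by re-reading the document are retried.

```typescript
// Illustrative triage: decide whether a validation failure is worth a retry.
type FailureKind = "fixable" | "unfixable";

interface ValidationFailure {
  field: string;
  reason:
    | "format_mismatch"     // wrong date format, currency notation
    | "wrong_field"         // value exists but was misplaced
    | "sum_mismatch"        // line items do not add up
    | "missing_from_source"; // information genuinely absent
}

function classifyFailure(failure: ValidationFailure): FailureKind {
  // Only "missing_from_source" is beyond the model's reach: no amount of
  // re-reading produces data the document does not contain.
  return failure.reason === "missing_from_source" ? "unfixable" : "fixable";
}

function routeFailure(failure: ValidationFailure): string {
  return classifyFailure(failure) === "fixable"
    ? `retry with error feedback for ${failure.field}`
    : `flag ${failure.field} for human review (or return null)`;
}
```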

Self-Correction Flow Design

Rather than relying solely on external validation logic, you can build self-correction into the extraction schema itself:

calculated_total vs stated_total: Extract both the sum the model calculates from individual line items and the total stated in the document. When these differ, you have an automatic discrepancy flag without external logic.

```json
{
  "line_items": [
    { "description": "Widget A", "amount": 150.00 },
    { "description": "Widget B", "amount": 300.00 }
  ],
  "calculated_total": 450.00,
  "stated_total": 500.00,
  "total_discrepancy": true
}
```

conflict_detected booleans: Add boolean fields that flag when the source document contains contradictory information. For example, if a document states "payment due: 30 days" in one section but "payment terms: net 60" in another, the model should extract both and set conflict_detected: true rather than silently picking one.
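For the payment-terms example, the extracted output might look like the sketch below; the exact field names (`payment_terms_candidates` in particular) are illustrative, not a fixed schema:

```json
{
  "payment_terms_candidates": ["payment due: 30 days", "payment terms: net 60"],
  "conflict_detected": true
}
```

The key point is that both conflicting values survive into the output, so downstream logic or a human reviewer can resolve the contradiction instead of inheriting a silent guess.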

detected_pattern Fields

For code review and analysis pipelines, add detected_pattern fields to structured findings. This tracks which specific code construct triggered each finding.

```json
{
  "finding": "Potential SQL injection vulnerability",
  "severity": "critical",
  "detected_pattern": "string concatenation in SQL query",
  "file": "user_service.py",
  "line": 42
}
```

When developers dismiss findings, you can analyse dismissal patterns by detected_pattern. If developers consistently dismiss findings triggered by "variable shadowing in nested scope," that pattern likely needs prompt refinement. This creates a systematic improvement loop: extract, validate, collect dismissal data, refine prompts, repeat.
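The analysis step can be sketched as a simple aggregation over findings; the shapes and function name below are illustrative, assuming each finding carries a `dismissed` flag from developer feedback:

```typescript
// Rank detected_pattern values by dismissal rate to surface which
// patterns most need prompt refinement.
interface Finding {
  detected_pattern: string;
  dismissed: boolean;
}

interface PatternStats {
  pattern: string;
  total: number;
  dismissalRate: number;
}

function rankPatternsByDismissal(findings: Finding[]): PatternStats[] {
  const byPattern = new Map<string, { total: number; dismissed: number }>();
  for (const f of findings) {
    const stats = byPattern.get(f.detected_pattern) ?? { total: 0, dismissed: 0 };
    stats.total++;
    if (f.dismissed) stats.dismissed++;
    byPattern.set(f.detected_pattern, stats);
  }
  // Highest dismissal rate first: top of the list = refinement priority.
  return Array.from(byPattern.entries())
    .map(([pattern, s]) => ({
      pattern,
      total: s.total,
      dismissalRate: s.dismissed / s.total,
    }))
    .sort((a, b) => b.dismissalRate - a.dismissalRate);
}
```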

Schema Syntax Errors vs Semantic Validation Errors

The exam distinguishes between these two error categories:

Schema syntax errors — Malformed JSON, missing required fields, wrong data types. Eliminated entirely by tool_use with JSON schemas (covered in Task Statement 4.2).

Semantic validation errors — Correct JSON structure but incorrect values. Line items that do not sum, dates that precede each other incorrectly, values in wrong fields. These require validation logic outside the schema and are the focus of retry loops.

The overlap between these task statements is intentional. The exam tests whether you understand that tool_use solves the first category but not the second.
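A minimal semantic validator for the invoice example might look like this; the structure is assumed already valid (tool_use handles that), so only the values are checked. The shape and message wording are illustrative:

```typescript
// Semantic validation: structure is correct, so check the values.
interface Invoice {
  line_items: { description: string; amount: number }[];
  stated_total: number;
}

function validateInvoice(invoice: Invoice): string[] {
  const errors: string[] = [];
  const calculated = invoice.line_items.reduce((sum, li) => sum + li.amount, 0);
  // Tolerance guards against floating-point rounding on currency amounts.
  if (Math.abs(calculated - invoice.stated_total) > 0.005) {
    // Specific, actionable message: expected vs found, ready for retry feedback.
    errors.push(
      `Line items sum to ${calculated} but stated_total is ${invoice.stated_total}`
    );
  }
  return errors;
}
```

The returned strings feed directly into the retry message, which is why they state expected versus found rather than a bare "validation failed".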

Key Concept

Retry-with-error-feedback works by sending the original document, the failed extraction, and the specific validation error. Retries fix format and structural errors but cannot create information absent from the source document. Always identify whether a failure is fixable before retrying.

Exam Traps

EXAM TRAP

Assuming retries always work for extraction failures

Retries fix format mismatches, structural errors, and misplaced values. They cannot produce information genuinely absent from the source document. The exam presents both fixable and unfixable scenarios — you must distinguish them.

EXAM TRAP

Implementing retries without including the specific validation error

Naive retries without error feedback produce the same mistakes. The model needs to see exactly what went wrong (e.g., 'line items sum to £450 but stated total is £500') to self-correct effectively.

EXAM TRAP

Relying on schema validation alone without semantic checks

Schema validation (via tool_use) catches syntax errors. Semantic errors — wrong sums, misplaced values, fabricated data — require validation logic and retry loops.

Practice Scenario

Your extraction pipeline validates that line item amounts sum to the stated total. For Document A, the calculated sum is £450 but the stated total is £500. For Document B, the 'department' field is missing entirely from the source text. Which retry strategy is correct?

Build Exercise

Build a Validation-Retry Loop for Document Extraction

Advanced
60 minutes

What you'll learn

  • Implement the retry-with-error-feedback pattern: original document + failed extraction + specific validation error
  • Distinguish fixable errors (format, structural, mathematical) from unfixable errors (absent information)
  • Design self-correction schemas with calculated_total vs stated_total and conflict_detected booleans
  • Build systematic improvement loops using detected_pattern fields and dismissal tracking
  • Understand the boundary between schema syntax errors (eliminated by tool_use) and semantic validation errors (require retry loops)
  1. Define an extraction tool with calculated_total and stated_total fields, a conflict_detected boolean, and detected_pattern fields for tracking which constructs trigger findings

    Why: Self-correction fields like calculated_total vs stated_total enable automatic discrepancy detection without external logic. conflict_detected booleans and detected_pattern fields create the data foundation for systematic prompt improvement.

    You should see: A JSON schema with separate calculated_total and stated_total number fields, a total_discrepancy boolean, a conflict_detected boolean, and a detected_pattern string field on each finding in the line_items array.

  2. Implement validation logic that checks: field completeness, numerical consistency (calculated sum matches stated total), enum validity, and date ordering

    Why: Semantic validation catches errors that tool_use cannot. The exam distinguishes schema syntax errors (eliminated by tool_use) from semantic errors (wrong sums, misplaced values) that require validation logic and retry loops.

    You should see: A validation function that returns an array of specific, actionable error messages. Each error should state what was expected versus what was found, not just that validation failed.

  3. Build the retry loop: on validation failure, construct a follow-up message containing the original document, the failed extraction, and the specific validation error

    Why: Retry-with-error-feedback is dramatically more effective than naive retries. Without the specific error, the model has no guidance and typically reproduces the same mistake. With the error, the model can target its self-correction.

    You should see: A retry message that includes all three elements: the original document text, the JSON of the failed extraction, and the specific validation error string. The model should produce a corrected extraction on retry.

  4. Test with 5 documents: 2 with fixable errors (misplaced values, wrong totals) and 3 with unfixable errors (absent information) — verify the loop retries only fixable cases

    Why: The retry effectiveness boundary is the most aggressively tested concept in this task statement. Retries fix format mismatches and structural errors but cannot create information absent from the source. The exam presents both scenarios and expects you to identify which is fixable.

    You should see: The 2 fixable documents succeed after 1-2 retries with corrected totals or field placements. The 3 unfixable documents are correctly identified as having absent information and flagged for human review rather than retried.

  5. Log detected_pattern data for each finding and analyse which patterns are most frequently dismissed to identify prompt refinement priorities

    Why: detected_pattern fields create a systematic improvement loop. When developers consistently dismiss findings triggered by a specific pattern, that pattern likely needs prompt refinement. This turns dismissal data into actionable prompt improvement priorities.

    You should see: A log or table showing each detected_pattern, its frequency, its dismissal rate, and a prioritised list of patterns needing prompt refinement. Patterns with high dismissal rates should be at the top.
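As a starting point for step 1, the extraction tool might be defined along these lines. It follows the JSON Schema shape that Anthropic tool definitions accept, but the tool name and exact field layout here are illustrative:

```json
{
  "name": "record_invoice_extraction",
  "description": "Record structured data extracted from an invoice",
  "input_schema": {
    "type": "object",
    "properties": {
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": { "type": "string" },
            "amount": { "type": "number" },
            "detected_pattern": { "type": "string" }
          },
          "required": ["description", "amount"]
        }
      },
      "calculated_total": { "type": "number" },
      "stated_total": { "type": "number" },
      "total_discrepancy": { "type": "boolean" },
      "conflict_detected": { "type": "boolean" }
    },
    "required": ["line_items", "calculated_total", "stated_total"]
  }
}
```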
