Prompt Engineering & Structured Output

6 build exercises to practise the concepts in this domain.

4.1 Build an Explicit Criteria Code Review Prompt

Intermediate
45 minutes
  • Understand why vague instructions ("be conservative", "high-confidence only") fail in production prompts
  • Design explicit categorical criteria that define what to flag and what to skip
  • Calibrate severity levels using concrete code examples rather than prose descriptions
  • Measure false positive rates and apply the trust recovery strategy of disabling problematic categories
  • Recognise the hierarchy: explicit criteria first, confidence-based routing second
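The explicit-criteria and trust-recovery ideas above can be sketched in a few lines. The category names, rules, and the 0.3 false-positive threshold below are illustrative, not part of the exercise:

```python
# Sketch of an explicit-criteria review prompt with trust recovery.
# Category names, rules, and the threshold are illustrative assumptions.

REVIEW_CATEGORIES = {
    "sql_injection": "Flag string-concatenated SQL queries; skip parameterised queries.",
    "hardcoded_secrets": "Flag literal API keys or passwords; skip references to env vars.",
    "unbounded_loops": "Flag loops whose exit condition never changes; skip iterator-driven loops.",
}

def build_review_prompt(categories: dict[str, str]) -> str:
    """Assemble a prompt from explicit what-to-flag / what-to-skip criteria."""
    rules = "\n".join(f"- {name}: {rule}" for name, rule in categories.items())
    return (
        "Review the code below. Flag ONLY issues matching these categories:\n"
        f"{rules}\n"
        "Do not report anything outside these categories."
    )

def disable_noisy_categories(fp_rates: dict[str, float], threshold: float = 0.3) -> dict[str, str]:
    """Trust recovery: drop categories whose measured false positive rate exceeds the threshold."""
    return {k: v for k, v in REVIEW_CATEGORIES.items() if fp_rates.get(k, 0.0) <= threshold}
```

Explicit categories come first; only once these are measured against real reviews does confidence-based routing become worth adding.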

4.2 Build a Structured Extraction Tool with JSON Schema

Intermediate
45 minutes
  • Design JSON schemas with optional/nullable fields to prevent fabrication of missing data
  • Understand the three tool_choice modes (auto, any, forced) and when to use each
  • Recognise that tool_use eliminates syntax errors but not semantic errors
  • Apply schema design patterns: an explicit "unclear" enum value for ambiguous cases, an "other" option paired with a detail string, and format normalisation
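A minimal sketch of these patterns in the Anthropic tools format. The field names (invoice_number, payment_method, and so on) are illustrative, not a required schema:

```python
# Sketch of an extraction tool definition; field names are illustrative.

extract_invoice = {
    "name": "extract_invoice",
    "description": "Extract structured fields from an invoice. Use null when a field is absent.",
    "input_schema": {
        "type": "object",
        "properties": {
            # Nullable field: absent data stays null rather than being fabricated
            "invoice_number": {"type": ["string", "null"]},
            "payment_method": {
                "type": "string",
                # "unclear" is an honest escape hatch; "other" pairs with a detail string
                "enum": ["card", "bank_transfer", "cash", "other", "unclear"],
            },
            "payment_method_detail": {"type": ["string", "null"]},
            # Format normalisation expressed in the field description
            "issue_date": {"type": ["string", "null"], "description": "Normalise to YYYY-MM-DD"},
        },
        "required": ["invoice_number", "payment_method", "payment_method_detail", "issue_date"],
    },
}

# The three tool_choice modes:
tool_choice_auto = {"type": "auto"}                              # model decides whether to call a tool
tool_choice_any = {"type": "any"}                                # model must call some tool
tool_choice_forced = {"type": "tool", "name": "extract_invoice"} # model must call this specific tool
```

Forcing the tool guarantees syntactically valid JSON, but a model can still fill the schema with wrong values, which is the semantic-error boundary the exercise explores.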

4.3 Build a Validation-Retry Loop for Document Extraction

Advanced
60 minutes
  • Implement the retry-with-error-feedback pattern: original document + failed extraction + specific validation error
  • Distinguish fixable errors (format, structural, mathematical) from unfixable errors (absent information)
  • Design self-correction schemas with calculated_total vs stated_total and conflict_detected booleans
  • Build systematic improvement loops using detected_pattern fields and dismissal tracking
  • Understand the boundary between schema syntax errors (eliminated by tool_use) and semantic validation errors (require retry loops)

4.4 Build a Few-Shot Enhanced Extraction Prompt

Intermediate
45 minutes
  • Identify the three triggers for deploying few-shot examples: inconsistent formatting, ambiguous judgement calls, and empty fields for existing data
  • Construct effective few-shot examples with reasoning, not just input-output pairs
  • Use 2-4 targeted examples covering the specific failing scenarios
  • Distinguish when few-shot examples are the right technique versus schema changes or validation loops
  • Measure the impact of few-shot examples on empty field rates and format consistency
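A sketch of assembling a few-shot prompt where each example carries reasoning rather than a bare input-output pair. The two examples below are illustrative:

```python
# Sketch of a few-shot prompt builder; the examples are illustrative.

FEW_SHOT_EXAMPLES = [
    {
        "input": "Invoice total: twelve hundred dollars",
        "reasoning": "The amount is written in words; normalise it to a number.",
        "output": '{"total": 1200}',
    },
    {
        "input": "Total: see attached schedule",
        "reasoning": "The value is referenced but not present; emit null rather than guessing.",
        "output": '{"total": null}',
    },
]

def build_few_shot_prompt(task: str, examples: list[dict], document: str) -> str:
    """Interleave task, worked examples with reasoning, and the new document."""
    parts = [task]
    for ex in examples:
        parts.append(f"Input: {ex['input']}\nReasoning: {ex['reasoning']}\nOutput: {ex['output']}")
    parts.append(f"Input: {document}\nOutput:")
    return "\n\n".join(parts)
```

Note that both examples target specific failure modes (format inconsistency, fields left empty for present data), matching the 2-4 targeted-examples guideline above.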

4.5 Design a Batch Processing Strategy

Intermediate
45 minutes
  • Classify workflows by latency requirement: blocking (synchronous) versus latency-tolerant (batch-eligible)
  • Use the Message Batches API with custom_id fields for request-response correlation
  • Implement failure handling that resubmits only failed documents with targeted modifications
  • Calculate batch submission frequency against SLA constraints accounting for the 24-hour processing window
  • Apply the prompt refinement workflow: sample set testing before full batch submission
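The custom_id correlation and failure-only resubmission steps can be sketched without calling the API. The request shape follows the Message Batches API (custom_id plus params); the model name and document IDs are illustrative:

```python
# Sketch of batch request construction and failure-only resubmission.
# Result handling is stubbed; model name and IDs are illustrative.

def build_batch_requests(documents: dict[str, str], prompt_template: str) -> list[dict]:
    """One batch request per document, keyed by custom_id for later correlation."""
    return [
        {
            "custom_id": doc_id,  # correlates each response back to its source document
            "params": {
                "model": "claude-sonnet-4-5",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt_template.format(doc=text)}],
            },
        }
        for doc_id, text in documents.items()
    ]

def failed_ids(results: list[dict]) -> list[str]:
    """Collect custom_ids whose result did not succeed, for targeted resubmission."""
    return [r["custom_id"] for r in results if r["result"]["type"] != "succeeded"]
```

Resubmitting only `failed_ids` keeps retries cheap, and the same helper supports the refinement workflow: run a small sample batch, inspect failures, adjust the prompt, then submit the full set.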

4.6 Build a Multi-Pass Code Review System

Advanced
60 minutes
  • Understand why self-review in the same session retains reasoning context and is less effective than independent review
  • Design multi-pass review architectures with per-file local analysis and cross-file integration passes
  • Identify and mitigate attention dilution in large multi-file reviews
  • Implement confidence-based routing with calibrated thresholds from labelled validation sets
  • Distinguish uncalibrated raw confidence from calibrated thresholds suitable for automated routing
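Calibrating a routing threshold from a labelled validation set can be sketched as a small search. Each record pairs the model's raw confidence with a human label of whether the finding was real; the 0.9 target precision is an illustrative choice:

```python
# Sketch of threshold calibration from a labelled validation set.
# The target precision is an illustrative assumption.

def calibrate_threshold(validation: list[tuple[float, bool]], target_precision: float = 0.9) -> float:
    """Lowest confidence threshold whose auto-routed findings meet the target precision."""
    candidates = sorted({conf for conf, _ in validation})
    for threshold in candidates:
        routed = [label for conf, label in validation if conf >= threshold]
        if routed and sum(routed) / len(routed) >= target_precision:
            return threshold
    return 1.0  # nothing meets the bar: route everything to human review

def route(confidence: float, threshold: float) -> str:
    """Route a finding using the calibrated threshold, never the raw score alone."""
    return "auto_accept" if confidence >= threshold else "human_review"
```

The point of the calibration step is that a raw confidence of, say, 0.8 means nothing until a labelled set shows what precision that score actually delivers.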