Prompt Engineering & Structured Output
6 build exercises to practise the concepts in this domain.
4.1 — Build an Explicit Criteria Code Review Prompt
Intermediate
45 minutes
- Understand why vague instructions ("be conservative", "high-confidence only") fail in production prompts
- Design explicit categorical criteria that define what to flag and what to skip
- Calibrate severity levels using concrete code examples rather than prose descriptions
- Measure false positive rates and apply the trust recovery strategy of disabling problematic categories
- Recognise the hierarchy: explicit criteria first, confidence-based routing second
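The objectives above can be sketched as a prompt builder that encodes explicit categories (what to flag, what to skip, a concrete example) and supports disabling a category as the trust recovery step. The criteria, category names, and helper functions here are all hypothetical, illustrative choices, not the exercise's required implementation:

```python
# Hypothetical explicit-criteria catalogue: each category defines severity,
# what to flag, what to skip, and a concrete code example (not prose).
CRITERIA = {
    "sql_injection": {
        "severity": "critical",
        "flag": "String concatenation or f-strings building SQL queries",
        "skip": "Parameterised queries and ORM query builders",
        "example": 'cursor.execute(f"SELECT * FROM users WHERE id = {uid}")',
    },
    "naming_style": {
        "severity": "low",
        "flag": "Names that contradict what the code actually does",
        "skip": "Purely stylistic preferences (camelCase vs snake_case)",
        "example": "def get_user(): delete_user_record()",
    },
}

def build_review_prompt(criteria: dict, disabled: set = frozenset()) -> str:
    """Assemble a review prompt from explicit categories, omitting any
    category disabled after it produced too many false positives."""
    lines = ["Review the code. Flag ONLY the categories below; skip everything else."]
    for name, c in criteria.items():
        if name in disabled:
            continue  # trust recovery: drop the noisy category entirely
        lines.append(f"- {name} [{c['severity']}]")
        lines.append(f"  Flag: {c['flag']}")
        lines.append(f"  Skip: {c['skip']}")
        lines.append(f"  Example to flag: {c['example']}")
    return "\n".join(lines)

prompt = build_review_prompt(CRITERIA, disabled={"naming_style"})
```

Disabling a category rebuilds the prompt without it, so measured false positive rates map directly to a one-line configuration change.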
4.2 — Build a Structured Extraction Tool with JSON Schema
Intermediate
45 minutes
- Design JSON schemas with optional/nullable fields to prevent fabrication of missing data
- Understand the three tool_choice modes (auto, any, forced) and when to use each
- Recognise that tool_use eliminates syntax errors but not semantic errors
- Apply schema design patterns: an unclear enum value for ambiguous cases, other plus a detail string for unanticipated values, and format normalisation
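A minimal sketch of these schema patterns in the Anthropic tools format (name, description, JSON Schema input_schema). The record_invoice tool and its fields are invented for illustration; only the overall tool shape and the three tool_choice modes follow the API:

```python
# Hypothetical extraction tool: nullable/optional fields give the model an
# honest way to report absence instead of fabricating missing data.
EXTRACT_TOOL = {
    "name": "record_invoice",
    "description": "Record fields extracted from an invoice. Use null when a field is absent; never guess.",
    "input_schema": {
        "type": "object",
        "properties": {
            "invoice_number": {"type": ["string", "null"]},
            # Nullable AND absent from `required`: missing dates stay null.
            "due_date": {"type": ["string", "null"], "description": "ISO 8601 (YYYY-MM-DD)"},
            "payment_method": {
                "type": "string",
                # "other" (paired with a detail string) catches unanticipated
                # values; "unclear" catches genuinely ambiguous cases.
                "enum": ["card", "bank_transfer", "cheque", "other", "unclear"],
            },
            "payment_method_detail": {"type": ["string", "null"]},
        },
        "required": ["invoice_number", "payment_method"],
    },
}

# The three tool_choice modes:
TOOL_CHOICE_AUTO = {"type": "auto"}                               # model may answer in prose instead
TOOL_CHOICE_ANY = {"type": "any"}                                 # model must call some tool
TOOL_CHOICE_FORCED = {"type": "tool", "name": "record_invoice"}   # model must call this tool
```

The schema guarantees well-formed output (no syntax errors), but a null due_date when the date is actually present is a semantic error the schema cannot catch.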
4.3 — Build a Validation-Retry Loop for Document Extraction
Advanced
60 minutes
- Implement the retry-with-error-feedback pattern: original document + failed extraction + specific validation error
- Distinguish fixable errors (format, structural, mathematical) from unfixable errors (absent information)
- Design self-correction schemas with calculated_total vs stated_total and conflict_detected booleans
- Build systematic improvement loops using detected_pattern fields and dismissal tracking
- Understand the boundary between schema syntax errors (eliminated by tool_use) and semantic validation errors (require retry loops)
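The retry-with-error-feedback pattern above can be sketched as follows. The call_model argument stands in for a real extraction call; here it is a stub so the loop is runnable, and the field names (line_items, stated_total) are illustrative:

```python
def validate(extraction: dict) -> list:
    """Return semantic validation errors that tool_use cannot catch,
    e.g. a calculated total that conflicts with the stated total."""
    errors = []
    calc = sum(item["amount"] for item in extraction.get("line_items", []))
    if abs(calc - extraction.get("stated_total", 0)) > 0.01:
        errors.append(
            f"calculated_total {calc:.2f} != stated_total "
            f"{extraction['stated_total']:.2f}; re-check the line items"
        )
    return errors

def extract_with_retry(document: str, call_model, max_retries: int = 2) -> dict:
    """Retry with the original document, the failed extraction, and the
    specific validation error as feedback."""
    feedback = None
    extraction = call_model(document, feedback)
    for _ in range(max_retries):
        errors = validate(extraction)
        if not errors:
            break
        feedback = f"Previous attempt: {extraction}\nValidation errors: {errors}"
        extraction = call_model(document, feedback)
    return extraction

# Stub model: the first pass drops a line item; the retry, seeing the
# error feedback, returns the complete extraction.
def stub_model(document: str, feedback):
    if feedback is None:
        return {"line_items": [{"amount": 40.0}], "stated_total": 100.0}
    return {"line_items": [{"amount": 40.0}, {"amount": 60.0}], "stated_total": 100.0}

result = extract_with_retry("invoice text...", stub_model)
```

Note the loop only helps with fixable errors; if the information is absent from the document, no number of retries will produce it, which is why nullable fields matter upstream.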
4.4 — Build a Few-Shot Enhanced Extraction Prompt
Intermediate
45 minutes
- Identify the three triggers for deploying few-shot examples: inconsistent formatting, ambiguous judgement calls, and fields left empty despite the data being present in the source
- Construct effective few-shot examples with reasoning, not just input-output pairs
- Use 2-4 targeted examples covering the specific failing scenarios
- Distinguish when few-shot examples are the right technique versus schema changes or validation loops
- Measure the impact of few-shot examples on empty field rates and format consistency
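A sketch of constructing few-shot examples that carry reasoning rather than bare input-output pairs. The example content and helper name are hypothetical, chosen to target the two failure modes named above (empty fields and inconsistent formats):

```python
# Each example targets a specific observed failure and explains WHY the
# output is what it is, not just what it is.
FEW_SHOT_EXAMPLES = [
    {
        "input": "Payment terms: Net 30",
        "reasoning": "No explicit date is given, but 'Net 30' is a payment "
                     "term, so payment_terms must be filled, not left empty.",
        "output": {"due_date": None, "payment_terms": "net_30"},
    },
    {
        "input": "Due: 3/4/2024",
        "reasoning": "Day/month order is ambiguous; this document is "
                     "UK-sourced, so normalise to ISO as 2024-04-03.",
        "output": {"due_date": "2024-04-03", "payment_terms": None},
    },
]

def render_examples(examples: list) -> str:
    """Render 2-4 targeted examples into a prompt section."""
    blocks = []
    for ex in examples:
        blocks.append(
            f"Input: {ex['input']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Output: {ex['output']}"
        )
    return "Examples:\n\n" + "\n\n".join(blocks)

prompt_section = render_examples(FEW_SHOT_EXAMPLES)
```

Measuring empty-field rates before and after adding the examples tells you whether few-shot was the right lever, or whether the problem actually called for a schema change or a validation loop.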
4.5 — Design a Batch Processing Strategy
Intermediate
45 minutes
- Classify workflows as blocking (synchronous) or latency-tolerant (batch-eligible) based on latency requirements
- Use the Message Batches API with custom_id fields for request-response correlation
- Implement failure handling that resubmits only failed documents with targeted modifications
- Calculate batch submission frequency against SLA constraints accounting for the 24-hour processing window
- Apply the prompt refinement workflow: sample set testing before full batch submission
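The custom_id correlation and selective-resubmission steps can be sketched as below. The result shape loosely mirrors Message Batches results (a custom_id plus a result with a type field), but it is simplified and simulated here with no network calls, and the helper names are invented:

```python
def partition_results(results: list) -> tuple:
    """Split batch results into successes keyed by custom_id and the
    list of custom_ids that need resubmission."""
    succeeded, failed_ids = {}, []
    for r in results:
        if r["result"]["type"] == "succeeded":
            succeeded[r["custom_id"]] = r["result"]["message"]
        else:  # errored, expired, or cancelled
            failed_ids.append(r["custom_id"])
    return succeeded, failed_ids

def build_resubmission(failed_ids, requests_by_id, modify):
    """Resubmit ONLY the failed documents, each with a targeted
    modification rather than a blanket prompt change."""
    return [
        {"custom_id": cid, "params": modify(requests_by_id[cid])}
        for cid in failed_ids
    ]

# Simulated batch results and the original requests, keyed by custom_id.
results = [
    {"custom_id": "doc-1", "result": {"type": "succeeded", "message": "ok"}},
    {"custom_id": "doc-2", "result": {"type": "errored"}},
]
requests_by_id = {"doc-1": {"prompt": "p1"}, "doc-2": {"prompt": "p2"}}

succeeded, failed = partition_results(results)
retry_batch = build_resubmission(
    failed, requests_by_id,
    modify=lambda p: {**p, "prompt": p["prompt"] + " Return null for absent fields."},
)
```

The same correlation logic supports the SLA arithmetic: if a batch can take up to 24 hours, the submission cadence plus one resubmission round must still fit inside the deadline.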
4.6 — Build a Multi-Pass Code Review System
Advanced
60 minutes
- Understand why same-session self-review retains the original reasoning context, making it less effective than independent review
- Design multi-pass review architectures with per-file local analysis and cross-file integration passes
- Identify and mitigate attention dilution in large multi-file reviews
- Implement confidence-based routing with calibrated thresholds from labelled validation sets
- Distinguish uncalibrated raw confidence from calibrated thresholds suitable for automated routing
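The calibration step above can be sketched as a search over a labelled validation set for the lowest threshold that meets a target precision. The data, target, and function names are illustrative assumptions, not the exercise's prescribed method:

```python
def calibrate_threshold(labelled: list, target_precision: float = 0.95) -> float:
    """Find the lowest confidence threshold whose auto-accepted findings
    reach the target precision on a labelled validation set.
    `labelled` holds (raw_confidence, finding_was_correct) pairs."""
    candidates = sorted({conf for conf, _ in labelled})
    for t in candidates:
        accepted = [ok for conf, ok in labelled if conf >= t]
        if accepted and sum(accepted) / len(accepted) >= target_precision:
            return t
    return 1.01  # nothing qualifies: route everything to human review

def route(confidence: float, threshold: float) -> str:
    """Confidence-based routing using the calibrated threshold, never the
    raw uncalibrated score on its own."""
    return "auto_accept" if confidence >= threshold else "human_review"

# Hypothetical labelled validation set: (confidence, was the finding correct?)
validation = [(0.95, True), (0.9, True), (0.85, True), (0.8, False),
              (0.7, True), (0.6, False), (0.5, False)]
threshold = calibrate_threshold(validation)  # 0.85 for this data
```

The key distinction the exercise draws is visible here: 0.8 looks like high raw confidence, but the labelled set shows findings at that level are not reliable enough to auto-accept.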