What You Need to Know
The Message Batches API is a cost optimisation tool with hard constraints that the exam tests directly. Understanding when to use it — and when not to — is the core of this task statement.
Message Batches API: The Facts
These are the non-negotiable constraints:
- 50% cost savings compared to synchronous API calls
- Up to 24-hour processing window — results may arrive in minutes or take up to 24 hours
- No guaranteed latency SLA — you cannot rely on results arriving within any specific timeframe
- No multi-turn tool calling within a single batch request — the model cannot execute tools mid-request and use the results to continue processing
- custom_id fields for correlating request/response pairs — each request in a batch gets a unique identifier used to match it with its response
The Matching Rule
This is the single most tested concept from this task statement:
Synchronous API: For blocking workflows where someone or something is waiting for the result. Pre-merge checks in CI/CD, real-time code review feedback, any workflow where developers are blocked pending completion.
Batch API: For latency-tolerant workflows where results are consumed later. Overnight technical debt reports, weekly code audit summaries, nightly test generation runs, batch document extraction.
The exam specifically presents a scenario (Question 11 in the sample questions) where a manager proposes switching everything to batch processing for the cost savings. The correct answer keeps blocking workflows synchronous and only moves latency-tolerant workflows to batch.
// Synchronous — developer is waiting for this
const preMergeReview = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{ role: "user", content: prDiffContent }]
});
// Batch — results consumed tomorrow morning
const batchRequest = await client.messages.batches.create({
requests: technicalDebtDocuments.map((doc, i) => ({
custom_id: `debt-report-${i}`,
params: {
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [{ role: "user", content: doc }]
}
}))
});
SLA Calculation
When designing batch processing schedules, you must account for the 24-hour maximum processing window. If your organisation requires a 30-hour SLA for a report:
- 30-hour total SLA minus 24-hour maximum batch processing = 6 hours of buffer
- Any batch must be submitted at least 24 hours before the deadline; a later submission may still be processing when the SLA expires
- Submitting batches every 4-6 hours guarantees that at least one submission lands inside the 6-hour buffer, meeting the SLA with margin
The exam may present a scheduling question where you need to work backwards from the SLA to determine submission frequency.
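The working-backwards arithmetic can be sketched as a small helper. This is an illustrative sketch; the function names are ours, not part of any SDK:

```typescript
// Work backwards from an SLA to a submission schedule, given the
// 24-hour maximum batch processing window.
const MAX_BATCH_HOURS = 24;

// Hours of slack left after worst-case batch processing.
function submissionBufferHours(slaHours: number): number {
  return slaHours - MAX_BATCH_HOURS;
}

// A submission cadence is safe when at least one batch is guaranteed
// to be submitted inside the buffer, leaving the full 24 hours for
// processing before the deadline.
function scheduleIsSafe(slaHours: number, submitEveryHours: number): boolean {
  return submitEveryHours <= submissionBufferHours(slaHours);
}
```

For a 30-hour SLA this yields a 6-hour buffer, so a 4-6 hour submission cadence is safe while an 8-hour cadence is not.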
Batch Failure Handling
Not all documents in a batch succeed. The correct failure handling pattern has three steps:
1. Identify failures by custom_id. Each request has a unique identifier. Parse the batch results to find which custom_id values failed.
2. Resubmit only failures with modifications. Do not resubmit the entire batch. Common modifications include:
- Chunking oversized documents that exceeded context limits
- Simplifying extraction prompts for documents with unusual structures
- Adding format-specific few-shot examples for documents that failed due to structural variety
3. Refine prompts on a sample set BEFORE batch processing. This is the proactive step that maximises first-pass success and reduces resubmission costs. Test your prompts against a representative sample (5-10 documents covering the range of formats and edge cases) before processing the full batch.
// Parse batch results and identify failures
// (results are streamed as an async iterable, not a plain array)
const failures = [];
for await (const entry of await client.messages.batches.results(batchId)) {
  if (entry.result.type === "errored") failures.push(entry);
}
const failedIds = failures.map(f => f.custom_id);
// Resubmit only failures with modifications
const retryRequests = failedIds.map(id => {
const originalDoc = documentsById[id];
return {
custom_id: `${id}-retry-1`,
params: {
model: "claude-sonnet-4-20250514",
max_tokens: 8192, // increased for oversized docs
messages: [{
role: "user",
content: chunkIfNeeded(originalDoc)
}]
}
};
});
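The snippet above depends on a live API, but the retry-selection logic itself can be exercised locally against mocked results. The entry shape below mirrors the Batch API's succeeded/errored discrimination via result.type; the helper name and interface are ours:

```typescript
// Minimal shape of a batch result entry for local testing.
interface BatchResultEntry {
  custom_id: string;
  result: { type: "succeeded" | "errored" };
}

// Select only failed entries and build retry requests with a bumped
// max_tokens, preserving traceability via a "-retry-1" suffix.
function buildRetryBatch(
  results: BatchResultEntry[],
  documentsById: Record<string, string>
) {
  return results
    .filter((r) => r.result.type === "errored")
    .map((r) => ({
      custom_id: `${r.custom_id}-retry-1`,
      params: {
        model: "claude-sonnet-4-20250514",
        max_tokens: 8192, // increased for oversized docs
        messages: [{ role: "user", content: documentsById[r.custom_id] }],
      },
    }));
}
```

The key property to verify is that successful documents never reappear in the retry batch, since resubmitting them doubles their cost for no benefit.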
Multi-Turn Tool Calling Limitation
The batch API does not support multi-turn tool calling within a single request. This means you cannot:
- Define tools and have the model call them mid-request
- Process tool results and continue the conversation within the same batch item
- Run agentic loops within a single batch request
If your workflow requires tool execution mid-processing, you must use the synchronous API. This limitation is a direct exam test point — if a scenario describes a batch workflow that needs to call external tools during processing, the correct answer is to use the synchronous API for that step.
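The routing rule (blocking or tool-using workflows stay synchronous; everything else is batch-eligible) can be captured in a tiny decision helper. A sketch with hypothetical names:

```typescript
interface Workflow {
  name: string;
  blocking: boolean;         // someone or something waits on the result
  needsToolCalling: boolean; // requires multi-turn tool use mid-request
}

// Batch is only eligible when nothing is waiting on the result AND no
// mid-request tool execution is needed; otherwise stay synchronous.
function chooseApi(w: Workflow): "synchronous" | "batch" {
  if (w.blocking || w.needsToolCalling) return "synchronous";
  return "batch";
}
```

Note that either condition alone forces the synchronous API: a non-blocking agentic workflow still cannot run in a batch because of the tool-calling limitation.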
Key Concept
The Message Batches API provides 50% cost savings with an up to 24-hour processing window and no latency SLA. Use it only for latency-tolerant workflows (overnight reports, weekly audits). Blocking workflows (pre-merge checks) must remain synchronous. Always refine prompts on a sample set before submitting large batches.
Prompt Optimisation Before Batch Submission
The most cost-effective batch processing strategy is to invest time in prompt refinement before submitting large volumes:
- Sample set testing: Take 5-10 representative documents covering the range of formats, edge cases, and document types in your batch
- Iterate on the sample: Refine your extraction prompts, add few-shot examples, adjust schema design until the sample set achieves high accuracy
- Submit the full batch: With refined prompts, your first-pass success rate will be significantly higher
- Handle failures: Resubmit only the failed documents with targeted modifications
This workflow dramatically reduces total cost. A 90% first-pass success rate on 1,000 documents means only 100 retries. A 60% first-pass rate means 400 retries — four times the resubmission cost, plus the batch processing cost for those retries.
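The retry arithmetic above generalises to a one-line estimate (an illustrative helper, not part of any SDK):

```typescript
// Expected number of resubmissions for a batch of `total` documents
// at a given first-pass success rate.
function expectedRetries(total: number, firstPassRate: number): number {
  return Math.round(total * (1 - firstPassRate));
}
```

At 1,000 documents, a 90% first-pass rate implies 100 retries and a 60% rate implies 400, which is the four-fold resubmission cost quoted above.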
Exam Traps
Switching all workflows to batch processing for cost savings
Blocking workflows where developers wait for results (pre-merge checks, real-time reviews) must remain synchronous. The batch API has no guaranteed latency SLA and can take up to 24 hours. Only latency-tolerant workflows should use batch.
Assuming batch results arrive quickly because they often do
The batch API has no latency SLA. Results often arrive faster than 24 hours, but you cannot design blocking workflows around best-case timing. Design around the 24-hour maximum.
Using batch API for workflows requiring multi-turn tool calling
The batch API does not support multi-turn tool calling within a single request. If your workflow needs to execute tools and use results mid-processing, you must use the synchronous API.
Practice Scenario
Your team wants to reduce API costs for automated analysis. You have two workflows: (1) a blocking pre-merge check that must complete before developers merge, and (2) a technical debt report generated overnight for review the next morning. Your manager proposes switching both to the Message Batches API for 50% cost savings. How should you evaluate this proposal?
Build Exercise
Design a Batch Processing Strategy
What you'll learn
- Classify workflows as blocking (synchronous) or latency-tolerant (batch-eligible) based on latency requirements
- Use the Message Batches API with custom_id fields for request-response correlation
- Implement failure handling that resubmits only failed documents with targeted modifications
- Calculate batch submission frequency against SLA constraints accounting for the 24-hour processing window
- Apply the prompt refinement workflow: sample set testing before full batch submission
- List 5 workflows in a hypothetical organisation and categorise each as blocking (synchronous) or latency-tolerant (batch-eligible) with justification
Why: The matching rule between synchronous and batch API is the most tested concept in this task statement. The exam presents a scenario where a manager proposes switching everything to batch for cost savings, and you must identify which workflows cannot tolerate the 24-hour processing window.
You should see: A table with 5 workflows, each clearly categorised with justification. Blocking workflows have someone or something waiting for the result. Batch-eligible workflows consume results later with no real-time dependency.
- Define a batch submission for 20 documents using the Message Batches API format with unique custom_id fields for each document
Why: custom_id fields are the mechanism for correlating request-response pairs in batch results. Without unique identifiers, you cannot determine which documents succeeded or failed, making failure handling impossible.
You should see: A valid batch request object with 20 entries, each containing a unique custom_id, model specification, max_tokens, and a messages array with the document content.
- Implement failure handling: parse batch results, identify failures by custom_id, and construct a retry batch containing only failed documents with increased max_tokens
Why: Resubmitting only failures with targeted modifications is the correct batch failure pattern. Resubmitting the entire batch wastes cost on already-successful documents. The exam tests that you understand custom_id correlation and targeted retry.
You should see: A failure handler that filters results by error status, extracts the custom_id values of failures, looks up the original documents, and creates a retry batch with modifications like increased max_tokens or chunked content.
- Calculate the batch submission frequency needed to guarantee a 30-hour SLA given the 24-hour maximum processing window
Why: SLA calculation with the 24-hour batch processing window is a direct exam test point. You must work backwards from the SLA deadline to determine when to submit, accounting for the maximum processing time plus a safety margin.
You should see: A calculation showing: 30-hour SLA minus 24-hour maximum processing window equals 6 hours of buffer. The latest safe submission is 24 hours before the deadline, so batches submitted every 4-6 hours guarantee the SLA with margin.
- Create a 5-document sample set and refine extraction prompts iteratively before submitting the full batch of 20 documents
Why: Prompt refinement on a sample set before batch submission is the most cost-effective batch processing strategy. A 90% first-pass success rate means 2 retries on 20 documents. A 60% first-pass rate means 8 retries, four times the resubmission cost.
You should see: A sample set covering the range of document types and edge cases, 2-3 prompt iterations improving accuracy on the sample, and then the full batch submission achieving a high first-pass success rate.
Sources
- Claude Certified Architect Foundations Exam Guide — Task Statement 4.5 — Anthropic
- Message Batches API — Anthropic
- Building with Claude API (Skilljar) — Anthropic