What You Need to Know
Integrating Claude Code into CI/CD pipelines transforms it from an interactive developer tool into an automated review and generation engine. The exam tests five specific concepts in this task statement, with the -p flag being the single most directly tested item (it is Question 10 in the sample question set).
The -p Flag: Non-Interactive Mode
Claude Code defaults to interactive mode — it expects keyboard input and displays a conversational interface. In a CI pipeline, there is no keyboard. Without the -p flag, the CI job hangs indefinitely, waiting for input that will never arrive.
# WRONG — hangs in CI
claude "Analyse this pull request for security issues"

# CORRECT — runs non-interactively
claude -p "Analyse this pull request for security issues"
The -p flag (also --print) switches Claude Code to print mode: it processes the prompt, outputs the result to stdout, and exits. No interactive input required.
This is a memorisation item. The exam presents a scenario where a CI job hangs, logs show Claude waiting for input, and you must select the correct fix. The answer is the -p flag. Not CLAUDE_HEADLESS=true (does not exist). Not --batch (does not exist). Not stdin redirection from /dev/null (does not properly address Claude Code's interactive mode).
Key Concept
The -p flag is the single most directly testable fact in Domain 3. It is Question 10 in the official sample questions. When you see a CI pipeline hanging and logs showing Claude waiting for input, the answer is always -p.
Structured Output for CI
In CI pipelines, Claude Code output must be machine-parseable. Humans are not reading the output — automated systems are processing it to post inline PR comments, update dashboards, or trigger downstream workflows.
Two flags work together:
- --output-format json — forces JSON output instead of human-readable text
- --json-schema — enforces a specific JSON structure for the output
claude -p \
--output-format json \
--json-schema '{"type":"object","properties":{"findings":{"type":"array","items":{"type":"object","properties":{"file":{"type":"string"},"line":{"type":"integer"},"severity":{"type":"string"},"message":{"type":"string"}}}}}}' \
"Review this PR for security issues"
The output conforms to the specified schema, enabling automated systems to:
- Parse findings programmatically
- Post findings as inline PR comments at the exact file and line
- Filter by severity for different notification channels
- Track findings across review runs
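With --output-format json and a schema in place, downstream handling is plain data processing. A minimal Python sketch of the severity-routing step, using a hard-coded sample in the shape of the schema above in place of real captured output (the finding values are invented for illustration):

```python
import json

# Sample output shaped like the --json-schema example above; in a real
# pipeline this string would be captured from `claude -p --output-format json`.
raw = json.dumps({
    "findings": [
        {"file": "auth.ts", "line": 42, "severity": "critical",
         "message": "JWT secret is read from a hard-coded string"},
        {"file": "auth.ts", "line": 97, "severity": "minor",
         "message": "Missing error log on token refresh failure"},
    ]
})

findings = json.loads(raw)["findings"]

# Route by severity: critical findings might page a channel immediately,
# minor ones become ordinary review comments.
critical = [f for f in findings if f["severity"] == "critical"]
for f in critical:
    print(f"{f['file']}:{f['line']} [{f['severity']}] {f['message']}")
```

Because every finding carries file and line fields, the same loop can feed an inline-comment poster or a dashboard without any text scraping.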
Session Context Isolation
The same Claude session that generated code is less effective at reviewing its own changes. This is not a theoretical concern — it is a measurable effect.
Why self-review is weaker:
When Claude generates code in a session, it builds up reasoning context: why it chose this approach, what tradeoffs it considered, what alternatives it rejected. When you then ask it to review the same code in the same session, it retains that reasoning context. It is less likely to question decisions it already justified to itself.
The fix: independent review instances
Use a separate Claude Code invocation for review — one that has no access to the generation session's reasoning context. The independent reviewer evaluates the code on its own merits, without the bias of prior justification.
# Step 1: Generate code (session A)
claude -p "Implement the authentication middleware"

# Step 2: Review code (session B — independent, no shared context)
claude -p "Review the authentication middleware for security issues, error handling gaps, and edge cases"
This concept connects to Domain 4 (multi-instance review architectures) and Domain 5 (context management). The exam tests it in CI/CD scenarios specifically.
Incremental Review Context
Automated reviews run on every push. Without context about previous reviews, each run analyses the entire PR from scratch. This creates duplicate comments — the same issue flagged on every push, even after the developer has acknowledged or addressed it.
The fix: include prior review findings in context and instruct Claude to report only new or still-unaddressed issues.
claude -p \
--output-format json \
"Review this PR. Here are the findings from the previous review:
${PREVIOUS_FINDINGS}
Report ONLY:
1. New issues not in the previous findings
2. Issues from the previous findings that are still present
Do NOT re-report issues that have been addressed."
Duplicate comments erode developer trust. If every push generates the same five comments regardless of whether the developer fixed the issues, developers stop reading the comments. Incremental review context preserves the signal-to-noise ratio.
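The filtering can live in the prompt, as above, or in pipeline code that diffs the two finding sets before posting comments. A Python sketch of the code-side version, where the finding shape matches the JSON schema used earlier and the key function plus all sample values are our own illustrative choices:

```python
def finding_key(f):
    # Treat same file + line + message as the same finding across runs.
    # (This identity rule is an assumption, not from the docs.)
    return (f["file"], f["line"], f["message"])

previous = [  # loaded from the stored artifact of the last review run
    {"file": "auth.ts", "line": 42, "severity": "critical", "message": "Hard-coded secret"},
    {"file": "auth.ts", "line": 97, "severity": "minor", "message": "Missing error log"},
]
current = [  # parsed from this run's `claude -p --output-format json` output
    {"file": "auth.ts", "line": 42, "severity": "critical", "message": "Hard-coded secret"},
    {"file": "db.ts", "line": 10, "severity": "minor", "message": "Unbounded query"},
]

prev_keys = {finding_key(f) for f in previous}
curr_keys = {finding_key(f) for f in current}

new_issues = [f for f in current if finding_key(f) not in prev_keys]  # comment once
still_open = [f for f in current if finding_key(f) in prev_keys]      # keep open
resolved = prev_keys - curr_keys                                      # never re-report

print(f"{len(new_issues)} new, {len(still_open)} still open, {len(resolved)} resolved")
# → 1 new, 1 still open, 1 resolved
```

Only new_issues and still_open become PR comments; resolved entries are dropped silently, which is what preserves the signal-to-noise ratio.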
CLAUDE.md for CI Context
When Claude Code runs in CI, it reads the project's CLAUDE.md files just as it does in interactive mode. CLAUDE.md is therefore the mechanism for providing project-specific context to CI-invoked Claude Code:
- Testing standards: what makes a valuable test, what patterns to follow, what to avoid
- Available fixtures: which test fixtures exist, how to use them, what data they contain
- Review criteria: what constitutes a critical finding vs a minor style issue
- Existing test coverage: what is already covered, to avoid suggesting duplicate tests
Without this context in CLAUDE.md, CI-invoked test generation produces low-value boilerplate. With it, generated tests follow the team's patterns and add genuine coverage.
# .claude/CLAUDE.md — CI-relevant section

## Testing Standards
- Tests must use the factory pattern from test/factories/ for data creation
- Integration tests connect to the test database via test/setup/db.ts
- Do not test private implementation details — test public API contracts
- Coverage target: 80% branch coverage for new code
- Available fixtures: test/fixtures/users.json, test/fixtures/orders.json
Providing Existing Tests to Avoid Duplication
When running test generation in CI, include existing test files in context. Without them, Claude Code may suggest tests that already exist, wasting developer review time. Including existing tests enables Claude to identify coverage gaps rather than duplicating existing scenarios.
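Gathering that context can be done mechanically in the pipeline step: list the existing test files and prepend their paths (or contents) to the generation prompt. A minimal Python sketch, where the directory layout and naming convention are invented for illustration:

```python
import tempfile
from pathlib import Path

# Stand-in repo: in CI this would be the checked-out working tree.
root = Path(tempfile.mkdtemp())
(root / "tests").mkdir()
(root / "tests" / "test_auth.py").write_text("def test_login(): ...\n")

# Collect existing test files so the prompt can name them explicitly.
existing = sorted(p.relative_to(root) for p in root.rglob("test_*.py"))
context = "Existing tests (do not duplicate these):\n" + "\n".join(
    f"- {p}" for p in existing
)
# The real pipeline would prepend this context to the `claude -p` prompt.
prompt = context + "\n\nGenerate tests only for uncovered behaviour."
print(context)
```

For small suites, including file contents rather than just paths gives Claude enough detail to spot covered scenarios, not merely covered files.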
Batch API vs Real-Time for CI Workflows
The Message Batches API offers 50% cost savings but has processing times up to 24 hours with no guaranteed latency SLA. This creates a clear decision boundary:
| Workflow type | API choice | Reason |
|---|---|---|
| Pre-merge checks (blocking) | Real-time (synchronous) | Developers wait for results |
| Overnight technical debt reports | Batch API | Not time-sensitive, 50% savings |
| Weekly code audit | Batch API | Scheduled, latency-tolerant |
| Nightly test generation | Batch API | Runs overnight, reviewed next morning |
Pre-merge checks are blocking workflows. Developers cannot merge until the check completes. Batch API is unsuitable because there is no latency guarantee. The exam tests this distinction directly (Sample Question 11).
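The decision boundary in the table reduces to a single question: does anyone block on the result? A tiny Python sketch of that rule (the workflow names and helper are illustrative, not an official API):

```python
def choose_api(blocking: bool) -> str:
    # Blocking workflows need a latency guarantee the Batch API does not
    # offer; everything latency-tolerant can take the 50% discount.
    return "realtime" if blocking else "batch"

workflows = {
    "pre-merge check": True,            # developer waits on the result
    "overnight debt report": False,
    "weekly code audit": False,
    "nightly test generation": False,
}

for name, blocking in workflows.items():
    print(f"{name}: {choose_api(blocking)}")
```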
Exam Traps
CI pipeline hanging because Claude Code is waiting for interactive input
The fix is the -p (--print) flag. Not CLAUDE_HEADLESS=true (does not exist), not --batch (does not exist), not stdin redirection. The -p flag is the documented method for non-interactive execution.
Assuming self-review in the same session is as effective as independent review
The same session retains reasoning context from code generation, making it less likely to question its own decisions. An independent review instance without that context is more effective at finding issues.
Using the Batch API for pre-merge CI checks
The Message Batches API has up to 24-hour processing time with no latency SLA. Pre-merge checks are blocking workflows where developers wait for results. Use real-time API for blocking checks; batch API for overnight or weekly non-blocking analysis.
Not including prior review findings in subsequent review runs
Without prior context, each review run analyses from scratch and produces duplicate comments. Include previous findings and instruct Claude to report only new or unaddressed issues to maintain developer trust.
Practice Scenario
A CI pipeline script runs claude with a prompt but the job hangs indefinitely. Logs show Claude Code is waiting for interactive input. What is the correct fix?
Build Exercise
Set Up a CI/CD Pipeline with Claude Code
What you'll learn
- Use the -p flag for non-interactive Claude Code execution in CI pipelines
- Configure structured JSON output with --output-format json and --json-schema
- Implement session context isolation between code generation and review
- Set up incremental review to eliminate duplicate findings across runs
- Provide project context via CLAUDE.md for CI-invoked Claude Code
- Write a CI script that runs Claude Code with the -p flag for non-interactive PR analysis
Why: The -p flag is the single most directly testable fact in Domain 3. Without it, the CI job hangs indefinitely waiting for interactive input. This is Question 10 in the official sample questions.
You should see: A CI script (GitHub Actions YAML, GitLab CI, or similar) that invokes claude -p with a review prompt. The job completes successfully without hanging. The output is printed to stdout and captured by the CI system.
- Add --output-format json and --json-schema to produce structured findings with file, line, severity, and message fields
Why: CI output must be machine-parseable. Automated systems need structured JSON to post inline PR comments, filter by severity, and track findings across runs. Human-readable text output cannot be reliably parsed by downstream tools.
You should see: The Claude Code output is valid JSON conforming to the specified schema. Each finding has file, line, severity, and message fields. The output can be piped to jq or parsed by a script without errors.
- Configure the pipeline to parse the JSON output and post findings as inline PR comments
Why: Inline PR comments at exact file and line numbers provide actionable feedback. Generic PR-level comments are ignored. Structured JSON output makes precise inline commenting possible.
You should see: Each finding from the JSON output appears as an inline comment on the PR at the exact file and line number. Severity levels are visible. Developers can see the finding in context alongside the code it references.
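As one way to sketch this step: each structured finding maps directly onto the payload shape accepted by GitHub's pull-request review-comment endpoint (POST /repos/{owner}/{repo}/pulls/{number}/comments). The finding values and commit SHA below are invented, and the actual HTTP call with authentication is left out:

```python
def to_review_comment(finding, commit_sha):
    # Payload fields follow GitHub's review-comment API shape.
    return {
        "path": finding["file"],   # file the comment attaches to
        "line": finding["line"],   # line in the diff
        "side": "RIGHT",           # comment on the new version of the file
        "commit_id": commit_sha,
        "body": f"**{finding['severity'].upper()}**: {finding['message']}",
    }

findings = [
    {"file": "auth.ts", "line": 42, "severity": "critical",
     "message": "Hard-coded secret"},
]
payloads = [to_review_comment(f, "abc123") for f in findings]
print(payloads[0]["body"])  # → **CRITICAL**: Hard-coded secret
```

Because the schema guarantees file and line on every finding, this mapping never needs fallback logic for comments that cannot be placed inline.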
- Add a section to CLAUDE.md documenting testing standards, available fixtures, and review criteria for CI-invoked Claude Code
Why: Claude Code reads CLAUDE.md in CI just as in interactive mode. Without project context, CI-invoked test generation produces low-value boilerplate. With testing standards and fixture documentation, generated tests follow team patterns.
You should see: The CLAUDE.md file contains a clearly marked CI-relevant section with testing standards, available fixture paths, and review severity criteria. CI-invoked Claude Code produces tests using the documented factories and fixtures rather than generic boilerplate.
- Set up two separate Claude Code invocations: one for code generation and an independent one for review (no shared session context)
Why: The same session that generated code is less effective at reviewing it because it retains reasoning context that biases it toward its own decisions. Independent review instances evaluate code on its own merits without prior justification bias.
You should see: Two distinct claude -p invocations in the CI script: one for generation and one for review. They share no session context. The review invocation analyses the generated code independently. The review findings are more thorough than self-review in the same session.
- Implement incremental review: store previous findings, include them in the next review run, and instruct Claude to report only new or still-unaddressed issues
Why: Without incremental context, each review run analyses the entire PR from scratch and produces duplicate comments. Duplicate comments erode developer trust — when the same five issues appear on every push regardless of fixes, developers stop reading them.
You should see: The first review run produces findings and stores them (as a JSON artifact or file). Subsequent runs include the previous findings in context. The output contains only new issues or issues that remain unaddressed. Previously fixed issues do not reappear as comments.