What You Need to Know
Integrating Claude Code into CI/CD pipelines transforms it from an interactive developer tool into an automated review and generation engine. The exam tests five specific concepts in this task statement, with the -p flag being the single most directly tested item (it is Question 10 in the sample question set).
The -p Flag: Non-Interactive Mode
Claude Code defaults to interactive mode — it expects keyboard input and displays a conversational interface. In a CI pipeline, there is no keyboard. Without the -p flag, the CI job hangs indefinitely, waiting for input that will never arrive.
# WRONG — hangs in CI
claude "Analyse this pull request for security issues"

# CORRECT — runs non-interactively
claude -p "Analyse this pull request for security issues"
The -p flag (also --print) switches Claude Code to print mode: it processes the prompt, outputs the result to stdout, and exits. No interactive input required.
This is a memorisation item. The exam presents a scenario where a CI job hangs, logs show Claude waiting for input, and you must select the correct fix. The answer is the -p flag. Not CLAUDE_HEADLESS=true (does not exist). Not --batch (does not exist). Not stdin redirection from /dev/null (does not properly address Claude Code's interactive mode).
Key Concept
The -p flag is the single most directly testable fact in Domain 3. It is Question 10 in the official sample questions. When you see a CI pipeline hanging and logs showing Claude waiting for input, the answer is always -p.
Structured Output for CI
In CI pipelines, Claude Code output must be machine-parseable. Humans are not reading the output — automated systems are processing it to post inline PR comments, update dashboards, or trigger downstream workflows.
Two flags work together:
- --output-format json — forces JSON output instead of human-readable text
- --json-schema — enforces a specific JSON structure for the output
claude -p \
--output-format json \
--json-schema '{"type":"object","properties":{"findings":{"type":"array","items":{"type":"object","properties":{"file":{"type":"string"},"line":{"type":"integer"},"severity":{"type":"string"},"message":{"type":"string"}}}}}}' \
"Review this PR for security issues"
The output conforms to the specified schema, enabling automated systems to:
- Parse findings programmatically
- Post findings as inline PR comments at the exact file and line
- Filter by severity for different notification channels
- Track findings across review runs
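With --output-format json and a schema in place, downstream handling is plain data processing. A minimal Python sketch of the severity-routing step, using a hard-coded sample in the shape of the schema above in place of real captured output (the finding values are invented for illustration):

```python
import json

# Sample output shaped like the --json-schema example above; in a real
# pipeline this string would be captured from `claude -p --output-format json`.
raw = json.dumps({
    "findings": [
        {"file": "auth.ts", "line": 42, "severity": "critical",
         "message": "JWT secret is read from a hard-coded string"},
        {"file": "auth.ts", "line": 97, "severity": "minor",
         "message": "Missing error log on token refresh failure"},
    ]
})

findings = json.loads(raw)["findings"]

# Route by severity: critical findings might page a channel immediately,
# minor ones become ordinary review comments.
critical = [f for f in findings if f["severity"] == "critical"]
for f in critical:
    print(f"{f['file']}:{f['line']} [{f['severity']}] {f['message']}")
```

Because every finding carries file and line fields, the same loop can feed an inline-comment poster or a dashboard without any text scraping.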
Session Context Isolation
The same Claude session that generated code is less effective at reviewing its own changes. This is not a theoretical concern — it is a measurable effect.
Why self-review is weaker:
When Claude generates code in a session, it builds up reasoning context: why it chose this approach, what tradeoffs it considered, what alternatives it rejected. When you then ask it to review the same code in the same session, it retains that reasoning context. It is less likely to question decisions it already justified to itself.
The fix: independent review instances
Use a separate Claude Code invocation for review — one that has no access to the generation session's reasoning context. The independent reviewer evaluates the code on its own merits, without the bias of prior justification.
# Step 1: Generate code (session A)
claude -p "Implement the authentication middleware"

# Step 2: Review code (session B — independent, no shared context)
claude -p "Review the authentication middleware for security issues, error handling gaps, and edge cases"
This concept connects to Domain 4 (multi-instance review architectures) and Domain 5 (context management). The exam tests it in CI/CD scenarios specifically.
Incremental Review Context
Automated reviews run on every push. Without context about previous reviews, each run analyses the entire PR from scratch. This creates duplicate comments — the same issue flagged on every push, even after the developer has acknowledged or addressed it.
The fix: include prior review findings in context and instruct Claude to report only new or still-unaddressed issues.
claude -p \
--output-format json \
"Review this PR. Here are the findings from the previous review:
${PREVIOUS_FINDINGS}
Report ONLY:
1. New issues not in the previous findings
2. Issues from the previous findings that are still present
Do NOT re-report issues that have been addressed."
Duplicate comments erode developer trust. If every push generates the same five comments regardless of whether the developer fixed the issues, developers stop reading the comments. Incremental review context preserves the signal-to-noise ratio.
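The filtering can live in the prompt, as above, or in pipeline code that diffs the two finding sets before posting comments. A Python sketch of the code-side version, where the finding shape matches the JSON schema used earlier and the key function plus all sample values are our own illustrative choices:

```python
def finding_key(f):
    # Treat same file + line + message as the same finding across runs.
    # (This identity rule is an assumption, not from the docs.)
    return (f["file"], f["line"], f["message"])

previous = [  # loaded from the stored artifact of the last review run
    {"file": "auth.ts", "line": 42, "severity": "critical", "message": "Hard-coded secret"},
    {"file": "auth.ts", "line": 97, "severity": "minor", "message": "Missing error log"},
]
current = [  # parsed from this run's `claude -p --output-format json` output
    {"file": "auth.ts", "line": 42, "severity": "critical", "message": "Hard-coded secret"},
    {"file": "db.ts", "line": 10, "severity": "minor", "message": "Unbounded query"},
]

prev_keys = {finding_key(f) for f in previous}
curr_keys = {finding_key(f) for f in current}

new_issues = [f for f in current if finding_key(f) not in prev_keys]  # comment once
still_open = [f for f in current if finding_key(f) in prev_keys]      # keep open
resolved = prev_keys - curr_keys                                      # never re-report

print(f"{len(new_issues)} new, {len(still_open)} still open, {len(resolved)} resolved")
# → 1 new, 1 still open, 1 resolved
```

Only new_issues and still_open become PR comments; resolved entries are dropped silently, which is what preserves the signal-to-noise ratio.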
CLAUDE.md for CI Context
When Claude Code runs in CI, it reads the project's CLAUDE.md files just as it does in interactive mode. CLAUDE.md is therefore the mechanism for providing project-specific context to CI-invoked Claude Code:
- Testing standards: what makes a valuable test, what patterns to follow, what to avoid
- Available fixtures: which test fixtures exist, how to use them, what data they contain
- Review criteria: what constitutes a critical finding vs a minor style issue
- Existing test coverage: what is already covered, to avoid suggesting duplicate tests
Without this context in CLAUDE.md, CI-invoked test generation produces low-value boilerplate. With it, generated tests follow the team's patterns and add genuine coverage.
# .claude/CLAUDE.md — CI-relevant section

## Testing Standards
- Tests must use the factory pattern from test/factories/ for data creation
- Integration tests connect to the test database via test/setup/db.ts
- Do not test private implementation details — test public API contracts
- Coverage target: 80% branch coverage for new code
- Available fixtures: test/fixtures/users.json, test/fixtures/orders.json
Providing Existing Tests to Avoid Duplication
When running test generation in CI, include existing test files in context. Without them, Claude Code may suggest tests that already exist, wasting developer review time. Including existing tests enables Claude to identify coverage gaps rather than duplicating existing scenarios.
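Gathering that context can be done mechanically in the pipeline step: list the existing test files and prepend their paths (or contents) to the generation prompt. A minimal Python sketch, where the directory layout and naming convention are invented for illustration:

```python
import tempfile
from pathlib import Path

# Stand-in repo: in CI this would be the checked-out working tree.
root = Path(tempfile.mkdtemp())
(root / "tests").mkdir()
(root / "tests" / "test_auth.py").write_text("def test_login(): ...\n")

# Collect existing test files so the prompt can name them explicitly.
existing = sorted(p.relative_to(root) for p in root.rglob("test_*.py"))
context = "Existing tests (do not duplicate these):\n" + "\n".join(
    f"- {p}" for p in existing
)
# The real pipeline would prepend this context to the `claude -p` prompt.
prompt = context + "\n\nGenerate tests only for uncovered behaviour."
print(context)
```

For small suites, including file contents rather than just paths gives Claude enough detail to spot covered scenarios, not merely covered files.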
Batch API vs Real-Time for CI Workflows
The Message Batches API offers 50% cost savings but has processing times up to 24 hours with no guaranteed latency SLA. This creates a clear decision boundary:
| Workflow type | API choice | Reason |
|---|---|---|
| Pre-merge checks (blocking) | Real-time (synchronous) | Developers wait for results |
| Overnight technical debt reports | Batch API | Not time-sensitive, 50% savings |
| Weekly code audit | Batch API | Scheduled, latency-tolerant |
| Nightly test generation | Batch API | Runs overnight, reviewed next morning |
Pre-merge checks are blocking workflows. Developers cannot merge until the check completes. Batch API is unsuitable because there is no latency guarantee. The exam tests this distinction directly (Sample Question 11).
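The decision boundary in the table reduces to a single question: does anyone block on the result? A tiny Python sketch of that rule (the workflow names and helper are illustrative, not an official API):

```python
def choose_api(blocking: bool) -> str:
    # Blocking workflows need a latency guarantee the Batch API does not
    # offer; everything latency-tolerant can take the 50% discount.
    return "realtime" if blocking else "batch"

workflows = {
    "pre-merge check": True,            # developer waits on the result
    "overnight debt report": False,
    "weekly code audit": False,
    "nightly test generation": False,
}

for name, blocking in workflows.items():
    print(f"{name}: {choose_api(blocking)}")
```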
Exam Traps
CI pipeline hanging because Claude Code is waiting for interactive input
The fix is the -p (--print) flag. Not CLAUDE_HEADLESS=true (does not exist), not --batch (does not exist), not stdin redirection. The -p flag is the documented method for non-interactive execution.
Assuming self-review in the same session is as effective as independent review
The same session retains reasoning context from code generation, making it less likely to question its own decisions. An independent review instance without that context is more effective at finding issues.
Using the Batch API for pre-merge CI checks
The Message Batches API has up to 24-hour processing time with no latency SLA. Pre-merge checks are blocking workflows where developers wait for results. Use real-time API for blocking checks; batch API for overnight or weekly non-blocking analysis.
Not including prior review findings in subsequent review runs
Without prior context, each review run analyses from scratch and produces duplicate comments. Include previous findings and instruct Claude to report only new or unaddressed issues to maintain developer trust.
Practice Scenario
A CI pipeline script runs claude with a prompt but the job hangs indefinitely. Logs show Claude Code is waiting for interactive input. What is the correct fix?
Build Exercise
Set Up a CI/CD Pipeline with Claude Code
What you'll learn
- Use the -p flag for non-interactive Claude Code execution in CI pipelines
- Configure structured JSON output with --output-format json and --json-schema
- Implement session context isolation between code generation and review
- Set up incremental review to eliminate duplicate findings across runs
- Provide project context via CLAUDE.md for CI-invoked Claude Code
- Write a CI script that runs Claude Code with the -p flag for non-interactive PR analysis
Why: The -p flag is the single most directly testable fact in Domain 3. Without it, the CI job hangs indefinitely waiting for interactive input. This is Question 10 in the official sample questions.
You should see: A CI script (GitHub Actions YAML, GitLab CI, or similar) that invokes claude -p with a review prompt. The job completes successfully without hanging. The output is printed to stdout and captured by the CI system.
- Add --output-format json and --json-schema to produce structured findings with file, line, severity, and message fields
Why: CI output must be machine-parseable. Automated systems need structured JSON to post inline PR comments, filter by severity, and track findings across runs. Human-readable text output cannot be reliably parsed by downstream tools.
You should see: The Claude Code output is valid JSON conforming to the specified schema. Each finding has file, line, severity, and message fields. The output can be piped to jq or parsed by a script without errors.
- Configure the pipeline to parse the JSON output and post findings as inline PR comments
Why: Inline PR comments at exact file and line numbers provide actionable feedback. Generic PR-level comments are ignored. Structured JSON output makes precise inline commenting possible.
You should see: Each finding from the JSON output appears as an inline comment on the PR at the exact file and line number. Severity levels are visible. Developers can see the finding in context alongside the code it references.
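As one way to sketch this step: each structured finding maps directly onto the payload shape accepted by GitHub's pull-request review-comment endpoint (POST /repos/{owner}/{repo}/pulls/{number}/comments). The finding values and commit SHA below are invented, and the actual HTTP call with authentication is left out:

```python
def to_review_comment(finding, commit_sha):
    # Payload fields follow GitHub's review-comment API shape.
    return {
        "path": finding["file"],   # file the comment attaches to
        "line": finding["line"],   # line in the diff
        "side": "RIGHT",           # comment on the new version of the file
        "commit_id": commit_sha,
        "body": f"**{finding['severity'].upper()}**: {finding['message']}",
    }

findings = [
    {"file": "auth.ts", "line": 42, "severity": "critical",
     "message": "Hard-coded secret"},
]
payloads = [to_review_comment(f, "abc123") for f in findings]
print(payloads[0]["body"])  # → **CRITICAL**: Hard-coded secret
```

Because the schema guarantees file and line on every finding, this mapping never needs fallback logic for comments that cannot be placed inline.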
- Add a section to CLAUDE.md documenting testing standards, available fixtures, and review criteria for CI-invoked Claude Code
Why: Claude Code reads CLAUDE.md in CI just as in interactive mode. Without project context, CI-invoked test generation produces low-value boilerplate. With testing standards and fixture documentation, generated tests follow team patterns.
You should see: The CLAUDE.md file contains a clearly marked CI-relevant section with testing standards, available fixture paths, and review severity criteria. CI-invoked Claude Code produces tests using the documented factories and fixtures rather than generic boilerplate.
- Set up two separate Claude Code invocations: one for code generation and an independent one for review (no shared session context)
Why: The same session that generated code is less effective at reviewing it because it retains reasoning context that biases it toward its own decisions. Independent review instances evaluate code on its own merits without prior justification bias.
You should see: Two distinct claude -p invocations in the CI script: one for generation and one for review. They share no session context. The review invocation analyses the generated code independently. The review findings are more thorough than self-review in the same session.
- Implement incremental review: store previous findings, include them in the next review run, and instruct Claude to report only new or still-unaddressed issues
Why: Without incremental context, each review run analyses the entire PR from scratch and produces duplicate comments. Duplicate comments erode developer trust — when the same five issues appear on every push regardless of fixes, developers stop reading them.
You should see: The first review run produces findings and stores them (as a JSON artifact or file). Subsequent runs include the previous findings in context. The output contains only new issues or issues that remain unaddressed. Previously fixed issues do not reappear as comments.