A single IaC scanner is a checkbox; a gate is a contract. The difference shows up the first time someone ships a public bucket through a wrapper module your one scanner could not parse, or the tenth time a developer rubber-stamps “12 medium findings” because the gate has cried wolf since onboarding. The job is not to run a tool — it is to catch the misconfigurations that become incidents, suppress noise with an audit trail, and fail the build on the right things so people keep trusting the red X.
This guide assembles that gate from two complementary tools. Checkov (Prisma Cloud / Bridgecrew) is graph-aware and extensible — you write policy in Python or YAML against the parsed resource graph. Trivy (Aqua) adds a second, independently maintained ruleset plus secret detection in the same binary you already use for images. Running both and normalizing their output gives defense in depth without betting your posture on one vendor’s coverage.
1. The misconfiguration threat model and where scanning fits
Static IaC analysis catches one class of defect: a resource declared insecurely by construction. Public storage, unencrypted volumes, security groups open to 0.0.0.0/0, IAM policies with Action: "*", logging disabled. These decisions are baked into the template, visible before provisioning, and cheap to catch.
Be honest about what it does not catch, so you do not oversell it:
| Catches | Misses |
|---|---|
| Insecure resource attributes in the declared config | Runtime drift after a console change |
| Hardcoded secrets and high-entropy strings in source | Logic spanning data sources resolved only at apply |
| Missing encryption, logging, public-access flags | Identity reachability (“can this principal actually reach that bucket?”) |
| Known-bad patterns from a curated policy library | Business intent (“this bucket is meant to be public”) |
Scanning is the cheapest layer of a stack that also includes plan-JSON policy-as-code (OPA/Conftest), admission control, and runtime CSPM. It runs earliest and fails fastest: put it on the pull request, keep it under a minute, reserve slower checks for nightly. But it evaluates the template, not the world — a green scan proves the declared config is clean, not that the deployed resource matches. Necessary, never sufficient.
2. Running Checkov across Terraform, plan JSON, Bicep, and CloudFormation
Checkov auto-detects frameworks under a directory, so the baseline invocation is one command. Pin it in CI — a floating latest breaks builds non-deterministically when it adds a check.
pip install checkov==3.2.450
# Scan everything Checkov can parse under the repo.
checkov --directory . --compact --quiet
# Or scope to one framework / one file to avoid re-parsing the whole tree.
checkov -d ./terraform --framework terraform
checkov -f ./terraform/s3.tf
--quiet drops passed checks; --compact trims the code block from each finding.
Scan the plan, not just the HCL. Raw HCL hides anything resolved at plan time — variable defaults, default_tags, computed names. Render the plan to JSON and feed that so the gate sees what actually deploys:
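# plan needs provider credentials and, when a backend block is configured, an initialized backend;
# -backend=false is enough for validate but not for a plan against remote state.
terraform init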
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
# Checkov understands the plan representation directly.
checkov -f tfplan.json --framework terraform_plan --compact
Scanning both HCL (fast, no credentials) and plan JSON (accurate through modules) is the belt-and-suspenders default for any non-trivial estate.
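To make "what actually deploys" concrete, here is a minimal sketch that lists the resolved attributes of every resource a plan creates or updates. It assumes the `resource_changes` layout of Terraform's JSON plan output; `planned_resources` is a hypothetical helper name, not part of Checkov or Terraform.

```python
def planned_resources(plan):
    """Yield (address, type, after) for resources the plan creates or updates.

    `plan` is the dict loaded from tfplan.json. `after` holds the attribute
    values known at plan time; values only known at apply are absent from it.
    """
    for change in plan.get("resource_changes", []):
        detail = change.get("change", {})
        actions = set(detail.get("actions", []))
        # A replacement shows up as ["delete", "create"], so it is included too.
        if actions & {"create", "update"}:
            yield change["address"], change["type"], detail.get("after") or {}
```

Loading the file with `json.load` and iterating `planned_resources(plan)` shows values such as merged default tags that never appear in the raw HCL of the resource block.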
Checkov also parses Bicep and CloudFormation natively:
# Bicep — Checkov compiles via the bicep CLI, so it must be on PATH.
checkov -d ./bicep --framework bicep --compact
# CloudFormation (JSON or YAML); AWS SAM is recognized via its transform.
checkov -d ./cfn --framework cloudformation --compact
checkov -f ./sam/template.yaml --framework cloudformation
Bicep support depends on the bicep CLI being on PATH, because Checkov compiles to ARM JSON first. If the CI image lacks it, Bicep files are silently skipped, so assert the framework ran (see Verify). Use checkov --list to see every available check.
3. Authoring custom Checkov policies in Python and YAML
The built-in library is broad but generic. Organizational rules — “every resource carries a cost_center tag,” “no S3 bucket outside an approved region” — you write yourself. Use YAML for attribute and connection-state checks; drop to Python for real logic.
A YAML policy lives in a --external-checks-dir. This one requires S3 buckets to declare a cost_center tag:
# policies/yaml/s3_cost_center_tag.yaml
metadata:
  id: "CKV_ORG_S3_1"
  name: "S3 buckets must carry a cost_center tag"
  category: "CONVENTION"
  severity: "MEDIUM"
definition:
  cond_type: "attribute"
  resource_types:
    - "aws_s3_bucket"
  attribute: "tags.cost_center"
  operator: "exists"
A Python check receives the parsed configuration of each matching resource and can apply arbitrary logic to it. This one denies any IAM policy granting Action: "*" on Resource: "*":
# policies/python/IAMNoStarStar.py
from checkov.terraform.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckCategories, CheckResult


class IAMNoStarStar(BaseResourceCheck):
    def __init__(self):
        super().__init__(
            name="IAM policy must not allow Action:* on Resource:*",
            id="CKV_ORG_IAM_1",
            categories=[CheckCategories.IAM],
            supported_resources=["aws_iam_policy"],
        )

    def scan_resource_conf(self, conf):
        policy = conf.get("policy")
        if not policy or not isinstance(policy[0], str):
            # Computed/HCL-expression policy: cannot evaluate statically.
            return CheckResult.UNKNOWN
        body = policy[0]
        if '"Action": "*"' in body and '"Resource": "*"' in body:
            return CheckResult.FAILED
        return CheckResult.PASSED


# Checkov registers the check when the module-level instance is created.
check = IAMNoStarStar()
Run custom policies layered on top of the built-ins with --external-checks-dir; add --check CKV_ORG_IAM_1,... to scope a run to just your org checks:
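# The flag can be repeated, so both the Python and the YAML policies load.
checkov -d ./terraform --external-checks-dir ./policies/python --external-checks-dir ./policies/yaml --compact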
Returning CheckResult.UNKNOWN on computed values separates a useful policy from a flaky one: a check that hard-fails when it cannot read an attribute trains people to suppress it. For security-critical attributes, fail closed on the plan JSON where the value is resolved, not on raw HCL.
Test custom policies before they gate anyone: drive Checkov's Runner from a pytest suite against example resources that should pass and fail (pytest policies/python/tests/ -q).
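The substring match in the Python check is sensitive to whitespace and misses list-valued fields such as "Action": ["*"]. A more robust matcher parses the document first. The following is a sketch only: `allows_star_star` is a hypothetical helper, it is pure Python so it can be unit-tested without Checkov, and restricting the match to Effect: "Allow" is an added assumption rather than something the check above does.

```python
import json
from typing import Optional


def _as_list(value):
    """IAM accepts a scalar or a list for Action/Resource/Statement; normalize to a list."""
    return value if isinstance(value, list) else [value]


def allows_star_star(policy_json: str) -> Optional[bool]:
    """Return True/False for a parseable policy document, None when it cannot be read."""
    try:
        document = json.loads(policy_json)
    except (TypeError, ValueError):
        # Unresolved expression or malformed JSON: the caller maps this to UNKNOWN.
        return None
    if not isinstance(document, dict):
        return None
    for statement in _as_list(document.get("Statement", [])):
        if not isinstance(statement, dict) or statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        if "*" in actions and "*" in resources:
            return True
    return False
```

Inside `scan_resource_conf`, a `None` result would map to `CheckResult.UNKNOWN`, `True` to `FAILED`, and `False` to `PASSED`.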
4. Trivy: config scanning, secret detection, and built-in misconfig checks
Trivy is the second opinion. Its config subcommand scans Terraform, CloudFormation, Helm, Dockerfiles, and Kubernetes manifests with its own ruleset (misconfig is the scanner name used with --scanners), and secret detection ships in the same binary through trivy fs.
# Misconfiguration scan over a directory tree.
trivy config ./terraform
# Scan the HCL with variable values supplied from a tfvars file.
trivy config --tf-vars prod.tfvars ./terraform
# Scan a rendered Terraform plan directly.
trivy config tfplan.json
Secret scanning is on by default for trivy fs. This is what catches an access key pasted into a .tfvars or a locals.tf:
# Vulnerabilities + secrets + misconfigurations; drop scanners for a faster pre-commit pass.
trivy fs --scanners vuln,secret,misconfig .
Constrain Trivy the same way as Checkov — by severity and exit code (Section 7). Its misconfig rules overlap Checkov’s but are not identical, and that overlap is the point: when both flag the same volume you have high confidence; when only one does, you caught what the other missed. Deduplicate at the reporting layer, not by dropping a tool.
Trivy pulls its policy bundle from a registry on first run and caches it. In an egress-restricted runner, pre-pull the bundle, or the misconfig scan silently runs zero policies and reports a meaningless clean pass.
5. Managing false positives with inline skips and centralized baselines
Every estate generates findings you will not fix today: a deliberately public docs bucket, a third-party module you cannot edit. Suppression must be attributable and reviewable, never a blanket --skip-check buried in a CI script.
Inline, for Checkov, suppress one check on one resource with a reason that lives next to the code it excuses:
resource "aws_s3_bucket" "public_docs" {
  bucket = "acme-public-docs"
  # checkov:skip=CKV_AWS_20:Intentionally public; serves the docs site. JIRA SEC-1421
}
Inline, for Trivy, put an ignore comment on the line directly above the offending block:
#trivy:ignore:AVD-AWS-0089
resource "aws_s3_bucket" "logs" {
  bucket = "acme-access-logs"
}
For suppressions that should not live in application code — third-party modules, time-boxed exceptions — use centralized baselines. Trivy reads a .trivyignore.yaml:
# .trivyignore.yaml
misconfigurations:
  - id: AVD-AWS-0086
    paths:
      - "modules/legacy-vpc/*"
    statement: "Vendored module; upstream PR open."
    expired_at: 2026-09-30
Checkov reads a .checkov.yaml at the repo root that centralizes skips, frameworks, and excludes (all the CLI flags above, plus a project-wide skip-check list):
# .checkov.yaml
skip-check:
  - CKV_AWS_18   # access logging handled centrally by org SCP
framework: [terraform, terraform_plan]
The governing rule: a suppression without a reason and an owner is debt you cannot find later. Inline skips take a :reason (Checkov does not require one, so enforce it in review); baseline entries get a statement and an expired_at. Review them in the same PR as the code, and audit expirations so they actually expire.
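Auditing expirations is easy to automate. The sketch below assumes the baseline entries have already been loaded into dictionaries with the `id`, `statement`, and `expired_at` keys shown in the .trivyignore.yaml example; `audit_suppressions` is a hypothetical helper, and treating a missing expiry as a problem is a policy choice made here, not a Trivy rule.

```python
from datetime import date


def audit_suppressions(entries, today):
    """Return (id, problem) pairs for entries that lack a reason, lack an expiry, or have expired."""
    problems = []
    for entry in entries:
        rule_id = entry.get("id", "<missing id>")
        if not str(entry.get("statement", "")).strip():
            problems.append((rule_id, "no statement"))
        expires = entry.get("expired_at")
        if expires is None:
            problems.append((rule_id, "no expiry"))
            continue
        # YAML loaders may hand back a date object or a plain ISO string.
        if isinstance(expires, str):
            expires = date.fromisoformat(expires)
        if expires < today:
            problems.append((rule_id, "expired"))
    return problems
```

Run it in CI with `date.today()` and fail the job when the list is non-empty, so stale exceptions surface in review instead of accumulating.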
6. Normalizing SARIF and surfacing findings in code review
Both tools emit SARIF, the format GitHub code-scanning ingests and the lingua franca that lets two scanners annotate the same diff without glue. Checkov takes --output cli --output sarif --output-file-path console,checkov.sarif (with a single --output, --output-file-path names a folder rather than a file) and Trivy takes --format sarif --output trivy.sarif; in a GitHub workflow the job needs permissions: security-events: write, then three steps (after actions/checkout):
- name: Checkov
  uses: bridgecrewio/checkov-action@v12
  with:
    directory: .
    output_format: cli,sarif
    output_file_path: console,checkov.sarif
    soft_fail: true   # do not fail here; gate on severity later
    quiet: true
- name: Trivy config
  uses: aquasecurity/trivy-action@0.28.0
  with:
    scan-type: config
    scan-ref: ./terraform
    format: sarif
    output: trivy.sarif
    exit-code: "0"
# One upload per tool; repeat for trivy.sarif / category: trivy.
- name: Upload Checkov SARIF
  if: always()   # upload even if a prior step failed
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: checkov.sarif
    category: checkov
Two details are load-bearing. The category keys results in GitHub, so distinct categories stop Checkov and Trivy findings from overwriting each other. The if: always() runs the upload even when a scan step fails the job; otherwise a red build shows no annotations explaining why. Both scan steps set soft_fail / exit-code: 0 so findings always upload; the gate lives in a later step (Section 7).
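For deduplication at the reporting layer, the two SARIF files can be grouped by location, since the tools use different rule IDs for the same defect. This is a rough sketch assuming standard SARIF 2.1.0 fields (`runs[].results[].locations[].physicalLocation`); `sarif_findings` and `group_by_location` are hypothetical names, and in practice the two tools may report different URI prefixes or start lines for the same resource, which would need normalizing first.

```python
from collections import defaultdict


def sarif_findings(sarif, tool):
    """Yield (uri, start_line, tool, rule_id) for each located result in a SARIF document."""
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "")
                line = physical.get("region", {}).get("startLine", 0)
                yield uri, line, tool, result.get("ruleId", "")


def group_by_location(*tagged_sarif):
    """Group findings from several (tool, sarif_dict) pairs by (file, start line)."""
    grouped = defaultdict(set)
    for tool, sarif in tagged_sarif:
        for uri, line, name, rule_id in sarif_findings(sarif, tool):
            grouped[(uri, line)].add((name, rule_id))
    return dict(grouped)
```

Locations flagged by both tools are the high-confidence findings; locations flagged by only one show where each ruleset covers something the other misses.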
7. Severity thresholds, exit codes, and failing the build appropriately
A gate that fails on everything gets disabled. Fail on what is exploitable and actionable; report the rest. Both tools express this through severity selection plus exit codes.
For Checkov, --soft-fail always exits 0 (report-only), while --soft-fail-on and --hard-fail-on split by severity:
# Report everything, but only HIGH and CRITICAL fail the build.
checkov -d . --soft-fail-on LOW,MEDIUM --hard-fail-on HIGH,CRITICAL --compact
echo "checkov exit: $?"
Severity gating requires Checkov to know each check’s severity; full metadata ships with the Prisma/Bridgecrew integration, while OSS severities cover the curated set. Verify your critical checks carry the severity you gate on rather than assuming it.
For Trivy, --severity filters what is reported and --exit-code 1 makes the exit status non-zero when anything is reported, for example trivy config --severity HIGH,CRITICAL --exit-code 1 ./terraform. A sane policy:
| Severity | PR behavior | Default-branch / nightly |
|---|---|---|
| CRITICAL | Block | Block |
| HIGH | Block | Block |
| MEDIUM | Report (annotate) | Report + ticket |
| LOW / INFO | Report | Report |
Secrets are the exception: a detected live secret is always blocking, at any severity, on any branch. There is no acceptable medium-severity hardcoded credential.
Wire these two gating commands as their own CI run: step so reporting and gating stay decoupled — the SARIF uploads ran earlier with soft-fail, and this is the only step allowed to fail the job.
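If findings from both tools are normalized into one list, the policy table and the secrets exception reduce to a few lines. This is an illustrative sketch, not either tool's behavior: the `severity` and `kind` keys and the helper names `must_block` and `gate_exit_code` are assumptions about your own normalized record shape.

```python
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def must_block(findings):
    """Return the findings that fail the build: HIGH/CRITICAL, plus any secret."""
    blocking = []
    for finding in findings:
        severity = str(finding.get("severity", "")).upper()
        is_secret = finding.get("kind") == "secret"
        if is_secret or severity in BLOCKING_SEVERITIES:
            blocking.append(finding)
    return blocking


def gate_exit_code(findings):
    """Exit status for the gating step: 1 when anything blocks, else 0."""
    return 1 if must_block(findings) else 0
```

Keeping the decision in one function makes the policy itself reviewable and testable, separate from the flags passed to each scanner.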
8. Tracking posture over time and avoiding alert fatigue
A gate stops new badness; it does not tell you whether posture is improving. For that you need a trend, plus a strategy for the pre-existing backlog so the gate does not block PRs on debt nobody introduced.
Baseline existing findings so only new ones gate. Checkov snapshots current findings; later runs compare against the baseline and fail only on regressions:
checkov -d . --create-baseline && git add .checkov.baseline
# Future runs fail only on findings NOT in the baseline.
checkov -d . --baseline .checkov.baseline --hard-fail-on HIGH,CRITICAL
This makes adoption survivable on a brownfield estate: strict for anything new, lenient for the documented backlog, with the baseline a reviewable artifact you burn down. Regenerate it deliberately — an auto-refreshed baseline silently accepts whatever regressed.
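Conceptually, baseline gating is a set difference. The sketch below is not Checkov's implementation or its baseline file format; it assumes each finding has been reduced to a hypothetical (file, resource, check_id) tuple, and it only illustrates why an auto-refreshed baseline hides regressions.

```python
def new_findings(current, baseline):
    """Findings present in the current run but absent from the accepted baseline."""
    return sorted(set(current) - set(baseline))


def burned_down(current, baseline):
    """Baseline entries that no longer appear: debt that has actually been fixed."""
    return sorted(set(baseline) - set(current))
```

If the baseline were regenerated on every run, `new_findings` would always be empty, which is exactly the silent acceptance the paragraph above warns about.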
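Emit machine-readable output to a dashboard so you watch the trend instead of logs. Both tools emit JSON (Checkov with --output json, Trivy with --format json); ship the summary per run:
checkov -d . --output cli --output json --output-file-path console,checkov.json
# One framework yields an object; several yield a list of per-framework reports.
jq 'if type == "array" then map(.summary) else .summary end' checkov.json   # -> { "passed": N, "failed": M, "skipped": K, ... }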
Push failed, skipped, and severity-bucketed counts to a time-series DB or Grafana panel. The metrics that matter: HIGH/CRITICAL count over time (should trend down), active suppression count (should not silently grow), and mean age of an open finding. The leading indicator of a dying gate is suppressions rising against flat findings — people silencing, not fixing — so alert on that ratio, not raw counts.
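The ratio alert can be computed from two consecutive summaries. This is a minimal sketch assuming summary dictionaries with `failed` and `skipped` counts, as in the jq output above; `suppression_ratio`, `ratio_rising`, and the 10% tolerance are illustrative choices, not a recommendation from either tool.

```python
def suppression_ratio(summary):
    """Skipped checks per failed check; None when nothing failed."""
    failed = summary.get("failed", 0)
    if not failed:
        return None
    return summary.get("skipped", 0) / failed


def ratio_rising(previous, current, tolerance=0.10):
    """True when the suppression ratio grew by more than `tolerance` between two runs."""
    before = suppression_ratio(previous)
    after = suppression_ratio(current)
    if before is None or after is None:
        return False
    return after > before * (1 + tolerance)
```

Comparing against a longer window (for example the median of the last few runs) would smooth out one-off spikes; the two-point version here only shows the shape of the check.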
Verify
Prove each layer works before you trust it. A clean run is not coverage — assert it.
# 1. Both tools present and pinned.
checkov --version && trivy --version
# 2. Every expected framework actually parsed (a missing line = silent zero coverage).
checkov -d . --compact | grep -iE "terraform|bicep|cloudformation"
# 3. A deny fires: plant a public bucket, confirm a FAILED line and non-zero exit.
checkov -f ./terraform/bad_example.tf --hard-fail-on HIGH,CRITICAL; echo "exit: $?"
# 4. Secret detection fires: drop a fake AWS key in a tracked file, confirm non-zero.
trivy fs --scanners secret --exit-code 1 .; echo "exit: $?"
# 5. A suppression works: add an inline checkov:skip with a reason, confirm it passes.
# 6. SARIF exists and the PR shows annotations from BOTH categories.
test -s checkov.sarif && test -s trivy.sarif && echo "SARIF OK"
# 7. The baseline gates only on NEW findings: re-run with no new bad config, expect exit 0.
checkov -d . --baseline .checkov.baseline; echo "exit: $?"
The decisive test is #4. If a planted credential does not break the build, every other layer is theater — a scanner that misses secrets misses the thing most likely to cause a breach.
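Check #2 can be made stricter than a grep by reading Checkov's JSON output. The sketch assumes each per-framework report carries a `check_type` key and that a multi-framework run produces a list of such reports; verify this against the output of your pinned version. `missing_frameworks` is a hypothetical helper.

```python
def missing_frameworks(report, expected):
    """Return the expected frameworks that have no report in the parsed JSON output."""
    reports = report if isinstance(report, list) else [report]
    seen = {item.get("check_type") for item in reports if isinstance(item, dict)}
    return sorted(set(expected) - seen)
```

Failing the job when this returns a non-empty list turns silent zero coverage into a red build.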
Checklist
Pitfalls and next steps
Two failure modes quietly hollow out an IaC gate. The first is silent zero coverage: Checkov skips Bicep without the bicep CLI, Trivy runs zero rules when it cannot pull its bundle, and the build goes green — certifying safety that was never checked. The second is suppression rot: skips without a :reason, owner, and expiry accumulate into a list nobody audits while coverage erodes one exception at a time. Both defenses are in the checklist: assert every framework parsed, and make every suppression attributable and time-boxed.
From here, the high-value extensions are: feeding the same plan JSON to OPA/Conftest for logic awkward as attribute checks (cross-resource invariants, Infracost cost ceilings); packaging custom Checkov policies as a versioned, separately tested artifact rather than a folder people copy; and extending Trivy from IaC into the image and SBOM scanning it already does, so one binary and one SARIF pipeline cover config, dependencies, and secrets end to end.