There is a long-lived AWS access key in your Terraform pipeline right now: a CI variable named AWS_SECRET_ACCESS_KEY, scoped to the whole project, masked, protected, rotated last quarter. It can create IAM roles, delete buckets, and read every other secret in the account, with no expiry. If it leaks through a set -x, a third-party action, or a printed environment, the blast radius is the entire account until a human revokes it — and nobody will know it leaked, because the key looks identical whether your pipeline uses it or an attacker does.
The fix is not “rotate more often.” It is to stop issuing standing credentials at all. Every credential your pipeline touches should be minted on demand, scoped to the run that needs it, and revoked when the run ends. Vault’s dynamic secrets engines do the minting; OIDC federation handles the bootstrap so there is no “secret zero” to protect; and Terraform’s ephemeral values keep the minted secret out of state. This guide wires all three together, then migrates a static-key pipeline onto them. It assumes Vault 1.15+, Terraform 1.11+, and the hashicorp/vault provider 5.x.
1. Where static credentials actually hide
Before removing secrets, name every place they live. There are four, and teams usually think about only the first.
| Location | Example | Why it is dangerous |
|---|---|---|
| CI/CD variables | AWS_SECRET_ACCESS_KEY, ARM_CLIENT_SECRET | Standing, broadly scoped, survive every run |
| Provider config | provider "aws" { access_key = var.key } | Often committed; ends up in plan logs |
| Terraform state | aws_db_instance.password, any data.vault_* read | Stored in plaintext inside state, forever |
| Plan/apply logs | terraform plan echoing a sensitive variable | Persisted in CI artifacts and run history |
The state file is the worst offender because it is silent. A data "vault_generic_secret" block reads a secret at plan time and writes it verbatim into terraform.tfstate. Encrypting the backend does not protect the secret from anyone who can run terraform state pull or read the bucket. Any secret that flows through a data source or a normal resource argument is a secret you have persisted, not one you have merely used.
The target end state: provider auth comes from a short-lived token federated from CI identity, and any secret a resource needs is read through an ephemeral resource or passed via a write-only argument so it is never serialized.
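One way to start the inventory for the state-file row is to list every Vault data source in the codebase, since each one copies what it reads into state. The following is a minimal sketch, assuming a grep that supports -r and --include (GNU and BSD grep both do); the function name find_vault_data_sources is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every `data "vault_*"` block under a directory.
# Each hit is a secret that Terraform will persist into state.
find_vault_data_sources() {
  # Match lines that open a data block whose type starts with vault_.
  grep -rnE --include='*.tf' \
    '^[[:space:]]*data[[:space:]]+"vault_[a-z0-9_]+"' "${1:-.}"
}
```

Running find_vault_data_sources . from the repository root prints file:line for each match and exits non-zero when there are none, so it can gate a migration checklist.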
2. Dynamic secrets engines for AWS, Azure, and databases
A dynamic secrets engine generates credentials at request time, hands back a lease, and revokes the credential automatically when the lease expires. Enable the AWS engine and define a role mapped to a least-privilege policy:
# Bootstrap credential: an IAM user Vault uses ONLY to mint other creds.
vault secrets enable -path=aws aws
vault write aws/config/root \
access_key="$VAULT_AWS_BOOTSTRAP_KEY" \
secret_key="$VAULT_AWS_BOOTSTRAP_SECRET" \
region=eu-west-1
# A role that produces STS credentials limited to one IAM policy.
vault write aws/roles/terraform-plan \
credential_type=assumed_role \
role_arns="arn:aws:iam::111122223333:role/tf-plan-readonly"
vault write aws/roles/terraform-apply \
credential_type=assumed_role \
role_arns="arn:aws:iam::111122223333:role/tf-apply"
Reading the creds path for a role returns a fresh, expiring credential (access_key, secret_key, security_token) plus a lease_id and lease_duration:
vault read aws/creds/terraform-apply
The Azure engine works the same way (vault secrets enable -path=azure azure, then a role whose azure_roles scopes a short-lived service principal to a resource group), and GCP, RabbitMQ, and others follow the identical enable/config/role shape. The database engine is where dynamic credentials earn their keep: instead of a shared application password baked into config, Vault creates a unique database user per lease and drops it on revocation:
vault secrets enable -path=postgres database
vault write postgres/config/app-db \
plugin_name=postgresql-database-plugin \
allowed_roles="app-readwrite" \
connection_url="postgresql://{{username}}:{{password}}@db.internal:5432/app?sslmode=require" \
username="$VAULT_DB_BOOTSTRAP_USER" \
password="$VAULT_DB_BOOTSTRAP_PASSWORD"
vault write postgres/roles/app-readwrite \
db_name=app-db \
default_ttl=1h max_ttl=24h \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
Each engine still relies on one bootstrap credential (aws/config/root, azure/config, the DB admin). That is the engine’s own secret zero, not your pipeline’s, and it lives only inside Vault — your pipeline never sees it.
3. Vault-backed dynamic provider credentials in HCP Terraform
On HCP Terraform (or Terraform Enterprise), you do not have to script vault read and shuttle the output into provider env vars. Vault-backed dynamic provider credentials make the platform authenticate to Vault and inject minted cloud credentials into the run, with no static keys in the workspace. It stacks two features: the workspace federates its own signed workload-identity token into a Vault JWT role (so HCP Terraform holds no Vault token), and the Vault-backed AWS/Azure/GCP layer uses that session to mint cloud credentials from the secrets engine for the run’s duration.
Set these workspace environment variables. The AWS-specific names below are exact:
# Enable Vault auth for the workspace (common variables).
TFC_VAULT_PROVIDER_AUTH = "true"
TFC_VAULT_ADDR = "https://vault.internal:8200"
TFC_VAULT_RUN_ROLE = "tfc-aws" # Vault JWT role HCP Terraform logs into
# TFC_VAULT_NAMESPACE = "admin/platform" # Vault Enterprise/HCP namespace, if used
# Enable the Vault-backed AWS layer (AWS-specific variables).
TFC_VAULT_BACKED_AWS_AUTH = "true"
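TFC_VAULT_BACKED_AWS_AUTH_TYPE = "assumed_role" # or iam_user / federation_token
TFC_VAULT_BACKED_AWS_RUN_VAULT_ROLE = "terraform-apply" # role in the aws/ engine
TFC_VAULT_BACKED_AWS_RUN_ROLE_ARN = "arn:aws:iam::111122223333:role/tf-apply" # needed when the auth type is assumed_role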
TFC_VAULT_BACKED_AWS_MOUNT_PATH = "aws" # defaults to "aws"
# Split plan/apply privileges by using separate roles:
TFC_VAULT_BACKED_AWS_PLAN_VAULT_ROLE = "terraform-plan"
TFC_VAULT_BACKED_AWS_APPLY_VAULT_ROLE = "terraform-apply"
With those set, the AWS provider block carries no credentials at all:
provider "aws" {
region = "eu-west-1"
# No access_key, no secret_key, no profile.
# HCP Terraform injects STS credentials from Vault for this run.
}
The mechanism is identical for Azure (TFC_VAULT_BACKED_AZURE_*) and GCP (TFC_VAULT_BACKED_GCP_*). The platform handles lease acquisition and revocation: when the run finishes, the lease is revoked and the credential dies. Using TFC_VAULT_BACKED_AWS_PLAN_VAULT_ROLE to give plans read-only access while reserving the privileged role for apply is the highest-value control here, since most pipeline activity is planning.
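The Vault side of that workload-identity login is a JWT role bound to the workspace. As a hedged sketch, the helper below prints a role payload whose sub claim follows the organization/project/workspace/run_phase layout of HCP Terraform workload identity tokens. The function name and all four arguments are placeholders, and the audience shown is the platform default, which is an assumption if you have overridden it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a JSON payload for a Vault JWT role that only
# runs from one HCP Terraform workspace can log into.
# Arguments (all placeholders): organization, project, workspace, Vault policy.
tfc_jwt_role_payload() {
  local org=$1 project=$2 workspace=$3 policy=$4
  cat <<EOF
{
  "role_type": "jwt",
  "user_claim": "terraform_full_workspace",
  "bound_audiences": ["vault.workload.identity"],
  "bound_claims_type": "glob",
  "bound_claims": {
    "sub": "organization:${org}:project:${project}:workspace:${workspace}:run_phase:*"
  },
  "token_policies": ["${policy}"],
  "token_ttl": "20m"
}
EOF
}
```

A possible use, assuming a JWT mount whose oidc_discovery_url and bound_issuer are https://app.terraform.io: tfc_jwt_role_payload acme-corp platform network tfc-aws-policy | vault write auth/jwt/role/tfc-aws -. A JWT mount trusts a single issuer, so if the same Vault also federates another CI system, HCP Terraform would need its own mount path, with TFC_VAULT_AUTH_PATH pointing at it.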
4. Authenticating CI to Vault with JWT/OIDC and bound claims
CI that runs outside HCP Terraform (GitHub Actions, GitLab, Azure DevOps) authenticates to Vault using the JWT auth method and the OIDC token the platform already issues to every job, with no Vault token and no AppRole secret_id to rotate. Configure Vault to trust the CI provider’s issuer:
vault auth enable jwt
# GitHub Actions OIDC issuer + its JWKS, so Vault can verify token signatures.
vault write auth/jwt/config \
oidc_discovery_url="https://token.actions.githubusercontent.com" \
bound_issuer="https://token.actions.githubusercontent.com"
The critical control is bound_claims. Without it, any GitHub repository could authenticate to this role. Bind it to your specific repository and, ideally, a protected branch or environment:
vault policy write tf-apply - <<EOF
path "aws/creds/terraform-apply" { capabilities = ["read"] }
path "postgres/creds/app-readwrite" { capabilities = ["read"] }
EOF
# Send the whole role as one JSON payload on stdin ("-") so the nested
# bound_claims map and the token settings reach Vault in the same request.
vault write auth/jwt/role/gha-terraform-apply - <<EOF
{
  "role_type": "jwt",
  "user_claim": "actor",
  "bound_audiences": ["https://github.com/acme-corp"],
  "bound_claims_type": "glob",
  "bound_claims": {
    "repository": "acme-corp/platform-infra",
    "ref": "refs/heads/main",
    "job_workflow_ref": "acme-corp/platform-infra/.github/workflows/apply.yml@refs/heads/main"
  },
  "token_policies": ["tf-apply"],
  "token_ttl": "20m",
  "token_max_ttl": "30m"
}
EOF
Bind on job_workflow_ref or ref, not just repository. Binding only the repository lets any branch, including an attacker’s PR branch that edits the workflow, assume the apply role. Pinning ref to refs/heads/main plus a protected-branch rule keeps the privileged role reachable only from reviewed, merged code.
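Before writing bound_claims, it helps to see exactly which claims a job’s token carries. The sketch below decodes the payload segment of a compact JWT using only coreutils. jwt_payload is a hypothetical helper; it does not verify the signature (Vault does that at login), and it assumes a base64 binary that accepts -d.

```shell
#!/usr/bin/env bash
# Hypothetical debugging helper: print the claims (payload) of a compact JWT.
# It only decodes; it does NOT verify the signature.
jwt_payload() {
  local seg
  # The payload is the second dot-separated segment, base64url encoded.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops the padding; restore it so base64 -d accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}
```

In a GitHub Actions job with id-token: write, the token can be requested from the URL in ACTIONS_ID_TOKEN_REQUEST_URL using the bearer token in ACTIONS_ID_TOKEN_REQUEST_TOKEN, then passed to jwt_payload to compare repository, ref, and job_workflow_ref against the role. Treat the token as a secret and run this only in a throwaway debug job.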
The GitHub Actions job exchanges its OIDC token for a Vault token in one step:
permissions:
  id-token: write   # required to mint the OIDC token
  contents: read
jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Authenticate to Vault
        uses: hashicorp/vault-action@v3
        with:
          url: https://vault.internal:8200
          method: jwt
          role: gha-terraform-apply
          # Pulls the OIDC token from the runner automatically. The audience
          # must match the role's bound_audiences; the action's default differs.
          jwtGithubAudience: https://github.com/acme-corp
          exportToken: true   # sets VAULT_TOKEN for later steps
          secrets: |
            aws/creds/terraform-apply access_key | AWS_ACCESS_KEY_ID ;
            aws/creds/terraform-apply secret_key | AWS_SECRET_ACCESS_KEY ;
            aws/creds/terraform-apply security_token | AWS_SESSION_TOKEN
      - name: Terraform apply
        run: |
          terraform init
          terraform apply -auto-approve
The AWS credentials in the environment are now STS tokens valid for the role’s lease TTL, federated from a token that lived only for this job. No standing secret exists anywhere in the chain.
5. Aligning leases, TTLs, and revocation to pipeline lifetime
Dynamic credentials are only as good as their TTLs. The rule: a credential’s lifetime should match the operation, not the day. A plan takes minutes; an apply takes minutes to low tens of minutes. If an apply finishes in 10 minutes, a one-hour TTL leaves the credential usable for another 50 minutes after the run ends.
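The exposure arithmetic is simple enough to script when reviewing role settings. This is a minimal sketch, assuming durations written with a single s, m, or h suffix; to_seconds and exposure_window are hypothetical helper names, not Vault commands.

```shell
#!/usr/bin/env bash
# Convert a duration such as 90s, 20m, or 1h to seconds.
to_seconds() {
  local v=$1
  case "$v" in
    *h) echo $(( ${v%h} * 3600 )) ;;
    *m) echo $(( ${v%m} * 60 )) ;;
    *s) echo $(( ${v%s} )) ;;
    *)  echo $(( v )) ;;   # bare number: already seconds
  esac
}

# Seconds a credential remains valid after the run that minted it has ended.
# Usage: exposure_window <credential TTL> <run duration>
exposure_window() {
  local ttl run left
  ttl=$(to_seconds "$1")
  run=$(to_seconds "$2")
  left=$(( ttl - run ))
  [ "$left" -lt 0 ] && left=0   # the credential expired before the run ended
  echo "$left"
}
```

For example, exposure_window 1h 10m prints 3000 (50 minutes), while exposure_window 20m 10m prints 600.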
Set TTLs at the role and tighten the auth token at the JWT role:
# Cap how long minted AWS creds live regardless of caller request.
vault write aws/roles/terraform-apply \
credential_type=assumed_role \
role_arns="arn:aws:iam::111122223333:role/tf-apply" \
default_sts_ttl=20m max_sts_ttl=30m
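Explicitly revoke at the end of the job so that anything revocable dies the moment work finishes, rather than waiting for expiry:
# Best-effort cleanup in an always-run step. Revoking the job's own token
# also revokes every lease that token created (for example a dynamic DB user).
vault token revoke -self || true
In GitHub Actions, put this in a step with if: always(). Revocation is idempotent and safe to fail. Avoid vault lease revoke -prefix aws/creds/terraform-apply here: it revokes every lease under that path, including those of concurrent runs, and it needs sudo on sys/leases/revoke-prefix. STS credentials cannot be recalled from AWS once issued, so for assumed_role the short max_sts_ttl is the real bound; explicit revocation is the fast path for credentials Vault can delete, such as database users. Together they mean a credential is almost never valid outside the run that created it.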
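On CI systems without an always-run step, a shell EXIT trap gives the same guarantee for a single script. The following is a sketch under that assumption; revoke_on_exit and run_with_revocation are hypothetical names, and the only Vault call is vault token revoke -self, which also revokes the leases created by that token.

```shell
#!/usr/bin/env bash
# Best-effort cleanup: revoke the job's own Vault token and, with it,
# every lease the token created. Never fail the job because of cleanup.
revoke_on_exit() {
  vault token revoke -self >/dev/null 2>&1 || true
}

# Run a command in a subshell whose EXIT trap revokes the token,
# whether the command succeeds or fails, and pass its exit status through.
run_with_revocation() {
  (
    trap revoke_on_exit EXIT
    "$@"
    exit $?
  )
}
```

A call such as run_with_revocation terraform apply -auto-approve returns the exit status of the apply. The trap cannot fire if the runner is killed outright, which is why the short TTL remains the backstop.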
Do not raise TTLs to “fix” flaky long applies. If an apply legitimately exceeds 30 minutes, split the configuration or use targeted applies — do not widen the window. A 4-hour TTL chosen for one slow module is a 4-hour exposure on every run.
6. Keeping secrets out of state with ephemeral resources and write-only arguments
Federating the provider solves authentication but does nothing for secrets a resource consumes — a generated database password, an API token written into a Key Vault. Read those with an ephemeral resource (Terraform 1.10+) and pass them via a write-only argument (Terraform 1.11+). Neither is ever written to plan or state.
The vault_database_secret ephemeral resource mints a dynamic DB credential exposed only during the run:
terraform {
required_providers {
vault = { source = "hashicorp/vault", version = "~> 5.1" }
}
}
# Mints a unique DB user from the role; never serialized to state.
ephemeral "vault_database_secret" "app" {
mount = "postgres"
name = "app-readwrite" # the database role name
}
ephemeral blocks may only be referenced from other ephemeral contexts: provider config, other ephemeral resources, ephemeral variables/outputs, or write-only arguments. Assigning ephemeral.vault_database_secret.app.password to a normal resource argument is rejected at validate time — exactly the guardrail you want.
Feed the value into a resource through its write-only (_wo) argument. Write-only arguments are not stored in state; only the companion _wo_version integer is persisted, and you bump it to force a new value to be sent:
# azurerm 4.34+ exposes write-only secret values.
resource "azurerm_key_vault_secret" "db_password" {
name = "app-db-password"
key_vault_id = azurerm_key_vault.app.id
value_wo = ephemeral.vault_database_secret.app.password
value_wo_version = 1 # increment to rotate the stored secret
}
Compare the two data paths:
| Approach | What lands in state | Rotation trigger |
|---|---|---|
| data "vault_generic_secret" -> value | The secret, in plaintext | Re-read on every plan |
| ephemeral + value_wo | Nothing but value_wo_version | Increment value_wo_version |
To compute or template a sensitive value before passing it on, mark variables and outputs ephemeral = true so they too are excluded from artifacts:
variable "extra_db_grant" {
type = string
ephemeral = true
}
Migrate one secret-bearing resource at a time. After switching to value_wo, run terraform plan and confirm the diff shows the write-only version changing rather than the secret, then terraform state pull to prove the secret is gone.
7. Auditing, secret zero, and break-glass
Dynamic credentials make auditing tractable because every credential is attributable. Enable an audit device so every request is logged with caller identity, path, and lease:
vault audit enable file file_path=/var/log/vault/audit.log
Audit logs HMAC sensitive values by default, so they record that gha-terraform-apply read aws/creds/terraform-apply, without leaking the credential. Ship these to your SIEM and alert on what should never happen: a JWT login whose repository claim is not allow-listed, a privileged-role read outside change windows, or any direct use of an engine’s bootstrap credential.
Secret zero is the credential that bootstraps everything else. OIDC federation largely eliminates it for CI — the “secret” is the platform’s signing key, which you do not hold. What remains is each engine’s admin credential (aws/config/root, the DB admin). Reduce its standing power: scope the AWS bootstrap user to only sts:AssumeRole and iam:GetUser, and where supported, configure engines to use instance/workload identity instead of a static key.
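Break-glass must exist and must be loud. When the normal path is unavailable (the CI issuer is down, or a bound claim is misconfigured) and an apply genuinely cannot wait, you need a documented path that is auditable and self-expiring:
# Mint at incident time under a two-person procedure; the hard cap makes it self-expiring.
vault token create \
-policy=break-glass \
-ttl=1h -explicit-max-ttl=1h -use-limit=10 \
-display-name=break-glass-$(date +%Y%m%d)
Keep it out of normal automation, require two people to mint the token, and fire a high-severity alert the instant the break-glass policy is used. A token pre-created with a one-hour TTL would expire long before any incident, so what you store offline is the procedure, not the token. If Vault itself is unreachable, no Vault token can help; that case needs a separate, equally loud cloud-side emergency role. A break-glass path nobody notices is just a backdoor.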
Verify
Prove the static credentials are gone, not merely unused.
# 1. A minted credential is short-lived and tied to a lease.
vault read -format=json aws/creds/terraform-apply | jq '{lease_duration, lease_id}'
# 2. The provider blocks carry no credentials.
grep -rEn --include='*.tf' '(access_key|secret_key|client_secret|password)[[:space:]]*=' . \
| grep -vE '^[^:]+:[0-9]+:[[:space:]]*(#|//)' && echo "STATIC CREDS FOUND" || echo "clean"
# 3. No secret persisted in state from ephemeral usage (test only; never print the value).
terraform state pull | jq -e '[.. | objects | select(has("password")) | .password | select(. != null)] | length == 0' >/dev/null \
&& echo "no plaintext secrets in state" || echo "SECRET IN STATE"
A passing run shows a non-null lease_duration, “clean” provider files, and “no plaintext secrets in state”. In the cloud audit trail (CloudTrail / Azure Activity Log), the actor for the apply should be the assumed role with a federated STS session, never a long-lived IAM user.