Secret Manager Rotation Pipelines with Cloud Functions, IAM, and CMEK

Secret Manager will happily store a database password forever, and that is exactly the problem. A static credential that never changes is one that has already leaked by the time you find out. The fix is not a calendar reminder and a runbook; it is a pipeline: Secret Manager emits a Pub/Sub message when a secret is due to rotate, a Cloud Functions rotator mints a fresh credential and adds it as a new version, your workloads pick up the new version, and the old version is disabled. Done right, the credential changes every 30 days with zero downtime and no human in the loop.

This walkthrough builds that pipeline end to end for a Cloud SQL database password, wraps the secret in CMEK, and locks every identity to least privilege. The same shape works for API keys and TLS material.

1. The Secret Manager object model

Get the model exact before automating; the rotation logic depends on it.

A secret is a logical container with a name, replication policy, optional rotation schedule, and IAM bindings. It holds no payload itself.
A secret version holds the bytes. Versions are numbered monotonically (1, 2, 3…) and are immutable once added.
A version has a state: ENABLED, DISABLED, or DESTROYED. Disabled versions reject access but can be re-enabled; destroyed versions have their payload deleted permanently.
The latest alias resolves to the most recently created version. If that version is disabled or destroyed, a read through latest fails rather than falling back to an older version. It is the only built-in alias; Secret Manager has no staging labels like some other vaults, although you can assign your own version aliases.

That last point drives the zero-downtime design. Because latest follows the newest version, a rotator that adds version N+1 instantly shifts latest while version N stays enabled and readable. Consumers pinned to latest get the new value on their next read; in-flight consumers keep working. You only break something by destroying or disabling a version that consumers still depend on.
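The version state rules above are small enough to model directly. This is a minimal sketch, not Secret Manager code: the `VersionState` enum and `transition` helper are hypothetical names used only to make the "disable is reversible, destroy is terminal" rule concrete.

```python
from enum import Enum


class VersionState(Enum):
    ENABLED = "ENABLED"
    DISABLED = "DISABLED"
    DESTROYED = "DESTROYED"


# Enable and disable are reversible; destroy is terminal.
_ALLOWED = {
    VersionState.ENABLED: {VersionState.DISABLED, VersionState.DESTROYED},
    VersionState.DISABLED: {VersionState.ENABLED, VersionState.DESTROYED},
    VersionState.DESTROYED: set(),
}


def transition(current: VersionState, target: VersionState) -> VersionState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"cannot move {current.value} -> {target.value}")
    return target
```

A disabled version can go back to ENABLED; any move out of DESTROYED raises, which is the property the later "disable, do not destroy" advice relies on.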

# Create the secret container with no payload yet, automatic replication.
gcloud secrets create db-app-password \
  --replication-policy="automatic" \
  --project="$PROJECT_ID"

# Versions are immutable; "add-version" always creates a new number.
echo -n "initial-bootstrap-pw" | \
  gcloud secrets versions add db-app-password --data-file=- \
  --project="$PROJECT_ID"

# Resolve latest, or pin to an explicit number.
gcloud secrets versions access latest  --secret=db-app-password
gcloud secrets versions access 1       --secret=db-app-password

2. Rotation schedules and Pub/Sub notifications

Secret Manager’s rotation feature does not generate new secret values. It is a scheduler: at next-rotation-time it publishes a message to a topic you nominate, then advances the clock by rotation-period. Minting and storing the credential is yours to implement in a subscriber. That separation is deliberate and it is why the feature is generic.

First, the topic and the publish grant. Secret Manager publishes as a per-project service agent, service-<PROJECT_NUMBER>@gcp-sa-secretmanager.iam.gserviceaccount.com, which must hold roles/pubsub.publisher on the topic or the secret create/update call is rejected up front.

PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')
SM_AGENT="service-${PROJECT_NUMBER}@gcp-sa-secretmanager.iam.gserviceaccount.com"

gcloud pubsub topics create secret-rotation-events --project="$PROJECT_ID"

# Without this binding, attaching the topic to a secret fails validation.
gcloud pubsub topics add-iam-policy-binding secret-rotation-events \
  --member="serviceAccount:${SM_AGENT}" \
  --role="roles/pubsub.publisher" \
  --project="$PROJECT_ID"

Now attach a rotation schedule and the notification topic to the secret. rotation-period has a hard minimum of 3600s (1 hour) and next-rotation-time must be at least 300s in the future.

gcloud secrets update db-app-password \
  --next-rotation-time="2026-07-01T03:00:00Z" \
  --rotation-period="2592000s" \
  --topics="projects/${PROJECT_ID}/topics/secret-rotation-events" \
  --project="$PROJECT_ID"
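Before applying a schedule it can help to sanity-check it offline. The sketch below checks the two limits quoted above and lists the next few firing times, assuming the schedule simply advances by rotation-period each time. The function name `upcoming_rotations` and its parameters are illustrative, not part of any Google library.

```python
from datetime import datetime, timedelta, timezone
from typing import List

MIN_PERIOD_S = 3600   # minimum rotation-period quoted above
MIN_LEAD_S = 300      # minimum lead time for next-rotation-time quoted above
_FMT = "%Y-%m-%dT%H:%M:%SZ"


def upcoming_rotations(next_rotation_time: str, rotation_period_s: int,
                       count: int, now: datetime) -> List[str]:
    """Validate a schedule and return the next `count` rotation timestamps."""
    first = datetime.strptime(next_rotation_time, _FMT).replace(tzinfo=timezone.utc)
    if rotation_period_s < MIN_PERIOD_S:
        raise ValueError("rotation-period is below the 3600s minimum")
    if (first - now).total_seconds() < MIN_LEAD_S:
        raise ValueError("next-rotation-time must be at least 300s in the future")
    step = timedelta(seconds=rotation_period_s)
    return [(first + i * step).strftime(_FMT) for i in range(count)]
```

With the values from the command above (first rotation 2026-07-01T03:00:00Z, period 2592000s), the second and third rotations land on 2026-07-31 and 2026-08-30, a reminder that a 30-day period drifts against calendar months.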

Every message carries the event type in an attribute. The rotation cron fires SECRET_ROTATE; the same topic also receives SECRET_VERSION_ADD, SECRET_VERSION_ENABLE, SECRET_VERSION_DISABLE, SECRET_VERSION_DESTROY, and SECRET_UPDATE. Your rotator must filter on the attribute or it recurses: it adds a version, that fires SECRET_VERSION_ADD, which re-triggers the rotator. Filter ruthlessly.

Pub/Sub message field	Meaning
`eventType`	One of `SECRET_ROTATE`, `SECRET_VERSION_ADD`, `SECRET_VERSION_DISABLE`, etc.
`secretId`	Full resource name: `projects/<num>/secrets/<name>`
`data` (base64 body)	The secret resource as JSON; includes `rotation` and `topics`
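Those fields can be read with a few lines of standard-library Python. This is a sketch under the assumptions in the table: the message is the Pub/Sub envelope's inner `message` object, the body is base64-encoded JSON, and `parse_rotation_event` is a hypothetical helper name.

```python
import base64
import json
from typing import Optional


def parse_rotation_event(message: dict) -> Optional[dict]:
    """Return rotation details for SECRET_ROTATE, or None for any other event."""
    attrs = message.get("attributes") or {}
    if attrs.get("eventType") != "SECRET_ROTATE":
        return None  # our own SECRET_VERSION_ADD lands here and is ignored
    raw = base64.b64decode(message.get("data") or "")
    secret = json.loads(raw) if raw else {}
    return {
        "secret": attrs["secretId"],
        "rotation": secret.get("rotation", {}),
        "topics": secret.get("topics", []),
    }
```

Returning None for everything except SECRET_ROTATE keeps the recursion guard in one place, so the rest of the rotator never sees its own writes.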

3. Building the Cloud Functions rotator

The rotator is a Pub/Sub-triggered Cloud Function (2nd gen, on Cloud Run under the hood). Its contract: receive a SECRET_ROTATE event, generate a strong credential, apply it to the backing system (the Cloud SQL instance), add it as a new secret version, and stop. It must be idempotent because Pub/Sub delivery is at-least-once.

# main.py -- 2nd-gen Cloud Function, entry point "rotate_secret"
import secrets
import string

import functions_framework
import sqlalchemy
from google.cloud import secretmanager
from google.cloud.sql.connector import Connector

SM = secretmanager.SecretManagerServiceClient()
DB_USER = "app_user"
INSTANCE = "my-proj:us-central1:app-sql"  # project:region:instance
# IAM database user for the function's service account. For PostgreSQL the
# username is the service account email without ".gserviceaccount.com".
# This role must be allowed to change DB_USER's password.
ROTATOR_DB_USER = "rotator-sa@my-proj.iam"

def _strong_password(n: int = 32) -> str:
    # URL-safe alphabet: no quotes, '%' or ':' to trip up SQL or drivers.
    alphabet = string.ascii_letters + string.digits + "-_.~"
    return "".join(secrets.choice(alphabet) for _ in range(n))

@functions_framework.cloud_event
def rotate_secret(cloud_event):
    attrs = cloud_event.data["message"].get("attributes", {})
    # Critical guard: only act on the rotation cron, never on our own writes.
    if attrs.get("eventType") != "SECRET_ROTATE":
        print(f"Ignoring eventType={attrs.get('eventType')}")
        return

    secret_resource = attrs["secretId"]          # projects/<num>/secrets/<name>
    new_password = _strong_password()

    # 1) Apply the new credential to the backing system FIRST.
    #    If this fails we never publish a version that does not work.
    connector = Connector()
    def _admin_conn():
        return connector.connect(INSTANCE, "pg8000", user=ROTATOR_DB_USER,
                                 enable_iam_auth=True, db="appdb")
    engine = sqlalchemy.create_engine("postgresql+pg8000://", creator=_admin_conn)
    try:
        with engine.connect() as conn:
            # ALTER ROLE is a utility statement and cannot take bind parameters,
            # so let the server quote the identifier (%I) and the literal (%L).
            stmt = conn.execute(sqlalchemy.text(
                "SELECT format('ALTER ROLE %I WITH PASSWORD %L', "
                "CAST(:role AS text), CAST(:pw AS text))"
            ), {"role": DB_USER, "pw": new_password}).scalar_one()
            conn.exec_driver_sql(stmt)
            conn.commit()
    finally:
        engine.dispose()
        connector.close()

    # 2) Only now record the value as a new ENABLED version.
    SM.add_secret_version(
        parent=secret_resource,
        payload=secretmanager.SecretPayload(
            data=new_password.encode("utf-8")
        ),
    )
    print(f"Rotated {secret_resource}; 'latest' now points to the new version.")

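Ordering matters. Change the live system before writing the secret version. If the ALTER USER fails, the function throws, Pub/Sub redelivers (provided the function is deployed with retries enabled), and latest was never moved to a credential the database does not accept. If the ALTER USER succeeds but the version write fails, the retry simply mints another password and tries again. The reverse ordering would publish a “valid” version that fails every login.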
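At-least-once delivery means the same SECRET_ROTATE event can arrive twice, and the function above would then rotate twice. One possible guard, sketched here, is to skip the work when the newest version was already created after the event was emitted. The helper name `should_rotate` is hypothetical, and so is the wiring: it assumes you take the event time from the message (for example its publish time) and the creation time from the newest version's metadata. Reading that metadata needs a permission that secretVersionAdder alone does not include.

```python
from datetime import datetime
from typing import Optional


def should_rotate(event_time: datetime,
                  latest_created: Optional[datetime]) -> bool:
    """Return True unless a version newer than the event already exists.

    event_time:     when the SECRET_ROTATE event was emitted (timezone-aware).
    latest_created: creation time of the newest version, or None if the
                    secret has no versions yet.
    """
    if latest_created is None:
        return True
    # A version created after the event means an earlier delivery (or an
    # operator) already rotated, so a redelivered event should be a no-op.
    return latest_created < event_time
```

The trade-off is deliberate: a version added manually between the event and the rotator run also suppresses that rotation, which is usually the behaviour you want but is worth knowing.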

Deploy it with a dedicated identity, retries enabled, and a bounded instance count:

# --retry redelivers the event when the function fails instead of dropping it.
gcloud functions deploy secret-rotator \
  --gen2 --runtime=python312 --region=us-central1 \
  --source=. --entry-point=rotate_secret \
  --trigger-topic=secret-rotation-events \
  --retry \
  --service-account="rotator-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --set-env-vars="PROJECT_ID=${PROJECT_ID}" \
  --max-instances=3 \
  --project="$PROJECT_ID"

4. Two-version strategy for zero-downtime cutover

The single-credential rotator works only when consumers re-read the secret as soon as the old password stops working, because ALTER USER invalidates the old password immediately. When the old credential must stay valid while consumers catch up, you need the two-version pattern: at any moment two credentials are accepted, and rotation alternates between them.

For databases the cleanest implementation uses two roles, app_user_a and app_user_b, behind a connection that reads latest. Each rotation rotates the password of the role that is not currently latest, then flips latest to it. The previously-live role stays valid for one full rotation period, giving every consumer time to re-read.

# Determine which role is currently "live" by reading latest, then rotate the other.
def _current_live_role(secret_resource: str) -> str:
    resp = SM.access_secret_version(name=f"{secret_resource}/versions/latest")
    return json.loads(resp.payload.data)["role"]   # payload is {"role":..,"password":..}

def rotate_two_version(secret_resource: str):
    live = _current_live_role(secret_resource)
    standby = "app_user_b" if live == "app_user_a" else "app_user_a"
    new_pw = _strong_password()
    # rotate the STANDBY role; the live role keeps working untouched
    # (_alter_user wraps the ALTER statement from section 3; helper not shown)
    _alter_user(standby, new_pw)
    SM.add_secret_version(
        parent=secret_resource,
        payload=secretmanager.SecretPayload(
            data=json.dumps({"role": standby, "password": new_pw}).encode()
        ),
    )
    # 'live' is still valid; it becomes the standby next cycle.

This guarantees a window equal to your rotation-period during which both N and N-1 work. Size the period against how often your longest-lived workload re-reads: a fleet that re-reads on every pool refresh has enormous headroom at 30 days, but workloads that cache a secret for a pod’s lifetime need the window set to max pod age plus margin.
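The overlap window is easiest to trust once you have seen it hold in a simulation. The sketch below replaces Cloud SQL and Secret Manager with in-memory stand-ins (`FakeDatabase` and a plain list of version payloads, both hypothetical) and runs the same alternate-the-standby logic.

```python
import json
from typing import List


class FakeDatabase:
    """In-memory stand-in for the two database roles."""

    def __init__(self) -> None:
        self.passwords = {}

    def alter_user(self, role: str, password: str) -> None:
        self.passwords[role] = password

    def login(self, role: str, password: str) -> bool:
        return self.passwords.get(role) == password


def rotate(db: FakeDatabase, versions: List[str], new_password: str) -> None:
    """Rotate the standby role, then append it as the newest version."""
    live = json.loads(versions[-1])["role"]
    standby = "app_user_b" if live == "app_user_a" else "app_user_a"
    db.alter_user(standby, new_password)
    versions.append(json.dumps({"role": standby, "password": new_password}))


def works(db: FakeDatabase, payload: str) -> bool:
    cred = json.loads(payload)
    return db.login(cred["role"], cred["password"])


# Bootstrap: app_user_a is live with the initial password.
db = FakeDatabase()
db.alter_user("app_user_a", "pw0")
versions = [json.dumps({"role": "app_user_a", "password": "pw0"})]

rotate(db, versions, "pw1")   # version 2 -> app_user_b
rotate(db, versions, "pw2")   # version 3 -> app_user_a again
```

After two rotations the newest version and the one before it both log in, while the bootstrap version no longer does: exactly one rotation period of overlap, no more.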

Do not destroy old versions in the rotator. Disable them on a separate, slower schedule once you have telemetry proving nothing is reading them. A destroyed version is unrecoverable; a disabled one can be re-enabled during an incident in seconds.

5. CMEK encryption and regional replication

By default Secret Manager encrypts payloads with Google-managed keys. Regulated workloads want customer-managed encryption keys (CMEK) in Cloud KMS so you control rotation, location, and revocation via the key version. CMEK and replication are coupled: automatic replication requires a key in the global Cloud KMS location, while a user-managed (per-region) replication policy binds a distinct regional key to each replica. Most compliance regimes want the latter so the key never leaves the jurisdiction.

# A regional keyring + key co-located with the secret replica.
gcloud kms keyrings create secrets-kr --location=us-central1 --project="$PROJECT_ID"
gcloud kms keys create db-secret-key \
  --location=us-central1 --keyring=secrets-kr \
  --purpose=encryption --rotation-period=90d \
  --next-rotation-time="2026-09-01T00:00:00Z" \
  --project="$PROJECT_ID"

# The Secret Manager service agent must be able to use the key.
gcloud kms keys add-iam-policy-binding db-secret-key \
  --location=us-central1 --keyring=secrets-kr \
  --member="serviceAccount:${SM_AGENT}" \
  --role="roles/cloudkms.cryptoKeyEncrypterDecrypter" \
  --project="$PROJECT_ID"

Create the secret with user-managed replication pinning the region and its CMEK key:

gcloud secrets create db-app-password-cmek \
  --replication-policy="user-managed" \
  --locations="us-central1" \
  --kms-key-name="projects/${PROJECT_ID}/locations/us-central1/keyRings/secrets-kr/cryptoKeys/db-secret-key" \
  --project="$PROJECT_ID"
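With more than one replica, a mismatched key location is an easy mistake to make in a long command. A small pre-flight check can catch it, assuming the rule described above that each replica's key lives in the replica's own region. The helper names `key_location` and `misplaced_replicas` are illustrative only.

```python
import re
from typing import Dict, List

# Shape of a Cloud KMS key resource name, as used in --kms-key-name above.
_KEY_RE = re.compile(
    r"^projects/[^/]+/locations/(?P<location>[^/]+)/keyRings/[^/]+/cryptoKeys/[^/]+$"
)


def key_location(kms_key_name: str) -> str:
    """Extract the location segment from a KMS key resource name."""
    match = _KEY_RE.match(kms_key_name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key name: {kms_key_name}")
    return match.group("location")


def misplaced_replicas(replicas: Dict[str, str]) -> List[str]:
    """Return replica regions whose key is not in the same region."""
    return sorted(
        region for region, key in replicas.items()
        if key_location(key) != region
    )
```

Feed it a mapping of replica region to key name before running the create command; an empty list means every key is co-located with its replica.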

KMS key rotation and secret rotation are independent and complementary: rotating the KMS key creates a new primary key version that protects secret versions added from then on, while existing secret versions stay encrypted under the key version that was primary when they were written; secret rotation changes the payload itself. You want both. The kill switch matters too: disabling a KMS key version makes every secret version encrypted under it unreadable once the change propagates, the fastest containment for a confirmed compromise.

6. Least-privilege IAM: three distinct identities

The biggest mistake teams make is one service account that can read, write, and manage a secret. Split it into three roles bound at the secret resource, not the project:

Identity	Role	Granted on	Why
Consumer (your app)	`roles/secretmanager.secretAccessor`	The one secret	Read payloads only; cannot list, write, or destroy
Rotator (the function)	`roles/secretmanager.secretVersionAdder`	The one secret	Add new versions; cannot read existing ones
Operator (break-glass)	`roles/secretmanager.secretVersionManager`	The one secret	Enable/disable/destroy versions during incidents

The rotator deliberately does not get secretAccessor. It generates new values and never needs old ones (the two-version variant reads only to learn which role is live, so grant secretAccessor on just that one secret if you use it). A compromise of the rotator function therefore cannot exfiltrate the current production password.

SECRET=projects/${PROJECT_ID}/secrets/db-app-password

# App reads only.
gcloud secrets add-iam-policy-binding db-app-password \
  --member="serviceAccount:app-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor" --project="$PROJECT_ID"

# Rotator writes versions, cannot read them.
gcloud secrets add-iam-policy-binding db-app-password \
  --member="serviceAccount:rotator-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretVersionAdder" --project="$PROJECT_ID"

# Break-glass operator can disable/destroy.
gcloud secrets add-iam-policy-binding db-app-password \
  --member="group:secret-operators@example.com" \
  --role="roles/secretmanager.secretVersionManager" --project="$PROJECT_ID"

Bind at the secret, never the project. A project-level secretAccessor grant reads every secret in the project, which is almost never what you intend.
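The split is easy to erode over time, so it is worth checking mechanically. The sketch below inspects a policy in the usual IAM JSON shape (a `bindings` list of `role` and `members`), such as the output of a get-iam-policy call rendered as JSON. The function names are hypothetical, and the check assumes the single-credential rotator, which should never hold secretAccessor.

```python
from typing import List, Set

ACCESSOR = "roles/secretmanager.secretAccessor"
ADDER = "roles/secretmanager.secretVersionAdder"


def roles_of(policy: dict, member: str) -> Set[str]:
    """Collect every role the policy grants to one member."""
    return {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }


def rotator_findings(policy: dict, rotator: str) -> List[str]:
    """Return human-readable problems with the rotator's grants."""
    roles = roles_of(policy, rotator)
    findings = []
    if ADDER not in roles:
        findings.append("rotator is missing secretVersionAdder")
    if ACCESSOR in roles:
        findings.append("rotator can read payloads via secretAccessor")
    return findings
```

Run the same check against the project-level policy as well: a rotator or app identity showing up there with secretAccessor is the over-broad grant warned about above.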

7. Consuming secrets from GKE, Cloud Run, and Compute

Pin consumers to latest so rotation propagates without a redeploy, but know each platform’s caching so you know your real propagation window.

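Cloud Run exposes a secret as an env var or a mounted file. An env var is resolved once, at instance start, so a running instance keeps the old value until it restarts. A file mount that references latest fetches the value from Secret Manager when the file is read, so new versions arrive without deploying a new revision. Prefer the volume mount for rotating secrets.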

gcloud run deploy app \
  --image="$IMG" --region=us-central1 \
  --update-secrets="/secrets/db-pw=db-app-password:latest" \
  --service-account="app-sa@${PROJECT_ID}.iam.gserviceaccount.com"

GKE should use the Secrets Store CSI Driver with the GCP provider plus Workload Identity, rather than a synced Kubernetes Secret, wherever you can. With auto-rotation enabled, the CSI driver re-polls and updates the mounted file in place on its rotation interval.

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: db-pw
spec:
  provider: gcp
  parameters:
    secrets: |
      - resourceName: "projects/PROJECT_ID/secrets/db-app-password/versions/latest"
        path: "db-pw"

The app must reload from the mounted path on connection failure, not cache for the pod’s lifetime. That single retry-and-reload loop turns “the file changed” into “zero downtime”.
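The retry-and-reload loop is only a few lines. This sketch uses hypothetical names (`ReloadingCredential`, `AuthError`) and takes the connect step as a callable, so it is independent of any particular database driver; in a real service `AuthError` would be whatever your driver raises for a rejected password.

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


class AuthError(Exception):
    """Stand-in for the driver's 'invalid password' error."""


class ReloadingCredential:
    """Cache the mounted secret, but re-read it once when auth fails."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._cached = self._path.read_text()

    def run(self, connect: Callable[[str], T]) -> T:
        try:
            return connect(self._cached)
        except AuthError:
            # The mounted file may have been updated in place since startup.
            self._cached = self._path.read_text()
            # A second failure propagates: the fresh value is also rejected.
            return connect(self._cached)
```

Retrying exactly once keeps a genuinely bad credential from turning into a hot loop against the database.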

Compute Engine has no native mount; fetch at boot via the metadata-authenticated API and refresh on a timer or on auth failure.

TOKEN=$(curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
  | python3 -c 'import sys,json;print(json.load(sys.stdin)["access_token"])')
curl -s -H "Authorization: Bearer ${TOKEN}" \
  "https://secretmanager.googleapis.com/v1/projects/${PROJECT_ID}/secrets/db-app-password/versions/latest:access" \
  | python3 -c 'import sys,json,base64;print(base64.b64decode(json.load(sys.stdin)["payload"]["data"]).decode())'
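For the "refresh on a timer or on auth failure" part, a small time-to-live cache around whatever fetch you use is usually enough. The sketch below is illustrative: `CachedSecret` is a hypothetical class, and `fetch` is any zero-argument callable that returns the current payload, for instance a wrapper around the request above.

```python
import time
from typing import Callable, Optional


class CachedSecret:
    """Hold a secret in memory and re-fetch it when it is older than ttl_s."""

    def __init__(self, fetch: Callable[[], str], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl_s:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value

    def invalidate(self) -> None:
        """Force the next get() to re-fetch, e.g. after an auth failure."""
        self._value = None
```

The TTL is your propagation window on Compute Engine, so size it against the overlap the two-version pattern gives you.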

8. Auditing, disabling leaked versions, and alerting

Secret Manager writes Admin Activity audit logs unconditionally (create/update/destroy), but Data Access logs for AccessSecretVersion are off by default and must be enabled explicitly, per service, or you have no record of who read what.

# Enable DATA_READ for Secret Manager in the audit config.
gcloud projects get-iam-policy "$PROJECT_ID" --format=yaml > /tmp/policy.yaml
# Append under auditConfigs, then set-iam-policy:
#   - service: secretmanager.googleapis.com
#     auditLogConfigs:
#       - logType: DATA_READ
gcloud projects set-iam-policy "$PROJECT_ID" /tmp/policy.yaml

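When a version leaks, disable it before you destroy it so you keep a re-enable path during the incident. Disabling stops further reads from Secret Manager; the leaked value itself keeps working against the database until you rotate, so trigger a rotation as well. Containment on the Secret Manager side is one command: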

gcloud secrets versions disable 7 --secret=db-app-password --project="$PROJECT_ID"

Build the alert in Cloud Logging. This query surfaces every access from outside your expected workload identities, the tripwire for a stolen credential being used:

resource.type="audited_resource"
logName="projects/PROJECT_ID/logs/cloudaudit.googleapis.com%2Fdata_access"
protoPayload.serviceName="secretmanager.googleapis.com"
protoPayload.methodName="google.cloud.secretmanager.v1.SecretManagerService.AccessSecretVersion"
protoPayload.resourceName=~"secrets/db-app-password/"
protoPayload.authenticationInfo.principalEmail!="app-sa@PROJECT_ID.iam.gserviceaccount.com"

Wire that into a log-based metric and a Cloud Monitoring alert policy so an unexpected accessor pages on-call instead of sitting in a log nobody reads.
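The same filter is handy offline, for example when triaging an exported batch of audit entries during an incident. The sketch below mirrors the query's conditions over entries already parsed into dictionaries; the function name `unexpected_accessors` is hypothetical, and the field names are the ones used in the query above.

```python
from typing import Iterable, List, Set

ACCESS_METHOD = (
    "google.cloud.secretmanager.v1.SecretManagerService.AccessSecretVersion"
)


def unexpected_accessors(entries: Iterable[dict], secret_fragment: str,
                         allowed: Set[str]) -> List[str]:
    """Return principals that read the secret but are not in `allowed`."""
    hits = set()
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if payload.get("methodName") != ACCESS_METHOD:
            continue
        if secret_fragment not in payload.get("resourceName", ""):
            continue
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "")
        if principal not in allowed:
            hits.add(principal)
    return sorted(hits)
```

Keep the allow-list in one place and share it with the alert definition, so the offline check and the page agree on who counts as expected.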

Enterprise scenario

A fintech platform team ran 40+ Cloud SQL Postgres instances behind GKE services, each with a single application role whose password was set once at provisioning and pinned to latest. An auditor flagged it: PCI required 90-day rotation, and “we rotate manually during the maintenance window” did not survive scrutiny because three of the 40 had not been touched in over a year.

The constraint that made the simple rotator dangerous was their consumption pattern. The Java services cached the password at connection-pool initialization and held it for the pod’s lifetime, and pods ran for weeks. A single-credential ALTER USER rotation would have invalidated every pool the instant latest moved, taking the service down until pods recycled, the kind of self-inflicted outage that gets rotation projects cancelled.

They solved it with the two-version role pattern from section 4 plus an explicit reload on failure. Each instance got app_user_a and app_user_b; the rotator rotated whichever role was not live and flipped latest to it, leaving the previously-live role valid for the full 90-day window. The pool was wrapped so a Postgres 28P01 (invalid password) error triggered a single re-read and pool rebuild rather than a hard failure.

// HikariCP: on auth failure, re-read latest from the CSI-mounted file and rebuild.
catch (SQLException e) {
    if ("28P01".equals(e.getSQLState())) {
        Credential c = readCredentialFromMount("/secrets/db-pw"); // {role,password}
        dataSource.setUsername(c.role());
        dataSource.setPassword(c.password());
        dataSource.getHikariPoolMXBean().softEvictConnections();   // drain gracefully
    } else { throw e; }
}

The result: every instance rotated automatically every 90 days, the old credential stayed valid long enough that the 28P01 path was the safety net rather than the mechanism, and the audit finding closed. The rotator’s secretVersionAdder-only identity meant the function touching all 40 production databases could not read a single stored password.

Verify

Confirm the whole pipeline before trusting it in production:

# 1) The schedule is attached and the next time is set.
gcloud secrets describe db-app-password \
  --format="yaml(rotation, topics)" --project="$PROJECT_ID"

# 2) Force a rotation now by setting next-rotation-time ~6 min out, then watch.
#    GNU date shown (Linux, Cloud Shell); on macOS/BSD use: date -u -v+6M +%Y-%m-%dT%H:%M:%SZ
gcloud secrets update db-app-password \
  --next-rotation-time="$(date -u -d '+6 minutes' +%Y-%m-%dT%H:%M:%SZ)" \
  --project="$PROJECT_ID"

# 3) After it fires, confirm a NEW version exists and latest advanced.
gcloud secrets versions list db-app-password --project="$PROJECT_ID"

# 4) Confirm the rotator actually ran (and only on SECRET_ROTATE).
gcloud functions logs read secret-rotator --gen2 --region=us-central1 --limit=20

# 5) Prove the new password works against the database, old version still enabled.
gcloud secrets versions access latest --secret=db-app-password | <connect-and-test>

# 6) CMEK: confirm the secret is bound to your key, not Google-managed.
gcloud secrets describe db-app-password-cmek \
  --format="yaml(replication)" --project="$PROJECT_ID"

If step 3 shows no new version, the rotator either filtered out the event incorrectly or lacks secretVersionAdder; check the function logs from step 4 first.

Written by Vinod
