
SOC 2 Continuous Compliance Automation on GCP with Drata

A Series-C digital health company that processes claims and care-coordination data for regional payers gets the email every scaling startup eventually gets: their three largest prospects, each a hospital network, will not sign until they produce a SOC 2 Type II report covering a 12-month observation window. The deals are worth more than the company’s entire current ARR, and the auditor wants to start the window in 90 days. The head of security does the math and it is grim: 64 trust-services controls, evidence to be pulled from a GCP estate of nine projects, two GKE clusters, BigQuery warehouses full of PHI-adjacent data, and a workforce of 140 — and the way they have always done it is a frantic, screenshot-driven scramble in the two weeks before the auditor arrives. That approach does not survive a Type II audit, which does not ask “is this control configured today” but “was this control operating effectively, continuously, for twelve months.” This article is the reference architecture for the only thing that actually answers that question: a continuous compliance system on Google Cloud that collects control evidence automatically, every day, and streams it into Drata so the report is a byproduct of how you operate rather than a fire drill you survive.

The pressures here are specific to a company growing through its first serious audit. Revenue is gated on the report — no SOC 2, no enterprise contracts, full stop. Continuity is the trap of Type II: a control that was green in January and silently drifted red in March produces an audit exception, and exceptions are what prospects’ security teams flag. Headcount is the constraint — a 140-person company cannot dedicate three engineers to manually screenshotting IAM policies forever. And trust is the actual product being sold: a healthcare buyer is handing you their patients’ data, and the report is the artifact that says you can be trusted with it. Continuous compliance — machine-collected, timestamped, immutable evidence mapped to each trust-services criterion — satisfies all four at once. The control state lives in your cloud’s own telemetry; Drata supplies the mapping, the monitoring, and the auditor-facing evidence room.

Why not the obvious shortcuts

Three shortcuts will be proposed in the first planning meeting, and each fails a Type II audit in a way worth naming.

The pre-audit screenshot sprint, where people are assigned to capture console screenshots the month before fieldwork, produces evidence for a point in time, not the period, and a screenshot proves nothing about the eleven months the auditor cannot see. It also rots: the engineer who took it leaves, the console UI changes, and next year you start from zero. A spreadsheet of controls maintained by hand drifts out of sync with reality the day after someone updates it, and an auditor who finds that the spreadsheet says “GKE RBAC is restricted” while the cluster actually allows broad access has found both an exception and a credibility problem. Buying a compliance tool and stopping there is the most expensive mistake: Drata is not magic, and a GRC platform with nothing feeding it real, current evidence is a very polished to-do list. The tool is the system of record; the architecture is what makes the evidence true.

Continuous compliance threads the needle. Every control is backed by an automated evidence collector that reads the live state from GCP’s own security and audit telemetry on a schedule, compares it to the policy the control requires, and pushes a timestamped pass/fail with the underlying artifact into Drata. When a control drifts, the platform detects it by the next daily collector run, and sooner when SCC raises a finding, rather than at audit time. It then raises a ticket so the drift is remediated inside the window, which is exactly what keeps an exception off the final report.

Architecture overview

[Figure: architecture diagram, SOC 2 Continuous Compliance Automation on GCP with Drata]

The platform runs two distinct flows that share telemetry but live on different schedules: a continuous evidence flow that proves controls are operating, and an event-driven remediation flow that closes the gap the moment a control drifts. Keeping them separate is the first step to operating this without alert fatigue.

The defining property of the topology is that evidence is derived from immutable, first-party platform signals, never from a human re-stating what they believe is true. Three GCP sources do the heavy lifting: Security Command Center (SCC) Premium for posture and misconfiguration findings, Cloud Logging for the audit trail of who did what, and Organization Policy + Policy Controller for the policy-as-code guardrails that make many controls preventive rather than merely detective. Drata sits above them as the mapping and monitoring layer that ties each raw signal to the SOC 2 Trust Services Criteria it supports and presents the result to the auditor.

Evidence flow, following the control path:

  1. The GCP organization is the root of trust. Below it, a folder hierarchy separates prod, non-prod, and shared projects, and the whole org is enrolled in Security Command Center Premium, which continuously scans every project for misconfigurations — public buckets, over-broad IAM, unencrypted resources, disabled audit logs — and emits structured findings mapped to standards including SOC 2.
  2. Cloud Audit Logs (Admin Activity, Data Access, and System Event) capture administrative and data-plane actions across the org. Data Access logs are off by default for most services, so they must be enabled explicitly for the services in scope. An aggregated organization-level log sink exports these logs continuously to a dedicated, bucket-locked Cloud Storage bucket and to a BigQuery dataset, so the audit trail is immutable and queryable: the raw material for any “show me access over the period” question.
  3. Policy-as-code guardrails run at two layers: Organization Policy constraints deny whole classes of misconfiguration at the API (no public IPs, no external service-account keys, region pinning for data residency), and Policy Controller (managed OPA Gatekeeper) enforces admission policy inside GKE so a non-compliant workload is rejected before it ever runs. These make controls preventive, which is the strongest evidence you can give an auditor.
  4. An evidence-collection service, a small set of Cloud Run jobs triggered by Cloud Scheduler, runs daily, with one job per control family. Each job queries the relevant source (SCC findings API, the BigQuery audit dataset, IAM policy bindings, the Policy Controller constraint status), evaluates it against each control’s expected state, and produces a normalized evidence record per control: control ID, SOC 2 criterion, pass/fail, timestamp, and the underlying artifact.
  5. Those records flow into Drata through its API and native GCP integration. Drata maps each to one or more Trust Services Criteria (Security/Common Criteria, Availability, Confidentiality, Processing Integrity, Privacy), maintains the continuous monitoring view, and becomes the evidence room the auditor logs into — every control, its current status, and the dated history that proves it held for the whole period.
  6. Personnel and policy controls (background checks, security-awareness training, signed acceptable-use and access-review attestations) are collected by Drata’s HR and identity integrations. Security-awareness training is delivered through Moodle, the company’s LMS, and its completion records satisfy the “personnel receive security training” criterion automatically.
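
The normalized evidence record from step 4 can be sketched as a small data structure plus an evaluation function. This is a minimal illustration, not the article's collector: the field names, the `evaluate_control` helper, the control ID `STG-01`, and the mapping to criterion `CC6.1` are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRecord:
    control_id: str    # hypothetical internal control identifier
    criterion: str     # SOC 2 criterion the control maps to (illustrative)
    passed: bool
    collected_at: str  # ISO 8601 UTC timestamp
    collector: str     # provenance: which job produced the record
    source: str        # provenance: which GCP signal was read
    artifact: dict     # the observed state the verdict was derived from


def evaluate_control(control_id, criterion, collector, source,
                     observed, expected, now=None):
    """Pass only if every expected setting matches the observed state."""
    now = now or datetime.now(timezone.utc)
    mismatches = {
        key: {"expected": want, "observed": observed.get(key)}
        for key, want in expected.items()
        if observed.get(key) != want
    }
    return EvidenceRecord(
        control_id=control_id,
        criterion=criterion,
        passed=not mismatches,
        collected_at=now.isoformat(),
        collector=collector,
        source=source,
        artifact={"observed": observed, "mismatches": mismatches},
    )


if __name__ == "__main__":
    record = evaluate_control(
        "STG-01", "CC6.1", "storage-collector", "scc-findings",
        observed={"public_access_prevention": "inherited"},
        expected={"public_access_prevention": "enforced"},
    )
    print(json.dumps(asdict(record), indent=2))
```

Keeping the observed state and the mismatches inside the record means the pass/fail verdict can be re-derived later from the artifact alone.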

Remediation flow, event-driven and independent: a new high-severity SCC finding, or a control flipping to fail in Drata, publishes to a Pub/Sub topic. A subscriber automatically opens a ServiceNow incident with the control, the criterion at risk, and the offending resource, routed to the owning team with an SLA tied to keeping the fix inside the audit window. Critical, deterministic fixes (a bucket that went public, an audit-log sink that was disabled) can trigger auto-remediation Cloud Functions, while everything else is human-reviewed through the ticket. The loop is what converts “we found drift” into “we fixed it before it became an exception.”
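
The routing decision in the remediation flow can be sketched as a pure function that a Pub/Sub subscriber might call. The category names, the `DriftEvent` shape, and the incident payload fields below are hypothetical, and the sketch assumes every drift event is ticketed even when it is also auto-remediated, so the fix itself leaves a record.

```python
from dataclasses import dataclass

# Hypothetical category names for fixes considered deterministic and safe.
AUTO_REMEDIABLE = {"PUBLIC_BUCKET", "AUDIT_SINK_DISABLED"}


@dataclass
class DriftEvent:
    control_id: str
    criterion: str
    resource: str
    category: str
    severity: str  # "LOW", "MEDIUM", "HIGH" or "CRITICAL"


def route(event):
    """Decide what the subscriber does with one drift event."""
    urgent = event.severity in {"HIGH", "CRITICAL"}
    return {
        # Assumption: always ticket, so even an automatic fix is tracked.
        "open_incident": True,
        "auto_remediate": event.category in AUTO_REMEDIABLE,
        "incident": {
            "short_description": (
                f"[{event.criterion}] {event.control_id} "
                f"failed on {event.resource}"
            ),
            "urgency": "1" if urgent else "2",
        },
    }


if __name__ == "__main__":
    event = DriftEvent("STG-01", "CC6.1",
                       "//storage.googleapis.com/demo-bucket",
                       "PUBLIC_BUCKET", "HIGH")
    print(route(event))
```

Keeping the decision separate from the side effects (opening the ticket, invoking the remediation function) makes the allow-list of automatic fixes easy to review and test.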

Component breakdown

| Component | Service / tool | Role in the platform | Key configuration choices |
| --- | --- | --- | --- |
| Posture findings | Security Command Center Premium | Continuous misconfiguration + threat findings, SOC 2 mapped | Org-level activation; SOC 2 posture; findings export to Pub/Sub |
| Audit trail | Cloud Audit Logs + sink | Immutable record of every admin/data action | Aggregated org sink; bucket-lock retention; BigQuery for query |
| Preventive guardrails | Organization Policy + Policy Controller | Deny misconfig at the API and at GKE admission | constraints/* org policies; Gatekeeper constraint templates |
| Evidence collectors | Cloud Run + Cloud Scheduler | Daily per-control evaluation → normalized evidence | One job per control family; least-privilege service accounts |
| GRC system of record | Drata | TSC mapping, continuous monitoring, auditor evidence room | API + GCP integration; control owners; alerting on drift |
| Workforce identity | Okta (or Microsoft Entra ID) | SSO, MFA, lifecycle, access-review source | SCIM provisioning; MFA enforced; logs feed access-review evidence |
| Secrets | HashiCorp Vault | Dynamic DB creds, API tokens, signing keys | Short-lived leases; audit device on; no static secrets in code |
| CSPM (independent) | Wiz / Wiz Code | Second-opinion posture + attack paths + IaC scanning | Agentless scan; Wiz Code gates Terraform pre-merge |
| Runtime security | CrowdStrike Falcon | Endpoint + GKE node runtime threat detection | Sensor on nodes + laptops; detections feed SOC and evidence |
| Observability / SLO | Datadog (or Dynatrace) | Availability evidence: uptime, SLOs, monitoring coverage | Synthetic checks; SLO monitors; alert history as availability proof |
| ITSM / workflow | ServiceNow | Incident, change, and access-review ticketing | Auto-incident on drift; change gate; review campaigns |
| CI / IaC | GitHub Actions / Jenkins + Argo CD | Pipeline build/test/policy-gate; GitOps deploy | OIDC to GCP; Terraform plan + Wiz Code gate; Argo for GKE |
| IaC / config | Terraform + Ansible | Declarative infra + golden-image/config baselines | Terraform for cloud; Ansible for VM/appliance hardening |
| Edge | Akamai | TLS, WAF, DDoS at the perimeter of the patient portal | WAF rules; bot mitigation; logs as a monitored control surface |
| LMS | Moodle | Security-awareness training delivery + completion records | Annual + onboarding courses; completion API feeds Drata |

A few choices deserve the why, because they are the ones teams get wrong.

Why preventive guardrails beat detective ones. A detective control finds the public bucket after it exists and someone has to close the gap before the auditor notices; a preventive control means the bucket could never be made public in the first place. Organization Policy constraints like constraints/storage.publicAccessPrevention and constraints/iam.disableServiceAccountKeyCreation are enforced by GCP at the API, so the misconfiguration is impossible org-wide, and that impossibility is the cleanest evidence you can hand an auditor. Policy Controller does the same at GKE admission. Detective controls via SCC are the safety net for everything you cannot prevent — you want both, but lead with prevention.

Why the audit-log sink must be bucket-locked. The single most damaging audit finding is tampered or deleted evidence. An aggregated organization sink exports every project’s audit logs to one place; bucket-lock applies a retention policy that even a project owner cannot shorten or bypass, so the trail is immutable for the full observation period. Skip this and a clever insider — or a ransomware actor — can erase exactly the records the auditor needs, and you cannot prove they didn’t.

Why an independent CSPM alongside SCC. SCC is Google’s view of Google’s cloud, and Drata trusts it. Wiz provides a second, vendor-independent posture assessment with attack-path analysis that correlates a misconfiguration with a real exploitation route, and Wiz Code scans Terraform before merge so a misconfiguration never reaches the cloud to be found later. Two independent posture sources is not redundancy for its own sake — it is what lets you tell an auditor your control coverage does not depend on a single tool’s blind spots.

Implementation guidance

Provision with Terraform, and treat the org structure and log sink as the first deliverables. The order matters: you cannot prove a control held for the period if logging started after the period did.

  1. The organization with a folder hierarchy (prod / non-prod / shared) and projects placed beneath it, so policy and logging inherit by structure rather than per-project toil.
  2. The aggregated org-level audit-log sink to a bucket-locked Storage bucket and a BigQuery dataset — first, so the trail is complete from day one of the window.
  3. Security Command Center Premium activated at the org, with the SOC 2 posture enabled and findings exported to Pub/Sub.
  4. Organization Policy constraints applied at the org node, then Policy Controller installed on each GKE cluster with the constraint templates the controls require.
  5. The evidence-collection Cloud Run jobs and their Cloud Scheduler triggers, each with a least-privilege, read-only service account scoped to exactly the API it queries.

A minimal Terraform shape for the immutable sink communicates the intent — org-wide, retained, tamper-resistant:

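variable "org_id" {
  type = string                                 # numeric organization ID
}

resource "google_logging_organization_sink" "audit_all" {
  name             = "org-audit-to-locked-bucket"
  org_id           = var.org_id
  destination      = "storage.googleapis.com/${google_storage_bucket.audit.name}"
  include_children = true                       # every project under the org
  filter           = "logName:\"cloudaudit.googleapis.com\""
}

# The sink writes as its own service identity; without this grant no logs reach the bucket.
resource "google_storage_bucket_iam_member" "sink_writer" {
  bucket = google_storage_bucket.audit.name
  role   = "roles/storage.objectCreator"
  member = google_logging_organization_sink.audit_all.writer_identity
}

resource "google_storage_bucket" "audit" {
  name                        = "hc-soc2-audit-logs-locked"   # bucket names are globally unique
  location                    = "US"
  uniform_bucket_level_access = true
  retention_policy {
    retention_period = 34128000                 # 395 days (~13 months), covers the window
    is_locked        = true                      # irreversible: cannot be shortened or removed
  }
}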
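
The retention figure in the bucket definition can be checked with a few lines of arithmetic. This is only a sanity check under one assumption that is not in the original: the 12-month window is taken as 366 days, the leap-year worst case.

```python
SECONDS_PER_DAY = 86_400

retention_period = 34_128_000  # seconds, from the bucket's retention_policy
window_days = 366              # assumption: 12-month window, leap-year worst case

retention_days = retention_period / SECONDS_PER_DAY
margin_days = retention_days - window_days

print(f"retention: {retention_days:.0f} days")
print(f"margin past the window for first-day objects: {margin_days:.0f} days")
```

Retention in Cloud Storage is counted per object from the time it is written, so the margin shown is how long the very first day's logs remain under the lock after the window closes; later objects are held correspondingly longer.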

The pipeline that applies this runs in GitHub Actions (or Jenkins where a team already lives there), authenticating to GCP via OIDC / Workload Identity Federation so there is no long-lived service-account key to leak — which is itself a SOC 2 control. The same pipeline runs Wiz Code against the Terraform plan as a required gate, and Argo CD handles GitOps delivery of the in-cluster policy and workloads so the deployed state is always the reviewed, version-controlled state — a continuous “changes are authorized and tested” evidence stream by construction.

Identity: one front door, fully logged. Workforce access flows through Okta as the IdP (or Microsoft Entra ID in a Microsoft-centric shop), with SCIM provisioning so a leaver is deprovisioned everywhere the moment HR offboards them, and MFA enforced on every app. Okta’s system log is itself evidence — it answers “MFA is enforced” and “access was reviewed and revoked” without a screenshot — and Drata reads it directly. Human access to GCP is granted through Okta-federated groups mapped to least-privilege IAM roles, never standing project-owner grants. The few secrets that are not workload identities — third-party API tokens, database credentials — live in HashiCorp Vault, issued as short-lived dynamic leases with the audit device enabled, so secret access is itself a logged, time-bounded, auditable event rather than a static string in a config file.
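
The “leaver is deprovisioned” claim is the kind of statement a collector can test rather than assert. The sketch below assumes two inputs that a team would have to export itself: HR offboarding times and identity-provider deactivation times, both keyed by username. The function name, the input shapes, and the one-hour grace period are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def late_deprovisions(offboarded, deactivated, now,
                      grace=timedelta(hours=1)):
    """Return users whose account outlived offboarding by more than grace.

    offboarded:  {user: datetime HR recorded the departure}
    deactivated: {user: datetime the account was disabled}; a missing
                 user means the account is still active.
    """
    late = []
    for user, left_at in offboarded.items():
        closed_at = deactivated.get(user)
        # A still-active account is measured against the current time.
        effective = closed_at if closed_at is not None else now
        if effective - left_at > grace:
            late.append(user)
    return sorted(late)


if __name__ == "__main__":
    utc = timezone.utc
    offboarded = {
        "alice": datetime(2025, 3, 10, 9, 0, tzinfo=utc),
        "bob": datetime(2025, 3, 11, 9, 0, tzinfo=utc),
        "carol": datetime(2025, 3, 12, 9, 0, tzinfo=utc),
    }
    deactivated = {
        "alice": datetime(2025, 3, 10, 9, 5, tzinfo=utc),
        "bob": datetime(2025, 3, 12, 11, 0, tzinfo=utc),
    }
    now = datetime(2025, 3, 12, 12, 0, tzinfo=utc)
    print(late_deprovisions(offboarded, deactivated, now))
```

An empty result is the passing evidence; a non-empty one names exactly which accounts need attention.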

Map collectors to control families, not to individual controls. Write one evidence collector per family — encryption, IAM/least-privilege, logging, network, vulnerability management, change management, availability — and let each emit evidence for the several SOC 2 criteria that family touches. A collector that reads IAM bindings and flags any external or primitive-role grant feeds multiple Common Criteria controls at once, which keeps the number of jobs manageable as the control set grows.
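
A family-level IAM collector along these lines might look like the sketch below. It takes an IAM policy in the standard `bindings` / `role` / `members` JSON shape and an allow-list of workforce domains. The function names, the `example.com` domain, and the choice of `CC6.1` and `CC6.3` as the criteria fed are illustrative assumptions, not a mapping taken from Drata.

```python
PRIMITIVE_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}
CRITERIA = ("CC6.1", "CC6.3")  # illustrative mapping for the IAM family


def flag_bindings(policy, allowed_domains):
    """List every primitive-role, public, or external-domain grant."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        for member in binding.get("members", []):
            kind, _, identity = member.partition(":")
            if role in PRIMITIVE_ROLES:
                findings.append({"role": role, "member": member,
                                 "reason": "primitive_role"})
            if member in PUBLIC_MEMBERS:
                findings.append({"role": role, "member": member,
                                 "reason": "public_member"})
            elif (kind in {"user", "group"}
                  and identity.rsplit("@", 1)[-1] not in allowed_domains):
                findings.append({"role": role, "member": member,
                                 "reason": "external_member"})
    return findings


def family_evidence(policy, allowed_domains):
    """One collector run emits a verdict for each criterion it feeds."""
    findings = flag_bindings(policy, allowed_domains)
    return [{"criterion": c, "passed": not findings, "findings": findings}
            for c in CRITERIA]


if __name__ == "__main__":
    policy = {"bindings": [
        {"role": "roles/owner", "members": ["user:alice@example.com"]},
        {"role": "roles/storage.objectViewer",
         "members": ["user:bob@gmail.com"]},
    ]}
    for item in family_evidence(policy, {"example.com"}):
        print(item["criterion"], "pass" if item["passed"] else "fail")
```

The single read of the policy is shared across every criterion in the family, which is the point of grouping collectors this way.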

Enterprise considerations

Security & Zero Trust. The architecture is evidence-generating by construction: least-privilege IAM, preventive org policy, immutable audit logs, no standing admin. Layer on top: (a) Wiz running continuous, independent CSPM and attack-path analysis across the org, so a misconfiguration that SCC’s ruleset misses is still caught and a real exploitation path is prioritized over a theoretical one; (b) CrowdStrike Falcon sensors on GKE nodes, VMs, and workforce laptops for runtime threat detection, feeding the SOC and supplying the “endpoints are protected and monitored” evidence directly; (c) Akamai at the edge of the patient portal for TLS, WAF, and DDoS protection, whose logs are a monitored control surface; (d) any drift or detection auto-raises a ServiceNow incident so the response is a tracked ticket with an SLA, not a buried log line; (e) virtual appliances (a network firewall and a secrets/HSM appliance from the GCP Marketplace) hardened with Ansible golden configs and patched on a tracked cadence, because an unpatched appliance is exactly the kind of vulnerability-management exception auditors look for.

Cost optimization. Continuous compliance is cheap relative to a lost enterprise deal, but the line items still warrant engineering.

| Lever | Mechanism | Typical effect |
| --- | --- | --- |
| Right-size SCC | SCC Premium at org, not redundant per-project scanners | One posture bill, full coverage |
| Tier log retention | Hot in BigQuery 90 days; cold in locked bucket for the window | Cuts query/storage spend without losing the trail |
| Serverless collectors | Cloud Run jobs that run daily and scale to zero | Pay per evaluation, not for idle infra |
| Partition the audit dataset | Date-partition BigQuery; query only the window in scope | Slashes scan cost on access-review queries |
| Consolidate GRC | Drata as the single evidence room across frameworks | Reuse SOC 2 evidence for ISO 27001/HIPAA later |

The real cost story is reuse: the evidence collected for SOC 2 maps largely onto ISO 27001 and HIPAA controls in Drata, so the second framework is mostly mapping work, not a new collection effort — which is how a health company adds HIPAA attestation without doubling the team.

Scalability. Each part scales independently. SCC and Organization Policy apply org-wide automatically as new projects are created beneath the folders, so onboarding the tenth project adds zero compliance work: the controls inherit. The Cloud Run collectors scale to zero between runs and fan out per control family, so collection cost tracks the number of control families evaluated, not idle infrastructure. Drata scales by mapping, and adding a second framework reuses most existing evidence. The natural ceiling is people: control ownership must scale with the org, which is why every control in Drata has a named owner and a review cadence rather than being “security’s problem.”

Failure modes, and what each one looks like. Name them before they cost you an exception.

Reliability & evidence integrity. For a compliance system, “reliability” means the evidence is durable and trustworthy, and the availability of the product is itself a SOC 2 criterion. On the evidence side: the locked bucket and BigQuery dataset are the durable record, retained past the window, and the collectors are stateless and idempotent so a failed run simply re-runs. On the Availability criterion: Datadog (or Dynatrace) supplies the proof — synthetic uptime checks, defined SLOs with error budgets, and a documented alert-and-response history that shows monitoring was in place and incidents were handled, which is precisely what an availability auditor asks for. A pragmatic posture: treat any evidence gap longer than 24 hours as an incident, because an unexplained gap in the trail is itself an audit finding.
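
The 24-hour rule can be checked mechanically against the timestamps of a control's evidence records. The sketch below is one possible shape: the function name and inputs are hypothetical, and it treats the start and end of the window as boundaries so that missing evidence at either edge also counts as a gap.

```python
from datetime import datetime, timedelta, timezone


def evidence_gaps(timestamps, window_start, window_end,
                  max_gap=timedelta(hours=24)):
    """Return (start, end) pairs where no evidence exists for > max_gap."""
    inside = sorted(t for t in timestamps
                    if window_start <= t <= window_end)
    points = [window_start] + inside + [window_end]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]


if __name__ == "__main__":
    utc = timezone.utc
    # Daily runs at 06:00 on March 1, 2, 4 and 5: the March 3 run is missing.
    runs = [datetime(2025, 3, day, 6, 0, tzinfo=utc) for day in (1, 2, 4, 5)]
    gaps = evidence_gaps(runs,
                         datetime(2025, 3, 1, tzinfo=utc),
                         datetime(2025, 3, 6, tzinfo=utc))
    for start, end in gaps:
        print(f"gap: {start.isoformat()} -> {end.isoformat()}")
```

Because the comparison is strict, runs exactly 24 hours apart pass, but a daily job that starts a few minutes late will be flagged; a team that schedules once a day may want a slightly larger `max_gap` to absorb that jitter.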

Observability. Instrument the compliance platform like production, because to an auditor it is production. Emit the metrics that matter: control pass rate (the headline number), mean time to remediate a failed control (does drift get fixed inside the window), collector freshness (is every evidence stream current), and open compliance incidents by criterion. Pipe these to Datadog dashboards the security lead watches daily, with alerting so a control flip pages someone rather than waiting for the weekly review. The auditor-facing view lives in Drata; the operator-facing view lives in Datadog — and the two should never disagree.
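
Three of the metrics named above reduce to short calculations over data the platform already holds. The sketch below uses hypothetical function names and input shapes, and it leaves out how the resulting values would be shipped to a dashboard.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean
from typing import Optional


def control_pass_rate(latest_status):
    """Share of controls whose most recent evidence record passed."""
    if not latest_status:
        return 0.0
    return sum(latest_status.values()) / len(latest_status)


def mean_time_to_remediate(incidents) -> Optional[timedelta]:
    """Average open-to-fix time over resolved (opened, fixed) pairs."""
    seconds = [(fixed - opened).total_seconds()
               for opened, fixed in incidents if fixed is not None]
    return timedelta(seconds=mean(seconds)) if seconds else None


def stale_collectors(last_run, now, max_age=timedelta(hours=24)):
    """Names of collectors whose last successful run is older than max_age."""
    return sorted(name for name, ran_at in last_run.items()
                  if now - ran_at > max_age)


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2025, 3, 12, 12, 0, tzinfo=utc)
    status = {"IAM-01": True, "LOG-01": True, "NET-01": False, "ENC-01": True}
    incidents = [
        (now - timedelta(hours=10), now - timedelta(hours=8)),
        (now - timedelta(hours=6), now - timedelta(hours=2)),
        (now - timedelta(hours=1), None),  # still open, excluded from the mean
    ]
    last_run = {"iam": now - timedelta(hours=3),
                "logging": now - timedelta(hours=30)}
    print(control_pass_rate(status))
    print(mean_time_to_remediate(incidents))
    print(stale_collectors(last_run, now))
```

Open incidents are excluded from the remediation average here, so the count of open incidents should be tracked alongside it rather than inferred from it.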

Governance. Every control has a named owner and a review cadence in Drata, and ServiceNow runs the quarterly access-review campaigns that produce the “access is reviewed periodically” evidence automatically. Policy-as-code lives in version control, reviewed and revertable, so a change to a guardrail is itself an auditable change-management event. Pin tool and policy versions explicitly so behavior does not drift between audits. And log the provenance of every evidence record (which collector, which source, which timestamp) so when an auditor asks “how do you know this was true on March 12,” the answer is a query, not a memory.

Explicit tradeoffs

Accept these or do not build it. Continuous compliance is real engineering, not a purchase: you are building and maintaining evidence collectors that break when an API changes, and a stale collector that silently fails is worse than a manual process because it manufactures false confidence, which is why freshness monitoring is non-negotiable. Preventive guardrails via Organization Policy and Policy Controller will, by design, sometimes block legitimate work (a developer’s deploy gets rejected by an admission policy), and you must staff the exception path or the platform becomes the thing teams route around. The immutable, bucket-locked log trail costs storage you cannot reclaim early and cannot delete even when you want to. And the whole system front-loads effort: the screenshot sprint is cheaper this quarter and ruinous over a Type II window, while continuous compliance is the reverse, costing more now and far less every audit after.

The alternatives, and when they win. If you are pre-revenue and pursuing a Type I report (a point-in-time snapshot, not a period), the lightweight path of Drata’s out-of-the-box GCP integration with minimal custom collectors is genuinely enough, and you should not over-build. If you live in multi-cloud, the same pattern holds but the sources change (AWS Security Hub or Microsoft Defender for Cloud in place of SCC), and Drata or Vanta sits above all of them as the unifying evidence room; Vanta is the close substitute here and wins where your team already runs it. And if compliance is genuinely not your bottleneck, as with a tiny internal tool with no enterprise buyers, then none of this is justified, and a hand-maintained control list is the honest answer. This architecture earns its complexity precisely when a Type II report on a regulated-data platform stands between you and the contracts that fund the company.

The shape of the win

For the health company’s first enterprise audit, the payoff is not “a passed report.” It is that fieldwork week is boring: the auditor is given a login to Drata, sees 64 controls each backed by dated, machine-collected evidence that held continuously for twelve months, samples a handful, queries the immutable audit trail to confirm, and finds zero exceptions — because every drift that happened during the year was caught by SCC or a freshness alert, ticketed in ServiceNow, and remediated inside the window. That clean Type II report is what unlocks the three hospital-network contracts, and it is reusable: the same evidence, re-mapped in Drata, carries most of the way to HIPAA and ISO 27001. Everything upstream — the org policy guardrails, the bucket-locked sink, the Cloud Run collectors, the Okta-fed access reviews, the Vault-issued short-lived secrets, the Wiz and CrowdStrike posture, the Moodle training records — exists so that the report is a byproduct of operating well, not a thing you brace for once a year. Start narrower if you are only chasing Type I, but for a regulated platform selling into healthcare at scale, this is where continuous compliance has to land.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
