
CrowdStrike Falcon Runtime Protection for EKS and Fargate Workloads

A national health-insurance payer is moving its claims-adjudication platform off a fleet of long-lived EC2 servers and onto Amazon EKS, with the bursty nightly batch — eligibility recalculation for twelve million members — landing on Fargate so nobody has to babysit nodes at 2 a.m. Three weeks before the cutover, the CISO’s office returns the architecture with a single blocking comment: “You have removed every host you can install an EDR agent on, and you are now running code that touches PHI on infrastructure you cannot see inside. Show me runtime protection or we do not migrate.” That comment is correct, and it is the reason this article exists. Containers are not a security model — a stolen base image, a compromised dependency, a process that suddenly spawns a reverse shell inside a running pod are exactly as dangerous in Kubernetes as they were on a VM, and on Fargate you do not even own the node to put a sensor on it. This is the reference architecture for runtime protection on EKS and Fargate using CrowdStrike Falcon, built so a regulated payer’s security team signs the migration instead of vetoing it.

The pressures are the ones every regulated platform team knows. Compliance: this is HIPAA-regulated PHI, and the HITRUST assessment has an explicit control for malware and runtime threat detection on every workload that processes member data; “we use containers” satisfies none of it. Blast radius: a single compromised pod in a shared cluster can pivot to the Kubernetes API, to mounted secrets, to the pod next door, and the security team needs to see and stop that movement, not read about it in a post-incident report. Velocity: the platform ships forty times a day through a CI pipeline, and security cannot be a Friday-afternoon scan that blocks the release train. Cost and noise: runtime tooling that pages the SOC every time a legitimate sidecar starts is tooling that gets muted, and a muted control is no control at all. Falcon’s cloud-workload story threads these together: an agent on the node where you can run one, a per-pod sidecar agent where you cannot, the same detections everywhere, and one console feeding the existing SOC.

Why the obvious approaches fall short

Three shortcuts get proposed on every one of these projects, and each fails in a way worth naming so the team stops relitigating it.

“Scan the images and call it secure.” Image scanning is necessary and it is in this design — but it is a point-in-time check of what is on disk. It says nothing about what a process does at runtime: a clean image can still load a malicious dependency at startup, fetch a second-stage payload, or be exploited through a zero-day in code the scanner declared clean yesterday. Scanning without runtime detection secures the warehouse and leaves the building unguarded.

“We have a VPC, security groups, and Network Policies.” Network controls limit where a pod can talk, not what runs inside it. A reverse shell that beacons out over 443 looks like ordinary HTTPS to a security group. East-west segmentation is real defense in depth and you should have it, but it does not see process execution, file integrity, or credential theft happening inside the container.

“Fargate is AWS-managed, so AWS secures it.” AWS secures the Fargate substrate — the kernel, the isolation boundary, the host. AWS does not watch your application’s runtime behavior; under the shared-responsibility model, what your code does inside the container is yours to defend. And because you have no node, you cannot install a DaemonSet sensor — which is precisely the gap that sinks naive “just deploy the agent” plans and forces the sidecar pattern this architecture uses.

Falcon answers all three: Falcon Cloud Security (CWPP) scans images in the pipeline and the registry, the Falcon sensor provides kernel-level runtime detection on EKS nodes, the Falcon sidecar brings that same runtime protection to Fargate pods where no node exists, and an admission controller refuses to run anything that did not pass the gate. One platform, one telemetry stream, one SOC workflow.

Architecture overview

[Figure: CrowdStrike Falcon Runtime Protection for EKS and Fargate Workloads, architecture diagram]

The design has three planes that are easy to conflate and important to keep separate: a build-time plane that scans and signs images before they exist in production, a deploy-time plane that admission-gates what is allowed to run, and a runtime plane that watches behavior inside live workloads and streams detections to the SOC. A vulnerability caught in the first plane never becomes a runtime alert in the third; a workload that slips past the second is still watched by the third. Defense in depth means each plane assumes the one before it can fail.

The defining constraint that shapes the whole topology is the split enforcement model: EKS managed node groups get the Falcon sensor as a DaemonSet; Fargate pods get the Falcon sidecar injected at admission time — because on Fargate there is no host to run a DaemonSet on. Same Falcon console, same detection logic, two deployment mechanics. Getting that split right is the heart of the design.
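To make the split concrete, here is a minimal Python sketch (not CrowdStrike code) of how a pod ends up on one mechanic or the other. It assumes EKS Fargate-profile semantics as we understand them: a pod lands on Fargate when a profile selector names its namespace and every selector label is present on the pod with the same value. Profile wildcards are ignored, and the namespaces and labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FargateSelector:
    """One selector from an EKS Fargate profile (hypothetical values below)."""
    namespace: str
    labels: dict[str, str] = field(default_factory=dict)


def runs_on_fargate(namespace: str, pod_labels: dict[str, str],
                    selectors: list[FargateSelector]) -> bool:
    """True if any selector matches: same namespace and all selector labels present."""
    for sel in selectors:
        if sel.namespace != namespace:
            continue
        if all(pod_labels.get(key) == value for key, value in sel.labels.items()):
            return True
    return False


def protection_mechanism(namespace: str, pod_labels: dict[str, str],
                         selectors: list[FargateSelector]) -> str:
    """Fargate pods need the injected sidecar; node-group pods are covered by the DaemonSet."""
    if runs_on_fargate(namespace, pod_labels, selectors):
        return "sidecar"
    return "daemonset"


# Hypothetical profiles: the whole batch namespace, plus labeled jobs in a shared one.
profiles = [FargateSelector("eligibility-batch"),
            FargateSelector("claims-jobs", {"tier": "batch"})]
print(protection_mechanism("eligibility-batch", {}, profiles))        # sidecar
print(protection_mechanism("claims-jobs", {"tier": "api"}, profiles))  # daemonset
```

The answer comes from the Fargate profiles, not from the injector’s own selector, so the two have to agree. A namespace that a profile sends to Fargate but the injector ignores is exactly the blind spot the split is meant to close.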

Build-time plane, following the flow:

  1. A developer merges to main. GitHub Actions builds the container image. The pipeline authenticates to AWS via OIDC federation — no long-lived access keys stored in the repo, a line the security team drew after a prior incident — and pulls any third-party tokens (the registry robot account, the Falcon API client secret) from HashiCorp Vault with short-lived dynamic leases rather than baked-in secrets.
  2. The pipeline runs the Falcon image scan (fcs, the Falcon Cloud Security CLI) against the freshly built image, surfacing CVEs, embedded secrets, and misconfigurations, with a policy gate: a Critical vulnerability fails the build. Upstream of that, Wiz Code has already scanned the same repository and IaC at pull-request time for misconfigurations, exposed secrets, and known-vulnerable dependencies, before the image is even built. That shifts posture left into the pull request, so the developer sees it in code review, not in a runtime alert weeks later.
  3. A passing image is pushed to Amazon ECR, where Falcon Cloud Security continues registry scanning so a CVE disclosed after the build (the Log4Shell pattern) still surfaces against images already sitting in the registry. The image digest is recorded in the manifest in the GitOps repo; Argo CD picks up the updated manifest and syncs it to the cluster.
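Continuous registry rescanning earns its place because a disclosure can arrive after the image is built. Here is a minimal Python sketch of the idea. It assumes a hypothetical index from image digest to the package versions in that image’s SBOM, and plain dotted numeric versions with no pre-release tags. When a CVE is published with a fixed release, every stored digest still below that release is flagged without rebuilding anything. The digests and versions are illustrative, not scanner output.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted numeric version such as '2.14.1'."""
    return tuple(int(part) for part in version.split("."))


def older_than(version: str, fixed_in: str) -> bool:
    """Compare versions after zero-padding, so '2.15' equals '2.15.0'."""
    a, b = parse_version(version), parse_version(fixed_in)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def affected_digests(sbom_index: dict[str, dict[str, str]],
                     package: str, fixed_in: str) -> list[str]:
    """Digests whose SBOM lists `package` at a version older than `fixed_in`."""
    return sorted(
        digest
        for digest, packages in sbom_index.items()
        if package in packages and older_than(packages[package], fixed_in)
    )


# Illustrative index: digest -> {package: version}.
index = {
    "sha256:aaa": {"log4j-core": "2.14.1"},
    "sha256:bbb": {"log4j-core": "2.17.1"},
    "sha256:ccc": {"openssl": "3.0.7"},
}
print(affected_digests(index, "log4j-core", "2.15.0"))  # ['sha256:aaa']
```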

Deploy-time plane: every pod creation in EKS passes through the Falcon Kubernetes admission controller, which enforces two things. First, as a validating webhook (a ValidatingWebhookConfiguration), it requires a passing Falcon scan verdict for every image: unscanned or critically vulnerable images are rejected outright. Second, for Fargate pods, a mutating webhook (a MutatingWebhookConfiguration, since a validating webhook cannot modify the object) rewrites the pod spec to inject the Falcon sidecar container, so runtime protection rides along with the workload. This is the choke point that makes “no unscanned image in production” an enforced invariant rather than a hopeful policy.
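The two jobs map onto two different Kubernetes webhook types, and the wire format is plain Kubernetes. The Python sketch below builds admission.k8s.io/v1 AdmissionReview responses for each. The validating handler denies a container whose image digest has no passing verdict. The mutating handler returns a base64-encoded JSONPatch that appends a sidecar. The verdict store, sidecar name, image, and digests are hypothetical. This illustrates the webhook contract; it is not CrowdStrike’s implementation.

```python
import base64
import json

# Hypothetical verdict store: image digest -> result recorded by the pipeline scan.
SCAN_VERDICTS = {"sha256:1111": "pass", "sha256:2222": "critical", "sha256:3333": "pass"}

# Hypothetical sidecar; the real injector defines its own container spec.
SIDECAR = {"name": "falcon-sidecar", "image": "example.invalid/falcon-container@sha256:3333"}


def _review(uid: str, allowed: bool, **extra) -> dict:
    """Wrap a response in the admission.k8s.io/v1 AdmissionReview envelope."""
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": {"uid": uid, "allowed": allowed, **extra}}


def validate(review: dict) -> dict:
    """Validating webhook: deny if any container image lacks a passing verdict."""
    req = review["request"]
    spec = req["object"]["spec"]
    for container in spec.get("initContainers", []) + spec["containers"]:
        digest = container["image"].partition("@")[2]  # '' if not digest-pinned
        if SCAN_VERDICTS.get(digest) != "pass":
            return _review(req["uid"], False, status={
                "code": 403, "message": f"{container['image']}: no passing scan verdict"})
    return _review(req["uid"], True)


def mutate(review: dict) -> dict:
    """Mutating webhook: append the sidecar with a base64-encoded JSONPatch."""
    req = review["request"]
    patch = [{"op": "add", "path": "/spec/containers/-", "value": SIDECAR}]
    encoded = base64.b64encode(json.dumps(patch).encode()).decode()
    return _review(req["uid"], True, patchType="JSONPatch", patch=encoded)
```

The sketch exposes one ordering detail. Kubernetes runs mutating webhooks before validating ones, so the validator also sees the injected sidecar. That is why the hypothetical sidecar image is itself digest-pinned with a passing verdict. Without that, the gate would reject every Fargate pod it had just mutated.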

Runtime plane: on EKS nodes the Falcon sensor DaemonSet runs with kernel visibility into every container on the node — process executions, file writes, network connections, container escapes, credential-access attempts — applying CrowdStrike’s behavioral IOAs (indicators of attack) and ML to flag and, in prevention mode, kill malicious activity. On Fargate, the injected sidecar provides the same behavioral telemetry scoped to its pod. All detections flow to the Falcon cloud console, and from there stream via the Falcon SIEM connector into Datadog Cloud SIEM (the payer’s existing SIEM and observability backbone), where they correlate with EKS audit logs, VPC flow logs, and application traces. A high-severity detection auto-raises a ServiceNow security incident so the SOC works a ticket with an owner and an SLA, not a console notification that scrolls away.
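The routing rule at the end of that flow is simple enough to state as code. Here is a minimal Python sketch. It assumes detections arrive with one of Falcon’s severity names (Informational through Critical) and uses hypothetical action names. Every detection goes to the SIEM for correlation, and only High and above opens a ServiceNow incident.

```python
SEVERITY_ORDER = ["informational", "low", "medium", "high", "critical"]


def route(detection: dict, ticket_threshold: str = "high") -> list[str]:
    """Return the (hypothetical) actions for one detection."""
    actions = ["forward_to_siem"]  # every detection is correlated in Datadog
    rank = SEVERITY_ORDER.index(detection["severity"].lower())
    if rank >= SEVERITY_ORDER.index(ticket_threshold):
        actions.append("open_servicenow_incident")  # owned, SLA-bound ticket
    return actions


print(route({"severity": "High"}))    # ['forward_to_siem', 'open_servicenow_incident']
print(route({"severity": "Medium"}))  # ['forward_to_siem']
```

The threshold is a parameter because it is the knob the SOC turns during tuning, and that knob belongs under the same change control as detection policy.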

Component breakdown

| Component | Service / tool | Role in the platform | Key configuration choices |
| --- | --- | --- | --- |
| Source & IaC scan | Wiz / Wiz Code | Shift-left scan of repo, dependencies, IaC for vulns, secrets, misconfig | PR-time checks; attack-path analysis; blocks merge on critical IaC drift |
| CI / build | GitHub Actions | Build, test, scan-gate, push; OIDC to AWS (no stored keys) | OIDC role assumption; Falcon scan as required job; SBOM artifact |
| Image scanning | CrowdStrike Falcon Cloud Security | CVE/secret/misconfig scan of images in CI and ECR | fcs CLI gate on Critical; continuous registry rescan |
| Registry | Amazon ECR | Stores signed, scanned images; immutable tags | Image-immutability on; scan-on-push; lifecycle policy |
| GitOps deploy | Argo CD | Declarative sync of manifests to EKS from Git | Auto-sync with self-heal; image-digest pinning |
| Admission control | Falcon Kubernetes admission controller | Reject unscanned/vulnerable images; inject Fargate sidecar | ValidatingWebhook + mutating injection; fail-closed policy |
| Runtime (EKS nodes) | Falcon sensor (DaemonSet) | Kernel-level runtime detection & prevention on node groups | Prevention mode; node-pool tolerations; eBPF/kernel-module sensor |
| Runtime (Fargate) | Falcon sidecar (injected) | Per-pod runtime protection where no node exists | Auto-injected at admission; shared process namespace |
| Cloud posture (CSPM) | Wiz | Continuous cloud posture, exposure, attack paths across AWS | Agentless AWS scan; alert on public-exposure or IAM drift |
| Identity / SSO | Okta + Microsoft Entra ID | Workforce SSO to the AWS console, Falcon, Argo CD, ServiceNow | OIDC/SAML federation; SCIM provisioning; conditional access |
| Secrets | HashiCorp Vault | Falcon API client secret, registry creds, signing keys | Dynamic leases; Kubernetes auth method; agent injector |
| SIEM / detections | Datadog Cloud SIEM | Correlate Falcon detections with cluster & cloud logs | Falcon SIEM connector; detection rules; SLO on triage time |
| Observability | Datadog | Cluster/app metrics, traces, sensor health, sidecar overhead | EKS integration; sidecar resource dashboards |
| ITSM / incidents | ServiceNow | Security incident records, change approvals, audit trail | Auto-ticket on high-severity detection; change gate on policy edits |
| Edge | Akamai | TLS, WAF, bot mitigation in front of the public claims API | WAF rules; origin shield to the ALB/Ingress |

A few of these choices carry the weight of the design and deserve the why.

Why a sidecar on Fargate instead of “just the DaemonSet everywhere.” This is the single most misunderstood point. A DaemonSet schedules one pod per node, and Fargate gives you no node you control — each Fargate pod runs on its own AWS-managed micro-VM with no place to land a DaemonSet. So Falcon’s Fargate model injects a sidecar container into the pod itself at admission time; the sidecar shares the pod’s process namespace and watches the workload container’s behavior from inside the same isolation boundary. The cost is real and you must plan for it: an extra container per Fargate pod means extra vCPU/memory on every task (which, on Fargate, is money), and pods now start a little slower. The benefit is that the same runtime detections you get on nodes follow workloads onto serverless compute — no blind spot where the batch jobs run.

Why admission control is fail-closed, and why that is a deliberate risk. The webhook is configured so that an image without a passing Falcon verdict is rejected, and a Fargate pod that cannot have a sidecar injected does not start. Fail-closed means a webhook outage can block deployments, which is a genuine operational risk. It is mitigated with a highly available webhook: multiple replicas, a generous timeoutSeconds (Kubernetes caps it at 30 seconds), and a tight namespaceSelector that exempts kube-system and the Falcon namespace itself to avoid a deadlock. The team accepts “a webhook outage pauses deploys” over “a webhook outage silently ships unscanned PHI workloads.” That tradeoff is a decision the CISO signs, not a default to drift into.
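Those mitigations are checkable. Here is a minimal Python sketch that lints one webhook entry from an admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration or MutatingWebhookConfiguration against them. It relies on the v1 defaults (failurePolicy Fail, timeoutSeconds 10, allowed range 1 to 30) and on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace. The Falcon namespace name is an assumption; match it to your install.

```python
REQUIRED_EXEMPT = {"kube-system", "falcon-system"}  # Falcon namespace name is assumed


def lint_webhook(webhook: dict) -> list[str]:
    """Return problems with one webhook entry; an empty list means it passes."""
    problems = []
    if webhook.get("failurePolicy", "Fail") != "Fail":  # v1 default is Fail
        problems.append("failurePolicy is not Fail, so the gate fails open")
    timeout = webhook.get("timeoutSeconds", 10)  # v1 default is 10
    if not 1 <= timeout <= 30:
        problems.append(f"timeoutSeconds={timeout} is outside the allowed 1-30 range")
    exempt = set()
    for expr in webhook.get("namespaceSelector", {}).get("matchExpressions", []):
        if expr.get("key") == "kubernetes.io/metadata.name" and expr.get("operator") == "NotIn":
            exempt.update(expr.get("values", []))
    missing = REQUIRED_EXEMPT - exempt
    if missing:
        problems.append(f"not exempted (deadlock risk): {sorted(missing)}")
    return problems
```

A useful way to run it is as a CI check on the rendered webhook configuration, so a well-meant edit that flips failurePolicy to Ignore is caught in review rather than discovered in an audit.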

Why both Falcon and Wiz, when they overlap. They look redundant and are not. Wiz is agentless cloud posture and shift-left: it scans the AWS account, the IaC, and the repo for misconfigurations, exposed secrets, over-broad IAM, and attack paths — it tells you the cluster’s control-plane and configuration posture and catches the vulnerable dependency in the pull request. Falcon is the runtime and the in-pipeline image gate: it watches what processes actually do once running and blocks bad images at the door. Wiz answers “is this configured to be exploitable”; Falcon answers “is something being exploited right now.” Mature programs run both, and the payer’s security architecture standard mandates exactly this layering.

Implementation guidance

Provision with Terraform; the cluster and its security planes are one deliverable. Treat the Falcon footprint as part of the cluster definition, not a follow-on. A sane order:

  1. The EKS cluster (private API endpoint, control-plane logging on) with a managed node group for sensor-bearing workloads and one or more Fargate profiles for the batch namespaces, provisioned via Terraform.
  2. The Falcon Operator installed (Helm/Terraform), which manages the sensor DaemonSet, the admission controller, and the sidecar injector as first-class cluster resources.
  3. HashiCorp Vault wired with the Kubernetes auth method so the Falcon Operator and pipeline retrieve the Falcon API client credentials dynamically — never a static secret in a Secret object.
  4. ECR with image immutability and scan-on-push, and the registry registered with Falcon Cloud Security for continuous rescanning.
  5. Argo CD pointed at the GitOps repo, with image digests (not floating tags) so what is deployed is exactly what was scanned.
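Step 5 is easy to enforce mechanically before Argo CD ever syncs. Here is a minimal Python sketch, meant to run as a hypothetical pre-merge check on the GitOps repo. It walks a parsed manifest, collects every image reference, and flags any that is not pinned as @sha256: followed by 64 hex characters. YAML parsing is left out to keep the sketch dependency-free.

```python
import re

# A digest-pinned reference ends in @sha256:<64 lowercase hex characters>.
DIGEST_PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def iter_images(node):
    """Yield every string value stored under an 'image' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "image" and isinstance(value, str):
                yield value
            else:
                yield from iter_images(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_images(item)


def unpinned_images(manifest: dict) -> list[str]:
    """Image references in the manifest that use a tag instead of a digest."""
    return [image for image in iter_images(manifest) if not DIGEST_PINNED.search(image)]


digest = "sha256:" + "a" * 64
deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "api", "image": f"example.invalid/claims@{digest}"},
    {"name": "log", "image": "example.invalid/fluent-bit:latest"},
]}}}}
print(unpinned_images(deployment))  # ['example.invalid/fluent-bit:latest']
```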

The Falcon admission/sidecar behavior is declared through the operator’s custom resources. A minimal shape communicates the intent — protect the Fargate-backed namespaces, fail closed on unscanned images:

apiVersion: falcon.crowdstrike.com/v1alpha1
kind: FalconAdmission
metadata:
  name: falcon-admission
spec:
  falcon_api:
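    client_id: "{{ vault:secret/falcon#client_id }}"          # leased from Vault, not stored; illustrative placeholder, not operator syntax
    client_secret: "{{ vault:secret/falcon#client_secret }}"  # likewise leased from Vault, never committed to Git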
    cloud_region: us-1
  admissionConfig:
    failurePolicy: Fail                                  # fail-closed: no scan, no run
    namespaceSelector:
      matchExpressions:
        - key: kubernetes.io/metadata.name
          operator: NotIn
          values: ["kube-system", "falcon-system"]       # avoid a self-deadlock

And the sidecar injection scoped to the serverless workloads:

apiVersion: falcon.crowdstrike.com/v1alpha1
kind: FalconContainer            # the Fargate sidecar injector
metadata:
  name: falcon-sidecar
spec:
  falcon_api:
    cloud_region: us-1
  injection:
    namespaceSelector:
      matchLabels:
        compute: fargate          # only batch/serverless namespaces get the sidecar

The CI gate is a few lines in the GitHub Actions workflow, run after the image build and before the push, so a Critical finding stops the train:

- name: Falcon image scan (fail on Critical)
  run: |
    # ECR_REPO and GIT_SHA are assumed to be set by earlier steps (e.g. GIT_SHA from github.sha)
    fcs scan image "$ECR_REPO:$GIT_SHA" \
      --report-formats sarif --fail-on critical
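The SARIF report that step writes is useful beyond the pass/fail gate, for example to summarize findings on the pull request. Here is a minimal Python sketch that counts Critical results in a SARIF 2.1.0 document. It assumes the scanner puts a numeric security-severity property on each rule, the convention GitHub code scanning uses, and treats 9.0 and above as Critical, as in CVSS. Whether fcs emits that property is an assumption to verify against your CLI version. The --fail-on flag remains the actual gate.

```python
import json


def critical_count(sarif: dict, threshold: float = 9.0) -> int:
    """Count results whose rule carries security-severity >= threshold.

    Only results that reference their rule by ruleId are handled in this sketch.
    """
    count = 0
    for run in sarif.get("runs", []):
        rules = run.get("tool", {}).get("driver", {}).get("rules", [])
        # Map rule id -> numeric severity; rules without the property score 0.
        severity = {rule["id"]: float(rule.get("properties", {}).get("security-severity", 0))
                    for rule in rules}
        for result in run.get("results", []):
            if severity.get(result.get("ruleId"), 0.0) >= threshold:
                count += 1
    return count


def critical_count_from_file(path: str) -> int:
    """Load a SARIF file written by the scan step and count Critical results."""
    with open(path, encoding="utf-8") as fh:
        return critical_count(json.load(fh))
```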

Identity: federate the humans, lease the machines. Console access to AWS, the Falcon platform, Argo CD, and ServiceNow all federate through Okta as the workforce IdP, brokered to Microsoft Entra ID where corporate apps already live, with SCIM provisioning and conditional access so a deprovisioned engineer loses Falcon and cluster access at once. There are no human IAM users. Machine identities are short-lived: GitHub Actions assumes an AWS role via OIDC, and the Falcon API client secret plus registry credentials are leased from HashiCorp Vault through the Kubernetes auth method and injected by the Vault Agent — nothing sensitive is ever written into a Kubernetes Secret or a pipeline variable.
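The machine half of that is mostly an IAM trust policy. Here is a minimal Python sketch that builds the policy letting only one repository’s main-branch workflows assume a deploy role through GitHub’s OIDC provider, using the token.actions.githubusercontent.com aud and sub claims. The account ID, organization, and repository are placeholders, and the OIDC provider must already be registered in IAM.

```python
import json

PROVIDER = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, org: str, repo: str,
                             branch: str = "main") -> dict:
    """Trust policy allowing only `org/repo` workflows on `branch` to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{PROVIDER}"},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{PROVIDER}:aud": "sts.amazonaws.com",
                f"{PROVIDER}:sub": f"repo:{org}/{repo}:ref:refs/heads/{branch}",
            }},
        }],
    }


# Placeholder account, org, and repo.
print(json.dumps(github_oidc_trust_policy("111122223333", "example-org", "claims-platform"), indent=2))
```

Pinning the sub claim to a branch is what makes “no long-lived keys” also mean that a workflow on another branch, or one triggered by a pull request, cannot assume the role.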

Sensor mechanics that bite if you skip them. On the node groups, decide between the kernel-module and the eBPF sensor flavor; eBPF avoids the dance of matching a kernel-module version to every AMI you roll, which matters when EKS bumps the AMI under you. Give the sensor DaemonSet the right tolerations so it lands on every node, including tainted pools. On Fargate, remember that the injected sidecar consumes part of the pod’s CPU/memory. EKS on Fargate sizes each pod from its containers’ resource requests (there is no task definition to edit, unlike ECS), so budget the sidecar’s footprint into those requests or the workload will be starved. Also watch the cold-start delta on the nightly batch.
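The sizing point deserves a worked example, because Fargate rounds up. Here is a minimal Python sketch. Verify its assumptions against the current EKS on Fargate documentation: Fargate sums the long-running containers’ requests (the init-container rule is ignored here), adds 256 MiB for the Kubernetes components it runs alongside the pod, and rounds up to the smallest supported vCPU and memory combination. Only combinations up to 4 vCPU are listed, and the sidecar request is hypothetical.

```python
# Supported EKS Fargate combinations up to 4 vCPU: vCPU -> memory options in GiB.
FARGATE_SIZES = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),
    2: list(range(4, 17)),
    4: list(range(8, 31)),
}
K8S_OVERHEAD_GIB = 0.25  # 256 MiB Fargate adds for Kubernetes components


def fargate_size(containers: list[tuple[float, float]]) -> tuple[float, float]:
    """containers: (vCPU request, memory request in GiB) for each long-running container."""
    cpu = sum(c for c, _ in containers)
    mem = sum(m for _, m in containers) + K8S_OVERHEAD_GIB
    for vcpu in sorted(FARGATE_SIZES):
        if vcpu < cpu:
            continue
        for gib in FARGATE_SIZES[vcpu]:
            if gib >= mem:
                return vcpu, gib
    raise ValueError("request exceeds the combinations listed in this sketch")


app = (0.25, 0.5)       # the batch job's own request
sidecar = (0.1, 0.25)   # hypothetical sidecar request
print(fargate_size([app]))           # (0.25, 1)
print(fargate_size([app, sidecar]))  # (0.5, 1)
```

In this example, a 0.1 vCPU sidecar moves a 0.25 vCPU job to the 0.5 vCPU tier. Multiplied across thousands of nightly pods, that step is why the sidecar’s request should be set from measurement.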

Enterprise considerations

Security & defense in depth. The architecture layers controls so no single failure is fatal: (a) Wiz Code catches vulnerable dependencies, secrets, and IaC misconfiguration in the pull request, before an image exists; (b) Falcon image scanning gates the build and continuously rescans ECR so a newly disclosed CVE surfaces against images already shipped; (c) the fail-closed admission controller guarantees no unscanned or critically vulnerable image runs and that every Fargate pod gets its sidecar; (d) the Falcon sensor and sidecar detect and prevent malicious runtime behavior (container escape attempts, unexpected process trees, credential theft, crypto-mining) across both compute models; (e) Wiz runs continuous CSPM over the whole AWS account so a drifted security group, a public S3 bucket, or an over-broad IAM role is flagged independently of the cluster’s own controls; (f) every high-severity Falcon detection auto-raises a ServiceNow incident, turning a log line into an owned, SLA-bound investigation. The public claims API sits behind Akamai for TLS, WAF, and bot mitigation before traffic reaches the ingress, keeping most volumetric and injection attacks from ever reaching the cluster. Enforce prevention mode (not just detect-only) on the production node groups once the team has tuned exclusions: detect-only is a comfortable place to stay, and a control that only watches is half a control.

Cost optimization. Runtime security has a real bill, and most of it is the Fargate sidecar tax, so engineer for it.

| Lever | Mechanism | Typical effect |
| --- | --- | --- |
| Right-size the sidecar | Set the sidecar’s CPU/memory request to its measured floor, not a guess | Trims the per-pod Fargate surcharge across thousands of nightly tasks |
| EKS nodes for steady load | Keep always-on services on node groups (one sensor per node) | One sensor amortized over many pods vs. one sidecar per pod |
| Fargate for spiky batch | Reserve Fargate + sidecar for bursty/serverless jobs | Pay for protection only when the batch actually runs |
| Scope sidecar injection | Inject only into PHI/serverless namespaces via label selector | No sidecar overhead on non-sensitive or node-backed workloads |
| Tune detections early | Cut false positives so the SOC is not staffed for noise | Lowers the human cost, which dwarfs the licensing |

The node-vs-Fargate economics flip with density: a sensor on a node is amortized across every pod that lands there, while a Fargate sidecar is paid per pod. Put high-density, always-on services on node groups and keep Fargate for the genuinely bursty work — which is exactly why the payer’s batch lives there and the adjudication API does not.
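The density argument reduces to one division. Here is a minimal Python sketch in abstract cost units per hour (license plus compute overhead). The inputs are hypothetical, not CrowdStrike or AWS prices. Above the break-even density, a node sensor costs less per pod than a sidecar would.

```python
def per_pod_cost_on_nodes(sensor_cost_per_node_hour: float, pods_per_node: float) -> float:
    """A node sensor's hourly cost is shared by every pod scheduled on the node."""
    return sensor_cost_per_node_hour / pods_per_node


def break_even_density(sensor_cost_per_node_hour: float,
                       sidecar_cost_per_pod_hour: float) -> float:
    """Pods per node above which the node sensor is cheaper per pod than a sidecar."""
    return sensor_cost_per_node_hour / sidecar_cost_per_pod_hour


# Hypothetical units: a sensor costs 1.0 per node-hour, a sidecar 0.05 per pod-hour.
print(break_even_density(1.0, 0.05))   # 20.0
print(per_pod_cost_on_nodes(1.0, 40))  # 0.025
```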

Scalability. Each plane scales independently. The sensor DaemonSet scales automatically with the node group — every new node from the Cluster Autoscaler or Karpenter brings its own sensor, no action required. The Fargate sidecar scales one-to-one with pods, which is its cost characteristic and its scaling story at once. The admission webhook is the component to watch under burst: a 500-pod scale-out hammers it, so run multiple webhook replicas and keep the validation fast, or a slow webhook becomes a deploy bottleneck. Falcon’s cloud console and the Datadog SIEM ingestion scale as managed services; the practical ceiling is detection volume the SOC can triage, which is why tuning (below) is a scaling concern, not just a hygiene one.
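Sizing the webhook for a burst is a queueing question, and Little’s law gives a first estimate: requests in flight equal arrival rate times latency. Here is a minimal Python sketch with hypothetical inputs. If a Fargate pod passes through both the validating and the mutating webhook, count both calls, and use a tail latency rather than the mean to stay conservative.

```python
import math


def webhook_replicas(admissions_per_sec: float, p99_latency_sec: float,
                     concurrent_per_replica: int, min_replicas: int = 2) -> int:
    """Replicas needed so in-flight admission calls fit the per-replica concurrency."""
    in_flight = admissions_per_sec * p99_latency_sec  # Little's law: L = lambda * W
    return max(min_replicas, math.ceil(in_flight / concurrent_per_replica))


# Hypothetical burst: 500 pods in 10 s, two webhook calls each, 0.25 s p99,
# and 4 concurrent requests per replica.
print(webhook_replicas(500 / 10 * 2, 0.25, 4))  # 7
```

The min_replicas floor keeps the fail-closed gate highly available even when the arithmetic says one replica would do.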

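Failure modes, and what each one looks like. Name them before they page you: an admission-webhook outage (deploys pause, because the gate is fail-closed); an unhealthy node sensor (protection degrades while workloads keep running); a Fargate pod running without its sidecar (a coverage gap that should page immediately); a kernel-module sensor that no longer matches a freshly rolled AMI; and a Fargate pod starved because its requests did not budget for the sidecar.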

Reliability. Falcon’s runtime protection is designed to fail open for availability: if the sensor itself has a problem on a node, the workloads keep running. Security degrades, the application does not, which is the right default for a claims platform that cannot take an outage, and a deliberate contrast with the fail-closed admission gate. Sensor and sidecar health are first-class signals in Datadog, alerting the moment a node’s sensor goes unhealthy or a Fargate pod launches without its sidecar. Given fail-closed injection, a missing sidecar should be impossible, so it is worth an immediate page if seen. A likely culprit is a Fargate profile that selects a namespace or label the injection selector does not cover. For the SOC’s own continuity, detections persist in the Falcon cloud independent of cluster state, so an investigation survives a cluster recycle.
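That page can come from a small check over the pod list. Here is a minimal Python sketch that takes pod objects in the shape the Kubernetes API returns and flags running Fargate pods that are missing the sidecar. It assumes EKS marks Fargate pods with schedulerName fargate-scheduler. The sidecar container name is hypothetical; use whatever name your injector actually adds.

```python
SIDECAR_NAME = "falcon-sidecar"  # hypothetical; use the name your injector adds


def unprotected_fargate_pods(pods: list[dict]) -> list[str]:
    """Return namespace/name of running Fargate pods without the sidecar container."""
    flagged = []
    for pod in pods:
        spec = pod.get("spec", {})
        if spec.get("schedulerName") != "fargate-scheduler":
            continue  # node-group pods are covered by the DaemonSet sensor
        if pod.get("status", {}).get("phase") != "Running":
            continue
        names = {container["name"] for container in spec.get("containers", [])}
        if SIDECAR_NAME not in names:
            meta = pod.get("metadata", {})
            flagged.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return flagged
```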

Observability. Treat security telemetry as observability data, not a separate silo. Stream Falcon detections through the Falcon SIEM connector into Datadog Cloud SIEM, correlated with EKS control-plane audit logs, VPC flow logs, and application traces, so an analyst pivots from a detection to the pod, to the deployment, to the developer in one place. Emit the metrics the program is actually judged on — percentage of running pods with an active sensor/sidecar (the coverage number the auditor asks for), admission-rejection rate (unscanned images turned away), mean time to triage a high-severity detection, sidecar resource overhead, and detections by IOA category to spot tuning needs. Wire a ServiceNow change gate in front of any edit to detection policy or admission rules so security changes carry the same audit trail as code.
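Most of those metrics are simple ratios once the raw counts are exported. Here is a minimal Python sketch of three of them. The input shapes (counts, plus detection and triage timestamp pairs) are hypothetical stand-ins for whatever the Datadog and Falcon exports provide.

```python
from datetime import datetime, timedelta
from statistics import mean


def percent(part: int, whole: int) -> float:
    """Percentage helper; refuses an empty denominator instead of guessing."""
    if whole == 0:
        raise ValueError("no data for this period")
    return 100.0 * part / whole


def coverage_pct(pods_with_sensor_or_sidecar: int, running_pods: int) -> float:
    """Share of running pods with an active sensor or sidecar (the auditor's number)."""
    return percent(pods_with_sensor_or_sidecar, running_pods)


def admission_rejection_rate(rejected: int, admission_requests: int) -> float:
    """Share of admission requests turned away by the image gate."""
    return percent(rejected, admission_requests)


def mean_time_to_triage(events: list[tuple[datetime, datetime]]) -> timedelta:
    """events: (detected_at, triaged_at) pairs for high-severity detections."""
    return timedelta(seconds=mean((triaged - detected).total_seconds()
                                  for detected, triaged in events))
```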

Governance & compliance. This is the section that satisfies the HITRUST control that started the project. Pin the Falcon sensor and operator versions explicitly and promote upgrades through pre-prod — never auto-latest on a regulated cluster. Keep admission policy, Fargate profiles, and the operator config in Git, reviewed and revertable, deployed by Argo CD. Use the coverage metric as continuous evidence that every PHI-processing pod is protected, and feed it into the HITRUST assessment so “runtime malware protection on all workloads” is a dashboard, not an assertion. Retain detection and admission logs for the audit window. Wiz serves as the independent verifier that the controls are real — that the cluster’s posture has not drifted away from the signed design.

Explicit tradeoffs

Accept these or do not build it. Runtime protection adds moving parts and real cost: a DaemonSet on every node, a sidecar in every Fargate pod (with its per-pod compute bill and a measurable cold-start hit), and an admission webhook in the critical path of every deployment. Fail-closed admission means a webhook problem can pause your release train — you are trading availability of deploys for assurance that nothing unscanned reaches production, and that is a real, deliberate exchange the security and platform teams must agree to in writing, not a setting to flip blindly. Prevention mode can, in principle, kill a legitimate process that trips an untuned rule, so the path from detect-only to prevention runs through disciplined exclusion tuning, not a single brave toggle. And running Falcon and Wiz is two licenses and two consoles — justified by genuinely different coverage (runtime-and-image-gate versus posture-and-shift-left), but it is more tooling, more integration, and more to operate.

The alternatives, and when they win. If you run only on EKS nodes and never touch Fargate, you can skip the sidecar pattern entirely and run the DaemonSet alone — simpler, cheaper, no per-pod tax — and graduate to sidecars only when serverless compute appears. If your compliance bar is lighter, the open-source Falco project gives you eBPF-based runtime detection without a commercial agent, trading CrowdStrike’s managed threat intelligence, prevention, and SOC integration for cost and control — a reasonable choice for a startup, a poor one for a payer that needs vendor-backed IOAs and a console an auditor recognizes. If you are all-in on AWS-native tooling, Amazon GuardDuty’s EKS Runtime Monitoring and Inspector image scanning cover a meaningful slice for less integration effort — but you give up the unified cross-cloud, cross-workload console and the depth of CrowdStrike’s detection that a multi-cloud security org standardizes on. The architecture here is the destination for a regulated, multi-compute, multi-cloud-leaning estate; start with the DaemonSet-only subset if your footprint allows, and add the sidecar and admission planes as Fargate and the audit demand them.

The shape of the win

For the payer, the payoff is not “we installed an EDR.” It is that the nightly eligibility batch runs on Fargate — no nodes, no 2 a.m. babysitting — and every one of those serverless pods carries the same runtime protection as the always-on adjudication services, with a coverage dashboard that says 100% to the HITRUST assessor and a fail-closed gate that makes “an unscanned image touched PHI” structurally impossible. That is the sentence that turned the CISO’s blocking comment into a signature. Everything upstream — the Wiz shift-left scan, the Falcon image gate in GitHub Actions, the admission controller, the node sensor, the Fargate sidecar, the Datadog SIEM correlation, the ServiceNow incident with an owner — exists to let a regulated payer move to containers and serverless without trading away the runtime visibility it had on the VMs it left behind. Start with the node-group subset if you must; this is where runtime protection for a PHI-grade EKS-and-Fargate platform has to land.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
