Running EKS at Scale: Pod Identity, Karpenter Autoscaling, and VPC CNI Networking

eksctl create cluster gives you a control plane and some nodes. It does not give you a platform. The gap between a demo cluster and one that runs hundreds of services across thousands of pods comes down to four decisions you make early and rarely revisit cheaply: how identity flows to workloads, how the data plane allocates IPs, how nodes appear and disappear, and how you keep the whole thing current. This guide walks through each one with the commands and manifests I actually ship.

Beyond `eksctl create`: the four decisions

A production EKS platform on AWS lives or dies on these:

Decision	Legacy default	What scales
Cluster auth	`aws-auth` ConfigMap	Access entries (EKS access-management API)
Workload identity	IRSA (OIDC + per-SA role)	EKS Pod Identity (association API)
Pod networking	One secondary IP per pod, low pod density	VPC CNI prefix delegation
Node lifecycle	Managed node groups + Cluster Autoscaler	Karpenter with consolidation

None of these are exotic. They are the boring, correct defaults for a cluster you intend to operate for years. Assume EKS 1.31+ throughout.

Step 1 — Cluster provisioning with access entries

The aws-auth ConfigMap was the original way to map IAM principals to Kubernetes RBAC. It is a single YAML blob with no validation: one bad edit locks every admin out of the cluster. The access-management API replaces it with first-class AWS resources you manage via the API, CLI, or IaC.

Create the cluster with the API-based authentication mode. With eksctl:

# cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: platform-prod
  region: us-east-1
  version: "1.31"
accessConfig:
  authenticationMode: API_AND_CONFIG_MAP
  bootstrapClusterCreatorAdminPermissions: true
vpc:
  clusterEndpoints:
    publicAccess: true
    privateAccess: true
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: eks-pod-identity-agent

eksctl create cluster -f cluster.yaml

API_AND_CONFIG_MAP lets both mechanisms coexist while you migrate; flip to API once nothing reads the ConfigMap. Grant a role cluster-admin via an access entry plus an access policy association:

aws eks create-access-entry \
  --cluster-name platform-prod \
  --principal-arn arn:aws:iam::111122223333:role/platform-admins

aws eks associate-access-policy \
  --cluster-name platform-prod \
  --principal-arn arn:aws:iam::111122223333:role/platform-admins \
  --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy \
  --access-scope type=cluster

AWS-managed access policies (AmazonEKSClusterAdminPolicy, AmazonEKSAdminPolicy, AmazonEKSViewPolicy, and others) map to predictable RBAC. For namespace-scoped grants, set --access-scope type=namespace,namespaces=team-a,team-b. For anything bespoke, create an access entry of type STANDARD and bind your own RBAC by Kubernetes group.
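
As a rough sketch of the namespace-scoped case, the helper below wraps the association call for a team role. The function name grant_namespace_view and the role name team-a-developers are illustrative assumptions; the flags are the ones already shown above, and the access entry for the principal is assumed to exist already.

```shell
#!/usr/bin/env bash
# Sketch: give a principal read-only access to specific namespaces.
# grant_namespace_view is a hypothetical helper name, not part of the AWS CLI.
# Assumes `aws eks create-access-entry` has already been run for the principal.
grant_namespace_view() {
  local cluster="$1" principal_arn="$2" namespaces="$3"  # namespaces: comma-separated

  aws eks associate-access-policy \
    --cluster-name "$cluster" \
    --principal-arn "$principal_arn" \
    --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
    --access-scope "type=namespace,namespaces=${namespaces}"
}
```

A call might look like: grant_namespace_view platform-prod arn:aws:iam::111122223333:role/team-a-developers team-a,team-b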

The payoff: access is auditable in CloudTrail, expressible in Terraform (aws_eks_access_entry / aws_eks_access_policy_association), and a typo returns an API error instead of bricking RBAC.

Step 2 — Workload identity: IRSA to EKS Pod Identity

IRSA works: annotate a ServiceAccount with a role ARN, the pod gets a projected token, and the SDK exchanges it via the cluster’s OIDC provider. The operational cost shows up at scale. Every cluster needs its own IAM OIDC provider, and every role’s trust policy hardcodes that provider’s URL plus the SA sub. Replicate a workload across ten clusters and you maintain ten trust policies per role.
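
To make the per-cluster cost concrete, here is a minimal sketch that renders the IRSA trust policy a role would need for one cluster. The helper name irsa_trust_policy and the two OIDC provider IDs are placeholders invented for illustration; only the policy shape (a federated principal plus aud and sub conditions keyed on the provider URL) reflects how IRSA trust policies are normally written.

```shell
#!/usr/bin/env bash
# Sketch: render the IRSA trust policy a role needs for ONE cluster.
# irsa_trust_policy is a hypothetical helper; account ID and OIDC IDs are placeholders.
irsa_trust_policy() {
  local account_id="$1" oidc_provider="$2" namespace="$3" service_account="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${oidc_provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc_provider}:aud": "sts.amazonaws.com",
          "${oidc_provider}:sub": "system:serviceaccount:${namespace}:${service_account}"
        }
      }
    }
  ]
}
EOF
}

# Every cluster has its own provider URL, so the same role needs one of these per cluster.
for oidc in \
  "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEAAAAAAAAAAAAAAAAAAAAAAAAA" \
  "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLEBBBBBBBBBBBBBBBBBBBBBBBBB"; do
  irsa_trust_policy 111122223333 "$oidc" payments checkout-sa
done
```

The loop prints two documents that differ only in the provider URL, which is exactly the duplication that grows with every cluster you add.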

EKS Pod Identity removes the OIDC plumbing. A node-level agent (the eks-pod-identity-agent add-on) vends credentials, and a single API call associates a role with a (namespace, ServiceAccount) pair. The role’s trust policy points at the EKS service, not a cluster-specific OIDC URL.

The trust policy is identical across every cluster:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "pods.eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}

Create the association:

aws eks create-pod-identity-association \
  --cluster-name platform-prod \
  --namespace payments \
  --service-account checkout-sa \
  --role-arn arn:aws:iam::111122223333:role/checkout-app

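The ServiceAccount needs no annotation because the binding lives in EKS, not on the SA. Application code is unchanged: EKS injects AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE into the pod, and any AWS SDK version new enough to support Pod Identity picks them up through its default credential chain. Check the minimum SDK versions in the EKS documentation before migrating older services.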

A practical migration sequence:

Install the eks-pod-identity-agent add-on.
For one workload, retarget its IAM role trust policy to pods.eks.amazonaws.com and create the association.
Roll the pods, confirm AWS calls still succeed, then remove the IRSA SA annotation.
Repeat per workload; decommission the IAM OIDC provider only after the last IRSA consumer is gone.
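
Before the last step of the sequence above, it helps to know which ServiceAccounts still depend on IRSA. The following is a sketch under the assumption that IRSA consumers are identifiable by the eks.amazonaws.com/role-arn annotation; list_irsa_service_accounts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list ServiceAccounts that still carry the IRSA role annotation.
# list_irsa_service_accounts is a hypothetical helper name.
list_irsa_service_accounts() {
  # custom-columns prints "<none>" when the annotation is absent,
  # so awk keeps only the rows where a role ARN is present.
  kubectl get serviceaccounts --all-namespaces --no-headers \
    -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,ROLE:.metadata.annotations.eks\.amazonaws\.com/role-arn' \
    | awk '$3 != "<none>" { print $1 "/" $2, $3 }'
}
```

An empty result from list_irsa_service_accounts is a reasonable signal, though not proof, that the IAM OIDC provider has no in-cluster consumers left; anything outside the cluster that assumes roles through the provider will not show up here.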

Keep IRSA where you genuinely need cross-account sts:AssumeRole chains or non-EKS consumers of the same role. For in-cluster workloads, Pod Identity is the lower-maintenance default.

Step 3 — VPC CNI tuning: prefix delegation and beyond

The AWS VPC CNI gives every pod a routable VPC IP, which is great for native security groups and flow logs and brutal for IP exhaustion. By default each pod consumes one secondary IP on one of the node’s ENIs, so pod density per node is capped by the instance type’s ENI and per-ENI IP limits, and large nodes burn through a /24 fast.

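Prefix delegation assigns /28 prefixes (16 IPs each) to an ENI’s address slots instead of single IPs, multiplying pod density and cutting EC2 API calls during scale-up. It requires Nitro-based instances. Enable it on the aws-node DaemonSet; if you run vpc-cni as a managed add-on, set the same variables through the add-on’s configuration values so an add-on update does not revert them: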

kubectl set env daemonset aws-node -n kube-system \
  ENABLE_PREFIX_DELEGATION=true

# Warm capacity so pod scheduling never blocks on a slow ENI attach
kubectl set env daemonset aws-node -n kube-system \
  WARM_PREFIX_TARGET=1

Prefix delegation also changes how you size the --max-pods value on each node — derive it from the instance’s ENI and prefix limits rather than leaving the old per-IP default. AWS publishes a max-pods-calculator helper for this; bake the result into your node bootstrap.
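
The arithmetic behind that value can be sketched in a few lines. This is an approximation of the commonly documented formula, not a replacement for the AWS helper: the ENI count, per-ENI IPv4 limit, and vCPU count are inputs you look up for the instance type, the function name max_pods is made up here, and applying the 110/250 ceiling only in prefix-delegation mode is a simplification of this sketch.

```shell
#!/usr/bin/env bash
# Sketch of the max-pods arithmetic. Nothing here queries EC2; the limits are arguments.
max_pods() {
  local enis="$1" ips_per_eni="$2" vcpus="$3" prefix_delegation="$4"
  local slots=$(( ips_per_eni - 1 ))   # one address per ENI is the ENI's own primary IP
  local ips_per_slot=1
  if [ "$prefix_delegation" = "true" ]; then
    ips_per_slot=16                    # each slot holds a /28 prefix instead of a single IP
  fi
  # +2 accounts for host-network pods (aws-node, kube-proxy) that consume no pod IP
  local raw=$(( enis * slots * ips_per_slot + 2 ))

  # Assumed ceiling: 110 below 30 vCPUs, 250 otherwise
  local cap=250
  if [ "$vcpus" -lt 30 ]; then cap=110; fi

  if [ "$prefix_delegation" = "true" ] && [ "$raw" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$raw"
  fi
}

max_pods 3 10 2 false   # m5.large limits, per-IP mode        -> 29
max_pods 3 10 2 true    # m5.large limits, prefix delegation  -> 110
```

With prefix delegation the raw address count (434 for these limits) stops being the constraint and the recommended ceiling takes over, which is why the per-IP default is the wrong number to leave in place.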

Two adjacent features worth knowing:

Custom networking places pods on a different subnet (and security group) than the node’s primary ENI, via ENIConfig CRDs. Reach for it when your node subnets are small and you want pods in a separate, larger CIDR — often a secondary VPC CIDR like 100.64.0.0/16.
Security groups for pods lets you attach EC2 security groups directly to pods through a SecurityGroupPolicy, so database access rules target the pod, not the whole node. It requires ENABLE_POD_ENI=true on the CNI and is supported on a subset of (mostly Nitro) instance types.

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: payments-db-access
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: checkout
  securityGroups:
    groupIds:
      - sg-0abc123def4567890

Prefix delegation is the one almost everyone needs; custom networking and security-groups-for-pods are situational. Turn them on only when a real constraint demands it — each adds moving parts to the data plane.
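
If custom networking does turn out to be warranted, the per-zone configuration looks roughly like the sketch below. The subnet and security group IDs are placeholders, and naming the ENIConfig after the Availability Zone is an assumption that pairs with selecting it by the node's zone label.

```shell
#!/usr/bin/env bash
# Sketch: one ENIConfig per Availability Zone, named after the zone so the CNI
# can match it to nodes through their zone label. IDs below are placeholders.
cat > eniconfig-us-east-1a.yaml <<'EOF'
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0abc123def4567890
EOF

# Once reviewed, the manifest would be applied and the CNI pointed at it:
#   kubectl apply -f eniconfig-us-east-1a.yaml
#   kubectl set env daemonset aws-node -n kube-system \
#     AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
#     ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone
```

Existing nodes keep their old ENI layout, so plan to recycle them after switching this on.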

Step 4 — Node lifecycle with Karpenter

Cluster Autoscaler scales node groups you predefine: it can only add nodes of a shape you already declared, and it bin-packs poorly across many instance types. Karpenter watches for unschedulable pods and provisions right-sized nodes directly against EC2, picks instance types from a broad pool, and consolidates — replacing or removing nodes when workloads no longer justify them.

Two CRDs drive it. EC2NodeClass is the AWS-specific template (AMI, subnets, security groups, IAM role). NodePool is the scheduling policy and constraints.

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-platform-prod"
  amiSelectorTerms:
    - alias: al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "platform-prod"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "platform-prod"
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidationAfter: 1m
  limits:
    cpu: "1000"

Design notes from running this in anger:

Let the pool be wide. Listing many instance families and both spot and on-demand gives Karpenter room to bin-pack cheaply and to ride out Spot interruptions by falling back to on-demand. Constrain only what the workload actually requires (arch, GPU, local NVMe).
WhenEmptyOrUnderutilized is where the savings live. Karpenter will proactively replace a lightly-loaded node with a smaller/cheaper one. Protect pods that must not be evicted with karpenter.sh/do-not-disrupt: "true" and rely on PodDisruptionBudgets.
Spot is safe for stateless tiers. Karpenter consumes the EC2 interruption signal and cordons/drains ahead of reclamation. Keep stateful or long-running jobs on on-demand via a separate NodePool.
Use limits as a guardrail. A runaway controller creating pods can otherwise provision unbounded capacity; a CPU cap on the pool is your circuit breaker.
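
The separate pool for stateful work mentioned in the notes above could look something like this. It is a sketch, not a recommended configuration: the pool name, the workload-tier label and taint, the 5m delay, and the CPU limit are all illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: an on-demand-only NodePool for stateful or long-running workloads.
# Pool name, label/taint key, consolidation delay, and CPU limit are illustrative.
cat > nodepool-stateful.yaml <<'EOF'
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: stateful-on-demand
spec:
  template:
    metadata:
      labels:
        workload-tier: stateful
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: workload-tier
          value: stateful
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
  disruption:
    # Only reclaim nodes once they are empty; never repack running pods.
    consolidationPolicy: WhenEmpty
    consolidationAfter: 5m
  limits:
    cpu: "200"
EOF
# kubectl apply -f nodepool-stateful.yaml
```

Workloads opt in with a matching toleration and a nodeSelector on workload-tier: stateful; everything else keeps landing on the wide default pool.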

Install Karpenter via its Helm chart, ensuring the controller has its own IAM permissions (a Pod Identity association is the clean way) and that the node role is registered as an EKS access entry of type EC2_LINUX so nodes can join.
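
Registering the node role is a single call. The wrapper below is a sketch; register_karpenter_node_role is a hypothetical helper name, and the role ARN in the usage line assumes the role name used in the EC2NodeClass above.

```shell
#!/usr/bin/env bash
# Sketch: register the Karpenter node role so nodes launched with it can join.
# register_karpenter_node_role is a hypothetical helper name.
register_karpenter_node_role() {
  local cluster="$1" node_role_arn="$2"
  aws eks create-access-entry \
    --cluster-name "$cluster" \
    --principal-arn "$node_role_arn" \
    --type EC2_LINUX
}
```

For this cluster the call would be: register_karpenter_node_role platform-prod arn:aws:iam::111122223333:role/KarpenterNodeRole-platform-prod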

Step 5 — Managing core add-ons and the upgrade cadence

CoreDNS, kube-proxy, the VPC CNI, and the EBS CSI driver are EKS managed add-ons — version them through EKS rather than as loose manifests, so the control plane tracks compatibility.

List what an add-on supports for your cluster version, then update:

aws eks describe-addon-versions \
  --addon-name aws-ebs-csi-driver \
  --kubernetes-version 1.31 \
  --query 'addons[].addonVersions[].addonVersion'

aws eks update-addon \
  --cluster-name platform-prod \
  --addon-name aws-ebs-csi-driver \
  --addon-version v1.35.0-eksbuild.1 \
  --resolve-conflicts PRESERVE

--resolve-conflicts PRESERVE keeps your field-level customizations (replica counts, tolerations) instead of clobbering them with add-on defaults. Use OVERWRITE deliberately, when you want to reset to defaults.

The EBS CSI driver needs IAM permissions to manage volumes — wire it with a Pod Identity association to its controller ServiceAccount rather than node-instance-profile permissions, so the blast radius stays narrow.
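
A sketch of that wiring follows. It assumes the add-on's controller runs under the ebs-csi-controller-sa ServiceAccount in kube-system, which is the usual default but worth confirming on your cluster; the helper name and the role ARN in the usage line are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: bind an IAM role to the EBS CSI controller through Pod Identity.
# associate_ebs_csi_role is a hypothetical helper name; the ServiceAccount
# name and namespace are assumed defaults of the managed add-on.
associate_ebs_csi_role() {
  local cluster="$1" role_arn="$2"
  aws eks create-pod-identity-association \
    --cluster-name "$cluster" \
    --namespace kube-system \
    --service-account ebs-csi-controller-sa \
    --role-arn "$role_arn"
}
```

Example call: associate_ebs_csi_role platform-prod arn:aws:iam::111122223333:role/ebs-csi-controller. The role still needs the Pod Identity trust policy shown in Step 2, and the controller pods must be restarted to pick up the new credentials.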

Upgrade cadence: upstream Kubernetes ships a new minor roughly every four months and EKS follows, and each version has a standard support window after which extended support charges apply. Plan an upgrade for every minor rather than a panicked annual jump across three versions. Control-plane upgrades move one minor at a time; you cannot skip versions.

Step 6 — Ingress with the AWS Load Balancer Controller

The AWS Load Balancer Controller reconciles Kubernetes Ingress objects into ALBs and Service type: LoadBalancer into NLBs, with target-type ip registering pod IPs directly (no extra node hop). Give its controller an IAM role via Pod Identity, then drive everything with annotations:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: checkout
  namespace: payments
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:111122223333:certificate/abcd-1234
    alb.ingress.kubernetes.io/healthcheck-path: /healthz
spec:
  ingressClassName: alb
  rules:
    - host: checkout.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: checkout
                port:
                  number: 80

Use IngressGroups (alb.ingress.kubernetes.io/group.name) to merge multiple Ingress resources onto one shared ALB — otherwise every Ingress spins up its own load balancer and the bill (and ENI consumption) climbs fast.
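
As an illustration of grouping, the sketch below defines a second Ingress that joins a shared ALB. The group name payments-public, the refunds service, and its hostname are invented for the example; the checkout Ingress above would need the same group.name annotation for the two to merge.

```shell
#!/usr/bin/env bash
# Sketch: a second Ingress that shares one ALB through an IngressGroup.
# Group name, service name, and hostname are illustrative.
cat > ingress-refunds.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: refunds
  namespace: payments
  annotations:
    alb.ingress.kubernetes.io/group.name: payments-public
    alb.ingress.kubernetes.io/group.order: "20"
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: refunds.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: refunds
                port:
                  number: 80
EOF
# kubectl apply -f ingress-refunds.yaml
```

group.order controls rule precedence across the merged Ingresses, with lower numbers evaluated first. Load-balancer-level settings such as scheme and listeners should agree across every member of the group.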

Enterprise scenario

A fintech platform team ran 40+ services on a single EKS cluster and started seeing pods stuck Pending during morning traffic ramps, but only on their m6i.4xlarge nodes, never the smaller ones. The constraint wasn’t compute; CPU and memory sat at 50%. It was IP exhaustion masked by a subtle interaction: they had enabled ENABLE_PREFIX_DELEGATION=true on the VPC CNI but never recalculated --max-pods, which Karpenter was still deriving from the old per-IP ENI formula. So a node advertised capacity for ~110 pods, but the CNI could only attach enough /28 prefixes for ~58 before hitting the per-instance ENI limit. The scheduler kept binding pods to those nodes; the CNI kept failing IP allocation, leaving pods wedged.

The fix was to make Karpenter compute --max-pods consistently with prefix delegation by setting maxPods explicitly in the EC2NodeClass kubelet config, derived from AWS’s max-pods-calculator --cni-version 1.x --instance-type m6i.4xlarge --cni-prefix-delegation-enabled:

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    maxPods: 110

After applying it, Karpenter drifted the old nodes out under PDBs and the Pending storm disappeared. The lesson: prefix delegation and --max-pods are one decision, not two — and Karpenter’s advertised capacity must agree with what the CNI can physically allocate, or the scheduler will happily overcommit IPs you don’t have.
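
A quick way to catch this class of mismatch is to compare what each node advertises against the ceiling you expect. The sketch below assumes you already know that ceiling; nodes_over_pod_cap is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print nodes whose advertised pod capacity exceeds an expected ceiling.
# nodes_over_pod_cap is a hypothetical helper name; the ceiling is supplied by the caller.
nodes_over_pod_cap() {
  local cap="$1"
  kubectl get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,TYPE:.metadata.labels.node\.kubernetes\.io/instance-type,PODS:.status.allocatable.pods' \
    | awk -v cap="$cap" '$3 + 0 > cap + 0 { print $1, $2, $3 }'
}
```

Running nodes_over_pod_cap 110 after a CNI or NodeClass change lists any node still advertising more pods than intended, which usually means it was launched before the change and has not been recycled yet.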

Verify

Confirm each layer before declaring the platform ready:

# Auth: access entries resolve, no stale aws-auth dependency
aws eks list-access-entries --cluster-name platform-prod

# Pod Identity: agent running, associations present
kubectl get daemonset eks-pod-identity-agent -n kube-system
aws eks list-pod-identity-associations --cluster-name platform-prod

# VPC CNI: prefix delegation active
kubectl get daemonset aws-node -n kube-system -o yaml | grep -i ENABLE_PREFIX_DELEGATION

# Karpenter: pools healthy, nodes claimed
kubectl get nodepool,ec2nodeclass
kubectl get nodeclaim

# Add-ons: all ACTIVE on compatible versions
aws eks list-addons --cluster-name platform-prod
aws eks describe-addon --cluster-name platform-prod --addon-name vpc-cni \
  --query 'addon.{v:addonVersion,status:status}'

# Ingress: ALB provisioned and address assigned
kubectl get ingress -A

A fast end-to-end identity smoke test: schedule a debug pod under a Pod-Identity-bound ServiceAccount and call STS.

kubectl run sts-check --rm -it --restart=Never \
  --image=public.ecr.aws/aws-cli/aws-cli \
  --overrides='{"spec":{"serviceAccountName":"checkout-sa"}}' \
  -n payments -- sts get-caller-identity

The returned ARN should be the assumed role you associated — proof the credential chain works without any SA annotation.

Production checklist

Cost visibility, scaling limits, and the upgrade runbook

Cost visibility. Enable Split Cost Allocation Data for EKS in the billing console to attribute shared node cost down to pods by namespace and label — this is what turns “the cluster costs $X” into per-team chargeback. Tag NodePool-provisioned instances (via EC2NodeClass tags) so Cost Explorer can group by team. Karpenter consolidation is the single biggest lever on the compute line item; measure node utilization before and after enabling it.

Scaling limits to respect. IP space is the usual wall — even with prefix delegation, plan VPC CIDRs (and secondary CIDRs / custom networking) for peak pod count, not today’s. Watch per-node --max-pods, per-ENI prefix limits, and Karpenter’s own controller throughput when scaling thousands of nodes. Service quotas (ENIs per region, EBS volumes, ELBs) bite at the data-plane edges before the control plane does.
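
For a first-pass capacity check, the upper bound on /28 prefixes in a subnet is simple arithmetic. The sketch below is an optimistic ceiling only: it ignores the five addresses AWS reserves in every subnet, node primary IPs, and fragmentation that prevents a contiguous /28 from being carved out. Both function names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: theoretical ceiling on /28 prefixes and pod IPs for a subnet of a given
# prefix length. Real capacity is lower (reserved addresses, fragmentation).
prefixes_in_subnet() {
  local prefix_len="$1"
  if [ "$prefix_len" -gt 28 ]; then
    echo 0                               # subnet is smaller than a single /28
    return
  fi
  echo $(( 1 << (28 - prefix_len) ))     # number of /28 blocks that fit
}

pod_ip_ceiling() {
  echo $(( $(prefixes_in_subnet "$1") * 16 ))   # 16 addresses per /28
}

prefixes_in_subnet 24   # -> 16
pod_ip_ceiling 16       # -> 65536
```

Compare the ceiling against peak pod count across every node that can land in the subnet, and leave generous headroom: a heavily churned subnet runs out of contiguous /28 blocks well before it runs out of addresses.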

Cluster-upgrade runbook (one minor at a time):

Read the EKS release notes and Kubernetes deprecation guide for the target minor; scan workloads for removed APIs (kubectl deprecation warnings, or a tool like pluto).
Upgrade add-ons first to versions compatible with the target Kubernetes version.
Upgrade the control plane: aws eks update-cluster-version --name platform-prod --kubernetes-version 1.32.
Roll the data plane: for Karpenter, bump the EC2NodeClass AMI alias and let consolidation/drift recycle nodes gracefully under PDBs; for managed node groups, do a rolling update.
Re-run the Verify section end to end.
Confirm no workload is pinned to a now-removed API and that HPA/Karpenter still react to load.
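
Since the runbook depends on moving exactly one minor at a time, a small guard in the upgrade script can refuse anything else. This is a sketch: is_single_minor_step is a hypothetical helper and expects plain MAJOR.MINOR strings such as 1.31.

```shell
#!/usr/bin/env bash
# Sketch: succeed only when target is exactly one minor version above current.
# is_single_minor_step is a hypothetical helper; inputs must look like "1.31".
is_single_minor_step() {
  local current="$1" target="$2"
  local cur_major="${current%%.*}" cur_minor="${current#*.}"
  local tgt_major="${target%%.*}" tgt_minor="${target#*.}"
  [ "$cur_major" = "$tgt_major" ] && [ $(( tgt_minor - cur_minor )) -eq 1 ]
}
```

In a pipeline this would gate the control-plane step, for example: is_single_minor_step 1.31 1.32 && aws eks update-cluster-version --name platform-prod --kubernetes-version 1.32. The current version can be read with aws eks describe-cluster --name platform-prod --query cluster.version --output text.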

Pitfalls to avoid

Flipping authenticationMode to API too early. Anything still reading aws-auth (some older controllers, bootstrap scripts) loses access. Migrate, verify, then drop CONFIG_MAP.
Leaving IRSA annotations alongside a Pod Identity association. Mixed signals on the same SA cause confusing credential precedence. Pick one per workload.
Skipping the --max-pods recalculation after prefix delegation. You either under-utilize big nodes or oversubscribe IPs and stall scheduling.
Per-Ingress ALBs. Without IngressGroups, dozens of load balancers appear silently and dominate the bill.
Karpenter with no limits and no PDBs. One is a cost safety net, the other prevents consolidation from evicting pods that can’t tolerate it.

Get identity, networking, node lifecycle, and add-on hygiene right at the start, and EKS becomes a platform your teams build on without thinking about it — which is exactly the point.


Written by Vinod
