GKE Autopilot in Production: A Hardening and Cost-Control Playbook

Autopilot promises a Kubernetes cluster where Google runs the nodes and you pay for pods. That promise is real, but the abstraction leaks in specific, predictable places: you cannot SSH a node, you cannot run a privileged DaemonSet, and your bill is driven by resource requests rather than nodes you can pack. This playbook covers how to run Autopilot for production workloads — provisioning a private cluster, right-sizing without node access, the scheduling levers that still work, the security controls worth turning on, and the cost traps that quietly inflate the invoice.

Autopilot vs Standard: what you are signing up for

Autopilot is not a different Kubernetes; it is a different operational contract. Google provisions, operates, and patches the nodes, applies a hardened node configuration you cannot override, and charges you for the CPU, memory, and ephemeral storage your pods request (not the node capacity).

Dimension	Standard	Autopilot
Node ownership	You size, patch, scale node pools	Google provisions and manages nodes
Billing unit	Node-hours (whatever you provision)	Pod resource requests + cluster fee
Node access	SSH, privileged pods, custom DaemonSets	No SSH; privileged and host-namespace pods blocked
Bin-packing	Your responsibility (node pools, cluster autoscaler)	Google packs and scales for you
Security defaults	You opt in	Shielded nodes, Workload Identity, hardened OS by default

The mental shift: on Standard you optimize node utilization; on Autopilot you optimize request accuracy, because over-requesting is the entire cost story. There is one cluster management fee per cluster (the same flat fee as Standard), then per-pod resource billing on top.
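A minimal Python sketch of that billing model. The per-vCPU-hour and per-GiB-hour rates and the hourly cluster fee are hypothetical placeholders, not published GKE prices, and ephemeral storage is left out for brevity. The point it shows: only requests enter the formula, so padding them raises the bill even when actual usage is unchanged.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for one month of billing


@dataclass(frozen=True)
class PodRequest:
    cpu_cores: float   # requested vCPU, e.g. 0.5 for "500m"
    memory_gib: float  # requested memory in GiB


def autopilot_monthly_cost(pods, cpu_rate, mem_rate, cluster_fee_hourly):
    """Estimate one month of spend from pod requests plus the cluster fee.

    cpu_rate is per vCPU-hour and mem_rate per GiB-hour (hypothetical
    inputs). Actual utilization never appears in the formula.
    """
    pods_hourly = sum(p.cpu_cores * cpu_rate + p.memory_gib * mem_rate for p in pods)
    return (pods_hourly + cluster_fee_hourly) * HOURS_PER_MONTH


# The same three replicas, padded versus right-sized, at placeholder rates.
padded = [PodRequest(2.0, 4.0)] * 3
right_sized = [PodRequest(0.5, 1.0)] * 3
print(round(autopilot_monthly_cost(padded, 0.04, 0.005, 0.10), 2))
print(round(autopilot_monthly_cost(right_sized, 0.04, 0.005, 0.10), 2))
```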

Autopilot is the right default for teams that want platform discipline without owning node lifecycle. Reach for Standard when you genuinely need DaemonSets that touch the host, GPUs with custom drivers, Windows nodes, or extreme bin-packing control. Increasingly the two converge — much of the Autopilot security posture is now available as a compute class on Standard — but the billing and node-access contract is what actually differs.

Step 1: Provision a private Autopilot cluster

For production, the cluster should be private (nodes have no public IPs) with a control plane reachable only from authorized networks. Autopilot clusters are regional and use VPC-native (alias IP) networking by default.

gcloud container clusters create-auto prod-apps \
  --project my-prod-project \
  --region us-central1 \
  --network projects/my-host-project/global/networks/shared-vpc \
  --subnetwork projects/my-host-project/regions/us-central1/subnetworks/gke-us-central1 \
  --enable-private-nodes \
  --enable-master-authorized-networks \
  --master-authorized-networks 10.0.0.0/8,203.0.113.10/32 \
  --release-channel regular \
  --enable-google-cloud-access

A few choices worth defending:

--enable-private-nodes removes public IPs from nodes; internet egress then flows through Cloud NAT, which you provision separately in the subnet's region (in the host project, for a Shared VPC subnet like this one).
--enable-master-authorized-networks with an explicit CIDR list locks the public control-plane endpoint to known sources (your CI ranges, a bastion, corporate egress). For a fully private control plane, also use --enable-private-endpoint, but then your tooling must reach the private endpoint over the VPC.
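--release-channel is not optional in practice: Autopilot clusters are always enrolled in a release channel (Regular if you omit the flag), so set it explicitly to make the choice reviewable. regular is the sane production default; stable lags further behind for risk-averse fleets.
--enable-google-cloud-access allows Google Cloud public IP addresses to reach the control plane's public endpoint, which keeps some managed integrations working without opening the endpoint to the world. Those ranges are not limited to your own projects, so treat the flag as a deliberate exception rather than a default.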

In Terraform, pin the same shape so it is reviewable and reproducible:

resource "google_container_cluster" "prod_apps" {
  name             = "prod-apps"
  project          = "my-prod-project"
  location         = "us-central1"
  enable_autopilot = true

  network    = "projects/my-host-project/global/networks/shared-vpc"
  subnetwork = "projects/my-host-project/regions/us-central1/subnetworks/gke-us-central1"

  release_channel {
    channel = "REGULAR"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
  }

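  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "10.0.0.0/8"
      display_name = "internal"
    }
    cidr_blocks {
      cidr_block   = "203.0.113.10/32"
      display_name = "admin"
    }
    # Terraform equivalent of --enable-google-cloud-access
    gcp_public_cidrs_access_enabled = true
  }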

  # Workload Identity is on by default for Autopilot; pinning it is explicit.
  workload_identity_config {
    workload_pool = "my-prod-project.svc.id.goog"
  }
}

Do not set remove_default_node_pool or node_config blocks on an Autopilot cluster: there are no node pools to manage, and the provider or the GKE API will reject node-level fields. This is the single most common copy-paste error when porting a Standard config.

Step 2: Right-size pod requests when you cannot touch nodes

On Autopilot, the pod spec is your capacity plan. Two rules govern the cost:

Autopilot enforces minimum resource requests per pod. A pod requesting less than the floor is bumped up to the floor, and you pay the floor. The exact floor depends on the compute class and GKE version (commonly around 250m CPU / 0.5 GiB memory for the general-purpose class, and different for DaemonSet pods and some compute classes), so check the current documentation for your cluster. Ten tiny sidecar-only pods can cost far more than their actual footprint.
Autopilot enforces a CPU-to-memory ratio range per compute class. Wildly skewed requests (for example 4 CPU with 256 MiB) get adjusted, again changing what you pay.
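Both adjustments can be modeled in a few lines. This Python sketch raises a pod's total requests to a floor, then raises whichever side is too small to fit a CPU-to-memory window. The defaults (a 250m / 0.5 GiB floor mirroring the figure above, and a 1 to 6.5 GiB-per-vCPU window) are assumptions for illustration, not authoritative limits; real Autopilot may also round requests to fixed increments, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requests:
    cpu: float         # vCPU
    memory_gib: float  # GiB


def autopilot_adjust(req, min_cpu=0.25, min_mem_gib=0.5,
                     min_gib_per_cpu=1.0, max_gib_per_cpu=6.5):
    """Approximate the requests Autopilot bills for (assumed parameters)."""
    # 1. Raise each resource to the per-pod floor.
    cpu = max(req.cpu, min_cpu)
    mem = max(req.memory_gib, min_mem_gib)
    # 2. Too little memory for the CPU: raise memory to the ratio minimum.
    if mem < cpu * min_gib_per_cpu:
        mem = cpu * min_gib_per_cpu
    # 3. Too much memory for the CPU: raise CPU to the ratio maximum.
    if mem > cpu * max_gib_per_cpu:
        cpu = mem / max_gib_per_cpu
    return Requests(cpu, mem)


print(autopilot_adjust(Requests(0.1, 0.125)))  # tiny pod, bumped to the floor
print(autopilot_adjust(Requests(4.0, 0.25)))   # 4 CPU with 256 MiB, memory raised
```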

Inspect what Autopilot actually admitted versus what you asked for:

# What did the mutating webhook set the request to?
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources}{"\n"}'

# Autopilot records adjustments in the autopilot.gke.io/resource-adjustment annotation
kubectl describe pod <pod> | grep -iE 'autopilot|adjust|limit'

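Set requests deliberately. On Autopilot clusters that do not support Pod bursting, omitting limits makes GKE set limits == requests, so the request is both your guaranteed allocation and your cap, with no burst above it. Newer GKE versions add bursting support, so confirm which behavior your cluster version has.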

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
    ephemeral-storage: "1Gi"
  limits:
    cpu: "500m"
    memory: "512Mi"
    ephemeral-storage: "1Gi"

To find the right numbers instead of guessing, enable Vertical Pod Autoscaler in recommendation-only mode and let it observe real usage before you commit:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"   # recommend only; do not mutate live pods

Then read status.recommendation and bake the target into the Deployment. Pair VPA recommendations with HPA on a CPU or custom metric to set the replica count, but never let a VPA in an updating mode and an HPA act on the same CPU or memory signal, or they fight.
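Reading recommendations by hand gets tedious across many services. A minimal Python sketch, assuming you already have the VPA object as parsed JSON (for example from kubectl get vpa api-vpa -o json), that turns one container's target into a resources stanza with limits equal to requests, as in the example above. The sample values are made up. Using target is a choice; for spiky services you might pick something closer to upperBound.

```python
import json


def resources_from_vpa(vpa, container):
    """Build a resources stanza from a VPA object's target recommendation."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    for rec in recs:
        if rec.get("containerName") == container:
            target = rec["target"]
            picked = {k: target[k] for k in ("cpu", "memory") if k in target}
            # Requests and limits get the same values.
            return {"requests": dict(picked), "limits": dict(picked)}
    raise KeyError(f"no VPA recommendation for container {container!r}")


# Made-up sample shaped like the status of `kubectl get vpa api-vpa -o json`.
sample_vpa = {"status": {"recommendation": {"containerRecommendations": [
    {"containerName": "api",
     "target": {"cpu": "350m", "memory": "400Mi"},
     "upperBound": {"cpu": "900m", "memory": "1Gi"}},
]}}}

print(json.dumps(resources_from_vpa(sample_vpa, "api"), indent=2))
```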

Step 3: Scheduling levers that still work

You cannot taint nodes or write node-affinity against node pools you do not control, but the workload-level scheduling primitives are fully available and matter more on a platform that scales nodes underneath you.

PodDisruptionBudgets protect availability during the node upgrades and consolidations Google performs:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

A PDB is your contract with Autopilot's node maintenance. Without one, a node recycle can evict every replica of a 2-pod Deployment at once. Keep minAvailable below the replica count so at least one pod can be evicted: minAvailable: 2 suits a Deployment with three or more replicas, but on a 2-replica Deployment it allows zero disruptions and can block legitimate upgrades.
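The arithmetic behind that warning, as a small Python sketch: how many pods voluntary evictions may remove for a given minAvailable, with percentages rounded up as the Kubernetes disruption controller does. It treats the Deployment's replica count as the expected pod count and ignores maxUnavailable, which is a simplification.

```python
import math


def disruptions_allowed(replicas, healthy, min_available):
    """Pods that voluntary evictions may remove right now under a PDB.

    min_available is an int or a percentage string such as "50%";
    percentages are rounded up.
    """
    if isinstance(min_available, str) and min_available.endswith("%"):
        desired_healthy = math.ceil(replicas * int(min_available[:-1]) / 100)
    else:
        desired_healthy = int(min_available)
    return max(0, healthy - desired_healthy)


print(disruptions_allowed(2, 2, 2))  # -> 0: the 2-replica case blocks drains
print(disruptions_allowed(3, 3, 2))  # -> 1
```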

Topology spread keeps replicas across zones so a single-zone event does not take the service down:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api
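To reason about what maxSkew: 1 permits, here is a simplified Python model of the check for a zone spread with whenUnsatisfiable: DoNotSchedule: a pod may land in a zone only if that zone's matching-pod count plus one, minus the smallest count across zones, stays within maxSkew. It ignores node affinity, minDomains, and other filters the real scheduler applies; the zone names are placeholders.

```python
from collections import Counter


def zone_counts(pod_zones, all_zones):
    """Matching pods per zone; zones with no pods count as zero."""
    counts = Counter(pod_zones)
    return {z: counts.get(z, 0) for z in all_zones}


def skew(pod_zones, all_zones):
    """Current skew: largest zone count minus smallest zone count."""
    counts = zone_counts(pod_zones, all_zones).values()
    return max(counts) - min(counts)


def can_place(pod_zones, all_zones, zone, max_skew=1):
    """Would one more matching pod in `zone` satisfy the maxSkew rule?"""
    counts = zone_counts(pod_zones, all_zones)
    global_min = min(counts.values())
    return counts[zone] + 1 - global_min <= max_skew


zones = ["zone-a", "zone-b", "zone-c"]  # placeholder zone names
running = ["zone-a", "zone-b"]
print(skew(running, zones), can_place(running, zones, "zone-a"),
      can_place(running, zones, "zone-c"))
```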

PriorityClasses decide who wins when capacity is briefly tight. Define a high class for latency-critical services and let lower-priority batch work be preempted rather than starving the front door:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "Latency-critical request-path workloads"

For workload separation you select a compute class (for example Balanced or Scale-Out, with general-purpose as the default) through the cloud.google.com/compute-class node selector and, where supported, request Spot capacity through a node selector rather than managing pools:

spec:
  nodeSelector:
    cloud.google.com/gke-spot: "true"

Step 4: Harden with Workload Identity, Binary Authorization, and Pod Security

Autopilot ships secure-by-default, but three controls are worth turning on explicitly for production.

Workload Identity Federation for GKE is the only sane way to reach Google APIs — it is enabled by default on Autopilot, and you should never mount service-account keys. Bind a Kubernetes ServiceAccount to an IAM service account:

# Allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding \
  api-runtime@my-prod-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-prod-project.svc.id.goog[apps/api]"

apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: apps
  annotations:
    iam.gke.io/gcp-service-account: api-runtime@my-prod-project.iam.gserviceaccount.com

A pod that sets serviceAccountName: api runs as that KSA; GKE issues it short-lived tokens, and the workload calls Google APIs as the GSA with no static credential anywhere.
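A typo in the member string makes the binding silently do nothing. A minimal Python helper that builds it from the project ID, namespace, and KSA name used above; the name check is deliberately loose (it only rejects empty values and characters that would break the bracket syntax) and is not Kubernetes' full validation.

```python
def workload_identity_member(project_id, namespace, ksa_name):
    """IAM member for a KSA in the project's workload identity pool."""
    for label, value in (("project", project_id), ("namespace", namespace),
                         ("service account", ksa_name)):
        # Loose sanity check only: reject empties and characters that would
        # break the "pool[namespace/name]" syntax.
        if not value or any(ch in value for ch in "[]/ "):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


print(workload_identity_member("my-prod-project", "apps", "api"))
```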

Binary Authorization stops anything but signed, attested images from running. Enable it on the cluster and enforce a policy that requires your CI attestor:

gcloud container clusters update prod-apps \
  --region us-central1 \
  --binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE

Pair it with a policy whose default rule requires attestations, plus narrow allowlist exemptions for trusted system images. Run in dry-run first and watch the audit logs for what would be blocked before you enforce.
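A policy of that shape can be generated and reviewed in code. This Python sketch builds a Binary Authorization policy whose default rule requires one attestor, starting in dry-run (DRYRUN_AUDIT_LOG_ONLY) and switching to ENFORCED_BLOCK_AND_AUDIT_LOG when enforce=True. The attestor path and exemption pattern are hypothetical. JSON is valid YAML, so the output can be saved to a file and loaded with gcloud container binauthz policy import.

```python
import json


def binauthz_policy(attestor, exempt_patterns, enforce=False):
    """Build a Binary Authorization policy with a require-attestation default rule."""
    mode = "ENFORCED_BLOCK_AND_AUDIT_LOG" if enforce else "DRYRUN_AUDIT_LOG_ONLY"
    return {
        # ENABLE applies Google's global policy, which exempts
        # Google-maintained system images.
        "globalPolicyEvaluationMode": "ENABLE",
        "admissionWhitelistPatterns": [{"namePattern": p} for p in exempt_patterns],
        "defaultAdmissionRule": {
            "evaluationMode": "REQUIRE_ATTESTATION",
            "enforcementMode": mode,
            "requireAttestationsBy": [attestor],
        },
    }


policy = binauthz_policy(
    "projects/my-prod-project/attestors/ci-attestor",  # hypothetical attestor
    ["us-docker.pkg.dev/my-prod-project/trusted/*"],   # hypothetical exemption
)
print(json.dumps(policy, indent=2))
```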

Pod Security Admission enforces the upstream Pod Security Standards at the namespace level. Autopilot already blocks privileged and host-namespace pods, but PSA gives you an explicit, auditable baseline:

apiVersion: v1
kind: Namespace
metadata:
  name: apps
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

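For namespaces that already run workloads, start with only the warn and audit labels to surface violations, then add enforce: restricted once workloads comply. The example above enforces immediately, which suits a new namespace.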
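Before flipping enforce, it can help to pre-check manifests in CI. This Python sketch flags four common restricted-profile failures in a pod spec: runAsNonRoot, allowPrivilegeEscalation, dropped capabilities, and the seccomp profile. It is a partial subset for illustration; the real admission check covers more fields, including init containers, so treat the warn and audit results as the authority.

```python
def restricted_violations(pod_spec):
    """Flag a few common Pod Security 'restricted' failures (partial check)."""
    problems = []
    pod_sc = pod_spec.get("securityContext", {})
    for c in pod_spec.get("containers", []):
        sc = c.get("securityContext", {})
        name = c.get("name", "?")
        # Container-level settings override the pod-level defaults.
        if not sc.get("runAsNonRoot", pod_sc.get("runAsNonRoot", False)):
            problems.append(f"{name}: runAsNonRoot must be true")
        if sc.get("allowPrivilegeEscalation", True) is not False:
            problems.append(f"{name}: allowPrivilegeEscalation must be false")
        if "ALL" not in sc.get("capabilities", {}).get("drop", []):
            problems.append(f"{name}: capabilities.drop must include ALL")
        seccomp = sc.get("seccompProfile", pod_sc.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            problems.append(f"{name}: seccompProfile must be RuntimeDefault or Localhost")
    return problems


compliant = {
    "securityContext": {"runAsNonRoot": True,
                        "seccompProfile": {"type": "RuntimeDefault"}},
    "containers": [{
        "name": "api",
        "securityContext": {"allowPrivilegeEscalation": False,
                            "capabilities": {"drop": ["ALL"]}},
    }],
}
print(restricted_violations(compliant))
print(restricted_violations({"containers": [{"name": "app"}]}))
```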

Step 5: Ingress, Gateway API, and container-native load balancing

Autopilot uses container-native load balancing via Network Endpoint Groups (NEGs) — the load balancer targets pod IPs directly instead of hopping through a node port, which removes a hop and gives accurate health checks. The modern, recommended way to expose services is the Gateway API, which on GKE is backed by Google Cloud Load Balancing.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gw
  namespace: apps
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: api-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: api-route
  namespace: apps
spec:
  parentRefs:
    - name: external-gw
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api
      backendRefs:
        - name: api
          port: 80

The gke-l7-global-external-managed GatewayClass provisions a global external Application Load Balancer; internal and regional classes exist for private and regional needs. NEG-based backends are created automatically for the referenced Service. Classic Ingress still works if you are on it, but Gateway API is where header-based routing, traffic splitting, and cross-namespace delegation live.
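The PathPrefix match in the HTTPRoute above is element-wise, which is easy to get wrong. A small Python sketch of the rule as the Gateway API spec describes it: the prefix /api matches /api, /api/, and /api/v1/users, but not /apiv2. It is a simplification (it also collapses repeated slashes) and not the load balancer's implementation.

```python
def path_prefix_matches(prefix, path):
    """Element-wise prefix match; a trailing '/' is ignored."""
    want = [seg for seg in prefix.split("/") if seg]
    have = [seg for seg in path.split("/") if seg]
    return have[:len(want)] == want


for candidate in ("/api", "/api/", "/api/v1/users", "/apiv2"):
    print(candidate, path_prefix_matches("/api", candidate))
```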

Step 6: Observability with managed Prometheus and Cloud Logging

Autopilot enables Google Cloud Managed Service for Prometheus and Cloud Logging/Monitoring by default — you get system metrics and logs without running a collector. To scrape your own application metrics, declare a PodMonitoring resource:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: api-metrics
  namespace: apps
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: metrics
      interval: 30s

Managed Prometheus is fully PromQL-compatible, so existing dashboards and recording rules port over, and you query through Cloud Monitoring or any Prometheus-API client. Note that managed metric samples and log ingestion are billed by volume — high-cardinality labels and verbose debug logging are a real cost line, not just noise.
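A back-of-envelope Python sketch makes the volume mechanics visible: each active series produces one sample per scrape, and a metric's worst-case series count is the product of the distinct values of its labels. The label cardinalities below are hypothetical, and the sketch returns sample counts rather than cost, since pricing varies.

```python
from math import prod

SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def series_for_metric(label_value_counts):
    """Worst-case series for one metric: product of distinct values per label."""
    return prod(label_value_counts)


def monthly_samples(active_series, scrape_interval_s):
    """Samples ingested per month: one sample per series per scrape."""
    return active_series * SECONDS_PER_MONTH // scrape_interval_s


# Hypothetical metric labelled by route (20), status code (5), and pod (10).
base = series_for_metric([20, 5, 10])
# The same metric after someone adds a user_id label with 1,000 values.
exploded = series_for_metric([20, 5, 10, 1000])
print(monthly_samples(base, 30), monthly_samples(exploded, 30))
```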

Step 7: Cost traps that bite teams

The bill surprises come from a handful of mechanics:

Trap	Why it costs	Mitigation
Minimum request floors	Tiny pods are billed at the per-pod floor, not actual usage	Consolidate sidecars; right-size with VPA; avoid many micro-pods
Over-requested headroom	You pay the request even at 5% utilization	VPA-recommend, then trim; do not pad “just in case”
`limits == requests`	Omitting limits caps burst at the request	Set requests to real peak, not average, for spiky apps
Idle replicas	`replicas: 3` at night still bills	HPA with a sane floor; scale-to-floor off-peak
High-cardinality metrics/logs	Managed Prometheus and Logging bill by volume	Drop noisy labels; route debug logs away from ingestion

Balloon (low-priority placeholder) pods are a deliberate technique on Autopilot: schedule pods with a negative PriorityClass that reserve capacity and get preempted the instant a real workload needs the room. This keeps warm headroom so scale-ups do not wait on node provisioning. The catch is that reserved capacity is still billed while the balloon runs — use it to buy latency, and size it so the cost of warm headroom is less than the cost of a cold-start stall.

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon
value: -10   # negative: preempted before any real workload
globalDefault: false
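The sizing rule above reduces to comparing two monthly numbers. A minimal Python sketch in which every input is hypothetical: the balloon's requests priced at placeholder rates, against an estimate of how many cold-start stalls per month the headroom avoids and what each one costs you (a business estimate, not something GKE reports).

```python
HOURS_PER_MONTH = 730


def balloon_tradeoff(cpu, mem_gib, cpu_rate, mem_rate,
                     stalls_avoided_per_month, cost_per_stall):
    """Return (monthly headroom cost, monthly stall cost avoided)."""
    # Balloon pods are billed on their requests for as long as they run.
    headroom = (cpu * cpu_rate + mem_gib * mem_rate) * HOURS_PER_MONTH
    avoided = stalls_avoided_per_month * cost_per_stall
    return headroom, avoided


headroom, avoided = balloon_tradeoff(2.0, 4.0, 0.04, 0.005, 10, 20.0)
print(f"headroom {headroom:.2f} vs avoided {avoided:.2f}:",
      "keep the balloon" if avoided > headroom else "shrink or drop it")
```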

Spot Autopilot is the biggest lever for fault-tolerant work: schedule eligible pods onto Spot capacity with the cloud.google.com/gke-spot selector for a substantial discount, accepting that Google can reclaim them on short notice. Keep request-path services on standard capacity and push batch, async, and stateless retry-safe jobs to Spot.

Enterprise scenario

A fintech platform team migrated forty microservices from a self-managed Standard cluster to Autopilot to shed node-patching toil. The lift-and-shift looked clean until the first month’s invoice came in roughly 30% higher than the old node-based bill. The cause was not the workloads — it was a shared Istio sidecar and a Datadog agent injected into every pod. Each service requested a modest 100m CPU, but Autopilot bumped every pod to the general-purpose floor (~250m CPU / 0.5 GiB), and they were running 600+ pods across staging and prod. They were paying the floor 600 times over for pods that idled at 8% utilization.

The fix was twofold. First, they ran VPA in updateMode: "Off" across the fleet for two weeks and discovered most services peaked well under their requests, so they consolidated the per-pod observability sidecar to a node-level agent on the workloads that genuinely needed it. Second, they moved async and batch consumers onto Spot, which absorbed the floor cost at a steep discount.

# Surface every pod Autopilot bumped above its declared request, fleet-wide
kubectl get pods -A -o json | jq -r '
  .items[] | select(.metadata.annotations["autopilot.gke.io/resource-adjustment"]) |
  "\(.metadata.namespace)/\(.metadata.name)"'

That one query, wired into a weekly review, turned an invisible 30% premium into a tracked, governed line item. The lesson: on Autopilot, pod count and the request floor — not node utilization — are the cost model, and a fleet of tiny pods is the most expensive shape you can run.
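For a review that lives in a script rather than a jq one-liner, here is an equivalent Python sketch that also groups results by namespace so each team sees its share. It assumes the parsed output of kubectl get pods -A -o json and, like the jq filter, keys off the presence of the autopilot.gke.io/resource-adjustment annotation; the sample pod list is made up.

```python
from collections import Counter

ANNOTATION = "autopilot.gke.io/resource-adjustment"


def adjusted_pods(pod_list):
    """namespace/name for every pod that carries the adjustment annotation."""
    refs = []
    for item in pod_list.get("items", []):
        meta = item.get("metadata", {})
        if ANNOTATION in (meta.get("annotations") or {}):
            refs.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return refs


def adjusted_by_namespace(pod_list):
    """Count adjusted pods per namespace."""
    return Counter(ref.split("/", 1)[0] for ref in adjusted_pods(pod_list))


# Made-up sample shaped like `kubectl get pods -A -o json`.
sample_pods = {"items": [
    {"metadata": {"namespace": "apps", "name": "api-1",
                  "annotations": {ANNOTATION: "{}"}}},
    {"metadata": {"namespace": "apps", "name": "api-2"}},
    {"metadata": {"namespace": "batch", "name": "worker-1",
                  "annotations": {ANNOTATION: "{}"}}},
]}
print(adjusted_pods(sample_pods))
print(adjusted_by_namespace(sample_pods))
```

In practice, pass it json.loads of the real kubectl output instead of the sample.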

Verify

Run these after provisioning and after any policy change.

# Cluster is Autopilot, private, on a release channel
gcloud container clusters describe prod-apps --region us-central1 \
  --format="value(autopilot.enabled, privateClusterConfig.enablePrivateNodes, releaseChannel.channel)"

# Authorized networks are scoped, not 0.0.0.0/0
gcloud container clusters describe prod-apps --region us-central1 \
  --format="value(masterAuthorizedNetworksConfig.cidrBlocks)"

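# Pods landed across multiple zones (topology spread working):
# list each replica's node, then map those nodes to zones
kubectl get pods -n apps -l app=api -o wide \
  --sort-by='.spec.nodeName'
kubectl get nodes -L topology.kubernetes.io/zone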

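# Workload Identity resolves to the GSA from inside a pod
# (current kubectl run has no --serviceaccount flag; set it via --overrides).
# If apps enforces the restricted Pod Security level, this root-running pod may be rejected.
kubectl run wi-test -n apps --rm -it --restart=Never \
  --image=google/cloud-sdk:slim \
  --overrides='{"spec":{"serviceAccountName":"api"}}' \
  -- gcloud auth list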

# Managed Prometheus is scraping your target
kubectl get podmonitoring -n apps

Confirm Binary Authorization is enforcing by attempting to deploy an unsigned image into a watched namespace — it should be rejected at admission, and the denial should appear in Cloud Audit Logs.
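The authorized-networks check from the Verify commands is easy to automate in CI. A minimal Python sketch, assuming you have already extracted the cidrBlock values from the describe output, that flags any IPv4 range broader than a chosen prefix length. The /8 threshold is an assumed policy choice that still admits an internal 10.0.0.0/8 like the one used earlier; 0.0.0.0/0 is always caught.

```python
import ipaddress


def too_broad(cidrs, min_prefix=8):
    """Return authorized-network CIDRs broader than /min_prefix."""
    flagged = []
    for cidr in cidrs:
        # strict=False tolerates host bits, e.g. "203.0.113.10/24".
        net = ipaddress.ip_network(cidr.strip(), strict=False)
        if net.prefixlen < min_prefix:
            flagged.append(str(net))
    return flagged


print(too_broad(["10.0.0.0/8", "203.0.113.10/32"]))
print(too_broad(["10.0.0.0/8", "0.0.0.0/0"]))
```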

Production checklist

Pitfalls

Porting a Standard config wholesale. Node pools, taints, custom DaemonSets that touch the host, and node_config blocks will be rejected. Strip node-level concerns entirely.
Treating requests as suggestions. They are the bill and the cap (with limits == requests). Under-request and you throttle; over-request and you overpay — both are silent.
No PDBs. Autopilot recycles nodes for upgrades and consolidation; without budgets you will see availability dips you cannot explain.
Wide-open authorized networks. A private cluster with 0.0.0.0/0 in the master-authorized list is not private. Scope it to known CIDRs.
Unbounded observability spend. High-cardinality labels and debug-level logs turn managed Prometheus and Cloud Logging into a surprise line item — govern them like any other cost.

Autopilot trades node control for operational leverage, and for most teams that is the right trade. Master the request model, keep the scheduling and security controls deliberate, and the platform becomes genuinely low-toil — without the bill drifting out from under you.


Written by Vinod
