Autopilot promises a Kubernetes cluster where Google runs the nodes and you pay for pods. That promise is real, but the abstraction leaks in specific, predictable places: you cannot SSH into a node, you cannot run a privileged DaemonSet, and your bill is driven by resource requests rather than by nodes you can pack. This playbook covers how to run Autopilot for production workloads: provisioning a private cluster, right-sizing without node access, the scheduling levers that still work, the security controls worth turning on, and the cost traps that quietly inflate the invoice.
Autopilot vs Standard: what you are signing up for
Autopilot is not a different Kubernetes; it is a different operational contract. Google provisions, patches, and operates the nodes, applies a hardened node configuration you cannot override, and charges you for the CPU, memory, and ephemeral storage your pods request (not the node capacity).
| Dimension | Standard | Autopilot |
|---|---|---|
| Node ownership | You size, patch, scale node pools | Google provisions and manages nodes |
| Billing unit | Node-hours (whatever you provision) | Pod resource requests + cluster fee |
| Node access | SSH, privileged pods, custom DaemonSets | No SSH; privileged and host-namespace pods blocked |
| Bin-packing | Your responsibility (node pool sizing, autoscaler tuning) | Google packs and scales for you |
| Security defaults | You opt in | Shielded nodes, Workload Identity, hardened OS by default |
The mental shift: on Standard you optimize node utilization; on Autopilot you optimize request accuracy, because over-requesting is the entire cost story. There is one cluster management fee per cluster (the same flat fee as Standard), then per-pod resource billing on top.
Autopilot is the right default for teams that want platform discipline without owning node lifecycle. Reach for Standard when you genuinely need DaemonSets that touch the host, GPUs with custom drivers, Windows nodes, or extreme bin-packing control. Increasingly the two converge — much of the Autopilot security posture is now available as a compute class on Standard — but the billing and node-access contract is what actually differs.
Step 1: Provision a private Autopilot cluster
For production, the cluster should be private (nodes have no public IPs) with a control plane reachable only from authorized networks. Autopilot clusters are regional and use VPC-native (alias IP) networking by default.
gcloud container clusters create-auto prod-apps \
--project my-prod-project \
--region us-central1 \
--network projects/my-host-project/global/networks/shared-vpc \
--subnetwork projects/my-host-project/regions/us-central1/subnetworks/gke-us-central1 \
--enable-private-nodes \
--enable-master-authorized-networks \
--master-authorized-networks 10.0.0.0/8,203.0.113.10/32 \
--release-channel regular \
--enable-google-cloud-access
A few choices worth defending:
- --enable-private-nodes removes public IPs from nodes; egress to the internet then flows through Cloud NAT, which you provision separately in the subnet’s region.
- --enable-master-authorized-networks with an explicit CIDR list locks the public control-plane endpoint to known sources (your CI ranges, a bastion, corporate egress). For a fully private control plane, also use --enable-private-endpoint, but then your tooling must reach the private endpoint over the VPC.
- --release-channel is effectively mandatory on Autopilot because Google manages the version. regular is the sane production default; stable lags further behind for risk-averse fleets.
- --enable-google-cloud-access lets the control plane's public endpoint be reached from Google Cloud public IP ranges, which keeps some managed integrations working without opening the endpoint to the world.
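The authorized-networks choice is easy to undo by accident, so a small pre-flight check in CI can help. The sketch below is a hypothetical helper (check_authorized_networks is not part of gcloud) that takes the same comma-separated list you would pass to --master-authorized-networks and rejects any entry with a /0 prefix.

```shell
# Sketch: fail fast if an authorized-networks list contains a catch-all range.
# check_authorized_networks is a hypothetical helper, not a gcloud feature.
check_authorized_networks() {
  cidrs="$1"   # comma-separated list, same shape as --master-authorized-networks
  old_ifs="$IFS"
  IFS=','
  for cidr in $cidrs; do
    case "$cidr" in
      */0)
        # A /0 prefix (0.0.0.0/0 or ::/0) matches every address.
        IFS="$old_ifs"
        echo "REJECT: $cidr matches every address"
        return 1
        ;;
    esac
  done
  IFS="$old_ifs"
  echo "OK"
}
```

Called with the list from the command above, it prints OK; adding 0.0.0.0/0 to the list makes it print a REJECT line and return a non-zero status, which is enough to fail a pipeline step.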
In Terraform, pin the same shape so it is reviewable and reproducible:
resource "google_container_cluster" "prod_apps" {
name = "prod-apps"
project = "my-prod-project"
location = "us-central1"
enable_autopilot = true
network = "projects/my-host-project/global/networks/shared-vpc"
subnetwork = "projects/my-host-project/regions/us-central1/subnetworks/gke-us-central1"
release_channel {
channel = "REGULAR"
}
private_cluster_config {
enable_private_nodes = true
enable_private_endpoint = false
}
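master_authorized_networks_config {
cidr_blocks {
cidr_block = "10.0.0.0/8"
display_name = "internal"
}
# Second range from the gcloud example; the display_name is illustrative.
cidr_blocks {
cidr_block = "203.0.113.10/32"
display_name = "ci-egress"
}
}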
# Workload Identity is on by default for Autopilot; pinning it is explicit.
workload_identity_config {
workload_pool = "my-prod-project.svc.id.goog"
}
}
Do not set remove_default_node_pool or node_config blocks on an Autopilot cluster: there are no node pools to manage, and Terraform will reject node-level fields. This is the single most common copy-paste error when porting a Standard config.
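One way to catch that porting mistake before terraform plan is a plain text check in review or CI. This is a minimal sketch, assuming a hypothetical helper named lint_autopilot_tf; it looks only for the two fields named above and does not parse HCL.

```shell
# Sketch: flag node-level fields that do not belong in an Autopilot cluster definition.
# lint_autopilot_tf is a hypothetical helper; it only checks the two fields named above.
lint_autopilot_tf() {
  tf_file="$1"
  # Match the field at the start of a line, followed by "=" or an opening brace.
  if grep -nE '^[[:space:]]*(remove_default_node_pool|node_config)[[:space:]]*(=|\{)' "$tf_file"; then
    echo "node-level fields found in $tf_file" >&2
    return 1
  fi
  return 0
}
```

On a match, grep prints the offending lines with their line numbers and the function returns non-zero. A real policy check (for example a Terraform validation tool) would be more robust; the point here is only that the mistake is mechanical enough to detect with a one-line pattern.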
Step 2: Right-size pod requests when you cannot touch nodes
On Autopilot, the pod spec is your capacity plan. Two rules govern the cost:
- Autopilot enforces minimum resource requests per pod. A pod requesting less than the floor (commonly around 250m CPU / 0.5 GiB memory for the general-purpose class, more for DaemonSet-style and some compute classes) is bumped up to the floor — and you pay the floor. Ten tiny sidecar-only pods can cost far more than their actual footprint.
- Autopilot enforces a CPU-to-memory ratio range per compute class. Wildly skewed requests (for example 4 CPU with 256 MiB) get adjusted, again changing what you pay.
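The floor rule is simple arithmetic, and seeing it as a function makes the cost effect concrete. The sketch below assumes the approximate 250m CPU floor quoted above as a default; the function names are hypothetical, and the real floor depends on your compute class and GKE version.

```shell
# Sketch: what a pod is billed for once a minimum-request floor is applied.
# The 250m default mirrors the approximate general-purpose floor quoted above;
# it is an assumption, so pass the real value for your compute class.
billed_request() {
  requested="$1"   # requested amount (millicores or MiB)
  floor="$2"       # minimum the platform will admit
  if [ "$requested" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$requested"
  fi
}

# Fleet-wide CPU billed when every pod requests the same amount.
fleet_billed_cpu_m() {
  pods="$1"
  requested_m="$2"
  floor_m="${3:-250}"
  per_pod="$(billed_request "$requested_m" "$floor_m")"
  echo $(( pods * per_pod ))
}
```

With these assumptions, fleet_billed_cpu_m 10 50 returns 2500: ten pods that each ask for 50m are billed as 2500m in total rather than 500m. The CPU-to-memory ratio adjustment is not modeled here.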
Inspect what Autopilot actually admitted versus what you asked for:
# What did the mutating webhook set the request to?
kubectl get pod <pod> -o jsonpath='{.spec.containers[*].resources}{"\n"}'
# Mutation/adjustment events are surfaced as warnings on the object
kubectl describe pod <pod> | grep -iE 'autopilot|adjust|limit'
Set requests deliberately. On Autopilot, if you omit limits, GKE sets limits == requests, so the request is both your guaranteed allocation and your cap — there is no burst above it.
resources:
requests:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "1Gi"
limits:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "1Gi"
To find the right numbers instead of guessing, enable Vertical Pod Autoscaler in recommendation-only mode and let it observe real usage before you commit:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: api-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: api
updatePolicy:
updateMode: "Off" # recommend only; do not mutate live pods
Then read status.recommendation and bake the target into the Deployment. Combine VPA recommendations with HPA on a custom or CPU metric for the number of replicas — but never let VPA and HPA both drive the same CPU/memory signal, or they fight.
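The reason the two autoscalers interact is visible in the HPA scaling rule itself: CPU utilization is measured as a percentage of the request, so changing the request changes the signal HPA sees. A minimal sketch of the core rule follows; hpa_desired_replicas is a hypothetical name, and tolerance, stabilization windows, and min/max replica bounds are deliberately left out.

```shell
# Sketch of the core HPA scaling rule:
#   desired = ceil(current_replicas * current_metric / target_metric)
# Tolerance, stabilization windows, and min/max bounds are deliberately left out.
hpa_desired_replicas() {
  current_replicas="$1"
  current_metric="$2"   # e.g. average CPU utilization, in percent of request
  target_metric="$3"    # e.g. target CPU utilization, in percent of request
  product=$(( current_replicas * current_metric ))
  # Integer ceiling division.
  echo $(( (product + target_metric - 1) / target_metric ))
}
```

For example, 4 replicas running at 90% of their request against a 60% target yields 6 replicas. If you then halve the request on VPA's advice, the same absolute usage reads as a much higher utilization and HPA scales out again, which is why the request should be settled before the HPA target is tuned.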
Step 3: Scheduling levers that still work
You cannot taint nodes or write node-affinity against node pools you do not control, but the workload-level scheduling primitives are fully available and matter more on a platform that scales nodes underneath you.
PodDisruptionBudgets protect availability during the node upgrades and consolidations Google performs:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: api-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: api
A PDB is your contract with Autopilot’s node maintenance. Without one, a node recycle can evict every replica of a 2-pod Deployment at once. Set minAvailable to leave real headroom; a PDB that allows zero disruptions can also block legitimate upgrades.
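The headroom point can be checked with one subtraction. The sketch below uses a hypothetical helper, pdb_allowed_disruptions, that mirrors the budget rule for a minAvailable PDB: allowed evictions equal healthy pods minus minAvailable, never below zero.

```shell
# Sketch: how many pods a voluntary disruption (for example a node drain) may evict
# under a minAvailable budget: allowed = healthy - minAvailable, floored at 0.
pdb_allowed_disruptions() {
  healthy="$1"
  min_available="$2"
  allowed=$(( healthy - min_available ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}
```

With minAvailable: 2, a Deployment of 2 healthy replicas allows 0 disruptions, so drains wait on it; at 3 replicas the same budget allows 1 eviction at a time. In other words, the api-pdb manifest above only leaves room for maintenance if the Deployment runs at least three replicas.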
Topology spread keeps replicas across zones so a single-zone event does not take the service down:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api
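To see what maxSkew: 1 with DoNotSchedule means in practice, it helps to compute the skew by hand. The sketch below is a simplified model, assuming a hypothetical helper named spread_allows: it compares the pod count in the target zone after placement against the lowest count across zones, and ignores refinements such as minDomains and node eligibility.

```shell
# Sketch: would placing one more pod in a zone satisfy maxSkew?
# Usage: spread_allows MAX_SKEW TARGET_INDEX COUNT_ZONE_1 COUNT_ZONE_2 ...
# TARGET_INDEX is 1-based and selects which zone receives the new pod.
spread_allows() {
  max_skew="$1"
  target_index="$2"
  shift 2
  min=""
  target_count=""
  i=1
  for count in "$@"; do
    if [ -z "$min" ] || [ "$count" -lt "$min" ]; then
      min="$count"
    fi
    if [ "$i" -eq "$target_index" ]; then
      target_count="$count"
    fi
    i=$(( i + 1 ))
  done
  # Skew after placement: pods in the target zone (including the new one)
  # minus the lowest count across all zones.
  skew=$(( target_count + 1 - min ))
  if [ "$skew" -le "$max_skew" ]; then
    echo "schedule"
  else
    echo "pending"
  fi
}
```

With pods distributed 2, 1, 1 across three zones, spread_allows 1 1 2 1 1 prints pending (the first zone would reach a skew of 2), while spread_allows 1 2 2 1 1 prints schedule. On Autopilot a pending pod of this kind is a signal for the platform to add capacity in the under-filled zone.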
PriorityClasses decide who wins when capacity is briefly tight. Define a high class for latency-critical services and let lower-priority batch work be preempted rather than starving the front door:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "Latency-critical request-path workloads"
For workload separation you select a compute class (for example a general-purpose, scale-out, or balanced class) and, where supported, Spot via a node selector rather than managing pools:
spec:
nodeSelector:
cloud.google.com/gke-spot: "true"
Step 4: Harden with Workload Identity, Binary Authorization, and Pod Security
Autopilot ships secure-by-default, but three controls are worth turning on explicitly for production.
Workload Identity Federation for GKE is the only sane way to reach Google APIs — it is enabled by default on Autopilot, and you should never mount service-account keys. Bind a Kubernetes ServiceAccount to an IAM service account:
# Allow the KSA to impersonate the GSA
gcloud iam service-accounts add-iam-policy-binding \
api-runtime@my-prod-project.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:my-prod-project.svc.id.goog[apps/api]"
apiVersion: v1
kind: ServiceAccount
metadata:
name: api
namespace: apps
annotations:
iam.gke.io/gcp-service-account: api-runtime@my-prod-project.iam.gserviceaccount.com
The pod runs as api, GKE mints a short-lived token, and the workload calls Google APIs as the GSA with no static credential anywhere.
Binary Authorization stops anything but signed, attested images from running. Enable it on the cluster and enforce a policy that requires your CI attestor:
gcloud container clusters update prod-apps \
--region us-central1 \
--binauthz-evaluation-mode=PROJECT_SINGLETON_POLICY_ENFORCE
Pair it with a policy whose default rule requires attestations, plus narrow allowlist exemptions for trusted system images. Run in dry-run first and watch the audit logs for what would be blocked before you enforce.
Pod Security Admission enforces the upstream Pod Security Standards at the namespace level. Autopilot already blocks privileged and host-namespace pods, but PSA gives you an explicit, auditable baseline:
apiVersion: v1
kind: Namespace
metadata:
name: apps
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/warn: restricted
Start with warn/audit to surface violations, then move enforce to restricted once workloads comply.
Step 5: Ingress, Gateway API, and container-native load balancing
Autopilot uses container-native load balancing via Network Endpoint Groups (NEGs) — the load balancer targets pod IPs directly instead of hopping through a node port, which removes a hop and gives accurate health checks. The modern, recommended way to expose services is the Gateway API, which on GKE is backed by Google Cloud Load Balancing.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
name: external-gw
namespace: apps
spec:
gatewayClassName: gke-l7-global-external-managed
listeners:
- name: https
protocol: HTTPS
port: 443
tls:
mode: Terminate
certificateRefs:
- kind: Secret
name: api-tls
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: api-route
namespace: apps
spec:
parentRefs:
- name: external-gw
rules:
- matches:
- path:
type: PathPrefix
value: /api
backendRefs:
- name: api
port: 80
The gke-l7-global-external-managed GatewayClass provisions a global external Application Load Balancer; internal and regional classes exist for private and regional needs. NEG-based backends are created automatically for the referenced Service. If you are still on classic Ingress it works, but Gateway API is where header-based routing, traffic splitting, and cross-namespace delegation live.
Step 6: Observability with managed Prometheus and Cloud Logging
Autopilot enables Google Cloud Managed Service for Prometheus and Cloud Logging/Monitoring by default — you get system metrics and logs without running a collector. To scrape your own application metrics, declare a PodMonitoring resource:
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
name: api-metrics
namespace: apps
spec:
selector:
matchLabels:
app: api
endpoints:
- port: metrics
interval: 30s
Managed Prometheus is fully PromQL-compatible, so existing dashboards and recording rules port over, and you query through Cloud Monitoring or any Prometheus-API client. Note that managed metric samples and log ingestion are billed by volume — high-cardinality labels and verbose debug logging are a real cost line, not just noise.
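A rough volume estimate shows why cardinality matters more than it first appears. The sketch below uses two hypothetical helpers and illustrative inputs; it counts samples only and does not model pricing.

```shell
# Sketch: estimate samples ingested per day for one scrape target.
# Inputs are illustrative; pricing is intentionally not modeled here.
samples_per_day() {
  series="$1"             # active time series exposed by the target
  interval_seconds="$2"   # scrape interval, e.g. 30 for "30s"
  scrapes=$(( 86400 / interval_seconds ))
  echo $(( series * scrapes ))
}

# Series count for one metric is the product of its label cardinalities.
series_for_metric() {
  total=1
  for cardinality in "$@"; do
    total=$(( total * cardinality ))
  done
  echo "$total"
}
```

A metric with three labels of cardinality 5, 10, and 20 produces 1000 series; scraped every 30s, as in the PodMonitoring above, that is 2,880,000 samples per day from one metric on one target. Dropping a single 20-value label cuts that by a factor of 20.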
Step 7: Cost traps that bite teams
The bill surprises come from a handful of mechanics:
| Trap | Why it costs | Mitigation |
|---|---|---|
| Minimum request floors | Tiny pods are billed at the per-pod floor, not actual usage | Consolidate sidecars; right-size with VPA; avoid many micro-pods |
| Over-requested headroom | You pay the request even at 5% utilization | VPA-recommend, then trim; do not pad “just in case” |
| limits == requests | Omitting limits caps burst at the request | Set requests to real peak, not average, for spiky apps |
| Idle replicas | replicas: 3 at night still bills | HPA with a sane floor; scale-to-floor off-peak |
| High-cardinality metrics/logs | Managed Prometheus and Logging bill by volume | Drop noisy labels; route debug logs away from ingestion |
Balloon (low-priority placeholder) pods are a deliberate technique on Autopilot: schedule pods with a negative PriorityClass that reserve capacity and get preempted the instant a real workload needs the room. This keeps warm headroom so scale-ups do not wait on node provisioning. The catch is that reserved capacity is still billed while the balloon runs — use it to buy latency, and size it so the cost of warm headroom is less than the cost of a cold-start stall.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: balloon
value: -10 # negative: preempted before any real workload
globalDefault: false
Spot Autopilot is the biggest lever for fault-tolerant work: schedule eligible pods onto Spot capacity with the cloud.google.com/gke-spot selector for a substantial discount, accepting that Google can reclaim them on short notice. Keep request-path services on standard capacity and push batch, async, and stateless retry-safe jobs to Spot.
Enterprise scenario
A fintech platform team migrated forty microservices from a self-managed Standard cluster to Autopilot to shed node-patching toil. The lift-and-shift looked clean until the first month’s invoice came in roughly 30% higher than the old node-based bill. The cause was not the workloads — it was a shared Istio sidecar and a Datadog agent injected into every pod. Each service requested a modest 100m CPU, but Autopilot bumped every pod to the general-purpose floor (~250m CPU / 0.5 GiB), and they were running 600+ pods across staging and prod. They were paying the floor 600 times over for pods that idled at 8% utilization.
The fix was twofold. First, they ran VPA in updateMode: "Off" across the fleet for two weeks and discovered most services peaked well under their requests, so they consolidated the per-pod observability sidecar to a node-level agent on the workloads that genuinely needed it. Second, they moved async and batch consumers onto Spot, which absorbed the floor cost at a steep discount.
# Surface every pod Autopilot bumped above its declared request, fleet-wide
kubectl get pods -A -o json | jq -r '
.items[] | select(.metadata.annotations["autopilot.gke.io/resource-adjustment"]) |
"\(.metadata.namespace)/\(.metadata.name)"'
That one query, wired into a weekly review, turned an invisible 30% premium into a tracked, governed line item. The lesson: on Autopilot, pod count and the request floor — not node utilization — are the cost model, and a fleet of tiny pods is the most expensive shape you can run.
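For a weekly review, a per-namespace count is usually easier to track than a raw pod list. The sketch below assumes a hypothetical helper, count_adjusted_by_namespace, that reads the namespace/pod lines produced by the query above on standard input.

```shell
# Sketch: summarize "namespace/pod" lines (the output format of the query above) per namespace.
# count_adjusted_by_namespace is a hypothetical helper that reads stdin.
count_adjusted_by_namespace() {
  awk -F/ 'NF >= 2 { n[$1]++ } END { for (ns in n) print ns, n[ns] }' | sort
}
```

Piping the kubectl and jq query into count_adjusted_by_namespace gives one line per namespace with the number of adjusted pods, which is a convenient shape to chart over time or to attach to a team's cost report.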
Verify
Run these after provisioning and after any policy change.
# Cluster is Autopilot, private, on a release channel
gcloud container clusters describe prod-apps --region us-central1 \
--format="value(autopilot.enabled, privateClusterConfig.enablePrivateNodes, releaseChannel.channel)"
# Authorized networks are scoped, not 0.0.0.0/0
gcloud container clusters describe prod-apps --region us-central1 \
--format="value(masterAuthorizedNetworksConfig.cidrBlocks)"
# Pods landed across multiple nodes; the second command maps nodes to zones
kubectl get pods -n apps -o wide \
--sort-by='.spec.nodeName' \
-l app=api
kubectl get nodes -L topology.kubernetes.io/zone
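# Workload Identity resolves to the GSA from inside a pod
# (current kubectl run has no --serviceaccount flag; set the account through --overrides)
kubectl run wi-test -n apps --rm -it --restart=Never \
--image=google/cloud-sdk:slim \
--overrides='{"spec":{"serviceAccountName":"api"}}' \
-- gcloud auth list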
# Managed Prometheus is scraping your target
kubectl get podmonitoring -n apps
Confirm Binary Authorization is enforcing by attempting to deploy an unsigned image into a watched namespace — it should be rejected at admission, and the denial should appear in Cloud Audit Logs.
Production checklist
Pitfalls
- Porting a Standard config wholesale. Node pools, taints, custom DaemonSets that touch the host, and node_config blocks will be rejected. Strip node-level concerns entirely.
- Treating requests as suggestions. They are the bill and the cap (with limits == requests). Under-request and you throttle; over-request and you overpay. Both are silent.
- No PDBs. Autopilot recycles nodes for upgrades and consolidation; without budgets you will see availability dips you cannot explain.
- Wide-open authorized networks. A private cluster with 0.0.0.0/0 in the master-authorized list is not private. Scope it to known CIDRs.
- Unbounded observability spend. High-cardinality labels and debug-level logs turn managed Prometheus and Cloud Logging into a surprise line item; govern them like any other cost.
Autopilot trades node control for operational leverage, and for most teams that is the right trade. Master the request model, keep the scheduling and security controls deliberate, and the platform becomes genuinely low-toil — without the bill drifting out from under you.