Google Kubernetes Engine, In Depth: Autopilot vs Standard, Node Pools, Networking & Security

Google Kubernetes Engine (GKE) is Google Cloud’s managed Kubernetes — the same Kubernetes you would run yourself, but with Google operating the control plane, patching it, scaling it, and (in Autopilot) running the nodes too. Kubernetes itself came out of Google’s internal Borg system, and GKE is the most opinionated, most automated managed Kubernetes of the big three: it ships secure defaults, two genuinely different operating modes, an eBPF dataplane, and keyless identity for pods, all wired together out of the box.

This lesson is the exhaustive version. By the end you will know every meaningful knob you set when you create a cluster and a node pool, the load-bearing concepts an interviewer probes (Autopilot vs Standard, zonal vs regional, VPC-native, Workload Identity), what you can and cannot change after the fact, and the cost mechanics that quietly drive the bill. It is beginner-accessible — every term is defined — but complete enough to operate GKE in production and to answer the hard questions on the Associate Cloud Engineer and Professional Cloud Architect exams.

Learning objectives

Decide between Autopilot and Standard and justify the choice on operations, security, cost and flexibility.
Explain the control-plane topology — zonal vs regional clusters — and its availability and cost implications.
Configure a node pool end to end: machine type, image, autoscaling, node auto-provisioning, auto-upgrade, auto-repair, surge upgrades, Spot VMs, taints, labels and node-system config.
Design GKE networking: VPC-native (alias IP) clusters, the pod/service/node ranges, Dataplane V2 (eBPF) and network policy, private clusters, and Gateway vs Ingress vs Service load balancing.
Use Workload Identity Federation for GKE so pods call Google APIs with no service-account keys.
Harden a cluster (Shielded GKE nodes, Binary Authorization, Security Posture, node auto-upgrade) and pick a release channel.
Control GKE cost with the right mode, machine families, Spot, autoscaling, and cluster-fee awareness.

Prerequisites & where this fits

You should be comfortable with the GCP resource hierarchy and IAM, with a VPC and subnets (this lesson leans on alias IP ranges and Cloud NAT), and with the absolute basics of Kubernetes — what a Pod, Deployment and Service are, and that kubectl talks to an API server. If those are fuzzy, skim the VPC and IAM fundamentals first. In the Zero-to-Hero programme this is the Containers lesson of the Intermediate tier: it follows Cloud Load Balancing (GKE leans on the same load-balancer building blocks) and precedes BigQuery. The two advanced follow-ons — Autopilot production hardening and Workload Identity in depth — assume you have the fundamentals on this page.

Core concepts

A cluster is a control plane plus a set of worker nodes. The control plane runs the Kubernetes API server, scheduler, controller-manager and etcd. In GKE it is fully managed by Google: you never SSH into it, and Google patches and scales it. You talk to it with kubectl, which hits the cluster’s API endpoint.

A node is a Compute Engine VM that runs your pods via the kubelet and a container runtime (containerd). In Standard mode you group nodes into node pools — sets of identical nodes (same machine type, image, disk, settings) that scale and upgrade together. In Autopilot mode there are no visible node pools: Google provisions, sizes, patches and scales nodes for you, and you pay for the resources your pods request, not for nodes.

VPC-native is the modern networking model: pods get real IP addresses from a secondary (alias) IP range on the subnet, so pod traffic is first-class in your VPC and routable to on-prem and peered networks without extra routes. Workload Identity Federation for GKE lets a Kubernetes ServiceAccount act as (federate to) a Google service account, so pods authenticate to Google APIs using short-lived tokens and no downloaded keys. Release channels (Rapid, Regular, Stable, plus Extended) control how quickly your cluster auto-upgrades through Kubernetes versions. Keep these terms in mind — every section below is one of these boxes or the wiring between two of them.

Autopilot vs Standard: the whole decision

GKE has two modes of operation, chosen at cluster creation and (with rare exceptions) fixed for the cluster’s life — you cannot flip a Standard cluster to Autopilot in place; you create a new cluster and migrate workloads.

Autopilot is a hands-off, pod-centric mode. Google owns node provisioning, sizing, bin-packing, patching and repair. You submit pods with resource requests; Google finds or creates capacity. You are billed per vCPU/memory/ephemeral-storage requested by running pods (rounded and with minimums), plus the flat cluster management fee. Many security controls are on by default and cannot be turned off (Shielded nodes, Workload Identity, a hardened, locked-down node OS — no SSH, no privileged pods, no host namespaces, no arbitrary DaemonSets that touch the host).
Standard is the classic, node-centric mode. You define node pools, choose machine types and counts, and you are billed for node VMs (whatever you provision) plus the cluster fee. You own utilisation/bin-packing, you can SSH nodes, run privileged DaemonSets, attach GPUs with custom drivers, run Windows node pools, and tune the node OS. Maximum flexibility, maximum responsibility.

Dimension	Autopilot	Standard
Who runs nodes	Google provisions, sizes, scales, repairs	You size node pools and scale them (autoscaler optional)
Billing unit	Pod resource requests (vCPU/mem/ephemeral) + cluster fee	Node VM-hours (provisioned) + cluster fee
Bin-packing	Google packs and scales for you	Your responsibility
Node access	No SSH; privileged/host-namespace pods blocked	SSH, privileged pods, host DaemonSets allowed
Security defaults	Shielded nodes, Workload Identity, hardened OS — enforced	Opt-in (most are recommended defaults but changeable)
GPUs / TPUs	Supported via requests (managed)	Full control incl. custom drivers
Windows nodes	Not supported	Supported
DaemonSets	Restricted (no host access)	Full
SLA	Pod-level + control-plane SLA	Control-plane (+ node) SLA
Best for	Teams wanting platform discipline without node ops	Teams needing host access, special hardware, extreme packing

The mental shift: on Standard you optimise node utilisation; on Autopilot you optimise request accuracy, because over-requesting is the cost. A useful middle ground exists on Standard — compute classes and node auto-provisioning bring much of Autopilot’s automation while keeping node access — but the billing model and the node-access contract are the things that actually differ. Default to Autopilot unless you have a concrete reason (host-level DaemonSets, custom GPU drivers, Windows, or bin-packing you must control) to take node ownership.

The control plane: zonal vs regional

When you create a cluster you choose its location type, which determines where the control plane (and, for Standard, the default nodes) live. This is one of the highest-leverage availability decisions, and it is immutable — you cannot convert zonal to regional after creation.

Location type	Control plane	Default node placement	Control-plane HA	Use when
Zonal (single-zone)	One replica in one zone	Nodes in that one zone	None — control plane down during that zone’s outage/upgrade	Dev/test, cost-sensitive, non-critical
Zonal (multi-zonal)	One replica in one zone	Nodes spread across several zones	None (plane still single-zone)	You want node spread but accept a single-zone plane
Regional	3 replicas across 3 zones in the region	Nodes replicated across zones (×3 by default on Standard)	Yes — survives a zone failure, zero-downtime control-plane upgrades	Production

Two consequences people miss. First, a regional control plane gives you a highly available API endpoint and no control-plane downtime during upgrades — worth it for anything production. Second, on Standard regional clusters, a node pool’s node count is per-zone: ask for 1 node in a 3-zone region and you get 3 nodes. Autopilot clusters are regional by design (the node concept is hidden, and Google spreads pods across zones). The cluster management fee is the same flat hourly fee for zonal and regional (and for Autopilot and Standard) — what differs is the node/pod compute you pay underneath.

Creating a cluster: every setting that matters

Whether you use the Console wizard or gcloud, these are the fields you set. The Console groups them under Cluster basics, Fleet, Networking, Security, Metadata and Features; the flags below mirror them.

Setting	What it is / choices	Default	When / trade-off / gotcha
Mode	Autopilot vs Standard	Console offers Autopilot first	Immutable; drives the whole billing and ops model
Name / location	Cluster name; zone (zonal) or region (regional)	—	Location type is immutable; pick region for prod
Release channel	Rapid / Regular / Stable / Extended (or static, Standard only)	Regular	Channel = auto-upgrade cadence; static versions on Standard reach end-of-life and force upgrades
Control-plane version	Specific minor/patch within the channel	Channel default	You can set a target, but the channel keeps it current
Network / subnet	Which VPC and subnet the cluster uses	`default`	Choose a real subnet with sized secondary ranges
Network policy / dataplane	Legacy (Calico) vs Dataplane V2 (eBPF)	Dataplane V2 on Autopilot; opt-in on Standard (`--enable-dataplane-v2`)	DPv2 also gives network-policy logging and better observability
Cluster IP allocation	VPC-native (alias IP) vs routes-based (legacy)	VPC-native	Routes-based is legacy; always pick VPC-native
Pod range / Service range	Secondary ranges (or auto-created) sizing pod and ClusterIP space	Auto	These cap pods-per-cluster and services; immutable — size up front (see networking)
Private cluster	Nodes get internal IPs only; control-plane access via private endpoint	Off (public nodes)	Recommended for prod; pair with Cloud NAT for egress
Control-plane authorized networks	CIDR allow-list for the public API endpoint	Off	Strongly recommended even on public clusters
Workload Identity	Federate K8s SAs to Google SAs	Always on in Autopilot; opt-in on Standard	Enable on Standard at create with `--workload-pool`; the keyless standard
Shielded GKE nodes	Secure/measured boot + integrity monitoring for nodes	On (enforced on Autopilot)	Keep on; cheap integrity guarantee
Binary Authorization	Admission policy: only signed/attested images run	Off	Turn on for supply-chain control in prod
Security posture / vuln scanning	Built-in misconfig + workload vuln dashboard	Basic on	Standard tier adds deeper scanning
Cluster autoscaler / NAP	Per-pool autoscale; node auto-provisioning creates pools on demand	Off (Standard)	Autopilot does this implicitly; enable on Standard for elasticity
Maintenance window / exclusions	When auto-upgrades may run; blackout windows	Any time	Set a window + freeze around peak events
Fleet registration	Join the cluster to a fleet (multi-cluster mgmt)	Optional	Needed for Config Management, multi-cluster services, Gateway

A minimal Autopilot cluster:

gcloud container clusters create-auto demo-auto \
  --region=us-central1 \
  --release-channel=regular \
  --enable-private-nodes

A minimal production-shaped Standard regional cluster (VPC-native, private nodes, Dataplane V2, Workload Identity, Shielded nodes):

gcloud container clusters create demo-std \
  --region=us-central1 \
  --release-channel=regular \
  --enable-ip-alias \
  --enable-dataplane-v2 \
  --enable-private-nodes \
  --master-ipv4-cidr=172.16.0.0/28 \
  --workload-pool=$(gcloud config get-value project).svc.id.goog \
  --shielded-secure-boot --shielded-integrity-monitoring \
  --num-nodes=1 \
  --machine-type=e2-standard-4

(--num-nodes=1 on a 3-zone region creates 3 nodes — one per zone.)
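
The per-zone multiplication is easy to misjudge when budgeting, so here is a small shell sketch of the arithmetic. The helper name total_nodes is illustrative and not part of gcloud. On regional clusters the autoscaler's --min-nodes and --max-nodes are counted per zone in the same way.

```shell
# total_nodes PER_ZONE ZONES: nodes a regional Standard pool actually creates.
# Illustrative helper, not a gcloud feature.
total_nodes() {
  echo $(( $1 * $2 ))
}

total_nodes 1 3   # --num-nodes=1 in a 3-zone region: prints 3
total_nodes 6 3   # --max-nodes=6 per zone, autoscaler ceiling: prints 18
```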

Node pools, in depth (Standard)

A node pool is a group of identical nodes managed as a unit. A cluster always has at least one (the default pool created with the cluster); you add more to mix machine types, attach GPUs, use Spot, or isolate workloads with taints. Autopilot has no user-visible node pools — skip this section if you run Autopilot. Every setting below is per-pool.

Node-pool setting	What it is / choices	Default	When / trade-off / gotcha
Machine type / family	E2, N2, N2D, C3, C3D, T2D (x86 general-purpose and compute-oriented), T2A (Arm), M-series (memory-optimized), plus accelerator-optimized types for GPU/TPU; custom types allowed	`e2-medium`	Match CPU:RAM to the workload; reserve some capacity for system pods
Image type	Container-Optimized OS (cos_containerd, default & recommended), Ubuntu (ubuntu_containerd), Windows	COS	COS is minimal/auto-updating; Ubuntu only if you need its packages/kernel modules
Boot disk	pd-standard / pd-balanced / pd-ssd; size; CMEK	pd-balanced, 100 GB	SSD for I/O-heavy nodes; size for image cache + ephemeral storage
Node count	Fixed size of the pool (per-zone on regional)	3	With autoscaler this is the starting size
Autoscaling (min/max)	Cluster autoscaler grows/shrinks the pool by pending-pod pressure	Off	Set sane max; scale-down respects PodDisruptionBudgets and `safe-to-evict`
Location policy	BALANCED vs ANY (where the autoscaler adds nodes)	BALANCED	ANY helps land scarce capacity (e.g. Spot/GPU)
Auto-upgrade	Nodes auto-upgraded toward the control-plane version	On (required on channels)	Keeps nodes patched; control the timing with maintenance windows
Auto-repair	Unhealthy nodes auto-recreated	On	Repairs failed/NotReady nodes; expect occasional node churn
Surge upgrade	`max-surge` (extra temp nodes) + `max-unavailable` during upgrades	surge 1 / unavailable 0	Higher surge = faster, more cost during upgrade; or use blue-green node-pool upgrades
Spot VMs	Preemptible, deeply discounted, can be reclaimed any time (no 24h cap)	Off	Up to ~60–91% cheaper; only for fault-tolerant/stateless work; combine with on-demand pool
Taints	`key=value:NoSchedule|PreferNoSchedule|NoExecute` to repel pods lacking the matching toleration	None	Use to reserve a pool (Spot, GPU) for workloads that tolerate it
Labels	Node labels for `nodeSelector`/affinity	GKE adds some	Use for scheduling and cost attribution
Node metadata / SA	The node’s Compute Engine service account & scopes; metadata concealment	default CE SA	Give nodes a least-privilege SA; pods get Google access via Workload Identity, not node scopes
`node-system-config`	sysctls and kubelet settings supplied from a file with `--system-config-from-file` (e.g. `cpuManagerPolicy`)	Platform defaults	Separate from `--max-pods-per-node`, a node-pool flag that is set at pool creation and constrains the alias range
Confidential nodes	Memory encrypted in use (AMD SEV)	Off	For sensitive workloads; small overhead

Add an autoscaling on-demand pool and a tainted Spot pool:

# Autoscaling on-demand pool
gcloud container node-pools create web-pool \
  --cluster=demo-std --region=us-central1 \
  --machine-type=e2-standard-4 \
  --enable-autoscaling --min-nodes=1 --max-nodes=6 \
  --enable-autoupgrade --enable-autorepair \
  --max-surge-upgrade=2 --max-unavailable-upgrade=0

# Spot pool, tainted so only tolerant pods land here
gcloud container node-pools create spot-pool \
  --cluster=demo-std --region=us-central1 \
  --machine-type=e2-standard-4 --spot \
  --enable-autoscaling --min-nodes=0 --max-nodes=10 \
  --node-taints=cloud.google.com/gke-spot=true:NoSchedule
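
A taint only repels pods; a workload has to opt in with a matching toleration. The following is a minimal sketch of a Deployment that tolerates the taint set on spot-pool above and selects Spot nodes through the cloud.google.com/gke-spot label that GKE puts on them. The Deployment name batch-worker, the file name and the reuse of the hello-app sample image are illustrative assumptions.

```shell
# Write a hypothetical Deployment manifest that is allowed onto the Spot pool.
cat > spot-worker.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # land only on Spot nodes
      tolerations:
      - key: cloud.google.com/gke-spot      # matches the pool's taint
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: worker
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
EOF
```

Apply it with kubectl apply -f spot-worker.yaml. Pods without the toleration stay off the Spot pool, and these pods stay off on-demand nodes because of the nodeSelector.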

Cluster autoscaler vs node auto-provisioning

The cluster autoscaler (CA) scales node counts within pools you already defined, up to each pool’s max, based on pending (unschedulable) pods, and scales down underused nodes when their pods can move elsewhere. Node auto-provisioning (NAP) goes further: it creates and deletes whole node pools automatically with machine shapes that fit pending pods — closer to the Autopilot experience while keeping node access. Enable NAP with resource limits:

gcloud container clusters update demo-std --region=us-central1 \
  --enable-autoprovisioning --min-cpu=1 --max-cpu=64 --min-memory=1 --max-memory=256

Networking: VPC-native, ranges, Dataplane V2, private clusters

VPC-native and the three IP ranges

A VPC-native cluster draws from three address pools that you must size up front, because the pod and service ranges are effectively immutable for the cluster’s life:

Node range — the subnet’s primary range; one IP per node.
Pod range — a secondary (alias) IP range on the subnet; pods get real, routable VPC IPs from it. The pod range size and the per-node max-pods together cap how many pods/nodes you can run. GKE reserves an alias block per node sized at roughly 2× max-pods-per-node (default max-pods 110 → /24 per node).
Service range — a second secondary range backing ClusterIP Services (virtual IPs, never assigned to a node/pod).

Size the pod range with headroom: a /16 pod range with the default 110 max-pods-per-node supports a few hundred nodes; shrinking max-pods-per-node packs more nodes into the same range. Because you cannot grow the pod or service range later, over-provision deliberately. Pods getting real VPC IPs is what makes GKE traffic routable to on-prem, peered VPCs and Private Google Access without per-pod routes — the big advantage over the legacy routes-based model.
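
To make the sizing rule concrete, here is a rough shell sketch of the arithmetic. It assumes each node receives the smallest power-of-two block that holds at least twice its max-pods value, as described above, and it ignores other limits such as the node subnet size. The helper name nodes_supported is illustrative.

```shell
# nodes_supported POD_RANGE_PREFIX MAX_PODS_PER_NODE
# Illustrative helper (not a gcloud feature). Assumes each node gets the
# smallest power-of-two block holding at least 2x max-pods addresses.
nodes_supported() {
  needed=$(( $2 * 2 ))
  bits=0
  while [ $(( 1 << bits )) -lt "$needed" ]; do
    bits=$(( bits + 1 ))
  done
  node_prefix=$(( 32 - bits ))          # e.g. 110 pods -> 256 IPs -> /24
  echo $(( 1 << (node_prefix - $1) ))   # node-sized blocks in the pod range
}

nodes_supported 16 110   # /16 range, default max-pods: prints 256
nodes_supported 16 32    # /16 range, max-pods 32 (/26 per node): prints 1024
```

Dropping max-pods-per-node from 110 to 32 quadruples the node ceiling of the same /16 in this model, which is the trade-off to weigh before the range is fixed.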

Dataplane V2 (eBPF)

Dataplane V2 replaces the old kube-proxy/iptables and Calico stack with an eBPF dataplane built on Cilium. It is always on in Autopilot and opt-in at creation on Standard (--enable-dataplane-v2), and brings: scalable Service/load-balancing handling, built-in NetworkPolicy enforcement (no separate Calico add-on), network policy logging, and better visibility. NetworkPolicy is how you implement pod-to-pod firewalling (default-deny, then allow). Create a Standard cluster with Dataplane V2 enabled:

gcloud container clusters create demo-dpv2 --region=us-central1 \
  --enable-dataplane-v2 --enable-ip-alias
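
Dataplane V2 enforces standard Kubernetes NetworkPolicy objects. As a minimal sketch of the default-deny-then-allow pattern, the manifests below deny all ingress in one namespace and then open a single flow. The namespace my-ns, the app labels frontend and api, and port 8080 are hypothetical placeholders.

```shell
# Write two hypothetical NetworkPolicies: deny all ingress, then allow one flow.
cat > netpol.yaml <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-ns
spec:
  podSelector: {}          # empty selector = every pod in the namespace
  policyTypes:
  - Ingress                # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: my-ns
spec:
  podSelector:
    matchLabels:
      app: api             # policy applies to the api pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend    # only frontend pods in the same namespace
    ports:
    - protocol: TCP
      port: 8080
EOF
```

Apply it with kubectl apply -f netpol.yaml. Policies are additive, so each further allowed flow is one more small policy on top of the default deny.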

Private clusters

In a private cluster, nodes have internal IPs only (no public IPs), shrinking the attack surface. You then control how you reach the control plane:

Public endpoint with authorized networks — API reachable from allow-listed CIDRs.
Private endpoint — API reachable only from within the VPC/peered networks (most locked-down). Set --master-ipv4-cidr for the control-plane subnet and --enable-private-endpoint to disable the public one.

Because private nodes have no public IP, outbound internet (pulling images from non-Artifact-Registry registries, calling third-party APIs) needs Cloud NAT, and access to Google APIs/Artifact Registry needs Private Google Access (on by default for GKE subnets). This is the standard production shape: private nodes + authorized networks (or private endpoint) + Cloud NAT.
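
A sketch of the Cloud NAT piece, assuming the cluster lives in the default network in us-central1. The router and NAT names (nat-router, nat-config) are hypothetical. The commands are wrapped in a function so nothing is created until you call it.

```shell
# Hypothetical names; set NETWORK to the VPC your cluster uses.
REGION=us-central1
NETWORK=default

create_cloud_nat() {
  # A Cloud Router hosts the NAT configuration for the region.
  gcloud compute routers create nat-router \
    --network="$NETWORK" --region="$REGION"
  # NAT every subnet range in the region, with auto-allocated external IPs.
  gcloud compute routers nats create nat-config \
    --router=nat-router --region="$REGION" \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
}
# Nothing runs until you call: create_cloud_nat
```

Cloud NAT is regional, so a cluster in another region needs its own router and NAT configuration.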

Exposing workloads: Service, Ingress, Gateway

Mechanism	Layer	What it creates	Use when
Service type LoadBalancer	L4	A passthrough Network Load Balancer (external by default, internal via annotation) to the pods	Simple TCP/UDP exposure of one Service
Ingress (GKE Ingress)	L7	A Google Application Load Balancer via HTTP(S)	HTTP routing, managed certs, Cloud CDN/Armor; the long-standing default
Gateway API (GKE Gateway)	L7/L4	LBs driven by the standard Gateway/HTTPRoute CRDs	Modern, role-oriented, multi-cluster traffic, finer control
Container-native LB (NEGs)	—	LB sends traffic directly to pod IPs (skips node hop)	Default with VPC-native; better latency, accurate health checks

GKE Ingress and Gateway both lean on the same Cloud Load Balancing building blocks (forwarding rule → target proxy → URL map → backend service → NEG of pod IPs). Container-native load balancing via Network Endpoint Groups (NEGs) — automatic in VPC-native clusters — routes the LB straight to pod IPs, which is why VPC-native matters for ingress too. Gateway is the forward-looking choice for new platforms; Ingress remains fully supported.
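
For comparison with Ingress, here is a minimal Gateway API sketch. It assumes the Gateway API is enabled on the cluster (for example with --gateway-api=standard on Standard), that a Service named hello listens on port 80 in the same namespace, and that the gke-l7-global-external-managed GatewayClass is available. The resource names are hypothetical.

```shell
# Write a hypothetical Gateway plus an HTTPRoute that sends all traffic to "hello".
cat > gateway.yaml <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-http            # hypothetical name
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: hello-route              # hypothetical name
spec:
  parentRefs:
  - name: external-http          # attach the route to the Gateway above
  rules:
  - backendRefs:
    - name: hello                # assumed existing Service
      port: 80
EOF
```

Apply it with kubectl apply -f gateway.yaml. The split mirrors the role-oriented model: a platform team can own the Gateway while application teams own their HTTPRoutes.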

Workload Identity Federation for GKE

The single most important security feature: Workload Identity Federation for GKE lets a Kubernetes ServiceAccount impersonate a Google service account, so pods call Google APIs (Cloud Storage, Pub/Sub, BigQuery…) with short-lived, automatically-rotated credentials and no downloaded JSON keys. It replaces the bad old pattern of mounting a service-account key into a pod (a long-lived secret that leaks and never rotates) and the blunt pattern of granting the node’s service account broad scopes (which gives every pod on the node the same access).

It is on and enforced in Autopilot; on Standard you enable it on the cluster (--workload-pool=PROJECT.svc.id.goog) and on each node pool. The wiring is a three-step bind:

PROJECT=$(gcloud config get-value project)

# 1. Allow the K8s SA (namespace/ksa) to impersonate the Google SA
gcloud iam service-accounts add-iam-policy-binding \
  app-gsa@$PROJECT.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:$PROJECT.svc.id.goog[my-ns/app-ksa]"

# 2. Annotate the Kubernetes ServiceAccount to point at the Google SA
kubectl annotate serviceaccount app-ksa -n my-ns \
  iam.gke.io/gcp-service-account=app-gsa@$PROJECT.iam.gserviceaccount.com

# 3. Grant the Google SA only the roles the workload needs (least privilege)
gcloud projects add-iam-policy-binding $PROJECT \
  --member="serviceAccount:app-gsa@$PROJECT.iam.gserviceaccount.com" \
  --role=roles/storage.objectViewer

Pods running under app-ksa then authenticate to Google APIs as app-gsa automatically — no keys anywhere. (The depth lesson on Workload Identity walks through troubleshooting, the metadata server, and migration; this is the fundamentals.)
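
A pod runs under app-ksa only if its spec names that ServiceAccount. A minimal sketch of such a pod follows, reusing the my-ns and app-ksa names from the binding; the pod name wi-check and the choice of the google/cloud-sdk:slim image are illustrative assumptions.

```shell
# Write a hypothetical Pod manifest that runs under the annotated KSA.
cat > wi-check.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: wi-check               # hypothetical name
  namespace: my-ns
spec:
  serviceAccountName: app-ksa  # the KSA bound to app-gsa
  containers:
  - name: sdk
    image: google/cloud-sdk:slim
    command: ["sleep", "3600"] # keep the pod alive for inspection
EOF
```

After kubectl apply -f wi-check.yaml, running kubectl exec -n my-ns wi-check -- gcloud auth list should report app-gsa as the active account if the binding and annotation are correct.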

Security: the controls worth enabling

GKE ships more secure defaults than self-managed Kubernetes, but production warrants going further. The major controls:

Shielded GKE nodes (secure boot, vTPM and integrity monitoring) make a node’s boot integrity verifiable. Always on in Autopilot; on Standard, Shielded nodes are on by default but secure boot must be enabled explicitly (--shielded-secure-boot).
Workload Identity — keyless pod-to-Google-API auth (above). The default for service access.
Private cluster + authorized networks — internal-only nodes and a locked-down API endpoint (above).
Binary Authorization — an admission controller that only admits container images that are signed/attested by your build pipeline, blocking unvetted images (supply-chain control).
GKE Security Posture dashboard — built-in detection of workload misconfigurations and container vulnerabilities, with a standard tier for deeper scanning.
Node auto-upgrade — keeps nodes patched against CVEs automatically (required on release channels); pair with maintenance windows so it happens on your schedule.
NetworkPolicy (Dataplane V2) — default-deny pod-to-pod traffic and allow only what’s needed.
Least-privilege node service account — give nodes a custom SA with minimal roles (not the default Compute SA, and never the broad cloud-platform scope) and rely on Workload Identity for pod-level access.
Confidential GKE nodes — encrypt node memory in use for sensitive data.

Release channels and upgrades

A release channel subscribes the cluster to an upgrade cadence; Google then auto-upgrades the control plane (and, with auto-upgrade, the nodes) along that channel, having validated each version.

Channel	Cadence / maturity	Use when
Rapid	Newest versions soonest (incl. latest minor)	Test/staging, early access to features
Regular	Balanced — proven a few weeks after Rapid	Default for most production
Stable	Most conservative, longest soak	Risk-averse production
Extended	Longer support window for a version (extra cost)	Workloads that need to pin a version longer
Static (no channel)	Standard only; you pin a version manually	Avoid — versions reach end-of-life and force upgrades

Control upgrade timing with maintenance windows (when upgrades may run) and maintenance exclusions (blackout periods — e.g. freeze during a launch or peak shopping week). For nodes, choose the upgrade strategy per pool: surge upgrades (extra temporary nodes drain-and-replace gradually) or blue-green (stand up a parallel set, shift, then tear down the old) for safer, reversible rollouts. Regional control planes upgrade with zero API downtime; zonal planes are briefly unavailable during the control-plane upgrade.
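
As a sketch of how a window and an exclusion might be set with gcloud, the function below restricts upgrades to weekend nights and freezes them over one week. The cluster name reuses demo-std from earlier; the dates and the exclusion name peak-week are hypothetical. The window is eight hours long because GKE expects a minimum amount of maintenance time per month, and very short weekly windows can be rejected.

```shell
CLUSTER=demo-std
REGION=us-central1

set_maintenance_policy() {
  # Recurring window: Saturdays and Sundays, 00:00-08:00 UTC.
  gcloud container clusters update "$CLUSTER" --region="$REGION" \
    --maintenance-window-start=2025-01-04T00:00:00Z \
    --maintenance-window-end=2025-01-04T08:00:00Z \
    --maintenance-window-recurrence='FREQ=WEEKLY;BYDAY=SA,SU'
  # Exclusion: no upgrades during a hypothetical peak week.
  gcloud container clusters update "$CLUSTER" --region="$REGION" \
    --add-maintenance-exclusion-name=peak-week \
    --add-maintenance-exclusion-start=2025-11-24T00:00:00Z \
    --add-maintenance-exclusion-end=2025-12-01T00:00:00Z
}
# Nothing runs until you call: set_maintenance_policy
```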

Architecture at a glance

The diagram below contrasts the two operating modes and shows the shared building blocks — the Google-managed control plane on top, then Autopilot (Google-run nodes, pay-per-pod-request) beside Standard (your node pools on Compute Engine VMs, pay-per-node), with VPC-native pod/service ranges, Dataplane V2, a private-cluster boundary, and Workload Identity linking a pod’s Kubernetes ServiceAccount to a Google service account.

[Figure: Google Kubernetes Engine: Autopilot vs Standard]

Keep this picture in mind: almost every setting on this page configures one of these boxes — the control plane, a node pool, an IP range, the dataplane, the cluster boundary, or the identity link — or the wiring between two of them.

Hands-on lab

Create an Autopilot cluster (lowest-friction, pay-per-pod), deploy and expose an app, demonstrate Workload Identity, then clean up. Run this in Cloud Shell, where gcloud and kubectl are pre-installed and you are already authenticated. Autopilot bills per pod request, so a tiny deployment running for an hour costs a rupee or two. The cluster management fee is usually covered as well: the GKE free tier offsets one cluster’s management fee per billing account each month, and new accounts get the $300 free-trial credit. We delete everything at the end.

Step 1 — Set the project and a region.

gcloud config set project "$(gcloud config get-value project)"
REGION=us-central1

Expected: Updated property [core/project].

Step 2 — Create an Autopilot cluster (a few minutes).

gcloud container clusters create-auto demo-auto \
  --region=$REGION --release-channel=regular

Expected: progress lines ending in Created and a RUNNING status with an endpoint IP.

Step 3 — Get credentials so kubectl targets the cluster.

gcloud container clusters get-credentials demo-auto --region=$REGION
kubectl get nodes

Expected: kubeconfig entry generated, then one or more nodes Ready (Autopilot provisions them as workloads land).

Step 4 — Deploy a sample app and expose it with an external L4 Service.

kubectl create deployment hello --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
kubectl set resources deployment hello --requests=cpu=250m,memory=256Mi
kubectl expose deployment hello --type=LoadBalancer --port=80 --target-port=8080
kubectl get service hello -w

Expected: the Service shows <pending> then an EXTERNAL-IP. Press Ctrl-C, then curl http://EXTERNAL_IP returns Hello, world! with the version and hostname.

Step 5 — Prove Workload Identity (pod authenticates as a Google SA).

# Create a Google SA and allow the default KSA in 'default' to impersonate it
PROJECT=$(gcloud config get-value project)
gcloud iam service-accounts create wi-demo
gcloud iam service-accounts add-iam-policy-binding \
  wi-demo@$PROJECT.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:$PROJECT.svc.id.goog[default/default]"
kubectl annotate serviceaccount default \
  iam.gke.io/gcp-service-account=wi-demo@$PROJECT.iam.gserviceaccount.com

# Run a Cloud SDK pod and check which identity it is
kubectl run wi-test --rm -it --restart=Never \
  --image=google/cloud-sdk:slim -- \
  gcloud auth list

Expected: the active account is wi-demo@$PROJECT.iam.gserviceaccount.com — the pod authenticated with no key file.

Validation. kubectl get deploy,svc shows hello available with an external IP; kubectl get nodes shows Autopilot-managed nodes; the gcloud auth list output inside the pod shows the federated Google SA.

Cleanup.

kubectl delete service hello
kubectl delete deployment hello
gcloud iam service-accounts delete wi-demo@$PROJECT.iam.gserviceaccount.com --quiet
gcloud container clusters delete demo-auto --region=$REGION --quiet

Cost note. Deleting the cluster stops the management fee and all pod billing. The external load balancer and its forwarding rule bill while they exist, so delete the Service before (or with) the cluster. With Autopilot you were charged only for the pod’s small CPU/memory request for the time it ran — typically a rupee or two for this lab.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
Pods stuck `Pending`, “Insufficient cpu/memory”	Standard pool too small / autoscaler off or at max	Enable/raise pool autoscaling or NAP; on Autopilot check requests vs quotas
Cannot grow pod IP space / nodes capped	Pod (alias) range or max-pods-per-node sized too small — and immutable	Plan ranges with headroom at creation; rebuild with a larger range
Private-cluster nodes can’t pull public images	No Cloud NAT / image not in Artifact Registry	Add Cloud NAT for egress; mirror images to Artifact Registry (Private Google Access covers Google)
Pod gets 403 calling a Google API	Workload Identity binding/annotation missing or wrong SA roles	Re-check the `workloadIdentityUser` binding, the KSA annotation, and the Google SA’s roles
`kubectl` times out connecting to API	Your IP is not in authorized networks, or the cluster has only a private endpoint and you are outside the VPC	Add your CIDR to control-plane authorized networks, or use the private endpoint from inside the VPC
Cluster auto-upgraded during peak traffic	No/loose maintenance window	Set a maintenance window and an exclusion around peak periods
Spot-pool pods evicted constantly	Spot reclamation under capacity pressure	Tolerate disruption (PDBs, replicas) and add an on-demand pool as fallback; set min-nodes appropriately
Tried to switch Standard → Autopilot in place	Mode is fixed at creation	Create a new Autopilot cluster and migrate workloads

Best practices

Default to Autopilot; pick Standard only for host-level DaemonSets, custom GPU drivers, Windows, or bin-packing you must own.
Regional clusters for production — HA control plane and zero-downtime upgrades.
Always use VPC-native and size the pod/service ranges with headroom (they’re immutable).
Enable Dataplane V2 and write default-deny NetworkPolicies, opening only required flows.
Make clusters private with authorized networks (or a private endpoint) and add Cloud NAT for egress.
Use Workload Identity everywhere; never mount service-account keys or rely on broad node scopes.
Subscribe to a release channel (Regular for most), set a maintenance window + exclusions, and keep auto-upgrade/auto-repair on.
On Standard, turn on cluster autoscaler (and consider NAP) with sane maxima; right-size machine types and set resource requests on every workload.
Give nodes a least-privilege service account; turn on Shielded nodes, and for high-assurance pipelines Binary Authorization.

Security notes

GKE’s defaults do a lot, but the responsibility line is real: Google secures and patches the control plane; you secure your workloads, IAM, network policy and image supply chain. The non-negotiables: Workload Identity (no keys), private nodes + restricted API access, node auto-upgrade (CVE patching), and Shielded nodes. Layer on NetworkPolicy for east-west segmentation, Binary Authorization to admit only trusted images, and the Security Posture dashboard to catch misconfigurations and vulnerable workloads. Audit access with Cloud Audit Logs, and keep the node service account and pod-level Google access least-privilege and separate — node scopes are for the kubelet, Workload Identity is for your pods.

Cost & sizing

The levers that move a GKE bill:

Mode. Autopilot bills pod requests (so accuracy of requests is everything); Standard bills node VMs (so utilisation/bin-packing is everything). Pick the model whose discipline your team can sustain.
Cluster management fee. A flat hourly fee per cluster (same for zonal/regional, Autopilot/Standard). The GKE free tier offsets the fee for one zonal or Autopilot cluster per billing account per month, so consolidate where reasonable rather than sprawling clusters.
Machine family (Standard). E2 is the cheap general-purpose default; T2D/N2D (AMD) often win on price/performance; C3 for compute-heavy; right-size to the workload’s CPU:RAM.
Spot VMs. Up to ~60–91% off for fault-tolerant, stateless or batch work — taint a Spot pool and tolerate disruption; keep an on-demand pool for the things that must not be evicted.
Autoscaling. Cluster autoscaler/NAP (Standard) and Autopilot’s implicit scaling cut idle spend; scale-to-zero pools (min-nodes=0) for spiky/batch workloads.
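Committed-use and sustained-use discounts. CUDs on Standard node VMs cut steady-state compute materially in exchange for a one- or three-year commitment; sustained-use discounts apply automatically on eligible machine families (E2 is not one of them).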
Networking. External load balancers, Cloud NAT and cross-zone/egress traffic all bill — delete LBs you don’t need and prefer container-native LB (NEGs) to avoid extra hops.
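To make the first lever concrete, here is a minimal sketch that compares the two billing models for one workload. Every price in it is a made-up placeholder, not a real GKE rate, and the bin-packing is deliberately naive (it ignores system overhead and allocatable capacity), so treat the output as a way to reason about the trade-off rather than as a quote.

```python
import math
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for a month of uptime


@dataclass
class Workload:
    replicas: int
    cpu_request: float      # vCPU requested per pod
    mem_request_gib: float  # GiB requested per pod


def autopilot_monthly(w, vcpu_hour, gib_hour):
    """Autopilot-style bill: you pay for what the pods request."""
    cpu = w.replicas * w.cpu_request * vcpu_hour
    mem = w.replicas * w.mem_request_gib * gib_hour
    return (cpu + mem) * HOURS_PER_MONTH


def nodes_needed(w, node_vcpu, node_gib):
    """Naive bin-packing: the tighter of CPU or memory decides the node count."""
    by_cpu = math.ceil(w.replicas * w.cpu_request / node_vcpu)
    by_mem = math.ceil(w.replicas * w.mem_request_gib / node_gib)
    return max(by_cpu, by_mem)


def standard_monthly(w, node_vcpu, node_gib, node_hour, spot_discount=0.0):
    """Standard-style bill: you pay for whole node VMs, used or not."""
    nodes = nodes_needed(w, node_vcpu, node_gib)
    return nodes * node_hour * (1 - spot_discount) * HOURS_PER_MONTH


if __name__ == "__main__":
    api = Workload(replicas=10, cpu_request=0.5, mem_request_gib=2.0)
    # All prices below are hypothetical placeholders (USD per hour).
    print("autopilot:", autopilot_monthly(api, vcpu_hour=0.05, gib_hour=0.005))
    print("standard: ", standard_monthly(api, 4, 16, node_hour=0.15))
    print("spot:     ", standard_monthly(api, 4, 16, node_hour=0.15, spot_discount=0.6))
```

The shape of the result matters more than the numbers: inflating requests raises the Autopilot figure directly, while on Standard the bill only moves when the node count changes, which is why bin-packing and utilisation dominate there.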

Interview & exam questions

What is the core difference between GKE Autopilot and Standard? Autopilot is pod-centric — Google runs and scales the nodes and you pay for pod resource requests; Standard is node-centric — you manage node pools and pay for node VMs. Autopilot enforces secure defaults and removes node access; Standard gives full node control and responsibility. Mode is fixed at creation.
Zonal vs regional cluster — what changes? A regional cluster runs three control-plane replicas across three zones, surviving a zone failure and upgrading with zero API downtime; a zonal cluster has a single-zone control plane that is unavailable during its zone’s outage or control-plane upgrade. On Standard regional, node counts are per-zone (1 → 3 nodes across 3 zones).
What does “VPC-native” mean and why does it matter? Pods get real IPs from a secondary (alias) range on the subnet, making pod traffic routable across the VPC, peered networks and on-prem with no extra routes, and enabling container-native load balancing (NEGs). The legacy alternative is routes-based; always choose VPC-native. Pod and service ranges are immutable, so size them with headroom.
Why is mounting a service-account key into a pod bad, and what replaces it? A mounted key is a long-lived secret that doesn’t rotate and leaks easily; it also can’t be scoped per pod easily. Workload Identity Federation for GKE replaces it: a Kubernetes SA federates to a Google SA and pods get short-lived, auto-rotated tokens with no keys.
Cluster autoscaler vs node auto-provisioning? The cluster autoscaler scales node counts within existing pools based on pending pods. Node auto-provisioning additionally creates and removes whole pools with shapes that fit pending pods — closer to Autopilot while keeping node access.
What is Dataplane V2 and what do you get from it? An eBPF/Cilium dataplane (on by default in Autopilot, enabled on Standard with --enable-dataplane-v2) replacing kube-proxy/iptables and Calico. It brings scalable service handling, built-in NetworkPolicy enforcement and network-policy logging, and better observability.
How do you upgrade nodes safely with minimal disruption? Use surge upgrades (max-surge/max-unavailable) to add temporary nodes and drain-and-replace gradually, or blue-green node-pool upgrades for a parallel, reversible rollout. Pair with PodDisruptionBudgets, maintenance windows and exclusions. Regional control-plane upgrades have no API downtime.
What does a private cluster need to reach the internet and Google APIs? Private nodes have no public IP, so outbound internet needs Cloud NAT; access to Google APIs/Artifact Registry uses Private Google Access (on for GKE subnets). Restrict the control plane with authorized networks or a private endpoint.
What are taints and tolerations used for in node pools? A taint on a pool repels pods that lack a matching toleration, letting you dedicate pools (Spot, GPU, Windows) so only opted-in workloads schedule there.
Name three security controls you’d enable on a production GKE cluster and why. Workload Identity (keyless API access), private nodes + restricted API endpoint (smaller attack surface), and node auto-upgrade (CVE patching) — plus Shielded nodes, NetworkPolicy and Binary Authorization for defence in depth.
What do release channels do, and which would you pick? They subscribe the cluster to an auto-upgrade cadence (Rapid/Regular/Stable, plus Extended). Regular is the balanced default for most production; Stable for risk-averse; Rapid for test/early access. Static (no channel) is discouraged — versions reach end-of-life.
How is GKE billed, and what’s the cluster fee? A flat per-cluster management fee (same across zonal/regional, Autopilot/Standard) plus, underneath, pod requests (Autopilot) or node VM-hours (Standard). The free tier offsets the fee for one zonal or Autopilot cluster per billing account per month.
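The advice to size pod ranges "with headroom" in the VPC-native answer is easier to apply with the arithmetic written out. The sketch below assumes GKE's documented behaviour of reserving at least twice max-pods-per-node addresses per node, rounded up to a power of two (so 110 pods per node means a /24 per node). The function names are illustrative, not part of any GKE tooling.

```python
import ipaddress


def node_pod_prefix(max_pods_per_node: int) -> int:
    """Prefix length of the pod slice carved out for each node.

    Assumption: at least 2x max-pods-per-node addresses are reserved per
    node, rounded up to the next power of two (110 pods -> 256 -> /24).
    """
    addresses = 2 * max_pods_per_node
    bits = (addresses - 1).bit_length()  # ceil(log2(addresses)) without floats
    return 32 - bits


def max_nodes(pod_range: str, max_pods_per_node: int) -> int:
    """How many per-node slices fit in the cluster's pod (alias) range."""
    net = ipaddress.ip_network(pod_range)
    per_node = node_pod_prefix(max_pods_per_node)
    if per_node < net.prefixlen:
        return 0  # the range cannot hold even one node's slice
    return 2 ** (per_node - net.prefixlen)


if __name__ == "__main__":
    for cidr, pods in [("10.0.0.0/14", 110), ("10.0.0.0/20", 110), ("10.0.0.0/20", 32)]:
        n = max_nodes(cidr, pods)
        print(f"{cidr} at {pods} pods/node -> {n} nodes, {n * pods} pods max")
```

Under these assumptions a /14 at 110 pods per node tops out at 1024 nodes, while a /20 allows only 16. Lowering max-pods-per-node to 32 stretches the same /20 to 64 nodes, which is the lever to reach for when address space is scarce.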

Quick check

You need a production cluster that survives a single zone failing and upgrades without API downtime. What location type do you choose, and how many control-plane replicas does it have?
True or false: you can convert an existing Standard cluster to Autopilot in place.
A pod returns 403 calling Cloud Storage. You’re using Workload Identity. Name the three things to verify.
Why must you size the pod (alias) IP range carefully at creation rather than later?
You want batch jobs to run on cheap, interruptible capacity but never have your API pods evicted. How do you structure node pools?

Answers

Regional, with three control-plane replicas across three zones. It survives a zone outage and upgrades the control plane with zero API downtime.
False. Mode is fixed at creation; you create a new Autopilot cluster and migrate workloads.
The roles/iam.workloadIdentityUser IAM binding on the Google SA that names the Kubernetes SA as the member, the iam.gke.io/gcp-service-account annotation on the Kubernetes ServiceAccount, and the Google SA’s IAM roles on the target resource (e.g. storage.objectViewer).
The pod and service ranges are immutable for the cluster’s life; the pod range plus max-pods-per-node cap how many pods and nodes you can ever run, so over-provision with headroom up front.
Run a tainted Spot node pool (optionally autoscaled with min-nodes=0) for the batch jobs, which tolerate the taint and the disruption, and a separate on-demand pool for the API pods, so eviction of Spot capacity never touches them.
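For the 403 question, most failures come down to one of the three strings not matching exactly. As a rough aid, the sketch below prints what each should look like for a given project, namespace and service-account pair. The helper names and the example identifiers (my-project, prod, app-ksa, app-gsa) are hypothetical; the member and email formats are the standard ones used by Workload Identity Federation for GKE.

```python
def wi_member(project_id: str, namespace: str, ksa: str) -> str:
    """IAM member string that identifies a Kubernetes ServiceAccount."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"


def gsa_email(gsa_name: str, project_id: str) -> str:
    """Email address of a Google service account."""
    return f"{gsa_name}@{project_id}.iam.gserviceaccount.com"


def checklist(project_id: str, namespace: str, ksa: str, gsa_name: str) -> dict:
    """The three things to compare against the live configuration."""
    gsa = gsa_email(gsa_name, project_id)
    return {
        # 1. Binding on the Google SA that lets the KSA impersonate it.
        "iam_binding": {
            "on": gsa,
            "role": "roles/iam.workloadIdentityUser",
            "member": wi_member(project_id, namespace, ksa),
        },
        # 2. Annotation on the Kubernetes ServiceAccount.
        "ksa_annotation": {"iam.gke.io/gcp-service-account": gsa},
        # 3. A role on the target resource; the viewer role is only an example.
        "gsa_role_example": "roles/storage.objectViewer",
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for item, value in checklist("my-project", "prod", "app-ksa", "app-gsa").items():
        print(item, "->", value)
```

A typo in the namespace or KSA name inside the square brackets is a common culprit, because the binding is accepted by IAM even when it points at a ServiceAccount that does not exist.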

Exercise

In Cloud Shell, create a Standard regional cluster with --enable-ip-alias, --enable-dataplane-v2, --enable-private-nodes, --workload-pool, Shielded-node flags and --num-nodes=1 in us-central1 (note you get 3 nodes). Then: (a) add a second, autoscaling Spot node pool (--spot --enable-autoscaling --min-nodes=0 --max-nodes=4) tainted cloud.google.com/gke-spot=true:NoSchedule; (b) deploy an app with a matching toleration and confirm it lands on the Spot pool with kubectl get pods -o wide and kubectl get nodes -L cloud.google.com/gke-spot; (c) apply a default-deny NetworkPolicy in its namespace and verify cross-pod traffic is blocked; (d) bind a Workload Identity Google SA to the app’s KSA and prove access with gcloud auth list inside a pod; (e) clean up with gcloud container clusters delete. Bonus: enable node auto-provisioning with CPU/memory limits and observe the autoscaler create a pool for a pending pod.
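If you want a starting point for steps (b) and (c), the sketch below builds the two Kubernetes objects involved as plain dictionaries and prints the NetworkPolicy as JSON, which kubectl accepts as well as YAML. The policy name default-deny-all and the namespace demo are placeholders chosen for this example; the toleration mirrors the taint given in step (a).

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both are denied
        },
    }


def spot_toleration() -> dict:
    """Toleration matching the taint cloud.google.com/gke-spot=true:NoSchedule."""
    return {
        "key": "cloud.google.com/gke-spot",
        "operator": "Equal",
        "value": "true",
        "effect": "NoSchedule",
    }


if __name__ == "__main__":
    # "demo" is a placeholder namespace for this sketch.
    print(json.dumps(default_deny_policy("demo"), indent=2))
```

One way to use it is to pipe the output into kubectl apply -f -, and to paste the toleration into spec.template.spec.tolerations of the Deployment. Note that denying egress also blocks DNS lookups, so expect name resolution to fail in the namespace until you add an explicit allow rule.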

Certification mapping

Associate Cloud Engineer (ACE) — Setting up and configuring a cloud solution / Deploying and implementing: create and configure GKE clusters and node pools (gcloud container clusters/node-pools create), choose Autopilot vs Standard and zonal vs regional, deploy and expose workloads (Deployments, Services, Ingress), configure autoscaling and node auto-repair/upgrade, and manage access with get-credentials. Networking (VPC-native, private clusters) and Workload Identity are squarely in scope.
Professional Cloud Architect (PCA) — Designing and planning a cloud solution architecture / Ensuring reliability / security & compliance: choosing the mode and location type for availability and cost, designing VPC-native networking and private clusters, planning release channels and maintenance windows, and applying Workload Identity, Binary Authorization, Shielded nodes and NetworkPolicy as part of a secure, reliable, cost-optimised platform.

Glossary

GKE — Google Kubernetes Engine, Google Cloud’s managed Kubernetes service.
Control plane — the managed Kubernetes brain (API server, scheduler, controllers, etcd); Google-operated in GKE.
Node — a Compute Engine VM that runs pods via the kubelet and containerd.
Node pool — a group of identical nodes managed together (Standard only).
Autopilot — hands-off mode: Google runs nodes; you pay for pod requests.
Standard — node-centric mode: you manage node pools; you pay for node VMs.
Zonal / regional cluster — control plane in one zone vs replicated across three zones in a region.
VPC-native — cluster where pods get real alias IPs from a subnet secondary range.
Pod / Service range — secondary ranges backing pod IPs and ClusterIP Services (immutable; size up front).
Max-pods-per-node — cap on pods per node; with the pod range, bounds cluster size (set at pool creation).
Dataplane V2 — GKE’s eBPF/Cilium dataplane with built-in NetworkPolicy and logging.
NetworkPolicy — Kubernetes pod-to-pod firewall rules (enforced by Dataplane V2).
Private cluster — cluster whose nodes have internal IPs only; API access via authorized networks or a private endpoint.
Authorized networks — CIDR allow-list for the cluster’s public API endpoint.
Cluster autoscaler — scales node counts within existing pools by pending-pod pressure.
Node auto-provisioning (NAP) — auto-creates/deletes whole node pools to fit pending pods.
Spot VM — deeply discounted, reclaimable node VM for fault-tolerant work.
Taint / toleration — a node-pool repellent and the pod opt-in that lets it schedule there.
Surge / blue-green upgrade — gradual drain-and-replace vs parallel reversible node-pool upgrade.
Workload Identity Federation for GKE — lets a Kubernetes SA act as a Google SA so pods call Google APIs keylessly.
Shielded GKE nodes — secure/measured boot and integrity monitoring for nodes.
Binary Authorization — admission control that admits only signed/attested images.
Release channel — Rapid/Regular/Stable/Extended auto-upgrade cadence for the cluster.
NEG (Network Endpoint Group) — pod-IP backends enabling container-native load balancing.
Cloud NAT — managed egress so private nodes can reach the internet.
Private Google Access — lets internal-only nodes reach Google APIs/Artifact Registry.

Next steps

You now know GKE end to end — both modes, the control-plane topology, node pools, networking, identity, security and cost. The natural follow-ons go deeper on running it well and on locking down identity: