
Kubernetes Services & Networking, In Depth: ClusterIP, NodePort, LoadBalancer, Headless & DNS

Pods are ephemeral and disposable. A Deployment kills and recreates them on every rollout, the autoscaler adds and removes them as load changes, and a node failure can wipe a whole batch in seconds. Every time a Pod is recreated it gets a brand-new IP address. So if your front-end tried to talk to your API by its Pod IP, it would break the moment that API Pod was rescheduled. You need something that does not move — a stable address that always points at “whatever healthy Pods currently back this app.” That something is a Service.

A Service is the load balancer and stable identity layer of Kubernetes. It gives a set of Pods a fixed virtual IP and a DNS name, watches which Pods are currently ready, and spreads traffic across them — all without you touching a single Pod IP. This lesson takes the Service apart completely: every type (ClusterIP, NodePort, LoadBalancer, ExternalName, and the special “headless” Service), every field on the spec, the Endpoints/EndpointSlices objects that track the backing Pods, the kube-proxy component that actually programs the load-balancing rules on every node, CoreDNS and how name resolution really works, and the flat pod network model (the CNI) that the whole thing sits on. It is long on purpose: by the end you should be able to answer almost any Service or cluster-networking question an interviewer or a CKA/CKAD exam can throw at you, and debug a broken Service from first principles.

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You need a working local cluster and basic kubectl comfort. If you have not set one up, do the lab in What Is Kubernetes? Control Plane, Nodes, etcd & the kubelet first — it walks you through a free local cluster with kind, minikube or k3d. It also helps to have met Pods and Deployments already: this lesson assumes you know that a Deployment owns a ReplicaSet which owns Pods, and that Pods carry labels that a selector can match. This is Lesson 4 of the Kubernetes Zero-to-Hero “deepening” track — it takes the Service you met briefly earlier and exhausts it, so that ingress, network policy and service mesh later all sit on solid ground.

Core concepts: the problem a Service solves

Start from the failure it prevents. Three things make raw Pod IPs unusable as an address:

  1. Pods are mortal. They are created and destroyed constantly — by rollouts, scaling, evictions, node failures. Each new Pod gets a new IP.
  2. There are many of them. A Deployment with replicas: 5 is five Pods on (perhaps) five different nodes. A client should not have to know all five, nor load-balance across them itself.
  3. Some are not ready. A Pod that is starting up, or failing its readiness probe, must not receive traffic, even though it exists.

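A Service solves all three at once. It is an API object that:

  1. Provides a fixed virtual IP and a DNS name that outlive any individual Pod.
  2. Selects its backing Pods by label and spreads traffic across them.
  3. Sends traffic only to Pods that are currently ready.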

The crucial mental model: a Service is not a process and not a proxy server sitting in the data path. There is no “Service pod.” The ClusterIP is a virtual IP that exists only as load-balancing rules programmed into the Linux kernel on every node by a component called kube-proxy. When a Pod sends a packet to a ClusterIP, the kernel on that Pod’s own node rewrites the destination to one of the real backing Pod IPs (DNAT) and sends it straight there. This is why Services are fast and have no single bottleneck: the “load balancer” is the kernel of whichever node the client happens to be on.

Three objects work together, and it pays to keep them straight:

| Object | What it is | Who creates it |
| --- | --- | --- |
| Service | The stable identity: a virtual IP + DNS name + a selector + port mapping. You author this. | You |
| EndpointSlice (and legacy Endpoints) | The live list of IP:port of the Pods currently backing the Service. Auto-maintained. | The EndpointSlice controller |
| kube-proxy | The node agent that turns the Service + its EndpointSlices into kernel load-balancing rules on every node. | Runs as a DaemonSet |

You write the Service. The control plane keeps the EndpointSlices in sync with reality. kube-proxy programs the kernel. DNS gives you a name. That is the whole machine.

The Service types, end to end

Kubernetes has a handful of Service type values, and the important insight is that they stack: each higher type is the previous one plus a way to reach it from somewhere new. Headless is the odd one out — it removes the virtual IP entirely. Here is the comparison you should be able to reproduce, followed by a full treatment of each.

| Type | Gets a ClusterIP? | Reachable from | How it exposes | Typical use |
| --- | --- | --- | --- | --- |
| ClusterIP (default) | Yes | Inside the cluster only | Virtual IP + DNS | Internal services (API ↔ DB, microservice ↔ microservice) |
| NodePort | Yes | Inside, plus every node’s IP on a high port | ClusterIP + a port (30000–32767) open on all nodes | Dev/test, bare metal without a cloud LB, behind an external LB |
| LoadBalancer | Yes | Inside, plus a single external IP | NodePort + a cloud/MetalLB load balancer in front | Internet-facing service on a cloud (or with MetalLB on bare metal) |
| ExternalName | No | Inside (as a name) | A CNAME to an external DNS name; no proxying | Aliasing an external dependency (e.g. a managed DB) by an in-cluster name |
| Headless (clusterIP: None) | No (none) | Inside (per-Pod DNS) | DNS returns the Pod IPs directly, no load balancing | StatefulSets, client-side LB, service discovery of individual Pods |

ClusterIP — the default, internal-only Service

ClusterIP is what you get if you do not set type. It allocates a virtual IP from the Service CIDR (a range carved out at cluster install, distinct from the Pod CIDR; kubeadm defaults to 10.96.0.0/12) and makes it reachable only from inside the cluster. This is the workhorse: the large majority of Services in a typical cluster are ClusterIP, because most traffic is service-to-service inside the cluster.

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: ClusterIP            # the default; can be omitted
  selector:
    app: web                 # match Pods labelled app=web
  ports:
    - name: http             # name is mandatory once you have >1 port
      port: 80               # the port the Service listens on (the ClusterIP:80)
      targetPort: 8080       # the port on the Pod to forward to
      protocol: TCP          # TCP (default), UDP, or SCTP

Clients reach it at web (same namespace), web.<namespace> (cross-namespace), or the full web.<namespace>.svc.cluster.local — never by IP. More on those names in the CoreDNS section.

NodePort — expose on every node’s IP

A NodePort Service is a ClusterIP plus an extra trick: it opens the same high-numbered port on every node in the cluster, and traffic arriving on <any-node-IP>:<nodePort> is forwarded to the Service (and on to a backing Pod). The default range is 30000–32767; you can let Kubernetes pick one or pin it with nodePort:.

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080        # optional; if omitted, one is auto-assigned from 30000–32767

Key facts to internalise:

LoadBalancer — a real external IP from the cloud

type: LoadBalancer is NodePort plus an instruction to your environment: “please provision an external load balancer that forwards to these NodePorts.” On a cloud (EKS/AKS/GKE) the cloud-controller-manager sees the Service and provisions a cloud L4 load balancer (e.g. AWS NLB, Azure Load Balancer), then writes the LB’s public IP/hostname back into the Service’s status.loadBalancer.ingress. On bare metal you install something like MetalLB to play that role.

apiVersion: v1
kind: Service
metadata:
  name: web
  annotations:
    # cloud-specific knobs live in annotations, e.g. on AWS:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
  # loadBalancerClass: service.k8s.aws/nlb   # pick a specific LB implementation
  # loadBalancerSourceRanges: ["203.0.113.0/24"]  # firewall the LB to these CIDRs

Important details:

ExternalName — a CNAME, with no proxying at all

type: ExternalName is the odd one: it has no selector, no ClusterIP, no Endpoints, and no kube-proxy involvement. It is purely a DNS alias: CoreDNS returns a CNAME to whatever you put in externalName.

apiVersion: v1
kind: Service
metadata:
  name: prod-db
  namespace: app
spec:
  type: ExternalName
  externalName: mydb.abc123.eu-west-1.rds.amazonaws.com   # an external DNS name

Now prod-db.app.svc.cluster.local resolves (via CNAME) to the RDS hostname. This lets your Pods refer to an external dependency by a stable in-cluster name, so you can swap dev/staging/prod databases by changing one Service, with no app config change. Gotchas: because it is a CNAME, the target must be a DNS name, not an IP; and since there is no proxying, the client connects straight to the external server while still using the in-cluster name. For protocols that carry a hostname this matters: the HTTP Host header and TLS SNI contain prod-db.app.svc.cluster.local, not the real external name, so the server may not recognise the host or may present a certificate that does not match. (If you need to alias a raw IP inside the cluster, use a Service without a selector plus a manual EndpointSlice instead; see the section on Services without selectors later in this lesson.)

Headless Service — DNS to the Pods, no virtual IP

Set clusterIP: None and you get a headless Service. It has no virtual IP and no load balancing. Instead, a DNS lookup of the Service name returns the A/AAAA records of all the ready backing Pods directly (one record per Pod). The client then connects to a Pod itself — doing its own selection, or connecting to a specific Pod.

apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None            # <-- this makes it headless
  selector:
    app: cassandra
  ports:
    - port: 9042
      name: cql

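You use headless Services when:

  1. You run a StatefulSet and each Pod needs its own stable DNS name (like db-0.db...).
  2. The client does its own load balancing (e.g. gRPC).
  3. You need to discover and address individual backends rather than one virtual IP.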

A subtle but exam-worthy point: a headless Service with a selector returns one A record per ready Pod; a headless Service without a selector returns whatever records you (or an operator) created via EndpointSlices — this is one way to alias external endpoints by IP.

The full Service spec, field by field

Beyond type, a Service has many fields. This is the matrix to know — every field, what it does, its values, the default, when you set it, and the gotcha.

| Field | What it does | Values | Default | When to set | Gotcha |
| --- | --- | --- | --- | --- | --- |
| type | The exposure model | ClusterIP / NodePort / LoadBalancer / ExternalName | ClusterIP | Whenever you need external reach | Higher types include lower ones (all but ExternalName still get a ClusterIP) |
| selector | Which Pods back this Service (by label) | label map | (none) | Almost always | Omit it to manage Endpoints manually (e.g. alias an external IP) |
| ports[].port | The port the Service listens on | 1–65535 | (none) | Always | This is the consumer-facing port, not the Pod’s |
| ports[].targetPort | The port on the Pod to forward to | number or named port | equals port | When Pod port ≠ Service port | Can be a name defined in the Pod’s containerPort, which decouples the number |
| ports[].nodePort | The node-wide port (NodePort/LB only) | 30000–32767 | auto-assigned | Pin only if a firewall/client needs a fixed port | Conflicts if two Services pin the same value |
| ports[].protocol | L4 protocol | TCP / UDP / SCTP | TCP | UDP for DNS/QUIC, etc. | A Service can mix TCP and UDP ports only via separate entries |
| ports[].name | Names a port | DNS-label string | (none) | Mandatory once there is >1 port | Used by SRV records and by targetPort references |
| ports[].appProtocol | Hints the L7 protocol | e.g. http, https, grpc, kubernetes.io/h2c | (none) | For LB/ingress that route by protocol | A hint only; kube-proxy ignores it |
| clusterIP | The virtual IP | auto / a specific IP / None | auto from Service CIDR | None for headless; a fixed IP rarely | Immutable after creation (except switching type appropriately) |
| clusterIPs | Dual-stack list of cluster IPs | up to 2 (IPv4+IPv6) | derived | Dual-stack clusters | Order matters; tied to ipFamilies |
| ipFamilyPolicy | Single vs dual stack | SingleStack / PreferDualStack / RequireDualStack | SingleStack | Dual-stack clusters | RequireDualStack fails if the cluster isn’t dual-stack |
| ipFamilies | Which IP families | [IPv4], [IPv6], or both | cluster default | Force a family/order | Must be consistent with ipFamilyPolicy |
| sessionAffinity | Sticky sessions | None / ClientIP | None | When a client must hit the same Pod | ClientIP stickiness is by source IP, not cookies (it’s L4) |
| sessionAffinityConfig.clientIP.timeoutSeconds | Stickiness duration | seconds | 10800 (3h) | Tune session length | Resets per new connection within the window |
| externalTrafficPolicy | How external (NodePort/LB) traffic is routed | Cluster / Local | Cluster | Local to preserve client source IP | Local drops traffic on nodes with no local Pod |
| internalTrafficPolicy | How in-cluster traffic is routed | Cluster / Local | Cluster | Keep traffic node-local (e.g. node-local DNS, logging) | Local means clients on a node with no local Pod get nothing |
| publishNotReadyAddresses | Include not-ready Pods in DNS/Endpoints | true / false | false | StatefulSet peer discovery during startup | Sends traffic to Pods that may not be ready |
| externalIPs | Extra IPs the cluster will accept for this Service | list of IPs | (none) | Rare, manual ingress | You must route those IPs to nodes yourself |
| loadBalancerClass | Which LB implementation handles it | string | provider default | Multiple LB controllers | Only valid for type: LoadBalancer |
| loadBalancerSourceRanges | Firewall the external LB | CIDR list | open | Restrict who can hit the LB | Provider support varies |
| allocateLoadBalancerNodePorts | Whether a LB Service also gets NodePorts | true / false | true | Set false to save NodePorts when the LB targets Pods directly | Some LBs need the NodePorts; check your provider |
| healthCheckNodePort | The port the external LB health-checks (with externalTrafficPolicy: Local) | 30000–32767 | auto | Rarely pinned | Only meaningful with Local |

port vs targetPort vs nodePort — the three ports, untangled

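This trips up nearly everyone, so make it concrete. Imagine a Pod whose container listens on 8080, fronted by a NodePort Service:

  1. port: 80 is the port the Service answers on at its ClusterIP. In-cluster clients dial ClusterIP:80.
  2. targetPort: 8080 is the port on the Pod that traffic is forwarded to; it must match what the container actually listens on.
  3. nodePort: 30080 is the port opened on every node’s IP. External clients dial NodeIP:30080.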

So one request to NodeIP:30080 → the node DNATs it to a backing Pod’s 8080; one request to ClusterIP:80 → DNAT to a Pod’s 8080. The numbers are independent on purpose.
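To see that the three numbers really are independent, here is a minimal Python sketch that models the two entry points as a lookup. The class, the function and every address are hypothetical and purely illustrative; in a real cluster the translation is done by kernel rules, not application code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ServicePort:
    """Hypothetical model of one entry in a Service's spec.ports list."""
    port: int                        # what clients dial on the ClusterIP
    target_port: int                 # what the container actually listens on
    node_port: Optional[int] = None  # opened on every node (NodePort/LoadBalancer only)


def translate(sp: ServicePort, cluster_ip: str, pod_ip: str,
              dst_ip: str, dst_port: int,
              node_ips: Tuple[str, ...]) -> Optional[Tuple[str, int]]:
    """Return the (pod_ip, target_port) a packet is rewritten to, or None if nothing matches."""
    if dst_ip == cluster_ip and dst_port == sp.port:
        return (pod_ip, sp.target_port)          # ClusterIP:port entry point
    if sp.node_port is not None and dst_ip in node_ips and dst_port == sp.node_port:
        return (pod_ip, sp.target_port)          # NodeIP:nodePort entry point
    return None


web = ServicePort(port=80, target_port=8080, node_port=30080)
nodes = ("172.18.0.2", "172.18.0.3")             # illustrative node IPs

via_cluster_ip = translate(web, "10.96.12.34", "10.244.1.5", "10.96.12.34", 80, nodes)
via_node_port = translate(web, "10.96.12.34", "10.244.1.5", "172.18.0.3", 30080, nodes)
wrong_port = translate(web, "10.96.12.34", "10.244.1.5", "10.96.12.34", 8080, nodes)
print(via_cluster_ip, via_node_port, wrong_port)
```

Both entry points land on the Pod's 8080, while dialling the ClusterIP on 8080 matches nothing: the Service only answers on its own port.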

sessionAffinity, traffic policies and topology — the routing knobs

Selectors → Endpoints → EndpointSlices

Here is the machinery that connects a Service to the actual Pods. When a Service has a selector, a controller continuously lists/watches Pods matching those labels, filters them to the ready ones (readiness probe passing, not terminating), and records their IP:port in EndpointSlices. kube-proxy watches those slices and programs the kernel. The flow is:

Service selector → matching, ready Pods → their IP:port recorded in EndpointSlices → kube-proxy turns those into kernel rules → traffic to the ClusterIP is DNAT’d to a backing Pod.
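The filtering step in that flow can be sketched in a few lines of Python. This is an illustration of the logic only, not the controller's actual code; the Pod class and backing_endpoints function are hypothetical names, and the IPs are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pod:
    """Hypothetical, pared-down view of a Pod for illustration."""
    name: str
    ip: str
    labels: Dict[str, str] = field(default_factory=dict)
    ready: bool = True
    terminating: bool = False


def backing_endpoints(selector: Dict[str, str], pods: List[Pod], target_port: int) -> List[str]:
    """Return 'ip:port' for every Pod that matches all selector labels and is ready."""
    if not selector:
        # A Service without a selector gets no controller-managed endpoints.
        return []
    endpoints = []
    for pod in pods:
        matches = all(pod.labels.get(k) == v for k, v in selector.items())
        if matches and pod.ready and not pod.terminating:
            endpoints.append(f"{pod.ip}:{target_port}")
    return endpoints


pods = [
    Pod("web-a", "10.244.1.5", {"app": "web"}),
    Pod("web-b", "10.244.2.7", {"app": "web"}, ready=False),       # failing readiness
    Pod("web-c", "10.244.3.9", {"app": "web"}, terminating=True),  # being deleted
    Pod("api-a", "10.244.1.8", {"app": "api"}),                    # wrong label
]
endpoints = backing_endpoints({"app": "web"}, pods, 8080)
print(endpoints)
```

Only web-a survives the filter, which is why a Pod can exist and still receive no traffic.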

The legacy Endpoints object

Originally there was one Endpoints object per Service (same name as the Service), holding every backing address in a single object:

$ kubectl get endpoints web
NAME   ENDPOINTS                                       AGE
web    10.244.1.5:8080,10.244.2.7:8080,10.244.3.9:8080 5m

This worked but scaled badly. With, say, 5,000 Pods behind a Service, every Pod change rewrote the entire Endpoints object, and that whole object had to be pushed to every node’s kube-proxy and re-read — a storm of large updates that hammered the API server and etcd.

EndpointSlices — the scalable replacement

EndpointSlices (GA since v1.21, the default since well before v1.30) fix this by sharding the endpoint list into many smaller objects, up to 100 endpoints per slice by default. A Service with 5,000 endpoints has ~50 slices. When one Pod changes, only its slice is rewritten and pushed — a tiny, targeted update instead of a full rewrite. They also carry richer per-endpoint data that the old object could not: the endpoint’s zone and node (used for topology-aware routing), its readiness/serving/terminating conditions separately, and the hostname. Inspect them with:

$ kubectl get endpointslices -l kubernetes.io/service-name=web
NAME        ADDRESSTYPE   PORTS   ENDPOINTS                       AGE
web-abc12   IPv4          8080    10.244.1.5,10.244.2.7,...       5m

$ kubectl describe endpointslice web-abc12
# shows per-endpoint: Addresses, Conditions (Ready/Serving/Terminating),
# Topology (kubernetes.io/hostname, topology.kubernetes.io/zone), targetRef -> the Pod

Each slice is tied to its Service by the label kubernetes.io/service-name, and has an addressType of IPv4, IPv6, or FQDN. The legacy Endpoints object is still created in parallel for backward compatibility, but EndpointSlices are the source of truth kube-proxy uses today.
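The arithmetic behind the sharding is easy to check with a small Python sketch. It assumes idealised packing (every slice filled to the limit); a real controller may leave some slices partially full, so treat the slice count as approximate. The shard function and the generated addresses are hypothetical.

```python
from typing import List

MAX_ENDPOINTS_PER_SLICE = 100  # the default limit described above


def shard(endpoints: List[str], limit: int = MAX_ENDPOINTS_PER_SLICE) -> List[List[str]]:
    """Split a flat endpoint list into slices of at most `limit` entries."""
    return [endpoints[i:i + limit] for i in range(0, len(endpoints), limit)]


# 5,000 made-up endpoint addresses.
endpoints = [f"10.244.{i // 250}.{i % 250 + 1}:8080" for i in range(5000)]
slices = shard(endpoints)

slice_count = len(slices)
# Index of the one slice that must be rewritten when endpoint number 1234 changes.
touched = 1234 // MAX_ENDPOINTS_PER_SLICE
print(slice_count, touched, len(slices[touched]))
```

A change to one Pod rewrites a 100-entry object instead of a 5,000-entry one, which is the whole point of the design.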

The per-endpoint conditions matter for graceful shutdown: a terminating Pod that can still answer is marked ready: false, serving: true, terminating: true for a window, so that existing connections drain while no new traffic is routed to it. This is how Services avoid dropping in-flight requests during a rollout.

Services without selectors — manual endpoints

If you omit the selector, no controller manages the endpoints — you (or an operator) create an EndpointSlice by hand. This is how you point an in-cluster Service name at an external IP (a legacy database on a VM, say) so your Pods can use a stable Kubernetes name for it:

apiVersion: v1
kind: Service
metadata:
  name: legacy-db
spec:
  ports:
    - port: 5432
      targetPort: 5432
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: legacy-db-1
  labels:
    kubernetes.io/service-name: legacy-db   # ties this slice to the Service
addressType: IPv4
ports:
  - name: ""
    port: 5432
endpoints:
  - addresses: ["192.0.2.42"]               # the external server's IP

kube-proxy: how a virtual IP becomes real

The ClusterIP is virtual — nothing actually listens on it. kube-proxy, a DaemonSet running on every node, is what makes it work. It watches Services and EndpointSlices via the API server and programs the node’s kernel so that packets destined for a Service VIP are rewritten (DNAT) to one of the backing Pod IPs and delivered. Note kube-proxy is not in the data path of normal traffic in the common modes — it only installs the rules; the kernel does the actual rewriting per packet. Its modes:

| Mode | Mechanism | Performance at scale | Notes |
| --- | --- | --- | --- |
| iptables (default) | Linear-ish chains of NAT rules; a random rule picks the backend | Rule updates slow as Services grow (O(n) reprogramming); per-packet match is kernel-fast | Simple, ubiquitous, the historical default; random backend selection |
| IPVS | Kernel IP Virtual Server with hash tables | Scales to thousands of Services with near-constant lookup; faster bulk updates | Needs kernel IPVS modules; offers real LB algorithms (rr, lc, dh, sh, …) |
| nftables | Newer kernel nftables backend (beta/maturing in recent releases) | Much faster updates than iptables, modern data structures | The intended long-term successor to the iptables backend |
| (no kube-proxy) | eBPF dataplanes (e.g. Cilium) replace kube-proxy entirely | Highest performance; handles Services in eBPF | A CNI feature, not a kube-proxy mode; you run “kube-proxy-free” |

Practical guidance: iptables mode is fine for most clusters (hundreds of Services). Switch to IPVS when you have thousands of Services/endpoints and notice control-plane churn or latency in rule programming, or when you want a specific load-balancing algorithm. nftables is where the project is heading; eBPF/Cilium is the high-end option that removes kube-proxy. In all cases the behaviour you write (Service types, ports, policies) is identical — only the data-plane implementation differs. One visible behavioural nuance: in iptables mode backend choice is effectively random per connection; IPVS gives you the algorithm you configure.
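Why is iptables-mode selection "effectively random" yet still even? A sketch, assuming the commonly described layout in which each backend gets one rule and rule i of n matches with probability 1/(n-i), the last rule always matching. The function names are hypothetical, and the code only does the probability arithmetic; it programs no rules.

```python
from fractions import Fraction
from typing import List


def cascade_probabilities(n: int) -> List[Fraction]:
    """Per-rule match probability for n backends: 1/n, 1/(n-1), ..., 1."""
    return [Fraction(1, n - i) for i in range(n)]


def overall_probabilities(n: int) -> List[Fraction]:
    """Chance that each backend is ultimately chosen when rules are tried in order."""
    result = []
    not_yet_matched = Fraction(1)           # probability we reach the current rule
    for p in cascade_probabilities(n):
        result.append(not_yet_matched * p)  # reach this rule AND match it
        not_yet_matched *= (1 - p)          # fall through to the next rule
    return result


rules = cascade_probabilities(3)
overall = overall_probabilities(3)
print([str(p) for p in rules])
print([str(p) for p in overall])
```

The per-rule probabilities differ, but the chance of ending up on any given backend is the same for all of them, so connections spread uniformly on average without any shared counter.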

CoreDNS: service discovery in detail

A stable IP is only half the story — you address Services by name, and CoreDNS (the cluster DNS server, itself running as a Deployment in kube-system and fronted by a ClusterIP Service usually called kube-dns at a fixed IP like 10.96.0.10) is what resolves those names. Every Pod’s /etc/resolv.conf is wired to it by the kubelet.

The record types

For a normal (ClusterIP) Service web in namespace app:

For a headless Service (clusterIP: None):

For an ExternalName Service: a CNAME to the external name, as covered earlier. (Pods also get records — by default <pod-ip-with-dashes>.<ns>.pod.cluster.local — but Service records are what you use day to day.)
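The naming scheme is regular enough to express as string-building helpers. The Python sketch below is only an illustration of the name shapes; the function names are hypothetical, cluster.local is assumed as the cluster domain, and the Pod-record helper handles IPv4 only.

```python
def service_fqdn(service: str, namespace: str, zone: str = "cluster.local") -> str:
    """A/AAAA name of a Service."""
    return f"{service}.{namespace}.svc.{zone}"


def srv_name(port_name: str, protocol: str, service: str, namespace: str,
             zone: str = "cluster.local") -> str:
    """SRV name for a *named* Service port: _<port-name>._<protocol>.<service fqdn>."""
    return f"_{port_name}._{protocol.lower()}.{service_fqdn(service, namespace, zone)}"


def headless_pod_fqdn(hostname: str, service: str, namespace: str,
                      zone: str = "cluster.local") -> str:
    """Per-Pod name behind a headless Service (e.g. a StatefulSet Pod)."""
    return f"{hostname}.{service_fqdn(service, namespace, zone)}"


def pod_record(pod_ipv4: str, namespace: str, zone: str = "cluster.local") -> str:
    """IP-derived Pod record: dots in the IP become dashes."""
    return f"{pod_ipv4.replace('.', '-')}.{namespace}.pod.{zone}"


names = [
    service_fqdn("web", "app"),
    srv_name("http", "TCP", "web", "app"),
    headless_pod_fqdn("db-0", "db", "app"),
    pod_record("10.244.1.5", "app"),
]
print("\n".join(names))
```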

The search domains and ndots:5 — why short names work

Look inside any Pod:

$ kubectl exec -it mypod -- cat /etc/resolv.conf
nameserver 10.96.0.10
search app.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Two things make curl http://web work from inside app:

  1. The search list appends those suffixes in turn. A lookup of web is tried as web.app.svc.cluster.local, then web.svc.cluster.local, then web.cluster.local, until one resolves. That is why a bare web finds the Service in your own namespace, and web.other-namespace finds it in another.
  2. options ndots:5 says: if the name has fewer than 5 dots, try the search suffixes first (treat it as a likely in-cluster short name) before trying it as an absolute name. Service short names have few dots, so they get the search treatment — exactly what you want inside the cluster.

The famous ndots:5 gotcha: an external name like api.github.com has only 2 dots, so the resolver dutifully tries api.github.com.app.svc.cluster.local, api.github.com.svc.cluster.local, api.github.com.cluster.local (all of which fail) before finally querying api.github.com as-is. With the search list shown above that is 3 extra useless lookups on every external call (more if the node contributes its own search domains, and each may be issued for both A and AAAA records), which can add latency and load. Fixes: use a fully-qualified external name with a trailing dot (api.github.com., where the dot makes it absolute and skips the search list), deploy NodeLocal DNSCache, or tune ndots via the Pod’s dnsConfig. This is a very common interview question and a real production performance bug.
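The lookup order can be reproduced with a short Python sketch. It models glibc-style stub-resolver behaviour as described above (other resolver libraries may differ in detail) and does no real DNS; query_order is a hypothetical helper.

```python
from typing import List


def query_order(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Names a stub resolver would try, in order, stopping at the first that resolves."""
    if name.endswith("."):
        return [name]                    # absolute name: the search list is skipped
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    if name.count(".") < ndots:
        return expanded + [absolute]     # few dots: search suffixes first, then as-is
    return [absolute] + expanded         # enough dots: as-is first


SEARCH = ["app.svc.cluster.local", "svc.cluster.local", "cluster.local"]

short = query_order("web", SEARCH)
external = query_order("api.github.com", SEARCH)
rooted = query_order("api.github.com.", SEARCH)

for label, names in (("short", short), ("external", external), ("rooted", rooted)):
    print(label, names)
```

The short name resolves on its first attempt, the external name only on its last (after one failed attempt per search suffix), and the trailing-dot form is tried exactly once.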

dnsPolicy and custom DNS

A Pod’s dnsPolicy controls how that resolv.conf is built: ClusterFirst (the default — cluster DNS first, then upstream for external names), Default (inherit the node’s resolv.conf — not cluster DNS), ClusterFirstWithHostNet (use cluster DNS even when the Pod uses host networking), and None (ignore defaults and supply everything via dnsConfig, where you can set custom nameservers, searches and ndots). CoreDNS itself is configured by the Corefile in a ConfigMap; the kubernetes plugin serves the cluster.local zone, and a forward plugin sends everything else to upstream resolvers.

The Kubernetes network model: the flat pod network

Services sit on top of a network model with a few non-negotiable rules that every conforming cluster must satisfy. Understanding them explains why Services work the way they do.

  1. Every Pod gets its own unique IP from a cluster-wide Pod CIDR (distinct from the Service CIDR and from node IPs).
  2. Every Pod can reach every other Pod directly, on any node, with no NAT — the source Pod sees the destination’s real Pod IP and vice versa. This is the “flat network.”
  3. Every node can reach every Pod (and the agents on a node, like the kubelet, can reach Pods on that node).
  4. The IP a Pod sees for itself is the same IP others use to reach it (no address translation in the middle).

This “IP-per-Pod, flat, NAT-free” model is deliberately simple: from an app’s point of view, a Pod is just a host on a big flat network, like a VM. There are no port-mapping games as in plain Docker — a container that listens on 8080 is reachable at podIP:8080 from anywhere in the cluster.

CNI — who actually provides this network

Kubernetes itself does not implement pod networking. It defines the Container Network Interface (CNI) and delegates to a CNI plugin that you install — Calico, Cilium, Flannel, Weave, or a cloud CNI (AWS VPC CNI, Azure CNI). When the kubelet starts a Pod, it calls the CNI plugin, which allocates the Pod’s IP, creates its network interface (a veth pair into the Pod’s network namespace), and wires up routing so the flat-network rules hold — using an overlay (VXLAN/Geneve encapsulation, e.g. Flannel) or native routing/BGP (e.g. Calico) or a cloud-native model where Pods get real VPC IPs (AWS VPC CNI). The CNI also typically implements NetworkPolicy (the pod-level firewall). For this lesson the key point is: kube-proxy programs Service load-balancing rules; the CNI provides the underlying flat Pod network they ride on. Different layers, different jobs.

The four communication paths

Putting it together, here are the paths a request can take and what handles each:

| Path | Mechanism |
| --- | --- |
| Container ↔ container in the same Pod | localhost: they share one network namespace and IP |
| Pod ↔ Pod (any node) | The flat CNI network: direct, by Pod IP, no NAT |
| Pod ↔ Service | ClusterIP DNAT’d by kube-proxy (kernel) to a backing Pod; name resolved by CoreDNS |
| External ↔ Service | NodePort, LoadBalancer, or Ingress/Gateway → NodePort → kube-proxy → Pod |

Figure: Kubernetes Services & networking

The diagram traces a request from an external client through a LoadBalancer to a node’s NodePort, where kube-proxy DNATs it onto the flat Pod network to a ready endpoint — and, alongside, an in-cluster Pod resolving a Service name via CoreDNS and hitting the ClusterIP directly.

Hands-on lab

Everything here runs free on a local cluster (kind, minikube or k3d). We will create a Deployment, put each Service type in front of it, watch EndpointSlices update live, and prove DNS works — then clean up.

1. A cluster and a backing Deployment

# Create a multi-node kind cluster so NodePort/topology behave realistically
cat <<'EOF' | kind create cluster --name svc-lab --config -
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF

# A simple web Deployment that serves on port 80, 3 replicas
kubectl create deployment web --image=nginx:1.27 --replicas=3
kubectl set resources deployment web --requests=cpu=50m,memory=32Mi
kubectl rollout status deployment/web
kubectl get pods -l app=web -o wide   # note each Pod's IP and node

2. ClusterIP + watch the EndpointSlices

kubectl expose deployment web --port=80 --target-port=80 --name=web   # creates a ClusterIP Service
kubectl get svc web
kubectl get endpointslices -l kubernetes.io/service-name=web
kubectl describe endpointslice -l kubernetes.io/service-name=web | grep -E 'Addresses|Conditions|Hostname|Zone' 

# Prove it resolves and load-balances from inside the cluster:
kubectl run client --image=nicolaka/netshoot --rm -it --restart=Never -- \
  sh -c 'for i in 1 2 3 4 5; do curl -s -o /dev/null -w "%{http_code}\n" http://web; done'
# Expected: 200 five times. Now scale and watch the slice change:

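Watch the endpoints update live as you scale. The watch below runs in the background of the same shell; run it in a second terminal instead (without the trailing &) if you prefer:

kubectl get endpointslices -l kubernetes.io/service-name=web -w &
kubectl scale deployment web --replicas=5     # slice gains 2 endpoints
kubectl scale deployment web --replicas=2     # slice loses 3 endpoints
# when you have seen the updates, stop the background watch with: kill %1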

3. DNS: the search path and ndots in action

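# SRV records exist only for named ports, and kubectl expose normally leaves a single port unnamed, so name it first
kubectl patch svc web --type=json -p '[{"op":"add","path":"/spec/ports/0/name","value":"http"}]'

kubectl run dnsdemo --image=nicolaka/netshoot --rm -it --restart=Never -- sh -c '
  cat /etc/resolv.conf;
  echo "--- short name (search list resolves it) ---";
  nslookup web;
  echo "--- FQDN ---";
  nslookup web.default.svc.cluster.local;
  echo "--- SRV record for the named port (_<port-name>._<protocol>.<service>...) ---";
  nslookup -type=SRV _http._tcp.web.default.svc.cluster.local || true
'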

4. NodePort

kubectl patch svc web -p '{"spec":{"type":"NodePort"}}'
kubectl get svc web -o wide          # note the 3xxxx nodePort
NODEPORT=$(kubectl get svc web -o jsonpath='{.spec.ports[0].nodePort}')
# With kind, reach a node via 'docker exec'; on minikube use 'minikube service web --url'
docker exec svc-lab-worker curl -s -o /dev/null -w "node-local hit: %{http_code}\n" localhost:$NODEPORT

5. Headless Service — DNS returns Pod IPs

# Note: 'kubectl create service clusterip web-hl' would set the selector to app=web-hl,
# which matches no Pods here, so apply a manifest that selects app=web instead.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata: { name: web-hl }
spec:
  clusterIP: None
  selector: { app: web }
  ports: [{ port: 80, targetPort: 80 }]
EOF
# A headless lookup returns MULTIPLE A records (one per ready Pod), not a single ClusterIP:
kubectl run dnsdemo2 --image=nicolaka/netshoot --rm -it --restart=Never -- \
  nslookup web-hl.default.svc.cluster.local

6. ExternalName — a CNAME alias

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata: { name: example-ext }
spec:
  type: ExternalName
  externalName: example.com
EOF
kubectl run dnsdemo3 --image=nicolaka/netshoot --rm -it --restart=Never -- \
  nslookup example-ext.default.svc.cluster.local      # resolves via CNAME to example.com

Validation

Cleanup

kubectl delete svc web web-hl example-ext
kubectl delete deployment web
kind delete cluster --name svc-lab

Cost note

Everything above is ₹0 — it runs entirely in local containers. The only thing that would cost money is type: LoadBalancer on a real cloud (each provisions a billable cloud load balancer); we deliberately demonstrate that type with YAML only, because a local cluster has no cloud LB to fulfil it (the Service would sit in <pending>).

Common mistakes & troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| EndpointSlice has no endpoints; Service times out | selector labels don’t match any Pod’s labels | kubectl get pods --show-labels and compare to kubectl get svc <svc> -o yaml selector; align them |
| Endpoints exist but connections are refused | targetPort doesn’t match the port the container actually listens on | Confirm containerPort/the app’s real port; set targetPort to it |
| Service works sometimes, fails on some nodes | externalTrafficPolicy: Local with Pods not on every node | Use Cluster, or spread Pods (topology spread / DaemonSet) so every node has one |
| Backing Pods exist but get no traffic | Pods are not ready (readiness probe failing); only ready Pods are endpoints | kubectl describe pod; fix the readiness probe / the app’s health |
| External calls are slow from Pods | ndots:5 causing one extra failed lookup per search suffix for every external name | Use FQDN with trailing dot, deploy NodeLocal DNSCache, or tune dnsConfig |
| type: LoadBalancer stuck in <pending> | No cloud-controller / MetalLB to provision the LB | Install MetalLB (bare metal) or run on a cloud; locally, use NodePort instead |
| Backend always loses the client IP | Default Cluster policy SNATs external traffic | Set externalTrafficPolicy: Local (accepting the empty-node trade-off) |
| Two NodePort Services clash | Both pinned the same nodePort | Let one auto-assign, or pick distinct values in 30000–32767 |
| DNS resolves but to the wrong namespace | Short name resolved via search path in the caller’s namespace | Use svc.ns or the FQDN to be explicit |

Best practices

Security notes

Interview & exam questions

  1. Why do Services exist — what problem do they solve? Pods are ephemeral and get new IPs on every recreation, there are many of them, and some are not ready. A Service provides a stable virtual IP and DNS name, continuously discovers the ready backing Pods via a label selector, and load-balances across them — decoupling clients from individual Pod lifecycles.

  2. Walk through what happens when a Pod sends a packet to a ClusterIP. The ClusterIP is virtual — nothing listens on it. On the sending Pod’s node, kube-proxy has programmed kernel rules (iptables/IPVS) that DNAT the packet’s destination from the ClusterIP to one of the ready backing Pod IPs (from the EndpointSlices), and the packet is delivered over the flat CNI network. No central proxy is involved.

  3. Explain port vs targetPort vs nodePort. port is the port the ClusterIP answers on; targetPort is the Pod’s port traffic is forwarded to (can be a named port); nodePort is the port opened on every node’s IP for NodePort/LoadBalancer types (30000–32767). They are independent numbers.

  4. What changed with EndpointSlices, and why? The old single Endpoints object held all addresses, so any one Pod change rewrote and re-pushed the whole object — a scaling disaster. EndpointSlices shard the list (≤100 endpoints each by default), so updates are small and targeted, and they carry per-endpoint zone/node/topology and readiness/serving/terminating conditions that enable topology-aware routing and graceful drain.

  5. Compare kube-proxy iptables and IPVS modes. iptables is the default and simple, but rules are evaluated sequentially and resynced in bulk, so cost grows roughly linearly with the number of Services and very large clusters see slow rule updates and CPU churn on every node; backend choice is effectively random. IPVS uses kernel hash tables for near-constant lookups, scales to thousands of Services, updates faster, and offers real LB algorithms (rr, lc, sh, …). nftables is the emerging successor; eBPF CNIs like Cilium can replace kube-proxy entirely.

  6. What is a headless Service and when do you use one? A Service with clusterIP: None, which means no virtual IP and no load balancing. DNS returns the Pod IPs directly (one A record per ready Pod). Used for StatefulSets (stable per-Pod DNS like db-0.db...), client-side load balancing (e.g. gRPC), and discovering individual backends.

  7. Explain the ndots:5 behaviour and the performance gotcha. ndots:5 tells the resolver to try the search suffixes first for any name with fewer than 5 dots. This makes short in-cluster names resolve nicely, but an external name like api.github.com (2 dots) is first tried against every search suffix (the three cluster suffixes, plus any inherited from the node), and each of those attempts fails before the real lookup runs, adding latency. Fix with a trailing-dot FQDN, NodeLocal DNSCache, or a custom ndots in dnsConfig.

  8. What does externalTrafficPolicy: Local do, and its trade-off? It makes a node forward external (NodePort/LB) traffic only to Pods on that same node, with no SNAT, so the backend sees the real client IP. The trade-off: nodes with no local Pod drop the traffic (the LB must health-check and avoid them), and load can be uneven.

  9. Difference between the Pod network and Services — who provides each? The CNI plugin (Calico/Cilium/Flannel/cloud) provides the flat, NAT-free Pod network (IP-per-Pod, Pod-to-Pod reachability). kube-proxy provides Service load balancing (VIP → backend DNAT) on top of that network. Different layers, different components.

  10. A new Deployment’s Service times out with no endpoints. How do you debug? Check that the Service selector matches the Pods’ labels (kubectl get pods --show-labels vs the Service’s selector); confirm the Pods are ready (only ready Pods become endpoints); verify targetPort matches the container’s actual port. Inspect kubectl get endpointslices -l kubernetes.io/service-name=<svc>.

  11. How do you give Pods a stable in-cluster name for an external database? Either an ExternalName Service (CNAME to the DB’s DNS name — no IPs), or a Service without a selector plus a manually-created EndpointSlice pointing at the DB’s IP when you must alias a raw address.

  12. What is topology-aware routing and why use it? With the service.kubernetes.io/topology-mode: Auto annotation, the control plane adds zone hints to EndpointSlices and kube-proxy prefers same-zone endpoints, reducing cross-zone latency and cloud egress charges — falling back to cluster-wide routing when the distribution is too imbalanced to be safe.
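
The scaling argument in question 4 is simple arithmetic, and it can help to see the numbers. This is a back-of-the-envelope sketch, not a measurement: it assumes the default cap of 100 endpoints per slice mentioned above and treats "one Pod changed" as "one object rewritten". The function names are hypothetical.

```python
import math

MAX_PER_SLICE = 100  # default cap on endpoints per EndpointSlice, as stated above


def slices_needed(endpoints, max_per_slice=MAX_PER_SLICE):
    """Number of EndpointSlice objects needed to hold this many endpoints."""
    return math.ceil(endpoints / max_per_slice)


def addresses_rewritten(endpoints, sharded, max_per_slice=MAX_PER_SLICE):
    """Upper bound on addresses in the object rewritten when one Pod changes."""
    # Legacy Endpoints: the single object holds every address.
    # EndpointSlices: only the slice containing the changed Pod is rewritten.
    return min(endpoints, max_per_slice) if sharded else endpoints


replicas = 5000
print(slices_needed(replicas))               # 50
print(addresses_rewritten(replicas, False))  # 5000
print(addresses_rewritten(replicas, True))   # 100
```

Every such rewrite is also pushed to every node that watches the object, so the per-change saving is multiplied by the node count.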

Quick check

  1. Which Service type has no ClusterIP and no proxying, returning only a CNAME?
  2. What is the default nodePort range, and on how many nodes is the port opened?
  3. Name two things an EndpointSlice records per endpoint that the legacy Endpoints object did not.
  4. Which component programs the kernel rules that make a ClusterIP work, and on which nodes?
  5. From a Pod in namespace app, what full name does web resolve to first, and why?

Answers

  1. ExternalName — it is a pure DNS CNAME to externalName, with no ClusterIP, selector, endpoints or kube-proxy involvement.
  2. 30000–32767, and the port is opened on every node in the cluster (even ones running none of the backing Pods).
  3. Any two of: the endpoint’s zone and node (topology), and its separate readiness/serving/terminating conditions (also the hostname) — used for topology-aware routing and graceful drain.
  4. kube-proxy, typically running as a DaemonSet on every node; it installs iptables/IPVS rules and the kernel does the per-packet DNAT.
  5. web.app.svc.cluster.local first, because the Pod’s resolv.conf lists app.svc.cluster.local first in its search path and ndots:5 makes the short name try the search suffixes before treating it as absolute.
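
The search-path logic behind that last answer can be written out in a few lines. This is a simplified sketch of glibc-style resolver behaviour, not CoreDNS or kubelet code. The candidate_queries function is hypothetical, and the search list is the one assumed for a Pod in namespace app with the default cluster.local domain; a real Pod may inherit extra suffixes from its node.

```python
def candidate_queries(name, search, ndots=5):
    """Return the names a glibc-style resolver would try, in order."""
    # A trailing dot marks the name as fully qualified: no search list applies.
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = name + "."
    # Fewer dots than ndots: search suffixes first, the literal name last.
    if name.count(".") < ndots:
        return expanded + [absolute]
    # Otherwise the literal name is tried first, then the search suffixes.
    return [absolute] + expanded


# Assumed search path for a Pod in namespace "app".
SEARCH = ["app.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for query in candidate_queries("web", SEARCH):
    print(query)
# First line printed: web.app.svc.cluster.local.

print(len(candidate_queries("api.github.com", SEARCH)) - 1)  # 3 wasted attempts
print(candidate_queries("api.github.com.", SEARCH))          # ['api.github.com.']
```

With this three-suffix list, api.github.com costs three failed attempts before the real lookup, and the trailing dot removes all of them. Any suffix inherited from the node would add one more.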

Exercise

On your local lab cluster, build the whole picture and prove each layer:

  1. Create a 3-replica Deployment and a ClusterIP Service. From a netshoot Pod, curl the Service name 10 times and confirm 200s. Then kubectl get endpointslices -l kubernetes.io/service-name=<svc> -o yaml and annotate, for one endpoint, what its conditions, nodeName and zone mean.
  2. Scale the Deployment up and down while watching the EndpointSlice with -w. In one sentence, explain why this is cheaper than the old Endpoints object would have been at 5,000 replicas.
  3. Patch the Service to NodePort, then to externalTrafficPolicy: Local. Hit a node that runs a backing Pod and one that does not, and record what happens to each request — then explain the result.
  4. Reproduce the ndots gotcha: from a Pod, run nslookup -debug api.github.com to trace each attempt (or read /etc/resolv.conf and reason it through), count the failed lookups, then re-run with a trailing dot (api.github.com.) and compare.
  5. Create a headless Service over the same Deployment and run nslookup on its name. Explain, in two sentences, how the result differs from the ClusterIP Service and when you would want it.

Certification mapping

Glossary

Next steps

You can now give any workload a stable address and reason about every packet’s path. Next, learn how apps get their configuration and secrets injected — the other half of running a real service: Kubernetes ConfigMaps & Secrets, In Depth: Injection, Mounting, Immutability & Encryption. After that, the Ingress and Gateway API lesson builds directly on the LoadBalancer and Service foundations from here to do host/path routing and TLS at the cluster edge.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
