
Designing Zero-Trust Pod Networking: Default-Deny NetworkPolicies and Cilium L7-Aware Rules

A flat pod network is the default failure mode of every Kubernetes cluster. Out of the box any pod can reach any other pod, any node, the API server, and the cloud metadata endpoint — which means one compromised container is one curl away from lateral movement across your entire estate. Zero-trust pod networking inverts that: deny everything, then allow named, intended flows. This guide builds a default-deny posture with stock NetworkPolicy, then extends it with Cilium’s identity model and L7-aware rules so you can write policy in terms of what a workload may do, not which ephemeral IP it happens to have today.

1. How NetworkPolicy actually works: additive allow, and the default-allow trap

NetworkPolicy is the upstream API, but it has two properties that bite everyone exactly once.

First, rules are purely additive whitelists. There is no deny rule. A policy only ever adds permitted traffic; the effective allowance for a pod is the union of every policy that selects it. You restrict traffic not by writing denials but by writing a policy that selects a pod and permits nothing.

Second — the trap — a pod is “default-allow” until at least one policy selects it for a given direction. The moment a single NetworkPolicy with policyTypes: [Ingress] selects a pod, that pod’s ingress flips to default-deny and only the listed rules are allowed. Egress is independent: selecting a pod for ingress does nothing to its egress. Each direction is gated separately.

Pod state                       | Ingress                    | Egress
--------------------------------+----------------------------+--------------------------
No policy selects it            | Allow all                  | Allow all
Selected by an Ingress policy   | Only listed ingress rules  | Still allow all
Selected by an Egress policy    | Still allow all            | Only listed egress rules
Selected by both                | Both locked down           | Both locked down
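
The two properties above (union of allows, and default-allow until a policy selects the pod) can be sketched as a toy evaluator. This is a simplified model for reasoning, not how any CNI implements enforcement: it handles ingress only, stays within a single namespace, and the Policy class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy stand-in for a NetworkPolicy in one namespace (hypothetical shape)."""
    pod_selector: dict          # matchLabels; {} selects every pod
    policy_types: tuple         # subset of ("Ingress", "Egress")
    ingress_from: list = field(default_factory=list)  # list of matchLabels dicts


def selects(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value in the selector must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policies, dst_labels, src_labels) -> bool:
    # Only policies that select the destination pod for Ingress matter.
    selecting = [p for p in policies
                 if "Ingress" in p.policy_types
                 and selects(p.pod_selector, dst_labels)]
    if not selecting:
        return True  # no policy selects the pod: default-allow
    # Selected at least once: default-deny, then the union of all allow rules.
    return any(selects(peer, src_labels)
               for p in selecting for peer in p.ingress_from)
```

With a deny-all policy (empty selector, no rules) plus one policy that lets app=backend reach app=db, the backend is allowed, the frontend is not, and an Egress-only policy leaves ingress untouched.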

The other detail people miss: NetworkPolicy is enforced by the CNI, not the API server. kubectl apply succeeds even if your CNI ignores the object entirely. Flannel, for instance, does not enforce policy at all — the YAML applies cleanly and does nothing. Confirm enforcement before you trust it.

Selectors resolve against pod labels, not names or IPs. An empty podSelector: {} selects every pod in the policy’s namespace — that is the lever you pull for a namespace-wide default-deny.

2. Roll out a namespace-scoped default-deny, safely

The safe ordering is: turn on default-deny for ingress first, verify nothing broke, then do egress — because egress lockdown breaks DNS, and a pod that cannot resolve names fails in confusing ways that look like application bugs.

Start with ingress only, scoped to one namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}          # every pod in this namespace
  policyTypes:
    - Ingress
  # no ingress rules => deny all ingress

Apply it in a non-prod namespace first, or ship it together with the ingress allow rules your services need: until those allows exist, every caller is blocked, including the intended ones. Once the ingress behavior matches what you expect, layer egress:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  # no egress rules => deny all egress (including DNS!)

The instant this lands, every pod in payments loses DNS, loses the API server, loses everything. That is correct — but only acceptable if you apply the allow-list from step 3 in the same change. Treat default-deny-egress and its DNS allow as an atomic unit; never merge one without the other.

A useful discipline: keep these two deny policies in every namespace as a baseline, managed by your platform layer (a Kustomize base or a Helm chart applied to all tenant namespaces), and let application teams add only allow policies on top.
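
One way to keep the baseline identical across namespaces is to generate it. The following is a minimal sketch, assuming a platform job that renders the two deny policies plus the DNS allow (the allow-dns-egress object from step 3) for each tenant namespace; the function names and the example namespace list are hypothetical. It emits a JSON List, which kubectl apply -f - accepts just like YAML.

```python
import json


def baseline_policies(namespace: str) -> list:
    """Return default-deny ingress, default-deny egress, and the DNS allow."""
    def policy(name, spec):
        return {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": name, "namespace": namespace},
            "spec": spec,
        }

    dns_peer = {
        "namespaceSelector": {
            "matchLabels": {"kubernetes.io/metadata.name": "kube-system"}},
        "podSelector": {"matchLabels": {"k8s-app": "kube-dns"}},
    }
    return [
        policy("default-deny-ingress",
               {"podSelector": {}, "policyTypes": ["Ingress"]}),
        policy("default-deny-egress",
               {"podSelector": {}, "policyTypes": ["Egress"]}),
        # Shipped in the same unit as the egress deny so DNS never breaks.
        policy("allow-dns-egress", {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [{
                "to": [dns_peer],
                "ports": [{"protocol": "UDP", "port": 53},
                          {"protocol": "TCP", "port": 53}],
            }],
        }),
    ]


def render(namespaces) -> str:
    """Render one v1 List document covering every namespace."""
    items = [p for ns in namespaces for p in baseline_policies(ns)]
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items},
                      indent=2)


if __name__ == "__main__":
    print(render(["payments", "shop"]))  # hypothetical tenant namespaces
```

Generating the deny and the DNS allow from one function keeps them an atomic unit by construction: there is no code path that emits one without the other.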

3. Allow DNS, kube-api, and metadata without opening the cluster

With egress denied, you must explicitly re-permit the handful of things almost every pod needs. The trick is to scope each one tightly.

DNS is the first casualty. CoreDNS runs in kube-system; allow egress to it on UDP/TCP 53. Because you need to select pods in another namespace, use namespaceSelector + podSelector together (an AND, not an OR, when in the same from/to element):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns        # CoreDNS keeps this legacy label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

The kubernetes.io/metadata.name label is set automatically by the API server on every namespace, so you can rely on it without labeling namespaces yourself.
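
The AND-versus-OR distinction is easy to get wrong with one stray dash in the YAML, so here is a toy model of it. This is a sketch for reasoning only: selectors are plain matchLabels dicts, and the function names are hypothetical.

```python
def match_labels(selector: dict, labels: dict) -> bool:
    """matchLabels semantics; an empty selector matches everything."""
    return all(labels.get(k) == v for k, v in selector.items())


def peer_matches(peer, pod_ns, ns_labels, pod_labels, policy_ns) -> bool:
    """One element of a to/from list: the selectors inside it are ANDed."""
    ns_sel = peer.get("namespaceSelector")
    pod_sel = peer.get("podSelector")
    if ns_sel is None:
        # podSelector on its own only reaches pods in the policy's namespace.
        ns_ok = pod_ns == policy_ns
    else:
        ns_ok = match_labels(ns_sel, ns_labels)
    pod_ok = True if pod_sel is None else match_labels(pod_sel, pod_labels)
    return ns_ok and pod_ok


def any_peer_matches(peers, **target) -> bool:
    """Separate list elements are ORed."""
    return any(peer_matches(p, **target) for p in peers)


NS_SEL = {"kubernetes.io/metadata.name": "kube-system"}
# One element with both selectors: only kube-dns pods in kube-system.
combined = [{"namespaceSelector": NS_SEL,
             "podSelector": {"k8s-app": "kube-dns"}}]
# Two elements: every pod in kube-system, OR kube-dns-labeled pods locally.
split = [{"namespaceSelector": NS_SEL},
         {"podSelector": {"k8s-app": "kube-dns"}}]
```

Evaluated against some other kube-system pod (say one labeled k8s-app=metrics-server), combined rejects it while split accepts it, which is exactly the accidental widening to watch for in review.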

The API server is reached via the in-cluster kubernetes.default.svc ClusterIP, which is a stable virtual IP but not a pod — so a podSelector cannot match it. Stock NetworkPolicy can only express this as an ipBlock to the service CIDR or the control-plane endpoint, which is brittle. This is the first concrete place plain NetworkPolicy runs out of road; Cilium has a first-class toEntities: [kube-apiserver] selector that solves it (step 5).

Cloud metadata (169.254.169.254) is the endpoint you most want to block by default — SSRF to it is how attackers steal node IAM credentials. With egress default-deny you get this for free: if no rule permits 169.254.169.254, it is denied. If a workload legitimately needs IMDS, allow it narrowly and prefer IMDSv2 / hop-limit hardening at the node level too:

  egress:
    - to:
        - ipBlock:
            cidr: 169.254.169.254/32
      ports:
        - protocol: TCP
          port: 80

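Be careful with ipBlock wherever NAT is involved: the NetworkPolicy API does not define whether policy is evaluated before or after source/destination rewriting, so the address ipBlock actually sees depends on the CNI and on how traffic enters or leaves the cluster. Do not express pod-to-pod traffic via ipBlock, because pod IPs are ephemeral; reserve ipBlock for genuinely external, stable CIDRs.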

4. The Cilium identity model: labels over IPs, and why it survives churn

Everything above leans on IPs for anything non-pod, and that is the structural weakness of NetworkPolicy. Cilium replaces IP-based matching with security identities: each unique set of security-relevant labels on a pod is allocated a numeric identity, and the eBPF datapath enforces policy on identity, not IP.

The payoff is direct. When a Deployment rolls and 30 pods get 30 new IPs, their identity is unchanged because their labels are unchanged. No policy update, no datapath churn, no window where a new pod IP is briefly unmatched. Identity also makes policy readable — you allow app=frontend to talk to app=backend, and that sentence is the policy.
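
Why identity survives churn can be shown with a toy model in which every unique label set gets one number. This is a sketch of the idea only, not Cilium's allocator: the class name, the starting ID, and the policy representation are all hypothetical.

```python
class IdentityAllocator:
    """Toy allocator: one numeric identity per unique label set."""

    def __init__(self, first_id: int = 1000):  # starting value is arbitrary
        self._next = first_id
        self._by_labels = {}

    def identity_for(self, labels: dict) -> int:
        key = frozenset(labels.items())  # order-independent label set
        if key not in self._by_labels:
            self._by_labels[key] = self._next
            self._next += 1
        return self._by_labels[key]


def allowed(policy: set, src_id: int, dst_id: int) -> bool:
    """Policy is a set of (source identity, destination identity) pairs."""
    return (src_id, dst_id) in policy
```

Note that pod IPs never appear in the model. A replacement pod with the same labels resolves to the same identity, so a policy entry written before the rollout still matches after it.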

Inspect identities directly on any node:

# list identities and the label sets that define them
kubectl -n kube-system exec ds/cilium -- cilium identity list

# what identity does this endpoint (pod) have?
kubectl -n kube-system exec ds/cilium -- cilium endpoint list

Cilium also ships reserved identities for things that are not pods: reserved:host (the node itself), reserved:remote-node, reserved:world (anything outside the cluster), reserved:kube-apiserver, and reserved:health. These are how you express “the API server” or “the internet” without hardcoding IPs — exactly the gap stock NetworkPolicy left open.

Cilium enforces stock NetworkPolicy objects too, so your step 2-3 work is not wasted. CiliumNetworkPolicy (CNP) is a superset you reach for when you need identity entities, L7 rules, or FQDN matching.

5. Writing CiliumNetworkPolicy with L7 HTTP and DNS FQDN rules

Two CNP capabilities are why teams adopt Cilium: L7 HTTP filtering and DNS-based egress.

L7 HTTP: method and path enforcement

A normal policy can say “frontend may reach backend on 8080.” An L7 policy says “frontend may only GET /api/v1/products on backend.” Cilium transparently redirects the matched traffic through its per-node Envoy and enforces the HTTP rules:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend-l7
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/v1/products"
              - method: "POST"
                path: "/api/v1/orders"

Anything outside those two method/path pairs is dropped at L7 with an HTTP 403 — the connection is allowed at L4 but the request is denied, which is a far better failure signal than a TCP reset. Note port is a string in CNP, a common gotcha versus the integer used in stock NetworkPolicy.
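
The decision the proxy makes for each request can be sketched as follows. This is a minimal model, assuming the method and path fields are treated as regular expressions that must match the whole value; the function name and return strings are hypothetical, and the real enforcement happens in Envoy, not in Python.

```python
import re

# The two method/path pairs from the frontend-to-backend-l7 policy.
HTTP_RULES = [("GET", "/api/v1/products"), ("POST", "/api/v1/orders")]


def l7_verdict(method: str, path: str, rules=HTTP_RULES) -> str:
    """Return 'forwarded' if any rule matches, else '403 Forbidden'."""
    for rule_method, rule_path in rules:
        # Assumption: both fields are regexes anchored to the full value.
        if re.fullmatch(rule_method, method) and re.fullmatch(rule_path, path):
            return "forwarded"
    return "403 Forbidden"
```

Under that full-match assumption, a request for /api/v1/products/42 is rejected by the literal path above; a rule meant to cover item URLs would need a pattern such as /api/v1/products/.* instead.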

DNS-aware egress: allow by FQDN

For egress to external services, IPs are hopeless — api.stripe.com resolves to a rotating CDN range. Cilium solves this by observing DNS responses and pinning the returned IPs to an FQDN rule. You allow the DNS lookup and the destination by name:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: egress-to-stripe
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: checkout
  egress:
    # 1. permit DNS to kube-dns AND snoop the answers
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
          rules:
            dns:
              - matchPattern: "*.stripe.com"
    # 2. permit egress to whatever those names resolved to
    - toFQDNs:
        - matchName: "api.stripe.com"
        - matchPattern: "*.stripe.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP

The two blocks are both required: the dns proxy rule lets Cilium see the resolution and learn the IPs; toFQDNs then permits traffic to exactly those IPs for as long as the DNS answer’s TTL keeps the name-to-IP mapping alive. Without the DNS visibility rule, toFQDNs has nothing to pin and the connection is denied.

FQDN policy enforces on the IPs Cilium observed in DNS answers. If a pod hardcodes an IP or uses its own resolver that bypasses the proxy, toFQDNs cannot match it. Force all DNS through CoreDNS and the Cilium DNS proxy, or the model leaks.
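
The learn-then-allow behavior, and the hardcoded-IP leak, can be illustrated with a toy cache. This is a sketch under stated assumptions, not Cilium's implementation: it assumes "*" in matchPattern matches DNS characters within a single label, it ignores any minimum-TTL or connection-tracking behavior, and the class and method names are hypothetical. The IPs in the usage below come from documentation-reserved ranges.

```python
import re


def pattern_to_regex(pattern: str):
    """Assumed matchPattern semantics: '*' matches DNS characters, not dots."""
    parts = [re.escape(p) for p in pattern.lower().split("*")]
    return re.compile("[-a-z0-9_]*".join(parts))


class FqdnCache:
    """IPs learned from observed DNS answers, each with an expiry time."""

    def __init__(self, patterns):
        self._regexes = [pattern_to_regex(p) for p in patterns]
        self._expiry = {}  # ip -> absolute expiry time in seconds

    def observe_answer(self, name, ips, ttl, now):
        """Record the answer only if the queried name matches a pattern."""
        name = name.lower().rstrip(".")
        if any(r.fullmatch(name) for r in self._regexes):
            for ip in ips:
                self._expiry[ip] = max(self._expiry.get(ip, 0), now + ttl)

    def egress_allowed(self, ip, now) -> bool:
        """Allowed only while a matching DNS answer for this IP is fresh."""
        return self._expiry.get(ip, 0) > now
```

An IP that was never returned in an observed answer, such as one baked into application config, is simply absent from the cache, so it is denied no matter which name it belongs to.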

The API-server problem from step 3 also disappears here:

  egress:
    - toEntities:
        - kube-apiserver

6. Observe allowed and dropped flows with Hubble

You cannot write tight policy blind. Hubble is Cilium’s flow observability layer; it shows you the verdict (FORWARDED / DROPPED) and the reason for every flow, which turns policy debugging from guesswork into reading.

Enable it (if not already) and open the relay:

cilium hubble enable --ui            # one-time, via the cilium CLI
cilium hubble port-forward &          # exposes the relay locally

Then watch live, filtered to what you care about:

# every dropped flow in a namespace — your policy-gap finder
hubble observe --namespace payments --verdict DROPPED --follow

# why was a specific pod denied? show the policy verdict + L7
hubble observe --pod payments/checkout-7d9 --verdict DROPPED -o json

# confirm an L7 rule is doing what you think
hubble observe --protocol http --to-label app=backend --follow

A DROPPED flow with reason Policy denied and a source/destination identity tells you exactly which allow rule is missing. This is the loop: apply default-deny, drive real traffic, watch DROPPED, add the minimal allow, repeat until the drop stream is quiet except for genuine intrusions.
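
To turn the drop stream into a to-do list of missing allows, the JSON output can be grouped by source, destination, and port. The following is a minimal sketch; the field names (flow.verdict, flow.source.pod_name, flow.l4.<PROTO>.destination_port, flow.IP.destination) are assumptions about the shape of hubble observe -o json output and should be checked against a real line from your cluster before relying on it.

```python
import json
from collections import Counter


def summarize_drops(lines):
    """Count DROPPED flows per (source pod, destination, proto/port)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        if flow.get("verdict") != "DROPPED":
            continue
        src = flow.get("source", {})
        dst = flow.get("destination", {})
        l4 = flow.get("l4", {})
        proto = next(iter(l4), "?")  # e.g. "TCP" or "UDP"
        port = l4.get(proto, {}).get("destination_port", "?")
        src_name = f'{src.get("namespace", "?")}/{src.get("pod_name", "?")}'
        # Fall back to the raw IP when the destination is not a pod.
        dst_name = dst.get("pod_name") or flow.get("IP", {}).get("destination", "?")
        counts[(src_name, dst_name, f"{proto}/{port}")] += 1
    return counts
```

A typical use would be to pipe hubble observe --namespace payments --verdict DROPPED -o json into a small script that calls summarize_drops(sys.stdin) and prints counts.most_common(); each distinct key is one candidate allow rule to review.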

7. Cluster-wide policy, host firewall, and external CIDR egress

Per-namespace policy does not cover non-namespaced concerns. Cilium adds two cluster-scoped tools.

CiliumClusterwideNetworkPolicy (CCNP) has no namespace and applies across the cluster — ideal for platform-wide guardrails like “no workload may reach the metadata IP, ever,” which no tenant policy can override:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: deny-cloud-metadata
spec:
  endpointSelector: {}              # all endpoints, all namespaces
  egressDeny:
    - toCIDR:
        - 169.254.169.254/32

Note egressDeny — CNP/CCNP do support explicit deny rules (unlike stock NetworkPolicy), and deny takes precedence over any allow. This is how you write non-negotiable backstops.
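
The precedence rule is simple enough to state as code. This is a toy model of the ordering only, assuming CIDR-based rules and an endpoint that is already default-deny for egress; the function names are hypothetical.

```python
import ipaddress


def in_any(ip: str, cidrs) -> bool:
    """True if ip falls inside at least one of the given CIDRs."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)


def egress_verdict(ip: str, allow_cidrs, deny_cidrs) -> str:
    """Deny is checked first, so no allow can override it."""
    if in_any(ip, deny_cidrs):
        return "DROPPED"
    if in_any(ip, allow_cidrs):
        return "FORWARDED"
    return "DROPPED"  # assumes the endpoint is already default-deny for egress
```

Even if a tenant adds an allow for 0.0.0.0/0, a platform deny on 169.254.169.254/32 still drops traffic to the metadata IP while everything else the tenant allowed goes through.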

The host firewall extends policy to the node itself. By default Cilium policies govern pod (endpoint) traffic; the node’s host network is separate. Enable hostFirewall and use a CCNP with nodeSelector to lock down what can reach node ports such as the kubelet (10250) or SSH:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-fw-lockdown
spec:
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  ingress:
    - fromEntities:
        - remote-node              # other cluster nodes
        - kube-apiserver
      toPorts:
        - ports:
            - port: "10250"
              protocol: TCP

For egress to external CIDRs, prefer toFQDNs when you can, and toCIDR / toCIDRSet (with except for carve-outs) when you are pinned to known IP ranges — a partner’s static VPN range, say:

  egress:
    - toCIDRSet:
        - cidr: 10.20.0.0/16
          except:
            - 10.20.5.0/24         # this subnet stays denied
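
The carve-out semantics of cidr plus except can be checked with the standard library before a policy ships. This is a minimal sketch of the matching logic only; the function name is hypothetical.

```python
import ipaddress


def cidr_set_matches(ip: str, cidr: str, except_cidrs=()) -> bool:
    """True if ip is inside cidr but outside every carve-out."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(cidr):
        return False
    return not any(addr in ipaddress.ip_network(e) for e in except_cidrs)
```

With the values from the fragment above, 10.20.1.10 matches the rule, 10.20.5.10 falls in the carve-out and does not, and 10.21.0.1 is outside the range entirely.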

Verify

Prove the posture rather than assume it.

# 1. A pod with NO allow rules cannot egress (should hang/fail):
kubectl -n payments run probe --image=nicolaka/netshoot --rm -it --restart=Never \
  -- curl -m 5 https://example.com ; echo "exit=$?"
# expect a timeout / non-zero exit

# 2. DNS works (you allowed it); the HTTPS call succeeds only once the FQDN policy is applied:
kubectl -n payments exec deploy/checkout -- nslookup api.stripe.com   # resolves
kubectl -n payments exec deploy/checkout -- curl -m 5 https://api.stripe.com   # allowed only if FQDN policy applied

# 3. L7 enforcement: the wrong path is 403, the right one is 200:
kubectl -n shop exec deploy/frontend -- curl -s -o /dev/null -w "%{http_code}\n" \
  http://backend:8080/api/v1/products      # 200
kubectl -n shop exec deploy/frontend -- curl -s -o /dev/null -w "%{http_code}\n" \
  http://backend:8080/admin                # 403

# 4. Watch the verdicts that explain all of the above:
hubble observe --namespace payments --verdict DROPPED --last 50

# 5. Validate a policy parses before you ship it:
kubectl apply --dry-run=server -f policy.yaml

If step 1 succeeds (exit=0 and a page body instead of a timeout), your default-deny-egress is not in effect (or your CNI is not enforcing). If step 3’s /admin returns 200, L7 redirection is not active for that endpoint: check that the CNP’s endpointSelector actually selects the backend pod and that the toPorts entry carrying rules.http matches the port being called.

Enterprise scenario

A fintech platform team ran a 200-namespace, multi-tenant cluster on Cilium. A PCI audit required that the cardholder-data environment (CDE) namespaces could egress only to a named token-vault service and the payment processor, and that this could not be loosened by a tenant. Their first attempt used per-namespace CiliumNetworkPolicy with toFQDNs, but two findings broke it: (1) one tenant added a permissive egress policy in their own CDE namespace that widened the allow-list, and (2) an app used a baked-in IP for the processor, which toFQDNs could not match, so it silently relied on a leftover allow-all during a migration window.

The fix was a two-layer model. They moved the hard boundary into a CiliumClusterwideNetworkPolicy with an explicit egressDeny that no namespace policy can override, scoped by a cde=true label the platform applies to namespaces (tenants cannot relabel their own namespace — that label is managed by the platform’s admission policy). Tenants could still author allow rules, but never escape the deny. They also forced all DNS through the Cilium proxy and used Hubble’s DROPPED stream in CI to catch the hardcoded-IP app before promotion.

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: cde-egress-lockdown
spec:
  endpointSelector:
    matchLabels:
      io.cilium.k8s.namespace.labels.cde: "true"
  # explicit deny wins over any tenant allow
  egressDeny:
    - toEntities:
        - world
  egress:
    - toFQDNs:
        - matchName: "vault.internal.example.com"
        - matchName: "api.processor.example.com"
      toPorts:
        - ports: [{ port: "443", protocol: TCP }]
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports: [{ port: "53", protocol: UDP }]
          rules:
            dns:
              - matchPattern: "*.internal.example.com"
              - matchPattern: "*.processor.example.com"

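The egressDeny: toEntities: [world] is the backstop; the egress allows are the intended exits. One caveat when adapting this: because deny always takes precedence over allow, a deny on the whole world entity also covers any toFQDNs destination that resolves to an address outside the cluster. If the vault or the processor is external, narrow the deny (for example, toCIDRSet with except for their published ranges) so that the allows can take effect. Because the boundary lives in a cluster-wide object keyed off a platform-managed label, tenant changes cannot regress it, which is precisely what the auditor wanted to see in writing.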
