Kubernetes Storage, In Depth: Volumes, PV, PVC, StorageClass & Access Modes

Containers are deliberately forgetful. When a container restarts, its writable layer is thrown away and recreated from the image; when a Pod is rescheduled to another node, anything it wrote to the container filesystem is gone. That amnesia is exactly what makes stateless apps so easy to operate — but the moment you run a database, a message broker, a file upload service, or anything that must remember something across restarts, you have to give Kubernetes a way to attach storage that outlives the container. This lesson is the complete tour of how Kubernetes does that.

We will start with volumes — storage attached to a Pod — and work through every ephemeral type you will actually use: emptyDir, the config-injection volumes (configMap, secret, downwardAPI, projected), and the newer generic ephemeral volumes. Then we move to persistent storage, where the real depth lives: the PersistentVolume (PV) and PersistentVolumeClaim (PVC) pair, every field on each, the four access modes (RWO, ROX, RWX, RWOP), the three reclaim policies, volumeMode, and how a claim binds to a volume. From there we cover the StorageClass and dynamic provisioning — how Kubernetes creates disks on demand — the CSI (Container Storage Interface) model that drives all modern storage, and the day-two operations that matter: volume snapshots, cloning, and online resize. We finish with subPath and the link to StatefulSet volumeClaimTemplates. By the end you will understand every field you are likely to set, why it is there, and the gotcha that bites people who set it wrong.

Learning objectives

By the end of this lesson you can:

Explain the difference between ephemeral and persistent storage in Kubernetes, and pick the right volume type for a given need.
Use every common ephemeral volume — emptyDir (including medium: Memory), configMap, secret, downwardAPI, projected, and generic ephemeral volumes — and explain what each is for.
Describe the PV ↔ PVC relationship, the binding lifecycle, and every important field on both objects.
Choose the correct access mode (RWO / ROX / RWX / RWOP) and reclaim policy (Retain / Delete / Recycle) for a workload, and explain volumeMode: Filesystem vs Block.
Author a StorageClass and use dynamic provisioning, and explain volumeBindingMode, allowVolumeExpansion, reclaimPolicy, parameters, and allowedTopologies.
Explain the CSI model and perform snapshots, clones, and online resize.
Use subPath safely and wire persistent storage into a StatefulSet with volumeClaimTemplates.

Prerequisites & where this fits

You need a working local cluster and basic comfort with kubectl and YAML — if Pods and Deployments are still new, read Pods, ReplicaSets, Deployments & Services first, and make sure you have a free local cluster running per What Is Kubernetes?. It also helps (but is not required) to have seen ConfigMaps & Secrets, because two of the ephemeral volume types simply mount those objects. This is the storage foundation of the Kubernetes Zero-to-Hero course: every stateful lesson later on — StatefulSets, the Postgres operator, CSI snapshots at scale — assumes you know the material here cold. After this lesson you will move on to Ingress, controllers and TLS.

Core concepts: the storage mental model

Before the fields, fix four ideas in your head. They explain everything that follows.

1. A volume is mounted into a Pod, not into a container. You declare volumes in spec.volumes at the Pod level, then each container mounts the ones it needs via volumeMounts. This is precisely how containers in the same Pod share files: they mount the same volume.

2. There are two lifetimes — and that is the whole taxonomy. An ephemeral volume lives and dies with the Pod (some die with the container). A persistent volume lives independently of any Pod: delete the Pod, the data stays; a new Pod can mount it again. Almost every storage decision starts with “does this data need to survive the Pod?”

3. Persistent storage uses a claim-check pattern. Application authors do not want to know whether the cluster runs on AWS EBS, Google Persistent Disk, Ceph, or an NFS server. So Kubernetes splits the concern in two: the PersistentVolume (PV) is the actual piece of storage (the cluster/admin concern), and the PersistentVolumeClaim (PVC) is a request for storage (the app author’s concern). A Pod references a PVC by name; Kubernetes binds that claim to a suitable PV. It is the same idea as a coat-check: you hand over a claim ticket, the system finds your coat.

4. Modern storage is plugged in via CSI. Kubernetes itself does not know how to create an AWS disk or talk to NetApp. A CSI driver, a vendor-written plugin, does that; Kubernetes just calls a standard interface. Every piece of “magic” you will see (a PVC turning into a real cloud disk in seconds) is a StorageClass pointing at a CSI provisioner.

Jargon check. Provisioning means creating the underlying storage. Static provisioning = an admin creates PVs by hand in advance. Dynamic provisioning = Kubernetes creates a PV automatically the moment a PVC asks for one, using a StorageClass. Dynamic provisioning is what you will use in the vast majority of cases.

Ephemeral volumes: storage tied to the Pod

Ephemeral volumes need no PV or PVC — you declare them inline in the Pod spec and they exist only as long as the Pod (or, for some, the container) does. Here are all the ones you will use.

emptyDir — scratch space shared in a Pod

An emptyDir is created empty when the Pod is assigned to a node and exists as long as that Pod runs on that node. It is deleted permanently when the Pod is removed from the node. It is the canonical way for two containers in one Pod to share files, and for a single container to get scratch space.

apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/file && sleep 3600"]
      volumeMounts:
        - name: cache
          mountPath: /data
  volumes:
    - name: cache
      emptyDir:
        sizeLimit: 1Gi        # optional cap; eviction if exceeded
        medium: ""            # "" = node disk (default); "Memory" = tmpfs (RAM)

Field	What it does	Values	Default	Gotcha
`medium`	Where the dir is backed	`""` (node disk) or `"Memory"`	`""`	`Memory` is a tmpfs in RAM — fast, but counts against the container’s memory limit and is lost on reboot.
`sizeLimit`	Caps total size	quantity (e.g. `1Gi`)	unlimited	Exceeding it makes the Pod a candidate for eviction, not a hard write error.

An emptyDir survives a container crash or restart within the same Pod (the data lives at Pod level), but not a Pod reschedule. Use it for caches, scratch space, and sharing between sidecars, never for data you must keep.

configMap and secret — injecting configuration as files

These mount the keys of a ConfigMap or Secret as files. Each top-level key becomes a filename; the value becomes the file contents. This is how you ship config files and credentials into a container without baking them into the image.

volumes:
  - name: app-config
    configMap:
      name: my-config        # the ConfigMap to mount
      defaultMode: 0644       # file permissions (octal); default 0644
      optional: false         # if true, Pod starts even when ConfigMap is missing
      items:                  # optional: project only specific keys, rename them
        - key: app.properties
          path: conf/app.properties   # relative to mountPath
  - name: app-secret
    secret:
      secretName: my-secret
      defaultMode: 0400       # secrets often 0400 (owner-read only)
      optional: true

Field	Applies to	What it does	Gotcha
`defaultMode`	both	Default permission bits for projected files (octal, e.g. `0644`)	When `fsGroup` or non-root users are involved, modes interact with ownership; secrets that scripts must read sometimes need `0444`/`0440`.
`items`	both	Select a subset of keys and rename/relocate them	If you use `items`, only the listed keys appear — keys you forget are silently absent.
`optional`	both	Pod may start even if the object is missing	Default is `false` for these in a volume — a missing ConfigMap/Secret blocks Pod start.

A subtle but important behaviour: mounted ConfigMaps and Secrets are updated in place when the source object changes (eventually — via kubelet sync, typically tens of seconds), except when mounted with subPath (covered later) or when the object is marked immutable. Values consumed as environment variables, by contrast, are not live-updated — only the volume form is.
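The immutable case mentioned above can be sketched as follows. This is a minimal example, assuming a hypothetical ConfigMap name and an illustrative key; once immutable is set to true the data can no longer be edited, so a config change means creating a new object under a new name and rolling the Pods onto it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config-v2          # hypothetical name; a version suffix is a common convention
immutable: true               # data can no longer be edited; delete and recreate to change it
data:
  app.properties: |
    log.level=info
```

Pods that mount this object never see an in-place update, which is the trade-off you accept in exchange for protection against accidental edits.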

downwardAPI — exposing Pod metadata as files

The downward API lets a container read information about itself — labels, annotations, the Pod name, namespace, resource limits — as files (or env vars). Useful when an app needs its own identity without calling the API server.

volumes:
  - name: podinfo
    downwardAPI:
      items:
        - path: "labels"
          fieldRef:
            fieldPath: metadata.labels
        - path: "cpu_limit"
          resourceFieldRef:
            containerName: app
            resource: limits.cpu
            divisor: "1m"

You can expose metadata.name, metadata.namespace, metadata.uid, metadata.labels, metadata.annotations via fieldRef, and requests/limits for cpu/memory/ephemeral-storage via resourceFieldRef. Labels and annotations exposed via a volume are updated live when they change; the same data exposed as env vars is fixed at start.
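The contrast between the two forms can be seen side by side in one Pod. The sketch below is illustrative only: the Pod name, label, and mount path are assumptions, not part of the lesson's lab. The Pod name is injected as an environment variable (fixed at start), while the labels are mounted as a file (refreshed when labels change).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-env-demo       # hypothetical name
  labels:
    app: demo
spec:
  containers:
    - name: app
      image: busybox
      # $POD_NAME is expanded by the shell inside the container
      command: ["sh", "-c", "echo $POD_NAME; cat /etc/podinfo/labels; sleep 3600"]
      env:
        - name: POD_NAME        # env var form: value is fixed when the container starts
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:              # volume form: the file is updated if labels change
        items:
          - path: "labels"
            fieldRef:
              fieldPath: metadata.labels
```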

projected — combining several sources into one directory

A projected volume merges multiple sources — configMap, secret, downwardAPI, and serviceAccountToken — into a single directory. The killer feature is serviceAccountToken, which mounts a short-lived, audience-scoped, auto-rotated token for the Pod’s ServiceAccount. This is the modern, secure way Pods authenticate to the API server (and the pattern external secret stores build on).

volumes:
  - name: combined
    projected:
      defaultMode: 0440       # octal: read-only for owner and group
      sources:
        - serviceAccountToken:
            path: token
            audience: vault           # who the token is for
            expirationSeconds: 3600   # min 600; kubelet rotates before expiry
        - configMap:
            name: app-config
        - secret:
            name: app-secret
        - downwardAPI:
            items:
              - path: "namespace"
                fieldRef:
                  fieldPath: metadata.namespace

Generic ephemeral volumes — per-Pod volumes with full storage features

Sometimes you want scratch space that is bigger than node disk allows, on a specific StorageClass, or even snapshottable — but still tied to the Pod lifetime. That is a generic ephemeral volume: it dynamically provisions a real volume (via a StorageClass and CSI driver) that is created when the Pod starts and deleted when the Pod is removed. It gives you the power of persistent storage with ephemeral semantics.

spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:
          metadata:
            labels: { type: scratch }
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: "fast-ssd"
            resources:
              requests:
                storage: 20Gi

Behind the scenes Kubernetes creates a PVC named <pod-name>-<volume-name>, owned by the Pod, so it is garbage-collected automatically when the Pod dies. Contrast with CSI ephemeral inline volumes (csi: in volumes), which let specialised drivers (e.g. secrets-store CSI) inject ephemeral data directly — those do not use a PVC at all and are driver-specific.

Container Storage Interface ephemeral note. There are two distinct “ephemeral + CSI” things: generic ephemeral volumes (above, use any provisioner, full PVC features) and CSI ephemeral inline volumes (driver-provided, lightweight, e.g. mounting secrets). Reach for generic ephemeral when you want normal storage that just happens to be Pod-scoped.
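For comparison with the generic ephemeral volume shown earlier, a CSI ephemeral inline volume might look like the sketch below. It assumes the Secrets Store CSI driver is installed in the cluster and that a SecretProviderClass exists; the Pod name, mount path, and provider name are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inline-csi-demo                      # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: secrets
          mountPath: /mnt/secrets
          readOnly: true
  volumes:
    - name: secrets
      csi:                                   # inline CSI volume: no PVC, no StorageClass
        driver: secrets-store.csi.k8s.io     # assumes this driver is installed
        readOnly: true
        volumeAttributes:                    # driver-specific, opaque to Kubernetes
          secretProviderClass: my-provider   # hypothetical SecretProviderClass
```

Note that nothing here mentions size, access modes, or a class: the driver decides what appears at the mount path.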

PersistentVolume and PersistentVolumeClaim

Now the core. A PersistentVolume (PV) is a cluster resource representing a real piece of storage; a PersistentVolumeClaim (PVC) is a namespaced request that binds to a PV. Pods reference the PVC, never the PV directly.

The PersistentVolume spec — every field

Here is a statically-defined PV (e.g. an admin wiring up an existing NFS export or a pre-created cloud disk). With dynamic provisioning you rarely write these by hand, but you must be able to read one.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data
spec:
  capacity:
    storage: 100Gi                 # how much storage this PV offers
  volumeMode: Filesystem           # Filesystem (default) | Block
  accessModes:
    - ReadWriteOnce                # how it can be mounted (see access modes)
  persistentVolumeReclaimPolicy: Retain   # Retain | Delete | Recycle(deprecated)
  storageClassName: ""             # "" = no class; matched by PVCs asking for ""
  mountOptions:                    # passed to the mount command (driver-dependent)
    - hard
    - nfsvers=4.1
  nodeAffinity:                    # restrict which nodes can use it (local/zonal)
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["eu-west-1a"]
  nfs:                             # the actual storage backend (one of many)
    server: 10.0.0.10
    path: /exports/data

Field	What it does	Values	Notes / gotcha
`capacity.storage`	Size the PV advertises	quantity (`100Gi`)	A PVC binds only if PV capacity ≥ the request. Statically, exact-fit is wise; dynamically, the PV is created at the requested size.
`volumeMode`	Filesystem vs raw block	`Filesystem` \| `Block`	`Block` exposes a raw device (no filesystem) for DBs that manage their own; mount via `volumeDevices`, not `volumeMounts`.
`accessModes`	How it may be mounted	RWO / ROX / RWX / RWOP	A PV may list several modes, but it is mounted with only one of them at a time; the backend must actually support every mode listed.
`persistentVolumeReclaimPolicy`	What happens to data when the PVC is deleted	`Retain` \| `Delete` \| `Recycle`	Dynamically-provisioned PVs inherit this from the StorageClass (default `Delete`).
`storageClassName`	Class this PV belongs to	string or `""`	Must match the PVC’s `storageClassName` for binding. `""` ≠ unset.
`mountOptions`	Extra mount flags	list	Not validated by Kubernetes; an invalid option fails the mount at attach time.
`nodeAffinity`	Which nodes can access it	node selector	Required for local and zonal volumes so the scheduler co-locates the Pod with the disk.
backend (`nfs`, `csi`, `hostPath`, …)	The actual storage source	one block	Exactly one. Modern PVs use `csi:`; `hostPath` is single-node/testing only; in-tree cloud types like `awsElasticBlockStore` have been deprecated or removed in favour of CSI.

A PV moves through phases you will see in kubectl get pv: Available (free, unbound), Bound (matched to a PVC), Released (the PVC was deleted but the PV is not yet reclaimed — common with Retain), and Failed (automatic reclamation failed). A Released PV is not automatically reusable: with Retain you must manually scrub the data and clear spec.claimRef to make it Available again.

The PersistentVolumeClaim spec — every field

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 20Gi          # minimum size required
    # limits:                # optional upper bound (rarely used)
    #   storage: 20Gi
  storageClassName: fast-ssd # which StorageClass to provision from
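  # The optional fields below are shown together for reference only; a real claim
  # rarely sets more than one of selector / volumeName / dataSource.
  selector:                  # static binding only: match PVs with these labels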
    matchLabels:
      tier: gold
  volumeName: pv-data        # optional: bind to a specific PV by name
  dataSource:                # optional: clone from a PVC or restore a snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
    name: nightly-snap

Field	What it does	Gotcha
`accessModes`	Modes the claim needs	Must be satisfiable by the bound PV / provisioner.
`volumeMode`	Filesystem or Block	Must match the PV’s mode to bind.
`resources.requests.storage`	Minimum capacity	The PVC may bind to a larger PV (static) or provision exactly this (dynamic). To grow later, edit this field — see resize.
`storageClassName`	Class to use	Omitting it uses the cluster’s default StorageClass; setting `""` explicitly disables dynamic provisioning (static binding only). These two are different!
`selector`	Label-match a specific PV	Static binding only: a PVC with a non-empty `selector` cannot have a PV dynamically provisioned for it.
`volumeName`	Bind to one named PV	Pre-binding; the named PV must match modes/size or the claim stays `Pending`.
`dataSource` / `dataSourceRef`	Populate from a snapshot or clone a PVC	`dataSourceRef` is the newer, more general form: it also accepts custom volume populators and, behind a feature gate, sources in another namespace.

A Pod consumes the claim like this:

spec:
  containers:
    - name: app
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim     # reference the PVC by name
        readOnly: false

Binding: how a claim finds its volume

When you create a PVC, the control plane tries to bind it:

Dynamic path (the common one): the PVC names a StorageClass (or uses the default). The class’s provisioner creates a brand-new PV sized to the request and binds it. With volumeBindingMode: WaitForFirstConsumer, this is delayed until a Pod using the PVC is scheduled, so the volume lands in the right zone/node.
Static path: no provisioner. Kubernetes searches existing Available PVs for one whose storageClassName, accessModes, volumeMode, capacity (≥ request), and any selector/volumeName all match, and binds it.

Binding is one-to-one and exclusive — a bound PV serves exactly one PVC. If nothing matches, the PVC sits in Pending until a suitable PV appears (static) or provisioning succeeds (dynamic). kubectl describe pvc <name> shows the events that explain a stuck claim.

The “different storageClassName” trap, stated plainly. Omit storageClassName → use the default class (dynamic). Set it to a name → use that class (dynamic). Set it to "" (empty string) → no dynamic provisioning; bind only to a pre-created PV that also has "". Mixing these up is the number-one reason a PVC is unexpectedly Pending (or unexpectedly provisions a disk you did not want).
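The three cases differ by a single line, which is easy to miss in review. The claims below are a sketch with hypothetical names and sizes; only the storageClassName handling changes between them.

```yaml
# 1. storageClassName omitted: the cluster's default StorageClass provisions the volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-default          # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
# 2. storageClassName set to a name: that class provisions the volume
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-named            # hypothetical name
spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
---
# 3. storageClassName set to "": no dynamic provisioning, static binding only
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-static           # hypothetical name
spec:
  storageClassName: ""         # binds only to a pre-created PV with no class
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
```

If no matching pre-created PV exists, the third claim stays Pending indefinitely while the first two provision a disk.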

Access modes: RWO, ROX, RWX, RWOP

Access modes describe how many nodes can mount a volume and in what way. They describe what the backend is capable of: it must support the mode you ask for. Asking for ReadWriteMany on a plain block device (EBS, GCE PD) will not work because such devices attach to one node at a time.

Mode	Short	Meaning	Typical backends	When to use
`ReadWriteOnce`	RWO	Read-write by one node (many Pods on that node may share it)	Cloud block disks (EBS, GCE PD, Azure Disk), Ceph RBD	Databases, single-writer apps — the default and most common.
`ReadOnlyMany`	ROX	Read-only by many nodes at once	NFS, CephFS, object-backed FS, pre-loaded disks	Shared read-only data (static assets, ML model artefacts).
`ReadWriteMany`	RWX	Read-write by many nodes at once	NFS, CephFS, Azure Files, EFS	Shared upload dirs, CMS media, anything multiple Pods on different nodes must write.
`ReadWriteOncePod`	RWOP	Read-write by exactly one Pod in the whole cluster	CSI drivers supporting it (beta in k8s 1.27, GA in 1.29)	Strict single-writer guarantee: leader-only databases where even two Pods writing would corrupt data.

Two clarifications people miss. First, RWO is per-node, not per-Pod: several Pods scheduled to the same node can all mount one RWO volume, which is why RWOP exists when you need a true single-Pod lock. Second, access modes are used to match claims to volumes and to constrain where a volume may be attached; they do not add write protection once the volume is mounted, so the underlying storage genuinely has to support concurrent access for RWX/ROX.
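A claim that asks for the single-Pod lock might look like this. It is a sketch only: the claim name and size are hypothetical, and it assumes the fast-ssd class is backed by a CSI driver that supports ReadWriteOncePod.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim    # hypothetical name
spec:
  accessModes:
    - ReadWriteOncePod         # listed on its own; it is not combined with other modes
  storageClassName: fast-ssd   # assumes the backing CSI driver supports RWOP
  resources:
    requests:
      storage: 10Gi
```

A second Pod that references this claim is expected to stay unscheduled until the first one is gone, even if it would land on the same node.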

Reclaim policy and volumeMode

Reclaim policy decides what happens to the underlying storage when its PVC is deleted:

Policy	What happens on PVC deletion	Use when
`Delete`	The PV and the backing storage (cloud disk, etc.) are deleted	Default for dynamic provisioning; fine for reproducible data. Dangerous for anything you cannot lose.
`Retain`	The PV becomes Released; data is kept; an admin must manually reclaim	Production databases and anything where accidental PVC deletion must not destroy data.
`Recycle`	(Deprecated) basic scrub (`rm -rf`) then made Available	Do not use; deprecated in favour of dynamic provisioning.

For dynamically-provisioned volumes the policy comes from the StorageClass (reclaimPolicy), defaulting to Delete. A common production pattern is a StorageClass with reclaimPolicy: Retain for stateful data so a fat-fingered kubectl delete pvc does not wipe the disk.
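The production pattern described above could be sketched like this. The class name is hypothetical, and the provisioner and parameters assume the AWS EBS CSI driver; substitute whatever driver your cluster runs.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-retain                 # hypothetical name
provisioner: ebs.csi.aws.com            # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
reclaimPolicy: Retain                   # PVs created from this class keep their data on PVC deletion
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

The policy is stamped onto each PV at creation time, so PVs that already exist keep whatever policy they were created with; changing those means editing persistentVolumeReclaimPolicy on the PV itself.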

volumeMode controls the abstraction the Pod sees:

Filesystem (default): Kubernetes formats and mounts a filesystem; the container sees a directory at mountPath.
Block: the container gets a raw block device (no filesystem) referenced via volumeDevices + devicePath. Databases that manage their own I/O layout (some configurations of Oracle, Cassandra, certain HPC workloads) want raw block for performance.
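A raw block claim and its consumer can be sketched as below. The names and the device path are hypothetical, and the example assumes the fast-ssd class is backed by a driver that supports block mode.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-claim               # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  volumeMode: Block             # request a raw device instead of a filesystem
  storageClassName: fast-ssd    # assumes the driver supports block volumes
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: raw-consumer            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeDevices:            # volumeDevices, not volumeMounts
        - name: raw
          devicePath: /dev/xvda # path at which the device appears inside the container
  volumes:
    - name: raw
      persistentVolumeClaim:
        claimName: raw-claim
```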

StorageClass and dynamic provisioning

A StorageClass is the template that turns a PVC into a real PV automatically. It names a provisioner (a CSI driver) and the parameters that driver needs (disk type, IOPS, filesystem, encryption keys), plus policy fields.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # makes this the default
provisioner: ebs.csi.aws.com          # the CSI driver that creates volumes
parameters:                            # driver-specific (NOT validated by k8s)
  type: gp3
  iops: "5000"
  throughput: "250"
  encrypted: "true"
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete                  # Delete (default) | Retain
volumeBindingMode: WaitForFirstConsumer  # Immediate | WaitForFirstConsumer
allowVolumeExpansion: true             # permit growing PVCs later
mountOptions:
  - noatime
allowedTopologies:                     # restrict where volumes are created
  - matchLabelExpressions:
      - key: topology.kubernetes.io/zone
        values: ["eu-west-1a", "eu-west-1b"]

Field	What it does	Values	Default	When to set / gotcha
`provisioner`	Which driver creates volumes	CSI driver name (e.g. `ebs.csi.aws.com`, `disk.csi.azure.com`, `pd.csi.storage.gke.io`) or `kubernetes.io/no-provisioner` for static-only	—	Must match an installed driver. `no-provisioner` is used for local volumes you create by hand.
`parameters`	Driver-specific options	key/values	none	Opaque to Kubernetes — typos are caught only when provisioning fails. Check the driver’s docs.
`reclaimPolicy`	Policy stamped onto provisioned PVs	`Delete` \| `Retain`	`Delete`	Use `Retain` for irreplaceable data.
`volumeBindingMode`	When binding/provisioning happens	`Immediate` \| `WaitForFirstConsumer`	`Immediate`	Use `WaitForFirstConsumer` for zonal block storage so the disk is created in the zone the Pod lands in — otherwise Pods become unschedulable across zones.
`allowVolumeExpansion`	Whether PVCs can grow	`true` \| `false`	`false`	Must be `true` before you try to resize. Set it up front: with some drivers, editing the class later is not enough to make already-bound volumes expandable.
`mountOptions`	Mount flags for provisioned PVs	list	none	Driver/filesystem dependent.
`allowedTopologies`	Constrain placement	topology selector	none	Pin volumes to specific zones; pairs with `WaitForFirstConsumer`.
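The `kubernetes.io/no-provisioner` case from the table can be illustrated with a local volume. This is a sketch under stated assumptions: the class name, PV name, disk path, and node name are all hypothetical, and the directory must already exist on that node.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                       # hypothetical name
provisioner: kubernetes.io/no-provisioner   # nothing is created on demand
volumeBindingMode: WaitForFirstConsumer     # let the scheduler pick the node first
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-1                          # hypothetical name
spec:
  capacity:
    storage: 50Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1                   # hypothetical path on the node
  nodeAffinity:                             # required for local volumes
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["worker-1"]          # hypothetical node name
```

Here the class exists only to group the hand-made PVs and to delay binding; an admin still creates every PV.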

Setting the default class. A cluster should have exactly one StorageClass annotated storageclass.kubernetes.io/is-default-class: "true"; PVCs that omit storageClassName get it. Kubernetes does not stop you from marking more than one: recent releases then use the most recently created default, while older releases rejected such PVCs outright. Either way it is a classic source of “why did my PVC use the wrong disk type” confusion.

Immediate vs WaitForFirstConsumer, the single most important storage tuning knob. Immediate provisions the volume the instant the PVC is created — before any Pod is scheduled — so Kubernetes has to guess the zone, and then the scheduler must place the Pod where that disk already is. In a multi-zone cluster this routinely strands Pods (the disk is in zone A, but the only spare capacity is in zone B). WaitForFirstConsumer delays provisioning until a Pod consumes the claim, so the volume is cut in the same zone the scheduler chose. For any zonal block storage, set it.

The CSI model

The Container Storage Interface (CSI) is the standard that lets storage vendors write a single driver that works across container orchestrators. Since the in-tree volume plugins were deprecated and migrated (the “CSI migration” effort), CSI is how essentially all real storage works in modern Kubernetes.

A CSI driver typically ships as:

A controller plugin (a Deployment) that handles cluster-wide operations: CreateVolume, DeleteVolume, ControllerPublishVolume (attach), snapshots, expand. It runs alongside Kubernetes sidecar containers the community provides — external-provisioner (watches PVCs → calls CreateVolume), external-attacher, external-resizer, external-snapshotter.
A node plugin (a DaemonSet) that handles node-local operations: NodeStageVolume / NodePublishVolume (mount the device into the Pod), registered with the kubelet via the node-driver-registrar.

You interact with all of this indirectly: you install the driver (often a Helm chart), you create a StorageClass with provisioner: <driver-name>, and from then on you only ever touch PVCs. The CSIDriver object advertises the driver’s capabilities (does it support fsGroup? volume expansion? ephemeral inline?), and CSINode objects track which drivers each node runs. You almost never edit these — but knowing they exist explains how a PVC becomes a disk.
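For orientation, a CSIDriver object might look roughly like the sketch below. The driver name is hypothetical and the values are illustrative; in practice the driver's own installer creates this object, and you read it instead of writing it.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.io   # hypothetical driver name; matches the StorageClass provisioner
spec:
  attachRequired: true          # volumes need a controller-side attach step
  podInfoOnMount: false         # whether Pod details are passed to the driver on mount
  fsGroupPolicy: File           # how fsGroup ownership changes are applied
  volumeLifecycleModes:
    - Persistent                # usable through PV/PVC
    - Ephemeral                 # usable as a CSI ephemeral inline volume
```

Running kubectl get csidrivers on a cluster lists the real objects for the drivers installed there.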

Volume snapshots, cloning, and online resize

These are the day-two operations that separate “I can create a PVC” from “I can operate stateful workloads.”

Snapshots

A VolumeSnapshot is a point-in-time copy of a PVC, created through CSI. It needs a VolumeSnapshotClass (analogous to a StorageClass, naming the CSI driver and its snapshot parameters) and the snapshot CRDs + the external-snapshotter controller installed.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Delete          # Delete | Retain (mirrors reclaim policy)
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: nightly-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-claim   # the PVC to snapshot

A VolumeSnapshot (namespaced, the user’s request) binds to a VolumeSnapshotContent (cluster-scoped, the actual snapshot) — exactly mirroring the PVC↔PV pattern. You then restore by creating a new PVC with dataSource pointing at the snapshot (shown earlier).
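A standalone restore claim, separated from the all-fields reference manifest, could be sketched like this. The claim name is hypothetical; the snapshot and class names follow the examples in this lesson.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim-restored           # hypothetical name for the restored volume
spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi                   # should be at least the size of the snapshotted volume
  dataSource:
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
    name: nightly-snap                # snapshot in the same namespace as this claim
```

The original PVC is untouched; the restore produces a new, independent volume that a Pod can mount under the new claim name.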

Cloning

A clone creates a new, independent PVC pre-populated from an existing PVC (no snapshot needed), if the CSI driver supports it. Same dataSource mechanism, kind: PersistentVolumeClaim:

spec:
  storageClassName: fast-ssd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
  dataSource:
    kind: PersistentVolumeClaim
    name: data-claim          # source PVC to clone

The clone must live in the same namespace as the source, use the same volumeMode, and request at least the source’s size. Great for spinning up a copy of production data for a test environment.

Online resize (volume expansion)

To grow a volume, the StorageClass must have allowVolumeExpansion: true. Then you simply increase spec.resources.requests.storage on the PVC and apply. With modern CSI drivers the disk expands and the filesystem grows online — no Pod restart needed. You can only ever grow, never shrink. If a filesystem expansion needs the node, the PVC carries a FileSystemResizePending condition until the Pod next mounts it.

kubectl patch pvc data-claim -p '{"spec":{"resources":{"requests":{"storage":"40Gi"}}}}'

subPath: mounting one file or sub-directory

By default a volume mount replaces the entire contents of mountPath. subPath lets you mount just one sub-directory or file of a volume into a path, leaving the rest of the container’s directory intact. The classic use is mounting a single config file into /etc without hiding everything else there, or giving two containers different sub-directories of one shared volume.

volumeMounts:
  - name: app-config
    mountPath: /etc/myapp/app.conf
    subPath: app.conf              # mount only this key/file
  - name: data
    mountPath: /var/lib/db
    subPath: postgres             # mount the "postgres" sub-dir of the volume

subPathExpr is a variant that lets you build the sub-path from environment variables (e.g. per-Pod directories using the downward-API Pod name).
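A per-Pod directory built with subPathExpr can be sketched as follows. The Pod name, mount path, and the shared-logs claim are hypothetical; the claim is assumed to be a shared volume that several Pods mount.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: subpathexpr-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "date >> /logs/app.log && sleep 3600"]
      env:
        - name: POD_NAME            # downward API: the Pod's own name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      volumeMounts:
        - name: logs
          mountPath: /logs
          subPathExpr: $(POD_NAME)  # expands to a sub-directory named after the Pod
  volumes:
    - name: logs
      persistentVolumeClaim:
        claimName: shared-logs      # hypothetical shared claim
```

Each Pod then writes into its own sub-directory of the shared volume, so log files from different Pods do not collide.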

The big subPath gotcha. A volume mounted with subPath does NOT receive live updates when the source ConfigMap/Secret changes — unlike a normal mount. If you rely on hot-reloading config, mount the whole volume (no subPath) and point your app at the specific file, or restart the Pod on config change. Historically subPath also had CVEs around symlink traversal; keep your kubelet patched.

StatefulSets and volumeClaimTemplates

A Deployment’s Pods are interchangeable, so they cannot each own a stable, individual disk. A StatefulSet can — via volumeClaimTemplates. Each replica (web-0, web-1, …) gets its own PVC, created from the template, with a stable name (<claim>-<statefulset>-<ordinal>), and that PVC follows the Pod across reschedules. This is how you run replicated databases where each member keeps its own data.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: data
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:            # each Pod gets its own PVC from this
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 1Gi

Two behaviours to remember. First, the per-Pod PVCs are not deleted when you scale down or delete the StatefulSet by default — so scaling 3→1 then 1→3 re-attaches the same data to web-1 and web-2. (The newer persistentVolumeClaimRetentionPolicy field can opt into deletion on scale-down/delete if you want that.) Second, pair volumeClaimTemplates with a StorageClass using WaitForFirstConsumer so each replica’s disk is provisioned in the zone its Pod is scheduled to.
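The retention policy mentioned above is set on the StatefulSet spec. The fragment below is a sketch that assumes a cluster version where the field is available; the choice of values is illustrative, not a recommendation.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep the PVCs if the StatefulSet itself is deleted
    whenScaled: Delete    # remove the PVCs of replicas dropped by a scale-down
  # serviceName, replicas, selector, template and volumeClaimTemplates
  # are the same as in the StatefulSet manifest shown earlier
```

With whenScaled: Delete, scaling 3→1 and back to 3 gives web-1 and web-2 fresh, empty volumes, which is the opposite of the default behaviour.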

[Figure: Kubernetes storage: volumes, PV, PVC, StorageClass]

The diagram traces the full path: a Pod mounts a PVC, which binds to a PV; for dynamic provisioning the StorageClass drives a CSI provisioner that creates the real disk on demand, while ephemeral volumes (top) live and die with the Pod.

Hands-on lab

Everything below runs on a free local cluster. We will use kind, whose default StorageClass (standard, backed by the local-path-provisioner) supports dynamic provisioning — so you can practise PV/PVC/StorageClass mechanics without any cloud account.

1. Create a cluster and confirm the default StorageClass.

kind create cluster --name storage-lab
kubectl get storageclass
# NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ...
# standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   ...

2. emptyDir sharing between two containers in one Pod.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata: { name: shared }
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh","-c","echo 'from writer' > /shared/msg && sleep 3600"]
      volumeMounts: [{ name: scratch, mountPath: /shared }]
    - name: reader
      image: busybox
      command: ["sh","-c","sleep 3600"]
      volumeMounts: [{ name: scratch, mountPath: /shared }]
  volumes:
    - name: scratch
      emptyDir: {}
EOF
kubectl wait --for=condition=Ready pod/shared --timeout=60s
kubectl exec shared -c reader -- cat /shared/msg     # -> from writer

The reader sees the writer’s file: that is volume sharing inside a Pod.

3. Dynamic provisioning — a PVC that becomes a PV automatically.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata: { name: data-claim }
spec:
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 1Gi } }
  # no storageClassName -> uses the default class
EOF

kubectl get pvc data-claim          # may show Pending with WaitForFirstConsumer

With kind’s local-path class (which uses WaitForFirstConsumer), the PVC stays Pending until a Pod uses it — exactly the behaviour we discussed.

4. Consume the PVC from a Pod and watch it bind.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata: { name: app }
spec:
  containers:
    - name: app
      image: busybox
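      # write the marker only if it is missing, so a recreated Pod keeps the old value
      command: ["sh","-c","[ -f /data/state ] || echo persisted-$(date +%s) > /data/state; sleep 3600"]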
      volumeMounts: [{ name: data, mountPath: /data }]
  volumes:
    - name: data
      persistentVolumeClaim: { claimName: data-claim }
EOF

kubectl wait --for=condition=Ready pod/app --timeout=90s
kubectl get pvc data-claim          # now Bound
kubectl get pv                      # a PV was created automatically
kubectl exec app -- cat /data/state # shows your value

5. Prove persistence across a Pod delete.

VAL=$(kubectl exec app -- cat /data/state)
kubectl delete pod app
# recreate the same Pod spec (re-run the apply from step 4)
kubectl wait --for=condition=Ready pod/app --timeout=90s
kubectl exec app -- cat /data/state    # SAME value as $VAL -> data survived

The Pod was destroyed and recreated, yet the data is intact — because it lives on the PV, not in the Pod.

6. (Optional) Online resize. kind’s local-path class does not support expansion, so this is read-only learning: on a cloud cluster you would set allowVolumeExpansion: true on the class, then kubectl patch pvc data-claim -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}' and watch the volume grow without restarting the Pod.
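The same resize can be expressed declaratively instead of with kubectl patch. The sketch below is the data-claim manifest from step 3 with a larger request. It assumes the claim's StorageClass allows expansion and that the CSI driver supports it, which is not the case on the kind lab cluster.

```yaml
# data-claim from step 3 with a larger request.
# Assumption: the claim's StorageClass has allowVolumeExpansion: true and its
# CSI driver supports expansion. kind's local-path class does not, so on the
# lab cluster this change is expected to be rejected.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 2Gi        # was 1Gi; only increases are accepted
```

After applying it on a cluster that supports expansion, kubectl describe pvc data-claim should show resize-related events and conditions while the volume grows.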

Validation. You should have seen: an emptyDir shared between containers; a PVC go Pending → Bound; a PV created on demand; and data survive a Pod deletion. If your PVC is stuck Pending, run kubectl describe pvc data-claim and read the events.

Cleanup.

kubectl delete pod shared app --ignore-not-found
kubectl delete pvc data-claim --ignore-not-found
kind delete cluster --name storage-lab

Cost note. Entirely free — kind runs in Docker on your laptop and provisions volumes as directories on the node. On a real cloud, every dynamically-provisioned PVC with reclaimPolicy: Delete is a billable disk that disappears when you delete the PVC; PVCs with Retain keep billing until you delete the disk manually.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
PVC stuck `Pending` forever	No matching PV (static) / no default StorageClass / `storageClassName: ""` set by mistake	`kubectl describe pvc`; set a valid class or mark a default; remove the empty `""` if you meant dynamic.
Pod `Pending`, events say “node(s) had volume node affinity conflict”	Volume provisioned in one zone (`Immediate`), Pod must run elsewhere	Use `volumeBindingMode: WaitForFirstConsumer` on the StorageClass.
Pod `ContainerCreating`, “Multi-Attach error”	An RWO volume is being attached to a second node before the first detaches (e.g. fast reschedule)	Wait for detach; ensure only one node mounts RWO; use RWX on a shared backend if you truly need multi-node access (RWOP is stricter than RWO, not looser: one Pod cluster-wide).
Resize edit ignored or rejected	StorageClass has `allowVolumeExpansion: false`, or driver lacks support, or you tried to shrink	Set `allowVolumeExpansion: true` on the StorageClass (this field can be changed on an existing class); only grow, never shrink.
Mounted ConfigMap not updating in the container	Mounted with `subPath`, or marked immutable, or value used as env var	Mount without `subPath`; for env vars, restart the Pod.
Deleting a PVC wiped the data unexpectedly	StorageClass `reclaimPolicy: Delete` (the default)	Use `Retain` for important data; back up with snapshots.
`RWX` PVC won’t mount on a block-storage class	Block disks (EBS/GCE PD/Azure Disk) cannot do `ReadWriteMany`	Use a file/shared backend (NFS, EFS, Azure Files, CephFS).
`Released` PV won’t rebind	`Retain` policy leaves `claimRef` set	Scrub the data, then `kubectl edit pv` to clear `spec.claimRef`, returning it to `Available`.

Best practices

Default to dynamic provisioning with a well-chosen StorageClass; reserve static PVs for special cases (existing NFS exports, pre-seeded data, local NVMe).
Set volumeBindingMode: WaitForFirstConsumer on every zonal block-storage class. This one setting prevents the most common stateful scheduling failure.
Use reclaimPolicy: Retain for irreplaceable data (databases, single-source-of-truth stores), and rely on snapshots for backups rather than hoping a PVC is never deleted.
Enable allowVolumeExpansion: true up front so you can grow disks online later without recreating workloads.
Right-size requests — you pay for provisioned capacity, and you can grow but never shrink, so start modest.
Pick the narrowest access mode that works: RWO (or RWOP for strict single-writer) for databases, RWX only when multiple nodes genuinely must write.
Keep ephemeral and persistent clearly separated — never store durable data in emptyDir; use generic ephemeral volumes when you need big/feature-rich scratch.
Mount whole volumes (avoid subPath) when you need live config reloads.
Use StatefulSets with volumeClaimTemplates for any workload where each replica needs its own stable disk.
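Several of these practices come together in a single StorageClass. The sketch below is illustrative only: the class name is hypothetical, and the provisioner and parameters assume the AWS EBS CSI driver is installed, so substitute the values for your own driver.

```yaml
# Illustrative StorageClass combining the practices above.
# Assumptions: hypothetical class name; AWS EBS CSI driver as the provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd-retain              # hypothetical name
provisioner: ebs.csi.aws.com         # assumption: AWS EBS CSI driver
parameters:
  type: gp3                          # driver-specific volume type
  encrypted: "true"                  # driver-specific encryption switch
reclaimPolicy: Retain                # keep the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # provision in the zone the Pod lands in
allowVolumeExpansion: true           # allow PVCs of this class to grow later
```

Because provisioner, parameters, reclaimPolicy and volumeBindingMode cannot be changed after creation, it is worth settling these choices before workloads start claiming volumes from the class.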

Security notes

Secret volumes are not encryption. A secret volume base64-decodes data onto a tmpfs in the Pod; protect the source with encryption at rest and RBAC, and prefer projected serviceAccountToken (short-lived, audience-scoped) over long-lived mounted tokens.
Restrict who can create PVs and StorageClasses. A PV can mount a hostPath from the node; a user who can create arbitrary PVs (or use a hostPath volume) can read host files and escalate. Gate PV/StorageClass creation behind admin RBAC and block hostPath with Pod Security Admission / a policy engine.
Use fsGroup and runAsNonRoot so volume files are owned by the right group and the container does not run as root on shared storage; set readOnlyRootFilesystem and mount only what needs to be writable.
subPath has a history of path-traversal CVEs — keep nodes patched.
Encrypt volumes at the storage layer (encrypted: "true" parameter / CMK keys) for any sensitive data, and scrub Retained volumes before reuse so the next claimant cannot read old data.
Be deliberate about reclaim policy: Delete on the wrong class can destroy data on PVC deletion; Retain can leak data if released volumes are reused without scrubbing.
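The Pod-level settings from these notes can be sketched in one manifest. Everything named here (Pod name, user and group IDs, token audience, mount paths) is hypothetical, and data-claim is simply the claim from the lab; adapt the values to your own workload.

```yaml
# Illustrative Pod; names, IDs and the token audience are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true          # refuse to start containers as UID 0
    runAsUser: 10001
    fsGroup: 10001              # volume files are made group-accessible to this GID
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      securityContext:
        readOnlyRootFilesystem: true     # only mounted volumes are writable
        allowPrivilegeEscalation: false
      volumeMounts:
        - name: data
          mountPath: /data               # the one writable location
        - name: token
          mountPath: /var/run/secrets/tokens
          readOnly: true
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim
    - name: token
      projected:
        sources:
          - serviceAccountToken:
              path: token
              audience: my-api           # hypothetical audience
              expirationSeconds: 3600    # short-lived; the kubelet rotates it
```

Whether fsGroup actually changes ownership depends on the volume type and driver, so check the resulting permissions inside the container rather than assuming them.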

Interview & exam questions

What is the difference between a PV and a PVC? A PV is the actual storage resource (admin/cluster concern); a PVC is a namespaced request for storage (app concern). Pods reference the PVC, which binds to a PV one-to-one.
Static vs dynamic provisioning? Static: an admin pre-creates PVs and PVCs bind to matching ones. Dynamic: a StorageClass + CSI provisioner creates a PV automatically when a PVC asks. Dynamic is the norm.
Explain the four access modes. RWO = read-write by one node; ROX = read-only by many nodes; RWX = read-write by many nodes; RWOP = read-write by exactly one Pod cluster-wide. RWO is per-node (multiple Pods on the same node can share), which is why RWOP exists for strict single-writer.
What does volumeBindingMode: WaitForFirstConsumer solve? It delays volume provisioning until a Pod is scheduled, so the volume is created in the same zone/node as the Pod — preventing unschedulable Pods when Immediate provisions in the wrong zone.
What are the reclaim policies and what do they do? Delete removes the PV and backing storage when the PVC is deleted (default for dynamic); Retain keeps the data and marks the PV Released for manual reclaim; Recycle is deprecated.
Difference between omitting storageClassName, setting it to a name, and setting it to ""? Omit → default class (dynamic); a name → that class (dynamic); "" → disable dynamic provisioning, bind only to a static PV that also has "".
What is CSI and why does it matter? The Container Storage Interface is the standard plugin API for storage; vendors ship one driver (controller Deployment + node DaemonSet + community sidecars) and Kubernetes drives it through StorageClasses. In-tree plugins are deprecated/migrated to CSI.
How do you grow a volume, and what are the constraints? Ensure the StorageClass has allowVolumeExpansion: true, then increase spec.resources.requests.storage on the PVC. Most modern CSI drivers expand the volume online, while the Pod keeps running; you can only grow, never shrink.
Why and how do StatefulSets use volumeClaimTemplates? So each replica gets its own stable, individually-named PVC that follows the Pod across reschedules — essential for replicated databases. PVCs persist on scale-down by default.
When would you use volumeMode: Block? When the app wants a raw device with no filesystem (high-performance databases managing their own I/O); mounted via volumeDevices.
What is the subPath update gotcha? Volumes mounted with subPath do not receive live ConfigMap/Secret updates; mount the whole volume if you need hot reload.
What is a generic ephemeral volume vs an emptyDir? Both are Pod-lifetime, but a generic ephemeral volume is dynamically provisioned through a StorageClass/CSI (so it can be large, on specific media, snapshottable) whereas emptyDir is plain node disk or RAM.
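The storageClassName question above is easier to remember with the three variants side by side. In this sketch the claim names are hypothetical, and fast-ssd is the class name used in the StatefulSet example earlier in the lesson.

```yaml
# Three claims that differ only in storageClassName (claim names are hypothetical).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-default
spec:                              # field omitted: the cluster's default class is used
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 1Gi } }
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-named
spec:
  storageClassName: fast-ssd       # named class: dynamic provisioning via that class
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 1Gi } }
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-static
spec:
  storageClassName: ""             # no dynamic provisioning: binds only to a PV with no class
  accessModes: ["ReadWriteOnce"]
  resources: { requests: { storage: 1Gi } }
```

On a cluster with no matching static PV, claim-static stays Pending, which is the first row of the troubleshooting table in practice.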

Quick check

A Pod must keep data across rescheduling to another node. emptyDir or PVC?
Your StorageClass uses Immediate and Pods are unschedulable across zones. What one field do you change, and to what?
Three Pods on different nodes must all write to one shared volume. Which access mode, and which kind of backend?
You delete a PVC and its cloud disk vanishes. Which StorageClass field caused this, and what value would have preserved it?
You edit a mounted ConfigMap but the container never sees the change. Name two reasons.

Answers

PVC — emptyDir dies with the Pod; a PVC binds to a PV that persists independently.
volumeBindingMode: WaitForFirstConsumer on the StorageClass, so the volume is provisioned in the Pod’s zone.
RWX (ReadWriteMany) on a shared/file backend (NFS, CephFS, Azure Files, EFS) — block disks cannot do RWX.
reclaimPolicy: Delete (the default); Retain would have kept the disk.
It was mounted with subPath, the ConfigMap is immutable, or the value is consumed as an environment variable (env vars are not live-updated). (Any two.)

Exercise

On a local kind cluster, build a tiny stateful app end to end:

Create a StorageClass named lab-retain that uses the cluster’s local-path provisioner, with reclaimPolicy: Retain and volumeBindingMode: WaitForFirstConsumer.
Create a StatefulSet with 2 replicas and a volumeClaimTemplates entry using lab-retain, each Pod writing its own hostname into a file on its volume.
Confirm two distinct PVCs (...-0, ...-1) were created and bound, and that each Pod’s file contains its own ordinal name.
Delete the StatefulSet and observe that the PVCs and PVs remain (the PVCs are not auto-deleted by default, so their PVs stay Bound; the Retain policy only comes into play once the PVCs themselves are deleted).
Recreate the StatefulSet and confirm each Pod re-attaches its original data.
Clean up: delete the StatefulSet, the leftover PVCs, then any Released PVs, then the cluster. Note in a sentence why the PVs needed manual deletion.
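If you want a starting point for the first step, a StorageClass along these lines should work on kind. It is a sketch built only from the values the exercise names plus the provisioner shown in the lab's kubectl get storageclass output.

```yaml
# Possible starting point for step 1 of the exercise.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lab-retain
provisioner: rancher.io/local-path       # kind's built-in provisioner, as shown in the lab
reclaimPolicy: Retain                    # PVs survive deletion of their PVCs
volumeBindingMode: WaitForFirstConsumer  # provision only once a Pod is scheduled
```

The StatefulSet in step 2 then references it with storageClassName: lab-retain inside volumeClaimTemplates.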

Certification mapping

CKA (Certified Kubernetes Administrator): the Storage domain directly — PVs, PVCs, StorageClasses, access modes, reclaim policies, dynamic vs static provisioning, and configuring applications with persistent storage. Expect tasks like “create a PVC of size X with class Y and mount it,” and “make a StorageClass the default.”
CKAD (Certified Kubernetes Application Developer): the Application Design and Build domain (which lists “utilize persistent and ephemeral volumes”) and the Application Environment, Configuration and Security domain (ConfigMaps and Secrets). Expect to define volumes (including configMap/secret/emptyDir/projected), request persistent storage with PVCs, and use subPath. You will write Pod/Deployment YAML that consumes claims under time pressure.
Cross-references: snapshots/clone/resize and topology depth are explored further in CSI volume snapshots, cloning, resize & topology; stateful operation patterns in the StatefulSet lessons.
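For the “make a StorageClass the default” task mentioned under CKA, the default is controlled by an annotation on the class. The sketch below creates a new default class on a kind cluster; the class name is hypothetical and the provisioner is the one from the lab.

```yaml
# Illustrative default StorageClass (hypothetical name, kind's provisioner).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: lab-default
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # marks this class as the default
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
```

To avoid ambiguity, keep only one class marked as default: set the same annotation to "false" on the previous default (standard on kind) when you switch.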

Glossary

Volume: storage attached to a Pod and mounted into its containers; ephemeral or persistent.
emptyDir: a Pod-lifetime scratch volume on node disk (or RAM with medium: Memory).
PersistentVolume (PV): a cluster resource representing a real piece of storage.
PersistentVolumeClaim (PVC): a namespaced request for storage that binds to a PV.
Binding: the one-to-one association of a PVC to a PV.
StorageClass: a template naming a provisioner + parameters that enables dynamic provisioning.
Provisioner: the (CSI) driver that creates and deletes the underlying storage.
Dynamic provisioning: automatic PV creation when a PVC requests storage via a StorageClass.
Static provisioning: admin-created PVs that PVCs bind to.
Access mode: how a volume may be mounted — RWO, ROX, RWX, RWOP.
Reclaim policy: what happens to storage when its PVC is deleted — Retain, Delete, Recycle (deprecated).
volumeMode: Filesystem (a mounted directory) or Block (a raw device).
volumeBindingMode: when binding/provisioning occurs — Immediate or WaitForFirstConsumer.
CSI (Container Storage Interface): the standard plugin API for storage drivers.
VolumeSnapshot / VolumeSnapshotClass: a point-in-time copy of a PVC and its template.
Clone: a new PVC pre-populated from an existing PVC.
subPath: mounting a single file or sub-directory of a volume into a path.
volumeClaimTemplates: the StatefulSet field giving each replica its own stable PVC.
Generic ephemeral volume: a Pod-lifetime volume that is dynamically provisioned via a StorageClass.
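To make the last entry concrete, here is a minimal sketch of a Pod that uses a generic ephemeral volume. The Pod name, mount path and size are hypothetical, and standard is kind's default class from the lab; use whichever class your cluster offers.

```yaml
# Illustrative Pod with a generic ephemeral volume (names and size are hypothetical).
apiVersion: v1
kind: Pod
metadata:
  name: scratch-app
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /scratch
  volumes:
    - name: scratch
      ephemeral:
        volumeClaimTemplate:           # a PVC is created for this Pod from the template
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: standard
            resources:
              requests:
                storage: 1Gi
```

Kubernetes creates a PVC named after the Pod and the volume (scratch-app-scratch here) and deletes it together with the Pod, which is what separates this from an ordinary PVC.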

Next steps

Next, learn how to expose your stateful and stateless apps to the outside world in Kubernetes Ingress, In Depth: Controllers, Rules, TLS, IngressClass & the Gateway API. To go deeper on the storage operations introduced here — snapshots across regions, cloning at scale, and topology-aware provisioning — read CSI Volume Snapshots, Cloning, Resize & Topology, and to put persistent storage to work in a real database, see the StatefulSet Postgres operator lesson.