Kubernetes Monitoring, In Depth: metrics-server, Prometheus, Grafana & Alerting

A Kubernetes cluster is a fleet of moving parts — pods being scheduled and evicted, nodes filling up, Deployments rolling, autoscalers reacting — and you cannot operate what you cannot see. Monitoring is the sensory system of a cluster: it tells you whether the control plane is healthy, whether your workloads are getting the CPU and memory they asked for, whether users are actually being served, and — crucially — it is the source of truth that the Horizontal Pod Autoscaler reads to decide whether to add replicas. Get monitoring wrong and you are flying blind; get it right and the cluster becomes legible, debuggable and, increasingly, self-managing.

This lesson is a complete tour of the native Kubernetes monitoring stack, built bottom-up. We start with metrics-server — the small, in-cluster component that powers kubectl top and the HPA — and are careful to draw the line between it and a real monitoring system, because confusing the two is the single most common beginner mistake. We then build out Prometheus: its pull-based architecture, the four metric types (counter, gauge, histogram, summary), how scraping and service discovery work on Kubernetes, the PromQL query language, and the two exporters that turn a cluster into a rich metrics source — kube-state-metrics (object state) and node-exporter (machine metrics). We cover the Prometheus Operator and its ServiceMonitor/PodMonitor custom resources, which is how essentially everyone runs Prometheus on Kubernetes today. We add Grafana for dashboards and Alertmanager for turning firing rules into paged humans (with routing, grouping, inhibition and silences). Finally we step up a level to the methodology that separates noise from signal: the four golden signals, the USE and RED methods, and how to define SLIs and SLOs with error budgets so your alerts page you when users hurt — not when a graph wiggles.

By the end you will understand every component, every metric type, the query language, the Operator CRDs and the alerting pipeline well enough to stand up the stack, debug it, answer CKA-style questions about it and design alerts that a tired on-call engineer will thank you for.

Learning objectives

By the end of this lesson you will be able to:

Explain the difference between the metrics pipelines in Kubernetes — the resource metrics pipeline (metrics-server → Metrics API → kubectl top/HPA) versus a full monitoring pipeline (Prometheus), and pick the right one for a task.
Describe Prometheus’s pull model, scrape configuration and service discovery, and the four metric types (counter, gauge, histogram, summary) including when each is appropriate.
Write PromQL queries: instant and range vectors, selectors and matchers, rate()/irate()/increase(), aggregation operators, histogram_quantile(), and recording rules.
Deploy and reason about kube-state-metrics (cluster object state) and node-exporter (node-level OS metrics), and know which questions each answers.
Use the Prometheus Operator and its ServiceMonitor, PodMonitor, PrometheusRule and Alertmanager custom resources to manage monitoring declaratively.
Build Grafana dashboards backed by Prometheus, and configure Alertmanager routing, grouping, inhibition and silences.
Apply the four golden signals, USE and RED methods, and define SLIs/SLOs with error budgets to drive symptom-based alerting.

Prerequisites & where this fits

You should be comfortable with core Kubernetes objects (Pods, Deployments, Services, Namespaces) and basic kubectl, and ideally have read the autoscaling lesson, since the HPA is the most important consumer of metrics-server. You will want a local cluster (kind or minikube) and Helm v3 for the hands-on lab. No prior Prometheus knowledge is assumed — we define every term.

This lesson sits in the Operations module of the Kubernetes Zero-to-Hero course, immediately after the networking-internals lesson (you need to understand Services and DNS to follow scrape discovery) and before the worker-node-internals lesson. Monitoring is the foundation that day-2 operations, autoscaling, SLOs and incident response all build on, so it is worth studying carefully.

Core concepts: observability, the two metrics pipelines, and the metrics triad

Observability is the property of a system that lets you ask arbitrary questions about its internal state from the outside, using its outputs. The three classic pillars of observability are metrics (numeric time series — cheap, aggregatable, ideal for alerting and dashboards), logs (timestamped text records of discrete events — rich, high-cardinality, good for forensics) and traces (the path of a single request across services — essential for distributed debugging). This lesson is about metrics; logs and traces are covered in the SigNoz/OpenTelemetry lesson linked at the end. A complete observability strategy needs all three, but metrics are where you start because they are cheap, they power alerting, and they drive autoscaling.

The single most important mental model in Kubernetes monitoring is that there are two separate metrics pipelines, and they exist for different reasons.

Pipeline	Component	API it serves	Storage	Consumers	Purpose
Resource metrics	metrics-server	`metrics.k8s.io` (Metrics API)	In-memory, ~last value only	`kubectl top`, HPA, VPA	Fast, lightweight CPU/RAM for autoscaling
Full / custom metrics	Prometheus (+ adapter)	`custom.metrics.k8s.io`, `external.metrics.k8s.io`, plus PromQL/HTTP	On-disk TSDB, weeks of history	Dashboards, alerting, HPA-on-custom-metrics	Rich, historical, queryable monitoring

The resource metrics pipeline is deliberately minimal: metrics-server scrapes only CPU and memory from each kubelet, keeps roughly the latest value in memory, and exposes it through the aggregated Metrics API (metrics.k8s.io). It has no history, no query language and no dashboards — it exists so that kubectl top is fast and so the HPA has a low-latency source. The full monitoring pipeline is Prometheus: it scrapes hundreds of metrics from many targets, stores them in a time-series database (TSDB) on disk for weeks, and exposes a powerful query language (PromQL). When people say “monitoring”, they almost always mean Prometheus; metrics-server is a tiny, single-purpose cousin. Confusing them — e.g. expecting kubectl top to show you yesterday’s memory, or expecting Prometheus to drive a basic CPU HPA without an adapter — is the classic beginner error.

A few more terms you will see throughout:

Exporter — a small process that exposes metrics about something else in Prometheus’s text format on an HTTP /metrics endpoint (e.g. node-exporter exposes Linux machine metrics; kube-state-metrics exposes Kubernetes object state).
Instrumentation — code inside your own application that exposes its own /metrics (via a Prometheus client library).
Scrape / target — Prometheus pulls metrics by making an HTTP GET to a target’s /metrics endpoint on a schedule.
Time series — a stream of timestamped values uniquely identified by a metric name plus a set of labels (key/value pairs). http_requests_total{method="GET", status="200"} and http_requests_total{method="POST", status="500"} are two distinct series.
Cardinality — the number of distinct time series. High-cardinality labels (user IDs, request IDs, raw URLs) blow up memory and are the number-one Prometheus performance footgun.
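To make series identity and cardinality concrete, here is a minimal Python sketch. It is an illustration only, not how Prometheus stores data; the helper names `series_key` and `worst_case_cardinality` and the label counts are hypothetical.

```python
from math import prod


def series_key(name: str, labels: dict) -> tuple:
    """A series is identified by its metric name plus its full label set."""
    return (name, tuple(sorted(labels.items())))


def worst_case_cardinality(values_per_label: dict) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(values_per_label.values())


a = series_key("http_requests_total", {"method": "GET", "status": "200"})
b = series_key("http_requests_total", {"method": "POST", "status": "500"})
print(a != b)  # the two label sets identify two distinct series

# Hypothetical label counts: adding a user_id label multiplies the series count.
print(worst_case_cardinality({"method": 5, "status": 10}))
print(worst_case_cardinality({"method": 5, "status": 10, "user_id": 100_000}))
```

The product is a worst case (not every label combination occurs in practice), but it shows why a single unbounded label dominates the series count.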

metrics-server: the resource metrics pipeline in full

metrics-server is a cluster add-on that collects resource usage (CPU and memory) from every node’s kubelet and exposes it through the Metrics API (metrics.k8s.io), registered into the API server via the API Aggregation Layer. It is what makes kubectl top nodes and kubectl top pods return numbers, and it is the default source for the HorizontalPodAutoscaler and VerticalPodAutoscaler.

How it works, end to end:

Each kubelet computes CPU and memory usage for the node and its pods (sourced from the container runtime via cAdvisor, exposed at the kubelet’s /metrics/resource endpoint).
metrics-server scrapes every kubelet on a short interval (default 15 seconds) over the kubelet’s secure port (10250).
It keeps the latest readings in memory only (no database, no history) and serves them through the aggregated metrics.k8s.io API.
kubectl top, the HPA controller and the VPA query that API.

Key facts and gotchas:

Aspect	Detail
Metrics collected	CPU and memory only — nothing else (no disk, no network, no custom metrics)
History	None — roughly the latest scrape; you cannot ask “what was memory an hour ago”
Scrape interval	`--metric-resolution`, default 15s; must be ≥ kubelet housekeeping interval
Transport	Scrapes kubelet over TLS on port 10250
HA	Run ≥2 replicas for availability; each replica scrapes every kubelet independently, so extra replicas add redundancy rather than sharing the load
Not for	Monitoring, alerting, dashboards, capacity history — that is Prometheus’s job

The most common failure is metrics-server failing to scrape kubelets because of TLS: on kind/minikube and many self-managed clusters the kubelet serving certificate is self-signed and not in metrics-server’s trust store, so scrapes fail with x509 errors and kubectl top returns error: Metrics API not available. The well-known (and lab-only) fix is the flag --kubelet-insecure-tls; in production you instead enable proper kubelet serving certificates signed by the cluster CA (serverTLSBootstrap: true and an approver for the kubernetes.io/kubelet-serving CSRs). Other causes of “Metrics API not available”: metrics-server not installed at all, the aggregation layer not reaching the pod (network policy/firewall on port 4443/10250), or the pod crash-looping.

Because metrics-server feeds the HPA, its health is your autoscaler’s health: if kubectl top pods is broken, a CPU/memory HPA shows <unknown> in its TARGETS column and will not scale. Always verify metrics-server before debugging an HPA.
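The dependency is easier to see with the HPA's documented scaling formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below is a simplified Python rendering under stated assumptions: the function name is hypothetical, a missing metric is modelled as `None`, and the real controller additionally handles pod readiness, min/max bounds and stabilisation windows.

```python
import math
from typing import Optional


def desired_replicas(current_replicas: int,
                     current_utilisation: Optional[float],
                     target_utilisation: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a single utilisation metric."""
    if current_utilisation is None:
        # No reading from the Metrics API (TARGETS shows <unknown>): do not scale.
        return current_replicas
    ratio = current_utilisation / target_utilisation
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 60))    # above target: scale out
print(desired_replicas(4, None, 60))  # metrics-server broken: unchanged
```

The `None` branch is the point of the example: without a reading, the autoscaler has nothing to act on.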

Prometheus: architecture and the pull model

Prometheus is an open-source monitoring system and time-series database, and the de-facto standard for Kubernetes. Its defining architectural choice is the pull model: Prometheus scrapes (HTTP GETs) a /metrics endpoint on each target on a fixed interval, rather than having targets push metrics to it. This is the opposite of many older systems (and of the StatsD/push style), and the trade-offs are worth understanding because they come up in interviews.

	Pull (Prometheus default)	Push (e.g. via Pushgateway)
Target discovery	Prometheus owns the target list (service discovery)	Targets must know the server address
Liveness signal	A failed scrape is itself a signal (`up == 0`)	A silent target looks identical to a healthy one
Short-lived jobs	Awkward (job may exit before a scrape) — use Pushgateway	Natural fit
Firewalls / NAT	Prometheus must reach targets	Targets reach Prometheus
Control	Central control of scrape rate, easy to fan out	Decentralised

The major components of the Prometheus ecosystem:

Prometheus server — does service discovery, scraping, rule evaluation and storage, and answers PromQL queries. Stores data in a local TSDB on disk.
Exporters — expose third-party system metrics in Prometheus format (node-exporter, kube-state-metrics, blackbox-exporter, database exporters, …).
Client libraries — instrument your own app to expose /metrics (Go, Java, Python, etc.).
Pushgateway — an intermediary that holds metrics from short-lived batch jobs so Prometheus can scrape them. Use sparingly; it breaks the liveness-via-scrape property.
Alertmanager — receives alerts fired by Prometheus rules and handles routing, grouping, dedup, silencing and delivery (email, Slack, PagerDuty, …).
Grafana — the visualisation layer (technically separate from Prometheus but near-universal alongside it).
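What exporters and client libraries ultimately produce is plain text in the Prometheus exposition format. The following Python sketch renders that text for illustration; `render_metric` is a hypothetical helper, it skips label-value escaping, and in real code you would use an official client library instead.

```python
def render_metric(name, metric_type, help_text, samples):
    """Render one metric family in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


# This is the kind of body a target returns for GET /metrics.
print(render_metric(
    "http_requests_total", "counter", "Total HTTP requests.",
    [({"method": "GET", "status": "200"}, 1027),
     ({"method": "POST", "status": "500"}, 3)],
))
```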

Storage. Prometheus’s local TSDB writes incoming samples to a write-ahead log (WAL) and periodically compacts them into immutable two-hour blocks on disk, later merged into larger blocks. Retention is by time (--storage.tsdb.retention.time, default 15 days) and/or size. Local storage is not clustered or replicated — for long-term storage and global query you use remote write to a system like Thanos, Cortex, Mimir or VictoriaMetrics. For this lesson, single-server local storage is exactly right.

A scrape config tells Prometheus what to scrape and how. The classic static example:

global:
  scrape_interval: 15s        # how often to pull each target
  evaluation_interval: 15s    # how often to evaluate alerting/recording rules
scrape_configs:
  - job_name: "prometheus"    # the 'job' label applied to these targets
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["10.0.0.1:9100", "10.0.0.2:9100"]

On Kubernetes you almost never use static_configs; you use kubernetes_sd_configs (service discovery), which queries the API server for node, pod, service, endpoints, endpointslice or ingress objects and turns them into targets. Relabeling (relabel_configs) then filters and rewrites those targets — keep only pods with a certain annotation, set the scrape port from a label, drop noisy targets, and so on. In modern setups the Prometheus Operator generates all of this for you from ServiceMonitor/PodMonitor objects (covered below), so you rarely hand-write kubernetes_sd_configs — but you should understand that it is what runs underneath.
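As a rough mental model of a relabel `keep` action, the Python sketch below filters discovered targets by a meta label. Assumptions: the `prometheus.io/scrape` annotation is a common convention rather than something Prometheus requires, the addresses are made up, and `keep` is a hypothetical helper that only mirrors two real behaviours (the regex is fully anchored, and a missing label is treated as an empty string).

```python
import re


def keep(targets, source_label, regex):
    """Mimic a relabel 'keep' action: retain targets whose label fully matches regex."""
    pattern = re.compile(regex)
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]


SCRAPE = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"

# Hypothetical targets, as service discovery might present them before relabeling.
discovered = [
    {"__address__": "10.244.1.5:8080", SCRAPE: "true"},
    {"__address__": "10.244.1.6:8080"},                   # no annotation
    {"__address__": "10.244.2.9:8080", SCRAPE: "false"},
]

kept = keep(discovered, SCRAPE, "true")
print([t["__address__"] for t in kept])
```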

The four metric types

Every Prometheus metric is one of four types. Picking the right type is fundamental — using a gauge where you need a counter (or vice versa) produces nonsense graphs.

Type	What it represents	Can it decrease?	Typical query	Example
Counter	A cumulative total that only ever increases (resets to 0 on restart)	No (except reset)	`rate(x[5m])`	`http_requests_total`, `container_cpu_usage_seconds_total`
Gauge	A value that can go up or down	Yes	use directly, `avg`, `max`	`node_memory_MemAvailable_bytes`, `kube_pod_status_phase`
Histogram	Observations bucketed into configurable ranges, plus `_sum` and `_count`	(buckets are counters)	`histogram_quantile()`	`http_request_duration_seconds_bucket`
Summary	Client-side computed quantiles, plus `_sum` and `_count`	(counters)	read the quantile series directly	`rpc_duration_seconds{quantile="0.99"}`

Counter. The workhorse. Counters only go up (a process restart resets to zero — Prometheus’s rate() detects and handles the reset). You almost never look at a counter’s raw value; you look at its rate of change. rate(http_requests_total[5m]) gives requests per second averaged over 5 minutes. By convention counters end in _total.
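The reset handling can be sketched in a few lines of Python. This is a simplification, not Prometheus's implementation: the real rate() also extrapolates to the edges of the window, and `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Per-second rate of a counter over a window, tolerating resets.

    samples is a list of (timestamp_seconds, counter_value), oldest first.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # The counter went down, so the process restarted: count from 0.
            increase += value
        else:
            increase += value - prev
        prev = value
    return increase / elapsed


# The counter resets between t=30 and t=45; the rate stays positive.
window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(window))
```

A naive (last - first) / elapsed would report a negative rate for this window, which is why the raw value of a counter is rarely useful on its own.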

Gauge. A snapshot value that can rise and fall — current memory in use, current temperature, number of items in a queue, number of pods in Running. You graph gauges directly and aggregate them with avg/sum/max.

Histogram. The right tool for latency and response sizes. The application pre-defines buckets (e.g. ≤0.1s, ≤0.5s, ≤1s, …); each observation increments the counter for every bucket it falls into (buckets are cumulative — le = “less than or equal”). A histogram exposes three series families: <name>_bucket{le="..."}, <name>_sum and <name>_count. Crucially, quantiles are computed server-side at query time with histogram_quantile(), which means histograms are aggregatable across instances — you can compute a cluster-wide p99 by summing buckets across pods. The cost is choosing buckets up front, and bucket cardinality. (Newer native histograms remove the fixed-bucket limitation and are far more efficient, but classic bucketed histograms remain the common case.)

Summary. Also for latency/sizes, but quantiles are computed client-side over a sliding window and exposed directly as {quantile="0.5"}, {quantile="0.99"} series, alongside _sum and _count. The advantage is accurate per-instance quantiles with no bucket choice; the fatal limitation is that summary quantiles cannot be aggregated — you cannot average two pods’ p99s to get a meaningful cluster p99. For that reason, prefer histograms for anything you will aggregate across replicas (which on Kubernetes is almost everything). Use summaries only when you need an exact single-instance quantile and will never aggregate.
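A small numeric example shows why averaging quantiles misleads. The Python sketch below uses made-up latencies for two hypothetical pods and a nearest-rank percentile over raw observations, which stands in for what a summary computes per instance.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile over raw observations."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


pod_a = [0.1] * 1000  # busy pod, every request takes 100 ms
pod_b = [2.0] * 10    # nearly idle pod, every request takes 2 s

avg_of_p99s = (percentile(pod_a, 0.99) + percentile(pod_b, 0.99)) / 2
true_p99 = percentile(pod_a + pod_b, 0.99)

print(avg_of_p99s)  # average of the two per-pod p99 values
print(true_p99)     # p99 over all requests from both pods
```

The average weights both pods equally even though one of them served about 1% of the traffic, so it lands far from the p99 that users actually experienced.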

Interview-grade one-liner: Use a histogram when you need to aggregate latency percentiles across instances (the Kubernetes default); use a summary only for accurate single-instance quantiles you will never aggregate.

PromQL: querying the time series

PromQL (Prometheus Query Language) is how you slice the data. Master a handful of constructs and you can answer almost anything.

Selectors and matchers. The simplest query is a metric name, which returns an instant vector (one sample per matching series at the evaluation time):

http_requests_total

Filter with label matchers in braces — = equals, != not-equals, =~ regex-match, !~ regex-not-match:

http_requests_total{job="api", status=~"5.."}      # all 5xx from the api job

Append a range in square brackets to get a range vector (a window of samples per series), which is what rate-style functions consume:

http_requests_total{job="api"}[5m]

Rate functions turn counters into per-second rates:

Function	Use
`rate(c[5m])`	Per-second average over the window; smooth; the default for alerting/dashboards
`irate(c[5m])`	Instantaneous rate from the last two samples; spiky; for fast-moving graphs only
`increase(c[1h])`	Total increase over the window (`= rate * window`); good for “how many in the last hour”

Rule of thumb: use rate() for almost everything. Reach for irate() only for high-resolution graphs, never for alerts. Make the range at least 4× the scrape interval so a window always contains several samples.

Aggregation operators collapse many series into fewer, with by (keep these labels) or without (drop these labels):

sum(rate(http_requests_total[5m])) by (status)       # total RPS grouped by status code
avg(node_memory_MemAvailable_bytes) by (instance)    # avg available memory per node
topk(5, sum(rate(container_cpu_usage_seconds_total[5m])) by (pod))   # 5 hottest pods
count(kube_pod_status_phase{phase="Running"} == 1)   # number of running pods

Common aggregators: sum, avg, min, max, count, count_values, stddev, topk, bottomk, quantile, group.
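If the `by` clause feels abstract, this Python sketch mimics `sum by (...)` over an instant vector. The sample values and the `sum_by` helper are hypothetical; the point is only that series sharing the kept labels collapse into one.

```python
from collections import defaultdict


def sum_by(series, keep_labels):
    """Mimic `sum by (keep_labels)`; series is a list of (labels_dict, value)."""
    totals = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in keep_labels)
        totals[key] += value
    return dict(totals)


# Hypothetical per-pod request rates, as rate(http_requests_total[5m]) might return.
rps = [
    ({"pod": "a", "status": "200"}, 40.0),
    ({"pod": "b", "status": "200"}, 35.0),
    ({"pod": "a", "status": "500"}, 1.5),
    ({"pod": "b", "status": "500"}, 0.5),
]

print(sum_by(rps, ["status"]))  # one series per status code, pod label dropped
print(sum_by(rps, []))          # plain sum(): a single series with no labels
```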

Percentiles from histograms — the canonical latency query:

histogram_quantile(
  0.99,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le)
)

Note the pattern: rate() the buckets, sum ... by (le) to aggregate across instances while preserving the bucket boundary label, then histogram_quantile(). This is the most important PromQL snippet to memorise.
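To see what the pattern does with the buckets, here is a Python sketch of the estimation step: sum the cumulative buckets across instances, find the bucket containing the target rank, and interpolate linearly inside it. It is an approximation of the classic-histogram behaviour under simplifying assumptions (identical bucket layouts, 0 < q <= 1, lowest bucket starting at 0); the pod data and the `merge` helper are hypothetical.

```python
import math


def merge(*histograms):
    """Sum cumulative bucket counts across instances (same bucket layout assumed)."""
    return [(group[0][0], sum(count for _, count in group))
            for group in zip(*histograms)]


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper_bound and end with (math.inf, total).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


pod_a = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
pod_b = [(0.1, 10), (0.5, 20), (1.0, 90), (math.inf, 100)]
cluster = merge(pod_a, pod_b)

print(histogram_quantile(0.5, cluster))
print(histogram_quantile(0.99, cluster))
```

Because the result is interpolated, its accuracy depends on how well the bucket boundaries bracket the latencies you care about.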

Binary operators and vector matching. Arithmetic (+ - * /), comparison (> < ==), and logical (and or unless) operators work between vectors, matching on identical label sets (use on(...)/ignoring(...) and group_left/group_right for many-to-one joins). An error-ratio SLI:

sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m]))

Recording rules precompute expensive or frequently-used expressions on a schedule (the global evaluation_interval, or a rule group's own interval) and save them as new series, so dashboards and alerts read a cheap pre-aggregated metric:

groups:
  - name: api-slo
    interval: 30s
    rules:
      - record: job:http_request_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (job) (rate(http_requests_total[5m]))

The naming convention level:metric:operation (e.g. job:http_request_errors:ratio_rate5m) signals the aggregation level. Alerting rules are similar but use alert:/expr:/for:/labels:/annotations: and feed Alertmanager (see below).

kube-state-metrics: the state of your cluster objects

By itself, Prometheus + node-exporter tells you about machines. To monitor Kubernetes objects (Deployments, Pods, DaemonSets, Jobs, PVCs, nodes-as-objects) you need kube-state-metrics (KSM). KSM is a service that watches the Kubernetes API and exposes the current state of objects as Prometheus metrics (mostly gauges, plus a few counters such as restart totals), without modifying or interpreting what the API reports.

KSM answers “is the cluster in the state I declared?” questions:

kube_deployment_status_replicas_available vs kube_deployment_spec_replicas — are all desired replicas up?
kube_pod_status_phase{phase="Pending|Running|Failed"} — pod phase distribution.
kube_pod_container_status_restarts_total — crash-looping containers.
kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff|ImagePullBackOff"} — why a pod is stuck.
kube_node_status_condition{condition="Ready", status="true"} — node readiness.
kube_job_status_failed, kube_cronjob_next_schedule_time, kube_persistentvolumeclaim_status_phase, kube_resourcequota, kube_horizontalpodautoscaler_status_current_replicas, and hundreds more.

The critical distinction interviewers probe:

	metrics-server	kube-state-metrics	node-exporter
Source	kubelet (cAdvisor)	Kubernetes API objects	the host OS (`/proc`, `/sys`)
Tells you	resource usage (CPU/RAM)	object state (desired vs actual, phases, counts)	machine health (CPU, mem, disk, net, FS)
Exposes	Metrics API (`kubectl top`, HPA)	Prometheus metrics (mostly gauges)	Prometheus metrics (counters and gauges)
Example question	“how much CPU is this pod using?”	“are all my replicas available?”	“is the node’s disk full?”

KSM does not report resource usage (that is metrics-server/cAdvisor) and it is not the same as metrics-server; the two are complementary. A healthy monitoring setup runs all three: node-exporter (machines), kube-state-metrics (objects), and metrics-server (HPA), with Prometheus also scraping cAdvisor-style container metrics from the kubelet directly.

node-exporter: machine-level metrics

node-exporter is the canonical Prometheus exporter for *nix machine metrics. It runs as a DaemonSet (one pod per node), reads the host’s /proc and /sys (mounted in), and exposes hardware/OS metrics on :9100/metrics. It is how you see what is happening underneath Kubernetes — the actual Linux box.

Key metric families:

CPU — node_cpu_seconds_total{mode="idle|user|system|iowait|..."} (a counter; rate it and subtract idle for utilisation).
Memory — node_memory_MemAvailable_bytes, node_memory_MemTotal_bytes (gauges).
Disk space — node_filesystem_avail_bytes, node_filesystem_size_bytes (the right metrics for “disk full” alerts — and for predicting the kubelet’s eviction).
Disk I/O — node_disk_read_bytes_total, node_disk_io_time_seconds_total.
Network — node_network_receive_bytes_total, node_network_transmit_bytes_total, errors and drops.
Load & uptime — node_load1/5/15, node_boot_time_seconds.

Example: node CPU utilisation as a percentage:

100 * (1 - avg by (instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])))
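The arithmetic behind that query, worked through in Python with made-up numbers for a hypothetical 4-CPU node over a 300-second window (counter resets are ignored here for brevity):

```python
def cpu_utilisation_percent(idle_start, idle_end, window_seconds):
    """Node CPU utilisation from per-CPU idle-seconds counters.

    idle_start and idle_end hold node_cpu_seconds_total{mode="idle"} per CPU
    at the start and end of the window.
    """
    idle_rates = [(end - start) / window_seconds
                  for start, end in zip(idle_start, idle_end)]
    avg_idle = sum(idle_rates) / len(idle_rates)  # avg by (instance)
    return 100 * (1 - avg_idle)


# Idle seconds gained per CPU: 240, 270, 150 and 300 out of a possible 300.
start = [1000.0, 2000.0, 3000.0, 4000.0]
end = [1240.0, 2270.0, 3150.0, 4300.0]
print(cpu_utilisation_percent(start, end, 300))
```

Each CPU's idle rate is a fraction between 0 and 1 (idle seconds per second), which is why subtracting the average from 1 yields utilisation.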

Because node-exporter exposes the host, it is the basis of the USE method (Utilisation, Saturation, Errors — see below) for nodes, and it is what tells you the cluster is about to start evicting pods because a node’s disk or memory is exhausted — something Kubernetes object metrics alone cannot reveal.

The Prometheus Operator: ServiceMonitor, PodMonitor and friends

Running Prometheus on Kubernetes by hand-editing prometheus.yml and reloading it does not scale — every new service means editing central config. The Prometheus Operator solves this with the operator pattern: it installs CustomResourceDefinitions and a controller that generates Prometheus’s configuration from Kubernetes objects, so monitoring becomes declarative and namespaced. This is how virtually everyone runs Prometheus on Kubernetes today, most commonly via the kube-prometheus-stack Helm chart (Operator + Prometheus + Alertmanager + Grafana + node-exporter + kube-state-metrics + a library of dashboards and alert rules, in one install).

The Operator’s custom resources:

CRD	Purpose
Prometheus	Declares a Prometheus instance (replicas, retention, storage, resources, which monitors/rules it selects)
Alertmanager	Declares an Alertmanager cluster
ServiceMonitor	Declares how to scrape a set of Services (selects Services by label, names the port, sets path/interval)
PodMonitor	Declares how to scrape Pods directly (no Service needed)
Probe	Declares blackbox/synthetic probes of ingresses or static targets
PrometheusRule	Declares recording and alerting rules as a Kubernetes object
AlertmanagerConfig	Namespaced routing/receivers, so teams own their own alert routing
ScrapeConfig	An escape hatch for raw scrape configs (e.g. external targets) the CRDs don’t cover

The key insight is the two-level selection: the Prometheus resource has a serviceMonitorSelector (a label selector) that picks which ServiceMonitors it honours; each ServiceMonitor in turn has a selector that picks which Services it scrapes. A ServiceMonitor that no Prometheus selects is silently ignored — the most common “my target isn’t showing up” cause.

A typical ServiceMonitor for an app whose Service exposes a metrics port:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: payments
  namespace: shop
  labels:
    release: kube-prometheus-stack   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: payments                  # selects Services with this label
  namespaceSelector:
    matchNames: ["shop"]
  endpoints:
    - port: metrics                  # the *named* port on the Service
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

The two most common Operator mistakes: (1) the ServiceMonitor’s labels do not match the Prometheus instance’s serviceMonitorSelector (so Prometheus ignores it), and (2) the port field is given as a number when it must be the Service port’s name. Use a PodMonitor when the pods have no Service in front of them (e.g. a DaemonSet deployed without one). Check that the Operator did its job in Prometheus’s UI under Status → Targets and Status → Configuration.
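The two-level selection can be modelled in a few lines of Python. This is a sketch of `matchLabels` semantics only: it ignores namespaces and `matchExpressions`, and the `orders` monitor and both Service names are hypothetical.

```python
def matches(match_labels, labels):
    """matchLabels semantics: every selector pair must be present on the object."""
    return all(labels.get(k) == v for k, v in match_labels.items())


def scraped_services(prometheus_selector, service_monitors, services):
    """Two-level selection: Prometheus -> ServiceMonitors -> Services."""
    selected = []
    for monitor in service_monitors:
        if not matches(prometheus_selector, monitor["labels"]):
            continue  # no Prometheus selects this monitor: silently ignored
        for service in services:
            if matches(monitor["selector"], service["labels"]):
                selected.append((monitor["name"], service["name"]))
    return selected


prometheus_selector = {"release": "kube-prometheus-stack"}
monitors = [
    {"name": "payments", "labels": {"release": "kube-prometheus-stack"},
     "selector": {"app": "payments"}},
    {"name": "orders", "labels": {"team": "shop"},  # missing the release label
     "selector": {"app": "orders"}},
]
services = [
    {"name": "payments-svc", "labels": {"app": "payments"}},
    {"name": "orders-svc", "labels": {"app": "orders"}},
]

print(scraped_services(prometheus_selector, monitors, services))
```

The `orders` monitor is perfectly valid on its own, yet its Service never becomes a target, which matches the first mistake described above.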

A PrometheusRule carries alerts and recording rules as a first-class object the Operator wires into Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-slo
  namespace: shop
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: payments.rules
      rules:
        - alert: PaymentsHighErrorRate
          expr: |
            sum(rate(http_requests_total{job="payments",status=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="payments"}[5m])) > 0.02
          for: 10m
          labels: { severity: page }
          annotations:
            summary: "Payments 5xx error ratio above 2% for 10m"

Grafana: dashboards and visualisation

Grafana is the visualisation and dashboarding layer. It connects to one or more data sources (Prometheus being the canonical one, but also Loki for logs, Tempo for traces, and many databases), runs PromQL on your behalf, and renders panels (time series graphs, stat/gauge/bar panels, tables, heatmaps) on dashboards. It is technically independent of Prometheus, but the two are almost always deployed together (and kube-prometheus-stack bundles Grafana pre-wired with a large set of cluster dashboards).

What you should know:

Data sources — add Prometheus by URL; in-cluster that is typically http://kube-prometheus-stack-prometheus.monitoring:9090.
Panels and queries — each panel runs one or more PromQL queries; you can template the legend, set units (seconds, bytes, percent), thresholds and axis scales.
Variables / templating — dashboard-level dropdowns (e.g. $namespace, $pod) built from PromQL label_values(...) queries, so one dashboard works for any namespace. This is what makes a dashboard reusable.
Provisioning & dashboards-as-code — dashboards are JSON; in production you store them in Git and provision them (via ConfigMaps with the grafana_dashboard label, the Grafana Operator, or files), not by clicking in the UI, so they are reviewable and reproducible.
Importing community dashboards — grafana.com hosts thousands (e.g. the well-known Kubernetes cluster/namespace/node dashboards); import by ID and point them at your Prometheus.
Grafana alerting (optional) — Grafana has its own unified alerting engine that can alert on any data source; on Kubernetes most teams still alert via Prometheus rules → Alertmanager (covered next) and use Grafana purely for visualisation. Know that both options exist.

A dashboard is for exploration and situational awareness; it should not be your alerting mechanism — nobody is staring at a screen at 3am. Alerts come from Prometheus rules.

Alertmanager: routing, grouping, inhibition and silences

Prometheus fires alerts (a rule’s expr is true for its for duration), but Prometheus does not send notifications. It pushes firing alerts to Alertmanager, whose entire job is to turn a stream of alerts into the right notifications to the right people, without spamming them. The split is deliberate: Prometheus decides what is wrong; Alertmanager decides who to tell and how.

The alert lifecycle: a PrometheusRule’s expr becomes true → it is pending for the for duration (debounce) → it becomes firing and is pushed to Alertmanager → Alertmanager groups, dedups, routes, applies inhibition and silences, then notifies a receiver.
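The pending/firing debounce can be sketched as a tiny state machine in Python. It is a simplified model (one series, no `keep_firing_for`, hypothetical function name): the alert fires once its expression has been continuously true for at least the `for` duration.

```python
def alert_states(evaluations, for_seconds):
    """State after each rule evaluation: 'inactive', 'pending' or 'firing'.

    evaluations is a list of (timestamp_seconds, expr_is_true), in time order.
    """
    states = []
    active_since = None
    for timestamp, is_true in evaluations:
        if not is_true:
            active_since = None  # any false evaluation resets the clock
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# Evaluated every 60s with for: 2m. A brief recovery at t=120 resets the timer.
evals = [(0, True), (60, True), (120, False), (180, True), (240, True), (300, True)]
print(alert_states(evals, 120))
```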

Alertmanager’s core features:

Feature	What it does
Routing tree	A `route` with nested `routes` matches alerts by label (`match`/`matchers`) and sends them to a receiver; supports per-route timing
Grouping	`group_by` bundles related alerts into one notification (e.g. all alerts for a cluster, or all instances of one alert) so a node failure is one page, not 50
Timing	`group_wait` (wait before first send, to batch), `group_interval` (wait before sending updates to a group), `repeat_interval` (how often to re-notify a still-firing alert)
Receivers	Integrations: email, Slack, PagerDuty, Opsgenie, webhook, etc.
Inhibition	One alert suppresses others (e.g. a `ClusterDown` alert inhibits all the per-service alerts it would obviously cause)
Silences	Time-bounded, label-matched mutes you create in the UI/API (e.g. during a planned maintenance window) — alerts still fire but are not notified
HA	Run Alertmanager as a gossiping cluster; it dedups so multiple Prometheis don’t double-page

A representative configuration:

route:
  receiver: "slack-default"
  group_by: ["alertname", "namespace"]   # batch by alert + namespace
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers: ['severity="page"']       # page-severity goes to PagerDuty
      receiver: "pagerduty"
    - matchers: ['namespace="dev"']        # dev alerts go to a low-noise channel
      receiver: "slack-dev"

inhibit_rules:
  - source_matchers: ['severity="page"', 'alertname="ClusterDown"']
    target_matchers: ['severity=~"warning|page"']
    equal: ["cluster"]                     # cluster-down hides everything in that cluster

receivers:
  - name: "slack-default"
    slack_configs:
      - channel: "#alerts"
        api_url_file: /etc/alertmanager/secrets/slack-url   # never inline the webhook
  - name: "pagerduty"
    pagerduty_configs:
      - routing_key_file: /etc/alertmanager/secrets/pd-key
  - name: "slack-dev"
    slack_configs:
      - channel: "#alerts-dev"

Grouping is what keeps on-call humane: when a node dies and 40 pods go unready, group_by collapses that into a single notification. Inhibition is the next level — the NodeDown page suppresses the 40 downstream PodNotReady warnings entirely. Both should be configured before you enable real paging.
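A rough Python model of the node-failure scenario shows the two mechanisms side by side. Assumptions: alerts are plain label dicts, matchers are exact-equality only, the `node` label and pod names are hypothetical, and real Alertmanager evaluates these steps inside its routing pipeline rather than as two standalone functions.

```python
from collections import defaultdict


def is_inhibited(alert, alerts, source_match, target_match, equal):
    """True if alert matches the target side and another alert matches the
    source side with identical values for every label listed in equal."""
    def has(labels, wanted):
        return all(labels.get(k) == v for k, v in wanted.items())

    if not has(alert, target_match):
        return False
    return any(
        other is not alert
        and has(other, source_match)
        and all(other.get(name) == alert.get(name) for name in equal)
        for other in alerts
    )


def group(alerts, group_by):
    """Bundle alerts that share the same values for the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert.get(name, "") for name in group_by)].append(alert)
    return dict(groups)


alerts = [{"alertname": "NodeDown", "severity": "page", "node": "n1"}]
alerts += [{"alertname": "PodNotReady", "severity": "warning",
            "node": "n1", "pod": f"web-{i}"} for i in range(40)]
alerts += [{"alertname": "PodNotReady", "severity": "warning",
            "node": "n2", "pod": "web-40"}]

groups = group(alerts, ["alertname", "node"])
notifiable = [a for a in alerts
              if not is_inhibited(a, alerts, {"alertname": "NodeDown"},
                                  {"alertname": "PodNotReady"}, ["node"])]

print(len(groups))      # notifications with grouping alone
print(len(notifiable))  # alerts left once NodeDown inhibits its node's pods
```

Grouping reduces 42 alerts to a handful of notifications; inhibition goes further and drops the 40 warnings that share a node with the NodeDown alert, while the unrelated alert on the other node still gets through.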

[Figure: Kubernetes monitoring stack]

The diagram above shows the full pipeline: node-exporter (per-node DaemonSet) and kube-state-metrics (cluster object state) and your instrumented apps expose /metrics; the Prometheus Operator turns ServiceMonitor/PodMonitor objects into scrape config; Prometheus pulls those targets into its TSDB; Grafana queries Prometheus for dashboards; Prometheus evaluates PrometheusRule alerts and pushes them to Alertmanager, which routes, groups and silences before notifying Slack/PagerDuty — while the separate, lightweight metrics-server pipeline feeds kubectl top and the HPA.

The four golden signals, USE and RED

Tools collect metrics; methodology decides which metrics matter and when to page. Three frameworks dominate, and they are complementary.

The four golden signals (from Google’s SRE book) — the signals to monitor for any user-facing service:

Signal	Question	Typical metric
Latency	How long do requests take? (split success vs error latency)	`histogram_quantile(0.99, ...)` on request duration
Traffic	How much demand?	`sum(rate(http_requests_total[5m]))`
Errors	What fraction is failing?	5xx ratio, exception rate
Saturation	How “full” is the service? (the constrained resource)	queue depth, CPU/mem near limit, thread-pool usage

The RED method (Tom Wilkie) — a request-centric specialisation for services, easy to remember:

Rate — requests per second.
Errors — failed requests per second.
Duration — distribution (percentiles) of request latency.

RED is essentially the golden signals minus saturation; it is the right default dashboard for every microservice.

The USE method (Brendan Gregg) — resource-centric, for machines and components:

Utilisation — % of time the resource is busy (CPU, disk, NIC).
Saturation — the degree of queued extra work it cannot service yet (run-queue length, swap, I/O wait).
Errors — error counts for the resource (disk errors, dropped packets).

The clean division: RED for services (request-driven, from app instrumentation), USE for resources (from node-exporter/cAdvisor). A complete cluster dashboard set has RED panels per service and USE panels per node. Saturation is the signal beginners most often omit and the one that gives you lead time: a resource can sit at 100% utilisation and still keep up, but once work starts queuing (saturation) latency climbs, and that queue is the early warning of impending failure.
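
As a rough illustration of what the three RED numbers are, the following Python sketch computes them from a list of raw request records. In practice Prometheus derives them from counters and histograms with PromQL; the function name red_summary, the (status, duration) tuples and the nearest-rank percentile are assumptions made for this example.

```python
import math


def red_summary(requests, window_seconds):
    """Compute Rate, Errors and Duration (p99) for the requests seen in one window.

    requests: list of (status_code, duration_seconds) tuples (hypothetical data).
    """
    total = len(requests)
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    if total:
        # Nearest-rank percentile: the smallest sample that has at least
        # 99% of all samples at or below it.
        rank = max(1, math.ceil(total * 99 / 100))
        p99 = durations[rank - 1]
    else:
        p99 = 0.0
    return {
        "rate_per_s": total / window_seconds,
        "errors_per_s": errors / window_seconds,
        "p99_seconds": p99,
    }


if __name__ == "__main__":
    sample = [(200, 0.1)] * 98 + [(500, 0.5), (200, 2.0)]
    print(red_summary(sample, window_seconds=10))
```

Note that the single 2-second outlier does not move the p99 here, while it would dominate a maximum; that is the reason Duration is tracked as a distribution rather than as one number.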

SLIs, SLOs and error budgets

The final and most important step is to stop alerting on causes (CPU is high) and start alerting on symptoms users feel, expressed as service-level objectives.

SLI (Service Level Indicator) — a measured number that reflects user happiness, almost always a ratio of good events to total events: e.g. “fraction of HTTP requests served in < 300 ms with a non-5xx status”. As PromQL:
```
sum(rate(http_request_duration_seconds_bucket{le="0.3",status!~"5.."}[5m]))
  / sum(rate(http_request_duration_seconds_count[5m]))
```
SLO (Service Level Objective) — a target for an SLI over a window: “99.9% of requests succeed within 300 ms over 30 days”. The SLO is an internal goal you alert against.
SLA (Service Level Agreement) — a contractual promise to a customer with consequences (refunds) if broken. SLAs are usually looser than your internal SLOs (you alert before you breach the contract).
Error budget — the inverse of the SLO: 100% − SLO. A 99.9% SLO permits 0.1% failures = ~43 minutes/month of “bad”. The budget is a shared currency: when it is healthy, ship features fast; when it is being burned, freeze risky changes and fix reliability.
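
The error-budget arithmetic above is easy to check by hand. A small Python sketch, assuming a time-based reading of the SLO over a 30-day window; the function names are hypothetical:

```python
def error_budget_minutes(slo, window_days=30):
    """Minutes of allowed "bad" time in the window for a given SLO (e.g. 0.999)."""
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_consumed(slo, good_events, total_events):
    """Fraction of the error budget already spent (1.0 means fully spent)."""
    allowed_bad = (1.0 - slo) * total_events
    actual_bad = total_events - good_events
    return actual_bad / allowed_bad if allowed_bad else float("inf")


if __name__ == "__main__":
    for slo in (0.99, 0.999, 0.9999):
        print(f"SLO {slo:.2%}: {error_budget_minutes(slo):.1f} min per 30 days")
    # Hypothetical month: 1,000,000 requests, 500 of them bad, against a 99.9% SLO.
    print(f"budget spent: {budget_consumed(0.999, 999_500, 1_000_000):.0%}")
```

Each extra nine divides the budget by ten, which is why the choice of SLO should be driven by what users need rather than by what sounds impressive.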

Burn-rate alerting is the modern best practice and the reason SLOs beat threshold alerts. Instead of paging on “errors > 2%”, you page on how fast you are consuming the error budget. A burn rate of 1× spends exactly the whole budget over the SLO window; 14.4× spends it 14.4 times faster. A multi-window, multi-burn-rate alert (the pattern described in Google’s SRE Workbook) combines a fast, high-rate signal (e.g. 14.4× burn over 1h, which would exhaust a 30-day budget in about 2 days: page now) with slower, lower-rate signals (e.g. 6× burn over 6h also pages, while 1× burn over 3 days opens a ticket), each gated by a short and a long window so you alert quickly on big breakages but don’t flap on transient blips. The payoff: you page when users are actually being harmed at a rate that threatens the SLO, not every time a graph twitches, which means far fewer false pages and much less alert fatigue. This is the destination the whole stack exists to reach.
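
The burn-rate numbers follow from simple division. The Python sketch below shows the arithmetic and a two-window paging check; the function names are hypothetical, the 14.4 threshold is the one mentioned above, and pairing a long window with a much shorter one (for instance 1h with 5m) is an assumption about how the gating is set up.

```python
def burn_rate(error_ratio, slo):
    """How many times faster than "exactly on budget" the budget is being spent."""
    return error_ratio / (1.0 - slo)


def days_to_exhaustion(rate, window_days=30):
    """Days until the whole budget is gone if this burn rate is sustained."""
    return window_days / rate


def should_page(long_window_ratio, short_window_ratio, slo, threshold=14.4):
    """Page only if both windows burn faster than the threshold.

    The long window proves the problem is significant; the short window
    proves it is still happening, so the alert resets soon after recovery.
    """
    return (burn_rate(long_window_ratio, slo) >= threshold
            and burn_rate(short_window_ratio, slo) >= threshold)


if __name__ == "__main__":
    slo = 0.999
    print(f"2% errors burns at {burn_rate(0.02, slo):.1f}x")
    print(f"14.4x empties a 30-day budget in {days_to_exhaustion(14.4):.2f} days")
    print("page?", should_page(0.02, 0.02, slo))
```

With a 99.9% SLO, an error ratio of 1.44% corresponds to the 14.4× threshold, so "errors > 2%" from the threshold-style alert would already be well past the paging point.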

Hands-on lab: stand up the full stack on kind

We will create a local cluster, install kube-prometheus-stack (Operator + Prometheus + Grafana + Alertmanager + node-exporter + kube-state-metrics), deploy a sample app, scrape it with a ServiceMonitor, query it in Prometheus and Grafana, and add a PrometheusRule. Everything runs locally and is free.

0. Prerequisites. Install kind, kubectl and helm (v3). Then create a cluster:

kind create cluster --name monitoring-lab
kubectl cluster-info --context kind-monitoring-lab

1. Install metrics-server (kind ships without it) and prove the resource pipeline.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm repo update
helm install metrics-server metrics-server/metrics-server -n kube-system \
  --set "args={--kubelet-insecure-tls}"          # lab-only: kind kubelet uses a self-signed cert

kubectl -n kube-system rollout status deploy/metrics-server
sleep 30
kubectl top nodes        # expect CPU/MEM columns with numbers, not an error
kubectl top pods -A      # per-pod usage

If kubectl top returns "error: Metrics API not available", wait another 20–30s for the first scrape, then re-check; persistent failure usually means the --kubelet-insecure-tls flag did not apply (look for x509 errors in the metrics-server logs).

2. Install the full Prometheus stack via Helm.

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install kps prometheus-community/kube-prometheus-stack -n monitoring \
  --set grafana.adminPassword='lab-password'

kubectl -n monitoring rollout status deploy/kps-grafana
kubectl -n monitoring get pods
# Expect: prometheus-kps-... , alertmanager-kps-... , kps-grafana-... ,
#         kps-kube-state-metrics-... , and a kps-prometheus-node-exporter-... per node.
kubectl -n monitoring get servicemonitors    # the bundled monitors (kubelet, apiserver, etc.)

3. Open the Prometheus UI and explore targets and types.

kubectl -n monitoring port-forward svc/kps-kube-prometheus-stack-prometheus 9090:9090
# (service name may be 'kps-prometheus' — check: kubectl -n monitoring get svc | grep prometheus)

Browse to http://localhost:9090. Under Status → Targets every target should be UP. In the expression bar try:

up                                                  # 1 per healthy target
kube_pod_status_phase{phase="Running"}              # kube-state-metrics: running pods (gauge)
rate(node_cpu_seconds_total{mode="idle"}[5m])       # node-exporter counter -> rate
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)   # per-pod CPU
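
The same expressions can be run outside the UI through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following Python sketch builds the request URL and unpacks an instant-vector response; the helper names are hypothetical, and the base URL assumes the port-forward from this step is still running.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(payload):
    """Turn an instant-vector response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    # Each sample looks like {"metric": {...labels...}, "value": [timestamp, "1"]}.
    return [(s["metric"], float(s["value"][1])) for s in payload["data"]["result"]]


def instant_query(base_url, promql):
    """Run one instant query and return the parsed samples."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_instant_vector(json.load(resp))
```

With the port-forward active, instant_query("http://localhost:9090", "up") should return one pair per scrape target, which is a convenient way to script the checks in the validation step later on.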

4. Deploy a sample app and scrape it with a ServiceMonitor. We use an app that natively exposes Prometheus metrics on a metrics port.

kubectl create namespace demo
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata: { name: hello, namespace: demo }
spec:
  replicas: 2
  selector: { matchLabels: { app: hello } }
  template:
    metadata: { labels: { app: hello } }
    spec:
      containers:
        - name: hello
          image: quay.io/prometheus/prometheus:v2.53.0   # exposes /metrics on 9090
          args: ["--config.file=/etc/prometheus/prometheus.yml"]
          ports: [{ name: metrics, containerPort: 9090 }]
---
apiVersion: v1
kind: Service
metadata: { name: hello, namespace: demo, labels: { app: hello } }
spec:
  selector: { app: hello }
  ports: [{ name: metrics, port: 9090, targetPort: metrics }]
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hello
  namespace: demo
  labels: { release: kps }          # must match the Prometheus serviceMonitorSelector
spec:
  selector: { matchLabels: { app: hello } }
  endpoints:
    - port: metrics                  # the *named* Service port, not a number
      interval: 15s
EOF

Wait ~30s, then in the Prometheus UI under Status → Targets you should see a new serviceMonitor/demo/hello/0 job with both pods UP. If it does not appear, check the ServiceMonitor release label matches (kubectl -n monitoring get prometheus -o yaml | grep -A3 serviceMonitorSelector) and that the endpoint port name matches the Service.
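
The label check that decides whether a ServiceMonitor is picked up follows ordinary Kubernetes matchLabels semantics. A minimal Python sketch of that rule, with a hypothetical function name and with matchExpressions and namespace selection left out:

```python
def matches(match_labels, object_labels):
    """True if every key/value in the selector is present on the object.

    An empty selector ({}) matches every object.
    """
    return all(object_labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    prometheus_selector = {"release": "kps"}           # what the Prometheus resource selects
    good_monitor = {"release": "kps", "team": "demo"}  # extra labels are fine
    bad_monitor = {"app": "hello"}                     # missing the release label
    print(matches(prometheus_selector, good_monitor))
    print(matches(prometheus_selector, bad_monitor))
```

The same rule is applied a second time one level down: the ServiceMonitor's own selector must match the labels on the Service, which is why the Service in this step carries app: hello.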

5. Add a PrometheusRule (alerting rule).

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hello-rules
  namespace: demo
  labels: { release: kps }
spec:
  groups:
    - name: hello.rules
      rules:
        - alert: HelloTargetDown
          expr: up{job="hello"} == 0
          for: 1m
          labels: { severity: warning }
          annotations: { summary: "A hello replica has been down for 1m" }
EOF

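In the Prometheus UI under Alerts you should see HelloTargetDown in the inactive state. To watch it go pending and then firing, a scrape has to fail while the target is still discovered. Deleting a pod (kubectl -n demo delete pod -l app=hello --field-selector ...) or scaling the Deployment down may not be enough, because a pod that is gone drops out of service discovery and its up series goes stale instead of reading 0. A more reliable lab trick is to temporarily point the ServiceMonitor endpoint at a path that does not exist (for example, add path: /nope under the endpoint), so every scrape fails and up becomes 0.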
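
The inactive, pending and firing states are driven by the for: duration. The Python sketch below models that life cycle for a single alert; it is a simplification rather than Prometheus's code, and the function name and the 30-second evaluation interval in the example are assumptions.

```python
def alert_states(condition_samples, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_samples: one boolean per evaluation, True when the expr
    returned a result (the condition holds).
    """
    states = []
    active_for = None  # seconds the condition has been continuously true
    for is_true in condition_samples:
        if not is_true:
            active_for = None          # any gap resets the timer
            states.append("inactive")
            continue
        active_for = 0 if active_for is None else active_for + eval_interval_s
        states.append("firing" if active_for >= for_s else "pending")
    return states


if __name__ == "__main__":
    # Condition true for three evaluations, then a blip, then true again.
    timeline = [False, True, True, True, False, True]
    print(alert_states(timeline, eval_interval_s=30, for_s=60))
```

The reset on every false evaluation is the reason a flapping condition with a long for: may never reach firing, as noted in the troubleshooting table.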

6. Open Grafana and view a dashboard.

kubectl -n monitoring port-forward svc/kps-grafana 3000:80
# login: admin / lab-password

Browse to http://localhost:3000, open Dashboards and explore the bundled “Kubernetes / Compute Resources / Namespace (Pods)” dashboard. Then create a panel with the query sum(rate(container_cpu_usage_seconds_total{namespace="demo"}[5m])) by (pod) to see your app’s CPU.

7. Validation checklist.

kubectl top nodes                                  # metrics-server pipeline works
kubectl -n monitoring get prometheus,alertmanager  # Operator-managed instances exist
kubectl get servicemonitor -A                      # your 'hello' monitor is listed
# In Prometheus UI: Status->Targets all UP; Alerts shows HelloTargetDown
# In Grafana: a cluster dashboard renders data

8. Cleanup.

kubectl delete namespace demo
helm uninstall kps -n monitoring
kubectl delete namespace monitoring
helm uninstall metrics-server -n kube-system
kind delete cluster --name monitoring-lab

Cost note. Everything here runs in a local kind cluster on your laptop, so there is zero cloud cost. In a managed cluster, the levers that move the monitoring bill are Prometheus retention (disk), scrape interval and cardinality (RAM and storage scale with the number of time series, which is the dominant cost) and any remote-write to a long-term backend (Thanos/Mimir/managed Prometheus), which is usually the largest line item at scale. Grafana and Alertmanager are negligible by comparison.

Common mistakes & troubleshooting

Symptom	Likely cause	Fix
`kubectl top` → “Metrics API not available”	metrics-server not installed, or kubelet TLS (`x509`) scrape failures	Install metrics-server; on self-signed kubelets use `--kubelet-insecure-tls` (lab) or enable signed kubelet serving certs (prod)
HPA `TARGETS` shows `<unknown>`	metrics-server broken, or requests not set on pods	Fix metrics-server first; set CPU/memory requests (utilisation targets are % of request)
ServiceMonitor created but no target appears	Its labels don’t match the Prometheus `serviceMonitorSelector`, or wrong namespace selection	Add the matching label (e.g. `release: kps`); check `Status → Configuration`/`Targets` in the UI
ServiceMonitor endpoint scrapes nothing	`port` set to a number instead of the Service port name	Use the named port; ensure the Service actually exposes `/metrics`
Latency p99 graph looks wrong / impossible to aggregate	Using a summary and trying to average quantiles across pods	Switch to a histogram and use `histogram_quantile(0.99, sum(rate(..._bucket[5m])) by (le))`
Counter graph shows sudden drops to zero or huge negative spikes	Graphing a counter’s raw value (or applying `delta()`/`deriv()` to it) across a restart (reset to 0)	Use `rate()`/`increase()`, which handle resets
Alertmanager floods on a node failure	No `group_by`/inhibition configured	Group by `alertname`/`node`; add inhibition so `NodeDown` suppresses downstream alerts
Prometheus OOMs or disk fills	High cardinality (per-request/per-user labels) or long retention	Drop high-cardinality labels via relabeling; tune retention; pre-aggregate with recording rules
Alerts never fire despite the condition being true	Rule has a long `for:` and the condition flaps below it; or `PrometheusRule` label not selected by the Prometheus	Lower/verify `for:`; ensure the rule’s labels match `ruleSelector`
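
The counter-reset row is worth seeing in numbers. This Python sketch shows the core idea behind reset handling: any drop between two samples is treated as a restart from 0. It is a simplified model with hypothetical function names; real rate() and increase() also extrapolate to the edges of the window.

```python
def increase(samples):
    """Total increase across consecutive counter samples, reset-aware.

    If a sample is lower than its predecessor, the counter is assumed to
    have restarted from 0, so the new value itself is the increase.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        total += curr - prev if curr >= prev else curr
    return total


def rate(samples, window_seconds):
    """Average per-second increase over the window."""
    return increase(samples) / window_seconds


if __name__ == "__main__":
    # The process restarts between the third and fourth scrape.
    scraped = [100, 110, 120, 5, 15]
    print("naive last - first:", scraped[-1] - scraped[0])
    print("reset-aware increase:", increase(scraped))
    print("per-second rate over 60s:", rate(scraped, 60))
```

The naive subtraction reports a large negative number, while the reset-aware version counts the increments on both sides of the restart.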

Best practices

Run the full trio plus metrics-server. node-exporter (machines) + kube-state-metrics (objects) + cAdvisor/kubelet (containers) + metrics-server (HPA). Each answers questions the others can’t.
Use the Operator and manage everything declaratively — ServiceMonitor/PodMonitor/PrometheusRule in Git, reviewed like code. Don’t hand-edit prometheus.yml.
Prefer histograms over summaries for any latency/size you aggregate across instances (which on Kubernetes is almost everything). Choose buckets to bracket your SLO threshold.
Always rate your counters with rate()/increase(), never graph raw counters; make windows ≥ 4× the scrape interval.
Guard cardinality ruthlessly. Never put unbounded values (user IDs, request IDs, raw paths, full URLs) in labels. Cardinality is the dominant cost and the top cause of Prometheus OOMs.
Alert on symptoms, not causes. Build dashboards with USE (nodes) + RED (services); page on SLO burn rate, not raw thresholds. “CPU high” is rarely worth a page; “users are seeing errors fast enough to blow the budget” always is.
Make alerts actionable. Every paging alert needs a clear summary, a runbook link in annotations, and a severity. If an alert has no action, it should be a dashboard, not a page.
Configure grouping, inhibition and silences before going live. This is the difference between sustainable on-call and pager hell.
Treat dashboards as code. Provision Grafana dashboards from Git/ConfigMaps; use template variables so one dashboard serves every namespace.
Plan retention and long-term storage deliberately. Local TSDB for recent data; remote-write to Thanos/Mimir/managed Prometheus for long retention and global query — don’t over-retain locally.
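
The cardinality warning is easiest to appreciate as multiplication. A Python sketch of the worst-case series count for one metric, assuming every label combination actually occurs; the label names and counts are hypothetical:

```python
import math


def series_count(label_cardinalities):
    """Upper bound on time series for one metric.

    label_cardinalities maps each label name to its number of distinct values.
    """
    return math.prod(label_cardinalities.values())


if __name__ == "__main__":
    bounded = {"method": 5, "status": 10, "pod": 20}
    print("bounded labels:", series_count(bounded))
    # Adding one unbounded label multiplies everything that came before it.
    with_user = dict(bounded, user_id=10_000)
    print("with user_id:", series_count(with_user))
```

For a histogram the result is multiplied once more by the number of buckets, so a single careless label on a latency metric can dominate the memory footprint of the whole Prometheus server.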

Security notes

Metrics endpoints leak information. /metrics can expose internal hostnames, versions, queue names, build info and traffic patterns. Don’t expose them on public Ingress; keep scraping in-cluster and require auth/mTLS where possible (the Operator supports bearerTokenSecret, tlsConfig and authorization on ServiceMonitor endpoints).
Lock down the kubelet scrape. metrics-server and Prometheus talk to the kubelet on 10250 over TLS; in production use signed kubelet serving certificates rather than --kubelet-insecure-tls, which disables certificate verification.
Protect Grafana and the Prometheus/Alertmanager UIs. They are unauthenticated or weakly authenticated by default. Put them behind SSO/OAuth proxy or your ingress auth, set a strong Grafana admin password, and never expose them publicly. Grafana data-source credentials and dashboard query power are sensitive.
Never inline secrets in Alertmanager config. Slack webhooks, PagerDuty keys and SMTP creds go in Kubernetes Secrets referenced via *_file/api_url_file/routing_key_file, not plaintext in the config (or git).
Scope RBAC tightly. kube-state-metrics needs broad read across the API — review its ClusterRole. Prometheus’s service account should be read-only. Run all components as non-root with restricted Pod Security.
Alerting is a security signal too. Wire alerts for suspicious states (cert expiry, sudden RBAC changes, repeated auth failures, control-plane component down) — monitoring is part of detection, not just performance.

Interview & exam questions

What is the difference between metrics-server and Prometheus? metrics-server is a tiny resource-metrics pipeline that scrapes only CPU/memory from kubelets, keeps the latest value in memory (no history), and serves the aggregated Metrics API for kubectl top and the HPA. Prometheus is a full monitoring system: it scrapes many targets, stores time series on disk for weeks, and offers PromQL for dashboards and alerting. They are complementary; metrics-server is not a monitoring system.
Why does Prometheus pull instead of push, and what are the trade-offs? Pull lets Prometheus own service discovery and gives a free liveness signal (a failed scrape sets up to 0); it’s awkward for short-lived jobs (use Pushgateway) and requires network reachability to targets. Push fits ephemeral jobs and firewalled targets but loses the “silent target = down” signal.
Explain the four metric types and when to use each. Counter — monotonically increasing total, always rate()d (requests, bytes). Gauge — value up/down (memory, queue length). Histogram — bucketed observations, quantiles computed server-side with histogram_quantile(), aggregatable across instances — the default for latency. Summary — quantiles computed client-side, accurate per-instance but not aggregatable.
Histogram vs summary — which for cluster-wide p99 latency, and why? Histogram. Summary quantiles are pre-computed per instance and cannot be meaningfully averaged across pods. Histograms expose raw le buckets, so you sum(rate(..._bucket[5m])) by (le) across instances and then histogram_quantile(0.99, ...).
Write a PromQL query for the p99 request latency across all replicas of a service. histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{job="api"}[5m])) by (le)).
rate() vs irate() vs increase()? rate() = smooth per-second average over the window (use for alerts/dashboards). irate() = instantaneous rate from the last two samples (spiky high-res graphs only). increase() = total rise over the window (rate × window), good for “how many in the last hour”. All three correctly handle counter resets.
What do kube-state-metrics and node-exporter each provide, and how do they differ from metrics-server? kube-state-metrics exposes the state of Kubernetes objects (replica counts, pod phases, restart counts, node conditions) from the API. node-exporter exposes host OS metrics (CPU, memory, disk, network) per node via a DaemonSet. metrics-server exposes live resource usage (CPU/RAM) for the HPA. State vs machine vs usage — three different questions.
What problem does the Prometheus Operator solve, and what do ServiceMonitor and PodMonitor do? It makes Prometheus configuration declarative and Kubernetes-native: instead of editing prometheus.yml, you create CRDs and a controller generates the config. ServiceMonitor says how to scrape a set of Services; PodMonitor scrapes Pods directly (no Service). A Prometheus resource selects which monitors it honours via serviceMonitorSelector.
My ServiceMonitor exists but the target never appears in Prometheus. Why? Most often its labels don’t match the Prometheus instance’s serviceMonitorSelector (so it’s ignored), the port is a number instead of the Service port name, the namespaceSelector excludes it, or the Service doesn’t actually expose /metrics. Check Status → Targets/Configuration in the UI.
What is the difference between Prometheus alerting rules and Alertmanager? Prometheus evaluates alerting rules and fires alerts when an expr is true for its for duration. Alertmanager receives firing alerts and handles routing, grouping, deduplication, inhibition, silences and delivery to receivers. Prometheus decides what is wrong; Alertmanager decides who to tell and how.
What are the four golden signals, and how do USE and RED relate? Golden signals: latency, traffic, errors, saturation. RED (Rate, Errors, Duration) is the request-centric subset for services. USE (Utilisation, Saturation, Errors) is resource-centric for machines/components. Use RED for services, USE for resources; saturation is the early-warning signal beginners omit.
Define SLI, SLO, SLA and error budget, and explain burn-rate alerting. SLI = a measured good/total ratio reflecting user happiness. SLO = a target for that SLI over a window (e.g. 99.9%/30d). SLA = a contractual promise with penalties (looser than the SLO). Error budget = 100% − SLO (99.9% ≈ 43 min/month). Burn-rate alerting pages on how fast you’re consuming the budget (multi-window/multi-burn-rate), so you alert on user-impacting harm rather than raw thresholds — far less alert fatigue.
What is cardinality and why does it matter? Cardinality is the number of distinct time series (each unique metric-name + label-set combination). High-cardinality labels (user/request IDs, raw URLs) multiply series, ballooning Prometheus memory and storage — the top cause of OOMs. Keep label values bounded.
Where does Prometheus store data and how do you keep it long-term / highly available? A local TSDB (WAL → 2h blocks → compaction), retained by time/size, not replicated. For long-term retention and global, HA query you ship data to Thanos (sidecar upload or remote-write to Thanos Receive), Cortex/Mimir or VictoriaMetrics (remote-write); for HA, run two or more Prometheus replicas scraping the same targets and let Thanos deduplicate the series and Alertmanager deduplicate the alerts.
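
The histogram answers above rest on one property: cumulative buckets can be added across pods before a quantile is estimated. The Python sketch below sums buckets from two pods and then interpolates linearly inside the bucket that contains the requested rank. It is a simplified model of the idea behind histogram_quantile(), not the Prometheus implementation; the bucket bounds and counts are hypothetical, and it assumes 0 < q <= 1 and a non-empty bucket set that includes +Inf.

```python
import math


def sum_buckets(per_pod_buckets):
    """Add cumulative bucket counts across pods, keyed by upper bound (le)."""
    total = {}
    for buckets in per_pod_buckets:
        for le, count in buckets.items():
            total[le] = total.get(le, 0.0) + count
    return total


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets {le: count}."""
    bounds = sorted(buckets)
    total = buckets[bounds[-1]]            # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le in bounds:
        count = buckets[le]
        if count >= rank:
            if math.isinf(le) or count == prev_count:
                return prev_le             # no finite upper bound to interpolate towards
            # Assume observations are spread evenly inside the bucket.
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le


if __name__ == "__main__":
    pod_a = {0.1: 50, 0.3: 90, 1.0: 100, math.inf: 100}
    pod_b = {0.1: 10, 0.3: 50, 1.0: 98, math.inf: 100}
    merged = sum_buckets([pod_a, pod_b])
    print("cluster-wide median estimate:", histogram_quantile(0.5, merged))
```

Nothing comparable exists for summaries: two per-pod p99 values carry no information about how many requests sat below them, so they cannot be merged into a correct cluster-wide p99.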

Quick check

Which component powers kubectl top and is the default source for the HPA?
A counter resets to 0 when its process restarts. Which PromQL function should you use so this reset doesn’t produce a misleading graph?
You need a single cluster-wide p99 latency aggregated across 10 pods. Do you instrument with a histogram or a summary, and why?
What does kube-state-metrics expose that node-exporter does not?
Your ServiceMonitor exists but no target shows up in Prometheus. Name the two most likely causes.

Answers

metrics-server (serving the metrics.k8s.io Metrics API).
rate() (or increase()) — both detect and correct counter resets; never graph a raw counter.
A histogram — summary quantiles are computed client-side and cannot be aggregated across instances, whereas histogram buckets can be summed and fed to histogram_quantile().
The state of Kubernetes objects (replica counts, pod phases, restart counts, node conditions). node-exporter exposes host OS metrics (CPU/mem/disk/net), not object state.
(a) The ServiceMonitor’s labels don’t match the Prometheus instance’s serviceMonitorSelector (so it’s ignored); (b) the endpoint port is a number instead of the Service port name (or the Service doesn’t expose /metrics).

Exercise

On a fresh kind cluster:

Install metrics-server and confirm kubectl top nodes and kubectl top pods -A return data.
Install kube-prometheus-stack via Helm and confirm every target under Status → Targets is UP.
Deploy any app that exposes /metrics, expose it with a Service, and add a ServiceMonitor so Prometheus scrapes it. Prove the target appears.
In Prometheus, write three queries: (a) the per-second request rate of your app’s HTTP counter using rate(); (b) the p95 latency using histogram_quantile() over the bucket metric; (c) the number of Running pods in your namespace using a kube-state-metrics gauge.
Create a PrometheusRule that fires when your app’s 5xx error ratio exceeds 1% for 5 minutes. Trigger it (e.g. by stopping the backend) and watch it move pending → firing in the Alerts tab.
In Grafana, build a small RED dashboard (Rate, Errors, Duration) for your app using template variable $namespace.
Configure Alertmanager to group_by: [alertname, namespace] and route severity="page" to one receiver and everything else to another. Add a silence for your test alert and confirm it stops notifying while still firing.

Write down, in two sentences, the SLI and SLO you would set for this app, and the error budget (in minutes/month) it implies.

Certification mapping

CKA — Monitoring is squarely in the Troubleshooting and cluster-operations domains. Be fluent with metrics-server and kubectl top (install, the x509/--kubelet-insecure-tls issue, “Metrics API not available”), interpreting resource usage, and understanding the metrics pipeline that feeds the HPA. Know that the Metrics API is served via the API Aggregation Layer.
CKAD — Understand how application resource requests/limits interact with metrics and the HPA, and how to read kubectl top and pod conditions when debugging your own workloads.
CKS / KCNA — KCNA touches observability concepts (the three pillars, Prometheus’s role in the CNCF landscape). For CKS, know the security posture of monitoring components (metrics endpoint exposure, kubelet TLS, RBAC scope of kube-state-metrics, securing Grafana/Alertmanager). Prometheus, kube-state-metrics, node-exporter, Grafana and the Prometheus Operator are all CNCF/ecosystem projects worth recognising by name.

Glossary

Observability — the ability to infer a system’s internal state from its external outputs (metrics, logs, traces).
metrics-server — cluster add-on serving live CPU/memory via the Metrics API for kubectl top and the HPA.
Metrics API (metrics.k8s.io) — the aggregated API exposing resource metrics; backed by metrics-server.
Prometheus — pull-based monitoring system and time-series database; the Kubernetes standard.
TSDB — time-series database; Prometheus’s on-disk store (WAL + compacted blocks).
Scrape / target — Prometheus pulling /metrics from an endpoint; a target is one such endpoint.
Exporter — a process exposing third-party metrics in Prometheus format (node-exporter, kube-state-metrics, …).
Counter / Gauge / Histogram / Summary — the four Prometheus metric types.
PromQL — Prometheus Query Language; selectors, range vectors, rate(), aggregations, histogram_quantile().
Recording rule — a precomputed PromQL expression saved as a new series.
kube-state-metrics (KSM) — exposes the state of Kubernetes objects as metrics from the API.
node-exporter — DaemonSet exporter of host OS metrics (CPU, memory, disk, network).
Prometheus Operator — controller that manages Prometheus/Alertmanager and generates scrape config from CRDs.
ServiceMonitor / PodMonitor — CRDs declaring how to scrape Services / Pods.
PrometheusRule — CRD carrying recording and alerting rules.
Grafana — visualisation/dashboarding layer querying Prometheus and other data sources.
Alertmanager — routes, groups, dedups, inhibits, silences and delivers fired alerts.
Inhibition / Silence — suppressing alerts because of another alert / a time-bounded manual mute.
Cardinality — the number of distinct time series; high cardinality is the main Prometheus cost/risk.
Four golden signals — latency, traffic, errors, saturation.
USE / RED — Utilisation-Saturation-Errors (resources) / Rate-Errors-Duration (services).
SLI / SLO / SLA — measured indicator / internal target / contractual promise.
Error budget — 100% − SLO; the allowable amount of “bad”, spent or saved.
Burn rate — how fast the error budget is consumed; basis of modern SLO alerting.

Next steps

Add logs and traces: Deploy SigNoz on Kubernetes with OpenTelemetry for APM and logs — extend metrics into full three-pillar observability.
Go deeper on the nodes you’re now monitoring: Kubernetes Worker Node Internals: kubelet, the CRI, kube-proxy & cgroups — understand the kubelet metrics and eviction thresholds your alerts watch.
Drive autoscaling from these metrics: Kubernetes Autoscaling in Depth: HPA, KEDA & Karpenter — metrics-server feeds CPU/memory HPAs; the Prometheus Adapter feeds custom-metric HPAs.