Taming Metric Cardinality: Relabeling, Limits, and Cost Governance in Prometheus

Cardinality is the single number that decides whether your Prometheus stack is a quiet utility or a recurring 2 a.m. incident. It is the count of unique time series — every distinct combination of a metric name and its label values — and it governs three things at once: the RAM the head block needs to hold its index, how many samples a query must touch to compute an answer, and, the moment you remote-write to a vendor, the recurring line item on your bill. One badly chosen label — a user ID, a full request URL, a Kubernetes pod name churning under an autoscaler, the first six digits of a card number — can multiply your series count by two orders of magnitude in a single afternoon, and nobody notices until the head block OOM-kills the node or the Grafana Cloud invoice quintuples.

This is the working playbook. We treat cardinality not as a mysterious force but as a product you can compute, a set of offenders you can find with three PromQL queries, and a stack of controls applied in a deliberate order: relabel at scrape time, cap with hard limits so a bad exporter fails its own scrape instead of the cluster, pre-aggregate with recording rules, filter at the remote-write boundary so you pay to keep only what matters, and attribute every series to a team so the cost has an owner. You will learn where each lever sits in the pipeline (relabel_configs vs metric_relabel_configs vs write_relabel_configs are three different stages, and confusing them is the most common mistake in this whole domain), what each action does (drop, keep, labeldrop, labelkeep, replace, hashmod, labelmap), and how to build a governance loop that catches the next leak the day it ships rather than after the invoice arrives.

By the end you will stop guessing. When head series climbs from 1.4M to 6.8M with no traffic growth, you will run promtool tsdb analyze on a snapshot, name the offending label (not just the metric) in under a minute, drop it at the cheapest possible stage, and set a per-team budget alert on the week-over-week ratio so the person who reaches for trace_id as a label gets paged before it hits storage. The diagnostic queries, relabel actions, limits, and failure modes are all laid out as scannable tables — read the prose once, then keep the tables open.

What problem this solves

Prometheus makes it trivially easy to add a label. You write histogram.WithLabelValues(method, status, handler, tenant) and ship it, and it works — in dev, where you have three tenants and forty handlers. In production, with 9,000 tenants and route templates that include raw UUIDs, that one histogram becomes several million series streaming into a paid backend, and the abstraction that made it easy to add is exactly what hides the cost. The label looks free at the call site. It is not.

What breaks without cardinality discipline is concrete and expensive. The head block — the in-memory index of every active series plus the most recent uncompacted samples — grows linearly with series count, and when it can no longer fit in RAM the process is OOM-killed during head compaction, which reads like a random crash but is a cardinality event every time. Query latency degrades because a sum by (...) over a high-cardinality histogram across 30 days has to touch enormous sample counts on every dashboard refresh, so your on-call dashboards time out precisely when you need them. And the remote-write bill — Grafana Cloud, Amazon Managed Prometheus (AMP), Chronosphere, and every other managed backend meters on active series or samples ingested — climbs in lockstep with cardinality, turning a metrics label into a finance conversation.

Who hits this: every team running Prometheus at more than toy scale, and it bites hardest on multi-tenant platforms (a tenant_id label is unbounded by design), Kubernetes shops (pod churn under autoscaling mints new series on every deploy), teams that remote-write to a vendor (where cardinality is literally the meter), and anyone whose developers add labels without a review gate. The fix is almost never “buy a bigger node” — it is “find the offending label, drop it at the cheapest stage, and govern so it doesn’t come back.” A bigger node buys you a week; governance buys you a quarter.

To frame the whole field before the deep dive, here is where cardinality bites, what it looks like, and the first place to look:

Symptom	What it actually is	First question to ask	First place to look	Most common single cause
OOM during compaction	Head index too large for RAM	Which metric/label exploded?	`/api/v1/status/tsdb`; `process_resident_memory_bytes`	Unbounded label (`user_id`, `path`, `trace_id`)
Slow dashboards / query timeouts	Query touches too many series	Which panel scans the most series?	`topk(10, count by (__name__)(...))`	High-cardinality histogram queried over 30d
Remote-write bill spike	Active series billed by vendor	Did series grow without traffic?	Vendor billing dashboard; `prometheus_remote_storage_samples_in_total`	Raw per-pod/per-tenant series forwarded
`sample_limit` scrape failures	A target exceeded its cap	Which target/job blew the cap?	`prometheus_target_scrapes_exceeded_sample_limit_total`	A runaway exporter or bad label loop
Slow churn / creeping series	Cumulative unique series over time	Is a label churning per deploy?	`prometheus_tsdb_head_series` trend	`pod`, `instance`, `pod_template_hash` under autoscaling
Duplicate-sample errors after a fix	Two series collapsed into one identical label set	Did a `labeldrop` remove identity?	Scrape error logs; `prometheus_target_scrapes_sample_duplicate_timestamp_total`	Dropped a label that was part of series identity

Learning objectives

By the end of this article you can:

Compute the cardinality of any metric as the product of its label cardinalities, and explain why series count — not sample rate — is the load that dominates memory, query time, and bill.
Find the specific offending labels (not just metrics) on any cluster using /api/v1/status/tsdb, count by (__name__), the head-series query, promtool tsdb analyze, and the count(count by (label)(...)) pattern.
Place every relabeling rule at the correct pipeline stage — relabel_configs (pre-scrape, target meta-labels), metric_relabel_configs (post-scrape, pre-ingest), and write_relabel_configs (remote-write boundary) — and explain what each stage can and cannot do.
Use every relabel action correctly: drop, keep, labeldrop, labelkeep, replace, hashmod (for sharding), labelmap, lowercase/uppercase, keepequal/dropequal.
Enforce hard guardrails with sample_limit, target_limit, label_limit, label_name_length_limit, and label_value_length_limit, and alert on the metrics that fire when they trip.
Pre-aggregate high-cardinality data with recording rules and short local retention, then forward only aggregates and an allow-list to long-term storage via write_relabel_configs — frequently a 5–10× reduction.
Attribute series to a team/owner label at scrape time, build a per-team budget as a recording rule, and alert on both hard-budget breach and week-over-week growth so leaks are caught while the fix is still a one-line rule.
Reason about cost and chargeback: what drives the bill (active series), how remote-write filtering and long-term-store downsampling (Thanos) change the meter, and how to attribute spend per team.

Prerequisites & where this fits

You should already understand the Prometheus data model: a metric has a name and a set of labels, and each unique name-plus-label-set combination is one time series identified internally by a hash of its labels. You should be able to write basic PromQL (rate(), sum by, count, topk), edit a prometheus.yml, reload the server with curl -X POST .../-/reload (which requires the --web.enable-lifecycle flag) or SIGHUP, and read JSON with jq. Familiarity with Kubernetes service discovery (the kubernetes_sd_configs meta-labels like __meta_kubernetes_namespace) helps because that is where most relabeling lives. You do not need Thanos or Mimir experience; we treat those as the long-term-storage tier that cardinality control feeds.

This sits at the center of the Observability & Cost track. It assumes the query fundamentals from the PromQL Deep Dive: rate(), Histograms & Aggregation (you will read a lot of count by and rate here) and pairs tightly with Prometheus Recording Rules & Remote Write for Long-Term Storage, which goes deep on the recording-rule and remote-write mechanics we use as levers. Downstream, Thanos: Global Query, Deduplication & Downsampling and Grafana Mimir: Multi-Tenant, Horizontally Scalable Metrics are the long-term backends whose per-series or per-tenant limits make cardinality control non-optional. The same cardinality problem in logs is covered in Loki, LogQL & Label Cardinality: Chunk Storage Tuning, and the FinOps attribution model generalizes in Multi-Cloud FinOps with Apptio Cloudability: Unit Economics.

A quick map of who owns what, so you route the fix to the right place:

Layer	What lives here	Who usually owns it	Cardinality role
Application / exporter	The metric + its labels at the call site	App / dev team	Where cardinality is created — the true fix lives here
Service discovery	Target meta-labels (`__meta_*`)	Platform / SRE	`relabel_configs` shapes target identity + attribution
Scrape config	`metric_relabel_configs`, limits	Platform / SRE	Cheapest place to drop series before ingest
TSDB (head + blocks)	The stored series index	Platform / SRE	Where the RAM and query cost land
Recording rules	Pre-aggregated series	Platform + app	Turns raw high-cardinality into cheap aggregates
Remote write	Forwarding to long-term storage	Platform / SRE	`write_relabel_configs` decides what you pay to keep
Long-term storage (Thanos/Mimir/AMP)	Blocks + per-tenant limits	Platform / vendor	Where the bill and global limits bite

Core concepts

Five mental models make every later decision obvious.

Cardinality is a product, and products explode. The total number of series for one metric is the product of the cardinalities of its labels. http_requests_total{method, status, handler, instance} with |method|=5, |status|=6, |handler|=40, |instance|=200 is 5 × 6 × 40 × 200 = 240,000 series. Multiplication, not addition, is why one new label matters so much: add user_id with 100,000 distinct values and you have not added 100,000 series, you have multiplied by 100,000, taking that metric to 24 billion potential series (bounded in practice only by which combinations actually occur, which is still catastrophic). The governing intuition: a label is a dimension you slice by, not a field you store data in. If a value is unbounded or high-cardinality, it does not belong in a label — it belongs in a trace or a log line.
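A few lines of Python make the multiplication concrete. This is a minimal sketch using the illustrative label counts from the paragraph above; the function returns an upper bound, because real series counts only include combinations that actually occur.

```python
from math import prod


def potential_series(label_cardinalities: dict) -> int:
    """Upper bound on series for one metric: the product of the number of
    distinct values each label can take (real counts only include the
    combinations that actually occur)."""
    return prod(label_cardinalities.values())


# Illustrative label counts from the http_requests_total example above.
http_requests_total = {"method": 5, "status": 6, "handler": 40, "instance": 200}

base = potential_series(http_requests_total)
with_user_id = potential_series({**http_requests_total, "user_id": 100_000})

print(f"{base:,}")          # 240,000
print(f"{with_user_id:,}")  # 24,000,000,000
```

Adding one label never adds its value count to the total; it multiplies the total by it.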

Series count is the load; sample rate is almost noise by comparison. Prometheus keeps an in-memory head block holding the index of every active series plus the most recent ~2 hours of uncompacted samples before flushing to a persistent block on disk. The cost that dominates memory is the number of distinct series the index must track, not how fast samples arrive on each. A metric scraped once a minute with two million series is far more expensive than a metric scraped every second with fifty. So scrape_interval tuning barely moves the needle on the problem that actually kills you; label discipline moves everything.

Relabeling runs at three distinct stages, and they are not interchangeable. relabel_configs runs before the scrape against the target’s meta-labels (__address__, __meta_kubernetes_*, discovery labels) — it decides what to scrape and shapes target identity. metric_relabel_configs runs after the scrape against every sample’s full label set, before ingestion into the TSDB — this is the primary cardinality lever, because whatever you drop here costs zero memory, zero query time, and zero bill. write_relabel_configs runs at the remote-write boundary against series about to be forwarded — it decides what you pay a vendor to keep, independent of what you store locally. Same engine, three positions in the pipeline; using the wrong one is the number-one mistake.

Limits are circuit breakers; relabeling is a scalpel. Relabeling is precise and surgical, but it is a config you wrote — it cannot protect you from a new exporter someone else ships with a runaway label. That is what sample_limit, target_limit, and the label limits are for: they make a bad target fail its own scrape loudly rather than silently minting series until the head OOMs. You want both — the scalpel to shape known data and the circuit breaker to contain the unknown.

Cost lives at the remote-write boundary and in the head, not in “Prometheus” generically. Local storage on a self-hosted server is cheap disk and RAM you already own; the expensive meters are (a) the head-block RAM, which caps how many series a single node can hold, and (b) the remote-write active-series count, which is what a managed vendor bills. This split is a feature: you can store raw, high-cardinality data locally on a 24-hour retention for debugging while forwarding only a downsampled, filtered subset to the paid long-term store. Cardinality control is largely the art of exploiting that split.

The vocabulary in one table

Pin down every moving part before the deep sections; the glossary at the end repeats these for lookup.

Concept	One-line definition	Where it lives	Why it matters to cardinality
Time series	One unique metric-name + label-set combination	TSDB index	The unit of cardinality; the thing you count
Active series	A series that received a sample recently (in the head)	Head block	The number that drives RAM and the vendor bill
Head block	In-memory index + last ~2h of samples	RAM	OOMs when series count exceeds capacity
Cardinality	Count of distinct series (often per metric or per label)	Derived	The load; the meter; the incident
`__name__`	The internal label holding the metric name	Every series	`count by (__name__)` finds headline offenders
`relabel_configs`	Pre-scrape rules on target meta-labels	`scrape_config`	Decides what to scrape; sets attribution labels
`metric_relabel_configs`	Post-scrape, pre-ingest rules on sample labels	`scrape_config`	Cheapest drop point; the primary lever
`write_relabel_configs`	Rules at the remote-write boundary	`remote_write`	Decides what you pay to keep long-term
`labeldrop` / `labelkeep`	Remove / retain labels by name regex	Relabel action	Collapses series that differ only in that label
`hashmod`	Hash a label into N buckets for sharding	Relabel action	Splits load across Prometheus shards
Recording rule	A pre-computed, stored aggregate	Rule group	Turns raw high-cardinality into a cheap series
`sample_limit`	Max samples one scrape may yield	`scrape_config`	Whole scrape fails past the cap (circuit breaker)
`target_limit`	Max targets a job may scrape	`scrape_config`	Contains SD churn/explosions
Chargeback	Attributing metric cost to a team/owner	Governance	Makes cardinality someone’s budget

How cardinality explodes: the arithmetic and the usual suspects

The failure is always the same shape — a label whose value set is unbounded or large multiplies a metric’s series count — but the specific labels that cause it are a short, well-known list. Learn to recognize them at the call site and you prevent 90% of incidents before they ship.

Start with the arithmetic, because it is the whole story. For a metric with labels L1..Ln, potential series is |L1| × |L2| × ... × |Ln| (bounded in practice by which combinations actually occur). Two consequences follow. First, the highest-cardinality single label dominates: a metric with a 40-value handler label and a 100,000-value user label gives you 40 useful ways to slice it, yet the user label multiplies its cost by up to 100,000×. Second, histograms multiply the damage, because a classic histogram is not one series per label combination but (number_of_buckets + 2) series: one per le bucket (including the implicit +Inf bucket), plus _sum and _count. A histogram with 12 buckets (counting +Inf) and a 9,000-value label is 14 × 9,000 = 126,000 series for that one histogram, before you multiply by method and status.
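The histogram formula is easy to get wrong by forgetting _sum, _count, or the +Inf bucket, so here it is as a function. A sketch, assuming "12 buckets" already counts +Inf and reusing the 5-value method and 6-value status labels from the earlier http_requests_total example for the second line.

```python
from math import prod


def histogram_series(buckets_incl_inf: int, other_label_cardinalities: list) -> int:
    """Series for one classic histogram: one per `le` bucket (counting +Inf),
    plus _sum and _count, times the cardinality of every other label."""
    return (buckets_incl_inf + 2) * prod(other_label_cardinalities)


print(f"{histogram_series(12, [9_000]):,}")        # 126,000
print(f"{histogram_series(12, [9_000, 5, 6]):,}")  # 3,780,000 once method and status are added
```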

Here is the canonical list of high-cardinality offenders — the labels that cause almost every real incident, why each explodes, and the correct fix:

Offending label	Why it explodes	Correct fix	Where to fix it
`user_id`, `customer_id`, `tenant_id`	Unbounded; grows with the business	`labeldrop`, or aggregate away via recording rule	`metric_relabel_configs` / rule
`path`, `url`, `endpoint` (raw)	Unique per ID / query string	Normalize to a route template (`:id`) via `replace`	App instrumentation or `metric_relabel_configs`
`pod`, `instance` under autoscaling	Churns on every deploy / scale event	`labeldrop` if not sliced by; `target_limit`	`metric_relabel_configs`
`pod_template_hash`, `controller_revision_hash`	New value on every rollout	`labeldrop` — pure churn, never queried	`metric_relabel_configs`
`trace_id`, `request_id`, `span_id`	One value per request — a cardinality bomb	Never a label; belongs in traces/logs	Remove at the source
`email`, `session_id`, IP addresses	Effectively unbounded; also PII/GDPR risk	Drop entirely	`metric_relabel_configs` (`drop`/`labeldrop`)
`status_message`, free-text errors	Arbitrary strings	Use a bounded `status_code` instead	App instrumentation
`card_bin`, account numbers	Thousands of values; PII-adjacent	Drop, or aggregate before long-term storage	`write_relabel_configs` + rule
`version`, `image_tag`, `git_sha`	New value every deploy; accumulates	Keep only if you query it; else `labeldrop`	`metric_relabel_configs`
`le` on an over-bucketed histogram	Buckets × every other label	Reduce buckets; pre-aggregate `by (le)`	Instrumentation + recording rule

The governing rule, written so a developer can self-check before adding a label:

If you cannot name a finite, reasonably small set of values a label will ever take, it is not a label. Put that value in a trace or a log line, and slice metrics by something bounded (a route template, a status code, a service name).
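That rule can be turned into a mechanical pre-merge check. The sketch below is hypothetical: the suspect names come from the offender table above, and the 100-value threshold is an arbitrary assumption you would tune for your own organization.

```python
from typing import List, Optional

# Hypothetical review helper. Names come from the offender table above;
# MAX_VALUES is an arbitrary threshold chosen for illustration.
SUSPECT_LABELS = {
    "user_id", "customer_id", "tenant_id", "path", "url", "endpoint",
    "trace_id", "request_id", "span_id", "email", "session_id",
    "status_message", "pod_template_hash", "controller_revision_hash",
}
MAX_VALUES = 100


def review_label(name: str, expected_values: Optional[int]) -> List[str]:
    """Return reasons a proposed label fails the self-check (empty = OK).
    expected_values=None means the author cannot name a finite value set."""
    problems = []
    if name in SUSPECT_LABELS:
        problems.append(f"{name}: listed as a known high-cardinality label")
    if expected_values is None:
        problems.append(f"{name}: no finite value set, use a trace or log line")
    elif expected_values > MAX_VALUES:
        problems.append(f"{name}: {expected_values} values exceeds {MAX_VALUES}")
    return problems


print(review_label("status_code", 6))  # []
print(review_label("trace_id", None))  # two problems
```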

A subtle offender worth calling out: exporter defaults you never look at. kube-state-metrics, node_exporter, and cAdvisor emit hundreds of metrics, many with per-container or per-mountpoint labels you never dashboard on. cAdvisor’s container_* metrics with the full pod, container, image, and name label set are a classic silent multiplier under autoscaling. You do not have a bug — the exporter is doing exactly what it is designed to — but you are ingesting (and paying for) series nobody queries. The fix is a keep allow-list, covered below.

Here is how the same metric grows as you add labels, to make the multiplication visceral:

Metric definition	Labels	Cardinality (illustrative)	Verdict
`http_requests_total{method, status}`	5 × 6	30	Healthy
`+ handler`	5 × 6 × 40	1,200	Healthy
`+ instance` (200 pods)	5 × 6 × 40 × 200	240,000	Watch it; churns on scale
`+ user_id` (100k)	× 100,000	up to 24,000,000,000	Catastrophic — drop the label
`http_request_duration_seconds` histogram, 12 buckets `+ handler`	14 × 40	560	Healthy
`+ tenant` (9,000)	14 × 40 × 9,000	5,040,000	Catastrophic — aggregate or drop `tenant`

Diagnose the offenders before you touch a config

Never relabel blind. The instinct under pressure is to start deleting metrics; the discipline is to find what is actually expensive first, because the offender is usually a single label on a single metric, and dropping the wrong thing loses signal you need. There are four sources of truth, from fastest to most thorough.

The TSDB status page and API. The built-in TSDB status surfaces the head’s cardinality breakdown directly. In the UI it is under Status → TSDB Status (http://<prometheus>:9090/tsdb-status); the same data is exposed via the API:

# The four lists that tell you exactly where the series live
curl -s http://localhost:9090/api/v1/status/tsdb | jq '.data | {
  seriesCountByMetricName,
  seriesCountByLabelValuePair,
  labelValueCountByLabelName,
  memoryInBytesByLabelName
}'

Those four lists are the whole diagnosis:

TSDB status list	What it answers	What to look for
`seriesCountByMetricName`	Which metrics have the most series	A single metric name dwarfing the rest
`seriesCountByLabelValuePair`	Which exact `label="value"` pairs cost most	A specific hot value (e.g. one job)
`labelValueCountByLabelName`	Which label names have the most unique values	The unbounded-label smoking gun
`memoryInBytesByLabelName`	RAM attributed to each label name	Where the head memory actually goes

labelValueCountByLabelName is the one that catches unbounded labels — if user_id shows 80,000 distinct values, that is your fuse, and no metric-name list would have told you as directly.
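If you want that check in a script rather than a jq one-liner, the sketch below assumes the documented response shape, where each list under `data` holds objects with `name` and `value` fields; verify against your Prometheus version. The 1,000-value threshold is arbitrary, and the `sample` dict is hand-written for illustration, not real output.

```python
import json
import urllib.request
from typing import List, Tuple


def fetch_tsdb_status(base_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/status/tsdb and return its `data` object."""
    with urllib.request.urlopen(f"{base_url}/api/v1/status/tsdb") as resp:
        return json.load(resp)["data"]


def unbounded_label_suspects(data: dict, threshold: int = 1_000) -> List[Tuple[str, int]]:
    """Label names whose distinct-value count exceeds `threshold`, largest first."""
    rows = data.get("labelValueCountByLabelName", [])
    hits = [(r["name"], int(r["value"])) for r in rows if int(r["value"]) > threshold]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


# Hand-written stand-in for fetch_tsdb_status() output.
sample = {"labelValueCountByLabelName": [
    {"name": "user_id", "value": 80000},
    {"name": "handler", "value": 40},
    {"name": "pod", "value": 1200},
]}
print(unbounded_label_suspects(sample))  # [('user_id', 80000), ('pod', 1200)]
```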

Interactive PromQL. For ad-hoc hunting on an unfamiliar cluster, these are the queries to run first, in order:

# 1) Headline number: total active series in the head
prometheus_tsdb_head_series

# 2) Top 10 metric names by series count — the headline offenders
topk(10, count by (__name__)({__name__=~".+"}))

# 3) Which job is generating the most series? Find the bad exporter.
topk(10, count by (job)({__name__=~".+"}))

# 4) Which namespace/team (if you attribute)? 
topk(10, count by (namespace)({__name__=~".+"}))

To find which label on a specific metric is doing the damage — the step most people skip — count distinct values per label with the nested-count pattern:

# How many distinct values does each label of this metric carry?
# Inner: series per label value. Outer: count of those = distinct values.
count(count by (handler)(http_request_duration_seconds_bucket))
count(count by (user_id)(http_request_duration_seconds_bucket))
count(count by (le)(http_request_duration_seconds_bucket))

If user_id returns 80,000 and handler returns 40, you have found and confirmed the fuse without touching a config file.
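To see what the nested count actually computes, here is the same idea over an in-memory list of label sets. A sketch on made-up series: the inner grouping collects each label's values, the outer count counts them. One difference from PromQL is noted in the docstring.

```python
from collections import defaultdict
from typing import Dict, List


def distinct_values_per_label(series: List[Dict[str, str]]) -> Dict[str, int]:
    """What count(count by (L)(metric)) returns, for every label L at once:
    group series by L's value, then count the groups. Unlike PromQL, series
    missing L are ignored here instead of forming one empty-valued group."""
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(vals) for name, vals in values.items()}


# Hypothetical bucket series: 3 handlers x 4 users x 2 le values.
series = [
    {"handler": h, "user_id": u, "le": le}
    for h in ("/cart", "/search", "/orders")
    for u in ("u1", "u2", "u3", "u4")
    for le in ("0.1", "+Inf")
]
print(distinct_values_per_label(series))  # {'handler': 3, 'user_id': 4, 'le': 2}
```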

promtool tsdb analyze (offline, on a block). This reads a persisted block on disk (the most recent one unless you pass a block ID) and prints the same kind of breakdown without querying the running server, which makes it ideal in CI against a snapshot, or on a copy of prod data so you do not add query load during an incident:

# Analyze the highest-cardinality labels and label pairs in a block
promtool tsdb analyze /prometheus/data --limit=20

It prints the highest-cardinality label names (the unbounded-label detector), the most common label pairs, the metric names with the most series, and the label names and label pairs most involved in churn. The churn view matters, and the status API does not give it to you: it distinguishes an instantaneously large metric from one that turns over series constantly (the autoscaling-pod-churn signature).

Remote-write and per-vendor views. If you remote-write, the vendor’s own cardinality explorer (Grafana Cloud’s Cardinality Management, AMP’s usage metrics) shows what you are paying for, which can differ from your local head if you already filter at the write boundary. Corroborate the local diagnosis against the bill.

Here are the diagnostic tools side by side, so you pick the right one for the situation:

Tool	Runs against	Best for	Load on prod	Gotcha
`/api/v1/status/tsdb`	Live head	Fast triage, the four canonical lists	Low	Only the head (active), not full retention
`count by (__name__)` PromQL	Live TSDB	Interactive metric ranking	Moderate (heavy query)	Wildcard `{__name__=~".+"}` is expensive
`count(count by (label)(...))`	Live TSDB	Confirming the offending label	Moderate	Must name the metric; run per suspect
`promtool tsdb analyze`	A block on disk	Offline / CI, churn detection	None (offline)	Analyzes one block; snapshot first
`prometheus_tsdb_head_series`	Live gauge	Trend / alerting baseline	Trivial	A number, not a breakdown
Vendor cardinality explorer	Remote-write data	What you actually pay for	None (vendor-side)	Vendor-specific; lags local

Beyond ad-hoc queries, Prometheus exposes internal metrics about its own TSDB that you should graph and alert on continuously — these are the leading indicators of a cardinality problem building:

Internal metric	What it tells you	Use it to
`prometheus_tsdb_head_series`	Current active series in the head	Baseline + trend; alert on growth
`prometheus_tsdb_head_chunks`	Chunks in the head	Corroborate memory pressure
`prometheus_tsdb_head_series_created_total`	Cumulative series ever created	Rate of this = churn rate
`process_resident_memory_bytes`	Process RAM	Correlate with series count for sizing
`prometheus_tsdb_compaction_duration_seconds`	Head compaction time	Long/failing compaction = OOM risk
`prometheus_remote_storage_samples_in_total`	Samples entering remote-write	What you’re forwarding (bill proxy)
`prometheus_target_scrapes_exceeded_sample_limit_total`	Scrapes killed by `sample_limit`	Explain a limit breach (`up` drops to 0; this says why)
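Two of these gauges and counters turn directly into the trend numbers you alert on. In PromQL they are rate(prometheus_tsdb_head_series_created_total[1h]) and prometheus_tsdb_head_series / (prometheus_tsdb_head_series offset 1w); the sketch below shows the same arithmetic on hypothetical readings so the units are clear.

```python
def churn_per_second(created_start: float, created_end: float, seconds: float) -> float:
    """Per-second increase of prometheus_tsdb_head_series_created_total between
    two readings, assuming the counter did not reset in between."""
    return (created_end - created_start) / seconds


def week_over_week(head_now: float, head_week_ago: float) -> float:
    """Ratio of prometheus_tsdb_head_series now to its value seven days ago."""
    return head_now / head_week_ago


# Hypothetical counter readings taken one hour apart.
print(churn_per_second(1_000_000, 1_360_000, 3_600))   # 100.0
# The 1.4M -> 6.8M jump from the introduction.
print(round(week_over_week(6_800_000, 1_400_000), 2))  # 4.86
```

A ratio near 1.0 with steady traffic is healthy; a ratio of nearly 5 with no traffic growth is the leak signature.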

One practical note on the expensive wildcard query. count by (__name__)({__name__=~".+"}) scans every series and can itself add meaningful load on a large TSDB. During an active incident, prefer /api/v1/status/tsdb (pre-computed and cheap) or promtool tsdb analyze on a snapshot; save the interactive wildcard queries for calmer investigation.

Relabeling: the three stages and every action

Relabeling is the core lever, and it uses one engine at three positions in the pipeline. Get the stage right first, then the action.

The three stages, and why the difference is everything

Stage	Runs	Operates on	Can it reduce cardinality?	Typical use
`relabel_configs`	Before the scrape	Target meta-labels (`__address__`, `__meta_*`)	Indirectly (choose what to scrape; set attribution labels)	Filter targets; set `team`, `job`, `instance`; drop targets
`metric_relabel_configs`	After scrape, before TSDB ingest	Every sample’s label set	Yes — the primary lever. Drops series before storage	Drop noisy metrics; strip unbounded labels; normalize paths
`write_relabel_configs`	At the remote-write boundary	Series about to be forwarded	Yes, for the bill — filters what leaves for long-term storage	Forward only aggregates + allow-list; drop raw per-pod series

The key insight: metric_relabel_configs acts on data already at the server but not yet stored, so whatever you drop there costs zero local memory, zero query time, and zero remote-write bill. relabel_configs cannot drop individual metrics or sample labels, because the samples do not exist yet; only target meta-labels do. A rule on [__name__] there sees an empty string, so a typical drop silently matches nothing, and a keep silently drops every target. And write_relabel_configs only affects what you forward, not what you store locally, which makes it perfect for keeping raw data on short retention while paying to keep only aggregates.
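The wrong-stage failure is easy to reproduce. This sketch models a `keep` rule as the docs describe it: source label values joined with the separator, missing labels read as empty, and the regex matched against the whole string. Python's `re` stands in for RE2, which behaves the same for simple patterns like these; the target and sample label sets are made up.

```python
import re


def relabel_keep(labels: dict, source_labels: list, regex: str,
                 separator: str = ";") -> bool:
    """True if the target/series survives a `keep` rule. Missing labels read
    as "", and the regex must match the whole joined string (it is anchored)."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


ALLOW = "node_cpu_seconds_total|node_load1"

# At relabel_configs time a target has meta-labels but no __name__ yet.
target = {"__address__": "10.0.0.7:9100", "__meta_kubernetes_namespace": "payments-prod"}
print(relabel_keep(target, ["__name__"], ALLOW))  # False: the whole target is dropped

# The same rule in metric_relabel_configs sees the real metric name.
sample = {"__name__": "node_load1", "instance": "10.0.0.7:9100"}
print(relabel_keep(sample, ["__name__"], ALLOW))  # True
```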

The relabel rule anatomy

Every relabel rule is a small state machine with these fields:

Field	What it does	Default	Notes
`source_labels`	Labels concatenated (with `separator`) into the match input	(none)	Order matters; `[__name__]` is the common one
`separator`	Joins multiple `source_labels`	`;`	Change when values may contain `;`
`regex`	RE2 regex matched against the concatenated source	`(.*)`	Anchored (full-string match) implicitly
`action`	What to do on match	`replace`	See action table below
`target_label`	Label to write (`replace`, `hashmod`, `lowercase`/`uppercase`) or compare against (`keepequal`/`dropequal`)	(none)	The output label; `labelmap` ignores it
`replacement`	Value written to `target_label`	`$1`	Supports `$1`, `${1}` capture-group refs
`modulus`	Number of buckets (for `hashmod`)	(none)	Used with `hashmod` for sharding
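To make the field defaults concrete, here is a simplified model of the `replace` action. It is a sketch, not Prometheus code: it implements the join-with-separator, anchored-regex, and `$1`/`${1}` expansion rules from the table, plus the behavior that an empty result deletes the label, but it handles numeric capture groups only (Go's regexp also supports named groups, `$$`, and reads `$1x` as a group named `1x`).

```python
import re


def _group(m: "re.Match", index: int) -> str:
    # Out-of-range or unmatched groups expand to "", as in Go.
    if index > m.re.groups:
        return ""
    return m.group(index) or ""


def expand(template: str, m: "re.Match") -> str:
    """Expand ${1} and $1 references (numeric groups only)."""
    return re.sub(
        r"\$\{(\d+)\}|\$(\d+)",
        lambda ref: _group(m, int(ref.group(1) or ref.group(2))),
        template,
    )


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Sketch of the `replace` action using the defaults from the table above."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    m = re.fullmatch(regex, value)          # regex is implicitly anchored
    if m is None:
        return dict(labels)                 # no match: labels unchanged
    out = dict(labels)
    new_value = expand(replacement, m)
    if new_value:
        out[target_label] = new_value
    else:
        out.pop(target_label, None)         # empty result deletes the label
    return out


rule = dict(source_labels=["path"], target_label="path",
            regex="(/api/v1/orders/)[0-9a-f-]+", replacement="${1}:id")
print(relabel_replace({"path": "/api/v1/orders/8a3f-11"}, **rule))
# {'path': '/api/v1/orders/:id'}
print(relabel_replace({"path": "/api/v1/orders/8a3f/items"}, **rule))
# unchanged: the anchored regex does not match the trailing /items
```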

Every action, and when to use it

Action	What it does	Uses `target_label`?	Typical cardinality use
`keep`	Keep series/targets where `regex` matches `source_labels`; drop the rest	No	Allow-list a known set of metrics from a chatty exporter
`drop`	Drop series/targets where `regex` matches	No	Remove noisy metrics you never query
`replace`	Write `replacement` (with capture groups) into `target_label`	Yes	Normalize a raw `path` to a route template
`labeldrop`	Remove labels whose name matches `regex`	No (matches label names)	Strip an unbounded label (`user_id`) from everything
`labelkeep`	Keep only labels whose name matches; drop all others	No (matches label names)	Reduce to a known-good label set
`hashmod`	Set `target_label` to `hash(source_labels) % modulus`	Yes (+`modulus`)	Shard series across N Prometheus instances
`labelmap`	Copy labels whose names match `regex` to new names built from `replacement`	No (`replacement` is the name pattern)	Turn `__meta_kubernetes_label_*` into real labels
`lowercase` / `uppercase`	Normalize the case of `source_labels` into `target_label`	Yes	Collapse `GET`/`get` case variants
`keepequal`	Keep series where `source_labels` equals `target_label`	Yes	Advanced filtering on label equality
`dropequal`	Drop series where `source_labels` equals `target_label`	Yes	Advanced filtering on label equality
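The split that trips people up is that `keep`/`drop` match label values, while `labeldrop`/`labelkeep` match label names. A minimal sketch of the two name-based actions on one made-up series:

```python
import re


def labeldrop(labels: dict, regex: str) -> dict:
    """Remove every label whose *name* fully matches `regex`."""
    return {k: v for k, v in labels.items() if not re.fullmatch(regex, k)}


def labelkeep(labels: dict, regex: str) -> dict:
    """Keep only labels whose *name* fully matches `regex`. This includes
    __name__, so leave it out of the regex and the metric name is gone too."""
    return {k: v for k, v in labels.items() if re.fullmatch(regex, k)}


series = {
    "__name__": "http_requests_total", "job": "api", "instance": "10.0.0.7:8080",
    "method": "GET", "user_id": "42", "pod_template_hash": "7d9f8c",
}
print(labeldrop(series, "user_id|pod_template_hash"))
print(labelkeep(series, "__name__|job|instance|method|status|service|team"))
print(labelkeep(series, "job|instance|method"))  # __name__ is stripped as well
```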

Now the worked examples, in the order you would reach for them.

Drop a whole noisy metric you never query (cheapest possible win):

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["node-exporter:9100"]
    metric_relabel_configs:
      # Drop the Go runtime GC-pause summary (and its _sum/_count) nobody dashboards on
      - source_labels: [__name__]
        regex: "go_gc_duration_seconds.*"
        action: drop

Strip a single high-cardinality label while keeping the metric — labeldrop removes the label by name, collapsing every series that differed only in that dimension:

    metric_relabel_configs:
      # Remove the unbounded user_id label from everything in this job.
      # Safe only if no two series in one scrape differ solely by user_id;
      # otherwise they collide (see the warning below) rather than sum.
      - regex: "user_id"
        action: labeldrop

A warning that bites people: labeldrop does not aggregate. It collapses series by removing a label, and if two surviving series in the same scrape become identical (same name, same remaining labels), their values are not summed: Prometheus keeps one sample and rejects the other as a duplicate for that timestamp. Only drop a label that is genuinely extra detail, not part of the identity you need. If you get Error on ingesting samples with different value but same timestamp, you dropped a label that was distinguishing two real series; a per-user counter like that needs a recording rule or a fix at the source instead.
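You can check for that collision before shipping a labeldrop by removing the label from one scrape's worth of series and looking for repeated label sets. A sketch on a made-up scrape; any count above 1 means samples will be rejected rather than added together.

```python
from collections import Counter


def collisions_after_labeldrop(scrape: list, label: str) -> dict:
    """Label sets that become identical once `label` is removed from one
    scrape's series. Any count above 1 is a duplicate: only one sample per
    label set survives, and the values are never summed."""
    remaining = Counter(
        tuple(sorted((k, v) for k, v in s.items() if k != label)) for s in scrape
    )
    return {labels: n for labels, n in remaining.items() if n > 1}


scrape = [
    {"__name__": "http_requests_total", "handler": "/cart", "user_id": "u1"},
    {"__name__": "http_requests_total", "handler": "/cart", "user_id": "u2"},
    {"__name__": "http_requests_total", "handler": "/search", "user_id": "u1"},
]
print(collisions_after_labeldrop(scrape, "user_id"))
# {(('__name__', 'http_requests_total'), ('handler', '/cart')): 2}
print(collisions_after_labeldrop(scrape, "nonexistent"))  # {}
```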

Keep only an allow-list of metrics from a chatty exporter — invert with keep so you ingest a known set and discard everything else (the fix for kube-state-metrics / cAdvisor bloat):

    metric_relabel_configs:
      # Keep only the four metrics we actually use from kube-state-metrics
      - source_labels: [__name__]
        regex: "kube_pod_status_phase|kube_deployment_status_replicas|kube_node_status_condition|kube_pod_container_resource_requests"
        action: keep

Normalize a high-cardinality path into a bounded route template — rewrite /api/v1/orders/8a3f... to /api/v1/orders/:id so the label becomes bounded instead of unbounded:

    metric_relabel_configs:
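      # Collapse a hex/UUID id segment into a placeholder. The regex is anchored,
      # so capture an optional /suffix to keep paths like .../8a3f/items matching.
      # Same caveat as labeldrop: paths that normalize alike collide, not sum.
      - source_labels: [path]
        regex: "(/api/v1/orders/)[0-9a-f-]+(/.*)?"
        target_label: path
        replacement: "${1}:id${2}"
        action: replace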

Reduce to a known-good label set with labelkeep. Sometimes an exporter attaches many meta-labels and you want only an allow-list. Keep __name__ in the regex: labelkeep matches every label name, the metric name included:

    metric_relabel_configs:
      # Keep ONLY these labels on the metric; drop every other label name
      - regex: "__name__|job|instance|method|status|service|team"
        action: labelkeep

Attach an attribution label at scrape time with relabel_configs (note the different stage — this shapes target identity, not sample labels):

    relabel_configs:
      # Map a Kubernetes namespace to a team label for chargeback
      - source_labels: [__meta_kubernetes_namespace]
        target_label: team
        regex: "(payments|checkout|search)-.*"
        replacement: "${1}"
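Before you build chargeback on that label, it is worth checking which namespaces the anchored regex actually catches. A sketch that mimics the rule above on a few hypothetical namespace names; anything that does not match (including a bare `search` with no suffix) ends up with no team label at all.

```python
import re
from typing import Optional

# The namespace-to-team regex from the relabel_configs rule above.
TEAM_REGEX = "(payments|checkout|search)-.*"


def team_for_namespace(namespace: str) -> Optional[str]:
    """Mimic the rule: anchored match on the namespace, ${1} becomes `team`.
    No match means the rule does nothing, so the target gets no team label."""
    m = re.fullmatch(TEAM_REGEX, namespace)
    return m.group(1) if m else None


for ns in ("payments-prod", "checkout-staging", "search", "billing-prod"):
    print(f"{ns} -> {team_for_namespace(ns)}")
# payments-prod -> payments
# checkout-staging -> checkout
# search -> None
# billing-prod -> None
```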

Which stage each of the common tasks belongs in — the cheat sheet that prevents the number-one mistake:

Task	Correct stage	Why not the others
Decide which targets to scrape	`relabel_configs` (`keep`/`drop` on `__meta_*`)	Samples don’t exist pre-scrape; only meta-labels do
Set `team`/`instance`/`job` identity	`relabel_configs`	Target identity is fixed before scraping
Drop a whole metric by `__name__`	`metric_relabel_configs`	`relabel_configs` has no `__name__` yet
Strip an unbounded label everywhere	`metric_relabel_configs` (`labeldrop`)	Must act on sample labels
Normalize a `path` value	`metric_relabel_configs` (`replace`)	Same — sample-level
Forward only aggregates to a vendor	`write_relabel_configs`	Local storage should keep raw; only forwarding is filtered
Shard series across servers	`relabel_configs` (`hashmod`)	Sharding decides which server scrapes which target

`hashmod` sharding — when one Prometheus is not enough

There is a ceiling to how many series a single Prometheus head can hold (a function of RAM; a well-provisioned node handles a few million active series, but tens of millions need sharding). hashmod is how you split the load: hash one or more stable target labels into modulus buckets and have each of N Prometheus instances keep only its bucket. Every target lands on exactly one shard, deterministically.

# Config for shard 0 of 4. Replicas 1-3 are identical except for the keep
# regex ("1", "2", "3"); template it per replica with your deployment tooling,
# because Prometheus does not expand environment variables in relabel rules.
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Hash a stable target identity into 4 buckets
      - source_labels: [__address__]
        modulus: 4
        target_label: __tmp_shard
        action: hashmod
      # Keep only the targets whose bucket matches THIS replica's shard
      - source_labels: [__tmp_shard]
        regex: "0"           # shard 0; replicas 1-3 use "1", "2", "3"
        action: keep
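The bucket assignment can be sketched in Python. This follows my reading of Prometheus's relabel code. The `source_labels` values are joined with `;` and hashed with MD5. The last 8 bytes of the digest are read as a big-endian unsigned integer and reduced modulo `modulus`. Treat it as an approximation to check against the source for your Prometheus version. `shard_for`, `keep_on_replica`, and the addresses are hypothetical:

```python
import hashlib
from collections import Counter

def shard_for(source_values: list[str], modulus: int, separator: str = ";") -> int:
    """Approximate `action: hashmod` for one target (see assumptions above)."""
    joined = separator.join(source_values)
    digest = hashlib.md5(joined.encode("utf-8")).digest()
    # Read the last 8 bytes of the 16-byte MD5 digest as a big-endian uint64.
    return int.from_bytes(digest[8:], "big") % modulus

def keep_on_replica(address: str, replica: int, modulus: int = 4) -> bool:
    """Mirror the `keep` rule: replica N keeps targets whose __tmp_shard is "N"."""
    return shard_for([address], modulus) == replica

# 1,000 made-up pod addresses, as Kubernetes SD would expose in __address__.
addresses = [f"10.0.{i // 250}.{i % 250}:8080" for i in range(1000)]
per_shard = Counter(shard_for([a], 4) for a in addresses)
print(sorted(per_shard.items()))  # roughly 250 targets in each of shards 0-3
```

Because the result depends only on the hashed value, every replica computes the same assignment independently, which is what guarantees each target is kept by exactly one shard.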

The trade-offs of sharding, because it is not free:

Aspect	Single Prometheus	`hashmod`-sharded (N replicas)
Max active series	Bounded by one node’s RAM	~N × single-node capacity
Global queries	Native (one TSDB)	Need Thanos/Mimir to fan-in and dedupe
Operational complexity	Low	Higher (N configs, a query layer)
Rebalancing on scale change	N/A	Changing `modulus` moves most targets to a different shard
Per-target placement	N/A	Deterministic (`hash(label) % modulus`)
When to use it	Up to a few million series	Tens of millions; per-shard blast radius
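How disruptive is a resize? This sketch reuses the same hedged approximation of `hashmod` and measures the share of (made-up) targets whose shard changes when `modulus` goes from 4 to 5. With a uniform hash, a target keeps its bucket only when its hash modulo 20 is 0-3, so roughly 80% should move:

```python
import hashlib

def shard_for(value: str, modulus: int) -> int:
    """Same hedged hashmod approximation: last 8 MD5 bytes, big-endian, mod N."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % modulus

def fraction_moved(targets: list[str], old_modulus: int, new_modulus: int) -> float:
    """Share of targets whose shard changes when `modulus` is changed."""
    moved = sum(shard_for(t, old_modulus) != shard_for(t, new_modulus) for t in targets)
    return moved / len(targets)

targets = [f"10.1.{i // 250}.{i % 250}:9100" for i in range(5000)]
print(f"{fraction_moved(targets, 4, 5):.0%} of targets change shard going from 4 to 5")
```

A moved target starts a fresh series history on its new shard, so plan resizes as a migration that the query layer (Thanos/Mimir) has to stitch together.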

Sharding is a scaling answer, not a cardinality reduction answer — it lets you hold more series, it does not make the series cheaper. Always exhaust relabeling and aggregation first; shard only when the legitimate, already-minimized series count genuinely exceeds one node.

Hard limits: circuit breakers so one bad target can’t win

Relabeling is a config you wrote; it cannot anticipate the exporter someone else ships tomorrow with a runaway label. Limits are the circuit breakers that contain the unknown. A new exporter pushed by a team that did not read this article should fail its own scrape — a loud, visible signal — not silently mint series until the head OOMs.

The per-scrape limits

sample_limit caps how many samples a single scrape may yield, counted after metric_relabel_configs has run. Exceed it and the entire scrape is discarded and marked failed, which is deliberately loud:

scrape_configs:
  - job_name: "app"
    sample_limit: 5000              # whole scrape fails past 5k samples (post-relabel) per target
    target_limit: 2000              # all targets marked failed if >2000 remain after relabeling
    label_limit: 30                 # scrape fails if any sample has >30 labels
    label_name_length_limit: 200    # scrape fails if any label name is >200 chars
    label_value_length_limit: 1000  # scrape fails if any label value is >1000 chars
    static_configs:
      - targets: ["app:8080"]

Set default ceilings in global so every job inherits protection even if a new scrape config forgets them:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  # Applied to every job unless overridden
  sample_limit: 10000
  label_limit: 30
  label_name_length_limit: 200
  label_value_length_limit: 2000

Every limit, what it protects against, and its behavior on breach:

Limit	Caps	On breach	Signature it catches	Sensible starting value
`sample_limit`	Samples per scrape, after metric relabeling	Whole scrape fails: `up` = 0, nothing ingested, the target's existing series go stale	A target that suddenly emits thousands of series	10,000 global; tighten per job
`target_limit`	Targets per job, after target relabeling	All of the job's targets are marked failed and not scraped	A label/SD misconfig returning thousands of targets	2,000–5,000 per job
`label_limit`	Labels per sample	Whole scrape fails	A runaway label-generation bug	30
`label_name_length_limit`	Chars in a label name	Whole scrape fails	Accidentally using a value as a name	200
`label_value_length_limit`	Chars in a label value	Whole scrape fails	Free-text/stack-trace stuffed into a label	1,000–2,000
`keep_dropped_targets`	Dropped targets kept in memory for the UI/API	Extra dropped targets are not retained; scraping is unaffected	Huge SD with mostly-dropped targets	Set if `relabel` drops many targets

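An important nuance on sample_limit: a rejected scrape is reported like any other failed scrape. The target's up series drops to 0, its previously scraped series get stale markers, and the Targets page shows “sample limit exceeded”. A generic target-down alert will fire, but it reads exactly like a crash or network failure, so alert on the dedicated counters too. They are Prometheus-wide (no per-target labels), so use the Targets page to find the offender, and they exist in the TSDB only if Prometheus (or a meta-monitoring instance) scrapes Prometheus’s own /metrics:

# Scrapes rejected by sample_limit; alert on any increase
increase(prometheus_target_scrapes_exceeded_sample_limit_total[5m]) > 0

# Companion counters for the label limits and target_limit
increase(prometheus_target_scrape_pool_exceeded_label_limits_total[5m]) > 0
increase(prometheus_target_scrape_pool_exceeded_target_limit_total[5m]) > 0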
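The all-or-nothing behavior is worth internalizing: a limit never trims a scrape down to size. The following is a simplified Python model of the documented semantics. Counts are taken after metric relabeling, and any breach fails the whole scrape, which Prometheus reports as `up` = 0. It is not Prometheus code, and `ScrapeLimits` and `evaluate_scrape` are hypothetical names:

```python
from dataclasses import dataclass

@dataclass
class ScrapeLimits:
    sample_limit: int = 0              # 0 means "no limit", as in Prometheus
    label_limit: int = 0
    label_name_length_limit: int = 0
    label_value_length_limit: int = 0

def evaluate_scrape(samples: list[dict[str, str]], limits: ScrapeLimits) -> tuple[int, list[dict[str, str]], str]:
    """Return (up, ingested, error) for one scrape.

    `samples` are label sets AFTER metric_relabel_configs, which is where the
    limits are counted. Any breach rejects the entire scrape.
    """
    if limits.sample_limit and len(samples) > limits.sample_limit:
        return 0, [], "sample limit exceeded"
    for labels in samples:
        if limits.label_limit and len(labels) > limits.label_limit:
            return 0, [], "label_limit exceeded"
        if limits.label_name_length_limit and any(
            len(name) > limits.label_name_length_limit for name in labels
        ):
            return 0, [], "label_name_length_limit exceeded"
        if limits.label_value_length_limit and any(
            len(value) > limits.label_value_length_limit for value in labels.values()
        ):
            return 0, [], "label_value_length_limit exceeded"
    return 1, samples, ""

# A 2,000-series scrape against a 500-sample limit.
scrape = [{"__name__": "http_requests_total", "user_id": str(i)} for i in range(2000)]
up, ingested, err = evaluate_scrape(scrape, ScrapeLimits(sample_limit=500))
print(up, len(ingested), err)   # 0 0 sample limit exceeded
```

The design consequence is that a relabel rule which drops series also buys headroom under sample_limit, because the limit sees the post-relabel count.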

Target churn protection

Autoscaling and CI runners create the other cardinality leak: churn. Pods come and go, each with a unique pod or instance label, and series that are no longer scraped still occupy the head until head compaction garbage-collects them, which can take a few hours. Meanwhile the cumulative unique count over a day dwarfs the instantaneous count. A cluster that shows 2M active series at any instant may have minted 20M distinct series across a day of deploys, and every one of those sat in the head for a while. Two defenses:

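Drop labels that change on every rollout but that you never slice by, such as pod_template_hash and controller_revision_hash, with labeldrop. On a StatefulSet, where pod names survive a rollout, controller_revision_hash alone would otherwise mint a new copy of every series per rollout. Keep one label (pod or instance) that tells replicas apart, or their series collide.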
Cap concurrent targets with target_limit, so a job that suddenly tries to scrape thousands of endpoints (a bad SD selector) fails visibly instead of exploding.

scrape_configs:
  - job_name: "kubernetes-pods"
    target_limit: 2000
    kubernetes_sd_configs:
      - role: pod
    metric_relabel_configs:
      # These labels only ever churn; nobody queries them
      - regex: "pod_template_hash|controller_revision_hash"
        action: labeldrop

The churn-vs-instantaneous distinction is worth internalizing because it explains “why did my series count keep climbing even though I dropped labels?”: you dropped the instantaneous offenders, but a churning label is still minting new series faster than old ones expire. The “Label pairs most involved in churning” and “Label names most involved in churning” sections of promtool tsdb analyze output are how you spot it.
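A toy model makes the gap concrete. This Python sketch assumes a Deployment that replaces every pod on each rollout, with one series per pod. It ignores the brief old/new overlap during a rolling update, and the pod names are made up:

```python
def churn_stats(replicas: int, rollouts: int) -> tuple[int, int]:
    """Return (peak live series, distinct series ever seen) for one metric.

    Assumes one series per pod, every rollout replaces all pods with new
    names, and ignores the brief old/new overlap of a rolling update.
    """
    seen: set[str] = set()
    peak = 0
    for generation in range(rollouts + 1):      # initial pods plus each rollout
        live = {f"app-{generation:04d}-{i}" for i in range(replicas)}
        seen |= live
        peak = max(peak, len(live))
    return peak, len(seen)

print(churn_stats(replicas=50, rollouts=20))    # (50, 1050)
```

Multiply both numbers by the series each pod exposes to get a feel for what a busy deploy day does to the head between compactions.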

Aggregate at write time with recording rules

Sometimes you genuinely need a high-cardinality metric for short-term debugging but only ever query an aggregate. Recording rules pre-compute the aggregate on a schedule and store the small result under a new metric name; the dashboards query the cheap result, and — paired with short local retention and remote-write filtering — the raw data never reaches expensive long-term storage.

The naming convention matters: recording-rule output is conventionally named level:metric:operations (e.g. service:http_requests:rate5m) so it is self-documenting and easy to allow-list at the remote-write boundary with a single service:.* regex.

groups:
  - name: cardinality-reduction
    interval: 30s
    rules:
      # Collapse per-pod, per-handler request rate into per-service rate.
      # The stored result drops the pod dimension entirely.
      - record: "service:http_requests:rate5m"
        expr: |
          sum by (service, method, status) (
            rate(http_requests_total[5m])
          )

      # Pre-aggregate a latency histogram down to per-service buckets,
      # so the quantile query later touches a fraction of the series.
      - record: "service:http_request_duration_seconds_bucket:rate5m"
        expr: |
          sum by (service, le) (
            rate(http_request_duration_seconds_bucket[5m])
          )

Dashboards then query service:http_requests:rate5m instead of the raw metric — fewer series scanned, faster refresh, and the recomputed result is deterministic and cheap. The high-leverage pattern is pairing this with a short local retention for the raw data and forwarding only the aggregate to long-term storage.
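To estimate what a `sum by (...)` rule will store before you write it, count the distinct values of the kept label tuple. Here is a minimal Python sketch over a made-up fleet. It counts groups only and does not evaluate PromQL:

```python
from itertools import product

def series_after(series: list[dict[str, str]], by: tuple[str, ...]) -> int:
    """Number of output series `sum by (<by>)` would produce from `series`."""
    return len({tuple(s.get(name, "") for name in by) for s in series})

# Made-up fleet: 3 services x 40 pods x 6 handlers x 4 methods x 5 statuses.
raw = [
    {"service": svc, "pod": f"{svc}-{p}", "handler": f"/h{h}", "method": m, "status": st}
    for svc, p, h, m, st in product(
        ["cart", "search", "auth"], range(40), range(6),
        ["GET", "POST", "PUT", "DELETE"], ["200", "201", "400", "404", "500"],
    )
]
print(len(raw))                                            # 14400 raw series
print(series_after(raw, ("service", "method", "status")))  # 60 recorded series
```

The ratio between the two numbers is roughly the saving you get at the remote-write boundary if only the recorded series are forwarded.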

The economics of recording-rule pre-aggregation, illustrated:

Approach	Series stored long-term	Query cost	Debugging fidelity	When to use
Store raw per-pod series	Full (e.g. 5M)	High (scan all pods)	Perfect (per-pod)	Never, for long-term
Recording rule → per-service	Aggregated (e.g. 5k)	Low	Per-service only	Dashboards + long-term
Raw local (24h) + aggregate long-term	5M local, 5k remote	Low remote, high local	Full for 24h, aggregate forever	The recommended pattern
Drop the metric entirely	0	N/A	None	Only if truly never needed

Which dimension to aggregate away is the design decision — you keep the labels you slice by and collapse the rest. A quick guide:

Raw dimension	Keep it if you…	Collapse it if you…	Typical choice
`pod` / `instance`	Debug a single bad pod	Only ever look per-service	Collapse (aggregate `by (service)`)
`le` (histogram bucket)	Compute quantiles	Only need a count/sum	Keep `le`; it’s needed for quantiles
`tenant` / `user_id`	Bill or debug per tenant (rare, short-term)	Report service-wide	Collapse for long-term; keep raw local
`method` / `status`	Slice error rates by verb/code	Only need a total	Usually keep (small, useful)
`handler` / route	Per-endpoint latency	Only need per-service	Depends — keep if bounded and queried

A caution: recording rules add some series (the pre-computed output) and consume rule-evaluation CPU, so don’t create one for a metric you already query cheaply — you’d add series for no benefit. See Prometheus Recording Rules & Remote Write for Long-Term Storage for the full rule-evaluation and staleness mechanics.

Filter at the remote-write boundary

write_relabel_configs is the same relabeling engine applied at the remote-write boundary — the point where you decide what you pay a vendor to keep, independent of what you store locally. This is where “store everything locally for 24 hours” becomes “pay to keep only the aggregates and an allow-list,” frequently a 5–10× reduction in remote-write active series and therefore in bill.

Keep only pre-aggregated recording-rule output and a small allow-list:

remote_write:
  - url: "https://prometheus-prod.example.com/api/v1/write"
    write_relabel_configs:
      # Forward only recording-rule series (service:...) plus a few essentials.
      # Everything else stays local on short retention and never bills.
      - source_labels: [__name__]
        regex: "service:.*|up|node_(cpu|memory|filesystem)_.*|kube_deployment_status_replicas"
        action: keep
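Because the keep regex is anchored, it is worth checking which names an allow-list actually forwards. Here is a minimal Python sketch, assuming `re.fullmatch` mirrors Prometheus's anchored match for this simple pattern. `forwarded` is a hypothetical helper, and the metric names are examples:

```python
import re

ALLOW = r"service:.*|up|node_(cpu|memory|filesystem)_.*|kube_deployment_status_replicas"

def forwarded(metric_names: list[str], regex: str = ALLOW) -> list[str]:
    """Names a `keep` rule on [__name__] would forward (whole-name match)."""
    return [n for n in metric_names if re.fullmatch(regex, n)]

names = [
    "service:http_requests:rate5m",
    "up",
    "upstream_latency_seconds",       # not kept: "up" must match the whole name
    "node_cpu_seconds_total",
    "node_network_receive_bytes_total",
    "http_requests_total",
]
print(forwarded(names))
```

Running your real metric names (for example, exported from `seriesCountByMetricName`) through a check like this before deploying shows the forwarded set, and therefore the bill, in advance.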

Drop a specific high-cardinality raw metric while keeping everything else (the surgical inverse — when most metrics are fine but one is a bomb):

remote_write:
  - url: "https://prometheus-prod.grafana.net/api/prom/push"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "payment_authorization_duration_seconds_(bucket|sum|count)"   # all three histogram series
        action: drop

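Strip an expensive label at the boundary while forwarding the metric, keeping it useful long-term without its costliest dimension. This is only safe when the remaining labels still identify each series uniquely, because labeldrop does not aggregate:

    write_relabel_configs:
      # Forward the histogram without the per-BIN dimension. If series are
      # only unique because of card_bin, the remote end receives conflicting
      # samples for one series and rejects them or keeps an arbitrary one;
      # forward a recording-rule aggregate instead in that case.
      - regex: "card_bin"
        action: labeldrop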

Remote-write filtering strategy compared:

Strategy	`write_relabel_configs`	Effect on bill	Effect on local debugging	Best for
Allow-list aggregates	`keep` on `service:.*` + essentials	Largest reduction (5–10×)	None — raw stays local 24h	Mature setups with recording rules
Drop specific bombs	`drop` on named metric	Targeted reduction	None	One known offender among many
Strip a label at the edge	`labeldrop` on the costly label	Medium reduction	None	A label not needed for series identity (series stay unique without it)
Forward everything	(no rules)	Full cost	N/A	Only if long-term needs raw fidelity
Shard by tenant to different endpoints	`keep`/`hashmod` per endpoint	Splits cost by tenant	None	Multi-tenant chargeback at the vendor

One remote-write control worth knowing: Prometheus’s queue_config (shard counts, queue capacity, batch size) tunes remote-write throughput and memory, not cardinality. Do not confuse “my remote-write queue is backed up” (a throughput problem) with “my remote-write bill is high” (a cardinality problem). The fix for the former is queue tuning; the fix for the latter is write_relabel_configs.

Cost governance and chargeback

Controlling cardinality once is a project; keeping it controlled is governance. The model that holds in a multi-team platform is a budget per team, attributed by a team (or owner) label you attach via relabeling at scrape time, then measured with a recording rule and alerted on in two ways.

First, attribute every series to an owner. In Kubernetes, map a namespace to a team via relabel_configs so attribution is automatic and no developer has to remember to add a label:

    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: team
        regex: "(payments|checkout|search)-.*"
        replacement: "${1}"

Then build the per-team series count as a recording rule so the budget dashboard is cheap to render (the raw count by (team) is an expensive query to run on every dashboard refresh):

groups:
  - name: cardinality-governance
    interval: 1m
    rules:
      - record: "team:series:count"
        expr: "count by (team) ({__name__=~'.+'})"

Alert two ways. A hard-budget breach tells you something already broke; the week-over-week growth ratio catches the new exporter the day it ships, while the fix is still a one-line relabel rule:

groups:
  - name: cardinality-budgets
    rules:
      # Hard budget breach
      - alert: TeamCardinalityBudgetExceeded
        expr: "team:series:count > 200000"
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Team {{ $labels.team }} over its 200k series budget"
          description: "{{ $labels.team }} is at {{ $value }} active series."

      # Growth detector: series up >25% week-over-week
      - alert: CardinalityGrowthSpike
        expr: |
          team:series:count
            / (team:series:count offset 1w) > 1.25
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Team {{ $labels.team }} cardinality up >25% WoW"
          description: "{{ $labels.team }} is at {{ $value }}x its series count of a week ago."
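One behavior of the growth expression is worth seeing in isolation. PromQL division only returns results for label sets present on both sides, so a team with no data a week ago can never fire it (mistake #11 below). Here is a minimal Python model of the rule with made-up counts. `growth_alerts` is a hypothetical helper:

```python
def growth_alerts(now: dict[str, float], week_ago: dict[str, float],
                  threshold: float = 1.25) -> dict[str, float]:
    """Model `team:series:count / (team:series:count offset 1w) > 1.25`.

    Vector division only matches teams present on both sides, so a team with
    no sample a week ago silently drops out instead of alerting.
    """
    ratios = {team: now[team] / week_ago[team] for team in now if team in week_ago}
    return {team: r for team, r in ratios.items() if r > threshold}

now = {"payments": 260_000, "checkout": 90_000, "search": 41_000}
week_ago = {"payments": 150_000, "checkout": 88_000}   # "search" is brand new
print({team: round(r, 2) for team, r in growth_alerts(now, week_ago).items()})
# {'payments': 1.73}
```

New teams are therefore covered only by the hard-budget alert until they have a week of history, which is one reason to keep both alerts.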

The growth alert is the one that earns its keep. A budget alert is a lagging indicator (you are already over); the week-over-week ratio is a leading indicator (you are heading over). Wire the alert into your on-call routing so the owning team, not the platform team, gets paged — the whole point of chargeback is that the cost lands where the label was added. See Alertmanager Routing Trees, Inhibition & Deduplication for routing these by team label.

Chargeback in money, not just series. To turn series counts into a bill you can put in front of a team, allocate the total metrics spend across teams in proportion to their share of active series. If your managed-Prometheus invoice is a function of active series, a team’s chargeback is (team:series:count / sum(team:series:count)) × total_spend. Publish it monthly. The behavior change is immediate: a team that sees “your metrics cost ₹40,000/month, 80% of it from one histogram’s tenant label” fixes the label without being told to.
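The allocation is simple arithmetic. Here is a Python sketch with illustrative team counts and spend. `chargeback` is a hypothetical helper. Decide up front how to bill series that carry no team label, since `count by (team)` puts them in their own empty-team group:

```python
def chargeback(series_by_team: dict[str, int], total_spend: float) -> dict[str, float]:
    """Split total_spend in proportion to each team's share of active series.

    Implements (team:series:count / sum(team:series:count)) * total_spend.
    """
    total_series = sum(series_by_team.values())
    return {team: total_spend * n / total_series for team, n in series_by_team.items()}

bill = chargeback({"payments": 600_000, "checkout": 300_000, "search": 100_000}, total_spend=95_000)
print({team: round(amount) for team, amount in bill.items()})
# {'payments': 57000, 'checkout': 28500, 'search': 9500}
```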

The governance controls, from lightest-touch to strongest:

Control	What it does	Effort	Behavior it drives
`team` attribution label	Assigns every series an owner	Low (one relabel rule)	Enables everything below
Per-team budget dashboard	Visibility of who costs what	Low (one recording rule + panel)	Teams self-monitor
Budget breach alert	Pages when a team exceeds allocation	Low	Reactive cleanup
Week-over-week growth alert	Pages when a team’s series jump	Low	Catches leaks the day they ship
Monthly chargeback report	Puts the cost in currency, per team	Medium	Structural — teams design leaner metrics
Pre-merge cardinality check (CI)	Fails a PR that adds a high-card label	High	Prevents the leak from ever shipping
Per-tenant remote-write limits	Hard cap enforced by Mimir/vendor	Medium	Backstop when governance is ignored

Architecture at a glance

Picture the full pipeline as a left-to-right flow, because cardinality control is fundamentally about where in that flow you intervene, and each intervention is cheaper the earlier it sits.

On the far left is the application or exporter, where a metric is emitted with its labels. This is the true origin of cardinality — a label chosen here (tenant, path, trace_id) sets the ceiling for everything downstream, and the cheapest fix of all is to not add the bad label in the first place. Next, service discovery hands Prometheus a set of targets carrying meta-labels (__meta_kubernetes_namespace, __address__); here relabel_configs decides which targets to scrape (a keep/drop), sets attribution labels like team, and — if you shard — uses hashmod to assign each target to exactly one Prometheus replica. Nothing is stored yet; you are shaping what gets scraped.

The target is scraped, and every sample flows through metric_relabel_configs — the primary cardinality lever — before it reaches the TSDB head block. This is the cheapest per-series intervention: a drop, labeldrop, or replace here means the series never consumes head RAM, never adds query cost, never bills. What survives lands in the head, where the in-memory index holds every active series (and where the node OOMs if you got the earlier stages wrong). Guarding the entrance to the head are the hard limits — sample_limit, target_limit, and the label limits — which fail a bad scrape loudly instead of letting it silently bloat the head.

From the head, two paths diverge. Recording rules run on a schedule, reading raw high-cardinality series and writing small pre-aggregated series (service:...) back into the same TSDB. And remote write forwards series toward long-term storage (Thanos, Mimir, AMP, Grafana Cloud) — but only after passing through write_relabel_configs, the last filter, which typically keeps just the recording-rule aggregates and a small allow-list, so you store everything locally on short retention but pay to keep only what matters. Overlaying the whole flow is the governance loop: a team label attached at relabel_configs, a per-team series-count recording rule, and budget-plus-growth alerts that page the owning team. Read the flow once and the strategy is obvious — intervene as far left as you can, cap what you cannot predict, aggregate before you forward, and attribute the cost so it has an owner.

Real-world scenario

Nimbus Pay, a fictional but representative payments platform, ran a single-tenant Prometheus per environment, remote-writing to Grafana Cloud. Over six weeks their active series climbed from 1.4M to 6.8M with no corresponding traffic growth, and the monthly metrics bill roughly quintupled (from about ₹95,000 to ₹470,000). The on-call narrative was “Prometheus is slow” — dashboards were timing out — but the real story was billing, and the root cause was a single label.

The platform team, three SREs, started correctly: they took a snapshot block and ran promtool tsdb analyze /snapshot/data --limit=20 on a copy, so they added zero load to the struggling production server. It was obvious in under a minute. The top label name by unique values was card_bin (the first six digits of a card number) on a single payment_authorization_duration_seconds histogram. A well-meaning engineer had added it three weeks earlier to slice authorization latency by issuing bank during a fraud investigation. With ~12 histogram buckets (14 series per label combination once _sum and _count are included), roughly 9,000 distinct BINs in production, and the existing method (4) and status (5) labels, that one histogram had ballooned to 14 × 9,000 × 4 × 5 ≈ 2.5M series on its own, and it was streaming straight to the paid backend, where every one of those series was a billed active series.
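The arithmetic is worth checking mechanically, because it also shows why the remote-write drop has to cover `_sum` and `_count`, not just `_bucket`. The Python sketch below assumes 12 bucket series plus `_sum` and `_count` per label combination. `histogram_series` is a hypothetical helper:

```python
import math

def histogram_series(buckets: int, *label_cardinalities: int) -> int:
    """Series for one histogram: (bucket series + _sum + _count) x label combinations."""
    return (buckets + 2) * math.prod(label_cardinalities)

print(histogram_series(12, 9_000, 4, 5))   # 2520000: card_bin x method x status
print(histogram_series(12, 4, 5))          # 280: the same histogram with card_bin removed
# Dropping only _bucket at remote write would still forward _sum and _count per BIN:
print(2 * 9_000 * 4 * 5)                   # 360000
```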

The constraint made it interesting: they could not simply delete the metric, because the fraud team genuinely used per-BIN latency during incident reviews. Deleting it would win the bill fight and lose a real capability. So they split storage by audience. They kept the raw, per-BIN histogram locally on a 24-hour retention for fraud debugging, dropped the raw per-BIN series at the remote-write boundary, and forwarded only a pre-aggregated recording rule to long-term storage:

# Rules file (listed under rule_files): aggregate away the BIN dimension for long-term
groups:
  - name: payments
    interval: 30s
    rules:
      - record: "service:payment_auth_duration:bucket:rate5m"
        expr: |
          sum by (service, method, status, le) (
            rate(payment_authorization_duration_seconds_bucket[5m])
          )

# prometheus.yml: drop every raw per-BIN series (_bucket, _sum and _count);
# everything else, including the aggregate above, is still forwarded
remote_write:
  - url: "https://prometheus-prod.grafana.net/api/prom/push"
    write_relabel_configs:
      - source_labels: [__name__]
        regex: "payment_authorization_duration_seconds_(bucket|sum|count)"
        action: drop

Remote-write series for that metric dropped from ~2.5M to a few thousand (the aggregate). The fraud team kept its high-resolution local view for the 24-hour window they actually used; dashboards moved to the aggregated service:payment_auth_duration:bucket:rate5m series and rendered in under a second instead of timing out. They then added the week-over-week growth alert scoped per team, so the next engineer who reached for a high-cardinality label would get paged the day it shipped — before it hit the invoice. Monthly spend returned to its ~₹95,000 baseline within one billing cycle, and the fraud capability was preserved.

The incident as a timeline, because the order of moves is the lesson:

Time	State	Action taken	Effect	Note
Week 0	1.4M series, ₹95k/mo	(baseline)	—	Healthy
Week 3	+2.5M, climbing	Engineer adds `card_bin` for a fraud review	Bill starts climbing silently	No alert existed
Week 6	6.8M, ₹470k/mo	Dashboards time out; “Prometheus is slow”	Wrong diagnosis (latency, not bill)	The trap
+10 min	Diagnosed	`promtool tsdb analyze` on a snapshot	`card_bin` named as the offender	Zero prod load
+1 hr	Mitigated	Recording rule + `write_relabel_configs` drop	Remote series 2.5M → a few thousand	Fraud view preserved locally
+1 day	Governed	Per-team WoW growth alert added	Next leak will page in a day	The durable fix
+1 cycle	Resolved	—	Spend back to ₹95k baseline	Fraud capability intact

Advantages and disadvantages

The Prometheus model — a pull-based, label-rich TSDB with a three-stage relabeling pipeline and a remote-write split — is exactly what makes cardinality both a risk and something you can surgically control. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it bites)
`metric_relabel_configs` drops series before storage, the cheapest possible intervention: what you drop never costs head RAM, disk, or query time	The three stages (`relabel`/`metric_relabel`/`write_relabel`) are easy to confuse; a rule in the wrong stage silently does nothing
The remote-write split lets you keep raw data locally on short retention while paying to keep only aggregates	Requires discipline and setup — recording rules + allow-lists are not on by default; the default is “store and forward everything”
`/api/v1/status/tsdb` and `promtool tsdb analyze` make the offender findable in under a minute	Nothing warns you before the head OOMs or the bill spikes — you must build the governance loop yourself
Hard limits (`sample_limit` et al.) contain an unknown bad exporter loudly	A `sample_limit` breach surfaces only as `up == 0`, which looks like an ordinary outage unless you also alert on the dedicated counter
Labels make slicing trivially expressive and powerful for real dimensions	The same expressiveness makes it trivially easy to add an unbounded label at the call site
Recording-rule aggregates render dashboards fast and cheaply	Aggregation loses per-instance fidelity; you must decide what to keep raw
Chargeback attribution turns cardinality into a team’s budget	Attribution needs a consistent `team` label everywhere — a governance project in itself

The model is right for anyone running metrics at scale who is willing to invest in the governance loop; it bites hardest on teams who deploy Prometheus with defaults, remote-write everything, and never look until the invoice or the OOM. Every disadvantage is manageable — but only if you know it exists, which is the entire point of this article.

Hands-on lab

Reproduce a cardinality explosion locally, diagnose it with the exact queries from this article, fix it with metric_relabel_configs, and verify the series count drops — all with a local Prometheus and a tiny fake exporter. No cloud, no cost. Requires Docker (or a local prometheus binary) and curl/jq.

Step 1 — Create a working directory and a fake exporter that emits a high-cardinality metric. The simplest portable approach is a static exposition file containing many series with an unbounded user_id label, served by a tiny HTTP server for Prometheus to scrape. Write the exposition file:

mkdir -p ~/card-lab && cd ~/card-lab
# Generate a metric with 2000 distinct user_id values -> 2000 series
{
  echo '# HELP http_requests_total Total requests'
  echo '# TYPE http_requests_total counter'
  for i in $(seq 1 2000); do
    echo "http_requests_total{method=\"GET\",status=\"200\",user_id=\"$i\"} $((RANDOM % 1000))"
  done
} > metrics.prom
wc -l metrics.prom   # expect 2002 lines (2 comment lines + 2000 samples)

Step 2 — Serve it over HTTP (Python’s built-in server is everywhere):

# Serve the current dir on :8000; the metric is at /metrics.prom
python3 -m http.server 8000 &
curl -s http://localhost:8000/metrics.prom | head -5   # confirm it serves

Step 3 — Write a Prometheus config that scrapes it (no relabeling yet — reproduce the explosion):

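cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: "card-lab"
    metrics_path: /metrics.prom
    # http.server sends .prom files as application/octet-stream; Prometheus 3.x
    # fails such scrapes unless told which exposition format to assume
    fallback_scrape_protocol: PrometheusText0.0.4
    static_configs:
      - targets: ["host.docker.internal:8000"]
EOF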

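Step 4 — Run Prometheus (Docker; host.docker.internal reaches your host server, and the --add-host flag defines that name on Linux, where Docker does not provide it by default):

docker run --rm -d --name card-lab-prom -p 9090:9090 \
  --add-host=host.docker.internal:host-gateway \
  -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus:latest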
sleep 15   # let it scrape a few times

Step 5 — Confirm the explosion with the diagnostic queries from this article:

# Headline series count (expect ~2000+ from this one metric). This is Prometheus's
# own metric and the lab does not scrape Prometheus itself, so read it from /metrics
curl -s http://localhost:9090/metrics | grep '^prometheus_tsdb_head_series '

# Which metric dominates?
curl -s 'http://localhost:9090/api/v1/query?query=topk(5,count%20by%20(__name__)(%7B__name__%3D~%22.%2B%22%7D))' \
  | jq -r '.data.result[] | "\(.metric.__name__)  \(.value[1])"'

# Confirm user_id is the offending LABEL (expect ~2000 distinct values)
curl -s 'http://localhost:9090/api/v1/query?query=count(count%20by%20(user_id)(http_requests_total))' \
  | jq -r '.data.result[0].value[1]'

Expected: http_requests_total tops the metric list with ~2000 series, and the user_id distinct-value count is ~2000 — you have reproduced and confirmed the offender, exactly as you would in production.

Step 6 — Fix it with metric_relabel_configs (drop the unbounded label):

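cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: "card-lab"
    metrics_path: /metrics.prom
    # http.server sends .prom as application/octet-stream; Prometheus 3.x needs a fallback
    fallback_scrape_protocol: PrometheusText0.0.4
    static_configs:
      - targets: ["host.docker.internal:8000"]
    metric_relabel_configs:
      - regex: "user_id"
        action: labeldrop
EOF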

Step 7 — Reload and verify the collapse:

docker kill -s HUP card-lab-prom     # SIGHUP reloads config without restart
sleep 15

# No series should carry user_id any more: this prints [] (an empty result)
curl -s 'http://localhost:9090/api/v1/query?query=count(http_requests_total%7Buser_id!%3D%22%22%7D)' \
  | jq -r '.data.result'

# http_requests_total should collapse to ~1 series (method+status only)
curl -s 'http://localhost:9090/api/v1/query?query=count(http_requests_total)' \
  | jq -r '.data.result[0].value[1]'

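Expected: the user_id query returns an empty result ([]), and http_requests_total collapses from ~2000 series to 1. You have proven that metric_relabel_configs removes the label before storage: new samples no longer carry it. Note what actually happened, though. user_id was the only label telling these series apart, so all 2,000 samples now share one label set and Prometheus keeps only one value per scrape (mistake #2 below). That is fine for demonstrating the mechanics; in production, aggregate with a recording rule or remove the label at the source when the combined value matters.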

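Step 8 — (Optional) Add a sample_limit and watch it fail loudly. Set sample_limit: 500 in the job, remove the labeldrop, reload, and observe the whole scrape fail: up drops to 0 and the Targets page (http://localhost:9090/targets) shows “sample limit exceeded”. The rejection counter is Prometheus’s own metric, which this lab does not scrape, so read it from /metrics:

curl -s 'http://localhost:9090/api/v1/query?query=up' | jq -r '.data.result[0].value[1]'   # expect 0
curl -s http://localhost:9090/metrics \
  | grep '^prometheus_target_scrapes_exceeded_sample_limit_total'   # expect a non-zero, climbing counter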

Step 9 — Teardown:

docker kill card-lab-prom 2>/dev/null
kill %1 2>/dev/null            # stop the python http.server
cd ~ && rm -rf ~/card-lab

Validation summary — what each step proves:

Step	Proves	Success signal
5	The explosion is real and the offender is a label	`user_id` distinct-count ~2000
7	`metric_relabel_configs` drops series pre-storage	`user_id` query empty; metric collapses to ~1 series
8	`sample_limit` fails the whole scrape	`up` drops to 0; `..._exceeded_sample_limit_total` climbs

Common mistakes & troubleshooting

The failure modes here are subtle because the tooling does not always shout. This is the playbook — symptom, root cause, how to confirm, and the fix:

#	Symptom	Root cause	Confirm (exact command / query)	Fix
1	Metric-name `drop` rule “does nothing”	Put in `relabel_configs`, not `metric_relabel_configs` (no `__name__` pre-scrape)	Check which block the rule is in; `curl .../status/config \| grep -A3 relabel`	Move the rule to `metric_relabel_configs`
2	`Error on ingesting samples with different value but same timestamp`	A `labeldrop` collapsed two distinct series into an identical one	Scrape error logs; which label was dropped	Keep a label that’s part of identity; don’t drop it
3	Series count keeps climbing despite dropping labels	Churn: a label mints new series faster than old ones expire	`promtool tsdb analyze` “most involved in churning” lists; trend `prometheus_tsdb_head_series`	`labeldrop` labels that change per rollout but aren't needed for identity (e.g. `controller_revision_hash`); aggregate per-pod series away; `target_limit`
4	Head OOMs during compaction, no obvious offender	A high-cardinality label pair, not a single metric	`curl .../status/tsdb \| jq .data.seriesCountByLabelValuePair`	Drop the offending pair at `metric_relabel_configs`
5	Remote-write bill high but local head looks fine	You already filter locally but forward raw upstream	Vendor cardinality explorer; `write_relabel_configs` present?	Add/tighten `write_relabel_configs` `keep` allow-list
6	`sample_limit` set but bad target still bloats	Limit only applies per-scrape; target churn creates many scrapes	`increase(prometheus_target_scrapes_exceeded_sample_limit_total[5m])`	Add `target_limit`; `labeldrop` churn labels
7	Recording rule not reducing anything	Rule output kept locally and raw still forwarded	Compare `count(service:...)` vs raw; check `write_relabel_configs`	`keep` only `service:.*` in `write_relabel_configs`
8	`relabel_configs` `keep` drops everything	The regex is fully anchored, so a substring-style pattern never matches	Test the regex against the full `;`-joined `source_labels` value	Write the pattern for the whole value (e.g. `.*-prod`); RE2 syntax
9	Dropped a metric you now need	`drop` acts at ingest, so nothing was stored while the rule was active; that gap is unrecoverable	Query returns nothing for the period the rule was active	Remove the `drop`; data resumes forward only
10	`hashmod` shards unbalanced	Hashed a high-churn or skewed label	Compare `count({__name__=~".+"})` per shard	Hash a stable, uniform label (`__address__`)
11	Growth alert never fires though series grew	`offset 1w` has no data (retention < 1w, or new team)	Check `team:series:count offset 1w` returns data	Ensure retention ≥ window; guard new teams
12	Vendor rejects writes: “per-tenant series limit exceeded”	Mimir/AMP/Grafana Cloud per-tenant cap hit	Vendor error text in Prometheus's remote-write logs; `prometheus_remote_storage_*` failure counters rising	Filter with `write_relabel_configs`; raise tenant limit
13	`count by (__name__)` query itself times out	The wildcard scan is expensive on a large TSDB	Query duration in the UI	Use `/api/v1/status/tsdb` (pre-computed) instead
14	Labels reappear after a fix	Rule applied to one job; the label comes from another	`count by (job)(metric)` to find the source job	Apply the `labeldrop` to every job that emits it

The single most common and most confusing failure is #1 — a drop rule silently doing nothing because it is in relabel_configs where __name__ does not yet exist. Any time a relabel rule “isn’t working,” check the stage first. The second most common is #2 — a labeldrop that collapses two real series into an identical one, producing the duplicate-sample error; the label you dropped was load-bearing.
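
Before shipping a labeldrop, you can check offline whether the label is load-bearing. A minimal sketch, assuming you have the affected series as label dictionaries (for example from an `/api/v1/series` export). `labeldrop_collisions` is a hypothetical helper, and it matches label names exactly, whereas the real `labeldrop` action takes a regex.

```python
from collections import defaultdict


def labeldrop_collisions(series: list[dict[str, str]], drop: set[str]) -> dict:
    """Group series by the label set left after removing the names in `drop`
    and return only the groups with more than one member: those are the
    series a labeldrop would merge into one identity."""
    groups = defaultdict(list)
    for labels in series:
        kept = tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
        groups[kept].append(labels)
    return {kept: members for kept, members in groups.items() if len(members) > 1}


# Hypothetical exporter output: one counter per route and pod.
series = [
    {"__name__": "http_requests_total", "route": "/pay", "pod": "api-1"},
    {"__name__": "http_requests_total", "route": "/pay", "pod": "api-2"},
    {"__name__": "http_requests_total", "route": "/refund", "pod": "api-1"},
]

# Dropping "pod" merges api-1 and api-2 on /pay, so "pod" is load-bearing here.
for kept, members in labeldrop_collisions(series, {"pod"}).items():
    print(dict(kept), "<-", len(members), "series")
```

An empty result means the drop is safe for that sample; any group printed is a future duplicate-sample error.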

A quick decision table for “my series count is wrong”:

If you see…	It’s probably…	Do this
One metric name dominating `seriesCountByMetricName`	An over-labeled or over-bucketed metric	`count(count by (label)(metric))` to find the label; `labeldrop` or `keep`
One label name dominating `labelValueCountByLabelName`	An unbounded label (`user_id`, `path`, `trace_id`)	`labeldrop` it at `metric_relabel_configs`
Series climbing over time, flat at any instant	Churn from autoscaling / deploys	`labeldrop` churn labels; `promtool ... analyze` churn column
High bill, low local head	Raw series forwarded upstream	`write_relabel_configs` allow-list
A rule “doing nothing”	Wrong relabel stage or unanchored regex	Verify stage; test the RE2 regex
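
For a rule that is “doing nothing,” the fastest check is to reproduce the match offline. Relabel regexes are RE2 and anchored at both ends, and the `source_labels` values are joined by `separator` (default `;`) before matching. The sketch approximates that with Python's `re.fullmatch`; `relabel_matches` is a hypothetical helper, and because Python's `re` is not RE2, the test is only meaningful for patterns in their common subset.

```python
import re


def relabel_matches(regex: str, source_values: list[str], separator: str = ";") -> bool:
    """Approximate how a relabel rule tests its regex: the source label values
    are joined with `separator` and the pattern must match the WHOLE string,
    as if wrapped in ^(?:regex)$. Stick to syntax shared by RE2 and Python's
    `re` (no backreferences or lookarounds) when using this as a check."""
    joined = separator.join(source_values)
    return re.fullmatch(regex, joined) is not None


# A pattern that "works" with grep fails under full-string matching.
print(relabel_matches("kube_pod", ["kube_pod_info"]))    # False
print(relabel_matches("kube_pod.*", ["kube_pod_info"]))  # True

# With two source_labels, the regex sees "prod;api-gateway".
print(relabel_matches("prod;api-.*", ["prod", "api-gateway"]))  # True
```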

Best practices

Diagnose before you drop. Capture a baseline prometheus_tsdb_head_series and the four /api/v1/status/tsdb lists before changing anything, so you can prove the fix worked and you don’t drop signal you actually need.
Find the offending label, not just the metric. seriesCountByMetricName tells you which metric; labelValueCountByLabelName and count(count by (label)(...)) tell you why — the label. Fix the why.
Drop at the cheapest stage. Series dropped in metric_relabel_configs (after the scrape, before ingestion) never reach the head block, so they cost no memory, storage, or remote-write bill. Prefer it over storing-then-aggregating, and prefer not adding the label at the source over both.
Know the three stages cold. relabel_configs = targets + attribution; metric_relabel_configs = drop metrics/labels before storage; write_relabel_configs = what you pay to keep. Never confuse them.
Allow-list chatty exporters. kube-state-metrics and cAdvisor emit hundreds of metrics; keep only the ones you dashboard on. This is usually the single biggest one-time reduction.
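Set hard limits as a global baseline. sample_limit, target_limit, label_limit, and the label name/value length limits in global, tightened per job, so a new exporter fails its own scrape, not the cluster.
Alert on the limit counters. A sample_limit breach fails the whole scrape, so up drops to 0 and the target looks like any other outage. Alert on prometheus_target_scrapes_exceeded_sample_limit_total so you can tell a limit breach from a target that is actually down, then check the /targets page error to see which target it was.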
Normalize unbounded values to bounded templates. Raw path/url → route template :id via replace; free-text errors → bounded status_code. Do it at the source if you can, at metric_relabel_configs if you can’t.
Aggregate before you forward. Recording rules to per-service aggregates, short local retention for raw, write_relabel_configs to forward only the aggregates + an allow-list. Frequently 5–10× off the bill.
Attribute every series to a team. A team label via relabel_configs, a per-team recording rule, and budget + week-over-week growth alerts. The growth alert catches leaks the day they ship.
Publish chargeback in money. Convert series share to currency and send it monthly; nothing changes behavior like a team seeing its own bill.
Gate labels in review. The cheapest control of all: a PR reviewer (or a CI check) that asks “can this label’s value set be enumerated?” before a high-cardinality label ever ships.
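
The normalization practice is easiest to see in code. A minimal sketch of doing it at the source, in application middleware, assuming IDs are purely numeric or UUID-shaped path segments (a hypothetical rule set; real services need their own). `route_template` is illustrative, not a library function.

```python
import re

# Hypothetical rule set: purely numeric segments and UUID segments are IDs.
_UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
_ID_SEGMENT = re.compile(r"/(?:\d+|" + _UUID + r")(?=/|$)")


def route_template(raw_path: str) -> str:
    """Map an unbounded raw path to a bounded route template. The query
    string is cut first because it is just as unbounded as the IDs."""
    path = raw_path.split("?", 1)[0]
    return _ID_SEGMENT.sub("/:id", path)


# 50 users x 20 orders: 1,000 distinct raw label values.
raw = [f"/users/{uid}/orders/{oid}?ref=email" for uid in range(50) for oid in range(20)]
print(len(set(raw)), "raw values ->", len({route_template(p) for p in raw}), "template")
print(route_template("/carts/3f2b8c1e-0a4d-4e6b-9c7f-1a2b3c4d5e6f"))  # /carts/:id
```

Doing the same in a `replace` relabel rule is harder, because one rule applies a single anchored regex with capture groups, which is one more reason to fix it at the source when you can.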

Security notes

Cardinality control and security overlap more than people expect — the same unbounded labels that blow up the bill are often the ones that leak sensitive data.

High-cardinality labels are frequently PII. user_id, email, session_id, IP addresses, and card_bin are both cardinality bombs and personal or regulated data. Dropping them with labeldrop/drop at metric_relabel_configs simultaneously fixes the cost problem and removes PII from a metrics store that is typically less access-controlled than your primary database. Treat “is this label PII?” and “is this label high-cardinality?” as the same review question — the answer is usually yes to both.
Metrics are widely readable. Dashboards, alert annotations, and the /api/v1/* endpoints expose label values broadly across an org. A card BIN or email in a label is visible to anyone with Grafana access. Never put anything in a label you would not put on a shared dashboard.
Lock down the admin and query surfaces. The Prometheus HTTP API (/api/v1/status/tsdb, /api/v1/query), the lifecycle endpoints (/-/reload and /-/quit, enabled by --web.enable-lifecycle), and the TSDB admin API (enabled by --web.enable-admin-api) should sit behind authentication and network policy. The admin API in particular allows series deletion and should be off or tightly restricted in production.
Relabel drops are not retroactive. A drop/labeldrop stops future ingestion; data already stored (or already forwarded to a vendor) still contains the value. If a PII label already leaked into long-term storage, fixing the rule going forward is necessary but not sufficient: you may need the local TSDB admin API (/api/v1/admin/tsdb/delete_series, then /api/v1/admin/tsdb/clean_tombstones to remove the tombstoned data from disk) and a data-deletion request to the vendor.
Redact in the collector for defense in depth. If metrics pass through an OpenTelemetry Collector before Prometheus, drop or hash sensitive labels there too (attributes/transform processors), so a misconfigured exporter can’t leak a value that never should have left the app. See OpenTelemetry Collector Pipelines in Production.
Least privilege on remote-write credentials. The remote-write endpoint credential can push arbitrary series into your (billed) backend; scope it to a single tenant/endpoint and rotate it, so a leaked credential can’t be used to inflate someone else’s bill or exfiltrate via crafted series.
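
For the not-retroactive case, the local cleanup is two admin-API calls. The sketch builds them with the standard library but does not send them, so the selector can be reviewed first. The server address, the `checkout_requests_total` metric, and the `email` label are hypothetical, and the sketch assumes the admin API is enabled and reachable with whatever authentication your deployment uses.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def purge_requests(base_url: str, selector: str,
                   start: Optional[str] = None, end: Optional[str] = None) -> list[Request]:
    """Build, but do not send, the two TSDB admin calls that purge stored series:
    delete_series writes tombstones for the selector, and clean_tombstones
    removes the tombstoned data from disk. Both need --web.enable-admin-api."""
    params = [("match[]", selector)]
    if start:
        params.append(("start", start))  # RFC 3339 or Unix timestamp
    if end:
        params.append(("end", end))
    delete = Request(f"{base_url}/api/v1/admin/tsdb/delete_series?{urlencode(params)}",
                     method="POST")
    clean = Request(f"{base_url}/api/v1/admin/tsdb/clean_tombstones", method="POST")
    return [delete, clean]


# Select only the series that carry a non-empty email label.
for req in purge_requests("http://localhost:9090", '{__name__="checkout_requests_total", email!=""}'):
    print(req.get_method(), req.full_url)
```

Once reviewed, each request can be sent with `urllib.request.urlopen(req)`. Until `clean_tombstones` runs, `delete_series` only hides the data; it is still on disk. None of this touches copies already forwarded to a vendor.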

Cost & sizing

Cardinality is the cost model for metrics, so sizing is mostly cardinality budgeting. Two meters matter: head-block RAM (caps how many series one node holds) and remote-write active series (what a managed vendor bills).

Head RAM scales with active series. A rough planning figure is a few KB of head memory per active series (index + chunk overhead), so a node with, say, 16–32 GB usable for the head holds on the order of a few million active series comfortably; past that you either shard with hashmod or reduce cardinality. The exact number varies with label count and churn — measure with process_resident_memory_bytes against prometheus_tsdb_head_series, don’t guess.
Local disk is cheap; the vendor is not. Storing raw series locally on a 24-hour retention is inexpensive disk you already own. The expensive meter is the managed backend: Grafana Cloud bills on active series, while AMP bills on ingested samples, which grow with active series × scrape frequency. Per-unit rates are vendor-dependent and usually discounted at volume, so treat the monetary figures in this section as illustrative. This asymmetry is the reason to keep raw local and forward only aggregates.
Remote-write filtering is the highest-ROI cost lever. A write_relabel_configs allow-list that forwards only recording-rule aggregates plus essentials is frequently a 5–10× reduction in billed series — for the cost of a few lines of YAML. Do this before you consider buying a bigger plan.
Downsampling extends the split further. In the long-term tier, Thanos downsamples old blocks to 5m and 1h resolutions, which mainly speeds up long-range queries; combined with a shorter retention for raw-resolution data, it also lowers storage cost. Cardinality control feeds cleaner, smaller data into that machinery.
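
The head-RAM guidance reduces to simple arithmetic. The sketch assumes about 4 KiB per active series and a 70% headroom target; both are illustrative planning assumptions. Replace the per-series figure with your measured `process_resident_memory_bytes / prometheus_tsdb_head_series`, which slightly overstates the per-series cost because RSS also includes non-head memory.

```python
import math


def head_capacity(usable_head_bytes: float, bytes_per_series: float) -> int:
    """Rough number of active series one head block can hold."""
    return int(usable_head_bytes // bytes_per_series)


def shards_needed(total_series: int, per_node_capacity: int, headroom: float = 0.7) -> int:
    """Shards required if each node should stay at or below `headroom` of its
    estimated capacity, leaving room for churn and compaction spikes."""
    return math.ceil(total_series / (per_node_capacity * headroom))


GiB = 1024 ** 3
cap = head_capacity(16 * GiB, 4 * 1024)   # assumption: ~4 KiB per active series
print(cap)                                # 4194304
print(shards_needed(6_800_000, cap))      # 3
```

If the shard count comes out above one, that is the point where the text's choice applies: shard with hashmod or cut cardinality first.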

A rough monthly picture for a mid-size platform (illustrative — verify against your vendor’s current pricing):

Scenario	Active series (local)	Active series (remote)	Rough remote cost/mo	Note
No filtering, remote-write everything	5M	5M	High (₹300k–500k range)	The default trap
`keep` allow-list of aggregates + essentials	5M	~500k	~10× lower	The recommended pattern
+ drop chatty-exporter bloat at ingest	3M	~400k	Lower still + less head RAM	Local head also shrinks
+ per-tenant sharding to separate endpoints	3M	per-tenant split	Split by chargeback	Multi-tenant cost attribution
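
The table's reductions follow directly from the meter. A minimal sketch for a vendor that bills on active series; the ₹80-per-thousand rate is a hypothetical placeholder chosen to land inside the table's illustrative range, not a quoted price.

```python
def monthly_remote_cost(active_series: int, price_per_thousand: float) -> float:
    """Monthly bill for a backend that meters active series."""
    return active_series / 1000 * price_per_thousand


PRICE_PER_THOUSAND = 80.0  # hypothetical ₹ per 1,000 active series per month

raw = monthly_remote_cost(5_000_000, PRICE_PER_THOUSAND)
filtered = monthly_remote_cost(500_000, PRICE_PER_THOUSAND)
print(raw, filtered, raw / filtered)  # 400000.0 40000.0 10.0
```

The ratio does not depend on the rate, which is why the allow-list's reduction holds across vendors that meter active series.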

The cost drivers, what each buys, and the watch-out:

Cost driver	What you pay for	Lever to control it	Watch-out
Head-block RAM	Node memory to hold the active-series index	`metric_relabel_configs` drops; `hashmod` shard	OOM on compaction if under-provisioned
Local disk (retention)	Block storage for the retention window	Short retention for raw; keep aggregates longer	Cheap — don’t over-optimize this
Remote-write active series	Vendor bill, metered on active series	`write_relabel_configs` allow-list	The dominant metrics cost
Rule-evaluation CPU	Recording-rule computation	Keep rule count sane; sensible intervals	Over-ruling adds series + CPU
Long-term storage (Thanos/Mimir)	Object storage + compaction/query compute	Downsampling (Thanos); retention tiers	Grows with retained cardinality
Query compute	Scanning series on dashboard refresh	Query aggregates, not raw	High-card panels time out

Free-tier note: self-hosted Prometheus itself is free (open source); the cost is the infrastructure it runs on and any managed remote-write backend. A single-node Prometheus with disciplined cardinality can run comfortably on a modest VM (₹2,000–4,000/month range) and remote-write only aggregates to a vendor’s lower tier — the discipline is what keeps it cheap, not the software license.

Interview & exam questions

1. What is cardinality in Prometheus, and why does it matter more than sample rate? Cardinality is the count of unique time series — every distinct combination of metric name and label values. It matters more than sample rate because Prometheus holds an in-memory index of every active series in the head block, so memory, query latency, and remote-write bill all scale with the number of series, not how fast samples arrive on each. A metric scraped once a minute with 2M series is far more expensive than one scraped every second with 50.

2. Why is adding one label so dangerous? Because total series is the product of label cardinalities, not the sum. Adding a user_id label with 100,000 values doesn’t add 100,000 series; it can multiply the metric’s existing series count by up to 100,000, because each existing label combination may appear with every user. One unbounded label can take a metric from thousands to billions of potential series.

3. Explain the difference between relabel_configs, metric_relabel_configs, and write_relabel_configs. relabel_configs runs before the scrape on target meta-labels — it decides what to scrape and sets identity/attribution labels. metric_relabel_configs runs after the scrape on every sample’s labels, before TSDB ingestion — the primary cardinality lever, because what you drop costs zero. write_relabel_configs runs at the remote-write boundary — it decides what you pay a vendor to keep, independent of local storage.

4. A drop rule on __name__ in relabel_configs isn’t working. Why? Because relabel_configs runs before the scrape, when only target meta-labels exist — the samples and their __name__ don’t exist yet. A metric-name drop must go in metric_relabel_configs, which runs on post-scrape samples. This is the single most common relabeling mistake.

5. How do you find the offending label (not just metric) causing a cardinality explosion? Use /api/v1/status/tsdb’s labelValueCountByLabelName list, or promtool tsdb analyze, or the nested-count PromQL pattern count(count by (label)(metric)) per suspect label. seriesCountByMetricName only tells you which metric; you need the label-value-count list to find which label is unbounded.

6. What does sample_limit do, and what’s the trap in monitoring it? sample_limit caps how many samples one scrape may yield (counted after metric relabeling); exceeding it fails the entire scrape, so none of that scrape’s samples are stored and up drops to 0. The trap: the breach looks exactly like the target being down, even though the exporter answered, so on-call debugs the network instead of the cardinality. Alert on prometheus_target_scrapes_exceeded_sample_limit_total alongside up to tell the two apart.

7. What is hashmod used for? Sharding. It sets a target label to hash(source_labels) % modulus, so you can run N Prometheus replicas where each keeps only its bucket, deterministically splitting the target set. It’s a scaling answer (hold more series across nodes), not a cardinality reduction answer — the series still exist, just spread across shards, and you need Thanos/Mimir to query globally.

8. When would you use a recording rule for cardinality control? When you need raw high-cardinality data for short-term debugging but only ever query an aggregate. The rule pre-computes the aggregate (e.g. sum by (service)), you keep raw data on short local retention, and forward only the aggregate to long-term storage via write_relabel_configs — often a 5–10× reduction with per-service fidelity preserved for queries.

9. Your remote-write bill is high but the local head looks fine. What’s happening and how do you fix it? You’re already filtering (or the head is naturally small) locally, but forwarding raw high-cardinality series upstream where the vendor meters active series. Confirm with the vendor’s cardinality explorer. Fix with a write_relabel_configs keep allow-list that forwards only recording-rule aggregates and essentials, so long-term storage holds only what you pay to keep.

10. Why can a labeldrop cause a duplicate-sample error? Dropping a label collapses all series that differed only in that label. If two surviving series become identical (same name, same remaining labels) but carry different values at the same timestamp, Prometheus reports different value but same timestamp and drops the sample. The dropped label was part of the series identity — don’t drop it.

11. How do you govern cardinality across many teams? Attribute every series to a team label via relabel_configs, build a per-team series-count recording rule, and alert two ways: a hard-budget breach (lagging) and a week-over-week growth ratio (leading — catches the leak the day it ships). Add monthly chargeback in currency to change behavior structurally.

12. What’s the relationship between cardinality and PII/security? They overlap heavily: the labels that explode cardinality (user_id, email, IPs, card_bin) are usually PII, and metrics stores are broadly readable. Dropping them at metric_relabel_configs fixes cost and removes PII simultaneously. Note that drops are not retroactive — already-stored/forwarded values still contain the data.
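
Q7’s mechanism is small enough to emulate. To our reading of the Prometheus relabel code, `hashmod` takes the MD5 of the joined `source_labels` values, reads the last 8 bytes of the digest as a big-endian unsigned integer, and reduces it modulo `modulus`. The sketch mirrors that, but verify against your Prometheus version before relying on exact shard assignments; the target addresses are synthetic.

```python
import hashlib


def hashmod(source_values: list[str], modulus: int, separator: str = ";") -> int:
    """Emulate the relabel `hashmod` action: MD5 over the joined source label
    values, last 8 digest bytes as a big-endian unsigned int, modulo `modulus`."""
    digest = hashlib.md5(separator.join(source_values).encode()).digest()
    return int.from_bytes(digest[8:], "big") % modulus


# 1,000 synthetic scrape targets split across 4 shards by __address__.
targets = [f"10.0.{i // 256}.{i % 256}:9100" for i in range(1000)]
counts = [0] * 4
for address in targets:
    counts[hashmod([address], 4)] += 1
print(counts)  # expect four buckets of roughly equal size
```

Each replica then keeps only its own bucket with a `keep` rule on the target label. Because the hash depends only on the label values, every replica computes the same assignment without coordination.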

These map to certifications and role expectations as follows:

Question theme	Relevant cert / role	Objective area
Cardinality mechanics, TSDB, relabeling	PCA (Prometheus Certified Associate)	Instrumentation & metrics; PromQL; recording rules
Kubernetes SD, kube-state-metrics, relabeling	CKA / CKAD (monitoring), PCA	Cluster observability; metrics pipeline
Cost governance, chargeback, remote write	FinOps Practitioner; SRE	Unit economics; observability cost
Limits, OOM, sizing, sharding	SRE / Platform Engineer	Capacity planning; reliability
PII in labels, admin API lockdown	Security Engineer	Data protection; least privilege

Quick check

A metric has labels method (4 values), status (5), and someone adds tenant (8,000). Roughly how many series does the metric now have, and is that a problem?
You add a drop rule for a noisy metric to relabel_configs and nothing changes. What’s wrong and where should the rule go?
Your sample_limit is set and a target is clearly over it, but all you see is up == 0, the same as an unreachable target. How do you confirm that the limit, not the network, is the cause?
You need per-BIN payment latency for fraud debugging but can’t afford to store it long-term. What’s the pattern?
Two things you’d do to attribute and govern cardinality across ten teams.

Answers

4 × 5 × 8,000 = 160,000 series — up from 20 without tenant. Yes, it’s a problem: an 8,000-value tenant label multiplied the metric 8,000×. If you don’t slice by tenant on this metric, labeldrop it; if you do, aggregate it away for long-term storage with a recording rule and keep raw on short local retention.
The rule is in the wrong stage. relabel_configs runs before the scrape on target meta-labels, where __name__ doesn’t exist yet, so a metric-name drop is a no-op there. Move it to metric_relabel_configs, which runs on post-scrape samples before ingestion.
Alert on the dedicated counter prometheus_target_scrapes_exceeded_sample_limit_total (e.g. increase(...[5m]) > 0). A sample_limit breach fails the scrape and sets up to 0, exactly like an unreachable target, so up alone cannot tell the two apart; the counter confirms the cause, and the /targets page shows “sample limit exceeded” for the offending target.
Keep the raw per-BIN histogram locally on a short (e.g. 24h) retention, write a recording rule that aggregates away the card_bin dimension (sum by (service, method, status, le)), and use write_relabel_configs to forward only the aggregate to long-term storage (drop the raw metric at the boundary). Fraud keeps the local view; you pay only for the aggregate.
(a) Attach a team label to every series via relabel_configs (e.g. mapping Kubernetes namespace → team) and build a per-team series-count recording rule for a cheap budget dashboard. (b) Alert on both a hard budget breach and a week-over-week growth ratio (the leading indicator that catches a leak the day it ships), routed to the owning team; publish monthly chargeback in currency.
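
Answer 4’s aggregation step is what the recording rule computes on every evaluation. A minimal sketch of PromQL `sum by (service, le)` over a hypothetical set of per-BIN bucket rates (100 BINs × 3 buckets, all set to 1.0) shows the `card_bin` dimension disappearing; `sum_by` is illustrative, not a Prometheus API.

```python
from collections import defaultdict


def sum_by(series: list[tuple[dict[str, str], float]], by: tuple[str, ...]) -> dict:
    """Mimic PromQL `sum by (...)`: keep only the listed labels (a missing label
    counts as the empty string) and add up the values in each group."""
    out = defaultdict(float)
    for labels, value in series:
        key = tuple((name, labels.get(name, "")) for name in by)
        out[key] += value
    return dict(out)


# Hypothetical raw histogram rates: 100 card BINs x 3 buckets for one service.
raw = [
    ({"service": "payments", "card_bin": str(400000 + b), "le": le}, 1.0)
    for b in range(100)
    for le in ("0.1", "0.5", "+Inf")
]
agg = sum_by(raw, ("service", "le"))
print(len(raw), "raw series ->", len(agg), "aggregated series")
```

The 300 raw series stay local on short retention; only the 3 aggregated series cross the `write_relabel_configs` boundary.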

Glossary

Time series — one unique combination of a metric name and its label values; the atomic unit of cardinality.
Active series — a series that has received a sample recently and is held in the head block; the number that drives head RAM and the remote-write bill.
Cardinality — the count of distinct time series, often measured per metric or per label; the load, the meter, and the incident.
Head block — the in-memory index of all active series plus the most recent ~2 hours of uncompacted samples before they flush to a persistent block; OOMs if series count exceeds capacity.
__name__ — the internal label that holds a metric’s name; count by (__name__) ranks metrics by series count.
relabel_configs — pre-scrape relabeling on target meta-labels; decides what to scrape and sets identity/attribution labels.
metric_relabel_configs — post-scrape, pre-ingestion relabeling on sample labels; the primary cardinality lever because dropped series cost nothing.
write_relabel_configs — relabeling at the remote-write boundary; decides what you pay a long-term backend to keep, independent of local storage.
labeldrop / labelkeep — relabel actions that remove / retain labels by name regex; collapse series that differed only in the dropped label.
drop / keep — relabel actions that discard / retain entire series (or targets) whose source_labels match the regex.
replace — relabel action that writes a value (with regex capture groups) into a target label; used to normalize raw paths into route templates.
hashmod — relabel action that sets a target label to hash(source_labels) % modulus, used to shard targets deterministically across N Prometheus instances.
Recording rule — a pre-computed, stored aggregate written under a new (conventionally level:metric:operations) name; turns raw high-cardinality data into a cheap series.
sample_limit — per-scrape cap on samples, counted after metric relabeling; exceeding it fails the entire scrape and sets up to 0, the same signal as an unreachable target.
target_limit — per-job cap on the number of targets service discovery may return; contains SD explosions.
label_limit / label_name_length_limit / label_value_length_limit — per-sample caps on label count and label name/value length; reject runaway or free-text labels.
Churn — the rate at which series are created and expire; high churn (e.g. from autoscaling pods) inflates cumulative unique series far above the instantaneous count.
Chargeback — attributing metric cost to a team/owner (via a team label) so cardinality has a budget and, in currency, a bill.
TSDB status — the /api/v1/status/tsdb endpoint (and UI page) exposing the head’s seriesCountByMetricName, seriesCountByLabelValuePair, labelValueCountByLabelName, and memoryInBytesByLabelName.
promtool tsdb analyze — an offline command that reads a block on disk and reports the highest-cardinality labels, label pairs, and metric churn without loading the running server.
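
The TSDB status endpoint returns each ranked list under `data` as objects with `name` and `value` fields. A minimal sketch for pulling the top entries from a response body you saved with `curl`; the sample body and its numbers are hypothetical, and `top_offenders` is an illustrative helper.

```python
import json


def top_offenders(status_json: str, n: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Return the top-n (name, value) pairs from each ranked list in a
    /api/v1/status/tsdb response body. Missing lists come back empty."""
    data = json.loads(status_json)["data"]
    keys = (
        "seriesCountByMetricName",     # which metric
        "labelValueCountByLabelName",  # which label is unbounded
        "seriesCountByLabelValuePair",
        "memoryInBytesByLabelName",
    )
    report = {}
    for key in keys:
        entries = sorted(data.get(key, []), key=lambda e: e["value"], reverse=True)
        report[key] = [(e["name"], e["value"]) for e in entries[:n]]
    return report


# Illustrative response body with hypothetical numbers.
sample = json.dumps({"status": "success", "data": {
    "seriesCountByMetricName": [
        {"name": "http_request_duration_seconds_bucket", "value": 912000},
        {"name": "up", "value": 420},
    ],
    "labelValueCountByLabelName": [
        {"name": "path", "value": 48000},
        {"name": "method", "value": 5},
    ],
}})

for key, rows in top_offenders(sample, n=1).items():
    print(key, rows)
```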

Next steps

You can now find any cardinality offender, drop it at the cheapest stage, cap the unknown with limits, aggregate before you forward, and govern the cost per team. Build outward:

Next: Prometheus Recording Rules & Remote Write for Long-Term Storage — go deep on the recording-rule evaluation, staleness, and remote-write queue mechanics that this article uses as levers.
Related: PromQL Deep Dive: rate(), Histograms & Aggregation — master the count by, rate, and quantile queries that power both the diagnosis and the recording rules here.
Related: Thanos: Global Query, Deduplication & Downsampling — the long-term backend whose downsampling extends the local/remote cost split you built.
Related: Grafana Mimir: Multi-Tenant, Horizontally Scalable Metrics — per-tenant series limits that enforce cardinality budgets at the storage layer.
Related: Loki, LogQL & Label Cardinality: Chunk Storage Tuning — the exact same cardinality discipline applied to logs, where high-cardinality labels are even more punishing.
Related: Multi-Cloud FinOps with Apptio Cloudability: Unit Economics — generalize the per-team chargeback model to your whole cloud spend.

Written by Vinod

`hashmod` sharding — when one Prometheus is not enough