Configure VictoriaMetrics Cluster for High-Cardinality Long-Term Metrics Storage

A payments platform team runs forty Kubernetes clusters and a fleet of edge appliances, and their single Prometheus pair has stopped coping: the active series count crossed nine million the week they added per-transaction-id labels to a latency histogram, the box now needs 180 GiB of RAM just to stay up, queries that used to return in a second time out, and retention is capped at fifteen days because the local disk is full. The SRE lead’s mandate is blunt: “thirteen months of metrics, sub-second dashboards, and stop paging me about OOM at 3 a.m.” This guide walks through replacing that single Prometheus with a VictoriaMetrics cluster (vminsert, vmselect, and vmstorage as independently scalable tiers), fronted by vmagent, as a drop-in Prometheus remote_write backend that absorbs high-cardinality ingestion and serves long-term queries without the memory cliff. Everything below is run against a real Kubernetes cluster and is reversible.

VictoriaMetrics splits what Prometheus does in one monolithic process into three roles. vmstorage holds the data and does the heavy lifting of each query; it is stateful, and it is the one tier you scale for cardinality and retention. vminsert is stateless, accepts writes, and shards each series across the storage nodes by a consistent hash of its labels. vmselect is stateless, fans a query out to every storage node, and merges the results. vmagent replaces Prometheus’ own scraping and remote-write: it scrapes targets (or receives Prometheus’ remote_write), buffers to disk when the backend is briefly unavailable, and can drop or relabel high-cardinality labels before they ever hit storage. Because the read and write tiers are stateless, you scale them with a replica count; because storage is sharded, you add cardinality headroom by adding vmstorage pods.
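To make the sharding idea concrete, here is a minimal shell sketch that maps a series to one of three storage nodes. It is a toy under stated assumptions: it uses cksum and a plain modulo as a stand-in, whereas vminsert uses its own internal hash and node-selection logic. The node_for helper and the sample series are hypothetical.

```shell
#!/bin/sh
# Illustrative only: VictoriaMetrics uses its own internal hash, not cksum.
# node_for SERIES NODE_COUNT -> prints the index of the node that would own SERIES.
node_for() {
  series_hash=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( series_hash % $2 ))
}

# The same label set always lands on the same node; different label sets spread out.
for series in \
  'http_requests_total{job="api",instance="10.0.0.1:8080"}' \
  'http_requests_total{job="api",instance="10.0.0.2:8080"}' \
  'http_requests_total{job="api",instance="10.0.0.3:8080"}'
do
  echo "vmstorage-$(node_for "$series" 3)  <-  $series"
done
```

With a plain modulo, changing the node count remaps most series, which is the problem consistent hashing is meant to limit.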

Prerequisites

Target topology

[Figure: Configure VictoriaMetrics Cluster for High-Cardinality Long-Term Metrics Storage, topology]

The write path and the read path share the storage tier but are otherwise independent, and keeping them separate in your head is the key to operating this well. On the write path, vmagent scrapes pods and appliances (or receives remote_write from your legacy Prometheus), applies relabeling to tame cardinality, and pushes to a load-balanced vminsert Service; vminsert hashes each series and shards it across the vmstorage pods. On the read path, Grafana (and Dynatrace, via a Prometheus datasource) queries vmselect, which scatter-gathers across every vmstorage pod and merges the result. vmstorage is the only stateful tier — its PVCs hold both the inverted index (what makes high cardinality expensive) and the compressed samples (what makes thirteen-month retention cheap). Around the edges: vmauth terminates auth and routes, Vault issues the object-storage credentials vmbackup uses for the durable copy, and Entra/Okta gate the dashboards.

The components, and the one configuration choice that matters most for each:

| Component | Role | The choice that matters |
| --- | --- | --- |
| vmagent | Scrape / receive remote_write, relabel, buffer, forward | -remoteWrite.maxDiskUsagePerURL so a backend blip buffers instead of dropping |
| vminsert | Stateless write router; shards series across storage | Replica count for write throughput; -replicationFactor for durability |
| vmstorage | Stateful index + sample store | -retentionPeriod, disk size, and pod count (your cardinality lever) |
| vmselect | Stateless query fan-out and merge | -search.maxUniqueTimeseries, cache size, replica count for QPS |
| vmauth | Auth proxy / router in front of insert + select | Per-tenant routing and bearer-token enforcement |
| vmbackup / vmrestore | Snapshot to object storage and restore | Vault-issued S3/Blob creds; backup cadence |
| Vault | Issues short-lived object-storage credentials for backups | Dynamic secrets engine; no static keys on disk |
| Entra ID + Okta | SSO for Grafana and vmauth | OIDC; Okta federated to Entra; group → role mapping |

1. Create the namespace and add the Helm repo

VictoriaMetrics ships an official Helm chart for the cluster topology. Pin the chart version so a helm upgrade never silently jumps a major version.

kubectl create namespace monitoring

helm repo add vm https://victoriametrics.github.io/helm-charts/
helm repo update

# Pin to a known-good chart version (inspect what's available first)
helm search repo vm/victoria-metrics-cluster --versions | head

Confirm your SSD StorageClass exists before you ask a StatefulSet to bind 500 GiB volumes to it:

kubectl get storageclass
# Expect a premium/SSD class, e.g. managed-csi-premium or gp3, marked (default) or named explicitly below.

2. Write the cluster values file

This values.yaml defines the three tiers, sets thirteen-month retention, and enables a replication factor of 2 so the cluster survives a single vmstorage pod loss. The vmstorage block is where high-cardinality decisions live — disk size, retention, and the replica count you will grow over time.

# vm-cluster-values.yaml
vmstorage:
  replicaCount: 3
  retentionPeriod: "13"          # months; the long-term-storage requirement
  extraArgs:
    dedup.minScrapeInterval: "30s"   # global dedup if you HA-pair vmagent
    # Reject absurd-cardinality streams at the door rather than OOM later:
    storage.maxHourlySeries: "2000000"
    storage.maxDailySeries: "8000000"
  persistentVolume:
    enabled: true
    storageClassName: "managed-csi-premium"
    size: 500Gi
  resources:
    requests: { cpu: "4", memory: "16Gi" }
    limits:   { cpu: "8", memory: "32Gi" }
  podDisruptionBudget:
    enabled: true
    maxUnavailable: 1

vminsert:
  replicaCount: 3
  extraArgs:
    replicationFactor: "2"        # each series written to 2 storage nodes
    maxLabelsPerTimeseries: "40"  # hard cap on label count per series
  resources:
    requests: { cpu: "1", memory: "1Gi" }
    limits:   { cpu: "2", memory: "2Gi" }

vmselect:
  replicaCount: 3
  cacheMountPath: /cache
  persistentVolume:
    enabled: true
    storageClassName: "managed-csi-premium"
    size: 50Gi
  extraArgs:
    dedup.minScrapeInterval: "30s"
    search.maxUniqueTimeseries: "1000000"   # guard against runaway queries
    search.maxQueryDuration: "60s"
  resources:
    requests: { cpu: "2", memory: "4Gi" }
    limits:   { cpu: "4", memory: "8Gi" }

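A note on replicationFactor: it is set on vminsert (the writer), not on vmstorage, and it should not exceed the number of vmstorage pods. The VictoriaMetrics cluster documentation recommends at least 2N-1 storage nodes for a replication factor of N, so that full replication continues while N-1 nodes are down; replicationFactor: 2 with 3 storage pods meets that. On the read side, vmselect accepts its own -replicationFactor flag, which lets it treat responses as complete rather than partial while fewer than that many storage nodes are unavailable. Do not assume the chart sets it for you: check the rendered vmselect args, and confirm the behaviour in the Validation section if you tune these by hand.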
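The arithmetic behind that choice can be sketched in a few lines of shell. The 2N-1 rule is the sizing guidance from the VictoriaMetrics cluster documentation; the capacity figure is a rough estimate that assumes every sample is stored exactly replicationFactor times and ignores index overhead.

```shell
#!/bin/sh
# Values taken from vm-cluster-values.yaml above.
storage_nodes=3
replication_factor=2
disk_gib_per_node=500

# Nodes needed to keep writing full replicas while (RF - 1) nodes are down.
min_nodes=$(( 2 * replication_factor - 1 ))

# Every sample is stored RF times, so unique-data capacity is raw capacity / RF.
raw_gib=$(( storage_nodes * disk_gib_per_node ))
usable_gib=$(( raw_gib / replication_factor ))

echo "minimum vmstorage nodes for RF=$replication_factor: $min_nodes"
echo "raw capacity: ${raw_gib} GiB, capacity for unique data: ${usable_gib} GiB"
if [ "$storage_nodes" -lt "$min_nodes" ]; then
  echo "WARNING: too few vmstorage nodes for this replication factor"
fi
```

The takeaway is that replication halves the disk available for unique data, so size the PVCs against the second number, not the first.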

3. Install the cluster

helm install vmcluster vm/victoria-metrics-cluster \
  --namespace monitoring \
  --version <pinned-chart-version> \
  -f vm-cluster-values.yaml

# Watch the storage StatefulSet bind its PVCs and go Ready
kubectl -n monitoring rollout status statefulset/vmcluster-victoria-metrics-cluster-vmstorage --timeout=300s
kubectl -n monitoring get pods -l app.kubernetes.io/instance=vmcluster

Note the two Service names the chart creates — you will write to one and read from the other:

kubectl -n monitoring get svc | grep -E 'vminsert|vmselect'
# vminsert  -> :8480  (write endpoint)
# vmselect  -> :8481  (read endpoint)

The cluster URL paths carry a tenant id (0 for single-tenant). Writes go to /insert/0/prometheus/api/v1/write; reads go to /select/0/prometheus as a Prometheus-compatible datasource. The tenant id takes the form accountID or accountID:projectID, and projectID defaults to 0 when omitted, so 0 means tenant 0:0. Moving to multi-tenancy later is a matter of changing that id in the URL.
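As a small illustration of how the tenant id slots into both paths, the sketch below builds the write and read URLs for a given tenant. The write_url and read_url helpers and the tenant ids 42 and 42:7 are hypothetical; the Service names and ports are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helpers; Service names and ports are the ones the chart created above.
INSERT_BASE="http://vmcluster-victoria-metrics-cluster-vminsert.monitoring.svc:8480"
SELECT_BASE="http://vmcluster-victoria-metrics-cluster-vmselect.monitoring.svc:8481"

# write_url TENANT -> remote_write endpoint for that tenant
write_url() { echo "$INSERT_BASE/insert/$1/prometheus/api/v1/write"; }
# read_url TENANT -> Prometheus-compatible datasource URL for that tenant
read_url()  { echo "$SELECT_BASE/select/$1/prometheus"; }

write_url 0        # single-tenant default used throughout this guide
write_url 42       # accountID 42 (hypothetical tenant)
read_url  42:7     # accountID 42, projectID 7 (hypothetical tenant)
```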

4. Deploy vmagent as the remote-write backend

vmagent is what your existing Prometheus talks to, and what tames cardinality. Deploy it with a scrape config plus a relabeling rule that drops the high-cardinality label that started the incident, then forwards to vminsert. Buffering to disk (-remoteWrite.maxDiskUsagePerURL) means a storage hiccup queues data instead of losing it.

# vmagent-values.yaml
remoteWriteUrls:
  - http://vmcluster-victoria-metrics-cluster-vminsert.monitoring.svc:8480/insert/0/prometheus/api/v1/write

extraArgs:
  remoteWrite.maxDiskUsagePerURL: "10GiB"   # on-disk buffer if vminsert is briefly down
  remoteWrite.tmpDataPath: /vmagent-buffer
  promscrape.maxScrapeSize: "32MiB"

persistentVolume:
  enabled: true
  storageClassName: "managed-csi-premium"
  size: 20Gi

config:
  global:
    scrape_interval: 30s
    external_labels:
      cluster: payments-prod
  scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs: [{ role: pod }]
      relabel_configs:
        # only scrape pods that opt in
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: "true"
      metric_relabel_configs:
        # DROP the per-transaction-id label that exploded cardinality.
        # labeldrop matches label NAMES via regex; it does not take source_labels.
        - action: labeldrop
          regex: transaction_id
        # Drop a noisy histogram we never query
        - source_labels: [__name__]
          regex: "envoy_cluster_.*_bucket"
          action: drop

helm install vmagent vm/victoria-metrics-agent \
  --namespace monitoring \
  --version <pinned-chart-version> \
  -f vmagent-values.yaml

kubectl -n monitoring rollout status deployment/vmagent-victoria-metrics-agent
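To see why dropping one label collapses cardinality, the following sketch applies a text-level stand-in for the labeldrop rule to two made-up exposition lines. It is not how vmagent implements relabeling: the drop_label and count_series helpers are hypothetical, the metric name and values are invented, and the sed pattern only handles a label that is not the last one inside the braces.

```shell
#!/bin/sh
# Made-up exposition lines: two series that differ only by transaction_id.
samples='payment_latency_seconds_bucket{le="0.5",transaction_id="tx-1001",service="checkout"} 1
payment_latency_seconds_bucket{le="0.5",transaction_id="tx-1002",service="checkout"} 1'

# Text stand-in for labeldrop; only handles a label followed by a comma.
drop_label() {
  sed -E "s/$1=\"[^\"]*\",//"
}

# Count distinct series (metric name plus label set) on stdin.
count_series() {
  cut -d ' ' -f 1 | sort -u | wc -l | tr -d ' '
}

echo "series before: $(printf '%s\n' "$samples" | count_series)"
echo "series after:  $(printf '%s\n' "$samples" | drop_label transaction_id | count_series)"
```

Note that the two inputs become the same series after the drop, so their samples now collide; dropping the label at vmagent stops the index growth, but the instrumentation should also stop emitting the label.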

To migrate without ripping out the legacy Prometheus first, point its remote_write at the same vminsert endpoint and run both in parallel during cutover:

# add to the legacy prometheus.yml, then reload Prometheus
remote_write:
  - url: http://vmcluster-victoria-metrics-cluster-vminsert.monitoring.svc:8480/insert/0/prometheus/api/v1/write
    queue_config:
      max_shards: 30
      capacity: 20000

5. Point Grafana (and Dynatrace) at vmselect

Add vmselect as a Prometheus datasource. VictoriaMetrics speaks the Prometheus query API and MetricsQL (a largely backwards-compatible extension of PromQL), so existing dashboards generally work unchanged; spot-check any panel that relies on PromQL edge cases.

# grafana datasource (provisioning/datasources/vm.yaml)
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://vmcluster-victoria-metrics-cluster-vmselect.monitoring.svc:8481/select/0/prometheus
    isDefault: true
    jsonData:
      httpMethod: POST
      prometheusType: Prometheus
      timeInterval: 30s

The same vmselect URL becomes a Datadog/Dynatrace ingestion source where those platforms consume a Prometheus endpoint, so a single long-term store backs both your Grafana dashboards and your APM tool’s metric correlation — no second copy of the data. For SSO, front Grafana with Microsoft Entra ID as the OIDC provider, with Okta federated to Entra as the workforce IdP, and map an Entra group to the Grafana Admin role so dashboard access follows the same identity your humans already use everywhere else.

6. Wire backups to object storage via Vault

vmstorage PVCs are durable within the cluster, but the long-term-storage requirement means an off-cluster copy. Run vmbackup as a sidecar that snapshots to S3/Azure Blob, and fetch the bucket credentials at runtime from HashiCorp Vault so no static access key ever sits in a Secret or on disk.

# Vault issues short-lived S3 creds via its AWS secrets engine
export VAULT_ADDR=https://vault.internal:8200
vault read aws/creds/vmbackup-writer
# -> access_key / secret_key with a 1h TTL, rotated automatically

# vmbackup sidecar args (added to the vmstorage pod spec)
- name: vmbackup
  image: victoriametrics/vmbackup:v1.103.0-cluster
  args:
    - -storageDataPath=/storage
    - -snapshot.createURL=http://localhost:8482/snapshot/create
    - -dst=s3://payments-vm-longterm/$(POD_NAME)/
    - -customS3Endpoint=https://s3.ap-south-1.amazonaws.com
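  env:
    - name: POD_NAME                     # required for the $(POD_NAME) expansion in -dst
      valueFrom: { fieldRef: { fieldPath: metadata.name } }
    - name: AWS_ACCESS_KEY_ID            # Vault-issued and short-lived; refresh vmbackup-s3 from Vault, never store a static key
      valueFrom: { secretKeyRef: { name: vmbackup-s3, key: access_key } }
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom: { secretKeyRef: { name: vmbackup-s3, key: secret_key } }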
  volumeMounts:
    - { name: vmstorage-volume, mountPath: /storage }

Schedule backups (a CronJob invoking the snapshot+upload, or vmbackupmanager for retention-aware rotation) and test a restore with vmrestore -src=s3://... -storageDataPath=/storage into a scratch pod before you trust it.
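Because the backup destination is keyed by pod name, a restore has to pair each vmstorage pod with its own prefix. The sketch below only prints the per-pod vmrestore commands so they can be reviewed before use; it assumes the StatefulSet name from step 3 and replicaCount: 3, and the restore_commands helper is hypothetical.

```shell
#!/bin/sh
# Prints (does not run) one vmrestore command per vmstorage pod.
# Pod names assume the StatefulSet from step 3 with replicaCount: 3.
BUCKET="s3://payments-vm-longterm"
STS="vmcluster-victoria-metrics-cluster-vmstorage"
REPLICAS=3

restore_commands() {
  i=0
  while [ "$i" -lt "$REPLICAS" ]; do
    echo "vmrestore -src=$BUCKET/$STS-$i/ -storageDataPath=/storage"
    i=$(( i + 1 ))
  done
}

restore_commands
```

Restoring pod 0's backup into pod 1 would put series on a node that vminsert does not route them to, so keep the pairing explicit.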

7. Manage it all as code

Keep every values file and Helm release in Git and reconcile with Argo CD, so the cluster’s desired state is reviewable and revertable — a helm upgrade becomes a pull request, not an SSH session. A GitHub Actions (or Jenkins) pipeline lints the chart values and runs helm template | kubeconform on every PR before Argo CD syncs. Provision the underlying nodes, the SSD StorageClass, the S3 bucket, and the Vault roles with Terraform, and use Ansible to lay down vmagent on the bare-metal virtual appliances at the edge that are not part of the Kubernetes cluster but still need to ship metrics into the same backend.

# what the CI gate runs on every PR
helm template vmcluster vm/victoria-metrics-cluster -f vm-cluster-values.yaml \
  | kubeconform -strict -summary -kubernetes-version 1.28.0

Validation

Prove ingestion, sharding, retention, and query before you cut traffic over.

# 1. vmagent is forwarding and not dropping — check its /metrics
kubectl -n monitoring port-forward deploy/vmagent-victoria-metrics-agent 8429 &
curl -s localhost:8429/metrics | grep -E 'vmagent_remotewrite_(requests_total|errors_total|pending_data_bytes)'
# pending_data_bytes near 0 and errors_total flat = healthy forwarding

# 2. Storage is actually holding series, and they are sharded across pods
kubectl -n monitoring port-forward pod/vmcluster-victoria-metrics-cluster-vmstorage-0 8482 &
curl -s 'localhost:8482/metrics' | grep 'vm_cache_entries{type="storage/hour_metric_ids"}'
# That gauge is the active series count on this pod. Repeat for vmstorage-1 and -2; counts should be roughly even.

# 3. Query through vmselect returns data (Prometheus API)
kubectl -n monitoring port-forward svc/vmcluster-victoria-metrics-cluster-vmselect 8481 &
curl -s 'localhost:8481/select/0/prometheus/api/v1/query?query=up' | jq '.data.result | length'

# 4. Cardinality explorer — find what is eating your index
curl -s 'localhost:8481/select/0/prometheus/api/v1/status/tsdb' | jq '.data.seriesCountByMetricName[0:10]'
# This is the high-cardinality audit: the top metrics by series count.

The fourth call is the one to run weekly: VictoriaMetrics’ cardinality explorer (also a UI at vmselect’s /select/0/vmui/#/cardinality) tells you exactly which metric or label is driving series growth, so you tune the vmagent labeldrop rule with evidence rather than guesswork. Retention can only be proven over time: once the cluster has been ingesting for longer than the old Prometheus’ fifteen-day cap, query a timestamp older than that and confirm you get a result.
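One way to turn the weekly check into evidence is to compare two snapshots of the per-metric series counts and flag what grew. The sketch below is an assumption-laden example: the metric names and counts are invented, the growth_report helper and the 20% threshold are hypothetical, and the input is assumed to have been flattened to "metric count" lines first.

```shell
#!/bin/sh
# Sample "metric series_count" lines; in practice produce them from the status call, e.g.
#   curl -s .../api/v1/status/tsdb | jq -r '.data.seriesCountByMetricName[] | "\(.name) \(.value)"'
last_week='payment_latency_seconds_bucket 120000
http_requests_total 40000
envoy_cluster_upstream_rq_total 25000'
this_week='payment_latency_seconds_bucket 310000
http_requests_total 41000
envoy_cluster_upstream_rq_total 25500'

# growth_report THRESHOLD_PERCENT: list metrics whose series count grew more than the threshold.
growth_report() {
  { printf '%s\n' "$last_week" | sed 's/^/old /'; printf '%s\n' "$this_week" | sed 's/^/new /'; } |
    awk -v limit="$1" '
      $1 == "old" { old[$2] = $3 }
      $1 == "new" && ($2 in old) && old[$2] > 0 {
        pct = ($3 - old[$2]) * 100 / old[$2]
        if (pct > limit) printf "%s grew %d%% (%d -> %d)\n", $2, pct, old[$2], $3
      }'
}

growth_report 20
```

Anything this flags is a candidate for a relabel rule or an instrumentation fix before it reaches the index.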

Rollback / teardown

Because the legacy Prometheus kept running through cutover, rollback is a config revert, not a recovery.

# Roll back a bad upgrade to the previous release revision
helm -n monitoring history vmcluster
helm -n monitoring rollback vmcluster <previous-revision>

# Point Grafana back at the old Prometheus datasource (revert the provisioning PR via Argo CD)

# Full teardown — uninstall the releases, then DELETE PVCs explicitly
helm -n monitoring uninstall vmagent vmcluster
kubectl -n monitoring delete pvc -l app.kubernetes.io/instance=vmcluster

helm uninstall does not delete the vmstorage PVCs: Kubernetes keeps volumes created from a StatefulSet’s volumeClaimTemplates by default. Treat that as a safety feature, since your thirteen months of data survives an accidental uninstall. Delete them only when you are certain, and confirm a fresh backup exists in object storage first.

Common pitfalls

Security notes

Do not expose vminsert or vmselect directly. Put vmauth in front as the single authenticated front door — it validates bearer tokens, routes per tenant, and rate-limits — and gate the human-facing Grafana with OIDC through Microsoft Entra ID, federated from Okta so dashboard access follows your existing workforce identity and conditional-access policies. Pull the object-storage credentials vmbackup uses from HashiCorp Vault’s dynamic secrets engine so they are short-lived and never written to a Kubernetes Secret in plain text. Run Wiz (with Wiz Code scanning the Helm values and Terraform in the repo) for continuous cloud-posture and misconfiguration detection — it flags the moment a Service drifts to a public load balancer or a backup bucket loses its block-public-access setting. Deploy CrowdStrike Falcon sensors on the cluster nodes and the edge virtual appliances for runtime threat detection feeding your SOC, and auto-raise a ServiceNow incident on a guardrail breach (a public-exposure alert from Wiz, a sustained vmagent drop) so security gets a ticket, not just a log line. Network-policy the monitoring namespace so only vmagent and vmauth can reach vminsert, and only vmselect and vmauth can reach the query path.

Cost notes

The win is mostly RAM: VictoriaMetrics typically holds the same series in several times less memory than Prometheus (the project’s own comparisons cite up to 7x), so the 180 GiB single box becomes three modest 32 GiB storage pods, and its on-disk compression (typically well under one byte per sample) makes thirteen-month retention on SSD genuinely affordable where Prometheus could not hold fifteen days. Scale the tiers independently and only where the pressure is: add vmstorage pods for cardinality and retention, vmselect replicas for dashboard QPS, vminsert replicas for write throughput, and never oversize everything at once. The biggest single lever is the vmagent relabel rule: every high-cardinality label you drop before storage is index you never pay to build, hold, or query, so the cardinality explorer pays for itself directly in storage cost. Tier the backup bucket to infrequent-access/cool storage since restores are rare, and meter ingestion and series growth into Dynatrace so the metrics platform’s own cost is on a dashboard the SRE lead sees, the same dashboard that proves the 3 a.m. pages stopped.
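A back-of-envelope disk estimate shows how these numbers interact. The sketch below is a rough calculation, not a sizing tool: the active series count (1,000,000 after relabeling) and the 0.5 bytes per sample are assumptions, series churn and index size are ignored, and only the scrape interval, replication factor, and provisioned disk come from this guide.

```shell
#!/bin/sh
# Assumptions (not from the guide): 1,000,000 active series after relabeling,
# 0.5 bytes per sample on disk, no series churn, index size ignored.
active_series=1000000
scrape_interval_s=30          # from the vmagent config above
retention_days=395            # roughly thirteen months
bytes_per_sample_x10=5        # 0.5 bytes, scaled by 10 to stay in integer math
replication_factor=2          # from the vminsert config above

samples_per_series=$(( retention_days * 86400 / scrape_interval_s ))
total_samples=$(( active_series * samples_per_series ))
unique_bytes=$(( total_samples * bytes_per_sample_x10 / 10 ))
stored_gib=$(( unique_bytes * replication_factor / 1024 / 1024 / 1024 ))

echo "samples per series over retention: $samples_per_series"
echo "estimated disk across the cluster: ${stored_gib} GiB (of 1500 GiB provisioned)"
```

Rerun it with your own series count from the cardinality explorer; the estimate scales linearly with active series, so doubling cardinality doubles the disk, and you should leave headroom for background merges and the index.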


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
