Loki gets sold as “Prometheus for logs,” and that one-liner causes most of the production incidents I get called into. Loki does not index your log content. It indexes a small set of labels and stores everything else as compressed chunks in object storage, then brute-forces the text at query time. That design is why Loki is an order of magnitude cheaper than Elasticsearch — and why one badly chosen label can detonate your index, your ingesters, and your bill in an afternoon. This is the working architecture: how the components fit, how to design labels that stay cheap, the LogQL you actually run, and how to tune chunks, caching, and retention so the thing pays for itself.
1. The architecture you must internalize
Loki splits cleanly into a write path and a read path, with object storage in the middle and a TSDB index describing what lives where.
        logs in                             queries in
           |                                     |
    [ distributor ]                     [ query-frontend ]
 validate, rate-limit                      split + cache
           |                                     |
 hash ring, by stream                            v
           v                                [ querier ]
     [ ingester ] <----- recent lines --------+     |
  build chunks, flush                               |
           v                                        |
  [ object storage: chunks + TSDB index ] <---------+
                     ^
               [ compactor ]  dedupe index, apply retention
- Distributor is the front door. It validates lines, enforces per-tenant and per-stream rate limits, and uses a consistent hash ring to route each stream to ingesters (replication factor 3 by default in a cluster). It holds no state.
- Ingester batches incoming lines for a stream into a chunk, compresses it, keeps it in memory until it is full or idle, then flushes it to object storage and writes the corresponding TSDB index entry. This is the component that dies first when cardinality explodes, because every active stream costs memory.
- Querier executes LogQL. It fetches matching chunks from object storage (plus recent data that ingesters have not flushed yet), decompresses them, and runs your filters and aggregations.
- Query-frontend sits in front of queriers. It splits a big time range into sub-queries, runs them in parallel across queriers, and caches results. It is the single highest-leverage component for query performance.
- Compactor runs on the index in object storage: it merges per-ingester TSDB files into a single compacted index per day, and it is the only component that applies retention and deletions.
A stream is the atomic unit of Loki: a unique set of label key-value pairs. {app="api", env="prod", pod="api-7d9f-x"} is one stream; change any value and it is a different stream. Everything about cost and performance follows from how many streams you create.
2. Label design: the one decision that determines your bill
Loki’s index size and ingester memory are governed by the number of unique streams. The worst case is the product of the cardinalities of your labels: two labels with 1,000 values each can produce up to 1,000,000 streams, and you get every combination that actually appears in your traffic. This is the cardinality bomb, and it is almost always a pod, request_id, user_id, trace_id, or path label that lights the fuse.
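To make that arithmetic concrete, here is a minimal Python sketch (Python is used purely for illustration; none of this is Loki code). It assumes a stream's identity is just its sorted label set and contrasts the worst-case product with the streams actually created; the helper names are hypothetical.

```python
from math import prod


def stream_key(labels: dict[str, str]) -> str:
    """Canonical identity of a stream: label pairs sorted by name."""
    return "{" + ", ".join(f'{k}="{v}"' for k, v in sorted(labels.items())) + "}"


def worst_case_streams(cardinalities: dict[str, int]) -> int:
    """Upper bound: the product of each label's distinct-value count."""
    return prod(cardinalities.values())


def observed_streams(entries: list[dict[str, str]]) -> int:
    """Streams actually created: distinct label sets among ingested entries."""
    return len({stream_key(labels) for labels in entries})


# Two 1,000-value labels allow up to a million streams.
print(worst_case_streams({"pod": 1_000, "user_id": 1_000}))  # 1000000

# Three entries, two distinct label sets (key order does not matter): two streams.
entries = [
    {"app": "api", "env": "prod", "pod": "api-7d9f-x"},
    {"env": "prod", "app": "api", "pod": "api-7d9f-x"},
    {"app": "api", "env": "prod", "pod": "api-7d9f-y"},
]
print(observed_streams(entries))  # 2
```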
The rule that prevents 90% of Loki disasters:
Labels are for routing and selecting streams, not for storing data you want to search. If a value is high-cardinality or unbounded, it belongs inside the log line, extracted at query time with a parser — never as a label.
Good labels are bounded, predictable, and small in count: cluster, namespace, app, env, level, component. Everything else — IDs, IPs, paths, user agents, durations — stays in the line.
Configure hard guardrails so a bad pipeline can’t take down the cluster:
# loki config: limits_config
limits_config:
  # Reject streams with too many labels (a runaway label-extraction pipeline)
  max_label_names_per_series: 15
  max_label_value_length: 2048
  max_label_name_length: 1024
  # Cap active streams per tenant; protects ingester memory
  max_global_streams_per_user: 50000
  max_streams_per_user: 0 # 0 = use global limit only
  # Per-stream ingest rate (bytes/sec) and burst, the per-stream throttle
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 20MB
  # Per-tenant ingestion ceiling
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  ingestion_rate_strategy: local # each distributor enforces the full limit; 'global' splits one budget across distributors
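The per_stream_rate_limit / per_stream_rate_limit_burst pair behaves like a token bucket: a sustained refill rate plus a burst allowance. The Python sketch below models that idea for a single stream. It is a simplification for intuition rather than Loki's limiter, and reading 5MB as 5 * 2**20 bytes is an assumption.

```python
from dataclasses import dataclass, field

MB = 1 << 20  # assumption: "5MB" is read here as 5 * 2**20 bytes


@dataclass
class StreamLimiter:
    """Token bucket: refills at `rate` bytes/s and holds at most `burst` bytes."""
    rate: float
    burst: float
    tokens: float = field(init=False)
    last: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # a new stream starts with its full burst allowance

    def allow(self, now: float, size: int) -> bool:
        # Refill for the time elapsed since the last push, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # the push for this stream would be rejected as rate-limited


limiter = StreamLimiter(rate=5 * MB, burst=20 * MB)  # per_stream_rate_limit / _burst
print(limiter.allow(0.0, 20 * MB))  # True: the whole burst is available
print(limiter.allow(0.0, 1 * MB))   # False: the bucket is empty
print(limiter.allow(1.0, 5 * MB))   # True: one second refilled 5MB
```

The burst lets a stream absorb a short spike (a deploy, a stack-trace storm) while the rate caps what it can sustain.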
Find your worst offenders before they hurt you. logcli can count distinct values per label, and the index stats endpoint reports how many streams, chunks, entries, and bytes a selector matches:
# Count distinct pod values among prod streams seen in the last hour
logcli series '{namespace="prod"}' --since=1h | \
  grep -o 'pod="[^"]*"' | sort -u | wc -l
# Or let logcli summarize unique values for every label at once
logcli series '{namespace="prod"}' --since=1h --analyze-labels
# Ask the index stats endpoint how many streams/chunks/entries/bytes the selector covers
curl -s -G "http://loki:3100/loki/api/v1/index/stats" \
  --data-urlencode 'query={namespace="prod"}' \
  --data-urlencode "start=$(date -u -d '-1 hour' +%s)000000000" \
  --data-urlencode "end=$(date -u +%s)000000000" | jq
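If you want every label's cardinality at once from plain series output, a few lines of Python can tally it. This sketch assumes each line is a label set in the {name="value", ...} form shown earlier for streams; label_cardinality is a hypothetical helper, not a logcli feature.

```python
import re
from collections import defaultdict

# One name="value" pair; values may contain escaped quotes.
PAIR = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def label_cardinality(series_lines: list[str]) -> list[tuple[str, int]]:
    """Distinct values per label name, highest cardinality first."""
    values: dict[str, set[str]] = defaultdict(set)
    for line in series_lines:
        for name, value in PAIR.findall(line):
            values[name].add(value)
    return sorted(((name, len(v)) for name, v in values.items()), key=lambda nv: (-nv[1], nv[0]))


# Sample lines in the assumed {name="value", ...} shape.
sample = [
    '{app="api", namespace="prod", pod="api-7d9f-x"}',
    '{app="api", namespace="prod", pod="api-7d9f-y"}',
    '{app="web", namespace="prod", pod="web-5c4b-z"}',
]
for name, count in label_cardinality(sample):
    print(f"{name}: {count}")  # pod: 3, then app: 2, then namespace: 1
```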
When you genuinely need to slice by a high-cardinality field, do it with a parser at query time (next section), not a label. The index stays tiny; the work happens only on the bytes a query actually touches.
3. LogQL filtering: stream selector, line filters, and parsers
Every LogQL query starts with a stream selector in braces — that is the part the index resolves, and it should be as specific as possible so Loki opens the fewest chunks:
{app="api", env="prod"}
After the selector come line filters, which scan the raw text. They run before any parsing and are cheap (plain substring or regex matching over raw bytes), so put your cheapest, most selective filter first:
# |= contains, != not contains, |~ regex match, !~ regex not match
{app="api", env="prod"} |= "error" != "healthcheck" |~ `status=5\d\d`
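The filter semantics are easy to model: each filter is a predicate on the raw line, and a chain keeps only lines that pass every predicate, in order. A minimal Python sketch, assuming simple substring checks for |= and != and an unanchored search for |~ and !~ (Python's re stands in for Go's RE2 here, so exotic patterns can behave differently):

```python
import re
from typing import Callable

LineFilter = Callable[[str], bool]


def contains(s: str) -> LineFilter:       # |=
    return lambda line: s in line


def not_contains(s: str) -> LineFilter:   # !=
    return lambda line: s not in line


def matches(pattern: str) -> LineFilter:  # |~  (unanchored: may match anywhere)
    rx = re.compile(pattern)
    return lambda line: rx.search(line) is not None


def not_matches(pattern: str) -> LineFilter:  # !~
    rx = re.compile(pattern)
    return lambda line: rx.search(line) is None


def apply_filters(lines: list[str], chain: list[LineFilter]) -> list[str]:
    """Filters are ANDed left to right; all() stops at the first filter a line fails."""
    return [line for line in lines if all(f(line) for f in chain)]


chain = [contains("error"), not_contains("healthcheck"), matches(r"status=5\d\d")]
lines = [
    'level=error msg="upstream timeout" status=504',
    'level=error msg="healthcheck failed" status=503',
    'level=error msg="bad input" status=400',
    'level=info msg="ok" status=200',
]
print(apply_filters(lines, chain))  # ['level=error msg="upstream timeout" status=504']
```

Because all() short-circuits, putting the most selective filter first means later, costlier filters (regexes especially) see fewer lines.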
Then parsers turn the line into labels you can filter and format on. Pick the parser that matches your format — they differ massively in cost:
# logfmt: key=value pairs. Cheap and predictable.
{app="api"} | logfmt | level="error" | duration > 500ms
# json: parses JSON into labels (nested keys flattened with _). Pricier than logfmt.
{app="api"} | json | status_code >= 500 | line_format "{{.method}} {{.path}} {{.status_code}}"
# pattern: explicit positional extraction, the fastest structured parser for
# fixed-shape lines like nginx access logs. <_> discards a field.
{app="nginx"} | pattern `<ip> - - <_> "<method> <path> <_>" <status> <size> <_>` | status="500"
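To see what | json hands to later stages, here is a rough Python model of the flattening rule mentioned above (nested keys joined with _). It is a sketch, not Loki's parser: treating arrays as skipped is an assumption, and Loki's sanitizing of unusual key characters is omitted.

```python
import json


def flatten_json(line: str) -> dict[str, str]:
    """Flatten a JSON log line into labels, joining nested keys with '_'."""
    labels: dict[str, str] = {}

    def walk(prefix: str, value: object) -> None:
        if isinstance(value, dict):
            for key, inner in value.items():
                walk(f"{prefix}_{key}" if prefix else key, inner)
        elif isinstance(value, list):
            return  # assumption: arrays are skipped, not expanded
        else:
            # Strings stay as-is; numbers, booleans, and null become their JSON text.
            labels[prefix] = value if isinstance(value, str) else json.dumps(value)

    walk("", json.loads(line))
    return labels


line = '{"method": "GET", "request": {"path": "/pay", "status_code": 502}}'
print(flatten_json(line))
# {'method': 'GET', 'request_path': '/pay', 'request_status_code': '502'}
```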
Two formatting tools matter for both readability and downstream metric queries:
- label_format renames or rewrites a label, for example | label_format duration_ms="{{ div .duration 1000000 }}" (assuming duration holds an integer number of nanoseconds).
- line_format rewrites the displayed line with Go templating, useful for collapsing noisy JSON into something readable.
The order is load-bearing for performance. This is the canonical efficient shape, and getting it wrong is the most common reason “Loki is slow”:
Stream selector (index) -> line filters (raw bytes) -> parser (structured) -> label filters (post-parse) -> formatting. Filter on raw text with |= before you | json, so the JSON parser only runs on the lines that survive.
# GOOD: the line filter discards non-matching lines before the expensive json parse
{app="api", env="prod"} |= "error" | json | status_code >= 500
# BAD: parses every single line, then throws most away
{app="api", env="prod"} | json | status_code >= 500 |= "error"
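A quick way to feel the difference is to count parser invocations. This Python sketch runs both orderings over a synthetic workload where 1 line in 100 is an error (an arbitrary ratio chosen for illustration) and uses a hypothetical CountingParser as a stand-in for | json:

```python
import json


class CountingParser:
    """Stand-in for `| json` that records how many lines it had to parse."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, line: str) -> dict:
        self.calls += 1
        return json.loads(line)


def filter_first(lines: list[str], parse: CountingParser) -> list[str]:
    # |= "error" | json | status_code >= 500
    survivors = [line for line in lines if "error" in line]
    return [line for line in survivors if parse(line)["status_code"] >= 500]


def parse_first(lines: list[str], parse: CountingParser) -> list[str]:
    # | json | status_code >= 500 |= "error"
    parsed = [line for line in lines if parse(line)["status_code"] >= 500]
    return [line for line in parsed if "error" in line]


# Synthetic workload: one error line per 100.
lines = [
    json.dumps({"level": "error", "status_code": 503}) if i % 100 == 0
    else json.dumps({"level": "info", "status_code": 200})
    for i in range(10_000)
]

good, bad = CountingParser(), CountingParser()
assert filter_first(lines, good) == parse_first(lines, bad)  # identical results
print(good.calls, bad.calls)  # 100 10000
```

Same answer, a hundredth of the parsing work; on real log volume that gap is the difference between a fast query and a timeout.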
4. LogQL metric queries: turning logs into SLO signals
LogQL has two query types. Everything above is a log query (returns lines). Wrap a log query in a range aggregation and you get a metric query (returns a time series) — this is how you alert on logs and build SLOs without a separate metrics pipeline.
The core range-vector functions over a log stream:
# Lines per second matching the selector+filters (log-range, counts lines)
rate({app="api"} |= "error" [5m])
# Total matching lines over the window
count_over_time({app="api"} |= "error" [5m])
# Bytes per second ingested for a stream (capacity planning)
bytes_rate({app="api"}[5m])
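Under the hood these are simple window computations. A minimal Python sketch, assuming the range covers the left-open interval (t - window, t] (the exact boundary convention is an assumption), with rate as the count divided by the window length in seconds:

```python
def count_over_time(timestamps: list[float], t: float, window: float) -> int:
    """Entries whose timestamp falls in (t - window, t]."""
    return sum(1 for ts in timestamps if t - window < ts <= t)


def rate(timestamps: list[float], t: float, window: float) -> float:
    """Entries per second: the count divided by the window length in seconds."""
    return count_over_time(timestamps, t, window) / window


def bytes_rate(entries: list[tuple[float, int]], t: float, window: float) -> float:
    """Bytes per second: total line size in the window divided by its length."""
    return sum(size for ts, size in entries if t - window < ts <= t) / window


errors = [10.0, 70.0, 130.0, 250.0, 299.0, 301.0]  # error-line timestamps, in seconds
print(count_over_time(errors, t=300.0, window=300.0))  # 5 (301.0 is outside the window)
print(rate(errors, t=300.0, window=300.0))             # 5 lines / 300 s
```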
For numeric SLOs you need unwrap, which pulls a numeric value out of an extracted label and lets you aggregate it. This is how you compute latency percentiles or error ratios straight from logs:
# p99 request latency per route, from a logfmt field like duration=250ms, in seconds
quantile_over_time(0.99,
  {app="api"} | logfmt | unwrap duration_seconds(duration) [5m]
) by (route)
# Error ratio as an SLI: 5xx lines divided by all lines, over 5m
sum(rate({app="api"} | logfmt | status >= 500 [5m]))
/
sum(rate({app="api"} | logfmt [5m]))
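unwrap also understands duration and byte strings through its conversion helpers: unwrap duration(field) (or duration_seconds(field)) turns values such as 250ms or 1m30s into seconds, and unwrap bytes(field) turns values such as 5.2MB into bytes, so you do not have to convert them yourself. Wrap these in a sum by (...) and you have a recording-rule-ready SLI that lives next to your Prometheus burn-rate alerts.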
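Here is a Python sketch of what the p99 query above computes: convert each duration string to seconds, group by route, and take the 0.99 quantile. The duration parser handles only Go-style unit suffixes, and the linear-interpolation quantile is assumed to mirror Prometheus's quantile_over_time; treat both as approximations of Loki's behavior.

```python
import re
from collections import defaultdict

# Go-style duration units, expressed in seconds.
UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def duration_seconds(text: str) -> float:
    """Convert a duration string such as '250ms' or '1m30s' to seconds."""
    parts = PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(num) * UNITS[unit] for num, unit in parts)


def quantile(q: float, values: list[float]) -> float:
    """Linear interpolation between closest ranks (assumed Prometheus-style)."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def p99_by_route(entries: list[dict[str, str]]) -> dict[str, float]:
    """Group parsed logfmt fields by route and take the 0.99 quantile of duration."""
    by_route: dict[str, list[float]] = defaultdict(list)
    for fields in entries:
        by_route[fields["route"]].append(duration_seconds(fields["duration"]))
    return {route: quantile(0.99, secs) for route, secs in by_route.items()}


entries = [
    {"route": "/pay", "duration": "120ms"},
    {"route": "/pay", "duration": "1.5s"},
    {"route": "/health", "duration": "3ms"},
]
print(p99_by_route(entries))
```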
A word on cost: metric queries decompress and scan every chunk in the range. A 30-day rate() over a chatty stream means a lot of object-storage reads. Keep alerting queries on short windows, and let the query-frontend's splitting and results cache (or recording rules) carry dashboards, so repeated refreshes reuse cached sub-query results instead of rescanning chunks.
5. Chunk storage, caching, and query splitting
A flushed chunk is a compressed blob of log lines for a single stream over a time window, stored as an object in S3/GCS/Azure Blob. The TSDB index maps (labels, time) -> chunk references. Tune the chunk lifecycle and the caches, and you control both ingester memory and query latency.
# Modern Loki (TSDB shipper) storage + chunk lifecycle
schema_config:
  configs:
    - from: 2024-04-01
      store: tsdb
      object_store: s3
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  aws:
    s3: s3://us-east-1/loki-chunks
  tsdb_shipper:
    active_index_directory: /loki/tsdb-index
    cache_location: /loki/tsdb-cache
ingester:
  chunk_target_size: 1572864 # 1.5MB compressed; the sweet spot for read efficiency
  chunk_idle_period: 30m # flush a stream after 30m of no new lines
  max_chunk_age: 2h # force-flush even an active chunk after 2h
  chunk_encoding: snappy # snappy = fast; zstd = smaller but more CPU
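The three lifecycle settings combine into a simple flush decision per chunk. This Python sketch is a deliberately simplified model (which clock each check uses inside Loki is an assumption here), but it shows why a quiet stream gets flushed on idle or max age long before it reaches chunk_target_size, leaving small chunks behind:

```python
from dataclasses import dataclass
from typing import Optional

CHUNK_TARGET_SIZE = 1_572_864   # chunk_target_size (bytes, compressed)
CHUNK_IDLE_PERIOD = 30 * 60     # chunk_idle_period: 30m, in seconds
MAX_CHUNK_AGE = 2 * 60 * 60     # max_chunk_age: 2h, in seconds


@dataclass
class HeadChunk:
    compressed_bytes: int  # current compressed size of the in-memory chunk
    opened_at: float       # when the chunk received its first line (seconds)
    last_append: float     # when the chunk last received a line (seconds)


def flush_reason(chunk: HeadChunk, now: float) -> Optional[str]:
    """Why the chunk would be cut and flushed now, or None to keep buffering."""
    if chunk.compressed_bytes >= CHUNK_TARGET_SIZE:
        return "full"
    if now - chunk.last_append >= CHUNK_IDLE_PERIOD:
        return "idle"
    if now - chunk.opened_at >= MAX_CHUNK_AGE:
        return "max_age"
    return None


now = 10_000.0
print(flush_reason(HeadChunk(1_600_000, now - 60, now - 1), now))       # full
print(flush_reason(HeadChunk(200_000, now - 3_600, now - 1_900), now))  # idle
print(flush_reason(HeadChunk(200_000, now - 7_300, now - 5), now))      # max_age
print(flush_reason(HeadChunk(200_000, now - 600, now - 5), now))        # None
```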
The three caches that move the needle, in order of impact:
query_range:
  align_queries_with_step: true
  cache_results: true # 1) results cache: identical sub-queries are free
  results_cache:
    cache:
      memcached_client:
        addresses: dns+memcached-results:11211
chunk_store_config:
  chunk_cache_config: # 2) chunk cache: chunks fetched from object storage, reused across queries
    memcached_client:
      addresses: dns+memcached-chunks:11211
# 3) index/TSDB cache is handled by the shipper's cache_location above
Query splitting and parallelism are what make a “last 7 days” query return in seconds instead of timing out. The query-frontend chops the range into intervals and fans them across queriers:
limits_config:
  split_queries_by_interval: 30m # each sub-query covers 30m, run in parallel
  max_query_parallelism: 32 # max sub-queries dispatched concurrently
  tsdb_max_query_parallelism: 128 # TSDB can shard much harder than the old index
  max_query_series: 500 # max unique series a metric query may return; caps read-path fan-out
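Splitting and the results cache work together: a range is cut at aligned interval boundaries, and each sub-range becomes a cache key, so a refreshed dashboard only recomputes the new slice. The Python below is a toy model under those assumptions; Loki's frontend also merges cache extents, avoids caching very recent data, and caps concurrency with max_query_parallelism, none of which is modeled here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_by_interval(start: datetime, end: datetime, interval: timedelta):
    """Cut [start, end) at interval boundaries aligned to the Unix epoch."""
    pieces = []
    cursor = start
    while cursor < end:
        boundary = EPOCH + ((cursor - EPOCH) // interval + 1) * interval
        pieces.append((cursor, min(boundary, end)))
        cursor = boundary
    return pieces


class ResultsCache:
    """Toy results cache: an identical (query, sub-range) is computed only once."""

    def __init__(self) -> None:
        self.store: dict = {}
        self.misses = 0

    def get(self, query: str, lo: datetime, hi: datetime, compute):
        key = (query, lo, hi)
        if key not in self.store:
            self.misses += 1
            self.store[key] = compute(lo, hi)
        return self.store[key]


def fake_compute(lo: datetime, hi: datetime) -> str:
    # Placeholder for the querier work on one sub-range.
    return f"result {lo:%H:%M}-{hi:%H:%M}"


t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
interval = timedelta(minutes=30)  # split_queries_by_interval: 30m
query = 'sum(rate({app="api"} |= "error" [5m]))'

first = split_by_interval(t0 + timedelta(minutes=10), t0 + timedelta(hours=2), interval)
print([f"{lo:%H:%M}-{hi:%H:%M}" for lo, hi in first])
# ['10:10-10:30', '10:30-11:00', '11:00-11:30', '11:30-12:00']

cache = ResultsCache()
for lo, hi in first:
    cache.get(query, lo, hi, fake_compute)
# The same panel shifted forward 30 minutes: only the newest sub-range is new work.
for lo, hi in split_by_interval(t0 + timedelta(minutes=30), t0 + timedelta(hours=2, minutes=30), interval):
    cache.get(query, lo, hi, fake_compute)
print(cache.misses)  # 5
```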
With TSDB, Loki can also shard a single query by index shard across queriers, so a heavy count_over_time is split both by time and by shard. This is why moving from BoltDB-shipper to TSDB is often the single biggest performance upgrade a cluster can make.
6. Compactor, retention, and per-tenant limits
Retention is not a storage-bucket lifecycle rule in Loki — it is the compactor’s job, and you must turn it on explicitly or chunks accumulate forever.
compactor:
  working_directory: /loki/compactor
  retention_enabled: true # REQUIRED; off by default
  delete_request_store: s3 # where deletion markers live
  compaction_interval: 10m
  retention_delete_delay: 2h # grace period before physical delete
limits_config:
  retention_period: 744h # global default: 31 days
  # Per-stream overrides: keep audit logs longer, drop debug noise fast
  retention_stream:
    - selector: '{namespace="audit"}'
      priority: 10
      period: 8760h # 365 days
    - selector: '{level="debug"}'
      priority: 5
      period: 72h # 3 days
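When a stream matches more than one retention_stream rule, the priority decides. This Python sketch assumes the precedence as I read the Loki docs (highest priority wins, ties go to the longer period, otherwise the tenant's retention_period applies) and supports only equality matchers:

```python
from dataclasses import dataclass


@dataclass
class RetentionRule:
    selector: dict[str, str]  # simplified: equality matchers only
    priority: int
    period_hours: int


def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    return all(labels.get(name) == value for name, value in selector.items())


def effective_retention(labels: dict[str, str], rules: list[RetentionRule], default_hours: int) -> int:
    """Highest-priority matching rule wins; ties go to the longest period; else the default."""
    hits = [r for r in rules if matches(r.selector, labels)]
    if not hits:
        return default_hours
    return max(hits, key=lambda r: (r.priority, r.period_hours)).period_hours


rules = [
    RetentionRule({"namespace": "audit"}, priority=10, period_hours=8760),
    RetentionRule({"level": "debug"}, priority=5, period_hours=72),
]
print(effective_retention({"namespace": "audit", "level": "debug"}, rules, 744))  # 8760
print(effective_retention({"namespace": "web", "level": "debug"}, rules, 744))    # 72
print(effective_retention({"namespace": "web", "level": "info"}, rules, 744))     # 744
```

The first case is the interesting one: a debug line in the audit namespace matches both rules, and the priority of 10 keeps it for a year.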
For multi-tenant clusters, set retention and rate limits per tenant in an overrides file rather than the global block. This is how a platform team gives each product team its own budget without separate Loki deployments:
# runtime overrides file, hot-reloaded; one block per X-Scope-OrgID tenant
overrides:
  team-payments:
    retention_period: 2160h # 90 days
    per_stream_rate_limit: 10MB
    max_global_streams_per_user: 100000
    ingestion_rate_mb: 25
  team-batch:
    retention_period: 168h # 7 days
    ingestion_rate_mb: 5
Loki enforces tenancy through the X-Scope-OrgID header — the distributor reads it on write, the querier on read. Run a gateway (auth proxy) in front that injects this header from authenticated identity, and never expose Loki’s HTTP port directly. With auth_enabled: true, the header is mandatory and Loki rejects requests without it.
7. Correlating logs to traces with derived fields
Logs become an order of magnitude more useful when a log line links straight to the trace that produced it. The mechanism is derived fields in the Grafana Loki data source: a regex extracts a value (a trace_id) from the log line, and Grafana renders it as a clickable link into your Tempo data source.
{
  "name": "Loki",
  "type": "loki",
  "jsonData": {
    "derivedFields": [
      {
        "name": "TraceID",
        "matcherType": "label",
        "matcherRegex": "trace_id",
        "url": "${__value.raw}",
        "datasourceUid": "tempo-uid",
        "urlDisplayLabel": "View Trace"
      }
    ]
  }
}
matcherType: "label" keys off an extracted label (use this when your logs are JSON/logfmt and you parse trace_id out); the older default matches a regex against the raw line body. Either way the prerequisite is that your application logs the trace_id in the first place — inject it from the active span context in your logging middleware. Once that link exists, the loop closes both ways: a metric exemplar jumps to a trace, a trace span jumps to its logs via tracesToLogsV2, and a log line jumps back to the trace. Three pillars, one investigation.
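For the logging-middleware step, the key is that every line carries the trace id in a parseable field. A minimal Python sketch using the standard logging module; current_trace_id is a hypothetical context variable standing in for the active OpenTelemetry span context, and the logfmt-style format is an assumption that lets | logfmt extract trace_id:

```python
import contextvars
import io
import logging

# Hypothetical holder for the current trace id; in a real service you would read
# it from the active OpenTelemetry span context instead.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class TraceIdFilter(logging.Filter):
    """Attach the current trace id to every record so the formatter can emit it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter('level=%(levelname)s msg="%(message)s" trace_id=%(trace_id)s'))

log = logging.getLogger("payments")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False

token = current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")  # set per request by middleware
log.error("payment declined")
current_trace_id.reset(token)

print(buffer.getvalue().strip())
# level=ERROR msg="payment declined" trace_id=4bf92f3577b34da6a3ce929d0e0e4736
```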
8. Capacity and cost versus an Elasticsearch/OpenSearch baseline
The reason to run Loki at all is the cost model, so make the comparison explicit. The fundamental difference: Elasticsearch builds an inverted index over every term in every document; Loki indexes only labels and stores the rest as compressed object-storage chunks.
| Dimension | Elasticsearch / OpenSearch | Grafana Loki |
|---|---|---|
| What is indexed | Every field/term (full-text) | Labels only |
| Storage tier | Hot SSD on data nodes | Object storage (S3/GCS/Azure) |
| Storage cost | High (replicated SSD + index overhead, often >1x raw) | Low (compressed chunks, ~0.1-0.3x raw) |
| Arbitrary-field search | Fast (indexed) | Brute-force scan at query time |
| Stream/label search | n/a | Fast (TSDB index) |
| Scaling pain point | Shard/heap management, index lifecycle | Stream cardinality, query fan-out |
| Best fit | Search-heavy, ad-hoc field queries | High-volume logs, known label dimensions, cost-sensitive |
Loki wins decisively on ingest and storage cost and on operational simplicity (stateless components plus object storage). It loses when your access pattern is genuinely “search any field across everything,” because that becomes a full scan. The honest framing for a platform team: Loki is cheaper to store and pricier to search broadly; Elasticsearch is the inverse. Most production logging is “I know the service and roughly when, now show me the errors,” which is exactly Loki’s strength — provided your labels are designed for it.
Sizing rule of thumb for the write path: ingesters are memory-bound by active streams, not by raw bytes. Budget roughly tens of thousands of active streams per ingester and scale on stream count, not log volume. The read path scales on query-frontend parallelism and cache hit rate. Keep both healthy and Loki is the cheapest pillar you run.
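Because each stream lives on replication-factor ingesters (3 by default), the sizing math multiplies before it divides. A Python sketch, where the 30,000 streams-per-ingester budget is a placeholder you should replace with a figure measured from your own ingester memory:

```python
import math


def ingesters_needed(active_streams: int, streams_per_ingester: int, replication_factor: int = 3) -> int:
    """Every stream is held by `replication_factor` ingesters, so size on the replicated count."""
    return math.ceil(active_streams * replication_factor / streams_per_ingester)


BUDGET = 30_000  # placeholder streams-per-ingester budget; measure your own
print(ingesters_needed(28_000, BUDGET))     # 3
print(ingesters_needed(2_000_000, BUDGET))  # 200
```

The same log volume spread over 2,000,000 streams instead of 28,000 needs roughly 70 times the ingester fleet, which is the cardinality bomb expressed in hardware.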
Enterprise scenario
A fintech platform team migrated ~40 microservices from a self-managed Elasticsearch cluster to Loki to cut a six-figure annual storage bill. Within two weeks of the cutover, ingesters started OOMing every few hours, ingestion lagged, and per_stream_rate_limit rejections flooded the distributors. The Elasticsearch bill went away; a stability fire replaced it.
The constraint was self-inflicted. Their Promtail/Alloy relabel config had been written to mirror Elasticsearch’s field-level searchability — they had promoted request_id, user_id, pod, and the full request path to labels, reasoning that “if it was searchable before, it should be a label now.” With ~40 services, dynamic pod names, and unbounded request IDs, active streams had blown past 2 million. The TSDB index was enormous and every ingester was trying to hold tens of thousands of tiny, never-filling chunks in memory.
The fix was to demote everything high-cardinality back into the log line and search it at query time. The relabel config kept only bounded labels, and the queries moved the slicing into a parser:
// Grafana Alloy: keep ONLY low-cardinality labels; drop the cardinality bombs.
// Map __meta_kubernetes_* labels at discovery time (assumes a discovery.kubernetes "pods" component).
discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app"]
    target_label  = "app"
  }
}
// DROP pod, request_id, trace_id, path as labels: they live in the line instead
loki.relabel "trim" {
  forward_to = [loki.write.default.receiver]
  rule {
    regex  = "pod|request_id|trace_id|path"
    action = "labeldrop"
  }
}
# What used to be a label match {request_id="abc123"} becomes a query-time parse.
# The index resolves {app,env}; the parser does the rest on a small chunk set.
{app="payments", env="prod"} |= "request_id=abc123" | logfmt | status >= 500
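The effect of labeldrop on stream count is easy to reproduce. In this Python sketch, the drop regex is matched against the whole label name, on the assumption that Alloy's relabel regexes are anchored like Prometheus's; note that pathname survives while path would not:

```python
import re

# Relabel regexes are assumed anchored at both ends, hence fullmatch rather than search.
DROP = re.compile(r"pod|request_id|trace_id|path")


def labeldrop(labels: dict[str, str]) -> dict[str, str]:
    """Remove every label whose name fully matches DROP, like action = "labeldrop"."""
    return {name: value for name, value in labels.items() if not DROP.fullmatch(name)}


def stream_count(entries: list[dict[str, str]]) -> int:
    return len({tuple(sorted(labels.items())) for labels in entries})


# 1,000 requests from 5 pods, each carrying a unique request_id label.
entries = [
    {"app": "payments", "namespace": "prod", "pod": f"payments-{i % 5}", "request_id": f"req-{i}"}
    for i in range(1_000)
]
print(stream_count(entries))                          # 1000
print(stream_count([labeldrop(e) for e in entries]))  # 1
print(labeldrop({"app": "api", "pathname": "/x"}))    # {'app': 'api', 'pathname': '/x'}
```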
Active streams dropped from ~2,000,000 to ~28,000. Ingester memory fell by roughly 85%, the OOMs stopped, the rate-limit rejections vanished, and request_id lookups — now line filters against a tiny set of chunks resolved by {app,env} — returned in under a second. The lesson they wrote into their onboarding doc: in Loki, a label is a routing key, not a search field, and the cost of forgetting that is paid in ingester RAM.
Verify
Confirm the pipeline end to end before you trust it:
# 1) Loki is ready and ingesters are healthy in the ring
curl -s http://loki:3100/ready
curl -s http://loki:3100/ring | grep -c ACTIVE
# 2) Streams are being created at a sane cardinality (NOT millions)
logcli series '{}' --since=15m | wc -l
# 3) A line filter beats a parser-first query: run both and compare bytes processed in the --stats output
logcli query '{app="api"} |= "error" | json' --since=10m --stats
logcli query '{app="api"} | json |= "error"' --since=10m --stats
# 4) A metric query returns a time series (SLI is computable)
curl -s -G "http://loki:3100/loki/api/v1/query_range" \
--data-urlencode 'query=sum(rate({app="api"} |= "error" [5m]))' \
--data-urlencode "start=$(date -u -d '-1 hour' +%s)" \
--data-urlencode "end=$(date -u +%s)" \
--data-urlencode 'step=60' | jq '.data.result | length'
# 5) Compactor is actually applying retention (look for retention activity)
curl -s http://loki:3100/metrics | grep loki_compactor_apply_retention
- In Grafana Explore, run a stream selector and confirm the query stats panel shows a lower Total bytes processed when you front-load a line filter.
- Open a log line with a trace_id and confirm the derived field renders a “View Trace” link into Tempo.
- Push a level="debug" line and confirm it ages out at the per-stream retention you set, while {namespace="audit"} survives.
Pitfalls
- Retention silently off. retention_enabled defaults to false. Chunks accumulate forever and the bill climbs quietly. It is the compactor, not an S3 lifecycle rule, that deletes Loki data.
- Parser-first queries. | json before |= parses every line in range and is the most common cause of slow, expensive queries. Always prune with a line filter first.
- Dynamic labels from relabeling. A relabel rule that promotes a Kubernetes annotation or a request field into a label can quietly create unbounded streams. Audit relabel configs as carefully as the queries.
- High-cardinality unwrap aggregations. quantile_over_time(...) by (high_card_label) produces a series per value and can blow past max_query_series. Aggregate by bounded dimensions only.
- One Loki, one tenant. Skipping X-Scope-OrgID and running everything as the single fake tenant (auth_enabled: false) means no per-team retention, no per-team rate limits, and a noisy-neighbor blast radius across the whole org.
Next steps
Move your highest-traffic SLIs from ad-hoc LogQL into Loki recording rules (the ruler component) so burn-rate alerts read a pre-aggregated series instead of scanning chunks on every evaluation. Stand up Bloom filters (the accelerated structured-metadata path) if your access pattern is dominated by needle-in-haystack ID lookups, to skip chunks that can’t contain a value. And formalize a cardinality budget per service the same way you budget Prometheus labels — review it at design time, because the cheapest stream is the one you never created.