Deploy Loki in Distributed Microservices Mode with S3 Chunk Storage and Index Gateway

A SaaS company runs forty-odd microservices across three EKS clusters, and its single-binary Loki has hit a wall: at 9 a.m. every weekday the ingest path falls over, queries spanning more than a day time out, and the on-call engineer cannot tell whether the payments team’s log flood is starving the checkout team’s queries because everything shares one process and one tenant. The platform team’s mandate is to turn Loki into a real multi-tenant service — one where each product team is an isolated tenant with its own rate limits and retention, where chunk storage is cheap and effectively infinite on S3, and where the read and write paths scale independently so a query storm never takes down ingest. This guide walks through deploying Loki in distributed microservices mode to get exactly that: separate distributor, ingester, querier, query-frontend, compactor, and index-gateway deployments, with chunks and the TSDB index living in S3.

Prerequisites

A Kubernetes cluster (this guide assumes EKS 1.29+) with at least three nodes and a working kubectl context.
helm 3.14+ and the aws CLI v2 configured against the target account.
An IAM OIDC provider associated with the cluster (eksctl utils associate-iam-oidc-provider) so Loki pods can use IRSA (IAM Roles for Service Accounts) instead of static keys.
Permission to create an S3 bucket, an IAM role, and a Route 53 record (or your own ingress/DNS path).
A running Grafana (or Grafana Cloud) to point at the Loki gateway as a data source.
Optional but assumed here: Terraform for the bucket and IRSA role, GitHub Actions + Argo CD for GitOps delivery of the Helm release, HashiCorp Vault for any non-IRSA secrets, Dynatrace or Datadog for meta-monitoring the Loki components themselves, and Okta (federated with your corporate directory) for Grafana SSO. Each of these is wired in at the step where it earns its place.

Target topology

Figure: target topology, Loki in distributed microservices mode with S3 chunk storage and index gateway

The defining idea of distributed mode is that Loki’s single binary is decomposed into independently-scalable components along the read path and the write path, sharing only the object store and a hash ring for coordination.

On the write path, agents push logs to a distributor, which validates the stream, enforces per-tenant rate limits, and forwards entries (replicated across the ring) to ingesters. Ingesters batch entries into compressed chunks in memory and flush them to S3; they also write the TSDB index that maps stream labels to chunks. A compactor periodically merges per-ingester index files into shared, deduplicated index files in S3 and enforces retention.
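The phrase "replicated across the ring" can be made concrete with a toy model. The sketch below shows how a distributor could pick ingesters for a stream: hash the tenant and label set onto a ring of ingester tokens, then walk clockwise until it has replication-factor distinct owners. It is a simplification under stated assumptions, not Loki's code. The real ring lives in Grafana's dskit library, uses its own token hash, and tracks ingester state via heartbeats, all of which this omits. `Ring` and `token` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a 32-bit position on the ring (illustrative hash only)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy hash ring: each ingester owns many tokens spread around 0..2**32."""

    def __init__(self, ingesters, tokens_per_ingester=128):
        pairs = sorted(
            (token(f"{name}-{i}"), name)
            for name in ingesters
            for i in range(tokens_per_ingester)
        )
        self._tokens = [t for t, _ in pairs]
        self._owners = [name for _, name in pairs]

    def replicas(self, tenant: str, labels: str, replication_factor: int = 3):
        """Walk clockwise from the stream's token, collecting distinct owners."""
        start = bisect.bisect_right(self._tokens, token(tenant + labels))
        chosen = []
        for i in range(len(self._owners)):
            owner = self._owners[(start + i) % len(self._owners)]  # wrap around
            if owner not in chosen:
                chosen.append(owner)
                if len(chosen) == replication_factor:
                    break
        return chosen


ring = Ring(["loki-ingester-0", "loki-ingester-1", "loki-ingester-2"])
print(ring.replicas("checkout", '{app="cart"}'))
```

With three ingesters and a replication factor of 3, every stream lands on all three. The ring only starts spreading streams once you run more ingesters than the replication factor, which is why scaling ingesters past three is what adds write capacity.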

On the read path, a query hits the query-frontend, which splits it by time and shards it, then queues the sub-queries. Queriers pull work from that queue, fetch the relevant index from the index-gateway (a dedicated component that holds and serves the TSDB index so queriers do not each download it), then fetch the matching chunks from S3 and run the LogQL evaluation. The frontend stitches the results back together.
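The frontend's first step, splitting by time, can be sketched as follows. This assumes sub-queries are aligned to multiples of split_queries_by_interval counted from the Unix epoch, with the first and last pieces clipped to the query's own bounds. The function name is hypothetical, and the second step (sharding by stream) is not modelled.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_by_interval(start: datetime, end: datetime, interval: timedelta):
    """Return (sub_start, sub_end) pairs covering [start, end)."""
    parts = []
    cur = start
    while cur < end:
        # Next boundary aligned to a multiple of `interval` since the epoch.
        boundary = cur - (cur - EPOCH) % interval + interval
        sub_end = min(boundary, end)
        parts.append((cur, sub_end))
        cur = sub_end
    return parts


start = datetime(2026, 3, 2, 9, 7, tzinfo=timezone.utc)
for sub_start, sub_end in split_by_interval(start, start + timedelta(hours=1), timedelta(minutes=15)):
    print(sub_start.strftime("%H:%M"), "->", sub_end.strftime("%H:%M"))
```

With the 15-minute interval used in step 3, a one-hour query starting at 09:07 becomes five sub-queries, and a 30-day query starting on a boundary becomes 2,880. Queriers drain those from the queue in parallel, capped per tenant by max_query_parallelism.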

Because read and write are separate Deployments/StatefulSets, a query storm scales queriers without touching ingest, and a log flood scales distributors and ingesters without slowing queries — the exact isolation the single binary could not give. Multi-tenancy is enforced end-to-end by the X-Scope-OrgID header, so each product team is a hard-isolated tenant.

1. Provision the S3 bucket and IRSA role with Terraform

Loki needs one bucket for chunks and index, and an IAM role the pods assume via IRSA. Keep this in Terraform so the bucket policy, encryption, and lifecycle rules are reviewable. (This is application infrastructure, not a secret store — no credentials live here.)

# loki-storage.tf
resource "aws_s3_bucket" "loki" {
  bucket = "kloudvin-loki-chunks-prod-use1"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "loki" {
  bucket = aws_s3_bucket.loki.id
  rule {
    apply_server_side_encryption_by_default { sse_algorithm = "aws:kms" }
    bucket_key_enabled = true
  }
}

# Lifecycle: abort incomplete multipart uploads from crashed ingesters
resource "aws_s3_bucket_lifecycle_configuration" "loki" {
  bucket = aws_s3_bucket.loki.id
  rule {
    id     = "abort-mpu"
    status = "Enabled"
    filter {}                      # empty filter: apply the rule to every object
    abort_incomplete_multipart_upload { days_after_initiation = 3 }
  }
}

# IRSA role assumed by Loki service accounts in the "loki" namespace
data "aws_iam_policy_document" "loki_s3" {
  statement {
    actions   = ["s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
    resources = [aws_s3_bucket.loki.arn, "${aws_s3_bucket.loki.arn}/*"]
  }
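}

# Materialize the policy document so the IRSA module can attach it by ARN
resource "aws_iam_policy" "loki_s3" {
  name   = "loki-s3-prod"
  policy = data.aws_iam_policy_document.loki_s3.json
}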

module "loki_irsa" {
  source    = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version   = "~> 5.0"             # inputs below match the v5 module interface
  role_name = "loki-s3-prod"
  oidc_providers = {
    main = {
      provider_arn               = var.cluster_oidc_provider_arn
      namespace_service_accounts = ["loki:loki"]
    }
  }
  role_policy_arns = { s3 = aws_iam_policy.loki_s3.arn }
}

output "loki_irsa_role_arn" {
  value = module.loki_irsa.iam_role_arn
}

Apply it, then capture the role ARN. The service account annotation in step 2 references it, so every Loki component pod gets scoped S3 access with no static keys:

terraform init
terraform apply      # no -target: the encryption and lifecycle resources must be created too
terraform output -raw loki_irsa_role_arn
# arn:aws:iam::123456789012:role/loki-s3-prod
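What makes the role "scoped" is its trust policy. The sketch below builds roughly the document the IRSA module generates from namespace_service_accounts = ["loki:loki"]: only a token whose sub claim is system:serviceaccount:loki:loki can assume the role. The exact output differs across module versions (for example, whether the aud claim is also pinned). The provider ARN is a placeholder, and irsa_trust_policy is a hypothetical helper.

```python
import json


def irsa_trust_policy(provider_arn: str, namespace: str, service_account: str) -> dict:
    """Approximate IRSA trust policy for one Kubernetes service account."""
    # The issuer host/path is the part of the provider ARN after "oidc-provider/".
    issuer = provider_arn.split("oidc-provider/", 1)[1]
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{issuer}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


arn = "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
print(json.dumps(irsa_trust_policy(arn, "loki", "loki"), indent=2))
```

A pod running under any other service account, even in the loki namespace, presents a different sub claim and is refused by STS.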

2. Lay down the namespace and the IRSA service account

kubectl create namespace loki

# Annotate the SA that the Helm chart will use, binding it to the IRSA role
kubectl -n loki create serviceaccount loki
kubectl -n loki annotate serviceaccount loki \
  eks.amazonaws.com/role-arn=arn:aws:iam::123456789012:role/loki-s3-prod

If you keep any non-IRSA secret — say a webhook token for alerting — pull it from HashiCorp Vault via the Vault Agent injector rather than a plain Secret; Loki itself needs none for S3 thanks to IRSA, which is the point of using it.

3. Author the distributed Helm values

Use the official grafana/loki chart in its distributed form (deploymentMode: Distributed), which exposes each component as its own workload. The key decisions are baked into loki.storage (S3 + TSDB schema), the per-component replica counts, and the multi-tenant limits.

# loki-values.yaml
deploymentMode: Distributed

loki:
  auth_enabled: true          # require X-Scope-OrgID — real multi-tenancy
  schemaConfig:
    configs:
      - from: "2026-01-01"
        store: tsdb           # TSDB index (not deprecated boltdb-shipper)
        object_store: s3
        schema: v13
        index:
          prefix: index_
          period: 24h
  storage:
    type: s3
    bucketNames:
      chunks: kloudvin-loki-chunks-prod-use1
      ruler:  kloudvin-loki-chunks-prod-use1
    s3:
      region: us-east-1
      # no accessKeyId/secretAccessKey: IRSA supplies credentials
  storage_config:
    tsdb_shipper:
      active_index_directory: /var/loki/tsdb-index
      cache_location:         /var/loki/tsdb-cache
      index_gateway_client:
        server_address: dns:///loki-index-gateway-headless.loki.svc.cluster.local:9095
  ingester:
    chunk_target_size: 1572864   # ~1.5 MB chunks before flush
    chunk_idle_period: 30m
    wal:
      enabled: true              # write-ahead log: survive ingester restarts
      dir: /var/loki/wal
  limits_config:
    retention_period: 744h       # 31 days default; overridden per tenant below
    ingestion_rate_mb: 8
    ingestion_burst_size_mb: 16
    max_query_parallelism: 64
    split_queries_by_interval: 15m
    tsdb_max_query_parallelism: 128
    volume_enabled: true
  querier:
    max_concurrent: 8
  compactor:
    retention_enabled: true
    delete_request_store: s3
    compaction_interval: 10m

serviceAccount:
  create: false
  name: loki                     # the IRSA-annotated SA from step 2

# --- Component scaling: read and write scale independently ---
distributor:
  replicas: 3
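ingester:
  replicas: 3                    # StatefulSet; backed by the WAL PVC
  zoneAwareReplication:
    enabled: false               # one StatefulSet (loki-ingester-0/1/2); enable only if nodes span AZs
  persistence:
    enabled: true
    size: 20Gi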
querier:
  replicas: 4
queryFrontend:
  replicas: 2
indexGateway:
  replicas: 2                    # dedicated TSDB index servers
  persistence:
    enabled: true
    size: 20Gi
compactor:
  replicas: 1                    # exactly one — compaction must not run concurrently
ruler:
  replicas: 2

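# Single binary / simple-scalable targets off: zero them so only the
# distributed components above are rendered
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
singleBinary:
  replicas: 0

gateway:
  enabled: true                  # nginx that routes read vs write paths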

Two choices here are the ones teams get wrong. First, compactor.replicas: 1 is non-negotiable — running two compactors corrupts the shared index because both try to rewrite the same files. Second, the index_gateway_client.server_address must point at the headless service with the dns:/// scheme so queriers gRPC-load-balance across all index-gateway pods; pointing at the regular ClusterIP service pins every querier to one gateway and defeats the component.
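Because both mistakes are easy to reintroduce in a later pull request, they are worth a mechanical check in CI. This minimal sketch assumes the values file has already been parsed into a Python dict (PyYAML's yaml.safe_load would do that in a real pipeline). lint_values is a hypothetical helper, and it also checks auth_enabled for the tenancy reason covered in step 5.

```python
def lint_values(values: dict) -> list:
    """Return a list of problems found in a parsed loki-values.yaml."""
    problems = []
    loki = values.get("loki", {})
    if loki.get("auth_enabled") is not True:
        problems.append("loki.auth_enabled must be true for multi-tenancy")
    if values.get("compactor", {}).get("replicas") != 1:
        problems.append("compactor.replicas must be exactly 1")
    addr = (
        loki.get("storage_config", {})
        .get("tsdb_shipper", {})
        .get("index_gateway_client", {})
        .get("server_address", "")
    )
    # Headless service + dns:/// scheme lets gRPC balance across gateway pods.
    if not (addr.startswith("dns:///") and "-headless." in addr):
        problems.append("index_gateway_client.server_address must be dns:///<headless svc>:9095")
    return problems


good = {
    "loki": {
        "auth_enabled": True,
        "storage_config": {"tsdb_shipper": {"index_gateway_client": {
            "server_address": "dns:///loki-index-gateway-headless.loki.svc.cluster.local:9095"}}},
    },
    "compactor": {"replicas": 1},
}
bad = {
    "loki": {"storage_config": {"tsdb_shipper": {"index_gateway_client": {
        "server_address": "loki-index-gateway.loki.svc.cluster.local:9095"}}}},
    "compactor": {"replicas": 2},
}
print(lint_values(good))  # []
print(lint_values(bad))
```

Failing the build on a non-empty list keeps both rules enforced long after the engineer who set them up has moved on.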

4. Deploy with Helm (driven by Argo CD)

Install directly for the first bring-up, then hand ongoing management to GitOps:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install loki grafana/loki \
  --namespace loki \
  --version '6.*' \
  --values loki-values.yaml \
  --wait --timeout 10m

In production, the same loki-values.yaml lives in a Git repo. GitHub Actions renders and validates it (for example, helm template piped into kubeconform, since kubeval is no longer maintained) and Argo CD syncs the release, so a tenant's limits change is a reviewed pull request with an audit trail rather than an ad-hoc helm upgrade. Watch the components come up:

kubectl -n loki get pods -l app.kubernetes.io/instance=loki
# loki-distributor-...     1/1 Running   (x3)
# loki-ingester-0/1/2      1/1 Running   (StatefulSet)
# loki-querier-...         1/1 Running   (x4)
# loki-query-frontend-...  1/1 Running   (x2)
# loki-index-gateway-0/1   1/1 Running
# loki-compactor-0         1/1 Running
# loki-ruler-...           1/1 Running   (x2)
# loki-gateway-...         1/1 Running
# (plus any cache or canary pods the chart enables by default)

5. Configure per-tenant limits with a runtime overrides file

The whole reason for auth_enabled: true is to give each product team its own ceiling. Loki reads a runtime overrides file that can be reloaded without a restart. Define per-tenant ingestion rates and retention:

# in loki-values.yaml, merged under the existing loki: key from step 3;
# loki.runtimeConfig is a map that the chart renders into the runtime-config ConfigMap
loki:
  runtimeConfig:
    overrides:
      payments:
        ingestion_rate_mb: 24
        ingestion_burst_size_mb: 48
        retention_period: 2160h      # 90 days for a regulated tenant
        max_global_streams_per_user: 100000
      checkout:
        ingestion_rate_mb: 12
        retention_period: 744h       # 31 days
      sandbox:
        ingestion_rate_mb: 2
        retention_period: 168h       # 7 days, cheap and disposable

Now the payments flood that started this project is capped to its own quota and can never starve checkout, because the distributor enforces these limits per X-Scope-OrgID before anything reaches the ingesters.
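Under the hood, ingestion_rate_mb and ingestion_burst_size_mb behave like a per-tenant token bucket: the rate is the refill speed and the burst is the bucket size. The sketch below models that, assuming Loki's default global ingestion_rate_strategy, in which a tenant's rate is split evenly across healthy distributors while each distributor keeps the full burst. TenantLimiter is a hypothetical class, not Loki's implementation. Loki rejects an over-limit push with HTTP 429.

```python
MB = 1024 * 1024
# Defaults mirror limits_config in step 3; per-tenant overrides win.
DEFAULTS = {"ingestion_rate_mb": 8, "ingestion_burst_size_mb": 16}


class TenantLimiter:
    def __init__(self, overrides: dict, healthy_distributors: int = 3):
        self.overrides = overrides
        self.distributors = healthy_distributors
        self.buckets = {}  # tenant -> (available bytes, last update time in seconds)

    def _limits(self, tenant: str):
        cfg = {**DEFAULTS, **self.overrides.get(tenant, {})}
        # "global" strategy: the tenant-wide rate is shared across distributors,
        # but each distributor keeps the full burst.
        rate = cfg["ingestion_rate_mb"] * MB / self.distributors
        burst = cfg["ingestion_burst_size_mb"] * MB
        return rate, burst

    def allow(self, tenant: str, nbytes: int, now: float) -> bool:
        rate, burst = self._limits(tenant)
        available, last = self.buckets.get(tenant, (burst, now))
        available = min(burst, available + (now - last) * rate)  # refill
        if nbytes > available:
            self.buckets[tenant] = (available, now)
            return False  # Loki would answer this push with HTTP 429
        self.buckets[tenant] = (available - nbytes, now)
        return True


limiter = TenantLimiter({
    "payments": {"ingestion_rate_mb": 24, "ingestion_burst_size_mb": 48},
    "sandbox": {"ingestion_rate_mb": 2},
})
print(limiter.allow("sandbox", 16 * MB, now=0.0))  # True: fits the 16 MB burst
print(limiter.allow("sandbox", 1 * MB, now=0.0))   # False: bucket is empty
```

With three distributors, the sandbox tenant's 2 MB/s becomes roughly 0.67 MB/s per distributor, yet a single 16 MB push still fits because the burst is not divided.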

6. Point your agents and Grafana at the right tenant

Agents (Promtail, Grafana Alloy, the OTel Collector) push to the gateway’s write path with the tenant header. Example for Grafana Alloy:

loki.write "default" {
  endpoint {
    url       = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"
    tenant_id = "checkout"     // sets X-Scope-OrgID
  }
}

In Grafana, add one Loki data source per tenant, each carrying its X-Scope-OrgID header, and gate access behind Okta-federated SSO so a member of the checkout team cannot select the payments data source. Map Okta groups to Grafana teams and use data source permissions (a Grafana Enterprise and Grafana Cloud feature) for the isolation.

# Grafana data source: URL http://loki-gateway.loki.svc.cluster.local
# Custom HTTP Header:  X-Scope-OrgID = checkout

Validation

Confirm the write path, the S3 backing, the index-gateway, and tenant isolation independently — do not declare victory on a green kubectl get pods.

1. Components are ready and the ring is healthy. Port-forward a distributor and check the ingester ring shows all members ACTIVE:

kubectl -n loki port-forward svc/loki-distributor 3100:3100 &
curl -s localhost:3100/ring | grep -c ACTIVE      # expect 3
curl -s localhost:3100/ready                       # "ready"

2. Push a log line as a tenant and read it back. The X-Scope-OrgID must round-trip:

NOW=$(date +%s)000000000
curl -s -H "X-Scope-OrgID: checkout" \
  -H "Content-Type: application/json" \
  -XPOST "http://localhost:3100/loki/api/v1/push" \
  --data-raw "{\"streams\":[{\"stream\":{\"app\":\"smoke\"},\"values\":[[\"$NOW\",\"hello loki\"]]}]}"

# Query it back through the gateway (read path)
kubectl -n loki port-forward svc/loki-gateway 8080:80 &
curl -s -G -H "X-Scope-OrgID: checkout" \
  "http://localhost:8080/loki/api/v1/query_range" \
  --data-urlencode 'query={app="smoke"}' | jq '.data.result[0].values'
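If you want this smoke test in a script rather than a shell session, the push above translates directly to Python's standard library. build_push is a hypothetical helper that only constructs the request; the localhost:3100 URL assumes the distributor port-forward from check 1 is still running.

```python
import json
import time
import urllib.request
from typing import Optional


def build_push(tenant: str, labels: dict, line: str,
               ts_ns: Optional[int] = None,
               url: str = "http://localhost:3100/loki/api/v1/push") -> urllib.request.Request:
    """Build a Loki JSON push request for one log line (does not send it)."""
    ts = ts_ns if ts_ns is not None else time.time_ns()
    # Loki's JSON push format wants the nanosecond timestamp as a string.
    body = {"streams": [{"stream": labels, "values": [[str(ts), line]]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Scope-OrgID": tenant, "Content-Type": "application/json"},
        method="POST",
    )


req = build_push("checkout", {"app": "smoke"}, "hello loki")
# with urllib.request.urlopen(req) as resp:   # needs the port-forward running
#     print(resp.status)
```

Loki answers a successful push with HTTP 204 and an empty body.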

3. Chunks actually landed in S3. Objects appear under the tenant prefix once the stream has been idle for ~30 minutes (chunk_idle_period) or after a forced flush (a POST to /flush on each ingester):

aws s3 ls s3://kloudvin-loki-chunks-prod-use1/checkout/ --recursive | head
# checkout/<fingerprint>/<chunk>   ...
aws s3 ls s3://kloudvin-loki-chunks-prod-use1/index/  | head

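4. The index-gateway is serving, not the queriers. Confirm queriers route index lookups to the gateway, then confirm the gateway is actually receiving them:

kubectl -n loki logs deploy/loki-querier | grep -iE "index.?gateway"
# expect the dns:///loki-index-gateway-headless... address; exact log wording varies by Loki version
kubectl -n loki port-forward svc/loki-index-gateway 3101:3100 &
curl -s localhost:3101/metrics | grep loki_request_duration_seconds_count | grep IndexGateway
# counts on the IndexGateway gRPC routes should climb while you run queries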

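5. Tenant isolation holds. Re-run the query_range call with X-Scope-OrgID: payments; it must return no smoke data, proving streams are partitioned by tenant. Omit the header entirely and Loki should reject the request with HTTP 401 ("no org id"), since auth_enabled: true makes the header mandatory.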

For ongoing health, scrape Loki’s own /metrics into your monitoring stack and watch in Dynatrace (or Datadog): loki_distributor_bytes_received_total per tenant, loki_ingester_wal_disk_full_failures_total (your WAL early-warning), loki_request_duration_seconds on the query-frontend, and the same histogram on the index-gateway pods for index lookup latency. Alert on ingester restarts and on any compactor run exceeding its interval.
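For per-tenant dashboards and chargeback, the key series is loki_distributor_bytes_received_total broken out by tenant. Your monitoring backend would normally do this aggregation; the sketch below only shows the shape of the data by summing the counter per tenant across /metrics text concatenated from several distributors. It assumes the series carries a tenant label (check your Loki version's exposition), and the sample values are made up.

```python
import re
from collections import defaultdict

SERIES = re.compile(r"^loki_distributor_bytes_received_total\{([^}]*)\}\s+(\S+)")
TENANT = re.compile(r'(?:^|,)tenant="([^"]*)"')


def bytes_by_tenant(metrics_text: str) -> dict:
    """Sum the counter per tenant across every matching series in the text."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        m = SERIES.match(line)
        if not m:
            continue  # HELP/TYPE comments and unrelated metrics
        t = TENANT.search(m.group(1))
        if t:
            totals[t.group(1)] += float(m.group(2))
    return dict(totals)


scrape = """\
# distributor A
loki_distributor_bytes_received_total{tenant="checkout"} 1024
loki_distributor_bytes_received_total{tenant="payments"} 4096
# distributor B
loki_distributor_bytes_received_total{tenant="checkout"} 1024
"""
print(bytes_by_tenant(scrape))  # {'checkout': 2048.0, 'payments': 4096.0}
```

In PromQL the equivalent is a sum of the rate of this counter grouped by tenant, which is what the chargeback view in the cost notes should be built on.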

Rollback / teardown

Because durable state lives in S3, and un-flushed data is covered by the ingester WAL, rolling back the compute layer is low-risk.

Roll back a bad config (e.g., a limits change that broke ingest) — Helm keeps revisions:

helm -n loki history loki
helm -n loki rollback loki <previous-revision> --wait

If Argo CD owns the release, revert the Git commit and let it sync; never helm rollback underneath Argo or it will fight you on the next reconcile.

Full teardown — uninstall the workloads, then deal with state deliberately:

helm -n loki uninstall loki
kubectl delete namespace loki        # removes WAL PVCs with it

# The S3 data is intentionally NOT deleted by Helm. Remove it only when sure:
aws s3 rm s3://kloudvin-loki-chunks-prod-use1/ --recursive
# Then destroy the bucket + IRSA role via Terraform:
terraform destroy -target=aws_s3_bucket.loki -target=module.loki_irsa

Keep the bucket if you might restore — a fresh Loki pointed at the same bucket and schema will read every historical chunk. That separation of compute from storage is the safety net distributed mode buys you.

Common pitfalls

Forgetting auth_enabled: true. Without it, every push lands in a single fake tenant and your per-team isolation silently evaporates. Set it from day one; retrofitting tenancy onto existing data means re-keying chunk prefixes.
More than one compactor. Two compactor replicas corrupt the shared index. Pin compactor.replicas to exactly 1, as in the values above, and never scale it.
Pointing the index-gateway client at the ClusterIP service. Use the dns:///...headless...:9095 address so gRPC load-balances; otherwise all queriers hammer a single gateway pod and you get tail-latency cliffs.
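No WAL or no PVC on ingesters. Without the write-ahead log on durable storage, an ingester crash loses every un-flushed chunk, and a chunk can stay in memory for up to max_chunk_age (2h by default). Replication limits the damage only while the other replicas survive. The values above enable both.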
Querying huge time ranges without splitting. If split_queries_by_interval is set to 0 (disabled), a 30-day query becomes one giant unsharded scan. With the values above, the frontend splits it into 15-minute intervals and fans them out across queriers.
Mismatched schema dates. Adding a new schemaConfig entry with a from date in the past silently breaks reads for the overlap. New schema versions must start at a future date.
S3 list/throttle costs. Under-tuned compaction leaves thousands of tiny index files; the compactor merging on a 10-minute interval keeps LIST volume — and your bill — sane.
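The schema-date pitfall is cheap to guard in CI. This sketch assumes you track how many schemaConfig entries are already live (deployed_count is a hypothetical input) and checks two things: start dates strictly increase, and every not-yet-live entry starts after today (UTC). check_schema_configs is a hypothetical helper.

```python
from datetime import date


def check_schema_configs(configs: list, today: date, deployed_count: int) -> list:
    """Flag out-of-order periods and new periods that do not start in the future."""
    problems = []
    starts = [date.fromisoformat(c["from"]) for c in configs]
    for prev, cur in zip(starts, starts[1:]):
        if cur <= prev:
            problems.append(f"period starting {cur} does not follow {prev}")
    # Entries beyond the live ones must begin strictly after today.
    for cfg, start in list(zip(configs, starts))[deployed_count:]:
        if start <= today:
            problems.append(f"new {cfg['schema']} period starts {start}; must be after {today}")
    return problems


live = [{"from": "2026-01-01", "store": "tsdb", "schema": "v13"}]
proposed = live + [{"from": "2026-02-01", "store": "tsdb", "schema": "v13"}]
print(check_schema_configs(proposed, today=date(2026, 3, 1), deployed_count=1))
```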

Security notes

Authenticate to S3 with IRSA, never static keys baked into a Secret — the IAM role above scopes Loki to exactly one bucket. Loki has no built-in user auth, so treat the gateway as the trust boundary: terminate TLS at it, and put a reverse proxy or service mesh in front that injects and validates X-Scope-OrgID so a tenant can never spoof another’s header. Front Grafana with Okta SSO and use data-source permissions for read isolation. Encrypt chunks at rest with SSE-KMS (set above) and enforce TLS on the gRPC traffic between components in regulated environments. Pull any remaining non-IRSA secrets from HashiCorp Vault. Finally, scan the rendered manifests and the cluster posture with your existing tooling — a Wiz or Wiz Code policy that flags a public S3 bucket or an over-broad IAM trust policy on the Loki role is a cheap backstop, and CrowdStrike Falcon sensors on the node pool cover runtime threats on the ingester hosts that handle every tenant’s log data. Route any policy or guardrail breach into ServiceNow so security gets a ticketed change record, not just a Slack ping.
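The trust-boundary rule above reduces to one invariant: whatever sits in front of the gateway discards any client-supplied X-Scope-OrgID and sets it from the caller's authenticated identity. This minimal sketch shows that invariant. The identity-to-tenant map, the Forbidden exception, and enforce_tenant are hypothetical; in practice the logic lives in your proxy or mesh policy.

```python
# Hypothetical mapping from an authenticated caller to its Loki tenant.
TENANT_BY_IDENTITY = {
    "svc-checkout-alloy": "checkout",
    "svc-payments-alloy": "payments",
}


class Forbidden(Exception):
    pass


def enforce_tenant(headers: dict, identity: str) -> dict:
    """Return headers with X-Scope-OrgID forced to the caller's own tenant."""
    tenant = TENANT_BY_IDENTITY.get(identity)
    if tenant is None:
        raise Forbidden(f"no tenant mapped for {identity!r}")
    # Header names are case-insensitive: drop every spelling of X-Scope-OrgID.
    clean = {k: v for k, v in headers.items() if k.lower() != "x-scope-orgid"}
    clean["X-Scope-OrgID"] = tenant
    return clean


spoofed = {"x-scope-orgid": "payments", "Content-Type": "application/json"}
print(enforce_tenant(spoofed, "svc-checkout-alloy"))
```

Overwriting rather than validating means a spoofed header is silently corrected instead of relying on every client to behave.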

Cost notes

The economics are why you put chunks in S3: S3 Standard lists at about $0.023 per GB-month in us-east-1 against about $0.08 for gp3 EBS, you pay only for bytes actually stored rather than provisioned disk headroom, and it scales without capacity planning. Drive cost down further with per-tenant retention_period (the sandbox tenant at 7 days, payments at 90) so you never pay to store cheap logs as long as regulated ones. Keep chunks compact (chunk_target_size ~1.5 MB) to balance S3 PUT/GET request charges against object count, and let the compactor merge index files on a tight interval to crush LIST costs. Right-size queriers and ingesters separately (the whole point of distributed mode) so you are not paying for read capacity to handle a write spike or vice versa. Track spend per tenant by exporting loki_distributor_bytes_received_total and an S3 storage-by-prefix metric into Datadog (or Dynatrace) and build the chargeback view there, so each product team owns the cost of its own log volume.
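A back-of-envelope model makes the retention lever visible. Every number in this sketch is an assumption for illustration: the daily ingest volumes are made up, the compression ratio varies widely with log content, and the price is the S3 Standard us-east-1 list price quoted above. The PUT figure is a lower bound, because idle-flushed chunks are smaller than the target size.

```python
S3_USD_PER_GB_MONTH = 0.023   # S3 Standard, us-east-1 list price (check current pricing)
COMPRESSION_RATIO = 10        # assumption: raw log bytes / stored chunk bytes
CHUNK_TARGET_MB = 1.5         # chunk_target_size from step 3 (a compressed size)

# Hypothetical tenants: (raw GB ingested per day, retention days from step 5)
TENANTS = {"payments": (200, 90), "checkout": (120, 31), "sandbox": (40, 7)}


def estimate(raw_gb_per_day: float, retention_days: int) -> dict:
    stored_gb_per_day = raw_gb_per_day / COMPRESSION_RATIO
    stored_gb = stored_gb_per_day * retention_days  # steady state once retention applies
    return {
        "stored_gb": stored_gb,
        "usd_per_month": stored_gb * S3_USD_PER_GB_MONTH,
        "chunk_puts_per_day": stored_gb_per_day * 1024 / CHUNK_TARGET_MB,
    }


for name, (gb, days) in TENANTS.items():
    e = estimate(gb, days)
    print(f"{name:9s} {e['stored_gb']:7.0f} GB  ${e['usd_per_month']:7.2f}/month  "
          f"~{e['chunk_puts_per_day']:,.0f} chunk PUTs/day")
```

Under these made-up volumes, payments' 90-day retention dominates the storage bill while sandbox's 7 days barely registers, which is the per-tenant lever described above.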