Kubernetes Cost Allocation and Rightsizing with Kubecost

A direct-to-consumer health-insurance carrier finishes a year of aggressive migration onto Kubernetes — claims adjudication, member portal, a fraud-scoring service, and a quoting engine all now run as namespaces on a handful of large EKS, AKS, and GKE clusters — and the CFO opens the consolidated cloud bill to a single, useless number: ₹4.1 crore a month of “compute,” with no idea which product line, which team, or which feature drove it. Worse, the platform team’s own dashboards show cluster CPU averaging 19% utilization, which means the company is paying for roughly five nodes’ worth of capacity to do one node’s worth of work. The actuarial side of an insurer lives and dies by unit economics — cost per policy, cost per claim — and right now the single largest variable cost in the business is a black box. The mandate from the CFO and the VP of Engineering is blunt: “Tell me what each product costs to run, stop paying for idle, and bill it back to the P&L that caused it.” This article is the reference architecture for doing exactly that — a Kubernetes cost-allocation, rightsizing, and chargeback platform built around Kubecost that a FinOps lead, a platform engineer, and a finance controller will all trust.

The pressures here are the FinOps pressures, and they stack the way they always do once Kubernetes adoption crosses a threshold. Opacity is the first: a cloud provider bills you per node, per disk, per load balancer — but a node runs forty pods from six teams, so the provider’s invoice can never tell you who spent what. Waste is the second: developers set CPU and memory requests by guessing high “to be safe,” the scheduler reserves that capacity whether or not it is used, and idle reservation is invisible on any utilization graph that only shows actual usage. Accountability is the third: until a cost lands on a team’s own budget, no one has a reason to fix it. Kubecost is the pattern that addresses all three at once — it reconstructs the cloud bill at the pod, namespace, and label level, attributes shared and idle cost honestly, and turns “compute” into a per-team, per-product line item that a chargeback process can act on.

Why not the obvious shortcuts

The naive fixes each fail predictably, and naming why matters because someone on the project will propose all three in the first week.

Reading the cloud provider’s cost console (Cost Explorer, Azure Cost Management, GCP Billing) gets you cost per node, never cost per namespace. The provider has no idea that node ip-10-2-4-9 ran the claims service and the fraud scorer side by side; it sees an EC2 instance. You can tag the node, but you cannot tag the forty pods sharing it, so the granularity you actually need does not exist at that layer.

Tagging Kubernetes resources and hoping the bill follows breaks because the unit of billing (the node) and the unit of work (the pod) are different objects with different lifetimes. A node lives for days; a pod lives for minutes. No tag on the node can follow the churn of pods scheduled onto it through the day.

Eyeballing utilization dashboards and resizing by hand ignores the single most expensive number in the system — the gap between what pods request (and the scheduler therefore reserves and you pay for) and what they actually use. A pod requesting 2 CPU and burning 0.2 is 90% waste, but a usage graph showing 0.2 CPU looks healthy. Without request-vs-usage accounting, you optimize the wrong thing.
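
To make the request-versus-usage gap concrete, here is a minimal sketch of the arithmetic. The function name `request_waste` is illustrative and not part of Kubecost; it only assumes that reserved capacity is paid for whether or not it is used.

```python
def request_waste(requested_cpu: float, used_cpu: float) -> float:
    """Fraction of reserved CPU that is paid for but not used."""
    if requested_cpu <= 0:
        raise ValueError("requested_cpu must be positive")
    unused = max(requested_cpu - used_cpu, 0.0)
    return unused / requested_cpu


# The pod from the text: requests 2 CPU, uses 0.2.
print(f"{request_waste(2.0, 0.2):.0%}")  # prints 90%
```

A usage graph only ever shows the 0.2; the 1.8 CPU of reservation is visible only when requests are part of the calculation.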

Kubecost threads the needle. It joins three streams the other approaches keep separate — the cloud provider’s actual billing data (so on-demand, Spot, Savings Plans, and committed-use discounts are reflected at real prices), Prometheus metrics for what each pod requested and used, and the Kubernetes API for the labels and ownership that map a pod back to a team and a product. From that join it produces an allocation: this namespace, with these labels, cost this many rupees this hour, broken into CPU, memory, GPU, storage, network, and load-balancer components, with idle and shared cost attributed rather than hidden.

Architecture overview

[Figure: Kubernetes Cost Allocation and Rightsizing with Kubecost, architecture diagram]

The platform runs two distinct loops that share data but live on different schedules: a measurement loop that continuously reconstructs cost from metrics and billing, and an action loop that turns those measurements into rightsizing changes, node consolidation, and a chargeback report. Keeping them separate in your head is the first step to operating this well — measurement must be trustworthy and read-only before anyone lets the action loop touch a workload.

The defining property of the topology is that Kubecost runs as a workload inside each cluster but federates its allocation data up to a single primary, so a multi-cluster, multi-cloud estate produces one coherent cost model rather than three disconnected ones. Each cluster reports its own truth; the primary stitches them into the number finance sees.

Measurement loop, following the data flow:

  1. In every cluster (the EKS clusters in ap-south-1, the AKS cluster, and the GKE cluster), a Kubecost agent and a scoped Prometheus instance scrape pod resource requests, limits, and actual CPU/memory/GPU usage at short intervals. This is the raw signal for “requested versus used.”
  2. Kubecost pulls the cloud billing export for each provider — the AWS Cost and Usage Report (CUR) in S3, the Azure cost export, the GCP billing export to BigQuery — so it prices each pod-hour at the actual rate paid, honoring Spot/preemptible discounts, Savings Plans, and committed-use discounts rather than list price. Allocating at list price overstates discounted workloads and is the most common way these numbers lose finance’s trust.
  3. Kubecost reads the Kubernetes API for labels, annotations, namespaces, and controller ownership, and applies an allocation model: direct costs go to the owning pod; shared costs (the cluster’s control plane overhead, monitoring, kube-system, a shared ingress) are split by a configured key — even, weighted, or proportional to each tenant’s usage; and idle cost (reserved-but-unused node capacity) is computed explicitly and either shown on its own or distributed to the teams whose over-requests caused it.
  4. Each cluster’s Kubecost federates its allocation data — written as cost-model snapshots to an object-storage bucket (S3/Blob/GCS) — up to a primary Kubecost instance, which presents the unified, cross-cluster, cross-cloud view.
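
The join at the heart of the measurement loop can be sketched in a few lines. This is a simplified illustration, not Kubecost's implementation: the record shapes, the `allocate` function, the `__unallocated__` bucket name, and the rupee rates are all hypothetical, and only CPU is priced. It assumes each pod-hour is billed at the larger of its request and its usage, at the actual rate of the node it ran on.

```python
from collections import defaultdict


def allocate(pod_hours, cpu_hour_price, label="team"):
    """Sum CPU cost per label value; pods missing the label go to '__unallocated__'."""
    totals = defaultdict(float)
    for pod in pod_hours:
        # Bill the larger of request and usage: reserved capacity is paid for even if unused.
        billable_cpu = max(pod["cpu_request"], pod["cpu_used"])
        owner = pod["labels"].get(label, "__unallocated__")
        totals[owner] += billable_cpu * pod["hours"] * cpu_hour_price[pod["node"]]
    return dict(totals)


# Hypothetical inputs: actual rupee rate per CPU-hour by node (billing export),
# per-pod requests and usage (metrics), and labels (Kubernetes API).
cpu_hour_price = {"node-a": 3.0, "node-b": 1.0}  # node-b stands in for a discounted Spot node
pod_hours = [
    {"node": "node-a", "labels": {"team": "claims"}, "cpu_request": 2.0, "cpu_used": 0.5, "hours": 1},
    {"node": "node-a", "labels": {"team": "fraud"}, "cpu_request": 1.0, "cpu_used": 1.5, "hours": 1},
    {"node": "node-b", "labels": {}, "cpu_request": 1.0, "cpu_used": 0.25, "hours": 1},
]
costs = allocate(pod_hours, cpu_hour_price)
print(costs)
```

The unlabeled pod lands in the unallocated bucket, which is the number finance will push back on first.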

Action loop, independent and driven off those measurements:

  1. Kubecost’s rightsizing recommendations compare each workload’s requests against its real usage percentiles and propose new request/limit values: “this pod requests 2 CPU, its p99 usage over 30 days is 0.4, recommend 0.5.” These surface in the UI, via the API, and as Kubernetes events.
  2. At the node layer, Karpenter (on the EKS/AKS clusters) consolidates: as rightsized pods free up reserved capacity, Karpenter bin-packs them onto fewer, cheaper, often Spot nodes and terminates the now-empty ones — turning a rightsizing recommendation into an actual smaller bill.
  3. The unified allocation data is exported nightly into the company’s FinOps pipeline — a chargeback/showback report per product line, reconciled against the provider invoice, that finance loads into the P&L and that drives a ServiceNow request whenever a team’s spend breaches its budget.
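
The rightsizing step in the action loop reduces to a percentile plus a safety margin. The sketch below is one plausible way to compute it, not Kubecost's actual algorithm: the nearest-rank percentile, the 25% headroom, and the 50-millicore rounding step are assumptions chosen so the example reproduces the "p99 of 0.4, recommend 0.5" figures from the text.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_cpu_millicores(usage_samples, pct=99, headroom_pct=25, step=50):
    """Usage percentile plus headroom, rounded up to the next `step` millicores."""
    target = percentile(usage_samples, pct) * (100 + headroom_pct) / 100
    return math.ceil(target / step) * step


# Hypothetical 30-day sample: mostly 200m, with a few peaks at 400m.
samples = [200] * 98 + [400] * 2
print(recommend_cpu_millicores(samples))  # prints 500
```

Working in integer millicores keeps the rounding exact; the headroom percentage is the knob that trades savings against the risk of a spike.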

Component breakdown

| Component | Service / tool | Role in the platform | Key configuration choices |
|---|---|---|---|
| Cost model | Kubecost | Joins billing + metrics + labels into per-namespace/label allocation | Federated multi-cluster; CUR/Azure/GCP cloud integration; idle split policy |
| Usage metrics | Prometheus | Scrapes pod requests, limits, actual CPU/mem/GPU usage | Short scrape interval; retention sized for percentile windows; per-cluster |
| Node provisioning | Karpenter | Consolidates pods onto fewer/cheaper nodes after rightsizing | Consolidation enabled; Spot-first with on-demand fallback; instance diversity |
| Identity / SSO | Okta + Entra ID | Workforce SSO into the Kubecost UI; team/cost-center claims | OIDC to Kubecost; group claims map to cost-center filters; conditional access |
| Secrets | HashiCorp Vault | Cloud billing-export creds, DB credentials, API tokens for the FinOps job | Dynamic leases; Vault Agent sidecar injection; no static keys in cluster |
| Cloud billing | AWS CUR / Azure export / GCP export | Actual-price source so discounts are reflected | CUR to S3; Azure cost export; GCP billing → BigQuery; daily refresh |
| ITSM / approvals | ServiceNow | Budget-breach tickets, rightsizing change requests, monthly chargeback record | Auto-ticket on threshold breach; change gate before prod rightsizing applies |
| Observability | Dynatrace / Datadog | Correlates cost spikes with deploys, traffic, and SLOs | Kubecost metrics scraped in; cost-vs-performance dashboard; anomaly alerts |
| CSPM / posture | Wiz / Wiz Code | Verifies Kubecost’s footprint is least-privilege and not publicly exposed | Agentless scan of the namespace + IAM; Wiz Code checks the Helm/Terraform |
| Runtime security | CrowdStrike Falcon | Runtime threat detection on the cluster nodes Kubecost observes | Sensor on node pools; detections to the SOC |
| CI / IaC | GitHub Actions / Jenkins + Argo CD + Terraform | Deploys Kubecost via GitOps; provisions billing exports and IAM | OIDC to cloud (no stored creds); Argo CD syncs the Helm release; policy gate |
| Config mgmt | Ansible | Bootstraps agents/exporters on legacy VM and virtual-appliance estate | Idempotent playbooks; same cost tags as the Kubernetes estate |
| Edge | Akamai | Fronts the internal Kubecost portal for distributed FinOps reviewers | TLS, WAF, access control to the dashboard origin |

A few of these choices deserve the why, because they are the ones teams get wrong.

Why actual-price billing integration, not list price. Kubecost can estimate cost from public on-demand rates with no cloud integration at all, and many teams stop there — then finance notices the Kubecost total does not reconcile with the invoice, because half the fleet runs on Spot at a 70% discount and a chunk is covered by a Savings Plan. Once the numbers disagree with the bill, every allocation is suspect. Wiring the CUR, the Azure export, and the GCP BigQuery export in means Kubecost prices each pod-hour at the real negotiated rate, and the sum of allocations reconciles to the provider invoice. That reconciliation is what earns the platform a seat in the actual P&L conversation.
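
A short worked example shows how far a list-price estimate can drift from the invoice. The numbers are hypothetical (1,000 node-hours at a list rate of ₹10, half of them on Spot at the 70% discount mentioned above), and `blended_cost` and `reconciles` are illustrative helpers, not Kubecost functions.

```python
def blended_cost(node_hours, list_rate, spot_share_pct, spot_discount_pct):
    """Actual cost when a share of node-hours runs on Spot at a discount."""
    spot_hours = node_hours * spot_share_pct / 100
    on_demand_hours = node_hours - spot_hours
    spot_rate = list_rate * (100 - spot_discount_pct) / 100
    return on_demand_hours * list_rate + spot_hours * spot_rate


def reconciles(allocated_total, invoice_total, tolerance=0.01):
    """True when allocations are within `tolerance` (a fraction) of the invoice."""
    return abs(allocated_total - invoice_total) <= tolerance * invoice_total


list_estimate = 1000 * 10.0  # what a list-price model would report
actual = blended_cost(1000, 10.0, spot_share_pct=50, spot_discount_pct=70)
print(list_estimate, actual, reconciles(list_estimate, actual))  # prints 10000.0 6500.0 False
```

In this example the list-price model overstates the bill by more than half, which is exactly the kind of gap that makes a controller distrust every downstream allocation.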

Why idle cost must be named, not hidden. The single most expensive and least visible number in a Kubernetes estate is reserved-but-unused capacity. If a team requests 2 CPU and uses 0.2, the other 1.8 is paid for and doing nothing — and it appears nowhere on a usage graph. Kubecost computes idle explicitly as the difference between what nodes cost and what pods actually consumed, and lets you choose where it lands: shown as its own line (good for a first reckoning — “we are paying ₹70 lakh a month for nothing”), or distributed back to the teams whose over-requests created it (good for accountability — it makes the team that guessed high feel the cost). For this insurer we start with idle visible to shock the org, then switch to distributing it so the incentive to rightsize lands on the right desk.
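
The two options for idle cost can be sketched as follows. This follows the definition used in this article (node cost minus the cost of what pods actually consumed) and distributes idle in proportion to each tenant's over-request; the function name, data shapes, and rupee figures are hypothetical, and a real Kubecost deployment may define and split idle differently.

```python
def distribute_idle(node_cost, cpu_price, tenants):
    """Return (idle_cost, per-tenant share), splitting idle by over-requested CPU."""
    used_cost = sum(t["used"] for t in tenants.values()) * cpu_price
    idle = node_cost - used_cost
    over = {name: max(t["requested"] - t["used"], 0.0) for name, t in tenants.items()}
    total_over = sum(over.values())
    if total_over == 0:
        # Nobody over-requested: leave idle as its own line item.
        return idle, {name: 0.0 for name in tenants}
    return idle, {name: idle * o / total_over for name, o in over.items()}


# Hypothetical node: 10 CPU at Rs 10 per CPU-hour, so Rs 100 per hour.
tenants = {
    "claims": {"requested": 4.0, "used": 1.0},
    "fraud": {"requested": 2.0, "used": 1.0},
}
idle, shares = distribute_idle(node_cost=100.0, cpu_price=10.0, tenants=tenants)
print(idle, shares)  # prints 80.0 {'claims': 60.0, 'fraud': 20.0}
```

Reporting `idle` on its own is the showback view; adding `shares` to each team's bill is the accountability view.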

Why allocate on labels, not just namespaces. Namespace-level allocation is the easy 80%, but an insurer’s claims namespace contains both the adjudication engine and a batch reprocessing job that belong to different cost centers, and the member portal serves three products. Kubecost allocates on any label or annotation, so the real unit of accounting is a label convention — team, cost-center, product, environment — enforced on every workload. The architecture is only as good as that labeling discipline; without it, cost lands in an “unallocated” bucket that finance will not accept.

Implementation guidance

Provision with Terraform and deploy with GitOps; treat the billing integration as the first deliverable. The order matters because allocation without real prices is worse than useless — it is confidently wrong.

  1. Terraform creates the cloud billing exports and the least-privilege read roles Kubecost needs: the CUR to an S3 bucket with an IAM role Kubecost assumes via IRSA, the Azure cost export with a scoped reader identity, and the GCP billing export to BigQuery with a service account. No write permissions anywhere — Kubecost reads cost, it never moves money.
  2. Terraform stands up the Prometheus/agent prerequisites and the federation object-storage bucket the primary reads from.
  3. Argo CD syncs the Kubecost Helm release into each cluster from a Git repo, so the cost platform’s own configuration is version-controlled, reviewable, and revertable — the same GitOps discipline as everything else on the cluster. The CI that lints and bumps that chart runs in GitHub Actions (or Jenkins on the teams that still standardize on it), authenticating to the cloud via OIDC so there is no stored service-principal secret to leak.
  4. Designate one cluster’s Kubecost as the primary and point the others at the shared bucket for federation.

A minimal values shape for a federated agent cluster communicates the intent — read-only billing, federated up, idle made explicit:

# kubecost-values.yaml (agent cluster)
kubecostProductConfigs:
  clusterName: "eks-claims-aps1"
  shareCostsWithAggregator: true          # federate to the primary
federatedETL:
  federatedCluster: true
  primaryCluster: false
  federatedStorageConfigSecret: "kubecost-federated-store"  # S3/Blob/GCS
prometheus:
  server:
    retention: "32d"                      # cover the 30d percentile window
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::...:role/kubecost-cur-read"  # IRSA, read-only

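And the allocation policy that decides where idle and shared cost go, the most consequential lines in the whole config. Treat the keys below as a policy sketch: depending on the Kubecost version, some of these settings are Helm values and others are parameters on the Allocation API, so verify the names against the release you deploy: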

sharedNamespaces: "kube-system,monitoring,kubecost"
idle: true                 # compute idle explicitly
idleByNode: true           # attribute idle to the node, then to over-requesting tenants
shareTenancyCosts: true    # spread control-plane/shared cost across tenants
sharingStrategy: "weighted" # proportional to each tenant's usage
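
The difference between an even and a weighted split of shared cost is easiest to see with numbers. This is a minimal sketch with hypothetical tenants and amounts; `split_shared` is an illustrative helper, and "weighted" here means proportional to each tenant's direct cost.

```python
def split_shared(shared_cost, direct_costs, strategy="weighted"):
    """Split a shared cost across tenants (direct_costs must be non-empty)."""
    if strategy == "even":
        share = shared_cost / len(direct_costs)
        return {tenant: share for tenant in direct_costs}
    if strategy == "weighted":
        total = sum(direct_costs.values())
        return {tenant: shared_cost * cost / total for tenant, cost in direct_costs.items()}
    raise ValueError(f"unknown strategy: {strategy}")


direct = {"claims": 60.0, "portal": 30.0, "fraud": 10.0}
print(split_shared(30.0, direct, "even"))      # every tenant pays 10.0
print(split_shared(30.0, direct, "weighted"))  # claims 18.0, portal 9.0, fraud 3.0
```

An even split is simple to explain but penalizes small tenants; a weighted split tracks consumption and is usually the easier one to defend to a controller.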

Identity: federate the humans, kill the static keys. The Kubecost UI is gated by Okta as the workforce IdP (brokered to Entra ID for the teams whose Azure resources expect a native Entra token), so a FinOps reviewer logs in with corporate SSO and conditional-access policies, and the group claims map them to the cost centers they are allowed to see — a product owner sees their product’s spend, not the whole estate. The few secrets Kubecost and the nightly FinOps job genuinely need — the database credential for Kubecost’s durable store, the token the export job uses to push the chargeback report — live in HashiCorp Vault, leased dynamically and injected by the Vault Agent sidecar, so they are short-lived and never written to a Kubernetes Secret or a Helm value. The cloud billing reads themselves use IRSA / workload identity, so there is no static cloud key in the cluster at all.

Labeling discipline is the real prerequisite. Before the first allocation is trustworthy, every workload must carry the cost labels. Enforce them at admission with a policy engine (an OPA/Kyverno rule that rejects a Deployment lacking team and cost-center), and bootstrap the legacy estate the same way: Ansible playbooks stamp the equivalent cost tags onto the VM and virtual-appliance fleet (the still-on-VM quoting components, the network virtual appliances fronting the clusters) so finance gets one cost taxonomy across containerized and non-containerized infrastructure, not two that never reconcile.
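
The admission rule itself would be written in OPA or Kyverno, but its logic is small enough to sketch in plain Python, for example as a pre-merge check in CI. The function below is illustrative only: it assumes the required labels are `team` and `cost-center` and that they must appear on the pod template, since those are the labels that end up on the pods being costed.

```python
REQUIRED_LABELS = ("team", "cost-center")


def missing_cost_labels(manifest, required=REQUIRED_LABELS):
    """Return the required cost labels absent (or empty) on a workload's pod template."""
    labels = (
        manifest.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("labels")
    ) or {}
    return [name for name in required if not labels.get(name)]


# Hypothetical Deployment, already parsed from YAML into a dict.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "adjudication"},
    "spec": {"template": {"metadata": {"labels": {"team": "claims"}}}},
}
print(missing_cost_labels(deployment))  # prints ['cost-center']
```

A non-empty result is the condition under which the admission policy would reject the workload.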

Enterprise considerations

Security & least privilege. A cost tool is a tempting target precisely because it has read access to billing and to the whole cluster’s metadata — so scope it hard. Kubecost gets read-only cloud billing roles via IRSA/workload identity (never a key), a read-mostly Kubernetes RBAC role, and no ability to mutate workloads in the measurement loop. The action loop’s rightsizing is proposed, not auto-applied to production, until it passes a gate (below). Wiz / Wiz Code runs continuous posture scanning across the Kubecost namespace and its IAM, alerting the moment the dashboard drifts to public exposure or the billing role widens beyond read — and Wiz Code checks the Helm values and Terraform in the pull request, before the misconfiguration ever ships. CrowdStrike Falcon sensors on the node pools provide runtime threat detection for the nodes Kubecost observes, feeding the SOC. The internal Kubecost portal sits behind Akamai for TLS, WAF, and access control, so the dashboard is not directly internet-exposed.

Cost optimization — the whole point, applied to itself and the estate.

| Lever | Mechanism | Typical effect |
|---|---|---|
| Rightsizing requests | Set requests to p95–p99 of real usage, not a guess | Reclaims the request-vs-usage gap; often 40–60% of “compute” |
| Karpenter consolidation | Bin-pack freed pods onto fewer/cheaper/Spot nodes, terminate empties | Turns reclaimed requests into a smaller node bill |
| Spot / preemptible | Run interruptible workloads (batch reprocessing) on Spot | ~60–80% off those node-hours |
| Idle attribution | Distribute idle to over-requesting teams | Creates the incentive that drives rightsizing |
| Commitment coverage | Use Kubecost’s reservation data to size Savings Plans/CUDs | Discounts the steady baseline the fleet always runs |

The sequence matters: rightsize first, then let Karpenter consolidate. Rightsizing alone only changes a number in a manifest — the bill does not move until the freed capacity lets Karpenter pack workloads onto fewer nodes and delete the empty ones. Run them out of order and you will report “savings” finance never sees on the invoice.
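
A back-of-the-envelope calculation shows why the bill only moves once nodes disappear. The figures are hypothetical (40 pods, 16-CPU nodes, ₹40 per node-hour), and the model is deliberately crude: it counts CPU requests only and ignores memory, DaemonSets, and bin-packing fragmentation.

```python
import math


def nodes_needed(total_request_millicores, node_capacity_millicores):
    """Minimum node count that can hold the summed CPU requests."""
    return math.ceil(total_request_millicores / node_capacity_millicores)


NODE_CAPACITY = 16_000   # millicores per node (assumed)
NODE_HOURLY_COST = 40.0  # rupees per node-hour (assumed)

before = nodes_needed(40 * 2_000, NODE_CAPACITY)  # 40 pods requesting 2 CPU each
after = nodes_needed(40 * 500, NODE_CAPACITY)     # same pods rightsized to 0.5 CPU

print(before, before * NODE_HOURLY_COST)  # prints 5 200.0
print(after, after * NODE_HOURLY_COST)    # prints 2 80.0
```

If the requests shrink but the cluster keeps running five nodes, the hourly cost stays at ₹200; the ₹80 figure only appears once consolidation has actually removed three of them.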

Scalability. Each piece scales independently. Prometheus is the usual ceiling — high pod churn and short scrape intervals across many clusters generate serious cardinality — so size retention to exactly the percentile window you need (32 days for a 30-day p99, no more) and federate rather than centralizing one giant Prometheus. Kubecost’s federated-ETL model scales to dozens of clusters because each cluster does its own heavy lifting and only ships compact cost snapshots to the primary. The FinOps export job is nightly batch, so it scales trivially.

Failure modes, and what each one looks like. Name them before they mislead finance.

Reliability. Kubecost being down does not take production down — it is an observability and accounting plane, not in the request path — but a gap in its data is a gap in the chargeback record, which finance treats as a real problem at month-end. Run Kubecost’s durable store (its database/object-storage ETL) with the same backup posture as any system of record, and keep the cloud billing exports (CUR in S3, the BigQuery export) as the geo-durable source of truth from which the cost model can always be rebuilt. A pragmatic target: the dashboard can be down for hours without business impact, but no calendar day of allocation data may be permanently lost, because the monthly chargeback depends on every day.
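
The "no calendar day lost" target is easy to check mechanically before month-end close. The sketch below assumes you can list the dates for which allocation data exists (how you obtain that list depends on your export); `missing_days` is an illustrative helper.

```python
from datetime import date, timedelta


def missing_days(first, last, days_present):
    """Return every date in [first, last] that has no allocation data."""
    present = set(days_present)
    span = (last - first).days + 1
    all_days = (first + timedelta(days=offset) for offset in range(span))
    return [day for day in all_days if day not in present]


# Hypothetical export covering 1-5 March with one gap.
have = [date(2025, 3, 1), date(2025, 3, 2), date(2025, 3, 4), date(2025, 3, 5)]
print(missing_days(date(2025, 3, 1), date(2025, 3, 5), have))
```

Any date this returns is a day to rebuild from the billing exports before the chargeback report is produced.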

Observability — and closing the loop with the rest of the stack. Kubecost exposes its allocation and efficiency metrics on a Prometheus endpoint, so scrape them into Dynatrace or Datadog and build the dashboard the business actually argues over: cost per namespace and per product, request-vs-usage efficiency, idle as a share of total, cost per claim / per policy (the unit economics the actuaries care about), and cost-per-deploy delta so a release that doubles a service’s spend is caught the day it ships, correlated against the trace and traffic data Dynatrace/Datadog already hold. That correlation — cost spike next to the deploy and the traffic that caused it — is what turns FinOps from a monthly autopsy into a same-day signal.
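
The unit-economics and cost-per-deploy signals are simple ratios once the allocation data exists. The sketch below uses hypothetical figures; `unit_cost` and `spend_jump` are illustrative names, and the 2x threshold mirrors the "release that doubles a service's spend" example rather than any built-in alert.

```python
def unit_cost(total_cost, units, per=1000):
    """Cost per `per` business units, e.g. rupees per thousand claims."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost * per / units


def spend_jump(cost_before, cost_after, threshold=2.0):
    """True when spend after a deploy is at least `threshold` times the spend before."""
    return cost_before > 0 and cost_after >= threshold * cost_before


# Hypothetical month: Rs 500,000 allocated to claims adjudication, 250,000 claims processed.
print(unit_cost(500_000, 250_000))  # prints 2000.0
print(spend_jump(1_000, 2_100))     # prints True
```

The first number is what the actuaries want on the dashboard; the second is the same-day alert condition.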

Governance and chargeback. The output of the whole platform is a monthly chargeback report, reconciled to the invoice, that finance loads into the P&L. Wrap it in process: a team breaching its budget auto-raises a ServiceNow request so there is a ticket and an owner, not just a red number on a dashboard; a production rightsizing change passes through a ServiceNow change gate before Argo CD applies it, giving the platform team a documented approval; and the chargeback figures themselves are version-controlled artifacts so a finance controller can audit how last quarter’s number was produced. Showback (visibility only) is the right first phase to build trust in the numbers; chargeback (the cost actually hits the team’s budget) is the phase that changes behavior — graduate from one to the other deliberately, once allocations reconcile and the teams believe them.

Explicit tradeoffs

Accept these or do not build it. Cost allocation in Kubernetes is genuinely hard because the billing unit (node) and the work unit (pod) are different objects, and Kubecost’s accuracy is bounded by two things you must invest in: a working cloud billing integration (without it, you get list-price estimates that will not reconcile) and disciplined labeling (without it, you get an unallocated bucket that finance rejects). The idle and shared-cost attribution involves modeling choices (how you split shared cost, where idle lands) that are defensible but not objective, and you will have to defend them to a controller. Rightsizing trades cost against headroom: cut requests too close to observed usage and you save money right up until the traffic spike that OOM-kills the pod, so the safety margin is a real and permanent cost of the savings. And Karpenter consolidation trades cost against churn: fewer, cheaper nodes mean more pod rescheduling, which workloads that hate disruption will feel. None of this is free; it is cheaper than the black box.

The alternatives, and when they win. If you are a single small cluster with one team, the cloud provider’s own cost console plus node tags may be enough — the namespace granularity Kubecost provides only pays off once a cluster is genuinely multi-tenant. If you are all-in on one cloud and want a vendor-managed option, the provider’s native tooling (AWS Split Cost Allocation for EKS, GCP’s GKE cost allocation) covers the basic namespace split without running another workload, though with less cross-cloud reach and weaker rightsizing than Kubecost. If your goal is purely node-level savings and you do not need per-team chargeback, a Karpenter-plus-Spot strategy alone will cut the bill without any allocation layer at all. Kubecost earns its place specifically when you are multi-cluster and/or multi-cloud, genuinely multi-tenant, and need a chargeback number finance will load into the P&L — which is precisely the insurer’s situation.

The shape of the win

For the insurer, the payoff is not “a cost dashboard.” It is that the CFO opens a report showing claims adjudication costs ₹X per thousand claims, the quoting engine ₹Y per ten thousand quotes, and the fraud scorer ₹Z — each reconciled to the actual invoice, each billed back to the product P&L that owns it — and that the same month the platform team rightsized the over-provisioned services and let Karpenter consolidate, the ₹4.1 crore “compute” line dropped by a third with no product slowed down. That last sentence is the one that funds the platform. Everything upstream — the federated Kubecost agents, the real-price billing integration, the Vault-held credentials, the Wiz posture checks, the Dynatrace cost-vs-deploy correlation, the ServiceNow chargeback gate — exists to turn an opaque, wasteful “compute” number into a per-product unit cost that an actuary, a product owner, and a CFO can each act on. The architecture here is the destination; start with showback on one multi-tenant cluster if you must, but this is where Kubernetes cost accountability at scale has to land.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
