Multi-Cloud FinOps with Apptio Cloudability and Unit Economics

A B2B SaaS company in the logistics-tech sector — fleet telematics and route optimization sold to shippers and carriers — gets a pointed question from its board after a down round: “What does it cost us to serve each customer, and is our gross margin getting better or worse as we grow?” The CFO cannot answer it. The cloud bill is $2.1M a month spread across AWS (the core platform), Azure (a regulated EU workload and the data warehouse), and GCP (the ML routing models), and it grew 40% year over year while revenue grew 22%. Three separate billing consoles, three tagging conventions, a Kubernetes cluster where forty microservices and two hundred customers share the same nodes, and a finance team reconciling it all in a spreadsheet that is three weeks stale by the time anyone reads it. The unit that should be obvious — cost to serve one customer — is invisible, and so every pricing, packaging, and “should we fire this unprofitable account” decision is a guess.

This is the problem FinOps exists to solve, and at this scale it is an architecture problem, not a spreadsheet problem. The pressures are the familiar ones inverted. Scale means cost data measured in tens of millions of line items a month across three providers, none of which agree on a schema. Latency means finance needs yesterday’s spend allocated by this morning, not at month-end close. Accuracy means the cost-per-customer number has to survive a board member who used to be a McKinsey partner. And accountability means the number is useless unless it lands in front of the engineer who can change it. This article is the reference architecture for building that: a multi-cloud FinOps platform with Apptio Cloudability as the system of record, real unit economics, automated commitment optimization, and the resulting KPIs pushed into Datadog for engineers and ServiceNow for finance and governance.

Why the obvious approaches fail

Three shortcuts get proposed in every FinOps kickoff, and each fails predictably.

Native cloud cost tools — Cost Explorer, Azure Cost Management, GCP Billing. Each is competent inside its own cloud and blind outside it. There is no native way to sum a shared-services cost that straddles AWS and GCP, no common allocation model, and no shared notion of a “customer” or a “product.” You end up with three dashboards and a human doing the addition.

A homemade data warehouse over the billing exports (CUR, Azure cost exports, GCP BigQuery billing). This can work and many companies start here, but you have just signed up to build and maintain a normalization engine across three constantly-changing billing schemas, reimplement amortization of Savings Plans and Reserved Instances, model blended versus unblended rates, and rebuild Cloudability’s rate-optimization analytics from scratch. It is a multi-quarter platform project that is not your product.

Allocate by tags alone. Tagging is necessary and never sufficient. Shared infrastructure — a Kubernetes cluster, a shared RDS instance, a NAT gateway, an observability bill, network egress — carries no customer tag because it serves all of them. If you only allocate what is tagged, the 35–45% of spend that is shared simply disappears from your unit economics, and the number is a fiction.

The platform threads these needles. Cloudability does the heavy lifting of multi-cloud ingestion and normalization, amortization, and rate optimization out of the box; a business-mapping and showback layer allocates the shared and untagged remainder by defensible drivers; and an activation layer pushes the resulting KPIs to where engineers and finance live so the numbers change behavior instead of decorating a dashboard.

Architecture overview

[Figure: architecture diagram of the multi-cloud FinOps platform with Apptio Cloudability and unit economics]

The platform runs three logical stages that operate on different cadences: a daily ingestion + normalization stage that lands and standardizes raw billing, a modeling stage that maps cost to business dimensions and computes unit economics, and an activation stage that surfaces KPIs and drives action. Keep them separate in your head — the first is a data-engineering problem, the second a finance-modeling problem, and the third a behavior-change problem, and they fail in different ways.

The defining property of the whole topology is that Cloudability is the single system of record for normalized, amortized, fully-allocated cost — every downstream consumer reads from it, and nobody re-derives cost from raw billing in a side channel. That single source is what lets finance, engineering, and the board argue about the same number instead of three.

Ingestion path, following the data flow:

  1. Each cloud emits its native billing feed on its own schedule. AWS delivers the Cost and Usage Report (CUR 2.0) in Parquet to a billing S3 bucket, refreshed multiple times daily. Azure writes amortized and actual cost exports to a storage account, scheduled in the portal or through the Cost Management Exports API. GCP streams detailed billing export into a BigQuery dataset. These are authoritative, line-item-level, and mutually incompatible.
  2. Cloudability ingests all three via cross-account/cross-tenant read roles — an AWS IAM role it assumes into the payer account, an Azure app registration with Cost Management Reader on the billing scope, and a GCP service account with BigQuery read on the billing dataset. Credentials for these connectors are issued and rotated out of HashiCorp Vault rather than pasted into the Cloudability console, so a billing-reader key is short-lived and auditable.
  3. Cloudability normalizes the three feeds into one schema, amortizes commitment purchases (Savings Plans, Reserved Instances, Azure Reservations, GCP CUDs) across their term so a one-time upfront payment shows as daily effective cost, and reconciles blended/unblended and credits/discounts. This is the work you are paying it to not build yourself.
  4. A nightly job pulls the Kubernetes split-cost allocation signal — CPU, memory, and GPU consumption per namespace/pod/label from the cluster — so shared-cluster spend can later be divided by actual resource usage rather than guessed. Cloudability’s container cost allocation consumes this to turn one EKS/AKS/GKE bill into per-workload cost.
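
To make the normalization in step 3 concrete, here is a minimal sketch of mapping one row from each feed into a common record. The `CostRecord` schema and the sample rows are hypothetical. The source column names are modeled on the providers' export formats as I understand them and should be checked against your actual exports. It also assumes ISO-8601 dates and a single billing currency, which real feeds do not guarantee. This is an illustration of the idea, not Cloudability's internal schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CostRecord:
    """One normalized line item. This target schema is hypothetical."""
    provider: str
    account: str       # AWS account, Azure subscription, or GCP project
    service: str
    usage_date: date
    cost: float        # assumed to be in one billing currency already


def _day(value: str) -> date:
    # Keep only the YYYY-MM-DD prefix of an ISO-8601 timestamp.
    return date.fromisoformat(value[:10])


def from_aws(row: dict) -> CostRecord:
    return CostRecord("aws", row["line_item_usage_account_id"],
                      row["line_item_product_code"],
                      _day(row["line_item_usage_start_date"]),
                      float(row["line_item_unblended_cost"]))


def from_azure(row: dict) -> CostRecord:
    return CostRecord("azure", row["SubscriptionId"], row["MeterCategory"],
                      _day(row["Date"]), float(row["CostInBillingCurrency"]))


def from_gcp(row: dict) -> CostRecord:
    return CostRecord("gcp", row["project"]["id"],
                      row["service"]["description"],
                      _day(row["usage_start_time"]), float(row["cost"]))


# Made-up sample rows, one per provider.
records = [
    from_aws({"line_item_usage_account_id": "111122223333",
              "line_item_product_code": "AmazonEC2",
              "line_item_usage_start_date": "2024-05-01T00:00:00Z",
              "line_item_unblended_cost": "12.50"}),
    from_azure({"SubscriptionId": "sub-eu-01", "MeterCategory": "Storage",
                "Date": "2024-05-01", "CostInBillingCurrency": "3.25"}),
    from_gcp({"project": {"id": "ml-routing"},
              "service": {"description": "Compute Engine"},
              "usage_start_time": "2024-05-01T00:00:00+00:00", "cost": 7.0}),
]

# Once the rows share a schema, cross-cloud sums are a plain aggregation.
total_by_day = {}
for r in records:
    total_by_day[r.usage_date] = total_by_day.get(r.usage_date, 0.0) + r.cost
```

The point of the sketch is the shape of the work: three adapters and one target type. The maintenance burden of keeping those adapters current is what the platform takes off your hands.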

Modeling path: Inside Cloudability, Business Mappings translate raw tags, account IDs, subscriptions, projects, and resource metadata into the dimensions the business reasons in — product, team, environment, and crucially customer (or customer-tier). Untagged and shared cost is allocated to these dimensions by drivers: the Kubernetes usage split for cluster cost, weighted distribution for shared databases, network egress allocated by traffic share, and a residual “platform overhead” pool spread across customers by a fair key (seats, active devices, or revenue). The output is fully-loaded cost per customer, which combined with the revenue feed from the billing/CRM system yields the metric the board asked for: gross margin per customer and cost-to-serve per active device.
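
The driver-based split can be sketched in a few lines. This is an illustrative model, not Cloudability's algorithm: the function names, the 50/50 CPU-to-memory weighting, and every number are assumptions, and GPU is left out for brevity. It shows two of the drivers named above: a usage-based split of cluster cost that surfaces idle capacity separately, and a residual pool spread by a fair key.

```python
def split_cluster_cost(cluster_cost, usage, capacity, cpu_weight=0.5):
    """Allocate cluster cost by blended CPU/memory share of capacity.

    usage:    {customer: (cpu_core_hours, mem_gib_hours)}
    capacity: (cpu_core_hours, mem_gib_hours) available in the period
    Returns (allocated_by_customer, idle_cost).
    """
    cap_cpu, cap_mem = capacity
    allocated = {}
    for customer, (cpu, mem) in usage.items():
        share = cpu_weight * cpu / cap_cpu + (1 - cpu_weight) * mem / cap_mem
        allocated[customer] = cluster_cost * share
    # Whatever no workload consumed is idle cost, reported on its own line.
    idle = cluster_cost - sum(allocated.values())
    return allocated, idle


def spread_by_key(pool, key_by_customer):
    """Spread a residual pool in proportion to a fair key (e.g. active devices)."""
    total = sum(key_by_customer.values())
    return {c: pool * k / total for c, k in key_by_customer.items()}


cluster_alloc, idle_cost = split_cluster_cost(
    cluster_cost=1000.0,
    usage={"acme": (40, 100), "globex": (20, 100)},
    capacity=(100, 400),
)
overhead_alloc = spread_by_key(2000.0, {"acme": 300, "globex": 100})
```

Keeping idle cost as its own figure, rather than smearing it across customers, is a design choice: it keeps the capacity-planning problem visible instead of hiding it inside every customer's margin.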

Activation path: The modeled KPIs leave Cloudability two ways. Cloudability’s APIs feed a small exporter (running as a scheduled job) that pushes FinOps metrics — cost per customer, margin, commitment coverage, waste — into Datadog as custom metrics so engineers see cost on the same dashboards as latency and error rate. In parallel, anomaly detections and governance items (a budget breach, an unallocated-cost spike, an idle-resource finding) raise ServiceNow records routed to the owning team, turning a finding into a ticket with an assignee and an SLA instead of an email nobody owns.
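
A rough sketch of the exporter's core step follows: turning modeled KPIs into a metrics payload. The body is shaped like Datadog's v2 series submission format as I understand it (a `series` list with `metric`, `type`, `points`, and `tags`), so verify it against the current API reference before relying on it. The `finops.*` metric names, the tag keys, and the KPI input shape are hypothetical. The sketch only builds the payload; sending it and authenticating are deliberately left out.

```python
import json
import time

GAUGE = 3  # assumed type code for a gauge in the v2 series payload


def build_series(kpis, now=None):
    """Convert KPI dicts into a metrics payload with one gauge point each."""
    ts = int(now if now is not None else time.time())
    series = []
    for kpi in kpis:
        series.append({
            "metric": f"finops.{kpi['name']}",          # hypothetical namespace
            "type": GAUGE,
            "points": [{"timestamp": ts, "value": float(kpi["value"])}],
            # Sorted tags keep the payload deterministic across runs.
            "tags": [f"{k}:{v}" for k, v in sorted(kpi["tags"].items())],
        })
    return {"series": series}


kpis = [
    {"name": "margin_pct", "value": 0.2,
     "tags": {"team": "routing", "env": "prod"}},
    {"name": "unallocated_pct", "value": 0.04, "tags": {"env": "prod"}},
]
payload = build_series(kpis, now=1_700_000_000)
body = json.dumps(payload)  # what a scheduled job would POST
```

Because the payload is a pure function of the KPI snapshot and a timestamp, re-running the job for the same window produces the same body, which is what makes a missed push safe to retry.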

Component breakdown

| Component | Service / tool | Role in the platform | Key configuration choices |
| --- | --- | --- | --- |
| AWS billing source | Cost and Usage Report (CUR 2.0) | Authoritative AWS line-item cost in Parquet | Hourly granularity; resource IDs + tags enabled; payer-account export |
| Azure billing source | Cost Management exports | Amortized + actual cost for subscriptions | Daily export to storage; both amortized and actual; EA/MCA billing scope |
| GCP billing source | BigQuery detailed billing export | Per-SKU GCP cost with labels | Detailed (not standard) export; resource-level labels on |
| Cost system of record | Apptio Cloudability | Multi-cloud ingest, normalize, amortize, allocate, optimize | Read-only connectors per cloud; amortized view default; container allocation on |
| Container allocation | Cloudability + cluster usage feed | Split shared K8s cost by real CPU/mem/GPU usage | Per-namespace/label collection; idle cost surfaced separately |
| Allocation logic | Cloudability Business Mappings | Map tags/accounts → product/team/customer; allocate shared cost | Driver-based split for untagged; residual overhead by fair key |
| Secrets | HashiCorp Vault | Issue/rotate the billing-reader credentials Cloudability uses | Dynamic short-lived creds; cloud auth backends; audit log |
| Identity / SSO | Okta + Entra ID | SSO into Cloudability, Datadog, ServiceNow with RBAC | SAML/OIDC; SCIM provisioning; finance vs eng scopes |
| Engineering KPIs | Datadog | Cost metrics beside latency/error so engineers see spend | Custom metrics via API exporter; per-team cost dashboards + monitors |
| Finance & governance | ServiceNow | Budgets, anomaly tickets, optimization approvals, CMDB link | Anomaly → incident/request; commitment-purchase change gate |
| Posture / waste | Wiz | Idle, oversized, and orphaned-resource findings as cost signal | Read-only scan; findings tagged with owner; feed to ServiceNow |
| Automation / CI | GitHub Actions + Terraform | IaC for connectors/exporters; scheduled metric push | OIDC to clouds (no stored keys); exporter as scheduled workflow |
| Optimization actor | Terraform / Ansible | Apply the rightsizing/cleanup a finding recommends | PR-gated changes; Ansible for in-place resize where safe |

A few choices carry the architecture and are the ones teams get wrong.

Why amortized cost is the default view, not actual. Actual cash-out spikes the month you prepay a Reserved Instance and then looks artificially cheap for a year — useless for unit economics, because a customer onboarded in month two shouldn’t look cheap because of a commitment bought in month one. Cloudability amortizes the commitment across its term so each day shows the effective cost of what ran that day. Unit economics must be built on the amortized, fully-allocated view; the actual/cash view is a separate lens for treasury, not for margin.
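
A small worked example makes the difference between the two views visible. It assumes a straight-line spread of a single upfront payment over the term and ignores recurring fees and partial-upfront options. The function names and the $36,500 figure are made up for illustration.

```python
from datetime import date, timedelta


def cash_view(upfront, start, term_days):
    """Actual (cash) cost: the whole payment lands on the purchase date."""
    return {start + timedelta(days=i): (upfront if i == 0 else 0.0)
            for i in range(term_days)}


def amortized_view(upfront, start, term_days):
    """Amortized cost: the payment is spread evenly across the term."""
    daily = upfront / term_days
    return {start + timedelta(days=i): daily for i in range(term_days)}


start = date(2024, 1, 1)
cash = cash_view(36_500.0, start, 365)
amortized = amortized_view(36_500.0, start, 365)

day_40 = start + timedelta(days=40)
# Cash view: 36,500 on day 0 and nothing on day 40.
# Amortized view: 100 per day, every day of the term.
```

A customer onboarded on day 40 sees zero commitment cost in the cash view and 100 per day in the amortized view, which is why only the second is usable for margin.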

Why shared cost allocation is the whole game. The 35–45% of spend that carries no customer tag — the shared Kubernetes cluster, the multi-tenant database, NAT and egress, the Datadog and Cloudability bills themselves — is exactly where unprofitable customers hide. Allocating it credibly is the difference between unit economics the board trusts and a number they dismiss. The cluster split by real CPU/memory/GPU usage is the anchor; everything else is a defensible driver. Document the methodology, because the first question finance asks is “how did you decide my product carries that overhead.”

Why the KPIs have to leave Cloudability. A cost number a finance analyst sees once a month changes nothing. The same number on the Datadog dashboard an engineer already watches — cost-per-request rising next to p99 latency — gets acted on in the same sprint. Activation is not a nice-to-have; it is the step that converts visibility into saved dollars.

Unit economics: the model that answers the board

The board’s question decomposes into a chain, and each link is a join in the model.

cost_to_serve(customer) =
      direct_tagged_cost(customer)                          # resources tagged to the customer
    + k8s_allocated_cost(customer)                          # cluster cost × customer's share of real CPU/mem/GPU usage
    + shared_db_cost(customer)                              # weighted by query/storage share
    + network_egress_cost(customer)                         # by measured traffic share
    + platform_overhead × fair_key_share(customer)          # residual spread by seats/devices/revenue

gross_margin(customer) = revenue(customer) − cost_to_serve(customer)
margin_pct(customer)   = gross_margin(customer) / revenue(customer)
cost_per_active_device = cost_to_serve(customer) / active_devices(customer)

The revenue feed comes from the billing/CRM system keyed on the same customer identifier that Business Mappings produce — getting those two identifiers to agree is the unglamorous integration that makes or breaks the project. Once they join, the output is a per-customer P&L line that updates daily.
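
The chain and the join can be expressed as a short runnable sketch. All customer names and figures are invented, and `CustomerCost` and `unit_economics` are hypothetical names rather than anything Cloudability exposes. The part worth noticing is the `unmatched` list: identifiers that appear on only one side of the cost-revenue join are exactly the integration problem described above.

```python
from dataclasses import dataclass


@dataclass
class CustomerCost:
    direct_tagged: float
    k8s_allocated: float
    shared_db: float
    network_egress: float
    fair_key_share: float  # this customer's fraction of platform overhead

    def cost_to_serve(self, platform_overhead: float) -> float:
        return (self.direct_tagged + self.k8s_allocated + self.shared_db
                + self.network_egress
                + platform_overhead * self.fair_key_share)


def unit_economics(costs, revenue, devices, platform_overhead):
    """Join cost and revenue on customer id; report ids that fail to join."""
    report = {}
    for cid in sorted(costs.keys() & revenue.keys()):
        cts = costs[cid].cost_to_serve(platform_overhead)
        margin = revenue[cid] - cts
        active = devices.get(cid, 0)
        report[cid] = {
            "cost_to_serve": cts,
            "gross_margin": margin,
            "margin_pct": margin / revenue[cid] if revenue[cid] else None,
            "cost_per_active_device": cts / active if active else None,
        }
    # Ids present on only one side mean the customer keys do not agree yet.
    unmatched = sorted(costs.keys() ^ revenue.keys())
    return report, unmatched


costs = {
    "acme": CustomerCost(4000.0, 2500.0, 800.0, 200.0, 0.25),
    "globex": CustomerCost(900.0, 300.0, 100.0, 50.0, 0.05),
}
revenue = {"acme": 10_000.0, "initech": 2_000.0}
devices = {"acme": 400}

report, unmatched = unit_economics(costs, revenue, devices, 2000.0)
```

Here `acme` joins cleanly, while `globex` (cost with no revenue) and `initech` (revenue with no cost) fall out as unmatched. In practice that list is the work queue for reconciling the two systems.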

| Unit economics KPI | What it answers | Who acts on it | Where it surfaces |
| --- | --- | --- | --- |
| Gross margin % per customer | Which accounts are profitable to serve | Finance, Sales/CS leadership | ServiceNow exec dashboard |
| Cost per active device | Is the product getting cheaper to run at scale | Product, Engineering | Datadog product dashboard |
| Cost per customer trend (MoM) | Is a specific account’s cost drifting | Account owner / CS | ServiceNow + Datadog monitor |
| Margin by tier (SMB/Mid/Ent) | Is our packaging priced right | Pricing, Finance | ServiceNow report |
| Unallocated cost % | How much we cannot explain (data quality) | FinOps team | Datadog monitor (alert >5%) |

That last KPI — unallocated cost percentage — is the platform’s own health metric. If 12% of spend lands in “unallocated,” the unit economics are 12% fiction, and the FinOps team’s job that week is to fix tagging and mapping until it drops under the 3–5% you can honestly call rounding.
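
The health metric itself is simple arithmetic, sketched here under assumptions: the $2.1M monthly total comes from the scenario above, the per-product breakdown is invented, and the 5% alert threshold mirrors the monitor in the KPI table.

```python
def unallocated_pct(total_cost, allocated_by_dimension):
    """Fraction of total spend that no business dimension explains."""
    allocated = sum(allocated_by_dimension.values())
    return (total_cost - allocated) / total_cost


def health(pct, alert_above=0.05):
    """Classify the platform's data quality against the alert threshold."""
    return "alert" if pct > alert_above else "ok"


pct = unallocated_pct(
    2_100_000.0,
    {"routing": 900_000.0, "telematics": 700_000.0, "platform": 248_000.0},
)
status = health(pct)  # 252,000 of 2,100,000 is unexplained
```

With these numbers 12% of spend is unexplained, so the monitor fires and the week's work is tagging and mapping rather than analysis.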

Commitment optimization: the largest controllable lever

Rate optimization through commitments is usually the single biggest savings line, and it is exactly the analysis you do not want to do by hand across three clouds with different instruments — AWS Savings Plans and RIs, Azure Reservations and Savings Plans, GCP Committed Use Discounts. Cloudability models the commitment portfolio against actual and forecast usage and recommends what to buy, at what term and payment option, to maximize effective savings rate without over-committing into capacity you will stop using.

The operating loop is deliberately human-gated, because a three-year commitment is a balance-sheet decision:

  1. Cloudability computes current coverage (share of eligible usage covered by a commitment) and utilization (share of purchased commitment actually used) per cloud, and flags the gap to the optimal frontier.
  2. It recommends specific purchases, e.g., “add a 1-year, no-upfront Compute Savings Plan at $14/hr commitment; projected effective savings 31%; breakeven at 69% utilization.”
  3. The recommendation raises a ServiceNow change request to FinOps and finance with the projected savings, breakeven, and risk if usage drops. No commitment is ever bought automatically.
  4. On approval, the purchase is recorded; Cloudability then tracks realized utilization and alerts in Datadog if utilization decays below breakeven. That is the signal that a commitment is now costing money: below breakeven, an underused RI costs more than running the same usage on demand.

| Lever | Mechanism | Typical effect |
| --- | --- | --- |
| Commitment coverage | Savings Plans / Reservations / CUDs to optimal coverage | 20–40% off on-demand for steady baseline |
| Rightsizing | Cloudability + Wiz recommendations applied via Terraform/Ansible | 10–25% on oversized compute |
| Idle & orphan cleanup | Wiz findings → ServiceNow task → cleanup PR | Recovers 5–15% of “always-on, never-used” |
| Storage tiering | Lifecycle to infrequent/archive tiers by access pattern | 30–60% on cold data |
| Egress reduction | Allocate egress by customer; surface to owning team | Behavioral: eng fixes chatty paths |

The discipline that makes this safe: coverage and utilization are watched together. Chasing coverage alone leads to buying commitments you don’t use; watching utilization alone leaves savings on the table. The optimal point is a frontier, and Cloudability’s job is to keep you near it as usage shifts.
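
The two ratios and the breakeven point can be written down directly. This is a simplified model that assumes one uniform discount rate across all covered usage. Real portfolios blend many rates, so treat the numbers as illustrative and the function names as hypothetical.

```python
def coverage(covered_usage, eligible_usage):
    """Share of eligible usage that a commitment covers."""
    return covered_usage / eligible_usage


def utilization(used_commitment, purchased_commitment):
    """Share of the purchased commitment that is actually consumed."""
    return used_commitment / purchased_commitment


def breakeven_utilization(discount_rate):
    """Utilization at which the commitment costs the same as on-demand."""
    return 1.0 - discount_rate


def net_savings_per_hour(commitment, util, discount_rate):
    """On-demand value of the usage absorbed, minus what the commitment costs."""
    on_demand_equivalent = commitment * util / (1.0 - discount_rate)
    return on_demand_equivalent - commitment


# Hypothetical commitment: $10/hr at a 30% discount.
breakeven = breakeven_utilization(0.30)
healthy = net_savings_per_hour(10.0, 0.90, 0.30)   # positive: saving money
decayed = net_savings_per_hour(10.0, 0.60, 0.30)   # negative: losing money
cov = coverage(covered_usage=700.0, eligible_usage=1000.0)
util = utilization(used_commitment=9.0, purchased_commitment=10.0)
```

Under this model a commitment at 90% utilization still saves money, while the same commitment at 60% costs more than on-demand would have. That is why utilization decay below breakeven is the alert worth wiring up.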

Enterprise considerations

Security & access. A FinOps platform is a read-heavy, blast-radius-low system, but it touches the entire cost surface of the company, so least privilege still matters. Every cloud connector is read-only — Cost Management Reader, BigQuery Data Viewer, an AWS billing-read policy — and the credentials those connectors use are issued and rotated by HashiCorp Vault with short leases, never long-lived keys pasted into a SaaS console (a billing-reader key in plaintext is still a key worth stealing for reconnaissance). Human access to Cloudability, Datadog, and ServiceNow is SSO via Okta federated to Entra ID with SCIM provisioning, and RBAC splits cleanly: finance sees margins and revenue, engineering sees cost-per-service but not customer revenue, and only FinOps admins edit Business Mappings. CrowdStrike Falcon protects the small compute that runs the exporter and any self-hosted collectors, feeding the SOC, because that job holds API tokens to three SaaS platforms.

Cost of the FinOps platform itself. The irony is real: Cloudability is licensed as a percentage of cloud spend under management, Datadog custom metrics and ServiceNow seats cost money, and the exporter and collectors are compute. Include the FinOps tooling in its own showback so the platform is held to the ROI standard it imposes on everyone else — a FinOps practice that cannot show it saves multiples of its cost loses its mandate at the next budget cut. The realistic bar: the platform should surface savings worth several times its all-in cost within the first two quarters, primarily through commitment optimization and idle cleanup.

Scalability. Cloudability is SaaS and scales with line-item volume on Apptio’s side, so the parts you operate are the ingestion edges and the exporter. The thing that breaks at scale is cardinality: pushing per-customer-per-service cost into Datadog as custom metrics across two hundred customers and forty services is up to eight thousand time series for a single metric, every additional metric multiplies that again, and custom-metric pricing punishes it. Push aggregated KPIs (per team, per product, top-N customers by spend, margin distribution) to Datadog and keep the full per-customer grain in Cloudability where it belongs, querying it on demand. The BigQuery and storage exports scale natively; the Kubernetes usage collector scales with cluster size and should sample, not stream every pod-second.
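
The cardinality arithmetic and the top-N aggregation are easy to sketch. The customer names and costs are invented, the count of five metrics is a hypothetical, and the worst case assumes every customer touches every service.

```python
from math import prod


def series_count(*tag_cardinalities):
    """Worst-case time series for one metric: product of tag cardinalities."""
    return prod(tag_cardinalities)


def top_n_with_other(cost_by_customer, n):
    """Keep the n largest customers and roll the rest into one 'other' bucket."""
    ranked = sorted(cost_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:n])
    rest = sum(cost for _, cost in ranked[n:])
    if rest:
        result["other"] = rest
    return result


per_metric = series_count(200, 40)     # customers x services, one metric
five_metrics = 5 * per_metric          # five is a hypothetical metric count
aggregated = top_n_with_other(
    {"acme": 900.0, "globex": 400.0, "initech": 150.0, "umbrella": 50.0}, n=2)
```

The aggregated form caps the customer tag at n + 1 values no matter how many customers exist, and the `other` bucket keeps the total honest.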

Failure modes, and what each looks like. Name them before they corrupt a board deck.

Reliability & cadence. This is not a five-nines latency system; its SLO is freshness and correctness, not uptime. The target most teams set: fully-allocated cost available by 9am daily with a tolerated lag of one billing cycle for true-ups, and month-end close reconciled to the cloud invoices within 1%. The exporter is idempotent and re-runnable, so a missed push self-heals on the next run; the durable record always lives in Cloudability and the raw billing exports, both of which are independently recoverable.
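
The month-end correctness target reduces to a per-cloud check. In this sketch the invoice totals sum to the $2.1M from the scenario, but the split between clouds and the allocated figures are invented. The 1% tolerance is the one stated above.

```python
def reconcile(allocated_total, invoice_total, tolerance=0.01):
    """Return (within_tolerance, relative_gap) against the provider invoice."""
    gap = abs(allocated_total - invoice_total) / invoice_total
    return gap <= tolerance, gap


invoices = {"aws": 1_200_000.0, "azure": 600_000.0, "gcp": 300_000.0}
allocated = {"aws": 1_194_000.0, "azure": 600_000.0, "gcp": 309_000.0}

results = {cloud: reconcile(allocated[cloud], invoices[cloud])
           for cloud in invoices}
# Clouds whose allocated total drifts more than 1% from the invoice.
failing = sorted(cloud for cloud, (ok, _) in results.items() if not ok)
```

Running the check per cloud rather than on the grand total matters: a 3% overstatement on one provider can hide behind an understatement on another if you only compare the sums.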

Observability of the platform. Instrument the FinOps pipeline itself in Datadog: ingestion freshness per cloud, allocation-run success, unallocated-%, exporter run status, and the count of open FinOps tickets in ServiceNow. A FinOps platform that cannot tell you it is healthy will quietly feed a wrong number into a pricing decision, and the cost of that is far larger than the platform.

Governance. Budgets and forecasts live in Cloudability with alerts routed to ServiceNow so a breach is a tracked work item with an owner and an SLA, not a chart someone may notice. Commitment purchases pass a ServiceNow change gate with the projected savings and breakeven attached, giving finance a documented decision trail. The whole allocation methodology is a versioned, reviewable artifact — because the moment unit economics influence pricing or which customers to retain, the method becomes auditable, and “trust me, the spreadsheet said so” does not survive a board’s scrutiny.

Explicit tradeoffs

Accept these or do not build it. Cloudability is a meaningful annual cost priced against your spend, and you are betting that bought, normalized, multi-cloud cost intelligence beats building and maintaining it — true for most companies above roughly seven figures of annual cloud spend, false for a single-cloud startup where native tools and tags suffice. Allocation is modeling, not measurement: every shared-cost split is a defensible choice, not a physical truth, and reasonable people will argue the drivers — the answer is to make the method transparent and consistent, not to pretend it is exact. The activation layer is extra plumbing — an exporter, Datadog cardinality discipline, ServiceNow workflows — that a small team can skip in favor of reading Cloudability directly, and absolutely cannot skip once the goal is changing engineering behavior at scale. And unit economics demand that finance’s revenue model and engineering’s cost model share a customer key, which is an organizational alignment problem disguised as an integration — the hardest part of the whole build, and the part no tool solves for you.

The alternatives, and when they win. If you live on one cloud, that provider’s native cost tooling plus disciplined tagging and a BI dashboard will take you a long way for free; reach for Cloudability when the second or third cloud arrives and the addition stops being possible by hand. If you are an engineering-led org that wants cost in code and Kubernetes-native allocation above polished finance reporting, OpenCost/Kubecost plus a homegrown warehouse is a credible, cheaper path — at the cost of building the rate optimization and multi-cloud normalization Cloudability gives you. And if you are pre–product-market-fit, skip all of this: a monthly look at the bill and a tag policy is the right amount of FinOps, and this architecture is premature until cost is both large and shared enough that the unit is genuinely invisible — which is exactly the logistics-tech company’s situation, and exactly when this platform earns its keep.

The shape of the win

For the logistics-tech CFO, the payoff is not “a cost dashboard.” It is that the next board meeting opens with a slide showing gross margin per customer trending up three points over two quarters — because the platform revealed that the bottom decile of accounts ran at negative margin on a shared cluster nobody had allocated, pricing was adjusted at renewal, two chronically chatty integrations were fixed once an engineer saw egress cost next to latency in Datadog, and a year of commitment coverage was bought at the optimal frontier through a ServiceNow-gated decision finance actually signed. The cost-to-serve number that was a guess is now a daily, defensible, fully-allocated line item that a former McKinsey partner cannot poke a hole in. Everything upstream — the three normalized billing feeds, the amortized view, the Kubernetes split, the Vault-issued reader credentials, the Datadog activation, the ServiceNow governance — exists to make that one slide true. Start narrower if you must, with one cloud and showback before chargeback, but for a multi-cloud company at scale, this is where defensible unit economics have to land.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
