A fast-growing SaaS company opens its Azure invoice and finds ₹2.4 crore — roughly triple the forecast — and nobody can say whose spend it is. The bill is real, the resources are real, and yet half of it lands in an “unallocated” bucket because the resources shipped without tags, the cost data was read in ActualCost (so a one-off Reservation purchase made one team look like it 10×'d for a month), idle non-production environments ran 24×7 over a weekend with a budget alert nobody wired to an action, and production VMs were sized for a peak that lasts ninety minutes a day. None of that is a finance problem you discover when the bill arrives. Every rupee of it was an engineering design decision made — or skipped — at provision time. This is the gap FinOps closes: a cultural and operational practice that brings engineering, finance and product into one feedback loop so the people who spend the money also see and own it, in near-real time, while the workload is still running.
This article is the operating model, not a feature tour. Azure Cost Management — the native, free billing-analytics service built into every subscription — is the data plane; the discipline around it is what makes spend predictable at scale. You will learn to run the loop the diagram below traces: govern and tag at the management-group root so every resource is attributable; ingest usage as amortized daily exports in the FOCUS schema; allocate 100% of the invoice (including shared hub costs) back to teams as showback or chargeback; optimize the rate (Reservations, Savings Plans, Azure Hybrid Benefit) and the usage (right-sizing, auto-stop); and act through budgets that alert on forecast and trigger automation, not just email. Because cost work at scale is a reference discipline — you return to it every month-end and every anomaly — the tag schema, the export options, the commitment matrix, the budget knobs and the failure modes are all laid out as scannable tables. Read the prose once; keep the tables open at month-end.
By the end you will stop being surprised by the invoice. You will know why a chart says a team’s spend tripled and went to zero (ActualCost vs Amortized), why showback never reconciles to the bill (unsplit shared cost), why a Reservation discount landed on a team that never paid for it (Shared scope), and why a budget “fired” but the spend kept climbing (no action group). Knowing which leak you are looking at — and the one az command or Cost Analysis view that confirms it — is what turns a quarterly bill-shock into a Tuesday adjustment.
What problem this solves
Cloud’s pay-as-you-go model inverts the old capex control. There is no purchase order, no procurement gate, no “the server is full” ceiling. An engineer types az vm create and the meter starts; a misconfigured autoscale rule or a forgotten P3v3 in a dev resource group bleeds money silently until the invoice lands a month later. The spend is decentralized (hundreds of engineers can provision), continuous (per-second metering), and opaque after the fact (the invoice is a single number unless you built the attribution beforehand). Without FinOps, finance sees a number it can’t question and engineering sees uptime it’s proud of, and the two never reconcile.
What breaks without this: the unallocated bucket grows until “who owns this?” is unanswerable; reserved-capacity decisions get made on gut feel (or not at all), leaving a 30–50% pay-as-you-go premium on steady-state compute; idle resources (dev environments, orphaned disks, unattached public IPs, over-provisioned databases) accrete because nobody is accountable for switching them off; and anomalies (a runaway query, a leaked credential spinning up crypto-mining VMs, a log-ingestion explosion) are discovered weeks late, on the invoice, instead of within hours by an alert. The damage is both money and trust: when finance can’t predict the bill, they cap cloud spend bluntly, and engineering velocity dies under approval gates.
Who hits this: every organization past a single team on Azure. It bites hardest on multi-subscription enterprises (where the Azure resource hierarchy of management groups, subscriptions and resource groups is the cost-allocation boundary), on platform teams running shared services (hub firewall, Log Analytics, gateways) that no single product wants to pay for, and on anyone who bought commitments before understanding their baseline. The fix is never “spend less” as a blanket order — it’s making spend visible, attributable, and optimizable so each team trims its own waste while shipping faster.
To frame the whole field before the deep dive, here is every cost-leak class this article covers, the question it forces, and the one place to look first:
| Leak class | What you observe | First question to ask | First place to look | Most common single cause |
|---|---|---|---|---|
| Unallocated spend | A large “untagged/no CostCenter” bucket | Are resources born tagged, or tagged later (never)? | Cost Analysis grouped by CostCenter tag | No tag-inheritance policy at MG scope |
| Skewed cost trends | A team “tripled then went to zero” | Am I reading ActualCost or AmortizedCost? | Cost Analysis metric selector | Reporting in ActualCost; an RI/SP landed |
| Showback ≠ invoice | Per-team sum < total bill | Is shared/hub cost being split to teams? | Cost allocation rules; amortized totals | Shared services have no allocation rule |
| Commitment waste | Low RI/SP utilization, or discount on wrong team | Is the commitment scoped and sized to a real baseline? | Reservations → Utilization; appliedScopeType | Over-bought, or Shared scope on a single workload |
| Idle / over-provisioned | Advisor flags right-sizing; non-prod runs 24×7 | Does this resource’s size/uptime match real load? | Advisor Cost recommendations | No auto-stop; SKU sized for rare peak |
| Silent overrun | Bill spikes, found weeks late | Did anything alert before the invoice? | Budgets + anomaly alerts | Budget with email but no action/forecast |
Learning objectives
By the end of this article you can:
- Stand up a tag-governance baseline with Azure Policy (require-tag deny + tag-inheritance modify) at management-group scope, and remediate existing untagged resources, so every cost is attributable.
- Read Azure Cost Management correctly: choose AmortizedCost vs ActualCost for the question you’re asking, group by tag/RG/service, and pull data via the Query API instead of clicking.
- Configure daily cost exports in the FOCUS schema to ADLS Gen2 for analysis in a lakehouse, and explain when an export beats the portal.
- Build showback and chargeback that reconciles to 100% of the invoice, including cost allocation rules that split shared hub services back to teams.
- Choose between Reservations, Savings Plans, Azure Hybrid Benefit and Spot, size them to a measured baseline, scope them correctly (single vs shared), and monitor utilization so you never strand a commitment.
- Right-size with Azure Advisor and automate non-production shutdown so idle capacity stops costing money.
- Create budgets that alert on forecast and drive action groups / automation runbooks, plus anomaly alerts, so an overrun is caught in hours, not on the invoice.
- Map the whole practice to the FinOps Framework phases (Inform → Optimize → Operate) and to AZ-104 / AZ-305 cost objectives.
Prerequisites & where this fits
You should already understand the Azure control-plane shape: a tenant contains management groups (an inheritance tree), under which sit subscriptions (the billing and policy boundary), each holding resource groups and resources — the model covered in Azure Resource Hierarchy Explained. You should know that Azure Policy evaluates and can deny or modify resources (see Azure Policy and Governance at Scale), and be comfortable running az in Cloud Shell, reading JSON, and basic KQL. Familiarity with reservations and savings plans mechanics from Azure Cost: Reservations, Savings Plans & Hybrid Benefit Strategy lets you go deeper on the commitment math; this article uses them as one lever in the larger loop.
This sits in the Governance & FinOps track and is the cost counterpart to the security/identity governance you apply in an Enterprise-Scale Landing Zone. It depends on the resource hierarchy (your allocation boundary) and on policy (your enforcement engine), and it feeds observability — the same Azure Monitor and Application Insights telemetry that tells you a workload is slow also tells you it’s expensive (log-ingestion cost is a real line item). For the deep commitment-engineering mechanics, The Azure FinOps Engineering Guide is the companion to this operating-model view.
A quick map of who owns what in the cost loop, so you route a question to the right team fast:
| Layer | What lives here | Who usually owns it | Cost-leak classes it can cause |
|---|---|---|---|
| Management group / policy | Tag inheritance, deny rules, initiative | Platform / governance | Unallocated spend (no tag policy) |
| Subscription | Billing boundary, budgets, RBAC | Platform + finance | Showback gaps; over-broad budgets |
| Resource group | Workload grouping, tags, lifecycle | App / product team | Idle resources; untagged RGs |
| Cost Management data | Usage records, amortization, exports | FinOps / data | Skewed trends (Actual vs Amortized) |
| Commitment layer | RI / SP / AHB scope and utilization | FinOps + finance | Commitment waste; wrong-scope discount |
| Automation / alerting | Budgets, anomaly, action groups, runbooks | Platform + SRE | Silent overruns (no action wired) |
Core concepts
Six mental models make every later decision obvious.
Cost is created at provision time, not billed time. The invoice is a lagging report of decisions already made. Every control that matters — tagging, sizing, commitment, auto-stop — is applied before or during provisioning, in the same IaC and policy plane you use for everything else. FinOps is “shift-left” for money: the cheapest place to fix a cost is in the pull request that created the resource, not in the meeting that reviews the bill.
The resource hierarchy is the cost-allocation hierarchy. Spend rolls up exactly the way the management group → subscription → resource group → resource tree does. Tags add an orthogonal dimension (CostCenter, Owner, Environment, Product) so you can slice cost by team across subscriptions, or by environment within one. If a resource is untagged and lives in a shared subscription, it is effectively un-ownable — which is why tag governance is the foundation, not a nicety.
Amortized cost is the truth for trends; actual cost is the truth for cash. ActualCost records the charge on the day it hits the account — so a 1-year Reservation paid upfront shows the entire year’s cost on the purchase day, then ₹0 for that resource for 12 months. AmortizedCost spreads that commitment evenly across its term, so a team’s monthly trend reflects consumption, not payment timing. Use Amortized for showback, budgets and trend analysis; use Actual only when reconciling to the cash invoice. Reading the wrong one is the single most common analysis mistake.
Cost optimization has two independent axes: rate and usage. You reduce the rate (price per unit) with commitments — Reservations, Savings Plans, Hybrid Benefit, Spot — without changing what you run. You reduce the usage (units consumed) with right-sizing, auto-stop, deleting orphans, and architectural changes (serverless, autoscale-to-zero). They compose: right-size first (so you don’t commit to oversized capacity), then commit to the smaller, stable baseline.
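The ordering argument is easy to check with arithmetic. The sketch below treats cost as units times rate and compares "right-size, then commit" against "commit, then right-size". Every number in it (16 vCPUs, ₹5 per vCPU-hour, a 40% commitment discount) is a hypothetical placeholder rather than an Azure price.

```python
HOURS_PER_MONTH = 730  # common approximation for a month of 24x7 runtime


def monthly_cost(units: float, rate: float) -> float:
    """Cost is units consumed multiplied by the price per unit."""
    return units * rate


PAYG_RATE = 5.0             # hypothetical ₹ per vCPU-hour
COMMITMENT_DISCOUNT = 0.40  # hypothetical discount from a commitment

# Starting point: 16 vCPUs at the pay-as-you-go rate.
baseline = monthly_cost(16 * HOURS_PER_MONTH, PAYG_RATE)

# Usage lever only: right-size to 8 vCPUs, rate unchanged.
right_sized = monthly_cost(8 * HOURS_PER_MONTH, PAYG_RATE)

# Both levers, in the recommended order: right-size, then commit.
right_sized_then_committed = monthly_cost(
    8 * HOURS_PER_MONTH, PAYG_RATE * (1 - COMMITMENT_DISCOUNT)
)

# Wrong order: commit to all 16 vCPUs, then find half were never needed.
# The commitment is still owed, so the discount applies to capacity you do not use.
committed_oversized = monthly_cost(
    16 * HOURS_PER_MONTH, PAYG_RATE * (1 - COMMITMENT_DISCOUNT)
)

for label, value in [
    ("baseline", baseline),
    ("right-sized", right_sized),
    ("right-sized then committed", right_sized_then_committed),
    ("committed while oversized", committed_oversized),
]:
    print(f"{label:28s} ₹{value:,.0f}")
```

With these placeholder inputs, the discounted but oversized footprint still costs more than the right-sized footprint at full price, which is the reason to fix usage before locking in a rate.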
Showback informs; chargeback enforces. Showback shows each team its cost without moving money (visibility, low friction, the usual starting point). Chargeback actually bills the cost back to the team’s budget (real accountability, real friction, needs trust and clean allocation first). Both require that you can attribute ~100% of the invoice — including shared services — or teams reject the numbers as unfair.
Budgets are alerts, not limits — until you wire them to action. An Azure budget does not stop spending when breached; by default it sends an email. It becomes a control only when its alert triggers an action group that runs automation (a Function or Automation runbook that deallocates non-prod). Alerting on forecast (predicted month-end) rather than actual buys you the days needed to act before the overage.
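To see why a forecast threshold buys time, the sketch below compares the day an alert would fire on actual spend with the day it would fire on a projected month-end figure. The projection here is a naive linear extrapolation chosen for illustration; it is an assumption, not the forecasting model Cost Management uses, and the daily figures are invented.

```python
from typing import Optional, Sequence


def linear_forecast(spend_to_date: float, day: int, days_in_month: int) -> float:
    """Project month-end spend by assuming the average daily rate so far continues."""
    return spend_to_date / day * days_in_month


def first_alert_day(
    daily_spend: Sequence[float],
    budget: float,
    threshold: float = 1.0,
    on_forecast: bool = False,
) -> Optional[int]:
    """Return the first day (1-based) on which the alert condition is met, or None."""
    days_in_month = len(daily_spend)
    running = 0.0
    for day, spend in enumerate(daily_spend, start=1):
        running += spend
        value = linear_forecast(running, day, days_in_month) if on_forecast else running
        if value >= budget * threshold:
            return day
    return None


# Hypothetical 30-day month: steady ₹8,000/day, then a jump to ₹25,000/day from day 11.
DAILY = [8_000.0] * 10 + [25_000.0] * 20
BUDGET = 300_000.0

actual_day = first_alert_day(DAILY, BUDGET, on_forecast=False)
forecast_day = first_alert_day(DAILY, BUDGET, on_forecast=True)
print(f"actual-threshold alert fires on day {actual_day}")
print(f"forecast-threshold alert fires on day {forecast_day}")
```

In this invented scenario the forecast alert fires about a week before the actual-spend alert, which is the window in which automation or a human can still change the month's outcome.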
The vocabulary in one table
Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:
| Concept | One-line definition | Where it lives | Why it matters to cost at scale |
|---|---|---|---|
| Cost Management | Native, free cost-analysis/budgets/exports service | Every subscription / billing account | The data plane for the whole loop |
| ActualCost | Charge on the day it hits the account | Cost Analysis metric | Skews trends when a commitment lands |
| AmortizedCost | Commitment spread across its term | Cost Analysis metric | The truth for showback and trends |
| Tag | Key/value metadata on a resource/RG/sub | Resource Manager | The cost-attribution dimension |
| CostCenter / Owner tag | Who pays / who is responsible | Tag schema | Turns spend into team-level cost |
| Budget | A spend threshold with alert rules | Cost Management | Catches overrun (only if wired to action) |
| Anomaly alert | ML-detected spend deviation | Cost Management | Catches unexpected spend early |
| Reservation (RI) | 1/3-yr capacity pre-commit (~up to 72% off) | Reservations | Rate cut on steady-state compute |
| Savings Plan (SP) | $/hr compute commitment (flexible) | Savings Plans | Rate cut with SKU/region flexibility |
| Azure Hybrid Benefit | Use owned Windows/SQL licenses | Resource config | Removes license cost on eligible SKUs |
| Spot | Evictable surplus capacity (deep discount) | VM/VMSS/AKS config | Cheap for interruptible workloads |
| Cost allocation rule | Splits shared cost to teams | Cost Management | Makes showback reconcile to 100% |
| Export | Scheduled cost data to storage | Cost Management | Lakehouse-scale analysis (FOCUS) |
| Advisor (Cost) | Right-sizing/idle recommendations | Azure Advisor | The usage-reduction worklist |
Tag governance: making every cost attributable
If you fix nothing else, fix tagging — it is the foundation every other control stands on. The goal: every resource is born with the tags that let you attribute its cost, enforced by policy, with existing resources remediated. Manual tagging always decays; the only durable approach is deny what’s untagged and inherit tags down from the resource group/subscription.
The tag schema
Decide the schema once and enforce it everywhere. A pragmatic, cost-focused minimum:
| Tag key | Purpose | Example values | Enforcement | Allocation use |
|---|---|---|---|---|
| CostCenter | Finance code that pays | CC-4412, CC-7781 | Deny if missing | Chargeback line |
| Owner | Accountable person/DL | team-payments@, vinod.h@ | Deny if missing | Who to ping on overrun |
| Environment | Lifecycle stage | prod, staging, dev, sandbox | Allowed-values + deny | Non-prod auto-stop targeting |
| Product / Service | App/workload name | checkout, search, billing | Deny if missing | Per-product unit economics |
| BusinessUnit | Org rollup | retail, platform | Inherit from MG | Executive showback |
| Project | Initiative / funding line | migration-2026, bau | Optional, allowed-values | Project-based budgets |
| DataClass | Sensitivity (governance) | public, confidential | Audit (not cost, but ride-along) | Compliance filtering |
| ExpiryDate | Auto-cleanup date | 2026-12-31 | Audit + automation reads it | Drives orphan/sandbox sweep |
| ManagedBy | IaC vs manual | terraform, bicep, portal | Audit | Flags click-ops drift |
Two rules keep the schema usable. First, keep it small: 5–7 mandatory cost tags, because every extra mandatory tag adds friction at create time and another source of deny failures. Second, use lowercase, fixed-vocabulary values: constrain Environment with an Azure Policy allowedValues parameter, or prod, Prod and PROD will fracture your reports.
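The same two rules can be checked before a deployment ever reaches Azure Policy, for example as a lint step in CI. The sketch below is a hypothetical pre-flight validator, not part of Azure Policy. The required keys and the Environment vocabulary are taken from the schema table above, and the function name and message wording are made up for illustration.

```python
# Hypothetical pre-deployment tag lint; mirrors the schema table, not an Azure API.
REQUIRED_TAGS = ("CostCenter", "Owner", "Environment", "Product")
ALLOWED_VALUES = {"Environment": {"prod", "staging", "dev", "sandbox"}}


def validate_tags(tags: dict) -> list:
    """Return a list of human-readable problems; an empty list means the tags pass."""
    problems = []

    # Rule 1: every required tag must be present and non-blank.
    for key in REQUIRED_TAGS:
        if not str(tags.get(key, "")).strip():
            problems.append(f"missing required tag: {key}")

    # Rule 2: fixed-vocabulary tags must match an allowed value exactly (case-sensitive).
    for key, allowed in ALLOWED_VALUES.items():
        value = tags.get(key)
        if value and value not in allowed:
            hint = f" (did you mean '{value.lower()}'?)" if value.lower() in allowed else ""
            problems.append(f"{key}='{value}' is not an allowed value{hint}")

    return problems


good = {"CostCenter": "CC-4412", "Owner": "team-payments@",
        "Environment": "prod", "Product": "checkout"}
bad = {"Owner": "team-payments@", "Environment": "Prod", "Product": "checkout"}

print(validate_tags(good))
for problem in validate_tags(bad):
    print(problem)
```

Catching a missing or mis-cased tag in the pull request is cheaper than hitting the deny at deploy time, and it gives the author a more specific message than a generic policy rejection.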
Enforce with Azure Policy: deny + inherit + remediate
Three policy patterns work together. Deny blocks creation of a resource missing a required tag. Modify (inherit) copies a tag from the resource group (or subscription) onto the resource if absent — invaluable because many resource types are created by services that don’t set tags. Audit reports non-compliance without blocking (use while you roll out, before flipping to deny).
# Assign the built-in "Require a tag on resources" (deny) at a management group
az policy assignment create \
--name "require-costcenter" \
--display-name "Require CostCenter tag (deny)" \
--scope "/providers/Microsoft.Management/managementGroups/mg-landingzones" \
--policy "871b6d14-10aa-478d-b590-94f262ecfa99" \
--params '{ "tagName": { "value": "CostCenter" } }'
# Assign "Inherit a tag from the resource group if missing" (modify) — needs an identity for remediation
az policy assignment create \
--name "inherit-costcenter" \
--display-name "Inherit CostCenter from RG" \
--scope "/providers/Microsoft.Management/managementGroups/mg-landingzones" \
--policy "cd3aa116-8754-49c9-a813-ad46512ece54" \
--params '{ "tagName": { "value": "CostCenter" } }' \
--mi-system-assigned --location centralindia
In Bicep, ship the assignment as code so it lives in the landing-zone repo, reviewed in PRs:
// Inherit-tag (modify) assignment at a management group, with a managed identity for remediation
// Without this targetScope the file deploys at resource-group scope; deploy with `az deployment mg create`
targetScope = 'managementGroup'
resource inheritCostCenter 'Microsoft.Authorization/policyAssignments@2024-04-01' = {
  name: 'inherit-costcenter'
  location: 'centralindia'
  identity: { type: 'SystemAssigned' }
  properties: {
    displayName: 'Inherit CostCenter from RG'
    policyDefinitionId: tenantResourceId('Microsoft.Authorization/policyDefinitions', 'cd3aa116-8754-49c9-a813-ad46512ece54')
    parameters: { tagName: { value: 'CostCenter' } }
    enforcementMode: 'Default' // 'DoNotEnforce' = audit-only while rolling out
  }
}
Existing resources stay non-compliant until you run a remediation task (the modify effect only fires on create/update otherwise):
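# Remediate existing resources under the management group where the policy is assigned.
# ExistingNonCompliant is the default discovery mode; ReEvaluateCompliance is, to my knowledge,
# only accepted at subscription scope and below, so verify before using it at MG scope.
az policy remediation create \
  --name "remediate-inherit-costcenter" \
  --management-group "mg-landingzones" \
  --policy-assignment "inherit-costcenter" \
  --resource-discovery-mode ExistingNonCompliant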
Confirm coverage — the number that proves tagging is working is the compliance percentage and the size of the unallocated bucket:
# Summarize compliance for the require-tag policy across the MG.
# resourceDetails carries one count per compliance state; compliance % = compliant / sum of counts.
az policy state summarize \
  --management-group mg-landingzones \
  --filter "PolicyAssignmentName eq 'require-costcenter'" \
  --query "policyAssignments[].{assignment:policyAssignmentId, nonCompliant:results.nonCompliantResources, byState:results.resourceDetails}"
The policy effects, what each does, and when to use which:
| Policy effect | What it does at evaluation | Blocks creation? | Fixes existing? | Use it when |
|---|---|---|---|---|
| Audit | Logs non-compliance, no change | No | No | Rolling out; measuring before enforcing |
| Deny | Rejects the create/update request | Yes | No | The tag is mandatory going forward |
| Modify (add/inherit) | Adds/replaces the tag | No (allows + fixes) | Yes (via remediation) | Backfilling + auto-tagging from RG/sub |
| Append | Adds a property if missing (legacy) | No | No (create-time) | Older tag-add scenarios; prefer Modify |
| DeployIfNotExists | Deploys a related resource | No | Yes (remediation) | Tagging via a deployed config, advanced |
| Disabled | Turns the rule off | No | No | Temporarily suspend without deleting |
The classic tag-governance failure modes and how each shows up:
| Symptom | Root cause | Confirm | Fix |
|---|---|---|---|
| Big “no CostCenter” bucket in Cost Analysis | No require-tag policy, or only at sub scope | az policy state summarize shows low compliance | Assign deny + inherit at MG; remediate |
| Some resource types still untagged | Created by a service that ignores tags | Resource Graph: where isnull(tags.CostCenter) | Add Modify-inherit; remediation task |
| Reports fracture across prod/Prod | No fixed vocabulary on values | Group by Environment shows duplicates | allowedValues policy; normalize existing |
| Deny breaks pipelines | A required tag isn’t set by IaC | Deployment error names the tag | Set the tag in the module’s tags block |
| Tags exist but cost still unattributed | Tag added after billing period | Cost predates the tag | Remediate early; tags apply forward only |
A subtlety that bites: tags are not retroactive in cost data. Tagging a resource today does not re-attribute last month’s spend for it. This is why you enforce tagging before a workload accrues cost, and why the remediation task should run as soon as the policy is assigned — every untagged day is a day of unallocated spend you can’t fix later.
Reading Cost Management correctly: analysis, amortization, and the Query API
Cost Management is free and built in, but it answers different questions depending on the scope, the metric, and the grouping you choose. Getting these three right is most of the skill.
Scopes: where you point the analysis
Cost data exists at several scopes; you analyze and budget at the one that matches your accountability boundary:
| Scope | What it aggregates | Who uses it | Note |
|---|---|---|---|
| Billing account (EA/MCA) | Everything under the agreement | Finance, central FinOps | Highest level; invoice reconciliation |
| Billing profile / invoice section (MCA) | A billing slice | Finance | MCA-specific grouping |
| Management group | All subs beneath it | Platform / BU leads | Org/BU rollups |
| Subscription | One sub’s resources | App + finance | Most common budget scope |
| Resource group | One workload | Product team | Fine-grained showback |
| Tag filter (cross-scope) | All resources with a tag value | Per-team across subs | The team view |
Metric: AmortizedCost vs ActualCost (the most important toggle)
The metric selector in Cost Analysis silently changes the answer. Internalize this table:
| Question you’re asking | Use this metric | Why |
|---|---|---|
| What did this team consume this month? | AmortizedCost | Spreads commitments; reflects usage |
| What is the monthly trend per product? | AmortizedCost | Trend isn’t distorted by purchase dates |
| What will I be billed in cash this period? | ActualCost | Matches the invoice cash-flow |
| Did a Reservation purchase hit this month? | ActualCost | The upfront charge shows on its day |
| Showback / chargeback numbers | AmortizedCost | Fair per-team consumption |
| Reconciling my export to the PDF invoice | ActualCost | The invoice is actual charges |
The trap, concretely: a team buys a ₹12,00,000 1-year upfront VM Reservation on 5 June. In ActualCost, June shows ₹12,00,000+ for that team and July through the following May show ~₹0 for those VMs, a chart that looks like a 10× spike followed by a collapse. In AmortizedCost, every month of the term shows ~₹1,00,000, which is the real consumption. Every showback report, budget, and trend should be Amortized; reserve Actual for cash reconciliation.
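The two views of the same purchase can be reproduced in a few lines. This is a simplified sketch that spreads the upfront amount evenly over twelve months. Real amortization is calculated at a finer grain, so treat the equal-months split as an assumption made for readability.

```python
def actual_series(upfront: float, months: int) -> list:
    """ActualCost view: the whole charge lands in the purchase month, then nothing."""
    return [upfront] + [0.0] * (months - 1)


def amortized_series(upfront: float, months: int) -> list:
    """AmortizedCost view: the same charge spread evenly across the term."""
    return [upfront / months] * months


UPFRONT = 1_200_000.0  # the ₹12,00,000 reservation from the example
TERM_MONTHS = 12

actual = actual_series(UPFRONT, TERM_MONTHS)
amortized = amortized_series(UPFRONT, TERM_MONTHS)

print("month  actual      amortized")
for month, (a, m) in enumerate(zip(actual, amortized), start=1):
    print(f"{month:>5}  {a:>10,.0f}  {m:>10,.0f}")

# Both views account for the same money over the term; only the timing differs.
print("totals match:", abs(sum(actual) - sum(amortized)) < 0.01)
```

The totals are identical, which is why neither metric is wrong. They answer different questions: when the cash left versus what was consumed.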
Grouping and filtering
Group by the dimension that answers your question; the common ones:
| Group by | Answers | Typical use |
|---|---|---|
| Service name | Where is the money going by service? | “Storage is 40% — why?” |
| Resource type | Which resource kind dominates? | VMs vs disks vs DBs split |
| Resource group | Which workload costs most? | Per-team RG showback |
| Resource | Which exact resource? | Hunting the expensive single thing |
| Tag (CostCenter/Owner/Environment) | Which team/env? | Showback; non-prod ratio |
| Location | Which region? | Egress and region-price analysis |
| Meter | Which billed unit? | RU/s, GB-month, vCPU-hours detail |
| Reservation | Commitment utilization | Are we using what we bought? |
| Subscription | Which sub drives cost? | Per-sub budget vs actual |
| Charge type | Usage / purchase / refund | Separate commitments from usage |
Pull data with the Query API, not clicks
At scale you do not click through Cost Analysis monthly — you query. The Cost Management Query API returns aggregated, server-side-grouped cost so you build dashboards and month-end packs programmatically:
# Amortized cost this month, grouped by the CostCenter tag, at a subscription scope
SUB=$(az account show --query id -o tsv)
az rest --method post \
--uri "https://management.azure.com/subscriptions/$SUB/providers/Microsoft.CostManagement/query?api-version=2024-08-01" \
--body '{
"type": "AmortizedCost",
"timeframe": "MonthToDate",
"dataset": {
"granularity": "None",
"aggregation": { "totalCost": { "name": "Cost", "function": "Sum" } },
"grouping": [ { "type": "TagKey", "name": "CostCenter" } ]
}
}'
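Once the response is saved, turning it into a per-team table takes little code. The sketch below assumes the response follows the Query API's columns-and-rows layout and that a TagKey grouping yields Cost, TagKey, TagValue and Currency columns. The sample payload is invented, and treating an empty or null TagValue as "untagged" is an assumption to verify against your own output.

```python
UNTAGGED = "(untagged)"

# Invented sample in the columns/rows shape; real responses carry more metadata.
SAMPLE_RESPONSE = {
    "properties": {
        "columns": [
            {"name": "Cost", "type": "Number"},
            {"name": "TagKey", "type": "String"},
            {"name": "TagValue", "type": "String"},
            {"name": "Currency", "type": "String"},
        ],
        "rows": [
            [410000.0, "CostCenter", "CC-4412", "INR"],
            [275000.0, "CostCenter", "CC-7781", "INR"],
            [315000.0, "CostCenter", "", "INR"],
        ],
    }
}


def rows_as_dicts(response: dict) -> list:
    """Zip each row with the column names so fields are addressed by name, not position."""
    props = response["properties"]
    names = [col["name"] for col in props["columns"]]
    return [dict(zip(names, row)) for row in props["rows"]]


def cost_by_tag_value(response: dict) -> dict:
    """Sum cost per tag value, folding empty or null values into one untagged bucket."""
    totals = {}
    for record in rows_as_dicts(response):
        label = record.get("TagValue") or UNTAGGED
        totals[label] = totals.get(label, 0.0) + float(record["Cost"])
    return totals


def unallocated_share(totals: dict) -> float:
    """Fraction of total cost that has no tag value."""
    grand_total = sum(totals.values())
    return totals.get(UNTAGGED, 0.0) / grand_total if grand_total else 0.0


totals = cost_by_tag_value(SAMPLE_RESPONSE)
for label, cost in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{label:12s} {cost:>12,.0f}")
print(f"unallocated share: {unallocated_share(totals):.1%}")
```

Reading columns by name rather than by index keeps the script working if the API returns the columns in a different order.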
For ad-hoc CLI summaries, az consumption usage list reads metered records, and Azure Resource Graph finds the resources behind the cost (e.g. every untagged or orphaned thing):
// Resource Graph: resources missing a CostCenter tag (the unallocated bucket's membership)
Resources
| where isnull(tags['CostCenter']) or tags['CostCenter'] == ''
| project name, type, resourceGroup, subscriptionId, location
| order by type asc
// Resource Graph: orphaned managed disks (Unattached) are pure waste; delete or snapshot them
// Run this as a separate query from the one above
Resources
| where type == 'microsoft.compute/disks' and properties.diskState == 'Unattached'
| project name, resourceGroup, sizeGB = toint(properties.diskSizeGB), skuName = tostring(sku.name)
| order by sizeGB desc
The data sources and what each is best for:
| Source | Granularity | Best for | Note |
|---|---|---|---|
| Cost Analysis (portal) | Aggregated, interactive | Ad-hoc exploration | Click; not for automation |
| Query API | Aggregated, scriptable | Dashboards, month-end packs | Server-side grouping; respects Amortized |
| az consumption usage list | Per usage record | Quick CLI checks | Metered detail; rate-limited |
| Exports (FOCUS) | Full per-record dataset | Lakehouse analysis at scale | Daily/monthly to storage |
| Azure Resource Graph | Resource inventory | Finding the resources (orphans, untagged) | Not cost numbers, but the targets |
| Advisor (Cost) | Recommendations | The right-size/idle worklist | Actionable, prioritized |
Cost exports and the FOCUS schema
When cost analysis outgrows the portal — you want to join cost to your own data (deployments, business KPIs), retain history beyond the portal’s window, or run it through a lakehouse — you configure a scheduled export. An export writes the full, per-record cost dataset to an ADLS Gen2 / Storage container on a daily or monthly cadence.
The current best practice is to export in the FOCUS schema (FinOps Open Cost and Usage Specification) — a vendor-neutral column set so the same Spark/SQL works across clouds and the same dashboards survive a billing change. Configure it:
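# Create a daily export of amortized cost to a storage container.
# Assumption: the costmanagement CLI extension accepts --type ActualCost | AmortizedCost | Usage
# and has no FOCUS, format, or schema-version flag; create FOCUS exports in the portal or
# through the Exports REST API instead, and verify against your installed extension version.
az costmanagement export create \
  --name "daily-amortized-export" \
  --scope "/subscriptions/$SUB" \
  --type AmortizedCost \
  --storage-account-id "/subscriptions/$SUB/resourceGroups/rg-finops/providers/Microsoft.Storage/storageAccounts/stfinopsexports" \
  --storage-container "cost-focus" \
  --timeframe MonthToDate \
  --recurrence Daily \
  --recurrence-period from="2026-06-01T00:00:00Z" to="2027-06-01T00:00:00Z"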
The export options and when each matters:
| Option | Values | When to change | Note |
|---|---|---|---|
| Schema | FOCUS / legacy Actual / Amortized | FOCUS for new pipelines | FOCUS is cross-cloud, future-proof |
| Timeframe | MonthToDate / previous month / custom | MTD for a rolling daily push | Daily MTD overwrites the month file |
| Recurrence | Daily / Weekly / Monthly | Daily for fresh dashboards | Monthly for invoice-close snapshots |
| Format | CSV / Parquet | Parquet for lakehouse | Smaller, typed; better for Spark |
| Partitioning | On / off (file partitioning) | On for very large accounts | Splits big months into chunks |
| Destination | Storage account + container | — | Use a locked-down FinOps storage acct |
| Scope | Sub / MG / billing account | Billing account for org-wide | Higher scope = full picture in one file |
| Overwrite vs append | Replace or add daily file | Overwrite for MTD; append for history | Decide retention strategy upfront |
| Compression | None / gzip (with CSV) | gzip for large CSV | Smaller egress/storage footprint |
When to use an export instead of the portal or Query API:
| Need | Portal | Query API | Export |
|---|---|---|---|
| Quick “where’s the money” look | Best | OK | No |
| Automated daily dashboard refresh | No | Good | Good |
| Join cost to deployments / KPIs | No | Hard | Best |
| Retain >13 months history | No | No | Best |
| Run through Spark / SQL warehouse | No | No | Best |
| Cross-cloud unified schema | No | No | Best (FOCUS) |
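As a rough illustration of what "run it yourself" looks like, the sketch below aggregates a FOCUS-style CSV by a tag. BilledCost, EffectiveCost, ServiceName, ChargePeriodStart and Tags are FOCUS column names, while the rows are invented and a real export has many more columns. Storing Tags as a JSON object string is an assumption about the export layout that you should confirm on your own files.

```python
import csv
import io
import json
from collections import defaultdict

# Invented rows; the first two model a reservation whose cash charge lands on one day
# (BilledCost) while its consumption is spread out (EffectiveCost).
SAMPLE_FOCUS_CSV = '''ChargePeriodStart,ServiceName,BilledCost,EffectiveCost,Tags
2026-06-05T00:00:00Z,Virtual Machines,1200000,3333.33,"{""CostCenter"":""CC-4412""}"
2026-06-06T00:00:00Z,Virtual Machines,0,3333.33,"{""CostCenter"":""CC-4412""}"
2026-06-06T00:00:00Z,Storage Accounts,850,850,"{""CostCenter"":""CC-7781""}"
2026-06-06T00:00:00Z,Azure Firewall,2100,2100,{}
'''


def effective_cost_by_tag(csv_text: str, tag_key: str = "CostCenter",
                          missing: str = "(unallocated)") -> dict:
    """Sum EffectiveCost (the amortized view) per value of one tag key."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = json.loads(row["Tags"] or "{}")
        totals[tags.get(tag_key, missing)] += float(row["EffectiveCost"])
    return dict(totals)


def billed_total(csv_text: str) -> float:
    """Sum BilledCost (the cash view) across all rows."""
    return sum(float(row["BilledCost"]) for row in csv.DictReader(io.StringIO(csv_text)))


by_cost_center = effective_cost_by_tag(SAMPLE_FOCUS_CSV)
for cost_center, cost in sorted(by_cost_center.items()):
    print(f"{cost_center:14s} {cost:>10,.2f}")
print(f"billed (cash) total: {billed_total(SAMPLE_FOCUS_CSV):,.2f}")
```

At real volumes the same grouping would run in Spark or a SQL warehouse over Parquet. The point here is only that one dataset carries both the cash and the amortized figure side by side.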
A practical note on the destination: put exports in a dedicated FinOps storage account with restricted RBAC, lifecycle rules to tier old months to cool/archive, and (ideally) a private endpoint — cost data is sensitive (it reveals architecture and scale). The same storage fundamentals you’d apply to any data apply here.
Allocation: showback, chargeback, and splitting shared cost
Attribution is only useful if it reconciles to the invoice. The hardest part at scale is shared cost — the hub firewall, Bastion, Log Analytics workspace, DDoS plan, and gateways that serve everyone and are owned by the platform team’s subscription. If you ignore them, the sum of per-team showback is always less than the bill, and teams (rightly) distrust numbers that don’t add up.
Showback vs chargeback
| Dimension | Showback | Chargeback |
|---|---|---|
| What it does | Shows each team its cost | Bills cost to the team’s budget |
| Money moves? | No | Yes (internal cross-charge) |
| Friction | Low | High |
| Accountability | Awareness | Real ownership |
| Prerequisite | Tagging | Tagging + clean shared-cost split + trust |
| Good starting point | Yes (start here) | After showback is trusted |
| Risk if done early | Low | Teams reject “unfair” numbers |
Cost allocation rules: split the shared cost
Cost Management supports cost allocation rules that take a source (a shared resource group or subscription) and distribute its cost to target teams by a chosen basis — proportional to compute spend, proportional to total cost, or a fixed percentage. This is how showback reaches 100%.
| Allocation basis | How it splits shared cost | Best when | Watch-out |
|---|---|---|---|
| Proportional to total cost | By each team’s share of total spend | Default, “fair” general split | Big spenders carry most of the shared cost even if they barely use the shared service |
| Proportional to compute | By compute (vCPU) spend | Shared cost tracks compute (e.g. logs) | Storage-heavy teams under-charged |
| Proportional to a specific tag/metric | By a chosen dimension | A clear cost driver exists | Needs a clean driver metric |
| Fixed percentage | Hard-coded splits per team | Stable, negotiated agreements | Drifts from reality; revisit quarterly |
| Even split | Equal shares | Few teams, similar size | Penalizes small teams |
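The arithmetic behind a split, and the check that it reconciles, can be sketched offline. The function below is an illustration of the proportional and fixed-percentage bases, not the Cost Management allocation engine. The team codes reuse the example cost centers, CC-9020 is a made-up third team, and all amounts are invented.

```python
from typing import Optional


def allocate_shared(direct: dict, shared: float, weights: Optional[dict] = None) -> dict:
    """Add each team's slice of the shared cost to its direct cost.

    With no weights, the split is proportional to direct cost.
    With weights (for example fixed percentages), the split follows those weights.
    """
    basis = direct if weights is None else weights
    basis_total = sum(basis.values())
    if basis_total <= 0:
        raise ValueError("allocation basis must sum to a positive number")
    teams = sorted(set(direct) | set(basis))
    return {
        team: direct.get(team, 0.0) + shared * basis.get(team, 0.0) / basis_total
        for team in teams
    }


def reconciles(allocated: dict, invoice_total: float, tolerance: float = 0.01) -> bool:
    """True when the per-team totals add back up to the invoice within a tolerance."""
    return abs(sum(allocated.values()) - invoice_total) <= tolerance


DIRECT = {"CC-4412": 600_000.0, "CC-7781": 300_000.0, "CC-9020": 100_000.0}
SHARED_HUB_COST = 200_000.0
INVOICE_TOTAL = sum(DIRECT.values()) + SHARED_HUB_COST

proportional = allocate_shared(DIRECT, SHARED_HUB_COST)
fixed = allocate_shared(DIRECT, SHARED_HUB_COST,
                        weights={"CC-4412": 50, "CC-7781": 30, "CC-9020": 20})

print("proportional:", proportional, "reconciles:", reconciles(proportional, INVOICE_TOTAL))
print("fixed %:     ", fixed, "reconciles:", reconciles(fixed, INVOICE_TOTAL))
print("showback without the split reconciles:", reconciles(DIRECT, INVOICE_TOTAL))
```

Both bases reconcile to the same invoice total while charging individual teams differently, so the choice of basis is a fairness decision and not an accounting one.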
The reconciliation check that proves allocation works — the per-team amortized total must equal the invoice amortized total:
# Per-CostCenter amortized totals (sum these; it must equal the account amortized invoice total)
az rest --method post \
--uri "https://management.azure.com/subscriptions/$SUB/providers/Microsoft.CostManagement/query?api-version=2024-08-01" \
--body '{
"type": "AmortizedCost", "timeframe": "TheLastMonth",
"dataset": { "granularity": "None",
"aggregation": { "total": { "name": "Cost", "function": "Sum" } },
"grouping": [ { "type": "TagKey", "name": "CostCenter" } ] }
}' --query "properties.rows"
The allocation failure modes:
| Symptom | Root cause | Confirm | Fix |
|---|---|---|---|
| Per-team sum < invoice | Shared cost not allocated | Compare grouped total vs account total | Add a cost allocation rule for shared RGs |
| One team’s cost jumped, no usage change | A new shared service got split to them | Diff the allocation rule’s basis/period | Re-examine the basis; pin a fairer metric |
| “Unallocated” still large after rules | Untagged resources upstream | Resource Graph untagged query | Fix tagging first; allocation can’t fix tags |
| Teams dispute the split | Basis doesn’t match their driver | Review which basis is configured | Switch to a driver-aligned basis; socialize it |
| Chargeback rejected by finance | Numbers don’t tie to GL | Reconcile amortized export to invoice | Use Actual for cash tie-out; Amortized for showback |
Rate optimization: reservations, savings plans, Hybrid Benefit and Spot
The rate axis cuts price-per-unit without changing what you run. Azure offers four overlapping levers (the table below also lists Dev/Test pricing and the pay-as-you-go baseline for comparison); choosing among them is the core commitment decision. (For the full commitment-engineering math, see Azure Cost: Reservations, Savings Plans & Hybrid Benefit Strategy; here is the operating-model view.)
The four levers compared
| Lever | What you commit to | Discount (rough) | Flexibility | Best for |
|---|---|---|---|---|
| Reservation (RI) | A specific SKU family + region, 1 or 3 yr | Up to ~72% vs PAYG | Low (instance-size flex within family) | Stable, known SKU baseline (VMs, SQL, Cosmos RU, Storage) |
| Savings Plan (SP) | A fixed $/hour of compute, 1 or 3 yr | Up to ~65% vs PAYG | High (any region/SKU compute) | Steady compute spend, changing shapes |
| Azure Hybrid Benefit (AHB) | Nothing — use owned licenses | Windows ~40%+, SQL large | N/A (eligibility-based) | You own Windows Server / SQL Server licenses |
| Spot | Nothing — take evictable capacity | Up to ~90% vs PAYG | N/A (can be evicted with 30s notice) | Interruptible: batch, CI, dev, stateless scale |
| Dev/Test pricing | A Dev/Test subscription offer | Reduced Windows/some rates | N/A (subscription-type gated) | Non-prod environments under EA/Dev-Test |
| Pay-as-you-go (no commit) | Nothing | 0% (list price) | Maximum | Spiky/unknown/short-lived workloads |
These stack: apply AHB to remove license cost, cover the steady compute baseline with a Savings Plan or RIs, and burst on Spot for interruptible work. Right-size before committing, or you lock in oversized capacity.
Term, payment and break-even
| Choice | Options | Trade-off |
|---|---|---|
| Term | 1-year vs 3-year | 3-yr deeper discount, less flexibility/longer lock-in |
| Payment | Upfront vs monthly | Same total price on Azure either way; monthly preserves cash & avoids the ActualCost spike |
| RI vs SP | Specific SKU vs flexible $/hr | RI deeper for a known shape; SP forgiving as shapes change |
| Coverage target | % of baseline committed | Commit the floor (e.g. P50 of steady usage), leave headroom on PAYG/Spot |
| Scope | Single vs Shared vs MG | Single = predictable ownership; Shared = max utilization but messy attribution |
The break-even rule of thumb: a 3-year RI/SP typically pays back versus pay-as-you-go in roughly 8–14 months depending on SKU and discount, so it only makes sense for capacity you are confident will run past that window. Commit the stable floor of usage, not the peak.
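The rule of thumb can be checked with a simplified model: the commitment costs `term × (1 − discount)` months of pay-as-you-go spend in total, so that is how long the workload has to keep running before the commitment beats PAYG. The sketch assumes constant usage and a flat discount over the term; `breakeven_months` is a hypothetical helper and the discount values are illustrative, not quoted prices.

```shell
# Months of PAYG spend that equal the full cost of a commitment.
# Usage: breakeven_months TERM_MONTHS DISCOUNT_PCT
# Simplified model (constant usage, flat discount); hypothetical helper.
breakeven_months() {
  awk -v term="$1" -v disc="$2" 'BEGIN { printf "%.1f\n", term * (1 - disc / 100) }'
}

# A 3-year term at a few illustrative discount levels
for d in 60 65 72; do
  echo "$d% -> $(breakeven_months 36 "$d") months"
done
```

If the capacity is likely to be retired or re-architected before that month, the commitment loses money compared with staying on PAYG.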
Scope: the leak that lands a discount on the wrong team
A commitment’s scope decides which resources receive its discount. Shared scope auto-applies the discount to any matching resource across the billing account — maximizing utilization but meaning the discount can land on a team that never paid for the commitment. Single scope ties it to one subscription. For clean chargeback, default to single scope unless you deliberately want pooled utilization.
# List reservation orders with their term and billing plan
az reservations reservation-order list --query "[].{name:displayName, term:term, billingPlan:billingPlan}" -o table
# Change a reservation's applied scope to a single subscription (clean attribution)
az reservations reservation update \
--reservation-order-id <orderId> --reservation-id <reservationId> \
--applied-scope-type Single --applied-scopes "/subscriptions/$SUB"
Monitor utilization — an under-used commitment is wasted money, the inverse of the problem you bought it to solve:
| Commitment metric | What it tells you | Healthy | Action if unhealthy |
|---|---|---|---|
| Utilization % | How much of the commit is used | >90% sustained | Re-scope (Single→Shared) or resize down at renewal |
| Coverage % | How much eligible usage is committed | 60–80% of baseline | Buy more if PAYG hours are high and stable |
| Applied scope | Single / Shared / MG | Matches chargeback model | Re-scope to Single for clean attribution |
| Expiry date | When the term ends | Tracked + alerted | Renew or let lapse deliberately, never by surprise |
| PAYG hours above commit | Uncommitted steady usage | Low | Candidate for an additional commitment |
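The first two metrics are simple ratios, and the table's thresholds can be turned into a rough triage rule. A sketch under stated assumptions: utilization is committed hours actually used over committed hours, coverage is committed hours used over all eligible usage hours, and the 90% and 60% cut-offs come from the "Healthy" column above. `commitment_health` is a hypothetical helper and the hour counts are sample values.

```shell
# Rough commitment triage from three hour counts.
# Usage: commitment_health USED_COMMITTED_HOURS COMMITTED_HOURS ELIGIBLE_USAGE_HOURS
# Hypothetical helper; thresholds mirror the table (utilization >90%, coverage 60-80%).
commitment_health() {
  awk -v used="$1" -v committed="$2" -v eligible="$3" 'BEGIN {
    util = 100 * used / committed     # how much of the commitment is consumed
    cov  = 100 * used / eligible      # how much eligible usage the commitment covers
    printf "utilization=%.0f%% coverage=%.0f%%", util, cov
    if (util < 90)      printf " action=re-scope-or-resize\n"
    else if (cov < 60)  printf " action=buy-more-if-stable\n"
    else                printf " action=none\n"
  }'
}

commitment_health 700 720 1000   # healthy on both axes
commitment_health 500 720 1000   # under-used commitment
commitment_health 720 720 2000   # fully used, but most usage is still PAYG
```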
The commitment failure modes:
| Symptom | Root cause | Confirm | Fix |
|---|---|---|---|
| A team’s spend “tripled then zeroed” | Upfront commitment read in ActualCost | Spike aligns with purchase date | Report in AmortizedCost everywhere |
| Discount on a team that didn’t buy | Shared scope auto-applying org-wide | appliedScopeType == Shared | Re-scope to Single; default new buys Single |
| Low RI/SP utilization | Over-bought, or baseline shrank | Reservations → Utilization < 90% | Re-scope Shared for pooling; resize at renewal |
| Committed but still high PAYG bill | Coverage too low vs stable usage | PAYG hours high and flat | Increase coverage on the stable floor |
| Bought RI then re-architected to serverless | Committed to capacity you no longer run | Utilization drops post-migration | Prefer SP (flexible) when shapes may change |
| Windows VMs at full price | AHB not enabled despite owned licenses | VM shows PAYG Windows rate | Enable Hybrid Benefit on eligible SKUs |
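The "tripled then zeroed" row is easiest to see as two monthly series for the same purchase. The sketch below is illustrative: `cost_series` is a hypothetical helper that prints, for an upfront commitment, what ActualCost and AmortizedCost would show each month, assuming the price is spread evenly across the term. The price is a sample value.

```shell
# Monthly view of one upfront commitment under both metrics.
# Usage: cost_series PRICE TERM_MONTHS   -> "month actual amortized" per line
# Hypothetical helper; assumes an even spread of the price across the term.
cost_series() {
  awk -v price="$1" -v term="$2" 'BEGIN {
    for (m = 1; m <= term; m++)
      printf "%d %.0f %.0f\n", m, (m == 1 ? price : 0), price / term
  }'
}

# First three months of a 14,00,000 one-year upfront purchase
cost_series 1400000 12 | head -n 3
```

The Actual column is the cash event finance reconciles against; the Amortized column is the consumption trend a team should be judged on.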
Usage optimization: right-sizing, auto-stop, and killing orphans
The usage axis reduces units consumed. It is where the fastest wins live, because most fleets carry obvious waste: over-sized SKUs, non-production running 24×7, and orphaned resources nobody deletes.
Right-sizing with Advisor
Azure Advisor continuously analyzes utilization and recommends downsizing or shutting down underused resources, with the estimated saving attached. It is your prioritized worklist.
# List Advisor Cost recommendations with estimated annual savings
az advisor recommendation list --category Cost \
--query "[].{resource:impactedValue, problem:shortDescription.problem, savings:extendedProperties.annualSavingsAmount}" -o table
The usage-reduction worklist, by lever and typical payoff:
| Usage lever | What it targets | Typical saving | Effort | Risk |
|---|---|---|---|---|
| Right-size VMs/DBs | Over-provisioned SKUs | 20–50% on those resources | Low | Validate headroom for peaks |
| Auto-stop non-prod | Dev/test running 24×7 | ~65% on non-prod compute | Low | Schedule must respect work hours |
| Delete orphans | Unattached disks, unused IPs, stale snapshots | Pure waste removed | Low | Confirm truly unused first |
| Autoscale / scale-to-zero | Fixed capacity for variable load | Tracks demand | Medium | Tune min/max; cold-start cost |
| Serverless / consumption | Idle always-on services | Pay-per-use | Medium | Re-architecture; cold starts |
| Storage tiering | Hot data that’s actually cold | 50%+ on cold blobs | Low | Retrieval cost/latency on archive |
| Log-ingestion control | Verbose/duplicated logs | Often large | Low | Don’t drop signal you need |
| Disk SKU downgrade | Premium SSD on low-IOPS disks | 30–60% on those disks | Low | Validate IOPS/throughput need |
| Egress reduction | Cross-region/internet traffic | Varies | Medium | Private Link, same-region, CDN |
| Snapshot lifecycle | Snapshots never pruned | Pure waste removed | Low | Keep a retention policy |
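The auto-stop figure in the table falls out of schedule arithmetic: a week has 168 hours, and a non-production VM that only needs to run during working hours is off for most of them. A minimal sketch, assuming compute is billed only while the VM is allocated; `autostop_saving_pct` is a hypothetical helper and the schedules are examples.

```shell
# Percentage of weekly compute hours removed by an on/off schedule.
# Usage: autostop_saving_pct HOURS_ON_PER_DAY DAYS_ON_PER_WEEK
# Hypothetical helper; covers compute only (disks and static IPs keep billing).
autostop_saving_pct() {
  awk -v h="$1" -v d="$2" 'BEGIN { printf "%.0f\n", 100 * (1 - (h * d) / 168) }'
}

autostop_saving_pct 12 5   # on 12 h/day, Monday to Friday: 60 of 168 hours
autostop_saving_pct 10 5   # a tighter 10 h/day window
```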
Auto-stop non-production
Non-production compute that runs nights and weekends is the most common easy win. Target it by the Environment tag and deallocate on a schedule (deallocated VMs stop compute charges; you still pay for disks). Azure Automation, a Logic App, or a scheduled Function all work:
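# Deallocate every VM tagged Environment=dev (run on a schedule via Automation/Functions)
ids=$(az vm list --query "[?tags.Environment=='dev'].id" -o tsv)
# Skip the call when nothing matches: an empty --ids list makes az vm deallocate fail
[ -n "$ids" ] && az vm deallocate --ids $ids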
The key distinction that catches people: Stop (deallocate) releases the compute and stops billing for it; Stop (from inside the OS) leaves the VM allocated and still billing. Always deallocate.
| VM power state | Compute billed? | Disk billed? | Public IP (static) billed? |
|---|---|---|---|
| Running | Yes | Yes | Yes |
| Stopped (OS shutdown, still allocated) | Yes | Yes | Yes |
| Stopped (deallocated) | No | Yes | Yes |
| Deleted | No | No (if disk deleted) | No (if IP deleted) |
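The table can be encoded as a small lookup for sweep scripts that need to flag VMs that look stopped but still bill. A sketch, assuming the input is a power-state code of the form `PowerState/<state>` as returned in a VM's instance view; `compute_billed` is a hypothetical helper and only the states listed in the table are classified.

```shell
# Map a VM power-state code to whether compute is still billed.
# Usage: compute_billed POWER_STATE_CODE
# Hypothetical helper; anything outside the table above is reported as unknown.
compute_billed() {
  case "$1" in
    PowerState/running)     echo "yes" ;;
    PowerState/stopped)     echo "yes" ;;   # OS shutdown: still allocated, still billed
    PowerState/deallocated) echo "no"  ;;
    *)                      echo "unknown" ;;
  esac
}

compute_billed PowerState/stopped
compute_billed PowerState/deallocated
```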
Hunt the orphans
Orphaned resources are silent, pure waste. The usual suspects and how to find them:
| Orphan type | Why it lingers | Find it | Action |
|---|---|---|---|
| Unattached managed disks | VM deleted, disk kept | Resource Graph diskState == 'Unattached' | Snapshot then delete |
| Unassociated public IPs (static) | NIC/LB deleted | Graph ipConfiguration == null | Delete |
| Stale snapshots | Backups never pruned | Graph by age on snapshots | Lifecycle-prune |
| Idle/empty App Service plans | App removed, plan kept | Plans with 0 sites | Delete the plan |
| Old disks of deallocated VMs | “We might need it” | Deallocated VM age | Review + delete |
| Unused NAT Gateways / gateways | Workload retired | Graph by association | Delete |
| Over-provisioned DB tiers | Sized for launch peak | Advisor + DTU/RU metrics | Scale down |
| Idle load balancers (no backends) | Backend pool emptied | Graph: empty backend pool | Delete |
| Orphaned NICs (no VM) | VM deleted, NIC kept | Graph virtualMachine == null | Delete |
| Premium disks on stopped VMs | Dev disks left Premium SSD | Disk SKU on deallocated VMs | Downgrade to Standard |
Budgets, anomaly detection, and closing the loop with automation
Visibility and optimization are nothing without a control loop that catches overruns before the invoice. Azure gives you budgets, anomaly alerts, and action groups; the discipline is wiring them to forecast and action, not just email.
Budgets that actually control spend
A budget is a threshold at a scope with notification rules. By itself it only emails — it does not cap spending. Two design choices make it useful: alert on forecasted spend (predicted month-end, so you act early) and attach an action group that runs automation.
# Create a subscription budget that alerts at 80% actual and 100% forecast, to an action group
az consumption budget create \
--budget-name "sub-monthly-cap" \
--amount 500000 --time-grain Monthly \
--start-date 2026-06-01 --end-date 2027-06-01 \
--category Cost \
--notifications '{
"actual80": { "enabled": true, "operator": "GreaterThan", "threshold": 80,
"contactGroups": ["/subscriptions/'$SUB'/resourceGroups/rg-finops/providers/microsoft.insights/actionGroups/ag-finops"] },
"forecast100": { "enabled": true, "operator": "GreaterThan", "threshold": 100, "thresholdType": "Forecasted",
"contactGroups": ["/subscriptions/'$SUB'/resourceGroups/rg-finops/providers/microsoft.insights/actionGroups/ag-finops"] }
}'
In Bicep, ship budgets as code per landing zone so every new subscription is born with a guardrail:
// Budgets that guard a whole subscription are deployed at subscription scope
targetScope = 'subscription'

// Resource ID of the action group the alerts notify (passed in per landing zone)
param actionGroupId string

resource budget 'Microsoft.Consumption/budgets@2023-11-01' = {
  name: 'sub-monthly-cap'
  properties: {
    category: 'Cost'
    amount: 500000
    timeGrain: 'Monthly'
    timePeriod: { startDate: '2026-06-01', endDate: '2027-06-01' }
    notifications: {
      actual80: { enabled: true, operator: 'GreaterThan', threshold: 80, contactGroups: [ actionGroupId ], thresholdType: 'Actual' }
      forecast100: { enabled: true, operator: 'GreaterThan', threshold: 100, contactGroups: [ actionGroupId ], thresholdType: 'Forecasted' }
    }
  }
}
The budget knobs and how to reason about each:
| Setting | What it does | Default / typical | When to change |
|---|---|---|---|
| Amount | The threshold value | Your monthly cap | Set per scope from baseline + growth |
| Time grain | Reset cadence | Monthly | Quarterly/Annual for capex-style caps |
| Scope | Where it measures | Subscription | RG for team-level; MG for BU |
| Threshold % | Alert trip points | 80/100 | Add an early 50% for fast-growing subs |
| Threshold type | Actual vs Forecasted | Actual | Forecasted to act before overage |
| Action group | What fires on breach | Email only | Attach automation to control, not just notify |
| Filters | Restrict to a tag/RG/service | None | Budget a single team/product via tag filter |
| Reset / recurrence period | Start & end of the budget window | 1 year | Re-baseline annually as the estate grows |
| Notification recipients | Emails / contact roles / groups | Owner email | Route to the team that can act, not a shared inbox |
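To see why the threshold type matters, it helps to evaluate both rules against the same month. The sketch below mimics the two notifications used in this article (80% of actual spend, 100% of forecasted spend); `budget_alerts` is a hypothetical helper, not how Cost Management evaluates budgets internally, and the amounts are sample values.

```shell
# Which of the two notifications would fire for a given month?
# Usage: budget_alerts BUDGET_AMOUNT ACTUAL_TO_DATE FORECAST_MONTH_END
# Hypothetical helper mirroring the actual80 / forecast100 rules shown earlier.
budget_alerts() {
  awk -v amount="$1" -v actual="$2" -v forecast="$3" 'BEGIN {
    if (100 * actual / amount > 80)    print "actual80"
    if (100 * forecast / amount > 100) print "forecast100"
  }'
}

# Mid-month: only 60% spent so far, but on track for 112% of budget
budget_alerts 500000 300000 560000
```

In this sample only the forecast rule fires, which is the early warning an actual-only budget would not give until much later in the month.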
Anomaly detection: catch the unexpected
Budgets catch known limits; anomaly alerts catch unexpected deviations (a leaked key spinning up VMs, a log explosion, a runaway query) using Cost Management’s built-in ML. Subscribe to anomaly alerts so a 3× day-over-day jump pages you in hours, not on the invoice.
| Detection mechanism | Catches | Latency | Best for |
|---|---|---|---|
| Budget (actual) | Crossing a known threshold | Hours–day | Hard caps you set |
| Budget (forecast) | Predicted to cross threshold | Days early | Acting before the overage |
| Anomaly alert | Statistically unusual spend | ~Daily | Unknown unknowns (leaks, runaways) |
| Scheduled export + query | Anything you script a check for | Daily | Custom rules (per-team caps, ratios) |
| Advisor (cost) | Right-size/idle opportunities | Continuous | Proactive savings, not overruns |
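The "scheduled export + query" row is where custom rules live. As an illustration, the sketch below flags any day whose cost is at least a given multiple of the previous day's. It is a naive threshold rule, not the statistical model behind Cost Management's anomaly alerts; `flag_spikes` is a hypothetical helper and the daily figures are sample data.

```shell
# Flag days whose cost is at least FACTOR times the previous day.
# Usage: flag_spikes FACTOR   with "date cost" lines on stdin, in date order.
# Hypothetical helper; a simple day-over-day ratio, not ML-based detection.
flag_spikes() {
  awk -v factor="$1" '
    NR > 1 && prev > 0 && $2 >= factor * prev { printf "%s %.1fx\n", $1, $2 / prev }
    { prev = $2 }'
}

printf '2026-06-01 1000\n2026-06-02 1100\n2026-06-03 3600\n2026-06-04 3500\n' | flag_spikes 3
```

Only the first day of a jump is flagged; the days that follow at the new level compare against the already elevated previous day.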
Close the loop: action groups → automation
The control becomes real when the alert does something. Wire the budget/anomaly action group to an Automation runbook or Function that takes a safe action — deallocate non-prod, or post to the team channel with the offending resource and a one-click stop.
# Action group that triggers an Automation webhook on budget/anomaly breach
az monitor action-group create \
--name ag-finops --resource-group rg-finops \
--action webhook stopNonProd "https://<automation-webhook-url>" \
--action email finops finops@example.com
The escalation ladder — match the action to the severity and the environment:
| Trigger | Severity | Safe automated action | Human action |
|---|---|---|---|
| Non-prod budget 80% (actual) | Low | Post to channel | Review what’s running |
| Non-prod budget 100% (forecast) | Medium | Deallocate Environment=dev VMs | Confirm nothing legit broke |
| Prod budget 100% (forecast) | High | Notify only (never auto-kill prod) | Investigate; scale/optimize |
| Anomaly: 3× day-over-day | High | Snapshot context, page on-call | Identify the runaway/leak |
| Anomaly in a sandbox sub | Medium | Throttle / deallocate sandbox | Find who/what spun up |
The cardinal rule: automate destructive actions only in non-production. A budget breach in prod is a notify-and-investigate event — never let automation deallocate production because a forecast crossed a line.
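The ladder and the cardinal rule fit in a single dispatch function that a runbook could call before doing anything. A minimal sketch: `choose_action`, the environment names and the trigger labels are all hypothetical, and the function only prints the action name so the destructive step stays a separate, reviewable call.

```shell
# Pick the automated action for an alert, keeping production non-destructive.
# Usage: choose_action ENVIRONMENT TRIGGER
# Hypothetical helper; labels (actual80, forecast100, anomaly) are illustrative.
choose_action() {
  case "$1:$2" in
    prod:anomaly)    echo "page-on-call" ;;     # prod: investigate, never auto-kill
    prod:*)          echo "notify" ;;
    *:actual80)      echo "post-to-channel" ;;
    *:forecast100)   echo "deallocate" ;;
    sandbox:anomaly) echo "deallocate" ;;
    *:anomaly)       echo "page-on-call" ;;
    *)               echo "notify" ;;
  esac
}

choose_action prod forecast100   # production is matched first, so it can only notify
choose_action dev forecast100
```

Listing the production patterns first means no later rule can ever reach a destructive action for that environment.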
Architecture at a glance
The diagram traces spend the way it actually moves through a mature cost program — left to right as a closed loop — and marks the five places it leaks. Read it as a pipeline. In GOVERN + TAG, the management group anchors policy that flows down to subscriptions and resource groups; Azure Policy denies untagged resources and inherits CostCenter/Owner/Environment so every resource is born attributable (badge 1 marks the leak: weak enforcement → an unallocated bucket). Those tagged resources emit usage records into INGEST, where Cost Management amortizes commitments and a daily FOCUS export lands in ADLS Gen2 (badge 2: reading ActualCost instead of Amortized skews every trend). The amortized data feeds ALLOCATE, where showback slices cost per team by tag and a shared-split allocation rule distributes the hub firewall, Bastion and Log Analytics back to teams so the numbers reconcile to 100% (badge 3: unsplit shared cost makes showback under-count).
From allocation you derive a savings target that drives OPTIMIZE — Reservations / Savings Plans / Hybrid Benefit cut the rate on the stable baseline (badge 4: a Shared-scope commitment discounts a team that never paid), while right-sizing + auto-stop cut the usage. Finally ACT closes the loop: budgets alert on forecast and anomaly, and an action group triggers a Function/runbook that remediates — deallocating non-prod, or remediating tags back at the GOVERN stage (badge 5: a budget that emails but triggers no action lets non-prod burn over a weekend). Notice the loop closes — the remediate flow runs from ACT back to GOVERN, because the output of every overrun is a tightened policy or a stopped resource at the origin. The whole method is: govern so it’s attributable, ingest amortized, allocate to 100%, optimize rate and usage, and act on forecast with automation — and every numbered badge is a specific, confirmable leak with a one-command check.
Real-world scenario
Northwind Commerce runs a multi-tenant retail platform on Azure across 38 subscriptions organized under an mg-landingzones management group: a shared platform subscription (hub VNet, Azure Firewall, Bastion, a central Log Analytics workspace, an Application Gateway), and per-product subscriptions for checkout, search, catalog, billing and a dozen others, plus sandbox subs per team. The FinOps function is two people inside the platform team. Monthly Azure spend had grown to about ₹1.9 crore and the forecast was wrong by 30–40% every month — finance had started talking about a hard spend freeze.
The first audit was brutal. Cost Analysis grouped by CostCenter showed 41% “unallocated” — nearly half the bill belonged to no team. A chart of the billing product showed a 9× spike in March then near-zero in April, which finance had flagged as “a billing bug”; it was actually a ₹14,00,000 1-year SQL Reservation bought upfront and read in ActualCost. The platform subscription — firewall, Bastion, Log Analytics, gateway — was ₹38,00,000/month and charged to nobody, so every product’s showback was wildly understated and no team believed the numbers. Sandbox subscriptions ran 24×7; one had spun up eight Standard_NC GPU VMs for “a quick experiment” six weeks earlier and left them running — about ₹6,00,000 of pure waste discovered only because someone finally grouped by resource.
The remediation ran in three waves over a quarter. Wave 1 — make it attributable. They assigned require-tag (deny) for CostCenter, Owner, Product, Environment and inherit-tag (modify) for the same at mg-landingzones, then ran a remediation task that backfilled tags from resource groups; the unallocated bucket fell from 41% to under 4% in two weeks. They switched every report, budget and export to AmortizedCost, and the “billing bug” vanished — the SQL Reservation now showed as a flat ~₹1,16,000/month. Wave 2 — allocate to 100%. A cost allocation rule split the platform subscription proportionally to each product’s compute spend; for the first time per-team showback summed to the invoice, and the product teams accepted the numbers because they could see why they owed a slice of the firewall.
Wave 3: optimize and close the loop. With clean tags and trusted allocation, they right-sized 60+ over-provisioned VMs and databases off Advisor (~₹11,00,000/month), put a scheduled Function on Environment=sandbox/dev to deallocate nightly and weekends (~₹9,00,000/month), and, only after right-sizing, bought a 3-year Savings Plan sized to the P50 of steady compute at Single scope per product (so each team’s discount stayed with that team), layering Azure Hybrid Benefit on their owned Windows/SQL licenses. Finally they shipped budgets-as-Bicep into every landing zone: a per-sub budget alerting at 80% actual and 100% forecast, anomaly alerts, and an action group that posts to the team channel and (for non-prod only) triggers the deallocate runbook. In the following quarter, monthly spend landed at ₹1.34 crore, a ~30% reduction from ₹1.9 crore while compute capacity grew 18%, and, more importantly, the forecast came within 6% every month, so finance dropped the freeze. The lesson on the wall: “You can’t optimize what you can’t attribute: fix tags and amortization first, or every later number is a fight.”
The program as a before/after, because the order of the fixes is the lesson:
| Stage | Before | Action | After |
|---|---|---|---|
| Attribution | 41% unallocated | Deny + inherit tags at MG; remediate | <4% unallocated |
| Trend accuracy | “9× then zero” billing chart | Switch all reports to AmortizedCost | Flat, real consumption trend |
| Allocation | Platform sub charged to nobody | Allocation rule splits shared cost | Showback sums to 100% of invoice |
| Usage | 60+ oversized; sandbox 24×7; GPU orphans | Advisor right-size + auto-stop + delete | ~₹20,00,000/mo removed |
| Rate | All PAYG | Right-size then 3-yr SP (Single) + AHB | Deep discount on stable floor |
| Control | Surprise on the invoice | Budgets (forecast) + anomaly + runbook | Overruns caught in hours |
Advantages and disadvantages
The FinOps operating model both prevents a class of expensive surprises and imposes real discipline. Weigh it honestly:
| Advantages (why this model helps you) | Disadvantages (why it bites) |
|---|---|
| Cost Management is native and free — no third-party tool needed to start | Doing it well (allocation, automation, FOCUS lakehouse) is real engineering effort |
| Tag governance via policy makes every cost attributable and reconcilable to the invoice | Tag discipline is unforgiving — one missing policy and the unallocated bucket grows; tags aren’t retroactive |
| Amortized reporting gives finance a stable, trustworthy trend to forecast against | The Actual-vs-Amortized distinction is subtle and silently breaks analysis if misread |
| Reservations/SP/AHB/Spot cut steady-state cost 30–70% without changing the workload | Commitments lock you in (term, region, SKU/scope); over-buying or wrong scope wastes money |
| Budgets + anomaly + automation catch overruns in hours, not on the invoice | Budgets don’t cap spend by default; automation on prod is dangerous — careful scoping required |
| Showback creates accountability so each team trims its own waste, preserving velocity | Chargeback adds cross-charging friction and needs trust + clean allocation first |
| Right-sizing and auto-stop are fast, low-risk wins off Advisor’s prioritized list | Aggressive right-sizing without headroom causes performance incidents under peak |
The model is right for any organization past a single team — the cost of not doing it is paid in bill-shock, blunt spend freezes, and unattributable waste. It bites hardest when treated as a finance afterthought rather than an engineering practice: tags applied late (so cost is already unallocated), commitments bought before a baseline exists, and automation bolted onto production where it can do damage. Every disadvantage is manageable — and the whole point is to make cost a continuous, low-friction part of how you build, not a quarterly fire-drill.
Hands-on lab
Stand up the core controls on one subscription — tag enforcement, a budget with forecast alerting, an amortized query, and an orphan hunt — all using free Cost Management and a near-zero-cost test resource. Run in Cloud Shell (Bash).
Step 1 — Variables and a resource group.
RG=rg-finops-lab
LOC=centralindia
SUB=$(az account show --query id -o tsv)
az group create -n $RG -l $LOC -o table
Step 2 — Assign a require-tag (deny) policy at the resource-group scope (we scope to the RG for a safe, reversible lab; in production you’d scope to a management group).
az policy assignment create \
--name "lab-require-costcenter" \
--display-name "Lab: require CostCenter (deny)" \
--scope "/subscriptions/$SUB/resourceGroups/$RG" \
--policy "871b6d14-10aa-478d-b590-94f262ecfa99" \
--params '{ "tagName": { "value": "CostCenter" } }'
Step 3 — Prove the deny works. Try to create a public IP without the tag (expect a policy denial), then with it (expect success):
# Expect: RequestDisallowedByPolicy — the deny fired
az network public-ip create -g $RG -n pip-untagged -o table
# Expect: success — the required tag is present
az network public-ip create -g $RG -n pip-tagged --tags CostCenter=CC-LAB Owner=you Environment=dev -o table
The first command failing with RequestDisallowedByPolicy is the lab’s core lesson: untagged spend can’t be created, so it can’t become unallocated.
Step 4 — Create a budget with a forecast alert. A ₹1,000 monthly budget alerting at 80% actual and 100% forecast (swap in an action group ID if you have one):
az consumption budget create \
--budget-name "lab-budget" \
--amount 1000 --time-grain Monthly \
--start-date 2026-06-01 --end-date 2026-12-01 \
--category Cost \
--notifications '{
"actual80": { "enabled": true, "operator": "GreaterThan", "threshold": 80, "contactEmails": ["you@example.com"] },
"forecast100": { "enabled": true, "operator": "GreaterThan", "threshold": 100, "thresholdType": "Forecasted", "contactEmails": ["you@example.com"] }
}'
Step 5 — Query amortized cost for the subscription, grouped by CostCenter. This is the month-end pack in one call:
az rest --method post \
--uri "https://management.azure.com/subscriptions/$SUB/providers/Microsoft.CostManagement/query?api-version=2024-08-01" \
--body '{ "type": "AmortizedCost", "timeframe": "MonthToDate",
"dataset": { "granularity": "None",
"aggregation": { "total": { "name": "Cost", "function": "Sum" } },
"grouping": [ { "type": "TagKey", "name": "CostCenter" } ] } }' \
--query "properties.rows"
Expected: rows of [cost, CostCenter, currency] — your CC-LAB resources appear under their tag; anything untagged appears as a blank bucket (which, after Step 2, should be shrinking).
Step 6 — Hunt orphans with Resource Graph. Find unattached disks and unassociated static IPs across the subscription:
az graph query -q "Resources
| where (type == 'microsoft.compute/disks' and properties.diskState == 'Unattached')
or (type == 'microsoft.network/publicipaddresses' and isnull(properties.ipConfiguration))
| project name, type, resourceGroup, location" -o table
Validation checklist. You enforced tagging (deny blocked an untagged create), created a budget that alerts on forecast before the overage, pulled amortized cost grouped by team in one API call, and inventoried orphaned waste — the four pillars of the loop, on one subscription.
| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 2–3 | Deny untagged create | Untagged spend can’t be born | MG-scope tag governance |
| 4 | Budget with forecast alert | You act before the overage | Per-sub budgets-as-code |
| 5 | Amortized query by tag | The correct metric, scripted | Month-end showback pack |
| 6 | Resource Graph orphan hunt | Waste is findable and deletable | Monthly orphan sweep |
Cleanup (avoid lingering charges).
az policy assignment delete --name "lab-require-costcenter" --scope "/subscriptions/$SUB/resourceGroups/$RG"
az consumption budget delete --budget-name "lab-budget"
az group delete -n $RG --yes --no-wait
Cost note. A static public IP is a few paise per hour; the whole lab runs well under ₹20, and deleting the resource group plus the budget/policy stops everything. Cost Management, budgets, and the Query API are free.
Common mistakes & troubleshooting
This is the playbook — the part you bookmark and reopen at month-end. First as a scannable table, then the entries that bite hardest with full confirm-detail.
| # | Symptom | Root cause | Confirm (exact cmd / portal path) | Fix |
|---|---|---|---|---|
| 1 | Large “unallocated / no CostCenter” bucket | No tag-governance policy, or only at sub scope | Cost Analysis → group by CostCenter; az policy state summarize low compliance | Deny + inherit tags at MG scope; remediation task |
| 2 | A team “tripled then dropped to zero” | Reading ActualCost; a commitment landed | Cost Analysis metric = Actual; spike aligns with RI/SP buy date | Switch all reports/budgets/exports to AmortizedCost |
| 3 | Per-team showback < invoice total | Shared/hub cost not allocated to teams | Sum grouped amortized < account amortized | Add a cost allocation rule for shared RGs |
| 4 | Reservation discount on a team that didn’t buy | Shared scope auto-applies org-wide | az reservations reservation list → appliedScopeType == Shared | Re-scope to Single; default new buys Single |
| 5 | Low RI/SP utilization, money wasted | Over-bought, or baseline shrank/re-architected | Reservations → Utilization < 90% | Re-scope Shared to pool; resize/let lapse at renewal |
| 6 | Budget “fired” but spend kept climbing | Budget emails only; no action group / no forecast | Budget has notifications, no action group | Wire to action group → runbook; alert on forecast |
| 7 | Tagged today but last month still unallocated | Tags aren’t retroactive in cost data | Cost predates the tag application | Enforce + remediate early; can’t backfill old cost |
| 8 | Non-prod bill high despite “stopped” VMs | VMs stopped from OS, still allocated | az vm get-instance-view → PowerState/stopped (not deallocated) | Deallocate (az vm deallocate), not OS shutdown |
| 9 | Storage cost creeping with little new data | Hot tier for cold data; orphaned snapshots/disks | Cost by meter; Resource Graph orphan query | Lifecycle-tier to cool/archive; delete orphans |
| 10 | Log Analytics / App Insights bill exploded | Verbose or duplicated ingestion | Cost by service = Monitor; ingestion volume spike | Sampling, table-level retention, drop noisy logs |
| 11 | Export job produces no/partial files | Wrong scope, storage RBAC, or schema mismatch | az costmanagement export show; storage container empty | Fix scope/RBAC; re-run; verify FOCUS schema |
| 12 | Anomaly/overrun found weeks late on invoice | No anomaly alert; no forecast budget | No anomaly subscription; budgets actual-only | Enable anomaly alerts + forecast budgets |
| 13 | Chargeback numbers rejected by finance | Amortized used for cash tie-out (or vice-versa) | Numbers don’t tie to GL/invoice | Actual for cash tie-out, Amortized for showback |
| 14 | Right-sized then performance incidents | Downsized without peak headroom | Advisor applied blindly; p95 CPU/RU pinned post-change | Re-size up; validate against real peak before cutting |
The expanded form, for the entries that cost the most time and money:
1. A large “unallocated / no CostCenter” bucket in Cost Analysis.
Root cause: No require-tag (deny) and no inherit-tag (modify) policy, or they’re only at subscription scope so new subs and service-created resources slip through.
Confirm: Cost Analysis → group by CostCenter shows a big blank bucket; az policy state summarize --management-group <mg> --filter "PolicyAssignmentName eq 'require-costcenter'" shows low compliance; Resource Graph Resources | where isnull(tags['CostCenter']) lists the offenders.
Fix: Assign deny + inherit at the management-group root; run a remediation task to backfill existing resources; add allowedValues on Environment to stop value fragmentation.
2. A team “tripled then went to zero.”
Root cause: Reporting in ActualCost, so an upfront Reservation/Savings Plan purchase posts its whole charge on the buy date, then ₹0 for that resource over the term.
Confirm: The Cost Analysis metric selector reads Actual; the spike date matches a reservation order’s purchase date (az reservations reservation-order list).
Fix: Switch every report, budget, and export to AmortizedCost; reserve Actual only for cash-invoice reconciliation.
3. Per-team showback sums to less than the invoice.
Root cause: Shared services (hub firewall, Bastion, Log Analytics, gateways) in the platform subscription aren’t allocated to teams.
Confirm: Sum the per-CostCenter amortized totals from the Query API; it’s less than the account amortized total.
Fix: Add a cost allocation rule that splits the shared RGs/subscription to teams by a basis (proportional to compute is usually fairest); re-check that the per-team sum now equals the invoice.
4. A reservation discount landed on a team that never bought it.
Root cause: The commitment was purchased with Shared applied-scope, so its discount auto-applies to any matching resource across the billing account.
Confirm: az reservations reservation list shows appliedScopeType == Shared.
Fix: az reservations reservation update --applied-scope-type Single --applied-scopes /subscriptions/<id>; make Single the default for new commitments unless you deliberately want pooled utilization.
6. A budget “fired” but spend kept climbing.
Root cause: Budgets don’t cap spend — by default they email. The alert wasn’t wired to an action group that runs automation, and it alerted on actual (too late) rather than forecast.
Confirm: The budget shows notifications but no contactGroups/action group; threshold type is Actual.
Fix: Attach an action group → Automation runbook/Function that deallocates non-prod; add a Forecasted threshold so you act days before the overage. Never auto-deallocate production.
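Why a Forecasted threshold buys you days can be shown with a toy projection. The sketch below uses a naive linear run rate as a stand-in; Cost Management's forecast uses its own model, so treat `linear_forecast` and the sample figures as assumptions. The 80% actual and 100% forecast thresholds are the ones this article recommends.

```python
def linear_forecast(spend_to_date, day_of_month, days_in_month):
    """Naive month-end projection from the average daily run rate so far."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month out of range")
    return spend_to_date / day_of_month * days_in_month


def fired_alerts(spend_to_date, day_of_month, days_in_month, budget,
                 actual_pct=80, forecast_pct=100):
    """Return which of the two thresholds would fire today."""
    alerts = []
    if spend_to_date >= budget * actual_pct / 100:
        alerts.append("actual")
    projected = linear_forecast(spend_to_date, day_of_month, days_in_month)
    if projected >= budget * forecast_pct / 100:
        alerts.append("forecast")
    return alerts


# Hypothetical: 1,000,000 budget, 400,000 spent by day 10 of a 30-day month.
print(fired_alerts(400_000, 10, 30, 1_000_000))
```

On day 10 only the forecast threshold has fired: actual spend is at 40% of budget, but the run rate already points past 100%, so the automation gets triggered while there is still time to act.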
8. Non-prod bill stays high even though VMs are “stopped.”
Root cause: The VMs were stopped from inside the OS (or “Stop” that leaves them allocated) — compute is still billed. Only deallocated VMs stop compute charges.
Confirm: az vm get-instance-view --ids <id> --query "instanceView.statuses[?starts_with(code,'PowerState')].code" shows PowerState/stopped rather than PowerState/deallocated.
Fix: Use az vm deallocate (or the auto-stop runbook) — and remember disks and static IPs still bill even when deallocated.
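If you collect the power-state codes from the query above across many VMs, a small helper can flag the ones that still need deallocating. This is a sketch: the function name is hypothetical, and it deliberately treats anything other than `PowerState/deallocated` as still allocated.

```python
def needs_deallocate(status_codes):
    """True unless the VM reports PowerState/deallocated.

    Conservative on purpose: stopped, stopping, running and any other
    power state are all treated as still allocated.
    """
    power = [code for code in status_codes if code.startswith("PowerState/")]
    if not power:
        raise ValueError("no PowerState code in the instance view statuses")
    return power[0] != "PowerState/deallocated"


print(needs_deallocate(["ProvisioningState/succeeded", "PowerState/stopped"]))
print(needs_deallocate(["PowerState/deallocated"]))
```

The first call returns True (an OS-level stop leaves the VM allocated) and the second returns False.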
10. Log Analytics / Application Insights bill exploded.
Root cause: Verbose, duplicated, or unsampled ingestion: a chatty app, debug logging left on, or multiple agents shipping the same data.
Confirm: Cost by service shows Monitor climbing and the workspace’s ingestion volume spikes (the same telemetry story covered in Azure Monitor and Application Insights).
Fix: Turn on adaptive sampling, set table-level retention (keep verbose tables short), drop noisy logs at the data collection rule, and consolidate duplicate agents, without dropping signal you need for incidents.
Best practices
- Enforce tagging at the management-group root, with deny + inherit + remediation. Tags applied by hand decay; the only durable attribution is “untagged can’t be created” plus “inherit from the RG” plus a remediation task for what already exists. Do this first — every later number depends on it.
- Report everything in AmortizedCost. Showback, budgets, trends, exports — all Amortized. Reserve ActualCost strictly for tying out to the cash invoice. Reading the wrong metric is the most common analysis error.
- Allocate to 100%, including shared cost. Use cost allocation rules to split the hub firewall, Bastion, Log Analytics and gateways back to teams, or showback never reconciles and teams reject it.
- Right-size before you commit. Cut over-provisioned SKUs from Advisor’s recommendations first, then buy Reservations/Savings Plans against the smaller, stable floor; never commit to oversized capacity.
- Default commitments to Single scope. Single keeps each team’s discount with the team that paid for it (clean chargeback). Use Shared deliberately, only when you want pooled utilization and accept messier attribution.
- Layer the rate levers. Apply Hybrid Benefit to owned licenses, cover the steady baseline with a Savings Plan (flexible) or Reservations (deeper for a known shape), and burst interruptible work on Spot.
- Auto-stop non-production by the Environment tag. Deallocate (not OS-stop) dev/test/sandbox nights and weekends: ~65% off non-prod compute for a few lines of automation.
- Budgets-as-code in every landing zone. Ship a per-subscription budget in Bicep so every new sub is born with a guardrail; alert on forecast at 100% and actual at 80%.
- Wire alerts to action, and automate destructively only in non-prod. A budget that emails is a smoke detector with no sprinkler; attach a runbook. Never let automation deallocate production.
- Enable anomaly alerts. Budgets catch known limits; anomaly detection catches the unknowns — leaked keys, runaway queries, log explosions — in hours rather than on the invoice.
- Run a regular cadence. Weekly anomaly/utilization review, monthly showback pack and orphan sweep, quarterly commitment and right-sizing review. Cost is a habit, not a project.
- Hunt orphans on a schedule. Unattached disks, unassociated static IPs, stale snapshots, empty App Service plans — a monthly Resource Graph sweep removes pure waste.
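The ~65% auto-stop figure quoted in the list above is roughly what falls out of a nights-and-weekends schedule. The sketch below works it through; the 12-hours-a-day, 5-days-a-week window is an assumption, so plug in your own working hours.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def auto_stop_saving(on_hours_per_day, on_days_per_week):
    """Fraction of always-on compute hours avoided by a deallocate schedule."""
    on_hours = on_hours_per_day * on_days_per_week
    if not 0 <= on_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule exceeds the hours in a week")
    return 1 - on_hours / HOURS_PER_WEEK


# Assumed schedule: non-prod runs 12 hours a day, Monday to Friday.
saving = auto_stop_saving(12, 5)
print(f"{saving:.1%}")  # 64.3%
```

This covers compute hours only; disks and static IPs keep billing while the VMs are deallocated, so the saving on the whole non-prod bill is somewhat lower.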
Security notes
- Cost data is sensitive — treat it like architecture documentation. A cost export reveals your SKUs, scale, regions and service mix. Store exports in a dedicated FinOps storage account with restricted RBAC (and ideally a private endpoint), not a shared bucket.
- Least-privilege Cost Management roles. Grant Cost Management Reader to analysts who only view, Cost Management Contributor to those who manage budgets/exports, and reserve Owner/Billing roles for the few who purchase commitments. Don’t hand out billing-account access to read a chart.
- Separate “see cost” from “spend money.” Viewing cost (Reader) is very different from buying a 3-year Reservation (a real financial commitment). Gate purchases behind a named approver and an audit trail, not a broad role.
- Automation runbooks act with power — scope their identity tightly. The Function/Automation managed identity that deallocates non-prod must be limited to non-production scopes with only the deallocate/stop actions it needs — never Owner, never production. A compromised over-privileged cost runbook could take down prod.
- Cost anomalies are a security signal. A sudden spend spike is often the first visible sign of a compromise — leaked credentials spinning up crypto-mining VMs, or data exfiltration egress. Route anomaly alerts to security, not just finance.
- Don’t leak secrets through tags. Owner emails and CostCenter codes are fine; never put credentials, tokens, or sensitive identifiers in tags, because tags are broadly readable and surface in exports and Resource Graph.
- Audit who changes budgets and commitments. Budget thresholds and reservation scopes are control-plane changes; review them in the Activity log so a silently-raised budget or a re-scoped reservation doesn’t hide an overrun.
Cost & sizing
FinOps tooling is itself nearly free; the spend is in the workload, and the practice is what right-sizes it. What drives the (small) tooling cost and the (large) savings:
- Cost Management, budgets, anomaly alerts and the Query API are free. There is no excuse not to run the loop. The only direct costs are exports (you pay for the storage they write to — pennies, tiered with lifecycle rules) and any automation (a Consumption Function/Automation runbook for auto-stop costs a few rupees a month).
- The savings dwarf the tooling. Right-sizing typically reclaims 20–50% on the affected resources; auto-stop reclaims ~65% of non-production compute; Reservations/Savings Plans cut steady-state compute 30–72%; Hybrid Benefit removes license cost on eligible Windows/SQL. A program that touches all four routinely lands 25–40% off a previously un-optimized bill.
- Right-size before committing, so you don’t pay a 3-year discount on capacity you didn’t need. Commit the floor (P50 of steady usage), not the peak — leave headroom on pay-as-you-go and Spot.
- The two FinOps engineers pay for themselves many times over at this scale: on a ₹1.9 crore/month bill, a 30% reduction is ~₹57,00,000/month — the headcount is a rounding error against it.
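The headline arithmetic is worth checking by hand. A short sketch, using the ₹1.9 crore bill from the bullet above and the 25–40% range quoted for a full program; integer rupees keep the result exact.

```python
CRORE = 10_000_000

monthly_bill = 19 * CRORE // 10  # ₹1.9 crore, in whole rupees


def monthly_saving(bill, reduction_pct):
    """Rupees saved per month for a given percentage reduction."""
    return bill * reduction_pct // 100


for pct in (25, 30, 40):
    print(pct, monthly_saving(monthly_bill, pct))
# 25 4750000
# 30 5700000
# 40 7600000
```

At 30% that is ₹57,00,000 a month, with the 25–40% range spanning ₹47,50,000 to ₹76,00,000.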
A rough monthly picture of the tooling cost for a large multi-subscription estate (the workload is separate and is what you’re optimizing):
| Tooling cost driver | What you pay for | Rough INR / month | What it enables | Watch-out |
|---|---|---|---|---|
| Cost Management + budgets + anomaly | Native service | ₹0 | The entire analysis + alert loop | None — it’s free |
| Query API | Native API | ₹0 | Scripted month-end packs, dashboards | Throttled at high call rates |
| Daily FOCUS export storage | ADLS Gen2 GB-month | ~₹100–1,000 | Lakehouse-scale analysis, history | Lifecycle-tier old months |
| Auto-stop automation | Function/Automation runs | ~₹50–500 | ~65% off non-prod compute | Schedule must respect work hours |
| Lakehouse compute (optional) | Spark/SQL for exports | Varies | Cost joined to KPIs / unit economics | Only if you outgrow the portal |
| Net effect | — | Tooling ≈ ₹1k–2k | Savings ≈ 25–40% of the bill | Effort is the real cost, not money |
Interview & exam questions
1. What is the difference between AmortizedCost and ActualCost, and which do you use for showback? ActualCost records a charge on the day it hits the account, so an upfront Reservation shows its whole cost on the purchase day then ₹0 over the term. AmortizedCost spreads commitments evenly across their term, reflecting consumption. Use Amortized for showback, budgets and trends; use Actual only to reconcile to the cash invoice.
2. You see a large “unallocated” bucket in Cost Analysis. What’s the cause and the durable fix? Resources shipped untagged because there’s no tag-governance policy (or only at subscription scope). The durable fix is deny (require-tag) plus modify (inherit-tag) at the management-group root, then a remediation task to backfill existing resources — and accept that tags aren’t retroactive, so cost before tagging stays unallocated.
3. A reservation’s discount is landing on a team that never paid for it. Why, and how do you fix it? The reservation was bought with Shared applied-scope, which auto-applies its discount to any matching resource across the billing account. Re-scope it to Single (the subscription that owns the baseline) with az reservations reservation update --applied-scope-type Single, and default new commitments to Single for clean chargeback.
4. How do you make a budget actually control spend rather than just notify? A budget only emails by default. Attach an action group that triggers an Automation runbook/Function to take a safe action (deallocate non-prod), and alert on Forecasted spend so you act before the overage. Critically, never auto-deallocate production — that’s a notify-and-investigate event.
5. Reservations vs Savings Plans — when do you pick which? Reservations commit to a specific SKU family + region and give the deepest discount (up to ~72%) for a known, stable shape. Savings Plans commit to a fixed $/hour of compute with full region/SKU flexibility (up to ~65%) and are forgiving when your shapes change. Pick RIs for a fixed baseline you’re confident in; SPs when the workload mix evolves.
6. Why must you right-size before buying commitments? Commitments lock in a rate for whatever capacity you run; if you commit to oversized resources you pay a multi-year discount on waste. Right-size off Advisor first, then commit to the smaller, stable floor of usage — never the pre-optimization or peak number.
7. How do you allocate shared services (hub firewall, Log Analytics) so showback reconciles to the invoice? Use a cost allocation rule that splits the shared resource group/subscription to teams by a basis — proportional to compute spend is usually fairest. Without it, the sum of per-team showback is always less than the bill and teams reject the numbers. Validate by checking the per-team amortized total equals the account amortized total.
8. A dev VM was “stopped” but still costs money. Why? It was stopped from inside the OS (or otherwise left allocated) — Azure still bills compute for allocated VMs. Only deallocated VMs stop compute charges (az vm deallocate); even then, disks and static public IPs continue to bill.
9. What does the FOCUS schema give you over the legacy export formats? FOCUS (FinOps Open Cost and Usage Specification) is a vendor-neutral, standardized column set, so the same queries and dashboards work across clouds and survive a billing-format change. It future-proofs a lakehouse cost pipeline and eases multi-cloud unit-economics.
10. How do you catch a runaway cost (a leaked key spinning up VMs) before the invoice? Budgets catch known thresholds; anomaly alerts (Cost Management’s built-in ML) catch statistically unusual spend day-over-day and page you in hours. Wire both to an action group, and route anomaly alerts to security too — a spend spike is often the first visible sign of a compromise.
11. What’s the difference between showback and chargeback, and which do you start with? Showback shows each team its cost without moving money (low friction — start here). Chargeback actually bills the cost to the team’s budget (real accountability, more friction). Move to chargeback only after showback is trusted and you can attribute ~100% of the invoice, including shared cost.
12. Which Azure roles separate “viewing cost” from “spending money,” and why does it matter? Cost Management Reader views cost; Cost Management Contributor manages budgets/exports; purchasing Reservations/Savings Plans needs billing/owner-level rights. Separating them enforces least privilege — viewing a chart shouldn’t grant the ability to make a 3-year financial commitment.
These map to AZ-104 (Administrator) — monitor and manage Azure resources, cost management, budgets, tags — and AZ-305 (Solutions Architect) — design a cost-optimized architecture, governance, and the resource-organization/allocation model. The commitment and billing depth touches the Microsoft FinOps guidance and the FinOps Framework certification. A compact mapping for revision:
| Question theme | Primary cert | Objective area |
|---|---|---|
| Tags, policy governance, allocation | AZ-104 / AZ-305 | Governance; resource organization |
| Cost Management, budgets, alerts | AZ-104 | Monitor & manage resources |
| Amortized vs Actual, exports/FOCUS | FinOps Framework | Inform / data |
| Reservations, Savings Plans, AHB, Spot | AZ-305 / FinOps | Cost-optimized design; Optimize |
| Right-sizing, auto-stop, anomalies | AZ-305 / FinOps | Optimize / Operate |
| Showback vs chargeback, scope/roles | FinOps Framework | Operate; allocation |
Quick check
- A chart shows one team’s spend at 9× in March and near-zero in April. Which cost metric are you almost certainly reading, and what should you switch to?
- Your per-team showback sums to ₹1.2 crore but the invoice says ₹1.6 crore. What’s the most likely cause and the fix?
- True or false: an Azure budget will stop spending once it’s breached.
- A 3-year Reservation’s discount is applying to teams that didn’t buy it. What setting caused this and what do you change it to?
- You “stopped” all dev VMs from inside the OS but the non-prod bill barely moved. Why, and what’s the correct action?
Answers
- You’re reading ActualCost, which posts an upfront Reservation/Savings Plan charge entirely on its purchase day and ₹0 over the rest of the term. Switch every report, budget and export to AmortizedCost, which spreads the commitment across its term and reflects real consumption.
- Shared cost isn’t being allocated — the hub firewall, Log Analytics, Bastion and gateways in the platform subscription aren’t split back to teams, so the per-team sum is short of the invoice. Add a cost allocation rule to distribute the shared RGs/subscription to teams (proportional to compute is usually fairest), then re-check that the per-team amortized total equals the account amortized total.
- False. A budget only alerts (emails by default); it does not cap spend. It becomes a control only when its alert triggers an action group → automation that takes action, and you alert on forecast to act before the overage. Never auto-deallocate production.
- The reservation was bought with Shared applied-scope, which auto-applies the discount org-wide. Change it to Single scope (az reservations reservation update --applied-scope-type Single --applied-scopes /subscriptions/<id>) tied to the subscription that owns the baseline, and default new commitments to Single.
- VMs stopped from inside the OS stay allocated, and Azure still bills compute for allocated VMs. Use az vm deallocate (or an auto-stop runbook) to release the compute, though disks and static public IPs continue to bill even when deallocated.
Glossary
- FinOps — a cultural and operational practice bringing engineering, finance and product into one loop so cloud spend is visible, attributable and continuously optimized; phased as Inform → Optimize → Operate.
- Azure Cost Management — the native, free service for cost analysis, budgets, alerts and exports, built into every subscription and billing account.
- ActualCost — the cost metric recording a charge on the day it posts to the account; an upfront commitment shows its full cost on the purchase day.
- AmortizedCost — the cost metric spreading a commitment evenly across its term, reflecting consumption; the correct metric for showback, budgets and trends.
- Tag — key/value metadata on a resource, resource group or subscription; the orthogonal dimension that lets you slice cost by team, environment or product.
- CostCenter / Owner / Environment tags — the minimal cost-attribution schema: who pays, who’s responsible, and which lifecycle stage (drives non-prod auto-stop).
- Azure Policy (deny / modify / audit) — the enforcement engine: deny blocks untagged creation, modify (inherit) copies tags down and backfills via remediation, audit reports without blocking.
- Showback — showing each team its cost without moving money (visibility, low friction).
- Chargeback — actually billing cost back to a team’s budget (accountability, higher friction; needs clean allocation and trust first).
- Cost allocation rule — a Cost Management rule that splits shared cost (hub firewall, Log Analytics, gateways) to teams by a basis, so showback reconciles to 100% of the invoice.
- Export / FOCUS — a scheduled write of the full cost dataset to storage; FOCUS is the vendor-neutral FinOps Open Cost and Usage Specification schema for cross-cloud analysis.
- Reservation (RI) — a 1- or 3-year commitment to a specific SKU family + region for a deep discount (up to ~72%) on a known, stable baseline.
- Savings Plan (SP) — a 1- or 3-year commitment to a fixed $/hour of compute with region/SKU flexibility (up to ~65%); forgiving as workload shapes change.
- Azure Hybrid Benefit (AHB) — using owned Windows Server / SQL Server licenses to remove license cost on eligible Azure SKUs.
- Spot — evictable surplus capacity at a deep discount (up to ~90%) for interruptible workloads; can be reclaimed with ~30 seconds’ notice.
- Applied scope (Single / Shared) — which resources a commitment discounts; Single ties it to one subscription (clean attribution), Shared auto-applies org-wide (max utilization, messy attribution).
- Budget — a spend threshold at a scope with alert rules; alerts (doesn’t cap) on actual or forecasted spend, and becomes a control when wired to an action group.
- Anomaly alert — Cost Management’s ML-based detection of statistically unusual spend, catching unknown-unknowns (leaks, runaways) day-over-day.
- Right-sizing — matching a resource’s SKU to its real utilization (off Azure Advisor) to cut the usage axis of cost.
- Deallocate — fully releasing a VM’s compute so it stops billing (distinct from an OS-level stop, which leaves the VM allocated and still billing).
- Azure Advisor (Cost) — the service that continuously recommends right-sizing and idle-resource cleanup with estimated savings — the prioritized usage-reduction worklist.
Next steps
You can now stand up the full cost-control loop — attribute, amortize, allocate, optimize and act. Build outward:
- Next: Azure Cost: Reservations, Savings Plans & Hybrid Benefit Strategy — go deep on the commitment math, break-even and scope decisions behind the rate axis.
- Related: The Azure FinOps Engineering Guide — the engineering-grade companion: amortization internals, allocation queries and the commitment loop in code.
- Foundation: Azure Policy and Governance at Scale — the enforcement engine behind tag governance, deny rules and remediation.
- Foundation: Azure Resource Hierarchy Explained — the management-group/subscription/RG tree that is your cost-allocation boundary.
- Related: Azure Enterprise-Scale Landing Zone — where budgets-as-code, tag policy and shared-service allocation live in a real platform.
- Related: Azure Monitor & Application Insights for Observability — control the log-ingestion line item that quietly inflates many bills.