Azure Fundamentals

Azure Global Infrastructure: Geographies, Regions, Availability Zones, Availability Sets, Fault & Update Domains

Every Azure service you will ever use — a virtual machine, a database, a storage account — ultimately runs on a real computer, in a real building, in a real country, plugged into real power and cooling. That physical reality is Azure’s global infrastructure, and understanding it is the difference between an app that quietly survives a building fire and one that vanishes when a single rack loses power. It is also the most-asked beginner and interview topic in the whole Azure world: “What’s the difference between a region and an availability zone?”, “Fault domain versus update domain?”, “What SLA do I get from an availability set?” come up in screening calls, on the AZ-900 exam, and again on AZ-104.

This lesson builds the complete picture from a single server in a rack out to the worldwide map of geographies, and gives you the precise, confident answer to every one of those questions. It is beginner-friendly (every term is defined the first time it appears) yet deep enough that you can reason about real availability decisions and the SLAs that back them.

Learning objectives

By the end of this lesson you can:

Prerequisites & where this fits

You need only basic IT literacy — you know what a server, a network, and a data centre are — and, for the hands-on lab, a free Azure account (the lab itself costs nothing if you clean up). No prior Azure knowledge is assumed; this is Lesson 2 of Module 1 — Foundations in the Azure Zero-to-Hero course, following the cloud-fundamentals lesson and preceding your first hands-on tooling lesson. The mental models here — failure domains, zones, region pairs — are not VM trivia: they recur in every Azure service (databases, load balancers, Kubernetes, storage redundancy), so the time you invest now pays off for the rest of the course.

Core concept: the failure domain

Before the map, internalise the single idea the topic hangs on. A failure domain is a set of components that share a single point of failure — if one thing breaks, all of them go down together. A power distribution unit feeds a rack of servers: if it fails, every server in that rack dies at once, so that rack is one failure domain. A whole building shares a power feed, a cooling plant, and a network entry point: lose those and the building is a failure domain.

High availability is nothing more than spreading copies of your workload across different failure domains so no single failure can take all of them. Everything in Azure’s physical hierarchy is, at heart, a progressively larger failure-domain boundary you can spread across — different racks, buildings, cities, countries. Climb that ladder and your “blast radius” (how much breaks when one thing fails) shrinks, in exchange for a little more cost and complexity. The rest of the lesson is just naming the rungs.

From datacentre to geography: the physical hierarchy

Azure’s global infrastructure is a clean set of nested boundaries. We will build it from the smallest physical unit outward.

Datacentres — the physical foundation

A datacentre is a single, secured building full of racks of servers, with its own power feeds, cooling, and network connectivity. This is the lowest level of Azure’s physical infrastructure and one you almost never touch directly — Microsoft does not let you choose an individual datacentre, and a single building is a failure domain you would never bet a production workload on. Datacentres are the raw material; the boundaries above them are what you deploy into.

Availability zones — physically separate datacentres within a region

An availability zone (AZ) is one or more datacentres inside a region with independent power, cooling, and networking, physically separated from the other zones (far enough that a single fire, flood, or power event cannot affect more than one) yet connected by a high-speed, low-latency private network. Every region that supports zones has at least three, numbered 1, 2, and 3.

The point of zones is datacentre-level fault tolerance: spread your workload across Zones 1, 2, and 3, and if an entire building is lost your app keeps running in the other two. Zones are the standard way to achieve high availability within a single region. One subtlety worth knowing early: zone numbers are logical and per-subscription (my “Zone 1” is not guaranteed to be the same physical building as your “Zone 1”), which is why Microsoft also exposes physical zone identifiers for the rare cases that need exact co-location.
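The logical-to-physical zone mapping can be inspected per subscription. The function below is a minimal sketch, assuming a signed-in az CLI and assuming that the Azure Resource Manager locations API (api-version 2022-12-01 here, which is worth checking against current documentation) returns an availabilityZoneMappings list for zone-enabled regions. The function name zone_mappings is illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Show how this subscription's logical zones map to physical zones in one region.
# Assumption: the locations API returns an "availabilityZoneMappings" array.
# Usage: zone_mappings <region-name>
zone_mappings() {
  local location="$1"
  # az rest substitutes {subscriptionId} with the currently selected subscription.
  az rest --method get \
    --url "/subscriptions/{subscriptionId}/locations?api-version=2022-12-01" \
    --query "value[?name=='${location}'].availabilityZoneMappings[]" \
    --output table
}
```

Calling zone_mappings centralindia from two different subscriptions is one way to see whether their logical zone numbers point at the same physical zones.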

Regions — the unit you actually deploy into

A region is a set of datacentres within a latency-defined perimeter connected by a dedicated low-latency network — in practice a metropolitan area such as Central India (Pune), West Europe (Netherlands), or East US (Virginia). The region is what you pick when you create almost any resource: “deploy this VM in Central India.” Azure has 60+ regions worldwide, more than any other provider — a genuine advantage for data-residency and latency needs.

Regions come in two flavours: regions with availability zones (offering the three-zone HA story above) and non-zonal regions (older or smaller regions where you fall back to availability sets for local HA). When zone-level resilience matters, choose a zone-enabled region; new regions launch with zones by default.

Region pairs — two regions tied together for resilience

Most regions are joined into a region pair: two regions within the same geography (so they stay on the same side of any residency boundary), usually a few hundred kilometres apart. Central India pairs with South India; East US with West US. Pairing gives three concrete benefits:

A modern caveat: Microsoft also offers some regions without a fixed pair (you arrange cross-region redundancy yourself), and zones now make many workloads resilient enough to skip cross-region DR. But for the exam, “region pairs are used for DR and sequential updates” is the answer.

Geographies — the data-residency boundary

A geography is a discrete market, typically a country or area such as India, the EU, the US, or the UK, containing one or more regions and preserving data-residency and compliance boundaries. Geographies are the level at which Microsoft promises “your data stays within India.” Region pairs normally live inside a single geography, so failing over does not move data across a residency boundary (the long-standing exception is Brazil South, which pairs with South Central US). When a regulation says data must stay in a country, you are really choosing a geography.

To anchor the hierarchy: a VM you create lives on a server in a rack (a fault domain), in a datacentre, which belongs to an availability zone (say Zone 2), inside the Central India region, paired with South India, both within the India geography. Each layer out is a larger failure domain and a stronger residency boundary.

[Figure: Azure global infrastructure: geographies, regions, AZs, fault & update domains]

The diagram above shows the full picture: the worldwide geographies at the top, each holding paired regions, each region holding its three availability zones, and — zoomed in — a single availability set spreading VMs across a grid of fault domains and update domains. Read it outside-in for residency and disaster recovery, and inside-out for local high availability; almost every Azure resilience decision is a point somewhere on this diagram.

Availability sets: fault domains and update domains

Availability zones are the modern, building-level HA mechanism — but they are only offered in zone-enabled regions, and they are not the only way to get redundancy. The original, datacentre-local mechanism is the availability set, and it is where the famous fault domain and update domain concepts live. This pairing is the single most-probed sub-topic in Azure interviews, so we will be precise.

An availability set is a logical grouping you place two or more VMs into so Azure guarantees it spreads them across different physical hardware within a single datacentre. It costs nothing extra (you pay only for the VMs), and its entire job is to ensure one rack failure or one wave of host maintenance can never take down all the VMs doing the same job. It is built from two independent kinds of failure domain.

Fault domains (FD) — protection from unplanned hardware failure

A fault domain is a group of hardware sharing a common power source and network switch — essentially a server rack. If that rack’s power feed or top-of-rack switch fails, every VM in that fault domain goes down together, but VMs in other fault domains carry on unaffected. Fault domains protect you against unplanned failures: the dead power supply, the failed switch, the rack that trips offline. When you create a set you choose how many to spread across: the default is 2, the maximum is 3 (the exact ceiling varies slightly by region). With two fault domains, a single rack outage takes at most half your instances.

Update domains (UD) — protection from planned maintenance

An update domain is a logical group of VMs that Azure reboots together, one update domain at a time, during planned host maintenance. While one update domain is patched and rebooted, all the others stay running, so a maintenance wave only ever takes a slice of your fleet offline at once, then moves to the next. Update domains protect you against planned maintenance. The default is 5, the maximum is 20; with five, roughly one-fifth of your VMs reboot together (exactly one-fifth when the VM count is a multiple of five).

How VMs spread across the FD × UD grid

The two domain types are independent axes of a grid, and Azure stripes your VMs across both as you create them — incrementing the update domain and alternating the fault domain per VM — so no single rack failure or maintenance wave takes a disproportionate share. With 2 fault domains and 5 update domains, the first six VMs land like this:

VM | Fault domain | Update domain
VM 1 | FD 0 | UD 0
VM 2 | FD 1 | UD 1
VM 3 | FD 0 | UD 2
VM 4 | FD 1 | UD 3
VM 5 | FD 0 | UD 4
VM 6 | FD 1 | UD 0
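The striping in the table above can be reproduced with a simplified round-robin model. This is an illustrative sketch only: Azure decides the real placement, and the function names stripe_vms and worst_case are hypothetical helpers, not Azure tooling.

```shell
#!/usr/bin/env bash
# Simplified round-robin model of availability-set placement (illustrative only;
# the platform decides the actual fault/update domain of each VM).
FD_COUNT=2
UD_COUNT=5

# Print "VM <n> FD <f> UD <u>" for the first N VMs.
stripe_vms() {
  local n="$1" i
  for ((i = 0; i < n; i++)); do
    echo "VM $((i + 1)) FD $((i % FD_COUNT)) UD $((i % UD_COUNT))"
  done
}

# Report how many of N VMs share the fullest fault domain and update domain,
# i.e. the most VMs a single rack failure or maintenance wave can take down.
worst_case() {
  stripe_vms "$1" | awk '
    { fd[$4]++; ud[$6]++ }
    END {
      for (k in fd) if (fd[k] > maxfd) maxfd = fd[k]
      for (k in ud) if (ud[k] > maxud) maxud = ud[k]
      print "max per FD: " maxfd ", max per UD: " maxud
    }'
}

stripe_vms 6
worst_case 6   # prints: max per FD: 3, max per UD: 2
```

With six VMs the sixth wraps around to UD 0, so one maintenance wave can reboot two of the six, while a rack failure takes at most three.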

The crucial consequence: an availability set is only meaningful with two or more VMs — with one VM there is nothing to spread. Because every VM in the set should do the same job (e.g. identical web servers), you put a load balancer with health probes in front so traffic reaches only the instances currently up. Two more facts interviewers love: fault and update domain counts are fixed at creation (to change them, create a new set and recreate the VMs), and you cannot move an existing VM into or out of a set in place.

This lesson covers availability sets at the depth you need for AZ-900 and most interviews. For the exhaustive, exam-grade treatment — including how VM Scale Sets layer on top, Uniform vs Flexible orchestration, autoscale, live migration and maintenance configurations — see the companion deep-dive linked in Next steps.

Zonal vs zone-redundant services

Once a region has availability zones, services use them in one of two distinct ways, and confusing the two is a classic mistake.

Remember it as: zonal = you place copies in zones; zone-redundant = Azure spreads one resource across zones. A well-architected app mixes both — zonal VMs you spread across three zones, behind a zone-redundant load balancer, with state in zone-redundant storage.

The SLA ladder: 99.9% → 99.95% → 99.99%

An SLA (Service Level Agreement) is a financially backed monthly uptime promise — a percentage Microsoft commits to, with service credits if it misses. The percentages look almost identical, but the allowed downtime is wildly different, and which one you qualify for depends entirely on how you deploy. First, what the numbers mean:

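Monthly SLA | Allowed downtime / month | Allowed downtime / year
99.9% (“three nines”) | ~43.8 minutes | ~8.77 hours
99.95% | ~21.9 minutes | ~4.38 hours
99.99% (“four nines”) | ~4.38 minutes | ~52.6 minutes

(Figures assume a 365.25-day year and an average month of one-twelfth of that, about 30.44 days. For a 30-day month the monthly values are 43.2, 21.6, and 4.32 minutes.)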
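The downtime figures follow from one line of arithmetic: allowed downtime = (100 − SLA%) / 100 × period length. The sketch below works this out with awk; the function name allowed_downtime_minutes is illustrative, and the exact minutes depend on the month length you assume, so it shows both a 30-day month and an average month (365.25 / 12 days).

```shell
#!/usr/bin/env bash
# Allowed downtime in minutes for an SLA percentage over a period.
# Arguments: <sla-percent> <period-length-in-minutes>
allowed_downtime_minutes() {
  awk -v sla="$1" -v period="$2" \
    'BEGIN { printf "%.3f\n", (100 - sla) / 100 * period }'
}

MONTH_30D=43200   # 30 days * 24 h * 60 min
MONTH_AVG=43830   # 365.25 days / 12 months, in minutes
YEAR=525960       # 365.25 days, in minutes

echo "SLA  30-day-month  average-month  year  (minutes)"
for sla in 99.9 99.95 99.99; do
  printf '%s%%  %s  %s  %s\n' "$sla" \
    "$(allowed_downtime_minutes "$sla" "$MONTH_30D")" \
    "$(allowed_downtime_minutes "$sla" "$MONTH_AVG")" \
    "$(allowed_downtime_minutes "$sla" "$YEAR")"
done
# 99.9%  43.200  43.830  525.960
# 99.95%  21.600  21.915  262.980
# 99.99%  4.320  4.383  52.596
```

Dividing the yearly minutes by 60 gives the hour figures: 525.96 minutes is about 8.77 hours and 262.98 minutes is about 4.38 hours.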

Now the ladder — the deployment pattern that earns each rung for Azure VMs:

Deployment | VM SLA | Survives | The catch
Single VM | 99.9% | Host issues only | Only if all OS and data disks are Premium SSD or Ultra Disk. With Standard SSD it drops to 99.5%; with Standard HDD, to 95%.
2+ VMs in an availability set | 99.95% | Rack/power/network faults and planned host maintenance, within one datacentre | Does not survive a whole-datacentre (zone) outage.
2+ VMs across 2+ availability zones | 99.99% | The loss of an entire datacentre (zone) | Only in zone-enabled regions; you must spread instances across at least two zones.

The interview trap hidden in this table: a single VM has no 99.95% SLA. That number requires two or more VMs in an availability set, and a single VM reaches even 99.9% only when all of its disks are Premium SSD or Ultra Disk. The lesson Azure is teaching is that high availability comes from redundancy of instances, not from making one instance bulletproof: the instant you need a real promise, you deploy multiple instances, in a set for 99.95% or across zones for 99.99%.

Choosing a region: the five criteria

When you create a resource you must pick a region, and the choice is not arbitrary. Weigh these five criteria, roughly in order:

  1. Data residency and compliance — does the law or your policy require data to stay within a country or geography? This often eliminates most regions before anything else. (Choosing residency really means choosing a geography.)
  2. Latency / proximity to users — pick a region physically close to the people or systems that call your service; every thousand kilometres adds measurable round-trip latency.
  3. Service and feature availability — not every service, VM size, or feature exists in every region; new services and the newest GPU/VM SKUs land in a handful of large regions first. Confirm the specific services and SKUs you need exist before committing.
  4. Price — prices for the same service vary by region (driven by local power, land, and tax). A region one country over can be noticeably cheaper at scale.
  5. Region pairs / DR strategy — if you need cross-region DR, prefer a paired region and check its pair also offers the services you need. The pair is your failover target.

A sensible default: choose a zone-enabled region in the right geography, close to your users, with the services you need at an acceptable price, whose pair can serve as your DR target.
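The first criteria can be sketched as a filter-then-rank pipeline. Everything in the candidate list below is a hypothetical placeholder (the latency and relative-price numbers are invented for illustration, not measurements or real prices), and pick_region is an illustrative helper. It covers residency, zone support, latency, and price only; service availability and the paired region still need a manual check.

```shell
#!/usr/bin/env bash
# Hypothetical candidates: name,geography,has_zones,latency_ms,relative_price
# (latency and price values are illustrative placeholders).
candidates='centralindia,India,yes,12,1.00
southindia,India,no,28,0.97
westeurope,Europe,yes,110,1.10
northeurope,Europe,yes,118,1.00
eastus,US,yes,190,0.90'

# Keep zone-enabled regions in the required geography (residency first),
# then rank by latency and break ties on price; print the best match.
pick_region() {
  local geography="$1"
  printf '%s\n' "$candidates" |
    awk -F, -v geo="$geography" '$2 == geo && $3 == "yes"' |
    sort -t, -k4,4n -k5,5n |
    head -n 1 |
    cut -d, -f1
}

pick_region India    # prints: centralindia
pick_region Europe   # prints: westeurope
```

Filtering on geography before ranking mirrors the ordering of the criteria: residency removes candidates outright, while latency and price only reorder what is left.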

Sovereign and government clouds

Most of the world runs on Azure public (commercial) cloud. But for customers with extraordinary regulatory, security, or national-sovereignty needs, Microsoft operates sovereign clouds — physically and logically isolated, separate instances of Azure with their own datacentres, portal endpoints, compliance accreditations, and (often) operations staffed by screened in-country personnel. The two to know:

Microsoft has also introduced broader Microsoft Cloud for Sovereignty capabilities for public-sector customers in other geographies. The key exam point: sovereign clouds are separate Azure instances, not just regions — different endpoints, different compliance scope, and not automatically reachable from the commercial cloud.

Putting it together: a comparison table

The single most useful summary of this lesson is a side-by-side of the three resilience boundaries, so you can pick the right one on demand:

Aspect | Availability Set | Availability Zone | Region Pair
Spreads across | Racks within one datacentre (FDs + UDs) | Separate datacentres within a region | Two regions in the same geography
Protects against | Rack/power/network faults + planned host maintenance | Loss of an entire datacentre | Loss of an entire region
Does not protect against | A whole-datacentre outage | A whole-region outage | (Disaster-recovery scope)
VM SLA | 99.95% | 99.99% | DR target (RTO/RPO, not an uptime SLA)
Latency between members | Same datacentre, negligible | Low (fast intra-region link) | Higher (inter-region distance)
Extra cost | None (pay for the VMs) | None for the construct (data egress between zones may apply) | Second-region resources + replication/egress
Set up by | You (place VMs in a set) | You (zonal) or Azure (zone-redundant) | You (replication, e.g. Site Recovery / geo-redundant storage)
Typical use | Local HA, or regions without zones | Building-level HA within a region | Disaster recovery across regions

A note that trips people up: for a given VM, an availability set and availability zones are mutually exclusive — you choose one availability option per VM at creation. Zones generally win where available, with sets as the fallback for non-zonal regions.

Hands-on lab: list regions/zones and create an availability set + zonal VM

This lab uses Azure Cloud Shell (the browser terminal at https://shell.azure.com — no local install) or any machine with az signed in via az login. You will explore the global infrastructure, then create an availability set and a zonal VM and read back their placement.

Step 1 — Sign in and pick a region. Choose a zone-enabled region (e.g. centralindia, eastus, or westeurope):

LOCATION=centralindia
RG=rg-infra-lab
az group create --name $RG --location $LOCATION --output table

Step 2 — List all regions to see the scale of Azure’s footprint:

az account list-locations \
  --query "[?metadata.regionType=='Physical'].{Name:name, Display:displayName, Geo:metadata.geographyGroup}" \
  --output table

You will see 60+ physical regions grouped by geography (Asia Pacific, Europe, US, and so on).

Step 3 — See which availability zones your region exposes. This lists the VM SKUs available and, for zone-enabled regions, the zones each supports:

az vm list-skus --location $LOCATION --zone --size Standard_B \
  --query "[0].{SKU:name, Zones:locationInfo[0].zones}" --output json

In a zone-enabled region the Zones field comes back as something like ["1","2","3"], proof that the region has three zones.

Step 4 — Create an availability set with explicit fault and update domain counts (2 FDs, 5 UDs — the defaults, stated for clarity), then read them back:

az vm availability-set create \
  --resource-group $RG \
  --name avset-web \
  --platform-fault-domain-count 2 \
  --platform-update-domain-count 5 \
  --output table

az vm availability-set show \
  --resource-group $RG --name avset-web \
  --query "{Name:name, FaultDomains:platformFaultDomainCount, UpdateDomains:platformUpdateDomainCount}" \
  --output table

Expected output confirms FaultDomains: 2 and UpdateDomains: 5.

Step 5 — Create two VMs inside the set (so the set actually means something) and read back which fault/update domain each landed in:

for i in 1 2; do
  az vm create \
    --resource-group $RG --name vm-web-$i \
    --availability-set avset-web \
    --image Ubuntu2204 --size Standard_B1s \
    --admin-username azureuser --generate-ssh-keys \
    --public-ip-address "" --nsg "" --output none
done

az vm get-instance-view \
  --resource-group $RG --name vm-web-1 \
  --query "instanceView.{FaultDomain:platformFaultDomain, UpdateDomain:platformUpdateDomain}" \
  --output table

vm-web-1 will typically report FaultDomain: 0, and vm-web-2 (check it the same way) should report a different fault domain and update domain, which is visible proof that Azure striped them across the grid.
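To check both VMs in one go, the same query can be wrapped in a small loop. This is a sketch that reuses the az vm get-instance-view call from Step 5; the function name show_placement is illustrative, and it assumes you are still signed in with the lab's resource group in place.

```shell
#!/usr/bin/env bash
# Print the fault and update domain of each named VM in a resource group.
# Usage: show_placement <resource-group> <vm-name>...
show_placement() {
  local rg="$1"
  shift
  local vm
  for vm in "$@"; do
    echo "== $vm"
    az vm get-instance-view \
      --resource-group "$rg" --name "$vm" \
      --query "instanceView.{FaultDomain:platformFaultDomain, UpdateDomain:platformUpdateDomain}" \
      --output table
  done
}
```

In the lab this would be called as show_placement "$RG" vm-web-1 vm-web-2, printing one small table per VM so the two placements can be compared side by side.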

Step 6 — Create a zonal VM pinned to availability zone 1, and confirm its zone:

az vm create \
  --resource-group $RG --name vm-zonal-z1 \
  --zone 1 \
  --image Ubuntu2204 --size Standard_B1s \
  --admin-username azureuser --generate-ssh-keys \
  --public-ip-address "" --nsg "" --output none

az vm show \
  --resource-group $RG --name vm-zonal-z1 \
  --query "{Name:name, Zone:zones[0], Location:location}" --output table

Zone: 1 confirms a zonal VM. (Note you cannot combine --zone with --availability-set — they are mutually exclusive, exactly as the comparison table said.)

Validation. You now have: an availability set reporting 2 FDs / 5 UDs, two VMs striped across different fault domains inside it, and a separate zonal VM pinned to Zone 1 — a hands-on demonstration of every core concept in this lesson.

Cleanup. Delete everything in one command so you are charged nothing further:

az group delete --name $RG --yes --no-wait
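Because --no-wait returns immediately, the deletion is still running when the command finishes. A quick follow-up check, sketched here with az group exists (which prints true or false), confirms when the group is really gone; the function name confirm_deleted is illustrative.

```shell
#!/usr/bin/env bash
# Report whether a resource group has finished deleting.
# Usage: confirm_deleted <resource-group>
confirm_deleted() {
  local rg="$1"
  if [ "$(az group exists --name "$rg")" = "false" ]; then
    echo "$rg is gone"
  else
    echo "$rg still exists (deletion may be in progress)"
    return 1
  fi
}
```

Run confirm_deleted "$RG" a few minutes after the delete command; a non-zero return means the group is still present and worth checking again later.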

Cost note. Two or three Standard_B1s VMs left running cost only a few rupees per hour, and an availability set itself is free. If you delete the resource group promptly (see Cleanup above), the whole lab costs a negligible amount, well within free-trial credit. The usual gotcha is forgetting to delete: a stopped-but-allocated VM still bills for compute, so always remove the resource group when done.

Common mistakes & troubleshooting

Symptom | Likely cause | Fix
--zone and --availability-set together fail | The two availability options are mutually exclusive for a VM | Pick one: a zonal VM or a VM in an availability set.
Availability set seems to give no benefit | Only one VM in the set, so nothing to spread | Deploy 2+ VMs; put a load balancer with health probes in front.
“Can’t change fault/update domain count” | FD/UD counts are fixed at creation | Create a new availability set with the desired counts and recreate the VMs.
Region has no availability zones to choose | The region is non-zonal | Use a zone-enabled region, or fall back to an availability set for local HA.
Required VM size/service missing in your region | Service/SKU availability varies by region | Check az vm list-skus / the region’s product availability; choose a region that has it.
Geo-redundant copy is “in the wrong region” | Geo-redundancy defaults to the paired region | This is expected: the pair is the DR target; choose your primary region with its pair in mind.
Promised “99.95% for one VM” doesn’t hold | A single VM never gets 99.95% | Use 2+ VMs in a set (99.95%) or across zones (99.99%); a single VM tops out at 99.9% with Premium disks.
Can’t reach Azure Government / China portal | Sovereign clouds are separate instances with different endpoints | Sign in to the correct cloud (az cloud set --name AzureUSGovernment / AzureChinaCloud) and use its URLs.

Best practices

Security notes

Physical placement has real security and compliance consequences. Data residency is the headline: choosing the right geography is often a legal requirement, not a preference, and it is enforced at the geography boundary. Region pairs are designed to sit within a single geography for exactly this reason, so failover and geo-redundant copies normally stay inside it (Brazil South, paired with South Central US, is a known exception, so confirm the pair for your region). For strict regulatory regimes, a sovereign cloud (Azure Government, Azure China) adds physical and operational isolation plus the specific accreditations those regimes demand. Even in the commercial cloud, knowing where your data lives, including where geo-redundant copies land, is essential for answering compliance questionnaires honestly. All Azure datacentres are covered by Microsoft’s physical-security programme, but the placement decision (which geography, which region, public vs sovereign) is yours, and it is the part auditors ask about.

Interview & exam questions

This topic is interview gold — practise saying these answers out loud until they are automatic.

Quick check

  1. List, from smallest to largest, the five physical boundaries covered in this lesson, and say which one is the data-residency boundary.
  2. State the default and maximum counts for fault domains and update domains, and which kind of failure each protects against.
  3. A colleague says their single production VM is “guaranteed 99.95% by Azure.” What is wrong, and what does each SLA rung actually require?
  4. Explain the difference between a zonal and a zone-redundant resource, with one example of each.
  5. You must keep all customer data inside India, with the lowest latency for Mumbai users and a disaster-recovery copy. Which geography, roughly which region, and which DR target would you choose, and why?

Answers

  1. Datacentre → availability zone → region → region pair → geography. The geography is the data-residency boundary (e.g. India, EU).
  2. Fault domains: default 2, max 3 — protect against unplanned hardware failure (rack power/network). Update domains: default 5, max 20 — protect against planned maintenance, rebooted one group at a time.
  3. A single VM never gets 99.95%; that needs 2+ VMs in an availability set. A single VM gets at best 99.9% (all Premium/Ultra disks), 99.5% with Standard SSD, and 95% with Standard HDD. 99.99% needs instances across 2+ zones.
  4. Zonal = pinned to one zone you choose, copies deployed by you (e.g. a VM with --zone 1). Zone-redundant = Azure spreads one resource across zones for you (e.g. a zone-redundant Standard Load Balancer, or ZRS storage).
  5. Geography: India (satisfies residency). Region: Central India (Pune) or West India — close to Mumbai for low latency. DR target: the paired region (e.g. South India), which keeps data inside the India geography and inherits sequential updates and prioritised recovery. Pick residency first, then latency, then the pair for DR.

Exercise

Using the az CLI in a zone-enabled region, build a minimal but real layout that demonstrates both local and zonal resilience, and prove the placement:

  1. Create a resource group rg-infra-exercise in a zone-enabled region.
  2. Create an availability set avset-app (leave FD/UD at the defaults) and two small VMs (Standard_B1s, no public IP) inside it.
  3. Run az vm get-instance-view on each VM and confirm the two report different fault domains — proof the set spread them.
  4. Create one zonal VM vm-z2 pinned to zone 2 and confirm with az vm show --query "zones[0]" that it reports 2.
  5. Try to create a VM with both --zone 1 and --availability-set avset-app and observe the error — first-hand proof they are mutually exclusive.
  6. Delete the resource group: az group delete --name rg-infra-exercise --yes --no-wait.

If steps 3, 4, and 5 behave as described, you have hands-on proof of fault-domain spreading, zonal placement, and the set-vs-zone exclusivity — the heart of this lesson.

Certification mapping

Glossary

Next steps

You now have the complete map of Azure’s physical world — from a single rack’s fault domain out to worldwide geographies — and you can answer every classic interview and exam question about regions, zones, sets, fault and update domains, and the SLA ladder.

Related reading to go deeper:

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
