Most teams treat Azure SQL Database as “SQL Server, but someone else patches it.” That mental model leaves enormous capability on the table. The parts that change how you architect have no on-prem equivalent: a storage engine that decouples compute from a distributed log and a page-server tier, read replicas you can size independently of the primary, tamper-evident ledger tables backed by a Merkle tree, and Always Encrypted enclaves that run WHERE name LIKE over data the engine itself cannot decrypt. This is a tour of those four patterns — Hyperscale, elastic pools, ledger, and Always Encrypted with secure enclaves — wired correctly, with the gotchas I have hit in production and the exact az/T-SQL to confirm each one is actually doing what you think.
Because this is a reference you will return to while designing a real data platform, every moving part is enumerated: the four Hyperscale tiers and what each costs you, the replica types side by side, every elastic-pool guardrail, the two ledger flavours and their verification surface, the encryption modes and what each can and cannot query, and a symptom→cause→confirm→fix playbook for the failure modes that actually page you at 2 a.m. Read the prose once, then keep the tables open. Everything below assumes a logical server (Microsoft.Sql/servers) already exists; I use sql-prod-eastus2 as the logical server and rg-data-eastus2 as the resource group throughout.
By the end you will be able to choose Hyperscale over Business Critical for the right reason (not “it’s newer”), size an elastic pool from telemetry instead of a guess, hand an auditor a cryptographic verification procedure instead of a promise, and let analysts run rich predicates over columns that no DBA — yours or Microsoft’s — can read in plaintext. These are the patterns that separate “we host a database” from “we engineered a data platform.”
What problem this solves
Four distinct production pains converge here, and each one is the kind that gets worse the longer you ignore it.
Storage and scale on a monolith hurt. On General Purpose or Business Critical, the database is (conceptually) a data file on a managed disk. Growth means pre-provisioning storage; a vCore resize on Business Critical reseeds a full local copy and can take hours; backing up a 50 TB database is a 50 TB problem. Teams hit a ceiling and start sharding their application by hand — months of work to dodge an engine limitation. Hyperscale removes the ceiling: storage grows transparently to 128 TB, resize is minutes not hours, and backups are snapshot-based and effectively size-independent.
Per-tenant isolation collides with per-tenant cost. SaaS teams want one database per tenant for hard data isolation and a clean blast radius, then balk at paying a dedicated SKU per tenant when most tenants are idle most of the time. Without a pooling model you either over-pay massively or compromise isolation by cramming tenants into shared tables with a TenantId column and pray your WHERE clauses are perfect. Elastic pools let hundreds of databases share one budget of vCores, so you provision for aggregate peak, not the sum of peaks.
“Trust us, the data wasn’t tampered with” is not a control. Regulated workloads (finance, health, supply chain) increasingly need to prove an audit record was not altered — even by a sysadmin, even by the cloud operator. Application-level hashing is brittle and bypassable. Ledger bakes cryptographic tamper-evidence into the engine: every change is hashed into a Merkle tree whose root you publish to immutable storage, and verification recomputes the tree.
Encryption that breaks search is encryption you can’t use. Classic Always Encrypted protects column data end-to-end — the service never sees plaintext — but the engine, seeing only ciphertext, can do equality at best and nothing else. So teams “solve” the search requirement by not encrypting the sensitive column, defeating the point. Secure enclaves let the engine decrypt inside a sealed, attested memory region and run LIKE/range predicates there, so you get confidentiality and rich queries.
Who hits these: platform and data teams running OLTP at scale, multi-tenant SaaS, compliance-bound audit trails, and any system holding regulated identifiers. The cost of getting it wrong ranges from a surprise tenfold bill to an un-passable audit to a sharding project you never needed.
Learning objectives
By the end of this article you can:
- Explain the four Hyperscale tiers (compute, page servers, log service, long-term storage) and why backups and restores are size-independent.
- Choose between HA replicas and named replicas for read-scale, and size each independently of the primary’s compute.
- Provision a Hyperscale database with zone redundancy and HA replicas, then scale vCores with no data movement.
- Design an elastic pool with per-database min/max guardrails sized from Azure Monitor telemetry, and move databases in and out with zero downtime.
- Implement append-only and updatable ledger tables, generate and verify a database digest, and configure automatic digest upload to immutable storage.
- Encrypt a column with randomized Always Encrypted and still run
LIKE/range predicates through a secure enclave with Azure Attestation. - Lock the perimeter and identity: Private Endpoint, public-access-off, Entra-only authentication, and least-privilege database roles.
- Diagnose the failure modes that actually bite — attestation drift, replica lag, connection storms, quota throttling — with the exact command that confirms each.
Prerequisites & where this fits
You should already understand the Azure SQL Database fundamentals — the difference between a logical server and a database, the DTU vs vCore purchasing models, and how to connect with a connection string. You should be comfortable running az in Cloud Shell, reading JSON output, and writing basic T-SQL DDL. Familiarity with TLS, managed identities, and private networking helps; the security section leans on the Azure Key Vault and Private Endpoint mechanics.
This sits at the top of the Data platform track. It assumes the service-tier basics and goes past them into the patterns that have no on-prem analogue. It pairs with Azure SQL connectivity troubleshooting (the playbook for when connections fail), encryption at rest with CMK (the layer beneath Always Encrypted), and Entra ID and RBAC (who is allowed to touch the data). For the managed-instance cousin of failover groups, see Azure SQL Managed Instance failover groups.
Where each capability sits in the bigger picture, and who usually owns it:
| Layer | What lives here | Pattern in this article | Who usually owns it |
|---|---|---|---|
| Compute / engine | Relational engine, RBPEX cache | Hyperscale compute + named replicas | Data / platform team |
| Storage | Page servers, log service, backups | Hyperscale disaggregated storage | Microsoft (platform) |
| Density / cost | Shared vCore budget | Elastic pools | Platform + FinOps |
| Integrity | Merkle tree, digests | Ledger tables | App + compliance |
| Confidentiality | Client-held keys, VBS enclave | Always Encrypted + enclaves | Security + app |
| Perimeter / identity | Private Endpoint, Entra auth | Network + identity lockdown | Network + identity team |
| Resilience | Zones, geo-replication, failover groups | HA/DR layering | SRE / DR owner |
Core concepts
Five mental models make every later decision obvious.
Hyperscale disaggregates storage from compute. General Purpose and Business Critical bind data to the compute node (or a managed disk). Hyperscale splits the database into four cooperating tiers — compute nodes, page servers, a log service, and long-term Azure Storage — connected by a log-is-the-database design. Compute becomes stateless-ish: it caches pages locally but the durable truth is the log. That single architectural choice is why resize, backup, restore, and read-scale all behave differently (and better) than on any other tier.
A replica is either coupled or independent. An HA replica is a hot standby sized exactly like the primary, sharing its page servers and log; it exists for failover and generic read traffic. A named replica is a separate database resource with its own name, its own compute SKU, and its own logins, but reading the same page servers — so it scales reads without copying storage and can be isolated per consumer. Knowing which you want is the difference between “fast failover” and “true read-scale-out.”
An elastic pool is a shared budget, not a shared database. Databases in a pool stay fully separate (separate schemas, separate logins, separate blast radius) but draw vCores/DTUs and storage from one pool budget. A tenant idle at 2 a.m. lends capacity to a tenant under load. Two guardrails — per-database min (a reserved floor) and max (a usage ceiling) — keep one noisy tenant from starving the rest. You size the pool to the aggregate peak.
Ledger is tamper-evidence, not tamper-prevention. It does not stop a privileged user from changing data; it guarantees you can detect it. Rows are hashed into a Merkle tree, the tree root is periodically sealed into a database digest, and you publish that digest to storage even Azure operators cannot rewrite. Verification recomputes the tree and compares. The value is that you can hand an auditor a procedure (sp_verify_database_ledger) instead of an assurance.
Always Encrypted keeps keys with the client; enclaves restore rich queries. The column encryption keys live with the client (wrapped by a column master key in a store the client controls); the SQL service never sees plaintext. Classic mode therefore allows only equality (deterministic) or nothing (randomized). A secure enclave — a VBS-protected memory region on the SQL node, gated by attestation — lets the engine briefly decrypt inside that sealed region and run LIKE/range predicates and in-place re-encryption, with the DBA still locked out.
The vocabulary in one table
Pin down every moving part before the deep sections. The glossary repeats these for lookup; this is the mental model side by side:
| Concept | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| Hyperscale | Disaggregated-storage SQL service tier | Service tier on a database | Removes scale/backup/restore ceilings |
| Compute node | Engine + local RBPEX SSD cache | Hyperscale compute tier | Where queries run; resize swaps it |
| Page server | Owns ≤1 TB slice, serves 8 KB pages | Hyperscale storage tier | Scales storage transparently to 128 TB |
| Log service | Fans log records to pages/replicas/backup | Hyperscale storage tier | Durability + the source of read replicas |
| HA replica | Hot standby, same SKU as primary | Same page servers/log | Fast failover; generic reads |
| Named replica | Independent DB, own SKU, same page servers | Separate resource | True read-scale-out, isolated |
| Elastic pool | Shared vCore/DTU budget for many DBs | On a logical server | Density for multi-tenant |
| Ledger table | Hash-chained, tamper-evident table | Inside a database | Cryptographic audit proof |
| Database digest | Sealed Merkle root you store off-box | Generated by the engine | The value verification trusts |
| Always Encrypted | Client-side column encryption | Column + client driver | Service never sees plaintext |
| Secure enclave | Attested protected-memory region | On the SQL node | Rich queries over encrypted data |
| Attestation | Proof the enclave is genuine | Azure Attestation (AAS) | Gate before keys enter the enclave |
| Failover group | Managed listener over geo-replication | Across two regions | Stable FQDN through failover |
A quick orientation across the four service tiers, so the rest of the article has a baseline — what each tier is for and where it caps out:
| Service tier | Storage model | Max size | Resize behaviour | Best for |
|---|---|---|---|---|
| General Purpose | Remote managed disk | 4 TB | Online, minutes | Most workloads; cost-balanced |
| Business Critical | Local SSD + AlwaysOn AG | 4 TB | Reseeds local copy (slow) | Low-latency OLTP, in-memory OLTP |
| Hyperscale | Disaggregated page servers + log | 128 TB | Minutes, no data move | Large/VLDB, fast restore, read-scale |
| Hyperscale (serverless) | Same, auto-pause-less compute | 128 TB | Auto-scale compute | Bursty/intermittent large DBs |
Hyperscale architecture: page servers, the log service, and named replicas
Hyperscale is a fundamentally different engine. Instead of a monolithic data file, it splits the database into four cooperating tiers:
- Compute nodes run the relational engine and a local SSD cache (the RBPEX — resilient buffer pool extension). The primary compute serves writes and reads; secondary computes serve reads.
- Page servers each own a slice of the database (covering up to ~1 TB of data) and serve 8 KB pages on demand to compute nodes that miss their local cache. Adding data adds page servers transparently — this is why Hyperscale scales to 128 TB without you ever pre-provisioning storage.
- The log service is the heart of the design. The primary writes log records to a durable landing zone (Azure Premium Storage), and the log service fans those records out to page servers and secondary replicas, and offloads them to long-term storage for backups. There is no checkpoint-and-flush of data pages from compute; durability is the log.
- Azure Storage holds the long-term log and the snapshot-based backups.
The consequence that matters: backups are snapshot-based and effectively size-independent, so a 50 TB database backs up in roughly the same wall-clock time as a 500 GB one, and a restore is fast because it rehydrates page-server snapshots rather than replaying a full backup chain.
The four tiers, what each owns, and the failure/scaling behaviour you inherit from each:
| Tier | Owns | Scales by | Durability role | What fails if it’s slow |
|---|---|---|---|---|
| Compute node | Engine + RBPEX cache | vCore resize (minutes) | None (cache only) | Query latency, cold cache after resize |
| Page server | ≤1 TB data slice, 8 KB pages | Auto, per TB of data | Holds materialized pages | Page-miss latency on cold reads |
| Log service | Durable log landing + fan-out | Throughput-bound (~100 MB/s) | The durability boundary | Write throughput ceiling, replica lag |
| Long-term storage | Snapshots + log for PITR | Auto, per retention | Backups / restore source | Restore time (still fast) |
Provision a Hyperscale database with secondary replicas in one shot:
az sql db create \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name appdb \
--edition Hyperscale \
--family Gen5 \
--capacity 8 \
--ha-replicas 2 \
--zone-redundant true \
--backup-storage-redundancy Zone
resource appdb 'Microsoft.Sql/servers/databases@2023-08-01' = {
parent: sqlServer
name: 'appdb'
location: location
sku: { name: 'HS_Gen5_8', tier: 'Hyperscale', family: 'Gen5', capacity: 8 }
properties: {
highAvailabilityReplicaCount: 2
zoneRedundant: true
requestedBackupStorageRedundancy: 'Zone'
}
}
--ha-replicas (0–4) provisions HA secondary replicas. They share the same page servers and log as the primary, give you fast failover targets, and serve read-only traffic via ApplicationIntent=ReadOnly. Zone redundancy spreads the primary and HA replicas across availability zones; do not skip it for production.
The provisioning knobs that actually shape cost and resilience — what each does, its default, and the trade-off:
| Flag / property | What it controls | Default | Range / values | Trade-off if wrong |
|---|---|---|---|---|
--edition |
Service tier | (required) | Hyperscale |
Wrong tier = wrong scaling model |
--family |
Hardware generation | Gen5 |
Gen5, Gen8, etc. |
Newer = better perf/vCore, check region |
--capacity |
vCore count | (required) | 2–128 (Gen5) | Under-size → throttle; over-size → bill |
--ha-replicas |
HA standby count | 0 | 0–4 | 0 = slow failover; each adds compute cost |
--zone-redundant |
Spread across AZs | false | true/false | false → single-AZ outage = downtime |
--backup-storage-redundancy |
Backup geo-spread | Geo | Local/Zone/Geo/GeoZone |
Local = no cross-region restore |
--read-scale |
Route RO intent to replicas | Enabled (HS) | Enabled/Disabled | Disabled wastes HA replica read capacity |
--compute-model |
Provisioned vs serverless | Provisioned | Provisioned/Serverless |
Serverless saves on bursty, costs on steady |
Named replicas: the feature that changes read-scale architecture
HA replicas are coupled to the primary’s compute size and serve generic read traffic. Named replicas are different and far more powerful: each is a separate database resource with its own name, its own compute SKU, and its own access control, but reads from the same page servers as the primary (so it costs compute, not a second copy of storage). You can attach up to 30 of them.
This is how you build true read-scale-out: a small primary sized for OLTP writes, and named replicas individually sized — a big one for the BI/Power BI workload, a small one for an ad-hoc reporting team, each isolated so a runaway report cannot starve the OLTP primary.
# A large named replica dedicated to the analytics workload,
# pointed at the SAME primary database (--partner-database).
az sql db replica create \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name appdb \
--partner-server sql-prod-eastus2 \
--partner-database appdb-bi-replica \
--secondary-type Named \
--family Gen5 \
--capacity 16
Named replicas are read-only and lag the primary by milliseconds (they consume the same log stream). Because each has its own logins and
DENY-able surface, you can hand a reporting team a connection toappdb-bi-replicawith zero write capability and zero blast radius onto production — something you cannot cleanly do with a plain HA replica.
The two replica types side by side — this table is the whole decision:
| Dimension | HA replica | Named replica |
|---|---|---|
| Resource identity | Hidden, part of primary | Separate database resource |
| Compute SKU | Same as primary | Independent (own vCore count) |
| Max count | 4 | 30 |
| Storage | Shared page servers | Shared page servers |
| Logins / access control | Inherited from primary | Own logins, own grants |
| Primary use | Failover + generic reads | Read-scale-out, isolation |
| Routing | ApplicationIntent=ReadOnly |
Connect to its own FQDN |
| Failover target? | Yes | No |
| Lag | Milliseconds | Milliseconds |
| Cost | Compute (primary SKU) | Compute (its own SKU) |
When to reach for which read path, as a decision table:
| If you need… | Use | Why |
|---|---|---|
| A failover target with read offload | HA replica | It can be promoted; reads are a bonus |
| A dedicated, independently sized analytics endpoint | Named replica | Own SKU, own logins, isolated |
| To stop a BI query starving OLTP | Named replica | Separate compute can’t steal the primary’s |
| Read-only access for a third party with no write risk | Named replica | Grant on its own surface only |
| Cross-region readable copy | Geo-replica / failover group | Replicas above are same-region |
| The cheapest possible read offload | ApplicationIntent=ReadOnly on an HA replica |
Reuses a replica you already pay for |
Rapid scale, fast restore, and read-scale-out in practice
Because storage is disaggregated, scaling compute up or down does not move data — it reattaches a new compute node to the existing page servers and warms the cache. A vCore resize that would take hours on Business Critical (it has to seed a new local copy) completes in minutes on Hyperscale.
# Vertical scale: 8 -> 16 vCores. No data movement; cache warm-up is the only cost.
az sql db update \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name appdb \
--capacity 16
Point-in-time restore (PITR) leans on the snapshot architecture and is the cheap insurance you should rehearse:
az sql db restore \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name appdb \
--dest-name appdb-restored \
--time "2026-06-08T03:30:00Z" \
--edition Hyperscale --family Gen5 --capacity 8
For read-scale, point reporting connections at a named replica’s FQDN, or use ApplicationIntent=ReadOnly against the primary’s server to be routed to an HA replica:
Server=tcp:sql-prod-eastus2.database.windows.net,1433;
Database=appdb;Authentication=Active Directory Default;
ApplicationIntent=ReadOnly;Encrypt=True;
Why each operation is fast on Hyperscale (and slow elsewhere) — the mechanism, not the marketing:
| Operation | Hyperscale behaviour | Why it’s fast | What still costs time |
|---|---|---|---|
| Vertical resize | Reattach new compute to page servers | No local data reseed | RBPEX cache warm-up |
| Backup | Snapshot of page servers | Size-independent, no full copy | Effectively none |
| PITR restore | Rehydrate page-server snapshots + log replay | No full restore chain replay | Log replay from snapshot to target time |
| Add HA replica | Attach compute to existing storage | No storage copy | Cache warm-up |
| Add named replica | New compute on same page servers | No storage copy | Minutes to provision compute |
| Storage growth | Add page servers transparently | No pre-provisioning | None (automatic) |
The restore options compared — pick the right one for the incident:
| Restore type | Source | Granularity | Cross-region? | Typical use |
|---|---|---|---|---|
| PITR | Continuous backup | Any second in retention | No | Oops-delete, bad deploy |
| Long-term retention (LTR) | Weekly/monthly/yearly copies | Per stored backup | Yes (if geo) | Compliance, “restore last June” |
| Geo-restore | Geo-replicated backup | Last available backup | Yes | Region loss, no failover group |
| Failover group failover | Live geo-secondary | Near-zero RPO | Yes | Region outage, fast RTO |
Copy (az sql db copy) |
Live source | Point of copy | Yes | Clone for test/staging |
Elastic pools, vCore density, and tenant-per-database design
If you run a SaaS product with hundreds or thousands of tenants, the cleanest isolation model is one database per tenant. The cost objection — “I cannot pay for a dedicated database SKU per tenant” — is exactly what elastic pools solve. A pool is a shared budget of vCores (or DTUs) and storage that all databases in the pool draw from. A tenant idle at 2 a.m. lends its capacity to a tenant under load; you provision for the aggregate peak, not the sum of individual peaks.
# A vCore elastic pool with per-database min/max guardrails so no single
# noisy tenant can monopolise the whole pool.
az sql elastic-pool create \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name pool-tenants-01 \
--edition GeneralPurpose --family Gen5 \
--capacity 16 \
--db-min-capacity 0.5 \
--db-max-capacity 4 \
--zone-redundant true
resource pool 'Microsoft.Sql/servers/elasticPools@2023-08-01' = {
parent: sqlServer
name: 'pool-tenants-01'
location: location
sku: { name: 'GP_Gen5', tier: 'GeneralPurpose', family: 'Gen5', capacity: 16 }
properties: {
perDatabaseSettings: { minCapacity: json('0.5'), maxCapacity: json('4') }
zoneRedundant: true
}
}
The knobs that make or break density — set every one of these deliberately:
| Setting | Purpose | Default | Failure mode if wrong |
|---|---|---|---|
--capacity (pool) |
Total shared vCores | (required) | Under-size → everyone throttles at peak |
--db-max-capacity |
Cap one database’s vCore usage | = pool capacity | One tenant starves the rest |
--db-min-capacity |
Reserve a floor per database | 0 | Over-reserve → you lose density benefit |
--storage / --max-size |
Pool storage budget | tier default | Hit it → all DBs read-only |
--zone-redundant |
Spread pool across AZs | false | Single-AZ outage takes the pool down |
| Max DBs per pool | Density ceiling | tier-dependent | Exceed → can’t add tenants |
--edition/family |
Pool service tier | (required) | Must match DBs you add |
Move databases in and out of the pool without downtime — this is a metadata operation:
az sql db update \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--name tenant-0457 \
--elastic-pool pool-tenants-01
Watch out: a database can only join a pool of the same logical server and edition family, and Hyperscale has its own pool type. Mixing service tiers in one pool is not allowed. Size the pool from real telemetry — the pool’s vCore and storage percentage metrics in Azure Monitor are your sizing evidence, not a guess. (Pair this with the KQL/Log Analytics skills to turn that telemetry into a sizing decision.)
The compatibility rules for joining a pool, stated exactly so you stop fighting the API:
| Constraint | Rule | What happens if violated |
|---|---|---|
| Logical server | DB and pool on the same server | Move rejected |
| Edition family | Must match (GP↔GP, BC↔BC) | Move rejected |
| Hardware gen | Same Gen family | Move rejected |
| Hyperscale | Separate Hyperscale pool type | Cannot mix with GP/BC |
| Geo-replica | Secondary must match primary’s pool eligibility | Replication errors |
| Per-DB max ≤ pool capacity | db-max-capacity ≤ pool vCores |
Setting rejected |
Single database vs elastic pool vs sharding — when each is the right multi-tenant model:
| Approach | Isolation | Cost at 1,000 tenants | Operational load | When to choose |
|---|---|---|---|---|
| DB per tenant, single SKUs | Hard | Highest (sum of peaks) | Per-DB ops | Few, large tenants |
| DB per tenant, elastic pool | Hard | Aggregate peak only | Pool-level ops | Many tenants, classic SaaS |
| Shared DB, TenantId column | Soft (query-enforced) | Lowest | Schema/query discipline | Thousands of tiny tenants |
| Sharded shared DBs | Soft, partitioned | Low-medium | Shard map maintenance | Extreme scale, uniform tenants |
Ledger tables for tamper-evidence and cryptographic verification
Ledger gives you blockchain-style integrity inside a relational database: every change is recorded such that any tampering — even by a sysadmin with full rights, even by Microsoft — is cryptographically detectable. It works by hashing rows into a Merkle tree, periodically sealing the tree root into a database digest, and letting you publish that digest to immutable storage you control. Verification recomputes the tree and compares it to the digest you trusted.
Two flavours:
- Updatable ledger tables allow
UPDATE/DELETE, but the engine keeps the full history in a paired history table and surfaces it through a_Ledgerview. Nothing is ever truly destroyed. - Append-only ledger tables reject
UPDATEandDELETEentirely — ideal for an audit log.
-- Append-only ledger: an immutable audit trail. UPDATE/DELETE will fail.
CREATE TABLE dbo.PaymentAudit
(
AuditId BIGINT IDENTITY(1,1) PRIMARY KEY,
AccountId INT NOT NULL,
Action NVARCHAR(50) NOT NULL,
Amount DECIMAL(19,4) NOT NULL,
OccurredAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
)
WITH (LEDGER = ON (APPEND_ONLY = ON));
-- Updatable ledger: full history is retained automatically and queryable.
CREATE TABLE dbo.AccountBalance
(
AccountId INT PRIMARY KEY,
Balance DECIMAL(19,4) NOT NULL
)
WITH (LEDGER = ON); -- creates AccountBalance_Ledger view + history table
The two ledger flavours compared — choose by whether the data must be mutable:
| Dimension | Append-only ledger | Updatable ledger |
|---|---|---|
UPDATE / DELETE |
Rejected (error) | Allowed |
| History table | Not needed (no mutation) | Auto-created, full history |
_Ledger view |
Surfaces inserts | Surfaces current + history |
| Storage overhead | Low | Higher (keeps every version) |
| Best for | Audit logs, event streams | Master data needing traceable edits |
| Tamper-evidence | Yes | Yes |
| Can be dropped? | Only if not protected by digest policy | Same |
The hidden columns ledger adds — know these so your SELECT * and ETL don’t break:
| Column / object | Purpose | Where it appears |
|---|---|---|
ledger_start_transaction_id |
Tx that created the row version | Generated, on the table |
ledger_end_transaction_id |
Tx that ended it (updatable) | Generated |
ledger_start_sequence_number |
Order within the transaction | Generated |
ledger_end_sequence_number |
End order (updatable) | Generated |
<table>_History |
Prior row versions | Paired history table (updatable) |
<table>_Ledger view |
Unified current + history | View |
sys.database_ledger_blocks |
The sealed block chain | System view |
sys.database_ledger_transactions |
Per-transaction hashes | System view |
Generate and verify a digest. In production you configure automatic digest storage to an immutable Azure Storage container or Confidential Ledger, so digests are written to a location even Azure operators cannot rewrite:
-- Generate the current database digest (the value you persist off-box).
EXEC sys.sp_generate_database_ledger_digest;
-- Verify the entire database against the digests you trust.
-- @digests is the JSON array of previously stored digest blocks.
EXEC sys.sp_verify_database_ledger @digests;
If a single byte of any ledger row was altered outside the normal transaction path, sp_verify_database_ledger raises an error instead of returning cleanly. That is the whole value proposition: you can hand an auditor a verification procedure rather than a promise.
The verification surface — what each procedure/object is for during an audit:
| Object | Type | What it does | When you call it |
|---|---|---|---|
sp_generate_database_ledger_digest |
Proc | Emits current sealed digest | On schedule, store off-box |
sp_verify_database_ledger |
Proc | Recompute + compare to digests | Audit, incident, periodic |
sp_verify_database_ledger_from_digest_storage |
Proc | Verify using auto-stored digests | When digests live in immutable storage |
| Automatic digest storage | Config | Auto-upload digests to immutable container | Set once at design time |
sys.database_ledger_blocks |
View | Inspect the block hash chain | Forensics |
| Confidential Ledger | Service | Tamper-proof digest sink | Highest assurance |
Ledger is tamper-evidence, not tamper-prevention. It does not stop a privileged user from changing data; it guarantees you will find out. Pair it with
Encrypt=True, RBAC, and auditing — and configure automatic digest upload, because a digest you never stored off-box proves nothing.
Always Encrypted with secure enclaves and rich computations
Always Encrypted keeps column data encrypted all the way through the database engine — the keys live with the client (or in a key store the client controls), and the SQL service literally never sees plaintext. Classic Always Encrypted has a brutal limitation: because the engine sees only ciphertext, it can do equality comparisons (with deterministic encryption) and nothing else. No LIKE, no range queries, no in-place re-encryption.
Secure enclaves lift that limitation. A VBS (Virtualization-Based Security) enclave is a protected memory region on the SQL node that the host OS and DBAs cannot inspect. The client establishes attestation (via Microsoft Azure Attestation), then delegates the plaintext keys into the enclave. Inside the enclave — and only there — the engine can decrypt, run LIKE and range predicates, and even re-encrypt columns in place. The DBA still never sees plaintext; the enclave is a sealed black box.
The encryption modes and exactly what each can do — this table is why enclaves exist:
| Mode | Same plaintext → same ciphertext? | Equality (=) |
LIKE / range |
In-place re-encrypt | Use when |
|---|---|---|---|---|---|
| Deterministic (no enclave) | Yes | Yes | No | No | Join/group-by on the column needed |
| Randomized (no enclave) | No | No | No | No | Maximum secrecy, no server-side query |
| Deterministic + enclave | Yes | Yes | Yes | Yes | Equality + occasional rich query |
| Randomized + enclave | No | Yes (in enclave) | Yes | Yes | Strongest secrecy AND rich queries |
Determinism leaks information (equal plaintexts produce equal ciphertexts, so an attacker can correlate). With an enclave you can use randomized — the strongest mode — and still query richly, because the comparison happens on decrypted data inside the sealed enclave. That combination is the whole reason to adopt enclaves.
Enable the enclave on the database and create an enclave-enabled key:
-- Column Master Key must allow enclave computations.
CREATE COLUMN MASTER KEY CMK_Enclave
WITH (
KEY_STORE_PROVIDER_NAME = 'AZURE_KEY_VAULT',
KEY_PATH = 'https://kv-data-eastus2.vault.azure.net/keys/cmk-ae/abc123',
ENCLAVE_COMPUTATIONS (SIGNATURE = 0x...) -- signature emitted by the tooling
);
CREATE COLUMN ENCRYPTION KEY CEK_Enclave
WITH VALUES (
COLUMN_MASTER_KEY = CMK_Enclave,
ALGORITHM = 'RSA_OAEP',
ENCRYPTED_VALUE = 0x...
);
Encrypt a column with randomized encryption (the strongest mode) and still query it with rich predicates — impossible without the enclave:
ALTER TABLE dbo.Customer
ALTER COLUMN [SSN] NVARCHAR(11)
ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = CEK_Enclave,
ENCRYPTION_TYPE = RANDOMIZED,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
)
WITH (ONLINE = ON);
The client connection must opt into attestation. With the sqlcmd/ODBC stack and Azure Attestation:
Server=tcp:sql-prod-eastus2.database.windows.net,1433;Database=appdb;
Authentication=Active Directory Default;Encrypt=True;
ColumnEncryptionSetting=Enabled;
AttestationProtocol=AAS;
EnclaveAttestationUrl=https://attest-eastus2.eus.attest.azure.net/attest/SgxEnclave;
The key hierarchy — three layers, who holds each, and the blast radius if it leaks:
| Key | What it does | Where it lives | Who can access | Blast radius if compromised |
|---|---|---|---|---|
| Column Master Key (CMK) | Wraps the CEK | Key Vault (you control) | Client + KV RBAC | Can unwrap CEKs → all encrypted columns |
| Column Encryption Key (CEK) | Encrypts the column data | Stored encrypted in the DB | Decrypted only client-side | The columns it protects |
| Data (column value) | The secret itself | In the table, as ciphertext | Plaintext only client/enclave | One value |
| Attestation policy | Proves enclave genuineness | Azure Attestation | AAS RBAC | Mis-set → keys won’t enter enclave |
The connection-string parameters that make or break enclave queries — every one matters:
| Parameter | Purpose | Value | If missing/wrong |
|---|---|---|---|
ColumnEncryptionSetting |
Enable Always Encrypted on the driver | Enabled |
Reads return ciphertext, writes fail |
AttestationProtocol |
Attestation provider type | AAS (or None for SGX-less) |
Enclave queries rejected |
EnclaveAttestationUrl |
The AAS attestation endpoint | region-specific URL | #1 cause of CI failures |
Authentication |
Identity to the server | Active Directory Default |
Auth fails |
Encrypt |
TLS to the server | True |
Connection refused (TLS enforced) |
The enclave processes plaintext in protected memory only. To benefit, your column master key must be marked
ENCLAVE_COMPUTATIONSand the client must successfully attest — otherwise the engine falls back to ciphertext-only behaviour and yourLIKEquery fails. Test the attestation path in a lower environment first; misconfigured attestation URLs are the number-one cause of “it works on my machine but not in CI.”
Private connectivity, Entra-only auth, and least-privilege RBAC
None of the above matters if the database is reachable from the public internet with SQL logins. Lock the perimeter and the identity layer:
# Kill public network access; the database is reachable only via Private Endpoint.
az sql server update \
--resource-group rg-data-eastus2 \
--name sql-prod-eastus2 \
--enable-public-network false
# Private Endpoint into your data subnet, plus the privatelink DNS zone.
az network private-endpoint create \
--resource-group rg-data-eastus2 \
--name pe-sql-prod \
--vnet-name vnet-data-eastus2 --subnet snet-data \
--private-connection-resource-id "/subscriptions/<sub>/resourceGroups/rg-data-eastus2/providers/Microsoft.Sql/servers/sql-prod-eastus2" \
--group-id sqlServer \
--connection-name pe-sql-prod-conn
Enforce Microsoft Entra-only authentication so SQL logins and passwords cannot be used at all — this single setting eliminates an entire class of credential-leak incidents:
az sql server ad-only-auth enable \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2
Then grant database access to managed identities and Entra groups, with the least role that works — prefer the granular fixed database roles over db_owner:
-- App's user-assigned managed identity gets reader/writer, NOT db_owner.
CREATE USER [mi-appdb-writer] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [mi-appdb-writer];
ALTER ROLE db_datawriter ADD MEMBER [mi-appdb-writer];
-- Reporting Entra group gets read-only, scoped to the named replica.
CREATE USER [grp-bi-readers] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [grp-bi-readers];
The connectivity options compared — pick the most locked-down one your topology allows:
| Option | Reachability | Data path | DNS requirement | Use when |
|---|---|---|---|---|
| Public + firewall rules | Internet, IP-allowlisted | Public IP | Default public FQDN | Quick dev only |
| Public + Azure-services toggle | Azure ranges | Public IP | Default | Legacy PaaS callers |
| Service endpoint | VNet subnet | Optimized public route | Default FQDN | Subnet-scoped, no PE budget |
| Private Endpoint | VNet, private IP | Microsoft backbone | privatelink zone A record | Production default |
| PE + public-access-off | VNet only | Backbone | privatelink zone | Hard isolation / compliance |
The fixed database roles, ordered by privilege — grant the lowest that works:
| Role | Grants | Use for | Never use for |
|---|---|---|---|
db_datareader |
SELECT on all tables | Reporting, read APIs | Anything writing |
db_datawriter |
INSERT/UPDATE/DELETE | Write APIs (with reader) | Schema changes |
db_ddladmin |
CREATE/ALTER/DROP objects | Migration runners | App runtime identity |
db_securityadmin |
Manage roles/permissions | Rare, controlled | App runtime |
db_owner |
Everything in the DB | Almost never | Routine app access |
| Custom role | Exactly the grants you define | Least-privilege ideal | — |
The identity controls that close credential-leak classes — set all of these:
| Control | What it stops | Command / setting |
|---|---|---|
| Entra-only auth | SQL login/password use entirely | az sql server ad-only-auth enable |
| Managed identity | Secrets in connection strings | Authentication=Active Directory Default |
| Public access off | Internet reachability | --enable-public-network false |
| Private Endpoint | Data over public IP | az network private-endpoint create |
| Least-privilege roles | Over-broad db_owner access |
Fixed/custom DB roles |
| Auditing | Undetected access | az sql server audit-policy update |
| TDE / CMK | At-rest exposure | CMK encryption |
Performance: automatic tuning and Query Store baselines
Hyperscale and elastic pools do not absolve you of query hygiene — they amplify the cost of a bad plan because more workloads share the resource. Query Store is on by default and is your flight recorder: it captures query text, plans, and runtime stats so you can see which plan regressed and when. Enforce sensible retention and read-write mode:
ALTER DATABASE appdb SET QUERY_STORE = ON
(OPERATION_MODE = READ_WRITE,
QUERY_CAPTURE_MODE = AUTO,
MAX_STORAGE_SIZE_MB = 1024,
STALE_QUERY_THRESHOLD_DAYS = 30);
Turn on automatic tuning so the platform forces the last good plan when a regression is detected and creates/drops indexes from its recommendations:
-- Enable plan regression correction and index automation at the DB level.
ALTER DATABASE appdb SET AUTOMATIC_TUNING
(FORCE_LAST_GOOD_PLAN = ON, CREATE_INDEX = ON, DROP_INDEX = ON);
Find the regressions Query Store is hiding for you:
-- Top 10 queries by total duration over the last 24h, with their plan count.
SELECT TOP 10
qsq.query_id,
qst.query_sql_text,
COUNT(DISTINCT p.plan_id) AS plan_count,
SUM(rs.avg_duration * rs.count_executions) AS total_duration_us
FROM sys.query_store_query qsq
JOIN sys.query_store_query_text qst ON qst.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan p ON p.query_id = qsq.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id
JOIN sys.query_store_runtime_stats_interval rsi ON rsi.runtime_stats_interval_id = rs.runtime_stats_interval_id
WHERE rsi.start_time > DATEADD(HOUR, -24, SYSUTCDATETIME())
GROUP BY qsq.query_id, qst.query_sql_text
ORDER BY total_duration_us DESC;
A plan_count greater than 1 on a hot query is your signal to investigate parameter-sensitivity and consider forcing a plan.
The automatic-tuning options and what each automates — enable deliberately:
| Option | What it does | Default (Azure SQL) | When to disable |
|---|---|---|---|
FORCE_LAST_GOOD_PLAN |
Reverts a regressed plan automatically | On (inherited) | Almost never — keep on |
CREATE_INDEX |
Adds indexes from recommendations | Off by default | Tight write-heavy OLTP |
DROP_INDEX |
Removes unused/duplicate indexes | Off by default | If you manage indexes by hand |
| Query Store mode | Capture level | AUTO |
ALL for forensics, NONE never in prod |
MAX_STORAGE_SIZE_MB |
Query Store size cap | tier-dependent | Raise for long retention |
HA/DR with zone redundancy, geo-replication, and failover groups
Layer your resilience; do not pick one and call it done.
- Zone redundancy (set at provisioning) survives a single-AZ outage with no data loss and no action.
- Active geo-replication creates a readable secondary in another region you manage explicitly.
- Failover groups wrap geo-replication with a managed listener endpoint, so applications use a stable FQDN that survives failover, and add an automatic-failover policy.
# A failover group across regions with automatic failover after a grace period.
az sql failover-group create \
--resource-group rg-data-eastus2 \
--server sql-prod-eastus2 \
--partner-server sql-dr-westus2 \
--name fog-appdb \
--add-db appdb \
--failover-policy Automatic \
--grace-period 1
Applications then connect to the failover-group listener — never a regional server name:
Server=tcp:fog-appdb.database.windows.net,1433;Database=appdb;
Authentication=Active Directory Default;Encrypt=True;
A read-write listener (
fog-appdb.database.windows.net) and a read-only listener (fog-appdb.secondary.database.windows.net) are created for you. Route reporting at the read-only listener so a region failover transparently follows the secondary. Rehearse a real failover quarterly — an untested DR plan is a hypothesis, not a control. For the broader regional-failover pattern, see Azure Front Door + Traffic Manager global failover and Azure backup & site recovery.
The resilience layers compared — what each protects against and its RPO/RTO:
| Mechanism | Protects against | RPO | RTO | Readable secondary? | Cost |
|---|---|---|---|---|---|
| Zone redundancy | Single-AZ failure | 0 | Seconds (automatic) | No | Small premium |
| HA replica | Node failure | 0 | Seconds | Yes (RO intent) | Compute per replica |
| Active geo-replication | Region failure | Seconds (async) | Manual failover | Yes | 2nd region DB |
| Failover group | Region failure | Seconds | Managed, stable FQDN | Yes (RO listener) | 2nd region DB |
| Geo-restore | Region loss, no replica | Last backup | Minutes–hours | After restore | Backup storage only |
The failover-group policy knobs that decide your DR behaviour:
| Setting | What it controls | Values | Recommendation |
|---|---|---|---|
--failover-policy |
Auto vs manual failover | Automatic/Manual |
Automatic for prod with grace period |
--grace-period |
Hours before auto-failover fires | ≥1 | 1 (balances false-trigger vs RTO) |
| Read-write listener | Stable primary FQDN | auto-created | Apps always use this, never regional |
| Read-only listener | Stable RO FQDN | auto-created | Route reporting here |
| Secondary type | DB / elastic pool | matches source | Pool FOG for multi-tenant |
Architecture at a glance
The diagram traces the production data path left to right and maps the failure that bites at each hop. An application or serverless front end connects over TLS on 1433 to the failover-group listener — never a regional server name — which resolves through a private DNS zone to a private IP (badge 2: if the privatelink zone isn’t linked, the FQDN falls back to the public name and dies when public access is off). The request lands on the Hyperscale primary in zone 1, whose compute caches pages locally (RBPEX) but whose durability lives in the log service. The log service fans records to page servers (the transparent-growth storage tier), to the HA standby in zone 2 (badge 3: zone-redundant HA adds a cross-zone commit round trip), to named replicas sized independently for analytics (badge 4: a runaway report on its own compute can’t starve the OLTP primary, but async lag is real), and offloads snapshots to long-term storage for size-independent backups.
Two control planes sit alongside the data path. Always Encrypted routes the sensitive columns through a secure enclave gated by Azure Attestation and a Key Vault-held column master key (badge 5: if attestation is mis-wired or the CMK isn’t marked for enclave computations, rich LIKE queries silently fall back to ciphertext and fail). And ledger seals row hashes into digests pushed to immutable storage (badge 1: a connection storm from a serverless front end exhausts max_connections long before CPU climbs). Read the badges as the diagnostic map: each number is the exact hop where a named failure class bites, and the legend narrates symptom · confirm · fix for each.
Real-world scenario
MedLedger Health runs a multi-tenant clinical-records SaaS: ~1,400 clinic tenants, one database per tenant for hard data isolation, on a logical server in East US 2. Two constraints collided. A compliance mandate required patient national IDs (SSN-equivalents) to be encrypted such that no operator — theirs or the cloud’s — could read them, while clinicians still needed to search patients by partial name and ID. And finance flagged that running 1,400 standalone database SKUs would multiply the bill roughly tenfold over actual aggregate load. The team had four data engineers and an audit deadline.
They solved it as a stack. Tenants went into elastic pools — a few pools of ~200 databases each, sized from real Azure Monitor vCore-percentage telemetry rather than per-tenant peaks — which let them provision for aggregate peak and cut the dedicated-SKU cost projection by about 80%. The national-ID columns moved to Always Encrypted with secure enclaves, RANDOMIZED encryption with a VBS enclave and Azure Attestation, so clinicians’ partial-match queries ran inside the enclave on plaintext that never left protected memory, while DBAs querying the same column saw only ciphertext. The patient-record audit trail became an append-only ledger table with automatic digest upload to immutable storage, giving auditors a sp_verify_database_ledger procedure instead of a trust-us assertion. The busiest tenants’ reporting was offloaded to a named replica per pool so a heavy analytics query could never starve a clinic’s live OLTP.
The trap they hit was the attestation policy, not the crypto. Their enclave worked perfectly in dev but every CI integration test failed with attestation errors — because the CI runners’ connection string pointed at the dev region’s attestation URL while talking to a test region’s database, and attestation is region-bound. The fix was to template the attestation URL per environment and gate deployment on a smoke test that actually issues an enclave query:
-- CI smoke gate: must succeed THROUGH the enclave or the deploy fails.
-- A range/LIKE predicate on a RANDOMIZED column only works if attestation succeeded.
IF NOT EXISTS (
SELECT 1 FROM dbo.Customer WHERE [SSN] LIKE N'%0000' -- enclave-evaluated
)
THROW 50010, 'Enclave query returned nothing -- attestation likely misconfigured.', 1;
The second surprise was a connection storm. The app tier was serverless and scaled to hundreds of instances at clinic-opening time (8 a.m. local), and each instance opened its own pool of connections. CPU on the pool stayed under 30%, yet clinicians got FATAL: too many clients — they had hit max_connections, a connection ceiling not a compute one. The fix was a connection-pooling proxy in front and per-tenant db-max-capacity guardrails so one busy clinic could not also monopolise vCores. Outcome: the aggregate bill came in ~80% below the naive projection, partial-name search worked through the enclave with DBAs locked out, auditors got a cryptographic verification procedure, and the 8 a.m. storm flattened. The lesson on the wall: “With enclaves the encryption rarely breaks — the attestation wiring does; and a database has two ceilings, compute and connections, and the cheap one to hit is connections.”
The incident timeline, because the order of discovery is the lesson:
| Time | Symptom | What they checked | Finding | Fix |
|---|---|---|---|---|
| Week 1 | CI tests fail, dev works | Attestation URL per env | Region-bound URL mismatch | Template URL; smoke-gate deploy |
| Week 2 | too many clients at 8 a.m. |
Pool CPU (low) vs pg_stat-equivalent |
Connection ceiling, not CPU | Pooling proxy + per-DB max |
| Week 2 | One clinic slow during BI run | DTU/vCore per database | BI starving OLTP | Named replica for reporting |
| Week 3 | Audit readiness | Ledger digest storage | Digests not pushed off-box | Automatic digest upload to immutable container |
| Week 4 | Cost review | Pool vCore % telemetry | Over-provisioned floors | Lowered db-min-capacity |
Advantages and disadvantages
These patterns are powerful precisely because they break the on-prem mental model — which is also why they surprise teams. Weigh them honestly:
| Advantages (why these patterns help you) | Disadvantages (why they bite) |
|---|---|
| Hyperscale scales storage to 128 TB with no pre-provisioning and backs up size-independently | Hyperscale has its own pool type and some features (e.g. certain in-memory OLTP) differ from Business Critical |
| Named replicas give true read-scale-out with independent SKUs and isolated logins | Each named replica is its own compute bill; lag is async (milliseconds, but non-zero) |
| Elastic pools cut multi-tenant cost dramatically by sharing an aggregate budget | A missing db-max-capacity lets one tenant starve the pool; sizing needs real telemetry |
| Ledger hands auditors cryptographic proof instead of a promise | It’s tamper-evidence, not prevention; a digest you never stored off-box proves nothing |
| Always Encrypted means the service never sees plaintext — strongest confidentiality | Classic mode breaks rich queries; enclaves add attestation wiring that is fiddly and region-bound |
Enclaves restore LIKE/range over randomized-encrypted columns |
Mis-set attestation silently falls back to ciphertext; CI is where it bites |
| Entra-only auth + Private Endpoint eliminate whole credential/exposure classes | Public-access-off requires correct private DNS or every connection fails |
| Failover groups give a stable FQDN through region failover | Auto-failover needs a grace period tuned against false triggers; DR must be rehearsed |
The patterns are right for large OLTP, multi-tenant SaaS, compliance-bound audit trails, and regulated-identifier workloads. They bite hardest on teams who adopt them with on-prem assumptions — expecting a Business-Critical-style reseed, forgetting the connection ceiling, or assuming attestation “just works” across environments. Every disadvantage above is manageable once you know it exists, which is the point of this article.
Hands-on lab
Provision a Hyperscale database, add a named replica, create an append-only ledger table, and verify a digest — then tear it all down. Run in Cloud Shell (Bash). This uses small SKUs; delete at the end to avoid charges.
Step 1 — Variables and a logical server.
RG=rg-hslab
LOC=eastus2
SRV=sql-hslab-$RANDOM
ADMIN=sqladmin
PWD=$(openssl rand -base64 18)Aa1!
az group create -n $RG -l $LOC -o table
az sql server create -n $SRV -g $RG -l $LOC \
--admin-user $ADMIN --admin-password "$PWD" -o table
Expected: a resource group and a logical server. (We use a SQL admin only for the lab; in production enable Entra-only auth.)
Step 2 — Create a Hyperscale database with one HA replica.
az sql db create -g $RG -s $SRV -n appdb \
--edition Hyperscale --family Gen5 --capacity 2 --ha-replicas 1 -o table
Expected: a database row with edition = Hyperscale. Provisioning takes a few minutes.
Step 3 — Add a named replica sized independently.
az sql db replica create -g $RG -s $SRV -n appdb \
--partner-server $SRV --partner-database appdb-bi \
--secondary-type Named --family Gen5 --capacity 2 -o table
Expected: a second database appdb-bi on the same server, reading the same page servers.
Step 4 — Allow your client and create an append-only ledger table. Open the firewall to your IP for the lab, then connect with sqlcmd (or the portal query editor) and run:
CREATE TABLE dbo.PaymentAudit
(
AuditId BIGINT IDENTITY(1,1) PRIMARY KEY,
AccountId INT NOT NULL,
Action NVARCHAR(50) NOT NULL,
Amount DECIMAL(19,4) NOT NULL,
OccurredAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
)
WITH (LEDGER = ON (APPEND_ONLY = ON));
INSERT dbo.PaymentAudit (AccountId, Action, Amount) VALUES (1001, 'DEBIT', 250.00);
Step 5 — Prove append-only and generate a digest. Confirm a mutation is rejected, then seal a digest:
-- This MUST fail with a ledger error (append-only forbids UPDATE):
UPDATE dbo.PaymentAudit SET Amount = 0 WHERE AuditId = 1;
-- Generate the digest you would persist off-box:
EXEC sys.sp_generate_database_ledger_digest;
Expected: the UPDATE errors out (proving immutability); the digest proc returns a JSON block with a sealed hash.
Step 6 — Teardown. One command removes everything:
az group delete -n $RG --yes --no-wait
The lab cost is a few rupees of Hyperscale compute for the minutes it runs; deleting the resource group stops all billing.
Common mistakes & troubleshooting
The failure modes that actually page you — symptom, the root cause, the exact way to confirm it, and the fix. Scan for your symptom:
| # | Symptom | Root cause | Confirm (exact command / path) | Fix |
|---|---|---|---|---|
| 1 | FATAL: too many clients / login failures, CPU low |
Hit max_connections, not compute |
SELECT COUNT(*) FROM sys.dm_exec_connections; vs the SKU’s connection limit |
Connection pooling proxy; raise vCore (limit scales with it); fix per-request connections |
| 2 | Enclave LIKE/range query returns nothing or errors |
Attestation failed → ciphertext fallback | Driver throws attestation error; check EnclaveAttestationUrl matches the DB’s region |
Fix per-environment URL; mark CMK ENCLAVE_COMPUTATIONS; smoke-gate deploy |
| 3 | Named replica far behind primary | Async log apply under write pressure | SELECT replica_lag... / sys.dm_database_replica_states |
Alert on lag; size replica compute up; offload fewer heavy queries |
| 4 | Connection to FQDN fails after public-access-off | privatelink DNS zone not linked → resolves public | nslookup sql-prod-eastus2.database.windows.net returns a public IP |
Link the privatelink DNS zone; A record → private IP |
| 5 | Dev/test database 503s or throttles unexpectedly | Hit pool vCore/storage ceiling | Pool vcore_percent/storage_percent in Azure Monitor near 100% |
Raise pool capacity; add db-max-capacity guardrails |
| 6 | sp_verify_database_ledger fails |
Data altered out-of-band, or wrong digests passed | The proc raises an error naming the block | If digests are right, you have real tampering — investigate; else pass correct digest JSON |
| 7 | vCore resize “stuck” / queries slow right after | Cache (RBPEX) cold post-reattach | Latency high but settling; sys.dm_io_virtual_file_stats shows page-server reads |
Wait for cache warm-up; pre-warm with representative queries |
| 8 | Move DB into pool rejected | Edition/family/server mismatch | Error names the constraint | Match logical server + edition family; Hyperscale needs HS pool |
| 9 | ALTER COLUMN ... ENCRYPTED blocks/locks |
ONLINE = OFF re-encrypt holds locks |
Long-running schema lock in sys.dm_tran_locks |
Use WITH (ONLINE = ON); run off-peak |
| 10 | Geo-restore unavailable for a region loss | Backup redundancy was Local |
az sql db show --query requestedBackupStorageRedundancy = Local |
Set Geo/GeoZone at creation; can’t retro-fix the lost data |
| 11 | App can’t connect after Entra-only enabled | Still using SQL login | az sql server ad-only-auth get = Enabled, app uses user/pass |
Switch to Authentication=Active Directory Default + grant the MI |
| 12 | Ledger table can’t be dropped | Protected by digest policy | DROP errors referencing ledger | Expected — that’s the protection; archive instead |
The most important confirm commands, collected for incident use:
# Tier, replica count, zone redundancy, backup redundancy in one view
az sql db show -g rg-data-eastus2 -s sql-prod-eastus2 -n appdb \
--query "{edition:currentServiceObjectiveName, replicas:highAvailabilityReplicaCount, zone:zoneRedundant, backup:requestedBackupStorageRedundancy}" -o table
# Pool utilisation (your sizing + throttle evidence)
PID=$(az sql elastic-pool show -g rg-data-eastus2 -s sql-prod-eastus2 -n pool-tenants-01 --query id -o tsv)
az monitor metrics list --resource "$PID" --metric eDTU_used storage_used --interval PT5M -o table
-- Which physical role am I on? Primary = READ_WRITE; replicas READ_ONLY.
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability') AS rw_or_ro;
-- Are enclave columns RANDOMIZED (not ciphertext-only)?
SELECT c.name, ek.name AS cek, c.encryption_type_desc
FROM sys.columns c
JOIN sys.column_encryption_keys ek ON ek.column_encryption_key_id = c.column_encryption_key_id
WHERE c.encryption_type IS NOT NULL;
A quick decision table for the “is it compute or connections?” fork that wastes the most time:
| If you see… | It’s probably… | Do this |
|---|---|---|
| Login failures, CPU/vCore low | Connection ceiling | Pool connections; the limit scales with vCores |
| Slow queries, CPU/vCore high | Compute ceiling | Scale vCores up; find the heavy query in Query Store |
| Slow only during reports | BI starving OLTP | Move reporting to a named replica |
| Slow only right after a resize | Cold RBPEX cache | Wait/pre-warm; it’s transient |
| Throttle on a pooled DB | Pool or per-DB cap | Raise pool capacity / db-max-capacity |
Best practices
- Choose Hyperscale for the right reason — large size, fast restore, read-scale, or fast resize — not because it’s the newest tier. For sub-4 TB low-latency OLTP, Business Critical’s local SSD may still win.
- Always set
--zone-redundant truein production and pickGeo/GeoZonebackup redundancy at creation; you cannot retro-fit a cross-region restore for data already lost. - Isolate read-scale onto named replicas with their own SKU and least-privilege logins; never let a BI query share the OLTP primary’s compute.
- Size elastic pools from telemetry, not guesses — use the pool’s vCore/storage percentage metrics — and always set
db-max-capacityso one tenant cannot starve the rest. - Use append-only ledger for audit logs, updatable ledger for traceable master data, and always configure automatic digest upload to immutable storage; an unstored digest proves nothing.
- Prefer randomized + enclave over deterministic where you need both secrecy and rich queries — determinism leaks equality.
- Template the attestation URL per environment and gate every deploy on a smoke test that issues a real enclave query.
- Disable public network access and enable Entra-only authentication; this single pair eliminates an entire class of credential-leak incidents.
- Grant the least role that works —
db_datareader/db_datawriteror a custom role, neverdb_ownerfor app runtime identities. - Keep Query Store
READ_WRITEwithFORCE_LAST_GOOD_PLANon; it auto-reverts plan regressions before they page you. - Connect apps to the failover-group listener, never a regional server name, and route reporting at the read-only listener.
- Rehearse a real failover quarterly and a PITR restore at least as often — an untested DR plan is a hypothesis.
Security notes
The security model here is defence in depth across four planes. At rest, Transparent Data Encryption is on by default; for key sovereignty, layer customer-managed keys (CMK) so Microsoft holds the data but you hold the key. In the engine, Always Encrypted (especially randomized + enclave) keeps the most sensitive columns invisible even to a DBA with sysadmin — the keys live in Key Vault under your control, and the enclave decrypts only in attested protected memory. For integrity, ledger makes any out-of-band change detectable and gives auditors a verification procedure. At the perimeter and identity layer, Private Endpoint over the Microsoft backbone plus public-access-off plus Entra-only authentication removes both internet reachability and password-based logins; least-privilege roles cap blast radius; and SQL auditing records who touched what.
The layered controls, what each defends, and the residual risk each leaves:
| Plane | Control | Defends against | Residual risk |
|---|---|---|---|
| At rest | TDE (default) | Stolen disk/backup | Cloud operator could (in theory) hold the key |
| At rest | CMK in Key Vault | Operator key access | KV access policy/RBAC must be tight |
| In engine | Always Encrypted + enclave | DBA/operator reading plaintext | Client key store compromise |
| Integrity | Ledger + immutable digests | Silent data tampering | Detection, not prevention |
| Perimeter | Private Endpoint + public-off | Internet reachability | Private DNS must be correct |
| Identity | Entra-only auth | Password/credential leaks | Entra account/MI compromise |
| Identity | Least-privilege roles | Over-broad access | Privilege creep over time |
| Detection | SQL auditing | Undetected access | Logs must be reviewed/alerted |
The least-privilege grant pattern for the three identities that touch a typical app:
| Identity | Should get | Should NOT get | Scope |
|---|---|---|---|
| App runtime MI | db_datareader + db_datawriter |
db_owner, DDL |
The app database |
| Migration runner | db_ddladmin (+ data roles) |
db_owner |
The app database, CI only |
| Reporting group | db_datareader |
Any write | Named replica only |
| Break-glass admin | db_owner |
Day-to-day use | Logged, PIM-elevated |
Cost & sizing
Cost on these patterns is driven by compute (vCores), replica count, storage, and backup redundancy. Hyperscale bills compute per vCore-hour and storage separately; each HA or named replica is additional compute; backups beyond the included amount and geo-redundant backup storage add to the bill. Serverless compute can cut cost for bursty/intermittent databases by scaling compute with load. Elastic pools cut multi-tenant cost by letting you pay for aggregate peak instead of the sum of per-tenant peaks — often a 60–85% reduction at scale.
What drives the bill and how to control each lever:
| Cost driver | What it bills | How to reduce | Watch-out |
|---|---|---|---|
| Primary compute | vCore-hours | Right-size; serverless for bursty | Under-size → throttle |
| HA replicas | Per-replica compute | Use only as many as failover needs | Each is full primary-SKU cost |
| Named replicas | Per-replica compute (own SKU) | Size small for light reporting | Easy to over-provision |
| Storage | Per-GB of data + log | Archive/retention hygiene | Grows automatically |
| Backup storage | Beyond included; geo costs more | Local/Zone for non-DR |
Don’t cripple DR to save pennies |
| Elastic pool | Pool vCores (shared) | Lower db-min-capacity floors |
Over-reserved floors waste density |
| Enclave/attestation | AAS is low-cost | Minimal | Mostly negligible |
Rough indicative figures (East US 2, list, subject to change — always price with the calculator and apply reservations/savings plans):
| Item | Approx scale | Indicative USD/mo | Indicative INR/mo |
|---|---|---|---|
| Hyperscale 2 vCore primary | Small prod | ~$300–400 | ~₹25k–33k |
| + 1 HA replica (2 vCore) | Failover target | ~$300–400 | ~₹25k–33k |
| Named replica (2 vCore) | Light reporting | ~$300–400 | ~₹25k–33k |
| Elastic pool, GP 16 vCore | ~200 tenants | ~$2,000–2,600 | ~₹165k–215k |
| Same 200 tenants, single SKUs | naive | ~$10,000+ | ~₹830k+ |
| Geo-redundant backup uplift | per-GB | small % of compute | small % |
The headline sizing rule: pool size is the aggregate peak of all tenants, not the sum of their peaks — measure it from vcore_percent telemetry over a representative week. For the primary, size to the steady-state write workload and offload reads to replicas; scaling vCores up is fast and reversible on Hyperscale, so start conservative and grow from evidence. There is no free tier for Hyperscale; for genuinely free experimentation use the serverless General Purpose free offer on a separate database.
Interview & exam questions
1. Why are Hyperscale backups and restores effectively size-independent? Because storage is disaggregated into page servers and backups are snapshot-based against those page servers, not a full copy of a monolithic file. A 50 TB database snapshots in roughly the same wall-clock time as a 500 GB one, and a restore rehydrates page-server snapshots plus log replay rather than replaying a full backup chain. (Maps to DP-300.)
2. Differentiate HA replicas from named replicas. HA replicas are hidden hot standbys sized identically to the primary, sharing its page servers and log, usable for failover and generic reads. Named replicas are separate database resources with their own SKU, name, and logins, reading the same page servers — built for isolated, independently sized read-scale-out. Only HA replicas are failover targets; you can have up to 30 named replicas.
3. A vCore resize on Business Critical takes hours but minutes on Hyperscale. Why? Business Critical keeps data on local SSD with an AlwaysOn availability group, so a resize must seed a full local copy onto the new node. Hyperscale’s compute is decoupled from storage, so a resize just reattaches a new compute node to the existing page servers; the only cost is warming the RBPEX cache.
4. What problem do elastic pools solve, and what two guardrails prevent a noisy tenant?
They solve multi-tenant cost: many databases share one vCore/storage budget, so you provision for aggregate peak. db-max-capacity caps any single database’s vCore usage (stops one tenant starving the rest) and db-min-capacity reserves a floor (guarantees responsiveness, but over-reserving erodes density).
5. Append-only vs updatable ledger — when each?
Append-only rejects UPDATE/DELETE and suits immutable audit logs and event streams. Updatable allows mutation but retains full history in a paired history table surfaced via a _Ledger view, suiting master data that must change but remain fully traceable. Both are tamper-evident.
6. Is ledger tamper-prevention? No — it is tamper-evidence. It does not stop a privileged user from altering data; it guarantees the alteration is cryptographically detectable via the Merkle-tree digest. You must store digests off-box in immutable storage, or there is nothing trustworthy to verify against.
7. Why can classic Always Encrypted not run LIKE, and how do enclaves fix it?
The engine sees only ciphertext, so it can do at most equality with deterministic encryption (equal plaintext → equal ciphertext). A secure enclave is an attested protected-memory region where the engine can briefly decrypt and evaluate LIKE/range predicates on plaintext, including over randomized-encrypted columns, with the DBA still locked out.
8. Deterministic vs randomized encryption — the trade-off? Deterministic produces identical ciphertext for identical plaintext, enabling equality/joins server-side but leaking which values are equal. Randomized produces different ciphertext each time (no leakage) but allows no server-side operations — unless an enclave is present, in which case randomized + enclave gives both maximum secrecy and rich queries.
9. What single setting eliminates an entire class of credential-leak incidents on Azure SQL?
Enabling Microsoft Entra-only authentication (az sql server ad-only-auth enable). It disables SQL logins and passwords entirely, so leaked connection-string credentials cannot authenticate at all; apps use managed identities instead.
10. After disabling public network access, connections to the FQDN fail. Why and how do you confirm?
The privatelink DNS zone is not linked, so the FQDN still resolves to the public IP, which is now unreachable. Confirm with nslookup <server>.database.windows.net — a public IP means the zone isn’t wired. Fix by linking the privatelink DNS zone with an A record to the private endpoint’s IP. (Maps to AZ-305/DP-300.)
11. Why connect apps to a failover-group listener instead of a regional server name? The listener FQDN is stable and follows the primary through a region failover, so applications need no reconfiguration during DR. A regional server name pins the app to one region and breaks on failover. Reporting should use the read-only listener so it transparently follows the secondary.
12. A serverless app tier reports too many clients while CPU stays low — diagnosis?
It hit the max_connections ceiling, which is a connection limit, not a compute limit (and it scales with vCores). Fix with a connection-pooling proxy and, in a pool, per-database db-max-capacity guardrails; raising vCores raises the connection limit as a side effect but pooling is the real fix.
Quick check
- Which Hyperscale tier is the durability boundary — compute, page servers, the log service, or long-term storage?
- You need an independently sized, isolated read endpoint for a BI team. HA replica or named replica?
- Which two elastic-pool settings stop one tenant from starving the others?
- You must let analysts run
WHERE ssn LIKE '%1234'while the DBA can never see plaintext. Which encryption mode + feature? - After enabling Entra-only auth and public-access-off, a working app suddenly can’t connect at all. Name the two most likely causes.
Answers
- The log service — the primary writes to its durable landing zone and it fans records out; durability is the log, not the cached pages on compute.
- A named replica — its own SKU and logins, reading the same page servers, isolated from the OLTP primary’s compute.
db-max-capacity(caps one DB’s usage) anddb-min-capacity(reserves a floor) — the max is the one that actually prevents starvation.- Randomized encryption with a secure enclave (Always Encrypted), with the CMK marked
ENCLAVE_COMPUTATIONSand the client attesting via Azure Attestation. - (a) The app is still using a SQL login/password (must switch to a managed identity), and/or (b) the privatelink DNS zone isn’t linked so the FQDN resolves to the now-unreachable public IP.
Glossary
- Hyperscale — Azure SQL service tier with disaggregated storage (page servers + log service), scaling to 128 TB with fast resize/restore.
- Page server — A storage node owning up to ~1 TB of the database and serving 8 KB pages to compute on cache miss.
- Log service — The component that writes the durable log and fans records to page servers, replicas, and backups; the durability boundary.
- RBPEX — Resilient Buffer Pool Extension; the local SSD cache on a Hyperscale compute node.
- HA replica — A hot standby sized like the primary, sharing its storage/log; a failover target and generic read endpoint.
- Named replica — A separate database resource with its own SKU and logins reading the primary’s page servers; built for isolated read-scale-out (up to 30).
- Elastic pool — A shared vCore/DTU and storage budget for many databases on one logical server; the multi-tenant density model.
- db-max-capacity / db-min-capacity — Per-database ceiling and floor inside an elastic pool that bound each database’s resource use.
- Ledger table — A tamper-evident table whose row hashes form a Merkle tree sealed into database digests.
- Database digest — The sealed Merkle-tree root you store off-box; verification recomputes the tree and compares to it.
- Append-only / updatable ledger — Ledger flavours that forbid mutation (audit logs) or allow it with full retained history (traceable master data).
- Always Encrypted — Client-side column encryption where the SQL service never sees plaintext; keys live with the client.
- Column Master Key / Column Encryption Key — The two-tier key hierarchy: the CMK (in Key Vault) wraps the CEK that encrypts column data.
- Secure enclave — A VBS-protected, attested memory region on the SQL node where the engine can decrypt and run rich predicates.
- Attestation — Cryptographic proof (via Azure Attestation) that the enclave is genuine, required before keys enter it.
- Failover group — A managed listener over geo-replication giving a stable FQDN (read-write and read-only) that survives region failover.
Next steps
- Azure SQL connectivity troubleshooting: timeouts, throttling, blocking — the incident playbook for when connections to this database fail.
- Encryption at rest with customer-managed keys and double encryption — the at-rest layer beneath Always Encrypted.
- Azure Key Vault: secrets, keys, and certificates — where your column master keys and connection secrets should live.
- Private Endpoints and Private DNS at scale — get the privatelink DNS wiring right so public-access-off never breaks connectivity.
- Azure SQL Managed Instance failover groups and link — the managed-instance cousin when you need near-full SQL Server surface area.