AWS KMS in Depth: Multi-Region Keys, Envelope Encryption, Key Policies, and Grants

Most teams treat KMS as a checkbox: tick “encryption at rest,” pick the AWS-managed key, move on. That works until you need cross-Region DR, cross-account data sharing, a key you can prove only one role can use, or 100k decrypts a second on a hot path. At that point KMS stops being a checkbox and becomes an authorization system with a latency budget and a request quota. This guide treats it that way — the key types, how envelope encryption actually moves bytes, multi-Region keys, the three-layer authorization model, and the operational edges (rotation, quotas, audit) that bite at scale.

The single most important fact, the one every later decision falls out of: your plaintext almost never goes to KMS, and KMS key material never leaves KMS. A KMS key (formerly called a customer master key, or CMK; this guide keeps CMK as shorthand for a customer-managed key) is a logical reference to material that lives inside FIPS 140-3 validated HSMs. You cannot export it. What you can do is ask KMS to wrap and unwrap small blobs, and the standard pattern is to have KMS wrap a data key that you then use locally to encrypt the actual payload. That is envelope encryption.

By the end you will stop guessing about encryption design. When someone asks “can the DR Region read this?”, “who can actually call Decrypt on that key?”, or “why is the KMS bill suddenly four figures?”, you will know the mechanism, the exact CLI to confirm it, and the fix. Because this is a reference you will return to mid-incident, the key types, the conditions, the limits, the errors and the cost levers are all laid out as scannable tables — read the prose once, then keep the tables open when the pager goes off.

What problem this solves

Encryption-at-rest is easy to enable and hard to get right the moment your data or your blast radius crosses a boundary. The defaults — AWS-managed keys, a Region-locked CMK, no encryption context, a wide-open key policy delegated to IAM and never tightened — work in a single account, in a single Region, with one team, until exactly one of those assumptions breaks. Then you discover that an AWS-managed key cannot be shared cross-account, that a Region-scoped ciphertext is unreadable in your DR Region even though the bytes replicated fine, or that a hot path calling GenerateDataKey per object is throttling and running up a four-figure monthly bill.

What breaks without this knowledge: a team enables a DynamoDB global table, replicates client-side-encrypted PAN data to a second Region, runs a failover game-day, and cannot decrypt a single replicated record — the worst kind of DR failure, because it looks healthy until you cut over. Or an auditor asks “prove only the payments role can decrypt cardholder data,” and the answer is a key policy delegated to account root with no kms:ViaService or EncryptionContext constraint, so the honest answer is “anyone in the account with the right IAM can.” Or a launch melts under ThrottlingException because nobody enabled S3 Bucket Keys or raised the request-rate quota ahead of time.

Who hits this: any team with cross-Region DR of encrypted data, cross-account data sharing, a compliance regime that demands provable key isolation, or a high-throughput encryption path. It bites hardest on active-active applications with global tables (multi-Region key portability), regulated workloads (encryption context and least-privilege key policies), and anything that calls KMS per object (quota and cost). The fix is almost never “turn on a bigger key” — it is “make the key portable where ciphertext travels, scope authorization to the exact caller and context, and architect the call volume down.”

To frame the whole field before the deep dive, here is every problem class this article covers, the question it forces, and where to look first:

Problem class	What is actually wrong	First question to ask	Where to confirm	Most common single cause
Replicated-but-unreadable	DR Region can’t decrypt replicated ciphertext	Did the key travel, or only the bytes?	`describe-key` → `MultiRegion`	Single-Region CMK behind a global table
Locked-out key (AccessDenied)	Decrypt denied despite IAM allowing it	Does the key policy delegate to IAM?	`get-key-policy` → `EnableIAMRoot`	Key policy with no IAM delegation
Throttling / surprise bill	`ThrottlingException`, four-figure KMS spend	Is it one KMS call per object?	CloudTrail event count; Service Quotas	Per-object `GenerateDataKey`, no Bucket Keys
Cross-account read fails closed	Foreign principal can’t decrypt shared data	Are both sides (key policy + IAM) aligned?	`get-key-policy` + consumer IAM	Only one side of the handshake granted
Context / audit gap	Unexpected `Decrypt`, weak provable scope	Is the key pinned to service + context?	CloudTrail `Decrypt` events	No `kms:ViaService` / `EncryptionContext`
Rotation that didn’t re-wrap	Compliance expects fresh material on old data	Did rotation re-encrypt, or just rotate?	`get-key-rotation-status` + design review	Assuming rotation re-wraps stored data

Learning objectives

By the end of this article you can:

Explain why KMS never encrypts bulk data and design envelope encryption with data keys so plaintext and key material both stay where they belong.
Choose the right key type (symmetric, asymmetric, HMAC, multi-Region) and management model (AWS-owned, AWS-managed, customer-managed) deliberately, knowing which choices are immutable one-way doors.
Architect multi-Region keys for DR and active-active, scope each replica’s policy independently, and run a ReEncrypt backfill of pre-existing ciphertext.
Reason fluently about the three-layer authorization model (key policy vs IAM vs grants) and know why a KMS key policy that omits IAM delegation causes IAM policies to be ignored.
Build cross-account encrypted sharing (S3, EBS snapshots) as a two-sided handshake and explain why missing either side fails closed.
Use encryption context, kms:ViaService, aws:PrincipalOrgID and ABAC conditions to pin a key to an exact caller, service, and context — and prove it from CloudTrail.
Manage rotation, request quotas, and cost: enable automatic rotation, use S3 Bucket Keys and data-key caching to crush per-object calls, and raise the request-rate quota before a launch.

Prerequisites & where this fits

You should be comfortable with IAM policy evaluation (identity vs resource policies, explicit deny wins, condition keys), basic AWS CLI (--query, JSON output, fileb://), and the idea of at-rest vs in-transit encryption. You should know what an ARN is and how cross-account access works in principle. Familiarity with AES-GCM and the words “symmetric” and “authenticated encryption” helps but isn’t required — the article defines what it needs.

This sits in the Security & Cryptography track. It assumes the identity fundamentals from AWS IAM Fundamentals: Users, Roles, Policies & the Evaluation Engine and the least-privilege patterns in IAM Least Privilege: Permission Boundaries & Inescapable Ceilings, because a key policy is just a resource policy and the three-layer model is IAM evaluation with the key policy as the root of trust. It is upstream of every storage deep-dive: S3 storage classes, versioning, lifecycle & encryption, EBS, EFS & FSx, and RDS & Aurora all consume KMS for SSE. It pairs with Secrets Manager & Parameter Store (both wrap their secrets under a CMK), CloudTrail, Config & audit (where every Decrypt lands), and at org scale with Organizations: SCPs, guardrails & delegated admin and Resource Control Policies & the data perimeter.

A quick map of who owns which layer during an encryption design or incident, so you call the right person fast:

Layer	What lives here	Who usually owns it	Failure classes it can cause
Application crypto	Data keys, AAD, the Encryption SDK	App / dev team	Wrong encryption context, unbounded cache, plaintext leak
Key policy	Resource policy on the CMK (root of trust)	Security / platform	Lockout, over-broad access, missing cross-account grant
IAM	Identity policies referencing KMS actions	Platform + app	Decrypt denied (key policy didn’t delegate), wrong key ARN
Grants	Programmatic, temporary delegations	Platform + AWS services	Orphaned grants, service can’t mint data keys
Multi-Region / DR	Primary + replicas, ReEncrypt backfill	Platform / DR owner	Replicated-but-unreadable, divergent replica policy
Quota & cost	Request-rate quota, Bucket Keys, caching	Platform / FinOps	Throttling, surprise bill, throttled launch

Core concepts

Six mental models make every later decision obvious.

KMS is a wrapping and authorization service, not a bulk cipher. Two API families anchor the whole service. Encrypt and Decrypt handle small payloads directly: Encrypt accepts at most 4 KB (4,096 bytes) of plaintext and KMS does the crypto itself, which is fine for small secrets and wrong for large objects. GenerateDataKey mints a fresh symmetric data key and returns it to you both in plaintext and wrapped under your KMS key; you encrypt your gigabytes locally with the plaintext copy, throw that copy away, and store only the ciphertext payload plus the wrapped blob. Every design decision (quotas, caching, multi-Region portability) falls out of “KMS protects the key that protects the data.”

The key policy is the root of trust, not IAM. Unlike S3, where an IAM policy alone can grant access, a KMS key policy that does not delegate to IAM means IAM policies are ignored. The key policy is authoritative. This single fact is responsible for most KMS lockouts and most “why is my Decrypt denied when IAM clearly allows it” tickets.

KeySpec and KeyUsage are immutable. You choose symmetric vs asymmetric vs HMAC, and encrypt-vs-sign, at creation, and you can never change them. Picking wrong means creating a new key and re-encrypting. Treat key creation as a one-way door.

A normal key is Region-locked; ciphertext is portable only with a multi-Region key. Ciphertext produced in eu-west-1 can only be decrypted by calling KMS in eu-west-1. If that Region is down, the data is unreadable even though the bytes replicated elsewhere. Multi-Region keys share the same key material across Regions so an envelope decrypts in any of them — a deliberate availability/isolation trade-off you opt into, not a default.

Encryption context is authenticated, logged, additional data, and it is your cheapest authorization and audit tool. It is not secret and not encrypted, but it is bound to the ciphertext, required byte-for-byte at decrypt time, recorded in CloudTrail, and constrainable in policy with kms:EncryptionContext:<key> conditions. It is the difference between “this role can decrypt anything” and “this role can decrypt only tenant=acme invoices.”

Throughput is bounded by a shared, Region-level request-rate quota. Symmetric Decrypt/GenerateDataKey/Encrypt calls draw from one shared quota per account in each Region (on the order of thousands to tens of thousands of requests/second, depending on Region; check Service Quotas for your exact figure). A hot path that calls KMS per object hits ThrottlingException long before you expect. The architecture answer is to call KMS less (Bucket Keys, data-key caching), not just to retry harder.
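
To make the quota arithmetic concrete, here is a minimal back-of-envelope sketch. The quota figure and the per-request price below are placeholder inputs (assumptions, not values taken from this article); read the real ones from Service Quotas and the KMS pricing page for your Region before trusting the result.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for rough estimates only


def kms_request_rate(objects_per_second: float, kms_calls_per_object: float = 1.0) -> float:
    """Steady-state KMS requests per second for a path that calls KMS per object."""
    return objects_per_second * kms_calls_per_object


def quota_headroom(request_rate: float, quota_rps: float) -> float:
    """Fraction of the shared quota left unused; zero or below means throttling."""
    return 1.0 - request_rate / quota_rps


def monthly_request_cost(request_rate: float, usd_per_10k_requests: float) -> float:
    """Rough monthly request charge for a constant request rate."""
    return request_rate * SECONDS_PER_MONTH / 10_000 * usd_per_10k_requests


# Hypothetical workload: 2,000 objects/s, one GenerateDataKey call each.
rate = kms_request_rate(2_000)
# quota_rps and usd_per_10k_requests are ASSUMED placeholder values.
print(f"{rate:.0f} req/s, headroom {quota_headroom(rate, quota_rps=10_000):.0%}")
print(f"~${monthly_request_cost(rate, usd_per_10k_requests=0.03):,.0f}/month")
```

Rerunning the same numbers with a smaller kms_calls_per_object (what Bucket Keys or data-key caching buys you) shows why reducing call volume beats retrying harder.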

The vocabulary in one table

Before the deep sections, pin every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

Concept	One-line definition	Where it lives	Why it matters
KMS key (CMK)	Logical reference to HSM-resident key material	KMS, in-Region	The thing you authorize; never exported
Data key	Symmetric key minted by `GenerateDataKey`	In your app (plaintext) + storage (wrapped)	Does the actual bulk encryption
Envelope encryption	Encrypt data with a data key, wrap the data key with the CMK	App + storage	The pattern behind almost everything
Key policy	Resource policy on the CMK	On the key	Root of trust; authoritative over IAM
Grant	Programmatic, temporary delegation	On the key	How AWS services and short-lived jobs get access
Encryption context	Authenticated additional data (AAD)	Bound to ciphertext, logged	Cheap authorization + audit binding
Multi-Region key (MRK)	Primary + replicas sharing key material	Multiple Regions (`mrk-...`)	Cross-Region ciphertext portability for DR
Alias	Mutable, Region-scoped friendly pointer	In-Region	Human-friendly key reference; repointable
`kms:ViaService`	Condition pinning a key to one AWS service	Key/IAM policy condition	“Only through S3,” not a human running decrypt
Request-rate quota	Shared per-Region cryptographic-ops cap	Region	The throttling ceiling you architect around
S3 Bucket Keys	Bucket-level data key caching for SSE-KMS	S3 bucket setting	Collapses per-object KMS calls dramatically
`ReEncrypt`	Swap the wrapping key without exposing plaintext	KMS API	Re-wraps stored ciphertext during migrations

1. Key types: pick the right primitive

KMS keys are not interchangeable. The KeySpec and KeyUsage are immutable at creation, so this is a one-way door. Choose the primitive for the job, then never look back.

Type	KeySpec	KeyUsage	Use it for	API verbs	Notes / limit
Symmetric	`SYMMETRIC_DEFAULT` (AES-256-GCM)	`ENCRYPT_DECRYPT`	Envelope encryption; default for S3/EBS/RDS/Secrets Manager	`Encrypt`, `Decrypt`, `GenerateDataKey*`	Never leaves KMS; 4 KB direct limit; the workhorse
Asymmetric (encrypt)	`RSA_2048/3072/4096`	`ENCRYPT_DECRYPT`	Encrypt where the encryptor has no AWS creds	`Encrypt`, `Decrypt` (+ public key)	Public key downloadable; no `GenerateDataKey`; small payloads
Asymmetric (sign)	`ECC_NIST_P256/384/521`, `ECC_SECG_P256K1`, `RSA_*`	`SIGN_VERIFY`	Code/document signing, external verification	`Sign`, `Verify` (+ public key)	Verifier may be outside AWS; pick curve per standard
Key agreement	`ECC_NIST_*`, `SM2` (China)	`KEY_AGREEMENT`	Derive a shared secret (ECDH)	`DeriveSharedSecret`	Niche; for negotiated session keys
HMAC	`HMAC_224/256/384/512`	`GENERATE_VERIFY_MAC`	MACs, signed tokens, deterministic integrity	`GenerateMac`, `VerifyMac`	Symmetric secret; never exported; no encrypt
Multi-Region	any above + `MultiRegion: true`	per spec	DR, global tables, cross-Region ciphertext portability	per spec + `ReplicateKey`	Shares material across Regions; `mrk-` id prefix

Orthogonal to spec is who manages the key. This choice is not immutable, but migrating between models means re-encryption, so decide deliberately:

Management model	Who controls policy	Rotation	Cross-account?	Visible in your account?	Cost	When it’s acceptable
AWS-owned	AWS (invisible)	AWS-managed	No	No	Free	Zero audit/access-control requirement
AWS-managed (`aws/s3`, `aws/ebs`…)	AWS (you can’t edit)	Auto, yearly	No	Yes	Free key; per-request charges	Single-account, single-Region, no policy edits
Customer-managed (CMK)	You (full policy)	Optional, configurable	Yes	Yes	~$1/key/month + requests	Anything that needs policy, sharing, or proof

The takeaway: the moment you need an editable policy, cross-account sharing, custom rotation, grants, or independent deletion, you are forced to a customer-managed key. Everything below assumes CMKs. The decision rule in one table:

If you need…	AWS-owned	AWS-managed	Customer-managed
Edit the key policy	No	No	Yes
Share ciphertext cross-account	No	No	Yes
Cross-Region DR portability (MRK)	No	No	Yes
Custom rotation cadence	No	No (yearly only)	Yes
Grants for services/short-lived jobs	No	Limited	Yes
See it / audit its policy	No	Yes	Yes
Pay nothing for the key	Yes	Yes	No (~$1/mo)

# A customer-managed symmetric key, with rotation on from day one
aws kms create-key \
  --description "app-prod data-at-rest" \
  --key-spec SYMMETRIC_DEFAULT \
  --key-usage ENCRYPT_DECRYPT \
  --tags TagKey=env,TagValue=prod TagKey=app,TagValue=payments

# create-key leaves automatic rotation off; enable it explicitly
aws kms enable-key-rotation \
  --key-id <key-id>

# Give it a human-friendly alias (aliases are Region-scoped, mutable pointers)
aws kms create-alias \
  --alias-name alias/payments-prod \
  --target-key-id <key-id>

# Terraform equivalent: key + alias + rotation in one place, reviewed in a PR
resource "aws_kms_key" "payments" {
  description              = "app-prod data-at-rest"
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  enable_key_rotation      = true
  rotation_period_in_days  = 180
  deletion_window_in_days  = 30
  tags                     = { env = "prod", app = "payments" }
}

resource "aws_kms_alias" "payments" {
  name          = "alias/payments-prod"
  target_key_id = aws_kms_key.payments.key_id
}

Aliases deserve their own note, because they are the moving part teams misuse. An alias is a Region-scoped, mutable pointer to a key — alias/payments-prod resolves to whatever key it currently targets. That makes manual rotation (repoint the alias to a new key) trivial, but it also means an alias is not a stable identity for audit. The alias quirks worth knowing:

Alias behaviour	Detail	Gotcha
Scope	One Region only	The same name in another Region is a different pointer
Mutability	`update-alias` repoints it instantly	A typo’d repoint silently sends encrypt to the wrong key
`alias/aws/` prefix	Reserved for AWS-managed keys	You cannot create an alias whose name begins with `alias/aws/`
In CloudTrail	The event's `resources` field records the key ARN; the alias shows up only if the caller passed it as the key identifier	Audit on key ID, never on alias name
Multi-Region	Use the same alias in each Region pointing at the local MRK	Convention, not enforced — keep it disciplined
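
The naming rules above are easy to check before a deploy. The following is a minimal sketch, not an official validator: the prefix rules come from the table, while the character set and the 256-character cap are stated here as assumptions based on the CreateAlias reference and should be confirmed there. The function name alias_problems is hypothetical.

```python
import re

# ASSUMPTION: allowed characters and length limit as recalled from the
# CreateAlias API reference; confirm before relying on them.
_ALIAS_CHARS = re.compile(r"^[A-Za-z0-9/_-]+$")


def alias_problems(name: str) -> list:
    """Return a list of human-readable problems; an empty list means the name looks fine."""
    problems = []
    if not name.startswith("alias/"):
        problems.append("must start with 'alias/'")
    if name.startswith("alias/aws/"):
        problems.append("'alias/aws/' is reserved for AWS managed keys")
    if not 1 <= len(name) <= 256:
        problems.append("length must be 1-256 characters")
    if not _ALIAS_CHARS.match(name):
        problems.append("only letters, digits, '/', '_' and '-' are allowed")
    return problems


for candidate in ("alias/payments-prod", "alias/aws/s3", "payments-prod"):
    print(candidate, "->", alias_problems(candidate) or "ok")
```

A check like this catches a bad name in CI; it does not catch a typo'd repoint, which still needs a review of the target key ID.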

2. Envelope encryption: data keys and the SDK

For anything larger than 4 KB, you encrypt locally with a data key. The raw flow, before any SDK:

# 1. Mint a data key: plaintext + ciphertext (wrapped) come back together
aws kms generate-data-key \
  --key-id alias/payments-prod \
  --key-spec AES_256 \
  --query '{plaintext:Plaintext, wrapped:CiphertextBlob}' \
  --output json
# 2. Encrypt the payload locally with `plaintext` (AES-256-GCM in your app)
# 3. Persist the ciphertext payload + `wrapped` blob; ZERO the plaintext key in memory
# 4. To read: Decrypt(wrapped) -> plaintext key -> decrypt payload locally
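
The CLI returns both fields base64-encoded, so the first thing an application does is decode them. A minimal sketch of that step follows, assuming the JSON shape produced by the --query expression above (keys named plaintext and wrapped); the helper name and the sample values are hypothetical, and the sample is fabricated rather than a real key or blob.

```python
import base64
import json
from typing import Tuple


def parse_data_key_output(cli_json: str) -> Tuple[bytes, bytes]:
    """Decode the base64 fields from the generate-data-key output shown above."""
    doc = json.loads(cli_json)
    plaintext_key = base64.b64decode(doc["plaintext"])
    wrapped_key = base64.b64decode(doc["wrapped"])
    # AES_256 data keys are 32 bytes; anything else means the wrong key spec.
    if len(plaintext_key) != 32:
        raise ValueError(f"expected a 32-byte AES-256 key, got {len(plaintext_key)} bytes")
    return plaintext_key, wrapped_key


# Fabricated sample shaped like the command's output (not a real key or blob).
sample = json.dumps({
    "plaintext": base64.b64encode(bytes(32)).decode(),
    "wrapped": base64.b64encode(b"opaque-wrapped-blob").decode(),
})
key, wrapped = parse_data_key_output(sample)
print(len(key), "byte data key;", len(wrapped), "byte wrapped blob to persist")
```

Only the wrapped blob is safe to store next to the ciphertext; the decoded plaintext key should live in memory for as short a time as the encryption takes.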

The data-key API has variants, and picking the wrong one is a common slip — GenerateDataKeyWithoutPlaintext exists precisely for the case where the minting service shouldn’t see the key (it will be decrypted later, elsewhere):

API	Returns	Use it when	Counts against quota
`GenerateDataKey`	Plaintext + wrapped key	You encrypt now, in this process	Yes (1 op)
`GenerateDataKeyWithoutPlaintext`	Wrapped key only	A different component will decrypt later	Yes (1 op)
`GenerateDataKeyPair`	Plaintext private key + wrapped private key + public key	Asymmetric envelope (sign/encrypt offline)	Yes (heavier op)
`GenerateDataKeyPairWithoutPlaintext`	Wrapped private + public key	Mint for later asymmetric use	Yes
`Encrypt` (direct)	Ciphertext (≤4 KB plaintext)	Tiny secret, no envelope needed	Yes
`Decrypt`	Plaintext (≤4 KB)	Unwrap a data key or tiny secret	Yes
`ReEncrypt`	Ciphertext under a new key	Migrate wrapping key without plaintext	Yes (decrypt + encrypt)

Rolling your own framing (IV, AAD, key blob, algorithm tags) is where teams introduce vulnerabilities. Use the AWS Encryption SDK — it produces a portable, self-describing message format that bundles the wrapped data key with the ciphertext, handles authenticated encryption, and supports multiple wrapping keys:

import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy

client = aws_encryption_sdk.EncryptionSDKClient(
    commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
)
key_provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
    key_ids=["arn:aws:kms:eu-west-1:111122223333:key/<key-id>"]
)

plaintext_bytes = b"example invoice payload"  # placeholder input for illustration

ciphertext, header = client.encrypt(
    source=plaintext_bytes,
    key_provider=key_provider,
    # Encryption context is AAD: authenticated, logged in CloudTrail, NOT secret
    encryption_context={"tenant": "acme", "purpose": "invoice"},
)

Two things matter at the design level. Key commitment (REQUIRE_ENCRYPT_REQUIRE_DECRYPT, the 2.x+ default) prevents a class of attacks where one ciphertext decrypts to different plaintexts under different keys; do not lower it to interop with ancient clients unless you understand exactly what you give up. And encryption context is your cheapest authorization and audit tool: additional authenticated data that is not encrypted but is bound to the ciphertext, required byte-for-byte at decrypt, logged in CloudTrail, and constrainable in policy. The Encryption SDK options worth knowing:

SDK concept	What it controls	Default (v2+)	When to change
Commitment policy	Whether key commitment is required	`REQUIRE_ENCRYPT_REQUIRE_DECRYPT`	Only to read legacy v1 ciphertext (temporarily)
Algorithm suite	Cipher + signing + commitment	AES-256-GCM + HKDF + ECDSA + commit	Drop signing only if you understand the trade
Key provider	Which CMK(s) wrap the data key	`StrictAwsKmsMasterKeyProvider`	Multi-CMK (multi-Region/multi-account) decrypt
Encryption context	The AAD map bound to ciphertext	empty	Always set it — it’s free authorization + audit
Caching CMM	Reuse data keys across messages	off	High-throughput app encryption (see below)
Discovery provider	Decrypt with any CMK in an account/Region	off (strict)	Multi-Region decrypt where key ARN varies

What encryption context is and isn’t trips up almost everyone the first time. The boundaries:

Encryption context…	Is	Is NOT
Secrecy	Authenticated (integrity-bound)	Encrypted / secret — it’s plaintext in logs
Requirement at decrypt	Required byte-for-byte	Optional metadata
Order sensitivity	Order-independent (it’s a map)	A positional list
Policy use	Constrainable via `kms:EncryptionContext:<k>`	A substitute for the key policy
Good values	`tenant`, `purpose`, `table`, `pk`	Anything secret (passwords, PII, the data itself)
Audit	Logged in CloudTrail on every call	Hidden — assume it’s visible
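
The boundaries in the table can be expressed as two small checks. This is a sketch of the matching rule only (exact, case-sensitive pairs; order irrelevant), not of anything KMS runs internally; the denylist of secret-looking keys is a hypothetical lint, not an AWS feature.

```python
# Hypothetical denylist: keys that suggest someone is putting secrets in the context.
SECRET_LOOKING_KEYS = {"password", "secret", "token", "ssn", "pan"}


def context_problems(context: dict) -> list:
    """Lint an encryption context before use; an empty list means no findings."""
    problems = []
    for key, value in context.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
        elif key.lower() in SECRET_LOOKING_KEYS:
            problems.append(f"{key!r}: looks secret; context is logged in plaintext")
    return problems


def context_matches(encrypt_context: dict, decrypt_context: dict) -> bool:
    """Exact, case-sensitive match on every pair; pair order is irrelevant."""
    return encrypt_context == decrypt_context


written = {"tenant": "acme", "purpose": "invoice"}
print(context_matches(written, {"purpose": "invoice", "tenant": "acme"}))  # reordered pairs
print(context_matches(written, {"tenant": "Acme", "purpose": "invoice"}))  # case differs
print(context_matches(written, {"tenant": "acme"}))                        # pair missing
```

The practical consequence is that the reader must be able to reconstruct the context from data it already has (tenant ID, table name), which is another reason to keep it small and non-secret.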

Caching: trading blast radius for throughput

Calling GenerateDataKey per object is correct but expensive — every write becomes a KMS request against your quota. The data key caching layer in the Encryption SDK reuses a data key across many messages, bounded by max_age, max_messages_encrypted, and max_bytes_encrypted:

from aws_encryption_sdk.caches.local import LocalCryptoMaterialsCache
from aws_encryption_sdk.materials_managers.caching import CachingCryptoMaterialsManager

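cache = LocalCryptoMaterialsCache(capacity=1000)
cmm = CachingCryptoMaterialsManager(
    master_key_provider=key_provider,
    cache=cache,
    max_age=300.0,                # rotate the cached data key every 5 minutes
    max_messages_encrypted=1000,  # ...or after 1000 messages, whichever first
)

# Pass the caching CMM in place of the key provider when encrypting
ciphertext, header = client.encrypt(source=plaintext_bytes, materials_manager=cmm)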

The trade-off is explicit: a larger cache and longer max_age mean fewer KMS calls (cheaper, faster, quota-friendly) but a wider blast radius per data key and weaker cryptographic isolation between messages. Tune it; never leave it unbounded. The knobs and how to reason about each:

Cache parameter	What it bounds	Lower value	Higher value	Sensible starting point
`max_age`	Wall-clock lifetime of a cached data key	More KMS calls, tighter isolation	Fewer calls, wider blast radius	60–300 s
`max_messages_encrypted`	Messages per cached data key	Tighter isolation	Fewer calls	100–1000
`max_bytes_encrypted`	Bytes per cached data key	Tighter isolation	Fewer calls	Stay well under cipher limits
`capacity`	Distinct cache entries (by context)	More cache misses	More memory	100–1000
Per-context keying	Separate keys per encryption context	Stronger tenant isolation	More entries	Always key on `tenant`/`purpose`
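
To reason about these knobs before touching production, it helps to estimate how many data keys (and so how many GenerateDataKey calls) a workload needs. The sketch below assumes uniform traffic, a single encryption context, and no byte limit; those are simplifying assumptions, and the function name is hypothetical.

```python
import math


def estimated_data_keys(messages: int, duration_s: float,
                        max_messages_encrypted: int, max_age: float) -> int:
    """Estimate GenerateDataKey calls for one cache entry under uniform traffic."""
    if messages <= 0:
        return 0
    by_count = math.ceil(messages / max_messages_encrypted)  # keys retired by message count
    by_age = math.ceil(duration_s / max_age)                 # keys retired by wall-clock age
    # Whichever bound trips first dominates; never more keys than messages.
    return min(messages, max(by_count, by_age))


# Hypothetical hour of traffic: one million messages with the settings shown above.
calls = estimated_data_keys(1_000_000, 3600, max_messages_encrypted=1000, max_age=300.0)
print(f"{calls} GenerateDataKey calls with caching vs 1000000 without")
```

For low-volume paths the age bound dominates, so a long max_age buys very little there while still widening the blast radius.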

3. Multi-Region keys: portability for DR

A normal KMS key is Region-locked: ciphertext produced in eu-west-1 can only be decrypted by calling KMS in eu-west-1. If that Region is down, your data is unreadable even though the bytes are safely replicated elsewhere. Multi-Region keys (MRKs) fix this. A primary and its replicas share the same key material and the same key ID (mrk-...), differing only in the Region part of the ARN, so ciphertext encrypted under the primary decrypts under any replica with no re-encryption.

# Create the primary as multi-region
aws kms create-key --multi-region \
  --description "global-table encryption primary" \
  --region eu-west-1

# Replicate it into a DR Region (same material, independent policy)
aws kms replicate-key \
  --key-id mrk-1234567890abcdef0 \
  --replica-region us-east-1 \
  --description "global-table encryption replica" \
  --region eu-west-1   # call the primary's Region

# Terraform: a primary MRK and a replica with its OWN, narrower policy
resource "aws_kms_key" "mrk_primary" {
  provider                 = aws.euwest1
  description              = "global-table encryption primary"
  multi_region             = true
  enable_key_rotation      = true
}

resource "aws_kms_replica_key" "mrk_replica" {
  provider                = aws.useast1
  description             = "global-table encryption replica (decrypt-only app)"
  primary_key_arn         = aws_kms_key.mrk_primary.arn
  policy                  = data.aws_iam_policy_document.replica_decrypt_only.json
}

The nuances that catch people are exactly the things that make MRKs powerful and dangerous:

MRK property	What it means	Implication
Shared key material	Primary and replicas hold identical material	Ciphertext is portable; isolation is weaker by design
Independent policies	Each replica has its own key policy, grants, tags	DR Region can be decrypt-only while primary writes
Shared rotation setting	Rotation is enabled or disabled on the primary only; replicas follow it	Old ciphertext stays decryptable everywhere
`mrk-` key ID prefix	Same key ID in every Region	Same logical key; ARNs differ only by Region
Deletion protection	Can’t delete the primary while replicas exist	KMS prevents orphaning material
Not auto-created	You explicitly `replicate-key` per Region	No “automatic” global key — opt in per Region
Rotation propagation	Auto-rotation on the primary propagates to replicas	Don’t separately rotate replicas

The single most important caveat, learned the hard way in the scenario below: adopting an MRK does nothing for data already wrapped under a single-Region key. Existing envelopes are bound to the old key in the old Region. Making them portable requires a ReEncrypt backfill. The single-Region vs multi-Region decision, distilled:

If…	Choose	Why
Ciphertext never leaves its Region	Single-Region CMK	Stronger isolation; no shared material
Active-active app, both Regions read/write	Multi-Region key	Either Region decrypts any envelope
Cross-Region replication of client-side-encrypted data (DynamoDB global tables, S3)	Multi-Region key	Replication moves bytes, not the key
Cross-Region copy of encrypted EBS snapshots for DR	Multi-Region key (or re-encrypt on copy)	Avoid an unreadable DR copy
Strict per-Region key isolation is a compliance requirement	Single-Region CMK	Material must not be duplicated
You “might need it someday”	Single-Region CMK	Don’t weaken isolation speculatively
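
The ReEncrypt backfill mentioned above can be sketched as a loop over stored wrapped data keys. This is an illustration under stated assumptions, not a production job: it assumes a boto3-shaped KMS client (only its re_encrypt call is used), a hypothetical record layout where the wrapped data key is stored separately from the payload, and a destination key reachable from the Region where the call is made. Messages in the Encryption SDK format embed the wrapped key and would need a different approach.

```python
from typing import Iterable, MutableMapping


def backfill_wrapped_keys(kms_client, records: Iterable[MutableMapping],
                          destination_key_arn: str) -> int:
    """Re-wrap each record's data key under the destination key; returns the count re-wrapped.

    Each record is a dict with 'wrapped_key' (bytes) and 'context' (dict);
    this layout is hypothetical. The encrypted payload itself is never touched.
    """
    rewrapped = 0
    for record in records:
        if record.get("wrapping_key_arn") == destination_key_arn:
            continue  # already migrated; makes the job safe to re-run
        response = kms_client.re_encrypt(
            CiphertextBlob=record["wrapped_key"],
            SourceEncryptionContext=record["context"],
            DestinationKeyId=destination_key_arn,
            DestinationEncryptionContext=record["context"],
        )
        record["wrapped_key"] = response["CiphertextBlob"]
        record["wrapping_key_arn"] = response["KeyId"]
        rewrapped += 1
    return rewrapped
```

Because only the small wrapped blob is re-wrapped, the job's cost scales with the number of data keys, not the volume of data, and KMS never returns the plaintext data key to the caller.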

4. Key policies vs IAM vs grants: the authorization model

This is where KMS differs sharply from most AWS services and where the dangerous mistakes live. Three layers decide whether a Decrypt succeeds, and the order of trust is not what IAM-first intuition expects:

The key policy — the resource policy on the key. It is the root of trust. Unlike S3, where IAM alone can grant access, a KMS key policy that does not delegate to IAM means IAM policies are ignored. The key policy is authoritative.
IAM policies — only effective if the key policy enables IAM (the canonical kms:* to the account root statement). With that statement present, IAM grants behave normally.
Grants — programmatic, temporary, fine-grained delegations, ideal for AWS services and short-lived workloads.

The three layers side by side, because choosing the wrong instrument is the most common architectural error:

Mechanism	Granularity	Lifetime	Who edits it	Best for	Limit / gotcha
Key policy	Per principal + condition	Until you edit it	Key admins	The authoritative allow-list; IAM delegation	Omit IAM delegation → IAM ignored → lockout risk
IAM policy	Per principal, per action	Until you edit it	IAM admins	Account-internal access at scale	Only works if key policy delegates to IAM
Grant	Per grantee + operations + constraints	Until retired/revoked	Anyone with `CreateGrant`	AWS services, short-lived jobs	~50k grants/key; can be orphaned; eventually consistent
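
The interaction of the three layers can be captured in a few lines. The following is a deliberately simplified, same-account model for building intuition; it ignores SCPs, permission boundaries, VPC endpoint policies, and condition evaluation, and the function and parameter names are hypothetical.

```python
def decrypt_allowed(*, explicit_deny: bool, key_policy_allows: bool,
                    key_policy_delegates_to_iam: bool, iam_allows: bool,
                    grant_allows: bool) -> bool:
    """Simplified same-account decision for one principal calling Decrypt."""
    if explicit_deny:
        return False  # an explicit deny anywhere wins
    if key_policy_allows:
        return True   # the key policy names the principal directly
    if grant_allows:
        return True   # a grant covers this operation
    # IAM only counts when the key policy hands the decision to IAM.
    return key_policy_delegates_to_iam and iam_allows


# The classic ticket: IAM clearly allows Decrypt, but the key policy never delegated.
print(decrypt_allowed(explicit_deny=False, key_policy_allows=False,
                      key_policy_delegates_to_iam=False, iam_allows=True,
                      grant_allows=False))
```

Flipping key_policy_delegates_to_iam to True in that call is the whole difference between a locked-out role and a working one.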

The minimum sane key policy delegates administration and usage to IAM rather than hard-coding every principal:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableIAMRoot",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "kms:*",
      "Resource": "*"
    },
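    {
      "Sid": "KeyUsers",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
      "Resource": "*",
      "Condition": {
        "StringEquals": { "kms:EncryptionContext:tenant": "acme" }
      }
    },
    {
      "Sid": "KeyUsersDescribe",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
      "Action": "kms:DescribeKey",
      "Resource": "*"
    }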
  ]
}

The EnableIAMRoot statement is not a backdoor to root credentials — it delegates the decision to IAM in this account. Omit it and you must enumerate every principal in the key policy forever, including the admins who could otherwise fix a lockout. That is the classic way teams brick a key.
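
Whether a key delegates to IAM can be checked mechanically from the policy document. The sketch below is a conservative approximation: it looks for unconditioned Allow statements whose principal is the account (root ARN or bare account ID) and reports the actions they delegate. It handles only the aws partition and skips conditioned statements; the function name is hypothetical.

```python
import json


def _as_list(value):
    """IAM JSON allows a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def iam_delegated_actions(policy_json: str, account_id: str) -> set:
    """Actions the key policy hands to IAM in this account (empty set: IAM is ignored)."""
    account_principals = {account_id, f"arn:aws:iam::{account_id}:root"}
    delegated = set()
    for statement in _as_list(json.loads(policy_json)["Statement"]):
        if statement.get("Effect") != "Allow" or "Condition" in statement:
            continue  # conservative: conditioned statements are not counted
        principal = statement.get("Principal", {})
        aws_principals = _as_list(principal.get("AWS", [])) if isinstance(principal, dict) else []
        if account_principals & set(aws_principals):
            delegated.update(_as_list(statement.get("Action", [])))
    return delegated


example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnableIAMRoot",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "kms:*",
        "Resource": "*",
    }],
})
print(iam_delegated_actions(example_policy, "111122223333"))
```

Feed it the document returned by aws kms get-key-policy --key-id <key-id> --policy-name default --query Policy --output text; an empty result on a production key is a lockout waiting to happen.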

The KMS actions you will actually write into policies, grouped by what they let a principal do — least privilege means granting only the row you need:

Action	What it permits	Typical grantee	Risk if over-granted
`kms:Encrypt`	Encrypt ≤4 KB directly	Apps writing tiny secrets	Low (encrypt-only is benign)
`kms:Decrypt`	Unwrap data keys / decrypt	Read paths	High — this is “read the data”
`kms:GenerateDataKey*`	Mint data keys for envelope encryption	Write paths	Medium (enables new encryption)
`kms:ReEncrypt*`	Re-wrap ciphertext under another key	Migration jobs	Medium (can move data between keys)
`kms:DescribeKey`	Read key metadata	Almost everything	Low
`kms:CreateGrant`	Delegate to another principal	Services, admins	High — can broaden access
`kms:PutKeyPolicy`	Replace the key policy	Key admins only	Critical — can grant anyone
`kms:ScheduleKeyDeletion`	Schedule destruction of the key	Break-glass only	Critical — can destroy data
`kms:Sign` / `kms:Verify`	Asymmetric sign/verify	Signing services	Medium
`kms:GenerateMac` / `kms:VerifyMac`	HMAC operations	Token services	Medium

When to reach for grants

Grants shine where a static policy is wrong. They are issued via API, carry their own constraints, and can be retired the instant the work is done:

aws kms create-grant \
  --key-id <key-id> \
  --grantee-principal arn:aws:iam::111122223333:role/batch-worker \
  --operations Decrypt GenerateDataKey \
  --constraints EncryptionContextSubset={tenant=acme} \
  --retiring-principal arn:aws:iam::111122223333:role/grant-manager

This is exactly the mechanism AWS services use on your behalf: when you attach a CMK to an Auto Scaling group or an encrypted EBS volume, the service creates a grant so it can mint data keys without you widening the key policy. Prefer grants over key-policy edits for service integrations and transient access — they are revocable and don’t bloat the resource policy. The grant constraints and lifecycle, which trip teams up under eventual consistency:

Grant aspect	Detail	Gotcha
`Operations`	Explicit list (`Decrypt`, `GenerateDataKey`…)	Grant only what the workload needs
`EncryptionContextSubset`	Grantee must include these context pairs	Looser than `Equals`; allows extra context
`EncryptionContextEquals`	Exact context match required	Strictest; use for tight scoping
Retiring principal	Who can retire the grant	Set it, or you can only revoke as admin
`GrantToken`	Returned token bridges eventual consistency	Pass it on immediate use or get transient denies
Revoke vs retire	Admin revokes; grantee/retirer retires	Both remove it; revoke is the admin lever
Limit	~50,000 grants per key	Orphaned service grants accumulate — audit them
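
The same grant, and the GrantToken hand-off from the table, can be sketched from application code. This assumes a boto3-shaped KMS client (only create_grant and decrypt are called, with their documented parameter names); the helper names and arguments are hypothetical.

```python
def create_scoped_grant(kms_client, key_id: str, grantee_arn: str,
                        retiring_arn: str, tenant: str):
    """Create a tenant-scoped grant; returns (grant_id, grant_token)."""
    response = kms_client.create_grant(
        KeyId=key_id,
        GranteePrincipal=grantee_arn,
        RetiringPrincipal=retiring_arn,
        Operations=["Decrypt", "GenerateDataKey"],
        Constraints={"EncryptionContextSubset": {"tenant": tenant}},
    )
    return response["GrantId"], response["GrantToken"]


def decrypt_with_grant_token(kms_client, wrapped_key: bytes, context: dict,
                             grant_token: str) -> bytes:
    """Use the grant immediately by passing its token, before it has propagated."""
    response = kms_client.decrypt(
        CiphertextBlob=wrapped_key,
        EncryptionContext=context,
        GrantTokens=[grant_token],
    )
    return response["Plaintext"]
```

Keep the returned grant ID: it is what the retiring principal needs to retire the grant once the batch job finishes, which is how orphaned grants are avoided.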

5. Cross-account encryption: S3, EBS, and snapshots

Sharing encrypted data across accounts is a two-sided handshake: the key policy in the owning account must allow the foreign principal, and that principal’s IAM policy in their own account must allow the KMS actions. Missing either side fails closed — and the failure is a generic AccessDenied, not a helpful “the other side didn’t grant you.”

Key policy in the owning account (111122223333):

{
  "Sid": "AllowConsumerAccount",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::444455556666:root" },
  "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
  "Resource": "*",
  "Condition": { "StringEquals": { "aws:PrincipalOrgID": "o-exampleorgid" } }
}

IAM policy in the consumer account (444455556666) — note you must name the full key ARN, because the key lives in another account:

{
  "Effect": "Allow",
  "Action": ["kms:Decrypt", "kms:DescribeKey"],
  "Resource": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
}

The two-sided requirement is the whole game. Which side does what:

Side	What it must allow	Reference style	If missing
Owning account (key policy)	The foreign principal/root + actions	`Principal: {AWS: arn:...:444455556666:root}`	`AccessDenied` — owner didn’t share
Consumer account (IAM)	The KMS actions on the full key ARN	`Resource: arn:aws:kms:...:111122223333:key/...`	`AccessDenied` — consumer didn’t grant
Both (recommended)	Scope with `aws:PrincipalOrgID`	Condition on the key policy	Without it, a `*` principal admits any AWS account
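Because the failure is the same `AccessDenied` whichever side is missing, it can help to reason about the two policies offline. The sketch below is a deliberately simplified model, not the real IAM evaluation engine: it ignores conditions, explicit denies, and grants, and `diagnose` plus the example key ID are hypothetical names introduced for illustration.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    return value if isinstance(value, list) else [value]


def _covers(stmt: dict, action: str, resource: str) -> bool:
    """True if an Allow statement matches the action and resource (conditions ignored)."""
    return (
        stmt.get("Effect") == "Allow"
        and any(fnmatchcase(action, p) for p in _as_list(stmt.get("Action", [])))
        and any(fnmatchcase(resource, p) for p in _as_list(stmt.get("Resource", [])))
    )


def diagnose(key_policy, consumer_iam, consumer_account, action, key_arn) -> str:
    """Report which side of the cross-account handshake is missing, if any."""
    root = f"arn:aws:iam::{consumer_account}:root"
    owner_ok = any(
        root in _as_list(s.get("Principal", {}).get("AWS", []))
        and _covers(s, action, key_arn)
        for s in key_policy
    )
    consumer_ok = any(_covers(s, action, key_arn) for s in consumer_iam)
    if owner_ok and consumer_ok:
        return "allowed"
    if not owner_ok and not consumer_ok:
        return "both sides missing"
    return "owner key policy missing" if not owner_ok else "consumer IAM missing"


# Hypothetical key ID, used only to make the example concrete.
KEY_ARN = "arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
key_policy = [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::444455556666:root"},
    "Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
    "Resource": "*",
}]
consumer_iam = [{
    "Effect": "Allow",
    "Action": ["kms:Decrypt", "kms:DescribeKey"],
    "Resource": KEY_ARN,
}]

print(diagnose(key_policy, consumer_iam, "444455556666", "kms:Decrypt", KEY_ARN))
print(diagnose(key_policy, consumer_iam, "444455556666", "kms:GenerateDataKey", KEY_ARN))
```

With the two example policies from this section, `kms:Decrypt` is allowed, while `kms:GenerateDataKey` reports the consumer IAM side as missing: the owner shared it, but the consumer never granted it to its own principal.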

Service-specific edges that matter, because each service layers its own access surface on top of the KMS handshake:

Service	Extra surfaces beyond key policy + IAM	Cross-account requirement	Cost lever
S3 (SSE-KMS)	Bucket policy + object ownership	Bucket policy, key policy, reader IAM all align	S3 Bucket Keys — collapse per-object calls
EBS / snapshots	Snapshot share + grant on the key	CMK (not AWS-managed); target gets `Decrypt` + `CreateGrant`	Re-encrypt to target’s key on copy
RDS / Aurora	Snapshot share + KMS share	Share snapshot AND the CMK; target re-encrypts on copy	Storage-level SSE; few KMS calls
DynamoDB	Table encryption setting	MRK for cross-Region global tables	Table-level key; low call volume
Secrets Manager	Resource policy on the secret	Share secret + grant `Decrypt` on its CMK	One CMK for many secrets

Two edges deserve calling out explicitly. For S3 with SSE-KMS, cross-account reads need the bucket policy, the object’s KMS key policy, and the reader’s IAM all aligned; set the bucket’s default encryption to your CMK and enable S3 Bucket Keys — it caches a bucket-level data key and collapses thousands of per-object KMS calls into a handful, cutting both cost and quota pressure dramatically. For EBS / snapshots, you cannot share a snapshot encrypted with an AWS-managed key cross-account — full stop. Use a CMK, share the snapshot, grant the target account Decrypt + CreateGrant on the key, and the target re-encrypts to their key on copy. This is why production fleets standardize on CMKs for EBS even when defaults would “work.”

# Enable S3 Bucket Keys on a bucket's default SSE-KMS — the single biggest KMS-call reducer
aws s3api put-bucket-encryption --bucket payments-data \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
      },
      "BucketKeyEnabled": true
    }]
  }'

6. Key rotation: automatic vs manual

Automatic rotation is the default answer. Enable it and KMS generates new cryptographic material on a schedule (every 365 days by default; the period is configurable from 90 to 2,560 days), retaining all prior material so old ciphertext stays decryptable. The key ID, ARN, and policy never change, so rotation is invisible to applications.

aws kms enable-key-rotation \
  --key-id <key-id> \
  --rotation-period-in-days 180

aws kms get-key-rotation-status --key-id <key-id>

Crucial subtlety: automatic rotation rotates the KMS key material, but it does not re-encrypt your existing data or your stored data keys. Old envelopes are unwrapped with retained old material; new writes use new material. If your compliance regime demands that data actually be re-wrapped under fresh material, that is a separate, application-driven re-encryption job, not something rotation does for you:

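# Conceptual re-encrypt: ReEncrypt swaps the wrapping key without exposing plaintext.
# Plaintext never returns to your process; KMS decrypts and re-wraps internally.
import boto3

kms = boto3.client("kms")
# old_wrapped_blob: the stored wrapped data key (bytes), read from your data store
new_blob = kms.re_encrypt(
    CiphertextBlob=old_wrapped_blob,
    SourceKeyId="alias/payments-prod-old",
    # Required when the blob was wrapped with an encryption context (assumed here)
    SourceEncryptionContext={"tenant": "acme"},
    DestinationKeyId="alias/payments-prod-new",
    DestinationEncryptionContext={"tenant": "acme"},
)["CiphertextBlob"]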

Manual rotation (a brand-new key behind the same alias) is for cases automatic rotation can’t cover: changing key spec, moving to a new Region/account, or responding to suspected compromise where you must invalidate old material. You repoint the alias and run a re-encryption backfill. Asymmetric and HMAC keys historically required this; symmetric keys rarely do. The two models compared:

Aspect	Automatic rotation	Manual rotation (new key + alias)
What changes	Backing material only	A whole new key (new key ID/ARN)
Key ID / ARN	Unchanged	New — alias repointed
App impact	Invisible	Repoint alias; backfill old data
Re-encrypts old data?	No (old material retained)	Only if you run a `ReEncrypt` backfill
Changes key spec?	No	Yes (the reason to do it manually)
Cadence	90–2560 days, configurable	Whenever you decide
Cost	Slightly more material stored	New key (~$1/mo) + ReEncrypt requests
Use it for	The default, compliance “rotate yearly”	Spec change, compromise, Region/account move

What rotation does and does not do — the table that prevents a false sense of security:

Statement about rotation	True?	Why it matters
New writes use fresh material	Yes	The point of rotation
Old ciphertext stays decryptable	Yes	Prior material is retained
Existing data is re-wrapped under new material	No	Needs a separate `ReEncrypt` job
Stored data keys are refreshed	No	They’re wrapped under retained material
The key ARN/ID changes	No	Apps and references are unaffected
Replica MRKs rotate too	Yes (from the primary)	Don’t rotate replicas separately
Compliance “data must be re-encrypted” is satisfied	Not by rotation alone	Run an application-driven backfill

7. Auditing and controls

Every KMS API call lands in CloudTrail — including Decrypt, with the encryptionContext, the calling principal, and the source IP. This is the highest-signal security telemetry in the account; treat unexpected Decrypt calls as you would unexpected AssumeRole.

Tighten policies with condition keys so a key is usable only in the intended context:

{
  "Sid": "OnlyViaS3FromOrg",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
  "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "kms:ViaService": "s3.eu-west-1.amazonaws.com",
      "aws:PrincipalOrgID": "o-exampleorgid"
    }
  }
}

kms:ViaService pins usage to a specific AWS service (the key can only be used through S3, not by a human running aws kms decrypt). aws:PrincipalOrgID confines cross-account use to your organization. For ABAC, gate Decrypt on tag parity between the principal and the key with aws:ResourceTag / aws:PrincipalTag so access scales with tags instead of hand-written ARNs:

"Condition": {
  "StringEquals": {
    "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
  }
}

The condition keys that turn a wide key policy into a tightly-scoped one — the most useful security lever in KMS:

Condition key	Pins the key to…	Example value	Effect
`kms:ViaService`	One AWS service only	`s3.eu-west-1.amazonaws.com`	Blocks a human running `aws kms decrypt` directly
`aws:PrincipalOrgID`	Your organization	`o-exampleorgid`	Confines cross-account use to the org
`kms:EncryptionContext:<k>`	A specific context value	`tenant = acme`	“Decrypt only acme’s data”
`kms:EncryptionContextKeys`	Required context keys present	`["tenant"]`	Force callers to supply context
`aws:SourceVpce`	A specific VPC endpoint	`vpce-0abc...`	Only via the KMS interface endpoint
`aws:PrincipalTag` / `aws:ResourceTag`	Tag parity (ABAC)	`project` match	Access scales with tags, not ARNs
`kms:GrantIsForAWSResource`	AWS-service-created grants only	`true`	Limit who can create grants
`kms:CallerAccount`	A specific account	`444455556666`	Scope cross-account precisely
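To see how these conditions combine, the following sketch evaluates a `StringEquals` block against a request context in plain Python. It is a simplified model under stated assumptions: only `StringEquals` is handled, every key in the block must match, and `${...}` policy variables are expanded from the same request dictionary. The helper names and the `payments` tag value are hypothetical.

```python
import re


def resolve(value: str, request: dict) -> str:
    """Expand ${key} policy variables from the request context."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: request.get(m.group(1), ""), value)


def string_equals(condition: dict, request: dict) -> bool:
    """Every key in a StringEquals block must match (logical AND across keys)."""
    for key, expected in condition.items():
        actual = request.get(key)
        allowed = expected if isinstance(expected, list) else [expected]
        if actual is None or actual not in [resolve(v, request) for v in allowed]:
            return False
    return True


via_s3 = {
    "kms:ViaService": "s3.eu-west-1.amazonaws.com",
    "aws:PrincipalOrgID": "o-exampleorgid",
}
tag_parity = {"aws:ResourceTag/project": "${aws:PrincipalTag/project}"}

request = {
    "kms:ViaService": "s3.eu-west-1.amazonaws.com",
    "aws:PrincipalOrgID": "o-exampleorgid",
    "aws:ResourceTag/project": "payments",
    "aws:PrincipalTag/project": "payments",
}
# A direct `aws kms decrypt` call carries no kms:ViaService value.
direct = {k: v for k, v in request.items() if k != "kms:ViaService"}

print(string_equals(via_s3, request))      # True
print(string_equals(via_s3, direct))       # False
print(string_equals(tag_parity, request))  # True
```

The direct call fails only because the service condition is absent from its request context, which is the behaviour the `kms:ViaService` row describes.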

What to watch for in CloudTrail, and what each signal usually means:

CloudTrail signal	What it suggests	Action
`Decrypt` not made through an AWS service (no service in `userIdentity.invokedBy`, so `kms:ViaService` would not match)	A human/role called KMS directly	Investigate; pin the key to a service if unexpected
`Decrypt` with mismatched `encryptionContext`	Wrong tenant/purpose, or probing	Alarm like an unexpected `AssumeRole`
`Decrypt` from an unexpected `sourceIPAddress`	Off-network access	Correlate with VPC endpoint / `aws:SourceVpce`
Spike in `GenerateDataKey` count	Per-object KMS storm	Enable Bucket Keys / caching; check throttling
`PutKeyPolicy` / `ScheduleKeyDeletion`	Sensitive admin change	High-priority alert; should be rare and reviewed
`Decrypt` `AccessDenied` bursts	Misconfig or probing	Check policy delegation / context / cross-account
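A first-pass triage of these signals can be sketched as a filter over CloudTrail records that have already been parsed into dictionaries. This is an illustration, not a detection product: it assumes service-initiated calls carry `userIdentity.invokedBy`, and the `tenant` context key and the finding labels are hypothetical choices.

```python
def triage(event: dict, required_context_keys=frozenset({"tenant"})) -> list:
    """Return a list of findings for one CloudTrail record (empty if unremarkable)."""
    findings = []
    if event.get("eventSource") != "kms.amazonaws.com" or event.get("eventName") != "Decrypt":
        return findings

    # Assumption: calls made by an AWS service on your behalf set invokedBy.
    if "invokedBy" not in event.get("userIdentity", {}):
        findings.append("direct-call")

    context = (event.get("requestParameters") or {}).get("encryptionContext") or {}
    if not required_context_keys <= set(context):
        findings.append("missing-context")

    # startswith covers both "AccessDenied" and "AccessDeniedException".
    if str(event.get("errorCode", "")).startswith("AccessDenied"):
        findings.append("denied")
    return findings


sample = {
    "eventSource": "kms.amazonaws.com",
    "eventName": "Decrypt",
    "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
    "requestParameters": {"encryptionContext": {}},
    "errorCode": "AccessDenied",
}
print(triage(sample))  # ['direct-call', 'missing-context', 'denied']
```

In practice the output would feed an alerting rule rather than a print statement; the point is that each row of the table maps to a field you can test mechanically.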

8. Cost and request-quota management

KMS pricing has two parts: roughly $1 per CMK per month (each replica is billed as its own key), and per-request charges. Cryptographic operations also count against a request-rate quota that is shared per account per Region: symmetric Decrypt/GenerateDataKey/Encrypt draw from one pool (a few tens of thousands of requests/second depending on Region). A hot path that calls KMS per object will hit ThrottlingException long before you expect.

The levers, in order of impact:

Lever	Effort	KMS-call reduction	Cost impact	When to use
S3 Bucket Keys	One toggle	Drastic (thousands → ~one/bucket)	Lowers bill + quota pressure	Every SSE-KMS bucket, always
Data-key caching	Code (Encryption SDK)	Large within the window	Lowers bill	High-volume app encryption
Quota increase	Service Quotas request	None (raises ceiling)	None directly	Before a launch, proactively
Backoff + jitter	Verify SDK retry config	None (smooths spikes)	None	Always — survive transient throttles
Fewer, broader CMKs	Design	Indirect	Fewer $1/mo keys	Don’t over-fragment keys
Right key choice	Design	Indirect	Asymmetric/ECC ops cost more	Use symmetric unless you need asymmetric

The cost trap is never the $1/month per key. It is millions of unbatched Decrypt calls from a service that should have been using Bucket Keys or a data-key cache. Architect the call volume down first; optimize the bill second.
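The arithmetic behind that claim is worth doing once. The sketch below assumes a list price of $0.03 per 10,000 requests and a 30-day month; both are assumptions to check against current pricing for your Region, and `monthly_request_cost` is a hypothetical helper.

```python
PRICE_PER_10K_REQUESTS = 0.03       # USD; assumed list price, verify for your Region
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for a round estimate


def monthly_request_cost(requests_per_second: float, cache_hit_ratio: float = 0.0) -> float:
    """Estimated monthly request charge, after a cache absorbs a fraction of the calls."""
    calls = requests_per_second * SECONDS_PER_MONTH * (1.0 - cache_hit_ratio)
    return calls / 10_000 * PRICE_PER_10K_REQUESTS


print(f"{monthly_request_cost(1000):.2f}")        # 7776.00
print(f"{monthly_request_cost(1000, 0.98):.2f}")  # 155.52
```

Under these assumptions a steady 1,000 uncached calls per second is a four-figure monthly bill, and a cache that absorbs 98% of calls brings it down by the same 98%. The monthly key fee does not move at all.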

The throttling-relevant limits and what each means in practice:

Limit / quota	Rough value	Shared across	What hitting it looks like	Mitigation
Symmetric crypto request rate	Tens of thousands rps / Region	`Encrypt`/`Decrypt`/`GenerateDataKey`	`ThrottlingException` under load	Bucket Keys, caching, quota raise
Asymmetric crypto request rate	Much lower than symmetric	`Sign`/`Verify` and asymmetric `Encrypt`/`Decrypt`, per key type	Throttling on heavy asymmetric use	Cache; prefer symmetric where possible
Direct `Encrypt`/`Decrypt` payload	≤ 4 KB	per request	`Validation` error on large input	Envelope-encrypt instead
Grants per key	~50,000	per key	`LimitExceeded` on `CreateGrant`	Audit/retire orphaned grants
Aliases per key / account	Bounded	per account	`LimitExceeded` on `CreateAlias`	Reuse aliases; clean up
Keys per account/Region	Soft limit (raisable)	per Region	`LimitExceeded` on `CreateKey`	Consolidate; request increase
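For the backoff-and-jitter mitigation, the AWS SDKs already retry throttled calls, so the usual job is to verify their retry configuration rather than write your own loop. When you do need one (for example around a batch job), a minimal full-jitter sketch looks like the following. `Throttled` is a hypothetical stand-in for however your client surfaces `ThrottlingException`, and the defaults are illustrative.

```python
import random
import time


class Throttled(Exception):
    """Stand-in for a KMS ThrottlingException (hypothetical name)."""


def call_with_backoff(op, max_attempts=5, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Call op(), retrying on Throttled with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            # Full jitter: sleep a random duration up to the capped exponential.
            sleep(rng() * min(cap, base * 2 ** attempt))


attempts = {"n": 0}


def flaky():
    """Fails twice with Throttled, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise Throttled()
    return "ok"


print(call_with_backoff(flaky, sleep=lambda s: None))  # ok
```

Injecting `sleep` and `rng` keeps the loop testable; jitter matters because many workers retrying on the same schedule would otherwise hit the quota again in lockstep.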

Error and limit reference

The KMS errors you will actually see, what they mean, how to confirm, and the fix. The non-obvious ones are AccessDeniedException when IAM looks fine (the key policy didn’t delegate), InvalidCiphertextException from a mismatched encryption context, and KMSInvalidStateException on a key pending deletion or disabled:

Error / exception	Meaning	Likely cause	How to confirm	Fix
`AccessDeniedException`	Caller not authorized	Key policy lacks IAM delegation, or no allow statement	`get-key-policy` (no `EnableIAMRoot`?); `simulate-principal-policy`	Add IAM delegation; grant the action/condition
`AccessDeniedException` (cross-acct)	One side of handshake missing	Key policy or consumer IAM doesn’t allow	Check both key policy and consumer IAM (full ARN)	Align both sides; scope with `PrincipalOrgID`
`InvalidCiphertextException`	Ciphertext/context invalid	Wrong/missing encryption context, or corrupted blob	Compare encrypt vs decrypt context byte-for-byte	Pass the exact same AAD map
`ThrottlingException`	Request-rate quota exceeded	Per-object KMS storm	CloudTrail call count; Service Quotas usage	Bucket Keys, caching; raise quota; backoff
`KMSInvalidStateException`	Key in a bad state for the op	Key disabled or `PendingDeletion`	`describe-key` → `KeyState`	Enable the key; cancel deletion
`NotFoundException`	Key/alias not found	Wrong Region, wrong ID, deleted	`describe-key` in the right Region	Use correct ARN/Region; recreate alias
`DisabledException`	Key is disabled	Someone disabled it	`describe-key` → `Enabled:false`	`enable-key` if intended
`KMSInvalidSignatureException`	Signature verify failed	Wrong key/algorithm or tampered data	Check signing key + `SigningAlgorithm`	Use the correct key/algorithm
`IncorrectKeyException`	Wrong CMK for this ciphertext	Decrypting with the wrong key	Match key ARN in the ciphertext metadata	Use the key that produced the ciphertext
`LimitExceededException`	A KMS limit hit	Too many grants/aliases/keys	`list-grants` / `list-aliases` counts	Retire/clean up; request a limit increase
`DependencyTimeoutException`	KMS internal timeout	Transient	Retry pattern in logs	Exponential backoff + jitter
`MalformedPolicyDocumentException`	Bad key-policy JSON	Syntax/principal error in `PutKeyPolicy`	Validate the policy document	Fix JSON; ensure principal exists

The key states a CMK moves through, because several errors above are just “wrong state for this operation”:

Key state	Can encrypt/decrypt?	How you got here	How to leave
`Enabled`	Yes	Normal	—
`Disabled`	No	`disable-key`	`enable-key`
`PendingDeletion`	No	`schedule-key-deletion` (7–30 day window)	`cancel-key-deletion`
`PendingImport`	No	External key material, not yet imported	Import material
`Creating` / `Updating`	Transient	Replication / external import in progress	Wait
`Unavailable`	No	Custom key store disconnected	Reconnect the key store

Architecture at a glance

The diagram traces the write-then-fail-over path the way bytes actually move, then maps the five places authorization or availability breaks. Read it left to right. A workload needs at-rest crypto, so it calls GenerateDataKey (via the AWS Encryption SDK, with key commitment on and a bounded data-key cache). That call passes through the three-layer authorization stack — the key policy (root of trust, with the EnableIAMRoot delegation), then IAM and grants (effective only because the key policy delegated, scoped by kms:ViaService), then the encryption context check (the AAD that must match byte-for-byte at decrypt). Only if every layer allows does the request reach the multi-Region CMK: an mrk-... primary in eu-west-1 whose material is replicated — not re-encrypted — to a decrypt-only replica in eu-central-1. KMS returns the data key plaintext-and-wrapped; the app encrypts gigabytes locally with AES-256-GCM, zeroes the plaintext, and stores only the ciphertext and the wrapped blob in S3 (Bucket Keys on) or behind a DynamoDB global table that replicates the ciphertext to the second Region.

Notice the two things the diagram is built to teach. First, the dashed replicate-material arrow looping the primary to the replica is the whole DR story: it carries key material, so the same envelope decrypts in either Region — which is exactly why a single-Region key behind a global table produces “replicated-but-unreadable” data (badge 3). Second, every path converges on CloudTrail (every Decrypt, with principal and context) and a quota guard (watching ThrottlingException and request-rate headroom) — the audit-and-protect footer. The five numbered badges sit on the precise hops where it breaks: the key policy locking you out (1), a wrong encryption context failing closed (2), replicated-but-unreadable DR (3), a per-object KMS storm and throttling (4), and an unexpected Decrypt that signals an authorization gap (5). The legend narrates each as symptom, confirm command, and fix.

Real-world scenario

Northwind Pay, a fictional but representative pan-European payments platform, ran active-active in eu-west-1 and eu-central-1 behind a DynamoDB global table, with the application performing client-side field-level encryption on PAN (card number) data before writing. They had used a standard, Region-scoped CMK in eu-west-1. The global table dutifully replicated the ciphertext to eu-central-1 — about 240 million encrypted items, growing 3 million a day. The platform team was six engineers; the KMS bill was a rounding error and nobody had thought hard about it.

The first incident was a planned regional failover game-day. At 10:00 they drained eu-west-1; by 10:02 the eu-central-1 application could not decrypt a single replicated record. Every read of PAN data threw AccessDeniedException / NotFoundException from KMS. The ciphertext was a KMS envelope bound to a key that only existed in eu-west-1, and kms:Decrypt in eu-central-1 had no key to call. Replicated-but-unreadable data is the worst kind of DR failure: it looks perfectly healthy — the bytes are there, the table is green — until you cut over and discover the ability to read them never replicated. The game-day was aborted; the DR posture was, on paper, a lie.

The fix was a multi-Region key, and critically, a backfill — adopting an MRK does nothing for the 240 million records already wrapped under the old single-Region key. They created an MRK primary in eu-west-1, replicated it into eu-central-1, repointed the application’s key provider, and ran a controlled ReEncrypt migration (in batches, throttle-aware, idempotent on a per-item version flag) over the existing items so every envelope was re-wrapped under the portable key. The backfill ran for nine days at a deliberately capped rate to stay well under the request-rate quota. They also tightened each replica’s policy independently: the eu-central-1 replica granted the app Decrypt only, while writes (and thus GenerateDataKey) stayed pinned to the active primary, so a failover couldn’t silently start minting keys in the standby Region.

# Primary in the active Region
aws kms create-key --multi-region --region eu-west-1 \
  --description "pan-field-encryption primary"

# Replica in the standby Region (independent, decrypt-only policy attached after)
aws kms replicate-key --region eu-west-1 \
  --key-id mrk-0a1b2c3d4e5f6a7b8 \
  --replica-region eu-central-1
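The backfill described above (batched, rate-capped, idempotent on a per-item version flag) can be sketched as follows. This is an illustration under assumptions, not the team's actual job: the item shape (`wrapped_key`, `key_version`) is hypothetical, `kms` is any client exposing boto3's `re_encrypt`, and the rate cap is a crude fixed sleep.

```python
import time


def backfill(kms, items, dest_key_id, context,
             target_version=2, max_per_second=50, sleep=time.sleep):
    """Re-wrap each item's data key under dest_key_id; skip items already migrated."""
    migrated = 0
    for item in items:
        if item.get("key_version") == target_version:
            continue  # idempotent: safe to re-run after a crash or a pause
        resp = kms.re_encrypt(
            CiphertextBlob=item["wrapped_key"],
            SourceEncryptionContext=context,
            DestinationKeyId=dest_key_id,
            DestinationEncryptionContext=context,
        )
        # In a real job, persist both fields atomically with the item.
        item["wrapped_key"] = resp["CiphertextBlob"]
        item["key_version"] = target_version
        migrated += 1
        sleep(1.0 / max_per_second)  # crude cap to stay under the request-rate quota
    return migrated
```

A call might look like `backfill(boto3.client("kms"), batch, "alias/pan-mrk", {"tenant": "acme"})`, where the alias is a placeholder. The version flag is what makes the job restartable: a second pass over the same items makes no KMS calls.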

A second, quieter incident surfaced during the same project. With the global table now genuinely portable, traffic doubled into both Regions and the payments service began throwing intermittent ThrottlingException on GenerateDataKey during the evening peak. The cause: client-side field encryption was calling GenerateDataKey per item, and at peak that exceeded the Region’s request-rate quota. They fixed it two ways: added data-key caching in the Encryption SDK (keyed per tenant, max_age=120s, max_messages_encrypted=500), which cut KMS calls by ~98%, and proactively raised the request-rate quota via Service Quotas ahead of the next quarter’s launch. The S3 archive of settlement files, separately, had Bucket Keys enabled in the same sweep — it had been making a GenerateDataKey call per object and was the single largest line on the (newly noticed) KMS bill.
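The caching idea can be sketched in a few lines: reuse a data key per tenant until it is too old or has protected too many messages. This is a conceptual model, not the AWS Encryption SDK's implementation; `DataKeyCache` and the `generate` callback (which would wrap a `GenerateDataKey` call) are hypothetical names, and the limits mirror the values quoted above.

```python
import time


class DataKeyCache:
    """Reuse one data key per tenant, bounded by age and by message count."""

    def __init__(self, generate, max_age=120.0, max_messages=500, clock=time.monotonic):
        self._generate = generate      # callable(tenant) -> data key; one KMS call each
        self._max_age = max_age
        self._max_messages = max_messages
        self._clock = clock
        self._entries = {}             # tenant -> [data_key, created_at, uses]

    def get(self, tenant):
        now = self._clock()
        entry = self._entries.get(tenant)
        expired = (
            entry is None
            or now - entry[1] >= self._max_age
            or entry[2] >= self._max_messages
        )
        if expired:
            entry = [self._generate(tenant), now, 0]
            self._entries[tenant] = entry
        entry[2] += 1
        return entry[0]


kms_calls = []
cache = DataKeyCache(lambda tenant: kms_calls.append(tenant) or object())
for _ in range(1000):
    cache.get("acme")
print(len(kms_calls))  # 2
```

A thousand encryptions cost two key generations instead of a thousand. The bounds are the security trade-off: a larger window means fewer KMS calls but more data protected by a single data key.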

The lesson the team wrote into their standards, in three lines: if a ciphertext can travel between Regions, the key that protects it must travel too — and you must re-encrypt the data that predates that decision. Replication moves bytes; it does not move the ability to read them. And: architect the KMS call volume down before you scale up the quota. The incident timeline, because the order of discovery is the lesson:

Time / phase	Symptom	Action taken	Effect	What it should have been
Game-day 10:02	DR Region can’t decrypt anything	Abort failover	Outage avoided in prod, DR exposed	Pre-test decrypt in DR, not just replication
Day 0	Root cause: single-Region key	`describe-key` → `MultiRegion: false`	Diagnosis confirmed	—
Day 0–1	Plan the fix	Create MRK primary + replica	Key now portable	Standardize MRK for replicated data from day one
Day 1–10	240M legacy items still bound to old key	`ReEncrypt` backfill, throttle-capped	Every envelope re-wrapped	Backfill is mandatory, not optional
Day 5	`ThrottlingException` at peak	Per-item `GenerateDataKey` found	Diagnosed quota pressure	Cache data keys from the start
Day 6	Crush the call volume	Data-key caching + S3 Bucket Keys	KMS calls −98%	Bucket Keys on every SSE-KMS bucket
Day 7	Pre-empt the next launch	Service Quotas request-rate increase	Headroom secured	Raise quota before incidents
+1 month	Re-run game-day	Failover with portable key + scoped replicas	Clean cutover, decrypt works	The DR posture is now real

Advantages and disadvantages

KMS’s design — keys that never leave the HSM, a resource policy as the root of trust, envelope encryption as the pattern — is what makes it both powerful and full of sharp edges. Weigh it honestly:

Advantages (why this model helps you)	Disadvantages (why it bites)
Key material never leaves FIPS 140-3 HSMs; you can’t accidentally export or leak it	You can’t hold the key either — every decrypt is a network call against a quota
The key policy is an authoritative, auditable allow-list independent of IAM sprawl	A key policy without IAM delegation makes IAM ignored — the classic lockout
Envelope encryption keeps bulk plaintext out of KMS — fast, cheap, quota-friendly	You must implement the envelope correctly (use the SDK) or you’ll roll a vuln
Encryption context gives free, logged, byte-bound authorization and audit	A context mismatch fails closed with an unhelpful `InvalidCiphertext`
Multi-Region keys make ciphertext portable for real cross-Region DR	They weaken isolation by design, and don’t re-encrypt your legacy data
Grants delegate to services and short jobs without bloating the key policy	Orphaned grants accumulate (50k/key) and are eventually consistent
Every `Decrypt` is in CloudTrail with principal + context — top-tier telemetry	High-volume per-object calls flood CloudTrail and throttle before you expect
Automatic rotation is invisible and keeps old ciphertext readable	Rotation does not re-wrap stored data — compliance needs a separate job

The model is right whenever you need provable, auditable, centrally-controlled key management — which is almost every regulated or multi-account workload. It bites hardest on teams that treat the key policy like an IAM afterthought (lockouts), that call KMS per object (throttling, cost), that assume rotation re-encrypts data (compliance gap), or that replicate ciphertext cross-Region without making the key portable (the Northwind failure). Every disadvantage is manageable — but only if you know it exists, which is the entire point of this article.

Hands-on lab

Prove envelope encryption, multi-Region portability, and encryption-context enforcement end to end, at near-zero cost (a CMK is ~$1/month prorated; one MRK replica adds a second; delete at the end and the cost is pennies). Run in CloudShell with credentials that can manage KMS. You will create a multi-Region key, round-trip a data key, prove cross-Region decrypt, and prove that the wrong encryption context fails closed.

Step 1 — Variables.

PRIMARY_REGION=eu-west-1
DR_REGION=eu-central-1
ACCT=$(aws sts get-caller-identity --query Account --output text)
echo "account=$ACCT primary=$PRIMARY_REGION dr=$DR_REGION"

Step 2 — Create a multi-Region primary key and an alias.

KEY_ID=$(aws kms create-key --multi-region --region $PRIMARY_REGION \
  --description "kms-lab mrk primary" \
  --query KeyMetadata.KeyId --output text)
aws kms create-alias --region $PRIMARY_REGION \
  --alias-name alias/kms-lab --target-key-id $KEY_ID
echo "primary key: $KEY_ID"

Expected: a KeyId beginning mrk-. Confirm it is multi-Region:

aws kms describe-key --region $PRIMARY_REGION --key-id alias/kms-lab \
  --query 'KeyMetadata.{MR:MultiRegion, State:KeyState}'
# -> "MR": true, "State": "Enabled"

Step 3 — Enable rotation on a 180-day cadence.

aws kms enable-key-rotation --region $PRIMARY_REGION \
  --key-id $KEY_ID --rotation-period-in-days 180
aws kms get-key-rotation-status --region $PRIMARY_REGION --key-id $KEY_ID
# -> "KeyRotationEnabled": true, "RotationPeriodInDays": 180

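Step 4 — Replicate into the DR Region. The replica created here gets the default key policy; in production, attach a decrypt-only policy to it (replicate-key accepts --policy). The replica can report Creating for a short while before it becomes Enabled, so re-run describe-key if needed.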

aws kms replicate-key --region $PRIMARY_REGION \
  --key-id $KEY_ID --replica-region $DR_REGION \
  --description "kms-lab mrk replica"
aws kms create-alias --region $DR_REGION \
  --alias-name alias/kms-lab --target-key-id $KEY_ID
aws kms describe-key --region $DR_REGION --key-id alias/kms-lab \
  --query 'KeyMetadata.{MR:MultiRegion, State:KeyState}'
# -> "MR": true, "State": "Enabled" in the DR Region too

Step 5 — Envelope round-trip: mint a data key, then decrypt the wrapped blob.

WRAPPED=$(aws kms generate-data-key --region $PRIMARY_REGION \
  --key-id alias/kms-lab --key-spec AES_256 \
  --encryption-context tenant=acme \
  --query CiphertextBlob --output text)

# Decrypt the wrapped data key back, in the PRIMARY Region (must supply same context)
aws kms decrypt --region $PRIMARY_REGION \
  --ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
  --encryption-context tenant=acme \
  --query KeyId --output text
# -> returns the key ARN, proving decrypt authorization + correct context

Step 6 — Prove cross-Region portability (the DR claim). Decrypt the same wrapped blob by calling KMS in the DR Region:

aws kms decrypt --region $DR_REGION \
  --ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
  --encryption-context tenant=acme \
  --query KeyId --output text
# -> returns the DR-Region key ARN: the multi-Region key made the envelope portable

Step 7 — Prove encryption context fails closed. Decrypt with the wrong context — it must error:

aws kms decrypt --region $PRIMARY_REGION \
  --ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
  --encryption-context tenant=wrong 2>&1 | grep -i "InvalidCiphertext\|AccessDenied" \
  && echo "GOOD: wrong context correctly rejected"

Validation checklist. You created a multi-Region key, enabled rotation, replicated it, round-tripped a data key, decrypted the same envelope in two Regions (portability), and proved that the wrong encryption context fails closed. The lab steps mapped to what each proves:

Step	What you did	What it proves	Real-world analogue
2	Create MRK primary	Key is multi-Region (`mrk-`, `MultiRegion:true`)	Standardizing portable keys for replicated data
3	Enable rotation	Rotation is on and on your cadence	Compliance “rotate yearly” baseline
4	Replicate to DR	Replica shares material, own policy	DR Region gets decrypt-only scope
5	Data-key round-trip	Envelope encryption + correct context decrypts	Every app write/read path
6	Cross-Region decrypt	Same envelope decrypts in DR Region	The DR claim, actually tested
7	Wrong context rejected	Context is enforced authorization	“Decrypt only acme’s data”

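Teardown (avoid the ~$1/key/month charge). Delete the replica as well as the primary: a primary that still has replicas sits in PendingReplicaDeletion, and its waiting period starts only once the last replica is deleted.

# Remove the aliases and schedule deletion in BOTH Regions (7-day minimum window), replica first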
aws kms delete-alias --region $DR_REGION --alias-name alias/kms-lab
aws kms schedule-key-deletion --region $DR_REGION --key-id $KEY_ID --pending-window-in-days 7
aws kms delete-alias --region $PRIMARY_REGION --alias-name alias/kms-lab
aws kms schedule-key-deletion --region $PRIMARY_REGION --key-id $KEY_ID --pending-window-in-days 7

Cost note. Two keys for a few days, scheduled for deletion at the 7-day minimum, cost a fraction of the ~$1/key/month, well under $1 total. Crypto requests in this lab are a handful and effectively free.

Common mistakes & troubleshooting

This is the playbook — the part you bookmark. First a scannable table you can read mid-incident, then the full reasoning for the entries that bite hardest. Every row is symptom → root cause → confirm (exact command) → fix.

#	Symptom	Root cause	Confirm (exact cmd)	Fix
1	DR Region can’t decrypt replicated data after failover	Single-Region CMK behind a cross-Region replicated store	`aws kms describe-key --key-id <id> --query KeyMetadata.MultiRegion` = false	Adopt an MRK; replicate; `ReEncrypt` the backlog
2	`AccessDeniedException` on Decrypt though IAM clearly allows it	Key policy omits IAM delegation → IAM ignored	`aws kms get-key-policy --key-id <id> --policy-name default`: no `EnableIAMRoot` statement	Add the account-root statement (`kms:*` for `arn:aws:iam::<account-id>:root`)
3	`InvalidCiphertextException` on a blob that decrypted before	Encryption context mismatch at decrypt	Diff the encrypt-time vs decrypt-time `--encryption-context`	Pass the exact same AAD map, byte-for-byte
4	`ThrottlingException` on a hot path; surprising KMS bill	Per-object `GenerateDataKey`/`Decrypt`	CloudTrail event count per minute; Service Quotas usage	S3 Bucket Keys + data-key caching; raise quota
5	Cross-account read fails with generic `AccessDenied`	Only one side of the handshake granted	Check key policy AND consumer IAM (full key ARN)	Align both sides; scope with `aws:PrincipalOrgID`
6	Can’t share an encrypted EBS snapshot cross-account	Snapshot encrypted with an AWS-managed key	`describe-snapshots` KMS key is `aws/ebs`	Re-encrypt to a CMK; share + grant `Decrypt`+`CreateGrant`
7	Rotated the key but auditors say “data not re-encrypted”	Assumed rotation re-wraps stored data (it doesn’t)	`get-key-rotation-status` on, but no backfill ran	Run an application-driven `ReEncrypt` job
8	`KMSInvalidStateException` on every operation	Key is disabled or `PendingDeletion`	`aws kms describe-key --query KeyMetadata.KeyState`	`enable-key` / `cancel-key-deletion`
9	Service (ASG/EBS) can’t mint data keys	Service grant missing or revoked	`aws kms list-grants --key-id <id>` for the service principal	Let the service recreate the grant; don’t revoke service grants
10	Unexpected `Decrypt` from a human running `aws kms decrypt`	Key not pinned to a service	CloudTrail event shows no calling service (no `userIdentity.invokedBy`)	Add `kms:ViaService` condition; alarm on direct decrypt
11	`IncorrectKeyException` decrypting an old object	Decrypting with the wrong CMK (alias repointed)	Compare the key ARN in the ciphertext vs the alias target	Use the key that produced the ciphertext; keep old key alive
12	Deleted a key and lost data	`ScheduleKeyDeletion` ran; material gone after window	CloudTrail `ScheduleKeyDeletion`; key `PendingDeletion`/gone	`cancel-key-deletion` within the window; use long windows
13	Asymmetric `Verify` fails on a valid signature	Wrong `SigningAlgorithm` or wrong public key	Check `SigningAlgorithm` + the verifying key	Match algorithm + key; re-download public key
14	Grant “works” then intermittently `AccessDenied`	Grant eventual consistency, no `GrantToken` used	Compare grant creation time vs first use	Pass the returned `GrantToken` on immediate use
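Row 2 is the lockout worth checking before it happens. As a minimal sketch, the function below scans a key policy document (the JSON string that `get-key-policy` returns) for an account-root delegation statement. It is a heuristic under stated assumptions: it looks only for an `Allow` of `kms:*` to the account root and ignores conditions; `has_iam_delegation` is a hypothetical helper.

```python
import json


def _as_list(value):
    return value if isinstance(value, list) else [value]


def has_iam_delegation(policy_json: str, account_id: str) -> bool:
    """True if some statement allows kms:* to the account root principal."""
    root = f"arn:aws:iam::{account_id}:root"
    for stmt in _as_list(json.loads(policy_json).get("Statement", [])):
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        if (
            stmt.get("Effect") == "Allow"
            and root in _as_list(aws)
            and "kms:*" in _as_list(stmt.get("Action", []))
        ):
            return True
    return False


delegated = json.dumps({"Statement": [{
    "Sid": "EnableIAMRoot",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
    "Action": "kms:*",
    "Resource": "*",
}]})
locked = json.dumps({"Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:role/payments-app"},
    "Action": ["kms:Decrypt"],
    "Resource": "*",
}]})

print(has_iam_delegation(delegated, "111122223333"))  # True
print(has_iam_delegation(locked, "111122223333"))     # False
```

Run against every key in an account, a check like this turns the most common lockout into a pre-deployment lint rather than an incident.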

The expanded form, for the entries that cause the most lost hours:

1. The DR Region cannot decrypt replicated data after a failover. Root cause: A single-Region CMK sits behind a cross-Region replicated store (DynamoDB global table, S3 CRR of SSE-KMS objects, cross-Region snapshot copies). The bytes replicate; the key does not. Confirm: aws kms describe-key --key-id <id> --query 'KeyMetadata.{MR:MultiRegion}' returns false; the ciphertext’s key ARN names the source Region only. Fix: Create a multi-Region key, replicate-key into the DR Region, repoint the app’s key provider, and run a ReEncrypt backfill over pre-existing ciphertext. Test decrypt in DR, not just replication, in every game-day.

2. AccessDeniedException on Decrypt even though the IAM policy plainly allows kms:Decrypt. Root cause: The key policy omits the IAM-delegation statement, so IAM policies are ignored and the key policy is the only authority, and it doesn’t list this principal. Confirm: aws kms get-key-policy --key-id <id> --policy-name default shows no statement allowing kms:* to the account root principal (arn:aws:iam::<account-id>:root), that is, no EnableIAMRoot statement. Fix: Add the account-root delegation statement so IAM is honored; then the existing IAM allow works. This is the most common KMS lockout.

3. InvalidCiphertextException on a blob that decrypted yesterday. Root cause: Encryption context mismatch. The context passed at decrypt differs from the one used at encrypt (a changed key, a changed value, or a missing or extra pair); every pair must match exactly, including case. Confirm: Diff the --encryption-context (or SDK encryption_context) used at encrypt vs decrypt; check whether a kms:EncryptionContext: policy condition changed. Fix: Pass the exact same AAD map. Treat context as required authorization, not optional metadata; store/propagate it alongside the ciphertext.

4. ThrottlingException on a hot path, and a four-figure KMS bill nobody expected. Root cause: One KMS call per object — per-object GenerateDataKey on writes or Decrypt on reads — exceeding the Region’s shared request-rate quota and metering every call. Confirm: CloudTrail shows one GenerateDataKey/Decrypt per object; Service Quotas → KMS shows you near the request-rate cap. Fix: Enable S3 Bucket Keys (collapses thousands of calls to ~one per bucket), add data-key caching in the Encryption SDK, verify backoff+jitter, and raise the request-rate quota ahead of launches.

5. A cross-account read fails with a bare AccessDenied and no hint which side. Root cause: The cross-account handshake is two-sided and one side is missing — either the owner’s key policy doesn’t allow the foreign principal, or the consumer’s IAM doesn’t allow the KMS action on the full key ARN. Confirm: Read both the owning account’s key policy (get-key-policy) and the consumer’s IAM policy; the consumer must reference the full key ARN. Fix: Grant both sides; scope the key policy with aws:PrincipalOrgID rather than a bare *.

6. You cannot share an encrypted EBS snapshot with another account. Root cause: The snapshot is encrypted with the AWS-managed aws/ebs key, which cannot be shared cross-account — full stop. Confirm: aws ec2 describe-snapshots --snapshot-ids <id> --query 'Snapshots[].KmsKeyId' shows the aws/ebs alias/key. Fix: Copy the snapshot to one encrypted with a CMK, share the snapshot, grant the target account Decrypt + CreateGrant, and let the target re-encrypt to their key on copy.

7. The key was rotated, but a compliance review fails because “the data wasn’t re-encrypted.” Root cause: A misunderstanding — automatic rotation rotates key material, not your stored ciphertext. Old envelopes are unwrapped with retained old material. Confirm: get-key-rotation-status shows rotation enabled, but there’s no ReEncrypt job in your pipeline. Fix: If the regime requires fresh material on existing data, run a separate, application-driven ReEncrypt backfill. Don’t claim rotation does it.
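The check in entry 2 can be scripted rather than eyeballed. The sketch below is a minimal illustration, not an official tool: it takes the policy document text (the output of get-key-policy) and reports whether any unconditioned Allow statement gives kms:* to the account root. The function name delegates_to_iam and the sample account ID are hypothetical. It assumes the standard aws partition, and it does not handle the shorthand where the principal is a bare account ID.

```python
import json


def delegates_to_iam(policy_json: str, account_id: str) -> bool:
    """True if an unconditioned Allow gives kms:* to the account root principal."""
    root_arn = f"arn:aws:iam::{account_id}:root"  # assumes the "aws" partition
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]
    for stmt in statements:
        # Skip Deny statements and anything narrowed by a Condition block.
        if stmt.get("Effect") != "Allow" or stmt.get("Condition"):
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        principals = [aws] if isinstance(aws, str) else list(aws)
        action = stmt.get("Action", [])
        actions = [action] if isinstance(action, str) else list(action)
        if root_arn in principals and "kms:*" in actions:
            return True
    return False


# Hypothetical policy that lists one role and nothing else: IAM is ignored.
locked_down = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AppOnly",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/app"},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }],
})
print(delegates_to_iam(locked_down, "111122223333"))  # False
```

A False result on a key whose callers rely on IAM policies is the lockout described in entry 2; a True result means the denial is coming from somewhere else (IAM itself, an SCP, or a condition).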

Best practices

Every key that needs an editable policy, cross-account use, or DR portability is a customer-managed key. AWS-managed keys can’t be shared, can’t be policy-edited, and can’t be multi-Region.
Always include the EnableIAMRoot delegation in the key policy. Omitting it makes IAM ignored and is the classic way to brick a key and lock out the admins who could fix it.
Envelope-encrypt anything over 4 KB with the AWS Encryption SDK — never raw Encrypt on large payloads, and never roll your own framing.
Set a meaningful encryption context on every operation and enforce it in policy with kms:EncryptionContext: conditions — it’s free, logged authorization and audit binding.
Bound your data-key cache (max_age, max_messages_encrypted, max_bytes_encrypted) and key it per tenant/purpose. Never leave it unbounded; tune the blast-radius-versus-throughput trade-off deliberately.
Use multi-Region keys wherever ciphertext crosses Regions, scope each replica’s policy independently (e.g. DR Region decrypt-only), and always pair the MRK adoption with a ReEncrypt backfill of legacy data.
Prefer grants over key-policy edits for AWS services and short-lived jobs — they’re revocable, scoped, and don’t bloat the resource policy. Audit and retire orphaned grants.
Enable S3 Bucket Keys on every SSE-KMS bucket — it’s the single biggest reducer of per-object KMS calls, cutting both cost and quota pressure.
Pin keys with kms:ViaService and aws:PrincipalOrgID so a key is usable only through the intended service and only inside your org; treat direct human Decrypt as a red flag.
Enable automatic rotation on every long-lived symmetric key; handle compliance-driven re-encryption as a separate job, not an assumption about rotation.
Use long key-deletion windows (close to 30 days) and require multi-party review for ScheduleKeyDeletion and PutKeyPolicy — these are data-destroying / access-granting actions.
Monitor CloudTrail for unexpected Decrypt, PutKeyPolicy, and ScheduleKeyDeletion, and ask for request-rate quota increases before launches, not during incidents.
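The encryption-context practice above is easiest to keep honest with a small comparison helper in the decrypt path. This is a sketch under stated assumptions: context_diff is a hypothetical helper, the context is modelled as a plain string-to-string dict, and the comparison is case-sensitive and independent of pair order. It only explains a mismatch; it does not call KMS.

```python
from typing import Dict, List


def context_diff(encrypt_ctx: Dict[str, str],
                 decrypt_ctx: Dict[str, str]) -> Dict[str, List[str]]:
    """Describe how the decrypt-time context differs from the encrypt-time one."""
    missing = sorted(k for k in encrypt_ctx if k not in decrypt_ctx)
    extra = sorted(k for k in decrypt_ctx if k not in encrypt_ctx)
    changed = sorted(
        k for k in encrypt_ctx
        if k in decrypt_ctx and encrypt_ctx[k] != decrypt_ctx[k]
    )
    return {"missing": missing, "extra": extra, "changed": changed}


# Hypothetical contexts: the value case changed and one pair was swapped out.
stored = {"tenant": "acme", "purpose": "invoices"}
supplied = {"tenant": "Acme", "env": "prod"}
print(context_diff(stored, supplied))
```

Logging this diff next to the failed request ID usually turns an opaque InvalidCiphertextException into a one-line fix.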

Security notes

Least privilege on kms:Decrypt. Decrypt is “read the data”: grant it narrowly, gated by encryption context and kms:ViaService, and never grant usage broadly to the account root. Encrypt-only access is far lower risk; decrypt access is the crown jewel.
The key policy is your security boundary, not IAM. Treat the key policy as the authoritative allow-list, reviewed in PRs, with the IAM-delegation statement present but every usage grant scoped and conditioned.
Encryption context for tenant isolation. Bind ciphertext to tenant/purpose and constrain it in policy so one tenant’s role can never decrypt another tenant’s data — provable from CloudTrail, not just asserted.
Confine cross-account use to the org. Always add aws:PrincipalOrgID to cross-account key-policy statements; a bare Principal: * or :root without an org condition is a data-perimeter hole.
Protect the destructive actions. ScheduleKeyDeletion, PutKeyPolicy, DisableKey, and CreateGrant can destroy data or broaden access — restrict them to break-glass roles, alarm on them, and prefer long deletion windows.
Pin keys to services and endpoints. kms:ViaService blocks a human running aws kms decrypt; aws:SourceVpce pins use to your KMS interface endpoint so off-network calls fail.
MRK replicas weaken isolation deliberately — scope them. Identical material in two Regions is an availability trade; give each replica the narrowest policy that Region needs (often decrypt-only in DR) so a failover can’t start minting keys.
Audit grants and rotation. Orphaned grants accumulate toward the per-key grant quota (50k/key), so list and retire them; verify rotation is actually on with get-key-rotation-status rather than assuming.
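Several of the notes above come down to the shape of one key-policy statement. The sketch below builds such a statement as a Python dict so it can be reviewed and diffed in a PR. The function usage_statement, the Sid, the role ARN and the org ID are all hypothetical placeholders; the condition keys are the ones named in the notes. One caveat to keep in mind: when a call arrives through an AWS service, the service supplies its own encryption context, so a custom context condition is mainly useful for principals that call KMS directly.

```python
import json
from typing import Dict, List, Optional


def usage_statement(sid: str, principal_arn: str, actions: List[str], org_id: str,
                    via_service: Optional[str] = None,
                    context: Optional[Dict[str, str]] = None) -> dict:
    """Build a scoped, conditioned usage statement for a key policy."""
    conditions = {"aws:PrincipalOrgID": org_id}  # confine use to the org
    if via_service:
        conditions["kms:ViaService"] = via_service  # usable only through this service
    for key, value in (context or {}).items():
        conditions[f"kms:EncryptionContext:{key}"] = value  # bind to tenant/purpose
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": principal_arn},
        "Action": actions,
        "Resource": "*",  # in a key policy this refers to the key itself
        "Condition": {"StringEquals": conditions},
    }


# Hypothetical values throughout.
stmt = usage_statement(
    sid="DecryptViaS3InsideOrg",
    principal_arn="arn:aws:iam::111122223333:role/reports-reader",
    actions=["kms:Decrypt"],
    org_id="o-exampleorgid",
    via_service="s3.eu-west-1.amazonaws.com",
)
print(json.dumps(stmt, indent=2))
```

Keeping the builder in code makes the "every usage grant scoped and conditioned" rule checkable: a statement without an org condition simply cannot be produced by it.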

The security controls that also prevent the operational incidents — secure and resilient pull the same way here:

Control	Mechanism	Secures against	Also prevents
IAM-delegation statement	`kms:*` to `:root` in key policy	Permanent lockout	“AccessDenied though IAM allows” tickets
`kms:ViaService` condition	Pin key to one service	Direct human/role decrypt	Unexpected `Decrypt` audit noise
Encryption context + condition	AAD bound + `kms:EncryptionContext:`	Cross-tenant decrypt	Silent wrong-context reuse
`aws:PrincipalOrgID`	Org-scoped cross-account	External-account access	Over-broad sharing blast radius
Decrypt-only replica policy	Narrow MRK replica key policy	Standby Region minting keys	Divergent failover behaviour
Long deletion window + review	`--pending-window-in-days 30`	Accidental/malicious key deletion	Unrecoverable data loss
S3 Bucket Keys	Bucket-level data-key caching	(cost/quota)	Per-object KMS storm + throttling

Cost & sizing

KMS pricing has two components and one trap. The components: roughly $1 per CMK per month (each MRK replica is billed as its own key, so a primary + one replica ≈ $2/month), plus per-request charges for cryptographic operations (a small fee per 10,000 requests, with asymmetric and the heavier GenerateDataKeyPair operations costing more). The trap: the surprise on the bill is almost never the $1/month per key; it is millions of unbatched Decrypt/GenerateDataKey calls from a service that should have used S3 Bucket Keys or a data-key cache.

Architect the call volume down first. S3 Bucket Keys collapse thousands of per-object calls into roughly one per bucket; data-key caching reuses a data key across hundreds of messages. Both cut the request line of the bill by orders of magnitude and relieve the request-rate quota. Do this before optimizing anything else.
CMK count is a rounding error; don’t over-fragment but don’t agonize. A few hundred keys is a few hundred dollars a month — trivial next to one throttled launch. Use enough keys for clean policy/blast-radius boundaries; don’t create a key per object.
Each multi-Region replica is billed as its own key (one replica doubles that key’s monthly charge), and requests are billed in whichever Region serves them. That is worth it for genuine DR and wasteful if you don’t need portability.
Asymmetric and HMAC operations cost more per request than symmetric; prefer symmetric envelope encryption unless you specifically need offline-encrypt, signing, or MACs.
Free tier: KMS includes a modest monthly allotment of free crypto requests; the per-key monthly charge is not free. CloudHSM (a different service) is far more expensive — use KMS unless you have a single-tenant HSM mandate.

A rough monthly picture for a mid-size regulated workload — and what each line actually buys:

Cost driver	What you pay for	Rough INR / month	What it buys	Watch-out
10 CMKs (single-Region)	Per-key monthly charge	~₹850	Clean policy/blast-radius boundaries	Don’t fragment to a key per object
1 MRK primary + 1 replica	Two keys, two Regions	~₹170	Cross-Region ciphertext portability (DR)	Replica is a second billed key
Crypto requests (with Bucket Keys + caching)	Per-10k requests, drastically reduced	~₹400–1,500	The actual encrypt/decrypt work	Without Bucket Keys this can be 50–100×
Crypto requests (naive, per-object)	Per-10k requests, unbatched	~₹40,000+	(same work, done wrong)	The classic surprise bill
Quota increase	Service Quotas request	Free	Headroom before a launch	Request early; approval isn’t instant
CloudHSM (if mandated)	Dedicated HSM hours	~₹1,20,000+	Single-tenant FIPS HSM	Only for hard HSM mandates

The sizing rule in one line: the only KMS number that ever surprises a CFO is request volume. Enable Bucket Keys and caching, raise the quota proactively, and the bill stays in the low thousands of rupees; skip them and a per-object hot path turns a rounding error into a five-figure line.
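The arithmetic behind that rule fits in a few lines. This is a rough sketch, not a pricing calculator: the per-key price comes from the text above, while the per-10,000-request price, the free allotment, the exchange rate and both request volumes are assumptions chosen for illustration and should be checked against current pricing for your Region.

```python
USD_PER_KEY_MONTH = 1.00     # from the text: roughly $1 per key per month
USD_PER_10K_REQUESTS = 0.03  # assumption: symmetric request price
FREE_REQUESTS = 20_000       # assumption: monthly free request allotment
INR_PER_USD = 85.0           # assumption: rate implied by the table (10 keys ~ 850)


def monthly_cost_usd(keys: int, requests: int) -> float:
    """Key charge plus metered requests beyond the free allotment."""
    billable = max(0, requests - FREE_REQUESTS)
    return keys * USD_PER_KEY_MONTH + (billable / 10_000) * USD_PER_10K_REQUESTS


def to_inr(usd: float) -> float:
    return usd * INR_PER_USD


# Hypothetical volumes: one KMS call per object vs Bucket Keys plus caching.
naive = monthly_cost_usd(keys=12, requests=150_000_000)
batched = monthly_cost_usd(keys=12, requests=3_000_000)
print(f"naive: INR {to_inr(naive):,.0f}   batched: INR {to_inr(batched):,.0f}")
```

With these inputs the key charge is identical in both cases and the whole difference comes from the request term, which is the point of the sizing rule.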

Interview & exam questions

1. Why does KMS “never encrypt your data,” and what does it encrypt instead? KMS is a wrapping and authorization service, not a bulk cipher: key material lives in FIPS 140-3 HSMs and never leaves. For anything over 4 KB you call GenerateDataKey, which returns a data key both in plaintext and wrapped under your CMK; you encrypt the payload locally with the plaintext key, discard it, and store only the ciphertext plus the wrapped blob. KMS protects the key that protects the data — envelope encryption.

2. A Decrypt is denied even though the caller’s IAM policy clearly allows kms:Decrypt. Why, and how do you confirm? The key policy almost certainly omits the IAM-delegation statement (kms:* to the account root), so IAM is ignored and the key policy is the sole authority — and it doesn’t list this principal. Confirm with aws kms get-key-policy; the absence of an EnableIAMRoot-style statement is the smoking gun. Fix by adding the delegation so IAM is honored.

3. A DynamoDB global table replicates client-side-encrypted data to a second Region, but the DR Region can’t decrypt it. What happened and how do you fix it? The data was encrypted under a single-Region CMK; replication moved the ciphertext but not the key, so a Decrypt call to KMS in the DR Region finds no key that matches the ciphertext. Fix with a multi-Region key (replicate it into the DR Region) and a ReEncrypt backfill of the pre-existing ciphertext; adopting the MRK alone does nothing for data already wrapped under the old key.

4. Explain the three-layer KMS authorization model and which layer is authoritative. The key policy (resource policy on the key) is the root of trust and is authoritative; IAM policies are effective only if the key policy delegates to IAM; grants are programmatic, temporary, fine-grained delegations for services and short-lived jobs. Unlike S3, IAM alone cannot grant KMS access — the key policy must enable it.

5. What is encryption context, and what is it good for? It’s additional authenticated data (AAD): not secret, not encrypted, but bound to the ciphertext, required byte-for-byte at decrypt, and logged in CloudTrail. It’s the cheapest authorization and audit tool in KMS — constrain it with kms:EncryptionContext: conditions to say “this role can decrypt only tenant=acme data,” and read it in CloudTrail to see exactly what was decrypted.

6. Does automatic key rotation re-encrypt your existing data? If not, what does? No — automatic rotation generates new key material and retains the old, so new writes use fresh material while old ciphertext is still unwrapped with retained material; your stored data and data keys are untouched. To actually re-wrap existing data (a compliance requirement in some regimes) you run a separate, application-driven ReEncrypt backfill.

7. How do you share an encrypted EBS snapshot across accounts, and why won’t the default key work? You cannot share a snapshot encrypted with the AWS-managed aws/ebs key — it’s not shareable cross-account. Use a customer-managed key, share the snapshot, grant the target account Decrypt + CreateGrant on the key, and the target re-encrypts to their key on copy. This is why production fleets standardize on CMKs for EBS.

8. A hot path is throwing ThrottlingException and the KMS bill spiked. Diagnose and fix. The path is making one KMS call per object (GenerateDataKey/Decrypt), exceeding the Region’s shared request-rate quota and metering every call. Confirm via CloudTrail event counts and Service Quotas usage. Fix by enabling S3 Bucket Keys, adding data-key caching in the Encryption SDK, verifying backoff+jitter, and raising the request-rate quota ahead of launches.

9. When do you reach for a grant instead of editing the key policy? For AWS services and short-lived workloads — a grant is issued by API, carries its own operation list and encryption-context constraints, is retirable the instant the work is done, and doesn’t bloat the resource policy. It’s exactly how services like Auto Scaling and EBS mint data keys on your behalf. Prefer grants for transient/service access; key-policy edits for the durable allow-list.

10. How do you confine a KMS key so only a specific service in your org can use it? Add two condition keys to the policy: kms:ViaService (e.g. s3.eu-west-1.amazonaws.com) so the key is usable only through that service and not by a human running aws kms decrypt, and aws:PrincipalOrgID so cross-account use is confined to your organization. Add aws:SourceVpce to pin it to your KMS interface endpoint for off-network defense.

11. What’s the difference between a single-Region and a multi-Region key, and when is each correct? A single-Region key is Region-locked — its ciphertext decrypts only in that Region, and it offers the strongest isolation. A multi-Region key shares identical material across a primary and replicas so the same envelope decrypts in any of them. Use single-Region by default; use multi-Region only where ciphertext must cross Regions (global tables, cross-Region DR), accepting weaker isolation as the trade.

12. You scheduled a key for deletion. What are the risks and the safety net? Deleting a CMK destroys the material after a 7–30 day waiting window, rendering all ciphertext under it permanently unreadable. The safety net is cancel-key-deletion within the window, and the discipline is long windows (close to 30 days) plus multi-party review, because ScheduleKeyDeletion is a data-destroying action.
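Question 8 asks you to verify backoff and jitter, so it helps to know what correct behaviour looks like. The AWS SDKs already retry throttled calls; the sketch below is only a reference model of capped exponential backoff with full jitter. ThrottlingError is a local stand-in for the service's ThrottlingException, and call_with_backoff, its defaults and the injectable sleep and rng parameters are hypothetical names introduced for illustration.

```python
import random
import time


class ThrottlingError(Exception):
    """Local stand-in for a KMS ThrottlingException (hypothetical type)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` on throttling, sleeping a random time up to a growing cap."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)  # exponential, capped
            sleep(rng() * cap)  # full jitter: uniform in [0, cap)


# Usage with a fake operation that is throttled twice, then succeeds.
calls = {"n": 0}


def flaky_decrypt():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottlingError()
    return b"plaintext"


print(call_with_backoff(flaky_decrypt, sleep=lambda seconds: None))
```

The jitter matters as much as the exponent: without it, a fleet that was throttled together retries together and hits the shared quota again in lockstep.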

These map across several certs: AWS Certified Security – Specialty (SCS-C02) covers KMS authorization, grants, encryption context, cross-account, and CloudTrail auditing in depth; Solutions Architect – Professional (SAP-C02) covers multi-Region keys and DR portability; Solutions Architect – Associate (SAA-C03) covers envelope encryption, SSE integrations, and key types. A compact cert-mapping:

Question theme	Primary cert	Objective area
Key policy vs IAM vs grants	Security – Specialty	Identity & access management; data protection
Encryption context, `kms:ViaService`	Security – Specialty	Data protection; logging & monitoring
Multi-Region keys, DR portability	SA Professional	Design for resilience / continuity
Envelope encryption, key types, SSE	SA Associate	Secure architectures; storage encryption
Quotas, Bucket Keys, cost	SA Associate / Specialty	Cost-optimized & performant architectures
Rotation, deletion windows, audit	Security – Specialty	Data protection; incident response

Quick check

A Decrypt is denied even though the caller’s IAM allows kms:Decrypt. What is the most likely cause, and the one command to confirm it?
A global table replicated your encrypted data to a DR Region, but the DR Region can’t decrypt it. What single property of the key explains this, and what two-part fix is required?
True or false: enabling automatic key rotation re-encrypts your existing stored data under the new material.
You’re seeing ThrottlingException on an S3-backed hot path and a surprising KMS bill. Name the single biggest lever to fix both.
You want a CMK that can only be used through S3 and only by principals in your organization. Which two condition keys do you add?

Answers

The key policy omits the IAM-delegation statement, so IAM is ignored and the key policy is authoritative — and it doesn’t list this principal. Confirm with aws kms get-key-policy --key-id <id> --policy-name default; the missing kms:*-to-:root (EnableIAMRoot) statement is the cause. Add it.
The key is single-Region (MultiRegion: false) — replication moved the ciphertext but not the key. The fix is two parts: adopt a multi-Region key (replicate it into the DR Region) and run a ReEncrypt backfill of the pre-existing ciphertext; the MRK alone does nothing for data already wrapped under the old key.
False. Rotation generates new material and retains the old, so new writes use fresh material while old ciphertext is unwrapped with retained material — your stored data is untouched. Re-wrapping existing data needs a separate, application-driven ReEncrypt job.
Enable S3 Bucket Keys. It collapses thousands of per-object KMS calls into roughly one per bucket, cutting the request line of the bill and relieving the request-rate quota that’s causing the throttling. (Add data-key caching and a quota increase as follow-ups.)
kms:ViaService (e.g. s3.eu-west-1.amazonaws.com) to pin the key to S3 only — blocking a human running aws kms decrypt — and aws:PrincipalOrgID to confine use to your organization.
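The property behind the second answer can be checked from the KeyMetadata that describe-key returns. The sketch below is a minimal illustration: key_present_in_region is a hypothetical helper, the sample metadata is made up, and the field names (Arn, MultiRegion, MultiRegionConfiguration with PrimaryKey and ReplicaKeys) are assumed to match the DescribeKey response shape. It only answers whether the key exists in a Region; the replica's key policy still has to allow the caller.

```python
def key_present_in_region(key_metadata: dict, region: str) -> bool:
    """True if the key, or a related multi-Region key, lives in `region`."""
    home_region = key_metadata["Arn"].split(":")[3]  # arn:aws:kms:<region>:...
    if home_region == region:
        return True
    if not key_metadata.get("MultiRegion", False):
        return False  # single-Region key: ciphertext is locked to home_region
    config = key_metadata.get("MultiRegionConfiguration", {})
    related = [config.get("PrimaryKey", {})] + list(config.get("ReplicaKeys", []))
    return any(k.get("Region") == region for k in related)


# Hypothetical metadata for a single-Region key in ap-south-1.
single = {
    "Arn": "arn:aws:kms:ap-south-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    "MultiRegion": False,
}
print(key_present_in_region(single, "ap-southeast-1"))  # False
```

Running a check like this for every key behind a replicated store, as part of a game-day, catches the gap before a failover does.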

Glossary

KMS key (CMK) — a logical reference to symmetric or asymmetric key material that lives inside FIPS 140-3 validated HSMs and is never exported; formally a “KMS key.”
Customer-managed key — a CMK whose policy, rotation, grants, tags, and deletion you control; the only type worth architecting around (vs AWS-owned/AWS-managed).
Data key — a symmetric key minted by GenerateDataKey, returned in plaintext (for local encryption) and wrapped under the CMK (for storage); it does the actual bulk encryption.
Envelope encryption — encrypting data with a data key and wrapping that data key under a CMK, so plaintext and key material both stay out of KMS.
Key policy — the resource policy attached to a CMK; the root of trust in KMS, authoritative over IAM (which is ignored unless the key policy delegates to it).
Grant — a programmatic, temporary, fine-grained delegation (operations + constraints) used by AWS services and short-lived jobs; revocable without editing the key policy.
Encryption context — additional authenticated data (AAD): not secret, bound to the ciphertext, required byte-for-byte at decrypt, logged in CloudTrail, and constrainable in policy.
Multi-Region key (MRK) — a primary plus replicas that share identical key material (mrk-...), so a single envelope decrypts in any of their Regions; the basis of cross-Region DR portability.
ReEncrypt — a KMS operation that decrypts and re-wraps ciphertext under a new key without exposing plaintext to your process; used to migrate the wrapping key and backfill rotation.
Alias — a mutable, Region-scoped friendly pointer (e.g. alias/payments-prod) to a CMK; convenient but not a stable identity for audit (CloudTrail logs the key ARN).
kms:ViaService — a condition key that pins a CMK to a specific AWS service, so it can be used only through that service (e.g. S3) and not by a human calling KMS directly.
aws:PrincipalOrgID — a condition key that confines cross-account key use to principals within your AWS Organization.
S3 Bucket Keys — a bucket-level setting that caches a data key per bucket for SSE-KMS, collapsing thousands of per-object KMS calls into roughly one and cutting cost and quota pressure.
Request-rate quota — the shared, per-Region cap on cryptographic operations; exceeding it returns ThrottlingException, the throttling ceiling you architect call volume around.
Key commitment — an Encryption SDK property (REQUIRE_ENCRYPT_REQUIRE_DECRYPT) that prevents one ciphertext from decrypting to different plaintexts under different keys.
KeySpec / KeyUsage — the immutable cryptographic spec (e.g. SYMMETRIC_DEFAULT) and purpose (e.g. ENCRYPT_DECRYPT) chosen at key creation — a one-way door.
Key deletion window — the 7–30 day waiting period after ScheduleKeyDeletion before material is destroyed; cancel-key-deletion is the safety net within it.

Next steps

You can now architect KMS as an authorization system with a latency budget and a quota — pick key types deliberately, scope authorization to the exact caller and context, make ciphertext portable where it travels, and crush per-object call volume. Build outward:

Foundation: AWS IAM Fundamentals: Users, Roles, Policies & the Evaluation Engine — the policy-evaluation model the three-layer KMS authorization is built on.
Related: IAM Least Privilege: Permission Boundaries & Inescapable Ceilings — the boundary and ABAC patterns that pair with KMS condition keys.
Related: S3 Deep Dive: Storage Classes, Versioning, Lifecycle & Encryption — where SSE-KMS and S3 Bucket Keys live.
Related: Secrets Manager & Parameter Store Deep Dive — both wrap their secrets under a CMK and use encryption context.
Related: CloudTrail, Config & Audit/Compliance — where every Decrypt lands and how to alarm on unexpected ones.
Related: Cross-Account IAM Roles: External ID, Confused Deputy & Session Policies — the trust patterns behind cross-account KMS sharing.
Related: The Well-Architected Security Pillar, Deep Dive — where data-protection and key management sit in the bigger picture.