Most teams treat KMS as a checkbox: tick “encryption at rest,” pick the AWS-managed key, move on. That works until you need cross-Region DR, cross-account data sharing, a key you can prove only one role can use, or 100k decrypts a second on a hot path. At that point KMS stops being a checkbox and becomes an authorization system with a latency budget and a request quota. This guide treats it that way — the key types, how envelope encryption actually moves bytes, multi-Region keys, the three-layer authorization model, and the operational edges (rotation, quotas, audit) that bite at scale.
The single most important fact, the one every later decision falls out of: your plaintext almost never goes to KMS, and KMS key material never leaves KMS. A KMS key (formerly called a customer master key, or CMK, the abbreviation this guide keeps using) is a logical reference to material that lives inside FIPS 140-3 validated HSMs. You cannot export it. What you can do is ask KMS to wrap and unwrap small blobs, and the standard pattern is to have KMS wrap a data key that you then use locally to encrypt the actual payload. That is envelope encryption.
By the end you will stop guessing about encryption design. When someone asks “can the DR Region read this?”, “who can actually call Decrypt on that key?”, or “why is the KMS bill suddenly four figures?”, you will know the mechanism, the exact CLI to confirm it, and the fix. Because this is a reference you will return to mid-incident, the key types, the conditions, the limits, the errors and the cost levers are all laid out as scannable tables — read the prose once, then keep the tables open when the pager goes off.
What problem this solves
Encryption-at-rest is easy to enable and hard to get right the moment your data or your blast radius crosses a boundary. The defaults — AWS-managed keys, a Region-locked CMK, no encryption context, a wide-open key policy delegated to IAM and never tightened — work in a single account, in a single Region, with one team, until exactly one of those assumptions breaks. Then you discover that an AWS-managed key cannot be shared cross-account, that a Region-scoped ciphertext is unreadable in your DR Region even though the bytes replicated fine, or that a hot path calling GenerateDataKey per object is throttling and running up a four-figure monthly bill.
What breaks without this knowledge: a team enables a DynamoDB global table, replicates client-side-encrypted PAN data to a second Region, runs a failover game-day, and cannot decrypt a single replicated record — the worst kind of DR failure, because it looks healthy until you cut over. Or an auditor asks “prove only the payments role can decrypt cardholder data,” and the answer is a key policy delegated to account root with no kms:ViaService or EncryptionContext constraint, so the honest answer is “anyone in the account with the right IAM can.” Or a launch melts under ThrottlingException because nobody enabled S3 Bucket Keys or raised the request-rate quota ahead of time.
Who hits this: any team with cross-Region DR of encrypted data, cross-account data sharing, a compliance regime that demands provable key isolation, or a high-throughput encryption path. It bites hardest on active-active applications with global tables (multi-Region key portability), regulated workloads (encryption context and least-privilege key policies), and anything that calls KMS per object (quota and cost). The fix is almost never “turn on a bigger key” — it is “make the key portable where ciphertext travels, scope authorization to the exact caller and context, and architect the call volume down.”
To frame the whole field before the deep dive, here is every problem class this article covers, the question it forces, and where to look first:
| Problem class | What is actually wrong | First question to ask | Where to confirm | Most common single cause |
|---|---|---|---|---|
| Replicated-but-unreadable | DR Region can’t decrypt replicated ciphertext | Did the key travel, or only the bytes? | describe-key → MultiRegion | Single-Region CMK behind a global table |
| Locked-out key (AccessDenied) | Decrypt denied despite IAM allowing it | Does the key policy delegate to IAM? | get-key-policy → EnableIAMRoot | Key policy with no IAM delegation |
| Throttling / surprise bill | ThrottlingException, four-figure KMS spend | Is it one KMS call per object? | CloudTrail event count; Service Quotas | Per-object GenerateDataKey, no Bucket Keys |
| Cross-account read fails closed | Foreign principal can’t decrypt shared data | Are both sides (key policy + IAM) aligned? | get-key-policy + consumer IAM | Only one side of the handshake granted |
| Context / audit gap | Unexpected Decrypt, weak provable scope | Is the key pinned to service + context? | CloudTrail Decrypt events | No kms:ViaService / EncryptionContext |
| Rotation that didn’t re-wrap | Compliance expects fresh material on old data | Did rotation re-encrypt, or just rotate? | get-key-rotation-status + design review | Assuming rotation re-wraps stored data |
Learning objectives
By the end of this article you can:
- Explain why KMS never encrypts bulk data and design envelope encryption with data keys so plaintext and key material both stay where they belong.
- Choose the right key type (symmetric, asymmetric, HMAC, multi-Region) and management model (AWS-owned, AWS-managed, customer-managed) deliberately, knowing which choices are immutable one-way doors.
- Architect multi-Region keys for DR and active-active, scope each replica’s policy independently, and run a ReEncrypt backfill of pre-existing ciphertext.
- Reason fluently about the three-layer authorization model (key policy vs IAM vs grants) and know why a KMS key policy that omits IAM delegation causes IAM policies to be ignored.
- Build cross-account encrypted sharing (S3, EBS snapshots) as a two-sided handshake and explain why missing either side fails closed.
- Use encryption context, kms:ViaService, aws:PrincipalOrgID and ABAC conditions to pin a key to an exact caller, service, and context, and prove it from CloudTrail.
- Manage rotation, request quotas, and cost: enable automatic rotation, use S3 Bucket Keys and data-key caching to crush per-object calls, and raise the request-rate quota before a launch.
Prerequisites & where this fits
You should be comfortable with IAM policy evaluation (identity vs resource policies, explicit deny wins, condition keys), basic AWS CLI (--query, JSON output, fileb://), and the idea of at-rest vs in-transit encryption. You should know what an ARN is and how cross-account access works in principle. Familiarity with AES-GCM and the words “symmetric” and “authenticated encryption” helps but isn’t required — the article defines what it needs.
This sits in the Security & Cryptography track. It assumes the identity fundamentals from AWS IAM Fundamentals: Users, Roles, Policies & the Evaluation Engine and the least-privilege patterns in IAM Least Privilege: Permission Boundaries & Inescapable Ceilings, because a key policy is just a resource policy and the three-layer model is IAM evaluation with the key policy as the root of trust. It is upstream of every storage deep-dive: S3 storage classes, versioning, lifecycle & encryption, EBS, EFS & FSx, and RDS & Aurora all consume KMS for SSE. It pairs with Secrets Manager & Parameter Store (both wrap their secrets under a CMK), CloudTrail, Config & audit (where every Decrypt lands), and at org scale with Organizations: SCPs, guardrails & delegated admin and Resource Control Policies & the data perimeter.
A quick map of who owns which layer during an encryption design or incident, so you call the right person fast:
| Layer | What lives here | Who usually owns it | Failure classes it can cause |
|---|---|---|---|
| Application crypto | Data keys, AAD, the Encryption SDK | App / dev team | Wrong encryption context, unbounded cache, plaintext leak |
| Key policy | Resource policy on the CMK (root of trust) | Security / platform | Lockout, over-broad access, missing cross-account grant |
| IAM | Identity policies referencing KMS actions | Platform + app | Decrypt denied (key policy didn’t delegate), wrong key ARN |
| Grants | Programmatic, temporary delegations | Platform + AWS services | Orphaned grants, service can’t mint data keys |
| Multi-Region / DR | Primary + replicas, ReEncrypt backfill | Platform / DR owner | Replicated-but-unreadable, divergent replica policy |
| Quota & cost | Request-rate quota, Bucket Keys, caching | Platform / FinOps | Throttling, surprise bill, throttled launch |
Core concepts
Six mental models make every later decision obvious.
KMS is a wrapping and authorization service, not a bulk cipher. Two API verbs anchor the whole service. Encrypt/Decrypt send up to 4 KB of plaintext/ciphertext and KMS does the crypto — fine for small secrets, wrong for large objects. GenerateDataKey mints a fresh symmetric data key and returns it to you both in plaintext and wrapped under your KMS key; you encrypt your gigabytes locally with the plaintext copy, throw it away, and store only the ciphertext payload plus the wrapped blob. Every design decision — quotas, caching, multi-Region portability — falls out of “KMS protects the key that protects the data.”
The key policy is the root of trust, not IAM. Unlike S3, where an IAM policy alone can grant access, a KMS key policy that does not delegate to IAM means IAM policies are ignored. The key policy is authoritative. This single fact is responsible for most KMS lockouts and most “why is my Decrypt denied when IAM clearly allows it” tickets.
KeySpec and KeyUsage are immutable. You choose symmetric vs asymmetric vs HMAC, and encrypt-vs-sign, at creation, and you can never change them. Picking wrong means creating a new key and re-encrypting. Treat key creation as a one-way door.
A normal key is Region-locked; ciphertext is portable only with a multi-Region key. Ciphertext produced in eu-west-1 can only be decrypted by calling KMS in eu-west-1. If that Region is down, the data is unreadable even though the bytes replicated elsewhere. Multi-Region keys share the same key material across Regions so an envelope decrypts in any of them — a deliberate availability/isolation trade-off you opt into, not a default.
Encryption context is authenticated, logged, additional data, and your cheapest authorization and audit tool. It is not secret and not encrypted, but it is bound to the ciphertext and required, byte-for-byte, at decrypt time; it appears in CloudTrail and can be constrained in policy with kms:EncryptionContext:<key> conditions. It is the difference between “this role can decrypt anything” and “this role can decrypt only tenant=acme invoices.”
Throughput is bounded by a shared, Region-level request-rate quota. Symmetric Decrypt/GenerateDataKey/Encrypt share a per-Region quota (tens of thousands of requests/second depending on Region). A hot path that calls KMS per object hits ThrottlingException long before you expect. The architecture answer is to call KMS less (Bucket Keys, data-key caching), not just to retry harder.
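A back-of-envelope sketch makes the quota model concrete. The quota figure and the workload numbers below are hypothetical inputs, not published values for any particular Region; check Service Quotas for the real ceiling in yours.

```python
def kms_calls_per_second(objects_per_second: float, objects_per_data_key: int = 1) -> float:
    """KMS requests/second when one data key protects `objects_per_data_key` objects."""
    if objects_per_data_key < 1:
        raise ValueError("objects_per_data_key must be >= 1")
    return objects_per_second / objects_per_data_key


def quota_headroom(calls_per_second: float, regional_quota: float) -> float:
    """Fraction of the shared Region-level quota left over (negative means throttling)."""
    return 1.0 - calls_per_second / regional_quota


# Hypothetical workload: 100,000 objects/second against an ASSUMED 50,000 req/s quota.
ASSUMED_QUOTA = 50_000
per_object = kms_calls_per_second(100_000)                          # one KMS call per object
reused = kms_calls_per_second(100_000, objects_per_data_key=1_000)  # each data key reused 1,000 times

print(f"per-object: {per_object:.0f} req/s, headroom {quota_headroom(per_object, ASSUMED_QUOTA):+.0%}")
print(f"reused:     {reused:.0f} req/s, headroom {quota_headroom(reused, ASSUMED_QUOTA):+.0%}")
```

With these assumed numbers, the per-object design needs twice the quota, while reusing each data key brings the same workload down to a small fraction of it. Retries cannot close a gap of that size; only reducing call volume can.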
The vocabulary in one table
Before the deep sections, pin every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:
| Concept | One-line definition | Where it lives | Why it matters |
|---|---|---|---|
| KMS key (CMK) | Logical reference to HSM-resident key material | KMS, in-Region | The thing you authorize; never exported |
| Data key | Symmetric key minted by GenerateDataKey | In your app (plaintext) + storage (wrapped) | Does the actual bulk encryption |
| Envelope encryption | Encrypt data with a data key, wrap the data key with the CMK | App + storage | The pattern behind almost everything |
| Key policy | Resource policy on the CMK | On the key | Root of trust; authoritative over IAM |
| Grant | Programmatic, temporary delegation | On the key | How AWS services and short-lived jobs get access |
| Encryption context | Authenticated additional data (AAD) | Bound to ciphertext, logged | Cheap authorization + audit binding |
| Multi-Region key (MRK) | Primary + replicas sharing key material | Multiple Regions (mrk-...) | Cross-Region ciphertext portability for DR |
| Alias | Mutable, Region-scoped friendly pointer | In-Region | Human-friendly key reference; repointable |
| kms:ViaService | Condition pinning a key to one AWS service | Key/IAM policy condition | “Only through S3,” not a human running decrypt |
| Request-rate quota | Shared per-Region cryptographic-ops cap | Region | The throttling ceiling you architect around |
| S3 Bucket Keys | Bucket-level data key caching for SSE-KMS | S3 bucket setting | Collapses per-object KMS calls dramatically |
| ReEncrypt | Swap the wrapping key without exposing plaintext | KMS API | Re-wraps stored ciphertext during migrations |
1. Key types: pick the right primitive
KMS keys are not interchangeable. The KeySpec and KeyUsage are immutable at creation, so this is a one-way door. Choose the primitive for the job, then never look back.
| Type | KeySpec | KeyUsage | Use it for | API verbs | Notes / limit |
|---|---|---|---|---|---|
| Symmetric | SYMMETRIC_DEFAULT (AES-256-GCM) | ENCRYPT_DECRYPT | Envelope encryption; default for S3/EBS/RDS/Secrets Manager | Encrypt, Decrypt, GenerateDataKey* | Never leaves KMS; 4 KB direct limit; the workhorse |
| Asymmetric (encrypt) | RSA_2048/3072/4096 | ENCRYPT_DECRYPT | Encrypt where the encryptor has no AWS creds | Encrypt, Decrypt (+ public key) | Public key downloadable; no GenerateDataKey; small payloads |
| Asymmetric (sign) | ECC_NIST_P256/384/521, ECC_SECG_P256K1, RSA_* | SIGN_VERIFY | Code/document signing, external verification | Sign, Verify (+ public key) | Verifier may be outside AWS; pick curve per standard |
| Key agreement | ECC_NIST_*, SM2 (China) | KEY_AGREEMENT | Derive a shared secret (ECDH) | DeriveSharedSecret | Niche; for negotiated session keys |
| HMAC | HMAC_224/256/384/512 | GENERATE_VERIFY_MAC | MACs, signed tokens, deterministic integrity | GenerateMac, VerifyMac | Symmetric secret; never exported; no encrypt |
| Multi-Region | any above + MultiRegion: true | per spec | DR, global tables, cross-Region ciphertext portability | per spec + ReplicateKey | Shares material across Regions; mrk- id prefix |
Orthogonal to spec is who manages the key. This choice is not immutable, but migrating between models means re-encryption, so decide deliberately:
| Management model | Who controls policy | Rotation | Cross-account? | Visible in your account? | Cost | When it’s acceptable |
|---|---|---|---|---|---|---|
| AWS-owned | AWS (invisible) | AWS-managed | No | No | Free | Zero audit/access-control requirement |
| AWS-managed (aws/s3, aws/ebs…) | AWS (you can’t edit) | Auto, yearly | No | Yes | Free key; per-request charges | Single-account, single-Region, no policy edits |
| Customer-managed (CMK) | You (full policy) | Optional, configurable | Yes | Yes | ~$1/key/month + requests | Anything that needs policy, sharing, or proof |
The takeaway: the moment you need an editable policy, cross-account sharing, custom rotation, grants, or independent deletion, you are forced to a customer-managed key. Everything below assumes CMKs. The decision rule in one table:
| If you need… | AWS-owned | AWS-managed | Customer-managed |
|---|---|---|---|
| Edit the key policy | No | No | Yes |
| Share ciphertext cross-account | No | No | Yes |
| Cross-Region DR portability (MRK) | No | No | Yes |
| Custom rotation cadence | No | No (yearly only) | Yes |
| Grants for services/short-lived jobs | No | Limited | Yes |
| See it / audit its policy | No | Yes | Yes |
| Pay nothing for the key | Yes | Yes | No (~$1/mo) |
# A customer-managed symmetric key
aws kms create-key \
  --description "app-prod data-at-rest" \
  --key-spec SYMMETRIC_DEFAULT \
  --key-usage ENCRYPT_DECRYPT \
  --tags TagKey=env,TagValue=prod TagKey=app,TagValue=payments
# create-key does not enable rotation by itself; turn it on explicitly from day one
aws kms enable-key-rotation \
  --key-id <key-id> \
  --rotation-period-in-days 180
# Give it a human-friendly alias (aliases are Region-scoped, mutable pointers)
aws kms create-alias \
  --alias-name alias/payments-prod \
  --target-key-id <key-id>
# Terraform equivalent: key + alias + rotation in one place, reviewed in a PR
resource "aws_kms_key" "payments" {
  description              = "app-prod data-at-rest"
  key_usage                = "ENCRYPT_DECRYPT"
  customer_master_key_spec = "SYMMETRIC_DEFAULT"
  enable_key_rotation      = true
  rotation_period_in_days  = 180
  deletion_window_in_days  = 30
  tags                     = { env = "prod", app = "payments" }
}
resource "aws_kms_alias" "payments" {
  name          = "alias/payments-prod"
  target_key_id = aws_kms_key.payments.key_id
}
Aliases deserve their own note, because they are the moving part teams misuse. An alias is a Region-scoped, mutable pointer to a key — alias/payments-prod resolves to whatever key it currently targets. That makes manual rotation (repoint the alias to a new key) trivial, but it also means an alias is not a stable identity for audit. The alias quirks worth knowing:
| Alias behaviour | Detail | Gotcha |
|---|---|---|
| Scope | One Region only | The same name in another Region is a different pointer |
| Mutability | update-alias repoints it instantly | A typo’d repoint silently sends encrypt to the wrong key |
| alias/aws/ prefix | Reserved for AWS-managed keys | You cannot create an alias that begins with alias/aws/ |
| In CloudTrail | Calls log the key ARN, not the alias | Audit on key ID, never on alias name |
| Multi-Region | Use the same alias in each Region pointing at the local MRK | Convention, not enforced — keep it disciplined |
2. Envelope encryption: data keys and the SDK
For anything larger than 4 KB, you encrypt locally with a data key. The raw flow, before any SDK:
# 1. Mint a data key: plaintext + ciphertext (wrapped) come back together
aws kms generate-data-key \
  --key-id alias/payments-prod \
  --key-spec AES_256 \
  --query '{plaintext:Plaintext, wrapped:CiphertextBlob}' \
  --output json
# 2. Encrypt the payload locally with `plaintext` (AES-256-GCM in your app)
# 3. Persist the ciphertext payload + `wrapped` blob; ZERO the plaintext key in memory
# 4. To read: Decrypt(wrapped) -> plaintext key -> decrypt payload locally
The data-key API has variants, and picking the wrong one is a common slip — GenerateDataKeyWithoutPlaintext exists precisely for the case where the minting service shouldn’t see the key (it will be decrypted later, elsewhere):
| API | Returns | Use it when | Counts against quota |
|---|---|---|---|
| GenerateDataKey | Plaintext + wrapped key | You encrypt now, in this process | Yes (1 op) |
| GenerateDataKeyWithoutPlaintext | Wrapped key only | A different component will decrypt later | Yes (1 op) |
| GenerateDataKeyPair | Plaintext + wrapped private key + public key | Asymmetric envelope (sign/encrypt offline) | Yes (heavier op) |
| GenerateDataKeyPairWithoutPlaintext | Wrapped private + public key | Mint for later asymmetric use | Yes |
| Encrypt (direct) | Ciphertext (≤4 KB plaintext) | Tiny secret, no envelope needed | Yes |
| Decrypt | Plaintext (≤4 KB) | Unwrap a data key or tiny secret | Yes |
| ReEncrypt | Ciphertext under a new key | Migrate wrapping key without plaintext | Yes (decrypt + encrypt) |
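To see what is stored and what is thrown away in the four-step flow, here is a minimal in-memory sketch. Everything in it is an illustration under loud assumptions: FakeKms is a hypothetical stand-in, not the KMS API, and the SHA-256 keystream is a toy cipher used only so the example runs on the standard library. Real code should use AES-256-GCM through a vetted library.

```python
import hashlib
import os


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, NOT for production."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class FakeKms:
    """Hypothetical stand-in for KMS: the master key never leaves this object."""

    def __init__(self) -> None:
        self._master = os.urandom(32)

    def generate_data_key(self) -> tuple[bytes, bytes]:
        """Return (plaintext data key, wrapped data key), like GenerateDataKey."""
        plaintext_key = os.urandom(32)
        nonce = os.urandom(16)
        return plaintext_key, nonce + _keystream_xor(self._master, nonce, plaintext_key)

    def decrypt(self, wrapped: bytes) -> bytes:
        """Unwrap a data key, like Decrypt on the wrapped blob."""
        return _keystream_xor(self._master, wrapped[:16], wrapped[16:])


def envelope_encrypt(kms: FakeKms, payload: bytes) -> dict:
    data_key, wrapped_key = kms.generate_data_key()         # step 1: one KMS call
    nonce = os.urandom(16)
    ciphertext = _keystream_xor(data_key, nonce, payload)   # step 2: bulk encryption is local
    del data_key   # step 3: drop our reference; real code should also zero the buffer
    return {"wrapped_key": wrapped_key, "nonce": nonce, "ciphertext": ciphertext}


def envelope_decrypt(kms: FakeKms, record: dict) -> bytes:
    data_key = kms.decrypt(record["wrapped_key"])           # step 4: one KMS call
    return _keystream_xor(data_key, record["nonce"], record["ciphertext"])


kms = FakeKms()
payload = b"cardholder-record " * 10_000                    # far larger than 4 KB
record = envelope_encrypt(kms, payload)
restored = envelope_decrypt(kms, record)
```

The stored record holds only the wrapped key, a nonce, and the ciphertext. The payload never passes through FakeKms, and the size of the payload has no effect on the number of KMS calls.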
Rolling your own framing (IV, AAD, key blob, algorithm tags) is where teams introduce vulnerabilities. Use the AWS Encryption SDK — it produces a portable, self-describing message format that bundles the wrapped data key with the ciphertext, handles authenticated encryption, and supports multiple wrapping keys:
import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy

client = aws_encryption_sdk.EncryptionSDKClient(
    commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
)
key_provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
    key_ids=["arn:aws:kms:eu-west-1:111122223333:key/<key-id>"]
)
ciphertext, header = client.encrypt(
    source=plaintext_bytes,
    key_provider=key_provider,
    # Encryption context is AAD: authenticated, logged in CloudTrail, NOT secret
    encryption_context={"tenant": "acme", "purpose": "invoice"},
)
Two details deserve attention. Key commitment (REQUIRE_ENCRYPT_REQUIRE_DECRYPT, the 2.x+ default) prevents a class of attacks where one ciphertext decrypts to different plaintexts under different keys; do not lower it to interop with ancient clients unless you understand exactly what you give up. And encryption context is your cheapest authorization and audit tool: additional authenticated data that is not encrypted but is bound to the ciphertext, required byte-for-byte at decrypt, logged in CloudTrail, and constrainable in policy. The Encryption SDK options worth knowing:
| SDK concept | What it controls | Default (v2+) | When to change |
|---|---|---|---|
| Commitment policy | Whether key commitment is required | REQUIRE_ENCRYPT_REQUIRE_DECRYPT | Only to read legacy v1 ciphertext (temporarily) |
| Algorithm suite | Cipher + signing + commitment | AES-256-GCM + HKDF + ECDSA + commit | Drop signing only if you understand the trade |
| Key provider | Which CMK(s) wrap the data key | StrictAwsKmsMasterKeyProvider | Multi-CMK (multi-Region/multi-account) decrypt |
| Encryption context | The AAD map bound to ciphertext | empty | Always set it — it’s free authorization + audit |
| Caching CMM | Reuse data keys across messages | off | High-throughput app encryption (see below) |
| Discovery provider | Decrypt with any CMK in an account/Region | off (strict) | Multi-Region decrypt where key ARN varies |
What encryption context is and isn’t trips up almost everyone the first time. The boundaries:
| Encryption context… | Is | Is NOT |
|---|---|---|
| Secrecy | Authenticated (integrity-bound) | Encrypted / secret — it’s plaintext in logs |
| Requirement at decrypt | Required byte-for-byte | Optional metadata |
| Order sensitivity | Order-independent (it’s a map) | A positional list |
| Policy use | Constrainable via kms:EncryptionContext:<k> | A substitute for the key policy |
| Good values | tenant, purpose, table, pk | Anything secret (passwords, PII, the data itself) |
| Audit | Logged in CloudTrail on every call | Hidden — assume it’s visible |
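The "authenticated but not encrypted" property can be sketched with a MAC. This is an illustration of the idea, not how KMS implements it: KMS binds the context as AES-GCM additional authenticated data, whereas this standard-library sketch substitutes HMAC-SHA256, and the key and ciphertext values are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_context(context: dict[str, str]) -> bytes:
    """Serialize the context map deterministically so key order cannot matter."""
    return json.dumps(context, sort_keys=True, separators=(",", ":")).encode("utf-8")


def bind(key: bytes, ciphertext: bytes, context: dict[str, str]) -> bytes:
    """Return a tag over context + ciphertext; the context itself stays in the clear."""
    ctx = canonical_context(context)
    message = len(ctx).to_bytes(4, "big") + ctx + ciphertext   # length prefix avoids ambiguity
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, ciphertext: bytes, context: dict[str, str], tag: bytes) -> bool:
    """True only if the caller supplies exactly the context used at encrypt time."""
    return hmac.compare_digest(bind(key, ciphertext, context), tag)


key = b"\x01" * 32                        # hypothetical data key
ciphertext = b"opaque-ciphertext-bytes"   # hypothetical payload
tag = bind(key, ciphertext, {"tenant": "acme", "purpose": "invoice"})

same_pairs_reordered = verify(key, ciphertext, {"purpose": "invoice", "tenant": "acme"}, tag)
wrong_tenant = verify(key, ciphertext, {"tenant": "globex", "purpose": "invoice"}, tag)
case_changed = verify(key, ciphertext, {"tenant": "Acme", "purpose": "invoice"}, tag)
```

Reordering the pairs still verifies, while a different tenant or a change of letter case does not. That is the behaviour the table describes: an order-independent map that must otherwise match exactly.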
Caching: trading blast radius for throughput
Calling GenerateDataKey per object is correct but expensive — every write becomes a KMS request against your quota. The data key caching layer in the Encryption SDK reuses a data key across many messages, bounded by max_age, max_messages_encrypted, and max_bytes_encrypted:
from aws_encryption_sdk.caches.local import LocalCryptoMaterialsCache
from aws_encryption_sdk.materials_managers.caching import CachingCryptoMaterialsManager

cache = LocalCryptoMaterialsCache(capacity=1000)
cmm = CachingCryptoMaterialsManager(
    master_key_provider=key_provider,
    cache=cache,
    max_age=300.0,                # rotate the cached data key every 5 minutes
    max_messages_encrypted=1000,  # ...or after 1000 messages, whichever first
)
# Pass the CMM in place of the key provider: client.encrypt(source=..., materials_manager=cmm)
The trade-off is explicit: a larger cache and longer max_age mean fewer KMS calls (cheaper, faster, quota-friendly) but a wider blast radius per data key and weaker cryptographic isolation between messages. Tune it; never leave it unbounded. The knobs and how to reason about each:
| Cache parameter | What it bounds | Lower value | Higher value | Sensible starting point |
|---|---|---|---|---|
| max_age | Wall-clock lifetime of a cached data key | More KMS calls, tighter isolation | Fewer calls, wider blast radius | 60–300 s |
| max_messages_encrypted | Messages per cached data key | Tighter isolation | Fewer calls | 100–1000 |
| max_bytes_encrypted | Bytes per cached data key | Tighter isolation | Fewer calls | Stay well under cipher limits |
| capacity | Distinct cache entries (by context) | More cache misses | More memory | 100–1000 |
| Per-context keying | Separate keys per encryption context | Stronger tenant isolation | More entries | Always key on tenant/purpose |
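To reason about how the three bounds translate into KMS calls, the following is a simplified model. It is not the Encryption SDK's internal logic: CachedKeyUsage and count_kms_calls are hypothetical names, the workload is invented, and the model assumes a single encryption context so every message shares one cache entry.

```python
from dataclasses import dataclass


@dataclass
class CachedKeyUsage:
    """Tracks one cached data key against the three bounds named above."""
    max_age: float
    max_messages_encrypted: int
    max_bytes_encrypted: int
    created_at: float = 0.0
    messages: int = 0
    bytes_encrypted: int = 0

    def can_encrypt(self, now: float, size: int) -> bool:
        """False once any bound would be exceeded; the caller then mints a new key."""
        return (
            now - self.created_at <= self.max_age
            and self.messages + 1 <= self.max_messages_encrypted
            and self.bytes_encrypted + size <= self.max_bytes_encrypted
        )

    def record(self, size: int) -> None:
        self.messages += 1
        self.bytes_encrypted += size


def count_kms_calls(arrivals: list[tuple[float, int]], **bounds) -> int:
    """Count data-key requests for (timestamp, size) messages under the given bounds."""
    calls, usage = 0, None
    for now, size in arrivals:
        if usage is None or not usage.can_encrypt(now, size):
            usage = CachedKeyUsage(created_at=now, **bounds)   # cache miss: one KMS call
            calls += 1
        usage.record(size)
    return calls


# Hypothetical workload: 5,000 messages of 1 KiB, arriving at 100 per second.
arrivals = [(i / 100, 1024) for i in range(5000)]
tight = count_kms_calls(arrivals, max_age=1.0, max_messages_encrypted=10, max_bytes_encrypted=2**30)
loose = count_kms_calls(arrivals, max_age=300.0, max_messages_encrypted=1000, max_bytes_encrypted=2**30)
```

Under these assumptions the tight setting costs 500 KMS calls and the loose one costs 5, with each loose data key protecting a thousand messages instead of ten. That ratio is the blast-radius trade the table describes.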
3. Multi-Region keys: portability for DR
A normal KMS key is Region-locked: ciphertext produced in eu-west-1 can only be decrypted by calling KMS in eu-west-1. If that Region is down, your data is unreadable even though the bytes are safely replicated elsewhere. Multi-Region keys (MRKs) fix this. A primary and its replicas share the same key material and the same key ID (mrk-...), differing only in the Region part of the ARN, so ciphertext encrypted under the primary decrypts under any replica with no re-encryption.
# Create the primary as multi-region
aws kms create-key --multi-region \
  --description "global-table encryption primary" \
  --region eu-west-1
# Replicate it into a DR Region (same material, independent policy)
aws kms replicate-key \
  --key-id mrk-1234567890abcdef0 \
  --replica-region us-east-1 \
  --description "global-table encryption replica"
# Terraform: a primary MRK and a replica with its OWN, narrower policy
resource "aws_kms_key" "mrk_primary" {
  provider            = aws.euwest1
  description         = "global-table encryption primary"
  multi_region        = true
  enable_key_rotation = true
}
resource "aws_kms_replica_key" "mrk_replica" {
  provider        = aws.useast1
  description     = "global-table encryption replica (decrypt-only app)"
  primary_key_arn = aws_kms_key.mrk_primary.arn
  policy          = data.aws_iam_policy_document.replica_decrypt_only.json
}
The nuances that catch people are exactly the things that make MRKs powerful and dangerous:
| MRK property | What it means | Implication |
|---|---|---|
| Shared key material | Primary and replicas hold identical material | Ciphertext is portable; isolation is weaker by design |
| Independent policies | Each replica has its own key policy, grants, tags | DR Region can be decrypt-only while primary writes |
| Rotation state | Replicas mirror the primary’s rotation; new material flows from the primary | Old ciphertext stays decryptable everywhere |
| mrk- key ID prefix | Same key ID in every Region | Same logical key; different ARNs per Region |
| Deletion protection | Can’t delete the primary while replicas exist | KMS prevents orphaning material |
| Not auto-created | You explicitly replicate-key per Region | No “automatic” global key; opt in per Region |
| Rotation propagation | Auto-rotation on the primary propagates to replicas | Don’t separately rotate replicas |
The single most important caveat, learned the hard way in the scenario below: adopting an MRK does nothing for data already wrapped under a single-Region key. Existing envelopes are bound to the old key in the old Region. Making them portable requires a ReEncrypt backfill. The single-Region vs multi-Region decision, distilled:
| If… | Choose | Why |
|---|---|---|
| Ciphertext never leaves its Region | Single-Region CMK | Stronger isolation; no shared material |
| Active-active app, both Regions read/write | Multi-Region key | Either Region decrypts any envelope |
| DynamoDB / S3 cross-Region replication of encrypted data | Multi-Region key | Replication moves bytes, not the key |
| Cross-Region copy of encrypted EBS snapshots for DR | Multi-Region key (or re-encrypt on copy) | Avoid an unreadable DR copy |
| Strict per-Region key isolation is a compliance requirement | Single-Region CMK | Material must not be duplicated |
| You “might need it someday” | Single-Region CMK | Don’t weaken isolation speculatively |
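A small helper can answer "did the key travel?" from the ARN alone. This is a sketch that relies only on the mrk- prefix and on related multi-Region keys sharing one key ID; the function names are hypothetical, and it cannot tell you whether a replica was actually created in the target Region (describe-key remains the authority for that).

```python
def is_multi_region_key(key_arn: str) -> bool:
    """True when the key ID portion of the ARN carries the mrk- prefix."""
    resource = key_arn.split(":", 5)[5]          # e.g. "key/mrk-1234..."
    return resource.startswith("key/mrk-")


def replica_arn(key_arn: str, region: str) -> str:
    """Same account and key ID, different Region: the ARN a related replica would have."""
    if not is_multi_region_key(key_arn):
        raise ValueError("single-Region key: ciphertext is not portable to another Region")
    parts = key_arn.split(":", 5)                # arn, partition, service, region, account, resource
    parts[3] = region
    return ":".join(parts)


primary = "arn:aws:kms:eu-west-1:111122223333:key/mrk-1234567890abcdef0"
# Hypothetical single-Region key ID, for contrast.
regional = "arn:aws:kms:eu-west-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"

dr_arn = replica_arn(primary, "us-east-1")
```

A check like this fits naturally in a DR readiness test: any table or bucket that replicates cross-Region but is wrapped under a key failing is_multi_region_key is a candidate for the replicated-but-unreadable failure.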
4. Key policies vs IAM vs grants: the authorization model
This is where KMS differs sharply from most AWS services and where the dangerous mistakes live. Three layers decide whether a Decrypt succeeds, and the order of trust is not what IAM-first intuition expects:
- The key policy: the resource policy on the key. It is the root of trust. Unlike S3, where IAM alone can grant access, a KMS key policy that does not delegate to IAM means IAM policies are ignored. The key policy is authoritative.
- IAM policies: only effective if the key policy enables IAM (the canonical kms:* to the account root statement). With that statement present, IAM policies behave normally.
- Grants: programmatic, temporary, fine-grained delegations, ideal for AWS services and short-lived workloads.
The three layers side by side, because choosing the wrong instrument is the most common architectural error:
| Mechanism | Granularity | Lifetime | Who edits it | Best for | Limit / gotcha |
|---|---|---|---|---|---|
| Key policy | Per principal + condition | Until you edit it | Key admins | The authoritative allow-list; IAM delegation | Omit IAM delegation → IAM ignored → lockout risk |
| IAM policy | Per principal, per action | Until you edit it | IAM admins | Account-internal access at scale | Only works if key policy delegates to IAM |
| Grant | Per grantee + operations + constraints | Until retired/revoked | Anyone with CreateGrant | AWS services, short-lived jobs | ~50k grants/key; can be orphaned; eventually consistent |
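The order of trust can be written down as a few lines of logic. This is a deliberately simplified, same-account model with hypothetical names (KeyAuthState, is_allowed); it ignores conditions, SCPs, permission boundaries and cross-account rules, and is meant only to show why an IAM allow is not enough when the key policy does not delegate.

```python
from dataclasses import dataclass, field


@dataclass
class KeyAuthState:
    """Hypothetical, heavily simplified view of one key's authorization inputs."""
    key_policy_allows: set[tuple[str, str]] = field(default_factory=set)   # (principal, action)
    key_policy_delegates_to_iam: bool = False
    iam_allows: set[tuple[str, str]] = field(default_factory=set)
    grants: set[tuple[str, str]] = field(default_factory=set)
    explicit_denies: set[tuple[str, str]] = field(default_factory=set)


def is_allowed(state: KeyAuthState, principal: str, action: str) -> bool:
    request = (principal, action)
    if request in state.explicit_denies:
        return False                      # explicit deny always wins
    if request in state.key_policy_allows:
        return True                       # layer 1: the key policy names the principal
    if state.key_policy_delegates_to_iam and request in state.iam_allows:
        return True                       # layer 2: IAM counts only when delegated
    return request in state.grants        # layer 3: a grant on the key


app_decrypt = ("role/payments-app", "kms:Decrypt")
locked_out = KeyAuthState(iam_allows={app_decrypt})                       # IAM allows, no delegation
delegated = KeyAuthState(key_policy_delegates_to_iam=True, iam_allows={app_decrypt})
granted = KeyAuthState(grants={("role/batch-worker", "kms:Decrypt")})     # grant, no policy edit
```

The locked_out case is the ticket described earlier: IAM clearly allows Decrypt, yet the request is denied because the key policy never delegated to IAM.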
The minimum sane key policy delegates administration and usage to IAM rather than hard-coding every principal:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableIAMRoot",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "KeyUsers",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*", "kms:DescribeKey"],
      "Resource": "*",
      "Condition": {
        "StringEquals": { "kms:EncryptionContext:tenant": "acme" }
      }
    }
  ]
}
The EnableIAMRoot statement is not a backdoor to root credentials; it delegates the decision to IAM in this account. Omit it and you must enumerate every principal in the key policy forever, including the admins who could otherwise fix a lockout. That is the classic way teams brick a key.
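Whether a key policy delegates to IAM can be checked mechanically from the output of get-key-policy. The sketch below looks only for the canonical statement (an unconditioned Allow of kms:* to the account root ARN); it is an assumption-laden lint, not a policy evaluator, and it will miss equivalent forms such as a bare account ID principal or narrower per-action delegation.

```python
import json


def delegates_to_iam(policy_json: str, account_id: str) -> bool:
    """True if an unconditioned Allow statement gives kms:* to the account root principal."""
    root_arn = f"arn:aws:iam::{account_id}:root"
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") != "Allow" or "Condition" in stmt:
            continue
        principal = stmt.get("Principal")
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        aws = [aws] if isinstance(aws, str) else aws
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        if root_arn in aws and "kms:*" in actions:
            return True
    return False


with_root = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "EnableIAMRoot",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "kms:*",
        "Resource": "*",
    }],
})
without_root = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "KeyUsers",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/payments-app"},
        "Action": ["kms:Decrypt"],
        "Resource": "*",
    }],
})

has_delegation = delegates_to_iam(with_root, "111122223333")
missing_delegation = delegates_to_iam(without_root, "111122223333")
```

Running a check like this across every key in an account is a cheap way to find keys where IAM policies are silently ignored before they turn into a lockout.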
The KMS actions you will actually write into policies, grouped by what they let a principal do — least privilege means granting only the row you need:
| Action | What it permits | Typical grantee | Risk if over-granted |
|---|---|---|---|
| kms:Encrypt | Encrypt ≤4 KB directly | Apps writing tiny secrets | Low (encrypt-only is benign) |
| kms:Decrypt | Unwrap data keys / decrypt | Read paths | High: this is “read the data” |
| kms:GenerateDataKey* | Mint data keys for envelope encryption | Write paths | Medium (enables new encryption) |
| kms:ReEncrypt* | Re-wrap ciphertext under another key | Migration jobs | Medium (can move data between keys) |
| kms:DescribeKey | Read key metadata | Almost everything | Low |
| kms:CreateGrant | Delegate to another principal | Services, admins | High: can broaden access |
| kms:PutKeyPolicy | Replace the key policy | Key admins only | Critical: can grant anyone |
| kms:ScheduleKeyDeletion | Schedule destruction of the key | Break-glass only | Critical: can destroy data |
| kms:Sign / kms:Verify | Asymmetric sign/verify | Signing services | Medium |
| kms:GenerateMac / kms:VerifyMac | HMAC operations | Token services | Medium |
When to reach for grants
Grants shine where a static policy is wrong. They are issued via API, carry their own constraints, and can be retired the instant the work is done:
aws kms create-grant \
  --key-id <key-id> \
  --grantee-principal arn:aws:iam::111122223333:role/batch-worker \
  --operations Decrypt GenerateDataKey \
  --constraints EncryptionContextSubset={tenant=acme} \
  --retiring-principal arn:aws:iam::111122223333:role/grant-manager
This is exactly the mechanism AWS services use on your behalf: when you attach a CMK to an Auto Scaling group or an encrypted EBS volume, the service creates a grant so it can mint data keys without you widening the key policy. Prefer grants over key-policy edits for service integrations and transient access — they are revocable and don’t bloat the resource policy. The grant constraints and lifecycle, which trip teams up under eventual consistency:
| Grant aspect | Detail | Gotcha |
|---|---|---|
| Operations | Explicit list (Decrypt, GenerateDataKey…) | Grant only what the workload needs |
| EncryptionContextSubset | Grantee must include these context pairs | Looser than Equals; allows extra context |
| EncryptionContextEquals | Exact context match required | Strictest; use for tight scoping |
| Retiring principal | Who can retire the grant | Set it, or you can only revoke as admin |
| GrantToken | Returned token bridges eventual consistency | Pass it on immediate use or get transient denies |
| Revoke vs retire | Admin revokes; grantee/retirer retires | Both remove it; revoke is the admin lever |
| Limit | ~50,000 grants per key | Orphaned service grants accumulate — audit them |
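The grant lifecycle in the table (create, use immediately with the token, retire) can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes boto3-style KMS clients are passed in by the caller, and the function names are hypothetical. The three clients stand for three different principals (admin, grantee, retirer), which is the usual arrangement.

```python
from typing import Any, Dict, Tuple


def issue_scoped_grant(
    admin_kms: Any, key_id: str, grantee_arn: str, retirer_arn: str, context: Dict[str, str]
) -> Tuple[str, str]:
    """Create a Decrypt-only grant bound to an encryption context.

    Returns (grant_id, grant_token); hand the token to the grantee.
    """
    response = admin_kms.create_grant(
        KeyId=key_id,
        GranteePrincipal=grantee_arn,
        RetiringPrincipal=retirer_arn,
        Operations=["Decrypt"],
        Constraints={"EncryptionContextSubset": context},
    )
    return response["GrantId"], response["GrantToken"]


def decrypt_with_token(
    grantee_kms: Any, wrapped_key: bytes, context: Dict[str, str], grant_token: str
) -> bytes:
    """Decrypt right after grant creation; the token covers propagation delay."""
    response = grantee_kms.decrypt(
        CiphertextBlob=wrapped_key,
        EncryptionContext=context,
        GrantTokens=[grant_token],
    )
    return response["Plaintext"]


def retire_when_done(retirer_kms: Any, key_arn: str, grant_id: str) -> None:
    """Retire the grant as soon as the work is finished."""
    retirer_kms.retire_grant(KeyId=key_arn, GrantId=grant_id)
```

Keeping the operations list to Decrypt and binding the grant to one tenant's context means a leaked grant is useful only for that tenant's data, and only until it is retired.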
5. Cross-account encryption: S3, EBS, and snapshots
Sharing encrypted data across accounts is a two-sided handshake: the key policy in the owning account must allow the foreign principal, and that principal’s IAM policy in their own account must allow the KMS actions. Missing either side fails closed — and the failure is a generic AccessDenied, not a helpful “the other side didn’t grant you.”
Key policy in the owning account (111122223333):
{
"Sid": "AllowConsumerAccount",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::444455556666:root" },
"Action": ["kms:Decrypt", "kms:DescribeKey", "kms:GenerateDataKey"],
"Resource": "*",
"Condition": { "StringEquals": { "aws:PrincipalOrgID": "o-exampleorgid" } }
}
IAM policy in the consumer account (444455556666) — note you must name the full key ARN, because the key lives in another account:
{
"Effect": "Allow",
"Action": ["kms:Decrypt", "kms:DescribeKey"],
"Resource": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
}
The two-sided requirement is the whole game. Which side does what:
| Side | What it must allow | Reference style | If missing |
|---|---|---|---|
| Owning account (key policy) | The foreign principal/root + actions | Principal: {AWS: arn:...:444455556666:root} | AccessDenied: owner didn’t share |
| Consumer account (IAM) | The KMS actions on the full key ARN | Resource: arn:aws:kms:...:111122223333:key/... | AccessDenied: consumer didn’t grant |
| Both (recommended) | Scope with aws:PrincipalOrgID | Condition on the key policy | Org-wide blast radius if * |
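Because the two sides must agree, it can help to generate both statements from one set of inputs rather than hand-editing each. The sketch below is one way to do that; the function name and parameters are hypothetical, and it only builds the statement dictionaries (attaching them to the key policy and the IAM policy is left to your tooling).

```python
from typing import Dict, List, Tuple


def build_cross_account_statements(
    owner_account: str,
    consumer_account: str,
    region: str,
    key_id: str,
    org_id: str,
    actions: List[str],
) -> Tuple[Dict, Dict]:
    """Return (key_policy_statement, consumer_iam_statement) for one shared key."""
    key_policy_statement = {
        "Sid": "AllowConsumerAccount",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{consumer_account}:root"},
        "Action": list(actions),
        "Resource": "*",
        # Scope the share to the organization, as in the key policy above.
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
    }
    consumer_iam_statement = {
        "Effect": "Allow",
        "Action": list(actions),
        # The consumer must name the full key ARN: the key lives in another account.
        "Resource": f"arn:aws:kms:{region}:{owner_account}:key/{key_id}",
    }
    return key_policy_statement, consumer_iam_statement
```

Deriving both statements from the same action list removes one common cause of the generic AccessDenied: an action allowed on one side and forgotten on the other.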
Service-specific edges that matter, because each service layers its own access surface on top of the KMS handshake:
| Service | Extra surfaces beyond key policy + IAM | Cross-account requirement | Cost lever |
|---|---|---|---|
| S3 (SSE-KMS) | Bucket policy + object ownership | Bucket policy, key policy, reader IAM all align | S3 Bucket Keys (collapse per-object calls) |
| EBS / snapshots | Snapshot share + grant on the key | CMK (not AWS-managed); target gets Decrypt + CreateGrant | Re-encrypt to target’s key on copy |
| RDS / Aurora | Snapshot share + KMS share | Share snapshot AND the CMK; target re-encrypts on copy | Storage-level SSE; few KMS calls |
| DynamoDB | Table encryption setting | MRK for cross-Region global tables | Table-level key; low call volume |
| Secrets Manager | Resource policy on the secret | Share secret + grant Decrypt on its CMK | One CMK for many secrets |
Two edges deserve calling out explicitly. For S3 with SSE-KMS, cross-account reads need the bucket policy, the object’s KMS key policy, and the reader’s IAM all aligned; set the bucket’s default encryption to your CMK and enable S3 Bucket Keys — it caches a bucket-level data key and collapses thousands of per-object KMS calls into a handful, cutting both cost and quota pressure dramatically. For EBS / snapshots, you cannot share a snapshot encrypted with an AWS-managed key cross-account — full stop. Use a CMK, share the snapshot, grant the target account Decrypt + CreateGrant on the key, and the target re-encrypts to their key on copy. This is why production fleets standardize on CMKs for EBS even when defaults would “work.”
# Enable S3 Bucket Keys on a bucket's default SSE-KMS — the single biggest KMS-call reducer
aws s3api put-bucket-encryption --bucket payments-data \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:eu-west-1:111122223333:key/<key-id>"
},
"BucketKeyEnabled": true
}]
}'
6. Key rotation: automatic vs manual
Automatic rotation is the default answer. Enable it and KMS generates new cryptographic material on a schedule (yearly by default; the rotation period is configurable from 90 to 2,560 days), retaining all prior material so old ciphertext stays decryptable. The key ID, ARN, and policy never change, so rotation is invisible to applications.
aws kms enable-key-rotation \
--key-id <key-id> \
--rotation-period-in-days 180
aws kms get-key-rotation-status --key-id <key-id>
Crucial subtlety: automatic rotation rotates the KMS key material, but it does not re-encrypt your existing data or your stored data keys. Old envelopes are unwrapped with retained old material; new writes use new material. If your compliance regime demands that data actually be re-wrapped under fresh material, that is a separate, application-driven re-encryption job, not something rotation does for you:
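# Conceptual re-encrypt: ReEncrypt swaps the wrapping key without exposing plaintext.
# Plaintext never returns to your process; KMS decrypts and re-wraps internally.
import boto3

kms = boto3.client("kms")

# old_wrapped_blob: the wrapped data key (bytes) read back from your own storage.
new_blob = kms.re_encrypt(
    CiphertextBlob=old_wrapped_blob,
    SourceKeyId="alias/payments-prod-old",
    # If the blob was wrapped with an encryption context, supply the same map here.
    SourceEncryptionContext={"tenant": "acme"},
    DestinationKeyId="alias/payments-prod-new",
    DestinationEncryptionContext={"tenant": "acme"},
)["CiphertextBlob"]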
Manual rotation (a brand-new key behind the same alias) is for cases automatic rotation can’t cover: changing key spec, moving to a new Region/account, or responding to suspected compromise where you must invalidate old material. You repoint the alias and run a re-encryption backfill. Asymmetric and HMAC keys historically required this; symmetric keys rarely do. The two models compared:
| Aspect | Automatic rotation | Manual rotation (new key + alias) |
|---|---|---|
| What changes | Backing material only | A whole new key (new key ID/ARN) |
| Key ID / ARN | Unchanged | New — alias repointed |
| App impact | Invisible | Repoint alias; backfill old data |
| Re-encrypts old data? | No (old material retained) | Only if you run a ReEncrypt backfill |
| Changes key spec? | No | Yes (the reason to do it manually) |
| Cadence | 90–2560 days, configurable | Whenever you decide |
| Cost | Slightly more material stored | New key (~$1/mo) + ReEncrypt requests |
| Use it for | The default, compliance “rotate yearly” | Spec change, compromise, Region/account move |
What rotation does and does not do — the table that prevents a false sense of security:
| Statement about rotation | True? | Why it matters |
|---|---|---|
| New writes use fresh material | Yes | The point of rotation |
| Old ciphertext stays decryptable | Yes | Prior material is retained |
| Existing data is re-wrapped under new material | No | Needs a separate ReEncrypt job |
| Stored data keys are refreshed | No | They’re wrapped under retained material |
| The key ARN/ID changes | No | Apps and references are unaffected |
| Replica MRKs rotate too | Yes (from the primary) | Don’t rotate replicas separately |
| Compliance “data must be re-encrypted” is satisfied | Not by rotation alone | Run an application-driven backfill |
7. Auditing and controls
Every KMS API call lands in CloudTrail — including Decrypt, with the encryptionContext, the calling principal, and the source IP. This is the highest-signal security telemetry in the account; treat unexpected Decrypt calls as you would unexpected AssumeRole.
Tighten policies with condition keys so a key is usable only in the intended context:
{
"Sid": "OnlyViaS3FromOrg",
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam::111122223333:role/payments-app" },
"Action": ["kms:Decrypt", "kms:GenerateDataKey"],
"Resource": "*",
"Condition": {
"StringEquals": {
"kms:ViaService": "s3.eu-west-1.amazonaws.com",
"aws:PrincipalOrgID": "o-exampleorgid"
}
}
}
kms:ViaService pins usage to a specific AWS service (the key can only be used through S3, not by a human running aws kms decrypt). aws:PrincipalOrgID confines cross-account use to your organization. For ABAC, gate Decrypt on tag parity between the principal and the key with aws:ResourceTag / aws:PrincipalTag so access scales with tags instead of hand-written ARNs:
"Condition": {
"StringEquals": {
"aws:ResourceTag/project": "${aws:PrincipalTag/project}"
}
}
The condition keys that turn a wide key policy into a tightly-scoped one — the most useful security lever in KMS:
| Condition key | Pins the key to… | Example value | Effect |
|---|---|---|---|
| kms:ViaService | One AWS service only | s3.eu-west-1.amazonaws.com | Blocks a human running aws kms decrypt directly |
| aws:PrincipalOrgID | Your organization | o-exampleorgid | Confines cross-account use to the org |
| kms:EncryptionContext:<k> | A specific context value | tenant = acme | “Decrypt only acme’s data” |
| kms:EncryptionContextKeys | Required context keys present | ["tenant"] | Force callers to supply context |
| aws:SourceVpce | A specific VPC endpoint | vpce-0abc... | Only via the KMS interface endpoint |
| aws:PrincipalTag / aws:ResourceTag | Tag parity (ABAC) | project match | Access scales with tags, not ARNs |
| kms:GrantIsForAWSResource | AWS-service-created grants only | true | Limit who can create grants |
| kms:CallerAccount | A specific account | 444455556666 | Scope cross-account precisely |
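To make the condition keys concrete, here is a small sketch that builds the StringEquals block used in the policy above and models the ABAC tag-parity check. Both function names are hypothetical, and `tag_parity_allows` is a deliberately simplified model of how the `${aws:PrincipalTag/project}` comparison behaves (case-sensitive equality, fails closed when a tag is missing); it is not the real policy evaluator.

```python
from typing import Dict, Optional


def scoped_condition(
    service: str, region: str, org_id: str, tenant: Optional[str] = None
) -> Dict[str, Dict[str, str]]:
    """Build a Condition block pinning a key to one service, one org, and
    optionally one tenant's encryption context."""
    equals = {
        "kms:ViaService": f"{service}.{region}.amazonaws.com",
        "aws:PrincipalOrgID": org_id,
    }
    if tenant is not None:
        equals["kms:EncryptionContext:tenant"] = tenant
    return {"StringEquals": equals}


def tag_parity_allows(
    principal_tags: Dict[str, str], key_tags: Dict[str, str], tag_key: str = "project"
) -> bool:
    """Simplified model of aws:ResourceTag/<k> == ${aws:PrincipalTag/<k>}."""
    principal_value = principal_tags.get(tag_key)
    # A tag missing on either side never matches, so access fails closed.
    return principal_value is not None and key_tags.get(tag_key) == principal_value
```

For example, `scoped_condition("s3", "eu-west-1", "o-exampleorgid")` reproduces the condition block from the OnlyViaS3FromOrg statement.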
What to watch for in CloudTrail, and what each signal usually means:
| CloudTrail signal | What it suggests | Action |
|---|---|---|
| Decrypt with no kms:ViaService | A human/role called KMS directly | Investigate; pin the key to a service if unexpected |
| Decrypt with mismatched encryptionContext | Wrong tenant/purpose, or probing | Alarm like an unexpected AssumeRole |
| Decrypt from an unexpected sourceIPAddress | Off-network access | Correlate with VPC endpoint / aws:SourceVpce |
| Spike in GenerateDataKey count | Per-object KMS storm | Enable Bucket Keys / caching; check throttling |
| PutKeyPolicy / ScheduleKeyDeletion | Sensitive admin change | High-priority alert; should be rare and reviewed |
| Decrypt AccessDenied bursts | Misconfig or probing | Check policy delegation / context / cross-account |
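A few of these signals can be checked mechanically on each CloudTrail record. The sketch below is illustrative only: it assumes records are already parsed into dictionaries with the usual eventSource, eventName, errorCode and requestParameters fields, and the alert labels and the "tenant" context key are hypothetical choices, not anything KMS defines.

```python
from typing import Any, Dict, List

SENSITIVE_ADMIN_EVENTS = {"PutKeyPolicy", "ScheduleKeyDeletion"}


def classify_kms_event(event: Dict[str, Any], required_context_key: str = "tenant") -> List[str]:
    """Return alert labels for one CloudTrail record; an empty list means no finding."""
    if event.get("eventSource") != "kms.amazonaws.com":
        return []
    findings: List[str] = []
    name = event.get("eventName", "")
    if name in SENSITIVE_ADMIN_EVENTS:
        findings.append("sensitive-admin-change")
    if name == "Decrypt":
        if str(event.get("errorCode", "")).startswith("AccessDenied"):
            findings.append("decrypt-denied")
        context = (event.get("requestParameters") or {}).get("encryptionContext") or {}
        if required_context_key not in context:
            # The caller did not supply the context this workload always uses.
            findings.append("decrypt-without-expected-context")
    return findings
```

Counting "decrypt-denied" labels per principal per minute gives a workable first alarm for the AccessDenied bursts in the last row.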
8. Cost and request-quota management
KMS pricing has two parts: roughly $1 per CMK per month (replicas billed separately), and per-request charges with a shared, Region-level request rate quota for cryptographic operations. Symmetric Decrypt/GenerateDataKey/Encrypt share a quota (a few tens of thousands of requests/second depending on Region and key type). A hot path that calls KMS per object will hit ThrottlingException long before you expect.
The levers, in order of impact:
| Lever | Effort | KMS-call reduction | Cost impact | When to use |
|---|---|---|---|---|
| S3 Bucket Keys | One toggle | Drastic (thousands → ~one/bucket) | Lowers bill + quota pressure | Every SSE-KMS bucket, always |
| Data-key caching | Code (Encryption SDK) | Large within the window | Lowers bill | High-volume app encryption |
| Quota increase | Service Quotas request | None (raises ceiling) | None directly | Before a launch, proactively |
| Backoff + jitter | Verify SDK retry config | None (smooths spikes) | None | Always — survive transient throttles |
| Fewer, broader CMKs | Design | Indirect | Fewer $1/mo keys | Don’t over-fragment keys |
| Right key choice | Design | Indirect | Asymmetric/ECC ops cost more | Use symmetric unless you need asymmetric |
The cost trap is never the $1/month per key. It is millions of unbatched Decrypt calls from a service that should have been using Bucket Keys or a data-key cache. Architect the call volume down first; optimize the bill second.
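The data-key caching lever can be illustrated with a tiny cache that reuses one data key per tenant until it is too old or has protected too many messages. This is a teaching sketch under stated assumptions, not a replacement for the AWS Encryption SDK's caching: the class name is hypothetical, `generate` stands in for whatever wraps GenerateDataKey in your code, and a real implementation also has to limit how long plaintext keys sit in memory.

```python
import time
from typing import Callable, Dict, Tuple

KeyPair = Tuple[bytes, bytes]  # (plaintext data key, wrapped data key)


class DataKeyCache:
    """Reuse one data key per tenant until it is too old or too heavily used."""

    def __init__(
        self,
        generate: Callable[[str], KeyPair],
        max_age_seconds: float = 120.0,
        max_messages: int = 500,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate  # e.g. a wrapper around GenerateDataKey
        self._max_age = max_age_seconds
        self._max_messages = max_messages
        self._clock = clock
        self._entries: Dict[str, dict] = {}
        self.kms_calls = 0  # how many times we actually went to KMS

    def get(self, tenant: str) -> KeyPair:
        now = self._clock()
        entry = self._entries.get(tenant)
        expired = (
            entry is None
            or now - entry["created"] >= self._max_age
            or entry["uses"] >= self._max_messages
        )
        if expired:
            plaintext, wrapped = self._generate(tenant)
            self.kms_calls += 1
            entry = {"pair": (plaintext, wrapped), "created": now, "uses": 0}
            self._entries[tenant] = entry
        entry["uses"] += 1
        return entry["pair"]
```

With the defaults shown, 1,000 messages for one tenant inside the age window cost two KMS calls instead of 1,000. The trade-off is that one data key now protects up to 500 messages, which is why both bounds should stay small.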
The throttling-relevant limits and what each means in practice:
| Limit / quota | Rough value | Shared across | What hitting it looks like | Mitigation |
|---|---|---|---|---|
| Symmetric crypto request rate | Tens of thousands rps / Region | Encrypt/Decrypt/GenerateDataKey | ThrottlingException under load | Bucket Keys, caching, quota raise |
| Asymmetric crypto request rate (Sign/Verify, RSA/ECC) | Lower than symmetric | Asymmetric ops pool | Throttling on heavy asymmetric use | Cache; prefer symmetric where possible |
| Direct Encrypt/Decrypt payload | ≤ 4 KB | per request | Validation error on large input | Envelope-encrypt instead |
| Grants per key | ~50,000 | per key | LimitExceeded on CreateGrant | Audit/retire orphaned grants |
| Aliases per key / account | Bounded | per account | LimitExceeded on CreateAlias | Reuse aliases; clean up |
| Keys per account/Region | Soft limit (raisable) | per Region | LimitExceeded on CreateKey | Consolidate; request increase |
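The "backoff + jitter" mitigation is worth seeing in code, because the jitter is the part people leave out. The sketch below retries only throttled calls, with exponential backoff and full jitter. It assumes errors look like botocore's ClientError (a `response` attribute carrying the error code); the function names and default delays are illustrative, and in practice the SDK's built-in retry modes may already do this for you.

```python
import random
import time
from typing import Any, Callable


def is_throttling_error(exc: Exception) -> bool:
    """True for a botocore-style ClientError whose code is ThrottlingException."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ThrottlingException"


def call_with_backoff(
    operation: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Any:
    """Retry throttled calls with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_throttling_error(exc):
                raise  # non-throttle errors (e.g. AccessDenied) are never retried
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * ceiling)  # full jitter: uniform in [0, ceiling)
```

Retrying smooths spikes but does not reduce call volume, so it belongs alongside Bucket Keys and caching rather than in place of them.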
Error and limit reference
The KMS errors you will actually see, what they mean, how to confirm, and the fix. The non-obvious ones are AccessDeniedException when IAM looks fine (the key policy didn’t delegate), InvalidCiphertextException from a mismatched encryption context, and KMSInvalidStateException on a key pending deletion or disabled:
| Error / exception | Meaning | Likely cause | How to confirm | Fix |
|---|---|---|---|---|
| AccessDeniedException | Caller not authorized | Key policy lacks IAM delegation, or no allow statement | get-key-policy (no EnableIAMRoot?); simulate-principal-policy | Add IAM delegation; grant the action/condition |
| AccessDeniedException (cross-acct) | One side of handshake missing | Key policy or consumer IAM doesn’t allow | Check both key policy and consumer IAM (full ARN) | Align both sides; scope with PrincipalOrgID |
| InvalidCiphertextException | Ciphertext/context invalid | Wrong/missing encryption context, or corrupted blob | Compare encrypt vs decrypt context byte-for-byte | Pass the exact same AAD map |
| ThrottlingException | Request-rate quota exceeded | Per-object KMS storm | CloudTrail call count; Service Quotas usage | Bucket Keys, caching; raise quota; backoff |
| KMSInvalidStateException | Key in a bad state for the op | Key disabled or PendingDeletion | describe-key → KeyState | Enable the key; cancel deletion |
| NotFoundException | Key/alias not found | Wrong Region, wrong ID, deleted | describe-key in the right Region | Use correct ARN/Region; recreate alias |
| DisabledException | Key is disabled | Someone disabled it | describe-key → Enabled:false | enable-key if intended |
| KMSInvalidSignatureException | Signature verify failed | Wrong key/algorithm or tampered data | Check signing key + SigningAlgorithm | Use the correct key/algorithm |
| IncorrectKeyException | Wrong CMK for this ciphertext | Decrypting with the wrong key | Match key ARN in the ciphertext metadata | Use the key that produced the ciphertext |
| LimitExceededException | A KMS limit hit | Too many grants/aliases/keys | list-grants / list-aliases counts | Retire/clean up; request a limit increase |
| DependencyTimeoutException | KMS internal timeout | Transient | Retry pattern in logs | Exponential backoff + jitter |
| MalformedPolicyDocumentException | Bad key-policy JSON | Syntax/principal error in PutKeyPolicy | Validate the policy document | Fix JSON; ensure principal exists |
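For the InvalidCiphertextException row, "compare encrypt vs decrypt context" is easier with a small helper than by eye. The function below is a hypothetical debugging aid: it takes the two context maps as plain dictionaries (however you logged or captured them) and reports which pairs are missing, extra, or changed. Comparison is exact and case-sensitive, matching how the context must match at decrypt.

```python
from typing import Dict, List


def diff_encryption_context(
    encrypt_ctx: Dict[str, str], decrypt_ctx: Dict[str, str]
) -> Dict[str, List[str]]:
    """Report how the decrypt-time context differs from the encrypt-time one."""
    missing = sorted(k for k in encrypt_ctx if k not in decrypt_ctx)
    extra = sorted(k for k in decrypt_ctx if k not in encrypt_ctx)
    changed = sorted(
        k for k in encrypt_ctx if k in decrypt_ctx and encrypt_ctx[k] != decrypt_ctx[k]
    )
    return {"missing": missing, "extra": extra, "changed": changed}


def contexts_match(encrypt_ctx: Dict[str, str], decrypt_ctx: Dict[str, str]) -> bool:
    """True only when every key and value is identical; pair order is irrelevant."""
    return encrypt_ctx == decrypt_ctx
```

A value that differs only in case, such as "Acme" versus "acme", shows up under "changed" and is enough to make Decrypt fail.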
The key states a CMK moves through, because several errors above are just “wrong state for this operation”:
| Key state | Can encrypt/decrypt? | How you got here | How to leave |
|---|---|---|---|
| Enabled | Yes | Normal | n/a |
| Disabled | No | disable-key | enable-key |
| PendingDeletion | No | schedule-key-deletion (7–30 day window) | cancel-key-deletion |
| PendingImport | No | External key material, not yet imported | Import material |
| Creating / Updating | Transient | Replication / external import in progress | Wait |
| Unavailable | No | Custom key store disconnected | Reconnect the key store |
Architecture at a glance
The diagram traces the write-then-fail-over path the way bytes actually move, then maps the five places authorization or availability breaks. Read it left to right. A workload needs at-rest crypto, so it calls GenerateDataKey (via the AWS Encryption SDK, with key commitment on and a bounded data-key cache). That call passes through the three-layer authorization stack — the key policy (root of trust, with the EnableIAMRoot delegation), then IAM and grants (effective only because the key policy delegated, scoped by kms:ViaService), then the encryption context check (the AAD that must match byte-for-byte at decrypt). Only if every layer allows does the request reach the multi-Region CMK: an mrk-... primary in eu-west-1 whose material is replicated — not re-encrypted — to a decrypt-only replica in eu-central-1. KMS returns the data key plaintext-and-wrapped; the app encrypts gigabytes locally with AES-256-GCM, zeroes the plaintext, and stores only the ciphertext and the wrapped blob in S3 (Bucket Keys on) or behind a DynamoDB global table that replicates the ciphertext to the second Region.
Notice the two things the diagram is built to teach. First, the dashed replicate-material arrow looping the primary to the replica is the whole DR story: it carries key material, so the same envelope decrypts in either Region — which is exactly why a single-Region key behind a global table produces “replicated-but-unreadable” data (badge 3). Second, every path converges on CloudTrail (every Decrypt, with principal and context) and a quota guard (watching ThrottlingException and request-rate headroom) — the audit-and-protect footer. The five numbered badges sit on the precise hops where it breaks: the key policy locking you out (1), a wrong encryption context failing closed (2), replicated-but-unreadable DR (3), a per-object KMS storm and throttling (4), and an unexpected Decrypt that signals an authorization gap (5). The legend narrates each as symptom, confirm command, and fix.
Real-world scenario
Northwind Pay, a fictional but representative pan-European payments platform, ran active-active in eu-west-1 and eu-central-1 behind a DynamoDB global table, with the application performing client-side field-level encryption on PAN (card number) data before writing. They had used a standard, Region-scoped CMK in eu-west-1. The global table dutifully replicated the ciphertext to eu-central-1 — about 240 million encrypted items, growing 3 million a day. The platform team was six engineers; the KMS bill was a rounding error and nobody had thought hard about it.
The first incident was a planned regional failover game-day. At 10:00 they drained eu-west-1; by 10:02 the eu-central-1 application could not decrypt a single replicated record. Every read of PAN data threw AccessDeniedException / NotFoundException from KMS. The ciphertext was a KMS envelope bound to a key that only existed in eu-west-1, and kms:Decrypt in eu-central-1 had no key to call. Replicated-but-unreadable data is the worst kind of DR failure: it looks perfectly healthy — the bytes are there, the table is green — until you cut over and discover the ability to read them never replicated. The game-day was aborted; the DR posture was, on paper, a lie.
The fix was a multi-Region key, and critically, a backfill — adopting an MRK does nothing for the 240 million records already wrapped under the old single-Region key. They created an MRK primary in eu-west-1, replicated it into eu-central-1, repointed the application’s key provider, and ran a controlled ReEncrypt migration (in batches, throttle-aware, idempotent on a per-item version flag) over the existing items so every envelope was re-wrapped under the portable key. The backfill ran for nine days at a deliberately capped rate to stay well under the request-rate quota. They also tightened each replica’s policy independently: the eu-central-1 replica granted the app Decrypt only, while writes (and thus GenerateDataKey) stayed pinned to the active primary, so a failover couldn’t silently start minting keys in the standby Region.
# Primary in the active Region
aws kms create-key --multi-region --region eu-west-1 \
--description "pan-field-encryption primary"
# Replica in the standby Region (independent, decrypt-only policy attached after)
aws kms replicate-key \
--key-id mrk-0a1b2c3d4e5f6a7b8 \
--replica-region eu-central-1
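The backfill described above (batched, throttle-aware, idempotent on a per-item version flag) might look roughly like this. It is a sketch under assumptions, not Northwind's actual job: the item fields (`wrapped_key`, `encryption_context`, `key_version`), the `save` callback, and the fixed-interval rate cap are all hypothetical, and the client is any boto3-style KMS client for the Region where the old key lives.

```python
import time
from typing import Any, Callable, Dict, Iterable


def reencrypt_backfill(
    source_region_kms: Any,
    items: Iterable[Dict[str, Any]],
    dest_key_id: str,
    save: Callable[[Dict[str, Any]], None],
    target_version: str = "mrk-v1",
    max_per_second: float = 50.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-wrap each item's data key under dest_key_id; returns how many were migrated."""
    pause = 1.0 / max_per_second
    migrated = 0
    for item in items:
        if item.get("key_version") == target_version:
            continue  # already re-wrapped: safe to re-run after a crash
        context = item["encryption_context"]
        response = source_region_kms.re_encrypt(
            CiphertextBlob=item["wrapped_key"],
            SourceEncryptionContext=context,
            DestinationKeyId=dest_key_id,
            DestinationEncryptionContext=context,
        )
        item["wrapped_key"] = response["CiphertextBlob"]
        item["key_version"] = target_version
        save(item)  # persist the new blob and the version flag together
        migrated += 1
        sleep(pause)  # crude rate cap to stay under the request-rate quota
    return migrated
```

Only the wrapped data key changes; the field ciphertext itself is untouched, which is what makes a backfill over hundreds of millions of items feasible at all.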
A second, quieter incident surfaced during the same project. With the global table now genuinely portable, traffic doubled into both Regions and the payments service began throwing intermittent ThrottlingException on GenerateDataKey during the evening peak. The cause: client-side field encryption was calling GenerateDataKey per item, and at peak that exceeded the Region’s request-rate quota. They fixed it two ways: added data-key caching in the Encryption SDK (keyed per tenant, max_age=120s, max_messages_encrypted=500), which cut KMS calls by ~98%, and proactively raised the request-rate quota via Service Quotas ahead of the next quarter’s launch. The S3 archive of settlement files, separately, had Bucket Keys enabled in the same sweep — it had been making a GenerateDataKey call per object and was the single largest line on the (newly noticed) KMS bill.
The lesson the team wrote into their standards, in three lines: if a ciphertext can travel between Regions, the key that protects it must travel too — and you must re-encrypt the data that predates that decision. Replication moves bytes; it does not move the ability to read them. And: architect the KMS call volume down before you scale up the quota. The incident timeline, because the order of discovery is the lesson:
| Time / phase | Symptom | Action taken | Effect | What it should have been |
|---|---|---|---|---|
| Game-day 10:02 | DR Region can’t decrypt anything | Abort failover | Outage avoided in prod, DR exposed | Pre-test decrypt in DR, not just replication |
| Day 0 | Root cause: single-Region key | describe-key → MultiRegion: false | Diagnosis confirmed | n/a |
| Day 0–1 | Plan the fix | Create MRK primary + replica | Key now portable | Standardize MRK for replicated data from day one |
| Day 1–10 | 240M legacy items still bound to old key | ReEncrypt backfill, throttle-capped | Every envelope re-wrapped | Backfill is mandatory, not optional |
| Day 5 | ThrottlingException at peak | Per-item GenerateDataKey found | Diagnosed quota pressure | Cache data keys from the start |
| Day 6 | Crush the call volume | Data-key caching + S3 Bucket Keys | KMS calls −98% | Bucket Keys on every SSE-KMS bucket |
| Day 7 | Pre-empt the next launch | Service Quotas request-rate increase | Headroom secured | Raise quota before incidents |
| +1 month | Re-run game-day | Failover with portable key + scoped replicas | Clean cutover, decrypt works | The DR posture is now real |
Advantages and disadvantages
KMS’s design — keys that never leave the HSM, a resource policy as the root of trust, envelope encryption as the pattern — is what makes it both powerful and full of sharp edges. Weigh it honestly:
| Advantages (why this model helps you) | Disadvantages (why it bites) |
|---|---|
| Key material never leaves FIPS 140-3 HSMs; you can’t accidentally export or leak it | You can’t hold the key either — every decrypt is a network call against a quota |
| The key policy is an authoritative, auditable allow-list independent of IAM sprawl | A key policy without IAM delegation makes IAM ignored — the classic lockout |
| Envelope encryption keeps bulk plaintext out of KMS — fast, cheap, quota-friendly | You must implement the envelope correctly (use the SDK) or you’ll roll a vuln |
| Encryption context gives free, logged, byte-bound authorization and audit | A context mismatch fails closed with an unhelpful InvalidCiphertext |
| Multi-Region keys make ciphertext portable for real cross-Region DR | They weaken isolation by design, and don’t re-encrypt your legacy data |
| Grants delegate to services and short jobs without bloating the key policy | Orphaned grants accumulate (50k/key) and are eventually consistent |
| Every Decrypt is in CloudTrail with principal + context (top-tier telemetry) | High-volume per-object calls flood CloudTrail and throttle before you expect |
| Automatic rotation is invisible and keeps old ciphertext readable | Rotation does not re-wrap stored data — compliance needs a separate job |
The model is right whenever you need provable, auditable, centrally-controlled key management — which is almost every regulated or multi-account workload. It bites hardest on teams that treat the key policy like an IAM afterthought (lockouts), that call KMS per object (throttling, cost), that assume rotation re-encrypts data (compliance gap), or that replicate ciphertext cross-Region without making the key portable (the Northwind failure). Every disadvantage is manageable — but only if you know it exists, which is the entire point of this article.
Hands-on lab
Prove envelope encryption, multi-Region portability, encryption-context enforcement, and the key-policy-is-root-of-trust behaviour end to end, for pennies. Customer managed keys are not covered by the Free Tier, but a CMK is ~$1/month prorated and one MRK replica adds a second, so deleting both at the end keeps the cost negligible. Run in CloudShell with credentials that can manage KMS. You will create a multi-Region key, round-trip a data key, prove cross-Region decrypt, and prove that the wrong encryption context fails closed.
Step 1 — Variables.
PRIMARY_REGION=eu-west-1
DR_REGION=eu-central-1
ACCT=$(aws sts get-caller-identity --query Account --output text)
echo "account=$ACCT primary=$PRIMARY_REGION dr=$DR_REGION"
Step 2 — Create a multi-Region primary key and an alias.
KEY_ID=$(aws kms create-key --multi-region --region $PRIMARY_REGION \
--description "kms-lab mrk primary" \
--query KeyMetadata.KeyId --output text)
aws kms create-alias --region $PRIMARY_REGION \
--alias-name alias/kms-lab --target-key-id $KEY_ID
echo "primary key: $KEY_ID"
Expected: a KeyId beginning mrk-. Confirm it is multi-Region:
aws kms describe-key --region $PRIMARY_REGION --key-id alias/kms-lab \
--query 'KeyMetadata.{MR:MultiRegion, State:KeyState}'
# -> "MR": true, "State": "Enabled"
Step 3 — Enable rotation on a 180-day cadence.
aws kms enable-key-rotation --region $PRIMARY_REGION \
--key-id $KEY_ID --rotation-period-in-days 180
aws kms get-key-rotation-status --region $PRIMARY_REGION --key-id $KEY_ID
# -> "KeyRotationEnabled": true, "RotationPeriodInDays": 180
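Step 4 — Replicate into the DR Region. The replica created here gets the default key policy; in production you would attach a decrypt-only policy to it afterwards with put-key-policy.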
aws kms replicate-key --region $PRIMARY_REGION \
--key-id $KEY_ID --replica-region $DR_REGION \
--description "kms-lab mrk replica"
aws kms create-alias --region $DR_REGION \
--alias-name alias/kms-lab --target-key-id $KEY_ID
aws kms describe-key --region $DR_REGION --key-id alias/kms-lab \
--query 'KeyMetadata.{MR:MultiRegion, State:KeyState}'
# -> "MR": true, "State": "Enabled" in the DR Region too
Step 5 — Envelope round-trip: mint a data key, then decrypt the wrapped blob.
WRAPPED=$(aws kms generate-data-key --region $PRIMARY_REGION \
--key-id alias/kms-lab --key-spec AES_256 \
--encryption-context tenant=acme \
--query CiphertextBlob --output text)
# Decrypt the wrapped data key back, in the PRIMARY Region (must supply same context)
aws kms decrypt --region $PRIMARY_REGION \
--ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
--encryption-context tenant=acme \
--query KeyId --output text
# -> returns the key ARN, proving decrypt authorization + correct context
Step 6 — Prove cross-Region portability (the DR claim). Decrypt the same wrapped blob by calling KMS in the DR Region:
aws kms decrypt --region $DR_REGION \
--ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
--encryption-context tenant=acme \
--query KeyId --output text
# -> returns the DR-Region key ARN: the multi-Region key made the envelope portable
Step 7 — Prove encryption context fails closed. Decrypt with the wrong context — it must error:
aws kms decrypt --region $PRIMARY_REGION \
--ciphertext-blob fileb://<(echo "$WRAPPED" | base64 --decode) \
--encryption-context tenant=wrong 2>&1 | grep -i "InvalidCiphertext\|AccessDenied" \
&& echo "GOOD: wrong context correctly rejected"
Validation checklist. You created a multi-Region key, enabled rotation, replicated it, round-tripped a data key, decrypted the same envelope in two Regions (portability), and proved that the wrong encryption context fails closed. The lab steps mapped to what each proves:
| Step | What you did | What it proves | Real-world analogue |
|---|---|---|---|
| 2 | Create MRK primary | Key is multi-Region (mrk-, MultiRegion:true) | Standardizing portable keys for replicated data |
| 3 | Enable rotation | Rotation is on and on your cadence | Compliance “rotate yearly” baseline |
| 4 | Replicate to DR | Replica shares material, own policy | DR Region gets decrypt-only scope |
| 5 | Data-key round-trip | Envelope encryption + correct context decrypts | Every app write/read path |
| 6 | Cross-Region decrypt | Same envelope decrypts in DR Region | The DR claim, actually tested |
| 7 | Wrong context rejected | Context is enforced authorization | “Decrypt only acme’s data” |
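Steps 5 and 6 are also the core of a repeatable game-day check: mint a data key in the primary Region and confirm the DR Region can unwrap it. A minimal sketch follows, assuming two boto3-style KMS clients (one per Region) are passed in; the function name is hypothetical, and treating any exception as "DR cannot decrypt" is a simplification you may want to refine.

```python
from typing import Any, Dict


def dr_can_decrypt(
    primary_kms: Any, dr_kms: Any, key_id: str, context: Dict[str, str]
) -> bool:
    """Mint a data key in the primary Region and try to unwrap it in the DR Region."""
    minted = primary_kms.generate_data_key(
        KeyId=key_id, KeySpec="AES_256", EncryptionContext=context
    )
    try:
        recovered = dr_kms.decrypt(
            CiphertextBlob=minted["CiphertextBlob"], EncryptionContext=context
        )
    except Exception:
        # Any failure here (missing key, access denied, bad context) means
        # the envelope is not portable to the DR Region.
        return False
    return recovered["Plaintext"] == minted["Plaintext"]
```

Running a check like this on a schedule tests the ability to read replicated data, not just the fact that the bytes arrived.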
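Teardown (avoid the ~$1/key/month charge). Delete the replica first: a multi-Region primary that is scheduled for deletion stays in PendingReplicaDeletion until all of its replicas are gone, and only then does its waiting period start:
# Remove each alias, then schedule deletion in BOTH Regions (7-day minimum window), replica first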
aws kms delete-alias --region $DR_REGION --alias-name alias/kms-lab
aws kms schedule-key-deletion --region $DR_REGION --key-id $KEY_ID --pending-window-in-days 7
aws kms delete-alias --region $PRIMARY_REGION --alias-name alias/kms-lab
aws kms schedule-key-deletion --region $PRIMARY_REGION --key-id $KEY_ID --pending-window-in-days 7
Cost note. Two keys for a few days, scheduled for deletion at the 7-day minimum, costs a fraction of the ~$1/key/month — well under ₹100 total. Crypto requests in this lab are a handful and effectively free.
Common mistakes & troubleshooting
This is the playbook — the part you bookmark. First a scannable table you can read mid-incident, then the full reasoning for the entries that bite hardest. Every row is symptom → root cause → confirm (exact command) → fix.
| # | Symptom | Root cause | Confirm (exact cmd) | Fix |
|---|---|---|---|---|
| 1 | DR Region can’t decrypt replicated data after failover | Single-Region CMK behind a cross-Region replicated store | aws kms describe-key --key-id <id> --query KeyMetadata.MultiRegion = false | Adopt an MRK; replicate; ReEncrypt the backlog |
| 2 | AccessDeniedException on Decrypt though IAM clearly allows it | Key policy omits IAM delegation → IAM ignored | aws kms get-key-policy shows no EnableIAMRoot statement | Add kms:* to account-root statement |
| 3 | InvalidCiphertextException on a blob that decrypted before | Encryption context mismatch at decrypt | Diff the encrypt-time vs decrypt-time --encryption-context | Pass the exact same AAD map, byte-for-byte |
| 4 | ThrottlingException on a hot path; surprising KMS bill | Per-object GenerateDataKey/Decrypt | CloudTrail event count per minute; Service Quotas usage | S3 Bucket Keys + data-key caching; raise quota |
| 5 | Cross-account read fails with generic AccessDenied | Only one side of the handshake granted | Check key policy AND consumer IAM (full key ARN) | Align both sides; scope with aws:PrincipalOrgID |
| 6 | Can’t share an encrypted EBS snapshot cross-account | Snapshot encrypted with an AWS-managed key | describe-snapshots KMS key is aws/ebs | Re-encrypt to a CMK; share + grant Decrypt+CreateGrant |
| 7 | Rotated the key but auditors say “data not re-encrypted” | Assumed rotation re-wraps stored data (it doesn’t) | get-key-rotation-status on, but no backfill ran | Run an application-driven ReEncrypt job |
| 8 | KMSInvalidStateException on every operation | Key is disabled or PendingDeletion | aws kms describe-key --query KeyMetadata.KeyState | enable-key / cancel-key-deletion |
| 9 | Service (ASG/EBS) can’t mint data keys | Service grant missing or revoked | aws kms list-grants --key-id <id> for the service principal | Let the service recreate the grant; don’t revoke service grants |
| 10 | Unexpected Decrypt from a human running aws kms decrypt | Key not pinned to a service | CloudTrail event has no kms:ViaService | Add kms:ViaService condition; alarm on direct decrypt |
| 11 | IncorrectKeyException decrypting an old object | Decrypting with the wrong CMK (alias repointed) | Compare the key ARN in the ciphertext vs the alias target | Use the key that produced the ciphertext; keep old key alive |
| 12 | Deleted a key and lost data | ScheduleKeyDeletion ran; material gone after window | CloudTrail ScheduleKeyDeletion; key PendingDeletion/gone | cancel-key-deletion within the window; use long windows |
| 13 | Asymmetric Verify fails on a valid signature | Wrong SigningAlgorithm or wrong public key | Check SigningAlgorithm + the verifying key | Match algorithm + key; re-download public key |
| 14 | Grant “works” then intermittently AccessDenied | Grant eventual consistency, no GrantToken used | Compare grant creation time vs first use | Pass the returned GrantToken on immediate use |
The expanded form, for the entries that cause the most lost hours:
1. The DR Region cannot decrypt replicated data after a failover.
Root cause: A single-Region CMK sits behind a cross-Region replicated store (DynamoDB global table, S3 CRR of SSE-KMS objects, cross-Region snapshot copies). The bytes replicate; the key does not.
Confirm: aws kms describe-key --key-id <id> --query 'KeyMetadata.{MR:MultiRegion}' returns false; the ciphertext’s key ARN names the source Region only.
Fix: Create a multi-Region key, replicate-key into the DR Region, repoint the app’s key provider, and run a ReEncrypt backfill over pre-existing ciphertext. Test decrypt in DR, not just replication, in every game-day.
2. AccessDeniedException on Decrypt even though the IAM policy plainly allows kms:Decrypt.
Root cause: The key policy omits the IAM-delegation statement, so IAM is ignored and the key policy is the only authority — and it doesn’t list this principal.
Confirm: aws kms get-key-policy --key-id <id> --policy-name default shows no kms:* to :root statement (no EnableIAMRoot).
Fix: Add the account-root delegation statement so IAM is honored; then the existing IAM allow works. This is the most common KMS lockout.
3. InvalidCiphertextException on a blob that decrypted yesterday.
Root cause: Encryption context mismatch. The context passed at decrypt differs (a key, a value, or a missing pair) from encrypt — it’s required byte-for-byte.
Confirm: Diff the --encryption-context (or SDK encryption_context) used at encrypt vs decrypt; check whether a kms:EncryptionContext: policy condition changed.
Fix: Pass the exact same AAD map. Treat context as required authorization, not optional metadata; store/propagate it alongside the ciphertext.
4. ThrottlingException on a hot path, and a four-figure KMS bill nobody expected.
Root cause: One KMS call per object — per-object GenerateDataKey on writes or Decrypt on reads — exceeding the Region’s shared request-rate quota and metering every call.
Confirm: CloudTrail shows one GenerateDataKey/Decrypt per object; Service Quotas → KMS shows you near the request-rate cap.
Fix: Enable S3 Bucket Keys (collapses thousands of calls to ~one per bucket), add data-key caching in the Encryption SDK, verify backoff+jitter, and raise the request-rate quota ahead of launches.
5. A cross-account read fails with a bare AccessDenied and no hint which side.
Root cause: The cross-account handshake is two-sided and one side is missing — either the owner’s key policy doesn’t allow the foreign principal, or the consumer’s IAM doesn’t allow the KMS action on the full key ARN.
Confirm: Read both the owning account’s key policy (get-key-policy) and the consumer’s IAM policy; the consumer must reference the full key ARN.
Fix: Grant both sides; scope the key policy with aws:PrincipalOrgID rather than a bare *.
6. You cannot share an encrypted EBS snapshot with another account.
Root cause: The snapshot is encrypted with the AWS-managed aws/ebs key, which cannot be shared cross-account — full stop.
Confirm: aws ec2 describe-snapshots --snapshot-ids <id> --query 'Snapshots[].KmsKeyId' shows the aws/ebs alias/key.
Fix: Copy the snapshot to one encrypted with a CMK, share the snapshot, grant the target account Decrypt + CreateGrant, and let the target re-encrypt to their key on copy.
7. The key was rotated, but a compliance review fails because “the data wasn’t re-encrypted.”
Root cause: A misunderstanding — automatic rotation rotates key material, not your stored ciphertext. Old envelopes are unwrapped with retained old material.
Confirm: get-key-rotation-status shows rotation enabled, but there’s no ReEncrypt job in your pipeline.
Fix: If the regime requires fresh material on existing data, run a separate, application-driven ReEncrypt backfill. Don’t claim rotation does it.
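The Confirm step for entry 3 (diffing the encrypt-time and decrypt-time context) can be sketched in a few lines of Python. This is an illustrative helper, not part of any AWS SDK: the function name `diff_encryption_context` and the sample maps are assumptions. It relies only on the fact that context keys and values must match exactly, including case.

```python
def diff_encryption_context(encrypt_ctx, decrypt_ctx):
    """Describe how the decrypt-time context differs from the encrypt-time one.

    Hypothetical helper: both arguments are plain dicts of str -> str.
    Comparison is exact and case-sensitive, mirroring how the context
    must match at decrypt.
    """
    # Pairs present at encrypt but absent at decrypt.
    missing = sorted(k for k in encrypt_ctx if k not in decrypt_ctx)
    # Pairs supplied at decrypt that were never bound at encrypt.
    extra = sorted(k for k in decrypt_ctx if k not in encrypt_ctx)
    # Same key on both sides, different value.
    changed = {
        k: (encrypt_ctx[k], decrypt_ctx[k])
        for k in encrypt_ctx
        if k in decrypt_ctx and encrypt_ctx[k] != decrypt_ctx[k]
    }
    return {"missing": missing, "extra": extra, "changed": changed}


# Sample values are made up for illustration.
at_encrypt = {"tenant": "acme", "purpose": "invoices"}
at_decrypt = {"tenant": "Acme", "env": "prod"}
print(diff_encryption_context(at_encrypt, at_decrypt))
```

Any non-empty field in the result is enough to explain the failure; an all-empty result points the investigation elsewhere (for example, at a changed policy condition).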
Best practices
- Every key that needs an editable policy, cross-account use, or DR portability is a customer-managed key. AWS-managed keys can’t be shared, can’t be policy-edited, and can’t be multi-Region.
- Always include the EnableIAMRoot delegation in the key policy. Omitting it means IAM policies are ignored, and it is the classic way to brick a key and lock out the admins who could fix it.
- Envelope-encrypt anything over 4 KB with the AWS Encryption SDK — never raw Encrypt on large payloads, and never roll your own framing.
- Set a meaningful encryption context on every operation and enforce it in policy with kms:EncryptionContext: conditions — it’s free, logged authorization and audit binding.
- Bound your data-key cache (max_age, max_messages_encrypted, max_bytes_encrypted) and key it per tenant/purpose — never leave it unbounded; tune the blast-radius-vs-throughput trade deliberately.
- Use multi-Region keys wherever ciphertext crosses Regions, scope each replica’s policy independently (e.g. DR Region decrypt-only), and always pair the MRK adoption with a ReEncrypt backfill of legacy data.
- Prefer grants over key-policy edits for AWS services and short-lived jobs — they’re revocable, scoped, and don’t bloat the resource policy. Audit and retire orphaned grants.
- Enable S3 Bucket Keys on every SSE-KMS bucket — it’s the single biggest reducer of per-object KMS calls, cutting both cost and quota pressure.
- Pin keys with kms:ViaService and aws:PrincipalOrgID so a key is usable only through the intended service and only inside your org; treat direct human Decrypt as a red flag.
- Enable automatic rotation on every long-lived symmetric key; handle compliance-driven re-encryption as a separate job, not an assumption about rotation.
- Use long key-deletion windows (close to 30 days) and require multi-party review for ScheduleKeyDeletion and PutKeyPolicy — these are data-destroying / access-granting actions.
- Monitor CloudTrail for unexpected Decrypt, PutKeyPolicy, and ScheduleKeyDeletion, and ask for request-rate quota increases before launches, not during incidents.
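The cache-bounding practice above can be illustrated with a toy cache in plain Python. This is a sketch of the idea only, not the AWS Encryption SDK's own cache: the class `BoundedDataKeyCache`, the `fetch_data_key` callable (a stand-in for whatever calls GenerateDataKey), and the injectable `clock` are all hypothetical. The three limits reuse the parameter names from the list.

```python
import time


class BoundedDataKeyCache:
    """Toy per-(tenant, purpose) data-key cache with three hard bounds."""

    def __init__(self, fetch_data_key, max_age, max_messages_encrypted,
                 max_bytes_encrypted, clock=time.monotonic):
        self._fetch = fetch_data_key          # hypothetical: wraps GenerateDataKey
        self._max_age = max_age               # seconds a key may be reused
        self._max_messages = max_messages_encrypted
        self._max_bytes = max_bytes_encrypted
        self._clock = clock
        self._entries = {}                    # (tenant, purpose) -> usage record

    def get(self, tenant, purpose, message_bytes):
        """Return a data key for this message, minting a new one if any bound is hit."""
        cache_key = (tenant, purpose)         # never share keys across tenants
        now = self._clock()
        entry = self._entries.get(cache_key)
        if entry is not None:
            expired = now - entry["created"] >= self._max_age
            too_many = entry["messages"] + 1 > self._max_messages
            too_big = entry["bytes"] + message_bytes > self._max_bytes
            if expired or too_many or too_big:
                entry = None                  # retire the key, fall through to refetch
        if entry is None:
            entry = {"data_key": self._fetch(tenant, purpose),
                     "created": now, "messages": 0, "bytes": 0}
            self._entries[cache_key] = entry
        entry["messages"] += 1
        entry["bytes"] += message_bytes
        return entry["data_key"]
```

Tighter bounds mean more KMS calls and a smaller blast radius per data key; looser bounds trade the other way. Keying the cache on tenant and purpose keeps one tenant's data key from ever encrypting another tenant's payload.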
Security notes
- Least privilege on kms:Decrypt. Decrypt is “read the data” — grant it narrowly, gated by encryption context and kms:ViaService, never broadly to account root for usage. Encrypt-only access is benign; decrypt access is the crown jewel.
- The key policy is your security boundary, not IAM. Treat the key policy as the authoritative allow-list, reviewed in PRs, with the IAM-delegation statement present but every usage grant scoped and conditioned.
- Encryption context for tenant isolation. Bind ciphertext to tenant/purpose and constrain it in policy so one tenant’s role can never decrypt another tenant’s data — provable from CloudTrail, not just asserted.
- Confine cross-account use to the org. Always add aws:PrincipalOrgID to cross-account key-policy statements; a bare Principal: * or :root without an org condition is a data-perimeter hole.
- Protect the destructive actions. ScheduleKeyDeletion, PutKeyPolicy, DisableKey, and CreateGrant can destroy data or broaden access — restrict them to break-glass roles, alarm on them, and prefer long deletion windows.
- Pin keys to services and endpoints. kms:ViaService blocks a human running aws kms decrypt; aws:SourceVpce pins use to your KMS interface endpoint so off-network calls fail.
- MRK replicas weaken isolation deliberately — scope them. Identical material in two Regions is an availability trade; give each replica the narrowest policy that Region needs (often decrypt-only in DR) so a failover can’t start minting keys.
- Audit grants and rotation. Orphaned grants accumulate (50k/key) — list and retire them; verify rotation is actually on with get-key-rotation-status rather than assuming.
The security controls that also prevent the operational incidents — secure and resilient pull the same way here:
| Control | Mechanism | Secures against | Also prevents |
|---|---|---|---|
| IAM-delegation statement | kms:* to :root in key policy | Permanent lockout | “AccessDenied though IAM allows” tickets |
| kms:ViaService condition | Pin key to one service | Direct human/role decrypt | Unexpected Decrypt audit noise |
| Encryption context + condition | AAD bound + kms:EncryptionContext: | Cross-tenant decrypt | Silent wrong-context reuse |
| aws:PrincipalOrgID | Org-scoped cross-account | External-account access | Over-broad sharing blast radius |
| Decrypt-only replica policy | Narrow MRK replica key policy | Standby Region minting keys | Divergent failover behaviour |
| Long deletion window + review | --pending-window-in-days 30 | Accidental/malicious key deletion | Unrecoverable data loss |
| S3 Bucket Keys | Bucket-level data-key caching | (cost/quota) | Per-object KMS storm + throttling |
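The two pinning controls in the table (kms:ViaService and aws:PrincipalOrgID) can be sketched as a single key-policy statement built in Python. The condition keys are the ones named above; everything else is an assumption for illustration: the function name, the Sid, the role ARN, and the organization ID are placeholders, and a real policy would sit alongside the IAM-delegation and admin statements.

```python
import json


def pinned_decrypt_statement(role_arn, org_id, via_service):
    """Build one key-policy statement: Decrypt only via one service, only in one org.

    Hypothetical helper; returns a plain dict ready for json.dumps.
    """
    return {
        "Sid": "AllowDecryptViaServiceInOrg",   # placeholder Sid
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": "kms:Decrypt",
        "Resource": "*",                        # in a key policy, "*" means this key
        "Condition": {
            "StringEquals": {
                # Usable only when the request arrives through this service.
                "kms:ViaService": via_service,
                # Usable only by principals inside this organization.
                "aws:PrincipalOrgID": org_id,
            }
        },
    }


statement = pinned_decrypt_statement(
    role_arn="arn:aws:iam::111122223333:role/example-reader",  # placeholder
    org_id="o-exampleorgid",                                   # placeholder
    via_service="s3.eu-west-1.amazonaws.com",
)
print(json.dumps(statement, indent=2))
```

With both conditions in one StringEquals block, both must hold for the allow to apply, so a direct aws kms decrypt call from the same role does not match this statement.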
Cost & sizing
KMS pricing has two components and one trap. The components: roughly $1 per CMK per month (each MRK replica is billed as its own key, so a primary + one replica ≈ $2/month), plus per-request charges for cryptographic operations (a small fee per 10,000 requests, with asymmetric and the heavier GenerateDataKeyPair operations costing more). The trap: the bill is never the $1/month per key — it is millions of unbatched Decrypt/GenerateDataKey calls from a service that should have used S3 Bucket Keys or a data-key cache.
- Architect the call volume down first. S3 Bucket Keys collapse thousands of per-object calls into roughly one per bucket; data-key caching reuses a data key across hundreds of messages. Both cut the request line of the bill by orders of magnitude and relieve the request-rate quota. Do this before optimizing anything else.
- CMK count is a rounding error; don’t over-fragment but don’t agonize. A few hundred keys is a few hundred dollars a month — trivial next to one throttled launch. Use enough keys for clean policy/blast-radius boundaries; don’t create a key per object.
- Multi-Region replicas double the per-key cost of that key and bill requests in each Region separately — worth it for genuine DR, wasteful if you don’t need portability.
- Asymmetric and HMAC operations cost more per request than symmetric; prefer symmetric envelope encryption unless you specifically need offline-encrypt, signing, or MACs.
- Free tier: KMS includes a modest monthly allotment of free crypto requests; the per-key monthly charge is not free. CloudHSM (a different service) is far more expensive — use KMS unless you have a single-tenant HSM mandate.
A rough monthly picture for a mid-size regulated workload — and what each line actually buys:
| Cost driver | What you pay for | Rough INR / month | What it buys | Watch-out |
|---|---|---|---|---|
| 10 CMKs (single-Region) | Per-key monthly charge | ~₹850 | Clean policy/blast-radius boundaries | Don’t fragment to a key per object |
| 1 MRK primary + 1 replica | Two keys, two Regions | ~₹170 | Cross-Region ciphertext portability (DR) | Replica is a second billed key |
| Crypto requests (with Bucket Keys + caching) | Per-10k requests, drastically reduced | ~₹400–1,500 | The actual encrypt/decrypt work | Without Bucket Keys this can be 50–100× |
| Crypto requests (naive, per-object) | Per-10k requests, unbatched | ~₹40,000+ | (same work, done wrong) | The classic surprise bill |
| Quota increase | Service Quotas request | Free | Headroom before a launch | Request early; approval isn’t instant |
| CloudHSM (if mandated) | Dedicated HSM hours | ~₹1,20,000+ | Single-tenant FIPS HSM | Only for hard HSM mandates |
The sizing rule in one line: the only KMS number that ever surprises a CFO is request volume. Enable Bucket Keys and caching, raise the quota proactively, and the bill stays in the low thousands of rupees; skip them and a per-object hot path turns a rounding error into a five-figure line.
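The sizing rule can be made concrete with a back-of-envelope estimator. The $1 per key per month figure comes from the text above; the per-request price of $0.03 per 10,000 requests and the 20,000-request free allotment are assumptions to check against the current pricing page, and the request volumes are hypothetical. The function name is illustrative.

```python
def monthly_kms_cost_usd(key_count, requests,
                         price_per_key=1.00,
                         price_per_10k_requests=0.03,   # assumed symmetric rate
                         free_requests=20_000):         # assumed free allotment
    """Estimate one month's KMS spend in USD for symmetric operations."""
    billable = max(0, requests - free_requests)
    key_charge = key_count * price_per_key
    request_charge = billable / 10_000 * price_per_10k_requests
    return key_charge + request_charge


# Hypothetical workload: 150M calls a month on a per-object path,
# versus the same work after a 100x reduction in calls.
naive = monthly_kms_cost_usd(key_count=10, requests=150_000_000)
cached = monthly_kms_cost_usd(key_count=10, requests=1_500_000)
print(f"naive per-object path:      ${naive:,.2f}")
print(f"with Bucket Keys + caching: ${cached:,.2f}")
```

In this sketch the key charge is identical in both runs; only the request term moves, which is the point of the sizing rule.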
Interview & exam questions
1. Why does KMS “never encrypt your data,” and what does it encrypt instead? KMS is a wrapping and authorization service, not a bulk cipher: key material lives in FIPS 140-3 HSMs and never leaves. For anything over 4 KB you call GenerateDataKey, which returns a data key both in plaintext and wrapped under your CMK; you encrypt the payload locally with the plaintext key, discard it, and store only the ciphertext plus the wrapped blob. KMS protects the key that protects the data — envelope encryption.
2. A Decrypt is denied even though the caller’s IAM policy clearly allows kms:Decrypt. Why, and how do you confirm? The key policy almost certainly omits the IAM-delegation statement (kms:* to the account root), so IAM is ignored and the key policy is the sole authority — and it doesn’t list this principal. Confirm with aws kms get-key-policy; the absence of an EnableIAMRoot-style statement is the smoking gun. Fix by adding the delegation so IAM is honored.
3. A DynamoDB global table replicates client-side-encrypted data to a second Region, but the DR Region can’t decrypt it. What happened and how do you fix it? The data was encrypted under a single-Region CMK; replication moved the ciphertext but not the key, so kms:Decrypt in the DR Region has no key to call. Fix with a multi-Region key (replicate it into the DR Region) and a ReEncrypt backfill of the pre-existing ciphertext — adopting the MRK alone does nothing for data already wrapped under the old key.
4. Explain the three-layer KMS authorization model and which layer is authoritative. The key policy (resource policy on the key) is the root of trust and is authoritative; IAM policies are effective only if the key policy delegates to IAM; grants are programmatic, temporary, fine-grained delegations for services and short-lived jobs. Unlike S3, IAM alone cannot grant KMS access — the key policy must enable it.
5. What is encryption context, and what is it good for? It’s additional authenticated data (AAD): not secret, not encrypted, but bound to the ciphertext, required byte-for-byte at decrypt, and logged in CloudTrail. It’s the cheapest authorization and audit tool in KMS — constrain it with kms:EncryptionContext: conditions to say “this role can decrypt only tenant=acme data,” and read it in CloudTrail to see exactly what was decrypted.
6. Does automatic key rotation re-encrypt your existing data? If not, what does? No — automatic rotation generates new key material and retains the old, so new writes use fresh material while old ciphertext is still unwrapped with retained material; your stored data and data keys are untouched. To actually re-wrap existing data (a compliance requirement in some regimes) you run a separate, application-driven ReEncrypt backfill.
7. How do you share an encrypted EBS snapshot across accounts, and why won’t the default key work? You cannot share a snapshot encrypted with the AWS-managed aws/ebs key — it’s not shareable cross-account. Use a customer-managed key, share the snapshot, grant the target account Decrypt + CreateGrant on the key, and the target re-encrypts to their key on copy. This is why production fleets standardize on CMKs for EBS.
8. A hot path is throwing ThrottlingException and the KMS bill spiked. Diagnose and fix. The path is making one KMS call per object (GenerateDataKey/Decrypt), exceeding the Region’s shared request-rate quota and metering every call. Confirm via CloudTrail event counts and Service Quotas usage. Fix by enabling S3 Bucket Keys, adding data-key caching in the Encryption SDK, verifying backoff+jitter, and raising the request-rate quota ahead of launches.
9. When do you reach for a grant instead of editing the key policy? For AWS services and short-lived workloads — a grant is issued by API, carries its own operation list and encryption-context constraints, is retirable the instant the work is done, and doesn’t bloat the resource policy. It’s exactly how services like Auto Scaling and EBS mint data keys on your behalf. Prefer grants for transient/service access; key-policy edits for the durable allow-list.
10. How do you confine a KMS key so only a specific service in your org can use it? Add two condition keys to the policy: kms:ViaService (e.g. s3.eu-west-1.amazonaws.com) so the key is usable only through that service and not by a human running aws kms decrypt, and aws:PrincipalOrgID so cross-account use is confined to your organization. Add aws:SourceVpce to pin it to your KMS interface endpoint for off-network defense.
11. What’s the difference between a single-Region and a multi-Region key, and when is each correct? A single-Region key is Region-locked — its ciphertext decrypts only in that Region, and it offers the strongest isolation. A multi-Region key shares identical material across a primary and replicas so the same envelope decrypts in any of them. Use single-Region by default; use multi-Region only where ciphertext must cross Regions (global tables, cross-Region DR), accepting weaker isolation as the trade.
12. You scheduled a key for deletion. What are the risks and the safety net? Deleting a CMK destroys the material after a 7–30 day waiting window, rendering all ciphertext under it permanently unreadable. The safety net is cancel-key-deletion within the window, and the discipline is long windows (close to 30 days) plus multi-party review, because ScheduleKeyDeletion is a data-destroying action.
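The "backoff+jitter" part of the answer to question 8 can be sketched as a small retry wrapper. This is a minimal illustration of exponential backoff with full jitter, not a replacement for the retry logic the AWS SDKs already ship: `ThrottlingError` is a local stand-in for the SDK's throttling exception, and `call_with_backoff` and its parameters are hypothetical names.

```python
import random
import time


class ThrottlingError(Exception):
    """Local stand-in for the SDK's ThrottlingException (hypothetical)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep, rng=random.random):
    """Call operation(), retrying throttled calls with capped, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottlingError:
            if attempt == max_attempts - 1:
                raise                       # out of attempts: surface the error
            # Exponential ceiling, capped so delays never grow without bound.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random fraction of the ceiling so that
            # many throttled callers do not retry in lockstep.
            sleep(rng() * ceiling)
```

Backoff only smooths bursts. If the steady-state call rate exceeds the quota, retries add load, which is why the answer leads with reducing call volume.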
These map across several certs: AWS Certified Security – Specialty (SCS-C02) covers KMS authorization, grants, encryption context, cross-account, and CloudTrail auditing in depth; Solutions Architect – Professional (SAP-C02) covers multi-Region keys and DR portability; Solutions Architect – Associate (SAA-C03) covers envelope encryption, SSE integrations, and key types. A compact cert-mapping:
| Question theme | Primary cert | Objective area |
|---|---|---|
| Key policy vs IAM vs grants | Security – Specialty | Identity & access management; data protection |
| Encryption context, kms:ViaService | Security – Specialty | Data protection; logging & monitoring |
| Multi-Region keys, DR portability | SA Professional | Design for resilience / continuity |
| Envelope encryption, key types, SSE | SA Associate | Secure architectures; storage encryption |
| Quotas, Bucket Keys, cost | SA Associate / Specialty | Cost-optimized & performant architectures |
| Rotation, deletion windows, audit | Security – Specialty | Data protection; incident response |
Quick check
- A Decrypt is denied even though the caller’s IAM allows kms:Decrypt. What is the most likely cause, and the one command to confirm it?
- A global table replicated your encrypted data to a DR Region, but the DR Region can’t decrypt it. What single property of the key explains this, and what two-part fix is required?
- True or false: enabling automatic key rotation re-encrypts your existing stored data under the new material.
- You’re seeing ThrottlingException on an S3-backed hot path and a surprising KMS bill. Name the single biggest lever to fix both.
- You want a CMK that can only be used through S3 and only by principals in your organization. Which two condition keys do you add?
Answers
- The key policy omits the IAM-delegation statement, so IAM is ignored and the key policy is authoritative — and it doesn’t list this principal. Confirm with aws kms get-key-policy --key-id <id> --policy-name default; the missing kms:*-to-:root (EnableIAMRoot) statement is the cause. Add it.
- The key is single-Region (MultiRegion: false) — replication moved the ciphertext but not the key. The fix is two parts: adopt a multi-Region key (replicate it into the DR Region) and run a ReEncrypt backfill of the pre-existing ciphertext; the MRK alone does nothing for data already wrapped under the old key.
- False. Rotation generates new material and retains the old, so new writes use fresh material while old ciphertext is unwrapped with retained material — your stored data is untouched. Re-wrapping existing data needs a separate, application-driven ReEncrypt job.
- Enable S3 Bucket Keys. It collapses thousands of per-object KMS calls into roughly one per bucket, cutting the request line of the bill and relieving the request-rate quota that’s causing the throttling. (Add data-key caching and a quota increase as follow-ups.)
- kms:ViaService (e.g. s3.eu-west-1.amazonaws.com) to pin the key to S3 only — blocking a human running aws kms decrypt — and aws:PrincipalOrgID to confine use to your organization.
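The check in the first answer can be scripted against the policy document that get-key-policy returns. The sketch below is a simplification under stated assumptions: the function name `has_iam_delegation` is hypothetical, it only recognizes the plain form of the statement (an Allow of kms:* to the account's :root ARN in the standard aws partition), and it ignores conditions and a bare account-ID principal.

```python
import json


def has_iam_delegation(policy_json, account_id):
    """Return True if the key policy allows kms:* to the account root principal.

    Hypothetical helper; handles single-or-list forms of Statement,
    Principal.AWS and Action, and nothing more elaborate.
    """
    policy = json.loads(policy_json)
    root_arn = f"arn:aws:iam::{account_id}:root"   # standard partition only
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        aws = principal.get("AWS", []) if isinstance(principal, dict) else []
        principals = [aws] if isinstance(aws, str) else list(aws)
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else list(actions)
        if root_arn in principals and "kms:*" in actions:
            return True
    return False


# Placeholder policy: a key policy that names one role and nothing else.
locked = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/example-app"},
        "Action": "kms:Decrypt",
        "Resource": "*",
    }],
})
print(has_iam_delegation(locked, "111122223333"))
```

A False result on a key whose callers rely on IAM policies matches the lockout described in the first answer.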
Glossary
- KMS key (CMK) — a logical reference to symmetric or asymmetric key material that lives inside FIPS 140-3 validated HSMs and is never exported; formally a “KMS key.”
- Customer-managed key — a CMK whose policy, rotation, grants, tags, and deletion you control; the only type worth architecting around (vs AWS-owned/AWS-managed).
- Data key — a symmetric key minted by GenerateDataKey, returned in plaintext (for local encryption) and wrapped under the CMK (for storage); it does the actual bulk encryption.
- Envelope encryption — encrypting data with a data key and wrapping that data key under a CMK, so plaintext and key material both stay out of KMS.
- Key policy — the resource policy attached to a CMK; the root of trust in KMS, authoritative over IAM (which is ignored unless the key policy delegates to it).
- Grant — a programmatic, temporary, fine-grained delegation (operations + constraints) used by AWS services and short-lived jobs; revocable without editing the key policy.
- Encryption context — additional authenticated data (AAD): not secret, bound to the ciphertext, required byte-for-byte at decrypt, logged in CloudTrail, and constrainable in policy.
- Multi-Region key (MRK) — a primary plus replicas that share identical key material (mrk-...), so a single envelope decrypts in any of their Regions; the basis of cross-Region DR portability.
- ReEncrypt — a KMS operation that decrypts and re-wraps ciphertext under a new key without exposing plaintext to your process; used to migrate the wrapping key and backfill rotation.
- Alias — a mutable, Region-scoped friendly pointer (e.g. alias/payments-prod) to a CMK; convenient but not a stable identity for audit (CloudTrail logs the key ARN).
- kms:ViaService — a condition key that pins a CMK to a specific AWS service, so it can be used only through that service (e.g. S3) and not by a human calling KMS directly.
- aws:PrincipalOrgID — a condition key that confines cross-account key use to principals within your AWS Organization.
- S3 Bucket Keys — a bucket-level setting that caches a data key per bucket for SSE-KMS, collapsing thousands of per-object KMS calls into roughly one and cutting cost and quota pressure.
- Request-rate quota — the shared, per-Region cap on cryptographic operations; exceeding it returns ThrottlingException, the throttling ceiling you architect call volume around.
- Key commitment — an Encryption SDK property (REQUIRE_ENCRYPT_REQUIRE_DECRYPT) that prevents one ciphertext from decrypting to different plaintexts under different keys.
- KeySpec / KeyUsage — the immutable cryptographic spec (e.g. SYMMETRIC_DEFAULT) and purpose (e.g. ENCRYPT_DECRYPT) chosen at key creation — a one-way door.
- Key deletion window — the 7–30 day waiting period after ScheduleKeyDeletion before material is destroyed; cancel-key-deletion is the safety net within it.
Next steps
You can now architect KMS as an authorization system with a latency budget and a quota — pick key types deliberately, scope authorization to the exact caller and context, make ciphertext portable where it travels, and crush per-object call volume. Build outward:
- Foundation: AWS IAM Fundamentals: Users, Roles, Policies & the Evaluation Engine — the policy-evaluation model the three-layer KMS authorization is built on.
- Related: IAM Least Privilege: Permission Boundaries & Inescapable Ceilings — the boundary and ABAC patterns that pair with KMS condition keys.
- Related: S3 Deep Dive: Storage Classes, Versioning, Lifecycle & Encryption — where SSE-KMS and S3 Bucket Keys live.
- Related: Secrets Manager & Parameter Store Deep Dive — both wrap their secrets under a CMK and use encryption context.
- Related: CloudTrail, Config & Audit/Compliance — where every Decrypt lands and how to alarm on unexpected ones.
- Related: Cross-Account IAM Roles: External ID, Confused Deputy & Session Policies — the trust patterns behind cross-account KMS sharing.
- Related: The Well-Architected Security Pillar, Deep Dive — where data-protection and key management sit in the bigger picture.