AWS Databases

Amazon DynamoDB, In Depth: Tables, Keys, Capacity Modes, Indexes & Streams

Almost every DynamoDB problem I have been called in to fix traces back to a decision someone made in the first five minutes — a key they chose because it was “the obvious id”, a capacity mode they left on the default, an index they bolted on later to make one slow query go away. DynamoDB punishes those early decisions harder than a relational database does, because it gives you almost none of the escape hatches you are used to: no joins, no ad-hoc WHERE on any column, no “just add an index and the optimiser will sort it out”. A customerId partition key looks fine in the demo and then melts under a Black-Friday hot partition. A table left on provisioned capacity at five write units throttles the moment a campaign launches, and the team blames “DynamoDB being slow” when DynamoDB was doing exactly what it was told. Someone enables a Global Secondary Index with the wrong projection and quietly doubles their write bill. The service is extraordinary — single-digit-millisecond latency at any scale, genuinely hands-off operations — but only if you understand the handful of concepts underneath the friendly Create table button.

This is the deep dive that closes that gap. Amazon DynamoDB is AWS’s fully managed, serverless, key-value and document NoSQL database. You do not provision servers, choose instance types, patch anything, or manage replication; you create a table, define its keys, and read and write items via an API, and AWS spreads your data across a fleet of storage nodes that scales horizontally to effectively unlimited size and throughput. By the end of this lesson you will know the full data model (tables, items, attributes, and the all-important partition key + sort key); exactly how partitioning and hashing distribute your data and what causes hot partitions; both capacity modes (on-demand and provisioned with auto scaling) and the RCU/WCU arithmetic behind them; the difference between a Local Secondary Index and a Global Secondary Index and when to reach for each; DynamoDB Streams and change data capture; TTL; the DAX in-memory cache; transactions; the eventual-versus-strong consistency model; global tables for multi-Region; and point-in-time recovery, backups, and encryption. Every concept comes with the real aws CLI to drive it.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

You should be comfortable with IAM users, roles, and policies, because every DynamoDB call is authorised by IAM (there is no separate database login), and a sense of what AWS Lambda does will help when we wire up Streams. No prior NoSQL experience is assumed — every term is defined as we go. This lesson sits in the Databases module of the AWS Zero-to-Hero course, alongside the relational RDS & Aurora deep dive; think of the two as the relational and the NoSQL halves of the same chapter. It is the foundation for the two advanced DynamoDB lessons it links at the end: single-table design and access patterns and change data capture with DynamoDB Streams.

Core concepts

Key-value and document, not relational. A relational database stores rows in tables with a fixed schema, lets you query any column and join tables, and leaves the execution plan to an optimiser. DynamoDB does almost none of that. It stores items (think “rows”, but schemaless beyond the key) addressed by a primary key, and it is ruthlessly optimised for one thing: fetching items by their key in single-digit milliseconds, at any scale, with predictable cost. The trade is that you must know your access patterns up front and design your keys and indexes around them: there is no SELECT * FROM t WHERE anyColumn = ? that stays fast as the table grows. This is why people say you “model for your queries, not for your entities” in DynamoDB; the single-table design lesson is entirely about doing that well.

Tables, items, and attributes. A table is a collection of items; an item is a collection of attributes (name/value pairs); an attribute has a data type. The only thing every item in a table must share is the primary key attributes — everything else is free-form, so two items in the same table can have completely different attributes. Items are limited to 400 KB each (the sum of attribute names and values), which is a hard design constraint: large blobs go in S3 with a pointer stored in DynamoDB. Attribute types are scalar (S string, N number, B binary, BOOL, NULL), document (M map, L list — these nest arbitrarily, which is the “document database” part), and set (SS string set, NS number set, BS binary set — unordered, no duplicates).

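The primary key: partition key, optionally plus a sort key. This is the single most important decision you make. The primary key takes one of two forms: a simple primary key (a partition key alone, which must be unique for every item) or a composite primary key (a partition key plus a sort key, where only the combination must be unique).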

Query vs Scan (learn this before anything else). A Query targets a single partition key and optionally a range of sort-key values; it reads only matching items and is fast and cheap. A Scan reads every item in the table (or index) and filters afterwards; it is slow and expensive and you should treat it as a code smell in any hot path. The whole art of DynamoDB modelling is arranging your keys and indexes so every access pattern is a Query (or a GetItem/BatchGetItem) and never a Scan.

Serverless and horizontally scaled. DynamoDB has no instances to size. Behind the scenes your table’s data is spread across many partitions (storage units, each on solid-state storage and replicated across three Availability Zones for durability), and DynamoDB adds partitions automatically as your data grows past ~10 GB per partition or as you push more throughput. You never see partitions directly, but understanding that they exist is the key to understanding both performance and hot partitions.

How partitioning and hashing work (and hot partitions)

DynamoDB decides which physical partition an item lives on by running the partition-key value through an internal hash function; the hash output maps the item to one partition. Items with the same partition-key value always land on the same partition (that is what makes Query efficient — they are physically together and sorted by sort key). Items with different partition-key values are spread across partitions roughly uniformly if the key values are diverse.
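The mapping from key value to partition can be pictured with a toy model. DynamoDB's real hash function and partition count are internal and not exposed, so the cksum CRC and the fixed NUM_PARTITIONS below are stand-in assumptions chosen only for illustration.

```shell
# Toy model only: DynamoDB's real hash function is internal.
# cksum (a POSIX CRC) stands in for it, and NUM_PARTITIONS is a made-up fixed count.
NUM_PARTITIONS=4

# Print the toy partition number (0..NUM_PARTITIONS-1) for a partition-key value.
partition_for() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % NUM_PARTITIONS ))
}

# The same key value always maps to the same partition...
partition_for "CUST#42"
partition_for "CUST#42"

# ...while distinct key values are scattered across the partitions.
for id in 1 2 3 4 5 6 7 8; do
  printf 'CUST#%s -> partition %s\n' "$id" "$(partition_for "CUST#$id")"
done
```

The point of the sketch is the determinism: one key value means one partition, which is what keeps a Query cheap and is also what concentrates load when one value is popular.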

That last clause is everything. A partition has finite limits — historically a guideline of ~3,000 RCUs and ~1,000 WCUs and ~10 GB per partition. If your access concentrates on one partition-key value, all that traffic hits one partition and you get a hot partition: throttling on that key even though the table’s total provisioned (or on-demand) capacity is nowhere near exhausted. Classic causes:

Adaptive capacity mitigates this somewhat: DynamoDB automatically reallocates throughput toward partitions that need it (and can isolate a single hot item onto its own partition), so transient skew often “just works”. But adaptive capacity cannot save a fundamentally bad key: if all your traffic targets one value, there is nothing to rebalance. The design fixes are: choose a high-cardinality partition key (user id, order id, something with millions of distinct values), and where a naturally skewed key is unavoidable, write-shard it by appending a suffix (date#0 through date#9) and fanning reads across the shards. The single-table design lesson covers hot-partition avoidance in depth.
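Write-sharding can be sketched in a few lines of shell. The shard count of ten, the function names, and the use of cksum to pick a shard from an item id are all illustrative assumptions, not something DynamoDB prescribes.

```shell
SHARDS=10   # assumption: ten shards, matching the date#0 ... date#9 example

# Pick the shard for a write. Hashing a stable item id (cksum is an illustrative
# choice) keeps the mapping deterministic, so the item can be located again later.
sharded_key() {   # usage: sharded_key <base-key> <item-id>
  n=$(printf '%s' "$2" | cksum | cut -d' ' -f1)
  printf '%s#%s\n' "$1" $(( n % SHARDS ))
}

# List every shard key for a base key; a reader runs one Query per line and merges.
all_shard_keys() {   # usage: all_shard_keys <base-key>
  i=0
  while [ "$i" -lt "$SHARDS" ]; do
    printf '%s#%s\n' "$1" "$i"
    i=$(( i + 1 ))
  done
}

sharded_key "2026-06-15" "ORDER-1001"
all_shard_keys "2026-06-15"
```

The trade-off is visible in the second function: writes spread over ten partition-key values, but every read of "the whole day" now costs ten Queries instead of one.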

Capacity modes: on-demand vs provisioned

Every table runs in one of two capacity modes, which determine how you pay for throughput and whether you manage it.

| Aspect | On-demand | Provisioned |
|---|---|---|
| You specify | Nothing (it scales itself) | RCUs and WCUs (a target throughput) |
| Pricing | Per request (per million reads/writes) | Per provisioned unit-hour, whether used or not |
| Scaling | Instant, automatic, unlimited (up to table/account limits) | Fixed unless auto scaling adjusts it; bursts use a token bucket |
| Best for | Spiky/unpredictable traffic, new apps, dev/test, “set and forget” | Steady, predictable traffic where you can forecast load |
| Cost shape | More per request, zero when idle | Cheaper per request if well-utilised, pays even when idle |
| Throttling | Rare (only at very high sudden scale beyond previous peak) | Happens when demand exceeds provisioned + burst |
| Switching | You can switch modes once every 24 hours | Same |

Read & write capacity units (the arithmetic you must know). In provisioned mode you buy throughput in units, and the same units describe what on-demand requests cost:

So a Query returning ten 4 KB items eventually consistent costs 10 × 0.5 = 5 RCUs; the same read strongly consistent costs 10 RCUs; and fetching those ten items in a transaction (TransactGetItems) costs 20 RCUs. Internalise the strong = full, eventual = half, transactional = double rule and the 1 KB-write / 4 KB-read granularity: it is exam gold and it is how you forecast a bill.
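The arithmetic can be captured in two small shell functions. This is a sketch with hypothetical function names: it assumes one RCU covers a strongly consistent read of up to 4 KB and one WCU covers a write of up to 1 KB, and it takes the total bytes of a single request (a Query adds up the items it returns before rounding, whereas GetItem rounds each item on its own).

```shell
# Read cost for one request that returns <bytes> in total.
# mode: eventual | strong | transactional
read_rcus() {   # usage: read_rcus <bytes> <mode>
  units=$(( ($1 + 4095) / 4096 ))        # round up to whole 4 KB units
  case "$2" in
    strong)        echo "$units" ;;
    transactional) echo $(( units * 2 )) ;;
    eventual)      # half the strong cost; print .5 for odd unit counts
      if [ $(( units % 2 )) -eq 0 ]; then
        echo $(( units / 2 ))
      else
        echo "$(( units / 2 )).5"
      fi ;;
    *) echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# Write cost for one item of <bytes>. mode: standard | transactional
write_wcus() {   # usage: write_wcus <bytes> <mode>
  units=$(( ($1 + 1023) / 1024 ))        # round up to whole 1 KB units
  case "$2" in
    standard)      echo "$units" ;;
    transactional) echo $(( units * 2 )) ;;
    *) echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# The example from the text: ten 4 KB items in one request (40 KB total).
read_rcus 40960 eventual        # 5
read_rcus 40960 strong          # 10
read_rcus 40960 transactional   # 20
write_wcus 1500 standard        # 2 (1.5 KB rounds up to two 1 KB units)
```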

Burst capacity and the token bucket (provisioned mode). Provisioned mode is not a hard wall. DynamoDB accumulates unused capacity (up to the last 5 minutes / 300 seconds’ worth) into a burst bucket and lets short spikes draw it down, so brief overruns don’t throttle. But burst is best-effort and finite; sustained overload throttles once the bucket empties. (On-demand has its own behaviour: it serves up to double your previous peak instantly, and ramps higher within ~30 minutes — so a brand-new table or a never-before-seen spike can still throttle until it “learns” the new peak. You can pre-warm with warm throughput settings.)

Auto scaling (provisioned mode). Rather than guess a fixed number, you enable Application Auto Scaling, which watches a target utilisation (default 70%) of consumed-to-provisioned capacity and raises or lowers provisioned RCUs/WCUs between a min and max you set, via CloudWatch alarms. It reacts in minutes, not seconds, so it is great for daily cycles but not for instantaneous spikes — for those, on-demand is usually the better answer. You can also buy reserved capacity (a 1- or 3-year commitment on a baseline of provisioned units) for a steep discount on steady workloads.

Which mode? Start new and unpredictable workloads on on-demand — it is the safe default and you never throttle from under-provisioning. Move to provisioned + auto scaling (and consider reserved capacity) once traffic is steady and predictable enough that the per-request maths favours it. Because you can switch only once per 24 hours, treat the switch as a deliberate decision, not a knob to fiddle.

Secondary indexes: LSI vs GSI

By default you can only efficiently fetch items by the primary key. A secondary index lets you query by other attributes by maintaining an alternate key structure that DynamoDB keeps in sync automatically. There are two kinds, and choosing wrongly is a common and expensive mistake.

| Aspect | Local Secondary Index (LSI) | Global Secondary Index (GSI) |
|---|---|---|
| Partition key | Same as the table’s partition key | Any attribute (different partition key allowed) |
| Sort key | A different attribute (alternate sort key) | Any attribute (optional sort key) |
| When created | Only at table creation; cannot add/remove later | Anytime; add or delete on a live table |
| How many | Up to 5 per table | Up to 20 per table (default; raisable) |
| Consistency | Supports strong and eventual reads | Eventual only (never strongly consistent) |
| Capacity | Shares the base table’s RCUs/WCUs | Its own provisioned RCUs/WCUs (or on-demand) |
| Size limit | 10 GB per partition-key value (item collection limit) | No item-collection size limit |
| Key uniqueness | Index keys need not be unique | Index keys need not be unique |

The mental model. An LSI is “same partition, different sort order” — it lets you query the same set of items grouped by the same partition key, but ordered/filtered by a different attribute (e.g. items for a user sorted by lastUpdated instead of by itemId). Because it shares the partition, it can be strongly consistent, and it counts against the 10 GB per-partition item-collection limit — which is the LSI’s biggest gotcha (a single partition key with an LSI can never exceed 10 GB of items). A GSI is a genuinely different table-like view: any attribute as the partition key, its own throughput, eventually consistent, addable anytime. GSIs are the workhorse — single-table designs are built on a handful of overloaded GSIs.

Projections (what attributes the index copies). An index stores a copy of certain attributes from the base item; you choose how much via the projection type:

| Projection | What’s copied into the index | Trade-off |
|---|---|---|
| KEYS_ONLY | Only the index keys + the base table keys | Smallest/cheapest; but a query often needs a follow-up GetItem on the base table to get other attributes |
| INCLUDE | Keys + a named list of extra attributes | Balanced: project exactly the attributes your queries return |
| ALL | Every attribute of the item | Most convenient (queries are self-contained), largest storage and highest write cost |

If a query against a GSI asks for an attribute that is not projected into the index, DynamoDB does not fetch it from the base table: you get only the projected attributes. (An LSI can fetch non-projected attributes from the base table, at extra read cost.) So choose INCLUDE with exactly the attributes your queries return: ALL is convenient but you pay to write a full copy on every base-item write, and KEYS_ONLY saves storage but forces extra reads.

GSI write amplification and throttling (the costly gotcha). Every write to the base table that touches a projected attribute is also a write to each affected GSI, billed separately against that GSI’s capacity. Five GSIs with ALL projection means roughly 6× the write cost of an un-indexed table. Worse, on a provisioned GSI, if the GSI’s own write capacity can’t keep up, writes to the base table are throttled too (because DynamoDB won’t let the index fall arbitrarily behind). The fixes: provision the GSI generously (or use on-demand), and project only what you need.
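A back-of-the-envelope version of that multiplier is sketched below. The function name is hypothetical, and the model deliberately simplifies: it assumes every GSI projects ALL, that each index copy is the same size as the base item, and it ignores more expensive cases such as an update that changes an index key value.

```shell
# Rough WCUs for one base-table write that also updates <gsi-count> GSIs.
# Simplifying assumption: each GSI stores a full-size copy of the item.
amplified_wcus() {   # usage: amplified_wcus <item-bytes> <gsi-count>
  per_copy=$(( ($1 + 1023) / 1024 ))     # WCUs for one copy, rounded up to 1 KB
  echo $(( per_copy * (1 + $2) ))        # base table + one write per GSI
}

amplified_wcus 1024 0   # 1: no indexes
amplified_wcus 1024 5   # 6: the "roughly 6x" case from the text
amplified_wcus 3000 2   # 9: a 3 KB item costs 3 WCUs per copy, three copies
```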

DynamoDB Streams and change data capture

A DynamoDB Stream is a time-ordered log of item-level changes in a table (every create, update, and delete), retained for 24 hours. Turning it on gives you a change-data-capture (CDC) feed for driving event-driven architectures: replicate to another store, maintain an aggregate, send a notification, index into OpenSearch, and so on.

What each record contains — the StreamViewType. When you enable a stream you pick how much of the change it carries:

| StreamViewType | Record contains | Use when |
|---|---|---|
| KEYS_ONLY | Only the key attributes of the changed item | You just need to know which item changed and will re-fetch it |
| NEW_IMAGE | The entire item after the change | You need the new state (e.g. to project/replicate it) |
| OLD_IMAGE | The entire item before the change | You need the prior state (e.g. audit, undo, diff) |
| NEW_AND_OLD_IMAGES | Both before and after | You need to diff (compute exactly what changed); the richest, most common choice for CDC |

Ordering and processing. Stream records are organised into shards that mirror the table’s partitions, and DynamoDB guarantees ordering per item (records for the same item are delivered in the order the changes happened), but not a single global order across the whole table. You consume a stream in one of two ways: with the DynamoDB Streams Kinesis-style API (and the Kinesis Client Library) for custom consumers, or, far more commonly, with a Lambda trigger via an event source mapping, where Lambda polls the shards for you and invokes your function with batches of records. Because delivery to your consumer is at-least-once (a batch can be retried), your consumer must be idempotent. The Streams CDC lesson goes deep on ordering, idempotency, batching/parallelisation, error handling (bisect-on-error, on-failure destinations), and EventBridge Pipes.
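Idempotency under at-least-once delivery usually comes down to remembering which records you have already applied. The sketch below illustrates the idea with a temp-file ledger keyed on each record's eventID; the ledger file, the function name, and the printed "applied" line are stand-ins, and a real consumer would keep the ledger in a durable store.

```shell
# Hypothetical ledger of processed event IDs. A real consumer would use a durable
# store (for example a conditional write to a table), not a temp file.
LEDGER=$(mktemp)

# Apply a change once; silently skip it if this eventID was already handled.
handle_record() {   # usage: handle_record <eventID> <description>
  if grep -qxF -- "$1" "$LEDGER"; then
    return 0                        # duplicate delivery: no side effect
  fi
  printf 'applied: %s\n' "$2"       # stand-in for the real side effect
  printf '%s\n' "$1" >> "$LEDGER"   # remember that this event is done
}

handle_record "evt-001" "ORDER#1001 PLACED"
handle_record "evt-001" "ORDER#1001 PLACED"   # redelivered: prints nothing
handle_record "evt-002" "ORDER#1002 PLACED"
```

Note the gap between applying the change and recording the ID: if the process dies in between, the change is applied twice. Closing that gap (for instance by making the side effect itself a conditional write) is the part a production consumer has to get right.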

Streams vs Kinesis Data Streams for DynamoDB. As an alternative you can stream changes to an Amazon Kinesis Data Stream instead of (or as well as) the native stream. The difference: native DynamoDB Streams retain 24 hours and are consumed via Lambda/KCL with per-partition ordering; Kinesis Data Streams offer longer retention (up to 365 days), more/larger consumers, and integration with the broader Kinesis ecosystem (Firehose, Data Analytics), at the cost of running and paying for the Kinesis stream and accepting Kinesis’s at-least-once, possibly-duplicated, possibly-out-of-order-on-resharding semantics. Choose native Streams for tight, ordered Lambda triggers; choose Kinesis for fan-out to many consumers, long retention, or analytics pipelines.

Time to Live (TTL): automatic expiry

TTL lets DynamoDB delete expired items automatically and for free. You designate one numeric attribute as the TTL attribute and store an epoch timestamp (seconds since 1970, UTC) in it; a background process deletes items once that time passes. Key facts that trip people up:
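Computing the TTL value is simple epoch arithmetic. In this sketch the function names, the SESSION# key layout, and the expiresAt attribute name are illustrative assumptions; the one hard requirement it demonstrates is that the value is written as a Number (N) holding epoch seconds.

```shell
# Epoch-seconds timestamp <days> days after <now-epoch>.
ttl_after_days() {   # usage: ttl_after_days <now-epoch> <days>
  echo $(( $1 + $2 * 86400 ))
}

# Build item JSON for put-item; expiresAt must be a Number (N), not a String.
session_item() {   # usage: session_item <session-id> <expires-epoch>
  printf '{"PK":{"S":"SESSION#%s"},"SK":{"S":"META"},"expiresAt":{"N":"%s"}}\n' "$1" "$2"
}

now=$(date +%s)
session_item "abc123" "$(ttl_after_days "$now" 7)"

# To write it for real (not run here):
#   aws dynamodb put-item --table-name AppData \
#     --item "$(session_item abc123 "$(ttl_after_days "$(date +%s)" 7)")"
```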

DAX: the in-memory cache

DynamoDB Accelerator (DAX) is a fully managed, in-memory, write-through cache that sits in front of DynamoDB and speaks the DynamoDB API, so adopting it is largely a client-library swap: point the DAX client at the DAX cluster endpoint instead of DynamoDB and your GetItem/Query/Scan calls are cached. On a cache hit it turns single-digit-millisecond reads into reads measured in microseconds, and it absorbs read-heavy/hot-key traffic so that traffic never reaches the table.

| Aspect | DAX |
|---|---|
| What it accelerates | Reads (item cache for GetItem/BatchGetItem; query cache for Query/Scan) |
| Writes | Write-through: writes go to DynamoDB and update the cache |
| Consistency | Eventually consistent only; DAX cannot serve strongly consistent reads from its cache (those go through to DynamoDB) |
| Form factor | A cluster of nodes inside your VPC (a primary + read replicas across AZs); you size the node type and count |
| When it helps | Read-heavy, repeated reads, hot keys, microsecond latency targets |
| When it does not | Write-heavy workloads, strongly-consistent read needs, low cache-hit ratios, very large items |

DAX is the right tool when reads dominate and slight staleness is acceptable; it is the wrong tool if you need strong consistency or your workload is write-heavy. Note it runs as provisioned nodes (not serverless), so it has an always-on cost — size it to your working set.

Transactions: all-or-nothing across items

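DynamoDB supports ACID transactions across multiple items and multiple tables in a single Region via two APIs: TransactWriteItems (an all-or-nothing group of puts, updates, deletes, and condition checks) and TransactGetItems (a consistent snapshot read of several items).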

Two essentials: transactional operations cost double the normal capacity (a transactional write = 2 WCUs per KB, a transactional read = 2 RCUs per 4 KB), and a transaction can fail with a TransactionCanceledException if a condition check fails or two transactions conflict on the same item — your code must handle and retry as appropriate. Transactions are scoped to one Region (they do not span global-table replicas). For single-item conditional logic you usually don’t need a full transaction — a plain PutItem/UpdateItem with a condition expression (e.g. attribute_not_exists(pk) to create-only, or optimistic locking with a version attribute) is cheaper and sufficient.
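The single-item alternatives mentioned above look roughly like this with the CLI. The wrapper function names and the attribute name version are illustrative assumptions; the two functions that call aws are only defined here, not executed.

```shell
# Create-only write: fails with ConditionalCheckFailedException if the item exists.
create_only() {   # usage: create_only <table> <item-json>
  aws dynamodb put-item --table-name "$1" --item "$2" \
    --condition-expression "attribute_not_exists(PK)"
}

# Expression values for an optimistic-locking update from version N to N+1.
version_values() {   # usage: version_values <expected-version>
  printf '{":expected":{"N":"%s"},":next":{"N":"%s"}}\n' "$1" $(( $1 + 1 ))
}

# Bump the version only if nobody else changed the item since we read it.
bump_version() {   # usage: bump_version <table> <key-json> <expected-version>
  aws dynamodb update-item --table-name "$1" --key "$2" \
    --update-expression "SET #v = :next" \
    --condition-expression "#v = :expected" \
    --expression-attribute-names '{"#v":"version"}' \
    --expression-attribute-values "$(version_values "$3")"
}

version_values 3
# Example calls (need AWS credentials, so left commented out):
#   create_only AppData '{"PK":{"S":"CUST#42"},"SK":{"S":"PROFILE"},"version":{"N":"1"}}'
#   bump_version AppData '{"PK":{"S":"CUST#42"},"SK":{"S":"PROFILE"}}' 1
```

A failed condition is the signal to re-read the item and retry with the fresh version, which is the whole optimistic-locking loop.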

Read consistency: eventual vs strong

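DynamoDB replicates every item across three copies in different Availability Zones for durability. That replication is why reads come in two flavours: eventually consistent (the default; half the read cost, but it may briefly return stale data) and strongly consistent (requested per call with ConsistentRead; full read cost, and it reflects every write acknowledged before the read).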

Two caveats worth memorising: GSIs are always eventually consistent (you cannot request a strong read on a GSI), and global tables are always eventually consistent across Regions (a strong read is only ever “strong” within a single Region). So “strongly consistent” never means “globally consistent”.

Global tables: multi-Region, active-active

A global table is a single DynamoDB table replicated across multiple AWS Regions, with active-active read and write in every Region. DynamoDB asynchronously propagates writes between Regions (typically within a second), giving you low-latency local access for users in each Region and a Region-level disaster-recovery posture out of the box. Essentials:

Global tables give multi-Region resilience and locality cheaply, provided your application can live with eventual cross-Region consistency and last-writer-wins.

Backup, restore, and point-in-time recovery

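DynamoDB offers two complementary protections: on-demand backups (full snapshots you take explicitly, which persist until you delete them) and point-in-time recovery (PITR, a continuous backup that lets you restore to any point within the last 35 days).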

Both restore to a new table; you can also do cross-Region and cross-account restores via AWS Backup. For DR, PITR covers “oops” within 35 days while global tables cover Region loss in real time — they solve different problems and are often used together.

Encryption, security, and access control

Encryption at rest is always on — every DynamoDB table is encrypted, you cannot turn it off. You choose the key:

| Key option | Who owns/manages | Cost | When |
|---|---|---|---|
| AWS owned key (default) | AWS, fully transparent | Free | Default; you don’t need key control or an audit trail |
| AWS managed key (aws/dynamodb) | AWS, in your account’s KMS | KMS charges | You want CloudTrail visibility of key use without managing a key |
| Customer managed key (CMK) | You, in KMS | KMS + per-request | You need control over rotation, key policies, and the ability to disable the key (which disables table access) |

In transit, all API calls are over HTTPS/TLS. Access control is pure IAM: there is no database user/password. IAM policies authorise actions (dynamodb:GetItem, Query, PutItem, …) on table and index ARNs, and DynamoDB supports remarkably fine-grained access control: you can restrict a principal to specific items using the dynamodb:LeadingKeys condition key (e.g. “a user may only read items whose partition key equals their own user id”) and to specific attributes using dynamodb:Attributes. This is the backbone of multi-tenant designs. Add VPC endpoints (Gateway type) to keep traffic off the public internet, and use CloudTrail to log the control plane and (optionally, via data events) the data plane.
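An item-level restriction with dynamodb:LeadingKeys might look like the policy printed below. The account id, Region, table name, action list, and the choice of the Cognito identity variable as "the caller's own id" are all placeholder assumptions for illustration; adapt them to however your principals are identified.

```shell
# Print an IAM policy that only allows access to items whose partition key
# equals the caller's identity. All identifiers below are placeholders.
leading_keys_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
    "Resource": "arn:aws:dynamodb:ap-south-1:123456789012:table/AppData",
    "Condition": {
      "ForAllValues:StringEquals": {
        "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
      }
    }
  }]
}
EOF
}

leading_keys_policy
```

The quoted heredoc delimiter matters: it stops the shell from expanding the ${...} policy variable, which IAM itself resolves at request time.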

The DynamoDB landscape at a glance

[Figure: Amazon DynamoDB deep dive]

The diagram above ties the pieces together: items hashed by partition key onto partitions (with the sort key ordering items inside a partition), the two capacity modes feeding throughput, GSIs/LSIs as alternate query views, Streams emitting an ordered change log into Lambda/Kinesis (and powering global tables), DAX caching reads in front, and PITR/backups and KMS encryption wrapping the table — the same mental map to keep while you read the rest of this lesson.

Creating a table: every setting

Whether you use the console, CLI, or IaC, a table is defined by the same set of choices. Here is every one, with the what/choices/default/when/gotcha treatment.

| Setting | What it is / choices | Default | When / gotcha |
|---|---|---|---|
| Table name | Unique per Region per account | n/a | Immutable; choose a convention (app-env-entity) |
| Partition key | Name + type (S/N/B); the hash key | required | Immutable after creation; pick a high-cardinality value |
| Sort key | Optional name + type; the range key | none | Adds Query/range power; immutable; the combination must be unique |
| Capacity mode | On-demand or Provisioned | On-demand (console default) | Switchable once per 24 h; on-demand = safe default |
| Provisioned RCU/WCU | Throughput numbers (provisioned mode) | 5/5 (console) | Enable auto scaling with min/max + target % instead of fixing |
| Table class | Standard or Standard-IA (Infrequent Access) | Standard | Standard-IA: cheaper storage, pricier throughput; for large, rarely-read tables |
| Secondary indexes | LSIs (creation-time only) and GSIs (anytime) | none | LSIs share base capacity + 10 GB collection limit; GSIs have own capacity |
| Encryption | AWS owned / AWS managed / CMK | AWS owned | Always on; CMK for control + audit |
| DynamoDB Streams | Off, or on with a StreamViewType | Off | Required for global tables & CDC; 24 h retention |
| Kinesis data stream | Optionally also stream to Kinesis | Off | For long retention / fan-out / analytics |
| TTL | Optional TTL attribute (Number, epoch seconds) | Off | Free deletes within ~48 h; deletions appear in Streams |
| PITR | Continuous backup (35-day restore) | Off (on by default for new tables in console as of recent updates) | Per-GB cost; restores to a new table |
| Deletion protection | Block accidental DeleteTable | Off | Turn on for any production table |
| Tags | Key/value metadata | none | For cost allocation & governance |

Create a table with the CLI (composite key, on-demand, streams, PITR, deletion protection).

REGION=ap-south-1
aws dynamodb create-table \
  --table-name AppData \
  --attribute-definitions \
      AttributeName=PK,AttributeType=S \
      AttributeName=SK,AttributeType=S \
  --key-schema \
      AttributeName=PK,KeyType=HASH \
      AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
  --deletion-protection-enabled \
  --tags Key=env,Value=lab \
  --region $REGION
aws dynamodb wait table-exists --table-name AppData --region $REGION
aws dynamodb update-continuous-backups --table-name AppData \
  --point-in-time-recovery-specification PointInTimeRecoveryEnabled=true --region $REGION

Provisioned mode with auto scaling instead uses --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5, then aws application-autoscaling register-scalable-target + put-scaling-policy on dynamodb:table:ReadCapacityUnits/WriteCapacityUnits with a TargetTrackingScaling policy at 70%.
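Spelled out, that auto scaling setup could look like the following. The wrapper function names, the policy name pattern, and the 5 to 100 capacity range are illustrative assumptions; the function that calls aws is only defined here, and the example invocations are commented out because they need credentials and a provisioned-mode table.

```shell
# JSON for a target-tracking policy on a DynamoDB utilisation metric.
tracking_config() {   # usage: tracking_config <Read|Write> <target-percent>
  printf '{"PredefinedMetricSpecification":{"PredefinedMetricType":"DynamoDB%sCapacityUtilization"},"TargetValue":%s}\n' "$1" "$2"
}

# Register one dimension of a table and attach a 70% target-tracking policy.
enable_autoscaling() {   # usage: enable_autoscaling <table> <Read|Write> <min> <max>
  aws application-autoscaling register-scalable-target \
    --service-namespace dynamodb \
    --resource-id "table/$1" \
    --scalable-dimension "dynamodb:table:${2}CapacityUnits" \
    --min-capacity "$3" --max-capacity "$4"
  aws application-autoscaling put-scaling-policy \
    --service-namespace dynamodb \
    --resource-id "table/$1" \
    --scalable-dimension "dynamodb:table:${2}CapacityUnits" \
    --policy-name "$1-$2-target-tracking" \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration "$(tracking_config "$2" 70)"
}

tracking_config Read 70
# enable_autoscaling AppData Read 5 100
# enable_autoscaling AppData Write 5 100
```

Reads and writes are separate scalable dimensions, so each needs its own target and policy; GSIs have their own dimensions too and are not covered by this sketch.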

Add a GSI to a live table (any attribute, INCLUDE projection).

aws dynamodb update-table --table-name AppData \
  --attribute-definitions AttributeName=GSI1PK,AttributeType=S AttributeName=GSI1SK,AttributeType=S \
  --global-secondary-index-updates '[{"Create":{
      "IndexName":"GSI1",
      "KeySchema":[{"AttributeName":"GSI1PK","KeyType":"HASH"},{"AttributeName":"GSI1SK","KeyType":"RANGE"}],
      "Projection":{"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}}}]' \
  --region $REGION

The GSI back-fills in the background (the table stays available); watch IndexStatus go from CREATING to ACTIVE.

After creation: what you can (and can’t) change

| Operation | Can you? | Notes |
|---|---|---|
| Change the partition/sort key | No | Keys are immutable; you must create a new table and migrate (export → transform → import). |
| Change capacity mode | Yes, once per 24 h | On-demand ⇄ provisioned. |
| Adjust provisioned RCU/WCU | Yes, anytime | Decreases are limited per day; auto scaling handles this for you. |
| Add/remove a GSI | Yes, anytime | Adding back-fills online; you can have GSIs in different states. |
| Add/remove an LSI | No | LSIs exist only from table creation. |
| Change a GSI’s projection | No | Delete and recreate the GSI with the new projection. |
| Enable/disable Streams | Yes (re-enabling starts a new stream, no history) | Required for global tables. |
| Enable/disable TTL, PITR, deletion protection | Yes, anytime | |
| Change table class | Yes | Standard ⇄ Standard-IA. |
| Change encryption key | Yes | Switch among AWS-owned/managed/CMK. |
| Add a replica Region (global table) | Yes | Streams must be on; each replica billed separately. |

Hands-on lab

In this lab you create an on-demand table (so it costs essentially nothing), write and read items, run a Query, add a GSI, enable TTL, take a backup, and clean up. Uses the aws CLI (CloudShell or local).

1. Create an on-demand table with a composite key and a stream.

REGION=ap-south-1
aws dynamodb create-table --table-name LabOrders \
  --attribute-definitions AttributeName=PK,AttributeType=S AttributeName=SK,AttributeType=S \
  --key-schema AttributeName=PK,KeyType=HASH AttributeName=SK,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES \
  --region $REGION
aws dynamodb wait table-exists --table-name LabOrders --region $REGION

Expected: the wait returns once the table is ACTIVE.

2. Write a few items (two orders for one customer).

aws dynamodb put-item --table-name LabOrders --region $REGION --item '{
  "PK":{"S":"CUST#42"},"SK":{"S":"ORDER#2026-06-15#1001"},
  "status":{"S":"PLACED"},"total":{"N":"1299"},"city":{"S":"Mumbai"}}'
aws dynamodb put-item --table-name LabOrders --region $REGION --item '{
  "PK":{"S":"CUST#42"},"SK":{"S":"ORDER#2026-06-15#1002"},
  "status":{"S":"PLACED"},"total":{"N":"499"},"city":{"S":"Mumbai"}}'

Expected: both calls return with no error.

3. GetItem (one item by full key) and Query (all orders for the customer).

aws dynamodb get-item --table-name LabOrders --region $REGION \
  --key '{"PK":{"S":"CUST#42"},"SK":{"S":"ORDER#2026-06-15#1001"}}' \
  --consistent-read   # strongly consistent read

aws dynamodb query --table-name LabOrders --region $REGION \
  --key-condition-expression "PK = :c AND begins_with(SK, :p)" \
  --expression-attribute-values '{":c":{"S":"CUST#42"},":p":{"S":"ORDER#2026-06-15"}}' \
  --query "Items[].SK.S" --output table

Expected: the get-item returns the 1001 order (strongly consistent); the query returns both SKs in sort order — and note we used begins_with on the sort key, the canonical DynamoDB range pattern.

4. Add a GSI to query by status (a different access pattern), then query it.

aws dynamodb update-table --table-name LabOrders --region $REGION \
  --attribute-definitions AttributeName=status,AttributeType=S AttributeName=total,AttributeType=N \
  --global-secondary-index-updates '[{"Create":{
      "IndexName":"byStatus",
      "KeySchema":[{"AttributeName":"status","KeyType":"HASH"},{"AttributeName":"total","KeyType":"RANGE"}],
      "Projection":{"ProjectionType":"ALL"}}}]'
# wait for the GSI to finish back-filling:
aws dynamodb describe-table --table-name LabOrders --region $REGION \
  --query "Table.GlobalSecondaryIndexes[0].IndexStatus" --output text
# once it prints ACTIVE:
aws dynamodb query --table-name LabOrders --index-name byStatus --region $REGION \
  --key-condition-expression "#s = :v" \
  --expression-attribute-names '{"#s":"status"}' \
  --expression-attribute-values '{":v":{"S":"PLACED"}}' \
  --query "Items[].SK.S" --output table

Expected: IndexStatus transitions from CREATING to ACTIVE; the GSI query returns both orders by status, an access pattern the base key could not serve. (Note that GSI reads are eventually consistent: --consistent-read is rejected here.)
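Rather than re-running describe-table by hand in step 4, you could poll until the index is ready. This is a sketch: the function name, the 10-second interval, and the 60-attempt cap are arbitrary choices, and the function is only defined here so nothing calls AWS until you invoke it.

```shell
# Poll a GSI's status until it is ACTIVE, giving up after 60 attempts.
wait_for_gsi() {   # usage: wait_for_gsi <table> <index-name> <region>
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    status=$(aws dynamodb describe-table --table-name "$1" --region "$3" \
      --query "Table.GlobalSecondaryIndexes[?IndexName=='$2'].IndexStatus | [0]" \
      --output text)
    echo "IndexStatus: $status"
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    attempts=$(( attempts + 1 ))
    sleep 10
  done
  return 1   # still not ACTIVE after the attempt cap
}

# Example call (needs AWS credentials and the lab table):
#   wait_for_gsi LabOrders byStatus "$REGION"
```

Selecting the index by name instead of position [0] keeps the check correct once a table has more than one GSI.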

5. Enable TTL on an expiresAt attribute.

aws dynamodb update-time-to-live --table-name LabOrders --region $REGION \
  --time-to-live-specification "Enabled=true,AttributeName=expiresAt"
aws dynamodb describe-time-to-live --table-name LabOrders --region $REGION

Expected: TTL status ENABLED on expiresAt. (Items get deleted within ~48 h of their epoch timestamp passing — free of charge.)

6. Take an on-demand backup, then list it.

aws dynamodb create-backup --table-name LabOrders --backup-name LabOrders-snap --region $REGION
aws dynamodb list-backups --table-name LabOrders --region $REGION \
  --query "BackupSummaries[].{name:BackupName,status:BackupStatus}" --output table

Expected: a backup with status AVAILABLE.

7. Cleanup.

# delete the backup
BK=$(aws dynamodb list-backups --table-name LabOrders --region $REGION \
  --query "BackupSummaries[0].BackupArn" --output text)
aws dynamodb delete-backup --backup-arn "$BK" --region $REGION
# delete the table (this also removes its GSIs and stream)
aws dynamodb delete-table --table-name LabOrders --region $REGION
aws dynamodb wait table-not-exists --table-name LabOrders --region $REGION

Validation: aws dynamodb describe-table --table-name LabOrders --region $REGION eventually returns ResourceNotFoundException. (If delete-table is blocked, you left deletion protection on; run aws dynamodb update-table --table-name LabOrders --no-deletion-protection-enabled --region $REGION first.)

Cost note (INR-aware): an on-demand table with a handful of items and requests costs effectively nothing. DynamoDB’s free tier includes 25 GB of storage; its 25 RCUs and 25 WCUs apply to provisioned capacity, so on-demand requests are billed per request, but at lab volumes that rounds to almost zero. The things that quietly cost money: provisioned capacity left running (you pay per unit-hour even when idle, which is why this lab used on-demand), GSIs with ALL projection (extra storage + a write per base write), PITR (per-GB), on-demand backups (they persist until deleted; step 7 removes the one from this lab), and a DAX cluster (always-on nodes). Deleting the table and the backup leaves nothing billing.

Common mistakes & troubleshooting

Symptom | Likely cause | Fix
ProvisionedThroughputExceededException / throttling on one key | Hot partition from a low-cardinality or time-based partition key | Choose a high-cardinality key; write-shard skewed keys; rely on adaptive capacity for transient skew.
Throttling though total capacity looks fine (provisioned) | A single partition/key is the bottleneck, not the table total | Same as above; for unpredictable spikes, switch to on-demand.
A query is slow and expensive and reads the whole table | You’re using Scan, not Query | Redesign keys/indexes so the access pattern is a Query/GetItem; never Scan in a hot path.
ValidationException: ... ConsistentRead ... not supported on ... index | Asked for a strongly consistent read on a GSI | GSIs are eventually consistent only; drop --consistent-read or use an LSI/base table.
Writes to the base table suddenly throttle after adding a GSI | The GSI’s (provisioned) write capacity can’t keep up | Provision the GSI generously or use on-demand; project fewer attributes.
“I need to add an LSI but can’t” | LSIs are creation-time only | Use a GSI instead, or recreate the table with the LSI.
Query returns items missing some attributes | Those attributes aren’t projected into the GSI | Use INCLUDE (the needed attrs) or ALL projection; recreate the GSI to change projection.
Item won’t save: Item size has exceeded the maximum | Item exceeds the 400 KB limit | Store the large blob in S3 and keep a pointer in DynamoDB, or split the item.
TTL items still showing up | TTL deletes within ~48 h, not instantly; or wrong attribute type | Add a filter expression excluding expired items; ensure the TTL attribute is a Number holding epoch seconds.
Cross-Region reads seem stale | Global tables are eventually consistent across Regions | Expected; design for it (last-writer-wins / Region-owned keys).
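The "write-shard skewed keys" fix in the first row can be sketched in a few lines of shell. This is an illustration under assumptions, not a prescribed scheme: the function names shard_key and all_shards, the default of 10 shards, and the use of cksum as the hash are all choices made for the example.

```shell
# shard_key BASE ITEM_ID [SHARDS]: append a deterministic suffix #0..#(SHARDS-1)
# to a skewed partition-key value, so writes spread over several partitions.
# The suffix is derived from ITEM_ID, so the same item always maps to the same shard.
shard_key() {
  local base item shards sum
  base=$1
  item=$2
  shards=${3:-10}
  sum=$(printf '%s' "$item" | cksum | cut -d' ' -f1)
  echo "${base}#$(( sum % shards ))"
}

# all_shards BASE [SHARDS]: list every sharded key value for BASE.
# A reader has to Query each of these and merge the results.
all_shards() {
  local base shards i
  base=$1
  shards=${2:-10}
  i=0
  while [ "$i" -lt "$shards" ]; do
    echo "${base}#${i}"
    i=$(( i + 1 ))
  done
}

# Example: a date-based key that would otherwise take all of today's writes.
shard_key "2024-11-29" "order-123"
all_shards "2024-11-29" 3
```

The trade-off is on the read path: one Query per shard instead of one Query in total, so keep the shard count as low as the write rate allows.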

Best practices

Security notes

Interview & exam questions

  1. What is the difference between a partition key and a sort key, and what does each enable? The partition key (hash key) determines which physical partition an item lives on (via an internal hash) and, alone, identifies an item in a simple primary key. Adding a sort key makes a composite key: items with the same partition key are stored together and sorted by sort key, which enables the efficient Query (fetch a partition or a contiguous range). The partition key drives distribution; the sort key drives ordering/range within a partition.

  2. What causes a hot partition and how do you avoid it? Concentrating traffic on one (or few) partition-key values — a low-cardinality key, a time-based key (today’s date), or a single popular item. The table’s total capacity may be fine while one partition throttles. Avoid it with a high-cardinality partition key and by write-sharding skewed keys (suffix #0..#N); adaptive capacity absorbs transient skew but can’t fix a fundamentally bad key.

  3. On-demand vs provisioned capacity: when each? On-demand scales automatically and you pay per request, which suits spiky, unpredictable, or new workloads and avoids throttling from under-provisioning. Provisioned (with auto scaling and optionally reserved capacity) is cheaper per request for steady, predictable traffic if well utilised, but you pay for provisioned units even when idle and you throttle when demand exceeds provisioned plus burst capacity. Switching to on-demand is rate-limited per 24-hour window (historically once per 24 h; check the current quota), so it is not a per-spike lever.

  4. How do you compute RCUs and WCUs? 1 WCU = one write per second of an item up to 1 KB (round the item size up to the next 1 KB; a transactional write costs 2×). 1 RCU = one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads per second (so eventual costs half); a transactional read costs 2 RCUs per 4 KB. Always round the item size up to the next 4 KB first. E.g. a strongly consistent read of an 8 KB item = 2 RCUs; eventually consistent = 1 RCU.

  5. LSI vs GSI — the key differences? An LSI shares the table’s partition key with a different sort key, must be created with the table, shares the base table’s capacity, supports strong reads, and is bound by the 10 GB item-collection limit. A GSI can use any attribute as its key, can be added/removed anytime, has its own capacity, is eventually consistent only, and has no collection-size limit. GSIs are the everyday tool; LSIs are for “same partition, alternate sort, strong read” cases.

  6. What are index projections and why do they matter? A projection controls which attributes are copied into the index: KEYS_ONLY (keys only — smallest, may force a base-table follow-up read), INCLUDE (keys + named attributes — balanced), ALL (every attribute — convenient but largest, and you pay a full index write per base write). For a GSI, queries see only projected attributes. Choose INCLUDE with exactly what your queries return.

  7. What is a DynamoDB Stream and what are the StreamViewType options? An ordered, 24-hour log of item-level changes (create/update/delete) for CDC. StreamViewType selects the payload: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, or NEW_AND_OLD_IMAGES (both images, needed to diff). Changes to a given item appear in the stream in the order they were made, and each change appears exactly once. Consumers (often Lambda via an event source mapping) can still see a record more than once because failed batches are retried, so they must be idempotent.

  8. Eventual vs strong consistency — and where can’t you get strong? Eventual reads (default, half the cost) may briefly miss the latest write; strong reads (full cost, slightly higher latency, less resilient) always return the latest committed write. You cannot get a strong read on a GSI, and global tables are eventually consistent across Regions — “strong” is only ever within one Region.

  9. What does DAX accelerate, and what can’t it do? DAX is a write-through, in-memory cache in front of DynamoDB that turns millisecond reads into microsecond reads for GetItem/Query/Scan. It cannot serve strongly consistent reads (those bypass it) and doesn’t help write-heavy workloads; it runs as provisioned nodes in your VPC (always-on cost).

  10. How do DynamoDB transactions work and what do they cost? TransactWriteItems/TransactGetItems give all-or-nothing ACID across up to 100 items/multiple tables in one Region. They cost double the normal capacity and can fail with TransactionCanceledException on a failed condition or a conflict (retry). For single-item atomicity, a condition expression on PutItem/UpdateItem is cheaper.

  11. What is a global table and what consistency does it offer? A multi-Region, active-active replicated table (built on Streams) giving local low-latency reads/writes per Region and Region-level DR. Cross-Region replication is eventually consistent with last-writer-wins conflict resolution — so design for eventual consistency or give each Region ownership of certain keys.

  12. PITR vs on-demand backups — when each? PITR continuously backs up and restores to any second in the last 35 days (great for “oops” recovery); on-demand backups are manual snapshots retained indefinitely for long-term/compliance needs. Both restore to a new table and don’t consume table capacity. For Region loss, use global tables, not backups.
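The RCU/WCU arithmetic from question 4 is easy to get wrong under pressure, so here is a small sketch of it as shell functions. The function names (ceil_div, wcu, rcu) are hypothetical, sizes are given in bytes, and the figures are per-request consumption; for provisioning you would multiply by requests per second and round up to whole units.

```shell
# ceil_div A B: integer division of A by B, rounded up.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# wcu SIZE_BYTES [standard|transactional]: write units for one write.
# Size is rounded up to the next 1 KB; transactional writes cost double.
wcu() {
  local units
  units=$(ceil_div "$1" 1024)
  if [ "${2:-standard}" = "transactional" ]; then
    units=$(( units * 2 ))
  fi
  echo "$units"
}

# rcu SIZE_BYTES [strong|eventual|transactional]: read units for one read.
# Size is rounded up to the next 4 KB block.
rcu() {
  local blocks
  blocks=$(ceil_div "$1" 4096)
  case "${2:-strong}" in
    strong)        echo "$blocks" ;;
    transactional) echo $(( blocks * 2 )) ;;
    eventual)
      # Half the strong cost; an odd block count leaves a .5 remainder.
      if [ $(( blocks % 2 )) -eq 0 ]; then
        echo $(( blocks / 2 ))
      else
        echo "$(( blocks / 2 )).5"
      fi ;;
    *) echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# The worked example from question 4: an 8 KB item.
echo "8 KB strong:   $(rcu 8192 strong) RCU"
echo "8 KB eventual: $(rcu 8192 eventual) RCU"
echo "1.5 KB write:  $(wcu 1536) WCU"
```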

Quick check

  1. You need every order for one customer, newest first, in one efficient call. What primary-key shape and which operation?
  2. True or false: you can add a Local Secondary Index to an existing table.
  3. A strongly-consistent read of a 10 KB item costs how many RCUs? Eventually consistent?
  4. Your GSI write capacity is too low in provisioned mode — what happens to writes on the base table?
  5. Which StreamViewType do you choose when you need to compute exactly what changed on each update?

Answers

  1. A composite primary key (partition key = customerId, sort key = something time/order-ordered like ORDER#<timestamp>), queried with Query (optionally with ScanIndexForward=false for newest-first). Same partition + sorted = one efficient range read.
  2. False. LSIs can be created only at table creation; for an existing table use a GSI.
  3. 10 KB rounds up to 12 KB → 3 × 4 KB = 3 RCUs strongly consistent; eventually consistent is half, so 1.5 RCUs.
  4. The base table’s writes are throttled too — DynamoDB won’t let the GSI fall arbitrarily behind. Provision the GSI generously or use on-demand.
  5. NEW_AND_OLD_IMAGES — it carries both the before and after item so the consumer can diff them.
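Answer 1 can be made concrete with the aws CLI. The sketch below assumes a table whose partition key attribute is named customerId, as in the answer; the table name Orders, the helper names, and the default limit of 10 are hypothetical. Only the JSON helper runs when the block is sourced; the query itself is wrapped in a function you call once the table exists.

```shell
# key_values CUSTOMER_ID: build the JSON for --expression-attribute-values.
key_values() { printf '{":c":{"S":"%s"}}' "$1"; }

# newest_orders CUSTOMER_ID [LIMIT]: one Query against a single partition,
# returned in descending sort-key order (newest first).
# Assumes $REGION is set and the table (default name Orders) has
# partition key customerId and a time-ordered sort key.
newest_orders() {
  aws dynamodb query \
    --table-name "${TABLE:-Orders}" --region "$REGION" \
    --key-condition-expression "customerId = :c" \
    --expression-attribute-values "$(key_values "$1")" \
    --no-scan-index-forward \
    --limit "${2:-10}"
}

# Show the attribute values that would be sent for one customer.
key_values "CUST42"; echo
```

--no-scan-index-forward is the CLI spelling of ScanIndexForward=false. Because the key condition pins a single partition, the read cost scales with the items returned rather than with the size of the table.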

Exercise

Design the DynamoDB table(s) and indexes for a multi-tenant SaaS task tracker that must serve these access patterns: (a) get a single task by id; (b) list all tasks in a project, sorted by due date; (c) list all tasks assigned to a user across projects, filtered by status; (d) expire tasks 90 days after completion automatically; (e) react to every task change to update a per-project “open task count”. For each access pattern, state the key or index and the operation (GetItem/Query) you would use, and justify why none of them needs a Scan. Specify: the partition/sort key for the base table and how you avoid a hot partition for a very large project; the GSI(s) with their keys and projection (and why that projection); how you implement (d) with TTL; and how you implement (e) with Streams (which StreamViewType and why, and how you keep the consumer idempotent). Then choose a capacity mode with a one-line justification, and decide whether this design warrants DAX and/or a global table. Finally, write the aws dynamodb create-table and update-table (GSI) commands to build it.

Certification mapping

Glossary

Next steps

You now know DynamoDB end to end: the data model and partition/sort keys, how partitioning and hashing cause and prevent hot partitions, both capacity modes and the RCU/WCU maths, LSIs vs GSIs and projections, Streams and CDC, TTL, DAX, transactions, the consistency model, global tables, and PITR/backups/encryption.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
