Relational instincts are the single biggest reason DynamoDB projects go sideways. You normalize entities into tables, then discover the only join DynamoDB offers is the one you pre-compute at write time. Single-table design is the discipline of collapsing many entity types into one physical table so the queries your application runs become single-partition reads. It is not about saving on table count — it is about co-locating related items so a Query returns a parent and its children in one round trip, at single-digit-millisecond latency, regardless of table size.
This is the process I follow on every greenfield DynamoDB design: enumerate access patterns first, design keys to satisfy them, overload indexes, then defend the schema against hot partitions and the 400 KB item ceiling. The order matters. Start from the entities and you will refactor; start from the queries and you will ship.
1. Working backward: enumerate access patterns before touching keys
The schema is downstream of the queries. Before naming a single attribute, write the complete list of access patterns the service needs — every read and write, with its filter, its sort, and its cardinality. This is the artifact you review with the team, not the data model.
For a multi-tenant order-management service, the list looks like this:
| # | Access pattern | Type | Frequency |
|---|---|---|---|
| A1 | Get a customer by ID | Read | High |
| A2 | Get all orders for a customer, newest first | Read | High |
| A3 | Get a single order with its line items | Read | High |
| A4 | List orders in a status (e.g. SHIPPED) for a customer |
Read | Medium |
| A5 | Get all open orders across all customers (ops dashboard) | Read | Low |
| A6 | Create order + line items atomically | Write | High |
| A7 | Update order status | Write | High |
The cardinality column is not decoration. A5 — “all open orders across all customers” — will create a hot partition if modeled naively, because every write funnels into one item collection. Flag those now.
Three rules I hold to:
- No
Scanin the steady state. If a pattern can only be served by aScanwith aFilterExpression, the model is wrong.Scanreads every item and then filters, so you pay read capacity for data you discard. - Filtering is not querying.
FilterExpressionruns after the key query, before results return. It reduces payload, never capacity consumed or items examined. Design keys so theKeyConditionExpressiondoes the selection. - Every pattern maps to exactly one
Query/GetItemon the base table or a GSI. If a pattern needs two queries, you have a missing index or a missing pre-computed item.
2. Primary key design: partition/sort composition and entity overloading
DynamoDB gives you a composite primary key: a partition key (PK, decides the physical partition via an internal hash) and a sort key (SK, orders items within that partition). The power of single-table design comes from entity overloading — naming the keys generically (PK, SK) so different entity types can share the table, and encoding the type into the value with a prefix.
Here is the item collection that satisfies A1, A2, A3, and A6 — a customer and all of their orders and line items live under one partition key:
PK SK attributes
------------------- ------------------------ ----------------------------------
CUST#a1b2 PROFILE name, email, tier, createdAt
CUST#a1b2 ORDER#2026-06-01#o-9001 status=OPEN, total=149.00
CUST#a1b2 ORDER#2026-06-03#o-9044 status=SHIPPED, total=72.50
ORDER#o-9001 ITEM#001 sku=ABC, qty=2, price=49.50
ORDER#o-9001 ITEM#002 sku=XYZ, qty=1, price=50.00
Two design choices do the heavy lifting:
- Prefixes make sort keys range-queryable by type. A2 (“orders for a customer, newest first”) is
PK = CUST#a1b2 AND begins_with(SK, "ORDER#"), withScanIndexForward = falseto reverse the sort. Because the order date is the first sortable component of the SK (ISO-8601), newest-first falls out for free. - Line items hang off the order, not the customer. A3 (“an order with its items”) is
PK = ORDER#o-9001— one query returns the order metadata row and itsITEM#rows, because they share a partition. This is the adjacency-list pattern.
Writing A6 (order plus line items, atomically) uses TransactWriteItems, covered in Section 7.
# A2: all orders for a customer, newest first (DynamoDB JSON via AWS CLI)
aws dynamodb query \
--table-name app-main \
--key-condition-expression "PK = :pk AND begins_with(SK, :prefix)" \
--expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":prefix":{"S":"ORDER#"}}' \
--no-scan-index-forward
A practical rule for sort-key composition: order the components from coarsest to finest, and only put something in the SK if you will range-query or sort on it.
ORDER#<date>#<orderId>lets you filter a date range withBETWEEN;ORDER#<orderId>#<date>does not.
3. Global secondary indexes: sparse indexes, index overloading, projections
The base table answers patterns keyed on the customer or the order. A4 and A5 need a different access shape — that is what a Global Secondary Index is for: an alternate (PK, SK) over the same items, maintained asynchronously on every write.
Three GSI techniques carry single-table design:
Index overloading. Add generic attributes GSI1PK / GSI1SK and let each entity type populate them with whatever it needs to be found by. One physical index serves many logical patterns. For A4 (“orders in a status for a customer”), order items set:
GSI1PK = CUST#a1b2#SHIPPED GSI1SK = 2026-06-03#o-9044
A4 becomes Query on GSI1 with GSI1PK = CUST#a1b2#SHIPPED.
Sparse indexes. An item appears in a GSI only if it has both of that index’s key attributes — a feature, not a limitation. For A5 (“all open orders across all customers”), do not index every order, only OPEN ones. Write GSI2PK = "OPEN" only while the order is open, and remove the attribute when it ships. The index then holds exactly the working set of open orders, so the ops query touches a fraction of the data. This is the canonical sparse-index pattern: a queue you Query by presence.
# A7: status -> SHIPPED, which REMOVES the item from the sparse "open orders" GSI
aws dynamodb update-item \
--table-name app-main \
--key '{"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-01#o-9001"}}' \
--update-expression "SET #s = :shipped REMOVE GSI2PK, GSI2SK" \
--expression-attribute-names '{"#s":"status"}' \
--expression-attribute-values '{":shipped":{"S":"SHIPPED"}}'
Projection choices. A GSI stores a copy of attributes, billed as extra storage and extra write capacity on every base-table write that touches a projected attribute. Choose deliberately:
KEYS_ONLY— index plus base key attributes only. Smallest, cheapest. Use when you only need to find the key, thenGetItemthe full record.INCLUDE— keys plus a named list. The right default: project exactly what the index’s queries read.ALL— every attribute. Most expensive in storage and write cost. Reserve it for indexes whose queries genuinely need the whole item.
Project narrowly and widen later: you can create a new GSI online, but you cannot shrink a projection in place.
ALLon a wide, hot item is a line item you will see on the bill.
GSIs have their own provisioned throughput (or share the table’s on-demand capacity). Critically, under provisioned mode, if a GSI is throttled, base-table writes throttle too — an under-provisioned index becomes a write bottleneck for the whole table.
4. Modeling relationships: adjacency lists, hierarchies, many-to-many
Single-table design models relationships by placement, not by joins.
One-to-many (adjacency list). Already shown: parent and children share a partition (ORDER#o-9001 owns its ITEM# rows). One Query returns the aggregate.
Hierarchies. Encode the path in the sort key. A category tree — Electronics > Audio > Headphones — stores SK = CATEGORY#Electronics#Audio#Headphones. begins_with(SK, "CATEGORY#Electronics#Audio") returns the whole subtree in one query, because lexicographic ordering on the delimited path mirrors the tree.
Many-to-many. The relational answer is a join table; the DynamoDB answer is to materialize both directions of the edge and flip them with a GSI. For users-in-groups: store a membership item, then use GSI1 to invert PK and SK.
PK SK GSI1PK GSI1SK
--------------- --------------- --------------- ---------------
USER#u1 GROUP#g1 GROUP#g1 USER#u1
USER#u1 GROUP#g2 GROUP#g2 USER#u1
USER#u2 GROUP#g1 GROUP#g1 USER#u2
- “Which groups is user
u1in?” -> base table,PK = USER#u1 AND begins_with(SK, "GROUP#"). - “Which users are in group
g1?” -> GSI1,GSI1PK = GROUP#g1.
One item, two access directions, no second write to keep in sync. When the relationship carries denormalized data (a group name shown on the user’s view), accept the duplication and reconcile it with DynamoDB Streams (Section 8) rather than reading two items per query.
5. Write sharding to avoid hot partitions and throttling
DynamoDB spreads data across partitions by hashing the partition key. Two failure modes follow: a single partition key taking disproportionate traffic (a hot key), and the hard physical ceiling — a single partition sustains roughly 1,000 WCU and 3,000 RCU. Exceed either and you throttle, even if table-level capacity looks healthy.
Adaptive capacity helps but does not excuse key design. DynamoDB shifts capacity toward busy partitions and can isolate a single hot item, but it cannot exceed those per-partition limits. A key that needs more than 1,000 WCU must spread across multiple physical partitions — which means more than one partition-key value.
Time-series keys are the classic trap. A PK of the current date sends every write today to one partition; yesterday’s is cold. If you must key on time, write-shard: append a calculated suffix to fan writes across N logical partitions.
import hashlib
SHARDS = 10 # tune to required WCU / 1000, rounded up
def shard_suffix(item_id: str, shards: int = SHARDS) -> int:
# Deterministic so the read side can recompute it
h = hashlib.md5(item_id.encode()).hexdigest()
return int(h, 16) % shards
# write: PK = "ORDER#2026-06-08#7" (date + shard)
pk = f"ORDER#2026-06-08#{shard_suffix('o-9001')}"
The tradeoff is explicit: reading all of today’s orders now means N queries (PK = ORDER#2026-06-08#0 … #9) merged client-side. Sharding trades read fan-out for write throughput. Two ways to pick the suffix:
- Calculated suffix (above) — deterministic from a key attribute, so a point read recomputes the exact shard. Best when you read by ID.
- Random suffix —
random.randint(0, N-1). Maximizes spread, but you can only read by scattering across all N shards. Best for pure write-then-batch-read workloads.
Diagnose hot partitions with CloudWatch Contributor Insights for DynamoDB, which surfaces the most-accessed partition keys.
ThrottledRequestson a table that is nowhere near its provisioned total is the signature of a hot key, not insufficient capacity.
6. Capacity modes: on-demand vs provisioned with auto scaling
Two billing models, and the choice is about traffic shape, not just volume.
On-demand bills per request, scales instantly, and needs zero capacity planning. It keeps prior peaks warm so it can double instantly from the previous high-water mark, but a genuine cold 10x spike can still throttle for a moment. Use it for new tables (unknown traffic), spiky workloads, and dev/test.
Provisioned reserves RCU/WCU and is materially cheaper per request for steady, predictable load — but you pay for that capacity whether you use it or not. Pair it with Application Auto Scaling, which tracks a target utilization (typically 70%) between a min and max. It reacts on a CloudWatch alarm timescale (a minute or two): good for gentle diurnal curves, poor at absorbing sharp spikes.
# Provisioned table with target-tracking auto scaling (Terraform)
resource "aws_dynamodb_table" "main" {
name = "app-main"
billing_mode = "PROVISIONED"
hash_key = "PK"
range_key = "SK"
read_capacity = 50
write_capacity = 50
attribute { name = "PK" type = "S" }
attribute { name = "SK" type = "S" }
}
resource "aws_appautoscaling_target" "write" {
service_namespace = "dynamodb"
resource_id = "table/${aws_dynamodb_table.main.name}"
scalable_dimension = "dynamodb:table:WriteCapacityUnits"
min_capacity = 50
max_capacity = 2000
}
resource "aws_appautoscaling_policy" "write" {
name = "write-target-70"
service_namespace = aws_appautoscaling_target.write.service_namespace
resource_id = aws_appautoscaling_target.write.resource_id
scalable_dimension = aws_appautoscaling_target.write.scalable_dimension
policy_type = "TargetTrackingScaling"
target_tracking_scaling_policy_configuration {
target_value = 70.0
predefined_metric_specification {
predefined_metric_type = "DynamoDBWriteCapacityUtilization"
}
}
}
For a stable baseline, reserved capacity discounts that floor in exchange for a one- or three-year commitment — buy it for the min, let auto scaling handle the rest. You can switch a table between on-demand and provisioned only once every 24 hours, so the mode is not a runtime knob.
7. Transactions, condition expressions, and optimistic concurrency
A PutItem is atomic for one item and immediately visible to strongly-consistent reads on the base table. The harder guarantees come from three tools.
Condition expressions make a write conditional and reject it atomically otherwise. The most important one prevents blind overwrites: attribute_not_exists(PK) makes PutItem an insert, failing with ConditionalCheckFailedException if the item already exists.
Optimistic concurrency uses a version attribute so a lost update is rejected rather than silently clobbered:
# Update only if version is unchanged; bump it in the same call
aws dynamodb update-item \
--table-name app-main \
--key '{"PK":{"S":"ORDER#o-9001"},"SK":{"S":"META"}}' \
--update-expression "SET #st = :new, version = :nextv" \
--condition-expression "version = :curv" \
--expression-attribute-names '{"#st":"status"}' \
--expression-attribute-values '{":new":{"S":"PAID"},":curv":{"N":"7"},":nextv":{"N":"8"}}'
If another writer bumped version to 8 first, the condition version = 7 fails; you re-read and retry. No locks, no contention beyond the conflicting writers.
TransactWriteItems gives all-or-nothing across up to 100 items (and multiple tables), each with its own condition. This is how A6 inserts an order and its line items atomically:
{
"TransactItems": [
{ "Put": {
"TableName": "app-main",
"Item": {"PK":{"S":"CUST#a1b2"},"SK":{"S":"ORDER#2026-06-08#o-9100"},"status":{"S":"OPEN"}},
"ConditionExpression": "attribute_not_exists(PK)"
}},
{ "Put": {
"TableName": "app-main",
"Item": {"PK":{"S":"ORDER#o-9100"},"SK":{"S":"ITEM#001"},"sku":{"S":"ABC"}}
}}
]
}
Two costs to internalize: a transactional write consumes 2x the WCU of the same non-transactional write (prepare plus commit), and a transaction fails entirely if any condition fails or it collides with another transaction on the same item (TransactionCanceledException, with per-item reasons). Use transactions where you need the invariant; do not wrap every write in one.
8. Migrations and backfills: evolving the schema without downtime
Single-table schemas evolve constantly — a new access pattern means a new GSI or a derived attribute. DynamoDB is schemaless at the item level, so adding attributes needs no migration. The work is in indexes and backfills.
Adding a GSI is an online operation. UpdateTable with a GSI create returns immediately; DynamoDB backfills it in the background while the table stays fully available. The index reports CREATING then ACTIVE — do not query it until ACTIVE, and watch OnlineIndexPercentageProgress. A table allows at most 20 GSIs, with only one create or delete in flight at a time.
aws dynamodb update-table \
--table-name app-main \
--attribute-definitions \
AttributeName=GSI3PK,AttributeType=S AttributeName=GSI3SK,AttributeType=S \
--global-secondary-index-updates '[{
"Create": {
"IndexName": "GSI3",
"KeySchema": [
{"AttributeName":"GSI3PK","KeyType":"HASH"},
{"AttributeName":"GSI3SK","KeyType":"RANGE"}
],
"Projection": {"ProjectionType":"INCLUDE","NonKeyAttributes":["status","total"]}
}
}]'
A new GSI only contains items that already carry GSI3PK/GSI3SK. Existing items stay invisible to it until you write those attributes — that is the backfill.
Backfill with a throttled job, not a Scan-and-update loop that melts capacity. Pattern: parallel Scan with Segment/TotalSegments, transform each item, UpdateItem the new attributes, rate-limited against provisioned capacity. AWS Glue, Step Functions, or a Lambda fan-out are the usual harnesses. Make the transform idempotent (a condition like attribute_not_exists(GSI3PK) so re-runs skip done items) and write-shard the target if the new key would be hot.
For continuous reshaping, use DynamoDB Streams. A Lambda reacts to every change to keep a denormalized copy or new index attribute current — the same machinery that reconciles the many-to-many duplication from Section 4. For a one-time bulk transform across a huge table, export to S3 (a point-in-time export that consumes no read capacity), transform with Athena or Glue, and BatchWriteItem the result back, keeping the migration entirely off the live table’s capacity.
Verify
Prove the model against the access-pattern list, not against intuition.
# 1. Every steady-state read is a Query/GetItem, never a Scan.
# Run each access pattern and confirm it returns in one call.
aws dynamodb query --table-name app-main \
--key-condition-expression "PK = :pk AND begins_with(SK, :p)" \
--expression-attribute-values '{":pk":{"S":"CUST#a1b2"},":p":{"S":"ORDER#"}}' \
--return-consumed-capacity TOTAL
# 2. Confirm the sparse GSI holds only the working set (open orders),
# not every order.
aws dynamodb query --table-name app-main --index-name GSI2 \
--key-condition-expression "GSI2PK = :open" \
--expression-attribute-values '{":open":{"S":"OPEN"}}' \
--select COUNT
# 3. Look for throttling that indicates a hot partition.
aws cloudwatch get-metric-statistics \
--namespace AWS/DynamoDB --metric-name ThrottledRequests \
--dimensions Name=TableName,Value=app-main \
--start-time "$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)" \
--end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
--period 300 --statistics Sum
--return-consumed-capacity TOTALreveals real RCU cost; a query consuming far more than the rows returned is reading and filtering — fix the keys.- Non-zero
ThrottledRequestswhileConsumedWriteCapacityUnitssits below provisioned confirms a hot key. Cross-check it in Contributor Insights. - After a backfill,
--select COUNTon old and new indexes should reconcile; a mismatch means the job missed items or the idempotency condition skipped too many.
Enterprise scenario
A logistics platform tracked shipment events — scans, status changes, geofence crossings — in a table keyed PK = SHIPMENT#<id>, SK = EVENT#<timestamp>. It worked until peak season. A handful of mega-shipments (pallets with thousands of scanned parcels) accumulated tens of thousands of events under one partition key. Two failures hit at once: those hot partitions threw ThrottledRequests while the table sat at 40% of provisioned capacity (the per-partition 1,000-WCU ceiling, not table capacity), and a per-shipment rollup item approached the 400 KB item-size limit as they appended to a growing summary array.
The fix was two changes, both deployable without downtime. First, write-shard the event partition for high-volume shipments — PK = SHIPMENT#<id>#<shard> with a calculated suffix from the event ID — fanning hot shipments across 10 partitions. Full-history reads became 10 parallel queries merged client-side, acceptable because that query was rare and the per-event writes were the hot path. Second, they modeled each rollup increment as its own item instead of an unbounded array, sidestepping the 400 KB ceiling, with a Streams Lambda maintaining a small fixed-size “latest status” item for the dashboard read.
# Shard only high-volume shipments; keep small ones single-partition
# so their reads stay a single query.
def event_pk(shipment_id: str, event_id: str, high_volume: bool) -> str:
if not high_volume:
return f"SHIPMENT#{shipment_id}"
shard = int(hashlib.md5(event_id.encode()).hexdigest(), 16) % 10
return f"SHIPMENT#{shipment_id}#{shard}"
The lesson: adaptive capacity had silently smoothed things for two years, so the team assumed the key design was fine. It was below the per-partition limit only until peak traffic pushed a few keys over it. Hot-partition risk is a function of the busiest key, not the average — and it stays invisible until the day it isn’t.