The Cardinal Property: Re-Runnable Or Not A Migration
A migration script that you can run twice and get the same final state is idempotent. A migration script that you run twice and get duplicate rows, half the data, or a corrupt state is a single-shot weapon. Every production migration script must be idempotent — because the alternative is “we can’t retry on failure, so we’d better hope nothing fails.”
The cardinal property has four pillars:
| Pillar | What it provides | Failure if missing |
|---|---|---|
| Checkpoint | Resume from where you stopped | Re-process N hours of data on every retry |
| Watermark | Know what you’ve already processed | Duplicates from re-reading the same source range |
| Staging table | Atomic cutover at the end | Partial state visible to readers mid-migration |
| Back-out plan | Reverse the change cleanly | “We can’t roll back” → 6-hour outage |
Every script in this lesson encodes these pillars. The companion lib/migrate.sh makes them one-liner calls.
Pillar 1: Checkpoint Files — Resume From The Middle
A migration that processes 10 million rows fails at row 4 million. Without a checkpoint, retry starts from zero — re-processing 4M rows you already wrote and risking duplicates. With a checkpoint, retry starts at 4M+1.
The pattern:
#!/usr/bin/env bash
set -euo pipefail
readonly STATE=/var/lib/migrate/users-2026-06-22.state
readonly BATCH=1000
mkdir -p "$(dirname "$STATE")"
last=$(cat "$STATE" 2>/dev/null || echo 0)
while :; do
# Read next batch from source, starting after the last processed id
rows=$(psql --tuples-only --no-align -F$'\t' --command="
SELECT id, name, email FROM users_old
WHERE id > $last ORDER BY id ASC LIMIT $BATCH
")
[[ -z "$rows" ]] && break
# Process the batch — apply transformations, write to destination
while IFS=$'\t' read -r id name email; do
upsert_user "$id" "$name" "$email"
last="$id"
done <<< "$rows"
# Persist checkpoint AFTER the batch — atomic write via temp+mv
echo "$last" > "$STATE.tmp"
mv "$STATE.tmp" "$STATE"
done
echo "Migration complete; last processed id=$last"
Critical details:
- Atomic checkpoint write: temp file + mv. A crash mid-write of a non-atomic file would leave the checkpoint corrupted, defeating the whole point.
- Checkpoint after batch, not after each row: rows-per-batch is a knob. Smaller = finer-grained recovery, slower throughput. Common choice: 100-10,000 depending on cost-per-row of the upsert.
- Checkpoint file is per-migration, dated:
users-2026-06-22.state. Don’t reuse files across migrations — confusion guaranteed. ORDER BY id ASC: the checkpoint discipline only works if the source has a stable, monotonic key. If your source table has no such key, you must add one (a serial column or a composite key) before migrating.
When The Source Has No Monotonic Key
If you’re migrating from a key-value store or a denormalized log, you may not have an integer id. Solutions:
- Hash-based pagination: process all rows where
hash(key) % 100 = 0, then1, then2, etc. Each shard checkpoints independently. - Time-based: source has a
created_atcolumn. Watermark by timestamp (next pillar). - External enumeration: list all keys to a flat file at start, process the flat file with line-number checkpoint.
The last option is most robust for “mostly static” sources because the enumeration is taken once at the start and the migration is reading from a stable offline list, not the live source.
Pillar 2: Watermarks — The Key To Incremental ETL
A watermark is “the highest source value we’ve already processed.” It’s a checkpoint that’s also a resume cursor for incremental ETL — the migration runs once for backfill, then runs every hour to capture new rows.
#!/usr/bin/env bash
# incremental-etl.sh — runs every hour from cron
set -euo pipefail
readonly WATERMARK=/var/lib/etl/orders.watermark
readonly LOG=/var/log/etl/orders.log
mkdir -p "$(dirname "$WATERMARK")" "$(dirname "$LOG")"
last_ts=$(cat "$WATERMARK" 2>/dev/null || echo "1970-01-01T00:00:00")
# Read everything modified since the watermark
new_max=$(psql --tuples-only --no-align --command="
WITH new_rows AS (
SELECT * FROM orders WHERE updated_at > '$last_ts' ORDER BY updated_at ASC
)
SELECT max(updated_at) FROM new_rows
")
# Process the rows — left as exercise to reader (call upsert_order on each)
psql --tuples-only --no-align -F$'\t' --command="
SELECT id, customer_id, total, updated_at FROM orders
WHERE updated_at > '$last_ts' ORDER BY updated_at ASC
" | while IFS=$'\t' read -r id customer total ts; do
upsert_order "$id" "$customer" "$total" "$ts"
done
# Advance the watermark — only if processing succeeded (set -e bails otherwise)
if [[ -n "$new_max" && "$new_max" != "" ]]; then
echo "$new_max" > "$WATERMARK.tmp"
mv "$WATERMARK.tmp" "$WATERMARK"
fi
The watermark advances monotonically. If the script fails midway, the watermark stays at the last successful run; the next hour’s run picks up everything since then.
Watermark + Late-Arriving Data
What if a row’s updated_at is set to a time before the current watermark (clock skew on writers, or out-of-order replication)? Late-arriving data is invisible to the watermark, and silently lost.
Mitigations:
- Lookback window: process anything in the last 24 hours, not just since the watermark. Idempotent upserts make duplicates harmless.
cutoff=$(date -u -d "24 hours ago" -Iseconds) effective_watermark=$(awk -v a="$last_ts" -v b="$cutoff" 'BEGIN { print (a < b) ? a : b }') - Source-level event time + watermark: add
event_timeseparate fromupdated_at. Watermark onevent_time, process by both. - Reconciliation pass: a separate weekly job that diff-counts source vs. destination and re-syncs any drift.
Reconciliation passes are the gold standard. The hourly watermark gives you near-realtime; the weekly recon catches everything else.
Pillar 3: Staging Tables — The Atomic Cutover
When migrating to a new schema or a new system, you have a window where:
- Source has all data, destination has none.
- You want to backfill destination without writers seeing inconsistent state.
- At the end, you want to atomically swap source for destination.
The staging table pattern:
- Create destination table
users_newwith new schema. - Backfill
users_newfromusers_old(resumable with checkpoint). - Verify row counts and checksums match.
- In a single transaction:
BEGIN; ALTER TABLE users RENAME TO users_old_archive; ALTER TABLE users_new RENAME TO users; COMMIT;.
The RENAME TO is metadata-only and runs in milliseconds. Readers see consistent state before and after; there’s no in-between.
#!/usr/bin/env bash
# users-migration.sh
set -euo pipefail
source /usr/local/lib/migrate.sh
readonly SRC=users_old
readonly STAGE=users_new
readonly FINAL=users
# 1. Create staging table
psql --command="
CREATE TABLE IF NOT EXISTS $STAGE (
id BIGINT PRIMARY KEY,
name TEXT NOT NULL,
email_norm TEXT NOT NULL,
email_domain TEXT GENERATED ALWAYS AS (split_part(email_norm,'@',2)) STORED
);
CREATE INDEX IF NOT EXISTS $STAGE_email_idx ON $STAGE (email_norm);
"
# 2. Backfill (resumable with checkpoint)
migrate_resumable_batch "$SRC" "$STAGE" 1000 backfill_user_record
# 3. Verify
src_count=$(psql --tuples-only --no-align --command="SELECT count(*) FROM $SRC")
stg_count=$(psql --tuples-only --no-align --command="SELECT count(*) FROM $STAGE")
if (( src_count != stg_count )); then
echo "FAIL: src=$src_count stage=$stg_count"
exit 1
fi
# Domain-specific verification — sample a random subset and diff
migrate_sample_diff "$SRC" "$STAGE" 100 || exit 1
# 4. Atomic cutover
psql --command="
BEGIN;
ALTER TABLE $FINAL RENAME TO ${SRC}_archived;
ALTER TABLE $STAGE RENAME TO $FINAL;
COMMIT;
"
echo "OK: cutover complete. Old data in ${SRC}_archived (drop after 30 days)"
The 30-day retention on users_old_archived is your back-out window: if a problem surfaces in week 2, you can reverse the cutover by running the rename in opposite order.
When The Source Is The Live Production Table
The pattern above assumes you can read from users_old while it’s still being written. For most cases, this is fine because:
- Backfill reads at a snapshot (Postgres MVCC).
- New writes during backfill are captured by a trigger on the source that mirrors them to the staging table.
CREATE OR REPLACE FUNCTION users_mirror_trigger() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
INSERT INTO users_new (id, name, email_norm)
VALUES (NEW.id, NEW.name, lower(trim(NEW.email)))
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email_norm = EXCLUDED.email_norm;
ELSIF TG_OP = 'DELETE' THEN
DELETE FROM users_new WHERE id = OLD.id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_mirror AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION users_mirror_trigger();
Pattern: install trigger → start backfill → verify → cutover (RENAME TO) → drop trigger after observed stable.
This is exactly what gh-ost does for online schema change. For the shell version, the discipline is to install the trigger first, then start the backfill — so any writes during backfill are captured.
Pillar 4: The Back-Out Plan
Every migration must have a documented, tested back-out path. Without it, you’re betting the company on the migration script being perfect.
A back-out plan answers:
- What is reversed? (“Drop the new column”, “Rename users_new back to users_old”, “Restore from backup”.)
- At what point in the migration is each step viable? (Pre-cutover: just delete staging table. Post-cutover-pre-archive-drop: rename back. Post-archive-drop: restore from backup.)
- What is the RTO of the back-out? (Pre-cutover: seconds. Post-archive-drop: 4 hours from PITR.)
- What data loss does back-out cause? (Writes since cutover are lost — measure how many.)
Sample Back-Out Document
# Back-out plan: users-migration v1
## Stage A — pre-staging-table-creation
- Action: nothing, no state has changed.
- RTO: 0.
## Stage B — staging table created, backfill in progress
- Action: `DROP TABLE users_new; -- and disable trigger if installed`
- RTO: < 1 min.
- Data loss: none (source untouched).
## Stage C — backfill complete, before cutover
- Action: drop staging table + trigger.
- RTO: < 5 min.
- Data loss: none.
## Stage D — cutover done, archive table still present (within 30 days)
- Action:
```sql
BEGIN;
ALTER TABLE users RENAME TO users_new_archived;
ALTER TABLE users_old_archived RENAME TO users;
COMMIT;
- RTO: < 5 min.
- Data loss: any writes since cutover are lost. Measure via:
SELECT max(updated_at) FROM users WHERE updated_at > '<cutover_time>';
Stage E — archive table dropped (post day 30)
- Action: PITR restore from backup taken at cutover.
- RTO: 2-4 hours.
- Data loss: 0 with PITR; full data since cutover replayed.
The discipline is to **rehearse stages B-D in a staging environment** before going live. Stage E should be tested at least once a year as part of standard DR drills.
## The Drop-In `lib/migrate.sh`
```bash
# lib/migrate.sh — sourced helpers for migration scripts.
#
# Required env:
# MIGRATION_NAME — short identifier (e.g., users-2026-06-22)
#
# Optional env:
# MIGRATION_STATE_DIR — default /var/lib/migrate
# DRY_RUN — true|false, default false
set -o errexit -o nounset -o pipefail
: "${MIGRATION_NAME:?MIGRATION_NAME must be set}"
: "${MIGRATION_STATE_DIR:=/var/lib/migrate}"
: "${DRY_RUN:=false}"
readonly MIGRATE_STATE="$MIGRATION_STATE_DIR/$MIGRATION_NAME.state"
readonly MIGRATE_LOG="$MIGRATION_STATE_DIR/$MIGRATION_NAME.log"
migrate_log() {
printf '[%s] [%s] %s\n' "$(date -Iseconds)" "$MIGRATION_NAME" "$*" \
| tee -a "$MIGRATE_LOG"
}
migrate_init() {
mkdir -p "$MIGRATION_STATE_DIR"
}
# Atomic checkpoint write
migrate_checkpoint_write() {
local value="$1"
echo "$value" > "$MIGRATE_STATE.tmp"
mv "$MIGRATE_STATE.tmp" "$MIGRATE_STATE"
}
migrate_checkpoint_read() {
cat "$MIGRATE_STATE" 2>/dev/null || echo 0
}
# Resumable batch loop. Args: src_table, dest_table, batch_size, process_fn
# process_fn is called with (id, line) for each row.
migrate_resumable_batch() {
local src="$1" dest="$2" batch="$3" process_fn="$4"
local last total=0
migrate_init
last=$(migrate_checkpoint_read)
migrate_log "START batch migration: src=$src dest=$dest batch=$batch resume_from=$last"
while :; do
local rows
rows=$(psql --tuples-only --no-align -F$'\t' --command="
SELECT * FROM $src WHERE id > $last ORDER BY id ASC LIMIT $batch
")
[[ -z "$rows" ]] && break
local batch_count=0
while IFS=$'\t' read -r id rest; do
[[ -z "$id" ]] && continue
if $DRY_RUN; then
migrate_log "DRY-RUN: would process id=$id"
else
"$process_fn" "$id" "$rest"
fi
last="$id"
batch_count=$((batch_count + 1))
done <<< "$rows"
migrate_checkpoint_write "$last"
total=$((total + batch_count))
migrate_log "batch=$batch_count total=$total last=$last"
done
migrate_log "DONE: total=$total final_id=$last"
}
# Watermark-based incremental. Args: src_table, watermark_col, process_fn
migrate_watermark_incremental() {
local src="$1" wmcol="$2" process_fn="$3"
local last new_max
migrate_init
last=$(migrate_checkpoint_read)
[[ -z "$last" || "$last" == "0" ]] && last="1970-01-01T00:00:00"
migrate_log "START watermark: src=$src col=$wmcol since=$last"
# Get new max for atomic advance
new_max=$(psql --tuples-only --no-align --command="
SELECT max($wmcol) FROM $src WHERE $wmcol > '$last'
")
[[ -z "$new_max" ]] && { migrate_log "no new rows since $last"; return 0; }
psql --tuples-only --no-align -F$'\t' --command="
SELECT * FROM $src WHERE $wmcol > '$last' AND $wmcol <= '$new_max'
ORDER BY $wmcol ASC
" | while IFS=$'\t' read -r line; do
if $DRY_RUN; then
migrate_log "DRY-RUN: $line"
else
"$process_fn" "$line"
fi
done
if ! $DRY_RUN; then
migrate_checkpoint_write "$new_max"
fi
migrate_log "DONE: advanced watermark to $new_max"
}
# Verify counts match between source and destination. Args: src, dest, [where_clause]
migrate_verify_counts() {
local src="$1" dest="$2" where="${3:-}"
local src_count dest_count
src_count=$(psql --tuples-only --no-align --command="SELECT count(*) FROM $src ${where:+WHERE $where}")
dest_count=$(psql --tuples-only --no-align --command="SELECT count(*) FROM $dest ${where:+WHERE $where}")
migrate_log "VERIFY: src=$src_count dest=$dest_count"
if [[ "$src_count" != "$dest_count" ]]; then
migrate_log "FAIL: count mismatch (delta=$((src_count - dest_count)))"
return 1
fi
migrate_log "OK: counts match"
}
# Sample-and-diff verifier. Args: src, dest, sample_size
migrate_sample_diff() {
local src="$1" dest="$2" n="${3:-100}"
local mismatches=0
for id in $(psql --tuples-only --no-align --command="SELECT id FROM $src ORDER BY random() LIMIT $n"); do
local s d
s=$(psql --tuples-only --no-align --command="SELECT md5(row($src.*)::text) FROM $src WHERE id=$id")
d=$(psql --tuples-only --no-align --command="SELECT md5(row($dest.*)::text) FROM $dest WHERE id=$id")
if [[ "$s" != "$d" ]]; then
migrate_log "DIFF id=$id"
mismatches=$((mismatches + 1))
fi
done
migrate_log "Sample diff: $mismatches/$n mismatches"
(( mismatches == 0 ))
}
# Atomic cutover via rename. Args: live_table, staging_table, archive_suffix
migrate_cutover() {
local live="$1" stage="$2" suffix="${3:-_archived_$(date +%Y%m%d)}"
migrate_log "CUTOVER: $live → ${live}${suffix}; $stage → $live"
if $DRY_RUN; then
migrate_log "DRY-RUN: skipping cutover"
return 0
fi
psql --command="
BEGIN;
ALTER TABLE $live RENAME TO ${live}${suffix};
ALTER TABLE $stage RENAME TO $live;
COMMIT;
"
migrate_log "OK: cutover complete"
}
Worked Example: User Email Normalization Migration
#!/usr/bin/env bash
# users-email-normalize-migration.sh
set -euo pipefail
MIGRATION_NAME=users-email-normalize-2026-06-22
DRY_RUN=${DRY_RUN:-false}
source /usr/local/lib/migrate.sh
# Process function — called once per source row
process_user() {
local id="$1" line="$2"
local name email
name=$(echo "$line" | awk -F$'\t' '{print $1}')
email=$(echo "$line" | awk -F$'\t' '{print $2}')
local email_norm
email_norm=$(echo "$email" | tr '[:upper:]' '[:lower:]' | sed 's/^ *//; s/ *$//')
psql --command="
INSERT INTO users_new (id, name, email_norm)
VALUES ($id, '$(echo "$name" | sed "s/'/''/g")', '$email_norm')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, email_norm = EXCLUDED.email_norm
"
}
# 1. Backfill
migrate_resumable_batch users users_new 1000 process_user
# 2. Verify counts
migrate_verify_counts users users_new
# 3. Sample-diff (NB: this won't match because schema differs — example only)
# migrate_sample_diff users users_new 100
# 4. Cutover (run only when verified — typically a separate runbook step)
# migrate_cutover users users_new
The ON CONFLICT (id) DO UPDATE is the upsert that makes processing idempotent — re-running the same row just overwrites the destination with the same data, no duplicates.
Escape SQL Single Quotes — The sed "s/'/''/g" Trick
In Postgres, single quotes in string literals are escaped by doubling: 'O''Brien'. The sed "s/'/''/g" does this. Use parameterized queries (psql -v, prepared statements) wherever possible — the sed approach is fine for trusted internal data but is SQL injection waiting for user-controlled input.
For untrusted input, switch to a real DB driver:
python3 -c "
import psycopg2, sys
conn = psycopg2.connect(...)
cur = conn.cursor()
for line in sys.stdin:
name, email = line.strip().split('\t')
cur.execute('INSERT ... VALUES (%s, %s, %s)', (id, name, email))
conn.commit()
"
The hybrid pattern — bash for orchestration, Python for DB work — is the right answer for migrations involving user-controlled fields.
ETL From Files: The Same Pattern, Different Source
ETL from CSV/JSONL into a database uses the same checkpoint discipline:
#!/usr/bin/env bash
# csv-to-db-etl.sh
set -euo pipefail
MIGRATION_NAME=etl-orders-csv-2026-06-22
source /usr/local/lib/migrate.sh
readonly SRC=/data/orders.csv
migrate_init
last_line=$(migrate_checkpoint_read)
[[ -z "$last_line" || "$last_line" == "0" ]] && last_line=0
# tail -n +$((last_line + 2)) skips header and previously-processed lines
total=0
tail -n "+$((last_line + 2))" "$SRC" | while IFS=, read -r id customer total ts; do
migrate_log "DEBUG: processing line $((last_line + 1))"
if ! $DRY_RUN; then
psql --command="INSERT INTO orders (id,customer,total,ts) VALUES ($id,$customer,$total,'$ts') ON CONFLICT (id) DO NOTHING"
fi
last_line=$((last_line + 1))
if (( last_line % 1000 == 0 )); then
migrate_checkpoint_write "$last_line"
migrate_log "checkpoint: $last_line"
fi
done
migrate_checkpoint_write "$last_line"
migrate_log "DONE: processed up to line $last_line"
The line-number checkpoint works because CSV is read in order. For JSONL the same pattern applies. For Parquet you’d use row-group-level checkpoints or a real ETL tool (dbt, Airflow).
The 8 Footguns
1. Non-Idempotent Processing
INSERT INTO ... (without ON CONFLICT) on a re-run gives unique-constraint violations or duplicates. Fix: Always INSERT ... ON CONFLICT DO UPDATE (Postgres) or INSERT IGNORE / ON DUPLICATE KEY UPDATE (MySQL), keyed by a natural key not a surrogate.
2. Checkpoint Saved Before Batch Commits
Save checkpoint before COMMIT and a crash leaves the checkpoint advanced past data that didn’t actually persist. Fix: Write checkpoint after commit. If using transactions, save the checkpoint inside the same transaction.
3. Watermark On updated_at With No Lookback
Late-arriving data is silently dropped. Fix: Lookback window of N hours, idempotent processing makes overlap harmless.
4. Mistaking set -e For Error Handling
set -e aborts on first error, but it does NOT abort on errors inside &&/|| chains, in pipes (without set -o pipefail), or in subshells. A failed psql inside a pipe might still let the script proceed and write checkpoint as if it succeeded. Fix: set -o pipefail, explicit return-code checks on critical commands.
5. Forgetting To Run The Trigger Before Backfill (Live Source)
If the trigger is added after backfill starts, writes during backfill are not mirrored to staging. Fix: Trigger first, then backfill. Verify trigger captures writes by running INSERT on source and checking destination immediately.
6. Cutover With Active Long Transactions
ALTER TABLE ... RENAME TO ... requires an ACCESS EXCLUSIVE lock; if a long-running query holds an ACCESS SHARE lock, the rename blocks indefinitely. Fix: Cancel long queries before cutover, or SET lock_timeout = '5s' to fail fast. Run cutover in a low-traffic window.
7. Missing Back-Out Test In Staging
You wrote the back-out plan but never executed it. The first time you actually run it, it fails on a permission issue or a forgotten cascade. Fix: Rehearse the back-out in staging before running the migration in prod. The dress rehearsal catches the broken back-out.
8. ETL Loads Without Validation Of Counts/Sums
Migration says “complete” but a transient psql error mid-batch silently dropped 50,000 rows. The loss is invisible until a downstream report fails three days later. Fix: Always run migrate_verify_counts after backfill, and a sum-of-numeric-columns check (SELECT sum(amount) FROM src/dest) for financial data. Differences > 0 abort the cutover.
The Migration Runbook Template
Every migration script should have a parallel runbook document. Sample skeleton:
# Migration: <name>
**Owner**: <team>
**Risk**: low/medium/high
**Estimated downtime**: <duration>
## Pre-checks
- [ ] Staging environment migration completed cleanly
- [ ] Back-out plan tested in staging
- [ ] Source table has stable monotonic key OR watermark column
- [ ] Replication lag < 30s (if cross-region)
- [ ] No long-running queries (cancel any active sessions in target schema)
- [ ] Backup taken within last 24h
## Execution
1. Install mirror trigger: `migration-trigger.sql` — runtime ~5s
2. Start backfill: `users-migration.sh` — runtime ~2h
3. Verify: `verify.sh` — runtime ~10min
4. Cutover (low-traffic window): `cutover.sql` — runtime <1s
5. Drop trigger: `drop-trigger.sql` — runtime ~5s
## Post-checks
- [ ] Application smoke test passes
- [ ] Replication caught up
- [ ] Monitoring graphs show no error spike
- [ ] Schedule archive table drop for +30 days
## Back-out
See backout.md.
The runbook is the operational artifact. The script is just one of its steps.
Quick-Reference Card
PILLARS
Checkpoint : resume from last successful position
Watermark : track highest processed source value
Staging : backfill into shadow table, atomic rename
Back-out : documented, tested reverse path
CHECKPOINT WRITES
- Atomic: write to .tmp, rename
- After batch commits, not before
- Per-migration file (don't reuse across migrations)
IDEMPOTENT PROCESSING
Postgres: INSERT ... ON CONFLICT (key) DO UPDATE SET ...
MySQL: INSERT ... ON DUPLICATE KEY UPDATE ...
Or: MERGE statement
WATERMARK + LOOKBACK
cutoff = max(watermark, now() - 24h)
Process where updated_at > cutoff
Idempotent processing makes overlap harmless
STAGING TABLE CUTOVER
1. CREATE TABLE users_new
2. Install mirror trigger
3. Backfill (resumable)
4. Verify counts + sample diff
5. BEGIN; rename; rename; COMMIT;
6. Drop trigger
7. Drop archive after retention
BACK-OUT STAGES
A: pre-staging — nothing to undo
B: staging exists, backfill — drop staging
C: backfill done, pre-cutover — drop staging
D: post-cutover, pre-drop — reverse rename (within 30d)
E: archive dropped — PITR restore (4h RTO)
What’s Next
You can now perform safe, idempotent, resumable data migrations. The next dimension is compliance scanning: writing shell scripts that produce machine-readable evidence of CIS/STIG/PCI control compliance, signed and bundled in a way auditors accept and engineers can act on.
In the next lesson — Compliance Scanning: STIG/CIS-as-Shell, Evidence Bundles & Signed Reports — we’ll build lib/compliance.sh covering CIS benchmark checks coded as shell tests, evidence bundle generation (JSON + signed metadata), drift detection across the fleet, and the integration pattern with audit tools (OpenSCAP, OSCAL) that turns shell scripts into formal compliance artifacts.