Shell Lesson 41 of 42

Shell Forensics & Incident Response: Order-Of-Volatility Capture, Triage Scripts, Read-Only Examination & The Evidence Chain That Holds Up Under Scrutiny

The First 60 Seconds Decide The Investigation

When you SSH into a host that’s been compromised, you have a small window (minutes, not hours) before the evidence you need is gone. The attacker may still be active and erasing logs; memory state changes constantly while the system runs; and the cron job that masks the indicator may run again and rotate the log away. Every command you type at your investigator’s prompt either captures state or changes state, and by default most of them change it.

The discipline of forensic shell scripting:

| Pattern | What it does |
|---|---|
| Order of volatility | Capture the most ephemeral evidence (memory, sockets) first, disk later |
| Read-only examination | Examine without writing; every write is contamination |
| Hash-and-archive | Record every artifact’s SHA-256 at the moment of capture |
| Chain of custody | Record who captured it, on which host, at what time, with which tool version |

This lesson teaches each pattern with shell scripts and a lib/forensics.sh you can source. We are not building EnCase or Volatility — we’re building the first responder’s toolkit that buys you the 30 minutes of evidence the formal tools need.

Order Of Volatility (Adapted From RFC 3227)

The textbook order, from most volatile to least:

  1. CPU registers, cache: gone the moment a process exits or context-switches.
  2. Routing tables, ARP cache, kernel state: flushed on reboot, or by ip neigh flush all.
  3. Memory (process and kernel): gone on reboot.
  4. Open network connections, sockets: closed when the owning process exits.
  5. Running processes: gone when killed.
  6. Filesystem timestamps: atime can change on the next read, mtime and ctime on the next write.
  7. Disk content: persists, but logs may rotate.
  8. Backup and remote logs: most durable, but lag.

A triage script captures in this order. Capturing CPU registers and cache from a shell isn’t realistic, but everything from #2 down is at least partly reachable; full physical memory needs a kernel module such as LiME (see Memory Capture below).

The Triage Script Skeleton

#!/usr/bin/env bash
# triage.sh: runs in the first 60s of incident response.
set -uo pipefail   # NO 'errexit': we want every step to attempt even if some fail.

readonly EVIDENCE_DIR=/var/forensics/$(hostname)-$(date +%Y%m%dT%H%M%S)
mkdir -p "$EVIDENCE_DIR"
cd "$EVIDENCE_DIR" || { echo "cannot enter $EVIDENCE_DIR" >&2; exit 1; }   # never scatter evidence into $PWD

# 1. Network state (most volatile after registers/cache)
ss -tnap > 01-tcp-connections.txt 2>&1
ss -unap > 02-udp-connections.txt 2>&1
ip neigh > 03-arp-cache.txt 2>&1
ip route > 04-routing.txt 2>&1

# 2. Active processes
ps -eo pid,ppid,user,start_time,etime,command --sort=start_time > 05-processes.txt 2>&1
pstree -p > 06-pstree.txt 2>&1

# 3. Open files (per process)
lsof > 07-lsof.txt 2>&1

# 4. Network listeners specifically
ss -tlnp > 08-listeners.txt 2>&1

# 5. Loaded kernel modules
lsmod > 09-modules.txt 2>&1

# 6. Active sessions
who > 10-who.txt 2>&1
last -50 > 11-last.txt 2>&1
w > 12-w.txt 2>&1

# 7. Cron and systemd
ls -la /etc/cron.* /var/spool/cron > 13-cron-list.txt 2>&1
systemctl list-units --all --no-pager > 14-systemd-units.txt 2>&1
systemctl list-timers --all --no-pager > 15-systemd-timers.txt 2>&1

# 8. Recently modified files in suspect locations (one ls per batch of files, not per file)
find /tmp /var/tmp /dev/shm /home -type f -mtime -1 \
  -exec ls -la {} + > 16-recent-files.txt 2>&1

# 9. Write metadata first so the manifest covers it, then hash everything captured
echo "host=$(hostname) captured_at=$(date -Iseconds) operator=${SUDO_USER:-${USER:-unknown}}" > METADATA
sha256sum *.txt METADATA > MANIFEST.sha256

echo "Triage complete. Evidence in: $EVIDENCE_DIR"

Note the set -uo pipefail without errexit: if lsof fails because of permissions, we still want lsmod to run. The trade-off is noisy stderr, captured into each step’s output file; the gain is comprehensive capture. The one deliberate exit is the cd: if it fails, writing evidence into whatever directory the script started in would be worse than stopping.

The zero-padded numeric prefix (01-, 02-, etc.) preserves capture order in alphabetical listings, so auditors can see what was captured in what sequence. Without the padding, 10-who.txt would sort before 2-udp-connections.txt.
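Hand-maintained prefixes drift as steps are added or reordered. A minimal sketch of a hypothetical capture helper (not part of the lesson’s scripts) assigns the zero-padded prefix itself and appends each step’s start time, plus any non-zero exit status, to a CAPTURE-LOG file, so the order is recorded rather than implied. It assumes GNU date and a simple command; wrap a pipeline in bash -c if you need one:

```shell
# Hypothetical helper: run one capture command, write stdout+stderr to the next
# zero-padded numbered file, and log when it started and how it ended.
set -uo pipefail   # no errexit, as in the skeleton above

CAPTURE_SEQ=0

capture() {
  local name="$1"; shift
  CAPTURE_SEQ=$((CAPTURE_SEQ + 1))
  local file
  file=$(printf '%02d-%s.txt' "$CAPTURE_SEQ" "$name")
  # Log before running, so CAPTURE-LOG reflects the order steps started in
  printf '%s %s %s\n' "$(date -Iseconds)" "$file" "$*" >> CAPTURE-LOG
  "$@" > "$file" 2>&1
  local rc=$?
  (( rc == 0 )) || printf '%s exit=%d\n' "$file" "$rc" >> CAPTURE-LOG
  return 0   # best-effort: a failed step never stops the run
}
```

Use it from inside the evidence directory as capture tcp-connections ss -tnap, capture arp-cache ip neigh, and so on. Call it sequentially: a counter incremented inside a background job (&) never reaches the parent shell.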

Memory Capture Is The Hardest

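Live memory capture from a shell takes one of two routes: a best-effort dump of a single process through /proc/<pid>/mem, or a kernel module such as LiME for all of physical memory. The per-process route:

# Per-process memory dump (e.g., for a suspected malicious process).
# Needs root, and GNU dd for iflag=skip_bytes,count_bytes (byte offsets, large blocks).
capture_process_memory() {
  local pid="$1"
  local dir="proc-$pid-mem"
  mkdir -p "$dir"

  # Snapshot maps, status and command line first: they describe what the dump contains
  cat "/proc/$pid/maps" > "$dir/maps.txt"
  cat "/proc/$pid/status" > "$dir/status.txt"
  tr '\0' ' ' < "/proc/$pid/cmdline" > "$dir/cmdline.txt"

  # Dump each readable region to its own file, named by its address range
  local start_end perms path start end
  while read -r start_end perms _ _ _ path; do
    [[ "$perms" == r* ]] || continue            # not readable (e.g. guard pages)
    [[ "$path" == "[vsyscall]" ]] && continue   # legacy kernel page; its address overflows bash arithmetic
    start=${start_end%-*}
    end=${start_end#*-}
    dd if="/proc/$pid/mem" of="$dir/$start_end.bin" \
       iflag=skip_bytes,count_bytes bs=1M \
       skip=$((16#$start)) count=$((16#$end - 16#$start)) 2>/dev/null \
      || rm -f "$dir/$start_end.bin"            # discard regions that could not be read
  done < "$dir/maps.txt"

  sha256sum "$dir"/* >> MANIFEST.sha256
}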

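This is best-effort. Some regions will typically still fail to read (special mappings such as [vvar], device or I/O mappings), and reading a swapped-out page forces the kernel to bring it back into RAM, which itself changes system state. For real memory forensics, install LiME ahead of time, built against the exact running kernel; the script then just runs insmod lime.ko "path=/var/forensics/mem.lime format=lime", or "path=tcp:4444 format=lime" to stream the image over the network instead of writing it to the suspect disk.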
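Before dumping, it helps to know how much data the readable regions add up to, since the dump has to land on some disk or cross the network. A small sketch (the function name is hypothetical; Linux /proc and bash 64-bit arithmetic assumed) reads the same maps format the dump loop uses:

```shell
# Hypothetical helper: count the readable regions in /proc/<pid>/maps and total
# their size in bytes, i.e. roughly what capture_process_memory would try to read.
maps_readable_summary() {
  local pid="$1" start_end perms path size total=0 count=0
  while read -r start_end perms _ _ _ path; do
    [[ "$perms" == r* ]] || continue
    [[ "$path" == "[vsyscall]" ]] && continue   # address overflows bash's signed 64-bit arithmetic
    size=$(( 16#${start_end#*-} - 16#${start_end%-*} ))
    total=$(( total + size ))
    count=$(( count + 1 ))
  done < "/proc/$pid/maps"
  echo "pid=$pid readable_regions=$count readable_bytes=$total"
}
```

maps_readable_summary 4321 prints a single line of the form pid=4321 readable_regions=N readable_bytes=M.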

Network Capture — Catch The Connections While They Exist

ss -tnap (TCP, numeric, all states, with process info) is the modern replacement for netstat -tnap and runs faster.

# Snapshot every TCP connection, then extract the PIDs that own one.
# (grep -o is portable; awk's three-argument match() only exists in gawk.)
ss -tnap | tee tcp-connections.txt \
  | grep -o 'pid=[0-9][0-9]*' | cut -d= -f2 | sort -un > pids-with-network.txt

Then for each PID in that list, capture process state:

while read -r pid; do
  [[ -d "/proc/$pid" ]] || continue
  cat "/proc/$pid/cmdline" | tr '\0' ' ' > "pid-$pid-cmdline.txt"
  ls -la "/proc/$pid/exe" > "pid-$pid-exe.txt" 2>/dev/null
  cat "/proc/$pid/status" > "pid-$pid-status.txt"
done < pids-with-network.txt

This identifies “which binary is on which connection” — the canonical first question of network-driven incident response.
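The two steps above can be folded into one table. A sketch, assuming the ss -tnap output format shown earlier (users:(("name",pid=N,fd=M)) in the last column) and a hypothetical function name; it uses only POSIX awk, so it also runs under mawk:

```shell
# Hypothetical helper: read `ss -tnap` output on stdin and print one
# "pid peer exe" line per process attached to each established connection.
ss_conn_table() {
  awk '$1 == "ESTAB" {
         rest = $0
         # A socket can be shared: users:(("a",pid=1,fd=3),("b",pid=2,fd=3))
         while (match(rest, /pid=[0-9]+/)) {
           print substr(rest, RSTART + 4, RLENGTH - 4), $5
           rest = substr(rest, RSTART + RLENGTH)
         }
       }' |
  while read -r pid peer; do
    # "?" when the process has exited or its exe link cannot be read
    exe=$(readlink "/proc/$pid/exe" 2>/dev/null) || exe="?"
    printf '%s %s %s\n' "$pid" "$peer" "$exe"
  done
}
```

Typical use is ss -tnap | ss_conn_table | sort -u > conn-table.txt. A "?" in the exe column means the process vanished between the two reads, which is itself worth noting.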

Read-Only Examination

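The cardinal rule of working on a compromised host: never write where you read. Reading a file (cat) or listing a directory (ls) can update its atime; even under the modern relatime default, a read still updates atime if the file changed since it was last read or if the recorded atime is more than 24 hours old. Every find walks the tree and can touch the atime of every directory it reads. Forensic-grade examination requires:

  1. Mount the filesystem read-only on a separate machine (gold standard).
  2. If you must work on the live host, use mount -o remount,ro,noatime (the remount fails with “busy” while any process holds a file open for writing).
  3. Or work entirely from /proc, whose reads don’t touch on-disk timestamps.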
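Before relying on option 2, check what the live filesystem is actually doing. A Linux-only sketch (hypothetical function names) that finds the mount holding a path in /proc/self/mounts and reports whether it is read-only; where util-linux is installed, findmnt -no OPTIONS -T <path> prints the same options string:

```shell
# Hypothetical helpers: print the mount options of the filesystem holding a path,
# and test whether it is mounted read-only. Reads /proc/self/mounts (Linux only).
mount_opts_for() {
  local path
  path=$(realpath "$1") || return 2
  awk -v p="$path" '
    { mp = $2; gsub(/\\040/, " ", mp) }   # mount points encode spaces as \040
    p == mp || index(p, (mp == "/" ? "/" : mp "/")) == 1 {
      # Longest matching mount point wins; on ties the later (over-)mount wins
      if (length(mp) >= best_len) { best_len = length(mp); best = $4 }
    }
    END { if (best == "") exit 2; print best }' /proc/self/mounts
}

fs_is_readonly() {
  local opts
  opts=$(mount_opts_for "$1") || return 2
  [[ ",$opts," == *,ro,* ]]
}
```

For example, fs_is_readonly / || echo "WARNING: / is writable" before starting any in-place examination.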

Pattern: Snapshot To A Separate Disk, Mount Read-Only

# On a forensic workstation: pull a disk image via ssh
ssh -i forensic-key root@compromised \
  "dd if=/dev/sda1 bs=4M status=progress" \
  | dd of=/forensic/sda1.img bs=4M status=progress

# Record the image hash at acquisition time (later integrity checks compare against it)
sha256sum /forensic/sda1.img > /forensic/sda1.img.sha256

# Mount read-only on the workstation
mkdir -p /mnt/evidence
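mount -o ro,loop,noatime,nodev,nosuid,noexec /forensic/sda1.img /mnt/evidence

# Now examine without altering
ls -la /mnt/evidence/etc/passwd

The nodev,nosuid flags are critical: the image might contain set-uid binaries or device nodes that, on mount, would be respected by your forensic workstation’s kernel. nodev makes device nodes inert; nosuid makes set-uid bits ignored; noexec additionally refuses to execute anything from the image. Always mount evidence with these flags. For ext3/ext4 images also add noload: a plain ro mount of a filesystem with an unclean journal either replays the journal (writing to the image) or, on a read-only loop device, refuses to mount.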
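The hash above is taken in a second pass over the finished file. A sketch of a hypothetical acquire_image function hashes exactly the bytes received while writing them, in a single pass, and stores the result in the format sha256sum -c expects:

```shell
# Hypothetical helper: copy an image stream from stdin to $1 and record the
# SHA-256 of exactly those bytes in "$1.sha256", in one pass.
acquire_image() {
  local dest="$1"
  tee "$dest" | sha256sum | awk -v f="$dest" '{ print $1 "  " f }' > "$dest.sha256"
  local -a st=("${PIPESTATUS[@]}")   # statuses of tee and sha256sum
  if (( st[0] != 0 || st[1] != 0 )); then
    echo "acquisition failed for $dest" >&2
    return 1
  fi
}
```

On the workstation this replaces the receiving dd: ssh -i forensic-key root@compromised "dd if=/dev/sda1 bs=4M" | acquire_image /forensic/sda1.img, after which sha256sum -c /forensic/sda1.img.sha256 works at any later point.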

Pattern: Live Examination From /proc

When you can’t pull a disk image (ephemeral cloud instance, no admin access to the hypervisor), /proc is the next-safest examination surface:

# Inspect process N without affecting filesystem
pid=4321

# Binary that's actually running (even if /usr/bin/foo on disk has been replaced)
readlink "/proc/$pid/exe"

# Working directory
readlink "/proc/$pid/cwd"

# Open files
ls -la "/proc/$pid/fd/"

# Environment
tr '\0' '\n' < "/proc/$pid/environ"

# Network namespace (does it match expected?)
readlink "/proc/$pid/ns/net"

# Memory maps (look for executable anonymous regions, memfd: or "(deleted)" mappings, and files mapped from /tmp or /dev/shm: common malware indicators)
cat "/proc/$pid/maps"
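Reading maps by eye gets tedious on a busy process. A rough filter, with a hypothetical function name and heuristics that are assumptions rather than a standard rule set, prints executable regions that are anonymous, memfd-backed, deleted, or mapped from /tmp, /var/tmp or /dev/shm. It works on a live /proc/<pid>/maps or on a saved copy:

```shell
# Hypothetical helper: print maps lines for executable regions that are anonymous,
# memfd-backed, deleted from disk, or mapped from /tmp, /var/tmp or /dev/shm.
# Argument: a live /proc/<pid>/maps or a saved copy such as proc-4321/maps.txt.
suspicious_maps() {
  awk '
    $2 ~ /x/ {
      # Rebuild the pathname (field 6 onward); it may contain spaces
      path = ""
      for (i = 6; i <= NF; i++) path = path (i > 6 ? " " : "") $i
      if (path == "" || path ~ /^\/memfd:/ || path ~ /\(deleted\)$/ ||
          path ~ /^\/(tmp|var\/tmp|dev\/shm)\//)
        print
    }' "$1"
}
```

JIT runtimes (Java, Node.js, browsers) legitimately create anonymous executable memory, so treat hits as leads, not verdicts.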

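The trick is that even if the attacker replaced /usr/bin/sshd on disk, /proc/<pid>/exe still refers to the binary that was loaded at exec time, and it stays readable after the on-disk file has been replaced or deleted. So you can hash the running binary and compare it with the on-disk one to detect replacement:

exe_path=$(readlink "/proc/$pid/exe")
running_hash=$(sha256sum < "/proc/$pid/exe" | cut -d' ' -f1)

if [[ "$exe_path" == *" (deleted)" ]]; then
  # The running inode no longer has a name on disk: the file was replaced or removed
  echo "SUSPICIOUS: pid=$pid runs a binary that is no longer on disk: $exe_path"
elif [[ "$running_hash" != "$(sha256sum < "$exe_path" | cut -d' ' -f1)" ]]; then
  echo "SUSPICIOUS: running binary != on-disk binary for pid=$pid"
fi

This catches a class of trojan where the attacker replaces the binary on disk while the original keeps running in memory, and its mirror image: malware that deletes its own executable after starting. Expect benign hits too: a package upgrade that replaced a binary without restarting its service leaves exactly the same " (deleted)" marker.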
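To see the marker before you meet it in an incident, a throwaway lab sketch (Linux only; the temporary directory and the choice of sleep are arbitrary) runs a private copy of sleep, deletes the file, then reads and hashes the exe link:

```shell
#!/usr/bin/env bash
# Lab demo, not part of lib/forensics.sh: what /proc/<pid>/exe reports once
# a running binary has been deleted from disk. Linux only; any user.
lab_dir=$(mktemp -d)
cp "$(command -v sleep)" "$lab_dir/sleep"       # a private copy we are free to delete
"$lab_dir/sleep" 30 &
pid=$!

# Right after fork the child is still bash; wait until it has exec'd the copy
for _ in $(seq 50); do
  [[ "$(readlink "/proc/$pid/exe")" == "$lab_dir/sleep" ]] && break
  sleep 0.1
done

rm "$lab_dir/sleep"                              # the on-disk file is gone...
exe_path=$(readlink "/proc/$pid/exe")
echo "exe link: $exe_path"                       # ...and the link now ends in " (deleted)"

# ...yet the running image can still be hashed through /proc
running_hash=$(sha256sum < "/proc/$pid/exe" | cut -d' ' -f1)
echo "running hash: $running_hash"

kill "$pid"
rmdir "$lab_dir"
```

The running hash matches the hash of the sleep binary that was copied, which is exactly why /proc/<pid>/exe is the right thing to hash during triage.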

Hash-And-Archive: The Manifest Discipline

Every forensic capture produces a SHA-256 manifest, plus a hash of the manifest itself:

forensics_hash_all() {
  local dir="$1"
  ( cd "$dir" && find . -type f ! -name "MANIFEST*" -print0 \
      | xargs -0 sha256sum | sort -k 2 ) > "$dir/MANIFEST.sha256"
  sha256sum "$dir/MANIFEST.sha256" > "$dir/MANIFEST.sha256.sig"
}

The chain of trust:

  1. Each artifact has its sha256 in MANIFEST.sha256.
  2. MANIFEST.sha256 itself has its sha256 in MANIFEST.sha256.sig.
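  3. The tar bundle that contains both files is GPG-signed by the operator (next section), which extends the chain to a key only the operator holds.

If anyone changes a single byte in any artifact, its manifest entry stops matching; rewriting the manifest to match breaks the hash in MANIFEST.sha256.sig; rewriting that as well breaks the GPG signature.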
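Checking the chain is the mirror image of building it. A sketch with a hypothetical forensics_verify_dir (GNU sha256sum assumed for --quiet) walks the first two links; the GPG link is a separate gpg --verify on the signed bundle:

```shell
# Hypothetical helper: re-check a directory produced by forensics_hash_all.
# Returns non-zero if the manifest or any artifact has changed.
forensics_verify_dir() {
  local dir="$1" expected actual
  # Link 2: compare hash values directly; the path recorded inside
  # MANIFEST.sha256.sig depends on how $dir was spelled when it was written
  expected=$(cut -d' ' -f1 < "$dir/MANIFEST.sha256.sig") || return 1
  actual=$(sha256sum < "$dir/MANIFEST.sha256" | cut -d' ' -f1)
  [[ "$expected" == "$actual" ]] || { echo "MANIFEST changed: $dir" >&2; return 1; }
  # Link 1: every artifact against its manifest entry (entries are relative to $dir)
  ( cd "$dir" && sha256sum --quiet -c MANIFEST.sha256 ) \
    || { echo "artifact changed: $dir" >&2; return 1; }
}
```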

Bundle Up With Tar + Signed Metadata

forensics_bundle() {
  local dir="$1"
  local out="${dir}.tar"
  tar --create --file="$out" --directory="$(dirname "$dir")" "$(basename "$dir")"
  sha256sum "$out" > "$out.sha256"

  cat > "$out.meta" <<EOF
{
  "bundle": "$(basename "$out")",
  "sha256": "$(sha256sum "$out" | cut -d' ' -f1)",
  "operator": "$USER",
  "host": "$(hostname)",
  "captured_at": "$(date -Iseconds)",
  "tool_version": "lib/forensics.sh v1.0.0",
  "incident_id": "${INCIDENT_ID:-unknown}"
}
EOF

  # Detached, ASCII-armored signature over the tar (and therefore every artifact inside it)
  gpg --batch --yes --output "$out.sig" \
    --detach-sign --armor \
    --local-user "${FORENSIC_KEYID:-incident-response@example.com}" \
    "$out"
}

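Now the bundle is portable: ship *.tar, *.tar.sha256, *.tar.meta, *.tar.sig to a forensic archive, and any future investigator can verify the tar wasn’t tampered with. The .meta sidecar is not covered by the signature; treat it as descriptive, or sign it the same way if it must be tamper-evident.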
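On the receiving side, a hypothetical forensics_verify_bundle sketch checks the .sha256 by value (the file may record a path that only made sense on the capture host) and, when a .sig is present and gpg is installed, the detached signature via gpg --verify:

```shell
# Hypothetical helper: verify a bundle before trusting it.
forensics_verify_bundle() {
  local tar="$1" expected actual
  expected=$(cut -d' ' -f1 < "$tar.sha256") || return 1
  actual=$(sha256sum < "$tar" | cut -d' ' -f1)
  if [[ "$expected" != "$actual" ]]; then
    echo "HASH MISMATCH: $tar" >&2
    return 1
  fi
  # Signature check only when there is a signature and a gpg to check it with
  if [[ -f "$tar.sig" ]] && command -v gpg >/dev/null; then
    gpg --verify "$tar.sig" "$tar" || { echo "BAD SIGNATURE: $tar" >&2; return 1; }
  fi
  echo "OK: $tar"
}
```

A missing .sig only skips the signature step; if your archive policy requires signatures, make that case an error instead.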

Storage: Append-Only Bucket With Object Lock

Backup discipline from L35 applies to forensic evidence too, even more strongly. The bucket holding evidence must have S3 Object Lock in compliance mode, with retention exceeding any anticipated litigation window (often 7 years in regulated industries). For evidence that must be kept indefinitely, as in criminal matters, add an Object Lock legal hold, which has no expiry date and stays in place until explicitly removed.

The credential pushing evidence to that bucket should have s3:PutObject only, never s3:DeleteObject. Forensic evidence is write-once, read-many forever.
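A minimal sketch of such a write-only identity policy, printed by a hypothetical shell function; the bucket name is a placeholder. The explicit Deny is defence in depth on top of Object Lock: on a versioned, locked bucket a plain DeleteObject still succeeds by adding a delete marker that hides the object, even though the locked version itself survives:

```shell
# Hypothetical helper: print an identity policy for the evidence-upload credential.
# Allows uploads only; explicitly denies deletes and governance-mode bypass.
forensics_writer_policy() {
  local bucket="$1"   # placeholder bucket name
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "UploadEvidenceOnly",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${bucket}/*"
    },
    {
      "Sid": "NeverDeleteEvidence",
      "Effect": "Deny",
      "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion", "s3:BypassGovernanceRetention"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

Write it out with forensics_writer_policy example-forensics-evidence > evidence-writer.json and attach it with aws iam put-role-policy --role-name <uploader-role> --policy-name forensic-evidence-write --policy-document file://evidence-writer.json (role and policy names are placeholders).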

Chain Of Custody: Who Touched What When

Chain of custody is the formal record of who handled the evidence, when, with what tools, and for what purpose. In legal proceedings, broken chain of custody can render evidence inadmissible.

The shell version: a chain-of-custody log appended to on every evidence event:

forensics_coc_log() {
  local action="$1" evidence_id="$2" reason="$3"
  jq -nc \
    --arg ts "$(date -Iseconds)" \
    --arg op "$USER" \
    --arg host "$(hostname)" \
    --arg action "$action" \
    --arg evid "$evidence_id" \
    --arg reason "$reason" \
    --arg incident "${INCIDENT_ID:-unknown}" \
    '{ts:$ts, operator:$op, host:$host, action:$action, evidence:$evid, reason:$reason, incident:$incident}' \
    >> "/var/forensics/chain-of-custody.jsonl"
}

# Usage
forensics_coc_log "captured" "sda1.img" "compromise indicators in auth.log"
forensics_coc_log "examined" "sda1.img" "looking for setuid binaries in /tmp"
forensics_coc_log "transferred" "sda1.img" "sent to legal hold s3 bucket"

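The CoC log itself should be tamper-evident: ideally every line is chain-linked to the previous one, either with an HMAC under a key the editor doesn’t hold or with a plain hash whose latest value is periodically signed or published, so that edits are detectable.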

A simple chain-link variant:

forensics_coc_log_chained() {
  local action="$1" evidence_id="$2" reason="$3"
  local prev_hash=""
  if [[ -f /var/forensics/chain-of-custody.jsonl ]]; then
    prev_hash=$(tail -1 /var/forensics/chain-of-custody.jsonl | sha256sum | cut -d' ' -f1)
  fi

  local entry
  entry=$(jq -nc \
    --arg ts "$(date -Iseconds)" \
    --arg op "$USER" \
    --arg action "$action" \
    --arg evid "$evidence_id" \
    --arg reason "$reason" \
    --arg prev "$prev_hash" \
    '{ts:$ts, operator:$op, action:$action, evidence:$evid, reason:$reason, prev_hash:$prev}')

  echo "$entry" >> /var/forensics/chain-of-custody.jsonl
}

Each line includes the hash of the previous line, so inserting, deleting, or modifying any line breaks the link from the line after it. This is a hash chain, the simplest relative of the Merkle-tree logs used by Certificate Transparency and of git’s history, where each commit hashes its parent.
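Verification walks the file and recomputes each link. A sketch of a hypothetical forensics_coc_verify for logs written by forensics_coc_log_chained; it relies on two details of that writer: jq -c emits compact JSON, so the field appears literally as "prev_hash":"<hex>", and the hash covers the previous line including its trailing newline, because it is taken over the output of tail -1:

```shell
# Hypothetical verifier for logs written by forensics_coc_log_chained.
# Line 1 must carry an empty prev_hash; every later line must carry the
# SHA-256 of the previous line plus its newline (what `tail -1 | sha256sum` hashed).
forensics_coc_verify() {
  local log="$1" line prev_line="" recorded expected n=0
  while IFS= read -r line; do
    n=$((n + 1))
    recorded=$(printf '%s\n' "$line" | sed -n 's/.*"prev_hash":"\([0-9a-f]*\)".*/\1/p')
    if (( n == 1 )); then
      expected=""
    else
      expected=$(printf '%s\n' "$prev_line" | sha256sum | cut -d' ' -f1)
    fi
    if [[ "$recorded" != "$expected" ]]; then
      echo "chain broken at line $n" >&2
      return 1
    fi
    prev_line=$line
  done < "$log"
  echo "chain intact: $n entries"
}
```

Note that the newest entry is protected only once a later entry links to it: changing the last line alone goes undetected by the chain.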

The Drop-In lib/forensics.sh

# lib/forensics.sh — sourced helpers for incident-response triage.
#
# Required env (set by the calling script):
#   INCIDENT_ID — unique identifier for this incident (e.g., INC-2026-0042)
#
# Optional env:
#   FORENSICS_DIR — default /var/forensics
#   FORENSIC_KEYID — GPG key for signing
#
# Note: NO 'set -e' — we want every capture step to attempt
#       even if individual ones fail.

set -uo pipefail

: "${INCIDENT_ID:?INCIDENT_ID must be set}"
: "${FORENSICS_DIR:=/var/forensics}"
: "${FORENSIC_KEYID:=incident-response@example.com}"

readonly STAMP=$(date +%Y%m%dT%H%M%S)
readonly EVIDENCE_DIR="$FORENSICS_DIR/$INCIDENT_ID-$(hostname)-$STAMP"
mkdir -p "$EVIDENCE_DIR"

forensics_log() {
  printf '[%s] [forensics] %s\n' "$(date -Iseconds)" "$*"
}

forensics_init() {
  cd "$EVIDENCE_DIR" || return 1
  cat > METADATA <<EOF
incident_id: $INCIDENT_ID
host: $(hostname)
captured_at: $(date -Iseconds)
operator: ${SUDO_USER:-$USER}
kernel: $(uname -r)
os: $(. /etc/os-release && echo "$PRETTY_NAME")
EOF
  forensics_log "Evidence dir: $EVIDENCE_DIR"
}

# Capture network state (most volatile after kernel cache)
forensics_capture_network() {
  forensics_log "Capturing network state"
  ss -tnap > 01-tcp-connections.txt 2>&1
  ss -unap > 02-udp-connections.txt 2>&1
  ss -tlnp > 03-tcp-listeners.txt 2>&1
  ip -s neigh > 04-arp.txt 2>&1
  ip route > 05-route.txt 2>&1
  ip addr > 06-addrs.txt 2>&1
  iptables-save > 07-iptables.txt 2>&1 || true
  nft list ruleset > 08-nftables.txt 2>&1 || true
}

# Capture process state
forensics_capture_processes() {
  forensics_log "Capturing process state"
  ps -eo pid,ppid,uid,user,start_time,etime,nice,stat,command \
     --sort=start_time > 10-ps.txt 2>&1
  pstree -palu > 11-pstree.txt 2>&1
  lsof > 12-lsof.txt 2>&1
  lsof -i > 13-lsof-net.txt 2>&1
}

# Capture each suspicious process in detail
forensics_capture_pid() {
  local pid="$1"
  [[ -d "/proc/$pid" ]] || { forensics_log "PID $pid does not exist"; return 1; }

  local pdir="proc-$pid"
  mkdir -p "$pdir"

  # Static info
  cat "/proc/$pid/cmdline" | tr '\0' ' ' > "$pdir/cmdline.txt"
  cat "/proc/$pid/status" > "$pdir/status.txt"
  tr '\0' '\n' < "/proc/$pid/environ" > "$pdir/environ.txt" 2>/dev/null
  ls -la "/proc/$pid/exe" > "$pdir/exe-link.txt" 2>/dev/null

  # Hash the running binary (vs. on-disk)
  if [[ -r "/proc/$pid/exe" ]]; then
    sha256sum "/proc/$pid/exe" > "$pdir/running-binary.sha256" 2>/dev/null
    local on_disk
    on_disk=$(readlink "/proc/$pid/exe")
    if [[ -f "$on_disk" ]]; then
      sha256sum "$on_disk" > "$pdir/on-disk-binary.sha256" 2>/dev/null
    fi
  fi

  # Open file descriptors
  ls -la "/proc/$pid/fd" > "$pdir/fd.txt" 2>/dev/null

  # Memory maps
  cat "/proc/$pid/maps" > "$pdir/maps.txt" 2>/dev/null

  # Namespace links
  ls -la "/proc/$pid/ns" > "$pdir/ns.txt" 2>/dev/null

  forensics_log "Captured PID $pid"
}

# Capture sessions and login history
forensics_capture_sessions() {
  forensics_log "Capturing sessions"
  who > 20-who.txt 2>&1
  w > 21-w.txt 2>&1
  last -100 > 22-last.txt 2>&1
  lastlog > 23-lastlog.txt 2>&1
  faillock --user root > 24-faillock.txt 2>&1 || true
}

# Capture cron, systemd, persistence-relevant artifacts
forensics_capture_persistence() {
  forensics_log "Capturing persistence indicators"
  ls -la /etc/cron.* /var/spool/cron/ /etc/at.deny /etc/at.allow 2>/dev/null > 30-cron.txt
  systemctl list-units --all --no-pager > 31-systemd-units.txt 2>&1
  systemctl list-timers --all --no-pager > 32-systemd-timers.txt 2>&1
  systemctl list-unit-files --no-pager > 33-systemd-unit-files.txt 2>&1
  ls -la /etc/profile.d/ > 34-profile.txt 2>&1
  ls -la /etc/init.d/ > 35-init.txt 2>&1
}

# Capture recently modified files in suspect locations
forensics_capture_recent_files() {
  forensics_log "Capturing recently modified files"
  find /tmp /var/tmp /dev/shm /home /root /var/spool \
       -type f -mtime -1 \
       -exec ls -la {} + 2>/dev/null > 40-recent-1day.txt
  # -prune skips pseudo-filesystems and our own output instead of walking them and filtering later
  find / \( -path /proc -o -path /sys -o -path /run -o -path "$FORENSICS_DIR" \) -prune \
       -o -type f -newer /etc/passwd -print \
       2>/dev/null | head -1000 > 41-newer-than-passwd.txt
}

# Capture system configuration baseline
forensics_capture_config() {
  forensics_log "Capturing system config"
  cp -p /etc/passwd 50-passwd              # -p keeps the original mode and timestamps on the copy
  cp -p /etc/shadow 51-shadow 2>/dev/null
  cp -p /etc/group 52-group
  cp -p /etc/sudoers 53-sudoers 2>/dev/null
  cp -rp /etc/sudoers.d 54-sudoers-d 2>/dev/null
  cp -p /etc/ssh/sshd_config 55-sshd_config 2>/dev/null
  uname -a > 56-uname.txt
  lsmod > 57-modules.txt 2>&1
  dmesg --time-format iso > 58-dmesg.txt 2>&1
}

# Capture relevant logs
forensics_capture_logs() {
  forensics_log "Capturing logs"
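  local f   # one copy per rotated file, keeping its original name; unmatched globs are skipped
  for f in /var/log/auth.log*; do [[ -e "$f" ]] && cp -p "$f" "60-${f##*/}"; done
  for f in /var/log/syslog*;   do [[ -e "$f" ]] && cp -p "$f" "61-${f##*/}"; done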
  journalctl --since '7 days ago' --no-pager > 62-journal-7d.txt 2>&1 || true
}

# Hash all captured evidence
forensics_finalize() {
  forensics_log "Building manifest"
  cd "$EVIDENCE_DIR" || return 1
  find . -type f ! -name "MANIFEST*" -print0 \
    | xargs -0 sha256sum | sort -k 2 > MANIFEST.sha256   # METADATA is hashed as well
  sha256sum MANIFEST.sha256 > MANIFEST.sha256.sig

  # Tar bundle
  cd "$FORENSICS_DIR"
  local bundle="$(basename "$EVIDENCE_DIR").tar"
  tar --create --file="$bundle" "$(basename "$EVIDENCE_DIR")"
  sha256sum "$bundle" > "$bundle.sha256"

  # Sign
  if command -v gpg >/dev/null; then
    gpg --batch --yes --output "$bundle.sig" \
      --detach-sign --armor \
      --local-user "$FORENSIC_KEYID" \
      "$bundle" 2>/dev/null && forensics_log "Signed: $bundle.sig" \
      || forensics_log "WARNING: signing failed; bundle is unsigned"
  fi

  forensics_log "Bundle: $FORENSICS_DIR/$bundle"
  forensics_log "Counts: $(wc -l < "$EVIDENCE_DIR/MANIFEST.sha256") files captured"
}

# Append a chain-of-custody record
forensics_coc_log() {
  local action="$1" evidence="$2" reason="$3"
  local prev_hash=""
  local coc="$FORENSICS_DIR/chain-of-custody.jsonl"
  mkdir -p "$(dirname "$coc")"
  if [[ -f "$coc" ]]; then
    prev_hash=$(tail -1 "$coc" | sha256sum | cut -d' ' -f1)
  fi
  jq -nc \
    --arg ts "$(date -Iseconds)" \
    --arg op "${SUDO_USER:-$USER}" \
    --arg host "$(hostname)" \
    --arg action "$action" \
    --arg evid "$evidence" \
    --arg reason "$reason" \
    --arg incident "$INCIDENT_ID" \
    --arg prev "$prev_hash" \
    '{ts:$ts, operator:$op, host:$host, action:$action, evidence:$evid, reason:$reason, incident:$incident, prev_hash:$prev}' \
    >> "$coc"
}

The 60-Second Triage Wrapper

#!/usr/bin/env bash
# triage.sh — runs the entire capture in 60 seconds
set -uo pipefail

: "${INCIDENT_ID:?usage: INCIDENT_ID=INC-... triage.sh}"
[[ $EUID -eq 0 ]] || { echo "triage.sh must run as root" >&2; exit 1; }

source /usr/local/lib/forensics.sh

forensics_init || exit 1
forensics_coc_log "started" "$EVIDENCE_DIR" "incident response triage"

# Run captures in parallel where safe (most are read-only and independent)
forensics_capture_network &
forensics_capture_processes &
forensics_capture_sessions &
wait   # ~5s

# Suspicious PIDs (passed via env) right after: they are as volatile as network state
if [[ -n "${SUSPECT_PIDS:-}" ]]; then
  for pid in $SUSPECT_PIDS; do   # unquoted on purpose: space-separated list
    forensics_capture_pid "$pid"
  done
fi

forensics_capture_persistence &
forensics_capture_recent_files &
forensics_capture_config &
wait   # ~10s

forensics_capture_logs        # ~10-30s

forensics_finalize
forensics_coc_log "finalized" "$EVIDENCE_DIR" "evidence bundle complete"

forensics_log "Triage complete. Bundle: $FORENSICS_DIR/$(basename "$EVIDENCE_DIR").tar"

Run with:

sudo INCIDENT_ID=INC-2026-0042 SUSPECT_PIDS="4321 5678" /usr/local/bin/triage.sh

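In ~30-60 seconds you have a timestamped evidence directory holding network, process, session, persistence, recent-file, configuration and log captures (plus per-PID detail for each suspect); a MANIFEST.sha256 covering every artifact; a tar bundle with its .sha256 and, if GPG is set up, a detached .sig; and “started” and “finalized” entries in the chain-of-custody log.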

This is the script that buys an investigator the time to set up real forensic tools.

The Five-Step IR Triage Method

When you SSH into a possibly-compromised host, running the triage script above is one of five steps:

Step 1: Isolate Without Destroying Evidence

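If the host is in a load balancer, remove it from rotation but do not power it off: power-off destroys everything the triage in step 2 would capture, and even a reboot loses memory state.

# Protect the instance from accidental termination (memory and disk stay intact)
aws ec2 modify-instance-attribute --instance-id i-xxxxx --disable-api-termination

# In an Auto Scaling group ($ASG = its name), use Standby, not Unhealthy:
# an Unhealthy instance is terminated and replaced
aws autoscaling enter-standby --instance-ids i-xxxxx \
  --auto-scaling-group-name "$ASG" --should-decrement-desired-capacity

# Remove from ELB (does not stop the host)
aws elbv2 deregister-targets --target-group-arn "$TG" --targets Id=i-xxxxx

# Apply restrictive security group (allow only investigator's IP on SSH)
aws ec2 modify-instance-attribute --instance-id i-xxxxx --groups sg-investigator-only

The host is now isolated but still running: memory intact, processes alive. Security groups are stateful, so connections that were already established (including an attacker’s) can survive the group change; that helps the capture in step 2, but if they must be cut immediately, add a network ACL deny rule.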

Step 2: Run Triage

Run the triage script above. Get the evidence bundle out of the host (scp to a separate forensic workstation) before doing anything else:

mkdir -p "/forensic/$INCIDENT_ID"
scp "i-xxxxx:/var/forensics/$INCIDENT_ID-*.tar*" "/forensic/$INCIDENT_ID/"   # .tar, .tar.sha256, .tar.sig

Step 3: Targeted Investigation

Now examine the evidence on the workstation, hypothesis-driven:

# What's listening on ports outside the expected set? (port = text after the last ':')
awk '$1 == "LISTEN" { n = split($4, a, ":"); if (a[n] !~ /^(22|80|443|3306)$/) print }' 03-tcp-listeners.txt

# Which processes hold established connections, and to which peers?
awk '$1 == "ESTAB" { print $5, $6 }' 01-tcp-connections.txt | sort -u

# Any PIDs where running-hash != on-disk-hash?
for d in proc-*/; do
  rh=$(cut -d' ' -f1 "$d/running-binary.sha256" 2>/dev/null)
  oh=$(cut -d' ' -f1 "$d/on-disk-binary.sha256" 2>/dev/null)
  [[ -n "$rh" && "$rh" != "$oh" ]] && echo "DIFFER: $d"
done

# Source IPs of accepted SSH logins (zcat -f also reads rotated .gz copies)
zcat -f 60-auth.log* \
  | awk '/sshd\[[0-9]+\]: Accepted/ { for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' \
  | sort -u

This is hand-art; the goal is “find the indicator that opens up the rest of the investigation.”

Step 4: Containment / Eradication

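Once you understand the attack pattern, determine whether you can safely clean the host in place (kill the malicious processes, remove the persistence, rotate every exposed credential) or must rebuild it from a known-good image.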

For most cloud workloads, rebuild-from-known-good is the right answer. The triage bundle stays as evidence; the running compromised host gets terminated.

Step 5: Post-Mortem And Update

Mounting Disk Images Read-Only For Examination

When you have a disk image (e.g., from EBS snapshot or dd), examine it on a separate workstation:

# Verify image integrity before mounting
sha256sum -c sda1.img.sha256 || { echo "FAIL: image corrupted"; exit 1; }

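# Loop-mount read-only (for ext3/ext4 images add noload, so the journal is not replayed)
mkdir -p /mnt/evidence
mount -o ro,loop,noatime,nodev,nosuid,noexec sda1.img /mnt/evidence

# Alternatively, attach a read-only loop device explicitly and mount that instead
dev=$(losetup --read-only --find --show sda1.img)
mount -o ro,noatime,nodev,nosuid,noexec "$dev" /mnt/evidence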

# Examine
ls -la /mnt/evidence/etc/passwd
find /mnt/evidence/var/log -name "auth.log*" -exec wc -l {} \;

# Always umount before destroying the image
umount /mnt/evidence

The nodev,nosuid flags on the mount prevent device nodes and set-uid binaries in the image from being respected — protects your forensic workstation from being compromised by examining a malicious image.

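For maximum safety, export the image through qemu-nbd in read-only mode: the image file is then served by a userspace process that refuses every write, so not even a misbehaving filesystem driver can modify it:

modprobe nbd max_part=8
qemu-nbd --read-only --format=raw --connect=/dev/nbd0 sda1.img
# sda1.img is a partition image, so mount the device itself (whole-disk images: /dev/nbd0p1, ...)
mount -o ro,noatime,nodev,nosuid,noexec /dev/nbd0 /mnt/evidence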

The 8 Footguns

1. set -e In A Triage Script

set -e aborts on first failure — but in triage you want every capture step to attempt, even if some fail (lsof might not be installed; iptables might be replaced by nft). Fix: Use set -uo pipefail without errexit. Each command’s stderr goes to its own output file, so failures are evidence, not blockers.

2. Running The Script From The Host’s Filesystem

If the host is compromised, /usr/local/bin/triage.sh itself might be backdoored. Fix: Mount a read-only USB / forensic remote (NFS) with the toolkit, or copy the script via scp from the investigator’s workstation right before running.

3. Writing Evidence To The Same Filesystem You’re Examining

If you write /var/forensics/... on a compromised host, you’re modifying timestamps in /var, possibly overwriting evidence in unallocated blocks. Fix: Write to a separate mount (USB, NFS) or stream straight over SSH to the investigator’s workstation:

tar --create --file=- --directory="$FORENSICS_DIR" "$(basename "$EVIDENCE_DIR")" \
  | ssh investigator@workstation "cat > /forensic/$INCIDENT_ID-$(date +%s).tar"

4. Forgetting --no-pager On journalctl / systemctl

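systemctl and journalctl page their output through less whenever stdout is a terminal. Redirected to a file, as in the scripts above, they don’t page, but any capture step run interactively (or with output left on the terminal) sits waiting for a key press. Investigators have lost minutes to this. Fix: Always pass --no-pager, or export SYSTEMD_PAGER=cat for the whole session.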

5. Running As Non-Root For Process Memory Capture

/proc/<pid>/mem for processes you don’t own returns EACCES. Triage must run as root. Fix: Document sudo requirement; the script should error early if not root:

[[ $EUID -eq 0 ]] || { echo "must run as root"; exit 1; }

6. Skipping The On-Disk-vs-Running-Hash Comparison

You hash the binary on disk, but the attacker replaced it. The hash matches their replacement, not the original. Fix: Always hash /proc/<pid>/exe (kernel’s view of the loaded binary) AND the on-disk path. Mismatches are huge red flags.

7. Examining A Live Compromised Host With Your Personal SSH Key

If the host is compromised and the attacker is watching, your SSH session is visible to them, and with agent forwarding (-A) anyone with root on the host can use your forwarded agent to authenticate as you to other systems for as long as you stay connected. Fix: Use a dedicated investigator SSH keypair, never -A (agent forwarding), and rotate the key after each incident.

8. Storing Evidence In A Bucket The Compromised Host’s Role Can Delete

Same threat model as backups (L35). The host’s IAM role should never have s3:DeleteObject on the forensic bucket. Fix: Cross-account upload (host has assume-role to a separate forensics account that has write-only permission on the bucket; only a forensic-investigator role can read it).

Quick-Reference Card

ORDER OF VOLATILITY (capture in this order)
  1. Network state (ss, ip neigh, ip route)
  2. Process state (ps, pstree, lsof)
  3. Session state (who, last, w)
  4. Persistence indicators (cron, systemd, profile.d)
  5. Recent files (find -mtime -1)
  6. System config (passwd, shadow, sshd_config, dmesg)
  7. Logs (auth.log, journalctl)

READ-ONLY DISCIPLINE
  Mount images: ro,loop,noatime,nodev,nosuid,noexec (+ noload for ext3/ext4)
  Live host: examine via /proc, never modify
  Remount real fs: mount -o remount,ro,noatime  (if you must work in place)

CHAIN OF CUSTODY
  Append-only JSONL with prev_hash chain
  Each entry: ts, operator, host, action, evidence, reason
  Sign every bundle with detached GPG

EVIDENCE BUNDLE
  All captures hashed in MANIFEST.sha256
  MANIFEST.sha256 itself hashed (chain of trust)
  tar + sha256 + meta + sig as a 4-file unit
  Store in S3 Object Lock (compliance mode; legal hold for indefinite retention)

SHELL SETTINGS
  set -uo pipefail   (NOT errexit — capture should be best-effort)
  Always --no-pager (journalctl, systemctl)
  Run as root (capture EUID gate)

THE TRIAGE 60s
  forensics_init
  forensics_capture_network  &
  forensics_capture_processes &
  forensics_capture_sessions &
  wait
  forensics_capture_persistence &
  forensics_capture_recent_files &
  forensics_capture_config &
  wait
  forensics_capture_logs
  forensics_finalize

What’s Next

You can now triage a compromised host, capture evidence with chain-of-custody discipline, and produce a forensic bundle that survives audit and litigation.

The capstone of the entire shell course awaits: a style guide that pulls together every pattern from L1-L41 into a review checklist, a lifecycle policy for shell scripts in production, and the sunset criteria that let you retire scripts cleanly. This is the document every team should keep on the wall — the answer to “is this script ready for production?” and “is this script still earning its keep?”

In the final lesson — The Shell-Script Style Guide Capstone: Review Checklist, Lifecycle & Sunset Criteria — we’ll consolidate the entire series into a single review checklist (boilerplate, error handling, idempotency, observability, documentation), a lifecycle policy from inception through deprecation, the metrics every script should emit, and the criteria for retiring a script (replaced by a real tool, no longer needed, owner left the team).


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

