Shell Lesson 10 of 42

Signal Handling: trap, EXIT/ERR/INT/TERM, Idempotent Cleanup & Lock-File Discipline — Writing Scripts That Don't Leave a Mess Behind

Every long-running shell script will eventually be killed mid-execution. Ctrl+C from a tired engineer. A Kubernetes pod eviction. A systemd timeout. The OOM killer. A laptop closed at the wrong moment. Servers reboot, networks die, and shell scripts that don’t plan for this leave behind a mess: half-written files, stale lock files that are never released, child processes still running with stale state, temporary directories that sit on the filesystem forever.

The difference between a script that handles this gracefully and one that doesn’t is one shell idiom: trap. Every production-grade shell script you’ll ever write should set a trap EXIT handler at the top to clean up after itself, and most of them should also handle INT (Ctrl+C) and TERM (graceful shutdown) explicitly.

This lesson covers the signal model in just enough detail to write good handlers, the trap builtin, the canonical patterns for tempfile cleanup and lock files, and the production-grade signal-aware template you should adopt for all your scripts.


1. Signals in 90 seconds

A signal is a kernel-delivered interrupt to a process. The signaled process has either installed a handler (which runs), asked to ignore the signal, or left the default action in place (often: terminate). Signals are tiny: they carry no data, just an integer ID.

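Signal numbers (these come from the kernel, not bash; the values below are for Linux on x86 and ARM, and several differ on macOS/BSD, e.g. SIGUSR1 is 30 and SIGSTOP is 17 there):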

Signal   Number  Default action           When it’s sent
SIGHUP   1       terminate                controlling terminal hangs up (e.g. SSH disconnect)
SIGINT   2       terminate                user pressed Ctrl+C
SIGQUIT  3       terminate + core dump    user pressed Ctrl+\
SIGKILL  9       terminate (uncatchable)  kill -9: forced kill
SIGTERM  15      terminate                kill’s default: polite “please stop”
SIGSTOP  19      stop (uncatchable)       kill -STOP
SIGTSTP  20      stop                     user pressed Ctrl+Z
SIGCONT  18      continue                 kill -CONT: resume a stopped process
SIGUSR1  10      terminate                user-defined
SIGUSR2  12      terminate                user-defined
SIGCHLD  17      ignore                   a child process changed state
SIGPIPE  13      terminate                wrote to a closed pipe

Two signals you cannot catch: SIGKILL (9) and SIGSTOP (19). The kernel handles them itself and the process gets no chance to react. Everything else can be caught (or ignored).

When a process is killed by signal N, the shell reports its exit status ($?) as 128 + N: SIGINT (2) gives 130, SIGKILL (9) gives 137, SIGTERM (15) gives 143.

We saw this in lesson 8 (SIGPIPE → 141).
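A minimal sketch to see the 128 + N rule in action, assuming Linux signal numbers (SIGTERM = 15, SIGKILL = 9). status_after is a hypothetical helper: it kills a throwaway background sleep with the given signal and prints the status that wait reports.

```bash
#!/usr/bin/env bash
# status_after (hypothetical helper): start a background sleep, kill it with
# the given signal, and print the status that wait reports for it.
status_after() {
  local sig="$1" pid
  sleep 60 >/dev/null 2>&1 &
  pid=$!
  kill -"$sig" "$pid"
  wait "$pid" 2>/dev/null   # a signal-killed child reports 128 + signal number
  echo "$?"
}

term_status=$(status_after TERM)   # 128 + 15
kill_status=$(status_after KILL)   # 128 + 9
echo "SIGTERM -> $term_status, SIGKILL -> $kill_status"
```

On Linux this prints SIGTERM -> 143, SIGKILL -> 137.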

You send a signal with kill -SIG PID:

kill -INT 12345
kill -TERM 12345
kill -KILL 12345     # the brutal one
kill 12345           # SIGTERM by default
kill -USR1 12345     # user-defined; useful for "reload config"

Signal names can be given without the SIG prefix (kill -INT = kill -SIGINT).


2. trap — the only signal-handling primitive in shell

trap COMMANDS SIGNAL [SIGNAL ...] registers COMMANDS to run when any of the named signals are received.

trap 'echo "Caught signal!"' INT TERM
sleep 60
# Press Ctrl+C: sleep dies, "Caught signal!" is printed, and the script
# continues with the next command (here there is none, so it ends)

Listing, removing, and ignoring traps:

trap 'echo "Hi"' INT
trap                                # list registered traps
trap - INT                          # remove the trap
trap '' INT                         # ignore SIGINT entirely (Ctrl+C does nothing)
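A trap's commands are a string that bash evaluates when the signal arrives, and a handler that doesn't call exit simply hands control back to the script. A minimal sketch of that behavior; it signals its own PID via $$ instead of relying on Ctrl+C so it can run unattended, and the events array exists only to make the order visible:

```bash
#!/usr/bin/env bash
# Record events in an array so the order is easy to inspect afterwards.
events=()
trap 'events+=("handler ran")' TERM   # handler does NOT call exit
kill -TERM $$                          # send SIGTERM to this very shell
events+=("script continued")           # execution resumes here after the handler
trap - TERM                            # restore the default action
printf '%s\n' "${events[@]}"
```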

3. The four signals you’ll handle 95% of the time

EXIT — pseudo-signal for “the script is exiting”

The most useful signal in bash is one that doesn’t exist at the kernel level: EXIT. Bash fires it whenever the shell exits, for almost any reason: normal completion, an exit N call, a fatal error under set -e, or termination by a signal bash can catch. Only uncatchable endings (SIGKILL, a kernel panic, power loss) skip it. Use EXIT for cleanup that must happen no matter what.

TMPDIR=$(mktemp -d)
trap 'rm -rf -- "$TMPDIR"' EXIT

# ... use $TMPDIR ...
# When the script exits, the trap runs and cleans up

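This is the canonical tempfile-cleanup pattern, and it is as close to bulletproof as shell gets: the trap runs on success, on exit N, on set -e failures, and on catchable signals. Only SIGKILL or a machine crash can bypass it.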

You should put a trap '...' EXIT at the top of nearly every script that creates temporary state.

ERR — pseudo-signal for “a command failed”

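ERR is fired whenever a command exits non-zero, subject to the same suppression rules as set -e: it does not fire for the condition of an if, while, or until, for any command in an && or || list except the last, or for a command negated with !. Useful for error logging: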

on_error() {
  local lineno="$1"
  local code="$2"
  echo "Error at line $lineno (exit $code)" >&2
}

trap 'on_error "$LINENO" "$?"' ERR

$LINENO inside the trap holds the line of the failing command. $? holds the exit code. This gives you a poor man’s stack trace when scripts fail.

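ERR is fired in addition to EXIT: when a failure makes set -e exit the script, ERR fires first for the failing command, then EXIT fires as the shell shuts down. Without set -e, ERR fires and the script carries on.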
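A minimal sketch that confirms the ordering: it runs a tiny strict-mode script in a child bash and captures which traps fire, and in what order. The trap bodies are just echo statements for illustration.

```bash
#!/usr/bin/env bash
# Run a tiny strict-mode script in a child bash and capture the order in which its traps fire.
events=$(bash -c '
  set -Ee
  trap "echo ERR" ERR
  trap "echo EXIT" EXIT
  false                 # fails: ERR fires, then set -e exits and EXIT fires
  echo "not reached"
')
status=$?
printf '%s\n' "$events"
echo "child exit status: $status"
```

It prints ERR, then EXIT, and the child exits with 1, the status of the failing false.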

For ERR to propagate into functions, enable set -E (also known as set -o errtrace). Without it, the ERR trap is not inherited by shell functions, command substitutions, or subshells:

set -Eeuo pipefail

This is the right strict-mode preamble for any script with non-trivial functions.

INT — Ctrl+C

on_interrupt() {
  echo "Interrupted by user" >&2
  cleanup
  exit 130
}

trap on_interrupt INT

130 is the standard “I exited because of SIGINT” exit code (128 + 2). Use it consistently so your callers can distinguish “user interrupted” from “real failure.”

TERM — polite shutdown

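Systemd, Kubernetes, Docker, and most supervisors send SIGTERM first. They wait a grace period, then send SIGKILL. The defaults are 10 seconds for docker stop, 30 seconds for a Kubernetes pod (terminationGracePeriodSeconds), and 90 seconds for systemd (TimeoutStopSec).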

on_term() {
  echo "Got SIGTERM, shutting down gracefully" >&2
  cleanup
  exit 143
}

trap on_term TERM

If your script is doing something that needs to finish cleanly — write a state file, close a connection, release a lock — your TERM handler is the place. Use the grace period.
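One detail decides how quickly a TERM handler gets to run: bash does not run a trap until the current foreground command returns, so a handler installed while sleep 300 runs in the foreground would wait up to 300 seconds and could burn through the whole grace period. A common workaround, sketched here, is to run the long command in the background and wait on it, because wait returns as soon as a trapped signal arrives. The worker body and the sleep 300 stand-in are illustrative; the outer part starts the worker, sends SIGTERM, and measures how long it took to stop.

```bash
#!/usr/bin/env bash
# The worker: a hypothetical long-running job that stops promptly on SIGTERM.
worker='
  child=""
  on_term() {
    echo "Got SIGTERM, shutting down gracefully" >&2
    [[ -n "$child" ]] && kill -TERM "$child" 2>/dev/null
    exit 143
  }
  trap on_term TERM
  sleep 300 &          # stand-in for the real long-running work
  child=$!
  wait "$child"        # returns early when a trapped signal arrives
'

bash -c "$worker" &
pid=$!
sleep 0.5              # give the worker time to install its trap
start=$SECONDS
kill -TERM "$pid"
wait "$pid"
status=$?
elapsed=$(( SECONDS - start ))
echo "worker exited with status $status after ~${elapsed}s"
```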

Combined handler

A common idiom: same handler for INT and TERM:

on_signal() {
  local sig="$1"
  echo "Received SIG${sig}, shutting down" >&2
  cleanup
  case "$sig" in
    INT)  exit 130 ;;
    TERM) exit 143 ;;
    *)    exit 1 ;;
  esac
}

trap 'on_signal INT' INT
trap 'on_signal TERM' TERM

Or, more cleanly, with a separate cleanup and a trap EXIT that handles all paths:

cleanup() {
  # ${VAR:-} keeps set -u from aborting cleanup if a resource was never created
  rm -rf -- "${TMPDIR:-}" 2>/dev/null || true
  kill "${BG_PID:-}" 2>/dev/null || true
}

trap cleanup EXIT
trap 'echo "Interrupted" >&2; exit 130' INT
trap 'echo "Terminated" >&2; exit 143' TERM

This works because the INT and TERM handlers call exit, exit fires EXIT, and EXIT calls cleanup. Two layers, one cleanup function.
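To watch both layers work together, this sketch writes a small script with the cleanup/INT/TERM setup above, has it send SIGTERM to itself mid-run (simulating a supervisor), and then checks the exit status and whether the work directory survived. The variable is named WORKDIR here rather than TMPDIR so the sketch does not overwrite the TMPDIR environment variable that mktemp itself consults; that renaming is this example's choice, not a requirement of the pattern.

```bash
#!/usr/bin/env bash
# Write a small script that uses the layered handlers, then let it signal itself.
script=$(mktemp)
cat >"$script" <<'EOF'
WORKDIR=$(mktemp -d)
cleanup() {
  rm -rf -- "${WORKDIR:-}" 2>/dev/null || true
  echo "cleanup ran"
}
trap cleanup EXIT
trap 'echo "Terminated" >&2; exit 143' TERM
echo "$WORKDIR"
kill -TERM $$           # simulate a supervisor stopping us
echo "never printed"
EOF

out=$(bash "$script" 2>/dev/null)
status=$?
rm -f -- "$script"
dir=${out%%$'\n'*}      # first line of output is the work directory
echo "exit status: $status"
echo "$out"
```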


4. The canonical tempfile cleanup pattern

Every script that creates temp state should follow this template:

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

TMPDIR=$(mktemp -d -t myscript.XXXXXX)
trap 'rm -rf -- "$TMPDIR"' EXIT

# ... use $TMPDIR ...
echo "Working in $TMPDIR"
touch "$TMPDIR/working-file.txt"

# When the script exits, $TMPDIR is automatically cleaned up.

Set the trap immediately after creating the resource, so there is no window in which the directory exists but nothing will remove it. This template style is preferable to ad-hoc trap calls scattered through the script.
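A quick way to convince yourself the template holds up when set -e aborts the script: this sketch writes a cut-down copy (again using the name WORKDIR, an illustrative choice) that fails midway, runs it, and checks that the directory it printed is gone.

```bash
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
set -Eeuo pipefail
WORKDIR=$(mktemp -d)
trap 'rm -rf -- "$WORKDIR"' EXIT
echo "$WORKDIR"
false                  # set -e aborts here; the EXIT trap still runs
echo "not reached"
EOF

dir=$(bash "$script")
status=$?
rm -f -- "$script"

if [[ -e "$dir" ]]; then
  echo "LEAKED: $dir"
else
  echo "cleaned up $dir (script exit status $status)"
fi
```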


5. Multiple resources: handler stacking

If you have multiple resources to clean up, you have two options:

Option A: one cleanup function

TMPDIR=$(mktemp -d)
LOG_FILE=$(mktemp)

cleanup() {
  rm -rf -- "$TMPDIR"
  rm -f -- "$LOG_FILE"
}

trap cleanup EXIT

Clean. Easy to extend. Recommended for most scripts.

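Option B: a cleanup stack

If you want to add cleanup steps as resources are acquired, use this pattern:

CLEANUPS=()

run_cleanups() {
  # Run in reverse (LIFO) order: the last resource acquired is released first
  local i
  for (( i = ${#CLEANUPS[@]} - 1; i >= 0; i-- )); do
    eval "${CLEANUPS[i]}" || true
  done
}
trap run_cleanups EXIT

add_cleanup() {
  CLEANUPS+=("$1")
}

# Acquire and register; printf %q quotes each path safely for eval
TMPDIR=$(mktemp -d)
add_cleanup "rm -rf -- $(printf '%q' "$TMPDIR")"

LOCK=$(mktemp)
add_cleanup "rm -f -- $(printf '%q' "$LOCK")"

start-bg-task &                 # start-bg-task is a placeholder for your own command
add_cleanup "kill $! 2>/dev/null || true"

Cleanups run in reverse order of registration, so later resources (which may depend on earlier ones) are released first. Niche, but useful when you’ve got many resources acquired conditionally throughout a long script.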


6. Lock files — the “only one instance running” pattern

If you’re writing a script that should not run concurrently with another instance of itself (cron jobs, daily backups, deployment scripts), you need a lock.

Naive approach (don’t)

LOCK=/tmp/myscript.lock

if [[ -f "$LOCK" ]]; then
  echo "Already running" >&2
  exit 1
fi
touch "$LOCK"
trap 'rm -f -- "$LOCK"' EXIT

This has a race condition: between the [[ -f ]] check and touch, another instance can do the same check, and you get two running instances. The test-and-create is not atomic.

Slightly better (mkdir as atomic)

LOCK=/tmp/myscript.lockdir

if ! mkdir "$LOCK" 2>/dev/null; then
  echo "Already running" >&2
  exit 1
fi
trap 'rmdir -- "$LOCK"' EXIT

mkdir is atomic (it either succeeds or fails with EEXIST). No race. But if your script crashes without cleanup, the lock dir is left behind and you’ll need to remove it manually next time.

The right way: flock(1)

LOCK=/var/lock/myscript.lock
exec 9>"$LOCK"

if ! flock -n 9; then
  echo "Already running" >&2
  exit 1
fi

# ... do work ...
# Lock is released automatically when fd 9 closes (i.e. on script exit)

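flock uses kernel-level advisory locking on an open file. The lock belongs to the open file behind fd 9 and is released when the last descriptor referring to it is closed, which the kernel does automatically when the process exits, however it exits: SIGKILL and the OOM killer included (and after a power loss there is simply no lock left). The lock file itself stays on disk, but that is harmless: the lock, not the file, is what matters. One caveat: child processes inherit fd 9, so a background process that outlives your script keeps the lock held until it exits too.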

flock -n is non-blocking: returns immediately with status 1 if the lock is held. Without -n, it blocks until the lock becomes available. Use -n for “single instance, fail otherwise”; omit -n for “wait my turn.”
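To see flock -n and the inherited-descriptor behavior for yourself, here is a sketch. It assumes the util-linux flock(1) found on most Linux distributions and uses a mktemp file instead of /var/lock so it runs without root. A subshell takes the lock on fd 9 and leaves a background sleep holding a copy of that descriptor; a non-blocking attempt fails while the sleep is alive and succeeds once it has exited.

```bash
#!/usr/bin/env bash
# Requires flock(1) from util-linux.
lock=$(mktemp)

# The subshell locks fd 9, starts a background sleep (which inherits fd 9),
# and exits at once. The sleep's copy of fd 9 keeps the lock alive.
( exec 9>"$lock"; flock 9; sleep 2 >/dev/null & )

# Non-blocking attempt while the inherited fd is still open: fails immediately.
if flock -n "$lock" true; then during=acquired; else during=busy; fi

sleep 3   # let the background sleep finish and close its copy of fd 9

if flock -n "$lock" true; then after=acquired; else after=busy; fi
rm -f -- "$lock"
echo "while held: $during; after holder exited: $after"
```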

flock self-locking idiom — handy for cron:

#!/usr/bin/env bash
exec 9>"/var/lock/$(basename "$0").lock"
flock -n 9 || { echo "Already running" >&2; exit 1; }

# ... work ...

In practice, a cleaner version wraps the whole script in flock by re-executing itself:

[[ "${LOCKED:-}" ]] || exec env LOCKED=1 flock -en /var/lock/myscript.lock "$0" "$@"

A bit cryptic but extremely effective: if the script is invoked without LOCKED set, it re-execs itself under flock (-e for an exclusive lock, -n to fail instead of waiting), so a second copy is refused while the first holds the lock. The re-executed copy sees LOCKED=1 in its environment and skips the re-exec. A refused run exits with flock’s status 1 without printing anything.


7. Idempotent cleanup

Your cleanup function will sometimes run twice: for instance, an INT handler that calls cleanup and then exit 130 (as in section 3) triggers the EXIT trap, which calls cleanup again. Make cleanup idempotent: safe to call repeatedly with no error.

cleanup() {
  if [[ -d "${TMPDIR:-}" ]]; then
    rm -rf -- "$TMPDIR"
    TMPDIR=""
  fi
  if [[ -n "${BG_PID:-}" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" || true
    wait "$BG_PID" 2>/dev/null || true
    BG_PID=""
  fi
}

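Patterns: check that the resource still exists before releasing it (-d for the directory, kill -0 for the process), reset the variable once it is released, and tolerate failures of the release itself with || true.

Idempotent cleanup also matters when cleanup runs from both a manual call and the EXIT trap, e.g. a script that cleans up before re-execing itself.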
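A sketch that exercises the idempotent cleanup above by calling it twice: once by hand and once from the EXIT trap. The background sleep stands in for real work, and WORKDIR replaces TMPDIR purely so the example leaves the TMPDIR environment variable alone. A non-idempotent version would trip over resources that are already gone on the second call.

```bash
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
set -Eeuo pipefail
WORKDIR=$(mktemp -d)
sleep 60 >/dev/null 2>&1 &     # stand-in background job
BG_PID=$!

cleanup() {
  if [[ -d "${WORKDIR:-}" ]]; then
    rm -rf -- "$WORKDIR"
    WORKDIR=""
  fi
  if [[ -n "${BG_PID:-}" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" || true
    wait "$BG_PID" 2>/dev/null || true
    BG_PID=""
  fi
}
trap cleanup EXIT

cleanup                        # first call: releases everything
echo "manual cleanup done"
# the second call happens via the EXIT trap and must be a no-op
EOF

out=$(bash "$script" 2>/dev/null)
status=$?
rm -f -- "$script"
echo "$out (exit $status)"
```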


8. The production template

Adopt this as the boilerplate for every non-trivial script:

#!/usr/bin/env bash
# myscript.sh — short description of what this does
set -Eeuo pipefail
IFS=$'\n\t'

# --- Logging ---

log() {
  printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}

die() {
  log error "$*"
  exit 1
}

# --- Cleanup ---

TMPDIR=""
BG_PIDS=()

cleanup() {
  local rc=$?
  log debug "Cleaning up (exit code ${rc})"

  for pid in ${BG_PIDS[@]+"${BG_PIDS[@]}"}; do   # safe under set -u even when empty (bash < 4.4)
    if kill -0 "$pid" 2>/dev/null; then
      kill -TERM "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
    fi
  done

  if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
    rm -rf -- "$TMPDIR"
  fi

  exit "$rc"
}

on_error() {
  local lineno="$1"
  local code="$2"
  log error "Failure at line ${lineno} (exit ${code})"
}

trap cleanup EXIT
trap 'on_error "$LINENO" "$?"' ERR
trap 'log warn "Interrupted"; exit 130' INT
trap 'log warn "Terminated"; exit 143' TERM

# --- Main ---

main() {
  TMPDIR=$(mktemp -d -t "$(basename "$0").XXXXXX")
  log info "Working in ${TMPDIR}"

  # ... actual work ...

  log info "Done"
}

main "$@"

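This template gives you strict mode with ERR inheritance, timestamped logging to stderr, a single cleanup function that stops background jobs, removes the temp directory, and preserves the original exit code, line-numbered error reports, and the conventional 130/143 exit codes for INT and TERM.

Use this as your starting point. Cut what you don’t need, keep what you do.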


9. Common signal-handling pitfalls

Forgetting set -E

Without set -E, ERR traps are not inherited by shell functions. So this:

set -e
trap 'echo "ERR at $LINENO"' ERR

myfunc() {
  false        # ERR will NOT fire here without set -E
}

myfunc

won’t fire the trap. Add -E:

set -Eeuo pipefail
trap 'echo "ERR at $LINENO"' ERR

Or use set -o errtrace (same thing, longer name).

Single-quote vs double-quote in trap

trap "echo $LINENO" ERR        # WRONG: $LINENO expands when the trap is SET, not when it FIRES
trap 'echo $LINENO' ERR        # CORRECT: $LINENO expands each time the trap fires

Always single-quote the trap command unless you have a very specific reason to expand at registration time.

Trap for INT but not exiting

If your INT handler doesn’t exit, your script keeps running after Ctrl+C:

trap 'echo "ignoring Ctrl+C"' INT
sleep 60                       # Ctrl+C kills sleep, prints the message, then the script continues

This can be deliberate (long-running scripts that should survive Ctrl+C, although the foreground command still dies) but is more often a bug. If your INT handler should terminate, end it with exit 130.

Background processes don’t inherit traps

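Signals you trap in the script are reset to their default action in subshells and background jobs (signals you ignore with trap '' stay ignored):

trap 'echo caught' INT

(sleep 60) &              # the subshell doesn't inherit your INT trap

Set traps inside the subshell if you need them, e.g. ( trap '...' TERM; ... ) &. Also note that in a non-interactive script, background jobs start with SIGINT and SIGQUIT ignored, so Ctrl+C never reaches them; stop them from your cleanup instead (section 10).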
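A sketch of the reset behavior: the parent traps TERM, then signals a background subshell. Because the subshell runs with TERM back at its default action, it dies instead of running the parent's handler, and wait reports 143 (128 + 15). The busy loop is only there to give the subshell something to do.

```bash
#!/usr/bin/env bash
trap 'echo "parent caught TERM"' TERM
( while :; do sleep 0.1; done ) &    # background subshell: a copy of this shell
pid=$!
sleep 0.3                            # give the subshell time to start
kill -TERM "$pid"                    # signal the subshell, not the parent
wait "$pid" 2>/dev/null
status=$?
trap - TERM
echo "subshell exit status: $status" # died from SIGTERM's default action
```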

Forgetting cleanup runs even on success

Your cleanup runs on every exit path, including normal successful exit. Make sure your cleanup is OK with running after success:

cleanup() {
  echo "Failure!"           # WRONG — also fires on success
  rm -f "$TMPFILE"
}

Use the exit-code variable:

cleanup() {
  local rc=$?
  if (( rc != 0 )); then
    echo "Failed with exit ${rc}" >&2
  fi
  rm -f "$TMPFILE"
  exit "$rc"
}
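This sketch runs a script built around the exit-code-aware cleanup twice, once succeeding and once exiting with 3, to show that the failure message appears only on the failing run while the temp file is removed in both. Passing the desired exit code as $1 is purely a test convenience.

```bash
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
TMPFILE=$(mktemp)
cleanup() {
  local rc=$?                      # status the script was exiting with
  if (( rc != 0 )); then
    echo "Failed with exit ${rc}" >&2
  fi
  rm -f -- "$TMPFILE"
  exit "$rc"
}
trap cleanup EXIT
exit "${1:-0}"                     # exit with whatever code the caller asks for
EOF

ok_out=$(bash "$script" 0 2>&1);  ok_rc=$?
bad_out=$(bash "$script" 3 2>&1); bad_rc=$?
rm -f -- "$script"

echo "success run: rc=$ok_rc, output='$ok_out'"
echo "failure run: rc=$bad_rc, output='$bad_out'"
```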

10. Sending signals to children

If your script spawned background work, your signal handlers should propagate signals to children:

BG_PID=""

cleanup() {
  if [[ -n "$BG_PID" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" 2>/dev/null || true
    wait "$BG_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT

start-long-task &
BG_PID=$!

# main work...

For a whole process group (the script and all its descendants):

trap 'kill -- -$$' EXIT     # send SIGTERM to the entire process group

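-$$ (the negative of your own PID) targets process group $$, which only exists if the script is a process-group leader. That holds when you launch it from an interactive shell with job control, but not necessarily when another program or script starts it. It is heavy-handed (the script signals itself too) but for “everything stops now” it works.

For a more targeted approach, use pkill -TERM -P $$ (signal all direct children of this PID).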
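A sketch that checks the propagation from outside: a script starts a background sleep (standing in for start-long-task), prints its PID, and exits normally; afterwards the caller tests whether that PID is still alive. With the cleanup in place it should be gone.

```bash
#!/usr/bin/env bash
script=$(mktemp)
cat >"$script" <<'EOF'
BG_PID=""
cleanup() {
  if [[ -n "$BG_PID" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" 2>/dev/null || true
    wait "$BG_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT

sleep 300 >/dev/null 2>&1 &     # stand-in for start-long-task
BG_PID=$!
echo "$BG_PID"
EOF

child=$(bash "$script")
rm -f -- "$script"
if kill -0 "$child" 2>/dev/null; then state=running; else state=gone; fi
echo "background child $child is $state after the script exited"
```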


11. Interaction with set -e

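set -e and traps interact in subtle ways. The mental model: set -e triggers an exit; the exit triggers EXIT (and ERR fired just before it). ERR is suppressed in exactly the places set -e is (if/while/until conditions, non-final commands of && and || lists, commands negated with !), and neither fires there. Your traps should not try to “rescue” set -e-induced exits; instead, log diagnostics and let the exit happen.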
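A sketch to check the suppression rules empirically: each of the first three failures sits in a context where neither ERR nor set -e acts, and only the plain false at the end triggers both.

```bash
#!/usr/bin/env bash
out=$(bash -c '
  set -Eeuo pipefail
  trap "echo ERR fired" ERR
  if false; then :; fi    # if-condition: suppressed
  false || true           # non-final command of an || list: suppressed
  ! false                 # negated with !: suppressed
  echo "still running"
  false                   # ordinary failure: ERR fires, then set -e exits
  echo "not reached"
')
status=$?
printf '%s\n' "$out" "exit status: $status"
```

It prints still running, then ERR fired, and the child exits with 1.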


12. Real example: a robust deployment runner

#!/usr/bin/env bash
# run-deployment.sh — robust deployment with locking, cleanup, and signal handling
set -Eeuo pipefail
IFS=$'\n\t'

readonly SCRIPT_NAME=$(basename "$0")
readonly LOCK_FILE="/var/lock/${SCRIPT_NAME}.lock"

log() {
  printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}

die() { log error "$*"; exit 1; }

# --- Single instance via flock ---
# Take the lock on fd 9 before creating any state. Children inherit fd 9,
# so the lock stays held until the last of them exits.
exec 9>"$LOCK_FILE"
flock -n 9 || die "Could not obtain lock on $LOCK_FILE; another instance running?"

# --- Cleanup state ---
TMPDIR=""
BG_PIDS=()

cleanup() {
  local rc=$?
  log info "Cleanup (exit ${rc})"

  for pid in ${BG_PIDS[@]+"${BG_PIDS[@]}"}; do   # safe under set -u even when empty (bash < 4.4)
    if kill -0 "$pid" 2>/dev/null; then
      log info "Stopping bg process $pid"
      kill -TERM "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
    fi
  done

  if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
    log info "Removing $TMPDIR"
    rm -rf -- "$TMPDIR"
  fi

  exit "$rc"
}

on_err() {
  local lineno="$1"
  local code="$2"
  log error "Failure at line ${lineno} (exit ${code})"
}

trap cleanup EXIT
trap 'on_err "$LINENO" "$?"' ERR
trap 'log warn "Caught SIGINT"; exit 130' INT
trap 'log warn "Caught SIGTERM"; exit 143' TERM

# --- Main ---

main() {
  log info "Deployment starting (PID $$)"
  TMPDIR=$(mktemp -d -t "${SCRIPT_NAME}.XXXXXX")

  # 1. Pull the latest artifacts
  log info "Pulling artifacts"
  curl -fsS https://artifacts.example.com/latest.tgz -o "${TMPDIR}/artifact.tgz"
  tar -xzf "${TMPDIR}/artifact.tgz" -C "${TMPDIR}"

  # 2. Background a health-watch
  ( while true; do
      curl -fsS https://api.example.com/healthz >/dev/null 2>&1 || break
      sleep 5
    done
    log warn "Health check broke!" ) &
  BG_PIDS+=("$!")

  # 3. Run the deploy
  log info "Running deploy script"
  "${TMPDIR}/deploy.sh"

  log info "Deployment OK"
}

main "$@"

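Things to notice: the lock is taken before any state is created; each resource is recorded for cleanup as soon as it exists; the background health-watch is stopped and reaped by cleanup; and the INT/TERM handlers only log and exit, leaving the real work to the EXIT trap.

This is the production template from section 8 applied to a real job. Use it as your default.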


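13. What you must internalise before lesson 11

Before moving on, you should be comfortable with: trap EXIT for cleanup that always runs; ERR plus set -E for line-numbered diagnostics; exit codes 130 and 143 for INT and TERM; cleanup that is idempotent and preserves the exit code; and flock for single-instance scripts. If any of these felt fuzzy, re-read the relevant section. Lesson 11 (globbing, regex, find, grep, sed) is where we go from process-and-error discipline back into the data-manipulation toolkit at scale.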


What’s next

Lesson 11 covers globbing in depth (nullglob, dotglob, globstar, extended globs), regex semantics (BRE vs ERE vs PCRE), the find command from beginner to advanced (filtering, actions, -print0), grep mastery (Perl regex, multiline, context flags), and sed for in-place editing. Bring everything from lessons 1-10.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

