Every long-running shell script will eventually be killed mid-execution. Ctrl+C from a tired engineer. A Kubernetes pod eviction. A systemd timeout. The OOM killer. A laptop lid closed at the wrong moment. Servers reboot, networks die, and shell scripts that don’t plan for this leave behind a mess: half-written files, stale lock files that nothing will ever release, child processes still running with stale state, temporary directories littering the filesystem forever.
The difference between a script that handles this gracefully and one that doesn’t is one shell idiom: trap. Every production-grade shell script you’ll ever write should set a trap EXIT handler at the top to clean up after itself, and most of them should also handle INT (Ctrl+C) and TERM (graceful shutdown) explicitly.
This lesson covers the signal model in just enough detail to write good handlers, the trap builtin, the canonical patterns for tempfile cleanup and lock files, and the production-grade signal-aware template you should adopt for all your scripts.
1. Signals in 90 seconds
A signal is a kernel-delivered interrupt to a process. The signaled process either has a handler installed (and that handler runs) or uses the default action for the signal (often: terminate). Signals are tiny — they carry no data, just an integer ID.
Signal numbers on Linux (x86 and ARM; several differ on macOS/BSD, where for example SIGUSR1 is 30 and SIGCHLD is 20):
| Signal | Number | Default action | When it’s sent |
|---|---|---|---|
| SIGHUP | 1 | terminate | controlling terminal hangs up (e.g. SSH disconnect) |
| SIGINT | 2 | terminate | user pressed Ctrl+C |
| SIGQUIT | 3 | terminate + core dump | user pressed Ctrl+\ |
| SIGKILL | 9 | terminate (uncatchable) | kill -9 — forced kill |
| SIGTERM | 15 | terminate | kill default — polite “please stop” |
| SIGSTOP | 19 | stop (uncatchable) | kill -STOP |
| SIGTSTP | 20 | stop | user pressed Ctrl+Z |
| SIGCONT | 18 | continue | kill -CONT — resume a stopped process |
| SIGUSR1 | 10 | terminate | user-defined |
| SIGUSR2 | 12 | terminate | user-defined |
| SIGCHLD | 17 | ignore | a child process changed state |
| SIGPIPE | 13 | terminate | wrote to a closed pipe |
Two signals you cannot catch: SIGKILL (9) and SIGSTOP (19). The kernel handles them itself and the process gets no chance to react. Everything else can be caught (or ignored).
When a process dies from signal N, the shell reports its exit status as 128 + N:
kill -INT $PID → process exits with 130 (= 128 + 2)
kill -TERM $PID → 143
kill -KILL $PID → 137
We saw this in lesson 8 (SIGPIPE → 141).
You send a signal with kill -SIG PID:
kill -INT 12345
kill -TERM 12345
kill -KILL 12345 # the brutal one
kill 12345 # SIGTERM by default
kill -USR1 12345 # user-defined; useful for "reload config"
Signal names can be given without the SIG prefix (kill -INT = kill -SIGINT).
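The 128 + N rule is easy to check for yourself. The sketch below (the helper name signal_exit_code is illustrative) starts a throwaway sleep, signals it, and prints the status that wait reports. It sticks to TERM and KILL on purpose: without job control, background jobs in a script start with SIGINT ignored, so kill -INT would not stop the sleep.

```shell
#!/usr/bin/env bash
# Signal a throwaway background process and print the status the shell
# reports for it, which should be 128 + the signal number.
signal_exit_code() {
  local sig="$1" rc=0
  sleep 30 &                 # the process we are going to signal
  local pid=$!
  kill "-$sig" "$pid"
  wait "$pid" || rc=$?       # wait returns the child's exit status
  echo "$rc"
}

for sig in TERM KILL; do
  echo "SIG$sig -> $(signal_exit_code "$sig")"
done
```

It should print SIGTERM -> 143 and SIGKILL -> 137; those two signal numbers are the same on Linux and macOS.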
2. trap — the only signal-handling primitive in shell
trap COMMANDS SIGNAL [SIGNAL ...] registers COMMANDS to run when any of the named signals are received.
trap 'echo "Caught signal!"' INT TERM
sleep 60
# Press Ctrl+C: you'll see "Caught signal!". The handler doesn't call exit, so the script carries on after sleep (nothing follows here, so it ends)
A few things to understand:
- COMMANDS is a string. Bash parses and runs it when the signal arrives.
- Quote it with single quotes to defer parameter expansion until the handler runs: trap "echo $LINENO" ERR would substitute $LINENO now (when the trap is set); trap 'echo $LINENO' ERR substitutes it when the trap fires.
- You can register handlers for multiple signals at once.
- Calling trap again replaces the previous handler for those signals.
- trap - SIGNAL resets the signal to its default action.
- trap '' SIGNAL ignores the signal entirely (the empty string means “do nothing, but don’t fall through to default”).
trap 'echo "Hi"' INT
trap # list registered traps
trap - INT # remove the trap
trap '' INT # ignore SIGINT entirely (Ctrl+C does nothing)
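To see registration, replacement, ignoring, and reset in one place, here is a small sketch that sends SIGUSR1 to the same shell instead of relying on Ctrl+C, so it runs unattended. The function name trap_demo is illustrative, and $BASHPID assumes bash 4 or newer.

```shell
#!/usr/bin/env bash
# Register, replace, ignore, and reset a trap. SIGUSR1 is sent to this
# same shell so no second terminal is needed. $BASHPID (bash 4+) is used
# instead of $$ so the right process is signalled even inside $(...).
trap_demo() {
  trap 'echo "first handler"' USR1
  kill -USR1 "$BASHPID"               # runs the first handler

  trap 'echo "second handler"' USR1   # replaces the first handler
  trap -p USR1                        # shows the command now registered
  kill -USR1 "$BASHPID"               # runs the second handler

  trap '' USR1                        # ignore: the signal is discarded
  kill -USR1 "$BASHPID"               # prints nothing

  trap - USR1                         # restore the default action (terminate)
}

( trap_demo )   # run in a subshell so the traps don't leak into the caller
```

Running it should print first handler, the line from trap -p, then second handler; the ignored signal produces no output.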
3. The four signals you’ll handle 95% of the time
EXIT — pseudo-signal for “the script is exiting”
The most useful signal in bash is one that doesn’t exist at the kernel level: EXIT. Bash fires it whenever the shell exits — for any reason: normal completion, exit N call, fatal error from set -e, signal-induced termination. Use EXIT for cleanup that must happen no matter what.
TMPDIR=$(mktemp -d)
trap 'rm -rf -- "$TMPDIR"' EXIT
# ... use $TMPDIR ...
# When the script exits, the trap runs and cleans up
This is the canonical tempfile-cleanup pattern. It covers every exit path that gives bash a chance to run code:
- Normal exit: handler runs.
- exit 1 after an error: handler runs.
- set -e triggers on a failed command: handler runs.
- User pressed Ctrl+C: handler runs (Ctrl+C kills the script, and bash runs the EXIT trap on its way out).
- SIGKILL, or the power cable yanked: no handler can save you from those. But for everything that lets bash exit on its own terms, EXIT is your friend.
You should put a trap '...' EXIT at the top of nearly every script that creates temporary state.
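Here is a sketch that checks those exit paths instead of taking them on faith. Each case runs in a child bash that creates a temp directory, registers the EXIT trap, prints the path, then ends a different way; the parent checks whether the directory survived. The names exit_path_demo and work are illustrative (work rather than TMPDIR, so the variable mktemp reads is left alone), and the SIGTERM case relies on bash running the EXIT trap when an untrapped fatal signal ends the shell.

```shell
#!/usr/bin/env bash
# Check that an EXIT trap cleans up on several different exit paths.
exit_path_demo() {
  # $1 selects how the child shell ends: ok, fail, errexit, or term
  bash -s -- "$1" <<'EOF' || true
set -e
work=$(mktemp -d)
trap 'rm -rf -- "$work"' EXIT
echo "$work"
case "$1" in
  ok)      : ;;                           # normal completion
  fail)    exit 1 ;;                      # explicit exit with an error code
  errexit) false ;;                       # set -e aborts here
  term)    kill -TERM "$$"; sleep 5 ;;    # killed by an untrapped SIGTERM
esac
EOF
}

for how in ok fail errexit term; do
  dir=$(exit_path_demo "$how")
  if [[ -e "$dir" ]]; then
    echo "$how: left $dir behind"
  else
    echo "$how: cleaned up"
  fi
done
```

Each line should report cleaned up.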
ERR — pseudo-signal for “a command failed”
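ERR fires whenever a command exits non-zero, subject to the same suppression rules as set -e: it does not fire for the condition of an if, while, or until; for any command in a && or || list except the last; for the non-final commands of a pipeline (unless pipefail makes the pipeline as a whole fail); or for a command negated with !. It's useful for error logging: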
on_error() {
local lineno="$1"
local code="$2"
echo "Error at line $lineno (exit $code)" >&2
}
trap 'on_error "$LINENO" "$?"' ERR
$LINENO inside the trap holds the line of the failing command. $? holds the exit code. This gives you a poor man’s stack trace when scripts fail.
Under set -e, a failing command fires both traps: ERR first, then EXIT as the shell exits. Without set -e, ERR fires and the script keeps going.
For the ERR trap to reach inside functions, enable set -E (also known as set -o errtrace). Without it, the ERR trap is not inherited by shell functions, command substitutions, or subshells:
set -Eeuo pipefail
This is the right strict-mode preamble for any script with non-trivial functions.
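The effect of set -E is easiest to see side by side. This sketch (err_inheritance_demo is an illustrative name) runs the same failing function in a child bash with and without errtrace and prints whatever the ERR trap reports.

```shell
#!/usr/bin/env bash
# Run a failing function in a child bash, with or without set -E,
# and print whatever the ERR trap reports.
err_inheritance_demo() {
  # $1 is "errtrace" to enable set -E, anything else to leave it off
  bash -s -- "$1" <<'EOF' || true
if [[ "$1" == errtrace ]]; then set -E; fi
set -e
trap 'echo "ERR trap fired (exit $?)"' ERR
step() {
  false        # fails inside the function
}
step
EOF
}

echo "with set -E:    $(err_inheritance_demo errtrace)"
echo "without set -E: $(err_inheritance_demo none)"
```

With set -E the trap reports the failure inside step; without it the line stays empty, because set -e exits inside the function, where the un-inherited trap is not active. (Drop set -e and the failure would surface one level up instead: step returns 1 and the ERR trap fires on the line that called it.)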
INT — Ctrl+C
on_interrupt() {
echo "Interrupted by user" >&2
cleanup
exit 130
}
trap on_interrupt INT
130 is the standard “I exited because of SIGINT” exit code (128 + 2). Use it consistently so your callers can distinguish “user interrupted” from “real failure.”
TERM — polite shutdown
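systemd, Kubernetes, Docker, and most other supervisors send SIGTERM first, wait out a grace period, then send SIGKILL. The defaults vary: 10 seconds for docker stop, 30 for a Kubernetes pod (terminationGracePeriodSeconds), 90 for a systemd unit (TimeoutStopSec).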
on_term() {
echo "Got SIGTERM, shutting down gracefully" >&2
cleanup
exit 143
}
trap on_term TERM
If your script is doing something that needs to finish cleanly — write a state file, close a connection, release a lock — your TERM handler is the place. Use the grace period.
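One catch: the Bash manual notes that when bash is waiting for a foreground command and a trapped signal arrives, the trap does not run until that command completes. A script sitting in sleep 300 may therefore not reach its TERM handler before SIGKILL lands. A common workaround, sketched below with illustrative names (responsive_worker, on_term, worker_child), is to run the long step in the background and block in wait, which per the manual returns immediately (with a status above 128) when a trapped signal arrives.

```shell
#!/usr/bin/env bash
# Keep a TERM handler responsive while long work runs: the work runs in
# the background and the script blocks in `wait`, which a trapped signal
# interrupts at once (a foreground command would delay the trap).
on_term() {
  echo "Got SIGTERM, stopping child ${worker_child:-none}" >&2
  if [[ -n "${worker_child:-}" ]]; then
    kill -TERM "$worker_child" 2>/dev/null || true
    wait "$worker_child" 2>/dev/null || true
  fi
  exit 143
}

responsive_worker() {
  trap on_term TERM
  sleep 300 &            # stand-in for the real long-running work
  worker_child=$!
  wait "$worker_child"   # returns early (status > 128) on a trapped signal
}

# Usage: start the worker, give it a moment to install its trap, stop it.
responsive_worker &
pid=$!
sleep 1
start=$SECONDS
kill -TERM "$pid"
rc=0
wait "$pid" || rc=$?
echo "worker exited with status $rc after $((SECONDS - start))s"
```

The worker should report status 143 within about a second of the kill, not after 300.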
Combined handler
A common idiom: same handler for INT and TERM:
on_signal() {
local sig="$1"
echo "Received SIG${sig}, shutting down" >&2
cleanup
case "$sig" in
INT) exit 130 ;;
TERM) exit 143 ;;
*) exit 1 ;;
esac
}
trap 'on_signal INT' INT
trap 'on_signal TERM' TERM
Or, more cleanly, with a separate cleanup and a trap EXIT that handles all paths:
cleanup() {
rm -rf -- "$TMPDIR" 2>/dev/null || true
kill "$BG_PID" 2>/dev/null || true
}
trap cleanup EXIT
trap 'echo "Interrupted"; exit 130' INT
trap 'echo "Terminated"; exit 143' TERM
This works because the INT and TERM handlers call exit, which fires EXIT, which calls cleanup. Two layers, one cleanup function.
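To watch the two layers cooperate, this sketch runs them in a child bash that sends itself SIGTERM in place of a supervisor. layered_demo and work are illustrative names.

```shell
#!/usr/bin/env bash
# Two-layer handling: the TERM trap picks the exit code, the EXIT trap
# does the cleanup. The child bash signals itself to simulate a supervisor.
layered_demo() {
  bash -s <<'EOF'
work=$(mktemp -d)
cleanup() {
  rm -rf -- "$work"
  echo "cleanup ran"
}
trap cleanup EXIT
trap 'echo "Interrupted"; exit 130' INT
trap 'echo "Terminated"; exit 143' TERM

kill -TERM "$$"   # stand-in for a supervisor stopping us
sleep 5           # not reached: the TERM trap exits first
EOF
}

rc=0
out=$(layered_demo) || rc=$?
printf '%s\n' "$out"
echo "exit status: $rc"
```

It should print Terminated followed by cleanup ran and finish with status 143: the TERM trap chooses the exit code, and the EXIT trap, which bash runs next, leaves that code unchanged.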
4. The canonical tempfile cleanup pattern
Every script that creates temp state should follow this template:
#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'
TMPDIR=$(mktemp -d -t myscript.XXXXXX)
trap 'rm -rf -- "$TMPDIR"' EXIT
# ... use $TMPDIR ...
echo "Working in $TMPDIR"
touch "$TMPDIR/working-file.txt"
# When the script exits, $TMPDIR is automatically cleaned up.
Key points:
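- mktemp -d creates a fresh directory with an unguessable name, in $TMPDIR if that is set and in /tmp otherwise. The -t myscript.XXXXXX is a template; mktemp replaces the X’s with random characters.
- rm -rf -- "$TMPDIR": the -- ends option parsing, so a path that begins with - can’t be mistaken for an option, and the quotes keep a path containing spaces in one piece.
- trap '...' EXIT runs the cleanup unconditionally on any exit path.
- A naming caveat: TMPDIR is also the standard environment variable that mktemp and many other tools read. If it came in from the environment, it stays exported after you reassign it, so every child process you start will create its own temp files inside your directory. Pick another name (say, WORKDIR) if that isn’t what you want.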
The template-style is preferable to ad-hoc trap calls scattered through the script. Set the trap immediately after creating the resource.
5. Multiple resources: handler stacking
If you have multiple resources to clean up, you have two options:
Option A: one cleanup function
TMPDIR=$(mktemp -d)
LOG_FILE=$(mktemp)
cleanup() {
rm -rf -- "$TMPDIR"
rm -f -- "$LOG_FILE"
}
trap cleanup EXIT
Clean. Easy to extend. Recommended for most scripts.
Option B: stack handlers via reassignment
If you want to add cleanup steps as resources are acquired, use this pattern:
add_cleanup() {
local cmd="$1"
CLEANUPS+=("$cmd")
trap 'for c in "${CLEANUPS[@]}"; do eval "$c"; done' EXIT
}
CLEANUPS=()
# Acquire and register
TMPDIR=$(mktemp -d)
add_cleanup "rm -rf -- '$TMPDIR'"
LOCK=$(mktemp)
add_cleanup "rm -f -- '$LOCK'"
PID=$(start-bg-task)
add_cleanup "kill '$PID' 2>/dev/null || true"
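Cleanups run in the order they were registered. If a later resource depends on an earlier one (say, a background task writing into $TMPDIR), you usually want the reverse, so the task stops before its directory disappears. Niche, but useful when you’ve got many resources acquired conditionally throughout a long script.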
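For last-in, first-out cleanup (the usual choice when later resources depend on earlier ones), the add_cleanup idea can be sketched like this. It registers the trap once, walks the list backwards, and builds each command with printf %q so a path survives eval even if it contains quotes. run_cleanups, workdir, and the heartbeat loop are illustrative stand-ins.

```shell
#!/usr/bin/env bash
# LIFO variant of add_cleanup: the last resource registered is the
# first one released when the script exits.
CLEANUPS=()

run_cleanups() {
  local i
  for (( i = ${#CLEANUPS[@]} - 1; i >= 0; i-- )); do
    eval "${CLEANUPS[i]}" || true   # keep going even if one step fails
  done
}
trap run_cleanups EXIT

add_cleanup() {
  CLEANUPS+=("$1")
}

# Usage: the directory is registered first, the background writer last,
# so the writer is stopped before its directory is removed.
workdir=$(mktemp -d)
add_cleanup "rm -rf -- $(printf '%q' "$workdir")"

( while :; do printf '%s\n' "$SECONDS" > "$workdir/heartbeat"; sleep 1; done ) &
add_cleanup "kill $! 2>/dev/null; wait $! 2>/dev/null"
```

Each registered command is safe to run again, so calling run_cleanups by hand before the EXIT trap fires does no harm.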
6. Lock files — the “only one instance running” pattern
If you’re writing a script that should not run concurrently with another instance of itself (cron jobs, daily backups, deployment scripts), you need a lock.
Naive approach (don’t)
LOCK=/tmp/myscript.lock
if [[ -f "$LOCK" ]]; then
echo "Already running" >&2
exit 1
fi
touch "$LOCK"
trap 'rm -f -- "$LOCK"' EXIT
This has a race condition: between the [[ -f ]] check and touch, another instance can do the same check, and you get two running instances. The test-and-create is not atomic.
Slightly better (mkdir as atomic)
LOCK=/tmp/myscript.lockdir
if ! mkdir "$LOCK" 2>/dev/null; then
echo "Already running" >&2
exit 1
fi
trap 'rmdir -- "$LOCK"' EXIT
mkdir is atomic (it either succeeds or fails with EEXIST). No race. But if your script crashes without cleanup, the lock dir is left behind and you’ll need to remove it manually next time.
The right way: flock(1)
LOCK=/var/lock/myscript.lock
exec 9>"$LOCK"
if ! flock -n 9; then
echo "Already running" >&2
exit 1
fi
# ... do work ...
# Lock is released automatically when fd 9 closes (i.e. on script exit)
flock uses kernel-level advisory locking on a file descriptor. The kernel releases the lock when the last descriptor referring to it is closed, and it closes a process’s descriptors when the process dies, however it dies: SIGKILL, the OOM killer, a reboot. There’s no stale lock to clean up (the file itself remains, but the lock on it is gone). One caveat: child processes inherit fd 9, so a background job that outlives the script keeps holding the lock until it exits too.
flock -n is non-blocking: returns immediately with status 1 if the lock is held. Without -n, it blocks until the lock becomes available. Use -n for “single instance, fail otherwise”; omit -n for “wait my turn.”
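Here is a sketch of two instances racing for the same lock, assuming the util-linux flock command (standard on Linux, not shipped with macOS) and a throwaway lock file from mktemp rather than /var/lock. try_lock and lock_demo are illustrative names.

```shell
#!/usr/bin/env bash
# Two "instances" competing for one flock(1) lock.
lockfile=$(mktemp)

try_lock() {
  # The subshell owns fd 9, so the lock is released when it (and any
  # child still holding fd 9) exits.
  (
    exec 9>"$lockfile"
    if ! flock -n 9; then
      echo "busy"
      exit 1
    fi
    echo "acquired"
    sleep "${1:-0}"     # hold the lock for $1 seconds
  )
}

lock_demo() {
  try_lock 3 &          # first instance: holds the lock for 3 seconds
  local holder=$!
  sleep 1
  try_lock || true      # second instance: fails fast while the first runs
  wait "$holder"
  try_lock || true      # the holder is gone, so the lock is free again
}

lock_demo
rm -f -- "$lockfile"
```

lock_demo should print acquired, busy, acquired: the second attempt fails fast while the first holds the lock, and the third succeeds once the holder has exited.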
flock self-locking idiom — handy for cron:
#!/usr/bin/env bash
exec 9>"/var/lock/$(basename "$0").lock"
flock -n 9 || { echo "Already running"; exit 1; }
# ... work ...
A tidier variant has the script wrap itself in flock:
[[ "${LOCKED:-}" ]] || exec env LOCKED=1 flock -en /var/lock/myscript.lock "$0" "$@"
A bit cryptic but effective: if the script is invoked without LOCKED=1, it re-execs itself under flock, which takes the lock and runs the script again with LOCKED=1 set, so the inner invocation skips the re-exec. If another instance already holds the lock, flock -n exits with status 1 and the script body never runs.
7. Idempotent cleanup
Your cleanup function will sometimes run twice: for instance, when an INT or TERM handler calls cleanup and then exits, the exit fires the EXIT trap, which calls cleanup again. Make cleanup idempotent: safe to call repeatedly with no error.
cleanup() {
if [[ -d "${TMPDIR:-}" ]]; then
rm -rf -- "$TMPDIR"
TMPDIR=""
fi
if [[ -n "${BG_PID:-}" ]] && kill -0 "$BG_PID" 2>/dev/null; then
kill -TERM "$BG_PID" || true
wait "$BG_PID" 2>/dev/null || true
BG_PID=""
fi
}
Patterns:
- Check existence before deleting ([[ -d ]], kill -0).
- Clear the variable after using it (TMPDIR="") so a second call is a no-op.
- Add || true after operations that might “fail” because the resource is already gone.
Idempotent cleanup also matters for cleanup being called from both EXIT and a manual call — e.g., a script that wants to do cleanup before re-execing itself.
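To convince yourself the function really is idempotent, this sketch calls it twice under strict mode and lets the EXIT trap call it a third time. TMPDIR is renamed WORKDIR here so the demo doesn’t clobber the environment variable mktemp reads; otherwise the function matches the one above.

```shell
#!/usr/bin/env bash
# The idempotent cleanup() from above (TMPDIR renamed to WORKDIR),
# called twice under strict mode: the second call is a silent no-op.
set -Eeuo pipefail

cleanup() {
  if [[ -d "${WORKDIR:-}" ]]; then
    rm -rf -- "$WORKDIR"
    WORKDIR=""
  fi
  if [[ -n "${BG_PID:-}" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" || true
    wait "$BG_PID" 2>/dev/null || true
    BG_PID=""
  fi
}
trap cleanup EXIT

WORKDIR=$(mktemp -d)
sleep 60 &
BG_PID=$!
created="$WORKDIR"

cleanup   # first call: removes the directory and stops the job
cleanup   # second call: nothing left to do, and no error under set -e
if [[ ! -e "$created" && -z "$BG_PID" ]]; then
  echo "second cleanup was a harmless no-op"
fi
# The EXIT trap runs cleanup a third time when the script ends; still fine.
```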
8. The production template
Adopt this as the boilerplate for every non-trivial script:
#!/usr/bin/env bash
# myscript.sh — short description of what this does
set -Eeuo pipefail
IFS=$'\n\t'
# --- Logging ---
log() {
local IFS=' ' # the global IFS=$'\n\t' would otherwise join "${*:2}" with newlines
printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}
die() {
local IFS=' '
log error "$*"
exit 1
}
# --- Cleanup ---
TMPDIR=""
BG_PIDS=()
cleanup() {
local rc=$?
log debug "Cleaning up (exit code ${rc})"
for pid in "${BG_PIDS[@]}"; do
if kill -0 "$pid" 2>/dev/null; then
kill -TERM "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
fi
done
if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
rm -rf -- "$TMPDIR"
fi
exit "$rc"
}
on_error() {
local lineno="$1"
local code="$2"
log error "Failure at line ${lineno} (exit ${code})"
}
trap cleanup EXIT
trap 'on_error "$LINENO" "$?"' ERR
trap 'log warn "Interrupted"; exit 130' INT
trap 'log warn "Terminated"; exit 143' TERM
# --- Main ---
main() {
TMPDIR=$(mktemp -d -t "$(basename "$0").XXXXXX")
log info "Working in ${TMPDIR}"
# ... actual work ...
log info "Done"
}
main "$@"
This template:
- Strict-mode preamble with -E for ERR-trap inheritance into functions.
- A log function with timestamps that writes to stderr.
- A die helper that logs and exits.
- A cleanup function that:
  - Captures the exit code at entry.
  - Kills any tracked background PIDs.
  - Removes the tempdir.
  - Exits with the original code.
- ERR trap that logs the failing line and exit code.
- INT and TERM traps that log and exit with the signal-derived code.
- All inside a main function called at the end.
Use this as your starting point. Cut what you don’t need, keep what you do.
9. Common signal-handling pitfalls
Forgetting set -E
Without set -E, ERR traps are not inherited by shell functions. So this:
set -e
trap 'echo "ERR at $LINENO"' ERR
myfunc() {
false # ERR will NOT fire here without set -E
}
myfunc
won’t fire the trap. Add -E:
set -Eeuo pipefail
trap 'echo "ERR at $LINENO"' ERR
Or use set -o errtrace (same thing, longer name).
Single-quote vs double-quote in trap
trap "echo $LINENO" ERR # WRONG — substitutes LINENO when trap is SET, not when it FIRES
trap 'echo $LINENO' ERR # CORRECT — substitutes at trap-time
Always single-quote the trap command unless you have a very specific reason to expand at registration time.
Trap for INT but not exiting
If your INT handler doesn’t exit, your script keeps running after Ctrl+C:
trap 'echo "ignoring Ctrl+C"' INT
sleep 60 # Ctrl+C kills sleep and prints the message; the script then carries on with the next command
This can be deliberate (long-running scripts that should not be Ctrl+C-able) but is more often a bug. If your INT handler should terminate, end it with exit 130.
Background processes don’t inherit traps
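Subshells and background jobs don’t keep your handlers: bash resets caught traps to their defaults in a subshell. In a non-interactive script, background jobs also start with SIGINT and SIGQUIT ignored, so Ctrl+C doesn’t stop them either: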
trap 'echo caught' INT
(sleep 60) & # the subshell doesn't inherit your INT trap
Set traps inside the subshell if you need them. Or use ( trap '...' INT; sleep 60 ).
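A quick way to see what a subshell keeps and loses: this sketch sends SIGUSR1 to the parent, to a plain subshell, and to a subshell that installs its own trap. subshell_trap_demo is an illustrative name; it uses $BASHPID (bash 4+) because $$ still holds the parent’s PID inside a subshell.

```shell
#!/usr/bin/env bash
# Traps and subshells, shown with SIGUSR1 sent by each process to itself.
subshell_trap_demo() {
  local rc=0
  trap 'echo "parent caught USR1"' USR1
  kill -USR1 "$BASHPID"                  # handled by the trap above

  ( kill -USR1 "$BASHPID"                # trap not inherited: default action
    echo "plain subshell survived" ) || rc=$?
  echo "plain subshell exit status: $rc"

  ( trap 'echo "subshell caught USR1"' USR1   # set inside the subshell
    kill -USR1 "$BASHPID"
    echo "subshell with its own trap survived" )

  trap - USR1
}

subshell_trap_demo
```

The plain subshell never reaches its echo; on Linux its status is 138 (128 + 10, SIGUSR1’s number there).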
Forgetting cleanup runs even on success
Your cleanup runs on every exit path, including normal successful exit. Make sure your cleanup is OK with running after success:
cleanup() {
echo "Failure!" # WRONG — also fires on success
rm -f "$TMPFILE"
}
Use the exit-code variable:
cleanup() {
local rc=$?
if (( rc != 0 )); then
echo "Failed with exit ${rc}" >&2
fi
rm -f "$TMPFILE"
exit "$rc"
}
10. Sending signals to children
If your script spawned background work, your signal handlers should propagate signals to children:
BG_PID=""
cleanup() {
if [[ -n "$BG_PID" ]] && kill -0 "$BG_PID" 2>/dev/null; then
kill -TERM "$BG_PID"
wait "$BG_PID" 2>/dev/null || true
fi
}
trap cleanup EXIT
start-long-task &
BG_PID=$!
# main work...
For a whole process group (the script and all its descendants):
trap 'kill -- -$$' EXIT # send SIGTERM to the entire process group
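-$$ (the negative of the script’s own PID) targets the process group whose ID is $$. That group only exists when the script leads its own process group, which is true when you start it from an interactive shell with job control but not necessarily under cron or another script. It’s also heavy-handed: the script is in that group too, so it signals itself, but for “everything stops now” it works.
For a more targeted approach, use pkill -TERM -P $$ (signal every process whose parent is this script, i.e. its direct children).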
11. Interaction with set -e
set -e and traps interact in subtle ways:
- ERR traps fire before set -e exits the shell.
- EXIT traps fire after set -e triggers, as part of the exit process.
- An ERR trap that finishes with status 0 does not prevent set -e from firing: set -e acts on the original command’s status, not the trap’s.
The mental model: set -e triggers an exit; the exit triggers EXIT (and ERR fired earlier). Your traps should not try to “rescue” set -e-induced exits — instead, log diagnostics and let the exit happen.
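The ordering is easy to observe. This sketch (order_demo is an illustrative name) runs a set -Ee script in a child bash whose ERR trap deliberately ends with a successful command, then prints what each trap saw.

```shell
#!/usr/bin/env bash
# Observe the order of ERR and EXIT when set -e aborts a script, and check
# that the failed command's status survives both traps.
order_demo() {
  bash -s <<'EOF'
set -Ee
trap 'echo "ERR fired (status $?)"; true' ERR   # ends with success on purpose
trap 'echo "EXIT fired (status $?)"' EXIT
echo "before"
false
echo "after"   # never reached
EOF
}

rc=0
out=$(order_demo) || rc=$?
printf '%s\n' "$out"
echo "script exit status: $rc"
```

Expect before, then the ERR line, then the EXIT line, each reporting status 1, and an overall exit status of 1: the ERR trap’s own success rescues nothing.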
12. Real example: a robust deployment runner
#!/usr/bin/env bash
# run-deployment.sh — robust deployment with locking, cleanup, and signal handling
set -Eeuo pipefail
IFS=$'\n\t'
readonly SCRIPT_NAME=$(basename "$0")
readonly LOCK_FILE="/var/lock/${SCRIPT_NAME}.lock"
log() {
local IFS=' ' # the global IFS=$'\n\t' would otherwise join "${*:2}" with newlines
printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}
die() { local IFS=' '; log error "$*"; exit 1; }
# --- Self-lock via flock ---
if [[ -z "${LOCKED:-}" ]]; then
# exec only returns if env/flock can't be started; on a busy lock, flock -n exits 1 by itself
exec env LOCKED=1 flock -n "$LOCK_FILE" "$0" "$@"
die "Could not re-exec under flock"
fi
# --- Cleanup state ---
TMPDIR=""
BG_PIDS=()
cleanup() {
local rc=$?
log info "Cleanup (exit ${rc})"
for pid in "${BG_PIDS[@]}"; do
if kill -0 "$pid" 2>/dev/null; then
log info "Stopping bg process $pid"
kill -TERM "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
fi
done
if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
log info "Removing $TMPDIR"
rm -rf -- "$TMPDIR"
fi
exit "$rc"
}
on_err() {
local lineno="$1"
local code="$2"
log error "Failure at line ${lineno} (exit ${code})"
}
trap cleanup EXIT
trap 'on_err "$LINENO" "$?"' ERR
trap 'log warn "Caught SIGINT"; exit 130' INT
trap 'log warn "Caught SIGTERM"; exit 143' TERM
# --- Main ---
main() {
log info "Deployment starting (PID $$)"
TMPDIR=$(mktemp -d -t "${SCRIPT_NAME}.XXXXXX")
# 1. Pull the latest artifacts
log info "Pulling artifacts"
curl -fsS https://artifacts.example.com/latest.tgz -o "${TMPDIR}/artifact.tgz"
tar -xzf "${TMPDIR}/artifact.tgz" -C "${TMPDIR}"
# 2. Background a health-watch
( while true; do
curl -fsS https://api.example.com/healthz >/dev/null 2>&1 || break
sleep 5
done
log warn "Health check broke!" ) &
BG_PIDS+=($!)
# 3. Run the deploy
log info "Running deploy script"
"${TMPDIR}/deploy.sh"
log info "Deployment OK"
}
main "$@"
Things to notice:
- Self-lock via flock: the script re-execs itself under flock if not yet locked. Only one instance runs.
- Strict mode + -E for ERR-into-functions.
- cleanup handles the tempdir AND background processes, idempotently.
- ERR trap logs the failing line.
- INT / TERM traps log and exit with the right codes (130/143).
- All actual work is inside main.
This is the production template. Use it as your default.
13. What you must internalise before lesson 11
- What’s the difference between SIGTERM and SIGKILL? (TERM is catchable; KILL is not. SIGKILL gives the process no chance to clean up.)
- What’s the EXIT pseudo-signal in bash? (Fires whenever the shell exits, unless it is killed outright by SIGKILL. Use it for cleanup that must always run.)
- What’s the ERR pseudo-signal? (Fires whenever a command exits non-zero, subject to set -e suppression rules.)
- What does set -E do? (Makes the ERR trap inherited by functions, command substitutions, and subshells.)
- Why use single quotes around trap commands? (To defer parameter expansion until trap-time, not registration-time.)
- What’s the canonical tempfile cleanup pattern? (TMPDIR=$(mktemp -d); trap 'rm -rf -- "$TMPDIR"' EXIT.)
- Why is the naive check-then-create lock ([[ -f $LOCK ]], then touch $LOCK) racy? (The check-and-create is not atomic; two instances can pass the check simultaneously.)
- What’s the right lock-file primitive? (flock on a file descriptor: a kernel-level advisory lock that’s released automatically on process exit.)
- What’s the standard exit code when killed by SIGINT? (130 = 128 + 2.)
- Why must cleanup be idempotent? (It may run more than once, e.g. from an INT or TERM handler and then again from the EXIT trap.)
If any felt fuzzy, re-read. Lesson 11 (globbing, regex, find, grep, sed) is where we go from process-and-error discipline back into the data-manipulation toolkit at scale.
What’s next
Lesson 11 covers globbing in depth (nullglob, dotglob, globstar, extended globs), regex semantics (BRE vs ERE vs PCRE), the find command from beginner to advanced (filtering, actions, -print0), grep mastery (Perl regex, multiline, context flags), and sed for in-place editing. Bring everything from lessons 1-10.