Signal Handling: trap, EXIT/ERR/INT/TERM, Idempotent Cleanup & Lock-File Discipline — Writing Scripts That Don't Leave a Mess Behind

Every long-running shell script will eventually be killed mid-execution. Ctrl+C from a tired engineer. A Kubernetes pod eviction. A systemd timeout. The OOM killer. A laptop closed at the wrong moment. Servers reboot, networks die, and shell scripts that don’t plan for this leave behind a mess: half-written files, stale lock files that will never be released, child processes still running with stale state, temporary directories left on the filesystem forever.

The difference between a script that handles this gracefully and one that doesn’t is one shell idiom: trap. Every production-grade shell script you’ll ever write should set a trap EXIT handler at the top to clean up after itself, and most of them should also handle INT (Ctrl+C) and TERM (graceful shutdown) explicitly.

This lesson covers the signal model in just enough detail to write good handlers, the trap builtin, the canonical patterns for tempfile cleanup and lock files, and the production-grade signal-aware template you should adopt for all your scripts.

1. Signals in 90 seconds

A signal is a kernel-delivered interrupt to a process. The signaled process either has a handler installed (and that handler runs) or uses the default action for the signal (often: terminate). Signals are tiny — they carry no data, just an integer ID.

Signal numbers on Linux (x86 and ARM; several differ on macOS/BSD):

Signal	Number	Default action	When it’s sent
SIGHUP	1	terminate	controlling terminal hangs up (e.g. SSH disconnect)
SIGINT	2	terminate	user pressed Ctrl+C
SIGQUIT	3	terminate + core dump	user pressed Ctrl+\
SIGKILL	9	terminate (uncatchable)	`kill -9` — forced kill
SIGTERM	15	terminate	`kill` default — polite “please stop”
SIGSTOP	19	stop (uncatchable)	`kill -STOP`
SIGTSTP	20	stop	user pressed Ctrl+Z
SIGCONT	18	continue	`kill -CONT` — resume a stopped process
SIGUSR1	10	terminate	user-defined
SIGUSR2	12	terminate	user-defined
SIGCHLD	17	ignore	a child process changed state
SIGPIPE	13	terminate	wrote to a closed pipe

Two signals you cannot catch: SIGKILL (9) and SIGSTOP (19). The kernel handles them itself and the process gets no chance to react. Everything else can be caught (or ignored).

When a process is terminated by signal N, the shell reports its exit status as 128 + N:

kill -INT $PID → process exits with 130 (= 128 + 2)
kill -TERM $PID → 143
kill -KILL $PID → 137

We saw this in lesson 8 (SIGPIPE → 141).
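
A quick way to see the 128 + N rule in action, as a minimal sketch: start a throwaway background sleep, send it SIGTERM, and read the status that wait reports. The variable names are illustrative only.

```shell
# Start a disposable background process and terminate it with SIGTERM.
sleep 30 &
pid=$!
kill -TERM "$pid"

# wait reports how the child ended; "|| status=$?" keeps a set -e script alive.
status=0
wait "$pid" || status=$?

echo "wait reported: $status"                 # 143 = 128 + 15
echo "signal name:   $(kill -l "$status")"    # kill -l maps the status back to a name
```

Passing an exit status to kill -l is a handy way to decode a mysterious status such as 137 or 143 found in a log.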

You send a signal with kill -SIG PID:

kill -INT 12345
kill -TERM 12345
kill -KILL 12345     # the brutal one
kill 12345           # SIGTERM by default
kill -USR1 12345     # user-defined; useful for "reload config"

Signal names can be given without the SIG prefix (kill -INT = kill -SIGINT).

2. `trap` — the only signal-handling primitive in shell

trap COMMANDS SIGNAL [SIGNAL ...] registers COMMANDS to run when any of the named signals are received.

trap 'echo "Caught signal!"' INT TERM
sleep 60
# Press Ctrl+C: sleep is interrupted, "Caught signal!" is printed, and the script runs on to its end

A few things to understand:

COMMANDS is a string. Bash parses and runs it when the signal arrives.
Quote with single quotes to defer parameter expansion until handler-time. trap "echo $LINENO" ERR would substitute $LINENO now (when the trap is set); trap 'echo $LINENO' ERR substitutes it at trap-time.
You can register handlers for multiple signals at once.
Calling trap again replaces the previous handler for those signals.
trap - SIGNAL resets the signal to its default action.
trap '' SIGNAL ignores the signal entirely (the empty string means “do nothing, but don’t fall through to default”).

trap 'echo "Hi"' INT
trap                                # list registered traps
trap - INT                          # remove the trap
trap '' INT                         # ignore SIGINT entirely (Ctrl+C does nothing)

3. The four signals you’ll handle 95% of the time

`EXIT` — pseudo-signal for “the script is exiting”

The most useful signal in bash is one that doesn’t exist at the kernel level: EXIT. Bash fires it whenever the shell exits, for almost any reason: normal completion, an exit N call, a fatal error from set -e, or termination by a catchable signal such as INT or TERM. The exception is SIGKILL, where the shell never gets to run anything. Use EXIT for cleanup that must happen no matter what.

TMPDIR=$(mktemp -d)
trap 'rm -rf -- "$TMPDIR"' EXIT

# ... use $TMPDIR ...
# When the script exits, the trap runs and cleans up

This is the canonical tempfile-cleanup pattern. It’s bulletproof:

Normal exit: handler runs.
exit 1 after an error: handler runs.
set -e triggers on a failed command: handler runs.
User pressed Ctrl+C: handler runs (because Ctrl+C kills the script, which triggers EXIT).
Power cable yanked: well, no handler can save you from that. But for everything that lets bash exit cleanly, EXIT is your friend.

You should put a trap '...' EXIT at the top of nearly every script that creates temporary state.

`ERR` — pseudo-signal for “a command failed”

ERR is fired whenever a command exits non-zero (subject to the same suppression rules as set -e: not for commands tested by if, while, or until, not for any command in an && or || list other than the last, and not when negated with !). Useful for error logging:

on_error() {
  local lineno="$1"
  local code="$2"
  echo "Error at line $lineno (exit $code)" >&2
}

trap 'on_error "$LINENO" "$?"' ERR

$LINENO inside the trap holds the line of the failing command. $? holds the exit code. This gives you a poor man’s stack trace when scripts fail.

With set -e in effect, a failure fires both traps: ERR first, then EXIT as the shell exits. Without set -e, ERR fires and the script simply carries on.

For the ERR trap to apply inside functions, use set -E (also known as set -o errtrace). Without it, the ERR trap is not inherited by shell functions, command substitutions, or subshells:

set -Eeuo pipefail

This is the right strict-mode preamble for any script with non-trivial functions.
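
To make the suppression rules and the ERR-then-EXIT ordering concrete, here is a small sketch run in a child bash so it cannot disturb the calling shell. The function name fails and the messages are invented for the demonstration.

```shell
trace_status=0
trace=$(bash -c '
  set -Eeuo pipefail
  trap "echo ERR" ERR
  trap "echo EXIT" EXIT

  if false; then :; fi       # tested by if: no ERR, no exit
  false || echo "handled"    # left side of ||: no ERR, no exit

  fails() { false; }         # with -E the ERR trap also applies in here
  fails
  echo "not reached"
') || trace_status=$?

printf '%s\n' "$trace"       # prints: handled, ERR, EXIT (one per line)
echo "child exit status: $trace_status"
```

Dropping the E from the set line should make the ERR line disappear while EXIT still fires, which is the failure mode the strict-mode preamble guards against.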

`INT` — Ctrl+C

on_interrupt() {
  echo "Interrupted by user" >&2
  cleanup
  exit 130
}

trap on_interrupt INT

130 is the standard “I exited because of SIGINT” exit code (128 + 2). Use it consistently so your callers can distinguish “user interrupted” from “real failure.”

`TERM` — polite shutdown

Systemd, Kubernetes, Docker, and most supervisors send SIGTERM first. They wait for a grace period (by default 10 seconds for docker stop, 30 for a Kubernetes pod, 90 for a systemd unit), then send SIGKILL.

on_term() {
  echo "Got SIGTERM, shutting down gracefully" >&2
  cleanup
  exit 143
}

trap on_term TERM

If your script is doing something that needs to finish cleanly — write a state file, close a connection, release a lock — your TERM handler is the place. Use the grace period.
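
One detail decides whether the grace period is usable at all: bash does not run a trap while it is waiting for a foreground command, only after that command finishes. A common workaround, sketched here with made-up names, is to run long work in the background and block in wait, which returns as soon as a trapped signal arrives. The subshell that sends SIGTERM stands in for a supervisor.

```shell
report=$(bash -c '
  got_term=0
  trap "got_term=1" TERM

  sleep 30 &                        # stand-in for long-running work
  worker=$!
  ( sleep 1; kill -TERM $$ ) &      # stand-in for a supervisor sending SIGTERM

  wait_status=0
  wait "$worker" || wait_status=$?  # returns early with 128 + 15 when TERM arrives

  kill "$worker" 2>/dev/null        # stop the work now that shutdown was requested
  echo "got_term=$got_term wait_status=$wait_status"
')
echo "$report"                      # got_term=1 wait_status=143
```

Had the sleep run in the foreground, the handler would only have run after the full 30 seconds, quite possibly after the supervisor had already sent SIGKILL.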

Combined handler

A common idiom: same handler for INT and TERM:

on_signal() {
  local sig="$1"
  echo "Received SIG${sig}, shutting down" >&2
  cleanup
  case "$sig" in
    INT)  exit 130 ;;
    TERM) exit 143 ;;
    *)    exit 1 ;;
  esac
}

trap 'on_signal INT' INT
trap 'on_signal TERM' TERM

Or, more cleanly, with a separate cleanup and a trap EXIT that handles all paths:

cleanup() {
  rm -rf -- "$TMPDIR" 2>/dev/null || true
  kill "$BG_PID" 2>/dev/null || true
}

trap cleanup EXIT
trap 'echo "Interrupted"; exit 130' INT
trap 'echo "Terminated"; exit 143' TERM

This works because INT and TERM cause exit, which fires EXIT, which calls cleanup. Two layers, one cleanup function.
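
The two-layer flow can be observed directly. The following sketch runs the pattern in a child bash and has the child signal itself; the echo in cleanup is a placeholder for real cleanup work.

```shell
status=0
output=$(bash -c '
  cleanup() { echo "cleanup ran"; }
  trap cleanup EXIT
  trap "echo Terminated; exit 143" TERM

  kill -TERM $$        # simulate a supervisor stopping the script
  echo "not reached"
') || status=$?

printf '%s\n' "$output"      # Terminated, then cleanup ran
echo "exit status: $status"  # 143
```

The TERM handler only decides the exit code; the cleanup itself lives in one place and runs once.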

4. The canonical tempfile cleanup pattern

Every script that creates temp state should follow this template:

#!/usr/bin/env bash
set -Eeuo pipefail
IFS=$'\n\t'

TMPDIR=$(mktemp -d -t myscript.XXXXXX)
trap 'rm -rf -- "$TMPDIR"' EXIT

# ... use $TMPDIR ...
echo "Working in $TMPDIR"
touch "$TMPDIR/working-file.txt"

# When the script exits, $TMPDIR is automatically cleaned up.

Key points:

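mktemp -d creates a fresh directory under $TMPDIR (or /tmp if that is unset) with an unguessable name. The -t myscript.XXXXXX is a template; mktemp replaces the X’s with random characters. Note that TMPDIR is also the environment variable mktemp and many other tools consult, so if it is exported, assigning to it redirects the temporary files of child processes; a distinct name such as WORKDIR sidesteps that.
rm -rf -- "$TMPDIR": the -- ends option parsing, a defensive habit in case the variable ever holds a name beginning with -. The quotes handle paths with spaces.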
trap '...' EXIT runs the cleanup unconditionally on any exit path.

The template-style is preferable to ad-hoc trap calls scattered through the script. Set the trap immediately after creating the resource.

5. Multiple resources: handler stacking

If you have multiple resources to clean up, you have two options:

Option A: one cleanup function

TMPDIR=$(mktemp -d)
LOG_FILE=$(mktemp)

cleanup() {
  rm -rf -- "$TMPDIR"
  rm -f -- "$LOG_FILE"
}

trap cleanup EXIT

Clean. Easy to extend. Recommended for most scripts.

Option B: stack handlers via reassignment

If you want to add cleanup steps as resources are acquired, use this pattern:

add_cleanup() {
  local cmd="$1"
  CLEANUPS+=("$cmd")
  trap 'for c in "${CLEANUPS[@]}"; do eval "$c"; done' EXIT
}

CLEANUPS=()

# Acquire and register
TMPDIR=$(mktemp -d)
add_cleanup "rm -rf -- '$TMPDIR'"

LOCK=$(mktemp)
add_cleanup "rm -f -- '$LOCK'"

PID=$(start-bg-task)
add_cleanup "kill '$PID' 2>/dev/null || true"

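Handlers run in the order they were registered; iterate the array in reverse if a later resource depends on an earlier one. Niche, but useful when you’ve got many resources acquired conditionally throughout a long script.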

6. Lock files — the “only one instance running” pattern

If you’re writing a script that should not run concurrently with another instance of itself (cron jobs, daily backups, deployment scripts), you need a lock.

Naive approach (don’t)

LOCK=/tmp/myscript.lock

if [[ -f "$LOCK" ]]; then
  echo "Already running" >&2
  exit 1
fi
touch "$LOCK"
trap 'rm -f -- "$LOCK"' EXIT

This has a race condition: between the [[ -f ]] check and touch, another instance can do the same check, and you get two running instances. The test-and-create is not atomic.

Slightly better (mkdir as atomic)

LOCK=/tmp/myscript.lockdir

if ! mkdir "$LOCK" 2>/dev/null; then
  echo "Already running" >&2
  exit 1
fi
trap 'rmdir -- "$LOCK"' EXIT

mkdir is atomic (it either succeeds or fails with EEXIST). No race. But if your script crashes without cleanup, the lock dir is left behind and you’ll need to remove it manually next time.
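
The atomicity claim is easy to test: race several contenders for the same lock directory and count the winners. This is a sketch; the paths and the number of contenders are arbitrary.

```shell
base=$(mktemp -d)
lockdir="$base/lock"

# Launch several contenders at once; each records a line only if its mkdir won.
for i in 1 2 3 4 5; do
  ( mkdir "$lockdir" 2>/dev/null && echo "contender $i" >> "$base/winners" ) &
done
wait

winners=$(wc -l < "$base/winners" | tr -d ' ')
echo "winners: $winners"     # exactly one mkdir succeeds
rm -rf -- "$base"
```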

The right way: `flock(1)`

LOCK=/var/lock/myscript.lock
exec 9>"$LOCK"

if ! flock -n 9; then
  echo "Already running" >&2
  exit 1
fi

# ... do work ...
# Lock is released automatically when fd 9 closes (i.e. on script exit)

flock uses kernel-level advisory locking on a file descriptor. The lock belongs to the open file behind that descriptor and is released when the last descriptor referring to it is closed, which happens when the process exits, no matter how: SIGKILL, OOM, power loss. One caveat: background children inherit fd 9 and keep the lock alive until they exit too. There’s no leftover lock to clean up (the file remains, but the lock on it is gone the moment the last holder is gone).

flock -n is non-blocking: returns immediately with status 1 if the lock is held. Without -n, it blocks until the lock becomes available. Use -n for “single instance, fail otherwise”; omit -n for “wait my turn.”
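
The behaviour of flock -n while a lock is held, and its automatic release when the holder exits, can be sketched as follows. The example assumes the flock utility from util-linux is installed; the temporary lock file, the 0.5 second pause, and the variable names are arbitrary choices for the demonstration.

```shell
lock=$(mktemp)

# Holder: take the lock on fd 9 and keep it for two seconds.
( exec 9>"$lock"; flock -n 9 && sleep 2 ) &
sleep 0.5    # crude: give the holder time to acquire the lock

# A second attempt while the lock is held fails immediately.
if ( exec 9>"$lock"; flock -n 9 ); then during=acquired; else during=busy; fi

wait         # holder exits, its descriptors close, the lock is released

if ( exec 9>"$lock"; flock -n 9 ); then after=acquired; else after=busy; fi
echo "while held: $during, after holder exit: $after"
rm -f -- "$lock"
```

No cleanup code released the lock here; closing the descriptor on exit was enough.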

flock self-locking idiom — handy for cron:

#!/usr/bin/env bash
exec 9>"/var/lock/$(basename "$0").lock"
flock -n 9 || { echo "Already running"; exit 1; }

# ... work ...

A more compact variant has the script wrap itself in flock:

[[ "${LOCKED:-}" ]] || exec env LOCKED=1 flock -en /var/lock/myscript.lock "$0" "$@"

A bit cryptic but extremely effective: if the script is invoked without LOCKED=1, it re-execs itself under flock, which prevents two instances from running. Once flock is held, LOCKED=1 is set, so the inner invocation skips the re-exec.

7. Idempotent cleanup

Your cleanup function will sometimes run twice. For instance, a signal handler that calls cleanup and then exit runs it once directly and once more from the EXIT trap. Make cleanup idempotent: safe to call repeatedly with no error.

cleanup() {
  if [[ -d "${TMPDIR:-}" ]]; then
    rm -rf -- "$TMPDIR"
    TMPDIR=""
  fi
  if [[ -n "${BG_PID:-}" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID" || true
    wait "$BG_PID" 2>/dev/null || true
    BG_PID=""
  fi
}

Patterns:

Check existence before deleting ([[ -d ]], kill -0).
Clear the variable after using it (TMPDIR="") so a second call is a no-op.
|| true after operations that might “fail” because the resource is already gone.

Idempotent cleanup also matters for cleanup being called from both EXIT and a manual call — e.g., a script that wants to do cleanup before re-execing itself.
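
A complementary way to make a second call harmless, offered here as an assumption rather than as part of the pattern above, is a run-once guard flag. The sketch below reproduces the double call (a TERM handler calls cleanup, then exit fires EXIT) in a child bash; cleanup_done and the message are hypothetical names.

```shell
status=0
output=$(bash -c '
  cleanup_done=0
  cleanup() {
    (( cleanup_done )) && return 0   # second and later calls are no-ops
    cleanup_done=1
    echo "releasing resources"
  }
  trap cleanup EXIT
  trap "cleanup; exit 143" TERM      # handler calls cleanup, then exit fires EXIT too

  kill -TERM $$
  echo "not reached"
') || status=$?

printf '%s\n' "$output"              # releasing resources (printed once)
echo "exit status: $status"          # 143
```

The guard complements the per-resource checks: those make each step safe to repeat, the flag avoids repeating the steps at all.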

8. The production template

Adopt this as the boilerplate for every non-trivial script:

#!/usr/bin/env bash
# myscript.sh — short description of what this does
set -Eeuo pipefail
IFS=$'\n\t'

# --- Logging ---

log() {
  printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}

die() {
  log error "$*"
  exit 1
}

# --- Cleanup ---

TMPDIR=""
BG_PIDS=()

cleanup() {
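  local rc=$? pid
  log debug "Cleaning up (exit code ${rc})"

  # ${BG_PIDS[@]+...} expands to nothing for an empty array; a bare
  # "${BG_PIDS[@]}" is an unbound-variable error under set -u before bash 4.4.
  for pid in ${BG_PIDS[@]+"${BG_PIDS[@]}"}; do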
    if kill -0 "$pid" 2>/dev/null; then
      kill -TERM "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
    fi
  done

  if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
    rm -rf -- "$TMPDIR"
  fi

  exit "$rc"
}

on_error() {
  local lineno="$1"
  local code="$2"
  log error "Failure at line ${lineno} (exit ${code})"
}

trap cleanup EXIT
trap 'on_error "$LINENO" "$?"' ERR
trap 'log warn "Interrupted"; exit 130' INT
trap 'log warn "Terminated"; exit 143' TERM

# --- Main ---

main() {
  TMPDIR=$(mktemp -d -t "$(basename "$0").XXXXXX")
  log info "Working in ${TMPDIR}"

  # ... actual work ...

  log info "Done"
}

main "$@"

This template:

Strict-mode preamble with -E for ERR-trap inheritance into functions.
A log function with timestamps that writes to stderr.
A die helper that logs and exits.
A cleanup function that:
- Captures the exit code at entry.
- Kills any tracked background PIDs.
- Removes the tempdir.
- Exits with the original code.
ERR trap that logs the failing line and exit code.
INT and TERM traps that log and exit with the signal-derived code.
All inside a main function called at the end.

Use this as your starting point. Cut what you don’t need, keep what you do.

9. Common signal-handling pitfalls

Forgetting `set -E`

Without set -E, ERR traps are not inherited by shell functions. So this:

set -e
trap 'echo "ERR at $LINENO"' ERR

myfunc() {
  false        # ERR will NOT fire here without set -E
}

myfunc

won’t fire the trap. Add -E:

set -Eeuo pipefail
trap 'echo "ERR at $LINENO"' ERR

Or use set -o errtrace (same thing, longer name).

Single-quote vs double-quote in trap

trap "echo $LINENO" ERR        # WRONG — substitutes LINENO when trap is SET, not when it FIRES
trap 'echo $LINENO' ERR        # CORRECT — substitutes at trap-time

Always single-quote the trap command unless you have a very specific reason to expand at registration time.
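
The difference in expansion time can be made visible with an ordinary variable instead of $LINENO. In this sketch, which uses USR1 and USR2 purely as convenient signals to send to oneself, the variable phase changes between registering the traps and delivering the signals.

```shell
output=$(bash <<'EOF'
phase="registration"
trap "echo \"double quotes saw: $phase\"" USR1   # $phase expands right now
trap 'echo "single quotes saw: $phase"' USR2     # $phase expands when the trap fires
phase="delivery"
kill -USR1 $$
kill -USR2 $$
:
EOF
)
printf '%s\n' "$output"
# double quotes saw: registration
# single quotes saw: delivery
```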

Trap for INT but not exiting

If your INT handler doesn’t exit, your script keeps running after Ctrl+C:

trap 'echo "ignoring Ctrl+C"' INT
sleep 60                       # Ctrl+C interrupts sleep and prints the message; the script then carries on

This can be deliberate (long-running scripts that should not be Ctrl+C-able) but is more often a bug. If your INT handler should terminate, end it with exit 130.

Background processes don’t inherit traps

Traps reset to default when you fork a subshell or background a process:

trap 'echo caught' INT

(sleep 60) &              # the subshell doesn't inherit your INT trap

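Set the trap inside the subshell if you need it there, for example ( trap '...' TERM; sleep 60 ) &. Note that a non-interactive script starts background commands with SIGINT ignored, so those ignore Ctrl+C by default.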
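
The reset can be observed with a signal the subshell sends to itself. This is a sketch assuming bash 4 or later (for $BASHPID); USR1 is used because its default action is to terminate the process.

```shell
output=$(bash <<'EOF'
trap 'echo "parent handler"' USR1

# Subshell without its own trap: USR1 takes its default action and kills it.
# (bash may report the signal death on stderr.)
( kill -USR1 "$BASHPID"; echo "survived" ) 2>/dev/null
echo "plain subshell status: $?"        # above 128, i.e. killed by the signal

# Subshell that installs its own trap handles the signal and carries on.
( trap 'echo "subshell handler"' USR1; kill -USR1 "$BASHPID"; echo "survived" )
echo "trapping subshell status: $?"
EOF
)
printf '%s\n' "$output"
```

The parent's handler never runs in either case, because the signal is delivered to the subshell, not to the parent.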

Forgetting cleanup runs even on success

Your cleanup runs on every exit path, including normal successful exit. Make sure your cleanup is OK with running after success:

cleanup() {
  echo "Failure!"           # WRONG — also fires on success
  rm -f "$TMPFILE"
}

Use the exit-code variable:

cleanup() {
  local rc=$?
  if (( rc != 0 )); then
    echo "Failed with exit ${rc}" >&2
  fi
  rm -f "$TMPFILE"
  exit "$rc"
}

10. Sending signals to children

If your script spawned background work, your signal handlers should propagate signals to children:

BG_PID=""

cleanup() {
  if [[ -n "$BG_PID" ]] && kill -0 "$BG_PID" 2>/dev/null; then
    kill -TERM "$BG_PID"
    wait "$BG_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT

start-long-task &
BG_PID=$!

# main work...

For a whole process group (the script and all its descendants):

trap 'kill -- -$$' EXIT     # send SIGTERM to the entire process group

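-$$ targets the process group whose ID equals the script’s PID. That group exists only when the script is a process-group leader (for example, when launched as a job from an interactive shell); when called from another script it usually is not, and the kill fails. It is also heavy-handed, since the script signals itself too, but for “everything stops now” it works.

For a more targeted approach, use pkill -TERM -P $$ (signal all direct children of this PID).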

11. Interaction with `set -e`

set -e and traps interact in subtle ways:

ERR traps fire before set -e exits the shell.
EXIT traps fire after set -e triggers, as part of the exit process.
An ERR trap that finishes with status zero does not prevent set -e from firing: set -e acts on the original command’s status.

The mental model: set -e triggers an exit; the exit triggers EXIT (and ERR fired earlier). Your traps should not try to “rescue” set -e-induced exits — instead, log diagnostics and let the exit happen.

12. Real example: a robust deployment runner

#!/usr/bin/env bash
# run-deployment.sh — robust deployment with locking, cleanup, and signal handling
set -Eeuo pipefail
IFS=$'\n\t'

readonly SCRIPT_NAME=$(basename "$0")
readonly LOCK_FILE="/var/lock/${SCRIPT_NAME}.lock"

log() {
  printf '[%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "${*:2}" >&2
}

die() { log error "$*"; exit 1; }

# --- Self-lock via flock ---
# exec replaces this shell with flock, so no line after it can run. If the lock
# is busy, flock -n exits with status 1 and the script body never starts.
if [[ -z "${LOCKED:-}" ]]; then
  exec env LOCKED=1 flock -n "$LOCK_FILE" "$0" "$@"
fi

# --- Cleanup state ---
TMPDIR=""
BG_PIDS=()

cleanup() {
  local rc=$? pid
  log info "Cleanup (exit ${rc})"

  # Guarded expansion: an empty array must not trip set -u on bash older than 4.4.
  for pid in ${BG_PIDS[@]+"${BG_PIDS[@]}"}; do
    if kill -0 "$pid" 2>/dev/null; then
      log info "Stopping bg process $pid"
      kill -TERM "$pid" 2>/dev/null || true
      wait "$pid" 2>/dev/null || true
    fi
  done

  if [[ -n "$TMPDIR" && -d "$TMPDIR" ]]; then
    log info "Removing $TMPDIR"
    rm -rf -- "$TMPDIR"
  fi

  exit "$rc"
}

on_err() {
  local lineno="$1"
  local code="$2"
  log error "Failure at line ${lineno} (exit ${code})"
}

trap cleanup EXIT
trap 'on_err "$LINENO" "$?"' ERR
trap 'log warn "Caught SIGINT"; exit 130' INT
trap 'log warn "Caught SIGTERM"; exit 143' TERM

# --- Main ---

main() {
  log info "Deployment starting (PID $$)"
  TMPDIR=$(mktemp -d -t "${SCRIPT_NAME}.XXXXXX")

  # 1. Pull the latest artifacts
  log info "Pulling artifacts"
  curl -fsS https://artifacts.example.com/latest.tgz -o "${TMPDIR}/artifact.tgz"
  tar -xzf "${TMPDIR}/artifact.tgz" -C "${TMPDIR}"

  # 2. Background a health-watch
  ( while true; do
      curl -fsS https://api.example.com/healthz >/dev/null 2>&1 || break
      sleep 5
    done
    log warn "Health check broke!" ) &
  BG_PIDS+=($!)

  # 3. Run the deploy
  log info "Running deploy script"
  "${TMPDIR}/deploy.sh"

  log info "Deployment OK"
}

main "$@"

Things to notice:

Self-lock via flock: the script re-execs itself under flock if not yet locked. Only one instance runs.
Strict-mode + -E for ERR-into-functions.
cleanup handles tempdir AND background processes, idempotently.
ERR trap logs the failing line.
INT / TERM traps log and exit with the right codes (130/143).
All actual work is inside main.

This is the production template. Use it as your default.

13. What you must internalise before lesson 11

What’s the difference between SIGTERM and SIGKILL? (TERM is catchable; KILL is not. SIGKILL gives the process no chance to clean up.)
What’s the EXIT pseudo-signal in bash? (Fires whenever the shell exits, for any reason. Use it for cleanup that must always run.)
What’s the ERR pseudo-signal? (Fires whenever a command exits non-zero, subject to set -e suppression rules.)
What does set -E do? (Makes ERR traps inherited by functions and subshells.)
Why use single quotes around trap commands? (To defer parameter expansion until trap-time, not registration-time.)
What’s the canonical tempfile cleanup pattern? (TMPDIR=$(mktemp -d); trap 'rm -rf -- "$TMPDIR"' EXIT.)
Why is naive [[ -f $LOCK ]] && touch $LOCK racy? (The check-and-create is not atomic; two instances can pass the check simultaneously.)
What’s the right lock-file primitive? (flock on a file descriptor — kernel-level advisory lock that’s released automatically on process exit.)
What’s the standard exit code when killed by SIGINT? (130 = 128 + 2.)
Why must cleanup be idempotent? (It may run twice, e.g. once from a signal handler that calls it directly and again from the EXIT trap.)

If any felt fuzzy, re-read. Lesson 11 (globbing, regex, find, grep, sed) is where we go from process-and-error discipline back into the data-manipulation toolkit at scale.

What’s next

Lesson 11 covers globbing in depth (nullglob, dotglob, globstar, extended globs), regex semantics (BRE vs ERE vs PCRE), the find command from beginner to advanced (filtering, actions, -print0), grep mastery (Perl regex, multiline, context flags), and sed for in-place editing. Bring everything from lessons 1-10.

Written by Vinod
