Shell Lesson 9 of 42

Process Management: Subshells, Command Groups, Jobs, fg/bg, wait, nohup & disown — Building Concurrent Shell Without the Foot-Cannons

The shell is not just a calculator that runs one command at a time. It is a process supervisor: it can run commands in the background, group them into subshells, wait for them to finish, send them signals, and let them outlive the controlling terminal. This is what makes “shell as orchestration language” possible — bash drives multi-step deployments, parallel data pipelines, daemon supervisors, and CI runners every minute of every day.

But process management in shell is also the most foot-cannon-rich part of the language. Mistakes here lead to zombie processes, orphaned children, scripts that hang on wait, jobs that mysteriously die when the SSH session drops, log lines from stages that never finished. Once you understand the process model — the same model from lesson 1, applied to the parallel and concurrent cases — these all make sense and stop biting you.

Read this lesson with top or htop open in another window. Run the examples. Watch the process tree change.


1. Subshells (...) vs command groups {...; }

Two ways to group commands. They look similar; they behave very differently.

Subshell: (commands)

Parentheses run the commands in a subshell — a forked child process with its own copy of all variables, working directory, FDs, and shell options. Anything you change inside the subshell is invisible to the parent.

NAME="Alice"
(
  NAME="Bob"
  cd /tmp
  echo "Inside: $NAME, $PWD"
)
echo "Outside: $NAME, $PWD"
# Inside: Bob, /tmp
# Outside: Alice, /your/original/dir

Subshells are excellent when you want temporary state changes that don’t pollute the parent:

# Run a command in a different directory without leaving cd's effect behind
( cd /etc && tar -czf /tmp/etc.tgz . )
# After this, $PWD is unchanged in the parent

# Set environment for a tool without permanently exporting
( export AWS_PROFILE=prod; aws s3 ls )
# AWS_PROFILE is gone after the subshell exits

The cost: a fork(). Cheap (typically <1ms) but not free. Avoid wrapping every block in a subshell; use them deliberately when you need scope isolation.

Command group: { commands; }

Curly braces run commands as a group in the current shell. No fork, no scope isolation, but you can redirect them as a unit:

{
  echo "Header"
  date
  echo "Footer"
} > combined.log

All three commands’ stdout goes to combined.log in one redirection. Without the braces you’d need three separate redirections.

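Critical syntactic gotchas with {}: the opening brace must be followed by whitespace, and the last command must end with a semicolon or newline before the closing brace.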

{ ls; cat /etc/hostname; date; }    # CORRECT
{ ls cat /etc/hostname date }       # WRONG (missing semicolons)

A handy use of {}: redirect a multi-command sequence to a single file:

{
  echo "## System info"
  echo
  echo "Date: $(date)"
  echo "Host: $(hostname)"
  echo "Uptime: $(uptime)"
  echo
  echo "## Disk"
  df -h
  echo
  echo "## Memory"
  free -h
} > status.txt

Or pipe it:

{ echo "header"; cat data.txt; echo "footer"; } | wc -l

When to choose which

Need                                       Use
Scope isolation (variables, cwd)           ( ... ) subshell
Bulk redirection of many commands          { ...; } command group
Need to keep variable updates in parent    { ...; } always
Run a block as a background job            ( ... ) & (forks anyway)
Time a block                               time { ...; } or time ( ... )
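
To make the difference concrete, here is a minimal sketch that contrasts the two forms. The variable name count is hypothetical and exists only for illustration.

```shell
#!/usr/bin/env bash
# count is a made-up variable used only to show where the change lands.
count=0

( count=$(( count + 1 )) )      # subshell: the increment happens in a forked copy
after_subshell=$count

{ count=$(( count + 1 )); }     # command group: the increment happens in this shell
after_group=$count

echo "after subshell: $after_subshell"   # after subshell: 0
echo "after group:    $after_group"      # after group:    1
```

The subshell's change is lost when the child exits; the command group's change persists because no child was ever created.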

2. Background jobs: & and $!

Append & to a command and bash forks the command into a background process and returns immediately:

sleep 10 &
echo "Sleep is running in the background"

The shell prints something like [1] 12345. Here [1] is the job number (a shell-local handle) and 12345 is the PID (kernel-wide).

After running a command in the background, $! holds the PID of the most recently backgrounded job:

long-running-task &
PID=$!
echo "Started with PID $PID"

You can run multiple background jobs:

task1 & PID1=$!
task2 & PID2=$!
task3 & PID3=$!

Each $! is the most recent background PID, so capture it immediately after & and before the next &.

wait — synchronise with background jobs

wait PID blocks until that process exits, then returns its exit status. wait with no argument waits for all background jobs:

task1 & PID1=$!
task2 & PID2=$!
task3 & PID3=$!

wait        # block until all three finish
echo "All done"

To wait for a specific PID and get its exit status:

slow-task &
PID=$!
# ... do other work ...
wait "$PID"
RC=$?
echo "slow-task exited with $RC"

wait "$PID" is the only reliable way to get the exit status of a specific backgrounded job. The shell’s $? immediately after a &-suffixed command is always 0: it only says the job was launched, not how it eventually exits.
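
A small sketch of that difference, assuming a stand-in job ( exit 7 ) in place of a real task:

```shell
#!/usr/bin/env bash
# ( exit 7 ) is a stand-in for a real job that fails with status 7.
( exit 7 ) &
launch_rc=$?      # status of the launch itself
pid=$!

job_rc=0
wait "$pid" || job_rc=$?   # wait returns the job's own exit status

echo "launch status: $launch_rc, job status: $job_rc"
# launch status: 0, job status: 7
```

Writing wait "$pid" || job_rc=$? also keeps the pattern safe in scripts that run under set -e.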

Parallel execution with explicit wait

#!/usr/bin/env bash
set -euo pipefail

PIDS=()

for host in web1 web2 web3 web4; do
  ssh "$host" 'sudo systemctl restart myapp' &
  PIDS+=($!)
done

# Wait for all and collect exit codes
FAILED=0
for pid in "${PIDS[@]}"; do
  if ! wait "$pid"; then
    echo "Job $pid failed" >&2
    FAILED=$(( FAILED + 1 ))   # not (( FAILED++ )): that returns non-zero when FAILED is 0 and trips set -e
  fi
done

(( FAILED == 0 )) || exit 1

This is the canonical fan-out-and-wait pattern. Four SSH commands run in parallel; the script waits for all of them; any failure makes the script exit non-zero.

wait -n — wait for any one job (bash 4.3+)

task1 & task2 & task3 &

wait -n              # blocks until ANY one of them finishes

Useful for “first one done” patterns or for limited-concurrency loops.
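
A minimal "first one done" sketch. The two subshells are stand-ins for real tasks, and the fractional sleep values assume a sleep that accepts them (GNU and BSD sleep both do).

```shell
#!/usr/bin/env bash
# Requires bash 4.3+ for wait -n.
( sleep 0.5; exit 3 ) &    # slower stand-in job
( sleep 0.1; exit 5 ) &    # faster stand-in job

first_rc=0
wait -n || first_rc=$?     # returns as soon as either job exits
echo "first job to finish exited with $first_rc"

wait                       # reap the remaining job
```

Here the faster job finishes first, so wait -n should report its status (5) while the slower job is still running.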


3. Jobs, fg, bg

Interactive shells track jobs — pipelines and command groups that are running. You can list them:

sleep 100 &
sleep 200 &
jobs
# [1]-  Running                 sleep 100 &
# [2]+  Running                 sleep 200 &

The + marks the current job (default for fg and bg). The - marks the previous.

You can suspend a foreground job with Ctrl+Z:

$ vim                    # opens vim
# (press Ctrl+Z)
[1]+  Stopped                  vim
$ jobs
[1]+  Stopped                  vim
$ bg                     # resume the stopped job in background — useless for vim
$ fg                     # bring it back to foreground

Refer to jobs by %N (job number):

fg %1            # bring job 1 to foreground
bg %2            # send job 2 to background
kill %3          # send SIGTERM to job 3
kill -9 %3       # SIGKILL (lesson 10)
disown %1        # remove job 1 from the shell's job table (more in section 4)

Job control is a feature of interactive shells. Inside a script, jobs, fg, and bg are typically not what you want — you want explicit & and wait. Job control is mostly for human-driven workflow at the prompt.


4. nohup, controlling terminal, and SIGHUP

When you log out of a terminal session (close the SSH connection, close the terminal window, log out), the shell and the jobs it is tracking are sent SIGHUP (signal 1, “hangup”). The default action for SIGHUP is to terminate.

This is why naive backgrounding doesn’t survive logout:

ssh user@server 'long-running-job &'   # job dies when ssh exits
ssh user@server
$ long-running-job &
$ exit                                  # job dies as ssh terminates the session

Four solutions:

Solution A: nohup

nohup long-running-job &

nohup does three things:

  1. Sets the SIGHUP disposition to “ignore” (so the job keeps running when the terminal goes away).
  2. If stdout is a terminal, redirects it to nohup.out in the current directory (or $HOME/nohup.out if that is not writable) so the job has somewhere to write after the terminal is gone.
  3. If stderr is a terminal, redirects it to stdout so both go to that file.

Standard usage:

nohup my-task > my-task.log 2>&1 &

Now the job ignores SIGHUP, writes both stdout and stderr to my-task.log, and runs in the background.
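
One way to see the SIGHUP protection without closing a terminal is to have a child send the signal to itself. This is only a sketch: the sh -c one-liner is a hypothetical stand-in for a real long-running job.

```shell
#!/usr/bin/env bash
log=$(mktemp)

# The child sends SIGHUP to itself. Under nohup the signal is ignored,
# so the echo after it still runs. Without nohup the child would die first.
nohup sh -c 'kill -HUP $$; echo "survived SIGHUP"' > "$log" 2>&1 < /dev/null

result=$(cat "$log")
rm -f -- "$log"
echo "$result"
```

Because stdout and stderr are redirected explicitly, nohup does not create nohup.out here.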

Solution B: disown

disown is bash-specific. It removes a job from the shell’s job table without affecting the running process:

my-task &
disown            # the job continues but is no longer "owned" by the shell
exit              # the shell exits; my-task continues

disown -h keeps the job in the table but tells the shell not to send SIGHUP at exit. Subtle difference, rarely matters.

disown doesn’t redirect output — if the job was writing to your terminal, after the terminal goes away, those writes will fail. So disown is best paired with explicit redirection:

my-task > my-task.log 2>&1 &
disown
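
A quick sketch of what disown changes, using sleep 30 as a stand-in for my-task. It assumes bash, since disown and jobs -p are bash builtins.

```shell
#!/usr/bin/env bash
sleep 30 > /dev/null 2>&1 &     # stand-in for my-task, with output redirected
pid=$!

before=$(jobs -p)   # the shell still tracks the job
disown              # drop the most recent job from the job table
after=$(jobs -p)    # nothing left to list

echo "tracked before disown: ${before:-none}"
echo "tracked after disown:  ${after:-none}"

kill "$pid"         # clean up the demo process; it kept running after disown
```

The process itself is untouched by disown; only the shell's bookkeeping changes, which is why the final kill by PID still works.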

Solution C: setsid or nohup setsid

setsid runs the command in a new session with no controlling terminal. This is the most robust: not just SIGHUP-immune, but completely detached from the terminal.

setsid my-task > my-task.log 2>&1 < /dev/null

Note the explicit < /dev/null for stdin — without a controlling terminal, reading from stdin would fail.

Solution D: tmux or screen

The “right” answer in 2026 is to run long-running interactive work inside tmux or screen:

tmux new -s mywork
# inside tmux: run your job
# detach with Ctrl+B, then D
exit                  # ssh exit; tmux session keeps running
# next time:
ssh user@host
tmux attach -t mywork

tmux keeps your processes alive across sessions, lets you reattach, and gives you scrollback. For interactive long work, this is unbeatable.


5. The controlling terminal in detail

When bash starts in a terminal, the kernel records the terminal as the bash session’s controlling terminal. All processes spawned by bash inherit this association. The terminal sends signals (SIGINT on Ctrl+C, SIGTSTP on Ctrl+Z, SIGHUP on hangup) to the foreground process group.

When you close the terminal:

  1. The kernel sends SIGHUP to the session leader (your bash).
  2. Bash sends SIGHUP to every job it’s tracking (unless they were disowned or had nohup applied).
  3. Each job, by default, terminates on SIGHUP.

setsid breaks this chain by creating a new session — a process in a new session has no controlling terminal, so the close-terminal-sends-SIGHUP chain doesn’t reach it.

systemd-run --user --scope and similar tools wrap your command in a systemd cgroup, which is yet another layer of isolation. Lesson 25 covers systemd and shell scripts.


6. Concurrency primitives: limiting parallelism

Spawning 1,000 background jobs at once will melt your system. You usually want to limit concurrency.

The xargs -P approach

find /var/log -name '*.log' -print0 | xargs -0 -P 4 -I {} gzip {}

xargs -P 4 runs up to 4 instances in parallel, taking work items from stdin (whitespace-separated by default, NUL-separated with -0). Fast, simple, well-tested.
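
The same mechanism works for arbitrary per-item commands. The sketch below uses four hypothetical work items and a tiny sh -c worker that receives each item as $1, which avoids splicing the item into the command string.

```shell
#!/usr/bin/env bash
# Four made-up work items, NUL-separated, handled by at most two workers at once.
results=$(
  printf '%s\0' alpha beta gamma delta |
    xargs -0 -n 1 -P 2 sh -c 'printf "done %s\n" "$1"' _ |
    sort    # completion order is not deterministic, so sort for stable output
)
echo "$results"
```

The trailing _ fills $0 for the inline script so that the item lands in $1.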

The GNU parallel approach

find /var/log -name '*.log' | parallel -j 4 gzip {}

parallel is more flexible: progress reporting, retry on failure, fancier substitution patterns, output ordering. Lesson 14 covers it in depth.

Hand-rolled with wait -n

For full control:

MAX_PARALLEL=4
RUNNING=0

for item in "${ITEMS[@]}"; do
  process_one "$item" &
  (( RUNNING++ ))

  if (( RUNNING >= MAX_PARALLEL )); then
    wait -n     # wait for any one to finish
    (( RUNNING-- ))
  fi
done

wait            # wait for the rest

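This is the worker-pool pattern. Each new job replaces a finished one, keeping concurrency at no more than MAX_PARALLEL. wait -n needs bash 4.3+. In a script that uses set -e, write the increment as RUNNING=$(( RUNNING + 1 )) and the wait as wait -n || true: (( RUNNING++ )) returns non-zero when RUNNING is 0, and wait -n returns the finished job’s exit status, so either one can abort the script.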

FIFO-based semaphore (advanced)

For more elaborate orchestration, you can use a FIFO (named pipe, from lesson 7) as a counting semaphore. Lesson 14 details this.


7. Process inspection from shell

You’ll often need to look at what’s running.

# Most useful flags
ps -ef                   # all processes, full format
ps -ef --forest          # tree view (Linux)
ps -eo pid,ppid,pgid,sid,stat,cmd | head    # custom columns
ps -p $$                 # this shell's own info
ps --pid 12345           # specific PID
ps --ppid $$             # all children of this shell

# Top-like
top -b -n 1 | head        # batch mode, one snapshot, head for brevity
htop                      # interactive (if installed)

# pgrep / pkill
pgrep -f myapp            # PIDs whose command line contains "myapp"
pgrep -u alice            # PIDs owned by alice
pkill -f -SIGTERM myapp   # send SIGTERM to all matching processes

# Process tree
pstree -p $$              # tree from this shell (Linux only)

# Check if a process is alive (cheap, no fork)
kill -0 $PID 2>/dev/null && echo "$PID is running"

kill -0 PID doesn’t actually send a signal; it just checks “could I send a signal.” Returns 0 if the PID exists and you have permission, non-zero otherwise. The classic “is this PID alive?” probe.
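
A minimal sketch of the probe before and after a process goes away, using sleep 5 as a stand-in for a real job:

```shell
#!/usr/bin/env bash
sleep 5 > /dev/null 2>&1 &      # stand-in for a real background job
pid=$!

if kill -0 "$pid" 2>/dev/null; then state_before=alive; else state_before=gone; fi

kill "$pid"                      # SIGTERM the job
wait "$pid" 2>/dev/null || true  # reap it so the PID really disappears

if kill -0 "$pid" 2>/dev/null; then state_after=alive; else state_after=gone; fi

echo "before: $state_before, after: $state_after"
# before: alive, after: gone
```

The wait matters: until the parent reaps the child, the PID still exists as a zombie and kill -0 keeps succeeding.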


8. Eight process-management idioms

# 1. Run a command in a different directory without changing cwd
( cd / && tar -czf /backup/etc.tgz etc )

# 2. Bulk redirect a group of commands
{
  echo "## Report"
  date
  uptime
  df -h
} > report.txt

# 3. Fan-out, wait-all, propagate failures
PIDS=()
for h in "${HOSTS[@]}"; do
  ssh "$h" 'restart-service' & PIDS+=($!)
done
FAIL=0
for pid in "${PIDS[@]}"; do
  wait "$pid" || FAIL=$(( FAIL + 1 ))   # safe under set -e, unlike (( FAIL++ )) when FAIL is 0
done
(( FAIL == 0 ))

# 4. Run-and-detach for long-lived work
nohup my-task > my-task.log 2>&1 < /dev/null &
disown

# 5. Worker pool with bounded concurrency
MAX=4
RUNNING=0
for item in "${ITEMS[@]}"; do
  process_one "$item" &
  (( RUNNING++ ))
  if (( RUNNING >= MAX )); then
    wait -n
    (( RUNNING-- ))
  fi
done
wait

# 6. Time-out a command
timeout 30 my-flaky-tool

# 7. Check if a PID is alive
if kill -0 "$PID" 2>/dev/null; then
  echo "still running"
fi

# 8. Run with logging, watch progress
( set -o pipefail; my-build-task 2>&1 | tee build.log ) &   # pipefail: a failing build is reported instead of tee's success
BUILD_PID=$!
# ... while build runs, you can read build.log live ...
wait "$BUILD_PID"
echo "Build exit code: $?"

9. The timeout command

timeout DURATION command runs command and kills it if it’s still running after DURATION:

timeout 30 curl https://slow-api.example.com/users
echo $?
# 0    — succeeded within 30s
# 124  — was killed by timeout (the standard "I timed out" exit code)
# X    — anything else: the command's natural exit code

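timeout sends SIGTERM when the duration expires. By default it never escalates, so a command that ignores SIGTERM keeps running. Pass --kill-after (-k) to send SIGKILL if the command is still alive after a grace period; when SIGKILL is actually sent, the exit status is 137 (128+9) rather than 124: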

timeout --kill-after=5s 30s my-task     # SIGTERM at 30s, SIGKILL at 35s

timeout -s SIGINT 30 my-task sends SIGINT instead of SIGTERM (useful for tools that handle Ctrl+C cleanly).
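
The exit codes can be checked with a short sketch. It assumes GNU coreutils timeout is installed; sleep and sh -c 'exit 3' are stand-ins for real commands.

```shell
#!/usr/bin/env bash
timed_out_rc=0
timeout 1 sleep 5 || timed_out_rc=$?          # sleep outlives the 1s limit

natural_rc=0
timeout 5 sh -c 'exit 3' || natural_rc=$?     # finishes in time with its own status

echo "timed out: $timed_out_rc, finished in time: $natural_rc"
# timed out: 124, finished in time: 3
```

Checking for 124 specifically lets a script tell "too slow" apart from "ran and failed".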

A bash-only equivalent (no timeout binary needed):

my-task &
PID=$!
( sleep 30; kill -TERM "$PID" 2>/dev/null ) &
KILLER=$!

wait "$PID"
RC=$?
kill "$KILLER" 2>/dev/null
exit "$RC"

Crude but works. timeout(1) from coreutils is the right tool when available.


10. Common pitfalls

Backgrounding without redirection

my-task &     # WRONG if you'll close the terminal — output goes to terminal which dies

Always redirect when backgrounding for survival:

my-task > my-task.log 2>&1 < /dev/null &
disown

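& backgrounds the whole && chain

task1 && task2 &     # ambiguous to the reader: does the & apply to task2 only, or the whole chain?

The answer is: & applies to the whole chain. && and || bind tighter than &, so task1 && task2 & means “in a background subshell, run task1; if it succeeds, run task2.” The & terminates the entire and-or list, not just its last command.

Grouping makes the intent explicit. To background only task2, group it on its own:

{ task1 && task2; } &     # whole chain in the background (same behaviour, clearer intent)
task1 && { task2 & }      # task1 in the foreground, then task2 in the background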
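
A quick way to check how bash actually parses such a chain is to put a variable assignment first and see whether the parent shell observes it. The variable name marker is hypothetical.

```shell
#!/usr/bin/env bash
marker="unchanged"
marker="changed" && true &      # the whole chain runs in a background subshell
wait
whole_chain=$marker             # the assignment happened in the child

marker="unchanged"
marker="changed" && { true & }  # only true is backgrounded
wait
last_only=$marker               # the assignment happened in this shell

echo "whole chain: $whole_chain, last only: $last_only"
# whole chain: unchanged, last only: changed
```

If the trailing & applied only to the last command, the first assignment would have been visible in the parent.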

Forgetting to capture $!

task1 &
task2 &       # NOW $! refers to task2
wait $!       # waits only for task2

Capture immediately:

task1 & PID1=$!
task2 & PID2=$!
wait $PID1
wait $PID2

Subshells and set -e

set -e does not propagate from a background subshell to the parent. If you run (failing-cmd) & and the subshell fails, the parent script keeps running. The parent only sees the exit code when it calls wait on that PID. Always check.
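
A minimal sketch of this behaviour, with false standing in for a failing command:

```shell
#!/usr/bin/env bash
set -e

( false; echo "not reached" ) &   # the subshell inherits set -e and exits at false
pid=$!

child_rc=0
wait "$pid" || child_rc=$?        # guarded with || so set -e does not fire here

echo "parent still running, child exited with $child_rc"
```

A bare wait "$pid" without the || guard would make the parent exit at that line under set -e, which is sometimes exactly what you want.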

Job control disabled in non-interactive shells

In a script (non-interactive bash), fg, bg, and jobs may not work as expected. The default in scripts is “no job control.” Almost always you want explicit &, $!, and wait instead.


11. Real example: parallel host probe with timeouts

#!/usr/bin/env bash
# probe-hosts.sh — probe a list of hosts in parallel and report status
set -euo pipefail
IFS=$'\n\t'

HOSTS=("$@")
[[ ${#HOSTS[@]} -gt 0 ]] || { echo "Usage: $0 host1 host2 ..." >&2; exit 2; }

MAX_PARALLEL=10
TIMEOUT=5

probe_one() {
  local host="$1"
  if timeout "$TIMEOUT" curl -fsS --max-time "$TIMEOUT" "http://$host/healthz" >/dev/null 2>&1; then
    printf '%s\tOK\n' "$host"
    return 0
  else
    printf '%s\tFAIL\n' "$host"
    return 1
  fi
}

export -f probe_one
export TIMEOUT

OUTPUT_TMP=$(mktemp)
trap 'rm -f -- "$OUTPUT_TMP"' EXIT

# Fan out with bounded concurrency
RUNNING=0
PIDS=()

for host in "${HOSTS[@]}"; do
  probe_one "$host" >> "$OUTPUT_TMP" &
  PIDS+=($!)
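  RUNNING=$(( RUNNING + 1 ))    # (( RUNNING++ )) returns non-zero when RUNNING is 0, which trips set -e
  if (( RUNNING >= MAX_PARALLEL )); then
    wait -n || true             # a failed probe must not abort the script under set -e
    RUNNING=$(( RUNNING - 1 ))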
  fi
done

# Wait for the rest
for pid in "${PIDS[@]}"; do
  wait "$pid" 2>/dev/null || true
done

# Report
sort "$OUTPUT_TMP"
echo
OK_COUNT=$(grep -c $'\tOK$' "$OUTPUT_TMP" || true)
FAIL_COUNT=$(grep -c $'\tFAIL$' "$OUTPUT_TMP" || true)
echo "OK: $OK_COUNT, FAIL: $FAIL_COUNT, TOTAL: ${#HOSTS[@]}"

(( FAIL_COUNT == 0 )) || exit 1

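Things to notice: every probe runs as a background job, wait -n caps concurrency at MAX_PARALLEL, results go to a temp file that the EXIT trap removes, and the script exits non-zero if any probe failed.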

This is real concurrent shell. You’d run it as:

./probe-hosts.sh web1 web2 web3 web4 web5 ... web100

100 probes, 10 at a time, 5-second timeout per call, sorted output, propagated exit code. About 20 lines of real logic.


12. What you must internalise before lesson 10

If any part of this lesson felt fuzzy, re-read it. Lesson 10 (signals and trap) is the natural sequel: once you’re managing processes, you need to respond to signals, clean up gracefully, and handle Ctrl+C correctly.


What’s next

Lesson 10 covers signals and trap: the SIGINT/SIGTERM/SIGKILL/SIGUSR1 model, the trap builtin for handler registration, the EXIT pseudo-signal for cleanup, the ERR pseudo-signal for fail-fast diagnostics, idempotent cleanup with tempfiles and lock files, and the canonical “structured cleanup” pattern. Bring everything from lessons 1–9.
