Shell Lesson 8 of 42

Pipes & Pipelines, In Depth: PIPESTATUS, set -o pipefail, SIGPIPE & Multi-Stage Pipeline Discipline

If lesson 7 was “how shell talks to files,” lesson 8 is “how shell connects programs to each other.” The pipe (|) is the most distinctive feature of the Unix shell: the thing that made Doug McIlroy’s design philosophy (“write programs that do one thing well; write programs to work together”) real and operational. Most interesting shell commands longer than two tokens involve a pipe.

Pipes are also the source of the single most common silent-bug class in shell: a pipeline whose exit code is zero even though three earlier stages failed. This lesson is about understanding why that happens, when it matters, and how set -o pipefail and the PIPESTATUS array let you build pipelines you can actually trust in production.

This is also the lesson that answers the question we deferred from L4: “why does set -e not catch the failure of curl in curl ... | grep ...?” The answer is right here.


1. What a pipe actually is

When you write A | B, the shell:

  1. Calls pipe() to get a pair of FDs from the kernel: a read end and a write end.
  2. Forks a child process for A. In that child, it dups the write end of the pipe to fd 1, closes the read end, and execs A. So A’s stdout is the pipe.
  3. Forks a child process for B. In that child, it dups the read end of the pipe to fd 0, closes the write end, and execs B. So B’s stdin is the pipe.
  4. Closes both ends of the pipe in the parent.
  5. Waits for one or both children, depending on the shell’s policy.

A and B run concurrently. As soon as A writes bytes to its stdout, those bytes are buffered in the kernel pipe (typical capacity: 64KB on Linux), and B reads them. If A is fast and B is slow, the pipe fills up and A blocks until B drains it. If B is fast and A is slow, B blocks waiting for input. This back-pressure is automatic and is the reason pipes scale to gigabytes — the kernel just makes producer and consumer take turns.
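The forking described above can be observed directly. The following is a minimal sketch, assuming bash 4 or newer (for $BASHPID); the variable names are illustrative only. It records the PID of each pipeline stage and of the parent shell, which should all differ.

```shell
#!/usr/bin/env bash
# Sketch: each stage of "A | B" runs in its own forked process.
# $BASHPID is the PID of whichever process expands it
# ($$ would always name the top-level shell).
parent_pid=$BASHPID

report=$(
  { echo "left=$BASHPID"; } | { cat; echo "right=$BASHPID"; }
)

left_pid=$(sed -n 's/^left=//p' <<< "$report")
right_pid=$(sed -n 's/^right=//p' <<< "$report")
echo "parent=$parent_pid left=$left_pid right=$right_pid"
```

The three numbers vary from run to run, but they are never equal to one another: the left stage, the right stage, and the invoking shell are separate processes.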

Two critical implications:

The lastpipe shell option (covered in L4) changes the bash policy so the last stage runs in the current shell. It’s bash-only and disabled by default, and it takes effect only when job control is off (the normal case in scripts); relying on it is non-portable.


2. The default exit-code rule and why it’s a trap

Bash’s default exit-code rule for a pipeline is: the exit code is the exit code of the last command.

false | true
echo $?              # 0 — because true (the last command) succeeded

Read that again. The pipeline false | true returns success — even though false clearly failed. The reason: the last command (true) succeeded, and that’s what bash reports.

This is the trap. Any pipeline whose final stage is reliable — grep, head, awk, tee — silently swallows earlier failures:

curl https://api.example.invalid | jq '.users[0].name'
echo $?              # 0 — even if curl couldn't resolve the host!
                     # Because jq successfully reported "null" or the input was empty

This pattern hides real production failures. You think your pipeline succeeded; it didn’t. The downstream system (a cron job, a deployment, a CI gate) sees zero and proceeds. Bad data, partial deployments, missed alerts — all because the wrong stage’s exit code became the pipeline’s exit code.


3. set -o pipefail — the fix

Add this to your strict-mode preamble (we already have it in the L2 strict-mode template):

set -o pipefail

With pipefail, the pipeline’s exit code is the exit code of the rightmost command that failed (or zero if all succeeded).

set -o pipefail
false | true
echo $?              # 1 — false's exit code propagates

curl https://api.example.invalid | jq .
echo $?              # 6 — curl's "couldn't resolve host"

This is essential. Every script you write past 20 lines should have set -o pipefail. Without it, your pipelines lie to you about success.

There’s a subtlety: pipefail returns the rightmost failure, not the leftmost. If both curl and jq fail, you get jq’s exit code, which often hides the more interesting failure (the network issue). But this is still vastly better than the default, which reports only the last stage and ignores every earlier one.
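The rightmost-failure rule is easy to check offline. This is a small sketch that uses subshells with fixed exit codes (3 and 5, chosen arbitrarily) in place of real commands, so no network is needed.

```shell
#!/usr/bin/env bash
# Three stages: the first two fail with distinct codes, the last succeeds.
( exit 3 ) | ( exit 5 ) | true
default_rc=$?          # default rule: status of the last stage only

set -o pipefail
pipefail_rc=0
( exit 3 ) | ( exit 5 ) | true || pipefail_rc=$?   # rightmost non-zero stage

echo "default=$default_rc pipefail=$pipefail_rc"   # default=0 pipefail=5
```

The 3 from the leftmost stage never surfaces in $? under either rule.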

For complete error inspection, use PIPESTATUS.


4. The PIPESTATUS array

Bash records the exit status of every stage of the most recent pipeline in a special array called PIPESTATUS:

false | true | false
echo "${PIPESTATUS[@]}"     # 1 0 1

Index 0 is the leftmost stage. You can inspect any stage:

curl https://api.example.com/users | jq '.[]' | wc -l
# One echo: all three expansions happen before echo itself resets PIPESTATUS
echo "curl exit: ${PIPESTATUS[0]}, jq exit: ${PIPESTATUS[1]}, wc exit: ${PIPESTATUS[2]}"

PIPESTATUS is reset by every command, including echo, so capture it immediately:

curl ... | jq ... | wc -l
PIPE_STATUSES=("${PIPESTATUS[@]}")     # snapshot
echo "Statuses: ${PIPE_STATUSES[*]}"

This is invaluable for debugging multi-stage pipelines or for retry logic that should fire only on specific stage failures.
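As an illustration of stage-specific retry logic, here is a hedged sketch. flaky_fetch is a hypothetical stand-in for a network producer (it fails with status 7 on its first call only), and the temp-file call counter exists purely so the demo is deterministic; neither is part of any real tool.

```shell
#!/usr/bin/env bash
set -uo pipefail

attempts_log=$(mktemp)   # one line per call, so the count survives the pipeline subshell
out_file=$(mktemp)

# Hypothetical flaky producer: fails with status 7 on its first call, succeeds afterwards.
flaky_fetch() {
  echo call >> "$attempts_log"
  (( $(wc -l < "$attempts_log") >= 2 )) || return 7
  printf 'bob\nalice\n'
}

for attempt in 1 2 3; do
  flaky_fetch | sort > "$out_file"
  statuses=("${PIPESTATUS[@]}")          # snapshot before any other command runs

  if (( statuses[0] == 0 && statuses[1] == 0 )); then
    break                                # both stages succeeded
  elif (( statuses[0] != 0 )); then
    echo "attempt $attempt: producer exited ${statuses[0]}, retrying" >&2
  else
    echo "sort exited ${statuses[1]}; not retrying" >&2
    break
  fi
done

sorted=$(<"$out_file")
rm -f -- "$attempts_log" "$out_file"
echo "finished on attempt $attempt: ${sorted//$'\n'/ }"
```

Only a failure in stage 0 triggers another attempt; a failure in sort is treated as non-retryable. A real script would also need to handle the case where every attempt fails.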

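PIPESTATUS is bash-specific. POSIX sh has no per-stage equivalent; the closest is $? under pipefail, which gives you only the rightmost failure (and pipefail itself was standardised only in POSIX.1-2024, so older sh implementations may lack it). Some shells (zsh) use pipestatus (lowercase) instead.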

Inspecting PIPESTATUS in conditions

set -o pipefail
curl -fsS https://api.example.com/users | jq -e '.[].id' > ids.txt

# After the pipeline
case "${PIPESTATUS[@]}" in
  "0 0")
    echo "All good"
    ;;
  *" 0")
    echo "curl failed but jq somehow succeeded — investigate"
    ;;
  "0 "*)
    echo "curl OK; jq failed (likely empty or malformed response)"
    ;;
  *)
    echo "Both failed"
    ;;
esac

Niche but powerful. For most cases pipefail plus a single $? check is enough.


5. SIGPIPE — when “failure” is intentional

Here’s a confusing scenario. With pipefail enabled:

set -o pipefail
yes | head -n 5
echo $?              # 141 (or sometimes 0; varies)

head reads 5 lines and closes its stdin. yes keeps writing forever, but its writes go to a closed pipe — at which point the kernel sends yes the signal SIGPIPE (signal 13). yes dies. With pipefail, the pipeline’s exit code becomes the exit code of yes, which (if it died from SIGPIPE) is 128 + 13 = 141.

This is a false positive. There’s nothing wrong — head deliberately stopped reading because it had what it needed. But pipefail reports failure. This is the most-cited downside of pipefail.
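To tell a SIGPIPE death apart from an ordinary failure, the exit status can be decoded. A minimal sketch, relying on bash's builtin kill -l, which maps an exit status above 128 back to a signal name; the variable names are illustrative.

```shell
#!/usr/bin/env bash
set -o pipefail

yes | head -n 5 > /dev/null
statuses=("${PIPESTATUS[@]}")
producer_rc=${statuses[0]}
consumer_rc=${statuses[1]}

if (( producer_rc > 128 )); then
  # Statuses above 128 mean "killed by signal (status - 128)"
  echo "producer was killed by SIG$(kill -l "$producer_rc"); consumer exited $consumer_rc"
else
  echo "producer exited $producer_rc; consumer exited $consumer_rc"
fi
```

On a typical system this reports SIGPIPE for yes and 0 for head. If the environment has SIGPIPE ignored, yes sees a write error instead and exits with an ordinary non-zero status.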

The fixes:

Option A: ignore the specific SIGPIPE exit code in your error handling

set -euo pipefail
yes | head -n 5 || [[ $? == 141 ]]

Or wrap in a function:

ignore_sigpipe() {
  local rc=0
  "$@" || rc=$?     # "||" keeps set -e from exiting before rc is inspected
  (( rc == 141 )) && return 0
  return "$rc"
}

ignore_sigpipe yes | head -n 5

Option B: use a tool that handles its own EOF gracefully

head -n 5 can be replaced with awk 'NR<=5', which reads to end-of-input and so never closes the pipe early. But this throws away head’s laziness: it processes the entire upstream output, which defeats the optimisation, and with an endless producer such as yes it never finishes.

Option C: turn off pipefail just for this pipeline

set +o pipefail
yes | head -n 5
set -o pipefail

Verbose; only worth it for one-off cases. In practice, most scripts ignore the SIGPIPE-with-pipefail issue because head is rarely fed by a command that you’d want to error-check anyway. If your upstream is cat or yes or seq, who cares if it dies. The cases where SIGPIPE matters are when the upstream might also legitimately fail (e.g. curl | head), and there you need explicit handling.

The SIGPIPE signal in your own scripts

If your script writes to a closed pipe, your script also receives SIGPIPE. By default the shell will exit with 141. You can ignore SIGPIPE explicitly with trap '' PIPE (lesson 10).


6. Pipelines and set -e

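Recall from L3 that set -e exits the shell on a command failure. With pipelines, set -e looks only at the pipeline’s overall exit status. Without pipefail that is the last stage’s status, which is why a failed curl in curl ... | grep ... goes unnoticed; with pipefail, any failing stage makes the pipeline non-zero and set -e fires.

So set -e and pipefail are complementary; you want both. The standard preamble: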

set -euo pipefail

is the right starting point.
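The interaction can be demonstrated without touching the current shell's options. A small sketch: each variant runs in a child bash so that set -e can abort it without ending the demo.

```shell
#!/usr/bin/env bash
# The same two-line script, run under two different option sets.
script='false | true; echo "reached the next line"'

without_pipefail=$(bash -c "set -e; $script")
with_pipefail=$(bash -c "set -eo pipefail; $script") || rc=$?

echo "without pipefail: '${without_pipefail}'"
echo "with pipefail:    '${with_pipefail}' (child exited ${rc:-0})"
```

Without pipefail the child prints its message, because set -e saw a successful pipeline. With pipefail the child exits with status 1 before reaching echo.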

set -e also has a long spelling: set -o errexit is exactly the same option. Bash additionally has set -E (errtrace), which makes the ERR trap inherited by functions, command substitutions, and subshells. Lesson 10 covers trap and errtrace.


7. The |& operator (bash 4+)

|& is shorthand for 2>&1 | — pipe both stdout and stderr.

make build |& tee build.log
# equivalent to: make build 2>&1 | tee build.log

Useful when you want to capture or filter both streams together. It’s bash 4+. POSIX-portable scripts should use 2>&1 |.

Note: when both streams pipe together, they may interleave in surprising ways because they’re buffered separately at the producer. For deterministic ordering, the producer needs to flush stderr before stdout (or vice versa), or you need to fully capture and post-process.
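A quick way to confirm which streams reach the consumer is to count lines. In this sketch, emit is a hypothetical producer that writes one line to each stream.

```shell
#!/usr/bin/env bash
# Hypothetical producer: one line on stdout, one on stderr.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

stdout_only=$(emit 2>/dev/null | wc -l)   # plain | carries fd 1 only
portable=$(emit 2>&1 | wc -l)             # POSIX spelling
shorthand=$(emit |& wc -l)                # bash 4+ spelling

echo "stdout only: $((stdout_only)), 2>&1 |: $((portable)), |&: $((shorthand))"
```

The plain pipe delivers one line; both merged forms deliver two.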


8. Multi-stage pipeline discipline

A pipeline of 3-4 stages is normal. A pipeline of 10 stages is a code smell — break it up. Each | is a process boundary, and at some point the cognitive load of “which stage filtered out the records I’m now missing?” exceeds the elegance of one-liner shell.

The right shape for a long pipeline:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

# Stage 1: collect raw data
RAW=$(curl -fsS https://api.example.com/users)

# Stage 2: extract fields
USERS=$(echo "$RAW" | jq -r '.users[] | "\(.id)\t\(.email)"')

# Stage 3: filter
ACTIVE=$(echo "$USERS" | awk -F'\t' '$2 ~ /@example\.com$/')

# Stage 4: count (printf without a trailing newline, so an empty result counts as 0, not 1)
COUNT=$(printf '%s' "$ACTIVE" | awk 'END { print NR }')

echo "Active users: $COUNT"

vs. the one-liner:

curl -fsS ... | jq -r ... | awk ... | wc -l

The intermediate-variables form is slower (each stage forks $(...) and re-parses) but vastly easier to debug. You can echo "$RAW" | head between stages to see what changed. For production scripts where correctness matters more than speed, prefer the explicit form.

For really long, performance-sensitive pipelines, write the data to a file at each major step and pick up from there:

curl ... > /tmp/raw.json
jq ... < /tmp/raw.json > /tmp/users.tsv
awk ... < /tmp/users.tsv > /tmp/active.tsv
wc -l < /tmp/active.tsv

This is the shape of an ETL job. Each step is restartable, debuggable, observable.
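One way to make the "restartable" property concrete is a helper that skips a step whose output already exists. This is a sketch only: run_step is a hypothetical helper, not a standard tool, and the sample data stands in for real API output.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# run_step <output-file> <command> [args...]   (hypothetical helper)
# Skips the command when the output already exists and is non-empty,
# and publishes the output only after the command succeeds.
run_step() {
  local out=$1
  shift
  if [[ -s $out ]]; then
    echo "skip: ${out##*/} already present" >&2
    return 0
  fi
  "$@" > "$out.tmp"
  mv -- "$out.tmp" "$out"
}

run_step "$workdir/raw.txt"    printf 'b\na\nb\n'
run_step "$workdir/unique.txt" sort -u "$workdir/raw.txt"

# A second run finds raw.txt in place and never invokes the command (here: false).
run_step "$workdir/raw.txt" false

unique=$(<"$workdir/unique.txt")
echo "unique rows: ${unique//$'\n'/ }"   # unique rows: a b
rm -rf -- "$workdir"
```

Writing to a .tmp name and renaming afterwards means a step that dies halfway never leaves a file that looks finished.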


9. Common pipeline antipatterns

ls | grep

Don’t. We covered this in L4 — never parse ls output. Use a glob, find, or find -print0 | xargs -0.

ls /etc | grep -v conf       # WRONG
find /etc -mindepth 1 -maxdepth 1 ! -name '*conf*'   # RIGHT (same filter; prints full paths)

cat | grep

Useless use of cat. cat file | grep pattern is the same as grep pattern file but with one more fork. The latter is preferred:

cat file.log | grep ERROR    # WRONG
grep ERROR file.log          # RIGHT
< file.log grep ERROR        # also correct, sometimes preferred for visual flow

The < file.log grep ERROR form puts the data source first, reading more naturally as “from this file, run grep.” Useful taste.

grep ... | wc -l

grep -c does this without forking wc:

grep ERROR file.log | wc -l   # WORKS but unnecessarily forks
grep -c ERROR file.log        # FAST and clearer
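One caveat when using grep -c under strict mode: grep exits with status 1 when nothing matches, even though it still prints 0. A minimal sketch of one way to absorb that, using a throwaway sample log:

```shell
#!/usr/bin/env bash
set -euo pipefail

log=$(mktemp)
printf 'INFO start\nERROR disk full\nERROR retry failed\n' > "$log"

errors=$(grep -c ERROR "$log" || true)     # two matching lines
fatals=$(grep -c FATAL "$log" || true)     # no match: grep prints 0 and exits 1

echo "errors=$errors fatals=$fatals"       # errors=2 fatals=0
rm -f -- "$log"
```

Note that || true also hides grep's status 2 (a real error such as an unreadable file), so a stricter script may prefer to test for status 1 explicitly.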

awk | sed

Anything sed can do, awk can do. If you’re already piping into awk, finish in awk:

awk '{print $1}' file | sed 's/old/new/'    # one fork too many
awk '{ sub(/old/, "new", $1); print $1 }' file    # sub, not gsub: sed's s/old/new/ replaces only the first match

Lesson 12 covers awk mastery in depth.

cmd | grep -v ^$

Filtering blank lines is a sed idiom (sed '/^$/d') or just grep . (matches “any non-empty line”):

cmd | grep -v '^$'           # fine
cmd | grep .                 # shorter
cmd | sed '/^$/d'            # also fine

Subshell pipe-into-while loop

Already covered in L4. cmd | while read line; do COUNT=...; done runs the loop in a subshell, so COUNT doesn’t update. Use while read; do ...; done < <(cmd) instead.
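The difference is easy to reproduce. A short sketch, assuming bash with lastpipe left at its default (off):

```shell
#!/usr/bin/env bash
count_piped=0
printf 'a\nb\nc\n' | while read -r line; do
  count_piped=$(( count_piped + 1 ))       # increments a copy inside the subshell
done

count_procsub=0
while read -r line; do
  count_procsub=$(( count_procsub + 1 ))   # runs in the current shell
done < <(printf 'a\nb\nc\n')

echo "piped=$count_piped procsub=$count_procsub"   # piped=0 procsub=3
```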


10. Pipeline performance and parallelism

Pipes are the simplest form of parallelism in shell: each stage runs in its own process, so the kernel can schedule stages on different cores. For CPU-bound work this can give you a 2-4x speedup if the stages are roughly balanced; the pipeline as a whole runs no faster than its slowest stage.

For more aggressive parallelism, lesson 14 covers xargs -P and GNU parallel. Quick preview:

# Process 1000 files, 4 in parallel
find /var/log -name '*.log' -print0 | xargs -0 -P 4 -I {} gzip {}

xargs -P 4 runs up to 4 instances of the command concurrently, distributing input items among them (NUL-delimited items here, because of -print0 and -0). The pipe is to feed input; the parallelism is in xargs.
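The same pattern can be tried safely on throwaway files. A sketch, assuming an xargs that supports -P (GNU and BSD do; it is not in POSIX) and a gzip binary on PATH; -n 1 is used here in place of -I {}.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  echo "sample line $i" > "$workdir/app$i.log"
done

# -n 1: one file per gzip invocation; -P 4: at most four invocations at a time.
find "$workdir" -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip

gz_count=$(find "$workdir" -name '*.log.gz' | wc -l)
log_count=$(find "$workdir" -name '*.log' | wc -l)
echo "compressed: $((gz_count)), uncompressed left: $((log_count))"
rm -rf -- "$workdir"
```

After the run, all six files carry a .gz suffix and no plain .log files remain.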

When pipelines hurt

When pipelines win

The mental rule: pipes for streams (data passing through stages once), files for state (data being mutated and re-read).


11. The tee family of pipeline observability tools

Once you have multi-stage pipelines, you need observability. The tee command from L7 is the basic tool; combine with process substitution for more.

# Capture intermediate stage output for debugging
curl -fsS https://api.example.com/users \
  | tee /tmp/raw.json \
  | jq -r '.users[].email' \
  | tee /tmp/emails.txt \
  | awk '/@example\.com$/' \
  | wc -l

After running, /tmp/raw.json has the original API response, /tmp/emails.txt has the extracted emails, and the terminal shows the count. You can re-run from any stage by feeding /tmp/... into the next stage manually.
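Replaying from a snapshot can be sketched offline. Here a printf of three sample addresses stands in for the curl and jq stages, and a temporary directory replaces the fixed /tmp paths; both are assumptions made for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Stand-in for the fetch and extract stages: three already-extracted addresses.
count=$(
  printf 'a@example.com\nb@other.org\nc@example.com\n' \
    | tee "$workdir/emails.txt" \
    | awk '/@example\.com$/' \
    | wc -l
)

# Replay only the last two stages from the snapshot; the producer is not re-run.
replayed=$(awk '/@example\.com$/' < "$workdir/emails.txt" | wc -l)

echo "live=$((count)) replayed=$((replayed))"   # live=2 replayed=2
rm -rf -- "$workdir"
```

Because tee has exited by the time the pipeline returns, the snapshot is complete and the replay gives the same count as the live run.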

pv (Pipe Viewer) is another observability tool — it shows progress and throughput:

pv huge-file.jsonl | jq -r '.id' | sort -u > unique-ids.txt   # jq needs JSON input, one object per line here

pv reports throughput (MB/s), an ETA, and a progress bar. Brilliant for long-running pipelines on large files.


12. Real example: ingest, transform, validate

#!/usr/bin/env bash
# ingest.sh — pull users from an API, validate, store
set -euo pipefail
IFS=$'\n\t'

API_URL="${API_URL:?API_URL required}"
OUT_FILE="${OUT_FILE:-users.tsv}"

# Stage 1 + 2: fetch and extract — capture both for debugging
RAW_TMP=$(mktemp)
trap 'rm -f -- "$RAW_TMP"' EXIT

curl -fsS "$API_URL/users" > "$RAW_TMP"

# Stage 3: extract structured fields
mapfile -t USERS < <(jq -r '.users[] | "\(.id)\t\(.email)\t\(.role)"' < "$RAW_TMP")

# Sanity check the row count
EXPECTED_COUNT=$(jq -r '.users | length' < "$RAW_TMP")
ACTUAL_COUNT="${#USERS[@]}"
if (( ACTUAL_COUNT != EXPECTED_COUNT )); then
  echo "ERROR: extracted $ACTUAL_COUNT users but API said $EXPECTED_COUNT" >&2
  exit 3
fi

# Stage 4: validate each row
INVALID=0
for row in "${USERS[@]}"; do
  IFS=$'\t' read -r id email role <<< "$row"
  if [[ -z "$id" || -z "$email" || ! "$email" =~ @ ]]; then
    echo "WARN: bad row: $row" >&2
    INVALID=$(( INVALID + 1 ))   # (( INVALID++ )) returns status 1 when INVALID is 0, which trips set -e
  fi
done

if (( INVALID > 0 )); then
  echo "ERROR: $INVALID invalid rows out of $ACTUAL_COUNT" >&2
  exit 4
fi

# Stage 5: write output atomically (temp file beside the target, so mv is a same-filesystem rename)
TMP_OUT=$(mktemp "${OUT_FILE}.XXXXXX")
printf '%s\n' "${USERS[@]}" > "$TMP_OUT"
mv -- "$TMP_OUT" "$OUT_FILE"

echo "Wrote $ACTUAL_COUNT users to $OUT_FILE"

# Stage 6: every failure path above has already exited non-zero, so reaching here means success
exit 0

Things to notice:

This is the production shape. Long pipelines should not be written as one-liners. Break them into testable, restartable, observable steps.


13. The pipefail cheat-sheet

set -o pipefail            # essential — don't let last-stage success hide upstream failures
"${PIPESTATUS[@]}"         # stage-by-stage exit codes (bash only)
PIPE=("${PIPESTATUS[@]}")  # capture before the next command resets PIPESTATUS

cmd1 | cmd2 | cmd3 || echo "Pipeline failed: ${PIPESTATUS[*]}"

# Ignore SIGPIPE
yes | head -n 5 || [[ $? == 141 ]]

# Pipe both stdout and stderr (bash 4+)
make build |& tee build.log

# POSIX equivalent
make build 2>&1 | tee build.log

# Tee to multiple destinations
cmd | tee >(gzip > out.gz) >(grep ERROR > errors.txt) > /dev/null

# Useless-use-of-cat avoidance
< file.txt grep ERROR     # right
grep ERROR file.txt       # also right
cat file.txt | grep ERROR # WRONG

14. What you must internalise before lesson 9

If any felt fuzzy, re-read. Lesson 9 covers process management — subshells, command groups, jobs, wait, nohup — the building blocks for lesson 10’s signal-handling discussion.


What’s next

Lesson 9 covers process management: subshells (...), command groups {...; }, background &, jobs, fg/bg, wait, nohup, disown, and the precise lifecycle of a backgrounded process. Bring everything from lessons 1–8 — every backgrounded job is a process-tree decision.
