Pipes & Pipelines, In Depth: PIPESTATUS, set -o pipefail, SIGPIPE & Multi-Stage Pipeline Discipline

If lesson 7 was “how shell talks to files,” lesson 8 is “how shell connects programs to each other.” The pipe (|) is the most distinctive feature of the Unix shell: the thing that made Doug McIlroy’s design philosophy (“write programs that do one thing well; write programs to work together”) real and operational. Most interesting shell commands longer than two tokens involve a pipe.

Pipes are also the source of the single most common silent-bug class in shell: a pipeline whose exit code is zero even though three earlier stages failed. This lesson is about understanding why that happens, when it matters, and how set -o pipefail and the PIPESTATUS array let you build pipelines you can actually trust in production.

This is also the lesson that answers the question we deferred from L4: “why does set -e not catch the failure of curl in curl ... | grep ...?” The answer is right here.

1. What a pipe actually is

When you write A | B, the shell:

Calls pipe() to get a pair of FDs from the kernel: a read end and a write end.
Forks a child process for A. In that child, it dups the write end of the pipe to fd 1, closes the read end, and execs A. So A’s stdout is the pipe.
Forks a child process for B. In that child, it dups the read end of the pipe to fd 0, closes the write end, and execs B. So B’s stdin is the pipe.
Closes both ends of the pipe in the parent.
Waits for one or both children, depending on the shell’s policy.

A and B run concurrently. As soon as A writes bytes to its stdout, those bytes are buffered in the kernel pipe (typical capacity: 64KB on Linux), and B reads them. If A is fast and B is slow, the pipe fills up and A blocks until B drains it. If B is fast and A is slow, B blocks waiting for input. This back-pressure is automatic and is the reason pipes scale to gigabytes — the kernel just makes producer and consumer take turns.
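A quick way to see that concurrency is to time it. The sketch below assumes only bash and sleep; the two-second durations are arbitrary, and SECONDS is bash's built-in elapsed-time counter.

```shell
#!/usr/bin/env bash
# Both stages start together, so the pipeline lasts about as long as its
# slowest stage, not the sum of its stages.
SECONDS=0                      # reset bash's elapsed-seconds counter
sleep 2 | sleep 2              # two stages, 2 seconds each
concurrent_elapsed=$SECONDS    # about 2, not 4

SECONDS=0
sleep 2; sleep 2               # the same commands, one after another
sequential_elapsed=$SECONDS    # about 4

echo "pipeline: ${concurrent_elapsed}s, sequential: ${sequential_elapsed}s"
```

The semicolon version waits for the first sleep before starting the second; the pipe version starts both at once.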

Two critical implications:

Each pipe stage is a separate process — its own PID, its own copy of the environment, its own working directory. Variables set inside one stage cannot be read by another stage of the same pipeline.
Each pipe stage may also be a separate subshell, depending on the shell. In bash by default, every stage runs in a subshell, including the last one. This is the cause of the L4 “while-read counter” bug.

The lastpipe shell option (covered in L4) changes the bash policy so the last stage runs in the current shell. It’s bash-only (4.2 and later), disabled by default, and only takes effect when job control is off, as it is in scripts; relying on it is non-portable.
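Here is a small sketch of the difference lastpipe makes, assuming bash 4.2 or later run as a script (so job control is off). The variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Default: read is the last stage, runs in a subshell, and its variables vanish.
first="" second=""
echo "hello world" | read -r first second
default_first=$first                 # still empty

# lastpipe: the last stage runs in the current shell (job control must be
# off, which is the normal state inside a script).
shopt -s lastpipe
echo "hello world" | read -r first second
lastpipe_first=$first                # hello

echo "default: '${default_first}', lastpipe: '${lastpipe_first}', second: '${second}'"
```

If the script might run under another shell or an older bash, feeding the loop or read from a process substitution or here-string avoids the dependency altogether.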

2. The default exit-code rule and why it’s a trap

Bash’s default exit-code rule for a pipeline is: the exit code is the exit code of the last command.

false | true
echo $?              # 0 — because true (the last command) succeeded

Read that again. The pipeline false | true returns success — even though false clearly failed. The reason: the last command (true) succeeded, and that’s what bash reports.

This is the trap. Any pipeline whose final stage almost always exits zero (sort, head, awk, tee) silently swallows earlier failures:

curl https://api.example.invalid | jq '.users[0].name'
echo $?              # 0 — even if curl couldn't resolve the host!
                     # Because jq successfully reported "null" or the input was empty

This pattern hides real production failures. You think your pipeline succeeded; it didn’t. The downstream system (a cron job, a deployment, a CI gate) sees zero and proceeds. Bad data, partial deployments, missed alerts — all because the wrong stage’s exit code became the pipeline’s exit code.

3. `set -o pipefail` — the fix

Add this to your strict-mode preamble (we already have it in the L2 strict-mode template):

set -o pipefail

With pipefail, the pipeline’s exit code is the exit code of the rightmost command that failed (or zero if all succeeded).

set -o pipefail
false | true
echo $?              # 1 — false's exit code propagates

curl https://api.example.invalid | jq .
echo $?              # 6 — curl's "couldn't resolve host"

This is essential. Every script you write past 20 lines should have set -o pipefail. Without it, your pipelines lie to you about success.

There’s a subtlety: pipefail returns the rightmost failure, not the leftmost. If both curl and jq fail, you get jq’s exit code, which often hides the more interesting failure (the network issue). But this is still vastly better than the default of always-zero.
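The rightmost-failure rule is easy to check without touching the network. In this sketch, fail_with is a hypothetical helper that stands in for a stage exiting with a given code; the `|| rc=$?` form captures the status without tripping set -e if you have it on.

```shell
#!/usr/bin/env bash
# fail_with is a hypothetical helper: a stage that exits with the given code.
fail_with() { return "$1"; }

set +o pipefail
default_rc=0
fail_with 3 | fail_with 5 | true || default_rc=$?      # last stage wins: 0

set -o pipefail
rightmost_rc=0
fail_with 3 | fail_with 5 | true || rightmost_rc=$?    # rightmost failure: 5

single_rc=0
fail_with 3 | true | true || single_rc=$?              # the only failure: 3
set +o pipefail

echo "default=$default_rc rightmost=$rightmost_rc single=$single_rc"
```

Note how the 3 from the first stage is invisible in the second case: the 5 to its right masks it.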

For complete error inspection, use PIPESTATUS.

4. The `PIPESTATUS` array

Bash records the exit status of every stage of the most recent pipeline in a special array called PIPESTATUS:

false | true | false
echo "${PIPESTATUS[@]}"     # 1 0 1

Index 0 is the leftmost stage. PIPESTATUS is overwritten by every command, including echo, so snapshot it immediately and inspect the copy:

curl https://api.example.com/users | jq '.[]' | wc -l
PIPE_STATUSES=("${PIPESTATUS[@]}")     # snapshot before anything else runs
echo "curl exit: ${PIPE_STATUSES[0]}"
echo "jq exit: ${PIPE_STATUSES[1]}"
echo "wc exit: ${PIPE_STATUSES[2]}"
echo "Statuses: ${PIPE_STATUSES[*]}"

This is invaluable for debugging multi-stage pipelines or for retry logic that should fire only on specific stage failures.
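As a rough sketch of stage-specific retry logic: the loop below retries only when the first stage fails and gives up if the second one does. flaky_fetch is a made-up producer that fails on its first call, and the limit of three tries is an arbitrary choice.

```shell
#!/usr/bin/env bash
# flaky_fetch is a hypothetical producer: it fails on its first call and
# succeeds afterwards. The call count lives in a file because each
# pipeline stage runs in its own subshell.
calls_file=$(mktemp)
out_file=$(mktemp)
flaky_fetch() {
  echo call >> "$calls_file"
  if (( $(wc -l < "$calls_file") < 2 )); then
    return 7
  fi
  printf '%s\n' alpha beta gamma
}

tries=0
while (( tries < 3 )); do
  tries=$(( tries + 1 ))
  flaky_fetch | sort -r > "$out_file"
  statuses=("${PIPESTATUS[@]}")          # snapshot before anything else runs
  if (( statuses[0] == 0 && statuses[1] == 0 )); then
    break                                # every stage succeeded
  elif (( statuses[0] == 0 )); then
    echo "sort failed (${statuses[1]}); a retry would not help" >&2
    break
  fi
  echo "fetch failed (${statuses[0]}); retrying" >&2
done

result=$(<"$out_file")
rm -f -- "$calls_file" "$out_file"
echo "tries=$tries statuses=${statuses[*]}"
```

pipefail is left off here on purpose: the per-stage check does the work, and the pipeline's own status stays that of sort.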

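PIPESTATUS is bash-specific. POSIX sh has no per-stage equivalent; the closest you get is $? with pipefail (itself only standardised in POSIX.1-2024, though widely supported long before that), which gives you only the rightmost failure. Some shells (zsh) use pipestatus (lowercase) instead.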

Inspecting `PIPESTATUS` in conditions

set -o pipefail
curl -fsS https://api.example.com/users | jq -e '.[].id' > ids.txt

# After the pipeline
case "${PIPESTATUS[@]}" in
  "0 0")
    echo "All good"
    ;;
  *" 0")
    echo "curl failed but jq somehow succeeded — investigate"
    ;;
  "0 "*)
    echo "curl OK; jq failed (likely empty or malformed response)"
    ;;
  *)
    echo "Both failed"
    ;;
esac

Niche but powerful. For most cases pipefail plus a single $? check is enough.

5. SIGPIPE — when “failure” is intentional

Here’s a confusing scenario. With pipefail enabled:

set -o pipefail
yes | head -n 5
echo $?              # 141 here (a producer that finishes before head exits would give 0)

head reads 5 lines and closes its stdin. yes keeps writing forever, but its writes go to a closed pipe — at which point the kernel sends yes the signal SIGPIPE (signal 13). yes dies. With pipefail, the pipeline’s exit code becomes the exit code of yes, which (if it died from SIGPIPE) is 128 + 13 = 141.

This is a false positive. There’s nothing wrong — head deliberately stopped reading because it had what it needed. But pipefail reports failure. This is the most-cited downside of pipefail.
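The 128 + 13 arithmetic can be checked from bash itself, since the kill builtin translates between signal names, numbers, and exit statuses. A minimal sketch; the observed status assumes SIGPIPE has its default disposition in your environment.

```shell
#!/usr/bin/env bash
sigpipe_number=$(kill -l PIPE)             # signal name -> number (13 on Linux)
expected_status=$(( 128 + sigpipe_number ))
signal_name=$(kill -l "$expected_status")  # exit status -> signal name

set -o pipefail
rc=0
yes | head -n 5 > /dev/null || rc=$?       # normally equals expected_status
set +o pipefail

echo "SIGPIPE=$sigpipe_number expected=$expected_status ($signal_name) got=$rc"
```

kill -l with a status argument is handy in error messages too: it turns an opaque 141 into the word PIPE.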

The fixes:

Option A: ignore the specific SIGPIPE exit code in your error handling

set -euo pipefail
yes | head -n 5 || [[ $? == 141 ]]

Or wrap in a function:

ignore_sigpipe() {
  local rc=0
  "$@" || rc=$?          # capture the status without tripping set -e
  (( rc == 141 )) && return 0
  return "$rc"
}

ignore_sigpipe yes | head -n 5

Option B: use a tool that handles its own EOF gracefully

head -n 5 can be replaced with awk 'NR<=5', which reads to end-of-input gracefully. But this throws away head’s laziness: awk consumes the entire upstream output, which defeats the optimisation and never finishes at all with an endless producer such as yes.

Option C: turn off pipefail just for this pipeline

set +o pipefail
yes | head -n 5
set -o pipefail

Verbose; only worth it for one-off cases. In practice, most scripts ignore the SIGPIPE-with-pipefail issue because head is rarely fed by a command that you’d want to error-check anyway. If your upstream is cat or yes or seq, who cares if it dies. The cases where SIGPIPE matters are when the upstream might also legitimately fail (e.g. curl | head), and there you need explicit handling.
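For the curl | head kind of case, one possible shape for that explicit handling is to check PIPESTATUS yourself and forgive 141 only on upstream stages. statuses_ok below is a hypothetical helper, not a bash builtin, and seq stands in for the real producer.

```shell
#!/usr/bin/env bash
# statuses_ok is a hypothetical helper. It takes the per-stage exit codes
# and succeeds when every stage exited 0, except that 141 (SIGPIPE) is
# tolerated on any stage other than the last.
statuses_ok() {
  local last=$(( $# - 1 )) i=0 s
  for s in "$@"; do
    if (( s != 0 )) && ! (( s == 141 && i < last )); then
      return 1
    fi
    i=$(( i + 1 ))
  done
  return 0
}

# pipefail is deliberately off here; the helper does the checking instead.
seq 1 100000 | head -n 3 > /dev/null
if statuses_ok "${PIPESTATUS[@]}"; then
  verdict="ok"
else
  verdict="failed"
fi
echo "verdict: $verdict"
```

A producer that dies with any other code, say curl's 6, still fails the check, so real upstream errors are not forgiven along with the SIGPIPE.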

The `SIGPIPE` signal in your own scripts

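If your script writes to a closed pipe, your script also receives SIGPIPE. By default the signal kills the shell, and whoever launched it sees exit status 141. You can ignore SIGPIPE explicitly with trap '' PIPE (lesson 10); writes to the closed pipe then fail with an error instead of killing the script.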
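To make the effect of ignoring the signal concrete, here is a sketch that runs a pipeline inside a subshell with SIGPIPE ignored. It assumes a seq that reports write errors with a non-zero exit, as GNU seq does.

```shell
#!/usr/bin/env bash
# Run the pipeline in a subshell that ignores SIGPIPE; children inherit
# the ignored disposition, so seq sees a failed write (EPIPE) instead of
# being killed by the signal.
ignored_rc=0
(
  trap '' PIPE
  seq 1 100000 2> /dev/null | head -n 1 > /dev/null
  exit "${PIPESTATUS[0]}"          # report the producer's status
) || ignored_rc=$?

echo "producer status with SIGPIPE ignored: $ignored_rc"
```

The producer still fails, but with an ordinary error status rather than 141, so an "ignore 141" rule would no longer recognise it.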

6. Pipelines and `set -e`

Recall from L3 that set -e exits the shell on a command failure. With pipelines:

Without pipefail: only the last stage’s failure can trigger set -e (because that’s the pipeline’s exit code).
With pipefail: any stage’s failure can trigger set -e.

So set -e and pipefail are complementary; you want both. The standard preamble:

set -euo pipefail

is the right starting point.
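The two rules above can be seen side by side with a pair of throwaway child shells. This is only a demonstration harness; the word "reached" is an arbitrary marker.

```shell
#!/usr/bin/env bash
# Each variant runs in a child bash so its set -e cannot end this script.
without_out=$(bash -c 'set -e; false | true; echo reached') || true
with_rc=0
with_out=$(bash -c 'set -e -o pipefail; false | true; echo reached') || with_rc=$?

echo "without pipefail: '${without_out}'"
echo "with pipefail:    '${with_out}' (exit $with_rc)"
```

Without pipefail the child carries on past the failed stage and prints the marker; with it, the child stops at the pipeline and exits non-zero.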

A note on spelling: set -o errexit is just the long form of set -e. Bash also has set -E (errtrace), which makes the ERR trap inherited by functions, command substitutions, and subshells. Lesson 10 covers trap and errtrace.

7. The `|&` operator (bash 4+)

|& is shorthand for 2>&1 | — pipe both stdout and stderr.

make build |& tee build.log
# equivalent to: make build 2>&1 | tee build.log

Useful when you want to capture or filter both streams together. It’s bash 4+. POSIX-portable scripts should use 2>&1 |.

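Note: when both streams go down the same pipe, their lines may interleave in surprising ways because the producer buffers them separately. Most programs block-buffer stdout when it is not a terminal but leave stderr unbuffered, so error lines can arrive ahead of output that was written earlier. For deterministic ordering, the producer has to flush each stream as it writes, or you need to capture the streams separately and post-process them.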
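A tiny check of what each spelling carries down the pipe, using a made-up emit function that writes one line to each stream:

```shell
#!/usr/bin/env bash
# emit is a hypothetical command that writes one line to each stream.
emit() { echo "to stdout"; echo "to stderr" >&2; }

stdout_only=$(emit 2> /dev/null | wc -l)   # plain | carries stdout only
both_short=$(emit |& wc -l)                # |& carries both streams
both_posix=$(emit 2>&1 | wc -l)            # portable spelling, same result

echo "stdout only: $stdout_only, |&: $both_short, 2>&1 |: $both_posix"
```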

8. Multi-stage pipeline discipline

A pipeline of 3-4 stages is normal. A pipeline of 10 stages is a code smell — break it up. Each | is a process boundary, and at some point the cognitive load of “which stage filtered out the records I’m now missing?” exceeds the elegance of one-liner shell.

The right shape for a long pipeline:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

# Stage 1: collect raw data
RAW=$(curl -fsS https://api.example.com/users)

# Stage 2: extract fields
USERS=$(echo "$RAW" | jq -r '.users[] | "\(.id)\t\(.email)"')

# Stage 3: filter
ACTIVE=$(echo "$USERS" | awk -F'\t' '$2 ~ /@example\.com$/')

# Stage 4: count (printf '%s' adds no newline, so an empty $ACTIVE counts as 0, not 1)
COUNT=$(printf '%s' "$ACTIVE" | awk 'END { print NR }')

echo "Active users: $COUNT"

vs. the one-liner:

curl -fsS ... | jq -r ... | awk ... | wc -l

The intermediate-variables form is slower (each stage forks $(...) and re-parses) but vastly easier to debug. You can echo "$RAW" | head between stages to see what changed. For production scripts where correctness matters more than speed, prefer the explicit form.

For really long, performance-sensitive pipelines, write the data to a file at each major step and pick up from there:

curl ... > /tmp/raw.json
jq ... < /tmp/raw.json > /tmp/users.tsv
awk ... < /tmp/users.tsv > /tmp/active.tsv
wc -l < /tmp/active.tsv

This is the shape of an ETL job. Each step is restartable, debuggable, observable.
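One way to get the "restartable" part is to skip any step whose output file already exists. The sketch below is an assumption-laden illustration: run_step, fetch_raw, and filter_active are invented names, printf stands in for curl, and the files live in a private mktemp -d directory rather than under fixed /tmp names.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)                 # private directory instead of fixed /tmp names
trap 'rm -rf -- "$workdir"' EXIT

# run_step is a hypothetical helper: run a command with stdout sent to
# OUTPUT, unless OUTPUT already exists from an earlier run.
run_step() {
  local out=$1
  shift
  if [[ -s $out ]]; then
    echo "skip: ${out##*/} already exists" >&2
    return 0
  fi
  "$@" > "$out.partial"              # a failed step never leaves a complete-looking file
  mv -- "$out.partial" "$out"
}

# Stand-ins for the real fetch and filter steps.
fetch_raw()     { printf '%s\t%s\n' 1 a@example.com 2 b@other.org 3 c@example.com; }
filter_active() { awk -F'\t' '$2 ~ /@example\.com$/' "$workdir/raw.tsv"; }

run_step "$workdir/raw.tsv"    fetch_raw
run_step "$workdir/active.tsv" filter_active
run_step "$workdir/raw.tsv"    fetch_raw       # second call is skipped

active_count=$(wc -l < "$workdir/active.tsv")
echo "active users: $active_count"
```

In a real job you would keep the working directory between runs instead of deleting it in the trap; that is what makes a rerun pick up where the last one stopped.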

9. Common pipeline antipatterns

`ls | grep`

Don’t. We covered this in L4 — never parse ls output. Use a glob, find, or find -print0 | xargs -0.

ls /etc | grep -v conf       # WRONG
find /etc -mindepth 1 -maxdepth 1 ! -name '*conf*'   # RIGHT (same filter: names not containing "conf")
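The glob route looks like this. It is a sketch against a throwaway directory with invented file names, one of which contains a space to show that nothing gets split.

```shell
#!/usr/bin/env bash
shopt -s nullglob                    # an empty directory yields zero iterations

# Throwaway directory with sample names, including one with a space.
dir=$(mktemp -d)
touch "$dir/app.conf" "$dir/notes.txt" "$dir/my report.log"

kept=()
for path in "$dir"/*; do
  name=${path##*/}
  [[ $name == *conf* ]] && continue  # same filter as grep -v conf
  kept+=("$name")
done

printf 'kept: %s\n' "${kept[@]}"
rm -rf -- "$dir"
```

Like ls, the plain * glob skips dotfiles, whereas find includes them; pick whichever matches what you actually want listed.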

`cat | grep`

Useless use of cat. cat file | grep pattern is the same as grep pattern file but with one more fork. The latter is preferred:

cat file.log | grep ERROR    # WRONG
grep ERROR file.log          # RIGHT
< file.log grep ERROR        # also correct, sometimes preferred for visual flow

The < file.log grep ERROR form puts the data source first, reading more naturally as “from this file, run grep.” Useful taste.

`grep ... | wc -l`

grep -c does this without forking wc:

grep ERROR file.log | wc -l   # WORKS but unnecessarily forks
grep -c ERROR file.log        # same count, one process
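One thing to watch when you switch to grep -c under strict mode: grep exits 1 when nothing matches, even though it has printed a perfectly good 0. A short sketch with a made-up log file:

```shell
#!/usr/bin/env bash
set -euo pipefail

log=$(mktemp)
printf '%s\n' 'ERROR disk full' 'INFO all good' 'ERROR timeout' > "$log"

error_count=$(grep -c ERROR "$log")            # 2 matching lines, exit status 0

# No FATAL lines: grep still prints 0 but exits 1, which would stop a
# set -e script at this assignment without the "|| true".
fatal_count=$(grep -c FATAL "$log" || true)

rm -f -- "$log"
echo "errors=$error_count fatals=$fatal_count"
```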

`awk | sed`

Anything sed can do, awk can do. If you’re already piping into awk, finish in awk:

awk '{print $1}' file | sed 's/old/new/'    # one fork too many
awk '{ sub(/old/, "new", $1); print $1 }' file   # same result: sub, like sed's s///, replaces the first match only

Lesson 12 covers awk mastery in depth.

`cmd | grep -v ^$`

Filtering blank lines is a sed idiom (sed '/^$/d') or just grep . (matches “any non-empty line”):

cmd | grep -v '^$'           # fine
cmd | grep .                 # shorter
cmd | sed '/^$/d'            # also fine

Subshell pipe-into-while loop

Already covered in L4. cmd | while read line; do COUNT=...; done runs the loop in a subshell, so the COUNT you set there is lost when the loop ends. Use while IFS= read -r line; do ...; done < <(cmd) instead.
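Side by side, with list_items as a stand-in for any command that prints lines:

```shell
#!/usr/bin/env bash
# list_items is a stand-in for any command that prints lines.
list_items() { printf '%s\n' 'first item' 'second item' 'third item'; }

piped_count=0
list_items | while IFS= read -r line; do
  piped_count=$(( piped_count + 1 ))     # updates a copy inside the subshell
done

fed_count=0
while IFS= read -r line; do
  fed_count=$(( fed_count + 1 ))         # loop runs in the current shell
done < <(list_items)

echo "piped: $piped_count, process substitution: $fed_count"
```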

10. Pipeline performance and parallelism

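Pipes are the simplest form of parallelism in shell: each stage runs in its own process, and the kernel can schedule those processes on different cores. For CPU-bound work this can give you a 2-4x speedup if the stages are roughly balanced; the ceiling is the number of stages, and an unbalanced pipeline runs at the speed of its slowest stage.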

For more aggressive parallelism, lesson 14 covers xargs -P and GNU parallel. Quick preview:

# Compress every .log file under /var/log, up to 4 at a time
find /var/log -name '*.log' -print0 | xargs -0 -P 4 -I {} gzip {}

xargs -P 4 runs up to 4 instances of the command concurrently, handing each input item (a NUL-delimited filename here) to whichever instance starts next. The pipe is to feed input; the parallelism is in xargs.
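If you want to try this without touching /var/log, here is the same idea against a throwaway directory. The file names and the count of six are invented for the demo; -n 1 is used in place of -I {} and likewise gives each gzip a single file.

```shell
#!/usr/bin/env bash
# Throwaway directory with six small log files (names contain spaces on purpose).
dir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  printf 'line %s\n' "$i" > "$dir/app $i.log"
done

# -print0 / -0 pass names NUL-delimited; -n 1 gives each gzip one file;
# -P 4 allows up to four gzip processes at once.
find "$dir" -name '*.log' -print0 | xargs -0 -n 1 -P 4 gzip

gz_count=$(find "$dir" -name '*.log.gz' | wc -l)
log_count=$(find "$dir" -name '*.log' | wc -l)
echo "compressed: $gz_count, uncompressed left: $log_count"
rm -rf -- "$dir"
```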

When pipelines hurt

Tiny inputs: a 2-stage pipeline on 5 lines is slower than two function calls because of the fork overhead.
State that needs to survive: anything you compute in stage N is invisible to stage N+1 except via the pipe. If you need to share state, don’t use a pipe.
Random access: pipes are strictly forward-streaming. If stage N+1 needs to look at line 47 after seeing line 50, write to a file instead.

When pipelines win

Large inputs: streaming through a pipe is much faster than buffering everything to a file and re-reading.
Composable steps: each step can be tested independently.
Back-pressure: the kernel handles producer/consumer rate-matching automatically.

The mental rule: pipes for streams (data passing through stages once), files for state (data being mutated and re-read).

11. The `tee` family of pipeline observability tools

Once you have multi-stage pipelines, you need observability. The tee command from L7 is the basic tool; combine with process substitution for more.

# Capture intermediate stage output for debugging
curl -fsS https://api.example.com/users \
  | tee /tmp/raw.json \
  | jq -r '.users[].email' \
  | tee /tmp/emails.txt \
  | awk '/@example\.com$/' \
  | wc -l

After running, /tmp/raw.json has the original API response, /tmp/emails.txt has the extracted emails, and the terminal shows the count. You can re-run from any stage by feeding /tmp/... into the next stage manually.
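The same snapshot-and-resume idea in a form you can run offline. gen_emails is a stand-in for the curl and jq stages, and the snapshot file names are arbitrary.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)

# gen_emails is a stand-in for the curl | jq stages.
gen_emails() { printf '%s\n' a@example.com b@other.org c@example.com; }

count=$(gen_emails \
  | tee "$dir/emails.txt" \
  | grep '@example\.com$' \
  | tee "$dir/matched.txt" \
  | wc -l)

# Re-run only the final stage from its snapshot, without the producer.
recount=$(wc -l < "$dir/matched.txt")
snapshot_lines=$(wc -l < "$dir/emails.txt")

echo "count=$count recount=$recount snapshot=$snapshot_lines"
rm -rf -- "$dir"
```

Comparing the line counts of consecutive snapshots is usually the fastest way to find the stage that dropped the records you are missing.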

pv (Pipe Viewer) is another observability tool — it shows progress and throughput:

pv huge-file.jsonl | jq -r '.id' | sort -u > unique-ids.txt

pv reports throughput, ETA, and a progress bar. Brilliant for long-running pipelines on large files.

12. Real example: ingest, transform, validate

#!/usr/bin/env bash
# ingest.sh — pull users from an API, validate, store
set -euo pipefail
IFS=$'\n\t'

API_URL="${API_URL:?API_URL required}"
OUT_FILE="${OUT_FILE:-users.tsv}"

# Stage 1 + 2: fetch the response and save it raw for debugging
RAW_TMP=$(mktemp)
trap 'rm -f -- "$RAW_TMP"' EXIT

curl -fsS "$API_URL/users" > "$RAW_TMP"

# Stage 3: extract structured fields
mapfile -t USERS < <(jq -r '.users[] | "\(.id)\t\(.email)\t\(.role)"' < "$RAW_TMP")

# Sanity check the row count
EXPECTED_COUNT=$(jq -r '.users | length' < "$RAW_TMP")
ACTUAL_COUNT="${#USERS[@]}"
if (( ACTUAL_COUNT != EXPECTED_COUNT )); then
  echo "ERROR: extracted $ACTUAL_COUNT users but API said $EXPECTED_COUNT" >&2
  exit 3
fi

# Stage 4: validate each row
INVALID=0
for row in "${USERS[@]}"; do
  IFS=$'\t' read -r id email role <<< "$row"
  if [[ -z "$id" || -z "$email" || ! "$email" =~ @ ]]; then
    echo "WARN: bad row: $row" >&2
    INVALID=$(( INVALID + 1 ))   # (( INVALID++ )) returns status 1 when INVALID is 0, which trips set -e
  fi
done

if (( INVALID > 0 )); then
  echo "ERROR: $INVALID invalid rows out of $ACTUAL_COUNT" >&2
  exit 4
fi

# Stage 5: write output atomically (tempfile beside the target, so mv is a same-filesystem rename)
TMP_OUT=$(mktemp "${OUT_FILE}.XXXXXX")
printf '%s\n' "${USERS[@]}" > "$TMP_OUT"
mv -- "$TMP_OUT" "$OUT_FILE"

echo "Wrote $ACTUAL_COUNT users to $OUT_FILE"

# Stage 6: explicit success exit (every failure path above exits non-zero on its own)
exit 0

Things to notice:

Strict-mode + pipefail.
Each “stage” is an explicit step with intermediate state in a tempfile or array.
trap (lesson 10) cleans up the tempfile on exit.
Validation: extracted count is compared to API-reported count to catch silent jq filter bugs.
Atomic output: write to a tempfile, then mv into place. mv on the same filesystem is atomic at the kernel level — readers either see the old file or the new file, never a partial write.
Distinct exit codes per failure mode.
Observability: the API response is written to RAW_TMP before any parsing, so every jq step reads the same saved bytes. Skip the cleanup trap while debugging and you can re-run jq against that file without calling the API again.

This is the production shape. Long pipelines should not be written as one-liners. Break them into testable, restartable, observable steps.
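The atomic-write step is worth isolating, because it only holds when the tempfile and the destination share a filesystem. A minimal sketch, with a temporary directory standing in for the real output location and made-up rows:

```shell
#!/usr/bin/env bash
set -euo pipefail

target_dir=$(mktemp -d)                      # stand-in for the real output directory
out_file="$target_dir/users.tsv"
printf 'stale row\n' > "$out_file"           # pretend an older version exists

# Template next to the destination, so the tempfile shares its filesystem
# and mv becomes a single rename.
tmp_out=$(mktemp "$out_file.XXXXXX")
trap 'rm -f -- "$tmp_out"' EXIT              # no stray tempfile if a step fails
printf '%s\t%s\n' 1 a@example.com 2 b@example.com > "$tmp_out"
mv -- "$tmp_out" "$out_file"

row_count=$(wc -l < "$out_file")
leftovers=$(find "$target_dir" -name 'users.tsv.*' | wc -l)
echo "rows=$row_count leftovers=$leftovers"
```

A bare mktemp usually lands in /tmp, which is often a different filesystem from the output directory; mv then falls back to copy-and-delete, and readers can catch the file half-written.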

13. The pipefail cheat-sheet

set -o pipefail            # essential — don't let last-stage success hide upstream failures
"${PIPESTATUS[@]}"         # stage-by-stage exit codes (bash only)
PIPE=("${PIPESTATUS[@]}")  # capture before the next command resets it

cmd1 | cmd2 | cmd3 || echo "Pipeline failed: ${PIPESTATUS[*]}"

# Ignore SIGPIPE
yes | head -n 5 || [[ $? == 141 ]]

# Pipe both stdout and stderr (bash 4+)
make build |& tee build.log

# POSIX equivalent
make build 2>&1 | tee build.log

# Tee to multiple destinations
cmd | tee >(gzip > out.gz) >(grep ERROR > errors.txt) > /dev/null

# Useless-use-of-cat avoidance
< file.txt grep ERROR     # right
grep ERROR file.txt       # also right
cat file.txt | grep ERROR # WRONG

14. What you must internalise before lesson 9

What does A | B actually do at the kernel level? (pipe() + 2 forks; A’s stdout = B’s stdin.)
Why does each pipe stage run in its own process? (Each stage is a separate exec’d binary or subshell — that’s how concurrency works.)
What’s the default pipeline exit-code rule? (Exit code of the last stage only.)
What does set -o pipefail change? (Pipeline exit code = rightmost failing stage’s code, or zero if all OK.)
What’s PIPESTATUS? (Bash array of every stage’s exit code; reset by next command.)
Why does yes | head -n 5 sometimes give exit code 141? (head closes stdin → yes gets SIGPIPE → dies → exit 141 = 128 + 13.)
What’s the bash shorthand for piping both stdout and stderr? (|&, equivalent to 2>&1 |.)
When should you NOT use a long one-liner pipeline? (When debuggability matters; when steps need state; when input is small.)
What’s the right shape for ETL-style shell? (Explicit stages, intermediate files, tempfile + atomic mv.)
What’s “useless use of cat”? (cat file | cmd instead of cmd file or < file cmd.)

If any felt fuzzy, re-read. Lesson 9 covers process management — subshells, command groups, jobs, wait, nohup — the building blocks for lesson 10’s signal-handling discussion.

What’s next

Lesson 9 covers process management: subshells (...), command groups {...; }, background &, jobs, fg/bg, wait, nohup, disown, and the precise lifecycle of a backgrounded process. Bring everything from lessons 1–8 — every backgrounded job is a process-tree decision.

Written by Vinod
