Shell scripts are slow. That’s the headline. The interesting question is how slow, where the time goes, and when it crosses the threshold where rewriting in a different language is justified.
Most operators reach for shell because it’s familiar and “fast enough.” That’s right 95% of the time. The remaining 5% — tight loops, line-by-line processing of big files, scripts called per-request from a web server — is where shell ceilings get hit hard, and where the difference between “naïve shell” and “tuned shell” can be 100x.
This lesson is the quantitative answer to “why is my script slow?” and “should I leave shell?”:
- Profiling: how to find where time is going (it’s almost always fork/exec, but you should measure).
- The fork/exec ceiling: every external command costs ~1ms. With 10,000 invocations, that’s 10 seconds before you’ve done any work.
- Builtins vs externals: which commands run without a fork; when [[ ]] beats [ ]; when printf -v beats a string-building loop; when read beats head -n 1.
- Anti-patterns by perf cost: the 5 patterns you’ll find in any slow shell script.
- Empirical thresholds: at what point does it pay to switch to awk, perl, python, or go?
- A real example: profiling a 30-second script down to about 1.2 seconds, then rewriting it in awk for under 0.2 seconds.
By the end, you’ll know how to measure, how to optimize, and — most importantly — when to stop optimizing shell and write something else.
1. The fork/exec ceiling — the most important number to internalize
Every external command (grep, sed, awk, cut, wc, even cat) costs a fork() and an exec(). On modern Linux, that’s roughly:
- ~1ms per fork+exec in a fresh container.
- ~0.5ms per fork+exec on bare metal, warm cache.
That doesn’t sound like much. But:
# 10,000 invocations of /bin/true (does nothing):
$ time bash -c 'for i in {1..10000}; do /bin/true; done'
real 0m6.2s
6 seconds doing literally nothing. That’s the floor. Any script with a tight loop that calls externals will hit this.
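To see the gap on your own machine, here is a minimal sketch that times N back-to-back calls of a command and prints the average cost per call. It assumes bash 5.0+ for $EPOCHREALTIME; per_call_us is a hypothetical helper name, and type -P is used to get the path of the external true binary instead of the builtin.

```shell
#!/usr/bin/env bash
# Average cost of one call to a command, in microseconds.
# Assumes bash 5.0+ ($EPOCHREALTIME). per_call_us is a hypothetical helper.
per_call_us() {
  local n=$1 start end i
  shift
  start=${EPOCHREALTIME//[!0-9]/}   # strip the decimal point: microseconds
  for ((i = 0; i < n; i++)); do
    "$@"
  done
  end=${EPOCHREALTIME//[!0-9]/}
  echo $(( (end - start) / n ))
}

ext_true=$(type -P true)            # external binary, e.g. /usr/bin/true
echo "builtin true: $(per_call_us 1000 true) us/call"
echo "$ext_true: $(per_call_us 1000 "$ext_true") us/call"
```

The external figure should come out far larger than the builtin one; the exact numbers depend on the machine, kernel, and whether you are in a container.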
1.1 The classic anti-pattern
Reading lines and pulling one field per line:
# BAD — forks `cut` once per line:
while IFS= read -r line; do
field=$(echo "$line" | cut -d, -f2)
process "$field"
done < big-file.csv
For a 100,000-line file, this is 100,000 × (echo + cut) ≈ 100,000 × 1ms ≈ 100 seconds.
The same logic, no fork:
# GOOD: read splits on commas itself, no fork:
while IFS=, read -r _ field _; do
process "$field"
done < big-file.csv
For 100,000 lines: ~1 second. 100x speedup, just by removing one cut call per line.
1.2 The “use awk” version
For pure data processing, awk reads the whole file in one process:
awk -F, '{print $2}' big-file.csv | while IFS= read -r field; do
process "$field"
done
awk parses the file once. The shell loop only does what shell can’t avoid. For most “process a CSV” tasks, awk is 50–100x faster than shell-only.
Or even better: do the processing in awk:
# Replace toupper($2) with your own per-field logic:
awk -F, '{ print toupper($2) }' big-file.csv
If you can express the work entirely in awk, you avoid the shell entirely for the inner loop.
2. Profiling a shell script — finding where time goes
Before optimizing, measure. Three tools, increasing in detail.
2.1 time — the wall-clock baseline
$ time ./myscript.sh
real 0m4.532s
user 0m1.230s
sys 0m3.100s
- real: wall-clock time.
- user: CPU time spent in user space.
- sys: CPU time spent in the kernel (this is where fork/exec time accumulates).
If sys is more than half of user+sys, fork/exec is the most likely bottleneck, and the fix is reducing external command calls.
2.2 set -x with timestamped trace
bash’s xtrace (set -x) prints every command. Add timestamps via PS4 to get a per-line timing log:
#!/usr/bin/env bash
PS4='+ $(date "+%s.%N")\011'
exec 3>>/tmp/trace.log
BASH_XTRACEFD=3
set -x
# Your script body...
Now /tmp/trace.log has lines like:
+ 1710081234.523000000 for i in {1..10000}
+ 1710081234.524000000 for i in {1..10000}
+ 1710081234.525000000 echo 1 | wc -c
+ 1710081234.527000000 echo 2 | wc -c
...
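Each line shows when the command started, so the gap between two consecutive timestamps is the cost of the earlier line. (The $(date ...) in PS4 itself forks once per traced line, which inflates every gap; on bash 5.0+, ${EPOCHREALTIME} gives the same timestamp without a fork.) To find the 10 slowest lines, compute those gaps and sort on them:
awk 'NR > 1 { printf "%.6f %s\n", $2 - prev, line } { prev = $2; line = $0 }' /tmp/trace.log | sort -nr | head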
BASH_XTRACEFD=3 keeps the trace out of stdout/stderr, so it doesn’t pollute your script’s normal output.
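A self-contained sketch of the whole loop (trace, then rank by gap) follows. It assumes bash 5.0+ for $EPOCHREALTIME and a C locale for the decimal point; slowest_lines is a hypothetical helper name, and the traced body is a toy with a deliberately slow sleep.

```shell
#!/usr/bin/env bash
# Per-line timing from xtrace, assuming bash 5.0+ for $EPOCHREALTIME.
# Reading a variable in PS4 costs no fork, unlike $(date ...).
export LC_ALL=C                 # keep '.' as the decimal point in timestamps
trace_log=$(mktemp)

(                               # subshell: tracing settings end with it
  exec 3>"$trace_log"           # trace goes to fd 3, not stdout/stderr
  BASH_XTRACEFD=3
  PS4='+ ${EPOCHREALTIME} '
  set -x
  total=0
  for i in 1 2 3; do
    total=$((total + i))        # builtin arithmetic: microseconds
    sleep 0.05                  # external command that really takes time
  done
)

# Each traced command "owns" the time until the next trace line starts, so
# print that gap in front of it and sort slowest first. The final command
# has no successor, so it is not ranked.
slowest_lines() {
  awk 'NR > 1 { printf "%.6f %s\n", $2 - prev_ts, prev_line }
       { prev_ts = $2; prev_line = $0 }' "$1" | sort -rn | head -n "${2:-10}"
}

report=$(slowest_lines "$trace_log" 3)
printf '%s\n' "$report"
rm -f "$trace_log"
```

The sleep lines should dominate the report, with the arithmetic and loop-header lines far behind.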
2.3 Bash’s time builtin — per-pipeline timing
time some_function arg1 arg2
time grep foo file | sort | uniq
Here time (strictly a bash reserved word rather than a builtin, and not /usr/bin/time) measures one command or a whole pipeline. For systematic profiling, wrap functions:
profile() {
local label=$1; shift
local start end
start=$(date +%s.%N)
"$@"
end=$(date +%s.%N)
printf '[PROFILE] %s: %.3fs\n' "$label" "$(awk "BEGIN{print $end - $start}")" >&2
}
profile "load_config" load_config
profile "process_data" process_data file.csv
profile "write_output" write_output result.txt
Output:
[PROFILE] load_config: 0.012s
[PROFILE] process_data: 4.231s
[PROFILE] write_output: 0.045s
Now you know process_data is 99% of runtime — focus optimization there.
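The profile helper above starts three processes per call (two date, one awk), which is fine for coarse function-level timing. A fork-free variant, assuming bash 5.0+ for $EPOCHREALTIME, might look like this; it also passes the wrapped command’s exit status through, which the original does not.

```shell
#!/usr/bin/env bash
# Fork-free variant of profile(), assuming bash 5.0+ ($EPOCHREALTIME).
# Timing starts no processes, and the command's exit status is preserved.
profile() {
  local label=$1 start end ms rc
  shift
  start=${EPOCHREALTIME//[!0-9]/}   # microseconds, decimal point removed
  "$@"
  rc=$?
  end=${EPOCHREALTIME//[!0-9]/}
  ms=$(( (end - start) / 1000 ))
  printf '[PROFILE] %s: %d.%03ds\n' "$label" $((ms / 1000)) $((ms % 1000)) >&2
  return "$rc"
}

profile "sleep_demo" sleep 0.2      # reports roughly 0.2s on stderr
```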
2.4 perf for system-level insight
For deep profiling on Linux:
sudo perf stat -e task-clock,context-switches,page-faults,sched:sched_process_fork ./myscript.sh
The default perf stat output includes context-switches and page-faults but no fork count; requesting the sched:sched_process_fork tracepoint (which needs root) adds one:
Performance counter stats for './myscript.sh':
4,532.10 msec task-clock # 0.998 CPUs utilized
12,453 context-switches # 2.749 K/sec
8,124 page-faults # 1.793 K/sec
9,872 sched:sched_process_fork # 2.179 K/sec
That sched_process_fork line is the one to watch. 9,872 forks in 4.5 seconds confirms fork/exec dominates. Every fork is a process creation; for a script that “should just compute things,” that’s the smoking gun.
2.5 Is it stuck?
For a script that seems to hang, attach strace to see where it’s blocked:
strace -p "$(pgrep -f myscript.sh)" -tt -f 2>&1 | head -50
You’ll see syscalls in real-time. Common findings:
- Stuck on read(): waiting for input that never comes.
- Stuck on connect(): a network call without a timeout.
- Stuck on wait4(): waiting for a child process that’s hung.
strace is invaluable for “the script doesn’t crash, it just doesn’t progress.”
3. Builtins vs externals — when to use which
bash has dozens of builtins (commands implemented inside the shell, no fork). They’re 10–100x faster than the equivalent external. Knowing which is a builtin is operational knowledge.
3.1 Common builtins — these are FAST
# Builtins and keywords (no fork):
echo, printf, read, [[, [, test, type, declare, local, unset
shift, set, break, continue, return, exit
true, false, :
pwd, cd, pushd, popd
let, ((, eval, source, .
trap, kill (the builtin), wait
type cmd tells you what cmd is:
$ type printf
printf is a shell builtin
$ type sed
sed is /usr/bin/sed
If type says “shell builtin” or “shell keyword,” it costs no fork. If it prints a path, every call pays the ~1ms fork+exec.
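To audit several names at once, type -t prints a single word per name (builtin, keyword, file, function, or alias). A small sketch, with classify as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Report how bash will run each name, via the type builtin's -t flag.
# classify is a hypothetical helper, meant for one-off audits.
classify() {
  local name kind
  for name in "$@"; do
    kind=$(type -t "$name") || kind="not found"   # $(...) forks; fine outside a hot loop
    printf '%-10s %s\n' "$name" "$kind"
  done
}

classify printf '[[' '[' test sed basename
```

On a typical Linux system, [[ is reported as keyword, printf, [ and test as builtin, and sed and basename as file.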
3.2 The deceptive ones — [ ] is sometimes a builtin
Historically, [ ] was an external (/bin/[). In bash, it’s a builtin. So [ -f file ] is fast in bash. But on minimal POSIX shells, [ may actually fork.
[[ ]] is a bash keyword, so it never forks. It’s faster than [ ] even though both avoid a fork, because [[ ]] is a special parser construct (no word-splitting, no globbing on its operands).
For perf: [[ ]] > [ ] = test (in bash, [ and test are the same builtin).
3.3 The killer pattern: $(< file) is faster than $(cat file)
# Forks cat:
content=$(cat /etc/hostname)
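# Bash special form: no cat binary is run:
content=$(< /etc/hostname)
$(< file) is a bash special form that reads the file without exec’ing cat. The substitution itself may still run in a forked subshell, so it isn’t free, but skipping the exec removes a large part of the per-call cost. Loop over many files? Significant speedup.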
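When you need only the first line, or the lines as an array, the read and mapfile builtins with a plain redirection avoid the command substitution entirely. A sketch assuming bash 4+ (for mapfile), using a throwaway temp file as a stand-in for a real one:

```shell
#!/usr/bin/env bash
# Getting file contents into variables, assuming bash 4+ (for mapfile).
tmp=$(mktemp)
printf 'web-01\nsecond line\n' > "$tmp"   # stand-in for a real file

# First line only: read is a builtin with a plain redirection, so no
# subshell and no external process (a replacement for $(head -n 1 file)).
IFS= read -r first < "$tmp"

# Every line into an array: mapfile is a builtin too (-t strips newlines).
mapfile -t lines < "$tmp"

# Whole file as one string, trailing newline stripped: no cat is exec'd.
content=$(< "$tmp")

printf 'first=%s lines=%d\n' "$first" "${#lines[@]}"   # → first=web-01 lines=2
rm -f "$tmp"
```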
3.4 Common externals you can replace
| External | Builtin replacement | Speedup |
|---|---|---|
| cat file | $(<file) for small files | ~5x |
| wc -l file | mapfile arr < file; echo ${#arr[@]} | ~3x |
| cut -d, -f2 <<< "$line" | IFS=, read -r _ a _ <<< "$line" | ~10x |
| echo "$x" \| tr a-z A-Z | echo "${x^^}" | ~10x |
| expr 1 + 2 | $(( 1 + 2 )) | ~50x |
| sleep 0.1 | (no replacement; the sleep itself dwarfs the fork cost) | n/a |
| basename "$path" | ${path##*/} | ~10x |
| dirname "$path" | ${path%/*} | ~10x |
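basename and dirname as externals are surprisingly common, and surprisingly costly in tight loops. Replacing them with parameter expansion is a big win, as long as you watch the edge cases: ${path%/*} returns a path with no slash unchanged (dirname prints .), and both expansions disagree with the commands when the path ends in /.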
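If your paths can have trailing slashes or no slash at all, a little more parameter expansion handles those cases and still forks nothing. A sketch with hypothetical helper names base_name and dir_name, intended to match GNU coreutils output for the inputs shown (leading // is left aside):

```shell
#!/usr/bin/env bash
# Fork-free stand-ins for basename and dirname that also handle trailing
# slashes, slash-free names, and "/". Helper names are hypothetical.
base_name() {
  local p=$1
  p=${p%"${p##*[!/]}"}            # drop trailing slashes: "a/b//" -> "a/b"
  [[ -z $p && -n $1 ]] && p=/     # input was all slashes: result is "/"
  [[ $p == / ]] || p=${p##*/}     # keep the text after the last slash
  printf '%s\n' "$p"
}

dir_name() {
  local p=$1
  p=${p%"${p##*[!/]}"}            # drop trailing slashes
  if [[ -z $p ]]; then            # input was "" or all slashes
    if [[ -n $1 ]]; then printf '/\n'; else printf '.\n'; fi
    return
  fi
  if [[ $p != */* ]]; then        # no slash left, e.g. "file.txt"
    printf '.\n'
    return
  fi
  p=${p%/*}                       # drop the last component
  p=${p%"${p##*[!/]}"}            # and the slashes before it: "a//b" -> "a"
  printf '%s\n' "${p:-/}"         # "/etc" leaves "", which means "/"
}

base_name /usr/lib/     # → lib
dir_name /usr/lib/      # → /usr
dir_name file.txt       # → .
```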
3.5 The printf trick for repeated strings
Building a long string:
# Bad: one fork for seq, then 10,000 interpreted iterations that each copy the growing string:
result=""
for i in $(seq 1 10000); do
result="${result}:"
done
# Good — printf builtin, all in one call:
printf -v result '%.s:' {1..10000}
printf -v var writes to a variable instead of stdout — pure builtin, no fork. The %.s: format prints : for each argument while ignoring the value. For building filler strings or repeated patterns, this is the bash equivalent of Python’s ':' * 10000.
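Brace expansion needs a literal count. When the count is only known at run time, printf can still build the string without a fork: pad to N spaces with %*s, then substitute. A sketch with a hypothetical helper repeat_into, assuming bash 4.3+ for the nameref (local -n):

```shell
#!/usr/bin/env bash
# Repeat a string N times with no fork, where N is a run-time value.
# Hypothetical helper; assumes bash 4.3+ for the nameref (local -n).
repeat_into() {
  local -n _repeat_out=$1            # name of the caller's variable to fill
  local str=$2 count=$3 pad
  printf -v pad '%*s' "$count" ''    # exactly $count spaces
  _repeat_out=${pad// /"$str"}       # quoted so '&' stays literal in bash 5.2+
}

n=5
repeat_into bar '=' "$n"
echo "$bar"    # → =====
```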
4. Subshells — the silent fork
Subshells are written ( ... ) or $(cmd). Each one is a fork(). They’re cheap (~0.3ms vs ~1ms for fork+exec since no execve), but in tight loops they add up.
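One way to see which constructs fork is bash’s $BASHPID (bash 4+), which reports the PID of the current bash process; $$ keeps reporting the top-level script even inside a subshell. A short sketch:

```shell
#!/usr/bin/env bash
# $BASHPID changes whenever bash forks; { ...; } grouping does not fork.
echo "script:        $BASHPID"
( echo "( ) subshell:  $BASHPID" )          # new PID: fork
echo "\$( ) capture:  $(echo "$BASHPID")"   # new PID: fork
{ echo "{ } group:     $BASHPID"; }          # same PID: no fork
```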
4.1 Counting subshells in a script
# Each $() is a subshell:
total=0
while IFS= read -r line; do
parts=$(echo "$line" | awk -F, '{print NF}') # 1 subshell + 1 awk fork+exec per line
total=$((total + parts))
done < big.csv
100k lines × (subshell + awk fork+exec, ~1ms together) ≈ 100 seconds.
4.2 Eliminating subshells
# Same logic without subshells:
total=0
while IFS=, read -ra parts; do
total=$((total + ${#parts[@]}))
done < big.csv
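-a parts reads into an array; ${#parts[@]} is the length, all builtin. 100k lines now takes ~1s. One edge case: read drops a trailing empty field, so a,b, counts as 2 fields here, where awk’s NF reports 3.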
4.3 The “command substitution in a loop” giveaway
Anytime you see $( ... ) inside a while or for loop, that’s a fork-per-iteration. Pull it out of the loop or rewrite without it.
# Forks date 100k times:
for i in $(seq 1 100000); do
echo "$(date +%s) iteration $i"
done
# Forks date once:
NOW=$(date +%s)
for i in $(seq 1 100000); do
echo "$NOW iteration $i"
done
If the value can be cached, cache it.
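If each iteration genuinely needs the current time, bash can produce it without date. A sketch assuming bash 4.2+, whose printf supports %(fmt)T with a strftime format; the argument -1 means “now”, and %s is a strftime extension that glibc supports. Bash 5.0+ also offers $EPOCHSECONDS directly.

```shell
#!/usr/bin/env bash
# Current time without running date, assuming bash 4.2+ (%(fmt)T in printf).
for i in 1 2 3; do
  printf -v now '%(%s)T' -1       # epoch seconds into $now, no fork
  echo "$now iteration $i"
done
```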
4.4 The pipeline-in-loop pattern
# Each | is a fork. This is 4 processes per iteration:
for x in "$@"; do
echo "$x" | tr a-z A-Z | sed 's/.../.../' | head -c 10
done
# Move to awk: 1 process for the entire loop:
printf '%s\n' "$@" | awk '{
s = toupper($0)
sub(/.../, "...", s)
print substr(s, 1, 10)
}'
When you see ≥3 pipes in a tight loop, the answer is awk. awk is a small DSL specifically designed for the line-processing pattern. It’s 10–100x faster than the equivalent bash pipeline-in-loop.
5. The “should I rewrite this in another language?” decision
Sometimes shell isn’t the right tool. The threshold:
| If your script… | Consider rewriting in… |
|---|---|
| Reads >100k lines and does per-line logic | awk, then perl, then python |
| Uses associative arrays heavily | python, perl |
| Does HTTP calls in a loop with parsing | python (requests), go |
| Runs sub-second per request, called >10/s | go, python (warm process) |
| Implements a state machine | python, go |
| Manipulates JSON/YAML extensively | python (with pyyaml), jq for read-only |
| Does floating-point math | python, perl, awk (limited) |
| Talks to databases | python, go |
| Has more than 1000 lines | almost any other language |
Quick reference: shell is a glue language. It’s optimal for orchestration (call this command, check exit code, call the next), poor for computation (per-line transforms, math, parsing).
5.1 The benchmarks that justify the move
Same task: count distinct values in column 2 of a 1M-line CSV.
# Plain shell pipeline (cut, sort, wc; no awk):
cut -d, -f2 file.csv | sort -u | wc -l # ~5s
# awk (one process):
awk -F, '{++c[$2]} END{print length(c)}' file.csv # ~0.4s
# python:
python3 -c "
import csv
seen = set()
with open('file.csv') as f:
    for row in csv.reader(f):
        seen.add(row[1])
print(len(seen))
" # ~0.6s
# Go (compiled):
# (a 30-line program, runs in ~0.15s)
For one-off, manual analysis: shell with awk is fine. For a job that runs every 5 minutes processing growing CSVs: pay the cost to rewrite in Go. The 30x speedup over pure shell pays back in operational cost (CPU/IO) and reduced operational risk.
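For contrast, a version that really is builtins-only (no cut, sort, or wc) uses an associative array. It is not benchmarked here; because bash interprets the loop body once per line, expect it to trail the awk version. count_distinct_col2 is a hypothetical helper that, like cut, splits on every comma and ignores CSV quoting.

```shell
#!/usr/bin/env bash
# Builtins-only distinct count of column 2: no external processes.
# Hypothetical helper; like cut, it splits on every comma (no CSV quoting).
count_distinct_col2() {
  local -A seen=()
  local first field rest
  # "|| [[ -n $first ]]" still processes a last line that lacks a newline
  while IFS=, read -r first field rest || [[ -n $first ]]; do
    seen["k$field"]=1             # "k" prefix: an empty key is not allowed
  done < "$1"
  echo "${#seen[@]}"
}

tmp=$(mktemp)
printf 'a,x\nb,y\nc,x\nd,\n' > "$tmp"
count_distinct_col2 "$tmp"        # → 3 (x, y, and the empty field)
rm -f "$tmp"
```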
6. Patterns that are always wrong, perf-wise
6.1 cat file | grep ... — the useless cat
# Wrong: forks cat for no reason.
cat file.txt | grep foo
# Right:
grep foo file.txt
# OR, if you want the file on stdin:
grep foo < file.txt
This won’t change your hot path, but it indicates the author hasn’t measured. Once you start counting forks, this becomes obvious.
6.2 Multiple grep | grep | grep
# Wrong:
grep foo file.txt | grep bar | grep baz
# Right (one GNU grep; the -P lookaheads require all three patterns on the line):
grep -P '^(?=.*foo)(?=.*bar)(?=.*baz)' file.txt
# OR (single awk process, all conditions checked on each line):
awk '/foo/ && /bar/ && /baz/' file.txt
Each grep is a separate process reading the input. awk does one pass.
6.3 for i in $(cat file) — reads whole file then iterates
# Wrong: $(cat) loads whole file into a string, splits on whitespace, iterates.
for line in $(cat file.txt); do
process "$line"
done
# Right:
while IFS= read -r line; do
process "$line"
done < file.txt
The for in $(cat) form word-splits on IFS (whitespace), which breaks lines with spaces into separate words, and it glob-expands each word, so a line containing * can turn into a list of filenames. It also loads the entire file before iteration begins. The while read form streams one line at a time, preserves whitespace, and is more memory-efficient.
6.4 result=$(command); echo "$result"
# Wrong: captures output then re-emits it. Useless subshell.
result=$(curl -s "$URL")
echo "$result"
# Right (just let curl print directly):
curl -s "$URL"
If you need to use the result for something else, fine. If you’re just echoing it, the assignment is a wasted subshell.
6.5 seq for big ranges
# Wrong: forks seq, prints 1..10000 to stdout, shell tokenizes:
for i in $(seq 1 10000); do
echo "$i"
done
# Right (bash brace expansion, no fork):
for i in {1..10000}; do
echo "$i"
done
# Or C-style (no expansion, no extra memory):
for ((i=1; i<=10000; i++)); do
echo "$i"
done
Brace expansion {1..10000} is bash-only and creates the whole list in memory. C-style for is more memory-efficient for huge ranges. seq adds fork+exec.
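One more reason to know the C-style form: brace expansion happens before parameter expansion, so a variable bound is not turned into a range. A short demonstration:

```shell
#!/usr/bin/env bash
# Brace expansion runs before $n is expanded, so {1..$n} stays literal.
n=5
echo {1..$n}                          # → {1..5}
# C-style for evaluates n at run time, with no fork and no list in memory:
for ((i = 1; i <= n; i++)); do
  printf '%s ' "$i"
done
echo                                  # the loop printed 1 through 5 on one line
```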
7. Real-world example: optimizing a log-processing script
Let’s walk through optimizing a real (representative) script.
7.1 The original — 30 seconds
#!/usr/bin/env bash
# log-summary.sh — summarise a 100k-line nginx access log
# Original: takes ~30 seconds.
set -euo pipefail
LOG=$1
declare -A status_count
declare -A path_count
while IFS= read -r line; do
status=$(echo "$line" | awk '{print $9}')
path=$(echo "$line" | awk '{print $7}')
status_count[$status]=$((${status_count[$status]:-0} + 1))
path_count[$path]=$((${path_count[$path]:-0} + 1))
done < "$LOG"
echo "Status counts:"
for s in "${!status_count[@]}"; do
echo " $s: ${status_count[$s]}"
done
echo "Top 10 paths:"
for p in "${!path_count[@]}"; do
echo " $p: ${path_count[$p]}"
done | sort -k2 -nr | head -10
For a 100k-line file: 30 seconds.
7.2 Profiling
$ time ./log-summary.sh access.log
real 0m31.42s
user 0m18.20s
sys 0m12.85s
sys is 12.85s: that’s fork overhead. perf stat confirms 200k+ forks (at least two per line, one for each $(echo ... | awk ...) substitution, before counting the extra forks each pipeline makes).
7.3 First optimization — eliminate the per-line forks
Replace the echo | awk with read parsing fields directly:
while IFS=' ' read -r ip _ _ _ _ method path proto status _; do
status_count[$status]=$((${status_count[$status]:-0} + 1))
path_count[$path]=$((${path_count[$path]:-0} + 1))
done < "$LOG"
Note: nginx fields are space-separated; the _ placeholders skip the ones we don’t need. read -r is a builtin, no fork.
$ time ./log-summary.sh access.log
real 0m1.23s
user 0m1.10s
sys 0m0.10s
25x speedup by removing 200k forks. sys is now negligible.
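Before trusting the read-based parser, it is worth checking the field positions against one line. This sketch assumes nginx’s default “combined” log format; the log line is made up for illustration. Note that method keeps the leading double quote, which is harmless here because only path and status are used.

```shell
#!/usr/bin/env bash
# Check field positions on one made-up line in nginx's "combined" format.
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.0"'

IFS=' ' read -r ip _ _ _ _ method path proto status _ <<< "$line"
printf 'path=%s status=%s\n' "$path" "$status"    # → path=/index.html status=200

# awk numbers the same whitespace-separated fields as $7 and $9:
awk '{ print "path=" $7 " status=" $9 }' <<< "$line"
```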
7.4 Second optimization — let awk do everything
For pure aggregation, awk is the right tool:
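#!/usr/bin/env bash
# Needs GNU awk: PROCINFO["sorted_in"] below is a gawk extension; other awks
# ignore it and would print 10 arbitrary paths instead of the top 10.
LOG=$1
gawk '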
{ status_count[$9]++; path_count[$7]++ }
END {
print "Status counts:"
for (s in status_count) print " " s ": " status_count[s]
print "Top 10 paths:"
n = 0
PROCINFO["sorted_in"] = "@val_num_desc"
for (p in path_count) {
print " " p ": " path_count[p]
if (++n >= 10) break
}
}
' "$LOG"
$ time ./log-summary.sh access.log
real 0m0.18s
user 0m0.15s
sys 0m0.03s
170x speedup over original. Single process, single read of the file, all aggregation in awk’s hash tables.
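PROCINFO["sorted_in"] only works in gawk. If the script has to run under mawk or BSD awk, one portable option is to let awk count and hand the ranking to sort and head: three processes per run instead of one, but still no per-line forks. top_paths is a hypothetical helper, and the log lines in the demo are made up.

```shell
#!/usr/bin/env bash
# Portable top-N for any POSIX awk: awk counts, sort and head rank.
# top_paths is a hypothetical helper; $7 is the request path (combined format).
top_paths() {
  awk '{ count[$7]++ } END { for (p in count) print count[p], p }' "$1" |
    sort -rn | head -n "${2:-10}"
}

tmp=$(mktemp)
for p in /a /a /a /b /b /c; do    # made-up combined-format lines
  printf '198.51.100.1 - - [t +0000] "GET %s HTTP/1.1" 200 1\n' "$p"
done > "$tmp"
top_paths "$tmp" 2                # → "3 /a" then "2 /b"
rm -f "$tmp"
```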
7.5 Lessons from this exercise
- Profile first: don’t guess where time goes. time and perf told us fork was the issue.
- Builtins are 10–100x cheaper than externals: replacing echo | awk with read was a 25x speedup.
- The right tool wins: even tuned shell is 7x slower than awk for this task. awk is built for line-oriented aggregation; shell isn’t.
- Don’t optimize blindly: each optimization above took 5 minutes. We measured before and after each change. Without measurement, you can spend days on changes that don’t help.
8. Quick reference card
The “is this slow?” checklist
time ./script.sh # baseline
PS4='+ $(date "+%s.%N")\011' bash -x \
./script.sh 2>/tmp/trace.log # per-line timing
sudo perf stat -e sched:sched_process_fork ./script.sh # fork count
strace -p $PID -tt -f # if it's stuck
The “always do this” rules
- [[ ]] over [ ] in bash scripts.
- $(< file) instead of $(cat file).
- ${var^^} instead of tr a-z A-Z.
- ${path##*/} instead of basename "$path".
- $(( )) instead of expr or let.
- {1..10000} instead of $(seq 1 10000).
- read -ra instead of cut in a loop.
- awk instead of cmd | sed | grep | head chains.
The “rewrite in another language” thresholds
| Symptom | Action |
|---|---|
| Reads ≥100k lines per run | Move to awk |
| Has associative arrays nested ≥2 levels | Move to python |
| Does ≥10 HTTP calls per run | Move to python or go |
| Called >10/s in production | Move to go (compiled) |
| Has float math | Move to awk, python, perl |
The fork cost rule of thumb
1 fork = ~1ms
1000 forks = 1 second
100k forks = 100 seconds (visible)
1M forks = 17 minutes (production-killing)
The “where do I look for forks?” pattern
Anything inside a tight loop:
$( ... ) # subshell + maybe exec
| anything | ... # each pipe is a fork
[ ... ] # was external, now builtin (in bash)
echo "$x" | cmd # cat, echo, tr, sed in pipes — all forks
9. Wrap-up
Shell scripts are slow because every external command is a process. The fix is to:
- Measure first: time, xtrace with PS4, perf stat. Don’t guess.
- Reduce forks: replace externals with builtins where they exist ([[ ]], $(< ), ${var^^}, $(( ))).
- Eliminate per-iteration forks: move computation to awk, or pull invariants outside the loop.
- Know when to leave: if you’re doing computation-heavy work, especially nested data structures or per-request invocation, shell isn’t the right tool. awk for pure data; python for general; go for performance-critical.
The performance ceiling of a tuned shell script is roughly: ~1k operations/sec for fork-heavy code, ~100k operations/sec for builtin-only code. awk is ~1M operations/sec; go is ~10M+. Pick the level that matches your need.
Most importantly: the right tool is the one that solves the problem at the right speed without becoming a maintenance burden. A 100-line shell script that takes 30 seconds is fine if it runs nightly. The same script as a 1000-line shell mess that takes 2 seconds is worse than a 200-line python program that takes 1 second. Measure, optimize where it matters, rewrite when shell hits its ceiling.
Next: L25 — security. We’ll cover command injection, IFS attacks, quoting hardening, and input validation — the security side of “shell is just executing strings.”