We’ve climbed from echo to find … -print0 | xargs -0. So far the mental model has been “stream of bytes, lines, words.” That’s enough for 70% of shell tasks. The remaining 30% have structure: JSON from APIs, YAML from Kubernetes, CSV from finance teams, columnar output from ps/df/docker. For these, grep and sed are the wrong tool. You don’t want to regex JSON; you want a parser.
This final Tier 2 lesson covers four tools that fill that gap. By the end you will be able to:
- Use awk not just for “print column 2” but as a real programming language with associative arrays and multi-file processing.
- Use jq to slice, transform, and rewrite JSON in pipelines and scripts.
- Use yq (the Go version, by Mike Farah) to do the same for YAML, which is essential for Kubernetes manifests and Helm charts.
- Use csvkit to handle CSV files the way they actually exist in the wild (quoted fields, embedded commas, type inference, SQL queries).
- Avoid the locale and UTF-8 pitfalls that cause sort to “fail” on German names and grep to be 10x slower than it should be.
This is the closer of Wave 1. After this, you have a complete shell-fundamentals foundation. Wave 2 builds advanced patterns — error frameworks, package managers, CLI design, secrets — on top of this.
1. awk — the data-processing language hidden inside awk
awk is named after its three creators — Aho, Weinberger, Kernighan — and is one of the original Unix-era programming languages. It looks like a one-liner tool, but it’s actually a small, complete programming language: variables, control flow, functions, associative arrays, regex.
The data model
Every awk program follows the same structure:
awk 'BEGIN { ... } /PATTERN/ { ACTION } END { ... }' FILE
awk reads input one record at a time (default = one line). For each record, it splits into fields (default separator = whitespace). Then it runs each pattern { action } block where the pattern matches.
- $0: the entire current record
- $1, $2, …: fields 1, 2, etc.
- NF: number of fields in the current record
- NR: record number (1-indexed, across all input)
- FNR: record number in the current file (resets per file)
- FS: input field separator (default: whitespace)
- OFS: output field separator (default: space)
- RS: input record separator (default: newline)
- ORS: output record separator (default: newline)
- FILENAME: name of the current input file
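To make the built-in variables concrete, here is a small sketch that prints NR, FNR, NF, and FILENAME for every record of two throwaway files. The file names a.txt and b.txt, their contents, and the helper name show_vars are made up for illustration.

```shell
# Create two small sample files in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
printf 'alpha beta\ngamma\n' > "$workdir/a.txt"
printf 'delta epsilon zeta\n' > "$workdir/b.txt"

# For every record, show the global and per-file record numbers,
# the field count, and the last field ($NF).
show_vars() {
  (cd "$workdir" && awk '{ print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF, "last=" $NF }' a.txt b.txt)
}
show_vars
# a.txt NR=1 FNR=1 NF=2 last=beta
# a.txt NR=2 FNR=2 NF=1 last=gamma
# b.txt NR=3 FNR=1 NF=3 last=zeta
```

Note how NR keeps counting across files while FNR restarts at 1 for b.txt.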
“Print column N” — the boilerplate
ps -ef | awk '{ print $2 }' # print 2nd column (PID)
ls -l | awk '{ print $5, $9 }' # size and name
df -h | awk 'NR > 1 { print $5, $6 }' # skip header (NR==1)
But that’s awk’s boring 5%. Where it shines:
BEGIN and END blocks
# Sum a column
ls -l *.log | awk 'BEGIN { total = 0 } { total += $5 } END { print "Total:", total }'
# Average response time from a log (assumes each matching line contains response_ms=<number>)
awk -F'response_ms=' 'BEGIN { sum = 0; n = 0 } NF > 1 { sum += $2; n++ } END { if (n) print sum / n }' app.log
BEGIN runs before any input. END runs after all input. Use them for initialization and summarization.
Arithmetic and conditions
# Print files larger than 1MB (column 5 from ls -l is size in bytes)
ls -l | awk '$5 > 1024 * 1024 { print $9 }'
# Format payroll: fields are NAME HOURS RATE
awk '{ printf "%-10s $%.2f\n", $1, $2 * $3 }' payroll.txt
Note printf (lowercase, awk-builtin, separate from shell printf) — same C-style format string.
Field separators (-F and OFS)
# Print user, shell from /etc/passwd (colon-separated)
awk -F: '{ print $1, $7 }' /etc/passwd
# Convert CSV-like input to TSV output (DOES NOT handle quoted commas — see csvkit later)
awk -F, 'BEGIN{OFS="\t"} { $1=$1; print }' input.csv > output.tsv
The $1=$1 trick is a classic awk idiom — it forces awk to “rebuild” the record using the new OFS, even if no field actually changed.
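A quick way to see why the idiom matters is to run the same input with and without the assignment. This sketch uses "|" as OFS purely so the difference is visible on screen.

```shell
# Without touching a field, awk prints $0 unchanged, so OFS is never applied.
printf 'a,b,c\n' | awk -F, 'BEGIN { OFS = "|" } { print }'
# a,b,c

# Assigning any field (even to itself) makes awk rebuild $0 using OFS.
printf 'a,b,c\n' | awk -F, 'BEGIN { OFS = "|" } { $1 = $1; print }'
# a|b|c
```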
Associative arrays — counting and grouping
This is awk’s killer feature. Arrays in awk are associative (string keys), not indexed.
# Count how many accounts share each UID in /etc/passwd
awk -F: '{ count[$3]++ } END { for (uid in count) print uid, count[uid] }' /etc/passwd
# Count log lines per HTTP status code from nginx access log
awk '{ status[$9]++ } END { for (s in status) print s, status[s] }' access.log
# Sum bytes by IP address (nginx log columns: 1=IP, 10=bytes)
awk '{ bytes[$1] += $10 } END { for (ip in bytes) print bytes[ip], ip }' access.log | sort -rn | head
This is enormously powerful. You’re aggregating in a single pass without sorting first.
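One detail worth seeing in practice: the iteration order of for (key in array) is unspecified, so the output is usually piped through sort when a stable order matters. The sketch below uses made-up "status method" records and a hypothetical helper name, count_by_status.

```shell
# Count records per value of the first field, then sort for a stable order.
count_by_status() {
  awk '{ hits[$1]++ } END { for (s in hits) print s, hits[s] }' | sort
}

printf '200 GET\n404 GET\n200 POST\n500 GET\n200 GET\n' | count_by_status
# 200 3
# 404 1
# 500 1
```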
Multi-file processing
# Compare line counts of two files
awk 'NR==FNR { count++; next } END { print FILENAME, count, NR-count }' file1 file2
# Merge two files by key (a kind of join)
awk 'NR==FNR { map[$1] = $2; next } { if ($1 in map) print $1, $2, map[$1] }' lookup.tsv data.tsv
The NR==FNR trick: while we’re on the first file, NR (overall record number) equals FNR (current file record number). Once we move to the second file, they diverge. So NR==FNR { … ; next } means “process only the first file, and skip to the next record.”
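Here is the join idiom run end to end on two tiny files, as a sketch. The file contents and the helper name join_teams are invented for the example.

```shell
workdir=$(mktemp -d)
# lookup.tsv maps id -> team; data.tsv maps id -> login count (sample data).
printf 'u1 platform\nu2 data\n' > "$workdir/lookup.tsv"
printf 'u1 14\nu3 2\nu2 7\n' > "$workdir/data.tsv"

join_teams() {
  awk '
    NR == FNR { team[$1] = $2; next }       # first file: remember id -> team
    $1 in team { print $1, $2, team[$1] }   # second file: emit only known ids
  ' "$workdir/lookup.tsv" "$workdir/data.tsv"
}
join_teams
# u1 14 platform
# u2 7 data
```

One caveat: if the first file is empty, NR==FNR is also true for every record of the second file, so the first block runs on the wrong input. Testing FILENAME == ARGV[1] instead is a more defensive variant.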
Regex and patterns
# Print only lines that match a regex
awk '/ERROR|WARN/ { print }' app.log
# Print lines where field 3 starts with "foo"
awk '$3 ~ /^foo/' data.tsv
# Negation
awk '!/DEBUG/' app.log # everything except DEBUG lines
# Print lines from PATTERN to PATTERN (range, like sed)
awk '/^START/,/^END/' file.txt
printf for formatting
awk '{ printf "%-30s %10d\n", $1, $2 }' data.tsv
# Common formats:
# %s string
# %d integer
# %f float
# %e scientific
# %x hex
# %o octal
# %-10s left-align in width 10
# %5.2f float, width 5, 2 decimals
awk functions
# Built-ins: length, substr, index, split, gsub, sub, tolower, toupper, sprintf, ...
awk '{ print length($0), $0 }' file # length of each line
awk '{ print toupper($1) }' file # uppercase first field
awk '{ gsub(/foo/, "bar"); print }' file # global substitute (like sed)
# Custom functions
awk 'function abs(x) { return x < 0 ? -x : x } { print abs($1) }' data
A complete real-world awk script
# Parse nginx access log (combined format), count requests and sum bytes per 5-min bucket
awk '
BEGIN { FS = "[ \\[\\]]+" }
{
  # field 4 looks like: 22/Jun/2026:14:35:12
  split($4, t, "[:/]")
  bucket = t[1] "/" t[2] "/" t[3] " " t[4] ":" sprintf("%02d", int(t[5] / 5) * 5)
  requests[bucket]++
  bytes[bucket] += $10   # with this FS, field 10 is the response size in bytes
}
END {
  for (b in requests)
    printf "%-22s %6d req %12d bytes\n", b, requests[b], bytes[b]
}
' access.log | sort
This kind of pipeline used to be a Python script. In awk, it’s 10 lines.
2. jq — JSON Swiss army knife
JSON is everywhere in modern shell work — Kubernetes API, AWS CLI, GitHub API, every web service. jq is to JSON what awk is to columnar text.
Basics: pretty-print and select
# Pretty-print
echo '{"name":"alice","age":30}' | jq .
# Get a field
echo '{"name":"alice","age":30}' | jq .name # "alice"
echo '{"name":"alice","age":30}' | jq '.name' # same; quote when shell would interpret
# Get a nested field
echo '{"user":{"name":"alice"}}' | jq .user.name
# Array access
echo '[1,2,3]' | jq '.[0]' # 1
echo '[1,2,3]' | jq '.[-1]' # 3 (negative = from end)
echo '[1,2,3]' | jq '.[]' # iterate: 1\n2\n3
echo '[1,2,3]' | jq '.[1:3]' # slice: [2,3]
Always wrap jq filters in single quotes: they often contain characters such as $, [, |, and double quotes that the shell would otherwise interpret.
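Single-quoting raises the question of how to get a shell variable into a filter. A minimal sketch, assuming jq's --arg option: the value is passed as a jq variable instead of being spliced into the filter text. The helper name find_user and the sample JSON are illustrative.

```shell
# Look up a user by name; "$wanted" never becomes part of the filter source.
find_user() {
  local wanted=$1
  jq -c --arg name "$wanted" '.[] | select(.name == $name)'
}

echo '[{"name":"alice","age":30},{"name":"bob","age":25}]' | find_user bob
# {"name":"bob","age":25}
```

Because the filter stays a fixed single-quoted string, a value containing quotes or jq syntax is compared as plain text rather than executed.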
Pipes (inside jq)
jq has its own internal pipe |, which feeds output of one filter into another:
# From a list of users, get names
echo '[{"name":"alice"},{"name":"bob"}]' | jq '.[] | .name'
# "alice"
# "bob"
# Even more compact:
echo '[{"name":"alice"},{"name":"bob"}]' | jq '.[].name'
Selectors and filters
# select: keep only items matching a predicate
jq '.[] | select(.age > 30)' users.json
# Multiple predicates
jq '.[] | select(.age > 30 and .role == "admin")' users.json
# Pattern match
jq '.[] | select(.name | test("^A"))' users.json # name starts with A
Construct new objects
# Pick specific fields
jq '.[] | {name, age}' users.json
# Rename / compute
jq '.[] | {full_name: .name, is_adult: (.age >= 18)}' users.json
# As an array
jq '[.[] | .name]' users.json
map — transform an array
# Double every age
jq 'map(.age *= 2)' users.json
# Map to just names
jq 'map(.name)' users.json # equivalent to [.[] | .name]
length, keys, to_entries, from_entries
jq 'length' users.json # number of array items
jq 'keys' my_obj.json # all keys of an object
jq '.users | keys' # nested
# Convert object to array of {key, value} pairs
jq 'to_entries' obj.json
# [ {"key":"name","value":"alice"}, {"key":"age","value":30} ]
# And back
jq 'to_entries | from_entries' obj.json # round-trip
Aggregations: add, min, max, unique, group_by
# Sum all ages
jq '[.[].age] | add' users.json
# Max age
jq '[.[].age] | max' users.json
# Unique roles
jq '[.[].role] | unique' users.json
# Group by role, count each
jq 'group_by(.role) | map({role: .[0].role, count: length})' users.json
Output modes
jq -r '.name' file # raw — no JSON quotes; useful for shell strings
jq -c . # compact — one item per line; for streaming
jq -s '.' # slurp — read all input as single array
-r is essential when feeding jq output into shell variables:
NAME=$(curl -s api.example.com/user | jq -r '.name') # without -r, NAME would have quotes
Real-world examples
# Get all running pod names from kubectl
kubectl get pods -o json | jq -r '.items[] | select(.status.phase == "Running") | .metadata.name'
# Format AWS instances as tab-separated
aws ec2 describe-instances --output json \
| jq -r '.Reservations[].Instances[] | [.InstanceId, .InstanceType, .State.Name] | @tsv'
# From a GitHub commits API response, get hash and message
curl -s https://api.github.com/repos/torvalds/linux/commits \
| jq -r '.[] | "\(.sha[0:7]) \(.commit.message | split("\n")[0])"'
Editing JSON
# Update a field
echo '{"name":"alice","age":30}' | jq '.age = 31'
# Add a field
echo '{"name":"alice"}' | jq '. + {age: 30}'
# Delete a field
echo '{"name":"alice","age":30}' | jq 'del(.age)'
# In-place edit a JSON file (atomic with mv-temp)
TMP=$(mktemp)
jq '.version = "2.0"' package.json > "$TMP" && mv "$TMP" package.json
There’s no jq -i (yet); the mktemp + mv pattern is canonical.
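The pattern can be wrapped in a small function. This is a sketch, not a standard tool: json_edit is a hypothetical name, and it creates the temporary file next to the target so that mv is a rename within one filesystem.

```shell
# json_edit FILTER FILE: apply a jq filter to FILE, replacing it only on success.
json_edit() {
  local filter=$1 file=$2 tmp
  # Temp file lives beside the target, so mv does not cross filesystems.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if jq "$filter" "$file" > "$tmp"; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"    # leave the original untouched if jq fails
    return 1
  fi
}

# Usage (assumes a package.json in the current directory):
# json_edit '.version = "2.0"' package.json
```

mktemp creates its file with restrictive permissions, so the replaced file may end up with a different mode than the original; restore it with chmod if that matters.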
3. yq — jq for YAML
There are two yq tools confusingly named the same:
- Mike Farah’s yq (Go, written 2017+): what most cloud engineers use. Syntax mirrors jq. Works on YAML, JSON, XML.
- kislyuk’s yq (Python wrapper around jq): the older one. Less common now.
We’ll cover Mike Farah’s. Install with brew install yq or download the binary.
Basic usage
yq '.metadata.name' deployment.yaml # get a field — same as jq syntax
yq '.spec.replicas = 5' deployment.yaml # mutate (prints to stdout)
yq -i '.spec.replicas = 5' deployment.yaml # in-place (yq has -i, jq doesn't)
Multiple documents in one YAML file
Kubernetes manifests often have multiple documents separated by ---. yq handles them:
yq '.kind' multi.yaml # prints all "kind" values, one per document
yq 'select(.kind == "Service")' multi.yaml # extract only Service docs
Convert between formats
yq -o json '.' file.yaml > file.json # YAML to JSON
yq -p json -o yaml '.' file.json # JSON to YAML
yq -p xml '.' file.xml # parse XML
Real-world examples
# Get all images used in a Helm template'd deployment
helm template mychart | yq '..|.image? | select(.)'
# Bulk-update image tag in a Kustomize patch
yq -i '.spec.template.spec.containers[0].image = "myimage:v2"' deploy.yaml
# Get all containers across all pods in a namespace
kubectl get pods -o yaml | yq '.items[].spec.containers[].name'
Caveats
- YAML is more complex than JSON (comments, anchors, multi-line strings), and yq preserves them when possible, but any yq -i round-trip can subtly reformat the file. For Helm/Kustomize source files where exact formatting matters, prefer using YAML-aware tools (Helm, Kustomize, OPA Rego) or be very deliberate with yq -i.
- The two yqs are not interchangeable. If a colleague’s snippet doesn’t work, check which yq they have (yq --version).
4. CSV: when awk -F, isn’t enough
The naive approach to CSV is awk -F,. It works until the first quoted field with an embedded comma, then it explodes.
# This file:
"Smith, John",30,Engineer
"Doe, Jane",25,Manager
# awk -F, treats the comma INSIDE the quotes as a separator. Wrong.
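The failure is easy to reproduce. This sketch feeds the first sample row to awk and prints how many fields it sees and what it believes the first field is.

```shell
# Three logical columns, but the quoted name contains a comma.
printf '"Smith, John",30,Engineer\n' | awk -F, '{ print NF; print $1 }'
# 4
# "Smith
```

awk reports four fields and cuts the name in half, because it has no notion of CSV quoting.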
For real-world CSV (especially anything from spreadsheets or business systems), use a real CSV parser.
csvkit — Python-based CSV toolkit
pip install csvkit
Provides:
- csvlook: pretty-print as a table
- csvcut: extract columns by name or index
- csvgrep: filter rows by column value
- csvstat: column statistics (min/max/mean/distinct values)
- csvsort: sort by column
- csvjoin: SQL-style join of two CSVs
- csvjson: convert to JSON
- csvsql: run SQL queries against a CSV (!)
- in2csv: convert XLS/XLSX/JSON to CSV
- csvformat: change delimiters, quoting
Examples
# Pretty-print the first 10 rows
head -n 10 data.csv | csvlook
# Get a specific column by name
csvcut -c first_name,last_name people.csv
# Filter rows where state is CA
csvgrep -c state -m CA people.csv
# Statistics on each column
csvstat sales.csv
# Sort by sale amount, descending
csvsort -c amount -r sales.csv | head
# Join orders with customers on customer_id
csvjoin -c customer_id orders.csv customers.csv > joined.csv
# Run SQL against a CSV
csvsql --query "SELECT state, COUNT(*) FROM people GROUP BY state ORDER BY 2 DESC" people.csv
# Convert Excel to CSV
in2csv sales.xlsx > sales.csv
csvsql is genuinely magical: it loads the CSV into an in-memory SQLite, runs the query, prints the result. For small to medium CSVs (up to a few hundred MB), it beats writing pandas or a real database.
xsv — fast CSV (Rust)
For very large CSVs, csvkit (Python) is slow. xsv is a Rust-based alternative:
brew install xsv
xsv stats data.csv | xsv table # statistics, table-formatted
xsv select first_name,last_name data.csv
xsv search -s state CA data.csv # filter by column
xsv join customer_id orders.csv customer_id customers.csv
Same operations as csvkit, much faster on big files.
miller (mlr)
Yet another option, designed for “TSV/CSV/JSON/etc as named-field records”:
brew install miller
mlr --csv stats1 -a mean,stddev -f age people.csv
mlr --c2t cat people.csv > people.tsv # CSV to TSV
mlr --c2j cat people.csv > people.json # CSV to JSON
mlr is genuinely clever and very capable, but has its own syntax to learn. Pick one (csvkit, xsv, or mlr) and stick with it.
5. Locale and UTF-8 — the silent saboteurs
This is the section that most “shell scripting” tutorials skip, and it’s where almost every senior engineer gets bitten at least once.
Locale categories
A locale tells programs how to interpret text:
- LC_CTYPE: what counts as a letter, digit, lowercase
- LC_COLLATE: sort order
- LC_NUMERIC: decimal separator (, in Germany, . in the US)
- LC_TIME: date and time format
- LC_MESSAGES: language of program messages
- LC_MONETARY: currency formatting
- LC_ALL: overrides all of the above
- LANG: fallback if a specific LC_* isn’t set
On a typical Mac/Linux system:
locale # show current settings
# Common values:
LANG=en_US.UTF-8
LC_ALL=
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
...
The LC_ALL=C trick
Setting LC_ALL=C (or the equivalent LC_ALL=POSIX) tells programs to use the most basic, byte-comparison-only locale. It’s:
- Faster: no Unicode collation tables, no locale lookups. sort | uniq of a 100MB file can be 10x faster.
- More predictable: alphabetical sort means “byte order,” not “linguistic order.” B < b is true in the C locale (capital letters come first); in en_US.UTF-8 it depends.
- Wrong for international data: German ä will sort wherever its UTF-8 byte sequence sorts, not “near a.” Polish ł will sort like a totally different letter.
So the rule:
# For pipelines that don't need linguistic correctness:
LC_ALL=C sort -u file.txt
# For pipelines that DO need it:
LC_ALL=en_US.UTF-8 sort -u file.txt
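To see what “byte order” means in practice, this sketch sorts four made-up words in the C locale, first as-is and then with -f (fold lowercase to uppercase for comparison).

```shell
# Byte order: every uppercase ASCII letter sorts before every lowercase one.
printf 'banana\nApple\ncherry\nBlueberry\n' | LC_ALL=C sort
# Apple
# Blueberry
# banana
# cherry

# Still the C locale, but case-insensitive comparison via -f.
printf 'banana\nApple\ncherry\nBlueberry\n' | LC_ALL=C sort -f
# Apple
# banana
# Blueberry
# cherry
```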
Always set LC_ALL explicitly inside scripts to avoid being at the mercy of the user’s environment:
#!/usr/bin/env bash
set -Eeuo pipefail
export LC_ALL=C # if you want fast, byte-deterministic processing
# ... your pipeline ...
UTF-8 in grep, sed, awk
Modern GNU grep/sed/awk handle UTF-8 correctly when the locale is set to a UTF-8 locale:
echo 'café' | grep -oE '\w+' # in en_US.UTF-8: "café"
echo 'café' | LC_ALL=C grep -oE '\w+' # in C: "caf" (é is not a word char in C locale)
If you want to count grapheme clusters (what users perceive as “characters”), neither shell nor awk is the right tool — that’s python3 with unicodedata or specialised libraries.
wc -c vs wc -m
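echo 'café' | wc -c # 6 bytes: c, a, f, two bytes for é in UTF-8, plus the newline from echo
echo 'café' | wc -m # 5 characters (4 letters plus the newline) in a UTF-8 locale
wc -c counts bytes. wc -m counts characters, but only as the current locale defines them: under LC_ALL=C it typically counts every byte as one character.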
File names with weird characters
Always quote variables that might contain filenames:
for f in *.txt; do
cp "$f" "/backup/$f" # quotes essential — filename might have spaces/UTF-8
done
find . -type f -print0 | xargs -0 cp -t /backup/ # NUL-safe
If a filename contains an invalid UTF-8 sequence (rare but happens), some tools will refuse it. find and cp are byte-faithful — they don’t care about UTF-8 validity. bash is also byte-faithful. So shell handles them; just don’t try to convert filenames to a string in another encoding.
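When the per-file work is more than a single command, find -exec with an inline sh script is another filename-safe option alongside -print0 | xargs -0. The sketch below is one way to write it; the scratch directories, the sample names, and the helper name backup_all are assumptions for the example.

```shell
src=$(mktemp -d)
dest=$(mktemp -d)
# Sample files with awkward names: a space and an embedded newline.
touch "$src/plain.txt" "$src/with space.txt" "$src/$(printf 'two\nlines').txt"

backup_all() {
  # Inside the inline script, $1 is the destination and the remaining
  # arguments are complete filenames handed over by find, never re-split.
  find "$src" -type f -exec sh -c '
    dest=$1; shift
    for f in "$@"; do
      cp -- "$f" "$dest/" && printf "copied one file\n"
    done
  ' sh "$dest" {} +
}
backup_all
```

Because find passes each name as its own argument, no parsing of find's output is involved at all.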
BOM (Byte Order Mark) gotcha
Files saved by Windows tools sometimes have a UTF-8 BOM (EF BB BF) at the start. This is invisible but breaks scripts:
file weird.csv # "UTF-8 Unicode (with BOM) text"
# This BOM appears as a "character" in the first cell:
head -c 3 weird.csv | xxd # 00000000: efbb bf
# Strip BOM (GNU sed syntax):
sed -i '1s/^\xef\xbb\xbf//' weird.csv
# or use dos2unix (which also handles CRLF)
dos2unix weird.csv
If your CSV “first column header” mysteriously doesn’t match what you expect, suspect a BOM.
CRLF vs LF
Windows line endings (\r\n) trip up shell scripts. The carriage return is invisible but breaks read, awk, etc.:
file script.sh # "ASCII text, with CRLF line terminators"
dos2unix script.sh # convert in place
# or:
sed -i 's/\r$//' script.sh
# or:
tr -d '\r' < script.sh > tmp && mv tmp script.sh
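If dos2unix is not installed, both problems (a leading BOM and CRLF line endings) can be handled in one awk pass. This is a sketch under the assumption that the input is otherwise valid UTF-8 text; strip_bom_crlf is a hypothetical helper name, and the BOM bytes are written as octal escapes.

```shell
# Reads stdin, writes cleaned text to stdout.
strip_bom_crlf() {
  LC_ALL=C awk '
    NR == 1 { sub(/^\357\273\277/, "") }   # drop a leading UTF-8 BOM (EF BB BF)
    { sub(/\r$/, ""); print }              # drop a trailing carriage return
  '
}

printf '\357\273\277name,age\r\nalice,30\r\n' | strip_bom_crlf
# name,age
# alice,30
```

Running it under LC_ALL=C makes awk treat the input as plain bytes, which is what a byte-level cleanup wants.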
Numeric locale
LC_NUMERIC=de_DE.UTF-8 makes printf '%.2f' 3.14 output 3,14 (comma as decimal separator). This breaks tools that re-read the output:
# Inside scripts, force C numeric locale for safety
export LC_NUMERIC=C
6. Combining the toolkit
The real power is composing these. A few full workflows:
Workflow 1: Top 10 noisiest containers (Kubernetes)
kubectl top pods --all-namespaces --no-headers \
| awk '{ print $3, $1 "/" $2 }' \
| sort -k1 -h -r \
| head -n 10 \
| awk '{ printf "%-10s %s\n", $1, $2 }'
awk selects and reorders columns; sort -h does human-readable sort (M, G); the second awk formats. No regex, no jq.
Workflow 2: From CSV to per-region summary
csvgrep -c country -m USA sales.csv \
| csvcut -c region,amount \
| csvsql --query "SELECT region, SUM(CAST(amount AS REAL)) AS total
FROM stdin GROUP BY region ORDER BY total DESC"
Workflow 3: Container image audit across a Kubernetes cluster
# All container images in the cluster, deduped, with usage count
kubectl get pods --all-namespaces -o json \
| jq -r '.items[].spec.containers[].image' \
| sort | uniq -c | sort -rn
Workflow 4: Dynamic Helm values from a YAML file
# Pull pre-defined image map from values.yaml, inject into a kubectl set image
yq -r '.images | to_entries | .[] | "\(.key)=\(.value)"' values.yaml \
| while IFS= read -r line; do
  kubectl set image "deployment/$DEPLOY" "$line"
done
Workflow 5: Streaming JSON logs filter
# Tail a structured-log-as-JSON file, filter ERROR-level, format human-readable
tail -F /var/log/app.log \
| jq --unbuffered -r 'select(.level == "ERROR") | "\(.ts) \(.msg)"'
--unbuffered is essential for tail -F | jq pipelines so jq flushes after each input line.
Workflow 6: Detect drift in a Kubernetes manifest
# Compare in-cluster vs source-of-truth YAML, ignoring runtime fields
diff \
<(kubectl get deploy myapp -o yaml | yq 'del(.metadata.resourceVersion, .metadata.generation, .status)') \
<(yq 'del(.metadata.resourceVersion, .metadata.generation, .status)' deployment.yaml)
This kind of one-liner replaces a Python script. It’s the daily life of a platform engineer.
7. Pitfalls and conventions
Don’t pipe sort | uniq when you need order-preservation
uniq only deduplicates adjacent duplicates; that’s why it’s almost always used after sort. But sort reorders. If you need first-occurrence-preserving uniq:
awk '!seen[$0]++' # canonical "uniq, preserving order"
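The idiom works because seen[$0]++ returns the old count (0 the first time a line appears), and the negation turns that into a true pattern, which triggers awk's default action of printing the line. A short sketch with made-up input, including a variant keyed on a single field:

```shell
# Whole-line dedup, first occurrence wins, input order preserved.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# b
# a
# c

# Same idea keyed on field 1 only: keep the first record per user.
printf 'u1 login\nu2 login\nu1 logout\n' | awk '!seen[$1]++'
# u1 login
# u2 login
```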
Don’t sort -u when you need stable ordering
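sort -u is a fast alternative to sort | uniq, and for byte-identical lines the result is the same. The difference appears once you add a key (-k) or case folding (-f): lines that compare equal can still differ in content, sort -u keeps only one of them, and which one survives is not something to rely on across implementations.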
Don’t write CSV by hand
If your output is for downstream consumption as CSV, escape correctly. The naive printf '%s,%s\n' "$a" "$b" breaks the moment $a contains a comma or newline. Use a real tool (csvkit, miller, Python).
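Since jq is already in the toolkit, its @csv formatter is one option for emitting a correctly quoted row from shell values. The sketch below assumes jq 1.6 or later (for --args and $ARGS.positional); csv_row is a hypothetical helper name.

```shell
# Each argument becomes one CSV field; jq handles quoting and escaping.
csv_row() {
  jq -rn '$ARGS.positional | @csv' --args "$@"
}

csv_row 'Smith, John' 'said "hi"' 30
# "Smith, John","said ""hi""","30"
```

Every value passed through --args is a string, so even 30 comes out quoted; that is valid CSV, just more quoting than strictly necessary.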
Don’t jq -r on JSON arrays of complex objects
If you run jq -r '.[]' on [{...},{...}], each object is pretty-printed across several lines, so a line-based read loop sees fragments of objects rather than whole items. Either extract one field at a time, or use jq -c '.[]' and parse each line again with jq.
# Wrong — produces ambiguous/multi-line output
jq -r '.[]' file.json | while read -r item; do … done
# Right — compact JSON per line
jq -c '.[]' file.json | while IFS= read -r item; do
NAME=$(jq -r '.name' <<< "$item")
AGE=$(jq -r '.age' <<< "$item")
…
done
Don’t trust awk -F, for real CSV
We covered this. Use csvkit/xsv/miller.
Locale in CI
CI runners often have LANG=C or LANG=POSIX by default. If your script does locale-sensitive sort or printf, it will behave differently than on your laptop. Either explicitly set the locale in the script, or test with LC_ALL=C once.
Streaming vs slurp
jq defaults to streaming (one input record at a time). jq -s slurps everything into one array. If your script does tail -F | jq …, never use -s — it would buffer forever waiting for EOF.
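The difference is easiest to see on a tiny input. In this sketch the two newline-delimited records are made-up sample data.

```shell
records='{"n":1}
{"n":2}'

# Streaming: the filter runs once per record.
printf '%s\n' "$records" | jq -c '.n'
# 1
# 2

# Slurp: the filter runs once, on an array holding every record.
printf '%s\n' "$records" | jq -s -c 'map(.n) | add'
# 3
```

Any aggregate over the whole input (a sum, a sort, a group_by) needs the slurped form, which is why it cannot work on an endless stream.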
8. Twelve idioms for daily use
# 1. Sum a numeric column
awk '{ s += $1 } END { print s }' data.tsv
# 2. Count distinct values in a column
awk '{ count[$1]++ } END { for (k in count) print count[k], k }' data | sort -rn
# 3. Print column 2 from a colon-separated file
awk -F: '{ print $2 }' file
# 4. Skip the header row
awk 'NR > 1' file.csv
# 5. Get all running pod names from kubectl
kubectl get pods -o json | jq -r '.items[] | select(.status.phase=="Running") | .metadata.name'
# 6. JSON to TSV
jq -r '.[] | [.id, .name, .email] | @tsv' users.json
# 7. Update a JSON field in place (atomic)
TMP=$(mktemp); jq '.version = "2.0"' package.json > "$TMP" && mv "$TMP" package.json
# 8. Update YAML in place
yq -i '.spec.replicas = 5' deploy.yaml
# 9. Convert YAML to JSON
yq -o json '.' file.yaml
# 10. Extract a column from real CSV (handling quotes)
csvcut -c name people.csv
# 11. Run SQL on a CSV
csvsql --query "SELECT state, COUNT(*) FROM people GROUP BY state" people.csv
# 12. Strip BOM and CRLF from a file
dos2unix file.csv && sed -i '1s/^\xef\xbb\xbf//' file.csv
9. What you must internalise before Wave 2
- What does awk use for record/field separators by default? (Newline / whitespace.)
- What does BEGIN { … } do in awk? (Runs before any input. END runs after.)
- What’s the NR==FNR { … ; next } idiom? (Process only the first of multiple input files.)
- What’s the difference between jq . and jq -r .? (-r outputs raw strings without JSON quotes; use it when feeding into shell variables.)
- What’s jq -c? (Compact output, one record per line, for streaming.)
- Which yq are we using? (Mike Farah’s Go version. Not the Python wrapper.)
- Why is awk -F, wrong for real CSV? (Doesn’t handle quoted fields with embedded commas.)
- Which CSV tool is fastest for big files? (xsv. csvkit is convenient but Python-slow.)
- What does LC_ALL=C do? (Forces byte-only comparisons: fast and predictable, but breaks non-ASCII linguistic correctness.)
- What’s a UTF-8 BOM and how do you remove it? (EF BB BF at file start; sed -i '1s/^\xef\xbb\xbf//' file or dos2unix.)
If anything felt fuzzy, re-read the section. These tools repay study many times over.
What’s next: Wave 1 complete!
You’ve now completed the foundation of the course:
- Tier 1 (foundation): anatomy, variables, conditionals, loops, functions, arrays.
- Tier 2 (intermediate): I/O, pipes, processes, signals, glob/regex/find/grep/sed, structured-data toolkit.
You can now write a robust shell script: it sets set -Eeuo pipefail and IFS=$'\n\t', defines main "$@", has trap cleanup for safe interrupts, uses arrays for structured data, processes JSON with jq and YAML with yq, handles UTF-8 and locales correctly, and uses find -print0 | xargs -0 for filename-safe pipelines.
Wave 2 — Tier 3 Advanced — covers the next layer: error handling frameworks, debug/trace techniques, secrets management (1Password, vault, sops, age), package management cross-platform, idempotent installers, CLI design for your own scripts (option parsing with getopts and argparse-like patterns), bash testing (bats), and the canonical patterns for writing scripts that ship in production. We bring everything from Wave 1 and start building real systems with it.
See you in Tier 3.