DevOps Lesson 5 of 56

Shell & Bash Scripting for DevOps, In Depth: the Language, Safety & Automation Patterns

Open any CI/CD pipeline you like — GitHub Actions, GitLab CI, Jenkins, Azure Pipelines — and look past the YAML. Underneath the steps: and script: keys, the thing that actually runs is almost always a shell. A run: block is a shell script. A Dockerfile RUN is a shell command. The entrypoint that starts your container is a shell script. The “just a quick check” that gates a deploy is a shell one-liner. The shell is the universal glue of operations, and Bash is its lingua franca: it is on every Linux runner, every base image, every server you will ever SSH into. You cannot avoid it, so you should be good at it.

The trouble is that almost nobody learns Bash properly. People absorb it by osmosis (copy a snippet, tweak it until the pipeline goes green, move on), and the result is one of the most common classes of production incident: a script that looked like it worked, exited 0, and silently did the wrong thing. A space in a filename. An unset variable that expanded to nothing and turned rm -rf "$DIR/" into rm -rf /. A pipeline whose middle command failed but whose exit code came from the harmless tee at the end. This lesson is the cure. It is not a full Linux course (we do not cover the filesystem, package managers, or systemd). It is a focused, pipeline-oriented treatment of the Bash language itself: how to write scripts that are safe, predictable, and idempotent, the way a CI script must be. We go through variables and quoting (the number-one bug), exit codes and the set -euo pipefail + trap preamble every serious script needs, conditionals and tests, loops, functions, arrays, parameter expansion, the cut/sed/awk/grep quartet, pipes and redirection, getopts, and how to debug it all. This sits one rung below the vendor-neutral anatomy of CI/CD: that lesson tells you what a job and a step are, and this one tells you how to write the thing inside the step so it does not betray you.

Learning objectives

By the end of this lesson you will be able to:

Prerequisites & where this fits

You need only a terminal on macOS, Linux, or WSL, and a willingness to run small scripts as you read. No prior shell scripting is assumed (we define every term), but comfort with the command line (running a command, editing a file, cd-ing around) will help. This lesson sits in the Fundamentals module of the DevOps Zero-to-Hero course, deliberately before the tool-specific CI lessons, because every one of them ends up running shell: a GitHub Actions run: step, a GitLab script:, a Jenkins sh step, an Azure Pipelines Bash@3 task, a Dockerfile RUN, a Kubernetes init container. Once you understand the anatomy of a pipeline (stages, jobs, steps, agents), this lesson teaches you to write the code that lives inside a step without it becoming the reason your 2 a.m. page goes off. Everything here is Bash specifically (version 4+, the practical baseline in 2026), with notes where POSIX sh differs, because CI runners and containers vary in which shell they give you.

Core concepts: what the shell actually is

A shell is a program that reads lines of text, expands them according to a set of rules, and runs the result as commands. When you type ls -l "$HOME", the shell splits the line into words, expands "$HOME" to /home/you (the quotes keep it one word even if the path contains spaces), finds the ls program, and runs it with the arguments -l and /home/you. A shell script is just a file full of those lines, run non-interactively. The shell’s power, and its danger, is that expansion happens before the command runs, and the rules of expansion (word splitting, globbing, variable substitution) are exactly where the foot-guns live.

A handful of terms recur throughout, so fix them now:

Term Meaning
Shell The command interpreter (bash, sh, dash, zsh); reads, expands, executes.
Bash The “Bourne-Again SHell” — the GNU shell, ubiquitous on Linux; a superset of POSIX sh.
POSIX sh The standardised minimal shell language; the lowest common denominator (dash, BusyBox ash).
Shebang The #!/path/to/interpreter first line that tells the OS which interpreter to run a script with.
Builtin A command the shell implements itself, with no separate process (cd, echo, read, [ / test; [[ is a related shell keyword).
External command A separate program found on $PATH (grep, sed, awk, curl).
Expansion The substitution the shell performs before running a command (variables, globs, command substitution).
Word splitting The shell breaking an unquoted expansion into multiple words on $IFS (spaces/tabs/newlines).
Exit code / status The integer (0 to 255) a command returns; 0 = success, non-zero = failure.

The single most important sentence in this lesson: the shell expands variables and then splits the result into words and expands globs — so an unquoted $var is not “the value of var”, it is “the value of var, chopped on whitespace, with any * turned into matching filenames”. Quoting is how you stop that. Almost everything that follows is downstream of that one fact.
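To watch both expansions happen, here is a minimal sketch, assuming Bash and a writable temp directory. count_args is a hypothetical helper that only reports how many separate arguments it received, so the difference between quoted and unquoted expansions becomes a number you can see.

```bash
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

demo_dir="$(mktemp -d)"
cd "$demo_dir" || exit 1
touch a.log b.log            # two files for a glob to match

value="prod cluster"         # contains a space
pattern="*.log"              # contains a glob character

count_args $value            # → 2  (split on the space)
count_args "$value"          # → 1
count_args $pattern          # → 2  (globbed to a.log b.log)
count_args "$pattern"        # → 1  (the literal string *.log)

cd / && rm -rf "$demo_dir"   # clean up the demo directory
```

ShellCheck will flag the two unquoted lines (SC2086), which is exactly the point of the demo.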

Starting a script: shebang, sh vs bash, and execution

A script is a text file. To make it runnable you give it a shebang (the #! line) and an execute bit:

#!/usr/bin/env bash
# deploy.sh — the first line is the shebang; the OS reads it to pick the interpreter.
echo "Hello from $0"
chmod +x deploy.sh   # add the execute permission
./deploy.sh          # run it; the kernel reads the shebang and runs: /usr/bin/env bash ./deploy.sh

There are three ways a script gets executed, and they behave differently:

Invocation What runs Needs execute bit? Honours shebang?
./script.sh The interpreter from the shebang Yes Yes
bash script.sh The bash you named, ignoring the shebang No No (shebang is just a comment)
source script.sh / . script.sh Runs in your current shell — variables and cd persist No No

That last row matters: source (or its POSIX synonym .) does not start a new process — it runs the lines in your current shell, so any variables it sets or directories it cds into stick around. Use it to load environment files (source .env); never use it to run an untrusted script, because it can change your shell.
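The difference between running a file and sourcing it fits in a few lines. The sketch below, assuming /tmp exists and GREETING is not already set in your environment, writes a throwaway file that sets a variable and changes directory, then runs it both ways.

```bash
#!/usr/bin/env bash
# A throwaway file that sets a variable and changes directory.
envfile="$(mktemp)"
cat > "$envfile" <<'EOF'
GREETING="set by the file"
cd /tmp
EOF

cd /
bash "$envfile"                                     # child process: its changes die with it
echo "after bash: ${GREETING:-<unset>} in $PWD"     # → after bash: <unset> in /

source "$envfile"                                   # current shell: its changes persist
echo "after source: ${GREETING:-<unset>} in $PWD"   # → after source: set by the file in /tmp

rm -f "$envfile"
```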

Why #!/usr/bin/env bash and not #!/bin/bash

#!/usr/bin/env bash asks env to find bash on $PATH, which is more portable. On macOS the system /bin/bash is an ancient 3.2 (Apple froze it over licensing), while Homebrew installs a modern Bash 5 into a directory that normally comes earlier on $PATH, so env finds the modern one. The trade-off is that env can’t take arguments portably (#!/usr/bin/env bash -e is unreliable), so put your options inside the script with set instead, which you should do anyway.

sh vs bash: the distinction that bites in CI

This is the one that surprises people: sh is not Bash. On Debian and Ubuntu (and therefore many CI runners and Docker base images), /bin/sh is dash, a tiny strict POSIX shell with none of Bash’s conveniences. A Dockerfile line like RUN [ -n "$X" ] && echo yes runs under sh, so Bash-only syntax ([[ ]], arrays, ${var,,}, <<<, the function keyword and, in some sh implementations, local) will throw a syntax error or behave differently.

Feature bash POSIX sh (dash)
[[ condition ]] Yes (preferred) No — use [ ... ] (the test builtin)
Arrays arr=(a b c) Yes No
${var,,} / ${var^^} (case) Yes No
<<< here-string Yes No
$'\n' ANSI-C quoting Yes No
function name { } keyword Yes No — use name() { }
local in functions Yes Often, but not guaranteed
set -o pipefail Yes No (not in POSIX)

The practical rule for pipelines: decide which shell you are in and write for it. If you want Bash features, make sure the step runs Bash; most CI systems let you set the shell. In GitHub Actions, set shell: bash, which runs the step as bash --noprofile --norc -eo pipefail {0}; if you leave shell unset, Linux and macOS runners still use Bash but invoke it as bash -e {0}, without pipefail. In GitLab CI the Docker executor runs script: with Bash when the image provides it and falls back to sh otherwise (minimal Alpine images, for example, ship without Bash), so call bash explicitly or choose an image that includes it. In a Dockerfile, RUN uses /bin/sh; use the JSON-array exec form to force Bash: RUN ["/bin/bash", "-c", "set -euo pipefail; ..."]. Many “works on my laptop, fails in the pipeline” bugs come down to zsh or Bash running the script locally and dash running it in the container. ShellCheck (later) catches most of these for you.
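One defensive habit that follows from this is a guard at the top of a Bash-only script, so that sh deploy.sh on a dash-based image fails with a clear message instead of a confusing syntax error further down. This is a minimal sketch; the message wording and the Bash 4 threshold are assumptions you can adjust.

```bash
#!/usr/bin/env bash
# Guard 1 uses only POSIX syntax, so dash can run it and exit before any Bash-only code.
if [ -z "${BASH_VERSION:-}" ]; then
  echo "error: this script requires bash (run it as ./deploy.sh or bash deploy.sh)" >&2
  exit 1
fi
# Guard 2: BASH_VERSINFO is a Bash array; element 0 is the major version.
if [ "${BASH_VERSINFO[0]}" -lt 4 ]; then
  echo "error: bash 4+ required, found $BASH_VERSION" >&2
  exit 1
fi
echo "running under bash $BASH_VERSION"
```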

Variables and quoting — the number-one bug

A variable is set with name=value, with no spaces around the = (name = value is parsed as “run the command name with the arguments = and value”). You read it with $name or, better, ${name}:

name="prod cluster"          # the value contains a space
echo $name                   # BUG: unquoted, so echo receives two args, "prod" and "cluster"
echo "$name"                 # CORRECT: prints one word, "prod cluster"
echo "${name}-eu"            # braces delimit the name: "prod cluster-eu"

Here is the rule, and it is close to absolute: double-quote every variable expansion and every command substitution. Write "$var", "${arr[@]}", "$(date)". Quoting suppresses the two things you almost never want inside a value, word splitting and glob (filename) expansion:

file="my report.txt"
rm $file        # BUG: rm "my" "report.txt" — deletes the wrong things or errors
rm "$file"      # CORRECT: rm "my report.txt"

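The disaster case is well known: a script does rm -rf $TARGET/ and TARGET is unset or empty, so the line becomes rm -rf /. Quoting alone does not save you here, because "$TARGET/" with an empty TARGET is still "/". set -u (treat unset as an error, below) stops the unset case, and the ${TARGET:?} form stops both unset and empty: rm -rf "${TARGET:?}/". Internalise this table: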

You write What the shell runs (var="a b"; files holds a glob such as *.log that matches real files) Safe?
cmd $var cmd a b (split into 2 args) No
cmd "$var" cmd "a b" (one arg) Yes
cmd $files (glob in value) filenames expanded No
cmd "$files" literal value, no glob Yes
cmd "${arr[@]}" each element a separate, intact arg Yes
cmd ${arr[@]} (unquoted) each element split again No

Single vs double quotes: double quotes ("...") allow expansion of $var, $(...) and backticks but suppress splitting/globbing; single quotes ('...') are literal — nothing is expanded. Use single quotes for fixed strings and for awk/sed programs (so $1 means awk’s field, not a shell variable). To get a literal single quote inside single quotes you must close, escape, reopen: 'it'\''s'.

Command substitution

$(command) runs a command and substitutes its standard output (trailing newlines stripped). Prefer it to the older backticks, `command`: $(...) nests cleanly and is readable:

commit="$(git rev-parse --short HEAD)"   # capture output into a variable
echo "Building ${commit}"
repo_name="$(basename "$(git rev-parse --show-toplevel)")"   # nesting works; inner quotes stay intact

Always quote the variable when you later use it ("$commit"). The assignment itself (x="$(...)") does not word-split, but an unquoted $commit later on does.

Environment variables vs shell variables

A plain name=value is a shell variable — visible only in the current shell. export name=value (or export name) puts it in the environment, so child processes (the programs your script runs) inherit it. CI injects configuration as environment variables, which your script reads exactly the same way: "$CI_COMMIT_SHA", "$GITHUB_SHA", "$AWS_REGION". To pass a variable to one command only, prefix it: DEBUG=1 ./run.sh sets DEBUG for that invocation alone.
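A quick way to confirm which variables a child process can see is to ask a child bash. In this sketch the variable names are illustrative, and it assumes ONE_OFF is not already set in your environment.

```bash
#!/usr/bin/env bash
plain="only here"               # shell variable: stays in this shell
export SHARED="inherited"       # environment variable: copied into child processes

bash -c 'echo "plain=${plain:-<unset>} SHARED=${SHARED:-<unset>}"'
# → plain=<unset> SHARED=inherited

ONE_OFF=1 bash -c 'echo "ONE_OFF=${ONE_OFF:-<unset>}"'   # → ONE_OFF=1
echo "in the parent: ${ONE_OFF:-<unset>}"                 # → in the parent: <unset>
```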

Parameter expansion — manipulate values without external tools

Bash’s ${...} parameter expansion does string work in the shell, faster and safer than spawning sed or cut. The default-value forms are essential for robust CI scripts:

Expansion Meaning Example (f=app/main.go, x unset)
${x:-default} Use default if x is unset or empty ${x:-dev} → dev
${x-default} Use default only if x is unset (an empty x stays empty) ${x-dev} → dev
${x:=default} As :- but also assigns default to x ${x:=dev} sets x and returns dev
${x:?message} Error and exit with message if x unset/empty ${DB_URL:?must be set}
${x:+value} Use value only if x is set and non-empty (else empty) feature-flag style
${#f} Length of the value 11
${f#*/} Strip shortest match of pattern from front main.go
${f##*/} Strip longest from front (→ basename) main.go
${f%/*} Strip shortest from back (→ dirname) app
${f%%.*} Strip longest from back app/main
${f/main/test} Replace first match app/test.go
${f//o/0} Replace all matches app/main.g0
${f:0:3} Substring (offset, length) app
${f^^} / ${f,,} Upper- / lower-case (Bash 4+) APP/MAIN.GO

The two you will reach for most in pipelines are ${VAR:-default} (give a safe default so the script does not break when an optional variable is missing) and ${VAR:?message} (fail immediately and clearly when a required variable is missing — far better than a confusing error 40 lines later). For example: region="${AWS_REGION:-ap-south-1}" and : "${IMAGE_TAG:?IMAGE_TAG is required}".
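The table rows can be checked directly in a terminal. This sketch uses the same f=app/main.go value and runs under set -u to show that the :- and :? forms behave well even when unset variables are fatal; the DB_URL check is wrapped in a subshell only so the demo keeps going, where a real script would simply stop.

```bash
#!/usr/bin/env bash
set -u                           # unset variables are fatal, as in a CI script
f="app/main.go"
unset x DB_URL                   # make sure both really are unset for the demo

echo "${x:-dev}"                 # → dev         (safe default, no set -u error)
echo "${#f}"                     # → 11
echo "${f##*/}"                  # → main.go     (basename)
echo "${f%/*}"                   # → app         (dirname)
echo "${f%%.*}"                  # → app/main
echo "${f/main/test}"            # → app/test.go
echo "${f^^}"                    # → APP/MAIN.GO (Bash 4+)

# Required input: the subshell exits non-zero and prints the message to stderr.
( : "${DB_URL:?DB_URL must be set}" ) 2>/dev/null || echo "DB_URL missing: stopping early"
```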

Exit codes — how everything signals success or failure

Every command returns an exit status: an integer from 0 to 255, where 0 means success and any non-zero value means failure (the specific number is the command’s choice: grep returns 1 for “no match found” and 2 for an actual error). The shell stores the last command’s status in the special variable $?:

grep -q "ERROR" build.log
echo "grep exit code: $?"     # 0 if found, 1 if not found, 2 on error

Your own scripts must return meaningful codes — this is how a CI step knows whether to go red. exit 0 for success, exit 1 (or another non-zero) for failure. A function returns the status of its last command, or you can return N explicitly. The exit-code conventions worth knowing:

Code Meaning
0 Success
1 General error (the catch-all)
2 Misuse of a builtin / bad arguments (Bash convention)
126 Command found but not executable (permission)
127 Command not found (typo, missing tool on $PATH)
128 + N Killed by signal N (e.g. 130 = Ctrl-C/SIGINT, 137 = SIGKILL/OOM, 143 = SIGTERM)

Two of these are CI gold. 127 in a log almost always means a tool is not installed on the runner (or a typo). 137 means the process received SIGKILL, which in CI is most often the out-of-memory killer (bump the container memory), although a forced kill after a timeout produces the same code. Knowing the code often points you to the fix before you read the rest of the log.
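You can reproduce most of these codes safely in child shells, which is a handy way to recognise them later in CI logs. status_of is a hypothetical helper for this sketch; the 126 case relies on mktemp creating files without an execute bit.

```bash
#!/usr/bin/env bash
# status_of is a hypothetical helper: run a command silently and print only its exit code.
status_of() { "$@" >/dev/null 2>&1; echo "$?"; }

status_of true                             # → 0
status_of false                            # → 1
status_of bash -c 'no-such-command-xyz'    # → 127 (command not found)
status_of bash -c 'kill -TERM $$'          # → 143 (128 + 15, SIGTERM)

not_exec="$(mktemp)"                       # created without an execute bit
status_of "$not_exec"                      # → 126 (found but not executable)
rm -f "$not_exec"
```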

The safety preamble: set -euo pipefail + trap

This is the most important section in the lesson. By default, Bash is forgiving in all the wrong ways: it keeps going after a command fails, treats unset variables as empty, and reports only the last command in a pipeline. For an interactive shell that is convenient; for an unattended CI script it is how you silently deploy a half-built artifact. The fix is a short preamble at the top of every serious script:

#!/usr/bin/env bash
set -euo pipefail
IFS=$'\n\t'

Here is exactly what each flag does and why you want it:

Flag Long form Effect Why it matters in CI
-e set -o errexit Exit immediately if any command returns non-zero Stops the script the instant a step fails instead of barrelling on
-u set -o nounset Treat an unset variable as an error and exit Catches typos and missing env vars; prevents the empty-$TARGET disaster
-o pipefail set -o pipefail (no short form) A pipeline’s status is the last non-zero status of any stage, not just the last command’s cmd | tee log no longer hides cmd’s failure behind tee’s success
(optional) IFS=$'\n\t' Word-split only on newline and tab, not space Makes filenames-with-spaces far less dangerous

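Why all three of -e -u -o pipefail together? Each plugs a different leak: -e stops on a failed command, -u stops on an unset variable, and pipefail makes a failure inside a pipeline visible instead of letting the stages after it mask it.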
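Each leak can be seen on its own. The sketch runs every case in a fresh child bash so that each flag starts from a clean slate; UNSET_TARGET_DIR is a hypothetical name assumed not to exist in your environment.

```bash
#!/usr/bin/env bash
# pipefail: without it the pipeline reports tee's success and hides false's failure.
bash -c 'false | tee /dev/null; echo "no pipefail: $?"'                    # → no pipefail: 0
bash -c 'set -o pipefail; false | tee /dev/null; echo "with pipefail: $?"' # → with pipefail: 1

# nounset: an unset variable is an error instead of silently expanding to nothing.
bash -c 'set -u; echo "target=${UNSET_TARGET_DIR}"' 2>/dev/null \
  || echo "set -u stopped on an unset variable"

# errexit: the script stops at the first failing command, so the last echo never runs.
bash -c 'set -e; false; echo "never printed"' || echo "set -e stopped after false"
```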

set -e has well-known sharp edges you must know:
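A few of the commonly cited edges are easy to demonstrate. This sketch runs under the full preamble and survives every case below, which is exactly the problem: each line looks as if it should stop the script, and none of them does. The function names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Edge 1: a failure in an if/while condition, or on the left of && / ||, never triggers -e.
if false; then :; fi
false && echo "not printed"
echo "still running after two failed conditions"

# Edge 2: 'local var=$(cmd)' reports local's own status (0), masking cmd's failure.
masked() { local out="$(false)"; echo "masked: kept going"; }
masked
# Safer form: declare first, then assign, so the failed $(...) sets the status:
#   local out; out="$(false)"

# Edge 3: -e is ignored inside a function whose status is being tested with || or if.
check() { false; echo "check kept going despite set -e"; }
check || echo "not printed: check returned the echo's status, 0"

# Edge 4: (( expr )) returns status 1 when the expression evaluates to 0.
n=0
(( n++ )) || true     # n++ yields the OLD value, 0; without '|| true' the script would exit
echo "n is now $n"    # → n is now 1
```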

trap — guaranteed cleanup

set -e makes the script stop on error; trap makes it clean up on the way out, no matter how it exits (success, error, or Ctrl-C). A trap registers a command to run when a signal or pseudo-signal fires. The big one is EXIT, which runs on any exit:

#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"                 # create a temp working dir
cleanup() {
  rm -rf "$workdir"                    # always remove it
  echo "Cleaned up $workdir"
}
trap cleanup EXIT                      # run cleanup() however we leave

# ... do work in "$workdir" ...
git clone --depth 1 "${REPO:?REPO must be set}" "$workdir/src"
# no matter what happens next, the temp dir is removed on exit
Signal / pseudo-signal Fires when Typical use
EXIT The script exits for any reason Remove temp files, release locks, log “done”
ERR A command fails (pairs with set -e) Print a diagnostic with $LINENO before dying
INT Ctrl-C (SIGINT) Graceful interrupt handling
TERM kill / orchestrator stop (SIGTERM) Drain/stop gracefully (containers get this)

A useful ERR trap prints where it died — invaluable in a long CI log:

trap 'echo "ERROR on line $LINENO (exit $?)" >&2' ERR

This combination (set -euo pipefail to fail fast, an EXIT trap to clean up, an ERR trap to tell you where) is the skeleton of every production-grade shell script. Put it at the top and you have closed off the most common sources of “silent wrong behaviour” in one stroke.
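To convince yourself that the EXIT trap really runs when set -e aborts, this sketch writes a small script under test to a temp file and runs it in a child bash. It also adds set -E (errtrace), an addition beyond the preamble above: without it, an ERR trap set at the top level does not fire for failures inside functions.

```bash
#!/usr/bin/env bash
# The script under test, written to a temp file and run in a child bash.
demo="$(mktemp)"
cat > "$demo" <<'EOF'
set -Eeuo pipefail           # -E (errtrace): the ERR trap also fires inside functions
workdir="$(mktemp -d)"
echo "$workdir"              # report the temp dir on stdout so the caller can check it
trap 'rm -rf "$workdir"' EXIT
trap 'echo "ERROR on line $LINENO (exit $?)" >&2' ERR
step() { false; }            # a failing step inside a function
step
echo "never reached"
EOF

status=0
workdir_path="$(bash "$demo")" || status=$?   # the ERR trap's message appears on stderr
echo "child exited with $status"               # → child exited with 1
if [[ -d "$workdir_path" ]]; then echo "temp dir leaked"; else echo "temp dir removed"; fi
rm -f "$demo"
```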

Conditionals and tests

Bash decides things with if and with the short-circuit operators. The thing being tested is a command’s exit status, not a boolean — if cmd; then means “if cmd succeeded (exit 0)”.

if [[ -f "$config" ]]; then
  echo "config exists"
elif [[ -d "$config" ]]; then
  echo "it's a directory"
else
  echo "missing"
fi

Use [[ ... ]], not [ ... ], in Bash. [[ ]] is a Bash keyword (safer parsing — no word-splitting of variables inside, supports &&/||/</> and =~ regex), whereas [ is the old test command that does split and needs every variable quoted. Reserve [ ] for POSIX sh scripts. The operators:

Test True when Category
-z "$s" string is empty (zero length) string
-n "$s" string is non-empty string
"$a" == "$b" strings equal (= also works); == supports glob patterns in [[ ]] string
"$a" != "$b" strings not equal string
"$s" =~ ^v[0-9]+$ string matches the regex (Bash [[ ]] only) string
-e path path exists (any type) file
-f path exists and is a regular file file
-d path exists and is a directory file
-r/-w/-x path readable / writable / executable file
-s path exists and is non-empty file
-L path is a symbolic link file
"$a" -eq "$b" numbers equal (-ne -lt -le -gt -ge for the rest) number

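String vs number is a classic trap: == compares strings, -eq compares integers. [[ "10" == "10.0" ]] is false (different strings). [ "abc" -eq 0 ] fails with an “integer expression expected” error, and [[ "abc" -eq 0 ]] is subtler still: [[ ]] evaluates abc as an arithmetic variable name, which is 0 when unset, so the test quietly succeeds (or aborts under set -u). For arithmetic, use the dedicated (( )):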

count=5
if (( count > 3 )); then echo "many"; fi   # numeric context: no $, C-style operators
(( count++ ))                               # arithmetic, increments to 6
total=$(( count * 2 ))                       # arithmetic expansion → 12
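The string-versus-number pitfalls above are worth running once. This sketch assumes no variable named abc is set; is_int is a hypothetical helper showing one way to validate untrusted input with =~ before comparing it numerically.

```bash
#!/usr/bin/env bash
# == compares text; -eq and (( )) compare integers.
[[ "10" == "10.0" ]] && echo "same string" || echo "different strings"   # → different strings
(( 10 == 10 )) && echo "same number"                                     # → same number

# Pitfall: inside [[ ]], -eq treats "abc" as an arithmetic variable name (unset → 0).
unset abc
[[ "abc" -eq 0 ]] && echo "surprise: [[ abc -eq 0 ]] is true"
[ "abc" -eq 0 ] 2>/dev/null || echo "[ ] rejects it: integer expression expected"

# Validate untrusted input before comparing numerically (is_int is a hypothetical helper).
is_int() { [[ "$1" =~ ^-?[0-9]+$ ]]; }
for v in 42 abc 3.5; do
  if is_int "$v" && (( v > 10 )); then echo "$v: integer above 10"; else echo "$v: skipped"; fi
done
```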

&&, ||, and the ||true idiom

A && B runs B only if A succeeded; A || B runs B only if A failed. This gives compact, readable flow:

mkdir -p ./dist && echo "ready"            # echo only if mkdir worked
command -v jq >/dev/null || { echo "jq not installed" >&2; exit 1; }   # guard
flaky-check || true                         # ignore failure (with set -e on)

The cmd || true idiom is how you tell set -e “this one is allowed to fail” — use it sparingly and only where a non-zero exit is genuinely fine (e.g. grep finding nothing, deleting a file that may not exist with rm -f). For multi-line “do this or bail”, prefer an explicit if.

case is the clean way to branch on a value matching patterns — common for dispatching on an argument or environment:

case "$ENVIRONMENT" in
  prod|production) replicas=5 ;;
  staging)         replicas=2 ;;
  dev|*)           replicas=1 ;;          # * is the default
esac

Loops — and the only safe way to read lines

Bash has for, while, and until. The for loop iterates a list of words:

for env in dev staging prod; do
  echo "Deploying to $env"
done

for file in ./manifests/*.yaml; do        # globs expand to real files
  [[ -e "$file" ]] || continue            # guard: skip if the glob matched nothing
  kubectl apply -f "$file"
done

for i in $(seq 1 5); do echo "attempt $i"; done   # or: for ((i=1;i<=5;i++))

while repeats while a command succeeds; until repeats until it does — the natural shape for a retry/wait loop, which pipelines need constantly:

# Wait for a service to become healthy (with a timeout), then proceed.
attempt=0
until curl -fsS "http://localhost:8080/health" >/dev/null; do
  attempt=$(( attempt + 1 ))
  (( attempt >= 30 )) && { echo "service never came up" >&2; exit 1; }
  echo "waiting for service... ($attempt)"
  sleep 2
done
echo "service is healthy"
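Because retry-and-wait shows up in so many steps, it can help to generalise the loop above into a function. This is a minimal sketch with a fixed delay; retry is a hypothetical name, and flaky is a stand-in command that fails twice before succeeding.

```bash
#!/usr/bin/env bash
# retry MAX DELAY CMD [ARGS...]: a hypothetical helper generalising the until-loop above.
retry() {
  local max="$1" delay="$2"; shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s: $*" >&2
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}

# Usage with a stand-in command that fails twice, then succeeds.
calls=0
flaky() { calls=$(( calls + 1 )); (( calls >= 3 )); }
retry 5 0 flaky && echo "succeeded after $calls calls"   # → succeeded after 3 calls
```

In a real step the same helper wraps the health check: retry 30 2 curl -fsS "http://localhost:8080/health". Because the command is passed as separate arguments and run as "$@", it keeps its own quoting intact.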

Reading a file or command output line by line — the safe pattern

This is the loop everyone gets wrong. The wrong way is for line in $(cat file) — it splits on all whitespace (so a line “a b” becomes two iterations) and globs. The only safe way is while IFS= read -r line:

while IFS= read -r line; do
  echo "got: [$line]"
done < input.txt

Each piece matters: IFS= (empty, for this command) stops leading/trailing whitespace being trimmed; read -r stops backslashes being interpreted (-r = raw); < input.txt redirects the file into the loop’s stdin. To consume command output the same way, avoid piping into the loop (a pipe puts the loop in a subshell, so variables set inside are lost):

# GOOD: process substitution keeps the loop in the current shell
while IFS= read -r pod; do
  echo "restarting $pod"
done < <(kubectl get pods -o name)

# Read tab-separated fields per line:
while IFS=$'\t' read -r name status age; do
  echo "$name is $status"
done < pods.tsv

break exits the loop; continue skips to the next iteration. break 2/continue 2 operate on the enclosing loop when nested.
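The difference between the three loop shapes shows up as soon as you count iterations. This sketch feeds the same three lines (one containing a space) to each; lines is a hypothetical helper that just prints them.

```bash
#!/usr/bin/env bash
lines() { printf '%s\n' "a b" "c" "d"; }   # three lines, one containing a space

count=0
lines | while IFS= read -r line; do count=$(( count + 1 )); done
echo "piped into while: $count"           # → piped into while: 0 (loop ran in a subshell)

count=0
while IFS= read -r line; do count=$(( count + 1 )); done < <(lines)
echo "process substitution: $count"       # → process substitution: 3

words=0
for w in $(lines); do words=$(( words + 1 )); done
echo "for over \$(...): $words"           # → for over $(...): 4 ("a b" was split)
```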

Functions and positional parameters

Functions group reusable logic. Define them with name() { ... } (portable) — skip the Bash-only function name {} keyword for portability. Always declare function-local variables with local, or they leak into the global scope and bite you elsewhere:

log() {                          # a simple structured logger
  local level="$1"; shift        # first arg is the level; shift drops it
  echo "[$(date -u +%FT%TZ)] [$level] $*" >&2   # remaining args are the message
}

deploy() {
  local service="$1" tag="${2:-latest}"   # second arg defaults to "latest"
  log INFO "deploying $service:$tag"
  kubectl set image "deploy/$service" "$service=$REGISTRY/$service:$tag"
}

log INFO "starting"
deploy api v1.4.2
deploy worker             # uses the default tag

Inside a function (and a script), arguments are the positional parameters:

Variable Meaning
$0 The script name (in a function, still the script, not the function)
$1, $2, … ${10} The 1st, 2nd, … 10th argument (braces needed past $9)
$# The number of arguments
"$@" All arguments, each as a separate quoted word — almost always what you want
"$*" All arguments as a single string joined by the first char of $IFS
shift / shift N Discard the first (or first N) arguments, renumbering the rest
$$ PID of the current shell (useful for unique temp names)
$! PID of the last background (&) command

The "$@" vs "$*" distinction is the function-level twin of the quoting rule: "$@" preserves each argument intact (so a filename with spaces survives), "$*" smashes them into one string. Use "$@" to forward arguments to another command: wrapper() { mytool --flag "$@"; }. Validate argument count early: (( $# == 2 )) || { echo "usage: $0 <svc> <tag>" >&2; exit 2; }.

A function returns the exit status of its last command, or return N explicitly (note: return is for a status from 0 to 255, not for returning data; to return a string, echo it and capture it with $(...)).
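Both rules, "$@" for forwarding and echo for returning data, fit in one short sketch. All function names here are hypothetical; show_args prints each argument it receives in brackets so you can see the boundaries.

```bash
#!/usr/bin/env bash
show_args() { printf '[%s]' "$@"; echo; }   # prints each argument in [brackets]

forward_all()  { show_args "$@"; }          # each argument stays intact
forward_star() { show_args "$*"; }          # all arguments merged into one string

forward_all  "my file.txt" "b"              # → [my file.txt][b]
forward_star "my file.txt" "b"              # → [my file.txt b]

# Return data by printing it; return a status with 'return'.
basename_of() {
  [[ -n "${1:-}" ]] || return 2             # status 2: bad arguments
  echo "${1##*/}"                           # data, captured by $(...)
}
name="$(basename_of "/srv/app/release.tar.gz")"
echo "$name"                                # → release.tar.gz
```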

Arrays

Arrays hold lists — essential for building command arguments safely (the alternative, a space-separated string, re-introduces the word-splitting bug). Bash has indexed and associative (Bash 4+) arrays:

# Indexed array
regions=("ap-south-1" "eu-west-1" "us-east-1")
echo "${regions[0]}"          # first element → ap-south-1
echo "${#regions[@]}"         # length → 3
regions+=("us-west-2")        # append
for r in "${regions[@]}"; do  # iterate — ALWAYS quote "${arr[@]}"
  echo "$r"
done

# Build a command's arguments safely, then run them as one array
args=(--namespace prod --selector "app=web")
kubectl get pods "${args[@]}"   # each element stays one argument, spaces intact

# Associative array (declare -A) — key/value map
declare -A replicas=([api]=5 [worker]=2 [cron]=1)
echo "${replicas[api]}"         # → 5
for svc in "${!replicas[@]}"; do          # ${!arr[@]} = the KEYS
  echo "$svc wants ${replicas[$svc]}"
done
Syntax Meaning
arr=(a b c) Create an indexed array
"${arr[i]}" Element at index i
"${arr[@]}" All elements, each a separate quoted word (use this to iterate/forward)
"${arr[*]}" All elements as one string (joined by $IFS)
"${#arr[@]}" Number of elements
"${!arr[@]}" The indices/keys
arr+=(x) Append element(s)
declare -A m Declare an associative (key→value) array

The killer use in pipelines is building up command-line arguments conditionally: start with args=(upgrade --install web ./chart), then [[ -n "$NAMESPACE" ]] && args+=(--namespace "$NAMESPACE"), then run helm "${args[@]}". Each piece stays a clean separate argument no matter what is in it, which is impossible to do safely with a flat string.
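Here is that pattern as a runnable sketch. The release name web, the path ./chart and the three input variables are placeholders, and printf stands in for helm so the sketch runs anywhere: it prints each argument between < > exactly as helm would receive it.

```bash
#!/usr/bin/env bash
# Hypothetical inputs; in CI these would come from the environment.
NAMESPACE="staging"
EXTRA_SET="banner=hello world"   # contains a space on purpose
VALUES_FILE=""                   # empty, so --values is skipped entirely

args=(upgrade --install web ./chart)
[[ -n "$NAMESPACE" ]]   && args+=(--namespace "$NAMESPACE")
[[ -n "$EXTRA_SET" ]]   && args+=(--set "$EXTRA_SET")
[[ -n "$VALUES_FILE" ]] && args+=(--values "$VALUES_FILE")

# printf stands in for helm and shows the argument boundaries.
printf '<%s>' "${args[@]}"; echo
# → <upgrade><--install><web><./chart><--namespace><staging><--set><banner=hello world>
echo "argument count: ${#args[@]}"   # → argument count: 8
```

In a real step the printf line becomes helm "${args[@]}".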

Pipes, redirection, here-docs and here-strings

Every process has three standard streams: stdin (fd 0, input), stdout (fd 1, normal output), stderr (fd 2, errors/diagnostics). The shell lets you connect and redirect them — this is the heart of “Unix philosophy” composition.

Operator Effect
cmd1 | cmd2 Pipe: cmd1’s stdout becomes cmd2’s stdin
cmd > file Redirect stdout to file (truncating it)
cmd >> file Redirect stdout, appending
cmd 2> file Redirect stderr to file
cmd > out 2> err stdout and stderr to separate files
cmd > file 2>&1 stdout to file, then stderr to the same place (order matters!)
cmd &> file Bash shorthand for “both stdout and stderr to file”
cmd < file Feed file as stdin
cmd 2>/dev/null Discard stderr (the “black hole”)
cmd >/dev/null 2>&1 Discard all output (run silently)
cmd | tee file Send stdout to both the terminal and file

The 2>&1 ordering is a classic interview catch: redirection is processed left to right, and 2>&1 means “make fd 2 go wherever fd 1 currently goes”. So cmd > file 2>&1 sends both to file (fd1→file, then fd2→fd1’s target=file), but cmd 2>&1 > file sends stderr to the terminal (fd2→current fd1=terminal) and only stdout to the file. Remember: redirect stdout first, then point stderr at it.

Separate your streams in scripts: send logs and progress to stderr (echo "building..." >&2) and keep stdout clean for data you want a caller to capture. That way result="$(myscript)" gets only the result, while the human still sees the progress messages.
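Both the ordering rule and the stdout-for-data convention can be checked by counting what lands in a file. In this sketch both_streams and get_version are hypothetical helpers; in the "stderr first" case the stderr line goes to the script's own stdout (your terminal) rather than the file.

```bash
#!/usr/bin/env bash
both_streams() { echo "to stdout"; echo "to stderr" >&2; }   # hypothetical helper
log="$(mktemp)"

both_streams > "$log" 2>&1     # fd1 → file, then fd2 → where fd1 now points (the file)
right="$(grep -c '' "$log")"

both_streams 2>&1 > "$log"     # fd2 → where fd1 points NOW (the terminal), then fd1 → file
wrong="$(grep -c '' "$log")"

echo "stdout first, lines captured: $right"   # → stdout first, lines captured: 2
echo "stderr first, lines captured: $wrong"   # → stderr first, lines captured: 1
rm -f "$log"

# Data on stdout, progress on stderr: the caller captures only the data.
get_version() { echo "resolving version..." >&2; echo "1.4.2"; }
version="$(get_version)"       # version is "1.4.2"; the progress line still reaches the terminal
```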

Here-docs and here-strings

A here-document feeds a multi-line block as stdin — perfect for writing config files, SQL, or multi-line input without a separate file:

cat > config.yaml <<EOF
environment: ${ENVIRONMENT}
replicas: ${REPLICAS}
EOF

Variables are expanded inside <<EOF. To suppress expansion (write a literal $VAR), quote the delimiter: <<'EOF'. Use <<-EOF (note the dash) to strip leading tab characters (tabs only, not spaces) so you can indent the heredoc body. A here-string (<<<) feeds a single string as stdin, handy for one-liners:

grep "ERROR" <<< "$log_output"        # feed a variable as stdin without echo|grep
read -r major minor patch <<< "1 4 2" # split a string into variables
jq '.version' <<< "$json"
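The quoted-versus-unquoted delimiter difference is easiest to see side by side. The values below are illustrative. One caution that follows from the preamble: if a script has set IFS=$'\n\t', read no longer splits on spaces, so "1 4 2" would land entirely in major; the sketch uses the default IFS.

```bash
#!/usr/bin/env bash
ENVIRONMENT="staging"
REPLICAS=2

# Unquoted delimiter: ${...} and $(...) are expanded in the body.
expanded="$(cat <<EOF
environment: ${ENVIRONMENT}
replicas: ${REPLICAS}
EOF
)"

# Quoted delimiter: the body is kept literally, e.g. for a script that will run elsewhere.
literal="$(cat <<'EOF'
environment: ${ENVIRONMENT}
EOF
)"

printf '%s\n' "$expanded"   # prints "environment: staging" then "replicas: 2"
printf '%s\n' "$literal"    # prints "environment: ${ENVIRONMENT}" unchanged

# Here-string into read, using the default IFS (space, tab, newline).
read -r major minor patch <<< "1 4 2"
echo "major=$major minor=$minor patch=$patch"   # → major=1 minor=4 patch=2
```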

Text wrangling: grep, cut, sed, awk quick-reference

Pipelines spend half their life slicing text — parsing kubectl output, extracting a field from JSON-ish logs, rewriting a config. Four external tools do the bulk of it. (For structured data prefer jq for JSON and yq for YAML — they parse properly instead of guessing — but the classic quartet is everywhere.)

Tool Best for Canonical example
grep Finding/filtering lines that match a pattern grep -E "ERROR|WARN" app.log
cut Extracting columns by delimiter or character position cut -d',' -f2,3 data.csv
sed Stream editing — substitute, delete, insert lines sed 's/old/new/g' file
awk Field-aware processing, columns + arithmetic + logic awk '{sum+=$3} END{print sum}'

Key flags worth memorising:

# Real pipeline snippets:
kubectl get pods | awk 'NR>1 && $3!="Running" {print $1}'   # name of every non-Running pod
git log --oneline | wc -l                                    # commit count
sed -E "s/version: .*/version: ${TAG}/" chart.yaml > chart.new   # bump a version
ps aux | grep -v grep | grep "myapp" | awk '{print $2}'      # find a process's PID (pgrep myapp does this in one step)

A note on grep’s exit code under set -e: grep -q pattern file returns 1 when there is no match, which set -e treats as a fatal error. If “no match” is a normal outcome, guard it: if grep -q pattern file; then ... or grep -q pattern file || true.
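To practise the quartet without a cluster, this sketch hard-codes a small sample in the shape of kubectl get pods output (the pod names and numbers are made up) and feeds it to each tool through here-strings.

```bash
#!/usr/bin/env bash
# Sample output in the shape of `kubectl get pods`, hard-coded so no cluster is needed.
pods='NAME      READY   STATUS             RESTARTS   AGE
api-1     1/1     Running            0          2d
api-2     0/1     CrashLoopBackOff   7          2d
worker-1  1/1     Running            1          5h'

not_running="$(awk 'NR>1 && $3!="Running" {print $1}' <<< "$pods")"   # → api-2
restarts="$(awk 'NR>1 {sum+=$4} END {print sum}' <<< "$pods")"        # → 8
running="$(grep -c 'Running' <<< "$pods")"                             # → 2

# cut: fields 2 and 3 of a CSV line
cols="$(cut -d',' -f2,3 <<< '1,api,eu-west-1')"                        # → api,eu-west-1

# sed: version bump anchored with ^, so an indented, nested "version:" key is left alone
TAG="1.5.0"
bumped="$(sed -E "s/^version: .*/version: ${TAG}/" <<< 'version: 1.4.2')"   # → version: 1.5.0

printf '%s\n' "$not_running" "$restarts" "$running" "$cols" "$bumped"
```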

Parsing arguments with getopts

For anything beyond one or two positional arguments, parse flags with the getopts builtin — it handles -v, -f value, and combined -vf, and is portable. The option string lists valid letters; a trailing : means that flag takes an argument:

#!/usr/bin/env bash
set -euo pipefail

usage() { echo "usage: $0 [-v] [-e ENV] -t TAG" >&2; exit 2; }

verbose=0
environment="dev"
tag=""

while getopts ":ve:t:" opt; do        # leading : = silent error mode (we handle errors)
  case "$opt" in
    v) verbose=1 ;;                    # a flag (no argument)
    e) environment="$OPTARG" ;;        # -e takes a value, in $OPTARG
    t) tag="$OPTARG" ;;
    :) echo "option -$OPTARG needs a value" >&2; usage ;;   # missing arg
    \?) echo "unknown option -$OPTARG" >&2; usage ;;        # unknown flag
  esac
done
shift $(( OPTIND - 1 ))                # drop the parsed options; "$@" now = positionals

[[ -n "$tag" ]] || { echo "-t TAG is required" >&2; usage; }
echo "env=$environment tag=$tag verbose=$verbose remaining=$*"
Piece Role
The optstring ":ve:t:" Valid options; : after a letter = takes an argument; leading : = quiet mode
$opt The current option letter being processed
$OPTARG The argument value for an option that takes one
$OPTIND Index of the next argument; shift $((OPTIND-1)) removes the parsed options
\? case Catches unknown options
: case Catches a flag that is missing its required argument

getopts handles only short single-letter options (no --long-flags); for GNU-style long options you either use the external getopt (different tool, trickier) or parse a while [[ $# -gt 0 ]]; do case "$1" in --env) ...; shift 2;; esac; done loop by hand. For most pipeline scripts, short flags via getopts are plenty.
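A hand-written long-option loop might look like the sketch below. The flag names mirror the getopts example, the function name is hypothetical, and results are left in global variables on purpose so the caller can read them. A missing value after --env or --tag aborts the whole script via ${2:?}, in keeping with the fail-fast style.

```bash
#!/usr/bin/env bash
set -euo pipefail
# parse_args: hypothetical long-option parser; results go into globals on purpose.
parse_args() {
  environment="dev"
  tag=""
  verbose=0
  positionals=()
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --env)        environment="${2:?--env needs a value}"; shift 2 ;;
      --env=*)      environment="${1#*=}"; shift ;;
      --tag)        tag="${2:?--tag needs a value}"; shift 2 ;;
      --tag=*)      tag="${1#*=}"; shift ;;
      -v|--verbose) verbose=1; shift ;;
      --)           shift; positionals+=("$@"); break ;;   # everything after -- is positional
      -*)           echo "unknown option: $1" >&2; return 2 ;;
      *)            positionals+=("$1"); shift ;;
    esac
  done
}

parse_args --env=prod --tag v1.4.2 -v -- extra "two words"
echo "env=$environment tag=$tag verbose=$verbose positionals=${#positionals[@]}"
# → env=prod tag=v1.4.2 verbose=1 positionals=2
```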

Debugging: set -x, bash -n, and ShellCheck

When a script misbehaves, you have three tiers of help.

1. Trace execution with set -x (xtrace). It prints every command after expansion, prefixed with +, so you see exactly what ran with what values — the single fastest way to find “why did it expand to that”:

set -x            # turn tracing on
deploy api v1.2
set +x            # turn it off

# Or trace one block; or run the whole script with: bash -x script.sh
# Make the trace more useful by adding file:line:function to the prefix:
export PS4='+ ${BASH_SOURCE}:${LINENO}:${FUNCNAME[0]:-main}: '

In CI, gate verbose tracing behind a flag so logs stay clean by default: [[ "${DEBUG:-0}" == 1 ]] && set -x. Then re-run the job with DEBUG=1 when you need it. On GitHub Actions you can key the same check off RUNNER_DEBUG, which is set to 1 when you re-run a job with debug logging enabled.

2. Check syntax without running, with bash -n script.sh (noexec) — it parses the script and reports syntax errors but executes nothing. Cheap to run as a pre-commit check on every script.

3. Lint with ShellCheck — do this for every script, always. ShellCheck is a static analyser that finds the bugs this entire lesson is about before they reach production: unquoted variables (SC2086), useless cat in pipes, [ ] vs [[ ]] issues, for line in $(cat ...) mistakes, cd without checking it succeeded, and the sh-vs-bash feature mismatches. It is the highest-leverage tool in shell scripting.

shellcheck deploy.sh                    # install: apt/brew install shellcheck
shellcheck -s bash *.sh                 # force the bash dialect
# Suppress a specific check on the next line only, with a reason:
# shellcheck disable=SC2086  # word splitting is intended here

Wire ShellCheck into CI as a required gate. A tiny GitHub Actions job catches shell bugs on every PR:

jobs:
  shellcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint shell scripts
        run: |
          sudo apt-get update && sudo apt-get install -y shellcheck
          # Find and lint every shell script in the repo (-r: GNU xargs skips the run if none are found):
          find . -type f -name '*.sh' -print0 | xargs -0 -r shellcheck -s bash

Pair it with a formatter (shfmt) for consistent style, and you have the shell equivalent of a linter+formatter that every other language enjoys.

Idempotency and fail-fast patterns for CI

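A CI script may run twice (a re-run, a retry), partway (a previous run died), or in parallel. Idempotent means “running it again produces the same end state without erroring”, the property that makes retries safe. The patterns: mkdir -p and rm -f, which do not error if the target already exists or is already gone; declarative commands such as kubectl apply instead of imperative create; check-then-act guards such as grep -q line file || echo line >> file; and atomic write-to-a-temp-file-then-mv. Fail-fast means validating required inputs and tools at the very top, before the script changes anything.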
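A minimal sketch of these patterns in Bash, assuming a scratch directory you pass in. The helper names ensure_line, write_atomic and prepare are hypothetical. Running prepare twice leaves exactly the same end state as running it once:

```shell
#!/usr/bin/env bash
set -euo pipefail

# ensure_line LINE FILE: append LINE only if that exact line is not already
# present (check-then-act), so a re-run never duplicates it.
ensure_line() {
  local line="$1" file="$2"
  touch "$file"                                   # no-op if it already exists
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# write_atomic FILE CONTENT: write to a temp file in the same directory, then
# rename it over FILE. A rename within one filesystem is atomic, so readers see
# either the old or the new content, never a half-written file.
# Note: mktemp creates the file with mode 0600, and mv carries that over.
write_atomic() {
  local target="$1" content="$2" tmp
  tmp="$(mktemp "${target}.XXXXXX")"
  printf '%s\n' "$content" > "$tmp"
  mv -f -- "$tmp" "$target"
}

# prepare DIR: a hypothetical setup step that is safe to run any number of times.
prepare() {
  local dir="$1"
  mkdir -p -- "$dir/releases"             # no error if it already exists
  rm -f -- "$dir/releases/.lock"          # no error if it is already gone
  ensure_line 'export APP_ENV=ci' "$dir/env.sh"
  write_atomic "$dir/releases/current" 'v1.4.0'
}
```

write_atomic creates its temp file next to the target rather than in /tmp because the rename is only atomic when both paths are on the same filesystem.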

Here is the canonical safe-script skeleton that combines everything — keep it as your template:

#!/usr/bin/env bash
set -Eeuo pipefail   # -E (errtrace): the ERR trap below also fires inside functions such as main
IFS=$'\n\t'

# --- config & required inputs (fail fast) ---
: "${IMAGE_TAG:?IMAGE_TAG is required}"
region="${AWS_REGION:-ap-south-1}"

# --- cleanup on any exit ---
workdir="$(mktemp -d)"
cleanup() { rm -rf "$workdir"; }
trap cleanup EXIT
trap 'echo "ERROR on line $LINENO (exit $?)" >&2' ERR

# --- tool checks ---
for tool in git docker; do
  command -v "$tool" >/dev/null || { echo "$tool not installed" >&2; exit 127; }
done

main() {
  echo "Building ${IMAGE_TAG} in ${region}..." >&2
  # ... real work, using "$workdir" ...
}

main "$@"

Anatomy of a safe DevOps shell script — shebang, the set -euo pipefail and trap safety preamble, quoted variables, functions, and a fail-fast main, all feeding a pipeline step

The diagram traces a single script from its shebang down through the safety preamble, quoted-variable expansion, a trap-guarded temp directory, and a main "$@" entry point, showing how each layer prevents a specific class of pipeline failure.
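One subtlety of the ERR trap is worth verifying on your own machine. Bash does not run an ERR trap inside shell functions unless errtrace (set -E) is on, so with a plain set -euo pipefail a command that fails inside a function such as main() ends the script without the trap's message. This sketch runs the same failure twice in throwaway subshells, once without -E and once with it. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Each function body is a ( subshell ), so errexit ends only that subshell.

without_errtrace() (
  set -euo pipefail
  trap 'echo "ERR trap fired"' ERR
  inner() { false; }       # the failure happens inside a function
  inner
  echo "not reached"
)

with_errtrace() (
  set -Eeuo pipefail       # -E: functions inherit the ERR trap
  trap 'echo "ERR trap fired"' ERR
  inner() { false; }
  inner
  echo "not reached"
)

echo "without -E: [$(without_errtrace)]"   # the trap never prints
echo "with -E:    [$(with_errtrace)]"
```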

Hands-on lab

You will write, harden, and lint a small but real deployment-style script — entirely free, on any machine with Bash. Each step shows the command and the expected outcome.

Prerequisites. A terminal with Bash 4+ (bash --version). Install ShellCheck if you can: brew install shellcheck (macOS) or sudo apt-get install -y shellcheck (Debian/Ubuntu). The lab works without it, but step 6 is the highlight.

Step 1 — a deliberately fragile script. Create release.sh:

mkdir -p ~/bash-lab && cd ~/bash-lab
cat > release.sh <<'SCRIPT'
#!/bin/bash
target=$1
echo Releasing version $VERSION to $target
files=$(ls *.txt)
for f in $files; do
  echo packaging $f
done
SCRIPT
chmod +x release.sh

Step 2 — watch it misbehave. Run it with no arguments and an unset variable, then with a tricky filename:

touch "release notes.txt" build.txt
./release.sh

Expected: it prints Releasing version to (the empty $VERSION and $target vanish silently), and the loop prints packaging build.txt, packaging release and packaging notes.txt. That is three items, because the space in release notes.txt split the name in two. Nothing errored. This is the bug class the whole lesson is about.

Step 3 — add the safety preamble. Rewrite the head of the script and re-run:

cat > release.sh <<'SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
target="${1:?usage: release.sh <target>}"
version="${VERSION:?VERSION env var is required}"
echo "Releasing version ${version} to ${target}"
for f in ./*.txt; do
  [[ -e "$f" ]] || continue
  echo "packaging $f"
done
SCRIPT
./release.sh

Expected now: it exits immediately with release.sh: line 3: 1: usage: release.sh <target> (or similar) — the missing argument is caught loudly instead of producing empty output. Provide the inputs and see it work cleanly:

VERSION=1.4.0 ./release.sh prod

Expected: Releasing version 1.4.0 to prod, then packaging ./build.txt and packaging ./release notes.txt as two correctly quoted items.
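The [[ -e "$f" ]] || continue guard is there because an unmatched glob such as ./*.txt is passed through literally. An alternative is to collect the matches into an array with shopt -s nullglob, so that no matches means an empty list. This sketch assumes Bash 4.4 or later, where an empty "${files[@]}" is safe under set -u. The function name list_txt is illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_txt DIR: print each *.txt file in DIR, then a count.
list_txt() {
  local dir="$1" f
  local -a files
  shopt -s nullglob          # an unmatched glob now expands to nothing
  files=("$dir"/*.txt)
  shopt -u nullglob          # switch it back off (assumes it was off before)
  for f in "${files[@]}"; do
    printf 'packaging %s\n' "$f"
  done
  printf '%d file(s)\n' "${#files[@]}"
}
```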

Step 4 — add cleanup with a trap. Append a temp-dir pattern and confirm cleanup runs on both success and failure:

cat >> release.sh <<'SCRIPT'
workdir="$(mktemp -d)"
trap 'echo "cleaning $workdir"; rm -rf "$workdir"' EXIT
echo "staging into $workdir"
cp ./*.txt "$workdir"/
SCRIPT
VERSION=1.4.0 ./release.sh prod

Expected: you see staging into /tmp/tmp.XXXX, then cleaning /tmp/tmp.XXXX at the very end: the trap fired on normal exit. Run ls -d on the exact path it printed and confirm the directory is gone (on macOS, mktemp -d creates it under $TMPDIR rather than /tmp).
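Step 4 only exercises the success path. To see the same kind of EXIT trap fire when set -e aborts the script, this self-contained sketch stages into a temp directory, fails on purpose, and still cleans up. The function name stage_then_fail is illustrative.

```shell
#!/usr/bin/env bash
# The function body is a ( subshell ): set -e and the trap apply only inside it.
stage_then_fail() (
  set -euo pipefail
  workdir="$(mktemp -d)"
  trap 'echo "cleaning $workdir"; rm -rf "$workdir"' EXIT
  echo "staging into $workdir"
  false                      # simulated failure: errexit aborts here
  echo "not reached"
)
```

Call it on its own line and then inspect $?. You should see staging into ..., then cleaning ..., and a non-zero status. Do not call it as stage_then_fail || ..., because Bash ignores set -e inside any function invoked as part of an && or || list, so the failure would not abort it.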

Step 5 — exit codes and validation. Confirm the script returns a non-zero status when an input is missing (this is what makes a CI step go red):

./release.sh prod    # VERSION not set
echo "exit code was: $?"

Expected: an error about VERSION and exit code was: 1 — a real failure signal a pipeline can act on.
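${VERSION:?...} stops at the first missing input, so a job that needs several inputs can take several red runs to fix. A hedged alternative reports all of them at once. This sketch uses a hypothetical helper, require_vars, built on Bash's indirect expansion ${!name}:

```shell
#!/usr/bin/env bash
set -euo pipefail

# require_vars NAME...: print every missing or empty variable, then fail once.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is the value of the variable whose name is in $name ("" if unset)
    if [[ -z "${!name:-}" ]]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use near the top of a release script; with set -e a non-zero
# return ends the script:
# require_vars VERSION IMAGE_TAG
```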

Step 6 — lint with ShellCheck. Run it against both versions:

# Recreate the fragile original in a separate file and lint it:
cat > fragile.sh <<'SCRIPT'
#!/bin/bash
target=$1
echo Releasing to $target
for f in $(ls *.txt); do echo $f; done
SCRIPT
shellcheck fragile.sh
shellcheck release.sh

Expected: fragile.sh reports several findings, each with a link to the ShellCheck wiki: SC2086 (“Double quote to prevent globbing and word splitting.”) on $target and $f, and SC2045 (“Iterating over ls output is fragile. Use globs.”) on the for loop. The exact set can vary slightly between ShellCheck versions. release.sh should report no issues. You have just used the tool that catches most of the quoting and word-splitting bugs behind shell production incidents.

Validation checklist.

Cleanup.

cd ~ && rm -rf ~/bash-lab

Cost note. Zero — everything ran locally with tools already on your machine (ShellCheck is free and open source). No cloud resources, no charges.

Common mistakes & troubleshooting

Symptom: Script “succeeds” but did the wrong thing. Cause: no set -e, so a failed command was ignored. Fix: add set -euo pipefail at the top.

Symptom: rm/cp hits the wrong files (word splitting). Cause: an unquoted $var was split on spaces or glob-expanded. Fix: quote everything: "$var", "${arr[@]}".

Symptom: unbound variable error after adding set -u. Cause: a variable is genuinely unset (often a typo or a missing env var). Fix: provide a default with "${X:-}" or set the variable; fix the typo.

Symptom: command not found / exit 127. Cause: the tool is not installed on the runner, or its name is misspelled. Fix: add a command -v tool guard; install the tool in the image.

Symptom: a pipeline exits 0 despite a failing middle command. Cause: no pipefail, so the status came from the last stage. Fix: add set -o pipefail.

Symptom: variables set inside a loop are empty afterwards. Cause: the loop ran in a subshell because it was on the right-hand side of a pipe. Fix: feed the loop with done < <(cmd) (process substitution), not cmd | while.

Symptom: works locally, syntax error in the container. Cause: local Bash vs the container's /bin/sh (dash), which does not support [[ ]] or arrays. Fix: force Bash (#!/usr/bin/env bash, RUN ["/bin/bash","-c",...], or a Dockerfile SHELL ["/bin/bash", "-c"] instruction) or write POSIX sh.

Symptom: 2>&1 “didn't capture stderr”. Cause: the redirections were in the wrong order. Fix: put > file before 2>&1.

Symptom: for line in $(cat f) mangles lines or spaces. Cause: word splitting and globbing of the file contents. Fix: use while IFS= read -r line; do ...; done < f.

Symptom: a set -e script dies on a grep with no match. Cause: grep returns 1 when nothing matches, which -e treats as fatal. Fix: guard it with grep -q ... || true, or use if grep -q ....

Symptom: name = value gives “command not found”. Cause: spaces around = in an assignment. Fix: remove them: name=value.

Symptom: exit 137 in CI. Cause: the process was killed by SIGKILL, almost always by the OOM killer. Fix: increase container/job memory or reduce the footprint.
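The subshell entry is the least obvious of these, so here is a small sketch you can run to see it (the function names are illustrative). The same counting loop loses its result when fed by a pipe and keeps it when fed by process substitution:

```shell
#!/usr/bin/env bash
set -euo pipefail

# The while loop is the right-hand side of a pipe, so it runs in a subshell;
# its increments to n are lost when that subshell exits.
count_via_pipe() {
  local n=0 line
  printf 'a\nb\nc\n' | while IFS= read -r line; do n=$((n + 1)); done
  echo "$n"
}

# Process substitution feeds the loop through a redirection instead, so the
# loop runs in the current shell and n survives.
count_via_redirect() {
  local n=0 line
  while IFS= read -r line; do n=$((n + 1)); done < <(printf 'a\nb\nc\n')
  echo "$n"
}

echo "pipe: $(count_via_pipe)  redirect: $(count_via_redirect)"
```

Bash 4.2+ also offers shopt -s lastpipe, which runs the last stage of a pipeline in the current shell when job control is off (as it is in scripts); process substitution is the more portable habit across Bash versions.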

Best practices

Security notes

Interview & exam questions

1. Why must you double-quote variable expansions in Bash? Because an unquoted $var undergoes word splitting (the value is broken into separate arguments on $IFS) and globbing (any */?/[...] is expanded against filenames). Quoting ("$var") suppresses both, so the value is passed as a single, literal argument. It is the most common source of shell bugs and a frequent security issue.
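To see both effects for yourself, here is a small, self-contained demo. The names nargs and quoting_demo are illustrative, and the demo works in a throwaway directory so the glob result is predictable:

```shell
#!/usr/bin/env bash
# nargs: print how many separate arguments it received.
nargs() { echo "$#"; }

# The unquoted expansions below are deliberate: they are what this demo shows.
# shellcheck disable=SC2086
quoting_demo() (
  scratch="$(mktemp -d)"
  trap 'rm -rf "$scratch"' EXIT
  cd "$scratch" || exit 1
  touch a.txt b.txt                              # something for the glob to match
  name='release notes.txt'
  pattern='*.txt'
  echo "unquoted name: $(nargs $name)"           # split on the space
  echo "quoted name: $(nargs "$name")"           # one literal argument
  echo "unquoted pattern: $(nargs $pattern)"     # glob-expanded to a.txt b.txt
  echo "quoted pattern: $(nargs "$pattern")"     # the literal string *.txt
)

quoting_demo
```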

2. What does set -euo pipefail do, flag by flag? -e (errexit) exits the script the moment any command fails, except where a failure is expected, such as in an if test or in any but the last command of an && or || list; -u (nounset) treats use of an unset variable as a fatal error; -o pipefail makes a pipeline return the exit status of the rightmost command that exited non-zero (zero only if every command succeeded), rather than the status of the last command alone. Together they make a script fail fast and loud instead of continuing silently after an error.
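A quick way to see pipefail's effect, including the rightmost-failure rule, is to run the same pipelines with and without it. This is an illustrative sketch with hypothetical function names; PIPESTATUS, a Bash array holding every stage's exit status, is shown for comparison:

```shell
#!/usr/bin/env bash
# Each body is a ( subshell ), so option changes stay local to it. set +e keeps
# a failing pipeline from ending the subshell before its status is printed.
status_without_pipefail() ( set +e +o pipefail; false | cat; echo "$?" )
status_with_pipefail()    ( set +e -o pipefail; false | cat; echo "$?" )

# With pipefail, the status comes from the rightmost stage that failed.
rightmost_failure() ( set +e -o pipefail; (exit 2) | (exit 3) | true; echo "$?" )

# PIPESTATUS holds every stage's status, with or without pipefail.
stage_statuses() ( set +e; false | cat; echo "${PIPESTATUS[*]}" )

echo "without pipefail: $(status_without_pipefail)"
echo "with pipefail:    $(status_with_pipefail)"
echo "rightmost:        $(rightmost_failure)"
echo "PIPESTATUS:       $(stage_statuses)"
```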

3. What is the difference between sh and bash, and why does it cause “works locally, fails in CI” bugs? sh is the POSIX shell — on Debian/Ubuntu it is dash, a strict minimal shell without [[ ]], arrays, ${var,,}, <<<, or pipefail. bash is a superset with all of those. Containers and CI often run scripts under /bin/sh (dash) while your laptop runs Bash, so Bash-only syntax fails only in the pipeline. Fix by forcing Bash or writing POSIX-compatible code.

4. What does trap cleanup EXIT accomplish, and why is EXIT special? It registers cleanup to run whenever the script exits: on normal completion, on an error (with set -e), or on a catchable signal such as SIGINT or SIGTERM. EXIT is special because that one handler covers all of those exit paths, so temp files, locks, and port-forwards are released even when the script dies unexpectedly. The exception is SIGKILL (kill -9, or the OOM killer behind exit 137), which cannot be trapped, so no cleanup runs.

5. Explain $@ vs $* (and why the quoting matters). "$@" expands to each positional argument as a separate quoted word, preserving arguments that contain spaces — this is what you use to forward arguments to another command. "$*" joins all arguments into a single string separated by the first character of $IFS. Unquoted, both word-split. Use "$@" almost always.
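A short demonstration, using a hypothetical nargs helper that prints how many arguments it received:

```shell
#!/usr/bin/env bash
# nargs: print how many separate arguments it received (hypothetical helper).
nargs() { echo "$#"; }

forward_quoted_at()   { nargs "$@"; }   # each original argument stays one word
forward_quoted_star() { nargs "$*"; }   # all arguments joined into one word
# Deliberately unquoted, to show the re-splitting:
# shellcheck disable=SC2068
forward_unquoted_at() { nargs $@; }     # split again on IFS (and globbed)

# "$*" joins with the first character of IFS, so a local IFS sets the separator.
join_by_comma() { local IFS=,; echo "$*"; }

echo "quoted \$@:   $(forward_quoted_at 'a b' c)"
echo "quoted \$*:   $(forward_quoted_star 'a b' c)"
echo "unquoted \$@: $(forward_unquoted_at 'a b' c)"
echo "joined:      $(join_by_comma 'a b' c)"
```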

6. What is the only safe way to read a file line by line, and what is wrong with for line in $(cat file)? The safe form is while IFS= read -r line; do ...; done < file. for line in $(cat file) splits the file on all whitespace (so a line with spaces becomes multiple iterations) and applies globbing — it iterates words, not lines. IFS= preserves whitespace and -r stops backslash interpretation.
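Here are both behaviours side by side, as a sketch with illustrative function names. The read loop adds || [[ -n "$line" ]] so that a final line without a trailing newline is not dropped; read reports failure on such a line even though it has filled $line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Read a file line by line, preserving leading spaces and backslashes.
show_lines() {
  local line=""
  while IFS= read -r line || [[ -n "$line" ]]; do
    printf '[%s]\n' "$line"
  done < "$1"
}

# The anti-pattern, for comparison: it iterates over words, not lines.
# shellcheck disable=SC2013
count_words_not_lines() {
  local n=0
  for _ in $(cat "$1"); do n=$((n + 1)); done
  echo "$n"
}
```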

7. Why might variables set inside a while loop be empty after the loop, and how do you fix it? If the loop is on the right side of a pipe (cmd | while read ...), it runs in a subshell, and variable changes do not propagate to the parent. Fix by feeding the loop with redirection or process substitution: while read ...; do ...; done < <(cmd) keeps the loop in the current shell.

8. What do exit codes 127, 126, 130, and 137 mean? 127 = command not found (typo or missing tool on $PATH); 126 = found but not executable (permissions); 130 = terminated by SIGINT (Ctrl-C, i.e. 128+2); 137 = killed by SIGKILL (128+9), in CI almost always the OOM killer. They let you diagnose a failure from the code alone.

9. Why is cmd > file 2>&1 different from cmd 2>&1 > file? Redirections apply left to right and 2>&1 means “send stderr to wherever stdout points right now”. In the first, stdout is already redirected to file, so both end up in file. In the second, stdout still points at the terminal when 2>&1 runs, so stderr goes to the terminal and only stdout goes to file.
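You can watch the ordering rule directly with a function that writes to both streams (emit is an illustrative name):

```shell
#!/usr/bin/env bash
# emit: write one line to stdout and one to stderr.
emit() { echo "to-stdout"; echo "to-stderr" >&2; }

log="$(mktemp)"

emit > "$log" 2>&1                 # stdout -> file first, then stderr -> same file
echo "file after '> file 2>&1':"; cat "$log"

# Inside $( ), "wherever stdout points" is the capture, so stderr lands there.
leaked="$(emit 2>&1 > "$log")"     # stderr -> current stdout, then stdout -> file
echo "file after '2>&1 > file':"; cat "$log"
echo "escaped the file: $leaked"

rm -f "$log"
```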

10. What is idempotency in a script, and give three patterns that achieve it. Idempotency means running the script again yields the same end state without error, so retries are safe. Patterns: mkdir -p/rm -f (no error if the target already exists/absent); declarative kubectl apply instead of imperative create; check-then-act like grep -q line file || echo line >> file; and atomic write-temp-then-mv.

11. What is ShellCheck and name two issues it catches. ShellCheck is a static analyser for shell scripts. It flags, among others, unquoted variables (SC2086, word-splitting/globbing), iterating over ls output (SC2045), [ ] vs [[ ]] pitfalls, cd without checking success, and sh/bash feature mismatches — i.e. exactly the bug classes that cause shell outages, caught before merge.

12. How does getopts work, and what is its limitation? getopts optstring var parses single-letter flags in a while loop; a : after a letter means that flag takes an argument (placed in $OPTARG), and shift $((OPTIND-1)) removes the parsed options afterwards. Its limitation is that it handles only short single-letter options — no GNU-style --long-flags (you parse those by hand or with external getopt).
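A minimal getopts sketch. The flags -e, -t and -n and the function and variable names are hypothetical, chosen only to show a required option, an option with a default, a boolean flag, and the leftover positional arguments:

```shell
#!/usr/bin/env bash
set -euo pipefail

# parse_args [-e ENV] [-t TAG] [-n] [--] [ARGS...]
# Hypothetical flags: -e is required, -t defaults to "latest", -n sets dry-run.
# Results go into the globals env_name, tag, dry_run and the array rest.
parse_args() {
  local opt OPTIND=1               # local OPTIND lets the function be called again
  env_name="" tag="latest" dry_run=0
  # A leading ':' selects silent mode: we report the ':' and '?' cases ourselves.
  while getopts ':e:t:n' opt; do
    case "$opt" in
      e) env_name="$OPTARG" ;;
      t) tag="$OPTARG" ;;
      n) dry_run=1 ;;
      :) echo "option -$OPTARG requires an argument" >&2; return 2 ;;
      \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))            # drop the parsed options (and any --)
  rest=("$@")                      # whatever is left: positional arguments
  [[ -n "$env_name" ]] || { echo "usage: -e ENV is required" >&2; return 2; }
}

parse_args -e prod -n -- extra "two words"
echo "env=$env_name tag=$tag dry_run=$dry_run rest=${#rest[@]}"
```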

Quick check

  1. Write the three-line safety preamble every CI Bash script should start with.
  2. What does ${IMAGE_TAG:?must be set} do if IMAGE_TAG is unset?
  3. Which is correct for comparing two numbers: [[ "$a" == "$b" ]] or [[ "$a" -eq "$b" ]]?
  4. How do you append an element to a Bash array and then iterate it safely?
  5. What command lints a shell script for the bugs in this lesson?

Answers

  1. #!/usr/bin/env bash, then set -euo pipefail, then (usually) IFS=$'\n\t'.
  2. It prints must be set to stderr and exits the script non-zero immediately — a fail-fast guard for a required variable.
  3. [[ "$a" -eq "$b" ]]: -eq is numeric, while == compares strings (so "10" == "10.0" is false; -eq would reject 10.0 outright, because Bash arithmetic is integer-only). For arithmetic, prefer (( a == b )).
  4. arr+=(value) to append; iterate with for x in "${arr[@]}"; do ...; done (always quote "${arr[@]}").
  5. shellcheck script.sh.

Exercise

Harden a real-world script and gate it in CI:

  1. Write healthcheck.sh that takes a URL via -u and a timeout in seconds via -t (default 60) using getopts, then polls the URL with curl -fsS in a while/until retry loop until it returns 200 or the timeout elapses, exiting 0 on success and non-zero on timeout.
  2. Add the full safety preamble (set -euo pipefail), validate that -u was provided (${url:?}), and check curl is installed with command -v.
  3. Add a trap that prints a clear failure message with $LINENO on ERR, and (if you create any temp files) removes them on EXIT.
  4. Make it idempotent and quiet by default; gate verbose set -x tracing behind a -v flag or DEBUG=1.
  5. Run shellcheck healthcheck.sh until it is clean, then add a GitHub Actions (or GitLab CI) job that installs ShellCheck and lints every *.sh in the repo as a required check.

Success criteria: the script exits non-zero (and a CI step would go red) when the URL never returns healthy within the timeout; a missing -u produces a clear usage error and a non-zero exit; ShellCheck reports no issues; and a filename or URL containing spaces is handled without splitting. Bonus: convert it to also work under POSIX sh (swap [[ ]] for [ ], drop pipefail) and note what you lost.

Certification mapping

Glossary

Next steps

You can now write the shell that lives inside any pipeline step safely. From here:


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
