Shell Lesson 29 of 42

Shell /proc, /sys & sysctl: Kernel Introspection, Runtime Tuning, Persistent Configs & Per-Process Forensics From the Command Line

Why /proc, /sys, and sysctl Matter for Shell Operators

Most monitoring you do from shell (checking memory, listing open files, reading network connections, looking at namespaces, tuning kernel limits) does not need a special tool. The kernel exposes everything you need as text files under three virtual trees: /proc (processes and kernel state), /sys (devices and drivers), and /proc/sys (the tunables behind sysctl).

When you understand these three filesystems, a huge category of tooling becomes “just use cat and echo”.

This lesson covers the layout, the conventions, the persistence model, and a lib/proc.sh of helper functions you can use to query and tune from any script.

/proc — The Process Filesystem

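/proc is a virtual filesystem. The files don’t exist on disk; the kernel synthesizes them every time you read. This has consequences: most files report a size of 0, contents can change between two reads, and a /proc/$pid directory can vanish at any moment when its process exits.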
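Because content is generated at read time, the size that stat reports for a /proc file says little about how much a read will return. The sketch below makes that visible; proc_size_vs_content is a hypothetical helper name, and it assumes a stat that understands -c (GNU coreutils or busybox).

```shell
# Compare the size stat() reports with the number of bytes a read returns.
# With no argument it inspects /proc/self/status.
proc_size_vs_content() {
  local file="${1:-/proc/self/status}" reported actual
  reported=$(stat -c %s "$file" 2>/dev/null) || return 1
  actual=$(wc -c < "$file" 2>/dev/null) || return 1
  printf 'reported=%s actual=%s\n' "$reported" "$((actual))"
}
```

On a regular file the two numbers agree; on most /proc files the reported size is 0 while the read still returns data, which is why tools that pre-allocate a buffer from the stat size come up empty.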

Per-process layout

The directory /proc/$pid/ (where $pid is a PID, or self for the current process) contains:

/proc/$pid/
├── status              # human-readable summary: name, state, uid, mem, threads
├── stat                # space-separated, one line; same data, machine-readable
├── statm               # memory in pages (size, resident, shared, ...)
├── cmdline             # NUL-separated argv (the actual process arguments)
├── environ             # NUL-separated environment (root or owner only)
├── exe                 # symlink → the executable file
├── cwd                 # symlink → current working directory
├── root                # symlink → root directory (different in chroots)
├── fd/                 # one symlink per open file descriptor
│   ├── 0 -> /dev/pts/2
│   ├── 1 -> /dev/null
│   └── 4 -> /var/log/myapp.log
├── fdinfo/             # offset, flags per fd
├── maps                # VM memory map: ranges, perms, mapped files
├── smaps               # detailed per-mapping memory accounting
├── io                  # bytes read/written by the process
├── limits              # rlimit values: max files, stack, ...
├── ns/                 # namespaces: net, mnt, pid, user, uts, ipc, cgroup
│   ├── net -> 'net:[4026531992]'
│   └── mnt -> 'mnt:[4026531840]'
├── cgroup              # cgroup memberships
├── sched               # scheduler statistics
├── stack               # current kernel stack trace (CONFIG_STACKTRACE)
└── task/$tid/          # one subdirectory per thread (same layout as $pid/)
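As a first use of this layout, here is a minimal sketch that prints a process’s argv on one line by turning the NUL separators in cmdline into spaces. PROC_ROOT is a hypothetical override (default /proc), introduced only so the function can be pointed at a fixture directory.

```shell
# Print a process's argv on one line, space-separated.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
proc_cmdline() {
  local pid="$1" root="${PROC_ROOT:-/proc}"
  [ -r "$root/$pid/cmdline" ] || return 1
  # argv entries are NUL-terminated; the final NUL becomes a trailing space.
  tr '\0' ' ' < "$root/$pid/cmdline" | sed 's/ *$//'
}
```

The output is for human display only: an argument that itself contains spaces becomes indistinguishable from two arguments. Kernel threads have an empty cmdline, so the function prints nothing for them.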

Recipe: read process info portably

# Read PID, name, and state.
proc_status() {
  local pid="$1"
  [[ -d "/proc/$pid" ]] || { echo "no such pid: $pid" >&2; return 1; }

  local name state ppid threads vmrss
  while IFS=$'\t' read -r key value; do
    case "$key" in
      "Name:")    name=$value ;;
      "State:")   state=$value ;;
      "PPid:")    ppid=$value ;;
      "Threads:") threads=$value ;;
      # The kernel right-aligns memory values with spaces; strip that padding.
      "VmRSS:")   vmrss=${value#"${value%%[! ]*}"} ;;
    esac
  done < "/proc/$pid/status"

  printf 'pid=%s name=%s state=%s ppid=%s threads=%s rss=%s\n' \
    "$pid" "$name" "$state" "$ppid" "$threads" "$vmrss"
}

proc_status $$
# pid=12345 name=bash state=S (sleeping) ppid=12340 threads=1 rss=4096 kB

Recipe: list a process’s open files (tiny lsof)

proc_fds() {
  local pid="$1"
  [[ -d "/proc/$pid/fd" ]] || return 1
  local fd target
  for fd in /proc/$pid/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    printf 'fd=%-3s target=%s\n' "$(basename "$fd")" "$target"
  done
}

proc_fds 1234
# fd=0   target=/dev/null
# fd=1   target=pipe:[123456]
# fd=2   target=/var/log/myapp.log
# fd=4   target=socket:[789012]

This is what lsof does, but lsof walks every PID; if you know the PID you care about, /proc/$pid/fd/ is much faster (single readdir).
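Since the directory listing is cheap, it also works as a quick fd-leak check: count the entries and compare them with the soft “Max open files” limit from /proc/$pid/limits. This is a sketch; proc_fd_usage and the PROC_ROOT override are hypothetical names, not part of any existing tool.

```shell
# Report open fd count versus the soft "Max open files" rlimit for a PID.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
proc_fd_usage() {
  local pid="$1" root="${PROC_ROOT:-/proc}" count limit
  [ -d "$root/$pid/fd" ] || return 1
  count=$(ls -1 "$root/$pid/fd" 2>/dev/null | wc -l)
  # limits columns: name (3 words), soft limit, hard limit, units.
  limit=$(awk '/^Max open files/ { print $4 }' "$root/$pid/limits" 2>/dev/null)
  printf 'pid=%s open_fds=%s soft_limit=%s\n' "$pid" "$((count))" "${limit:-unknown}"
}
```

A count that keeps climbing toward the soft limit between runs is the usual signature of a descriptor leak.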

Recipe: identify a socket from its inode

socket:[INODE] from /proc/$pid/fd/ is opaque. Resolve it via /proc/net/tcp (or /proc/net/udp):

# /proc/net/tcp columns:
# sl  local_address rem_address st tx_queue:rx_queue tr:tm->when retrnsmt uid timeout inode
#  0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  0      0 12345

socket_inode_to_addr() {
  local inode="$1"
  awk -v ino="$inode" '
    NR>1 && $10==ino {
      # local_address is hex IP:hex PORT, little-endian for IP.
      split($2, a, ":")
      ip=a[1]; port=a[2]
      # Convert IP from hex little-endian to dotted decimal.
      printf "%d.%d.%d.%d:%d  state=%s\n",
        strtonum("0x"substr(ip,7,2)), strtonum("0x"substr(ip,5,2)),
        strtonum("0x"substr(ip,3,2)), strtonum("0x"substr(ip,1,2)),
        strtonum("0x"port), $4
    }' /proc/net/tcp
}

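The format is terse but documented in the kernel’s networking documentation. Note that strtonum is a gawk extension, so the snippet above needs gawk, and that IPv6 sockets are listed separately in /proc/net/tcp6. Most production scripts use ss -tnp or lsof -i instead, but knowing where the data comes from helps when those tools are unavailable (minimal containers).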
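For environments without gawk, the same decoding can be sketched with nothing but printf and cut. The function names are hypothetical; the code assumes IPv4 and a little-endian host (x86, most ARM), where the address octets appear in reverse order.

```shell
# Decode "0100007F:1F90" (as found in /proc/net/tcp) to "127.0.0.1:8080".
# Assumes IPv4 on a little-endian host: octets are stored reversed.
hex_addr_to_dotted() {
  local hex_ip="${1%%:*}" hex_port="${1##*:}" o1 o2 o3 o4
  o1=$(printf '%s' "$hex_ip" | cut -c7-8)
  o2=$(printf '%s' "$hex_ip" | cut -c5-6)
  o3=$(printf '%s' "$hex_ip" | cut -c3-4)
  o4=$(printf '%s' "$hex_ip" | cut -c1-2)
  # printf accepts C-style hex constants for %d.
  printf '%d.%d.%d.%d:%d\n' "0x$o1" "0x$o2" "0x$o3" "0x$o4" "0x$hex_port"
}

# Map the hex "st" column to a TCP state name.
tcp_state_name() {
  case "$1" in
    01) echo ESTABLISHED ;; 02) echo SYN_SENT ;;   03) echo SYN_RECV ;;
    04) echo FIN_WAIT1 ;;   05) echo FIN_WAIT2 ;;  06) echo TIME_WAIT ;;
    07) echo CLOSE ;;       08) echo CLOSE_WAIT ;; 09) echo LAST_ACK ;;
    0A) echo LISTEN ;;      0B) echo CLOSING ;;    *)  echo "UNKNOWN($1)" ;;
  esac
}
```

With these, the sample row shown earlier (0100007F:1F90, state 0A) reads as a listener on 127.0.0.1:8080.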

Recipe: detect what namespace a process is in

# Each ns symlink has the form 'net:[INODE]'. Two PIDs in the same namespace
# share the same inode.
ns_id() { readlink "/proc/$1/ns/$2"; }

# Compare two processes' namespaces:
[[ "$(ns_id 1234 net)" == "$(ns_id 5678 net)" ]] && echo "same network ns"

# Find the host's network namespace inode (PID 1 is init):
ns_id 1 net
# Output: net:[4026531992]

This is how nsenter and container tooling figure out which namespace to enter. For diagnostics: “is this process in a different network namespace from the host?” — compare ns_id $pid net with ns_id 1 net.
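The same comparison extends to every namespace type at once. The sketch below prints only the namespaces in which two PIDs differ; ns_diff and the PROC_ROOT override are hypothetical names, and reading another user’s ns/ links typically requires root.

```shell
# Print the namespace types in which two PIDs differ.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
ns_diff() {
  local a="$1" b="$2" root="${PROC_ROOT:-/proc}" ns ida idb
  for ns in cgroup ipc mnt net pid user uts; do
    # Skip types that are missing or unreadable for either process.
    ida=$(readlink "$root/$a/ns/$ns" 2>/dev/null) || continue
    idb=$(readlink "$root/$b/ns/$ns" 2>/dev/null) || continue
    [ "$ida" = "$idb" ] || printf '%s: %s vs %s\n' "$ns" "$ida" "$idb"
  done
}
```

Running ns_diff "$pid" 1 on a host gives a quick picture of how isolated a process is: no output means it shares every readable namespace with PID 1.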

Recipe: scrape memory layout from /proc/$pid/maps

# /proc/$pid/maps lines:
# 7fc0a8e4f000-7fc0a9000000 r-xp 00000000 fd:00 524294  /usr/lib/x86_64-linux-gnu/libc-2.31.so

proc_libs() {
  local pid="$1"
  awk '$6 ~ /\.so/ { print $6 }' "/proc/$pid/maps" | sort -u
}

proc_libs $$
# /usr/lib/.../libc.so.6
# /usr/lib/.../libdl.so.2
# /usr/lib/.../libtinfo.so.6

maps reveals every shared library, every mapped file, every executable region. For forensics: “is this binary loading something it shouldn’t?” → grep maps for unexpected paths.
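One way to turn that forensic question into a check is to list mapped files that live outside the directories you expect. This is a sketch under assumptions: maps_outside is a hypothetical name, and the default allow-list of system directories is an illustration, not a vetted policy.

```shell
# List mapped file paths that fall outside an allow-list regex.
# $1: a maps file (e.g. /proc/1234/maps); $2: optional awk regex of allowed prefixes.
maps_outside() {
  local maps="$1" allowed="${2:-^/(usr|lib|lib64|bin|sbin)/}"
  # Column 6 is the backing path; anonymous and [heap]/[stack] regions are skipped.
  awk -v ok="$allowed" '$6 ~ /^\// && $6 !~ ok { print $6 }' "$maps" | sort -u
}
```

Usage would look like maps_outside "/proc/$pid/maps"; anything printed from /tmp, /dev/shm, or a home directory deserves a closer look.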

/sys — The Device Filesystem

/sys is similar in spirit to /proc but tied to the kernel’s device model. The shape:

/sys/
├── class/                 # by-functionality view (block, net, leds, thermal)
│   ├── net/eth0/          # symlink to /sys/devices/.../eth0
│   │   ├── operstate      # 'up' | 'down'
│   │   ├── mtu            # 1500
│   │   └── statistics/
│   │       └── rx_bytes
│   └── block/sda/
│       ├── size           # in 512-byte sectors
│       └── queue/scheduler  # 'mq-deadline [bfq] none'
├── devices/               # the underlying device tree (PCI, USB, ...)
├── module/                # loaded kernel modules and their parameters
└── kernel/                # kernel state knobs (rcu, debug, ...)

Useful one-liners

# Network interface link state.
cat /sys/class/net/eth0/operstate          # up

# Total bytes received on eth0 (no parsing /proc/net/dev needed).
cat /sys/class/net/eth0/statistics/rx_bytes

# Block device size in bytes.
echo $(( $(cat /sys/class/block/sda/size) * 512 ))

# Current I/O scheduler for sda.
cat /sys/class/block/sda/queue/scheduler   # mq-deadline [bfq] none
# Change it (root only; the name must be one of those listed in the file):
echo mq-deadline > /sys/class/block/sda/queue/scheduler

# CPU frequency governor.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor   # performance / powersave / ...
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Thermal zone temperature (millidegrees C).
cat /sys/class/thermal/thermal_zone0/temp   # 47000 means 47.000 °C
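Two of these files need a little post-processing before they are useful in scripts: the scheduler file marks the active choice with brackets, and the thermal file reports millidegrees. A small sketch, with hypothetical helper names and a hypothetical SYSFS_ROOT override (default /sys) for testing:

```shell
# Print the active I/O scheduler for a block device (the bracketed entry).
# SYSFS_ROOT is a hypothetical override (default /sys) used for testing.
block_active_scheduler() {
  local dev="$1" root="${SYSFS_ROOT:-/sys}"
  sed -n 's/.*\[\([^]]*\)].*/\1/p' "$root/class/block/$dev/queue/scheduler" 2>/dev/null
}

# Print a thermal zone temperature in degrees C with one decimal.
thermal_celsius() {
  local zone="${1:-thermal_zone0}" root="${SYSFS_ROOT:-/sys}"
  [ -r "$root/class/thermal/$zone/temp" ] || return 1
  awk '{ printf "%.1f\n", $1 / 1000 }' "$root/class/thermal/$zone/temp"
}
```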

/sys vs sysctl

/sys is per-device. sysctl (which is /proc/sys/...) is per-subsystem and global. Tuning a single network interface’s MTU is /sys/class/net/eth0/mtu. Tuning system-wide TCP buffer sizes is sysctl net.ipv4.tcp_rmem. They don’t overlap; learn both.

sysctl — Runtime Kernel Tuning

sysctl is the canonical interface for kernel tunables. The mapping is mechanical:

# These two commands read the same setting.
sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward

# Read all current values.
sysctl -a   # huge; pipe through grep for what you care about

# Read one.
sysctl -n net.ipv4.ip_forward   # -n: print value only, no key=value

# Write (runtime-only; lost on reboot).
sysctl -w net.ipv4.ip_forward=1
# Equivalent:
echo 1 > /proc/sys/net/ipv4/ip_forward
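The key-to-path mapping is simple enough to write down: replace dots with slashes under /proc/sys. A minimal sketch with hypothetical function names follows; it assumes no path component contains a dot of its own (a VLAN interface such as eth0.100 under net.ipv4.conf would break the naive translation).

```shell
# Translate between sysctl key names and /proc/sys paths.
# Assumes no component of the key contains a literal dot.
sysctl_key_to_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_path_to_key() {
  local rel="${1#/proc/sys/}"
  printf '%s\n' "$rel" | tr '/' '.'
}
```

This is handy in minimal containers where the sysctl binary is missing but /proc/sys is still mounted.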

Persistence: /etc/sysctl.conf vs /etc/sysctl.d/

Settings written via sysctl -w or echo > /proc/sys/... are runtime-only. To persist across reboot, write them to a config file that’s applied at boot.

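Where the files live (modern systemd-based distros), from lowest to highest precedence:

  1. /usr/lib/sysctl.d/*.conf: distro and package defaults.
  2. /run/sysctl.d/*.conf: runtime overrides (rarely used).
  3. /etc/sysctl.d/*.conf: your overrides.
  4. /etc/sysctl.conf: legacy single-file config (still supported; consider deprecated). Some distros pull it in through a 99-sysctl.conf symlink in /etc/sysctl.d/.

Files from all of these directories are merged into one list and loaded in lexicographic order of file name, regardless of which directory they sit in; when two files set the same key, the one loaded later wins. A file in /etc/sysctl.d/ replaces a file with the same name in /run/sysctl.d/ or /usr/lib/sysctl.d/. This is why convention is 99-myapp.conf (run last) for overrides and 10-something.conf (run early) for defaults. Use /etc/sysctl.d/, not /etc/sysctl.conf: multiple management tools cooperate via per-tool files, and per-tool files are easier to enable/disable.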
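To see which files would actually be applied, and in what order, the merge can be sketched in a few lines. This assumes the precedence rules described in sysctl.d(5): a same-named file in a higher-priority directory replaces the lower one, and the survivors are sorted by file name across all directories. sysctl_effective_files is a hypothetical helper, and it ignores /etc/sysctl.conf.

```shell
# Print the config files that take effect, in load order.
# Arguments: directories from LOWEST to HIGHEST priority.
sysctl_effective_files() {
  local dir f
  for dir in "$@"; do
    for f in "$dir"/*.conf; do
      [ -e "$f" ] || continue
      # Emit "basename<TAB>path"; later directories overwrite earlier ones.
      printf '%s\t%s\n' "${f##*/}" "$f"
    done
  done |
    awk -F'\t' '{ path[$1] = $2 } END { for (n in path) print n "\t" path[n] }' |
    LC_ALL=C sort | cut -f2
}
```

A typical call would be sysctl_effective_files /usr/lib/sysctl.d /run/sysctl.d /etc/sysctl.d. Reading the listed files top to bottom then tells you which declaration of a key wins.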

Recipe: persist a sysctl change with rollback

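sysctl_persist() {
  local key="$1" value="$2" reason="${3:-no reason given}"
  local file="/etc/sysctl.d/99-$(printf '%s' "$key" | tr '.' '-').conf"
  local prev actual

  # Snapshot the current value so a failed change can be rolled back.
  prev=$(sysctl -n "$key") || return 1

  # Apply now (so the change takes effect without a reboot), then re-read.
  # Multi-value keys (e.g. net.ipv4.tcp_rmem) print tab-separated, so this
  # literal comparison only suits single-value keys.
  sysctl -w "$key=$value" >/dev/null
  actual=$(sysctl -n "$key")
  if [[ "$actual" != "$value" ]]; then
    echo "sysctl_persist: failed to apply $key=$value (got '$actual'); rolling back" >&2
    sysctl -w "$key=$prev" >/dev/null
    return 1
  fi

  # Only write the persistent file once the runtime change is confirmed.
  cat >"$file" <<EOF
# Set by sysctl_persist on $(date -u +%FT%TZ): $reason
$key = $value
EOF
}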

# Usage:
sysctl_persist net.ipv4.ip_forward 1 "enable IP forwarding for k8s networking"
sysctl_persist vm.swappiness 10 "favor cache over swap on this DB host"

Recipe: validate sysctl values before applying

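Writes can be rejected outright or silently adjusted by the kernel (for example, kernel.pid_max cannot exceed 4194304 on 64-bit kernels). Snapshot the old value, apply, then re-read to confirm what was actually stored: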

sysctl_safe_set() {
  local key="$1" value="$2"
  # Snapshot current value for rollback.
  local prev
  prev=$(sysctl -n "$key" 2>/dev/null) || { echo "no such sysctl: $key" >&2; return 1; }
  # Apply.
  if ! sysctl -w "$key=$value" >/dev/null 2>&1; then
    echo "sysctl rejected $key=$value; keeping $prev" >&2
    return 1
  fi
  # Validate via re-read (kernel may have clamped).
  local actual
  actual=$(sysctl -n "$key")
  if [[ "$actual" != "$value" ]]; then
    echo "warning: requested $value, kernel set $actual (clamping)" >&2
  fi
}

Recipe: compare running vs declared values to detect drift

# Compare the running kernel's values to /etc/sysctl.d/* declared values.
sysctl_drift() {
  local file actual declared key value
  for file in /etc/sysctl.d/*.conf; do
    while IFS='=' read -r key value; do
      [[ -z "${key// }" || "${key:0:1}" == "#" ]] && continue
      key="${key// /}"; value="${value# }"
      actual=$(sysctl -n "$key" 2>/dev/null) || continue
      if [[ "$actual" != "$value" ]]; then
        printf '%s: declared=%s actual=%s (file=%s)\n' "$key" "$value" "$actual" "$file"
      fi
    done < "$file"
  done
}

Useful in CI: “did someone change a sysctl at runtime that’s out of sync with the persistent config?” Catch drift before the audit, not during it.
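One caveat with any string comparison of sysctl values: multi-value keys such as net.ipv4.tcp_rmem are printed tab-separated by sysctl -n, while config files usually separate them with spaces, so a literal comparison reports false drift. A sketch of a whitespace-insensitive comparison, using hypothetical helper names:

```shell
# Collapse runs of spaces/tabs to single spaces and trim both ends.
sysctl_norm() {
  printf '%s\n' "$1" | tr -s ' \t' ' ' | sed 's/^ //; s/ $//'
}

# Succeed when two sysctl values are equal apart from whitespace.
sysctl_values_equal() {
  [ "$(sysctl_norm "$1")" = "$(sysctl_norm "$2")" ]
}
```

In a drift check, the comparison of actual against declared would then become sysctl_values_equal "$actual" "$value".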

/proc/sys Tunables Worth Knowing

A reference of high-leverage tunables most production hosts touch:

Tunable                       Meaning                                             Common values
vm.swappiness                 0=never swap unless OOM, 100=swap aggressively      1–10 for DB; 60 default
vm.overcommit_memory          0=heuristic, 1=always allow, 2=strict accounting    1 for Redis; 2 for paranoid
vm.dirty_ratio                % of RAM dirty before sync writeback blocks         10–20 (default 20)
vm.dirty_background_ratio     % of RAM dirty before bg writeback starts           5–10 (default 10)
net.ipv4.ip_forward           Enable routing                                      1 for routers/k8s nodes
net.ipv4.tcp_fin_timeout      Seconds an orphaned socket stays in FIN_WAIT_2      15–30 for high-conn servers
net.ipv4.tcp_tw_reuse         Reuse TIME_WAIT sockets                             1 for outbound-heavy clients
net.core.somaxconn            Listen backlog cap                                  4096+ for high-RPS servers
net.core.rmem_max / wmem_max  Max socket buffer                                   16777216 for big BDP links
net.ipv4.tcp_rmem / tcp_wmem  TCP buffer min/default/max                          4096 87380 16777216
net.ipv4.tcp_keepalive_time   Idle before keepalives                              300 (default 7200)
fs.file-max                   System-wide max open files                          2097152+ for fd-heavy hosts
fs.inotify.max_user_watches   inotify watches per user                            524288+ for k8s hosts
kernel.pid_max                Max PID value                                       4194304 for high-fork hosts
kernel.panic_on_oops          Panic on kernel oops (cluster reset)                0 default; 1 in HA

Note that tcp_fin_timeout governs FIN_WAIT_2, not TIME_WAIT; the TIME_WAIT interval itself is fixed in the kernel and is not a sysctl.

Add kernel.dmesg_restrict=1 and kernel.kptr_restrict=2 for hardening.

Putting It Together: lib/proc.sh

# lib/proc.sh — process and kernel introspection helpers.

# ─── Process queries ───────────────────────────────────────────────────────

proc_exists() { [[ -d "/proc/$1" ]]; }

proc_name() {
  [[ -r "/proc/$1/comm" ]] && cat "/proc/$1/comm"
}

# Read /proc/$pid/status into associative-array-style output.
proc_status_kv() {
  local pid="$1" key value
  while IFS=$'\t' read -r key value; do
    key="${key%:}"
    printf '%s=%s\n' "$key" "$value"
  done < "/proc/$pid/status"
}

# RSS in kB.
proc_rss() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

# UID owning the process.
proc_uid() {
  awk '/^Uid:/ {print $2}' "/proc/$1/status"
}

# Walk children of a PID.
proc_children() {
  local parent="$1" pid   # keep the loop variable out of the caller's scope
  for pid in /proc/[0-9]*; do
    pid=${pid##*/}
    [[ "$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)" == "$parent" ]] \
      && echo "$pid"
  done
}

# ─── Open-file inspection (mini lsof) ──────────────────────────────────────

proc_open_files() {
  local pid="$1" fd target
  for fd in /proc/$pid/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    printf '%s\t%s\n' "$(basename "$fd")" "$target"
  done
}

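# Find PIDs that have a given path open, including files that were deleted
# while still open (their fd link target ends in " (deleted)").
proc_holders_of() {
  local path="$1" pid fd target
  path=$(readlink -f "$path")
  for pid in /proc/[0-9]*; do
    pid=${pid##*/}
    for fd in /proc/$pid/fd/*; do
      target=$(readlink "$fd" 2>/dev/null) || continue
      if [[ "$target" == "$path" || "$target" == "$path (deleted)" ]]; then
        echo "$pid"
        break
      fi
    done
  done
}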

# ─── Namespace inspection ──────────────────────────────────────────────────

ns_inode() { readlink "/proc/$1/ns/$2"; }

ns_same_as_host() {
  [[ "$(ns_inode "$1" "$2")" == "$(ns_inode 1 "$2")" ]]
}

# ─── sysctl helpers ────────────────────────────────────────────────────────

sysctl_get() { sysctl -n "$1" 2>/dev/null; }

sysctl_set() {
  local key="$1" value="$2"
  sysctl -w "$key=$value" >/dev/null
}

sysctl_persist() {
  local key="$1" value="$2" reason="${3:-managed}"
  local fname
  fname=$(printf '%s' "$key" | tr '.' '-')
  local file="/etc/sysctl.d/99-${fname}.conf"

  cat >"$file" <<EOF
# Managed: $reason
# Set by lib/proc.sh on $(date -u +%FT%TZ)
$key = $value
EOF
  chmod 0644 "$file"
  sysctl_set "$key" "$value"
}

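# Rollback: remove the managed file and reload. The running value only changes
# if another config file still declares the key; otherwise it stays until reboot.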
sysctl_unmanage() {
  local key="$1"
  local fname
  fname=$(printf '%s' "$key" | tr '.' '-')
  rm -f "/etc/sysctl.d/99-${fname}.conf"
  sysctl --system >/dev/null
}

# ─── /sys helpers ──────────────────────────────────────────────────────────

block_size_bytes() {
  local dev="$1"   # e.g. sda
  local sectors
  sectors=$(cat "/sys/class/block/${dev}/size" 2>/dev/null) || return 1
  echo $((sectors * 512))
}

iface_link() { cat "/sys/class/net/$1/operstate" 2>/dev/null; }
iface_mtu()  { cat "/sys/class/net/$1/mtu" 2>/dev/null; }

iface_rx_bytes() { cat "/sys/class/net/$1/statistics/rx_bytes" 2>/dev/null; }
iface_tx_bytes() { cat "/sys/class/net/$1/statistics/tx_bytes" 2>/dev/null; }

# Compute receive throughput in bits/s between two snapshots.
# The interval must be a whole number of seconds (shell integer arithmetic).
iface_throughput_bps() {
  local iface="$1" interval="${2:-1}"
  local r1 r2
  r1=$(iface_rx_bytes "$iface")
  sleep "$interval"
  r2=$(iface_rx_bytes "$iface")
  echo $(( (r2 - r1) * 8 / interval ))
}

Real-World Recipes

Recipe 1: Find what’s holding /var/log/myapp.log open

. lib/proc.sh
proc_holders_of /var/log/myapp.log
# 12345
# 12346
ps -fp 12345 12346
# Output: which processes still have the deleted log open

This is the “why isn’t my disk space freed after rm?” debugging tool. Restart the listed processes or close their fds and the kernel reclaims the inode.
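When you already know which process to look at, the deleted-but-open files can be listed directly: the kernel appends “ (deleted)” to the link target of such descriptors. A minimal sketch, with a hypothetical function name and the hypothetical PROC_ROOT override (default /proc) for testing:

```shell
# List fds of a PID whose backing file has been deleted: "fd<TAB>original path".
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
proc_deleted_fds() {
  local pid="$1" root="${PROC_ROOT:-/proc}" fd target
  for fd in "$root/$pid"/fd/*; do
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      *' (deleted)') printf '%s\t%s\n' "${fd##*/}" "${target% (deleted)}" ;;
    esac
  done
}
```

A file whose real name happens to end in “ (deleted)” would be a false positive; that is rare enough to accept in a diagnostic helper.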

Recipe 2: Tune for a database host

# A reasonable baseline for a Postgres host.
sysctl_persist vm.swappiness 1 "DB host: avoid swapping"
sysctl_persist vm.dirty_background_ratio 5 "smaller writeback bursts"
sysctl_persist vm.dirty_ratio 10 "smaller writeback bursts"
sysctl_persist vm.overcommit_memory 2 "strict accounting; refuse oversubscription"
sysctl_persist vm.overcommit_ratio 80 "with 20% reserved for kernel"
sysctl_persist net.core.somaxconn 4096 "DB connection pool backlog"
sysctl_persist fs.file-max 2097152 "many DB connections + WAL files"
sysctl_persist kernel.shmmax 17179869184 "for big shared_buffers"

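# Each call above already wrote its file and applied the value. Re-apply every
# config file once to confirm the whole set loads cleanly together.
sysctl --system   # reload all /etc/sysctl.d/*.conf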

Recipe 3: Audit drift between expected and actual sysctl

# CI check: read a manifest of expected sysctl values and compare.
audit_sysctl_manifest() {
  local manifest="$1" key value actual fail=0
  while IFS='=' read -r key value; do
    [[ -z "${key// }" || "${key:0:1}" == "#" ]] && continue
    key="${key// /}"; value="${value# }"
    actual=$(sysctl -n "$key" 2>/dev/null)
    if [[ "$actual" != "$value" ]]; then
      printf 'DRIFT %s: expected %s got %s\n' "$key" "$value" "$actual"
      fail=1
    fi
  done < "$manifest"
  return "$fail"
}

# manifest format:
# vm.swappiness = 1
# net.ipv4.ip_forward = 1
audit_sysctl_manifest /etc/myapp/sysctl-baseline.conf || exit 1

Recipe 4: Detect container vs host

# Container detection from /proc. Heuristic: no single check is authoritative.
detect_container() {
  if [[ -f /.dockerenv ]]; then echo docker; return; fi
  if grep -qaE 'kubepods|docker|lxc' /proc/1/cgroup 2>/dev/null; then echo container; return; fi
  # Fallback: on a host, PID 1 is normally systemd or init; in a container it
  # is often the application itself. A container that runs its own init will
  # be misreported as "host".
  if [[ "$(proc_name 1)" =~ ^(systemd|init)$ ]]; then echo host; else echo container; fi
}

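/proc/1/cgroup contains PID 1’s cgroup path; inside containers on cgroup v1 hosts it usually mentions docker, kubepods, or lxc. On cgroup v2 with a private cgroup namespace it can be just 0::/, so treat every check here as a heuristic and combine them. /.dockerenv on its own is weaker still: it exists only under Docker and can be hidden.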

Recipe 5: Read scheduler stats for a tight-loop process

# /proc/$pid/sched has cumulative scheduler stats.
sched_summary() {
  local pid="$1"
  awk '
    /se.sum_exec_runtime/   { runtime  = $3 }
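    # Older kernels print se.statistics.wait_sum, newer ones plain wait_sum.
    # Anchoring avoids matching iowait_sum. Needs schedstats to be enabled.
    $1 ~ /(^|\.)wait_sum$/   { wait     = $3 }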
    /nr_voluntary_switches/  { vol      = $3 }
    /nr_involuntary_switches/{ invol    = $3 }
    END {
      printf "runtime_ms=%.1f wait_ms=%.1f vol_switches=%s invol_switches=%s\n",
        runtime, wait, vol, invol
    }' "/proc/$pid/sched"
}

invol_switches rising fast = the process is being preempted by other CPU-hungry processes. wait_ms rising = the process is waiting in the run queue. Useful diagnostic when “the app is slow but CPU isn’t pegged.”
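/proc/$pid/sched is not present on every kernel build. The two context-switch counters are also available in /proc/$pid/status, which makes for a more portable fallback. A sketch, with a hypothetical function name and the hypothetical PROC_ROOT override (default /proc) for testing:

```shell
# Print voluntary and involuntary context-switch counts from /proc/PID/status.
# PROC_ROOT is a hypothetical override (default /proc) used for testing.
proc_ctxt_switches() {
  local pid="$1" root="${PROC_ROOT:-/proc}"
  [ -r "$root/$pid/status" ] || return 1
  awk '
    /^voluntary_ctxt_switches:/    { vol = $2 }
    /^nonvoluntary_ctxt_switches:/ { invol = $2 }
    END { printf "voluntary=%s nonvoluntary=%s\n", vol, invol }
  ' "$root/$pid/status"
}
```

Sampling this twice a few seconds apart and comparing the nonvoluntary figure gives the same preemption signal as invol_switches.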

Footgun List

  1. /proc/$pid is racy. A process can exit between your [[ -d /proc/$pid ]] and your cat /proc/$pid/status. Always handle “file vanished” gracefully.

  2. /proc/$pid/cmdline uses NUL separators, not spaces. cat shows them squished together. Use tr '\0' ' ' for human display, or xargs -0 to parse.

  3. /proc/sys/kernel/perf_event_paranoid defaults restrict perf for non-root. If your script invokes perf, expect to need root or a tuned perf_event_paranoid.

  4. sysctl --system reloads ALL /etc/sysctl.d files. If a stale file declares something destructive, --system will apply it. Audit periodically.

  5. /etc/sysctl.conf is loaded by some distros and ignored by others. Use /etc/sysctl.d/*.conf only for portability.

  6. Some sysctl values are clamped silently. sysctl -w net.core.rmem_max=999999999 may set a smaller value than requested. Always re-read after writing.

  7. /sys writes can require specific timing. Writing to /sys/.../scheduler while the device is busy may fail with EBUSY. Stop I/O first if possible.

  8. /proc/$pid/smaps is expensive to read. It walks the process’s entire VM. On large processes (multi-GB heaps), a single cat smaps can take seconds and cause scheduling glitches. Prefer statm for cheap memory snapshots.

  9. readlink /proc/$pid/exe may say (deleted) if the process’s binary was upgraded after the process started. The pattern /usr/bin/myapp (deleted) means “restart this process to pick up the new binary.”

  10. Per-PID files are permission-checked with ptrace-style access rules. Reading another user’s /proc/$pid/environ, maps, or mem fails without CAP_SYS_PTRACE, and hardening such as the hidepid mount option or kernel.yama.ptrace_scope can restrict access further. Surface a clear error in scripts that depend on these.

  11. Inside containers, /proc/sys/... is largely read-only or namespaced. Don’t assume sysctl writes from inside a container will persist; many net.* and vm.* are host-only.

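  12. /sys/class/net/eth0/statistics/rx_bytes is a 64-bit counter on 64-bit kernels, so wrap-around is not a practical concern: even at 100 Gbps it takes roughly 46 years to overflow. A 32-bit counter (for example SNMP’s ifInOctets) wraps in about 34 seconds at 1 Gbps. Counters also restart from zero when an interface is re-created or its driver reloaded, so use deltas and handle a value that goes backwards if your tooling runs long.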
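For the last item, a delta helper that tolerates a counter going backwards might look like the sketch below. counter_delta is a hypothetical name; the reset-versus-wrap policy is an assumption you may want to change, and values at or above 2^63 exceed what shell arithmetic can represent.

```shell
# counter_delta PREV CUR [BITS]: difference between two counter samples.
# BITS defaults to 64; pass 32 for counters known to be 32 bits wide.
counter_delta() {
  local prev="$1" cur="$2" bits="${3:-64}"
  if [ "$cur" -ge "$prev" ]; then
    echo $(( cur - prev ))
  elif [ "$bits" -eq 32 ]; then
    # A 32-bit counter wrapped once: add the modulus back.
    echo $(( cur + 4294967296 - prev ))
  else
    # A 64-bit counter going backwards almost always means a reset
    # (interface re-created, driver reloaded): count from zero.
    echo "$cur"
  fi
}
```

For example, counter_delta 100 250 prints 150, and counter_delta 4294967290 10 32 prints 16.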

Quick-Reference Card

┌─ /proc/$pid/ — PER-PROCESS STATE ─────────────────────────────────────┐
│  status       human KV: name, state, uid, ppid, threads, mem        │
│  stat         single line, machine-parseable                         │
│  statm        memory in pages: size, resident, shared, ...           │
│  cmdline      NUL-separated argv                                     │
│  environ      NUL-separated env (root/owner only)                    │
│  fd/          open file descriptors (symlinks)                       │
│  maps         memory map: ranges, perms, mapped files                │
│  smaps        detailed per-mapping accounting (slow on big procs)    │
│  ns/          namespaces (net, mnt, pid, user, uts, ipc, cgroup)     │
│  io           bytes read/written                                     │
│  limits       rlimits                                                │
│  sched        scheduler statistics                                   │
└────────────────────────────────────────────────────────────────────────┘

┌─ GLOBAL /proc ENTRIES ────────────────────────────────────────────────┐
│  /proc/cpuinfo, /proc/meminfo, /proc/loadavg                         │
│  /proc/mounts, /proc/swaps, /proc/diskstats                          │
│  /proc/net/{tcp,udp,unix,dev,route,arp}                              │
│  /proc/sys/...        sysctl tunables exposed as files               │
│  /proc/version, /proc/cmdline (kernel boot args)                     │
└────────────────────────────────────────────────────────────────────────┘

┌─ /sys — DEVICE / DRIVER ─────────────────────────────────────────────┐
│  /sys/class/net/<iface>/{operstate,mtu,statistics/}                  │
│  /sys/class/block/<dev>/{size,queue/scheduler}                       │
│  /sys/devices/system/cpu/cpu<n>/cpufreq/scaling_governor             │
│  /sys/class/thermal/thermal_zone*/temp                               │
│  /sys/module/<mod>/parameters/*    runtime module params             │
└────────────────────────────────────────────────────────────────────────┘

┌─ sysctl LIFECYCLE ────────────────────────────────────────────────────┐
│  sysctl -a                          dump all                          │
│  sysctl -n KEY                      read one (no key= prefix)         │
│  sysctl -w KEY=VAL                  runtime-only set                  │
│  sysctl --system                    reload /etc/sysctl.d/*.conf       │
│  Persist by writing /etc/sysctl.d/99-NAME.conf                       │
│  Files read in lex order; later wins                                 │
└────────────────────────────────────────────────────────────────────────┘

┌─ HIGH-VALUE TUNABLES ─────────────────────────────────────────────────┐
│  vm.swappiness                  0–10 for DB; 60 default              │
│  vm.overcommit_memory           1=always; 2=strict (paranoid)        │
│  net.core.somaxconn             4096+ for high-RPS                    │
│  net.ipv4.tcp_rmem/wmem         "4096 87380 16777216" for big BDP    │
│  fs.file-max                    2097152 for fd-heavy hosts           │
│  fs.inotify.max_user_watches    524288+ for k8s/IDE hosts            │
│  kernel.pid_max                 4194304 for high-fork                │
└────────────────────────────────────────────────────────────────────────┘

What’s Next

You can now read process and kernel state from /proc and /sys, and tune the kernel via sysctl. The next layer is integration with container and cluster tooling: how shell scripts safely interact with docker, podman, and kubectl. The next lesson, Container Interactions: docker/podman exec, kubectl Pipelines & jq-Driven Inspection, covers script-driven container lifecycle, log collection, exec with proper stdin/tty handling, and parsing kubectl JSON output with jq for automation.
