Why /proc, /sys, and sysctl Matter for Shell Operators
Most monitoring you do from shell — checking memory, listing open files, reading network connections, looking at namespaces, tuning kernel limits — does not need a special tool. The kernel exposes everything you need as text files under three magic directories:
- /proc: process and kernel state. Per-process data lives in /proc/$pid/. Global kernel state lives in /proc/<various> (e.g. /proc/meminfo, /proc/cpuinfo).
- /proc/sys: runtime-tunable kernel parameters. The same values sysctl shows, but exposed as files: cat /proc/sys/net/ipv4/ip_forward, echo 1 > /proc/sys/net/ipv4/ip_forward.
- /sys: device and driver state. Per-device knobs live under /sys/class/, /sys/devices/, /sys/block/. CPU frequency scaling, block-device queue depth, and network interface flags all live here.
When you understand these three filesystems, a huge category of tooling becomes “just use cat and echo”:
- “How much memory is my Java process using?” → cat /proc/$pid/status
- “What network namespace is this container in?” → readlink /proc/$pid/ns/net
- “Which files does this process have open?” → ls -l /proc/$pid/fd/
- “How much swap is being used right now?” → awk '/^Swap/' /proc/meminfo
- “Tune TCP buffers on a flaky network” → sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
- “Persist across reboots” → write to /etc/sysctl.d/99-tcp-tuning.conf
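As a small illustration of the "just use cat and awk" idea, here is a sketch that turns the swap question into a single number by subtracting SwapFree from SwapTotal. The function name swap_used_kb and its optional file argument are assumptions made for this example (the argument only exists so the function can be pointed at a test file).

```bash
# swap_used_kb [FILE]: print swap in use (kB), computed as SwapTotal - SwapFree.
# FILE defaults to /proc/meminfo.
swap_used_kb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free  = $2 }
        END { print total - free }
    ' "${1:-/proc/meminfo}"
}
```

Call it as swap_used_kb with no arguments on a live system.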
This lesson covers the layout, the conventions, the persistence model, and a lib/proc.sh of helper functions you can use to query and tune from any script.
/proc — The Process Filesystem
/proc is a virtual filesystem. The files don’t exist on disk; the kernel synthesizes them every time you read. This has consequences:
- You cannot trust file size. stat /proc/meminfo shows size 0; the actual content materializes only on read.
- Reads are not atomic with respect to the underlying state. The contents of /proc/$pid/maps can change between consecutive reads if the process is allocating memory.
- Some files are accessible only to root or to the process owner (look at the file’s owner: ls -l /proc/1/maps is root, ls -l /proc/$$/maps is you).
- Some entries disappear when the process exits. Operating on /proc/$pid/... is racy with the process’s lifecycle.
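One way to live with the lifecycle race is to skip the existence check and treat a failed read as "the process is gone". The sketch below assumes a helper named proc_read and a PROC_ROOT override; both are hypothetical names introduced here, and PROC_ROOT exists only so the function can be exercised against a test directory.

```bash
# proc_read PID FILE: print /proc/PID/FILE, or return 1 if the process
# (or the file) is gone or unreadable.
proc_read() {
    local path="${PROC_ROOT:-/proc}/$1/$2"
    # Try the read directly: checking for the directory first only widens
    # the window between the check and the read.
    cat "$path" 2>/dev/null || return 1
}
```

A caller can then write status=$(proc_read "$pid" status) || echo "pid $pid vanished" >&2 and carry on.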
Per-process layout
The directory /proc/$pid/ (where $pid is a PID, or self for the current process) contains:
/proc/$pid/
├── status # human-readable summary: name, state, uid, mem, threads
├── stat # space-separated, one line; same data, machine-readable
├── statm # memory in pages (size, resident, shared, ...)
├── cmdline # NUL-separated argv (the actual process arguments)
├── environ # NUL-separated environment (root or owner only)
├── exe # symlink → the executable file
├── cwd # symlink → current working directory
├── root # symlink → root directory (different in chroots)
├── fd/ # one symlink per open file descriptor
│ ├── 0 -> /dev/pts/2
│ ├── 1 -> /dev/null
│ └── 4 -> /var/log/myapp.log
├── fdinfo/ # offset, flags per fd
├── maps # VM memory map: ranges, perms, mapped files
├── smaps # detailed per-mapping memory accounting
├── io # bytes read/written by the process
├── limits # rlimit values: max files, stack, ...
├── ns/ # namespaces: net, mnt, pid, user, uts, ipc, cgroup
│ ├── net -> 'net:[4026531992]'
│ └── mnt -> 'mnt:[4026531840]'
├── cgroup # cgroup memberships
├── sched # scheduler statistics
├── stack # current kernel stack trace (CONFIG_STACKTRACE)
└── task/$tid/ # one subdirectory per thread (same layout as $pid/)
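Because cmdline and environ are NUL-separated, printing them with cat runs the fields together. A minimal sketch for showing one argument per line follows; the function name proc_cmdline is an assumption for illustration, and an argument that itself contains a newline would still be ambiguous in this output.

```bash
# proc_cmdline [FILE]: print a NUL-separated argv file, one argument per line.
# FILE defaults to /proc/self/cmdline.
proc_cmdline() {
    tr '\0' '\n' < "${1:-/proc/self/cmdline}"
}
```

For a specific process, pass the path: proc_cmdline "/proc/$pid/cmdline".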
Recipe: read process info portably
# Read PID, name, state, parent, thread count, and RSS from /proc/$pid/status.
proc_status() {
    local pid="$1"
    [[ -d "/proc/$pid" ]] || { echo "no such pid: $pid" >&2; return 1; }
    local key value name state ppid threads vmrss
    # Lines look like "VmRSS:<TAB>    4096 kB". Splitting on both tab and space
    # puts the label in $key and the rest, without leading padding, in $value.
    while IFS=$' \t' read -r key value; do
        case "$key" in
            "Name:")    name=$value ;;
            "State:")   state=$value ;;
            "PPid:")    ppid=$value ;;
            "Threads:") threads=$value ;;
            "VmRSS:")   vmrss=$value ;;   # absent for kernel threads
        esac
    done < "/proc/$pid/status"
    printf 'pid=%s name=%s state=%s ppid=%s threads=%s rss=%s\n' \
        "$pid" "$name" "$state" "$ppid" "$threads" "$vmrss"
}
proc_status $$
# pid=12345 name=bash state=S (sleeping) ppid=12340 threads=1 rss=4096 kB
Recipe: list a process’s open files (tiny lsof)
proc_fds() {
local pid="$1"
[[ -d "/proc/$pid/fd" ]] || return 1
local fd target
for fd in /proc/$pid/fd/*; do
target=$(readlink "$fd" 2>/dev/null) || continue
printf 'fd=%-3s target=%s\n' "$(basename "$fd")" "$target"
done
}
proc_fds 1234
# fd=0 target=/dev/null
# fd=1 target=pipe:[123456]
# fd=2 target=/var/log/myapp.log
# fd=4 target=socket:[789012]
This is the same data lsof reads. A bare lsof walks every PID on the system; when you already know the PID you care about, listing /proc/$pid/fd/ directly is much faster (one directory read plus a readlink per descriptor).
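When a process has hundreds of descriptors, a count by kind is often more useful than the full list. The sketch below reads link targets on stdin and groups them; fd_type_summary is a hypothetical name, and the four categories are a simplification chosen for this example.

```bash
# fd_type_summary: read fd link targets (one per line) on stdin and count
# them by kind: socket, pipe, anon_inode, or file (anything else).
fd_type_summary() {
    awk '
        /^socket:/     { n["socket"]++; next }
        /^pipe:/       { n["pipe"]++; next }
        /^anon_inode:/ { n["anon_inode"]++; next }
                       { n["file"]++ }
        END { for (k in n) printf "%s=%d\n", k, n[k] }
    ' | sort
}
```

Feed it from the fd directory: for fd in "/proc/$pid/fd"/*; do readlink "$fd"; done | fd_type_summary.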
Recipe: identify a socket from its inode
socket:[INODE] from /proc/$pid/fd/ is opaque. Resolve it via /proc/net/tcp (or /proc/net/udp):
# /proc/net/tcp columns:
# sl local_address rem_address st tx_queue:rx_queue tr:tm->when retrnsmt uid timeout inode
# 0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 12345
socket_inode_to_addr() {
local inode="$1"
awk -v ino="$inode" '
NR>1 && $10==ino {
# local_address is hex IP:hex PORT, little-endian for IP.
split($2, a, ":")
ip=a[1]; port=a[2]
# Convert IP from hex little-endian to dotted decimal.
printf "%d.%d.%d.%d:%d state=%s\n",
strtonum("0x"substr(ip,7,2)), strtonum("0x"substr(ip,5,2)),
strtonum("0x"substr(ip,3,2)), strtonum("0x"substr(ip,1,2)),
strtonum("0x"port), $4
}' /proc/net/tcp
}
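The format is terse but documented. Note that strtonum is a gawk extension, so the function above needs gawk, and that /proc/net/tcp lists IPv4 sockets only (IPv6 sockets are in /proc/net/tcp6). Most production scripts use ss -tnp or lsof -i instead, but knowing where the data comes from helps when those tools are unavailable (minimal containers).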
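Minimal containers often ship mawk or busybox awk rather than gawk, so here is a sketch of the same address decoding without strtonum. The name decode_tcp_addr is an assumption for illustration; it handles the IPv4 format only and assumes the byte order seen on little-endian hosts such as x86.

```bash
# decode_tcp_addr ADDR: convert a /proc/net/tcp address such as
# "0100007F:1F90" to dotted-decimal "127.0.0.1:8080".
decode_tcp_addr() {
    printf '%s\n' "$1" | awk -F: '
        # Portable hex-to-decimal: look each digit up in a string.
        function hex2dec(h,    i, d, n) {
            h = toupper(h); n = 0
            for (i = 1; i <= length(h); i++) {
                d = index("0123456789ABCDEF", substr(h, i, 1)) - 1
                n = n * 16 + d
            }
            return n
        }
        {
            # The IP is a little-endian 32-bit word, so read its four
            # bytes from right to left.
            printf "%d.%d.%d.%d:%d\n",
                hex2dec(substr($1, 7, 2)), hex2dec(substr($1, 5, 2)),
                hex2dec(substr($1, 3, 2)), hex2dec(substr($1, 1, 2)),
                hex2dec($2)
        }'
}
```

For example, decode_tcp_addr 0100007F:1F90 prints 127.0.0.1:8080.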
Recipe: detect what namespace a process is in
# Each ns symlink has the form 'net:[INODE]'. Two PIDs in the same namespace
# share the same inode.
ns_id() { readlink "/proc/$1/ns/$2"; }
# Compare two processes' namespaces:
[[ "$(ns_id 1234 net)" == "$(ns_id 5678 net)" ]] && echo "same network ns"
# Find the host's network namespace inode (PID 1 is init):
ns_id 1 net
# Output: net:[4026531992]
This is how nsenter and container tooling figure out which namespace to enter. For diagnostics: “is this process in a different network namespace from the host?” — compare ns_id $pid net with ns_id 1 net.
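The same comparison can be run across every namespace type at once. This is a sketch under the assumption that you pass two ns directories (for example /proc/1234/ns and /proc/1/ns); ns_diff is a hypothetical name, and reading another user's ns links usually needs root.

```bash
# ns_diff DIR_A DIR_B: print the namespace types whose link targets differ
# between two ns directories. Types missing from either side are skipped.
ns_diff() {
    local a="$1" b="$2" ns ta tb
    for ns in cgroup ipc mnt net pid user uts; do
        ta=$(readlink "$a/$ns" 2>/dev/null) || continue
        tb=$(readlink "$b/$ns" 2>/dev/null) || continue
        [ "$ta" = "$tb" ] || printf '%s: %s vs %s\n' "$ns" "$ta" "$tb"
    done
}
```

Empty output from ns_diff "/proc/$pid/ns" /proc/1/ns means the process shares all of the listed namespaces with PID 1.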
Recipe: scrape memory layout from /proc/$pid/maps
# /proc/$pid/maps lines:
# 7fc0a8e4f000-7fc0a9000000 r-xp 00000000 fd:00 524294 /usr/lib/x86_64-linux-gnu/libc-2.31.so
proc_libs() {
local pid="$1"
awk '$6 ~ /\.so/ { print $6 }' "/proc/$pid/maps" | sort -u
}
proc_libs $$
# /usr/lib/.../libc.so.6
# /usr/lib/.../libdl.so.2
# /usr/lib/.../libtinfo.so.6
maps reveals every shared library, every mapped file, every executable region. For forensics: “is this binary loading something it shouldn’t?” → grep maps for unexpected paths.
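Beyond unexpected paths, a common forensic check is for regions that are both writable and executable. The sketch below filters on the permission column; maps_wx is a hypothetical name, and hits are only a lead to investigate, since JIT runtimes can map such regions legitimately.

```bash
# maps_wx [FILE]: print mappings whose permission string contains both
# "w" and "x". FILE defaults to /proc/self/maps. Anonymous mappings have
# no path column and are labelled [anon].
maps_wx() {
    awk '$2 ~ /w/ && $2 ~ /x/ { print $1, $2, ($6 == "" ? "[anon]" : $6) }' \
        "${1:-/proc/self/maps}"
}
```

Run it against a target with maps_wx "/proc/$pid/maps".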
/sys — The Device Filesystem
/sys is similar in spirit to /proc but tied to the kernel’s device model. The shape:
/sys/
├── class/ # by-functionality view (block, net, leds, thermal)
│ ├── net/eth0/ # symlink to /sys/devices/.../eth0
│ │ ├── operstate # 'up' | 'down'
│ │ ├── mtu # 1500
│ │ └── statistics/
│ │ └── rx_bytes
│ └── block/sda/
│ ├── size # in 512-byte sectors
│ └── queue/scheduler # 'mq-deadline [bfq] none'
├── devices/ # the underlying device tree (PCI, USB, ...)
├── module/ # loaded kernel modules and their parameters
└── kernel/ # kernel state knobs (rcu, debug, ...)
Useful one-liners
# Network interface link state.
cat /sys/class/net/eth0/operstate # up
# Total bytes received on eth0 (no parsing /proc/net/dev needed).
cat /sys/class/net/eth0/statistics/rx_bytes
# Block device size in bytes.
echo $(( $(cat /sys/class/block/sda/size) * 512 ))
# Current I/O scheduler for sda.
cat /sys/class/block/sda/queue/scheduler # mq-deadline [bfq] none
# Change it (writable by root; use one of the names listed above):
echo mq-deadline > /sys/class/block/sda/queue/scheduler
# CPU frequency governor.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor # performance / powersave / ...
echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# Thermal zone temperature (millidegrees C).
cat /sys/class/thermal/thermal_zone0/temp # 47000 means 47.000 °C
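The per-interface files combine naturally into a one-line-per-interface report. A minimal sketch, assuming a function named iface_summary with an optional base-directory argument (both introduced here for illustration; the argument makes it testable away from a real /sys):

```bash
# iface_summary [BASE]: print "name state mtu rx_bytes" for each interface
# directory under BASE (default /sys/class/net).
iface_summary() {
    local base="${1:-/sys/class/net}" dir name state mtu rx
    for dir in "$base"/*/; do
        [ -r "${dir}operstate" ] || continue   # not an interface directory
        name=$(basename "$dir")
        read -r state < "${dir}operstate"
        read -r mtu   < "${dir}mtu"
        read -r rx    < "${dir}statistics/rx_bytes"
        printf '%s %s %s %s\n' "$name" "$state" "$mtu" "$rx"
    done
}
```

On a live host, iface_summary with no arguments lists every interface the kernel knows about.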
/sys vs sysctl
/sys is per-device. sysctl (which is /proc/sys/...) is per-subsystem and mostly global. Setting a single network interface’s MTU is /sys/class/net/eth0/mtu. Setting system-wide TCP buffer sizes is sysctl net.ipv4.tcp_rmem. The two trees expose different knobs, so learn both.
sysctl — Runtime Kernel Tuning
sysctl is the canonical interface for kernel tunables. The mapping is mechanical:
# These two read the same setting: dots in the key become slashes under /proc/sys.
sysctl net.ipv4.ip_forward
cat /proc/sys/net/ipv4/ip_forward
# Read all current values.
sysctl -a # huge; pipe through grep for what you care about
# Read one.
sysctl -n net.ipv4.ip_forward # -n: print value only, no key=value
# Write (runtime-only; lost on reboot).
sysctl -w net.ipv4.ip_forward=1
# Equivalent:
echo 1 > /proc/sys/net/ipv4/ip_forward
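The key-to-path mapping can be written down as a pair of tiny helpers. This is a sketch with hypothetical names (sysctl_key_to_path, sysctl_path_to_key), and it assumes no component of the key contains a dot; interface names such as eth0.100 break the simple substitution.

```bash
# sysctl_key_to_path KEY: map a dotted sysctl key to its /proc/sys path.
sysctl_key_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# sysctl_path_to_key PATH: the reverse mapping.
sysctl_path_to_key() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}
```

This is handy when sysctl itself is missing: cat "$(sysctl_key_to_path vm.swappiness)" reads the value directly.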
Persistence: /etc/sysctl.conf vs /etc/sysctl.d/
Settings written via sysctl -w or echo > /proc/sys/... are runtime-only. To persist across reboot, write them to a config file that’s applied at boot.
The three directories, plus the legacy file (modern systemd-based distros):
- /usr/lib/sysctl.d/*.conf: distro and package defaults (e.g. 50-default.conf).
- /run/sysctl.d/*.conf: runtime overrides (rarely used).
- /etc/sysctl.d/*.conf: your overrides.
- /etc/sysctl.conf: legacy single-file config (still supported; consider deprecated).
All of the .conf files are sorted by file name in lexicographic order, regardless of which directory they are in, and when several files set the same key the one that sorts last wins. A file in /etc/sysctl.d/ also replaces a same-named file in /run/sysctl.d/ or /usr/lib/sysctl.d/. This is why convention is 99-myapp.conf (applied last) for overrides and 10-something.conf (applied early) for defaults. Use /etc/sysctl.d/, not /etc/sysctl.conf: multiple management tools cooperate via per-tool files, and per-tool files are easier to enable or disable.
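To see which files would actually be applied and in what order, the rules described in sysctl.d(5) can be approximated in a few lines: a file masks same-named files in lower-precedence directories, and the survivors are sorted by file name across all directories. The function name sysctl_conf_order is an assumption, and this sketch does not model /etc/sysctl.conf or any tool-specific differences.

```bash
# sysctl_conf_order DIR...: list the *.conf files that would be applied, in
# application order. Pass directories highest-precedence first.
sysctl_conf_order() {
    local dir file tab
    tab=$(printf '\t')
    for dir in "$@"; do
        for file in "$dir"/*.conf; do
            # Emit "basename<TAB>path"; skip the unexpanded glob of an empty dir.
            [ -e "$file" ] && printf '%s\t%s\n' "$(basename "$file")" "$file"
        done
    done |
        awk -F'\t' '!seen[$1]++' |        # keep the first (highest-precedence) copy
        LC_ALL=C sort -t "$tab" -k1,1 |   # order by file name, not directory
        cut -f2
}
```

Typical use: sysctl_conf_order /etc/sysctl.d /run/sysctl.d /usr/lib/sysctl.d.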
Recipe: persist a sysctl change with rollback
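sysctl_persist() {
    local key="$1" value="$2" reason="${3:-no reason given}"
    local file="/etc/sysctl.d/99-$(printf '%s' "$key" | tr '.' '-').conf"
    local prev actual
    # Snapshot the current value so a failed apply can be rolled back.
    prev=$(sysctl -n "$key") || return 1
    # Apply now (so the change takes effect without a reboot), then re-read.
    # Multi-value keys print tab-separated, so normalize tabs to spaces.
    sysctl -w "$key=$value" >/dev/null
    actual=$(sysctl -n "$key" | tr '\t' ' ')
    if [[ "$actual" != "$value" ]]; then
        echo "sysctl_persist: failed to apply $key=$value (got '$actual'); restoring '$prev'" >&2
        sysctl -w "$key=$prev" >/dev/null
        return 1
    fi
    # Only persist a value the kernel actually accepted.
    cat >"$file" <<EOF
# Set by sysctl_persist on $(date -u +%FT%TZ): $reason
$key = $value
EOF
}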
# Usage:
sysctl_persist net.ipv4.ip_forward 1 "enable IP forwarding for k8s networking"
sysctl_persist vm.swappiness 10 "favor cache over swap on this DB host"
Recipe: validate sysctl values before applying
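Some sysctl values are range-checked by the kernel: an out-of-range write can be rejected outright, and a few keys adjust the value they store. Snapshot the old value, check the exit status, and re-read after writing: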
sysctl_safe_set() {
local key="$1" value="$2"
# Snapshot current value for rollback.
local prev
prev=$(sysctl -n "$key" 2>/dev/null) || { echo "no such sysctl: $key" >&2; return 1; }
# Apply.
if ! sysctl -w "$key=$value" >/dev/null 2>&1; then
echo "sysctl rejected $key=$value; keeping $prev" >&2
return 1
fi
# Validate via re-read (kernel may have clamped).
local actual
actual=$(sysctl -n "$key")
if [[ "$actual" != "$value" ]]; then
echo "warning: requested $value, kernel set $actual (clamping)" >&2
fi
}
Recipe: compare running values with declared values to detect drift
# Compare the running kernel's values to /etc/sysctl.d/* declared values.
sysctl_drift() {
local file actual declared key value
for file in /etc/sysctl.d/*.conf; do
while IFS='=' read -r key value; do
[[ -z "${key// }" || "${key:0:1}" == "#" ]] && continue
key="${key// /}"; value="${value# }"
actual=$(sysctl -n "$key" 2>/dev/null) || continue
if [[ "$actual" != "$value" ]]; then
printf '%s: declared=%s actual=%s (file=%s)\n' "$key" "$value" "$actual" "$file"
fi
done < "$file"
done
}
Useful in CI: “did someone change a sysctl at runtime so that it no longer matches the persistent config?” Catch drift before an audit, not during one.
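Drift checks are only as good as the parsing behind them, and sysctl.d files allow more variation than a plain IFS='=' split handles: comments starting with ";", tabs around the "=", and multi-value settings. The sketch below normalizes a file to key=value lines. The name sysctl_conf_parse is hypothetical, and the leading "-" prefix that sysctl.d allows on keys is not handled.

```bash
# sysctl_conf_parse [FILE...]: print "key=value" for each setting (reads
# stdin if no FILE is given). Skips blank lines and "#" or ";" comments,
# trims whitespace, and collapses whitespace runs inside the value.
sysctl_conf_parse() {
    awk '
        /^[ \t]*$/     { next }             # blank line
        /^[ \t]*[#;]/  { next }             # comment
        {
            eq = index($0, "=")
            if (eq == 0) next               # not a key = value line
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/[ \t]+/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            gsub(/[ \t]+/, " ", val)
            print key "=" val
        }' "$@"
}
```

When comparing against the running kernel, normalize that side too, for example with sysctl -n "$key" | tr '\t' ' ', because multi-value keys print tab-separated.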
/proc/sys Tunables Worth Knowing
A reference of high-leverage tunables most production hosts touch:
| Tunable | Meaning | Common values |
|---|---|---|
| vm.swappiness | 0=never swap unless OOM, 100=swap aggressively | 1–10 for DB; 60 default |
| vm.overcommit_memory | 0=heuristic, 1=always allow, 2=strict accounting | 1 for Redis; 2 for paranoid |
| vm.dirty_ratio | % of RAM dirty before sync writeback blocks | 10–20 (default 20) |
| vm.dirty_background_ratio | % of RAM dirty before bg writeback starts | 5–10 (default 10) |
| net.ipv4.ip_forward | Enable routing | 1 for routers/k8s nodes |
| net.ipv4.tcp_fin_timeout | Seconds an orphaned connection may stay in FIN_WAIT_2 (not the TIME_WAIT length) | 15–30 for high-conn servers (default 60) |
| net.ipv4.tcp_tw_reuse | Reuse TIME_WAIT sockets | 1 for outbound-heavy clients |
| net.core.somaxconn | Listen backlog cap | 4096+ for high-RPS servers |
| net.core.rmem_max / wmem_max | Max socket buffer | 16777216 for big BDP links |
| net.ipv4.tcp_rmem / tcp_wmem | TCP buffer min/default/max | 4096 87380 16777216 |
| net.ipv4.tcp_keepalive_time | Idle before keepalives | 300 (default 7200) |
| fs.file-max | System-wide max open files | 2097152+ for fd-heavy hosts |
| fs.inotify.max_user_watches | inotify watches per user | 524288+ for k8s hosts |
| kernel.pid_max | Max PID value | 4194304 for high-fork hosts |
| kernel.panic_on_oops | Panic on kernel oops (cluster reset) | 0 default; 1 in HA |
Add kernel.dmesg_restrict=1 and kernel.kptr_restrict=2 for hardening.
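The socket-buffer rows in the table refer to "big BDP links". The bandwidth-delay product is the number of bytes in flight needed to keep a link full: bandwidth in bytes per second times round-trip time in seconds. A small sketch for sizing buffers follows; bdp_bytes is a hypothetical helper, and it takes megabits per second and milliseconds so that everything stays in integer arithmetic.

```bash
# bdp_bytes MBPS RTT_MS: bandwidth-delay product in bytes.
#   bytes = (MBPS * 1,000,000 / 8) * (RTT_MS / 1000) = MBPS * RTT_MS * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}
```

For example, bdp_bytes 1000 100 gives 12500000 for a 1 Gbps link with a 100 ms round trip, which is the scale at which a 16777216-byte maximum starts to make sense.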
Putting It Together: lib/proc.sh
# lib/proc.sh — process and kernel introspection helpers.
# ─── Process queries ───────────────────────────────────────────────────────
proc_exists() { [[ -d "/proc/$1" ]]; }
proc_name() {
[[ -r "/proc/$1/comm" ]] && cat "/proc/$1/comm"
}
# Read /proc/$pid/status into associative-array-style output.
proc_status_kv() {
local pid="$1" key value
while IFS=$'\t' read -r key value; do
key="${key%:}"
printf '%s=%s\n' "$key" "$value"
done < "/proc/$pid/status"
}
# RSS in kB.
proc_rss() {
awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}
# UID owning the process.
proc_uid() {
awk '/^Uid:/ {print $2}' "/proc/$1/status"
}
# Walk children of a PID.
proc_children() {
    local parent="$1" pid
    for pid in /proc/[0-9]*; do
        pid=${pid##*/}
        [[ "$(awk '/^PPid:/ {print $2}' "/proc/$pid/status" 2>/dev/null)" == "$parent" ]] \
            && echo "$pid"
    done
    return 0
}
# ─── Open-file inspection (mini lsof) ──────────────────────────────────────
proc_open_files() {
local pid="$1" fd target
for fd in /proc/$pid/fd/*; do
target=$(readlink "$fd" 2>/dev/null) || continue
printf '%s\t%s\n' "$(basename "$fd")" "$target"
done
}
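# Find PIDs that have a given path open, including a file that has been
# unlinked (its fd link target then reads "<path> (deleted)").
proc_holders_of() {
    local path="$1" pid fd target
    path=$(readlink -f "$path")
    for pid in /proc/[0-9]*; do
        pid=${pid##*/}
        for fd in "/proc/$pid/fd"/*; do
            target=$(readlink "$fd" 2>/dev/null) || continue
            if [[ "$target" == "$path" || "$target" == "$path (deleted)" ]]; then
                echo "$pid"
                break
            fi
        done
    done
}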
# ─── Namespace inspection ──────────────────────────────────────────────────
ns_inode() { readlink "/proc/$1/ns/$2"; }
ns_same_as_host() {
[[ "$(ns_inode "$1" "$2")" == "$(ns_inode 1 "$2")" ]]
}
# ─── sysctl helpers ────────────────────────────────────────────────────────
sysctl_get() { sysctl -n "$1" 2>/dev/null; }
sysctl_set() {
local key="$1" value="$2"
sysctl -w "$key=$value" >/dev/null
}
sysctl_persist() {
local key="$1" value="$2" reason="${3:-managed}"
local fname
fname=$(printf '%s' "$key" | tr '.' '-')
local file="/etc/sysctl.d/99-${fname}.conf"
cat >"$file" <<EOF
# Managed: $reason
# Set by lib/proc.sh on $(date -u +%FT%TZ)
$key = $value
EOF
chmod 0644 "$file"
sysctl_set "$key" "$value"
}
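# Stop managing a key: remove the managed file and reload the remaining config.
# The running value changes only if another file still sets this key;
# otherwise it stays as it is until the next reboot.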
sysctl_unmanage() {
local key="$1"
local fname
fname=$(printf '%s' "$key" | tr '.' '-')
rm -f "/etc/sysctl.d/99-${fname}.conf"
sysctl --system >/dev/null
}
# ─── /sys helpers ──────────────────────────────────────────────────────────
block_size_bytes() {
local dev="$1" # e.g. sda
local sectors
sectors=$(cat "/sys/class/block/${dev}/size" 2>/dev/null) || return 1
echo $((sectors * 512))
}
iface_link() { cat "/sys/class/net/$1/operstate" 2>/dev/null; }
iface_mtu() { cat "/sys/class/net/$1/mtu" 2>/dev/null; }
iface_rx_bytes() { cat "/sys/class/net/$1/statistics/rx_bytes" 2>/dev/null; }
iface_tx_bytes() { cat "/sys/class/net/$1/statistics/tx_bytes" 2>/dev/null; }
# Compute throughput between two snapshots.
iface_throughput_bps() {
local iface="$1" interval="${2:-1}"
local r1 r2
r1=$(iface_rx_bytes "$iface")
sleep "$interval"
r2=$(iface_rx_bytes "$iface")
echo $(( (r2 - r1) * 8 / interval ))
}
Real-World Recipes
Recipe 1: Find what’s holding /var/log/myapp.log open
. lib/proc.sh
proc_holders_of /var/log/myapp.log
# 12345
# 12346
ps -fp 12345,12346
# Output: which processes still have the deleted log open
This is the “why isn’t my disk space freed after rm?” debugging tool. Restart the listed processes or close their fds and the kernel reclaims the inode.
Recipe 2: Tune for a database host
# A reasonable baseline for a Postgres host.
sysctl_persist vm.swappiness 1 "DB host: avoid swapping"
sysctl_persist vm.dirty_background_ratio 5 "smaller writeback bursts"
sysctl_persist vm.dirty_ratio 10 "smaller writeback bursts"
sysctl_persist vm.overcommit_memory 2 "strict accounting; refuse oversubscription"
sysctl_persist vm.overcommit_ratio 80 "with 20% reserved for kernel"
sysctl_persist net.core.somaxconn 4096 "DB connection pool backlog"
sysctl_persist fs.file-max 2097152 "many DB connections + WAL files"
sysctl_persist kernel.shmmax 17179869184 "for big shared_buffers"
# Each call above already applied its value; reload once more to confirm the files parse cleanly.
sysctl --system # reload all /etc/sysctl.d/*.conf
Recipe 3: Audit drift between expected and actual sysctl
# CI check: read a manifest of expected sysctl values and compare.
audit_sysctl_manifest() {
local manifest="$1" key value actual fail=0
while IFS='=' read -r key value; do
[[ -z "${key// }" || "${key:0:1}" == "#" ]] && continue
key="${key// /}"; value="${value# }"
actual=$(sysctl -n "$key" 2>/dev/null)
if [[ "$actual" != "$value" ]]; then
printf 'DRIFT %s: expected %s got %s\n' "$key" "$value" "$actual"
fail=1
fi
done < "$manifest"
return "$fail"
}
# manifest format:
# vm.swappiness = 1
# net.ipv4.ip_forward = 1
audit_sysctl_manifest /etc/myapp/sysctl-baseline.conf || exit 1
Recipe 4: Detect container vs host
# Container detection from /proc.
detect_container() {
    # Docker creates /.dockerenv in the container root (cheap, but Docker-only).
    if [[ -f /.dockerenv ]]; then echo docker; return; fi
    # On cgroup v1, PID 1's cgroup path usually names the runtime.
    if grep -qaE 'kubepods|docker|lxc' /proc/1/cgroup 2>/dev/null; then echo container; return; fi
    # Fallback heuristic: a host's PID 1 is normally systemd or init, while a
    # container's PID 1 is often the application itself. A container that runs
    # systemd as PID 1 will be misreported as host.
    if [[ "$(proc_name 1)" =~ ^(systemd|init)$ ]]; then echo host; else echo container; fi
}
/proc/1/cgroup contains the cgroup path; under cgroup v1 it usually mentions docker, kubepods, or lxc inside a container. Under cgroup v2 with a private cgroup namespace the path is often just /, so that check can miss. /.dockerenv exists only under Docker and can be hidden. Treat the result as a heuristic rather than a guarantee.
Recipe 5: Read scheduler stats for a tight-loop process
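# /proc/$pid/sched has cumulative scheduler stats. It is only present on kernels
# built with scheduler debugging, and wait_sum is only reported when schedstats
# are enabled.
sched_summary() {
    local pid="$1"
    awk '
        $1 ~ /(^|\.)sum_exec_runtime$/ { runtime = $3 }
        # Named se.statistics.wait_sum on older kernels; anchoring on the last
        # component also avoids matching iowait_sum.
        $1 ~ /(^|\.)wait_sum$/         { wait = $3 }
        /nr_voluntary_switches/        { vol = $3 }
        /nr_involuntary_switches/      { invol = $3 }
        END {
            printf "runtime_ms=%.1f wait_ms=%.1f vol_switches=%s invol_switches=%s\n",
                runtime, wait, vol, invol
        }' "/proc/$pid/sched"
}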
invol_switches rising fast = the process is being preempted by other CPU-hungry processes. wait_ms rising = the process is waiting in the run queue. Useful diagnostic when “the app is slow but CPU isn’t pegged.”
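When /proc/$pid/sched is not available, the context-switch counters can also be read from /proc/$pid/status, which carries voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines. A minimal sketch, with ctxt_switches as a hypothetical helper name:

```bash
# ctxt_switches [FILE]: print voluntary and involuntary context-switch
# counts from a /proc/PID/status file (default /proc/self/status).
ctxt_switches() {
    awk '
        $1 == "voluntary_ctxt_switches:"    { vol = $2 }
        $1 == "nonvoluntary_ctxt_switches:" { invol = $2 }
        END { printf "vol=%d invol=%d\n", vol, invol }
    ' "${1:-/proc/self/status}"
}
```

Sample it twice a few seconds apart with ctxt_switches "/proc/$pid/status" and compare the invol values to see how often the process is being preempted.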
Footgun List
- /proc/$pid is racy. A process can exit between your [[ -d /proc/$pid ]] and your cat /proc/$pid/status. Always handle “file vanished” gracefully.
- /proc/$pid/cmdline uses NUL separators, not spaces. cat shows them squished together. Use tr '\0' ' ' for human display, or xargs -0 to parse.
- /proc/sys/kernel/perf_event_paranoid defaults restrict perf for non-root. If your script invokes perf, expect to need root or a tuned perf_event_paranoid.
- sysctl --system reloads ALL /etc/sysctl.d files. If a stale file declares something destructive, --system will apply it. Audit periodically.
- /etc/sysctl.conf is loaded by some distros and ignored by others. Use /etc/sysctl.d/*.conf only for portability.
- Some sysctl values are clamped or rejected by the kernel. A write such as sysctl -w net.core.rmem_max=999999999 may fail or leave a different value than the one requested. Always check the exit status and re-read after writing.
- /sys writes can require specific timing. Writing to /sys/.../scheduler while the device is busy may fail with EBUSY. Stop I/O first if possible.
- /proc/$pid/smaps is expensive to read. It walks the process’s entire VM. On large processes (multi-GB heaps), a single cat smaps can take seconds and cause scheduling glitches. Prefer statm for cheap memory snapshots.
- readlink /proc/$pid/exe may say (deleted) if the process’s binary was upgraded after the process started. The pattern /usr/bin/myapp (deleted) means “restart this process to pick up the new binary.”
- Per-PID files are subject to ptrace_scope hardening. With kernel.yama.ptrace_scope=2 or 3, even root may need CAP_SYS_PTRACE to read /proc/$pid/environ or /proc/$pid/maps. Surface a clear error in scripts that depend on these.
- Inside containers, /proc/sys/... is largely read-only or namespaced. Don’t assume sysctl writes from inside a container will persist; many net.* and vm.* are host-only.
- /sys/class/net/eth0/statistics/rx_bytes is a cumulative counter. A 64-bit counter will not wrap in practice (at 100 Gbps it would take about 46 years), but a 32-bit counter wraps in roughly 34 seconds at 1 Gbps, and counters typically restart from zero when the interface is recreated or its driver is reloaded. Use deltas and handle a counter that goes backwards if your tooling runs long.
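The delta handling in the last item can be sketched as follows. The helper name counter_delta is an assumption, and so is the policy: a counter that goes backwards is treated as a restart from zero, so the current reading is used as the delta.

```bash
# counter_delta PREV CUR: print CUR - PREV, or CUR alone if the counter
# went backwards (assumed here to mean it restarted from zero).
counter_delta() {
    if [ "$2" -ge "$1" ]; then
        echo $(( $2 - $1 ))
    else
        echo "$2"
    fi
}
```

In a sampling loop, keep the previous reading and call counter_delta "$prev" "$cur" instead of subtracting directly, so a restart produces a small positive number rather than a huge negative one.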
Quick-Reference Card
┌─ /proc/$pid/ — PER-PROCESS STATE ─────────────────────────────────────┐
│ status human KV: name, state, uid, ppid, threads, mem │
│ stat single line, machine-parseable │
│ statm memory in pages: size, resident, shared, ... │
│ cmdline NUL-separated argv │
│ environ NUL-separated env (root/owner only) │
│ fd/ open file descriptors (symlinks) │
│ maps memory map: ranges, perms, mapped files │
│ smaps detailed per-mapping accounting (slow on big procs) │
│ ns/ namespaces (net, mnt, pid, user, uts, ipc, cgroup) │
│ io bytes read/written │
│ limits rlimits │
│ sched scheduler statistics │
└────────────────────────────────────────────────────────────────────────┘
┌─ GLOBAL /proc ENTRIES ────────────────────────────────────────────────┐
│ /proc/cpuinfo, /proc/meminfo, /proc/loadavg │
│ /proc/mounts, /proc/swaps, /proc/diskstats │
│ /proc/net/{tcp,udp,unix,dev,route,arp} │
│ /proc/sys/... sysctl tunables exposed as files │
│ /proc/version, /proc/cmdline (kernel boot args) │
└────────────────────────────────────────────────────────────────────────┘
┌─ /sys — DEVICE / DRIVER ─────────────────────────────────────────────┐
│ /sys/class/net/<iface>/{operstate,mtu,statistics/} │
│ /sys/class/block/<dev>/{size,queue/scheduler} │
│ /sys/devices/system/cpu/cpu<n>/cpufreq/scaling_governor │
│ /sys/class/thermal/thermal_zone*/temp │
│ /sys/module/<mod>/parameters/* runtime module params │
└────────────────────────────────────────────────────────────────────────┘
┌─ sysctl LIFECYCLE ────────────────────────────────────────────────────┐
│ sysctl -a dump all │
│ sysctl -n KEY read one (no key= prefix) │
│ sysctl -w KEY=VAL runtime-only set │
│ sysctl --system reload /etc/sysctl.d/*.conf │
│ Persist by writing /etc/sysctl.d/99-NAME.conf │
│ Files read in lex order; later wins │
└────────────────────────────────────────────────────────────────────────┘
┌─ HIGH-VALUE TUNABLES ─────────────────────────────────────────────────┐
│ vm.swappiness 1–10 for DB; 60 default │
│ vm.overcommit_memory 1=always; 2=strict (paranoid) │
│ net.core.somaxconn 4096+ for high-RPS │
│ net.ipv4.tcp_rmem/wmem "4096 87380 16777216" for big BDP │
│ fs.file-max 2097152 for fd-heavy hosts │
│ fs.inotify.max_user_watches 524288+ for k8s/IDE hosts │
│ kernel.pid_max 4194304 for high-fork │
└────────────────────────────────────────────────────────────────────────┘
What’s Next
You can now read process and kernel state from /proc and /sys, and tune the kernel via sysctl. The next layer is integration with container and cluster tooling: how shell scripts safely interact with docker, podman, and kubectl. The next lesson, Container Interactions: docker/podman exec, kubectl Pipelines & jq-Driven Inspection, covers script-driven container lifecycle, log collection, exec with proper stdin/tty handling, and parsing kubectl JSON output with jq for automation.