When a host is compromised, the question your responders ask is never “was the firewall on” — it is “what did the process do, as which user, to which file, and where did the bytes go.” Answering that needs a tamper-evident record of kernel-level activity, and on Linux that record has two complementary sources. auditd is the kernel’s own audit subsystem: authoritative, syscall-accurate, and the only thing a FIPS or PCI assessor accepts as the system of record. eBPF tooling layers on top: low-overhead, context-rich, and able to resolve the container ID and Kubernetes pod that the raw audit log knows nothing about. This guide builds both, wires them together, and ships the result somewhere you can query it.
If you take one thing away:
auditd is your compliance system of record and eBPF is your detection layer. They are not redundant. The audit log proves what the kernel saw; the eBPF layer tells you what it meant (which container, which image, which network peer) at a fraction of the per-event cost.
1. The auditd architecture: kernel, daemon, dispatcher
The Linux audit subsystem is three pieces, and confusing them is the source of most misconfiguration:
- The kernel audit layer. A ring buffer inside the kernel where audit records are generated. It is driven by rules loaded via the netlink socket. Nothing in userspace generates these events — the kernel does, at syscall boundaries and via the LSM hooks.
- auditd, the daemon. Reads records off the netlink socket and writes them to /var/log/audit/audit.log. It owns log rotation, disk-full behaviour, and the decision of what to do when the kernel produces events faster than they can be written.
- The dispatcher and plugins. A multiplexor that fans each record out to real-time consumers: syslog, a remote collector, an IDS. Historically this was a separate audispd process; as of audit 3.x the dispatcher is merged into auditd itself, and plugin configs live in /etc/audit/plugins.d/. The legacy /etc/audisp/ path is gone on modern RHEL 9 / Ubuntu 22.04+.
Confirm the daemon and version before you touch anything:
sudo auditctl -s # status: enabled, pid, backlog, lost, rate limits
auditctl -v # audit userspace version (expect 3.x on current distros)
sudo systemctl status auditd
A note that trips people up: on RHEL, systemctl restart auditd is deliberately refused. The unit sets RefuseManualStop=yes, because a stop issued through systemd does not carry the login UID of the person who asked for it, and the audit trail has to record who stopped auditing. Use service auditd restart, or reload rules without restarting the daemon at all, which is what you will do 99% of the time.
The auditctl -s output is the single most important diagnostic. Two fields matter:
- lost: records the kernel dropped because the backlog was full. Any non-zero value here means your audit trail has holes. This is a finding, not a warning.
- backlog / backlog_limit: the in-kernel queue depth and its ceiling. The default backlog_limit of 64 is far too small for a busy host; you will set it to 8192 or higher below.
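As a rough illustration of reading those two fields programmatically, here is a minimal Python sketch. It assumes the plain "name value" line format that auditctl -s prints on audit 3.x; the function names and the 80% backlog threshold are illustrative choices, not part of the audit tooling.

```python
def parse_audit_status(text):
    """Parse `auditctl -s` style output into {field: int}, skipping non-numeric values."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].lstrip("-").isdigit():
            status[parts[0]] = int(parts[1])
    return status


def audit_findings(status, backlog_ratio=0.8):
    """Return human-readable findings for dropped records or a nearly full backlog."""
    findings = []
    if status.get("lost", 0) > 0:
        findings.append(f"lost={status['lost']}: the audit trail has gaps")
    limit = status.get("backlog_limit", 0)
    if limit and status.get("backlog", 0) >= backlog_ratio * limit:
        findings.append(f"backlog {status['backlog']}/{limit}: queue is close to its ceiling")
    return findings


# Hypothetical sample; on a real host, feed in the output of `sudo auditctl -s`.
SAMPLE_STATUS = """enabled 1
failure 1
pid 812
rate_limit 0
backlog_limit 64
lost 17
backlog 60
"""

for finding in audit_findings(parse_audit_status(SAMPLE_STATUS)):
    print(finding)
```

A check like this fits naturally into whatever already scrapes node health, so that a non-zero lost counter pages someone instead of waiting for the next audit.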
2. Authoring syscall, file watch, and execve rules
Rules come in two flavours. File watches (-w) trigger when a path is accessed with given permissions. Syscall rules (-a) trigger when a specific system call is made, optionally filtered by arguments. You can load them live with auditctl, but anything you want to survive a reboot belongs in a file under /etc/audit/rules.d/, which augenrules compiles into the active rule set.
Start with the structure of the rules directory. Files are concatenated in lexical order, so the naming convention is numeric prefixes:
ls /etc/audit/rules.d/
# 10-base-config.rules 30-custom.rules 99-finalize.rules
File watches
Watch the files whose modification is always suspicious — the identity store, the sudoers policy, the SSH and PAM configuration:
# /etc/audit/rules.d/30-custom.rules
## Identity and authorisation
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/group -p wa -k identity
-w /etc/sudoers -p wa -k privesc
-w /etc/sudoers.d/ -p wa -k privesc
## Remote access and auth stack
-w /etc/ssh/sshd_config -p wa -k sshd
-w /etc/pam.d/ -p wa -k pam
-p wa audits write and attribute changes (mode, owner, xattrs) but not reads — auditing reads of /etc/passwd would bury you in noise since every getent touches it. The -k key is a free-text tag; it is the single most useful field you will set, because it is how you slice the log later with ausearch -k.
Syscall rules
Syscall rules are where the real coverage lives. The canonical example is auditing privilege escalation: any setuid/setgid family call that succeeds and was made from a real user's login session (auid >= 1000), which is the signature of a user gaining root:
## Privilege escalation: successful setuid by an unprivileged user
-a always,exit -F arch=b64 -S setuid -F auid>=1000 -F auid!=unset -F exit=0 -k privesc
-a always,exit -F arch=b64 -S setgid -F auid>=1000 -F auid!=unset -F exit=0 -k privesc
## Loading/unloading kernel modules (classic rootkit vector)
-a always,exit -F arch=b64 -S init_module,finit_module,delete_module -k modules
## Time changes (anti-forensics: rolling the clock to confuse timelines)
-a always,exit -F arch=b64 -S adjtimex,settimeofday,clock_settime -k time-change
Decode the syntax, because every token is load-bearing:
| Token | Meaning |
|---|---|
| -a always,exit | Add a rule on the exit of the syscall, always recording |
| -F arch=b64 | Filter to 64-bit syscalls. You must also add a b32 rule if 32-bit binaries can run, or an attacker evades you by using the 32-bit ABI |
| -S setuid | The syscall name (multiple comma-separated names allowed on one rule) |
| -F auid>=1000 | The audit UID: the login identity, immutable across su/sudo. This is the field that survives privilege drops |
| -F auid!=unset | Exclude kernel threads and daemons started before login (audit UID of 4294967295) |
| -F exit=0 | Only successful calls |
The auid (audit/login UID) is the most important concept in audit rule design. Real UID changes when you sudo; auid does not. It is set at login and is the only reliable way to attribute a root action back to the human who initiated the session.
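The b32 caveat in the table above is easy to miss in review, so a small lint can help. The following is a sketch in Python under a simplifying assumption: a rule counts as covered only if a textually identical rule with arch=b32 exists. Syscall names can differ between the two ABIs, so treat a hit as a prompt to review, not as proof of a gap. The function name is hypothetical.

```python
def missing_b32_twins(rules_text):
    """Return b64 syscall rules that have no otherwise-identical arch=b32 rule."""
    rules = [line.strip() for line in rules_text.splitlines()
             if line.strip().startswith("-a ")]
    present = set(rules)
    missing = []
    for rule in rules:
        if "arch=b64" in rule:
            twin = rule.replace("arch=b64", "arch=b32")
            if twin not in present:
                missing.append(rule)
    return missing


# Example rule text; in practice read the files under /etc/audit/rules.d/.
EXAMPLE_RULES = """
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b32 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b64 -S setuid -F auid>=1000 -F auid!=unset -F exit=0 -k privesc
-w /etc/passwd -p wa -k identity
"""

for rule in missing_b32_twins(EXAMPLE_RULES):
    print("no b32 twin:", rule)
```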
execve: the process-execution firehose
Auditing every process execution gives you a complete command-line history of the box. It is also the highest-volume rule you will write, so treat it deliberately:
## Every program execution, both ABIs, attributed to the login user
-a always,exit -F arch=b64 -S execve -F auid>=1000 -F auid!=unset -k exec
-a always,exit -F arch=b32 -S execve -F auid>=1000 -F auid!=unset -k exec
Filtering on auid>=1000 here is the difference between a usable log and a 50 GB/day fire hose: it drops the thousands of executions from systemd, cron, and package management that you do not care about, and keeps the interactive activity that you do.
Load the rules and verify they compiled:
sudo augenrules --load # compile rules.d/ into the active set, persistently
sudo auditctl -l # list the loaded rules (should match your files)
3. Mapping rules to a baseline and reducing noise
Do not hand-write a rule set from scratch. Two maintained baselines exist and both are correct starting points:
- The Linux Audit project’s rules/ directory ships a layered set keyed to STIG, PCI-DSS, and NISPOM. On RHEL these are under /usr/share/audit/sample-rules/.
- The CIS Benchmark for your distro specifies an audit rule section nearly verbatim; the 30-pci-dss-v31.rules sample maps almost one-to-one.
The practical workflow is to copy the baseline, then subtract noise — never add coverage you cannot afford to read. The two ordering rules that govern the entire file:
- -D first, finalize last. Start with -D (delete all rules) in 10-base-config.rules and end with -e 2 in 99-finalize.rules. -e 2 makes the rule set immutable until reboot: an attacker with root cannot then unload your auditing without a reboot, which itself is an audited, noisy event.
- First match wins for exclusions. Audit evaluates rules top-to-bottom and stops at the first match. To suppress noise, put an exclude rule before the broad rule that would otherwise catch it.
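Both ordering rules can be checked mechanically before a rule set ships. The sketch below is a simplified Python model, not a reimplementation of augenrules: it concatenates rule files in lexical filename order, then flags a missing leading -D, a missing trailing -e 2, and any never rule that appears after an always,exit rule (where it may be shadowed). The function names and the shadowing heuristic are assumptions made for illustration.

```python
def compile_rules(files):
    """Concatenate {filename: content} in lexical filename order, dropping comments and blanks."""
    lines = []
    for name in sorted(files):
        for raw in files[name].splitlines():
            line = raw.strip()
            if line and not line.startswith("#"):
                lines.append(line)
    return lines


def ordering_problems(lines):
    """Return a list of ordering problems found in the compiled rule list."""
    problems = []
    if not lines or lines[0] != "-D":
        problems.append("first directive is not -D")
    if not lines or lines[-1] != "-e 2":
        problems.append("last directive is not -e 2")
    seen_always = False
    for line in lines:
        if line.startswith("-a always,exit"):
            seen_always = True
        elif line.startswith("-a never,") and seen_always:
            # Heuristic only: the never rule may be shadowed by an earlier match.
            problems.append(f"suppression after a broad rule: {line}")
    return problems


example_files = {
    "99-finalize.rules": "-e 2\n",
    "10-base-config.rules": "## base\n-D\n-b 8192\n",
    "30-custom.rules": (
        "-a always,exit -F arch=b64 -S execve -k exec\n"
        "-a never,exit -F arch=b64 -F auid=991 -k backup-exclude\n"
    ),
}
for problem in ordering_problems(compile_rules(example_files)):
    print(problem)
```

Here the backup suppression is reported because it sits after the broad execve rule; moving it into an earlier file, or above the execve rule, clears the finding.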
The single most effective noise reduction is the exclude filter, which drops whole record types before they are ever written. The textbook case is CWD (current-working-directory) records, which double the volume of every execve event for marginal forensic value:
## Drop the noisiest record types entirely
-a always,exclude -F msgtype=CWD
-a always,exclude -F msgtype=CONFIG_CHANGE -F auid=unset
For service accounts that legitimately make audited syscalls in a tight loop, exclude that specific actor rather than disabling the rule globally:
## A backup agent that legitimately walks the filesystem all night
-a never,exit -F arch=b64 -F auid=991 -F dir=/srv/backup -k backup-exclude
Resist the urge to fix volume by raising auid thresholds or deleting rules wholesale. Every rule you drop is a control an assessor will ask you to justify. Use targeted exclude/never rules with a -k key so the suppression itself is documented and greppable.
4. Parsing records with ausearch and aureport
The raw audit.log is deliberately machine-oriented — a single logical event is split across multiple lines sharing an event ID, with values hex-encoded. Never grep it directly. The two tools that exist for this are ausearch (find and reassemble events) and aureport (summarise).
ausearch reassembles the multi-line event and, with -i, interprets the raw values — resolving UIDs to names, syscall numbers to names, and hex-encoded paths back to text:
# Everything tagged 'privesc' since midnight yesterday, interpreted (--start recent covers only the last 10 minutes)
sudo ausearch -k privesc -i --start yesterday
# All execve events for a specific login user since boot (-ul matches the login UID; -ui matches the current UID)
sudo ausearch -k exec -ul 1000 -i --start boot
# Anything touching a specific file, by full path
sudo ausearch -f /etc/shadow -i
A reassembled, interpreted execve event looks like this — note that auid survived the sudo, naming the human responsible:
type=PROCTITLE ... proctitle=cat /etc/shadow
type=SYSCALL ... arch=x86_64 syscall=execve success=yes exit=0
ppid=4120 pid=4188 auid=alice uid=root gid=root
euid=root ... comm="cat" exe="/usr/bin/cat" key="exec"
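To make the reassembly step concrete, here is a minimal Python sketch of what happens conceptually: raw lines are grouped by the timestamp and serial inside msg=audit(...), and a hex-encoded proctitle is decoded with its NUL separators turned into spaces. It is an illustration only. ausearch -i and the auparse library are the supported ways to do this, and the sample lines below are abbreviated, hypothetical records.

```python
import re
from collections import defaultdict

EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+):(\d+)\)")
PROCTITLE_HEX = re.compile(r"proctitle=([0-9A-Fa-f]+)$")


def group_events(lines):
    """Group raw audit.log lines by their (timestamp, serial) event ID."""
    events = defaultdict(list)
    for line in lines:
        match = EVENT_ID.search(line)
        if match:
            events[(match.group(1), match.group(2))].append(line.strip())
    return dict(events)


def decode_proctitle(record):
    """Decode a hex-encoded proctitle field; return None if the field is not hex."""
    match = PROCTITLE_HEX.search(record.strip())
    if not match or len(match.group(1)) % 2:
        return None
    raw = bytes.fromhex(match.group(1))
    # Arguments are NUL-separated in the kernel's copy of the command line.
    return raw.replace(b"\x00", b" ").decode("utf-8", "replace")


sample = [
    'type=SYSCALL msg=audit(1700000000.123:4188): syscall=59 success=yes auid=1000 uid=0 comm="cat" key="exec"',
    "type=PROCTITLE msg=audit(1700000000.123:4188): proctitle=636174002F6574632F736861646F77",
]
for event_id, records in group_events(sample).items():
    print(event_id, len(records), "records")
print(decode_proctitle(sample[1]))
```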
aureport turns the log into ranked summaries — the report you run at the start of an investigation to find where to point ausearch:
sudo aureport --summary -i # overall event counts by type
sudo aureport --auth --summary -i # authentication attempts, pass/fail
sudo aureport -x --summary -i # executables, ranked by execution count
sudo aureport --failed -i # everything that returned an error
Chain them: aureport finds the anomalous executable; you pivot to the raw events with ausearch. For ad-hoc filtering, ausearch --format csv or --format text produces output you can pipe into awk or load into a notebook without writing a log parser.
5. Why eBPF complements auditd
auditd is authoritative but has structural limits that matter at scale:
- Cost. Broad syscall rules (unfiltered execve, or any connect/accept auditing) add measurable per-syscall overhead, because each matching event takes a trip through the netlink path and a synchronous write. On a busy host this is real CPU and real I/O.
- No container awareness. The kernel audit record identifies the process, but not the container ID, image, or Kubernetes pod. In a containerised estate the raw log is nearly unusable without external enrichment.
- Coarse network visibility. Auditing socket syscalls tells you a connect() happened; it does not give you a clean, correlated flow record.
eBPF closes all three gaps. A program attached to a kprobe or tracepoint runs in the kernel, filters and aggregates before any data crosses to userspace, and carries the full process and cgroup context — exactly where container and pod identity live. The result is an order-of-magnitude lower cost for high-cardinality signals like every exec and every connection, plus enrichment the audit subsystem cannot provide.
The division of labour is clean: keep auditd as the compliance-grade, file-and-privesc record (low volume, high assurance, immutable), and let eBPF own the high-volume runtime signals (exec, connect, file-open by container) where its cost profile and context win.
6. Deploying an eBPF runtime tool
Two production-grade options dominate. Falco (CNCF graduated) is the incumbent — a rules engine over syscalls with a large community rule set. Tetragon (from Cilium) is the newer entrant, built on the same eBPF foundation as Cilium, with first-class Kubernetes identity and in-kernel enforcement. I will show Tetragon for the Kubernetes-native shape and Falco for the rules-driven host shape; pick one.
Tetragon
Tetragon ships as a DaemonSet and, with zero policy, immediately emits enriched process-execution events:
helm repo add cilium https://helm.cilium.io
helm install tetragon cilium/tetragon -n kube-system
# Stream enriched exec/exit events, decoded
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact
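If you want those events in a script instead of a terminal, tetra getevents -o json emits one JSON object per line. The Python sketch below flattens a process_exec event into the fields most triage needs. The field names (process_exec, process, binary, arguments, pod, container, image) follow the event shape as I understand it from the Tetragon documentation; treat them as assumptions and check them against the version you deploy. The function name is hypothetical.

```python
import json


def summarize_exec(line):
    """Flatten one JSON event line; return None if it is not a process_exec event."""
    event = json.loads(line)
    proc = event.get("process_exec", {}).get("process")
    if proc is None:
        return None
    pod = proc.get("pod") or {}            # absent for host (non-pod) processes
    container = pod.get("container") or {}
    return {
        "binary": proc.get("binary"),
        "arguments": proc.get("arguments", ""),
        "namespace": pod.get("namespace"),
        "pod": pod.get("name"),
        "image": (container.get("image") or {}).get("name"),
    }


# Hypothetical event, trimmed to the fields used above.
sample_line = json.dumps({
    "process_exec": {
        "process": {
            "binary": "/usr/bin/cat",
            "arguments": "/etc/shadow",
            "pod": {
                "namespace": "default",
                "name": "web-0",
                "container": {"image": {"name": "docker.io/library/nginx:1.25"}},
            },
        }
    },
    "node_name": "node-1",
})
print(summarize_exec(sample_line))
```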
You extend it with a TracingPolicy — a CRD that attaches eBPF probes to kernel functions or syscalls and filters in-kernel. This one observes writes to sensitive files and, critically, runs the filter in the kernel so userspace only ever sees the matches:
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "monitor-sensitive-writes"
spec:
  kprobes:
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"
    - index: 1
      type: "int"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Prefix"
        values:
        - "/etc/passwd"
        - "/etc/shadow"
      - index: 1
        operator: "Equal"
        values:
        - "2" # MAY_WRITE (4 would be MAY_READ)
Hooking the LSM hook security_file_permission rather than the raw write syscall is the correct choice: it fires regardless of how the write was issued and gives you the resolved file argument directly. Tetragon can additionally take a matchActions of Sigkill to enforce in-kernel, but treat enforcement as a separate, carefully-staged rollout — start in observe-only.
Falco
Falco is the better fit when you want a portable rules engine on bare hosts. Install it with the modern eBPF driver (no kernel module, no compilation):
# Run the official image with the CO-RE eBPF probe
docker run --rm -i -t \
--privileged \
-v /proc:/host/proc:ro \
-v /etc:/host/etc:ro \
-e FALCO_DRIVER=modern_ebpf \
falcosecurity/falco:latest
A Falco rule is a condition over syscall fields plus an output template. This one alerts on a shell spawned inside a container — the textbook “interactive process in a thing that should be immutable” signal:
- rule: Terminal shell in container
  desc: A shell was spawned by a non-shell program in a container
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
    and not user_expected_terminal_shell_in_container_conditions
  output: >
    Shell spawned in container
    (user=%user.name container=%container.id
    image=%container.image.repository proc=%proc.cmdline)
  priority: WARNING
  tags: [container, shell, mitre_execution]
Falco’s value is the %container.id, %container.image.repository, and %k8s.pod.name fields — the enrichment auditd cannot produce — rendered straight into the alert.
7. Correlating process, network, and file events
A single event is rarely the alert. The signal is the sequence: a web process spawns a shell, the shell reads /etc/shadow, then opens an outbound connection to an unknown IP. Neither auditd nor a raw eBPF stream alerts on that chain by itself — you need a correlation layer keyed on a stable join field.
That join field is the process lineage: pid plus ppid, anchored by auid (from audit) or the cgroup/container ID (from eBPF). Both Tetragon and Falco emit the parent chain, which lets a downstream rule express the chain directly. The Falco condition for the canonical “reverse-shell precursor” — a network tool launched by a database or web server — looks like:
- rule: Unexpected outbound connection from server process
  desc: A long-running service process initiated an outbound connection
  condition: >
    outbound and proc.pname in (nginx, postgres, mysqld)
    and not fd.sip in (allowed_egress_ips)
  output: >
    Outbound from service process
    (proc=%proc.name parent=%proc.pname
    dest=%fd.sip:%fd.sport container=%container.id)
  priority: CRITICAL
The pattern that scales is not to encode every correlation rule at the agent. Emit richly-attributed atomic events (each carrying pid/ppid/container.id/auid) from both auditd and eBPF, ship them to a central store, and run the stateful, multi-event correlation there — where you have the full timeline, can join across hosts, and can update detection logic without redeploying to 5,000 nodes.
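As an illustration of what that central, stateful correlation might look like, here is a minimal Python sketch of the chain described earlier: a server process spawns a shell, something in that shell's process tree opens a sensitive file, and the same tree then makes an outbound connection. The event schema (type, host, pid, ppid, comm, path, dest) is hypothetical and stands in for whatever normalised form your pipeline produces. The sketch ignores PID reuse and event reordering, both of which a production correlator has to handle.

```python
SERVER_PROCS = {"nginx", "postgres", "mysqld"}
SHELLS = {"sh", "bash", "dash", "zsh"}
SENSITIVE_PATHS = {"/etc/shadow", "/etc/passwd"}


class ChainCorrelator:
    """Tracks shells spawned by server processes and alerts on read-then-egress."""

    def __init__(self):
        self.comm_by_pid = {}   # (host, pid) -> command name from exec events
        self.suspect = {}       # (host, pid) -> shared state dict for a suspect tree

    def feed(self, ev):
        """Consume one event; return an alert dict when the full chain completes."""
        key = (ev["host"], ev["pid"])
        if ev["type"] == "exec":
            parent_key = (ev["host"], ev["ppid"])
            self.comm_by_pid[key] = ev["comm"]
            if ev["comm"] in SHELLS and self.comm_by_pid.get(parent_key) in SERVER_PROCS:
                self.suspect[key] = {"read_sensitive": False}
            elif parent_key in self.suspect:
                # Descendants share the same state object as the suspect shell.
                self.suspect[key] = self.suspect[parent_key]
        elif ev["type"] == "open" and key in self.suspect:
            if ev["path"] in SENSITIVE_PATHS:
                self.suspect[key]["read_sensitive"] = True
        elif ev["type"] == "connect" and key in self.suspect:
            if self.suspect[key]["read_sensitive"]:
                return {"host": ev["host"], "pid": ev["pid"], "dest": ev["dest"],
                        "chain": "server -> shell -> sensitive read -> egress"}
        return None


correlator = ChainCorrelator()
timeline = [
    {"type": "exec", "host": "web1", "pid": 100, "ppid": 1, "comm": "nginx"},
    {"type": "exec", "host": "web1", "pid": 200, "ppid": 100, "comm": "bash"},
    {"type": "exec", "host": "web1", "pid": 201, "ppid": 200, "comm": "cat"},
    {"type": "open", "host": "web1", "pid": 201, "path": "/etc/shadow"},
    {"type": "connect", "host": "web1", "pid": 200, "dest": "203.0.113.9:4444"},
]
for event in timeline:
    alert = correlator.feed(event)
    if alert:
        print(alert)
```

Keying the state on (host, pid) is the simplest possible join; swapping in the container ID or a per-process exec ID where the source provides one makes the join more robust without changing the shape of the logic.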
8. Shipping to a SIEM and tuning for performance
For auditd, the dispatcher plugin is the shipping mechanism. To forward every record to syslog (and onward to your collector), drop a plugin config under /etc/audit/plugins.d/:
# /etc/audit/plugins.d/syslog.conf
active = yes
direction = out
path = /sbin/audisp-syslog
type = always
args = LOG_LOCAL6
format = string
Then point rsyslog/vector at local6 and forward off-box. For native log shipping, vector or fluent-bit both have a dedicated audit-log source that reassembles the multi-line records for you — strongly prefer that over tailing audit.log with a generic file input, which will hand your SIEM half-events.
Tune auditd itself so it never silently drops records under load. In /etc/audit/auditd.conf:
# auditd.conf does not accept trailing comments, so notes sit on their own lines.
# Sizes are in MB; keep 10 rotated files of 50 MB each.
max_log_file = 50
num_logs = 10
max_log_file_action = ROTATE
# Below space_left MB free, run space_left_action: warn, do not halt
space_left = 1024
space_left_action = SYSLOG
# On a full disk, stop auditing; do NOT kill the host
disk_full_action = SUSPEND
# Async writes; the right perf/safety balance
flush = INCREMENTAL_ASYNC
And raise the kernel backlog so a burst does not cause lost events — set it in 99-finalize.rules before the -e 2 immutability line:
## Kernel audit tuning (place before -e 2); comments go on their own lines
## backlog_limit: in-kernel queue depth
-b 8192
## Do not throttle syscalls waiting on backlog
--backlog_wait_time 0
## Make the rule set immutable until reboot
-e 2
disk_full_action is a genuine policy decision with no free answer. SUSPEND stops auditing but keeps the host serving, which is the right call for most workloads. HALT stops the host when it can no longer audit; it is mandated by some high-assurance regimes (and is the reason an unmonitored space_left once took down a payments cluster). Choose deliberately and document why.
For eBPF, the performance lever is ring-buffer sizing and in-kernel filtering. Filter as early as possible (in the TracingPolicy selector or Falco condition) so events are dropped in the kernel, not shipped and discarded in userspace. Watch each agent’s drop counter — Falco exposes falco.n_drops, Tetragon exports event and map-pressure metrics on its Prometheus endpoint. A growing drop count means your buffers are too small for the event rate, and your detection has the same kind of holes a non-zero auditd lost does.
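Whether a drop counter is "growing" is a rate question, so it helps to compute it explicitly. The following is a small Python sketch that assumes you already collect (unix_time, counter_value) samples from the agent's metrics; the function name and sample values are hypothetical.

```python
def drop_rate(samples):
    """Dropped events per second between the first and last (unix_time, counter) sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    # A counter that went backwards means the agent restarted; report 0 for that window.
    return max(c1 - c0, 0) / (t1 - t0)


# Hypothetical samples taken one minute apart.
print(drop_rate([(1700000000, 10), (1700000060, 130)]))
```

Any sustained non-zero rate is the eBPF equivalent of a climbing auditd lost counter and deserves the same response: enlarge buffers or tighten in-kernel filters before trusting the detections.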
Verify
Prove the whole pipeline end to end, from rule load to enriched alert.
# 1. auditd is healthy and NOT dropping records
sudo auditctl -s
# -> enabled 2 (immutable), lost 0, backlog well under backlog_limit
# 2. Rules are loaded and immutable
sudo auditctl -l | head
# -> your rules; 'enabled 2' above confirms immutability
# 3. Trigger a watched event and confirm capture + attribution
sudo touch /etc/sudoers.d/zz-test && sudo rm /etc/sudoers.d/zz-test
sudo ausearch -k privesc -i --start recent | tail -20
# -> SYSCALL records with your auid, not just uid=root
# 4. Confirm aureport sees the executable activity
sudo aureport -x --summary -i | head
# 5. eBPF layer is emitting enriched events
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact | head
# -> process_exec events carrying pod/container identity
# 6. Neither agent is dropping under load
# auditd: 'lost 0' from step 1
# falco: curl -s localhost:8765/metrics | grep n_drops # if metrics enabled
If step 1 shows lost > 0 or step 6 shows a climbing drop count, stop and fix buffers before trusting any detection — a control with gaps is worse than no control, because it manufactures false confidence.
Enterprise scenario
A payments platform team ran the CIS Level 2 audit baseline across roughly 4,000 RHEL 9 nodes, half of them container hosts running a high-throughput order-matching service. The baseline included unfiltered execve and connect syscall auditing. Within a week of rollout, two things broke: the matching service’s p99 latency rose by 11% under peak load, and auditctl -s was reporting lost events in the tens of thousands per hour on the busiest nodes — meaning the very SOX-mandated trail they had deployed had holes precisely when it mattered most.
The root cause was structural: the synchronous netlink-plus-write path could not keep up with a service making millions of short-lived connect() calls per minute, so the kernel both throttled syscalls (the latency hit) and overran the 64-deep default backlog (the dropped records).
The constraint they could not relax was the compliance requirement itself — process execution and outbound connections had to be auditable on those exact hosts.
The fix was to split the workload across the two layers along the cost boundary. They kept auditd for the file, identity, and privesc rules — low volume, high assurance, immutable — and removed the broad execve and connect syscall rules from auditd entirely. Those high-cardinality signals moved to Tetragon, where filtering happens in the kernel before any userspace crossing, and the events arrive already tagged with the pod and image. To keep auditd honest under the residual load, they raised the backlog and made the disk-full behaviour explicit rather than defaulting to a host halt:
# 99-finalize.rules — tuned, with execve/connect moved to the eBPF layer
## Backlog sized for the residual file/privesc load
-b 16384
## Never throttle a syscall on a full backlog
--backlog_wait_time 0
## Immutable until reboot
-e 2
Result: auditd lost returned to zero, p99 latency recovered to within 1% of the pre-rollout baseline, and the auditors accepted the design because both required signals remained continuously captured and attributable — execution and egress via Tetragon’s enriched stream, identity and privilege changes via the immutable kernel audit log. The lesson generalises: when a compliance baseline costs you availability, the answer is rarely to weaken the control — it is to move the expensive signal to the layer built to carry it cheaply.