Flow logs tell you a 5-tuple talked, moved some bytes, and got an ACCEPT or REJECT. They are aggregated (and, on some platforms, sampled) and L3/L4-only: enough for “who talked to whom,” useless for “why did this connection stall after the handshake.” When an application team swears the network is dropping their traffic and the network team swears every rule allows it, both can be right at the flow-log layer and the truth is three layers down: a mid-stream RST, a retransmit storm, a 1500-byte packet vanishing into a tunnel that only carries 1400, or a return path that gets dropped by a stateful firewall that never saw the SYN. None of that is visible in aggregate telemetry. You have to see the packets.
This guide is the layer below flow logs: on-demand packet capture, continuous traffic mirroring (VTAP), and the exact filters and trace-reading skills that turn a pcap into a root cause.
1. Know what only packets can tell you
Before reaching for capture, be honest about what it costs and what it uniquely buys. Capture is expensive: storage, analysis time, and, for continuous mirroring, a real bill for the duplicated traffic. Reach for it when the failure class is one of these, because nothing else will show it:
| Symptom | What flow logs show | What only the pcap shows |
|---|---|---|
| Connection establishes then stalls | ACCEPT, normal byte counts | Mid-stream RST, or TCP Zero Window from a stuck receiver |
| Slow throughput, no errors | Bytes transferred, no verdict change | Retransmissions, dup ACKs, SACK ranges, RTO backoff |
| Large requests fail, small succeed | Both look like ACCEPT | 1500-byte frames dropped, ICMP frag needed blocked (PMTUD black hole) |
| Intermittent resets under a firewall | Often no record at all | SYN goes out one path, SYN-ACK returns another -> state drop |
| App-reported timeout | Connection looks healthy | Handshake never completes; SYN retransmitted with no SYN-ACK |
The pattern: flow logs are a verdict on connections; packets are a record of conversations. When the failure lives in the conversation’s timing, sequencing, or framing, you need the packets.
2. On-demand capture: Azure Network Watcher
Azure’s packet capture runs on the VM via the AzureNetworkWatcherExtension (Windows and Linux), so it captures at the guest NIC and writes a .cap file to a storage account or local disk. No appliance, no mirroring session — good for a targeted, time-boxed grab.
Always scope it. An unfiltered capture on a busy host fills storage in minutes and buries the signal. Use a packet-count or byte limit, a time limit, and a capture filter:
# Capture only traffic to/from the suspect backend on 443,
# truncate each packet to 128 bytes (headers only), cap at 100 MB / 300s.
az network watcher packet-capture create \
--resource-group rg-prod \
--vm vm-app-01 \
--name pc-app01-rst-investigation \
--storage-account stnwcaptures \
--time-limit 300 \
--capture-limit 104857600 \
--capture-size 128 \
--filters '[{"protocol":"TCP","remoteIPAddress":"10.4.2.50","remotePort":"443"}]'
Two parameters do the heavy lifting. --capture-size 128 (snap length / truncation) keeps only the L2-L4 headers and the first bytes of payload; for diagnosing RSTs and retransmits you never need the body, and on full-size data packets truncation can cut capture size by 10x or more. --filters is a structured allowlist (protocol, local/remote IP address, local/remote port) applied in the extension, so non-matching packets are never written. Check status and download when done:
# show-status finds the regional Network Watcher by --location (the VM's region).
az network watcher packet-capture show-status \
--location <vm-region> --name pc-app01-rst-investigation \
--query "packetCaptureStatus"
# Running -> Stopped when the time/byte limit is hit
The output .cap opens directly in Wireshark or tcpdump -r.
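Hand-editing the --filters JSON inside shell quotes is an easy place to introduce a typo. If you script captures, a minimal Python sketch can build the argument instead; the helper name azure_capture_filters is hypothetical, and it relies only on the filter keys already used in the command above:

```python
import json


def azure_capture_filters(rules):
    """Build the JSON string for `az network watcher packet-capture create --filters`.

    Each rule is a dict using filter keys such as protocol, remoteIPAddress,
    remotePort. Values are converted to strings, as in the example command;
    keys set to None are dropped so the capture only matches on fields you
    actually specify.
    """
    cleaned = [
        {key: str(value) for key, value in rule.items() if value is not None}
        for rule in rules
    ]
    # Compact separators give one line that is easy to single-quote in a shell.
    return json.dumps(cleaned, separators=(",", ":"))
```

Called with [{"protocol": "TCP", "remoteIPAddress": "10.4.2.50", "remotePort": 443}], it returns exactly the filter string passed in the create command above.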
3. On-demand capture: AWS VPC Traffic Mirroring to a capture target
AWS has no in-guest capture extension. Instead it mirrors at the ENI level: a mirror session copies packets from a source ENI, wraps them in VXLAN, and sends them to a mirror target (an ENI, an NLB, or a Gateway Load Balancer endpoint). For an on-demand grab, point the target at a small capture instance and run tcpdump there.
# 1. Filter: only the flows you care about, both directions.
aws ec2 create-traffic-mirror-filter --description "app-to-backend-443"
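# returns tmf-0abc...; add rules for ingress and egress.
# Orientation: these rules assume the source ENI is the backend serving 443
# (requests arrive TO 443, replies leave FROM 443). If you mirror the app's
# ENI instead, swap --destination-port-range and --source-port-range.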
aws ec2 create-traffic-mirror-filter-rule \
--traffic-mirror-filter-id tmf-0abc123 \
--traffic-direction ingress --rule-number 10 \
--rule-action accept --protocol 6 \
--destination-port-range FromPort=443,ToPort=443 \
--source-cidr-block 0.0.0.0/0 --destination-cidr-block 0.0.0.0/0
aws ec2 create-traffic-mirror-filter-rule \
--traffic-mirror-filter-id tmf-0abc123 \
--traffic-direction egress --rule-number 10 \
--rule-action accept --protocol 6 \
--source-port-range FromPort=443,ToPort=443 \
--source-cidr-block 0.0.0.0/0 --destination-cidr-block 0.0.0.0/0
# 2. Target = the capture instance's ENI.
aws ec2 create-traffic-mirror-target \
--network-interface-id eni-0capture999 \
--description "pcap-collector" # returns tmt-0...
# 3. Session ties source ENI -> target, with VXLAN VNI and packet length.
aws ec2 create-traffic-mirror-session \
--network-interface-id eni-0source111 \
--traffic-mirror-target-id tmt-0def456 \
--traffic-mirror-filter-id tmf-0abc123 \
--session-number 1 \
--virtual-network-id 42 \
--packet-length 128
--packet-length 128 is the AWS equivalent of snap length: it truncates mirrored packets to headers, cutting mirror bandwidth and storage. On the collector, each mirrored packet arrives wrapped in VXLAN (UDP 4789); capture the outer packets as-is and let the VXLAN dissector unwrap the inner one:
# Mirrored frames arrive VXLAN-encapsulated on UDP 4789.
# tcpdump and Wireshark both decode VXLAN on port 4789, so reading this file shows the inner conversation.
sudo tcpdump -i eth0 -n 'udp port 4789' \
-w /tmp/mirror.pcap -s 0
# Then in Wireshark: it auto-dissects VXLAN and shows the inner TCP stream.
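To make the encapsulation concrete, here is a minimal Python sketch of what the dissector does with each mirrored packet, based on the 8-byte VXLAN header from RFC 7348 (a flags byte whose 0x08 bit marks the VNI as valid, then a 24-bit VNI). The function names are hypothetical, and the sketch assumes you already hold the UDP payload of a 4789 datagram and that the inner frame is untagged Ethernet:

```python
import struct

VXLAN_HEADER_LEN = 8
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def strip_vxlan(udp_payload: bytes):
    """Return (VNI, inner Ethernet frame) from the payload of a UDP/4789 datagram."""
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("too short for a VXLAN header")
    if not udp_payload[0] & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set; not a VXLAN header")
    # Bytes 1-3 are reserved, bytes 4-6 hold the 24-bit VNI, byte 7 is reserved.
    vni = int.from_bytes(udp_payload[4:7], "big")
    return vni, udp_payload[VXLAN_HEADER_LEN:]


def inner_ethertype(inner_frame: bytes) -> int:
    """EtherType of the inner frame (0x0800 = IPv4), assuming no VLAN tag."""
    (ethertype,) = struct.unpack("!H", inner_frame[12:14])
    return ethertype
```

The VNI it returns is the --virtual-network-id set on the mirror session (42 above), which is how a collector receiving several sessions can tell them apart.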
Sizing note: the mirror target must absorb a copy of source traffic on top of its own. An NLB or GWLB target spreads the load across an Auto Scaling group of analyzers; a single capture ENI is fine for a short investigation but will drop packets if the source is hot. AWS also enforces Traffic Mirroring quotas (for example, on how many sessions a single ENI target can receive), so you cannot mirror every ENI in a busy subnet into one capture box.
4. Continuous mirroring: a VTAP feed to an out-of-band appliance
On-demand capture answers “what is happening right now.” For recurring or hard-to-reproduce issues you want a standing feed — every packet from a set of sources duplicated to an analysis appliance (a tshark farm, an IDS, an NDR tool) that lives out of band, so analysis load never touches the production path. Both clouds do this with VXLAN encapsulation. The design that scales:
- Sources: the ENIs/NICs under investigation, selected by tag or subnet.
- Target: a load balancer (AWS NLB/GWLB, or an Azure internal load balancer) fronting an autoscaled fleet of collectors, never a single box.
- Filter: a standing allowlist that mirrors only the protocols/ports you actually analyze. Mirroring everything doubles your data-plane bandwidth and the bill with it.
- Truncation: keep packet-length/snap length at headers unless you are doing payload forensics.
On AWS the same create-traffic-mirror-session calls apply, with the target set to an NLB or GWLB endpoint instead of a bare ENI. On Azure, the equivalent is the Virtual Network TAP (VTAP) feature, which configures a NIC to stream a copy of its traffic to a collector behind an internal load balancer. Conceptually identical: source NIC, VXLAN encap, LB-fronted collector pool. Whichever cloud, two rules keep a continuous mirror from becoming an incident of its own:
- The collector path must be on its own subnet and route table, isolated from production, so a misbehaving analyzer cannot blackhole real traffic.
- The mirror filter is a deny-by-default allowlist. You add ports as investigations require them and remove them after. Standing “mirror all” sessions are how teams accidentally 2x their inter-AZ data-transfer bill.
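A deny-by-default allowlist is easiest to reason about as first-match evaluation. The Python sketch below models it loosely on AWS mirror filter behavior as we understand it (rules checked in ascending rule-number order, the first matching rule decides, unmatched traffic is not mirrored). The class and function names are hypothetical, and CIDR matching is omitted for brevity:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    rule_number: int
    direction: str                       # "ingress" or "egress", relative to the source ENI
    action: str                          # "accept" (mirror) or "reject" (skip)
    protocol: int                        # IANA protocol number; 6 = TCP
    src_ports: Optional[range] = None    # None means any port
    dst_ports: Optional[range] = None


@dataclass(frozen=True)
class Packet:
    direction: str
    protocol: int
    src_port: int
    dst_port: int


def matches(rule: Rule, pkt: Packet) -> bool:
    return (
        rule.direction == pkt.direction
        and rule.protocol == pkt.protocol
        and (rule.src_ports is None or pkt.src_port in rule.src_ports)
        and (rule.dst_ports is None or pkt.dst_port in rule.dst_ports)
    )


def should_mirror(rules: List[Rule], pkt: Packet) -> bool:
    # Lowest rule number first; the first match decides.
    for rule in sorted(rules, key=lambda r: r.rule_number):
        if matches(rule, pkt):
            return rule.action == "accept"
    return False  # deny by default: nothing matched, nothing mirrored
```

Adding a port for an investigation means appending one accept rule; removing it afterwards restores the narrow default, which is the discipline the bullets above describe.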
5. Capture without drowning: filters, truncation, ring buffers
The skill that separates a useful capture from a useless 40 GB file is restraint. Three levers, applied at the source whenever possible:
Capture filters (BPF), not display filters. A capture filter discards packets before they are written; a display filter only hides them after. On a busy host the difference is gigabytes. Examples that cover most investigations:
# Only traffic to/from one host on one port, both directions:
sudo tcpdump -i eth0 'host 10.4.2.50 and tcp port 443' -w /tmp/cap.pcap
# Only TCP control packets (SYN/FIN/RST) - tiny file, perfect for
# proving handshake failures and resets without payload at all:
sudo tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0' -w /tmp/ctl.pcap
# Only ICMP - for MTU / fragmentation-needed diagnosis:
sudo tcpdump -i eth0 'icmp' -w /tmp/icmp.pcap
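The tcp[tcpflags] expression reads byte 13 of the TCP header, where each control flag is a single bit. A small Python sketch of the same test, useful if you post-process raw TCP headers yourself; the function names are illustrative:

```python
# TCP flag bits in byte 13 of the TCP header (the byte BPF's tcp[tcpflags] reads).
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04
TCP_PSH = 0x08
TCP_ACK = 0x10


def tcp_flags(tcp_header: bytes) -> int:
    """Return the flags byte from a raw TCP header."""
    if len(tcp_header) < 14:
        raise ValueError("TCP header truncated before the flags byte")
    return tcp_header[13]


def is_control_packet(tcp_header: bytes) -> bool:
    """Python equivalent of: tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0"""
    return tcp_flags(tcp_header) & (TCP_SYN | TCP_FIN | TCP_RST) != 0
```

A SYN-ACK or RST passes; a plain ACK or PSH+ACK data segment does not, which is why the control-only capture stays so small.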
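Truncation (snap length). -s 128 writes only the first 128 bytes per packet. For TCP-state debugging that is all you need; it slashes file size and limits, though does not eliminate, exposure of sensitive payloads (whatever payload fits after the headers in those 128 bytes is still written).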
Ring buffers for intermittent issues. When the failure is rare, you cannot sit watching. Run a rotating capture and keep only the last N files, so when the symptom fires you grab the relevant ring:
# 20 files x 50 MB, oldest overwritten. With a snap length and a tight
# filter, this runs for hours in ~1 GB total. Stop it the moment the
# user reports the failure; the bug is in the most recent file(s).
sudo tcpdump -i eth0 -s 128 'host 10.4.2.50 and tcp port 443' \
-w /tmp/ring.pcap -C 50 -W 20
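For unattended, long-running rings, dumpcap (the capture engine that ships with Wireshark) is generally more robust than tcpdump: it rotates files by size or time (-b filesize:, -b duration:, -b files:) and can stop itself on a condition (-a), which fits capture-on-condition well.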
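Whether a ring really covers "hours" depends on packet rate. A back-of-the-envelope Python sketch, assuming classic pcap output (a 16-byte record header per packet plus min(snap length, packet length) bytes of data) and a steady packet rate that you supply; note that tcpdump's -C counts millions of bytes, not MiB. The function name is hypothetical:

```python
PCAP_RECORD_HEADER = 16  # per-packet record header in classic libpcap files


def ring_coverage_seconds(files: int, file_size_mb: int, snaplen: int,
                          packets_per_second: float, avg_packet_len: int) -> float:
    """Approximate time window a `tcpdump -C file_size_mb -W files` ring holds."""
    ring_bytes = files * file_size_mb * 1_000_000  # -C units are 1,000,000 bytes
    bytes_per_record = PCAP_RECORD_HEADER + min(snaplen, avg_packet_len)
    return ring_bytes / (bytes_per_record * packets_per_second)
```

At an assumed 2,000 full-size (1,500-byte) packets per second, the 20 x 50 MB ring above holds roughly 58 minutes with -s 128, versus roughly 5.5 minutes with tcpdump's default 262,144-byte snap length. At lower packet rates the truncated ring stretches into hours.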
6. Reading the trace: RST, retransmits, zero-window, handshake failures
Open the pcap in Wireshark and let Expert Information (Analyze > Expert Information) do the first pass — it flags retransmissions, dup ACKs, zero windows, and resets automatically. Then narrow with display filters. The four signatures that explain most “the network is broken” tickets:
Mid-stream reset. The handshake completed, data flowed, then one side sent RST. That is almost never a lossy path; it is an application, a load balancer idle timeout, or a stateful firewall expiring the connection.
tcp.flags.reset == 1
Look at who sent it and when. An RST from the server right after a quiet period points at an idle-timeout (firewall or LB). An RST immediately after a request points at the app rejecting it.
Retransmissions and dup ACKs. Packet loss or a congested/black-holed path.
tcp.analysis.retransmission || tcp.analysis.fast_retransmission || tcp.analysis.duplicate_ack
A few retransmits are normal. A wall of them, especially with the sequence numbers climbing slowly and RTO backoff doubling the gap between attempts, means real loss somewhere in the path.
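Wireshark's retransmission analysis is more elaborate, but the core idea fits in a few lines: a data segment that does not advance the highest sequence number already seen is a resend, and RTO backoff shows up as the gap between resends of one segment roughly doubling. A simplified Python sketch over segments you have already extracted for one direction (relative sequence numbers, no wraparound, SACK, or keep-alive handling); the names and the 25% tolerance are assumptions:

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    time: float   # seconds since capture start
    seq: int      # relative sequence number of the first payload byte
    length: int   # TCP payload bytes (0 for a pure ACK)


def find_retransmissions(segments: List[Segment]) -> List[Segment]:
    """Return data segments whose bytes had all been sent before."""
    highest_end = None
    resent = []
    for seg in segments:
        if seg.length == 0:
            continue  # pure ACKs carry no data, so they cannot be resends
        end = seg.seq + seg.length
        if highest_end is not None and end <= highest_end:
            resent.append(seg)
        highest_end = end if highest_end is None else max(highest_end, end)
    return resent


def looks_like_rto_backoff(send_times: List[float], tolerance: float = 0.25) -> bool:
    """True if the gaps between repeated sends of one segment roughly double."""
    gaps = [later - earlier for earlier, later in zip(send_times, send_times[1:])]
    if len(gaps) < 2:
        return False
    for prev_gap, next_gap in zip(gaps, gaps[1:]):
        if prev_gap <= 0:
            return False
        # Accept growth ratios within +/-25% of 2.0 (1.5x to 2.5x).
        if abs(next_gap / prev_gap - 2.0) > 2.0 * tolerance:
            return False
    return True
```

Send times such as 0.0, 0.2, 0.6, 1.4, 3.0 seconds for the same segment are the doubling pattern that points at real loss rather than a one-off reorder.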
Zero window. The receiver advertised a window of 0 — its buffer is full and it is telling the sender to stop. This is a receiver problem (a stuck or overloaded application not reading the socket), not a network one, and it is one of the most commonly misdiagnosed.
tcp.analysis.zero_window || tcp.analysis.window_full
Handshake never completes. The SYN goes out, gets retransmitted (Wireshark marks [TCP Retransmission] on the SYN), and no SYN-ACK ever arrives. The SYN is reaching nothing, or the reply is being dropped on the return path. This is the classic asymmetric-routing / firewall-state signature (Step 8).
tcp.flags.syn == 1 && tcp.flags.ack == 0
If you see repeated SYNs with no matching SYN-ACK at the same capture point, capture at the other end too: asymmetry usually shows up as a SYN-ACK that exists on the server’s NIC but never makes it back.
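The unanswered-SYN check is simple bookkeeping: record each client SYN, mark it answered when a SYN-ACK with the mirrored 4-tuple appears, and report what is left. A minimal Python sketch over packets already parsed from one capture point; the Pkt record and the function name are hypothetical:

```python
from typing import Dict, Iterable, NamedTuple, Tuple

FlowKey = Tuple[str, int, str, int]  # (client IP, client port, server IP, server port)


class Pkt(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    syn: bool
    ack: bool


def unanswered_syns(packets: Iterable[Pkt]) -> Dict[FlowKey, int]:
    """Map each SYN that never saw a SYN-ACK here to how many times it was sent."""
    syn_counts: Dict[FlowKey, int] = {}
    answered = set()
    for p in packets:
        if p.syn and not p.ack:
            key = (p.src, p.sport, p.dst, p.dport)
            syn_counts[key] = syn_counts.get(key, 0) + 1  # >1 means a retransmitted SYN
        elif p.syn and p.ack:
            # A SYN-ACK flows server -> client; flip it to the client's flow key.
            answered.add((p.dst, p.dport, p.src, p.sport))
    return {key: n for key, n in syn_counts.items() if key not in answered}
```

Any entry with a count above 1 is the retransmitted-SYN signature described above.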
7. MTU and fragmentation black holes across tunnels
This is the failure that drives engineers to madness: SSH connects, ping works, small HTTP requests succeed — but large POSTs, TLS with big certificate chains, or bulk transfers hang forever. The cause is almost always MTU. A tunnel (IPsec, VXLAN, GRE, WireGuard) adds encapsulation overhead, so the usable MTU inside it is below 1500. When a host sends a full 1500-byte frame with the Don’t Fragment bit set, the tunnel cannot forward it and is supposed to return an ICMP “fragmentation needed” (type 3, code 4) carrying the correct MTU. Path MTU Discovery (PMTUD) depends on that ICMP getting back to the sender. If a firewall on the return path blocks ICMP — which far too many do by reflex — the sender never learns to shrink its packets, the large frames silently disappear, and you get a black hole: small packets through, large packets gone.
Diagnose it deterministically with a DF-set ping sweep. The largest payload that succeeds, plus 28 bytes (20 IP + 8 ICMP), is your real path MTU:
# Linux: -M do sets Don't Fragment; -s is the ICMP payload size.
# 1472 payload + 28 = 1500. Walk it down until it succeeds.
ping -M do -s 1472 10.4.2.50 # fails over a 1400-MTU tunnel
ping -M do -s 1372 10.4.2.50 # 1372 + 28 = 1400 -> succeeds => PMTU ~1400
# Windows equivalent: -f sets DF, -l sets payload size.
ping -f -l 1472 10.4.2.50
ping -f -l 1372 10.4.2.50
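Walking the size down by hand works; a binary search gets there in about a dozen probes. A Python sketch in which probe(size) is a stand-in you supply for one DF-set ping of that payload size (for example, running ping -M do -c 1 -s <size> via subprocess and checking the exit code). It assumes the probe is reliable, so on a lossy path retry each size before trusting a failure; the function name is hypothetical:

```python
from typing import Callable

IP_HEADER = 20
ICMP_HEADER = 8


def find_path_mtu(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest DF-set ICMP payload that gets through.

    probe(size) must return True when an echo reply comes back; `low` must
    be a size known to succeed. Returns the path MTU: the largest passing
    payload plus 28 bytes of IP and ICMP headers.
    """
    if probe(high):
        return high + IP_HEADER + ICMP_HEADER
    # Invariant: probe(low) succeeds and probe(high) fails.
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low + IP_HEADER + ICMP_HEADER
```

Against a simulated 1400-byte tunnel (probe = lambda size: size + 28 <= 1400) it settles on a 1372-byte payload and reports 1400, the same answer as the manual walk above.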
Confirm in the pcap that the ICMP feedback is actually being delivered (or not):
icmp.type == 3 && icmp.code == 4
If that ICMP is absent at the sender but you can prove the tunnel device is generating it, a firewall is eating it: fix the firewall to allow ICMP type 3 code 4 (and, for IPv6, ICMPv6 type 2, Packet Too Big). The robust fix for TCP specifically is MSS clamping: have the tunnel router rewrite the TCP MSS option during the handshake so endpoints never send segments larger than the tunnel can carry, making the connection immune to broken PMTUD.
# Linux router: clamp advertised MSS to the tunnel's path MTU.
# --clamp-mss-to-pmtu derives the MSS from the route's MTU (minus 40 bytes for IPv4).
iptables -t mangle -A FORWARD -o tun0 -p tcp --tcp-flags SYN,RST SYN \
-j TCPMSS --clamp-mss-to-pmtu
For a 1400-byte tunnel MTU, the corresponding clamp is MSS 1360 (1400 - 20 IP - 20 TCP). Set it explicitly with --set-mss 1360 if you cannot rely on PMTU derivation.
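The clamp value is plain arithmetic on header sizes. A Python sketch for IPv4 with no IP or TCP options, including overheads for two tunnel types with fixed-size encapsulation (VXLAN, and GRE without optional key/sequence fields). IPsec and WireGuard overheads depend on cipher, mode, or outer IP version, so they are left out rather than guessed; the names are illustrative:

```python
# Encapsulation overhead relative to a 1500-byte-style underlay IP MTU.
# VXLAN: outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14 = 50 bytes.
# GRE (no optional fields): outer IPv4 20 + GRE 4 = 24 bytes.
OVERHEAD = {"vxlan": 50, "gre": 24}

IPV4_HEADER = 20
TCP_HEADER = 20


def tunnel_mtu(underlay_mtu: int, encap: str) -> int:
    """Usable inner MTU after subtracting the encapsulation overhead."""
    return underlay_mtu - OVERHEAD[encap]


def clamp_mss(path_mtu: int) -> int:
    """MSS for IPv4 TCP without options: path MTU minus 40 header bytes."""
    return path_mtu - IPV4_HEADER - TCP_HEADER
```

For the 1400-byte tunnel above, clamp_mss(1400) gives 1360, matching --set-mss 1360; a VXLAN overlay on a 1500-byte underlay leaves 1450 bytes and an MSS of 1410.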
8. Proving asymmetric routing and the firewall state-drop it causes
Asymmetric routing means the forward path (client -> server) and the return path (server -> client) traverse different devices. The internet tolerates this fine. Stateful firewalls do not. A stateful firewall builds a connection-tracking entry when it sees the SYN; if the SYN-ACK comes back through a different firewall (or the same firewall has no state because the SYN went elsewhere), the return packet is “out of state” and silently dropped. The result is a handshake that never completes — exactly the Step 6 signature — but only for the flows whose paths happen to split.
You cannot prove this from one capture point. You need two simultaneous captures, one at the client edge and one at the server edge, and you compare:
# Capture point A (client side): do we see the SYN-ACK come back?
# Matching on the SYN bit alone keeps only handshake packets (SYN and SYN-ACK);
# a SYN|ACK mask would also match every established-connection segment.
sudo tcpdump -i eth0 -n 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0' \
-w /tmp/clientside.pcap
# Capture point B (server side): does the server send a SYN-ACK at all,
# and via which next-hop/interface?
sudo tcpdump -i eth0 -n 'tcp port 443 and tcp[tcpflags] & tcp-syn != 0' \
-w /tmp/serverside.pcap
The proof: the server’s capture shows it received the SYN and sent a SYN-ACK; the client’s capture shows the SYN-ACK never arrived. The packet existed and was lost in between — and when you trace the route tables, the return path for the client CIDR points at a different firewall than the one holding the forward-flow state. In cloud terms this is usually a route table sending return traffic to the wrong NVA, a missing route in a hub-and-spoke topology, or (with GWLB/appliance fleets) a per-AZ hairpin that does not match the egress AZ. The fix is always the same shape: make the return path symmetric so the same stateful device sees both directions — correct the offending route, or enable the platform’s appliance/symmetry feature.
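Once both captures are reduced to sets of handshake flow keys, the comparison is a short decision procedure. A Python sketch, assuming no NAT between the two capture points (so both sides see the same client and server addresses) and keys normalized to (client IP, client port, server IP, server port); the function name and verdict strings are illustrative:

```python
from typing import Dict, Set, Tuple

FlowKey = Tuple[str, int, str, int]  # (client IP, client port, server IP, server port)


def locate_handshake_loss(client_syns: Set[FlowKey], server_syns: Set[FlowKey],
                          server_synacks: Set[FlowKey],
                          client_synacks: Set[FlowKey]) -> Dict[FlowKey, str]:
    """Classify each SYN the client sent by where its handshake stopped."""
    verdicts = {}
    for key in client_syns:
        if key not in server_syns:
            verdicts[key] = "SYN lost before the server"
        elif key not in server_synacks:
            verdicts[key] = "server received SYN but never replied"
        elif key not in client_synacks:
            # The asymmetry signature: the reply existed, then vanished in transit.
            verdicts[key] = "SYN-ACK lost on the return path"
        else:
            verdicts[key] = "handshake reply delivered"
    return verdicts
```

Only the third verdict sends you to the return-path route tables; the first two point at the forward path or the server itself.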
Enterprise scenario
A platform team ran a hub-and-spoke Azure topology with a third-party NVA firewall fleet in the hub behind an internal load balancer, spokes peered to the hub, UDRs forcing all spoke egress through the NVA. Standard, well-tested, in production for a year. Then a new spoke onboarded a data-replication workload, and intermittently — perhaps one connection in five — the replication handshake to an on-prem target over ExpressRoute would hang and time out. Flow logs showed the SYN leaving the spoke with ACCEPT and then nothing: no return record, no REJECT, no error anywhere. Retries usually succeeded, which made everyone blame the on-prem firewall.
The root cause: the NVA fleet was active-active behind a load balancer that hashed flows across two firewall instances. The spoke UDR sent the forward SYN to the load balancer, which hashed it to NVA-1. But the return SYN-ACK arrived over ExpressRoute into the hub, and the hub’s route table for the spoke CIDR sent return traffic to the load balancer VIP, which re-hashed it and sometimes landed it on NVA-2, an instance with no connection state for that flow. NVA-2 dropped the out-of-state SYN-ACK. The “one in five” was exactly the flows whose return hash differed from their forward hash; retries worked when a retry happened to re-hash consistently.
Two simultaneous captures nailed it: the spoke-side capture showed SYNs retransmitted with no SYN-ACK; a capture on NVA-2 showed inbound SYN-ACKs for connections it had never seen a SYN for. The fix was to change the load-balancing rule's distribution mode from the default 5-tuple hash to client-IP-and-protocol session persistence (SourceIPProtocol), so forward and return packets of the same flow pin to the same NVA instance, restoring symmetry at the firewall layer:
# Make forward + return of a flow land on the SAME NVA instance.
# Azure internal LB rule: 5-tuple distribution is the default, but the
# return path must use the same rule; here we pin distribution explicitly.
az network lb rule update \
--resource-group rg-hub --lb-name ilb-nva \
--name rule-allports \
--load-distribution SourceIPProtocol # 3-tuple affinity: source IP, destination IP, protocol
The durable lesson: an active-active stateful appliance fleet is only correct if both directions of every flow hit the same instance. A load balancer in front of stateful firewalls must be configured for flow affinity, or it will silently split a fraction of connections and produce exactly this “intermittent, retry-fixes-it, invisible in logs” signature. The packets were the only thing that could see it.
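Why a direction-sensitive hash splits flows, while an endpoint-order-independent one does not, can be shown with a toy model. This Python sketch uses SHA-256 over the flow tuple as a stand-in; it is not Azure Load Balancer's actual hash, and the two-instance pool matches the scenario only in shape. All function names are hypothetical:

```python
import hashlib


def pick_instance(key: tuple, instances: int) -> int:
    """Map a flow key to one of `instances` backends with a stable toy hash."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:4], "big") % instances


def directional_hash(src, sport, dst, dport, proto, instances=2):
    # Hashes the tuple exactly as it appears on the packet, so the return
    # packet (source and destination swapped) produces a different key.
    return pick_instance((src, sport, dst, dport, proto), instances)


def symmetric_hash(src, sport, dst, dport, proto, instances=2):
    # Orders the two endpoints first, so both directions produce the same key.
    low, high = sorted([(src, sport), (dst, dport)])
    return pick_instance((low, high, proto), instances)


def count_split_flows(flows, hash_fn):
    """Count flows whose forward and return packets land on different instances."""
    return sum(
        1 for src, sport, dst, dport, proto in flows
        if hash_fn(src, sport, dst, dport, proto) != hash_fn(dst, dport, src, sport, proto)
    )
```

Over a batch of made-up flows, directional_hash sends, in expectation, about half of the return packets to the other instance, while symmetric_hash never does; the second property is what a load balancer in front of stateful appliances has to provide.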
Verify
Confirm each capability before you trust a capture in an incident.
1. Azure capture extension is present and the capture ran:
az network watcher packet-capture show-status \
--location <vm-region> --name pc-app01-rst-investigation \
--query "{status:packetCaptureStatus, reason:stopReason}"
# expect status: Stopped, reason: TimeExceeded or BytesExceeded (a clean stop)
2. AWS mirror session is active and pointed at the right target:
aws ec2 describe-traffic-mirror-sessions \
--query "TrafficMirrorSessions[].{src:NetworkInterfaceId, tgt:TrafficMirrorTargetId, len:PacketLength, vni:VirtualNetworkId}"
# expect your source ENI, the collector target, the truncation length, and VNI
3. The collector actually receives mirrored frames (not silence):
# On the collector: you should see VXLAN on 4789 carrying inner TCP.
sudo tcpdump -i eth0 -c 20 -n 'udp port 4789'
# zero packets = filter too tight, wrong target, or source not hot
4. PMTUD path MTU is what you think it is:
ping -M do -s 1472 <remote> # if this fails but -s 1372 succeeds, MTU < 1500
5. Symmetry: the return packet exists somewhere. Two captures, client edge and server edge. If the server sent a SYN-ACK that the client never received, you have asymmetry or an in-path drop — and now you know which hop to fix.
6. The capture itself is readable and complete:
# capinfos summarizes a capture file: packet count, duration, snap length.
capinfos /tmp/cap.pcap | grep -E 'Number of packets|Packet size limit|Capture duration'
# "Packet size limit" confirms your snap length. On a ring, if the oldest
# surviving file starts after the reported failure time, the ring rotated
# past the event: widen the ring next time.
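capinfos reads the snap length from the file header. For classic pcap files (what tcpdump -w writes; dumpcap defaults to pcapng, which this sketch does not parse), the same check is a 24-byte read. A Python sketch; the function name is hypothetical:

```python
import struct

# Classic libpcap magic numbers: microsecond and nanosecond timestamp variants.
PCAP_MAGICS = {0xA1B2C3D4: "us", 0xA1B23C4D: "ns"}


def read_pcap_header(data: bytes) -> dict:
    """Parse the 24-byte global header of a classic pcap file (not pcapng)."""
    if len(data) < 24:
        raise ValueError("file shorter than a pcap global header")
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng or another format?)")
    # version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
        endian + "HHiIII", data[4:24])
    return {"version": (major, minor), "snaplen": snaplen,
            "linktype": linktype, "timestamps": PCAP_MAGICS[magic]}
```

For a capture taken with -s 128, opening /tmp/cap.pcap in binary mode and passing its first 24 bytes should report a snaplen of 128; linktype 1 means Ethernet.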