DDoS Protection in Production: Adaptive Tuning, Telemetry, and Attack Rehearsal

The worst time to learn how your DDoS mitigation behaves is during an attack. Yet that is the default posture for most teams: a checkbox got ticked, a plan got subscribed, and nobody has ever watched the mitigation engage or confirmed that an alert actually pages the on-call. Network-tier protection is not a product you buy; it is a control you operate. This walkthrough enables it correctly on both Azure and AWS, lets adaptive tuning learn real baselines, wires the under-attack signal into alerting, and then does the part everyone skips – runs a sanctioned simulation and reads the mitigation report so the whole chain is proven cold.

1. Threat model: who absorbs which layer

DDoS is not one attack; it is three families that hit different layers and are absorbed by different things. If you conflate them you will buy the wrong control and tune the wrong knob.

| Family | Examples | OSI layer | Absorbed by |
|---|---|---|---|
| Volumetric | UDP/ICMP floods, amplification (DNS, NTP, memcached) | L3 | Provider backbone + infra protection (Shield, Azure DDoS) |
| Protocol / state-exhaustion | SYN floods, fragmented packets, Slowloris-style half-opens | L3/L4 | Infra protection’s per-IP TCP/SYN policies |
| Application-layer | HTTP GET/POST floods, expensive-query abuse, credential stuffing | L7 | WAF + rate limiting at the edge, never L3/L4 |

The hard rule that governs everything below: L3/L4 infrastructure protection cannot see inside HTTP. A flood of valid-looking GET requests at 50k RPS is invisible to Shield’s network mitigation and to Azure’s per-IP policies – the packets are well-formed and below volumetric thresholds. That traffic is absorbed only by a WAF doing rate-based rules. Conversely, a 600 Gbps UDP reflection flood never reaches your WAF; it is scrubbed at the provider edge long before. Match the control to the layer or you pay for a tier that protects against the wrong thing.

Both Azure DDoS infrastructure protection and AWS Shield Standard are always on and free at the network tier for every public endpoint. The paid tiers (Azure DDoS Network/IP Protection, AWS Shield Advanced) do not add raw scrubbing capacity you otherwise lack – they add adaptive per-resource tuning, attack telemetry, cost protection, and human response. You are buying observability and tuned thresholds, not a bigger pipe.

2. Always-on baseline vs paid tiers: what infra protection does and does not cover

Be precise about the line, because it drives the spend conversation.

The free baseline (Azure DDoS infrastructure protection, AWS Shield Standard) protects the shared platform against large volumetric and common protocol attacks at the provider edge. There is no per-resource tuning, no attack-specific telemetry, no SLA on your application’s availability during an attack, and no support engagement.

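What the paid tier adds: adaptive per-resource tuning (Section 4), attack telemetry and mitigation reports (Section 6), and cost protection plus rapid-response engagement (Section 7).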

What neither tier does: protect Layer 7. That is always a separate WAF discussion (Section 5).

3. Enabling protection and associating frontends

Azure: DDoS Network Protection plan

Azure exposes two SKUs. Network Protection is a fixed monthly plan whose price includes 100 public IP resources across the tenant, with additional IPs billed as overage – the right choice once you have several protected VNets. IP Protection is pay-per-protected-public-IP, better for one or two endpoints. Both share the same mitigation engine and adaptive tuning; they differ in billing model and scope, and IP Protection omits the value-added services (rapid-response support and cost protection) covered in Section 7.

Create the plan and enable it on a virtual network. Enabling at the VNet level means every public IP attached to resources in that VNet is automatically protected – you do not enable per-IP under the Network plan.

RG=rg-edge-prod
LOC=eastus2
PLAN=ddos-plan-prod
VNET=vnet-edge

az group create -n $RG -l $LOC

# Network Protection plan (fixed monthly, covers up to 100 public IPs)
az network ddos-protection create -g $RG -n $PLAN -l $LOC

# Enable DDoS protection on the VNet and bind the plan
az network vnet update -g $RG -n $VNET \
  --ddos-protection-plan $PLAN \
  --ddos-protection true

For the IP Protection SKU you skip the plan and flip protection directly on the public IP:

az network public-ip update -g $RG -n pip-frontend \
  --ddos-protection-mode Enabled

Protect anything internet-facing: Application Gateway, Standard Load Balancer frontends, Azure Firewall, and any VM with a public IP. The engine protects Standard public IPs only; Basic SKU public IPs are not covered – one more reason Basic is retired for new builds.

AWS: Shield Advanced

Shield Advanced is a per-organization subscription (roughly $3,000/month plus data-transfer-out fees) that, once enabled, lets you protect individual resources. Subscribing alone protects nothing – you must enroll each resource (CloudFront distributions, ALBs, NLBs, Route 53 hosted zones, Global Accelerator, and Elastic IPs).

# Subscribe the account/org to Shield Advanced (one-time)
aws shield create-subscription

# Protect an Application Load Balancer
aws shield create-protection \
  --name alb-edge-prod \
  --resource-arn arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/edge-prod/abc123

# Protect a CloudFront distribution (global resources are protected in us-east-1)
aws shield create-protection \
  --name cf-edge-prod \
  --resource-arn arn:aws:cloudfront::111122223333:distribution/E1ABCDEF2GHIJ

For fleets, do not click resource-by-resource. Use protection groups to apply Shield’s aggregation and detection across a set, and drive enrollment through Firewall Manager so new resources are protected on creation:

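# Auto-aggregate detection across all Elastic IPs in the account.
# --resource-type is only valid with --pattern BY_RESOURCE_TYPE;
# --pattern ALL takes no resource type and groups every protected resource.
aws shield create-protection-group \
  --protection-group-id all-eips \
  --aggregation SUM \
  --pattern BY_RESOURCE_TYPE \
  --resource-type ELASTIC_IP_ALLOCATION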

The single most common Shield misconfiguration is a subscription with zero protected resources, or new resources that silently fall outside protection because enrollment was manual. Treat resource protection as policy-enforced (Firewall Manager) infrastructure, not a one-time setup step.
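One way to catch that drift in CI is to compare the inventory you expect to be protected against what Shield actually reports. The following is a minimal sketch, not an official tool: it assumes you have captured the JSON output of `aws shield list-protections` as a string, and the `unprotected` helper, the `EXPECTED` inventory, and the `SAMPLE` payload are all hypothetical names used for illustration.

```python
import json


def unprotected(expected_arns, list_protections_json):
    """Return the expected ARNs that have no Shield protection, sorted."""
    payload = json.loads(list_protections_json)
    protected = {p["ResourceArn"] for p in payload.get("Protections", [])}
    return sorted(set(expected_arns) - protected)


# Hypothetical inventory of resources that policy says must be protected.
EXPECTED = [
    "arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/edge-prod/abc123",
    "arn:aws:cloudfront::111122223333:distribution/E1ABCDEF2GHIJ",
]

# Sample of what `aws shield list-protections` might return (only the ALB enrolled).
SAMPLE = json.dumps({
    "Protections": [
        {"Name": "alb-edge-prod", "ResourceArn": EXPECTED[0]},
    ]
})

missing = unprotected(EXPECTED, SAMPLE)
print(missing)  # the CloudFront distribution is the gap
```

A non-empty result is the signal to fail the pipeline or open a ticket; an empty `Protections` list reproduces the "subscription with zero protected resources" case.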

4. Adaptive tuning: traffic profiling, per-IP thresholds, and warm-up

This is the feature you are actually paying for. Coarse platform defaults either trigger mitigation too late (attack traffic reaches your app first) or too early (a legitimate traffic spike gets scrubbed). Adaptive tuning fixes that by learning your specific endpoint.

On Azure, the moment a public IP is protected, the engine begins building three auto-tuned mitigation policies (TCP SYN, TCP, and UDP) for each public IP independently.

Machine-learning-based traffic profiling watches the real packets-per-second and bytes-per-second baseline for that IP and sets the trigger thresholds just above your observed peaks. Mitigation engages only when a policy threshold is exceeded – so an endpoint that normally peaks at 8k pps gets a tighter trigger than one that peaks at 800k pps, instead of both inheriting one platform default.

There is a warm-up period. Profiling is not instantaneous; the engine needs to observe a representative cycle of your traffic (days, across business and off hours) before its thresholds reflect reality. The operational consequences:

  1. Enable protection well before you need it. A plan turned on the morning of a launch has no learned baseline; it runs on conservative defaults that the launch traffic itself may trip.
  2. Re-profiling is continuous. As traffic patterns shift (a new region, a viral event) the policy adapts – but a sudden 50x legitimate spike on a freshly protected IP can still register as anomalous. Bake protection in early so the baseline is broad.

AWS Shield Advanced is analogous: it establishes per-resource baselines and resource-specific detection, and the longer a resource is protected, the more accurate its baseline. There is no threshold to hand-tune; the value is the learned profile, which is why you enroll resources early and leave them enrolled.
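To make the warm-up consequence concrete, here is a toy model of a learned trigger threshold. It is not the algorithm either provider uses (neither publishes one); the headroom factor, the platform default, and the minimum sample count are hypothetical values chosen only to show the shape of the behavior.

```python
from typing import Sequence

DEFAULT_TRIGGER_PPS = 100_000  # hypothetical platform default before a baseline exists
HEADROOM = 1.5                 # hypothetical margin above the observed peak
MIN_SAMPLES = 7                # hypothetical warm-up: one daily peak per day for a week


def trigger_threshold(daily_peak_pps: Sequence[int]) -> int:
    """Return the pps level above which mitigation would engage in this toy model."""
    if len(daily_peak_pps) < MIN_SAMPLES:
        # Not enough history: fall back to the coarse default.
        return DEFAULT_TRIGGER_PPS
    return int(max(daily_peak_pps) * HEADROOM)


small = trigger_threshold([6_000, 7_500, 8_000, 7_000, 6_500, 5_000, 4_800])
large = trigger_threshold([700_000, 800_000, 760_000, 720_000, 690_000, 500_000, 480_000])
fresh = trigger_threshold([8_000])

print(small, large, fresh)
```

The endpoint peaking at 8k pps ends up with a far tighter trigger than the one peaking at 800k pps, while the freshly protected IP is still on the default. That gap is the argument for enrolling early.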

5. Layering: where a WAF and rate limiting catch what L3/L4 cannot

Restating the hard rule from the threat model because it is where architectures fail: infrastructure protection does not inspect L7. An HTTP flood of valid requests sails through Shield’s network mitigation and through Azure’s per-IP policies untouched. You need a WAF with rate-based rules in the path.

On AWS, that means an AWS WAF web ACL on the CloudFront/ALB with a rate-based rule – and Shield Advanced specifically can deploy automatic application-layer DDoS mitigation, generating WAF rules in response to detected L7 events when you opt in:

{
  "Name": "rate-limit-per-ip",
  "Priority": 10,
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "Action": { "Block": {} },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "rateLimitPerIp"
  }
}

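Limit is the number of requests allowed per evaluation window (5 minutes by default) per aggregation key. Pick AggregateKeyType deliberately: raw IP is the floor. Use FORWARDED_IP when requests reach you through a proxy or CDN, so the key is the original client address rather than the proxy’s. Behind a shared NAT or mobile carrier, consider CUSTOM_KEYS with a header- or cookie-based key so you do not block thousands of users sharing one egress IP.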
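The effect of the aggregation key is easier to see in a small simulation. This is a simplified sliding-window counter, not AWS WAF's actual implementation; the `RateLimiter` class, the tiny limit of 5, and the user/IP values are hypothetical.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Block a key while its request count in the trailing window exceeds the limit."""

    def __init__(self, limit, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)

    def allow(self, key, now):
        q = self.hits[key]
        q.append(now)  # blocked requests still count toward the rate
        while q[0] <= now - self.window:
            q.popleft()  # drop requests that fell out of the window
        return len(q) <= self.limit


# Three users behind one NAT egress IP, three requests each, within a few seconds.
requests = [(f"user-{u}", "203.0.113.7", t) for t in range(3) for u in range(3)]

by_ip = RateLimiter(limit=5)
by_user = RateLimiter(limit=5)

blocked_by_ip = sum(not by_ip.allow(ip, t) for _, ip, t in requests)
blocked_by_user = sum(not by_user.allow((ip, user), t) for user, ip, t in requests)

print(blocked_by_ip, blocked_by_user)
```

Keyed on the shared IP, the three well-behaved users collectively exceed the limit and start getting blocked; keyed on a per-user value, none of them do.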

The same layering applies on Azure: DDoS Protection handles L3/L4, and a WAF policy on Application Gateway or Front Door handles L7 with rate-limit custom rules:

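// Deployment region; defaults to the resource group's location
param location string = resourceGroup().location

resource wafRateLimit 'Microsoft.Network/ApplicationGatewayWebApplicationFirewallPolicies@2023-11-01' = {
  name: 'waf-edge-prod'
  location: location
  properties: {
    policySettings: {
      mode: 'Prevention'
      state: 'Enabled'
    }
    // managedRules is a required property of the policy
    managedRules: {
      managedRuleSets: [
        { ruleSetType: 'OWASP', ruleSetVersion: '3.2' }
      ]
    }
    customRules: [
      {
        name: 'rateLimitPerClientIp'
        priority: 10
        ruleType: 'RateLimitRule'
        rateLimitDuration: 'OneMin'
        rateLimitThreshold: 1000
        // Count requests separately for each client address
        groupByUserSession: [
          { groupByVariables: [ { variableName: 'ClientAddr' } ] }
        ]
        // Match every IPv4 source so the limit applies to all traffic
        matchConditions: [
          {
            matchVariables: [ { variableName: 'RemoteAddr' } ]
            operator: 'IPMatch'
            matchValues: [ '0.0.0.0/0' ]
          }
        ]
        action: 'Block'
      }
    ]
  }
}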

The mental model: L3/L4 protection is the moat, the WAF is the gate. A flood that is well-formed enough to look like traffic gets stopped at the gate, not the moat.

6. Telemetry, mitigation reports, and alerting on the under-attack signal

A mitigation you cannot see is a mitigation you cannot trust. The deliverable of this section is an alert that fires the instant an endpoint is under attack – not a dashboard nobody looks at.

Azure: the IfUnderDDoSAttack metric

Azure surfaces per-public-IP DDoS metrics: IfUnderDDoSAttack (the binary under-attack signal), plus inbound packets/bytes dropped, forwarded, and the TCP/UDP/SYN breakdowns. Alert on the under-attack signal directly:

PIP_ID=$(az network public-ip show -g $RG -n pip-frontend --query id -o tsv)

az monitor metrics alert create \
  -g $RG -n "alert-under-ddos-attack" \
  --scopes $PIP_ID \
  --condition "max IfUnderDDoSAttack > 0" \
  --description "Public IP is under an active DDoS mitigation" \
  --evaluation-frequency 1m \
  --window-size 5m \
  --severity 1 \
  --action <action-group-resource-id>

Route the diagnostic logs – DDoSProtectionNotifications, DDoSMitigationFlowLogs, and DDoSMitigationReports – to a Log Analytics workspace so you have the post-attack forensic record (vectors, dropped-packet counts, mitigation duration). Query the notifications stream in KQL:

AzureDiagnostics
| where Category == "DDoSProtectionNotifications"
| where TimeGenerated > ago(7d)
| project TimeGenerated, publicIpAddress_s, type_s, Message
| order by TimeGenerated desc
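Once those rows are exported, mitigation duration can be derived by pairing start and stop notifications per public IP. The sketch below makes two assumptions: that `type_s` carries the values `MitigationStarted` and `MitigationStopped`, and that timestamps are ISO 8601 strings with an explicit offset. The `mitigation_windows` helper and the sample rows are hypothetical.

```python
from datetime import datetime


def mitigation_windows(records):
    """Pair MitigationStarted/MitigationStopped rows per public IP.

    Returns a list of (ip, start, stop, duration_seconds) tuples.
    """
    open_since = {}
    windows = []
    for row in sorted(records, key=lambda r: datetime.fromisoformat(r["TimeGenerated"])):
        ip = row["publicIpAddress_s"]
        ts = datetime.fromisoformat(row["TimeGenerated"])
        if row["type_s"] == "MitigationStarted":
            open_since.setdefault(ip, ts)  # keep the earliest unmatched start
        elif row["type_s"] == "MitigationStopped" and ip in open_since:
            start = open_since.pop(ip)
            windows.append((ip, start, ts, (ts - start).total_seconds()))
    return windows


# Sample rows shaped like the projected columns of the query above (values invented).
rows = [
    {"TimeGenerated": "2026-06-08T14:12:00+00:00", "publicIpAddress_s": "198.51.100.10", "type_s": "MitigationStopped"},
    {"TimeGenerated": "2026-06-08T14:02:00+00:00", "publicIpAddress_s": "198.51.100.10", "type_s": "MitigationStarted"},
]

windows = mitigation_windows(rows)
print(windows[0][3])  # duration in seconds
```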

AWS: CloudWatch DDoSDetected and the SRT

Shield Advanced publishes the DDoSDetected metric per protected resource (1 while a DDoS event is in progress). Alarm on it and notify SNS:

aws cloudwatch put-metric-alarm \
  --alarm-name "shield-ddos-detected-alb-edge" \
  --namespace "AWS/DDoSProtection" \
  --metric-name "DDoSDetected" \
  --dimensions Name=ResourceArn,Value=arn:aws:elasticloadbalancing:us-east-1:111122223333:loadbalancer/app/edge-prod/abc123 \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --treat-missing-data notBreaching \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:secops-pager

Mitigation detail lives in the Shield console and the describe-attack API, which returns attack vectors, top contributors, and the mitigation timeline for any past event:

# Shorthand keys are case-sensitive: FromInclusive / ToExclusive
aws shield list-attacks \
  --start-time FromInclusive=2026-06-01T00:00:00Z,ToExclusive=2026-06-08T00:00:00Z

aws shield describe-attack --attack-id <attack-id>

7. Cost protection, scaling credits, and rapid-response engagement

Two paid-tier features matter operationally beyond the mitigation itself.

Cost protection / scaling credits. When an attack forces your infrastructure to scale – autoscaling EC2, surging data transfer, extra load balancer capacity units – Shield Advanced lets you request service credits for the attack-driven charges, so a mitigated attack does not arrive as a bill. Credits cover, for example, EC2 instances created by an autoscaling policy in response to the attack, and they are valid for 12 months. Azure provides equivalent cost-protection credits for scale-out caused by a documented DDoS event under the Network Protection plan; the IP Protection SKU does not include cost protection. The practical action: when an attack happens, open the credit request promptly with the attack ID and scaling timeline – this is a claim you file, not an automatic refund.

Rapid-response engagement. Shield Advanced gives access to the Shield Response Team (SRT) for customers on Business or Enterprise support; they can triage, identify root cause, and apply mitigations – including writing WAF rules on your web ACL – on your behalf during an incident. Activate proactive engagement so the SRT contacts you when a Route 53 health check tied to a protected resource goes unhealthy during an event, rather than waiting for you to open a case mid-attack:

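# Register emergency contacts (the command name ends in -details)
aws shield associate-proactive-engagement-details \
  --emergency-contact-list \
    EmailAddress=secops@example.com,PhoneNumber=+15555550100,ContactNotes="Primary on-call"

# Turn proactive engagement (back) on if it has been disabled
aws shield enable-proactive-engagement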

Azure’s equivalent is engaging the DDoS Rapid Response (DRR) team during an active attack through a support ticket. Pre-register the path: know which support plan tier grants it and have the escalation runbook ready before the day you need it.

8. Running a sanctioned attack simulation and reading the mitigation report

You have not proven any of this until you have watched mitigation engage and an alert fire. Both clouds support authorized simulations – and authorization is the whole point. Launching attack traffic at a cloud endpoint without it violates the provider’s acceptable-use policy.

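Rules of engagement, non-negotiable: get authorization first, go through an approved simulation partner, and do not generate the attack traffic yourself.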

A typical sanctioned run against an Azure protected public IP, driven through a partner’s API:

# Illustrative: parameters submitted to an APPROVED simulation partner's API.
# You do not generate this traffic yourself.
curl -X POST "https://<partner-endpoint>/tests" \
  -H "Authorization: Bearer $PARTNER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "targetIP": "20.51.x.x",
        "testType": "UDP 1024",
        "portNumber": 443,
        "durationSeconds": 600,
        "bandwidthMbps": 200
      }'

While it runs, watch the under-attack signal flip and the drop counters climb:

az monitor metrics list \
  --resource $PIP_ID \
  --metric "IfUnderDDoSAttack" "PacketsDroppedDDoS" "PacketsInDDoS" \
  --interval PT1M \
  --start-time 2026-06-08T14:00:00Z \
  --end-time 2026-06-08T14:30:00Z

Then read the mitigation report – the artifact the whole exercise produces. On Azure it lands in the DDoSMitigationReports log category (vectors mitigated, packets dropped vs forwarded, top source geographies, start/stop). On AWS, describe-attack returns the equivalent detail (attack vectors, top contributors, mitigation timeline). You are verifying four things:

  1. IfUnderDDoSAttack / DDoSDetected went to 1 during the window.
  2. The mitigation dropped the attack packets (drop counter rose) while legitimate synthetic traffic still got through.
  3. Your alert actually paged the on-call – check the action group / SNS delivery, not just the alarm state.
  4. The report’s identified vectors match what the partner generated (UDP flood shows up as a UDP mitigation).

If any of those four did not happen, you found the gap in a rehearsal instead of an incident. That is the entire return on this work.
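The four checks can be turned into a pass/fail gate for the rehearsal. This is a sketch under stated assumptions: you collect the evidence yourself from metrics, probes, and the report, and the `RehearsalEvidence` fields, the `failed_checks` helper, and the 99% probe-success floor are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RehearsalEvidence:
    under_attack_samples: list      # IfUnderDDoSAttack / DDoSDetected values in the window
    packets_dropped: int            # drop counter delta during the test
    synthetic_probe_success: float  # fraction of legitimate probe requests that succeeded
    page_delivered: bool            # on-call confirmed receipt, not just alarm state
    generated_vectors: set          # what the partner says it sent
    reported_vectors: set           # what the mitigation report identified


def failed_checks(e, min_probe_success=0.99):
    """Return a description of every rehearsal check that did not pass."""
    failures = []
    if max(e.under_attack_samples, default=0) < 1:
        failures.append("under-attack signal never went to 1")
    if e.packets_dropped <= 0 or e.synthetic_probe_success < min_probe_success:
        failures.append("attack not dropped or legitimate traffic degraded")
    if not e.page_delivered:
        failures.append("alert did not reach the on-call")
    if not e.generated_vectors <= e.reported_vectors:
        failures.append("report vectors do not cover what was generated")
    return failures


good = RehearsalEvidence([0, 1, 1, 0], 4_200_000, 1.0, True, {"UDP"}, {"UDP"})
silent = RehearsalEvidence([0, 1, 1, 0], 4_200_000, 1.0, False, {"UDP"}, {"UDP"})

print(failed_checks(good))
print(failed_checks(silent))
```

The second case is the one rehearsals most often surface: mitigation worked and the alarm changed state, but nobody was paged.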

Verify

Run this end-to-end checklist against the live environment, not the IaC:

# Azure: confirm the VNet has a plan bound and protection on
az network vnet show -g $RG -n $VNET \
  --query "{ddosEnabled:enableDdosProtection, plan:ddosProtectionPlan.id}" -o json

# Azure: confirm the public IP reports a protection mode / plan
az network public-ip show -g $RG -n pip-frontend \
  --query "{ip:ipAddress, ddosMode:ddosSettings.protectionMode}" -o json

# Azure: confirm the under-attack alert exists and is enabled
az monitor metrics alert show -g $RG -n "alert-under-ddos-attack" \
  --query "{enabled:enabled, severity:severity}" -o json

# AWS: confirm the subscription is active (returns ACTIVE or INACTIVE)
aws shield get-subscription-state

# AWS: confirm proactive engagement status
aws shield describe-subscription \
  --query "Subscription.ProactiveEngagementStatus"

# AWS: confirm the resource is actually protected
aws shield list-protections \
  --query "Protections[].{name:Name,arn:ResourceArn}"

# AWS: confirm the DDoSDetected alarm is wired to a notification target
aws cloudwatch describe-alarms --alarm-names "shield-ddos-detected-alb-edge" \
  --query "MetricAlarms[].{state:StateValue,actions:AlarmActions}"

Expected results: protection enabled at both the VNet/resource level, an active subscription/plan, an under-attack alarm in OK state with a real action target attached, and – after your sanctioned simulation – a mitigation report on file showing the attack was dropped.

Enterprise scenario

A media-streaming platform ran Azure DDoS Network Protection across their edge VNets and, on paper, were covered. During a high-profile live event their origin held under the network-layer volumetric noise – the per-IP UDP and SYN policies engaged exactly as designed and the under-attack alert paged correctly. But the player’s API tier buckled anyway, and the post-incident timeline showed no DDoS mitigation event at all for the API’s public IP. The platform team’s first assumption was that DDoS Protection had failed.

The constraint: it had not failed – it was the wrong layer. The attack was a Layer 7 HTTP flood, tens of thousands of well-formed POST /v2/heartbeat requests per second from a botnet, each request indistinguishable from a real player check-in. Every packet was valid TCP below the volumetric threshold, so the per-IP policies correctly never triggered. L3/L4 protection had nothing to bite on; the flood was an application-layer event only a WAF could see.

The fix was a rate-based rule on the Front Door WAF policy in front of the API, keyed on client address with a threshold set from the learned legitimate heartbeat rate. A real player sends one heartbeat every 30 seconds, or 2 per minute, so any address sustaining 60 or more per minute was running at 30 times the legitimate rate and could safely be treated as synthetic:

{
  // Front Door WAF custom-rule schema (differs from Application Gateway's);
  // request counts are tracked per client IP by default
  name: 'rateLimitHeartbeat'
  priority: 5
  enabledState: 'Enabled'
  ruleType: 'RateLimitRule'
  rateLimitDurationInMinutes: 1
  rateLimitThreshold: 60
  matchConditions: [
    {
      matchVariable: 'RequestUri'
      operator: 'Contains'
      matchValue: [ '/v2/heartbeat' ]
      transforms: [ 'Lowercase' ]
    }
  ]
  action: 'Block'
}

Drops were immediate and legitimate players were unaffected. The lesson they wrote into their reference architecture: DDoS Network Protection and a WAF rate-limit are not interchangeable, they are complementary – one for the moat, one for the gate – and they added an L7 HTTP-flood scenario to their twice-yearly attack rehearsal so the gate gets tested, not just the moat. They also instrumented a WAF rate-limit-block alarm so an application-layer flood pages on-call the same way the network-layer signal already did.

Checklist
