The Linux Consumption plan gave you scale-to-zero and execution billing, but you paid for it with no VNet integration, opaque scaling, and cold starts you could only pray about. Flex Consumption is Microsoft’s answer: same serverless billing model, but now with true virtual network integration, selectable instance memory, deterministic per-function concurrency, and always-ready instances to kill cold starts on the functions that matter. This is how to provision it correctly, tune it, and prove the scale controller behaves under load.
1. Flex vs Consumption vs Premium: the scaling and billing model
Pick the wrong plan and you either overpay for idle compute (Premium) or hit a wall you can’t tune around (Consumption). Here is the decision matrix that actually matters:
| Concern | Consumption | Premium (EP) | Flex Consumption |
|---|---|---|---|
| Scale to zero | Yes | No (min 1) | Yes |
| Max scale-out instances | 200 | 100 | 1,000 |
| VNet integration | No | Yes | Yes (subnet delegation) |
| Cold-start mitigation | None | Pre-warmed instances | Always-ready instances |
| Instance memory | Fixed | Fixed per SKU | Selectable: 512 / 2048 / 4096 MB |
| Concurrency control | Implicit | Implicit | Explicit per-instance |
| Billing | Execution only | Per-instance (always on) | Execution + always-ready baseline |
| OS | Linux/Windows | Linux/Windows | Linux only |
The billing distinction is the crux. Consumption bills only GB-seconds of active execution. Premium bills the full lifetime of every reserved instance whether it runs code or not. Flex Consumption splits the difference: on-demand instances bill only while actively executing (with a 1,000 ms minimum, then rounded up to the nearest 100 ms), while any always-ready instances you configure bill a baseline for provisioned memory whether or not they execute. You pay the Premium-style baseline only on the slice of capacity you explicitly reserve.
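To make the on-demand rounding rule concrete, here is a minimal bash sketch. The function name `billable_ms` is hypothetical, and it models only the duration rounding for a single active period (1,000 ms minimum, then up to the nearest 100 ms), not the GB-second price itself.

```shell
#!/usr/bin/env bash
# billable_ms ACTIVE_MS
# Hypothetical helper: billable duration for one on-demand active period.
# Rule taken from the text: 1,000 ms minimum, then round up to the nearest 100 ms.
billable_ms() {
  local ms=$1
  if (( ms <= 1000 )); then
    echo 1000
  else
    # Integer ceiling to the next multiple of 100
    echo $(( (ms + 99) / 100 * 100 ))
  fi
}

billable_ms 250    # prints 1000 (minimum applies)
billable_ms 1234   # prints 1300 (rounded up to the nearest 100 ms)
```

Short, bursty functions are therefore billed as if each active period lasted at least one second, which is worth factoring in before comparing plans on raw execution time.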
The C# in-process model is not supported on Flex Consumption. You must be on the isolated worker model (.NET 8/9/10). There is also no in-place migration in or out: moving to Flex means creating a new app and redeploying.
2. Provision a Flex app with subnet delegation
VNet integration on Flex requires a subnet delegated to Microsoft.App/environments, at least /27 in size, and the Microsoft.App resource provider registered on the subscription. The portal and CLI check the RP registration at create time even if you skip VNet integration, because you can add it later.
RG=rg-fnflex-prod
LOC=eastus
VNET=vnet-app
SUBNET=snet-func-flex
STORAGE=stfnflexprod$RANDOM
# 0. Resource group (skip if it already exists)
az group create -n $RG -l $LOC
# 1. Register the provider that backs subnet delegation
az provider register --namespace Microsoft.App --wait
# 2. Network + a dedicated, delegated subnet (/26 leaves headroom)
az network vnet create -g $RG -n $VNET --address-prefixes 10.40.0.0/16 \
--subnet-name $SUBNET --subnet-prefixes 10.40.1.0/26
az network vnet subnet update -g $RG --vnet-name $VNET -n $SUBNET \
--delegations Microsoft.App/environments
# 3. Backing storage account (host metadata + deployment container)
az storage account create -g $RG -n $STORAGE -l $LOC --sku Standard_LRS \
--allow-blob-public-access false --min-tls-version TLS1_2
The --delegations value is exact — Microsoft.App/environments, not Microsoft.Web/.... This trips up everyone coming from App Service VNet integration. With the subnet ready, create the app and join it to the VNet in one shot:
SUBNET_ID=$(az network vnet subnet show -g $RG --vnet-name $VNET -n $SUBNET --query id -o tsv)
az functionapp create \
--resource-group $RG \
--name fn-orders-prod \
--storage-account $STORAGE \
--flexconsumption-location $LOC \
--runtime dotnet-isolated --runtime-version 8.0 \
--subnet "$SUBNET_ID"
--flexconsumption-location (not --consumption-plan-location) is what selects the Flex plan. Confirm the region supports it first with az functionapp list-flexconsumption-locations -o table — Flex is not in every region. To attach a VNet to an existing Flex app instead, use az functionapp vnet-integration add -g $RG -n fn-orders-prod --vnet "$VNET" --subnet "$SUBNET".
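On the subnet sizing used earlier in this section (a /27 minimum, /26 for headroom), a quick way to see what each prefix leaves you is to subtract the five addresses Azure reserves in every subnet. The helper name `usable_ips` is hypothetical, and this sketch makes no assumption about how many addresses each Flex instance consumes.

```shell
#!/usr/bin/env bash
# usable_ips PREFIX_LENGTH
# Hypothetical helper: addresses left in an Azure subnet after the
# 5 addresses Azure reserves in every subnet are subtracted.
usable_ips() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

usable_ips 27   # prints 27
usable_ips 26   # prints 59
```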
3. Configure instance memory and maximum instance count
Two knobs govern how big each worker is and how far the app can spread. Memory comes in three sizes; CPU and network bandwidth scale proportionally with it:
| Instance memory (MB) | vCPU cores | Use for |
|---|---|---|
| 512 | 0.25 | High fan-out, light per-request work; cheapest cores |
| 2048 | 1 | Default for most workloads |
| 4096 | 2 | CPU/memory-heavy work, large payloads, ML inference |
Every instance also gets an extra ~272 MB platform buffer that you are not billed for. Set memory at create time with --instance-memory, or change it later:
# Larger instances for a CPU-bound transform app
az functionapp scale config set -g $RG -n fn-orders-prod --instance-memory 4096
# Cap horizontal scale (40 is the lowest allowed max; 1000 the ceiling)
az functionapp scale config set -g $RG -n fn-orders-prod --maximum-instance-count 120
--maximum-instance-count accepts 40 to 1,000. The floor of 40 surprises people — you cannot pin a Flex app to “max 5 instances.” If you need a hard, low ceiling, Flex is the wrong plan.
Mind the regional subscription quota: every Flex app in a subscription+region shares a default budget of 250 cores (512,000 MB). Cores are instances x cores-per-instance, so a single 4096-MB app maxes out the default quota at 125 instances (125 x 2). Always-ready instances count against it; scaled-to-zero apps do not. Request an increase via support before you plan for thousands of large instances.
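The quota arithmetic above can be sketched in bash as follows. The function names are hypothetical, the cores-per-size mapping comes from the table in this section, and the 250-core default is the figure quoted above. Cores are tracked in quarter-cores so that integer math is enough.

```shell
#!/usr/bin/env bash
# quarter_cores MEMORY_MB
# Hypothetical helper: cores per instance, in quarter-cores,
# using the memory-to-core mapping from the table above.
quarter_cores() {
  case $1 in
    512)  echo 1 ;;   # 0.25 core
    2048) echo 4 ;;   # 1 core
    4096) echo 8 ;;   # 2 cores
    *) echo "unsupported instance memory: $1" >&2; return 1 ;;
  esac
}

# max_instances_in_quota MEMORY_MB [QUOTA_CORES]
# How many instances of one size fit in the regional core quota
# (defaults to the 250-core figure quoted in the text).
max_instances_in_quota() {
  local q
  q=$(quarter_cores "$1") || return 1
  local quota=${2:-250}
  echo $(( quota * 4 / q ))
}

max_instances_in_quota 4096   # prints 125
max_instances_in_quota 2048   # prints 250
max_instances_in_quota 512    # prints 1000
```

Remember that the quota is shared by every Flex app in the subscription and region, so the result is an upper bound for all apps combined, not a per-app allowance.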
4. Per-instance concurrency: HTTP and non-HTTP triggers
This is the single most impactful tuning lever on Flex. Concurrency is how many parallel executions each instance handles. Set it too high and instances thrash under memory pressure; set it too low and you scale out (and bill) more instances than you need.
Flex groups functions into scale groups that scale together: all HTTP/SignalR triggers (http), Event Grid blob triggers (blob), and Durable orchestration/activity/entity triggers (durable). Everything else scales individually as function:<NAME>.
HTTP concurrency is set explicitly and, once set, is honored regardless of instance memory size:
# Each instance handles up to 10 concurrent HTTP executions before
# the scale controller adds another instance.
az functionapp scale config set -g $RG -n fn-orders-prod \
--trigger-type http --trigger-settings perInstanceConcurrency=10
http is the only trigger type valid for perInstanceConcurrency. The default HTTP concurrency is derived from instance memory when you do not set it — bigger instances default higher. Pin it explicitly in production so a later memory change doesn’t silently shift your scale math.
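As a rough planning aid, the relationship between concurrent HTTP load and instance count is a ceiling division. This is a sketch under the assumption that load spreads evenly and each instance fills to its configured concurrency before another is added. The helper name `instances_needed` is hypothetical, and the real scale controller may behave differently in detail.

```shell
#!/usr/bin/env bash
# instances_needed CONCURRENT_REQUESTS PER_INSTANCE_CONCURRENCY
# Hypothetical helper: instances required if each one fills to its
# configured perInstanceConcurrency before another is added.
instances_needed() {
  local load=$1 per=$2
  echo $(( (load + per - 1) / per ))   # integer ceiling division
}

instances_needed 100 10   # prints 10
instances_needed 101 10   # prints 11
```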
For non-HTTP triggers (Service Bus, Event Hubs, Storage Queue), concurrency is governed by target-based scaling through host.json, not the CLI flag above. You tune the batch/concurrency knobs of the binding and the runtime computes a target instance count from queue depth:
{
"version": "2.0",
"extensions": {
"serviceBus": {
"maxConcurrentCalls": 16,
"maxConcurrentSessions": 8,
"prefetchCount": 32
},
"queues": {
"batchSize": 16,
"newBatchThreshold": 8
}
}
}
For a queue trigger, target-based scaling computes desired instances as roughly messages / (batchSize + newBatchThreshold). Lowering batchSize makes the app scale out more aggressively per message backlog; raising it packs more work onto each instance. Tune this against downstream throughput limits (database connection pools, third-party API rate caps) — uncontrolled fan-out is how you DDoS your own backend.
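The queue formula above can be tried out with a small bash sketch. The function name `desired_instances` is hypothetical, and the result is an approximation of the target, which in practice is still capped by the app's maximum instance count.

```shell
#!/usr/bin/env bash
# desired_instances MESSAGES BATCH_SIZE NEW_BATCH_THRESHOLD
# Hypothetical helper: approximate target instance count for a Storage
# Queue trigger, using messages / (batchSize + newBatchThreshold).
desired_instances() {
  local messages=$1 batch=$2 threshold=$3
  local per_instance=$(( batch + threshold ))
  echo $(( (messages + per_instance - 1) / per_instance ))   # round up
}

desired_instances 2400 16 8   # prints 100 (values from the host.json above)
desired_instances 2400 8 8    # prints 150 (smaller batchSize, more instances)
```

Running it with your real backlog sizes shows how far a single host.json change moves the fan-out before you commit it.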
5. Always-ready instances to kill cold starts
On-demand instances cold-start. For latency-critical paths — a synchronous checkout API, a webhook with a tight SLA — reserve always-ready instances that stay warm and take traffic first. The platform only spins up on-demand instances after the always-ready pool is saturated.
# Keep 3 warm instances for the HTTP group
az functionapp scale config always-ready set -g $RG -n fn-orders-prod \
--settings http=3
# Mix: warm Durable group + warm a single hot function
az functionapp scale config always-ready set -g $RG -n fn-orders-prod \
--settings durable=2 function:ProcessPayment=2
At create time the equivalent is --always-ready-instances http=3. Remove reservations with az functionapp scale config always-ready delete -g $RG -n fn-orders-prod --setting-names http function:ProcessPayment.
Two things to internalize. First, billing: always-ready instances bill a baseline for provisioned memory continuously, plus execution memory while running, with no free grant — this is the Premium-style cost, scoped to only the instances you reserve. Reserve the minimum that holds your steady-state concurrency. Second, zone redundancy: if you enable availability zones, the minimum always-ready count per group is 2, not 1, so the warm pool survives a zone outage.
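One way to turn "the minimum that holds your steady-state concurrency" into a number is sketched below. The helper name `always_ready_count` and its arguments are hypothetical. It rounds steady-state concurrency up to whole instances and applies the zone-redundancy floor of 2 described above.

```shell
#!/usr/bin/env bash
# always_ready_count STEADY_CONCURRENCY PER_INSTANCE_CONCURRENCY [ZONE_REDUNDANT]
# Hypothetical helper: smallest warm pool that holds steady-state load,
# raised to 2 when zone redundancy is enabled (per the text above).
always_ready_count() {
  local steady=$1 per=$2 zr=${3:-false}
  local n=$(( (steady + per - 1) / per ))
  if [[ $zr == true ]] && (( n < 2 )); then
    n=2
  fi
  echo "$n"
}

always_ready_count 25 10        # prints 3
always_ready_count 5 10 true    # prints 2 (zone-redundancy floor)
```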
6. Deploy with one-deploy and managed-identity storage
Flex has exactly one deployment path: build, zip, push the package to a blob container. The app pulls and runs from that package on startup. No WEBSITE_RUN_FROM_PACKAGE gymnastics — that behavior is built in.
# Build + zip your project, then one-deploy it
func azure functionapp publish fn-orders-prod
# or push a prebuilt package and run the build remotely on the platform:
az functionapp deployment source config-zip \
-g $RG -n fn-orders-prod --src ./app.zip --build-remote true
--build-remote true runs Oryx build (restore/compile) on the platform — use it for Python/Node where native wheels must match the Linux host. For precompiled .NET isolated output, ship the built artifact and skip remote build.
The security upgrade is removing storage secrets entirely. By default the host talks to storage via a connection string in AzureWebJobsStorage. Replace it with an identity-based connection so no key ever lands in app settings:
# Assign a user-assigned identity and grant it data-plane access to storage
UAMI_ID=$(az identity show -g $RG -n id-fn-orders --query id -o tsv)
UAMI_CLIENT=$(az identity show -g $RG -n id-fn-orders --query clientId -o tsv)
STORAGE_ID=$(az storage account show -g $RG -n $STORAGE --query id -o tsv)
az functionapp identity assign -g $RG -n fn-orders-prod --identities "$UAMI_ID"
# Host needs Blob + Queue + Table data roles on the backing account
for ROLE in "Storage Blob Data Owner" "Storage Queue Data Contributor" "Storage Table Data Contributor"; do
az role assignment create --assignee "$UAMI_CLIENT" --role "$ROLE" --scope "$STORAGE_ID"
done
# Swap the connection string for an identity-based connection
az functionapp config appsettings set -g $RG -n fn-orders-prod --settings \
"AzureWebJobsStorage__accountName=$STORAGE" \
"AzureWebJobsStorage__credential=managedidentity" \
"AzureWebJobsStorage__clientId=$UAMI_CLIENT" && \
az functionapp config appsettings delete -g $RG -n fn-orders-prod \
--setting-names AzureWebJobsStorage
The __accountName syntax is specific to AzureWebJobsStorage. Omit __clientId and Flex falls back to the system-assigned identity (use az functionapp identity assign -g $RG -n fn-orders-prod with no --identities). For the deployment container specifically, you can authenticate the same way at create time:
az functionapp create -g $RG -n fn-orders-prod --storage-account $STORAGE \
--runtime dotnet-isolated --runtime-version 8.0 --flexconsumption-location $LOC \
--deployment-storage-name $STORAGE \
--deployment-storage-container-name app-package \
--deployment-storage-auth-type UserAssignedIdentity \
--deployment-storage-auth-value "$UAMI_ID"
--deployment-storage-auth-type accepts StorageAccountConnectionString, UserAssignedIdentity, or SystemAssignedIdentity. The identity needs Storage Blob Data Contributor on the deployment account.
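After moving to identity-based connections, it can help to scan the app settings for anything that still carries a storage secret. The sketch below assumes settings arrive as tab-separated `name<TAB>value` lines, for example from `az functionapp config appsettings list ... --query "[].[name,value]" -o tsv`. The function name `find_storage_secrets` and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# find_storage_secrets
# Hypothetical helper: reads "name<TAB>value" lines on stdin and prints the
# names of settings whose value looks like a key- or SAS-based connection string.
find_storage_secrets() {
  grep -E 'AccountKey=|SharedAccessSignature=' | cut -f1
}

# Sample input with made-up values; only the first setting is flagged.
printf '%s\t%s\n' \
  "LegacyQueue" "DefaultEndpointsProtocol=https;AccountName=st1;AccountKey=abc123" \
  "AzureWebJobsStorage__accountName" "stfnflexprod" \
  | find_storage_secrets
```

An empty result means no setting matched either pattern. It does not prove that no secret exists elsewhere, such as in a differently formatted value.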
7. Private endpoints, Key Vault references, and outbound lockdown
VNet integration handles outbound traffic. To lock down inbound access to your dependencies, pair it with private endpoints and disable public network access on each backing resource.
# Private endpoint for the storage blob service
az network private-endpoint create -g $RG -n pe-st-blob \
--vnet-name $VNET --subnet snet-pe \
--private-connection-resource-id "$STORAGE_ID" \
--group-id blob --connection-name conn-st-blob
# Force all storage traffic through the private path
az storage account update -g $RG -n $STORAGE --public-network-access Disabled
For the function app to resolve *.privatelink.blob.core.windows.net to the private IP through its VNet, ensure the integration subnet’s VNet is linked to the relevant Private DNS zones (privatelink.blob.core.windows.net, privatelink.queue.core.windows.net, privatelink.vaultcore.azure.net, and so on). Without that DNS link the app resolves the public IP and the endpoint is bypassed.
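When you check resolution from inside the VNet, a small helper can classify the returned address as private or public. This is a sketch with a hypothetical function name, `is_private_ip`. It only checks the RFC 1918 IPv4 ranges and assumes a well-formed dotted-quad input.

```shell
#!/usr/bin/env bash
# is_private_ip IPV4_ADDRESS
# Hypothetical helper: succeeds if the address is in an RFC 1918 range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_private_ip() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  if (( a == 10 )); then
    return 0
  elif (( a == 172 && b >= 16 && b <= 31 )); then
    return 0
  elif (( a == 192 && b == 168 )); then
    return 0
  fi
  return 1
}

is_private_ip 10.40.2.5 && echo "private"
is_private_ip 20.60.1.9 || echo "public: endpoint is being bypassed"
```

Feed it the address that a lookup of the storage hostname returns from a machine inside the VNet. A public result usually points to a missing Private DNS zone link.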
Pull secrets from Key Vault behind its own private endpoint via Key Vault references — the secret value is never stored in app settings:
az functionapp config appsettings set -g $RG -n fn-orders-prod --settings \
"DbConnection=@Microsoft.KeyVault(SecretUri=https://kv-orders.vault.azure.net/secrets/db-conn/)"
Grant the app’s managed identity Key Vault Secrets User on the vault. To force all outbound through the VNet (so it can traverse a firewall or NAT gateway and the resolver sees private records), set vnetRouteAllEnabled:
az resource update -g $RG --namespace Microsoft.Web --resource-type sites \
--name fn-orders-prod --set properties.vnetRouteAllEnabled=true
8. Load-test the scale controller and diagnose 429s
The scale controller is deterministic on Flex — instances are added based on the concurrency you configured — but you still need to prove it under representative load and confirm you are not hitting the regional quota. Generate load, then read the platform metrics that explain scaling decisions.
APP_ID=$(az functionapp show -g $RG -n fn-orders-prod --query id -o tsv)
# Instance count and execution units over the last hour
az monitor metrics list --resource "$APP_ID" \
--metric "InstanceCount" --interval PT1M -o table
az monitor metrics list --resource "$APP_ID" \
--metric "OnDemandFunctionExecutionUnits" --interval PT1H -o table
az monitor metrics list --resource "$APP_ID" \
--metric "AlwaysReadyFunctionExecutionUnits" --interval PT1H -o table
When you see HTTP 429 responses under load, there are two distinct causes and you must tell them apart:
- App-level throttling: instances are saturated at their concurrency limit and the app cannot scale further because --maximum-instance-count is too low or the regional core quota is exhausted. Fix: raise the max instance count, raise per-instance concurrency (if instances have memory headroom), or request a quota increase.
- Cold-start latency cascades: a burst arrives faster than on-demand instances warm up, and an upstream gateway times out and retries, amplifying load. Fix: add always-ready instances sized to absorb the burst’s leading edge.
Use Application Insights to separate them with this Kusto query — it correlates 429 rate against live instance count so you can see whether you were capped or cold:
let window = 5m;
requests
| where timestamp > ago(1h)
| summarize
total = count(),
throttled = countif(resultCode == "429"),
instances = dcount(cloud_RoleInstance),
p95_ms = percentile(duration, 95)
by bin(timestamp, window)
| extend throttle_rate = round(100.0 * throttled / total, 2)
| order by timestamp asc
A throttle rate that climbs while p95 stays flat points to a hard instance cap (cause 1). A throttle rate that spikes alongside a p95 latency spike at the start of a burst points to cold starts (cause 2). The portal’s Diagnose and solve problems blade also exposes a Flex Consumption Quota tool and a Flex Consumption Deployment tool that show real-time core usage and deployment package status — check the quota tool first when scaling stalls below your configured max.
Verify
Confirm each layer is actually in effect, not just configured:
# Plan is Flex (sku is expected to report FlexConsumption)
az functionapp show -g $RG -n fn-orders-prod --query sku -o tsv
# Instance memory, max instance count, and HTTP concurrency are what you set
az functionapp scale config show -g $RG -n fn-orders-prod -o jsonc
# VNet integration is bound to the delegated subnet
az functionapp vnet-integration list -g $RG -n fn-orders-prod -o table
# Always-ready reservations are present
az functionapp scale config show -g $RG -n fn-orders-prod --query alwaysReady -o table
# No storage connection-string secret remains in app settings
az functionapp config appsettings list -g $RG -n fn-orders-prod \
--query "[?name=='AzureWebJobsStorage']" -o table # should be empty
Then prove behavior: hit a warm endpoint and confirm sub-second p50 from the always-ready pool, fire a burst beyond the always-ready count and watch InstanceCount climb in metrics, and trigger a private-endpoint path to confirm DNS resolves to a 10.x address (nslookup from a peered VM, not your laptop).
Enterprise scenario
A payments platform team migrated a synchronous card-authorization API off the Linux Consumption plan after a Black Friday incident: cold starts pushed p99 past their acquirer’s 800 ms timeout, the acquirer retried, and retries stampeded a backend whose database had a 200-connection pool. The hard constraint: the auth function had to reach an on-prem fraud-scoring service over a private ExpressRoute path (impossible on Consumption, which has no VNet integration), and it could not exceed ~150 concurrent backend connections regardless of incoming spike.
They moved to Flex Consumption and solved it with three coordinated settings. VNet integration over a delegated Microsoft.App/environments subnet gave them the private route to on-prem. They reserved always-ready instances to absorb the burst leading edge so the acquirer never saw a cold start. Crucially, they capped fan-out by pinning per-instance concurrency and max instances so total in-flight executions could never exceed the database pool:
# 6 warm instances x 3 concurrency = 18 in-flight on the always-ready pool,
# capped at 40 instances (the lowest allowed max) so peak <= 120 < the ~150-connection budget.
az functionapp scale config always-ready set -g rg-payments -n fn-auth \
--settings http=6
az functionapp scale config set -g rg-payments -n fn-auth \
--trigger-type http --trigger-settings perInstanceConcurrency=3
az functionapp scale config set -g rg-payments -n fn-auth \
--maximum-instance-count 40 --instance-memory 2048
The result: p99 dropped under 300 ms because the warm pool never cold-started on the hot path, and the explicit concurrency x max-instances ceiling made backend overload structurally impossible — the function throttled with 429s at the edge (which the acquirer handled gracefully) long before the database pool exhausted. The lesson that generalizes: on Flex, concurrency and max-instance-count are not just performance knobs, they are a backpressure mechanism. Size them against your weakest downstream dependency, not against incoming traffic.
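The backpressure sizing can be sketched as two small calculations: the peak in-flight ceiling for a given configuration, and the largest per-instance concurrency that keeps that ceiling inside a downstream budget. The function names and example numbers are hypothetical, and the sketch assumes each in-flight execution holds at most one downstream connection.

```shell
#!/usr/bin/env bash
# peak_in_flight MAX_INSTANCES PER_INSTANCE_CONCURRENCY
# Hypothetical helper: worst-case concurrent executions across the app.
peak_in_flight() {
  echo $(( $1 * $2 ))
}

# max_concurrency_for BUDGET MAX_INSTANCES
# Hypothetical helper: largest per-instance concurrency whose peak stays
# within the downstream budget. Fails if even concurrency 1 would exceed it.
max_concurrency_for() {
  local budget=$1 max_instances=$2
  local c=$(( budget / max_instances ))
  if (( c < 1 )); then
    echo "budget $budget cannot cover $max_instances instances" >&2
    return 1
  fi
  echo "$c"
}

max_concurrency_for 150 40   # prints 3
peak_in_flight 40 3          # prints 120
```

Start from the weakest dependency's limit, derive the concurrency from it, and only then check whether the resulting throughput meets your traffic needs.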