Classic UI-based release pipelines were easy to click together and impossible to review. Multi-stage YAML pipelines put the entire promotion path — build, dev, test, prod, approvals, and gates — under version control next to the code it ships. This guide builds a defensible, production-grade pipeline that takes a single commit from dev to prod with real guardrails.
The multi-stage mental model
A YAML pipeline is a hierarchy: a pipeline contains stages, a stage contains jobs, and a job contains steps. There are two kinds of jobs:
- A regular job runs steps on an agent.
- A deployment job targets an Environment, records deployment history against it, and unlocks lifecycle hooks and deployment strategies (runOnce, rolling, canary).
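The hierarchy and the two job kinds fit in a few lines. This is a minimal sketch rather than a complete pipeline; the stage names, job names, echo steps, and the dev Environment are placeholders chosen for illustration.

```yaml
# Minimal sketch: pipeline -> stages -> jobs -> steps.
# All names here are illustrative placeholders.
pool:
  vmImage: ubuntu-latest

stages:
  - stage: Build                  # a stage groups jobs
    jobs:
      - job: build                # regular job: runs steps on an agent
        steps:
          - script: echo "compile and test"
  - stage: Deploy
    dependsOn: Build
    jobs:
      - deployment: deploy        # deployment job: targets an Environment
        environment: dev          # deployment history is recorded against "dev"
        strategy:
          runOnce:                # simplest strategy; rolling and canary also exist
            deploy:               # lifecycle hook that holds the deployment steps
              steps:
                - script: echo "deploy"
```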
The key mental shift from classic releases: approvals and gates are not pipeline YAML. They are checks attached to a protected resource — almost always an Environment, sometimes a service connection or variable group. The pipeline declares what it wants to deploy and where; the Environment’s checks decide whether it is allowed to proceed. This separation is deliberate: a developer editing the pipeline file cannot remove a production approval, because that approval lives on the Environment, governed by a different permission.
Checks run before the deployment job’s agent is acquired. A blocked approval costs you nothing in agent minutes.
Step 1 — Structure stages and the dependsOn graph
Stages run sequentially by default, each depending on the previous one. Make dependencies explicit so the graph is obvious and so you can fan out later. A deployment job names its target Environment under environment:.
trigger:
  branches:
    include: [ main ]
pool:
  vmImage: ubuntu-latest
stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          - task: DotNetCoreCLI@2
            inputs:
              command: publish
              publishWebProjects: true
              arguments: '--configuration Release --output $(Build.ArtifactStagingDirectory)'
          - publish: $(Build.ArtifactStagingDirectory)
            artifact: app
  - stage: DeployDev
    dependsOn: Build
    jobs:
      - deployment: deploy
        environment: dev
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: echo "Deploying to dev"
  - stage: DeployTest
    dependsOn: DeployDev
    jobs:
      - deployment: deploy
        environment: test
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: echo "Deploying to test"
  - stage: DeployProd
    dependsOn: DeployTest
    jobs:
      - deployment: deploy
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: echo "Deploying to prod"
Two details worth internalizing:
- Inside a deployment job you do not add an explicit checkout for the source. The runOnce.deploy lifecycle skips source checkout by default; you download build artifacts instead. If you genuinely need the repo in a deployment job, add - checkout: self.
- dependsOn accepts a list. dependsOn: [DeployDev, IntegrationTests] lets a stage wait on multiple predecessors, which is how you model parallel test stages converging before prod.
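The fan-in case is easier to see in a stage graph. The sketch below assumes a hypothetical IntegrationTests stage that runs in parallel with DeployDev; the echo steps stand in for real work.

```yaml
# Sketch of fan-out / fan-in with dependsOn.
# IntegrationTests and the echo steps are placeholders.
stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          - script: echo "build"
  - stage: DeployDev
    dependsOn: Build              # fan-out: starts as soon as Build succeeds
    jobs:
      - deployment: deploy
        environment: dev
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to dev"
  - stage: IntegrationTests
    dependsOn: Build              # runs in parallel with DeployDev
    jobs:
      - job: test
        steps:
          - script: echo "integration tests"
  - stage: DeployProd
    dependsOn: [DeployDev, IntegrationTests]   # fan-in: waits for both
    jobs:
      - deployment: deploy
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to prod"
```

Because both middle stages name Build rather than each other, they no longer run in file order; DeployProd is the only point where the graph converges.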
To restrict a stage to a particular branch or any other condition, combine dependsOn with condition. Here prod runs only when the previous stage succeeded and the run was built from main:
- stage: DeployProd
  dependsOn: DeployTest
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
Step 2 — Model dev/test/prod as Environments
Create one Environment per deployment target under Pipelines -> Environments. Environments give you three things classic releases bolted on awkwardly: deployment history per target, resource scoping (you can add specific Kubernetes namespaces or VMs as resources), and a place to hang checks.
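You can create them in the portal or script the creation. The azure-devops CLI extension does not appear to ship a dedicated environment subcommand, so this sketch calls the Environments REST API through az devops invoke. Treat the api-version as an assumption to check against your organization; environment.json is a hypothetical file containing {"name": "prod"}:
az devops invoke \
  --area distributedtask \
  --resource environments \
  --route-parameters project=Payments \
  --http-method POST \
  --in-file environment.json \
  --api-version 7.1-preview \
  --organization https://dev.azure.com/contoso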
Environment names referenced in YAML are created on first run if they do not exist — but an auto-created Environment has no checks. Always pre-create production Environments and attach checks before the pipeline can target them, or your first prod run sails straight through.
For container or VM workloads, target a resource within the Environment using the environment.resourceName form:
- deployment: deploy
  environment: prod.payments-ns   # Kubernetes resource "payments-ns" in env "prod"
  strategy:
    runOnce:
      deploy:
        steps:
          - task: KubernetesManifest@1
            inputs:
              action: deploy
              manifests: manifests/deployment.yaml
Step 3 — Manual approvals, business-hours, and exclusive locks
Checks are configured on the Environment (the … menu -> Approvals and checks). The three you will reach for most:
Approvals. Designate approver users or groups. Use a group, not individuals, so on-call rotation does not break promotions. Set a timeout (the run stays pending until an approver responds or the timeout elapses) and decide whether someone who requested the run may also approve it. For prod, turn that off to enforce four-eyes.
Business hours. A check that only passes during a defined window and time zone. Attach it to prod so a Friday-evening merge queues until Monday morning rather than deploying into the weekend.
Exclusive lock. Guarantees only one run at a time can pass through the Environment. Without it, two merges in quick succession can both enter the prod stage and race. With it, runs serialize; the newer run waits for the lock. Pair this with the pipeline-level setting to cancel superseded runs if you only ever care about the latest commit reaching prod.
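The setting that controls how runs queue behind an exclusive lock is the lockBehavior property. The sketch below shows it at pipeline level. It only has an effect when the targeted Environment carries an Exclusive lock check, and as far as the documented behavior goes, runLatest is what you get when the property is omitted.

```yaml
# Sketch: how runs queue behind an Exclusive lock check on the prod Environment.
# runLatest:  only the most recent waiting run takes the lock; older waiting runs are canceled.
# sequential: every waiting run takes the lock in turn.
lockBehavior: runLatest

stages:
  - stage: DeployProd
    # lockBehavior can also be set on an individual stage instead of the pipeline.
    jobs:
      - deployment: deploy
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to prod"
```

Choose sequential when every commit must reach prod in order (for example, ordered database migrations), and runLatest when only the newest commit matters.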
Checks have a timeout and a separate evaluation retry cadence. An approval that no one actions within its timeout fails the stage; it does not silently pass. Set the timeout to match your real escalation SLA.
You cannot define these checks in pipeline YAML; that is the point of putting them on the resource. What you can version-control is the Environment-and-checks configuration itself, by managing Azure DevOps with Terraform (the azuredevops provider exposes azuredevops_environment and check resources), so even your approval policy is reviewable.
Step 4 — Automated deployment gates
A gate is an automated check that polls an external system and only passes when a condition holds. This is how you stop a promotion when production is already unhealthy. Two built-in checks cover most needs:
Query Azure Monitor alerts. Configure the check with a subscription, resource group, and the alert rules to evaluate. The check passes only when no configured alert is firing. Attach it to prod so an active “5xx spike” or “p99 latency” alert blocks the next deployment automatically.
Invoke REST API. Call any HTTPS endpoint and pass/fail based on the response. Use it to query a change-freeze calendar, a feature-flag service, or your own health endpoint. The check succeeds when the response matches your success criteria; configure it to retry on a cadence so a transient failure does not immediately fail the stage.
A robust gate pattern: give the check a window and an interval (for example, evaluate every 5 minutes for up to 30 minutes). The check must report healthy on each sample before the stage proceeds — a single green blip will not let a flapping service through.
You can also write your own gate purely in YAML by making an early job fail fast on a probe, but prefer the resource-level checks: they run before agent acquisition and apply no matter which pipeline targets the Environment.
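- deployment: deploy
  environment: prod
  strategy:
    runOnce:
      preDeploy:
        steps:
          # Deployment jobs skip source checkout; this assumes scripts/ is on the
          # agent (add - checkout: self if the scripts live in the repo).
          - script: ./scripts/smoke-check.sh   # in-pipeline guard, complements gates
      deploy:
        steps:
          - download: current
            artifact: app
          - script: ./scripts/deploy.sh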
Step 5 — Template libraries and required-template enforcement
Copy-pasting stages across repos is how pipelines rot. Azure DevOps has two template mechanisms:
Includes templates inject reusable YAML (steps, jobs, or stages) into a pipeline the author controls. Good for sharing a build sequence.
Extends templates invert control: the pipeline extends a template that owns the overall shape, and the template decides which parameterized hooks the consumer may fill. This is the security-relevant one. Combined with a required template check on a protected resource, you can mandate that any pipeline touching prod must extend an approved governance template — there is no way to deploy to that Environment otherwise.
A consumer pipeline:
# azure-pipelines.yml in an app repo
resources:
  repositories:
    - repository: templates
      type: git
      name: Platform/pipeline-templates
      ref: refs/tags/v3
extends:
  template: stages/deploy.yml@templates
  parameters:
    serviceName: payments-api
    environments: [ dev, test, prod ]
The governing template:
# stages/deploy.yml in Platform/pipeline-templates
parameters:
  - name: serviceName
    type: string
  - name: environments
    type: object
    default: [ dev ]
stages:
  - ${{ each env in parameters.environments }}:
    - stage: Deploy_${{ env }}
      jobs:
        - deployment: deploy
          environment: ${{ env }}
          strategy:
            runOnce:
              deploy:
                steps:
                  - download: current
                    artifact: app
                  - script: ./deploy.sh ${{ parameters.serviceName }} ${{ env }}
The ${{ each ... }} is a compile-time expansion. Template expressions resolve when the YAML is parsed, before runtime — so the loop literally generates one stage per environment in the expanded pipeline. Pin the template repository to a tag (ref: refs/tags/v3), not a branch, so a platform change cannot silently alter every consumer’s prod path.
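To make the compile-time expansion concrete, this is roughly what the governing template produces for the consumer's parameters (serviceName: payments-api, environments: [ dev, test, prod ]). It is a hand-written illustration of the expansion, not captured tool output.

```yaml
# Hand-expanded result of the each loop for environments [ dev, test, prod ].
stages:
  - stage: Deploy_dev
    jobs:
      - deployment: deploy
        environment: dev
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: ./deploy.sh payments-api dev
  - stage: Deploy_test
    jobs:
      - deployment: deploy
        environment: test
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: ./deploy.sh payments-api test
  - stage: Deploy_prod
    jobs:
      - deployment: deploy
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - download: current
                  artifact: app
                - script: ./deploy.sh payments-api prod
```

The generated stages carry no dependsOn, so they fall back to the default of running in listed order: dev, then test, then prod.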
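To enforce it: on the prod Environment, add a Required template check that identifies the template by repository, ref, and path (here, stages/deploy.yml in Platform/pipeline-templates at refs/tags/v3). A pipeline that does not extend it fails the check when its stage requests the Environment, so the deployment job never starts.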
Step 6 — Variables, parameters, and keyless deploys
Variable groups hold shared, often-secret values (linked to Key Vault for secrets). Reference them at the stage or job scope so dev values never leak into prod:
- stage: DeployProd
  variables:
    - group: payments-prod   # variable group, may be Key Vault-backed
  jobs:
    - deployment: deploy
      environment: prod
      # ...
A variable group can itself be a protected resource with its own approval check, so granting a pipeline access to prod secrets requires a sign-off independent of the Environment.
Runtime parameters (parameters: at the top of the pipeline) are typed and shown in the Run pipeline dialog. Unlike variables, they expand at compile time, so they can drive ${{ if }} / ${{ each }} logic — ideal for an optional “deploy hotfix to prod only” toggle.
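A prod-only hotfix toggle could look like the sketch below. The parameter name hotfixOnly, the stage layout, and the echo steps are hypothetical; the point is that the ${{ if }} blocks are resolved before the run starts, so skipped stages never appear in the run at all.

```yaml
# Sketch: a boolean runtime parameter driving compile-time stage selection.
# hotfixOnly and the echo steps are illustrative placeholders.
parameters:
  - name: hotfixOnly
    displayName: Deploy hotfix to prod only
    type: boolean
    default: false

stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          - script: echo "build"

  # Emitted only when hotfixOnly is false.
  - ${{ if eq(parameters.hotfixOnly, false) }}:
    - stage: DeployTest
      dependsOn: Build
      jobs:
        - deployment: deploy
          environment: test
          strategy:
            runOnce:
              deploy:
                steps:
                  - script: echo "Deploying to test"

  - stage: DeployProd
    # dependsOn is picked at compile time so it only names stages that exist.
    ${{ if eq(parameters.hotfixOnly, true) }}:
      dependsOn: Build
    ${{ else }}:
      dependsOn: DeployTest
    jobs:
      - deployment: deploy
        environment: prod
        strategy:
          runOnce:
            deploy:
              steps:
                - script: echo "Deploying to prod"
```

Skipping test this way does not bypass anything on prod: the Environment's approvals and gates still apply to the DeployProd stage.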
Keyless deploys with workload identity federation. Do not store cloud secrets in service connections. Create the Azure Resource Manager service connection using Workload Identity Federation: Azure DevOps presents a short-lived OIDC token to Microsoft Entra ID, which exchanges it for an access token via a federated credential on an app registration or managed identity. No client secret is stored, nothing expires under you, and tasks like AzureCLI@2 authenticate transparently:
deploy:
  steps:
    - task: AzureCLI@2
      inputs:
        azureSubscription: payments-prod-wif   # WIF service connection
        scriptType: bash
        scriptLocation: inlineScript
        inlineScript: |
          az group list --output table
Azure DevOps can convert existing secret-based ARM connections to WIF in place, and it surfaces a warning when a connection still uses an expiring secret. Treat that warning as a backlog item.
Enterprise scenario
A payments platform team ran ~40 microservices, each extending a shared stages/deploy.yml@templates pinned to refs/tags/v3. Prod was protected by a four-eyes approval and an exclusive lock. The incident: a Sev-2 outage needed a hotfix shipped in minutes, but the exclusive lock was held by a long-running canary from an unrelated service that was parked in postRouteTraffic waiting on its Azure Monitor gate. The hotfix run sat queued behind it. Worse, the parked run could not be approved-through because the on-call engineer wasn’t in the approver group — that was the four-eyes design working exactly as intended, against us.
The root cause was scoping the exclusive lock at the whole prod Environment instead of per service. We re-modeled each service as a resource inside the Environment (prod.payments-api, prod.ledger) and moved the lock to the resource via the runOnce deployment targeting that resource, so locks no longer cross service boundaries:
- deployment: deploy
  environment: prod.payments-api   # per-service resource, lock is scoped here
  strategy:
    runOnce:
      deploy:
        steps:
          - download: current
            artifact: app
          - script: ./deploy.sh payments-api
We also added a dedicated break-glass pipeline that extends the same governance template but targets a separate prod-hotfix Environment whose approval group includes on-call, with the Azure Monitor gate kept but business-hours dropped. Lesson: an exclusive lock and a global four-eyes approval are correct defaults, but if they share a single blast radius they will eventually serialize an emergency behind a routine deploy. Scope locks to the unit you actually deploy, and pre-build the break-glass path before you need it.
Verify
Run the pipeline from main and confirm each guardrail actually engages:
# Trigger a run and capture its id
az pipelines run \
  --name payments-cd \
  --branch main \
  --organization https://dev.azure.com/contoso \
  --project Payments
# Check the run's overall status and result. Per-stage state (prod waiting on
# its checks) is shown on the run's page in the portal.
az pipelines runs show \
  --id <runId> \
  --organization https://dev.azure.com/contoso \
  --project Payments \
  --query "{status:status, result:result}"
What “correct” looks like:
- Dev deploys automatically; test waits on its approval; prod waits on approval plus any business-hours and Azure Monitor gates.
- The prod stage shows no agent consumed while a check is pending — checks gate before agent acquisition.
- Triggering a second run while one holds the exclusive lock leaves the second queued, not racing.
- A run from a non-main branch never reaches the prod stage (the condition filters it out).
- Forcing an alert into a firing state causes the Azure Monitor check to hold the prod stage until the alert clears.
Production checklist
Pitfalls
- Auto-created Environments have no checks. If prod is created implicitly on first run, your first deploy is ungated. Pre-create and protect it.
- Approvals are not in YAML. Reviewers who only read the pipeline file cannot see your guardrails — point them at the Environment configuration, or codify it with Terraform so it is reviewable.
- Branch-pinning templates. A template referenced by branch means a platform commit instantly changes every consumer’s prod behavior. Pin to immutable tags and roll consumers forward deliberately.
- Gates that pass on one green sample. Configure a window and interval so a flapping dependency cannot slip a bad deploy through between failures.
- Forgetting dependsOn on fan-in. A prod stage that lists only one predecessor will start before parallel integration stages finish. List every dependency explicitly.
Next, layer in a canary or rolling strategy on the prod deployment job and wire its postRouteTraffic hook to the same Azure Monitor gate — so a bad canary rolls itself back before it ever reaches the full fleet.
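As a starting point, a canary deployment job might be shaped like the sketch below. It assumes the Kubernetes resource prod.payments-ns from Step 2. The increments and the echo steps are placeholders, and how the health step consults Azure Monitor is deliberately left open.

```yaml
# Sketch only: canary strategy against a Kubernetes resource in the prod Environment.
# Replace the echo placeholders with real deploy, health-check, and rollback steps.
- deployment: deploy
  environment: prod.payments-ns
  strategy:
    canary:
      increments: [10, 50]          # canary steps, exposed to hooks as $(strategy.increment)
      deploy:
        steps:
          - script: echo "Deploy canary at $(strategy.increment)%"
      postRouteTraffic:
        steps:
          - script: echo "Evaluate health here; a failing step fails this hook"
      on:
        failure:
          steps:
            - script: echo "Roll back the canary"
        success:
          steps:
            - script: echo "Promote to the full fleet"
```

The on.failure hook is where the rollback belongs: it runs when an earlier hook fails, so a failed health evaluation in postRouteTraffic leads straight to it.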