Instrumenting DORA Metrics: Building a Deployment Frequency and Lead-Time Pipeline

Most DORA dashboards lie. They count GitHub merges as deploys, treat every closed Jira bug as a failure, and aggregate forty services into one meaningless company-wide number. The result is a chart leadership stares at while nobody’s delivery actually improves. DORA metrics are only useful when each value maps to a real, timestamped event in a system you control, and when the aggregation matches how you actually ship.

This guide builds that pipeline. We collect deployment and change events from your VCS and CI/CD, derive lead time from first commit to production, compute change failure rate and MTTR from incident and rollback signals, store everything in a queryable table, and surface it in Grafana without creating metrics people game. The reference stack is GitHub plus a generic CI runner plus PostgreSQL plus Grafana, but every event boundary maps cleanly to GitLab, Azure DevOps, Jenkins, PagerDuty, or Opsgenie.

1. Define the four keys precisely (and the pitfalls)

The four DORA metrics, per the Accelerate research and the annual DORA report, split into throughput and stability:

Metric	Definition	Event boundary
Deployment frequency	How often you deploy to production	One row per successful prod deploy
Lead time for changes	Time from code committed to code running in prod	`first_commit_ts` to `deployed_ts`
Change failure rate	Percentage of deployments causing a failure in prod	failed deploys / total deploys
Mean time to restore (MTTR)*	Time to recover from a prod failure	`incident_start_ts` to `incident_resolved_ts`

*The 2024 DORA report renamed MTTR to “failed deployment recovery time” to clarify it measures recovery from a deployment-induced failure, not from any incident. The boundary matters: a failed disk in a datacenter is reliability, not delivery. Scope MTTR to incidents linked to a deployment.

The pitfalls that produce fiction:

Merge != deploy. A merge to main is a change event, not a deployment. Counting merges inflates frequency for teams that batch releases.
Lead time must start at first commit, not at PR open or merge. Code sitting in a long-lived branch is exactly the waste lead time is meant to expose. Use the earliest commit on the branch, not the merge commit timestamp.
Change failure rate needs a denominator of deployments, not of incidents. CFR is failed_deploys / total_deploys, so you must record successful deploys too.
Averages hide everything. DORA bands are about distributions. Report the median (p50) for lead time and MTTR; a single outlier rollback at 3am wrecks the mean.

2. Identify the source events

Map each metric to a concrete, timestamped signal before writing any code. For the reference stack:

Event	Source	Signal
Change created	GitHub	`push` / merge to default branch; first commit SHA + author date
Build / pipeline run	CI	pipeline start and finish with commit SHA
Deployment	CI deploy job	explicit “deploy succeeded/failed to prod” emission
Incident opened	PagerDuty / Opsgenie	incident triggered with a `service` field
Incident resolved	PagerDuty / Opsgenie	incident resolved timestamp
Rollback	CI / GitOps	a deploy whose `is_rollback=true`

The single most important design decision: the deployment event is emitted by the deploy job itself, not inferred. Inference (e.g. “a merge to main means a deploy happened”) breaks the moment you batch, gate, or have a failed promotion. Make the pipeline say “I deployed service X, version Y, to prod, at time T, and it succeeded.”

3. Collect deployment and change events

Two ingestion paths feed one normalized table: webhooks for things that happen in SaaS (merges, incidents) and an explicit CI annotation for deploys.

First, the schema. One wide event table is enough to start; you derive metrics with SQL.

CREATE TABLE delivery_events (
  id            BIGSERIAL PRIMARY KEY,
  event_type    TEXT NOT NULL,          -- change | deploy | incident_open | incident_resolved
  service       TEXT NOT NULL,
  team          TEXT,
  environment   TEXT,                   -- prod | staging | ...
  commit_sha    TEXT,
  first_commit_ts TIMESTAMPTZ,          -- earliest commit in the change set
  occurred_at   TIMESTAMPTZ NOT NULL,   -- when the event itself happened
  success       BOOLEAN,                -- deploy outcome
  is_rollback   BOOLEAN DEFAULT FALSE,
  incident_id   TEXT,                   -- correlate open/resolved
  source        TEXT NOT NULL,          -- github | ci | pagerduty
  payload       JSONB,                  -- raw event for auditing
  ingested_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_events_service_time ON delivery_events (service, occurred_at);
CREATE INDEX idx_events_type ON delivery_events (event_type, environment);

A minimal receiver. Verify the webhook signature before trusting anything; GitHub signs with HMAC-SHA256 over the raw body using your webhook secret.

import hashlib, hmac, os, json
from datetime import datetime, timezone
from flask import Flask, request, abort
import psycopg

app = Flask(__name__)
SECRET = os.environ["GH_WEBHOOK_SECRET"].encode()

def verify(req) -> bool:
    sig = req.headers.get("X-Hub-Signature-256", "")
    mac = hmac.new(SECRET, req.get_data(), hashlib.sha256)
    expected = "sha256=" + mac.hexdigest()
    return hmac.compare_digest(expected, sig)

@app.post("/hooks/github")
def github():
    if not verify(request):
        abort(401)
    if request.headers.get("X-GitHub-Event") != "push":
        return "", 204
    e = request.json
    if e.get("ref") != f"refs/heads/{e['repository']['default_branch']}":
        return "", 204  # only default-branch changes count

    commits = e.get("commits", [])
    if not commits:
        return "", 204
    # earliest commit timestamp in the pushed set = lead-time start
    first_ts = min(c["timestamp"] for c in commits)

    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        conn.execute(
            """INSERT INTO delivery_events
               (event_type, service, commit_sha, first_commit_ts,
                occurred_at, source, payload)
               VALUES ('change', %s, %s, %s, %s, 'github', %s)""",
            (e["repository"]["name"], e["after"], first_ts,
             datetime.now(timezone.utc), json.dumps(e)),
        )
    return "", 202

Now the deploy event. Emit it from the pipeline at the end of the prod deploy step. This is a plain authenticated POST; the CI job already knows the version, environment, and outcome.

# Runs as the last step of the prod deploy job.
# DEPLOY_OK is "true"/"false" set by the preceding deploy step.
FIRST_COMMIT_TS=$(git log --reverse --format=%cI "origin/main..HEAD" | head -1)
[ -z "$FIRST_COMMIT_TS" ] && FIRST_COMMIT_TS=$(git show -s --format=%cI HEAD)

curl -fsS -X POST "$METRICS_INGEST_URL/events/deploy" \
  -H "Authorization: Bearer $METRICS_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<JSON
{
  "service":        "$SERVICE_NAME",
  "team":           "$TEAM_NAME",
  "environment":    "prod",
  "commit_sha":     "$(git rev-parse HEAD)",
  "first_commit_ts":"$FIRST_COMMIT_TS",
  "occurred_at":    "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "success":        $DEPLOY_OK,
  "is_rollback":    ${IS_ROLLBACK:-false}
}
JSON

The git log origin/main..HEAD trick gives you the first commit unique to this change set, which is the correct lead-time start. For a merge-commit workflow, walk the merged commits instead via the GitHub compare API.

Make the metrics POST non-blocking for the deploy: wrap it so a metrics outage never fails a production release. Use curl -fsS ... || echo "metrics emit failed" and alert on the gap separately. Delivery instrumentation must never be in the critical path of delivery.

4. Compute lead time for changes

Lead time is the gap between first_commit_ts and the prod deploy that carried that commit. Because the deploy event records its own first_commit_ts (from the CI snippet above), the computation is local to the deploy row, no join required:

SELECT
  service,
  date_trunc('week', occurred_at)                       AS week,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                             AS lead_time_p50_hours,
  percentile_cont(0.9) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                             AS lead_time_p90_hours,
  count(*)                                               AS deploys
FROM delivery_events
WHERE event_type = 'deploy'
  AND environment = 'prod'
  AND success = TRUE
  AND first_commit_ts IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;

Two correctness notes. Report p50 and p90, not the mean, because lead-time distributions are heavily right-skewed. And guard against clock skew producing negative intervals: if a committer’s machine has a future clock, occurred_at - first_commit_ts goes negative. Filter WHERE occurred_at >= first_commit_ts or clamp at zero, and log the rejects so you can chase the bad runner.

Deployment frequency falls out of the same table trivially:

SELECT service,
       date_trunc('day', occurred_at) AS day,
       count(*) AS prod_deploys
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success = TRUE
GROUP BY service, day
ORDER BY day DESC;

5. Derive change failure rate and MTTR

Change failure rate is failed prod deploys over all prod deploys. Two failure signals count: a deploy that reports success=false, and a subsequent is_rollback=true deploy (a rollback is evidence the prior release failed).

WITH prod AS (
  SELECT * FROM delivery_events
  WHERE event_type = 'deploy' AND environment = 'prod'
)
SELECT
  service,
  date_trunc('week', occurred_at) AS week,
  count(*) FILTER (WHERE success = FALSE OR is_rollback)::numeric
    / NULLIF(count(*), 0) AS change_failure_rate,
  count(*) AS total_deploys
FROM prod
GROUP BY service, week
ORDER BY week DESC;

MTTR comes from correlating incident_open and incident_resolved events on incident_id. Scope to incidents tied to a service that deploys, so you measure deployment recovery rather than general reliability:

WITH incidents AS (
  SELECT
    service,
    incident_id,
    min(occurred_at) FILTER (WHERE event_type = 'incident_open')     AS opened,
    min(occurred_at) FILTER (WHERE event_type = 'incident_resolved') AS resolved
  FROM delivery_events
  WHERE event_type IN ('incident_open', 'incident_resolved')
  GROUP BY service, incident_id
)
SELECT
  service,
  date_trunc('week', opened) AS week,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved - opened))
  ) / 60.0 AS mttr_p50_minutes
FROM incidents
WHERE resolved IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;

For the incident source, normalize the PagerDuty webhook into the same table. PagerDuty v3 webhooks deliver an envelope with event.event_type values like incident.triggered and incident.resolved:

@app.post("/hooks/pagerduty")
def pagerduty():
    e = request.json["event"]
    etype = {"incident.triggered": "incident_open",
             "incident.resolved":  "incident_resolved"}.get(e["event_type"])
    if not etype:
        return "", 204
    data = e["data"]
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        conn.execute(
            """INSERT INTO delivery_events
               (event_type, service, environment, occurred_at,
                incident_id, source, payload)
               VALUES (%s, %s, 'prod', %s, %s, 'pagerduty', %s)""",
            (etype, data["service"]["summary"], e["occurred_at"],
             data["id"], json.dumps(e)),
        )
    return "", 202

6. Model per-service vs. per-team aggregation and bands

DORA bands are defined per delivery unit, and the right unit depends on your org. A team owning one service: roll up by team. A platform team owning twenty services: per-service is the unit that drives action, with team as a secondary slice. Keep both service and team on every row (as the schema does) so you can aggregate either way without re-ingesting.

The published DORA performance bands (2024 report) are a useful, if coarse, yardstick:

Metric	Elite	High	Medium	Low
Deployment frequency	On-demand (multiple/day)	Weekly to monthly	Monthly to every 6 months	Fewer than every 6 months
Lead time for changes	Less than one day	One day to one week	One week to one month	One to six months
Change failure rate	5 percent	10 percent	15 percent	64 percent (low cluster)
Failed deployment recovery	Less than one hour	Less than one day	One day to one week	More than six months

Encode the bands once so dashboards and reports agree:

CREATE OR REPLACE FUNCTION lead_time_band(hours numeric)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
  SELECT CASE
    WHEN hours < 24            THEN 'Elite'
    WHEN hours < 24 * 7        THEN 'High'
    WHEN hours < 24 * 30       THEN 'Medium'
    ELSE 'Low'
  END;
$$;

Resist averaging metrics across services into one org number. A platform team’s true picture is “16 of 20 services are Elite on lead time, 4 are Medium and here’s why,” not “the org is High.” The per-service breakdown is what you act on.

7. Build dashboards in Grafana (and avoid gaming)

Point Grafana at the PostgreSQL data source and back each panel with the SQL above. Use Grafana’s macros so panels respect the dashboard time range and the $service template variable:

-- Grafana panel: weekly lead time p50 per selected service
SELECT
  $__timeGroup(occurred_at, '1w')                       AS time,
  percentile_cont(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
  ) / 3600.0                                            AS "lead_time_p50_hours"
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success
  AND service IN ($service)
  AND $__timeFilter(occurred_at)
GROUP BY 1
ORDER BY 1;

Define the $service variable as a query: SELECT DISTINCT service FROM delivery_events ORDER BY 1. The $__timeFilter and $__timeGroup macros bind to the dashboard controls automatically.

On gaming: every delivery metric has a perverse optimum. The defenses are structural, not motivational:

Never set per-team targets on a single metric. “Deploy more” rewards splitting one deploy into ten. Always pair throughput (frequency, lead time) with stability (CFR, MTTR), so a team can’t win one by wrecking the other.
Don’t tie metrics to performance reviews. The instant DORA is a personal KPI, the numbers become advocacy, not signal.
Show trend, not a single number. A team going from monthly to weekly deploys is the win, regardless of which band it lands in.
Make the pipeline the source of truth, so no one can hand-edit a value. If a number looks wrong, fix the event emission, not the dashboard.

Verify

Confirm the pipeline produces correct, trustworthy numbers before anyone reads a dashboard.

# 1. Webhook signature path works (expect 202 for valid, 401 for tampered).
BODY='{"ref":"refs/heads/main","after":"abc123",
       "repository":{"name":"checkout","default_branch":"main"},
       "commits":[{"id":"abc123","timestamp":"2026-06-08T09:00:00Z"}]}'
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$GH_WEBHOOK_SECRET" | awk '{print $2}')"
curl -i -X POST localhost:8080/hooks/github \
  -H "X-GitHub-Event: push" -H "X-Hub-Signature-256: $SIG" \
  -H "Content-Type: application/json" -d "$BODY"

-- 2. No negative lead times leaked in (clock-skew guard working).
SELECT count(*) AS bad_rows
FROM delivery_events
WHERE event_type = 'deploy' AND occurred_at < first_commit_ts;
-- expect 0

-- 3. CFR denominator includes successes (not just failures).
SELECT count(*) FILTER (WHERE success)      AS ok,
       count(*) FILTER (WHERE NOT success)  AS failed,
       count(*) FILTER (WHERE is_rollback)  AS rollbacks
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod';

-- 4. Every resolved incident has a matching open (correlation intact).
SELECT incident_id FROM delivery_events
WHERE event_type = 'incident_resolved'
  AND incident_id NOT IN (
    SELECT incident_id FROM delivery_events WHERE event_type = 'incident_open');
-- expect 0 rows

Then sanity-check one known release by hand: pick a recent prod deploy, find its first commit in git log, and confirm the dashboard’s lead time for that week matches the wall-clock gap. If it doesn’t, the bug is in the event, not the query.

Enterprise scenario

A platform team at a payments company ran twenty-six microservices behind Argo CD. Their first DORA dashboard counted GitHub merges as deployments, and it reported the whole org as “Elite” with sub-hour lead times. Leadership loved it; the on-call engineers knew it was nonsense. The gap: a merge to main only updated a Helm values file in the GitOps repo. The actual rollout happened minutes-to-hours later when Argo CD synced and the canary analysis passed, and a non-trivial fraction of syncs were rolled back by the analysis gate. Merges massively under-counted lead time and completely missed failures.

The constraint was that the deploy outcome lived in Argo CD, not in CI, and the team couldn’t bolt a curl into a sync the way they could into a pipeline job. They solved it by treating Argo CD as the deployment event source: a Kubernetes controller watched Rollout and Application resources and emitted a deploy event only when a rollout reached Healthy in prod, with success=false when it reached Degraded and is_rollback=true when Argo CD aborted to the previous revision. The first_commit_ts came from the image tag’s commit SHA, resolved against the app repo.

# Argo CD notifications trigger -> POST a real deploy event on rollout health.
template.deploy-event: |
  webhook.metrics:
    method: POST
    path: /events/deploy
    body: |
      {
        "service": "{{.app.metadata.labels.service}}",
        "team": "{{.app.metadata.labels.team}}",
        "environment": "prod",
        "commit_sha": "{{.app.status.sync.revision}}",
        "occurred_at": "{{.app.status.operationState.finishedAt}}",
        "success": {{eq .app.status.health.status "Healthy"}}
      }
trigger.on-deployed: |
  - when: app.status.health.status in ['Healthy', 'Degraded']
    send: [deploy-event]

Once the event came from the system that actually performs the deploy, the dashboard told a true and far less flattering story: median lead time was nine hours, change failure rate was eleven percent, and two services dragged the rest. Those two became the next quarter’s focus, and lead time on them halved within two months. Same data warehouse, same Grafana; the fix was moving the event boundary to the thing that ships.

Instrumenting DORA Metrics: Building a Deployment Frequency and Lead-Time Pipeline

1. Define the four keys precisely (and the pitfalls)

2. Identify the source events

3. Collect deployment and change events

4. Compute lead time for changes

5. Derive change failure rate and MTTR

6. Model per-service vs. per-team aggregation and bands

7. Build dashboards in Grafana (and avoid gaming)

Verify

Enterprise scenario

Checklist

Written by Vinod

Comments

Keep Reading

Blue-Green on Kubernetes with Argo Rollouts: Preview Services, Analysis Gates, and Automated Promotion

Standing Up Backstage as an Internal Developer Portal: Catalog, Software Templates, and TechDocs

Fast, Reproducible, Multi-Arch Builds with BuildKit Remote Cache and SBOM Attestations