Most DORA dashboards lie. They count GitHub merges as deploys, treat every closed Jira bug as a failure, and aggregate forty services into one meaningless company-wide number. The result is a chart leadership stares at while nobody’s delivery actually improves. DORA metrics are only useful when each value maps to a real, timestamped event in a system you control, and when the aggregation matches how you actually ship.
This guide builds that pipeline. We collect deployment and change events from your VCS and CI/CD, derive lead time from first commit to production, compute change failure rate and MTTR from incident and rollback signals, store everything in a queryable table, and surface it in Grafana without creating metrics people game. The reference stack is GitHub plus a generic CI runner plus PostgreSQL plus Grafana, but every event boundary maps cleanly to GitLab, Azure DevOps, Jenkins, PagerDuty, or Opsgenie.
1. Define the four keys precisely (and the pitfalls)
The four DORA metrics, per the Accelerate research and the annual DORA report, split into throughput and stability:
| Metric | Definition | Event boundary |
|---|---|---|
| Deployment frequency | How often you deploy to production | One row per successful prod deploy |
| Lead time for changes | Time from code committed to code running in prod | first_commit_ts to deployed_ts |
| Change failure rate | Percentage of deployments causing a failure in prod | failed deploys / total deploys |
| Mean time to restore (MTTR)* | Time to recover from a prod failure | incident_start_ts to incident_resolved_ts |
*The 2024 DORA report renamed MTTR to “failed deployment recovery time” to clarify it measures recovery from a deployment-induced failure, not from any incident. The boundary matters: a failed disk in a datacenter is reliability, not delivery. Scope MTTR to incidents linked to a deployment.
The pitfalls that produce fiction:
- Merge != deploy. A merge to
mainis a change event, not a deployment. Counting merges inflates frequency for teams that batch releases. - Lead time must start at first commit, not at PR open or merge. Code sitting in a long-lived branch is exactly the waste lead time is meant to expose. Use the earliest commit on the branch, not the merge commit timestamp.
- Change failure rate needs a denominator of deployments, not of incidents. CFR is
failed_deploys / total_deploys, so you must record successful deploys too. - Averages hide everything. DORA bands are about distributions. Report the median (p50) for lead time and MTTR; a single outlier rollback at 3am wrecks the mean.
2. Identify the source events
Map each metric to a concrete, timestamped signal before writing any code. For the reference stack:
| Event | Source | Signal |
|---|---|---|
| Change created | GitHub | push / merge to default branch; first commit SHA + author date |
| Build / pipeline run | CI | pipeline start and finish with commit SHA |
| Deployment | CI deploy job | explicit “deploy succeeded/failed to prod” emission |
| Incident opened | PagerDuty / Opsgenie | incident triggered with a service field |
| Incident resolved | PagerDuty / Opsgenie | incident resolved timestamp |
| Rollback | CI / GitOps | a deploy whose is_rollback=true |
The single most important design decision: the deployment event is emitted by the deploy job itself, not inferred. Inference (e.g. “a merge to main means a deploy happened”) breaks the moment you batch, gate, or have a failed promotion. Make the pipeline say “I deployed service X, version Y, to prod, at time T, and it succeeded.”
3. Collect deployment and change events
Two ingestion paths feed one normalized table: webhooks for things that happen in SaaS (merges, incidents) and an explicit CI annotation for deploys.
First, the schema. One wide event table is enough to start; you derive metrics with SQL.
CREATE TABLE delivery_events (
id BIGSERIAL PRIMARY KEY,
event_type TEXT NOT NULL, -- change | deploy | incident_open | incident_resolved
service TEXT NOT NULL,
team TEXT,
environment TEXT, -- prod | staging | ...
commit_sha TEXT,
first_commit_ts TIMESTAMPTZ, -- earliest commit in the change set
occurred_at TIMESTAMPTZ NOT NULL, -- when the event itself happened
success BOOLEAN, -- deploy outcome
is_rollback BOOLEAN DEFAULT FALSE,
incident_id TEXT, -- correlate open/resolved
source TEXT NOT NULL, -- github | ci | pagerduty
payload JSONB, -- raw event for auditing
ingested_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_events_service_time ON delivery_events (service, occurred_at);
CREATE INDEX idx_events_type ON delivery_events (event_type, environment);
A minimal receiver. Verify the webhook signature before trusting anything; GitHub signs with HMAC-SHA256 over the raw body using your webhook secret.
import hashlib, hmac, os, json
from datetime import datetime, timezone
from flask import Flask, request, abort
import psycopg
app = Flask(__name__)
SECRET = os.environ["GH_WEBHOOK_SECRET"].encode()
def verify(req) -> bool:
sig = req.headers.get("X-Hub-Signature-256", "")
mac = hmac.new(SECRET, req.get_data(), hashlib.sha256)
expected = "sha256=" + mac.hexdigest()
return hmac.compare_digest(expected, sig)
@app.post("/hooks/github")
def github():
if not verify(request):
abort(401)
if request.headers.get("X-GitHub-Event") != "push":
return "", 204
e = request.json
if e.get("ref") != f"refs/heads/{e['repository']['default_branch']}":
return "", 204 # only default-branch changes count
commits = e.get("commits", [])
if not commits:
return "", 204
# earliest commit timestamp in the pushed set = lead-time start
first_ts = min(c["timestamp"] for c in commits)
with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
conn.execute(
"""INSERT INTO delivery_events
(event_type, service, commit_sha, first_commit_ts,
occurred_at, source, payload)
VALUES ('change', %s, %s, %s, %s, 'github', %s)""",
(e["repository"]["name"], e["after"], first_ts,
datetime.now(timezone.utc), json.dumps(e)),
)
return "", 202
Now the deploy event. Emit it from the pipeline at the end of the prod deploy step. This is a plain authenticated POST; the CI job already knows the version, environment, and outcome.
# Runs as the last step of the prod deploy job.
# DEPLOY_OK is "true"/"false" set by the preceding deploy step.
FIRST_COMMIT_TS=$(git log --reverse --format=%cI "origin/main..HEAD" | head -1)
[ -z "$FIRST_COMMIT_TS" ] && FIRST_COMMIT_TS=$(git show -s --format=%cI HEAD)
curl -fsS -X POST "$METRICS_INGEST_URL/events/deploy" \
-H "Authorization: Bearer $METRICS_TOKEN" \
-H "Content-Type: application/json" \
-d @- <<JSON
{
"service": "$SERVICE_NAME",
"team": "$TEAM_NAME",
"environment": "prod",
"commit_sha": "$(git rev-parse HEAD)",
"first_commit_ts":"$FIRST_COMMIT_TS",
"occurred_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"success": $DEPLOY_OK,
"is_rollback": ${IS_ROLLBACK:-false}
}
JSON
The git log origin/main..HEAD trick gives you the first commit unique to this change set, which is the correct lead-time start. For a merge-commit workflow, walk the merged commits instead via the GitHub compare API.
Make the metrics POST non-blocking for the deploy: wrap it so a metrics outage never fails a production release. Use
curl -fsS ... || echo "metrics emit failed"and alert on the gap separately. Delivery instrumentation must never be in the critical path of delivery.
4. Compute lead time for changes
Lead time is the gap between first_commit_ts and the prod deploy that carried that commit. Because the deploy event records its own first_commit_ts (from the CI snippet above), the computation is local to the deploy row, no join required:
SELECT
service,
date_trunc('week', occurred_at) AS week,
percentile_cont(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
) / 3600.0 AS lead_time_p50_hours,
percentile_cont(0.9) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
) / 3600.0 AS lead_time_p90_hours,
count(*) AS deploys
FROM delivery_events
WHERE event_type = 'deploy'
AND environment = 'prod'
AND success = TRUE
AND first_commit_ts IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;
Two correctness notes. Report p50 and p90, not the mean, because lead-time distributions are heavily right-skewed. And guard against clock skew producing negative intervals: if a committer’s machine has a future clock, occurred_at - first_commit_ts goes negative. Filter WHERE occurred_at >= first_commit_ts or clamp at zero, and log the rejects so you can chase the bad runner.
Deployment frequency falls out of the same table trivially:
SELECT service,
date_trunc('day', occurred_at) AS day,
count(*) AS prod_deploys
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success = TRUE
GROUP BY service, day
ORDER BY day DESC;
5. Derive change failure rate and MTTR
Change failure rate is failed prod deploys over all prod deploys. Two failure signals count: a deploy that reports success=false, and a subsequent is_rollback=true deploy (a rollback is evidence the prior release failed).
WITH prod AS (
SELECT * FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod'
)
SELECT
service,
date_trunc('week', occurred_at) AS week,
count(*) FILTER (WHERE success = FALSE OR is_rollback)::numeric
/ NULLIF(count(*), 0) AS change_failure_rate,
count(*) AS total_deploys
FROM prod
GROUP BY service, week
ORDER BY week DESC;
MTTR comes from correlating incident_open and incident_resolved events on incident_id. Scope to incidents tied to a service that deploys, so you measure deployment recovery rather than general reliability:
WITH incidents AS (
SELECT
service,
incident_id,
min(occurred_at) FILTER (WHERE event_type = 'incident_open') AS opened,
min(occurred_at) FILTER (WHERE event_type = 'incident_resolved') AS resolved
FROM delivery_events
WHERE event_type IN ('incident_open', 'incident_resolved')
GROUP BY service, incident_id
)
SELECT
service,
date_trunc('week', opened) AS week,
percentile_cont(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (resolved - opened))
) / 60.0 AS mttr_p50_minutes
FROM incidents
WHERE resolved IS NOT NULL
GROUP BY service, week
ORDER BY week DESC;
For the incident source, normalize the PagerDuty webhook into the same table. PagerDuty v3 webhooks deliver an envelope with event.event_type values like incident.triggered and incident.resolved:
@app.post("/hooks/pagerduty")
def pagerduty():
e = request.json["event"]
etype = {"incident.triggered": "incident_open",
"incident.resolved": "incident_resolved"}.get(e["event_type"])
if not etype:
return "", 204
data = e["data"]
with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
conn.execute(
"""INSERT INTO delivery_events
(event_type, service, environment, occurred_at,
incident_id, source, payload)
VALUES (%s, %s, 'prod', %s, %s, 'pagerduty', %s)""",
(etype, data["service"]["summary"], e["occurred_at"],
data["id"], json.dumps(e)),
)
return "", 202
6. Model per-service vs. per-team aggregation and bands
DORA bands are defined per delivery unit, and the right unit depends on your org. A team owning one service: roll up by team. A platform team owning twenty services: per-service is the unit that drives action, with team as a secondary slice. Keep both service and team on every row (as the schema does) so you can aggregate either way without re-ingesting.
The published DORA performance bands (2024 report) are a useful, if coarse, yardstick:
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment frequency | On-demand (multiple/day) | Weekly to monthly | Monthly to every 6 months | Fewer than every 6 months |
| Lead time for changes | Less than one day | One day to one week | One week to one month | One to six months |
| Change failure rate | 5 percent | 10 percent | 15 percent | 64 percent (low cluster) |
| Failed deployment recovery | Less than one hour | Less than one day | One day to one week | More than six months |
Encode the bands once so dashboards and reports agree:
CREATE OR REPLACE FUNCTION lead_time_band(hours numeric)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
SELECT CASE
WHEN hours < 24 THEN 'Elite'
WHEN hours < 24 * 7 THEN 'High'
WHEN hours < 24 * 30 THEN 'Medium'
ELSE 'Low'
END;
$$;
Resist averaging metrics across services into one org number. A platform team’s true picture is “16 of 20 services are Elite on lead time, 4 are Medium and here’s why,” not “the org is High.” The per-service breakdown is what you act on.
7. Build dashboards in Grafana (and avoid gaming)
Point Grafana at the PostgreSQL data source and back each panel with the SQL above. Use Grafana’s macros so panels respect the dashboard time range and the $service template variable:
-- Grafana panel: weekly lead time p50 per selected service
SELECT
$__timeGroup(occurred_at, '1w') AS time,
percentile_cont(0.5) WITHIN GROUP (
ORDER BY EXTRACT(EPOCH FROM (occurred_at - first_commit_ts))
) / 3600.0 AS "lead_time_p50_hours"
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod' AND success
AND service IN ($service)
AND $__timeFilter(occurred_at)
GROUP BY 1
ORDER BY 1;
Define the $service variable as a query: SELECT DISTINCT service FROM delivery_events ORDER BY 1. The $__timeFilter and $__timeGroup macros bind to the dashboard controls automatically.
On gaming: every delivery metric has a perverse optimum. The defenses are structural, not motivational:
- Never set per-team targets on a single metric. “Deploy more” rewards splitting one deploy into ten. Always pair throughput (frequency, lead time) with stability (CFR, MTTR), so a team can’t win one by wrecking the other.
- Don’t tie metrics to performance reviews. The instant DORA is a personal KPI, the numbers become advocacy, not signal.
- Show trend, not a single number. A team going from monthly to weekly deploys is the win, regardless of which band it lands in.
- Make the pipeline the source of truth, so no one can hand-edit a value. If a number looks wrong, fix the event emission, not the dashboard.
Verify
Confirm the pipeline produces correct, trustworthy numbers before anyone reads a dashboard.
# 1. Webhook signature path works (expect 202 for valid, 401 for tampered).
BODY='{"ref":"refs/heads/main","after":"abc123",
"repository":{"name":"checkout","default_branch":"main"},
"commits":[{"id":"abc123","timestamp":"2026-06-08T09:00:00Z"}]}'
SIG="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$GH_WEBHOOK_SECRET" | awk '{print $2}')"
curl -i -X POST localhost:8080/hooks/github \
-H "X-GitHub-Event: push" -H "X-Hub-Signature-256: $SIG" \
-H "Content-Type: application/json" -d "$BODY"
-- 2. No negative lead times leaked in (clock-skew guard working).
SELECT count(*) AS bad_rows
FROM delivery_events
WHERE event_type = 'deploy' AND occurred_at < first_commit_ts;
-- expect 0
-- 3. CFR denominator includes successes (not just failures).
SELECT count(*) FILTER (WHERE success) AS ok,
count(*) FILTER (WHERE NOT success) AS failed,
count(*) FILTER (WHERE is_rollback) AS rollbacks
FROM delivery_events
WHERE event_type = 'deploy' AND environment = 'prod';
-- 4. Every resolved incident has a matching open (correlation intact).
SELECT incident_id FROM delivery_events
WHERE event_type = 'incident_resolved'
AND incident_id NOT IN (
SELECT incident_id FROM delivery_events WHERE event_type = 'incident_open');
-- expect 0 rows
Then sanity-check one known release by hand: pick a recent prod deploy, find its first commit in git log, and confirm the dashboard’s lead time for that week matches the wall-clock gap. If it doesn’t, the bug is in the event, not the query.
Enterprise scenario
A platform team at a payments company ran twenty-six microservices behind Argo CD. Their first DORA dashboard counted GitHub merges as deployments, and it reported the whole org as “Elite” with sub-hour lead times. Leadership loved it; the on-call engineers knew it was nonsense. The gap: a merge to main only updated a Helm values file in the GitOps repo. The actual rollout happened minutes-to-hours later when Argo CD synced and the canary analysis passed, and a non-trivial fraction of syncs were rolled back by the analysis gate. Merges massively under-counted lead time and completely missed failures.
The constraint was that the deploy outcome lived in Argo CD, not in CI, and the team couldn’t bolt a curl into a sync the way they could into a pipeline job. They solved it by treating Argo CD as the deployment event source: a Kubernetes controller watched Rollout and Application resources and emitted a deploy event only when a rollout reached Healthy in prod, with success=false when it reached Degraded and is_rollback=true when Argo CD aborted to the previous revision. The first_commit_ts came from the image tag’s commit SHA, resolved against the app repo.
# Argo CD notifications trigger -> POST a real deploy event on rollout health.
template.deploy-event: |
webhook.metrics:
method: POST
path: /events/deploy
body: |
{
"service": "{{.app.metadata.labels.service}}",
"team": "{{.app.metadata.labels.team}}",
"environment": "prod",
"commit_sha": "{{.app.status.sync.revision}}",
"occurred_at": "{{.app.status.operationState.finishedAt}}",
"success": {{eq .app.status.health.status "Healthy"}}
}
trigger.on-deployed: |
- when: app.status.health.status in ['Healthy', 'Degraded']
send: [deploy-event]
Once the event came from the system that actually performs the deploy, the dashboard told a true and far less flattering story: median lead time was nine hours, change failure rate was eleven percent, and two services dragged the rest. Those two became the next quarter’s focus, and lead time on them halved within two months. Same data warehouse, same Grafana; the fix was moving the event boundary to the thing that ships.