DevOps Fundamentals: Culture, CI/CD, the DevOps Lifecycle & DORA Metrics

Ask ten engineers what DevOps means and you will get a job title, a Jenkins server, a Kubernetes cluster, and a YAML file. None of those is the answer. DevOps is a way of working that closes the gap between the people who build software (Dev) and the people who run it in production (Ops) — and the practices, automation and metrics that make that collaboration fast, safe and repeatable. The tools are downstream of the idea. You can buy every pipeline product on the market and still not be doing DevOps; conversely, a small team with a shell script and a culture of shared ownership often outperforms an enterprise drowning in tooling.

This lesson is the on-ramp for the whole DevOps track. By the end you will be able to explain what DevOps is and why the old wall between Dev and Ops was so expensive; you will know the CALMS model that frames a DevOps transformation; you will be able to walk the lifecycle infinity loop stage by stage; you will understand — precisely, because interviewers love this question — the difference between Continuous Integration, Continuous Delivery and Continuous Deployment; you will be able to run a value-stream mapping exercise to find where work actually gets stuck; and you will know the four DORA metrics and the Elite/High/Medium/Low tiers that turn “are we any good at this?” into a number you can track. Throughout, the three principles behind everything — flow, feedback and continual learning — keep recurring, because they are the spine of the whole discipline.

Learning objectives

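After working through this lesson you will be able to:

  1. Explain what DevOps is and why the old wall between Dev and Ops was so expensive.
  2. Describe the CALMS model that frames a DevOps transformation.
  3. Walk the lifecycle infinity loop stage by stage.
  4. Distinguish Continuous Integration, Continuous Delivery and Continuous Deployment.
  5. Run a value-stream mapping exercise to find where work actually gets stuck.
  6. Name the four DORA metrics and the Elite/High/Medium/Low tiers.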

Prerequisites

You need almost nothing to start. A general familiarity with how software gets written and shipped — that code lives in a repository, that it has to be built and tested, and that it eventually runs on a server somewhere — is plenty. No prior pipeline experience, no cloud background and no programming specialism are assumed; every term is defined as it appears. This is the first stop in the DevOps Zero-to-Hero ladder, and everything that follows — YAML for pipelines, CI/CD pipeline design, deployment strategies, GitOps, DevSecOps and internal developer platforms — builds directly on the vocabulary and mental models below. If you have read the course’s Terraform or Kubernetes tracks you will already recognise the value of automation and reproducibility; this lesson gives you the framework that ties all of it together.

What DevOps is — and the wall it tears down

For most of software history, two tribes owned the lifecycle. Developers were measured on change: ship features, close tickets, move fast. Operations were measured on stability: keep the lights on, no outages, no surprises. Those incentives are in direct opposition. Developers wanted to deploy on Friday afternoon; operations wanted a change freeze. The handoff between them — developers finishing code and “throwing it over the wall” to ops to run — became the single most expensive and error-prone moment in the whole process. Ops received software they had never seen, with deployment instructions in a Word document, and were then held accountable when it failed at 2am.

This is the silo problem, and DevOps is the response to it. The core move is to make delivery and operations a shared responsibility — “you build it, you run it”, in Amazon’s famous phrasing — so that the people writing the code feel the operational consequences of their choices, and the people running it have a say in how it is built. The wall comes down. Crucially, DevOps does not mean “developers do ops” or “we fired the ops team and renamed everyone DevOps Engineer”. It means the system of work changes so the two concerns are aligned rather than adversarial.

It is worth naming what DevOps is not, because the misconceptions are everywhere:

| DevOps is NOT… | Why that’s wrong |
| --- | --- |
| A job title | “DevOps Engineer” exists, but DevOps is an organisational practice; one person cannot “be the DevOps”. |
| A specific tool | Jenkins, GitHub Actions, Docker and Kubernetes enable DevOps; buying them does not make you a DevOps shop. |
| Just automation | Automation is one pillar (the A in CALMS). Culture is the hard part and the part tools cannot buy. |
| The same as a CI/CD pipeline | A pipeline is an implementation of some DevOps practices, not the whole discipline. |
| A department you create | Creating a separate “DevOps team” often just builds a third silo between Dev and Ops. |

The deepest research on the topic — the Accelerate book and the annual DORA (DevOps Research and Assessment) reports — found that the organisations that adopt these ways of working ship more often and more reliably at the same time. The old belief that you must trade speed for stability turns out to be false: the best performers are better at both, because the same practices (small batches, automation, fast feedback) improve throughput and stability simultaneously.

CALMS: the five dimensions of a DevOps transformation

If “culture” sounds too fuzzy to act on, the CALMS model breaks it into five concrete dimensions. It is the standard lens for assessing how far along a DevOps journey an organisation is, and it is a frequent interview prompt.

| Letter | Dimension | What it means | What it looks like when present |
| --- | --- | --- | --- |
| C | Culture | Shared ownership, blameless attitude, collaboration over hand-offs | Dev and Ops in the same standups; blameless post-mortems; psychological safety |
| A | Automation | Remove manual, repetitive, error-prone toil from build, test and deploy | One-button builds and deploys; infrastructure as code; automated tests |
| L | Lean | Small batch sizes, eliminate waste, optimise flow (from Lean manufacturing) | Small frequent changes; work-in-progress limits; value-stream thinking |
| M | Measurement | Decide with data, not opinion; measure flow, quality and feedback | DORA metrics tracked; dashboards; SLOs; error budgets |
| S | Sharing | Share knowledge, tools, responsibility and outcomes across teams | Internal docs and platforms; pairing; communities of practice; shared on-call |

Two observations a senior practitioner will stress. First, Culture is the load-bearing letter — the other four are far easier to buy or build, and a team with great automation but a blame culture will still under-perform, because people will hide failures rather than learn from them. Second, the letters reinforce each other: Measurement without Sharing produces dashboards nobody acts on; Automation without Lean just lets you ship large, risky batches faster. Treat CALMS as a system, not a checklist.

The DevOps lifecycle: the infinity loop

DevOps work is usually drawn as an infinity loop (a sideways figure-eight) rather than a straight line or a one-way pipeline. The shape is deliberate: software delivery is continuous and cyclical, not a project with an end. What you learn from running software in production (the right-hand loop, “Ops”) feeds directly back into what you plan and build next (the left-hand loop, “Dev”). There is no finish line; you go round and round, ideally getting faster and safer each lap.

The loop has eight canonical stages. The first four are the “Dev” side, the second four are the “Ops” side, and they meet in the middle:

| Stage | Side | What happens | Typical tooling |
| --- | --- | --- | --- |
| Plan | Dev | Define what to build; backlog, requirements, design | Jira, Azure Boards, GitHub Issues |
| Code | Dev | Write the software; version control; review | Git, GitHub/GitLab, IDEs |
| Build | Dev | Compile/package the code into an artifact | Maven, npm, Docker, MSBuild |
| Test | Dev | Verify correctness and quality automatically | JUnit, pytest, Selenium, SAST/SCA |
| Release | Ops | Version and stage the validated artifact, ready to ship | Artifact repos, release pipelines, approvals |
| Deploy | Ops | Push the release into an environment (and ultimately production) | Argo CD, Helm, Spinnaker, cloud deploy |
| Operate | Ops | Run the software; manage infrastructure and scale | Kubernetes, Terraform, cloud platforms |
| Monitor | Ops | Observe behaviour, performance and incidents in production | Prometheus, Grafana, Datadog, OpenTelemetry |

The arrow from Monitor back to Plan is the most important one in the whole diagram, and the one teams most often neglect. It is the feedback loop: production telemetry, incident learnings and user behaviour become inputs to the next planning cycle. A team that ships into a black hole — deploying without monitoring, or monitoring without feeding insights back into planning — has drawn a line, not a loop, and has thrown away the compounding advantage DevOps is meant to give.

It is worth distinguishing the lifecycle (this conceptual loop — the stages of work) from a CI/CD pipeline (a concrete, automated implementation that typically spans Code → Build → Test → Release → Deploy). The lifecycle is the map; the pipeline is one vehicle that drives part of it.

[Figure: The DevOps lifecycle & DORA metrics]

The diagram above shows the eight-stage infinity loop with the Dev and Ops sides meeting in the middle, the all-important feedback arrow from Monitor back to Plan, and the four DORA metrics overlaid on the points of the loop they measure — so you can see where in the cycle each metric is taken.

CI vs CD vs Continuous Deployment — the distinction interviewers probe

These three terms are used loosely and incorrectly all the time, and being able to separate them cleanly is a reliable signal of seniority. They form a ladder: each builds on the one before.

Continuous Integration (CI) is the practice of every developer merging their work into a shared mainline frequently — at least daily — with every merge automatically built and tested. The point is to catch integration problems within hours instead of discovering, weeks later, that two branches are hopelessly incompatible (“merge hell”). CI is fundamentally about keeping the codebase in a known-good, always-buildable state. Its prerequisites are a version-control system, a trunk or short-lived branches, a fast automated test suite, and the team discipline to fix a broken build immediately.

Continuous Delivery (CD) extends CI: every change that passes the pipeline is automatically built, tested and packaged into a release artifact that is ready to deploy to production at any time — but the final push to production is a manual, business decision (someone clicks “deploy”). The software is always in a deployable state; you choose when to release. This is the sweet spot for the majority of organisations, especially those with regulatory sign-offs or marketing-driven release timing.

Continuous Deployment goes one step further: every change that passes all automated stages is automatically deployed to production with no human gate. There is no “deploy” button — passing the pipeline is the deploy. This demands the highest maturity: comprehensive automated testing, robust monitoring, feature flags to decouple deploy from release, and automated rollback, because there is no human in the loop to catch a bad change.

The single most useful way to hold the difference in your head:

| Practice | Build & test on every commit | Always release-ready artifact | Deploy to prod automatically |
| --- | --- | --- | --- |
| Continuous Integration | Yes | Not required | No |
| Continuous Delivery | Yes | Yes | No (manual approval to release) |
| Continuous Deployment | Yes | Yes | Yes (no human gate) |

Note the abbreviation trap: “CD” almost always means Continuous Delivery. Continuous Deployment is the stricter, fully-automated cousin and is usually written out in full to avoid confusion. The practical difference between the two is exactly one manual approval step — but that step represents a large jump in the testing, observability and rollback maturity a team needs before it is safe to remove it.
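That one-step difference can be sketched as a gate at the end of a deploy script. This is a minimal illustration, not the syntax of any particular CI product: `deploy_gate`, the mode names and the approval argument are all hypothetical names invented for this example.

```shell
#!/bin/sh
# Hypothetical gate: decides whether a green pipeline may push to production.
#   $1 = release mode: "delivery" (a human approves) or "deployment" (no gate)
#   $2 = "yes" once a human has approved the release (ignored in deployment mode)
deploy_gate() {
  mode=$1
  approved=${2:-no}
  case "$mode" in
    deployment)
      echo "deploy"          # passing the pipeline is the deploy
      ;;
    delivery)
      if [ "$approved" = "yes" ]; then
        echo "deploy"        # someone clicked "deploy"
      else
        echo "hold"          # artifact is release-ready and waiting
      fi
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

deploy_gate delivery        # prints: hold
deploy_gate delivery yes    # prints: deploy
deploy_gate deployment      # prints: deploy
```

Everything before the gate is identical in both modes. Removing the approval changes one branch of code, but it moves the whole safety burden onto the automated tests, monitoring and rollback that run without a human watching.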

Value-stream mapping: finding the real bottleneck

You can have a beautiful CI/CD pipeline and still take three weeks to get a one-line change into production — because the pipeline is only one segment of the journey from idea to value delivered to a user. Value-stream mapping (VSM) is a Lean technique, borrowed from manufacturing, for making that entire journey visible so you can find where time is actually lost. It is the practical expression of the Lean in CALMS and the flow principle.

The method is straightforward:

  1. Map every step from a customer request (or idea) to it running in production and delivering value — including the invisible ones: backlog grooming, waiting for review, waiting for QA, waiting for a release window, waiting for a change-approval board.
  2. For each step, record two numbers: process time (PT) — how long the work actually takes when someone is doing it — and lead time (LT) — the total elapsed time from when the step could start to when the next step begins, including all the waiting.
  3. Compute flow efficiency = total process time ÷ total lead time. In most unoptimised organisations this is shockingly low — often under 15% — meaning work spends the overwhelming majority of its life sitting in a queue, not being worked on.
  4. Attack the biggest wait, not the biggest work. The instinct is to optimise the coding or build step; the data almost always shows the bottleneck is a queue — a code review nobody picks up, a weekly release train, a manual approval that takes four days.

A simplified value stream for a typical change might look like this:

| Step | Process time (working) | Lead time (incl. waiting) | The waste |
| --- | --- | --- | --- |
| In backlog → picked up | 0 | 5 days | Prioritisation queue |
| Coding | 4 hours | 1 day | |
| Waiting for code review | 0 | 2 days | Reviewer queue (the classic killer) |
| Build & automated tests | 20 min | 20 min | Already automated, fast |
| Waiting for QA | 0 | 3 days | Manual QA queue |
| Waiting for release window | 0 | 4 days | Weekly release train |
| Deploy | 15 min | 15 min | |

Here the actual work adds up to about four and a half hours (roughly half a working day), but the change takes about 15 days to reach production: a flow efficiency in the low single-digit percentages. Notice that automating the build harder buys you minutes, while killing the weekly release train and the review queue buys you days. VSM stops teams from optimising the part that feels technical and ignoring the queues that dominate the timeline. This is also exactly why DORA’s lead time for changes is such a powerful metric: it is the value stream, distilled to a single number.
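The flow-efficiency arithmetic for the table above can be checked in a few lines of shell. This is a minimal sketch: `flow_efficiency` is a hypothetical helper name, and it assumes the lead-time days in the table are elapsed calendar days of 24 hours.

```shell
#!/bin/sh
# flow_efficiency PROCESS_MINUTES LEAD_MINUTES
# Prints process time as a percentage of lead time, to one decimal place.
flow_efficiency() {
  awk -v pt="$1" -v lt="$2" 'BEGIN { printf "%.1f\n", 100 * pt / lt }'
}

# Process time from the table: coding 4 h + build 20 min + deploy 15 min.
pt=$(( 4*60 + 20 + 15 ))        # 275 minutes

# Lead time from the table: 5 + 1 + 2 + 3 + 4 = 15 days of elapsed time,
# plus the 20-minute build and the 15-minute deploy.
lt=$(( 15*24*60 + 20 + 15 ))    # 21635 minutes

echo "flow efficiency: $(flow_efficiency "$pt" "$lt")%"   # prints: flow efficiency: 1.3%
```

If you count each day as an 8-hour working day instead, the lead time is 7235 minutes and the result is 3.8%. The conclusion does not change either way: the queues, not the work, dominate the timeline.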

The four DORA metrics

Everything above is qualitative. The DORA metrics are how you make it measurable. DORA — DevOps Research and Assessment, the research program behind the Accelerate book and the annual State of DevOps report — identified four key metrics that, together, reliably distinguish high-performing software teams from low-performing ones. Two measure throughput (speed) and two measure stability (quality) — and the central finding is that the best teams excel at both at once, demolishing the old speed-versus-stability trade-off.

| Metric | Pillar | What it measures | How it’s taken |
| --- | --- | --- | --- |
| Deployment frequency | Throughput | How often you successfully release to production | Count of successful prod deployments per unit time |
| Lead time for changes | Throughput | How fast a commit gets to production | Median time from code committed → running in prod |
| Change failure rate (CFR) | Stability | How often a deployment causes a failure | % of deployments needing a hotfix, rollback or patch |
| Failed deployment recovery time | Stability | How quickly you recover from a failed deployment | Median time from failure detected → service restored |

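A few precision points that separate a strong answer from a vague one:

  1. Count production deployments only. A merge is a change event, not a release event, so a dashboard that counts merges overstates deployment frequency.
  2. Report the median (p50) for lead time and recovery time; one outlier wrecks a mean.
  3. Failed deployment recovery time is the current name for the metric formerly known as MTTR.
  4. Measure at the team or service level and watch the trend over time.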

Crucially, you measure all four together. Optimising one in isolation invites gaming: a team told only to raise deployment frequency can ship tiny, pointless commits; a team told only to lower change failure rate can simply stop deploying. The four form a balanced scorecard — two for speed, two for safety — precisely so that improving the system, not gaming a number, is the only way to move them all in the right direction.
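To make the “how it’s taken” column concrete, here is a minimal sketch that derives deployment frequency and change failure rate from a log of deploy events. The log format (one production deployment per line, a date followed by `ok` or `failed`) and the name `dora_from_log` are assumptions made for this example, and the sample entries are invented.

```shell
#!/bin/sh
# dora_from_log DAYS < deploy.log
# Reads one production deployment per line: "<date> <status>", where status is
# "ok" or "failed" (failed = needed a hotfix, rollback or patch).
# The log format is hypothetical, chosen only for this sketch.
dora_from_log() {
  awk -v days="$1" '
    NF >= 2 { total++; if ($2 == "failed") failed++ }
    END {
      if (total == 0) { print "no deployments recorded"; exit }
      printf "deployments=%d per_day=%.2f cfr=%.1f%%\n",
             total, total / days, 100 * failed / total
    }'
}

# Invented sample: five deployments in a 7-day window, one of which failed.
dora_from_log 7 <<'EOF'
2024-03-01 ok
2024-03-02 ok
2024-03-04 failed
2024-03-05 ok
2024-03-07 ok
EOF
# prints: deployments=5 per_day=0.71 cfr=20.0%
```

Because both numbers come from the same event stream, you cannot improve the change failure rate by deploying less without the frequency figure showing it.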

The DORA performance tiers: Elite, High, Medium, Low

DORA groups teams into four performance tiers based on those metrics. The exact numeric thresholds shift year to year as the whole industry improves and as the report occasionally collapses or relabels bands, so treat the figures below as representative orders of magnitude rather than gospel — what matters is the dramatic gap between tiers.

| Tier | Deployment frequency | Lead time for changes | Change failure rate | Recovery time |
| --- | --- | --- | --- | --- |
| Elite | On-demand (multiple times per day) | Less than one hour | 0–15% | Less than one hour |
| High | Between once per day and once per week | One day to one week | 16–30% | Less than one day |
| Medium | Between once per week and once per month | One week to one month | 16–30% | Less than one day |
| Low | Between once per month and once every six months | One to six months | 16–30%+ | Up to a week or more |

The headline that lands in interviews: Elite performers deploy roughly hundreds to thousands of times more frequently than Low performers, with lead times measured in hours rather than months, and they recover from failures faster too. The gap between Elite and Low is not a few percent — it is orders of magnitude, on both speed and stability simultaneously. That is the empirical proof behind every claim in this lesson: the practices (small batches, automation, fast feedback, shared ownership) are not just pleasant; they produce measurably, dramatically better outcomes.

The right way to use the tiers is as a direction, not a leaderboard. Find where your team honestly sits, identify the single metric holding you back, trace it to a cause in your value stream, fix that, and measure again. That loop — measure, learn, improve, re-measure — is itself the DevOps method applied to your own delivery.

The three principles underneath everything: flow, feedback, continual learning

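Strip away the tools and the acronyms and DevOps rests on three principles, often called the Three Ways (from The Phoenix Project and The DevOps Handbook). Every practice in this lesson is an expression of one of them.

  1. Flow: optimise the whole left-to-right stream.
  2. Feedback: fast feedback from production back to development.
  3. Continual learning and experimentation: a blameless culture of improvement.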

Hold these three in mind and the rest of the DevOps track reads as variations on a theme: every pipeline stage, deployment strategy and platform you will build is ultimately there to improve flow, tighten feedback, or accelerate learning.

Hands-on lab

You do not need any cloud account or paid tooling for this lab — only Git (free) and a terminal. The goal is to experience the smallest possible CI loop and to calculate DORA-style numbers by hand, so the concepts stop being abstract.

Part A — a one-command “pipeline” locally. We will simulate the Build → Test stages of the lifecycle with a tiny script, so you can feel what CI automates.

  1. Create a working folder and initialise a repository:

    mkdir devops-onramp && cd devops-onramp
    git init
    
  2. Create a trivial “application” — a shell function — and a test for it:

    cat > app.sh <<'EOF'
    add() { echo $(( $1 + $2 )); }
    EOF
    
    cat > test.sh <<'EOF'
    . ./app.sh
    result=$(add 2 3)
    if [ "$result" -eq 5 ]; then echo "PASS"; else echo "FAIL: got $result"; exit 1; fi
    EOF
    
  3. Create a local “CI pipeline” — the script a CI server would run on every commit — that builds (here, just syntax-checks) and tests:

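    cat > ci.sh <<'EOF'
    #!/usr/bin/env bash
    # Abort on the first failing command. Each stage runs on its own line:
    # set -e is ignored for the left-hand side of an && list, so a chained
    # "bash -n app.sh && echo ..." would not stop the pipeline at the build stage.
    set -e
    echo "== Build (syntax check) =="
    bash -n app.sh
    echo "build ok"
    echo "== Test =="
    bash test.sh
    echo "== Pipeline green =="
    EOF
    chmod +x ci.sh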
    
  4. Run your “pipeline”:

    ./ci.sh
    

    Expected output:

    == Build (syntax check) ==
    build ok
    == Test ==
    PASS
    == Pipeline green ==
    
  5. Now practise Continuous Integration by hand: make a change, run the pipeline before committing, and only commit if it is green — exactly what a real CI server enforces automatically.

    git add app.sh test.sh ci.sh
    ./ci.sh && git commit -m "feat: add() with passing CI"
    

    Validation: git log --oneline shows one commit, and it only exists because the pipeline passed. Try breaking app.sh (e.g. change the + to -), re-run ./ci.sh, and watch it exit with FAIL — that red build is the fast feedback CI exists to give you.
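Step 5 relies on you remembering to run the pipeline before each commit. One way to have Git enforce that locally is a pre-commit hook. The sketch below is an optional extra rather than part of the lab steps: `install_ci_hook` is a hypothetical helper name, and it assumes the `ci.sh` from step 3 sits at the top of the repository.

```shell
#!/bin/sh
# install_ci_hook REPO_DIR
# Writes a pre-commit hook that runs ci.sh; Git aborts the commit when the
# hook exits non-zero.
install_ci_hook() {
  if [ ! -d "$1/.git" ]; then
    echo "not a git repository: $1" >&2
    return 1
  fi
  mkdir -p "$1/.git/hooks"
  hook="$1/.git/hooks/pre-commit"
  cat > "$hook" <<'EOF'
#!/bin/sh
# Git runs this hook from the top of the working tree before each commit.
exec bash ./ci.sh
EOF
  chmod +x "$hook"
}
```

From inside the lab folder, `install_ci_hook .` installs the hook; after that, `git commit` runs the pipeline first and refuses to commit on a red build. Hooks live only in your local clone and can be skipped with `git commit --no-verify`, which is one reason a shared CI server is still needed.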

Part B — calculate DORA metrics by hand. Imagine the last 30 days of one service: 60 production deployments, of which 6 needed a hotfix; the median commit-to-prod time was 3 hours; the median recovery from those 6 failures was 25 minutes.

Write those four numbers down and place the team in the tier table above. This is the entire DORA practice in miniature: capture four signals, compute four numbers, locate yourself, and decide what to improve.
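The arithmetic for this scenario can be sketched in a few lines of shell, using awk for the division. `dora_summary` is a hypothetical helper named for this example; the two medians are given directly in the scenario, so they are echoed rather than computed.

```shell
#!/bin/sh
# dora_summary DAYS DEPLOYMENTS FAILED
# Prints deployment frequency (per day) and change failure rate (percent).
dora_summary() {
  awk -v d="$1" -v n="$2" -v f="$3" 'BEGIN {
    printf "deployment frequency: %.1f per day\n", n / d
    printf "change failure rate: %.0f%%\n", 100 * f / n
  }'
}

# Part B scenario: 30 days, 60 production deployments, 6 needing a hotfix.
dora_summary 30 60 6
echo "lead time for changes: 3 hours (median, given)"
echo "failed deployment recovery time: 25 minutes (median, given)"
# prints:
#   deployment frequency: 2.0 per day
#   change failure rate: 10%
#   lead time for changes: 3 hours (median, given)
#   failed deployment recovery time: 25 minutes (median, given)
```

Against the representative tier table, two deployments per day, a 10% change failure rate and a 25-minute recovery all sit in the Elite row. The 3-hour lead time falls between the Elite band (under one hour) and the High band (one day to one week) as the table is written, so lead time is the metric this team would look at first.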

Cleanup: there is nothing to tear down — just remove the practice folder if you wish:

    cd .. && rm -rf devops-onramp

Cost note: zero. This lab uses only Git and a shell, both free, and runs entirely on your machine.

Common mistakes & troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| “We adopted DevOps but nothing improved” | Bought tools, never changed the culture or the hand-offs | Start with the C in CALMS (shared ownership, blameless post-mortems) before more automation |
| Created a separate “DevOps team” and silos got worse | A new team became a third wall between Dev and Ops | Embed responsibility in delivery teams; a platform team enables, it does not gate |
| Deployment frequency looks great but quality fell | Optimised one DORA metric in isolation | Always track all four; pair every throughput metric with a stability metric |
| Pipeline is fast but changes still take weeks | The bottleneck is a queue, not the pipeline | Run a value-stream map; attack the biggest wait (review/QA/release window), not the build |
| DORA dashboard counts merges as deployments | Conflated a change event with a release event | Emit an explicit deploy event from the deploy job; count production releases only |
| “Continuous Deployment” claimed, but there’s a manual approval | Confused Delivery with Deployment | If a human clicks deploy, it is Continuous Delivery; Deployment has no gate |
| Averages make metrics swing wildly | Using mean for lead time / recovery | Report the median (p50); one outlier wrecks a mean |
| Post-mortems hunt for who to blame | Blame culture; people hide failures | Make post-mortems blameless; the system, not the person, is the defect |
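The mean-versus-median row is easy to demonstrate. The sketch below uses invented lead times, and `stats` is a hypothetical helper name; it relies only on `sort` and `awk`.

```shell
#!/bin/sh
# stats: reads one number per line on stdin, prints "mean=<m> median=<p50>".
stats() {
  sort -n | awk '
    { v[NR] = $1; sum += $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) {
        median = v[(NR + 1) / 2]                  # odd count: middle value
      } else {
        median = (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count: average the middle pair
      }
      printf "mean=%.1f median=%.1f\n", sum / NR, median
    }'
}

# Invented lead times in hours: four ordinary changes and one that sat for a week.
printf '%s\n' 2 3 3 4 168 | stats
# prints: mean=36.0 median=3.0
```

One stuck change drags the mean to 36 hours while the median stays at 3, which is what a typical change actually experienced. The outlier is still worth investigating, but it should not define the headline number.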

Best practices

Security notes

DevOps done well makes software more secure, not less — but only if security is built into the loop rather than bolted on at the end. A few foundational points that the later DevSecOps lessons expand on:

Interview & exam questions

1. Is DevOps a tool, a team, or a culture? Primarily a culture — a way of working that unites Dev and Ops around shared ownership of delivery and operations — supported by practices (CI/CD, IaC, monitoring) and automation/tools. It is not a single tool, and creating an isolated “DevOps team” usually backfires by adding a third silo.

2. What does CALMS stand for, and which letter is hardest? Culture, Automation, Lean, Measurement, Sharing. Culture is the hardest and most important — the other four can be bought or built, but without a collaborative, blameless culture they under-deliver.

3. Why is the DevOps lifecycle drawn as an infinity loop? Because delivery is continuous and cyclical, not a project with an end. Learnings from running software in production (Monitor) feed back into Plan, so the loop never ends: you iterate forever, ideally faster each lap.

4. Explain the difference between Continuous Integration, Continuous Delivery and Continuous Deployment. CI: every commit is automatically built and tested against a shared mainline. Continuous Delivery: every passing change is also packaged into a release artifact that is always ready to deploy, but the production release is a manual decision. Continuous Deployment: every passing change is automatically deployed to production with no human gate. The difference between the last two is exactly one manual approval step.

5. “CD” — Delivery or Deployment? By convention CD = Continuous Delivery. Continuous Deployment is the stricter, fully-automated variant and is usually spelled out to avoid ambiguity.

6. Name the four DORA metrics and which pillar each belongs to. Deployment frequency and lead time for changes (throughput/speed); change failure rate and failed deployment recovery time (formerly MTTR) (stability/quality).

7. Why must you track all four DORA metrics together? Because optimising one in isolation invites gaming — you can raise deployment frequency with trivial commits, or lower change failure rate by simply not deploying. The four form a balanced scorecard (two speed, two stability) so that only genuinely improving the system moves them all.

8. What does DORA’s central finding say about speed versus stability? That the trade-off is false. Elite performers are better at both — they deploy far more often and fail less and recover faster — because small batches, automation and fast feedback improve throughput and stability simultaneously.

9. What is value-stream mapping and what does it usually reveal? A Lean technique that maps every step from idea to production, recording process time and waiting time for each. It usually reveals that the bottleneck is a queue (code review, manual QA, a release window) rather than the actual engineering work, and that flow efficiency is shockingly low (often well under 15%).

10. What is the most-neglected arrow in the lifecycle, and why does it matter? The Monitor → Plan feedback arrow. Without it you ship into a black hole and lose the compounding learning that makes DevOps pay off — you have drawn a line, not a loop.

11. What are the Three Ways? Flow (optimise the whole left-to-right stream), Feedback (fast feedback from production back to development), and Continual learning/experimentation (a blameless culture of improvement). Every DevOps practice serves one of them.

12. How would you measure whether a DevOps initiative is actually working? Track the four DORA metrics at the team/service level over time and watch the trend. Improvement in throughput and stability together is the signal; movement in only one (or gaming) is not.

Quick check

  1. True or false: buying a CI/CD tool means your organisation is “doing DevOps”.
  2. Which CALMS letter is generally considered the hardest and most important?
  3. In one sentence, what is the difference between Continuous Delivery and Continuous Deployment?
  4. Which two DORA metrics measure stability rather than throughput?
  5. In a value-stream map, where is the bottleneck almost always found?

Answers

  1. False. Tools enable DevOps but do not constitute it; culture and practices are the substance.
  2. Culture (the C).
  3. Continuous Delivery keeps every change ready to release but requires a manual approval to deploy to production; Continuous Deployment deploys every passing change automatically with no human gate.
  4. Change failure rate and failed deployment recovery time (formerly MTTR).
  5. In a queue / waiting step (e.g. code review, manual QA, or a release window) — not in the active engineering work.

Exercise

Pick a real change you or your team shipped recently — anything from a feature to a config tweak. Then:

  1. Map its value stream. List every step from “idea/ticket created” to “running in production and delivering value”, including all the waiting. For each step, estimate the process time (active work) and the lead time (elapsed, including waiting).
  2. Compute flow efficiency (total process time ÷ total lead time). Be honest — most people are surprised how low it is.
  3. Identify the single biggest wait and write one concrete change that would shrink it (e.g. “review SLA of 4 hours”, “deploy on demand instead of the Thursday train”).
  4. Estimate your four DORA metrics for that service over the last month and place the team in a tier.
  5. Write a short paragraph: which DORA metric is your weakest, which value-stream queue is most responsible for it, and which of the Three Ways (flow, feedback, learning) your proposed fix serves. This is exactly the reasoning a senior engineer brings to a delivery retrospective.

Certification mapping

This lesson maps to the foundational, vendor-neutral DevOps knowledge that recurs across certifications:

The vocabulary here — CALMS, the infinity loop, CI vs CD vs Continuous Deployment, the four DORA metrics and tiers — appears in the opening, conceptual questions of essentially every DevOps exam.

Glossary

Next steps

You now have the conceptual spine of DevOps: the culture and CALMS, the lifecycle loop, the CI/CD ladder, value-stream thinking, and the DORA metrics that tell you whether any of it is working. The next lesson, YAML for DevOps: Pipelines, Anchors, Templates & the Gotchas (yaml-for-devops-pipelines-anchors-templates-jinja), grounds all of this in the language every pipeline and manifest is written in — because before you can build a CI/CD pipeline, you need to read and write YAML without falling into its famous traps. From there the track moves into CI/CD Pipeline Design (cicd-pipeline-design-stages-gates-artifacts) and Deployment Strategies (devops-deployment-strategies-rolling-bluegreen-canary-flags), where the practices in this lesson become concrete, running pipelines.
