Set Up Jenkins on Kubernetes with the Kubernetes Plugin and Ephemeral Agent Pods

A 200-engineer platform team is bleeding money on a row of permanently-on Jenkins agent VMs that sit at 8% utilization overnight and still buckle at 9am when every squad pushes at once. Worse, the agents have drifted: one has JDK 17, another JDK 21, a third has a stale Trivy binary, and “works on the build server” has stopped meaning anything. The fix is to stop treating build capacity as a fleet of pets and start treating it as ephemeral pods: the Jenkins Kubernetes plugin asks the cluster for a fresh agent pod when a job needs one, runs the build inside per-job container templates pinned to exact tool versions, and deletes the pod when the build finishes. With a node autoscaler behind it, you pay for capacity roughly in proportion to the builds actually running, every build starts from an identical image, and the controller config lives in version control as code. This guide walks through the whole setup end to end: a Helm-installed controller, JCasC (Jenkins Configuration as Code) to declare the cloud and pod templates, a real multi-container pipeline, then validation, rollback, and the security and cost notes that keep it production-grade.

Prerequisites

A Kubernetes cluster, 1.28+ (EKS, AKS, GKE, or on-prem) with a working kubectl context and cluster-admin for the initial install.
Helm 3.12+ and the jenkinsci/helm-charts repo reachable.
A default StorageClass for the controller’s persistent home (e.g. gp3 on EKS, managed-csi on AKS).
An ingress controller (NGINX or your cloud’s) and a DNS name you control, fronted by Akamai for edge TLS termination, WAF, and bot mitigation so the controller UI is never raw-exposed.
An OIDC identity provider — Okta as the workforce IdP (federated to Microsoft Entra ID where Azure RBAC is in play) — to back Jenkins SSO instead of local accounts.
HashiCorp Vault reachable from the cluster for build secrets (registry creds, signing keys) injected at runtime rather than baked into images.
A container registry (ECR/ACR/GHCR) for your agent images, and Terraform already managing the cluster and its node pools.

Target topology

The shape is deliberately simple and that is the point. A single long-lived Jenkins controller runs as a StatefulSet in a jenkins namespace, with its $JENKINS_HOME on a persistent volume so jobs, build history, and credentials survive a pod restart. The controller holds no build capacity of its own — it is a scheduler and a UI. When a pipeline requests an agent, the Kubernetes plugin calls the Kubernetes API and creates an ephemeral agent pod in a separate jenkins-agents namespace. That pod contains a jnlp container (the Jenkins agent process that phones home over JNLP/WebSocket) plus one or more per-job tool containers — maven, node, docker/kaniko, trivy, whatever the job’s podTemplate declares. The build steps run inside those containers; when the pipeline ends, the plugin deletes the pod and the node-level autoscaler reclaims the node minutes later. Argo CD keeps the controller’s Helm release and JCasC in sync with Git (GitOps), so the controller is reproducible and drift-free. Identity flows Okta → Entra → Jenkins OIDC; secrets flow Vault → agent pod at runtime; and Wiz, CrowdStrike Falcon, and Datadog observe the whole namespace.

1. Create namespaces and the agent service account

Isolate the controller from the throwaway agents. The agents get their own namespace and a tightly-scoped service account — they should never be able to mutate the controller.

kubectl create namespace jenkins
kubectl create namespace jenkins-agents

# Service account the agent pods run as
kubectl -n jenkins-agents create serviceaccount jenkins-agent
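
For a GitOps-managed cluster, the same three objects can be declared rather than created imperatively. The following is a minimal sketch that reuses the names above; the file name agent-namespaces.yaml is only illustrative.

```yaml
# agent-namespaces.yaml (illustrative file name, not from the original setup)
apiVersion: v1
kind: Namespace
metadata:
  name: jenkins
---
apiVersion: v1
kind: Namespace
metadata:
  name: jenkins-agents
---
# Service account the agent pods run as
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins-agent
  namespace: jenkins-agents
```

Applied with kubectl apply -f, this form is safe to re-run, whereas kubectl create fails if the object already exists.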

Grant the controller’s service account permission to create and delete pods in the agents namespace only — least privilege, scoped by RoleBinding, never a cluster-wide ClusterRoleBinding:

# agent-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-agent-manager
  namespace: jenkins-agents
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/exec", "pods/log"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-controller-manages-agents
  namespace: jenkins-agents
subjects:
  - kind: ServiceAccount
    name: jenkins                # created by the Helm chart in step 2
    namespace: jenkins
roleRef:
  kind: Role
  name: jenkins-agent-manager
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f agent-rbac.yaml

2. Install the Jenkins controller with Helm

Use the official chart. The key is to give the controller a persistent home, switch off the chart’s default agent config so that JCasC owns it, and turn on the plugins the later steps rely on: the Kubernetes plugin, JCasC, the OIDC plugin for Okta/Entra SSO, and the Vault plugin.

helm repo add jenkins https://charts.jenkins.io
helm repo update

# values.yaml  (managed by Terraform/Argo CD in production)
controller:
  image:
    tag: "2.452.3-lts-jdk17"
  installPlugins:
    - kubernetes:4253.v7700d91739e8
    - configuration-as-code:1810.v9b_c30a_249a_4c
    - workflow-aggregator:600.vb_57cdd26fdd7   # Pipeline
    - git:5.2.2
    - oic-auth:4.418.vccc7061f5b_6d             # OIDC for Okta/Entra
    - hashicorp-vault-plugin:370.va_4c5fe9f9a_69 # Vault credentials
    - role-strategy                             # needed by the roleBased authorizationStrategy in step 3; pin a version you have tested
  JCasC:
    defaultConfig: true
    configScripts:
      jenkins-casc: |
        # filled in step 3
  serviceType: ClusterIP        # exposed via Ingress + Akamai, not LoadBalancer
  resources:
    requests: { cpu: "1",   memory: "2Gi" }
    limits:   { cpu: "2",   memory: "4Gi" }
persistence:
  enabled: true
  storageClass: "gp3"
  size: "30Gi"
agent:
  enabled: false                # we declare agents in JCasC, not chart defaults
serviceAccount:
  create: true
  name: jenkins
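
The prerequisites call for an Ingress in front of the ClusterIP service, but the values above do not declare one. The fragment below is a sketch of how the chart's controller.ingress block might be filled in; the nginx class name and the jenkins-tls secret are assumptions, and the host name reuses jenkins.example.com from step 3.

```yaml
# values.yaml fragment: merge under the existing controller: key
controller:
  ingress:
    enabled: true
    apiVersion: "networking.k8s.io/v1"
    ingressClassName: nginx            # assumption: NGINX ingress controller
    hostName: jenkins.example.com
    tls:
      - secretName: jenkins-tls        # hypothetical TLS secret for the origin leg behind Akamai
        hosts:
          - jenkins.example.com
```

Akamai terminates TLS at the edge; whether the origin leg also needs a certificate depends on how the property is configured, so treat the tls block as optional.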

Install it:

helm upgrade --install jenkins jenkins/jenkins \
  --namespace jenkins \
  --values values.yaml \
  --wait --timeout 10m
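
Where Argo CD owns the release instead of a manual helm upgrade, the same chart and values can be expressed as an Application. This is a sketch under assumptions: Argo CD runs in an argocd namespace, the default project is acceptable, and <CHART_VERSION> is a placeholder for whichever chart version you pin.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: jenkins
  namespace: argocd                  # assumption: Argo CD's own namespace
spec:
  project: default
  source:
    repoURL: https://charts.jenkins.io
    chart: jenkins
    targetRevision: "<CHART_VERSION>"   # placeholder: pin an explicit chart version
    helm:
      releaseName: jenkins
      values: |
        # Inline the values.yaml content from this step here
        controller:
          serviceType: ClusterIP
  destination:
    server: https://kubernetes.default.svc
    namespace: jenkins
  syncPolicy:
    automated:
      prune: true        # remove resources that were deleted from Git
      selfHeal: true     # revert manual drift on the cluster
```

With selfHeal enabled, a hand-edited ConfigMap or values change made directly on the cluster is reverted to what Git declares, which is the drift protection the topology section describes.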

Fetch the initial admin password (you will replace this with OIDC in step 3):

kubectl -n jenkins exec -it sts/jenkins -c jenkins -- \
  cat /run/secrets/additional/chart-admin-password

3. Configure the cloud and pod templates with JCasC

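This is the heart of the setup. Everything below goes in the controller.JCasC.configScripts.jenkins-casc block from step 2 (or in a separate ConfigMap that the chart mounts). JCasC declares the Kubernetes cloud (how the controller talks to the API and where agents land) and one or more reusable pod templates. One caveat: with defaultConfig: true the chart also renders its own securityRealm and authorizationStrategy, so if the reload reports a conflict, move those two sections into controller.JCasC.securityRealm and controller.JCasC.authorizationStrategy in values.yaml instead of declaring them twice.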

jenkins:
  clouds:
    - kubernetes:
        name: "k8s"
        serverUrl: "https://kubernetes.default.svc"
        namespace: "jenkins-agents"
        jenkinsUrl: "http://jenkins.jenkins.svc.cluster.local:8080"
        jenkinsTunnel: "jenkins-agent.jenkins.svc.cluster.local:50000"
        directConnection: false          # agents discover the controller via jenkinsUrl, then connect through the tunnel (inbound TCP)
        containerCapStr: "50"            # hard ceiling on concurrent agent pods
        connectTimeout: 100
        readTimeout: 200
        podRetention: "never"           # delete the pod the moment the build ends
        templates:
          - name: "base"
            label: "k8s-base"
            serviceAccount: "jenkins-agent"
            idleMinutes: 0               # do not keep idle agents warm
            yamlMergeStrategy: "merge"
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:3261.v9c670a_4748a_9-1"
                resourceRequestCpu: "500m"
                resourceRequestMemory: "512Mi"
                resourceLimitCpu: "1"
                resourceLimitMemory: "1Gi"
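  securityRealm:
    oic:
      # Field layout varies by oic-auth release; newer releases nest the well-known URL
      # under serverConfiguration.wellKnown, so check the schema for your pinned version.
      clientId: "${OIDC_CLIENT_ID}"
      clientSecret: "${OIDC_CLIENT_SECRET}"
      wellKnownOpenIDConfigurationUrl: "https://your-org.okta.com/.well-known/openid-configuration"
      userNameField: "preferred_username"
      groupsFieldName: "groups"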
  authorizationStrategy:
    roleBased:
      roles:
        global:
          - name: "admin"
            permissions: ["Overall/Administer"]
            assignments: ["platform-admins"]   # Okta/Entra group claim
          - name: "developer"
            permissions: ["Overall/Read", "Job/Build", "Job/Read"]
            assignments: ["developers"]
unclassified:
  location:
    url: "https://jenkins.example.com/"

A few choices that teams get wrong, called out explicitly:

podRetention: never and idleMinutes: 0 are what make agents genuinely ephemeral. Never is already the cloud-level default, but teams often switch to onFailure to keep failed pods around “for debugging”, which quietly fills the node pool; enable it only temporarily.
containerCapStr is your blast-radius limit. Without it, a flood of queued jobs can try to schedule hundreds of pods and exhaust the cluster.
jenkinsTunnel must point at the agent service the Helm chart creates (jenkins-agent, port 50000). Without it, agents look for the inbound agent port on the UI service, which exposes only 8080, and never register.
The OIDC realm replaces local accounts entirely: Okta (federated to Entra ID for Azure-backed RBAC) issues the token, and the groups claim drives Jenkins’ role-based authorization, so access is managed in the IdP, not in Jenkins.
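
The configuration above defines only the base template, although the cloud can hold several reusable ones. The sketch below shows how a second template might inherit from base; the name maven and the label k8s-maven are hypothetical, and the entry belongs in the same templates list as base rather than in a separate file.

```yaml
jenkins:
  clouds:
    - kubernetes:
        name: "k8s"
        # ...cloud settings from the block above stay unchanged...
        templates:
          # ...the "base" template from above stays as the first entry...
          - name: "maven"                 # hypothetical template name
            label: "k8s-maven"            # hypothetical label
            inheritFrom: "base"           # reuse jnlp container, service account, retention
            containers:
              - name: "maven"
                image: "maven:3.9.6-eclipse-temurin-17"
                command: "cat"            # keep the container alive for pipeline steps
                ttyEnabled: true
                resourceRequestCpu: "1"
                resourceRequestMemory: "1Gi"
                resourceLimitCpu: "2"
                resourceLimitMemory: "2Gi"
```

A declarative pipeline would then select it with agent { label 'k8s-maven' } instead of carrying inline pod YAML.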

The ${OIDC_CLIENT_SECRET} and any registry/signing secrets are not hardcoded. JCasC resolves ${...} references through its secret sources at load time, and the Vault plugin adds HashiCorp Vault as one of those sources (configured through CASC_VAULT_* variables on the controller), so no secret is ever written into the Helm values or Git.
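
One way to wire that up is through the chart's controller.containerEnv list. Everything specific in this sketch is an assumption: the Vault address, the KV path, and a jenkins-controller Vault role bound to the controller's service account (separate from the agent role defined in step 5). Confirm the variable names against the Vault plugin README for your pinned version.

```yaml
# values.yaml fragment: merge under the existing controller: key
controller:
  containerEnv:
    - name: CASC_VAULT_URL
      value: "https://vault.example.com:8200"   # hypothetical Vault address
    - name: CASC_VAULT_PATHS
      value: "secret/jenkins/oidc"              # hypothetical path holding OIDC_CLIENT_ID and OIDC_CLIENT_SECRET
    - name: CASC_VAULT_ENGINE_VERSION
      value: "2"                                # KV secrets engine version
    - name: CASC_VAULT_KUBERNETES_ROLE
      value: "jenkins-controller"               # hypothetical Vault role for the controller SA
```

The keys stored at that path become the variable names JCasC can reference, so a key named OIDC_CLIENT_SECRET resolves ${OIDC_CLIENT_SECRET}.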

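Apply by upgrading the release; JCasC-only changes reload without a restart:

helm upgrade jenkins jenkins/jenkins -n jenkins --values values.yaml --wait
# The chart's config-reload sidecar normally triggers the reload. To force it by hand,
# pass the reload token (assumed here to be the pod name, as the chart's sidecar setup configures it):
kubectl -n jenkins exec sts/jenkins -c jenkins -- sh -c \
  'curl -s -X POST "http://localhost:8080/reload-configuration-as-code/?casc-reload-token=$(hostname)"'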

4. Build a per-job pod template into a Pipeline

Now use it. A pipeline declares its own pod inline, as YAML in the kubernetes agent block, so each job gets exactly the tool containers it needs: a Maven build, a Kaniko image build with no Docker daemon, and a Trivy scan, all in one ephemeral pod sharing the workspace.

// Jenkinsfile
pipeline {
  agent {
    kubernetes {
      yaml '''
        apiVersion: v1
        kind: Pod
        spec:
          serviceAccountName: jenkins-agent
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
          containers:
            - name: maven
              image: maven:3.9.6-eclipse-temurin-17
              command: ["cat"]
              tty: true
              resources:
                requests: { cpu: "1", memory: "1Gi" }
                limits:   { cpu: "2", memory: "2Gi" }
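            - name: kaniko
              image: gcr.io/kaniko-project/executor:v1.23.2-debug
              command: ["sleep"]
              args: ["9999999"]
              # Kaniko must run as UID 0 inside its container (still unprivileged, no Docker socket),
              # so override the pod-level non-root setting for this container only
              securityContext:
                runAsNonRoot: false
                runAsUser: 0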
            - name: trivy
              image: aquasec/trivy:0.53.0
              command: ["cat"]
              tty: true
      '''
    }
  }
  stages {
    stage('Build & test') {
      steps {
        container('maven') {
          sh 'mvn -B -ntp clean verify'
        }
      }
    }
    stage('Image build (no daemon)') {
      steps {
        container('kaniko') {
          // registry creds injected from Vault, mounted at /kaniko/.docker
          sh '''/kaniko/executor \
            --context=`pwd` \
            --dockerfile=Dockerfile \
            --destination=ghcr.io/acme/app:${BUILD_NUMBER} \
            --cache=true'''
        }
      }
    }
    stage('Scan') {
      steps {
        container('trivy') {
          sh 'trivy image --exit-code 1 --severity HIGH,CRITICAL ghcr.io/acme/app:${BUILD_NUMBER}'
        }
      }
    }
  }
}

The same container images can back other CI systems: a GitHub Actions job on ephemeral runners, or a workload promoted by Argo CD, can reference the identical tags, which keeps the toolchain consistent across systems. Terraform (or Ansible for any node-level config) owns the node pools these pods land on, so capacity is also code.
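
As an illustration of that reuse, a GitHub Actions job can run the Maven stage in the same pinned image. This is a sketch, not part of the Jenkins setup; the workflow name and trigger are arbitrary choices.

```yaml
# .github/workflows/build.yml (illustrative)
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    container:
      image: maven:3.9.6-eclipse-temurin-17   # same tag as the Jenkins maven container
    steps:
      - uses: actions/checkout@v4
      - run: mvn -B -ntp clean verify
```

Because both systems reference one image tag, a toolchain bump is a single change that can be reviewed and rolled out in both places.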

5. Wire in secrets, identity, and the operating stack

Glue the ephemeral pods into the platform so they are governed, not just functional.

HashiCorp Vault issues short-lived registry and signing credentials. The agent pod authenticates to Vault with its Kubernetes service-account JWT (Vault’s kubernetes auth method), leases a token, and mounts the secret — nothing long-lived sits in a Secret object. Configure the role to bind exactly the jenkins-agent SA in jenkins-agents:
```
vault write auth/kubernetes/role/jenkins-agent \
  bound_service_account_names=jenkins-agent \
  bound_service_account_namespaces=jenkins-agents \
  policies=ci-registry-read,ci-signing \
  ttl=20m
```
Okta → Entra ID is the only way humans log in (step 3); Jenkins local auth stays disabled. Group claims map to Jenkins roles, so onboarding/offboarding happens in the IdP.
Wiz / Wiz Code scans the agent and controller images in the registry and runs CSPM over the jenkins/jenkins-agents namespaces, alerting if a workload holds permissions beyond its scoped RBAC or an image ships a critical CVE. It is the posture backstop behind the Trivy gate in the pipeline.
CrowdStrike Falcon sensors on the node pool give runtime threat detection on the ephemeral pods themselves (crypto-miner in a poisoned dependency, unexpected egress) and feed detections to the SOC.
Datadog (or Dynatrace) collects the Jenkins Prometheus metrics, agent pod lifecycle events, and build-stage timings, so queue time, pod-startup latency, and per-pipeline duration are dashboards, not guesses.
ServiceNow receives an auto-raised change record when the controller’s Helm/JCasC release is promoted, and an incident ticket on a guardrail breach (a Falcon detection, a sustained scan failure) — giving compliance a documented gate.
Internal training for new engineers on this pipeline lives as a course in Moodle. Any legacy build that still needs a Windows toolchain runs on a virtual appliance node attached to the cluster and is targeted through a separate, labeled pod template until it can be containerized.
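
The Vault item above says the agent pod mounts its registry credentials at runtime, and the Kaniko stage expects them at /kaniko/.docker. One possible mechanism, assuming the Vault Agent Injector is installed in the cluster, is to annotate the pod YAML in the Jenkinsfile. The secret path secret/data/ci/registry and its auth field are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-pre-populate-only: "true"   # init container only, no long-running sidecar
    vault.hashicorp.com/role: "jenkins-agent"             # the Vault role created above
    vault.hashicorp.com/secret-volume-path: "/kaniko/.docker"
    vault.hashicorp.com/agent-inject-secret-config.json: "secret/data/ci/registry"   # hypothetical path
    vault.hashicorp.com/agent-inject-template-config.json: |
      {{- with secret "secret/data/ci/registry" -}}
      {"auths":{"ghcr.io":{"auth":"{{ .Data.data.auth }}"}}}
      {{- end }}
spec:
  serviceAccountName: jenkins-agent
  # containers: as in the Jenkinsfile from step 4
```

The injector renders config.json into an in-memory volume before the build containers start, so the credential lives only as long as the pod does.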

Validation

Confirm the controller is healthy and that agents are genuinely ephemeral — created on demand, gone after.

# Controller pod Running, PVC bound
kubectl -n jenkins get pods,pvc
kubectl -n jenkins logs sts/jenkins -c jenkins | grep -i "Configuration as Code"

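# The k8s cloud is registered (an anonymous request is rejected whether or not the cloud exists, so authenticate)
# JENKINS_USER / JENKINS_API_TOKEN are placeholders for an admin user and API token set in your local shell
kubectl -n jenkins exec sts/jenkins -c jenkins -- \
  curl -s -u "$JENKINS_USER:$JENKINS_API_TOKEN" localhost:8080/manage/cloud/k8s/ -o /dev/null -w "%{http_code}\n"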

Trigger a build, then watch a pod appear in the agents namespace and disappear when it ends:

# In one terminal — watch agents come and go
kubectl -n jenkins-agents get pods -w

You should see a pod named like app-build-7-xxxxx-yyyyy go Pending → Running → Completed/Terminating within the build’s lifetime, then vanish. Then verify that no agents linger idle:

# After a few builds: zero idle agent pods should remain
kubectl -n jenkins-agents get pods --no-headers | wc -l   # expect 0 between builds

A green run with the Maven, Kaniko, and Trivy stages all passing — and the pod gone afterward — is the success criterion.

Rollback / teardown

Because the controller is Helm-managed and the agents are stateless, rollback is clean.

# Roll the controller back to the previous release revision
helm history jenkins -n jenkins
helm rollback jenkins <PREVIOUS_REVISION> -n jenkins --wait

# Or fully tear down — agents first, then controller, then namespaces
kubectl -n jenkins-agents delete pods --all          # kill any in-flight agents
helm uninstall jenkins -n jenkins   # also removes the chart-managed PVC holding JENKINS_HOME: back it up first
kubectl delete -f agent-rbac.yaml
kubectl delete namespace jenkins-agents
kubectl delete namespace jenkins   # removes anything left in the namespace

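If you only need to disable ephemeral agents temporarily (e.g. cluster maintenance), set containerCapStr: "0" in JCasC and reload: the controller stays up, schedules no new pods, and lets in-flight builds drain. Verify on your pinned plugin version that 0 is enforced as a hard zero rather than read as “no limit”; putting the controller into quiet-down mode is an alternative that does not depend on the cap.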

Common pitfalls

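Agents never come online. Match the symptom to the cause. If no pod is created at all, the controller SA lacks create on pods in jenkins-agents. If the pod is Running but the agent stays offline, jenkinsTunnel points at the wrong service/port (it must be the 50000 agent port). If the pod is genuinely stuck Pending, it is a scheduling problem: resource requests no node can satisfy, a failed image pull, or an autoscaler that cannot add nodes. Check kubectl -n jenkins-agents describe pod <agent> events.
Pods never get deleted. podRetention is set to onFailure/always, or a finalizer is hanging. Set never for production and reserve retention for ad-hoc debugging.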
jnlp container overridden by accident. If your podTemplate names a container jnlp, you replace the agent itself — name tool containers maven/node/etc. and let the plugin inject jnlp.
Workspace not shared. All containers in a podTemplate share the workspace volume automatically; if a step in container('trivy') can’t see the artifact container('maven') built, you likely overrode the workspace mount.
Image pulls throttle builds. Cold pulls of large tool images dominate startup. Pre-pull with a DaemonSet or use a pull-through cache so agent startup is seconds, not minutes.
Clock/timeout flakiness at scale. Under a thundering herd, bump connectTimeout/readTimeout and raise containerCapStr deliberately rather than letting jobs queue invisibly.
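
The pre-pull suggestion above can be sketched as a DaemonSet whose init containers pull each tool image and exit, leaving a tiny pause container so the pod stays scheduled. The resource names are hypothetical, and the image list should mirror whatever your pipelines actually use.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: agent-image-prepull          # hypothetical name
  namespace: jenkins-agents
spec:
  selector:
    matchLabels:
      app: agent-image-prepull
  template:
    metadata:
      labels:
        app: agent-image-prepull
    spec:
      initContainers:
        # Each init container exists only to force the image onto the node
        - name: pull-maven
          image: maven:3.9.6-eclipse-temurin-17
          command: ["sh", "-c", "true"]
        - name: pull-trivy
          image: aquasec/trivy:0.53.0
          command: ["sh", "-c", "true"]
      containers:
        # Minimal long-running container so the DaemonSet pod stays Running
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests: { cpu: "10m", memory: "16Mi" }
            limits:   { cpu: "10m", memory: "16Mi" }
```

On an autoscaled pool the pull happens once per new node, so the first build on a fresh node still waits unless the images are also baked into the node image or served from a nearby cache.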

Security notes

Run agent pods as non-root with a restricted securityContext (runAsNonRoot: true, drop all capabilities, read-only root filesystem where the toolchain allows), and never mount the host Docker socket; use Kaniko or rootless BuildKit for image builds so a poisoned build cannot escape to the node. Kaniko is the one exception to non-root: its executor runs as UID 0 inside its own container, but it needs neither privileged mode nor a daemon. Keep the controller off the public internet: ClusterIP service behind Ingress, fronted by Akamai for TLS and WAF. Disable Jenkins local auth and gate every login through Okta/Entra OIDC with group-driven RBAC. Pull all build secrets from HashiCorp Vault at runtime with short TTLs, so nothing durable lands in a Secret. Let Wiz scan images and namespace posture and CrowdStrike Falcon watch runtime, with breaches auto-ticketed in ServiceNow. Apply a NetworkPolicy so agent pods can reach only the registry, Vault, and the controller, not each other or the wider cluster.
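
A starting point for that NetworkPolicy is sketched below. It assumes a CNI that enforces NetworkPolicy, cluster DNS in kube-system, and placeholder CIDRs for Vault and for a registry mirror or egress proxy. Standard NetworkPolicy matches IP ranges rather than host names, so a hosted registry such as ghcr.io usually needs a proxy or a CNI-specific FQDN policy instead.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agent-egress-allowlist        # hypothetical name
  namespace: jenkins-agents
spec:
  podSelector: {}                     # applies to every agent pod
  policyTypes: ["Ingress", "Egress"]
  ingress: []                         # agents dial out; nothing connects in
  egress:
    # Cluster DNS
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    # Jenkins controller: HTTP endpoint and inbound agent port
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: jenkins
      ports:
        - { protocol: TCP, port: 8080 }
        - { protocol: TCP, port: 50000 }
    # Vault (placeholder CIDR)
    - to:
        - ipBlock: { cidr: 10.0.20.0/24 }
      ports:
        - { protocol: TCP, port: 8200 }
    # Registry mirror or egress proxy (placeholder CIDR)
    - to:
        - ipBlock: { cidr: 10.0.30.0/24 }
      ports:
        - { protocol: TCP, port: 443 }
```

Builds that download dependencies (Maven Central, npm) need a matching egress rule or an internal artifact proxy; otherwise the first mvn verify fails on dependency resolution.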

Cost notes

This is where the design pays for itself. Permanently-on agents bill 24/7 regardless of load; with ephemeral pods, node capacity follows build demand and scales back down when the queue is empty. Pair the pod model with a cluster autoscaler (or Karpenter on EKS) on a dedicated Spot/Preemptible node pool labeled for agents. Builds are interruptible and should be safe to retry, so a 60–80% discount is realistic. Right-size resource requests and limits per container so the scheduler bin-packs tightly instead of stranding capacity. Set idleMinutes: 0 and podRetention: never so nothing idles. Use Datadog to chart cost-relevant signals (node-pool utilization, pod-startup latency, and build minutes per team) and feed per-team build minutes into chargeback so each squad owns its spend. The combined effect for the team in the scenario: the overnight VM bill goes to near zero, peak capacity becomes elastic instead of a fixed ceiling, and every build runs on an identical, version-pinned toolchain, which solves the “works on the build server” problem by construction.
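
Steering agent pods onto the dedicated Spot pool takes a node selector plus a toleration in the pod YAML. In this sketch the workload=jenkins-agents label and the matching taint are hypothetical; use whatever your Terraform-managed node pool actually applies.

```yaml
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    workload: jenkins-agents          # hypothetical node label on the Spot pool
  tolerations:
    - key: "workload"                 # hypothetical taint that keeps other workloads off the pool
      operator: "Equal"
      value: "jenkins-agents"
      effect: "NoSchedule"
  containers:
    - name: maven
      image: maven:3.9.6-eclipse-temurin-17
      command: ["cat"]
      tty: true
      resources:
        # Requests drive bin-packing; keep them close to observed usage
        requests: { cpu: "1", memory: "1Gi" }
        limits:   { cpu: "2", memory: "2Gi" }
```

The taint keeps long-lived services off interruptible nodes, while the selector keeps builds off the on-demand pool, so a Spot reclaim only ever costs a build retry.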

Written by Vinod