
Set Up Jenkins on Kubernetes with the Kubernetes Plugin and Ephemeral Agent Pods

A 200-engineer platform team is bleeding money on a row of permanently-on Jenkins agent VMs that sit at 8% utilization overnight and still buckle at 9am when every squad pushes at once. Worse, the agents have drifted: one has JDK 17, another JDK 21, a third has a stale Trivy binary, and “works on the build server” has stopped meaning anything. The fix is to stop treating build capacity as a fleet of pets and start treating it as ephemeral pods — the Jenkins Kubernetes plugin asks the cluster for a fresh agent pod when a job needs one, runs the build inside per-job container templates pinned to exact tool versions, and deletes the pod the instant the build finishes. You pay only for the seconds a build actually runs, every build starts from an identical image, and the controller config lives in version control as code. This guide walks the whole thing end to end: a Helm-installed controller, JCasC (Jenkins Configuration as Code) to declare the cloud and pod templates, a real multi-container pipeline, then validation, rollback, and the security/cost notes that keep it production-grade.

Prerequisites

Target topology


The shape is deliberately simple, and that is the point. A single long-lived Jenkins controller runs as a StatefulSet in a jenkins namespace, with its $JENKINS_HOME on a persistent volume so jobs, build history, and credentials survive a pod restart. The controller holds no build capacity of its own: it is a scheduler and a UI. When a pipeline requests an agent, the Kubernetes plugin calls the Kubernetes API and creates an ephemeral agent pod in a separate jenkins-agents namespace. That pod contains a jnlp container (the Jenkins agent process that phones home over JNLP/WebSocket) plus one or more per-job tool containers: maven, node, docker/kaniko, trivy, whatever the job’s podTemplate declares. The build steps run inside those containers. When the pipeline ends, the plugin deletes the pod, and the node-level autoscaler reclaims the node minutes later. Argo CD keeps the controller’s Helm release and JCasC in sync with Git (GitOps), so the controller is reproducible and drift-free. Identity flows Okta → Entra → Jenkins OIDC, secrets flow Vault → agent pod at runtime, and Wiz, CrowdStrike Falcon, and Datadog observe the whole namespace.
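The GitOps half of that picture can be sketched as an Argo CD Application. This is a minimal sketch with these assumptions:

- Argo CD 2.6 or later, which is needed for multi-source Applications.
- Argo CD runs in the argocd namespace.
- A hypothetical Git repository, acme/platform-gitops, holds values.yaml.

The chart version is left as a placeholder rather than guessed.

```yaml
# Hypothetical Argo CD Application that keeps the Jenkins Helm release in sync with Git
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: jenkins
  namespace: argocd                       # assumes Argo CD is installed in the argocd namespace
spec:
  project: default
  sources:
    # The official chart, pinned to an explicit version
    - repoURL: https://charts.jenkins.io
      chart: jenkins
      targetRevision: "<CHART_VERSION>"   # placeholder: pin the chart version you tested
      helm:
        valueFiles:
          - $values/jenkins/values.yaml   # resolved from the source below (ref: values)
    # Hypothetical Git repo holding values.yaml, including the JCasC block
    - repoURL: https://github.com/acme/platform-gitops.git
      targetRevision: main
      ref: values
  destination:
    server: https://kubernetes.default.svc
    namespace: jenkins
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual drift on the live release
```

With selfHeal enabled, Argo CD reverts a hand-edit to the live release on its next sync, and that is what keeps the controller drift-free. In this mode, the helm upgrade commands in the steps below become commits to the values repository.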

1. Create namespaces and the agent service account

Isolate the controller from the throwaway agents. The agents get their own namespace and a tightly-scoped service account — they should never be able to mutate the controller.

kubectl create namespace jenkins
kubectl create namespace jenkins-agents

# Service account the agent pods run as
kubectl -n jenkins-agents create serviceaccount jenkins-agent
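If you would rather declare the agents namespace as a manifest than create it with kubectl create (for example, so Argo CD can own it), it can also carry a few guardrails. This is a sketch that assumes Kubernetes 1.25+ Pod Security Admission:

- The namespace enforces the baseline profile rather than restricted, because Kaniko (used in step 4) runs as root inside its container.
- The pod quota is illustrative. It mirrors the 50-pod containerCapStr set later in JCasC, so the API server backs up the plugin's own ceiling.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: jenkins-agents
  labels:
    # Reject privileged pods outright; warn (but still admit) anything below "restricted"
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: agent-pod-ceiling
  namespace: jenkins-agents
spec:
  hard:
    pods: "50"   # API-level backstop matching containerCapStr; counts pods only, so no per-pod requests are required
```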

Grant the controller’s service account permission to create and delete pods in the agents namespace only — least privilege, scoped by RoleBinding, never a cluster-wide ClusterRoleBinding:

# agent-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-agent-manager
  namespace: jenkins-agents
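rules:
  # Agent pod lifecycle, modeled on the example RBAC in the Kubernetes plugin's repository
  - apiGroups: [""]
    resources: ["pods", "pods/exec"]
    verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  # Lets the plugin surface scheduling problems (e.g. unschedulable pods) in the build log
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["watch"]
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]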
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-controller-manages-agents
  namespace: jenkins-agents
subjects:
  - kind: ServiceAccount
    name: jenkins                # created by the Helm chart in step 2
    namespace: jenkins
roleRef:
  kind: Role
  name: jenkins-agent-manager
  apiGroup: rbac.authorization.k8s.io

Apply it:

kubectl apply -f agent-rbac.yaml

2. Install the Jenkins controller with Helm

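Use the official chart. The values below do four things:

- Give the controller a persistent home.
- Keep the controller behind a ClusterIP service.
- Turn off the chart's built-in agent, so agents are declared only in JCasC.
- Install the plugins we need: the Kubernetes plugin, JCasC, Pipeline, Git, the OIDC plugin for Okta/Entra SSO, and the Vault plugin.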

helm repo add jenkins https://charts.jenkins.io
helm repo update

Then write the chart values:

# values.yaml  (managed by Terraform/Argo CD in production)
controller:
  image:
    tag: "2.452.3-lts-jdk17"
  installPlugins:
    - kubernetes:4253.v7700d91739e8
    - configuration-as-code:1810.v9b_c30a_249a_4c
    - workflow-aggregator:600.vb_57cdd26fdd7   # Pipeline
    - git:5.2.2
    - oic-auth:4.418.vccc7061f5b_6d             # OIDC for Okta/Entra
    - hashicorp-vault-plugin:370.va_4c5fe9f9a_69 # Vault credentials
    - role-strategy                              # roleBased authorization in step 3; pin a version like the rest
  JCasC:
    defaultConfig: true
    configScripts:
      jenkins-casc: |
        # filled in step 3
  serviceType: ClusterIP        # exposed via Ingress + Akamai, not LoadBalancer
  resources:
    requests: { cpu: "1",   memory: "2Gi" }
    limits:   { cpu: "2",   memory: "4Gi" }
persistence:
  enabled: true
  storageClass: "gp3"
  size: "30Gi"
agent:
  enabled: false                # we declare agents in JCasC, not chart defaults
serviceAccount:
  create: true
  name: jenkins

Install it:

helm upgrade --install jenkins jenkins/jenkins \
  --namespace jenkins \
  --values values.yaml \
  --wait --timeout 10m

Fetch the initial admin password (you will replace this with OIDC in step 3):

kubectl -n jenkins exec -it sts/jenkins -c jenkins -- \
  cat /run/secrets/additional/chart-admin-password

3. Configure the cloud and pod templates with JCasC

This is the heart of the setup. Everything below goes in the controller.JCasC.configScripts.jenkins-casc block of values.yaml from step 2, indented under the | (or in a separate ConfigMap that the chart mounts). JCasC declares two things: the Kubernetes cloud (how the controller talks to the API and where agents land) and one or more reusable pod templates.

jenkins:
  clouds:
    - kubernetes:
        name: "k8s"
        serverUrl: "https://kubernetes.default.svc"
        namespace: "jenkins-agents"
        jenkinsUrl: "http://jenkins.jenkins.svc.cluster.local:8080"
        jenkinsTunnel: "jenkins-agent.jenkins.svc.cluster.local:50000"
        directConnection: false          # agents connect through the controller Services (inbound TCP via jenkinsTunnel)
        containerCapStr: "50"            # hard ceiling on concurrent agent pods
        connectTimeout: 100
        readTimeout: 200
        podRetention: "never"           # delete the pod the moment the build ends
        templates:
          - name: "base"
            label: "k8s-base"
            serviceAccount: "jenkins-agent"
            idleMinutes: 0               # do not keep idle agents warm
            yamlMergeStrategy: "merge"
            containers:
              - name: "jnlp"
                image: "jenkins/inbound-agent:3261.v9c670a_4748a_9-1"
                resourceRequestCpu: "500m"
                resourceRequestMemory: "512Mi"
                resourceLimitCpu: "1"
                resourceLimitMemory: "1Gi"
  securityRealm:
    oic:
      clientId: "${OIDC_CLIENT_ID}"
      clientSecret: "${OIDC_CLIENT_SECRET}"
      wellKnownOpenIDConfigurationUrl: "https://your-org.okta.com/.well-known/openid-configuration"
      userNameField: "preferred_username"
      groupsFieldName: "groups"
  authorizationStrategy:
    roleBased:
      roles:
        global:
          - name: "admin"
            permissions: ["Overall/Administer"]
            assignments: ["platform-admins"]   # Okta/Entra group claim
          - name: "developer"
            permissions: ["Overall/Read", "Job/Build", "Job/Read"]
            assignments: ["developers"]
unclassified:
  location:
    url: "https://jenkins.example.com/"

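A few choices that teams get wrong, called out explicitly:

The chart's default JCasC (defaultConfig: true) already defines jenkins.securityRealm, jenkins.authorizationStrategy, and unclassified.location. By default, JCasC refuses to merge the same key from two files. You have two ways out:

- Move the OIDC realm and role strategy into the chart's controller.JCasC.securityRealm and controller.JCasC.authorizationStrategy values, and set controller.jenkinsUrl.
- Set defaultConfig: false and own the whole configuration in your script.

authorizationStrategy.roleBased comes from the Role-based Authorization Strategy plugin (role-strategy). Make sure it is in installPlugins, or JCasC will fail to apply the configuration.

The ${OIDC_CLIENT_SECRET} and any registry or signing secrets are not hardcoded. They are pulled from HashiCorp Vault via the Vault plugin and exposed to JCasC as environment variables, so no secret is ever written into the Helm values or Git.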
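One way to wire that Vault-to-JCasC path is the Vault plugin's JCasC secret source, which is configured through CASC_VAULT_* environment variables on the controller. This is a minimal sketch with these assumptions:

- Vault's Kubernetes auth method is enabled.
- A hypothetical Vault role, jenkins-controller, is bound to the jenkins service account.
- A hypothetical KV v2 path, secret/jenkins/casc, holds keys named OIDC_CLIENT_ID and OIDC_CLIENT_SECRET.

Check the plugin's README for the exact variable names your version supports.

```yaml
# values.yaml fragment; the Vault address, role, and path are hypothetical
controller:
  containerEnv:
    - name: CASC_VAULT_URL
      value: "https://vault.example.com:8200"
    - name: CASC_VAULT_KUBERNETES_ROLE
      value: "jenkins-controller"   # hypothetical Vault role for the jenkins service account
    - name: CASC_VAULT_ENGINE_VERSION
      value: "2"                    # KV secrets engine version 2
    - name: CASC_VAULT_PATHS
      value: "secret/jenkins/casc"  # hypothetical path; its keys resolve ${OIDC_CLIENT_ID} / ${OIDC_CLIENT_SECRET}
```

Only the Vault address, role, and path names appear in values.yaml. The secret values themselves are read at JCasC load time and never touch Git.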

Apply by upgrading the release, then reload config without a restart:

helm upgrade jenkins jenkins/jenkins -n jenkins --values values.yaml --wait
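# The chart's config-reload sidecar normally re-applies JCasC when the config changes. To force it,
# POST to the reload endpoint with the token the chart passes as -Dcasc.reload.token (the pod name):
kubectl -n jenkins exec sts/jenkins -c jenkins -- \
  sh -c 'curl -s -X POST "http://localhost:8080/reload-configuration-as-code/?casc-reload-token=$POD_NAME"'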

4. Declare a per-job pod template in a Pipeline

Now use it. A pipeline declares its own pod inline with a podTemplate so each job gets exactly the tool containers it needs — a Maven build, a Kaniko image build with no Docker daemon, and a Trivy scan, all in one ephemeral pod sharing the workspace.

// Jenkinsfile
pipeline {
  agent {
    kubernetes {
      yaml '''
        apiVersion: v1
        kind: Pod
        spec:
          serviceAccountName: jenkins-agent
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
          containers:
            - name: maven
              image: maven:3.9.6-eclipse-temurin-17
              command: ["cat"]
              tty: true
              resources:
                requests: { cpu: "1", memory: "1Gi" }
                limits:   { cpu: "2", memory: "2Gi" }
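            - name: kaniko
              image: gcr.io/kaniko-project/executor:v1.23.2-debug
              command: ["sleep"]
              args: ["9999999"]
              securityContext:
                runAsUser: 0          # Kaniko must run as root inside its own (unprivileged) container
                runAsNonRoot: false   # overrides the pod-level non-root default for this container only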
            - name: trivy
              image: aquasec/trivy:0.53.0
              command: ["cat"]
              tty: true
      '''
    }
  }
  stages {
    stage('Build & test') {
      steps {
        container('maven') {
          sh 'mvn -B -ntp clean verify'
        }
      }
    }
    stage('Image build (no daemon)') {
      steps {
        container('kaniko') {
          // registry creds injected from Vault, mounted at /kaniko/.docker
          sh '''/kaniko/executor \
            --context=`pwd` \
            --dockerfile=Dockerfile \
            --destination=ghcr.io/acme/app:${BUILD_NUMBER} \
            --cache=true'''
        }
      }
    }
    stage('Scan') {
      steps {
        container('trivy') {
          sh 'trivy image --exit-code 1 --severity HIGH,CRITICAL ghcr.io/acme/app:${BUILD_NUMBER}'
        }
      }
    }
  }
}

The same container images can back GitHub Actions jobs on ephemeral runners, keeping the toolchain identical across CI systems, while Argo CD promotes the resulting image into the cluster. Terraform (or Ansible for any node-level config) owns the node pools these pods land on, so capacity is also code.

5. Wire in secrets, identity, and the operating stack

Glue the ephemeral pods into the platform so they are governed, not just functional.
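On the secrets side, the Kaniko stage in step 4 expects registry credentials from Vault to be present inside the pod. One option is the Vault Agent Injector, installed separately, for example through Vault's Helm chart. Pod annotations ask the injector to render a Docker config.json into a shared volume before the build containers start. This sketch rests on these assumptions:

- The Vault role jenkins-agent is hypothetical.
- The KV v2 path secret/data/ci/ghcr is hypothetical, and its auth key holds base64("user:token").
- The file lands in /vault/secrets rather than /kaniko/.docker, so DOCKER_CONFIG points Kaniko at it.

Merge these fields into the pod YAML in the Jenkinsfile. The env entry goes on the existing kaniko container.

```yaml
# Fragment of the agent pod YAML; role and secret path are hypothetical
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/agent-pre-populate-only: "true"   # init container only: no Vault sidecar outlives the build
    vault.hashicorp.com/role: "jenkins-agent"
    vault.hashicorp.com/agent-inject-secret-config.json: "secret/data/ci/ghcr"
    vault.hashicorp.com/agent-inject-template-config.json: |
      {{- with secret "secret/data/ci/ghcr" -}}
      {"auths": {"ghcr.io": {"auth": "{{ .Data.data.auth }}"}}}
      {{- end }}
spec:
  containers:
    - name: kaniko
      env:
        - name: DOCKER_CONFIG
          value: /vault/secrets   # Kaniko reads config.json from $DOCKER_CONFIG
```

Because the credentials are rendered into the pod's own volume, they disappear with the pod and are never stored as a Kubernetes Secret.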

Validation

Confirm the controller is healthy and that agents are genuinely ephemeral — created on demand, gone after.

# Controller pod Running, PVC bound
kubectl -n jenkins get pods,pvc
kubectl -n jenkins logs sts/jenkins -c jenkins | grep -i "Configuration as Code"

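# The k8s cloud is registered and reachable (anonymous requests get a 403 or a login
# redirect once OIDC is on; add -u <user>:<api-token> to the curl to expect a 200)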
kubectl -n jenkins exec sts/jenkins -c jenkins -- \
  curl -s localhost:8080/manage/cloud/k8s/ -o /dev/null -w "%{http_code}\n"

Trigger a build, then watch a pod appear in the agents namespace and disappear when it ends:

# In one terminal — watch agents come and go
kubectl -n jenkins-agents get pods -w

You should see a pod named like app-build-7-xxxxx-yyyyy go Pending → Running while the build runs, then Terminating once it ends, and then vanish. Confirm that no agents linger idle:

# After a few builds: zero idle agent pods should remain
kubectl -n jenkins-agents get pods --no-headers | wc -l   # expect 0 between builds

A green run with the Maven, Kaniko, and Trivy stages all passing — and the pod gone afterward — is the success criterion.

Rollback / teardown

Because the controller is Helm-managed and the agents are stateless, rollback is clean.

# Roll the controller back to the previous release revision
helm history jenkins -n jenkins
helm rollback jenkins <PREVIOUS_REVISION> -n jenkins --wait

# Or fully tear down — agents first, then controller, then namespaces
kubectl -n jenkins-agents delete pods --all          # kill any in-flight agents
helm uninstall jenkins -n jenkins
kubectl delete -f agent-rbac.yaml
kubectl delete namespace jenkins-agents
kubectl delete namespace jenkins   # this deletes the PVC and JENKINS_HOME — back it up first

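If you only need to stop scheduling new agents temporarily (e.g. for cluster maintenance), put the controller into quiet-down mode (Manage Jenkins › Prepare for Shutdown, or a POST to /quietDown). Running builds finish, no new builds start, and the agents namespace drains on its own. Be wary of using containerCapStr: "0" for this: depending on the plugin version, a zero or empty cap can be read as "no limit" rather than "no pods".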

Common pitfalls

Security notes

Run agent pods as non-root with a restricted securityContext (runAsNonRoot: true, drop all capabilities, read-only root filesystem where the toolchain allows). Never mount the host Docker socket; use Kaniko or rootless BuildKit for image builds so a poisoned build cannot escape to the node. Kaniko still needs UID 0 inside its own unprivileged container, so grant that as a per-container exception rather than relaxing the whole pod.

Keep the controller off the public internet: a ClusterIP service behind Ingress, fronted by Akamai for TLS and WAF. Disable Jenkins local auth and gate every login through Okta/Entra OIDC with group-driven RBAC. Pull all build secrets from HashiCorp Vault at runtime with short TTLs, so nothing durable lands in a Kubernetes Secret. Let Wiz scan images and namespace posture and CrowdStrike Falcon watch runtime, with findings auto-ticketed in ServiceNow. Apply a NetworkPolicy so agent pods can reach only the registry, Vault, and the controller, not each other or the wider cluster.
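The NetworkPolicy described above might look like the following. This sketch assumes:

- A CNI that actually enforces NetworkPolicy.
- Vault running in a namespace called vault.
- The kubernetes.io/metadata.name namespace label, which Kubernetes sets automatically since 1.21.

Plain NetworkPolicy cannot match hostnames such as ghcr.io. Registry access is therefore approximated as HTTPS to non-private addresses, which also covers artifact repositories like Maven Central. An FQDN-aware CNI policy or an egress proxy can tighten that further.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: agents-egress-allowlist
  namespace: jenkins-agents
spec:
  podSelector: {}                 # applies to every agent pod in the namespace
  policyTypes: ["Ingress", "Egress"]
  ingress: []                     # no inbound traffic at all, including from other agents
  egress:
    # DNS via CoreDNS in kube-system
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
    # Jenkins controller: HTTP (8080) and the inbound agent port (50000)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: jenkins
      ports:
        - { protocol: TCP, port: 8080 }
        - { protocol: TCP, port: 50000 }
    # Vault (assumed namespace "vault", default API port 8200)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: vault
      ports:
        - { protocol: TCP, port: 8200 }
    # Registries and artifact repositories over HTTPS, excluding private (in-cluster/VPC) ranges
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except: ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]
      ports:
        - { protocol: TCP, port: 443 }
```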

Cost notes

This is where the design pays for itself. Permanently-on agents bill 24/7 regardless of load; ephemeral pods bill only for the seconds a build runs. Pair the pod model with a cluster autoscaler (or Karpenter on EKS) on a dedicated, Spot/Preemptible node pool labeled for agents — builds are interruptible and idempotent, so a 60–80% discount is realistic. Right-size resourceRequest/limit per container so the scheduler bin-packs tightly instead of stranding capacity. Set idleMinutes: 0 and podRetention: never so nothing idles. Use Datadog to chart cost-relevant signals — node-pool utilization, pod-startup latency, and build minutes per team — and feed per-team build minutes into chargeback so each squad owns its spend. The combined effect for the team in the scenario: the overnight VM bill goes to near zero, peak capacity becomes elastic instead of a fixed ceiling, and every build runs on an identical, version-pinned toolchain — the “works on the build server” problem solved by construction.
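You pin agents to that dedicated Spot pool on the pod side, with a nodeSelector and a toleration. This is a sketch with these assumptions:

- The pool carries a hypothetical workload=ci-agents label and a matching NoSchedule taint, set by whatever Terraform or Karpenter config provisions it.
- karpenter.sh/capacity-type is Karpenter's well-known capacity label and only applies when Karpenter manages the nodes.

The fields would be merged into the agent pod spec, either in the Jenkinsfile's pod YAML or in a JCasC pod template's yaml field.

```yaml
spec:
  nodeSelector:
    workload: ci-agents                # hypothetical label on the agent node pool
    karpenter.sh/capacity-type: spot   # Karpenter only: land on Spot capacity (drop this line otherwise)
  tolerations:
    - key: workload                    # hypothetical taint that keeps non-build workloads off the pool
      operator: Equal
      value: ci-agents
      effect: NoSchedule
```

The taint repels every pod that lacks the toleration. As a result, the controller and other services stay on on-demand nodes, and only the interruptible build pods absorb Spot reclaims.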


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.

