A pool of always-on VM build agents is a standing liability: it accrues state between jobs, it sits idle (and billed) overnight, and a poisoned job leaks into the next one. This guide replaces that pattern with ephemeral, per-job runner pods on Kubernetes that autoscale on queue depth, scale to zero between builds, and authenticate to your cloud with OIDC instead of static secrets.
1. Why ephemeral runners
Three problems with persistent agents are worth naming precisely:
- Security and reproducibility. A long-lived agent carries dependencies, caches, and credentials from job to job. One compromised pull request can plant a backdoor that the next job inherits. A fresh pod per job gives you a clean room every time and removes a whole class of cross-job contamination.
- The cost of idle. Persistent agents are sized for peak and billed at all times. Most CI fleets run at 15-30% utilization, so 70% or more of the capacity you pay for sits idle.
- Drift. Hand-patched agents diverge. When the runner image is the only artifact and pods are disposable, “what ran my build” is answerable by a digest.
The trade-off is cold-start latency (pulling the runner image, scheduling a pod) and the need to externalize caches, since nothing survives the pod. Both are solvable, and the rest of this guide does so.
Callout: Ephemeral does not mean stateless caching. It means job state is discarded. You still want layer caches and dependency caches — just stored outside the pod, in object storage or a registry.
2. Deploying GitHub Actions Runner Controller (ARC)
Modern ARC uses runner scale sets managed by two Helm charts: a single cluster-wide controller, and one listener+ephemeral-runner release per scale set. This is the supported model; the older RunnerDeployment/HorizontalRunnerAutoscaler CRDs are legacy and you should not start there.
Install the controller once:
helm install arc \
--namespace arc-systems --create-namespace \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller
Authentication to GitHub should use a GitHub App (scoped, rotating installation tokens) rather than a personal access token. Create the App, install it on the org, and store the credentials as a secret the scale set will reference:
kubectl create namespace arc-runners
kubectl create secret generic arc-github-app \
--namespace arc-runners \
--from-literal=github_app_id=123456 \
--from-literal=github_app_installation_id=7891011 \
--from-file=github_app_private_key=./app-private-key.pem
Now install a scale set. githubConfigUrl can target an org, an enterprise, or a single repo; the Helm installation name (platform-runners here) becomes the runs-on label your workflows select:
helm install platform-runners \
--namespace arc-runners \
--set githubConfigUrl="https://github.com/my-org" \
--set githubConfigSecret=arc-github-app \
--set minRunners=0 \
--set maxRunners=50 \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
Workflows opt in by name:
jobs:
  build:
    runs-on: platform-runners
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh
For anything beyond defaults, drive the install from a values file you keep in Git. The template block is a real pod spec, which is where isolation and resources get set later:
# platform-runners-values.yaml
githubConfigUrl: "https://github.com/my-org"
githubConfigSecret: arc-github-app
minRunners: 0
maxRunners: 50
runnerScaleSetName: platform-runners
template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
    containers:
      - name: runner
        # pin a version tag or digest in production instead of latest
        image: ghcr.io/actions/actions-runner:latest
        # the scale-set chart expects the runner container to start run.sh explicitly
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
          limits:
            cpu: "2"
            memory: 4Gi
helm upgrade --install platform-runners \
--namespace arc-runners -f platform-runners-values.yaml \
oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
3. Azure DevOps scale-set agents on the same cluster
Azure DevOps does not ship a Kubernetes controller equivalent to ARC. Its native elastic option is VM scale-set agents, where you point an agent pool at an Azure VMSS and Azure DevOps scales the VM count on demand. That is the right call when jobs need full-VM isolation or nested virtualization.
If you want Azure DevOps jobs to run as ephemeral pods on the same cluster, run the agent in a Kubernetes Job with the --once flag so the container processes exactly one job and exits. Register against an agent pool with a PAT (or, better, a managed identity once your org supports it):
# azdo-agent-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: azdo-agent
  namespace: azdo-runners
spec:
  ttlSecondsAfterFinished: 120
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: agent
          image: myregistry.azurecr.io/azdo-agent:2.x
          env:
            - name: AZP_URL
              value: "https://dev.azure.com/my-org"
            - name: AZP_POOL
              value: "k8s-ephemeral"
            - name: AZP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: azdo-pat
                  key: token
          args: ["--once"]
The agent container’s entrypoint runs config.sh --unattended --replace then run.sh --once. Driving creation of these Jobs from queue depth needs an external scaler, covered next. For most teams the pragmatic split is: VMSS elastic agents for Azure DevOps when you need VM isolation, ARC for GitHub where pod-per-job is native.
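The entrypoint behavior described above can be sketched as a short shell function. This is a minimal sketch under stated assumptions, not the image's actual entrypoint: the function name azp_main and the AZP_AGENT_NAME variable are hypothetical, and it assumes config.sh and run.sh sit in the working directory. AZP_URL, AZP_POOL, and AZP_TOKEN are the variables set in the Job manifest.

```shell
#!/bin/sh
# Sketch of an agent entrypoint: register, run exactly one job, then exit.
# azp_main and AZP_AGENT_NAME are hypothetical names used for illustration.

azp_main() {
  # Fail fast if the Job manifest did not provide the required settings.
  : "${AZP_URL:?AZP_URL is required}"
  : "${AZP_TOKEN:?AZP_TOKEN is required}"

  # Unattended registration; --replace reuses an agent name left behind
  # by an earlier pod instead of failing on the duplicate.
  ./config.sh --unattended \
    --url "$AZP_URL" \
    --auth pat --token "$AZP_TOKEN" \
    --pool "${AZP_POOL:-Default}" \
    --agent "${AZP_AGENT_NAME:-$(hostname)}" \
    --replace --acceptTeeEula || return 1

  # "$@" carries --once from the pod's args, so the agent exits after one job.
  exec ./run.sh "$@"
}
```

An image built this way would end its entrypoint script with `azp_main "$@"`, so the `args: ["--once"]` from the manifest reaches run.sh unchanged.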
| Need | GitHub | Azure DevOps |
|---|---|---|
| Native pod-per-job | ARC runner scale sets | Job + --once (DIY scaling) |
| Full-VM isolation / nested virt | larger runners | VMSS elastic agents |
| Scale to zero | built in (minRunners: 0) | VMSS min count 0 / KEDA on Jobs |
4. Autoscaling on queue depth and scaling to zero
ARC’s listener watches the GitHub job queue and creates one ephemeral runner per queued job, up to maxRunners, then deletes the pod when the job ends. With minRunners: 0 the scale set sits at zero pods between builds — you pay nothing but node baseline.
For the Azure DevOps Job pattern, use KEDA with the azure-pipelines scaler, which reads the pending job count for a pool and scales a ScaledJob:
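# azdo-scaledjob.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azdo-agents
  namespace: azdo-runners
spec:
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: agent
            image: myregistry.azurecr.io/azdo-agent:2.x
            args: ["--once"]
            env:
              # organizationURLFromEnv below is resolved from this container's env
              - name: AZP_URL
                value: "https://dev.azure.com/my-org"
              - name: AZP_POOL
                value: "k8s-ephemeral"
              - name: AZP_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: azdo-pat
                    key: token
  triggers:
    - type: azure-pipelines
      metadata:
        poolName: "k8s-ephemeral"
        organizationURLFromEnv: "AZP_URL"
      authenticationRef:
        name: azdo-trigger-auth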
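The ScaledJob refers to a TriggerAuthentication named azdo-trigger-auth that the manifests so far do not define. A minimal sketch of one, assuming the scaler should reuse the azdo-pat secret (key token) that the agent Job already reads; the helper name render_trigger_auth is hypothetical:

```shell
#!/bin/sh
# Render a KEDA TriggerAuthentication that hands the existing PAT secret
# to the azure-pipelines scaler. render_trigger_auth is a hypothetical helper.

render_trigger_auth() {
  namespace="${1:-azdo-runners}"
  cat <<EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azdo-trigger-auth
  namespace: ${namespace}
spec:
  secretTargetRef:
    - parameter: personalAccessToken
      name: azdo-pat
      key: token
EOF
}
```

Apply it with `render_trigger_auth | kubectl apply -f -`. Whether the scaler and the agents should share one PAT is a judgment call; separate tokens with narrower scopes are the safer default if your organization can issue them.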
Scaling pods to zero is only half the win. Empty pods still need empty nodes to vanish, so pair this with cluster-level node autoscaling (the Kubernetes Cluster Autoscaler or Karpenter) on a dedicated CI node pool. Otherwise you scale pods to zero but keep paying for the nodes they used to sit on.
Callout: KEDA scales workloads; it does not scale nodes. The node count drops only when your cluster autoscaler removes empty nodes. Validate that both layers actually reach zero.
5. Hardening: per-job pods, non-root, network policies, dind alternatives
Ephemerality buys you isolation between jobs. These controls harden the pod itself.
Run as non-root and drop privileges. Set this in the runner pod template:
template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      seccompProfile:
        type: RuntimeDefault
    containers:
      - name: runner
        securityContext:
          allowPrivilegeEscalation: false
          # needs writable mounts (e.g. emptyDir) for the runner's work and temp directories
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
Restrict egress with a NetworkPolicy. Untrusted PR code should not be able to reach your cluster’s internal services or metadata endpoints. Default-deny, then allow only what builds need (DNS, your registry, GitHub/Azure DevOps):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runners-egress
  namespace: arc-runners
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP   # DNS falls back to TCP for large responses
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # block instance metadata
      ports:
        - protocol: TCP
          port: 443
Avoid Docker-in-Docker. Privileged dind sidecars are the classic escape hatch and a real risk on shared clusters. Build images without a Docker daemon instead:
- BuildKit / buildkitd running as a separate, controlled service that runners call over its API.
- Kaniko or Buildah, which build from a Dockerfile in userspace without a privileged daemon.
If a job genuinely needs nested virtualization or a real Docker socket, isolate it: send it to VMSS elastic agents, or to a separate node pool fronted by a sandboxed runtime such as gVisor or Kata Containers, rather than granting privileged: true on your general fleet.
6. Build caching that survives ephemerality
Since pods are thrown away, caching must live elsewhere. Two layers matter.
Dependency cache. For GitHub, actions/cache already stores keyed archives in GitHub-hosted storage, so it works unchanged on self-hosted runners — no local disk assumptions.
Layer cache for image builds. This is where ephemeral runners hurt without help, because every build starts with a cold daemon. BuildKit supports remote caches you export and import across runs. Push the cache to your registry alongside the image:
docker buildx build \
--cache-to type=registry,ref=myregistry.azurecr.io/app:buildcache,mode=max \
--cache-from type=registry,ref=myregistry.azurecr.io/app:buildcache \
--push -t myregistry.azurecr.io/app:$GIT_SHA .
mode=max exports intermediate layers too, which gives far better hit rates than the default min. Kaniko has an equivalent with --cache=true --cache-repo=<registry>/cache.
For heavy, repeated builds, a persistent BuildKit service with its own cache volume (a small, long-lived Deployment that ephemeral runners talk to) outperforms per-job cache import/export, because the cache stays hot in one place instead of being re-pulled each build. The runners stay ephemeral; only the builder is durable.
| Cache type | Where it lives | Mechanism |
|---|---|---|
| Dependencies | GitHub cache / object storage | actions/cache, keyed restore |
| Image layers (simple) | Container registry | BuildKit --cache-to/--cache-from registry |
| Image layers (heavy) | Persistent BuildKit volume | Shared buildkitd service |
7. Securing cloud access with OIDC, not secrets
The biggest secret-sprawl win is deleting cloud credentials from CI entirely. Both GitHub Actions and Azure DevOps can mint short-lived OIDC tokens that your cloud trusts via a federated identity. No static keys to leak, no rotation to manage.
Azure (GitHub Actions). Create an app registration with a federated credential bound to your repo and branch, then log in with no client secret:
az ad app federated-credential create \
--id "$APP_ID" \
--parameters '{
"name": "gh-main",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:my-org/my-repo:ref:refs/heads/main",
"audiences": ["api://AzureADTokenExchange"]
}'
permissions:
  id-token: write
  contents: read
steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
The id-token: write permission is mandatory: without it the job cannot request an OIDC token, and the federated login in azure/login fails.
AWS (GitHub Actions). Register GitHub as an OIDC provider and assume a role scoped by the same sub claim:
permissions:
  id-token: write
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::111122223333:role/github-ci
      aws-region: us-east-1
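On the AWS side, the scoping lives in the role's trust policy. A minimal sketch of such a policy, assuming the GitHub OIDC provider is already registered in the account; render_trust_policy is a hypothetical helper, and the account ID, repo, and ref are placeholders:

```shell
#!/bin/sh
# Render an IAM trust policy that lets exactly one repo and ref assume the
# role through GitHub's OIDC provider. render_trust_policy is hypothetical.

render_trust_policy() {
  account_id="$1"   # e.g. 111122223333
  repo="$2"         # e.g. my-org/my-repo
  ref="$3"          # e.g. refs/heads/main
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "repo:${repo}:ref:${ref}"
      }
    }
  }]
}
EOF
}
```

For example, `render_trust_policy 111122223333 my-org/my-repo refs/heads/main > trust.json` produces a document that can be passed to `aws iam create-role --assume-role-policy-document file://trust.json`. StringEquals is used on purpose: StringLike with a wildcard is how overly broad trust usually creeps in.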
Azure DevOps. Use an Azure Resource Manager service connection configured with workload identity federation, which removes the stored service principal secret from the connection. Tasks like AzureCLI@2 then authenticate with a federated token automatically.
Scope the trust tightly. The federated subject/sub should pin the exact repo and ref (and, for production, the GitHub Environment), so a fork or a feature branch cannot assume a privileged role.
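The two subject shapes mentioned above can be generated and checked mechanically, for instance in a script that provisions federated credentials. This is a sketch with hypothetical function names; it only builds and inspects strings and does not call any cloud API.

```shell
#!/bin/sh
# Sketch: build GitHub OIDC subject strings and reject wildcard patterns.
# All function names here are hypothetical.

oidc_subject_for_branch() {
  # $1 = org/repo, $2 = branch name
  printf 'repo:%s:ref:refs/heads/%s\n' "$1" "$2"
}

oidc_subject_for_environment() {
  # $1 = org/repo, $2 = GitHub Environment name
  printf 'repo:%s:environment:%s\n' "$1" "$2"
}

subject_is_pinned() {
  # Succeeds only when the subject contains no wildcard character.
  case "$1" in
    *'*'*) return 1 ;;
    *)     return 0 ;;
  esac
}
```

A provisioning script could refuse to proceed unless `subject_is_pinned "$subject"` succeeds, which catches a pattern such as `repo:org/*:ref:refs/heads/*` before it reaches the identity provider.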
Verify
Confirm the controller and a scale set are healthy, and prove scale-to-zero:
# Controller and listener pods running
kubectl get pods -n arc-systems
kubectl get pods -n arc-runners
# AutoscalingRunnerSet registered with GitHub
kubectl get autoscalingrunnerset -n arc-runners
# At rest: zero ephemeral runners
kubectl get ephemeralrunner -n arc-runners
Trigger a workflow targeting runs-on: platform-runners, then watch a pod appear and disappear:
kubectl get pods -n arc-runners -w
For Azure DevOps + KEDA, confirm the scaler reads the queue and that the Job runs once:
kubectl get scaledjob -n azdo-runners
kubectl get jobs -n azdo-runners
Then verify cost behavior directly: with no builds running, the CI node pool should drain to its minimum (ideally zero) within the autoscaler’s scale-down window. If pods are at zero but nodes are not, your node autoscaler — not KEDA or ARC — is the thing to fix.
Capacity, spot nodes, and cost-per-build
CI is the ideal spot/low-priority workload: jobs are short, retriable, and tolerant of eviction. Put runners on a dedicated spot node pool, tainted so only CI lands there, and let runner pods tolerate it.
# AKS example: a low-cost CI pool, scalable to zero, tainted for CI only
az aks nodepool add \
--resource-group rg-ci --cluster-name aks-ci \
--name cispot --priority Spot \
--eviction-policy Delete --spot-max-price -1 \
--enable-cluster-autoscaler --min-count 0 --max-count 20 \
--node-taints "workload=ci:NoSchedule"
Add the matching toleration (and a nodeSelector) to the runner pod template so builds schedule onto the spot pool. Size requests honestly: over-requesting CPU/memory inflates node count and, with it, your bill.
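A sketch of that scheduling fragment, which belongs under template.spec in the scale-set values file. The helper name render_spot_scheduling is hypothetical; the pool name and the workload=ci taint come from the az command above. Note that AKS also taints Spot pools itself, so the pod needs a second toleration.

```shell
#!/bin/sh
# Render the nodeSelector and tolerations for runner pods on the CI spot pool.
# render_spot_scheduling is a hypothetical helper.

render_spot_scheduling() {
  pool="${1:-cispot}"
  cat <<EOF
nodeSelector:
  kubernetes.azure.com/agentpool: ${pool}
tolerations:
  # Matches --node-taints "workload=ci:NoSchedule" on the pool.
  - key: workload
    operator: Equal
    value: ci
    effect: NoSchedule
  # AKS adds this taint to every Spot node pool on its own.
  - key: kubernetes.azure.com/scalesetpriority
    operator: Equal
    value: spot
    effect: NoSchedule
EOF
}
```

Paste the output of `render_spot_scheduling` under template.spec, next to the containers list. Without the second toleration the pods stay Pending even though the workload taint is tolerated.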
For cost-per-build, attribute spend per job: tag the CI node pool, and label runner pods with the repo/workflow so a tool like OpenCost or Kubecost can roll up cost by pipeline. Dividing CI node spend by build count over a week gives the single number that justifies this whole architecture to finance.
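The division itself is trivial, but scripting it keeps the number reproducible week to week. A minimal sketch, with a hypothetical helper name and made-up inputs; the spend figure would come from your billing export or cost tool:

```shell
#!/bin/sh
# Sketch: cost per build = CI node spend / builds over the same period.
# cost_per_build is a hypothetical helper.

cost_per_build() {
  # $1 = node-pool spend for the period, $2 = number of builds in that period
  awk -v spend="$1" -v builds="$2" 'BEGIN {
    if (builds <= 0) { print "builds must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", spend / builds
  }'
}
```

For example, `cost_per_build 420 4200` prints 0.1000: a hypothetical 420 currency units of weekly node spend spread over 4,200 builds is 0.10 per build.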
Enterprise scenario
A fintech platform team moved ~600 daily GitHub Actions jobs from always-on VMs to ARC scale sets on AKS, with minRunners: 0 on a Karpenter-managed spot pool. Builds passed; cost dropped. Then audit failed them: every job was assuming the same broadly-scoped Azure role through OIDC, because the federated credential subject was repo:org/*:ref:refs/heads/*. Any branch in any repository in the org could mint a token with write access to production Key Vaults.
The fix had two parts. First, pin the subject to the GitHub Environment, not the branch, so privileged access requires an environment with required reviewers:
az ad app federated-credential create --id "$APP_ID" --parameters '{
"name": "gh-prod-deploy",
"issuer": "https://token.actions.githubusercontent.com",
"subject": "repo:org/payments-api:environment:production",
"audiences": ["api://AzureADTokenExchange"]
}'
Second, the gotcha that bit them: workflows triggered by pull_request from forks are not granted id-token: write, so forks could never mint a token, but internal-branch PRs still inherited repo-level access. They split identities (a read-only role for CI builds, a deploy role reachable only from the production environment job) and added a default-deny egress NetworkPolicy so a poisoned dependency in a build pod could not reach the metadata endpoint (169.254.169.254) to scrape the kubelet’s identity. Scale-to-zero economics were never the hard part; bounding the blast radius of an ephemeral pod that briefly holds a cloud token was.
Pitfalls
- Scaling pods, not nodes. ARC and KEDA empty the pods; only the cluster autoscaler empties the nodes. Tune scale-down or you keep paying for idle capacity.
- Cold-start tax. Large runner images make every job wait on a pull. Keep images lean and pre-pull them on CI nodes, or job start latency suffers.
- Loose OIDC subjects. A wildcard sub claim lets any branch or fork assume your role. Pin repo, ref, and environment.
- Reaching for dind. Privileged daemons undo the isolation you just built. Use daemonless builders and quarantine the rare job that truly needs a socket.
- Mixing ARC generations. The legacy RunnerDeployment CRDs and the new scale-set charts are different systems. Standardize on runner scale sets and do not blend the two.