Running Secure, Autoscaling Ephemeral CI Runners on Kubernetes (GitHub ARC and Azure DevOps Agents)

A pool of always-on VM build agents is a standing liability: it accrues state between jobs, it sits idle (and billed) overnight, and a poisoned job leaks into the next one. This guide replaces that pattern with ephemeral, per-job runner pods on Kubernetes that autoscale on queue depth, scale to zero between builds, and authenticate to your cloud with OIDC instead of static secrets.

1. Why ephemeral runners

Three problems with persistent agents are worth naming precisely:

Security and reproducibility. A long-lived agent carries dependencies, caches, and credentials from job to job. One compromised pull request can plant a backdoor that the next job inherits. A fresh pod per job gives you a clean room every time and removes a whole class of cross-job contamination.
The cost of idle. Persistent agents are sized for peak and billed around the clock. Many CI fleets run at something like 15-30% utilization, which means 70-85% of what you pay for is capacity doing nothing.
Drift. Hand-patched agents diverge. When the runner image is the only artifact and pods are disposable, “what ran my build” is answerable by a digest.
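
The idle-cost arithmetic above can be sketched as a small shell function. This is only an illustration: the function name, the monthly cost, and the utilization figure are hypothetical inputs, not measurements from any real fleet.

```shell
#!/bin/sh
# Sketch: estimate how much of an always-on fleet's bill pays for idle time.
# The inputs are hypothetical; substitute your own billing and utilization data.
idle_spend() {
  monthly_cost="$1"      # total monthly spend on the agent fleet
  utilization_pct="$2"   # share of time agents are actually running jobs
  awk -v c="$monthly_cost" -v u="$utilization_pct" \
    'BEGIN { printf "%.2f\n", c * (100 - u) / 100 }'
}

idle_spend 10000 25   # prints 7500.00
```

At 25% utilization, three quarters of the spend buys nothing, which is the number scale-to-zero is meant to recover.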

The trade-off is cold-start latency (pulling the runner image, scheduling a pod) and the need to externalize caches, since nothing survives the pod. Both are solvable, and the rest of this guide does so.

Callout: Ephemeral does not mean stateless caching. It means job state is discarded. You still want layer caches and dependency caches — just stored outside the pod, in object storage or a registry.

2. Deploying GitHub Actions Runner Controller (ARC)

Modern ARC uses runner scale sets managed by two Helm charts: a single cluster-wide controller, and one listener+ephemeral-runner release per scale set. This is the supported model; the older RunnerDeployment/HorizontalRunnerAutoscaler CRDs are legacy and you should not start there.

Install the controller once:

helm install arc \
  --namespace arc-systems --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

Authentication to GitHub should use a GitHub App (scoped, rotating installation tokens) rather than a personal access token. Create the App, install it on the org, and store the credentials as a secret the scale set will reference:

kubectl create namespace arc-runners

kubectl create secret generic arc-github-app \
  --namespace arc-runners \
  --from-literal=github_app_id=123456 \
  --from-literal=github_app_installation_id=7891011 \
  --from-file=github_app_private_key=./app-private-key.pem

Now install a scale set. githubConfigUrl can target an org, an enterprise, or a single repo; the Helm installation name (platform-runners here) becomes the runs-on label your workflows select:

helm install platform-runners \
  --namespace arc-runners \
  --set githubConfigUrl="https://github.com/my-org" \
  --set githubConfigSecret=arc-github-app \
  --set minRunners=0 \
  --set maxRunners=50 \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

Workflows opt in by name:

jobs:
  build:
    runs-on: platform-runners
    steps:
      - uses: actions/checkout@v4
      - run: ./ci/build.sh

For anything beyond defaults, drive the install from a values file you keep in Git. The template block is a real pod spec, which is where isolation and resources get set later:

# platform-runners-values.yaml
githubConfigUrl: "https://github.com/my-org"
githubConfigSecret: arc-github-app
minRunners: 0
maxRunners: 50
runnerScaleSetName: platform-runners
template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
    containers:
      - name: runner
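        image: ghcr.io/actions/actions-runner:latest   # pin a version or digest in production
        command: ["/home/runner/run.sh"]                # needed once you override the runner container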
        resources:
          requests:
            cpu: "1"
            memory: 2Gi
          limits:
            cpu: "2"
            memory: 4Gi

helm upgrade --install platform-runners \
  --namespace arc-runners -f platform-runners-values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set

3. Azure DevOps scale-set agents on the same cluster

Azure DevOps does not ship a Kubernetes controller equivalent to ARC. Its native elastic option is VM scale-set agents, where you point an agent pool at an Azure VMSS and Azure DevOps scales the VM count on demand. That is the right call when jobs need full-VM isolation or nested virtualization.

If you want Azure DevOps jobs to run as ephemeral pods on the same cluster, run the agent in a Kubernetes Job with the --once flag so the container processes exactly one job and exits. Register against an agent pool with a PAT (or, better, a managed identity once your org supports it):

# azdo-agent-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: azdo-agent
  namespace: azdo-runners
spec:
  ttlSecondsAfterFinished: 120
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
      containers:
        - name: agent
          image: myregistry.azurecr.io/azdo-agent:2.x
          env:
            - name: AZP_URL
              value: "https://dev.azure.com/my-org"
            - name: AZP_POOL
              value: "k8s-ephemeral"
            - name: AZP_TOKEN
              valueFrom:
                secretKeyRef:
                  name: azdo-pat
                  key: token
          args: ["--once"]

The agent container’s entrypoint runs config.sh --unattended --replace then run.sh --once. Driving creation of these Jobs from queue depth needs an external scaler, covered next. For most teams the pragmatic split is: VMSS elastic agents for Azure DevOps when you need VM isolation, ARC for GitHub where pod-per-job is native.

Need	GitHub	Azure DevOps
Native pod-per-job	ARC runner scale sets	Job + `--once` (DIY scaling)
Full-VM isolation / nested virt	GitHub-hosted larger runners	VMSS elastic agents
Scale to zero	built in (`minRunners: 0`)	VMSS min count 0 / KEDA on Jobs
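
The entrypoint sequence described in this section (config.sh, then run.sh --once) might look roughly like the sketch below. It is not Microsoft's reference start.sh: the function names are hypothetical, and it assumes the agent package is unpacked in the working directory and that AZP_URL, AZP_POOL, and AZP_TOKEN are injected as in the Job manifest.

```shell
#!/bin/sh
# Hypothetical entrypoint sketch for the agent image (not the official start.sh).
# Assumes config.sh and run.sh from the agent package are in the current directory.

require_env() {
  # Fail fast with a clear message when a required variable is missing.
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: missing environment variable $name" >&2
      return 1
    fi
  done
}

register_and_run_once() {
  require_env AZP_URL AZP_POOL AZP_TOKEN || return 1
  # --replace takes over a stale registration that has the same agent name.
  ./config.sh --unattended \
    --url "$AZP_URL" --auth pat --token "$AZP_TOKEN" \
    --pool "$AZP_POOL" --agent "$(hostname)" \
    --replace --acceptTeeEula || return 1
  # --once makes the agent accept exactly one job and then exit,
  # so the pod terminates and the Kubernetes Job completes.
  exec ./run.sh --once
}

# Only start when executed inside an unpacked agent directory.
if [ -x ./config.sh ]; then
  register_and_run_once
fi
```

A production entrypoint would probably also deregister the agent on exit (config.sh remove) so that finished pods do not pile up as offline agents in the pool; that step is left out here to keep the sketch short.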

4. Autoscaling on queue depth and scaling to zero

ARC’s listener watches the GitHub job queue and creates one ephemeral runner per queued job, up to maxRunners, then deletes the pod when the job ends. With minRunners: 0 the scale set sits at zero pods between builds — you pay nothing but node baseline.
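
As a mental model, the scaling rule above is a clamp of queue depth between minRunners and maxRunners. The sketch below is a simplification for illustration, not ARC's actual implementation; the function name is hypothetical.

```shell
#!/bin/sh
# Simplified model of the scaling rule described above (not ARC source code):
# one runner per queued job, clamped to [min_runners, max_runners].
desired_runners() {
  queued="$1"; min_runners="$2"; max_runners="$3"
  want="$queued"
  if [ "$want" -lt "$min_runners" ]; then want="$min_runners"; fi
  if [ "$want" -gt "$max_runners" ]; then want="$max_runners"; fi
  echo "$want"
}

desired_runners 0 0 50    # prints 0  (idle: scale to zero)
desired_runners 12 0 50   # prints 12 (one runner per queued job)
desired_runners 80 0 50   # prints 50 (capped at maxRunners)
```

Jobs beyond the cap simply wait in the GitHub queue until a runner finishes and a new one can be created.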

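For the Azure DevOps Job pattern, use KEDA with the azure-pipelines scaler, which reads the pending job count for a pool and scales a ScaledJob. Azure DevOps generally will not queue work for a pool that has no registered agents, so when scaling from zero you may need to keep an offline placeholder agent registered in the pool (the KEDA scaler documentation describes this):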

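# azdo-scaledjob.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: azdo-agents
  namespace: azdo-runners
spec:
  minReplicaCount: 0
  maxReplicaCount: 30
  pollingInterval: 15            # seconds between queue checks
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
          - name: agent
            image: myregistry.azurecr.io/azdo-agent:2.x
            args: ["--once"]
            env:                 # same variables as the standalone Job
              - name: AZP_URL
                value: "https://dev.azure.com/my-org"
              - name: AZP_POOL
                value: "k8s-ephemeral"
              - name: AZP_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: azdo-pat
                    key: token
  triggers:
    - type: azure-pipelines
      metadata:
        poolName: "k8s-ephemeral"
        organizationURLFromEnv: "AZP_URL"   # resolved from the container env above
      authenticationRef:
        name: azdo-trigger-auth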
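
The ScaledJob references a TriggerAuthentication named azdo-trigger-auth that still has to exist. A minimal sketch follows, assuming the PAT is stored in the azdo-pat secret under the key token, as in the earlier Job manifest. The wrapper function name is hypothetical; the manifest uses KEDA's secretTargetRef to supply the scaler's personalAccessToken parameter.

```shell
#!/bin/sh
# Sketch: emit the TriggerAuthentication the ScaledJob refers to.
# Assumes the azdo-pat secret (key: token) lives in the azdo-runners namespace.
trigger_auth_manifest() {
  cat <<'EOF'
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azdo-trigger-auth
  namespace: azdo-runners
spec:
  secretTargetRef:
    - parameter: personalAccessToken
      name: azdo-pat
      key: token
EOF
}

trigger_auth_manifest   # pipe the output to: kubectl apply -f -
```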

Scaling pods to zero is only half the win. The nodes those pods ran on must also be removed, so pair this with cluster-level node autoscaling (the Kubernetes Cluster Autoscaler or Karpenter) on a dedicated CI node pool. Otherwise you scale pods to zero but keep paying for the empty nodes.

Callout: KEDA scales workloads; it does not scale nodes. The node count drops only when your cluster autoscaler removes empty nodes. Validate that both layers actually reach zero.

5. Hardening: per-job pods, non-root, network policies, dind alternatives

Ephemerality buys you isolation between jobs. These controls harden the pod itself.

Run as non-root and drop privileges. Set this in the runner pod template:

template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001
      seccompProfile:
        type: RuntimeDefault
    containers:
      - name: runner
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true   # the runner then needs writable volumes (e.g. emptyDir) for its work dir and /tmp
          capabilities:
            drop: ["ALL"]

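Restrict egress with a NetworkPolicy. Untrusted PR code should not be able to reach your cluster’s internal services or metadata endpoints. Start from default-deny, then allow only what builds need (DNS, your registry, GitHub/Azure DevOps). Standard NetworkPolicy matches IP ranges, not hostnames, so this example allows HTTPS to any address except the instance metadata endpoint; to keep runners off in-cluster services as well, add your pod and service CIDRs to the except list or use a CNI that supports FQDN rules: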

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runners-egress
  namespace: arc-runners
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP        # DNS falls back to TCP for large responses
          port: 53
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # block instance metadata
      ports:
        - protocol: TCP
          port: 443

Avoid Docker-in-Docker. Privileged dind sidecars are the classic escape hatch and a real risk on shared clusters. Build images without a Docker daemon instead:

BuildKit / buildkitd running as a separate, controlled service that runners call over its API.
Kaniko or Buildah, which build from a Dockerfile in userspace without a privileged daemon.

If a job genuinely needs nested virtualization or a real Docker socket, isolate it: send it to VMSS elastic agents, or to a separate node pool fronted by a sandboxed runtime such as gVisor or Kata Containers, rather than granting privileged: true on your general fleet.
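
For the sandboxed-runtime option, Kubernetes selects the runtime through a RuntimeClass. The sketch below assumes the quarantine node pool already has gVisor installed and configured in the container runtime under the handler name runsc. The RuntimeClass name gvisor and the function name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: a RuntimeClass for gVisor plus the pod-template line that selects it.
# Assumes nodes in the quarantine pool already have the runsc handler configured.
runtimeclass_manifest() {
  cat <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
EOF
}

runner_template_fragment() {
  cat <<'EOF'
template:
  spec:
    runtimeClassName: gvisor
EOF
}

runtimeclass_manifest      # pipe the output to: kubectl apply -f -
runner_template_fragment   # merge into the quarantine scale set's values file
```

Keeping this in a separate scale set with its own runs-on name means only jobs that explicitly ask for the quarantine pool land on it.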

6. Build caching that survives ephemerality

Since pods are thrown away, caching must live elsewhere. Two layers matter.

Dependency cache. For GitHub, actions/cache already stores keyed archives in GitHub-hosted storage, so it works unchanged on self-hosted runners — no local disk assumptions.

Layer cache for image builds. This is where ephemeral runners hurt without help, because every build starts with a cold daemon. BuildKit supports remote caches you export and import across runs. Push the cache to your registry alongside the image:

docker buildx build \
  --cache-to   type=registry,ref=myregistry.azurecr.io/app:buildcache,mode=max \
  --cache-from type=registry,ref=myregistry.azurecr.io/app:buildcache \
  --push -t myregistry.azurecr.io/app:$GIT_SHA .

mode=max exports intermediate layers too, which gives far better hit rates than the default min. Kaniko has an equivalent with --cache=true --cache-repo=<registry>/cache.

For heavy, repeated builds, a persistent BuildKit service with its own cache volume (a small, long-lived Deployment that ephemeral runners talk to) outperforms per-job cache import/export, because the cache stays hot in one place instead of being re-pulled each build. The runners stay ephemeral; only the builder is durable.

Cache type	Where it lives	Mechanism
Dependencies	GitHub cache / object storage	`actions/cache`, keyed restore
Image layers (simple)	Container registry	BuildKit `--cache-to/--cache-from registry`
Image layers (heavy)	Persistent BuildKit volume	Shared `buildkitd` service

7. Securing cloud access with OIDC, not secrets

The biggest secret-sprawl win is deleting cloud credentials from CI entirely. Both GitHub Actions and Azure DevOps can mint short-lived OIDC tokens that your cloud trusts via a federated identity. No static keys to leak, no rotation to manage.

Azure (GitHub Actions). Create an app registration with a federated credential bound to your repo and branch, then log in with no client secret:

az ad app federated-credential create \
  --id "$APP_ID" \
  --parameters '{
    "name": "gh-main",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:my-org/my-repo:ref:refs/heads/main",
    "audiences": ["api://AzureADTokenExchange"]
  }'

permissions:
  id-token: write
  contents: read
steps:
  - uses: azure/login@v2
    with:
      client-id: ${{ secrets.AZURE_CLIENT_ID }}
      tenant-id: ${{ secrets.AZURE_TENANT_ID }}
      subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

The id-token: write permission is mandatory: without it the job cannot request an OIDC token, and azure/login fails because it has no token to exchange.

AWS (GitHub Actions). Register GitHub as an OIDC provider and assume a role scoped by the same sub claim:

permissions:
  id-token: write
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      role-to-assume: arn:aws:iam::111122223333:role/github-ci
      aws-region: us-east-1
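
On the AWS side, the scoping lives in the role's trust policy. The sketch below prints one under the assumption that GitHub's OIDC provider (token.actions.githubusercontent.com) is already registered in the account; the function name is hypothetical, and the account ID and subject are the placeholder values used in this guide.

```shell
#!/bin/sh
# Sketch: render an IAM trust policy that only lets one repo/ref assume the role.
# Assumes the GitHub OIDC provider is already registered in the AWS account.
github_trust_policy() {
  account_id="$1"   # AWS account that owns the role
  subject="$2"      # exact OIDC sub claim to trust
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::${account_id}:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
        "token.actions.githubusercontent.com:sub": "${subject}"
      }
    }
  }]
}
EOF
}

github_trust_policy 111122223333 "repo:my-org/my-repo:ref:refs/heads/main"
```

Using StringEquals rather than StringLike on the sub condition keeps the match exact, so a wildcard cannot slip in later without a visible policy change.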

Azure DevOps. Use an Azure Resource Manager service connection configured with workload identity federation, which removes the stored service principal secret from the connection. Tasks like AzureCLI@2 then authenticate with a federated token automatically.

Scope the trust tightly. The federated subject/sub should pin the exact repo and ref (and, for production, the GitHub Environment), so a fork or a feature branch cannot assume a privileged role.
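
A cheap guard is to lint subjects before they are ever applied. The function below is a hypothetical pre-flight check, not part of any cloud CLI: it rejects any subject containing a wildcard and accepts only the repo-plus-ref and repo-plus-environment shapes that GitHub issues.

```shell
#!/bin/sh
# Hypothetical lint for federated-credential subjects: reject wildcards,
# accept only pinned repo+ref or repo+environment forms.
check_subject() {
  case "$1" in
    *'*'*)
      echo "reject: wildcard in subject"; return 1 ;;
    repo:*/*:ref:refs/heads/*|repo:*/*:ref:refs/tags/*|repo:*/*:environment:*)
      echo "ok"; return 0 ;;
    *)
      echo "reject: unrecognized subject format"; return 1 ;;
  esac
}

check_subject "repo:my-org/my-repo:ref:refs/heads/main"   # prints ok
check_subject "repo:my-org/*:ref:refs/heads/*"            # prints reject: wildcard in subject
```

Running a check like this in the pipeline that manages federated credentials turns a loose subject into a failed review rather than an audit finding.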

Verify

Confirm the controller and a scale set are healthy, and prove scale-to-zero:

# Controller and listener pods run in arc-systems; runner pods appear in arc-runners
kubectl get pods -n arc-systems
kubectl get pods -n arc-runners

# AutoscalingRunnerSet registered with GitHub
kubectl get autoscalingrunnerset -n arc-runners

# At rest: zero ephemeral runners
kubectl get ephemeralrunner -n arc-runners

Trigger a workflow targeting runs-on: platform-runners, then watch a pod appear and disappear:

kubectl get pods -n arc-runners -w

For Azure DevOps + KEDA, confirm the scaler reads the queue and that the Job runs once:

kubectl get scaledjob -n azdo-runners
kubectl get jobs -n azdo-runners

Then verify cost behavior directly: with no builds running, the CI node pool should drain to its minimum (ideally zero) within the autoscaler’s scale-down window. If pods are at zero but nodes are not, your node autoscaler — not KEDA or ARC — is the thing to fix.
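
That triage can be written down as a tiny helper. This is a sketch: the function name is hypothetical, and the commented kubectl usage assumes AKS's agentpool node label and the cispot pool name used elsewhere in this guide.

```shell
#!/bin/sh
# Sketch: decide which layer is keeping the CI pool from reaching its idle state.
diagnose_idle() {
  runner_pods="$1"   # runner pods currently present
  ci_nodes="$2"      # nodes currently in the CI pool
  min_nodes="$3"     # configured minimum for the pool
  if [ "$runner_pods" -gt 0 ]; then
    echo "runners still present: check ARC minRunners or KEDA minReplicaCount"
  elif [ "$ci_nodes" -gt "$min_nodes" ]; then
    echo "pods at zero but nodes remain: check the node autoscaler"
  else
    echo "idle state reached"
  fi
}

diagnose_idle 0 3 0   # prints: pods at zero but nodes remain: check the node autoscaler

# Possible live usage (label and pool name are assumptions):
# diagnose_idle \
#   "$(kubectl get pods -n arc-runners --no-headers 2>/dev/null | wc -l | tr -d ' ')" \
#   "$(kubectl get nodes -l agentpool=cispot --no-headers 2>/dev/null | wc -l | tr -d ' ')" 0
```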

Checklist

Capacity, spot nodes, and cost-per-build

CI is the ideal spot/low-priority workload: jobs are short, retriable, and tolerant of eviction. Put runners on a dedicated spot node pool, tainted so only CI lands there, and let runner pods tolerate it.

# AKS example: a low-cost CI pool, scalable to zero, tainted for CI only
az aks nodepool add \
  --resource-group rg-ci --cluster-name aks-ci \
  --name cispot --priority Spot \
  --eviction-policy Delete --spot-max-price -1 \
  --enable-cluster-autoscaler --min-count 0 --max-count 20 \
  --node-taints "workload=ci:NoSchedule"

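Add the matching toleration (and a nodeSelector) to the runner pod template so builds schedule onto the spot pool. AKS also applies its own kubernetes.azure.com/scalesetpriority=spot:NoSchedule taint to spot pools, so tolerate that one too. Size requests honestly: over-requesting CPU and memory inflates node count, and the bill with it.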
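
A sketch of the scheduling fragment for the runner pod template follows. It tolerates the workload=ci taint from the node pool command and the spot taint that AKS puts on spot pools, and it selects spot nodes by the kubernetes.azure.com/scalesetpriority label. The wrapper function name is hypothetical; verify the taints and labels against your own nodes with kubectl describe node.

```shell
#!/bin/sh
# Sketch: pod-template fragment that steers runners onto the tainted spot pool.
# Check actual taints and labels on your nodes before relying on these keys.
spot_scheduling_fragment() {
  cat <<'EOF'
template:
  spec:
    nodeSelector:
      kubernetes.azure.com/scalesetpriority: spot
    tolerations:
      - key: workload
        operator: Equal
        value: ci
        effect: NoSchedule
      - key: kubernetes.azure.com/scalesetpriority
        operator: Equal
        value: spot
        effect: NoSchedule
EOF
}

spot_scheduling_fragment   # merge into platform-runners-values.yaml
```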

For cost-per-build, attribute spend per job: tag the CI node pool, and label runner pods with the repo/workflow so a tool like OpenCost or Kubecost can roll up cost by pipeline. Dividing CI node spend by build count over a week gives the single number that justifies this whole architecture to finance.
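
The division itself is trivial, but scripting it keeps the number reproducible week to week. In this sketch the function name and both inputs are hypothetical; in practice the spend would come from your billing export or a cost tool.

```shell
#!/bin/sh
# Sketch: cost per build = CI node spend for a period / builds in that period.
cost_per_build() {
  weekly_spend="$1"   # CI node pool spend for the week
  builds="$2"         # builds completed in the same week
  if [ "$builds" -le 0 ]; then
    echo "error: build count must be positive" >&2
    return 1
  fi
  awk -v s="$weekly_spend" -v b="$builds" 'BEGIN { printf "%.4f\n", s / b }'
}

cost_per_build 840 4200   # prints 0.2000
```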

Enterprise scenario

A fintech platform team moved ~600 daily GitHub Actions jobs from always-on VMs to ARC scale sets on AKS, with minRunners: 0 on a Karpenter-managed spot pool. Builds passed; cost dropped. Then audit failed them: every job was assuming the same broadly-scoped Azure role through OIDC, because the federated credential subject was repo:org/*:ref:refs/heads/*. A developer pushing any branch in any repo in the org could mint a token with write access to production Key Vaults.

The fix had two parts. First, pin the subject to the GitHub Environment, not the branch, so privileged access requires an environment with required reviewers:

az ad app federated-credential create --id "$APP_ID" --parameters '{
  "name": "gh-prod-deploy",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:org/payments-api:environment:production",
  "audiences": ["api://AzureADTokenExchange"]
}'

Second, the gotcha that bit them: workflows triggered by pull_request from forks are not granted id-token: write, so forks could not mint tokens anyway, but internal-branch PRs still inherited repo-level access. They split identities: a read-only role for CI builds and a deploy role reachable only from the production environment job. They also added a default-deny egress NetworkPolicy so a poisoned dependency in a build pod could not reach the metadata endpoint (169.254.169.254) to scrape the kubelet’s identity. Scale-to-zero economics were never the hard part; bounding the blast radius of an ephemeral pod that briefly holds a cloud token was.

Pitfalls

Scaling pods, not nodes. ARC and KEDA empty the pods; only the cluster autoscaler empties the nodes. Tune scale-down or you keep paying for idle capacity.
Cold-start tax. Large runner images make every job wait on a pull. Keep images lean and pre-pull them on CI nodes, or queue-to-start latency suffers.
Loose OIDC subjects. A wildcard sub claim lets any matching branch or repo assume your role. Pin the repo and the ref, or the repo and the environment.
Reaching for dind. Privileged daemons undo the isolation you just built. Use daemonless builders and quarantine the rare job that truly needs a socket.
Mixing ARC generations. The legacy RunnerDeployment CRDs and the new scale-set charts are different systems. Standardize on runner scale sets and do not blend the two.

Written by Vinod
