Ansible Lesson 28 of 42

Ansible for Kubernetes, In Depth: kubernetes.core, k8s, Helm Charts, Manifests & Operator-Style Workflows

Kubernetes engineers tend to fall into two camps: the GitOps purists who think every cluster change should flow through ArgoCD or Flux, and the platform engineers who need to manage 300 clusters across 12 cloud accounts and want a single tool that can talk to all of them. Ansible serves the second group exceptionally well — and, perhaps surprisingly, it serves the first group too, as the orchestrator that creates the cluster, bootstraps ArgoCD, and manages the bits of infrastructure (load balancers, DNS, IAM) that GitOps can’t touch.

This lesson covers the kubernetes.core collection in EX374-grade depth: the k8s module’s three states (present, absent, patched), the apply and force options, and how they map to kubectl apply/create/delete/replace/patch; manifest templating patterns; Helm chart management with helm and helm_info; operator-style “wait for ready” workflows; multi-cluster inventory patterns; RBAC; namespace lifecycle; and the line between “Ansible should do this” and “let GitOps do this.” If you are coming from kubectl apply -f and shell scripts, this lesson is your upgrade path.

Learning Objectives

By the end you will be able to:

Prerequisites

Mental Model: How Ansible Talks to Kubernetes

1. The K8s API is just a REST server — Ansible is an HTTP client

Every Kubernetes module in kubernetes.core is a thin Python wrapper around the official kubernetes Python client, which itself is a generated client for the cluster’s REST API. Whether you kubectl apply -f deploy.yaml or k8s: definition: ..., the wire-level call is the same: a PATCH or POST to /apis/apps/v1/namespaces/<ns>/deployments. This means anything kubectl can do, Ansible can do — and the auth model is identical: kubeconfig with contexts, or in-cluster ServiceAccount tokens.

2. state: maps to kubectl verbs

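The k8s module has three states: present, absent, and patched. state: present creates the object if it is missing and patches it if it exists; adding apply: true switches to kubectl apply semantics, and adding force: true replaces the object outright (like kubectl replace). state: patched only modifies an object that already exists (like kubectl patch), and state: absent deletes it (like kubectl delete). One module covers all five verbs, and each task reports changed only when the cluster object actually differs from what you asked for.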

3. Manifests are Jinja2-rendered, not pre-baked

The right way to template manifests for multiple environments isn’t envsubst or kustomize — it’s lookup('template', 'deploy.yaml.j2'). You write one deploy.yaml.j2 with {{ replicas }}, {{ image }}, {{ namespace }}, then call it from a play with environment-specific vars. This is more powerful than Kustomize overlays for complex templating (loops over service ports, conditional sidecars), simpler than Helm for one-off applications.

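4. Helm is a first-class concept, not a hand-written shell-out

kubernetes.core.helm manages Helm releases as declarative tasks, so you never write command: helm upgrade ... yourself. Under the hood the module calls the helm binary on the host where the task runs (normally the control node), which is why that binary must be installed there. It compares the chart name, version, values, and namespace against the deployed release and produces idempotent install/upgrade/uninstall behavior. You can drive Helm-packaged apps from Ansible exactly as you would drive container deployments from Compose: same playbook structure, same idempotence guarantees.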

5. Ansible and GitOps are complements, not competitors

Ansible owns the cluster lifecycle (create cluster, install CNI, bootstrap ArgoCD, configure ingress controller IAM), while GitOps owns app delivery (the manifests of your microservices). The test for “is this Ansible or GitOps?” is: if I redeploy the cluster from scratch, who creates this object first? That object belongs in Ansible. Everything else (your apps, your team’s workloads) belongs in Git, watched by ArgoCD/Flux.

Setting Up the Control Node

kubernetes.core requires the kubernetes Python client and (for Helm) the helm binary on the control node. The collection itself comes from Galaxy.

# Install the Python client
python3 -m pip install --user 'kubernetes>=29.0.0' 'PyYAML>=6.0' 'jsonpatch>=1.33'

# Install the collection
ansible-galaxy collection install kubernetes.core

# Install Helm (control node)
curl -fsSL https://get.helm.sh/helm-v3.14.0-linux-amd64.tar.gz | tar -xz
sudo mv linux-amd64/helm /usr/local/bin/

# Verify
ansible-galaxy collection list kubernetes.core
helm version

For multi-architecture environments (control node on macOS or arm64), match the helm binary to your control node OS — the target is the K8s API, not a specific OS, so target architecture doesn’t matter.
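To make the control-node setup reproducible, the collection can also be pinned in a requirements file. This is a minimal sketch: requirements.yml is the conventional file name, and the version range shown is an assumption to adjust to whatever you have tested.

```yaml
# requirements.yml (version range is illustrative, not a recommendation)
collections:
  - name: kubernetes.core
    version: ">=3.0.0"
```

Install it with ansible-galaxy collection install -r requirements.yml.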

Authentication: Three Patterns

Pattern A: Kubeconfig file (most common)

- name: Apply a namespace using kubeconfig
  kubernetes.core.k8s:
    kubeconfig: ~/.kube/config
    context: prod-cluster
    state: present
    definition:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: payments

This reads the standard kubeconfig with your contexts. To target multiple clusters, set context: per task or per play.
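When every task in a play targets the same cluster, repeating kubeconfig: and context: gets noisy. One option, sketched here on the assumption that your kubernetes.core version exposes the k8s action group, is to set both once with module_defaults:

```yaml
- name: Manage the payments namespace on prod
  hosts: localhost
  gather_facts: false
  module_defaults:
    # Applies to every module in the collection's k8s action group
    group/kubernetes.core.k8s:
      kubeconfig: ~/.kube/config
      context: prod-cluster
  tasks:
    - name: Ensure the namespace exists
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: payments
```

A task can still override context: explicitly when it needs a different cluster.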

Pattern B: In-cluster ServiceAccount (Ansible runs inside K8s)

When Ansible runs inside a pod (e.g. as a Kubernetes Job, or inside an Ansible Automation Platform execution environment that runs as a pod):

- name: Apply a manifest using in-cluster auth
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('file', 'manifest.yaml') | from_yaml }}"
  # No kubeconfig — auto-detects /var/run/secrets/kubernetes.io/serviceaccount

The pod’s ServiceAccount token must have RBAC for whatever you’re applying. This is the pattern AAP uses when its execution environments run as Kubernetes pods.
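As a rough illustration of that RBAC requirement, the tasks below grant a ServiceAccount rights on ConfigMaps and Deployments in one namespace. The names ansible-runner and automation are hypothetical, and the task has to be run once by an identity that already has permission to create Roles and RoleBindings.

```yaml
- name: Grant the runner ServiceAccount rights in the payments namespace
  kubernetes.core.k8s:
    state: present
    definition:
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: Role
        metadata:
          name: ansible-runner          # hypothetical Role name
          namespace: payments
        rules:
          - apiGroups: ["", "apps"]
            resources: ["configmaps", "deployments"]
            verbs: ["get", "list", "create", "update", "patch", "delete"]
      - apiVersion: rbac.authorization.k8s.io/v1
        kind: RoleBinding
        metadata:
          name: ansible-runner
          namespace: payments
        subjects:
          - kind: ServiceAccount
            name: ansible-runner        # hypothetical ServiceAccount used by the pod
            namespace: automation       # hypothetical namespace where the pod runs
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: Role
          name: ansible-runner
```

Keeping the Role namespaced, rather than using a ClusterRole, limits what a compromised automation pod can touch.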

Pattern C: OIDC / short-lived tokens

For zero-long-lived-credential setups (EKS with IRSA, GKE Workload Identity, AKS Workload Identity, or any OIDC provider):

- name: Get short-lived token from OIDC
  ansible.builtin.command:
    cmd: aws eks get-token --cluster-name prod
  register: eks_token
  changed_when: false
  no_log: true   # keep the bearer token out of logs and verbose output

- name: Apply with token
  kubernetes.core.k8s:
    api_key: "{{ (eks_token.stdout | from_json).status.token }}"
    host: "https://prod.eks.us-east-1.amazonaws.com"
    ca_cert: /etc/eks/prod-ca.crt
    state: present
    definition: "{{ lookup('template', 'deploy.yaml.j2') | from_yaml }}"

Use this pattern when your control node runs in a cloud workload that has cluster-targeted IAM permissions but no static kubeconfig.

The kubernetes.core Collection, Module by Module

k8s — the workhorse

Manages any Kubernetes object. Three states: present, absent, patched. Takes either a definition (inline) or a src (file path) parameter; apply: true and force: true refine how state: present updates an existing object.

- name: Apply a namespace
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: Namespace
      metadata:
        name: payments
        labels:
          team: payments
          env: prod

- name: Apply a deployment from a templated file
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('template', 'roles/payments/templates/deploy.yaml.j2') | from_yaml }}"
    namespace: payments
    apply: true   # kubectl apply semantics: diff against the last-applied configuration

- name: Replace (not merge) a configmap
  kubernetes.core.k8s:
    state: present
    force: true   # replace the existing object instead of patching it
    definition: "{{ lookup('file', 'config.yaml') | from_yaml }}"

- name: Strategic-merge-patch a deployment's image
  kubernetes.core.k8s:
    state: patched
    kind: Deployment
    namespace: payments
    name: api
    definition:
      spec:
        template:
          spec:
            containers:
              - name: api
                image: registry.example.com/api:v1.2.3

- name: Delete a deployment
  kubernetes.core.k8s:
    state: absent
    kind: Deployment
    namespace: payments
    name: deprecated-service

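Note apply: true: when set, the module mimics kubectl apply by comparing your definition with the previously applied one, so fields you never specified (for example those written by autoscalers or admission webhooks) are left alone. This comparison runs client-side by default; true server-side apply, with field ownership tracking, additionally requires the server_side_apply option (kubernetes.core 2.3 or later).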
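For clusters where several controllers write to the same object, server-side apply can be requested explicitly. A minimal sketch, assuming kubernetes.core 2.3 or later; the field manager name is a hypothetical label you choose yourself:

```yaml
- name: Apply a deployment with server-side apply
  kubernetes.core.k8s:
    state: present
    apply: true                          # server_side_apply is ignored without this
    server_side_apply:
      field_manager: ansible-payments    # hypothetical manager name recorded by the API server
    namespace: payments
    definition: "{{ lookup('template', 'deploy.yaml.j2') | from_yaml }}"
```

The API server then records which manager owns which fields, so a conflicting write from another manager surfaces as an error rather than a silent overwrite.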

k8s_info — the gather-facts equivalent

- name: Get all pods in payments namespace
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: payments
  register: payments_pods

- name: Show pod names
  ansible.builtin.debug:
    msg: "{{ payments_pods.resources | map(attribute='metadata.name') | list }}"

- name: Get a specific deployment with full status
  kubernetes.core.k8s_info:
    kind: Deployment
    namespace: payments
    name: api
  register: api_deploy

- name: Assert at least 3 ready replicas
  ansible.builtin.assert:
    that:
      - (api_deploy.resources[0].status.readyReplicas | default(0)) >= 3
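k8s_info can also narrow a query the way kubectl get -l and --field-selector do. The sketch below assumes the pods carry an app=api label, which the original examples do not state:

```yaml
- name: List running api pods by label
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: payments
    label_selectors:
      - app=api                  # assumed label, adjust to your manifests
    field_selectors:
      - status.phase=Running
  register: api_pods

- name: Fail early if nothing matched
  ansible.builtin.assert:
    that:
      - api_pods.resources | length > 0
    fail_msg: "No running pods with label app=api in payments"
```

Filtering on the server keeps the registered result small, which matters in namespaces with hundreds of pods.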

k8s_exec — exec into pods

- name: Run a smoke test inside a pod
  kubernetes.core.k8s_exec:
    namespace: payments
    pod: "{{ payments_pods.resources[0].metadata.name }}"
    container: api
    command: /bin/sh -c "curl -s http://localhost:8080/health | grep ok"
  register: smoke
  failed_when: smoke.rc != 0

k8s_log — read pod logs

- name: Get last 100 lines of a pod's logs
  kubernetes.core.k8s_log:
    namespace: payments
    name: api-7d8f4c5b9-xkj2p
    tail_lines: 100
  register: api_logs

k8s_scale — scale workloads

- name: Scale the api deployment to 5 replicas
  kubernetes.core.k8s_scale:
    api_version: apps/v1
    kind: Deployment
    name: api
    namespace: payments
    replicas: 5
    wait: true
    wait_timeout: 120

k8s_drain — cordon and drain a node

- name: Drain node before maintenance
  kubernetes.core.k8s_drain:
    name: ip-10-0-1-15.ec2.internal
    state: drain
    delete_options:
      force: true
      ignore_daemonsets: true
      terminate_grace_period: 30
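Draining is usually one step of a longer maintenance sequence. The following sketch shows a drain, a placeholder for the maintenance work, and an uncordon; node_name is a hypothetical variable, and the debug task stands in for whatever patching you actually run.

```yaml
- name: Drain the node
  kubernetes.core.k8s_drain:
    name: "{{ node_name }}"      # hypothetical variable holding the node name
    state: drain
    delete_options:
      ignore_daemonsets: true
      terminate_grace_period: 30

- name: Placeholder for the maintenance step
  ansible.builtin.debug:
    msg: "Run OS patching for {{ node_name }} here"

- name: Return the node to service
  kubernetes.core.k8s_drain:
    name: "{{ node_name }}"
    state: uncordon
```

Wrapping the middle step in a block with an always section is one way to make sure the node is uncordoned even if patching fails.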

k8s_rollback — roll back deployments

- name: Rollback the api deployment one revision
  kubernetes.core.k8s_rollback:
    api_version: apps/v1
    kind: Deployment
    name: api
    namespace: payments

Helm Lifecycle with kubernetes.core.helm

Helm releases are first-class Ansible resources. The collection ships helm, helm_info, helm_repository, helm_template, and helm_plugin.

- name: Add a Helm repo
  kubernetes.core.helm_repository:
    name: prometheus-community
    repo_url: https://prometheus-community.github.io/helm-charts
    state: present

- name: Install kube-prometheus-stack with custom values
  kubernetes.core.helm:
    release_name: monitoring
    chart_ref: prometheus-community/kube-prometheus-stack
    chart_version: 56.6.2
    release_namespace: monitoring
    create_namespace: true
    values:
      grafana:
        adminPassword: "{{ vault_grafana_admin }}"
        ingress:
          enabled: true
          hosts:
            - grafana.example.com
      prometheus:
        prometheusSpec:
          retention: 30d
          storageSpec:
            volumeClaimTemplate:
              spec:
                accessModes: [ReadWriteOnce]
                resources:
                  requests:
                    storage: 100Gi
    wait: true
    wait_timeout: 10m

- name: Get state of installed release
  kubernetes.core.helm_info:
    release_name: monitoring
    release_namespace: monitoring
  register: release_state

# The helm module has no rollback state, so call the helm CLI directly
- name: Roll back if the release ended up failed
  ansible.builtin.command:
    cmd: helm rollback monitoring 3 --namespace monitoring
  when: release_state.status.status == 'failed'

kubernetes.core.helm is idempotent — running it twice with the same chart_version and values results in changed: false. Bumping chart_version or changing values triggers an upgrade. Setting release_state: absent uninstalls.
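Two variations on the release task are worth sketching: an upgrade that undoes itself on failure, and an uninstall. The decommission_monitoring variable is hypothetical and only there so the two tasks can sit in one file without the second undoing the first.

```yaml
- name: Upgrade monitoring, rolling back automatically on failure
  kubernetes.core.helm:
    release_name: monitoring
    chart_ref: prometheus-community/kube-prometheus-stack
    chart_version: 56.6.2
    release_namespace: monitoring
    atomic: true            # same idea as helm's --atomic flag
    wait: true
    wait_timeout: 10m

- name: Uninstall the release when it is being decommissioned
  kubernetes.core.helm:
    release_name: monitoring
    release_namespace: monitoring
    release_state: absent
  when: decommission_monitoring | default(false)   # hypothetical toggle
```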

Manifest Templating Patterns

Pattern 1: Inline definition (small objects)

- name: Create a ConfigMap
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: app-config
        namespace: "{{ app_namespace }}"
      data:
        DATABASE_URL: "{{ db_connection_string }}"
        LOG_LEVEL: "{{ log_level }}"

Pattern 2: Templated file (deployments, custom resources)

# templates/deploy.yaml.j2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ app_name }}
  namespace: {{ app_namespace }}
spec:
  replicas: {{ replicas | default(2) }}
  selector:
    matchLabels:
      app: {{ app_name }}
  template:
    metadata:
      labels:
        app: {{ app_name }}
        version: "{{ image_tag }}"
    spec:
      containers:
        - name: {{ app_name }}
          image: {{ image_repo }}:{{ image_tag }}
          ports:
{% for port in app_ports %}
            - containerPort: {{ port }}
{% endfor %}
          resources:
            requests:
              cpu: {{ cpu_request | default('100m') }}
              memory: {{ memory_request | default('128Mi') }}

# task in role
- name: Apply deployment
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('template', 'deploy.yaml.j2') | from_yaml }}"
    apply: true

Pattern 3: Multi-document YAML (deployment + service + ingress in one file)

- name: Apply a multi-doc manifest
  kubernetes.core.k8s:
    state: present
    definition: "{{ lookup('template', 'app.yaml.j2') | from_yaml_all | list }}"
    apply: true

from_yaml_all parses YAML documents separated by --- into a list, and the module then applies each one in order.

Operator-Style Workflows

A pattern that’s common in platform automation: create a Custom Resource, then wait for a status condition before continuing.

- name: Create an Argo CD Application
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: argoproj.io/v1alpha1
      kind: Application
      metadata:
        name: payments-api
        namespace: argocd
      spec:
        project: payments
        source:
          repoURL: https://github.com/example/payments-api
          targetRevision: HEAD
          path: deploy/k8s
        destination:
          server: https://kubernetes.default.svc
          namespace: payments
        syncPolicy:
          automated:
            prune: true
            selfHeal: true

- name: Wait for the Application to be Synced and Healthy
  kubernetes.core.k8s_info:
    api_version: argoproj.io/v1alpha1
    kind: Application
    namespace: argocd
    name: payments-api
  register: app
  until:
    - app.resources[0].status.sync.status == 'Synced'
    - app.resources[0].status.health.status == 'Healthy'
  retries: 30
  delay: 10

This is the operator-style “create then reconcile” pattern. The same approach works for cert-manager Certificates, Crossplane claims, and any other custom resource that exposes status conditions.
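For resources that report a standard condition, the k8s module’s own wait support can replace the until loop. A sketch for a cert-manager Certificate, assuming cert-manager is installed; the certificate name, hostname, and ClusterIssuer are hypothetical:

```yaml
- name: Request a certificate and wait until it is Ready
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: payments-tls            # hypothetical name
        namespace: payments
      spec:
        secretName: payments-tls
        dnsNames:
          - payments.example.com      # hypothetical hostname
        issuerRef:
          name: letsencrypt-prod      # hypothetical ClusterIssuer
          kind: ClusterIssuer
    wait: true
    wait_condition:
      type: Ready
      status: "True"
    wait_timeout: 300
```

The until form remains the better fit when readiness depends on more than one field, as in the Argo CD example.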

Multi-Cluster Inventory

For platform teams managing 50+ clusters, define them in inventory:

# inventories/all-clusters/clusters.yml
all:
  children:
    prod_clusters:
      hosts:
        prod-us-east:
          k8s_context: prod-us-east
          k8s_region: us-east-1
        prod-eu-west:
          k8s_context: prod-eu-west
          k8s_region: eu-west-1
        prod-ap-south:
          k8s_context: prod-ap-south
          k8s_region: ap-south-1
    nonprod_clusters:
      hosts:
        dev:
          k8s_context: dev
        staging:
          k8s_context: staging

Then run a play that targets prod_clusters and uses delegate_to: localhost (since K8s API calls don’t run on the cluster — they run on the control node):

- hosts: prod_clusters
  gather_facts: false
  tasks:
    - name: Apply baseline namespace policy to every prod cluster
      kubernetes.core.k8s:
        kubeconfig: ~/.kube/config
        context: "{{ k8s_context }}"
        state: present
        definition: "{{ lookup('template', 'baseline-policy.yaml.j2') | from_yaml }}"
      delegate_to: localhost

delegate_to: localhost is the key — Ansible iterates over the cluster inventory but executes each task locally, just changing the context: per cluster.
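An alternative to repeating delegate_to on every task is to declare the local connection once in inventory. This is a sketch of the same inventory with group-level vars added; the interpreter line is optional and assumes the kubernetes Python client is installed alongside Ansible itself.

```yaml
# inventories/all-clusters/clusters.yml (fragment)
all:
  vars:
    ansible_connection: local                                # run every task on the control node
    ansible_python_interpreter: "{{ ansible_playbook_python }}"  # reuse Ansible's own Python
  children:
    prod_clusters:
      hosts:
        prod-us-east:
          k8s_context: prod-us-east
          k8s_region: us-east-1
```

With this in place the play still iterates over the clusters, and only context: changes per host.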

Hands-on Free Lab: Bootstrap a kind Cluster, Install ArgoCD, Deploy an App

Free, runs on your laptop, takes 10 minutes.

# Prerequisites
brew install kind helm  # macOS, or use your distro equivalent
mkdir -p ~/ansible-k8s-lab && cd ~/ansible-k8s-lab

# Create a kind cluster
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
EOF
kind create cluster --name ansible-lab --config kind-config.yaml

# Build the inventory
cat > inventory.yml <<'EOF'
all:
  hosts:
    cluster_lab:
      ansible_connection: local
      k8s_context: kind-ansible-lab
EOF

# Build the playbook
cat > site.yml <<'EOF'
---
- hosts: cluster_lab
  gather_facts: false
  collections:
    - kubernetes.core
  tasks:

    - name: Create monitoring namespace
      kubernetes.core.k8s:
        context: "{{ k8s_context }}"
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: monitoring

    - name: Add prometheus-community Helm repo
      kubernetes.core.helm_repository:
        name: prometheus-community
        repo_url: https://prometheus-community.github.io/helm-charts

    - name: Install kube-prometheus-stack
      kubernetes.core.helm:
        kubeconfig: ~/.kube/config
        context: "{{ k8s_context }}"
        release_name: monitoring
        chart_ref: prometheus-community/kube-prometheus-stack
        chart_version: 56.6.2
        release_namespace: monitoring
        wait: true
        wait_timeout: 5m
        values:
          grafana:
            adminPassword: "labpassword"

    - name: Add argo-cd Helm repo
      kubernetes.core.helm_repository:
        name: argo
        repo_url: https://argoproj.github.io/argo-helm

    - name: Install Argo CD
      kubernetes.core.helm:
        kubeconfig: ~/.kube/config
        context: "{{ k8s_context }}"
        release_name: argocd
        chart_ref: argo/argo-cd
        chart_version: 6.7.1
        release_namespace: argocd
        create_namespace: true
        wait: true
        wait_timeout: 5m

    - name: Deploy a sample app via templated manifest
      kubernetes.core.k8s:
        context: "{{ k8s_context }}"
        state: present
        definition: "{{ lookup('template', 'demo-app.yaml.j2') | from_yaml_all | list }}"

    - name: Wait for sample app to have 2 ready replicas
      kubernetes.core.k8s_info:
        context: "{{ k8s_context }}"
        kind: Deployment
        namespace: default
        name: hello
      register: hello_deploy
      until: hello_deploy.resources[0].status.readyReplicas | default(0) >= 2
      retries: 20
      delay: 5
EOF

# Build the manifest template
mkdir -p templates
cat > templates/demo-app.yaml.j2 <<'EOF'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: gcr.io/google-samples/hello-app:2.0
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 8080
EOF

# Run the playbook
ansible-playbook -i inventory.yml site.yml

# Verify
kubectl get pods -A
kubectl --context kind-ansible-lab port-forward -n monitoring svc/monitoring-grafana 3000:80 &
# Open http://localhost:3000 (admin / labpassword)

# Tear down
kind delete cluster --name ansible-lab

This is the complete bootstrap pattern: namespace lifecycle, Helm charts (Prometheus + ArgoCD), and a templated manifest deployment — exactly the workflow you’d use to provision a real cluster.

Common Mistakes & Troubleshooting

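1. “Failed to import the required Python library (kubernetes)” The kubernetes Python client isn’t installed in the Python interpreter Ansible uses for the task. ansible-config dump | grep INTERPRETER_PYTHON only shows the configured discovery mode, so confirm the real path with ansible localhost -m ansible.builtin.setup -a 'filter=ansible_python' (look at ansible_python.executable), then run <that python> -m pip install kubernetes.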

2. apply: true keeps reporting changed: true A controller (autoscaler, admission webhook) is rewriting fields you specify. Either remove the field from your manifest and let the controller own it, or switch to server-side apply (server_side_apply with a field_manager name) so field ownership is tracked explicitly.

3. k8s_info returns empty list when the resource exists Check the api_version — listing pods is v1 (no group) but listing deployments is apps/v1. Wrong API version returns empty silently.

4. Helm install fails with “context deadline exceeded” The release’s resources did not become ready before the timeout, often because the chart installs CRDs, pulls large images, or waits on PersistentVolumes. Increase wait_timeout (for example 10m), or split the work into two tasks: install with wait: false, then poll the chart’s workloads with k8s_info and until.

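5. “Unable to recognize” error on a CRD The CRD wasn’t registered before the custom resource was applied. Either install the CRD in a separate task with wait: true and a wait_condition of type Established, or use Helm (which installs a chart’s CRDs before its templates). kubectl’s --validate=false flag does not help here, because skipping validation cannot make a missing CRD appear.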

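6. Multi-cluster play fails with unreachable hosts, or hits the same cluster repeatedly Unreachable hosts mean you forgot delegate_to: localhost (or ansible_connection: local): without it, Ansible tries to SSH to the inventory hostname, which isn’t a real host in a cluster inventory. Every host changing the same cluster means the task is missing context: "{{ k8s_context }}" and is falling back to the kubeconfig’s current context.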

7. RBAC errors when running from inside the cluster The pod’s ServiceAccount doesn’t have the required RBAC. Create a ClusterRole / Role with the verbs you need (get, list, create, update, delete) on the resources you touch, and bind it to the ServiceAccount.
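The CRD ordering problem from mistake 5 can be handled with two tasks. A minimal sketch, where both file paths are hypothetical placeholders for your own CRD and custom resource manifests:

```yaml
- name: Install the CRD and wait until the API server serves it
  kubernetes.core.k8s:
    state: present
    src: files/widgets-crd.yaml        # hypothetical CRD manifest
    wait: true
    wait_condition:
      type: Established
      status: "True"
    wait_timeout: 60

- name: Create a custom resource of that kind
  kubernetes.core.k8s:
    state: present
    src: files/widget-sample.yaml      # hypothetical custom resource manifest
```

Waiting on the Established condition is more reliable than a fixed pause, since registration time varies with API server load.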

Best Practices

Security Notes

Q&A — 13 Questions

Q1. Does kubernetes.core.k8s use kubectl under the hood? No. It uses the kubernetes Python client to talk directly to the K8s API. No kubectl binary required on either control node or target.

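Q2. What’s the difference between state: present and apply: true? state: present on its own creates the object if missing and patches it if it exists. state: present + apply: true follows kubectl apply semantics: the module compares your definition with the last applied configuration, so removed fields are pruned and untouched fields are left alone. Add server_side_apply (with a field_manager) to have the API server perform the merge and track field ownership.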

Q3. Should I use Helm or templated manifests? Helm for distributed packages (Prometheus, ArgoCD, ingress controllers) — anyone with Helm can install them. Templated manifests for your own apps where you don’t need Helm’s chart distribution model.

Q4. Can I use Ansible and ArgoCD on the same cluster? Yes — and you should. Ansible manages cluster baseline (namespaces, RBAC, ingress controllers, ArgoCD itself). ArgoCD manages app delivery (your workloads). They don’t fight if you draw the line clearly.

Q5. What’s the difference between k8s_drain and kubectl drain? None functionally. k8s_drain is the Ansible wrapper. Use it when you want drain-then-do-something workflows in a playbook, e.g. drain a node, run package upgrades, uncordon.

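Q6. How do I roll back a Helm release? The helm module’s release_state accepts only present and absent, so there is no built-in rollback state. Set atomic: true on the install/upgrade task so a failed upgrade rolls back automatically, or call helm rollback <release> <revision> through ansible.builtin.command. Use helm_info to inspect the current revision and status.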

Q7. Can I template a manifest that includes secrets? Yes — pull values from Ansible Vault ({{ vault_db_password }}) and render into a Secret resource. The rendered manifest is in-memory only (Ansible doesn’t write it to disk unless you do).

Q8. What if a CRD doesn’t exist when I try to apply a CR? k8s will fail. Apply the CRD first (Helm, k8s with the CRD definition, or operator-installed), wait for it (k8s_info with until), then apply the CR.

Q9. How do I run a one-off Job and wait for it to complete? Apply the Job manifest with state: present, wait: true, and a wait_condition of type Complete with status "True", plus a generous wait_timeout. Polling k8s_info with until on status.succeeded also works when you need custom logic.

Q10. Can I use Ansible to provision the cluster itself? Yes, in two ways. Cloud collections ship cluster modules (for example community.aws.eks_cluster), or Ansible can call Terraform/Pulumi (for instance through the community.general.terraform module) to create the cluster and then use kubernetes.core to manage objects inside it. Many teams prefer the Terraform route because a cluster also needs networking, IAM, and node groups that Terraform tracks in state.

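Q11. What’s the right way to manage ConfigMaps with binary data? Use kubernetes.core.k8s with definition.binaryData.<key> set to the base64-encoded file content. For files that are valid UTF-8, "{{ lookup('file', 'path') | b64encode }}" works; for arbitrary bytes, read the file with ansible.builtin.slurp, whose content field is already base64. Don’t use data: for binary content, since it is UTF-8 only.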

Q12. How does Ansible handle K8s API rate limiting? Ansible adds no throttling of its own, so a play that applies manifests across many forks can run into the API server’s rate limits. For high-volume operations (apply 1000 manifests), reduce parallelism with serial: 1 on the play, throttle: 1 on the task, or --forks 1 on the command line.

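Q13. Should I use the --dry-run=server mode? Yes. kubernetes.core.k8s has no dry_run parameter; instead, run the playbook with --check (or set check_mode: true on a task). In check mode the module asks the API server for a dry run where the cluster supports it, so the request passes through validation and admission controllers without being persisted. Add --diff to see what would change.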
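For the Job case in Q9, one option is to let the k8s module wait on the Job’s Complete condition. This is a sketch only: the Job name and migration command are hypothetical, and the image simply reuses the one from the earlier patch example.

```yaml
- name: Run a migration Job and wait for completion
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: db-migrate                # hypothetical name
        namespace: payments
      spec:
        backoffLimit: 2
        template:
          spec:
            restartPolicy: Never
            containers:
              - name: migrate
                image: registry.example.com/api:v1.2.3
                command: ["/app/migrate"]   # hypothetical command
    wait: true
    wait_condition:
      type: Complete
      status: "True"
    wait_timeout: 600
```

If the Job fails, the Complete condition never becomes true and the task fails once wait_timeout expires, so size the timeout to the longest run you expect.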

Quick Check

  1. Which collection ships the k8s module?
  2. What does apply: true do?
  3. What’s the equivalent of kubectl get pods in Ansible?
  4. How do you scale a deployment to 5 replicas?
  5. What’s the recommended pattern for multi-cluster plays?
  6. How do you wait for a Helm release to be ready?
  7. What’s the difference between force: true and state: patched?
  8. When is delegate_to: localhost required for K8s plays?

Exercise

Build a complete role cluster_baseline that:

  1. Creates standard namespaces (monitoring, logging, ingress-nginx, cert-manager, argocd).
  2. Applies baseline NetworkPolicies (default-deny, allow-monitoring).
  3. Installs cert-manager via Helm with a Let’s Encrypt ClusterIssuer.
  4. Installs ingress-nginx via Helm with cloud-specific load balancer annotations (parameterize by cloud_provider: aws|gcp|azure).
  5. Installs argocd via Helm with an admin OIDC integration.
  6. Applies a baseline ResourceQuota and LimitRange to every namespace.
  7. Includes a validate.yml task list that confirms each component is healthy.

Test it on a kind cluster locally, then deploy it to two real cloud clusters (one EKS, one GKE) and confirm the same role bootstraps both with only the cloud_provider var changing.

Cert Mapping

Glossary

Next Steps

You can now drive Kubernetes from Ansible — manifests, Helm releases, custom resources, multi-cluster — and integrate cleanly with GitOps tools. The next lesson covers Ansible for containers (Docker and Podman): the community.docker and containers.podman collections, Compose-style multi-container deployments, container image building, registry pushing, and the patterns that let Ansible drive container hosts the way it drives any other infrastructure.
