Building a Reusable GitHub Actions Platform: Composite Actions, Reusable Workflows, and Org-Wide Standards

Every org with more than a handful of repos eventually drowns in copy-pasted CI YAML: the same Node setup, the same actions/checkout, the same brittle deploy block duplicated 200 times with subtle drift. This guide shows how to build a versioned, governed GitHub Actions platform that hundreds of teams consume without you becoming a bottleneck.

1. The copy-paste pipeline problem

When each repo owns a full copy of its .github/workflows/ci.yml, you have no leverage. A CVE in a third-party action means 200 pull requests. A new mandatory SBOM step means 200 more. Worse, every copy diverges, so “our CI” stops meaning anything concrete.

A workflow platform solves four things: deduplication (one source of truth per pipeline shape), governance (you can mandate a step org-wide), safe change (semantic versioning so consumers opt into upgrades), and least-privilege auth (centralized OIDC instead of long-lived secrets sprayed across repos).

2. Choosing the right abstraction

GitHub gives you three building blocks. Picking the wrong one is the most common early mistake.

Abstraction | What it is | Use when
Composite action | A bundle of steps that runs inside a job | You want to reuse a sequence of steps (setup, cache, login) within a caller’s job
Reusable workflow | An entire workflow called via workflow_call | You want to own whole jobs: build, test, deploy, with their own runners and permissions
Starter workflow | A template copied into a repo once | You want a starting point teams then own and edit themselves

The mental model: composite actions are functions you call from a step; reusable workflows are jobs you call from a workflow. Starter workflows are scaffolding you hand off and forget. For a governed platform you mostly want reusable workflows (to own the pipeline) plus composite actions (to share step-level logic inside them).

Callout: A reusable workflow can call composite actions, but a composite action cannot call a reusable workflow. Compose downward, not upward.

3. A versioned org-level .github repo

GitHub treats a repo literally named .github in your org as a special home for org defaults. Create one and lay it out so reusable workflows and shared actions live together.

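# Create the org-level repo and clone it locally
# (private repo: allow access from org repos under Settings > Actions > General > Access)
gh repo create my-org/.github --private --clone
cd .github
# Reusable workflows must sit directly in .github/workflows/ (no subdirectories)
mkdir -p .github/workflows
# Composite actions: one directory per action, each containing an action.yml
mkdir -p actions/setup-node-build
# Starter workflows surfaced in the org's "New workflow" UI
mkdir -p workflow-templates
git checkout -b main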

Note the distinction: files under .github/workflows/ in this repo are the reusable workflows other repos call. Files under workflow-templates/ are starter workflows surfaced in the org’s “New workflow” UI. They are not the same thing.

A reusable workflow declares a workflow_call trigger:

# .github/workflows/node-ci.yml
name: node-ci
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: "20"
      run-lint:
        type: boolean
        default: true
    secrets:
      NPM_TOKEN:
        required: false
    outputs:
      image-tag:
        description: "Built image tag"
        value: ${{ jobs.build.outputs.image-tag }}

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    outputs:
      image-tag: ${{ steps.meta.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      - uses: my-org/.github/actions/setup-node-build@v1
        with:
          node-version: ${{ inputs.node-version }}
      - if: ${{ inputs.run-lint }}
        run: npm run lint
      - run: npm test
      - id: meta
        run: echo "tag=sha-${GITHUB_SHA::12}" >> "$GITHUB_OUTPUT"

The composite action it references:

# actions/setup-node-build/action.yml
name: "Setup Node and build deps"
description: "Checkout-agnostic Node setup with cache"
inputs:
  node-version:
    description: "Node major version"
    required: true
runs:
  using: "composite"
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
        cache: "npm"
    - run: npm ci
      shell: bash

Callout: Every run step in a composite action must declare shell:. This is the single most common composite-action failure, and the error message is not obvious.

Semantic tags and a deprecation policy

Consumers should pin to a moving major tag (@v1) that you advance, while you also publish immutable patch tags (@v1.4.2) for teams that want to freeze. Maintain the major tag as a sliding pointer:

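# Immutable patch tag on the release commit
git tag -a v1.4.2 -m "node-ci: add SBOM step"
# Move the sliding major to the same commit (^{} peels the annotated tag)
git tag -fa v1 -m "advance v1 -> v1.4.2" "v1.4.2^{}"
git push origin v1.4.2
# Only the sliding major tag is ever force-pushed
git push origin v1 --force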
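Deciding which patch tag the sliding major should point at is easy to get wrong by sorting tags as strings (v1.10.0 sorts before v1.9.3). A minimal sketch of the comparison, assuming tags follow the vMAJOR.MINOR.PATCH shape used above; the function name `latest_patch_for_major` is hypothetical:

```python
import re

TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")

def latest_patch_for_major(tags, major):
    """Return the highest vMAJOR.MINOR.PATCH tag for the given major, or None."""
    best = None
    for tag in tags:
        match = TAG_RE.match(tag)
        if not match:
            continue  # skips sliding tags like "v1" and anything non-semver
        version = tuple(int(part) for part in match.groups())
        if version[0] != major:
            continue
        # Tuples compare numerically, so (1, 10, 0) beats (1, 9, 3)
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None

tags = ["v1", "v1.4.2", "v1.10.0", "v1.9.3", "v2.0.0", "nightly"]
print(latest_patch_for_major(tags, 1))  # v1.10.0
```

A release script could use the result as the target when it force-moves the major tag.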

Publish a written policy: major tags get 90 days of support after the next major ships; breaking changes only land on a new major; deprecations are announced via a pinned discussion and an annotation emitted from the workflow itself:

- run: echo "::warning::node-ci v1 is deprecated; migrate to v2 by 2026-09-01"

4. Inputs, secrets, and outputs across workflow_call

The boundary is strict and that is a feature. A called workflow sees only what the caller explicitly passes.

# consumer repo: .github/workflows/ci.yml
name: ci
on: [push, pull_request]

jobs:
  ci:
    uses: my-org/.github/.github/workflows/node-ci.yml@v1
    with:
      node-version: "20"
    secrets:
      NPM_TOKEN: ${{ secrets.NPM_TOKEN }}
    permissions:
      contents: read

Three rules that bite people:

Outputs flow back through the outputs: map you declared, and downstream jobs read them via needs:

  deploy:
    needs: ci
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploying ${{ needs.ci.outputs.image-tag }}"

5. Enforcing standards with rulesets and CODEOWNERS

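A platform nobody is required to use is just a library. Repository rulesets configured at the org level can require a workflow to pass before merging, which forces a workflow from the platform repo to run on every PR in scope, even if the target repo has no workflow file of its own. The required workflow needs its own pull_request (or merge_group) trigger, so in practice it is a thin workflow in the platform repo that calls the reusable one.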

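In Org Settings -> Rules -> Rulesets, create a branch ruleset targeting your default branches that enables the “Require workflows to pass before merging” rule and points it at the shared workflow in the platform repo.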

Gate changes to the platform repo itself with CODEOWNERS so only the platform team can alter shared workflows:

# .github/CODEOWNERS
/.github/workflows/   @my-org/platform-team
/actions/             @my-org/platform-team

Callout: Required workflows run with the consumer repo’s context and token. Keep them fast and side-effect-free, because they execute on every single PR across the org.

6. Keyless auth with OIDC to Azure and AWS

Stop storing cloud credentials in repo secrets. With OIDC, GitHub mints a short-lived token per run, and the cloud exchanges it for temporary credentials scoped to that repo and branch. Any job using it needs id-token: write.
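When a federation rule rejects a run, it helps to look at the claims the token actually carried. The sketch below decodes a JWT payload for inspection only. It does not verify the signature, and the sample token is built locally as a hypothetical stand-in for one GitHub would mint:

```python
import base64
import json

def decode_jwt_claims(token):
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))

def _b64url(obj):
    """Helper used only to build the illustrative sample token."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Hypothetical token assembled for illustration; a real one is issued per run.
sample = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"sub": "repo:my-org/my-service:ref:refs/heads/main",
             "aud": "sts.amazonaws.com"}),
    "signature",
])
claims = decode_jwt_claims(sample)
print(claims["sub"])
```

The `sub` and `aud` values printed this way are the strings the cloud side compares against its federation rule.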

Azure via a federated credential on an app registration:

az ad app federated-credential create \
  --id "$APP_OBJECT_ID" \
  --parameters '{
    "name": "github-main",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:my-org/my-service:ref:refs/heads/main",
    "audiences": ["api://AzureADTokenExchange"]
  }'

The deploy job then logs in without any stored secret:

  deploy-azure:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      - run: az group list -o table

AWS via an IAM OIDC identity provider and a role whose trust policy pins the sub claim:

{
  "Effect": "Allow",
  "Principal": { "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com" },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": { "token.actions.githubusercontent.com:aud": "sts.amazonaws.com" },
    "StringLike": { "token.actions.githubusercontent.com:sub": "repo:my-org/my-service:ref:refs/heads/main" }
  }
}

A deploy step then assumes the role:

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-deploy
          aws-region: us-east-1

The win for a platform: centralize this in the shared deploy workflow once. Each consumer only supplies its own client ID or role ARN as a repo variable, and the federation subject/sub condition enforces that repo X can only assume role X.

Callout: Scope the subject/sub claim as tightly as you can. repo:my-org/* trusts the entire org; pin to a specific repo, branch, or environment: instead.
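To see how much a wildcard widens trust, the sketch below approximates a StringLike-style match, where `*` matches any run of characters and `?` matches exactly one. It is an illustration of the matching idea under that assumption, not the cloud provider's implementation, and `string_like` is a hypothetical helper:

```python
import re

def string_like(pattern, value):
    """Approximate wildcard match: '*' = any run of characters, '?' = exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value) is not None

main_push = "repo:my-org/my-service:ref:refs/heads/main"
other_repo = "repo:my-org/unrelated-repo:ref:refs/heads/feature"

broad = "repo:my-org/*"
tight = "repo:my-org/my-service:ref:refs/heads/main"

print(string_like(broad, main_push), string_like(broad, other_repo))  # True True
print(string_like(tight, main_push), string_like(tight, other_repo))  # True False
```

The broad pattern accepts a feature branch in an unrelated repo, which is exactly the exposure the callout warns about.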

7. Versioning, testing, and releasing

Test workflows locally before tagging. act runs jobs in Docker against a chosen event:

act pull_request -j build --container-architecture linux/amd64

act does not perfectly emulate workflow_call chaining or OIDC token minting, so back it with a real integration smoke test: a throwaway consumer repo that pins @main, runs the full pipeline against live runners on a schedule, and fails loudly on drift.

Lint the YAML in the platform repo’s own CI:

npm install -g @action-validator/cli
action-validator .github/workflows/node-ci.yml

Cut releases deterministically. Conventional commits plus a release step that advances both the patch tag and the sliding major keeps the contract honest. Treat the major-tag move as the actual “ship” event, since that is what most consumers track.
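One way to make the release step deterministic is to derive the next version purely from commit messages. This is a simplified sketch of the conventional-commits convention (a `!` marker or a `BREAKING CHANGE` note bumps major, `feat` bumps minor, anything else bumps patch). The function name `next_version` is hypothetical, and real release tooling handles more cases:

```python
import re

def next_version(current, commit_messages):
    """Pick the next vMAJOR.MINOR.PATCH from conventional-commit messages."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    bump = "patch"
    for message in commit_messages:
        # "type!:" or "type(scope)!:" in the subject, or a BREAKING CHANGE note
        if re.match(r"^\w+(\([^)]*\))?!:", message) or "BREAKING CHANGE" in message:
            return f"v{major + 1}.0.0"
        if re.match(r"^feat(\([^)]*\))?:", message):
            bump = "minor"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"

print(next_version("v1.4.2", ["fix: pin checkout action"]))         # v1.4.3
print(next_version("v1.4.2", ["feat: add SBOM step", "fix: typo"]))  # v1.5.0
print(next_version("v1.4.2", ["feat!: drop Node 16 support"]))       # v2.0.0
```

Only the first two outcomes should ever move the sliding v1 tag; a major bump starts a new sliding tag instead.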

Enterprise scenario

A fintech platform team rolled out a shared node-ci.yml to ~180 repos pinned at @v1. Weeks later, deploys to a regulated workload started failing intermittently with AssumeRoleWithWebIdentity errors, but only on PRs from forks and on release/* branches. The trust policy pinned sub to repo:org/svc:ref:refs/heads/main, so anything off main got no credentials, and the failure surfaced inside the consumer’s job context, making it look like a per-repo problem rather than a platform one.

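The real gotcha: they had assumed id-token: write and a single sub condition covered every trigger. It did not. The OIDC sub claim format differs by trigger: a branch push yields repo:org/svc:ref:refs/heads/<branch>, a tag yields repo:org/svc:ref:refs/tags/<tag>, a pull_request run yields repo:org/svc:pull_request, and a job bound to an environment yields repo:org/svc:environment:<name>. Fork PRs intentionally receive a read-only token with no id-token write capability at all, by GitHub design.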

The fix was to stop matching on branch refs and key the trust on the deployment environment instead, which GitHub stamps into the claim and which forks can never assume:

{
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
      "token.actions.githubusercontent.com:sub": "repo:org/svc:environment:production"
    }
  }
}

Consumers then gated the deploy job with environment: production, which also forced required-reviewer approval before any token was minted. Fork PRs cleanly skipped deploy instead of erroring. One trust-policy claim, scoped to an environment rather than a ref, removed an entire class of confusing cross-repo failures.
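The claim formats behind this failure can be sketched in a few lines. The helper below mirrors GitHub's default subject formats for the common cases (environment, pull request, branch or tag ref). The name `oidc_subject` is hypothetical, and customized claim templates are out of scope:

```python
def oidc_subject(repo, *, ref=None, environment=None, is_pull_request=False):
    """Build the default `sub` claim for a job (default claim template assumed)."""
    if environment:
        # A job bound to an environment is identified by it, whatever the ref
        return f"repo:{repo}:environment:{environment}"
    if is_pull_request:
        return f"repo:{repo}:pull_request"
    if ref:
        return f"repo:{repo}:ref:{ref}"
    raise ValueError("need a ref, an environment, or is_pull_request=True")

trusted = "repo:org/svc:ref:refs/heads/main"  # the original, ref-based trust
cases = {
    "push to main": oidc_subject("org/svc", ref="refs/heads/main"),
    "push to release/1.2": oidc_subject("org/svc", ref="refs/heads/release/1.2"),
    "pull_request run": oidc_subject("org/svc", is_pull_request=True),
    "deploy via environment": oidc_subject(
        "org/svc", ref="refs/heads/main", environment="production"
    ),
}
for name, sub in cases.items():
    print(f"{name}: {sub} -> {'allowed' if sub == trusted else 'denied'}")
```

Only the plain push to main matches the ref-based trust. Keying the trust on `repo:org/svc:environment:production` instead admits the environment-gated deploy from any branch and nothing else.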

Verify

# 1. Confirm the reusable workflow resolves and a run was created
gh workflow list --repo my-org/my-service
gh run list --repo my-org/my-service --workflow ci --limit 3

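# 2. Inspect the latest run; confirm the called workflow + OIDC job executed
RUN_ID=$(gh run list --repo my-org/my-service --workflow ci --limit 1 --json databaseId --jq '.[0].databaseId')
gh run view "$RUN_ID" --repo my-org/my-service --log | grep -E "node-ci|id-token|AssumeRole"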

# 3. Confirm the required workflow is enforced by the ruleset
gh api repos/my-org/my-service/rules/branches/main \
  --jq '.[].type'

# 4. Verify the major tag points where you expect (^{} peels annotated tags to their commit)
git ls-remote --tags https://github.com/my-org/.github 'v1^{}' 'v1.4.2^{}'

A green run, a non-empty rules list including the required workflow, and a v1 tag pointing at your latest patch SHA mean the platform is wired correctly.

Rollout checklist

Rollout strategy: migrating 50+ repos safely

Never flip the whole org at once. Stage it:

  1. Pilot (3-5 repos): the platform team’s own repos consume @v1. Shake out edge cases where it costs you, not other teams.
  2. Opt-in wave: announce, document, and let willing teams migrate. Provide a one-PR migration that deletes their old YAML and adds the uses: call. Automate it with a script that opens PRs in bulk via gh.
  3. Required wave: enable the required-workflow ruleset on increasing scopes (by team, then org-wide). Run it in a non-blocking mode first if you can, watching failure rates before making it a hard gate.
  4. Cleanup: delete starter-workflow leftovers and dead secrets once OIDC is universal.

Keep both the old and new path working during each wave. The moment migration becomes all-or-nothing, teams stop trusting the platform.
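The bulk-PR script for the opt-in wave can start as a dry-run command generator, so the platform team reviews exactly what will run before anything touches a repo. This is a sketch under stated assumptions: the branch name, commit message, and PR text are hypothetical, and the step that writes the new workflow file is left out.

```python
import shlex

# The caller workflow each migrated repo ends up with (same shape as section 4)
CALLER_WORKFLOW = """\
name: ci
on: [push, pull_request]

jobs:
  ci:
    uses: my-org/.github/.github/workflows/node-ci.yml@v1
    with:
      node-version: "20"
"""

def migration_commands(repo, branch="platform/migrate-to-node-ci"):
    """Return the shell commands for one repo's migration PR (nothing is executed)."""
    workdir = shlex.quote(repo.split("/")[-1])
    repo_q, branch_q = shlex.quote(repo), shlex.quote(branch)
    title = shlex.quote("ci: adopt shared node-ci workflow")
    body = shlex.quote("Replaces the local CI YAML with a call to the shared workflow.")
    return [
        f"gh repo clone {repo_q} {workdir}",
        f"git -C {workdir} switch -c {branch_q}",
        # Overwriting .github/workflows/ci.yml with CALLER_WORKFLOW goes here (omitted)
        f"git -C {workdir} commit -am {title}",
        f"git -C {workdir} push -u origin {branch_q}",
        f"gh pr create --repo {repo_q} --head {branch_q} --title {title} --body {body}",
    ]

for command in migration_commands("my-org/my-service"):
    print(command)
```

Printing the commands first keeps the pilot wave reviewable. The same list can later be fed to a runner once the output looks right.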

Pitfalls

Build the contract first, version it like an API, and let teams upgrade on their own schedule. That is the difference between a platform people adopt and a mandate they route around.
