Standing Up Backstage as an Internal Developer Portal: Catalog, Software Templates, and TechDocs

Backstage demos beautifully and rots quietly. A platform team can have a portal running in an afternoon and a catalog full of stale, orphaned entities six months later. The thing that separates a useful internal developer portal from an abandoned tab is not the frontend; it is the discipline in the catalog model, the quality of the golden-path templates, and a docs pipeline that nobody has to think about. This is the build I reach for when a platform team has to stand up Backstage for real and keep it honest.

1. The architecture you are actually operating

Backstage is a monorepo of a React frontend app and a Node backend, glued by an app-config.yaml and a set of plugins. Since the deprecation of the legacy backend, the backend is a single process assembled from backend plugins and backend modules wired through a dependency-injection system. You do not edit a giant index.ts of createRouter calls anymore; you add() features to a createBackend() instance.

// packages/backend/src/index.ts (new backend system)
import { createBackend } from '@backstage/backend-defaults';

const backend = createBackend();

backend.add(import('@backstage/plugin-app-backend'));
backend.add(import('@backstage/plugin-catalog-backend'));
backend.add(import('@backstage/plugin-catalog-backend-module-github'));
backend.add(import('@backstage/plugin-scaffolder-backend'));
backend.add(import('@backstage/plugin-techdocs-backend'));
backend.add(import('@backstage/plugin-auth-backend'));
backend.add(import('@backstage/plugin-auth-backend-module-github-provider'));
backend.add(import('@backstage/plugin-permission-backend'));

backend.start();

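The mental model that matters: the frontend is a thin shell, and every capability (catalog, scaffolder, TechDocs, auth, permissions) is a backend plugin you add(), with modules such as catalog-backend-module-github extending a plugin rather than replacing it.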

Scaffold the app with the official CLI rather than cloning a starter — it gives you the current backend system out of the box:

npx @backstage/create-app@latest --path backstage
cd backstage
yarn install
yarn dev   # frontend on :3000, backend on :7007

Resist the urge to fork and customize the frontend App.tsx heavily on day one. The leverage in Backstage is in the catalog and the scaffolder, not in bespoke React. Keep the frontend close to stock so platform upgrades stay a yarn backstage-cli versions:bump and not a merge conflict marathon.

2. Modeling the catalog: get the entity kinds right

The catalog is a graph of typed entities. Get the kinds and relations right and everything else (ownership, on-call routing, dependency views) falls out for free. Get them wrong and you have a glorified spreadsheet.

The kinds you will actually use:

| Kind | Models | Key relations |
| --- | --- | --- |
| Component | A deployable unit of software (a service, a website, a library) | ownedBy, partOf (System), providesApi, dependsOn |
| API | A boundary contract (OpenAPI, gRPC, AsyncAPI) | apiProvidedBy, apiConsumedBy |
| System | A collection of components and resources with a shared purpose | hasPart, ownedBy |
| Resource | Infrastructure a component needs (a database, a bucket, a queue) | dependencyOf, partOf (System) |
| Group / User | Org structure, almost always synced from your IdP | memberOf, hasMember |
| Domain | A bounded area the business cares about | hasPart (Systems) |

The discipline: every Component has an owner that resolves to a real Group, and belongs to a System. An entity with owner: guests or no system is a smell. A catalog-info.yaml that earns its place:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-api
  title: Checkout API
  description: Handles cart-to-order conversion and payment intents.
  annotations:
    github.com/project-slug: acme/checkout-api
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/integration-key: <key>
  tags:
    - go
    - tier1
spec:
  type: service
  lifecycle: production
  owner: group:default/payments-team
  system: commerce
  providesApis:
    - checkout-api-v2
  dependsOn:
    - resource:default/checkout-postgres
---
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: checkout-api-v2
spec:
  type: openapi
  lifecycle: production
  owner: group:default/payments-team
  system: commerce
  definition:
    $text: ./openapi.yaml

The lifecycle field is not decoration — it drives filtering and lets you visually retire things. Use experimental, production, and deprecated consistently; a deprecated component that still shows as healthy is how teams keep calling dead APIs.
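The ownership and system discipline is easy to check mechanically. Here is a minimal sketch of such a lint, assuming you already have entities loaded as plain objects; the `EntityLike` shape, `lintComponent`, and the `knownGroups` set are hypothetical names for illustration, not part of any Backstage API.

```typescript
// Minimal entity shape for this sketch; real catalog entities carry many more fields.
interface EntityLike {
  kind: string;
  metadata: { name: string };
  spec?: { owner?: string; system?: string; lifecycle?: string };
}

const LIFECYCLES = ['experimental', 'production', 'deprecated'];

// Returns human-readable problems; an empty list means the entity passes.
// `knownGroups` holds full group refs such as "group:default/payments-team".
function lintComponent(entity: EntityLike, knownGroups: Set<string>): string[] {
  const problems: string[] = [];
  if (entity.kind !== 'Component') return problems; // only Components are checked here

  const { owner, system, lifecycle } = entity.spec ?? {};
  if (!owner) {
    problems.push('missing spec.owner');
  } else if (!knownGroups.has(owner)) {
    problems.push(`owner ${owner} does not resolve to a known Group`);
  }
  if (!system) problems.push('missing spec.system');
  if (!lifecycle || LIFECYCLES.indexOf(lifecycle) === -1) {
    problems.push(`lifecycle must be one of ${LIFECYCLES.join(', ')}`);
  }
  return problems;
}

const groups = new Set(['group:default/payments-team']);
const good: EntityLike = {
  kind: 'Component',
  metadata: { name: 'checkout-api' },
  spec: { owner: 'group:default/payments-team', system: 'commerce', lifecycle: 'production' },
};
const bad: EntityLike = {
  kind: 'Component',
  metadata: { name: 'mystery-service' },
  spec: { owner: 'guests', lifecycle: 'production' },
};

console.log(lintComponent(good, groups)); // passes: no problems reported
console.log(lintComponent(bad, groups)); // flags the unresolved owner and the missing system
```

Something along these lines could run in CI against each repo's catalog-info.yaml, so a bad owner fails the build instead of surfacing months later.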

3. Ingesting entities: discovery, not registration-by-hand

Manually registering each catalog-info.yaml through the UI does not scale past a demo. The catalog ingests entities through providers and processors that run on an interval. The two patterns you want:

GitHub discovery crawls an org for catalog-info.yaml files and registers what it finds. Add the catalog module and configure discovery in app-config.yaml:

catalog:
  providers:
    github:
      acmeOrg:
        organization: 'acme'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }

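Org ingestion pulls Users and Groups from your IdP so owner: group:default/payments-team actually resolves. The GitHub org provider mirrors teams and members; it ships as its own module, so add @backstage/plugin-catalog-backend-module-github-org to the backend alongside the discovery module. Its config is a single object (or a list of them) directly under githubOrg, not a keyed map:

catalog:
  providers:
    githubOrg:
      id: production
      githubUrl: 'https://github.com'
      orgs: ['acme']
      schedule:
        frequency: { hours: 1 }
        timeout: { minutes: 5 }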

For Entra ID / Okta, swap in @backstage/plugin-catalog-backend-module-msgraph or the community Okta provider — the shape is identical, only the config block changes. The non-negotiable: org data comes from one authoritative source on a schedule, never hand-maintained YAML. A User that has left the company should disappear from the catalog within one sync, which means orphaned ownership surfaces automatically.

Static catalog.locations entries are fine for the handful of platform-owned entities (your Systems, Domains, and shared Resources). Everything a service team owns should arrive through discovery so the source of truth is the code repo, not the portal.
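To make "disappear within one sync" concrete, here is a rough sketch of the comparison a scheduled sync implies: whatever the authoritative source no longer returns is removed, and any ownership pointing at a removed ref becomes visible. This is a toy model, not how the catalog's providers are implemented; `computeSyncDelta`, `findDanglingOwners`, and the sample refs are all made up for illustration.

```typescript
interface SyncDelta {
  added: string[];
  removed: string[];
}

// Compares the previous snapshot of entity refs with what the source returns now.
function computeSyncDelta(previous: string[], current: string[]): SyncDelta {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter(ref => !before.has(ref)),
    removed: previous.filter(ref => !after.has(ref)),
  };
}

interface Ownership {
  component: string;
  owner: string; // full entity ref of the owning User or Group
}

// Components whose owner no longer exists after the sync: the orphaned-ownership signal.
function findDanglingOwners(ownerships: Ownership[], liveRefs: string[]): string[] {
  const live = new Set(liveRefs);
  return ownerships.filter(o => !live.has(o.owner)).map(o => o.component);
}

const lastSync = ['group:default/payments-team', 'user:default/alice', 'user:default/bob'];
const thisSync = ['group:default/payments-team', 'user:default/alice', 'group:default/search-team'];

const delta = computeSyncDelta(lastSync, thisSync);
const dangling = findDanglingOwners(
  [
    { component: 'checkout-api', owner: 'group:default/payments-team' },
    { component: 'legacy-batch', owner: 'user:default/bob' },
  ],
  thisSync,
);

console.log(delta); // bob is removed, search-team is added
console.log(dangling); // legacy-batch is now owned by nobody
```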

4. Authoring Software Templates: the golden path as code

A Software Template (kind: Template) is the scaffolder’s unit of work: a set of parameters rendered as a form, then a sequence of steps that run actions. The built-in actions cover most of the path — fetch a skeleton, template it with the form values, publish a repo, register it back into the catalog.

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-service
  title: Go Service (golden path)
  description: Provisions a production-ready Go service with CI, TechDocs, and catalog registration.
  tags: [recommended, go]
spec:
  owner: group:default/platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, owner]
      properties:
        name:
          title: Name
          type: string
          pattern: '^[a-z0-9-]+$'
        owner:
          title: Owner
          type: string
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group
        system:
          title: System
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter:
              kind: System
  steps:
    - id: fetch
      name: Fetch skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}

    - id: publish
      name: Publish to GitHub
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
        defaultBranch: main
        repoVisibility: internal
        protectDefaultBranch: true

    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: '/catalog-info.yaml'

  output:
    links:
      - title: Repository
        url: ${{ steps.publish.output.remoteUrl }}
      - title: Open in catalog
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}

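The skeleton directory is rendered with Nunjucks. By default fetch:template processes every file in the skeleton, replacing ${{ values.name }} expressions (note values, not parameters) with the step's values input, in file contents as well as in file and directory names; set templateFileExtension: true if you want only files ending in .njk rendered. That is how catalog-info.yaml in the skeleton ends up correctly owned and systemed without the developer touching it.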
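If the interpolation step feels abstract, this toy renderer shows the idea. It is deliberately not Nunjucks: it only swaps `${{ values.key }}` placeholders for strings, with none of the filters or control flow the real engine supports. `renderSkeleton` and the sample skeleton are assumptions made up for the sketch.

```typescript
type TemplateValues = Record<string, string>;

// Replaces every ${{ values.key }} placeholder; fails loudly on a missing value
// so a half-rendered catalog-info.yaml never reaches the new repo.
function renderSkeleton(text: string, values: TemplateValues): string {
  return text.replace(/\$\{\{\s*values\.(\w+)\s*\}\}/g, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) throw new Error(`no value supplied for "${key}"`);
    return value;
  });
}

// What a skeleton catalog-info.yaml might look like before rendering.
const skeletonCatalogInfo = [
  'apiVersion: backstage.io/v1alpha1',
  'kind: Component',
  'metadata:',
  '  name: ${{ values.name }}',
  'spec:',
  '  type: service',
  '  lifecycle: experimental',
  '  owner: ${{ values.owner }}',
  '  system: ${{ values.system }}',
].join('\n');

const rendered = renderSkeleton(skeletonCatalogInfo, {
  name: 'checkout-api',
  owner: 'group:default/payments-team',
  system: 'commerce',
});

console.log(rendered);
```

The point of the sketch is the data flow: form parameters become the step's values, and values become the owner and system lines in the file the catalog later ingests.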

Custom actions live in a backend module

When the built-in actions run out — you need to open a Jira ticket, register a service in PagerDuty, or call an internal provisioning API — you write a custom action as a backend module. Do not fork the scaffolder; extend it.

// plugins/scaffolder-backend-module-acme/src/actions/registerOnCall.ts
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';

export const registerOnCallAction = () =>
  createTemplateAction<{ service: string; team: string }>({
    id: 'acme:oncall:register',
    schema: {
      input: {
        type: 'object',
        required: ['service', 'team'],
        properties: {
          service: { type: 'string' },
          team: { type: 'string' },
        },
      },
    },
    async handler(ctx) {
      const { service, team } = ctx.input;
      ctx.logger.info(`Registering ${service} on-call for ${team}`);
      // call your provisioning / PagerDuty API here
      ctx.output('escalationPolicyId', 'PXXXXXX');
    },
  });

// register the action as a scaffolder module (a separate file that sits beside the actions/ directory)
import { createBackendModule } from '@backstage/backend-plugin-api';
import { scaffolderActionsExtensionPoint } from '@backstage/plugin-scaffolder-node/alpha';
import { registerOnCallAction } from './actions/registerOnCall';

export const scaffolderModuleAcme = createBackendModule({
  pluginId: 'scaffolder',
  moduleId: 'acme-actions',
  register(env) {
    env.registerInit({
      deps: { scaffolder: scaffolderActionsExtensionPoint },
      async init({ scaffolder }) {
        scaffolder.addActions(registerOnCallAction());
      },
    });
  },
});

Then backend.add(import('@internal/plugin-scaffolder-backend-module-acme')) in index.ts, and acme:oncall:register is available as a step.

5. Golden-path scaffolding that provisions real infrastructure

The template above creates a repo. A golden path provisions the whole vertical: repo, CI, and the cloud resources the service needs. Two patterns, and the choice matters.

Pattern A — the template commits IaC and lets your platform pipeline apply it. The skeleton ships a Terraform module that your CI plan/applies. This keeps a single source of truth (Git) and means the scaffolder never holds cloud credentials. It is the pattern I default to.

    - id: terraform-pr
      name: Open infra PR
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=acme&repo=infra-live
        branchName: provision-${{ parameters.name }}
        title: 'Provision infra for ${{ parameters.name }}'
        description: 'Adds Postgres + bucket for ${{ parameters.name }}'
        targetPath: 'services/${{ parameters.name }}'
        sourcePath: ./infra-skeleton

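The skeleton’s Terraform is plain HCL with the service name templated in. The pull-request action commits files as they sit in the workspace, so a fetch:template step needs to render ./infra-skeleton before it runs:

module "db" {
  source = "../../modules/postgres"
  name   = "${{ values.name }}"
  size   = "small"
  tier   = "tier1"
}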

Pattern B — the template calls a custom action that provisions synchronously. Faster feedback, but now the scaffolder needs cloud credentials and you own the rollback story when step 4 fails after step 3 created a database. Only reach for this when a same-session result is a hard requirement.

The failure mode that bites everyone: a multi-step template that is not idempotent. If publish:github succeeds and catalog:register fails, re-running creates a second repo. Make custom actions idempotent (check-then-create), and prefer the PR pattern so a human merge is the commit point. Treat scaffolder runs as best-effort orchestration, not a transaction.
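Check-then-create is simple enough to sketch. The version below assumes a hypothetical, synchronous `DatabaseApi` so it can run standalone; a real provisioning client would be async and would sit inside a custom action's handler. Note that check-then-create narrows the duplicate window but is not atomic, so the backing API should still enforce unique names.

```typescript
// Hypothetical stand-in for a provisioning API; not a real client library.
interface DatabaseApi {
  find(name: string): string | undefined; // returns the database id if it exists
  create(name: string): string; // creates a database and returns its id
}

interface EnsureResult {
  id: string;
  created: boolean;
}

// A re-run finds the existing resource instead of creating a second one.
function ensureDatabase(api: DatabaseApi, name: string): EnsureResult {
  const existing = api.find(name);
  if (existing !== undefined) return { id: existing, created: false };
  return { id: api.create(name), created: true };
}

// In-memory fake so the sketch runs without any cloud account.
class FakeDatabaseApi implements DatabaseApi {
  createCalls = 0;
  private store = new Map<string, string>();

  find(name: string): string | undefined {
    return this.store.get(name);
  }

  create(name: string): string {
    this.createCalls += 1;
    const id = `db-${this.store.size + 1}`;
    this.store.set(name, id);
    return id;
  }
}

const fakeApi = new FakeDatabaseApi();
const firstRun = ensureDatabase(fakeApi, 'checkout-postgres');
const secondRun = ensureDatabase(fakeApi, 'checkout-postgres'); // simulates re-running the template

console.log(firstRun, secondRun, fakeApi.createCalls);
```

The `created` flag is worth surfacing in the action's output so the template run log shows whether a step did work or found it already done.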

6. Publishing docs with TechDocs and a CI build pipeline

TechDocs renders MkDocs-built static sites inside Backstage, keyed off the backstage.io/techdocs-ref annotation. The decision that defines whether TechDocs scales is the generation strategy.

The local builder (Backstage generates docs on the fly) is fine for yarn dev and a disaster in production: every page view can trigger a build, and your portal needs MkDocs and the right plugins installed. The production pattern is to build docs in CI, publish the static site to object storage, and configure TechDocs to serve read-only from there:

techdocs:
  builder: 'external'        # Backstage does not build; it serves
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs'
      region: 'eu-west-1'

Each repo builds and pushes its own docs on merge using the techdocs-cli:

# .github/workflows/techdocs.yml
name: TechDocs
on:
  push:
    branches: [main]
jobs:
  publish:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm i -g @techdocs/cli
      - uses: actions/setup-python@v5
        with: { python-version: '3.x' }
      - run: pip install mkdocs-techdocs-core
      - run: techdocs-cli generate --no-docker --verbose
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/techdocs-publisher
          aws-region: eu-west-1
      - run: |
          techdocs-cli publish \
            --publisher-type awsS3 \
            --storage-name acme-techdocs \
            --entity default/component/${{ github.event.repository.name }}

The service repo carries an mkdocs.yml and a docs/ tree; the backstage.io/techdocs-ref: dir:. annotation tells Backstage where to look. Docs-as-code means the docs live beside the code, review in the same PR, and go stale visibly when the diff touches behavior but not docs/.
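One detail that trips people up is that the publish step's --entity argument is ordered namespace/kind/name, while catalog entity refs are written kind:namespace/name. A small sketch of the mapping, assuming the default lower-cased storage paths; `parseEntityRef`, `techdocsTriplet`, and `techdocsIndexUrl` are illustrative helpers, not functions from the TechDocs packages.

```typescript
interface EntityName {
  kind: string;
  namespace: string;
  name: string;
}

// Parses "component:default/checkout-api"; the namespace falls back to "default" when omitted.
function parseEntityRef(ref: string): EntityName {
  const colon = ref.indexOf(':');
  if (colon <= 0) throw new Error(`entity ref needs a kind: ${ref}`);
  const kind = ref.slice(0, colon);
  const rest = ref.slice(colon + 1);
  const slash = rest.indexOf('/');
  const namespace = slash === -1 ? 'default' : rest.slice(0, slash);
  const name = slash === -1 ? rest : rest.slice(slash + 1);
  if (!namespace || !name) throw new Error(`entity ref is incomplete: ${ref}`);
  return { kind, namespace, name };
}

// The namespace/kind/name triplet, lower-cased (an assumption about your storage layout).
function techdocsTriplet(entity: EntityName): string {
  return `${entity.namespace}/${entity.kind}/${entity.name}`.toLowerCase();
}

// Where the rendered index page would be served from, given a backend base URL.
function techdocsIndexUrl(baseUrl: string, entity: EntityName): string {
  return `${baseUrl}/api/techdocs/static/docs/${techdocsTriplet(entity)}/index.html`;
}

const checkout = parseEntityRef('component:default/checkout-api');
console.log(techdocsTriplet(checkout));
console.log(techdocsIndexUrl('http://localhost:7007', checkout));
```

The workflow above takes the component name from the repository name, so this mapping only holds when the two match; if they can differ, read metadata.name from catalog-info.yaml in CI instead.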

7. Authentication, the permissions framework, and RBAC

Authentication and authorization are two separate systems in Backstage, and conflating them is a classic mistake.

Auth signs the user in and resolves them to a catalog User entity. The sign-in resolver is the load-bearing piece — it maps an OAuth identity to a Backstage identity, which is what every ownership and permission check keys off:

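// the resolver decides which catalog User an OAuth login maps to
import { createOAuthProviderFactory } from '@backstage/plugin-auth-node';
import { githubAuthenticator } from '@backstage/plugin-auth-backend-module-github-provider';

// resolver: match the GitHub username to a User entity's name;
// register this factory through a backend module for the auth plugin
export const githubProviderFactory = createOAuthProviderFactory({
  authenticator: githubAuthenticator,
  async signInResolver(info, ctx) {
    const username = info.result.fullProfile.username;
    if (!username) throw new Error('GitHub profile has no username');
    return ctx.signInWithCatalogUser({ entityRef: { name: username } });
  },
});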

Permissions decide what an authenticated user can do. The permission framework is opt-in: you set permission.enabled: true, write a PermissionPolicy, and the backend enforces it at every plugin that emits permission checks (catalog, scaffolder). A policy that lets a user delete only entities they own and allows everything else:

// a permission policy keyed off ownership
import { PolicyDecision, AuthorizeResult, isPermission } from '@backstage/plugin-permission-common';
import { PermissionPolicy, PolicyQuery, PolicyQueryUser } from '@backstage/plugin-permission-node';
import { catalogEntityDeletePermission } from '@backstage/plugin-catalog-common/alpha';
import { createCatalogConditionalDecision, catalogConditions }
  from '@backstage/plugin-catalog-backend/alpha';

export class OwnershipPolicy implements PermissionPolicy {
  async handle(request: PolicyQuery, user?: PolicyQueryUser): Promise<PolicyDecision> {
    // isPermission narrows the type so the conditional helper accepts it
    if (isPermission(request.permission, catalogEntityDeletePermission)) {
      return createCatalogConditionalDecision(
        request.permission,
        // user.info is the new-backend shape; older releases exposed the refs under user.identity
        catalogConditions.isEntityOwner({
          claims: user?.info.ownershipEntityRefs ?? [],
        }),
      );
    }
    return { result: AuthorizeResult.ALLOW };
  }
}

The conditional decision is the powerful bit: instead of a flat allow/deny, the catalog applies the IS_ENTITY_OWNER rule as a filter, so the same policy works whether the user is acting on one entity or listing thousands. For role-based access at scale, the community RBAC plugin (@backstage-community/plugin-rbac) layers a managed UI and role/permission CSV on top of the framework so you are not redeploying for every policy tweak.
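To see why one condition can serve both the single-entity check and the list view, here is a toy model in plain TypeScript. It is not the permission framework's code: the real catalog can translate a rule into a query filter rather than running a predicate in memory. `OwnedEntity`, `isEntityOwner`, `canDelete`, and `deletableBy` are names invented for the sketch.

```typescript
interface OwnedEntity {
  name: string;
  ownerRefs: string[]; // refs that own this entity, e.g. "group:default/payments-team"
}

// A condition is a reusable predicate over entities.
type Condition = (entity: OwnedEntity) => boolean;

// Rough analogue of an ownership condition: true when any owner ref matches one of the user's claims.
function isEntityOwner(claims: string[]): Condition {
  const claimSet = new Set(claims);
  return entity => entity.ownerRefs.some(ref => claimSet.has(ref));
}

// Acting on one entity: evaluate the condition directly.
function canDelete(entity: OwnedEntity, condition: Condition): boolean {
  return condition(entity);
}

// Listing many entities: the same condition becomes a filter.
function deletableBy(entities: OwnedEntity[], condition: Condition): string[] {
  return entities.filter(condition).map(e => e.name);
}

const catalogSample: OwnedEntity[] = [
  { name: 'checkout-api', ownerRefs: ['group:default/payments-team'] },
  { name: 'search-api', ownerRefs: ['group:default/search-team'] },
];

const aliceCondition = isEntityOwner(['user:default/alice', 'group:default/payments-team']);
const nobodyCondition = isEntityOwner([]); // what a broken sign-in resolver effectively produces

console.log(canDelete(catalogSample[0], aliceCondition));
console.log(deletableBy(catalogSample, aliceCondition));
console.log(deletableBy(catalogSample, nobodyCondition));
```

The empty-claims case is worth keeping in a test of your own policy: it is the "every user owns nothing" situation, and it should deny cleanly rather than throw.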

Start with authentication and ownership resolution working end to end before you turn on the permission policy. A broken sign-in resolver makes every user own nothing, and an ownership-based policy then denies everyone — which reads as “permissions are broken” when the real fault is one line in the resolver.

8. Production deployment and keeping the catalog from rotting

Backstage ships as a Docker image you build from the monorepo and run on Kubernetes. The production-grade choices:

# build the backend image from the repo root
yarn install --immutable
yarn tsc
yarn build:backend
docker build . -f packages/backend/Dockerfile --tag acme/backstage:$(git rev-parse --short HEAD)

# app-config.production.yaml
backend:
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: 5432
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}

Upgrades are a recurring chore, not a project. The Backstage release cadence is roughly monthly, and the supported path is the version bump tool plus the published upgrade-helper diff:

yarn backstage-cli versions:bump
# then reconcile packages/app and packages/backend against the
# upgrade-helper diff for your from->to versions, run yarn dedupe

Falling more than a few releases behind turns a routine bump into an archaeology project — the new backend system, the alpha permission APIs, and the auth module split all landed across versions, and skipping them compounds the migration cost.

Keeping the catalog honest is the work that never ends and matters most. The mechanisms:

catalog:
  orphanStrategy: delete
  processingInterval: { minutes: 30 }

The catalog is a product with an SLA on freshness, not a wiki. If a service can ship without its catalog entry being correct, the catalog will be wrong, and a wrong catalog is worse than no catalog because people stop trusting it.

Enterprise scenario

A 600-engineer fintech rolled Backstage out as the mandated front door for service creation. Within a quarter the catalog showed 1,400 components against roughly 300 real services. The cause: their CI bot ran catalog:register on every feature branch that contained a catalog-info.yaml, and GitHub discovery was configured without a branch filter, so every long-lived branch minted a duplicate entity. Ownership views were unusable and the platform team was fielding “why are there nine checkout services” tickets weekly.

The constraint: they could not stop teams from working on branches, and they could not hand-prune 1,100 entities.

The fix was two lines of configuration and one policy. First, pin discovery to the default branch so only main produces catalog entities:

catalog:
  providers:
    github:
      acmeOrg:
        organization: 'acme'
        catalogPath: '/catalog-info.yaml'
        filters:
          branch: 'main'              # branches no longer mint entities
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }
  orphanStrategy: delete              # duplicates from old branches get reaped

Then they moved catalog:register out of per-branch CI entirely — registration became a one-time scaffolder output, and steady state came from discovery alone. With orphanStrategy: delete, the 1,100 branch-derived duplicates fell out of the catalog over the next two sync cycles without anyone deleting a thing. Component count settled at 312. The lesson the team wrote into their platform runbook: the catalog must have exactly one ingestion path per entity, and that path must be the default branch. Two paths is how you get duplicates; a non-default branch is how you get noise.
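The "one ingestion path per entity" rule lends itself to an audit. Below is a minimal sketch that groups candidate catalog files by the entity ref they declare and flags any ref reachable from more than one location. The input shape, the URLs, and `findMultiPathEntities` are hypothetical; in practice you would build the list from whatever your discovery crawl or CI bot reports.

```typescript
interface CatalogFile {
  ref: string; // entity ref declared in the file, e.g. "component:default/checkout-api"
  location: string; // URL of the catalog-info.yaml that declares it
}

// Returns only the refs that are declared from more than one location.
function findMultiPathEntities(files: CatalogFile[]): Map<string, string[]> {
  const byRef = new Map<string, Set<string>>();
  for (const file of files) {
    const locations = byRef.get(file.ref) ?? new Set<string>();
    locations.add(file.location);
    byRef.set(file.ref, locations);
  }

  const offenders = new Map<string, string[]>();
  byRef.forEach((locations, ref) => {
    if (locations.size > 1) offenders.set(ref, Array.from(locations).sort());
  });
  return offenders;
}

const discovered: CatalogFile[] = [
  {
    ref: 'component:default/checkout-api',
    location: 'https://github.com/acme/checkout-api/blob/main/catalog-info.yaml',
  },
  {
    ref: 'component:default/checkout-api',
    location: 'https://github.com/acme/checkout-api/blob/feature-x/catalog-info.yaml',
  },
  {
    ref: 'component:default/search-api',
    location: 'https://github.com/acme/search-api/blob/main/catalog-info.yaml',
  },
];

const offenders = findMultiPathEntities(discovered);
offenders.forEach((locations, ref) => console.log(ref, locations));
```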

Verify

Confirm the portal works end to end before you call it done:

# Note: with the default auth policy in the new backend system these endpoints
# reject unauthenticated requests. Add -H "Authorization: Bearer <token>" using
# a token you have configured for external access.

# 1. Backend is up and the catalog API responds
curl -s http://localhost:7007/api/catalog/entities | jq 'length'

# 2. Org data ingested: Groups are present
curl -s 'http://localhost:7007/api/catalog/entities?filter=kind=group' | jq '.[].metadata.name'

# 3. No orphaned entities lingering (expect 0)
curl -s 'http://localhost:7007/api/catalog/entities' \
  | jq '[.[] | select(.metadata.annotations["backstage.io/orphan"]=="true")] | length'

# 4. Custom scaffolder actions are registered
curl -s http://localhost:7007/api/scaffolder/v2/actions | jq '.[].id' | grep acme

# 5. TechDocs is served from storage, not built locally (expect HTTP 200)
curl -s 'http://localhost:7007/api/techdocs/static/docs/default/component/checkout-api/index.html' -I

Then in the UI: load /create, run the golden-path template against a throwaway repo, and confirm the new component appears in /catalog with the right owner and System, and its docs render under /docs. If sign-in resolves you to a User and that user owns the entity you just created, the auth-to-ownership chain is sound.

Checklist
