Most Pulumi tutorials stop at aws.s3.Bucket. Real platforms run into two harder problems: there is no native provider for some internal or niche SaaS API you must manage, and your infrastructure is too large to live in one stack. Pulumi’s Python SDK has first-class answers for both. Dynamic providers let you implement a resource’s full lifecycle in plain Python, and StackReference lets independently-deployed stacks consume each other’s outputs without sharing state. This guide builds both correctly, including the serialization and secret-handling traps that bite people in production.
Everything here targets the Pulumi CLI 3.x and the pulumi Python package 3.x on Python 3.9+.
1. The resource model: inputs, outputs, and apply
Before writing a provider you must internalize how Pulumi values flow. Every resource argument is an Input[T]: it may be a plain value, an Output[T], or an Awaitable. Every resource attribute Pulumi gives back is an Output[T]. An Output is a promise plus a dependency edge plus a secret flag. You never read its value synchronously during pulumi up, because at preview time the value may be unknown.
import pulumi
from pulumi_aws import s3
bucket = s3.BucketV2("data")
# WRONG: bucket.id is an Output, not a str, and Output does not implement "+".
# resource_name = bucket.id + "-logs"  # raises TypeError instead of building a name
# RIGHT: transform inside apply; the lambda runs only when the value is known.
log_name = bucket.id.apply(lambda bid: f"{bid}-logs")
Two rules that matter for the provider work below:
- apply callbacks do not run during preview when their input is unknown. Never put side effects (API calls, file writes) in apply. Side effects belong in a resource provider.
- Combine multiple outputs with pulumi.Output.all(...) or pulumi.Output.concat(...), not Python string concatenation, so the dependency graph stays correct.
url = pulumi.Output.all(bucket.bucket, bucket.region).apply(
lambda args: f"https://{args[0]}.s3.{args[1]}.amazonaws.com"
)
Output.format is the readable equivalent of concat:
url = pulumi.Output.format("https://{0}.s3.{1}.amazonaws.com", bucket.bucket, bucket.region)
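To make the "promise plus dependency edge plus secret flag" model concrete, here is a minimal sketch of a toy stand-in for an Output. ToyOutput and add_suffix are hypothetical names invented for illustration; this is not how pulumi.Output is implemented, and it leaves out dependency tracking and async resolution. It only shows the two behaviors the rules above rely on: apply skips its callback while the value is unknown, and the secret flag carries through to derived values.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class ToyOutput:
    """Illustrative stand-in for pulumi.Output; not the real class."""
    value: Optional[Any]
    known: bool = True
    secret: bool = False

    def apply(self, fn: Callable[[Any], Any]) -> "ToyOutput":
        # Unknown (as during preview): skip the callback and stay unknown.
        if not self.known:
            return ToyOutput(None, known=False, secret=self.secret)
        # Known: run the callback and carry the secret flag forward.
        return ToyOutput(fn(self.value), known=True, secret=self.secret)


calls: List[str] = []


def add_suffix(bucket_id: str) -> str:
    calls.append(bucket_id)  # records that the callback actually ran
    return f"{bucket_id}-logs"


preview_id = ToyOutput(None, known=False)        # value not known yet
real_id = ToyOutput("data-1a2b3c", secret=True)  # known and secret

preview_name = preview_id.apply(add_suffix)
real_name = real_id.apply(add_suffix)
print(preview_name.known, real_name.value, real_name.secret, calls)
```

Because the callback never runs for the unknown value, any side effect placed inside it would silently not happen during preview, which is why side effects belong in a provider.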
2. Building a dynamic provider
A dynamic provider is a Python class implementing pulumi.dynamic.ResourceProvider. You subclass pulumi.dynamic.Resource and pass an instance of the provider plus the inputs. The engine calls your provider’s lifecycle methods as it plans and applies each change. The methods you care about are create, update, delete, diff, and optionally check and read.
The example manages a “DNS record” in a fictional REST API that has no Pulumi provider. The principle generalizes to any CRUD API.
# dnsrecord.py
import requests
from pulumi.dynamic import (
    ResourceProvider,
    CreateResult,
    UpdateResult,
    DiffResult,
    CheckResult,
    CheckFailure,
)


class DnsRecordProvider(ResourceProvider):
    def check(self, _olds, news):
        # Validate inputs before any API call is made.
        failures = []
        if news.get("type") not in ("A", "AAAA", "CNAME", "TXT"):
            failures.append(CheckFailure("type", "type must be A, AAAA, CNAME, or TXT"))
        return CheckResult(news, failures)

    def create(self, props):
        resp = requests.post(
            f"{props['endpoint']}/zones/{props['zone']}/records",
            headers={"Authorization": f"Bearer {props['token']}"},
            json={"name": props["name"], "type": props["type"], "value": props["value"]},
            timeout=30,
        )
        resp.raise_for_status()
        record_id = str(resp.json()["id"])  # the physical ID must be a string
        # outs becomes the resource's outputs; id_ is the physical identifier.
        return CreateResult(id_=record_id, outs={**props, "record_id": record_id})

    def diff(self, _id, olds, news):
        # Changing name, type, or zone forces replacement; value can be updated in place.
        # Note: endpoint and token are not compared, so rotating the token alone shows no diff.
        replaces = [f for f in ("name", "type", "zone") if olds.get(f) != news.get(f)]
        changed = bool(replaces) or olds.get("value") != news.get("value")
        return DiffResult(
            changes=changed,
            replaces=replaces,
            delete_before_replace=True,
        )

    def update(self, id_, _olds, news):
        resp = requests.put(
            f"{news['endpoint']}/zones/{news['zone']}/records/{id_}",
            headers={"Authorization": f"Bearer {news['token']}"},
            json={"value": news["value"]},
            timeout=30,
        )
        resp.raise_for_status()
        return UpdateResult(outs={**news, "record_id": id_})

    def delete(self, id_, props):
        resp = requests.delete(
            f"{props['endpoint']}/zones/{props['zone']}/records/{id_}",
            headers={"Authorization": f"Bearer {props['token']}"},
            timeout=30,
        )
        # 404 means the record is already gone; treat that as success.
        if resp.status_code != 404:
            resp.raise_for_status()
The typed resource wrapper exposes outputs as Output attributes via class-level annotations:
from typing import Optional
import pulumi
from pulumi.dynamic import Resource
from dnsrecord import DnsRecordProvider  # omit if this class lives in dnsrecord.py itself


class DnsRecord(Resource):
    record_id: pulumi.Output[str]
    name: pulumi.Output[str]

    def __init__(self, name, zone, record_name, type, value, endpoint, token,
                 opts: Optional[pulumi.ResourceOptions] = None):
        super().__init__(
            DnsRecordProvider(),
            name,
            {
                "zone": zone,
                "name": record_name,
                "type": type,
                "value": value,
                "endpoint": endpoint,
                "token": token,
                "record_id": None,  # declared so it is a known output key
            },
            opts,
        )
Why declare record_id: None in the inputs? Any key you want back as an output must exist in the args dict. Pulumi populates it from the outs your create/update returns; if you omit the key, the attribute is never populated on the resource object, even though the provider returned it.
diff semantics matter
diff is where you control whether a change is an in-place update or a replacement. Get this wrong and you either orphan cloud resources or trigger needless rebuilds. replaces lists the properties whose change forces a new resource. delete_before_replace=True deletes the old resource before creating the new one, which you need when a unique constraint (like a DNS name) would collide if both existed at once. If you return changes=False, Pulumi shows no diff and skips update entirely.
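The decision diff makes can be sketched as a pure function with no Pulumi imports, which also makes it easy to unit-test on its own. This is a minimal sketch: classify_change, REPLACE_FIELDS, and UPDATE_FIELDS are hypothetical names, and the field lists simply mirror the DNS example above.

```python
from typing import Any, Dict, List, Tuple

# Fields whose change forces a new resource, mirroring the DNS example.
REPLACE_FIELDS = ("name", "type", "zone")
# Fields that can be changed in place.
UPDATE_FIELDS = ("value",)


def classify_change(olds: Dict[str, Any], news: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Return ("replace" | "update" | "same", fields that caused the decision)."""
    replaces = [f for f in REPLACE_FIELDS if olds.get(f) != news.get(f)]
    if replaces:
        return "replace", replaces
    updated = [f for f in UPDATE_FIELDS if olds.get(f) != news.get(f)]
    if updated:
        return "update", updated
    return "same", []


olds = {"name": "www", "type": "A", "zone": "example.com", "value": "203.0.113.10"}
print(classify_change(olds, {**olds, "value": "203.0.113.11"}))
print(classify_change(olds, {**olds, "name": "api"}))
print(classify_change(olds, dict(olds)))
```

In a provider, "replace" would map to a non-empty replaces list, "update" to changes=True with an empty replaces list, and "same" to changes=False.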
3. Serialization pitfalls and secret inputs
This is the part that trips up nearly everyone. Pulumi serializes your dynamic provider instance by pickling it (the Python SDK uses the dill library for this) and stores the result in the stack’s state. At update time it deserializes that pickle and calls your methods. Three consequences:
- The provider class must be recoverable when the pickle is loaded. The safest layout is a module-level class in its own file (dnsrecord.py) so unpickling can locate DnsRecordProvider; avoid defining it inside a function, and prefer a module over __main__.
- Do not capture unpicklable or environment-specific objects (open sockets, live clients, file handles) in the provider’s __init__. Build clients inside the lifecycle methods using values passed via props, as shown above. Anything the methods need must arrive through the serialized inputs.
- Heavy or version-sensitive imports that you capture get pinned into state. Keep providers lean.
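The "do not capture live objects" rule can be checked before it bites. The sketch below uses the standard-library pickle module as a stand-in for Pulumi’s serializer, which may differ in detail, and tests whether a dictionary of would-be provider state survives a round trip. survives_round_trip, lean_state, and live_state are hypothetical names for illustration.

```python
import pickle
import threading
from typing import Any, Dict


def survives_round_trip(state: Dict[str, Any]) -> bool:
    """True if the state can be pickled and restored to an equal value."""
    try:
        return pickle.loads(pickle.dumps(state)) == state
    except (pickle.PicklingError, TypeError, AttributeError):
        # Live handles such as locks and sockets typically fail here.
        return False


# Plain configuration values serialize cleanly.
lean_state = {"endpoint": "https://dns.internal.example.com/api", "timeout": 30}
# A lock stands in for any live, process-bound object (client, socket, file).
live_state = {"endpoint": "https://dns.internal.example.com/api", "lock": threading.Lock()}

print(survives_round_trip(lean_state), survives_round_trip(live_state))
```

A check like this in a unit test catches an accidentally captured client long before a deployment fails on deserialization.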
For secrets, never pass a raw token as a normal input that lands in plaintext state. Mark it secret so Pulumi encrypts it at rest and redacts it in logs and diffs:
import pulumi
cfg = pulumi.Config()
api_token = cfg.require_secret("dnsApiToken") # Output[str], flagged secret
record = DnsRecord(
"www",
zone="example.com",
record_name="www",
type="A",
value="203.0.113.10",
endpoint="https://dns.internal.example.com/api",
token=api_token, # secret flows through; state encrypts it
)
You can also force individual output properties to be treated as secrets by naming them in the additional_secret_outputs resource option, for example opts=pulumi.ResourceOptions(additional_secret_outputs=["token"]). Pulumi propagates the secret flag through any Output derived from a secret input automatically, so the common case is handled for you as long as the input arrives as a secret.
Caveat: dynamic provider code runs from your program’s own Python environment during pulumi up. Its dependencies are your program’s dependencies, so pin requests (or whatever SDK) in requirements.txt. There is no separate provider plugin binary to install.
4. Cross-stack architecture with StackReference
Large estates split into layers: a networking stack, a data stack, an app stack. Each is deployed independently and owns its blast radius. They communicate through stack outputs and StackReference, not shared state files.
Export outputs from the producing stack with pulumi.export:
# networking/__main__.py
import pulumi
from pulumi_aws import ec2
vpc = ec2.Vpc("main", cidr_block="10.0.0.0/16")
private = ec2.Subnet("private-a", vpc_id=vpc.id, cidr_block="10.0.1.0/24",
                     availability_zone="us-east-1a")
pulumi.export("vpc_id", vpc.id)
# A plain list of Outputs is a valid export; Output.all(...).apply(list) adds nothing.
pulumi.export("private_subnet_ids", [private.id])
Consume them in another stack. The reference name is <org>/<project>/<stack> for Pulumi Cloud. Self-managed backends have no real organizations, so recent CLI versions use the literal placeholder organization in that slot: organization/<project>/<stack>.
# app/__main__.py
import pulumi
from pulumi_aws import ec2
net = pulumi.StackReference("acme/networking/prod")
vpc_id = net.get_output("vpc_id")
subnet_ids = net.get_output("private_subnet_ids")
sg = ec2.SecurityGroup("app", vpc_id=vpc_id)
get_output returns an Output, preserving the dependency and secret flags across the boundary. A few operational notes:
- Use require_output("vpc_id") instead of get_output when the key is mandatory; it fails loudly at runtime if the output is missing rather than handing you a null.
- Even when you think you need a plain Python value at program-construction time (rare, and usually a smell), do the work inside get_output(...).apply(...); do not block on outputs.
- A consuming stack does not auto-redeploy when the producer changes. Re-run the consumer after the producer publishes new outputs. Wiring this ordering is a CI/CD concern (see section 8).
The StackReference resource needs read access to the referenced stack’s state. With Pulumi Cloud that means the deploying identity must have read permission on the source stack.
5. Per-environment config, ESC, and secret providers
Each stack carries its own config file (Pulumi.dev.yaml, Pulumi.prod.yaml). Set plain and secret values with the CLI:
pulumi config set aws:region us-east-1
pulumi config set app:replicas 3
pulumi config set --secret app:dnsApiToken 'tok_live_xxx'
Secrets are encrypted with the stack’s secret provider. The default is the Pulumi Cloud service, but for self-managed backends or stricter key custody you should pin a KMS-backed provider when you initialize the stack:
pulumi stack init prod --secrets-provider="awskms://alias/pulumi-prod?region=us-east-1"
# Azure Key Vault and GCP KMS are equivalent:
# azurekeyvault://<vault>.vault.azure.net/keys/<key>
# gcpkms://projects/<p>/locations/<l>/keyRings/<r>/cryptoKeys/<k>
ESC: Environments, Secrets, and Configuration
For secrets and config that span many stacks, Pulumi ESC centralizes them and can broker short-lived cloud credentials via OIDC instead of static keys. Define an environment once, then import it from any stack’s config under the environment key.
# created with: pulumi env init acme/aws-prod, then edited
values:
  aws:
    login:
      fn::open::aws-login:
        oidc:
          roleArn: arn:aws:iam::111122223333:role/pulumi-deploy
          sessionName: pulumi
          duration: 1h
  environmentVariables:
    AWS_ACCESS_KEY_ID: ${aws.login.accessKeyId}
    AWS_SECRET_ACCESS_KEY: ${aws.login.secretAccessKey}
    AWS_SESSION_TOKEN: ${aws.login.sessionToken}
# Pulumi.prod.yaml
environment:
  - aws-prod
config:
  app:replicas: 5
This is how you stop storing long-lived cloud keys in CI: ESC mints temporary credentials per run, and aws:region-style config still lives in the stack file.
6. Component resources for reusable, typed abstractions
A ComponentResource groups child resources under one logical node and is your unit of reuse, the Pulumi answer to a Terraform module, but with types. Define typed args with a dataclass, register outputs, and always set parent on children.
from dataclasses import dataclass
from typing import Optional
import pulumi
from pulumi_aws import s3


@dataclass
class StaticSiteArgs:
    index_document: str = "index.html"
    versioned: bool = True


class StaticSite(pulumi.ComponentResource):
    bucket_name: pulumi.Output[str]
    website_endpoint: pulumi.Output[str]

    def __init__(self, name: str, args: StaticSiteArgs,
                 opts: Optional[pulumi.ResourceOptions] = None):
        super().__init__("acme:web:StaticSite", name, {}, opts)
        child = pulumi.ResourceOptions(parent=self)
        bucket = s3.BucketV2(f"{name}-bucket", opts=child)
        if args.versioned:
            s3.BucketVersioningV2(
                f"{name}-ver",
                bucket=bucket.id,
                versioning_configuration={"status": "Enabled"},
                opts=child,
            )
        website = s3.BucketWebsiteConfigurationV2(
            f"{name}-web",
            bucket=bucket.id,
            index_document={"suffix": args.index_document},
            opts=child,
        )
        self.bucket_name = bucket.bucket
        self.website_endpoint = website.website_endpoint
        # Surfaces these as outputs and finalizes the component in the graph.
        self.register_outputs({
            "bucket_name": self.bucket_name,
            "website_endpoint": self.website_endpoint,
        })
The first argument to super().__init__ is the component’s type token (package:module:Type). Setting parent=self on every child nests them in pulumi stack graph and ties their lifecycle to the component. register_outputs also signals that the component has finished registering its children; skip it and the component’s outputs are not recorded in state.
7. Testing with mocks and policy with CrossGuard
Pulumi’s unit-test framework swaps the engine for a mock so tests run with no cloud calls and no real pulumi up. Implement pulumi.runtime.Mocks, set it before importing your program, then assert on resource properties resolved through apply.
# test_infra.py
import pulumi


class Mocks(pulumi.runtime.Mocks):
    def new_resource(self, args: pulumi.runtime.MockResourceArgs):
        # Return [id, state]. state echoes inputs plus computed fields.
        return [args.name + "_id", {**args.inputs, "arn": "arn:fake:" + args.name}]

    def call(self, args: pulumi.runtime.MockCallArgs):
        return {}


pulumi.runtime.set_mocks(Mocks(), preview=False)

import infra  # import AFTER set_mocks so resources register against the mock


@pulumi.runtime.test
def test_bucket_is_versioned():
    # infra.site_versioning is assumed to be a module-level s3.BucketVersioningV2.
    def check(config):
        assert config["status"] == "Enabled", "production buckets must be versioned"

    return infra.site_versioning.versioning_configuration.apply(check)
The @pulumi.runtime.test decorator handles the async output resolution; return an Output (or a coroutine) so the framework waits for assertions inside apply. Run with pytest.
For org-wide guardrails that run during preview and up, write a CrossGuard policy pack in Python. Policies at the mandatory enforcement level fail the deployment when violated, so they gate every stack, not just the ones with tests.
# policy/__main__.py
from pulumi_policy import (
    PolicyPack, ResourceValidationPolicy, EnforcementLevel, ReportViolation,
)


def s3_no_public_acl(args, report: ReportViolation):
    if args.resource_type == "aws:s3/bucketV2:BucketV2":
        if args.props.get("acl") == "public-read":
            report("S3 buckets must not be public-read")


PolicyPack(
    name="acme-baseline",
    enforcement_level=EnforcementLevel.MANDATORY,
    policies=[
        ResourceValidationPolicy(
            name="s3-no-public-acl",
            description="Disallow public-read S3 buckets",
            validate=s3_no_public_acl,
        ),
    ],
)
pulumi preview --policy-pack ./policy
8. CI/CD: preview gating and update with the GitHub Action
The discipline that makes this safe is: preview on every pull request, comment the diff, require approval, then update on merge. Use the official pulumi/actions@v6 action with OIDC so no static cloud or Pulumi tokens sit in the repo.
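# .github/workflows/pulumi.yml
name: pulumi
on:
  pull_request:
    branches: [main]
  push:
    branches: [main]
permissions:
  id-token: write        # OIDC to cloud and to Pulumi
  contents: read
  pull-requests: write   # so the action can comment the preview
jobs:
  preview:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # Exchanges the GitHub OIDC token for a short-lived Pulumi token.
      # Assumes OIDC trust is already configured for the acme organization.
      - uses: pulumi/auth-actions@v1
        with:
          organization: acme
          requested-token-type: urn:pulumi:token-type:access_token:organization
      - uses: pulumi/actions@v6
        with:
          command: preview
          stack-name: acme/app/prod
          comment-on-pr: true
  update:
    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production   # GitHub Environment protection rule = approval gate
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # Same OIDC exchange as the preview job (same assumption applies).
      - uses: pulumi/auth-actions@v1
        with:
          organization: acme
          requested-token-type: urn:pulumi:token-type:access_token:organization
      - uses: pulumi/actions@v6
        with:
          command: up
          stack-name: acme/app/prod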
Two gating mechanisms are doing the work. The pull_request job runs preview and posts the plan as a PR comment so a human reviews the diff. The push job is bound to a GitHub Environment (production) with a required-reviewers protection rule, so the merge-to-deploy step blocks until approved. For multi-stack ordering, run the producer stack’s up job before the consumer’s, gated on success, so StackReference consumers see fresh outputs.
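The producer-before-consumer ordering is a topological sort over the StackReference edges. The sketch below computes a deploy order with the standard-library graphlib module (Python 3.9+). The reads_from map and the acme/data/prod stack are hypothetical; in a real pipeline you would maintain this map by hand or derive it from your programs.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: each stack -> the stacks whose outputs it reads.
reads_from: Dict[str, Set[str]] = {
    "acme/networking/prod": set(),
    "acme/data/prod": {"acme/networking/prod"},
    "acme/app/prod": {"acme/networking/prod", "acme/data/prod"},
}


def deploy_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return stacks ordered so every producer precedes its consumers.

    Raises graphlib.CycleError if two stacks reference each other.
    """
    return list(TopologicalSorter(deps).static_order())


order = deploy_order(reads_from)
print(order)
```

A cycle error here is a useful signal in its own right: two stacks that read each other’s outputs cannot be deployed independently and should be merged or re-layered.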
Verify
Run these to confirm each piece behaves. The dynamic provider:
pulumi preview                       # should show the DnsRecord with known/unknown props
pulumi up --yes                      # create() runs; the resource gains its record_id output
pulumi stack output --show-secrets   # exported outputs; secret ones are decrypted only with this flag
pulumi up --yes                      # after editing value only -> in-place update, no replace
pulumi destroy --yes                 # delete() runs; 404 tolerated as success
Confirm secrets never leak to plaintext state. pulumi stack export works against any backend, so you can inspect the serialized state directly:
pulumi stack export | python -c "import json,sys; \
s=json.load(sys.stdin); \
print('SECRETS PRESENT' if 'ciphertext' in json.dumps(s) else 'NO CIPHERTEXT')"
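Finding ciphertext only proves that some secret exists, not that the token is absent in plaintext. A stricter check, sketched below, walks the exported JSON and reports every path where a known secret value appears as a literal string. plaintext_hits and the sample document are hypothetical; the sample only imitates the general shape of an export. The check finds literal matches only, so it will not see a value that has been re-encoded.

```python
from typing import Any, Iterator


def plaintext_hits(node: Any, needle: str, path: str = "$") -> Iterator[str]:
    """Yield the JSON path of every string value that contains needle."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from plaintext_hits(child, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from plaintext_hits(child, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path


# Hypothetical sample: one encrypted token, one leaked in plaintext.
sample = {
    "deployment": {
        "resources": [
            {"type": "example", "inputs": {"token": {"ciphertext": "v1:abc"}}},
            {"type": "example", "inputs": {"token": "tok_live_xxx"}},
        ]
    }
}

hits = list(plaintext_hits(sample, "tok_live_xxx"))
print(hits)
```

To run it against a real stack, load the output of pulumi stack export with json.load and pass the token value from a secure source, not from a literal in the script.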
Validate cross-stack wiring and policy:
pulumi stack output vpc_id --stack acme/networking/prod # producer exports it
pulumi preview --stack acme/app/prod # consumer resolves the reference
pulumi preview --policy-pack ./policy # MANDATORY policy blocks violations
pytest -q # mocks run with zero cloud calls
Expected results: pulumi up on a value-only change reports ~ update (not +- replace); a public-read bucket fails preview under the policy pack with a non-zero exit; pytest passes offline; and the stack export shows ciphertext for the token, never the raw value.