A mid-size insurer has just failed an audit on privileged access: thirty-odd engineers and three managed-service vendors reach production Linux and Windows hosts through a flat VPN, with shared bastion.pem keys passed around in chat and a domain admin RDP account whose password was last rotated when the bastion was built. The auditor’s note is blunt: there is no per-user attribution on a host, no just-in-time grant, and the standing network path means a single phished laptop owns the whole estate. The mandate from the CISO is to retire the VPN-plus-jumpbox model and put a brokered, identity-gated access layer in front of every SSH and RDP target, self-hosted because the regulated workloads cannot egress to a SaaS control plane. This guide stands up exactly that with HashiCorp Boundary: a controller cluster, a tier of session workers with reach into the private subnets, Okta-federated-to-Entra SSO at the front, and HashiCorp Vault issuing short-lived credentials, so no engineer handles a shared password or a long-lived private key.
Boundary’s model is worth a short summary before the commands. A user authenticates to the controller (the control plane: API, auth, policy, session orchestration), the controller authorizes them against a target, and then a worker (the data plane) proxies the actual TCP session to the private host. For session traffic, the user’s client talks only to the worker, and only the worker talks to the host. There is no standing network route from the laptop to the target, and with credential injection no credential ever lands on the laptop.
Prerequisites
- A PostgreSQL 13+ database the controllers can reach (Boundary stores config/state here). This guide uses an RDS/Cloud SQL-style managed instance named boundary.
- Three small Linux VMs for controllers (HA) and two for workers, or an equivalent. Workers must have a network path to your private SSH/RDP hosts; controllers do not.
- A KMS for root/recovery/worker-auth key wrapping — AWS KMS, Azure Key Vault, or GCP KMS. We reference AWS KMS keys by alias.
- boundary CLI and Boundary Enterprise/HCP-compatible binary 0.16+ installed on the nodes; terraform 1.6+ and the hashicorp/boundary provider; access to a HashiCorp Vault cluster for credential brokering.
- An Okta org (workforce IdP) federated to Microsoft Entra ID as the OIDC token issuer Boundary will trust.
- Public DNS and TLS termination handled at the edge by Akamai for the public controller API endpoint.
Target topology
The deployment splits cleanly into a control plane and a data plane, and keeping them separate in your head is the whole point of Boundary.
- Controllers (3 nodes, private subnet, fronted by an internal load balancer on :9200 API and :9201 cluster) own the API, the database, auth-method federation, and session authorization. They are the brain; they never carry session bytes.
- Workers (2+ nodes, sitting in or peered to the private subnets that hold your hosts) register upstream to the controllers and proxy sessions. They are the only thing that touches a target. Place a worker in each network zone where targets live.
- Vault issues the actual SSH and RDP credentials on demand, brokered through Boundary so they are short-lived and never reused.
- Okta → Entra ID federates workforce identity; Boundary trusts the Entra-issued OIDC token and maps group claims to Boundary roles.
- Akamai terminates TLS and provides WAF/anycast in front of the public controller API; engineers’ clients reach the controller through it, then get handed to a worker over an authenticated proxy.
Auxiliary tooling that operates around the cluster: CrowdStrike Falcon sensors run on every controller and worker node for runtime threat detection; Wiz (with Wiz Code scanning the Terraform in the pipeline) continuously checks the cloud posture and flags any drift that would re-open a direct path to a host; Dynatrace ingests the controller/worker metrics and traces; ServiceNow holds the change record and the just-in-time access approvals; and GitHub Actions with Terraform applies the whole estate as code while Ansible bakes the node images.
1. Provision the database, KMS keys, and node images
Boundary needs a Postgres database and KMS keys it can reach before the first controller boots. Provision these with Terraform so the estate is reproducible and Wiz Code can scan the plan in the pipeline before anything lands.
# kms.tf: three purposes, never share one key across roles
resource "aws_kms_key" "boundary_root" { description = "boundary-root" }
resource "aws_kms_key" "boundary_recovery" { description = "boundary-recovery" }
resource "aws_kms_key" "boundary_worker" { description = "boundary-worker-auth" }

# Aliases take two arguments, so they need multi-line blocks to be valid HCL
resource "aws_kms_alias" "root" {
  name          = "alias/boundary-root"
  target_key_id = aws_kms_key.boundary_root.id
}
resource "aws_kms_alias" "recovery" {
  name          = "alias/boundary-recovery"
  target_key_id = aws_kms_key.boundary_recovery.id
}
resource "aws_kms_alias" "worker" {
  name          = "alias/boundary-worker"
  target_key_id = aws_kms_key.boundary_worker.id
}
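Create the database and a dedicated login role for Boundary. The role below is broad within this one database rather than least-privilege, and on PostgreSQL 15+ it may additionally need CREATE on the public schema (or ownership of the database) before the migration in step 2 can create tables: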
CREATE DATABASE boundary;
CREATE ROLE boundary WITH LOGIN PASSWORD 'set-via-vault-not-here';
GRANT ALL PRIVILEGES ON DATABASE boundary TO boundary;
Bake the controller and worker VM images with Ansible so the boundary binary, the CrowdStrike Falcon sensor, and the Dynatrace OneAgent are baked in rather than installed at boot. A minimal play:
- hosts: boundary_nodes
  become: true
  tasks:
    - name: Install Boundary binary
      ansible.builtin.unarchive:
        src: "https://releases.hashicorp.com/boundary/0.16.2+ent/boundary_0.16.2+ent_linux_amd64.zip"
        dest: /usr/local/bin
        remote_src: true
    - name: Install CrowdStrike Falcon sensor
      ansible.builtin.apt: { deb: /opt/falcon-sensor.deb }
    - name: Install Dynatrace OneAgent
      ansible.builtin.command: /opt/dynatrace/oneagentctl --set-host-group=boundary
2. Initialize the controller and run the database migration
On the first controller node, write the controller config. The kms stanzas point at the KMS aliases from step 1; the database.url points at Postgres.
# /etc/boundary/controller.hcl
disable_mlock = true

controller {
  name        = "boundary-controller-1"
  description = "KloudVin Boundary controller"
  database {
    url = "env://BOUNDARY_PG_URL" # postgres://boundary:...@db:5432/boundary
  }
}

# TLS is terminated at Akamai/LB, so the API listener itself is plaintext
listener "tcp" {
  purpose     = "api"
  address     = "0.0.0.0:9200"
  tls_disable = true
}

listener "tcp" {
  purpose = "cluster"
  address = "0.0.0.0:9201"
}

# The awskms block names its key with kms_key_id (not key_id)
kms "awskms" {
  purpose    = "root"
  kms_key_id = "alias/boundary-root"
}
kms "awskms" {
  purpose    = "recovery"
  kms_key_id = "alias/boundary-recovery"
}
kms "awskms" {
  purpose    = "worker-auth"
  kms_key_id = "alias/boundary-worker"
}
Run the one-time schema migration, then start the service. Run database init on exactly one node:
export BOUNDARY_PG_URL="postgres://boundary:$(vault kv get -field=password secret/boundary/db)@db.internal:5432/boundary?sslmode=verify-full"
# First node only — creates schema + bootstrap org/auth/role; capture the output
boundary database init -config /etc/boundary/controller.hcl
# All controller nodes
systemctl enable --now boundary-controller
database init prints a bootstrap auth method, an initial admin login name and password, and a generated org/project scope. Store these in Vault immediately and never in the repo. Bring up controllers 2 and 3 with the same config (changing only controller.name) — they share the database and KMS, so they form an HA set behind the internal load balancer automatically.
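One fragile spot in the export in this step is that the password is interpolated into the URL unescaped, so a generated password containing characters such as @, / or : would corrupt BOUNDARY_PG_URL. A minimal sketch of a guard, assuming a POSIX shell and an ASCII password; the urlencode helper is hypothetical and not part of Boundary or Vault:

```shell
# Hypothetical helper: percent-encode every character outside the RFC 3986
# unreserved set so the value is safe inside a postgres:// URL.
# Assumes ASCII input; the ( ) body runs in a subshell so no variables leak.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;                  # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;          # anything else: %HH
    esac
  done
  printf '%s\n' "$out"
)
```

It would slot into the export as postgres://boundary:$(urlencode "$(vault kv get -field=password secret/boundary/db)")@db.internal:5432/boundary?sslmode=verify-full.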
3. Enroll workers into the data plane
Workers are what actually reach your private hosts, so they live in the target subnets and register upstream to the controllers. Use worker-led (PKI) registration: the worker generates its own auth request token at first start and an admin approves it on the controller, so no static secret has to be pre-shared.
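# /etc/boundary/worker.hcl
disable_mlock = true

listener "tcp" {
  purpose = "proxy"
  address = "0.0.0.0:9202"
}

worker {
  public_addr       = "worker-1.private.kloudvin.internal:9202" # what clients are told to dial
  initial_upstreams = ["controller-lb.internal:9201"]
  auth_storage_path = "/var/lib/boundary/worker" # assumed path; the worker's PKI credentials persist here
  # used by target worker filters
  tags {
    region = ["ap-south-1"]
    zone   = ["app-private"]
  }
}

# A worker-led (PKI) worker does not authenticate with the worker-auth KMS;
# here the key only encrypts the credentials stored under auth_storage_path
kms "awskms" {
  purpose    = "worker-auth-storage"
  kms_key_id = "alias/boundary-worker"
}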
Start the worker, grab its auth request token from the log, and approve it on the controller:
systemctl enable --now boundary-worker
journalctl -u boundary-worker | grep -m1 "Worker Auth Registration Request"
# -> copy the token string, then on an admin client:
boundary workers create worker-led \
-worker-generated-auth-token "<token-from-log>" \
-name "worker-app-private-1" \
-description "ap-south-1 / app-private zone"
boundary workers list # confirm the new worker is listed with a recent last status time
Repeat for the second worker (and one per additional network zone). Workers heartbeat to the controllers; if a zone has no healthy worker, targets there are simply unreachable — which is the safe failure direction.
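The copy-and-paste in the middle of that flow can be scripted. A minimal sketch, assuming the worker logs the token on a line of the form "Worker Auth Registration Request: <token>" (the pattern the grep above relies on); the extract_worker_token helper is hypothetical:

```shell
# Hypothetical helper: read log lines on stdin and print the first
# registration token found, or nothing if no line matches.
extract_worker_token() {
  sed -n 's/.*Worker Auth Registration Request:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}
```

On the worker, token=$(journalctl -u boundary-worker | extract_worker_token) captures the value, which can then be passed to -worker-generated-auth-token from an admin client. An empty result means the worker has not printed a request yet, so check for that before calling the controller.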
4. Federate identity through Okta and Entra ID (OIDC)
Engineers authenticate with their corporate identity, not a Boundary-local password. Okta is the workforce IdP; it federates to Microsoft Entra ID, and Boundary trusts the Entra-issued OIDC token. Register Boundary as an app in Entra (redirect URI https://boundary.kloudvin.com/v1/auth-methods/oidc:authenticate:callback), then create the OIDC auth method:
boundary auth-methods create oidc \
-issuer "https://login.microsoftonline.com/<tenant-id>/v2.0" \
-client-id "<entra-app-client-id>" \
-client-secret "$(vault kv get -field=secret secret/boundary/oidc)" \
-signing-algorithm "RS256" \
-api-url-prefix "https://boundary.kloudvin.com" \
-claims-scopes "groups" \
-name "entra-sso" -description "Okta->Entra workforce SSO"
# Make it usable, then set it as the primary auth method on the org scope
boundary auth-methods change-state oidc -id <amoidc_id> -state "active-public"
boundary scopes update -id <org_id> -primary-auth-method-id <amoidc_id>
Map Entra group claims to Boundary principals with managed groups, so membership is driven by the directory, not maintained by hand:
boundary managed-groups create oidc \
-auth-method-id <amoidc_id> \
-filter '"prod-ssh-admins" in "/token/groups"' \
-name "prod-ssh-admins"
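Now an engineer who is in the prod-ssh-admins Entra group (sourced from Okta) automatically lands in the matching Boundary managed group at their next login, so the directory is the single source of truth. One caveat: Entra emits group object IDs (GUIDs) in the groups claim by default, so unless the app registration is configured to emit group names, the filter has to match the group’s object ID rather than its display name.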
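Before relying on the managed-group filter, it helps to look at what the groups claim actually contains. A minimal sketch, assuming you have captured an ID token from a test sign-in to the Entra app and that base64 -d is available (GNU coreutils); the jwt_payload helper is hypothetical and does not verify the signature, so it is for inspection only:

```shell
# Hypothetical helper: print the decoded JSON payload of a JWT.
# No signature check is performed; use it to inspect claims, not to trust them.
jwt_payload() (
  # Field 2 of header.payload.signature, mapped from base64url to base64
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it to a multiple of four characters
  case $(( ${#p} % 4 )) in
    2) p="$p==" ;;
    3) p="$p=" ;;
  esac
  printf '%s' "$p" | base64 -d
)
```

Running jwt_payload "$ID_TOKEN" and looking at the groups array shows whether the claim is present at all and whether it carries names or object IDs, which is what the -filter expression has to match.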
5. Broker SSH credentials from Vault for Linux targets
This is the step that retires shared .pem files. Rather than store keys in Boundary, configure a Vault credential store and have Vault’s ssh secrets engine sign a short-lived certificate per session. On the Vault side, enable the SSH CA:
vault secrets enable -path=ssh-client-signer ssh
vault write ssh-client-signer/config/ca generate_signing_key=true
vault write ssh-client-signer/roles/boundary - <<EOF
{ "key_type":"ca","algorithm_signer":"rsa-sha2-256","allow_user_certificates":true,
"allowed_users":"*","default_extensions":{"permit-pty":""},"ttl":"5m" }
EOF
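Write the Vault CA public key to a file on each Linux host and point the TrustedUserCAKeys directive in sshd_config at it (push both with Ansible) so hosts accept Vault-signed certs. To make that the only way in, also disable password authentication and remove any leftover authorized_keys entries. Then wire Boundary to that Vault path: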
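# Boundary expects a periodic, renewable, orphan token. The boundary-ssh policy must allow
# ssh-client-signer/sign/boundary and let the token look up, renew, and revoke itself.
boundary credential-stores create vault \
  -scope-id <project_id> \
  -vault-address "https://vault.kloudvin.internal:8200" \
  -vault-token "$(vault token create -orphan=true -renewable=true -policy=boundary-ssh -period=20m -field=token)" \
  -name "vault-ssh-ca"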
boundary credential-libraries create vault-ssh-certificate \
-credential-store-id <cs_id> \
-vault-path "ssh-client-signer/sign/boundary" \
-username "ec2-user" \
-key-type "ecdsa" -key-bits 256 \
-name "linux-ssh-cert"
6. Define targets and attach the brokered credentials
A target is the host (or host set) plus the worker filter that says which worker proxies it and the credential library that injects the secret. Create an SSH target for a private Linux host, scoped to the right worker zone:
boundary targets create ssh \
-scope-id <project_id> \
-name "prod-db-linux" \
-default-port 22 \
-egress-worker-filter '"app-private" in "/tags/zone"' \
-address "10.20.4.11"
# Inject the Vault-signed cert at connect time (injected by the worker, never returned to the user)
boundary targets add-credential-sources \
  -id <tssh_id> \
  -injected-application-credential-source <linux-ssh-cred-lib-id>
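For Windows RDP, broker the credential as a Vault-issued, short-TTL local/AD account instead of a static domain admin. Create a generic TCP target on 3389 with a username/password credential library backed by Vault’s AD or kv secrets engine. Note the difference from the SSH target: a brokered credential is returned to the engineer’s client for the session rather than injected by the worker, so it is visible to the user and its short TTL is what limits the exposure: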
boundary targets create tcp \
-scope-id <project_id> \
-name "prod-win-rdp" \
-default-port 3389 \
-egress-worker-filter '"app-private" in "/tags/zone"' \
-address "10.20.4.40"
boundary targets add-credential-sources \
-id <ttcp_id> \
-brokered-credential-source <vault-ad-cred-lib-id>
Finally, grant access by binding the Entra-sourced managed group to a role on the project scope, with a grant to connect to targets:
boundary roles create -scope-id <project_id> -name "prod-ssh-admins-connect"
boundary roles add-principals -id <role_id> -principal <managed_group_id>
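# list/read let the engineer see the targets (used in Validation); authorize-session lets them connect
boundary roles add-grants -id <role_id> \
  -grant "ids=*;type=target;actions=list,read,authorize-session" \
  -grant "ids=*;type=session;actions=read:self,cancel:self"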
Validation
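Prove the full path end to end from an engineer’s laptop. First authenticate through Entra, then connect: the SSH certificate is injected by the worker and never shown, while the RDP credential is brokered to the client for the life of the session.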
# 1. SSO login via the OIDC auth method (opens browser to Okta -> Entra)
boundary authenticate oidc -auth-method-id <amoidc_id>
# 2. List what this identity is allowed to reach
boundary targets list -scope-id <project_id>
# 3. SSH with an injected credential: no key on disk, Vault signs a 5-min cert per session
boundary connect ssh -target-id <tssh_id>
# you land on 10.20.4.11 as ec2-user; the host only sees that shared login, so per-user
# attribution comes from the Boundary session record tied to your federated identity
# 4. Brokered RDP: Boundary opens a local proxy port; point mstsc/Remmina at it
boundary connect rdp -target-id <ttcp_id>
Then confirm the controls actually hold:
# Active sessions are visible and cancellable centrally
boundary sessions list -scope-id <project_id> -recursive
# On a target host, no standing route exists outside an active session:
# from the laptop, a direct `ssh 10.20.4.11` MUST fail (no network path)
Check that Dynatrace shows the controller :9200 and worker :9202 listeners healthy, that CrowdStrike Falcon reports both node types as protected, and that Wiz shows no path from the engineer subnet straight to the target subnet. A clean run here is the audit evidence: every connection is attributed to a federated identity, every credential is short-lived and Vault-issued, and there is no path to a host except through a worker during an authorized session.
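The "direct ssh MUST fail" check can be scripted so it runs from a machine in the engineer subnet and its output is kept as evidence. A minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and coreutils timeout are installed; the no_direct_path helper is hypothetical:

```shell
# Hypothetical helper: succeed only if a direct TCP connection to host:port
# cannot be opened within 3 seconds (refused, unroutable, or silently dropped).
no_direct_path() {
  host=$1
  port=$2
  # Refuse to report "ok" when the tools the probe depends on are missing
  if ! command -v bash >/dev/null 2>&1 || ! command -v timeout >/dev/null 2>&1; then
    echo "no_direct_path: needs bash and timeout" >&2
    return 2
  fi
  # Inside bash -c, $0 is the host and $1 is the port
  if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
    echo "FAIL: direct TCP path to $host:$port is open" >&2
    return 1
  fi
  echo "ok: no direct path to $host:$port"
}
```

From the laptop subnet, no_direct_path 10.20.4.11 22 and no_direct_path 10.20.4.40 3389 should both print ok; a FAIL line means a route or security-group rule still bypasses the workers.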
Rollback / teardown
Tear down in reverse dependency order so you never strand a live session. Cancel sessions first, then remove grants, then infrastructure.
# 1. Drain: cancel any live sessions
for s in $(boundary sessions list -scope-id <project_id> -recursive -format json | jq -r '.items[].id'); do
boundary sessions cancel -id "$s"
done
# 2. Remove access bindings and targets
boundary roles delete -id <role_id>
boundary targets delete -id <tssh_id>
boundary targets delete -id <ttcp_id>
# 3. Deregister workers, then stop services (each unit on its own node type)
boundary workers delete -id <worker_id>
systemctl disable --now boundary-worker      # on worker nodes
systemctl disable --now boundary-controller  # on controller nodes
Because the estate is Terraform-managed, the durable teardown is terraform destroy of the Boundary stack, which removes the controllers, workers, load balancer, and database. Keep the KMS keys until last and schedule their deletion separately — destroying the root/recovery key before the database is gone leaves an unrecoverable cluster. Revoke the Vault token used by the credential store (vault token revoke) so no orphaned lease can sign a certificate after the cluster is gone. Re-open the legacy VPN path only as a documented break-glass during the cutover window, gated by a ServiceNow change.
Common pitfalls
- Sharing one KMS key across purposes. Use distinct root, recovery, and worker-auth keys. Reusing one couples blast radius and makes recovery rotation impossible — Boundary will start, but you have quietly thrown away the isolation the design exists for.
- Putting controllers where workers belong. Controllers do not need to reach targets and should not. If a controller can route to a host, you have re-created the flat network the project was meant to kill. Only workers get target-subnet access.
- Forgetting the worker filter on a target. Without -egress-worker-filter, Boundary may try a worker with no path to the host and the session hangs at connect. Tag workers by zone and filter every target explicitly.
- Long-lived Vault tokens on the credential store. Create the store’s Vault token with -period so it auto-renews and dies if Boundary stops renewing it. A non-periodic token expires mid-incident and every brokered connect fails at once.
- Trusting the Okta token directly. Boundary trusts the Entra-issued OIDC token, not Okta’s. Point -issuer at the Entra v2.0 endpoint and verify the groups claim is actually present; a missing groups scope silently strips everyone of their managed-group roles.
- TLS double-termination. With Akamai terminating TLS at the edge, run the controller API listener with tls_disable = true only behind the trusted LB/edge, and never expose :9200 plaintext directly.
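The first pitfall is easy to lint for in the image build or the pipeline. A minimal sketch, assuming the config names each key with a key_id or kms_key_id attribute holding a quoted string; the kms_keys_distinct helper is hypothetical:

```shell
# Hypothetical helper: fail if any KMS key value appears more than once
# in the given Boundary config file.
kms_keys_distinct() {
  # Pull out every key_id / kms_key_id assignment, keep only the quoted value,
  # and let uniq -d report values that occur more than once
  dups=$(grep -oE '(kms_)?key_id[[:space:]]*=[[:space:]]*"[^"]*"' "$1" \
    | sed -E 's/.*"([^"]*)"$/\1/' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "KMS key reused across purposes: $dups" >&2
    return 1
  fi
}
```

Calling kms_keys_distinct /etc/boundary/controller.hcl before the service is enabled turns a silent design regression into a failed build.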
Security notes
The whole point is Zero Trust for privileged access: no standing network path, identity-based authorization on every session, and credentials that are brokered, short-lived, and never seen by the human. Keep the three KMS purposes separate; scope Boundary roles to the minimum grant (authorize-session, read:self, cancel:self) rather than admin; and source group membership from Okta → Entra so an offboarded user loses access the moment the directory does. CrowdStrike Falcon on the nodes catches runtime compromise of a worker (the one component with target reachability), Wiz + Wiz Code continuously assert that neither the running posture nor the Terraform re-opens a direct route, and every session is centrally logged and cancellable for incident response and audit. Session recording (BSR) can be enabled on the worker for the highest-sensitivity targets where compliance wants a replayable record.
Cost notes
Self-hosting Boundary’s community/enterprise binary means you pay for compute and the database, not a per-seat SaaS fee — three small controllers, two workers, and a modest managed Postgres covers a few hundred engineers comfortably. Right-size the controllers (they are control-plane only) and scale workers by session concurrency and per-zone reachability rather than over-provisioning. The real saving is indirect and larger: retiring the always-on VPN concentrators and shared jumpboxes, and replacing a standing privileged-access exposure with a just-in-time one, removes both licence cost and the far more expensive risk the auditor flagged. Pipe utilization to Dynatrace so worker scaling tracks actual session load instead of a guess.