A 14-hospital regional health system in the U.S. Midwest gets a board-level mandate after a near-miss: a neighboring provider was down for nineteen days when ransomware encrypted its domain controllers, and the board’s question to the CIO is blunt — “if our Active Directory goes, how long are we down, and why does every single application still depend on it?” The honest answer is ugly. One Windows Server 2012 R2 forest, stood up in 2006, is the load-bearing wall for everything: the EHR’s single sign-on, the badge readers on the OR doors, the radiology PACS, the Moodle instance that runs mandatory clinical compliance training, 18,000 clinician and staff accounts, and a tangle of LDAP bindings nobody fully maps anymore. If that forest is encrypted, the hospital does not just lose email — it loses the ability to authenticate a nurse at a med-dispensing cabinet. This article is the reference architecture for the way out: making Okta the identity broker the whole enterprise authenticates against, federating it cleanly to Microsoft Entra ID for the Azure and Microsoft 365 estate, and then methodically shrinking on-prem AD from “the center of the universe” to “one more directory we sync from” — without a flag-day cutover no hospital could survive.
The pressures here are healthcare’s, and they are specific. HIPAA means every access decision needs an auditable trail and least-privilege enforcement, and a breach is a reportable event with real fines. Clinical safety means identity is life-critical infrastructure — a clinician who cannot log in cannot chart or order, so any change must be reversible and staged ward by ward, never big-bang. Mergers mean the system acquires a hospital roughly yearly and inherits another AD forest each time, so the architecture must absorb new directories, not choke on them. And cost means this runs on a nonprofit health system’s budget, so “rip and replace the directory” is off the table — the path has to deliver risk reduction at every step, not only at the finish line.
Why not the obvious shortcuts
Three “simple” answers get proposed in the first planning meeting, and each fails for a reason worth naming out loud.
“Just move everything to Entra ID and retire AD.” Entra ID is not a drop-in replacement for an on-prem domain. The OR badge readers, the lab instruments, the PACS, and a dozen biomedical appliances speak Kerberos and LDAP, not OIDC — they cannot authenticate against a cloud-only directory. Group Policy, file shares, and certificate auto-enrollment have no native cloud equivalent. Retiring AD outright would brick the clinical floor on day one.
“Keep AD as the brain and bolt SSO on top.” This is the status quo dressed up. As long as AD remains the authoritative source and the runtime authority for every login, the ransomware blast radius is unchanged — the board’s question goes unanswered. SSO without re-rooting where authentication actually happens is theater.
“Let each SaaS app federate to AD FS.” AD FS makes the on-prem forest even more critical: now your Salesforce, your ServiceNow, and your Moodle logins all die when the domain controllers die, and you are running a fragile, internet-exposed federation server that is itself a prime ransomware and token-forgery target (the Golden SAML class of attack).
The architecture threads the needle differently: insert Okta as the identity broker in front of everything. Okta becomes the single place users authenticate, the single place MFA and device posture are enforced, and the single place that provisions and deprovisions accounts everywhere. AD stays — it keeps serving Kerberos/LDAP to the clinical appliances that need it — but it is demoted from the authority to an authority that Okta reads from and, over time, writes less and less to. Crucially, Okta lives in the cloud and survives an on-prem AD outage, so a domain-controller ransomware event no longer means clinicians cannot log into the EHR.
Architecture overview
The design runs two planes that are easy to conflate and must be kept distinct: a control plane that governs the lifecycle of an identity (who exists, what they may access, provisioned and deprovisioned), and a runtime plane that handles a live login at 3 a.m. when a nurse taps a badge. Modernization is the act of steadily migrating both planes off on-prem AD as the sole authority — but they migrate on different schedules and with different risks.
The organizing principle of the whole topology: the HR system is the source of truth for who someone is, Okta is the authority for what they can reach, AD is reduced to a downstream consumer that only the legacy clinical estate still needs, and Entra ID is the gateway to the Microsoft cloud. Get that hierarchy right and every later decision falls out of it.
Runtime path, following a real login:
- A clinician opens the EHR (or Salesforce, or Moodle for their annual HIPAA refresher). The app is configured for SSO to Okta via SAML or OIDC — Okta is the IdP, the app is the service provider. The user’s traffic reaches the hospital’s public surfaces through Akamai at the edge for TLS termination, global anycast, WAF, and bot mitigation before it ever touches an origin.
- Okta authenticates the user and runs an adaptive access policy: it evaluates device trust, network zone (on-campus vs. remote), and risk signals, then prompts for a second factor, either an Okta Verify push or, where policy demands phishing resistance, a FIDO2 security key. Shared clinical workstations use Okta’s desktop SSO so a badge tap resolves to the right session without re-typing a password.
- For anything in the Microsoft cloud — Microsoft 365, the Azure portal, an internal app registered in Entra — Okta is wired as the external IdP federated to Entra ID. The user authenticates at Okta; Okta asserts to Entra; Entra issues its native token so Azure RBAC and Conditional Access see a first-class Entra identity. One human login, brokered cleanly across both clouds.
- For the legacy clinical appliances — PACS, lab analyzers, OR badge controllers, the older virtual appliances that speak only LDAP/Kerberos — the request still terminates against an on-prem domain controller. These are the accounts Okta synchronizes into AD rather than away from it, and they are the explicit reason AD does not vanish in phase one.
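The adaptive step in the runtime path can be sketched as a small decision function. This is a minimal model of the ordering of checks, not Okta's policy engine: the `LoginContext` fields, the 0-100 risk scale, and both thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    MFA = "prompt for any enrolled second factor"
    STEP_UP = "require a FIDO2 security key"
    DENY = "block the login"


@dataclass(frozen=True)
class LoginContext:
    device_trusted: bool  # hypothetical flag, e.g. derived from an endpoint signal
    on_campus: bool       # request arrived from a known hospital network zone
    risk_score: int       # hypothetical 0-100 scale, higher is riskier


def evaluate(ctx: LoginContext, deny_at: int = 80, step_up_at: int = 40) -> Decision:
    """Decide how one login attempt is handled; thresholds are illustrative."""
    if ctx.risk_score >= deny_at:
        return Decision.DENY
    if not ctx.device_trusted:
        # An untrusted device is tolerated only on campus, and only with step-up.
        return Decision.STEP_UP if ctx.on_campus else Decision.DENY
    if not ctx.on_campus or ctx.risk_score >= step_up_at:
        return Decision.STEP_UP
    return Decision.MFA
```

Checking the hard denials first keeps the function easy to audit: every later branch can assume the risk score is below the deny threshold.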
Control plane, the lifecycle that runs continuously:
The HR system (Workday in this case) is the system of record for joiners, movers, and leavers. An HR event flows to Okta, which then provisions and deprovisions outward via SCIM — creating the Salesforce account, the ServiceNow account, the Moodle enrollment, the AWS IAM Identity Center entitlement — and writes the user into AD through the Okta AD Agent for the apps that still bind to AD. When a clinician is terminated in Workday, Okta deprovisions every connected app and disables the AD account in one fan-out, closing the orphaned-account gap that audits always flag. The directional shift over the program is the whole point: early on, Okta writes a lot into AD; by the end, AD is a near-empty downstream consumer and most apps provision directly from Okta.
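The fan-out described above can be modeled in a few lines. The sketch below is a toy under stated assumptions: the event shape (`type`, `user`, `targets`) and the in-memory `accounts` map are hypothetical stand-ins for HR events and connected apps, not a Workday or Okta API.

```python
def apply_hr_event(accounts: dict[str, set[str]], event: dict) -> list[str]:
    """Apply one joiner or leaver event and return the actions taken.

    `accounts` maps a downstream target (SaaS app or directory) to the set of
    active usernames there. The event shape is hypothetical.
    """
    user = event["user"]
    actions: list[str] = []
    if event["type"] == "joiner":
        for target in event["targets"]:
            accounts.setdefault(target, set()).add(user)
            actions.append(f"create {user} in {target}")
    elif event["type"] == "leaver":
        # Fan out across every known target, not only those named at hire time.
        for target in sorted(accounts):
            if user in accounts[target]:
                accounts[target].remove(user)
                actions.append(f"disable {user} in {target}")
    else:
        raise ValueError(f"unsupported event type: {event['type']}")
    return actions
```

The leaver branch walks every target rather than a list carried on the event, which is the property that closes the orphaned-account gap: an account granted after hire is still found and disabled.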
Component breakdown
| Layer | Service / tool | Role in the modernization | Key configuration choices |
|---|---|---|---|
| Edge | Akamai | TLS, anycast, WAF, bot mitigation for the Okta-fronted apps and the Okta org’s custom domain | Custom WAF rules for credential-stuffing; bot mitigation on the login origin |
| Workforce IdP | Okta | Single runtime authority for login, MFA, adaptive policy, and the SCIM provisioning hub | Adaptive MFA; FIDO2/Okta Verify; ThreatInsight on the auth endpoint |
| Microsoft gateway | Microsoft Entra ID | Federated target for the Azure / M365 estate; native RBAC + Conditional Access | Okta as external/federated IdP; CA as defense-in-depth behind Okta |
| Legacy directory | On-prem Active Directory | Kerberos/LDAP for clinical appliances; sync source early, downstream consumer late | DCs hardened and tiered; LDAP bindings inventoried and retired over time |
| Directory bridge | Okta AD Agent + Entra Connect | Sync identities between AD, Okta, and Entra; pass-through or delegated auth | Multiple agents for HA; password hash sync to Entra for resilience |
| Provisioning | SCIM (Okta → SaaS/cloud) | Automated joiner/mover/leaver to Salesforce, ServiceNow, Moodle, AWS, Azure | SCIM 2.0 connectors; group-driven entitlements; deprovision on HR term |
| Secrets | HashiCorp Vault | Holds the SCIM API tokens, AD service-account creds, signing keys the agents need | Dynamic AD credentials; short leases; agent injection, no creds on disk |
| ITSM / governance | ServiceNow | Access requests, approvals, and the change gate for every migration wave | Request workflow feeds Okta group membership; change gate per ward cutover |
| CSPM / posture | Wiz + Wiz Code | Verifies no risky public exposure or stale federation; scans IaC before it ships | Agentless cloud scan; Wiz Code on the Terraform PRs that wire identity |
| Endpoint / runtime security | CrowdStrike Falcon | Device-trust signal into Okta; runtime protection on DCs and identity infra | Falcon Identity signals to Okta; sensors on every domain controller |
| Observability | Datadog / Dynatrace | Login latency, MFA failure rate, provisioning lag, DC health | Okta + Entra logs ingested; SLOs on auth success and SCIM lag |
| CI / IaC | GitHub Actions / Jenkins + Terraform / Ansible + Argo CD | Manages Okta, Entra, and AD config as code; GitOps for app onboarding | OIDC to cloud (no stored creds); Terraform for tenants, Ansible for DCs |
A few of these choices carry the program, and the why matters.
Why Okta in front of Entra, rather than Entra alone. A pure-Microsoft shop would reasonably make Entra the broker and skip Okta. This health system cannot: its estate is genuinely multi-cloud (a large AWS footprint for analytics and imaging archives, alongside Azure for M365 and a few line-of-business apps), and it has hundreds of non-Microsoft SaaS apps and a steady stream of acquired companies on every conceivable directory. Okta’s value is being the neutral broker that federates to Entra and AWS IAM Identity Center and the SaaS estate from one policy engine, with the deepest catalog of pre-built SCIM connectors. Entra is then the gateway specifically for the Microsoft cloud — a target Okta federates to, not a competitor to it.
Why SCIM provisioning is the quiet hero. SSO gets the attention, but SCIM is what actually shrinks risk and toil. With SCIM, a termination in Workday propagates to every connected app in minutes — the Salesforce license reclaimed, the ServiceNow role revoked, the Moodle account disabled, the AWS entitlement pulled — instead of a help-desk ticket nobody files. That closes the dormant-account attack surface HIPAA auditors care about most, and it turns access into a managed lifecycle instead of a one-way accumulation of permissions.
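On the wire, a SCIM 2.0 deactivation is typically a small PATCH against the user resource. The sketch below builds that request using the PatchOp message schema from RFC 7644; the helper name and the returned dictionary shape are assumptions, and individual connectors may expect a different base path or a full PUT instead.

```python
import json
from urllib.parse import quote

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def build_deactivation(user_id: str) -> dict:
    """Describe the SCIM 2.0 request that marks one user inactive."""
    body = {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }
    return {
        "method": "PATCH",
        # The id is percent-encoded so it cannot alter the request path.
        "path": f"/Users/{quote(user_id, safe='')}",
        "headers": {"Content-Type": "application/scim+json"},
        "body": json.dumps(body),
    }
```

Setting `active` to false rather than deleting the resource keeps the account's history available for audit while removing its ability to sign in.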
Why hybrid join, not cloud-only or domain-only. Corporate Windows devices are hybrid-joined: domain-joined to on-prem AD (so Group Policy, file shares, and certificate enrollment keep working for the clinical workflows that depend on them) and registered to Entra ID (so Conditional Access and device-trust signals work in the cloud). Hybrid join is the bridge that lets a single laptop satisfy the legacy estate and the cloud estate at once — the practical embodiment of “migrate the runtime plane gradually.”
Implementation guidance: the staged path
The program is deliberately staged. Every wave must reduce risk on its own, because a hospital cannot bet clinical operations on a finish line two years out. Manage the whole thing as code: Okta and Entra config in Terraform, domain-controller hardening and agent rollout in Ansible, app-onboarding manifests promoted through Argo CD (GitOps), all built and gated by GitHub Actions (or Jenkins, which the platform team already runs for legacy jobs). Wiz Code scans those Terraform PRs before merge so a misconfigured federation or an over-broad trust never reaches production.
Phase 1 — Establish Okta as the broker, AD still authoritative. Stand up the Okta org, deploy redundant Okta AD Agents against the existing forest, and import users and groups. Wire password hash sync so Okta (and Entra) can authenticate users even if the on-prem DCs are unreachable — this is the single most important resilience win and it lands in week one. Federate Entra to Okta as an external IdP. Onboard the first low-risk apps (Moodle, the internal wiki) to Okta SSO. Roll out adaptive MFA to administrators first, then staff. At the end of phase 1, AD is still the source of truth, but humans now authenticate at Okta and survive a DC outage.
The Okta-to-Entra federation is configured declaratively; the intent is “Okta is the IdP, Entra trusts it”:
```hcl
resource "okta_app_oauth" "entra_federation" {
  label          = "Microsoft Entra ID (federated)"
  type           = "web"
  grant_types    = ["authorization_code"]
  response_types = ["code"]
  redirect_uris = [
    "https://login.microsoftonline.com/te/${var.entra_tenant_id}/oauth2/authresp"
  ]
}

# Entra side: Okta registered as an external/federated IdP for the tenant's
# verified domains, so a user at clinic.example.org authenticates against Okta
# and Entra issues a native token for Azure RBAC + Conditional Access.
```
Phase 2 — Make Okta the provisioning authority. Turn on SCIM from Okta to the SaaS and cloud estate: Salesforce, ServiceNow, AWS IAM Identity Center, Moodle, and the rest. Drive entitlements from Okta groups, and feed those group memberships from ServiceNow access requests so every grant has an approval and an audit record. Now joiner/mover/leaver is automated end to end: Workday → Okta → every app, and Okta → AD for the accounts still bound to it. This is where the orphaned-account problem disappears.
Phase 3 — Shrink the AD dependency, app by app. Inventory every LDAP/Kerberos binding (the painful, unglamorous step) and reclassify each: cloud-ready apps move to Okta-native SSO and stop touching AD; legacy-but-replaceable appliances get firmware updates or are fronted by an LDAP-to-OIDC proxy; genuinely AD-bound clinical systems stay, and are isolated to a small, hardened, well-monitored AD footprint. The domain-controller estate contracts from “everywhere” to “a hardened island serving a known, shrinking list of appliances.” That island is what remains of the ransomware blast radius — and it is now small, segmented, and watched, instead of being the whole hospital.
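The reclassification step is a three-way triage, and it helps to write the rule down so every binding in the inventory is sorted the same way. A minimal sketch, assuming two yes/no facts are known per system; the `Binding` fields and `Track` names are illustrative, not part of any product.

```python
from dataclasses import dataclass
from enum import Enum


class Track(Enum):
    OKTA_NATIVE = "move to Okta-native SSO"
    UPGRADE_OR_PROXY = "firmware update or LDAP-to-OIDC proxy"
    AD_ISLAND = "keep on the hardened AD island"


@dataclass(frozen=True)
class Binding:
    system: str
    supports_saml_or_oidc: bool      # can federate to Okta today
    upgrade_or_proxy_feasible: bool  # a vendor update or proxy is realistic


def classify(binding: Binding) -> Track:
    """Pick the least AD-dependent track the system can actually support."""
    if binding.supports_saml_or_oidc:
        return Track.OKTA_NATIVE
    if binding.upgrade_or_proxy_feasible:
        return Track.UPGRADE_OR_PROXY
    return Track.AD_ISLAND


def plan(bindings: list[Binding]) -> dict[Track, list[str]]:
    """Group systems by track so each track can be scheduled as its own wave."""
    grouped: dict[Track, list[str]] = {track: [] for track in Track}
    for binding in bindings:
        grouped[classify(binding)].append(binding.system)
    return grouped
```

The list under `Track.AD_ISLAND` is the useful output: it is the known, shrinking set of appliances the hardened domain controllers still have to serve.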
A pragmatic rule keeps the program honest: never migrate an app off AD without an instant rollback. Run new SSO in parallel, cut over one ward or one app at a time behind a ServiceNow change gate, and keep the AD path live until the new path proves out. Clinical safety forbids flag days.
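That rule can be expressed as a gate evaluated per app or per ward during the parallel run. The sketch below is one possible shape, with placeholder thresholds rather than recommendations; the function name and the returned action strings are hypothetical.

```python
def cutover_step(new_path_ok: int, new_path_total: int, change_approved: bool,
                 min_attempts: int = 200, min_success_rate: float = 0.995) -> str:
    """Return the next action for one app or ward during a parallel run."""
    if new_path_total < min_attempts:
        return "keep-parallel"      # not enough evidence yet; AD path stays primary
    if new_path_ok / new_path_total < min_success_rate:
        return "roll-back"          # new path is failing logins; stay on AD
    if not change_approved:
        return "await-change-gate"  # proven, but the change record is not approved
    return "cut-over"               # AD path remains available as the rollback
```

Note that no branch removes the AD path; retiring it is a separate, later decision once the new path has run clean in production.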
Enterprise considerations
Security & Zero Trust. The architecture moves the hospital toward Zero Trust by making identity the control plane: MFA on every login (Okta Verify, with FIDO2 security keys where phishing resistance is required), adaptive policy that weighs device and network risk, and least-privilege entitlements driven by groups rather than standing admin rights. Device trust is real, not assumed: CrowdStrike Falcon feeds device-posture and identity-risk signals into Okta’s policy engine, so a workstation flagged by Falcon as compromised is forced to step up or is blocked outright, and Falcon sensors run on every domain controller to catch the lateral-movement and credential-dumping patterns that precede a Golden Ticket attack. HashiCorp Vault holds the secrets this whole machine needs (the SCIM API tokens, the AD service-account credentials the Okta AD Agent uses, the SAML signing keys), issued as short-lived dynamic leases and injected at runtime, so nothing sensitive sits in a config file or a CI variable (a lesson this team has internalized after a prior leaked-credential scare). Wiz runs continuous CSPM across the AWS and Azure estate to confirm no identity surface drifts to risky public exposure and no stale federation trust lingers after an app is decommissioned, and Wiz Code shifts that check left onto the Terraform PRs themselves. A common failure mode this guards against is AD FS or a forgotten SAML trust left enabled after migration, which is a dormant token-forgery path. Wiz flags it; the migration runbook requires it be torn down.
Cost optimization. Identity modernization saves money in ways finance can see, and the levers are concrete.
| Lever | Mechanism | Typical effect |
|---|---|---|
| License reclamation | SCIM deprovisioning frees SaaS seats the instant someone leaves | Recovers seats that previously leaked for months |
| Help-desk deflection | Self-service password reset + SSO collapse “I can’t log in” tickets | Password resets are a top-3 ticket category; SSO cuts the volume hard |
| DC footprint reduction | Fewer domain controllers and AD FS servers to license, patch, and host | Retires Windows Server + AD FS infra as apps migrate off |
| Merger absorption | New acquisitions federate to Okta instead of forest trusts | Avoids the cost and risk of every-merger forest-trust projects |
| Audit efficiency | One identity log and access-review surface instead of per-app exports | Shrinks the quarterly HIPAA access-review effort |
The honest counter-pressure: Okta is a per-user subscription, and at 18,000 users that is a material recurring line item. It is justified by the consolidated security posture, the retired infrastructure, and the deflected toil — but it must be modeled openly, because “identity got more expensive on the invoice even though it got cheaper overall” is exactly the kind of surprise that erodes trust with a nonprofit’s CFO.
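Modeling it openly can be as simple as putting the new invoice and the offsets side by side. The function below is a sketch of that arithmetic only; every parameter is an input the finance team would supply, and none of the names correspond to vendor pricing terms.

```python
def annual_identity_cost_change(users: int, broker_price_per_user: int,
                                reclaimed_license_value: int,
                                helpdesk_savings: int,
                                retired_infra_savings: int) -> dict[str, int]:
    """Compare the new subscription line item with the savings that offset it.

    All amounts are annual and in the same currency unit. A negative
    `net_change` means identity got cheaper overall despite the new invoice.
    """
    new_invoice = users * broker_price_per_user
    offsets = reclaimed_license_value + helpdesk_savings + retired_infra_savings
    return {
        "new_invoice": new_invoice,
        "offsets": offsets,
        "net_change": new_invoice - offsets,
    }
```

Reporting `new_invoice` and `net_change` together is the point: the first is what the CFO sees on the bill, the second is what the program actually costs.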
Scalability & mergers. The design scales precisely where this organization grows: acquisitions. A newly acquired hospital’s directory is connected to Okta as just another sourced directory (or its users are imported and its old forest retired on a schedule), and federated in — no fragile cross-forest trust, no project to merge two AD schemas. Okta AD Agents scale horizontally for sync throughput and HA; the SCIM hub fans out to as many downstream apps as the estate contains. The natural ceiling is integration effort per net-new app, not a platform limit — which is why onboarding is templated as code and promoted through Argo CD so adding an app is a reviewed pull request, not a bespoke ticket.
Failure modes, and what each looks like. Name them before they page the on-call.
- On-prem AD / domain controllers down (the ransomware scenario) — clinical appliances bound to AD lose authentication, but because of password hash sync, humans still log into Okta and the cloud apps. Mitigation: this is exactly the resilience the program buys; the residual exposure is the small, hardened AD island, recovered from immutable DC backups.
- Okta org outage — Okta is now the single front door, so its availability is critical. Mitigation: Okta’s own multi-tenant SLA, plus break-glass local admin accounts on critical systems stored in Vault, and Entra’s password hash sync providing a fallback authentication path for the Microsoft estate.
- Okta AD Agent failure — sync stalls; new joiners/leavers do not propagate to AD-bound apps. Mitigation: deploy multiple agents for HA and alert on agent heartbeat in Datadog.
- SCIM drift — a connector silently fails and an app’s accounts diverge from Okta. Mitigation: scheduled reconciliation reports and a Datadog/Dynatrace SLO on provisioning lag so a stuck connector pages instead of festering.
- Federation misconfiguration on cutover — a domain federation set wrong locks an entire tenant’s users out. Mitigation: stage every federation change in a test tenant, gate it through ServiceNow change, and keep a documented un-federate rollback.
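For the SCIM drift case, the scheduled reconciliation is at heart a set difference between who Okta says is active and who the app says is active. A minimal sketch, assuming both lists have already been exported as sets of usernames; the function and key names are hypothetical.

```python
def reconcile(okta_active: set[str], app_active: set[str]) -> dict[str, list[str]]:
    """Report accounts that diverge between Okta and one downstream app."""
    return {
        # Active in the app but not in Okta: the orphaned accounts audits flag.
        "disable_in_app": sorted(app_active - okta_active),
        # Active in Okta but missing from the app: a stalled joiner.
        "create_in_app": sorted(okta_active - app_active),
    }


def has_drift(report: dict[str, list[str]]) -> bool:
    """True when either direction of divergence is non-empty."""
    return any(report.values())
```

A non-empty `disable_in_app` list is the higher-severity finding, since it means someone who should have lost access still has it.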
Reliability & DR (RTO/RPO). Decide the numbers per plane. The runtime plane’s resilience is the headline: with password hash sync, an on-prem AD outage no longer takes down human cloud logins, so the effective RTO for “clinicians can authenticate to the EHR” drops from days (the old DC-recovery time) to near zero. For the remaining AD island, maintain immutable, offline domain-controller backups and a rehearsed forest-recovery runbook, the discipline the neighboring hospital lacked. Okta and Entra configuration is defined in Terraform and can be re-applied, so the control plane is rebuildable from code; the Terraform state file records what was deployed and is not itself the backup. A pragmatic target: human authentication RTO of minutes (Okta + hash sync), and AD-island recovery measured in hours, not weeks, from immutable backups.
Observability. Instrument the signals that actually predict an incident, in Datadog (or Dynatrace): login success rate and p95 login latency at Okta, MFA failure and step-up rate (a spike signals an attack or a broken policy), SCIM provisioning lag per connector, AD Agent heartbeat, and domain-controller health. Ingest the Okta System Log and Entra sign-in logs so every authentication and admin action is queryable for HIPAA audit and incident review. Set SLOs on auth success and provisioning lag, and route guardrail breaches — a surge in failed MFA, a Falcon detection on a DC — to a ServiceNow incident so security gets a ticket, not just a dashboard.
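The two headline SLOs reduce to a ratio and a percentile. The sketch below computes both from raw counts and latency samples using the nearest-rank percentile method; the default targets are placeholders, and in practice the monitoring platform would evaluate these rather than hand-written code.

```python
def p95(latencies_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def slo_breaches(successes: int, attempts: int, latencies_ms: list[float],
                 min_success_rate: float = 0.999,
                 max_p95_ms: float = 1500.0) -> list[str]:
    """Return the names of the SLOs breached in this evaluation window."""
    breaches: list[str] = []
    if attempts and successes / attempts < min_success_rate:
        breaches.append("auth-success-rate")
    if latencies_ms and p95(latencies_ms) > max_p95_ms:
        breaches.append("login-latency-p95")
    return breaches
```

Returning breach names rather than a boolean makes it straightforward to route each one to its own incident, which matches the goal of security getting a ticket and not just a dashboard.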
Governance. Manage identity infrastructure as code: Okta apps, policies, and groups in Terraform; domain-controller configuration and the agent fleet in Ansible; app onboarding promoted via Argo CD GitOps; all built and gated by GitHub Actions or Jenkins with OIDC to the cloud so no long-lived credential sits in a pipeline. Run quarterly access reviews out of the single Okta entitlement surface instead of per-app spreadsheets. Pin which apps are AD-bound versus Okta-native in a living inventory, and require a ServiceNow change record for every migration wave so compliance has a documented, reversible trail.
Explicit tradeoffs
Accept these or do not start. Inserting Okta as the broker adds a hop and a token-translation step on every Microsoft-cloud login that a pure-Entra shop would not have — and it makes Okta a critical, paid dependency you must operate to a high SLA. The hybrid-join state is genuinely complex: a device that is both domain-joined and Entra-registered has more moving parts and more failure modes than either alone, and getting Entra Connect / Okta AD Agent topology and password hash sync right is fiddly. The staged path is slow on purpose — risk reduction lands early, but full AD minimization is a multi-year program with a long tail of stubborn clinical appliances that may never leave AD, so “retire Active Directory” is honestly the wrong goal; “reduce AD to a small, hardened, well-understood island” is the achievable one. And the SCIM/federation lattice is real engineering per app, not a switch you flip.
The alternatives, and when they win. A pure-Microsoft organization with no AWS and few non-Microsoft SaaS apps should make Entra ID the broker directly and skip Okta — the second IdP earns its keep only when the estate is genuinely multi-cloud and multi-vendor, which this hospital’s is. A greenfield company with no legacy clinical appliances can go cloud-only from day one and never stand up a domain controller at all. And an organization whose risk is concentrated in a few apps might get most of the benefit from just SSO + MFA on those apps without the full provisioning and AD-shrink program — though it leaves the orphaned-account and DC-blast-radius problems unsolved, which for a HIPAA-regulated hospital is precisely the gap that matters.
The shape of the win
For the health system, the payoff is the answer to the board’s original question. After phase 1, the CIO can say: “If our domain controllers are encrypted tomorrow, our clinicians still log into the EHR, the imaging archive, and email — because they authenticate at Okta in the cloud, not against those servers. The blast radius is now a small, hardened set of legacy appliances we are actively retiring, recoverable from immutable backups in hours, not the entire hospital for nineteen days.” That sentence is what funds the program. Everything upstream — Okta as the broker, the Entra federation, hybrid join, SCIM-driven lifecycle, the Vault-held secrets, the CrowdStrike device signals, the Wiz posture checks, the Datadog SLOs — exists so that an identity outage stops being an existential clinical event and becomes a contained, survivable one. The architecture here is the destination. Start with phase 1 and the resilience it buys on day one; the rest is the disciplined, ward-by-ward walk to get there.