A 14-hospital regional health system in the US runs an on-premises PACS — a picture archiving and communication system — that is, in the most literal sense, full. Radiology produces roughly 60 TB a year of CT, MR, mammography, and digital-pathology studies, and the SAN under the legacy archive is out of both capacity and warranty. The CIO has a stack of converging problems on one desk: a hardware refresh quote that makes the CFO wince, a regulatory mandate to retain adult imaging for at least seven years and pediatric imaging until the patient turns 21 plus the statute of limitations (which in practice means decades), a recent ransomware scare at a peer system that put PACS continuity on the board’s agenda, and three new outpatient clinics that need radiologists to read studies from home over a browser. The directive that lands on the cloud team is deceptively simple: “Get us off the SAN, keep every study forever, let our radiologists read from anywhere, and pass the next HIPAA audit without a war room.” A lift-and-shift of a VNA appliance onto an EC2 fleet with an EBS volume is not the answer. This article is the reference architecture for doing it properly on AWS — a managed, lifecycle-tiered, encrypted, fully audited DICOM archive that a hospital’s privacy officer and CISO will actually sign.
The pressures here are specific to imaging and worth naming, because they drive every decision downstream. Object size and count are both enormous: a single CT study can be hundreds of slices and a digital-pathology whole-slide image can exceed a gigabyte, and across 14 hospitals you are managing hundreds of millions of objects. Access is bimodal — a study is read intensively in the first 30–90 days, then almost never, yet must remain recoverable for decades. Retention is legally mandatory, not a nice-to-have, which makes deletion a compliance event, not a cleanup. And every byte is protected health information (PHI) under HIPAA, so encryption, access control, and a tamper-evident audit trail are table stakes, not features. The architecture that satisfies all of this is a managed imaging service for the active tier plus S3 lifecycle tiering that automatically and cheaply ages cold studies into Glacier — the storage cost curve is the whole game.
Why not the obvious shortcuts
Three naive approaches will be proposed in the first design meeting, and each fails in a way worth stating plainly so the team can move past them.
EC2 plus a big EBS volume simply recreates the SAN problem in the cloud at a worse price. EBS is block storage priced for hot, attached workloads; parking 400 TB of mostly-cold imaging on gp3 volumes costs several times the per-GB rate of S3 Standard and far more again than the Glacier tiers, and you still own the appliance, the OS patching, the capacity planning, and the backup. You have rented the same trap.
Raw S3 Standard for everything fixes durability and removes the appliance, but ignores the access pattern. Keeping a decade of studies that are read once and then never touched in the Standard tier means paying hot-storage rates for cold data forever — and “forever” here is literal. Without lifecycle policy, the storage bill grows monotonically and the CFO’s original complaint comes right back.
Self-managing the DICOM layer — running an open-source DICOM server on containers, parsing the protocol, building the web viewer, and writing the de-identification and metadata indexing yourself — is a multi-year engineering program that a hospital IT department should not own. DICOM is a deep protocol (DIMSE networking, DICOMweb, structured reporting, an enormous tag dictionary), and getting de-identification subtly wrong is a breach. There is a managed service for exactly this, and the build-it-yourself path spends scarce healthcare-engineering budget reinventing it.
The architecture below uses AWS HealthImaging as the purpose-built DICOM data store, backs it and the raw imports with S3 under an aggressive lifecycle policy, fronts the radiologist experience with a zero-footprint web viewer behind an Application Load Balancer, and makes the whole thing auditable with CloudTrail and S3 access logs queried by Athena. Identity, posture, runtime, and observability tooling wrap it into something operable.
Architecture overview
The platform runs two distinct paths that share storage but live on different schedules: a high-volume ingest pipeline that absorbs studies from hospital modalities and the legacy archive, and a synchronous read pipeline that serves radiologists and clinicians a viewer. Keeping them separate is the first step to operating this well — ingest is throughput-bound and bursty (overnight migration backfills, daytime modality output); read is latency-sensitive and human-facing.
The defining property of the topology is the one the privacy officer cares about most: PHI never has a public surface, and every byte is encrypted with customer-managed keys. S3 buckets, the HealthImaging data store, and the viewer compute all live behind private networking or are reachable only through an identity-gated, WAF-protected front door. The active study set is in HealthImaging and S3 Standard / Intelligent-Tiering; the cold archive ages automatically into Glacier; nothing is ever quietly deleted before its legally mandated retention expires.
Ingest path, following the data flow:
- Hospital modalities (CT, MR, ultrasound, mammography, pathology scanners) and the legacy VNA push DICOM. On-premises edge agents — packaged as virtual appliances running on each hospital’s hypervisor — terminate the C-STORE / DICOM association locally, buffer studies, and forward them to AWS over the Direct Connect private circuit, so imaging traffic never crosses the public internet and a clinic’s flaky link cannot drop a study mid-transfer.
- Studies land first as raw DICOM in a landing S3 bucket. An S3 `ObjectCreated` event triggers an import into AWS HealthImaging, which ingests the DICOM, normalizes it into its optimized internal representation (separating bulk pixel data from searchable metadata), and exposes the study via the DICOMweb (WADO-RS/QIDO-RS) and native APIs.
- HealthImaging emits the import result and metadata; a small Lambda writes a study-level index row (patient ID hash, accession number, modality, body part, study date, retention class) into DynamoDB so the worklist and lifecycle logic have a fast, queryable catalog without scanning S3.
- The original DICOM in the landing bucket is transitioned by S3 Lifecycle rules: hot for the read-intensive window, then progressively to colder tiers and finally Glacier Deep Archive for the long legal tail.
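The study-level index row written by the Lambda can be sketched as below. This is a minimal illustration, not the article's implementation: the input field names, the keyed-hash scheme (HMAC-SHA256), the `ADULT_AGE` threshold, and the two retention class labels are all assumptions made for the example.

```python
import hashlib
import hmac
from datetime import date

ADULT_AGE = 18  # assumption: age of majority; set this per your state's rules


def hash_patient_id(patient_id: str, key: bytes) -> str:
    """Keyed SHA-256 so the catalog never holds the raw patient ID."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def age_on(day: date, birth: date) -> int:
    """Whole years elapsed between birth and the given day."""
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def build_index_row(study: dict, key: bytes) -> dict:
    """Map an import result (hypothetical shape) to a catalog item."""
    study_date = date.fromisoformat(study["study_date"])
    birth_date = date.fromisoformat(study["patient_birth_date"])
    minor = age_on(study_date, birth_date) < ADULT_AGE
    return {
        "patient_id_hash": hash_patient_id(study["patient_id"], key),
        "accession_number": study["accession_number"],
        "modality": study["modality"],
        "body_part": study["body_part"],
        "study_date": study_date.isoformat(),
        "retention_class": "pediatric" if minor else "adult",
    }
```

Keeping the hash keyed (rather than a bare SHA-256) means someone holding only the catalog cannot confirm a guessed patient ID without the key, which would live alongside the other secrets.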
Read path, synchronous and human-facing:
- A radiologist opens the zero-footprint viewer — a browser-based DICOM viewer (e.g. an OHIF-style web app) served from compute behind an Application Load Balancer. There is nothing to install on the workstation, which is what makes work-from-home reads and outpatient clinics feasible.
- The request authenticates through Okta as the clinical workforce IdP, federated to AWS IAM Identity Center, so a radiologist’s existing hospital SSO and MFA gate access and AWS sees a scoped, role-mapped session. Akamai sits at the edge for TLS, global anycast, and WAF/bot protection in front of the ALB.
- The viewer’s API calls hit a thin backend on ECS/Fargate that authorizes the request against the user’s role and the DynamoDB catalog, then proxies DICOMweb reads from HealthImaging — streaming frames so a study renders progressively rather than after a full download.
- If a requested study has already aged into Glacier, the backend issues a restore (expedited or standard retrieval) and surfaces a “retrieving from archive” state to the radiologist — a deliberately visible tradeoff covered below.
- Every API call, every HealthImaging access, and every S3 object read is logged for the audit trail.
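The archive branch of the read path can be sketched as a small decision function. This is an illustration under assumptions: the function name, the state strings, and the seven-day restore window are hypothetical, while the returned `restore_request` follows the shape of the S3 RestoreObject request (`Days` plus `GlacierJobParameters.Tier`). It encodes one real constraint: expedited retrieval is not offered for Deep Archive.

```python
ARCHIVED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}  # S3 classes that need a restore


def plan_read(storage_class: str, urgent: bool, restore_days: int = 7) -> dict:
    """Decide whether a study can stream now or must be restored first."""
    if storage_class not in ARCHIVED_CLASSES:
        return {"state": "ready", "restore_request": None}

    # Expedited retrieval exists for Glacier Flexible Retrieval ("GLACIER")
    # but not for Deep Archive, so urgent Deep Archive reads fall back to Standard.
    if urgent and storage_class == "GLACIER":
        tier = "Expedited"
    else:
        tier = "Standard"

    return {
        "state": "retrieving_from_archive",  # hypothetical UI state name
        "restore_request": {
            "Days": restore_days,  # how long the restored copy stays readable
            "GlacierJobParameters": {"Tier": tier},
        },
    }
```

The backend would pass `restore_request` to its S3 client and return `state` to the viewer, so the radiologist sees the archive delay rather than a blank screen.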
Component breakdown
| Component | Service / tool | Role in the platform | Key configuration choices |
|---|---|---|---|
| Edge | Akamai | TLS, anycast, WAF, bot mitigation in front of the viewer | WAF rules for credential-stuffing; origin shield to the ALB |
| On-prem ingest | Virtual appliances (per hospital) | Terminate DICOM C-STORE locally, buffer, forward over Direct Connect | Store-and-forward queue; auto-retry; TLS to AWS |
| Identity / SSO | Okta + AWS IAM Identity Center | Clinical SSO (Okta) federated to AWS for scoped role sessions | SAML/OIDC federation; MFA enforced; radiologist vs. admin roles |
| DICOM store | AWS HealthImaging | Managed import, storage, and DICOMweb serving of studies | Import jobs from S3; SSE-KMS with CMK; metadata + pixel separation |
| Active object store | Amazon S3 | Landing zone for raw DICOM + durable source of truth | Versioning + Object Lock (compliance mode); SSE-KMS |
| Cold archive | S3 Glacier / Deep Archive | Decade-scale cheap retention of cold studies | Lifecycle transitions; Deep Archive for the long legal tail |
| Catalog | DynamoDB | Fast study index / worklist; retention class per study | On-demand capacity; GSI on accession + study date |
| Viewer backend | ECS on Fargate | AuthZ, DICOMweb proxy, restore orchestration | Private subnets; auto-scaling on concurrency |
| Web viewer | Zero-footprint DICOM viewer (OHIF-style) | Browser-based read, no workstation install | Progressive frame streaming; served behind ALB |
| Ingress | Application Load Balancer | Terminate, route, health-check the viewer tier | HTTPS only; WAF association; target group health checks |
| Encryption | AWS KMS (customer-managed) | Envelope encryption for all PHI at rest | CMK per environment; key policy + grants; rotation on |
| Audit query | Athena over CloudTrail + S3 access logs | HIPAA access auditing and forensics | Partitioned external tables; SQL audit reports |
| Secrets | HashiCorp Vault | Third-party tokens, edge-agent credentials, signing keys | AWS auth method; dynamic leases; short-lived secrets |
| CSPM / data posture | Wiz | Cloud posture, PHI exposure, attack-path analysis | Agentless scan of S3/HealthImaging; alert on public-exposure drift |
| Runtime security | CrowdStrike Falcon | Runtime protection on Fargate tasks and ingest VMs | Sensor on the viewer tier and edge appliances; SOC feed |
| Observability | Datadog | Metrics, traces, ingest/restore latency, audit dashboards | Agent on ECS; APM on the DICOMweb proxy; log pipeline |
| ITSM / approvals | ServiceNow | Migration change gates, deletion approvals, incident records | Change gate per hospital cutover; auto-ticket on guardrail breach |
| CI / IaC | GitHub Actions + Terraform | Pipeline build/test; infrastructure as code | OIDC to AWS (no stored creds); plan/apply gates |
A few of these choices deserve the why, because they are the ones teams get wrong.
Why HealthImaging instead of plain S3 plus a DICOM server. Raw S3 stores files; it does not understand DICOM. HealthImaging ingests a study, splits the heavy pixel data from the searchable metadata, and serves it back over standard DICOMweb so any conformant viewer works — and it does the protocol-level heavy lifting (frame-level access, metadata search) that you would otherwise build and maintain. The original DICOM still lives in S3 as your durable, portable source of truth, so you are never locked in: the managed service is the serving layer, S3 is the system of record.
Why customer-managed KMS keys, not the default. SSE-S3 encrypts data but the key is entirely AWS-controlled and invisible to your auditors. A customer-managed CMK lets the hospital’s security team own the key policy, grant access explicitly to HealthImaging and the viewer role, see every Decrypt call in CloudTrail, and revoke access by disabling the key — which is the kind of control a HIPAA risk assessment expects to see documented. Encrypt the HealthImaging data store, both S3 buckets, and DynamoDB under it.
Why the audit trail is Athena over CloudTrail, not a SIEM line item. HIPAA’s audit-control requirement is concrete: you must be able to answer “who accessed this patient’s images, when, and from where.” CloudTrail records the API calls (HealthImaging reads, KMS decrypts, S3 gets), S3 server-access logs record object-level reads, and both land in a log bucket that Athena queries directly with SQL. S3 object reads are CloudTrail data events, which a trail does not record by default, so enable them explicitly on the imaging buckets. There is no cluster to run and no data to move: you point a partitioned external table at the logs and answer an auditor’s question in a query.
Implementation guidance
Provision with Terraform, and treat encryption and retention as the first deliverables, not a hardening pass. Getting the key policy, bucket protections, and lifecycle rules right at creation time is far cheaper than retrofitting them onto a live PHI archive.
A minimal Terraform shape for the landing bucket communicates the intent — versioned, Object-Locked for tamper-evidence, CMK-encrypted, public access fully blocked:
```hcl
resource "aws_s3_bucket" "dicom_landing" {
  bucket              = "rhs-dicom-landing-prod"
  object_lock_enabled = true # WORM: retention can't be deleted early
}

# Object Lock requires versioning; declare it so the intent is visible in code.
resource "aws_s3_bucket_versioning" "dicom" {
  bucket = aws_s3_bucket.dicom_landing.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "dicom" {
  bucket = aws_s3_bucket.dicom_landing.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.phi.arn # customer-managed CMK
    }
    bucket_key_enabled = true # cuts KMS request cost at scale
  }
}

resource "aws_s3_bucket_public_access_block" "dicom" {
  bucket                  = aws_s3_bucket.dicom_landing.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```
Lifecycle tiering is where the cost story lives, so model it against the real access curve. Imaging is read hot for the first weeks, then goes cold but must remain recoverable for years. A lifecycle policy that transitions raw DICOM through progressively cheaper tiers, and only into Deep Archive once a study is firmly in the legal tail, turns a steeply rising bill into a much flatter one (the archive still grows with every study, but each additional terabyte settles at Deep Archive rates):
```hcl
resource "aws_s3_bucket_lifecycle_configuration" "dicom" {
  bucket = aws_s3_bucket.dicom_landing.id
  rule {
    id     = "tier-cold-studies"
    status = "Enabled"
    filter {} # empty filter: the rule applies to every object in the bucket
    transition {
      days          = 30 # past the active read window
      storage_class = "STANDARD_IA"
    }
    transition {
      days          = 90 # cold but occasionally recalled
      storage_class = "GLACIER"
    }
    transition {
      days          = 365 # decade-scale legal tail
      storage_class = "DEEP_ARCHIVE"
    }
    # No expiration: deletion is a governed compliance event, never an S3 rule.
  }
}
```
Note what is deliberately absent: there is no expiration. Retention is legally mandated, so studies are never aged out by a bucket rule — purging a study at end-of-retention is a governed, ticketed, dual-approved action through ServiceNow, not an automatic S3 deletion. For the genuinely unpredictable middle tier, S3 Intelligent-Tiering is the pragmatic alternative to hand-tuned transitions: it moves objects between access tiers automatically based on observed access, which suits a mixed corpus where you cannot cleanly predict which old studies a tumor-board follow-up will recall.
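The retention rule that gates any purge request (adult studies kept at least seven years, pediatric studies kept until the patient turns 21 plus the statute of limitations) can be sketched as a date calculation. The `limitation_years` default and the `minor_cutoff_age` used to decide whether a study counts as pediatric are placeholders assumed for illustration; the real values come from counsel and state law.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, month=2, day=28)


def earliest_purge_date(
    study_date: date,
    birth_date: date,
    adult_years: int = 7,        # from the article: at least seven years
    pediatric_age: int = 21,     # from the article: until the patient turns 21
    limitation_years: int = 3,   # assumption: statute of limitations placeholder
    minor_cutoff_age: int = 18,  # assumption: who counts as pediatric
) -> date:
    """Earliest date a deletion request may even be raised for a study."""
    adult_end = add_years(study_date, adult_years)
    was_minor = study_date < add_years(birth_date, minor_cutoff_age)
    if not was_minor:
        return adult_end
    pediatric_end = add_years(add_years(birth_date, pediatric_age), limitation_years)
    # Never release a pediatric study earlier than the adult rule would.
    return max(adult_end, pediatric_end)
```

A purge workflow would compare today's date against this result before opening the approval ticket, so an early deletion request is rejected before any human has to catch it.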
Identity: federate the clinicians, kill the long-lived keys. Radiologists authenticate through Okta with the hospital’s existing MFA and conditional-access policies; Okta federates to AWS IAM Identity Center over SAML/OIDC, which maps each clinician to a scoped permission set — a reading radiologist gets DICOMweb read access through the backend, an imaging administrator gets catalog and migration tooling, and neither gets direct S3 or KMS access. The viewer backend on Fargate uses an IAM task role, not stored credentials, granted exactly the permissions it needs: medical-imaging:GetImageFrame and the DICOMweb read actions on the data store, kms:Decrypt via a CMK grant, and read on the catalog table. The few residual secrets that are not IAM roles — edge-appliance enrollment credentials, third-party integration tokens — live in HashiCorp Vault, leased dynamically with the AWS auth method, so they are short-lived and never baked into an AMI or a task definition.
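A least-privilege policy document for the viewer task role might look like the sketch below. It is an assumption-laden illustration: the function name, the parameter names, and the exact action list are chosen for the example, and `kms:Decrypt` is shown as a policy statement for compactness where the text above describes a CMK grant. Check the action names and ARN formats against the current IAM service authorization reference before using anything like it.

```python
import json


def viewer_task_policy(
    region: str,
    account_id: str,
    datastore_id: str,
    kms_key_arn: str,
    catalog_table: str,
) -> dict:
    """Build a read-only policy document for the viewer backend (illustrative)."""
    datastore_arn = f"arn:aws:medical-imaging:{region}:{account_id}:datastore/{datastore_id}"
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{catalog_table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadImageSets",
                "Effect": "Allow",
                "Action": [
                    "medical-imaging:GetImageSet",
                    "medical-imaging:GetImageSetMetadata",
                    "medical-imaging:GetImageFrame",
                ],
                "Resource": f"{datastore_arn}/imageset/*",
            },
            {
                "Sid": "DecryptPhi",
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": kms_key_arn,
            },
            {
                "Sid": "ReadCatalog",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query"],
                "Resource": [table_arn, f"{table_arn}/index/*"],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    doc = viewer_task_policy(
        "us-east-1", "111122223333", "exampledatastoreid",
        "arn:aws:kms:us-east-1:111122223333:key/example-key-id", "study-catalog",
    )
    print(json.dumps(doc, indent=2))
```

The shape makes the point of the paragraph checkable: there is no `s3:` action, no write action, and no wildcard action anywhere in the document.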
The migration itself is a project inside the project. Backfilling years of legacy studies is a bulk ingest that runs alongside live modality traffic. Stage exports from the old VNA to the landing bucket, drive HealthImaging import jobs in controlled batches (so you do not saturate Direct Connect during clinical hours), and reconcile counts against the source PACS before decommissioning anything. Each hospital’s cutover passes through a ServiceNow change gate with a verified import reconciliation, so the privacy officer has a documented chain of custody for every study that moved.
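The reconciliation that gates each cutover can be sketched as a comparison of two manifests. This is a minimal illustration: the manifest shape (SOP Instance UID mapped to a SHA-256 hex digest) and the function names are assumptions, and the checksums are assumed to be taken over the raw DICOM files in the landing bucket, since the serving layer stores pixel data in its own internal representation.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Checksum helper for building a manifest entry."""
    return hashlib.sha256(data).hexdigest()


def reconcile(source: dict[str, str], imported: dict[str, str]) -> dict:
    """Compare source-PACS and landing-bucket manifests (uid -> checksum)."""
    source_uids, imported_uids = set(source), set(imported)
    missing = sorted(source_uids - imported_uids)      # never arrived
    unexpected = sorted(imported_uids - source_uids)   # not in the source export
    mismatched = sorted(
        uid for uid in source_uids & imported_uids if source[uid] != imported[uid]
    )
    return {
        "source_count": len(source),
        "imported_count": len(imported),
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        # The change gate passes only when every source object arrived intact.
        "safe_to_decommission": not missing and not mismatched,
    }
```

Matching counts alone are not enough: one missing object and one unexpected object produce equal totals, which is why the sketch compares identifiers and checksums rather than sizes of the two sets.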
Enterprise considerations
Security & HIPAA. The architecture is defensible by construction: no public PHI surface, CMK encryption on every store, identity-gated access only, and least-privilege task roles. Layer on top: (a) all ingest traffic on Direct Connect, never the public internet, with the Akamai WAF in front of the viewer’s ALB to absorb credential-stuffing and bot traffic at the edge; (b) S3 Object Lock in compliance mode plus versioning so a ransomware actor or a careless admin cannot overwrite or delete studies within their retention window, which is the WORM (write-once-read-many) guarantee the board asked for after the peer-system incident; (c) Wiz running continuous CSPM and PHI-exposure scanning across S3 and HealthImaging, alerting the moment any bucket drifts toward public access or a key policy widens, as the posture backstop behind the IAM controls; (d) CrowdStrike Falcon sensors on the Fargate viewer tier and the on-prem ingest appliances for runtime threat detection feeding the hospital SOC; (e) any guardrail breach (a public-exposure alert, a failed-decrypt spike, an attempted bulk export) auto-raises a ServiceNow incident so security has a ticket, not just a log line. A HIPAA Business Associate Addendum (BAA) with AWS underpins the whole arrangement, and every PHI-handling service in the design is BAA-eligible.
Cost optimization. Storage dominates and grows with every study, so engineer the curve from day one.
| Lever | Mechanism | Typical effect |
|---|---|---|
| Lifecycle to Glacier | Age cold studies Standard → IA → Glacier → Deep Archive | Deep Archive is a fraction of Standard’s per-GB cost |
| Intelligent-Tiering | Auto-move unpredictable middle-tier objects on observed access | No retrieval fees on a mixed corpus, in exchange for a small per-object monitoring charge |
| S3 Bucket Keys | Reduce per-object KMS API calls under SSE-KMS | Cuts KMS request charges at object scale |
| Retrieval discipline | Default to standard Glacier retrieval; expedite only on clinical need | Avoids paying expedited rates by default |
| Fargate right-sizing | Scale the viewer tier on real concurrency, not peak | Pay for daytime read load, not 24/7 peak |
Track storage-class distribution and retrieval spend in Datadog so the imaging service has the cost dashboard the CFO actually reviews — the same CFO whose SAN-refresh quote started this.
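The storage cost curve can be modelled in a few lines. The sketch below is illustrative only: the per-GB monthly prices are placeholder assumptions (check current AWS pricing for your region), ingest is assumed to be uniform across the year, and request, transition, retrieval, and minimum-storage-duration charges are ignored. The tier boundaries mirror the 30/90/365-day lifecycle rule.

```python
GB_PER_TB = 1000

# Assumption: illustrative per-GB-month prices, not a quote.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def storage_class_for_age(age_days: int) -> str:
    """Tier an object by age, following the 30/90/365-day lifecycle rule."""
    if age_days < 30:
        return "STANDARD"
    if age_days < 90:
        return "STANDARD_IA"
    if age_days < 365:
        return "GLACIER"
    return "DEEP_ARCHIVE"


def monthly_bill(years: int, tb_per_year: float = 60.0, tiered: bool = True) -> float:
    """Monthly storage charge once `years` of studies have accumulated."""
    gb_per_month = tb_per_year * GB_PER_TB / 12
    total = 0.0
    for months_old in range(years * 12):  # one cohort per month of ingest
        if tiered:
            cls = storage_class_for_age(months_old * 30)
        else:
            cls = "STANDARD"
        total += gb_per_month * PRICE_PER_GB_MONTH[cls]
    return round(total, 2)
```

With these placeholder prices, seven years of studies at 60 TB a year come to roughly $9,660 a month if everything stays in Standard and roughly $770 a month when tiered, counting storage charges only.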
Scalability. Each tier scales independently. S3 and HealthImaging are effectively unbounded for storage and absorb ingest spikes natively, so the migration backfill and daytime modality output coexist without capacity planning. The viewer tier on Fargate auto-scales on request concurrency; DynamoDB on-demand absorbs worklist bursts. The natural ceilings are not storage but ingest throughput (bounded by Direct Connect bandwidth and HealthImaging import concurrency — pace the migration batches) and Glacier retrieval rate during a mass recall (e.g. a research cohort pull), which is a planning conversation, not a surprise.
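The ingest ceiling is easy to estimate before the migration is scheduled. The sketch below is back-of-envelope arithmetic under assumptions: the link speed, the share of the circuit given to the backfill, and the daily transfer window are placeholders, and protocol overhead and import-job concurrency limits are ignored.

```python
def backfill_days(
    archive_tb: float,
    link_gbps: float,
    utilization: float = 0.5,     # assumption: share of the circuit for backfill
    hours_per_day: float = 24.0,  # assumption: daily transfer window
) -> float:
    """Estimate calendar days to move an archive over a dedicated circuit."""
    total_bits = archive_tb * 1e12 * 8           # decimal terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * utilization
    transfer_hours = total_bits / usable_bits_per_second / 3600
    return transfer_hours / hours_per_day
```

For example, 400 TB over an assumed 1 Gbps circuit at 50% utilization takes about 74 days running around the clock, and twice that if transfers are confined to a 12-hour overnight window, which is why batch pacing belongs in the migration plan from the start.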
Failure modes, and what each one looks like. Name them before they page you.
- A cold study requested for an urgent read: it is in Deep Archive, where standard retrieval takes hours (up to about 12) and no expedited option is offered, which is unacceptable for a stat read. Mitigation: keep recent and likely-recalled studies in faster tiers via Intelligent-Tiering, expose an expedited retrieval path for clinically urgent cases while a study is still in the Glacier tier, and surface a clear “retrieving from archive” state so the radiologist is never staring at a blank viewer.
- A KMS key or grant misconfiguration: the data is intact but the viewer cannot `Decrypt`, so studies fail to render and it looks like data loss. Mitigation: assert the CMK key policy and HealthImaging grant in Terraform, alarm on `Decrypt` access-denied spikes in CloudTrail, and never disable a production key without a change ticket.
- An ingest appliance backlog: a hospital’s edge agent loses its link and studies queue locally; if its buffer fills, a modality’s send fails and a study is lost at the source. Mitigation: store-and-forward queues sized for a multi-hour outage, auto-retry, and a Datadog alert on queue depth per site.
- A botched migration reconciliation: studies appear migrated but a subset failed import silently, and the legacy PACS is decommissioned before anyone notices. Mitigation: count-and-checksum reconciliation against the source as a hard ServiceNow gate before any decommission, and never delete the source until the new archive is verified.
Reliability & DR (RTO/RPO). Decide the numbers per tier. S3 is regionally durable (eleven nines) and is the source of truth, so the real recovery guarantee is the object store itself; enable S3 Cross-Region Replication of the landing/source bucket to a paired region so a regional event cannot strand PHI, and the replica plus HealthImaging import is the rebuild path for the serving layer. DynamoDB point-in-time recovery and global tables cover the catalog. The viewer tier is stateless and redeploys from Terraform in either region behind a failed-over ALB, with Akamai health checks driving edge failover. A pragmatic target for this platform: RTO 1 hour for the read service, RPO of minutes for the archive (replication is asynchronous; S3 Replication Time Control can bound it to 15 minutes for nearly all objects), accepting that mass re-import of studies into HealthImaging in a DR region is a background rebuild measured in hours while the source data itself is never at risk.
Observability & audit. Two distinct concerns, both essential. For operations, instrument the read path end to end in Datadog with APM on the DICOMweb proxy — one trace covering authorize → catalog lookup → frame stream (or restore) — and emit the metrics the service is judged on: time-to-first-frame (the latency a radiologist feels), archive-restore latency, ingest throughput and queue depth per hospital, and storage-class distribution and cost. For compliance, the Athena-over-CloudTrail-and-S3-access-logs layer answers the auditor’s questions directly — a saved query that returns every access to a given patient’s studies, by user and source IP, over any window:
```sql
SELECT eventtime, useridentity.arn AS who, sourceipaddress AS from_ip, eventname
FROM cloudtrail_logs
WHERE eventsource = 'medical-imaging.amazonaws.com'
  AND requestparameters LIKE '%datastore/<id>/study/<patient_study_uid>%'
  AND eventtime BETWEEN '2026-01-01' AND '2026-06-10'
ORDER BY eventtime;
```
That saved query is the working evidence behind the HIPAA audit-control requirement, with no infrastructure to run between question and answer.
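The same filter can be expressed over parsed CloudTrail records, which is handy for unit-testing the audit logic against sample log files before trusting the saved query. This is a sketch under assumptions: the function name and the substring match on `requestParameters` are illustrative, and the record fields used (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `userIdentity.arn`) follow the CloudTrail record format.

```python
import json

IMAGING_SOURCE = "medical-imaging.amazonaws.com"


def accesses_for(records: list[dict], needle: str, start: str, end: str) -> list[tuple]:
    """Return (time, who, from_ip, event) for imaging calls that mention `needle`.

    `start` and `end` are ISO-8601 strings compared lexicographically, the same
    way the SQL BETWEEN compares the string-typed eventtime column.
    """
    hits = []
    for rec in records:
        if rec.get("eventSource") != IMAGING_SOURCE:
            continue
        if not (start <= rec.get("eventTime", "") <= end):
            continue
        # Serialize the parameters so any nested identifier can be matched.
        if needle not in json.dumps(rec.get("requestParameters") or {}):
            continue
        hits.append(
            (
                rec["eventTime"],
                (rec.get("userIdentity") or {}).get("arn"),
                rec.get("sourceIPAddress"),
                rec.get("eventName"),
            )
        )
    return sorted(hits, key=lambda hit: hit[0])
```

Because the comparison is on strings, an `end` of `2026-06-10` excludes events later that same day; pass `2026-06-10T23:59:59Z` (or the next day's date) when the final day should be included.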
Governance. Keep all infrastructure in version-controlled Terraform, reviewed and revertable, applied via GitHub Actions authenticating to AWS over OIDC so there is no stored access key to leak. Pin lifecycle policy and retention classes as code so a transition rule cannot drift silently. Apply org-level Service Control Policies to deny disabling encryption or making an imaging bucket public, with Wiz as the independent check that the controls are actually holding. Every deletion at end-of-retention is a dual-approved ServiceNow workflow with the action logged — because in an imaging archive, deletion is the most dangerous operation in the system, and it should be the hardest to perform.
Explicit tradeoffs
Accept these or do not build it. Lifecycle tiering trades retrieval latency for storage cost, and that trade is real and visible: a study in Deep Archive is dramatically cheaper to hold but takes hours to retrieve by default, which means you must design the access experience (faster tiers for likely-recalled studies, an expedited path for urgent reads, an honest “retrieving” state) rather than pretend all data is instant. The managed-service path — HealthImaging doing the DICOM heavy lifting — trades some control and a service dependency for not running a DICOM stack yourself; you mitigate lock-in by keeping the original DICOM in S3 as a portable source of truth, but you are still relying on AWS for the serving layer. The CMK encryption and Object-Lock posture that satisfy HIPAA cost you operational care: a fat-fingered key-policy change becomes a visible outage rather than a quiet degradation, which is by design but demands discipline. And the Okta-to-Identity-Center federation adds a hop the simpler single-IdP shops will not need.
The alternatives, and when they win. If you are a small single-site practice with modest volume and no work-from-home requirement, a managed third-party cloud PACS (SaaS) may be the right answer — you trade control and per-study cost for someone else owning the whole stack. If your retention need is short and your access stays hot, S3 Standard or Intelligent-Tiering without the Glacier tail is simpler and you skip retrieval-latency design entirely. If you have deep imaging-platform engineering and very specific workflow needs, self-hosting an open-source DICOM stack (Orthanc, dcm4chee) on AWS gives maximum control at the cost of owning the protocol — viable for a research institution, rarely worth it for a clinical health system that just needs the SAN gone. This architecture is the right destination when you have real scale, multi-site reads, decade-scale mandated retention, and a HIPAA audit you have to pass — which is precisely the regional health system that started this.
The shape of the win
For the health system, the payoff is not “we moved to the cloud.” It is that the SAN and its refresh quote are gone; a radiologist at home opens a browser, authenticates with the same hospital SSO, and a study renders progressively in seconds with nothing installed; a seven-year-old pediatric study is still there, still encrypted, still recoverable, costing a fraction of what it would on hot storage; and when the auditor asks who looked at a given patient’s images last March, the privacy officer answers with a single Athena query instead of a war room. Everything upstream — the lifecycle rules into Glacier, the customer-managed keys, the Object-Lock WORM guarantee, the Direct Connect ingest, the Wiz posture scanning, the CloudTrail-and-Athena audit trail — exists to make a CFO, a CISO, and a privacy officer each say yes. The architecture here is the destination; a small site can start narrower, but a multi-site, decade-retention, HIPAA-audited imaging archive has to land here.