A regional grocery retailer — 600 stores, nine distribution centres, two corporate offices, and a fast-growing e-commerce arm — gets a hard deadline from its CFO and its CISO in the same week. The CFO has seen the MPLS bill: ₹14 crore a year for a hub-and-spoke WAN whose every byte of cloud and SaaS traffic is hauled back to two data-centre internet gateways before it can reach the internet, and the three-year renewal quote is up 20%. The CISO has a different problem: a third-party penetration test found that the store-network VPN, once a contractor’s laptop is on it, can reach the entire flat corporate /8 — point-of-sale controllers, the warehouse management system, the HR database — because the old “connect to the network, then you’re trusted” model has no idea who the user is or what app they actually need. Both problems have the same root cause: a network built in 2012 where the network perimeter is the security boundary and all roads lead to the data centre. This article is the reference architecture for unwinding that — replacing MPLS and the concentrator VPN with a Zscaler SASE fabric, where the security boundary moves to identity and the shortest path to an app is a direct one.
The pressures stack the way they always do in retail. Cost is the loud one: backhauling Microsoft 365, the new cloud-hosted ERP, and store CCTV uploads across MPLS to a central breakout is paying premium private-circuit rates to carry traffic that was always destined for the public internet. Latency is the silent killer of the e-commerce site and the in-store ordering tablets — a trombone route from a store to a data centre to the cloud and back adds 40–80 ms that customers feel. Security is the one that ends careers: a flat, VPN-trusted network means one compromised store device is one hop from the cardholder-data environment, and PCI-DSS auditors have started asking pointed questions about segmentation. And agility means the business wants to open 80 new stores this year, each of which currently waits 90 days for an MPLS circuit to be provisioned. SASE — Secure Access Service Edge — answers all four by inverting the model: every user and branch connects to a globally distributed security cloud, policy follows identity, and traffic egresses to its destination from the nearest point of presence instead of being dragged home first.
Why not the obvious shortcuts
The cheaper-looking fixes each fail predictably, and naming why matters because someone on the steering committee will propose all three.
“Just add internet breakout at each store with a firewall.” Now you own and patch 600 branch firewalls, 600 sets of URL-filtering rules that drift out of sync, and 600 places where a misconfiguration can expose the store’s flat network. You have decentralised the cost but multiplied the security surface and the operational toil, and you still have a VPN for remote staff.
“Keep the VPN, just micro-segment the data centre.” Internal segmentation is worth doing, but it does nothing for the core defect: the VPN still grants network access before it knows who the user is or which app they need, so a stolen credential or a compromised contractor laptop is still on the network looking for a way across. You are hardening the inside of a building whose front door checks no ID.
“Move to SD-WAN and call it done.” SD-WAN is genuinely useful and is part of this design — it makes the branch transport cheap and resilient over commodity broadband. But SD-WAN is a transport technology: it decides which link a packet takes, not whether the user behind the packet is allowed to reach that app. SD-WAN without a security cloud just gives you faster, cheaper paths to an unsegmented internet. The two are complementary, not alternatives.
SASE threads the needle. Zscaler operates the security and access controls as a cloud service that sits inline between every user and every destination: internet, SaaS, or private app. Zscaler Internet Access (ZIA) inspects and filters all internet/SaaS traffic at the nearest point of presence. Zscaler Private Access (ZPA) replaces the VPN with identity-based, app-level access to internal systems. Crucially, it never puts the user on the network: a connector beside the app dials out to the Zscaler cloud, which brokers each session app by app, so there is no network to move laterally across. Identity from Okta decides who you are; policy decides which apps you may reach; the nearest PoP is where it happens.
Architecture overview
The fabric carries three distinct traffic classes that share the same control plane but follow different paths: internet/SaaS traffic (handled by ZIA), private-app traffic (handled by ZPA), and branch transport (handled by SD-WAN and direct breakout). Keeping those three separate in your head is the first step to operating SASE well, because they fail and scale independently.
The defining property of the whole topology is the one the CISO cares about most: the network is no longer the trust boundary — identity is. A user or device is never “on the corporate network.” Every flow is brokered, authenticated against Okta, evaluated against policy, inspected, and forwarded — whether the user is in a store, at home, or in head office. There is no inside.
Internet / SaaS path (ZIA), following the control flow:
- A user — a store manager on a tablet, an analyst on a laptop — is enrolled with Zscaler Client Connector (the device agent). For unmanaged or IoT devices in a store, traffic is steered instead by GRE/IPSec tunnels from the branch SD-WAN edge to the nearest ZIA PoP, so even a label printer’s firmware-update call is inspected.
- The traffic is forwarded to the closest Zscaler ZIA point of presence — not to the data centre. There, Zscaler does full inline inspection: TLS decryption, URL and content filtering, Cloud DLP to stop card numbers or customer PII leaving over webmail or shadow-SaaS, sandboxing of downloaded files, and CASB controls on sanctioned SaaS.
- Identity and posture gate the policy. ZIA consumes the user identity and group claims federated from Okta (SAML/SCIM), so policy is written against people and roles, not IP ranges — “store staff may not reach personal cloud storage,” “finance may reach the ERP SaaS.” Device posture (is CrowdStrike running, is the disk encrypted) can be required before sensitive categories are allowed.
- Clean traffic egresses to the internet or SaaS destination from the PoP, by the shortest path. The data-centre internet gateways — and the MPLS circuits that fed them — are decommissioned. Backhaul is gone.
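To make the DLP step in the list above concrete, here is a minimal sketch of what a card-number (PAN) dictionary conceptually matches: a run of 13 to 19 digits that also passes the Luhn checksum. It illustrates the idea only and is not a description of how Zscaler's DLP engine is implemented; the function names are hypothetical.

```python
import re

# Candidate PANs: 13-19 digits, optionally separated by single spaces or hyphens.
_PAN_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk right to left, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in _PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum matters because a bare digit-run pattern would flag order numbers and barcodes; requiring Luhn validity cuts most of those false positives before a block decision is made.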
Private-app path (ZPA), replacing the VPN:
- The same Client Connector, when a user requests an internal app (the warehouse management system, an internal HR portal, an SSH jump host), sends the request to the ZPA service edge instead of a VPN concentrator.
- ZPA authenticates the user against Okta, evaluates access policy (which app, from what device posture, at what time), and then stitches the session together using App Connectors — lightweight forwarders (virtual appliances) we deploy inside each data centre and each cloud VPC/VNet. The App Connector makes an outbound-only connection to the Zscaler cloud; there is no inbound firewall hole and no published VPN endpoint to attack.
- The user reaches exactly one application, at layer 7, brokered through the cloud. They are never placed on the subnet, cannot scan it, and cannot pivot. A contractor cleared for the WMS sees the WMS and nothing else — the pen-test finding that started this project simply cannot happen.
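The per-app, default-deny model described in the list above can be sketched as a lookup from Okta groups to app segments. This is a simplified illustration, not ZPA's actual policy engine; the `hr.internal.grocer.co` hostname and the `contractors-wms` and `hr` group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppSegment:
    name: str
    fqdn: str
    port: int
    allowed_groups: frozenset


def reachable_apps(user_groups: set, segments: list) -> list:
    """Return the segments a user may reach; everything else is denied by default."""
    return [s.name for s in segments if user_groups & s.allowed_groups]


# Illustrative segments only; group names and the HR hostname are assumptions.
SEGMENTS = [
    AppSegment("warehouse-mgmt", "wms.internal.grocer.co", 443,
               frozenset({"dc-ops", "contractors-wms"})),
    AppSegment("hr-portal", "hr.internal.grocer.co", 443,
               frozenset({"hr"})),
]
```

A contractor in `contractors-wms` resolves only `warehouse-mgmt`; a user with no matching group resolves nothing, which is the property the pen-test finding lacked.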
Branch transport (SD-WAN + direct breakout): each store’s SD-WAN edge (virtual appliance or hardware) runs two commodity broadband links (and optionally an LTE/5G failover) instead of an MPLS circuit. It sends internet/SaaS traffic straight to the nearest ZIA PoP over IPSec/GRE, sends private-app traffic via the Client Connector/ZPA path, and uses application-aware routing to keep the POS transaction flows on the healthiest link. The two “cloud hubs” — landing-zone VNets/VPCs in Azure and AWS where the ERP and e-commerce platform live — host App Connectors so private access into the cloud rides the same fabric.
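Application-aware routing for POS traffic boils down to picking the healthiest available link. The sketch below assumes a simple rule (lowest latency among links under a loss threshold); real SD-WAN edges use vendor-specific scoring, and the 2% loss threshold and link names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkHealth:
    name: str
    up: bool
    latency_ms: float
    loss_pct: float


def pick_pos_link(links: list, max_loss_pct: float = 2.0) -> Optional[LinkHealth]:
    """Pick the lowest-latency link that is up and under the loss threshold.

    Falls back to the least-lossy live link if none meets the threshold,
    and returns None only when every link is down.
    """
    alive = [link for link in links if link.up]
    if not alive:
        return None
    healthy = [link for link in alive if link.loss_pct <= max_loss_pct]
    if healthy:
        return min(healthy, key=lambda link: link.latency_ms)
    return min(alive, key=lambda link: link.loss_pct)
```

Note that the fastest link is not always chosen: a low-latency broadband link with 5% loss loses to a slightly slower clean one, which is the behaviour a card transaction needs.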
Component breakdown
| Component | Service / tool | Role in the fabric | Key configuration choices |
|---|---|---|---|
| Internet/SaaS security | Zscaler Internet Access (ZIA) | Inline TLS inspection, URL/content filter, DLP, CASB, sandbox for all egress | PAC + Client Connector forwarding; full SSL inspection with bypass list; DLP dictionaries for PAN/PII |
| Private app access | Zscaler Private Access (ZPA) | VPN replacement: identity-based, per-app, outbound-only brokering | App segments per application; policy by Okta group + device posture; no network access |
| Device agent | Zscaler Client Connector | Steers managed-device traffic to ZIA/ZPA; enrols posture | Forwarding profile per network; trusted-network detection for office |
| Branch edge | SD-WAN (virtual appliance) | Cheap, resilient transport; app-aware routing; tunnels to ZIA | Dual broadband + LTE failover; IPSec/GRE to nearest PoP; POS QoS class |
| Identity / SSO | Okta | Source of truth for user identity, groups, MFA; feeds ZIA & ZPA | SAML SSO + SCIM provisioning to Zscaler; adaptive MFA on sensitive apps |
| Secrets | HashiCorp Vault | API tokens for automation, App Connector provisioning keys, cert material | Dynamic leases; short-TTL Zscaler API tokens; no static keys in pipelines |
| Endpoint security | CrowdStrike Falcon | Device posture signal to Zscaler + runtime EDR on endpoints/servers | Falcon ZTA score gates ZPA policy; sensor on App Connector VMs |
| CSPM / posture | Wiz + Wiz Code | Posture of the cloud hubs and App Connector infra; IaC scanning pre-merge | Agentless scan of VNet/VPC; Wiz Code blocks risky Terraform in PR |
| Observability | Dynatrace / Datadog | Digital-experience monitoring, path/latency, PoP health, app SLOs | Synthetic store probes; Zscaler ZDX/log feed; alert on breakout latency |
| Edge / web acceleration | Akamai | CDN + WAF in front of the public e-commerce site (not user egress) | WAF rules; origin shield to the cloud hub; bot management at the edge |
| ITSM / approvals | ServiceNow | Store cutover changes, access-request approvals, incident records | Change gate per store wave; auto-ticket on policy breach or PoP outage |
| CI / IaC | GitHub Actions / Jenkins + Terraform / Ansible | Automate App Connector deploy, policy-as-code, store edge config | OIDC to cloud (no stored creds); Terraform for hubs; Ansible for edge fleet |
| Internal training | Moodle | Rollout enablement: store-staff and helpdesk SASE courses | Per-wave course; completion gate before a store’s cutover ticket closes |
A few of these choices deserve the why, because they are the ones teams get wrong.
Why ZPA is not “VPN in the cloud.” A cloud VPN still terminates the user onto a network. ZPA never does. The App Connector dials out to the Zscaler cloud and the broker splices the two halves of an authenticated, per-app session together; the application is effectively dark to everyone who is not explicitly authorised to it, and there is no listening VPN port on the internet to scan, exploit, or DDoS. This is the structural reason a SASE rollout shrinks the attack surface rather than just relocating it.
Why identity and posture must both gate access. Identity alone (a valid Okta token) is necessary but not sufficient — a phished credential is a valid token. So ZPA policy also consumes device posture: the CrowdStrike Falcon Zero Trust Assessment score is passed to Zscaler, and a device with a low score (sensor disabled, OS out of date) is denied the sensitive app segments even with perfect credentials. Identity says who; posture says is this device trustworthy right now; only both together open the door.
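The "both together" rule can be written as a small predicate. This is a sketch under stated assumptions: the posture score is treated as an integer where higher is healthier, and the threshold of 70 is a hypothetical value, not a Zscaler or CrowdStrike default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    token_valid: bool        # the IdP accepted the credential
    user_groups: frozenset   # groups asserted by the IdP
    posture_score: int       # device posture score; higher is healthier


def allow_sensitive_app(req: AccessRequest, allowed_groups: frozenset,
                        min_score: int = 70) -> bool:
    """Grant access only when identity AND posture both pass."""
    identity_ok = req.token_valid and bool(req.user_groups & allowed_groups)
    posture_ok = req.posture_score >= min_score
    return identity_ok and posture_ok
```

A phished credential replayed from an attacker's unmanaged laptop passes the identity check but fails posture, so the sensitive segment stays closed.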
Why TLS inspection is non-negotiable but needs a bypass list. Most malware delivery and data exfiltration now travels inside TLS, so ZIA must decrypt to inspect; that is most of the security value. But some traffic must not be decrypted, for legal or technical reasons: certificate-pinned banking apps, healthcare/payroll portals, and a few APIs. The right pattern is to decrypt by default, maintain a tight, audited bypass list for the genuine exceptions, and review it quarterly so it does not quietly become a hole.
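A decrypt-by-default rule with an audited bypass list might look like the following sketch. The data model is hypothetical (it is not a ZIA export format), and the 90-day review window simply encodes "quarterly".

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class BypassEntry:
    domain_suffix: str   # hosts equal to or under this domain are not decrypted
    reason: str          # why the exception exists, for the audit trail
    last_reviewed: date


def should_decrypt(host: str, bypass: list) -> bool:
    """Decrypt unless the host matches an entry on the bypass list."""
    host = host.lower().rstrip(".")
    for entry in bypass:
        suffix = entry.domain_suffix.lower()
        if host == suffix or host.endswith("." + suffix):
            return False
    return True


def overdue_reviews(bypass: list, today: date,
                    max_age: timedelta = timedelta(days=90)) -> list:
    """Return bypass entries that have not been reviewed within max_age."""
    return [entry for entry in bypass if today - entry.last_reviewed > max_age]
```

Matching on a dot-delimited suffix avoids the classic mistake where a bypass for `bank.example` also exempts `notbank.example`.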
Implementation guidance
Treat this as a fleet migration, not a flip of a switch. The deployment order matters because you are removing the data-centre breakout that every store currently depends on — get the sequence wrong and a store goes dark.
- Stand up identity first. Wire Okta as the IdP for both ZIA and ZPA: SAML for authentication, SCIM for automatic user/group provisioning so a leaver in Okta becomes a leaver in Zscaler without a manual step. Define the group taxonomy (store-staff, DC-ops, finance, contractors) you will write policy against, because policy quality is capped by identity quality.
- Deploy App Connectors into each hub and data centre as redundant pairs (two per location, in different fault domains) so private access has no single point of failure. Provision them with Terraform into the Azure VNet and AWS VPC landing zones; provisioning keys come from HashiCorp Vault with a short TTL, never hard-coded.
- Define app segments in ZPA — one per internal application, with the FQDN/port ranges and the Okta groups allowed — and run them in parallel with the existing VPN. Cut a pilot population over, prove it, then migrate cohorts.
- Roll out ZIA per store wave. Install Client Connector on managed devices; point the SD-WAN edge’s GRE/IPSec tunnels at the nearest ZIA PoP for unmanaged/IoT traffic; verify breakout works before touching MPLS.
- Only then retire the circuit. Once a store’s traffic is fully on the SASE fabric and monitored for a soak period, raise the ServiceNow change to decommission its MPLS tail — and bank the saving.
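The ordering in the steps above can be enforced with a per-store readiness check that runs before the decommission change is raised. This is a sketch: the field names are hypothetical, and the 14-day soak period is an assumed value since the steps do not fix one.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class StoreStatus:
    store_id: str
    client_connector_installed: bool
    zia_tunnels_up: bool
    breakout_verified: bool
    on_fabric_since: Optional[date]  # None until all traffic is on the fabric


def mpls_retirement_blockers(status: StoreStatus, today: date,
                             soak: timedelta = timedelta(days=14)) -> list:
    """List the reasons a store's MPLS tail must not be retired yet."""
    blockers = []
    if not status.client_connector_installed:
        blockers.append("Client Connector not installed on managed devices")
    if not status.zia_tunnels_up:
        blockers.append("SD-WAN tunnels to ZIA are not up")
    if not status.breakout_verified:
        blockers.append("direct breakout not verified")
    if status.on_fabric_since is None:
        blockers.append("traffic not fully on the SASE fabric")
    elif today - status.on_fabric_since < soak:
        blockers.append("soak period not complete")
    return blockers
```

An empty list is the only state in which the ServiceNow decommission change should be raised; anything else names exactly what is still missing.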
A minimal Terraform shape for an App Connector group in the Azure hub communicates the intent — outbound-only, redundant, provisioned from Vault:
```hcl
# App Connectors run as VMs in the cloud hub; they dial OUT to Zscaler.
# No inbound rules are opened; that is the whole point.
# Not referenced below; presumably consumed by the connector VM bootstrap,
# which this snippet omits.
data "vault_generic_secret" "zpa" {
  path = "secret/zscaler/zpa-provisioning" # short-TTL key, not a static cred
}

resource "zpa_app_connector_group" "az_hub" {
  name                     = "azhub-prod-cin"
  enabled                  = true
  city_country             = "Chennai, IN"
  latitude                 = "13.0827"
  longitude                = "80.2707"
  upgrade_day              = "SUNDAY" # maintenance window for the fleet
  override_version_profile = true
  version_profile_name     = "Default"
}

# zpa_segment_group.dc_apps and zpa_server_group.wms are defined elsewhere.
resource "zpa_application_segment" "wms" {
  name             = "warehouse-mgmt"
  enabled          = true
  domain_names     = ["wms.internal.grocer.co"]
  segment_group_id = zpa_segment_group.dc_apps.id
  tcp_port_ranges  = ["443", "443"] # from/to pair: port 443 only

  server_groups {
    id = [zpa_server_group.wms.id]
  }
  # Access is granted by a separate policy bound to an Okta group + posture.
}
```
The pipeline that applies this runs in GitHub Actions (some legacy edge-config jobs still run on Jenkins), authenticating to the cloud via OIDC federation so there is no stored service-principal secret to leak — a hard lesson the platform team intends never to repeat. The 600-store SD-WAN edge fleet is configured and kept in drift-free state with Ansible, so a new store’s edge is a playbook run, not a hand-built box.
Policy as code, reviewed like code. Zscaler access and DLP policy is exported and version-controlled, and changes flow through pull requests where Wiz Code scans the accompanying Terraform for misconfiguration before merge — an overly broad app segment or a public-exposure drift in the hub is caught in review, not in production. A policy change is a PR with an approver, not a click in a console nobody audits.
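A pull-request check for "overly broad app segment" could be as simple as the linter sketched below. It assumes a hypothetical exported-policy shape (a dict with `domain_names`, `tcp_port_ranges` as integer pairs, and `allowed_groups`); this is not a Zscaler or Wiz Code format, and the 100-port limit is an arbitrary example threshold.

```python
def lint_app_segment(segment: dict, max_port_span: int = 100) -> list:
    """Return review findings for one exported app-segment definition."""
    findings = []
    for domain in segment.get("domain_names", []):
        if domain.startswith("*."):
            findings.append(f"wildcard domain: {domain}")
    for start, end in segment.get("tcp_port_ranges", []):
        if end - start + 1 > max_port_span:
            findings.append(f"wide port range: {start}-{end}")
    if not segment.get("allowed_groups"):
        findings.append("no Okta group bound to segment")
    return findings
```

Run in CI, a non-empty findings list fails the check, so the reviewer sees the problem on the PR rather than discovering it in the console later.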
Enterprise considerations
Security & Zero Trust. The architecture is Zero Trust by construction: no user is ever on the network, access is per-app and identity-plus-posture gated, and every byte of egress is inspected. Layer on top: (a) ZIA Cloud DLP with dictionaries for PAN/PII so cardholder data and customer records cannot leave over webmail or an unsanctioned SaaS, directly serving the PCI-DSS segmentation and data-loss requirements; (b) CrowdStrike Falcon EDR on endpoints and on the App Connector VMs, with its Zero Trust score feeding ZPA policy so a sick device is quarantined from sensitive apps; (c) Wiz running continuous CSPM across the Azure and AWS hubs, alerting the moment an App Connector subnet or a hub resource drifts toward public exposure, as the posture backstop behind the inline controls; (d) a policy or DLP breach (a blocked exfiltration, a denied-then-retried sensitive access) auto-raises a ServiceNow incident so the SOC has a ticket, not just a log line. The cardholder-data environment in particular is its own ZPA app segment with a tiny, explicitly enumerated access group: segmentation that an auditor can read in one screen.
Failure modes and resilience. Move your single points of failure from circuits to identity and brokers, then make those redundant. If a ZIA PoP degrades, Client Connector and the SD-WAN tunnels fail over to the next-nearest PoP automatically — Zscaler’s cloud is multi-PoP by design, which is more resilient than the two data-centre gateways it replaces. If Okta is unreachable, no one can authenticate to new sessions, so Okta is now tier-0 infrastructure: enforce its own HA, and configure a sensible session lifetime so existing sessions survive a brief IdP blip. If a store loses both broadband links, the LTE/5G failover keeps POS and ordering alive at reduced bandwidth — application-aware routing prioritises the transaction traffic. If an App Connector pair fails, private access to that location’s apps stops, which is why connectors are always deployed in redundant pairs across fault domains. The trade you have made is explicit: you have given up the (perceived) determinism of a private MPLS circuit for the resilience of a globally distributed cloud — and for a retailer whose stores already depend on the public internet for card settlement, that is the right trade.
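The session-lifetime point is easy to reason about with a small model: an existing session inside its lifetime survives an IdP outage, while anything that needs a fresh authentication does not. The sketch below is a simplification with hypothetical names, not Okta's or Zscaler's session logic.

```python
from datetime import datetime, timedelta
from typing import Optional


def session_active(authenticated_at: datetime, now: datetime,
                   lifetime: timedelta) -> bool:
    """True while the session is inside its configured lifetime."""
    return timedelta(0) <= now - authenticated_at < lifetime


def access_possible(idp_reachable: bool, authenticated_at: Optional[datetime],
                    now: datetime, lifetime: timedelta) -> bool:
    """An active session works regardless of IdP state; otherwise the IdP must be up."""
    if authenticated_at is not None and session_active(authenticated_at, now, lifetime):
        return True
    return idp_reachable
```

The lifetime is a trade: a longer value rides out longer IdP incidents, but it also lengthens the window in which a revoked user keeps an already-open session.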
Scaling and performance. SASE scales the way the business does. Opening 80 new stores no longer means 80 ninety-day circuit orders — it means shipping an SD-WAN edge, running an Ansible playbook, enrolling devices in Client Connector, and the store is on the fabric in days. The Zscaler cloud absorbs the inspection load elastically; you scale App Connectors (add a VM to the group) only for private-app throughput at the hubs. Watch the right metric: digital experience, not link utilisation. Dynatrace (or Datadog) runs synthetic probes from representative stores and ingests Zscaler’s experience telemetry, so the team sees per-hop latency from store to PoP to app and is alerted when a breakout path or a PoP regresses — before a store manager calls the helpdesk.
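Alerting on experience rather than utilisation can be sketched as a per-hop comparison of current probe latency against a baseline. The hop names, the 25% tolerance and the 10 ms floor are all hypothetical; neither Dynatrace nor Datadog is being modelled here.

```python
from statistics import median


def regressed_hops(baseline_ms: dict, current_ms: dict,
                   tolerance: float = 1.25, min_delta_ms: float = 10.0) -> list:
    """Return hops whose median latency rose by both the ratio and the absolute floor."""
    flagged = []
    for hop, samples in current_ms.items():
        base = baseline_ms.get(hop)
        if not base or not samples:
            continue  # no data to compare for this hop
        base_median = median(base)
        current_median = median(samples)
        if (current_median > base_median * tolerance
                and current_median - base_median >= min_delta_ms):
            flagged.append(hop)
    return flagged
```

Requiring both a relative and an absolute increase keeps a 4 ms hop that drifts to 6 ms from paging anyone, while still catching a PoP-to-app hop that doubles.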
Cost. The headline saving is real and is what funds the project: retiring the MPLS WAN removes the ₹14 crore-a-year private-circuit spend and replaces it with commodity broadband (a fraction of the cost) plus per-user Zscaler subscriptions. But model the full picture honestly — the comparison below is the one the CFO actually signs.
| Cost line | Legacy MPLS + VPN model | Zscaler SASE model |
|---|---|---|
| Branch transport | Premium MPLS circuits per store (capacity-priced) | Commodity broadband + LTE failover (a fraction of MPLS) |
| Internet egress | Backhauled to 2 DC gateways (extra circuit + appliance cost) | Direct breakout at the PoP (no backhaul) |
| Remote access | VPN concentrators, licences, HA pair refresh | Included in ZPA subscription (no concentrator fleet) |
| Security stack | DC firewalls, web proxies, sandbox appliances to refresh | Inspection delivered as cloud service (no appliance refresh) |
| Subscription | — | Per-user ZIA + ZPA (predictable, scales with headcount) |
| Operations | Per-site appliance patching; 90-day circuit lead times | Fleet automation (Ansible); days to onboard a store |
The honest caveats: per-user SASE subscriptions are an operating cost that grows with headcount, so a sudden hiring surge raises the bill in a way MPLS did not; and full TLS inspection has a real (small) latency and compute cost that you are paying at the PoP. For this retailer the maths is decisively positive — the eliminated circuit and appliance-refresh spend dwarfs the subscription — but a business with very few sites and very heavy private-circuit needs (a trading firm wanting deterministic low-latency exchange links) might keep MPLS for that one workload and use SASE for everything else.
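The comparison reduces to a few lines of arithmetic. The only figures taken from this article are the ₹14 crore annual MPLS spend and the 20% renewal uplift (read here as a 20% higher annual rate, which is an assumption); broadband and subscription costs are inputs the reader must supply, and the function names are hypothetical.

```python
def renewed_annual_cost(current_crore: float, uplift: float) -> float:
    """Annual legacy cost after the renewal uplift (0.20 means +20%)."""
    return current_crore * (1.0 + uplift)


def annual_saving(legacy_crore: float, broadband_crore: float,
                  subscription_crore: float) -> float:
    """Legacy spend minus the SASE-model spend; negative means SASE costs more."""
    return legacy_crore - (broadband_crore + subscription_crore)


def breakeven_subscription(legacy_crore: float, broadband_crore: float) -> float:
    """Largest annual subscription bill at which SASE still breaks even."""
    return legacy_crore - broadband_crore
```

With the article's figures, `renewed_annual_cost(14.0, 0.20)` puts the renewed MPLS bill at ₹16.8 crore a year, which is the number the broadband and subscription lines have to beat. Tracking the break-even subscription alongside headcount shows how much hiring the business case can absorb.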
The explicit tradeoffs. SASE buys you Zero-Trust security, dramatically lower WAN cost, lower latency to cloud and SaaS, and the agility to open a store in days. It costs you a hard dependency on the internet and on two cloud control planes (Zscaler and Okta) that you do not own. You trade circuit determinism for cloud resilience, and capital appliances for operating subscriptions. For a 600-store grocer whose every transaction already rides the public internet, who is bleeding money on backhaul, and who has a pen-test report saying the flat VPN is a liability, that trade is not close. The data centre is no longer the centre of the network; the user is, and the security follows them wherever they go.
Enablement and rollout
A fabric this different fails on people, not packets, if the rollout is silent. Store staff who do not understand why their tablet now routes differently will flood the helpdesk; security analysts who do not know how to read a ZPA policy will mis-grant access. Enablement is wired into the cutover itself: a short Moodle course is published per rollout wave — one track for store-and-helpdesk staff explaining the new access model and how to report a problem, one for the network and SOC teams on writing policy-as-code and reading Zscaler logs — and the ServiceNow change to close a store’s cutover ticket is gated on that store’s staff completing the course. Training is not an afterthought tacked on at the end; it is a dependency in the project plan, because the day MPLS goes away is the day the network’s correctness becomes everyone’s job.