GCP Enterprise Architecture: Retail Recommendation Engine

A recommendation engine is the single highest-leverage piece of machine learning most retailers ever ship. Done well, it lifts revenue per session by double digits without touching acquisition spend. Done badly, it recommends the umbrella the customer already bought, tanks page-load latency, and quietly trains itself on a feedback loop of its own bad guesses. This article is a complete, reusable GCP reference architecture for getting it right — from the first clickstream event to the ranked carousel that renders on the product page, and the Looker dashboards the merchandising team actually trusts.

The business scenario

Picture a mid-market omnichannel retailer — call the segment “₹500 crore to ₹5,000 crore annual GMV.” They sell across a web storefront, native mobile apps, and a few hundred physical stores with a loyalty programme that ties the channels together. They already have a data warehouse, a tag manager firing clickstream events, and a product catalogue that changes daily as SKUs go in and out of stock. What they do not have is a recommendation system that earns its keep.

The symptoms are familiar across the whole size range, from a single-brand DTC shop to a large multi-banner group:

The business goal is concrete and modest enough to be credible: increase revenue per visit by serving relevant, fresh, in-stock recommendations on the homepage, product detail page (PDP), cart, and post-purchase email — and prove the lift with a clean A/B framework. The technical goal is a system that ingests behavioural events in near-real-time, keeps a unified view of customer and catalogue, serves a ranked list in well under 100 ms at the edge, and gives merchandisers a Looker cockpit to monitor and steer it.

GCP is a natural fit here because the hard parts — a managed recommender that handles cold-start, a warehouse that doubles as a feature store, and a streaming bus — are first-party services that integrate without glue code you have to maintain.

Architecture overview

The system is best understood as four planes that share data but scale and fail independently: an ingestion plane, an analytics/feature plane, a serving plane, and an insight plane. Picture the diagram as a left-to-right flow with a vertical “data backbone” running through the middle.

GCP retail recommendation engine reference architecture: clickstream sources feed Pub/Sub and Dataflow into a central BigQuery backbone, which fans out to Vertex AI Search for commerce and a custom Vertex AI Endpoint behind a Cloud Run recommendations API with a Memorystore cache, while Looker reads BigQuery for insight — request path numbered one to eight.

Ingestion (left edge). Every customer touchpoint (web SDK, mobile SDK, server-side checkout service, in-store POS, and the loyalty CRM) emits structured events. View, add-to-cart, purchase, and search events flow into Pub/Sub topics. Pub/Sub is the shock absorber: it decouples bursty client traffic from downstream processing and fans the same event stream out to multiple consumers. A Dataflow streaming pipeline subscribes, validates and enriches events (resolving anonymous IDs to loyalty IDs where possible), and writes them to two places at once: into BigQuery for analytics, and into Vertex AI Search for commerce (the service formerly branded Recommendations AI) as real-time user events so the model sees behaviour within seconds.
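The enrich-and-deduplicate step can be illustrated in plain Python, independent of Apache Beam. This is a minimal sketch under stated assumptions: the `event_id` field, the `id_map` lookup table, and the 600-second window are hypothetical choices made for illustration, and a real Dataflow job would express the same logic with Beam transforms and managed state.

```python
from typing import Dict, Iterable, Iterator, Optional

DEDUP_WINDOW_SECONDS = 600  # assumed window size, for illustration only


def stitch_identity(event: dict, id_map: Dict[str, str]) -> dict:
    """Attach a loyalty user_id when the anonymous visitor_id is known."""
    enriched = dict(event)  # copy so the input event is never mutated
    if not enriched.get("user_id"):
        loyalty_id: Optional[str] = id_map.get(enriched["visitor_id"])
        if loyalty_id is not None:
            enriched["user_id"] = loyalty_id
    return enriched


def dedupe(events: Iterable[dict],
           window: int = DEDUP_WINDOW_SECONDS) -> Iterator[dict]:
    """Drop events whose event_id was already seen within `window` seconds.

    Assumes events arrive roughly ordered by an epoch-seconds `timestamp`.
    The last-seen map grows without bound here; a real pipeline would
    expire old entries.
    """
    last_seen: Dict[str, float] = {}
    for event in events:
        ts = event["timestamp"]
        previous = last_seen.get(event["event_id"])
        last_seen[event["event_id"]] = ts
        if previous is not None and ts - previous <= window:
            continue  # duplicate inside the window
        yield event
```

Deduplicating before the dual write matters because both sinks consume the same stream: a retried client event that slipped through would otherwise be counted twice in analytics and seen twice by the recommender.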

Analytics and features (centre — the backbone). BigQuery is the gravitational centre of the architecture. It is the system of record for raw events, the place where the product catalogue is curated and joined to inventory, and the engine that computes engineered features (recency/frequency/monetary aggregates, category affinity, session embeddings). A scheduled BigQuery pipeline publishes the cleaned product catalogue to Vertex AI Search for commerce. The same BigQuery features feed a Vertex AI Feature Store so that custom models can be trained and served on consistent online/offline features.

Serving (right side). Two complementary serving paths exist, and choosing between them is the central architectural decision (covered in When to use it). Path A is the managed Vertex AI Search for commerce / Recommendations AI Predict API: you call a serving config (e.g. “recommended for you”, “frequently bought together”, “others you may like”) and it returns a ranked, optionally personalised list, handling cold-start and freshness internally. Path B is a custom two-tower model trained in Vertex AI, deployed to a Vertex AI Endpoint, with a Vertex AI Vector Search index for fast approximate-nearest-neighbour candidate retrieval and a re-ranking model on top. Both paths sit behind an internal recommendations API (Cloud Run) that the storefront’s BFF (backend-for-frontend) calls; results are cached in Memorystore (Redis) keyed by user + surface + context, with a short TTL to keep them fresh.
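The retrieve-then-re-rank shape of Path B can be shown with a toy example. The sketch below is illustrative only: the embeddings are hand-written rather than produced by a trained two-tower model, the exhaustive dot-product scan stands in for the approximate-nearest-neighbour lookup that Vector Search would perform, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Sequence[float],
             item_vecs: Dict[str, Sequence[float]],
             k: int) -> List[str]:
    """Top-k items by dot product; an exhaustive scan standing in for ANN."""
    ranked = sorted(item_vecs,
                    key=lambda item: dot(user_vec, item_vecs[item]),
                    reverse=True)
    return ranked[:k]


def rerank(candidates: List[str],
           score: Callable[[str], float]) -> List[str]:
    """Re-order the retrieved candidates with a second scoring function."""
    return sorted(candidates, key=score, reverse=True)


# Usage: retrieve two candidates, then re-rank them by a made-up margin score.
items = {"lamp": [1.0, 0.0], "rug": [0.9, 0.1], "pan": [0.0, 1.0]}
candidates = retrieve([1.0, 0.0], items, k=2)
margin = {"lamp": 0.1, "rug": 0.4}
final = rerank(candidates, lambda item: margin[item])
```

The split exists because retrieval must be cheap enough to run over the whole catalogue, while the re-ranker can afford richer signals because it only sees a short candidate list.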

Insight (top-right). Looker sits on top of BigQuery via a governed semantic model (LookML). Merchandisers and analysts get dashboards for recommendation coverage, click-through and attach rate by surface, A/B experiment results, and model drift. Looker’s modelling layer is what turns the raw event tables into trustworthy, consistent metrics that the business and the data-science team agree on.

The end-to-end request path for a personalised carousel runs as follows: the browser requests a PDP; the BFF calls the Cloud Run recommendations API with the user/visitor ID, surface, and current product context; the API checks Redis; on a miss it calls the Vertex AI Predict endpoint, filters the candidates against a live in-stock set, writes the result to Redis, and returns it. Meanwhile, the view event the customer just generated is already flowing through Pub/Sub into both BigQuery and the recommender, so the next call reflects what they just did.
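The cache-check, predict, filter, and cache-write steps of that path can be sketched as a cache-aside function. This is a hedged illustration, not the service's actual code: `TTLCache` is a tiny in-memory stand-in for Redis, `predict` is any callable that returns candidate product IDs (in production, a call to the recommender), and the 60-second TTL and 10-item limit are assumed defaults.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self._clock():
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: List[str], ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def recommend(key: str,
              predict: Callable[[], List[str]],
              in_stock: Set[str],
              cache: TTLCache,
              ttl_seconds: float = 60,
              limit: int = 10) -> List[str]:
    """Cache-aside lookup: serve from cache, else predict, filter, and store."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    candidates = predict()
    # Keep the model's ranking order, dropping anything not currently in stock.
    result = [product for product in candidates if product in in_stock][:limit]
    cache.set(key, result, ttl_seconds)
    return result
```

Filtering against the in-stock set before writing to the cache means a cached list is never staler than its TTL with respect to inventory.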

Component breakdown

| Component | GCP Service | Role in the architecture | Key configuration choices |
| --- | --- | --- | --- |
| Event bus | Pub/Sub | Durable, fan-out ingestion of all behavioural events | Separate topics per event type; schema-validated messages; dead-letter topic; ordering keys only where strictly needed |
| Stream processing | Dataflow (Apache Beam) | Validate, enrich, deduplicate, identity-stitch, dual-write | Streaming engine; exactly-once to BigQuery via Storage Write API; windowed dedup; DLQ for malformed events |
| Warehouse / feature engine | BigQuery | System of record, catalogue curation, feature computation, A/B analytics | Partition event tables by date, cluster by user/product; BI Engine reservation for Looker; scheduled queries for feature refresh |
| Managed recommender | Vertex AI Search for commerce (Recommendations AI) | Cold-start-safe, freshness-aware personalised recommendations | Import catalogue + user events; choose model type per surface; set optimisation objective (CTR / revenue / conversion) |
| Custom modelling | Vertex AI (Training, Endpoints, Feature Store) | Two-tower retrieval + re-ranker when business logic exceeds the managed model | Custom training on TFX/Keras; online Feature Store; autoscaling endpoints with min replicas for latency |
| Candidate retrieval | Vertex AI Vector Search | Millisecond ANN lookup over item embeddings for the custom path | ScaNN index; tuned leaf_node_embedding_count; deployed index with autoscaling |
| Serving API | Cloud Run | Stateless recommendations service: cache, fallback, business filters | Min instances to avoid cold starts; concurrency tuned; per-surface fallback rails |
| Online cache | Memorystore for Redis | Sub-millisecond cache of computed recommendation lists | Short TTL (30–120 s); key = visitor+surface+context hash; Standard HA tier |
| Inventory / catalogue source | BigQuery + Cloud Storage | Authoritative product + inventory feed | Hourly inventory delta; full catalogue reconcile daily |
| Insight & governance | Looker | Semantic model, dashboards, experiment readouts, drift monitoring | LookML metrics; row-level access; PDTs for heavy aggregates |
| Secrets & config | Secret Manager | API keys, model/serving-config IDs, connection strings | Versioned secrets; accessed via workload identity, never baked into images |

A few components deserve a closer look.

Pub/Sub as the decoupler. The reason events go to Pub/Sub first, rather than straight to BigQuery, is resilience and fan-out. A flash sale can 10x event volume in seconds; Pub/Sub absorbs that without back-pressuring the storefront, and the same stream feeds Dataflow, the recommender, and any future consumer (fraud, real-time inventory) without re-instrumenting clients. The dead-letter topic ensures a single malformed event schema deploy doesn’t silently drop data.

BigQuery as both warehouse and feature engine. This is the design choice that keeps the architecture lean. Rather than standing up a separate feature platform, the engineered features (RFM aggregates, 30-day category affinity, trending-in-your-region signals) are SQL on partitioned, clustered event tables. For the custom-model path those features are materialised into Vertex AI Feature Store so that the exact same feature definitions are available at training time (offline) and serving time (online), eliminating training/serving skew — the most common cause of “the model looked great in the notebook and flopped in production.”
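To make the RFM aggregates concrete, here is a small sketch of the definition in Python. In the architecture described here these features are computed as SQL in BigQuery, so treat this purely as an illustration of the arithmetic; the record shape (`user_id`, an integer `day`, `amount`) and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable


def rfm(purchases: Iterable[dict], as_of_day: int) -> Dict[str, dict]:
    """Per-user recency, frequency, and monetary aggregates.

    Each purchase is {"user_id": str, "day": int, "amount": float}; `day` is
    an integer day number, a simplification of a real timestamp column.
    """
    last_day: Dict[str, int] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for purchase in purchases:
        user = purchase["user_id"]
        last_day[user] = max(last_day.get(user, purchase["day"]), purchase["day"])
        frequency[user] += 1
        monetary[user] += purchase["amount"]
    return {
        user: {
            "recency_days": as_of_day - last_day[user],  # days since last purchase
            "frequency": frequency[user],                # number of purchases
            "monetary": monetary[user],                  # total spend
        }
        for user in last_day
    }
```

Whatever language the definition lives in, the point is that training and serving both read features produced by this one definition, which is what removes the skew.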

The managed recommender’s surface model. Vertex AI Search for commerce maps to retail surfaces directly: “Recommended for You” (homepage, personalised), “Others You May Like” and “Similar Items” (PDP), “Frequently Bought Together” (cart/PDP), and “Recently Viewed”. Each is a serving config backed by a model with an explicit business objective. Picking revenue per session vs click-through rate as the objective materially changes behaviour — CTR optimisation can over-favour cheap, high-engagement items, so cart and PDP surfaces usually optimise for conversion/revenue while discovery surfaces optimise for engagement.

Memorystore with deliberately short TTLs. Caching recommendations is in tension with freshness. The resolution is a short TTL (tens of seconds) plus cache keys that include behavioural context, so a customer who just added an item gets a fresh computation while a thundering herd on a popular PDP is still absorbed. The cache exists for tail-latency protection and cost control, not to serve stale results for minutes.
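One way to build a key that includes behavioural context is to hash a canonical form of the context. The sketch below is an assumption about how such a key could look, not a prescribed format: the `reco:` prefix, the 16-character digest, and the context fields in the usage example are all hypothetical.

```python
import hashlib
import json


def cache_key(visitor_id: str, surface: str, context: dict) -> str:
    """Build a deterministic key from visitor, surface, and behavioural context."""
    # Sorting keys makes the JSON canonical, so field order cannot change the key.
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"reco:{visitor_id}:{surface}:{digest}"


# Usage: adding an item to the cart changes the context, and therefore the key.
before = cache_key("v1", "pdp", {"product": "sku-1", "cart": []})
after = cache_key("v1", "pdp", {"product": "sku-1", "cart": ["sku-7"]})
```

Because the cart contents are part of the hashed context, the customer who just added an item misses the cache and gets a fresh computation, while anonymous traffic with identical context keeps sharing one entry.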

Implementation guidance

Project and environment layout. Use a multi-project structure governed by Terraform and an organisation hierarchy: a host project for shared VPC and DNS, plus dev / staging / prod service projects. Recommendation workloads live in the service projects; BigQuery datasets for raw, curated, and feature layers are separated so IAM can grant analysts curated access without exposing raw PII. A dedicated Looker connection service account reads only the curated and feature datasets.

Infrastructure as Code. Terraform is the right default on GCP (Deployment Manager is effectively legacy; Config Connector/KCC is an option if you are all-in on GKE/Kubernetes, but Terraform is more common for this mix). Structure it as composable modules:

Keep all model IDs, serving-config IDs, and connection strings in Secret Manager, referenced by Terraform outputs — never hard-code them in the Cloud Run image.

Networking. Run a Shared VPC from the host project. Cloud Run uses a Serverless VPC Access connector to reach Memorystore and any private endpoints. Enable Private Service Connect / Private Google Access so traffic to BigQuery, Pub/Sub, and Vertex AI stays on Google’s backbone rather than the public internet. Front the public-facing storefront with the Global External Application Load Balancer and Cloud Armor (WAF + rate limiting); the recommendations API itself is internal and is only reached by the BFF, not exposed directly to browsers. Put VPC Service Controls around the analytics projects to create a perimeter that prevents data exfiltration from BigQuery/Vertex AI even with valid credentials.

Identity wiring. Every workload uses a dedicated, least-privilege service account; no service account keys — use workload identity (for GKE) or the attached runtime service account (for Cloud Run/Dataflow). Concretely: the Dataflow SA gets pubsub.subscriber + bigquery.dataEditor (on the raw dataset only) + the Retail event-write role; the Cloud Run SA gets the Vertex AI predict/Retail user role, redis access, and secretmanager.secretAccessor; the Looker SA gets bigquery.dataViewer on curated/feature datasets plus bigquery.jobUser. Human access is via Google Groups bound to IAM roles, never individual grants, so onboarding/offboarding is a group membership change.

Event contract. Standardise on a single event schema (validated by Pub/Sub schemas) carrying visitor_id, optional user_id (loyalty), event_type, product_details, session_id, timestamp, and channel. This is the contract both BigQuery analytics and the Retail user-event API consume, so investing in it up front avoids divergent event definitions later.
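The contract can be checked in application code as well as by Pub/Sub schemas. The following is a minimal sketch under assumptions: it treats every listed field except `user_id` as required, takes the four event types from the topics named in this article, and models the dead-letter topic as a plain list. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed required set: every contract field except the optional user_id.
REQUIRED_FIELDS = ("visitor_id", "event_type", "product_details",
                   "session_id", "timestamp", "channel")
EVENT_TYPES = {"view", "add_to_cart", "purchase", "search"}


def validate(event: dict) -> List[str]:
    """Return a list of contract violations; an empty list means valid."""
    errors = [f"missing {name}" for name in REQUIRED_FIELDS
              if event.get(name) in (None, "")]
    event_type = event.get("event_type")
    if event_type not in (None, "") and event_type not in EVENT_TYPES:
        errors.append(f"unknown event_type {event_type!r}")
    return errors


def route(events: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Split events into valid ones and dead-letter records with reasons."""
    valid: List[dict] = []
    dead_letter: List[dict] = []
    for event in events:
        errors = validate(event)
        if errors:
            dead_letter.append({"event": event, "errors": errors})
        else:
            valid.append(event)
    return valid, dead_letter
```

Keeping the rejection reasons alongside the original payload makes the dead-letter stream debuggable: a bad schema deploy shows up as a spike of one specific error rather than as silently missing data.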

CI/CD. Cloud Build (or GitHub Actions) builds the Dataflow and Cloud Run images, runs terraform plan on PRs with manual approval to apply to prod, and runs LookML validation against a Looker dev branch. Model retraining is orchestrated by Vertex AI Pipelines on a schedule, with the managed recommender retraining automatically as fresh events arrive.

Enterprise considerations

Security and Zero Trust. The perimeter is defence-in-depth: Cloud Armor at the edge, an internal-only recommendations API, Shared VPC with private connectivity to all data services, and VPC Service Controls around the analytics estate. Identity is the new perimeter — every call is authenticated with a least-privilege service account and authorised per-dataset; no long-lived keys. PII handling matters because behavioural data is personal: use BigQuery column-level access and dynamic data masking so analysts see hashed user IDs unless they have explicit need, apply Sensitive Data Protection (DLP) to scan for accidental PII in event payloads, and keep raw and curated datasets in separate IAM domains. CMEK (customer-managed keys via Cloud KMS) encrypts BigQuery, Pub/Sub, and Storage where compliance requires key control. Audit everything with Cloud Audit Logs exported to a locked log sink.

Cost optimisation. The largest line items are BigQuery and Vertex AI serving. Tactics that move the needle: partition and cluster event tables so feature queries scan kilobytes not terabytes; consider BigQuery editions with slot reservations + autoscaling once on-demand spend is predictable; use a BI Engine reservation so Looker dashboards hit cache rather than re-scanning; set Cloud Run min instances just high enough to hide cold starts (latency vs idle cost trade-off); keep Memorystore right-sized — a high cache hit rate directly reduces Vertex AI Predict calls, which are billed per request. Lifecycle-tier raw events to colder storage after the feature window. For the custom model, batch predictions where real-time isn’t required and reserve online endpoints for the surfaces that truly need sub-100 ms.
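The link between cache hit rate and recommender calls is simple arithmetic: only cache misses reach the Predict API. A small sketch, with a hypothetical function name and made-up traffic figures:

```python
def predict_calls(requests: int, cache_hit_ratio: float) -> int:
    """Number of requests that miss the cache and reach the recommender."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return round(requests * (1.0 - cache_hit_ratio))


# Usage with illustrative numbers: one million requests at a 75% hit ratio.
billable = predict_calls(1_000_000, 0.75)
```

Under these illustrative numbers, three quarters of the traffic never generates a billable prediction, which is why cache sizing belongs in the cost conversation and not only the latency one.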

Scalability. Each plane scales independently. Pub/Sub and BigQuery are effectively limitless for this workload; Dataflow autoscales workers to event volume; Cloud Run scales to traffic with concurrency tuning; Vertex AI endpoints autoscale on QPS (set a sensible min/max). The Redis cache flattens read amplification on hot products. The architecture comfortably spans a single-brand shop doing thousands of events a day to a large group doing tens of thousands of events a second during a sale, with the only changes being reservation sizes and replica counts.

Reliability and DR (RTO/RPO). Pub/Sub retains and replays messages, so a downstream outage doesn’t lose data (effective RPO near zero for events in flight; default retention up to 7 days). BigQuery is regional with automatic replication and supports cross-region dataset copies and time-travel (7-day) for recovery from bad writes — for stricter DR, configure scheduled cross-region copies of curated/feature datasets. The serving plane is designed to degrade gracefully: if the Vertex AI endpoint is unavailable, the Cloud Run API falls back to a cached or BigQuery-derived “popular in category / trending” rail so the page never renders empty — this is the most important reliability property, because a broken recommender should be invisible to the shopper. Target RTO < 1 hour for full personalised serving (failover region for endpoints) and RTO near zero for a degraded-but-functional experience via fallback rails. Run the serving plane multi-region behind the global load balancer for true high availability.
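The graceful-degradation rule can be captured in a few lines. This is a sketch of the pattern rather than the service's code: `personalised` and `fallback` are hypothetical callables, and treating a too-short list as a failure (via `min_items`) is an added assumption.

```python
from typing import Callable, List


def with_fallback(personalised: Callable[[], List[str]],
                  fallback: Callable[[], List[str]],
                  min_items: int = 1) -> List[str]:
    """Return personalised results, or the fallback rail on error or shortfall."""
    try:
        items = personalised()
    except Exception:
        # Any failure of the personalised path degrades to the fallback rail.
        # A real service would also log the error and emit a metric here.
        return fallback()
    return items if len(items) >= min_items else fallback()
```

Catching broadly is deliberate in this one place: for the shopper, a timeout, a quota error, and a malformed response all mean the same thing, and the page should still render a rail.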

Observability. Use Cloud Monitoring/Logging/Trace for golden signals on the serving path: recommendation API p50/p95/p99 latency, cache hit ratio, Vertex AI Predict error rate, and Dataflow system lag (the freshness SLO — how many seconds between event and model visibility). Business observability lives in Looker: coverage (what fraction of sessions got a recommendation), CTR and attach rate per surface, revenue attributed to recommendations, and model drift indicators (shifts in recommended-category distribution). Alert on data-freshness lag and on a drop in recommendation coverage — both are leading indicators of customer-visible degradation.
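For readers who want the metric definitions spelled out, here is a small sketch. In practice Cloud Monitoring computes latency percentiles for you; the nearest-rank method below is one common definition, chosen here for simplicity, and the function names and sample values are illustrative.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio for hit rate, coverage, or CTR; 0.0 when there is no traffic."""
    return numerator / denominator if denominator else 0.0


# Usage with made-up latencies (ms) and counters.
latencies_ms = [12, 15, 11, 90, 14, 13, 16, 18, 17, 250]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
cache_hit_ratio = ratio(75, 100)   # cache hits / cache lookups
coverage = ratio(940, 1000)        # sessions with a recommendation / sessions
```

The gap between the median and the tail in the made-up sample shows why the serving path is monitored at p95 and p99 rather than on averages.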

Governance. Catalogue the estate in Dataplex / Data Catalog with tags for PII and data domains. Looker’s semantic layer is the governance keystone: defining “attach rate” and “recommendation-attributed revenue” once in LookML means finance, merchandising, and data science argue about strategy, not about whose number is right. Enforce row-level security in Looker so a banner manager sees only their banner. Keep an experiment registry so every A/B test’s design, dates, and result are recorded and recommendation changes are tied to measured lift, not vibes.

Reference enterprise example

Company: Saanjh Living — a fictional ₹1,800-crore-GMV omnichannel home and lifestyle retailer (furniture, decor, kitchenware) with a web store, iOS/Android apps, 140 stores, and a 6-million-member loyalty programme. Catalogue: ~90,000 active SKUs with high churn (seasonal decor, fast-moving kitchenware). Pain: the homepage showed identical top-sellers to everyone, PDP cross-sell was a static “you may also like” curated by hand, and merchandising couldn’t quantify recommendation impact.

What they built. Saanjh adopted the managed path first (Path A) to get to value fast. Web and app SDKs and the checkout service publish to four Pub/Sub topics (view, add_to_cart, purchase, search) carrying ~9,000 events/sec at peak (Diwali) and ~1,200/sec on a normal weekday. A Dataflow streaming job dual-writes to a partitioned/clustered BigQuery events_raw dataset and to Vertex AI Search for commerce as user events. A nightly BigQuery scheduled query reconciles the full catalogue and an hourly query pushes inventory deltas, so the recommender never surfaces out-of-stock SKUs. Surfaces: “Recommended for You” on home (objective: CTR), “Frequently Bought Together” on cart (objective: revenue), “Similar Items” on PDP (objective: conversion).

The Cloud Run recommendations API (min 3 instances per region, two regions) sits behind the app BFF, caches in Memorystore with a 60-second TTL, and falls back to a BigQuery “trending in category” rail on any endpoint error. Looker, with a 200-slot BI Engine reservation, gives merchandising a daily cockpit and powers the A/B readout.

Decisions and numbers.

Outcome. Saanjh rolled the treatment to 100%, kept the A/B framework permanently to gate future model changes, and the merchandising team now steers objectives per surface from a Looker dashboard rather than filing tickets. The fallback rail proved its worth during a regional Vertex AI hiccup — shoppers saw “trending” recommendations and never knew the personalised model had blipped. Twelve months in, they began the custom Vector Search bundle recommender as Phase 2, reusing the same Pub/Sub + BigQuery + Feature Store backbone with zero re-instrumentation.

When to use it

Use this architecture when you have meaningful behavioural volume (thousands of events a day and up), a catalogue that changes faster than nightly batch can track, multiple surfaces to personalise, and a real need to prove lift. It scales down to a single-brand DTC store (drop the multi-region serving and the custom path) and up to a large multi-banner group (add Vector Search, per-banner Looker row-level security, and reserved slots).

Prefer the managed Vertex AI Search for commerce path when you want fast time-to-value, your team is small, and cold-start and freshness are your hard problems: it handles both without you owning a model lifecycle. Reach for the custom Vertex AI two-tower + Vector Search path when the business logic is genuinely beyond a catalogue recommender: bundle/“complete the look” reasoning, multi-objective ranking that blends margin and inventory-clearance goals, or recommendations over a non-product entity (content, services). You can run both side by side (managed for the standard surfaces, custom for the differentiated one) on the same data backbone.

Anti-patterns to avoid:

Alternatives. If you are already deep in a different stack, the equivalent patterns are Amazon Personalize + Kinesis + Redshift + QuickSight on AWS, or Azure Personalizer/Azure AI + Event Hubs + Synapse/Fabric + Power BI on Azure. The GCP version’s distinctive strength is that BigQuery doubles as warehouse and feature engine and the managed retail recommender removes the hardest ML lifecycle work — which is precisely why it is the pragmatic default for retailers who want lift, not a research project. For very small catalogues or low traffic, a simpler heuristic (“bought together” computed in BigQuery and served from Redis) may be all you need until volume justifies the full engine.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.