
Real-Time Payments Fraud Scoring Pipeline on GCP

A mid-tier card network (the kind that sits between a few hundred issuing banks and the merchant acquirers, switching roughly 9,000 authorization messages a second at peak) gets a board-level ultimatum after a bad quarter. Card-not-present fraud losses are up, a single coordinated bot attack on a weekend pushed chargeback ratios past the threshold that triggers brand-scheme fines, and the existing rules engine, a wall of hand-written IF amount > X AND country != home THEN decline rules accreted over fifteen years, is simultaneously catching too little real fraud and declining too many good customers. The head of risk frames it bluntly: “Every authorization that crosses our switch must get a fraud score, the score must come back before we have to answer the issuer, and the model has to learn from last week’s attack, not last year’s.” The hard constraint is the one that makes this an engineering problem and not a data-science notebook: the network’s processing SLA gives the fraud decision a budget of about 100 milliseconds, end to end, at p99, inside an authorization flow that itself has only a few hundred milliseconds before a timeout becomes a forced approval. This article is the reference architecture for building that scoring service properly on Google Cloud: a streaming, low-latency, governed fraud pipeline that a card network’s risk officer, CISO, and scheme auditor will all sign off on.

The pressures stack the way they always do in payments. Latency is non-negotiable and adversarial: blow the budget and the authorization either times out (a forced approval, which is exactly when the fraudster wins) or declines a legitimate cardholder at the point of sale (a customer-experience and revenue failure). Scale means 9,000 transactions per second sustained with spikes at retail peaks, every one needing a score, with no warm-up. Accuracy under drift means fraud patterns mutate weekly — a model trained on June’s attack is blind to July’s — so the system has to retrain and redeploy without a maintenance window. And regulation means PCI-DSS scope around the cardholder data, model-governance and explainability obligations under emerging AI rules, and an audit trail that proves why any given transaction was declined. A batch job that scores yesterday’s transactions overnight is worthless here; the decision has to happen inside the swipe.

Why not the obvious shortcuts

Three shortcuts will be proposed in the first design meeting, and each fails in a way worth naming so the room can move past them.

Keep the rules engine and just add more rules. Rules are fast and explainable, which is why they survive, but they are static, brittle, and exploitable: an attacker probes the thresholds and walks transactions just under them. Rules cannot weigh a hundred weak signals together the way a model can, and the maintenance burden compounds until no one dares touch a rule for fear of what it silently catches.

Score in batch and cache a risk grade per card. Precomputing a nightly risk score per card and looking it up at authorization time is gloriously fast — but it is blind to the transaction in front of it. The whole signal in card-not-present fraud is velocity and context: five transactions on this card in the last ninety seconds across three countries. A cached overnight grade cannot see the ninety seconds that just happened.

Call a hosted model API synchronously per transaction. Putting a network hop to a generic model endpoint in the authorization path adds tail latency you do not control and a dependency you cannot bound at p99. At 9,000 TPS the per-call overhead and the occasional multi-hundred-millisecond tail will blow the budget on their own, before the model even runs.

The architecture that actually works splits the problem in two: compute the expensive, stateful features continuously on a stream so they are already sitting in a low-latency store before the transaction arrives, and at decision time do only the cheap part — fetch precomputed features, assemble a vector, and call a model that returns in single-digit milliseconds. The streaming pipeline pays the cost of statefulness ahead of time; the synchronous path stays thin enough to fit the budget.

Architecture overview

[Figure: Real-Time Payments Fraud Scoring Pipeline on GCP, architecture diagram]

The platform runs two paths that share state but live on completely different latency regimes, and keeping them separate in your head is the first step to operating this well: an asynchronous streaming path that ingests the firehose of authorization events and continuously maintains features, and a synchronous decision path that must return a score inside the 100ms budget. They meet at exactly one place — the Bigtable feature store — which the stream writes to and the decision path reads from.

The defining property of the whole topology is that nothing slow is allowed on the synchronous path. No stream join, no aggregation, no model training, and no cross-region call happens while a transaction is waiting. All of that lives on the streaming side. The decision path does three cheap things — read features, build a vector, predict — and returns.

Synchronous decision path, following the request:

  1. The network’s authorization switch holds the in-flight ISO 8583 / ISO 20022 message and makes a low-latency gRPC call to the scoring service — a stateless app on GKE (a regional, private cluster) fronted by an internal load balancer. This call is on the critical path, so it never leaves the VPC and never touches the public internet.
  2. The scoring service reads the card’s and merchant’s precomputed features from Bigtable with a single-digit-millisecond point lookup keyed by card token and merchant id. These features were computed seconds ago by the streaming path and are already warm.
  3. It assembles the feature vector — the freshly-read velocity and behavioral features plus a handful of request-time fields (amount, MCC, channel) — and calls a Vertex AI online prediction endpoint hosting the fraud model. The endpoint runs in the same region with the model loaded in memory, returning a probability in a few milliseconds.
  4. The service applies the decision policy — a calibrated threshold plus a thin layer of non-negotiable hard rules (a known-compromised BIN, a sanctioned geography) that risk insists stay deterministic and explainable — and returns score + decision + reason codes to the switch. The switch approves, declines, or steps up to 3-D Secure.
  5. The complete decision record — features used, score, threshold, reason codes — is published to a Pub/Sub outcome topic for the audit trail and for later label-joining, asynchronously, off the critical path.
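The five steps above can be sketched as a single function. This is a minimal illustration, not the production service: the dict `FEATURE_STORE` stands in for the Bigtable point lookup, `predict` is a toy stub for the Vertex AI call, `OUTCOME_TOPIC` is a list standing in for Pub/Sub, and the BIN list, threshold, and reason-code names are all hypothetical.

```python
import time
from dataclasses import dataclass

# Stand-in for the Bigtable point lookup: card token -> precomputed features.
FEATURE_STORE = {
    "tok_123": {"txn_count_1m": 5.0, "distinct_countries_1h": 3.0},
}
COMPROMISED_BINS = {"411111"}   # hypothetical hard-rule list
DECLINE_THRESHOLD = 0.80        # illustrative value, not a recommendation
OUTCOME_TOPIC = []              # stand-in for the Pub/Sub outcome topic


@dataclass
class Decision:
    score: float
    action: str
    reason_codes: list
    latency_ms: float


def predict(vector):
    """Stub for the online prediction call: a toy monotone score in [0, 1]."""
    txn_count, countries, amount = vector
    return min(0.1 * txn_count + 0.15 * countries + 0.0005 * amount, 1.0)


def score_transaction(card_token, bin_prefix, amount):
    start = time.perf_counter()
    # Step 2: read precomputed features; unseen cards fall back to zeros.
    feats = FEATURE_STORE.get(card_token, {})
    # Step 3: precomputed features plus request-time fields.
    vector = [
        feats.get("txn_count_1m", 0.0),
        feats.get("distinct_countries_1h", 0.0),
        amount,
    ]
    score = predict(vector)
    # Step 4: deterministic hard rules first, then the threshold.
    if bin_prefix in COMPROMISED_BINS:
        action, reasons = "decline", ["HARD_RULE_COMPROMISED_BIN"]
    elif score >= DECLINE_THRESHOLD:
        action, reasons = "decline", ["MODEL_SCORE_ABOVE_THRESHOLD"]
    else:
        action, reasons = "approve", []
    latency_ms = (time.perf_counter() - start) * 1000.0
    decision = Decision(score, action, reasons, latency_ms)
    # Step 5: record the outcome (asynchronous in a real service).
    OUTCOME_TOPIC.append(
        {"card_token": card_token, "vector": vector, "decision": decision}
    )
    return decision
```

Calling `score_transaction("tok_123", "522222", 200.0)` declines on the model score, while an unseen token with a small amount is approved. The point of the shape is that nothing in the function aggregates or joins: every expensive value is already in the store when the request arrives.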

Asynchronous streaming path, independent and continuous:

  1. Every authorization event the switch processes is published to a Pub/Sub ingestion topic — a fully managed, globally durable buffer that absorbs the 9,000 TPS firehose and decouples the volatile switch from the pipeline behind it.
  2. A Dataflow streaming job (Apache Beam) consumes the topic and does the stateful heavy lifting: windowed aggregations per card and per merchant (transaction count and amount over sliding 1-minute, 5-minute, and 1-hour windows), velocity features (distinct countries, distinct merchants, time since last transaction), and enrichment joins. Dataflow’s exactly-once processing and watermark-based windowing are what make these counts correct under out-of-order and late events.
  3. Dataflow writes the updated feature values to Bigtable continuously, so by the time the next transaction on that card arrives milliseconds or seconds later, its velocity features already reflect the one that just happened. This write-ahead-of-read is the entire trick.
  4. In parallel, raw enriched events land in BigQuery as the historical store for offline training, analytics, and the eventual fraud labels (chargebacks, confirmed-fraud reports) that arrive days later and become training targets.
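To make the windowed features in step 2 concrete, here is a plain-Python sketch of what the streaming job maintains per card. It is not Beam code: it assumes events for one card arrive in event-time order, keeps them in an in-memory deque, and the class and feature names are illustrative. In the real pipeline, Dataflow's windowing and state do this work and handle disorder.

```python
from collections import deque

WINDOWS = {"1m": 60, "5m": 300, "1h": 3600}  # window widths in seconds


class CardVelocity:
    """Per-card sliding-window features over (event_ts, amount, country)."""

    def __init__(self):
        self.events = deque()  # oldest first; assumes in-order arrival

    def add(self, event_ts, amount, country):
        self.events.append((event_ts, amount, country))

    def features(self, now):
        # Evict anything that has fallen out of the longest window.
        horizon = now - max(WINDOWS.values())
        while self.events and self.events[0][0] <= horizon:
            self.events.popleft()
        out = {}
        for name, width in WINDOWS.items():
            recent = [e for e in self.events if now - width < e[0] <= now]
            out[f"txn_count_{name}"] = len(recent)
            out[f"amount_sum_{name}"] = sum(e[1] for e in recent)
        in_hour = [e for e in self.events if e[0] <= now]
        out["distinct_countries_1h"] = len({e[2] for e in in_hour})
        # Time since the most recent transaction, or None for a new card.
        out["secs_since_last_txn"] = now - in_hour[-1][0] if in_hour else None
        return out
```

Each call to `features(now)` yields the values the job would write to the feature store, so the next transaction on the card reads counts that already include the previous one.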

Training & retraining loop, on a slower cadence still: Vertex AI Pipelines orchestrate scheduled retraining — pull labeled history from BigQuery, recompute training features with the same logic the stream uses, train and evaluate the model, register it in the Vertex AI Model Registry, and roll it out behind a traffic split. The streaming features and the training features must be computed identically, or the model sees one distribution in training and another in production — the train/serve skew that quietly destroys fraud models.

Component breakdown

| Component | Service / tool | Role in the pipeline | Key configuration choices |
|---|---|---|---|
| Ingestion buffer | Pub/Sub | Durable, decoupling firehose for all authorization events | Regional topic; ordering keys off (throughput); dead-letter topic |
| Stream processing | Dataflow (Beam) | Windowed aggregations, velocity features, exactly-once writes | Streaming engine; sliding windows; autoscaling workers |
| Feature store | Bigtable | Single-digit-ms feature reads on the decision path | SSD cluster; row key = card token; column families per feature group |
| Decision service | GKE (regional, private) | Thin synchronous path: read → vectorize → predict → decide | Workload Identity; HPA on RPS; internal load balancer |
| Model serving | Vertex AI online prediction | Low-latency fraud probability from the served model | Dedicated endpoint; in-region; traffic split for canaries |
| Historical store | BigQuery | Training data, analytics, label store | Partitioned by date; column-level access for PAN-adjacent fields |
| Training orchestration | Vertex AI Pipelines + Model Registry | Scheduled retrain, eval, versioned model rollout | KFP pipeline; eval gate; champion/challenger split |
| Identity / SSO | Okta + Entra ID | Workforce SSO for analysts/engineers into GCP and dashboards | OIDC federation to Cloud Identity; group-mapped IAM |
| Secrets | HashiCorp Vault | Issuer-API tokens, signing keys, third-party feed creds | Dynamic leases; GKE auth; Vault Agent sidecar injection |
| CSPM / data posture | Wiz | Cloud posture, PAN-exposure detection, attack-path analysis | Agentless scan of Bigtable/BigQuery/buckets; public-exposure alerts |
| Runtime security | CrowdStrike Falcon | Runtime threat detection on GKE nodes and Dataflow workers | Sensor on node pools; detections streamed to the SOC |
| Observability / SLOs | Datadog | Decision-latency SLOs, drift monitors, pipeline lag, dashboards | OTel traces on decision span; p99 latency SLO; lag monitors |
| ITSM / change | ServiceNow | Model-promotion change records, incident tickets | Change gate before a model goes to 100%; auto-ticket on SLO breach |
| Edge | Akamai | Edge protection for the issuer/analyst web surfaces (not the switch path) | WAF, bot mitigation on portals; not in the authorization path |
| CI / IaC | GitHub Actions + Terraform | Infra as code; pipeline build/test/eval gate | OIDC to GCP (no stored keys); eval gate before promote |

A few of these choices deserve the why, because they are the ones teams get wrong.

Why Bigtable as the feature store, not a relational cache or a generic key-value store. The decision path’s read is the most latency-sensitive operation in the whole system, and it happens 9,000 times a second. Bigtable gives single-digit-millisecond point reads at that throughput with a flat latency profile that holds as data grows, because the row key is designed for exactly this lookup: keys that lead with the card token (a plain cardToken key for a single current-features row, or cardToken#reversedTimestamp style keys when versions are kept, so the newest row sorts first under the card’s prefix) put a card’s hot features at a predictable location, and the load stays well distributed as long as tokens are not issued sequentially. A relational store would add query-planner variance and connection-pool contention you cannot bound at p99; a smaller cache would not hold the full card population. The discipline that matters: design the row key around the read pattern, and split features into column families so the decision path reads only the groups it needs.
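A small sketch of the cardToken#reversedTimestamp key shape shows why it serves the read pattern. Bigtable sorts rows lexicographically by key, so subtracting the timestamp from a fixed maximum makes the newest row for a card sort first. The function names, the 13-digit millisecond width, and the `#` separator handling are assumptions for illustration; the sketch also assumes tokens never contain `#`.

```python
MAX_TS_MS = 10**13 - 1  # assumption: timestamps fit in 13 decimal digits (ms)


def feature_row_key(card_token: str, event_ts_ms: int) -> bytes:
    """Build a key whose lexicographic order is newest-first per card."""
    if "#" in card_token:
        raise ValueError("card token must not contain the separator '#'")
    if not 0 <= event_ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range for the fixed-width encoding")
    reversed_ts = MAX_TS_MS - event_ts_ms
    # Zero-padding keeps string order identical to numeric order.
    return f"{card_token}#{reversed_ts:013d}".encode("utf-8")


def card_prefix(card_token: str) -> bytes:
    """Prefix covering every row for one card, for a bounded prefix read."""
    return f"{card_token}#".encode("utf-8")
```

With this layout, reading the first row under `card_prefix(token)` returns the freshest feature row without a sort. Leading the key with the token rather than the timestamp is what avoids funnelling all writes to one node.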

Why Dataflow for the features, not a microservice doing its own counting. The hard part of velocity features is correctness under disorder — events arrive late, out of order, and occasionally twice, and a naive counter double-counts or misses, producing features that lie to the model. Dataflow’s Beam model gives you watermarks (a principled notion of “how late is too late”), windowing (sliding windows for “last 5 minutes” that update as time advances), and exactly-once state, so the count of “transactions on this card in the last minute” is actually correct. Rebuilding that correctly in application code is a project unto itself, and getting it subtly wrong is how fraud models silently degrade.
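The three hazards named above (late, out-of-order, duplicated events) can be illustrated with a deliberately simplified counter. This is not how Beam computes watermarks: here the watermark is just the largest event time seen, windows are fixed one-minute buckets, and deduplication state grows without bound. The class name and the allowed-lateness value are hypothetical; the point is only to show which cases a naive counter gets wrong.

```python
class MinuteCounter:
    """Counts events per 1-minute event-time window for a single card."""

    def __init__(self, allowed_lateness_s=30):
        self.allowed_lateness_s = allowed_lateness_s
        self.watermark = 0       # simplification: max event time seen so far
        self.seen_ids = set()    # a real job would bound or expire this state
        self.counts = {}         # window start (s) -> event count
        self.dropped_late = 0

    def process(self, event_id, event_ts):
        # Duplicate delivery must not be counted twice.
        if event_id in self.seen_ids:
            return "duplicate"
        window_start = event_ts - event_ts % 60
        window_end = window_start + 60
        # Too late: the window closed more than allowed_lateness_s ago.
        if window_end + self.allowed_lateness_s <= self.watermark:
            self.dropped_late += 1
            return "late"
        self.seen_ids.add(event_id)
        self.counts[window_start] = self.counts.get(window_start, 0) + 1
        self.watermark = max(self.watermark, event_ts)
        return "counted"
```

An out-of-order event that arrives within the lateness bound still lands in its correct window, a redelivered event is ignored, and an event arriving long after its window closed is counted as dropped rather than silently corrupting a newer window.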

Why the feature logic must be shared between stream and training. This is the single most common and most damaging mistake in real-time ML. If the streaming job computes “distinct countries in the last hour” one way and the training job computes it another way from BigQuery history, the model is trained on a distribution it never sees in production — train/serve skew — and its real-world accuracy collapses while offline metrics look fine. The fix is to factor the feature transformations into a shared library invoked by both the Beam pipeline and the Vertex AI Pipelines training step, and to validate parity continuously.
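One way to picture continuous parity validation: define the feature once, then recompute it from history at each decision timestamp and compare it with the value that was logged at serving time. The sketch below assumes the outcome record carries the served feature value and that history is available as per-card event lists; both function names and the data shapes are illustrative.

```python
def distinct_countries_last_hour(events, now):
    """Shared definition, imported by both the stream and the training job.

    events: iterable of (event_ts, country); window is (now - 3600, now].
    """
    return len({country for ts, country in events if now - 3600 < ts <= now})


def parity_mismatches(served, history):
    """Compare logged serving values against a recomputation from history.

    served:  list of (card_token, decision_ts, logged_value)
    history: dict of card_token -> list of (event_ts, country)
    """
    mismatches = []
    for card, decision_ts, logged in served:
        recomputed = distinct_countries_last_hour(
            history.get(card, []), decision_ts
        )
        if recomputed != logged:
            mismatches.append((card, decision_ts, logged, recomputed))
    return mismatches
```

A non-empty result is the early warning: either the two code paths have diverged or the stream was serving stale values. Whether the window should include the transaction being scored is a definition the shared function has to settle once, for both sides.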

Implementation guidance

Provision with Terraform, and treat the VPC and private connectivity as the first deliverable. Everything on the decision path — GKE, Bigtable, the Vertex AI endpoint — must reach each other over private networking with no public egress, or you have both a latency tax and a PCI scope you do not want.

  1. A VPC with subnets for GKE, a Private Service Connect / private-services range for managed services, and Private Google Access so the cluster reaches Bigtable and Vertex AI without traversing the internet.
  2. Bigtable provisioned with an SSD cluster sized to the read QPS (not just storage), with autoscaling on node count.
  3. The regional private GKE cluster with Workload Identity enabled and an internal load balancer for the switch’s gRPC call.
  4. The Vertex AI endpoint deployed in the same region as GKE and Bigtable — cross-region hops are latency you cannot afford.
  5. Pub/Sub topics (ingestion + outcome + dead-letter) and the Dataflow streaming job with the streaming engine and autoscaling enabled.

A minimal Terraform shape for the Bigtable feature store communicates the intent — SSD for latency, autoscaling for the firehose:

resource "google_bigtable_instance" "feature_store" {
  name = "fraud-feature-store-prod"

  cluster {
    cluster_id   = "fraud-fs-prod-c1"
    storage_type = "SSD"            # SSD, not HDD — single-digit-ms reads
    zone         = "asia-south1-a"
    autoscaling_config {
      min_nodes  = 6
      max_nodes  = 30
      cpu_target = 60 # scale ahead of the read firehose
    }
  }
  deletion_protection = true
}

The pipeline that applies this runs in GitHub Actions, authenticating to GCP via Workload Identity Federation (OIDC) so there is no long-lived service-account key sitting in a secret to leak — a hard rule for anything that touches a payments environment. The same pipeline runs the model evaluation harness (below) as a required gate before any promotion.

Identity: federate the humans, kill the static keys. Analysts, data scientists, and on-call engineers reach BigQuery, the Vertex dashboards, and Datadog through SSO: the workforce IdP is Okta, federated (for the shops that also run Microsoft estates, via Entra ID) into Google Cloud Identity over OIDC, with Okta groups mapped to GCP IAM roles so a data scientist gets BigQuery read on the analytics dataset but never on the PAN-adjacent columns, and only the SRE group can touch the production endpoint. Conditional-access and adaptive MFA live in Okta. Workloads authenticate with Workload Identity — the GKE scoring service and the Dataflow workers assume scoped service accounts (Bigtable read, Vertex predict, Pub/Sub publish) with no key files anywhere. The handful of residual secrets that are not service identities — issuer-callback API tokens, a third-party device-fingerprint feed credential, a signing key for the outcome records — live in HashiCorp Vault, leased dynamically and injected by the Vault Agent sidecar, so they are short-lived and never written to a Kubernetes Secret or a container image.

Feature and serving wiring. Define the feature schema once and share it: column families in Bigtable grouped so the decision path reads only what it needs (velocity, behavior, merchant), the same field definitions emitted by the Beam pipeline, and the identical transformation library imported by the training pipeline. Serve the model on a Vertex AI dedicated endpoint sized so the model stays resident in memory (cold starts are fatal at p99), and roll new models out behind a traffic split — 5% to the challenger, watch the metrics, then ramp — rather than a hard cutover. Keep the threshold and the deterministic hard rules in version-controlled config, reviewable and instantly revertable, because the threshold is a business lever (the fraud-catch versus false-decline tradeoff) that risk will want to tune without a redeploy.
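As an illustration of keeping the threshold and hard rules in reviewable config, the sketch below loads a policy document and validates it before use. The field names, the step-up threshold, the version string, and the placeholder values are all hypothetical; the source only states that the threshold and hard rules live in version-controlled config.

```python
import json

# Hypothetical policy file contents, as they might sit in version control.
POLICY_JSON = """
{
  "policy_version": "example-1",
  "decline_threshold": 0.80,
  "step_up_threshold": 0.55,
  "blocked_bins": ["411111"],
  "blocked_countries": ["XX"]
}
"""


def load_policy(text):
    """Parse and validate a policy so a bad edit fails at load, not in flight."""
    policy = json.loads(text)
    step_up = policy["step_up_threshold"]
    decline = policy["decline_threshold"]
    if not 0.0 < step_up < decline < 1.0:
        raise ValueError("thresholds must satisfy 0 < step_up < decline < 1")
    policy["blocked_bins"] = frozenset(policy["blocked_bins"])
    policy["blocked_countries"] = frozenset(policy["blocked_countries"])
    return policy


def decide(policy, score, bin_prefix, country):
    # Hard rules are deterministic and take precedence over the model score.
    if bin_prefix in policy["blocked_bins"] or country in policy["blocked_countries"]:
        return "decline"
    if score >= policy["decline_threshold"]:
        return "decline"
    if score >= policy["step_up_threshold"]:
        return "step_up"
    return "approve"
```

Because the policy is data, reverting a threshold change is a config rollback rather than a redeploy, and the version string can be stamped on every outcome record.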

Enterprise considerations

Security, PCI scope, and Zero Trust. Payments raises the bar past ordinary cloud security. (a) Tokenize the PAN before it ever reaches this pipeline: the scoring service operates on a card token, not the primary account number, which keeps the bulk of this system out of PCI-DSS cardholder-data scope; only the tokenization boundary and the narrow stores that touch PAN-adjacent data stay in scope. (b) Everything on the decision path is private-networking only, identity-based access, least-privilege service accounts per workload: Zero Trust by construction. (c) Wiz runs continuous CSPM and sensitive-data scanning across Bigtable, BigQuery, and Cloud Storage, alerting the moment a dataset drifts toward public exposure or an IAM binding widens access to PAN-adjacent fields; it is the posture backstop behind the policy controls. (d) CrowdStrike Falcon sensors on the GKE node pools and Dataflow workers give runtime threat detection, feeding the network’s SOC. (e) Organization policy denies public IPs on the data-plane resources, and Wiz independently verifies the policy is actually holding. (f) A material control breach (a public-exposure drift, a sustained guardrail failure) auto-raises a ServiceNow incident so security has a ticket, not just a log line.

Cost optimization. Two cost centers dominate — the always-on streaming/serving footprint and the data volume — and both reward engineering.

| Lever | Mechanism | Typical effect |
|---|---|---|
| Bigtable right-sizing | Autoscale nodes on CPU; SSD only for hot feature data | Pay for the read QPS you have, not peak forever |
| Dataflow streaming engine | Decouple compute from worker disk; autoscale on backlog | Cuts worker count off-peak without losing state |
| Endpoint autoscaling | Scale Vertex replicas on QPS with a warm floor | Avoids paying for peak replicas 24/7 |
| BigQuery partitioning | Partition by date, cluster by card; prune on read | Slashes scan cost on training and analytics queries |
| Feature TTL | Age out cold cards’ features from Bigtable | Bounds the hot dataset and its node count |

Tag and label every resource by environment and cost center, pipe spend metrics to Datadog, and let the FinOps team see fraud-platform cost per million transactions scored — the unit economic the CFO actually asks about.

Scalability. Each tier scales independently and the whole point of the split is that nothing slow blocks the firehose. Pub/Sub absorbs spikes natively — it is a buffer, so a switch surge becomes backlog, not backpressure on the source. Dataflow autoscales workers on the backlog and watermark lag. Bigtable scales out on node count (read QPS) with the row-key design ensuring the load spreads rather than hot-spotting one tablet — the classic Bigtable failure is a sequential row key that funnels all writes to one node. The GKE scoring service scales pods on requests-per-second behind the internal LB, and the Vertex endpoint scales replicas on QPS with a warm floor so a scale-up never cold-starts into the latency budget. The natural ceiling is regional capacity, which is why a network at this volume reviews quotas and plans a second region early.

Failure modes, and what each one looks like. Name them before they page you.

Reliability & DR (RTO/RPO). Decide the numbers per tier and pick them around the business reality that the scoring service going dark is itself a fraud event: you must keep deciding. The fallback is the safety net: if Vertex or Bigtable is unreachable, the scoring service fails open to the deterministic hard-rule set and flags those transactions for offline review, so authorizations keep flowing with degraded protection rather than timing out into forced approvals. For genuine regional loss, run the decision path active in a second region with Bigtable replication keeping the feature store warm and the model deployed in both regions; the switch routes to the healthy region. BigQuery and the durable Pub/Sub history are the rebuild source of truth. A pragmatic target: RTO under 5 minutes to the second region with fail-open rules covering the gap, and RPO near zero for the decision audit trail. Every outcome is published to Pub/Sub, but because that publish is asynchronous and off the critical path, the residual exposure is whatever sits in a publisher’s in-flight buffer when a pod dies, so keep that buffer small and flush it on shutdown.
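The fail-open behaviour can be sketched with a bounded wait on the model call. This is one possible shape, not the service's actual code: the timeout value, thread-pool size, BIN list, and result fields are assumptions, and `slow_model` and `fast_model` are stand-ins for a stalled and a healthy dependency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MODEL_TIMEOUT_S = 0.05            # illustrative slice of the 100 ms budget
DECLINE_THRESHOLD = 0.80          # illustrative value
COMPROMISED_BINS = {"411111"}     # hypothetical hard-rule list
_POOL = ThreadPoolExecutor(max_workers=8)


def score_with_fallback(txn, model_call):
    # Hard rules are deterministic and apply on both paths.
    if txn["bin_prefix"] in COMPROMISED_BINS:
        return {"action": "decline", "degraded": False, "review_offline": False}
    future = _POOL.submit(model_call, txn)
    try:
        score = future.result(timeout=MODEL_TIMEOUT_S)
    except Exception:
        # Timeout or dependency error: fail open and flag for offline review.
        future.cancel()
        return {"action": "approve", "degraded": True, "review_offline": True}
    action = "decline" if score >= DECLINE_THRESHOLD else "approve"
    return {"action": action, "degraded": False, "review_offline": False}


def slow_model(txn):
    time.sleep(0.3)   # simulates an unreachable or stalled dependency
    return 0.9


def fast_model(txn):
    return 0.9
```

With `slow_model` the call returns an approval marked `degraded` after roughly the timeout, instead of holding the authorization until it expires. The `degraded` flag is what feeds the fallback-activation metric and the offline review queue.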

Observability and SLOs. This system lives or dies on latency, so the decision-latency SLO is the headline metric: instrument the decision span end to end in Datadog with OpenTelemetry — one trace covering Bigtable read → vector assembly → Vertex predict → policy → return — with a hard p99 < 100ms objective and an error budget that pages before it is exhausted, not after. Beyond latency, monitor the metrics the risk team actually cares about: pipeline/watermark lag, feature freshness, score-distribution drift, fraud-catch rate and false-decline rate (the business tradeoff), fallback-activation rate (how often you degraded to rules), and Pub/Sub backlog. Run an offline evaluation harness in the GitHub Actions pipeline so every candidate model is scored on precision/recall at the operating threshold before it can be promoted, and every model promotion passes a ServiceNow change gate so risk and audit have a documented, reversible record of which model decided which transactions.
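The offline evaluation gate reduces to computing precision and recall at the operating threshold and refusing promotion below agreed floors. A minimal sketch follows, assuming labels are 1 for confirmed fraud and 0 otherwise; the function names and the floor values in the usage are hypothetical.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when every score >= threshold is declined."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def passes_gate(scores, labels, threshold, min_precision, min_recall):
    """True only if the candidate clears both floors at the threshold."""
    precision, recall = precision_recall(scores, labels, threshold)
    return precision >= min_precision and recall >= min_recall
```

In CI, a `False` from `passes_gate` would fail the job so the candidate never reaches the traffic split. Precision here maps to the false-decline side of the tradeoff and recall to the fraud-catch side.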

Governance and explainability. Payments regulators and the card schemes increasingly require that an automated decline be explainable. Keep reason codes on every decision — the top contributing features — and persist them with the outcome record so a disputed decline can be reconstructed. Pin model versions explicitly in the registry (never a floating “latest”), promote through the eval and ServiceNow gates, and keep the decision policy and thresholds in version control. Log every scored decision — features, score, model version, reason codes — durably to the outcome topic and BigQuery for audit, dispute resolution, and as future training data, under the retention the scheme rules require.
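To show what "top contributing features" can mean, the sketch below ranks features by their contribution in a linear stand-in model, where each contribution is the weight times the feature's deviation from a baseline. This is an assumption for illustration only: the source does not say which model or attribution method is used, and a non-linear model would need a proper attribution technique in place of this arithmetic.

```python
def top_reason_codes(features, weights, baselines, k=3):
    """Return up to k feature names that pushed the score up the most.

    contribution = weight * (value - baseline); missing features count
    as sitting at their baseline and therefore contribute nothing.
    """
    contributions = {
        name: weights[name] * (features.get(name, baselines[name]) - baselines[name])
        for name in weights
    }
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    # Only features that raised the score are useful as decline reasons.
    return [name for name, value in ranked[:k] if value > 0]
```

Persisting the returned names with the outcome record is what lets a disputed decline be reconstructed later, alongside the model version that produced the score.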

Explicit tradeoffs

Accept these or do not build it. The streaming-feature architecture is genuinely more complex than a rules engine: you now operate a Pub/Sub firehose, a stateful Dataflow job whose correctness depends on watermarks you must understand, a feature store whose row-key design you must get right, and a retraining loop. The 100ms budget forces real discipline — everything slow must be precomputed, every component must be in-region, and the model must stay warm — and the price of getting it wrong is not a slow page, it is a forced approval or a declined customer at checkout. A model is also less transparent than a rule; you buy accuracy and adaptability at the cost of explainability, which is why the reason-code and hard-rule layer is not optional in a regulated payments context. And the platform’s standing cost — the always-on streaming and serving footprint, the second region, the Wiz/CrowdStrike/Datadog tooling — is overhead you cannot amortize away at low volume; this design earns its keep at a card network’s scale and would be over-engineered for a single merchant’s checkout.

The alternatives, and when they win. If your volume is modest and latency is forgiving, a synchronous feature computation (read recent transactions and aggregate them at decision time) skips the streaming pipeline and is far simpler — it just will not hold at 9,000 TPS inside 100ms. If you need maximum explainability and can accept lower accuracy, a modern rules-plus-gradient-boosted-model hybrid scored mostly on request-time features keeps the architecture lighter. If you would rather not operate the ML lifecycle at all, a managed fraud-detection service (a turnkey fraud API) trades control and customizability for speed-to-launch — reasonable for a smaller issuer, but a card network needs the control to model its own scheme-specific fraud and to own the latency budget. And if your fraud is dominated by account-level patterns rather than per-transaction velocity, a graph-based approach (entity-resolution and link analysis over the BigQuery history) complements this pipeline rather than replacing it — run it on the streaming side and feed its signals in as features.

The shape of the win

For the card network’s risk desk, the payoff is not “an ML model.” It is that a card-not-present transaction crosses the switch, gets a fraud score grounded in the velocity of the last ninety seconds with reason codes attached, comes back inside the 100ms budget so the issuer is answered on time, and — because the pipeline retrains weekly and degrades safely to deterministic rules when a component falters — fraud losses fall and false declines fall and the scheme auditor can reconstruct any decision. That combination is the one that ends the board-level pressure. Everything upstream — the Pub/Sub firehose, the watermark-correct Dataflow features, the single-digit-millisecond Bigtable reads, the warm in-region Vertex endpoint, the Okta-federated access, the Vault-held secrets, the Wiz posture scanning, the Datadog latency SLO — exists to make a risk officer, a CISO, and a scheme auditor each say yes. The architecture here is the destination; start narrower if you must, but this is where real-time fraud scoring at a card network’s scale has to land.


