Real-time streaming on AWS goes wrong in a very specific way: a team picks Kinesis because “it’s the streaming service,” writes one fat Lambda that does ingest and enrichment and indexing and archival, hard-codes a shard count that was right for the demo, and then discovers six weeks later that a Monday-morning traffic spike is throttling producers, the Lambda is retrying the same poison-pill record forever, and OpenSearch is on fire because every record is being indexed one at a time. None of that is Kinesis’s fault. It is the absence of an architecture — a deliberate separation between the durable log, the processing tier, the hot operational store, and the cheap analytical landing zone, each scaling on its own axis. This article is that architecture, built end to end on Kinesis Data Streams, Lambda, Kinesis Data Firehose, and Amazon OpenSearch Service.
The single most important idea here is that the stream is a buffer and a contract, not a queue you drain as fast as possible. Kinesis Data Streams gives you a replayable, ordered-per-key, durable log. Everything downstream — the hot path that lights up OpenSearch dashboards in seconds, and the cold path that lands raw events in S3 for cheap reprocessing — reads from that same log independently. Get that split right and the system is boring in the best way: producers never feel consumer pressure, a slow consumer never backs up a fast one, and you can rebuild any derived store by replaying the log.
The business scenario
Picture an operator who has events but no way to act on them while they still matter. This is the same shape at 30 engineers and at 3,000.
The early version: a digital-native business — a payments processor, a multiplayer game studio, a connected-fitness brand, an ad-tech exchange — emits a firehose of events. Transactions authorise, players score, devices report heart rate, bids clear. Today those events trickle into a transactional database and someone runs a query the next morning. The business is steering by a rear-view mirror, and every interesting question is asked in the past tense.
Then the questions that the batch world structurally cannot answer start arriving, and they are the questions the business actually cares about:
- Payments / fintech: “Decline rates for one card BIN range spiked 4x in the last 90 seconds. Is it a fraud attack, an issuer outage, or our own gateway degrading? We need the alert now, and we need to slice the last 15 minutes by issuer, merchant, and geography to triage.” End-of-day reconciliation finds this tomorrow; the fraud ring is gone by then.
- Gaming: “A new build went out 10 minutes ago. Crash events from one device class just went vertical. Roll back before the review-bomb, and show me the live funnel of session-start → match-found → match-complete so I can see exactly where players are dropping.”
- IoT / connected product: “These 200 devices on firmware 4.2 are reporting temperatures that are physically impossible — sensor fault or a real safety issue? Surface the anomaly on the ops wall in seconds, and keep two years of raw telemetry so we can prove what happened and retrain the model.”
- Ad-tech / clickstream: “Win-rate on one exchange just collapsed. Is it a bid-shading change on their side or a latency regression on ours? We bill on this; minutes of blindness are real money.”
Every one of these shares the same structural requirement, and it is the requirement that defines the architecture: one event stream must simultaneously feed a sub-second operational view and a durable analytical archive, and the two must not interfere. The operations dashboard needs the last minutes-to-hours of events, indexed and queryable interactively, with alerting (the hot path). The data scientist, the analyst, and the compliance officer need every raw event, kept cheaply for months or years, reprocessable when the model or the schema changes (the cold path). Bolt both onto one engine and they fight: a heavy analytical scan starves the live dashboard; a burst of ingestion stalls everyone; a schema change forces you to re-ingest from producers you do not control.
The scale-invariance is the reason this belongs in an architecture center. A 30-person startup runs this with a 4-shard stream, a single small OpenSearch domain, and one Firehose. A global platform runs the identical topology with on-demand or 500-shard streams, a multi-AZ OpenSearch cluster with dedicated masters and UltraWarm, and Firehose dynamic partitioning into a partitioned data lake. The shape — durable log, fan-out to hot and cold, independent scaling — never changes. Only the dials move.
The promise to the business: events become actionable in seconds, nothing is ever lost, and any derived view can be rebuilt from the source of truth without going back to the producers.
Architecture overview
The architecture is a single durable log fanned out to two paths that scale independently, plus a thin alerting/serving layer on top. Read it as four tiers.
Tier 1 — Ingest into a durable, replayable log (Kinesis Data Streams). Producers — application SDKs, the Kinesis Producer Library (KPL), mobile/IoT via API Gateway → Kinesis, or other AWS services — write records into a Kinesis data stream. Each record carries a partition key (e.g. device_id, card_bin, session_id). Kinesis hashes that key to a shard; all records with the same key land on the same shard in order, which is what gives you per-entity ordering without a global bottleneck. The stream is the contract: it durably retains every record (default 24h, extendable to 365 days) across three Availability Zones, and — critically — it lets multiple independent consumers read the same records at their own pace. This is the property the whole design hinges on.
Tier 2 — The hot path: stream processing with Lambda. A Lambda function is wired to the stream as an event source mapping. The Lambda service polls the stream, batches records per shard, and invokes your function with up to one concurrent invocation per shard (or far more with parallelization factor / enhanced fan-out — covered below). The function does the low-latency work: validate and parse, enrich (look up a device profile, a merchant category, a geo from IP), compute lightweight derived fields, drop or quarantine bad records, and bulk-index the results into Amazon OpenSearch Service. Because the function is invoked with a batch per shard, it indexes hundreds of documents in a single OpenSearch _bulk call rather than one-by-one — the difference between a healthy cluster and a melted one. Failures are handled by checkpointing semantics: a batch that errors is retried, and poison records are siphoned to an on-failure destination (an SQS DLQ or SNS) instead of blocking the shard forever.
Tier 3 — The cold path: the analytical landing zone (Kinesis Data Firehose). Independently of the Lambda, a Kinesis Data Firehose delivery stream consumes the same Kinesis data stream. Firehose is the fully-managed, zero-code, buffer-and-deliver service: it accumulates records by size or time (e.g. 128 MB or 60 s), optionally transforms them with its own Lambda, optionally converts JSON to Parquet/ORC using a Glue Data Catalog schema, and writes compressed, partitioned objects to S3. With dynamic partitioning, it lays records out as s3://lake/events/event_type=auth/dt=2026-06-09/hour=14/... so Athena/EMR/Redshift Spectrum can prune partitions efficiently. This is your immutable, cheap, query-anytime source of truth — and your replay buffer’s permanent twin. Firehose can also deliver to OpenSearch directly, but in this reference we keep Firehose on the S3/lake path and Lambda on the OpenSearch path, so the two stores fail and scale independently.
Tier 4 — Serving, search and alerting (Amazon OpenSearch Service). OpenSearch holds the hot, operational slice — typically the last hours-to-days of enriched events in time-based indices (events-2026.06.09), rolled and lifecycled by ISM. Engineers and ops teams query it interactively through OpenSearch Dashboards (live funnels, decline-rate-by-issuer panels, device-fault heat maps), and alerting monitors fire when a threshold trips (“declines for BIN X > 4x baseline over 90 s”) into Slack/PagerDuty/SNS. Anomaly detection (the built-in RCF-based feature) catches the spikes you didn’t write a rule for. As data ages, ISM rolls indices from hot nodes to UltraWarm (S3-backed) and then cold storage, so a single domain serves seconds-fresh dashboards and keeps weeks of searchable history without paying hot-node prices for all of it.
The end-to-end data path, following one authorization event from source to action:
- A payment authorises. The producing service writes a JSON record (
{card_bin, merchant_id, amount, result, ts, geo}) withpartition_key = card_binto the Kinesis data stream. Kinesis routes it to the shard owning that BIN, appends it durably across 3 AZs, and acknowledges. - Hot path: the Lambda event source mapping has been polling that shard; it delivers a batch of ~200 records to the function. The function enriches each (issuer name from BIN, merchant category, country from geo), computes
is_decline, and issues one OpenSearch_bulkrequest indexing all 200 intoevents-2026.06.09. Latency from authorise to “visible on the dashboard” is single-digit seconds. - Alert: an OpenSearch alerting monitor running every 60 s over the last 5 minutes sees decline rate for BIN X cross its threshold and pages the on-call fraud analyst with a deep link straight into the filtered dashboard.
- Cold path (parallel, unaware of all the above): Firehose, reading the same stream, has been buffering. At 60 s it converts the batch to Parquet using the Glue schema and writes
s3://lake/auth/dt=2026-06-09/hour=14/part-….parquet. That object is now permanent, cheap, and queryable by Athena. - Triage & history: the analyst slices the last 15 minutes live in OpenSearch to isolate the attack to three merchants, while a data scientist runs an Athena query over three months of the same events in S3 to see whether this BIN has a history — no impact on the live cluster, because they’re hitting entirely different stores.
- Replay (the safety net): a week later the fraud model adds a feature. Rather than ask producers to resend, the team replays: a new Lambda (or an EMR/Flink job) reads the Kinesis stream from a past sequence number, or re-processes the raw S3 Parquet, and rebuilds the derived OpenSearch index from scratch.
The diagram, in words. On the left, a stack of producers (app SDKs/KPL, API Gateway for mobile/IoT, AWS services) all pointing at a single tall cylinder in the center-left labelled Kinesis Data Streams, drawn as a set of parallel horizontal lanes (the shards). From that cylinder, two arrows fan out to the right, and the fact that they’re two separate arrows off the same log is the whole picture. The top arrow goes to a Lambda box (the hot path) which arrows into an OpenSearch Service domain; above OpenSearch sits OpenSearch Dashboards and an Alerting → PagerDuty/SNS badge. The bottom arrow goes to a Kinesis Data Firehose box (the cold path) which arrows into an S3 bucket drawn with partition folders, with Athena / Redshift Spectrum / EMR reading from S3 below. A small Glue Data Catalog sits between Firehose and S3 (schema for Parquet) and is also wired to Athena. A dotted “replay” arrow loops from S3 and from the Kinesis cylinder back into a reprocessing box, emphasising that any store is rebuildable. Cross-cutting boxes underneath — IAM, KMS, VPC, CloudWatch — touch every tier. The defining visual: one log, two non-interfering consumers, each scaling on its own axis.
Component breakdown
| Component | AWS service | Role in the pipeline | Key configuration choices |
|---|---|---|---|
| Durable log / ingest | Kinesis Data Streams | Replayable, ordered-per-key buffer; the single source of truth and contract | On-demand mode for spiky/unknown load; provisioned shards once load is steady (cheaper at scale). Partition key chosen for even distribution. Retention 24h → up to 365 days. Server-side KMS encryption on. |
| Producers | KPL / AWS SDK / API Gateway proxy / agents | Get events into the stream efficiently | KPL for high-throughput aggregation+batching; API Gateway → Kinesis integration for browser/mobile/IoT where you don’t ship the SDK. Always set a meaningful partition key. |
| Hot-path processor | AWS Lambda (event source mapping) | Low-latency enrich, transform, route, and bulk-index to OpenSearch | Batch size 100–500; batch window 1–5 s; parallelization factor up to 10 per shard; bisect-on-error + maximum-retry + on-failure SQS/SNS DLQ; enhanced fan-out consumer for fast, dedicated read. |
| Cold-path delivery | Kinesis Data Firehose | Zero-code buffer → transform → Parquet → partitioned S3 landing zone | Buffer 128 MB / 60 s; record format conversion to Parquet via Glue schema; dynamic partitioning by event_type/date/hour; GZIP/Snappy compression; S3 error-output prefix for failed records. |
| Operational analytics store | Amazon OpenSearch Service | Sub-second search, live dashboards, alerting, anomaly detection over hot data | Multi-AZ with dedicated master nodes (3) at scale; time-based indices + ISM rollover; UltraWarm + cold tiers for history; in-VPC deployment; fine-grained access control. |
| Analytical lake | Amazon S3 + Glue Data Catalog | Immutable raw archive + schema registry for query-in-place | Lifecycle to S3 Intelligent-Tiering/Glacier; partitioned layout; Glue crawler or Firehose-registered schema; the durable twin of the stream. |
| Lake query engines | Athena, Redshift Spectrum, EMR | Ad-hoc and heavy analytics over months/years of events — off the hot store | Athena for ad-hoc/serverless; Spectrum for BI joins to curated dims; EMR/Spark/Flink for replay and ML feature jobs. |
| Cross-cutting | IAM, KMS, CloudWatch, VPC, X-Ray | Identity, encryption, observability, isolation | Least-privilege roles per tier; CMKs per stream/domain; iterator-age & throttle alarms; private subnets + endpoints. |
A few component-level decisions carry disproportionate weight:
Partition key design is the single highest-leverage choice in the whole system. The key determines both ordering and parallelism. Pick a key with too few distinct values (say, region with 4 values) and you get hot shards: most traffic crams onto one shard, that shard’s records throttle at its 1 MB/s · 1000 rec/s write limit while others sit idle, and the consumer for that shard becomes your bottleneck. Pick a high-cardinality, evenly-distributed key (device_id, session_id, card_bin if BINs are well spread) and load fans out across shards smoothly. When you genuinely need per-entity ordering, the entity is your key; when you don’t, use a high-cardinality key (or a random suffix) purely for distribution. There is no fixing a bad partition key downstream — it is decided at the producer.
Why two consumers off one stream instead of one Lambda that does everything. A Kinesis stream supports multiple consumers, and using that is the architecture, not an optimisation. If a single Lambda both indexed to OpenSearch and wrote to S3, then OpenSearch back-pressure (a cluster yellow, a bulk reject) would stall the same iterator that’s responsible for archival — you’d risk losing your durable copy because your operational copy was unhealthy. By giving Firehose its own consumer, the archive keeps flowing no matter what OpenSearch is doing, and vice-versa. They share a source of truth and share nothing else.
Standard consumers vs. enhanced fan-out. Each shard’s read throughput (2 MB/s) is shared across all standard (polling) consumers. With both a Lambda and Firehose polling, plus any replay job, you can starve reads and watch iterator age climb (your consumers fall behind real time). Enhanced fan-out (EFO) gives a consumer its own dedicated 2 MB/s per shard pipe with push delivery and ~70 ms latency. The reference uses EFO for the latency-sensitive Lambda hot path and standard consumption for Firehose (which is throughput-, not latency-, sensitive), so the live dashboard stays fast even when the archive is busy.
Firehose does the boring-but-critical lake hygiene for free. Hand-rolling JSON→Parquet conversion, buffering to avoid the S3 small-file problem, and Hive-style partitioning is a surprising amount of code to get right and keep right. Firehose does all three declaratively: it buffers to large objects (killing small-file query tax), converts to columnar Parquet against a Glue schema (10x cheaper Athena scans), and dynamic-partitions by fields you choose. Letting the managed service own the cold path is what keeps the team’s code limited to the business logic in the hot-path Lambda.
Implementation guidance
Region, accounts, and isolation. Put the streaming stack in the account that owns the workload, but keep the analytical lake (S3 + Glue) in a data account so analysts and the platform team get governed access without touching production streaming. In a multi-account org (AWS Organizations / Control Tower), the Kinesis stream and Lambda live in prod-app, Firehose assumes a role to write into the data-lake account’s bucket, and OpenSearch lives in prod-app (operational) or a shared observability account.
Infrastructure as Code (Terraform sketch). Everything here is declarative; do not click streams or domains into existence. The core resources and the wiring that people most often get wrong:
# 1. The durable log — on-demand to start; switch to PROVISIONED with a shard count once load is known.
resource "aws_kinesis_stream" "events" {
name = "events-${var.env}"
retention_period = 168 # hours; 7 days of replay headroom
stream_mode_details { stream_mode = "ON_DEMAND" }
encryption_type = "KMS"
kms_key_id = aws_kms_key.stream.arn
}
# 2. Enhanced fan-out consumer for the latency-sensitive hot path.
resource "aws_kinesis_stream_consumer" "hot" {
name = "lambda-hot-efo"
stream_arn = aws_kinesis_stream.events.arn
}
# 3. Lambda event source mapping — the bits that save you at 3 a.m.
resource "aws_lambda_event_source_mapping" "hot" {
event_source_arn = aws_kinesis_stream_consumer.hot.arn # EFO ARN
function_name = aws_lambda_function.enricher.arn
starting_position = "LATEST"
batch_size = 300
maximum_batching_window_in_seconds = 2
parallelization_factor = 4 # >1 invocation per shard, order kept per key
bisect_batch_on_function_error = true # isolate the poison record
maximum_retry_attempts = 5
maximum_record_age_in_seconds = 3600
function_response_types = ["ReportBatchItemFailures"] # partial-batch checkpointing
destination_config {
on_failure { destination_arn = aws_sqs_queue.dlq.arn }
}
}
# 4. Firehose cold path — same stream, independent consumer; JSON -> Parquet -> partitioned S3.
resource "aws_kinesis_firehose_delivery_stream" "lake" {
name = "events-lake-${var.env}"
destination = "extended_s3"
kinesis_source_configuration {
kinesis_stream_arn = aws_kinesis_stream.events.arn
role_arn = aws_iam_role.firehose.arn
}
extended_s3_configuration {
role_arn = aws_iam_role.firehose.arn
bucket_arn = aws_s3_bucket.lake.arn
buffering_size = 128
buffering_interval = 60
dynamic_partitioning_configuration { enabled = true }
prefix = "events/event_type=!{partitionKeyFromQuery:event_type}/dt=!{timestamp:yyyy-MM-dd}/hour=!{timestamp:HH}/"
error_output_prefix = "errors/!{firehose:error-output-type}/dt=!{timestamp:yyyy-MM-dd}/"
data_format_conversion_configuration {
enabled = true
output_format_configuration { serializer { parquet_ser_de {} } }
schema_configuration {
role_arn = aws_iam_role.firehose.arn
database_name = aws_glue_catalog_database.lake.name
table_name = aws_glue_catalog_table.events.name
}
}
}
}
The high-value, frequently-missed lines: function_response_types = ["ReportBatchItemFailures"] lets the function return which records in a batch failed so Lambda re-drives only those rather than the whole batch (no more replaying 299 good records to retry 1 bad one); bisect_batch_on_function_error plus on_failure DLQ guarantees a poison pill never wedges a shard; and parallelization_factor is how you scale processing concurrency beyond one-per-shard while still preserving per-partition-key order. On the Firehose side, dynamic_partitioning + the !{partitionKeyFromQuery:...} prefix is what produces a properly partitioned, Parquet, query-cheap lake with zero custom code.
Networking and identity wiring.
- OpenSearch in a VPC, never public. Place data nodes in private subnets across AZs; reach Dashboards through a bastion/SSO proxy or via a private ALB. Enable fine-grained access control and map IAM roles to OpenSearch roles, so the hot-path Lambda’s role can write to
events-*indices and nothing else. - VPC endpoints (PrivateLink) for Kinesis, Firehose, S3 (gateway endpoint), and STS so data and control-plane traffic never traverse the public internet. The Lambda runs in the VPC to reach OpenSearch; give it an endpoint to Kinesis so it can still read the stream.
- Least-privilege IAM, one role per tier. The producer role gets
kinesis:PutRecord*on exactly that stream. The Lambda execution role getskinesis:GetRecords/GetShardIterator/SubscribeToShard/DescribeStream*on the stream/consumer,es:ESHttpPost/Puton the OpenSearch domain, andsqs:SendMessageon the DLQ. The Firehose role gets read on the stream, write on the lake bucket,glue:GetTable*for the schema, andkms:GenerateDataKey/Decrypton the relevant keys. No tier shares a role. - KMS CMKs for the stream, the OpenSearch domain, and the S3 lake. Producers and consumers need explicit
kms:Decrypt/GenerateDataKeygrants — a forgotten KMS grant is the most common “it deployed but nothing flows” failure.
Schema discipline. Producers should emit a versioned envelope ({schema_version, event_type, payload, ts}). Register the canonical schema in AWS Glue Schema Registry; producers using the KPL/SDK serializer validate against it, so a malformed producer is rejected at the edge rather than poisoning the lake. The hot-path Lambda routes on event_type, and Firehose dynamic-partitions on it — one field doing double duty.
Enterprise considerations
Security & Zero Trust. Treat every tier as independently authenticated and encrypted. Encryption in transit is TLS everywhere (Kinesis/Firehose/OpenSearch HTTPS endpoints); encryption at rest is KMS CMKs on the stream, the domain, and the bucket. OpenSearch runs in-VPC with fine-grained access control, document- and field-level security where PII lives (mask pan/card_number, restrict geo to authorised roles), and SAML/SSO for Dashboards users — no shared admin logins. Apply least-privilege per-tier IAM as above, and use SCPs to forbid creating public OpenSearch domains or unencrypted streams org-wide. For regulated data, strip or tokenize sensitive fields in the hot-path Lambda before indexing, and let only the S3 lake hold the raw (encrypted, access-controlled) copy. Nothing trusts the network; every hop carries identity.
Cost optimization. The cost levers are specific and mostly about not over-paying for the hot tier:
- Kinesis mode: on-demand is convenient and right for spiky/unknown load, but it costs meaningfully more per GB than provisioned at steady state. Once you know your sustained throughput, switch to provisioned and size shards to ~70% of capacity; you can flip modes without downtime. This is frequently the biggest single saving.
- OpenSearch tiering: keep only days of data on expensive hot nodes; roll older indices to UltraWarm (S3-backed, a fraction of the cost) and then cold via ISM. Right-size with gp3 storage and reserved instances for the steady-state cluster. Most teams keep 10x more data hot than anyone ever queries.
- Firehose + Parquet makes the cold path almost free to query: columnar + compressed + partitioned means Athena scans (and bills) a fraction of the bytes. Lifecycle S3 to Intelligent-Tiering and Glacier for cold history.
- Lambda is billed on GB-seconds; right-size memory (which also sets CPU), keep the function lean, and let batching amortise invocations. A bigger batch window trades a little latency for fewer invocations and fewer, larger OpenSearch bulk calls.
Scalability. Each tier scales on its own axis, which is the point. The stream scales by shard count (provisioned) or automatically (on-demand) up to the quota; watch WriteProvisionedThroughputExceeded and resharding (split/merge). Lambda scales to one concurrent invocation per shard times the parallelization factor — so 50 shards × factor 4 = up to 200 concurrent enrichers — and EFO removes read contention. Firehose scales transparently. OpenSearch scales by adding data nodes and shards (size shards to ~10–50 GB, avoid shard explosion). The governing metric for “is the hot path keeping up?” is iterator age / GetRecords.IteratorAgeMilliseconds — if it climbs, consumers are falling behind real time and you add shards, raise the parallelization factor, or move to EFO.
Reliability & DR (RTO/RPO). The durable log is the resilience story. Kinesis replicates synchronously across three AZs, so an AZ loss is a non-event for ingest. The replay window (retention up to 365 days) is your logical RPO for derived stores: if OpenSearch is corrupted or lost, you have not lost data — you rebuild the index by replaying the stream and/or reprocessing the S3 Parquet, so RPO for the derived/operational store is effectively zero up to the retention window, and RTO is “how long a reprocess takes.” For OpenSearch itself, take automated snapshots to S3 (hourly) for a fast restore. For regional DR, the lake’s S3 bucket uses Cross-Region Replication; the stream can be mirrored to a second region with a small consumer-and-re-put bridge or you stand the pipeline up in the DR region and replay from replicated S3. Pin concrete numbers to it: hot-path component failure RTO in minutes (Lambda/OpenSearch auto-recover); full operational-store rebuild RTO in the low hours via replay; data-loss RPO ≈ 0 within retention. The poison-pill DLQ guarantees one bad record can never take down a shard — reliability at the record level, not just the cluster level.
Observability. Instrument all four tiers, but watch these signals specifically: stream iterator age and throughput-exceeded (the canary for falling behind / hot shards); Lambda errors, throttles, iterator age, and DLQ depth (a non-empty DLQ means poison records to investigate); Firehose delivery freshness and S3 delivery failures; OpenSearch cluster status (green/yellow/red), _bulk rejections / 429s, JVM memory pressure, and free storage. Wire CloudWatch alarms on iterator-age-rising and DLQ-not-empty as your two highest-signal pages. Enable X-Ray on the Lambda to see enrich-then-index latency. And use OpenSearch’s own alerting/anomaly-detection for the business signals (decline spikes, crash bursts) — the platform that holds the data is also the one that watches it.
Governance. Tag every resource by data-classification, owner, cost-center, and env. Enforce org-wide guardrails with SCPs (no unencrypted streams, no public OpenSearch). Catalog the lake in Glue / Lake Formation so analyst access to historical events is governed with row/column controls, decoupled from the operational store. Keep producer schemas in the Glue Schema Registry with compatibility rules so a producer change can’t silently break the lake. Retention is policy: stream retention, OpenSearch ISM, and S3 lifecycle each encode “how long, where” for their tier, and they should be reviewed together.
Reference enterprise example
Meridian Pay is a fictional mid-market payments processor: ~1,400 merchants, peaks of 6,000 authorizations/second on Black-Friday-class days, ~1.2 kB JSON per event, and a hard requirement from their sponsor bank to detect fraud-pattern anomalies and gateway degradation in near-real-time. Their old world was a read-replica of the auth database plus a nightly job; fraud was found at T+1 day and the bank was unhappy.
What they built. They stood up the reference exactly as above:
- Kinesis Data Streams, started in on-demand mode while load was unknown, then — after four weeks of data showed a steady ~6k peak / ~900 average — moved to provisioned with 8 shards (each shard handles 1 MB/s · 1000 rec/s ingest; 8 gives ~8k rec/s headroom at peak with the partition key being
card_bin, which spreads evenly across ~3,000 active BINs). The mode switch alone cut their Kinesis bill by ~55%. Retention set to 7 days for replay headroom. - Hot-path Lambda with an enhanced fan-out consumer,
batch_size = 300,maximum_batching_window = 2 s,parallelization_factor = 4. It enriches each auth (issuer name and country from BIN, merchant category,is_decline, tokenizes the PAN so no card number ever reaches OpenSearch) and bulk-indexes ~300 docs per call into dailyauth-YYYY.MM.DDindices. Partial-batch failures (ReportBatchItemFailures) re-drive only bad records; a poison auth goes to an SQS DLQ that an engineer triages. End-to-end authorise→dashboard latency measured at 3–5 s at peak. - OpenSearch Service: 3 dedicated master nodes + 6
r6gdata nodes across 3 AZs, in-VPC, fine-grained access control with field-level security hiding the (tokenized) PAN from analyst roles. ISM keeps 2 days hot, rolls 14 days to UltraWarm, then cold-stores to 90 days. An alerting monitor evaluates decline-rate-by-BIN every 60 s over a 5-minute window; anomaly detection runs on per-issuer auth volume. - Firehose, consuming the same stream independently, buffers 128 MB / 60 s, converts to Parquet via a Glue schema, and dynamic-partitions into
s3://meridian-lake/auth/dt=…/hour=…/in the separate data account. Athena and Redshift Spectrum serve the analysts and the bank’s monthly reporting off S3 — never touching the live cluster.
The decisions that mattered. They explicitly chose two consumers over one mega-Lambda after a load test showed that an OpenSearch _bulk 429 during a spike stalled the combined function and started backing up archival — unacceptable for a regulated copy of record. Splitting Firehose onto its own consumer made the S3 archive immune to OpenSearch health. They chose EFO for the hot path after iterator age crept up under combined polling load; the dedicated 2 MB/s pipe dropped iterator age back to near-zero. And they sized OpenSearch hot capacity for 2 days, not 90 — UltraWarm holds the rest — after realising analysts queried >7-day-old data via Athena anyway.
The incident that proved it. Eleven weeks in, an issuer had a partial outage. Decline rate for that issuer’s BIN range went vertical. The OpenSearch monitor paged the on-call within 90 seconds; the analyst opened the deep-linked dashboard, sliced the last 15 minutes, and confirmed it was issuer-side (one BIN range, all merchants, all geographies) rather than a Meridian gateway fault — and routed affected traffic to a fallback issuer. Total time from spike to mitigation: under 7 minutes, versus the next-day discovery their old system offered. A month later, when the fraud team added two features to the model, they replayed three days of auths from S3 Parquet to rebuild a new derived index — no producer changes, no data loss, a few hours of EMR.
The outcome. Fraud-pattern and outage detection moved from T+1 day to seconds. The bank got its near-real-time assurance. Steady-state cost landed around $3,100/month (Kinesis provisioned ~$300, Lambda ~$250, Firehose ~$200, OpenSearch ~$2,000, S3/Athena/Glue ~$350) — roughly one-third of an early quote that had put 90 days of data on hot OpenSearch nodes. The whole pipeline is ~600 lines of Terraform plus one Lambda; nobody hand-writes Parquet, partitioning, or buffering, because Firehose owns all three.
When to use it
Use this architecture when you have a continuous event stream that must be actionable in seconds and retained cheaply for later, where the operational view and the analytical archive have genuinely different access patterns. The sweet spot: fraud/risk monitoring, IoT/device telemetry, clickstream and product analytics, gaming telemetry, ad-tech, application/security observability, and any “decline/anomaly/spike — tell me now and let me dig later” problem. It shines precisely because the hot and cold paths scale and fail independently, and because the durable log makes every derived store rebuildable.
Trade-offs to go in with eyes open. You are running and right-sizing an OpenSearch cluster, which is the operationally heaviest piece here (shard sizing, JVM pressure, version upgrades) — budget for that expertise or lean harder on managed alarms. Kinesis introduces shard math and partition-key thinking that a simpler queue doesn’t; get the key wrong and you fight hot shards. And there is a real latency floor — this is seconds-fresh, not microseconds; it is not a low-latency transactional path.
Anti-patterns to avoid. Do not write one Lambda that does ingest, enrich, index, and archive — you couple the durability of your record-of-truth to the health of your dashboard, exactly the failure Meridian load-tested away. Do not index to OpenSearch one document per record — always _bulk per batch. Do not pick a low-cardinality partition key for convenience; hot shards will find you at peak. Do not keep all history on hot OpenSearch nodes because it’s easy — that’s the line item that triples the bill. Do not skip the DLQ / partial-batch-failure wiring; a single poison record will otherwise wedge a shard and silently stall the pipeline. And do not treat Firehose-to-OpenSearch as a substitute for the Lambda hot path when you need enrichment and custom routing — Firehose’s transform Lambda is fine for light shaping, but the dual-consumer split (Lambda→OpenSearch, Firehose→S3) is what keeps the stores independent.
Alternatives, and when they win.
- Managed Service for Apache Flink (KDA) instead of Lambda when you need true stateful stream processing — windowed aggregations, joins across streams, exactly-once sinks, sessionization. Lambda is perfect for stateless per-batch enrich-and-index; reach for Flink the moment you need sustained windowed state or stream-to-stream joins.
- Amazon MSK (managed Kafka) instead of Kinesis when you have existing Kafka investment/ecosystem, need very long retention with compaction, sub-shard-granularity consumer-group semantics, or multi-cloud portability. Kinesis wins on operational simplicity, native Firehose/Lambda integration, and on-demand auto-scaling; MSK wins on Kafka-ecosystem fit and fine-grained control.
- DynamoDB Streams / Kinesis-from-DynamoDB when the events you care about are already table mutations — capture change data without a separate producer.
- EventBridge instead of Kinesis when you have discrete, low-to-moderate-volume business events needing rich routing/filtering to many targets, rather than a high-throughput ordered firehose. EventBridge is routing-shaped; Kinesis is log-shaped. Choose by whether you think in events to route or a stream to process.
- OpenSearch Ingestion / Data Prepper as an alternative to the Lambda for the OpenSearch path when you want a managed, config-driven pipeline (with its own buffering and transforms) instead of owning function code — a reasonable swap that keeps the dual-consumer shape.
The decision rule in one line: if you have a high-volume, keyed event stream that two very different audiences must consume — one in seconds, one over months — without interfering, this Kinesis → (Lambda → OpenSearch | Firehose → S3) reference is the AWS-native answer, and the durable log underneath it is what lets you sleep.