S3 Access Points, Object Lambda, and Multi-Region Access Points for Shared Data at Scale

A shared data lake bucket starts clean and ends as a 20 KB bucket policy that nobody dares to edit. Forty teams, each needing a different prefix, a different VPC restriction, a different account — all crammed into one JSON document with a hard 20 KB ceiling and a single point of failure. Access points exist precisely to break that document apart: each consumer gets its own named endpoint to the bucket, with its own policy, and the bucket policy shrinks to one line that says “trust access points.” On top of that, Object Lambda lets you transform objects on the read path without copying data, and Multi-Region Access Points give you one global endpoint over replicated buckets. This guide wires all three together the way a platform team actually does it.

The mental model: an access point is a named door, not a copy

An S3 access point is a named network endpoint attached to a single bucket, with its own resource policy, its own Block Public Access settings, and optionally a VPC restriction. It is not a copy of the data and it is not a new storage location; it is an alternate front door into the same objects, with its own lock.

Three facts drive every design decision below:

Internalize this: the bucket policy becomes a delegation document (“allow access via my access points”), and the access point policies become the authorization documents. You move from one unauditable monolith to many small, single-responsibility policies you can actually reason about.

Access points use a distinct ARN shape and a distinct hostname, so application code addresses the access point, not the bucket:

arn:aws:s3:us-east-1:111122223333:accesspoint/finance-reports-ap
https://finance-reports-ap-111122223333.s3-accesspoint.us-east-1.amazonaws.com
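As a small sketch of that addressing scheme, the helpers below derive both strings from the access point name, account ID, and Region. The function names are illustrative and not part of any AWS SDK; the patterns are simply the two shown above.

```python
def access_point_arn(name: str, account_id: str, region: str) -> str:
    """Build the ARN for a regional access point."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def access_point_hostname(name: str, account_id: str, region: str) -> str:
    """Build the hostname clients use to reach a regional access point."""
    return f"{name}-{account_id}.s3-accesspoint.{region}.amazonaws.com"


print(access_point_arn("finance-reports-ap", "111122223333", "us-east-1"))
print(access_point_hostname("finance-reports-ap", "111122223333", "us-east-1"))
```

Most SDK calls accept the ARN in place of the bucket name, so application code rarely needs to build the hostname by hand.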

1. Why bucket policies break down at scale

A single bucket policy has a 20 KB size limit. That sounds generous until you have dozens of consumers, each needing a Condition block for their VPC, their aws:PrincipalOrgID, their prefix, and their allowed actions. You also hit operational problems that have nothing to do with size:

Access points solve all four: separate policies (separate size budgets), per-access-point blast radius, independent change ownership, and per-access-point VPC binding. The bucket policy collapses to a delegation statement:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DelegateToAccessPoints",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::datalake-shared-prod",
        "arn:aws:s3:::datalake-shared-prod/*"
      ],
      "Condition": {
        "StringEquals": { "s3:DataAccessPointAccount": "111122223333" }
      }
    }
  ]
}

That s3:DataAccessPointAccount condition is the key: it says “permit any request that arrived via an access point owned by this account.” The wildcard principal is deliberate: the condition, not the principal, is what restricts access, and naming only the owning account here would leave cross-account consumers without the bucket-side permission they need. The bucket stops making fine-grained decisions and lets the access points do it. (You can also use s3:DataAccessPointArn to scope to specific access points.)
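To make the size argument concrete, here is a minimal sketch that builds the delegation statement and measures it against the 20 KB ceiling. The function names are hypothetical, the size is an approximation based on compact JSON, and the principal parameter defaults to the wildcard used in AWS's documented delegation pattern; pass an account root ARN if you want to limit the delegation to same-account callers.

```python
import json

# The 20 KB bucket policy ceiling discussed above.
BUCKET_POLICY_LIMIT_BYTES = 20 * 1024


def delegation_policy(bucket: str, account_id: str, principal: str = "*") -> dict:
    """Bucket policy that defers authorization to the account's access points."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DelegateToAccessPoints",
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {
                    "StringEquals": {"s3:DataAccessPointAccount": account_id}
                },
            }
        ],
    }


def policy_size_bytes(policy: dict) -> int:
    """Approximate policy size: compact JSON, UTF-8 encoded."""
    return len(json.dumps(policy, separators=(",", ":")).encode("utf-8"))


policy = delegation_policy("datalake-shared-prod", "111122223333")
print(policy_size_bytes(policy), "of", BUCKET_POLICY_LIMIT_BYTES, "bytes used")
```

The point of the measurement is that the figure no longer grows with the number of consumers; each new consumer adds an access point policy instead.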

2. Creating access points with scoped policies

Create an access point per application. Created without a VPC configuration, this first one is internet-routable, but every request is still gated by the access point policy and Block Public Access (BPA); the policy is what scopes it down:

aws s3control create-access-point \
  --account-id 111122223333 \
  --name finance-reports-ap \
  --bucket datalake-shared-prod

Now attach a policy that confines this access point to one prefix and one set of actions. Note the access point ARN in Resource and the /object/ segment used to reference objects through the access point:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FinanceReadWritePrefix",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/finance-etl"
      },
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:us-east-1:111122223333:accesspoint/finance-reports-ap/object/finance/*"
    }
  ]
}

Apply it:

aws s3control put-access-point-policy \
  --account-id 111122223333 \
  --name finance-reports-ap \
  --policy file://finance-ap-policy.json

VPC-only access points

For an access point that must never be reachable from the internet, bind it to a VPC at creation. This is the single strongest network control S3 offers for a shared bucket: S3 rejects any request to the access point that does not originate from the named VPC.

aws s3control create-access-point \
  --account-id 111122223333 \
  --name analytics-internal-ap \
  --bucket datalake-shared-prod \
  --vpc-configuration VpcId=vpc-0abc123def456 \
  --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

A VPC-bound access point is reachable only through an S3 interface endpoint (or gateway endpoint) in that VPC. Combine it with an endpoint policy and you have a closed loop: traffic stays on the AWS network, and the access point rejects anything from outside vpc-0abc123def456.

Delegated cross-account access

Access points shine for cross-account sharing because the access point policy can grant to a principal in another account, and that consumer addresses the access point ARN directly — they never see your bucket name. The owning account still controls everything via the access point policy plus the delegating bucket policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PartnerReadOnly",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::444455556666:role/partner-ingest" },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:us-east-1:111122223333:accesspoint/partner-share-ap/object/exports/*",
      "Condition": {
        "StringEquals": { "aws:PrincipalOrgID": "o-exampleorgid" }
      }
    }
  ]
}

The cross-account principal also needs a matching Allow in its own IAM policy — cross-account always requires both sides. But critically, the bucket policy stays untouched; all the partner-specific logic lives in one small access point policy you can revoke by deleting the access point.
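The consumer-side half of that handshake can be sketched as an identity policy generated in the partner account. This is an illustrative helper, not an AWS API: the function name and Sid are assumptions, and the Resource uses the same /object/ form as the access point policy above.

```python
from typing import List


def consumer_identity_policy(ap_arn: str, key_prefix: str, actions: List[str]) -> dict:
    """Identity policy for the consuming role, scoped to one access point prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UseSharedAccessPoint",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"{ap_arn}/object/{key_prefix}*",
            }
        ],
    }


partner_policy = consumer_identity_policy(
    "arn:aws:s3:us-east-1:111122223333:accesspoint/partner-share-ap",
    "exports/",
    ["s3:GetObject"],
)
```

Attach the result to the partner-ingest role in the consumer account; the request succeeds only when this policy, the access point policy, and the delegating bucket policy all allow it.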

3. Block Public Access inheritance and naming patterns

Every access point carries its own BPA configuration, and S3 enforces the most restrictive combination of the access point, bucket, and account settings. You cannot use an access point to loosen public-access controls that the account-level BPA has locked down: if the account blocks public policies, no access point can re-expose the bucket. Set all four flags on every access point unless you have an explicit, audited reason not to:

# BPA is set at create time and cannot be changed afterwards; inspect it:
aws s3control get-access-point \
  --account-id 111122223333 \
  --name finance-reports-ap \
  --query 'PublicAccessBlockConfiguration'
# Confirm S3 does not classify the access point policy as public:
aws s3control get-access-point-policy-status \
  --account-id 111122223333 \
  --name finance-reports-ap

For shared datasets, adopt a naming convention that encodes ownership and intent, because the hostname is derived from the name. A consistent scheme — {team}-{dataset}-{rw|ro}-ap — keeps endpoints self-documenting and makes IAM Resource wildcards predictable:

Access point name        Purpose                   Network
finance-reports-rw-ap    Finance ETL read/write    Internet + policy
analytics-events-ro-ap   Analytics read-only       VPC-only
partner-share-ro-ap      Cross-account export      Internet + OrgID

The ARN and hostname both embed the account ID, which is why two accounts can have an access point named reports-ap over different buckets without collision.
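A convention only helps if it is enforced, so here is a minimal sketch of a validator for the {team}-{dataset}-{rw|ro}-ap scheme that could run in CI or a provisioning pipeline. The parser and its type are hypothetical, and the 3 to 50 character bound is stated as an assumption to verify against current S3 naming rules.

```python
import re
from typing import NamedTuple, Optional


class AccessPointName(NamedTuple):
    team: str
    dataset: str
    mode: str  # "rw" or "ro"


# team, then a dataset of one or more hyphenated words, then rw|ro, then "ap".
_CONVENTION = re.compile(
    r"^(?P<team>[a-z0-9]+)-(?P<dataset>[a-z0-9]+(?:-[a-z0-9]+)*)-(?P<mode>rw|ro)-ap$"
)


def parse_access_point_name(name: str) -> Optional[AccessPointName]:
    """Return the parsed parts, or None if the name breaks the convention."""
    # Assumption: access point names must be 3 to 50 characters long.
    if not 3 <= len(name) <= 50:
        return None
    match = _CONVENTION.match(name)
    if match is None:
        return None
    return AccessPointName(match["team"], match["dataset"], match["mode"])
```

Under this scheme a name such as finance-reports-ap is rejected because it carries no rw or ro marker, which is exactly the ambiguity the convention is meant to remove.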

4. S3 Object Lambda: transform on the read path

Object Lambda inserts your own Lambda function into the GET path. When a client reads an object through an Object Lambda Access Point, S3 invokes your function, hands it the original object stream, and your function returns the transformed bytes to the caller — without writing a derived copy back to S3. This is the right tool for redaction, PII masking, row-level filtering, watermarking, and format conversion, because you keep exactly one authoritative copy and transform per-request based on who is asking.

The topology has three layers:

  1. The bucket holds the authoritative object.
  2. A supporting access point (a normal access point) sits on the bucket.
  3. The Object Lambda Access Point points at that supporting access point and names the Lambda transform.

The Lambda receives an event with a pre-signed inputS3Url. It fetches the original, transforms it, and calls WriteGetObjectResponse to stream the result back. Here is a correct PII-masking transform in Python:

import boto3
import re
import urllib3

s3 = boto3.client("s3")
http = urllib3.PoolManager()

# Mask anything that looks like a US SSN.
SSN = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")

def handler(event, context):
    ctx = event["getObjectContext"]
    # S3 hands us a pre-signed URL to the ORIGINAL object.
    resp = http.request("GET", ctx["inputS3Url"])
    original = resp.data

    transformed = SSN.sub(b"***-**-****", original)

    # Stream the transformed bytes back to the caller.
    s3.write_get_object_response(
        Body=transformed,
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}

The function’s execution role needs s3-object-lambda:WriteGetObjectResponse. Because the inputS3Url is pre-signed by S3 itself, the function does not need separate s3:GetObject on the bucket for the standard fetch path — but grant it if your code makes additional S3 calls (e.g., reading a redaction config object).

5. Wiring Object Lambda, supporting access points, and Range handling

First create the supporting access point (a plain access point), then the Object Lambda Access Point that references it. The Object Lambda Access Point’s SupportingAccessPoint must be the full ARN of the supporting access point:

# 1. Supporting access point on the bucket.
aws s3control create-access-point \
  --account-id 111122223333 \
  --name pii-supporting-ap \
  --bucket datalake-shared-prod

# 2. Object Lambda Access Point referencing it.
aws s3control create-access-point-for-object-lambda \
  --account-id 111122223333 \
  --name pii-redacted-olap \
  --configuration '{
    "SupportingAccessPoint": "arn:aws:s3:us-east-1:111122223333:accesspoint/pii-supporting-ap",
    "TransformationConfigurations": [
      {
        "Actions": ["GetObject"],
        "ContentTransformation": {
          "AwsLambda": {
            "FunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:pii-redactor"
          }
        }
      }
    ]
  }'
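The same configuration document can be generated rather than hand-written, which avoids quoting mistakes in the shell. The sketch below is a hypothetical helper that mirrors the JSON above; the optional AllowedFeatures list is how an Object Lambda Access Point is told to forward Range and partNumber requests to the function, and it is left off unless you ask for it.

```python
import json


def olap_configuration(supporting_ap_arn: str, function_arn: str,
                       allow_range: bool = False) -> dict:
    """Build the --configuration document for an Object Lambda Access Point."""
    config = {
        "SupportingAccessPoint": supporting_ap_arn,
        "TransformationConfigurations": [
            {
                "Actions": ["GetObject"],
                "ContentTransformation": {
                    "AwsLambda": {"FunctionArn": function_arn}
                },
            }
        ],
    }
    if allow_range:
        # Only enable this when the transform can answer partial reads correctly.
        config["AllowedFeatures"] = ["GetObject-Range", "GetObject-PartNumber"]
    return config


config = olap_configuration(
    "arn:aws:s3:us-east-1:111122223333:accesspoint/pii-supporting-ap",
    "arn:aws:lambda:us-east-1:111122223333:function:pii-redactor",
)
print(json.dumps(config, indent=2))
```

Write the printed JSON to a file and pass it with --configuration file://olap-config.json to keep the CLI call short.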

Clients then read through the Object Lambda Access Point ARN, and S3 invokes the transform transparently:

aws s3api get-object \
  --bucket arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/pii-redacted-olap \
  --key customers/2026/records.csv \
  ./redacted.csv

Handling Range and partial reads

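This is where naive Object Lambda functions break in production. Clients routinely send a Range header or a partNumber query parameter (the AWS SDKs do this constantly for large objects and multipart downloads). By default an Object Lambda Access Point answers such requests with 501 Not Implemented; once you allow them through the access point’s AllowedFeatures setting, your function must handle them. You have two correct options: honor the range against the transformed output, or reject the request explicitly.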

A length-preserving transform like fixed-width masking is range-safe; a format conversion (CSV to Parquet, or gzip) is not, because byte offsets in the output no longer map to the input. Know which one you have before you enable range support. A safe rejection looks like this:

    headers = event.get("userRequest", {}).get("headers", {})
    # Header names arrive as the client sent them, so compare case-insensitively.
    if any(name.lower() == "range" for name in headers):
        s3.write_get_object_response(
            StatusCode=501,
            ErrorCode="NotImplemented",
            ErrorMessage="Range requests are not supported by this transform",
            RequestRoute=ctx["outputRoute"],
            RequestToken=ctx["outputToken"],
        )
        return {"status_code": 501}
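For the other case, a length-preserving transform that chooses to honor ranges, one simple approach is to transform the whole object and then slice the requested bytes out of the result. The sketch below is an assumption-laden illustration, not the AWS reference implementation: the helper names are hypothetical, it handles only a single bytes= range, and it holds the transformed object in memory.

```python
import re
from typing import Optional, Tuple

_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_range(header: str, total: int) -> Optional[Tuple[int, int]]:
    """Map a single-range header to inclusive (start, end) offsets, or None."""
    match = _RANGE.match(header.strip())
    if match is None or total <= 0:
        return None
    first, last = match.group(1), match.group(2)
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the object.
        length = int(last)
        if length == 0:
            return None
        return max(total - length, 0), total - 1
    start = int(first)
    if start >= total:
        return None
    # An open-ended or oversized end is clamped to the last byte.
    end = min(int(last), total - 1) if last else total - 1
    if end < start:
        return None
    return start, end


def slice_transformed(transformed: bytes, header: str) -> Optional[bytes]:
    """Apply the range to already-transformed bytes; None means unsatisfiable."""
    span = resolve_range(header, len(transformed))
    if span is None:
        return None
    start, end = span
    return transformed[start:end + 1]
```

When slice_transformed returns None, the handler should respond with an error instead of a body; when it returns bytes, pass them as Body with StatusCode=206. This is only valid because the transform leaves every byte offset where it was.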

6. Multi-Region Access Points: one global endpoint

A Multi-Region Access Point (MRAP) is a single global endpoint that routes requests to whichever underlying bucket — across multiple Regions — is closest and healthy. You attach buckets in different Regions, wire S3 Cross-Region Replication (CRR) between them for active-active, and clients use one hostname that ends in .accesspoint.s3-global.amazonaws.com. S3 routes each request to the lowest-latency available bucket using latency-based routing built on AWS Global Accelerator under the hood.

Create the MRAP over two regional buckets:

aws s3control create-multi-region-access-point \
  --account-id 111122223333 \
  --details '{
    "Name": "global-assets-mrap",
    "Regions": [
      { "Bucket": "assets-use1" },
      { "Bucket": "assets-euw1" }
    ]
  }'

This is asynchronous; poll the request token until it reports SUCCEEDED, then read the generated alias (the global hostname prefix):

aws s3control list-multi-region-access-points \
  --account-id 111122223333 \
  --query 'AccessPoints[?Name==`global-assets-mrap`].[Name,Alias,Status]'

For active-active you must configure two-way replication so a write in either Region propagates to the other. Enable replication on both buckets, turn on bidirectional sync (replica modifications and delete-marker replication as your data model requires), and ideally enable S3 Replication Time Control (RTC) for a 15-minute replication SLA. Without CRR, an MRAP is just latency routing over divergent data — which is a correctness bug waiting to happen.
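The two-way setup described above amounts to one replication configuration per bucket, each pointing at the other. Here is a minimal sketch that generates both documents in the shape PutBucketReplication expects. The role ARN, rule IDs, and helper names are hypothetical, and the sketch assumes versioning is already enabled on both buckets.

```python
def replication_configuration(role_arn: str, dest_bucket: str, rule_id: str) -> dict:
    """One-direction replication config with RTC and replica-modification sync."""
    return {
        "Role": role_arn,
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {},  # empty filter: replicate the whole bucket
                "DeleteMarkerReplication": {"Status": "Enabled"},
                "SourceSelectionCriteria": {
                    "ReplicaModifications": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    # Replication Time Control: the 15-minute target.
                    "ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
                    "Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}},
                },
            }
        ],
    }


def two_way_replication(role_arn: str, bucket_a: str, bucket_b: str) -> dict:
    """Map each source bucket to the config that replicates it to the other."""
    return {
        bucket_a: replication_configuration(role_arn, bucket_b, f"{bucket_a}-to-{bucket_b}"),
        bucket_b: replication_configuration(role_arn, bucket_a, f"{bucket_b}-to-{bucket_a}"),
    }


configs = two_way_replication(
    "arn:aws:iam::111122223333:role/s3-replication",  # hypothetical role
    "assets-use1",
    "assets-euw1",
)
```

Each value can be serialized and applied to its source bucket with put-bucket-replication. Drop the DeleteMarkerReplication block if deletes in one Region should not propagate to the other.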

Failover is automatic for read availability: if S3 detects a Regional impairment, it routes around it. But “the object exists in the other Region” is your responsibility via replication. MRAP routes; it does not copy. Replication copies.

7. Request routing, failover, and SigV4A signing

MRAP supports two routing controls. By default it uses latency-based routing across all active Regions. You can also flip a Region’s routing status to passive to drain it (for maintenance or a controlled failover) using the routing-control API:

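# --mrap takes the MRAP ARN (alias form), not the friendly name, and routing-control
# calls must be sent to a Region that hosts the routing endpoints (us-east-1 is one).
aws s3control submit-multi-region-access-point-routes \
  --region us-east-1 \
  --account-id 111122223333 \
  --mrap arn:aws:s3::111122223333:accesspoint/mfzwi23gnjvgw.mrap \
  --route-updates '[
    { "Bucket": "assets-euw1", "Region": "eu-west-1", "TrafficDialPercentage": 0 },
    { "Bucket": "assets-use1", "Region": "us-east-1", "TrafficDialPercentage": 100 }
  ]'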

Setting TrafficDialPercentage to 0 drains a Region without deleting anything — the canonical way to do a planned, reversible failover.
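A runbook usually wants that route list generated from a single "drain this Region" input rather than edited by hand. The following sketch builds the --route-updates payload; the function name is hypothetical, and it assumes the dial accepts only 0 (passive) or 100 (active).

```python
import json
from typing import Dict, List


def drain_routes(buckets_by_region: Dict[str, str], drain_region: str) -> List[dict]:
    """Route updates that make one Region passive and leave the rest active."""
    if drain_region not in buckets_by_region:
        raise ValueError(f"{drain_region} is not part of this MRAP")
    if len(buckets_by_region) < 2:
        raise ValueError("draining the only Region would leave no active route")
    return [
        {
            "Bucket": bucket,
            "Region": region,
            "TrafficDialPercentage": 0 if region == drain_region else 100,
        }
        for region, bucket in sorted(buckets_by_region.items())
    ]


routes = drain_routes(
    {"us-east-1": "assets-use1", "eu-west-1": "assets-euw1"},
    drain_region="eu-west-1",
)
print(json.dumps(routes))
```

Reversing the failover is the same call with a different drain_region, or with every dial set back to 100.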

SigV4A is mandatory

This is the detail that trips up every first MRAP integration. Because a single global request can be served from any Region, it cannot be signed with classic SigV4 (which is Region-scoped). MRAP requests must use Signature Version 4A (SigV4A), the multi-Region signing variant. The recent AWS SDKs and CLI v2 support SigV4A, but you typically must enable the CRT (Common Runtime) auth dependency. For the CLI:

# SigV4A for MRAP requires the CRT signing component.
pip install 'awscli[crt]'   # or use a CLI v2 build with CRT bundled

# Address the MRAP by its ARN; the SDK selects SigV4A automatically.
aws s3api get-object \
  --bucket arn:aws:s3::111122223333:accesspoint/mfzwi23gnjvgw.mrap \
  --key images/logo.png \
  ./logo.png

Note the MRAP ARN form: arn:aws:s3::<account>:accesspoint/<alias>.mrap — no Region segment, because it is global. If you see SignatureDoesNotMatch on your first MRAP call, the cause is almost always a SigV4A-incapable signer; install the CRT extra and retry.
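Because the ARN shape alone tells you which kind of endpoint you are addressing, a client wrapper can check up front whether a SigV4A-capable signer is needed. This is a rough sketch with hypothetical function names; it classifies only the three ARN forms used in this guide.

```python
def classify_s3_arn(arn: str) -> str:
    """Classify an ARN as one of the access point kinds, or 'other'."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    service, region, resource = parts[2], parts[3], parts[5]
    if not resource.startswith("accesspoint/"):
        return "other"
    if service == "s3-object-lambda":
        return "object-lambda-access-point"
    if service == "s3" and region == "" and resource.endswith(".mrap"):
        # No Region segment plus the .mrap suffix marks a global endpoint.
        return "multi-region-access-point"
    if service == "s3" and region:
        return "access-point"
    return "other"


def needs_sigv4a(arn: str) -> bool:
    """Only Multi-Region Access Point requests require SigV4A signing."""
    return classify_s3_arn(arn) == "multi-region-access-point"
```

A wrapper could call needs_sigv4a at startup and fail fast with a clear message when the CRT dependency is missing, rather than surfacing SignatureDoesNotMatch on the first request.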

Verify

Confirm each layer is doing exactly what you intended before you hand the topology to consumers.

# 1. Access point exists and carries the VPC + BPA you expect.
aws s3control get-access-point \
  --account-id 111122223333 --name analytics-internal-ap \
  --query '{vpc:VpcConfiguration, bpa:PublicAccessBlockConfiguration}'

# 2. The access point policy is the small, scoped one (not the monolith).
aws s3control get-access-point-policy \
  --account-id 111122223333 --name finance-reports-ap

# 3. Object Lambda actually transforms: original vs. transformed bytes differ.
aws s3api get-object --bucket datalake-shared-prod --key customers/2026/records.csv ./raw.csv
aws s3api get-object \
  --bucket arn:aws:s3-object-lambda:us-east-1:111122223333:accesspoint/pii-redacted-olap \
  --key customers/2026/records.csv ./masked.csv
diff <(head -c 200 ./raw.csv) <(head -c 200 ./masked.csv) && echo "NO MASKING" || echo "MASKED OK"

# 4. MRAP is READY and reports both Regions.
aws s3control get-multi-region-access-point \
  --account-id 111122223333 --name global-assets-mrap \
  --query 'AccessPoint.{status:Status, regions:Regions[].Region}'

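# 5. Replication is keeping both buckets in sync (expect latency of seconds, not minutes).
#    ReplicationLatency is published per rule, so all three dimensions are required;
#    replace REPLICATION_RULE_ID with the ID of your own replication rule.
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 --metric-name ReplicationLatency \
  --dimensions Name=SourceBucket,Value=assets-use1 \
               Name=DestinationBucket,Value=assets-euw1 \
               Name=RuleId,Value=REPLICATION_RULE_ID \
  --start-time "$(date -u -d '-1 hour' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)" \
  --period 300 --statistics Maximum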

Step 3 is the one that catches misconfigured Object Lambda topologies: if the masked output equals the raw output, your Object Lambda Access Point is pointing at the wrong supporting access point or the transform silently failed. Step 5 is the one that catches a “global” MRAP that is actually serving divergent data because replication fell behind.

Cost, request-rate scaling, and observability

A few economics and limits worth internalizing before you build a sprawling topology:

For observability, enable request metrics with an access-point filter so each consumer’s traffic is independently visible, and turn on S3 server access logging or CloudTrail data events — both record the access point ARN, so you can attribute every request to the door it came through. A useful CloudWatch metric-math approach is to alarm per access point on 4xx rate, which surfaces a single broken consumer without noise from the others.

Enterprise scenario

A media analytics platform team ran a single customer-events-prod bucket shared by 30+ internal teams plus two external data partners. The bucket policy had grown past 18 KB and was within sight of the 20 KB hard limit; the last partner onboarding had failed because the policy would not save. Worse, the same dataset had to be served two ways: internal analysts got raw event records, but a downstream BI partner was contractually forbidden from seeing raw email addresses and device IDs. The team had been solving this by running a nightly Glue job that wrote a second, redacted copy of every object to a partner/ prefix — doubling storage for 400 TB of events and introducing a 24-hour staleness gap that the partner kept complaining about.

The constraint: stay within the policy size limit, eliminate the duplicate redacted copy, and serve both audiences from one authoritative object — while keeping the partner traffic off the public internet and the raw data inside a specific VPC.

They restructured around access points and Object Lambda. The bucket policy was rewritten to a single 400-byte delegation statement using s3:DataAccessPointAccount. Internal teams each got a VPC-bound access point scoped to their prefix. The partner got an Object Lambda Access Point whose transform masked email and device fields on the fly, eliminating the nightly Glue job and the 400 TB duplicate entirely — and the partner now saw live data with zero staleness. The masking function was deliberately length-preserving so range requests stayed safe:

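import re

EMAIL = re.compile(rb'("email":")([^"]*)(")')
DEVICE = re.compile(rb'("device_id":")([^"]*)(")')

def _stars(m: re.Match) -> bytes:
    # Swap the value for the same number of '*' bytes so every offset is unchanged.
    return m.group(1) + b"*" * len(m.group(2)) + m.group(3)

def mask(chunk: bytes) -> bytes:
    chunk = EMAIL.sub(_stars, chunk)
    return DEVICE.sub(_stars, chunk)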

The result: the 18 KB policy became one line, partner onboarding stopped being a policy-size gamble, S3 storage dropped by roughly half (the eliminated redacted copies had doubled it), and the staleness complaint disappeared because redaction now happened at read time on the single live object. The one real cost they accepted was Lambda invocation on the partner read path, which, for a partner pulling a few thousand objects a day, was a rounding error next to 400 TB of duplicated storage.

Checklist
