Migrating to Graviton: arm64 Builds, Multi-Arch Pipelines, and Performance Benchmarking

Graviton is the cheapest performance win most AWS estates are leaving on the table. The pitch (“up to ~40% better price-performance over comparable x86 instances”) is real for a large class of workloads, but it is not a checkbox. arm64 is a different instruction set, and the migration risk lives in the long tail: a native Python wheel with no aarch64 build, an agent your security team mandates that only ships x86, a base image that silently pulls the wrong architecture and runs your service under QEMU emulation at a fraction of native throughput. This guide is the migration runbook I actually use: audit portability, build honest multi-arch images, stand up arm64 CI, roll out on EC2/EKS with mixed-architecture scheduling, and prove the win with benchmarks before you commit production traffic.

1. The Graviton value proposition, and where it wins

Graviton processors (the current generation is Graviton4, behind the R8g/M8g/C8g families; Graviton3 powers *7g and Graviton2 the *6g) are AWS-designed Arm Neoverse cores. Three things matter for a migration decision:

Price-performance. The headline ~40% number is workload-dependent. It holds well for throughput-bound, horizontally scalable services — web/API tiers, microservices, caches, queue consumers, and many JIT/managed-runtime workloads (JVM, Go, .NET, Node, modern Python).
Energy efficiency. Graviton uses meaningfully less energy per unit of work, which is why it underpins much of AWS’s own fleet and is a lever for sustainability reporting.
Where it does not automatically win. Single-thread-latency-bound code tuned for x86, anything with hand-written x86 intrinsics or AVX-512 paths, and workloads gated by a dependency that has no aarch64 build. Per-core clock is not where Arm competes; aggregate throughput per dollar is.

Rule of thumb: if a workload scales out cleanly and you already run more than one instance of it, it is a Graviton candidate. If it depends on a single fat box tuned for x86 single-thread, benchmark before you believe anything.

2. Assess portability before you touch infrastructure

The migration fails or succeeds in the dependency audit. Inventory three layers.

Native dependencies. Anything with compiled code needs an aarch64 build. Audit your lockfiles, not your top-level requirements.

# Python: find wheels that are x86-only (no aarch64/universal tag)
pip download -r requirements.txt -d /tmp/wheels --only-binary=:all: \
  --platform manylinux2014_aarch64 --python-version 312 --implementation cp \
  --abi cp312 2>&1 | tee /tmp/aarch64-audit.log
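# A "No matching distribution found" error means that package has no aarch64 wheel
# and needs a source build or a swap. pip aborts at the first such package, so fix it
# and re-run until the download completes cleanly.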

# Node: native addons surface as prebuilt binaries or node-gyp rebuilds
npm ls --all 2>/dev/null | grep -Ei 'sharp|bcrypt|grpc|canvas|node-sass|re2'
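
A complementary check, for a wheelhouse or artifact cache you already have (for example from an existing x86 build), is to classify each wheel by the platform tag in its filename. The sketch below assumes standard wheel naming (name-version[-build]-python-abi-platform.whl); the function names are illustrative, and it does not distinguish glibc (manylinux) from musl (musllinux) targets.

```python
from pathlib import Path


def wheel_platform_tags(filename: str) -> list[str]:
    """Return the platform tags encoded in a wheel filename.

    The platform field is the last "-"-separated part of the name, and a
    compressed field joins several tags with ".".
    """
    stem = Path(filename).name.removesuffix(".whl")
    return stem.split("-")[-1].split(".")


def runs_on_aarch64_linux(filename: str) -> bool:
    """True for pure-Python ("any") wheels and for Linux aarch64 wheels."""
    return any(
        tag == "any" or ("linux" in tag and tag.endswith("_aarch64"))
        for tag in wheel_platform_tags(filename)
    )


def non_aarch64_wheels(wheel_dir: str) -> list[str]:
    """List wheels in wheel_dir that carry no aarch64-compatible tag."""
    return sorted(
        p.name
        for p in Path(wheel_dir).glob("*.whl")
        if not runs_on_aarch64_linux(p.name)
    )
```

Calling non_aarch64_wheels("/tmp/wheels") on an x86 cache gives a shortlist of packages to look up on the index for an aarch64 build.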

Language runtimes and toolchains. The major managed runtimes are first-class on arm64: Go (GOARCH=arm64), Rust (aarch64-unknown-linux-gnu), Java (use a current OpenJDK; Corretto ships aarch64), .NET, Node, and Python. The traps are pinned old runtimes and base images that only publish linux/amd64.

ISV, agents, and sidecars. This is where production migrations stall. Confirm aarch64 support for everything that runs next to your app: the observability agent (Datadog, Dynatrace, New Relic, OpenTelemetry Collector all ship arm64), security/EDR agents (CrowdStrike, etc. — verify the exact version your policy mandates), service mesh sidecars (Envoy/App Mesh, Istio), and CI/build tooling. One mandated x86-only agent can veto an entire tier; find it now, not in week three.

Produce a simple portability matrix and gate on it:

Layer	Component	aarch64 status	Action
Runtime	Go 1.22	Native	none
Native dep	`grpcio` 1.x	Wheel available	pin >= version with aarch64 wheel
Native dep	legacy `cryptography` pin	No aarch64 wheel	unpin / source-build with Rust toolchain
Agent	EDR sensor	Vendor GA on arm64	validate mandated version
Sidecar	Envoy	Native	none
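
One way to turn the matrix into a gate is to encode each row and fail the pipeline while any row is still a blocker. This is a minimal sketch: the status vocabulary ("native", "available", "none", "unverified") and the row contents are assumptions made for illustration, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixRow:
    layer: str
    component: str
    status: str  # assumed vocabulary: native | available | none | unverified
    action: str


# Statuses that should stop a tier from migrating (an assumption of this sketch).
BLOCKING_STATUSES = {"none", "unverified"}


def blockers(rows: list[MatrixRow]) -> list[MatrixRow]:
    """Return the rows that still block the migration."""
    return [row for row in rows if row.status in BLOCKING_STATUSES]


def gate(rows: list[MatrixRow]) -> int:
    """Return a process exit code: 0 when the matrix is clean, 1 otherwise."""
    open_items = blockers(rows)
    for row in open_items:
        print(f"BLOCKED {row.layer}/{row.component}: {row.action}")
    return 1 if open_items else 0
```

Running gate() as a CI step keeps the matrix honest: a tier cannot move until every row is native or has a validated aarch64 build.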

3. Build multi-arch container images with buildx and ECR

Do not maintain two Dockerfiles. Build one image as a multi-arch manifest list so docker pull / Kubernetes resolves the right architecture automatically. The key correctness rule: pin the build stage with FROM --platform=$BUILDPLATFORM and pass the target through the automatic TARGETOS/TARGETARCH build args, so cross-builds are explicit and never fall back to accidental emulation.

# syntax=docker/dockerfile:1
FROM --platform=$BUILDPLATFORM golang:1.22 AS build
ARG TARGETOS TARGETARCH
WORKDIR /src
COPY . .
# Cross-compile from the builder's native arch to the target arch (fast, no QEMU)
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app ./cmd/app

FROM public.ecr.aws/docker/library/alpine:3.20
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]

Create a builder and push a manifest list covering both architectures in one command:

# One-time: a buildx builder backed by the docker-container driver
docker buildx create --name multiarch --driver docker-container --use
docker buildx inspect --bootstrap

aws ecr get-login-password --region ap-south-1 \
  | docker login --username AWS --password-stdin \
    111122223333.dkr.ecr.ap-south-1.amazonaws.com

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag 111122223333.dkr.ecr.ap-south-1.amazonaws.com/app:1.4.0 \
  --provenance=false \
  --push .

ECR stores this as a single tag pointing at an image index. Verify both platforms are present:

aws ecr batch-get-image --repository-name app --image-ids imageTag=1.4.0 \
  --region ap-south-1 \
  --query 'images[].imageManifest' --output text | jq -c '.manifests[].platform'
# Expect one line per platform: {"architecture":"amd64","os":"linux"} and {"architecture":"arm64","os":"linux"}
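
The same check can run as a pipeline assertion. The sketch below assumes the image index JSON has already been fetched (for example with the batch-get-image call) and follows the OCI image index layout, where each entry under manifests carries a platform object. The function names are illustrative. Entries with an unknown platform, such as attestation manifests, are skipped.

```python
import json


def index_platforms(index_json: str) -> set[str]:
    """Return the os/architecture pairs listed in an image index."""
    index = json.loads(index_json)
    found = set()
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        # Attestation entries report unknown/unknown; they are not runnable images.
        if os_name and arch and "unknown" not in (os_name, arch):
            found.add(f"{os_name}/{arch}")
    return found


def missing_platforms(index_json: str,
                      required=("linux/amd64", "linux/arm64")) -> list[str]:
    """Return the required platforms that the index does not provide."""
    return sorted(set(required) - index_platforms(index_json))
```

An empty list from missing_platforms() means the tag is safe to schedule on either architecture.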

For interpreted/native-heavy stacks where cross-compilation is painful, build each arch on a native runner instead of emulating — that is the next step.

4. arm64 CI: native runners and cross-compilation

Emulated arm64 builds under QEMU are correct but slow, and slow CI erodes adoption. Build arm64 artifacts on arm64 hardware.

CodeBuild offers native Arm compute. Select an ARM_CONTAINER environment with an aarch64 image:

# buildspec.yml -- runs natively on an ARM_CONTAINER compute fleet
version: 0.2
phases:
  pre_build:
    commands:
      - aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $REPO_HOST
  build:
    commands:
      - docker build --platform linux/arm64 -t $REPO_URI:$IMAGE_TAG-arm64 .
      - docker push $REPO_URI:$IMAGE_TAG-arm64

resource "aws_codebuild_project" "app_arm" {
  name         = "app-arm64"
  service_role = aws_iam_role.codebuild.arn

  artifacts { type = "CODEPIPELINE" } # must be CODEPIPELINE when the source is CODEPIPELINE
  source { type = "CODEPIPELINE" }    # for GITHUB / CODECOMMIT sources, use NO_ARTIFACTS above

  environment {
    type            = "ARM_CONTAINER"
    compute_type    = "BUILD_GENERAL1_LARGE"
    image           = "aws/codebuild/amazonlinux2-aarch64-standard:3.0"
    privileged_mode = true # required for docker build
  }
}

GitHub Actions now provides Linux arm64 hosted runners; you can build each architecture on native hardware and stitch the manifest from the digests. A clean pattern is a build matrix that pushes per-arch digests, then a merge job:

jobs:
  build:
    strategy:
      matrix:
        include:
          - platform: linux/amd64
            runner: ubuntu-24.04
          - platform: linux/arm64
            runner: ubuntu-24.04-arm     # native arm64 runner
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/gha-ecr-push
          aws-region: ap-south-1
      - uses: aws-actions/amazon-ecr-login@v2
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          platforms: ${{ matrix.platform }}
          # Push by digest only; the merge job assembles the manifest list
          outputs: type=image,name=111122223333.dkr.ecr.ap-south-1.amazonaws.com/app,push-by-digest=true,name-canonical=true,push=true

The merge job then runs docker buildx imagetools create -t <repo>:<tag> <digest-amd64> <digest-arm64> to publish the final manifest list. Either way, the artifact your registry serves is architecture-correct and built on real silicon.
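
The merge step is mostly string assembly, so it is worth validating the digests before calling the CLI. A minimal sketch, assuming the per-arch digests were collected from the build jobs; the helper name is illustrative and how the digests are handed between jobs is left open:

```python
import re

# A sha256 digest is "sha256:" followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def imagetools_create_argv(repo: str, tag: str, digests: list[str]) -> list[str]:
    """Build the argv for `docker buildx imagetools create`.

    Raises ValueError when fewer than two digests are given or a digest is
    malformed, so a half-finished matrix never publishes a single-arch tag.
    """
    if len(digests) < 2:
        raise ValueError("need one digest per architecture")
    for digest in digests:
        if not DIGEST_RE.match(digest):
            raise ValueError(f"malformed digest: {digest!r}")
    sources = [f"{repo}@{digest}" for digest in digests]
    return ["docker", "buildx", "imagetools", "create",
            "-t", f"{repo}:{tag}", *sources]
```

The returned list can be passed directly to subprocess.run(argv, check=True) in the merge job.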

5. Migrate managed services

Most managed services let you flip to Graviton by changing the instance/node class — the heavy lifting is benchmarking, not plumbing.

RDS / Aurora. Move to a Graviton DB instance class (e.g. db.r7g.*, db.r8g.* where available for your engine/version). On Aurora this is a modify-and-failover; the storage layer is untouched, so it is low-risk and reversible. Test on a clone first.
ElastiCache. Redis/Valkey and Memcached run on Graviton node types (cache.r7g.*, cache.m7g.*). For Redis you can change node type via a scaling operation; validate with your real key/value sizes.
OpenSearch. Data and master nodes support Graviton instance types (e.g. r7g.*.search); roll via a blue/green domain update.
Lambda. The cheapest, lowest-risk arm64 win there is. Set the architecture to arm64 and Lambda charges less per GB-second while many workloads also run faster.

resource "aws_lambda_function" "worker" {
  function_name = "worker"
  role          = aws_iam_role.lambda.arn
  package_type  = "Image"
  image_uri     = "111122223333.dkr.ecr.ap-south-1.amazonaws.com/worker:1.4.0"
  architectures = ["arm64"] # the entire migration for a packaged-correctly function
  memory_size   = 1024
  timeout       = 30
}

For zip-based Lambdas, the only requirement is that any bundled native dependency is an aarch64 build. Binaries compiled for x86, including those packaged in layers, fail as soon as the function tries to load or execute them, usually during initialization. Rebuild them on arm64.
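
A quick way to catch an x86 binary before deployment is to read the ELF header of each bundled shared object. The e_machine field sits at byte offset 18 and is 0x3E for x86-64 and 0xB7 for AArch64. The sketch below is a minimal reader under those assumptions, with illustrative function names. It is not a full ELF parser.

```python
import struct
from typing import Optional

ELF_MAGIC = b"\x7fELF"
MACHINES = {0x3E: "x86_64", 0xB7: "aarch64"}


def elf_machine(header: bytes) -> Optional[str]:
    """Return the target architecture of an ELF header, or None if not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    # e_ident[EI_DATA] is byte 5: 1 means little-endian, 2 means big-endian.
    fmt = "<H" if header[5] == 1 else ">H"
    (machine,) = struct.unpack_from(fmt, header, 18)  # e_machine field
    return MACHINES.get(machine, f"other(0x{machine:x})")


def file_machine(path: str) -> Optional[str]:
    """Read just enough of a file to classify it."""
    with open(path, "rb") as handle:
        return elf_machine(handle.read(20))
```

Walking the unzipped package and flagging any file where file_machine() returns "x86_64" gives a pre-deploy check for arm64 functions.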

6. Roll out on EC2 and EKS with mixed-architecture scheduling

On EC2, the change is the instance type plus an arm64 AMI (Amazon Linux 2023, Ubuntu, Bottlerocket all publish aarch64). The trap is pulling an x86 AMI for an arm64 instance type — the launch fails, but in an ASG that can look like a capacity stall.

On EKS, run mixed-architecture node groups during the transition and let the scheduler place pods on matching nodes. Two non-negotiables:

Your images must be multi-arch manifest lists (step 3), so a pod scheduled to either arch pulls the image variant built for that node's architecture.
Pods that are not yet arm64-clean must be pinned to x86 with nodeAffinity so they never land on a Graviton node.

apiVersion: apps/v1
kind: Deployment
metadata: { name: app }
spec:
  replicas: 6
  # A Deployment needs a selector that matches the pod template labels
  selector:
    matchLabels: { app: app }
  template:
    metadata:
      labels: { app: app }
    spec:
      affinity:
        nodeAffinity:
          # Prefer arm64 once the image is validated; flip to required to enforce
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: ["arm64"]
      containers:
        - name: app
          image: 111122223333.dkr.ecr.ap-south-1.amazonaws.com/app:1.4.0

For a workload still pinned to x86, invert it with a required affinity on kubernetes.io/arch: amd64. With Karpenter, express the same intent in the NodePool so it provisions Graviton capacity on demand:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata: { name: graviton }
spec:
  template:
    spec:
      # nodeClassRef is required in karpenter.sh/v1; "default" is a placeholder EC2NodeClass name
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["c7g.xlarge", "m7g.xlarge", "r7g.xlarge"]

The well-known label kubernetes.io/arch is set automatically by the kubelet on every node, so you can rely on it without custom labeling.
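
If manifests are generated rather than hand-written, the preferred and required variants can come from one helper so the two stay consistent. This is a sketch only: the function name and default weight are illustrative, and the returned structure mirrors the Kubernetes nodeAffinity schema used in this section.

```python
def arch_affinity(arch: str, required: bool = False, weight: int = 80) -> dict:
    """Build an affinity stanza that targets nodes by kubernetes.io/arch.

    required=False yields a soft preference (pods may still land elsewhere);
    required=True yields a hard constraint (pods stay Pending if no node matches).
    """
    term = {
        "matchExpressions": [
            {"key": "kubernetes.io/arch", "operator": "In", "values": [arch]}
        ]
    }
    if required:
        node_affinity = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [term]
            }
        }
    else:
        if not 1 <= weight <= 100:
            raise ValueError("weight must be between 1 and 100")
        node_affinity = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": weight, "preference": term}
            ]
        }
    return {"nodeAffinity": node_affinity}
```

For example, arch_affinity("amd64", required=True) produces the pin for a workload that is not yet arm64-clean, and arch_affinity("arm64") produces the soft preference used during the transition.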

7. Benchmarking methodology

Never migrate on faith. Run a controlled comparison and report price-performance, not raw speed.

Identical software, different arch. Same image (multi-arch), same config, same data set. The only variable is instance family — compare like-for-like sizes (m6i.xlarge vs m7g.xlarge).
Representative load. Replay production-shaped traffic, not a synthetic hello-world. Measure at a fixed, sustained request rate and report p50/p95/p99 latency and max sustained throughput before SLO breach.
Warm and steady. Discard warm-up; let JITs compile and caches fill. Run long enough to see GC/compaction behavior.
Compute the ratio that matters. Price-performance = (throughput per dollar). Take sustained RPS at your latency SLO, divide by the On-Demand hourly price of each instance, and compare.
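
If your load tool only exports raw latency samples, the percentiles can be computed directly. The sketch below uses the nearest-rank definition, which is one of several percentile conventions, so expect small differences from tools that interpolate. The function name is illustrative.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    the data at or below it."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Return the three figures the comparison reports."""
    return {name: percentile(samples_ms, p)
            for name, p in (("p50", 50), ("p95", 95), ("p99", 99))}
```

Compute the summary over the steady-state window only, after discarding warm-up samples, and use the same window length on both architectures.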

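# Fixed-rate, fixed-duration load (k6). The constant-arrival-rate model is a scenario
# executor configured inside the script, not a CLI flag; --vus/--duration would run a
# closed, fixed-VU model instead. This assumes load.js defines a constant-arrival-rate
# scenario with the target rate and a 10m duration.
k6 run -e TARGET=https://app.internal/api/checkout load.js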

# Pull p95/p99 and RPS from your metrics, then:
# price-perf = sustained_rps_at_SLO / on_demand_price_per_hour
# Compare the m7g (Graviton) ratio against the m6i (x86) ratio.

A correct result looks like: “m7g.xlarge sustained 9,400 RPS at p99 < 120 ms vs 7,800 RPS on m6i.xlarge, at ~20% lower hourly price: (9,400 / 7,800) / 0.80 ≈ 1.51, or ~50% better price-performance.” If Graviton loses, you have found a workload that needs profiling (often a hot path with no Arm-optimized library), not a reason to abandon the program.
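
The ratio itself is simple arithmetic, sketched below so the comparison is computed the same way for every tier. All figures in the usage example are made-up placeholders, not measurements or published prices. Substitute your own sustained RPS at SLO and the On-Demand price for your region.

```python
def price_performance(sustained_rps: float, price_per_hour: float) -> float:
    """Sustained requests per second at the latency SLO, per dollar-hour."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return sustained_rps / price_per_hour


def relative_gain(candidate: float, baseline: float) -> float:
    """Fractional improvement of candidate over baseline (0.25 means +25%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline - 1.0


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    x86 = price_performance(sustained_rps=6000, price_per_hour=0.20)
    arm = price_performance(sustained_rps=6600, price_per_hour=0.17)
    print(f"price-performance gain: {relative_gain(arm, x86):.1%}")
```

Note that a 10% throughput gain and a 15% price cut compound to roughly 29%, not 25%, because the price reduction divides rather than adds.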

8. Phased cutover, canary, and rollback

Migrate one tier at a time, in increasing order of blast radius: batch/async consumers and dev environments first, then stateless API tiers, then anything stateful.

For each tier, run a canary on Graviton behind the same load balancer / service and watch SLOs:

Route a small slice (e.g. 5-10%) to arm64 nodes (an ELB target group of Graviton instances, or a weighted Kubernetes rollout).
Compare error rate, p99 latency, and saturation against the x86 baseline for a full traffic cycle (cover peak).
Ramp 10 -> 25 -> 50 -> 100% only while the canary stays within SLO.
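
The ramp rule can be written down as a small decision function so that promotion is mechanical rather than a judgment call at 2 a.m. This is a sketch under stated assumptions: the thresholds (10% p99 regression, 0.1 percentage point error-rate delta) and all names are illustrative defaults, not values from this runbook.

```python
from dataclasses import dataclass

RAMP_STEPS = (10, 25, 50, 100)  # percent of traffic on arm64


@dataclass(frozen=True)
class TierStats:
    error_rate: float  # fraction of failed requests, e.g. 0.001
    p99_ms: float


def within_slo(canary: TierStats, baseline: TierStats,
               max_p99_regression: float = 0.10,
               max_error_rate_delta: float = 0.001) -> bool:
    """Compare the canary against the x86 baseline over the same window."""
    p99_ok = canary.p99_ms <= baseline.p99_ms * (1 + max_p99_regression)
    errors_ok = canary.error_rate <= baseline.error_rate + max_error_rate_delta
    return p99_ok and errors_ok


def next_weight(current: int, canary: TierStats, baseline: TierStats) -> int:
    """Return the next arm64 traffic percentage; 0 means roll back."""
    if not within_slo(canary, baseline):
        return 0
    for step in RAMP_STEPS:
        if step > current:
            return step
    return 100
```

Evaluate next_weight() only after a full traffic cycle at the current step, so each promotion has seen peak load.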

Rollback is trivial when you keep the x86 path alive. Because the image is multi-arch and the x86 node group still exists, rollback is a scheduling change: flip nodeAffinity back to amd64 (or shift the target-group weights), and pods reschedule onto x86 with no rebuild and no image change. Keep both node groups until a tier has soaked at 100% Graviton for at least one full business cycle.

Enterprise scenario

A fintech platform team ran a Java (Spring Boot) payments API on ~200 m6i.xlarge instances across three EKS clusters and wanted Graviton’s savings to hit a board-level cost target. The constraint was non-negotiable: a mandated EDR agent ran as a DaemonSet on every node, and the security team would not approve the migration until that exact sensor version was certified on arm64. They also discovered one internal library still pulled an x86-only native .so for a legacy HSM client.

They sequenced it deliberately. First, the portability audit caught both blockers in week one: they pinned the EDR DaemonSet to the certified arm64 build (and confirmed it on a single canary node group before fleet-wide), and rebuilt the HSM client library with an aarch64 toolchain, publishing the service as a multi-arch manifest list. They stood up a Graviton Karpenter NodePool alongside the existing x86 one and started with a 5% weighted canary, using preferred nodeAffinity so that a pod which could not be placed on arm64 fell back to x86 capacity instead of sitting Pending:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 90
        preference:
          matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values: ["arm64"]

The canary held p99 within 4% of the x86 baseline across a full peak cycle, so they ramped to 100% over two weeks, draining the x86 node group last. Benchmarking showed ~43% better price-performance on the API tier; combined with a parallel flip of their async workers to arm64 Lambda and the Aurora reader fleet to db.r7g, the program cut the platform’s monthly compute bill by roughly a third. The decisive move was treating the EDR agent as a first-class migration dependency instead of an afterthought — it was the single thing that would have blocked the whole effort in production.

Verify

Confirm each layer is genuinely on arm64 and serving correctly before you trust the savings:

# 1) The running container is actually arm64 (not x86 under emulation)
kubectl exec deploy/app -- uname -m            # expect: aarch64
kubectl get nodes -L kubernetes.io/arch        # confirm node arch labels

# 2) The image is a real multi-arch manifest list, both platforms present
docker buildx imagetools inspect \
  111122223333.dkr.ecr.ap-south-1.amazonaws.com/app:1.4.0
# Expect Platform: linux/amd64 AND linux/arm64 in the output

# 3) No pod is running on the wrong arch: map pods to nodes, then confirm the node's architecture
kubectl get pods -o wide
kubectl get node <arm-node> -o jsonpath='{.status.nodeInfo.architecture}'   # expect: arm64

# 4) Lambda functions report arm64
aws lambda get-function-configuration --function-name worker \
  --query 'Architectures' --output text   # expect: arm64

# 5) Managed-service instance classes are Graviton
aws rds describe-db-instances \
  --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass]' --output table

The single most important check is uname -m returning aarch64 and a healthy throughput number under load — that pair proves you are running native Arm, not an emulated image quietly burning your price-performance gain.

Written by Vinod
