
Object Storage 101: Buckets, Tiers, and Lifecycle Across Clouds

A mid-size online education company — call it the kind of outfit that runs courses for a few hundred thousand learners on Moodle — has a storage bill that doubled in eighteen months, and nobody can explain why. The platform lead pulls the numbers and the answer is almost embarrassing: every lecture video, every PDF handout, every learner-uploaded assignment, every thumbnail, and every nightly database export is sitting in the same place, on the most expensive storage tier the cloud sells, forever. A 2019 cohort’s submitted assignments that no human will ever open again cost exactly as much per gigabyte as last night’s freshly recorded lecture. There is no versioning, so when a content editor overwrote the wrong course manifest last month, the only recovery was a frantic Slack thread. And the videos are served straight from storage to learners worldwide, so a popular launch hammers a single region’s bucket and the egress charges spike like a heart monitor.

None of these are exotic problems. They are the default problems you get when a team treats object storage as a magic infinite folder and never learns the three controls that make it cheap, safe, and fast: storage classes (tiers), lifecycle policies, and versioning with soft delete. This article covers object storage from the ground up: what a bucket actually is, how the three big clouds name the same ideas differently, and how to wire it into a real platform so that the education company’s bill stops doubling and its content stops disappearing. It is a junior-level tour, but everything in it is what a senior architect actually configures on day one.

What object storage actually is

Start with what it is not. It is not a disk you mount, and it is not a file system with folders. There are no real directories, no file handles, no partial in-place edits, and no POSIX permissions. Object storage is closer to a giant hash map: you PUT an object under a key and you GET it back by that key. That single design choice is why it scales to trillions of objects and eleven-nines of durability while a traditional file server falls over at a few million files.
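To make the hash-map comparison concrete, here is a minimal in-memory sketch. The class name ToyBucket and its methods are invented for illustration and are not any cloud SDK; the point is only that keys are flat strings and that "folders" are a prefix filter at list time.

```python
class ToyBucket:
    """In-memory stand-in for a bucket: a flat map from key to (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # A PUT replaces the whole object; there is no partial in-place edit.
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def get(self, key):
        data, _metadata = self._objects[key]
        return data

    def list(self, prefix=""):
        # "Folders" are only a prefix filter over flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyBucket()
bucket.put("videos/2024/intro.mp4", b"...", {"content-type": "video/mp4"})
bucket.put("videos/2024/week1.mp4", b"...")
bucket.put("handouts/syllabus.pdf", b"%PDF")
print(bucket.list("videos/"))
```

Nothing here creates a "videos" directory. The slash is just a character in the key, which is part of why the model scales so far.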

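An object is three things bundled together: the data itself (the bytes), a key (the unique name you store and fetch it by), and metadata (content type, size, timestamps, and any custom tags you attach).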

A bucket (AWS S3 and Google Cloud Storage call it a bucket; Azure calls it a container, living inside a storage account) is the top-level namespace that holds objects. It is where you attach the policies that matter: which region the data lives in, who can read it, whether versioning is on, and what lifecycle rules apply. Get the bucket right and most other decisions follow.

Three properties define the whole product category, and a beginner should internalize them before touching a console:

  1. Durability is extreme, availability is merely good. The major clouds design for eleven nines (99.999999999%) of durability — they replicate each object across multiple devices and availability zones so the practical chance of losing a stored object is vanishingly small. Availability (the chance it answers right now) is a separate, lower number, and it varies by tier. Durability protects you from hardware; it does not protect you from you deleting or overwriting the wrong thing — that is what versioning is for.
  2. It speaks HTTP, not a file protocol. Every object has a URL and you talk to it with GET/PUT/DELETE over HTTPS. That is exactly why it pairs so naturally with a CDN and with browsers.
  3. You pay for three different things. Storage (per GB-month), requests (per thousand operations), and egress (per GB leaving the cloud to the internet). Beginners watch only the first number; the bill that doubled was driven by the third.
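The three billing dimensions above can be sketched as a small calculator. The unit prices and the usage figures are placeholder assumptions chosen for illustration, not any provider's current price list.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # Hypothetical unit prices; substitute your provider's real numbers.
    storage_per_gb_month: float = 0.023
    per_1000_requests: float = 0.0004
    egress_per_gb: float = 0.09


def monthly_bill(stored_gb, requests, egress_gb, prices=Prices()):
    """Split a month's object-storage bill into its three components."""
    return {
        "storage": stored_gb * prices.storage_per_gb_month,
        "requests": requests / 1000 * prices.per_1000_requests,
        "egress": egress_gb * prices.egress_per_gb,
    }


# Assumed usage: 50 TB stored, 20M requests, 40 TB served to the internet.
bill = monthly_bill(stored_gb=50_000, requests=20_000_000, egress_gb=40_000)
for item, cost in bill.items():
    print(f"{item:>8}: ${cost:,.2f}")
```

With these assumed inputs the egress line is roughly three times the storage line, which is the pattern behind a bill that grows while the stored volume barely changes.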

The three clouds, one mental model

The single most useful thing for a newcomer is to see that S3, Azure Blob, and Google Cloud Storage are the same idea with different vocabulary. Learn one and you have learned all three; only the names and a few defaults change.

Concept | AWS S3 | Azure Blob Storage | Google Cloud Storage
Top-level container | Bucket | Container (in a Storage Account) | Bucket
The stored thing | Object | Blob | Object
“Hot” / frequent tier | S3 Standard | Hot | Standard
Infrequent tier | Standard-IA / One Zone-IA | Cool | Nearline (30d) / Coldline (90d)
Archive tier | Glacier Flexible / Deep Archive | Archive | Archive (365d)
Automatic tiering | S3 Intelligent-Tiering | (lifecycle rules / access tiers) | Autoclass
Lifecycle engine | S3 Lifecycle | Blob lifecycle management | Object Lifecycle Management
Keep old versions | S3 Versioning | Blob versioning | Object Versioning
Recover deletes | Versioning + MFA Delete | Soft delete (blob & container) | Soft delete + versioning
Pre-signed access | Pre-signed URL | SAS (Shared Access Signature) | Signed URL
Server-side encryption key | SSE-S3 / SSE-KMS | Microsoft-managed / CMK | Google-managed / CMEK

The shape is identical everywhere: a container holds objects, objects live in a class/tier, a lifecycle engine moves them between tiers and eventually deletes them, versioning keeps history, and a signed URL hands out scoped temporary access. The education company happens to run primarily on AWS for its Moodle deployment, so the worked example below uses S3 names — but every rule maps one-for-one to Azure and GCP via the table.

Storage classes: pay for the access pattern, not the bytes

The first lever that fixes the bill is storage classes, also called tiers. The insight is simple: not all data is accessed the same way, so you should not pay the same for all of it. Clouds offer a spectrum, and you trade retrieval cost and latency for cheaper storage.

The trap that beginners and the education company both fell into: leaving everything in the hot tier “to be safe.” The fix is not to manually move objects — nobody has time — it is to let lifecycle policies do it automatically, and for genuinely unpredictable access patterns, to use the automatic tiers.

When you cannot predict access, let the cloud decide. S3 Intelligent-Tiering and GCP Autoclass monitor each object’s access and move it between tiers automatically, with no retrieval fees for the moves. For a Moodle media library where some old courses suddenly trend again when an instructor re-shares them, Intelligent-Tiering is often the safest default — you stop guessing. The tradeoff is a small per-object monitoring fee, which is negligible for big videos and wasteful for millions of tiny files, so do not point it at your thumbnail bucket.
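The monitoring-fee tradeoff can be checked with rough arithmetic. The sketch below assumes hypothetical prices and a flat per-object monthly fee, and it ignores provider-specific rules such as minimum object sizes, so treat the output as an order-of-magnitude guide only.

```python
def auto_tiering_net_saving(num_objects, avg_object_mb, cold_fraction,
                            hot_price=0.023, cold_price=0.0125,
                            monitor_fee_per_1000=0.0025):
    """Monthly storage saving minus monthly monitoring fee (all prices assumed)."""
    total_gb = num_objects * avg_object_mb / 1024
    saving = total_gb * cold_fraction * (hot_price - cold_price)
    fee = num_objects / 1000 * monitor_fee_per_1000
    return saving - fee


# 20,000 lecture videos of about 800 MB each, 70% of them rarely watched.
videos = auto_tiering_net_saving(num_objects=20_000, avg_object_mb=800, cold_fraction=0.7)
# 50 million thumbnails of about 20 KB each, same cold fraction.
thumbs = auto_tiering_net_saving(num_objects=50_000_000, avg_object_mb=0.02, cold_fraction=0.7)
print(f"videos net: {videos:+.2f} USD/month, thumbnails net: {thumbs:+.2f} USD/month")
```

Under these assumptions the video library comes out ahead and the thumbnail bucket loses money, because the fee scales with object count while the saving scales with bytes.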

Lifecycle policies: the rule engine that fixes the bill

A lifecycle policy is a set of declarative rules the storage service evaluates roughly once a day: “objects matching this filter, once they are this old, should transition to a cheaper tier or be deleted.” This is the single highest-leverage thing the education company can turn on, and defining the rules costs nothing; you pay only the per-request charges for the transitions they trigger.

Here is a lifecycle configuration for the lecture-video bucket, expressed as S3 JSON. Read it as plain English: keep new videos hot for a quarter, cool them, deep-archive them after a year, and clean up half-finished multipart uploads that otherwise silently accrue charges.

{
  "Rules": [
    {
      "ID": "tier-down-course-videos",
      "Filter": { "Prefix": "videos/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 90,  "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    },
    {
      "ID": "expire-nightly-db-exports",
      "Filter": { "Prefix": "backups/moodle-db/" },
      "Status": "Enabled",
      "Expiration": { "Days": 35 }
    }
  ]
}

The same intent in Azure is a JSON policy with tierToCool / tierToArchive / delete actions keyed on daysAfterModificationGreaterThan; in GCP it is a lifecycle block with SetStorageClass and Delete actions gated by age conditions. Different syntax, identical behavior.
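As a rough illustration of how mechanical the mapping is, the sketch below translates one S3-style rule into the rule/action/condition shape that GCP lifecycle configurations use. The storage-class mapping table is an assumption made for this example (the right target class depends on your access pattern), and the function ignores AbortIncompleteMultipartUpload for brevity.

```python
import json

# Assumed class mapping for illustration only.
S3_TO_GCS_CLASS = {"STANDARD_IA": "NEARLINE", "GLACIER": "COLDLINE", "DEEP_ARCHIVE": "ARCHIVE"}


def s3_rule_to_gcs(rule):
    """Translate one S3 lifecycle rule (dict) into a list of GCS-style lifecycle rules."""
    prefix = rule.get("Filter", {}).get("Prefix", "")

    def condition(days):
        cond = {"age": days}
        if prefix:
            cond["matchesPrefix"] = [prefix]
        return cond

    out = []
    for t in rule.get("Transitions", []):
        out.append({
            "action": {"type": "SetStorageClass",
                       "storageClass": S3_TO_GCS_CLASS[t["StorageClass"]]},
            "condition": condition(t["Days"]),
        })
    if "Expiration" in rule:
        out.append({"action": {"type": "Delete"},
                    "condition": condition(rule["Expiration"]["Days"])})
    return out


s3_rule = {
    "ID": "tier-down-course-videos",
    "Filter": {"Prefix": "videos/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 90, "StorageClass": "STANDARD_IA"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
}
gcs_config = {"rule": s3_rule_to_gcs(s3_rule)}
print(json.dumps(gcs_config, indent=2))
```

One S3 rule with two transitions becomes two GCP rules, each with a single action and an age condition. Validate any generated policy against the provider's documentation before applying it.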

Two rules above quietly solve two real money leaks. The expire rule on backups/moodle-db/ means database exports delete themselves after 35 days instead of piling up for years — the unbounded-backup problem nearly every team has. The AbortIncompleteMultipartUpload rule matters more than it looks: large video uploads happen in parts, and a failed upload leaves orphaned parts that you are billed for but cannot see in the normal object listing. Without this rule, that is a slow, invisible leak. Turning these on across the existing buckets is what stops the bill from doubling.
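One way to sanity-check a policy before applying it is to simulate it. The helper below is a simplified model written for this article, not the real engine (which runs asynchronously, roughly daily, and has more rule types): given a key and its age in days, it reports which class the object would sit in, or None once it has expired.

```python
RULES = [
    {
        "ID": "tier-down-course-videos",
        "Filter": {"Prefix": "videos/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 90, "StorageClass": "STANDARD_IA"},
            {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
        ],
    },
    {
        "ID": "expire-nightly-db-exports",
        "Filter": {"Prefix": "backups/moodle-db/"},
        "Status": "Enabled",
        "Expiration": {"Days": 35},
    },
]


def storage_class_for(key, age_days, rules):
    """Return the class an object would be in, or None if a rule has expired it."""
    current = "STANDARD"
    for rule in rules:
        if rule.get("Status") != "Enabled":
            continue
        if not key.startswith(rule.get("Filter", {}).get("Prefix", "")):
            continue
        expiry = rule.get("Expiration", {}).get("Days")
        if expiry is not None and age_days >= expiry:
            return None
        # Apply transitions in age order; the last one reached wins.
        for t in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
            if age_days >= t["Days"]:
                current = t["StorageClass"]
    return current


for age in (10, 100, 400):
    print(age, storage_class_for("videos/algebra-01.mp4", age, RULES))
print(40, storage_class_for("backups/moodle-db/export.sql.gz", 40, RULES))
```

Running a handful of representative keys through a model like this is a cheap way to catch a wrong prefix or a swapped day count before the rule touches production data.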

A subtlety worth knowing early: transitions are not free in the moment. Each transition is a per-object operation with a small cost, and moving a billion tiny files to a cheaper tier can cost more in transition fees than it ever saves in storage. Lifecycle tiering pays off for fewer, larger objects (videos, exports), not for swarms of kilobyte-sized thumbnails — for those, just pick the right tier at write time.
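The break-even point is easy to estimate. This sketch assumes a hypothetical one-time transition fee per thousand objects and hypothetical per-GB prices, and it ignores minimum billable object sizes and minimum storage durations, both of which make small objects look even worse in practice.

```python
def breakeven_months(avg_object_kb, transition_fee_per_1000=0.01,
                     hot_price=0.023, cold_price=0.0125):
    """Months of storage savings needed to repay one object's transition fee."""
    size_gb = avg_object_kb / (1024 * 1024)
    monthly_saving = size_gb * (hot_price - cold_price)
    fee = transition_fee_per_1000 / 1000
    return fee / monthly_saving


thumbnail = breakeven_months(avg_object_kb=20)          # about 20 KB each
video = breakeven_months(avg_object_kb=800 * 1024)      # about 800 MB each
print(f"thumbnail: {thumbnail:.1f} months, video: {video:.4f} months")
```

With these assumed numbers a thumbnail needs years to repay its own transition, while a lecture video repays it almost immediately.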

Versioning and soft delete: the undo button

The overwritten-course-manifest incident has a one-line fix: versioning. With versioning enabled on a bucket, every PUT to an existing key creates a new version and retains the old one; a DELETE just adds a “delete marker” and hides the object rather than destroying it. Recovery becomes “list versions, restore the previous one” instead of “restore from last night’s backup and lose a day.”
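The behavior described above can be modelled in a few lines. VersionedBucket is a toy class invented for this article, not a cloud API; it shows only the mechanics of stacking versions and delete markers, and it restores by re-writing the earlier version on top, which mirrors the common "copy the old version over the current one" recovery.

```python
DELETE_MARKER = object()


class VersionedBucket:
    """Toy model of a versioned bucket: every key maps to a stack of versions."""

    def __init__(self):
        self._versions = {}  # key -> list of versions, oldest first

    def put(self, key, data):
        # An overwrite appends a new version; the old one is retained.
        self._versions.setdefault(key, []).append(data)

    def delete(self, key):
        # A delete hides the object behind a marker; nothing is destroyed.
        self._versions.setdefault(key, []).append(DELETE_MARKER)

    def get(self, key):
        versions = self._versions.get(key, [])
        if not versions or versions[-1] is DELETE_MARKER:
            raise KeyError(key)
        return versions[-1]

    def restore_previous(self, key):
        # Re-put the most recent real version that sits below the current top.
        for data in reversed(self._versions[key][:-1]):
            if data is not DELETE_MARKER:
                self.put(key, data)
                return data
        raise KeyError(f"no earlier version of {key}")


bucket = VersionedBucket()
bucket.put("course-content/manifest.json", b"good manifest")
bucket.put("course-content/manifest.json", b"bad overwrite")
bucket.restore_previous("course-content/manifest.json")
print(bucket.get("course-content/manifest.json"))
```

The bad overwrite is still in the history after the restore, which is why versioned buckets need a pruning rule to keep old versions from accumulating.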

Soft delete is the closely related safety net. Azure Blob soft delete and GCP soft delete keep deleted objects recoverable for a retention window you set (7, 14, or 30 days, for example) before they are permanently purged; on S3, versioning’s delete markers play the same role, with a noncurrent-version lifecycle rule acting as the retention window. It is the difference between “an intern ran a bad cleanup script” being a five-minute restore and a resume-updating event.

The two controls work together, and the lifecycle engine ties them off so old versions do not become a new cost problem:

{
  "ID": "prune-old-versions",
  "Filter": { "Prefix": "course-content/" },
  "Status": "Enabled",
  "NoncurrentVersionTransitions": [
    { "NoncurrentDays": 30, "StorageClass": "STANDARD_IA" }
  ],
  "NoncurrentVersionExpiration": { "NoncurrentDays": 180 }
}

That rule says: keep every old version for safety, cheapen it after 30 days, and delete it after 180 — so versioning gives you an undo button without quietly tripling your storage. For genuinely critical, compliance-bound data (financial exports, audit logs), the next step up is Object Lock / immutability (WORM), which prevents anyone — including an admin or attacker with stolen credentials — from deleting an object before a retention date. That is your ransomware backstop, and it is worth turning on for the backup bucket specifically.

Architecture overview

[Figure: architecture diagram]

Now the worked example: serving the education company’s static and media assets — course thumbnails, CSS/JS bundles for the Moodle theme, downloadable PDFs, and lecture videos — to learners worldwide, cheaply and safely. The naive design that caused the egress spike was “browser → S3 bucket, directly.” The corrected architecture has three buckets with different jobs and a CDN in front, and it weaves object storage into the platform’s real operating model.

The bucket layout — separate by access pattern, not by convenience:

The control and data flow, end to end:

  1. Edge and CDN. Learners hit Akamai at the edge, which terminates TLS, serves cached objects from points of presence close to each learner, and provides WAF and bot protection. The CDN — not the bucket — faces the internet. This is the move that kills the egress bill: a popular video is fetched from the bucket once per region and then served thousands of times from cache, so origin egress collapses and learners get lower latency. The bucket trusts only the CDN’s identity (an origin access identity / signed origin request), so there is no way to bypass the cache and hammer storage directly.

  2. Reading public assets. For thumbnails and theme bundles, the CDN reads from the public-assets bucket over a private origin connection and caches aggressively. The objects carry long Cache-Control max-age values so the CDN and browsers hold them for weeks; a content change uses a new key (a content hash in the filename) so caches never serve stale bytes. No credentials are involved on the read path at all — these assets are public-by-design, just not public-by-origin.

  3. Reading private uploads: signed URLs. When a learner downloads their own past submission or an instructor pulls an assignment to grade, the asset is in the private bucket and must not be world-readable. The Moodle application (the only component holding storage credentials) first checks that the requester is allowed to see the object, then generates a pre-signed URL (S3) / SAS token (Azure) / signed URL (GCP): a time-limited, single-object, often single-method URL, signed with the app’s credentials, that grants exactly “this one object, GET only, for the next 5 minutes.” The URL is a bearer token: anyone who holds it can use it until it expires, which is why the lifetime is kept short. The browser then fetches directly from storage with that URL, so the bytes never pass back through the application server, which keeps the app stateless and cheap. When the clock runs out, the URL is dead. This is the pattern for serving private user content from object storage, and getting it wrong (URLs too long-lived, or scoped to a whole bucket) is a classic security-review finding.

  4. Writing uploads. A learner submitting an assignment also gets a signed URL — this time scoped to PUT a single key — so the upload goes browser-direct to the private bucket without streaming gigabytes through Moodle’s web tier. The application records the resulting object key in its database; the bytes and the metadata-of-record stay cleanly separated.

  5. Identity and credentials. Human access to the platform is SSO via Okta (federated to Microsoft Entra ID where Azure resources are in play) — learners and staff log into Moodle once, and Okta’s SCIM provisioning keeps accounts in step with enrollment. The application’s own access to S3 uses a workload identity / IAM role, not a static access key, so there is no long-lived secret to leak. The few secrets that cannot be a role — a third-party transcoding API key, the database password Moodle needs — live in HashiCorp Vault, leased dynamically and short-lived, never baked into an AMI or a config file. This directly honors the platform’s standing rule after an old credential leak: no static storage keys, ever.

  6. Server-side encryption. Every bucket has server-side encryption on by default. For the sensitive private-uploads and backup buckets, that means customer-managed keys (SSE-KMS / CMK / CMEK) so key rotation and access are auditable and revocable; the public-assets bucket can use cloud-managed keys since the content is non-sensitive by design. Encryption in transit (HTTPS only, enforced by a bucket policy that denies non-TLS requests) is universal.
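The signed-URL idea in steps 3 and 4 can be sketched with a plain HMAC. This is a conceptual model only: the real schemes (S3 pre-signed URLs, Azure SAS, GCP signed URLs) use their own signing formats and should be generated with the provider's SDK. The secret, the hostname storage.example.com, and the function names are all hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # hypothetical; real services sign with cloud credentials


def _signature(method, key, expires):
    # The signature binds the HTTP method, the object key, and the expiry together.
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(method, key, ttl_seconds, now=None):
    """Issue a URL valid for one method on one key until the TTL elapses."""
    expires = int(now if now is not None else time.time()) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(method, key, expires)})
    return f"https://storage.example.com/{key}?{query}"


def verify(method, url, now=None):
    """What the storage side does: recompute the signature and check the clock."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    expires = int(params["expires"][0])
    if (now if now is not None else time.time()) > expires:
        return False
    return hmac.compare_digest(_signature(method, key, expires), params["sig"][0])


url = sign_url("GET", "submissions/42/essay.pdf", ttl_seconds=300, now=1000)
print(verify("GET", url, now=1100))   # inside the 5-minute window
print(verify("PUT", url, now=1100))   # wrong method
print(verify("GET", url, now=1400))   # expired
```

Because the method, key, and expiry are all part of the signed message, changing any one of them invalidates the URL. That is the property that lets the application hand out narrow, short-lived access without sharing its credentials.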

Wiring it into the operating model

Object storage is never just the bucket; it is the buckets plus the guardrails that keep them safe and observable as the team grows.

Failure modes, scaling, and the tradeoffs

Failure modes worth naming before they bite:

Scaling. Object storage itself is effectively infinite — you do not provision capacity, you just write more objects, and the clouds design for trillions of them. What scales with effort is everything around it: request rate (spread load across key prefixes and front it with a CDN), cross-region serving (replicate buckets to a paired region or rely on the CDN’s global cache), and cost-at-scale (lifecycle tiering and Intelligent-Tiering keep the per-GB number falling as the corpus grows). For a global learner base, the realistic posture is one authoritative region per bucket plus an aggressive CDN, escalating to cross-region replication only when you need regional read latency or DR on the storage layer itself.

Cost — the levers, ranked by impact for this platform:

Lever | Mechanism | Typical effect
CDN in front of storage | Serve cached objects from the edge, not the bucket | Egress collapses; often the single biggest line-item drop
Lifecycle tiering | Auto-transition cold data to IA/Archive | 40–95% cheaper storage on aged data
Intelligent-Tiering / Autoclass | Automatic tiering for unpredictable access | Right tier with no guesswork or retrieval-fee surprises
Expire backups & abort multipart | Lifecycle Expiration + AbortIncompleteMultipartUpload | Stops unbounded and invisible growth
Prune noncurrent versions | Lifecycle on old versions | Keeps the undo button from tripling storage
Block egress bypass | Origin access identity; deny direct public reads | Forces all traffic through the cheap, cached path
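The first lever in the table is mostly arithmetic on the cache hit ratio. The sketch below uses assumed traffic figures and an assumed origin egress price, and it leaves out the CDN's own delivery charges, which are billed separately and vary by contract.

```python
def origin_egress_gb(total_delivered_gb, cache_hit_ratio):
    """GB the bucket must still serve when the CDN answers the rest from cache."""
    return total_delivered_gb * (1 - cache_hit_ratio)


def origin_egress_cost(total_delivered_gb, cache_hit_ratio, egress_per_gb=0.09):
    # egress_per_gb is a hypothetical price used only for illustration.
    return origin_egress_gb(total_delivered_gb, cache_hit_ratio) * egress_per_gb


delivered = 40_000  # GB served to learners in a month (assumed)
for ratio in (0.0, 0.80, 0.95):
    cost = origin_egress_cost(delivered, ratio)
    print(f"hit ratio {ratio:.0%}: origin egress cost about ${cost:,.0f}")
```

The relationship is linear, so every point of cache hit ratio gained on large video objects removes the same slice of origin egress.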

Explicit tradeoffs — accept these or you have not understood object storage:

The shape of the win

Six weeks after turning these controls on, the education company’s storage story is unrecognizable. The bill stopped doubling and actually fell, because a year of cold lecture videos and ancient assignment submissions now sit in archive tiers at a fraction of the price, multipart orphans and stale backups expire themselves, and the CDN means a viral course launch is served from the edge instead of melting a bucket’s egress budget. The overwritten-manifest panic cannot recur, because versioning and soft delete make recovery a thirty-second restore. Learner submissions are no longer one careless ACL away from the open internet, because public access is blocked by default, Wiz Code catches it in review, and Wiz watches for drift in production. And every one of these controls is declared in Terraform, gated through ServiceNow, and watched in Datadog — so the platform is not just cheaper and safer today, it stays that way as it grows.

None of this required exotic technology. It required understanding the three controls that object storage has offered all along — tiers, lifecycle, and versioning — and the handful of patterns (a CDN at the edge, signed URLs for private content, workload identity instead of static keys) that turn an infinite magic folder into a real, governed foundation. That is object storage done right, and it is the same on AWS, Azure, and GCP — the names change, the model does not.

Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.
