Benji Visser

Think Harder Before Picking an S3 Storage Class

Picking an S3 Storage Class when you're storing data at scale (>50TB/month) is more complicated than it seems, as I recently learned.

We naively thought: ah, well we have it in STANDARD, and we don't really need it after 3 months, so let's just transition it to DEEP_ARCHIVE. Easy!

But we didn't understand our data well enough before making the switch.

The problem

The problem is that $/GB is not the only factor to consider when picking an S3 Storage Class.

You REALLY need to know your data before making a decision. Things to consider:

  • what percentage of your objects are under 128KB?
  • do you overwrite objects?
  • what percent of your objects are being accessed in those first 3 months?
  • how long do you need to keep the data for?
  • how often do you need to restore the data?

This is a multivariate problem, and you need to consider all of these factors together!
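
To answer the first question, here's a minimal sketch using boto3 (the bucket name is a placeholder; for buckets with millions of objects, use S3 Inventory instead, covered at the end):

import boto3

# Rough size-distribution check: what fraction of objects is under 128KB?
# Listing the bucket is fine at small scale; use S3 Inventory beyond that.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

total = under_128kb = 0
for page in paginator.paginate(Bucket="my-bucket"):  # placeholder name
    for obj in page.get("Contents", []):
        total += 1
        if obj["Size"] < 128 * 1024:
            under_128kb += 1

print(f"{under_128kb:,} of {total:,} objects ({under_128kb / max(total, 1):.0%}) are under 128KB")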

What went wrong

We store about 2TB/day of data.

Three things we didn't know about our data:

1. We do streaming uploads

Our app writes to the same S3 key multiple times as a task runs, each write overwriting the previous one. In Standard, that's fine. In Glacier IR, every overwrite counts as deleting an object before its 90-day minimum storage duration, which triggers an early deletion fee. Had we made the switch, we would have been charged that fee on every single overwrite.

2. Most of our objects are tiny

We checked our size distribution: 63% of our objects are under 128KB. Glacier IR has a 128KB minimum billable size, so a 10KB screenshot gets billed as if it were 128KB. We would have been paying more than 10x the sticker price for most of our data.

3. Transitions have per-object costs

Moving an object into Glacier IR costs $0.02 per 1,000 objects. That's not per GB—it's per object. With millions of small files, the transition fees alone were significant.
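
That fee is worth sanity-checking before you write the lifecycle rule. A quick sketch with a hypothetical object count:

# Per-object transition fees add up fast with small files.
objects = 50_000_000              # hypothetical: 50M objects
transition_per_1k = 0.02          # Glacier IR lifecycle transition, $ per 1,000 objects
print(f"${objects / 1000 * transition_per_1k:,.0f} in one-time transition fees")  # $1,000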

The $/GB number on the pricing page told us nothing about any of this.


Storage Classes

Each class comes with its own minimum billable object size, minimum storage duration, and per-object transition cost:

Class             Min Billable Size   Min Duration   Transition Cost
Standard          none                none           n/a
Standard-IA       128KB               30 days        $0.01/1K objects
Glacier IR        128KB               90 days        $0.02/1K objects
Glacier Flexible  40KB                90 days        $0.03/1K objects
Deep Archive      40KB                180 days       $0.05/1K objects

Glacier Flexible and Deep Archive also add 40KB of overhead per object: 32KB of index billed at the archive tier's rate plus 8KB of metadata billed at Standard rates.


The traps

Small-object trap

You have 100 million 5KB files. You move them to Glacier IR thinking you'll save money. But each file gets billed as 128KB. Your "cheap" archive is now expensive.
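
Running the numbers on that example, with us-east-1 list prices at the time of writing (check the pricing page; these change):

# 100 million 5KB files, each billed at Glacier IR's 128KB minimum.
objects, actual_kb, billed_kb = 100_000_000, 5, 128
actual_gb = objects * actual_kb / 1e6    # 500GB actually stored
billed_gb = objects * billed_kb / 1e6    # 12,800GB billed

standard, glacier_ir = 0.023, 0.004      # $/GB-month
print(f"Standard on actual size:   ${actual_gb * standard:,.0f}/month")    # ~$12
print(f"Glacier IR on billed size: ${billed_gb * glacier_ir:,.0f}/month")  # ~$51

The "cheap" tier ends up costing roughly four times more than just leaving the data in Standard.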

Overwrite trap

Your app updates objects in place. Every update in Glacier IR means:

  • Early deletion fee on the old version (remaining days of 90-day minimum)
  • New 90-day clock on the new version
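
Here's a sketch of what that costs, assuming the fee is the remaining days of the minimum duration billed at the tier's storage rate:

def early_delete_fee(size_gb, age_days, rate=0.004, min_days=90):
    # Prorated charge for deleting (or overwriting) a Glacier IR object
    # before its minimum storage duration is up.
    remaining_days = max(min_days - age_days, 0)
    return size_gb * rate * remaining_days / 30

# A 1GB object overwritten once a day for a week: six early deletions.
weekly = sum(early_delete_fee(1.0, age_days=1) for _ in range(6))
print(f"${weekly:.2f}/week per key")  # ~$0.07, times however many keys you have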

Restore trap

You need to restore 5TB from Deep Archive. You pay:

  • Retrieval request fees ($0.10 per 1,000 objects)
  • Per-GB retrieval fees
  • Standard storage rates on the temporary restored copy
  • Archive storage rates on the original (still paying)
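
Rough numbers for that 5TB restore, assuming 2 million objects, the standard retrieval tier (about $0.02/GB plus $0.10 per 1,000 requests in us-east-1 at the time of writing; bulk is cheaper), and a restored copy kept for 7 days:

gb, objects = 5_000, 2_000_000        # hypothetical: 5TB across 2M objects
requests = objects / 1000 * 0.10      # restore request fees
retrieval = gb * 0.02                 # standard-tier retrieval, $/GB
temp_copy = gb * 0.023 * 7 / 30       # restored copy at Standard rates for 7 days
print(f"requests ${requests:,.0f} + retrieval ${retrieval:,.0f} "
      f"+ temp copy ${temp_copy:,.0f} = ${requests + retrieval + temp_copy:,.0f}")
# requests $200 + retrieval $100 + temp copy $27 = $327, on top of the
# archive storage you keep paying for the originals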

The breakeven math on transition costs

If you want to calculate the breakeven file size when considering a transition to a deeper tier:

Glacier IR saves $0.057/GB over 91 days vs Standard
Transition costs $0.00002 per object

Breakeven: 0.057 × size = 0.00002
           size = 0.00002 / 0.057 ≈ 0.00035GB ≈ 350KB

Below 350KB, the transition costs more than you save over the course of 91 days.

For Deep Archive, the transition cost is higher ($0.05/1K) and there's 40KB metadata overhead per object. Breakeven is around 1.9MB.

As you go deeper into the Glacier tiers, the transition cost and per-object overhead increase, so your files need to be larger for the move to pay off.
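
Here's a small sketch that reproduces the Glacier IR figure and lets you plug in other tiers (rates are us-east-1 list prices at the time of writing):

def breakeven_kb(standard_rate, tier_rate, transition_fee, months, overhead_kb=0):
    # Smallest object size (KB) at which storage savings over the minimum
    # duration cover the one-time transition fee plus metadata overhead
    # (overhead is billed pessimistically at the Standard rate here).
    saving_per_gb = (standard_rate - tier_rate) * months
    overhead_cost = overhead_kb / 1e6 * standard_rate * months
    return (transition_fee + overhead_cost) / saving_per_gb * 1e6

# Glacier IR vs Standard: $0.004 vs $0.023 per GB-month, $0.02/1K transitions
print(f"{breakeven_kb(0.023, 0.004, 0.00002, 3):.0f}KB")  # ~351KB, matching the math above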

Conclusion

Before picking a storage class, use these tools to understand your data:

  1. S3 Storage Lens - Shows object size distribution, access patterns by age, and storage class breakdown across your buckets
  2. S3 Inventory - Generates reports listing every object with its size, storage class, and metadata—useful for identifying small files and calculating transitions at scale

Once you understand your data, feel free to use the S3 Lifecycle Policy Simulator I created to model different lifecycle policies and see the true cost before committing.

If you have any questions about this topic, feel free to send me an email at benji@093b.org.