
Behind the Curtain: AWS re:Invent 2024 Highlights

Expanding on my post from last week, it was great to see AWS leaning back into their engineering roots at re:Invent this year. They really pulled back the curtain to show us how they’re solving problems at a scale most of us can’t even imagine.

It also brought back memories of my own work at re:Invent in 2018, working behind the scenes on level-400 (expert) breakout sessions, workshops, and demos. Those experiences gave me a front-row seat to the kind of engineering depth AWS brings to the table, and the chance to collaborate with teams of principal, staff, and distinguished engineers.

Peter DeSantis kicked things off Monday night with a deep dive into the infrastructure innovations AWS teams have been cooking up. It was equal parts engineering showcase and crash course in how AWS keeps pushing the boundaries of what's possible.

Solving Problems at AWS Scale

During the keynote, David Clark took the stage to talk about the extraordinary lengths AWS goes to in managing its infrastructure, from manufacturing all the way to boot-up. This isn't just some off-the-shelf solution; it's the kind of work that separates AWS from your typical neocloud providers.

Take the BARGE, for example.

They built a setup with 288 spinning disks at 7,200 RPM. That might not sound too crazy until you realize the vibrations from the disks interfered with the disks themselves—not to mention the 4,500-pound weight that had to be distributed across the datacenter floor!

Then there’s the work they’re doing with Nitro, integrating it directly into their JBOD (Just a Bunch of Drives) racks. By adding Nitro to storage, they’re not just accelerating performance but also tackling scale and security at the same time.

A Peek at Advanced Silicon Design

Peter also gave us a masterclass in silicon design and advanced packaging. He explained how AWS’s chips are basically mini motherboards, built with interposers that act like PCBs but offer 10x the bandwidth of traditional designs.

One of the coolest parts? AWS has moved power delivery to the back of the package, shortening the wires and reducing power loss.

Then there’s the Trainium 2 server—a beast of a machine with 1.5TB of HBM3, 46TB/s of bandwidth, and a dense 20.8 PFLOPS of compute power. They’ve significantly reduced the wiring and ribbon complexity, too, which is especially helpful when you’re dealing with liquid cooling!
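To put those numbers in perspective, here’s some napkin math on the server’s compute-to-bandwidth ratio—the arithmetic intensity a workload needs to stay compute-bound rather than memory-bound. This is just a back-of-envelope sketch from the figures quoted above, not an AWS specification:

```python
# Back-of-envelope math from the Trainium 2 server figures quoted above:
# 20.8 PFLOPS of compute and 46 TB/s of HBM3 bandwidth.
compute_flops = 20.8e15   # 20.8 PFLOPS
hbm_bandwidth = 46e12     # 46 TB/s

# FLOPs the server can perform per byte read from HBM. A workload needs
# roughly this many operations per byte to avoid being memory-bound.
arithmetic_intensity = compute_flops / hbm_bandwidth
print(f"{arithmetic_intensity:.0f} FLOPs per byte")  # ~452
```

Roughly 450 FLOPs per byte—a ratio firmly in the territory of large matrix multiplications, which is exactly what a training chip is built for.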

And if that wasn’t enough, they teased Trainium 3. With a 3nm process, it promises 2x the compute power and 40% greater efficiency.

A Personal Reflection on Scaling

Hearing Peter talk about 10p10u (tens of petabits of bandwidth, with under 10 microseconds of latency) brought back some serious network memories.

Between 2014 and 2017, my Amazon teams built out the aggregation layer in our network, replacing Cisco 7k switches with racks packed to the brim with white-box hardware running in-house software. It was called EUCLID.

At the start, we were deploying one network per week. By the end, we had it fully automated, cranking out over 100 deployments a week. Seeing AWS talk about their scaling challenges brought equal parts nostalgia and pride. It’s a reminder of how far we’ve come—and how far there still is to go.

Amazon Nova

And then came the Amazon Nova announcement.

Andy Jassy himself showed up to lay out AWS’s strategy for foundation models, including six new offerings that directly compete with the likes of OpenAI.

The Nova family includes:

  • Nova Micro: A text-only model optimized for speed and cost.
  • Nova Lite: A multimodal model handling text, images, and video.
  • Nova Pro: Built for complex reasoning.
  • Nova Premier: Their most advanced model, set for release in early 2025.
  • Nova Canvas: For image generation.
  • Nova Reel: Focused on video creation.

The part that really grabbed my attention was AWS’s emphasis on price-performance, which they’re using to position themselves against competitors. It speaks to the number one concern of businesses using hyperscale clouds: cost has gotten out of control.

Amazon is no longer just hosting AI workloads; it’s building models itself, while leaning into partnerships like the one with Anthropic and keeping an eye on OpenAI’s rumored plans to build custom silicon.

It’s a fascinating dynamic. We’re in an industry where your biggest competitor one day might be your closest ally the next. Hyperscalers like AWS are no longer just infrastructure providers—they’re directly shaping the future of AI.

If there’s one takeaway from this year’s re:Invent, it’s this: AWS isn’t just scaling infrastructure; they’re investing in AI across their entire stack. Their engineering-first approach shows why they remain a leader in this space.