OCP 2024 Regional Summit wrap

The Open Compute Project (OCP) Regional Summit was hosted in Lisbon, Portugal last month, the 5th (and largest) regional summit the group has hosted.

Whilst I wasn't able to make it in person, I’d be remiss if I didn't write a (very) quick summary about the conference, and pertinent updates to scaling digital infrastructure in a sustainable way.

The hot topic continues to be GenAI, so much so that OCP has created a new track for Artificial Intelligence, and a strategic initiative for Open AI Systems (no, not THAT OpenAI), promoting open rack-level systems comprising hardware, firmware, management and validation.

Next-gen Cooling updates

Of course, the man in red, Rolf Brink (Promersion), was going to feature in this section. He continues to forecast that immersion will mature in the next 2-3 years, with cold plate well and truly in place today and supporting many large production workloads.

Liquid cooling is well over a $1B market.

Amy Short (Denvr) and Andy Young (Asperitas) front a presentation focused on safety and process maturity, covering immersion cooling requirements, what's being done and how to navigate this next-gen solution. Andy shows off a total cost of ownership (TCO) model that Allison leads, providing transparency and educating decision makers on the actual costs of deploying immersion.

And I believe I spy Jon Summers asking questions at the end, and in many other sessions!

Additionally, Amy (Denvr) and Peter Short (Submer) share the work they are doing on guidelines for dealing with immersion fluids: how to clean up spills, environmental considerations, etc.

Big Signal Integrity updates from Andy (Asperitas).  I'm super pleased with the amount of R&D that has gone into plopping servers in fluid and creating standards around the correct fluids to use.

As Andy details, electromagnetic impedances are introduced that impact the ability to transmit data between components.

Mohamad Hnayno from OVHCloud discusses the work that OVH have done on developing a hybrid Immersion Liquid Cooling technique in Europe, using both direct to chip cooling as well as immersion fluid, and he brought receipts: plenty of testing data was shared, which is super interesting.

Sustainability

Sammy Nachimuthu from Intel is back (I covered his talk at last year's Global Summit), discussing the need to move on from the current, antiquated metric of power usage effectiveness (PUE) and proposing an improved Infrastructure Utilization Efficiency (IUE), diving into optimizations like incorporating DC-to-DC power losses.
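For anyone who hasn't worked with PUE, here is a minimal sketch of why it can miss losses like these. PUE is total facility energy divided by IT equipment energy, so anything lost inside the server (DC-to-DC conversion included) still counts as "IT load". The numbers and the `delivered_to_silicon_kwh` helper below are hypothetical and only for illustration; this is not the IUE formula from the talk.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def delivered_to_silicon_kwh(it_equipment_kwh: float, dc_dc_efficiency: float) -> float:
    """Hypothetical helper: energy left after in-server DC-to-DC conversion."""
    if not 0 < dc_dc_efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return it_equipment_kwh * dc_dc_efficiency


# Made-up figures: two sites with identical meter readings.
facility_kwh, it_kwh = 1500.0, 1000.0

# Both sites report the same PUE...
print(pue(facility_kwh, it_kwh))

# ...even though different in-server conversion efficiency (assumed values)
# means a different amount of energy actually reaches the chips.
print(delivered_to_silicon_kwh(it_kwh, 0.90))
print(delivered_to_silicon_kwh(it_kwh, 0.95))
```

Both sites score the same PUE even though one wastes more energy inside the server, which is the kind of blind spot a broader metric would need to cover.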

Quantum

Perhaps I missed the Quantum memo from last year's Global Summit, but there appeared to be more talks on Quantum, at least to my untrained eye. Stay tuned for something special (and different) from me regarding Quantum.

For a great introduction to Quantum, check this video out…

Then watch Unmesh Sahasrabuddhe's (of Universal Quantum) great talk about the current state of Quantum, and what is needed to scale.

Hint: it's lots of qubits and $$$.

Other interesting videos:

  • Updates to the Open Rack v3 (ORv3), with the HPR (High Power Rack) suffix, upgrading the ORv3 rack from 18kW to 92kW!

  • Building a datacenter for Hyperscalers?  OCP has a 120-question checklist to validate your readiness and be certified to improve utilization and increase speed to market.

  • Great video from Andrew from Meta about how Optics are important in AI Clusters, sharing an interesting perspective on how RAM bandwidth and interconnect bandwidth (PCIe and NVLink) lag behind the performance improvements of GPUs. Notably, the latest GPUs and interconnects (NVLink5, PCIe6) are missing; however, the trend is still the same.

  • Gilad discusses why the network defines the data center, demonstrating that network bandwidth for AI needs to be architected for peaks, not averages. Peaks typically hit line rate. Whilst NVIDIA owns/sells InfiniBand, they also have an Ethernet solution (Spectrum-X) that adds some (not all) InfiniBand features to improve performance.

The OCP folks (hi Rob Coyle) pull together so many like-minded folks, almost exclusively volunteers. It's great to see the fruits of so much invested labor.
