In Back to the Future, Doc Brown leans in and says, "Roads? Where we're going, we don't need roads."
Great Scott!
In AI infrastructure, the opposite is true. Where we're going, we absolutely need roads—and in many cases, the roads are the system.
John Gage of Sun Microsystems said decades ago that "the network is the computer." In the era of distributed AI workloads, that statement has moved from visionary to literal.
When you deploy large GPU clusters for training or inference, performance is determined less by the individual server and more by the fabric that binds everything together. Collective communication patterns, east–west bandwidth, congestion behavior, firmware maturity, and deterministic latency now define system throughput in ways that raw FLOPS never could.
The GPUs and the network fabric are not separate layers. They form a single, tightly coupled machine.
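To see why, consider a minimal back-of-the-envelope sketch. It compares the time a simple data-parallel ring all-reduce would take against a rough per-step compute estimate; the model size, GPU count, peak FLOPS, utilization, and effective per-GPU fabric bandwidth are all illustrative assumptions, not measurements from any particular system.

```python
# Back-of-the-envelope: does the fabric or the GPU set the training step time?
# Every number below is an illustrative assumption, not a vendor specification.

def ring_allreduce_seconds(grad_bytes: float, n_gpus: int, link_gbps: float) -> float:
    """Ring all-reduce moves roughly 2*(N-1)/N of the gradient bytes per GPU."""
    wire_bytes = 2 * (n_gpus - 1) / n_gpus * grad_bytes
    return wire_bytes / (link_gbps * 1e9 / 8)   # Gbit/s -> bytes/s

def compute_seconds(tokens_per_gpu: int, params: float,
                    peak_flops: float, mfu: float = 0.4) -> float:
    """Rough per-step compute time: ~6 FLOPs per parameter per token, at an assumed utilization."""
    return 6 * params * tokens_per_gpu / (peak_flops * mfu)

# Hypothetical cluster: 70B-parameter model, fp16 gradients, 1,024 GPUs,
# 1 PFLOP/s peak per GPU, ~400 Gbit/s effective per-GPU fabric bandwidth.
params = 70e9
comm = ring_allreduce_seconds(grad_bytes=2 * params, n_gpus=1024, link_gbps=400)
comp = compute_seconds(tokens_per_gpu=4096, params=params, peak_flops=1e15)

print(f"all-reduce: {comm:.1f} s   compute: {comp:.1f} s")
print("step time is network-bound" if comm > comp else "step time is compute-bound")
```

With these round numbers the step is network-bound: a faster GPU changes nothing until the fabric catches up.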
The Cost Fallacy
In most AI clusters, networking represents perhaps 10–15% of total capital expenditure. Compared to the GPU line item, that can feel modest, even secondary.
But if that 10–15% is mis-sequenced, under-ordered, thermally constrained, or waiting on optics, then the remaining 85–90% of the investment sits idle. From a token production perspective, that is a brutal outcome. Idle GPUs burn power and depreciation while waiting on collective operations that cannot complete because the fabric is incomplete or unstable.
A relatively small shortfall in networking components can immobilize a vastly larger compute investment. That is not an engineering inconvenience; it is a tokenomics failure.
If your DeLorean has a flux capacitor but no roads, you are not hitting 88 miles per hour.
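To put rough numbers on that, here is a hedged sketch of the tokenomics of an idle cluster. The capex split, depreciation schedule, daily power burn, and 45-day fabric delay are all assumptions chosen for illustration, not figures from a real deployment.

```python
# Rough tokenomics of an incomplete fabric. All figures are illustrative assumptions.

GPU_CAPEX        = 85_000_000    # 85-90% of spend: GPUs, servers, storage (USD)
NETWORK_CAPEX    = 15_000_000    # 10-15% of spend: switches, optics, cables (USD)
DEPRECIATION_YRS = 4             # straight-line depreciation assumption
POWER_COST_DAY   = 40_000        # assumed facility power/cooling burn per day (USD)
DELAY_DAYS       = 45            # assumed fabric delay, e.g. a late tranche of optics

idle_depreciation = GPU_CAPEX / (DEPRECIATION_YRS * 365) * DELAY_DAYS
idle_power        = POWER_COST_DAY * DELAY_DAYS

print(f"Depreciation burned while GPUs wait: ${idle_depreciation:,.0f}")
print(f"Power and cooling burned while GPUs wait: ${idle_power:,.0f}")
print(f"Total cost of a late {NETWORK_CAPEX / (GPU_CAPEX + NETWORK_CAPEX):.0%} line item: "
      f"${idle_depreciation + idle_power:,.0f}")
```

Even under these modest assumptions, a six-week slip on the smallest line item burns millions before the first token is produced.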
The Golden Screw Problem
I often use the golden screw analogy when discussing supply chains. A complex system can be fully assembled, powered, cooled, and technically validated, but if one small, seemingly insignificant component is missing, the entire system cannot ship.
In AI clusters, networking components frequently become that golden screw.
Optics. Transceivers. Cables. Retimers. Qualified spares. Firmware-aligned revisions. None of these make headlines, yet any one of them can hold up an entire deployment. You can have racks of GPUs installed and commissioned, and still miss your revenue window because a tranche of 800G optics slipped or because the specific transceiver revision required for qualification is delayed.
Most organizations are still playing checkers with GPUs, focusing on allocation and headline supply, assuming networking will follow naturally. The more strategic operators are playing chess with the fabric, understanding that switch ASIC lead times, optics allocation, and interoperability validation cycles are often the true gating factors.
The 4D chess move is to bring more of this under direct control: developing upstream supplier relationships, qualifying multiple optics vendors early, carrying intelligent buffer inventory, and treating spares strategy as part of capital planning rather than an afterthought.
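As one example of what "intelligent buffer inventory" can mean in practice, here is a minimal sketch of Poisson-based spares sizing for transceivers. The fleet size, annualized failure rate, lead time, and service level are hypothetical inputs for illustration, not recommendations.

```python
# A minimal sketch of spares sizing for optics, treating sparing as capital planning.
# Failure rates and lead times below are assumptions for illustration only.
from math import exp

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for a Poisson(lam) failure count."""
    term, total = exp(-lam), exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def spares_needed(fleet: int, annual_fail_rate: float,
                  lead_time_days: int, service_level: float = 0.99) -> int:
    """Smallest spare count that covers lead-time failures at the target service level."""
    lam = fleet * annual_fail_rate * lead_time_days / 365
    k = 0
    while poisson_cdf(k, lam) < service_level:
        k += 1
    return k

# Hypothetical: 50,000 installed 800G transceivers, 2% annualized failure rate,
# 90-day resupply lead time, 99% confidence of not stocking out.
print(spares_needed(fleet=50_000, annual_fail_rate=0.02, lead_time_days=90))
```

The exact model matters less than the discipline: spares become a number you budget for, not a surprise you chase.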
Supply Chain as Strategy
There is a harder truth here as well.
Large hyperscalers understand that networking supply is finite, and they act accordingly. It is not unusual for an operator to secure allocation for 10 million switches when it may only need 5 million in the near term, effectively absorbing months of market supply and reducing competitors' ability to scale in parallel.
This isn't panic buying. It is strategic positioning.
In that environment, assuming you can procure networking hardware later in the cycle is optimistic at best. For sovereign AI builds, neo-cloud platforms, and modular data center deployments, networking components can quietly become the critical path. By the time you realize this, the allocation window may already be closed.
Where we're going, we need roads, and sometimes the biggest players have already paved them for themselves.
Cooling, Density, and System Cohesion
As rack densities increase and liquid cooling becomes standard for GPUs, networking silicon is no longer thermally trivial. High-radix switch ASICs and dense 800G port configurations introduce meaningful heat loads that must be modeled within the same thermal envelope as the GPUs.
Designing direct-to-chip cooling for compute while treating networking as an air-cooled afterthought creates asymmetry in the system. Fabric topology, switch placement, manifold layout, and serviceability need to be considered together. Once you accept that the network is the computer, it follows that it must sit inside the same mechanical and thermal design loop.
Otherwise, you risk building a beautifully cooled compute platform constrained by the very roads it depends on.
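A rough sketch of the arithmetic makes the point. The switch, optic, and rack-envelope power figures below are order-of-magnitude assumptions rather than datasheet values, but even so the fabric claims a visible slice of the rack's thermal budget.

```python
# A rough sketch of why networking belongs in the rack's thermal model.
# Power figures are order-of-magnitude assumptions, not datasheet values.

SWITCH_ASIC_W        = 500      # assumed draw for a high-radix switch ASIC and board
OPTIC_800G_W         = 15       # assumed per-module draw for an 800G transceiver
PORTS_PER_SWITCH     = 64
SWITCHES_PER_RACK    = 4        # assumed leaf switches co-located in the rack
RACK_THERMAL_BUDGET  = 120_000  # assumed liquid-cooled GPU rack envelope, in watts

network_watts = SWITCHES_PER_RACK * (SWITCH_ASIC_W + PORTS_PER_SWITCH * OPTIC_800G_W)
print(f"Network heat load per rack: {network_watts / 1000:.1f} kW "
      f"({network_watts / RACK_THERMAL_BUDGET:.0%} of the rack envelope)")
```

Several kilowatts of largely air-cooled heat inside a liquid-cooled rack is exactly the kind of asymmetry that has to be designed for, not discovered.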
Elevating Networking to First-Class Status
The required shift is organizational as much as technical.
Networking must be elevated to first-class status in AI workloads. Fabric readiness should gate deployment sequencing. Optics and switch allocation need to be modeled as strategic risks. Spares strategy should be deliberate and capitalized appropriately. Firmware validation and interoperability testing belong in core platform engineering, not in late-stage integration.
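What "fabric readiness gates deployment sequencing" might look like in practice is sketched below. The specific checks and thresholds are hypothetical and would map onto your own validation suite.

```python
# A minimal sketch of fabric readiness gating GPU rollout.
# The check names and thresholds are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class FabricReadiness:
    optics_on_hand_pct: float       # delivered and qualified optics vs. required
    firmware_rev_aligned: bool      # switches, NICs, and optics on validated revisions
    interop_suite_passed: bool      # interoperability and congestion tests complete
    spares_coverage_days: int       # days of spares coverage at assumed failure rates

def gate_gpu_rollout(f: FabricReadiness) -> bool:
    """GPUs do not enter production sequencing until the fabric clears every gate."""
    return (f.optics_on_hand_pct >= 1.0
            and f.firmware_rev_aligned
            and f.interop_suite_passed
            and f.spares_coverage_days >= 90)

status = FabricReadiness(optics_on_hand_pct=0.92, firmware_rev_aligned=True,
                         interop_suite_passed=False, spares_coverage_days=60)
print("proceed" if gate_gpu_rollout(status) else "hold: fabric is the critical path")
```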
When this discipline is absent, GPUs wait.
And when GPUs wait, token economics erode quickly.
We are no longer assembling servers and connecting them with cables. We are building distributed motherboards at megawatt scale, where fabric, compute, cooling, and supply chain form a single system.
Great Scott, where we're going, we're gonna NEED roads.