Nvidia’s new Blackwell GPUs are running into significant problems in data center environments, The Information reports. Nvidia’s customers have raised concerns about these AI accelerators, in particular overheating problems that have delayed the rollout of AI training server racks.
The Blackwell architecture serves as the foundation for both Nvidia’s upcoming AI accelerators and the anticipated RTX 50-series graphics cards. Deployment of the B100 and B200 GPUs had already been postponed because of earlier design flaws, despite substantial orders from major tech companies like Meta, Microsoft, and Google.
The core issue plaguing the data centers is the high density of 72 AI accelerators packaged in a single server rack, which has resulted in overheating problems. According to Reuters, Nvidia has repeatedly instructed its suppliers to modify the server rack designs in an attempt to mitigate these thermal issues.
Blackwell is a pivotal development for Nvidia. It underpins the company’s next generation of GPUs, which are meant to secure its lead over competitors like AMD. Team Red has made strides with its MI300X AI accelerator, which is already in use in data centers, and is preparing to roll out the MI325X.
Nvidia claims that Blackwell can run large language models at up to 25 times lower cost and energy consumption than its previous Hopper architecture, and that it speeds up LLM inference by as much as 30 times. Packing that much additional performance into each rack could exacerbate the heat-related challenges that data centers are already facing.
There are also potential repercussions for the RTX 50-series GPUs. Although the RTX 4090 is highly efficient in gaming, it drew criticism for its power consumption and for overheating power connectors. Speculation suggests that the RTX 5090 may push power demands as high as 600 watts, and recent confirmations indicate that Nvidia will continue using the 12V-2×6 connector, a revision of the 12VHPWR design linked to melting incidents with the RTX 4090.
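To put the rumored 600-watt figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that all of the power flows through a single 12V-2×6 connector and that its six 12 V pins share the load evenly, which real cables do not guarantee. The function name and its defaults are illustrative, not taken from any specification.

```python
def current_per_pin(power_watts: float,
                    rail_volts: float = 12.0,
                    power_pins: int = 6) -> float:
    """Average current per 12 V pin, assuming perfectly even load sharing.

    The even split is an idealization; a poorly seated plug can push
    more current through some pins than others.
    """
    total_amps = power_watts / rail_volts  # I = P / V
    return total_amps / power_pins


if __name__ == "__main__":
    watts = 600  # rumored upper bound for the RTX 5090 (speculative)
    print(f"{watts} W -> {watts / 12.0:.1f} A total, "
          f"{current_per_pin(watts):.2f} A per pin")
```

Under these assumptions, 600 watts works out to 50 amps in total, or a little over 8 amps per pin, which illustrates why connector seating and contact quality matter at this power level.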
Gamers will not be fitting 72 RTX 5090s into a single system, so overheating in a personal computer is a problem of a different magnitude than in a data center. Even so, if the Blackwell architecture struggles in a data center setting, this could signal trouble for Nvidia’s consumer desktop graphics lineup.
For now, observers are awaiting further developments, with Nvidia slated to unveil its RTX 50-series GPUs in January during CES 2025. Recent reports indicate a reduction in production of the existing RTX 40-series cards, likely in preparation for the next generation.