Cooling is becoming so central to data centers that it’s changing the relationship between the data center owner, chip maker, power source and the company that’s managing the thermal load, Mauro Atalla, senior vice president and chief technology and sustainability officer at Trane Technologies, told Facilities Dive in an interview.
When a single server rack can generate up to 200 kilowatts of thermal load — in some cases as much as 600 kW, and soon as much as a megawatt — the technology is nearing the point where thermal management is inseparable from data center system management, said Atalla, who joined Trane last year from Collins Aerospace.
It’s “really important for companies to connect at multiple levels,” said Atalla, who’s been overseeing collaborations between Trane and chip makers on reference designs that integrate cooling with other data center technologies. The designs are model thermal systems that colocation providers and other data center owners can adopt.
“Every colocation provider is going to go through [this thermal management] journey,” he said.
Among the company’s recent designs are reference designs for AI facilities, developed in collaboration with NVIDIA for gigawatt-scale data centers. The designs, which integrate with NVIDIA’s Omniverse DSX Blueprint, include Trane chillers, coolant distribution units, or CDUs, heat exchangers, pumps and control sequences.
“Built on the NVIDIA Omniverse Blueprint, [the designs] enhance sustainability and optimize operations,” Trane says on its website.
Atalla said direct-to-chip liquid cooling CDUs, which deliver a water-glycol mixture to the chips, are becoming essential given the power and speed at which chips operate. “You are going to need liquid cooling — there is no way around it,” he said.
At a conference in late April, Atalla and Phill Lawson-Shanks, chief innovation and technology officer at Aligned Data Centers, said cooling equipment today effectively operates as part of the compute system itself, which makes managing the thermal load inseparable from managing the data center as a whole.
“You have to start thinking about the whole building as a cohesive system,” said Lawson-Shanks, whose company designs and builds data centers, “not separate layers reacting to each other.”
“The heat source drives everything,” Atalla said at the conference.
With liquid cooling, there’s no margin for a power interruption, Atalla and Lawson-Shanks said.
Unlike conventional air-cooled systems, which can ride through a power loss for several minutes before reaching their thermal limit, liquid-cooled systems have only a few seconds. For that reason, backup power storage that engages immediately is essential. That imperative effectively makes elements of energy management part of the thermal system, they said.
“With liquid … you need immediate continuity on the mechanical side,” Lawson-Shanks said.
Given the integration that’s needed for the power, cooling and chips to work together, the industry is moving toward more connected control. But it’s not there yet, Atalla said.
“Ideally, you want to have your cooling system be able to react to the ups and downs of computing demand,” he said, which would mean having the thermal management provider connected to the operations system.
Atalla likened the data center industry to aerospace in its need for seamless collaboration, because the final product is only as good as the integration of its component parts.
“These [hyperscalers and other data center owners] are making billion-dollar investments and they need to make sure everybody will deliver,” he said. You “have [to have] the relationships when things don’t go as planned.”