Computing Keeps Its Cool
The lower floors of SLAC's Building 50 house more than 3,500 computer and data systems. This collection of servers, batch systems and hard-disc arrays serves the lab's computing needs, from e-mail and file storage to large-scale simulations and data processing. All of that compute power generates a lot of heat—enough to melt about 580 one-ton blocks of ice a day—and relies on the circulation of chilled water to keep the air below circuit-blowing temperatures. So when it was time to replace the 40-year-old cooling pipelines, minimizing disruption to computer services required tight coordination among SLAC's computing, electrical, and heating, ventilation and air conditioning crews. Completed January 12, the cooling upgrade was a product of careful planning, hard work, cooperative weather and a hearty dash of quick, creative thinking.
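For readers who like to check such figures, the ice comparison translates into average power with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not from the article itself; it assumes US short tons (about 907 kg) and the standard latent heat of fusion of ice, 334 kJ/kg.

```python
# Back-of-the-envelope check of the "580 one-ton blocks of ice a day" figure.
# Assumed values (not from the article): US short ton and latent heat of fusion.

KG_PER_TON = 907.0                # US short ton, in kilograms
LATENT_HEAT_J_PER_KG = 334_000.0  # heat needed to melt ice at 0 degrees C
SECONDS_PER_DAY = 86_400.0

ice_kg_per_day = 580 * KG_PER_TON
heat_joules_per_day = ice_kg_per_day * LATENT_HEAT_J_PER_KG
average_power_watts = heat_joules_per_day / SECONDS_PER_DAY

print(f"{average_power_watts / 1e6:.1f} MW")  # roughly 2 MW of waste heat
```

By this rough accounting, the server room sheds on the order of two megawatts of heat around the clock, which is why losing chilled water even briefly is a serious matter.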
A network of pipes circulates water from the Building 23 chilled water plant to cool the air in Buildings 50, 137 and others. Since the system was built, the lab has grown and computing needs have increased, straining its cooling capacity. Last year, crews replaced the system's main, 8-inch chilled water pipes with 10-inch pipelines, running along and under Loop Road, to support a much-needed increase in flow. (See "Cool Upgrades.") But hooking the new pipes into the cooling system would require a temporary shutdown of the chilled water and, to avoid heat overload, most of the computers in Building 50's vast server room. To minimize the impact on the lab, a team of Scientific Computing and Computer Services, Facilities and Operations staff did months of careful planning to schedule an intense nine days of coordinated effort, just after the winter break.
One critical question was how many systems could remain running while Building 50 relied on outside air for cooling. Using historical January temperatures and computer manufacturers' specifications, SCCS Technical Operations Manager John Weisskopf estimated the heat load for different scenarios, tracking the results on a map of the data center, color-coded to show expected heat levels, on his desktop computer. His estimates showed that the majority of systems would have to be shut down.
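The kind of scenario estimate described above can be sketched in a few lines: sum each group of systems' rated power draw (nearly all of which ends up as heat) and compare the total against what outside air can carry away. Everything in this sketch is illustrative; the rack names, wattages, and the cooling budget are assumptions, not figures from SLAC.

```python
# Hypothetical sketch of a heat-load scenario estimate: compare the total
# power draw of the systems left running against an assumed outside-air
# cooling budget. All names and numbers below are illustrative only.

racks = {
    "file-servers": 120_000,    # watts, from manufacturer specifications
    "batch-nodes": 400_000,
    "raid-arrays": 90_000,
    "core-services": 60_000,    # e-mail, DNS, etc.
}

OUTSIDE_AIR_BUDGET_WATTS = 300_000  # assumed capacity of January outside air

def heat_load(running):
    """Total heat load, in watts, for the set of rack groups left running."""
    return sum(racks[name] for name in running)

# Scenario: keep only the top-priority systems online.
scenario = ["file-servers", "core-services"]
load = heat_load(scenario)
print(load <= OUTSIDE_AIR_BUDGET_WATTS)  # True: this scenario fits the budget
```

Running the comparison for each candidate scenario shows quickly which combinations of systems could safely stay up, which is the same conclusion the article reports: most systems had to go dark.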
The next question was what to keep running. SCCS worked with computer users lab-wide to identify top-priority systems to be kept online during the changeover. (See "Computing Service Outages During Cooling System Upgrades.") All others were shut down, starting on Sunday, January 4.
"We took down altogether over 1800 systems," said SCCS project manager Len Moss. "Quite a few of them were file servers, which are a lot more complicated to take up and down—many were attached to RAID arrays." These data arrays in particular are prone to fail when cycled off and on, Moss noted, so the team made an effort to keep them running.
With the computers safely shut down, contract teams led by Facilities project manager Harry Shin shut off the chilled water system and began work on the pipes. They first drained, then cut the old steel pipes before fitting joints to connect the new ones. Three crews worked to connect both ends at once—one end entering the back of Building 50, the other at Cooling Tower 101. Inside Building 50, four-story sections of vertical pipe required special bracing while the connections at their base were severed, then welded to join the new pipes. Near the cooling tower, new pipe sections and elbows came together with jigsaw-like precision. "This was a nice bit of pipe fitting," Shin said. "Everything there is at angles, and not easy to measure."
Though the weather remained cool, temperatures in Building 50 rose too high without the chilled water. The HVAC team had installed ventilated, grate-style doors on the second floor, with fans pulling cool outside air into the building. Portable cooling units blew warm air away from hot spots near large server racks. Ceiling venting fans pumped the heated air outside. To ease the heat load, SCCS shut down the RAID data arrays, but the temperature continued to rise. So the HVAC team, led by Sven Jensen, did some quick thinking. They reversed the door fans to blow air out rather than in, then used the building air conditioning system, blowing up through the floor, to increase air pressure in the server room. The effect was to force hot air out through the doors instead of pulling cool air in. It worked. The temperature stabilized for the rest of the week.
The pipe work was complete the morning of Friday, January 9. Shin's team gradually brought up the water pressure, expelling air from the pipes through release valves until only water flowed through the system. After a thorough check for leaks turned up none, the cooling system was back in business that afternoon. SCCS was able to bring the computer systems back online Monday the 12th, a day ahead of schedule and with only a handful of casualties: disc drives that failed and have since been replaced. The team even managed to fit in an added repair over the weekend, replacing a computer room fan motor that had worn bearings.
The well-planned project both increased cooling capacity and provided a much-needed backup: a spare set of valves installed in Building 50's chilled water circulation. The valves will make it possible to connect a temporary portable chiller to keep Building 50 and its server room cool during any future outages or upgrades. "Even with the current improvements we continue to be at the edge of what can be done with power and cooling in Building 50," said SCCS Acting Core Services Manager Randy Melen, "while we expect SLAC's demands for computing capacity to continue to grow."