Maintaining critical chiller plant at the Advanced Computing Facility
8 February 2024
The Advanced Computing Facility houses the high performance computing services operated by EPCC, including ARCHER2 and the Edinburgh International Data Facility. To maximise efficiency and minimise energy use, we cool the systems we host using a variety of methods, including direct air cooling, semi-direct water cooling, and direct water cooling.
To get the best efficiency, assured performance and longevity from our mechanical equipment, we need to carry out the correct inspection and maintenance regime at the correct frequency. To mitigate the risks of this type of work, we schedule such tasks for winter, when we are less reliant on our equipment because we can use low outside air temperatures to supplement our mechanical cooling.
This critical chiller plant provides the mechanical cooling required on a 24/7 basis to support the operation of our Computer Rooms, which constantly generate heat as our systems consume electrical energy. The chiller systems are designed to operate at high levels of efficiency, using free cooling from low outside air temperatures when possible, and they require regular maintenance to ensure continued, assured operation.
As you would expect, these chillers are maintained regularly to the manufacturer's recommendations, but the plant has recently undergone its first specialist internal inspection and clean of the condenser tubes. This is a fairly simple but long and arduous piece of work, which requires disassembly of some of the larger parts of the machine to provide the necessary internal access.
The condenser tubes circulate water inside the chiller, carrying heat from the chiller to the roof of the ACF, where the water is continuously cooled by large heat exchangers (radiators) before being returned to the chiller. Over time these tubes can become dirty or suffer from wear, so they need to be cleaned and inspected to ensure the free flow of water and to identify any potential issues before critical problems occur.
Access to each machine was provided by removing the local pipework and heavy end caps once the internal water/glycol mix had been decanted and retained for later reuse. The next step was to clean each of the many 3-metre-long tubes individually with long brushes until each was suitably clean.
This was followed by a visual and then an ultrasonic inspection of each tube wall to determine its thickness and to identify any areas of thinning or concern. All of this is time-consuming but extremely valuable, as a few of the tubes were found to be potential future failures.
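As a rough illustration of how ultrasonic thickness readings might be screened, the sketch below flags any tube whose thinnest reading falls below a minimum wall thickness. The tube IDs, readings and the 0.80 mm threshold are all hypothetical and are not figures from this inspection; the real acceptance criteria would come from the inspection specialist or the manufacturer.

```python
def flag_thin_tubes(readings_mm: dict[str, list[float]], min_wall_mm: float) -> list[str]:
    """Return the IDs of tubes whose thinnest wall reading is below min_wall_mm."""
    return sorted(
        tube_id
        for tube_id, readings in readings_mm.items()
        if readings and min(readings) < min_wall_mm
    )


# Hypothetical readings (mm) taken at several points along each tube.
example_readings = {
    "A01": [1.10, 1.08, 1.12],
    "A02": [1.09, 0.71, 1.05],  # one thin spot
    "A03": [1.11, 1.10, 1.09],
}

# Hypothetical minimum acceptable wall thickness (mm).
print(flag_thin_tubes(example_readings, min_wall_mm=0.80))
```

Screening on the thinnest reading, rather than the average, reflects the fact that a single thin spot is enough to make a tube a potential failure.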
Preventing critical failure
Fortunately, only a handful of tubes were seen to be problematic. These were plugged at each end to remove them from service and, because so few were affected, their loss will not significantly affect the performance of the plant. This very small reduction in cooling capability is far outweighed by the benefit of discovering potential problems before a critical failure occurred.
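To give a sense of scale, the sketch below computes the fraction of condenser tube surface area taken out of service when a few identical tubes are plugged. The tube counts are hypothetical, chosen only for illustration, and the area fraction is a first-order proxy for the loss in cooling capability rather than a measured figure.

```python
def plugged_area_fraction(total_tubes: int, plugged_tubes: int) -> float:
    """Fraction of tube surface area removed by plugging identical tubes."""
    if total_tubes <= 0:
        raise ValueError("total_tubes must be positive")
    if not 0 <= plugged_tubes <= total_tubes:
        raise ValueError("plugged_tubes must be between 0 and total_tubes")
    return plugged_tubes / total_tubes


# Hypothetical figures: 5 tubes plugged out of 400 in one condenser.
fraction = plugged_area_fraction(total_tubes=400, plugged_tubes=5)
print(f"{fraction:.2%} of tube surface area removed from service")
```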
Once the internal works were completed, it was a simple, if arduous, task to refit the end caps and pipework and then refill each machine with the saved water/glycol mix. Each machine was then recommissioned and brought back into service, with the knowledge that we are in a more assured operational position than before the works started.
We can also be assured that the plant will now operate at its best efficiency, which in turn minimises energy consumption, a vital consideration in our operations.