New SGI system at EPCC: Ultra!

Author: Maciej Olchowik
Posted: 26 Aug 2014 | 17:37

Last month EPCC added a new supercomputer to its portfolio. Working in collaboration with the Digital Health Institute Scotland, we have acquired an SGI UV2000 system. Unlike many of our existing HPC resources, Ultra (its DNS name) is not a cluster: a single Linux operating system controls all 512 computing cores and 8TB of memory. This offers researchers many advantages and opens up new possibilities. Suddenly we can run a large code without complex parallelisation!

SGI’s history is a choppy one. In the 90s the company pioneered the use of supercomputers and was synonymous with cutting-edge technology. However, sales began to decline, partly because Beowulf clusters were becoming more popular. In 2009 SGI was purchased, and so rescued from bankruptcy, by Rackable Systems, a US manufacturer of traditional x86 clusters. The logo changed, as did the meaning of the “I” in the name, but the company survived. To sum up, SGI has been around for 30 years and has always been present in the HPC arena. HPC is also at the core of what EPCC does, and we are delighted to be able to work with SGI again.

Today, the company’s flagship offering is the Ultra Violet 2000 supercomputer. It uses SGI’s proprietary NUMAlink network technology, the glue that makes it possible to present discrete hardware to the operating system as one big entity. In my opinion, however, the most interesting aspect of our system is how it handles data. Ultra is equipped with three different storage tiers:

  1. Fast IS5500 SAS disk array (120TB) for running jobs
  2. Infinite Storage Gateway appliance (192TB) for general use
  3. Spectra Logic T380 tape library (500TB) for long-term archiving.

In addition, should it become necessary, there are plans to connect the system to EPCC's Research Data Facility (up to 25PB capacity).

The Infinite Storage Gateway (ISG) is an interesting piece of technology in itself. Currently, only two hierarchical storage management systems are in widespread use in datacentres around the world: IBM’s Tivoli Storage Manager (TSM) and SGI’s Data Migration Facility (DMF). The ISG that we have here is basically a DMF appliance in a 4U box. It comes pre-installed, is plug and play, and is driven by a web-based interface that greatly simplifies DMF administration.

All the data tiers will also be continuously monitored by TrustedEdge software, which promises to automatically migrate to tape any data that has not been accessed for a given period of time, releasing valuable space on the primary SAS disk array. TrustedEdge is Windows-based software and we run it on a virtual machine here. Finally, with such a large amount of data, we have also purchased a license for LiveArc indexing software to help make sense of all the information stored on the system.
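To make the idea of age-based migration concrete, here is a minimal Python sketch of how candidates for migration could be selected by last access time. It is only an illustration of the general policy, not how TrustedEdge or DMF actually work. The function name `find_idle_files` and its parameters are hypothetical, and it assumes the filesystem records access times (mounts using `noatime` will not).

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_idle_files(root, max_idle_days, now=None):
    """List regular files under root not accessed for more than max_idle_days.

    Hypothetical helper for illustration only. Returns (path, size_in_bytes,
    last_access_time) tuples, least recently accessed first.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    idle = []
    for path in Path(root).rglob("*"):
        # Skip directories and symlinks; only real files hold data to migrate.
        if path.is_symlink() or not path.is_file():
            continue
        info = path.stat()
        if info.st_atime < cutoff:
            idle.append((path, info.st_size, info.st_atime))
    # Sort by last access time so the coldest data comes first.
    idle.sort(key=lambda item: item[2])
    return idle
```

Summing the sizes returned for, say, a 90-day threshold gives a rough estimate of how much space a policy like this could release on the fast disk tier.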

To complete the picture and ensure fair resource allocation, we have deployed the PBS Pro batch job scheduler, which works well with SGI systems thanks to its support for cpusets. This feature is essential for keeping HPC jobs properly separated (both CPU cores and memory) on a big SGI system.
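As a rough illustration of what this separation looks like from inside a job, the Python sketch below reads the CPUs and memory nodes the Linux kernel allows the current process to use, as reported in the `Cpus_allowed_list` and `Mems_allowed_list` fields of `/proc/self/status`. The helper names are hypothetical, and the sketch assumes only that the job's processes run inside a cpuset. It says nothing about how PBS Pro creates or manages that cpuset.

```python
def parse_id_list(text):
    """Expand a kernel-style list such as '0-3,8' into a sorted list of ints."""
    ids = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def allowed_resources(status_path="/proc/self/status"):
    """Return (cpus, memory_nodes) that the current process may use.

    Hypothetical helper: reads the Cpus_allowed_list and Mems_allowed_list
    fields that Linux exposes for every process.
    """
    fields = {}
    with open(status_path) as status:
        for line in status:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    cpus = parse_id_list(fields.get("Cpus_allowed_list", ""))
    mems = parse_id_list(fields.get("Mems_allowed_list", ""))
    return cpus, mems
```

Calling `allowed_resources()` from within a batch job should show only the cores and memory nodes assigned to that job, whereas an unconfined login shell would typically see the whole machine.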

There is no doubt that our Ultra system is an interesting one that packs in a lot of new technology. Now that the acceptance tests have been completed and the system is up and running, our next challenge is to put it to good use and do some real science on it. Let’s hope the system lives up to its name!

Author

Maciej Olchowik, EPCC