From the video EPCC: A Tradition of Excellence
Alan Simpson: This talk is about the range of systems that EPCC has had and the different eras in EPCC history, with a few slides of analysis at the end.
The intention isn’t to give a history of all the machines that have ever been in parallel computing, or in Edinburgh, but simply to pick out some highlights, a top ten if you like, of the systems that we have had, what we have done with them, and where they have sat in the overall scheme of things.
The beginning of EPCC was really the initial interest in parallel computing in Edinburgh at the beginning of the 1980s. It wasn't just our beginning; it was really the beginning of computer simulation, and the beginning of parallel computing in a serious way. It was a very interesting time.
The very first parallel machines at the University of Edinburgh were two ICL DAPs (Distributed Array Processors), each with a performance of some 20 to 30 megaflops. They were very interesting: they were among the very earliest production parallel computers. In fact, only about ten of them were made before the technology was taken on by another company. Each one had roughly the same compute power as a mobile phone.
All the ICL DAPs ever made put together had the same compute power as a fairly modest, cheap second-hand laptop. When you look at the raw numbers, they are clearly not all that impressive compared with what people expect of technology today. But at the time these machines were earth-shattering, and it is hard now to appreciate how much of an impact they made. They reignited a whole series of activities, at the University of Edinburgh and elsewhere, into how these facilities could be exploited for a new type of computational research, a new type of science.
Probably the biggest step at Edinburgh was the introduction of our first Meiko computer service. This brought a substantial boost in performance, more than an order of magnitude and nearly two, and consequently a large increase in the number of users and applications able to exploit parallel computing. The machine was called the Edinburgh Concurrent Supercomputer, and the project associated with it was the immediate precursor to EPCC; in that sense it was the father of EPCC. It was based on transputer technology developed in the UK and was, unfortunately, one of the last such British machines we had. We had a couple of UK-built machines in all.
We move now to the 1990s; the beginning of the 1990s saw the birth of EPCC. This year, 2005, we have been celebrating our 15th anniversary, and there have been a number of events associated with the EPCC@15 logo and concept. These machines were obviously incredibly important to where we are today. The first really relevant machine here was the i860 machine, also made by Meiko. It had a peak performance of some 5 gigaflops, based on i860 processors: again an order-of-magnitude step forward. From a computational perspective, one of the interesting things was that it was split between two large user groups. One was in quantum chromodynamics (QCD), a branch of particle physics, and the other was in materials, looking at the properties of the things around us. It was very much focused on the grand challenge concept, which evolved into what in the UK today is known as capability computing, or elsewhere as high-end computing. The QCD code that ran on this machine was hand-coded in assembler to squeeze out as much performance as possible, and it sustained a gigaflop, which today is what you might get from a well-tuned program on your PC. But you have to remember that at the time it was one of the fastest application codes in the world, one of the very first to sustain that level of performance.
The other machine around at the birth of EPCC was the Connection Machine, designed and built by Thinking Machines Corporation. In fact this was the CM-200, with very similar performance, but it had 16,000 simple bit-serial processors backed up by 512 floating-point units; again, a pretty substantial machine. In many ways it was one of the biggest general-purpose machines of its architecture that has ever been in the UK. This is the associated state-of-the-art DataVault, which stored 10 gigabytes. If you had a lot of money you could get four of these, which made a complete circle about the size of a room. That would have cost a great deal, and it held approximately the same amount of storage as your laptop does today. A lot of technology, but not as much storage as you would like.
This was a fantastic machine: everyone liked it and it was easy to use. Unfortunately that sort of technology has not really been continued in general-purpose computing; instead we have seen much more of a move towards machines based on the Meiko style and, later on, the Cray style, which I am just about to talk about.
The middle ages for us were the mid-1990s. The flagship machine we had then, and one of the key machines EPCC has been involved with, was the Cray T3D. It was installed in 1994, and it was the first time in the UK that the national computing service ran on a parallel computer. It was a big step forward not just for Edinburgh and for EPCC as the host and support provider for this facility, but for the UK as a whole. For many people it was not just a step sideways but a major investment in re-engineering codes: a lot of codes that had previously run on vector machines or serial machines had to be moved across to parallel computers.
I think we were fortunate that the Cray T3D was widely regarded as one of the most successful parallel computers: it was popular and easy to use, and its performance was excellent. While it gave us some challenges, it was critical to the uptake of parallel computing in the UK. It was followed by a Cray T3E, which was also increased in size over time and was hosted here. The systems we had provided a variety of services for different communities, with different amounts of memory, so it was a pretty flexible machine. Many people regard the T3E as one of the most successful parallel architectures and implementations there has ever been, a tremendously successful machine both here and worldwide.
The new millennium brought yet another series of vendors and architectures. The first of these was the Sun 15K, called Lomond, which is still the major provider of the University of Edinburgh High Performance Computing Service. We have had a series of SMP clusters from Sun and have been providing a service on them since 1997/98, but this is the major one and it is in use today. The other most important system we had in the early part of this millennium was the IBM p690 cluster called HPCx. It has a peak performance of 11 teraflops and is made up of 1,600 IBM POWER4 processors. This is the system that currently delivers the UK's premier HPC service, so most of the major capability computing users in the UK make use of it. The consortium that runs it consists of the University of Edinburgh through EPCC, our colleagues at Daresbury Laboratory, and IBM as our technology provider. At various times it has been one of the largest academic supercomputers in Europe. It has been upgraded and will be upgraded again, and as other machines are upgraded too, we swap places on the list; but we are pretty high up, and we are optimistic that it will remain a leading supercomputing facility until 2008 or so.
It has been a very exciting 12 months for the University of Edinburgh and for EPCC. We have recently launched a new building with computer rooms and a number of new facilities. The launch was carried out in July 2005 by HRH Prince Philip, Chancellor of the University of Edinburgh. One of the machines launched then was QCDOC, which stands for "QCD on a chip". It was designed by a group of academics based at Columbia University and the University of Edinburgh, in collaboration with IBM as the vendor. The chip was designed to have what you need for QCD particle physics and not a lot else, so it is not as flexible as some machines. But it delivers a lot of performance: the same as you get out of HPCx, or maybe a little more. It has 14,000 CPUs; these are not as fast individually, but they are very good value, and they produce fairly little heat in a compact package.
It is currently one of the largest special-purpose machines in the world; in fact, by most measures it is one of the largest machines in the world, as is HPCx.
Just at the turn of the year, we also got an IBM Blue Gene. This single cabinet contains 2,048 PowerPC processors. The technology is very similar to QCDOC, the two projects are very closely related within IBM, and there has been a strong amount of linkage and synergy throughout their history. We were pleased to get the first Blue Gene installed in Europe and one of the first in the world. The nice thing about it is its low power requirement, so you can get a lot of processors in the box. It is also capable of extreme scaling. Ours is a modest-sized system, at just over 5 teraflops peak, but there are systems in the USA of 100 teraflops or more, one of which is currently number one on the TOP500 list of the largest computers in the world. Ours is not quite as big as that, but it is still a very substantial resource and a major opportunity for scientists at the University of Edinburgh to do high-quality capability science.
If you have high-quality systems, you need a high-quality modern environment to house them. We have recently launched the ACF, the Advanced Computing Facility. It is just a few miles south of EPCC, and the intention is to house a number of current systems and, looking forward, systems in the future. There are two computer rooms, each of just under 300 square metres. One is fully kitted out with power and cooling and is just over half full of systems. The other remains fairly empty and gives us substantial room for expansion over the next few years. So we have a lot of systems now, but we are looking ahead to the future and the opportunities it might bring, and a facility like the ACF lets us take these things forward over the next few years as well.
I didn't want to spend a lot of time, and bore everyone, by going through all the different facilities we have had over the years. There is a whole bunch of them that I haven't had a chance to mention; you will see a whole lot of different vendors and a whole lot of architectures, quite an interesting spread. Many bring back happy memories, and some sad ones as well.
Having talked about the top ten, however, I wanted to show a couple of slides to extract some of the key trends we have seen and to summarise what the last quarter of a century has been like for EPCC and the University of Edinburgh.
We have been fortunate to see a consistent new generation of machines every few years; you would be lucky to get five or six useful years out of most of these facilities, so it is important to reinvigorate the technology on a regular basis. The early machines were quite difficult to program and, by modern standards, not really that powerful. But it is important to remember that at the time they were the most powerful facilities available and a major opportunity for novel computational research. One trend I haven't had time to discuss in any detail is how much the tools and environments have improved. With many of the early machines you had to learn new languages and new operating systems; there has been much more standardisation in recent years. In terms of architecture, the early machines were very much one-offs: dedicated architectures for high-performance computing, very different from the sort of computers people had on their desktops or used as mainframes elsewhere.
From the middle of the 1990s, clusters became very popular, whether clusters of servers, SMP servers, or PC systems such as Beowulf clusters. But it has been nice that over the last few years we have seen a variety of architectures available. As well as clusters, we have seen some special-purpose machines, a number of vector machines coming back into serious high-performance computing, and a number of very massively parallel systems such as Blue Gene and QCDOC.
One of the interesting trends for EPCC has been in the number of processors. Most of our parallel machines have had between 100 and 10,000 processors, and that has been pretty consistent over time. When we speak to people, they often say how much bigger machines have got in recent years. But when I look back to the early days of EPCC, we had two 4,000-processor machines and our first more general-purpose machine had 400 processors, and today we have machines ranging from 1,000 to 10,000 processors. So there has been some growth in the size of these systems, but not an enormous amount. That is because EPCC has always been focused on parallel computing; that has been our ethos and the technology we have worked with throughout. Other computer centres, by contrast, have moved into parallel computing only recently, and when they look back 25 years they are looking at mainframes or serial machines.
Generally, what you get from this is that parallel performance is somewhere between 100 and 1,000 times greater than that of serial processors. That puts you about ten years ahead of the serial curve; if you compare over time it varies, but on average that is pretty much the trend. In other words, science you can do now by exploiting parallel computing would have to wait about ten years on serial computers.
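The "ten years ahead" figure can be checked with a little arithmetic. The sketch below assumes serial performance doubles every 18 months (the Moore's Law rate quoted in the talk) and asks how many years a serial machine would need to match a given parallel speedup; the function name `years_ahead` is purely illustrative.

```python
import math

def years_ahead(speedup: float, doubling_time_years: float = 1.5) -> float:
    """Years a serial machine needs to catch up with a given parallel speedup,
    assuming (hypothetically) that serial performance doubles every
    doubling_time_years."""
    # Number of doublings needed to cover the speedup, times years per doubling.
    return math.log2(speedup) * doubling_time_years

for s in (100, 1000):
    print(f"{s}x speedup is roughly {years_ahead(s):.1f} years ahead of serial")
```

Under this assumption a 100x advantage works out to roughly ten years and a 1,000x advantage to roughly fifteen, which is consistent with the "about ten years" rule of thumb at the lower end of the range.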
I plotted this fun graph of the peak performance of the key systems at Edinburgh over the last quarter of a century. There is a well-known law, Moore's Law, which suggests that the performance of computers doubles roughly every 18 months, and I have plotted that in the lighter colour. You can see that EPCC has generally managed to stay on or above the Moore's Law trend for quite a long period of time. There have been a number of big boosts along the way, but we have been quite successful at that. Most of the systems I have discussed have been in the top 20 or so worldwide. They have not always been the world's fastest by any means, but they are pretty good examples of where the state of the art has been, and we have managed to sustain that over a very long period. We reached gigaflops for application codes around 1990, we were running teraflop codes on HPCx in 2002, and looking ahead, it is not all that long until we get petaflops for sustained application codes. It will be interesting to see what new frontiers of science that will open up.
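As a back-of-the-envelope illustration (not an analysis from the talk), the two milestones quoted above, sustained gigaflops around 1990 and sustained teraflops in 2002, imply a doubling time that can be compared with Moore's 18 months. The sketch below computes it and naively extrapolates each rate to sustained petaflops; the helper names are hypothetical, and treating 1990 as an exact year is an assumption.

```python
import math

def implied_doubling_time(perf_start: float, perf_end: float, years: float) -> float:
    """Doubling time (in years) implied by growth from perf_start to perf_end."""
    return years / math.log2(perf_end / perf_start)

def years_to_reach(target: float, current: float, doubling_time: float) -> float:
    """Years needed to grow from current to target at a fixed doubling time."""
    return math.log2(target / current) * doubling_time

GIGAFLOP, TERAFLOP, PETAFLOP = 1e9, 1e12, 1e15

# Sustained application milestones quoted in the talk (1990 treated as exact).
t_edinburgh = implied_doubling_time(GIGAFLOP, TERAFLOP, 2002 - 1990)
print(f"Implied doubling time 1990-2002: {t_edinburgh:.2f} years")

# Naive straight-line extrapolation from teraflops in 2002 to petaflops.
for label, t in (("Edinburgh trend", t_edinburgh), ("Moore's Law (18 months)", 1.5)):
    year = 2002 + years_to_reach(PETAFLOP, TERAFLOP, t)
    print(f"{label}: sustained petaflops around {year:.0f}")
```

A thousandfold increase in twelve years corresponds to a doubling roughly every 1.2 years, faster than the 18-month Moore's Law rate, which matches the observation that EPCC has tracked on or above that trend line.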
Just to summarize, my message is that we have managed to stay at the forefront of parallel and high-performance computing for a quarter of a century. I think that is an achievement. When EPCC was first involved in this technology it looked to be a fairly challenging, novel and therefore risky technology.
Currently we run, with our collaborators, HPCx, the UK's premier national HPC service. Within the last 12 months we have launched two major capability systems, Blue Gene and QCDOC, and we have been able to house them in a state-of-the-art computer facility, the ACF, which also has significant room for expansion.
The future appears extremely exciting; we have got a lot of good technology today, a lot of attractive things to look forward to. Not just in terms of the systems but in terms of the science that will come out of those systems.
Thank you very much for your time, and thank you for listening.