PRACE SoHPC: Wee Archie project overview

Author: Guest blogger
Posted: 12 Aug 2019 | 11:02

Caelen Feller, a PRACE Summer of HPC (SoHPC) student working with Wee Archie, gives us an overview and status report of his project.

As I have said previously, I’m working with Wee Archie, EPCC's mini supercomputer, this summer. Wee Archie is made of 16 Raspberry Pis – each a very small, lightweight computer – connected together to work as a single machine. Each Pi has an attached LED panel, allowing me to display what is happening on that node and how it is communicating with the others. I have created a series of tutorials and simple demonstrations to be run on Wee Archie which explain the basics of message passing using MPI to a complete novice in parallel computing, and a non-expert in computers in general.

I am currently working on a coastline management program, which simulates flood and wave barriers. I am using the LEDs to make it very obvious to a non-expert in computing how this happens in parallel, and I will also package this and other existing demos into a web-based interface, where my tutorials will be displayed.

Wee Archlet prototyping

To begin, I assembled a “Wee Archlet” – a simpler version of the Wee Archie system – using only four Raspberry Pis, so I understood exactly how Wee Archie was put together.

Setting up a Raspberry Pi is much like setting up any Linux computer: you can do it graphically, or over the network. But I won’t go into that as the interesting parts of the configuration are how they share information, and how they show it.

Setting up MPI on a Raspberry Pi is fairly easy, and allows the machines to start communicating as soon as they are on one network and all know where to find each other. You can see below the small Ethernet switch I use to connect them all to a local network.
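To give an idea of what "knowing where to find each other" means in practice: MPI launchers such as Open MPI's mpirun accept a hostfile listing the nodes and how many processes each can run. The hostnames below are invented for illustration, not the Archlet's actual ones.

```
# hostfile: one line per Pi on the local network
# (slots=4 because each Pi has a quad-core processor)
pi01 slots=4
pi02 slots=4
pi03 slots=4
pi04 slots=4
```

A job is then launched across the cluster with something like `mpirun -np 16 -hostfile hostfile ./demo`.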

An interesting and very convenient aspect of the setup is a shared drive. Raspberry Pis each have their own SD card to store files on, but it’s possible with Raspbian (the operating system they run) to share a drive over the network. This means they can all access the same storage space as fast as the network allows, making it easy to share data, programs, and output between Pis. EPCC has a very good guide on how to build your own, if you’re curious about more details!


Running MPI code on the Archlet is as easy as running it on any other cluster once it’s set up, but showing how the nodes communicate is not a typical thing to want to do. My LED panels are Adafruit LED backpacks, and while you can communicate with them in C, it’s far more pleasant to do so via Python. Adafruit provides a library handling the hardware end of things, such as bus communication, which you can find here. Then I can talk to the Python script from C, making it as simple as calling a slightly modified MPI function!
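To sketch what the C-to-Python hand-off might look like: the modified MPI wrapper on the C side emits one short line per communication event, and a Python script turns each line into an animation request. The "send/recv src dst" line format below is an invention for illustration, not the project's actual protocol.

```python
# Hypothetical bridge between the C MPI wrapper and the Python display code.
# Each event arrives as one text line, e.g. "send 0 1" for rank 0 -> rank 1.
def parse_event(line):
    """Map one event line from the C side to an animation request."""
    parts = line.split()
    if parts[0] == "send":
        return {"animation": "send", "src": int(parts[1]), "dst": int(parts[2])}
    if parts[0] == "recv":
        return {"animation": "recv", "src": int(parts[1]), "dst": int(parts[2])}
    raise ValueError("unknown event: %r" % line)
```

The C side then only needs to write a line of text per event, which keeps the hardware-facing Python code completely separate from the MPI demo code.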

Animation server

With this Python library I can do simple things, such as this idle animation: I save the animation as a sequence of 8×8 images of only ‘0’s and ‘1’s, load each in turn, and turn on the LEDs corresponding to ‘1’s and off those corresponding to ‘0’s. But handling the more complicated animations needed to show even a simple send and receive required a more involved solution – my animation server.
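The frame format above is simple enough to sketch in a few lines. This is a minimal illustration, not the project's actual code, and the `display.set_pixel` call stands in for whatever the Adafruit library exposes.

```python
def parse_frame(text):
    """Convert an 8x8 block of '0'/'1' characters into a set of lit (row, col) LEDs."""
    lit = set()
    for r, line in enumerate(text.strip().splitlines()):
        for c, ch in enumerate(line.strip()):
            if ch == "1":
                lit.add((r, c))
    return lit

def play(frames, display):
    """Show each frame in turn: '1' pixels on, '0' pixels off."""
    for frame in frames:
        on = parse_frame(frame)
        for r in range(8):
            for c in range(8):
                display.set_pixel(r, c, (r, c) in on)  # hypothetical LED API
```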

On each Pi I run a Python script which starts a web server, built using Flask, a lightweight server framework for Python. Whenever this web server gets a connection with the necessary data, it will tell a separate script (the display server) to play the correct animations. The web server will also take care of talking to the servers on the other Pis if it is necessary. For example in the send and receive animations, each send has to wait for a matching receive to start and vice versa. I’ll describe this further below.

The display server runs in a separate process from the web server. It queues all the requested animations up in memory and works through their playback by converting each into single images, loading them onto the LED panel, and moving on to the next one.

I’ll revisit the animation server in a later post once it’s finalised, and go through the finer details. For example, the display server needs to always be able to receive more animations from the web server, so you have to use one of Python’s many concurrency solutions. I’m currently using processes, but that may change according to need. I might also change the queue-based display server to one that reads animations from a static file on the SD card, allowing for more complicated sequences of animations, replays, etc.

This is a visualisation of a simple C program using MPI, using the animation server discussed above. In the program, the bottom Pi will send a message (the number "1") to the one above it. Then that Pi will pass it on to the one above it, and so on. When you reach the final Pi, it will pass the message back to the first one, completing the “ring”.

This may seem useless, but imagine you need to add together numbers stored on each Pi, the results of some computation run in parallel. This ring is a simple way of doing it – add your own number, pass on the sum. When it gets back to the start, you’re done! There are more sophisticated ways of doing this, but I’ll get to them in a later post.

Collective communications

Collective communications are the other important thing to illustrate in a demo on the basics of parallel computing. These are the initial animations I’ve created to demonstrate these concepts! For context, a collective communication is a one-to-many or many-to-many communication, rather than the simpler one-to-one communication described previously. Some of the most commonly used collectives are shown above. One-to-many communications need a master Pi, which controls the flow and either gives out or receives all the data. Here, this is always the bottom Pi.

Gather will collect data from every computer in your group and deposit it all on one. Reduce will do the same, but applies an operation to the data as it goes – here it sums everything up. Broadcast will send data from the root to every computer in the group. Scatter will do the same, but rather than sending everything to everyone, each computer gets its own slice of the data.
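The four collectives can be modelled in a few lines of plain Python, with rank 0 as the master (the bottom Pi). These mirror the shape of MPI_Gather, MPI_Reduce, MPI_Bcast and MPI_Scatter only – there is no actual communication here.

```python
def gather(per_rank_data):
    """Root ends up with every rank's data."""
    return list(per_rank_data)

def reduce_sum(per_rank_data):
    """Like gather, but an operation (here +) is applied along the way."""
    return sum(per_rank_data)

def broadcast(root_data, n_ranks):
    """Every rank receives a copy of the root's data."""
    return [root_data] * n_ranks

def scatter(root_data, n_ranks):
    """Like broadcast, but each rank gets only its own slice."""
    chunk = len(root_data) // n_ranks
    return [root_data[i * chunk:(i + 1) * chunk] for i in range(n_ranks)]
```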