The EPCC seminars listed here are open to everybody, and take place 2–3pm unless otherwise advertised.
If you would like to present at one of our seminars please contact Arno Proeme.
Challenges in Data-Intensive HPC Software
Adrian Tate (Cray)
Thursday June 7th, 2pm, James Clerk Maxwell Building, room 6206
Cray is a world leading provider of supercomputers, analytics platforms, storage systems and the software stack that enables each of these to operate efficiently. The Cray EMEA Research Lab has a focus on the latter, software aspect. As the technologies Cray builds converge towards a single platform capable of performing HPC, AI and analytics, the scope and role of the software stack changes considerably. The lines between the vendor software and community software become more blurred and the traditional functional roles (e.g. compiler, library, tools) become inadequate to provide performance, productivity and operational efficiency. In particular, the interest in data-intensive computation and workflows connecting such data-intensive components challenges the traditional software environment provided by vendors.
In this talk we will describe activities in our research team in the areas of explicit data movement, memory abstraction, asynchronous runtimes and monitoring frameworks that all address different challenges of the converged platform. We will also describe some more futuristic HPC-related investigations.
Students may be interested in some of the challenges and unsolved problems described in the talk, which could form the basis for Cray-supported internships or projects.
Turbo-charge your browser with WebAssembly
James Perry (EPCC)
Wednesday June 27th, 2pm, James Clerk Maxwell Building, room 4325A
WebAssembly is a new browser technology with the potential to enable more performant and efficient client-side web applications. Most major browsers are now shipping with WebAssembly support. In this seminar I will give a brief introduction to WebAssembly and walk through how to get started with it. I will also be demonstrating a practical application that we have ported to WebAssembly: an acoustic ray tracer developed last year as part of the A3 project, which aims to develop fast and accurate acoustic simulation tools for architects.
ARCHER CRAY XC30 Compute Node Stripdown
Martin Lafferty (CRAY)
Wednesday May 9th, 2pm, James Clerk Maxwell Building, room 4325A
Users of supercomputers are often presented with a black box into which they log in and run their applications, which are sometimes exceedingly large and complex. The ARCHER CRAY XC30 supercomputer consists of 26 cabinets, contains 4920 compute nodes and hence nearly 120,000 compute cores, connected with a large array of various interconnections.
This talk gives users a chance to see the hardware that is part of the system they use. A presentation on the physical architecture of a CRAY XC30 will be followed by a rare chance to see and touch the components involved. This will include a full strip-down of a CRAY compute module down to the individual subassemblies, CPUs, etc.
"Why is MPI so slow?"
Daniel Holmes (EPCC)
Thursday May 3rd, 2pm, James Clerk Maxwell Building, room 4325A
The MPICH team published a paper entitled “Why Is MPI So Slow?” at SC17. They describe some important optimisation work inside MPICH but there are also several flaws and misconceptions. In this seminar, I will cover the good, the bad, and the ugly aspects of this particular paper and sketch out a road-map for holistically addressing the question posed in its title.
Prediction and characterization of low-dimensional structures of antimony, indium and aluminum
Materials research is a key factor in the advancement of technology. The discovery, analysis and, finally, commercialization of new materials enable society to cope with technological challenges, economic problems, and ecological issues. Current trends in technology impose several prerequisites on devices if they are to be used: small dimensions, low price, greater efficiency, and better properties.
The largest share of modern technology belongs to the fields of electronics, energy and optics, with applications ranging from nanomedicine to astronautics. Increasing the number of chemical elements used increases the number of components in the devices (transistors, batteries, purifiers). Therefore, it is a natural requirement to have smaller dimensions for these components.
Recent research includes 2D materials. Today, most attention is focused on single-layer materials made of only one type of atom, on transition metal dichalcogenides with general formula MX2, where M is a transition metal (e.g. Mo, W, Ti, Zr, Ta, Nb) and X is a chalcogen (e.g. S, Se, Te), and on carbides and/or carbonitrides of early transition metals (known as MXenes).
In this talk, single-layer (2D) allotropic modifications of the elements antimony, indium and aluminum will be proposed: antimonene, indiene and aluminene, respectively. The existing research and plans for future research will be shown, along with preliminary results.
In the last few years, the machine learning community has focused primarily on developing AI approaches known as deep learning. Perhaps everyone has heard about the TensorFlow framework and its application to the Google Translate service. Deep learning is a powerful tool; however, it is still a black box: explaining the trained models and analyzing the nature of the networks remain hard open problems. Therefore, conventional machine learning techniques are still popular for solving particular problems where an explanation of the model is desired.
For classification challenges, Support Vector Machines (SVMs) are widely used across different scientific disciplines, including geo and environmental sciences, bioinformatics, and computer vision. Implementations of SVM solvers exist for graphics cards, shared-memory systems and Xeon Phis; however, no HPC solution exists. Therefore, my colleagues from the PERMON team and I decided to develop PermonSVM.
In my talk, I will introduce the early stage of PermonSVM development, briefly summarize the theoretical background of SVMs, and describe approaches for solving multiclass and multilabel problems and for transforming the SVM model into probability space.
Theory and Simulation of Time-Resolved X-ray Scattering Experiments
Modern pulsed X-ray sources permit time-dependent measurements of dynamical changes in molecules via non-resonant scattering. The planning, analysis, and interpretation of such experiments, however, require a firm and elaborate theoretical framework as well as advanced numerical simulations.
We have derived appropriate expressions that describe the time-resolved X-ray scattering signal by means of quantum electrodynamics and implemented them in a simple algorithm. Their evaluation requires different input, most notably scattering matrix elements that we compute with our own code from wave functions obtained with commercial quantum chemical software. Since these calculations involve the optimisation of several electronic eigenstates with high-level methods and large basis sets for hundreds of points in nuclear coordinate space, the computational costs are significant even for small systems and HPC resources as provided by the EPCC are necessary.
In my talk I will summarise the main aspects of our theory and simulations, highlight their challenges, and illustrate key points with results from our current research.
Performance Portability with Kokkos: An Introduction
Kevin Stratford (EPCC)
Wednesday March 28th 2018, James Clerk Maxwell Building, room 6206
The issue of performance portability - that is, being able to write code which runs effectively on different architectures - is an important one for scientific applications. In this talk I will give an introductory overview of Kokkos, a C++ library developed by Sandia National Labs in the US which addresses performance portability at the node level.
I will try to assume no prior knowledge of specific features of C++, and explain what is required along the way.
I will discuss the central Kokkos idea of a parallel pattern. This is combined with an execution policy and a definition of the computational kernel to provide a level of abstraction which can be compiled to run on different architectures (typically including CPU and GPU). The common parallel patterns of "for" and "reduction" are used as examples, and compared with OpenMP. Memory abstraction and hierarchical parallelism involving thread and vector levels will also be covered.
All the material here is derived from a recent workshop. Kokkos source, tools, and tutorial material are available at: https://github.com/kokkos
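As a rough illustration of the pattern / execution policy / kernel separation described above, the following is a conceptual sketch in Python. It is not Kokkos code (Kokkos is a C++ library) and it is not taken from the workshop material. The names `RangePolicy`, `parallel_for`, `parallel_reduce` and `run_example` are hypothetical stand-ins chosen for this sketch, and the thread backend is there only to show that the same kernel can be dispatched in different ways, not to demonstrate performance.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class RangePolicy:
    """Hypothetical execution policy: an index range plus where/how to run it."""
    begin: int
    end: int
    backend: str = "serial"  # "serial" or "threads" (assumed names)
    workers: int = 4


def _chunks(policy):
    # Split [begin, end) into at most `workers` contiguous index ranges.
    n = policy.end - policy.begin
    step = max(1, -(-n // policy.workers))  # ceiling division
    return [range(lo, min(lo + step, policy.end))
            for lo in range(policy.begin, policy.end, step)]


def parallel_for(policy, kernel):
    """'for' pattern: apply kernel(i) to every index in the policy's range."""
    def run(chunk):
        for i in chunk:
            kernel(i)

    if policy.backend == "serial":
        run(range(policy.begin, policy.end))
    else:
        with ThreadPoolExecutor(max_workers=policy.workers) as pool:
            list(pool.map(run, _chunks(policy)))  # list() waits for completion


def parallel_reduce(policy, kernel):
    """'reduction' pattern: sum kernel(i) over every index in the range."""
    def run(chunk):
        return sum(kernel(i) for i in chunk)

    if policy.backend == "serial":
        return run(range(policy.begin, policy.end))
    with ThreadPoolExecutor(max_workers=policy.workers) as pool:
        return sum(pool.map(run, _chunks(policy)))


def run_example(backend):
    """Run the same two kernels under the chosen backend."""
    n = 1000
    x = [float(i) for i in range(n)]
    y = [1.0] * n
    policy = RangePolicy(0, n, backend=backend)

    def axpy(i):
        # Kernel body: each index writes only its own element of y.
        y[i] = 2.0 * x[i] + y[i]

    parallel_for(policy, axpy)
    total = parallel_reduce(policy, lambda i: x[i] * y[i])
    return y, total


if __name__ == "__main__":
    for backend in ("serial", "threads"):
        _, total = run_example(backend)
        print(backend, total)
```

The point of the sketch is that `axpy` and the dot-product lambda are written once, and only the policy changes between runs.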
An introduction to The Data Lab innovation centre
Brian Hills, Richard Carter, Matthew Higgs, Caterina Constantinescu (The Data Lab)
Wed January 31st 2018, James Clerk Maxwell Building, room 4325A
The Data Lab has been created to deliver economic and social impact to Scotland by catalysing data innovation across the country.
In this seminar Brian (Head of Data) will present an overview of The Data Lab’s focus and impact to date across the three pillars of collaborative innovation, skills and community. Richard, Matt and Caterina (our Data Science team) will present recent projects they have been working on.
The Data Lab will be moving into the Bayes building this year with EPCC and others. The objectives of the session will be to both share knowledge on our work and catalyse further collaboration in the future.
Progressive load balancing of asynchronous algorithms
Justs Zarins (Centre for Doctoral Training in Pervasive Parallelism, EPCC and Informatics, University of Edinburgh)
Wed November 8th, 2017, James Clerk Maxwell Building, room 4325A
Synchronisation in the presence of noise and hardware performance variability is a key challenge that prevents applications from scaling to large problems and machines. Using asynchronous or semi-synchronous algorithms can help overcome this issue, but at the cost of reduced stability or convergence rate. In this paper we propose progressive load balancing to manage progress imbalance in asynchronous algorithms dynamically. In our technique the balancing is done over time, not instantaneously.
Using Jacobi iterations as a test case, we show that, with CPU performance variability present, this approach leads to a higher iteration rate and lower progress imbalance between parts of the solution space. We also show that under these conditions the balanced asynchronous method outperforms synchronous, semi-synchronous and totally asynchronous implementations in terms of time to solution.
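For readers unfamiliar with the test case, the sketch below contrasts a synchronous Jacobi iteration with a simple asynchronous variant in which one half of the unknowns is updated less often, mimicking a slow core. It is only an illustration of what "progress imbalance" means. It does not implement the progressive load balancing technique from the talk, and the test system, the two-block split and the `slow_block_period` parameter are assumptions made for this example.

```python
def build_system(n):
    """Tridiagonal, strictly diagonally dominant system; exact solution is all ones."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 4.0
        if i > 0:
            A[i][i - 1] = -1.0
        if i < n - 1:
            A[i][i + 1] = -1.0
    b = [sum(row) for row in A]  # equals A multiplied by a vector of ones
    return A, b


def jacobi_sweep(A, b, x, rows):
    """Compute new values for `rows`, reading only from the current vector x."""
    new = {}
    for i in rows:
        s = sum(A[i][j] * x[j] for j in range(len(b)) if j != i)
        new[i] = (b[i] - s) / A[i][i]
    return new


def synchronous_jacobi(A, b, iterations):
    """Every unknown is updated once per iteration from the previous iterate."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        updates = jacobi_sweep(A, b, x, range(len(b)))
        x = [updates[i] for i in range(len(b))]
    return x


def asynchronous_jacobi(A, b, steps, slow_block_period=3):
    """Two blocks share x; block 1 only updates every `slow_block_period` steps.

    Each block reads whatever values are currently in x, which may be stale
    for the other block. Returns the solution and per-block iteration counts.
    """
    n = len(b)
    blocks = [range(0, n // 2), range(n // 2, n)]
    x = [0.0] * n
    local_iters = [0, 0]
    for step in range(steps):
        for k, rows in enumerate(blocks):
            if k == 1 and step % slow_block_period != 0:
                continue  # the "slow" block skips this step
            for i, value in jacobi_sweep(A, b, x, rows).items():
                x[i] = value
            local_iters[k] += 1
    return x, local_iters


if __name__ == "__main__":
    A, b = build_system(8)
    x_async, iters = asynchronous_jacobi(A, b, steps=90)
    print("iterations per block:", iters)
    print("max error:", max(abs(v - 1.0) for v in x_async))
```

The difference between the two entries of `local_iters` is one simple measure of the progress imbalance that the abstract refers to.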
Potholes in the Amazon (Cloud) - AWS Pipelines for the IoT
Alistair Grant (EPCC, University of Edinburgh)
Wed October 11th, 2017, James Clerk Maxwell Building, room 4325A
Road surface potholes can cause problems for all road users, so how do we detect them and prioritise their repair? We will take a look at some of the Amazon Web Services (AWS) technologies that we have been using as part of a data engineering project to build a prototype backend system for data collection, querying and analysis of pothole detection.
We will look at DynamoDB (a NoSQL database service), Amazon Lambda Functions, API Gateway and possibly a few others. We highlight some of the strengths and weaknesses of these technologies by examining them in the context of our example use cases.
Graph-based problems and the SpiNNaker neural HPC architecture
Dr Alan Stokes (Advanced Processor Technologies group, School of Computing Science, University of Manchester)
Wed September 6th 2017, James Clerk Maxwell Building, room 4325A
This talk highlights two of the many issues high performance computers will have to tackle to reach an exascale machine - power and data communication - and how these problems are starting to be solved. We discuss how software applications will need to be adapted for the solutions to these problems and then describe the SpiNNaker hardware platform and its synergies with the solution for HPCs. We then walk through a simple application mapped from standard C code onto SpiNNaker, and its performance. We end with options on how to acquire access to SpiNNaker hardware and training.
SpiNNaker is a novel computer architecture inspired by the working of the human brain. A SpiNNaker machine is a massively parallel computing platform, targeted towards three main areas of research:
• Neuroscience. Understanding how the brain works is a Grand Challenge of 21st century science. We will provide the platform to help neuroscientists to unravel the mystery that is the mind. The largest SpiNNaker machine will be capable of simulating a billion simple neurons, or millions of neurons with complex structure and internal dynamics.
• Robotics. SpiNNaker is a good target for researchers in robotics, who need mobile, low power computation. A small SpiNNaker board makes it possible to simulate a network of tens of thousands of spiking neurons, process sensory input and generate motor output, all in real time and in a low power system.
• Computer Science. SpiNNaker breaks the rules followed by traditional supercomputers that rely on deterministic, repeatable communications and reliable computation. SpiNNaker nodes communicate using simple messages (spikes) that are inherently unreliable. This break with determinism offers new challenges, but also the potential to discover powerful new principles of massively parallel computation.
MONC: an LES for cloud and atmospheric modelling
Dr Nick Brown (EPCC, University of Edinburgh)
Wed August 30th 2017, James Clerk Maxwell Building, room 4325A
For the past three years I have been working with the Met Office on the Met Office NERC Cloud model (MONC). This replaces a thirty-year-old model which has been a crucial tool for UK weather and climate communities but which exhibited significant issues around performance, scalability and the code itself. Our replacement, MONC, has been written from scratch, maintaining the science of the previous model but with modern software engineering and parallelisation techniques. The aim has been to enable the scientists to study vastly larger systems, at far higher accuracy over many cores. In addition to computation, scientists also want to perform analysis on the raw data in order to generate higher level information. This is a challenge because the raw data is very large (many TBs), so it is not realistic to write it out to file and analyse it offline. Instead the analysis is performed in-situ on the data as it is generated, which raised several challenges that we had to solve. I will talk about both these aspects of MONC, as well as some of the offshoot work that we have looked at such as porting and evaluating aspects of the model on GPUs and KNLs.
Experiences from EPCC's first MOOC: Supercomputing
Dr David Henty (EPCC, University of Edinburgh)
Wed July 26th 2017, James Clerk Maxwell Building room 4325A
As part of PRACE (Partnership for Advanced Computing in Europe), EPCC ran its first ever MOOC (Massive Open Online Course) in March this year. The 5-week course used the FutureLearn platform - www.futurelearn.com/courses/supercomputing - which hosts many other Edinburgh MOOCs including the Higgs course from SoPA. In this short informal talk I will cover the history of the course, the process of designing our first MOOC, features of the FutureLearn platform and experiences from the first run in March. I will also compare and contrast MOOCs with other online teaching such as the HPC distance-learning courses we run as part of the DSTI (Data ScienceTech Institute) MSc programme. *Note: the next run of the MOOC starts August 28th - register now!*
Solar Panel detection in Satellite Images using Deep Learning
Marc Sabate (EPCC, University of Edinburgh)
Wed July 12th 2017, James Clerk Maxwell Building room 4325A
Deep Learning models have become very popular with the release of libraries such as TensorFlow, Torch, or Theano, which allow deep networks to be trained in a reasonable amount of time. In this talk I will present how a Convolutional Neural Network can be used to detect solar panels in Satellite Images.
This talk will start with a brief overview of binary classification problems using Logistic Regression. We will see how Logistic Regression models are built under the assumption that classes are linearly separable, and how Neural Networks can overcome this limitation. I will provide a definition of Convolutional Neural Networks, a particular type of Neural Network specifically designed for Image Processing problems, and I will finally present a network that successfully detects solar panels in satellite images from four cities in California.
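The linear-separability limitation mentioned above can be seen on a toy problem. The sketch below, which is not the speaker's model and involves no convolutional network or satellite data, trains a plain logistic regression by gradient descent on the AND function (linearly separable) and on XOR (not linearly separable). It then shows that a network with one hidden layer can represent XOR, using hand-chosen rather than trained weights. The learning rate, epoch count and weights are assumptions made for this example.

```python
import math

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]  # linearly separable
XOR_LABELS = [0, 1, 1, 0]  # not linearly separable


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(points, labels, lr=1.0, epochs=5000):
    """Batch gradient descent on the mean logistic loss, starting from zeros."""
    w1 = w2 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in zip(points, labels):
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def accuracy(model, points, labels):
    """Fraction of points whose thresholded prediction matches the label."""
    w1, w2, b = model
    hits = sum(
        int((sigmoid(w1 * x1 + w2 * x2 + b) > 0.5) == bool(y))
        for (x1, x2), y in zip(points, labels)
    )
    return hits / len(points)


def step(z):
    return 1 if z > 0 else 0


def two_layer_xor(x1, x2):
    """One hidden layer with hand-set weights: XOR = OR and not AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)


if __name__ == "__main__":
    for name, labels in (("AND", AND_LABELS), ("XOR", XOR_LABELS)):
        model = train_logistic(POINTS, labels)
        print(name, "logistic regression accuracy:", accuracy(model, POINTS, labels))
    print("hidden-layer XOR outputs:", [two_layer_xor(*p) for p in POINTS])
```

No single straight line separates the XOR classes, so a linear model cannot classify more than three of the four points correctly, while the hidden layer combines two linear boundaries to separate them.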
It is all still an ExaHyPE
Dr Tobias Weinzierl (Department of Computer Science, Durham University)
Wed June 28th 2017, James Clerk Maxwell Building room 4325A
ExaHyPE (http://www.exahype.eu) is an H2020 project where an international consortium of scientists writes a simulation engine for hyperbolic equation system solvers based upon the ADER-DG paradigm. Two grand challenges are tackled with this engine: long-range seismic risk assessment and the search for gravitational waves emitted by rotating binary neutron stars. The code itself is based upon a merger of flexible spacetree data structures with highly optimised compute kernels for the majority of the simulation cells. It provides a very simple and transparent domain specific language as front-end that allows users to rapidly set up parallel PDE solvers discretised with ADER-DG or Finite Volumes on dynamically adaptive Cartesian meshes.
This talk starts with a brief overview of ExaHyPE and demonstrates how ExaHyPE codes are programmed, before it sketches the algorithmic workflow of the underlying ADER-DG scheme. We rephrase steps of this workflow in the language of tasks.
We then focus on a few methodological questions: how can we deploy these tasks to manycores, what execution patterns do arise, and are the new OpenMP task features of any use? How can we rearrange ADER-DG's workflow such that we reduce accesses to the memory, i.e. weaken the pressure on the memory subsystem? How can we reprogram the most expensive tasks such that they exploit the wide vector registers coming along with the manycores? A brief outlook on MPI parallelisation wraps up this methodological talk.
We focus on results obtained on Intel KNL nodes provided by the RSC Group, on Intel Broadwell results from Durham's supercomputer Hamilton, and on results from the SuperMUC phase 2 supercomputer at Leibniz Supercomputing Centre.
This is joint work with groups from Frankfurt's FIAS, the University of Trento, as well as Ludwig-Maximilians-University Munich and Technical University of Munich.