Potential research topics
Please contact the specified supervisor for more information.
Supervisor: Mark Parsons
ARM is becoming a serious contender in the race towards Exascale, however in order to properly challenge the status quo the ecosystem must become broader and more mature. HPE have recently announced their first ARM-based HPC platform, the Apollo 70; EPCC will host a 4,096-core Apollo 70 system from early 2018. The PhD students funded through these studentships will be based at EPCC, where they will have direct access to leading supercomputing experts, and the will work closely with HPE to optimise the performance, efficiency and reliability of the ARM software stack. Possible topics for research may include, but are not limited to, performance tuning, runtime optimisation, scientific libraries and programming models. Applicants are free to propose their own areas for research.
Task-parallelism within the Lanzcos eigensolver algorithm
Supervisor: Chris Johnson
The Lanczos algorithm may be used to find eigenpairs (eigenvalues and eigenvectors) of Hermitian matrices. From early on the algorithm was known to suffer from problems due to rounding errors which cause previously found eigenpairs to reappear as the algorithm progresses. We have introduced a new algorithm for finding eigenvalues in specified low-density parts of the spectrum which eliminates these spurious copies. The method uses selective re-orthogonalization and is based upon the LANSO algorithm of Parlett and Scott.
The new algorithm has been implemented in the parallel QCD CHROMA package but there are still a number of areas which need further investigation. For example:
• Could the algorithm benefit from task-based parallelism?
• What is the most efficient way to access large number of vectors created as the algorithm proceeds – should they be stored to disk or recalculated?
• Could the presently serial tri-diagonal solver used within the algorithm benefit from parallelisation?
• How can the code be made to run most efficiently on manycore processors?
To investigate these it is likely that a new generic implementation of the algorithm would be needed to give the greatest flexibility.
Performance portable programming models
Supervisor: Kevin Stratford
Many scientific applications are extremely complex and have very long development times. They are almost all written in "traditional" programming languages such as C/C++/Fortran. This has meant that few have adapted gracefully (or at all) to increasingly heterogeneous high performance computing infra-structures incorporating highly vectorised many-core architectures, graphical processing units, and occasionally more exotic hardware such as digital signal processors and field programmable gate arrays. The challenge is therefore the development of programming models which allow scientific application developers to write portable and efficient applications without excruciating pain.
Many scientific applications achieve a disappointingly low performance compared with the peak performance hardware can deliver. This is a waste of energy and money. Can improved programming models, compilers, and tools help this situation? Can such performance be achieved in a portable manner on different types of hardware? What types of hardware are most cost-effective? Can programming models express a move away from a "bulk synchronous" execution to something more directed toward the asynchronous execution of independent tasks?
The aim of the project is to help develop parallel programming models for real scientific applications which can address the issue of "performance portability". This might include development of the model itself, tools to aid analysis of performance portability, and implementation of programming models on new hardware platforms.
Resource allocation management on Exascale systems
Supervisor: Michele Weiland
HPC systems use job schedulers to manage the allocation of work to resources. The principal aim of the scheduler is twofold: ensure that the system resources are used as much as possible, and ensure that users’ jobs progress through the system in an acceptable time scale. Job schedulers need to adhere to constraints, which are implemented in scheduling policies.
With HPC systems getting ever larger, and likely more heterogeneous, scheduling is becoming a more and more complex optimisation problem. The job throughput rate is also no longer the sole optimisation target for schedulers; power and energy limits that may be imposed on the system, or on the jobs, also come into play.
This research project will investigate novel techniques for optimised resource allocation management on Exascale systems, both for traditional HPC usage and more data centric applications.
Software engineering studies of software development in research
Supervisor: Neil Chue Hong
Software underpins the majority of current research. However relatively little research has been undertaken to identify how different software engineering approaches are used or may apply in the development of software within universities and research organisations, where incentives and practices are different from industry.
A PhD project in this area would be expected to look at gathering empirical data to investigate hypotheses around differences in practice, efficiency of approaches, understanding of incentives, or categorisation of project types and practices based on maturity of the software and its funding.
Study of the impact of software in research
Supervisor: Neil Chue Hong
Software underpins the majority of current research. However it is only recently that data has started to be collected on its impact and role, driven by the alt-metrics movement. Software is often in the form of libraries which make it hard to assess the true impact.
A PhD project in this area would be expected to examine current approaches to assessing the impact of software, and propose novel methods and tools to better identify the impact of software. This may include topics including: software citation and discovery, integrating data from research impact studies, defining authorship credit, and automatic identification of software used in research.
Scalable global load redistribution
Supervisor: Michael Bareford
Typically, HPC codes distribute the workload by assigning an identically-sized section of grid space to each parallel process; this approach, although suitable for simulations where the workload is inherently uniform, such as a uniform particle distribution, will not perform well should the workload possess marked spatial variations. Hence there is a need to develop alternative schemes, whereby processes may be assigned differently-sized sections of grid space in order to balance workload, eg, number of particles. Dividing the grid according to some property requires a curve that visits every grid cell: furthermore, any two adjacent points on the curve must always refer to neighbouring grid cells. The Hilbert space-filling curve meets these requirements, it allows us to represent a multi-dimensional space as a one-dimensional function.
However, balancing the workload in this way gives rise to another problem, where to store the data mapped by the space-filling curve such that any process can access any part of it. Storing such information on a single node inevitably limits the grid space that can be modelled. The aim of this project is to implement and test a scalable global load redistribution algorithm that requires only O(log p) computation and O(1) local storage. The key parts of this algorithm are a parallel prefix operation and a distributed binary search, which are good candidates for using a single-sided message passing or some other PGAS approach (eg UPC or CAF).
Addressing performance unpredictability with asynchronous tasks
Supervisor: Mark Bull
The design of many HPC applications is based on the fundamental assumption that the same computation on different data will always take the same amount of time. This allows applications to use static load balancing, which greatly simplifies their development. However, future trends in hardware design, including advanced power-saving features, may undermine this assumption.
One way to overcome this potential problem is to use asynchronous task-based programming models, which automatically balance the load at runtime. Some of the questions this project could address are:
• How much performance unpredictability is there in current hardware and what are its characteristics?
• Can we model this unpredictability?
• How unpredictable can hardware get before asynchronous task implementations are worthwhile?
Improving programmability of micro-core architectures
Supervisor: Nick Brown
Micro-core architectures are very many simple, low power, low memory cores placed on a single (often very cheap) chip. There are numerous examples such as the 1024 core Epiphany V, 256 core Knupath Hermosa, 256 core Kalray Bostan and the XMOS XCore. This means significant parallel performance at very low power and there is interest in using these not only for traditional HPC but also in other fields such as the embedded space. However currently they are difficult to program and take advantage of, not least because each technology is different, with its own idiosyncrasies and they each present different low level interfaces to the programmer.
One of the key questions is how to make these more accessible to potential programmers, who might not be HPC experts, such that those in machine learning. We have implemented a very low memory version of Python, ePython, which allows people to quickly and easily write highly parallel Python codes for the Epiphany and we have demonstrated its use in a number of areas including machine learning for detecting lung cancer in CT scans.
This research project would involve tackling some of the challenges that our current work has identified. For instance how we can best optimise high level interpreted languages for these micro-core architectures, the best way different applications can take advantage of this new technology and novel programming language abstractions for easing and/or standardising the porting of codes to micro-cores.
Node level MPI
Supervisor: Adrian Jackson
MPI is designed and implemented to be a process centric memory-to-memory data transfer approach. However, when scaling up to very large systems, with hundreds of thousands or millions of processes, efficient implementation is very hard. It also does not map well to current hardware, where large numbers of cores are grouped in shared
This research project would investigate implementing node level MPI functionality, enabling a single (or small group) of MPI enabled processes to undertake communication outside a node whilst collaborating with a larger group of processes within a node. The aim is to allow MPI programs to scale up to very large systems whilst still maintain good performance and not changing the fundamental features of the library.