I am a research fellow at EPCC working on several nationally and internationally funded projects. Before that, I worked as a Senior Data Scientist at the British Geological Survey, as a Senior Research Associate in the Data Intensive Research Group at the University of Edinburgh, and as a Research and Teaching Assistant in the Computer Architecture Group at University Carlos III of Madrid.
The main goal of my current research is to bring together data-intensive and high-performance computing. This objective can be broken down into two complementary research topics. The first is to develop adaptive communication techniques that optimise data movement for data-intensive applications at different HPC levels. The second is to facilitate the development of scientific workflows that can be run in many HPC environments while hiding the complexity from users, and to develop adaptive and automated scientific gateways that allow users to share workflows and data and to submit their computational tasks to the most appropriate HPC resource, depending on the characteristics of their tasks and the HPC resources available.
To achieve this objective, my past research has addressed several of these aspects on two different levels:
- Processing layer: Designing and implementing dynamic techniques to provide high-speed, reliable access to remote data. This research targets the reduction of bottlenecks in I/O subsystems by applying locality-aware methods. Another aspect of this research is reducing the volume of data transferred by applying lossless runtime compression: turning compression on and off, and selecting the most appropriate compression algorithm at run-time, depending on the characteristics of each message, network performance, and compression algorithm behaviour.
- Application layer: Designing and implementing new Python libraries and toolboxes for writing scientific applications. This research aims to enable users to write applications that run automatically in parallel across large datasets and visualise the results, without any HPC/Cloud environment-specific requirements and without scientists having to deal with the details of the underlying infrastructure.
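
As an illustration only of the per-message compression idea in the processing layer, the sketch below decides for each message whether to compress it and with which algorithm. It is not the implementation from the research described here: the function names `pack` and `unpack`, the one-byte tag scheme, the `slow_network` flag, and all thresholds are assumptions chosen for the example, and it uses only Python's standard `zlib`, `bz2`, and `lzma` modules.

```python
import bz2
import lzma
import os
import zlib

# Hypothetical codec table: one-byte tag -> (compress, decompress).
CODECS = {
    b"Z": (lambda data: zlib.compress(data, 1), zlib.decompress),  # fast
    b"B": (bz2.compress, bz2.decompress),                          # slower
    b"X": (lzma.compress, lzma.decompress),                        # slowest
}
RAW = b"N"  # tag for messages sent uncompressed

# Assumed thresholds, picked for illustration rather than measured.
MIN_SIZE = 256      # smaller messages are sent as-is
SAMPLE_SIZE = 4096  # bytes probed to estimate compressibility
MIN_SAVING = 0.10   # require at least a 10% reduction on the sample


def pack(message: bytes, slow_network: bool = False) -> bytes:
    """Return a tagged message, compressed only when it looks worthwhile."""
    if len(message) < MIN_SIZE:
        return RAW + message

    # Cheap probe: compress a prefix with fast zlib to estimate the gain.
    sample = message[:SAMPLE_SIZE]
    if len(zlib.compress(sample, 1)) > (1 - MIN_SAVING) * len(sample):
        return RAW + message  # looks incompressible: compression stays off

    # On a slow link, spend more CPU and try every codec; otherwise only
    # the fast one.
    candidates = CODECS if slow_network else {b"Z": CODECS[b"Z"]}
    best_tag, best = RAW, message
    for tag, (compress, _) in candidates.items():
        out = compress(message)
        if len(out) < len(best):
            best_tag, best = tag, out
    return best_tag + best


def unpack(packet: bytes) -> bytes:
    """Invert pack() by dispatching on the leading tag byte."""
    tag, body = packet[:1], packet[1:]
    if tag == RAW:
        return body
    return CODECS[tag][1](body)


if __name__ == "__main__":
    text = b"station=EDI;temp=21.5;" * 500   # highly repetitive payload
    noise = os.urandom(8192)                 # effectively incompressible
    for payload in (text, noise):
        packet = pack(payload)
        assert unpack(packet) == payload
        print(packet[:1], len(payload), "->", len(packet))
```

A real system would presumably base the decision on measured bandwidth and codec throughput rather than a boolean flag and fixed thresholds; the sketch only shows the shape of the decision.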
A list of my most relevant publications can be found at: https://docs.wixstatic.com/ugd/4472a9_e9c4e1ee5e074927b5ba785f8859e3de.pdf.