Project ExTASY: solving the sampling problem

Author: Iain Bethune
Posted: 15 Oct 2013 | 12:18

Alongside APES, EPCC plays an important role in another project jointly funded by EPSRC and the US National Science Foundation to overcome one of the "Grand Challenges in the Chemical Sciences".

Free Energy Landscape of Alanine-12

ExTASY - or Extensible Toolkit for Advanced Sampling and analYsis - tackles the problem of understanding the behaviour and function of complex macromolecules such as proteins, DNA, and other biomolecules by sampling their motion with molecular dynamics (MD).

The sampling problem

The key problem facing the field is this: to preserve the accuracy of simulations, MD is usually performed using a time step of a few femtoseconds (billionths of a millionth of a second), whereas many events of biological importance such as protein folding, ligand docking and DNA replication occur on timescales of seconds to hours.  Even with state-of-the-art simulation software, HPC, and purpose-built hardware, it is only possible to reach milliseconds of MD, leaving the problems of interest tantalisingly out of reach.

Couple this with the fact that one occurrence of a rare event is not enough - typically scientists would like to observe tens or hundreds of events using MD to gather statistical data - and you can see why this is a grand challenge!
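
To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python.  The 2 fs time step and the figure of 100 events are illustrative assumptions (the text above says only "a few femtoseconds" and "tens or hundreds of events"), and the function name is made up for this example.

```python
from fractions import Fraction

# One femtosecond in seconds, kept as an exact fraction to avoid rounding.
FEMTOSECOND = Fraction(1, 10**15)


def steps_required(target_seconds, timestep_fs=2):
    """Count the MD integration steps needed to simulate target_seconds.

    timestep_fs is an assumed integer time step in femtoseconds.
    """
    dt = timestep_fs * FEMTOSECOND
    return int(Fraction(target_seconds) / dt)


# Timescales mentioned in the text: milliseconds (reachable today),
# seconds and hours (where the interesting biology happens).
timescales = {
    "1 millisecond": Fraction(1, 1000),
    "1 second": 1,
    "1 hour": 3600,
}

for label, seconds in timescales.items():
    print(f"{label}: {steps_required(seconds):.1e} steps")

# Crude assumption: each observed event needs its own full-length
# trajectory, so 100 one-second events cost 100 times as many steps.
assumed_events = 100
total_steps = assumed_events * steps_required(1)
print(f"{assumed_events} events of 1 second each: {total_steps:.1e} steps")
```

Under these assumptions a single second of simulated time already needs a thousand times more steps than a millisecond, before any repeated sampling is taken into account.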

ExTASY's objectives

What the ExTASY project proposes is a three-pronged attack on the problem:

  • Support for high-performance, high-throughput execution of ensembles of MD calculations.  This means managing hundreds or thousands of coupled parallel jobs, each of which may run on hundreds of CPU cores, as well as orchestrating the associated 'big data' movement and interfacing with analysis tools in a heterogeneous execution environment spanning UK (e.g. ARCHER) and US (XSEDE) compute resources.
  • Developing novel analysis tools to allow on-the-fly control of the simulations to bias sampling (in a mathematically rigorous way) towards the rare events or long-timescale motions of a system.
  • Providing a flexible and portable interface to existing highly-tuned MD programs coupled with new algorithms for MD integration with ultra-large time steps.

If we can achieve these three objectives together in a single framework or toolkit - ExTASY - then we will truly make a step change in our ability to compute and understand the dynamics of these complex macromolecular systems.

About the project

The ExTASY project consortium is led by Prof. Cecilia Clementi (Rice U.) and Dr. Charles Laughton (U. Nottingham), both end-users of molecular dynamics and developers of analysis tools.  Prof. Glenn Martyna (IBM) and Prof. Ben Leimkuhler (U. Edinburgh) lead on design and implementation of new MD integration algorithms.  Dr. Panos Parpas (Imperial College) and Prof. Mauro Maggioni (Duke U.) are concerned with improved methods and algorithms for analysis of MD trajectories.  

On the software side, Prof. Shantenu Jha (Rutgers U.) leads on the architecture and implementation of the ExTASY workflow management framework, and I have a cross-cutting role coordinating the overall software development process and ensuring that the various software components are developed in a scalable and sustainable manner.

EPCC's project team

Besides me, the EPCC project team consists of Elena Breitmoser and Toni Collis.  Initially we are working on some improvements to Prof. Clementi's LSDMap analysis code to allow it to integrate into the toolkit, and on understanding the interface between existing MD packages and the new integration algorithms developed by Profs. Martyna and Leimkuhler.

The project will run for three years and kicked off last month with an all-hands meeting in Nottingham, so we expect to be blogging regularly about our progress.  

The image shows a free energy landscape of the Alanine-12 molecule mapped out in two diffusion coordinates determined without a priori knowledge of the system.  Key stable and transition structures are labelled as shown.


Iain Bethune, EPCC