INTERTWinE project presented at collaboration workshop in Japan
Posted: 23 Jan 2018 | 09:11
Last month I attended a collaboration workshop in Japan between the Centre for Computational Sciences (CCS) at the University of Tsukuba and EPCC. I was talking about the INTERTWinE project, which addresses the problem of programming-model design and implementation for the Exascale, and specifically our work on the specification and implementation of a resource manager and directory/cache.
The resource manager enables different runtimes to work together and share resources fairly. For instance, a code might wish to take advantage of multiple programming technologies, either by explicitly combining them in the user's code or implicitly by calling a library which itself makes certain assumptions about the resources available. The danger is that resources become oversubscribed, for example too many threads are spawned, and this results in a loss of performance. The resource manager therefore marshals resources at a high level: it not only distributes them statically but also supports dynamic manipulation, such as runtimes lending resources to other processes while they are idle.
The directory/cache is designed to support execution of tasks over distributed-memory machines transparently to the programmer. Traditionally, task-based models have been limited to a single memory space, but this is really an implementation challenge in the runtime rather than any specific limitation of the paradigm. Efficiently moving data between nodes is the critical challenge here, and our directory tracks which data is physically allocated to which memory space in order to support reading and writing either locally or remotely.

The cache is an optimisation: a specific piece of data might be used frequently, so ideally we issue communications only once to retrieve it and then maintain a local copy for as long as possible. As I say, this is entirely transparent to the programmer, as the directory/cache is intended to be integrated with the runtime. Our reference implementation also contains a number of transport mechanisms, which means that GPI-2, MPI RMA and BeeGFS can be swapped in and out trivially as the technology for physically moving the data around.