A public UK HPC knowledge base
Posted: 11 Nov 2019 | 08:49
In this blog post I consider how we (as the UK HPC community) could create a community HPC technical knowledge base that would allow us to share and reuse useful technical information. Many of these thoughts came out of discussions at the HPC Champions meeting that took place on 16 September 2019 alongside the UK RSE Conference 2019 in Birmingham, UK, and from subsequent discussions at the monthly HPC RSE calls.
The UK HPC landscape has changed a lot over the past five years with the introduction of the EPSRC national Tier-2 HPC facilities to supplement the national supercomputing service, ARCHER, and the DiRAC national HPC service. A lot of work has gone into increasing the sharing of information and experience between the different national HPC services, and between them and institutional HPC services, with a great deal of success.
Several groups across the UK successfully bring together HPC systems technical staff, RSEs who support users, and service managers, including HPC-SIG, the Society of Research Software Engineering, UK HPC Champions, the UK HPC RSE Network and the DiRAC and Tier-2 Technical Working Groups (TWGs). EPCC has played a central role in all of these activities. These groups have been very successful at bringing the community together to share information and coordinate activity, but there are still a number of areas where the community could improve.
One area for improvement has been highlighted by a question that has been raised within the current community meetings: How can we share technical solutions and information that we have locally across the community and potentially publicly to the wider HPC community (including users, RSEs and service providers)?
Many services and sites have internal knowledge bases (both private and public) that contain information that would be useful across the community. One option would be to find a way to create a public knowledge base that the community could use to expose this useful information.
Currently, the useful information is stored in a number of different locations with different access:
There is actually a large amount of information already publicly available, for example:
- Compilation instructions for different HPC systems: https://github.com/hpc-uk/build-instructions
- Isambard GW4 Arm benchmarks (including compilation instructions): https://github.com/UoB-HPC/benchmarks
- UCL build scripts: https://github.com/UCL-RITS/rcps-buildscripts
- UK HPC benchmarks: https://github.com/hpc-uk/archer-benchmarks
... and, I am sure, many others I do not know about (which is one of the reasons we are having this discussion!).
What would the requirements for a public UK HPC knowledge base look like? Based on discussions within the community, it should be:
- Publicly visible through a web browser: available for anyone to view the information and make use of it.
- Searchable and well-indexed: the most powerful search tools are internet search engines. The knowledge base should be well indexed and available to be found easily through standard search engines.
- Permissions to allow public addition of knowledge base entries: so all can contribute. This, however, means that we need a way to ensure that publicly added answers are technically correct, which leads to the next point.
- Rate answers based on correctness: to ensure that entries are kept up to date and that publicly-added entries are reviewed.
The solution chosen should also likely be free as there is currently no funding stream to support this activity!
From the discussion at HPC Champions and more recently at the RSE HPC monthly open meetings, two initial services that could provide potential solutions for a shared, open knowledge base have been identified:
- Stack Overflow and related Stack Exchange sites
- Ask.CI
We have looked at these two options in more detail below, but there must be others out there. Please comment and share your own suggestions!
Stack Overflow and related Stack Exchange sites
Stack Overflow is a public, online knowledge base aimed at sharing technical knowledge relevant to programming. The site uses peer review of answers to determine the most useful and accurate answers, and allows tagging of questions. The overarching Stack Exchange organisation provides the platform for Stack Overflow and also for a number of other, less well-known, community knowledge bases. One that is relevant to the discussion here is the Computational Science Stack Exchange site (CompSci). The boundary between questions that fit best on Stack Overflow and those that fit best on CompSci is a bit fuzzy, but the CompSci topic definition helps. The distinction boils down to this: programming questions go on Stack Overflow (eg How do I compile a code? How do I use MPI function X?), while high-level questions about packages go on CompSci (eg Which is the best LAPACK function to do X? Should I use the single-precision or double-precision version of GROMACS for modelling system Y?).
On all Stack Exchange sites it is perfectly acceptable to ask a question and provide an answer. This can be useful when transferring knowledge from an internal resource (eg a service desk ticket) into the public knowledge base.
The strengths of Stack Overflow (and related sites) include its high ranking in technical question searches using standard search engines, its well-designed interface, its strong definition of allowed questions and their format, and the fact that it is already the top resource for this type of technical information on the internet. Stack Overflow also provides an API that would allow integration with other tools.
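As a rough illustration of that integration point, here is a minimal Python sketch (not a tested tool) that asks the public Stack Exchange API (version 2.3) for the most recently active questions carrying a given tag. The function names are hypothetical. The `/questions` endpoint, its `order`, `sort`, `tagged`, `site` and `pagesize` parameters, and the `items`, `title`, `score` and `is_answered` response fields are taken from the public API documentation. The site name `scicomp` for Computational Science Stack Exchange is an assumption worth checking against that documentation.

```python
# Minimal sketch: list recent questions carrying a tag via the public
# Stack Exchange API (v2.3). Function names here are hypothetical.
import gzip
import html
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.stackexchange.com/2.3"


def build_questions_url(tag: str, site: str = "stackoverflow", page_size: int = 10) -> str:
    """Build a /questions URL for the most recently active questions tagged `tag`."""
    params = {
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,  # eg "stackoverflow" or (assumed) "scicomp"
        "pagesize": page_size,
    }
    return f"{API_ROOT}/questions?{urllib.parse.urlencode(params)}"


def fetch(url: str) -> dict:
    """Fetch a URL and decode the JSON body, decompressing it if gzipped."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def summarise(response: dict) -> list[tuple[str, int, bool]]:
    """Reduce a response to (title, score, is_answered) tuples.

    Titles may come back HTML-escaped, so unescape them here.
    """
    return [
        (html.unescape(item["title"]), item["score"], item["is_answered"])
        for item in response.get("items", [])
    ]
```

A call such as `summarise(fetch(build_questions_url("mpi", site="scicomp")))` would then return titles, scores and answered status for recent MPI questions. The output depends on live data, and unauthenticated requests count against a modest daily quota. URL building and response parsing are kept separate from the network call so that they can be checked offline.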
One disadvantage is that it is debatable how many HPC experts are currently engaged in the Stack Overflow communities, so the value of the peer review function is unclear (at least initially). Also, questions that are a matter of opinion or discussion are disallowed (eg Should I use C++ or Fortran for my next HPC coding project?), though this can also be cast as a strength in the context of a technical knowledge base, as such questions usually lead to open-ended discussion that does not help answer the original question.
Some examples of where this has already been used for questions in this area are:
Note the focused nature of the questions that leads to specific technical answers.
Ask.CI
The Ask.CI site was created in the US as:
The site provides a Q&A interface aimed at all people involved in research computing (eg researchers, users, support staff, RSEs, systems administrators). Like the Stack Exchange sites, Ask.CI is free to use and publicly available, allows tagging of questions by topic and allows people to ask a question and answer it themselves. Unlike Stack Exchange sites, it allows discussion-style questions and topics from across the whole range of research computing: all of the example questions discussed in the Stack Exchange section above would be acceptable questions on Ask.CI. It does not have a peer review function for rating answers; people can simply like posts that they think have merit.
The strengths of Ask.CI are that it is focused particularly on research computing (though this is broader than just HPC), that it already has a dedicated, committed community asking and answering questions, and that it allows all types of questions. Its weaknesses are the site's low weight with search engines, which places its answers low in search rankings; the lack of peer review; the limited community involved; and the lack of specific question guidelines, which means that many questions descend into discussion with no indication of what the accepted answer is.
Examples of technical questions asked on Ask.CI:
As you can see, compared to those on Stack Overflow, the questions are vaguer and much more open-ended, and do not lend themselves as well to a technical knowledge base.
Based on the analysis above, my personal opinion is that using the Stack Exchange sites, primarily Stack Overflow, will provide the strongest solution. The key feature that draws me to this solution is the strict set of rules around allowable questions, which pushes the responses towards forming a useful knowledge base rather than a question-and-answer resource where posts tend to turn into discussions with no clear answer to the technical question being asked. Such questions have their place, but they do not fit as well into the idea of a useful technical knowledge base.
The next step is for the community as a whole to decide how to take this idea of a shared technical HPC knowledge base forward and bootstrap its use. One initial option that has been mooted is to run a community coding day where the community would come together to seed the chosen option with initial questions and produce documentation and guidelines to help sites adopt the chosen solution.
I think there is a great opportunity here for the community to come together to provide an amazing resource that will help everyone get the most benefit out of HPC for research.
If you have questions or comments on this topic, I would love to hear them! You can always get in touch with me at firstname.lastname@example.org.
Andy Turner, EPCC