Data research

UrgentHPC SC19 workshop next week: see you in Denver!

Author: Nick Brown
Posted: 12 Nov 2019 | 11:11

Here at EPCC we lead a work package of the VESTEC EU FET project, which is working on fusing real-time data with HPC for urgent decision-making in disaster response. While HPC has a long history of simulating disasters, what is missing to support emergency, urgent decision-making is fast, real-time acquisition of data and the ability to guarantee time constraints.

Precision persistent programming

Author: Adrian Jackson
Posted: 30 Oct 2019 | 12:48

Targeted Performance

Optane DIMM

Blog post updated 8th November 2019 to add Figure 6 highlighting PMDK vs fsdax performance for a range of node counts.

Following on from the recent blog post on our initial performance experiences when using byte-addressable persistent memory (B-APM) in the form of Intel's Optane DCPMM memory modules for data storage and access within compute nodes, we have been exploring the performance and programming of such memory beyond simple filesystem functionality.

For our previous performance results we used what is known as a fsdax (Filesystem Direct Access) filesystem, which enables bypassing the operating system (O/S) page cache and associated extra memory copies for I/O operations. We were using an ext4 filesystem on fsdax, although ext2 and xfs filesystems are also supported.
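As a rough illustration of the fsdax setup described above, an ext4 filesystem might be created on a persistent-memory device and mounted in DAX mode as follows. The device name /dev/pmem0 and mount point /mnt/pmem are assumptions for the sketch, not details taken from the post:

```shell
# Create an ext4 filesystem on a persistent-memory device
# (assumed here to appear as /dev/pmem0).
mkfs.ext4 /dev/pmem0

# Mount it with the dax option so that file I/O bypasses the
# O/S page cache and maps directly onto the persistent memory.
mkdir -p /mnt/pmem
mount -o dax /dev/pmem0 /mnt/pmem
```

Applications can then memory-map files under /mnt/pmem to get direct load/store access to the B-APM without the extra memory copies of conventional I/O.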

Mining digital historical textual data

Author: Rosa Filgueira
Posted: 23 Oct 2019 | 10:43

Over the last three decades the collections of libraries, archives and museums have been transformed by large-scale digitisation. The volume and quality of available digitised text now makes searching and linking these data feasible, where previous attempts were restricted due to limited data availability, quality, and lack of shared infrastructures. One example of this is the extensive digital collection offered by the National Library of Scotland (NLS) (see Figure 1) [1], which can be accessed online and also downloaded for further digital humanities research.

iCAIRD: the Industrial Centre for Artificial Intelligence Research in Digital Diagnostics

Author: Andrew Brooks
Posted: 26 Sep 2019 | 13:36

The iCAIRD project is working to establish a world-class centre of excellence in the application of artificial intelligence to digital diagnostics. The intention is that iCAIRD will allow clinicians, health planners and industry to work together, enabling research-active clinicians to collaborate with innovative SMEs to better inform clinical questions, and ultimately to solve healthcare challenges more quickly and efficiently.

Analysing historical newspapers and books using Apache Spark and Cray Urika-GX

Author: Mike Jackson
Posted: 16 Aug 2019 | 16:25

In our October 2018 blog post on Analysing humanities data using Cray Urika-GX, we described how we had been collaborating with Melissa Terras of the College of Arts, Humanities and Social Sciences (CAHSS) at The University of Edinburgh to explore historical newspapers and books using the Alan Turing Institute's deployment of a Cray Urika-GX system ("Urika"). In this blog post we describe additional work we have done, to look at the origins of the term "stranger danger", find reports on the Krakatoa volcanic eruption of 1883, and explore the concept of "female emigration".

IoT Research and Innovation service

Author: Guest blogger
Posted: 9 Jul 2019 | 13:51

Guest blogger Simon Chapple introduces the University of Edinburgh's IoT Research and Innovation Service.

Most people will have heard of the Internet of Things (IoT). It is a hot topic in technology, business and the mainstream news, projected as it is to underpin a future trillion-dollar market at least as large as, and by some estimations even greater than, the cloud-based computing services industry. We define IoT as a network of dedicated physical objects that contain embedded technology to sense and interact with the external environment, and that can connect and exchange data.

Computing for extreme conditions

Author: Rosa Filgueira
Posted: 27 Jun 2019 | 10:06

The DARE project is addressing the challenges of combining extreme data, extreme computation and extreme complexity in scientific research.

Virtually every scientific domain is experiencing an increase in the volume of data it produces, with growing computational power enabling more complex simulations. Although comparing these simulations with observations can improve models and understanding, doing so is highly data-intensive.

NEXTGenIO at ISC High Performance 2019

Author: Catherine Inglis
Posted: 6 Jun 2019 | 14:34

The highly successful NEXTGenIO project is now drawing to a close after nearly four years. EPCC colleagues will be at ISC19 presenting the results of the project at a booth presentation, a BoF, and a workshop presentation. Come along and find out more!

Using OpenRefine to create new datasets

Author: Mario Antonioletti
Posted: 28 Apr 2019 | 16:07

One of the benefits of teaching a Carpentry course is that it can increase or deepen your understanding of a subject. A recent instance for me was using OpenRefine, a tool that runs locally on your machine (you do not have to export your data to a third-party service).

OpenRefine can help you:
• Explore and clean or transform your data. You can also reconcile your data with external data sources, i.e. enrich it using external data

• Create a new dataset. It does not modify your original data and keeps the provenance of all the steps. Depending on the capabilities of your local machine, it can deal with datasets of up to about 100,000 rows.

Watch the videos on the OpenRefine website for a good overview. If you want to know more, follow the Carpentry OpenRefine for Ecologists lesson. In this example, I am going to show how easy it is to generate a new dataset from the EPCC website. Follow along after you have installed OpenRefine on your system.

Proof-driven queries to preserve patient privacy

Author: Mike Jackson
Posted: 4 Mar 2019 | 09:42

In our role as members of the Research Engineering Group of the Alan Turing Institute, Anna Roubickova and I worked with Efi Tsamoura and Benjamin Spencer (Department of Computer Science at the University of Oxford) on PDQ, a proof-driven query planner that has great potential within the realm of data science for medical research.
