Analysing historical newspapers and books using Apache Spark and Cray Urika-GX

Author: Mike Jackson
Posted: 16 Aug 2019 | 16:25

In our October 2018 blog post on Analysing humanities data using Cray Urika-GX, we described how we had been collaborating with Melissa Terras of the College of Arts, Humanities and Social Sciences (CAHSS) at The University of Edinburgh to explore historical newspapers and books using the Alan Turing Institute's deployment of a Cray Urika-GX system ("Urika"). In this blog post we describe additional work we have done to look at the origins of the term "stranger danger", find reports on the Krakatoa volcanic eruption of 1883, and explore the concept of "female emigration".

Spark-based genome analysis on Cray Urika and Cirrus clusters

Author: Rosa Filgueira
Posted: 16 Jan 2019 | 11:06

Analysing genomics data is a complex and compute-intensive task, generally requiring numerous software tools and large reference data sets, tied together in successive stages of data transformation and visualisation.

Typically in a cancer genomics analysis, both a tumour sample and a "normal" sample from the same individual are first sequenced using next-generation sequencing (NGS) systems and then compared through a series of quality-control stages. The first stage, 'Sequence Quality Control' (which is optional), checks sequence quality and performs some trimming. The second, 'Alignment', involves a number of steps, such as alignment, indexing, and recalibration, to ensure that the alignment files produced are of the highest quality, plus several more to guarantee that variants are called correctly. Both stages comprise a series of intermediate compute- and data-intensive steps that are very often handcrafted by researchers and/or analysts.
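The two stages described above can be sketched as a simple staged pipeline. This is a minimal illustration, not the authors' actual workflow: the stage names follow the text, but the individual step descriptions are hypothetical placeholders for real tools such as read trimmers and aligners.

```python
# A dry-run sketch of the two-stage cancer genomics workflow described
# above. Step strings are illustrative placeholders, not real commands.
PIPELINE = [
    # Stage 1: Sequence Quality Control (optional) - check quality, trim
    ("sequence_qc", ["check sequence quality", "trim reads"]),
    # Stage 2: Alignment - align, index, recalibrate, then call variants
    ("alignment", ["align reads to reference",
                   "index alignment files",
                   "recalibrate base quality scores",
                   "call and check variants"]),
]

def plan_pipeline(pipeline, skip_optional=False):
    """Return the ordered list of steps that would be executed (dry run)."""
    planned = []
    for stage, steps in pipeline:
        if skip_optional and stage == "sequence_qc":
            continue  # the text notes that Sequence Quality Control is optional
        planned.extend(steps)
    return planned
```

For example, `plan_pipeline(PIPELINE, skip_optional=True)` returns only the four alignment-stage steps, reflecting that the quality-control stage may be skipped.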

Analysing humanities data using Cray Urika-GX

Author: Rosa Filgueira
Posted: 11 Oct 2018 | 14:52

During the last six months, in our role as members of the Research Engineering Group of the Alan Turing Institute, we have been working with Melissa Terras, University of Edinburgh's College of Arts, Humanities and Social Sciences (CAHSS), and Raquel Alegre, Research IT Services, University College London (UCL), to explore text analysis of humanities data. This work was funded by Scottish Enterprise as part of the Alan Turing Institute-Scottish Enterprise Data Engineering Programme.

ExCeL-lent EPCC at New Scientist Live 2018

Author: Oliver Brown
Posted: 25 Sep 2018 | 16:44

Regular readers will already know that EPCC was planning to attend New Scientist Live again this year. Despite our concerns about getting to London as Storm Ali bore down on us, we made it, and I’m happy to report we had a very successful and enjoyable trip!

When applications go exascale — the CRESTA project

Author: Guest blogger
Posted: 10 Feb 2014 | 09:22

Dr Jason Beech-Brandt, Manager Exascale Research, Europe at Cray writes about the CRESTA project, which is addressing the challenges of exascale computing.

Seymour Cray, the pioneer of supercomputing, famously asked if you would rather plough a field with two strong oxen or 1024 chickens.

ARCHER: the next national HPC service for academic research

Author: Andy Turner
Posted: 29 Nov 2013 | 11:00

ARCHER (Advanced Research Computing High End Resource) is the next national HPC service for academic research. The service comprises a number of components: accommodation provided by the University of Edinburgh; hardware by Cray; systems support by EPCC and Daresbury Laboratory; and user and computational science and engineering support by EPCC.

HECToR/Cray XC30 courses next week

Author: David Henty
Posted: 19 Jun 2013 | 12:13

Cray Advanced Tools Workshop

As part of our PRACE Advanced Training Centre (PATC) programme, EPCC is hosting a "Cray Advanced Tools Workshop" in JCMB on 26-27 June using HECToR as the platform. If you are interested in attending see the event page on the PRACE website.

Launching applications non-homogeneously on Cray supercomputers

Author: Tom Edwards
Posted: 12 Apr 2013 | 11:31

The vast majority of applications running on HECToR today are designed around the Single Program Multiple Data (SPMD) parallel programming paradigm. Each processing element (PE), i.e. MPI rank or Fortran Coarray image, runs the same program and performs the same operations in parallel on the same or a similar amount of data. Usually these applications are launched on the compute nodes homogeneously, with the same number of processes spawned on each node and each with the same number of threads (if required).
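A non-homogeneous launch can be expressed with the colon-separated MPMD syntax of Cray's aprun launcher, which places different executables, or different process and thread counts, on different sets of nodes. The binary names and counts below are illustrative placeholders, not taken from the post:

```shell
# Sketch of a non-homogeneous launch via aprun's MPMD (colon-separated)
# syntax on a Cray system with 32-core nodes. One "master" rank gets a
# whole node to itself (32 threads), while 512 "worker" ranks are packed
# 32 per node with one thread each. ./master and ./worker are
# hypothetical binaries.
aprun -n 1 -N 1 -d 32 ./master : -n 512 -N 32 -d 1 ./worker
```

Here `-n` is the total number of PEs in each segment, `-N` the PEs per node, and `-d` the depth (threads per PE); this is a site-specific launcher command and will only run on a Cray system.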
