Research collaboration: machine learning on EPCC's Cirrus system

9 January 2024

The Cirrus HPC system, operated by EPCC, is designed to solve computational, simulation, modelling, and data science challenges. Michael Bareford describes a research collaboration that employed Cirrus' GPU nodes.

Cirrus GPU nodes were used to train a neural network (NN) to recognize agricultural field boundaries from satellite images. The NVIDIA V100 GPUs available on Cirrus are sufficiently capable that just one GPU node (containing four V100 GPUs) was required to train the NN using multispectral images obtained from PlanetScope's flock of (Super)Dove nanosatellites [1].

Increased prediction accuracy

Training with such high-resolution (4 m) data was not possible on the user's local compute cluster, which features NVIDIA K80 GPUs. On that platform, training was restricted to 100 m resolution images; otherwise the runtime would have become infeasible. By contrast, the greater processing speed and memory of the V100 GPUs make it possible to train with higher-resolution data, raising the prediction accuracy from 77% to 93%. This level of accuracy was demonstrated for a sample of 700,000 field boundaries located in Navarra province in northern Spain (see image below).

Satellite image of fields alongside 2 maps.

^{Agricultural field boundaries detected by a U-Net neural network trained and run on the Cirrus machine. The field boundaries are for farmland located within Navarra province in northern Spain.}

In addition, the total runtime on a single Cirrus GPU node was a reasonable 41 hours (or 164 GPU hours). The training data was also extensive: five million polygons in total, covering regions in Europe, Africa and South East Asia.
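The figures quoted so far can be checked with some back-of-envelope arithmetic. The short Python sketch below reproduces the GPU-hour total and estimates how many more pixels a 4 m image contains than a 100 m image of the same area. The variable names are illustrative only, and the link between pixel count and runtime is an assumption rather than a measured result from the project.

```python
# Back-of-envelope checks on the figures quoted in the article.
# Variable names are illustrative; they do not come from the project code.

GPUS_PER_NODE = 4      # one Cirrus GPU node holds four V100 GPUs
WALLCLOCK_HOURS = 41   # total runtime on a single GPU node

gpu_hours = GPUS_PER_NODE * WALLCLOCK_HOURS

COARSE_RES_M = 100     # resolution used on the K80 cluster
FINE_RES_M = 4         # resolution used on Cirrus

# Pixels per unit of ground area scale with the inverse square of the pixel size.
pixel_ratio = (COARSE_RES_M / FINE_RES_M) ** 2

print(f"GPU hours: {gpu_hours}")
print(f"Pixels per unit area, 4 m vs 100 m: {pixel_ratio:.0f}x")
```

If the training cost grows roughly in proportion to the number of pixels (an assumption), this ratio gives a sense of why the higher resolution was out of reach on the older hardware.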

Processing image data

The satellite image data must be processed and reformatted before being fed to the NN. This was done with specialist Python packages such as eo-learn [2] and eo-flow [3], which were installed within a custom Python environment on Cirrus based on a centrally installed TensorFlow module [4]. TensorFlow itself was crucial for training the U-Net neural network. A U-Net [5] NN segments the input image rather than just classifying elements of it, i.e. the output is the same as the input image but with the agricultural field boundaries delineated.
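To make the idea of segmentation output more concrete, here is a toy sketch in plain Python. It is not the U-Net and uses none of the project's code or data: the grid of field IDs is made up, and `boundary_mask` is a hypothetical helper. It only mimics the shape of a segmentation result, namely one value per pixel with the same height and width as the input, whereas a classifier would return a single label for the whole image.

```python
# Toy illustration of a per-pixel boundary mask (not the U-Net itself).
# The field IDs below are invented for illustration; they are not project data.

def boundary_mask(field_ids):
    """Return a mask, the same shape as field_ids, with 1 where a pixel
    borders a different field to its right or below, and 0 elsewhere."""
    rows, cols = len(field_ids), len(field_ids[0])
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Check the right and lower neighbours only, so each edge is marked once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and field_ids[nr][nc] != field_ids[r][c]:
                    mask[r][c] = 1
    return mask

# A tiny 3 x 4 "image" in which each number identifies a field.
fields = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, 3],
]

mask = boundary_mask(fields)
for row in mask:
    print("".join("#" if value else "." for value in row))
```

The printed mask has the same dimensions as the input grid, with `#` marking boundary pixels. A trained segmentation network produces an output of this kind directly from the satellite image rather than from known field IDs.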

Delineating the boundaries of agricultural fields is a necessary first step in gathering and representing the data essential for farm management. For example, satellite images can also provide information on key metrics such as crop yield. This research effort is being led by Dr Simon Fraval [6] of the University of Edinburgh; its ultimate aim is to bring down the costs of ML-enhanced farm management so that it is accessible to farmers across the world.

Case study

You can read more about this project in the case study on the Cirrus website: https://www.cirrus.ac.uk/casestudies/cirrus_field_boundries_AW_LOW.pdf

Links

^{1. https://earth.esa.int/eogateway/missions/planetscope}
^{2. https://github.com/sentinel-hub/eo-learn}
^{3. https://github.com/sentinel-hub/eo-flow}
^{4. https://docs.cirrus.ac.uk/user-guide/python/#installing-your-own-python…}
^{5. https://towardsdatascience.com/understanding-u-net-61276b10f360}
^{6. https://www.research.ed.ac.uk/en/persons/simon-fraval}

Author

Dr Michael Bareford

m.bareford@epcc.ed.ac.uk
