Faster large language model training with a Cerebras CS-2 Wafer-Scale Cluster

30 January 2024

The Edinburgh International Data Facility, operated by EPCC, includes the Cerebras CS-2 service, which was recently upgraded with a second unit and reconfigured into a Wafer-Scale Cluster. This will allow researchers to quickly train large language models (LLMs) in-house and explore ever larger machine learning (ML) problems. Service Manager and Architect Nick Johnson explains more below.

In 2021, EPCC purchased and hosted the first Cerebras CS-1 system in Europe. In September 2022 this initial CS-1 unit was replaced by a CS-2, improving overall performance and usability. In December 2023 a second CS-2 was added, and both now run as a Wafer-Scale Cluster (WSC), in which a problem can span both CS-2 units and the work can be done in parallel.

Weight streaming

This important change allows researchers to train and use models with far more than the 1 billion parameters achievable with a single unit. This is made possible both by having a second unit available and by enabling a new operating mode: weight streaming. Instead of the complete set of model weights (the computed parameters) being stored directly on a single CS-2 system, they can now be streamed in (and out) as necessary from the associated MemoryX system across a high-speed network coupled to both CS-2 units.
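
The idea behind weight streaming can be illustrated with a small toy sketch in plain Python. This is not the Cerebras API: `ExternalWeightStore`, `streamed_forward` and the tiny matrices are hypothetical names and values chosen purely for illustration, and only a forward pass is shown. The point is that the compute side only ever holds one layer's weights at a time, fetching each layer from an external store when it is needed.

```python
# Toy illustration of the weight-streaming idea. NOT the Cerebras API:
# every name here is hypothetical and chosen only for this sketch.
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


class ExternalWeightStore:
    """Stands in for an off-device parameter store holding all layers."""

    def __init__(self, layers: Dict[int, Matrix]) -> None:
        self._layers = layers
        self.fetches = 0  # how many times a layer has been streamed in

    def __len__(self) -> int:
        return len(self._layers)

    def fetch(self, index: int) -> Matrix:
        """Return a copy of one layer's weights, as if sent over a network."""
        self.fetches += 1
        return [row[:] for row in self._layers[index]]


def matvec(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by an activation vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def streamed_forward(store: ExternalWeightStore, x: Vector) -> Tuple[Vector, int]:
    """Run a forward pass while holding only one layer's weights at a time.

    Returns the output and the peak number of weights resident at once.
    """
    peak_resident = 0
    for index in range(len(store)):
        weights = store.fetch(index)  # stream this layer in
        peak_resident = max(peak_resident, sum(len(row) for row in weights))
        x = matvec(weights, x)        # compute with the resident layer
        del weights                   # release it before the next layer
    return x, peak_resident


if __name__ == "__main__":
    store = ExternalWeightStore({
        0: [[1.0, 2.0], [3.0, 4.0]],
        1: [[1.0, 0.0], [0.0, 1.0]],
        2: [[2.0, 0.0], [0.0, 2.0]],
    })
    output, peak = streamed_forward(store, [1.0, 1.0])
    print(output, peak, store.fetches)
```

In this sketch the model holds 12 weights in total, yet at most 4 are resident during computation. The same principle, at vastly larger scale and with gradients also flowing back out during training, is what lets the total model size exceed what a single unit can store.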

From a user perspective, alongside the increase in model size, we see a significant change in how models and problems are defined. Rather than contending with Singularity containers, as we did with the single-unit system, which limited users to specific versions of PyTorch, TensorFlow and other frameworks, we can now define and orchestrate code and models inside a Python virtual environment (venv), which is much easier to manipulate and work with. This also gives users access to more up-to-date models from Cerebras' now open-source model zoo.
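
For readers unfamiliar with virtual environments, the following minimal sketch creates one using Python's standard-library `venv` module. The function name and target directory are hypothetical, and the sketch deliberately says nothing about which packages the Cerebras software requires; it only shows the generic mechanism.

```python
# Minimal sketch: create a Python virtual environment with the standard library.
# The function name and directory name are illustrative assumptions only.
import tempfile
import venv
from pathlib import Path


def create_environment(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False keeps the sketch fast and dependency-free; in practice
    # with_pip=True would typically be used so that packages can be installed.
    venv.create(target, with_pip=False)
    return target / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = create_environment(Path(tmp) / "example_venv")
        print(config.exists())
```

Because the environment is just a directory the user owns, it can be recreated or modified freely, which is what makes it easier to work with than a fixed container image.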


Dr Nick Johnson