Energy-efficient quantum computing simulations
24 January 2023
Jakub Adamski, a PhD student at EPCC, is investigating the benchmarking of classical simulations of quantum computing against real quantum hardware. His poster "Energy-efficient quantum computing simulations" was awarded first prize at Computing Insight UK 2022.
My poster aimed to investigate the efficiency of various approaches to simulate quantum computing. I showcased it at the Computing Insight UK conference (CIUK) alongside other participants, and I managed to win the competition. I had a great time discussing my results, and talking to other people about their work – I am looking forwards to more events like that.
Before presenting the results, let's first introduce a few concepts. In principle, quantum computing is like classical computing, but every term is preceded by the word "quantum". We have quantum bits, in short "qubits", which can take a value of 0 or 1, or both simultaneously, but each is measured with some probability. A quantum register contains multiple qubits, whose values together are referred to as a quantum state. There is an important property called entanglement, which describes to what extent a state can be factored into smaller states. A quantum gate modifies the state with some reversible transformation. A series of quantum gates makes a quantum algorithm, which we are usually interested in simulating.
The scaling problem
The most common approach to quantum simulations is to store the whole state in memory and to modify it with gates in a given order – this is called a state vector simulator. On HPC systems, these are often dominated by message passing (MPI), since the full state must be synchronised after any non-local operation. Moreover, in the quantum world, even local operations need to be distributed to the whole state. As a result, the time spent computing constitutes only a fraction of the total runtime, and the CPU can end up under-utilised, waiting for the communications to complete. This poses an obvious scaling problem, but here scaling is obligatory, as the memory requirements increase exponentially with the state vector size. Interestingly, such a situation creates an opportunity to decrease the CPU clock frequency and improve the energy efficiency with little runtime penalty. Since the algorithm is bound by MPI, it frequently ends up in a scenario where computations cannot happen util the communication is complete. Hence, making operations slightly slower shouldn't have a significant impact on the total runtime. Conveniently, decreasing the frequency provides a substantial reduction in the overall energy consumption.
While this is a great improvement, it doesn't change the problem of exponential scaling. However, there is a completely different approach that can sometimes eliminate this issue – tensor networks. It turns out that a circuit of quantum gates is in fact also a tensor network, and there are methods to represent and contract it efficiently, as long as the state isn't excessively entangled. Unfortunately, this means that the efficiency of the computation depends directly on the input state, which is in contrast to the state vector approach. In other words, some simple quantum states can be simulated incredibly fast, whilst others require even more time and memory than for the state vector method.
Both approaches were verified with experiments. QuEST was the framework chosen for state vector simulations, as it is well-optimised and has built-in HPC scaling. The second method was tested in iTensor, which was the only tensor network framework that compiled on ARCHER2. It also turned out not to support any parallelism, but luckily, this wasn't necessary. Two contrasting circuits were selected to evaluate the approaches – Quantum Fourier Transform (QFT), and a simplified random circuit based on the Google quantum advantage paper .
The state vector method worked as expected – it managed to handle input states up to 41 qubits, and lowering the CPU clock indeed increased the energy efficiency. The runtime penalty proved to depend on the circuit. After some Arm MAP profiling, QFT appeared to be slightly more compute bound, as it consists mostly of diagonal gates, and requires less synchronisation. The random circuit, on the other hand, was clearly MPI bound. Nevertheless, the runtime cost was in both cases much smaller than the energy efficiency gain (e.g. 25% energy reduction with a 5% slower simulation). The tensor network approach also showed a lot of potential. The QFT could often manage dozens more qubits in the input state than with the state vector technique. The difference was not as clear when the input had more entanglement, but the second approach could often still handle more qubits and do it faster. However, the random circuit did much worse as a tensor network, struggling to simulate more than 22 qubits. This is because this circuit always inserts a lot of entanglement between all qubits.
In conclusion, there are several ways to make quantum computing simulations more efficient. An easy solution is to lower the CPU frequency, which MPI-heavy state vector simulations particularly benefit from. However, there exist completely different simulation approaches that are worth investigating, such as tensor networks. Unfortunately, this method is dependent on the input, which means sometimes it performs exponentially better, and sometimes much worse. In the future, as a part of my PhD research, I would like to try to improve the tensor network approach, by introducing noise that limits the entanglement – which is what happens in realistic near-term quantum hardware.
 Arute, F., Arya, K., Babbush, R. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505–510 (2019). https://doi.org/10.1038/s41586-019-1666-5
Jakub Adamski: firstname.lastname@example.org
Jakub's PhD studies are supervised by EPCC's Oliver Brown.
Related article: Quantum lab set to boost discoveries