
Browsing by Author "Puro, Touko"


  • Puro, Touko (2023)
    GPUs have become an important part of large-scale and high-performance physics simulations, due to their superior performance [11] and energy efficiency [23] over CPUs. This thesis examines how to accelerate an existing CPU stencil code, originally parallelized through message passing, with GPUs. Our first research question is how to utilize the CPU cores alongside GPUs when the bulk of the computation happens on GPUs. Secondly, we investigate how to address the performance bottleneck of data movement between CPU and GPU when there is a need to perform computational tasks originally intended to be executed on CPUs. Lastly, we investigate how the performance bottleneck of communication between processes can be alleviated to make better use of the available compute resources. In this thesis we approach these problems by building a preprocessor designed to make an existing CPU codebase suitable for GPU acceleration, while the communication bottleneck is alleviated by extending an existing GPU-oriented library, Astaroth. We improve its task scheduling system and extend its domain-specific language (DSL) for stencil computations. Our solutions are demonstrated by making an existing CPU-based astrophysics simulation code, Pencil Code [4], suitable for GPU acceleration with the use of our preprocessor and the Astaroth library. Our results show that we are able to utilize CPU cores to perform useful work alongside the GPUs. We also show that we are able to circumvent the CPU-GPU data movement bottleneck by making code suitable for offloading through OpenMP offloading and translation to GPU code. Lastly, we show that in certain cases Astaroth's communication performance is increased by around 21% through smaller message sizes, with the added benefit of 14% lower memory usage, which corresponds to an improvement of around 18% in overall performance. Furthermore, we show benefits of the improved tasking and an identified memory-performance trade-off.
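    To illustrate the offloading approach the abstract mentions, here is a minimal sketch of a stencil loop annotated with OpenMP target directives. This is not code from Pencil Code or Astaroth; the function name and the 1D three-point stencil are illustrative assumptions. The idea is that `map` clauses move the arrays to the device once per kernel launch instead of per iteration, and if no GPU (or no OpenMP) is available, the pragmas fall back to serial host execution with identical results.

    ```c
    #include <stdio.h>

    #define N 1024

    /* Illustrative 1D three-point averaging stencil, offloaded with
     * OpenMP target directives. The map() clauses copy the input to the
     * device and the output back, keeping data resident for the whole
     * kernel rather than transferring per iteration. */
    void stencil_avg3(const double *in, double *out, int n) {
        #pragma omp target teams distribute parallel for \
            map(to: in[0:n]) map(from: out[0:n])
        for (int i = 1; i < n - 1; i++)
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
    }

    int main(void) {
        static double in[N], out[N];
        for (int i = 0; i < N; i++) in[i] = (double)i;

        stencil_avg3(in, out, N);

        printf("%.1f\n", out[1]); /* average of 0, 1, 2 */
        return 0;
    }
    ```

    Compiled with an offload-capable compiler (e.g. `gcc -fopenmp` with GPU offload support), the loop runs on the device; without OpenMP the pragmas are ignored and the loop runs on the host, which is the fallback behavior that makes incremental porting of an existing CPU codebase practical.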