Publicação

Extending, improving and optimizing Marrow

Ver documento

Detalhes bibliográficos
Resumo:Most computers nowadays are heterogeneous, composed of a Central Processing Unit (CPU) and one or more Graphics Processing Units (GPUs). In order to harness the power of each of these devices, developers must have experience with low-level toolchains such as CUDA, and expert knowledge of the underlying architecture. However these low-level approaches add several layers of complexity to the task at hand. High-level programming models such as the Marrow framework are used to attenuate the arduous task that is offloading computation to accelerator devices. Usually, they do so by abstracting memory management and implicitly parallelizing workloads by exposing high-level constructs to the programmer. However, these frameworks come with several limitations and it isn’t always possible to maximize performance as this might require writing specific code to map computation to a device. In this thesis we ported several programs implemented in other frameworks and plat- forms to the Marrow framework, which allowed us to better understand its limitations and further extend and optimize the framework. An iterative process was used, where we started by analyzing how a given program was implemented on a given framework, secondly we investigated if the program could be implemented in Marrow’s current state. If not, we extended Marrow by improving its features, in order to make the implementa- tion possible. Then we implemented and benchmarked the given program, and used the performance comparisons as a tool to further optimize the framework. With the development of this thesis we managed to implement several applications with the Marrow framework, which allowed us to add several new features such as the inclusive scan, matrix multiplication operation, the zip and unzip functions, and we significantly improved the flexibility of Marrow’s constructs such as that of Marrow’s exclusive scan. Furthermore, we managed to better understand Marrow’s performance bottlenecks through the Marrow profiler, and optimize asynchronous memory transfers.
Autores principais:Cardoso, Francisco José Sampaio de Freitas
Assunto:Heterogeneous Computing Marrow CUDA GPU
Ano:2022
País:Portugal
Tipo de documento:dissertação de mestrado
Tipo de acesso:acesso aberto
Instituição associada:Universidade Nova de Lisboa
Idioma:inglês
Origem:Repositório Institucional da UNL
Descrição
Resumo:Most computers nowadays are heterogeneous, composed of a Central Processing Unit (CPU) and one or more Graphics Processing Units (GPUs). In order to harness the power of each of these devices, developers must have experience with low-level toolchains such as CUDA, and expert knowledge of the underlying architecture. However these low-level approaches add several layers of complexity to the task at hand. High-level programming models such as the Marrow framework are used to attenuate the arduous task that is offloading computation to accelerator devices. Usually, they do so by abstracting memory management and implicitly parallelizing workloads by exposing high-level constructs to the programmer. However, these frameworks come with several limitations and it isn’t always possible to maximize performance as this might require writing specific code to map computation to a device. In this thesis we ported several programs implemented in other frameworks and plat- forms to the Marrow framework, which allowed us to better understand its limitations and further extend and optimize the framework. An iterative process was used, where we started by analyzing how a given program was implemented on a given framework, secondly we investigated if the program could be implemented in Marrow’s current state. If not, we extended Marrow by improving its features, in order to make the implementa- tion possible. Then we implemented and benchmarked the given program, and used the performance comparisons as a tool to further optimize the framework. With the development of this thesis we managed to implement several applications with the Marrow framework, which allowed us to add several new features such as the inclusive scan, matrix multiplication operation, the zip and unzip functions, and we significantly improved the flexibility of Marrow’s constructs such as that of Marrow’s exclusive scan. Furthermore, we managed to better understand Marrow’s performance bottlenecks through the Marrow profiler, and optimize asynchronous memory transfers.