Publication
Extending, improving and optimizing Marrow
| Abstract: | Most computers nowadays are heterogeneous, composed of a Central Processing Unit (CPU) and one or more Graphics Processing Units (GPUs). To harness the power of each of these devices, developers need experience with low-level toolchains such as CUDA and expert knowledge of the underlying architecture. However, these low-level approaches add several layers of complexity to the task at hand. High-level programming models such as the Marrow framework are used to ease the arduous task of offloading computation to accelerator devices. Usually, they do so by abstracting memory management and implicitly parallelizing workloads through high-level constructs exposed to the programmer. However, these frameworks come with several limitations, and it is not always possible to maximize performance, as this might require writing specific code to map computation to a device. In this thesis we ported several programs implemented in other frameworks and platforms to the Marrow framework, which allowed us to better understand its limitations and to further extend and optimize the framework. We followed an iterative process: first, we analyzed how a given program was implemented in a given framework; second, we investigated whether the program could be implemented in Marrow's current state. If not, we extended Marrow by improving its features to make the implementation possible. We then implemented and benchmarked the program, and used the performance comparisons as a tool to further optimize the framework. In the course of this thesis we implemented several applications with the Marrow framework, which led us to add several new features, such as the inclusive scan, a matrix multiplication operation, and the zip and unzip functions, and to significantly improve the flexibility of Marrow's constructs, such as its exclusive scan. Furthermore, we gained a better understanding of Marrow's performance bottlenecks through the Marrow profiler, and optimized asynchronous memory transfers. |
|---|---|
| Main authors: | Cardoso, Francisco José Sampaio de Freitas |
| Subject: | Heterogeneous Computing Marrow CUDA GPU |
| Year: | 2022 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade Nova de Lisboa |
| Language: | English |
| Source: | Repositório Institucional da UNL |
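
The abstract names inclusive scan, exclusive scan, zip and unzip among the operations added to or improved in Marrow. As a rough illustration of what these operations compute, the following is a minimal sequential sketch in Python. It is not Marrow's API: the function names `inclusive_scan`, `exclusive_scan`, `zip_arrays` and `unzip_pairs` are hypothetical, and the record does not describe how Marrow implements or parallelizes them.

```python
from itertools import accumulate
from operator import add


def inclusive_scan(xs, op=add):
    """out[i] combines xs[0..i], including xs[i] itself."""
    return list(accumulate(xs, op))


def exclusive_scan(xs, identity=0, op=add):
    """out[i] combines xs[0..i-1]; out[0] is the identity element."""
    if not xs:
        return []
    # Seed with the identity and drop the last input element,
    # so the output has the same length as the input.
    return list(accumulate(xs[:-1], op, initial=identity))


def zip_arrays(a, b):
    """Pair up elements of two equally long sequences."""
    return list(zip(a, b))


def unzip_pairs(pairs):
    """Split a list of pairs back into two lists."""
    if not pairs:
        return [], []
    first, second = zip(*pairs)
    return list(first), list(second)


if __name__ == "__main__":
    data = [3, 1, 4, 1]
    print(inclusive_scan(data))
    print(exclusive_scan(data))
    print(unzip_pairs(zip_arrays([1, 2], [3, 4])))
```

The two scans differ only in whether each output position includes its own input element, which is why the exclusive variant needs an explicit identity value.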
Related records
OpenCL-MMP : codificação de imagens com sistemas com múltiplos núcleos
by: Silva, João Filipe Crespo
Published: (2015)
Implementação de algoritmos de reconstrução de imagens de tomossíntese utilizando processamento paralelo em GPU
by: Duarte, Carlos Pereira
Published: (2016)
Mapas Auto-Organizados Ubíquos em Unidades de Processamento Gráfico
by: Borrego, João Pedro Vicente Martins
Published: (2018)
Reconstrução de imagem médica de mamografia por emissão de positrões (PEM) com GPU
by: Mendes, Sérgio Alexandre Alves
Published: (2011)
GPUMLib Framework: Using the GPU to Empower Machine Learning Research
by: Lopes, Noel
Published: (2016)
Tubulointerstitial nephritis and uveitis syndrome with non caseating granuloma in bone marrow biopsy
by: Fraga, Maria
Published: (2014)
Massively parallel GPU acceleration of population-based optimization metaheuristics: application to the solution large-scale systems of nonlinear equations
by: Silva, Bruno Miguel Pereira da
Published: (2025)
Bioengineering the human bone marrow microenvironment in liquefied compartments: a promising approach for the recapitulation of osteovascular niches
by: Oliveira, Cláudia S.
Published: (2022)
Retinopathy and Bone Marrow Failure Revealing Coats Plus Syndrome
by: Painho, T
Published: (2018)
CUDA-MMP - Codificação de imagens com sistemas com múltiplos núcleos
by: Ribeiro, Tiago Martins
Published: (2016)
Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms
by: Pereira, Pedro Miguel Marques
Published: (2016)
Protein docking GPU acceleration
by: Ribeiro, Ricardo Alexandre do Rosário
Published: (2019)
HLA-A, -C, -B, and -DRB1 allelic and haplotypic diversity in bone marrow volunteer donors from Northern Portugal
by: Lima, Bruno A
Published: (2013)
Tissue engineered constructs based on SPCL scaffolds cultured with goat marrow cells : functionality in femoral defects
by: Rodrigues, Márcia T.
Published: (2011)
Optimized MLOps deployments
by: Rodrigues, Pedro Henrique Figueiredo
Published: (2025)
Tomographic image processing using Julia and GPU
by: Brito, Bruno Daniel Afonso de
Published: (2022)
Metodologia para Teste Rigoroso de Softwares de Mineração de Criptomoedas
by: Vieira, Carolina Bandeira Luís
Published: (2022)
Observational Study Protocol: Management of Anemia in Allogeneic Bone Marrow Donors
by: Gradim, Mariana
Published: (2026)
In Vitro (Re)programming of Human Bone Marrow Stromal Cells Toward Insulin-Producing Phenotypes
by: Limbert, C
Published: (2009)
Osteoblastic behavior of human bone marrow cells cultured over adsorbed collagen layer, over surface of collagen gels, and inside collagen gels
by: Fernandes, Luís F.
Published: (2009)
Energy-Efficient and Portable Least Squares Prediction for Image Coding on a Mobile GPU
by: Cordeiro, Pedro
Published: (2017)
Systemic Mastocytosis - a Diagnostic Challenge
by: Lladó, AC
Published: (2014)
Accelerating floating-point fitness functions in evolutionary algorithms: a FPGA-CPU-GPU performance comparison
by: Gomez-Pulido, Juan A.
Published: (2011)
Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel Approaches
by: Abbasi, Maryam
Published: (2024)
Platelets Structure, Function and Modulator Capacity in Replacement Therapy
by: Moutinho, B.
Published: (2017)
Intelligent scanning method for adaptive positron emission tomography
by: Encarnação, Pedro Manuel Crispim da Costa da
Published: (2024)
Optimizing payments based on efficiency, quality, complexity, and heterogeneity: the case of hospital funding
by: Ferreira, Diogo C.
Published: (2020)
GPU power for medical imaging
by: Fonseca, Francisco Xavier dos Santos
Published: (2011)
Case Report: Wide Spectrum of Manifestations of Ligase IV Deficiency: Report of 3 Cases
by: Costa e Castro, A
Published: (2022)
Hierarchical neighbor discovery scheme for handover optimization
by: Buiati, Fábio
Published: (2010)
Essays on incomplete markets and aggregate shocks
by: Ferreira, Miguel Homem
Published: (2020)
Realtime parallel software implementation of a DS-CDMA Multiuser Detector
by: Gonçalves, Luís Carlos
Published: (2021)
The therapeutic potential of hematopoietic stem cells in bone regeneration
by: Oliveira, Claudia S.
Published: (2022)
Modelling Monochamus galloprovincialis dispersal trajectories across a heterogeneous landscape to optimize monitoring by trapping networks
by: Nunes, Pedro
Published: (2022)
The presence of distortions in the extended skew : normal distribution
by: Seijas-Macias, J. Antonio
Published: (2017)
Inefficiency caused by random matching and heterogeneity
by: Kultti, Klaus
Published: (2010)
On the design of innovative heterogeneous sheet metal tests using a shape optimization approach
by: Andrade-Campos, António
Published: (2019)
Intravenous Busulfan for Autologous Stem Cell Transplantation in Adult Patients with Acute Myeloid Leukemia: a Survey of 952 Patients on Behalf of the Acute Leukemia Working Party of the European Group for Blood and Marrow Transplantation
by: Nagler, A
Published: (2014)
Meglumine antimoniate and miltefosine combined with allopurinol sustain pro-inflammatory immune environments during canine leishmaniosis treatment
by: Santos, Marcos André Ferreira
Published: (2019)
Improving Information Extraction through Biological Correlation
by: Francisco Couto
Published: (2003)