Publicação

Optimization of a feature selection tool for inference of gene regulatory networks

Ver documento

Detalhes bibliográficos
Resumo:This dissertation concerns the computational optimization of DimReduction, a feature selection tool for inference of Gene Regulatory Networks (GRN). The primary aim was to make it faster and more performance scalable, in order to to handle large amounts of data, which would bring it closer to the bioinformatics community. The approach involved the translation of the original Java GUI-based implementation into a CLI version and the re-implementation of the latter in Python. Once the performance of the Python version was lower than expected, the focus turned again to the Java CLI version. The major bottleneck in this version was determined and addressed, namely the elimination of explicit invocation of the Garbage Collector (GC) led to the runtime of a reference dataset (with 4511 genes) to reduce from more than 2 days to 42 minutes. The optimized version of Java was then parallelized, using a threaded approach, which yielded near-linear speedups. The new Java parallel implementation was then compared with other reference platforms from the literature (GENIE3, CLR, ARACNE, C3NET, BC3NET, MRNET, MRNETB, KBOOST and PCIT). The findings indicate that even though some alternatives ensure higher metrics of quality (AUROC/AUPR), DimReduction speed makes it a competitive tool in the field.
Autores principais:Cunha, João Vítor Fuzetti da
Assunto:Optimization of DimReduction Gene Regulatory Networks (GRN)
Ano:2025
País:Portugal
Tipo de documento:dissertação de mestrado
Tipo de acesso:acesso aberto
Instituição associada:Instituto Politécnico de Bragança
Idioma:inglês
Origem:Biblioteca Digital do IPB
Descrição
Resumo:This dissertation concerns the computational optimization of DimReduction, a feature selection tool for inference of Gene Regulatory Networks (GRN). The primary aim was to make it faster and more performance scalable, in order to to handle large amounts of data, which would bring it closer to the bioinformatics community. The approach involved the translation of the original Java GUI-based implementation into a CLI version and the re-implementation of the latter in Python. Once the performance of the Python version was lower than expected, the focus turned again to the Java CLI version. The major bottleneck in this version was determined and addressed, namely the elimination of explicit invocation of the Garbage Collector (GC) led to the runtime of a reference dataset (with 4511 genes) to reduce from more than 2 days to 42 minutes. The optimized version of Java was then parallelized, using a threaded approach, which yielded near-linear speedups. The new Java parallel implementation was then compared with other reference platforms from the literature (GENIE3, CLR, ARACNE, C3NET, BC3NET, MRNET, MRNETB, KBOOST and PCIT). The findings indicate that even though some alternatives ensure higher metrics of quality (AUROC/AUPR), DimReduction speed makes it a competitive tool in the field.