Publication
Optimization of a feature selection tool for inference of gene regulatory networks
| Abstract: | This dissertation concerns the computational optimization of DimReduction, a feature selection tool for inference of Gene Regulatory Networks (GRN). The primary aim was to make it faster and more scalable, so that it can handle large amounts of data, which would bring it closer to the bioinformatics community. The approach involved translating the original Java GUI-based implementation into a CLI version and re-implementing the latter in Python. Because the Python version performed worse than expected, the focus returned to the Java CLI version. The major bottleneck in this version was identified and addressed: eliminating explicit invocations of the Garbage Collector (GC) reduced the runtime on a reference dataset (with 4511 genes) from more than 2 days to 42 minutes. The optimized Java version was then parallelized using a threaded approach, which yielded near-linear speedups. The new parallel Java implementation was then compared with other reference platforms from the literature (GENIE3, CLR, ARACNE, C3NET, BC3NET, MRNET, MRNETB, KBOOST and PCIT). The findings indicate that, even though some alternatives achieve higher quality metrics (AUROC/AUPR), DimReduction's speed makes it a competitive tool in the field. |
|---|---|
| Main authors: | Cunha, João Vítor Fuzetti da |
| Subject: | Optimization of DimReduction Gene Regulatory Networks (GRN) |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Instituto Politécnico de Bragança |
| Language: | English |
| Source: | Biblioteca Digital do IPB |