Publication
Study of Python-based workflow engines
| Abstract: | Scientists, engineers, and other professionals across many fields of study seek automated, fast, and cost-effective ways to execute their programs. Achieving this, however, requires increasing the complexity of applications and making constant changes to their code. Workflow management systems are tools that execute workflows in different environments, ranging from a local machine to HPC clusters and clouds. Several such systems are currently available, implemented using different approaches and able to run on a wide variety of platforms. Users need to be able to experiment with and compare these tools in different environments, but few tools exist for conducting such experiments seamlessly. This dissertation presents Huginn, a modular framework that automates the collection of performance metrics from workflow executions run by workflow management systems. Huginn enables users to create and execute synthetic workflow applications and generates reports with the collected metrics. The framework builds on the WfCommons Project, which has been extended with two new translators that allow the creation of synthetic CWL and Snakemake workflow applications. To validate the framework, we created reports for a variety of test cases and conducted a comparative analysis of Toil, Snakemake, and Nextflow. The objective of this analysis is to provide an impartial comparison guide to these mainstream workflow management systems in a multi-node environment, with different workloads and workflow applications from various scientific domains. |
|---|---|
| Main authors: | Vieira, André Gonçalves |
| Subject: | Scientific Workflows (Fluxos de Trabalho Científico); Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Workflow Engines (Motores de Fluxo de Trabalho) |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | Portuguese |
| Source: | RepositóriUM - Universidade do Minho |