Publication

Study of Python-based workflow engines

Bibliographic details
Abstract: Scientists, engineers, and other professionals across various fields of study seek automated, fast, and cost-effective ways to execute their programs. However, in order to achieve this, it would be necessary to increase the complexity of applications and make constant changes to their code. Workflow management systems are tools capable of executing workflows in different environments, ranging from the local machine to HPC clusters and clouds. Currently, there are several workflow management systems available, which are implemented using different approaches and can be executed on a wide variety of platforms. It is important to let users experiment to compare various tools in different environments, but there is a lack of tools available to conduct these experiments seamlessly. This dissertation presents Huginn, a modular framework designed to automate the process of performance metrics collection of workflow executions using workflow management systems. Huginn enables users to create and execute synthetic workflow applications and generates reports with the metrics collected. This framework leverages the WfCommons Project, which has been expanded with two new translators to allow the creation of synthetic CWL and Snakemake workflow applications. In order to validate our framework, we have created reports for a variety of test cases and conducted a comparative analysis of Toil, Snakemake and Nextflow. The objective of this analysis is to provide an impartial comparison guide to these mainstream workflow management systems in a multi-node environment with different workloads and workflow applications from various scientific domains.
Main authors: Vieira, André Gonçalves
Subject: Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho
Year: 2025
Country: Portugal
Document type: master's thesis
Access type: open access
Associated institution: Universidade do Minho
Language: Portuguese
Source: RepositóriUM - Universidade do Minho
_version_ 1866878321851105280
author Vieira, André Gonçalves
author_facet Vieira, André Gonçalves
author_role author
contributor_name_str_mv Vilaça, Ricardo Manuel Pereira
RepositóriUM - Universidade do Minho
country_str PT
creators_json_txt [{\"Person.name\":\"Vieira, André Gonçalves\"}]
datacite.contributors.contributor.contributorName.fl_str_mv Vilaça, Ricardo Manuel Pereira
RepositóriUM - Universidade do Minho
datacite.creators.creator.creatorName.fl_str_mv Vieira, André Gonçalves
datacite.date.Accepted.fl_str_mv 2025-01-16T00:00:00Z
datacite.rights.fl_str_mv http://purl.org/coar/access_right/c_abf2
datacite.subjects.subject.fl_str_mv Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
datacite.titles.title.fl_str_mv Study of Python-based workflow engines
dc.contributor.none.fl_str_mv Vilaça, Ricardo Manuel Pereira
RepositóriUM - Universidade do Minho
dc.creator.none.fl_str_mv Vieira, André Gonçalves
dc.date.Accepted.fl_str_mv 2025-01-16T00:00:00Z
dc.description.none.fl_str_mv Scientists, engineers, and other professionals from various fields seek automated and fast ways to run their programs. Achieving this, however, would require increasing the complexity of their applications and constantly changing their code. Workflow management systems are tools capable of executing workflows in different environments, from the local machine to HPC clusters and clouds. Several workflow management systems are currently available; they are implemented using different approaches and can run on a wide variety of platforms. It is important to let users experiment and compare tools in different environments, but few tools exist for running such experiments simply. This dissertation presents Huginn, a modular framework designed to automate the collection of performance metrics from workflow executions on workflow management systems. Huginn lets users create and execute synthetic workflows and generate reports with the collected metrics. The tool relies on the WfCommons Project, which was extended with two new translators to allow the creation of synthetic workflows written in CWL and Snakemake. To validate the framework, reports were created for a variety of test cases and a comparative analysis of Toil, Snakemake, and Nextflow was carried out. The goal of this analysis is to provide an impartial comparison guide for these widely used workflow management systems in a multi-node environment with different workloads and types of workflows from various scientific fields.
dc.format.none.fl_str_mv application/pdf
dc.identifier.none.fl_str_mv https://hdl.handle.net/1822/100391
dc.language.none.fl_str_mv por
dc.rights.cclincense.fl_str_mv http://creativecommons.org/licenses/by/4.0/
dc.rights.none.fl_str_mv http://purl.org/coar/access_right/c_abf2
dc.rights.rights.copyright.fl_str_mv openAccess
dc.subject.none.fl_str_mv Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
dc.title.fl_str_mv Study of Python-based workflow engines
dc.type.none.fl_str_mv http://purl.org/coar/resource_type/c_bdcc
description Scientists, engineers, and other professionals across various fields of study seek automated, fast, and cost-effective ways to execute their programs. However, in order to achieve this, it would be necessary to increase the complexity of applications and make constant changes to their code. Workflow management systems are tools capable of executing workflows in different environments, ranging from the local machine to HPC clusters and clouds. Currently, there are several workflow management systems available, which are implemented using different approaches and can be executed on a wide variety of platforms. It is important to let users experiment to compare various tools in different environments, but there is a lack of tools available to conduct these experiments seamlessly. This dissertation presents Huginn, a modular framework designed to automate the process of performance metrics collection of workflow executions using workflow management systems. Huginn enables users to create and execute synthetic workflow applications and generates reports with the metrics collected. This framework leverages the WfCommons Project, which has been expanded with two new translators to allow the creation of synthetic CWL and Snakemake workflow applications. In order to validate our framework, we have created reports for a variety of test cases and conducted a comparative analysis of Toil, Snakemake and Nextflow. The objective of this analysis is to provide an impartial comparison guide to these mainstream workflow management systems in a multi-node environment with different workloads and workflow applications from various scientific domains.
dirty 0
eu_rights_str_mv openAccess
format masterThesis
fulltext.url.fl_str_mv https://repositorium.uminho.pt/bitstreams/4c782544-99e0-4bf8-bcc8-edf21d5cad45/download
id rum_cbcc182ec0de4e65ed590aff7ca1fa18
identifier.url.fl_str_mv https://hdl.handle.net/1822/100391
instacron_str repositorium
institution Universidade do Minho
instname_str Universidade do Minho
language por
network_acronym_str rum
network_name_str RepositóriUM - Universidade do Minho
oai_identifier_str oai:repositorium.uminho.pt:1822/100391
organization_str_mv urn:organizationAcronym:repositorium
person_str_mv Vieira, André Gonçalves
publishDate 2025
reponame_str RepositóriUM - Universidade do Minho
repository_id_str urn:repositoryAcronym:rum
service_str_mv urn:repositoryAcronym:rum
spelling por; eng; application/pdf; eng; Study of Python-based workflow engines; Vieira, André Gonçalves; Vilaça, Ricardo Manuel Pereira; HostingInstitution; Organizational; RepositóriUM - Universidade do Minho; e-mail; mailto:repositorium@usdb.uminho.pt; repositorium@usdb.uminho.pt; URN; urn:tid:204221919; 2025-01-16; 2024-10; 2025-01-16T00:00:00Z; Handle; https://hdl.handle.net/1822/100391; http://purl.org/coar/access_right/c_abf2; open access; Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho; 5042774 bytes; literature; http://purl.org/coar/resource_type/c_bdcc; master thesis; 2025-01-16; http://creativecommons.org/licenses/by/4.0/; openAccess; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext; https://repositorium.uminho.pt/bitstreams/4c782544-99e0-4bf8-bcc8-edf21d5cad45/download
spellingShingle Study of Python-based workflow engines
Vieira, André Gonçalves
Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
status SINGLETON
subject.fl_str_mv Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
title Study of Python-based workflow engines
title_full Study of Python-based workflow engines
title_fullStr Study of Python-based workflow engines
title_full_unstemmed Study of Python-based workflow engines
title_short Study of Python-based workflow engines
title_sort Study of Python-based workflow engines
topic Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
topic_facet Scientific Workflows
Workflow Management Systems
Common Workflow Language
Toil
Nextflow
Snakemake
Fluxos de Trabalho Científico
Motores de Fluxo de Trabalho
url https://hdl.handle.net/1822/100391
visible 1