Publication
Study of Python-based workflow engines
| Abstract: | Scientists, engineers, and other professionals across many fields seek automated, fast, and cost-effective ways to execute their programs. Achieving this, however, would require increasing the complexity of their applications and constantly changing their code. Workflow management systems are tools that execute workflows in different environments, from the local machine to HPC clusters and clouds. Several such systems are currently available; they are implemented using different approaches and run on a wide variety of platforms. Users need to be able to experiment with and compare these tools in different environments, but few tools exist to conduct such experiments seamlessly. This dissertation presents Huginn, a modular framework that automates the collection of performance metrics from workflow executions run by workflow management systems. Huginn lets users create and execute synthetic workflow applications and generates reports from the collected metrics. The framework builds on the WfCommons Project, which has been extended with two new translators that enable the creation of synthetic CWL and Snakemake workflow applications. To validate the framework, we created reports for a variety of test cases and conducted a comparative analysis of Toil, Snakemake, and Nextflow. The goal of this analysis is to provide an impartial comparison guide to these mainstream workflow management systems in a multi-node environment, with different workloads and workflow applications from various scientific domains. |
|---|---|
| Main authors: | Vieira, André Gonçalves |
| Subject: | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | Portuguese |
| Source: | RepositóriUM - Universidade do Minho |
| _version_ | 1866878321851105280 |
|---|---|
| author | Vieira, André Gonçalves |
| author_facet | Vieira, André Gonçalves |
| author_role | author |
| contributor_name_str_mv | Vilaça, Ricardo Manuel Pereira; RepositóriUM - Universidade do Minho |
| country_str | PT |
| creators_json_txt | [{\"Person.name\":\"Vieira, André Gonçalves\"}] |
| datacite.contributors.contributor.contributorName.fl_str_mv | Vilaça, Ricardo Manuel Pereira; RepositóriUM - Universidade do Minho |
| datacite.creators.creator.creatorName.fl_str_mv | Vieira, André Gonçalves |
| datacite.date.Accepted.fl_str_mv | 2025-01-16T00:00:00Z |
| datacite.rights.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| datacite.subjects.subject.fl_str_mv | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| datacite.titles.title.fl_str_mv | Study of Python-based workflow engines |
| dc.contributor.none.fl_str_mv | Vilaça, Ricardo Manuel Pereira; RepositóriUM - Universidade do Minho |
| dc.creator.none.fl_str_mv | Vieira, André Gonçalves |
| dc.date.Accepted.fl_str_mv | 2025-01-16T00:00:00Z |
| dc.description.none.fl_str_mv | Scientists, engineers, and other professionals from various fields seek automatic and fast ways to execute their programs. However, achieving this would require increasing the complexity of the applications and making constant changes to their code. Workflow management systems are tools capable of executing workflows in different environments, from the local machine to HPC clusters and clouds. Currently, several workflow management systems are available, implemented using different approaches and executable on a wide variety of platforms. It is important to let users experiment in order to compare various tools in different environments, but there is a lack of tools available to carry out these experiments simply. This dissertation presents Huginn, a modular framework designed to automate the collection of performance metrics from workflow executions using workflow management systems. Huginn allows users to create and execute synthetic workflows and to generate reports with the collected metrics. This tool relies on the WfCommons Project, which was enriched with two new translators to enable the creation of synthetic workflows written in CWL and Snakemake. To validate our framework, reports were created for a variety of test cases and a comparative analysis of Toil, Snakemake, and Nextflow was carried out. The goal of this analysis is to provide an impartial comparison guide for these most widely used workflow management systems in a multi-node environment with different workloads and types of workflows from various scientific fields. |
| dc.format.none.fl_str_mv | application/pdf |
| dc.identifier.none.fl_str_mv | https://hdl.handle.net/1822/100391 |
| dc.language.none.fl_str_mv | por |
| dc.rights.cclincense.fl_str_mv | http://creativecommons.org/licenses/by/4.0/ |
| dc.rights.none.fl_str_mv | http://purl.org/coar/access_right/c_abf2 |
| dc.rights.rights.copyright.fl_str_mv | openAccess |
| dc.subject.none.fl_str_mv | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| dc.title.fl_str_mv | Study of Python-based workflow engines |
| dc.type.none.fl_str_mv | http://purl.org/coar/resource_type/c_bdcc |
| dirty | 0 |
| eu_rights_str_mv | openAccess |
| format | masterThesis |
| fulltext.url.fl_str_mv | https://repositorium.uminho.pt/bitstreams/4c782544-99e0-4bf8-bcc8-edf21d5cad45/download |
| id | rum_cbcc182ec0de4e65ed590aff7ca1fa18 |
| identifier.url.fl_str_mv | https://hdl.handle.net/1822/100391 |
| instacron_str | repositorium |
| institution | Universidade do Minho |
| instname_str | Universidade do Minho |
| language | por |
| network_acronym_str | rum |
| network_name_str | RepositóriUM - Universidade do Minho |
| oai_identifier_str | oai:repositorium.uminho.pt:1822/100391 |
| organization_str_mv | urn:organizationAcronym:repositorium |
| person_str_mv | Vieira, André Gonçalves |
| publishDate | 2025 |
| reponame_str | RepositóriUM - Universidade do Minho |
| repository_id_str | urn:repositoryAcronym:rum |
| service_str_mv | urn:repositoryAcronym:rum |
| spelling | por; eng; application/pdf; eng; Study of Python-based workflow engines; Vieira, André Gonçalves; Vilaça, Ricardo Manuel Pereira; HostingInstitution; Organizational; RepositóriUM - Universidade do Minho; e-mail; mailto:repositorium@usdb.uminho.pt; repositorium@usdb.uminho.pt; URN; urn:tid:204221919; 2025-01-16; 2024-10; 2025-01-16T00:00:00Z; Handle; https://hdl.handle.net/1822/100391; http://purl.org/coar/access_right/c_abf2; open access; Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho; 5042774 bytes; literature; http://purl.org/coar/resource_type/c_bdcc; master thesis; 2025-01-16; http://creativecommons.org/licenses/by/4.0/; openAccess; http://purl.org/coar/access_right/c_abf2; application/pdf; fulltext; https://repositorium.uminho.pt/bitstreams/4c782544-99e0-4bf8-bcc8-edf21d5cad45/download |
| spellingShingle | Study of Python-based workflow engines Vieira, André Gonçalves Scientific Workflows Workflow Management Systems Common Workflow Language Toil Nextflow Snakemake Fluxos de Trabalho Científico Motores de Fluxo de Trabalho |
| status | SINGLETON |
| subject.fl_str_mv | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| title | Study of Python-based workflow engines |
| title_full | Study of Python-based workflow engines |
| title_fullStr | Study of Python-based workflow engines |
| title_full_unstemmed | Study of Python-based workflow engines |
| title_short | Study of Python-based workflow engines |
| title_sort | Study of Python-based workflow engines |
| topic | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| topic_facet | Scientific Workflows; Workflow Management Systems; Common Workflow Language; Toil; Nextflow; Snakemake; Fluxos de Trabalho Científico; Motores de Fluxo de Trabalho |
| url | https://hdl.handle.net/1822/100391 |
| visible | 1 |