Publicação

WEXGrid: a modular python framework for defining, managing, and executing complex workflows in local and lSF grid environments

Ver documento

Detalhes bibliográficos
Resumo:WEXGrid is a modular framework and software tool designed to address the challenges of creating, managing, and executing scientific workflows on local workstations and distributed Load Sharing Facility (LSF) Grid environments. Break workflows into discrete targets with explicit inputs, outputs, execution actions, and resource requirements, all of which are defined programmatically in Python. This ensures that the underlying concepts remain accessible to computational scientists. The system then builds a dependency graph in which the execution order is automated based on data and control dependencies. This graph can be used to drive dynamic scheduling based on resource availability and task readiness. Cache management reduces redundant computations by validating outputs through timestamps and checksums. This influences scheduling decisions, optimizing resource utilization. WEXGrid supports parallel and asynchronous task execution, balancing workload distribution across resources with data locality concerns to avoid Input/Output (I/O) bottlenecks. Its fault-tolerant architecture includes mechanisms for detecting and isolating faults to avoid cascading failures. Its provenance capture and metadata interoperability meet the Findable, Accessible, Interoperable, Reusable (FAIR) principles for reproducibility and portability. WEXGrid aims to enable scalability and maintainability, providing a unified interface for defining and executing workflows for heterogeneous computational infrastructures without sacrificing performance or transparency. Future enhancements include adaptive scheduling informed by real-time telemetry, rich provenance integration, and extended interoperability with cloud and containerized environments. These enhancements will ensure sustained efficiency and reproducibility across heterogeneous scientific computing platforms.
Autores principais:Marques, João Pedro da Silva
Assunto:Distributed computing Parallel execution LSF grid environments Dynamic task scheduling
Ano:2025
País:Portugal
Tipo de documento:dissertação de mestrado
Tipo de acesso:acesso aberto
Instituição associada:Instituto Politécnico de Bragança
Idioma:inglês
Origem:Biblioteca Digital do IPB
Descrição
Resumo:WEXGrid is a modular framework and software tool designed to address the challenges of creating, managing, and executing scientific workflows on local workstations and distributed Load Sharing Facility (LSF) Grid environments. Break workflows into discrete targets with explicit inputs, outputs, execution actions, and resource requirements, all of which are defined programmatically in Python. This ensures that the underlying concepts remain accessible to computational scientists. The system then builds a dependency graph in which the execution order is automated based on data and control dependencies. This graph can be used to drive dynamic scheduling based on resource availability and task readiness. Cache management reduces redundant computations by validating outputs through timestamps and checksums. This influences scheduling decisions, optimizing resource utilization. WEXGrid supports parallel and asynchronous task execution, balancing workload distribution across resources with data locality concerns to avoid Input/Output (I/O) bottlenecks. Its fault-tolerant architecture includes mechanisms for detecting and isolating faults to avoid cascading failures. Its provenance capture and metadata interoperability meet the Findable, Accessible, Interoperable, Reusable (FAIR) principles for reproducibility and portability. WEXGrid aims to enable scalability and maintainability, providing a unified interface for defining and executing workflows for heterogeneous computational infrastructures without sacrificing performance or transparency. Future enhancements include adaptive scheduling informed by real-time telemetry, rich provenance integration, and extended interoperability with cloud and containerized environments. These enhancements will ensure sustained efficiency and reproducibility across heterogeneous scientific computing platforms.