Publication
Improving Machine Learning Pipeline Creation using Visual Programming and Static Analysis
| Abstract: | ML pipelines are composed of several steps that load data, clean it, process it, apply learning algorithms and produce either reports or deploy inference systems into production. In real-world scenarios, pipelines can take days, weeks, or months to train with large quantities of data. Unfortunately, current tools to design and orchestrate ML pipelines are oblivious to the semantics of each step, allowing developers to easily introduce errors when connecting two components that might not work together, either syntactically or semantically. Data scientists and engineers often find these bugs during or after the lengthy execution, which decreases their productivity. We propose a Visual Programming Language (VPL) enriched with semantic constraints regarding the behavior of each component and a verification methodology that verifies entire pipelines to detect common ML bugs that existing visual and textual programming languages do not. We evaluate this methodology on a set of six bugs taken from a data science company focused on preventing financial fraud on big data. We were able to detect these data engineering and data balancing bugs, as well as detect unnecessary computation in the pipelines. |
|---|---|
| Main authors: | David, João Pedro Vieira |
| Subject: | Visual Programming; Machine Learning; Pipeline; Type Checking; Compiler; Master's theses - 2021 |
| Year: | 2021 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade de Lisboa |
| Language: | English |
| Source: | Repositório da Universidade de Lisboa |
Related records
school Static Analysis for Detection of Defects in Machine Learning Pipelines
by: Silva, Pedro Miguel Alcântara da
Published: (2024)
school Formal verification of parallel C+MPI programs
by: Martins, Nuno Alexandre Dias
Published: (2013)
school Protocol based programming of concurrent systems
by: Santos, César Augusto Ribeiro dos
Published: (2014)
groups NGSPipes: from specification to automatic deployment of NGS pipelines
by: Dantas, Bruno
Published: (2016)
school Synthesis of correct-by-construction MPI programs
by: Lemos, Filipe Emanuel Ventura Pires de Matos
Published: (2014)
school Modernização de uma pipeline de desenvolvimento web
by: Ferreira, Daniela Maria Lopes
Published: (2024)
article Deep learning based pipeline for fingerprinting using brain functional MRI connectivity data
by: Lori, Nicolás F.
Published: (2018)
school LiquidJava : extending Java with refinements
by: Gamboa, Catarina Ventura
Published: (2022)
school Tools and techniques for the static verification of progress in communication-centred systems
by: Camacho, André Filipe Marinhas Henriques da Silva
Published: (2014)
school Performance measure analysis and data pipeline integration for enhanced dashboard visualisation
by: Tronsberg, Anna
Published: (2024)
school Automation of machine learning pipelines for anomaly detection challenges
by: Martins, Ricardo Rodrigues
Published: (2023)
school Pipeline CI/CD para automação e orquestração de redes
by: Borges, João Miguel Caracóis
Published: (2023)
school Optmizing 16S sequencing analysis pipelines
by: Viana, Samuel Dias Rosa
Published: (2016)
school Towards the Conceptualization of Refinement Typed Genetic Programming
by: Santos, Paulo Alexandre Canelas dos
Published: (2020)
article Tuning pipelined scientific data analyses for efficient multicore execution
by: Pereira, André Martins
Published: (2016)
school Automatic conversion of ADA source code to scala
by: Espada, Guilherme Jorge Nunes Monteiro
Published: (2020)
school Parallel execution of pipelines using bioinformatics tools
by: Fleitas, Calmenelias Pino
Published: (2019)
school Developing an automated machine learning framework for protein classification
by: Bullitta, Roberto Costa
Published: (2023)
article MOSCA: an automated pipeline for integrated metagenomics and metatranscriptomics data analysis
by: Sequeira, J. C.
Published: (2019)
school Robi: a visual programming language for educational robotics
by: Galvão, Gustavo Linhares
Published: (2022)
school LINEAR AND SHARED OBJECTS IN CONCURRENT PROGRAMMING
by: Campos, Joana Correia
Published: (2010)
school Porting do compilador LLVM com o frontend Clang para uma nova arquitetura de processador
by: Silva, José Miguel Pereira da
Published: (2013)
school A machine learning based drug discovery pipeline: finding new therapies for Cystic Fibrosis
by: Sousa, Paulo Nuno Hilário Teixeira de
Published: (2019)
category Exemplos práticos de programação visual em C#
by: Cortez, Paulo
Published: (2008)
school Ensino da programação através de programação visual
by: Santos, Renato Manuel Simões, 1980-
Published: (2013)
school RobotFix : Detecting Bugs On Variables In Robot Programs
by: Tavares, Miguel Rodrigues
Published: (2023)
school Automation of machine learning models benchmarking
by: Sá, João Pedro Barros
Published: (2022)
school EWVM - an Educational Web Virtual Machine
by: Teixeira, Sofia Almeida
Published: (2022)
school Studying elements of genetic programming for multiclass classification
by: Batista, João Eduardo Silva Pombinho
Published: (2018)
article Colored Petri nets in the simulation of ETL standard tasks: the surrogate key pipelining case
by: Silva, Diogo
Published: (2012)
school Criação de ferramentas de desenvolvimento para uma arquitetura baseada em Microblaze
by: Vasconcelos, Tiago Manuel Martins
Published: (2014)
school Static verification of data races in openMP
by: Silva, Kátya Thaís Martins da
Published: (2014)
school Adaptive scheduling of multi-product pipelines using Fuzzy-ACO hybrid approach
by: Bukhari, Syed Muhammad Zeeshan
Published: (2025)
school QUIC a quiz creation tool to support programming classes
by: Alves, Joana Maia Teixeira
Published: (2024)
school Building an analysis pipeline to explore at ultra-high field (7T) MRI the laminar distribution of myelin in the human cortex in vivo
by: Nunes, Márcia Cláudia Guimarães
Published: (2023)
school Automatic task discovery : towards full automation of the machine learning lifecycle
by: Gehmayr, Jonathan
Published: (2024)
school Development of a machine learning-based pipeline able to predict genes associated with diseases and cell processes using interpretable network embeddings
by: Coelho, Alexandre Filipe dos Reis
Published: (2023)
school Metodologias BIM para verificação regulamentar em contexto de licenciamento municipal: proposta, implementação e aplicação
by: Santos, Miguel Filipe de Sousa
Published: (2021)
school Development of an image processing pipeline for the study of corticol lesions in multiple sclerosis patients using ultra-high field MRI
by: Marques, Marta Filipa Mateus
Published: (2019)
school Software weaknesses detection using static-code analysis and machine learning techniques
by: Conté, Sana
Published: (2023)