Publication
A meta-learning approach for selecting machine learning algorithms
| Abstract: | One of the major challenges in Machine Learning is to investigate the capabilities and limitations of the existing algorithms to identify when one algorithm is more adequate than another to solve particular problems. Traditional approaches to predicting the performance of algorithms often involve costly trial-and-error procedures or expert knowledge, which is not always straightforward to acquire. Thus, the main goal of this dissertation is to support beginners or even experienced data scientists by automatically indicating which classification algorithm is most suitable for their datasets. This dissertation proposes the use of Meta-Learning as a possible solution to the above-mentioned problem. In this respect, we introduced a novel framework for the automatic generation of meta-datasets. Taking advantage of the developed framework, several classification datasets from public sources were used. The result is the meta-dataset for the experiment of this research project. Concerning the goal of forecasting the best model for a classification dataset, two different solutions are presented: the first toward binary classification and the second on multiclass classification. A variety of Machine Learning algorithms are tested and compared through cross-validation. The experiment confirms the feasibility of applying Meta-Learning to select the algorithm that is expected to obtain the best performance for classification problems. |
|---|---|
| Main authors: | Monteiro, José Pedro Santos |
| Subject: | Machine learning; Meta-learning; Metadata; Machine learning algorithms selection; Classification; Data mining; Algorithm selection; Classification problems; Data analysis |
| Year: | 2020 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | open access |
| Associated institution: | Universidade do Minho |
| Language: | English |
| Source: | RepositóriUM - Universidade do Minho |