Publication

Generating and updating supervised Data Mining models on a periodic basis

Bibliographic details
Abstract:	Data mining techniques are currently of great importance in companies and organisations worldwide for building predictive models. These models are particularly useful for classifying new data and supporting decision-making processes by helping to make the most appropriate decisions. However, over time, the predictive models created can become outdated as the patterns found in the data change due to natural evolution. This aspect can affect the quality of the models and lead to results that do not match reality. In this paper, we present a general approach for creating a self-updating system of predictive models that can be adapted to specific contexts. This system periodically generates and selects the most appropriate predictive model for ensuring the validity of its predictions. It integrates data processing and data mining model generation, and allows for the detection of changes in existing patterns as new data is added. This is suitable for supervised data mining tasks that may be affected by data evolution. The implementation of the system has demonstrated that it is possible to pre-process the data and select the best predictive model. In addition, since the execution is triggered automatically, the need for system maintenance is reduced.
Main authors:	Duarte, Ana
Other authors:	Belo, Orlando
Subject:	Concept drift; Data mining; Pentaho data integration; Self-updating models; Weka; Workflow
Year:	2024
Country:	Portugal
Document type:	conference paper
Access type:	restricted access
Associated institution:	Universidade do Minho
Language:	English
Source:	RepositóriUM - Universidade do Minho
