Author(s): Duarte, Ana ; Belo, Orlando
Date: 2024
Persistent ID: https://hdl.handle.net/1822/90751
Origin: RepositóriUM - Universidade do Minho
Subject(s): Concept drift; Data mining; Pentaho data integration; Self-updating models; Weka; Workflow
Data mining techniques are widely used by companies and organisations worldwide to build predictive models. These models are particularly useful for classifying new data and supporting decision-making. Over time, however, predictive models can become outdated as the patterns in the data change through natural evolution, which degrades model quality and leads to results that no longer match reality. In this paper, we present a general approach for creating a self-updating system of predictive models that can be adapted to specific contexts. The system periodically generates predictive models and selects the most appropriate one to keep its predictions valid. It integrates data processing with data mining model generation and detects changes in existing patterns as new data is added. The approach is suitable for supervised data mining tasks that may be affected by data evolution. The implementation of the system demonstrated that it is possible to pre-process the data and select the best predictive model. In addition, because execution is triggered automatically, the need for system maintenance is reduced.
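The update cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' Weka and Pentaho Data Integration implementation: it is a plain Python stand-in under stated assumptions. The names `MajorityClass`, `ThresholdStump`, `select_best` and `update_cycle`, the 70/30 hold-out split and the `min_accuracy` threshold of 0.8 are all hypothetical, and drift is approximated simply as a drop in accuracy on the newest labelled batch.

```python
from collections import Counter


def accuracy(preds, y):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def evaluate(model, X, y):
    """Accuracy of a fitted model on a labelled batch."""
    return accuracy([model.predict(x) for x in X], y)


class MajorityClass:
    """Baseline candidate: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class ThresholdStump:
    """One-feature stump for 0/1 labels: picks the threshold and
    orientation with the best training accuracy."""

    def fit(self, X, y):
        best = (-1.0, None, None)
        for t in sorted(set(X)):
            for above in (0, 1):  # label assigned when x >= t
                preds = [above if x >= t else 1 - above for x in X]
                acc = accuracy(preds, y)
                if acc > best[0]:
                    best = (acc, t, above)
        _, self.t, self.above = best
        return self

    def predict(self, x):
        return self.above if x >= self.t else 1 - self.above


def select_best(X, y, candidates=(MajorityClass, ThresholdStump)):
    """Train every candidate on the first 70% of the batch and keep the
    one with the highest accuracy on the remaining 30% (assumed split)."""
    cut = int(len(X) * 0.7)
    scored = [
        (evaluate(cls().fit(X[:cut], y[:cut]), X[cut:], y[cut:]), cls)
        for cls in candidates
    ]
    best_cls = max(scored, key=lambda s: s[0])[1]
    return best_cls().fit(X[:cut], y[:cut])


def update_cycle(model, X_new, y_new, min_accuracy=0.8):
    """One scheduled run. Keep the current model if it still fits the new
    batch; otherwise treat the drop as drift and regenerate the model.
    Returns (model, retrained_flag)."""
    if model is not None and evaluate(model, X_new, y_new) >= min_accuracy:
        return model, False
    return select_best(X_new, y_new), True
```

In this sketch a scheduler would call `update_cycle` once per incoming labelled batch, for example `model, retrained = update_cycle(model, X_batch, y_batch)`, starting from `model = None`. Retraining only on the newest batch is a simplification; a sliding window over recent batches would be an equally plausible choice.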