Author(s):
Tavares, Ana Helena ; Silva, Ana ; Freitas, Tiago ; Costa, Maria ; Macedo, Pedro ; Costa, Rui A. da
Date: 2025
Persistent ID: http://hdl.handle.net/10773/46060
Origin: RIA - Repositório Institucional da Universidade de Aveiro
Subject(s): Big data; Collinearity; Maximum entropy; Regression modelling
Description
Despite advances in data analysis methodologies over the last decades, most traditional regression methods cannot be applied directly to large-scale data. Although aggregation methods are specifically designed to handle large-scale data, their performance may deteriorate sharply in ill-conditioned problems (due to collinearity). This work compares the performance of a recent approach based on normalized entropy, a concept from information theory and info-metrics, with bagging and magging, two well-established aggregation methods in the literature, providing valuable insights for regression analysis with large-scale data. While the results reveal similar performance across methods in terms of prediction accuracy, the approach based on normalized entropy largely outperforms the others in terms of precision accuracy, even with a smaller number of groups and fewer observations per group, which is an important advantage in inference problems with large-scale data. This work also warns of the risk of using the OLS estimator, particularly under collinearity, given that data scientists frequently use linear models as a simplified view of reality in big data analysis, and the OLS estimator is routinely applied in practice. Beyond the promising findings of the simulation study, our estimation and aggregation strategies show strong potential for real-world applications in fields such as econometrics, genomics, environmental sciences, and machine learning, where challenges such as noise and ill-conditioning are persistent.
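The risk described above can be made concrete with a minimal NumPy sketch. It does not implement the paper's normalized-entropy estimator (which is not specified here); it only illustrates, on synthetic data, how OLS coefficients become unstable under strong collinearity while a simple bagging-style aggregation (averaging OLS fits over bootstrap resamples) still yields accurate predictions. All variable names and the data-generating process are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned design: x2 is nearly a copy of x1 (strong collinearity).
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)

# Plain OLS on the full sample (least-squares solution).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Bagging-style aggregation: fit OLS on bootstrap resamples, average coefficients.
B = 50
betas = np.empty((B, X.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
    betas[b], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
beta_bag = betas.mean(axis=0)

# Predictions stay accurate, but the individual coefficients of the
# collinear pair (x1, x2) vary wildly across resamples.
print("OLS coefficients:   ", np.round(beta_ols, 2))
print("Bagged coefficients:", np.round(beta_bag, 2))
print("Coefficient std across resamples:", np.round(betas.std(axis=0), 2))
```

This mirrors the abstract's finding in miniature: prediction error is similar for both estimators, but inference on individual coefficients is unreliable under collinearity, which is where the spread across resamples becomes visible.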