Search results
Publications Catalog - All
- Showing 1 - 2 of 2 results
1
imbalanced-learn-extra
Publication by Douzas, Georgios. Other authors: Bação, Fernando. Source: Repositório Institucional da UNL.
Learning from imbalanced data is a common challenge in supervised learning, as most classifiers assume balanced class distributions. Among the strategies to mitigate this issue, oversampling algorithms offer a flexible and model-agnostic solution by generating synthetic samples for minority classes. In this paper, we introduce the imbalanced-learn-extra Python library, an open-source extension of the imbalanced-learn ecosystem that provides additional oversampling techniques for research and practical use. The library integrates seamlessly with Scikit-Learn, allowing users to easily incorporate it into existing workflows. It implements Geometric SMOTE, a geometrically enhanced drop-in replacement for the original SMOTE algorithm, and clustering-based oversampling methods such as KMeans-SMOTE and G-SOMO, which combine existing imbalanced-learn oversamplers with Scikit-Learn clustering algorithms to address within-class imbalances. Rather than re-assessing the performance of these algorithms, which has already been thoroughly evaluated in prior studies, this paper focuses on their software design, implementation, and practical use within a unified framework.
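To make the idea of generating synthetic minority samples concrete, the following is a minimal sketch of the interpolation step behind the original SMOTE algorithm, written in plain Python. It is not the imbalanced-learn-extra API and it does not implement Geometric SMOTE, KMeans-SMOTE or G-SOMO; the function name `smote_sketch`, its parameters and the toy data are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not part of any library.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, n_new=5)
```

Every synthetic point lies on a line segment between two existing minority samples, which is the behaviour that the geometric and clustering-based variants described in the abstract are designed to extend.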
2
Integrating scanner data in Consumer Price Index calculation: Consumer Price Index calculation using scanner data
Publication by Griska, Kostas. Source: Repositório Institucional da UNL.
This research is a work project by a Nova IMS student employed by Statistics Lithuania (the national statistics office). Its primary purpose is to propose an alternative methodology for Consumer Price Index calculation using new data sources, which will contribute to national and international consumer price statistics. The current methodology for consumer price index calculation is based on the Laspeyres method for index calculation, which was last updated in 2004. Two decades ago, the supply of scanner data was minimal. Therefore the methodology was based on physical price collectors and survey statistics. Such methodology is known to be extremely costly and inefficient. This research will investigate the possibilities of incorporating new data sources in consumer price statistics and investigate alternative index calculation methods that could potentially replace the old model, which is known to struggle with sampling bias and chain drift problems. The literature review will cover these issues thoroughly and present possible multilateral or bilateral index alternatives to counter them. The research will also cover the classification and sampling procedures necessary to construct time-series data. The multilateral and bilateral index values obtained are then compared against the current methodology, followed by conclusions and discussion sections. The research considers not only scanner data but also explicitly discusses the applications of web-scraped data. The results section reveals that Jevons and GEKS index values are not correlated, indicating that sales turnover may not be rationally correlated with price movements. That is valid evidence that web-scraped data can also be beneficial in consumer price index calculation and official statistics as a supplementary source of information.
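As a rough illustration of the two index families named in the abstract, the sketch below computes a bilateral Jevons index (an unweighted geometric mean of price relatives) and a multilateral GEKS index. The abstract does not say which bilateral index feeds the GEKS calculation; the Fisher index is used here as an assumption, and all prices and quantities are made up.

```python
import math


def jevons(p0, pt):
    """Unweighted geometric mean of price relatives over matched items."""
    items = p0.keys() & pt.keys()
    return math.exp(sum(math.log(pt[i] / p0[i]) for i in items) / len(items))


def fisher(p0, q0, pt, qt):
    """Fisher index: geometric mean of the Laspeyres and Paasche indices."""
    items = p0.keys() & pt.keys()
    laspeyres = sum(pt[i] * q0[i] for i in items) / sum(p0[i] * q0[i] for i in items)
    paasche = sum(pt[i] * qt[i] for i in items) / sum(p0[i] * qt[i] for i in items)
    return math.sqrt(laspeyres * paasche)


def geks(prices, quantities, base, target):
    """GEKS index: geometric mean of Fisher comparisons linked through every period."""
    n = len(prices)
    log_sum = 0.0
    for link in range(n):
        via = fisher(prices[base], quantities[base], prices[link], quantities[link])
        onward = fisher(prices[link], quantities[link], prices[target], quantities[target])
        log_sum += math.log(via * onward)
    return math.exp(log_sum / n)


# Made-up scanner data: three periods, two items, prices and quantities sold.
prices = [
    {"milk": 1.00, "bread": 2.00},
    {"milk": 1.10, "bread": 1.90},
    {"milk": 1.20, "bread": 2.10},
]
quantities = [
    {"milk": 100, "bread": 50},
    {"milk": 80, "bread": 70},
    {"milk": 90, "bread": 55},
]

jevons_0_2 = jevons(prices[0], prices[2])
geks_0_2 = geks(prices, quantities, 0, 2)
```

The Jevons index ignores quantities entirely, while the GEKS index built on Fisher comparisons weights prices by the quantities sold, so the two can diverge when turnover shifts between items. Because it averages over all link periods, GEKS is transitive: the index from period 0 to 2 equals the product of the indices from 0 to 1 and from 1 to 2, which is why multilateral methods of this kind avoid chain drift.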