Author(s): Nunes, Carlos; Ashofteh, Afshin
Date: 2024
Persistent ID: http://hdl.handle.net/10362/169567
Origin: Repositório Institucional da UNL
Subject(s): Feature store; Official statistics; Machine learning operations; Data science; Big data; Data quality; Artificial Intelligence; Computer Networks and Communications; Computer Science Applications; Software; Media Technology; Computational Mathematics; Education; SDG 8 - Decent Work and Economic Growth; SDG 9 - Industry, Innovation, and Infrastructure; SDG 17 - Partnerships for the Goals
Description
Nunes, C. E. R., & Ashofteh, A. (2024). A Review of Big Data and Machine Learning Operations in Official Statistics: MLOps and Feature Store Adoption. In H. Shahriar, H. Ohsaki, M. Sharmin, D. Towey, AKM. J. A. Majumder, Y. Hori, J-J. Yang, M. Takemoto, N. Sakib, R. Banno, & S. I. Ahamed (Eds.), 2024 IEEE 48th Annual Computers, Software, and Applications Conference: COMPSAC 2024 (pp. 711-718). (Proceedings of the IEEE Annual Computer Software and Applications Conference). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/COMPSAC61105.2024.00101
Integrating machine learning (ML) into the official statisticians' toolset is gaining popularity as National Statistical Offices (NSOs) strive to improve their methodologies. This trend poses new challenges and implications for incorporating innovative techniques that ensure the reliability of the official statistical production process. A comprehensive literature review was conducted using the Scopus and Web of Science databases to explore the contemporary applications of data science in official statistics. A total of 178 research articles were identified, focusing on areas such as big data, machine learning, and data quality. While the literature review revealed extensive proposals on utilizing alternative data and applying machine learning techniques to support official statistics production, it also identified research gaps in the post-training steps of the machine learning process. Areas requiring further investigation include machine learning operations in a production environment, data quality assurance, and governance.