Document details

A Review of Big Data and Machine Learning Operations in Official Statistics

Author(s): Nunes, Carlos ; Ashofteh, Afshin

Date: 2024

Persistent ID: http://hdl.handle.net/10362/169567

Origin: Repositório Institucional da UNL

Subject(s): Feature store; Official statistics; Machine learning operations; Data science; Big data; Data quality; Artificial Intelligence; Computer Networks and Communications; Computer Science Applications; Software; Media Technology; Computational Mathematics; Education; SDG 8 - Decent Work and Economic Growth; SDG 9 - Industry, Innovation, and Infrastructure; SDG 17 - Partnerships for the Goals


Description

Nunes, C. E. R., & Ashofteh, A. (2024). A Review of Big Data and Machine Learning Operations in Official Statistics: MLOps and Feature Store Adoption. In H. Shahriar, H. Ohsaki, M. Sharmin, D. Towey, AKM. J. A. Majumder, Y. Hori, J-J. Yang, M. Takemoto, N. Sakib, R. Banno, & S. I. Ahamed (Eds.), 2024 IEEE 48th Annual Computers, Software, and Applications Conference: COMPSAC 2024 (pp. 711-718). (Proceedings of the IEEE Annual Computer Software and Applications Conference). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/COMPSAC61105.2024.00101

Integrating machine learning (ML) into the official statisticians' toolset is gaining popularity as National Statistical Offices (NSOs) strive to improve their methodologies. This trend poses new challenges and implications for incorporating innovative techniques that ensure the reliability of the official statistical production process. A comprehensive literature review was conducted using the Scopus and Web of Science databases to explore the contemporary applications of data science in official statistics. A total of 178 research articles were identified, focusing on areas such as big data, machine learning, and data quality. While the literature review revealed extensive proposals on utilizing alternative data and applying machine learning techniques to support official statistics production, it also identified research gaps in the post-training steps of the machine learning process. Areas requiring further investigation include machine learning operations in a production environment, data quality assurance, and governance.

Document Type: Conference object
Language: English
Contributor(s): NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
