
A Feature Store Architecture for Official Statistics Machine Learning


Bibliographic details
Abstract: Integrating machine learning (ML) into the official statistician's toolset is gaining popularity as National Statistical Offices (NSOs) strive to improve their methodologies. This trend poses new challenges and implications for incorporating innovative techniques that ensure the reliability of the official statistical production process. This study first conducted a comprehensive literature review, using the Scopus and Web of Science databases, to explore contemporary applications of data science in official statistics. A total of 178 research articles were identified, focusing on areas such as big data, machine learning, and data quality. While the review revealed extensive proposals for using alternative data and applying machine learning techniques to support official statistics production, it also identified research gaps in the post-training steps of the machine learning process. Areas requiring further investigation include machine learning operations in a production environment, data quality assurance, and governance. Given these gaps, the study then experimented with the concept of a feature store to evaluate its potential benefits and drawbacks, with reference to the maturity matrix for machine learning capability proposed by the ONS-UNECE Machine Learning Group (2022). The experiment demonstrated that implementing a feature store can reduce processing time by 8% for datasets associated with multiple projects. The findings contribute to understanding how a feature store can enhance efficiency and productivity in official statistics. Furthermore, the research highlights the need for continued work on machine learning deployment, data quality, and governance to ensure the successful integration of machine learning into the official statistical production process.
While the findings and implications discussed in this study apply primarily to official statistics, they may also be relevant to other domains.
Main authors: Nunes, Carlos Eduardo Ramos
Subject: Feature Store; Official Statistics; Machine Learning Operations; Data Science; Big Data; Data Quality; SDG 9 - Industry, innovation and infrastructure; SDG 16 - Peace, justice and strong institutions
Year: 2024
Country: Portugal
Document type: master's dissertation
Access type: embargoed access
Associated institution: Universidade Nova de Lisboa
Language: English
Source: Repositório Institucional da UNL