Publication

Design and development of an assessment framework for data quality of a dataset

Bibliographic details
Abstract:	This work developed a modular framework for the automated evaluation of dataset quality, with a focus on High-Value Datasets (HVDs). The flexible architecture integrates independent modules that score specific quality dimensions, such as completeness, consistency, accessibility, and adherence to standards like FAIR and INSPIRE. Results are consolidated into normalised scores and detailed explanations, presented within an intuitive interface embedded in the CKAN portal. The framework’s dual-level evaluation approach, covering both the dataset as a whole and individual resources, provides tailored feedback and actionable insights for data stewards and decision-makers. Testing with synthetic and real-world datasets confirmed the system’s robust handling of missing and inconsistent data, while highlighting its capacity to support a wide range of dataset structures and domains. The user-friendly interface allows evaluations to be triggered directly from dataset pages, eliminating the need for technical expertise and fostering broader adoption. Beyond these technical achievements, this work contributes to data governance practices by enabling consistent, transparent, and reproducible quality assessments. It offers a foundation for future enhancements, such as automated weighting, AI-driven improvement suggestions, and integration with evolving data standards. Ultimately, the proposed framework enhances trust in open data, facilitating evidence-based decision-making and supporting data-driven innovation across sectors.
Main authors:	Mastracci, Silvia
Subject:	Data quality; High-value datasets; HVD; Data evaluation; Open data; CKAN; Interoperability; FAIR; INSPIRE; DCAT-AP; ISO/IEC
Year:	2025
Country:	Portugal
Document type:	master's dissertation
Access type:	embargoed access
Associated institution:	Universidade de Aveiro
Language:	English
Source:	RIA - Repositório Institucional da Universidade de Aveiro
