Publication
Design and development of an assessment framework for data quality of a dataset
| Abstract: | This work developed a modular framework for the automated evaluation of dataset quality, with a focus on High-Value Datasets (HVDs). The flexible architecture integrates independent modules that score specific quality dimensions, such as completeness, consistency, accessibility, and adherence to standards like FAIR and INSPIRE. Results are consolidated into normalised scores and detailed explanations, presented within an intuitive interface embedded in the CKAN portal. The framework’s dual-level evaluation approach, covering both the dataset as a whole and its individual resources, provides tailored feedback and actionable insights for data stewards and decision-makers. Testing with synthetic and real-world datasets confirmed the system’s robust handling of missing and inconsistent data and showed that it can support a wide range of dataset structures and domains. The user-friendly interface allows evaluations to be triggered directly from dataset pages, eliminating the need for technical expertise and fostering broader adoption. Beyond these technical achievements, this work contributes to data governance practices by enabling consistent, transparent, and reproducible quality assessments. It offers a foundation for future enhancements, such as automated weighting, AI-driven improvement suggestions, and integration with evolving data standards. Ultimately, the proposed framework enhances trust in open data, facilitating evidence-based decision-making and supporting data-driven innovation across sectors. |
|---|---|
| Main authors: | Mastracci, Silvia |
| Subjects: | Data quality; High-value datasets; HVD; Data evaluation; Open data; CKAN; Interoperability; FAIR; INSPIRE; DCAT-AP; ISO/IEC |
| Year: | 2025 |
| Country: | Portugal |
| Document type: | master's dissertation |
| Access type: | embargoed access |
| Associated institution: | Universidade de Aveiro |
| Language: | English |
| Source: | RIA - Repositório Institucional da Universidade de Aveiro |
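The abstract describes independent modules that each score one quality dimension, results consolidated into normalised scores, and evaluation at both the dataset and the resource level. Because the dissertation itself is embargoed, the following is only a minimal Python sketch of how such a modular design could be organised, not the author's implementation. The module interface, the two example metrics (`completeness`, `consistency`), the per-dimension weights, and the rule that the dataset score is the mean of its resource scores are all hypothetical assumptions; the sketch does not interact with CKAN.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical data model: a resource is a list of rows; a missing or empty value is "not filled".
Row = Dict[str, Optional[str]]


@dataclass
class ModuleResult:
    dimension: str
    score: float  # normalised to the range [0, 1]
    explanation: str


# Hypothetical module interface: each quality module takes the rows of one resource
# and returns an independent, normalised score plus a human-readable explanation.
QualityModule = Callable[[List[Row]], ModuleResult]


def completeness(rows: List[Row]) -> ModuleResult:
    """Share of filled cells over all columns seen in the resource (assumed metric)."""
    columns = set().union(*(row.keys() for row in rows))
    total = len(rows) * len(columns)
    if total == 0:
        return ModuleResult("completeness", 0.0, "no cells to evaluate")
    filled = sum(1 for row in rows for col in columns if row.get(col) not in (None, ""))
    return ModuleResult("completeness", filled / total, f"{filled}/{total} cells filled")


def consistency(rows: List[Row]) -> ModuleResult:
    """Share of rows whose columns match those of the first row (assumed metric)."""
    if not rows:
        return ModuleResult("consistency", 0.0, "no rows to evaluate")
    expected = set(rows[0].keys())
    matching = sum(1 for row in rows if set(row.keys()) == expected)
    return ModuleResult(
        "consistency", matching / len(rows),
        f"{matching}/{len(rows)} rows share the first row's columns",
    )


def evaluate_resource(
    rows: List[Row], modules: List[QualityModule], weights: Dict[str, float]
) -> Tuple[float, List[ModuleResult]]:
    """Resource level: run every module and combine the scores as a weighted mean."""
    results = [module(rows) for module in modules]
    total_weight = sum(weights.get(r.dimension, 1.0) for r in results)
    if total_weight == 0:
        return 0.0, results
    score = sum(weights.get(r.dimension, 1.0) * r.score for r in results) / total_weight
    return score, results


@dataclass
class DatasetReport:
    overall: float
    per_resource: Dict[str, float] = field(default_factory=dict)
    details: Dict[str, List[ModuleResult]] = field(default_factory=dict)


def evaluate_dataset(
    resources: Dict[str, List[Row]], modules: List[QualityModule], weights: Dict[str, float]
) -> DatasetReport:
    """Dataset level: evaluate each resource, then average the resource scores (assumed rule)."""
    report = DatasetReport(overall=0.0)
    for name, rows in resources.items():
        score, results = evaluate_resource(rows, modules, weights)
        report.per_resource[name] = score
        report.details[name] = results
    if report.per_resource:
        report.overall = sum(report.per_resource.values()) / len(report.per_resource)
    return report


if __name__ == "__main__":
    resources = {
        "stations.csv": [{"id": "1", "name": "Aveiro"}, {"id": "2", "name": ""}],
        "readings.csv": [{"id": "1", "value": "3.2"}, {"id": "2"}],
    }
    report = evaluate_dataset(resources, [completeness, consistency],
                              {"completeness": 3.0, "consistency": 1.0})
    print(report.overall)       # 0.75
    print(report.per_resource)  # {'stations.csv': 0.8125, 'readings.csv': 0.6875}
    for result in report.details["readings.csv"]:
        print(result.dimension, result.score, result.explanation)
```

With non-negative weights and module scores in [0, 1], the weighted mean keeps every consolidated score in [0, 1], which is one simple way to obtain normalised scores; the per-resource `details` keep each module's explanation alongside the numbers.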