Author(s):
Fadlelmoula, Ahmed ; Torres, Helena ; Catarino, Susana Oliveira ; Minas, Graça ; Carvalho, Vitor
Date: 2025
Persistent ID: https://hdl.handle.net/1822/96824
Origin: RepositóriUM - Universidade do Minho
Subject(s): Machine Learning; Privacy Preserving Analytics; Synthetic Tabular Dataset; Type 2 Diabetes
Description
Strict privacy regulations in healthcare limit access to patient data for research and predictive modeling. To address this, the study explores synthetic data as a privacy-preserving alternative that also increases dataset size and diversity. Using two machine learning-based generative models, Generative Adversarial Networks (GANs) and Autoencoders (AEs), the researchers created synthetic tabular data to improve type 2 diabetes prediction. Six classifiers were used to evaluate the realism and quality of the synthetic data, with accuracy ranging from 49% to 86% for GAN-generated data and from 50% to 88% for AE-generated data. Results show that both models produce high-fidelity synthetic data, with some samples closely resembling real data. Despite minor detectability, the findings highlight the promise of synthetic data for secure, effective healthcare research and early disease prediction.
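
One common way to measure how "detectable" synthetic records are is a classifier-based test: label real rows 0 and synthetic rows 1, train a classifier to tell them apart, and read the held-out accuracy. Accuracy near 50% suggests the classifier cannot separate the two sets, while accuracy well above 50% suggests the synthetic rows are easy to spot. The record does not say which six classifiers were used or exactly how accuracy was computed, so the sketch below is only an illustration under assumptions: the nearest-centroid classifier, the Gaussian toy data, and every function name are hypothetical stand-ins, not the study's method or data.

```python
import random


def fit_centroids(rows, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the row."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def detection_accuracy(real, synthetic, test_fraction=0.3, seed=0):
    """Train a real-vs-synthetic classifier and return held-out accuracy."""
    rng = random.Random(seed)
    data = [(row, 0) for row in real] + [(row, 1) for row in synthetic]
    rng.shuffle(data)
    n_test = max(1, int(len(data) * test_fraction))
    test, train = data[:n_test], data[n_test:]
    centroids = fit_centroids([x for x, _ in train], [y for _, y in train])
    hits = sum(predict(centroids, x) == y for x, y in test)
    return hits / len(test)


def make_rows(n, mean, rng):
    """Toy two-feature table drawn from a Gaussian (assumed data, not the study's)."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


rng = random.Random(42)
real = make_rows(500, 0.0, rng)
close_synthetic = make_rows(500, 0.0, rng)    # same distribution as the real rows
shifted_synthetic = make_rows(500, 3.0, rng)  # clearly different distribution

acc_close = detection_accuracy(real, close_synthetic)
acc_shifted = detection_accuracy(real, shifted_synthetic)
print(f"close synthetic:   {acc_close:.2f}")
print(f"shifted synthetic: {acc_shifted:.2f}")
```

In this toy setup the synthetic set drawn from the same distribution should score close to chance, and the shifted set should be detected almost every time. Read this way, the lower ends of the reported ranges (49% and 50%) would correspond to classifiers that cannot tell the data apart, and the upper ends (86% and 88%) to classifiers that still pick up differences.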