Document details

Synthetic data generation using machine learning algorithms for enhanced Type 2 diabetes prediction

Author(s): Fadlelmoula, Ahmed ; Torres, Helena ; Catarino, Susana Oliveira ; Minas, Graça ; Carvalho, Vitor

Date: 2025

Persistent ID: https://hdl.handle.net/1822/96824

Origin: RepositóriUM - Universidade do Minho

Subject(s): Machine Learning; Privacy Preserving Analytics; Synthetic Tabular Dataset; Type 2 Diabetes


Description

Strict privacy regulations in healthcare limit access to patient data for research and predictive modeling. To address this, the study explores synthetic data as a privacy-preserving alternative that also increases dataset size and diversity. Using two machine learning-based generative models, Generative Adversarial Networks (GANs) and Autoencoders (AEs), the researchers created synthetic tabular data to improve type 2 diabetes prediction. Six classifiers were used to evaluate the realism and quality of the synthetic data, with accuracy ranging from 49% to 86% for the GAN-generated data and from 50% to 88% for the AE-generated data. Results show that both models produce high-fidelity synthetic data, with some samples closely resembling real data. Although the synthetic data remain detectable to a minor degree, the findings highlight the promise of synthetic data for secure, effective healthcare research and early disease prediction.
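
The abstract does not spell out how the six classifiers were applied. One common reading, assumed here, is a detectability test: a classifier is trained to tell real rows from synthetic rows, and held-out accuracy near 50% means the two are hard to distinguish. The sketch below illustrates that idea only. The function name `detectability_accuracy`, the NumPy-only logistic regression, and the Gaussian stand-in data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def detectability_accuracy(real, synthetic, seed=0, epochs=500, lr=0.1):
    """Train a logistic-regression discriminator (real=1, synthetic=0)
    and return its accuracy on a held-out 30% split."""
    rng = np.random.default_rng(seed)
    X = np.vstack([real, synthetic]).astype(float)
    y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])

    # Shuffle, then hold out 30% of the rows for testing.
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    n_test = int(0.3 * len(X))
    X_test, y_test = X[:n_test], y[:n_test]
    X_train, y_train = X[n_test:], y[n_test:]

    # Standardize with training statistics only, to avoid leakage.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8
    X_train = (X_train - mu) / sigma
    X_test = (X_test - mu) / sigma

    # Plain batch gradient descent on the logistic loss.
    w = np.zeros(X_train.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
        grad = p - y_train
        w -= lr * (X_train.T @ grad) / len(y_train)
        b -= lr * grad.mean()

    pred = (X_test @ w + b) > 0
    return float((pred == y_test.astype(bool)).mean())


# Illustrative stand-in data (assumption: NOT the diabetes dataset).
rng = np.random.default_rng(42)
real = rng.normal(0.0, 1.0, size=(1000, 5))
good_synth = rng.normal(0.0, 1.0, size=(1000, 5))  # same distribution as real
poor_synth = rng.normal(1.5, 1.0, size=(1000, 5))  # visibly shifted

acc_good = detectability_accuracy(real, good_synth)
acc_poor = detectability_accuracy(real, poor_synth)
print(f"well-matched synthetic data: {acc_good:.2f}")
print(f"poorly matched synthetic data: {acc_poor:.2f}")
```

Under this reading, the lower ends of the reported ranges (49% and 50%) would correspond to classifiers that could not separate real from synthetic rows, while the upper ends (86% and 88%) would correspond to classifiers that detected the synthetic data fairly often.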

Document Type: Conference paper

Language: English

Contributor(s): Universidade do Minho
