A2 Peer-reviewed review article in a scientific journal

Methods for Generating and Evaluating Synthetic Longitudinal Patient Data: A Systematic Review




Authors: Perkonoja, Katariina; Auranen, Kari; Virta, Joni

Publisher: Springer Science and Business Media LLC

Publication year: 2025

Journal: Journal of Healthcare Informatics Research

ISSN: 2509-4971

eISSN: 2509-498X

DOI: https://doi.org/10.1007/s41666-025-00223-7

Open access status of the publication at the time of recording: Openly available

Open access status of the publication channel: Partially open publication channel

Web address: https://doi.org/10.1007/s41666-025-00223-7

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/505614547


Abstract
The rapid growth in data availability has facilitated research and development, yet not all industries have benefited equally due to legal and privacy constraints. The healthcare sector faces significant challenges in utilizing patient data because of concerns about data security and confidentiality. To address this, various privacy-preserving methods, including synthetic data generation, have been proposed. Synthetic data replicate existing data as closely as possible, acting as a proxy for sensitive information. While patient data are often longitudinal, this aspect remains underrepresented in existing reviews of synthetic data generation in healthcare. This paper maps and describes methods for generating and evaluating synthetic longitudinal patient data in real-life settings through a systematic literature review, conducted following the PRISMA guidelines and incorporating data from five databases up to May 2024. Thirty-nine methods were identified, with four addressing all key challenges in longitudinal patient data generation: preserving temporal structure, heterogeneous variable types, missing values, and unbalanced data. Most studies assessed resemblance to real data, the majority evaluated utility, and just over half examined privacy. However, only a minority considered all three aspects together. While four methods addressed the key challenges in generating synthetic longitudinal patient data, none incorporated privacy-preserving mechanisms. Additionally, their effectiveness with small sample sizes remains unclear, raising concerns about their real-world applicability. The lack of standardized evaluation criteria further complicates comparison. Future research should focus on developing privacy-preserving methods, robust evaluation frameworks, and ensuring publicly accessible code. Clearer directives from data protection authorities are needed, as synthetic patient data availability lags behind method development.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
Open Access funding provided by University of Turku (including Turku University Central Hospital). This work was supported by the Novo Nordisk Foundation (grant number NNF19SA0059129), the Finnish Cultural Foundation (grant number 00220801) and the Research Council of Finland (grant numbers 335077, 347501 and 353769).


Last updated on 2025-11-27 at 13:17