A1 Refereed original research article in a scientific journal

Finnish perspective on using synthetic health data to protect privacy: the PRIVASA project




Authors: Pitkämäki, Tinja; Pahikkala, Tapio; Montoya Perez, Ileana; Movahedi, Parisa; Nieminen, Valtteri; Southerington, Tom; Vaiste, Juho; Jafaritadi, Mojtaba; Khan, Muhammad Irfan; Kontio, Elina; Ranttila, Pertti; Pajula, Juha; Pölönen, Harri; Degerli, Aysen; Plomp, Johan; Airola, Antti

Publisher: American Institute of Mathematical Sciences (AIMS)

Publication year: 2024

Journal: Applied Computing and Intelligence

Volume: 4

Issue: 2

First page: 138

Last page: 163

eISSN: 2771-392X

DOI: https://doi.org/10.3934/aci.2024009

Web address: https://www.aimspress.com/article/doi/10.3934/aci.2024009

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/459120766


Abstract

The use of synthetic data could facilitate data-driven innovation across industries and applications. Synthetic data can be generated using a range of methods, from statistical modeling to machine learning and generative AI, resulting in datasets of varying formats and utility. In the health sector, the use of synthetic data is often motivated by privacy concerns. As generative AI becomes an everyday tool, there is a need for practice-oriented insights into the prospects and limitations of synthetic data, especially in privacy-sensitive domains. We present an interdisciplinary outlook on the topic, focusing on, but not limited to, the Finnish regulatory context. First, we emphasize the need for working definitions to avoid misplaced assumptions. Second, we consider use cases for synthetic data, viewing it as a helpful tool for experimentation, decision-making, and building data literacy. Yet the complementary uses of synthetic datasets should not diminish continued efforts to collect and share high-quality real-world data. Third, we discuss how privacy-preserving synthetic datasets fit into existing data protection frameworks. Neither the process of synthetic data generation nor synthetic datasets themselves are automatically exempt from the regulatory obligations concerning personal data. Finally, we explore future research directions for generating synthetic data and conclude by discussing potential future developments at the societal level.



This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
This work has been carried out as part of the PRIVASA joint action (2021–2024) funded by Business Finland. The associated grant numbers are 37428/31/2020 for the University of Turku, 33961/31/2020 for the Turku University of Applied Sciences and 43450/31/2020 for the VTT Technical Research Centre of Finland. The authors wish to thank all consortium partners for their valuable contributions that have shaped the project’s public research outcomes.


Last updated on 2025-01-27 at 19:08