Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis
dc.contributor.author | Isasa, Imanol | |
dc.contributor.author | Hernández, Mikel | |
dc.contributor.author | Epelde Unanue, Gorka | |
dc.contributor.author | Londoño, Francisco | |
dc.contributor.author | Beristain Iraola, Andoni | |
dc.contributor.author | Larrea, Xabat | |
dc.contributor.author | Alberdi Aramendi, Ane | |
dc.contributor.author | Bamidis, Panagiotis D. | |
dc.contributor.author | Konstantinidis, Evdokimos I. | |
dc.date.accessioned | 2024-04-30T17:01:18Z | |
dc.date.available | 2024-04-30T17:01:18Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | BMC Medical Informatics and Decision Making 24 : (2024) // Article ID 27 | es_ES |
dc.identifier.issn | 1472-6947 | |
dc.identifier.uri | http://hdl.handle.net/10810/66964 | |
dc.description.abstract | Background: Synthetic data is an emerging approach for addressing legal and regulatory concerns in biomedical research that deals with personal and clinical data, whether as a single tool or in combination with other privacy-enhancing technologies. Generating uncompromised synthetic data could significantly benefit external researchers performing secondary analyses by providing unlimited access to information while fulfilling pertinent regulations. However, the original data to be synthesized (e.g., data acquired in Living Labs) may consist of subjects’ metadata (static) and a longitudinal component (a set of time-dependent measurements), making it challenging to produce coherent synthetic counterparts. Methods: Three synthetic time series generation approaches were defined and compared in this work: generating only the metadata and coupling it with the real time series from the original data (A1), generating metadata and time series separately and joining them afterwards (A2), and jointly generating both metadata and time series (A3). The comparative assessment of the three approaches was carried out using two different synthetic data generation models: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and the DöppelGANger (DGAN). The experiments were performed with three different healthcare-related longitudinal datasets: (1) Treadmill Maximal Effort Test (TMET) measurements from the University of Malaga, (2) a hypotension subset derived from the MIMIC-III v1.4 database, and (3) a lifelogging dataset named PMData. Results: Three pivotal dimensions were assessed on the generated synthetic data: (1) resemblance to the original data, (2) utility, and (3) privacy level. The optimal approach varies with the assessed dimension and metric. Conclusion: The initial characteristics of the datasets to be synthesized play a crucial role in determining the best approach. Coupling synthetic metadata with real time series (A1), as well as jointly generating synthetic time series and metadata (A3), are both competitive methods, while separately generating time series and metadata (A2) appears to perform more poorly overall. | es_ES |
dc.description.sponsorship | This research was partly funded by the VITALISE (VIrtual healTh And weLlbeing Living Lab InfraStructurE) project, funded by the Horizon 2020 Framework Program of the European Union for Research and Innovation (grant agreement 101007990). Ane Alberdi is part of the Intelligent Systems for Industrial Systems research group of Mondragon Unibertsitatea (IT1676-22), supported by the Department of Education, Universities and Research of the Basque Country. | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | BMC | es_ES |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/101007990 | es_ES |
dc.rights | info:eu-repo/semantics/openAccess | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.subject | time series | |
dc.subject | synthetic data | |
dc.subject | privacy-preserving data sharing | |
dc.subject | health data | |
dc.title | Comparative assessment of synthetic time series generation approaches in healthcare: leveraging patient metadata for accurate data synthesis | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.holder | © The Author(s) 2024. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. | es_ES |
dc.rights.holder | Attribution 3.0 Spain | * |
dc.relation.publisherversion | https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-024-02427-0 | es_ES |
dc.identifier.doi | 10.1186/s12911-024-02427-0 | |
dc.contributor.funder | European Commission | |
dc.departamentoes | Ciencia de la computación e inteligencia artificial | es_ES |
dc.departamentoeu | Konputazio zientziak eta adimen artifiziala | es_ES |