Investigating the use of synthetic data in intensive care
Lack of access to sufficient data and restrictions on sharing data between healthcare organisations are two of the main challenges when developing AI models in healthcare. AI Sweden, together with project partners Region Västerbotten, Örebro University and Syndata AB, has investigated the use of synthetic data as a potential way forward.
A common challenge when working with AI models in healthcare is the need for large amounts of data to train the models. If the healthcare regions could share their data with each other, it would greatly benefit AI development. A potential solution is to work with synthetic data, which contains no identifiable patient data and can therefore be shared without breaching privacy.
We want to support the national use of synthetic data to develop beneficial AI models in healthcare.
We want to see this approach replicated at a national level. The following report can be shared among the regions to promote the practice.
This phase 1 report focuses on the usefulness of AI models in predictive healthcare. We have investigated different generative methods for creating synthetic data and validated the synthesized data with respect to both its quality and its utility for training AI models.
The report has six parts:
1: Syntetisk data inom IVA rapport (in Swedish) Describes the conditions for the project and the insights gained from it.
2: Rapport Syndata AB (in Swedish) Lessons learned from our partner Syndata AB, which carried out the synthesization of data from Region Västerbotten.
Attachment 1a: Recovered dataset evaluation report (in English) Technical assessment of the quality of the recovered synthetic dataset.
Attachment 1b: Discharged dataset evaluation report (in English) Technical assessment of the quality of the discharged synthetic dataset.
Attachment 2: Results from AI Sweden (in English) Technical results regarding the quality and privacy of the synthetic data generated by the models.
Attachment 3: Results from Region Västerbotten (in English) Technical validation of how well the different synthesized datasets perform when training AI models in the same way as for the original data.
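The utility validation described above, training AI models on the synthesized datasets in the same way as on the original data, is commonly done with a "train on synthetic, test on real" (TSTR) comparison. The sketch below is a minimal illustration of that idea, not the project's actual pipeline: the "real" and "synthetic" datasets are simulated stand-ins, and the model and metric choices are assumptions.

```python
# Train-on-Synthetic, Test-on-Real (TSTR): train one model on real data
# and one on synthetic data, then evaluate both on a held-out real test
# set. If the synthetic data preserves the predictive signal, the two
# scores should be close.
# NOTE: the datasets below are simulated for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Stand-in for the real patient dataset (e.g. tabular ICU features).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stand-in for a synthesized dataset: crudely mimicked here by adding
# noise to the real training data (a real project would instead sample
# from a trained generative model).
X_syn = X_train + rng.normal(scale=0.3, size=X_train.shape)
y_syn = y_train

# Train on real data, test on real data (baseline).
model_real = RandomForestClassifier(random_state=0).fit(X_train, y_train)
auc_real = roc_auc_score(y_test, model_real.predict_proba(X_test)[:, 1])

# Train on synthetic data, test on the same real test set (TSTR).
model_syn = RandomForestClassifier(random_state=0).fit(X_syn, y_syn)
auc_syn = roc_auc_score(y_test, model_syn.predict_proba(X_test)[:, 1])

# A small gap indicates the synthetic data retains training utility.
print(f"real AUC: {auc_real:.3f}, TSTR AUC: {auc_syn:.3f}")
```

The gap between the two AUC scores is the quantity of interest: a synthetic dataset is useful for model development to the extent that this gap is small.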