How can synthetic data enhance AI in healthcare?

Monday, June 7, 2021

Lack of data, and restrictions on how it can be shared between regions, are two of the main challenges when developing AI models in healthcare. AI Sweden, together with project partners Region Västerbotten and Örebro University, is therefore investigating synthetic data as a potential way forward. The project is part of the innovation milieu Information-driven healthcare.

Challenges with healthcare data

A constant challenge when developing AI models in healthcare is the lack of data. In general, regions and other healthcare organisations need more data than they have available, and each can use only its own data sources. Being able to share data between regions to train AI models would therefore be a major step forward for AI in healthcare. However, sharing sensitive patient data is tightly regulated by law. Because synthetic data does not contain any data from real patients, it may make it possible to share training data between regions.

Project goals

The project is based on the AI prediction models for intensive care applications that data scientists at Region Västerbotten are developing in close cooperation with their healthcare professionals.
Phase 1 of the project investigates the possibilities of synthetic data derived from the real data used in these AI applications. We will also survey available tools for synthesizing data. Key questions include:

  • When is synthetic data needed, and what are its fields of application?
  • How can AI training with synthetic data be evaluated?
  • How can synthetic data be used for balancing data sets to minimize bias?
  • When is synthetic data not advisable to use?

In Phase 2, the project will be expanded to investigate the legal, technical and ethical aspects of sharing synthetic datasets across regions.

Project partners, Phase 1

Region Västerbotten, the Care Support Special unit
Örebro University, the AASS Machine Perception and Interaction Lab

The report from Phase 1 of this project is available online.

What is synthetic data?
Synthetic data is artificial data generated by algorithms that retain the statistical properties and correlations of a real dataset while having no connection to any real data subject. Synthetic data therefore does not contain identifiable data about real patients.
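To make the idea of "retaining the statistical properties and correlations" more concrete, here is a minimal sketch in Python with NumPy. It assumes purely numeric tabular data and uses the simplest possible generator: a multivariate normal distribution fitted to the real data's column means and covariance matrix. This technique was chosen for illustration only and is not the method used in the project. It preserves only means, variances and linear correlations, whereas practical synthesis tools typically use richer models to capture non-linear relationships and mixed data types. The function names fit_gaussian and synthesize, and the heart-rate and respiratory-rate features, are hypothetical, and the "real" data here is randomly generated rather than patient data.

```python
import numpy as np


def fit_gaussian(real: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate column means and the covariance matrix of a real dataset.

    `real` has one row per record and one column per numeric feature.
    """
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)  # columns are variables
    return mean, cov


def synthesize(mean: np.ndarray, cov: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Draw n synthetic rows from a multivariate normal with the fitted statistics.

    No row is copied from the real data; only the summary statistics are reused.
    """
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)


if __name__ == "__main__":
    # Hypothetical stand-in for real data: randomly generated, NOT patient data.
    rng = np.random.default_rng(42)
    heart_rate = rng.normal(80, 10, size=1000)
    # A second feature correlated with the first, so there is a correlation to preserve.
    resp_rate = 0.2 * heart_rate + rng.normal(0, 2, size=1000)
    real = np.column_stack([heart_rate, resp_rate])

    mean, cov = fit_gaussian(real)
    synthetic = synthesize(mean, cov, n=5000)

    # Compare the correlation between the two features in real vs synthetic rows.
    print("real corr:     ", round(float(np.corrcoef(real, rowvar=False)[0, 1]), 2))
    print("synthetic corr:", round(float(np.corrcoef(synthetic, rowvar=False)[0, 1]), 2))
```

Running the script should print two correlation coefficients of similar size, showing that the relationship between the features carries over to rows that were never part of the original data.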

For ideas or questions, contact Henrik Ahlén, AI Change Agent Healthcare at henrik.ahlen@ai.se