LeakPro: Leakage profiling and risk oversight for machine learning models

Many recent works have highlighted the possibility of extracting data from trained machine-learning models. However, these attacks are typically demonstrated under idealized conditions, and it is unclear whether the risk persists under more realistic assumptions. LeakPro will facilitate such testing.

Challenges

Machine learning models are algorithms that internally encode the capability to identify patterns in a data source. In many domains, e.g., in life science or finance, the data may be sensitive. It is, therefore, paramount to assess the difficulty in extracting sensitive information under realistic adversary settings.

In this project, we will create LeakPro, a platform to assess i) the information leakage of trained machine learning models, ii) the risk of leaking information during training with federated learning, and iii) the risk of leaking information in synthetic data.

Project purpose

The primary objective is to create LeakPro, a platform to evaluate the risk of information leakage in machine learning applications pertaining to the deployment of machine learning models, collaborative training, and synthetic data. LeakPro will adhere to the following principles:

  1. Openness: LeakPro will be developed as an open-source tool for the Swedish ecosystem. Because inference attacks largely exist as isolated islands within the research literature, we aim to collect state-of-the-art attacks for different modalities and make them accessible to non-experts. 
  2. Scalability: With a plethora of inference attacks available and a continuously evolving field, it is imperative to design LeakPro in a modular fashion that allows scaling and the incorporation of novel attacks. Furthermore, LeakPro will allow users to assess information leakage in realistic settings for their use cases, enabling the identification and validation of realistic attack vectors.
  3. Relevance: To ensure LeakPro's sustained relevance, we not only adopt an open-source approach but also work towards its integration within the RISE Cyber Range to prepare for a long-term handover. Furthermore, to verify LeakPro's practical applicability, we aim to integrate LeakPro internally at AstraZeneca, Sahlgrenska, and Region Halland.

Expected outcomes 

At project finalization, LeakPro will offer a holistic platform, runnable locally, to assess information leakage in the following contexts:

  1. Trained machine learning models, under membership inference and data-reconstruction attacks with both white-box and API access. Multiple data modalities are considered, e.g., tabular data, images, and text.
  2. The training stage of federated learning, where the adversary is either a client or the server. Attacks under consideration are membership inference and data reconstruction.
  3. Synthetic data, relative to its original data source. Attacks of interest are membership inference, singling out, linkability, and in-painting attacks.
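To give a concrete sense of the simplest attack family above, the sketch below illustrates a loss-threshold membership inference attack: a model typically achieves lower loss on its training members than on held-out non-members, so an adversary who can observe per-sample losses can guess membership by thresholding them. This is a minimal, self-contained illustration with synthetic loss values, not LeakPro code; the function name, threshold, and loss distributions are all invented for the example.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Classify a sample as a training member if its loss is below the threshold.

    Returns the attack accuracy over the combined member/non-member set.
    """
    member_hits = (member_losses < threshold).sum()       # members correctly flagged
    nonmember_hits = (nonmember_losses >= threshold).sum()  # non-members correctly rejected
    total = len(member_losses) + len(nonmember_losses)
    return (member_hits + nonmember_hits) / total

# Toy data: training members tend to have lower loss than held-out samples,
# so a threshold placed between the two distributions separates them well.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.2, scale=0.1, size=1000)
nonmember_losses = rng.normal(loc=0.8, scale=0.2, size=1000)

accuracy = loss_threshold_mia(member_losses, nonmember_losses, threshold=0.5)
print(f"attack accuracy: {accuracy:.2f}")
```

In realistic deployments the loss gap between members and non-members is usually far smaller than in this toy setup, which is precisely the kind of gap between idealized and realistic conditions that LeakPro is meant to quantify.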
Overview of LeakPro and its interaction between technical and legal experts.

Facts

Funding: Vinnova: Advanced and Innovative Digitalization

Total project budget: 18 373 296 SEK

Project period: 1 December 2023 to 1 December 2025

Participants: AI Sweden, RISE, Scaleout, Syndata, Sahlgrenska University Hospital, Region Halland, and AstraZeneca

Reference Group (legal experts): AI Sweden, RISE, Region Halland, IMY, and Esam

For more information, contact

Fazeleh Hoseini
Machine Learning Engineer
+46 (0)73-305 69 22

Related news

LeakPro enables collaboration around sensitive data

2025-01-29
In the latest episode of the AI Sweden Podcast, Johan Östman, researcher and project manager at AI Sweden, talks about LeakPro. The project’s goal is to better understand—and therefore be able to...

When will an AI model reveal your sensitive data?

2024-06-05
AI models can leak training data–this is known. However, such leakage has mainly been observed in lab-like conditions that often favor the attacker. Today, there are few answers on what the risks look...