
LeakPro 1: Leakage profiling and risk oversight for machine learning models

Many recent works have highlighted the possibility of extracting data from trained machine-learning models. However, these demonstrations are typically performed under idealized conditions, and it is unclear whether the risk persists under more realistic assumptions. LeakPro aimed to facilitate such testing.

Listen to AI Sweden podcast about LeakPro

The AI Sweden podcast sat down with Johan Östman to understand what the LeakPro framework is about and which machine learning risks the project aims to address (in Swedish).

Challenges

Machine learning models are algorithms that internally encode the capability to identify patterns in a data source. In many domains, e.g., in life science or finance, the data may be sensitive. It is, therefore, paramount to assess how difficult it is to extract sensitive information under realistic adversary settings.

The project aimed to create LeakPro, a platform to assess the risk of information leakage i) from trained machine learning models, ii) during training with federated learning, and iii) in synthetic data.

Project purpose

The primary objective was to create LeakPro, a platform to evaluate the risk of information leakage in machine learning applications, covering the deployment of machine learning models, collaborative training, and synthetic data. LeakPro adheres to the following principles:

  1. Openness: LeakPro is developed as an open-source tool for the Swedish ecosystem. As inference attacks mostly exist as isolated implementations scattered across the research literature, we collected state-of-the-art attacks for different modalities and made them accessible to non-experts.
  2. Scalability: As there is a plethora of different inference attacks available and the field is continuously evolving, it was imperative to design LeakPro in a modular fashion to allow scalability and the incorporation of novel attacks. Furthermore, LeakPro allows users to assess information leakage in realistic settings in their own use cases. Hence, LeakPro allows for the identification and validation of realistic attack vectors.
  3. Relevance: To ensure LeakPro's sustained relevance, we not only adopted an open-source approach but also worked towards its integration within the RISE Cyber Range to prepare for a long-term handover. Furthermore, to verify LeakPro’s practical application, we aimed to integrate LeakPro internally at AstraZeneca, Sahlgrenska, and Region Halland.

Outcomes 

At project finalization, LeakPro offered a holistic platform that can be run locally to assess information leakage in the following contexts:

  1. Trained machine learning models, under membership inference attacks and data-reconstruction attacks with either white-box access or API access. Multiple data modalities are considered, e.g., tabular data, images, and text.
  2. The training stage of federated learning, where the adversary is either a client or the server. The attacks under consideration are membership inference and data reconstruction.
  3. Synthetic data, in terms of the information leaked about its original data source. The attacks of interest are membership inference, singling out, linkability, and in-painting.

LeakPro is released as open source. The work continues in 2025–2027 with a widened scope. Read more about LeakPro 2.
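To make the membership inference attacks mentioned above more concrete, here is a minimal sketch of one of the simplest variants, a loss-threshold attack: the adversary guesses that a sample was part of the training data if the model's loss on it is unusually low. This is not LeakPro's API or implementation. The function names, the threshold, and the loss values are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' for every sample whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_rates(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Tuple[float, float]:
    """Return (true-positive rate, false-positive rate) of the attack."""
    member_guesses = loss_threshold_attack(member_losses, threshold)
    nonmember_guesses = loss_threshold_attack(nonmember_losses, threshold)
    tpr = sum(member_guesses) / len(member_guesses)
    fpr = sum(nonmember_guesses) / len(nonmember_guesses)
    return tpr, fpr


# Made-up per-sample losses, for illustration only (not measured results).
# Training samples ("members") tend to have lower loss than unseen samples.
member_losses = [0.02, 0.05, 0.10, 0.40]
nonmember_losses = [0.30, 0.55, 0.80, 1.20]

tpr, fpr = attack_rates(member_losses, nonmember_losses, threshold=0.25)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")  # prints "TPR=0.75, FPR=0.00"
```

In a realistic assessment, the losses would come from querying the model under test, and the attack would be evaluated across many thresholds, typically with a focus on the true-positive rate at low false-positive rates.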

LeakPro repository

Released under the Apache 2.0 license: LeakPro repository

Facts

Funding: Vinnova: Advanced and Innovative Digitalization

Total project budget: 18 373 296 SEK

Project period: 1 December 2023 – 1 December 2025

Participants: AI Sweden, RISE, Scaleout, Syndata, Sahlgrenska University Hospital, Region Halland, and AstraZeneca

Reference Group (legal experts): AI Sweden, RISE, Region Halland, IMY, and Esam


For more information, contact

Fazeleh Hoseini
Research scientist
+46 (0)73-305 69 22

Related

LeakPro workshop

LeakPro 2 expands successful AI privacy assessment tool

2026-03-18
LeakPro 2 aims to create a tool that will quantify not only the risk that a model could leak sensitive data, but also the impact such a leakage could have. The work builds on the successful results...
Fazeleh Hoseini and Johan Östman

LeakPro 2: Operational privacy risk management for AI systems

LeakPro 2 is a tool/framework for assessing and mitigating privacy risks in machine learning models and AI systems. It combines privacy attacks, PET evaluation, and structured workflows to support...

LeakPro enables collaboration around sensitive data

2025-01-29
In the latest episode of the AI Sweden Podcast, Johan Östman, researcher and project manager at AI Sweden, talks about LeakPro. The project’s goal is to better understand—and therefore be able to...
Johan Östman and Fazeleh Hoseini, Research engineers at AI Sweden

When will an AI model reveal your sensitive data?

2024-06-05
AI models can leak training data; this is known. However, such leakage has mainly been observed in lab-like conditions that often favor the attacker. Today, there are few answers on what the risks look...