LeakPro 1: Leakage profiling and risk oversight for machine learning models
Many recent works have highlighted the possibility of extracting data from trained machine-learning models. However, these demonstrations are typically performed under idealized conditions, and it is unclear whether the risk persists under more realistic assumptions. LeakPro aimed to facilitate such testing.
Listen to the AI Sweden podcast about LeakPro
The AI Sweden podcast sat down with Johan Östman to understand what the LeakPro framework is about and which machine learning risks the project aims to address (in Swedish).
Challenges
Machine learning models are algorithms that internally encode the capability to identify patterns in a data source. In many domains, e.g., life science or finance, the data may be sensitive. It is therefore paramount to assess how difficult it is to extract sensitive information under realistic adversary settings.
The project aimed to create LeakPro, a platform to assess the risk of information leakage i) from trained machine learning models, ii) during training with federated learning, and iii) in synthetic data.
Project purpose
The primary objective was to create LeakPro, a platform to evaluate the risk of information leakage in machine learning applications, covering the deployment of machine learning models, collaborative training, and synthetic data. LeakPro adheres to the following principles:
- Openness: LeakPro is developed as an open-source tool for the Swedish ecosystem. As inference attacks primarily reside as isolated islands within the research literature, we collected state-of-the-art attacks for different modalities and made them accessible to non-experts.
- Scalability: Because many different inference attacks exist and the field is continuously evolving, it was imperative to design LeakPro in a modular fashion so that it scales and can incorporate novel attacks. Furthermore, LeakPro allows users to assess information leakage in realistic settings in their own use cases, and thereby to identify and validate realistic attack vectors.
- Relevance: To ensure LeakPro's sustained relevance, we not only adopted an open-source approach but also worked towards its integration within the RISE Cyber Range to prepare for a long-term handover. Furthermore, to verify LeakPro’s practical application, we wanted to integrate LeakPro internally at AstraZeneca, Sahlgrenska, and Region Halland.
Outcomes
At project finalization, LeakPro offered a holistic platform that can be run locally to assess information leakage in the following contexts:
- Trained machine learning models, under membership inference attacks and data reconstruction attacks with white-box access and API access. Multiple data modalities are considered, e.g., tabular data, images, and text.
- The training stage of federated learning, where the adversary is either a client or the server. The attacks under consideration are membership inference and data reconstruction.
- Synthetic data, where information may leak from the original data source. The attacks of interest are membership inference, singling out, linkability, and in-painting.
- LeakPro is released as open source.
- LeakPro continues 2025–2027 with a widened scope. Read more about LeakPro 2.
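To make the membership inference attacks mentioned in the list above more concrete, here is a minimal sketch of the simplest variant, a loss-threshold attack: the adversary guesses that an example was part of the training data if the model's loss on it is low. This is an illustration only and does not use or reflect LeakPro's actual API. The function names, the threshold, and the simulated loss values are all assumptions made for this example.

```python
import random


def loss_threshold_attack(losses, threshold):
    """Guess 'member' (True) for every example whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_rates(predictions, is_member):
    """Return (true positive rate, false positive rate) of the membership guesses."""
    tp = sum(p and m for p, m in zip(predictions, is_member))
    fp = sum(p and not m for p, m in zip(predictions, is_member))
    n_members = sum(is_member)
    n_non_members = len(is_member) - n_members
    return tp / n_members, fp / n_non_members


# Assumption: simulated per-example losses stand in for a real model.
# Training members get a lower average loss (0.2) than non-members (1.0),
# which mimics a model that has partly memorized its training data.
rng = random.Random(0)
member_losses = [rng.expovariate(1 / 0.2) for _ in range(1000)]
non_member_losses = [rng.expovariate(1 / 1.0) for _ in range(1000)]

losses = member_losses + non_member_losses
is_member = [True] * 1000 + [False] * 1000

predictions = loss_threshold_attack(losses, threshold=0.5)
tpr, fpr = attack_rates(predictions, is_member)
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")
```

The larger the gap between the true positive rate and the false positive rate, the more the model's behavior reveals about who was in its training data. Practical attacks are considerably more refined, but they are evaluated along the same lines.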
LeakPro repository
Released under the Apache 2.0 license: LeakPro repository
Facts
Funding: Vinnova: Advanced and Innovative Digitalization
Total project budget: 18 373 296 SEK
Project period: 1 December 2023 to 1 December 2025
Participants: AI Sweden, RISE, Scaleout, Syndata, Sahlgrenska University Hospital, Region Halland, and AstraZeneca
Reference Group (legal experts): AI Sweden, RISE, Region Halland, IMY, and Esam