LeakPro: Leakage profiling and risk oversight for machine learning models
Many recent works have highlighted the possibility of extracting data from trained machine learning models. However, these attacks are typically demonstrated under idealized conditions, and it is unclear whether the risk persists under more realistic assumptions. LeakPro will facilitate such testing.
Challenges
Machine learning models are algorithms that learn to identify patterns in a data source, and in doing so they internally encode information about that data. In many domains, e.g., life science or finance, the data may be sensitive. It is therefore essential to assess how difficult it is to extract sensitive information under realistic adversarial settings.
In this project, we will create LeakPro, a platform to assess i) information leakage from trained machine learning models, ii) the risk of information leakage during training with federated learning, and iii) the risk of information leakage through synthetic data.
Project purpose
The primary objective is to create LeakPro, a platform to evaluate the risk of information leakage in machine learning applications involving model deployment, collaborative training, and synthetic data. LeakPro will adhere to the following principles:
- Openness: LeakPro will be developed as an open-source tool for the Swedish ecosystem. As inference attacks are currently scattered across the research literature as isolated contributions, we aim to collect state-of-the-art attacks for different data modalities and make them accessible to non-experts.
- Scalability: As a plethora of inference attacks already exists and the field is continuously evolving, LeakPro must be designed in a modular fashion so that it can scale and incorporate novel attacks. Furthermore, LeakPro will allow users to assess information leakage under realistic settings in their own use cases, enabling them to identify and validate realistic attack vectors.
- Relevance: To ensure LeakPro's sustained relevance, we not only adopt an open-source approach but also work towards its integration within the RISE Cyber Range to prepare for a long-term handover. Furthermore, to verify LeakPro's practical applicability, we aim to integrate LeakPro internally at AstraZeneca, Sahlgrenska, and Region Halland.
Expected outcomes
At project completion, LeakPro will offer a holistic platform that can be run locally to assess information leakage in the following contexts:
- From trained machine learning models, under membership inference and data reconstruction attacks with either white-box or API access. Multiple data modalities are considered, e.g., tabular data, images, and text.
- During the training stage of federated learning, where the adversary is either a client or the server. Attacks under consideration are membership inference and data reconstruction.
- Between synthetic data and its original data source. Attacks of interest are membership inference, singling out, linkability, and in-painting attacks.
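To illustrate what a membership inference test measures, the following is a minimal sketch of the classic loss-threshold attack: the adversary guesses that a record was part of the training set if the model's loss on it is unusually low. This is not LeakPro's implementation or API; the function names, the threshold, and the toy loss values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> list:
    """Predict membership for each record (hypothetical helper, not LeakPro's API).

    Models tend to fit their training records more closely than unseen records,
    so a loss below the threshold is treated as evidence of membership.
    """
    return [loss < threshold for loss in losses]


def membership_advantage(predictions: Sequence[bool], is_member: Sequence[bool]) -> float:
    """Attack advantage = true-positive rate - false-positive rate.

    A value near 0 means the attack does no better than chance on this data.
    """
    n_members = sum(is_member)
    n_non_members = len(is_member) - n_members
    if n_members == 0 or n_non_members == 0:
        raise ValueError("need both members and non-members to evaluate the attack")
    true_pos = sum(p and m for p, m in zip(predictions, is_member))
    false_pos = sum(p and not m for p, m in zip(predictions, is_member))
    return true_pos / n_members - false_pos / n_non_members


if __name__ == "__main__":
    # Toy, made-up per-record losses: the first three records were "in training".
    losses = [0.05, 0.10, 0.20, 0.90, 1.20, 0.15]
    is_member = [True, True, True, False, False, False]
    preds = loss_threshold_attack(losses, threshold=0.3)
    print(preds)
    print(membership_advantage(preds, is_member))
```

In a realistic assessment the threshold would itself have to be chosen by the adversary, for instance from shadow models or auxiliary data, rather than tuned with knowledge of the true membership labels; that gap between idealized and realistic assumptions is exactly what the project sets out to examine.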
Facts
Funding: Vinnova, Advanced and Innovative Digitalization
Total project budget: 18 373 296 SEK
Project period: 1 December 2023 to 1 December 2025
Participants: AI Sweden, RISE, Scaleout, Syndata, Sahlgrenska University Hospital, Region Halland, and AstraZeneca
Reference Group (legal experts): AI Sweden, RISE, Region Halland, IMY, and Esam