Adipocyte Cell Imaging Challenge
AstraZeneca and AI Sweden are challenging the AI community to solve the problem of labeling cell images without requiring toxic preprocessing of cell cultures.
This competition will help AstraZeneca to accelerate the drug development process. While we hope that you will enjoy the challenge, you should know that the solutions created will be used by AstraZeneca scientists to treat failing hearts, fibrotic lungs, diabetes, liver diseases, neurodegeneration, and cancer.
The task is to utilize machine learning to combine the advantages of bright field and fluorescence imaging and at the same time avoid the toxic effects of cell labeling by predicting the content of the fluorescence images from the corresponding bright field images.
Please note that we continuously update the Q&A section at the end of this page.
Why you should participate
1. Solve a real world problem together with researchers from AstraZeneca
2. Access to cell imaging data and state-of-the-art computational infrastructure (NVIDIA A100)*
3. Total prize sum of $5,000 sponsored by AstraZeneca
*teams will have access to one GPU each during the two-week challenge period
Participation and application
The hackathon is open to national and international participants from either industry or academia. The maximum number of team members is 5.
A written application is needed (maximum 1500 words) containing a proposed solution method, including identified risks and mitigation strategies. Make sure that you have read through the problem formulation document. A résumé describing the relevant background and experience of the team is also required.
We acknowledge that to construct a solution strategy, it is often necessary to have access to the data. Hence, a smaller, representative data set can be downloaded.
Based on the application, eight teams will be chosen to participate in the hackathon. Selected teams will be notified on October 23rd.
The application is closed
The images for the challenge will be provided by AstraZeneca in the form of TIFF files. There are three sets of images corresponding to three different magnification settings of the microscope (20x, 40x, and 60x). For each field of view, there will be seven bright field images taken at different focal planes, and three different fluorescence images corresponding to labeling of nuclei, lipid droplets, and cell matrix respectively. There will be on the order of 50-100 fields of view for each magnification setting. Each image is approximately 2156 by 2556 pixels in size, using 16 bits to represent each pixel value.
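Since the pixel values are stored as 16-bit integers, a common first preprocessing step is to scale them into a floating-point range before feeding them to a model. The sketch below illustrates this with a tiny synthetic "image" in place of a real TIFF frame; actual file loading (e.g. with a TIFF reader library) is not shown and the array shapes are stand-ins.

```python
# Sketch: normalizing raw 16-bit pixel values to floats in [0, 1].
# A tiny synthetic nested list stands in for a real ~2156x2556 TIFF frame;
# reading the actual files would require a TIFF library (not shown here).

MAX_16BIT = 65535  # largest value a 16-bit unsigned pixel can take

def normalize(image):
    """Scale raw 16-bit pixel values into the [0, 1] range."""
    return [[px / MAX_16BIT for px in row] for row in image]

synthetic = [[0, 32768], [65535, 1024]]  # stand-in for one image
norm = normalize(synthetic)
print(norm[1][0])  # -> 1.0
```

In practice one would vectorize this with an array library, but the scaling itself is the same per-pixel division.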
To the left is a superimposed fluorescence image of cell nuclei (blue), lipid droplets (green), and cytoplasm (red). Each color is a single channel image. To the right is the corresponding bright field image.
The teams will be evaluated in two steps:
1. Quality metrics
A set of pre-defined quality metrics will be used to measure the objective quality of the generated images. These metrics as well as the code to calculate them will be released at the start of the challenge.
2. Jury presentation
A jury with members from both academia and industry will assess the quality of the different contributions based on the presentation given by each group.
The results and resulting code should be openly published and free to use for non-commercial purposes.
Q: Do the teams have their own right to publish the results?
A: Yes, we encourage the teams to publish their results. AI Sweden and AstraZeneca will write a joint paper describing the data set and results from the competition. This paper should be cited in all publications.
Q: What results should be submitted on November 15?
A: You should submit your code, quality metrics, and a short written report describing your method and results. You will get the metrics, the code to calculate them, and a report template at the kick-off event on November 2.
Q: Will the different magnifications be labeled in the data set?
Q: Is it allowed to use other data, e.g. for pre-training of the model?
A: Pre-training is OK as long as you are transparent about the method and the results are reproducible.
Q: How large is the training data set?
A: The number of images at each magnification is:
20x: 490 files
40x: 650 files
60x: 970 files
Note that since there are 10 images at each field of view (7 bright field and 3 fluorescence), the number of input-output pairs is 49, 65, and 97 for the respective magnifications.
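The grouping above (7 bright field inputs, 3 fluorescence targets per field of view) can be sketched in Python. The filenames and naming convention below are purely hypothetical; the real naming scheme comes with the data set.

```python
# Sketch: grouping the 10 images per field of view into one training pair.
# Filenames here are invented for illustration only.

def build_pairs(fields):
    """For each field of view, pair the 7 bright field z-slices (input)
    with the 3 fluorescence channel images (target)."""
    pairs = []
    for fov, files in fields.items():
        inputs = sorted(f for f in files if "bright" in f)   # 7 focal planes
        targets = sorted(f for f in files if "fluor" in f)   # 3 channels
        assert len(inputs) == 7 and len(targets) == 3
        pairs.append({"fov": fov, "inputs": inputs, "targets": targets})
    return pairs

fields = {
    "fov01": [f"fov01_bright_z{i}.tif" for i in range(7)]
             + [f"fov01_fluor_{c}.tif" for c in ("nuclei", "lipids", "cyto")],
}
print(len(build_pairs(fields)))  # -> 1
```

With 49, 65, and 97 fields of view per magnification, this yields exactly the pair counts stated above.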
Q: How many GB RAM does one NVIDIA A100 GPU have?
Q: What about ownership and IP?
A: Results and code should be shared as open source. Neither AstraZeneca nor the teams will have ownership.
Q: Can you reveal the evaluation metrics already now?
A: We will calculate the metrics separately for each of the three magnifications, and the metrics will be the same for all magnifications. We will release them at the start of the hackathon. However, we can say that they will be based on CellProfiler pipelines. The measurements will be made on individual grayscale images, which in this case are max intensity projections of confocal fluorescence stacks.
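A max intensity projection collapses a z-stack into a single grayscale image by taking, per pixel position, the maximum value across all slices. A minimal sketch with a toy 3-slice, 2x2 stack:

```python
# Sketch: max intensity projection of a z-stack.
# Each slice is a 2-D grid of pixel values; the projection keeps,
# for every pixel position, the maximum over all slices.

def max_projection(stack):
    """Per-pixel maximum across equally sized 2-D slices."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [
        [max(sl[r][c] for sl in stack) for c in range(cols)]
        for r in range(rows)
    ]

stack = [
    [[1, 5], [0, 2]],
    [[3, 1], [4, 0]],
    [[2, 2], [1, 7]],
]
print(max_projection(stack))  # -> [[3, 5], [4, 7]]
```

The fluorescence target images in the data set are produced this way from confocal stacks, so each channel is a single 2-D image.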
Q: Is it possible for you to reveal the name of the fluorescent dyes?
Q: How much batch variation is normally observed with the dyes? Does the normal pipeline used at the moment require normalization prior to downstream imaging?
A: There is no need to account for the batch variation in the challenge since only one dataset is given with little internal variation.
Q: Will you also provide fluorescent images for each z-stack height?
A: No, there is only one image per fluorescent channel, so for each field of view there are 7 bright field and 3 (one for each channel) fluorescence images.
Q: Is there any possibility of publishing the results afterwards together with the AstraZeneca team?
A: The possibility for this is good. Depending on what is required from AZ, we are open to further collaboration and joint publications.
Q: Is a human expert able to recognize the nucleus instances in the bright field image?
A: A human expert can identify the nuclei in the bright field image in some cases, but not all. The same is true for the content of the lipid and cytoplasm channels. That said, we are aware that the different channels represent different levels of difficulty, and this is part of the challenge. We are curious to see what approaches the participants will come up with to overcome these difficulties, and we look forward to seeing the results.