Baltic seabird dataset
The dataset consists of 2000 hours of video footage of guillemots on a ledge on Stora Karlsö, Sweden. SLU has used the dataset with AI to study the birds' behaviors and to automate their documentation. Zenseact has also used the data as a proxy dataset when developing autonomous driving.
In short
Content
The dataset consists of 2000 hours of video footage of common murres (Uria aalge), a seabird that nests on cliffs in the Baltic Sea. The cameras are installed inside an artificial breeding ledge for common murres, providing unique opportunities to film the birds at close range.
Author
Jonas Hentati Sundberg, Associate Senior Lecturer at SLU - The Swedish University of Agricultural Sciences (Sveriges Lantbruksuniversitet).
Data Type
The dataset is a real-world, challenging, class-imbalanced dataset.
Anonymization
No anonymization was performed due to the dataset not containing any personal data.
Annotations
2300 still frames annotated with bounding boxes around the objects present, each classified as “Adult”, “Chick” or “Egg”.
Size
Currently 28 TB, but more can be made available.
Access
The dataset is available for all AI Sweden partners.
Terms and conditions
To use this dataset, you must comply with the Baltic Seabird Dataset Terms and Conditions available below.
Dataset specifics
Video footage of common murres (Uria aalge), a seabird that nests on cliffs in the Baltic Sea. These birds spend most of their time offshore, but from May to July every year they come to Stora Karlsö to lay their eggs on the limestone cliffs.
The CCTV camera system was installed in 2019. The footage for 2019 comes from two cameras that filmed continuously at 60 frames per second between May 1st and July 15th. The video material is stored as .avi files with an average length of 2 hours and a total file size of approximately 2 TB. Infrared (IR) light provides clear imagery even in complete darkness.
In 2020, four cameras were used, creating approximately 5 TB of data. Due to the COVID lockdown, no tourists visited the island that year, which led to an increased number of sea eagles and, as a result, more disturbances of the common murres.
To enable machine learning, researchers from SLU and AI Sweden have manually annotated around 2300 still frames taken from the material with bounding boxes for approximately 18000 objects, each belonging to one of three categories: “Adult”, “Chick” or “Egg”. These annotations are included in the dataset download bundle.
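Since the dataset is described as class-imbalanced, a first step when working with the annotations is often to count how the boxes are distributed over the three categories. The sketch below shows one way to do that. The record layout (the "frame", "label" and "bbox" fields) and the sample values are hypothetical, made up for illustration only; the actual annotation format in the download bundle may differ.

```python
from collections import Counter

# Hypothetical annotation records, one dict per bounding box.
# Field names and values are illustrative assumptions, not the
# format of the real annotation files.
annotations = [
    {"frame": "frame_0001.jpg", "label": "Adult", "bbox": (120, 80, 60, 140)},
    {"frame": "frame_0001.jpg", "label": "Adult", "bbox": (200, 90, 58, 135)},
    {"frame": "frame_0001.jpg", "label": "Egg", "bbox": (150, 210, 20, 25)},
    {"frame": "frame_0002.jpg", "label": "Adult", "bbox": (118, 82, 61, 139)},
    {"frame": "frame_0002.jpg", "label": "Chick", "bbox": (160, 200, 30, 35)},
]


def class_distribution(records):
    """Return {label: (count, share_of_all_boxes)} for the given records."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def boxes_per_frame(records):
    """Average number of annotated boxes per distinct frame."""
    frames = {r["frame"] for r in records}
    return len(records) / len(frames)


dist = class_distribution(annotations)
for label, (n, share) in sorted(dist.items()):
    print(f"{label}: {n} boxes ({share:.0%})")
print(f"Average boxes per frame: {boxes_per_frame(annotations):.1f}")
```

With the figures quoted above, roughly 18000 objects across 2300 frames works out to about 7.8 annotated objects per frame on average.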
More specifications can be found here
Use cases to date
1. SLU – Researching Guillemots using AI
Challenge
For many years, the Swedish University of Agricultural Sciences has been studying the behavior of guillemots on Stora Karlsö using CCTV. The work entailed manually watching thousands of hours of video footage to identify different behaviors of the birds.
Approach
By developing an object detection algorithm that can identify Adults, Chicks and Eggs, SLU could automate the categorization of the different behaviors as well as identify new ones.
The first steps in developing this algorithm were taken in the Baltic Seabird Hackathon co-hosted by AI Sweden, SLU, and WWF in 2019. The hackathon resulted in several projects, as well as a clear intention from the researchers at SLU to focus more on AI-powered seabird research.
Outcome
Through this approach, SLU could save months of manual video review while also getting better at detecting behaviors that are difficult for humans to identify.
Data scientists are currently developing a target tracking algorithm to follow individual birds frame by frame, with the goal of identifying individual birds. Seabird researchers plan to use this tracking to identify behaviors such as socializing, fights, and copulation.
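To illustrate what following birds frame by frame can involve, here is a minimal sketch of one common baseline: greedily linking detections in a new frame to existing tracks by bounding-box overlap (intersection over union, IoU). This is not the algorithm under development at SLU, which the source does not describe; the function names, the (x1, y1, x2, y2) box format and the 0.3 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, threshold=0.3):
    """Greedily assign new-frame detections to track IDs by highest IoU.

    tracks: dict mapping track_id -> last known box
    detections: list of boxes detected in the new frame
    Returns a dict of track_id -> box for the new frame. Unmatched
    detections start new tracks; unmatched old tracks are dropped.
    """
    # All candidate (score, track_id, detection_index) pairs, best first.
    pairs = sorted(
        ((iou(box, det), tid, i)
         for tid, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    updated, used_tracks, used_dets = {}, set(), set()
    for score, tid, i in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same bird
        if tid in used_tracks or i in used_dets:
            continue
        updated[tid] = detections[i]
        used_tracks.add(tid)
        used_dets.add(i)
    # Any detection left over is treated as a newly appearing object.
    next_id = max(tracks, default=-1) + 1
    for i, det in enumerate(detections):
        if i not in used_dets:
            updated[next_id] = det
            next_id += 1
    return updated


# Two known birds, then a new frame with both slightly moved plus a newcomer.
tracks = {0: (10, 10, 50, 90), 1: (100, 10, 140, 90)}
detections = [(102, 12, 142, 92), (12, 11, 52, 91), (200, 10, 240, 90)]
result = match_tracks(tracks, detections)
print(result)
```

A purely overlap-based matcher like this loses identities when birds cross or are occluded, which is common on a crowded ledge; practical trackers usually add motion or appearance cues on top.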
Further reading
→ Baltic Seabird Hackathon
→ Baltic Seabird Github Repo
→ The Baltic Seabird Project
2. Zenseact – Autonomous driving
Challenge
Overcoming difficult and time-consuming legal hurdles when training federated AI models on real-world road data.
Approach
The Baltic Seabird dataset has important similarities to autonomous driving data. Firstly, the data is real-world data, not synthetic. Secondly, the challenges posed by the bird dataset resemble those of developing solutions for autonomous driving. Examples include variations in light conditions and differences in appearance between individual birds, comparable to the differences between pedestrians. But there are also two important differences between road data and bird data: no GDPR rules protect the privacy of seabirds, and there are no business secrets hidden in the dataset.
Outcome
Thanks to the availability of the Baltic Seabird dataset, Zenseact was able to start working with its partners straight away, saving an estimated 6 to 12 months of organizing and waiting for legal clearances. The knowledge gained from working with the proxy data in collaboration with others was a further benefit that sped things up.
Learnings
1. Develop an understanding of what proxy data is and the benefits it can bring, for example, faster development times, easier collaboration with other organizations, and the possibility to build different solutions based on the same shared dataset.
2. Find datasets that are representative of the challenges in the actual data your models are going to use. Without these similarities between the real data and the proxy data, it will be hard to draw any general conclusions from the work with the proxy data.
3. Find organizations to collaborate with. Share knowledge with each other and/or benchmark different solutions’ results.
Further reading
→ What do breeding seabirds have in common with autonomous driving?
→ Working with proxy data in Data Factory
Access
The dataset is available for all AI Sweden partners. Contact Beatrice Comoli for further instructions on how to access the data. If you are interested in becoming a partner of AI Sweden, getting access to the partner benefits, including the Data Factory and datasets, or in sharing a dataset or a model, please feel free to reach out.