Baltic Seabird Dataset
2000 hours of video footage of guillemots
Jonas Hentati Sundberg, Associate Senior Lecturer at SLU - The Swedish University of Agricultural Sciences (Sveriges Lantbruksuniversitet).
Real-world, challenging, class-imbalanced dataset
2300 still frames annotated with bounding boxes that mark the objects present and classify them as “Adult”, “Chick” or “Egg”.
Available for all AI Sweden partners.
The dataset consists of video footage of common murres (Uria aalge), seabirds that nest on cliffs in the Baltic Sea. These birds spend most of their time offshore, but from May to July every year they come to Stora Karlsö to lay their eggs on the limestone cliffs.
The CCTV camera system was installed in 2019. The footage for 2019 comes from two cameras that filmed continuously at 60 frames per second between May 1st and July 15th. The video material is in .avi files with an average length of 2 hours and a total file size of approximately 2 TB. Infrared (IR) light provides clear imagery even in complete darkness.
In 2020, four cameras were used, creating approximately 5 TB of data. Due to the COVID lockdown, no tourists visited the island that year, which led to an increased number of sea eagles and thereby more disturbances of the common murres.
To make the material usable for machine learning, researchers from SLU and AI Sweden have manually annotated around 2300 still frames taken from the footage, drawing bounding boxes around approximately 18000 objects that each belong to one of three categories: “Adult”, “Chick” or “Egg”. These annotations are included in the dataset download bundle.
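With the totals given above, each annotated frame holds roughly 18000 / 2300 ≈ 7.8 objects on average. A first step when working with a class-imbalanced dataset like this one is usually to count how the boxes are spread across the three categories. The following is a minimal sketch of that check. The record layout (the keys `frame`, `label` and `bbox`) and the sample values are assumptions made for illustration; the actual annotation format in the download bundle may differ.

```python
from collections import Counter

# Hypothetical annotation records, one dict per bounding box.
# The real bundle may use another layout; these values are made up.
annotations = [
    {"frame": "frame_0001.jpg", "label": "Adult", "bbox": (120, 80, 60, 110)},
    {"frame": "frame_0001.jpg", "label": "Adult", "bbox": (210, 95, 58, 105)},
    {"frame": "frame_0001.jpg", "label": "Egg", "bbox": (150, 190, 20, 25)},
    {"frame": "frame_0002.jpg", "label": "Adult", "bbox": (118, 82, 61, 108)},
    {"frame": "frame_0002.jpg", "label": "Chick", "bbox": (160, 170, 30, 35)},
]


def class_distribution(records):
    """Return {label: (count, share of all boxes)}."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.items()}


def mean_objects_per_frame(records):
    """Average number of annotated boxes per distinct frame."""
    frames = {r["frame"] for r in records}
    return len(records) / len(frames)


dist = class_distribution(annotations)
for label, (n, share) in sorted(dist.items()):
    print(f"{label}: {n} boxes ({share:.0%})")
print(f"Mean objects per frame: {mean_objects_per_frame(annotations):.1f}")
```

If one category turns out to dominate, that is worth knowing before training, since it affects how the data should be split and how results should be evaluated.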
Use cases to date
1. SLU – Researching Guillemots using AI
For many years, the Swedish University of Agricultural Sciences has been studying the behaviour of Guillemots on Stora Karlsö using CCTV. The work entailed watching thousands of hours of video footage to identify different behaviours of the birds.
By developing an object detection algorithm that can identify Adults, Chicks and Eggs, SLU could automate the categorization of the different behaviours as well as identify new ones.
The first steps in developing this algorithm were taken in the Baltic Seabird Hackathon co-hosted by AI Sweden, SLU, and WWF in 2019. The hackathon resulted in several projects, as well as a clear intention from the researchers at SLU to focus more on AI-powered seabird research.
Through this approach, SLU could save months of watching video footage while also getting better at detecting new behaviours that are difficult for humans to identify.
Data scientists are currently developing a target tracking algorithm to follow individual birds frame by frame, with the goal of identifying individual birds. Seabird researchers plan to use the target tracking of individuals to identify behaviours such as socializing, fights, and copulations.
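To illustrate what following birds frame by frame can involve, here is a minimal sketch of one common baseline: matching each bounding box in a new frame to the most overlapping box from the previous frame, using intersection over union (IoU). This is not the algorithm under development at SLU, which is not described here. The function names, the `min_iou` threshold and the example boxes are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, next_id, min_iou=0.3):
    """Greedily assign detections in the current frame to existing tracks.

    tracks: {track_id: box in the previous frame}
    detections: list of boxes in the current frame
    Returns (updated tracks, next unused track id). Tracks without a
    match are dropped; unmatched detections start new tracks.
    """
    # All (overlap, track, detection) candidates, best overlap first.
    pairs = sorted(
        ((iou(box, det), tid, i)
         for tid, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    updated, used = {}, set()
    for score, tid, i in pairs:
        if score < min_iou:
            break  # remaining candidates overlap too little
        if tid in updated or i in used:
            continue  # track or detection already assigned
        updated[tid] = detections[i]
        used.add(i)
    for i, det in enumerate(detections):
        if i not in used:
            updated[next_id] = det
            next_id += 1
    return updated, next_id


# Made-up boxes: two birds shift slightly, a third appears in frame 2.
frame1 = [(10, 10, 50, 90), (100, 10, 140, 90)]
frame2 = [(102, 12, 142, 92), (12, 11, 52, 91), (200, 10, 240, 90)]

tracks, next_id = match_tracks({}, frame1, next_id=0)
tracks, next_id = match_tracks(tracks, frame2, next_id)
print(tracks)
```

A greedy overlap match like this works only when birds move little between frames. Crowded ledges, occlusion and birds leaving or returning would call for something more robust, which is part of why tracking individuals is a harder problem than detection.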
2. Zenseact – Autonomous driving
Overcoming difficult and time-consuming legal hurdles when training federated AI models on real-world road data.
The Baltic Seabird dataset has important similarities to autonomous driving data. Firstly, the data is real-world data, not synthetic. Secondly, the challenges related to the bird dataset are similar to the ones for developing solutions for autonomous driving. Examples include variations in light conditions, and differences in appearance between birds that resemble the differences between pedestrians. But there are also two important differences between road data and bird data: there is no GDPR regulating the privacy of seabirds, and there are no business secrets hidden in the dataset.
Thanks to the availability of the Baltic Seabird dataset, Zenseact was able to start working with their partners straight away, saving an estimated 6-12 months of organising and waiting for legal clearances. The knowledge and learnings gained from working with the proxy data, in collaboration with others, were another gain that sped things up.
1. Develop an understanding of what proxy data is and the benefits it can bring, for example, faster development times, easier collaboration with other organizations and the possibility to build different solutions based on the same shared dataset.
2. Find datasets that are representative of the challenges you have in the actual dataset your models are going to use. Without these similarities between the real data and the proxy data, it will be hard to draw any generalized conclusions from the work with the proxy data.
3. Find organizations to collaborate with. Share knowledge with each other and/or benchmark the results of different solutions.
The dataset is available for all AI Sweden partners. Contact Ebba Josefson Lindqvist and she will give you further instructions on how to access the data.
Become a partner and engage in the Data Factory
If you are interested in becoming a partner of AI Sweden, getting access to the partner benefits, including the Data Factory and datasets, or in sharing a dataset or a model, please feel free to reach out.