
Baltic Seabird Dataset

The dataset consists of 2000 hours of video footage of guillemots on a ledge on Stora Karlsö, Sweden. SLU has used the dataset with AI to study the birds' behaviours and to automate their documentation. Zenseact has also used the data as a proxy dataset when developing autonomous driving.

In short

Content
2000 hours of video footage of guillemots

Author
Jonas Hentati Sundberg, Associate Senior Lecturer at SLU - The Swedish University of Agricultural Sciences (Sveriges Lantbruksuniversitet).

Quality
Real-world, challenging, class-imbalanced dataset

Annotations
2300 still frames annotated with bounding boxes around the objects present, each classified as “Adult”, “Chick” or “Egg”.

Use cases
1. Used by SLU for researching guillemots
2. Used by Zenseact for federated learning projects

Access
Available to all AI Sweden partners.

Dataset specifics

Video footage of common murres (Uria aalge), also known as common guillemots, seabirds that nest on cliffs in the Baltic Sea. These birds spend most of their time offshore, but from May to July every year they come to Stora Karlsö to lay their eggs on the limestone cliffs.

The CCTV camera system was installed in 2019. The footage for 2019 comes from two cameras that filmed continuously at 60 frames per second between May 1st and July 15th. The video material is stored as .avi files with an average length of 2 hours and a total size of approximately 2 TB. Infrared (IR) light provides clear imagery even in complete darkness.

In 2020, four cameras were used, producing approximately 5 TB of data. Because of the COVID lockdown, no tourists visited the island that year, which led to an increased number of sea eagles and therefore more disturbances of the common murres.

To make the material usable for machine learning, researchers from SLU and AI Sweden have manually annotated around 2300 still frames taken from the footage. The annotations consist of bounding boxes around approximately 18000 objects, each belonging to one of three categories: “Adult”, “Chick” or “Egg”. These annotations are included in the dataset download bundle.
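Since roughly 18000 objects are spread over about 2300 frames (close to eight per frame) and the dataset is described as class-imbalanced, a first step for many users will be to measure how the three categories are distributed. The sketch below shows one way to do that. The `Box` structure, the frame names, and the coordinates are hypothetical and chosen only for illustration; the actual annotation format in the download bundle may differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One annotated object (hypothetical layout, not the bundle's real format)."""
    label: str    # "Adult", "Chick" or "Egg"
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def class_distribution(frames):
    """Return the share of each label across all annotated frames.

    frames: dict mapping a frame name to a list of Box objects.
    """
    counts = Counter(box.label for boxes in frames.values() for box in boxes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Made-up example data, only to show the call.
frames = {
    "frame_0001": [
        Box("Adult", 10, 20, 60, 90),
        Box("Adult", 70, 25, 120, 95),
        Box("Egg", 40, 80, 50, 92),
    ],
    "frame_0002": [
        Box("Adult", 15, 22, 65, 88),
        Box("Chick", 30, 60, 45, 85),
    ],
}

shares = class_distribution(frames)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

A skewed distribution here would suggest using class weights or resampling when training a detector.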

Use cases to date

1. SLU – Researching Guillemots using AI

Challenge
For many years, the Swedish University of Agricultural Sciences has been studying the behaviour of guillemots on Stora Karlsö using CCTV. The work entailed watching thousands of hours of video footage to identify the birds' different behaviours.

Approach
By developing an object detection algorithm that can identify Adults, Chicks and Eggs, SLU could automate the categorization of known behaviours and identify new ones.

The first steps in developing this algorithm were taken in the Baltic Seabird Hackathon, co-hosted by AI Sweden, SLU, and WWF in 2019. The hackathon resulted in several projects, as well as a clear intention from the researchers at SLU to focus more on AI-powered seabird research.

Outcome
With this approach, SLU could save months of watching video footage while also making progress in detecting new behaviours that are difficult for humans to identify.

Data scientists are currently developing a target tracking algorithm to follow individual birds frame by frame, with the goal of identifying individual birds. Seabird researchers plan to use the tracking of individuals to identify behaviours such as socializing, fights, and copulations.
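To make frame-by-frame tracking concrete, the sketch below uses greedy matching on intersection over union (IoU), a common baseline for linking detections between consecutive frames. This is an assumption for illustration only: the source does not say which tracking method the project uses. The function names, the `min_iou` threshold, and the box format `(x_min, y_min, x_max, y_max)` are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks(tracks, detections, min_iou=0.3):
    """Greedily link detections in a new frame to existing tracks.

    tracks: dict mapping an integer track id to that track's last known box.
    detections: list of boxes found in the new frame.
    Returns a dict of track id -> box for the new frame. Detections that match
    no track start a new track; tracks with no match are dropped in this sketch.
    """
    # Score every (track, detection) pair and visit the best overlaps first.
    pairs = sorted(
        ((iou(box, det), tid, i)
         for tid, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    updated, used_tracks, used_dets = {}, set(), set()
    for score, tid, i in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same bird
        if tid in used_tracks or i in used_dets:
            continue
        updated[tid] = detections[i]
        used_tracks.add(tid)
        used_dets.add(i)

    # Unmatched detections become new tracks with fresh ids.
    next_id = max(tracks, default=-1) + 1
    for i, det in enumerate(detections):
        if i not in used_dets:
            updated[next_id] = det
            next_id += 1
    return updated


# Made-up boxes: two birds move slightly, and a third appears.
previous = {0: (10, 10, 50, 50), 1: (100, 100, 140, 140)}
current = [(102, 101, 142, 141), (12, 11, 52, 51), (200, 200, 230, 230)]
result = match_tracks(previous, current)
```

On a crowded ledge where birds overlap heavily, pure IoU matching tends to swap identities, so a real tracker would likely add motion or appearance cues.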

Further reading
→ Baltic Seabird Hackathon
→ Baltic Seabird Github Repo
→ The Baltic Seabird Project

2. Zenseact – Autonomous driving

Challenge
Overcoming difficult and time-consuming legal hurdles when training federated AI models on real-world road data.
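For readers unfamiliar with federated training: each partner trains on its own data and shares only model parameters, which a coordinator then combines. The sketch below shows federated averaging (FedAvg), one widely used way to combine them. It is an illustration under assumptions; the source does not state which aggregation scheme Zenseact and its partners used, and the function name and the flat list representation of model weights are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Combine client model parameters, weighted by local sample count.

    client_weights: list of parameter vectors (lists of floats), one per client.
    client_sizes: number of training samples each client used.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same model shape")

    total = sum(client_sizes)
    # Each global parameter is the sample-weighted mean of the client values.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Made-up example: two clients with 100 and 300 local samples.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
print(global_weights)
```

Only the parameter vectors leave each client in this scheme; the raw footage or road data stays where it was recorded.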

Approach
The Baltic Seabird dataset has important similarities to autonomous driving data. Firstly, it is real-world data, not synthetic. Secondly, it poses challenges similar to those in autonomous driving: light conditions vary, and birds differ in appearance much as pedestrians do. There are also two important differences between road data and bird data: no GDPR regulates the privacy of seabirds, and there are no business secrets hidden in the dataset.

Outcome
Thanks to the availability of the Baltic Seabird dataset, Zenseact was able to start working with their partners straight away, saving an estimated 6 to 12 months of organising and waiting for legal clearances. The knowledge and learnings gained from working with the proxy data, in collaboration with others, were another gain that sped things up.

Learnings
1. Develop an understanding of what proxy data is and the benefits it can bring, such as faster development, easier collaboration with other organizations, and the possibility to build different solutions based on the same shared dataset.

2. Find datasets that are representative of the challenges you have in the actual dataset your models are going to use. Without these similarities between the real data and the proxy data, it will be hard to draw any generalized conclusions from the work with the proxy data.

3. Find organizations to collaborate with. Share knowledge with each other and/or benchmark different solutions’ results.

Further reading
→ What do breeding seabirds have in common with autonomous driving?
→ Working with proxy data in Data Factory 

Access

The dataset is available to all AI Sweden partners. Contact Ebba Josefson Lindqvist, who will give you further instructions on how to access the data.

Become a partner and engage in the Data Factory

If you are interested in becoming a partner of AI Sweden, getting access to the partner benefits, including the Data Factory and datasets, or in sharing a dataset or a model, please feel free to reach out.

Project Manager, LLM

Ebba Josefson Lindqvist