Multimodal language model

AI Sweden's language team is taking the next major step by initiating the development of Sweden's first large multimodal language model. The new model is expected, just like GPT-SW3, to become an important national resource for Sweden.


The new model will be able to handle text, images, and audio, giving it a broad capability to solve many types of tasks, including interacting with external tools such as databases and browsers. It will also be able to generate both images and audio.

Since the start of the GPT-SW3 project, the frontier of large-scale foundation models has moved beyond language models that can only handle text. By developing a multimodal model, Sweden stays at the forefront of advancements in this field.

Facts

The ambition is to create a model family in which the largest model has at least 100 billion parameters. All models developed within this project are planned to be open: available to download and free to use for modification, fine-tuning, research, and commercialization.

Phase 1

This is the initial phase, currently funded by Vinnova and scheduled to run until the summer of 2024.
During this period, we will, among other activities, collect training data for the model and run experiments on new model functionality.