Multimodal language model

AI Sweden's language team is taking the next major step by initiating the development of Sweden's first large multimodal language model. The new model is expected, just like GPT-SW3, to become an important national resource for Sweden.

The new model will be able to handle text, images, and audio, giving it broad capabilities across many types of tasks, including interaction with external tools such as databases and browsers. It will also be able to generate both images and audio.

Since the start of the GPT-SW3 project, the frontier of large-scale foundation models has moved beyond language models that can only handle text. By developing a multimodal model, Sweden continues to stay at the forefront of advancements in this field.


The ambition is to create a model family in which the largest model has at least 100 billion parameters. All models developed within this project are planned to be open, so that they can be downloaded and used for modification, fine-tuning, research, and commercialization.

Phase 1

  • This is the initial phase, currently funded by Vinnova and scheduled to run until the summer of 2024
  • During this period, we will collect training data for the model and experiment with new model functionality, among other activities

Phase 2

  • The second stage is planned to extend until the end of 2024
  • This period is earmarked for the large-scale training of the new model

For more information, contact

Magnus Sahlgren
Head of Research, NLU
+46 (0)76-315 34 80
