Multimodal language model
AI Sweden's language team is taking the next major step by initiating the development of Sweden's first large multimodal language model. Like GPT-SW3, the new model is expected to become an important national resource for Sweden.
The new model will handle text, images, and audio, giving it broad capability to solve many types of tasks, including interaction with external tools such as databases and browsers. It will also be able to generate both images and audio.
Since the start of the GPT-SW3 project, the frontier of large-scale foundation models has shifted from text-only language models to multimodal ones. By developing a multimodal model, Sweden continues to stay at the forefront of this field.
The ambition is to create a model family in which the largest model has at least 100 billion parameters. All models developed within this project are planned to be open: freely downloadable and available for modification, fine-tuning, research, and commercialization.
- The initial phase is currently funded by Vinnova and is scheduled to run until the summer of 2024.
- During this period, we will collect training data for the model and conduct experiments on new model functionality, among other activities.
- The second stage is planned to extend until the end of 2024.
- This period is earmarked for the large-scale training of the new model.