
Four New Language Models Showcase AI Sweden’s New NLU Strategy

Monday, May 27, 2024

AI Sweden is proud to announce the release of four new language models. These advancements mark a significant step in AI Sweden’s new strategic journey in Natural Language Understanding, which emphasizes both long-term collaboration and immediate application goals.

“There are two key paths forward for AI Sweden’s NLU team. First, we are collaborating with other leading European organizations to develop large-scale open models from scratch. Second, we are rapidly fine-tuning existing open models to meet specific application needs,” says Magnus Sahlgren, Head of Research, Natural Language Understanding at AI Sweden.

Image: Magnus Sahlgren

Together with Germany’s Fraunhofer IAIS (Intelligent Analysis and Information Systems), AI Sweden has secured access to the MareNostrum 5 supercomputer, part of the EuroHPC initiative, through its Extreme Scale Access Mode. The collaboration will enable the training of a new family of large language models for 45 European languages and dialects. The initiative, called EuroLingua-GPT, is one of three major ongoing EU projects on language models in which AI Sweden is a project partner.

Image: MareNostrum 5, Barcelona Supercomputing Center (photo courtesy of Barcelona Supercomputing Center, www.bsc.es)

“Our partnership with Fraunhofer IAIS and our work with EuroLingua-GPT are examples of our long-term commitment to address the increasing demand for open and transparently trained language models across both public and private sectors,” Magnus Sahlgren explains. 

In parallel, the four newly released language models, now available on AI Sweden's Hugging Face page, showcase AI Sweden's strategy of quickly adapting and refining open models from developers such as Meta and Mistral AI to meet the specific needs of Swedish organizations.

“We’ve received numerous requests for specialized and localized models, from both government agencies and other stakeholders. Thanks to our previous work with GPT-SW3, we now have the expertise and data required to meet these demands by quickly adapting models developed by others,” says Magnus Sahlgren.

“These four models, along with our collaboration with Fraunhofer, reflect our commitment to advancing open model development. We aim to create language models that are not only technically advanced but also practically adaptable and relevant for specific Swedish use cases,” says Magnus Sahlgren.

The four new language models are tailored for specific applications:

RoBERTa

An enhancement of Meta's RoBERTa-large, AI Sweden’s RoBERTa model has been optimized for specialized language tasks. The model was trained on Intel’s Gaudi accelerators using the Nordic Pile dataset, which was developed during the construction of GPT-SW3. The 335-million-parameter model excels at sentiment analysis, named entity recognition (NER), and semantic search (for example, as the encoder in a retrieval-augmented generation (RAG) system), and as of mid-May 2024 it ranked first among encoder models on ScandEval.
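In a semantic search or RAG setup, an encoder of this kind is typically used to turn each text into a fixed-size vector, and documents are then ranked by how similar their vectors are to the query vector. The sketch below illustrates only that ranking step, under stated assumptions: mean pooling over token embeddings and cosine similarity are common choices but are not confirmed for this model, the small hand-made vectors stand in for real encoder outputs, and `mean_pool`, `cosine` and `rank_documents` are hypothetical helpers, not part of any AI Sweden release.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_pool(token_embeddings: List[Vector]) -> List[float]:
    """Average per-token vectors into one text vector (assumed pooling strategy)."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Vector, doc_vecs: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical token embeddings standing in for real encoder outputs.
query = mean_pool([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
docs = {
    "doc_a": mean_pool([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]),
    "doc_b": mean_pool([[0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]),
    "doc_c": mean_pool([[0.0, 0.0, 1.0]]),
}

ranking = rank_documents(query, docs)
print(ranking[0][0])  # the document closest to the query
```

In a real pipeline the token embeddings would come from the encoder's hidden states, and for large collections a vector index would usually replace the linear scan shown here.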

Tyr

Named after the Norse god of justice, Tyr is the first model of its kind for the Swedish legal domain. It merges a Swedish Mistral model with the English legal language model Saul, resulting in a model that can answer basic legal questions in Swedish even though it has not been specifically trained on Swedish legislation.

With further fine-tuning, Tyr could give more precise answers within the Swedish legal domain, paving the way for AI-supported legal advice and legal applications.
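The article does not say which merging technique was used to build Tyr. As a rough illustration of what "merging" two models can mean, the sketch below shows the simplest approach, linear weight averaging, which is not necessarily Tyr's method. It assumes both models share the same architecture and parameter names; the dictionaries of flat weight lists and the `linear_merge` helper are hypothetical stand-ins for real model checkpoints.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(model_a: StateDict, model_b: StateDict, weight_a: float = 0.5) -> StateDict:
    """Merge two same-architecture models by weighted averaging of every parameter.

    weight_a is the share given to model_a; model_b receives 1 - weight_a.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same parameter names")
    merged: StateDict = {}
    for name, params_a in model_a.items():
        params_b = model_b[name]
        if len(params_a) != len(params_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Element-wise interpolation between the two models' weights.
        merged[name] = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(params_a, params_b)]
    return merged


# Toy weights standing in for a Swedish-adapted model and a legal-domain model.
swedish_model = {"layer.0.weight": [1.0, 2.0], "layer.0.bias": [0.0]}
legal_model = {"layer.0.weight": [3.0, 4.0], "layer.0.bias": [1.0]}

merged = linear_merge(swedish_model, legal_model, weight_a=0.5)
print(merged)
```

Real merging tools offer more elaborate methods than plain averaging, but they rely on the same precondition the code checks: matching parameter names and shapes in both models.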

Translation

The translation model is based on GPT-SW3 and translates between Swedish and English. Trained on a DGX machine from Aixia using AI Sweden’s translation data, it is well suited to translating large volumes of text.

Llama 3

A Scandinavian adaptation of Meta’s Llama 3, this model uses the Nordic Pile training data to better handle Scandinavian languages. This 8-billion-parameter version of Llama 3 offers improved performance in Scandinavian-language applications.

On using these models

All models are available via AI Sweden’s model library on Hugging Face. AI Sweden does not offer technical support. As with GPT-SW3, organizations using these models must implement any guardrails and customizations their specific applications require. These models may have limitations, including inaccuracies and hallucinations. AI Sweden does not warrant or guarantee that the models’ outputs can be used as advice, legal or otherwise, or that they are factually true.
