AI Sweden and Fraunhofer IAIS to Develop Language Models for All of Europe

Thursday, May 16, 2024

AI Sweden, in collaboration with Germany's Fraunhofer IAIS, has gained access to one of Europe's most powerful supercomputers to train language models for all EU languages. The EuroLingua-GPT project marks the third major EU collaboration on language models that AI Sweden is currently participating in.

Photo: One of the supercomputers, By courtesy of Barcelona Supercomputing Center - www.bsc.es

"This is a unique opportunity for AI Sweden to help strengthen European and Swedish competitiveness and digital sovereignty by developing a powerful and open European language model."

Magnus Sahlgren, Head of Research, NLU, at AI Sweden.

In this collaboration, AI Sweden and Fraunhofer IAIS will develop a series of open, large multilingual language models for 45 European languages, dialects, and codes, including the 24 official EU languages. The EuroLingua-GPT project will last one year and aims to produce a family of new language models ranging from 7 to 180 billion parameters. Training will start at the end of May, with the first models expected to be ready within a few months.

Training will take place on MareNostrum 5 in Barcelona, one of Europe's most powerful supercomputers. MareNostrum 5, part of the Barcelona Supercomputing Center and EuroHPC, is funded by the EU, with computing power allocated through an application process for projects with the greatest potential.

Photos: MareNostrum 5, by courtesy of Barcelona Supercomputing Center - www.bsc.es

The “extreme scale access” granted to AI Sweden and Fraunhofer IAIS equates to 8.8 million compute hours on a computer cluster with a total of 4480 Nvidia H100 GPUs.
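To give a sense of scale, the allocation can be converted into wall-clock time on the full cluster. The sketch below is a rough back-of-the-envelope calculation, not an official figure: the article does not say whether the compute hours are counted per GPU or per node, so both readings are shown. The value of 4 GPUs per node and the helper name `wall_clock_days` are assumptions made for illustration, and perfect utilization is assumed.

```python
def wall_clock_days(compute_hours: float, units: int) -> float:
    """Days needed to spend `compute_hours` when `units` run concurrently at full utilization."""
    if units <= 0:
        raise ValueError("units must be positive")
    return compute_hours / units / 24.0


ALLOCATION_HOURS = 8.8e6  # from the article
TOTAL_GPUS = 4480         # from the article
GPUS_PER_NODE = 4         # assumption, not stated in the article

# Reading 1: the allocation is counted in GPU hours.
days_if_gpu_hours = wall_clock_days(ALLOCATION_HOURS, TOTAL_GPUS)

# Reading 2: the allocation is counted in node hours (assumed 4 GPUs per node).
days_if_node_hours = wall_clock_days(ALLOCATION_HOURS, TOTAL_GPUS // GPUS_PER_NODE)

print(f"If GPU hours:  about {days_if_gpu_hours:.0f} days on the full cluster")
print(f"If node hours: about {days_if_node_hours:.0f} days on the full cluster")
```

Under the first reading the allocation corresponds to roughly 82 days of the entire cluster; under the second, to roughly 327 days, which is close to the one-year project duration.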

"The allocation we received on MareNostrum 5 represents a computational capacity significantly greater than what is available nationally. Both the public and private sectors in the EU are asking for open, powerful language models trained for European languages. This is one way to meet that need," says Magnus Sahlgren.

"The computing capacities for EuroLingua are a milestone – GenAI 'Made in Europe' is thus becoming a reality. The goal of our collaboration with AI Sweden is to train a family of large language models from scratch that will be published open source. I am very happy that the two organizations are pooling their expertise to achieve this," says Dr. Joachim Köhler, Head of Department Netmedia at Fraunhofer IAIS.

One of Three Major EU Projects Involving AI Sweden's Language Team

EuroLingua-GPT is one of three major ongoing EU projects on language models with AI Sweden as a project partner. The other two are TrustLLM and Deploy AI.

"This demonstrates that the NLU team at AI Sweden is one of the leading research groups in language technology in Europe. It provides Sweden with a unique opportunity to both contribute to the rest of Europe and create an attractive environment to draw in top talent and significant investments," says Mikael Ljungblom, Director of Public Policy and International Relations at AI Sweden.

Background

Fraunhofer IAIS and the NLU group at AI Sweden are two of Europe's leading LLM labs, with proven expertise and experience in developing LLMs. 

Fraunhofer IAIS has led the development of OpenGPT-X, and the NLU group at AI Sweden has developed GPT-SW3 for the Scandinavian languages together with RISE and WASP WARA Media & Language.

Funded by the European Union.

Contact Information

Magnus Sahlgren
Head of Research, NLU
+46 (0)76-315 34 80