AI Sweden, in collaboration with Germany's Fraunhofer IAIS, has gained access to one of Europe's most powerful supercomputers to train language models for all EU languages. The EuroLingua-GPT project marks the third major EU collaboration on language models that AI Sweden is currently participating in.
In this collaboration, AI Sweden and Fraunhofer IAIS will develop a series of open, large multilingual language models for 45 European languages, dialects, and codes, including the 24 official EU languages. The EuroLingua-GPT project will last one year and aims to produce a family of new language models ranging from 7 to 180 billion parameters. Training will start at the end of May, with the first models expected to be ready within a few months.
Training will take place on MareNostrum 5 in Barcelona, one of Europe's most powerful supercomputers. MareNostrum 5, part of the Barcelona Supercomputing Center and EuroHPC, is funded by the EU, with computing power allocated through an application process for projects with the greatest potential.
The "extreme scale access" granted to AI Sweden and Fraunhofer IAIS equates to 8.8 million compute hours on a computer cluster with a total of 4,480 Nvidia H100 GPUs.
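To give a rough sense of scale, the allocation can be converted into wall-clock time. The short Python sketch below is only a back-of-the-envelope illustration, not a statement of how the project will schedule its runs. The article does not say whether the 8.8 million compute hours are counted per GPU or per node, so both readings are shown. The figure of 4 GPUs per node is an assumption that does not come from the article, as is the idea that the whole partition would run continuously.

```python
# Back-of-the-envelope conversion of the allocation into wall-clock time.
# Figures taken from the article:
ALLOCATION_HOURS = 8_800_000   # "8.8 million compute hours"
TOTAL_GPUS = 4480              # GPUs in the cluster

# ASSUMPTION (not stated in the article): each node hosts 4 GPUs.
GPUS_PER_NODE = 4


def wall_clock_days(allocation_hours: float, units: int) -> float:
    """Days needed to use up the allocation if all `units` run continuously."""
    return allocation_hours / units / 24


# Reading 1 (assumption): the hours are counted per GPU.
days_if_gpu_hours = wall_clock_days(ALLOCATION_HOURS, TOTAL_GPUS)

# Reading 2 (assumption): the hours are counted per node.
days_if_node_hours = wall_clock_days(ALLOCATION_HOURS, TOTAL_GPUS // GPUS_PER_NODE)

print(f"If GPU hours:  about {days_if_gpu_hours:.0f} days on the full cluster")
print(f"If node hours: about {days_if_node_hours:.0f} days on the full cluster")
```

Under these assumptions, the allocation corresponds to roughly 82 days of the full cluster if the hours are per GPU, or roughly 327 days if they are per node. In practice a project would share the machine with other users, so the actual calendar time would differ.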
"The allocation we received on MareNostrum 5 represents a computational capacity significantly greater than what is available nationally. Both the public and private sectors in the EU are asking for open, powerful language models trained for European languages. This is one way to meet that need," says Magnus Sahlgren.
"The computing capacities for EuroLingua are a milestone – GenAI 'Made in Europe' is thus becoming a reality. The goal of our collaboration with AI Sweden is to train a family of large language models from scratch that will be published open source. I am very happy that the two organizations are pooling their expertise to achieve this," says Dr. Joachim Köhler, Head of Department Netmedia at Fraunhofer IAIS.
EuroLingua-GPT is one of three major ongoing EU projects on language models with AI Sweden as a project partner. The other two are TrustLLM and Deploy AI.
"This demonstrates that the NLU team at AI Sweden is one of the leading research groups in language technology in Europe. It provides Sweden with a unique opportunity to both contribute to the rest of Europe and create an attractive environment to draw in top talent and significant investments," says Mikael Ljungblom, Director of Public Policy and International Relations at AI Sweden.
Fraunhofer IAIS and the NLU group at AI Sweden are two of Europe's leading LLM labs, with proven expertise and experience in developing LLMs.
Fraunhofer IAIS has led the development of OpenGPT-X, and the NLU group at AI Sweden has developed GPT-SW3 for the Scandinavian languages together with RISE and WASP WARA Media & Language.