EuroLingua-GPT

AI Sweden and the Fraunhofer Institute for Intelligent Analysis and Information Systems are jointly training a new series of large language models (LLMs) covering all of the official European languages.

Photos: MareNostrum5, By courtesy of Barcelona Supercomputing Center - www.bsc.es

Challenges

Both the public and private sectors in the EU are asking for open, powerful language models trained for European languages. EuroLingua-GPT is one way to meet that need.

About

The NLU group at AI Sweden and the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS) have jointly secured computing time on the new MareNostrum 5 supercomputer at the Barcelona Supercomputing Center. The allocation, granted through the EuroHPC "Extreme Scale Access" call, comprises 8.8 million GPU hours on H100 chips.

This is one of the largest allocations granted by the European High-Performance Computing Joint Undertaking (EuroHPC JU) for developing large European AI language models (LLMs) on the EuroHPC infrastructure.

The partners will begin training the first multilingual models at the end of May 2024. The "EuroLingua-GPT" project will run for one year and aims to bring large, open-source, multilingual European models within reach.

The models developed on the EuroHPC infrastructure are intended to serve two purposes: as general-purpose base models that advance research and science, and as specialized models for specific industries or topics, put to productive use in companies or public administration through joint transfer projects.

Expected outcomes

The project will train and release a series of large language models covering all of Europe's official languages. The models will range in size from 7 billion to 180 billion parameters.

The new models focus on 46 Indo-European languages, including the 24 official European languages. Multilingual AI language models of this kind are still rare. Training will start at the end of May 2024, with the first joint models expected to be published in the coming months.

In addition to the models themselves, the project will develop Modalities, a new framework for training LLMs.

Facts

Funded by the European Union

The EuroLingua-GPT project has been allocated 8.8 million GPU hours on H100 chips through the EuroHPC JU Call for Proposals for Extreme Scale Access Mode.

Participants: AI Sweden and Fraunhofer IAIS

Project period: 2024-2025

For more information, contact

Magnus Sahlgren
Head of Research, NLU
+46 (0)76-315 34 80
Amelia Högberg
Project Administrator & Event Manager, NLU team
+46 (0)70-431 92 38