OpenEuroLLM takes the next step for European AI sovereignty

Wednesday, March 11, 2026

It has now been a year since the launch of OpenEuroLLM, one of Europe's most ambitious AI initiatives. In that time, the project has united 20 leading research institutions and companies and laid the foundation for a new generation of open language models aimed at strengthening European digital sovereignty and competitiveness.

OPEN EURO LLM project co-funded by the European Union

During its first year, the OpenEuroLLM project reached crucial milestones in infrastructure, data practices, and model development. The purpose is clear: to develop next-generation open-source language models to advance European AI capabilities.

OpenEuroLLM is proof that cutting-edge technical expertise combined with a strong European network is a prerequisite for our success in large-scale AI development. For Sweden and our partners, this represents a unique opportunity to build on an open and transparent foundation that strengthens our shared innovative power.

Nina Ökvist

Head of NLU at AI Sweden

Breakthroughs in open data and infrastructure

One of the project's most significant successes is the launch of the MixtureVitae dataset. It is the first dataset that is free to use, even for commercial purposes, while delivering performance that matches or exceeds the leading restrictively licensed alternatives. MixtureVitae is particularly strong in code and mathematical reasoning, which is critical for the next step in industrial AI applications.

To address the scarcity of data for smaller European languages, the project has, together with EuroLLM, developed the first comprehensive multilingual synthetic dataset for pre-training. Within this subproject, called MultiSynt, AI Sweden's NLU team has worked on translating high-quality English data into languages such as Swedish, Icelandic, Hungarian, and Spanish. The work aims to overcome shortcomings in current data collection methods that limit the accurate representation of many languages.

By creating an open multilingual dataset, the project aims both to enable the training of language models within OpenEuroLLM and to drive research on multilingual models forward. Making these resources available on European supercomputers such as LUMI and Leonardo avoids duplication of effort and lets the whole ecosystem make the most of its resources.

Computational capacity is a crucial building block

In December 2025, OpenEuroLLM became the first AI project to be granted strategic access to several of EuroHPC's supercomputers simultaneously, including LUMI, Leonardo, Jupiter, and MareNostrum 5. However, the project's coordinator, Jan Hajič, emphasizes that additional computational resources will be needed to supplement the existing allocations.

Creating an open source multilingual LLM in the public space and within a large consortium is a challenging task. I am proud that thanks to the expertise, enthusiasm, commitment and hard work of especially the core partners the project has achieved its first-year goals. However, significant challenges, especially in securing more compute for creating the final models, still remain.

Jan Hajič

Charles University

The way forward

Over the coming year, AI Sweden will step up its work to create the conditions for the European models to become practically useful. Within post-training, the critical phase in which models are fine-tuned for their specific purposes, AI Sweden's NLU team is focusing on equipping the models with the capabilities and behaviors required for advanced use. Specifically, this involves optimizing the models' ability to handle long contexts, improving instruction following and chat interaction, and strengthening their capacity for reasoning and function calling.

The first language models developed within the OpenEuroLLM framework are scheduled to be published during the fall.

About EuroHPC

European High Performance Computing (EuroHPC) comprises a set of large-scale computing infrastructures in Europe. The systems are primarily intended for academic research. The EuroHPC framework also supports research and innovation through calls for proposals in all areas related to large-scale computing, and invests in competence centers across Europe designed to facilitate knowledge exchange, innovation, and new research collaborations.
