SuperLim

The SuperLim project forms a Swedish version of the English benchmarking platform (Super)GLUE. The first version of an evaluation framework for Swedish language models has now been released with a set of language understanding tasks. Being able to evaluate models on dimensions such as performance and bias is key to enabling NLP applications more broadly.

Presentation of the SuperLim project

Hear the presentation by Aleksandrs Berdicevskis, University of Gothenburg, at the workshop 'Applied Swedish NLP' at SLTC, in November 2020.

Purpose

The SuperLim project forms the basis of a national testbed for Swedish language models by creating a Swedish version of (Super)GLUE. By providing a standardized collection of benchmarking tests for Swedish language models, the collaboration supports the national responsibility of facilitating the development of trustworthy and robust NLP applications.

The benchmarking tests provide an essential step in enabling the implementation of quality NLP applications on a broad scale in Sweden. The transferability of the results from ongoing initiatives and projects depends largely on such an infrastructure.

Project goal

The goal of the project is to start forming a national evaluation framework. The project consists of data collection as well as annotation of data.

Results

Find the pre-release version of SuperLim at Språkbanken.

Background

The goal is to enable applied solutions using NLP on a broad, national level. Swedish Natural Language Processing (NLP) is now undergoing a transformative breakthrough with the development of large-scale Swedish language models. The ongoing initiatives attempting to make Swedish language models available, and to increase the accessibility of Swedish language data, are important steps toward this goal.

Each company, university, and public authority will be able to make use of this groundwork when applying models specifically trained for assignments identified within their own organization. This will have a positive impact on efficiency in many different kinds of professions. For example, healthcare personnel can focus on providing care instead of administration.

A precondition for achieving this is to have a common way of describing the quality of the language models and how they work. When developing or implementing applications that are based on Swedish language models, it is of great importance to be able to understand the qualities of both the underlying algorithms and the application itself. The performance of a language model is not only about how well it has been trained to understand and perform its specified task. The data it is trained on, and how that data affects the model, are at least as important for our understanding of the results that we receive when applying the models.

To make it easier to explain language models and what they actually do, the GLUE and SuperGLUE collections provide standard benchmarking tests that describe a range of qualities of language models. GLUE - General Language Understanding Evaluation - is a set of language understanding tasks that enable the evaluation of language models with regard to both performance and bias. Semantic similarity, inherent bias and prejudices, and word in context are examples of tests that can be found on the platform. However, these tests are not adjusted for Swedish language models.

Facts
The project is a collaboration between Språkbanken Text at the University of Gothenburg, the National Library of Sweden, RISE and AI Sweden. Representatives from academia and the public and private sectors form the reference group. The project is funded by Vinnova.

Project period
SuperLim: 2020-09-01 - 2021-11-30
SuperLim 2.0: 2021-12-01 - 2022-12-31