The SuperLim project forms the basis of a national testbed for Swedish language models by creating a Swedish (Super)GLUE. By providing a standardised collection of benchmarking tests for Swedish language models, the collaboration supports the national responsibility of facilitating the development of trustworthy and robust NLP applications. The benchmarking tests are an essential step toward enabling the implementation of quality NLP applications on a broad scale in Sweden. To a large extent, the transferability of results from ongoing initiatives and projects depends on such an infrastructure.
The goal of the project is to begin establishing a national evaluation framework. The project comprises both data collection and data annotation.
The pre-release version of SuperLim is available at Språkbanken.
Swedish Natural Language Processing (NLP) is currently undergoing a transformative shift with the development of large-scale Swedish language models. The ongoing initiatives and projects aiming to make Swedish language models available and to increase the accessibility of Swedish language data are important steps towards enabling the utilisation of NLP on a broad, national level: within the public and private sectors as well as academia. Each company, organisation, university, and public authority will be able to build on this groundwork when applying models specifically trained for tasks identified within their own organisation. This could have a positive impact on many different professions, allowing, for example, health care personnel to focus on providing care instead of administration.
A precondition for realising this value is a common way of describing the quality of language models and how they work. When developing or implementing applications based on Swedish language models, it is essential to understand the qualities of both the underlying algorithms and the application itself. The performance of a language model is not only a matter of how well it has been trained to understand and perform its specified task; the data it is trained on, and how that data shapes the model, is at least as important for interpreting the results we obtain when applying the model.
To increase the explainability of language models and what they actually do, there are standard benchmarking tests for describing a range of qualities of language models, collected in GLUE/SuperGLUE. GLUE (General Language Understanding Evaluation) is a set of language understanding tasks, enabling evaluation of language models with regard to both performance and bias. Semantic similarity, inherent bias and prejudice, and word-in-context disambiguation are examples of tests found on the platform. However, these tests are not adapted for Swedish language models.
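To make the evaluation idea concrete, the sketch below shows how a semantic-similarity test of the kind mentioned above is typically scored: each sentence pair has a gold human similarity rating, the model produces its own score, and the two rankings are compared with Spearman correlation (as in GLUE's STS-B task). This is an illustrative sketch, not the official GLUE or SuperLim evaluation code; the example scores are invented.

```python
def rank(values):
    """Assign 1-based average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(gold, predicted):
    """Spearman rank correlation between gold and predicted scores."""
    rg, rp = rank(gold), rank(predicted)
    n = len(gold)
    mg, mp = sum(rg) / n, sum(rp) / n
    cov = sum((a - mg) * (b - mp) for a, b in zip(rg, rp))
    sd_g = sum((a - mg) ** 2 for a in rg) ** 0.5
    sd_p = sum((b - mp) ** 2 for b in rp) ** 0.5
    return cov / (sd_g * sd_p)

# Hypothetical gold annotations (0-5 scale) and model scores (0-1 scale)
# for five sentence pairs; the scales differ, but rank correlation only
# cares about the ordering.
gold_scores = [4.8, 1.2, 3.5, 0.5, 2.9]
model_scores = [0.91, 0.30, 0.75, 0.10, 0.60]
print(round(spearman(gold_scores, model_scores), 2))  # → 1.0 (same ordering)
```

Because rank correlation ignores the absolute scale of the scores, a model can be compared against human ratings without calibrating its outputs to the annotation scale, which is why GLUE-style similarity tasks report Spearman alongside Pearson correlation.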