By forming the basis of a national evaluation framework for Swedish language models, the project 'SuperLim' will enable evaluation with regard to both performance and bias. This is a key component in facilitating Natural Language Processing (NLP) applications on a broad scale in Sweden, which in turn have the potential to benefit many different professions, for example by allowing health care personnel to focus on providing care rather than on administration.
Language models are the underlying algorithms of NLP applications. A model's performance depends not only on how well it has been trained to understand and perform its specified task; the data the model has been trained on, and how that data shapes the model, is at least as important for understanding the results we get when applying it.
Språkbanken Text at Gothenburg University, the National Library of Sweden, RISE and AI Sweden have joined forces and received funding from Vinnova to begin building a national, standardised testbed consisting of Swedish evaluation data. This will enable a better, shared understanding of the language models that are being developed and of how they work. The collaboration is a key contribution to the national responsibility of facilitating the development of trustworthy and robust NLP applications.
Magnus Sahlgren, Head of the Natural Language Processing Group at RISE, is the coordinator of the project. He explains that the project will further the development of Swedish NLP by taking the first step towards a Swedish evaluation framework.
“The transformative breakthrough that Swedish NLP is undergoing right now - with the development and application of large-scale language models - opens up completely new possibilities for the use of NLP in Sweden. A precondition for this development to be viable and have its desired effects is that Swedish evaluation data is available for judging and comparing the quality of the models that are developed.”
The original English benchmarking platforms GLUE and SuperGLUE (General Language Understanding Evaluation) consist of sets of language understanding tasks, enabling evaluation of language models with regard to both performance and bias. Some tasks, for example, are used to evaluate and describe the degree of bias and prejudice that a model has learned from its underlying data. Another example is the 'word-in-context' task, which measures how well a model can distinguish different meanings of the same word (homonyms) based on the surrounding context. These tests currently exist only for English language models, which is why equivalents will now be developed for Swedish.
A number of important initiatives are currently under way in Sweden to develop and make available Swedish language models. The 'SuperLim' project is one of the latest contributions to these efforts. AI Sweden’s co-director Daniel Gillblad emphasises the importance of advancing NLP for Swedish:
“A lot of the data that is necessary for us to build the AI solutions of the future consists of text and natural language. The language models needed to make use of this data have become much better over the past couple of years, but we need to focus even more on the Swedish language. This project will result in better Swedish language models and smarter AI solutions for Sweden.”
As more and more organisations and companies start implementing NLP solutions trained for tasks identified within their own organisations, the need for a common way of describing the quality of language models and how they work will continue to grow. This project will lay an important foundation for that common understanding.

The project partners are Språkbanken Text at Gothenburg University, the National Library of Sweden, RISE and AI Sweden. The project is funded by Vinnova.