Language Models for Swedish Authorities

The project develops large-scale language models for the Swedish public sector, enabling a number of applications with specific relevance to Swedish authorities.

Purpose

RISE (Research Institutes of Sweden), the project coordinator, describes the project as follows:

“This project provides the tools and prerequisites for Swedish authorities to build and integrate state-of-the-art NLP solutions in their current and future services. The development of state-of-the-art Swedish language models specifically designed for use in the public sector, and techniques for utilizing them, will enable novel types of NLP applications that have the potential to revolutionize the use of NLP in the Swedish public sector. The data, infrastructure and frameworks developed in the project will also enable the development of novel types of representation learning, and will thus ensure that Swedish NLP remains at the forefront of development.”

Initially, the project focuses on four use cases: machine understanding and conversational AI, semantic textual similarity, entity tagging (named entity recognition), and text classification.

AI Sweden is responsible for coordinating and validating the data used in the project and for distributing the results and knowledge to the reference group, external stakeholders and other interested parties. Distribution activities will include organizing workshops and events (such as hackathons), as well as cooperating and sharing knowledge, experience and results with the other NLP projects and with data owners.

Background

The language models developed during the project will eventually help authorities sift through, categorize and find the right information in large amounts of text. Using NLP to automate more of this communication can benefit society by significantly reducing costs and the inefficient use of resources. Public authorities, for example, can get help compiling and summarising reports, or shortening service queues that are normally long. When authorities need to manage large amounts of documentation, NLP will allow them to link different text documents together based on their content. For example, this can help the Swedish Public Employment Service improve the matching of applicants to job adverts.

Facts

Project partners: RISE, Peltarion, Swedish Public Employment Service, Luleå University of Technology, Swedish Agency for Economic and Regional Growth, Swedish Tax Agency, National Library of Sweden and AI Sweden. RISE is responsible for project management. The project is funded by Vinnova.

Project period: November 2019 - October 2022

Work packages and project plan

The project will be completed in October 2022 and is divided into five work packages.

Work package 1
Data and evaluation (AI Sweden accounts for 70% of WP1 hours)

Work package 2
Algorithms and architecture

Work package 3
Implementation

Work package 4
Applications

Work package 5
Distribution of results and coordination (AI Sweden accounts for 90% of WP5 hours)

Project deliveries and status update (September 2020)

Data Readiness Level for Natural Language Processing (September 2020)

About the document: Authorities, like any organization that aims to use its own data to develop solutions involving the training or fine-tuning of language models, must consider certain requirements on the quality of that data. As a foundation for the project, RISE has summarised its extensive knowledge of these requirements and the related challenges in a guiding document.

"The purpose of this document is to outline and highlight issues related to data accessibility, validity, and utility that may arise in [these] situations."

The document aims to "... provide insights into the type of challenges one might encounter, with respect to data, when embarking on a project involving NLP. The document is focused on asking the right questions rather than providing an explicit guide that covers all possible challenges in a project: such a guiding will inevitably vary with the specific task at hand".
