
Swedish Language Data Lab

A well-developed foundation for Natural Language Processing (NLP) is one of the cornerstones of successful AI applications. NLP is one of AI Sweden's strategic areas, and the Swedish Language Data Lab was the first NLP project we initiated.

Latest news from the Swedish Language Data Lab

Head of Project Portfolio Johanna Bergman tells us about the project status of the Swedish Language Data Lab, September 2020

Background

Natural Language Processing (NLP) creates opportunities to develop methods, tools, and applications that are based on machine understanding of human language. In this way, NLP makes information-bearing data more available and accessible to us in many different contexts. NLP-based applications can help us extract the relevant information from large amounts of language data, depending on the context, by producing summaries, simulations, interpretations, and much more.

The algorithms that form the basis of these applications are called language models. Developing language models specifically designed for Swedish relies to a large extent on data written (or spoken) in Swedish. However, Swedish is a small language, and global players rarely have an interest in producing annotated data sets for it. The development of language models in Swedish is important to maintain linguistic diversity and promote innovation in the field of NLP in Sweden, which will benefit a whole series of organisations in the academic world, industry and the public sector.

Purpose

The Swedish Language Data Lab is a project funded by Vinnova and coordinated by AI Sweden. It is an explorative project, based on collaboration between leading players in the field of NLP and stakeholders from the public sector and academia. The aim of the project is to collect the know-how about, and the challenges connected to, some of the important steps in the NLP implementation process, from identifying needs to evaluating trained language models. The work is divided into several work packages that aim to:

  • Develop and make available trained Swedish language models: a NER model and two sentiment analysis models.
  • Produce a technical, legal, and ethical framework for processing and facilitating accessibility to Swedish language data sets.
  • Analyse text and models from the perspective of spoken dialogue.
  • Perform requirement analysis and data harvesting in the public sector.
  • Conduct preliminary studies for NLP specifically developed for the medical and legal domains.

In terms of data, the focus for the upcoming year will be on investigating the alternatives for facilitating and increasing access to Swedish datasets in general. One part of this work is to begin developing a platform for training models without seeing the actual data.

Project goal

The overall goal of the project is to create a national knowledge hub within NLP that will accelerate innovation, research, and applications in this area. The project forms part of Vinnova’s “Data-driven innovation” funding programme, which aims to “increase the level of expertise in reusing data in innovations in Sweden”. It is also in line with the EU strategy for the digital transition and, in particular, for data and artificial intelligence. The strategy highlights the importance of broadening access to data in order to “create added value for citizens”, while at the same time ensuring that individuals have greater control over their own data.

Facts

The project is coordinated by AI Sweden. Recorded Future, Gavagai and Talkamatic provide language technology expertise, while Språkbanken, the language research unit at the University of Gothenburg, and the Swedish Association of Local Authorities and Regions (SKR) are stakeholders and data owners. A wide variety of other stakeholders also support the project by providing letters of support and taking part in the reference group.

Project period: 2019-06-01 to 2021-05-30

Contact

Project Manager

Isabelle Johansson