AI Sweden is joining the National Library of Sweden and Scaleout Systems for a pilot on federated training of language models. This will be the first federated, large-scale training of artificial neural networks for language comprehension in Sweden and one of the first examples worldwide. The potential impact is substantial: it would enable more actors to use large, existing datasets without the data ever leaving its point of origin, thus addressing pressing challenges around data sharing and privacy. The pilot could also be a first, important step towards a shared Scandinavian language model.
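The core idea behind federated training can be illustrated with a minimal sketch of federated averaging (FedAvg): each data holder trains locally, and only model parameters, never the raw text, are shared and combined by a coordinating server. Everything below is a toy illustration under assumed names; it does not reflect the actual architecture of the pilot.

```python
# Toy sketch of federated averaging: each client updates a model on its
# own private data, and the server averages the resulting parameters.
# Client names and the single-weight "model" are illustrative only.

def local_update(weight, local_data, lr=0.1):
    """One local training step on data that never leaves its owner.
    Toy objective: fit a single weight to the mean of the local data."""
    grad = weight - sum(local_data) / len(local_data)
    return weight - lr * grad

def federated_average(client_weights, client_sizes):
    """Server-side aggregation, weighted by local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

# Two hypothetical "libraries" holding private corpora (toy numbers).
clients = {"kb_se": [1.0, 2.0, 3.0], "nb_no": [4.0, 5.0]}

global_weight = 0.0
for _ in range(50):  # communication rounds
    updates = [local_update(global_weight, data) for data in clients.values()]
    sizes = [len(data) for data in clients.values()]
    global_weight = federated_average(updates, sizes)
```

Only the scalar `global_weight` travels between the parties in each round; the lists standing in for each library's collection stay local, which is what makes the approach attractive for privacy-sensitive data sharing.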
The digital collections at the National Library are the largest and most advanced that exist for the Swedish language today. They have been used for some of the most successful work with large language models, including the widely used Swedish language model KB-BERT. The new pilot study will allow the National Library to combine its own data with text resources from, for example, other national libraries. As a first step, data from the Norwegian National Library will be included, with potential expansion to Denmark and Finland as well as the Swedish university libraries. Moreover, the pilot will give other actors in Sweden the opportunity to train and benchmark large language models.
"The project is a great example of the potential in combining the power of AI Sweden's two strategic programs on Decentralized AI and Applied Language Technology. Exploring federated learning when developing Swedish language models is a fantastic next step for the advancement of Swedish and all of the Scandinavian languages, which will likely be able to benefit from each other," says Johanna Bergman, Head of Project Portfolio at AI Sweden.
Read more about the project here >