The importance of data cannot be overstated. Training a good language model requires large quantities of high-quality data, and for really large models that data must also be sufficiently clean. You also need access to powerful computing infrastructure.
The current GPT-SWE model is trained on Linköping University’s supercomputer, Berzelius, using the Megatron framework from NVIDIA, which is specifically designed to make optimal use of the SuperPOD architecture. This is important for reducing the training time of the model, which can otherwise become prohibitive.
Show me the numbers!
It took close to 8500 GPU* hours on Berzelius, working with 106 GB of Swedish text as training data. Currently, the model has approximately 3.5 billion parameters.
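To put these figures in perspective, the short Python sketch below does two back-of-the-envelope calculations: how GPU hours translate into wall-clock time, and how much training text there is per model parameter. The three headline numbers come from the answer above. The GPU counts are purely illustrative assumptions (not the actual allocation on Berzelius), the conversion assumes perfect parallel scaling, and 1 GB is taken to mean 10^9 bytes.

```python
# Figures quoted in the text.
GPU_HOURS = 8500        # "close to 8500 GPU hours"
DATA_GB = 106           # Swedish training text, in gigabytes
PARAMETERS = 3.5e9      # "approximately 3.5 billion parameters"


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock days, assuming perfect scaling across GPUs."""
    return gpu_hours / num_gpus / 24


def bytes_per_parameter(data_gb: float, parameters: float) -> float:
    """Bytes of training text per parameter, assuming 1 GB = 10**9 bytes."""
    return data_gb * 1e9 / parameters


if __name__ == "__main__":
    # Hypothetical GPU counts, chosen only to illustrate the arithmetic.
    for num_gpus in (8, 64, 256):
        days = wall_clock_days(GPU_HOURS, num_gpus)
        print(f"{num_gpus:>3} GPUs: about {days:.1f} days")
    ratio = bytes_per_parameter(DATA_GB, PARAMETERS)
    print(f"About {ratio:.1f} bytes of text per parameter")
```

The point of the first calculation is simply that the same GPU-hour budget shrinks from weeks to days as more GPUs are used in parallel, which is why efficient use of the hardware matters.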
Is a language model of this size, in a small language like Swedish, unique?
Yes. No other billion-parameter model like GPT-SWE has been presented for a language of a size similar to Swedish. Preparations are already underway to train an even larger model, with more data.
There are a lot of similarities between Swedish, Norwegian and Danish. What possibilities are there to train a Nordic language model?
That is something AI Sweden is looking into. We aim to train a significantly larger Nordic language model from data in all Nordic languages. Such a model would be able to handle all the Nordic languages, and would likely lead to improved performance for each of the individual languages, due to the variation present in the combined training data.
When OpenAI presented GPT-3, there was a lot of discussion around ethical issues and potential malicious use cases. How is this addressed for GPT-SWE?
Generative language models are specifically designed to generate language. Of course, this means that they can be used for any generative scenario, including the generation of controversial and even harmful content. Language models also learn any artifacts present in the data, such as human biases, prejudice, and the underrepresentation of certain demographic groups. We are well aware of these potential problems, and therefore exercise caution in how we handle and apply the model. This is also the main reason why we are not releasing the model openly at this time, although the goal is to give researchers open access to the model and to host it for commercial applications.
All this is fascinating, but help me understand how this can be useful in everyday life.
As the size of the model grows, it will become more competent in tasks like text classification, grammar correction, question answering, summarization, translation, idea generation, information extraction, and much more. The general applicability of large language models is sometimes referred to as “zero-shot” capacity, which means that the model can solve tasks even without being specifically trained for the task. This type of generalized capacity correlates with model size, which is the main reason for our ambition to train really large models.
That means that the model can take instructions in natural language, like “Write a news article about AI,” “What’s climate change?” or even “Fix bugs in the following code,” and generate a text as output. Since text is used in all sectors, in all different kinds of ways, technology that makes this possible will have a high impact in terms of what’s possible, how long it will take and what it will cost.
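As a rough illustration of what "taking instructions in natural language" looks like in practice, the Python sketch below builds a zero-shot prompt: just the instruction, plus optional input text, with no task-specific examples or training. Everything here is hypothetical. The function names are made up for illustration, and `placeholder_model` is a stand-in for a real model, not an interface to GPT-SWE.

```python
from typing import Callable


def build_zero_shot_prompt(instruction: str, text: str = "") -> str:
    """Combine an instruction and optional input text into one prompt.

    No worked examples of the task are included, which is what makes
    the prompt zero-shot.
    """
    return instruction if not text else f"{instruction}\n\n{text}"


def run_task(generate: Callable[[str], str], instruction: str, text: str = "") -> str:
    """Send a zero-shot prompt to any text-in, text-out model function."""
    return generate(build_zero_shot_prompt(instruction, text))


def placeholder_model(prompt: str) -> str:
    """Hypothetical stand-in for a real language model."""
    return f"[model output for a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    print(run_task(placeholder_model, "Write a news article about AI"))
    print(run_task(placeholder_model, "Fix bugs in the following code", "print('hello'"))
```

The same `run_task` call handles both the writing task and the code-fixing task. Only the instruction changes, which is the sense in which one model can serve many different applications.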
Which organizations have been working on GPT-SWE?
So far, it’s been a collaboration between AI Sweden and RISE.
I would like to know more about GPT-SWE. Who can I contact?
If your question is more on the technical side, get in touch with Magnus Sahlgren, Head of Research, Natural Language Understanding at AI Sweden. If your organization wants to get involved in AI Sweden’s language projects, get in touch with Francisca Hoyer, Strategic Program Manager for AI Sweden’s NLP initiatives.
* Graphics processing unit
Watch our latest NLP Wednesday edition!
GPT-SWE: Going Larger with Swedish Language Models, With Evangelia Gogoulou, RISE, Ariel Ekgren, RISE and Alice Heiman, AI Sweden.