GPT-SW3 is the first truly large-scale generative language model for the Swedish language. It is now openly released for all organizations to start using.
AI Sweden, with support from WASP, NVIDIA, and RISE, has developed a large-scale generative language model for the Nordic languages, primarily Swedish. The model was trained on the Berzelius supercomputer hosted at the National Supercomputer Center, Linköping University. With the open release, any company, government agency, or organization can tap into the power of GPT-SW3 to build products and services.
GPT-SW3 is based on the same technical principles as the much-discussed GPT-4.
“We’re happy that this cross-organizational effort turned out so well, and that we can now share the resulting models under an open license for others to try,” says Johanna Björklund, Project Manager WARA Media and Language.
The models (with 126M, 356M, 1.3B, 6.7B, 20B, and 40B parameters) are available under an open and permissive license from AI Sweden's repository on Hugging Face, together with a model card and a datasheet.
GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden with support from the WASP Research Arena for Media and Language, NVIDIA, and RISE.
Decoder-only models are generative language models, built specifically to generate text (GPT stands for Generative Pretrained Transformer). GPT-SW3 was trained on massive amounts of Swedish, Norwegian, Danish, Icelandic, and English text data with the explicit purpose of generating Swedish and other Nordic-language text.
GPT-SW3 is not an off-the-shelf product or service. To put it to use, developers must build applications on top of it, such as chatbots or document summarization services.
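As a minimal sketch of what building on the model looks like, the snippet below generates Swedish text with the Hugging Face transformers library. The repository id `AI-Sweden-Models/gpt-sw3-126m` is an assumption for the smallest variant; check AI Sweden's Hugging Face page for the exact model names.

```python
# Sketch: generating Swedish text with a GPT-SW3 checkpoint via
# Hugging Face transformers. Model id and sampling settings are
# illustrative assumptions, not recommendations from AI Sweden.

def build_prompt(instruction: str) -> str:
    """Completion-style prompt: the base models are not instruction-tuned,
    so the task is phrased as text for the model to continue."""
    return instruction.strip() + "\n"

def generate_swedish(prompt: str,
                     model_id: str = "AI-Sweden-Models/gpt-sw3-126m",
                     max_new_tokens: int = 50) -> str:
    # Heavy imports are kept inside the function so the prompt helper
    # above stays usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(prompt), return_tensors="pt")
    outputs = model.generate(**inputs,
                             max_new_tokens=max_new_tokens,
                             do_sample=True,
                             temperature=0.7,
                             top_p=0.9)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example call (downloads the model weights on first use):
# print(generate_swedish("Träd är fina för att"))
```

A chatbot or summarization service would wrap a call like this behind an API, with prompt templates and post-processing around the raw completion.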
Organizations can further tailor GPT-SW3 by training it on their own data sets for specific tasks.
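One hedged way such tailoring could look, using the standard Hugging Face fine-tuning stack: the file name `own_texts.txt`, the model id, and all hyperparameters below are placeholder assumptions, not guidance from AI Sweden.

```python
# Sketch: continued training of a GPT-SW3 checkpoint on an
# organization's own text corpus. All names and settings here are
# illustrative assumptions.

def chunk_text(text: str, block_size: int = 512) -> list:
    """Split a long document into fixed-size character chunks: a very
    simple stand-in for proper token-level packing."""
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]

def finetune(train_file: str = "own_texts.txt",          # placeholder corpus
             model_id: str = "AI-Sweden-Models/gpt-sw3-126m",  # assumed id
             output_dir: str = "gpt-sw3-finetuned") -> None:
    # Heavy imports kept inside the function so chunk_text stays usable
    # without the training dependencies installed.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # One document per line in a plain-text file.
    dataset = load_dataset("text", data_files={"train": train_file})
    tokenized = dataset.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=output_dir,
                               per_device_train_batch_size=2,
                               num_train_epochs=1),
        train_dataset=tokenized["train"],
        # mlm=False gives the causal (next-token) objective GPT uses.
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

For task-specific behavior (classification, instruction following), the same skeleton applies with a task-formatted dataset in place of raw text.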
Published: December 1st, 2023