SE.LLMA – Swedish large language models

Large language models (LLMs) and the digital tools built upon them are becoming part of tomorrow’s societal infrastructure. They will form the underlying software upon which an increasing number of digital services, decision-making systems, and knowledge environments rely. A new project, SE.LLMA, addresses this important infrastructure need by training Swedish language models.

For Sweden to remain competitive, deliver efficient and secure public services, and act in line with its values, we need to strengthen research, knowledge, and expertise related to language models. This also involves ensuring that our digital infrastructure is developed with a strong linguistic and cultural grounding in Sweden and supported by legal frameworks that safeguard copyright, transparency, and accountability.

What will the project do?

The SE.LLMA project will train new Swedish language models that deliver high-quality performance in Swedish. The models will be evaluated on how well they address Sweden’s culture, history, societal principles, and norms. These are specialized language models designed for use in contexts where linguistic precision, understanding of Swedish conditions, and compliance with legal frameworks are essential.

By developing our own Swedish language models, we expect several outcomes. First, we can gain control over data, transparency, and traceability, while also reducing reliance on foreign models that may not meet national security and quality requirements. Second, we want to find a way to acknowledge those who contribute data to SE.LLMA, including public sector organizations, Swedish authors, publishers, journalists, and news media companies.

Therefore, a key component of the project is to lay the foundations for legal frameworks and licensing models that ensure fair compensation for authors and other rights holders whose protected material is used in training.

Who is behind SE.LLMA?

The project was initiated and is carried out by researchers within WASP, in collaboration with representatives of journalists, news media, authors, and publishers, who contribute data and linguistic expertise and serve as a reference group for legal aspects.

The project is funded through a research grant from the Knut and Alice Wallenberg Foundation and is conducted within WASP. The foundation is Sweden’s largest private funder of long-term, free basic research that is beneficial to Sweden. WASP is one of the foundation’s strategic initiatives, focusing on research areas that are of particular importance for the country’s development. Established in 2015, WASP is run by researchers at Sweden’s six leading technical universities.

Project framework and values

  • The project is firmly committed to protecting the rights of copyright holders. Collected data is used strictly to train Swedish LLMs within this research project and will not be made available to other projects.
  • The work is grounded in academic research within WASP, involving several leading Swedish universities, and promotes openness within the scientific community.
  • The project maintains high ambitions regarding quality in all its aspects.
  • We aim to deliver high-quality Swedish language performance that reflects Sweden’s culture, history, societal principles, and norms.
  • Public documents, as well as high-quality editorial and literary material, are used to build better language models that directly benefit Swedish society.
  • The project will run for two years.
  • This initiative creates a unique opportunity to establish a framework for the development, management, and governance of Swedish LLMs.

Within SE.LLMA, several language models specialized for different domains will be trained. These models are not built from scratch but instead build on existing open European language models that allow transparency into training data and model weights, while complying with legal requirements. Training is conducted iteratively, starting with a smaller model and progressing to larger, more capable models.

Project goals

  • Develop and provide language models with strong Swedish language capabilities by using high-quality training data from the public sector, Swedish authors, publishers, journalists, and news media companies.
  • Develop workflows for evaluation processes and benchmarking specifically designed for Swedish models. A key research question is how to improve the models’ ability to capture the finer nuances of the Swedish language and adapt to Swedish cultural contexts.
  • Evaluate how much performance improves in both small and large models when high-quality Swedish data is included. This will be carried out continuously during model development, with results published on an ongoing basis.
  • Build expertise and a foundation for developing a broader range of models over time, including Swedish adaptations of leading models that represent the state of the art in reasoning and multimodality.
  • Establish the foundations for legal frameworks required for model licensing, as well as long-term governance and management.
  • Build national competence, create an ecosystem around the training, management, and governance of Swedish language models, and establish workflows for data collection, data preparation, fine-tuning, and user-centered evaluation, forming a long-term pathway toward broader use of high-quality Swedish language models.

Project organization

SE.LLMA is led by a project group consisting of researchers from WASP’s partner universities Linköping University, Umeå University, and Uppsala University, along with researchers from NAISS and representation from media companies and other data owners.

The organization also includes a steering group with representatives from copyright holders and universities. In addition, several reference groups will be established, focusing on ethics and societal aspects, data and technical choices, language quality, and the legal frameworks required for the long-term governance of Swedish LLMs.
