Several powerful language models have been released in recent months, powering some of the most popular applications, most notably ChatGPT and GPT-4. But how safe are these models to use? This question is the focus of episode 6 of the WARA Media & Language podcast.

Language models are trained on huge amounts of data, which is typically collected from publicly available online sources such as websites, source code repositories and social media. Their proficiency and acquired knowledge are heavily reliant on the quality of the training data. Consequently, it is crucial to exercise control over both data and models, and be mindful of the potential biases that may arise from the training process.

Responsible AI studies how AI can be developed and used in a responsible way, for example, how we can minimize the risk of harmful biases in the trained models. A problem with many of the currently leading language models is that little information has been shared about how they were trained, and how the training data was gathered.

“We do not know what data sets are used and how the models are trained,” Associate Professor Henrik Björklund and PhD student Hannah Devinney from Umeå University point out.

“We don’t want self-driving cars that run cyclists over, nor do we want algorithms that assess loan or job applications in an unfair way,” says Henrik Björklund.

Another aspect that users need to be aware of is the truthfulness of the content that the language models produce. The information may look completely correct, with references to various sources, but in reality, the content may be completely fabricated without any truth whatsoever.

“It is always important to be critical and fact-check what the models produce. Things can go very wrong if you blindly trust what a language model suggests, for example, if you were to use the answer to treat an ill or injured person,” adds Hannah Devinney.

GPT-SW3 is the largest language model based on the Nordic languages. It was developed by WASP, through WARA Media & Language, in collaboration with partners such as AI Sweden, Rise, and NVIDIA. The team has consistently documented and published information about its training data in an effort to achieve transparency and improve reliability. The model has been nominated for “Best Use of Tech” by Tech Awards Sweden.

“With initiatives like this, we can make the new technology relevant to more people, while reducing the risk that harm is done to already vulnerable groups,” says Johanna Björklund, Project Manager of WARA Media & Language.

The basic research conducted within WASP has enormous potential to contribute to a positive and sustainable societal development. To realize this potential, it’s critical to investigate and make efforts to mitigate risks for citizens and society.

Within WASP, there are several efforts aimed at making AI solutions more responsible. A concrete example is that, in collaboration with WARA Media & Language and NVIDIA, Henrik Björklund and Hannah Devinney are investigating how to further reduce social bias using the NVIDIA NeMo series of large language models. The series includes models with up to 530 billion parameters, whose primary language is English. The goal is to improve existing bias mitigation measures and to find ways of transferring them to other languages, so that they can be used together with, e.g., GPT-SW3. In addition, WARA will explore additional safeguards using the NeMo Guardrails software.

Published: June 7th, 2023

