The WASP Research Arena for Media and Language (WARA M&L) builds a multidisciplinary ecosystem that bridges scientific fields and industrial sectors.
Our Objectives
The arena builds strong partnerships through collaborative projects in Media AI, leveraging ongoing investments in media infrastructure to evaluate emerging technologies and promote cross-industry knowledge sharing. Our research is centered on the generation and analysis of media data, as well as understanding its broader societal impacts. To accelerate advancements through various technology-readiness levels, we offer comprehensive support in data management, benchmarking, and engineering.
Photo: Peter Karlsson, Svarteld form & foto
Research Focus Areas
Through active dialogue with our community, we’ve identified key areas of research: Embodied Machine Learning, Graph-Based Models, Multimodal Foundation Models, and Interactive and Creative AI.
Embodied machine learning
Machine learning models in robotics and autonomous systems are often trained on small, task-specific datasets and struggle with skill transfer to new tasks. In contrast, foundation models, pretrained on large datasets, show better generalization and can solve problems not directly represented in their training data. These models, especially when multimodal, have the potential to significantly improve robot autonomy, from perception and human-robot interaction to planning. Vision-language models, for example, enhance visual recognition and generalizable action planning. Furthermore, robots that autonomously interact with their environment and update their models through real-time data collection are key to building more informed foundation models. A combination of reinforcement learning, representation learning, and language grounding could help solve many current challenges. The arena benefits from expertise in this area through its collaboration with Danica Kragic and KTH’s Robotics, Perception, and Learning lab, and sees potential for further interaction with WARA Robotics.
Graph-based models
Generative AI is a popular research area, with diffusion models and normalizing flows being applied to diverse domains such as language, images, source code, gestures, and music. However, current methods often generate media without a symbolic representation, for example raster images instead of vector-based ones. This limits user flexibility and makes it harder for systems to preserve semantics when editing images. For example, generative systems like Midjourney may misinterpret images when asked to create variations, leading to distorted results. To address this, a two-step generation process, first producing a graph-based representation and then the surface form, could improve accuracy. Key researchers in this area include Frank Drewes, Anastasia Varava, Henrik Björklund, and Ruibo Tu.
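The two-step idea can be illustrated with a minimal Python sketch. It is not the arena's method: the `Shape` and `SceneGraph` classes, the hard-coded `build_graph` function (standing in for a generative model), and the SVG renderer are all hypothetical names chosen for illustration. It only shows why editing a symbolic graph and re-rendering keeps the scene's semantics intact.

```python
from dataclasses import dataclass, field


@dataclass
class Shape:
    """A node in the hypothetical scene graph."""
    name: str
    kind: str   # "circle" or "rect"
    x: int
    y: int
    size: int
    fill: str


@dataclass
class SceneGraph:
    """Symbolic representation: named shapes plus labelled relations."""
    shapes: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (subject, label, object)

    def add(self, shape):
        self.shapes[shape.name] = shape

    def relate(self, subject, label, obj):
        self.relations.append((subject, label, obj))


def build_graph():
    """Step 1: produce the graph (hard-coded here; a model would generate it)."""
    g = SceneGraph()
    g.add(Shape("sun", "circle", 80, 20, 10, "gold"))
    g.add(Shape("house", "rect", 30, 60, 30, "brown"))
    g.relate("sun", "above", "house")
    return g


def render_svg(graph):
    """Step 2: derive the surface form (an SVG string) from the graph."""
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">']
    for s in graph.shapes.values():
        if s.kind == "circle":
            parts.append(
                f'<circle cx="{s.x}" cy="{s.y}" r="{s.size}" fill="{s.fill}"/>'
            )
        else:
            parts.append(
                f'<rect x="{s.x}" y="{s.y}" width="{s.size}" '
                f'height="{s.size}" fill="{s.fill}"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


graph = build_graph()
graph.shapes["sun"].fill = "orange"  # edit the symbol, not the pixels
svg = render_svg(graph)
```

Because the edit is applied to the graph node rather than to rendered pixels, the relation between the sun and the house is untouched and the output remains a vector image.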
Multimodal foundation models
The recent collaboration with RISE, NVIDIA, and AI Sweden on Large Language Models (LLMs) has been valuable for both practical and academic insights, particularly around the GPT-SW3 model series. The project has deepened our understanding of user needs and challenges: many users favor model-agnostic systems that let them integrate whichever LLMs offer the best cost-performance trade-off. Some organizations, however, may require private, cloud-based instances to protect intellectual property. Public bodies with sensitive data, such as the Swedish Tax Agency and the Swedish Armed Forces, need open models that can be hosted on premises. Looking ahead, the focus is on developing small to medium-sized foundation models, especially for multimodal data like time series or graphs, where significant scientific and practical gains are expected. Ongoing collaboration with AI Sweden will complement this by providing larger, more versatile models. Key researchers in this area include Love Börjesson (KB Labs) and Marco Kuhlmann (LiU), with an emphasis on attracting international talent.
Interactive and Creative AI
AI is fundamentally transforming how we interact with data, necessitating the parallel development of the fields of Human-Computer Interaction and User Experience. On one hand, challenges arise in ensuring Trustworthy and Explainable AI. For instance, there is the question of how to convey to the user the information and assumptions on which an AI bases its decisions, and how to help the user understand which tasks fall outside the scope of the AI’s capabilities.
Another important question is human-AI teaming, which requires effective methods for interpreting and controlling semi-automatic systems. The context can be autonomous mining or forestry, but can also encompass gaming and media production workflows. To this end, we want to collaborate with international contacts acquired through the Gaming Stream, and with the researchers linked to WASP-HS. Arena members with specific expertise in this domain include Gustav Eje Henter and Konrad Tollmar. Relevant partners include Electronic Arts, King, and Motorica, with which we organize events and projects, including the GENEA Challenge.
Our Community
At WARA Media & Language, our community is a vibrant, inclusive, and collaborative network that brings together experts from industry and academia, as well as PhD students. We believe in the power of diverse perspectives, and our community thrives on the exchange of ideas across these sectors. Throughout the year, we host a variety of seminars and conferences that foster deeper engagement and knowledge sharing. Two key recurring events are the WASP Summer School on Generative AI, where participants explore the latest advancements in AI, and the WARA Community Days, which serve as a hub for our community to connect, collaborate, and dive deeper into cutting-edge research and applications in Media AI.
To join our community, sign up here. You will also be added to our email list, so that you get the latest news and opportunities in the arena.
Opportunities for PhD Students
We host an annual WASP Summer School on Generative AI at the Visualization Center in Norrköping. This event features expert lectures on generative language models, invertible neural networks, and speech synthesis. Students also collaborate on designing prototype avatars, with final presentations held in the 3D dome theatre at the Visualization Center. This week-long event offers both in-depth knowledge and valuable networking opportunities with peers, professors, and industry experts.
We also offer various workshops and events in collaboration with our partners.
Video: Peter Karlsson, Svarteld form & foto
WARA Media & Language Podcast
Stay updated on the latest AI research in Media, Language, and Gaming by tuning into the WARA Media & Language podcast. Hear insights from industry leaders, tech companies, and startups within our community.
Collaborate with us
We welcome collaborations with organizations and individuals dedicated to advancing AI in Media and Language. Our network includes both established enterprises and international initiatives like Mila in Montreal and the British DPP.
Contact: johanna@cs.umu.se to explore collaborative opportunities.
Stay Connected: Follow us on LinkedIn to stay updated on upcoming events and opportunities.
The Core Team
Johanna Björklund, Project Manager, Umeå University & Codemill AB
Sandor Albrecht, Co-project Manager, KAW
Ivana von Proschwitz, Community Manager, WARA Media & Language
Anastasia Varava, Data Scientist, SEB
Konrad Tollmar, Research Director, EA Games / Associate Professor, KTH Royal Institute of Technology
Gustav Eje Henter, Assistant Professor, KTH Royal Institute of Technology
Jonas Unger, Professor, Linköping University
Alexandra Kafka Larsson, CEO, Parsd