The mission of WARA-Media and Language is to build a multidisciplinary ecosystem around Media AI, connecting scientific fields and a diversity of industrial segments.
The uptake of media technologies has reached the point where they are prevalent, enabling innovation and productivity everywhere from the board room to the factory floor. When media technology is combined with AI, the opportunities are seemingly endless. We can, for example, automate complex workflows, extract and fuse data from heterogeneous sources, improve human-machine interaction, and advance the state of the art in remote sensing and control. To explore the new landscape to the fullest, and to find sustainable solutions that promote trust, transparency, and fairness, academic and industry research must go hand in hand.
The primary objective of WARA-Media and Language is to foster personal relationships through shared tasks, one-to-one partner matching, and focused research seminars. Additionally, the arena leverages ongoing investments in media infrastructure to verify emerging technologies and transfer solutions between verticals. It addresses research topics related to the generation and analysis of media data, and extrinsic effects of the same. To overcome thresholds for collaboration, the arena provides data management and benchmarking, together with engineering support and legal consultation on IP matters.
Objectives for participation
- Access to testbeds in an industrial setting
- More relevant research questions through cross-boundary collaboration
- Aid visibility, practical relevance, and impact of WASP research
- Support the WASP Project Course
- A broader network that facilitates a future research career in industry or academia
- Opportunities for internships, study visits, and industry co-advisors
- Build new and strengthen existing networks between industry and academia
- Faster knowledge transfer between academia and industry
- Access to competence and talent in emerging technological fields
Research Focus Areas
Generation and Analysis
Generation and understanding of data are fundamental parts of Media AI. This may involve establishing structured models of concepts that can be used to detect, process, and create instances of those concepts in a media setting. The models can, e.g., consist of statistical distributions over a language of graphs or vector embeddings. A central challenge is that of language grounding, where the objective is to link the entities of a natural-language sentence with objects in a different modality. The area also covers research in semantic parsing and media synthesis.
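As a rough illustration of language grounding with vector embeddings, the sketch below links each word of a caption to the image region whose embedding is most similar. It is a toy example under stated assumptions: the three-dimensional vectors, the region names, and the functions `cosine` and `ground` are hypothetical and hand-picked for illustration. In practice the embeddings would come from trained text and image encoders that share a vector space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ground(word_vecs, region_vecs):
    """Link each word to the region with the most similar embedding."""
    return {
        word: max(region_vecs, key=lambda name: cosine(vec, region_vecs[name]))
        for word, vec in word_vecs.items()
    }


# Hypothetical, hand-picked embeddings in a shared 3-d space (assumption).
WORDS = {"dog": [0.9, 0.1, 0.0], "ball": [0.1, 0.9, 0.1]}
REGIONS = {"region_a": [0.8, 0.2, 0.1], "region_b": [0.0, 1.0, 0.2]}

links = ground(WORDS, REGIONS)
print(links)
```

With these toy vectors, "dog" is linked to `region_a` and "ball" to `region_b`. Nearest-neighbour matching is only one simple way to ground words; it ignores sentence structure and relations between regions.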
Sensing and Control
Combining media technology and AI, we are able to extend the human senses and reach of control. The context for research in this area is often that of telepresence or edge computing. Typically, applications in this field need sophisticated technological solutions that draw on computer vision, data visualisation, haptic feedback systems, and smart interfaces. Central techniques and concepts are hierarchical classification, adaptive sensing, embodiment and adjustable autonomy.
Verification and Validation
An important component when applying AI to real-world challenges is ensuring fairness, accountability, and transparency. Verification and validation of AI systems are key to understanding the systems we build, ensuring that they are safe, purposeful, and transparent. This also concerns mapping out the interplay between the new technology and society.
These issues can be addressed through bias management: by understanding the effects of training data on tools built by machine learning, and by adding explainability as a quality notion to decision-making procedures based on artificial intelligence. Bias management is part of the larger topic of trustworthy AI.
The arena will develop with the needs of its users. To give an idea of what kind of challenges the arena could support, here is a list of typical topics related to Media AI.
- Link the entities of a natural-language sentence with objects in a different modality. The sentence can be an image caption, and the task would be to tie the words in the caption to segments in the associated image.
- Map words in a source-code comment to function and variable names, to find all comments that have to be updated when a portion of the program is rewritten.
- Translate a natural-language query to an SQL query.
- Create high-fidelity instances of a target concept, based on a limited set of samples. A concrete application is to mimic the intonation pattern of a voice actor when applying speech synthesis to a new manuscript.
- Through hierarchical classification, decompose complex classification tasks into cascades of simpler tasks, or of tasks that grow gradually more complex. The value lies, e.g., in streamlining video streaming with multiple 8K cameras over an edge network.
- Equip digital sensors with intelligent control that continuously analyses the signal and adjusts control parameters to maximise sensitivity and precision, mimicking biological functions.
- Develop systems for human operator control of machines that are distributed over various remote sites and perform complex tasks. The operator should be able to – metaphorically – step into the shoes of the machine and control it in a natural and efficient way.
- Understand the effects of training data on tools built by machine learning, and add explainability as a quality notion to decision-making procedures based on artificial intelligence.
- Study how media content is understood, transmitted, and contextualised by its users, and how the new media formats, ways of distributing media content, situations for consumption, and algorithms and methods for individualising the media diet affect societal systems. How will those effects influence the media itself, the demand for media, and thus the requirements and priorities we pose for technology development?
- Virtual verification environments are both a research topic and a tool for core ML. It can be expensive, time-consuming, and even dangerous to evaluate autonomous systems in a real-world setting. Verification in simulated environments admits production of statistically unlikely but critical situations, and also enables generative adversarial approaches to discover weaknesses in the evaluated system, with low risks.
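To make the hierarchical-classification topic in the list above more concrete, here is a minimal sketch of a two-stage cascade: a cheap coarse classifier runs on every video frame, and a costlier fine classifier runs only on the frames the first stage flags. Everything in it is an assumption made for illustration: the frame features `motion` and `faces`, the threshold, the labels, and the rule-based stages, which stand in for trained models.

```python
from typing import Callable, Dict, Tuple

Frame = Dict[str, float]


def coarse(frame: Frame) -> str:
    """Cheap first stage: flag frames with enough motion (hypothetical rule)."""
    return "active" if frame["motion"] > 0.2 else "static"


def fine_active(frame: Frame) -> str:
    """Costlier second stage, run only on frames labelled 'active'."""
    return "person" if frame["faces"] >= 1 else "vehicle"


# Maps a coarse label to the finer classifier that refines it, if any.
FINE_STAGES: Dict[str, Callable[[Frame], str]] = {"active": fine_active}


def classify(frame: Frame) -> Tuple[str, ...]:
    """Return the path of labels through the cascade for one frame."""
    label = coarse(frame)
    stage = FINE_STAGES.get(label)
    return (label,) if stage is None else (label, stage(frame))


print(classify({"motion": 0.05, "faces": 0}))
print(classify({"motion": 0.60, "faces": 2}))
```

In an edge setting, the first stage could run close to the camera so that only flagged frames are sent on for further analysis. Whether that saves bandwidth depends on how selective the coarse stage is.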
Resources and Services
The arena provides technical, structural, and legal resources.
- Software engineering team
- Legal support and contracting
- Research packaging
- Foster collaboration and community building
- Crowd-sourced and access-restricted
- Multimodal focus: video, images, speech etc.
- Curated datasets for benchmarking
Data Generation and Capturing
- Controlled environments
- Secure data sharing
- Synthetic data creation
The Core Team
Johanna Björklund, Project Manager, Umeå University & Codemill AB
Kambiz Ghoorchian, Data Scientist, SEB
Johan Mattsson, Head of Strategy & Architecture, SEB
Sandor Albrecht, Co-project Manager, WALP
Jussi Karlgren, Co-project Manager, Spotify
Konrad Tollmar, Research Director, EA Games / Associate Professor, KTH Royal Institute of Technology
Gustav Eje Henter, Assistant Professor, KTH Royal Institute of Technology