Excited about DALL·E and GPT-3? Join a strong research team across Sweden taking on the challenge of merging NLP, speech processing, and computer vision into a single model.
We are looking for a postdoc to join the newly started WASP NEST project STING – Synthesis and analysis with Transducers and Invertible Neural Generators, working at Umeå University.
Human communication is multimodal in nature, and occurs through combinations of speech, language, gesture, facial expression, and similar signals. STING aims to design models that capture this richness, uniting synthesis and analysis with the help of transducers and deep neural generative models.
This involves connecting concrete, continuous-valued sensory data such as images, sound, and motion with high-level, predominantly discrete representations of meaning. Doing so has the potential to endow synthesis output with human-understandable high-level explanations, while simultaneously improving the ability to attach probabilities to semantic representations. The bidirectionality also allows us to create efficient mechanisms for explainability, and to inspect and enforce fairness in the models.
The partner research groups bring complementary expertise to the project: KTH has extensive experience with probabilistic deep learning for analysis and synthesis of human verbal and nonverbal communication. The Umeå University group, on the other hand, has expertise in transducer and grammar models for generating semantic graphs, and has recently started to apply these to the task of parsing multimodal data. It also contributes experience with bias analysis and mitigation. Linköping University complements these aspects with in-depth knowledge of natural language processing, language being a discrete yet observable signal modality of great interest for bridging the two ends of the project.
In addition to its scientific value, the project is expected to have a substantial societal impact. The resulting technologies may be used, for example, to create virtual patients for medical training, to model non-player characters in video games, and to derive affective states and underlying health issues from human speech and nonverbal behaviour.
This position will be based at Umeå University, but you will be expected to collaborate closely with the PIs and other researchers in the project at the other sites as well.
Possible research directions
- Working on the latest deep generative models, such as diffusion models and normalising flows
- Combining discrete and continuous methods for synthesis and/or analysis
What we offer
- A stimulating and collaborative research environment
- Smart people with a variety of competences
- Ability to shape your own research and working conditions
What you bring
- A PhD (or being close to completing one) in machine learning, or in an application area such as computer vision, image or voice synthesis, or NLP
- The intellectual curiosity to learn about our other application areas
- Strong analytical and technical skills
- A strong research track record
- The will and drive to make a difference
What you should do
If you are interested, please contact us! You can reach us at:
- Johanna Björklund (johanna@cs.umu.se, Umeå University)
- Henrik Björklund (henrikb@cs.umu.se, Umeå University)
- Frank Drewes (drewes@cs.umu.se, Umeå University)
The other PIs involved in the project are:
- Gustav Eje Henter (KTH Royal Institute of Technology)
- Hedvig Kjellström (KTH Royal Institute of Technology)
- Marco Kuhlmann (Linköping University)