PhD student position at the Department of Speech, Music and Hearing at KTH Royal Institute of Technology.
Project description
We are looking for a PhD student interested in Artificial Intelligence, Natural Language Processing and Speech Technology, who will work in a newly funded project at the Department of Speech, Music and Hearing within the School of Electrical Engineering and Computer Science at KTH. The project is financed by the Swedish AI program WASP (Wallenberg AI, Autonomous Systems and Software Program), which offers a graduate school with research visits, partner universities, and visiting lecturers.
The newly started project is titled “Thinking Fast and Slow: Real-time Speech Generation for Conversational AI”. The aim of the project is to develop AI models capable of generating spoken responses incrementally, mirroring the nuanced and dynamic nature of human conversation. Our approach builds on our previous pioneering work on incremental and predictive models for dialogue. We aim to construct a dual-component system, based on large language models, consisting of a ‘System I’ module for the rapid generation of response prefixes and a ‘System II’ module for crafting more considered and detailed responses. This will be complemented by an incremental speech synthesizer designed to modulate speech rate and prosody in real time, in response to the unfolding dialogue context. The models will be evaluated in both offline and online settings, using simulated interactions to refine them under controlled conditions and real-world scenarios to validate their effectiveness in practical applications, for example in human-robot interaction.
The position is mainly a research position, with a small fraction of departmental duties (e.g. teaching).
Supervision: Professor Gabriel Skantze and Assoc. Prof. Gustav Eje Henter