Speech and motion: Shivam Mehta receives the 2025 WASP Best Thesis Award

Shivam Mehta, former WASP doctoral student and current research scientist at Netflix, has been named the recipient of the 2025 WASP Best Thesis Award. In a recorded message to the WASP community, Mehta shared his excitement, gratitude, and reflections on the research that led to this recognition, along with the people and environment that made it possible.

“I’m absolutely honored and honestly a little overwhelmed,” he said. “The moment I heard this news, I got super excited. It was amazing.” While Swedish visa complications prevented him from attending the ceremony in person, he sent a heartfelt video message to express what the award—and WASP—has meant to him.

Shivam Mehta accepts the 2025 Thesis of the Year award in a video shared at Winter Conference 2026

Making AI more human

Mehta’s thesis focuses on probabilistic speech and motion synthesis, a research area at the heart of how humans and machines interact. His work tackles a fundamental challenge in artificial intelligence: human communication is not clean, deterministic, or uniform; it is expressive, varied, and often unpredictable.

“To build truly autonomous systems, whether digital avatars or social robots, we need them to communicate like humans and not like machines,” Mehta explained.

Traditional AI systems, he noted, often fail to capture this richness because they attempt to “average out” human behavior. Whether it is the way we say hello or gesture while speaking, people express themselves in countless subtle ways. When models smooth over this diversity, the result can feel unnatural or even unsettling.

His research aims to close this expressivity gap by embracing, rather than eliminating, the randomness and variability of real human interaction.

Embracing uncertainty with probabilistic models

At the core of Mehta’s thesis is the idea that future AI systems must embrace uncertainty. Instead of forcing all outputs toward a single “best guess,” his work uses probabilistic and generative models that represent a range of possible behaviors.

“My thesis argues that to build the next generation of AI, we need these probabilistic models that embrace variety,” he said. “Instead of over-smoothening, they try to model it.”

During his PhD, Mehta combined classical techniques such as Hidden Markov Models with modern deep learning approaches, including normalizing flows, diffusion models, and transformer-based architectures. This hybrid strategy allowed him and his collaborators to build models that were not only expressive, but also robust and data-efficient.

Importantly, these were not just theoretical contributions. His work led to several state-of-the-art systems that are now used by researchers worldwide and, in some cases, form part of the technological backbone of large-scale industry applications.

“That’s crazy when you think about it,” Mehta reflected.

Multimodal communication with speech and gesture

One of the defining features of Mehta’s research is its multimodal focus. Human communication is not just about words—it involves tone, rhythm, facial expressions, and body language.

Once his team had developed strong generative models for speech, they expanded their work to include coordinated speech and motion. Using advanced multimodal architectures, they built systems capable of generating speech and gestures together, producing interactions that feel more natural and lifelike.

This work was recognized with best paper awards at major workshops, including CVPR workshops on human motion generation.

From the beginning of his PhD, Mehta wanted to create tools that other researchers could easily adopt.

“Not everyone has tons of GPUs and datasets lying around,” he said.

His team therefore focused on developing practical, accessible software that could be installed and used with minimal overhead. This philosophy of combining rigorous mathematics with strong software engineering became a defining feature of his work. It ensured that his contributions were not only scientifically strong, but also widely usable.

The WASP ecosystem as catalyst

A recurring theme in Mehta’s message was gratitude for the WASP Graduate School and its broader research environment.

“If a PhD is a difficult video game, I got lucky enough to have cheat codes,” he joked. “And the biggest cheat code was undoubtedly the WASP ecosystem.”

He described WASP not just as a program, but as a community—one that extends across universities, disciplines, and research cultures. While based at KTH, he benefited from both his local department and the national WASP network.

“The bubble was no longer just a university department,” he said. “It was the entire Sweden and the best minds of Sweden.”

Through conferences, research arenas, and interdisciplinary collaboration, Mehta gained new perspectives on how technical research can translate into real-world impact.

Mentorship and the WASP community

Mehta also highlighted the importance of mentorship and personal support. He offered special thanks to his supervisors, Gustav Eje Henter and Jonas Beskow, and to the research environment at KTH.

“I wrote in my thesis that Gustav was the optimizer to my randomly initialized neural network PhD life,” he said. “He tuned my learning rate and helped me navigate this loss surface of a PhD with patience and trust.”

This combination of intellectual freedom, guidance, and community support shaped not only his research but also his confidence in pursuing ambitious ideas.

Now working as a research scientist at Netflix, Mehta continues to apply his expertise in generative modeling, multimodal AI, and human-centered machine learning. His journey from doctoral research to real-world deployment reflects WASP’s mission to connect foundational science with societal and industrial impact.

“Thank you WASP and thank you everyone who is responsible for creating such an environment,” he concluded, “where ambitious research isn’t just allowed, but encouraged.”

Published: January 19th, 2026

