Shivam Mehta presenting his paper “Matcha-TTS: A Fast TTS Architecture with Conditional Flow Matching” in 2024.

Shivam Mehta, former WASP doctoral student and current research scientist at Netflix, has been named the recipient of the 2025 WASP Best Thesis Award. In a recorded message to the WASP community, Mehta shared his excitement, gratitude, and reflections on the research that led to this recognition, along with the people and environment that made it possible.

“I’m absolutely honored and honestly a little overwhelmed,” he said. “The moment I heard this news, I got super excited. It was amazing.” While Swedish visa complications prevented him from attending the ceremony in person, he sent a heartfelt video message to express what the award—and WASP—has meant to him.

Shivam Mehta accepts the 2025 Thesis of the Year award in a video shared at Winter Conference 2026.

Making AI more human

Mehta’s thesis focuses on probabilistic speech and motion synthesis, a research area at the heart of how humans and machines interact. His work tackles a fundamental challenge in artificial intelligence: human communication is not clean, deterministic, or uniform; it is expressive, varied, and often unpredictable.

“To build truly autonomous systems, whether digital avatars or social robots, we need them to communicate like humans and not like machines,” Mehta explained.

Traditional AI systems, he noted, often fail to capture this richness because they attempt to “average out” human behavior. Whether it is the way we say hello or gesture while speaking, people express themselves in countless subtle ways. When models smooth over this diversity, the result can feel unnatural or even unsettling.

His research aims to close this expressivity gap by embracing, rather than eliminating, the randomness and variability of real human interaction.

Embracing uncertainty with probabilistic models

At the core of Mehta’s thesis is the idea that future AI systems must embrace uncertainty. Instead of forcing all outputs toward a single “best guess,” his work uses probabilistic and generative models that represent a range of possible behaviors.

“My thesis argues that to build the next generation of AI, we need these probabilistic models that embrace variety,” he said. “Instead of over-smoothening, they try to model it.”
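The over-smoothing problem can be illustrated with a toy sketch. This is not Mehta’s method or code; it is a minimal example under stated assumptions: a hypothetical one-dimensional “peak pitch” feature, two invented speaking styles centered at 120 Hz and 220 Hz, and a made-up helper named `sample_peak_pitch`. A deterministic “best guess” that minimizes squared error lands on the average of the two styles, a value that no speaker in the data actually produces, while drawing samples from a probabilistic model keeps each output close to one real style.

```python
import random
import statistics

# Assumed, illustrative values only: two speaking styles with different
# typical peak pitch (Hz) and a small amount of natural variation.
STYLE_CENTERS = [120.0, 220.0]
STYLE_SPREAD = 5.0


def sample_peak_pitch(rng):
    """Draw one hypothetical peak-pitch value from a two-style mixture."""
    center = rng.choice(STYLE_CENTERS)       # pick a style at random
    return rng.gauss(center, STYLE_SPREAD)   # add within-style variation


rng = random.Random(0)
observations = [sample_peak_pitch(rng) for _ in range(2000)]

# Deterministic "best guess": the mean minimizes squared error, so a model
# trained to output a single value would converge toward it.
average_prediction = statistics.fmean(observations)

# How far is that single prediction from the closest real observation?
nearest_gap = min(abs(x - average_prediction) for x in observations)

# Probabilistic alternative: sample outputs instead of averaging them.
generated = [sample_peak_pitch(rng) for _ in range(5)]

print(f"averaged prediction: {average_prediction:.1f} Hz")
print(f"gap to nearest real observation: {nearest_gap:.1f} Hz")
print("sampled outputs:", [round(g, 1) for g in generated])
```

In this toy setting the averaged prediction falls in the empty region between the two styles, which is the kind of “in-between” output that can sound flat or unnatural, whereas each sampled output resembles one of the observed styles.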

During his PhD, Mehta combined classical techniques such as Hidden Markov Models with modern deep learning approaches, including normalizing flows, diffusion models, and transformer-based architectures. This hybrid strategy allowed him and his collaborators to build models that were not only expressive, but also robust and data-efficient.

Importantly, these were not just theoretical contributions. His work led to several state-of-the-art systems that are now used by researchers worldwide and, in some cases, form part of the technological backbone of large-scale industry applications.

“That’s crazy when you think about it,” Mehta reflected.

Multimodal communication with speech and gesture

One of the defining features of Mehta’s research is its multimodal focus. Human communication is not just about words—it involves tone, rhythm, facial expressions, and body language.

Once his team had developed strong generative models for speech, they expanded their work to include coordinated speech and motion. Using advanced multimodal architectures, they built systems capable of generating speech and gestures together, producing interactions that feel more natural and lifelike.

This work was recognized with best paper awards at major workshops, including CVPR workshops on human motion generation.

From the beginning of his PhD, Mehta wanted to create tools that other researchers could easily adopt.

“Not everyone has tons of GPUs and datasets lying around,” he said.

His team therefore focused on developing practical, accessible software that could be installed and used with minimal overhead. This philosophy of combining rigorous mathematics with strong software engineering became a defining feature of his work. It ensured that his contributions were not only scientifically strong, but also widely usable.

The WASP ecosystem as catalyst

A recurring theme in Mehta’s message was gratitude for the WASP Graduate School and its broader research environment.

“If a PhD is a difficult video game, I got lucky enough to have cheat codes,” he joked. “And the biggest cheat code was undoubtedly the WASP ecosystem.”

He described WASP not just as a program, but as a community—one that extends across universities, disciplines, and research cultures. While based at KTH, he benefited from both his local department and the national WASP network.

“The bubble was no longer just a university department,” he said. “It was the entire Sweden and the best minds of Sweden.”

Through conferences, research arenas, and interdisciplinary collaboration, Mehta gained new perspectives on how technical research can translate into real-world impact.

Mentorship and the WASP community

Mehta also highlighted the importance of mentorship and personal support. He offered special thanks to his supervisors, Gustav Eje Henter and Jonas Beskow, and to the research environment at KTH.

“I wrote in my thesis that Gustav was the optimizer to my randomly initialized neural network PhD life,” he said. “He tuned my learning rate and helped me navigate this loss surface of a PhD with patience and trust.”

This combination of intellectual freedom, guidance, and community support shaped not only his research, but also his confidence in pursuing ambitious ideas.

Now working as a research scientist at Netflix, Mehta continues to apply his expertise in generative modeling, multimodal AI, and human-centered machine learning. His journey from doctoral research to real-world deployment reflects WASP’s mission to connect foundational science with societal and industrial impact.

“Thank you WASP and thank you everyone who is responsible for creating such an environment,” he concluded, “where ambitious research isn’t just allowed, but encouraged.”


Published: January 19th, 2026
