Stefan Stojanovic’s journey from theory to practice

At KTH, WASP PhD student Stefan Stojanovic explores how self-supervised reinforcement learning could shape the future of AI.

“At some point, I wanted my research to contribute more directly—to make a change beyond just being mathematically beautiful.”

When Stefan Stojanovic began his PhD at KTH Royal Institute of Technology, his path seemed set in the direction of theory. With a strong background in statistics, he was drawn to the elegance of proofs, guarantees, and the clean world of abstract mathematics. “For a time, it was enough for me that the results were interesting and mathematically novel,” he says.

But as his doctoral journey unfolded, he began seeking something more. He wanted work that could touch the real world. That shift led him to study self-supervised reinforcement learning (RL), a research direction he believes can shape how machines learn and interact with humans.

The promise of reinforcement learning

Traditional machine learning, he says, is like handing an agent a book full of labeled pictures of cats and dogs. The agent studies it, learns the differences and similarities between the two species, and then answers the questions ‘What is a cat? What is a dog?’

Reinforcement learning is different: “It’s like being given a bicycle and told, ‘Go learn how to ride.’ You’ll probably fall a few times, then try something new, then fall again—until eventually, you succeed.”

The challenge is that unlike with fixed datasets, an RL agent must decide what data to collect and how to use it. That makes the problem both general and difficult. Early on in his PhD research, Stefan focused on theoretical guarantees—mathematical proofs of how well an RL algorithm could be expected to perform. But theory alone began to feel limiting.

“I realized you cannot have both something that is mathematically rigorous and something that performs competitively in practice,” he says. “At some stage I wanted my research to make a change, even if it was a small one.”

Why self-supervision matters to reinforcement learning

In standard RL, the agent has a defined task and gets feedback when it performs well. Stefan illustrates it with a simple example: “Imagine one agent spends all day learning to change a car tire because that’s the assigned task, and another spends all day learning how to close a window. At the end of the day, each knows exactly one thing.”

Self-supervised RL is different. Here, the agent is told simply to explore. It picks up objects, opens windows, tests doors, all without being told what matters. Later, when asked to do something specific, it can draw on this broad experience. The goal is zero-shot learning: faced with a new situation, the agent can recognize similarities to what it has already seen and find a solution without having practiced explicitly for that situation.
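The two phases can be made concrete with a toy sketch. It illustrates the general idea only and is not Stefan’s method: the 4x4 grid world, the function names, and the use of breadth-first search are all assumptions made for this example. The agent first wanders with no reward and records what each action does, then is handed goals it never practiced and reaches them from what it learned.

```python
import random
from collections import deque

SIZE = 4  # hypothetical 4x4 grid world, assumed for illustration
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(state, action):
    """Toy environment dynamics: move one cell, staying put at the walls."""
    dx, dy = ACTIONS[action]
    x, y = state
    return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))


def explore(num_steps, seed=0):
    """Phase 1: act at random with no task and no reward, recording outcomes."""
    rng = random.Random(seed)
    model = {}  # learned map: (state, action) -> next state
    state = (0, 0)
    for _ in range(num_steps):
        action = rng.choice(list(ACTIONS))
        next_state = step(state, action)
        model[(state, action)] = next_state
        state = next_state
    return model


def plan(model, start, goal):
    """Phase 2: reach a goal never seen in phase 1 by searching the learned model."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action in ACTIONS:
            next_state = model.get((state, action))
            if next_state is not None and next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [action]))
    return None  # the goal is not reachable with what was learned


model = explore(num_steps=5000)
for goal in [(3, 3), (0, 3), (2, 1)]:
    print(goal, plan(model, (0, 0), goal))
```

Nothing in the exploration phase mentions any of the three goals; the same learned model serves all of them, which is the sense in which the later tasks are solved zero-shot.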

Part of this self-supervision is teaching agents to ignore distractions. He says humans take this ability for granted: “If you ride a bike past a tree, you know it’s the same tree when you come around again. You don’t think it’s a new tree.”

Agents, by contrast, often struggle to know which details matter. If they’re asked to open a drawer and retrieve an object, they may get stuck on unnecessary information, like the color of the drawer. Early experiments had agents disregard such trivial information, but that required a human to tell them what to ignore. Stefan’s current work aims to have the agent figure that out on its own.

“You want the model to learn by itself what information is irrelevant, so it can focus on the essence of a task,” he says.
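A deliberately simple sketch shows what working this out without a human could look like. It is an assumption-laden toy, not the approach used in Stefan’s research: the two features, the dynamics, and the function names are invented for this example. The agent gathers experience, then tests each feature by dropping it and checking whether the outcome of every action is still fully determined by what remains.

```python
import random
from collections import defaultdict

FEATURES = ("handle_position", "drawer_color")  # hypothetical observation features


def collect_experience(num_steps, seed=0):
    """Gather (observation, action, outcome) triples with no labels or rewards."""
    rng = random.Random(seed)
    transitions = []
    for _ in range(num_steps):
        obs = {
            "handle_position": rng.randrange(5),
            "drawer_color": rng.choice(["red", "blue", "green"]),
        }
        action = rng.choice([-1, 1])
        # Assumed dynamics: only the position and the action decide the outcome.
        next_position = min(max(obs["handle_position"] + action, 0), 4)
        transitions.append((obs, action, next_position))
    return transitions


def is_distractor(transitions, feature):
    """Treat a feature as a distractor if dropping it never makes outcomes ambiguous."""
    kept = [name for name in FEATURES if name != feature]
    outcomes = defaultdict(set)
    for obs, action, next_position in transitions:
        key = (tuple(obs[name] for name in kept), action)
        outcomes[key].add(next_position)
    # Every (remaining features, action) pair must lead to exactly one outcome.
    return all(len(seen) == 1 for seen in outcomes.values())


transitions = collect_experience(num_steps=500)
for feature in FEATURES:
    print(feature, "can be ignored:", is_distractor(transitions, feature))
```

The check relies on the toy world being deterministic; with noisy outcomes or image observations, an exact test like this would no longer be enough.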

Photo: Stefan Stojanovic at Princeton.

Research across borders

Stefan’s views have been shaped not only by algorithms, but also by geography. His academic path has taken him from Serbia to Switzerland, on to Sweden, and across the Atlantic to the United States—each stop leaving its mark on how he thinks about science.

In Serbia, the emphasis was very much on teaching; research was something of a secondary pursuit. Moving to Switzerland opened a new world, his first real encounter with research as a central activity, and he found himself caught somewhere between the rigors of academia and the excitement of discovery. Sweden, by contrast, brought a distinctly different flavor: here, he noticed how often the conversation turned to innovation and how research might translate into real-world applications.

And then he went to Princeton for a WASP research stint. “I was in a place where everyone around me was constantly talking about science,” he recalls. “That atmosphere stays with you. When I came back, I wanted to energize KTH in the same way.” The months in the U.S. reminded him how much the culture around research matters—how inspiring it can be to be surrounded by people who live and breathe scientific exchange.

Back in Sweden, he has tried to carry that spirit forward. Earlier this year, he stepped into an unexpected leadership role, organizing WASP’s Sequential Decision-Making and Reinforcement Learning cluster, which now gathers almost 20 researchers from across the country.

Looking forward

For Stefan, the PhD is both an education and a journey of self-discovery. “Even if the results don’t show immediate real-world impact, I see this work as a way to become a better researcher, a better engineer,” he reflects. He envisions a future where reinforcement learning combines with other branches of AI—such as the reasoning capabilities of large language models—to tackle complex challenges in areas like healthcare. While he is cautious about the grand label of “artificial general intelligence,” he sees his work as part of a broader movement toward more adaptable and capable systems.

And the day-to-day motivation? That part is refreshingly simple: “I want to work on something that makes me excited to come in every day, something that feels inspiring. For me, self-supervised reinforcement learning offers that spark.”

Published: September 25th, 2025
