Stefan Stojanovic in New York City during his time at Princeton.

At KTH, WASP PhD student Stefan Stojanovic explores how self-supervised reinforcement learning could shape the future of AI.

“At some point, I wanted my research to contribute more directly—to make a change beyond just being mathematically beautiful.”

When Stefan Stojanovic began his PhD at KTH Royal Institute of Technology, his path seemed set in the direction of theory. With a strong background in statistics, he was drawn to the elegance of proofs, guarantees, and the clean world of abstract mathematics. “For a time, it was enough for me that the results were interesting and mathematically novel,” he says.

But as his doctoral journey unfolded, he began seeking something more. He wanted work that could touch the real world. That shift led him to study self-supervised reinforcement learning (RL), a research direction he believes can shape how machines learn and interact with humans.

The promise of reinforcement learning

Traditional machine learning, he says, is like handing a neural agent a book full of labeled pictures of cats and dogs. The agent studies it, memorizes the differences and similarities between the two species, and then answers the questions ‘What is a cat?’ and ‘What is a dog?’

Reinforcement learning is different: “It’s like being given a bicycle and told, ‘Go learn how to ride.’ You’ll probably fall a few times, then try something new, then fall again—until eventually, you succeed.”

The challenge is that unlike with fixed datasets, an RL agent must decide what data to collect and how to use it. That makes the problem both general and difficult. Early on in his PhD research, Stefan focused on theoretical guarantees—mathematical proofs of how well an RL algorithm could be expected to perform. But theory alone began to feel limiting.

“I realized you cannot have both something that is mathematically rigorous and something that performs competitively in practice,” he says. “At some stage I wanted my research to make a change, even if it was a small one.”

Why self-supervision matters to reinforcement learning

In standard RL, the agent has a defined task and gets feedback when it performs well. Stefan illustrates it with a simple example: “Imagine one agent spends all day learning to change a car tire because that’s the assigned task, and another spends all day learning how to close a window. At the end of the day, each knows exactly one thing.”

Self-supervised RL is different. Here, the agent is told simply to explore. It picks up objects, opens windows, and tests doors without being told what matters. Later, when asked to do something specific, it can draw on this broad experience. The goal is zero-shot learning: faced with a new situation, the agent can recognize similarities and potential solutions without having explicitly practiced for that situation.

Part of this self-supervision is teaching agents to ignore distractions. He says humans take this ability for granted: “If you ride a bike past a tree, you know it’s the same tree when you come around again. You don’t think it’s a new tree.”

Agents, by contrast, often struggle to know which details matter. If they’re asked to open a drawer and retrieve an object, they may get stuck on unnecessary information, like the color of the drawer. Early experiments asked them to disregard such trivial information, but that required a human to tell them what to ignore. Stefan’s current work aims to have the agent figure that out on its own.

“You want the model to learn by itself what information is irrelevant, so it can focus on the essence of a task,” he says.

Stefan at Princeton.

Research across borders

Stefan’s views have been shaped not only by algorithms, but also by geography. His academic path has taken him from Serbia to Switzerland, on to Sweden, and across the Atlantic to the United States—each stop leaving its mark on how he thinks about science.

In Serbia, the emphasis was very much on teaching; research was something of a secondary pursuit. Moving to Switzerland opened a new world, his first real encounter with research as a central activity, and he found himself caught somewhere between the rigors of academia and the excitement of discovery. Sweden, by contrast, brought a distinctly different flavor: here, he noticed how often the conversation turned to innovation and how research might translate into real-world applications.

And then he went to Princeton for a WASP research stint. “I was in a place where everyone around me was constantly talking about science,” he recalls. “That atmosphere stays with you. When I came back, I wanted to energize KTH in the same way.” The months in the U.S. reminded him how much the culture around research matters—how inspiring it can be to be surrounded by people who live and breathe scientific exchange.

Back in Sweden, he has tried to carry that spirit forward. Earlier this year, he stepped into an unexpected leadership role, organizing WASP’s Sequential Decision-Making and Reinforcement Learning cluster, which now gathers almost 20 researchers from across the country.

Looking forward

For Stefan, the PhD is both an education and a journey of self-discovery. “Even if the results don’t show immediate real-world impact, I see this work as a way to become a better researcher, a better engineer,” he reflects. He envisions a future where reinforcement learning combines with other branches of AI—such as the reasoning capabilities of large language models—to tackle complex challenges in areas like healthcare. While he is cautious about the grand label of “artificial general intelligence,” he sees his work as part of a broader movement toward more adaptable and capable systems.

And the day-to-day motivation? That part is refreshingly simple: “I want to work on something that makes me excited to come in every day, something that feels inspiring. For me, self-supervised reinforcement learning offers that spark.”


Published: September 25th, 2025
