Developing new techniques for improved generation of three-dimensional scene models of the visual world.

A longstanding goal of artificial intelligence is to recover a representation of the three-dimensional world which is semantically meaningful and geometrically faithful given sensory input data. Recent work has taken a big step towards this goal by showing that such representations can be learned, and it is now possible to automatically generate photorealistic imagery from an observed scene. This project will go further towards developing new theory and algorithms that would make it possible, not only to recognize and reconstruct a semantically and geometrically meaningful representation, but also to manipulate, modify and automatically complete a three-dimensional scene model of the visual world.

Long term impact

From the long­term perspective, the approach to modify, auto­complete and generate 3D scenes will be important for several Swedish companies, e.g., in the growing AR and VR sector. Volvo Cars and Zenseact are interested in efficiently generating realistic 3D training data to test systems for autonomous driving and driver­ assistance and H&M aims to create plausible compositions for deformable garments. The results will improve sustainable and environmentally friendly technological development. For instance, instead of overproducing products that might not be wanted by the customers, companies could generate realistic and dynamic 3D models to forecast demand.

This project will develop new theory and algorithms for general-­purpose neural network-based scene models, focusing on disentanglement, generalization and invariance which will enable 3D controllable models that incorporate dynamics and deformation. From a scientific viewpoint, this is an emerging field and so far, to a considerable extent an unexplored research area.

The goal to modify, auto­complete and generate dynamic 3D scene models is an emerging field, and a better understanding and practical usage of disentangled representations are highly sought after, putting this WASP NEST at the international forefront. If the project will be successful, this would lead to a paradigm shift as 3D models will be more ubiquitous, flexible and can be used for extrapolation, and not just interpolation.

The team of PIs consists of four renowned researchers with complementary skills:

Fredrik Kahl is a professor in computer vision and image analysis and one of the pioneers in developing modern optimization techniques for applications in computer vision.

Cristian Sminchisescu is a professor of applied mathematics at Lund University, with specialization in computer vision (semantic segmentation, human sensing) and machine learning (structured prediction).

Kathlén Kohn is a WASP­AI/Math assistant professor with main research expertise in algebraic geometry, working on the mathematical foundations in computer vision, machine learning, optimization, and statistical inference.

Mårten Björkman is an associate professor at the division of Robotics, Perception and Learning at KTH with a long experience in self­learning robotic systems using latent embeddings for both sensor and action spaces.

Scientific presentation


In recent years, generative neural network models for creation of photo­realistic images have become increasingly popular, as a result of its many practical applications. It is now possible to automatically generate photo­realistic imagery from an observed scene using an encoder­-decoder model. Generative models are trained using large sets of examples from which the commonalities and variation between examples are learned. The results is a low­ dimensional latent space representation of a distribution from which new examples can be drawn and images be generated. However, for these generative models to be of practical use, they need to provide some degree of controllability. One way of doing this is to ensure that the latent representation is disentangled, so that different qualities of the resulting images are kept separate. For example, the shape of an object can be separated from its material properties, the viewing direction and the overall illumination. With such a representation, new images can be created by either directly controlling the qualities or by mixing qualities from different sources, such as taking the material of a tablecloth and replacing that of a dress.


This project aims to develop new theory and algorithms that would make it possible, not only to recognize and reconstruct a semantically and geometrically meaningful representation, but also to manipulate, modify and automatically complete a three­-dimensional scene model of the visual world.


This project will exploit recent research on so ­called neural radiance fields to produce implicit three­-dimensional representations of objects and scenes, representations from which images can easily be rendered from arbitrary viewpoints. Such representations will simplify manipulation and make it more intuitive, while improving the robustness, interpretability, and generalization of models.

At the core of the project is the encoder-­decoder model (Figure 1). The underlying research challenge addressed is to improve the robustness, interpretability, controllability, and generalization performance for models of this type.

Figure 1: At the core of the project is the encoder­decoder model. In this illustration, a set of images and the output is a controllable 3D scene representation that can be used to render novel views under different conditions. Towards these ends, the research is structured into four work packages.

The research is structured into two parts and each part has two work packages (WPs, Figure 1).

  1. 3D perception and scene understanding.

WP1.1: Embeddings and disentanglement.

WP1.2: Generalization and invariance.


  1. Manipulation and rendering based on flexible and controllable generative models.

WP2.1: Dynamics and deformation.

WP2.2: Controllable generative models.


The first part, 3D perception and scene understanding, is devoted to an area which has seen important advances in recent years and where the PIs have made significant contributions. Still, there is new methodology which is required to make the second part realizable. The second part, Manipulation and rendering based on flexible and controllable generative models focuses on a new emerging area which is largely unexplored. A more in-depth description of the methodology can be found at the project website below.

NEST environment

The project will be performed in close collaboration between academy and industry, to take the developed research and integrate it into industrially relevant prototypes. Focus will be on two scenarios:

Outdoor environments:

  • Industrial collaborators: Volvo Cars and Zenseact.
  • Hotspot: Chalmers, Göteborg.
  • Demonstrator: System for efficient generation of varied validation data for improved testing methodology of sensors and training data for deep learning algorithms used in autonomous driving.

Indoor environments:

  • Industrial collaborators: Wallvision and H&M.
  • Hotspot: KTH, Stockholm.
  • Demonstrator: 3D indoor mostly still life scene reconstruction and navigation for environments where some deformation is present in the form of oscillating flowers, curtains blown by the wind, etc. There will also be a collaboration with Wallvision, for improved AR technology targeting automatic replacement of wallpaper in indoor scenes.


Fredrik Kahl

Professor, Department of Electrical Engineering, Chalmers University of Technology

Kathlén Kohn

Assistant Professor, Department of Mathematics, KTH Royal Institute of Technology

Mårten Björkman

Associate Professor, Division of Robotics, Perception and Learning, KTH Royal Institute of Technology

Cristian Sminchisescu

Professor, Mathematics, Faculty of Engineering, Lund University
We use cookies to personalise content and ads, to provide social media features and to analyse our traffic. We also share information about your use of our site with our social media, advertising and analytics partners. View more
Cookies settings
Privacy & Cookie policy
Privacy & Cookies policy
Cookie name Active
The WASP website uses cookies. Cookies are small text files that are stored on a visitor’s computer and can be used to follow the visitor’s actions on the website. There are two types of cookie:
  • permanent cookies, which remain on a visitor’s computer for a certain, pre-determined duration,
  • session cookies, which are stored temporarily in the computer memory during the period under which a visitor views the website. Session cookies disappear when the visitor closes the web browser.
Permanent cookies are used to store any personal settings that are used. If you do not want cookies to be used, you can switch them off in the security settings of the web browser. It is also possible to set the security of the web browser such that the computer asks you each time a website wants to store a cookie on your computer. The web browser can also delete previously stored cookies: the help function for the web browser contains more information about this. The Swedish Post and Telecom Authority is the supervisory authority in this field. It provides further information about cookies on its website,
Save settings
Cookies settings