Welcome to Doctoral Defense of Gustav Häger, Computer Vision Laboratories, ISY
Titel: Learning Visual Perception for Autonomous Systems
Försvarare: Gustav Häger, Linköping University
Handledare: Michael Felsberg, Linköping University
Opponent: Roman Pflugfelder, Technische Universität Wien och Austrian Institute of Technology
Plats: Ada Lovlace, Zoom: https://liu-se.zoom.us/j/65502138405
In the last decade, developments in hardware, sensors and software have made it possible to create increasingly autonomous systems. These systems can be as simple as limited driver assistance software lane-following in cars, or limited collision warning systems for otherwise manually piloted drones. On the other end of the spectrum exist fully autonomous cars, boats or helicopters. With increasing abilities to function autonomously, the demands to operate with minimal human supervision in unstructured environments increase accordingly.
Common to most, if not all, autonomous systems is that they require an accurate model of the surrounding world. While there is currently a large number of possible sensors useful to create such models available, cameras are one of the most versatile. From a sensing perspective cameras have several advantages over other sensors in that they require no external infrastructure, are relatively cheap and can be used to extract such information as the relative positions of other objects, their movements over time, create accurate maps and locate the autonomous system within these maps.
Using cameras to produce a model of the surroundings require solving a number of technical problems. Often these problems have a basis in recognizing that an object or region of interest is the same over time or in novel viewpoints. In visual tracking this type of recognition is required to follow an object of interest through a sequence of images. In geometric problems it is often a requirement to recognize corresponding image regions in order to perform 3D reconstruction or localization.
The first set of contributions in this thesis is related to the improvement of a class of on-line learned visual object trackers based on discriminative correlation filters. In visual tracking estimation of the objects size is important for reliable tracking, the first contribution in this part of the thesis investigates this problem. The performance of discriminative correlation filters is highly dependent on what feature representation is used by the filter. The second tracking contribution investigates the performance impact of different features derived from a deep neural network.
A second set of contributions relate to the evaluation of visual object trackers. The first of these are the visual object tracking challenge. This challenge is a yearly comparison of state-of-the art visual tracking algorithms. A second contribution is an investigation into the possible issues when using bounding-box representations for ground-truth data.
In real world settings tracking typically occur over longer time sequences than is common in benchmarking datasets. In such settings it is common that the model updates of many tracking algorithms cause the tracker to fail silently. For this reason it is important to have an estimate of the trackers performance even in cases when no ground-truth annotations exist. The first of the final three contributions investigates this problem in a robotics setting, by fusing information from a pre-trained object detector in a state-estimation framework. An additional contribution describes how to dynamically re-weight the data used for the appearance model of a tracker. A final contribution investigates how to obtain an estimate of how certain detections are in a setting where geometrical limitations can be imposed on the search region. The proposed solution learns to accurately predict stereo disparities along with accurate assessments of each predictions certainty.