Welcome to Doctoral Defense of Abdelrahman Eldesokey, ISY, Linköping University
Titel: Uncertainty-Aware Convolutional Neural Networks for Vision Tasks on Sparse Data
Försvarare: Abdelrahman Eldesokey
Handledare: Prof. Michael Felsberg, Linköpings Universitet
Opponent: Prof. Richard Bowden, University of Surrey, United Kingdom
Early computer vision algorithms operated on dense 2D images captured using conventional monocular or color sensors. Those sensors embrace a passive nature providing limited scene representations based on light reflux, and are only able to operate under adequate lighting conditions. These limitations hindered the development of many computer vision algorithms that require some knowledge of the scene structure under varying conditions. The emergence of active sensors such as Time-of-Flight (ToF) cameras contributed to mitigating these limitations; however, they gave a rise to many novel challenges, such as data sparsity that stems from multi-path interference, and occlusion.
Many approaches have been proposed to alleviate these challenges by enhancing the acquisition process of ToF cameras or by post-processing their output. Nonetheless, these approaches are sensor and model specific, requiring an individual tuning for each sensor. Alternatively, learning-based approaches, i.e., machine learning, are an attractive solution to these problems by learning a mapping from the original sensor output to a refined version of it. Convolutional Neural Networks (CNNs) are one example of powerful machine learning approaches and they have demonstrated a remarkable success on many computer vision tasks. Unfortunately, CNNs naturally operate on dense data and cannot efficiently handle sparse data from ToF sensors.
In this thesis, we propose a novel variation of CNNs denoted as the Normalized Convolutional Neural Networks that can directly handle sparse data very efficiently. First, we formulate a differentiable normalized convolution layer that takes in sparse data and a confidence map as input. The confidence map provides information about valid and missing pixels to the normalized convolution layer, where the missing values are interpolated from their valid vicinity. Afterwards, we propose a confidence propagation criterion that allows building cascades of normalized convolution layers similar to the standard CNNs. We evaluated our approach on the task of unguided scene depth completion and achieved state-of-the-art results using an exceptionally small network.
As a second contribution, we investigated the fusion of a normalized convolution network with standard CNNs employing RGB images. We study different fusion schemes, and we provide a thorough analysis for different components of the network. By employing our best fusion strategy, we achieve state-of-the-art results on guided depth completion using a remarkably small network.
Thirdly, to provide a statistical interpretation for confidences, we derive a probabilistic framework for the normalized convolutional neural networks. This framework estimates the input confidence in a self-supervised manner and propagates it to provide a statistically valid output confidence. When compared against existing approaches for uncertainty estimation in CNNs such as Bayesian Deep Learning, our probabilistic framework provides a higher quality measure of uncertainty at a significantly lower computational cost.
Finally, we attempt to employ our framework in a common task in CNNs, namely upsampling. We formulate the upsampling problem as a sparse problem, and we employ the normalized convolutional neural networks to solve it. In comparison to existing approaches, our proposed upsampler is structure-aware while being light-weight. We test our upsampler with various optical flow estimation networks, and we show that it consistently improves the results. When integrated with a recent optical flow network, it sets a new state-of-the-art on the most challenging optical flow dataset.