Abstract

In this thesis, we explore applications of projective geometry, the mathematical theory of the relation between 3D scenes and their 2D images, in modern learning-based computer vision systems. This line of research runs counter to the recent trend of forgoing such domain knowledge in favor of learning everything directly from data. We show how to use this robust mathematics where it applies while making maximal use of data for the remaining aspects. The thesis extends three peer-reviewed papers. In the first, we introduce an algorithm for extracting local image features, which are used to match corresponding regions across images. Unlike in standard supervised learning, we define the features not through examples but through their desired properties, and leave it to the training procedure to find an algorithm that satisfies them. This demonstrates how projective geometry can be used to supervise neural networks. We then turn to two uses of projective geometry in the network architecture itself. In the second paper, we present a method to infer indoor scene layouts from video walkthroughs. We constrain the Transformer, a computationally intensive, task-agnostic architecture, with the relevant geometry, which significantly reduces its processing time and memory usage. In the last paper, we address the challenge of inverting the 3D-to-2D projection in a generative setting. By producing multiple plausible 3D reconstructions from a 2D view, we account for the inherent ambiguity of this inversion. Each chapter provides a thorough review of the existing literature and outlines potential avenues for future research.
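To make the ambiguity of inverting the 3D-to-2D projection concrete, here is a minimal sketch of an ideal pinhole camera, the basic model of projective geometry. It is an illustration only, not code from the thesis: the function names `project` and `backproject` and the intrinsics `f`, `cx`, `cy` are hypothetical, and the camera is assumed to sit at the origin looking down +Z with no lens distortion. Dividing by the depth `z` sends every point on a ray through the camera centre to the same pixel, so a single view cannot determine where along that ray the point lies.

```python
from typing import Sequence, Tuple

# Hypothetical intrinsics: focal length and principal point, in pixels.
F, CX, CY = 500.0, 320.0, 240.0


def project(point: Sequence[float], f: float = F, cx: float = CX, cy: float = CY) -> Tuple[float, float]:
    """Project a 3D point (camera coordinates) to pixel coordinates
    with an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # The perspective division by z is the step that discards depth.
    u = f * x / z + cx
    v = f * y / z + cy
    return u, v


def backproject(u: float, v: float, depth: float,
                f: float = F, cx: float = CX, cy: float = CY) -> Tuple[float, float, float]:
    """Invert the projection, which is only possible once a depth is supplied;
    without it, every point on the ray through (u, v) is equally valid."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return x, y, depth


if __name__ == "__main__":
    near = (0.25, -0.5, 2.0)
    far = (0.75, -1.5, 6.0)           # the same point scaled by 3 along its ray
    print(project(near))              # (382.5, 115.0)
    print(project(far))               # (382.5, 115.0): identical pixel
    print(backproject(382.5, 115.0, 2.0))  # (0.25, -0.5, 2.0)
    print(backproject(382.5, 115.0, 6.0))  # (0.75, -1.5, 6.0)
```

Both 3D points land on the same pixel, and the pixel can be lifted back to either of them depending on the depth one assumes. Recovering 3D structure therefore needs extra information, such as a second view or a learned prior over plausible scenes, which is why a generative model may reasonably return several reconstructions for one image.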
