Incorporating Projective Geometry into Deep Learning

Tyszkiewicz, Michal Jan

doi:10.5075/epfl-thesis-10538

2024

Abstract

In this thesis we explore the applications of projective geometry, a mathematical theory of the relation between 3D scenes and their 2D images, in modern learning-based computer vision systems. This research question runs counter to the recent trend of forgoing such domain knowledge in favor of learning everything directly from data. We show how to use this robust mathematics where applicable while maximally leveraging data for the remaining aspects. The thesis extends three peer-reviewed papers. In the first, we introduce an algorithm to extract local image features, a technique for matching related regions across images. Unlike in standard supervised learning, we do not define the features through examples but rather through their desired properties, and we leave it to the training procedure to find a conforming algorithm. This shows an application of projective geometry to the supervision of neural networks. We then turn to two cases of using projective geometry in the network architecture. In the first of these, we present a method to deduce indoor scene layouts from video walkthroughs. We constrain the Transformer, a computationally intensive task-agnostic learning system, with relevant geometry to significantly reduce its processing time and improve its memory efficiency. In the last paper, we address the challenge of reversing the 3D-to-2D projection in a generative setting. By offering multiple potential 3D reconstructions based on a 2D view, we acknowledge the inherent uncertainties of this inversion. Each chapter provides a thorough review of existing literature and outlines potential avenues for future research in the domain.

Details

Title Incorporating Projective Geometry into Deep Learning

Author(s) Tyszkiewicz, Michal Jan

Advisor(s) Fua, Pascal

Pagination 97

Date 2024

Publisher Lausanne, EPFL

Keywords computer vision; 3D vision; projective geometry; transformers; diffusion models; reinforcement learning; local features; object detection; point clouds

Language English

DOI https://doi.org/10.5075/epfl-thesis-10538

Laboratories CVLAB

Record Appears in Scientific production and competences > I&C - School of Computer and Communication Sciences > IINFCOM > CVLAB - Computer Vision Laboratory
Scientific production and competences > Euler Center for Signal Processing
Scientific production and competences > EPFL Theses
Work produced at EPFL
Published
Theses

Record creation date 2024-01-15
