1 Introduction

Consensus maximization is a powerful tool in computer vision that has enabled practical applications of highly complex algorithms such as Structure-from-Motion (SfM) [1,2,3] to work despite incorrect measurements and noise. Apart from heuristic strategies such as Random Sample Consensus (RANSAC) [4], globally optimal consensus maximizers [5,6,7,8,9,10,11] have been widely studied for rigid shapes, where there exists a simple analytical transformation between two sets of measurements. In contrast, such tools have not been explored in earnest for the model-free scenario, where simple analytical transformation models cannot explain the measurements. An important field where model-free approaches are needed is non-rigid shape registration. Consensus maximization in non-rigid shapes has applications in augmented reality, object animation and shape analysis, among others.

While a large number of works have tackled the non-rigid registration problem between images or shapes [12,13,14,15,16], little attention has been given to identifying outliers in matching correspondences. A few methods solve the problem in the images of non-rigid shapes [17, 18] and between a template shape and an image [19] through locally optimal approaches. The difficulty of assigning a suitable minimal parameter model to non-rigid transformations makes it highly challenging to devise a consensus maximizer.

In this paper, we propose a common framework for seeking consensus in a model-free correspondence set. Our key idea is that despite lacking a model which can explain each instance in a matching set individually, one can consider the agreement between two or more instances using certain rules to formulate constraints. In non-rigid shapes, a rule widely applied for reconstruction and registration is the isometric deformation prior. Isometry implies that the geodesic distances are preserved despite deformations. Building on these observations, we make three contributions. First, we show how a model-free consensus maximization problem can be posed as a graph problem and solved as an Integer Program if we have inlier/outlier rules on the matching sets. Such an Integer Program can be solved optimally using a Branch and Bound (BnB) approach. Second, we apply this formulation to remove outliers in non-rigid shape correspondences under the isometry prior. We show that our method can handle as many as 80% outlier correspondences on isometric surfaces. We provide extensive experiments on several isometric and partial shapes, as well as ‘loosely’ isometric partial inter-subject human shapes, where we obtain results that improve over the state-of-the-art methods. Third, to show the generic nature of the introduced consensus maximizer, we also formulate a 3D template-to-image outlier removal problem using the piecewise rigidity and smoothness prior. We conduct extensive experiments in order to analyze the behavior of the proposed algorithms and to compare with the state-of-the-art methods.

2 Related Work

We briefly highlight the related works that are relevant to non-rigid registration problems. The first problem is that of maximizing consensus between matched 3D surface points in non-rigid 3D shapes using the isometry prior. Isometry is a widely used prior in registration [14, 15, 20,21,22] as well as 3D reconstruction [23, 24]. Most non-rigid shape registration methods [15, 20,21,22] start with a 3D descriptor such as the SHOT descriptor [25] or heat kernels [26] and establish correspondences between shapes through energy minimization. Others compute the registration by blending conformal maps [14, 27]. Any such matching method results in good and bad matches. In the following sections, we study how the outlier matches from various methods can be removed in practical cases, including complete, partial and inter-subject scenarios.

3D template-to-image matching is yet another important problem in non-rigid shapes that can be used to localize cameras [28] or for template-based 3D reconstruction [19, 24, 29, 30]. Eliminating outlier matches in such cases is addressed in [19] by using a local iterative approach. Most other methods which solve image registration [12, 18] do not use a 3D geometric prior explicitly. We address the problem of consensus maximization in this setting with piece-wise rigidity and smoothness prior. A recent method [16] solves the combinatorial matching problem with similar constraints but does not focus on the problem of identifying outlier matches.

3 Background and Theory

Notations. We represent sets and graphs as special Latin characters, e.g., \(\mathcal {V}\). We use lowercase Latin letters i, j, k or l to represent indices or sets of indices. For example, \(\mathcal {V}_i\) is an element of the set \(\mathcal {V}\). We write known or unknown scalars also in lowercase letters, such as z. We use uppercase bold Latin letters to represent matrices (e.g., \(\mathsf {M}\)) and lowercase bold Latin letters to represent vectors (e.g., \(\mathsf {v}\)). We use the lowercase Greek letter \(\epsilon \) to represent thresholds. We use uppercase Greek letters to represent mappings or functions (e.g., \(\varPhi \)). We use \(||. ||\) to denote the \(\ell _2\)-norm and \(\mid . \mid \) to denote the \(\ell _1\)-norm of a vector or the cardinality of a set. Unless stated otherwise, we write primed letters to represent quantities related to the transformed set.

3.1 Outliers

Let \(\varPhi : \varOmega \rightarrow \varOmega '\) be a transformation function between two spatial domains. \(\varPhi \) is related to the matching sets \(\mathcal {P}= \{\mathcal {P}_i: \mathcal {P}_i \in \varOmega ,\ i=1,\dots ,p\}\) and \(\mathcal {P}'= \{\mathcal {P}'_i: \mathcal {P}'_i \in \varOmega ',\ i=1,\dots ,p\}\). In practice, \(\varPhi \) may be a rigid or non-rigid transformation function or such transformations followed by camera projection. Each member \(\mathcal {P}_i\) corresponds to the member \(\mathcal {P}'_{i}\) in the second set. This defines a set of matches \(\mathcal {C} \subset \mathcal {P} \times \mathcal {P}'\) that may contain outliers. The outlier set \(\mathcal {O}\) is defined with a distance function \(\varDelta \) as:

$$\begin{aligned} \forall i\in \{1\dots p\},\quad \varDelta \left( \varPhi (\mathcal {P}_i),\mathcal {P}'_{i}\right) \ge \epsilon \implies i\in \mathcal {O}. \end{aligned}$$
(1)

A correspondence pair \((\mathcal {P}_i, \mathcal {P}'_{i})\), also simply denoted as i, is an outlier if the distance between the mapping of \(\mathcal {P}_i\) and its correspondence \(\mathcal {P}'_{i}\) is at least the threshold \(\epsilon \).
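Definition (1) can be made concrete with a minimal sketch for the case where \(\varPhi \) is known. The names `outlier_set`, `phi` and `dist` are illustrative and do not come from the paper; the example assumes a 2D translation for \(\varPhi \) and the Euclidean distance for \(\varDelta \), and it uses 0-based indices.

```python
import math

def outlier_set(P, P_prime, phi, dist, eps):
    """Return the indices i with dist(phi(P[i]), P_prime[i]) >= eps, as in (1)."""
    return {i for i, (p, q) in enumerate(zip(P, P_prime))
            if dist(phi(p), q) >= eps}

# Hypothetical setting: phi is a known 2D translation by (1, 0).
phi = lambda p: (p[0] + 1.0, p[1])
dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])

P = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
P_prime = [(1.0, 0.0), (2.0, 1.05), (7.0, 3.0)]  # the last match is wrong

outliers = outlier_set(P, P_prime, phi, dist, eps=0.1)
print(outliers)
```

In the model-free setting discussed later, `phi` is not available, which is exactly why this point-wise test cannot be applied there.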

3.2 Consensus Maximization

Using the definition of outliers in (1), the problem of consensus maximization is defined as the minimization of the cardinality of the set \(\mathcal {O}\) for the unknown \(\varPhi \):

$$\begin{aligned} \begin{aligned}&\underset{\varPhi }{\text {minimize}}\quad |\mathcal {O}| \\&\text {subject to}\quad \varDelta \left( \varPhi (\mathcal {P}_i),\mathcal {P}'_{i}\right) \ge \epsilon \implies i\in \mathcal {O}. \end{aligned} \end{aligned}$$
(2)

Problem (2) implies that we wish to find the mapping \(\varPhi \) which results in the least number of disagreements given by the cardinality of \(\mathcal {O}\), in the given matching set \(\mathcal {C}\). In rigid SfM related problems, \(\varPhi \) can often be expressed using a linear or non-linear function with a small fixed number of parameters. This means that Eq. (1) can be evaluated point-wise and that \(\varPhi \) can be estimated using a very small set of point correspondences, known as the minimal set. Such problems can be solved efficiently using RANSAC or the globally optimal methods highlighted in Sect. 1. It should be noted that even if \(\varPhi \) can be parameterized, very recently problem (2) was shown to be NP-hard with W[1]-complexity [31, 32], meaning that solving it optimally is very expensive. We call such a problem, when \(\varPhi \) can be parameterized (with a reasonably small number of parameters), model-based consensus maximization. In the sections below we focus on the model-free case. Note that most formulations of consensus maximization are written as maximization of the inlier set cardinality rather than minimization of the outlier set cardinality. However, these definitions are equivalent and we choose the latter for convenience.

3.3 Generic Rules-Based Consensus Maximization

In contrast to model-based problems, for many applications such as those related to non-rigid shapes, \(\varPhi \) cannot be represented with a small number of parameters and therefore it cannot be estimated using a minimal point set. As a consequence, \(\varDelta \) cannot be evaluated point-wise. For example, consider the case when \(\varPhi \) represents the mapping between two instances of a non-rigid surface. Such a map may be represented by Free-Form Deformation (FFD) [18, 33] or specialized latent space models such as SMPL [34] for the human body, both requiring a large number of points to fit the latent parameters. In such cases problem (2) is impractical to solve in its original form.

Therefore, we offer an alternative consensus maximization formulation which is easier to solve for a special class of problems. A problem belongs to this special class if the sets \(\mathcal {P}\) and \(\mathcal {P}'\) have a common underlying structure which can be measured using subsets of the match set \(\mathcal {C}\), without explicitly computing the transformation function \(\varPhi \). To obtain a tractable formulation, we define a set of binary variables \(\{z\}, z_i \in \{0,1\}\) and \(i\in \{1,\dots ,p\}\) such that \(z_i=1 \Longleftrightarrow i\in \mathcal {O}\). Let a binary valued function \(\varTheta : (\mathcal {S}_a, \mathcal {S}_b)\rightarrow \{0,1\} \) measure the agreement between two small subsets \(\mathcal {S}_a\), \(\mathcal {S}_b\subset \mathcal {C}\). \(\varTheta \) evaluates to 1 if the subsets \(\mathcal {S}_a\) and \(\mathcal {S}_b\) agree up to some threshold \(\epsilon \) and 0 otherwise. Then the following is an alternative of the original problem (2):

$$\begin{aligned} \begin{aligned}&\underset{\{z\}}{\text {minimize}}\quad \sum _i z_i \\&\text {subject to}\, \\&\varTheta (\mathcal {S}_a,\mathcal {S}_b) = 0\implies \exists (\mathcal {P}_i,\mathcal {P}'_{i})\in \mathcal {S}_a\cup \mathcal {S}_b \ :\ z_i=1, \\&\forall \ (\mathcal {S}_a, \mathcal {S}_b): \mathcal {S}_a\ne \mathcal {S}_b, \end{aligned} \end{aligned}$$
(3)

The function \(\varTheta \) can be thought of as a rule which uses priors on the sets \(\mathcal {P}\) and \(\mathcal {P}'\) to measure the agreement on the matched subsets. The subsets \(\mathcal {S}_a\) and \(\mathcal {S}_b\), sampled from the match set \(\mathcal {C}\), are the minimal sets such that \(\varTheta \) can be evaluated. Problem (3) simply means that if two subsets chosen on the basis of a prior do not agree with each other, at least one member from the union of those subsets must be an outlier. This is the key idea of our work. Although solving problem (3) optimally does not guarantee an optimal solution for problem (2), the former is a close relaxation of the latter. Therefore solving problem (3) amounts to solving the model-free consensus maximization. Problem (3) is still a combinatorial problem and is NP-hard. In the next section we give more insights into the problem with a graph structure and provide a globally optimal method for solving it with integer programming.

4 Consensus Maximization with a Graph

We represent the union of all samples \(\mathcal {S}_a\) and \(\mathcal {S}_b\) as the nodes and the connection between them as the edges of a graph \(\mathcal {G}=\{\mathcal {V},\mathcal {E}\}\). The node set \(\mathcal {V}\) consists of all unique sampled subsets \(\mathcal {S}_a\) and \(\mathcal {S}_b\). An edge \((\mathcal {S}_a, \mathcal {S}_b)\in \mathcal {E}\) connects the nodes \(\mathcal {S}_a\) and \(\mathcal {S}_b\) and induces the agreement function \(\varTheta (\mathcal {S}_a, \mathcal {S}_b)\). We use the index \(k\in \{1 \dots v\}\) to denote the nodes \(\mathcal {V}\) and the index \(\ l \in \{1 \dots e\}\) to denote the edges \(\mathcal {E}\). Figure 1 illustrates this representation of the problem.

Fig. 1. Graph formulation for consensus maximization. The selected point sets (nodes) are drawn as orange and purple circles in the graph, connected by edges representing the compatibility between the sets. The point clouds are taken from [35].

4.1 Graph Formulation

Given the graph \(\mathcal {G}\), we would still like to compute the original binary variable set \(\{z\}\). With a slight abuse of notation, we define the binary variable set of a node as \(z_k \triangleq \{z_i\}: (\mathcal {P}_i,\mathcal {P}'_{i})\in \mathcal {V}_k\). Similarly we define the binary variable set of an edge as \(z_l \triangleq \{z_i\}:(\mathcal {P}_i,\mathcal {P}'_{i})\in \mathcal {V}_{k_a}\cup \mathcal {V}_{k_b}\) for \(\mathcal {E}_l = (\mathcal {V}_{k_a},\mathcal {V}_{k_b})\). The constraint on the binary variables can then be compactly expressed as:

$$\begin{aligned} \quad \mathsf {\Sigma } z_l + \varTheta (\mathcal {E}_l) \ge 1, \end{aligned}$$
(4)

where \(\mathsf {\Sigma } z_l\) represents the sum of all the elements in the set \(z_l\). Problem (3) with constraint (4) is an example of graph optimization where we need to compute the node properties \(z_k\) for each node k using the edge measurements \(\varTheta (\mathcal {E}_{l})\).
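The way constraint (4) turns edges into inequalities can be sketched as follows. This is only an illustration under stated assumptions: `violated_edges` is a hypothetical helper, the graph is taken to be fully connected, and the toy rule `theta` is made up so that one match disagrees with all others.

```python
from itertools import combinations

def violated_edges(nodes, theta):
    """For every edge with theta == 0, list the match indices it involves.

    nodes: list of tuples of match indices (each tuple is one sampled subset).
    theta: callable(node_a, node_b) -> 0 or 1, the agreement rule.
    """
    constraints = []
    for a, b in combinations(nodes, 2):
        if theta(a, b) == 0:
            # Constraint (4) with theta = 0: the z_i over the union of the
            # two nodes must sum to at least 1.
            constraints.append(tuple(sorted(set(a) | set(b))))
    return constraints

# Toy rule (an assumption for illustration): match 3 disagrees with everything.
nodes = [(0,), (1,), (2,), (3,)]
theta = lambda a, b: 0 if 3 in a + b else 1

constraints = violated_edges(nodes, theta)
print(constraints)
```

Edges with \(\varTheta = 1\) produce no constraint at all, since (4) is then satisfied for any assignment of the binary variables.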

4.2 Integer Programming

Using the constraint of (4) in a graph, we propose an efficient way to solve the consensus maximization problem, under the framework of Integer Programming, as:

$$\begin{aligned} \begin{aligned}&\underset{\{z\}}{\text {minimize}}\quad \sum z_i \\&\text {subject to} \quad \sum z_l \ge 1,\quad \forall l\in \{1\dots e\},\quad \text {if}\,\, \varTheta (\mathcal {E}_l) = 0. \end{aligned} \end{aligned}$$
(5)

Problem (5) can be solved optimally using any off-the-shelf solver for Integer Programming, typically by means of the popular BnB method. Often such problems in consensus maximization are solved using the so-called big M method [36]. Such a method is needed when a binary decision function \(\varTheta \) cannot be defined for a given edge \(\mathcal {E}_l\). In that case, the integer inequality in problem (5) is written as \(M \sum z_l + \epsilon \ge \varLambda (\mathcal {E}_l)\) using the scalar-valued function \(\varLambda \) and a scalar threshold \(\epsilon \). Here, M is a chosen large scalar number that makes the problem feasible when \(\varLambda \) is large. However, in this work we consider only those problems that can be expressed with a binary rule \(\varTheta \).
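To illustrate how a branch-and-bound search can solve problem (5) exactly, the following is a deliberately plain depth-first sketch in pure Python. It is not the solver used in the paper (which relies on off-the-shelf software); `solve_min_outliers` and the toy constraints are hypothetical, and the only bound used is the cost of the best solution found so far.

```python
def solve_min_outliers(num_matches, constraints):
    """Minimise sum(z) such that every constraint (a tuple of match indices)
    contains at least one index i with z[i] == 1."""
    best = {"cost": num_matches + 1, "z": None}

    def recurse(z, cost):
        if cost >= best["cost"]:
            return  # bound: this branch cannot beat the incumbent
        # Look for a constraint not yet satisfied by the partial assignment.
        open_c = next((c for c in constraints if not any(z[i] for i in c)), None)
        if open_c is None:
            best["cost"], best["z"] = cost, z[:]  # feasible and strictly better
            return
        for i in open_c:  # branch: some member of this constraint is an outlier
            z[i] = 1
            recurse(z, cost + 1)
            z[i] = 0

    recurse([0] * num_matches, 0)
    return best["z"]

# Toy instance: matches 3 and 4 disagree with 0, 1, 2 and with each other.
constraints = [(0, 3), (1, 3), (2, 3), (0, 4), (1, 4), (2, 4), (3, 4)]
z = solve_min_outliers(5, constraints)
print(z)
```

When every constraint has exactly two variables, as in the shape matching case below, problem (5) coincides with minimum vertex cover on the graph of disagreeing pairs, which is one way to see why the exact problem is hard.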

Relaxed Alternatives and BnB. Integer programming problems are generally non-convex in nature. They can be simplified by relaxing the binary or integer constraint to real-valued bounds. In contrast, we opt for the BnB approach, keeping the integer constraint in order to obtain a globally optimal solution even in the case of a high outlier ratio. Such an approach computes lower and upper bounds of the cost iteratively and terminates with a certificate of \(\epsilon \)-suboptimality once the two bounds agree to within the tolerance \(\epsilon \). We compare the relaxed and the globally optimal methods in Sect. 6. In the next section, we describe two different problems in non-rigid shapes which can be expressed in the form of problem (5).
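A small worked example, not taken from the paper, may help to see why the relaxation can fail to label outliers. Assume three matches that pairwise disagree, so that problem (5) has three two-variable constraints. Summing the constraints gives \(2\sum _i z_i \ge 3\), so the relaxed optimum is fractional and carries no inlier/outlier decision, while the integer optimum must mark two matches as outliers:

$$\begin{aligned} \begin{aligned}&z_1 + z_2 \ge 1,\quad z_2 + z_3 \ge 1,\quad z_1 + z_3 \ge 1, \\&\text {relaxed}\ (0\le z_i\le 1):\quad z = \left( \tfrac{1}{2},\tfrac{1}{2},\tfrac{1}{2}\right) ,\quad \sum _i z_i = \tfrac{3}{2}, \\&\text {integer}\ (z_i\in \{0,1\}):\quad z = (1,1,0)\ \text {or a permutation},\quad \sum _i z_i = 2. \end{aligned} \end{aligned}$$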

5 Non-Rigid Shapes

Non-rigid objects have deformations that cannot be parameterized with a small fixed set of parameters. Nevertheless, they do obey some shape priors. We provide our methods for two problems in non-rigid shapes below, based on such deformation priors.

5.1 Shape Matching with Isometry

We consider two different shapes \(\mathcal {P}\) and \(\mathcal {P}'\) related by an unknown deformation \(\varPhi \). We want to establish the set of outlier points \(\mathcal {O}\) on the matching set \(\mathcal {C}\). Such problems may arise, for example, when registering 3D non-rigid surfaces using image matches [28] or when registering different shapes with a 3D feature point descriptor [25, 26]. In order to solve the problem, we consider the isometric deformation prior which assumes that the surface distances are preserved under deformations. The prior allows us to use the following graph attributes:

$$\begin{aligned} \begin{aligned}&\mathcal {V}_k = (\mathcal {P}_i, \mathcal {P}'_{i})\\&\varTheta (\mathcal {E}_l)= {\left\{ \begin{array}{ll} 1 \quad &{} \text {if}\quad ||\varPsi (\mathcal {P}_{i_1},\mathcal {P}_{i_2}) - \varPsi (\mathcal {P}'_{i_1},\mathcal {P}'_{i_2})||\le \epsilon \\ 0 \quad &{} \text {otherwise}. \end{array}\right. } \end{aligned} \end{aligned}$$
(6)

where \(\varPsi \) is the function that measures the geodesic distance between two points on a surface. Each graph node consists of a single matching pair in \(\mathcal {C}\). Therefore each constraint obtained for an edge in (6) consists of only two binary variables, making the problem highly sparse. Although we only show the problem formulation using isometry, other deformation priors such as conformality may be used in problem (6).
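The rule in (6) can be sketched directly on two precomputed geodesic distance matrices. The helper `isometry_constraints` and the toy distances are assumptions made for illustration; the sketch also uses an absolute threshold, whereas the experiments in Sect. 6 use a threshold relative to the template.

```python
def isometry_constraints(G, G_prime, eps):
    """Pairs of matches whose geodesic distances disagree by more than eps.

    G[i][j]       : geodesic distance between P_i and P_j on the first shape.
    G_prime[i][j] : geodesic distance between P'_i and P'_j on the second shape.
    Each returned pair (i, j) has Theta = 0 in (6), hence z_i + z_j >= 1.
    """
    p = len(G)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(G[i][j] - G_prime[i][j]) > eps]

# Toy distances (made up): match 2 lands in the wrong place on the second shape.
G = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
G_prime = [[0.0, 1.05, 0.4],
           [1.05, 0.0, 3.0],
           [0.4, 3.0, 0.0]]

pairs = isometry_constraints(G, G_prime, eps=0.2)
print(pairs)
```

Both violated pairs contain match 2, so marking that single match as an outlier satisfies every constraint at minimum cost.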

Practical Considerations. While the method works perfectly for isometric surfaces, objects undergoing topological changes, such as a piece of paper being torn, and loosely isometric surfaces, such as a human body, pose additional difficulty, as isometry is not always satisfied in such cases. We therefore provide a more practical approach to solve the problem in Algorithm 1. In Algorithm 1, separately applying the method to different clusters also addresses the non-linear time complexity of the integer programming problem. This allows us to use the method on densely sampled surfaces, as the time complexity grows only linearly with the number of clusters. To estimate the geodesics, we compute a mesh by Delaunay triangulation when a mesh is not provided.

[Algorithm 1]

5.2 Template to Image Matching

Template-based reconstruction is a well-studied problem [19, 23, 24, 30] which uses matches between the template shape \(\mathcal {P}\) and the deformed shape’s image \(\mathcal {I}\) to reconstruct the deformed surface. Again, the matches established may consist of outliers, in which case the reconstruction obtained can be of poor quality. Here, we propose the use of piece-wise rigidity and surface smoothness as the priors to define the agreement function \(\varTheta \). Despite non-rigidity, surface smoothness has been successfully used in the state-of-the-art template-based reconstruction methods [19, 30]. We use a similar approach by considering that the relative camera to object pose changes smoothly over the surface. Using these priors we define the graph attributes as follows:

$$\begin{aligned} \begin{aligned}&\mathcal {V}_{k_1} = \{(\mathcal {P}_{i}, \mathcal {I}_{i})\}, \quad i=\{i_1, i_2, i_3\},\quad i_1, i_2, i_3 \in \{1\dots p \} \\&\mathcal {V}_{k_2} \in \mathcal {N}(\mathcal {V}_{k_1})\\&\varTheta (\mathcal {E}_l) = {\left\{ \begin{array}{ll} 1&{}\quad \text {if}\quad \varDelta \left( \mathsf {R}^\top _{k_1}, \mathsf {R}_{k_2}\right) \le \epsilon _1 \, \, \text {and} \, \, |\mathsf {t}_{k_1} - \mathsf {t}_{k_2}| \le \epsilon _2 \\ 0&{}\quad \text {otherwise}\\ \end{array}\right. } \end{aligned} \end{aligned}$$
(7)
[Algorithm 2]

where \(\mathsf {R}_{k_1}\) and \(\mathsf {R}_{k_2}\) represent the rotations of the absolute pose estimated using the nodes \(\mathcal {V}_{k_1}\) and \(\mathcal {V}_{k_2}\) respectively, for the image \(\mathcal {I}\). We define \(\mathcal {N}(.)\) to be the set valued function giving neighboring nodes in the graph. Similarly \(\mathsf {t}_{k_1}\) and \(\mathsf {t}_{k_2}\) represent the camera translations. The rule \(\varTheta \) measures how well the poses agree for the two nodes. To that end, \(\varDelta \) is the function used to measure the distance between two rotations. We use two hyperparameters \(\epsilon _1\) and \(\epsilon _2\) to threshold the change in rotation and translation respectively. Local rigidity and surface smoothness imply that the poses should also change smoothly. The absolute pose problem can be solved using any of the so-called PnP methods [37,38,39]. We consider only the minimal problem that uses three non-collinear matched points and is also known as the P3P method [37]. The solutions obtained with P3P have a 4-fold ambiguity. This can be disambiguated either by using an additional matching point pair or by simply choosing the solution that minimizes \(\varDelta \). The nodes are sampled such that each edge requires only four unique point matches and therefore each inequality constraint will consist of four binary variables.
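A minimal sketch of the pose agreement rule in (7) is given below. The paper does not specify \(\varDelta \) here, so the sketch assumes the angle of the relative rotation \(\mathsf {R}^\top _{k_1}\mathsf {R}_{k_2}\); it also assumes an absolute \(\ell _1\) threshold on the translation difference. The helpers `rotation_angle_deg`, `pose_agreement` and `rot_z` are hypothetical, and the poses would in practice come from a P3P solver, which is not shown.

```python
import math

def rotation_angle_deg(R1, R2):
    """Angle in degrees of the relative rotation R1^T R2 (3x3 nested lists)."""
    # trace(R1^T R2) equals the sum of element-wise products of R1 and R2.
    tr = sum(R1[r][c] * R2[r][c] for r in range(3) for c in range(3))
    cos_a = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against round-off
    return math.degrees(math.acos(cos_a))

def pose_agreement(R1, t1, R2, t2, eps1_deg, eps2):
    """Theta in (7): 1 if both rotation and translation differences are small."""
    dt = sum(abs(a - b) for a, b in zip(t1, t2))  # l1-norm, as in the notation
    return 1 if rotation_angle_deg(R1, R2) <= eps1_deg and dt <= eps2 else 0

def rot_z(deg):
    """Rotation about the z-axis, used only to build test poses."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]

close = pose_agreement(rot_z(0), (0, 0, 1), rot_z(5), (0, 0.1, 1), 10.0, 0.4)
far = pose_agreement(rot_z(0), (0, 0, 1), rot_z(30), (0, 0.1, 1), 10.0, 0.4)
print(close, far)
```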

Practical Considerations. Piecewise rigidity is a stronger prior compared to isometry. For non-rigid shapes, this holds true only for close neighbors. In contrast to the shape matching problem of Sect. 5.1, each edge here requires four point matches. For that reason, it requires the matching set to be dense enough so that the points obey rigidity at least locally. Algorithm 2 describes the implementation of the method. A very naive simplification of Algorithm 2 can be made by considering all points that produce a high number of 1’s in the agreement function \(\varTheta \) to be inliers. We term such a voting method local filtering; it can find obvious inliers in the template-image matching problem.
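The local filtering baseline can be sketched as a simple vote count. The text only says that points with "a high number of 1's" are kept, so the explicit cut-off `min_votes` and the helper name `local_filter` are assumptions made for this illustration.

```python
def local_filter(num_matches, edges, theta_values, min_votes):
    """Keep the matches that appear in at least min_votes agreeing edges.

    edges        : list of tuples of match indices involved in each edge.
    theta_values : list of 0/1 agreement values, one per edge.
    min_votes    : hypothetical cut-off for what counts as a high vote count.
    """
    votes = [0] * num_matches
    for members, agree in zip(edges, theta_values):
        if agree == 1:
            for i in members:
                votes[i] += 1
    return [i for i in range(num_matches) if votes[i] >= min_votes]

# Toy edges, each involving four matches as in the template-to-image case.
edges = [(0, 1, 2, 3), (0, 1, 2, 4), (1, 2, 3, 5), (0, 2, 3, 4)]
theta_values = [1, 0, 1, 1]

inliers = local_filter(6, edges, theta_values, min_votes=2)
print(inliers)
```

Unlike problem (5), this vote carries no optimality guarantee; it only picks out matches that are supported by several agreeing neighbors.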

5.3 Complexity Analysis

The combinatorial complexity of problem (5) depends on four main aspects: the number of points p, the neighborhood size q, the cluster size r and the cardinality s of the minimal set \(\mathcal {S}\) required to represent a vertex in the graph (see Fig. 1). The complexity for a single Integer Program, as reported in Table 1, can be obtained directly from the combinatorics of the graph. Although the template-to-image matching complexity (\(s=3\)) is high, the problem demands only local agreement, which allows us to use a small local neighborhood (\(q=15\)) for creating the vertices (triangles in this case) with on average 30 edges per point. This is not the case in shape matching, where we use a fully-connected graph (\(q = p/r-1\)) on each cluster, as the geodesic measurements are valid irrespective of the points’ proximity.
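As a rough worked example for the shape matching case (\(s=1\)), and reading r as the number of clusters, which is what \(q = p/r-1\) suggests, a fully-connected graph on each cluster has \(\binom{p/r}{2}\) edges, each contributing at most one constraint. The values \(p=1000\) and \(r=5\) below are illustrative and not taken from the experiments:

$$\begin{aligned} \begin{aligned}&r=1:\quad \binom{1000}{2} = 499{,}500 \ \text {edges in a single program}, \\&r=5:\quad 5\cdot \binom{200}{2} = 5\cdot 19{,}900 = 99{,}500 \ \text {edges, split over five programs}. \end{aligned} \end{aligned}$$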

Table 1. Complexity Analysis. Solving for n points and minimal set size s with full connectivity (cluster size \(r=1\)) and q-connectivity (cluster size r).

6 Experimental Results

We present the results and analysis of our proposed methods in this section on several standard datasets. We refer to the integer program based methods as exact or the proposed method. We also compare with the simplified method where the binary constraints in problem (5) have been relaxed to real, which we refer to as the relaxed method. We compare and use several matching or outlier removal methods. We write the spline-warp based image outlier removal method [18] as featds. We write the graph matching method [12] as maxpoolm. We test the template-image outlier removal method based on mesh Laplacian [19] as laplacian. Apart from these image-based methods we also use shape matching methods. We write the recent deformable shape kernel matching method [15] as KM. We write the deep functional map [22] as DFM and the blended intrinsic maps [14] as BIM. We implement our methods in MATLAB with YALMIP [40] and MOSEK [41] for integer programming. Below we describe in detail the experiments for each of the discussed non-rigid registration problems.

Clustering and Threshold Parameters. For some experiments, we apply clustering to handle the high number of point matches. For template-to-image matching and the Hand dataset, the number of point matches is low (\(n<200\)) and therefore the number of clusters is 1. For the human shapes and the newspaper dataset we choose the number of clusters as 5 based on neighborhood (k-means clustering). Note that the result aggregation is straightforward, since the clusters are disjoint. For Fig. 2, to vary the number of points, we randomly sub-sampled the points to a fixed number. Regarding thresholds, we fix \(\epsilon = 20\%\) distance error relative to the template for shape matching (Sect. 6.1) unless stated otherwise. In the template-to-image matching case (Sect. 6.2), we use \(\epsilon _1 = 10^{\circ }\) and \(\epsilon _2 = 40\%\) for all datasets.

6.1 Non-rigid Shape Matching

We begin by analyzing the behavior of the proposed methods on synthetic data where the ground truth correspondences are available for the shape matching problem. We also compare the proposed methods with the state-of-the-art methods on several real datasets. All these are outlined below.

Mocap Data. We test with two cloth-capture datasets [35]. The datasets consist of a cloth falling (toss) and a moving pair of trousers (stepping trousers). The datasets are generated with mocap and consist of registered real 3D points. We synthetically generate outliers by randomly re-assigning matches to evaluate our methods.
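One possible way to generate such synthetic outliers is sketched below. The exact protocol used in the experiments is not described, so `corrupt_matches`, its seeding and the choice to allow several points to share a wrong target are all assumptions.

```python
import random

def corrupt_matches(num_matches, outlier_ratio, seed=0):
    """Return (assignment, outlier_indices).

    assignment[i] is the index in the second set matched to point i. The
    ground truth is the identity; each chosen match is re-assigned to a
    wrong target drawn uniformly at random.
    """
    rng = random.Random(seed)
    assignment = list(range(num_matches))
    num_outliers = round(outlier_ratio * num_matches)
    chosen = rng.sample(range(num_matches), num_outliers)
    for i in chosen:
        wrong = rng.randrange(num_matches - 1)
        # Shift by one past i so that the new target never equals i.
        assignment[i] = wrong if wrong < i else wrong + 1
    return assignment, sorted(chosen)

assignment, outliers = corrupt_matches(10, 0.8)
print(len(outliers))
```

Because the registered mocap points give the true correspondences, the returned outlier indices can serve as ground truth when counting detected inliers.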

Figure 2 (a) compares the relaxed and exact versions of the proposed method. We observe that, for a low outlier ratio, it is possible to remove all the outliers using the relaxed method. However, it breaks down as the percentage of outliers increases beyond 50%, while the exact solution still correctly detects the inliers even in conditions with 80% outliers. Note that the proposed method does not detect any false positive inliers. Figure 2 (b) shows how the exact method behaves with the number of iterations. We observe that the method quickly computes the upper bound cost or the pessimistic inlier set, while it takes a while to obtain the certificate of optimality. We find this behavior to be consistent across many other experimental setups. Figure 2 (c) shows the number of open nodes at each iteration, describing how BnB evaluates and prunes branches. To investigate time complexity, we also plot the execution time for the exact method in Fig. 2 (d). It can be observed that the execution time increases with the number of points. However, this is not a problem in practice thanks to the clustering framework presented in Algorithm 1.

Fig. 2. Analysis of our method. Number of inliers detected, convergence of the proposed method, and time taken for the mocap cloth dataset [35] under various setups. Note that the number of iterations in (b) and (c) are in log-scale.

KINECT Newspaper Dataset. The RGB-D data obtained from depth-camera sensors such as KINECT make an important field of application for the method. We investigated our method on the Newspaper dataset [42]. It consists of a double sheet of newspaper being torn into two parts. Figure 3 shows the inliers and outliers for a part of the template image with our method. Due to the local neighborhood computed using both point sets, the exact method can robustly handle the topological changes. On the other hand, the relaxed method does not work well owing to a lack of sufficient constraints.

Fig. 3. Newspaper dataset. Visualization of inlier and outlier matches from our exact and two next best performing methods for an example pair of the Newspaper dataset. Left column shows the inlier detection and the right column shows the outlier detection.

Hand Dataset. The hand dataset [42] consists of two different instances of a hand and their 3D ground truth obtained with SfM. Due to the non-rigid deformation, the detected SIFT correspondences consist of very few matches with a large percentage (more than 70%) of outliers. The shape matching methods [14, 15] completely fail on this dataset and we do not show them here. We show the results of the exact method in Fig. 4 and the next best performing methods in Fig. 5. These qualitative results clearly show that the compared methods do not perform well in such difficult cases.

Fig. 4. SfM Hand dataset. Inlier detections (left) and outlier detections (right) of our exact method.

Fig. 5. Inlier detections with laplacian (left) and relaxed (right).

Human Body Shapes. In the next set of experiments, we use our methods on human body scans from the FAUST [43] dataset. To introduce challenging outliers, we consider a partial matching scenario by cutting out one arm and one leg from the mesh, and matching it to the full one. Thanks to the mesh registrations provided by the dataset, we can exactly evaluate inliers and outliers based on geodesic deviations from the ground truth correspondences (deviations greater than 15 cm are considered as outliers). We compare our relaxed and exact methods against matches estimated by DFM [22], KM [15], and BIM [14]. Although BIM [14] produced visually good correspondences, it suffered from a mirror-image ambiguity that could not be resolved. Therefore we compare to BIM only where proper evaluations were possible.
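The evaluation described above can be sketched as follows, assuming that the geodesic deviation of every match from its ground truth correspondence is already available in centimetres. The helper `count_tp_fp` and the toy numbers are illustrative only.

```python
def count_tp_fp(kept, geodesic_error_cm, threshold_cm=15.0):
    """Count true positives (kept inliers) and false positives (kept outliers).

    kept              : indices of the matches a method reports as inliers.
    geodesic_error_cm : deviation of each match from the ground truth, in cm.
    A match is a ground truth outlier if its deviation exceeds threshold_cm.
    """
    tp = sum(1 for i in kept if geodesic_error_cm[i] <= threshold_cm)
    fp = len(kept) - tp
    return tp, fp

# Made-up deviations for five matches; the method keeps matches 0, 2, 3 and 4.
errors = [2.0, 40.0, 7.0, 15.0, 90.0]
tp, fp = count_tp_fp([0, 2, 3, 4], errors)
print(tp, fp)
```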

Since our method is designed for isometric shapes, we conduct the first experiment in the intra-subject case (same subject in 9 different poses). We observe that our method can successfully eliminate more than 90% of the outliers produced by DFM and KM while removing only a few true inliers, as shown in the first column of Table 2.

In inter-subject body shape matching applications, however, the isometry assumption holds only to some extent. We use two challenging datasets to test such scenarios. The first one is inter-subject matching on the FAUST data, again in the partial matching setting. Since the body shape varies across subjects, isometry no longer holds exactly. The results presented in the second column of Table 2 demonstrate that this problem is significantly harder than isometric matching. We see that the matches resulting from BIM contain outliers that are very hard to detect, and only 15% can be removed without sacrificing many inliers. For DFM and KM, we reliably detect more than 80% and 90% of the outliers, respectively, and therefore improve the matching robustness for subsequent tasks.

Our third experiment with human body shapes involves dense correspondence estimation from a depth map to the 3D model. We rendered synthetic depth maps mimicking the projection and noise properties of KINECT from an articulated MPII Human Shape model [44] using variations of upright poses and body shapes. To compute the geodesics on this modality, we triangulated the point cloud using 2D Delaunay triangulation. Applying DFM and KM on the raw input does not work well, since SHOT [45] and HKS [46] are not reliable features for depth maps. We therefore take initial matches from a metric regression forest [47] trained on the specific task of dense correspondence estimation. We then compare our methods, KM and ICP on top of these matches in the third column of Table 2. We can conclude that, although provided with initial matches, KM fails to correctly match the two modalities. Our method, however, shows promising results even though the shapes are non-isometric and geodesics are computed on the triangulated point cloud. Interestingly, our result is comparable to that of the articulated non-rigid ICP, which exploits additional information such as the kinematics and a stronger shape prior. Figure 6 shows a qualitative example from our test set.

In summary, we showed that our method can be used on top of generic matching methods to robustly detect outliers for isometric deformations, and some classes of non-isometric registration such as inter-subject body shapes. Moreover, we can confirm our results on the synthetic data and conclude that even the relaxed method provides good results if the proportion of the outliers is below 50%.

6.2 Template to Image Matching

3D template-to-image matching is an important problem in non-rigid geometry. Most reconstruction methods [19, 30] are sensitive to outlying correspondences and proceed by first removing outliers in matches. We use problem (7) to formulate the template-to-image outlier removal method using the absolute pose. We test our methods on three datasets: KINECT Paper [48], T-Shirt [49] and MPI Sintel [50], all of which contain the ground truth 3D data and images. We select a random single pair for each dataset and compute the SIFT matches. We count the number of inlier and outlier matches manually for each of the methods’ output. We compare our methods with three other state-of-the-art methods: laplacian, featds and maxpoolm. Similarly, as discussed in Sect. 5.2, we also report the results of the relaxed method. We further report the results of the local-filtering method as another baseline where the inliers are decided based on the local neighborhood voting (Table 3).

Fig. 6. Qualitative results. Non-isometric shape matching from depth map. Left to right: body mesh model [44], RF [47], RF+KM [15], RF+Ours, RF+ICP, input depth map. Correspondences are color-coded, gray indicates removed matches.

Table 2. Non-rigid 3D shape matching. Results on FAUST [43] intra- and inter-subject, as well as matching depth maps to the MPII HumanShape [44] model. We report the number of true positive (inliers) and false positive (remaining outliers) matches.
Table 3. 3D template to image matching. Comparison on three different real datasets.

We test all the methods with favorable parameters. The reported inliers are manually validated. The results show that our method performs on par with laplacian, which is designed specifically for template-based outlier removal. Note that the exact method consistently detects more inliers than the other methods. Our method performs better than featds in multi-body situations, as featds uses a single spline-based warp and computes the residuals to identify outliers. We visualize the results of outlier removal in Fig. 7 for the proposed method and two other best performing methods: featds and laplacian.

Fig. 7. Inliers (left) vs. Outliers (right) for the T-shirt dataset using the exact method. The performance of our method is better than that of the two compared methods designed for non-rigid matching. More results are provided in the supplementary material.

7 Conclusions and Future Work

In this paper we brought forward a theory on model-free consensus maximization using integer programming and an optimal method to solve it using Branch and Bound. We formulated two different registration problems using our consensus maximizer: isometric shape outlier removal and template-image outlier removal. We obtained very good results at up to 80% mismatches in non-rigid shape registration and >25% mismatches in template-image registration. We obtained these results by solving a close relaxation of the original problem with guaranteed optimality. We showed with extensive experiments that our methods consistently perform on par with or better than the existing methods.

Although the focus of this paper was on non-rigid shapes, many vision problems can be converted to formulation (5) with three or fewer variables per graph node. A non-exhaustive list includes: (i) one variable problems: relative pose in robot navigation [51], (ii) two variable problems: robust triangulation [52] and pure translation estimation [53], and (iii) three variable problems: image to image affine homography and three-view modulus constraints [54]. In future work, we intend to tackle some of these problems using the formulation we developed in this paper.