Exact solutions in low-rank approximation with zeros
Introduction
The best rank-r approximation problem aims to find a real rank-r matrix that minimizes the Euclidean distance to a given real data matrix. The solution of this problem is completely addressed by the Eckart-Young-Mirsky theorem which states that the best rank-r approximation is given by the first r components of the singular value decomposition (SVD) of the data matrix.
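As a minimal sketch of the Eckart-Young-Mirsky statement above, the following NumPy snippet truncates the SVD to its leading r singular triples. The function name `best_rank_r` is chosen here for illustration and does not come from the paper.

```python
import numpy as np

def best_rank_r(U: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of U in Frobenius norm via truncated SVD."""
    # Thin SVD: U = P @ diag(s) @ Qt, singular values in decreasing order.
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    # Keep only the r leading singular triples.
    return (P[:, :r] * s[:r]) @ Qt[:r, :]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = rng.standard_normal((5, 4))
    X = best_rank_r(U, 2)
    # The squared error equals the sum of the squared discarded singular values.
    s = np.linalg.svd(U, compute_uv=False)
    print(np.linalg.norm(U - X) ** 2, np.sum(s[2:] ** 2))
```

The two printed numbers should agree up to rounding, which is the quantitative content of the theorem.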
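We study the structured best rank-r approximation problem, in which additional linear constraints are imposed on the rank-r matrices. We focus on coordinate subspaces, i.e., linear spaces that are defined by setting some entries to zero. Let $S \subseteq [m] \times [n]$ denote the indices of the zero entries. Given $U = (u_{ij}) \in \mathbb{R}^{m \times n}$, our optimization problem becomes
$$\min_{X \in \mathbb{R}^{m \times n}} \sum_{i=1}^{m} \sum_{j=1}^{n} (x_{ij} - u_{ij})^2 \quad \text{subject to} \quad \operatorname{rank}(X) \le r \ \text{ and } \ x_{ij} = 0 \ \text{for all } (i,j) \in S. \tag{1.1}$$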
The structured low-rank approximation problem has been studied in [4], [20], [21]; see also [13] for low-rank approximation with weights. Exact solutions to this problem have been investigated by Golub, Hoffman and Stewart [14], and by Ottaviani, Spaenlehauer and Sturmfels [24].
In [14], rank-r critical points are studied under the constraint that the entries in a set of rows or in a set of columns of a matrix stay fixed. This situation is more general than ours in that the fixed entries are not required to be zero, but more restrictive in the indices of the entries that may be fixed. In [24], rank-r critical points restricted to generic subspaces of matrices are studied. In our paper, the linear spaces set some entries equal to zero and hence are not generic. Because of this, we cannot use many powerful tools from algebraic geometry and intersection theory, and we have to come up with algebraic and computational techniques that exploit this special structure. For some properties of determinantal ideals of matrices with 0 entries and their relations to problems in graph theory, we refer the reader to [9] and references therein. Horobet and Rodriguez study the problem of when at least one solution of a certain family of optimization problems satisfies given polynomial conditions, and address structured low-rank approximation as a particular case [17, Example 15].
The global minimum of the optimization problem (1.1) always exists: select any point X in the feasible region and intersect the feasible region with the closed ball centered at U of radius $\|X - U\|$. Since the feasible region is a closed semialgebraic set, this intersection is closed and bounded, and hence compact. The distance function is continuous and thus attains its minimum on this set. This minimum is a global minimum of (1.1). The optimization problem (1.1) is nonconvex, and local methods are often used to solve it. They return a local minimum of the optimization problem. There are heuristics for finding a global minimum, but they do not guarantee that the local minimum found is indeed a global minimum. We refer to [20] for various algorithms and to [31] for an algorithm with locally quadratic convergence. Cifuentes recently introduced convex relaxations for structured low-rank approximation that have provable guarantees under certain assumptions [6]. Another interesting direction, closely related but not directly applicable to our problem, is to employ recent optimization techniques for simultaneously sparse and low-rank approximation [27], [30].
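The compactness argument can be made concrete with one particular feasible point. The worked inequality below is a sketch under the assumption that the feasible region consists of the matrices of rank at most r with zeros on S, so that the zero matrix is feasible; the symbol F for the feasible region is introduced here only for illustration, and the norm is the Frobenius norm.

```latex
% F := { X : rank(X) <= r, x_{ij} = 0 for all (i,j) in S }   (notation assumed for this sketch)
\[
0 \in F
\quad\Longrightarrow\quad
\min_{X \in F} \|X - U\|^2
  \;=\; \min_{\substack{X \in F \\ \|X - U\| \le \|U\|}} \|X - U\|^2
  \;\le\; \|U\|^2 .
\]
```

In other words, under this assumption the search can always be restricted to the closed ball of radius $\|U\|$ around U.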
To compute a global minimizer of (1.1) algebraically, we need to look at all the complex critical points of the squared-distance polynomial $\sum_{i,j} (x_{ij} - u_{ij})^2$ on the set of matrices of rank at most r with zero pattern S, and then select the real solution that minimizes the Euclidean distance. The problem of finding the critical points of this function on this set can be considered in the more general setting where U is a complex data matrix. This setting includes the practically meaningful case where U has real entries. If U is generic, namely if it belongs to the complement of a Zariski closed set, then the number of critical points is constant and is called the Euclidean Distance degree (ED degree) of this set. The importance of the ED degree is that it measures the algebraic complexity of writing the optimal solution as a function of U. More generally, the ED degree of an algebraic variety is introduced in [10]. The main goal of this paper is to study the critical points and the ED degree of the minimization problem (1.1).
When the rank is one, characterizing the critical points becomes a combinatorial problem. More precisely, listing all critical points translates to listing the minimal vertex covers of a bipartite graph. Counting vertex covers in a bipartite graph is known to be #P-complete [25]. Our main result about rank-one critical points is Proposition 3.3, which gives the ED degree of the structured rank-one problem in terms of the minimal covers. For row/column and diagonal zero patterns this results in explicit formulas (Corollary 3.8, Corollary 3.9).
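The link to vertex covers can be sketched as follows. A rank-one matrix $ab^T$ vanishes on S exactly when, for every $(i,j) \in S$, either $a_i = 0$ or $b_j = 0$, so the zero rows and columns form a vertex cover of the bipartite graph with edge set S. The snippet below is an illustrative brute-force sketch under that assumption, not the paper's Proposition 3.3: it enumerates the inclusion-minimal covers and, for each one, takes the best rank-one approximation of the submatrix left uncovered. The function names are hypothetical, and the enumeration is exponential in m + n, so it is only meant for small examples.

```python
from itertools import combinations
import numpy as np

def minimal_vertex_covers(m, n, S):
    """Inclusion-minimal vertex covers of the bipartite graph with edge set S.

    Vertices are ("r", i) for rows and ("c", j) for columns.
    """
    vertices = [("r", i) for i in range(m)] + [("c", j) for j in range(n)]

    def is_cover(C):
        return all(("r", i) in C or ("c", j) in C for (i, j) in S)

    covers = []
    # Increasing size: a cover is minimal iff it contains no cover found earlier.
    for k in range(len(vertices) + 1):
        for C in map(frozenset, combinations(vertices, k)):
            if is_cover(C) and not any(D <= C for D in covers):
                covers.append(C)
    return covers

def best_rank_one_with_zeros(U, S):
    """Return (squared distance, X) for the closest matrix of rank <= 1 with zeros on S."""
    m, n = U.shape
    best = None
    for cover in minimal_vertex_covers(m, n, S):
        rows = [i for i in range(m) if ("r", i) not in cover]
        cols = [j for j in range(n) if ("c", j) not in cover]
        X = np.zeros((m, n))
        if rows and cols:
            # Unstructured best rank-one approximation of the uncovered block.
            P, s, Qt = np.linalg.svd(U[np.ix_(rows, cols)], full_matrices=False)
            X[np.ix_(rows, cols)] = s[0] * np.outer(P[:, 0], Qt[0, :])
        dist2 = float(np.sum((U - X) ** 2))
        if best is None or dist2 < best[0]:
            best = (dist2, X)
    return best
```

For example, `best_rank_one_with_zeros(np.array([[5.0, 2.0], [1.0, 1.0]]), {(0, 0)})` compares the two minimal covers "row 0" and "column 0" and keeps the closer candidate.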
Our first main result for rank-r critical points is Theorem 4.3, which studies the linear span of the rank-r critical points of the distance function. We call it the critical space in the structured setting. This is motivated by the notion of critical space of a tensor in the unstructured setting defined by Draisma, Ottaviani and Tocino [12]. From the algebraic perspective, Theorem 4.3 provides a lower bound on the minimal number of generators of degree one in the zero-dimensional ideal of rank-r critical points. When the underlying variety of structured rank-r matrices is irreducible, we expect this lower bound to be also an upper bound, as stated in Conjecture 4.4.
In the unstructured setting, the rank-one critical points form a basis of the critical space and the rank-r critical points are linear combinations of the basis vectors with coefficients in $\{0, 1\}$. In the structured setting, there are not enough rank-one critical points to give a basis of the critical space. We leave it as an open question whether there is a natural extension to a basis and whether the coefficients that give rank-r critical points as linear combinations of basis elements have a nice description.
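The unstructured picture can be checked numerically. For a data matrix with distinct nonzero singular values, the rank-one critical points are the terms $\sigma_i p_i q_i^T$ of the SVD, and each rank-r critical point is a sum of r of them, i.e. a combination with coefficients 0 or 1. The sketch below enumerates these sums; the function name and the symbols $p_i$, $q_i$ are notation assumed here for illustration.

```python
from itertools import combinations
import numpy as np

def unstructured_critical_points(U, r):
    """Rank-r critical points of the distance to U, assuming distinct nonzero singular values."""
    P, s, Qt = np.linalg.svd(U, full_matrices=False)
    # Rank-one critical points: the individual SVD terms sigma_i * p_i * q_i^T.
    terms = [s[i] * np.outer(P[:, i], Qt[i, :]) for i in range(len(s))]
    # Each r-subset of the terms gives one rank-r critical point.
    return [
        sum((terms[i] for i in I), np.zeros(U.shape))
        for I in combinations(range(len(s)), r)
    ]
```

For an m-by-n matrix this returns binomial(min(m, n), r) matrices. The first one, built from the r largest singular values, is the truncated SVD, i.e. the global minimizer.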
Our second main result is Proposition 4.12, which describes affine linear relations that are satisfied by the rank-r critical points of the distance function in the unstructured setting. In the structured setting, we conjecture the affine linear relations satisfied by the rank-r critical points. The last kind of constraints satisfied by the rank-r critical points that we consider are nonlinear determinantal constraints given in Proposition 4.18. The ED degree of the structured problem is studied in Section 5. Our experiments indicate that the ED degree grows exponentially.
The optimization problem (1.1) is motivated by the nonnegative matrix factorization (NMF) problem. Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}_{\ge 0}$, the nonnegative rank of X is the smallest r such that $X = AB$ for some nonnegative matrices $A \in \mathbb{R}^{m \times r}_{\ge 0}$ and $B \in \mathbb{R}^{r \times n}_{\ge 0}$. NMF aims to find a matrix X of nonnegative rank at most r that minimizes the Euclidean distance to a given data matrix U; see [15] for further details.
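For contrast with the exact approach pursued in this paper, NMF is commonly attacked with local heuristics. The snippet below is a minimal sketch of the standard multiplicative-update heuristic for the Frobenius objective. It is not the method of this paper, it assumes the data matrix is entrywise nonnegative, and it comes with no guarantee of reaching a global minimum. The function name and default parameters are chosen here for illustration.

```python
import numpy as np

def nmf_multiplicative(U, r, iters=500, seed=0, eps=1e-12):
    """Local NMF heuristic: nonnegative A (m x r), B (r x n) with A @ B close to U."""
    rng = np.random.default_rng(seed)
    m, n = U.shape
    # Random strictly positive starting factors.
    A = rng.random((m, r)) + eps
    B = rng.random((r, n)) + eps
    for _ in range(iters):
        # Multiplicative updates keep every entry nonnegative.
        B *= (A.T @ U) / (A.T @ A @ B + eps)
        A *= (U @ B.T) / (A @ B @ B.T + eps)
    return A, B
```

By construction the product `A @ B` has nonnegative rank at most r, so it is a feasible point for the NMF problem, but different seeds may end at different local solutions.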
In Section 6, we apply the structured best rank-two approximation problem to NMF. Consider the set of matrices of nonnegative rank at most two and a data matrix U. In order to compute the best nonnegative rank-2 approximation of U, we need to compute the critical points of the Euclidean distance function over the matrices of rank at most two with zero pattern S, for all zero patterns S. We show that the minimal number of critical points needed to determine the global minimum is 756 for a generic U. For the same case, we show experimentally that the optimal critical point may have a few zeros.
The rest of the paper is organized as follows. In Section 2 we fix our notation (Section 2.1), recall the basics of ED minimization on an algebraic variety (Section 2.2) and discuss Frobenius distance minimization on a variety of low-rank matrices (Section 2.3). In Section 3 we address the best rank-one approximation problem with assigned zero patterns (Section 3.1) and best rank-r approximation for rectangular and block diagonal matrices (Section 3.2). In Section 4 we investigate special polynomial relations among the critical points of the distance function. In particular, in Sections 4.1 and 4.3 we concentrate on particular linear and affine relations among critical points, respectively, and in Section 4.4 on some special nonlinear relations. Observations for generic linear constraints not necessarily coming from assigned zero patterns are given in Section 4.2. In Section 5 we provide conjectural ED degree formulas for special formats and zero patterns S, obtained from computational experiments. In Section 6 we relate the minimization problem (1.1) to nonnegative matrix factorization. The results of Sections 5 and 6 are supported by computations that use the HomotopyContinuation.jl [3] software package as well as the software Macaulay2 [16] and Maple™ 2016 [19]. The code can be found at github.com/kaiekubjas/exact-solutions-in-low-rank-approximation-with-zeros.
Section snippets
Preliminaries
The preliminaries section consists of three subsections on algebra basics and notations (Section 2.1), Euclidean distance minimization (Section 2.2) and unstructured low-rank approximation (Section 2.3).
Rank-one structured approximation and beyond
This section is divided into two subsections: In Section 3.1, we focus on rank-one approximation with zeros, and in Section 3.2, on the simplest cases of rank-r approximation with zeros for rectangular and block-diagonal matrices.
Special relations among critical points
In this section we provide (some of) the generators of the ideal of critical points of the distance function on the set of matrices of rank at most r with zero pattern S. In particular, in Sections 4.1 and 4.3 we concentrate on particular linear and affine relations among critical points, respectively, and in Section 4.4 on some special nonlinear relations. Observations for generic linear constraints not necessarily coming from assigned zero patterns are given in Section 4.2.
We stress that in our statements we always consider a real matrix U. However, as we
Computations of Euclidean distance degrees
In this section we present various experiments that study the ED degree of , when and the zero pattern S involves only elements in the diagonal.
First, we restrict to square matrices and consider the zero pattern . Since the number of (complex) critical points of on is constant for a generic (complex) data matrix U, it is reasonable to apply a monodromy technique for computing these critical points numerically. For this, we use the HomotopyContinuation.jl [3] software
Nonnegative low-rank matrix approximation
In this section, we apply rank-two approximation with zeros to the problem of nonnegative rank-two approximation. Our goal is to find the best nonnegative rank-two approximation with a guarantee that we have found the correct solution. There are two options for the critical points of the Euclidean distance function over the set of matrices of nonnegative rank at most two:
1. A critical point of the Euclidean distance function over this set is a critical point of the Euclidean distance function over the set of matrices of rank at most two.
2. A critical
Declaration of Competing Interest
No declaration of competing interest.
Acknowledgements
We thank Giorgio Ottaviani, Grégoire Sergeant-Perthuis, Pierre-Jean Spaenlehauer, and Bernd Sturmfels for helpful discussions and suggestions. We thank two anonymous reviewers for insightful comments which improved the original manuscript. Kaie Kubjas and Luca Sodomaco are partially supported by the Academy of Finland Grant No. 323416. Elias Tsigaridas is partially supported by ANR JCJC GALOP (ANR-17-CE40-0009), the PGMO grant ALMA, and the PHC GRAPE.
References (31)
- et al., Structured low rank approximation, Linear Algebra Appl. (2003)
- et al., Algebraic/combinatorial proofs of Cayley-type identities for derivatives of determinants and Pfaffians, Adv. Appl. Math. (2013)
- et al., A generalization of the Eckart-Young-Mirsky matrix approximation theorem, Linear Algebra Appl. (1987)
- A non-commutative version of Jacobi's equality on the cofactors of a matrix, Discrete Math. (1996)
- Structured low-rank approximation and its applications, Automatica (2008)
- et al., Improved sparse low-rank matrix estimation, Signal Process. (2017)
- et al., Computational Complexity: a Modern Approach (2009)
- et al., Computing a nonnegative matrix factorization—provably, SIAM J. Comput. (2016)
- et al., HomotopyContinuation.jl: a package for homotopy continuation in Julia
- et al., An analog of the singular value decomposition for complex orthogonal equivalence, Linear Multilinear Algebra (1987)