
Semi-supervised Multi-view Clustering based on NMF with Fusion Regularization

Published: 27 April 2024


Abstract

Multi-view clustering has attracted significant attention and has been widely applied. Nonnegative matrix factorization (NMF) is a popular feature learning technique in pattern recognition. In recent years, many semi-supervised NMF algorithms have been proposed that incorporate label information, and they have achieved outstanding performance for multi-view clustering. However, most of these existing methods either fail to consider discriminative information effectively or introduce too many hyper-parameters. Addressing these issues, a semi-supervised multi-view nonnegative matrix factorization with a novel fusion regularization (FRSMNMF) is developed in this article. In this work, we uniformly constrain the alignment of multiple views and the discriminative information among clusters with the designed fusion regularization. Meanwhile, to align the multiple views effectively, two kinds of compensating matrices are used to normalize the feature scales of different views. Additionally, we preserve the geometry structure information of labeled and unlabeled samples by simultaneously introducing graph regularization. For the proposed methods, two effective optimization strategies based on multiplicative update rules are designed. Experiments on six real-world datasets demonstrate the effectiveness of our FRSMNMF compared with several state-of-the-art unsupervised and semi-supervised approaches.


1 INTRODUCTION

Clustering is a very important multivariate analysis technique in pattern recognition and machine learning. In past decades, many clustering methods [1, 21, 50, 63] have been proposed, such as k-means [15], Gaussian mixture models, spectral clustering, and hierarchical clustering, but most of them are designed for single-view data. In many real-world cases, however, the data are collected from multiple sources [18, 34, 55], which characterize the objects from various perspectives and are usually termed multiple views or modalities in the literature [9, 47, 51, 61]. The different views generally contain interactive and complementary information which can be integrated to enhance the recognition performance of different tasks. Therefore, for multi-view clustering tasks, how to effectively fuse the information from the multiple views to enhance the clustering performance is the most important issue in the learning process [2, 59].

Among various outstanding multi-view clustering methods [7, 8, 45, 48, 56, 57], nonnegative matrix factorization (NMF) [24, 25] based methods are widely studied owing to the excellent feature learning ability of NMF. Under the framework of multi-view nonnegative matrix factorization (MVNMF) [35, 38, 64], each point can be represented with an efficient low-dimensional feature vector. Depending on whether label information is utilized, MVNMF methods [32] can be divided into unsupervised methods [46] and semi-supervised methods [28]. Many unsupervised MVNMFs focus on exploring the fusion mechanism of multiple views in a hidden space. For example, in [46], a novel pair-wise co-regularization was proposed to align the multiple views, which considers the inter-view similarity. In [14], a nonredundancy regularization was developed to discover the distinct contributions of the different views. In [42] and [27], inter-view diversity terms were developed to learn more comprehensive information from the data. Additionally, unsupervised MVNMFs usually take some normalizing operations to improve the performance of the model. For instance, in [41], to ensure the numerical stability of NMF, in each iteration the column summation of the basis matrix is normalized to be 1, i.e., \(\sum \nolimits _i {{{\bf W}}_{ij}^v = 1}\) (\(v\) denotes the \(v\)th view). Furthermore, to make the representations of the different views comparable at the same scale, in [52], [39], and [32], the same normalization operation as in [41] is adopted and the norms of the basis matrix are compensated into the coefficient matrix in the co-regularization term. To guarantee the solution uniqueness of NMF, in [17], the row summation of the coefficient matrix is normalized to be 1, i.e., \(\sum \nolimits _j {{{\bf H}}_{ij}^v = 1}\), where each column denotes a data point. These methods have failed to consider the label information that the data may provide. In fact, by effectively utilizing the label information provided by a small proportion of the data, the clustering performance can be further enhanced. Under this consideration, in recent years, some semi-supervised MVNMFs have been developed to employ the label information.

Most recently proposed semi-supervised MVNMF methods are based on constrained NMF (CNMF) [30], which is a semi-supervised NMF method for single-view data. The shortcoming of CNMF-based semi-supervised MVNMFs [5, 6, 43, 54] is that they all fail to discover the discriminative information among the different classes. In these methods, the data points of the same class are represented by the same feature vectors. In this way, the distances among data points residing in the same class are reduced to zero, but the distances among data points residing in different classes are not maximized. Another problem of these CNMF-based methods is that they all fail to make use of the local geometrical information of the data. In [22], Jiang et al. have tried to discover the discriminative information of data by regressing the partial coefficient matrix to the label matrix. The work in [31] has also considered the discriminative information of data. Unfortunately, both of these methods have ignored the geometrical information of data. This defect has been overcome in the work of [28], which extends the work of [31] by including a graph regularization. Liang et al. [29] have tried to make use of the geometrical information of data by involving an adaptive local structure learning module. In [58], hypergraph regularization is used to encode the structure of data.

Although the methods in [28], [29], and [58] consider both the discriminative information and the geometrical information of the data, the price is that too many hyper-parameters make it hard for these methods to achieve satisfying performance on different datasets. Another problem is that, while normalizing strategies are widely discussed in unsupervised MVNMF methods, they are rarely explored under the framework of semi-supervised MVNMF. Addressing these issues, in this article, a semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) is proposed. In this method, the discriminative term and the feature alignment term are fused as one regularization prior termed fusion regularization. The meaning of this regularization comes from two aspects. On the one hand, the features of different views are fused by this regularization. On the other hand, the feature fusion constraint and the discriminative constraint are integrated as one constraint with this regularization, which effectively reduces the number of hyper-parameters. Additionally, to further make use of the geometrical information of the data, a graph regularization is also constructed for each view. To align the feature scales of different views, two different feature normalizing strategies are adopted, which correspond to two variants denoted as FRSMNMF_N1 and FRSMNMF_N2. The influences of these two normalizing strategies are compared in the experiments, and specific iterative optimization schemes are designed for each variant. The contributions of this article are summarized as follows:

A novel fusion regularization based semi-supervised MVNMF is presented. In this work, the exploration of discriminative information and feature alignment is achieved in one term, which reduces the number of hyper-parameters and enlarges the inter-class distinction;

Two feature normalizing strategies are adopted to align the feature scales of different views, which produces two variants of the proposed framework;

For the two variants of the proposed framework, two specific iterative optimizing schemes are designed to solve the corresponding minimization problems effectively;

The effectiveness of the proposed methods is evaluated by comparing with some recently proposed representative unsupervised and semi-supervised MVNMFs on six datasets.

The rest of this article is organized as follows: Section 2 introduces some related works on multi-view clustering. In Section 3, the details of the proposed semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) and its two specific variants are introduced. Section 4 gives the corresponding optimizing strategy, convergence proof and computational complexity analysis. In Section 5, the experimental results are demonstrated. Finally, the conclusion of this article is made in Section 6.


2 RELATED WORK

In this section, we briefly review the related unsupervised and semi-supervised multi-view clustering methods.

2.1 Unsupervised Multi-view Clustering

Most unsupervised MVNMF methods are based on the idea of aligning the multiple views in a shared common subspace to fuse the information. In [32], Liu et al. proposed a pioneering MVNMF method which tries to learn a consensus representation by minimizing a centroid co-regularization. With this regularization, the feature matrices of the different views are pushed toward a consensus feature matrix, and multi-view alignment is thereby achieved. Furthermore, a feature scale normalization matrix is introduced to constrain the feature scales of different views to be similar, hoping to facilitate the feature alignment. In [39], Rai et al. have adopted the same strategy to make the scales of different views comparable. In [52], Yang et al. have explored another way to reduce the distribution divergences of the feature matrices. They presented an MVNMF based on nonnegative matrix tri-factorization [53]. In order to restrict the feature scales of different views to be similar, the column sums of the product of the basis matrix and the shared embedding matrix are constrained to be 1. Essentially, Liu et al. [32], Rai et al. [39], and Yang et al. [52] have adopted the same feature normalizing strategy, in which the column summation of the basis matrix is constrained to be 1, i.e., \(\sum \nolimits _i {{{\bf W}}_{ij}^v = 1}\), and the norms are compensated to the coefficient matrix in the co-regularization term. Actually, to improve the performance of unsupervised MVNMFs, various normalizing operations have been explored. For example, to guarantee the numerical stability of NMF, Shao et al. [41] have constrained the column summation of the basis matrix to be 1 without compensating the norms to the coefficient matrix in the co-regularization term. To ensure the solution uniqueness of NMF, Hu et al. [17] have constrained the row summation of the coefficient matrix to be 1, i.e., \(\sum \nolimits _j {{{\bf H}}_{ij}^v = 1}\). Most existing methods are based on centroid co-regularization, while in [46] a pair-wise co-regularization was proposed to align the multiple views pair-wisely. By minimizing this co-regularization, the alignment is acquired through pushing the representations of different views together. From the above, we can see that most MVNMF methods were developed by aligning the multiple views while ignoring the diversity of the different views. Addressing this issue, in [42], an MVNMF called diverse nonnegative matrix factorization has been proposed. In this method, a diverse term is designed to encourage the heterogeneity of different views and to learn more comprehensive information. Recently, with the flourishing of deep learning in computer vision and pattern recognition, some "deep" models have been proposed for the multi-view clustering task [13]. For example, a deep multi-view semi-nonnegative matrix factorization is proposed in [60]. In this method, an adaptive weighting strategy is adopted to balance the effect of different views. Similarly, a deep MVNMF is proposed in [26]; graph regularization is applied in all layers rather than only in the final layer as in [60], and all factor matrices are restricted to be nonnegative. All of the previously reviewed methods focus on dealing with "well-established" data, meaning that all views have the correct sample-level correspondence. In [20], Huang et al. have developed a partially view-aligned clustering method which can tackle multi-view data that are not fully view-aligned. "Partially view-aligned" means that only a part of the data have the correct sample-level correspondence.

2.2 Semi-Supervised Multi-View Clustering

In recent years, several semi-supervised MVNMFs have been proposed, and promising results have been obtained with label information. Wang et al. [43] have proposed a semi-supervised MVNMF based on CNMF for the first time. In this work, an adaptive weighting strategy is applied to balance the effect of different views. To learn a consensus representation, a centroid co-regularization similar to [32] is utilized in their method. In [5] and [6], Cai et al. have developed two semi-supervised MVNMF methods; these two methods are also based on CNMF. Both methods have adopted pair-wise co-regularization with Euclidean distance to align multiple views as in [46]. The difference between [5] and [6] is the second regularizing term. In [5], \(\ell _{2,1}\)-norm regularization of the auxiliary matrix is used, while an orthonormality constraint on the auxiliary matrix is used in [6]. The former tries to get sparse representations for each view, and the latter tries to constrain the feature scales of different views to be similar. Yang et al. [54] have improved the work in [6] by relaxing the label constraint matrix. The common drawback of the above methods is that they all ignore the discriminative information of data. To make use of this information, Jiang et al. [22] have proposed a unified latent factor learning model trying to reconstruct the label matrix with the product of the partially shared coefficient matrix and an auxiliary matrix. Further, Liu et al. [31] have improved the above work by splitting the coefficient matrix into a private part and a common part. Based on [31], Liang et al. [28] have tried to improve the performance of the method by imposing a graph regularization on the coefficient matrix. The problem with [28] is that it uses too many hyper-parameters, which makes this method very complex and hard to fine-tune. Wang et al. [44] have presented a semi-supervised multi-view clustering model with weighted anchor graph embedding, in which the anchors are constructed based on label information. Nie et al. [36] have developed a semi-supervised multi-view learning method which can model the structure of data adaptively. Based on this work, Liang et al. [29] have presented a semi-supervised MVNMF method based on label propagation. Zhang et al. [58] have proposed a semi-supervised MVNMF method, called dual hypergraph regularized partially shared NMF, in which hypergraph regularization is imposed on both the coefficient matrix and the basis matrix. Similar to [28], these two methods also suffer from involving too many hyper-parameters. Recently, some deep multi-view clustering methods have also been developed. Zhao et al. [62] have proposed a deep semi-supervised MVNMF method which encodes the discriminative information of data by constructing an affinity graph for intra-class compactness and a penalty graph for inter-class distinctness. Chen et al. [11] have presented an autoencoder-based semi-supervised multi-view clustering method which encodes the discriminative information of data by introducing a pairwise constraint.

From the above review, we can find that, although various matrix normalizing strategies have been widely explored in many unsupervised MVNMF methods, these strategies are rarely discussed under the semi-supervised MVNMF framework. In the previously introduced normalizing strategies, the column summation of the basis matrix is usually normalized to be 1, and sometimes the norms are compensated to the coefficient matrix in the co-regularization term. However, it is more reasonable to constrain the columns of the basis matrix to be unit vectors, i.e., \(||{{\bf W}}_{\cdot j}^v|{|_2} = 1\). Additionally, recently proposed semi-supervised MVNMF methods involve too many hyper-parameters, which makes these methods very complex and hard to fine-tune. Addressing these issues, in the following sections, we will introduce a novel semi-supervised MVNMF framework with fusion regularization. This framework can be implemented with two different feature normalizing strategies, and the fusion regularization makes use of the discriminative information of data while reducing the number of hyper-parameters.


3 SEMI-SUPERVISED MULTI-VIEW CLUSTERING WITH FUSION REGULARIZATION

For a traditional MVNMF framework, the objective function can be written as follows: (1) \(\begin{equation} \sum \limits _v {{\rm {||}}{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}{\rm {||}}_F^2} + \lambda \mathcal {R}({\bf H}^v), \end{equation}\) where \({\bf X}^v\in \mathbb {R}_{+}^{ m^v\times n}\), \({\bf W}^v\in \mathbb {R}_{+}^{ m^v\times d^v}\) and \({\bf H}^v\in \mathbb {R}_{+}^{ d^v\times n}\) are the data matrix, the basis matrix, and the coefficient matrix of the \(v\)th view, respectively. \(m^v\) and \(d^v\) are the dimensions of the original space and the latent space for the \(v\)th view, and \(n\) is the size of the dataset. \(\mathcal {R}({\bf H}^v)\) denotes the regularization term, which is employed to introduce extra priors or constraints for enhancing the clustering performance, such as manifold regularization or co-regularization. \(\lambda\) is the hyper-parameter which regulates the participation of these constraints in the optimization process. Both the selection of the hyper-parameter \(\lambda\) and the regularization prior \(\mathcal {R}({\bf H}^v)\) play an important role in the design of the model. To achieve satisfactory clustering results, it is crucial to enlarge the inter-class distinction, shrink the intra-class distances, and achieve multi-view feature alignment effectively. However, most of the existing multi-view NMF methods have failed to consider discriminative information and feature alignment simultaneously. Although some algorithms like AMVNMF, MVCNMF, and MVOCNMF have adopted label information of the data, they have not further considered the distinguishability among classes. Therefore, the clustering performance is limited. In addition, the model should avoid introducing too many hyper-parameters while integrating more regularization priors \(\mathcal {R}({\bf H}^v)\) for feature learning.
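For concreteness, the generic objective in Equation (1) can be evaluated in a few lines of NumPy. The following is only an illustrative sketch under our own naming; mvnmf_objective, the per-view list arguments, and the callable reg are hypothetical rather than the paper's implementation.

```python
import numpy as np

def mvnmf_objective(Xs, Ws, Hs, lam, reg):
    """Generic MVNMF objective of Eq. (1): per-view reconstruction error
    plus a regularizer reg(H^v) weighted by lam. Illustrative sketch only."""
    return sum(np.linalg.norm(X - W @ H, 'fro') ** 2 + lam * reg(H)
               for X, W, H in zip(Xs, Ws, Hs))
```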

For enforcing the inter-class distinctness with multi-view features, in this article, we first construct a novel fusion regularization which considers the distinct information among clusters and the alignment of different views together. It can be formulated as follows: (2) \(\begin{equation} {\rm {||}}[{\bf Y}, {\bf H}_c] -[{{\bf H}^v_l}, {{\bf H}^v_{ul}}]{\rm {||}}_F^2, \end{equation}\) where \({\bf Y}\in \lbrace 0, 1\rbrace ^{d^v\times n_l}\) is the label matrix and \({\bf H}^v_l\in \mathbb {R}_{+}^{d^v\times n_l}\) is the feature matrix of labeled samples for the \(v\)th view. \(n_l\) is the number of labeled data points. \({\bf H}_c\in \mathbb {R}_{+}^{d^v\times n_{ul}}\) denotes the common feature matrix and \({\bf H}^v_{ul}\in \mathbb {R}_{+}^{d^v\times n_{ul}}\) is the feature matrix of unlabeled data points for the \(v\)th view. \(n_{ul}\) is the number of unlabeled data points. Obviously, in this work, \(d^1=d^2=\cdots =d^V=C\), where \(C\) is the number of classes. A general way to simultaneously consider discriminative information and feature alignment is to define them separately as follows: (3) \(\begin{equation} ||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2 + ||{{\bf H}}_c^* - {{{\bf H}}^v}{\rm {||}}_F^2, \end{equation}\) where \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\) and \({{\bf H}}_c^* = [{{\bf H}}_{cl}^*,{{\bf H}}_{c}]\). Then, Equation (3) can be further written as follows: (4) \(\begin{equation} ||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2 + ||[{{\bf H}}_{cl}^*,{{\bf H}}_{c}] - [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]{\rm {||}}_F^2. \end{equation}\) In Equation (4), the first term is used to align the coefficient matrix of the labeled samples (i.e., \({{\bf H}}_l^v\)) to the label matrix (i.e., \({\bf Y}\)), and the second term is used to align the whole coefficient matrix of all samples (i.e., \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\)) to the common consensus matrix \({{\bf H}}_c^* = [{{\bf H}}_{cl}^*,{{\bf H}}_c]\). We can see that the representation of the labeled samples (\({{\bf H}}_l^v\)) is aligned to both \({\bf Y}\) and \({{\bf H}}_{cl}^*\). The goal of aligning \({{\bf H}}_l^v\) to \({\bf Y}\) is to discover the discriminative information of data, while the goal of aligning \({{\bf H}}_l^v\) to \({{\bf H}}_{cl}^*\) is to align the multiple views by pushing the representations of different views (\({{\bf H}}_l^v\), \(v=1, 2, \ldots , V\)) close to each other. Actually, aligning \({{\bf H}}_l^v\) to \({\bf Y}\) can also achieve this goal, because the label matrix \({\bf Y}\) is shared by all views. With this intuition, we substitute \({{\bf H}}_{cl}^*\) with \({\bf Y}\) in Equation (4) and remove the first term; then, we obtain the proposed Equation (2). We can now see that by aligning \({{\bf H}}_l^v\) with \({\bf Y}\), the discriminative information can be discovered and the \({{\bf H}}_l^v\) (\(v=1, 2, \ldots , V\)) of different views can be pushed close to each other, while the \({{\bf H}}_{ul}^v\) (\(v=1, 2, \ldots , V\)) of different views can be pushed together by aligning them to \({{\bf H}}_c\). The meaning of the fusion regularization is twofold: (1) the features of different views are fused by this regularization; (2) the feature fusion constraint and the discriminative constraint are integrated as one constraint with this regularization, which reduces the introduction of hyper-parameters.
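As a quick illustration, the fusion term in Equation (2) amounts to stacking \({\bf Y}\) and \({\bf H}_c\) column-wise and measuring the Frobenius distance to the per-view coefficient matrix. A minimal NumPy sketch follows; the function name and argument layout are our own, not the paper's released code.

```python
import numpy as np

def fusion_regularization(Y, H_c, H_v):
    """Fusion term ||[Y, H_c] - [H_l^v, H_ul^v]||_F^2 of Eq. (2).

    Y   : (C, n_l)  one-hot label matrix of the labeled samples
    H_c : (C, n_ul) common feature matrix of the unlabeled samples
    H_v : (C, n)    coefficient matrix of view v, labeled columns first
    """
    H_yc = np.hstack([Y, H_c])                    # the target [Y, H_c]
    return np.linalg.norm(H_yc - H_v, 'fro') ** 2
```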

In Equations (2), (3), and (4), the label matrix \({\bf Y}\) is approximated by \({{\bf H}}_l^v\). We can denote the data matrix \({{\bf X}}^v\) as \({{{\bf X}}^v} = [{{\bf X}}_l^v,{{\bf X}}_{ul}^v]\), corresponding to the low-dimensional representation \({{{\bf H}}^v} = [{{\bf H}}_l^v,{{\bf H}}_{ul}^v]\). Then \({{\bf H}}_l^v\) is the low-dimensional representation of \({{\bf X}}_l^v\). Actually, the label matrix \({\bf Y}\) can also be seen as a feature representation of \({{\bf X}}_l^v\), which contains the discriminative information of data. So, by minimizing \(||{{\bf Y}} - {{\bf H}}_l^v{\rm {||}}_F^2\), \({\bf Y}\) is reconstructed by \({{\bf H}}_l^v\), and the discriminative information in \({\bf Y}\) can be learned by \({{\bf H}}_l^v\). Note that we do not constrain the column summation of \({{\bf H}}_l^v\) to be 1, i.e., \(\sum \nolimits _i {{\rm {H}}_{l(i)}^v \ne 1}\), so \({{\bf H}}_l^v\) is a relaxed representation of \({\bf Y}\).

Additionally, to make use of the geometry information of the multi-view data in each view for decreasing the intra-class diversities, a graph regularizer is constructed based on both labeled samples and unlabeled samples, which is defined as follows: (5) \(\begin{equation} \sum \limits _{i = 1}^n {\sum \limits _{j = 1}^n {\big |\big |h_i^v - h_j^v\big |\big |_2^2{\bf S}_{ij}^v} } = {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T), \end{equation}\) where \({\bf L}^v={\bf D}^v-{\bf S}^v\), \({\bf D}_{ii}^v=\sum _{j}{\bf S}_{ij}^v\) (or \({\bf D}_{ii}^v=\sum _{j}{\bf S}_{ji}^v\)). \({\bf S}^v\) is defined as follows: (6) \(\begin{equation} {\bf S}_{ij}^v = \left\lbrace \begin{array}{*{20}{l}} {{1}}&{{\rm {if}} \ {h_i^v} \in {N_k}(h_j^v) \ {\rm {or}} \ {h_j^v} \in {N_k}(h_i^v)}\\ 0&{{\rm {otherwise}}} \end{array} \right., \end{equation}\) where \({N_k}(h_i^v)\) consists of the \(k\) nearest neighbors of \(h^v_i\).
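In practice, the binary k-NN affinity in Equation (6) and the Laplacian \({\bf L}^v={\bf D}^v-{\bf S}^v\) can be built with standard tooling. The sketch below constructs the graph once from the raw view data, a common simplification we assume here (the paper states neighborhoods over the representations), and the function name is ours.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X_v, k=5):
    """Symmetric binary k-NN graph S^v (Eq. (6)) and Laplacian L^v = D^v - S^v.

    X_v : (n, m_v) view data with one sample per row.
    """
    S = kneighbors_graph(X_v, k, mode='connectivity',
                         include_self=False).toarray()
    S = np.maximum(S, S.T)        # 'or' rule: i in N_k(j) or j in N_k(i)
    D = np.diag(S.sum(axis=1))
    return D - S, S, D
```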

Considering the above properties of inter- and intra-class, the overall objective function of our proposed semi-supervised multi-view nonnegative matrix factorization with fusion regularization (FRSMNMF) can be described as follows: (7) \(\begin{equation} \begin{array}{l} {O_{{\rm {FRSMNMF}}}} = \sum \limits _{v = 1}^{{V}} {{ ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2} + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ +\,\, \beta ||[{\bf Y}, {\bf H}_c] - [{{\bf H}^v_l}, {{\bf H}^v_{ul}}]||_F^2 } \\ s.t. \ {\bf W}^v,{\bf H}^v,{{\bf H}_c} \ge 0. \end{array} \end{equation}\)

To tackle multi-view tasks, another important issue is to align multiple features effectively for aggregating the information of different views. Instead of optimizing Equation (7) directly, we first restrict the scales of the multiple features to a comparable level. For this target, the column vectors of \({\bf W}^v\) are constrained to satisfy \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\). One possible way to account for the above constraints is to add an extra regularization term to Equation (7), but this would make the optimization problem very complicated. Instead of taking the direct strategy, an alternative scheme is to compensate the norms of the basis matrix into the coefficient matrix. Then, the objective function of FRSMNMF in Equation (7) can be rewritten as: (8) \(\begin{equation} \begin{array}{l} {O_{{\rm {FRSMNMF}}}} = \sum \limits _{v = 1}^{{V}} {\lbrace ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2} + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad \qquad \qquad +\,\, \beta ||[{\bf Y}, {\bf H}_c] - {{\bf Q}^v}{{\bf H}^v}||_F^2\rbrace \\ s.t. \ {{\bf W}^v}, {{\bf H}^v},{{\bf H}_c} \ge 0, \end{array} \end{equation}\) where \({\bf Q}^v\) is defined as (9) \(\begin{equation} {{{\bf Q}}^v} = Diag\left(\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,1}^v}^2}} } ,\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,2}^v}^2}} } ,\ldots ,\sqrt {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{{_{i,d^v}^v}^2}} }\right)\!, \end{equation}\) or (10) \(\begin{equation} {{{\bf Q}}^v} = Diag\left({\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,1}^v} } , {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,2}^v}} } ,\ldots , {\sum \limits _{i = 1}^{{m^v}} {{{\bf W}}{_{i,d^v}^v}} }}\right)\!. \end{equation}\)
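The compensation matrices in Equations (9) and (10) are simply diagonal matrices of the column L2 norms or column sums of \({\bf W}^v\); a small sketch (with a hypothetical function name) is:

```python
import numpy as np

def scale_matrix(W_v, norm='l2'):
    """Diagonal compensation matrix Q^v: column L2 norms (Eq. (9)) for
    FRSMNMF_N1, or column sums (Eq. (10)) for FRSMNMF_N2."""
    if norm == 'l2':
        q = np.sqrt((W_v ** 2).sum(axis=0))   # Eq. (9)
    else:
        q = W_v.sum(axis=0)                   # Eq. (10)
    return np.diag(q)
```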

For simplicity, the proposed methods with \(||{\bf W}_{\cdot j}^v||_2=1\) and \(||{\bf W}_{\cdot j}^v||_1=1\) are denoted as FRSMNMF_N1 and FRSMNMF_N2, respectively, corresponding to the two different normalization manners. In the following sections, the optimization algorithms of FRSMNMF_N1 and FRSMNMF_N2 will be introduced in detail.


4 OPTIMIZATION PROCEDURE

4.1 Optimization Algorithm of FRSMNMF_N1

The problem of minimizing Equation (8) cannot be solved directly. In this section, an alternating optimization strategy is developed, which breaks the original problem into several subproblems such that each subproblem is tractable.

For a specific view \(v\), its optimization with \({\bf W}^{v}\) and \({\bf H}^{v}\) does not depend on other views. The problem of minimizing Equation (8) can be written as follows: (11) \(\begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v},{{\bf H}^v},{{\bf H}_c} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T) \\ \qquad \qquad +\,\, \beta ||[{\bf Y}, {\bf H}_c] - {{\bf Q}^v}{{\bf H}^v}||_F^2 \end{array} \end{equation}\)

4.1.1 Fixing \({\bf H}_c\), updating \({\bf W}^v\) and \({\bf H}^v\).

(1) Fixing \({\textbf {H}_c}\) and \({\bf H}^v\), updating \({\bf W}^v\)

When \({{\bf H}_c}\) and \({\bf H}^v\) are fixed, the above equation is equivalent to the following problem: (12) \(\begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v}\ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf H}^v}^T{{\bf Q}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T), \end{array} \end{equation}\) where \({\bf H}_{yc}=[{\bf Y}, {\bf H}_c]\).

Let (13) \(\begin{equation} \begin{array}{l} {{\bf Y}^v_1} = [{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T] \odot {\bf I}\\ \quad \;\,={\bf Y}^{v+}_1-{\bf Y}^{v-}_1\\ {{\bf Y}^v_2} = [{{\bf H}^v}{{\bf H}^v}^T] \odot {\bf I}, \end{array} \end{equation}\) where \(\odot\) is the element-wise product operator, \({\bf I}\) is the identity matrix and the matrices \({\bf Y}^{v+}_1\) and \({\bf Y}^{v-}_1\) are defined as (14) \(\begin{equation} \begin{array}{l} {\bf Y}^{v+}_1 = [{{\bf H}^v}{{\bf D}^v}{{\bf H}^v}^T] \odot {\bf I}\\ {\bf Y}^{v-}_1 = [{{\bf H}^v}{{\bf S}^v}{{\bf H}^v}^T] \odot {\bf I}. \end{array} \end{equation}\) Then, Equation (12) with \({\bf Q}^v\) defined as in Equation (9) can be further rewritten as (15) \(\begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_1}{{\bf W}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_2}{{\bf W}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T). \end{array} \end{equation}\)

Since \({{\bf W}^v} \ge 0\), we introduce the Lagrangian multipliers \(\psi _{ih}\) for the constraints \({\bf W}^v_{ih} \ge 0\) and let \(\Psi =[\psi _{ih}]\). Then, the Lagrangian function \({\mathcal {L}}\) is as follows: (16) \(\begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_1}{{\bf W}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf W}^v}{{\bf Y}^v_2}{{\bf W}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T) + {\rm {tr}}(\Psi {{\bf W}^v}^T). \end{array} \end{equation}\)

The partial derivative of \({\rm {tr}}({{\bf H}^v}{{\bf H}_{yc}^T}{{\bf Q}^v}^T)\) with respect to \({\bf W}^v_{ih}\) is (17) \(\begin{equation} \begin{array}{l} \displaystyle {\frac{{\partial {\rm {tr}}({{{\bf H}}^v}{{\bf H}}_{yc}^T{{{\bf Q}}^v}^T)}}{{\partial {{\bf W}}_{ih}^v}} = \frac{{\partial {{({{\bf H}^v}{{\bf H}_{yc}^T})}_{hh}}\sqrt {\sum \nolimits _{k = 1}^{{m^v}} {{\bf W}{{_{kh}^v}^2}} } }}{{\partial {\bf W}_{ih}^v}}}\\ \quad \;\,\displaystyle {= {({{\bf H}^v}{{\bf H}_{yc}^T})_{hh}}\frac{{{\bf W}_{ih}^v}}{{\sqrt {\sum \nolimits _{k = 1}^{{m^v}} {{\bf W}{{_{kh}^v}^2}} } }}}\\ \quad \;\,= ({\bf W}^v({{\bf Q}^v})^{-1}({{\bf H}^v}{{\bf H}_{yc}^T}\odot {\bf I}))_{ih}\\ \quad \;\,= ({\bf W}^v({{\bf Q}^v})^{-1}{{{\bf Y}}^v_3})_{ih}, \end{array} \end{equation}\) where \({{{\bf Y}}^v_3}=[{{\bf H}^v}{{\bf H}_{yc}^T}]\odot {\bf I}\).

Until now, the partial derivative of (16) with respect to \({\bf W}^v\) can be written as follows: (18) \(\begin{equation} \begin{array}{*{20}{l}} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf W}}^v}}}}=&\!\!\!\! 2{{\bf W}}^v{{\bf H}}^v{{\bf H}^v}^T - 2{{\bf X}^v}{{\bf H}^v}^T + 2\alpha {{{\bf W}}^v}{{{\bf Y}}^v_1}\\ &\!\!\!\!+\,\, 2\beta {{{\bf W}}^v}{{{\bf Y}}^v_2} { - 2\beta {{{\bf W}}^v}({{\bf Q}^v})^{ - 1}{{{\bf Y}}^v_3}} { + \Psi .} \end{array} \end{equation}\)

Setting the above equation to zero and using the Karush–Kuhn–Tucker (KKT) condition [3] of \(\psi _{ih} {\bf W}^v_{ih}=0\), we have (19) \(\begin{equation} \begin{array}{l} 2({{\bf W}}^v{{\bf H}}^v{{\bf H}^v}^T)_{ih}{{\bf W}^v_{ih}} - 2({{\bf X}^v}{{\bf H}^v}^T)_{ih}{{\bf W}^v_{ih}}\\ +\,\, 2\alpha ({{\bf W}^v}{{\bf Y}^v_1})_{ih}{{\bf W}^v_{ih}} + 2\beta ({{\bf W}^v}{{\bf Y}^v_2})_{ih}{{\bf W}^v_{ih}}\\ -\,\, 2\beta ({{{\bf W}}^v}({{\bf Q}^v})^{ - 1}{{{\bf Y}}^v_3})_{ih}{{\bf W}^v_{ih}}\\ =0, \end{array} \end{equation}\) which leads to the following update rule: (20) \(\begin{equation} {{\bf W}}_{ih}^v = {{\bf W}}_{ih}^v\frac{{{{({{{\bf X}}^v}{{{\bf H}}^v}^T + \alpha {{{\bf W}}^v}{{\bf Y}}_1^{v - } + \beta {{{\bf W}}^v}({{{\bf Q}}^v})^{ - 1}{{\bf Y}}_3^v)}_{ih}}}}{{{{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T + \alpha {{{\bf W}}^v}{{\bf Y}}_1^{v + } + \beta {{{\bf W}}^v}{{\bf Y}}_2^v)}_{ih}}}}. \end{equation}\)

(2) Fixing \({\textbf {H}_c}\) and \({\bf W}^v\), updating \({\bf H}^v\)

After the updating of \({\bf W}^v\), the column vectors of \({\bf W}^v\) are normalized with \({\bf Q}^v\) in Equation (9) and then the norm is conveyed to the coefficient matrix \({\bf H}^v\), that is: (21) \(\begin{equation} {{{\bf W}}^v} \Leftarrow {{{\bf W}}^v}({{{\bf Q}}^v})^{ - 1},{{{\bf H}}^v} \Leftarrow {{{\bf Q}}^v}{{{\bf H}}^v}. \end{equation}\)

When \({{\bf H}_c}\) and \({\bf W}^v\) are fixed, Equation (11) is equivalent to the following problem: (22) \(\begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf H}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf H}^v}{{\bf H}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^{T}}). \end{array} \end{equation}\)

Since \({{\bf H}^v} \ge 0\), we introduce the Lagrangian multipliers \(\phi _{jh}\) for the constraints \({\bf H}^v_{jh} \ge 0\) and let \(\Phi =[\phi _{jh}]\). Then, the Lagrangian function \({\mathcal {L}}\) is as follows: (23) \(\begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf H}^v}{{\bf H}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}^{T}}) + {\rm {tr}}(\Phi {{\bf H}^v}^T). \end{array} \end{equation}\)

The partial derivative of (23) with respect to \({\bf H}^v\) is as follows: (24) \(\begin{equation} \begin{array}{*{20}{l}} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf H}}^v}}} }=&\!\!\!\! 2{{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v} -2{{{\bf W}}^v}^T{{{\bf X}}^v}\\ &\!\!\!\!+\,\, 2\alpha {{{\bf H}}^v}{{{\bf D}}^v} - 2\alpha {{{\bf H}}^v}{{\bf S}^v} + 2\beta {{{\bf H}}^v} - 2\beta {{{\bf H}}_{yc}} + \Phi . \end{array} \end{equation}\) Similarly, setting \(\displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{{\bf H}}^v}}}} = 0\) and using KKT condition of \(\phi _{hj} {\bf H}^v_{hj}=0\), then we have (25) \(\begin{equation} \begin{array}{l} 2({{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v})_{hj}{{\bf H}^v_{hj}} -2({{{\bf W}}^v}^T{{{\bf X}}^v})_{hj}{{\bf H}^v_{hj}} + 2\alpha ({{{\bf H}}^v}{{{\bf D}}^v})_{hj}{{\bf H}^v_{hj}}\\ -\,\, 2\alpha ({{{\bf H}}^v}{{\bf S}^v})_{hj}{{\bf H}^v_{hj}} + 2\beta ({{{\bf H}}^v})_{hj}{{\bf H}^v_{hj}} - 2\beta ({{{\bf H}}_{yc}})_{hj}{{\bf H}^v_{hj}}\\ =0, \end{array} \end{equation}\) which leads to the following update rules: (26) \(\begin{equation} \begin{array}{l} {{\bf H}}_{hj}^v = \displaystyle {{{\bf H}}_{hj}^v\frac{{({{{\bf W}}^v}^T{{{\bf X}}^v} + \alpha {{{\bf H}}^v}{{{\bf S}}^v} + \beta {{{\bf H}}_{yc}}{)_{hj}}}}{{({{{\bf W}}^v}^T{{{\bf W}}^v}{{{\bf H}}^v} + \alpha {{{\bf H}}^v}{{{\bf D}}^v} + \beta {{{\bf H}}^v}{)_{hj}}}}}. \end{array} \end{equation}\)

4.1.2 Fixing \({\bf W}^v\) and \({\bf H}^v\), updating \({\bf H}_c\).

As \({\bf W}^v\) is normalized in each iteration, the partial derivative of (8) with respect to \({\bf H}_c\) is as follows: (27) \(\begin{equation} \begin{array}{{l}} \displaystyle {\frac{{\partial {O_{\rm {FRSMNMF}}}}}{{\partial {{{\bf H}}_c}}} = \frac{{\partial \sum \nolimits _{v = 1}^{{V}} {\beta ||{{{\bf H}_{ul}^v}} - {{{\bf H}}_c}||_F^2} }}{{\partial {{{\bf H}}_c}}}}\\ \qquad \qquad \quad = \sum \limits _{v = 1}^{{V}} {[ - 2\beta {{{\bf H}_{ul}^v}} + 2\beta {{{\bf H}}_c}]} = 0. \end{array} \end{equation}\)

Then, the exact solution for \({{\bf H}}_c\) is (28) \(\begin{equation} {{{\bf H}}_c} = \frac{{\sum \nolimits _{v = 1}^{{V}} {{{{\bf H}_{ul}^v}}} }}{{{V}}} \ge 0. \end{equation}\)

The optimizing scheme of FRSMNMF_N1 is summarized in Algorithm 1.
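To make the alternating scheme concrete, the following NumPy sketch performs one pass of the FRSMNMF_N1 updates (Equations (20), (21), (26), and (28)). It is a minimal illustration under our own naming and data-layout assumptions (per-view lists, labeled columns first), not the released implementation.

```python
import numpy as np

def frsmnmf_n1_iteration(Xs, Ws, Hs, H_c, Y, Ss, Ds, alpha, beta, eps=1e-10):
    """One pass of the FRSMNMF_N1 multiplicative updates.

    Xs, Ws, Hs, Ss, Ds are per-view lists; Y is the (C, n_l) label matrix and
    H_c the (C, n_ul) common matrix. Labeled samples occupy the first columns.
    """
    n_l = Y.shape[1]
    H_yc = np.hstack([Y, H_c])                       # the target [Y, H_c]
    for v in range(len(Xs)):
        X, W, H, S, D = Xs[v], Ws[v], Hs[v], Ss[v], Ds[v]
        I = np.eye(H.shape[0])
        # Diagonal auxiliary matrices of Eqs. (13), (14), and (17)
        Y1p = (H @ D @ H.T) * I
        Y1m = (H @ S @ H.T) * I
        Y2 = (H @ H.T) * I
        Y3 = (H @ H_yc.T) * I
        Q = np.diag(np.sqrt((W ** 2).sum(axis=0)) + eps)     # Eq. (9)
        Q_inv = np.diag(1.0 / np.diag(Q))
        # Eq. (20): multiplicative update of W^v
        num = X @ H.T + alpha * W @ Y1m + beta * W @ Q_inv @ Y3
        den = W @ H @ H.T + alpha * W @ Y1p + beta * W @ Y2 + eps
        W *= num / den
        # Eq. (21): normalize the columns of W^v and compensate into H^v
        Q = np.diag(np.sqrt((W ** 2).sum(axis=0)) + eps)
        W[:] = W @ np.diag(1.0 / np.diag(Q))
        H[:] = Q @ H
        # Eq. (26): multiplicative update of H^v
        num = W.T @ X + alpha * H @ S + beta * H_yc
        den = W.T @ W @ H + alpha * H @ D + beta * H + eps
        H *= num / den
    # Eq. (28): closed-form update of the common matrix H_c
    H_c[:] = np.mean([H[:, n_l:] for H in Hs], axis=0)
    return Ws, Hs, H_c
```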

4.2 Optimization Algorithm of FRSMNMF_N2

Comparing with FRSMNMF_N1, the main difference in the optimizing procedure for FRSMNMF_N2 lies in the updating rule of \({\bf W}^v\). So, in this section, only the updating rule of \({\bf W}^v\) is deduced. Fixing \({\bf H}_c\) and \({\bf H}^v\), the objective function of FRSMNMF_N2 with respect to \({\bf W}^v\) is written as follows: (29) \(\begin{equation} \begin{array}{l} \mathop {\min }\limits _{{{\bf W}^v} \ge 0} ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ +\,\, \beta ||[{\bf Y}, {\bf H}_c] -{{\bf Q}^v}{{\bf H}^v}||_F^2, \end{array} \end{equation}\) where \({\bf Q}^v\) is defined as in Equation (10).

Since \({{\bf W}^v} \ge 0\), we introduce the Lagrangian multipliers \(\psi _{ih}\) for the constraints \({\bf W}^v_{ih} \ge 0\) and let \(\Psi =[\psi _{ih}]\). Then, the Lagrangian function \({\mathcal {L}}\) is as follows: (30) \(\begin{equation} \begin{array}{l} {\mathcal {L}} = ||{{\bf X}^v} - {{\bf W}^v}{{\bf H}^v}||_{F}^2 + \alpha {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf L}^v}{{\bf H}^v}^T{{\bf Q}^v}^T)\\ \qquad +\,\, \beta {\rm {tr}}({{\bf Q}^v}{{\bf H}^v}{{\bf H}^v}^T{{\bf Q}^v}^T - 2{{\bf H}^v}{{\bf H}_{yc}}^T{{\bf Q}^v}^T) + {\rm {tr}}(\Psi {{\bf W}^v}^T). \end{array} \end{equation}\)

The partial derivative of (30) with respect to \({\bf W}^v\) can be written as follows: (31) \(\begin{equation} \begin{array}{l} \displaystyle {\frac{{\partial \mathcal {L}}}{{\partial {{\bf W}}_{ih}^v}} }= 2{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)_{ih}} - 2({{{\bf X}}^v}{{{\bf H}}^{vT}})_{ih}\\ +\,\, 2\alpha {({{\bf Q}^v}{{\bf Y}}_1^v)_{hh}} + 2\beta {({{\bf Q}^v}{{\bf Y}}_2^v)_{hh}} - 2\beta {({{\bf Y}}_3^v)_{hh}} + {\psi _{ih}}, \end{array} \end{equation}\) where \({\bf Y}^v_1\), \({\bf Y}^v_2\), and \({{\bf Y}^v_3}\) are defined in Equation (13) and Equation (17).

Setting the above equation to zero and using the KKT condition of \(\psi _{ih} {\bf W}^v_{ih}=0\), we have (32) \(\begin{equation} \begin{array}{l} 2{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)_{ih}}{{\bf W}}_{ih}^v - 2{({{{\bf X}}^v}{{{\bf H}}^{vT}})_{ih}}{{\bf W}}_{ih}^v\\ +\,\, 2\alpha {({{\bf Q}^v}{{\bf Y}}_1^v)_{hh}}{{\bf W}}_{ih}^v + 2\beta {({{\bf Q}^v}{{\bf Y}}_2^v)_{hh}}{{\bf W}}_{ih}^v\\ -\,\, 2\beta {({{\bf Y}}_3^v)_{hh}}{{\bf W}}_{ih}^v = 0, \end{array} \end{equation}\) which leads to the following update rule: (33) \(\begin{equation} {{\bf W}}_{ih}^v = {{\bf W}}_{ih}^v\frac{{{{({{{\bf X}}^v}{{{\bf H}}^{vT}})}_{ih}} + {{(\alpha {{{\bf Q}}^v}{{\bf Y}}_1^{v - } + \beta {{\bf Y}}_3^v)}_{hh}}}}{{{{({{{\bf W}}^v}{{{\bf H}}^v}{{{\bf H}}^v}^T)}_{ih}} + {{(\alpha {{{\bf Q}}^v}{{\bf Y}}_1^{v + } + \beta {{{\bf Q}}^v}{{\bf Y}}_2^v)}_{hh}}}}. \end{equation}\)
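For this L1-compensated variant, the regularization contributions in Equation (33) enter through diagonal terms that are shared by every row of a given column; a hedged sketch (our own function and variable names) is:

```python
import numpy as np

def update_W_n2(X, W, H, H_yc, S, D, alpha, beta, eps=1e-10):
    """Multiplicative update of W^v for FRSMNMF_N2 (Eq. (33)) with the
    column-sum compensation matrix Q^v of Eq. (10). Sketch only."""
    I = np.eye(H.shape[0])
    Y1p = (H @ D @ H.T) * I
    Y1m = (H @ S @ H.T) * I
    Y2 = (H @ H.T) * I
    Y3 = (H @ H_yc.T) * I
    Q = np.diag(W.sum(axis=0))                       # Eq. (10)
    # The (.)_hh terms of Eq. (33) are broadcast along the rows of W
    num = X @ H.T + np.diag(alpha * Q @ Y1m + beta * Y3)[None, :]
    den = W @ H @ H.T + np.diag(alpha * Q @ Y1p + beta * Q @ Y2)[None, :] + eps
    return W * (num / den)
```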

The updating rules of \({{\bf H}^v}\) and \({{\bf H}_c}\) in FRSMNMF_N2 are the same as those in Equation (26) and Equation (28). The optimizing scheme of FRSMNMF_N2 is summarized in Algorithm 2. The convergence proofs for FRSMNMF_N1 and FRSMNMF_N2 are presented in the Appendix.

4.3 Computational Complexity Analysis of FRSMNMF_N1 and FRSMNMF_N2

In this section, the computational complexity of FRSMNMF_N1 and FRSMNMF_N2 is analyzed. The computational complexity is expressed in big \(O\) notation [12]. For a specific view \(v\) of FRSMNMF_N1 in one iteration, updating \({\bf W}^v\) requires calculating \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\), \({{\bf X}^v}{{\bf H}^v}^T\), \({{\bf W}^v}{\bf Y}_1^v\), \({{\bf W}^v}{\bf Y}_2^v,\) and \({{\bf W}^v}({{\bf Q}^v})^{ - 1}{\bf Y}_3^v\). \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\) and \({{\bf X}^v}{{\bf H}^v}^T\) cost \(O((n + m^v)d^{v2})\) and \(O(m^v n d^v),\) respectively. The total cost of \({{\bf W}^v}{\bf Y}_1^v\), \({{\bf W}^v}{\bf Y}_2^v,\) and \({{\bf W}^v}({{\bf Q}^v})^{ - 1}{\bf Y}_3^v\) is \(O(m^vd^v + Kd^v n + d^{v2} n)\), where \(K\) is the number of nearest neighbors in the graph. So, the cost of updating \({\bf W}^v\) is \(O(Kd ^v n + m^v n d^v)\). Normalizing \({\bf W}^v\) and \({\bf H}^v\) in Equation (21) needs \(O(nd^v + m^v d^v)\) computation. The main cost of updating \({\bf H}^v\) is in the calculation of \({{\bf W}^v}^T{{\bf X}^v}\), \({{\bf W}^v}^T{{\bf W}^v}{{\bf H}^v}\), \({\bf H}^v{\bf S}^v,\) and \({\bf H}^v{\bf D}^v\). The costs of \({{\bf W}^v}^T{{\bf X}^v}\) and \({{\bf W}^v}^T{{\bf W}^v}{{\bf H}^v}\) are \(O(m^v n d^v)\) and \(O((n + m^v)d^{v2})\), while \({{\bf H}^v}{{\bf S}^v}\) and \({{\bf H}^v}{{\bf D}^v}\) need \(O(Kd^v n)\) and \(O(d^v n)\) computation, respectively. The cost of updating \({\bf H}^v\) is therefore \(O(Kd^v n + m^v n d^v)\). Compared with the cost of updating \({\bf W}^v\) and \({\bf H}^v\), the computational cost of updating \({\bf H}_c\) is negligible. So, the final cost of FRSMNMF_N1 is \(O(t V m^v n d^v)\), where \(t\) is the number of iterations and \(V\) denotes the number of views.

For FRSMNMF_N2, the main difference in computational cost resides in the update of \({\bf W}^v\). In addition to the cost of calculating \({{\bf W}^v}{{\bf H}^v}{{\bf H}^v}^T\) and \({{\bf X}^v}{{\bf H}^v}^T\), it also needs to compute \({{\bf Q}^v}{\bf Y}_1^v\), \({{\bf Q}^v}{\bf Y}_2^v,\) and \({\bf Y}_3^v\). The total computational cost of these additional terms is \(O(Kd^v n + d^{v2}n)\). So, the cost of updating \({\bf W}^v\) in FRSMNMF_N2 is \(O(Kd^v n + m^v n d^v)\), and the final cost of FRSMNMF_N2 is \(O(t V m^v n d^v)\).


5 EXPERIMENTS

5.1 Evaluation Metrics

Two widely used clustering metrics are adopted to evaluate the performance of the methods: clustering accuracy (AC) [49] and normalized mutual information (NMI) [40].

AC is defined as (34) \(\begin{equation} {\rm {AC}} = \frac{{\sum \nolimits _{i = 1}^n {\delta (gn{d_i},map({z_i}))} }}{n}, \end{equation}\) where \(n\) is the number of samples in each multiview dataset, \(gnd_i\) is the ground-truth label of \(x_i\), and \(z_i\) is the label obtained by the clustering method for the sample \(x_i\). \(map(\cdot)\) is an optimal mapping function [33] which maps the obtained label \(z_i\) to match the ground-truth label provided by the dataset. \(\delta (a,b)\) is an indicator function: \(\delta (a,b) = 1\) when \(a=b\), and \(\delta (a,b) = 0\) otherwise.
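The optimal mapping \(map(\cdot)\) is commonly computed with the Hungarian algorithm; a self-contained sketch of this standard computation (not the paper's own script) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(gnd, z):
    """Clustering accuracy of Eq. (34): find the cluster-to-class mapping
    that maximizes the number of correctly matched samples."""
    gnd, z = np.asarray(gnd), np.asarray(z)
    classes, clusters = np.unique(gnd), np.unique(z)
    cost = np.zeros((len(clusters), len(classes)), dtype=int)
    for i, c in enumerate(clusters):
        for j, g in enumerate(classes):
            cost[i, j] = np.sum((z == c) & (gnd == g))
    row, col = linear_sum_assignment(-cost)      # maximize matched pairs
    return cost[row, col].sum() / len(gnd)
```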

NMI is defined as follows: (35) \(\begin{equation} {\rm {NMI}}(C,\hat{C}){\rm { = }}\frac{{{\rm {MI}}(C,\hat{C})}}{{\sqrt {{\rm {H}}(C){\rm {H}}(\hat{C})} }}, \end{equation}\) where \(C\) and \(\hat{C}\) are two cluster sets, one for the ground truth and the other for the labels obtained by the clustering method. \(\rm {H}(C)\) and \(\rm {H}(\hat{C})\) denote the entropies of the cluster sets \(C\) and \(\hat{C}\). \(\rm {MI}(C,\hat{C})\) is the mutual information between \(C\) and \(\hat{C}\), which is defined as (36) \(\begin{equation} {\rm {MI}}(C,\hat{C}) = \sum \limits _{{c_i} \in C,{{\hat{c}}_j} \in \hat{C}} {p({c_i},{{\hat{c}}_j}){{\log }_2}\frac{{p({c_i},{{\hat{c}}_j})}}{{p({c_i})p({{\hat{c}}_j})}}}, \end{equation}\) where \(p({c_i},{{\hat{c}}_j})\) denotes the joint probability that a randomly selected item belongs to both \(c_i\) and \({{\hat{c}}_j}\), while \(p(c_i)\) and \(p({{\hat{c}}_j})\) denote the probabilities that a randomly selected item belongs to \(c_i\) and \({{\hat{c}}_j}\), respectively.
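Because Equation (35) normalizes by the geometric mean of the entropies, it corresponds to the geometric averaging option of scikit-learn's implementation; the label arrays below are only a hypothetical example.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

gnd = np.array([0, 0, 1, 1, 2, 2])   # hypothetical ground-truth labels
z = np.array([1, 1, 0, 0, 2, 2])     # hypothetical predicted cluster labels
# Eq. (35): geometric-mean normalization of the mutual information
nmi = normalized_mutual_info_score(gnd, z, average_method='geometric')
```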

5.2 Datasets

In the experiments, six multiview datasets are used to evaluate the effectiveness of the proposed method on the clustering task. The statistics are summarized in Table 1.

Table 1.

Dataset    Size   Views   Features          Classes
Yale       165    3       2048/256/1024     15
ORL        400    3       2048/256/1024     40
FEI        700    3       2048/256/1024     50
3sources   169    3       3560/3631/3068    6
Texas      187    2       187/1703          5
Carotid    1378   2       17/17             2

Table 1. Database Description

(1)

Yale Dataset. This dataset contains 165 grayscale images collected from 15 people. Each person has 11 images with different facial expressions or configurations, and all images are normalized to a 32 \(\times\) 32 pixel array.

(2)

FEI Part 1 Dataset. The FEI part 1 dataset is a subset of the original FEI database. This dataset contains 700 color images captured from 50 people. Each person has 14 images taken from different views. The original 640 \(\times\) 480 resolution images were downsampled to 32 \(\times\) 24 grayscale pixels.

(3)

ORL Dataset. This dataset has 400 grayscale face images collected from 40 people, each with 10 images. These images were taken under different lighting conditions, with different facial expressions, and with/without glasses. From this database, two subsets are produced for testing.

(4)

Texas Dataset. This dataset contains 187 documents over 5 labels (student, project, course, staff, faculty). The documents are described by 1,703 words in the content view and by 187 links between them in the cites view.

(5)

3sources [16] Dataset. This dataset has 948 texts collected from three news sources, i.e., BBC, Reuters, and the Guardian. In this article, the experiments are conducted on the subset of 169 news articles, with six topical labels, that are reported in all three sources. The dimensions of this dataset's three views are 3,560; 3,631; and 3,068, respectively.

(6)

Carotid Dataset. This dataset is an electronic medical record dataset which has 1,378 records with 34 attributes. It is a subset of the raw data collected from the stroke screening and prevention project conducted by Shenzhen Second People's Hospital (Ethical Approval Number: 20200116002). There are two classes in this dataset, i.e., abnormal carotid artery and normal carotid artery. Two views are constructed by splitting the whole attribute set evenly, each view having 17 attributes. All attributes are ranked by the mean feature importance score of six feature importance scoring methods [19]; odd-ranked attributes are used to construct one view and even-ranked attributes are used to construct the other (a sketch of this split is given after this list). The feature importance scores are provided by the work in Reference [19].

For Yale, ORL, and FEI datasets, the multiple views are the image intensity, Gabor [23], and LBP [37] with the dimensions of 1,024; 2,048; and 256, respectively.
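As an illustration of the Carotid view construction described above, the odd/even split by importance rank can be written as follows; the function name and the importance array are our own placeholders, with the actual scores coming from [19].

```python
import numpy as np

def split_views_by_importance(X, importance):
    """Split an attribute matrix into two views by importance rank:
    odd-ranked attributes form one view, even-ranked the other.

    X          : (n_samples, n_attributes) record matrix
    importance : (n_attributes,) mean feature importance scores
    """
    order = np.argsort(importance)[::-1]   # rank attributes, best first
    view1 = X[:, order[0::2]]              # odd-ranked attributes
    view2 = X[:, order[1::2]]              # even-ranked attributes
    return view1, view2
```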

The clustering results are obtained by conducting the k-means clustering method on the representations learned by the above algorithms. To avoid errors from randomness, all unsupervised methods are tested 10 times, and the average value is reported. To evaluate the performance of the semi-supervised methods, for the parameter analysis experiments, 10\(\%\), 20\(\%,\) and 30\(\%\) of the data points are labeled randomly 5 times, and the average clustering results are reported. For the comparison experiments, 10\(\%\), 20\(\%,\) and 30\(\%\) of the data points are labeled randomly 20 times, and the average clustering results and statistically significant differences (p-values) are reported.
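The evaluation loop described above is straightforward to reproduce; the following sketch (with hypothetical argument names, treating the columns of H_final as samples) averages the NMI of repeated k-means runs on a learned representation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def average_clustering_nmi(H_final, gnd, n_clusters, runs=10, seed=0):
    """Mean NMI of repeated k-means runs on a learned representation
    (columns of H_final are samples), mirroring the averaging protocol."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(runs):
        z = KMeans(n_clusters=n_clusters, n_init=10,
                   random_state=rng.randint(1 << 30)).fit_predict(H_final.T)
        scores.append(normalized_mutual_info_score(
            gnd, z, average_method='geometric'))
    return float(np.mean(scores))
```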

5.3 Comparative Algorithms

To evaluate the effectiveness of the proposed methods, they are compared with several representative NMF-based multi-view clustering approaches, including seven unsupervised ones and four semi-supervised ones. These methods are described as follows:

VAGNMF. GNMF [4] is conducted on each view individually, and the averaged features of the multiple views are treated as the final representation.

VCGNMF. GNMF is conducted on each view individually, and the concatenated features of the multiple views are treated as the final representation.

LP-DiNMF [42]. The objective function of this method involves a diverse term which is defined to enforce the heterogeneity of the different views. Local geometric structure information is also utilized to improve the performance.

rNNMF [10]. This method attempts to deal with the noise and outliers among the views through defining a reconstruction term and a neighbor-structure-preserving term with \(\ell _{2,1}\)-norm.

MPMNMF_1 [46]. This method has adopted the Euclidean distance based pair-wise co-regularization to align multiple features. Graph regularization is constructed to encode the geometry information of each view.

MPMNMF_2 [46]. This method has adopted the kernel-based pair-wise co-regularization to align multiple features. Graph regularization is constructed to encode the geometry information of each view.

UDNMF [52]. This method factorizes each view into three matrices and constrains the columns of the product of the basis matrix and the shared embedding matrix to be unit vectors. Graph regularization is imposed on the coefficient matrix of each view. Centroid co-regularization is used to learn a common consensus matrix as the final representation.

MVCDMF [60]. This is a deep multi-view semi-NMF algorithm. “semi-nonnegative” means that it only constrains the shared coefficient matrix to be nonnegative. Graph regularization is also used to utilize the geometrical structure information of data. The shared coefficient matrix is used as the final representation.

MvDGNMF [26]. This is a deep multi-view NMF algorithm. It iteratively factorizes the coefficient matrix in each layer to get hierarchical representations of data. Similar to UDNMF, it adopts centroid co-regularization to fuse the multiple views to obtain the final representation.

AMVNMF [43]. This method factorizes each view in the CNMF [30] framework and tries to learn a consensus auxiliary matrix. The column sums of the auxiliary matrix are constrained to be 1. An auto-weighting strategy is imposed on the consensus learning terms.

MVCNMF [5]. This method factorizes each view in the CNMF framework and aligns multiple views with Euclidean-distance-based pair-wise co-regularization. \(\ell _{2,1}\)-norm regularization is imposed on the auxiliary matrix in each view.

MVOCNMF [6]. This method factorizes each view in the CNMF framework and aligns multiple views with Euclidean-distance-based pair-wise co-regularization. An orthonormality constraint is imposed on the auxiliary matrix in each view to normalize the feature scale.

GPSNMF [28]. This method is a semi-supervised partially shared MVNMF, which tries to make use of both distinct and shared information of multiple views. A graph regularization is also constructed for each view.

The results of the above methods with optimal parameter settings (obtained by grid search) are reported. Note that, before running all methods, the data points of the original data are normalized to unit vectors. The source code of this paper is available at: https://github.com/GuoshengCui/FRSMNMF.
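For reference, the unit-vector preprocessing mentioned above amounts to dividing each column (sample) of a view matrix by its L2 norm; the array below is a random placeholder.

```python
import numpy as np

X_v = np.abs(np.random.rand(100, 50))   # placeholder (m_v, n) nonnegative view
# Scale every data point (column) to unit L2 norm before running any method
X_v = X_v / (np.linalg.norm(X_v, axis=0, keepdims=True) + 1e-12)
```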

5.4 Convergence Analysis

To verify the convergence of FRSMNMF_N1 and FRSMNMF_N2, the convergence curves on six datasets with labeling ratios of 10\(\%\), 20\(\%,\) and 30\(\%\) are visualized in Figure 1. We also visualize the variation of the clustering performance over the iterations. In each figure, the left y-axis denotes the objective function value, the right y-axis denotes the clustering accuracy, and the x-axis is the iteration number. We can see that the proposed methods converge on all datasets and the clustering performances on all datasets become stable within 300 iterations.

Fig. 1.

Fig. 1. Convergence curves and clustering performances of FRSMNMF_N1 and FRSMNMF_N2 with labeling ratios of 10\(\%\), 20\(\%\), and 30\(\%\) on six datasets. (Best viewed in color.)

5.5 Parameter Analysis and Comparisons

5.5.1 Parameter Analysis.

There are three hyper-parameters in FRSMNMF_N1 and FRSMNMF_N2, i.e., the graph regularization parameter \(\alpha\), the fusion regularization parameter \(\beta\), and the number of nearest neighbors \(k\). The number of hyper-parameters in our methods is the same as in most unsupervised MVNMFs, such as LP-DiNMF, MPMNMF\(\_\)1, MPMNMF\(\_\)2, and UDNMF, and fewer than in most semi-supervised MVNMFs, such as MVCNMF, MVOCNMF, GPSNMF, and the method in [31]. Figure 2 demonstrates the influence of the parameters \(\alpha\) and \(\beta\) on FRSMNMF_N1 and FRSMNMF_N2 on six datasets with different labeling ratios, i.e., 10\(\%\), 20\(\%\), and 30\(\%\). From this figure, we can see that the performance-varying trends of the proposed methods with respect to \(\alpha\) and \(\beta\) are very similar. This phenomenon indicates that the value of \(\alpha /\beta\) is relatively stable on all datasets. The best parameter settings of the proposed methods on the six datasets with different labeling ratios are demonstrated in Table 2. We can see that the values of \(\alpha /\beta\) with the best results are always located in the set \(\lbrace 1, 10, 100\rbrace\), with \(\alpha\) around \(10^5\) or \(10^1\).

Table 2.

            FRSMNMF_N1                      FRSMNMF_N2
            10%       20%       30%         10%       20%       30%
Yale        (0, -1)   (5, 4)    (5, 4)      (2, 1)    (2, 0)    (2, 1)
ORL         (0, -1)   (0, -1)   (0, -1)     (3, 2)    (5, 5)    (5, 4)
FEI         (5, 4)    (5, 3)    (5, 4)      (5, 4)    (5, 3)    (5, 4)
3sources    (0, -1)   (0, -1)   (0, -1)     (0, -2)   (0, -2)   (0, -1)
Texas       (5, 5)    (1, -1)   (0, -1)     (5, 5)    (5, 5)    (1, -1)
Carotid     (5, 5)    (5, 5)    (5, 5)      (5, 5)    (5, 5)    (5, 5)

Table 2. The Best Parameter Settings of FRSMNMF_N1 and FRSMNMF_N2 on Six Datasets, the Labeling Ratios are 10\(\%\), 20\(\%\), and 30\(\%\)

  • The values in brackets are (log(\(\alpha\)), log(\(\beta\))).
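The grid search behind Table 2 can be reproduced with a loop of the following form; run_frsmnmf is a hypothetical helper standing in for training the model and returning the mean clustering accuracy, and the grid bounds are our assumption.

```python
import itertools
import numpy as np

best_acc, best_setting = -np.inf, None
# Hypothetical grid over (log10(alpha), log10(beta)), as reported in Table 2
for log_a, log_b in itertools.product(range(-2, 6), repeat=2):
    acc = run_frsmnmf(alpha=10.0 ** log_a, beta=10.0 ** log_b)  # hypothetical
    if acc > best_acc:
        best_acc, best_setting = acc, (log_a, log_b)
```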

Fig. 2.

Fig. 2. Clustering performances of FRSMNMF_N1 and FRSMNMF_N2 versus parameters \(\alpha\) and \(\beta\) with labeling ratios of 10\(\%\), 20\(\%\), and 30\(\%\) on six datasets.

Figure 3 demonstrates the performance variation of GPSNMF, FRSMNMF_N1, and FRSMNMF_N2 versus the parameter \(k\) on six datasets with different labeling ratios. We can see that on most datasets, FRSMNMF_N1 and FRSMNMF_N2 obtain better performance than GPSNMF over a large range of \(k\). Basically, the varying trends of the three methods on the six datasets are similar.

Fig. 3.

Fig. 3. Clustering performances of FRSMNMF_N1, FRSMNMF_N2, and GPSNMF versus the number of nearest neighbors \(k\) with labeling ratios of 10\(\%\), 20\(\%\), and 30\(\%\) on six datasets.

5.5.2 Comparisons.

In Table 3, the comparison results of our methods and several recently proposed semi-supervised MVNMFs are reported. Statistically significant differences (p-values) between FRSMNMF_N1 and the other methods are also reported. AMVNMF, MVCNMF, and MVOCNMF are all CNMF-based MVNMFs, and their performances are obviously not as good as GPSNMF, FRSMNMF_N1, and FRSMNMF_N2. One reason is that these three methods do not consider the geometrical information of the data. The second reason is that they all fail to explore the discriminative information of the data, although they consider intra-class compactness owing to their CNMF basis. On the Yale, ORL, FEI, and 3sources datasets, the advantage of FRSMNMF_N1 is significant compared with the other semi-supervised MVNMF methods (except FRSMNMF_N2). On the Texas and Carotid datasets, the advantage of FRSMNMF_N1 is significant compared with AMVNMF, MVCNMF, and MVOCNMF. On these two datasets, the performances of GPSNMF approach those of FRSMNMF_N1 as the labeling ratios increase. Overall, FRSMNMF_N1 usually reports the best results on all datasets. In many cases, the performance differences between FRSMNMF_N1 and FRSMNMF_N2 are not significant, although FRSMNMF_N1 is slightly better, which indicates that the former feature normalizing strategy works better.

Table 3.

AC (p-value)     Yale 10%        Yale 20%        Yale 30%        ORL 10%         ORL 20%         ORL 30%
AMVNMF [43]      49.97 (<0.001)  55.80 (<0.001)  62.95 (<0.001)  63.65 (<0.001)  68.02 (<0.001)  72.57 (<0.001)
MVCNMF [5]       50.93 (<0.001)  57.93 (<0.001)  63.56 (<0.001)  66.17 (<0.001)  70.07 (<0.001)  74.22 (<0.001)
MVOCNMF [6]      51.14 (<0.001)  58.03 (<0.001)  64.98 (<0.001)  66.40 (<0.001)  70.75 (<0.001)  75.30 (<0.001)
GPSNMF [28]      57.73 (<0.001)  67.50 (<0.001)  74.06 (<0.001)  68.82 (<0.001)  81.24 (<0.001)  84.09 (<0.001)
FRSMNMF_N2       62.25 (0.777)   70.79 (0.484)   78.00 (0.903)   72.05 (<0.001)  82.65 (<0.001)  87.98 (0.015)
FRSMNMF_N1       62.01           71.41           78.07           76.40           85.48           89.24

NMI (p-value)    Yale 10%        Yale 20%        Yale 30%        ORL 10%         ORL 20%         ORL 30%
AMVNMF [43]      53.11 (<0.001)  59.62 (<0.001)  66.42 (<0.001)  79.27 (<0.001)  82.23 (<0.001)  84.81 (<0.001)
MVCNMF [5]       54.60 (<0.001)  61.62 (<0.001)  67.19 (<0.001)  81.07 (<0.001)  83.69 (<0.001)  86.01 (<0.001)
MVOCNMF [6]      54.59 (<0.001)  61.51 (<0.001)  68.13 (<0.001)  81.13 (<0.001)  84.02 (<0.001)  86.64 (<0.001)
GPSNMF [28]      59.27 (<0.001)  66.00 (<0.001)  72.05 (<0.001)  83.35 (<0.001)  90.04 (<0.001)  91.46 (<0.001)
FRSMNMF_N2       61.11 (0.001)   68.03 (0.298)   75.15 (0.980)   83.97 (<0.001)  89.77 (<0.001)  92.33 (0.020)
FRSMNMF_N1       63.14           68.77           75.16           86.63           90.96           93.05

AC (p-value)     FEI 10%         FEI 20%         FEI 30%         3sources 10%    3sources 20%    3sources 30%
AMVNMF [43]      56.65 (<0.001)  60.06 (<0.001)  66.36 (<0.001)  59.27 (<0.001)  65.83 (<0.001)  70.79 (<0.001)
MVCNMF [5]       59.19 (<0.001)  63.58 (<0.001)  69.60 (<0.001)  61.65 (<0.001)  69.83 (<0.001)  72.72 (<0.001)
MVOCNMF [6]      59.88 (<0.001)  63.47 (<0.001)  69.68 (<0.001)  57.44 (<0.001)  72.30 (<0.001)  81.12 (<0.001)
GPSNMF [28]      81.45 (<0.001)  84.56 (<0.001)  88.84 (<0.001)  80.79 (<0.001)  88.30 (<0.001)  91.05 (<0.001)
FRSMNMF_N2       83.67 (0.506)   87.03 (0.428)   91.78 (0.796)   87.91 (0.020)   92.06 (0.083)   93.45 (<0.001)
FRSMNMF_N1       83.88           86.79           91.83           89.91           93.30           95.24

NMI (p-value)    FEI 10%         FEI 20%         FEI 30%         3sources 10%    3sources 20%    3sources 30%
AMVNMF [43]      76.13 (<0.001)  78.16 (<0.001)  82.36 (<0.001)  55.62 (<0.001)  61.02 (<0.001)  65.74 (<0.001)
MVCNMF [5]       77.43 (<0.001)  80.33 (<0.001)  84.35 (<0.001)  60.00 (<0.001)  66.76 (<0.001)  66.67 (<0.001)
MVOCNMF [6]      77.79 (<0.001)  80.16 (<0.001)  84.37 (<0.001)  51.77 (<0.001)  59.36 (<0.001)  68.13 (<0.001)
GPSNMF [28]      90.04 (0.009)   91.51 (0.111)   93.60 (<0.001)  67.39 (<0.001)  75.00 (<0.001)  78.56 (<0.001)
FRSMNMF_N2       90.53 (0.739)   91.93 (0.659)   95.04 (0.895)   74.57 (<0.001)  82.21 (0.029)   84.04 (<0.001)
FRSMNMF_N1       90.88           91.84           95.06           79.39           84.70           88.06

AC (p-value)     Texas 10%       Texas 20%       Texas 30%       Carotid 10%     Carotid 20%     Carotid 30%
AMVNMF [43]      45.55 (<0.001)  49.66 (<0.001)  55.73 (<0.001)  53.47 (<0.001)  56.61 (<0.001)  54.53 (<0.001)
MVCNMF [5]       64.61 (<0.001)  67.80 (<0.001)  70.77 (<0.001)  56.47 (<0.001)  60.49 (<0.001)  61.65 (<0.001)
MVOCNMF [6]      66.32 (<0.001)  67.91 (<0.001)  69.80 (<0.001)  56.43 (<0.001)  60.01 (<0.001)  65.45 (<0.001)
GPSNMF [28]      64.57 (<0.001)  70.12 (<0.001)  75.25 (0.032)   70.23 (0.062)   75.56 (0.086)   78.88 (0.194)
FRSMNMF_N2       69.29 (0.224)   72.23 (0.002)   73.54 (<0.001)  71.04 (0.870)   75.91 (0.687)   79.19 (1.000)
FRSMNMF_N1       69.96           73.96           77.20           71.11           76.03           79.19

NMI (p-value)    Texas 10%       Texas 20%       Texas 30%       Carotid 10%     Carotid 20%     Carotid 30%
AMVNMF [43]      20.31 (<0.001)  23.37 (<0.001)  28.57 (<0.001)  0.93 (<0.001)   3.41 (<0.001)   2.11 (<0.001)
MVCNMF [5]       32.27 (<0.001)  36.23 (<0.001)  41.71 (<0.001)  1.98 (<0.001)   4.88 (<0.001)   8.65 (<0.001)
MVOCNMF [6]      32.49 (<0.001)  36.14 (<0.001)  40.78 (<0.001)  1.90 (<0.001)   4.51 (<0.001)   7.06 (<0.001)
GPSNMF [28]      26.44 (0.098)   39.96 (0.185)   53.80 (<0.001)  12.50 (0.125)   20.29 (0.333)   26.00 (0.501)
FRSMNMF_N2       28.87 (0.731)   37.01 (<0.001)  47.10 (0.525)   13.41 (0.945)   20.49 (0.656)   26.31 (1.000)
FRSMNMF_N1       29.16           41.52           46.42           13.45           20.70           26.31

Table 3. Clustering Performance (AC(\(\%\)) and NMI(\(\%\))) of Several Semi-supervised MVNMFs on Six Datasets with Different Labeling Ratios

  • The best results are highlighted in bold and the second best results are in italic; p-values are computed with respect to FRSMNMF_N1.

Our methods are also compared with several recently proposed unsupervised MVNMFs. The comparison results are shown in Table 4, where the last two rows are our methods with 10\(\%\) labeled data points. As before, statistically significant differences (p-values) between FRSMNMF_N1 and the other methods are reported. The advantage of the proposed FRSMNMF_N1 and FRSMNMF_N2 over the other methods is evident. On the ORL and 3sources datasets, FRSMNMF_N1 is superior to FRSMNMF_N2, whereas on the Yale, FEI, Texas, and Carotid datasets the performance differences between FRSMNMF_N1 and FRSMNMF_N2 are not significant. On all datasets, FRSMNMF_N1 is significantly better than the other unsupervised MVNMFs.
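For reference, the AC and NMI scores reported in these tables follow standard definitions: AC matches predicted cluster labels to ground-truth classes, typically via the Hungarian algorithm, while NMI measures the normalized mutual information between the two label assignments. A minimal sketch, assuming integer labels starting at 0:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """AC: best one-to-one matching between predicted clusters and true classes,
    found with the Hungarian algorithm (a standard definition, assumed here to
    match the AC reported in the tables)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n_classes = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                       # co-occurrence of cluster p and class t
    row, col = linear_sum_assignment(-count)   # maximize matched counts
    return count[row, col].sum() / len(y_true)

# NMI is available directly from scikit-learn:
# nmi = normalized_mutual_info_score(y_true, y_pred)
```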

Table 4. Clustering Performance (AC(\(\%\)) and NMI(\(\%\))) of the Proposed Methods and Several Unsupervised MVNMFs on Six Datasets

AC results:

| Method | Yale | ORL | FEI | 3sources | Texas | Carotid |
| VCGNMF [4] | 54.24 (<0.001) | 71.33 (<0.001) | 69.54 (<0.001) | 59.17 (<0.001) | 62.51 (<0.001) | 61.51 (<0.001) |
| VAGNMF [4] | 50.00 (<0.001) | 69.13 (<0.001) | 68.83 (<0.001) | 62.66 (<0.001) | 54.28 (<0.001) | 57.17 (<0.001) |
| LP-DiNMF [42] | 50.85 (<0.001) | 70.43 (<0.001) | 69.54 (<0.001) | 62.78 (<0.001) | 55.35 (<0.001) | 57.03 (<0.001) |
| rNNMF [10] | 50.42 (<0.001) | 65.58 (<0.001) | 55.26 (<0.001) | 62.07 (<0.001) | 60.37 (<0.001) | 57.89 (<0.001) |
| UDNMF [52] | 47.88 (<0.001) | 67.95 (<0.001) | 74.71 (<0.001) | 62.78 (<0.001) | 54.71 (<0.001) | 60.86 (<0.001) |
| MPMNMF_1 [46] | 50.85 (<0.001) | 70.40 (<0.001) | 69.04 (<0.001) | 72.13 (<0.001) | 65.72 (<0.001) | 57.20 (<0.001) |
| MPMNMF_2 [46] | 50.12 (<0.001) | 69.20 (<0.001) | 68.76 (<0.001) | 65.27 (<0.001) | 61.71 (<0.001) | 56.99 (<0.001) |
| MVCDMF [60] | 53.52 (<0.001) | 69.28 (<0.001) | 68.84 (<0.001) | 77.28 (<0.001) | 56.74 (<0.001) | 57.49 (<0.001) |
| MvDGNMF [26] | 50.79 (<0.001) | 70.45 (<0.001) | 74.17 (<0.001) | 71.66 (<0.001) | 58.13 (<0.001) | 56.79 (<0.001) |
| FRSMNMF_N2 | 62.25 (0.777) | 72.05 (<0.001) | 83.67 (0.254) | 87.91 (<0.001) | 69.29 (0.050) | 71.04 (0.333) |
| FRSMNMF_N1 | 62.01 | 76.40 | 83.88 | 89.91 | 69.96 | 71.11 |

NMI results:

| Method | Yale | ORL | FEI | 3sources | Texas | Carotid |
| VCGNMF [4] | 57.55 (<0.001) | 84.68 (<0.001) | 85.40 (<0.001) | 58.68 (<0.001) | 28.70 (<0.001) | 3.97 (<0.001) |
| VAGNMF [4] | 52.74 (<0.001) | 83.67 (<0.001) | 84.48 (<0.001) | 56.50 (<0.001) | 21.48 (<0.001) | 2.36 (<0.001) |
| LP-DiNMF [42] | 53.81 (<0.001) | 84.65 (<0.001) | 84.50 (<0.001) | 56.37 (<0.001) | 24.49 (<0.001) | 2.34 (<0.001) |
| rNNMF [10] | 52.56 (<0.001) | 80.91 (<0.001) | 74.13 (<0.001) | 57.83 (<0.001) | 27.89 (<0.001) | 2.31 (<0.001) |
| UDNMF [52] | 50.69 (<0.001) | 82.22 (<0.001) | 87.17 (<0.001) | 58.41 (<0.001) | 22.93 (<0.001) | 4.44 (<0.001) |
| MPMNMF_1 [46] | 54.39 (<0.001) | 84.21 (<0.001) | 85.09 (<0.001) | 65.71 (<0.001) | 31.48 (<0.001) | 2.38 (<0.001) |
| MPMNMF_2 [46] | 52.54 (<0.001) | 84.46 (<0.001) | 85.08 (<0.001) | 58.61 (<0.001) | 26.46 (<0.001) | 2.59 (<0.001) |
| MVCDMF [60] | 56.05 (<0.001) | 84.27 (<0.001) | 84.54 (<0.001) | 71.59 (<0.001) | 25.12 (<0.001) | 2.16 (<0.001) |
| MvDGNMF [26] | 53.44 (<0.001) | 84.53 (<0.001) | 87.15 (<0.001) | 61.67 (<0.001) | 28.39 (<0.001) | 2.11 (<0.001) |
| FRSMNMF_N2 | 61.11 (0.001) | 83.97 (<0.001) | 90.53 (0.372) | 74.57 (<0.001) | 28.87 (0.026) | 13.41 (0.691) |
| FRSMNMF_N1 | 63.14 | 86.63 | 90.88 | 79.39 | 29.16 | 13.45 |

  • The best results are highlighted in bold and the second best results are in italic. Values in parentheses are p-values of the comparison with FRSMNMF_N1. FRSMNMF_N1 and FRSMNMF_N2 use 10\(\%\) labeled data points.

We also tested the performance variation of all semi-supervised MVNMFs under different labeling ratios. The resulting curves are shown in Figure 4, with the mean result of all unsupervised methods as the baseline. Over the whole range of labeling ratios, GPSNMF, FRSMNMF_N1, and FRSMNMF_N2 stay above the baseline. When the number of labeled data points is small, AMVNMF, MVCNMF, and MVOCNMF sometimes perform worse than the baseline. Note that all the unsupervised MVNMFs consider the geometrical information of the data. This indicates that both label information and geometrical information are important for learning a meaningful representation.


Fig. 4. Clustering performances of FRSMNMF_N1 and FRSMNMF_N2 versus different labeling ratios on six datasets.

5.5.3 Studying the Effects of Individual Views.

To study the effect of each individual view, we apply the k-means algorithm to the learned features of each view and report the clustering results. To validate the per-view clustering performance of FRSMNMF_N1 and FRSMNMF_N2, MVCNMF and MVOCNMF are selected as baselines, and statistically significant differences (p-values) between FRSMNMF_N1 and the other methods are reported. Table 5 lists the clustering results on the Yale, ORL, 3sources, and Carotid datasets, with all labeling ratios set to 30\(\%\). From Table 5, we can see that the proposed FRSMNMF_N1 and FRSMNMF_N2 outperform MVCNMF and MVOCNMF in every view, which verifies the quality of the per-view features learned by FRSMNMF_N1 and FRSMNMF_N2.
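A minimal sketch of this per-view evaluation, assuming each view's learned coefficient matrix stores data points as columns (the variable names and layout are illustrative, not the paper's implementation):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def evaluate_each_view(H_views, y_true, n_clusters, seed=0):
    """Cluster the learned coefficient matrix of every view separately.

    H_views: list of per-view coefficient matrices with columns as data points.
    Returns the NMI of k-means applied to each individual view's features.
    """
    scores = []
    for H in H_views:
        features = H.T                                   # rows become data points
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        y_pred = km.fit_predict(features)
        scores.append(normalized_mutual_info_score(y_true, y_pred))
    return scores
```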

Table 5. Individual View Clustering Performance (AC(\(\%\)) and NMI(\(\%\))) of MVCNMF, MVOCNMF, FRSMNMF_N1, and FRSMNMF_N2 on the Yale, ORL, 3sources, and Carotid Datasets

AC results on Yale and ORL:

| Method | Yale view1 | Yale view2 | Yale view3 | ORL view1 | ORL view2 | ORL view3 |
| MVCNMF [5] | 60.95 (<0.001) | 64.70 (<0.001) | 56.56 (<0.001) | 74.13 (<0.001) | 74.54 (<0.001) | 74.14 (<0.001) |
| MVOCNMF [6] | 63.91 (<0.001) | 63.97 (<0.001) | 64.07 (<0.001) | 75.06 (<0.001) | 75.37 (<0.001) | 74.42 (<0.001) |
| FRSMNMF_N2 | 77.84 (0.954) | 77.96 (0.954) | 76.71 (0.954) | 88.01 (0.133) | 88.41 (0.133) | 87.35 (0.133) |
| FRSMNMF_N1 | 77.98 | 77.93 | 76.67 | 89.16 | 89.61 | 88.17 |

NMI results on Yale and ORL:

| Method | Yale view1 | Yale view2 | Yale view3 | ORL view1 | ORL view2 | ORL view3 |
| MVCNMF [5] | 64.68 (<0.001) | 68.03 (<0.001) | 61.01 (<0.001) | 85.99 (<0.001) | 86.22 (<0.001) | 86.02 (<0.001) |
| MVOCNMF [6] | 67.64 (<0.001) | 67.64 (<0.001) | 67.63 (<0.001) | 86.65 (<0.001) | 86.70 (<0.001) | 86.28 (<0.001) |
| FRSMNMF_N2 | 74.69 (0.952) | 74.93 (0.952) | 73.88 (0.952) | 92.24 (0.084) | 92.55 (0.084) | 91.77 (0.084) |
| FRSMNMF_N1 | 74.76 | 74.92 | 73.85 | 92.92 | 93.32 | 92.34 |

AC results on 3sources and Carotid:

| Method | 3sources view1 | 3sources view2 | 3sources view3 | Carotid view1 | Carotid view2 |
| MVCNMF [5] | 71.65 (<0.001) | 69.54 (<0.001) | 68.89 (<0.001) | 63.78 (<0.001) | 64.73 (<0.001) |
| MVOCNMF [6] | 81.75 (<0.001) | 81.72 (<0.001) | 81.74 (<0.001) | 65.22 (<0.001) | 65.22 (<0.001) |
| FRSMNMF_N2 | 93.01 (0.004) | 92.86 (0.004) | 93.30 (0.004) | 76.58 (1.000) | 72.96 (1.000) |
| FRSMNMF_N1 | 95.18 | 94.58 | 94.87 | 76.58 | 72.96 |

NMI results on 3sources and Carotid:

| Method | 3sources view1 | 3sources view2 | 3sources view3 | Carotid view1 | Carotid view2 |
| MVCNMF [5] | 67.38 (<0.001) | 65.82 (<0.001) | 64.77 (<0.001) | 12.01 (<0.001) | 10.71 (<0.001) |
| MVOCNMF [6] | 69.13 (<0.001) | 69.07 (<0.001) | 69.09 (<0.001) | 6.82 (<0.001) | 6.82 (<0.001) |
| FRSMNMF_N2 | 82.64 (<0.001) | 82.54 (<0.001) | 83.82 (<0.001) | 21.81 (1.000) | 15.96 (1.000) |
| FRSMNMF_N1 | 87.99 | 86.11 | 87.46 | 21.81 | 15.96 |

  • The labeling ratios are all set to 30\(\%\). The best results are highlighted in bold and the second best results are in italic. Values in parentheses are p-values of the comparison with FRSMNMF_N1.

5.5.4 Ablation Study.

In this subsection, we further explore the influence of graph regularization, fusion regularization, and the constraints \(||{\bf W}_{\cdot j}^v||_2=1\) and \(||{\bf W}_{\cdot j}^v||_1=1\). FRSMNMF_N1 w/o GR denotes FRSMNMF_N1 without graph regularization, FRSMNMF_N1 w/o FR denotes FRSMNMF_N1 without fusion regularization, and FRSMNMF_N2 w/o GR and FRSMNMF_N2 w/o FR are defined analogously. FRSMNMF denotes FRSMNMF_N1 or FRSMNMF_N2 without the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\). The results of the above methods on the Yale, ORL, 3sources, and Carotid datasets with a labeling ratio of 10\(\%\) are reported in Table 6. From Table 6, we can see that on all datasets the performance of FRSMNMF_N1 and FRSMNMF_N2 decreases without graph regularization or fusion regularization, which indicates that both regularizations contribute to their performance. Comparing FRSMNMF with FRSMNMF_N1 and FRSMNMF_N2, the performance drops when the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) or \(||{\bf W}_{\cdot j}^v||_1=1\) is removed, which means that these constraints contribute to the performance of FRSMNMF_N1 and FRSMNMF_N2, respectively.
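Both constraints rescale the columns of each basis matrix \({\bf W}^v\) and compensate the scale into the coefficient matrix, so the reconstruction \({\bf W}^v{\bf H}^v\) is unchanged while the feature scales of the views become comparable. The following is a generic sketch of such a normalization step; the paper embeds it inside the multiplicative update rules, which this standalone function does not reproduce.

```python
import numpy as np

def normalize_columns(W, H, norm="l2"):
    """Rescale each basis column of W to unit L2 or L1 norm and compensate the
    scale into the coefficient matrix H, keeping the product W @ H unchanged.

    W: (n_features, n_components) basis matrix of one view.
    H: (n_components, n_samples) coefficient matrix, columns are data points.
    """
    if norm == "l2":
        scale = np.linalg.norm(W, axis=0)      # ||W_{.j}||_2 for each column
    else:
        scale = np.abs(W).sum(axis=0)          # ||W_{.j}||_1 for each column
    scale = np.maximum(scale, 1e-12)           # avoid division by zero
    W_normalized = W / scale                   # columns now have unit norm
    H_compensated = H * scale[:, None]         # rows of H absorb the scale
    return W_normalized, H_compensated
```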

Table 6. Ablation Study of FRSMNMF_N1 and FRSMNMF_N2 on the Yale, ORL, 3sources, and Carotid Datasets

| Norm | Method | Yale AC | Yale NMI | ORL AC | ORL NMI |
| None | FRSMNMF | 61.55 \(\pm\) 3.02 | 60.97 \(\pm\) 2.17 | 69.69 \(\pm\) 2.03 | 82.35 \(\pm\) 1.40 |
| \(||{\bf W}_{\cdot j}^v||_2=1\) | FRSMNMF_N1 | 62.01 \(\pm\) 2.31 | 63.14 \(\pm\) 1.62 | 76.40 \(\pm\) 1.39 | 86.63 \(\pm\) 0.71 |
| | FRSMNMF_N1 w/o GR | 58.43 \(\pm\) 2.40 | 59.74 \(\pm\) 1.91 | 71.24 \(\pm\) 1.50 | 82.97 \(\pm\) 0.73 |
| | FRSMNMF_N1 w/o FR | 49.78 \(\pm\) 0.91 | 53.23 \(\pm\) 0.54 | 67.97 \(\pm\) 0.71 | 82.96 \(\pm\) 0.29 |
| \(||{\bf W}_{\cdot j}^v||_1=1\) | FRSMNMF_N2 | 62.25 \(\pm\) 3.01 | 61.11 \(\pm\) 2.00 | 72.05 \(\pm\) 1.51 | 83.97 \(\pm\) 0.76 |
| | FRSMNMF_N2 w/o GR | 44.48 \(\pm\) 1.14 | 48.05 \(\pm\) 0.71 | 62.56 \(\pm\) 0.79 | 78.60 \(\pm\) 0.40 |
| | FRSMNMF_N2 w/o FR | 51.47 \(\pm\) 0.99 | 54.97 \(\pm\) 0.86 | 68.64 \(\pm\) 0.71 | 83.51 \(\pm\) 0.28 |

| Norm | Method | 3sources AC | 3sources NMI | Carotid AC | Carotid NMI |
| None | FRSMNMF | 84.35 \(\pm\) 3.90 | 67.82 \(\pm\) 5.68 | 69.07 \(\pm\) 1.48 | 10.91 \(\pm\) 1.74 |
| \(||{\bf W}_{\cdot j}^v||_2=1\) | FRSMNMF_N1 | 89.91 \(\pm\) 2.49 | 79.39 \(\pm\) 3.41 | 71.11 \(\pm\) 1.49 | 13.45 \(\pm\) 1.96 |
| | FRSMNMF_N1 w/o GR | 85.04 \(\pm\) 4.17 | 73.76 \(\pm\) 3.63 | 61.11 \(\pm\) 1.09 | 4.14 \(\pm\) 0.58 |
| | FRSMNMF_N1 w/o FR | 60.55 \(\pm\) 2.53 | 56.42 \(\pm\) 1.86 | 55.76 \(\pm\) 0.12 | 2.54 \(\pm\) 0.07 |
| \(||{\bf W}_{\cdot j}^v||_1=1\) | FRSMNMF_N2 | 87.91 \(\pm\) 2.71 | 74.57 \(\pm\) 3.95 | 71.04 \(\pm\) 1.46 | 13.41 \(\pm\) 1.90 |
| | FRSMNMF_N2 w/o GR | 53.68 \(\pm\) 1.76 | 50.50 \(\pm\) 1.60 | 58.48 \(\pm\) 0.29 | 3.32 \(\pm\) 0.17 |
| | FRSMNMF_N2 w/o FR | 59.10 \(\pm\) 0.64 | 55.08 \(\pm\) 0.85 | 57.17 \(\pm\) 0.81 | 2.60 \(\pm\) 0.41 |

  • The labeling ratio is set to 10\(\%\). The best results are highlighted in bold and the second best results are in italic.

5.5.5 Performance on Noisy Datasets.

In this section, we evaluate the proposed methods on noisy datasets. Specifically, we manually pollute the Yale, ORL, FEI, and Carotid datasets with different levels of noise, which are simulated by randomly discarding 10\(\%\), 20\(\%\), and 30\(\%\) of the data points in each view. The performance of FRSMNMF_N1 and FRSMNMF_N2 on these constructed noisy datasets is evaluated with MVCNMF and MVOCNMF as baselines, and the labeling ratio of all methods is set to 20\(\%\). Before applying the above methods, the missing data points in each view are filled with the average of the existing data points in the corresponding view. The experimental results are reported in Table 7. As can be seen from Table 7, the performance of all methods decreases as the noise level increases, and FRSMNMF_N1 is superior to FRSMNMF_N2 in most cases. On the Yale dataset, the performance of FRSMNMF_N2 decreases dramatically, indicating that the constraint \(||{\bf W}_{\cdot j}^v||_2=1\) is more robust to noise than the constraint \(||{\bf W}_{\cdot j}^v||_1=1\).
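A minimal sketch of this corruption-and-imputation protocol, assuming each view is stored as a feature-by-sample matrix (the layout and function names are illustrative):

```python
import numpy as np

def corrupt_and_impute(X_views, discard_ratio, seed=0):
    """Randomly discard a fraction of data points in each view and fill them
    with the mean of the remaining points of that view.

    X_views: list of (n_features, n_samples) matrices, columns are data points.
    discard_ratio: fraction of data points to discard, e.g., 0.1, 0.2, or 0.3.
    """
    rng = np.random.default_rng(seed)
    noisy_views = []
    for X in X_views:
        X = X.copy()
        n = X.shape[1]
        n_discard = int(round(discard_ratio * n))
        discarded = rng.choice(n, size=n_discard, replace=False)
        kept = np.setdiff1d(np.arange(n), discarded)
        mean_col = X[:, kept].mean(axis=1, keepdims=True)
        X[:, discarded] = mean_col            # mean imputation within the view
        noisy_views.append(X)
    return noisy_views
```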

Table 7. Performance of FRSMNMF_N1 and FRSMNMF_N2 on the Yale, ORL, FEI, and Carotid Datasets Polluted by Different Levels of Noise (10\(\%\), 20\(\%\), and 30\(\%\)). The Labeling Ratio Is Set to 20\(\%\)

Yale:

| Method | 10\(\%\) AC | 10\(\%\) NMI | 20\(\%\) AC | 20\(\%\) NMI | 30\(\%\) AC | 30\(\%\) NMI |
| MVCNMF [5] | 56.45 \(\pm\) 1.69 | 60.18 \(\pm\) 1.68 | 53.32 \(\pm\) 1.71 | 57.85 \(\pm\) 1.34 | 49.01 \(\pm\) 1.46 | 53.40 \(\pm\) 1.17 |
| MVOCNMF [6] | 56.61 \(\pm\) 1.66 | 60.44 \(\pm\) 1.37 | 53.82 \(\pm\) 1.68 | 58.22 \(\pm\) 1.53 | 50.02 \(\pm\) 1.57 | 54.22 \(\pm\) 1.31 |
| FRSMNMF_N2 | 57.59 \(\pm\) 2.54 | 56.16 \(\pm\) 1.79 | 54.60 \(\pm\) 2.78 | 52.14 \(\pm\) 2.23 | 52.37 \(\pm\) 2.62 | 50.03 \(\pm\) 2.40 |
| FRSMNMF_N1 | 65.59 \(\pm\) 1.86 | 64.85 \(\pm\) 1.97 | 61.71 \(\pm\) 1.94 | 61.15 \(\pm\) 1.89 | 59.38 \(\pm\) 2.55 | 58.39 \(\pm\) 2.03 |

ORL:

| Method | 10\(\%\) AC | 10\(\%\) NMI | 20\(\%\) AC | 20\(\%\) NMI | 30\(\%\) AC | 30\(\%\) NMI |
| MVCNMF [5] | 67.06 \(\pm\) 1.11 | 81.10 \(\pm\) 0.51 | 64.21 \(\pm\) 1.09 | 78.64 \(\pm\) 0.69 | 58.94 \(\pm\) 1.16 | 74.74 \(\pm\) 0.49 |
| MVOCNMF [6] | 66.92 \(\pm\) 0.76 | 80.92 \(\pm\) 0.45 | 64.12 \(\pm\) 1.00 | 78.64 \(\pm\) 0.53 | 58.90 \(\pm\) 0.92 | 74.92 \(\pm\) 0.41 |
| FRSMNMF_N2 | 78.42 \(\pm\) 1.69 | 85.35 \(\pm\) 1.12 | 75.70 \(\pm\) 1.92 | 83.24 \(\pm\) 1.12 | 71.82 \(\pm\) 1.51 | 80.01 \(\pm\) 1.14 |
| FRSMNMF_N1 | 82.26 \(\pm\) 1.45 | 88.68 \(\pm\) 0.84 | 80.09 \(\pm\) 1.20 | 86.91 \(\pm\) 0.82 | 75.43 \(\pm\) 1.24 | 83.69 \(\pm\) 0.78 |

FEI:

| Method | 10\(\%\) AC | 10\(\%\) NMI | 20\(\%\) AC | 20\(\%\) NMI | 30\(\%\) AC | 30\(\%\) NMI |
| MVCNMF [5] | 60.09 \(\pm\) 0.85 | 77.25 \(\pm\) 0.51 | 54.76 \(\pm\) 1.20 | 73.23 \(\pm\) 0.62 | 51.48 \(\pm\) 0.87 | 70.29 \(\pm\) 0.54 |
| MVOCNMF [6] | 59.85 \(\pm\) 0.90 | 77.04 \(\pm\) 0.60 | 54.63 \(\pm\) 1.08 | 73.27 \(\pm\) 0.51 | 51.08 \(\pm\) 0.86 | 70.22 \(\pm\) 0.57 |
| FRSMNMF_N2 | 83.68 \(\pm\) 1.05 | 90.18 \(\pm\) 0.70 | 79.54 \(\pm\) 1.36 | 87.16 \(\pm\) 0.76 | 76.93 \(\pm\) 1.21 | 84.74 \(\pm\) 0.75 |
| FRSMNMF_N1 | 83.79 \(\pm\) 1.13 | 90.21 \(\pm\) 0.72 | 79.72 \(\pm\) 1.59 | 87.20 \(\pm\) 0.81 | 76.87 \(\pm\) 1.11 | 84.70 \(\pm\) 0.67 |

Carotid:

| Method | 10\(\%\) AC | 10\(\%\) NMI | 20\(\%\) AC | 20\(\%\) NMI | 30\(\%\) AC | 30\(\%\) NMI |
| MVCNMF [5] | 59.88 \(\pm\) 1.65 | 4.54 \(\pm\) 1.06 | 59.07 \(\pm\) 1.12 | 3.89 \(\pm\) 0.78 | 58.62 \(\pm\) 0.81 | 5.06 \(\pm\) 0.65 |
| MVOCNMF [6] | 59.69 \(\pm\) 1.54 | 4.39 \(\pm\) 0.96 | 58.82 \(\pm\) 1.12 | 3.71 \(\pm\) 0.74 | 59.54 \(\pm\) 0.56 | 4.18 \(\pm\) 0.31 |
| FRSMNMF_N2 | 74.56 \(\pm\) 1.02 | 18.28 \(\pm\) 1.47 | 73.88 \(\pm\) 1.36 | 17.26 \(\pm\) 1.90 | 72.74 \(\pm\) 1.97 | 15.70 \(\pm\) 2.71 |
| FRSMNMF_N1 | 74.54 \(\pm\) 0.88 | 18.23 \(\pm\) 1.28 | 73.95 \(\pm\) 0.96 | 17.34 \(\pm\) 1.39 | 73.23 \(\pm\) 1.29 | 16.33 \(\pm\) 1.86 |

  • The best results are highlighted in bold and the second best results are in italic.

5.5.6 Feature Visualization.

In this section, we visualize the fused features learned by MVOCNMF, MVCNMF, FRSMNMF_N1, and FRSMNMF_N2 in Figure 5 (on the Yale dataset) and Figure 6 (on the FEI dataset). In these figures, rows are features and columns are data points. In Figures 5(a) and 5(b), the features of the first 60 data points, whose labels are given, exhibit a clear block-diagonal structure. This block-diagonal structure means that the distances between data points with the same label are minimized, approaching zero, while the distances between data points with different labels are maximized. In Figures 5(a) and 5(b), the feature matrix of the remaining unlabeled data points also shows a block-diagonal structure, although with some noise. From Figures 5(c) and 5(d), we can see that, for the first 60 data points, although the distances between data points with the same label are zero (points with the same label are represented by identical feature vectors), the distances between data points of different classes are not maximized. This means that the label information is not effectively exploited in MVOCNMF and MVCNMF. A similar phenomenon can be observed in Figure 6, in which the labels of the first 250 data points are given.
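A minimal sketch of how such a feature map can be rendered, assuming a coefficient matrix with features as rows and data points as columns, and sorting the columns by class so that a block-diagonal pattern becomes visible (names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_feature_matrix(H, y, title):
    """Visualize a learned coefficient matrix as in Figs. 5 and 6.

    H: (n_features, n_samples) fused feature matrix, columns are data points.
    y: class labels used only to order the columns for display.
    """
    order = np.argsort(y)                              # group columns by class
    plt.figure(figsize=(6, 3))
    plt.imshow(H[:, order], aspect="auto", cmap="viridis")
    plt.xlabel("data points (sorted by class)")
    plt.ylabel("features")
    plt.title(title)
    plt.colorbar()
    plt.show()
```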


Fig. 5. The feature visualization of FRSMNMF_N1, FRSMNMF_N2, MVCNMF, and MVOCNMF on the Yale dataset. The labeling ratio is 30\(\%\). Rows are features and columns are data points.


Fig. 6. The feature visualization of FRSMNMF_N1, FRSMNMF_N2, MVCNMF, and MVOCNMF on the FEI dataset. The labeling ratio is 30\(\%\). Rows are features and columns are data points.

Skip 6CONCLUSION Section

6 CONCLUSION

In this article, a novel semi-supervised MVNMF framework with fusion regularization (FRSMNMF) is proposed. In our work, the discriminative term and the feature alignment term are fused into one regularization term, which effectively enhances the learning of discriminative features and reduces the number of hyper-parameters, making the proposed framework easier to tune than existing semi-supervised MVNMFs. The geometrical information is also considered by constructing a graph regularizer for each view. To align the multiple views effectively, two feature scale normalizing strategies are adopted, and two corresponding implementations with iterative optimization schemes are presented. The effectiveness of our methods is evaluated by comparing with several state-of-the-art unsupervised and semi-supervised MVNMFs on six datasets.

ACKNOWLEDGMENTS

Thanks to Dr. Lijie Ren of Shenzhen Second People’s Hospital for helping to collect and organize the medical dataset used in this paper.


REFERENCES

  [1] Bassani H. F. and Araujo A. F. R.. 2015. Dimension selective self-organizing maps with time-varying structure for subspace and projected clustering. IEEE Transactions on Neural Networks and Learning Systems 26, 3 (2015), 458–471.
  [2] Bickel Steffen and Scheffer Tobias. 2004. Multi-view clustering. In Proceedings of the 2004 SIAM International Conference on Data Mining. SIAM, 19–26.
  [3] Boyd Stephen and Vandenberghe Lieven. 2004. Convex Optimization. Cambridge University Press.
  [4] Cai Deng, He Xiaofei, Han Jiawei, and Huang Thomas S.. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 8 (2011), 1548–1560.
  [5] Cai Hao, Liu Bo, Xiao Yanshan, and Lin LuYue. 2019. Semi-supervised multi-view clustering based on constrained nonnegative matrix factorization. Knowledge-Based Systems 182 (2019), 104798.
  [6] Cai Hao, Liu Bo, Xiao Yanshan, and Lin LuYue. 2020. Semi-supervised multi-view clustering based on orthonormality-constrained nonnegative matrix factorization. Information Sciences 536 (2020), 171–184.
  [7] Cai Xiao, Nie Feiping, and Huang Heng. 2013. Multi-view k-means clustering on big data. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Citeseer.
  [8] Cao Xiaochun, Zhang Changqing, Fu Huazhu, Liu Si, and Zhang Hua. 2015. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 586–594.
  [9] Chao Guoqing, Sun Shiliang, and Bi Jinbo. 2021. A survey on multi-view clustering. IEEE Transactions on Artificial Intelligence 2, 2 (2021), 146–168.
  [10] Chen Feiqiong, Li Guopeng, Wang Shuaihui, and Pan Zhisong. 2019. Multiview clustering via robust neighboring constraint nonnegative matrix factorization. Mathematical Problems in Engineering 2019 (2019).
  [11] Chen Rui, Tang Yongqiang, Zhang Wensheng, and Feng Wenlong. 2022. Deep multi-view semi-supervised clustering with sample pairwise constraints. Neurocomputing 500 (2022), 832–845.
  [12] Cormen Thomas H., Leiserson Charles E., Rivest Ronald L., and Stein Clifford. 2009. Introduction to Algorithms. MIT Press.
  [13] Cui Beilei, Yu Hong, Zhang Tiantian, and Li Siwen. 2019. Self-weighted multi-view clustering with deep matrix factorization. In Proceedings of the Asian Conference on Machine Learning. 567–582.
  [14] Cui Guosheng and Li Ye. 2022. Nonredundancy regularization based nonnegative matrix factorization with manifold learning for multiview data representation. Information Fusion 82 (2022), 86–98.
  [15] Elkan Charles. 2003. Using the triangle inequality to accelerate k-means. In Proceedings of the 20th International Conference on Machine Learning, Vol. 3. 147–153.
  [16] Greene Derek and Cunningham Padraig. 2006. Practical solutions to the problem of diagonal dominance in kernel document clustering. In Proceedings of the 23rd International Conference on Machine Learning, Vol. 148. ACM, 377–384.
  [17] Hu Menglei and Chen Songcan. 2018. Doubly aligned incomplete multi-view clustering. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI). 2262–2268.
  [18] Huang Ling, Chao Hong-Yang, and Wang Chang-Dong. 2019. Multi-view intact space clustering. Pattern Recognition 86 (2019), 344–353.
  [19] Huang Xiaoxiang, Cui Guosheng, Wu Dan, and Li Ye. 2020. A semi-supervised approach for early identifying the abnormal carotid arteries using a modified variational autoencoder. In Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine. IEEE, 595–600.
  [20] Huang Zhenyu, Hu Peng, Zhou Joey Tianyi, Lv Jiancheng, and Peng Xi. 2020. Partially view-aligned clustering. Advances in Neural Information Processing Systems 33 (2020), 2892–2902.
  [21] Jain A. K., Murty M. N., and Flynn P. J.. 1999. Data clustering: A review. ACM Computing Surveys 31, 3 (1999).
  [22] Jiang Yu, Liu Jing, Li Zechao, and Lu Hanqing. 2014. Semi-supervised unified latent factor learning with multi-view data. Machine Vision and Applications 25 (2014), 1635–1645.
  [23] Lades Martin, Vorbruggen Jan C., Buhmann Joachim, Lange Jörg, Malsburg Christoph Von Der, Wurtz Rolf P., and Konen Wolfgang. 1993. Distortion invariant object recognition in the dynamic link architecture. IEEE Transactions on Computers 42, 3 (1993), 300–311.
  [24] Lee Daniel D. and Seung H. Sebastian. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 6755 (1999), 788–791.
  [25] Lee Daniel D. and Seung H. Sebastian. 2000. Algorithms for non-negative matrix factorization. In Proceedings of the Advances in Neural Information Processing Systems. 556–562.
  [26] Li Jianqiang, Zhou Guoxu, Qiu Yuning, Wang Yanjiao, Zhang Yu, and Xie Shengli. 2020. Deep graph regularized non-negative matrix factorization for multi-view clustering. Neurocomputing 390 (2020), 108–116.
  [27] Liang Naiyao, Yang Zuyuan, Li Zhenni, Sun Weijun, and Xie Shengli. 2020. Multi-view clustering by non-negative matrix factorization with co-orthogonal constraints. Knowledge-Based Systems 194 (2020), 105582.
  [28] Liang Naiyao, Yang Zuyuan, Li Zhenni, Xie Shengli, and Su Chun-Yi. 2020. Semi-supervised multi-view clustering with graph-regularized partially shared non-negative matrix factorization. Knowledge-Based Systems 190 (2020), 105185.
  [29] Liang Naiyao, Yang Zuyuan, Li Zhenni, Xie Shengli, and Sun Weijun. 2021. Semi-supervised multi-view learning by using label propagation based non-negative matrix factorization. Knowledge-Based Systems 228 (2021), 107244.
  [30] Liu Haifeng, Wu Zhaohui, Li Xuelong, Cai Deng, and Huang Thomas S.. 2012. Constrained nonnegative matrix factorization for image representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 7 (2012), 1299–1311.
  [31] Liu Jing, Jiang Yu, Li Zechao, Zhou Zhi-Hua, and Lu Hanqing. 2014. Partially shared latent factor learning with multiview data. IEEE Transactions on Neural Networks and Learning Systems 26, 6 (2014), 1233–1246.
  [32] Liu Jialu, Wang Chi, Gao Jing, and Han Jiawei. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proceedings of the 2013 SIAM International Conference on Data Mining. SIAM, 252–260.
  [33] Lovász László and Plummer Michael D.. 2009. Matching Theory. Vol. 367. American Mathematical Society.
  [34] Lu Chun-Ta, He Lifang, Ding Hao, Cao Bokai, and Yu Philip S.. 2018. Learning from multi-view multi-way data via structural factorization machines. In Proceedings of the 2018 World Wide Web Conference. 1593–1602.
  [35] Luo Peng, Peng Jinye, Guan Ziyu, and Fan Jianping. 2018. Dual regularized multi-view non-negative matrix factorization for clustering. Neurocomputing 294 (2018), 1–11.
  [36] Nie Feiping, Cai Guohao, Li Jing, and Li Xuelong. 2017. Auto-weighted multi-view learning for image clustering and semi-supervised classification. IEEE Transactions on Image Processing 27, 3 (2017), 1501–1511.
  [37] Ojala Timo, Pietikäinen Matti, and Mäenpää Topi. 2002. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 7 (2002), 971–987.
  [38] Ou Weihua, Yu Shujian, Li Gai, Lu Jian, Zhang Kesheng, and Xie Gang. 2016. Multi-view non-negative matrix factorization by patch alignment framework with view consistency. Neurocomputing 204 (2016), 116–124.
  [39] Rai Nishant, Negi Sumit, Chaudhury Santanu, and Deshmukh Om. 2016. Partial multi-view clustering using graph regularized NMF. In Proceedings of the International Conference on Pattern Recognition (ICPR). 2192–2197.
  [40] Shahnaz Farial, Berry Michael W., Pauca V. Paul, and Plemmons Robert J.. 2006. Document clustering using non-negative matrix factorization. Information Processing and Management 42, 2 (2006), 373–386.
  [41] Shao Weixiang, He Lifang, and Philip S. Yu. 2015. Multiple incomplete views clustering via weighted nonnegative matrix factorization with \(L_{2, 1}\) regularization. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 318–334.
  [42] Wang Jing, Tian Feng, Yu Hongchuan, Liu Chang Hong, Zhan Kun, and Wang Xiao. 2017. Diverse non-negative matrix factorization for multiview data representation. IEEE Transactions on Cybernetics 48, 9 (2017), 2620–2632.
  [43] Wang Jing, Wang Xiao, Tian Feng, Liu Chang Hong, Yu Hongchuan, and Liu Yanbei. 2016. Adaptive multi-view semi-supervised nonnegative matrix factorization. In Proceedings of the International Conference on Neural Information Processing. Springer, 435–444.
  [44] Wang Senhong, Cao Jiangzhong, Lei Fangyuan, Dai Qingyun, Liang Shangsong, and Ling Bingo Wing-Kuen. 2021. Semi-supervised multi-view clustering with weighted anchor graph embedding. Computational Intelligence and Neuroscience 2021 (2021).
  [45] Wang Xiaobo, Guo Xiaojie, Lei Zhen, Zhang Changqing, and Li Stan Z.. 2017. Exclusivity-consistency regularized multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 923–931.
  [46] Wang Xiumei, Zhang Tianzhen, and Gao Xinbo. 2019. Multiview clustering based on non-negative matrix factorization and pairwise measurements. IEEE Transactions on Cybernetics 49, 9 (2019), 3333–3346.
  [47] Xu Chang, Tao Dacheng, and Xu Chao. 2013. A survey on multi-view learning. CoRR abs/1304.5634 (2013).
  [48] Xu Jinglin, Han Junwei, and Nie Feiping. 2016. Discriminatively embedded k-means for multi-view clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5356–5364.
  [49] Xu Wei, Liu Xin, and Gong Yihong. 2003. Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 267–273.
  [50] Yang Xiaojun, Yu Weizhong, Wang Rong, Zhang Guohao, and Nie Feiping. 2020. Fast spectral clustering learning with hierarchical bipartite graph for large-scale data. Pattern Recognition Letters 130 (2020), 345–352.
  [51] Yang Yan and Wang Hao. 2018. Multi-view clustering: A survey. Big Data Mining and Analytics 1, 2 (2018), 83–107.
  [52] Yang Zuyuan, Liang Naiyao, Yan Wei, Li Zhenni, and Xie Shengli. 2020. Uniform distribution non-negative matrix factorization for multiview clustering. IEEE Transactions on Cybernetics 51, 6 (2020), 3249–3262.
  [53] Yang Zuyuan, Xiang Yong, Xie Kan, and Lai Yue. 2016. Adaptive method for nonsmooth nonnegative matrix factorization. IEEE Transactions on Neural Networks and Learning Systems 28, 4 (2016), 948–960.
  [54] Yang Zuyuan, Zhang Huimin, Liang Naiyao, Li Zhenni, and Sun Weijun. 2023. Semi-supervised multi-view clustering by label relaxation based non-negative matrix factorization. The Visual Computer 39, 4 (2023), 1409–1422.
  [55] Zeng Shan, Wang Xiuying, Cui Hui, Zheng Chaojie, and Feng David. 2017. A unified collaborative multikernel fuzzy clustering for multiview data. IEEE Transactions on Fuzzy Systems 26, 3 (2017), 1671–1687.
  [56] Zha Hongyuan, He Xiaofeng, Ding Chris, Gu Ming, and Simon Horst D.. 2002. Spectral relaxation for k-means clustering. In Proceedings of the Advances in Neural Information Processing Systems. 1057–1064.
  [57] Zhang Changqing, Hu Qinghua, Fu Huazhu, Zhu Pengfei, and Cao Xiaochun. 2017. Latent multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4279–4287.
  [58] Zhang DongPing, Luo YiHao, Yu YuYuan, Zhao QiBin, and Zhou GuoXu. 2022. Semi-supervised multi-view clustering with dual hypergraph regularized partially shared non-negative matrix factorization. Science China Technological Sciences 65, 6 (2022), 1349–1365.
  [59] Zhang Zheng, Liu Li, Shen Fumin, Shen Heng Tao, and Shao Ling. 2018. Binary multi-view clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 7 (2018), 1774–1782.
  [60] Zhao Handong, Ding Zhengming, and Fu Yun. 2017. Multi-view clustering via deep matrix factorization. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
  [61] Zhao Jing, Xie Xijiong, Xu Xin, and Sun Shiliang. 2017. Multi-view learning overview: Recent progress and new challenges. Information Fusion 38 (2017), 43–54.
  [62] Zhao Wei, Xu Cai, Guan Ziyu, and Liu Ying. 2020. Multiview concept learning via deep matrix factorization. IEEE Transactions on Neural Networks and Learning Systems 32, 2 (2020), 814–825.
  [63] Zhu X., Zhang S., Li Y., Zhang J., Yang L., and Fang Y.. 2019. Low-rank sparse subspace for spectral clustering. IEEE Transactions on Knowledge and Data Engineering 31, 8 (2019), 1532–1543.
  [64] Zong Linlin, Zhang Xianchao, Zhao Long, Yu Hong, and Zhao Qianli. 2017. Multi-view clustering via multi-manifold regularized non-negative matrix factorization. Neural Networks 88 (2017), 74–89.


Published in ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 6, July 2024, 760 pages.
ISSN: 1556-4681; EISSN: 1556-472X; DOI: 10.1145/3613684.
Publisher: Association for Computing Machinery, New York, NY, United States.
Publication History: Received 19 October 2022; Revised 11 September 2023; Accepted 8 March 2024; Online AM 18 March 2024; Published 27 April 2024.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.