
1 Introduction

Graph representation, or graph embedding [17], aims at mapping the vertices of a graph into a low-dimensional space while preserving the structural information and revealing the proximity of instances. The compact representations of graph vertices are then useful for further tasks such as classification [10] and clustering [8, 15].

The most intuitive and simple way to handle a graph is to use only the connection information and represent the graph as a deterministic adjacency matrix. Dimension reduction techniques [12] applied directly to the adjacency matrix can achieve superior performance in many cases.

Although directly applying dimension reduction to the complete graph is efficient in many scenarios, such direct matrix models also have obvious disadvantages. Generally speaking, their limitations are threefold. First, direct matrix models easily suffer from high computational complexity on large-scale graphs. Since the adjacency matrix is deterministic and fixed, such methods are not flexible enough when the dataset is large. Second, direct matrix models do not consider enough information: they only provide a global view of the graph structure, whereas the local information that depicts the neighborhood of each vertex should also be reflected in the learned features. Finally, the success of direct matrix models depends heavily on the representation power of the dimension reduction model. Methods such as spectral learning [12] and Non-negative Matrix Factorization [9] have limited representation power.

In order to address these challenges, we propose a new deep graph representation method based on streaming algorithms [1, 19]. The proposed method keeps the advantages of deterministic matrix methods while introducing several new ideas to handle their limitations.

First, we introduce a streaming-motivated stochastic idea into the model. Streaming methods process data streams: the input of a streaming model is organized as a sequence of data blocks, and the main purpose of the streaming formulation is to address memory issues. In this paper, we sample a small portion of vertices at a time and formulate a graph stream. As vertices accumulate along the data stream, more and more information is captured by the model rather than stored in the data. Since we choose a fixed, small number of vertices each time, the dimension of the input is reduced significantly. Consequently, the streaming strategy helps to handle the computational complexity issues.

Second, in order to incorporate more information into the model, we adopt a regularization framework. Direct matrix models only consider the visible edges between vertices. The key point of the regularization framework is that the graph regularization term brings vertex similarities into the model in addition to the visible connections: vertices that are similar in the original space should have similar representations in the latent low-dimensional space.

Finally, after the graph stream is obtained, we feed it into a deep autoencoder [6] to learn the representations of the graph vertices. The learning power of the deep autoencoder ensures that the learned features keep sufficient information from the original graph.

2 Related Work

Graph representation, also known as graph embedding, is a subtopic of representation learning. Representation learning [4] attempts to encode data from the original space into a vector space in an unsupervised fashion, so that the learned features can be used in further tasks.

Early methods such as Laplacian Eigenmaps (LE) [3] and Locally Linear Embedding (LLE) [14] are deterministic. In these methods, the graph is represented as an adjacency matrix, and the embedding is generally obtained through dimension reduction techniques [18]. Specifically, the intuitive idea is to solve for the eigenvectors of the affinity matrix; such methods exploit the spectral properties of affinity matrices and are known as spectral methods. More recently, deep learning models have also been used as dimension reduction tools because of their superior representation ability [8, 15].

More recent works on graph embedding are stochastic, in which the graph is no longer represented as a fixed matrix [5, 13]. Methods such as DeepWalk [13] and node2vec [5] regard the graph as a vertex vocabulary from which a collection of node sequences is sampled. Subsequently, language models such as skip-gram [11] can be used to obtain the final representations.

While the deterministic methods are not flexible [2], the disadvantages of stochastic models are also obvious. Since stochastic models only consider local information describing the nearest neighbors of vertices, they fail to provide a global picture of the whole graph. The loss of global information hurts the performance of such models when the graph structure is irregular.

3 Network Embedding Problem

3.1 Notations

In this paper, we denote vectors as lowercase letters in boldface and matrices as uppercase letters in boldface. The elements of a matrix and a vector are denoted as \(\mathbf {X}_{ij}\) and \(\mathbf {x}_{i}\), respectively. Given a graph G(V, E), V is the vertex set denoted as \(\{v_{1},\ldots,v_{n}\}\) and E is the edge set denoted as \(\{e_{ij}\}_{i,j=1}^{n}\).

We then define the graph embedding as:

Definition 1

(Graph Embedding). Given an N-vertex graph G(V, E), the goal of graph embedding is to learn a mapping \(v_{i} \longmapsto \mathbf {y}_{i}\), \(\mathbf {y}_{i} \in \mathbb {R}^{d}\). The learned representations in the latent low-dimensional space should be able to preserve the structural information of the original graph.

3.2 Streaming Strategy

In this subsection, we illustrate how to formulate the data stream from a given graph G(V, E). Let K be the number of data chunks in a data stream, and denote by \(S_{k}\) the \(k^{th}\) data chunk, where \(k\in \{1,2,\cdots ,K\}\). K can be very large, since a substantial number of samplings helps to visit the graph completely.

Fig. 1. The streaming strategy. Each time we choose a fixed number of vertices; the data chunks are constructed from the selected vertices.

Let the number of vertices selected at one time be \(D\) (\(D \ll N\)). D is also the input dimension of the embedding model, since D is fixed as a constant. In the training phase, at an arbitrary step k, we select D nodes uniformly from the vertex collection \(\{v_{1},\ldots,v_{n}\}\). A subgraph is then constructed from the selected nodes, and \(S_{k}\) is the adjacency matrix of this subgraph. Fig. 1 illustrates the sampling process that forms a data stream. A data stream is denoted as \(S=\{S_{k}; k\in \{1,\cdots ,K\}\}\).
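A minimal sketch of this chunk construction, assuming the full graph is available as a dense NumPy adjacency matrix `A`; the function names are illustrative, not from the paper.

```python
import numpy as np

def sample_chunk(A, D, rng):
    """Sample D vertices uniformly and return the adjacency matrix of the induced subgraph."""
    N = A.shape[0]
    idx = rng.choice(N, size=D, replace=False)  # uniform sampling without replacement
    return A[np.ix_(idx, idx)]                  # D x D chunk S_k

def graph_stream(A, D, K, seed=0):
    """Yield K data chunks S_1, ..., S_K forming the graph stream S."""
    rng = np.random.default_rng(seed)
    for _ in range(K):
        yield sample_chunk(A, D, rng)
```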

In the embedding phase, the goal is to map each vertex to its representation using the trained model. However, the dimension of the original data, N, is much higher than the input dimension of the model, D. Consequently, we run a simple Principal Component Analysis (PCA) on \(\mathbf {X}\) to obtain \(\mathbf {X}^{D}\) with dimension D. Then \(\mathbf {X}^{D}\) serves as the input to obtain the compact representation of each vertex.
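A sketch of this embedding phase using scikit-learn's PCA, assuming a trained `encoder` callable that maps D-dimensional inputs to d-dimensional codes (both names are illustrative):

```python
from sklearn.decomposition import PCA

def embed_all_vertices(A, encoder, D):
    """Reduce the N-dimensional rows of A to D dimensions with PCA, then encode them."""
    X_D = PCA(n_components=D).fit_transform(A)  # N x D input for the trained model
    return encoder(X_D)                         # N x d compact representations
```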

3.3 Graph Autoencoder

Autoencoders [16] are powerful in representation tasks. After obtaining the data stream, we use a deep graph autoencoder to learn the low-dimensional vectors.

Deep Autoencoder: The autoencoder paradigm attempts to copy its input to its output, which results in a code layer that may capture useful properties of the input.

Let \(\mathbf {X} = \{ {\mathbf {x}_{i}}: {\mathbf {x}_{i}}\in \mathbb {R}^{m \times 1} \}_{i=1}^{n}\) and \(\mathbf {Z} = \{ {\mathbf {z}}_{i}: {\mathbf {z}}_{i}\in \mathbb {R}^{m \times 1} \}_{i=1}^{n}\) be the input matrix and the reconstruction matrix, respectively. \(\mathbf {Y} = \{ {\mathbf {y}_{i}}: {\mathbf {y}_{i}}\in \mathbb {R}^{d \times 1} \}_{i=1}^{n}\) is the code matrix, where the dimension of \(\mathbf {y}_{i}\) is usually much lower than that of the original data \(\mathbf {x}_{i}\). A layer-wise interpretation of the encoder and decoder can be written as:

$$\begin{aligned} \mathbf {Y} = f_{\theta }(\mathbf {X}) = \delta (W_{encoder}\mathbf {X} + b_{encoder}) \end{aligned}$$
(1)
$$\begin{aligned} \mathbf {Z} = g_{\theta }(\mathbf {Y}) = \delta (W_{decoder}\mathbf {Y} + b_{decoder}) \end{aligned}$$
(2)

For convenience, we summarize the encoder parameters as \(\theta _{encoder}\), and the decoder parameters as \(\theta _{decoder}\). Then the loss function can be defined as:

$$\begin{aligned} \mathcal {L} = \Vert \mathbf {X} - \mathbf {Z}\Vert ^{2}_{F} = \sum _{i=1}^{n} \Vert \mathbf {x}_{i} - \mathbf {z}_{i} \Vert ^{2}_{2} \end{aligned}$$
(3)

Since a deep autoencoder can be regarded as a special case of a feedforward network, the parameters are optimized by backpropagating gradients via the chain rule.
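A minimal single-layer sketch of Eqs. (1)-(3) in PyTorch; the class name and the choice of the sigmoid function for \(\delta\) are our assumptions rather than details given in the paper.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim, code_dim):
        super().__init__()
        self.encoder = nn.Linear(input_dim, code_dim)  # W_encoder, b_encoder
        self.decoder = nn.Linear(code_dim, input_dim)  # W_decoder, b_decoder

    def forward(self, X):
        Y = torch.sigmoid(self.encoder(X))  # Eq. (1): Y = delta(W_enc X + b_enc)
        Z = torch.sigmoid(self.decoder(Y))  # Eq. (2): Z = delta(W_dec Y + b_dec)
        return Y, Z

def reconstruction_loss(X, Z):
    """Eq. (3): squared Frobenius reconstruction error."""
    return ((X - Z) ** 2).sum()
```

A deeper stack of such layers, e.g. the 500-200-100-200-500 architecture used in the experiments, follows the same pattern.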

Graph Regularization: In order to preserve the local structure of the data, we employ a graph regularization term derived from Laplacian Eigenmaps [3]. Suppose \(\mathbf{A}\) is the indicator matrix in which \(\mathbf {A}_{ij}\) indicates whether node i and node j are connected; the Laplacian loss is then defined as:

$$\begin{aligned} Laplacian = \sum _{i}^{n}\sum _{j}^{n}\mathbf {A}_{ij} \Vert \mathbf {y}_{i}-\mathbf {y}_{j} \Vert _{2}^{2} \end{aligned}$$
(4)

The Laplacian loss can be further written as:

$$\begin{aligned} Laplacian = \sum _{i}^{n}\sum _{j}^{n}\mathbf {A}_{ij} \Vert \mathbf {y}_{i}-\mathbf {y}_{j} \Vert _{2}^{2} = 2tr(\mathbf {Y}^{T}\mathbf {L}\mathbf {Y}) \end{aligned}$$
(5)

where \(tr(\cdot)\) denotes the trace and \(\mathbf {L}\) is the Laplacian matrix computed from \(\mathbf {A}\): \(\mathbf {L} = \mathbf {D} - \mathbf {A}\), where \(\mathbf {D}\) is the diagonal degree matrix with \(\mathbf {D}_{ii} = \sum _{j=1}^{n} \mathbf {A}_{ij}\).
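A small PyTorch sketch of the trace form in Eq. (5), assuming `A` and `Y` are float tensors (the function name is illustrative):

```python
import torch

def laplacian_loss(A, Y):
    """2 * tr(Y^T L Y) with L = D - A, where D is the diagonal degree matrix of A."""
    deg = torch.diag(A.sum(dim=1))  # D_ii = sum_j A_ij
    L = deg - A                     # graph Laplacian
    return 2.0 * torch.trace(Y.t() @ L @ Y)
```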

Combining the graph information, the optimization problem is:

$$\begin{aligned} \mathcal {L} = \Vert \mathbf {X} - \mathbf {Z}\Vert ^{2}_{F} + \alpha ^\prime \cdot 2tr(\mathbf {Y}^{T}\mathbf {L}\mathbf {Y}) + \beta ^\prime \cdot \frac{1}{2} \Vert W\Vert _{F}^{2} \end{aligned}$$
(6)

Merging the constant factors into the parameters \(\alpha \) and \(\beta \), the loss function becomes:

$$\begin{aligned} \mathcal {L} = \Vert \mathbf {X} - \mathbf {Z}\Vert ^{2}_{F} + \alpha tr(\mathbf {Y}^{T}\mathbf {L}\mathbf {Y}) + \beta \Vert W\Vert _{F}^{2} \end{aligned}$$
(7)

where \(\alpha \) and \(\beta \) are the hyperparameters that control the model complexity.

Recall that each time we receive a data chunk \(S_{k}\); we let \(\mathbf {X}=\mathbf {A}=S_{k}\) and then run the graph-regularized autoencoder with \(\mathbf {X}\) and \(\mathbf {A}\). As in most deep neural networks, we use gradient descent to optimize the deep autoencoder. The objective function is \( \mathcal {L} = \varepsilon (f,g) + \lambda \varOmega (f)\), where the first term \(\varepsilon (f,g)\) is the reconstruction error and the second term \(\varOmega (f)\) is the regularization term. The partial derivatives with respect to \(\theta _{decoder}\) depend only on the first term, while those with respect to \(\theta _{encoder}\) depend on both terms. Using the chain rule, the parameters at each layer can be computed sequentially.
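A training-loop sketch over the graph stream, combining Eq. (7) with the chunk-wise setting \(\mathbf{X}=\mathbf{A}=S_{k}\). It reuses the `Autoencoder` sketch above; the hyperparameter values and the use of plain SGD are illustrative assumptions.

```python
import torch

def train_on_stream(model, stream, alpha=1e-2, beta=1e-3, lr=1e-3):
    # The beta * ||W||_F^2 penalty is approximated by the optimizer's weight decay
    # (up to a constant factor absorbed into beta).
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=beta)
    for S_k in stream:                                 # one D x D chunk per step
        X = torch.as_tensor(S_k, dtype=torch.float32)
        A = X                                          # X = A = S_k
        Y, Z = model(X)
        L = torch.diag(A.sum(dim=1)) - A               # Laplacian of the chunk
        loss = ((X - Z) ** 2).sum() + alpha * torch.trace(Y.t() @ L @ Y)  # Eq. (7)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

For example, `train_on_stream(Autoencoder(input_dim=D, code_dim=d), graph_stream(A, D, K))` would run one pass over a stream of K chunks.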

4 Experiments

In this section, we conduct experiments on clustering tasks to verify the effectiveness of our method.

We use two datasets, COIL20 and ORL, to evaluate our method. COIL20 contains 1440 instances belonging to 20 categories, and ORL contains 400 samples belonging to 40 classes. A KNN graph is constructed by computing the k nearest neighbors of each sample.
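One way to build such a KNN graph with scikit-learn is sketched below; the value of k and the symmetrization step are our assumptions, as the paper does not specify them.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_graph(features, k=5):
    """Binary adjacency matrix of the k-nearest-neighbor graph."""
    A = kneighbors_graph(features, n_neighbors=k, mode="connectivity").toarray()
    return np.maximum(A, A.T)  # symmetrize so the graph is undirected
```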

We compare our approach with several deep models to evaluate its performance. Specifically, we employ the deep autoencoder (DAE) [7] and the stacked autoencoder (SAE) [16] as baseline models.

We evaluate the learned features on the clustering task. Following common practice in clustering evaluation, we employ purity and normalized mutual information (NMI) to evaluate the results.
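Both metrics can be computed with standard tools; the following is a sketch, not code from the paper:

```python
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def purity_score(labels_true, labels_pred):
    """Fraction of samples assigned to the majority true class of their cluster."""
    C = contingency_matrix(labels_true, labels_pred)
    return C.max(axis=0).sum() / C.sum()

def nmi_score(labels_true, labels_pred):
    return normalized_mutual_info_score(labels_true, labels_pred)
```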

In our experiments, we set D to 500 for COIL20 and 200 for ORL. The deep graph autoencoders for COIL20 and ORL have 5 and 3 layers, respectively. For COIL20, the layer dimensions are \(500-200-100-200-500\); for ORL, they are \(200-100-200\).

The clustering results on COIL20 and ORL are presented in Table 1. The results show that the streaming method has competitive representation power compared with the baseline models that utilize the complete matrices. They also indicate that, when encountering large graphs, the streaming method is relieved of the computation issues while still being able to achieve superior performance.

Table 1. Results in clustering task

5 Conclusion

We proposed a streaming-motivated embedding method to learn low-dimensional representations of a graph. The streaming strategy is used to reduce the computational complexity, while the deep autoencoder and the graph regularization idea ensure that the learned features retain enough information. Experiments on the clustering task verify the effectiveness of our method: our model achieves results as good as those of models that apply dimension reduction directly to the original matrix, and the approach generalizes to large graphs where direct matrix models are inapplicable.