Abstract
Latent variable models summarize high-dimensional data while preserving its complex properties. This paper proposes a locality-aware and low-rank approximated Gaussian process latent variable model (LolaGP) that preserves both global relationships and local geometry in the derivation of the latent variables. Global relationships are preserved by non-linearly reproducing the pairwise sample similarities, and local geometry is preserved through a newly constructed neighborhood graph. Formally, we derive LolaGP from the GP-LVM and introduce a locality-aware regularization that reflects the adjacency relationships of this graph. Because the neighborhood graph is constructed from the latent variables rather than directly from the high-dimensional data, the preservation of local structure is more resistant to noise and to the curse of dimensionality than in previous methods. Furthermore, we introduce a new lower bound on the log-posterior distribution based on low-rank matrix approximation, which allows LolaGP to handle larger datasets than conventional GP-LVM extensions. Our contributions are to preserve both global and local structure in the derived latent variables using the robust neighborhood graph and to introduce a scalable lower bound on the log-posterior distribution. We conducted an experimental analysis on synthetic and image datasets, both with and without heavy noise corruption. From both qualitative and quantitative standpoints, our method produced successful results in all experimental settings.
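To make the two main ideas concrete, the sketch below (Python with NumPy) illustrates a GP-LVM-style objective that combines (i) a locality penalty derived from a k-nearest-neighbor graph built on the latent coordinates and (ii) a Nystrom-style low-rank kernel approximation. This is only an illustrative assumption of how such an objective can be assembled, not the authors' formulation: the function names, the RBF kernel choice, and the Laplacian penalty tr(X^T L X) are ours.

# Minimal, self-contained sketch (NOT the authors' exact method) of a GP-LVM-style
# objective with a latent-space locality penalty and a low-rank kernel approximation.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """RBF (squared-exponential) kernel matrix between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized kNN graph on the rows of X."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    W = np.zeros_like(d2)
    for i, nbrs in enumerate(np.argsort(d2, axis=1)[:, :k]):
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)            # symmetrize the adjacency
    return np.diag(W.sum(1)) - W      # L = D - W

def lola_gp_objective(X, Y, Z, noise=0.1, lam=1.0):
    """Negative GP-LVM log-likelihood with a Nystrom low-rank kernel plus a
    locality penalty tr(X^T L X) on the latent coordinates (sketch only)."""
    N, D = Y.shape
    Kzz = rbf_kernel(Z, Z) + 1e-6 * np.eye(len(Z))
    Kxz = rbf_kernel(X, Z)
    K = Kxz @ np.linalg.solve(Kzz, Kxz.T) + noise * np.eye(N)  # low-rank + noise
    _, logdet = np.linalg.slogdet(K)
    nll = 0.5 * D * logdet + 0.5 * np.sum(Y * np.linalg.solve(K, Y))
    L = knn_laplacian(X)              # neighborhood graph built in latent space
    return nll + lam * np.trace(X.T @ L @ X)

# Toy usage: N=50 samples, D=10 observed dims, q=2 latent dims, 10 inducing points.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 10))
X = rng.normal(size=(50, 2))
Z = X[rng.choice(50, 10, replace=False)]
print(lola_gp_objective(X, Y, Z))

In practice the latent coordinates X (and the kernel hyperparameters) would be optimized with a gradient-based routine, with the neighborhood graph periodically rebuilt from the current latent estimates; the paper's own derivation and lower bound may differ from this sketch.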
Acknowledgements
This study was supported in part by JSPS KAKENHI Grant Number JP21H03456 and AMED Grant Number JP21zf0127004.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Watanabe, K., Maeda, K., Ogawa, T., Haseyama, M. (2023). Summarizing Data Structures with Gaussian Process and Robust Neighborhood Preservation. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_10
Print ISBN: 978-3-031-26418-4
Online ISBN: 978-3-031-26419-1