
Summarizing Data Structures with Gaussian Process and Robust Neighborhood Preservation

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13717)

Abstract

Latent variable models summarize high-dimensional data while preserving its complex properties. This paper proposes a locality-aware and low-rank approximated Gaussian process latent variable model (LolaGP) that preserves both the global relationship and the local geometry when deriving the latent variables. We capture the global relationship by non-linearly reproducing sample similarities, and the local geometry through a newly constructed neighborhood graph. Formally, we derive LolaGP from GP-LVM and introduce a locality-aware regularization that reflects the adjacency relationships of this graph. The neighborhood graph is constructed from the latent variables themselves, which makes locality preservation more robust to noise and to the curse of dimensionality than previous methods that build the graph directly from the high-dimensional data. Furthermore, we introduce a new lower bound on the log-posterior distribution based on low-rank matrix approximation, which allows LolaGP to handle larger datasets than conventional GP-LVM extensions. Our contribution is twofold: we preserve both the global and local structures in the derivation of the latent variables using the robust neighborhood graph, and we introduce a scalable lower bound on the log-posterior distribution. We conducted experiments on synthetic datasets and on image datasets with and without heavy noise. From both qualitative and quantitative standpoints, our method produced successful results in all experimental settings.
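To make the abstract's two main ingredients concrete, the following is a minimal Python sketch, not the authors' implementation: (a) a GP-LVM log-likelihood objective augmented with a graph-Laplacian locality penalty whose kNN graph is rebuilt from the current latent variables, echoing the paper's idea of constructing the neighborhood graph in latent space rather than in the high-dimensional space, and (b) a standard Nyström approximation as one generic way to obtain a low-rank kernel. The RBF kernel, the trace-form penalty tr(XᵀLX), and all function names are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only; not the paper's LolaGP implementation.
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between rows of A and B."""
    return variance * np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a kNN graph built on the latent points X."""
    D = cdist(X, X)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D[i])[1:k + 1]] = 1.0   # skip self at index 0
    W = np.maximum(W, W.T)                       # symmetrize
    return np.diag(W.sum(axis=1)) - W

def objective(X, Y, lam=0.1, k=5, noise=1e-2):
    """Negative GP-LVM log-likelihood plus a locality penalty lam * tr(X^T L X).

    The penalty grows when points that are neighbors in the latent-space
    kNN graph are pushed apart -- one simple way to encode a locality-aware
    regularizer of the kind the paper describes.
    """
    n, d = Y.shape
    K = rbf_kernel(X, X) + noise * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    nll = 0.5 * d * logdet + 0.5 * np.trace(Y.T @ np.linalg.solve(K, Y))
    L = knn_laplacian(X, k)
    return nll + lam * np.trace(X.T @ L @ X)

def nystrom_kernel(X, m=20, noise=1e-2, seed=0):
    """Rank-m Nystrom approximation K ~ K_nm K_mm^{-1} K_mn + noise * I.

    A generic low-rank construction; the paper's actual lower bound on the
    log-posterior is derived differently but exploits the same idea.
    """
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)]
    Kmm = rbf_kernel(Z, Z) + 1e-6 * np.eye(m)    # jitter for stability
    Knm = rbf_kernel(X, Z)
    return Knm @ np.linalg.solve(Kmm, Knm.T) + noise * np.eye(len(X))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(50, 10))   # toy high-dimensional observations
    X = rng.normal(size=(50, 2))    # latent initialization (PCA in practice)
    print(objective(X, Y))
```

In practice X would be optimized with a gradient-based method such as L-BFGS, with the Laplacian recomputed from the updated latents at each outer iteration; the details of how LolaGP interleaves these steps are in the paper itself.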


Notes

  1. http://yann.lecun.com/exdb/mnist/.


Acknowledgements

This study was supported in part by JSPS KAKENHI Grant Number JP21H03456 and AMED Grant Number JP21zf0127004.

Author information

Correspondence to Miki Haseyama.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Watanabe, K., Maeda, K., Ogawa, T., Haseyama, M. (2023). Summarizing Data Structures with Gaussian Process and Robust Neighborhood Preservation. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science (LNAI), vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_10


  • DOI: https://doi.org/10.1007/978-3-031-26419-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26418-4

  • Online ISBN: 978-3-031-26419-1

  • eBook Packages: Computer Science, Computer Science (R0)
