
Summarizing Data Structures with Gaussian Process and Robust Neighborhood Preservation

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13717)

Abstract

Latent variable models summarize high-dimensional data while preserving its complex properties. This paper proposes a locality-aware and low-rank approximated Gaussian process latent variable model (LolaGP) that preserves both the global relationship and the local geometry when deriving the latent variables. We capture the global relationship by non-linearly reproducing sample similarities, and the local geometry through a newly constructed neighborhood graph. Formally, we derive LolaGP from GP-LVM and introduce a locality-aware regularization that reflects the adjacency relationships of this graph. The neighborhood graph is constructed from the latent variables themselves, which makes locality preservation more robust to noise and to the curse of dimensionality than previous methods that build the graph directly from the high-dimensional data. Furthermore, we introduce a new lower bound on the log-posterior distribution based on low-rank matrix approximation, which allows LolaGP to handle larger datasets than conventional GP-LVM extensions. Our contribution is twofold: we preserve both the global and local structures in the derivation of the latent variables using the robust neighborhood graph, and we introduce a scalable lower bound on the log-posterior distribution. We conducted experiments on synthetic datasets and on image datasets with and without heavy noise. From both qualitative and quantitative standpoints, our method produced successful results in all experimental settings.
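To make the abstract's two main ingredients concrete, the following is a minimal Python sketch, not the authors' implementation: (a) a GP-LVM log-likelihood objective augmented with a graph-Laplacian locality penalty whose kNN graph is rebuilt from the current latent variables, echoing the paper's idea of constructing the neighborhood graph in latent space rather than in the high-dimensional space, and (b) a standard Nyström approximation as one generic way to obtain a low-rank kernel. The RBF kernel, the trace-form penalty tr(XᵀLX), and all function names are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only; not the paper's LolaGP implementation.
import numpy as np
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between rows of A and B."""
    return variance * np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a kNN graph built on the latent points X."""
    D = cdist(X, X)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(D[i])[1:k + 1]] = 1.0   # skip self at index 0
    W = np.maximum(W, W.T)                       # symmetrize
    return np.diag(W.sum(axis=1)) - W

def objective(X, Y, lam=0.1, k=5, noise=1e-2):
    """Negative GP-LVM log-likelihood plus a locality penalty lam * tr(X^T L X).

    The penalty grows when points that are neighbors in the latent-space
    kNN graph are pushed apart -- one simple way to encode a locality-aware
    regularizer of the kind the paper describes.
    """
    n, d = Y.shape
    K = rbf_kernel(X, X) + noise * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    nll = 0.5 * d * logdet + 0.5 * np.trace(Y.T @ np.linalg.solve(K, Y))
    L = knn_laplacian(X, k)
    return nll + lam * np.trace(X.T @ L @ X)

def nystrom_kernel(X, m=20, noise=1e-2, seed=0):
    """Rank-m Nystrom approximation K ~ K_nm K_mm^{-1} K_mn + noise * I.

    A generic low-rank construction; the paper's actual lower bound on the
    log-posterior is derived differently but exploits the same idea.
    """
    rng = np.random.default_rng(seed)
    Z = X[rng.choice(len(X), size=m, replace=False)]
    Kmm = rbf_kernel(Z, Z) + 1e-6 * np.eye(m)    # jitter for stability
    Knm = rbf_kernel(X, Z)
    return Knm @ np.linalg.solve(Kmm, Knm.T) + noise * np.eye(len(X))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(50, 10))   # toy high-dimensional observations
    X = rng.normal(size=(50, 2))    # latent initialization (PCA in practice)
    print(objective(X, Y))
```

In practice X would be optimized with a gradient-based method such as L-BFGS, with the Laplacian recomputed from the updated latents at each outer iteration; the details of how LolaGP interleaves these steps are in the paper itself.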


Notes

  1. http://yann.lecun.com/exdb/mnist/.


Acknowledgements

This study was supported in part by JSPS KAKENHI Grant Number JP21H03456 and AMED Grant Number JP21zf0127004.

Author information

Correspondence to Miki Haseyama.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Watanabe, K., Maeda, K., Ogawa, T., Haseyama, M. (2023). Summarizing Data Structures with Gaussian Process and Robust Neighborhood Preservation. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science (LNAI), vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_10


  • DOI: https://doi.org/10.1007/978-3-031-26419-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26418-4

  • Online ISBN: 978-3-031-26419-1

  • eBook Packages: Computer Science, Computer Science (R0)
