
Parametric Gaussian process regression for big data

  • Original Paper, published in Computational Mechanics

Abstract

This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. The resulting framework can encode massive amounts of data into a small number of “hypothetical” data points. Moreover, parametric Gaussian processes are aware of their own imperfections and can properly quantify the uncertainty in their predictions associated with such limitations. The effectiveness of the proposed approach is demonstrated on three illustrative examples: one with simulated data, a benchmark dataset from the airline industry with approximately 6 million records, and spatio-temporal sea surface temperature maps of Massachusetts and Cape Cod Bays and Stellwagen Bank for the year 2015.
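The central idea described in the abstract, summarizing a very large training set with a small number of hypothetical data points, is closely related to classical sparse Gaussian process approximations built on inducing points. The paper's specific PGP construction is not reproduced here; as a rough, self-contained illustration of how m inducing points can stand in for n >> m observations, the following sketch uses the standard subset-of-regressors (SoR) approximation. All function names, kernel hyperparameters, and data below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf(A, B, ell=0.2, sf=1.0):
    # Squared-exponential kernel k(a, b) = sf^2 * exp(-|a - b|^2 / (2 ell^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

def sparse_gp_predict(X, y, Z, Xs, noise=0.1):
    """Subset-of-regressors sparse GP prediction.

    The n training points (X, y) enter only through the m-by-n
    cross-covariance K_zx, so the cost is O(n m^2) rather than the
    O(n^3) of an exact GP -- this is what makes inducing-point
    summaries attractive for big data.
    """
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))   # jitter for stability
    Kzx = rbf(Z, X)
    Ksz = rbf(Xs, Z)
    # A = K_zz + K_zx K_xz / sigma^2 appears in both mean and variance.
    A = Kzz + Kzx @ Kzx.T / noise ** 2
    mu = Ksz @ np.linalg.solve(A, Kzx @ y) / noise ** 2
    # SoR variance k_*z A^{-1} k_z*; note it is known to collapse far
    # from the inducing points, unlike the fuller treatment in the paper.
    var = np.einsum('ij,ji->i', Ksz, np.linalg.solve(A, Ksz.T))
    return mu, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (500, 1))                      # 500 noisy observations
    y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(500)
    Z = np.linspace(0, 1, 10)[:, None]                   # only 10 inducing points
    Xs = np.linspace(0, 1, 50)[:, None]                  # test inputs
    mu, var = sparse_gp_predict(X, y, Z, Xs)
    print(np.sqrt(np.mean((mu - np.sin(4 * Xs[:, 0])) ** 2)))
```

In this toy setting the 10 inducing points recover the underlying sine function from 500 observations to within the noise level, illustrating the compression the abstract refers to, although the PGP framework additionally learns the hypothetical points and quantifies uncertainty more faithfully than SoR does.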


Notes

  1. http://stat-computing.org/dataexpo/2009/

  2. https://sanctuaries.noaa.gov/science/socioeconomic/factsheets/stellwagenbank.html


Author information


Corresponding author

Correspondence to Maziar Raissi.

Additional information


This work was supported by the DARPA EQUiPS Grant N66001-15-2-4055 and the AFOSR Grant FA9550-17-1-0013. This research was also partially supported by a grant from NOAA (NA18OAR4170105). The Moderate-resolution Imaging Spectroradiometer (MODIS) SST data were obtained from the NASA EOSDIS Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, Pasadena, CA (http://dx.doi.org/10.5067/MODST-1D4N4).


About this article


Cite this article

Raissi, M., Babaee, H. & Karniadakis, G.E. Parametric Gaussian process regression for big data. Comput Mech 64, 409–416 (2019). https://doi.org/10.1007/s00466-019-01711-5

