Abstract
This work introduces the concept of parametric Gaussian processes (PGP), built upon the seemingly self-contradictory idea of making Gaussian processes parametric. The resulting framework can encode massive amounts of data into a small number of "hypothetical" data points. Moreover, parametric Gaussian processes are aware of their own imperfections and properly quantify the predictive uncertainty associated with these limitations. The effectiveness of the proposed approach is demonstrated using three illustrative examples: one with simulated data, a benchmark dataset from the airline industry with approximately 6 million records, and spatio-temporal sea surface temperature maps of Massachusetts and Cape Cod Bays and Stellwagen Bank for the year 2015.
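To make the core idea concrete, the sketch below shows how a Gaussian process conditioned on a small set of m "hypothetical" points (Z, u) can summarize a function that was observed at many more locations. This is a minimal illustration in the spirit of the abstract, not the authors' exact PGP formulation; the squared-exponential kernel, its hyperparameters, and the choice of m are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D point sets A (n,) and B (m,)."""
    d = A[:, None] - B[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_star, Z, u, noise=1e-6):
    """GP posterior mean and variance at x_star, conditioned only on the
    m hypothetical points (Z, u) rather than the full dataset."""
    Kzz = rbf_kernel(Z, Z) + noise * np.eye(len(Z))
    Ksz = rbf_kernel(x_star, Z)
    mean = Ksz @ np.linalg.solve(Kzz, u)
    var = rbf_kernel(x_star, x_star).diagonal() - np.einsum(
        "ij,ji->i", Ksz, np.linalg.solve(Kzz, Ksz.T))
    return mean, var

# Summarize y = sin(x) on [0, 2*pi] with only m = 8 hypothetical points.
Z = np.linspace(0, 2 * np.pi, 8)
u = np.sin(Z)
mean, var = gp_predict(np.array([np.pi / 2]), Z, u)
```

In the PGP setting the locations Z and values u would themselves be learned from streaming mini-batches of the full dataset, so that the m points act as a compressed, uncertainty-aware summary of millions of records.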
Additional information
This work received support by the DARPA EQUiPS Grant N66001-15-2-4055 and the AFOSR Grant FA9550-17-1-0013. This research has also been partially supported by a Grant from NOAA (NA18OAR4170105). The Moderate-resolution Imaging Spectroradiometer (MODIS) SST data were obtained from the NASA EOSDIS Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory, Pasadena, CA (http://dx.doi.org/10.5067/MODST-1D4N4).
Cite this article
Raissi, M., Babaee, H. & Karniadakis, G.E. Parametric Gaussian process regression for big data. Comput Mech 64, 409–416 (2019). https://doi.org/10.1007/s00466-019-01711-5