Non-linear regression models for Approximate Bayesian Computation

Published in: Statistics and Computing

Abstract

Approximate Bayesian inference on the basis of summary statistics is well suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, methods that use rejection suffer from the curse of dimensionality as the number of summary statistics increases. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to state-of-the-art approximate Bayesian methods, and achieves a considerable reduction of the computational burden in two examples of inference, one in statistical genetics and one in a queueing model.
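
To make the regression idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy simulator with a single parameter and a single summary statistic, swaps in a quadratic least-squares fit for the nonlinear regression described in the abstract, and estimates the conditional variance by regressing log squared residuals on the same features. The rescaling of residuals by a ratio of conditional standard deviations is one common form of heteroscedastic adjustment and is an assumption here, as are all names (`design`, `abc_regression_adjust`, `accept_frac`). The adaptive importance-sampling step is omitted.

```python
import numpy as np

def design(s):
    """Quadratic polynomial features of a 1-D summary statistic (assumed basis)."""
    s = np.asarray(s, dtype=float)
    return np.column_stack([np.ones_like(s), s, s ** 2])

def abc_regression_adjust(theta, s, s_obs, accept_frac=0.2):
    """Rejection ABC followed by a heteroscedastic regression adjustment."""
    theta = np.asarray(theta, dtype=float)
    s = np.asarray(s, dtype=float)

    # Rejection step: keep the simulations whose summary is closest to s_obs.
    n_keep = max(int(accept_frac * len(s)), 10)
    keep = np.argsort(np.abs(s - s_obs))[:n_keep]
    th, ss = theta[keep], s[keep]

    # Conditional mean: least-squares fit of theta on features centred at s_obs.
    X = design(ss - s_obs)
    beta, *_ = np.linalg.lstsq(X, th, rcond=None)
    resid = th - X @ beta

    # Conditional variance: fit the log squared residuals on the same features.
    gamma, *_ = np.linalg.lstsq(X, np.log(resid ** 2 + 1e-12), rcond=None)
    log_var = X @ gamma

    # Evaluate both fits at the observed summary (the centred value is 0).
    x_obs = design(np.array([0.0]))
    m_obs = (x_obs @ beta)[0]
    log_var_obs = (x_obs @ gamma)[0]

    # Move each accepted draw to s_obs, rescaling its residual by the
    # ratio of fitted standard deviations.
    sd_ratio = np.exp(0.5 * (log_var_obs - log_var))
    return m_obs + resid * sd_ratio

# Toy model (hypothetical): uniform prior, noise level grows with theta.
rng = np.random.default_rng(0)
theta_sim = rng.uniform(0.0, 10.0, size=20_000)
s_sim = theta_sim + (0.2 + 0.1 * theta_sim) * rng.standard_normal(theta_sim.size)

adjusted = abc_regression_adjust(theta_sim, s_sim, s_obs=5.0)
print(adjusted.mean(), adjusted.std())
```

The adjustment lets the acceptance fraction be fairly large, since accepted draws far from the observed summary are corrected rather than used as-is; how large it can be in practice depends on how well the fitted regression matches the true conditional mean and variance.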

Corresponding author

Correspondence to Michael G. B. Blum.

Cite this article

Blum, M.G.B., François, O. Non-linear regression models for Approximate Bayesian Computation. Stat Comput 20, 63–73 (2010). https://doi.org/10.1007/s11222-009-9116-0

  • DOI: https://doi.org/10.1007/s11222-009-9116-0
