Abstract
Natural populations of living organisms often have complex histories consisting of phases of expansion and decline, and the migratory patterns within them may fluctuate over space and time. When parts of a population become relatively isolated, e.g., due to geographical barriers, stochastic forces reshape certain DNA characteristics of the individuals over generations such that they reflect the restricted migration and mating/reproduction patterns. Such populations are typically termed genetically structured, and they may be statistically represented in terms of several clusters whose DNA variation differs clearly from one cluster to another. When detailed knowledge of the ancestry of a natural population is lacking, the DNA characteristics of a sample of current-generation individuals often provide a wealth of information in this respect. Several statistical approaches to model-based clustering of such data have been introduced, and in particular, the Bayesian approach to modeling the genetic structure of a population has attracted considerable interest among biologists. However, spatial information from sampled individuals has been incorporated into the inference about genetic clusters only very recently. While the standard Bayesian hierarchical modeling techniques through Markov chain Monte Carlo simulation provide flexible means for describing even subtle patterns in data, they may also result in computationally challenging procedures in practical data analysis. Here we develop a method for modeling the spatial genetic structure using a combination of analytical and stochastic methods. We achieve this by extending a novel theory of Bayesian predictive classification with the spatial information available, described here in terms of a colored Voronoi tessellation over the sample domain. Our results for real and simulated data sets clearly illustrate the benefits of incorporating spatial information into such an analysis.
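As a rough illustration of what a colored Voronoi tessellation over a sample domain means, the following minimal sketch assigns every location the cluster label ("color") of its nearest sampling site. It is not the authors' implementation: the coordinates, the labels, and the function names `nearest_site` and `color_at` are hypothetical and chosen only for this example, and the sketch does not perform any inference about the clusters.

```python
import math

def nearest_site(point, sites):
    """Index of the site closest to `point`.

    By definition, `point` lies in the Voronoi cell of that site.
    Ties are broken in favour of the lowest index.
    """
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def color_at(point, sites, colors):
    """Cluster label ("color") of the Voronoi cell that contains `point`."""
    return colors[nearest_site(point, sites)]

# Hypothetical sampling locations (x, y) and hypothetical cluster labels.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
colors = [0, 0, 0, 1]

if __name__ == "__main__":
    # Print a coarse character map of the colored tessellation on [0, 5] x [0, 5].
    for row in range(5, -1, -1):
        print("".join(str(color_at((col, row), sites, colors)) for col in range(6)))
```

In this picture, neighboring cells that share a color form one spatially coherent region, which is the sense in which the tessellation encodes spatial information about the clusters.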
Cite this article
Corander, J., Sirén, J. & Arjas, E. Bayesian spatial modeling of genetic population structure. Computational Statistics 23, 111–129 (2008). https://doi.org/10.1007/s00180-007-0072-x
DOI: https://doi.org/10.1007/s00180-007-0072-x