Abstract
Spatio-temporal change of support methods are designed for statistical analysis on spatial and temporal domains that can differ from those of the observed data. Previous work introduced a parsimonious class of Bayesian hierarchical spatio-temporal models, which we refer to as STCOS, for the case of Gaussian outcomes. Applying the STCOS methodology from this literature requires a level of proficiency with spatio-temporal methods and statistical computing that may be a hurdle for potential users. The present work seeks to bridge this gap by guiding readers through STCOS computations. We focus on the R computing environment because of its popularity, free availability, and high-quality contributed packages. The stcos package is introduced to facilitate computations for the STCOS model. A motivating application is the American Community Survey (ACS), an ongoing survey administered by the U.S. Census Bureau that measures key socioeconomic and demographic variables for various populations in the United States. The STCOS methodology offers a principled approach to computing model-based estimates and associated measures of uncertainty for ACS variables on customized geographies and/or time periods. We present a detailed case study with ACS data as a guide for change of support analysis in R, and as a foundation that can be customized to other applications.
References
Banerjee S, Carlin BP, Gelfand AE (2014) Hierarchical modeling and analysis for spatial data, 2nd edn. Chapman and Hall, London
Bates D, Maechler M (2019) Matrix: sparse and dense matrix classes and methods. https://CRAN.R-project.org/package=Matrix. R package version 1.2-18
Battersby SE, Finn MP, Usery EL, Yamamoto KH (2014) Implications of Web Mercator and its use in online mapping. Cartogr Int J Geogr Inf Geovis 49(2):85–101. https://doi.org/10.3138/carto.49.2.2313
Bivand RS, Pebesma E, Gómez-Rubio V (2013) Applied spatial data analysis with R, 2nd edn. Springer, Berlin
Bradley JR, Holan SH, Wikle CK (2015a) Multivariate spatio-temporal models for high-dimensional areal data with application to longitudinal employer-household dynamics. Ann Appl Stat 9(4):1761–1791. https://doi.org/10.1214/15-AOAS862
Bradley JR, Wikle CK, Holan SH (2015b) Spatio-temporal change of support with application to American Community Survey multi-year period estimates. Stat 4(1):255–270. https://doi.org/10.1002/sta4.94
Breakstone CD, Anderson TS (2019) Census data API user guide. https://www.census.gov/data/developers/guidance/api-user-guide.html. Version 1.6
Brunsdon C (2014) pycno: Pycnophylactic Interpolation. https://CRAN.R-project.org/package=pycno. R package version 1.2
Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, Brubaker M, Guo J, Li P, Riddell A (2017) Stan: a probabilistic programming language. J Stat Softw 76(1):1–32. https://doi.org/10.18637/jss.v076.i01
Cortes RX, Rey S, Knaap E (2019) pysal/tobler: Tobler initial release. https://doi.org/10.5281/zenodo.3386577
Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, New York
de Valpine P, Turek D, Paciorek CJ, Anderson-Bergman C, Lang DT, Bodik R (2017) Programming with models: writing statistical algorithms for general model structures with NIMBLE. J Comput Graph Stat 26(2):403–413. https://doi.org/10.1080/10618600.2016.1172487
Depaoli S, Clifton JP, Cobb PR (2016) Just another Gibbs sampler (JAGS): flexible software for MCMC implementation. J Educ Behav Stat 41(6):628–649. https://doi.org/10.3102/1076998616664876
Eddelbuettel D (2013) Seamless R and C++ integration with Rcpp. Springer, Berlin
Eddelbuettel D, Sanderson C (2014) RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput Stat Data Anal 71:1054–1063. https://doi.org/10.1016/j.csda.2013.02.005
Eicher CL, Brewer CA (2001) Dasymetric mapping and areal interpolation: implementation and evaluation. Cartogr Geogr Inf Sci 28(2):125–138. https://doi.org/10.1559/152304001782173727
ESRI (1998) ESRI shapefile technical description. https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf. Accessed 27 May 2020
Fuentes M, Song H-R, Ghosh SK, Holland DM, Davis JM (2006) Spatial association between speciated fine particles and mortality. Biometrics 62(3):855–863. https://doi.org/10.1111/j.1541-0420.2006.00526.x
Gotway CA, Young LJ (2002) Combining incompatible spatial data. J Am Stat Assoc 97(458):632–648. https://doi.org/10.1198/016214502760047140
Higham NJ (1988) Computing a nearest symmetric positive semidefinite matrix. Linear Algebra Appl 103:103–118. https://doi.org/10.1016/0024-3795(88)90223-6
Lam NS-N (1983) Spatial interpolation methods: a review. Am Cartogr 10(2):129–150. https://doi.org/10.1559/152304083783914958
Lunn D, Spiegelhalter D, Thomas A, Best N (2009) The BUGS project: evolution, critique and future directions. Stat Med 28(25):3049–3067. https://doi.org/10.1002/sim.3680
Mileu N, Queirós M (2018) Development of a QGIS plugin to dasymetric mapping. In: Free and open source software for geospatial (FOSS4G) conference proceedings, vol 18, no 9. https://doi.org/10.7275/3628-0a51
National Academy of Sciences (2015) Realizing the potential of the American Community Survey: challenges, tradeoffs, and opportunities. National Academies Press, Washington, DC. https://doi.org/10.17226/21653
Nguyen H, Cressie N, Braverman A (2012) Spatial statistical data fusion for remote sensing applications. J Am Stat Assoc 107(499):1004–1018. https://doi.org/10.1080/01621459.2012.694717
Nychka D, Saltzman N (1998) Design of air quality monitoring networks. Lecture notes in statistics. Springer, pp 51–76. https://doi.org/10.1007/978-1-4612-2226-2_4
Nychka D, Furrer R, Paige J, Sain S (2017) Fields: tools for spatial data. University Corporation for Atmospheric Research, Boulder, CO, USA. https://github.com/NCAR/Fields. R package version 10.3. Accessed 27 May 2020
Ooms J (2014) The jsonlite package: a practical and consistent mapping between JSON data and R objects. arXiv:1403.2805
Pebesma E (2018) Simple features for R: standardized support for spatial vector data. R J 10(1):439–446. https://doi.org/10.32614/RJ-2018-009
Plummer M, Best N, Cowles K, Vines K (2006) CODA: convergence diagnosis and output analysis for MCMC. R News 6(1):7–11
Prener CG, Revord CK (2019) Areal: an R package for areal weighted interpolation. J Open Source Softw 4(37):1221. https://doi.org/10.21105/joss.01221
Qiu F, Zhang C, Zhou Y (2012) The development of an areal interpolation ArcGIS extension and a comparative study. GISci Remote Sens 49(5):644–663. https://doi.org/10.2747/1548-1603.49.5.644
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Raim AM, Holan SH, Bradley JR, Wikle CK (2017) A model selection study for spatio-temporal change of support. In: JSM Proceedings, government statistics section. American Statistical Association, Alexandria, pp 1524–1540
Rode M, Arhonditsis G, Balin D, Kebede T, Krysanova V, Van Griensven A, Van der Zee SE (2010) New challenges in integrated water quality modelling. Hydrol Process 24(24):3447–3461. https://doi.org/10.1002/hyp.7766
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A (2002) Bayesian measures of model complexity and fit. J R Stat Soc Ser B (Stat Methodol) 64(4):583–639. https://doi.org/10.1111/1467-9868.00353
Stan Development Team (2020) RStan: the R interface to Stan. http://mc-stan.org/. R package version 2.19.3. Accessed 27 May 2020
Tobler WR (1979) Smooth pycnophylactic interpolation for geographical regions. J Am Stat Assoc 74(367):519–530. https://doi.org/10.1080/01621459.1979.10481647
U.S. Census Bureau (2016) American Community Survey data suppression. https://www.census.gov/programs-surveys/acs/technical-documentation/data-suppression.html. Accessed 2 Sept 2019
U.S. Census Bureau (2018) Understanding and using American Community Survey data: What all data users need to know. https://www.census.gov/programs-surveys/acs/guidance/handbooks/general.html. Accessed 2 Sept 2019
Walker K (2018) tigris: Load Census TIGER/Line Shapefiles. https://CRAN.R-project.org/package=tigris. R package version 0.7. Accessed 27 May 2020
Waller LA, Gotway CA (2004) Applied spatial statistics for public health data. Wiley-Interscience, New York
Weinberg DH, Abowd JM, Belli RF, Cressie N, Folch DC, Holan SH, Levenstein MC, Olson KM, Reiter JP, Shapiro MD, Smyth JD, Soh L-K, Spencer BD, Spielman SE, Vilhuber L, Wikle CK (2018) Effects of a government-academic partnership: Has the NSF-Census Bureau Research Network helped improve the US statistical system? J Surv Stat Methodol. https://doi.org/10.1093/jssam/smy023
Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York
Wickham H, François R, Henry L, Müller K (2020) dplyr: a grammar of data manipulation. https://CRAN.R-project.org/package=dplyr. R package version 0.8.5. Accessed 27 May 2020
Wikle CK, Berliner LM (2005) Combining information across spatial scales. Technometrics 47(1):80–91. https://doi.org/10.1198/004017004000000572
Acknowledgements
This research was partially supported by the U.S. National Science Foundation (NSF) and the U.S. Census Bureau under NSF grant SES-1132031, funded through the NSF-Census Research Network (NCRN) program, and NSF Awards SES-1853096 and SES-1853099. This article is released to inform interested parties of ongoing research and to encourage discussion. The views expressed on statistical issues are those of the authors and not the NSF or U.S. Census Bureau. The authors thank Taylor Bowen and Toni Messina from the Office of Information Technology/GIS, City of Columbia, Missouri for supplying the shapefile used in the case study and for useful discussion.
Computational details and proofs
We will make use of the following well-known property in several places.
Property 1
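If \({\varvec{A}} \in \mathbb {R}^{m \times k}\), \({\varvec{B}} \in \mathbb {R}^{k \times l}\), and \({\varvec{C}} \in \mathbb {R}^{l \times n}\), then \(\text {vec}({\varvec{A}} {\varvec{B}} {\varvec{C}}) = ({\varvec{C}}^\top \otimes {\varvec{A}}) \text {vec}({\varvec{B}})\), where \(\text {vec}(\cdot )\) stacks the columns of its matrix argument into a single vector and \(\otimes \) denotes the Kronecker product.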
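Property 1 is easy to check numerically. The following is a small illustrative sketch in Python with NumPy (an illustrative choice; the article's own computations use R). Here \(\text {vec}(\cdot )\) is implemented as column-major flattening, and the matrix sizes and random seed are arbitrary.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(seed=1)
m, k, l, n = 3, 4, 2, 5
A = rng.normal(size=(m, k))
B = rng.normal(size=(k, l))
C = rng.normal(size=(l, n))

lhs = vec(A @ B @ C)            # vector of length m * n
rhs = np.kron(C.T, A) @ vec(B)  # (n*m x l*k) matrix times length-(k*l) vector
print(np.allclose(lhs, rhs))    # True
```

The `order="F"` argument matters: NumPy's default row-major flattening of a matrix gives the vec of its transpose, so the identity would no longer line up term by term.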
The following proposition gives the explicit solution to the minimization problem stated in (3.10). Bradley et al. (2015a) consider a similar problem featuring a more general objective function but assume that the columns of \({\varvec{S}}\) are orthonormal. Higham (1988) gives a general discussion of problems involving Frobenius and 2-norm distance minimization.
Proposition 1
(Frobenius Norm Minimization) Suppose \({\varvec{S}} \in \mathbb {R}^{n \times r}\) has rank r and \({\varvec{\varSigma }} \in \mathbb {R}^{n \times n}\) is positive definite. The minimizer \({\varvec{X}} \in \mathbb {R}^{r \times r}\) of \(\Vert {\varvec{\varSigma }} - {\varvec{S}} {\varvec{X}} {\varvec{S}}^\top \Vert _\text {F}\) is \({\varvec{X}} = ({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top {\varvec{\varSigma }} {\varvec{S}} ({\varvec{S}}^\top {\varvec{S}})^{-1}\).
Proof
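Using Property 1, we have
\[ \begin{aligned} \Vert {\varvec{\varSigma }} - {\varvec{S}} {\varvec{X}} {\varvec{S}}^\top \Vert _\text {F} &= \Vert \text {vec}({\varvec{\varSigma }}) - \text {vec}({\varvec{S}} {\varvec{X}} {\varvec{S}}^\top ) \Vert \\ &= \Vert \text {vec}({\varvec{\varSigma }}) - ({\varvec{S}} \otimes {\varvec{S}}) \text {vec}({\varvec{X}}) \Vert , \end{aligned} \tag{A.1} \]
where the norm on the last line is the usual 2-norm on \(\mathbb {R}^{n^2}\). Because \({\varvec{S}}\) has rank \(r\), the \(n^2 \times r^2\) matrix \({\varvec{S}} \otimes {\varvec{S}}\) has full column rank \(r^2\). We therefore recognize the expression in (A.1) as a standard least squares minimization whose unique solution is
\[ \begin{aligned} \text {vec}({\varvec{X}}) &= [({\varvec{S}} \otimes {\varvec{S}})^\top ({\varvec{S}} \otimes {\varvec{S}})]^{-1} ({\varvec{S}} \otimes {\varvec{S}})^\top \text {vec}({\varvec{\varSigma }}) \\ &= [({\varvec{S}}^\top {\varvec{S}})^{-1} \otimes ({\varvec{S}}^\top {\varvec{S}})^{-1}] ({\varvec{S}}^\top \otimes {\varvec{S}}^\top ) \text {vec}({\varvec{\varSigma }}) \\ &= [({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top \otimes ({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top ] \text {vec}({\varvec{\varSigma }}) \\ &= \text {vec}\big (({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top {\varvec{\varSigma }} {\varvec{S}} ({\varvec{S}}^\top {\varvec{S}})^{-1}\big ). \end{aligned} \]
The second and third equalities use the standard Kronecker product identities for transposes, products, and inverses; the last applies Property 1 with \(({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top \), \({\varvec{\varSigma }}\), and \({\varvec{S}} ({\varvec{S}}^\top {\varvec{S}})^{-1}\) in the roles of \({\varvec{A}}\), \({\varvec{B}}\), and \({\varvec{C}}\), using the symmetry of \({\varvec{S}}^\top {\varvec{S}}\).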
Therefore, the minimizer is \({\varvec{X}} = ({\varvec{S}}^\top {\varvec{S}})^{-1} {\varvec{S}}^\top {\varvec{\varSigma }} {\varvec{S}} ({\varvec{S}}^\top {\varvec{S}})^{-1}\), as desired. \(\square \)
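Proposition 1 can also be checked numerically. The sketch below solves the vectorized least squares problem (A.1) directly with NumPy and compares the result with the closed form. The function name `frobenius_minimizer` and the simulated \({\varvec{S}}\) and \({\varvec{\varSigma }}\) are hypothetical, chosen only for illustration.

```python
import numpy as np


def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")


def frobenius_minimizer(S, Sigma):
    """Closed form X = (S'S)^{-1} S' Sigma S (S'S)^{-1} from Proposition 1."""
    P = np.linalg.solve(S.T @ S, S.T)  # (S'S)^{-1} S', shape r x n
    return P @ Sigma @ P.T             # P.T equals S (S'S)^{-1}


rng = np.random.default_rng(seed=2)
n, r = 6, 3
S = rng.normal(size=(n, r))            # rank r with probability one
L = rng.normal(size=(n, n))
Sigma = L @ L.T + n * np.eye(n)        # positive definite

X = frobenius_minimizer(S, Sigma)

# Solve the vectorized problem (A.1) directly:
# minimize || vec(Sigma) - (S kron S) vec(X) ||_2 over vec(X).
x_ls, *_ = np.linalg.lstsq(np.kron(S, S), vec(Sigma), rcond=None)
X_ls = x_ls.reshape(r, r, order="F")   # undo vec()

print(np.allclose(X, X_ls))            # True
```

Using `np.linalg.solve` avoids forming \(({\varvec{S}}^\top {\varvec{S}})^{-1}\) explicitly. The Kronecker route is shown only as a check, since it builds an \(n^2 \times r^2\) matrix and does not scale to realistic \(n\).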
Remark 1
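(MLE Computation) To compute the MLE for the STCOS model, we first note that the likelihood, excluding the parameter model, satisfies
\[ L({\varvec{\mu }}_B, \sigma _K^2, \sigma _\xi ^2) \propto |{\varvec{\varDelta }}|^{-1/2} \exp \left\{ -\frac{1}{2} ({\varvec{Z}} - {\varvec{H}} {\varvec{\mu }}_B)^\top {\varvec{\varDelta }}^{-1} ({\varvec{Z}} - {\varvec{H}} {\varvec{\mu }}_B) \right\} ,\]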
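where \({\varvec{\varDelta }} = \sigma _\xi ^2 {\varvec{I}} + {\varvec{V}} + \sigma _K^2 {\varvec{S}} {\varvec{K}} {\varvec{S}}^\top \). Given \(\sigma _K^2\) and \(\sigma _\xi ^2\), the likelihood is maximized over \({\varvec{\mu }}_B\) by the weighted least squares estimator \(\hat{{\varvec{\mu }}}_B = ({\varvec{H}}^\top {\varvec{\varDelta }}^{-1} {\varvec{H}})^{-1} {\varvec{H}}^\top {\varvec{\varDelta }}^{-1} {\varvec{Z}}\). To estimate the unknown \(\sigma _K^2\) and \(\sigma _\xi ^2\), we carry out numerical maximization on the partially maximized log-likelihood, which, up to an additive constant, is
\[ \ell (\sigma _K^2, \sigma _\xi ^2) = -\frac{1}{2} \log |{\varvec{\varDelta }}| - \frac{1}{2} ({\varvec{Z}} - {\varvec{H}} \hat{{\varvec{\mu }}}_B)^\top {\varvec{\varDelta }}^{-1} ({\varvec{Z}} - {\varvec{H}} \hat{{\varvec{\mu }}}_B), \]
where both \({\varvec{\varDelta }}\) and \(\hat{{\varvec{\mu }}}_B\) depend on \((\sigma _K^2, \sigma _\xi ^2)\).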
To enforce the constraints that \(\sigma _K^2 > 0\) and \(\sigma _\xi ^2 > 0\), we optimize over \((\vartheta _1, \vartheta _2) \in \mathbb {R}^2\) and take \(\sigma _K^2 = \exp (\vartheta _1)\), \(\sigma _\xi ^2 = \exp (\vartheta _2)\).
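The computation in Remark 1 can be sketched as follows. This is a NumPy illustration under stated assumptions, not the stcos implementation: the inputs are simulated, \({\varvec{V}}\) is assumed to be a known diagonal matrix, the helper names `profile_loglik` and `fit_mle_grid` are hypothetical, and a crude grid search over \((\vartheta _1, \vartheta _2)\) stands in for a general-purpose optimizer. The weighted least squares step is computed by whitening with the Cholesky factor of \({\varvec{\varDelta }}\) rather than forming \({\varvec{\varDelta }}^{-1}\).

```python
import numpy as np


def profile_loglik(theta, Z, H, V, S, K):
    """Partially maximized log-likelihood (up to an additive constant) at
    sig2_K = exp(theta[0]), sig2_xi = exp(theta[1]); also returns mu_hat."""
    sig2_K, sig2_xi = np.exp(theta)              # always positive
    n = Z.shape[0]
    Delta = sig2_xi * np.eye(n) + V + sig2_K * S @ K @ S.T
    Lc = np.linalg.cholesky(Delta)               # Delta = Lc @ Lc.T
    Hw = np.linalg.solve(Lc, H)                  # whitened design
    Zw = np.linalg.solve(Lc, Z)                  # whitened response
    # OLS on whitened data equals (H' Delta^-1 H)^-1 H' Delta^-1 Z
    mu_hat, *_ = np.linalg.lstsq(Hw, Zw, rcond=None)
    resid = Zw - Hw @ mu_hat                     # Lc^-1 (Z - H mu_hat)
    logdet = 2.0 * np.sum(np.log(np.diag(Lc)))   # log |Delta|
    return -0.5 * logdet - 0.5 * resid @ resid, mu_hat


def fit_mle_grid(Z, H, V, S, K, grid=np.linspace(-5.0, 3.0, 41)):
    """Crude grid search over unconstrained (theta1, theta2) in R^2."""
    best_val, best_theta = -np.inf, None
    for t1 in grid:
        for t2 in grid:
            val, _ = profile_loglik(np.array([t1, t2]), Z, H, V, S, K)
            if val > best_val:
                best_val, best_theta = val, np.array([t1, t2])
    _, mu_hat = profile_loglik(best_theta, Z, H, V, S, K)
    sig2_K, sig2_xi = np.exp(best_theta)
    return sig2_K, sig2_xi, mu_hat, best_val


# Hypothetical simulated inputs: n observations, p columns in H, r basis functions.
rng = np.random.default_rng(seed=3)
n, p, r = 40, 3, 5
H = rng.normal(size=(n, p))
S = rng.normal(size=(n, r))
K = np.eye(r)
V = np.diag(rng.uniform(0.1, 0.5, size=n))       # assumed known variances
Delta_true = 0.5 * np.eye(n) + V + 2.0 * S @ K @ S.T
Z = H @ np.array([1.0, -2.0, 0.5]) + np.linalg.cholesky(Delta_true) @ rng.normal(size=n)

sig2_K, sig2_xi, mu_hat, ll = fit_mle_grid(Z, H, V, S, K)
print(sig2_K > 0 and sig2_xi > 0)   # True: exp() keeps both variances positive
```

In practice the negated profile log-likelihood would be passed to an unconstrained routine such as R's `optim` (for example with Nelder-Mead or BFGS) instead of a grid; the exponential reparameterization is what allows such an unconstrained routine to be used.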
Cite this article
Raim, A.M., Holan, S.H., Bradley, J.R. et al. Spatio-temporal change of support modeling with R. Comput Stat 36, 749–780 (2021). https://doi.org/10.1007/s00180-020-01029-4