doi:10.1016/j.csda.2006.11.008    

Copyright © 2006 Elsevier B.V. All rights reserved.

Computational techniques for spatial logistic regression with large data sets


Christopher J. Paciorek (corresponding author)

Harvard School of Public Health, 655 Huntington Avenue, Boston, MA 02115, USA


Received 6 October 2005; revised 29 August 2006; accepted 7 November 2006. Available online 1 December 2006.

Abstract

In epidemiological research, outcomes are frequently non-normal, sample sizes may be large, and effect sizes are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. I focus on binary outcomes, with the risk surface a smooth function of space, but the development herein is relevant for non-normal data in general. I compare penalized likelihood (PL) models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.

A Bayesian model using a spectral basis (SB) representation of the spatial surface via the Fourier basis provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being reasonably computationally efficient. One of the contributions of this work is further development of this underused representation. The SB model outperforms the PL methods, which are prone to overfitting, but is slower to fit and not as easily implemented. A Bayesian Markov random field model performs less well statistically than the SB model, but is very computationally efficient. We illustrate the methods on a real data set of cancer cases in Taiwan.

The success of the SB with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.

Keywords: Bayesian statistics; Disease mapping; Fourier basis; Generalized linear mixed model; Geostatistics; Risk surface; Spatial statistics; Spectral basis
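
As a rough illustration of the spectral basis idea described in the abstract, the following Python sketch builds a smooth surface on a regular grid from Fourier basis coefficients and uses it as the logit of a binary risk surface. It is not the paper's implementation: the function names, the grid size, the Matérn-like form of the spectral density, and the parameter values (rho, nu, intercept) are all assumptions chosen for illustration, and the sketch only simulates from such a model rather than fitting it.

```python
import numpy as np


def spectral_surface(m=32, rho=0.2, nu=2.0, seed=0):
    """Draw a smooth random surface on an m x m grid via the Fourier basis.

    Hypothetical helper: rho (range) and nu (smoothness) parameterize an
    assumed Matern-like spectral density, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Integer frequencies in numpy's FFT ordering (0, 1, ..., -2, -1).
    freqs = np.fft.fftfreq(m, d=1.0 / m)
    w1, w2 = np.meshgrid(freqs, freqs, indexing="ij")
    # Assumed spectral density: coefficient variance decays with frequency,
    # so high-frequency (rough) basis functions contribute little.
    density = (1.0 + (np.pi * rho) ** 2 * (w1 ** 2 + w2 ** 2)) ** (-(nu + 1.0))
    # Fourier coefficients of real white noise, scaled by sqrt(density).
    # This gives independent coefficients with variance proportional to
    # the density while keeping the symmetry needed for a real surface.
    noise = rng.standard_normal((m, m))
    coef = np.fft.fft2(noise) * np.sqrt(density)
    # The imaginary part is rounding error only, so it is dropped.
    surface = np.real(np.fft.ifft2(coef))
    # Rescale to unit standard deviation for easier interpretation.
    return surface / surface.std()


def simulate_binary(surface, intercept=-1.0, seed=0):
    """Map the surface through the inverse logit and draw binary outcomes."""
    rng = np.random.default_rng(seed)
    risk = 1.0 / (1.0 + np.exp(-(intercept + surface)))
    y = rng.binomial(1, risk)
    return risk, y


surface = spectral_surface(m=32, seed=1)
risk, y = simulate_binary(surface, intercept=-1.0, seed=2)
print(surface.shape, float(risk.min()), float(risk.max()), int(y.sum()))
```

Working on a grid keeps the cost of moving between coefficients and the surface at that of one fast Fourier transform, which is the usual motivation for this kind of representation; observations at arbitrary locations would additionally need to be mapped to grid cells, which this sketch does not attempt.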

Article Outline

1. Introduction
2. Overview of methods
2.1. PL-based methods
2.1.1. PL and GLMMs
2.1.2. PL and generalized cross-validation
2.2. Bayesian methods
2.2.1. Bayesian GLMMs
2.2.2. Bayesian SB representation
2.2.3. Bayesian NN
2.2.4. Bayesian MRFs
2.3. Other methods
3. Implementation
3.1. Penalized likelihood and GLMMs (PL–PQL)
3.2. Penalized likelihood and GCV (PL–GCV)
3.3. Bayesian GLMM (Geo)
3.4. Bayesian spectral basis (SB)
3.5. Bayesian neural network (NN)
3.6. Bayesian Markov random field (MRF)
4. Simulations
4.1. Data sets
4.2. Assessment
4.3. Results
4.3.1. Quality of fit
4.3.2. Computational speed and MCMC performance
4.3.3. Ease of implementation
5. Case study
5.1. Background and data
5.2. Assessment
5.3. Results
6. Discussion
Acknowledgements
Appendix A. Representing functions in the spectral domain
Appendix B. Supplementary material
References





Corresponding author. Tel.: +1 617 4324912; fax: +1 617 4325619.

 