Elsevier

Spatial and Spatio-temporal Epidemiology

Volume 1, Issue 1, October–December 2009, Pages 49-60
A Bayesian Maximum Entropy approach to address the change of support problem in the spatial analysis of childhood asthma prevalence across North Carolina

https://doi.org/10.1016/j.sste.2009.07.005

Abstract

The spatial analysis of data observed at different spatial observation scales leads to the change of support problem (COSP). A solution to the COSP widely used in linear spatial statistics consists in explicitly modeling the spatial autocorrelation of the variable observed at different spatial scales. We present a novel approach that takes advantage of the nonlinear Bayesian Maximum Entropy (BME) extension of linear spatial statistics to address the COSP directly without relying on the classical linear approach. Our procedure consists in modeling data observed over large areas as soft data for the process at the local scale. We demonstrate the application of our approach to obtain spatially detailed maps of childhood asthma prevalence across North Carolina (NC). Because of the high prevalence of childhood asthma in NC, the small number problem is not an issue, so we can focus our attention solely on the COSP of integrating prevalence data observed at the county level together with data observed at a targeted local scale equivalent to the scale of school districts. Our spatially detailed maps can be used for different applications ranging from exploratory and hypothesis-generating analyses to targeting intervention and exposure mitigation efforts.

Introduction

Asthma, one of the most common chronic childhood diseases (Gergen et al., 1988), is an inflammatory disease characterized by symptoms that include wheezing, coughing, breathlessness, and chest tightness. Approximately 8.9% of all children (6.5 million) in the United States suffer from current asthma symptoms (NCHS, 2006), reflecting its ubiquity in affluent societies (Strachan, 1999). Estimated total costs (direct and indirect) of treating asthma range up to $12.6 billion USD per year (Weiss et al., 2004). The causes of this costly chronic disease are still unknown; however, air pollution exposures (such as PM10, O3, SO2, and NO2) are suspects and have been extensively investigated (US EPA, 1996, US EPA, 2005, Gehring et al., 2002, Mortimer et al., 2002, Lewis et al., 2005).

While air pollutants have clearly been associated with exacerbations of asthma (including increased symptoms, emergency room visits, hospitalizations, and medication use), the association of air pollutants and increased asthma incidence is less clear. McConnell et al. (2002) have shown an association between asthma incidence and children exercising in high ozone areas, though conclusions were limited due to small sample sizes. Investigating the association between traffic-related air pollutants and incidence of children’s asthmatic symptoms, Zmirou et al. (2004) suggest that air pollution may be a potential contributor to increasing asthma prevalence in children, while other relevant environmental risk factors, such as exposure to traffic-related air pollution near the home, have recently been investigated (Delfino et al., 2009).

Asthma maps at fine spatial resolution provide invaluable information that allows epidemiologists to better understand risk factors, such as air pollutants, that may cause asthma and help identify susceptible subpopulations, such as the very young, the very old, and individuals with particular pre-existing health conditions and/or with specific smoking behavior and socio-economic characteristics. Additionally, more spatially detailed asthma maps are helpful for public health intervention by not only identifying areas of high prevalence and targeting health clinical facilities for susceptible populations, but also identifying areas in which to focus efforts on abating suspected causal agents.

Geostatistics provide epidemiologists with an essential spatial estimation tool that takes into account the important spatial variability of asthma prevalence. The maps produced provide a visualization of disease prevalence that is extremely useful for health research. However, few studies on mapping asthma have been found, and existing works are often limited to an exploratory visualization of existing asthma prevalence data obtained at a single observation scale (e.g. Hernandez et al., 2000, Oyana et al., 2004).

Numerous data sources provide asthma prevalence data that can be used in mapping analysis. The asthma data can be collected in a number of ways, including random telephone surveys, questionnaire-based surveys, hospital discharge records, and Medicaid claims. However, what is notable is the spatial aggregation scale, or observation scale, at which the data are reported, which may vary considerably from one data source to another.

One important reason for the difference in observation scale between data sources is that some data sources may have confidentiality requirements that only allow them to release data aggregated over a large spatial scale (e.g. county level) to protect the privacy of the individuals who provided their health information. For example, the childhood asthma Medicaid claims data analyzed by Buescher and Jones-Vessey (1999) are aggregated at the county level, which is a large spatial observation scale providing strong protection of individual privacy and preventing deductive disclosure. Claims data are cost effective as they are derived from a health system that is already in place. However, it is not clear how well Medicaid claims data estimate asthma prevalence at a fine spatial scale. A second source of data that we used for this study is the cross-sectional asthma prevalence data obtained from a school-based asthma surveillance project, the North Carolina School Asthma Survey, or NCSAS (Yeatts et al., 2003). This project generated high quality asthma prevalence data at a fine spatial resolution. The NCSAS database provides good quality asthma prevalence estimates for the majority of middle schools in North Carolina, which corresponds to an observation scale that is much finer than that of the Medicaid data reported at the county level.

Our goal for this research is to perform an accurate mapping analysis of asthma symptom prevalence that rigorously accounts for the high natural variability of asthma prevalence across space, while efficiently integrating data collected at different observation scales. Integrating large scale data to obtain good estimates of asthma prevalence at a fine spatial resolution would lead to some substantial cost savings in North Carolina because it will enable the state health department to efficiently use data from existing systems such as Medicaid, which would reduce the need to conduct additional costly active asthma surveillance.

Gotway and Young (2002) provide an excellent review of statistical methods that address the issue of combining data obtained at different observation scales. A conceptual approach to this problem is to model observations at different observation scales as the spatial average of some fine scale process over the observation areas (i.e. the support of the observation), which is referred to as the change of support problem (COSP). Many of the methods addressing the COSP rely on modeling the spatial autocorrelation of the fine scale process observed at different spatial scales of interest. The procedure consists in averaging the fine scale process covariance to obtain the point-to-area or area-to-area covariance for areas (or observation scales) of any size (Journel and Huijbregts, 1978, Gotway and Young, 2002, Goovaerts, 2006, Banerjee et al., 2004). A classical solution for the prediction problem then consists in using the point-to-area and area-to-area covariances in a linear statistical estimator (e.g. block kriging). However, an implication of this approach is that we consider only estimators that are a linear combination of the process observed at different scales. On the other hand, the powerful Bayesian Maximum Entropy (BME) method of modern spatiotemporal geostatistics (Christakos, 2000) provides a nonlinear non-Gaussian extension to classical linear geostatistics that is not limited by this linear constraint. The goal of this paper is to present a novel approach to deal with the COSP using BME, which provides a framework for the nonlinear integration of data obtained at different observation scales. In the following sections we present this framework, and we apply it to the problem of mapping childhood asthma across North Carolina using prevalence data aggregated over large areas (counties) together with data obtained at the fine scale of interest (school districts).
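
The covariance-averaging step of the classical linear approach can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exponential covariance model, its sill and range, the square "county" supports, and the function names (exp_cov, grid_points, area_to_area_cov, point_to_area_cov) are all assumptions chosen for illustration. The integral over each support is approximated by an average over a regular grid of discretization points.

```python
import math

def exp_cov(h, sill=1.0, corr_range=50.0):
    """Exponential point covariance (assumed model; sill and range are illustrative)."""
    return sill * math.exp(-3.0 * h / corr_range)

def grid_points(xmin, ymin, xmax, ymax, n=10):
    """Discretize a rectangular support into n*n cell-centre points."""
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    return [(xmin + (i + 0.5) * dx, ymin + (j + 0.5) * dy)
            for i in range(n) for j in range(n)]

def area_to_area_cov(pts_a, pts_b, cov=exp_cov):
    """Average the point covariance over all pairs of discretization points."""
    total = 0.0
    for xa, ya in pts_a:
        for xb, yb in pts_b:
            total += cov(math.hypot(xa - xb, ya - yb))
    return total / (len(pts_a) * len(pts_b))

def point_to_area_cov(point, pts_a, cov=exp_cov):
    """Point-to-area covariance: a point is a support with a single location."""
    return area_to_area_cov([point], pts_a, cov)

# Hypothetical supports: two adjacent 40 km squares and one school location.
county = grid_points(0.0, 0.0, 40.0, 40.0)
neighbour = grid_points(40.0, 0.0, 80.0, 40.0)
school = (20.0, 20.0)

c_aa = area_to_area_cov(county, county)      # variance of the county average
c_ab = area_to_area_cov(county, neighbour)   # covariance between county averages
c_pa = point_to_area_cov(school, county)     # covariance of a point with its county
print(c_aa, c_ab, c_pa)
```

Averaging lowers the variance of the areal value below the point sill, which is why data at different supports cannot be treated as interchangeable observations of the same variable. A block kriging system would be assembled from covariances of this kind.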

One issue that we face when mapping rare diseases is the small number problem, which leads to noisy spatial distribution of observed rates that may require spatial smoothing. Let yi be the number of positive cases observed for some area i out of ni persons at risk. The spatial variation of the rate xi = yi/ni tends to be dominated for rare diseases by very high or low values observed where the denominator ni is small, because a small change in the numerator leads to a large change of the rate, resulting in the noisy spatial distribution mentioned above. The small number problem has been widely discussed and many approaches can deal with it. A classical approach to address this problem is to assume that the disease count Yi is Poisson distributed with a mean proportional to some measure of disease risk Ri, i.e. Yi ~ Poisson(EiRi), where Ei may be the expected number of cases or the population at risk. The approach then consists in obtaining estimates of the disease risk Ri, using, for example, a Bayesian framework (Besag et al., 1991, Lawson et al., 2003, Zhu et al., 2000, Kelsall and Wakefield, 2002, Gotway and Young, 2002, Banerjee et al., 2004, Diggle and Ribeiro, 2007), while the more recent Poisson kriging method (Goovaerts, 2006, Goovaerts and Gebreab, 2008) might provide an attractive and computationally efficient alternative. These approaches basically consist in obtaining maps of disease risk that smooth out the noise arising from the small number problem for rare diseases. For example, Goovaerts and Gebreab (2008) used Poisson kriging to obtain smooth maps of the risk for cervix cancer amongst white women in Indiana, where the population weighted mortality rate is only 2.85 per 100,000 person-years. By comparison, the population weighted prevalence of wheezing symptoms amongst North Carolina school children is 26,000 per 100,000 children, which is drastically greater than that of a rare disease. As a result, the small number problem is not an issue that we have addressed in this work. By choosing to model the observed rate X rather than the disease risk R we are able to focus our attention solely on the COSP. This allows us to focus on the novel BME solution to the COSP presented in this work, which can then be extended in future works to methods dealing with the small number problem.
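
A rough back-of-the-envelope check of this argument is sketched below in Python. It assumes the Poisson model quoted above with Ei equal to the population at risk ni, so that the rate yi/ni has mean Ri and standard deviation sqrt(Ri/ni), giving a coefficient of variation of 1/sqrt(ni Ri). The denominator of 500 persons is hypothetical, the per-person-year mortality rate is treated as a per-person risk for simplicity, and the Poisson model is only a crude approximation at a prevalence as high as 26%.

```python
import math

def rate_cv(risk, n_at_risk):
    """Coefficient of variation of the rate y/n when y ~ Poisson(n * risk)."""
    expected_cases = n_at_risk * risk
    return 1.0 / math.sqrt(expected_cases)

N_AT_RISK = 500                   # hypothetical denominator for one small area
RARE_RISK = 2.85 / 100_000        # cervix cancer mortality rate quoted in the text
WHEEZE_RISK = 26_000 / 100_000    # wheezing prevalence quoted in the text

cv_rare = rate_cv(RARE_RISK, N_AT_RISK)
cv_common = rate_cv(WHEEZE_RISK, N_AT_RISK)
print(f"rare disease: relative noise of the rate = {cv_rare:.2f}")
print(f"wheezing:     relative noise of the rate = {cv_common:.2f}")
```

Under these assumptions the sampling noise of the rare-disease rate is several times larger than the rate itself, while for wheezing it is below a tenth of the rate, which is consistent with modeling the observed rate directly.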

Notation

Let R denote the set of real numbers, s ∈ R² be a point in space, and X(s) be a spatial random field (SRF) representing the spatial distribution of disease prevalence. Let Xmap = [X1, X2, …, Xn] be a vector of random variables representing the SRF at points smap = [s1, s2, …, sn], i.e. X1 = X(s1), …, Xn = X(sn). In this article the upper case Xmap represents random variables and its lower case xmap = [x1, x2, …, xn] an observed sample (realization). The mean trend and covariance functions of the SRF X(s) are

Data

We have obtained two datasets with data on childhood asthma prevalence across North Carolina during the same time period of 1997–1999. The data with finer resolution are from a state-wide public middle school asthma survey collected in the 1999–2000 school year (Yeatts et al., 2003), while the second dataset is asthma Medicaid claims from 1997 to 1998 (Buescher and Jones-Vessey, 1999).

Trends and variability in childhood asthma prevalence

The SRF X(s) represents the spatial distribution of childhood asthma prevalence at the local scale. Its mean trend function provides a model for the systematic trends and consistent spatial structures of X(s), while its covariance function describes the inherent spatial variability of X(s).

We obtain the local scale mean trend function using a moving window average of the NCSAS data xhard with an exponentially decaying filter. This leads to the mean trend function shown in Fig. 2(a).
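
A moving window average with exponentially decaying weights can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the filter used in the paper: the function name exp_filter_mean, the decay length, the window radius, and the three school locations and prevalence values are all hypothetical, since this excerpt does not give the filter parameters.

```python
import math

def exp_filter_mean(s, sites, values, decay_km=30.0, radius_km=100.0):
    """Weighted mean of the values within radius_km of s, with weights exp(-d / decay_km)."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(sites, values):
        d = math.hypot(s[0] - x, s[1] - y)
        if d <= radius_km:            # moving window: ignore distant sites
            w = math.exp(-d / decay_km)
            num += w * v
            den += w
    # No site inside the window: the trend is undefined at s.
    return num / den if den > 0.0 else float("nan")

# Made-up school locations (km) and observed prevalence values.
sites = [(0.0, 0.0), (10.0, 0.0), (50.0, 0.0)]
xhard = [0.20, 0.30, 0.40]

trend_at_first_school = exp_filter_mean((0.0, 0.0), sites, xhard)
print(trend_at_first_school)
```

Evaluating the function on a grid of points s would give a smooth surface that follows broad regional patterns while damping school-to-school fluctuations. A shorter decay length tracks the data more closely, and a longer one smooths more strongly.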

Discussion

Mapping childhood asthma prevalence (as well as other diseases) is complicated by the fact that data are often available at a variety of spatial scales. This is particularly the case because several data sources have confidentiality requirements that only allow release of information aggregated over spatial scales that are sufficiently large to ensure the privacy of the individuals who provided their health information.

We develop a mathematical framework to map the spatial distribution of

Acknowledgment

This work was supported by grants from the National Institute of Environmental Health Sciences (Grant Nos. 5 P42 ES05948 and P30ES10126).

References (30)

  • Environmental Protection Agency (US EPA). Air quality criteria for ozone and related photochemical oxidants (second...
  • U. Gehring et al. Traffic-related air pollution and respiratory health during the first 2 yrs of life. Eur Respir J (2002)
  • P.J. Gergen et al. National survey of prevalence of asthma among children in the United States, 1976 to 1980. Pediatrics (1988)
  • P. Goovaerts. Geostatistical analysis of disease data: accounting for spatial support and population density in the isopleth mapping of cancer mortality risk using area-to-point Poisson kriging. Int J Health Geograph (2006)
  • P. Goovaerts et al. How does Poisson kriging compare to the popular BYM model for mapping disease risks? Int J Health Geograph (2008)