Abstract
This paper operationalizes the idea of a local indicator of spatial association for the situation where the variables of interest are binary. This yields a conditional version of a local join count statistic. The statistic is extended to a bivariate and multivariate context, with an explicit treatment of co-location. The approach provides an alternative to point pattern-based statistics for situations where all potential locations of an event are available (e.g., all parcels in a city). The statistics are implemented in the open-source GeoDa software and yield maps of local clusters of binary variables, as well as co-location clusters of two (or more) binary variables. Empirical illustrations investigate local clusters of house sales in Detroit in 2013 and 2014, and urban design characteristics of Chicago census blocks in 2017.











Similar content being viewed by others
Explore related subjects
Discover the latest articles and news from researchers in related subjects, suggested using machine learning.Notes
For a general discussion of spatial weights, see, for example, Bavaud (1998), Getis (2009), and Anselin and Rey (2014). Social interaction and social network extensions can be found in Dow et al. (1982), Akerlof (1997), Leenders (2002), Páez et al. (2008), and Papachristos and Bartomski (2018), among others.
In some rare examples, data on the complete population are available, and a case-control design becomes equivalent to a lattice data setting. However, in a typical case-control setup, the controls are a sample, and thus, not all non-event locations are included.
Rogerson (2006) also includes a local form of the statistic, which counts the number of cases among the neighbors for a given location. Except for the case-control setup, this is formally equivalent to the local join count statistic described below.
This is formally the same as the Jacquez et al. (2005) local Q statistic for location i at time t with k-nearest neighbors, i.e., \(Q_{i,k,t} = c_i \sum _j n_{ijkt} c_j\), where \(c_{i,j} = 1\) for a case and \(= 0\) for a control, and \(n_{ijkt}\) are the nearest neighbor weights for k-nearest neighbors of location i at time t. It is also essentially the same as the local similarity relation in Farber et al. (2015), i.e., \(\Gamma _{d,i} = \sum _j I_{ij}\), where \(I_{ij} = 1\) when the values at i and j are “similar” for d nearest neighbors. In contrast to these measures, which are based on nearest neighbor relations, the local join count statistic is couched in a lattice data structure with spatial weights. Formally, the expressions are the same, but conceptually, they differ.
Yet a different strand of local cluster statistics is based on the scan-statistic logic first outlined in Kulldorff (1997), and its many extensions. However, since this approach does not provide a link between a local and global statistic—a fundamental property of a LISA statistic as outlined in Anselin (1995)—it is not further considered here.
Note that this is a conditional probability. It thus underestimates the actual uncertainty associated with the occurrence of a value of 1 and its particular configuration of neighbors. The unconditional probability would be the joint probability of observing \(x_i = 1\)andp neighbors \(x_j = 1\). This not what is considered here.
In larger samples, the distinction between using \(N-1\) and \(P-1\) compared to N and P is likely negligible. Also, the distinction between sampling without replacement (the hypergeometric distribution) and sampling with replacement (the binomial distribution) is likely to be small for large data sets with few events.
This is the logic behind the local z statistic for the case-control setting suggested in Rogerson (2006).
In the limit, the neighbors would include all other observations.
Note how a case-control setup can be couched in these terms, since a case and a control cannot occur at the same location. For example, \(x_i = 1\) for a case and \(z_j = 1\) for a control. The BJC statistic would then count the number of controls among the neighbors of i, or, with the roles reversed, the number of cases around a control a i.
Since the conditional permutation is designed to draw tuples of existing pairs of x and z, the procedure respects the in-place association between x and z.
Formally, we could also consider the situation where \(x_i = z_i = 1\) is surrounded by either \(z_j = 1\) or \(x_j = 1\), ignoring the value for the other variable. However, we see little practical application where there is a meaningful interpretation for this situation, and we do not consider it further.
Repeat sales were removed from the data set (only the latest sale is recorded), so that there is no overlap between the two point patterns.
Note that not all sales are standard transactions and many are the result of auctions, resulting in arbitrary sales prices, typically less than $1000. We ignore the actual sales value in our analysis, but keep all transactions in the data set.
Because of the resolution of the map, it is not possible to distinguish all individual points, since several pertain to close-by locations that tend to be plotted on top of each other.
Recall that by construction, none of the points overlap between the two years.
Again, due to the scale of the map, the figure only shows eight points. In three cases, two adjoining locations are found that cannot be individually distinguished in the map.
The classification is derived from an extensive set of data, most notably the City of Chicago Business Licenses data for 2017. Most data are for 2017, a few are for 2016, and the sidewalk data are for 2012. The census block definition is from 2010. Details can be found in Talen and Jeong (2018, Table 1).
Note that the highlighted blocks form the core of the cluster, but do not include the neighbors that also may show co-location. In this example, several blocks are neighbors as well, but this is not always the case. In other words, the highlighted blocks underestimate the spatial extent of the actual cluster.
References
Akerlof GA (1997) Social distance and social decisions. Econometrica 65:1005–1027
Anselin L (1995) Local indicators of spatial association—LISA. Geogr Anal 27:93–115
Anselin L (1996) The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer M, Scholten H, Unwin D (eds) Spatial analytical perspectives on GIS in environmental and socio-economic sciences. Taylor and Francis, London, pp 111–125
Anselin L (2019) A local indicator of multivariate spatial association: extending Geary’s c. Geogr Anal 51:133–150. https://doi.org/10.1111/gean.12164
Anselin L, Rey SJ (2014) Modern spatial econometrics in practice, a guide to GeoDa, GeoDaSpace and PySAL. GeoDa Press, Chicago
Anselin L, Syabri I, Smirnov O (2002) Visualizing multivariate spatial correlation with dynamically linked windows. In: Anselin L, Rey S (eds) New tools for spatial data analysis: proceedings of the specialist meeting. Center for Spatially Integrated Social Science (CSISS), University of California, Santa Barbara. CD-ROM
Bavaud F (1998) Models for spatial weights: a systematic look. Geogr Anal 30:153–171
Boots B (2003) Developing local measures of spatial association for categorical data. J Geogr Syst 5:139–160
Boots B (2006) Local configuration measures for categorical spatial data: binary regular lattices. J Geogr Syst 8:1–24
Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London
Congdon P (2016) A local join counts methodology for spatial clustering in disease from relative risk models. Commun Stat Theory Methods 45:3059–3075
Cromley RG, Hanink DM, Bentley GC (2014) Geographically weighted colocation quotients: specification and application. Prof Geogr 66:138–148
Cuzick J, Edwards R (1990) Spatial clustering for inhomogeneous populations. J R Soc B 52:73–104
de Castro MC, Singer BH (2006) Controlling the false discovery rate: an application to account for multiple and dependent tests in local statistics of spatial association. Geogr Anal 38:180–208
Dow MM, Burton ML, White DR (1982) Network autocorrelation: a simulation study of a foundational problem in regression and survey research. Soc Netw 4:169–200
Efron B, Hastie T (2016) Computer age statistical inference, algorithms, evidence, and data science. Cambridge University Press, Cambridge
Farber S, Martin MR, Páez A (2015) Testing for spatial independence using similarity relations. Geogr Anal 47:97–120
Getis A (1984) Interaction modeling using second-order analysis. Environ Plan A 16:173–183
Getis A (2009) Spatial weights matrices. Geogr Anal 41:404–410
Getis A, Franklin J (1987) Second-order neighborhood analysis of mapped point patterns. Ecology 68:473–477
Getis A, Ord JK (1992) The analysis of spatial association by use of distance statistics. Geogr Anal 24:189–206
Getis A, Ord JK (1996) Local spatial statistics: an overview. In: Longley P, Batty M (eds) Spatial analysis: modeling in a GIS environment. GeoInformation International, pp 261–277
Huang Y, Shekhar S, Xiong H (2004) Discovering colocation patterns from spatial data sets: a general approach. IEEE Trans Knowl Data Eng 16:1472–1485
Hubert LJ, Golledge R, Costanzo CM (1981) Generalized procedures for evaluating spatial autocorrelation. Geogr Anal 13:224–233
Jacquez GM, Kaufmann A, Meliker J, Goovaerts P, AvRuskin G, Nriagu J (2005) Global, local and focused geographic clustering for case-control data with residential histories. Environ Health 4:4
Jacquez GM, Meliker JR, AvRuskin GA, Goovaerts P, Kaufmann A, Wilson ML, Nriagu J (2006) Case-control geographic clustering for residential histories accounting for risk factors and covariates. Int J Health Geogr 5:32
Jirjies S, Wallstrom G, Halden RU, Scotch M (2016) pyJacqQ: python implementation of Jacquez’s Q-statistics for space-time clustering of disease exposure in case-control studies. J Stat Softw. https://doi.org/10.18637/jss.v074.i06
Kulldorff M (1997) A spatial scan statistic. Commun Stat Theory Methods 26:1481–1496
Lee S-I (2001) Developing a bivariate spatial association measure: an integration of Pearson’s r and Moran’s I. J Geogr Syst 3:369–385
Leenders RTAJ (2002) Modeling social influence through network autocorrelation: constructing the weights matrix. Soc Netw 24:21–47
Leslie TF, Kronenfeld BJ (2011) The colocation quotient: a new measure of spatial association between categorical subsets of points. Geogr Anal 43:306–326
Leslie TF, Frankenfeld CL, Makara MA (2012) The spatial food environment of the DC metropolitan area: clustering, co-location, and categorical differentiation. Appl Geogr 35:300–307
Long JA, Nelson TA, Wulder MA (2010) Local indicators for categorical data: impacts of scaling decisions. Can Geogr/Le Géographe Canadien 54:15–28
López F, Matilla-García M, Mur J, Marín MR (2010) A non-parametric spatial independence test using symbolic entropy. Reg Sci Urban Econ 40:106–115
Mack EA, Credit K, Suandi M (2017) A comparative analysis of firm co-location behavior in the Detroit metropolitan area. Ind Innov 25:264
Moran PA (1948) The interpretation of statistical maps. Biometrika 35:255–260
Okabe A, Boots B, Sato T (2010) A class of local and global K functions and their exact statistical properties. In: Anselin L, Rey SJ (eds) Perspectives on spatial data analysis. Springer, Berlin, pp 101–112
Ord JK, Getis A (1995) Local spatial autocorrelation statistics: distributional issues and an application. Geogr Anal 27:286–306
Ord JK, Getis A (2001) Testing for local spatial autocorrelation in the presence of global autocorrelation. J Reg Sci 41:411–432
Páez A, Scott DM, Volz E (2008) Weight matrices for social influence analysis: an investigation of measurement errors and their effect on model identification and estimation quality. Soc Netw 30:309–317
Papachristos AV, Bartomski S (2018) Connected in crime: the enduring effect of neighborhood networks on the spatial patterning of violence. Am J Sociol 124:517–568
Ripley BD (1981) Spatial statistics. Wiley, New York
Rogerson PA (2006) Statistical methods for the detection of spatial clustering in case-control data. Stat Med 25:811–823
Rogerson PA (2015) Maximum Getis-Ord statistic adjusted for spatially autocorrelated data. Geogr Anal 47:20–33
Ruiz M, López F, Páez A (2010) Testing for spatial association of qualitative data using symbolic dynamics. J Geogr Syst 12:281–309
Talen E, Jeong H (2018) Does the classic American main street still exist? An exploratory look. J Urban Des. https://doi.org/10.1080/13574809.2018.1436962
Wang F, Hu Y, Wang S, Li X (2017) Local indicator of colocation quotient with a statistical significance test: examining spatial association of crime and facilities. Prof Geogr 69:22–31
Acknowledgements
This research was funded in part by Award 1R01HS021752-01A1 from the Agency for Healthcare Research and Quality (AHRQ), “Advancing spatial evaluation methods to improve healthcare efficiency and quality.” Emily Talen and Hyesun Jeong provided the urban design classifications of the Chicago census block data. Comments by Julia Koschinsky and referees on an earlier version of the paper are greatly appreciated.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Anselin, L., Li, X. Operational local join count statistics for cluster detection. J Geogr Syst 21, 189–210 (2019). https://doi.org/10.1007/s10109-019-00299-x
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10109-019-00299-x