Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA

Lotfata, Aynaz; Georganos, Stefanos

doi:10.1007/s10109-023-00415-y

Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA

Original Article
Published: 05 June 2023

(2023)
Cite this article

Download PDF

Journal of Geographical Systems Aims and scope Submit manuscript

Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA

Download PDF

1753 Accesses
Explore all metrics

Abstract

The increase in physical inactivity prevalence in the USA has been associated with neighborhood characteristics. While several studies have found an association between neighborhood and health, the relative importance of each component related to physical inactivity or how this value varies geographically (i.e., across different neighborhoods) remains unexplored. This study ranks the contribution of seven socioecological neighborhood factors to physical inactivity prevalence in Chicago, Illinois, using machine learning models at the census tract level, and evaluates their predictive capabilities. First, we use geographical random forest (GRF), a recently proposed nonlinear machine learning regression method that assesses each predictive factor’s spatial variation and contribution to physical inactivity prevalence. Then, we compare the predictive performance of GRF to geographically weighted artificial neural networks, another recently proposed spatial machine learning algorithm. Our results suggest that poverty is the most important determinant in the Chicago tracts, while on the other hand, green space is the least important determinant in the rise of physical inactivity prevalence. As a result, interventions can be designed and implemented based on specific local circumstances rather than broad concepts that apply to Chicago and other large cities.

Machine Learning Approaches for Measuring Neighborhood Environments in Epidemiologic Studies

Article Open access 30 June 2022

Spatial analysis of neighborhood vitality determinants on physical activity: a case study of Chicago

Article 26 August 2022

Urbanization and physical activity in the global Prospective Urban and Rural Epidemiology study

Article Open access 06 January 2023

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Behavioral health investigates behavioral problems impacting health such as physical inactivity (PI) (Kris-Etherton et al. 2021). PI is responsible for the deaths of approximately 3.2 million people per year and affects about 32% of the world's population aged 15 years and older (World Health Organization (WHO) 2020). The costs associated with PI account for more than 11% of total healthcare expenditures in the USA and are estimated to be $117 billion in 2021 (US America Health Ranking). The WHO defines physical inactivity in adults as less than 150 min of moderate-intensity activity per week. Since the industrial revolution, new technologies have enabled people to reduce the amount of physical labor required to complete many daily tasks. Although those technological revolutions have benefited many populations around the world, they have come at a high cost in terms of PI 's contribution to the global epidemic of noncommunicable diseases, such as cardiovascular disease-related mortality (Amidei et al. 2022), mental distresses (Pears et al. 2022), and cancer-related mortality (Cao et al. 2022). Indeed, adults' occupational physical activity is decreasing, but their leisure-time physical activity (i.e., walking and cycling) is increasing (Hallal et al. 2012). However, only 22% of Americans report regular, sustained physical activity (activity of any intensity lasting 30 min or more five times a week) (US America Health Ranking 2022). Individuals' physical activity is defined by a variety of policy, social, economic, cultural, environmental, behavioral, and biological factors (Sallis et al. 2006; Xue et al. 2018). Moreover, gaps in physical activity surveillance persist in individual and neighborhood levels. Given that more than half of the world's population lives in cities with limited leisure time, understanding the neighborhoods' characteristics may aid in identifying factors that promote sedentary and inactive lifestyles on a city-wide scale (Faka et al. 2019).

PI as a context-dependent and cross-disciplinary behavioral health has been associated with socioeconomic and built-environment factors (Aljabri 2022; Prince et al. 2022; Rütte et al. 2013). In terms of the built environment, urban characteristics such as urban form (e.g., residential density and high connectivity street network), neighborhood safety, and green space (Xiao et al. 2022; Peng et al. 2021; Passi-Solar et al. 2020; Feng and Astell-Burt 2019; Andersen et al. 2018) have received increased attention due to their associations with PI, obesity, and cardiovascular disease (Adlakha et al. 2021; Sims et al. 2020; An et al. 2019; Lenhart et al. 2017; Xu et al. 2015; Unger et al. 2014). Evidence suggests that the built environment can positively or adversely influence health behaviors. For example, PI increases are associated with residential insecurity and unaffordability (Rodgers et al. 2019). In economic crises, cost-burdened low-income households have limited resilience, resulting in housing insecurity that harm health (Rodgers et al. 2019). While the house circumstance is a social and environmental determinant of health (Rofle et al. 2020), there has been little research on the relationship between household ownership and health. Rental housing is related to poor neighborhood quality (Harris 1999), residential unaffordability, instability (Desmond and Shollenberger 2015), eviction from rental housing resulting in loss of social capital and trust among neighbors, and a lack of social cohesion, all of which have an impact on population behavioral health (Cradock et al. 2009). Household ownership, on the other hand, improves neighborhood quality by fostering social ties among neighbors, increasing satisfaction (Le Boennec and Lucas 2022), and ensuring residential stability (Mohanty and Raut 2009). Hence, housing ownership is critical in determining neighborhood quality and should be investigated further in public health studies.

Recent PI studies (Prince et al. 2022; Akinwumiju et al. 2022; Lee et al. 2017) have improved our understanding of the reasons of a significant increase in PI locally, globally, and in the United States. Typically, most PI research relies on linear aspatial models (i.e., Moreno-Llamas et al. 2021) and linear spatial models (i.e., Lane and Davis 2022; Kim and Kim 2021). However, nonlinear spatial models are rarely used to investigate relationships between tract-level PI prevalence and socioenvironmental factors. In one of those cases, Yang et al. (2021) studied the nonlinear effects of streetscape greenery on the walking behavior of older adults using a convolutional neural network (CNN) to evaluate human-scale, eye-level streetscape greenery. Although deep learning and artificial neural networks provide excellent predictive performance, they do not reveal how each variable affects the output. Moreover, geospatial technologies such as geographical positioning systems (GPS) are used to study people’s physical activities (i.e., Ramírez et al. 2021; Wang et al. 2021; Li 2021) that are leading to highly complex datasets, for which nonlinear solutions can be beneficial, especially when predictive performance is one of the objectives.

This study aims to introduce geographical random forest (GRF), a recently developed spatial machine learning algorithm into PI studies. The GRF is a decision tree-based nonparametric machine learning model (Georganos et al. 2021). It was recently developed to address the limitations of aspatial machine learning models and provide a modeling alternative to GWR approaches as it does not assume local linearity. The GRF model is similar to the GWR in concept, except that it is calibrated using random forests rather than an ordinary least squares regressors. It may also have superior predictive power and be used to evaluate associations between independent and dependent spatial variables by calculating variable importance (Georganos and Kalogirou 2022). Following that, each variable can be ranked from most to least important, tracing associations between independent and dependent variables.

GRF was recently used to model mean household income related to tertiary education, unemployment rate, and employment in the European Union regions (Georganos and Kalogirou 2022). Also, it was used to model and map the relative importance of 29 socioeconomic and health-related factors to the COVID-19 death rate, outperforming commonly used local and global regression methods such as OLS and GWR (Grekousis et al. 2022). It was also applied to predict diabetes prevalence in the USA (Quiones et al. 2021). However, more evidence is needed to investigate GRF use in other geographic contexts.

As a case study, we use Chicago, IL, USA, where nearly two out of every three adults (69%) are overweight or obese (US CDC 2021). While PI studies have previously relied heavily on traditional linear methods such as GWR and MGWR (Lee et al. 2017; Akinwumiju et al. 2022), this study applies state-of-the-art novel methods that have not been used in the specific domain before and introduces spatial machine learning to an audience that is typically not exposed to. Hence, the purpose of this study is to demonstrate how the recently developed GRF model can be used as a predictive and exploratory tool to investigate PI spatial heterogeneity while taking into account the nonlinear relationship between socioenvironmental factors and PI prevalence. This strategy can be used in various situations where independent variables can have varying predictive capacity and associations across space. To ensure the robustness of the experiments, the results are compared to the recently developed geographically weighted artificial neural networks (GWANN) (Hagenauer and Helbich 2022).

2 Data and method

2.1 Data

We obtained model-based estimates of current PI among the population over the age of 18 for all 793 census tracts in Chicago (“Physical inactivity” dependent variable). Based on responses to the Behavioral Risk Factor Surveillance System survey, the Centers for Disease Control and Prevention's PLACES Project provides statistics on PI as well as many other health-related metrics (US CDC 2021). PI is defined as the proportion of people who report doing no physical activity in their spare time. To obtain PI data, respondents were asked to answer the following question via a telephone survey: “Did you engage in any physical activities or exercises, other than your regular job, such as jogging, calisthenics, golf, gardening, or walking for exercise during the previous month?”.

As independent variables to the models, we utilize a poverty indicator (“Poverty” variable) based on income thresholds for those living below the poverty line, the unemployment ratio (“Unemployment” variable), and the eviction rates (“Eviction rate” variable). The eviction variable is constructed as the filings per renter-occupied households to the corresponding census tract population by the Chicago Health Atlas (CHA 2021). The threat of eviction has an impact on behavioral health as well as mental health due to anxiety and depression of household instability (Vásquez-Vera et al. 2017). Second, we quantified green space (“Green space” variable) as the mean of the green space index per census tract using the Normalized Difference Vegetation Index (NDVI) data from Landsat 8 NASA Earth Data (Google Earth Engine; Chander et al. 2009). In this study, urban green areas include any vegetation in the urban environment, including parks, green open spaces, residential gardens, mangroves, street trees, and green infrastructure (Labib et al. 2020). Furthermore, we quantified crime (“Crime ratio” Variable) using data provided by the Chicago Data Portal (CDP 2021) as the ratio between the number of crimes and the corresponding census tract population. Lastly, we provided the average percentage of owners per household (“Owners per household” variable) and the average percentage of renters per household (“Renter per household” variable) from the Location Affordability Index provided by the US Department of Housing and Urban Development (HUD 2021). As a final step into the preprocessing of the dataset, we examined the dataset for multicollinearity and ensured that correlation among independent variables did not lead to variance inflation. To do so, we calculated the variable correlation matrix and confirming that the variance inflation factors were below the recommended threshold of 2.5 (Cranley and Surles 2002).

With an average size of 0.28 square miles (standard deviation [SD] ± 0.39), census tracts were presumed a suitable analytical scale for assessing area-level physical inactivity prevalence. Census Tracts are small, permanent statistical subdivisions of a county according to the United States Administrative Divisions. Census tracts contain around 4000 people and are more demographically homogeneous, revealing more about local activities. The census tract-level variables were linked to the geometries via their 11-digit FIPS codes (US ACS 2019).

2.2 Methods

2.2.1 Random forest and geographical random forest

Breiman (2001) introduced random forest (RF), a nonparametric machine learning technique. RF is a grouping of independent decision trees that have been trained and tested for classification and regression problems. For each decision tree, a subset of training data is created through random sampling with replacement from the original training set (usually, two-thirds of the training set is used). Additionally, a random subset of predictor variables is picked for each node in each decision tree. A forest of trees is ultimately trained through the process described above with random subsets of training data points and predictors. Finally, each tree's prediction or classification error is computed, and all trees' final output is the average prediction value.

In the out-of-bag (OOB) set, the leftover data (the proportion of the data that is not sampled for training the trees) are used for internal evaluation and independent accuracy indicators. The RF OOB accuracy is a robust indicator frequently used to evaluate each variable's relevance and overall model performance (Breiman 2001). The increase in mean square error (IncMSE) is a typical metric for determining the relevance of a variable (Georganos et al. 2021). The OOB error is calculated by randomly permuting the values of each variable in the OOB set. If the OOB error increases, the variable is important, and the greater the change, the more relevant the variable is in estimating the dependent variable (Grekousis et al. 2022).

One of the most useful characteristics of an RF is its ability to determine the relative importance of each independent variable. In other words, it establishes which variable influences the dependent variable most. RF makes no statistical assumptions about the distribution of the data. As a result, it is better suited to dealing with variables with nonlinear correlations. Nonetheless, it still remains a global model, where spatial variations might not be accounted for, hindering both prediction and inference. The geographical random forest (GRF) model is a recent approach for modeling nonlinear and spatially heterogeneous relationships. GRF employs a spatial weights matrix and is locally calibrated using only nearby observations via a spatial kernel. In other words, a local RF is created and evaluated for each location based on input data from neighboring spatial units. GRF quantifies the local feature importance for each predictor in each local random forest model and the local residuals and goodness of fit (training and OOB). In addition, partial dependence plots were used to characterize the nonlinear relationships between the PI prevalence and the covariates. The partial dependence plots reveal whether the relationship between the target and a covariate is linear, monotonic, curvilinear, or more complex by presenting the expected target response as a function of the input covariate of interest (Friedman 2001). Georganos and Kalogirou (2022), Grekousis et al. (2022) and Georganos et al. (2021) provide comprehensive presentations of GRF regarding procedures, overfitting issues, and confounding variables. In this study, we also compared the results of the GRF with MultiScale geographical weighted regression (MGWR) (Oshan et al. 2019) to investigate the effectiveness of the machine learning models. The MGWR model produces the local R², the local residual, and the set of local coefficients for each spatial unity. This allows for analyzing the spatial variation of associations between dependent and independent variables. On the other hand, GRF yields the local feature importance for each predictor in each local random forest model, the local residuals, and the local goodness of fit (training and OOB) (Kalogirou and Georganos 2022). Because the optimal bandwidth may vary across the predictors and is a priori unknown, we employed a MGWR, whereas each predictor variable has its own bandwidth (Iyanda and Osayomi 2021). This allows the scale of relationship non-stationarity to vary for each response-to-predictor variable relationship.

2.2.2 Geographically weighted artificial neural network

Artificial neural networks (ANNs) are machine learning (ML) models that have gained popularity in the computer vision field over the last decade and typically outperform traditional statistical models in terms of predictive power (Dave and Dutta 2014). The ANN is a network of interconnected nodes inspired by the simplification of neurons in the brain. A network of neurons with unidirectional connections allows a simulation of brain features to identify patterns and detect associations within data (Abiodun et al. 2018).

Multiple layers of neurons may be contained in an ANN model. The “input layer” is the first layer, and the “output layer” is the last, with "hidden layers" in between where neuronal weights define and express the link between inputs and outputs (Dongare et al. 2012). A single neuron in the next layer connects a group of neurons in the previous layer, while each layer transforms input values and defines neuronal weights. Hyperparameters such as the learning rate, number of hidden layers, and batch size should all be adjusted before running the model. The number of nodes in hidden layers, or the number of hidden layers depends on the problem at hand. Broadly speaking, the more complex the problem at hand, the more hidden neurons are needed; if the problem is linear, i.e., the relationship between dependent and independent data is linear, no hidden neurons are required. With regard to the number of hidden layers, NNs with just one hidden layer are commonplace when tabular data are used to solve a problem. In fact, the universal approximation theorem states that NNs using just one hidden layer can approximate any continuous function on compact subsets of ${\mathbb{R}}^{n}$ with arbitrary accuracy (Cybenko 1989; Hornik et al. 1989). However, in some cases, using more than one hidden layer can be convenient. For instance, deep NNs that utilize hundreds of hidden layers have been successfully applied in the domain of image and audio processing (e.g., LeCunn et al. 2015; Goodfellow et al. 2016).

Additionally, the batch size determines how many samples will be trained to estimate the error gradient before the model weights are updated. Gradient descent is used to train neural networks to find the best parameters that minimize the model's error. Learning rate is all about adjusting the weights of neurons and synapses (junctions between two neurons) to increase the accuracy of the outcome by minimizing observable errors. Although a high learning rate reduces training time, it also reduces ultimate accuracy (Abiodun et al. 2018).

A geographically weighted artificial neural network (GWANN) is similar to a basic ANN (Haykin 2008) except for two properties. First, each output neuron of a GWANN is assigned to a geographic location for which a prediction is to be made. This enables the computation of the spatial distances between the observations and the positions of the output neurons. Second, rather than the standard quadratic error function, GWANN calculates an error signal using a geographically weighted error function. That is, the difference between the output neurons' output and the target values is weighted by the spatial distance between the output neurons' location and the observation; when the location and observation are close together, the difference is given more weight than when they are farther apart. It is important to note that the number of target values and output neurons must be the same. For a thorough presentation of the GWANN formulas, see Hagenauer and Helbich (2022).

2.2.3 Parameter tuning

GRF uses a spatial weights matrix and is calibrated locally using only nearby observations through a spatial kernel. This study used the adaptive spatial kernel to estimate the optimal number of nearest neighbors as it is the most widely used and more appropriate when sampling is not uniform, such as census tracts of varying sizes in this case. In more detail, for each location, a local RF model is created based on input data of neighboring spatial units, in a similar fashion to GWR. Following the example of others (Grekousis et al. 2022; Quinones et al. 2021), we optimized the hyperparameters of GRF (‘number of variables randomly sampled (mtry)’, and ‘number of trees (ntree)’) using Random Grid Search (RGS) on the RF model (using CARET library in R). We then kept these hyperparameters fixed on GRF and initialized models with varying bandwidth sizes and selected the one with the highest OOB R². The following variables were used to train the GRF: bandwidth = 22, “mtry” = 2, and “ntree” = 4000. Both global and local RF models were trained with the initial dataset to explore variation in feature importance due to the data distribution. Similar to RF, we ranked the variables’ importance based on the percent change in the MSE (Georganos et al. 2021). For predictions, a weight parameter is defined that fuses local and global estimates for the GRF model.

For the GWANN modeling, the input variables were scaled to have a zero mean and unit variance to facilitate the learning process. We adopted Nesterov's accelerated gradient (Sutskever et al. 2013) to adjust the connection weights with a momentum coefficient of 0.90. The learning rate of GWANN was set at 0.05, and the mini-batch size was set to 50. We used a single hidden layer consisting of eight hidden neurons. In addition, a bias neuron is added to the input and hidden layer. For the hidden neurons, the hyperbolic tangent function is employed as an activation function. We then kept these hyperparameters fixed and applied random grid search (RGS) with tenfold cross-validation on GWANN to find the best values for the bandwidth “bw,” which was found for a value of 3. To illustrate, a bandwidth of 3 means that the kernel bandwidth for estimating the parameters for a location equals the distance of this location to its third neighbor. It is important to note that small k does not necessarily mean “very local,” because k depends on the density of the data. For instance, if the data are very sparse, the kernel for a small k will encompass a larger geographic area than if the data are very dense. In addition, we used cross-validation to ensure that the models do not overfit. If the model would overfit due to a small kernel size (i.e. k = 3), the cross-validation score would be very poor and hence such a small kernel size would not be considered. Moreover, due to the Gaussian kernel, all locations are always considered but with varying degrees. GWANN does not make any assumption about the statistical distribution of the data and considers spatial heterogeneity. As a result, it is better equipped to handle variables which have nonlinear correlations that vary spatially.

2.2.4 Validation

To test the predictive performance of models, we used 10 randomly chosen splits of the data with 90% retained for training the models and 10% of the data held back for testing. Specifically, the data set (n = 793) was randomly split into 713 training data objects used to train the models and 80 test data objects used for evaluating the model performance. We then computed performance metrics such as the MSE, RMSE, and coefficient of determination (R²). Lastly, we reported the average value of these metrics over the 10 iterations.

2.2.5 Spatial autocorrelation

We use local Moran's I to assess residual spatial autocorrelation and track potential clustering in the residuals of the models (Anselin 1995) using the full census dataset.

2.3 Software

All our statistical analyses were conducted on the “SpatialML” package (Kalogirou and Georganos 2022) and the “GWANN” package (Hagenauer and Helbich 2022) in R Statistical Computing Environment (R Core Team 2020), and for cartography, the ArcGIS 10.8.1 was used.

3 Results

Figure 1 illustrates the prevalence of PI and socioecological variables at the tract level. PI is more common in Chicago's southern and western neighborhoods. Chicago's south side has the highest crime rate. Poverty, eviction, and unemployment are also more prevalent in the south and west. The greenest areas are in the neighborhoods to the south and far north. Except in downtown areas, the city's population of renters and house owners is fairly consistent.

Partial dependence plots (PDPs) were extracted from the random forest model to characterize the nonlinear relationship between the predictor variables and PI prevalence (Fig. 2). All characteristics, such as crime ratio (Fig. 2a), green space (Fig. 2d), and renters per household (Fig. 2f), have a curvilinear relationship with PI prevalence. Linear correlations exist between variables, but only within specific ranges. There is a positive linear link between eviction rate and PI prevalence in the range 0.5–1 (Fig. 2e). Similarly, in range 5–6, there is a linear positive link with poverty and PI prevalence, but the effect reduces after that (Fig. 2b). The data show that nearly all associations are nonlinear, highlighting the need of utilizing nonlinear regression models.

The GRF model had lower MSE values and a higher local R² in the out-of-sample predictions (OOB set) than the global RF model (Table 1). Table 1 shows that the R² and MSE from RF and GRF are very close, implying that the global model developed by RF may be valid for describing the relationships between the PI and the covariates. The GFR model should be used to test local variations of the relationships. According to the permutation-based feature importance, poverty is the most important feature, followed by unemployment (PFI) in both models. As shown in Table 1, the importance ranking of determinants in GRF can differ from that of the global RF model. Renters per household, for example, are ranked third in the RF model (higher IncMSE) but fifth in the GRF model (lower IncMSE). Similarly, in the RF model, the crime ratio is ranked fifth, but third in the GRF model (Table 1). Nonetheless, it is important to note that the ranking of the GRF approach is based on the mean value of the whole local distribution of feature importance.

Table 1 Summary results of random forest (RF) and geographically random forest (GRF) models

Full size table

Additionally, the MGWR model (AICc = 116.27, RSS = 36.02) exhibited an adjusted R² of 0.88. The Moran’s I test (I = −0.01 p = 0.15) of MGWR turns out to be statistically insignificant confirming a lack of residual spatial autocorrelation congruent with the model assumption (Supplemental Fig. 1.). Supplemental Fig. 2 illustrates the MGWR model covariates’ variation. The optimal bandwidth ranges from 44 to 487 per census tract, suggesting that the independent variables operate on different scales. It shows that the independent variables “green space,” “eviction rate,” and “renters per household” operate on a small spatial scale compared to “crime ratio,” “owners per household,” “unemployment,” and “poverty”.

In addition, the importance of the determinants was mapped to understand to better visualize and understand potentially interesting local variations (Fig. 3). The importance of crime ratio (Fig. 3a) is the largest in a few places in the Mideast along Lake Michigan and southwest. Poverty, on the other hand, is important throughout the city, particularly in the south, southwest, mid-north, and northeast (Fig. 3b). Similarly, unemployment is more prevalent in the south and west downtown, and a few places in the southwest neighborhoods of the city (Fig. 3c). However, the green space predominately has the lowest importance except for some neighborhoods of the west downtown (Fig. 3d). The importance of eviction rates and renters per household is highest in a few places of south, southwest downtown, and mid-north (Fig. 3e and f). However, the importance of owners per household is highest in the south downtown with a few places in the northwest and northeast (Fig. 3g). Overall, Fig. 3 demonstrates the importance of unemployment, green space, eviction rate, renters, and owners per household determinants in a few south downtown neighborhoods on PI prevalence. This strategy aids in identifying the combination of variables strongly associated with PI prevalence in a few neighborhoods, which is extremely useful for policymakers and health professionals in developing related policies to address place-based interventions.

3.1 Predictive performance of linear and nonlinear models

We found that the GRF and GWANN models performed similarly, with a slight GWANN outperformance, based on our independent validation approach (Table 2). The GWANN model accounts for somewhat higher PI variability (R² = 0.92) than the GRF model (R² = 0.91) while reducing overall RMSE values (0.26 vs. 0.27). Both models outperform the RF model. Notably, GRF quantifies the local feature importance for each predictor in each local random forest model, which is extremely valuable from the standpoint of geographical analysis.

Table 2 Evaluation of the predictive performance of models based on the test dataset

Full size table

To further understand how GWANN, GRF, and RF handled geographical heterogeneity, we show the spatial distribution of the local residuals (Fig. 4). We also utilize local Moran's I to look for potential clustering in the residuals and estimate spatial autocorrelation. GWANN and GRF have addressed spatial heterogeneity in most cases. Residuals of GWANN and GRF are not geographically clustered in most places and are randomly dispersed. Notably, for GRF, the residuals are based on pseudo-hold-out predictions of the OOB sample and not a direct fit on the training data.

4 Discussion

This cross-sectional socioecological study investigates the prevalence of PI at the tract level. The nonlinear regression models GWANN and GRF were used. While GWANN is a black-box model that is complex enough to be incomprehensible to humans, GRF is used to supplement interpretability by identifying elements that cause local variations in PI prevalence rates, as demonstrated by other studies (Georganos et al. 2022; Georganos et al. 2021; Luo et al. 2021). This study examines machine learning models for predicting PI prevalence and the spatial importance of neighborhoods’ socioecological determinants in PI studies. Neighborhood characteristics account for three of the top four most important local factors (poverty, unemployment, and crime), while one is related to housing (eviction rate). The implications of the findings regarding these determinants are discussed further.

4.1 Poverty

Poverty is the most important factor in the large percentage of Chicago's census tracts (Fig. 4b). Although poverty is a multidimensional feature that encompasses the various deprivations that people face in their daily lives (Aktinson et al. 2002; Alkire and Apablaza 2016), it is explicitly associated with low income (financial poverty) and a variety of non-monetary factors (i.e., poor health, poor urban services) (United Nations 2016). Poverty is more important in neighborhoods where the population is overwhelmingly socially disadvantaged. Poor people frequently bear a double burden: they must deal with both the multiple harms caused by their own lack of income and the social consequences of living in a poor neighborhood (Vallée et al. 2022), such as the vulnerable African-American population in south Chicago neighborhoods (Mendenhall 2018). The mental distress and anxiety in these places have an impact on behavioral health as well as physical activity (Copeland and Snyder 2011). In poor neighborhoods, health disparity worsens with lack of access to affordable, healthy foods, living in overcrowded areas with poor sleep quality, high rent burdens, and household instability, which discourage healthy behaviors (Stanley et al. 2022; Kwarteng et al. 2017). Additionally, low-income neighborhoods lack access to well-designed green spaces, open spaces, sidewalks, and sports equipment to engage in physical activities (Huston et al. 2003). Hence, it is primarily critical to understand the “leisure-time physical activity” definition among socially vulnerable people because their daily lives are pounded with economic instability. Additionally, the relationship between poverty and physical activity is not well understood or direct, and more research is required (Armstrong et al. 2018).

4.2 Unemployment

Unemployment is highest per census tract in the southern downtown places near Lake Michigan, as well as in a few places on the west side of downtown (Fig. 4c). Unemployment has long been associated with a higher risk of disease (e.g., Miao et al. 2021). This association is rarely explained by behavioral health such as physical inactivity, risk of overdose (Sawyer et al. 2021), exacerbated by lower income and psychological stress (Jin et al. 1995).

Further studies should investigate employment status (full-time, part-time, unemployed) and job type (active and sedentary) in behavioral health studies, because unemployed people, according to Gryson (1993), are just as active as employed people. Brown and Trost (2003) found that higher PI is associated with employment. In this sense, one's job has a little direct effect on the prevalence of PI. Furthermore, Marrone and Swarbrick (2020) found that long-term unemployment has a significant negative impact on population behavioral and mental health. As a result, the duration of unemployment should also be considered in public health studies.

4.3 Crime

The importance of crime per census tract is strongest in the southern downtown districts (Fig. 4a), where eviction rate and poverty are relatively high (Fig. 1e and b). The community and individual-level factors associate PI with neighborhood crime. Business withdrawal, population out-migration, physical deterioration, declining community resources, and crumbling public infrastructure are witnessed at the community level in high-crime zones that are unsafe for physical exercise (Feuillet et al. 2020; Lacoe et al. 2018). Also, individuals' perceptions of unsafety, anxiety, and stress are influenced by neighborhood crime, which restricts participation in physical activities (Park and Garcia 2020). In this study, crime refers to a variety of illegal behaviors. However, outdoor and indoor crimes and types of crimes (home burglary vs. robbery) should be treated separately when dealing with crime issues at the neighborhood level. Overall, well-planned with a well-maintained built environment (e.g., green spaces) can discourage illegal behavior. The term "well-planned" refers to areas with the capacity for informal and formal social control and thus provide safety (Weisburd et al. 2021).

4.4 Eviction rate

The eviction rate per census tract is most important in the southern neighborhoods along Lake Michigan (Fig. 4e), which have a high poverty rate and a share of green spaces (Fig. 1b and d). Housing insecurity and unaffordability can lead to stress and despair, as well as alter metabolism and increase the risk of unhealthy behaviors such as smoking and inactivity (Chambers and Fuster 2012; Hazekamp et al. 2021). Little is known about how policy interventions, such as supply-side housing subsidy programs aimed at increasing affordable housing, affect eviction dynamics in communities across the United States (Preston and Reina 2021). Our findings highlight the importance of local governments acting quickly to protect public health from the PI and obesity epidemics by increasing rental assistance and extending the eviction moratorium (Chicago Eviction 2022).

Although other socioecological variables in this study (green space, renters per household, and owners per household) had minor importance on the prevalence of PI in Chicago, green space that is available, visible, and accessible can improve the function of green space in boosting population health by encouraging physical activity participation. However, rather than focusing on green infrastructure to improve health, Chicago, and other large cities develop and plan green spaces with the tourism industry in mind. Additionally, there is insufficient research on the impact of the number of renters versus owners per household on developing a healthy and friendly neighborhood environment that encourages social cohesion and inclusion. More research is needed on the house as a social and environmental determinant of health for individuals and communities.

4.5 Limitations

The tract-level estimates used in previous studies (e.g., Hohl and Lotfata 2022) have some limitations. While our findings may be affected by the underlying analytical scale, causal inference is also complicated by the data's cross-sectional and ecological nature (Helbich et al. 2021). A telephone survey was used to collect the outcome variable, which is likely to have problems due to recall bias, social desirability bias, and the inability to reach households without landlines (US CDC 2021). Moreover, social, physical, spatial, cultural, personal, and interpersonal dynamics contribute to physical inactivity. However, the primary goal of this research is not to propose a holistic approach to model and map PI by taking into account every available factor, but rather to present new spatial methods to the community for improving the prediction of health outcomes. In addition to the data limitation, locally weighted models have some drawbacks. To train the GRF that accounts for differences in tract size, for example, we used an adaptive kernel bandwidth to choose the optimal number of tracts. Tracts of varying sizes dispersed across a geographic area may result in dependent variable spillover effects or residual spatial autocorrelation. The local R² in the GRF varied across space, as expected. The R² was less than 0.5 in some tracts in the city's south, west, and north. These findings suggest that the variables included only explained a small portion of the variance in the outcome variable and that additional variables should be included to improve the performance of the local models in these regions. In terms of GWANN model, the inner workings of GWANN are hard to perceive by humans. More experimental experiments are required to extend GWANN to local analysis. Also, GWANN depends on many hyperparameters (number of hidden neurons, batch size, momentum, etc.), which were chosen for this study a priori. Hence, it is not guaranteed that the chosen hyperparameters yielded the best results. Determining optimal hyperparameters in an experimental setting, though, is cumbersome because it can be expected that the hyperparameters are not independent of each other.

5 Conclusion

This is the first study to look at the geographic variation in PI prevalence at the tract level in response to factors associated with neighborhood circumstances using the GWANN and GRF models. It is a significant finding regarding the usefulness and applicability of the GRF and GWANN spatial regression models in geographical analysis because they account for nonlinear relationships and address spatial heterogeneity. More importantly, GRF is used to determine how the importance of each variable varies spatially. Additionally, given the high likelihood of local collinearity, researchers are encouraged to use GWANN and GRF in population health studies to assess the effects of a wide range of socioecological data samples on behavioral health. As these models are prediction-oriented, they can be used to run different scenarios and projections which are vital components for enhancing evidence-based policy making.

The findings of this study could lead to more targeted epidemiologic research at the tract level, allowing researchers to better understand the mechanisms driving inactive lifestyle in different places with different household and neighborhood-related characteristics, particularly in Chicago, where obesity is the highest risk. Other elements of household quality (such as household composition, mobile homes, homelessness, and pollution), as well as the spatial behavior of the PI epidemic from a neighborhood health standpoint, should be investigated in future research. Understanding the spatial heterogeneity of PI determinants’ associations could lead to more advanced research and policy development in large US cities to address PI's underlying, spatially varied drivers at the local level.

References

Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018) State-of-the-art in artificial neural network applications: a survey. Heliyon 4:e00938
Article Google Scholar
Adlakha D, Brownson RC, Hipp JA (2021) Built environment correlates of overweight and obesity among adults in Chennai, India. Cities Health 5(3):289–297
Article Google Scholar
Akinwumiju AS, Oluwafemi O, Mohammed YD, Mobolaji JW (2022) Geospatial evaluation of COVID-19 mortality: Influence of socio-economic status and underlying health conditions in contiguous USA. Appl Geogr 141:102671
Article Google Scholar
Aljabri D (2022) Associations between obesity, physical inactivity, healthcare capacity, and the built environment: geographic information system analysis. J Multidiscip Healthc 15:689
Article Google Scholar
Alkire S, Apablaza M (2016) Multidimensional poverty in Europe 2006–2012: illustrating a methodology, OPHI Working Paper 74, University of Oxford.
Amidei CB, Trevisan C, Dotto M, Ferroni E, Noale M, Maggi S, Corti MC, Baggio G, Fedeli U, Sergi G (2022) Association of physical activity trajectories with major cardiovascular diseases in elderly people. Heart 108(5):360–366
Article Google Scholar
An R, Shen J, Yang Q, Yang Y (2019) Impact of built environment on physical activity and obesity among children and adolescents in China: a narrative systematic review. J Sport Health Sci 8(2):153–169
Article Google Scholar
Andersen LB, Riiser A, Rutter H, Goenka S, Nordengen S, Solbraa AK (2018) Trends in cycling and cycle related injuries and a calculation of prevented morbidity and mortality. J Transp Health 9:217–225
Article Google Scholar
Anselin L (1995) The local indicators of spatial association LISA. Geogr Anal 27:93–115
Article Google Scholar
Armstrong S, Wong CA, Perrin E, Page S, Sibley L, Skinner A (2018) Association of physical activity with income, race/ethnicity, and sex among adolescents and young adults in the United States: findings from the national health and nutrition examination survey, 2007–2016. JAMA Pediatr 172(8):732–740
Article Google Scholar
Atkinson T, Cantillon B, Marlier E, Nolan B (2002) Social indicators: The EU and social inclusion. Oxford University Press, Oxford
Book Google Scholar
Breiman L (2001) Random forests. Mach Learn 45:5–32
Article Google Scholar
Brown WJ, Trost SG (2003) Life transitions and changing physical activity patterns in young women. Am J Prev Med 25(2):140–143
Article Google Scholar
Cao C, Friedenreich CM, Yang L (2022) Association of daily sitting time and leisure-time physical activity with survival among US cancer survivors. JAMA Oncol 8(3):395–403
Article Google Scholar
Chambers EC, Fuster D (2012) Housing as an obesity-mediating environment. Int J Public Health 57(2):453–454
Article Google Scholar
Chander G, Markham BL, Helder DL (2009) Summary of current radiometric calibration coefficients for Landsat MSS, TM, ETM+, and EO-1 ALI sensors. Remote Sens Environ 113(5):893–903
Article Google Scholar
Chicago Data Portal (CDP) Crime census. https://data.cityofchicago.org/resource/x2n5-8w5q.json.Accessed 20 December 2021.
Chicago Eviction (2022) https://eviction.lcbh.org/reports/eviction-unemployment-covid-19. Accessed 15 April 2022.
Chicago Health Atlas (CHA) https://chicagohealthatlas.org/. Accessed 10 December 2021.
Copeland VC, Snyder K (2011) Barriers to mental health treatment services for low-income African American women whose children receive behavioral health services: an ethnographic investigation. Soc Work Public Health 26(1):78–95
Article Google Scholar
Cradock AL, Kawachi I, Colditz GA, Gortmaker SL, Buka SL (2009) Neighborhood social cohesion and youth participation in physical activity in Chicago. Soc Sci Med 68(3):427–435
Article Google Scholar
Cranley TA, Surles JG (2002) Model-dependent variance inflation factor cutoff values. Qual Eng 14(3):391–403
Article Google Scholar
Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Math Control Signals Syst 2(4):303–314
Article Google Scholar
Dave VS, Dutta K (2014) Neural network-based models for software effort estimation: a review Artif. Intell Rev 42(2):295–307
Article Google Scholar
Desmond M, Shollenberger T (2015) Forced displacement from rental housing: prevalence and neighborhood consequences. Demography 52(5):1751–1772
Article Google Scholar
Dongare AD, Kharde RR, Kachare AD (2012) Introduction to artificial neural network. Int J Eng Innov Technol 2:189–194
Google Scholar
Faka A, Chalkias C, Georgousopoulou EN, Tripitsidis A, Pitsavos C, Panagiotakos DB (2019) Identifying determinants of obesity in Athens, Greece through global and local statistical models. Spat Spatio-Temp Epidemiol 29:31–41
Article Google Scholar
Feng X, Astell-Burt T (2019) Can green space quantity and quality help prevent postpartum weight gain? A longitudinal study. J Epidemiol Commun Health 73(4):295–302
Article Google Scholar
Feuillet T, Valette JF, Charreire H, Kesse-Guyot E, Julia C, Vernez-Moudon A, Hercberg S, Touvier M, Oppert JM (2020) Influence of the urban context on the relationship between neighbourhood deprivation and obesity. Soc Sci Med 113537:0277–9536
Google Scholar
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
Article Google Scholar
Google Earth Engine,Landsat 8 Collection 1 Tier 1 Annual NDVI Composite. https://explorer.earthengine.google.com/#detail/LANDSAT%2FLC08%2FC01%2FT1_ANNUAL_NDV. Accessed 20 December 2021.
Georganos S, Kalogirou S (2022) A forest of forests: a spatially weighted and computationally efficient formulation of geographical random forests. ISPRS Int J Geo Inf 11(9):471
Article Google Scholar
Georganos S, Grippa T, Gadiaga AN, Linard C, Lennart M, Mboga SVN, Wolff E, Kalogirou S (2021) Geographical random forests: a spatial extension of the random forest algorithm to address spatial heterogeneity in remote sensing and population modelling. Geocarto Int 36(2):121–136
Article Google Scholar
Goodfellow I, Bengio Y., Courville, A. (2016) Deep learning (adaptive computation and machine learning series).
Grayson JP (1993) Health, physical activity level, and employment status in Canada. Int J Health Serv 23(4):743–761
Article Google Scholar
Grekousis G, Feng Z, Marakakis I, Lu Y, Wang R (2022) Ranking the importance of demographic, socioeconomic, and underlying health factors on US COVID-19 deaths: a geographical random forest approach. Health Place 74:102744
Article Google Scholar
Hagenauer J, Helbich M (2022) A geographically weighted artificial neural network. Int J Geogr Inf Sci 36(2):215–235
Article Google Scholar
Hallal PC, Andersen LB, Bull FC, Guthold R, Haskell W, Ekelund U (2012) Lancet physical activity series working group global physical activity levels: surveillance progress, pitfalls, and prospects. Lancet 380(9838):247–257
Article Google Scholar
Harris DR (1999) Property values drop when blacks move in, because...: racial and socioeconomic determinants of neighborhood desirability. Am Sociol Rev 1:461–479
Article Google Scholar
Haykin S (2008) Neural networks: a comprehensive foundation, 3rd edn. Prentice Hall, Upper Saddle River, NJ
Google Scholar
Hazekamp C, Yousuf S, Khare M, MacDowell M (2021) Unhealthy behaviours in urban Illinois communities affected by eviction: a descriptive analysis. Health Soc Care Commun 29:867–875
Article Google Scholar
Helbich M, Mute Browning M, Kwan M-P (2021) Time to address the spatiotemporal uncertainties in COVID-19 research: concerns and challenges. Sci Total Environ 764:1–2
Article Google Scholar
Hohl A, Lotfata A (2022) Modeling spatiotemporal associations of obesity prevalence with biking, housing cost and green spaces in Chicago, IL, USA, 2015–2017. J Transp Health 26:101412
Article Google Scholar
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Article Google Scholar
Huston SL, Evenson KR, Bors P, Gizlice Z (2003) Neighborhood environment, access to places for activity, and leisure-time physical activity in a diverse North Carolina population. Am J Health Promot 18(1):58–69
Article Google Scholar
Iyanda AE, Osayomi T (2021) Is there a relationship between economic indicators and road fatalities in Texas? A multiscale geographically weighted regression analysis. GeoJournal 86:2787–2807
Article Google Scholar
Jin RL, Shah CP, Svoboda TJ (1995) The impact of unemployment on health: a review of the evidence evidence (reprinted from Canadian Medical Association Journal, vol 153, pp 529–40, 1995). J Public Health Policy 18(3):275–301
Article Google Scholar
Kalogirou S, Georganos S. SpatialML, R package. https://cran.r-project.org/web/packages/SpatialML/SpatialML.pdf. Accessed 10 January 2022.
Kim C, Kim J (2021) Urban sprawl and leisure time physical activity. Sport Manag Rev. https://doi.org/10.1080/14413523.2021.1972664
Article Google Scholar
Kris-Etherton PM, Petersen KS, Hibbeln JR, Hurley D, Kolick V, Peoples S, Rodriguez N, Woodward-Lopez G (2021) Nutrition and behavioral health disorders: depression and anxiety. Nutr Rev 79(3):247–260
Article Google Scholar
Kwarteng JL, Schulz AJ, Mentz GB, Israel BA, Perkins DW (2017) Independent effects of neighborhood poverty and psychosocial stress on obesity over time. J Urban Health: Bull N Y Acad Med 94(6):791–802
Article Google Scholar
Labib SM, Lindley S, Huck JJ (2020) Spatial dimensions of the influence of urban green blue spaces on human health: a systematic review. Environ Res 180:108869
Article Google Scholar
Lacoe J, Bostic RW, Acolin A (2018) Crime and private investment in urban neighborhoods. J Urban Econom 108:154–169
Article Google Scholar
Lane JM, Davis BA (2022) Food, physical activity, and health deserts in Alabama: the spatial link between healthy eating, exercise, and socioeconomic factors. GeoJournal.
Le Boennec R, Lucas S (2022) Is neighborhood satisfaction related to density perception? Promoting liveable and sustainable land use planning. J Environ Plan Manage 65(11):2081–2098
Article Google Scholar
LeCun Y, Bengio Y, Hinton G (2015) Deep Learning. Nature 521(7553):436–444
Article Google Scholar
Lee KH, Dvorak RG, Schuett MA, van Riper CJ (2017) Understanding spatial variation of physical inactivity across the continental United States. Landsc Urban Plan 1(168):61–71
Article Google Scholar
Lenhart CM, Wiemken A, Hanlon A, Perkett M, Patterson F (2017) Perceived neighborhood safety related to physical activity but not recreational screen-based sedentary behavior in adolescents. BMC Public Health 17(1):722
Article Google Scholar
Li X (2021) Examining the spatial distribution and temporal change of the green view index in New York City using Google street view images and deep learning. Environ Plann B Urban Anal City Sci 48(7):2039–2054
Article Google Scholar
Luo Y, Yan J, McClure S (2021) Distribution of the environmental and socioeconomic risk factors on COVID-19 death rate across continental USA: a spatial nonlinear analysis. Environ Sci Pollut Res 28(6):6587–6599
Article Google Scholar
Marrone J, Swarbrick MA (2020) Long-Term unemployment: a social determinant Underaddressed within community behavioral health programs. Psychiatr Serv 71(7):745–748
Article Google Scholar
Mendenhall R (2018) The medicalization of poverty in the lives of low-income Black mothers and children. J Law Med Ethics 46(3):644–650
Article Google Scholar
Miao Q, Dunn S, Wen SW, Lougheed J, Reszel J, Lavin Venegas C, Walker M (2021) Neighbourhood maternal socioeconomic status indicators and risk of congenital heart disease. BMC Pregnancy Childbirth 21(1):1–21
Article Google Scholar
Mohanty LL, Raut LK (2009) Home ownership and school outcomes of children: evidence from the PSID child development supplement. Am J Econ Sociol 68(2):465–489
Article Google Scholar
Moreno-Llamas A, García-Mayor J, De la Cruz-Sánchez E (2021) Urban-rural differences in trajectories of physical activity in Europe from 2002 to 2017. Health Place 69:102570
Article Google Scholar
Oshan TM, Smith J, Fotheringham AS (2019) A multiscale spatial analysis of obesity determinants in Phoenix, Arizona.
Park Y, Garcia M (2020) Pedestrian safety perception and urban street settings. Int J Sustain Transp 14(11):860–871
Article Google Scholar
Passi-Solar Á, Margozzini P, Cortinez-O’Ryan A, Muñoz JC, Mindell JS (2020) Nutritional and metabolic benefits associated with active and public transport: results from the Chilean national health survey, ENS 2016–2017. J Transp Health 17:100819
Article Google Scholar
Pears M, Kola-Palmer S, De Azevedo LB (2022) The impact of sitting time and physical activity on mental health during COVID-19 lockdown. Sport Sci Health 18(1):179–191
Article Google Scholar
Peng B, Erdogan S, Nasri AN, Zou Z (2021) Towards a health-conscious transportation planning: a framework for estimating health impacts of active transportation at local level. J Transp Health 22:101231
Article Google Scholar
Preston G, Reina VJ (2021) Sheltered from eviction? A framework for understanding the relationship between subsidized housing programs and eviction. Hous Policy Debate 31(3–5):785–817
Article Google Scholar
Prince SA, Lancione S, Lang JJ, Amankwah N, de Groh M, Garcia AJ, Merucci K, Geneau R (2022) Examining the state, quality and strength of the evidence in the research on built environments and physical activity among children and youth: An overview of reviews from high income countries. Health Place 76:102828
Article Google Scholar
Quiñones S, Goyal A, Ahmed ZU (2021) Geographically weighted machine learning model for untangling spatial heterogeneity of type 2 diabetes mellitus (T2D) prevalence in the USA. Sci Rep 11:6955
Article Google Scholar
Ramírez T, Hurtubia R, Lobel H et al (2021) Measuring heterogeneous perception of urban space with massive data and machine learning: an application to safety. Landsc Urban Plan 208:104002
Article Google Scholar
R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
Rodgers J, Briesacher BA, Wallace RB, Kawachi I, Baum CF, Kim D (2019) County-level housing affordability in relation to risk factors for cardiovascular disease among middle-aged adults: the national longitudinal survey of youths 1979. Health Place 59:102194
Article Google Scholar
Rolfe S, Garnham L, Godwin J et al (2020) Housing as a social determinant of health and wellbeing: developing an empirically-informed realist theoretical framework. BMC Public Health 20:1138
Article Google Scholar
Rütten A, Abu-Omar K, Gelius P, Schow D (2013) Physical inactivity as a policy problem: applying a concept from policy analysis to a public health issue. Health Res Policy Syst 11(1):1–9
Article Google Scholar
Sallis JF, Cervero RB, Ascher W, Henderson KA, Kraft MK, Kerr J (2006) An ecological approach to creating active living communities. Annu Rev Public Health 27:297–322
Article Google Scholar
Sawyer JL, Shrestha S, Pustz JC, Gottlieb R, Nichols D, Van Handel M, Lingwall C, Stopka TJ (2021) Characterizing opioid-involved overdose risk in local communities: an opioid overdose vulnerability assessment across Indiana, 2017. Prevent Med Rep 24:101538
Article Google Scholar
Sims M, Kershaw KN, Breathett K, Jackson EA, Lewis LM, Mujahid MS (2020) Importance of housing and cardiovascular health and well-being: a scientific statement from the American heart association. Circ Cardiovasc Qual Outcomes 13(8):e000089
Article Google Scholar
Stanley KC, Harrigan PB, Serrano EL, Kraak VI (2022) A systematic scoping review of the literacy literature to develop a digital food and nutrition literacy model for low-income adults to make healthy choices in the online food retail ecosystem to reduce obesity risk. Obes Rev 23(4):e13414
Google Scholar
Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In International conference on machine learning (pp. 1139–1147). PMLR.
Unger E, Diez-Roux AV, Lloyd-Jones DM, Mujahid MS, Nettleton JA et al (2014) Association of neighborhood characteristics with cardiovascular health in the multi-ethnic study of atherosclerosis. Circ Cardiovasc Qual Outcomes 7(4):524–531
Article Google Scholar
United Nations (2016) Multidimensional poverty and its measurement: guide on poverty measurement, In Working Paper 12, Conference of European Statisticians, 12–13 July, Geneva, Switzerland.
US American Community Survey (ACS) (2019). https://www.census.gov/geographies/mappingfiles/time-series/geo/tiger-line-file.html. Accessed 1 Jan 2022
US America Health Ranking (2022). https://www.americashealthrankings.org/.Accessed 10 January 2022.
US Centers for Disease Control and Prevention (CDC) (2021) Behavioral Risk Factor Surveillance System Survey Data; Department of Health and Human Services, Centers for Disease Control and Prevention: Atlanta, GA, USA, 2021. https://www.cdc.gov/brfss/index.html. Accessed 27 December 2021.
US Department of Housing and Urban Development (HUD) (2021) Location affordability index. https://hudgis-hud.opendata.arcgis.com/datasets/location-affordability-index-v-3/explore?location=10.288506%2C0.315617%2C1.59. Accessed 3 Feb 2022
Vallée J, Shareck M, Kestens Y, Frohlich KL (2022) Everyday geography and service accessibility: the contours of disadvantage in relation to mental health. Ann Am Assoc Geogr 112(4):931–947
Google Scholar
Vásquez-Vera H, Palència L, Magna I, Mena C, Neira J, Borrell C (2017) The threat of home eviction and its effects on health through the equity lens: a systematic review. Soc Sci Med 175:199–208
Article Google Scholar
Wang R, Feng Z, Pearc J et al (2021) The distribution of greenspace quantity and quality and their association with neighbourhood socioeconomic conditions in Guangzhou, China: a new approach using deep learning method and street view images. Sustain Cities Soc 66:102664
Article Google Scholar
Weisburd D, White C, Wire S, Wilson DB (2021) Enhancing informal social controls to reduce crime: evidence from a study of crime hot spots. Prev Sci 22(4):509–522
Article Google Scholar
World Health Organization (WHO) (2019) Guidelines on physical activity, sedentary behaviour and sleep for children under 5 years of age. Geneva. Licence: CC BY-NC-SA 3.0 IGO.
World Health Organization (WHO) (2020) Physical inactivity: a global public health problem. https://www.who.int/dietphysicalactivity/factsheet_inactivity/en. Accessed 15 June 2020.
Xiao Y, Miao S, Zhang Y, Xie B, Wu W (2022) Exploring the associations between neighborhood greenness and level of physical activity of older adults in shanghai. J Transp Health 24:101312
Article Google Scholar
Xu Y, Wen M, Wang F (2015) Multilevel built environment features and individual odds of overweight and obesity in Utah. Appl Geogr 60:197–203
Article Google Scholar
Xue H, Slivka L, Igusa T, Huang TT, Wang Y (2018) Applications of systems modelling in obesity research. Obes Rev 19:1293–1308
Article Google Scholar
Yang L, Ao Y, Ke J et al (2021) To walk or not to walk? Examining non-linear effects of streetscape greenery on walking propensity of older adults. J Trans Geogr 94:103099
Article Google Scholar

Download references

Acknowledgements

We would also like to show our gratitude to Julian Hagenauer (Department of Human Geography and Spatial Planning, Faculty of Geosciences, Utrecht University, Netherlands) for sharing his pearls of wisdom with us during the course of this research, and we thank three reviewers for their insights.

Author information

Authors and Affiliations

School of Veterinary Medicine, Department of Veterinary Pathology, University of California, Davis, USA
Aynaz Lotfata
Geomatics, Department of Environmental and Life Sciences, Faculty of Health, Science and Technology, Karlstad University, Karlstad, Sweden
Stefanos Georganos

Authors

Aynaz Lotfata
View author publications
You can also search for this author in PubMed Google Scholar
Stefanos Georganos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Aynaz Lotfata.

Ethics declarations

Conflict of interest

There is not conflict of interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 8038 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Lotfata, A., Georganos, S. Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA. J Geogr Syst (2023). https://doi.org/10.1007/s10109-023-00415-y

Download citation

Received: 30 May 2022
Accepted: 04 May 2023
Published: 05 June 2023
DOI: https://doi.org/10.1007/s10109-023-00415-y

Keywords

JEL Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Spatial machine learning for predicting physical inactivity prevalence from socioecological determinants in Chicago, Illinois, USA

Abstract

Similar content being viewed by others

Machine Learning Approaches for Measuring Neighborhood Environments in Epidemiologic Studies

Spatial analysis of neighborhood vitality determinants on physical activity: a case study of Chicago

Urbanization and physical activity in the global Prospective Urban and Rural Epidemiology study

1 Introduction

2 Data and method

2.1 Data

2.2 Methods

2.2.1 Random forest and geographical random forest

2.2.2 Geographically weighted artificial neural network

2.2.3 Parameter tuning

2.2.4 Validation

2.2.5 Spatial autocorrelation

2.3 Software

3 Results

3.1 Predictive performance of linear and nonlinear models

4 Discussion

4.1 Poverty

4.2 Unemployment

4.3 Crime

4.4 Eviction rate

4.5 Limitations

5 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 8038 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

JEL Classification

Search

Navigation