Introduction

In January 2020, California had only three confirmed cases of coronavirus disease 2019 (COVID-19). Within weeks, community transmission was confirmed, and the first COVID-related deaths were reported. By September 2021, California had reported over 7.1 million COVID-19 cases and over 77,000 COVID-related deaths [1]. For a state with a population of close to 40 million, that equates to approximately 1 in 6 people infected (among those tested) and 1 in 500 people who have died.

The neighborhood where a person lives exposes them to social, structural, and environmental factors that have been shown to have complex effects on health beyond individual-level factors [2, 3]. Thus, it is perhaps no surprise that while SARS-CoV-2 may spread easily, it does not spread equally across neighborhoods. Inequities in SARS-CoV-2 exposure and COVID-19 outcomes are often driven by structural racism, an interconnected system of policies and institutions that maintains White privilege [4,5,6,7,8,9] and impacts health through various factors, such as differences in neighborhood factors, employment, and health care access [10,11,12]. Geographic areas with higher poverty, higher average household size, lower education, and more foreign-born residents have been linked with higher COVID-19 case and death rates [13,14,15,16,17,18,19]. Since structural racism, such as historical redlining, has driven individuals of minoritized racial groups into racially segregated neighborhoods, neighborhood disparities can contribute to racial and ethnic disparities in health outcomes [20,21,22].

Use of more granular geographic units to assess environmental factors driving COVID-19 disparities can better inform policy changes and outreach strategies to groups facing structural inequities [15]. Although much has been published on racial disparities in COVID-19 outcomes, less is known about the specific neighborhood-level factors contributing to inequities in the burden of COVID-19 among racial and ethnic groups. Thus, we use census tract-level data to investigate neighborhood, social, and built environment factors that may account for COVID-19 case rates among racially minoritized populations in California. We hypothesize that the neighborhood-level contributors to COVID-19 case burden vary across racial and ethnic groups.

Methods

Data

Census tract-level data on cumulative numbers of COVID-19 cases and tests up to January 31, 2021, were obtained from the California Department of Public Health (CDPH). At that time, fewer than 2% of Californians were fully vaccinated; thus, vaccination had likely not yet begun to affect the trajectory of the pandemic. Tests were defined as the cumulative total number of COVID-19 tests reported to CDPH. Cases were defined as the cumulative total number of COVID-19-positive cases reported to CDPH via the California Reportable Disease Information Exchange (CalREDIE) and the Los Angeles and San Diego communicable disease reporting systems.

We used census tract-level data to represent neighborhoods in California. We selected variables based on a conceptual model of COVID-19 infection that includes neighborhood-level race/ethnicity, socioeconomic status, immigration factors, and housing factors. Census tract-level data on sociodemographics of residents (racial and ethnic composition, sex, age, limited English proficiency (LEP), foreign born, recent immigrant, uninsured, average household size, percentage with less than a high school education, and severe overcrowding) were obtained from the American Community Survey 5-year estimates (2015–2019) [23]. For the race and ethnicity variables, White was defined as non-Hispanic White, Black was defined as Hispanic or non-Hispanic Black, Hispanic included all who identified as Hispanic/Latino, and Asian included all Asian ethnicities but not Native Hawaiians or Pacific Islanders. For all races and ethnicities, residents were included if they reported a given race or ethnicity alone or in combination with other races and ethnicities. Each race/ethnicity was used as a separate variable in the model, rather than as one categorical variable. LEP was defined as reporting speaking English less than “very well.” Recent immigrant was defined as being born outside the USA and entering the country after 2010. Severe overcrowding was defined as households with more than 1.51 occupants per room [24]. Population density was derived using United States Census (2010) population counts and defined as the population count per square kilometer [25]. Data on low-income households were from the United States Department of Housing and Urban Development (HUD) [26]. HUD’s Extremely Low-Income (ELI) measure defines poor households as those earning below 30% of the area median income. Each factor was expressed as the proportion of residents in the census tract (except for average household size and population density).
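The threshold-based definitions above can be illustrated with a minimal sketch. The study used tract-level aggregates published by the American Community Survey and HUD rather than household records, so the `Household` record, the `tract_factors` helper, and all numbers below are hypothetical and serve only to show how the 1.51 occupants-per-room and 30%-of-area-median-income cutoffs translate into tract-level proportions.

```python
from dataclasses import dataclass

SEVERE_OVERCROWDING_THRESHOLD = 1.51  # occupants per room, from the definition in the text
ELI_FRACTION_OF_AMI = 0.30            # HUD Extremely Low-Income cutoff from the text


@dataclass
class Household:
    """Hypothetical household record (not part of the study data)."""
    occupants: int
    rooms: int
    income: float  # annual household income


def is_severely_overcrowded(h: Household) -> bool:
    # More than 1.51 occupants per room.
    return h.occupants / h.rooms > SEVERE_OVERCROWDING_THRESHOLD


def is_extremely_low_income(h: Household, area_median_income: float) -> bool:
    # Earning below 30% of the area median income.
    return h.income < ELI_FRACTION_OF_AMI * area_median_income


def tract_factors(households, area_median_income, population, area_km2):
    """Summarize one hypothetical tract as proportions, mean size, and density."""
    n = len(households)
    return {
        "severe_overcrowding": sum(is_severely_overcrowded(h) for h in households) / n,
        "extremely_low_income": sum(
            is_extremely_low_income(h, area_median_income) for h in households
        ) / n,
        "average_household_size": sum(h.occupants for h in households) / n,
        "population_density": population / area_km2,  # residents per square kilometer
    }
```

Note that this sketch computes proportions of households, whereas the text expresses each factor as a proportion of residents; the appropriate denominator depends on the published table being used.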

The neighborhood data analyzed during the current study from the American Community Survey, United States Census, and United States Department of Housing and Urban Development are publicly available and compiled for download at healthatlas.ucsf.edu. The COVID-19 data analyzed during the current study are not publicly available due to privacy agreements with the California Department of Public Health.

Statistical Analyses

To examine the effect of neighborhood-level factors on the number of COVID-19 cases, we fit quasi-Poisson generalized linear models to census tract-level observations, with the natural logarithm of population size as an offset and adjustment for the number of tests and county (to account for differential COVID-19 policies and practices). Because the case data were overdispersed, we used a quasi-likelihood approach (quasi-Poisson) that adjusts count data for overdispersion by including a dispersion parameter [27]. Pearson correlations were used to identify highly correlated variables (> 0.70); one variable from each highly correlated pair was removed from the fully adjusted model. The exponentiated regression coefficient, the rate ratio (RR), is interpreted as the relative change in COVID-19 case count per unit increase in the neighborhood factor. All variables reported as a proportion were divided by 10 so that the estimated effects (i.e., estimated change in the number of cases) can be interpreted per 10% change in the factor; similarly, population density was divided by 10,000 so that the effects can be interpreted per 10,000-unit change in population density.
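As an illustration of the modeling approach described above, the following is a minimal sketch of a quasi-Poisson fit with a log-population offset. It is not the analysis code (the analyses were run in R): it handles a single covariate, omits the adjustment for tests and county, uses a normal approximation for the confidence interval, and the function name `fit_quasi_poisson` and all data values are hypothetical.

```python
import math


def fit_quasi_poisson(cases, x, population, max_iter=50, tol=1e-10):
    """Fit log E[cases] = log(population) + b0 + b1 * x by Newton/IRLS steps.

    Returns (b0, b1, se_b1, dispersion). One covariate only, for illustration.
    """
    n = len(cases)
    offset = [math.log(p) for p in population]
    b0 = math.log(sum(cases) / sum(population))  # start from the crude rate
    b1 = 0.0

    def moments(b0, b1):
        mu = [math.exp(o + b0 + b1 * xi) for o, xi in zip(offset, x)]
        a00 = sum(mu)
        a01 = sum(m * xi for m, xi in zip(mu, x))
        a11 = sum(m * xi * xi for m, xi in zip(mu, x))
        return mu, a00, a01, a11

    for _ in range(max_iter):
        mu, a00, a01, a11 = moments(b0, b1)
        # Score vector X'(y - mu) and information matrix X' diag(mu) X (2 x 2).
        s0 = sum(y - m for y, m in zip(cases, mu))
        s1 = sum((y - m) * xi for y, m, xi in zip(cases, mu, x))
        det = a00 * a11 - a01 * a01
        d0 = (a11 * s0 - a01 * s1) / det
        d1 = (a00 * s1 - a01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    mu, a00, a01, a11 = moments(b0, b1)
    det = a00 * a11 - a01 * a01
    # Dispersion: Pearson chi-square divided by residual degrees of freedom.
    dispersion = sum((y - m) ** 2 / m for y, m in zip(cases, mu)) / (n - 2)
    # Poisson variance of b1 is a00 / det; quasi-Poisson scales it by the dispersion.
    se_b1 = math.sqrt(dispersion * a00 / det)
    return b0, b1, se_b1, dispersion


if __name__ == "__main__":
    # Hypothetical tracts; percent LEP is divided by 10 so b1 is per 10 points.
    pct_lep = [5, 12, 30, 45, 22, 8]
    cases = [210, 260, 420, 600, 330, 200]
    population = [4000, 4200, 3900, 4100, 4000, 3800]
    b0, b1, se, phi = fit_quasi_poisson(cases, [p / 10 for p in pct_lep], population)
    rr = math.exp(b1)
    lo, hi = math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)
    print(f"RR per 10-point increase: {rr:.3f} (95% CI {lo:.3f}-{hi:.3f}), dispersion {phi:.2f}")
```

The point estimates are identical to those of an ordinary Poisson model; the quasi-likelihood adjustment only widens the standard errors by the square root of the estimated dispersion.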

We estimated the relative contribution of each neighborhood factor to racial and ethnic associations with the number of COVID-19 cases. The baseline model included the proportion of Hispanic, Black, and Asian residents as three separate variables and was adjusted for number of tests and county. Covariables were added individually to the baseline model to determine their influence on COVID-19 associations with proportion Hispanic, Black, and Asian, by calculating the percent change in the racial/ethnic rate ratio when the covariable was added to the baseline model. For Hispanic, Black, and Asian separately, covariables were ranked in order of their influence (from most negative change to most positive change). In separate sequential models for Hispanic, Black, and Asian, covariables were added to the baseline model in the order of their influence. Subanalyses were conducted limited to census tracts with a majority (> 50%) of Hispanic, Black, and Asian residents to identify factors that might be unique to these neighborhoods.
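The ranking step described above reduces to a simple calculation, sketched below. The sketch assumes the percent change is computed on the rate ratio scale (the text does not state whether the rate ratio or the log-scale coefficient was used), and the function names and rate ratios are hypothetical.

```python
def percent_change(rr_baseline, rr_adjusted):
    """Percent change in a racial/ethnic rate ratio after adding one covariable."""
    return 100.0 * (rr_adjusted - rr_baseline) / rr_baseline


def rank_by_influence(rr_baseline, rr_with_covariable):
    """Order covariables from most negative to most positive percent change.

    rr_with_covariable maps each covariable name to the racial/ethnic rate
    ratio obtained when that covariable alone is added to the baseline model.
    """
    changes = {
        name: percent_change(rr_baseline, rr)
        for name, rr in rr_with_covariable.items()
    }
    return sorted(changes.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical rate ratios for one racial/ethnic group.
    ranked = rank_by_influence(1.10, {"LEP": 1.078, "uninsured": 1.095, "age65": 1.111})
    for name, change in ranked:
        print(f"{name}: {change:+.1f}%")
```

A negative change means the covariable attenuates the racial/ethnic association, that is, it accounts for part of it; the sequential models then add covariables in the resulting order.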

Geographically weighted regression (GWR) using a quasi-Poisson model was conducted to identify regional differences in the associations identified in the previous model [28]. National Historical Geographic Information System spatial data for California census tracts [29] were joined with California COVID-19 data and sociodemographic data. A quasi-Poisson GWR model with a log link and an offset of the natural log of the total population was fit using all variables of interest (proportion Asian, proportion Black, proportion Hispanic, proportion ≥ 65 years, proportion with limited English proficiency, proportion uninsured, proportion with extremely low income, severe overcrowding, proportion male, average household size, and population density). Data were then exported for mapping.
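The core idea of GWR, fitting the model separately at each location with nearby tracts weighted more heavily, can be sketched as follows. The kernel and the bandwidth type (fixed or adaptive) are not stated in the text, so the fixed-bandwidth Gaussian kernel, the planar coordinates, and the function names here are assumptions; the local model shown is the simplest possible one (intercept only), not the full covariate model.

```python
import math


def gaussian_weights(focal, points, bandwidth):
    """Gaussian kernel weights exp(-0.5 * (d / bandwidth)^2) around a focal point."""
    fx, fy = focal
    weights = []
    for px, py in points:
        d2 = (px - fx) ** 2 + (py - fy) ** 2  # squared planar distance
        weights.append(math.exp(-0.5 * d2 / bandwidth ** 2))
    return weights


def local_case_rate(weights, cases, population):
    """Weighted Poisson estimate of the local rate for an intercept-only model.

    With offset log(population), the weighted score equation gives
    rate = sum(w * cases) / sum(w * population).
    """
    num = sum(w * y for w, y in zip(weights, cases))
    den = sum(w * p for w, p in zip(weights, population))
    return num / den


if __name__ == "__main__":
    # Hypothetical tract centroids (arbitrary planar units), cases, and populations.
    centroids = [(0.0, 0.0), (1.0, 0.5), (4.0, 3.0), (9.0, 7.0)]
    cases = [300, 280, 150, 90]
    population = [4000, 4100, 3900, 4200]
    w = gaussian_weights(centroids[0], centroids, bandwidth=2.0)
    print(f"local rate near tract 0: {local_case_rate(w, cases, population):.4f}")
```

Repeating the weighted fit at every tract centroid, with covariates included, yields one set of local coefficients per tract, which is what the mapped rate ratios represent.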

Quasi-Poisson models, visualizations, data preparation, and GWR were conducted in R (version 4.0.3) using RStudio (version 1.4.1106) with the tidyverse, sf, ggplot2, and spgwr packages. Maps were made in ArcGIS Pro (version 2.7.0).

Results

A total of 3,034,403 COVID-19 cases were reported through January 31, 2021, from 58 counties and 8009 census tracts (out of a total of 8057 California census tracts). The maximum number of cases in a single census tract was 3692, and the median was 302. The mean number of cases within deciles of each census tract variable is reported in Fig. 1. Case numbers decreased with increasing proportions of White or Asian residents and residents ≥ 65 years. Case numbers increased in census tracts with increasing proportions of Hispanic or Black residents, LEP, uninsured, less than high school degree, severe overcrowding, and higher average household size.

Fig. 1 Mean cumulative COVID-19 cases per 100,000 population (up to January 31, 2021) within each decile of variable for all census tracts in California

Baseline and fully adjusted models assessing associations of each neighborhood factor and COVID-19 cases are presented in Table 1. High Pearson correlations were identified between proportion of White and Hispanic (0.77), LEP and foreign born (0.85), and uninsured and not high school graduate (0.72). Variables for proportion White, foreign born, and not high school graduate were thus omitted from the fully adjusted model. In fully adjusted models, neighborhood factors that were independently associated with case rate included racial and ethnic composition, age, limited English proficiency, income, household size, and population density.
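The correlation screen used to select variables for the fully adjusted model can be sketched as follows. The sketch assumes the 0.70 threshold applies to the absolute value of the Pearson correlation, which the text does not state explicitly; the function names and column values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.70):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:  # assumption: threshold is applied to |r|
            pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Hypothetical tract-level proportions for three variables.
    columns = {
        "lep": [0.05, 0.12, 0.30, 0.45, 0.22],
        "foreign_born": [0.10, 0.18, 0.41, 0.52, 0.30],
        "male": [0.49, 0.51, 0.50, 0.48, 0.52],
    }
    for a, b, r in highly_correlated_pairs(columns):
        print(f"{a} vs {b}: r = {r:.2f}")
```

The screen only flags pairs; which member of each pair to drop remains an analytic decision, as with the omission of proportion White, foreign born, and not high school graduate reported above.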

Table 1 Associations between cumulative COVID-19 cases (up to January 31, 2021) and neighborhood factors at the census tract level in California (n = 8009)

We assessed the relative contribution of each neighborhood factor to the association between each racial and ethnic group and COVID-19 cases (Table 2). Results of our sequential modeling (Fig. 2) show that the proportion with LEP in a census tract had the largest influence on the positive association between the proportion of Hispanic residents and COVID-19 cases in a census tract, meaning that LEP explained some of the association (− 2.1% change of estimate for Hispanic with the addition of LEP into the model). This was also true for the proportion of Asian residents (− 1.8% change), but not for the proportion of Black residents (− 0.1% change). None of the assessed variables had a > 1% influence on the association between the proportion of Black residents and COVID-19 cases.

Table 2 Percent change in association between race or ethnicity and cumulative COVID-19 cases (up to January 31, 2021) for each neighborhood factor

Fig. 2 Sequential models for cumulative COVID-19 cases (up to January 31, 2021) for neighborhood factors at the census tract level by race and ethnicity. Baseline model included % Hispanic, % Black, and % Asian, and was adjusted for tests and county. Neighborhood factors were added individually to the baseline model to determine their influence on the association between COVID-19 cases and % Hispanic, % Black, and % Asian. Neighborhood factors were ranked from those with the largest negative change to the racial/ethnic parameter estimate to those with the largest positive change when the factor was added to the baseline model. Sequential models (shown here) added factors one by one to the model in the order of influence

In a subanalysis of the census tracts with a majority of Hispanic residents (2520 census tracts with > 50% Hispanic residents), the effect of LEP on COVID-19 cases was attenuated (Table 3) compared with that across all California census tracts. A 10% increase in residents with LEP was associated with a 1.8% increase in COVID-19 cases (RR 1.018, 95% CI 1.005–1.031), compared with a 4.4% increase (RR 1.044, 95% CI 1.034–1.054) in all California census tracts, after adjusting for other neighborhood factors. In census tracts with a majority of Asian residents (449 census tracts with > 50% Asian residents), a 10% increase in residents with LEP was associated with an 11% increase in COVID-19 cases. In census tracts with a majority of Black residents (72 census tracts with > 50% Black residents), the factor most strongly associated with COVID-19 cases was the proportion of Hispanic residents.

Table 3 Associations between neighborhood factors and cumulative COVID-19 cases up to January 31, 2021, for majority Hispanic, Black, or Asian census tracts

In the GWR, 26 census tracts with a population of zero were excluded from the analysis. Results of the generalized GWR, using an optimized bandwidth of 0.006218417, show that the strength of the rate ratio estimates varies across California (Fig. 3, Supplemental Figure, and Supplemental Table). For percent LEP, the estimate was highest in many urban and suburban areas of the Bay Area, Los Angeles, and San Diego, as well as some other outlying areas.

Fig. 3 Geographically weighted regression model for cumulative COVID-19 cases (up to January 31, 2021) showing rate ratios for limited English proficiency (LEP) in California

Discussion

Results of our analysis using census tract-level data confirm the disproportionate burden of the COVID-19 pandemic in Hispanic and Black neighborhoods in California. Using census tract-level data on all COVID-19 cases in California through January 31, 2021, we observed that a higher proportion of Hispanic and Black residents was significantly associated with a higher COVID-19 case rate. This association persisted even after controlling for age and other neighborhood contextual factors. In addition, in fully adjusted models, we observed that proportion of residents with LEP in a neighborhood has a strong influence on racial and ethnic associations with COVID-19 cases for Hispanic and Asian residents but not Black residents. Associations between LEP and case rate are relatively strong in neighborhoods that are majority Asian and less so in neighborhoods that are majority Hispanic.

Existing studies have established the connection between neighborhood influences and health, specifically how residential segregation and inequalities in resources reinforce each other and lead to inequities in neighborhood physical and social environments and ultimately result in increased risk for adverse health outcomes [30,31,32,33]. Recent studies have investigated the neighborhood-level factors contributing to COVID-19 disparities across racial and ethnic groups [13,14,15, 18, 34,35,36,37,38,39]. However, few of these studies have analyzed subcounty-level data. At the state and county level, researchers have reported worse COVID-19 outcomes for geographic areas with higher proportions of Black and Hispanic residents [13, 14, 18]. However, there is great heterogeneity in neighborhood environments within states and counties, and smaller geographic units allow for a better understanding of specific neighborhood social and built environment factors that may be driving neighborhood disparities in COVID-19 burden. A study by Reitsma et al. analyzed individual COVID-19 case rates stratified at the subcounty Public Use Microdata Area (PUMA) level and reported that Hispanic residents in California are 8.1 times more likely to live in high-exposure-risk households than White residents and are overrepresented in cumulative cases [40]. Another study from San Diego County in California analyzed spatiotemporal spread of COVID-19 at the zip code level and found the strongest associations with zip code-level proportions of Spanish-speaking residents and residents with less than 9th grade education [41]. Our study is one of the few statewide analyses of COVID-19 at the census tract level, and the only one, to our knowledge, to examine this level of geography in California [34, 35].

The novelty of our analytic approach relates to our assessment of the relative contribution of each of the neighborhood social and built environment factors to COVID-19 cases separately for each group defined by race and ethnicity. A particularly notable result was the association between LEP and COVID-19 case burden specific to neighborhoods with higher proportions of Hispanic and Asian residents. Future studies should examine the extent to which lack of English proficiency in the neighborhood environment contributes to exposure risk or impedes public health efforts to control case rates. Ultimately, our results show the importance of expanding culturally and linguistically tailored public health strategies. For neighborhoods with high proportions of Hispanic and Asian residents, more tailored measures, such as translated resources and community-based educators, are needed to reduce the spread of COVID-19. Seto et al. reported similar findings to ours by assessing census tract-level neighborhood factors and COVID-19 testing in King County, Washington. The authors found a positive trend between proportion of linguistic isolation and COVID-19 case number. However, because this characteristic was highly correlated with race and ethnicity, it was dropped from the modeling [34]. Results from a single health system in Seattle, Washington, showed that despite availability of interpreter services, COVID-19 testing was lower and infection was higher in non-English speakers than English-speaking patients [38], suggesting that solutions need to go beyond providing interpreter services at the point of care.

The COVID-19 pandemic story for Black Californians is different from that for Hispanic or Asian Californians. The COVID-19 burden is clearly greater for census tracts with higher proportions of Black residents. However, none of the neighborhood factors included in this analysis accounted for case burden among predominantly Black neighborhoods. Similar to our study, Benitez et al. analyzed zip code-level data across six diverse cities, including San Diego, to assess COVID-19-related racial and ethnic disparities, showing a clear high burden among neighborhoods with a greater proportion of Hispanic and Black residents. After adjustment for neighborhood-level factors, the association between the proportion of Hispanic residents and COVID-19 cases and deaths was attenuated while that for the proportion of Black residents largely persisted [39]. We do not conceptualize this (or any other) ethnoracial inequity in COVID infection as resulting from purported ethnoracial biology or ethnoracial stereotypes around disease vulnerability or health behaviors; previous scholarship has addressed the lack of evidence for these suggestions and situated racial and ethnic COVID disparities within existing frameworks outlining fundamental causes of health inequities [42,43,44,45]. Other structural and institutional factors not available within our study thus may be driving COVID-19 case burdens among neighborhoods with higher proportions of Black residents. It is also possible that anti-Black racism is woven so tightly into American society that its effects are not adequately captured by the common neighborhood-level mediators we assessed, such as insurance status, household size, education, and population density. Addressing the health inequities and societal impacts of anti-Black racism will require significant, sustained, multisector investments and action [46, 47].

This study is ecological, and our interpretations of associations between neighborhood factors and neighborhood COVID-19 case rates are not to be extrapolated to explain individual circumstances. Use of census tract boundaries for analysis requires acknowledgment of the modifiable areal unit problem (MAUP), which states that the drawing of boundaries has direct effects on results and contributes to issues with ecologic analysis [48]. We attempted to adjust for differences in local-level policy by controlling for county; however, regional or city-level policy or other differences may also exist and may be obscured in our analysis. Additionally, because of the dynamic nature of the pandemic and shifting local policies over time, looking at data over an entire time period may miss temporal shifts of relevance to understanding the spread and dynamically changing risk of COVID-19. The COVID-19 case data in this analysis included cases up until January 31, 2021, which minimizes the impact of vaccination status and the Delta and Omicron variants on cases. In addition, neighborhood, racial, and ethnic COVID disparities extend beyond infection rates to COVID-19-related disease severity and mortality [13,14,15,16,17,18,19,20]. These post-infection outcomes are likely strongly impacted by corresponding disparities in infection rates [49], but further exacerbated by structural determinants of overall health (comorbidity burden) and health care [50, 51]. While we appreciate the importance of disparities in post-infection clinical outcomes and the structural and social determinants of these clinical outcomes, they were outside the scope of the present study. For the results of GWR, local coefficients for some census tracts in the resulting maps may not be significant and should be interpreted with caution. Furthermore, this study did not directly assess access to health care, which could be an uncontrolled confounding variable. Nursing homes were also not accounted for, which could have caused census tracts with many nursing homes to have increased COVID-19 cases. Moreover, this study did not have a measure for the proportion of residents employed in essential work in a census tract.

Our results show that neighborhood-level contextual factors appear to drive some racial and ethnic disparities in COVID-19 cases in California. These data should inform ongoing and future strategies for outreach to specific groups facing structural inequities. Future research can help better understand factors underlying disparities in COVID-19 transmission in different neighborhoods. Gathering qualitative information from key stakeholders in affected communities may help point toward the root of the issues as well as potential solutions. In order to protect the health of all, the health of these vulnerable populations needs to be prioritized, and the drivers of these disparities need to be addressed.