1 Introduction

The World Health Organization (WHO) named the seventh member of the coronavirus family, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and the disease that it causes, Coronavirus disease 2019 (COVID-19) [1]. The virus has been highly contagious and the number of infected cases and deaths is still increasing. As a result, after about two years and four months from the first emergence of COVID-19 in China, the number of positive confirmed cases has exceeded 500 million with more than six million deaths globally, so more than seven million patients and 140,000 deaths have been the share of Iran [2]. In developing countries, especially where there is an extraordinary congested population, a speeded urbanization, and scarcity of health services, restraining COVID-19 spread is complicated [3].

Since the outbreak of COVID-19, research has been conducted to investigate the clinical and epidemiological features of the disease. In this regard, Iranian researchers have also conducted relevant studies, e.g., in Tehran [4] a study showed the COVID-19 infection rate was twice as common in males as in females, with a case fatality rate (CFR) of 8%, and a co-morbidity rate of 15.9% in those older 60 years. In another study [5], 8% of overall mortality, 63% of the male gender, and a significant association between intensive care unit (ICU) admission and death rate were reported. Although preceding studies have shown that most COVID-19 cases have a promising prognosis [6], patients with severe co-morbidities would experience more severe health consequences [7].

Spatiotemporal analysis of disease incidence using the Geographic Information Systems (GISs) can lead to more knowledge and a deeper understanding of current and future trends of disease patterns, and also reveal the best-required interventions [8, 9]. Space and time elements are among the most important and influential features in spreading diseases [10, 11]. The COVID-19 burden is not distributed randomly but fluctuates by location in the different areas [12]. A geographical analysis of COVID-19 infection can be driven by diffusion patterns through large population hubs to surrounding communities [13, 14]. We assume that GIS would help the numerous countries fight more effectively against this pandemic by finding high-risk areas, and it can play a leading role in managing the COVID-19 epidemic.

While an international body of evidence consistently demonstrates how spatial and epidemiological factors impact the distribution of COVID-19 incidence, the understanding of the issue in a congested and touristy city is limited and has not been profiled well, especially in the Middle East region. In Iran, several epidemiological studies on COVID-19 have been conducted, however, little attention has been paid to the spatial correlation of adjacent areas and exploration of hotspot clusters. Therefore, given the necessity of understanding the COVID-19 geographical distribution to the provision of health care services and planning of the residents’ immunization, this study aims to explore and examine the spatial distribution of COVID-19 in the first outbreak in Mashhad, the second populous city in Iran. It seems this would be considered a necessary step in the current health challenge.

2 Materials and methods

2.1 Study area

This study was carried on in the capital city of Khorasan-Razavi Province (KRP), Mashhad, which is located in northeastern Iran and considering a 1.78% growth from 2019, has an estimated population of 3,208,000 [15]. Mashhad is known as the second-most congested city in Iran and has the highest number of tourists every year since it is the largest religious center in Iran (Fig. 1).

Fig. 1
figure 1

Study area location with COVID-19 spatial distribution map (point density map_ km2), Mashhad, Feb 4–Apr 13, 2020

2.2 Data sources

Data were collected from two tertiary hospitals in Mashhad in May 2020 (through the first wave of the pandemic, only these two hospitals admitted COVID-19 patients). Hence, all admitted patients with ICD-10-CM (International Classification of Diseases-Clinical Modification) code U07.1 were extracted from medical records from Feb 4 until Apr 13, 2020 (n = 1535). They all had a positive laboratory test using the real-time reverse transcription-polymerase chain reaction (RT-PCR) test and had already been hospitalized due to the severe condition caused by the COVID-19 infection.

2.3 Data analysis

2.3.1 Statistical analysis

Since the data did not follow a normal distribution, continuous variables were represented as the median and interquartile range (IQR) and then compared by the Mann–Whitney U test. Categorical variables were expressed as numbers (%) and compared once between males and females and once between survivors and non-survivors by χ2 test or Fisher’s exact. By stratifying the age into nine categories (10-year intervals), the age-specific rates of infection and fatality were presented separately by each age group. In calculating the cumulative incidence rate and mortality rate (MR) (both per 10,000), the base population of each age group was included based on the latest Mashhad population and housing census data (2016) [16]. The CFR also refers to the proportion of the number of deceased cases divided by the number of confirmed cases and reflects the severity of the condition [17]. A multivariable logistic regression model was built to assess the influence of age and sex on mortality. Accordingly, adjusted odds ratios (aOR) with a 95% confidence interval (CI) were calculated and reported. All tests were two-sided, and α less than 0.05 was considered statistically significant.

2.3.2 Spatial analysis

The residential addresses of COVID-19 patients were geocoded using the Google My Maps software. We calculated point densities of COVID-19 cases during the study period, according to Mashhad’s population and housing censuses data (2016) [16]. In ArcMap, point density calculates the density of point features (patients) in a defined neighborhood [18]. Then, a cartographic map was depicted using the natural break classification method with three classes. Natural break classification is a data classification method that looks for reducing the within-class variance and maximizing the between-class variance [19]. The core concept behind the hierarchical Bayesian model used in the current study is that each area-specific cumulative incidence is based on pooling information from neighboring areas (prior distribution). A previous study has reported methodological aspects of the Bayesian analysis applied to geographical clustering [20]. After calculating the empirical Bayesian rate (EBR) for every neighborhood, the local Moran’s I statistic [21] was used to quantify spatial autocorrelation of COVID-19 frequency at the city neighborhood level. This test calculates a z-score and p-value to determine whether the apparent similarity (spatial clustering of either high or low values) or dissimilarity (presence of spatial outliers) is more pronounced than predicted in a random distribution. The null hypothesis suggests that the COVID-19 cases are randomly distributed throughout the sample region. An extremely positive z-score for a feature indicates that the surrounding features have similar values (either high or low). However, a low negative z-score for a feature suggests a statistically significant spatial data outlier [21]. This analysis can spot four types of clusters: high-high (HH) and low-low (LL) areas specify clusters of COVID-19 incidence (clusters of high values and low values), on the other hand, the high-low (HL) and low-high (LH) areas designate the outliers, which means each value (high or low) is surrounded by its opposite value.

2.3.3 Software and significance level

We used a 95% CI, and all the clusters and outliers found in this study were significant at this CI. ArcGIS software, version 10.7 (ESRI, Redlands, CA, USA) and GeoDa, version 1.20, were used for creating the descriptive maps and spatial analyses. Statistical analyses were accomplished using R software, version 4.1.3 (R Foundation for Statistical Computing), and Microsoft Excel 2019.

3 Results and discussion

We reviewed the spatial and epidemiological patterns of COVID-19 among hospitalized cases in Mashhad from Feb 4 until Apr 13, 2020. Most HH clusters were identified in the central business district (CBD) of the city. Moreover, the highest cumulative incidence of COVID-19 was attributed to the first half of March 2020, so in the CBD areas (including the holy shrine of Imam Reza, one of the main visitor attractions in Iran), HH clusters were more developed. Males had a higher share of hospitalized patients and a higher share of COVID-19-related deaths. Older age, male sex, and longer LOS were significantly associated with higher odds of death.

3.1 Statistical analysis part

Considering 1535 confirmed cases of COVID-19 included in this study, 62% (n = 951) and 38% (n = 584) were male and female, respectively. Table 1 shows a comparison between sex groups and, in addition, reveals a comparison based on patients’ health outcomes including the non-survivors (n = 356, 23.2%) and survivors (n = 1179, 76.8%). As shown in Fig. 2, covering about 70 days of hospital admissions of COVID-19 patients, the sudden and unexpected start of the first pandemic peak in the city of Mashhad is quite evident, so each hospital admission peak was accompanied by a mortality peak; the previous one about three times bigger than the next one.

Table 1 Baseline characteristics of 1535 COVID-19 patients disaggregated by sex and disease outcomes
Fig. 2
figure 2

The epidemic trends of COVID-19 patients’ admission and mortality across the study period, Mashhad, Feb 4–Apr 13, 2020

The patients’ median age was 62 years (IQR: 47–73). One percent of cases (n = 16) were younger than 20 years, and 54.8% (n = 842) were older than 60 years. Compared to males, females were marginally older (one year on average), and they had no significant differences in terms of age groups and length of stay (LOS), which is the span from hospital entry to death or discharge. More males (61.9%, n = 951) than females (38.1%, n = 584) were admitted to these two tertiary hospitals because of SARS-CoV-2 infection. In fact, among the 356 deceased cases of COVID-19, the ratio of male to female was 70% (n = 249) to 30% (n = 107). Since LOS has always been an important and determining factor in predicting patients’ prognosis [22], we found a substantial disparity between those who died (9 days, IQR: 4–23) and those who survived (6 days, IQR: 4–11) in the median LOS. As mentioned earlier, there were two types of outcomes in our results. Table 1 shows that nearly 76.8% of patients were discharged alive in these two tertiary hospitals, and the remainder died (23.2%). The non-survivors median age (70, IQR: 60–82) was significantly (11 years) older than the survivors (59, IQR: 44–70), (p < 0.0001). As shown in Fig. 3, the cumulative incidence rate and MR were close to zero in the first three age groups. The highest infection rate with COVID-19 was found in the elderly over 80 years (65.6 /10,000, 95% CI: 57.8–74.6). MR and CFR increased with age so that in each of the three upper age groups (60–69, 70–79, and 80 ≤), there was a jump in fatality rates so that the highest MR (30.6/10,000, 95% CI: 25.3–36.9) and CFR (46.6%, 95% CI: 40.6–53.4) belonged to the oldest patients (80 ≤). In the current study, 77.2% (275/356) of all mortalities caused by SARS-CoV-2 infection occurred in people over 59 years.

Fig. 3
figure 3

Distribution of cumulative incidence and fatality rates of COVID-19 stratified in 9 age groups, Mashhad, Feb 4–Apr 13, 2020

It was found that older age (≥ 60 years), male sex, and longer LOS were associated with an increased risk of death (Table 2). The odds of death for patients over 60 years of age was more than three times higher (OR: 3.66, 95% CI: 2.79–4.81) than for those under 60 years. The OR of dying for male patients was significantly higher than for females (OR: 1.58, 95% CI: 1.23–2.04). The regression model also revealed that the odds of death increased slightly with an increase in the patients’ length of stay (OR: 1.02, IQR: 1.01–1.02).

Table 2 Odds of death due to COVID-19, based on the affective factors

Older patients in the study population were more infected than younger individuals and higher mortality rates prevailed in the three older age classes (60 ≤). This could be secondary to aging-related immune senescence, more underlying risk factors (co-morbid medical conditions), and potentially other factors that are not identified in older ages relative to younger ages. As a result, the number of affected patients admitted to hospitals was smaller with age falling. Of the three younger classes, those who were usually less engaging with social activities in society often had lower infections. Moreover, the natural power of the immune systems in the cited groups has influencing power too. This tendency may be secondary due to more presence and activities of males in the community than females. Hence, the number of non-survivals were higher in male compared to females. LOS was higher in non-survivals. Taken together death episodes were higher in infected persons over 60 years of age; this may be seen as a higher LOS predictor at the time of entry.

3.2 Spatial analysis part

As shown in Fig. 1, a point density map showed the distribution of COVID-19 patients across the city of Mashhad, and it is clear that the downtown, the patients’ density was much higher than in other regions. Figure 4 shows the spatial distribution of patients with COVID-19 in the region. From 31 March to 13 April, the number of COVID-19 patients decreased.

Fig. 4
figure 4

Spatial distribution of COVID-19 patients, Mashhad, Feb 4–Apr 13, 2020, a the first week of the study period (Feb 4–Feb 17), b the second week of the study period (Feb 18–Mar 2), c the third week of the study period (Mar 3–Mar 16), d the fourth week of the study period (Mar 17–Mar 30), e the fifth week of the study period (Mar 31–Apr 13), and f the total study period

Figure 5 shows the spatial clusters of COVID-19 patients in the study area. The figure reveals that the CBD area had the most significant HH clusters throughout much of the period.

Fig. 5
figure 5

Spatial clusters of COVID-19 patients, Mashhad, Feb 4–Apr 13, 2020, a the first week of the study period (Feb 4–Feb 17), b the second week of the study period (Feb 18–Mar 2), c the third week of the study period (Mar 3–Mar 16), d the fourth week of the study period (Mar 17–Mar 30), e the fifth week of the study period (Mar 31–Apr 13), and f the total study period

The four categories shown in Fig. 5 correspond to the four quadrants in the Moran scatter plots, as shown in Fig. 6. If nearby or neighboring areas are more identical, this is understood as positive spatial autocorrelation. Negative autocorrelation describes patterns that vary from neighboring areas. For example, the first scatter plot (which shows the neighborhood level COVID-19 for 04–17 Feb) has a Moran’s value of -0.017 that should be interpreted as a spatially random pattern. Note that the EBR values have been standardized and are specified in standard deviational units (the mean is zero and the standard deviation is 1). Similarly, the EBR spatial lag was calculated for these standardized values (Fig. 6).

Fig. 6
figure 6

Moran’s scatter plots of COVID-19 empirical Bayesian rate (EBR) at the neighborhood level, Mashhad, Feb 4–Apr 13, 2020, a the first week of the study period (Feb 4–Feb 17), b the second week of the study period (Feb 18–Mar 2), c the third week of the study period (Mar 3–Mar 16), d the fourth week of the study period (Mar 17–Mar 30), e the fifth week of the study period (Mar 31–Apr 13), and f the total study period

With over 20 million visitors annually, Mashhad is known as the most popular tourist destination in Iran [23]. The spatial distribution figure scientifically portrayed that the distribution of SARS-CoV-2 infection was nearly random in this city in the first outbreak. However, it spread quickly to downtown (CBD), which is often more congested with people. In the central part of the city, there is a holy shrine with numerous visitors and multiple roofed malls. These results may be consistent with the hypothesis that emphasizes the role of social distances and closed-air spaces in the spread of respiratory infections [24]. This may be a central concept for preventing contamination through health care provider programs, so the avoidance of congregations in indoor spaces might play a crucial role in subsiding the incidence of COVID-19 cases.

Another reason that can be raised for the hotspots formed in the CBD area and around the holy shrine is the deprived neighborhood. As previously shown, the neighborhood can change physical and mental health status by impressing the available resources of that neighborhood or impressing behaviors that impact the spread of the SARS-CoV-2 pathogen [25, 26]. In these deprived neighborhoods, one of the main characteristics that might have accelerated the transmission of the COVID-19 virus is presumably to be congested living spaces. A robust relationship between neighborhood deprivation grade, indoor crowding, and COVID-19 incidence has recently been observed in the USA [27]. Furthermore, there is a positive relationship between public transportation use and the spread of the SARS-CoV-2 virus [28]. Therefore, lower-income people who do not own a private car and are dependent on public transportation are more expected to be at further risk of getting the SARS-CoV-2 virus in these areas.

Due to the large number of travelers during the Iranian New Year holidays in Mashhad, the coincidence of increasing case incidence with these holidays is quite evident (March). Lockdown of cities can be a great solution to control and prevent the spread of the disease. In Mashhad, the lack of public awareness promoted the rapid spread of COVID-19, and the rate of hospital admission and mortality increased dramatically. However, raising public wisdom about the seriousness of the COVID-19 threat, imposing travel restrictions, and closing schools and universities as influential factors in reducing the burden of COVID-19 [29], the first peak of the outbreak was eventually largely controlled. However, the most important factor in this control was the annual Iranian Nowruz two-week holiday. Starting on March 20, it turned out to be a valuable excuse for a national lockdown, e.g., already on March 14, one week before the holiday, the city’s largest commercialized capital area was closed aiming to slow down the spread of the virus. This illustrates that a timely strategy can be an efficient approach. It should be noted that the most effective way to deal with this highly contagious disease is universal and integrated vaccination. Vaccination against the disease has been able to greatly cut down the hospitalization and fatality rates of COVID-19 [30].

In a summary, due to the quick transmission of the COVID-19 infection among the human population and causing substantial morbidity and mortality, it is advisable to detect the high priority areas for the immunization against COVID-19, and the action plans tailored to each region should be taken into account. For this purpose, one of the most proper methods for timely identification and tailored planning of crises such as viral pandemics is the spatial analysis of diseases.

We acknowledge the limitations of our study. Due to the complex nature of the COVID-19 pandemic, short-term analysis policies may fail. One of the other limitations was a lack of data on underlying medical comorbidities and how this influenced mortality. In addition, only using the residence address was utilized as a risk factor for infection and death in the current study. However, we do not know where people have traveled within Mashhad or outsides and were infected. Hence, it is hardly possible to attribute the acquisition of infection to the neighborhood of residence.

4 Conclusion

Although each GIS-based map has local benefit in showing directional distribution and trend of SARS-CoV-2 in the same society, it will help epidemiologists to better understand the virus’s behaviors. Based on this study, the identified high-risk areas in COVID-19 related resource allocation and priority setting should be considered. Targeted COVID-19 immunization programs and implementation of preventative interventions are warranted.