Introduction

Coronavirus disease 2019 (COVID-19), which first appeared in Wuhan, is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Lu et al. 2020). It has spread throughout China and to many countries and regions worldwide, and it is another serious public health emergency following the outbreak of SARS. As of December 22, 2020, a total of 76,250,431 cumulative cases and 1,699,230 deaths had been reported globally, including 95,916 cumulative cases and 4772 deaths in China (https://covid19.who.int/table). The disease has been confirmed to be highly contagious from human to human, and human activity is the main cause of its spread (Coccia 2020b). In addition, recent studies suggest that short-term exposure to environmental pollutants accelerates the spread of SARS-CoV-2 (Coccia 2020a; Coccia 2021a). Most infected patients present with high fever, and some have dyspnea, with chest radiographs showing invasive lesions in both lungs (Chan et al. 2020; Huang et al. 2020). The COVID-19 epidemic also involves a continuously shifting epidemic focus, driven by both growth over time and spread across space.

The occurrence of COVID-19 shows regular patterns in space, in time, and in space-time. These cluster patterns are the focus of our research, which aims to establish the risk factors involved in the spread of COVID-19 so that the disease can be effectively controlled. Scan statistics are well suited to the multiple testing problem that arises when spatial areas and/or time intervals are evaluated simultaneously. The scan statistics software SaTScan has been widely applied to temporal, spatial, and spatiotemporal analyses of many infectious diseases, including schistosomiasis (Yu-Wan et al. 2019); malaria (Yan-Feng et al. 2019); dengue fever (Lai et al. 2018); hand, foot, and mouth disease (Tian et al. 2018); and hemorrhagic fever with renal syndrome (Wu et al. 2011).

In previous studies, spatiotemporal analysis methods have been widely used in the epidemiology of infectious diseases (Fang et al. 2009; Fosgate et al. 2002). Spatiotemporal analysis methods can reveal the time trends and spatiotemporal patterns of diseases from different spatial scales and can more intuitively clarify the epidemiological characteristics and epidemiological laws of infectious diseases. However, to the best of our knowledge, no specific research has been conducted on the spatiotemporal pattern of COVID-19 in mainland China. A better comprehension of the spatiotemporal distribution of COVID-19 would benefit the identification of the cities, populations, and times at highest risk, which would aid in the effective implementation of relevant preventative measures.

Many studies have shown that meteorological factors are closely related to the occurrence and spread of infectious diseases (Irfan et al. 2021; Islam et al. 2021). For example, temperature and humidity are related to the spread of severe acute respiratory syndrome (SARS) (Rahimi et al. 2021; Rosario et al. 2020; Sarkodie and Owusu 2020); some scholars report that influenza spreads more readily in cooler, drier air (Haque and Rahman 2020; Shen et al. 2021), and others report that high wind speeds, which cause atmospheric instability, seem to reduce the number of infected people and promote the dispersion of air pollution (Mario Coccia 2021; Coccia 2021b). In addition, air pollution is widely considered an important factor in increasing the risk of adverse health outcomes (Du et al. 2021; Tian et al. 2021), such as ischemic heart disease, chronic obstructive pulmonary disease, and respiratory infection (Manisalidis et al. 2020; Tian et al. 2021; Wu et al. 2016). Emerging evidence suggests potential links between exposure to polluted air and the COVID-19 epidemic (Chen et al. 2021; Chen et al. 2020; Konstantinoudis et al. 2021). Research has indicated that airborne transmission is highly virulent (Srivastava 2021), that particulate matter can act as a carrier that spreads viruses widely (Vlachogiannis et al. 2021; Yao et al. 2020), and that respiratory viruses transmitted efficiently through particulate matter can remain stable for longer periods of time.

In this study, geographic information system (GIS)-based analyses were conducted to characterize the spatiotemporal patterns of COVID-19 in mainland China using surveillance data from January 15 to February 25, 2020. The aims were to identify spatiotemporal clusters of COVID-19 cases at the city level and to explore the influences of short-term exposure to environmental factors (PM2.5, PM10, temperature, and humidity) and of imported risk on the cumulative numbers of confirmed COVID-19 cases in clustered and non-clustered regions. This work is significant for understanding the epidemiological distribution of COVID-19, providing clues to the causes of changes in the type of epidemic areas, delineating key prevention areas, and promoting the prevention and control of COVID-19.

Materials and methods

Sample and data

Records of COVID-19 cases between January 15 and February 25, 2020, were obtained from the official websites of local health commissions. Data on particulate matter ≤ 2.5 μm (PM2.5), particulate matter ≤ 10 μm (PM10), average temperature, and relative humidity were obtained from the AQI platform website (https://www.aqistudy.cn) and the China Meteorological Data Sharing Service System (http://cdc.cma.gov.cn). The demographic data of each city during the study period were obtained from the statistical yearbook of each city. The population migration indexes of other regions in mainland China were obtained from Baidu Migration (https://qianxi.baidu.com). In our research, all COVID-19 cases were confirmed by the Chinese Center for Disease Control and Prevention according to the diagnostic criteria for COVID-19 (Wu et al. 2020a).

Measures of variables

The calculation of the COVID-19 imported risk index assumes that the migrant population conforms to the population distribution in the source region, and the risk of disease is consistent with the incidence rate of the source region. The imported risk of COVID-19 caused by population migration can be expressed by the scale of migration, the number of patients in the source region, and the population. The calculation formula for the imported risk index is as follows:

$$ {\mathrm{Risk}}_{in,t}=\sum \left(\frac{{\mathrm{case}}_{i,t}}{{\mathrm{pop}}_i}\times {iMI}_{i,t}\times 1000\right) $$

where $\mathrm{Risk}_{in,t}$ represents the imported risk index on day t, $\mathrm{case}_{i,t}$ represents the number of new cases in source city i on day t, $\mathrm{pop}_i$ represents the population of city i, and $iMI_{i,t}$ represents the population size index of immigration from city i on day t. The sum runs over all source cities i.
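
As an illustration of the formula above, the following minimal Python sketch computes the imported risk index for one destination city on one day. The class name `SourceCity`, the function name `imported_risk`, and all numbers are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class SourceCity:
    new_cases: int          # case_{i,t}: new cases in source city i on day t
    population: int         # pop_i: population of source city i
    migration_index: float  # iMI_{i,t}: immigration size index from city i on day t


def imported_risk(sources):
    """Sum over source cities of (cases / population) * migration index * 1000."""
    return sum(
        s.new_cases / s.population * s.migration_index * 1000
        for s in sources
    )


# Hypothetical inputs, for illustration only.
sources = [
    SourceCity(new_cases=200, population=10_000_000, migration_index=5.0),
    SourceCity(new_cases=10, population=5_000_000, migration_index=2.0),
]
risk = imported_risk(sources)
print(round(risk, 3))
```

Each term is the source city's daily incidence per 1000 inhabitants weighted by its migration index, so a source with many new cases but little outflow contributes little to the index.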

Data analysis procedure and tests

Our study used SaTScan 10.0 software to conduct scan statistics on the spatial, temporal, and space-time distributions of COVID-19. Using ArcGIS 10.0 software, the number of COVID-19 cases in 366 cities in mainland China was geocoded by administrative code and matched with city-level polygons.

The analysis was performed in three stages. First, Moran’s I index was used to evaluate whether global spatial autocorrelation existed among the 366 cities. A positive value of Moran’s I implies a clustered distribution, whereas a negative value indicates a scattered distribution. A value close to 0 indicates a random distribution in space. Second, the discrete Poisson probability model was used to conduct Kulldorff’s space-time scan statistics to explore the temporal, spatial, and spatiotemporal clusters of COVID-19. The principle of spatiotemporal scan statistics is to use thousands of overlapping cylinders to detect spatiotemporal clusters, with each scanning window regarded as a possible spatiotemporal cluster. The base of the cylinder represents the geographic area, and the height of the cylinder represents the possible duration of the outbreak. The log likelihood ratio (LLR) and relative risk of each scanned cylinder are calculated under the Poisson model, and statistical significance is evaluated by Monte Carlo simulation. The null hypothesis of the Poisson model is that the incidence of COVID-19 is the same inside and outside the cylinder. When the scan reveals that the incidence inside the cylinder is significantly greater than that outside the cylinder, the null hypothesis can be rejected (Song and Kulldorff 2003). The spatiotemporal cluster with the largest LLR in the scan result is the most likely cluster, whereas the other clusters are secondary clusters (Desjardins et al. 2020). Considering the population density and the size of each city, the maximum cluster size was set to 20% of the population at risk. Finally, we conducted logistic regression modeling to examine the associations of the environmental factors and imported risk with the space-time clusters of COVID-19 cases. The logistic regression used forward stepwise (conditional) variable selection, with an entry level of 0.05 and a removal level of 0.10, taking particulate matter concentration and the imported risk index as independent variables and spatiotemporal clustering as the dependent variable.
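
The likelihood ratio and relative risk computed for each scanning window can be sketched as follows. This is a minimal illustration of the standard discrete Poisson formulation without covariate adjustment, not SaTScan's implementation; the function names and the example numbers are assumptions, and the Monte Carlo significance step is omitted.

```python
import math


def expected_cases(total_cases, pop_inside, pop_total):
    """Expected cases in a window under the null of equal incidence everywhere."""
    return total_cases * pop_inside / pop_total


def poisson_llr(c, e, total_cases):
    """Log likelihood ratio for a window with c observed and e expected cases.

    Only windows with more cases than expected count as high-risk clusters,
    so the statistic is 0 when c <= e.
    """
    if c <= e:
        return 0.0
    outside = total_cases - c
    llr = c * math.log(c / e)
    if outside > 0:
        llr += outside * math.log(outside / (total_cases - e))
    return llr


def relative_risk(c, e, total_cases):
    """Risk inside the window divided by risk outside it."""
    outside = total_cases - c
    if outside == 0:
        return math.inf
    return (c / e) / (outside / (total_cases - e))


# Hypothetical window: 400 of 1000 cases fall in 10% of the population.
e = expected_cases(total_cases=1000, pop_inside=100_000, pop_total=1_000_000)
llr = poisson_llr(400, e, 1000)
rr = relative_risk(400, e, 1000)
print(round(llr, 2), round(rr, 2))
```

In a full scan, the window with the largest LLR would be the most likely cluster, and its P value would come from repeating the scan on data sets simulated under the null hypothesis.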

Results and discussion

In mainland China, a total of 77,658 COVID-19 cases were reported from January 15 to February 25, 2020. The average incidence rate of COVID-19 varied from 0.37/100,000 to 120.31/100,000 at the city level. The scanning results indicated that the distribution of COVID-19 cases differed geographically across China, with the central region showing the highest overall risk (Fig. 1).

Fig. 1 Spatial distribution map of cumulative cases of COVID-19 in mainland China from January 15 to February 25, 2020

We created a Moran scatter plot and evaluated the significance of the average incidence of COVID-19 through global spatial autocorrelation analysis. Moran’s I values ranged from −0.004 to 0.453. From January 23 to February 25, 2020, the global Moran’s I of the cumulative number of COVID-19 cases in 366 cities in mainland China passed the significance test (all P < 0.05), indicating that the spatial distribution was clustered rather than random. However, the spatial autocorrelation was not significant from January 15 to January 22, 2020 (Table 1).
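
For reference, the global Moran's I statistic can be computed as in the minimal Python sketch below. The binary adjacency weights and the four-city toy example are assumptions made for illustration; the weight matrix used in the study is not specified here, and significance testing is omitted.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed in n areas.

    weights[i][j] is the spatial weight between areas i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    den = sum(d * d for d in dev)
    return (n / w_sum) * num / den


# Four hypothetical cities along a line: 0-1, 1-2, and 2-3 are neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([10, 9, 1, 0], chain)     # similar values are adjacent
alternating = morans_i([10, 0, 10, 0], chain)  # dissimilar values are adjacent
print(round(clustered, 3), round(alternating, 3))
```

The first toy pattern gives a positive value and the second a negative one, matching the interpretation of clustered versus scattered distributions.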

Table 1 Spatial autocorrelation analysis for the incidence of COVID-19 in mainland China from January 15 to February 25, 2020

The spatial analysis revealed the spatial distribution characteristics of COVID-19 and identified one most likely cluster and three secondary clusters among the 366 cities in mainland China (Table 2 and Fig. 2). Twenty cities were included in total, most of them located in Hubei Province. The most likely cluster included ten cities, with an overall relative risk within the cluster of 96.87 (LLR = 124,907.02, P < 0.001). The first secondary cluster, comprising eight cities, was adjacent to the most likely cluster and had a relative risk of 4.51 (LLR = 3616.12, P < 0.001). The remaining two spatial clusters were located in Xinyu and Wenzhou, with relative risks of 2.49 and 1.24, respectively (P < 0.05). In our study, most of the confirmed cases were in Wuhan, and the outbreak coincided with the Spring Festival holidays, an annual period of large-scale migration (Yang et al. 2020a, 2020b). As a result, the number of newly confirmed cases in China increased rapidly within a short period of time. In addition, the cities with more cases were all adjacent to Wuhan, and there was no obvious spatial clustering in cities far away from Wuhan. This finding suggests that the population outflow from Wuhan had a great impact on the epidemic situation in the surrounding areas.

Table 2 Space clusters of COVID-19 cases in mainland China from January 15 to February 25, 2020

Fig. 2 Spatial distribution of clusters of COVID-19 cases in mainland China from January 15 to February 25, 2020

Temporal cluster analysis of the reported cases indicated that the distribution of COVID-19 was not random in time. One most likely cluster was identified, with an overall relative risk within the cluster of 3.10 (LLR = 8629.09, P < 0.001). No secondary clusters were identified (Table 3). The identified cluster accounted for 84.3% of the total number of cases, which indirectly indicates that the scanning method has high sensitivity.

Table 3 Temporal clusters of COVID-19 cases in mainland China from January 15 to February 25, 2020

Spatiotemporal cluster analysis showed that COVID-19 cases were significantly clustered in space-time during the study period and identified one most likely cluster and seven secondary clusters (Table 4 and Fig. 3). The most likely cluster contained 10 cities in Hubei Province, with a high-risk period from January 29 to February 18, 2020; it accounted for 47.83% of the total cumulative cases during the study period (LLR = 123,564.50, RR = 125.17, P < 0.001). The seven secondary clusters included 83 cities, which were mainly distributed in the central, eastern, southern, southeastern, and northeastern regions of mainland China. The relative risk within these clusters was statistically significant, indicating a nonrandom pattern of disease distribution (P < 0.001).

Table 4 Spatiotemporal clusters of COVID-19 cases in mainland China from January 15 to February 25, 2020

Fig. 3 Spatiotemporal distribution of clusters of COVID-19 cases in mainland China from January 15 to February 25, 2020

The logistic regression model showed that environmental factors (PM2.5, PM10, temperature, and humidity) and the imported risk index were significantly associated with the space-time clusters of COVID-19 cases in mainland China (P < 0.001); the model explained 18.3% of the variation (pseudo-R2 = 0.183) (Table 5). Temperature and PM10 were negatively associated with the space-time clusters (PM10: OR, 0.959; 95% CI, 0.952–0.965; temperature: OR, 0.987; 95% CI, 0.978–0.995), whereas the imported risk index, humidity, and PM2.5 were positively associated with them (imported risk index: OR, 2.443; 95% CI, 2.240–2.665; PM2.5: OR, 1.035; 95% CI, 1.028–1.043; humidity: OR, 1.025; 95% CI, 1.019–1.031).
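
To make the relationship between a logistic regression coefficient, its odds ratio, and its confidence interval concrete, the sketch below assumes Wald-type intervals, which are symmetric on the log-odds scale. How the intervals in Table 5 were actually computed is not stated, so this is an assumption, and the function names are illustrative. The example recovers the coefficient and standard error implied by the reported interval for the imported risk index.

```python
import math


def or_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=1.96):
    """Invert a Wald CI: the coefficient is the midpoint on the log scale."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported interval for the imported risk index: 95% CI 2.240 to 2.665.
beta, se = beta_se_from_ci(2.240, 2.665)
odds_ratio, lower, upper = or_with_ci(beta, se)
print(round(beta, 3), round(se, 3), round(odds_ratio, 3))
```

Under this assumption, the odds ratio implied by the interval agrees with the reported value of 2.443 to within rounding, meaning that each one-unit increase in the imported risk index multiplies the odds of belonging to a space-time cluster by about 2.4.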

Table 5 The association between the space-time clusters of COVID-19 cases and environmental factors and imported risk index

Kulldorff’s retrospective scan statistics are among the most effective methods for exploring geographical and temporal distributions using routinely collected data (Alemu et al. 2013; Rao et al. 2017). This method dynamically supplements purely spatial statistical methods and is used for the prediction and detection of disease outbreaks. Prevention strategies that focus on the regions of highest risk could help enhance the effectiveness of public health interventions (Wu et al. 2011). Disease clusters have been detected with this method worldwide (Ge et al. 2016; Wang et al. 2016; Zhang et al. 2014; Zhao et al. 2013). The appropriate selection of the spatial and temporal scanning windows plays an important role in model identification. The temporal scanning window is usually left at its default setting of 50% of the study period, but some studies have questioned whether this is reasonable (Wang et al. 2013). A simulation study found that if the window covers 50% of the population, it may be so large that it includes low-risk areas, which can cause a high false positive rate (Ma et al. 2016). Conversely, if the scanning window is too small, it is difficult to discover the real high-risk regions, potentially resulting in a high false negative rate (Rao et al. 2017). To identify cluster areas with less overlap, Ge and Zhang used the irregular scan statistic to choose an appropriate window (Ge et al. 2016). We therefore used these findings as a reference in our research. The maximum temporal window was set to 20% of the study period, and the maximum spatial window covered 20% of the population at risk. As a result, the identified high-risk clusters did not overlap.
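
The effect of the maximum spatial window can be illustrated with a short Python sketch that enumerates circular candidate windows around each city centroid and stops growing a window once it would exceed a given share of the total population. This is a simplified illustration under assumed planar coordinates and hypothetical populations, not SaTScan's algorithm; the function name `candidate_windows` is illustrative.

```python
import math


def candidate_windows(cities, max_pop_fraction=0.20):
    """Enumerate circular windows as lists of city names.

    cities: list of (name, x, y, population) tuples.
    Each window is centred on one city and grows outward by distance
    until adding the next city would exceed the population cap.
    """
    cap = max_pop_fraction * sum(c[3] for c in cities)
    windows = []
    for center in cities:
        by_distance = sorted(
            cities,
            key=lambda c: math.hypot(c[1] - center[1], c[2] - center[2]),
        )
        members, pop = [], 0
        for name, _, _, population in by_distance:
            if pop + population > cap:
                break
            pop += population
            members.append(name)
            windows.append(list(members))
    return windows


# Six hypothetical cities of equal size spaced along a line.
cities = [(f"city{i}", float(i), 0.0, 100) for i in range(6)]
windows20 = candidate_windows(cities, max_pop_fraction=0.20)
windows50 = candidate_windows(cities, max_pop_fraction=0.50)
print(max(len(w) for w in windows20), max(len(w) for w in windows50))
```

In this toy example, a 20% cap restricts every window to a single city, whereas a 50% cap allows windows spanning half of the cities, which shows how a large cap can pull low-risk areas into a reported cluster.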

Although some studies in China have explored the distribution of COVID-19 cases, they were limited to a single province or region, or were national studies conducted only at the provincial level rather than the city level (Yang et al. 2020a). A national analysis at the city level can both identify the clustered provinces and locate the clustering in specific cities within a province, and it allows the impact of particulate matter and imported risk on the clustered areas to be analyzed further. In the current study, we investigated the temporal, spatial, and spatiotemporal distributions of COVID-19 cases and explored the potential effects of imported risk and particulate matter on spatiotemporal clustering. We also used spatial scan statistics to identify regions with a high prevalence of COVID-19 and their clustering patterns. Our study indicated that, during the entire period from January 15 to February 25, 2020, the geographic, temporal, and spatiotemporal distribution patterns of COVID-19 cases in mainland China were not random.

Compared with separate spatial and temporal clustering analyses, spatiotemporal scan statistics make full use of the temporal and spatial information in infectious disease surveillance data and improve the timeliness and effectiveness of cluster detection (Tang et al. 2017). This analysis indicated that the COVID-19 epidemic focus had shifted from Hubei Province to other provinces of China. In addition to high population mobility, which provided favorable opportunities for the rapid spread of the virus, a number of observations point to the role of environmental factors in the survival and transmission of the novel coronavirus (Bashir et al. 2020; Fattorini and Regoli 2020; Li et al. 2020). We observed that imported risk, humidity, and PM2.5 were the main risk factors for space-time clustering. Conversely, temperature and PM10 were protective factors against the spatiotemporal clustering of COVID-19. Our results are partially consistent with the findings of Jiang et al. (2020) and Li et al. (2020), and we believe the contrasting effects of PM2.5 and PM10 are related to particle diameter. Studies have shown that particulate matter with a diameter of no more than 5 μm can reach type II alveolar cells, which harbor the SARS-CoV-2 cell entry receptor (ACE2) (Copat et al. 2020). Although all particulate matter could facilitate viral attachment, PM2.5 can deliver SARS-CoV-2 into the target cells in the alveoli whereas PM10 cannot (Lee et al. 2014). This might explain why, in our research, PM2.5 is positively correlated with the clustering of COVID-19, whereas PM10 is negatively correlated. In our research, temperature is a protective factor for COVID-19, and high temperature inhibits the clustering of confirmed cases of COVID-19 in Chinese cities. Some epidemiological studies also show that SARS-CoV-2 is more stable and spreads more efficiently under low-temperature conditions (Haque and Rahman 2020). Humidity, as a risk factor, increases the likelihood of spatiotemporal clustering of COVID-19. This conclusion is similar to the results of Jiang et al. (2020). However, many studies have reported inconsistent and conflicting results regarding the impact of meteorological factors on COVID-19 (Zheng et al. 2021). Therefore, further verification is necessary.

Conclusion

In the absence of imported cases from abroad, our paper identified significant spatial, temporal, and spatiotemporal clusters of COVID-19 cases at the city level in mainland China. In addition, the impacts of environmental factors and imported risk on spatiotemporal clustering were revealed. Hubei Province and its surrounding cities were the most likely clustering areas, reflecting population migration and the impact of climate factors. Imported risk, humidity, and PM2.5 are positively associated with the spatiotemporal clustering of COVID-19, whereas temperature and PM10 are negatively associated with it.

The cluster analysis in this study is ecological in design. The quality of the data may vary considerably over time and between places, and the ecological fallacy is generally regarded as a major weakness of such studies. Therefore, these results cannot be extrapolated to the individual level. Another weakness lies in the limitations of some of the traditional statistical methods used in our research. The occurrence and development of COVID-19 are complicated, and the observed indicators are often correlated. In the logistic regression model, this appears as collinearity between variables, which may cause some important indicators to be screened out. However, given the versatility, fitting accuracy, and wide applicability of logistic regression, we consider the results still convincing. Cluster analysis also helps to visualize epidemiological data and to detect and evaluate hot spots or clusters. The results may improve disease surveillance and help allocate the resources of control plans effectively.

In addition, the COVID-19 pandemic and the mutation of the virus have caused many socio-economic problems (Coccia 2021c). Because SARS-CoV-2 is a new pathogen that is highly infectious and difficult to detect, humans generally lack immunity to it. Therefore, the primary response strategy in mainland China has been physical isolation, with nucleic acid testing used to detect sources of infection early and to separate them for isolation and centralized treatment. These measures not only prevent the virus from spreading from identified patients to uninfected people, but also keep virus shed by asymptomatic carriers who have not yet tested positive from reaching the next susceptible person, so that it is inactivated in the environment. After one or at most two incubation periods, the virus should no longer be present in the area. Second, with the successful development of vaccines, every effort should be made to complete the full course of COVID-19 vaccination in as large a population and in as short a time as possible, so as to achieve herd immunity. Moreover, because cases continue to appear in China, great uncertainties remain in the late stage of the epidemic. The incidence of COVID-19 should be closely monitored on a continuous basis, and targeted prevention and control strategies should be developed to effectively control the COVID-19 epidemic.