1 Introduction

The first case of pneumonia of unknown cause was identified near a seafood market in Wuhan, the capital city of Hubei province, China, on December 8, 2019. Several clusters of patients with similar pneumonia were reported through late December 2019. The pneumonia was later found to be caused by a novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Zhu et al. 2020), and the disease was named Coronavirus Disease 2019 (COVID-19) by the World Health Organization (WHO). Although the seafood market was closed on January 1, 2020, a massive outflow of travelers during the Chinese Spring Festival travel rush (Chunyun) in mid-January led to the rapid spread of COVID-19 throughout China and to other countries. The first confirmed case in China outside Wuhan was reported in Shenzhen on January 19 (Li et al. 2020). As of April 5, 2020, over 1.2 million confirmed cases had been reported in at least 200 countries or territories.

Two fundamental strategies have been adopted globally: one focuses on mitigating, but not necessarily stopping, the spread of the virus; the other relies on more stringent measures to suppress the spread and reverse its growth trajectory. While most Western countries initially implemented the former strategy, many of them (including most European countries and the USA) have since shifted towards the more stringent suppression strategy, and other countries such as China, Singapore, and South Korea adopted the latter strategy from the beginning. In particular, China has rolled out one of the most stringent public health strategies, involving city lockdowns and mandatory quarantines that ban or restrict traffic since January 23, measures encouraging social distancing since January 28, and centralized treatment and isolation since February 2.

Using data on confirmed COVID-19 case counts in China from January 19 to February 29, this study estimates how the number of daily newly confirmed COVID-19 cases in a city is influenced by the number of new cases in the same city, in nearby cities, and in Wuhan during the preceding 2 weeks. By comparing the estimates before and after February 2, we examine whether the comprehensive set of policies at the national scale slowed the spread of COVID-19. In addition, exploiting the closed management of communities and the family outdoor restriction policies that were gradually rolled out across different cities, we estimate the impacts of social distancing measures on the transmission rate.

As COVID-19 has evolved into a global pandemic and mitigation strategies face growing pressure to flatten the epidemic curve, more and more nations are considering stringent suppression measures. Therefore, examining the factors that influence the transmission of COVID-19 and the effectiveness of the large-scale mandatory quarantine and social distancing measures in China not only adds to our understanding of the containment of COVID-19 but also provides insights for the future prevention of similar infectious diseases.

In a linear regression of the current number of new cases on past numbers of new cases, the unobserved determinants of new infections may be serially correlated for two reasons. First, the number of people infected by a disease usually first increases, reaches a peak, and then drops. Second, there are persistent unobservable variables, such as clusters that generate large numbers of infections, people's living habits, and government policies. Serial correlation in the errors gives rise to correlation between the lagged numbers of cases and the error term, rendering the ordinary least squares (OLS) estimator biased. Combining insights from Adda (2016), existing knowledge of the incubation period of COVID-19 (World Health Organization 2020b), and data on weather conditions that affect the transmission rates of COVID-19 (Lowen and Steel 2014; Wang et al. 2020b), we construct instrumental variables for the number of new COVID-19 cases during the preceding 2 weeks. We assume that weather characteristics in the preceding third and fourth weeks do not directly affect the number of new COVID-19 cases once the number of new cases and weather conditions in the preceding first and second weeks are controlled for. Under this assumption, our estimated impacts have causal interpretations and reflect population transmission rates.

Meanwhile, we estimate the mediating effects of socioeconomic factors on the transmission of COVID-19 in China. These factors include population flow out of Wuhan, the distance between cities, GDP per capita, the number of doctors, and contemporaneous weather conditions. Using data on real-time travel intensity between cities that have recently become available for research, we examine whether population flows from the origin of the COVID-19 outbreak, a major city and an important transportation hub in central China, can explain the spread of the virus. Recognizing the urgency of forestalling widespread community transmission in areas that had not yet seen many infections, many Chinese cities implemented public health measures encouraging social distancing in late January. We also examine the impacts of these measures on curtailing the spread of the virus.

We find that transmission rates were lower in February than in January, and cities outside Hubei province had lower transmission rates. Preventing the transmission rates in non-Hubei cities from increasing to the level observed in Hubei in late January produced the largest reduction in the number of infections. Apart from the policies implemented nationwide, the additional social distancing policies imposed in some cities in late January further helped reduce the number of infections. By mid-February, the spread of the virus was contained in China. While many socioeconomic factors moderated the spread of the virus, the actual population flow from the source posed a higher risk to destinations than other factors such as geographic proximity and similarity in economic conditions.

Our analysis contributes to the existing literature in three respects. First, our analysis is connected to the economics and epidemiological literature on the determinants of the spread of infectious diseases and the prevention of such spread. Existing studies find that reductions in population flow (Zhan et al. 2020; Zhang et al. 2020; Fang et al. 2020) and in interpersonal contact through holiday school closings (Adda 2016), reactive school closures (Litvinova et al. 2019), and public transportation strikes (Godzinski and Suarez Castillo 2019), as well as strategic targeting of travelers from high-incidence locations (Milusheva 2017) and paid sick leave that keeps contagious workers at home (Barmby and Larguem 2009; Pichler and Ziebarth 2017), can mitigate disease transmission. In addition, studies show that viruses spread faster during economic booms (Adda 2016), that increases in employment are associated with an increased incidence of influenza (Markowitz et al. 2019), and that growth in trade can significantly increase the spread of influenza (Adda 2016) and HIV (Oster 2012). Vaccination (Maurer 2009; White 2019) and sunlight exposure (Slusky and Zeckhauser 2018) have also been found effective in reducing the spread of influenza.

Second, our paper adds to the epidemiological studies on the basic reproduction number (R0) of COVID-19, i.e., the average number of cases directly generated by one case in a population where all individuals are susceptible to infection. Given the short time that has elapsed since the beginning of the COVID-19 outbreak, research is urgently needed to assess the dynamics of transmission and the implications for how the outbreak will evolve (Wu et al. 2020b, 2020c). Liu et al. (2020) identify 12 studies that estimated the basic reproduction number in the wide range of 1.4 to 6.5 (with a mean of 3.28 and a median of 2.79) for Wuhan, Hubei, China, or overseas during January 1 through January 28, 2020. Our R0 estimate relies on spatially disaggregated data over an extended period (until February 29, 2020) to mitigate potential biases, and the instrumental variable approach we use isolates the causal effect of virus transmission and imposes fewer restrictions on the relationship between the unobserved determinants of new cases and the number of cases in the past. Considering simultaneously a more comprehensive set of factors that may influence virus spread, we find that one case generates 2.992 more cases within 2 weeks (1.876 if cities in Hubei province are excluded) in the sub-sample from January 19 to February 1. In the sub-sample from February 2 to February 29, the transmission rates fall to 1.243 (0.614 excluding Hubei province). Our estimate of R0 for late January 2020, the period that overlaps with existing studies, falls well within the range of estimates in the emerging COVID-19 literature (Liu et al. 2020).

Third, our study contributes to the assessment of public health measures aimed at reducing virus transmission and mortality. Through a set of policy simulations, we report initial evidence on the number of infections avoided through the end of February 2020 in cities outside Hubei province. Specifically, the stringent health policies at the national and provincial levels reduced the transmission rate and resulted in 1,408,479 (95% CI, 815,585 to 2,001,373) fewer infections and potentially 56,339 fewer deaths. In contrast, the effects of the Wuhan lockdown and of local non-pharmaceutical interventions (NPIs) are considerably smaller. The Wuhan lockdown, closed management of communities, and family outdoor restrictions avoided 31,071 (95% CI, 8296 to 53,845), 3803 (95% CI, 1142 to 6465), and 2703 (95% CI, 654 to 4751) cases, respectively, and may have averted 1,243, 152, and 108 deaths, respectively. With additional assumptions about quantities such as the value of a statistical life and lost productive time, these estimates may provide the basis for more rigorous cost-benefit analyses of the relevant public health measures.
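The death figures above line up with a simple proportional rule. As a back-of-envelope check (not the paper's stated computation, since the fatality assumption behind these numbers is not given in this section), multiplying each count of avoided infections by a case fatality ratio of 4% and rounding reproduces all four reported death counts. The 4% value is inferred from the reported figures and is an assumption of this sketch.

```python
# Back-of-envelope check: deaths avoided = infections avoided x case fatality ratio (CFR).
# ASSUMPTION: the 4% CFR is inferred from the figures quoted in the text,
# not taken from the paper's own stated assumptions.
CFR = 0.04

avoided_infections = {
    "national and provincial policies": 1_408_479,
    "Wuhan lockdown": 31_071,
    "closed management of communities": 3_803,
    "family outdoor restrictions": 2_703,
}


def implied_deaths(infections: int, cfr: float = CFR) -> int:
    """Round infections * cfr to the nearest whole number of deaths."""
    return round(infections * cfr)


deaths = {name: implied_deaths(n) for name, n in avoided_infections.items()}

if __name__ == "__main__":
    for name, d in deaths.items():
        print(f"{name}: {d:,} deaths avoided")
```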

This paper is organized as follows. Section 2 introduces the empirical model. Section 3 discusses our data and the construction of key variables. Section 4 presents the results. Section 5 documents the public health measures implemented in China, whose impacts are quantified in a series of counterfactual exercises. Section 6 concludes. The Appendix contains additional details on the instrumental variables, data quality, and the computation of counterfactuals.

2 Empirical model

Our analysis sample includes 304 prefecture-level cities in China. We exclude Wuhan, the capital city of Hubei province, from our analysis for two reasons. First, the epidemic patterns in Wuhan differ significantly from those in other cities. Some confirmed cases in Wuhan contracted the virus through direct exposure to the Huanan Seafood Wholesale Market, the most probable origin of the virus, whereas infections in other cities arise from human-to-human transmission. Second, until early January, COVID-19 cases were still perceived as pneumonia caused by a previously unknown virus, and Wuhan's health care system became overwhelmed as the number of new confirmed cases increased exponentially from mid-January onward. This may have caused severe delays and measurement errors in the number of cases reported in Wuhan and, to a lesser extent, in other cities in Hubei province. To alleviate this concern, we also conduct analyses excluding all cities in Hubei province from our sample.

To model the spread of the virus, we consider within-city spread and between-city transmissions simultaneously (Adda 2016). Our starting point is

$$ y_{ct}=\sum\limits_{s=1}^{14}\alpha_{\text{within},s}y_{c,t-s}+\sum\limits_{s=1}^{14}\alpha_{\text{between},s}\sum\limits_{r\neq c}d_{cr}^{-1}y_{r,t-s}+\sum\limits_{s=1}^{14}\rho_{s}z_{t-s}+x_{ct}\upbeta+\epsilon_{ct}, $$

where \(c\) is a city other than Wuhan, and \(y_{ct}\) is the number of new confirmed cases of COVID-19 in city \(c\) on date \(t\). Regarding between-city transmission, \(d_{cr}\) is the log of the distance between cities \(c\) and \(r\), and \({\sum }_{r\neq c}d_{cr}^{-1}y_{rt}\) is the inverse-log-distance weighted sum of new infections in other cities. Because the COVID-19 epidemic originated in one city (Wuhan) and most of the early cases outside Wuhan can be traced to previous contact with persons from Wuhan, we also include the number of new confirmed cases in Wuhan (\(z_{t}\)) to model how the virus spread from its source to other cities. We include lags of \(y_{ct}\), \(y_{rt}\), and \(z_{t}\) of up to 14 days, based on estimates of the durations of the infectious period and the incubation period in the literature. \(x_{ct}\) includes contemporaneous weather controls and city and day fixed effects, and \(\epsilon_{ct}\) is the error term. Standard errors are clustered by province.

To make the coefficients easier to interpret, we assume that the transmission dynamics (\(\alpha_{\text{within},s}\), \(\alpha_{\text{between},s}\), \(\rho_{s}\)) are the same within \(s=1,\dots,7\) and within \(s=8,\dots,14\), respectively, but can differ across the two weeks. Specifically, we take weekly averages of the lagged \(y_{ct}\), \(y_{rt}\), and \(z_{t}\): \(\bar{y}_{ct}^{\tau}=\frac{1}{7}\sum_{s=1}^{7}y_{c,t-7(\tau-1)-s}\), \(\bar{y}_{rt}^{\tau}=\frac{1}{7}\sum_{s=1}^{7}y_{r,t-7(\tau-1)-s}\), and \(\bar{z}_{t}^{\tau}=\frac{1}{7}\sum_{s=1}^{7}z_{t-7(\tau-1)-s}\), where \(\tau\in\{1,2\}\) denotes the preceding first or second week. Our main model is

$$ y_{ct}=\sum\limits_{\tau=1}^{2}\alpha_{\text{within},\tau}\bar{y}_{ct}^{\tau}+\sum\limits_{\tau=1}^{2}\alpha_{\text{between},\tau}\sum\limits_{r\neq c}d_{cr}^{-1}\bar{y}_{rt}^{\tau}+\sum\limits_{\tau=1}^{2}\rho_{\tau}\bar{z}_{t}^{\tau}+x_{ct}\upbeta+\epsilon_{ct}.\quad\textbf{Model A} \tag{1} $$

We also consider more parsimonious model specifications, such as the model that only considers within-city transmissions,

$$ y_{ct}=\sum\limits_{\tau=1}^{2}\alpha_{\text{within},\tau}\bar{y}_{ct}^{\tau}+x_{ct}\upbeta+\epsilon_{ct}, \tag{2} $$

and the model in which the time-lagged variables are averages over the preceding 2 weeks,

$$ y_{ct}=\alpha_{\text{within}}\frac{1}{14}\sum\limits_{s=1}^{14}y_{c,t-s}+\alpha_{\text{between}}\frac{1}{14}\sum\limits_{s=1}^{14} \sum\limits_{r\neq c}d_{cr}^{-1}y_{r,t-s}+\rho\frac{1}{14}\sum\limits_{s=1}^{14}z_{t-s}+x_{ct}\upbeta+\epsilon_{ct}.\quad\textbf{Model B} $$
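As a minimal sketch of the lag construction, assuming a city's daily new cases are stored in a plain Python list indexed by day (a hypothetical layout), the weekly averages \(\bar{y}^{\tau}\) used in model A and the 14-day average used in model B can be computed as follows. The toy series is made up for illustration.

```python
def weekly_lag_average(y, t, tau):
    """Week-tau lag average (1/7) * sum_{s=1}^{7} y[t - 7*(tau - 1) - s].

    tau = 1 covers days t-7..t-1; tau = 2 covers days t-14..t-8.
    """
    end = t - 7 * (tau - 1)   # exclusive slice bound: last day used is end - 1
    start = end - 7
    if start < 0:
        raise ValueError("not enough history for this lag")
    return sum(y[start:end]) / 7


def two_week_lag_average(y, t):
    """Model B regressor: (1/14) * sum_{s=1}^{14} y[t - s]."""
    if t - 14 < 0:
        raise ValueError("not enough history for this lag")
    return sum(y[t - 14:t]) / 14


# Hypothetical daily series: y[d] = d new cases on day d.
y = list(range(21))
week1 = weekly_lag_average(y, t=20, tau=1)  # days 13..19 -> 16.0
week2 = weekly_lag_average(y, t=20, tau=2)  # days 6..12  -> 9.0
both = two_week_lag_average(y, t=20)        # days 6..19  -> 12.5
```

Because the 14-day average is exactly the mean of the two weekly averages, model B is the special case of model A in which the week-1 and week-2 coefficients are restricted to be equal (each equal to half of model B's coefficient).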

There are several reasons why \(\bar {y}_{ct}^{\tau }\), \(\bar {y}_{rt}^{\tau }\), and \(\bar {z}_{t}^{\tau }\) may be correlated with the error term \(\epsilon_{ct}\). The unobserved determinants of new infections, such as the preparedness of local residents and governments, are likely correlated over time, which causes correlation between the error term and the lagged dependent variables. As noted by the World Health Organization (2020b), most cases that were locally generated outside Hubei occurred in households or clusters. Large clusters that give rise to many cases within a short period may still be compatible with a generally low rate of community transmission, especially when measures such as social distancing are implemented, so OLS estimates may partly capture persistent cluster dynamics rather than transmission in the population. Therefore, we estimate the coefficients by two-stage least squares to obtain consistent estimates of the transmission rates in the population.

In Eq. 2, the instrumental variables include the averages of daily maximum temperature, total precipitation, average wind speed, and the interaction between precipitation and wind speed for city \(c\) in the preceding third and fourth weeks. The selection of weather characteristics as instruments is discussed in detail in Section 3.2, and the timeline of key variables is displayed in Fig. 1. The primary identifying assumption is that weather conditions more than 2 weeks earlier do not affect the likelihood that a person susceptible to the virus contracts the disease, conditional on weather conditions and the number of infectious people within the 2-week window. They do, however, affect the number of other persons who have become infectious within the 2-week window, because those persons may have contracted the virus more than 2 weeks earlier. Such weather variables, which are plausibly exogenous to the error term and affect the spread of the virus, have been used by Adda (2016) to instrument flu infections.
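To illustrate why two-stage least squares is needed, the sketch below simulates a stylized, single cross-section version of the problem (not the paper's panel data or estimation code): a persistent unobserved shock, standing in for a cluster, raises both the lagged and the current case counts, while a simulated "weather in weeks 3–4" variable shifts only the lagged counts. OLS is then biased upward, and textbook 2SLS recovers the true coefficient of 1. All variable names and parameter values are hypothetical, and the function returns point estimates only, because second-stage OLS standard errors are not valid for 2SLS and the paper additionally clusters by province.

```python
import numpy as np


def two_stage_least_squares(y, x_endog, instruments, exog):
    """Point estimates from textbook 2SLS (no standard errors).

    Stage 1: regress the endogenous regressor on instruments + exogenous controls.
    Stage 2: regress y on the stage-1 fitted values + exogenous controls.
    """
    W = np.column_stack([instruments, exog])
    gamma, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
    x_hat = W @ gamma
    X2 = np.column_stack([x_hat, exog])
    beta, *_ = np.linalg.lstsq(X2, y, rcond=None)
    return beta  # beta[0] is the coefficient on the endogenous regressor


rng = np.random.default_rng(0)
n = 20_000
alpha_true = 1.0

weather_weeks_3_4 = rng.normal(size=n)  # simulated instrument
cluster_shock = rng.normal(size=n)      # persistent unobservable in both equations
lagged_cases = 0.8 * weather_weeks_3_4 + cluster_shock + rng.normal(size=n)
cases = alpha_true * lagged_cases + cluster_shock + rng.normal(size=n)

const = np.ones(n)
ols, *_ = np.linalg.lstsq(np.column_stack([lagged_cases, const]), cases, rcond=None)
iv = two_stage_least_squares(cases, lagged_cases, weather_weeks_3_4, const)

print(f"OLS: {ols[0]:.3f}  2SLS: {iv[0]:.3f}  truth: {alpha_true}")
```

With these parameters, the OLS slope should land near 1.38 (its bias equals the covariance between the shock and the lagged regressor divided by the regressor's variance, 1/2.64), while the 2SLS estimate should be close to 1.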

Fig. 1 Timeline of key variables

Another objective of this paper is to quantify the effect of various socioeconomic factors in mediating the transmission rates of the virus, which may help identify potential behavioral and socioeconomic risk factors for infection. For within-city transmission, we consider the effects of local public health measures (see Section 5 for details) and the mediating effects of population density, level of economic development, number of doctors, and environmental factors such as temperature, wind, and precipitation. For between-city transmission, apart from proximity measures based on geographic distance, we also consider similarity in population density and in the level of economic development. To measure the spread of the virus from Wuhan, we also include the number of people traveling from Wuhan. The full empirical model is as follows:

$$ \begin{array}{@{}rcl@{}} y_{ct}&= & \sum\limits_{\tau=1}^{2}{\sum}_{k=1}^{K_{\text{within}}}\alpha_{\text{within},\tau}^{k}\bar{h}_{ct}^{k\tau}\bar{y}_{ct}^{\tau}+\sum\limits_{\tau=1}^{2}{\sum}_{k=1}^{K_{\text{between}}}{\sum}_{r\neq c}\alpha_{\text{between},\tau}^{k}\bar{m}_{crt}^{k\tau}\bar{y}_{rt}^{\tau}+\sum\limits_{\tau=1}^{2}{\sum}_{k=1}^{K_{\text{Wuhan}}}\rho_{\tau}^{k}\bar{m}_{c,\text{Wuhan},t}^{k\tau}\bar{z}_{t}^{\tau}\\ && + x_{ct}\upbeta+\epsilon_{ct}, \end{array} \tag{3} $$

where \(\bar {h}_{ct}^{k\tau }\) includes dummies for local public health measures and the mediating factors for local transmission, and \(\bar {m}_{crt}^{k\tau }\) and \(\bar {m}_{c,\text {Wuhan},t}^{k\tau }\) are the mediating factors for between-city transmission and for imported cases from Wuhan, respectively.
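A minimal sketch of how the mediated regressors in Eq. 3 could be assembled for the within-city term, assuming the mediators for each city-day are stored row by row in a matrix: each column of \(\bar{h}_{ct}^{k\tau}\) is multiplied by the lagged case average \(\bar{y}_{ct}^{\tau}\), giving one regressor per mediator \(k\). Whether a column of ones (a baseline transmission rate) is among the \(\bar{h}_{ct}^{k\tau}\) is not stated here; the example includes one as an assumption, and all values are hypothetical.

```python
import numpy as np


def mediated_regressors(y_bar, mediators):
    """Return the columns h^k * y_bar, one per mediator k (within-city term of Eq. 3)."""
    y_bar = np.asarray(y_bar, dtype=float)
    mediators = np.asarray(mediators, dtype=float)
    if mediators.ndim == 1:
        mediators = mediators[:, None]
    # Broadcast the lagged case average across the K mediator columns.
    return mediators * y_bar[:, None]


# Two hypothetical city-days. Columns (ASSUMED for illustration):
# [constant, local policy dummy, population density (rescaled)]
y_bar_week1 = [2.0, 5.0]
h_week1 = [[1.0, 1.0, 0.5],
           [1.0, 0.0, 1.2]]
X_within = mediated_regressors(y_bar_week1, h_week1)
```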

3 Data

3.1 Variables

January 19, 2020, is the first day on which COVID-19 cases were reported outside Wuhan, so we collect the daily number of new COVID-19 cases for 305 cities from January 19 to February 29. All these data are reported by 32 provincial-level Health Commissions in China. Figure 2 shows the time patterns of daily confirmed new cases in Wuhan, in Hubei province outside Wuhan, and in non-Hubei provinces of mainland China. Because Hubei province started to include clinically diagnosed cases in new confirmed cases on February 12, there is a spike in the number of new cases in Wuhan and other cities in Hubei province on that day (Fig. 2). The common effects of such changes in case definitions on other cities can be absorbed by time fixed effects. As robustness checks, we re-estimate models A and B without the cities in Hubei province. In addition, since the city-level number of clinically diagnosed cases was reported for February 12, 13, and 14, we recalculate the daily number of new cases for these 3 days by removing the clinically diagnosed cases from our data and re-estimate models A and B. Our main findings still hold (Appendix B).

Fig. 2 Number of daily new confirmed cases of COVID-19 in mainland China

Regarding the explanatory variables, we calculate the average daily number of new COVID-19 cases in the preceding first and second weeks for each city on each day. To estimate the impacts of new cases in other cities, we first calculate the geographic distance between a city and every other city using the latitudes and longitudes of the city centroids, and then calculate the weighted sum of the number of new COVID-19 cases in all other cities, using the inverse of the log distance between a city and each of the other cities as the weight.
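The inverse-log-distance weighted sum can be sketched as below. The text does not say which distance formula or unit is used; this sketch assumes great-circle (haversine) distance in kilometres between centroids, and the city names, coordinates, and case counts are hypothetical. The weight \(1/\log d\) is only positive for distances above 1 km, which the code checks.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inverse_log_distance_sum(city, centroids, cases):
    """sum over r != city of cases[r] / log(distance(city, r)), distance in km."""
    lat_c, lon_c = centroids[city]
    total = 0.0
    for other, (lat_r, lon_r) in centroids.items():
        if other == city:
            continue  # the own city enters through the within-city term instead
        d = haversine_km(lat_c, lon_c, lat_r, lon_r)
        if d <= 1.0:
            raise ValueError("1/log(d) is not a positive weight for d <= 1 km")
        total += cases[other] / math.log(d)
    return total


# Hypothetical centroids (degrees) and lagged case averages
centroids = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (0.0, 2.0)}
cases = {"A": 50.0, "B": 10.0, "C": 20.0}
between_A = inverse_log_distance_sum("A", centroids, cases)
```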

Since the COVID-19 outbreak started in Wuhan, we also calculate the weighted number of new COVID-19 cases in Wuhan using the inverse of log distance as the weight. Furthermore, to explore the mediating impact of population flow from Wuhan, we collect the daily population flow index from Baidu, which proxies for the total intensity of migration from Wuhan to other cities. Figure 3 plots the Baidu index of population flow out of Wuhan and compares its values in 2020 with those in 2019. We then multiply the flow index by the share of the flow that a destination city receives (Fig. 4) to construct a measure of the population flow from Wuhan to each destination city. Other mediating variables include population density, GDP per capita, and the number of doctors at the city level, which we collect from the most recent China City Statistical Yearbook. Table 1 presents the summary statistics of these variables. On average, GDP per capita and population density are higher in cities outside Hubei province than in cities in Hubei, and cities outside Hubei also have more doctors.

Fig. 3 Baidu index of population flow from Wuhan

Fig. 4 Destination shares in population flow from Wuhan

Table 1 Summary statistics

We rely on meteorological data to construct instrumental variables for the endogenous variables. The National Oceanic and Atmospheric Administration (NOAA) provides daily average, maximum, and minimum temperatures, air pressure, average and maximum wind speeds, precipitation, snowfall amount, and dew point for 362 weather stations in China. To merge the meteorological variables with the number of new COVID-19 cases, we first calculate weather variables for each city on each day from December 2019 to February 2020 from station-level weather records using inverse distance weighting. Specifically, for each city, we draw a circle with a radius of 100 km around the city's centroid and calculate the weighted average of each daily weather variable over the stations within the circle, using the inverse of the distance between the city's centroid and each station as the weight. Second, we match the daily weather variables to the number of new COVID-19 cases by city name and date.
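The station-to-city step can be sketched as follows, assuming station distances are great-circle distances in kilometres and that each call handles one weather variable on one day. What the paper does for a city with no station inside the 100-km circle is not described in this section, so the sketch simply returns None in that case; the station coordinates and values are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def city_weather_idw(centroid, stations, radius_km=100.0):
    """Inverse-distance weighted mean of station values within radius_km of the centroid.

    stations: iterable of (lat, lon, value) for one variable on one day.
    Returns None when no station lies within the radius (placeholder behaviour).
    """
    num = den = 0.0
    for lat, lon, value in stations:
        d = haversine_km(centroid[0], centroid[1], lat, lon)
        if d > radius_km:
            continue  # outside the circle: ignored
        if d == 0.0:
            return value  # station on the centroid: 1/d is undefined, use it directly
        num += value / d
        den += 1.0 / d
    return num / den if den > 0 else None


# Hypothetical stations: about 55.6 km, 27.8 km, and 222 km from the centroid (0, 0)
stations = [(0.0, 0.5, 10.0), (0.0, 0.25, 20.0), (0.0, 2.0, 100.0)]
temp = city_weather_idw((0.0, 0.0), stations)
```

Here the third station lies outside the circle, and the nearer station gets twice the weight of the farther one, so the result is (10 + 2 × 20) / 3 ≈ 16.67.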

3.2 Selection of instrumental variables

The transmission rate of COVID-19 may be affected by many environmental factors. Human-to-human transmission of COVID-19 occurs mostly through droplets and contact (National Health Commission of the PRC 2020). Weather conditions such as rainfall, wind speed, and temperature may shape infections through their influence on social activities and virus transmission. For instance, increased precipitation results in higher humidity, which may weaken virus transmission (Lowen and Steel 2014). The virus may survive longer at lower temperatures (Wang et al. 2020b; Puhani 2020). Higher wind speed, and therefore better-ventilated air, may decrease virus transmission. In addition, increased rainfall and lower temperatures may also reduce social activities. Newly confirmed COVID-19 cases typically arise from infections contracted within the preceding 2 weeks (e.g., World Health Organization 2020b). The extent of human-to-human transmission is determined by the number of people who have already contracted the virus and by environmental conditions within the following 2 weeks. Conditional on the number of people who are infectious and on environmental conditions in the preceding first and second weeks, it is plausible that weather conditions further in the past, i.e., in the preceding third and fourth weeks, do not directly affect the number of current new cases. Based on the existing literature, we select as instrumental variables the daily maximum temperature, precipitation, wind speed, and the interaction between precipitation and wind speed.

We then regress the endogenous variables on the instrumental variables, contemporaneous weather controls, and city, date, and city-by-week fixed effects. Table 2 shows that F-tests reject the null hypothesis that the coefficients on the instrumental variables are jointly zero in all first-stage regressions, suggesting that, overall, the selected instrumental variables are not weak. The coefficients of the first-stage regressions are reported in Table 9 in the appendix.

Table 2 First stage results
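For intuition about the first-stage test described above, the sketch below computes the conventional F statistic for the joint null that all instrument coefficients are zero, by comparing first-stage residual sums of squares with and without the instruments. It assumes homoskedastic, independent errors; the statistics in Table 2 may be computed differently (for example, with province-clustered errors), which this sketch does not attempt. The data are simulated: one endogenous variable driven by two of four "weather" instruments, and one unrelated to all of them.

```python
import numpy as np


def first_stage_f(x_endog, instruments, controls):
    """Conventional F statistic for H0: all instrument coefficients are zero.

    Assumes homoskedastic, independent errors and full-rank designs.
    """
    x_endog = np.asarray(x_endog, dtype=float)
    W_r = np.column_stack([controls])               # restricted: controls only
    W_u = np.column_stack([instruments, controls])  # unrestricted: add instruments

    def rss(W):
        coef, *_ = np.linalg.lstsq(W, x_endog, rcond=None)
        resid = x_endog - W @ coef
        return float(resid @ resid)

    n, k = W_u.shape
    q = k - W_r.shape[1]  # number of restrictions = number of instruments
    return ((rss(W_r) - rss(W_u)) / q) / (rss(W_u) / (n - k))


rng = np.random.default_rng(1)
n = 5_000
Z = rng.normal(size=(n, 4))       # four simulated weather instruments
const = np.ones(n)
strong = Z @ np.array([0.5, 0.3, 0.0, 0.0]) + rng.normal(size=n)
irrelevant = rng.normal(size=n)   # unrelated to Z by construction

print(first_stage_f(strong, Z, const), first_stage_f(irrelevant, Z, const))
```

A common rule of thumb with a single endogenous regressor treats first-stage F statistics below about 10 as a sign of weak instruments; rejecting the joint null on its own is a weaker requirement.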

We also need additional weather variables to instrument the adoption of public health measures at the city level. Since there is no theoretical guidance from the existing literature, we implement the Cluster-Lasso method of Belloni et al. (2016) and Ahrens et al. (2019) to select weather characteristics that have good predictive power. Details are displayed in Appendix A.

4 Results

Our sample starts on January 19, when the first COVID-19 case was reported outside Wuhan, and ends on February 29, spanning 6 weeks in total. We divide the whole sample into two sub-samples (January 19 to February 1, and February 2 to February 29) and estimate the model using the whole sample and each of the two sub-samples. In the first 2 weeks, COVID-19 infections quickly spread throughout China, with every province reporting at least one confirmed case, and the number of cases increased at an accelerating pace (Fig. 2). It was also during these 2 weeks that the Chinese government took swift action to curtail virus transmission. On January 20, COVID-19 was classified as a class B statutory infectious disease but managed with the measures for a class A statutory infectious disease. The city of Wuhan was placed under lockdown on January 23: roads were closed, and residents were not allowed to leave the city. Many other cities also imposed public health measures ranging from canceling public events and stopping public transportation to limiting how often residents could leave home. By comparing the dynamics of virus transmission in these two sub-samples, we can infer the effectiveness of these public health measures.

In this section, we mostly rely on model A to interpret the results. Model A estimates the effects of the average number of new cases in the preceding first and second weeks separately and therefore enables us to examine the transmission dynamics at different time lags. As a robustness check, we also consider a simpler lag structure: in model B, we estimate the effect of the average number of new cases in the past 14 days instead of using two separate lag variables.

4.1 Within-city transmission

Table 3 reports the estimation results of the OLS and IV regressions of Eq. 2, in which only within-city transmission is considered. After controlling for time-invariant city fixed effects and time effects common to all cities, one new infection on average leads to 1.142 more cases in the following week but 0.824 fewer cases in the week after that. The negative effect may reflect the fact that, given more time, both local authorities and residents would have taken more protective measures in response to a higher perceived risk of contracting the virus. The daily disclosure of newly confirmed cases by official media and the dissemination of information on social media throughout China may have prompted more timely action by the public, resulting in slower virus transmission. We then compare the transmission rates in different time windows. In the first sub-sample, one new infection leads to 2.135 more cases within a week, implying fast growth in the number of cases. In the second sub-sample, however, the effect decreases to 1.077, suggesting that public health measures imposed in late January were effective in limiting further spread of the virus. Similar patterns are observed in model B.

Table 3 Within-city transmission of COVID-19

Many cases were also reported in cities in Hubei province other than Wuhan, six of which had reported over 1000 cumulative cases by February 15. Their overstretched health care systems exacerbate the concern over delayed reporting of confirmed cases in these cities. To mitigate the effect of such potential measurement errors on our estimates, we re-estimate Eq. 2 excluding all cities in Hubei province. The bottom panel of Table 3 reports these estimates. Comparing the IV estimates in columns (4) and (6) between the upper and lower panels, we find that transmission rates are lower in cities outside Hubei. In the January 19–February 1 sub-sample, one new case leads to 1.483 more cases in the following week, and this falls to 0.903 in the February 2–February 29 sub-sample. We find a similar pattern when comparing the estimates from model B.

4.2 Between-city transmission

People may contract the virus through interaction with infected people who live in the same city or in other cities. In Eq. 1, we consider the effects of the number of new infections in other cities and in the epicenter of the epidemic (Wuhan), respectively, using inverse log distance as weights. However, geographic proximity may not fully describe the level of social interaction between residents of Wuhan and other cities, since the lockdown of Wuhan on January 23 significantly reduced the population flow from Wuhan to other cities. To alleviate this concern, we also use a measure of the size of the population flow from Wuhan to a destination city, constructed by multiplying the daily migration index of the population flow out of Wuhan (Fig. 3) by the share of the flow that the destination city receives, as provided by Baidu (Fig. 4). For days before January 25, we use the average destination shares between January 10 and January 24; for days on or after January 25, we use the average destination shares between January 25 and February 23.
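A minimal sketch of this flow measure, assuming the Baidu outflow index and a destination city's daily shares are available as dictionaries keyed by date (a hypothetical layout; the values below are invented). Days before January 25 use the January 10–24 average share and later days use the January 25–February 23 average, following the windows described above.

```python
from datetime import date, timedelta
from statistics import mean

SPLIT = date(2020, 1, 25)  # first day that uses the later share window


def average_share(shares_by_day, start, end):
    """Mean destination share over the inclusive window [start, end]; missing days are skipped."""
    return mean(s for d, s in shares_by_day.items() if start <= d <= end)


def flow_from_wuhan(day, outflow_index, shares_by_day):
    """Outflow index on `day` times the destination's average share in the matching window."""
    if day < SPLIT:
        share = average_share(shares_by_day, date(2020, 1, 10), date(2020, 1, 24))
    else:
        share = average_share(shares_by_day, date(2020, 1, 25), date(2020, 2, 23))
    return outflow_index[day] * share


def daterange(start, end):
    """Yield every date from start to end inclusive."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)


# Hypothetical destination shares (fraction of all outflow from Wuhan) and index values
shares = {d: 0.02 for d in daterange(date(2020, 1, 10), date(2020, 1, 24))}
shares.update({d: 0.05 for d in daterange(date(2020, 1, 25), date(2020, 2, 23))})
index = {date(2020, 1, 20): 100.0, date(2020, 2, 1): 10.0}
```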

Table 4 reports the estimates from IV regressions of Eq. 1, and Table 5 reports the results from the same regressions excluding Hubei province. Column (4) of Table 4 indicates that in the first sub-sample, one new case leads to 2.456 more cases within 1 week, and the effect between 1 and 2 weeks is not statistically significant. Column (6) suggests that in the second sub-sample, one new case leads to 1.127 more cases within 1 week, and again the effect between 1 and 2 weeks is not statistically significant. Comparing the own-city coefficients across sub-samples indicates that the responses of the government and the public effectively decreased the risk of additional infections. Comparing Table 4 with Table 3, we find that although the coefficient on the number of new cases in the preceding second week becomes insignificant and smaller in magnitude, the coefficients on the number of new cases in the preceding first week are not sensitive to the inclusion of between-city transmission terms.

Table 4 Within- and between-city transmission of COVID-19
Table 5 Within- and between-city transmission of COVID-19, excluding cities in Hubei Province

As a robustness check, Table 5 reports the estimation results excluding the cities in Hubei province. Columns (4) and (6) of Table 5 indicate that in the first sub-sample, one new case leads to 1.194 more cases within a week, while in the second sub-sample, one new case leads to only 0.899 more cases within a week. In addition, in the second sub-sample, one new case results in 0.250 fewer new infections between 1 and 2 weeks later, which is larger in magnitude and more significant than the corresponding estimate (−0.171) when cities in Hubei province are included (column (6) of Table 4).

The time-varying patterns in local transmission are evident in the rolling window analysis (Fig. 5). The upper-left panel displays the estimated coefficients on local transmission for various 14-day sub-samples, with the starting date labeled on the horizontal axis. After a slight initial increase, the local transmission rate declines beginning a few days after January 19, with one case generally leading to fewer and fewer additional cases. The transmission rate then increases slightly beginning around February 4, which corresponds to return travel and work resumption after the Chinese Spring Festival, but eventually decreases around February 12. This decrease may be partly attributed to social distancing strategies at the city level, so we examine the impacts of the relevant policies in Section 5. Moreover, the transmission rates in cities outside Hubei province were kept at low levels throughout the whole sample period (columns (4) and (6) of Table 5). These results suggest that the policies adopted at the national and provincial levels soon after January 19 prevented cities outside Hubei from becoming new hotspots of infection. Overall, the spread of the virus had been effectively contained by mid-February, particularly in cities outside Hubei province.

Fig. 5

Rolling window analysis of within- and between-city transmission of COVID-19. This figure shows the estimated coefficients and 95% CIs from the instrumental variable regressions. The specification is the same as the IV regression models in Table 4. Each estimation sample contains 14 days with the starting date indicated on the horizontal axis
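A minimal sketch of the rolling-window idea behind Fig. 5 follows. It is deliberately simpler than the IV specification of Table 4: we assume a single endogenous regressor x (own-city average new cases in the preceding week), a single hypothetical weather instrument z, and no controls, so that the just-identified IV estimator reduces to cov(z, y)/cov(z, x). The data layout, function names, and synthetic panel are illustrative assumptions, not the paper's data or code.

```python
import random
from statistics import mean


def iv_slope(y, x, z):
    """Just-identified IV (Wald) estimate with one regressor and one instrument."""
    zb, xb, yb = mean(z), mean(x), mean(y)
    cov_zy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    cov_zx = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    return cov_zy / cov_zx


def rolling_iv(rows, window=14):
    """Re-estimate the IV slope on every run of `window` consecutive days.

    rows: (day, y, x, z) tuples pooled across cities (hypothetical layout).
    Returns {first_day_of_window: estimate}, like the horizontal axis of Fig. 5.
    """
    days = sorted({day for day, _, _, _ in rows})
    estimates = {}
    for i in range(len(days) - window + 1):
        in_window = set(days[i:i + window])
        sub = [r for r in rows if r[0] in in_window]
        y = [r[1] for r in sub]
        x = [r[2] for r in sub]
        z = [r[3] for r in sub]
        estimates[days[i]] = iv_slope(y, x, z)
    return estimates


# Synthetic panel: the true effect is 1.5 for days 0-13 and 0.5 afterwards.
# x is endogenous because it shares the unobserved shock u with y.
random.seed(0)
rows = []
for day in range(42):
    beta = 1.5 if day < 14 else 0.5
    for _city in range(100):
        z = random.gauss(0, 1)  # instrument (e.g., a weather shock)
        u = random.gauss(0, 1)  # unobserved confounder
        x = 2.0 * z + u + random.gauss(0, 1)
        y = beta * x + 2.0 * u + random.gauss(0, 1)
        rows.append((day, y, x, z))

estimates = rolling_iv(rows)
print(round(estimates[0], 2), round(estimates[28], 2))  # near 1.5 and 0.5
```

Windows that straddle day 14 mix the two regimes, so their estimates fall in between; this is one reason rolling estimates change gradually rather than abruptly.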

In the epidemiology literature, estimates of the basic reproduction number of COVID-19 fall approximately within the wide range of \(1.4\sim 6.5\) (Liu et al. 2020). The value depends on the estimation method, the underlying modeling assumptions, the time period covered, the geographic region (with varying preparedness of health care systems), and the factors considered in the models that affect disease transmission (such as the behavior of the susceptible and infected populations). Intuitively, it can be interpreted as the expected number of new cases generated by one existing case. Notably, our estimates fall within this range: based on the results from model B in Tables 4 and 5, one case leads to 2.992 more cases in the same city over the next 14 days (1.876 if cities in Hubei province are excluded). In the second sub-sample (February 2–February 29), these numbers fall to 1.243 and 0.614, respectively, suggesting that factors such as public health measures and people’s behavior may play an important role in containing the transmission of COVID-19.

While our estimate of the basic reproduction number (R0) is within the range of estimates in the literature and close to its median, five features may distinguish it from some existing epidemiological estimates. First, our instrumental variable approach helps isolate the causal effect of virus transmission from other confounding factors. Second, our estimate is based on an extended period of the COVID-19 pandemic (until the end of February 2020), which may mitigate potential biases in studies that rely on shorter sampling periods within January 1–28, 2020. Third, our model makes minimal assumptions about virus transmission, for instance imposing fewer restrictions on the relationship between the unobserved determinants of new cases and the number of past cases. Fourth, our model simultaneously considers a comprehensive set of factors that may affect virus transmission, including multiple policy instruments (such as closed management of communities and shelter-at-home orders), population flows, within- and between-city transmission, economic and demographic conditions, weather patterns, and health care system preparedness. Fifth, our study uses spatially disaggregated data covering China (except Hubei province), whereas some other studies examine Wuhan city, Hubei province, China as a whole, or overseas.

Regarding between-city transmission from Wuhan, we observe that population flow explains the contagion effect better than geographic proximity does (Table 4). In the first sub-sample, one new case in Wuhan leads to more cases within 1 week in cities that receive larger population flows from Wuhan. Interestingly, in the second sub-sample, population flow from Wuhan significantly decreases the transmission rate within 1 week, suggesting that people have been taking more precautions with respect to arrivals from high-risk areas; however, more arrivals from Wuhan in the preceding second week can still pose a risk. A back-of-the-envelope calculation indicates that one new case in Wuhan leads to 0.064 (0.050) more cases in the destination city per 10,000 travelers from Wuhan within 1 (2) weeks between January 19 and February 1 (February 2 and February 29)Footnote 15. Note that while the effect is statistically significant, it should be interpreted in context. It was estimated that 15,000,000 people would travel out of Wuhan during the Lunar New Year holidayFootnote 16. If all of them had gone to a single city, each new case in Wuhan would have directly generated about 171 cases there within 2 weeks. The risk of infection is likely very low for most travelers, except for a few who have had previous contact with sources of infection, and a person-specific history of past contacts may be an essential predictor of infection risk, in addition to the total volume of population flowsFootnote 17.
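The 171-case figure can be reproduced as follows. We read the parentheticals as a within-1-week effect of 0.064 and a 1-to-2-week effect of 0.050 per new case in Wuhan and per 10,000 travelers, summed over the 2 weeks; this reading is our assumption, but it recovers the number quoted above.

```python
EFFECT_WEEK1_PER_10K = 0.064  # new cases per new Wuhan case per 10,000 travelers, within 1 week
EFFECT_WEEK2_PER_10K = 0.050  # same, 1-2 weeks later
TRAVELERS = 15_000_000        # projected outflow from Wuhan over the Lunar New Year holiday


def cases_within_two_weeks(travelers):
    """Direct cases generated in one destination city by one new case in Wuhan."""
    units = travelers / 10_000  # number of 10,000-traveler units
    return units * (EFFECT_WEEK1_PER_10K + EFFECT_WEEK2_PER_10K)


print(round(cases_within_two_weeks(TRAVELERS)))  # 171
```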

A city may also be affected by infections in nearby cities, apart from spillovers from Wuhan. We find that the coefficients representing infectious effects from nearby cities are generally small and not statistically significant (Table 4), implying that few cities outside Wuhan are themselves exporting infections. This is consistent with the finding of the World Health Organization (2020b) that, apart from cases imported from Hubei, additional human-to-human transmission is limited in cities outside Hubei. Restricting the sample to cities outside Hubei province yields similar results (Table 5), except that transmission from Wuhan is not significant in the first sub-sample.

4.3 Social and economic mediating factors

We also investigate how some socioeconomic and environmental characteristics mediate the transmission rates in Eq. (3). To ease comparison across moderators, we consider their mediating impacts on the influence of the average number of new cases in the past 2 weeks. For own-city transmission, we examine the mediating effects of population density, GDP per capita, number of doctors, average temperature, wind speed, precipitation, and a dummy variable for adverse weather conditions. For between-city transmission, we consider the mediating effects of distance, difference in population density, and difference in GDP per capita, since cities that are similar in density or economic development may be more closely linked. We also include a measure of population flows from Wuhan. Table 6 reports the IV estimation results. For the mediating variables of within-city transmission that are significant at the 10% level, we compute the change in each variable that would reduce by 1 the effect of new confirmed infections in the past 14 days on current new confirmed cases (columns (2) and (4)).
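The computation in columns (2) and (4) can be illustrated under an assumed linear-interaction form: if the effect of past 14-day infections on current cases is α + γ·m for a moderator m, then lowering that effect by 1 requires Δm = −1/γ, and a one-standard-deviation increase in m shifts the effect by γ·SD(m). The coefficient and standard deviation below are hypothetical, chosen only so that the one-SD effect equals the 0.12 reduction reported for doctors; they are not the Table 6 values.

```python
def change_to_reduce_effect(gamma, reduction=1.0):
    """Change in moderator m that lowers the marginal effect alpha + gamma*m by `reduction`."""
    if gamma == 0:
        raise ValueError("the moderator does not shift the transmission effect")
    return -reduction / gamma


def effect_of_one_sd(gamma, sd):
    """Shift in the transmission effect from a one-standard-deviation increase in m."""
    return gamma * sd


# Hypothetical values for illustration only
GAMMA_DOCTORS = -0.0004  # interaction coefficient per additional doctor
SD_DOCTORS = 300.0       # standard deviation of the number of doctors

print(change_to_reduce_effect(GAMMA_DOCTORS))       # roughly 2500 additional doctors
print(effect_of_one_sd(GAMMA_DOCTORS, SD_DOCTORS))  # roughly -0.12
```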

Table 6 Social and economic factors mediating the transmission of COVID-19

In the early phase of the epidemic (January 19 to February 1), cities with more medical resources, measured by the number of doctors, have lower transmission rates. A one-standard-deviation increase in the number of doctors reduces the transmission rate by 0.12. Cities with higher GDP per capita have higher transmission rates, which can be ascribed to increased social interaction as economic activity increasesFootnote 18. In the second sub-sample, these effects become insignificant, probably because public health measures and inter-city resource sharing take effect. In fact, cities with higher population density have lower transmission rates in the second sub-sample. Regarding environmental factors, the significant mediating variables differ between the first and second sub-samples. Transmission rates are lower with adverse weather conditions, lower temperature, or less rain. Further research is needed to identify clear mechanisms. In addition, population flow from Wuhan still poses a risk of new infections for other cities even after we account for the above mediating effects on own-city transmission. This effect is robust to the inclusion of proximity measures based on economic similarity and geographic proximity between Wuhan and other cities. Nevertheless, we find little evidence of between-city transmission among cities other than Wuhan.

5 Policy response to the COVID-19 outbreak in China

As the 2002–2004 SARS outbreak showed, non-pharmaceutical interventions (NPIs), or public health measures, may slow or effectively stop the transmission of COVID-19 even without vaccines. Although the effectiveness of a single intervention can be limited, multiple interventions together may substantially contain the spread of the virus. Figure 6 depicts the timeline of policies enacted at the national, provincial, and city levels in China since January 19. After Chinese authorities officially confirmed human-to-human transmission on January 20, China adopted a variety of NPIs to contain the COVID-19 outbreak. At the national level, COVID-19 was classified as a statutory class B infectious disease on January 20, but prevention and control measures for class A infectious diseases were adopted. Government agencies across the country were mobilized: the Joint Prevention and Control Mechanism of the State Council was established on January 20, and the Central Leadership Group for Epidemic Response on January 25. On January 23, the National Healthcare Security Administration announced that expenses related to COVID-19 treatment would be covered by medical insurance, and by the government if necessary, so that all COVID-19 cases could be hospitalizedFootnote 19. At the provincial level, 30 provinces declared level I responses to major public health emergencies from January 23 to 25, and all provinces had declared level I responses by January 29Footnote 20. Level I responses in China are designed for the highest state of emergency. Measures taken include enhanced isolation and contact tracing of cases, suspension of public transport, cancellation of public events, closure of schools and entertainment venues, and the establishment of health checkpoints (Tian et al. 2020). Together, these policies amount to population-wide social distancing and case isolation (Ferguson et al. 2020).

Fig. 6

Timeline of China’s public health policies in curtailing the spread of COVID-19

5.1 Policy response to COVID-19 in Hubei Province

Early detection of COVID-19 importation and prevention of onward transmission are crucial for all areas at risk of importation from areas with active transmission (Gilbert et al. 2020). To contain the virus at the epicenter, Wuhan was placed under lockdown, with a traffic ban for all residents, starting on January 23. The lockdown is not expected to be lifted until April 8. Local buses, subways, and ferries ceased operation. Ride-hailing services were prohibited, and by January 24 only a limited number of taxis were allowed on the road. Residents are not permitted to leave the city. Departing flights and trains at the city’s airport and train stations were canceled. Checkpoints were set up at highway entrances to prevent cars from leaving the city. From January 22, wearing masks at work or in public places was mandatory.

In addition, all cities in Hubei province implemented lockdown policies, and most Hubei cities had also adopted measures commensurate with class A infectious diseases by January 28Footnote 21. Residents in these areas were strongly encouraged to stay at home and to avoid any activity involving public gatherings.

Health facilities in Wuhan were extremely overstretched, with shortages of medical supplies and high rates of nosocomial infection, until February 2, when (1) two new hospitals, Huoshenshan and Leishenshan, were built to treat COVID-19 patients with severe symptoms, and (2) 14 makeshift health facilities were converted to isolate patients with mild symptoms and to quarantine suspected cases, patients with fever symptoms, and close contacts of confirmed patients. This centralized treatment and isolation strategy, in place since February 2, has substantially reduced transmission and incident cases.

However, stringent public health measures within Hubei province enforced after the massive lockdown may have little to do with virus transmissions out of Hubei province due to the complete travel ban since January 23.

5.2 Reducing inter-city population flows

Other provinces implemented quarantine measures aimed at restricting population mobility across cities and reducing the risk of importing infectionsFootnote 22. Seven cities in Zhejiang, Henan, Heilongjiang, and Fujian provinces had adopted a partial shutdown strategy by February 4 (Fang et al. 2020)Footnote 23. In Wenzhou, most public transportation was shut down, and traffic leaving the city was temporarily banned. On January 21, the Ministry of Transport of China launched a level 2 emergency response to cooperate with the National Health Commission in preventing the spread of the virus. On January 23, the Ministry of Transport of China, the Civil Aviation Administration of China, and China State Railway Group Company, Ltd. (CSRGC) announced that change fees would be waived for flight, train, bus, and ferry tickets bought before January 24. The CSRGC later extended the fee waiver to train tickets bought before February 6. By February 2, all railway stations in China had started monitoring the body temperature of travelers entering and exiting stations. Across the country, transportation departments set up 14,000 health checkpoints at bus and ferry terminals and at highway service centers and toll gates to monitor passengers’ body temperature and control the inflow of population (World Health Organization 2020b). Recent visitors to high-risk COVID-19 areas were required to self-quarantine for 14 days at home or in designated facilities. On February 2, China’s Exit and Entry Administration temporarily suspended the approval and issuance of travel permits to Hong Kong and Macau.

On January 23, the Wuhan Municipal Administration of Culture and Tourism ordered all tour groups to cancel travel to Wuhan. On January 27, the Ministry of Education of China postponed the start of the 2020 spring semester, and on February 7 it further announced that students were not allowed to return to campus without approval from their school.

5.3 Encouraging social distancing in local communities

Recent studies suggest that a large proportion of cases are asymptomatic or mildly symptomatic and can still spread the virus (Dong et al. 2020; Mizumoto et al. 2020; Nishiura et al. 2020; Wang et al. 2020a). Maintaining social distance is therefore crucial for curtailing local transmission of the virus.

The period from January 24 to 31, 2020, is the traditional Chinese Spring Festival holiday, when families gather and inter-city travel is usually much lower. People were frequently reminded by official media (via TV news and phone messages) and social media to stay at home and avoid gatherings. On January 26, the China State Council extended the holiday to February 2 to delay people’s return travel and curtail the spread of the virus. Nevertheless, economic activity was expected to resume after the Spring Festival, bringing people back to workplaces and potentially increasing the risk of virus spread.

To help local residents keep social distance and decrease the risk of virus transmission, many cities began implementing “closed management of communities” and “family outdoor restrictions” policies in late January (Table 7), encouraging residents to avoid nonessential travel. From January 28 to February 20, more than 250 prefecture-level cities in China implemented “closed management of communities,” which typically includes (1) keeping only one entrance to each community, (2) allowing only community residents to enter and exit, (3) checking the body temperature of each entrant, (4) immediately testing and quarantining people who exhibit fever, and (5) tracing and quarantining close contacts of suspected cases. Meanwhile, residents with symptoms of fever or dry cough were required to report to the community and were quarantined and treated in special medical facilities. Furthermore, local governments in 127 cities imposed more stringent “family outdoor restrictions”: residents were confined or strongly encouraged to stay at home, with limited exceptions; for example, only one person per family could go out to shop for necessities once every 2 daysFootnote 24. Exit permits were usually distributed to each family in advance and collected again when residents reentered the community. Contacts of those patients were also traced and quarantined. Table 7 summarizes the number of cities that had imposed “closed management of communities” or “family outdoor restrictions” by different dates in February.

Table 7 Number of cities with local quarantine measures by different dates

In order to help inform evidence-based COVID-19 control measures, we examine the effect of these local quarantine measures in reducing the virus transmission rates. Dummy variables for the presence of closed management of communities or family outdoor restrictions are created, and they are interacted with the number of infections in the preceding 2 weeks.
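A sketch of how such regressors might be built for one city-day follows. As an illustrative assumption, lagged infections enter as average daily new cases over lags 1 to 7 and 8 to 14 days (the text refers to averages over the past 2 weeks), and the policy dummy switches on from the adoption date onward; the function names and data layout are hypothetical.

```python
def lagged_mean(series, t, first_lag, last_lag):
    """Average of series over days t - last_lag .. t - first_lag."""
    if t - last_lag < 0:
        raise ValueError("not enough history for the requested lags")
    window = [series[t - lag] for lag in range(first_lag, last_lag + 1)]
    return sum(window) / len(window)


def policy_regressors(cases, adoption_day, t):
    """Lagged-infection regressors and their interactions with a policy dummy.

    cases: daily new cases for one city, indexed by day.
    adoption_day: first day the policy is in force, or None if never adopted.
    """
    dummy = 1 if adoption_day is not None and t >= adoption_day else 0
    week1 = lagged_mean(cases, t, 1, 7)   # days t-7 .. t-1
    week2 = lagged_mean(cases, t, 8, 14)  # days t-14 .. t-8
    return {
        "week1": week1,
        "week2": week2,
        "policy": dummy,
        "policy_x_week1": dummy * week1,
        "policy_x_week2": dummy * week2,
    }


# Toy series: day d has d new cases; the policy starts on day 10.
print(policy_regressors(list(range(20)), adoption_day=10, t=15))
# {'week1': 11.0, 'week2': 4.0, 'policy': 1, 'policy_x_week1': 11.0, 'policy_x_week2': 4.0}
```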

5.4 Assessment of the effects of non-pharmaceutical interventions

Several factors may contribute to the containment of the epidemic. The transmission dynamics may change over the course of the epidemic because of improved medical treatment, more effective case isolation and contact tracing, increased public awareness, etc. We therefore split the sample into two sub-samples and allow the estimated coefficients to differ across them (Section 4). NPIs such as closed management of communities, city lockdowns, and restrictions on population flow out of areas with high infection risk may also directly affect the transmission rates. While many public health measures are implemented nationwide, spatial variation exists in the adoption of two types of measures, closed management of communities (denoted by closed management) and family outdoor restrictions (denoted by stay at home), which allows us to quantify the effect of these NPIs on the transmission dynamics.

Because most of these local NPIs were adopted in February and our earlier results indicate that the transmission of COVID-19 declined during late January, we restrict the analysis sample to February 2–February 29. We also exclude cities in Hubei province, which modified its case definition on February 12 to include clinically diagnosed cases and changed it again on February 20 in light of reduced backlogs resulting from increased molecular diagnostic testing capacity. These modifications coincide with the adoption of local NPIs and can significantly affect the observed dynamics of confirmed cases. The adoption of closed management or stay at home is likely affected by the severity of the epidemic and correlated with unobservables. Following the method of Belloni et al. (2016), we therefore select additional weather controls with good predictive power for these NPIs as instrumental variables. Details are given in Appendix A. The OLS and IV estimation results are reported in Table 8.

Table 8 Effects of local non-pharmaceutical interventions

We find that closed management and stay at home significantly decrease the transmission rates. Under closed management of communities, one infection generates 0.244 (95% CI, \(-0.366\sim -0.123\)) fewer new infections in the first week. The effect in the second week is also negative, though not statistically significant. Family outdoor restrictions (stay at home), which are more restrictive than closing communities to visitors, reduce the additional infections generated by one infection by 0.278 (95% CI, \(-0.435\sim -0.121\)) in the first week; the effect in the second week is not statistically significant. To put these magnitudes in perspective, the reproduction number of SARS-CoV-2 was estimated at around \(1.4\sim 6.5\) as of January 28, 2020 (Liu et al. 2020).

Many cities implemented both policies. However, the evidence on the effect of additionally imposing family outdoor restrictions in cities that have already adopted closed management of communities is inconclusive. When both policies are included in the model, the OLS coefficients (column (5)) indicate that closed management reduces the transmission rate by 0.547 (95% CI, \(-0.824\sim -0.270\)) in the first week and by 0.259 (95% CI, \(-0.485\sim -0.032\)) in the second week, while the additional benefit from stay at home is marginally significant in the second week (−0.124, 95% CI, \(-0.272\sim 0.023\)). The IV estimates indicate that closed management reduces the transmission rate in the first week by 0.193 (95% CI, \(-0.411\sim 0.025\)), while the effect in the second week and the effects of stay at home are not statistically significant. Additional research that examines the decision process of health authorities or documents local differences in the actual implementation of the policies may offer insights into their relative merits.
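The significance statements above can be cross-checked from the reported intervals alone. Assuming the 95% CIs are symmetric normal-approximation intervals (estimate ± 1.96 × SE), the sketch below backs out the implied standard errors and two-sided p-values. Under that assumption, the OLS stay-at-home estimate for the second week and the IV closed-management estimate in the two-policy model are significant at the 10% level but not at the 5% level.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error implied by a symmetric normal-approximation CI."""
    return (upper - lower) / (2 * z)


def two_sided_p(estimate, se):
    """Two-sided p-value under a normal approximation."""
    z_stat = abs(estimate / se)
    normal_cdf = 0.5 * (1 + math.erf(z_stat / math.sqrt(2)))
    return 2 * (1 - normal_cdf)


# (label, point estimate, lower bound, upper bound), as reported in the text
REPORTED = [
    ("closed management, week 1", -0.244, -0.366, -0.123),
    ("stay at home, week 1", -0.278, -0.435, -0.121),
    ("both policies (OLS): stay at home, week 2", -0.124, -0.272, 0.023),
    ("both policies (IV): closed management, week 1", -0.193, -0.411, 0.025),
]

p_values = {}
for label, est, lower, upper in REPORTED:
    se = se_from_ci(lower, upper)
    p_values[label] = two_sided_p(est, se)
    print(f"{label}: SE = {se:.3f}, p = {p_values[label]:.3f}")
```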

We further assess the effects of NPIs through a series of counterfactual exercises. After estimating Eq. (3) by 2SLS, we obtain the residuals. We then predict the changes in \(y_{ct}\) under counterfactual changes in the transmission dynamics (i.e., the coefficients \(\alpha _{\text {within},\tau }^{k}\)) and in the imposition of NPIs (i.e., \(\bar {h}_{ct}^{k\tau }\) and the lockdown of Wuhan \(\bar {m}_{c,\text {Wuhan},t}^{k\tau }\)). In scenario A, no cities adopted family outdoor restrictions (stay at home). Similarly, in scenario B, no cities implemented closed management of communities. We use the estimates in columns (2) and (4) of Table 8 for the counterfactual analyses of scenarios A and B, respectively. In scenario C, we assume that the index of population flows out of Wuhan after the Wuhan lockdown (January 23) took the value observed in 2019 for the same lunar calendar date (Fig. 3), which would be plausible had there been no lockdown around Wuhan. In the absence of a lockdown but with the epidemic, more people might have left Wuhan than in the previous year (Fang et al. 2020), in which case the effect would be larger. In scenario D, we assume that the within-city transmission dynamics were the same as those observed between January 19 and February 1, i.e., the coefficient on 1-week lagged own-city infections was 2.456 and the coefficient on 2-week lagged own-city infections was −1.633 (column (4) of Table 4), which could happen if transmission rates in cities outside Hubei had risen in the same way as those observed for cities in Hubei. Appendix C contains the technical details on the computation of counterfactuals.
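Appendix C is not reproduced in this section, so the following is only a plausible single-city sketch of a dynamic counterfactual, not the authors' procedure. It assumes that own-city infections enter as average daily new cases over lags 1 to 7 and 8 to 14, that everything else in Eq. (3) (between-city terms, controls, and the 2SLS residual) is held at its actual value, and that counterfactual cases feed back into later lags. The function names are hypothetical.

```python
def lagged_means(path, t):
    """Average daily cases over lags 1-7 and 8-14; days before the sample count as zero."""
    def value(day):
        return path[day] if day >= 0 else 0.0

    week1 = sum(value(t - lag) for lag in range(1, 8)) / 7
    week2 = sum(value(t - lag) for lag in range(8, 15)) / 7
    return week1, week2


def counterfactual_path(actual, coef_hat, coef_cf):
    """Re-simulate one city's daily new cases under counterfactual own-city coefficients.

    actual: observed daily new cases.
    coef_hat, coef_cf: (week-1, week-2) coefficients, estimated and counterfactual.
    """
    cf = []
    for t, y in enumerate(actual):
        w1, w2 = lagged_means(actual, t)
        # Everything not driven by own-city lags, including the residual, stays fixed.
        rest = y - (coef_hat[0] * w1 + coef_hat[1] * w2)
        c1, c2 = lagged_means(cf, t)  # lags taken from the counterfactual path itself
        cf.append(coef_cf[0] * c1 + coef_cf[1] * c2 + rest)
    return cf


actual = [7.0, 5.0, 6.0, 4.0, 3.0] + [2.0] * 25
same = counterfactual_path(actual, (0.9, -0.25), (0.9, -0.25))
faster = counterfactual_path(actual, (0.9, -0.25), (1.2, -0.25))
print(round(sum(faster) - sum(actual), 1))  # extra cases under faster local spread
```

Passing the estimated coefficients as the counterfactual ones reproduces the observed series, which is a useful sanity check; plugging in late-January values such as (2.456, −1.633) mimics scenario D in spirit, although a full version would also propagate spillovers across cities.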

In Fig. 7, we report the differences between the predicted number of daily new cases in the counterfactual scenarios and the actual data for cities outside Hubei province. We also report the predicted cumulative effect of each scenario at the bottom of the corresponding panel in Fig. 7. Had the transmission rates in cities outside Hubei province increased to the level observed in late January, there would have been 1,408,479 (95% CI, \(815,585\sim 2,001,373\)) more cases by February 29 (scenario D). Assuming a fatality rate of 4%, there would have been 56,339 more deaths. The effects of the Wuhan lockdown and local NPIs are considerably smaller. As a result of the Wuhan lockdown, 31,071 (95% CI, \(8296\sim 53,845\)) fewer cases would be reported for cities outside Hubei by February 29 (scenario C). Closed management of communities and family outdoor restrictions would reduce the number of cases by 3803 (95% CI, \(1142\sim 6465\); or 15.78 per city with the policy) and 2703 (95% CI, \(654\sim 4751\); or 21.98 per city with the policy), respectively. These estimates, combined with additional assumptions on the value of statistical life, lost time from work, etc., may contribute to cost-benefit analyses of relevant public health measures.
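The derived figures in this paragraph follow from simple arithmetic on the reported totals, as the short check below shows; the implied numbers of adopting cities are our own back-calculation rather than figures reported in the text.

```python
EXTRA_CASES_SCENARIO_D = 1_408_479
FATALITY_RATE = 0.04  # assumption stated in the text

extra_deaths = round(EXTRA_CASES_SCENARIO_D * FATALITY_RATE)

# Total reduction divided by reduction per adopting city (scenarios B and A)
cities_closed_management = round(3803 / 15.78)
cities_stay_at_home = round(2703 / 21.98)

print(extra_deaths, cities_closed_management, cities_stay_at_home)  # 56339 241 123
```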

Fig. 7

Counterfactual policy simulations. This figure displays the daily differences between the total predicted number and the actual number of daily new COVID-19 cases for each of the four counterfactual scenarios for cities outside Hubei province in mainland China. The spike on February 12 in scenario C is due to a sharp increase in daily case counts in Wuhan resulting from changes in case definitions in Hubei province (see Appendix B for details)

Our counterfactual simulations indicate that suppressing local virus transmission, so that transmission rates are kept well below those observed in Hubei in late January, is crucial for forestalling large numbers of infections in cities outside Hubei. Our retrospective analysis of the data from China complements the simulation study of Ferguson et al. (2020). Our estimates indicate that keeping local transmission rates low might have averted roughly 1.4 million infections in cities outside Hubei (scenario D). Chinazzi et al. (2020) also find that reducing local transmission rates is necessary for effective containment of COVID-19. The public health policies announced by the national and provincial authorities in the last 2 weeks of January may have played a decisive role (Tian et al. 2020) in keeping local transmission rates in cities outside Hubei low throughout January and February. Among the measures implemented following the provincial level I responses, Shen et al. (2020) highlight the importance of contact tracing and isolating close contacts before symptom onset in preventing a resurgence of infections once COVID-19 suppression measures are relaxed. We also find that travel restrictions on high-risk areas (the Wuhan lockdown) and, to a lesser extent, closed management of communities and family outdoor restrictions further reduce the number of cases. It should be noted that these factors may overlap in the real world. In the absence of the Wuhan lockdown, the health care systems in cities outside Hubei could have faced much more pressure, and local transmission might have been much higher. In China, the arrival of the COVID-19 epidemic coincided with the Lunar New Year for many cities. Had the outbreak started at a different time, the effects and costs of these policies would likely have been different.

6 Conclusion

This paper examines the transmission dynamics of coronavirus disease 2019 (COVID-19) in China, considering both within- and between-city transmission. Our sample runs from January 19 to February 29 and covers key episodes such as the initial spread of the virus across China, the peak of infections in terms of domestic case counts, and the gradual containment of the virus in China. Changes in weather conditions induce exogenous variation in past infections, which allows us to identify the causal impact of past infections on new cases. The estimates suggest that the infectious effect of existing cases is mostly observed within 1 week and that people’s responses can break the chain of infection. Comparing estimates across the two sub-samples, we observe that the spread of COVID-19 had been effectively contained by mid-February, especially in cities outside Hubei province. Data on real-time population flows between cities have become available in recent years. We show that this new source of data is valuable in explaining between-city transmission of COVID-19, even after controlling for traditional measures of geographic and economic proximity.

By April 5, 2020, COVID-19 infections had been reported in more than 200 countries or territories, and more than 64,700 people had died. Behind the grim statistics, more and more national and local governments are implementing countermeasures. Cross-border travel restrictions are being imposed to reduce the risk of case importation, and in areas at risk of community transmission, public health measures such as social distancing, mandatory quarantine, and city lockdowns are being implemented. Based on the experience in China, our counterfactual simulations indicate that preventing sustained community transmission from taking hold in the first place has the largest impact, followed by restricting population flows from areas with high infection risk. Local public health measures such as closed management of communities and family outdoor restrictions can further reduce the number of infections.

A key limitation of the paper is that we are not able to disentangle the effects of each of the stringent measures, because within this 6-week sample period China enforced a large number of closely timed policies to contain the virus, often simultaneously in many cities. A second limitation is that many stringent measures were implemented shortly after January 19, 2020, the starting date of the official data release on confirmed cases throughout China, which prevents researchers from comparing the post-treatment sub-sample with a pre-treatment sub-sample during which no strict policies were enforced. Key knowledge gaps remain in understanding the epidemiological characteristics of COVID-19, such as individual risk factors for contracting the virus and infections from asymptomatic cases. Data on the demographics and exposure history of those who have shown symptoms, as well as those who have not, will help facilitate this research.