1 Introduction

The number of new positive cases (also known as positive swabs) reported by official agencies is a critical epidemiological dataset for public health questions such as the actual number of people infected or the impact of vaccination. The number of new positive cases, which is only an aggregate of positive swabs, was updated daily during the COVID-19 pandemic. The socioeconomic impact of these figures was considerable. For example, in March 2020, the New York stock market crashed at least three times. Most people were concerned about a pandemic of an unknown disease, reacted sensitively, and were sometimes misled by the number of new positive cases.

The World Health Organization (WHO) initially published a daily situation report; however, it has published a weekly report since August 16, 2020.Footnote 1 Meanwhile, the Ministry of Health, Labor and Welfare of Japan (MHLW) still provides daily figures.Footnote 2 In addition, the news media continue to report these figures daily as if they were reporting the exchange rate between the U.S. dollar and the Japanese yen.

In Japan, the time series of daily confirmed cases of COVID-19 shows a periodic pattern that repeats every week. For example, the number of new positive cases reported each Monday is relatively small, a pattern that started to be widely mentioned after the second outbreak from July to August 2020. In October 2020, Asahi Shimbun (Nagano 2020) reported this pattern in Tokyo. In this paper, we consider the problem of the weekly periodicity in the daily data of COVID-19 in Japan.

We briefly examine the presence of periodicity in Japan by examining the national average of daily cases for the entire period from January 16, 2020, to June 15, 2021 (see Table 1). The last column shows the difference between the average of daily cases for the entire period and the average for each day of the week. Regarding periodicity within the week, Li (2020) examined time series data on newly confirmed cases of COVID-19 and discussed weekly recurrence. The author analyzed countries worldwide using country-level data and found autocorrelation with a 7-day lag. Cappi et al. (2022) also estimated annual seasonality using country-level data from 30 countries.

Table 1 National average for each day of the week (rounded down to the nearest whole number)

Our study analyzes in detail how this periodicity varies across regions. In Japan, public health management is performed mainly by local governments of the 47 prefectures. The central government has delegated the counting of confirmed cases to these local governments. Therefore, the effect is expected to be different for each prefecture.

Our analysis assumes a congestion-related delay in reporting information about positive cases of COVID-19. Shim et al. (2020) and Harris (2022) reported similar cases in South Korea and New York, respectively. Sachs et al. (2022) noted the importance of a reporting system. Tariq et al. (2019) also described a delay in the reporting of an Ebola virus case caused by the murder of an administrative official.

Our study focuses on reporting delays because of congestion caused by weekend bottlenecks. We refer to this as the “weekend effect.” Bonifazi et al. (2021) noted a similar effect. We analyze a lag structure that shows how congestion that occurs over the weekend affects the number of new confirmed cases at the beginning of the following week. We use data on the number of new positives reported by the prefectures.

By contrast, delay or lag is usually discussed in relation to the serial interval between infection and symptom onset, which is essential to estimate the epidemiological curve and the effective reproductive number; see, for example, Cori et al. (2013), Nishiura et al. (2020), Ali et al. (2020). Furthermore, since the beginning of the COVID-19 pandemic, numerous studies have used the number of positive cases as primary data to estimate the real number of infected; see, for example, Bassi et al. (2021).

However, our study does not aim to estimate the number of infections, but rather to describe the geographical heterogeneity found in the time series of reported positive cases.

Our result shows that the weekend effect was detected only in some of the most populous prefectures. In about half of the prefectures, the weekend effect was not detected because congestion did not occur due to the low number of reports. In other words, what appeared to be a nationwide event was observed only in the most populous prefectures. In the discussion section, we propose a hypothetical structure to explain the reporting delay that occurred on weekends. We also interpret the estimation results based on this structure.

The remainder of this paper is organized as follows. Section 2 discusses the data and methods used in subsequent sections. In Sect. 3, we demonstrate the weekend effect by applying the national-scale data and then present the analysis results for each prefecture. In Sect. 4, we discuss the implications of our results.

2 Data and method

2.1 Data

We use daily data on the number of new positive cases by prefecture obtained from the website of Nippon Hoso Kyokai (NHK, the Japan Broadcasting Corporation) (https://www3.nhk.or.jp/news/special/coronavirus/ and https://www3.nhk.or.jp/news/special/coronavirus/data/). The data span 517 days from January 16, 2020, to June 15, 2021, indexed by \(t=1,2,\ldots ,517\).

Through news reports, many people in Japan were informed of the number of new positives. NHK is the only public corporation with a special broadcasting status according to Chapter III of the Broadcasting Act by the Japanese government (https://www.japaneselawtranslation.go.jp/ja/laws/view/2954).Footnote 3 Therefore, we used NHK’s summary data. The “Number of Newly Confirmed Cases” published on the NHK website was aggregated as follows: Until September 27, 2022, NHK collected the number of cases announced by the prefectures through NHK’s regional stations in each prefecture and made it public as the “NHK Summary” on the NHK website.

2.2 Method

We now assume a data-generating process to estimate the delay. It is well known that biological count data tend to be over-dispersed; see Zuur et al. (2010). To handle such over-dispersion, we assume that the newly confirmed positive cases on the \(t\)th day, \(Y_t\), follow a negative binomial distribution.Footnote 4

In our model, the probability function of \(Y_t\) is defined as

$$\begin{aligned} P\left( Y_t=y\mid \mu _t,\theta \right) =\frac{\Gamma \left( \theta +y\right) }{\Gamma \left( \theta \right) \Gamma \left( y+1\right) }\left( \frac{\theta }{\theta +\mu _t}\right) ^\theta \left( \frac{\mu _t}{\theta +\mu _t}\right) ^y,\ y=0,1,2,\ldots \end{aligned}$$
(1)

where \(\mu _t\) is a deterministic parameter that varies over time and \(\theta >0\).

As the random variable \(Y_t\) has the conditional mean and variance

$$\begin{aligned} E\left( Y_t\mid \mu _t,\theta \right)&=\mu _t, \end{aligned}$$
(2)
$$\begin{aligned} V\left( Y_t\mid \mu _t,\theta \right)&=\mu _t + \frac{1}{\theta }\mu _t^2, \end{aligned}$$
(3)

it is clear that \(\theta >0\) is the condition of over-dispersion and \(\theta ^{-1}\) is an over-dispersion parameter. If \(\theta\) goes to infinity, the distribution converges to the Poisson distribution.
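The moments in (2) and (3) and the Poisson limit can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `nb_pmf` and `nb_moments` and the values of \(\mu _t\) and \(\theta\) are illustrative assumptions, and the infinite sum is truncated at a large count.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability P(Y = y | mu, theta) as written in Eq. (1)."""
    log_p = (
        math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def nb_moments(mu: float, theta: float, y_max: int = 2000):
    """Approximate mean and variance by summing the pmf over 0..y_max."""
    probs = [nb_pmf(y, mu, theta) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Illustrative values only; these are not estimates from the paper.
mu, theta = 20.0, 4.0
mean, var = nb_moments(mu, theta)
print(mean, var)  # compare with mu and mu + mu**2 / theta

# Poisson limit: for very large theta the pmf approaches exp(-mu) * mu**y / y!
poisson_3 = math.exp(-2.0) * 2.0 ** 3 / math.factorial(3)
print(nb_pmf(3, 2.0, 1e6), poisson_3)
```

With these values the variance implied by (3) is \(20 + 20^2/4 = 120\), six times the mean, which illustrates how a small \(\theta\) produces strong over-dispersion.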

As shown in Table 1, the number of positive cases is lower on the first three days of the week and higher on the remaining days. We focus on weekends and the two following days. Therefore, we examine the effects of Saturday, Sunday, Monday, and Tuesday.

To evaluate the effect of the weekdays, we define the dummy variable

$$\begin{aligned} \textrm{MON}_t= {\left\{ \begin{array}{ll} 1 &{} \text {if the }t\text { th day is Monday,}\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(4)

\(\textrm{TUE}_t\), \(\textrm{SAT}_t\), and \(\textrm{SUN}_t\) are defined similarly. We assume that the natural logarithm of \(\mu _t\) is a linear combination of weekday dummies and a time trend t,

$$\begin{aligned} \log \mu _t = \alpha + \beta _1 \textrm{MON}_t + \beta _2 \textrm{TUE}_t + \beta _3 \textrm{SAT}_t + \beta _4 \textrm{SUN}_t + \phi t. \end{aligned}$$
(5)

Note that \(\phi\) represents the compound daily growth rate of the conditional mean \(\mu _t\).Footnote 5
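The construction of the weekday dummies and the log-linear mean in (5) can be sketched as follows. This is an illustration under stated assumptions: the helper names `weekday_dummies` and `conditional_mean` and all coefficient values are hypothetical, and only the start date and the sample length are taken from Sect. 2.1.

```python
import math
from datetime import date, timedelta

START = date(2020, 1, 16)  # t = 1 in the paper
N_DAYS = 517               # sample length stated in Sect. 2.1


def weekday_dummies(t: int) -> dict:
    """Return the MON, TUE, SAT, and SUN dummies for day index t (1-based)."""
    wd = (START + timedelta(days=t - 1)).weekday()  # Monday = 0, ..., Sunday = 6
    return {
        "MON": int(wd == 0),
        "TUE": int(wd == 1),
        "SAT": int(wd == 5),
        "SUN": int(wd == 6),
    }


def conditional_mean(t: int, alpha: float, beta: dict, phi: float) -> float:
    """Evaluate mu_t from Eq. (5): log mu_t = alpha + sum(beta * dummy) + phi * t."""
    d = weekday_dummies(t)
    log_mu = alpha + sum(beta[name] * d[name] for name in d) + phi * t
    return math.exp(log_mu)


# Hypothetical coefficients chosen for illustration, not estimates from the paper.
alpha, phi = 1.0, 0.01
beta = {"MON": -0.3, "TUE": 0.0, "SAT": 0.0, "SUN": 0.0}

last_day = START + timedelta(days=N_DAYS - 1)
monday_ratio = conditional_mean(5, alpha, beta, phi) / math.exp(alpha + phi * 5)
print(last_day, weekday_dummies(5), monday_ratio)
```

Under (5), a dummy coefficient \(\beta\) multiplies the conditional mean by \(e^{\beta }\) on that day, and, holding the dummies fixed, \(\mu _{t+1}/\mu _t = e^{\phi }\), which is approximately \(1+\phi\) when \(\phi\) is small.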

As the prefectural data series of daily new cases contain a certain number of zeros in some prefectures, we modify our model to handle such data.

The probability function of zero-inflated negative binomial regression (ZINB) is defined asFootnote 6

$$\begin{aligned} P^*\left( X_t=x\mid \mu _t,\theta ,\pi _t\right) = {\left\{ \begin{array}{ll} \pi _t+\left( 1-\pi _t\right) P\left( Y_t=0\mid \mu _t,\theta \right) &{} x=0\\ \left( 1-\pi _t\right) P\left( Y_t=x\mid \mu _t,\theta \right) &{} x=1,2,\ldots . \end{array}\right. } \end{aligned}$$
(6)

where

$$\begin{aligned} \log \left( \frac{\pi _t}{1-\pi _t}\right) =\gamma +\beta _1\textrm{MON}_t+\beta _2\textrm{TUE}_t+\beta _3\textrm{SAT}_t+\beta _4\textrm{SUN}_t +\psi t. \end{aligned}$$
(7)

Note that the probability \(\pi _t\) is a deterministic parameter that varies over time. Although (5) and (7) share the symbols \(\beta _1,\ldots ,\beta _4\), the coefficients of the two equations are estimated separately. We estimate all 12 coefficients in (5) and (7), together with \(\theta\), using the maximum likelihood method,Footnote 7 and then select the variables in (5) and (7) via a stepwise approach based on Akaike’s information criterion (AIC). Because of computational limits, we select or delete the same variables in both (5) and (7).
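The zero-inflated probability function in (6), the logit link in (7), and the AIC used for model selection can be sketched as follows. This is a minimal illustration rather than the estimation routine used in the paper: the function names, the toy counts, and the parameter values are assumptions, and no optimization or stepwise search is performed.

```python
import math


def nb_pmf(y: int, mu: float, theta: float) -> float:
    """Negative binomial probability from Eq. (1)."""
    log_p = (
        math.lgamma(theta + y) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def logistic(z: float) -> float:
    """Inverse of the logit link in Eq. (7): maps the linear predictor to pi_t."""
    return 1.0 / (1.0 + math.exp(-z))


def zinb_pmf(x: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated negative binomial probability from Eq. (6)."""
    base = (1.0 - pi) * nb_pmf(x, mu, theta)
    return pi + base if x == 0 else base


def zinb_log_likelihood(counts, mus, theta, pis) -> float:
    """Sum of log probabilities over the observed series."""
    return sum(
        math.log(zinb_pmf(x, mu, theta, pi)) for x, mu, pi in zip(counts, mus, pis)
    )


def aic(log_lik: float, n_params: int) -> float:
    """Akaike's information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_lik


# Toy data and parameter values, purely hypothetical.
counts = [0, 0, 3, 1, 0, 5, 2]
mus = [2.0] * len(counts)               # constant mean (intercept only)
pis = [logistic(-1.0)] * len(counts)    # constant zero-inflation probability
theta = 1.5

ll = zinb_log_likelihood(counts, mus, theta, pis)
print(ll, aic(ll, n_params=3))  # two intercepts plus theta
```

In a stepwise search, candidate models with different sets of dummies would each be fitted by maximum likelihood, and the one with the smallest AIC would be retained.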

3 Results

We first demonstrate our result in national-scale data (see Table 2). The column called “count” reports the estimates of coefficients in (5) and \(\theta\). The column called “zero” also presents estimates of the coefficients in (7).

As shown in Table 2, we obtain the selected model that contains the following variables: constant term, Monday dummy, and time trend. The results indicate that the number of confirmed cases was significantly lower on Monday. This result shows that when the number of cases inspected decreases on Sunday, the number of cases confirmed on Monday decreases through the lag. On a national scale, no effect on Tuesdays implies that the delay appears to end within a day on average.

Table 2 Nationwide result

We determined that the number of confirmed cases was lower on Mondays nationally. We examine the differences between the prefectures. Specifically, we identify how the impact of the weekend affects the decrease in the number of new confirmed cases through the delay. We estimate our model independently by prefecture, which means that we do not consider spatial autocorrelation between prefectures. For more detailed results of the estimates for all prefectures, see Tables 4, 5, 6, 7, 8, 9, 10, 11 in the Appendix.

4 Discussion

We analyze the geographical heterogeneity of reporting lags found in the daily data of the number of positive cases announced by each prefecture. To begin, we provide a hypothetical structure that links the decrease in the number of new positive cases over the weekend to the decrease in the number of new confirmed cases at the beginning of the week.

We focus on the decline in new cases on weekends. Many hospitals and government offices are closed on Saturdays and Sundays. Therefore, the number of confirmed cases will likely be lower on weekends. Additionally, the closure of government offices leads to delays in aggregating the number of new positive cases reported. Thus, we observe the phenomenon of fewer new confirmed cases at the beginning of the week, which implies a lag structure that explains how the decrease in the number of new confirmed cases over the weekend leads to a decrease in the number of new confirmed cases, for example, on Monday. If no lag is detected, the daily release reflects the current inspection result without delay.

We argue that the weekend effect in this study can be explained by the structure described above. In the subsequent analysis, we interpret the coefficient estimates as patterns of this structure.

Let x be the decrease in the number of inspections over the weekend. The decrease in the number of new positives reflected the next day or later is denoted by \(x^{\text {Confirmed}}_{\text {Inspected}}\). The superscript “Confirmed” indicates the day of the week reflected in the official announcement, and the subscript “Inspected” indicates the day of inspection. We assume that it takes up to two days for the official reported number of new confirmed cases, \(x^{\text {Confirmed}}\), to reflect \(x_{\text {Inspected}}\). Thus, we have \(x_{Sat} = x^{Sun}_{Sat} + x^{Mon}_{Sat}, x_{Sun} = x^{Mon}_{Sun}+x^{Tue}_{Sun}\).

We provide an example of the impact of a decrease in the number of inspections over the weekend on the decrease in the number of confirmed cases through a lag under the structure currently assumed. We assume that there would be even fewer inspections on Sundays than on Saturdays. As an example, we give \(x_{Sat} = -10, x_{Sun} = -20\). Without delay, we obtain \(x_{Sat} = x^{Sat}_{Sat} = x^{Sat} = -10, x_{Sun} = x^{Sun}_{Sun} = x^{Sun} = -20\). We call this the no delay.

Next, we define the one-day delay as the reporting delay that ends in one day, with two examples provided. Suppose that \(x_{Sat} = x^{Sun}_{Sat} = -10\) and \(x_{Sun} = x^{Mon}_{Sun} = -20\), then we have \(x^{Sun} = x^{Sun}_{Sat}=-10, x^{Mon} = x^{Mon}_{Sun} = -20\). In this example, there will be a decrease in the number of new confirmed cases of \(-10\) on Sunday and \(-20\) on Monday. In another case, suppose \(x_{Sat} = x^{Sat}_{Sat} + x^{Sun}_{Sat} = \left( -5\right) + \left( -5\right) = -10\) and \(x_{Sun} = x^{Sun}_{Sun} + x^{Mon}_{Sun} = \left( -10\right) + \left( -10\right) = -20\), then we have \(x^{Sat} = -5, x^{Sun} = x^{Sun}_{Sat} + x^{Sun}_{Sun} = \left( -5\right) + \left( -10\right) = -15, x^{Mon} = x^{Mon}_{Sun} = -10\). In the latter example, there will be a decrease in the number of new confirmed cases of \(-5\) on Saturday, \(-15\) on Sunday, and \(-10\) on Monday.

Next, we define the two-day delay, where the delay takes two days to end, with an example provided. Suppose that \(x_{Sat} = x^{Sun}_{Sat} + x^{Mon}_{Sat} = \left( -5\right) + \left( -5\right) = -10\) and \(x_{Sun} = x^{Mon}_{Sun} + x^{Tue}_{Sun} = \left( -10\right) + \left( -10\right) = -20\); we then have \(x^{Sun} = x^{Sun}_{Sat}=-5, x^{Mon} = x^{Mon}_{Sat} + x^{Mon}_{Sun} = -15\), and \(x^{Tue} = x^{Tue}_{Sun} = -10\). In this example, there will be a decrease in the number of new confirmed cases of \(-5\) on Sunday, \(-15\) on Monday, and \(-10\) on Tuesday. We can also consider an example where reporting delays are concentrated only on Tuesday through the two-day delay.

Moreover, \(x_{Sat} = x_{Sun} = 0\) indicates a constant average of inspections within the week. In other words, we cannot identify whether there is a reporting delay. We call this the constant average.
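The arithmetic of the delay examples above can be reproduced with a short script. This is an illustrative sketch: the function name `confirmed_changes` and the dictionary layout, which maps a pair (day of inspection, day of confirmation) to \(x^{\text {Confirmed}}_{\text {Inspected}}\), are assumptions introduced here.

```python
from collections import defaultdict


def confirmed_changes(allocation: dict) -> dict:
    """Sum x^{Confirmed}_{Inspected} over inspection days for each confirmation day."""
    totals = defaultdict(int)
    for (_inspected, confirmed), change in allocation.items():
        totals[confirmed] += change
    return dict(totals)


# Keys are (inspected day, confirmed day); values follow the examples in the text.
no_delay = {("Sat", "Sat"): -10, ("Sun", "Sun"): -20}
one_day_first = {("Sat", "Sun"): -10, ("Sun", "Mon"): -20}
one_day_second = {
    ("Sat", "Sat"): -5, ("Sat", "Sun"): -5,
    ("Sun", "Sun"): -10, ("Sun", "Mon"): -10,
}
two_day = {
    ("Sat", "Sun"): -5, ("Sat", "Mon"): -5,
    ("Sun", "Mon"): -10, ("Sun", "Tue"): -10,
}

for name, alloc in [
    ("no delay", no_delay),
    ("one-day delay, first example", one_day_first),
    ("one-day delay, second example", one_day_second),
    ("two-day delay", two_day),
]:
    print(name, confirmed_changes(alloc))
```

In every case the changes summed over confirmation days equal \(x_{Sat} + x_{Sun} = -30\); the delay only redistributes the weekend decrease across the following days.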

We summarize the relationship between the estimates and the delay structures in Table 3. We identify the delays in which the coefficients of the weekday dummies are negative and the p-value is less than 0.05 (\(p < 0.05\)).

Table 3 Delay structures

First, we discuss the one-day delay. We observed three groups of prefectures with a one-day delay. The first is the one-day delay 3, in which the impact through the lag was observed only on Mondays and not on Tuesdays. These are the following 13 prefectures: Miyagi, Tochigi, Saitama, Chiba, Tokyo, Kanagawa, Ishikawa, Gifu, Aichi, Osaka, Hyogo, Nara, and Okinawa. The same interpretation as for the nationwide trend applies to these prefectures. The second is the one-day delay 2, where the coefficients on Sundays and Mondays are significantly negative. These are three prefectures: Yamanashi, Shiga, and Shimane. The third is the one-day delay 1, which was observed only in Niigata.

Second, we discuss six prefectures where the delay was detected on Monday and Tuesday: Fukushima, Ibaraki, Gunma, Shizuoka, Okayama, and Fukuoka. We observe that the reduced number of inspections over the weekend affected both Monday and Tuesday.

These two results may also indicate that a large population causes a delay in the aggregation of data for official publication. The population of each prefecture included in the three groups (the one-day delay 1, the one-day delay 3, and the two-day delay 2) was roughly 2 million or more. This figure is the population threshold set by the Law concerning the Establishment of Special Wards in Metropolitan Areas (Act No.80, 2012).Footnote 8 The exceptions are Ishikawa and Okinawa Prefectures, with a population of less than 1.5 million. Furthermore, the three prefectures in the one-day delay 2 group are also small prefectures with a population of less than 2 million.

Third, Hokkaido is the only case of the two-day delay 3, where the impact of the weekend is observed only on Tuesday. All dummy variables remain in the selected model; however, all of them except the \(\textrm{TUE}\) dummy variable are insignificant. Hokkaido has a population of more than 5 million. Additionally, Hokkaido is the largest prefecture in Japan, accounting for 20\(\%\) of the total area of Japan. The fact that Hokkaido is exceptionally large in population and size affects the results.

Fourth, we discuss the other prefectures: Aomori, Iwate, and Mie. Mie is the only case of the two-day delay 1. The result for Aomori indicates that this prefecture reported new positive cases almost without delay, even on weekends. This case is the without delay 2. The number of confirmed cases in Iwate Prefecture was significantly higher on Saturdays and lower on Mondays, a trend observed only in Iwate. In addition to the operation on weekends, the low number of reported cases of the infection itself may have been related to this trend in these prefectures.

Finally, we discuss the prefectures for which we could not obtain any significant dummy variable, the constant average. In our analysis, the constant average corresponds to the case where none of the day-of-week dummies was finally selected. These 20 prefectures are Akita, Yamagata, Toyama, Fukui, Nagano, Kyoto, Wakayama, Tottori, Hiroshima, Yamaguchi, Tokushima, Kagawa, Ehime, Kochi, Saga, Nagasaki, Kumamoto, Oita, Miyazaki, and Kagoshima.

These results are summarized in Table 12 except for the Iwate Prefecture.

Here, we examine whether the conditional mean of confirmed cases was significantly lower on Mondays, based on the selected model. We regard this as the case if the coefficient of the \(\textrm{MON}\) dummy variable is negative and its p-value is less than 0.05 (\(p < 0.05\)). This definition includes both the one-day delay and the two-day delay. We identified this trend in 25 prefectures. By contrast, we did not find any such trend in 22 prefectures.
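The decision rule for the Monday effect can be written as a simple predicate. The sketch below is only an illustration: the function name `monday_effect`, the prefecture labels, and the coefficient and p-value pairs are hypothetical and are not taken from the tables in the Appendix. A value of `None` stands for a \(\textrm{MON}\) dummy that was dropped during the stepwise selection.

```python
def monday_effect(mon_coef, mon_p_value, alpha=0.05):
    """True when the MON dummy is in the selected model, negative, and p < alpha."""
    if mon_coef is None or mon_p_value is None:
        return False  # the dummy was not selected, so no Monday effect is identified
    return mon_coef < 0 and mon_p_value < alpha


# Hypothetical (coefficient, p-value) pairs for four made-up prefectures.
results = {
    "Prefecture A": (-0.25, 0.001),  # negative and significant
    "Prefecture B": (-0.10, 0.20),   # negative but not significant
    "Prefecture C": (None, None),    # MON dummy dropped by the stepwise selection
    "Prefecture D": (0.15, 0.01),    # significant but positive
}

flagged = [name for name, (coef, p) in results.items() if monday_effect(coef, p)]
print(flagged)
```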

5 Conclusion

The daily announcement of the number of positive cases of COVID-19 had a major socioeconomic impact. The weekly periodicity of these time series data is well-known in Japan. We assume that this periodicity is generated by changes in the timing of reporting due to bottlenecks. Our analysis focuses on the presence or absence of the weekend effect in each prefecture. We examine the dummy variables for the day of the week and the linear trend term \(\phi t\). In addition, we use a zero-inflated model, which allows us to simultaneously analyze periods of extremely low and relatively high numbers of reported cases. We find that the degree of congestion varies between prefectures. Furthermore, we discuss in detail how this reporting delay varies between prefectures.

Our results indicate the presence of the weekend effect in prefectures with large populations, including Japan’s three largest metropolitan areas (Tokyo, Osaka, and Nagoya). One interpretation is that the number of new positives was simply higher in the more populated prefectures, which caused the weekend effect.

Next, our results suggest a bottleneck in the administrative work of counting the number of new positive cases. In Japan as of January 2023, the Act on the Prevention of Infectious Diseases and Medical Care for Patients with Infectious Diseases (https://www.japaneselawtranslation.go.jp/en/laws/view/2830)Footnote 9 requires prefectures to report all cases of COVID-19 immediately. If new positive cases are confirmed daily, prefectures report daily confirmed cases. Basically, whether to report immediately or not depends on the classification of an infectious disease according to the law mentioned above.Footnote 10 Additionally, the information required by MHLW is defined by a uniform national standard. Thus, local governments cannot improve the current situation.

Finally, our results imply that smart administrative work is necessary to reduce congestion. Furthermore, such work should take geographical heterogeneity into account, as congestion occurred in only about half of the prefectures. Therefore, it would be better to concentrate on prefectures with a population of more than 2 million when improving the reporting process.