1 Introduction

A novel coronavirus disease (COVID-19) was reported in Wuhan (China) in December 2019 (Li et al. 2020a, b; Wu et al. 2020). The coronavirus is known to be transmitted rapidly from human to human through touching, coughing, sneezing, etc. (Wang et al. 2020). COVID-19 spread throughout the world in a brief period, putting the entire population under threat. According to World Health Organization (WHO) reports, more than 32.7 million people had been infected, with 991,000 fatalities, as of 27th September 2020 (WHO 2020).

Environmental conditions affect the spread of respiratory diseases, such as influenza and severe acute respiratory syndrome (SARS) (Tamerius et al. 2013). A number of studies have used statistical and association analyses to identify the environmental factors and air pollutants that influence the pandemic fatality rate (Li et al. 2020a, b). They reported that NO2, aerosol optical depth (AOD) and population density are significant variables in determining the COVID-19 fatality rate. Recent studies show significant correlations between meteorological parameters and COVID-19 (Li et al. 2020a, b; Ma et al. 2020; Muhammad et al. 2020; Sahin et al. 2020; Sajadi et al. 2020; Tosepu et al. 2020; Zhu et al. 2012). A positive correlation between mean temperature and COVID-19 cases was reported in China and Indonesia (Tosepu et al. 2020; Wang et al. 2020; Zhu et al. 2020). However, other studies have reported contradictory results, suggesting that weather conditions may not be associated with the COVID-19 pandemic (Jamil et al. 2020; Shi et al. 2020). A negative correlation between temperature and COVID-19 transmission was also observed based on daily weather data (Shi et al. 2020). Apart from the role of environmental conditions in the pandemic, numerous studies have discussed the role of the economic strata of a population in determining protection against the virus (e.g. Osayomi et al. 2021).

The impact of weather conditions on the COVID-19 pandemic is still unclear, specifically in tropical countries. Studies at a local level will help improve our understanding of weather's effect on the virus's spread (Gupta et al. 2020a). India is the second-most populous country in the world, and the virus has spread to a great extent in the country. The first COVID-19 case in India was reported on 30th January in Kerala, and from March, COVID-19 cases were found in different parts of the country. To control the spread of the infection, India was under lockdown from 25th March 2020 to 31st May 2020, after which the unlock phases were launched, wherein activities were resumed part by part. India had the second largest number of confirmed cases worldwide, 5,992,532 as of 27th September (WHO). However, India's fatality rate is low (2.28%) compared to the global value (3%). Many studies have modelled the relationship between COVID-19 infection and meteorological parameters (Gupta et al. 2020a; Ma et al. 2020; Shi et al. 2020), and the correlations reported in different studies are contradictory (Zhu et al. 2020). Principal component analysis (PCA) is a widely used multivariate approach that converts several correlated variables into linearly uncorrelated variables named principal components (PCs) (Mahmoudi et al. 2021). The transformed variables retain most of the variance of the original data but express it through fewer variables. Each new variable, or PC, is a linear combination of the original variables (Wilks 2011).

This paper presents the variation of meteorological parameters such as temperature, relative humidity and absolute humidity over 6 months of the COVID-19 pandemic and explores the correlations between meteorological parameters and COVID-19 cases in four megacities of India, viz. Delhi, Mumbai, Pune and Ahmedabad, which had high numbers of confirmed COVID-19 cases. Among these four cities, the maximum number of cases was recorded in Mumbai, followed by Delhi. The study spans a long period that includes the lockdown phase and the unlock phases the cities faced, so that the variability in COVID-19 cases can be understood and the spread of the disease amongst the population of these cities can be examined. This kind of information is essential to show whether tropical climates are less or more favourable to the spread of COVID-19. The longer span of the analysis covers two prominent seasons in these cities, summer and monsoon, which is when COVID-19 cases saw a sharp increase. Thus, analyzing the possible effects of meteorology on the spread of COVID-19 will further enhance our understanding of the disease's spread.

2 Study Area, Data and Methodology

We focus on four metro cities of India with different meteorology and climate, chosen mainly because they had the highest numbers of COVID-19 cases. The COVID-19 data were collected for all four cities for the period from 1st April 2020 to 30th September 2020. The period starts in April, when cases started rising all over India, and ends in September 2020 because the highest peak of COVID-19 in India was observed in that month. Details of cumulative cases in the selected cities are summarized in Fig. 1. The data on daily new COVID-19 infections were collected from the daily reports of the municipal corporations of Mumbai (https://stopcoronavirus.mcgm.gov.in/), Delhi (https://delhifightscorona.in/), Pune (https://punecoronatracker.in/) and Ahmedabad (https://ahmedabadcity.gov.in/). All-India state-wise COVID-19 case data were obtained from the Ministry of Health, Govt. of India (https://www.mohfw.gov.in/). The data on city population and population density are as per the 2011 census (https://www.census2011.co.in/). To evaluate the COVID-19 impact, daily time-series data were analysed for the different lockdown phases and the three unlock phases across the four cities. Metropolitan region air quality and meteorological data, viz. temperature, relative humidity, wind speed, etc., monitored by the pilot project System of Air Quality and Weather Forecasting and Research (SAFAR) (Beig et al. 2021; Beig et al. 2020; Beig et al. 2015), were used to carry out the analysis in this study. The SAFAR program has 8–10 monitoring stations in each city, and the average over these stations is taken as representative of the whole city. An exploratory analysis of the data for each city was conducted. The meteorological variables considered for this study are temperature and relative humidity. These meteorological parameters were paired with the COVID-19 cases to understand the linkages between the two and how the spread of COVID-19 is affected by the weather parameters.
We also calculated the absolute humidity, which describes the actual amount of water vapour in the atmosphere and depends on the air temperature (in °C) and relative humidity (in %) (Auler et al. 2020). This variable has been used by many other studies on COVID-19 modelling (Li et al. 2020a, b; Zhu et al. 2020). The absolute humidity (AH; in g m−3) is the mass of water vapour per unit volume of air; it was estimated using the Clausius–Clapeyron equation and can be described as follows (Gupta et al. 2020b).

Fig. 1 The number of COVID-19 cases in different Indian states and the study locations pertaining to this study

$$\mathrm{AH}=\frac{6.112\times \exp\left(\frac{17.67\times T}{T+243.5}\right)\times \mathrm{RH}\times 2.1674}{273.15+T}.$$
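As a minimal sketch of this calculation, the following Python function evaluates the AH expression for a given temperature T (in °C) and relative humidity RH (in %). The function name `absolute_humidity` and the sample inputs are illustrative assumptions and are not taken from the study; the term e(·) in the equation is read here as the exponential function.

```python
import math


def absolute_humidity(temp_c: float, rh_percent: float) -> float:
    """Absolute humidity (g m^-3) from air temperature (deg C) and relative humidity (%)."""
    # Saturation vapour pressure term of the AH equation (in hPa).
    saturation_vp = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    # Scale by RH and convert to mass of water vapour per unit volume of air.
    return saturation_vp * rh_percent * 2.1674 / (273.15 + temp_c)


# Hypothetical reading: 25 deg C and 60 % relative humidity.
print(round(absolute_humidity(25.0, 60.0), 1))  # 13.8
```

For a fixed temperature the result scales linearly with RH, so dry air (RH = 0) gives AH = 0.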

Linear regression was carried out between the meteorological variables and the daily COVID-19 cases to understand the effect of meteorological parameters on the spread of COVID-19 at each location. To further understand the relationship between all the meteorological variables and the COVID-19 cases in Indian cities with varied geographical conditions, we used PCA. PCA is a technique used to emphasize variation and bring out the significant patterns in the data. It is mainly used to reduce the dimensionality of a dataset containing interrelated variables by transforming it into a new set of uncorrelated variables called the PCs (Jolliffe and Cadima 2016). The PCs are eigenvectors of a correlation matrix or a covariance matrix, and every PC extracts a share of the total variance of the dataset. The first few PCs contain most of the information in the dataset.
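The PCA step described above can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the function name `pca_on_correlation` is hypothetical, the input is synthetic stand-in data rather than the study's measurements, and the correlation matrix (rather than the covariance matrix) is assumed so that eigenvalues can be compared against 1. It is not necessarily the software or procedure the analysis itself used.

```python
import numpy as np


def pca_on_correlation(data):
    """PCA of an (n_samples, n_variables) array via its correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)        # variables are in columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()           # share of total variance per PC
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return eigvals, explained, loadings


# Synthetic stand-in data (NOT the study's data): six variables driven by two hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(180, 2))
data = np.hstack([
    factors[:, [0]] + 0.3 * rng.normal(size=(180, 3)),
    factors[:, [1]] + 0.3 * rng.normal(size=(180, 3)),
])

eigvals, explained, loadings = pca_on_correlation(data)
retained = eigvals > 1.0                          # keep PCs with eigenvalue > 1
print("eigenvalues:", np.round(eigvals, 2))
print("retained PCs:", int(retained.sum()))
print("cumulative variance:", np.round(np.cumsum(explained), 2))
```

Because the correlation matrix has ones on its diagonal, the eigenvalues sum to the number of variables, so each eigenvalue divided by that number gives the share of variance explained by the corresponding PC.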

The PCs are further rotated by varimax rotation to make the relationship between the variables and the components easier to interpret. Varimax rotation aims to make each variable correlate strongly with a single component and have near-zero association with the other components (Dominick et al. 2012). The factor loadings obtained after the rotation are important as they depict the contribution of specific variables to each PC.
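A minimal sketch of varimax rotation is given below, using the common iterative scheme based on the singular value decomposition. The function names, the tolerance and the example loading matrix are all hypothetical and chosen only for illustration; the rotation procedure actually used in the analysis is not specified in the text.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate an (n_variables, n_factors) loading matrix by the varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotated loadings.
        gradient = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ gradient)
        rotation = u @ vt                         # orthogonal update of the rotation
        if previous and s.sum() < previous * (1 + tol):
            break                                 # no meaningful improvement left
        previous = s.sum()
    return loadings @ rotation, rotation


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(loadings**2, axis=0).sum())


# Hypothetical loadings with a clear two-factor structure, mixed by a 30-degree rotation.
simple = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                   [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
angle = np.deg2rad(30)
mix = np.array([[np.cos(angle), -np.sin(angle)],
                [np.sin(angle), np.cos(angle)]])
unrotated = simple @ mix
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

The rotation is orthogonal, so the communality of each variable (the row sum of squared loadings) is unchanged; only the way the variance is shared among the factors changes.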

3 Results and Discussion

3.1 Spread of COVID-19

India registered its first confirmed case of COVID-19 on 30th January 2020 in Kerala. From the first week of April onwards, the number of COVID-19 positive cases increased rapidly in different parts of the country. The highest peak was recorded on 17th September 2020 (covid19.who.int). When COVID-19 cases first started rising in the country, the rise was mainly attributed to people who had travelled to nations where COVID-19 cases had been found before they were found in India; essentially there was no transmission within the country. At the beginning of the pandemic, India was managing well, with a low number of COVID-19 cases because of the restricted spread during the lockdown and social distancing measures (Paital et al. 2020). However, with the beginning of the unlocking phases (1st June 2020 onward), India experienced a total of 190,648 confirmed cases and 5407 deaths due to COVID-19 (Ghosh et al. 2020).

Figure 1 shows the cumulative COVID-19 cases up to 30th September 2020 in the various states of India. It also shows the COVID-19 cases per lakh (100,000) of population in the four metro cities considered in this study. The highest number of cases is observed in the state of Maharashtra, which accounts for 21% of the country's total confirmed COVID-19 cases. In addition, the number of cases per lakh of population shows that the city of Pune has the highest value, 2890, despite having the lowest population, followed by Mumbai at 1679. This can be attributed to many reasons, ranging from increased testing to greater spread of the virus, or to other reasons related to meteorology, which we have analyzed in this paper to a certain extent.
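The per-lakh normalisation used above is simple arithmetic: cumulative cases divided by population and multiplied by 100,000. A short sketch follows; the function name and the input numbers are hypothetical and are not the study's city data.

```python
LAKH = 100_000  # one lakh in the Indian numbering system


def cases_per_lakh(confirmed_cases: int, population: int) -> float:
    """Cumulative confirmed cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return confirmed_cases * LAKH / population


# Hypothetical city: 50,000 confirmed cases among 2,000,000 inhabitants.
print(cases_per_lakh(50_000, 2_000_000))  # 2500.0
```

Normalising by population is what allows a smaller city to rank above a larger one even when its absolute case count is lower.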

3.2 Daily Variability of Meteorological Parameters

Figure 2 shows the daily variations of average temperature, relative humidity and absolute humidity along with the daily COVID-19 cases in the four cities. The daily cases started increasing from the end of April in Delhi (Fig. 2a) and Pune (Fig. 2c). Delhi showed a significant increase in the daily number of cases after May, reaching a high of 3947 cases per day in June, but the numbers declined by mid-July. A sudden spike was again observed in September, with the highest number of daily COVID-19 cases (4473) among the four cities. In Mumbai, however, the number of cases began increasing considerably from mid-April (Fig. 2b). Pune shows an increase in cases, but of lower magnitude than Delhi. Ahmedabad reported cases from the beginning of April (Fig. 2d). Its highest case counts were observed from April to June, but the numbers (200–300) are much lower than in the other cities.

Fig. 2 Daily variation in temperature, RH, AH and daily new COVID-19 cases in Delhi (a), Mumbai (b), Pune (c) and Ahmedabad (d) from 1st April to 31st Aug 2020

The average temperature in Mumbai and Pune does not show considerable variation throughout the study period. In Delhi, however, the temperature was lower at the beginning of the study period, then increased from April and peaked at the end of May. Ahmedabad had higher temperatures during April–May. The relative humidity is higher in Mumbai throughout the study period as it is a coastal city. In the other cities, the relative humidity increases with the monsoon's onset in June and remains high until the end of August. The absolute humidity follows a similar pattern, but shows significantly less variability in Pune and Mumbai throughout the period. Table 1 gives the descriptive statistics for daily positive cases of COVID-19 and the meteorological variables. It includes the mean over the whole study period, the median, the standard deviation and the kurtosis of the dataset at the four locations. The mean maximum temperature is highest in Ahmedabad and lowest in Mumbai, whereas the mean minimum temperature is highest in Ahmedabad and lowest in Delhi.
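The descriptive statistics named above (mean, median, standard deviation and kurtosis) can be computed as in the following sketch. The function name and the sample series are hypothetical. The text does not state which kurtosis convention was used, so excess kurtosis from population moments is an assumption here, as is the use of the sample (n − 1) standard deviation.

```python
import statistics


def describe(values):
    """Mean, median, sample standard deviation and excess kurtosis of a sequence."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std": statistics.stdev(values),            # sample (n - 1) standard deviation
        "kurtosis": m4 / m2**2 - 3.0,               # excess kurtosis (assumed convention)
    }


daily_cases = [120, 135, 150, 160, 410, 145, 155]   # illustrative values only
print(describe(daily_cases))
```

A single outlying day, such as the 410 in the illustrative series, pulls the mean above the median and raises the kurtosis, which is why both measures are useful alongside the mean.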

Table 1 The descriptive statistics of the data considered for this study

The study period includes two seasons, summer and monsoon. The summer season extends from April to May and the monsoon season from June to September. COVID-19 cases were higher during the monsoon season than during the summer season in Pune, Mumbai and Delhi. Rainfall is a characteristic feature of tropical areas. These four cities receive their maximum rainfall in the monsoon season due to the south-westerly monsoon flows. In the monsoon season, Mumbai received the highest cumulative rainfall (2730 mm), followed by Pune (790 mm), Ahmedabad (686 mm) and Delhi (342 mm).

3.3 Multivariate Analysis

3.3.1 Linear Regression Analysis

Figure 3 shows the linear regression between daily COVID-19 cases and the different meteorological parameters, together with Pearson's correlation coefficient. The shaded region indicates the confidence interval (CI) of the regression coefficient, calculated at the 5% level of significance. In Mumbai, daily cases and AH showed a correlation of 0.42. In Pune and Mumbai, the correlation between daily cases and temperature is negative (− 0.74 and − 0.30, respectively). The other two cities, Ahmedabad and Delhi, show a positive but insignificant and very poor correlation with temperature (0.38 and 0.2, respectively). The linear regression between daily COVID-19 cases and RH shows a high positive correlation of 0.70 in Pune, where the RH is 52%, and the other two cities show a significant positive correlation. Ahmedabad shows a poor correlation between daily cases and RH and a very poor correlation with AH. The correlation of 0.70 between daily cases and RH in Pune is the highest correlation value among the parameters and cities.
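To make the two quantities discussed above concrete, the sketch below computes Pearson's correlation coefficient and the least-squares regression line for a pair of series. The function names and the small RH and case series are hypothetical and serve only as an illustration; they are not the study's data, and the confidence band shown in the figure is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Illustrative series only: relative humidity (%) and daily new cases.
rh = [40, 50, 60, 70, 80]
cases = [100, 150, 210, 240, 300]
slope, intercept = linear_fit(rh, cases)
print(round(pearson_r(rh, cases), 3), round(slope, 2), round(intercept, 1))
```

The sign of the slope always matches the sign of the correlation coefficient, while the magnitude of the coefficient indicates how tightly the points follow the fitted line.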

Fig. 3 Linear regression between daily COVID-19 cases and the different meteorological parameters

3.3.2 Principal Component Analysis

PCA was used to visualize patterns and correlations between COVID-19 and all the meteorological variables and to better understand the correlation between the dataset variables (Núñez-Alonso et al. 2019). In the present study, PCA was applied to six variables, i.e. average temperature, minimum temperature, maximum temperature, RH, AH and daily COVID-19 cases. The eigenvalues and accumulated variance of the PCs are shown in Table 2. A total of six PCs were obtained; however, only two of them were retained for further analysis. Only PCs with eigenvalues greater than 1 are considered when selecting the number of PCs (Binaku et al. 2013). The PCs were also subjected to varimax rotation, and the rotated PCs and factor loadings were used for further interpretation. The effect of rotation is to spread the importance more or less equally between the rotated factors.

Table 2 Principal components (PC) and total variance of four cities considering meteorological variables and daily COVID-19 cases

PC1 in Pune had the highest eigenvalue (4.43), accounted for 48% of the variance of the original dataset and shows strong positive factor loadings for AH, RH and daily cases (Table 3). PC1 in Delhi has an eigenvalue of 3.32, accounting for 51% of the variance, with strong positive loadings for Tmax, Tmin, Tavg and AH. In Mumbai, PC1 has an eigenvalue of 3.51, explains 52% of the total variance and has strong loadings for Tmax, Tmin and Tavg, whereas PC2 has strong positive loadings for RH, AH and COVID-19 cases. In Ahmedabad, PC1 has an eigenvalue of 4, explains 59% of the variance and has strong positive factor loadings for Tmax, Tmin and Tavg.

Table 3 Factor loadings after varimax rotation

PC1 in Delhi comprised all the meteorological variables except relative humidity and the daily COVID-19 cases. In Mumbai and Ahmedabad, PC1 alone explained 52% and 59%, respectively, of the datasets' total variance. In Pune, humidity was significantly associated with PC1, whereas the temperature parameters were associated with PC2. In Mumbai, the temperature parameters were associated with PC1. In Delhi, Tmax, Tmin and Tavg were associated with PC1 and the other variables with PC2.

From the correlation circle for each city, we could identify the variables that are highly correlated with each other (Fig. 4). In Delhi, the daily COVID-19 cases were significantly correlated with the absolute humidity. In Mumbai, the daily cases were strongly associated with absolute humidity, and in Pune they were strongly correlated with relative humidity. The daily cases in Ahmedabad were strongly correlated with the minimum temperature. Thus, the daily COVID-19 cases had a strong correlation with humidity in all the cities except Ahmedabad.

Fig. 4 Projection of all meteorological variables and COVID-19 on the first two principal components

4 Conclusion

In the present study, we investigated the variation in meteorological parameters and its relation to the spread of daily COVID-19 cases (from 1st April to 30th September 2020) in four Indian megacities with different climatology: Delhi, Mumbai, Pune and Ahmedabad. We observed that relative humidity and absolute humidity showed a moderate positive correlation with the daily COVID-19 cases in three cities. Ahmedabad showed a poor correlation with absolute humidity and no correlation with relative humidity. The correlation with atmospheric temperature was poor in Ahmedabad and Delhi and negative in hilly Pune and coastal Mumbai, which suggests that lower temperatures were associated with increased transmission in those two cities. The increase in cases during the unlocking period signifies that the spread is not governed simply by the variability of seasonal temperature. The PCA further enhanced the understanding of the correlations between the variables (meteorological parameters and COVID-19 cases) and revealed that the COVID-19 cases are closely correlated with humidity. More detailed analyses of different parts of the country, using various analysis techniques, are required to understand the roles of environmental factors in the spread of COVID-19.