1 Introduction

General circulation models (GCMs) are sophisticated, three-dimensional numerical models developed by research groups worldwide and are used to simulate the behavior of the climate system, its components, and their interactions. The climate system is chaotic, and for the long period simulations required to study climate change, these complex models are generally run at a global scale using equations, parameterizations, and assumptions (Broecker 1995; Phillips et al. 2004; Raisanen 2007; Rial et al. 2004; Skelly and Henderson-Sellers 1996). The performance of GCMs at the regional scale remains variable (Bollasina and Nigam 2009; Gleckler et al. 2008; Randall et al. 2007). Model accuracy differs with region and the type of variable simulated (Errasti et al. 2010); hence, GCM models are evaluated by testing their ability to simulate the "present climate" of hydrometeorological variables (including variability and extremes). However, it should be noted that neither good performance across an arbitrary suite of measures of observed climate, nor agreement in output across a collection of models, provides a rigorous basis for assessing the accuracy of a future prediction. Other evaluation methodologies identify groups of models that agree on future climate changes (convergence) or a combination of present and future climates (Dominguez et al. 2010; Giorgi and Mearns 2003; Tebaldi and Knutti 2007). The inherent problems with these approaches are discussed by Knutti (2010), Raisanen (2007), Tebaldi and Knutti (2007), and Weigel et al. (2010).

Daily precipitation is important at a river basin scale because it is (1) the main driver of runoff generation in river basins (Chen et al. 2010; Chiew et al. 2009; Kim and Pachepsky 2010; Piani et al. 2010); (2) used in the evaluation of flood frequency for flood safety, which is of utmost concern to water resource management agencies that operate and maintain reservoir systems, and in risk evaluations that guide the design of infrastructure alterations or potential changes in reservoir operations (Raff et al. 2009); (3) used to study the impact of changes in precipitation patterns on erosion processes in a river basin (Abaurrea and Asín 2005); (4) important in irrigation water management (García-Garizábal and Causapé 2010); (5) used in the estimation of water budget (Guo et al. 2004), precipitation erosivity (Angulo-Martínez and Beguería 2009; Vrieling et al. 2010), and groundwater recharge (Nolan et al. 2007; Toews and Allen 2009); (6) used to study the effect of its changes on loads of NO3-N and sediment from watersheds (Chaplot 2007) and forest ecosystems (Johnson et al. 2000); and (7) used to study water table fluctuations (Park and Parker 2008), atmospheric circulation, and temperature (Brandsma and Buishand 1997). Hence, the performance of daily precipitation simulated by GCMs should be evaluated at finer spatial scales.

Evaluating highly variable, complex climate variables such as precipitation is important and desirable. At higher spatial and temporal resolutions, such as the daily scale, GCM performance is rarely assessed because of the vast quantities of data involved (Finnis et al. 2009). For the 20th century climate scenario (20C3M), past studies have evaluated the daily precipitation simulated by the GCMs participating in the Intergovernmental Panel on Climate Change's fourth assessment report (IPCC AR4). The studies were performed for the entire globe (Dai 2006), for the land areas (Sun et al. 2006), and in more detail for smaller regions such as Australia (Maxino et al. 2008; Perkins et al. 2007; Vaze et al. 2011), the USA (Chen and Knutson 2008; Pierce et al. 2009; Pryor and Schoof 2008), China (Li et al. 2010), and South America (Bombardi and Carvalho 2009, 2010). Probability density functions, skill scores (SSs), mean square error, intensity, frequency, and extreme value indices are some of the methods used in these studies. The details of these studies, such as the region examined, number of GCMs, and methods used in the analysis, are provided in Table 1. The results of these studies are discussed later in the manuscript. Evaluation of the daily precipitation simulated by the GCMs exclusively for the Indian region at finer scales has not been conducted (Table 1). Past studies have evaluated the precipitation simulated by GCMs at larger timescales (monthly, seasonal, and annual) over the Indian region and at larger spatial scales (e.g., south Asia) (Annamalai et al. 2007; Bollasina and Nigam 2009; Kripalani et al. 2007b; Preethi et al. 2010; Rajeevan and Nanjundiah 2009). A good review of the methods available to evaluate the performance of GCMs may be found in Errasti et al. (2010) and Johnson and Sharma (2009). Details of many previous studies performed at a monthly scale, such as the region examined, variables evaluated, method used, and GCM selected, can be found in Table 8 in Errasti et al. (2010).

Table 1 Literature review of daily precipitation evaluation studies

Monsoon constitutes an essential phenomenon for a tropical climate (IPCC 2001) and defines essential features of the Earth's climate that have profound social and economic consequences (Zhang and Li 2008). The Indian summer monsoon (ISM) represents one of the largest annual variations of the global climate system (Turner and Slingo 2009). ISM is one of the main components of the large-scale Asian summer monsoon (Cherchi et al. 2007), which is relied on by more than a third of the world's population for the majority of their water resources, for agriculture, and, increasingly, for industrial uses (Turner and Slingo 2010). Furthermore, the Indian monsoon is one of the most dominant tropical circulation systems in the general circulation of the atmosphere (Rajeevan and Nanjundiah 2009). Precipitation is the variable used by most studies to evaluate the performance of a simulated monsoon (Zhang and Li 2008). The errors in simulated precipitation fields often indicate deficiencies in the representation of these physical processes in the GCM (Dai 2006). Indian precipitation has often been used as proxy data for the Asian monsoon as a whole for understanding the energy budget of the major circulation features (Parthasarathy et al. 1994).

The objective of this study was to evaluate the daily precipitation simulated by GCMs over India. Simulations from 19 GCMs participating in the IPCC's AR4 for the 20C3M scenario during the period 1961–2000 are evaluated. The study is valuable because of the importance of daily precipitation and the Indian monsoon. It supplements the many assessments that evaluate GCMs at a daily scale and helps us understand the strengths and weaknesses of GCMs in simulating the monsoon in this region, which also influences surrounding areas. Knowledge of model performance will help modeling groups build on strengths or address weaknesses in subsequent models. It is also useful to researchers in selecting an appropriate mix of global models for regional applications and in understanding the effects such choices would have on regional study results (Pierce et al. 2009).

2 Data used and study region

The observational dataset used in this study is the 1° × 1° daily precipitation dataset prepared by the National Climate Centre of the India Meteorological Department (IMD), Pune, India (Rajeevan et al. 2005, 2006). About 2,140 rain gauge stations with at least 90% data availability during the period 1951–2004 were chosen from 6,329 stations in India (Satyanarayana and Srinivas 2008). Rajeevan et al. (2005) discussed the method of gridded data preparation in detail. The spatial interpolation procedure from the irregularly spaced rain gauge network to an equal-angle grid was adapted from Shepard (1968). In addition to a distance factor, a direction factor was introduced in defining the weights for interpolation (Dash et al. 2009). There are 357 1° grid points in the study region (filled circles in Fig. 1), each having a time series of precipitation data. The period of data used in this study is 1961–2000.

Fig. 1

a Location of the 1° grid points and 2.5° × 2.5° grid squares used in the study. b Location of 1° grid points and classification of regions in India. The 1° grid points show the location of observed precipitation data. All Global Climate Models (GCMs) are regridded to a common 2.5° grid. The 77 grid squares in the study region are numbered (a) and referred to in the text. The 1° grid points in each grid square are pooled to estimate the probability density function (PDF) of observed precipitation, which is compared to the PDF estimated from each GCM for that grid square

Simulations of precipitation from 19 GCMs (Tables 2 and 3) were obtained from the World Climate Research Programme's (WCRP) Coupled Model Intercomparison Project phase 3 (CMIP3) multi-model dataset for the baseline scenario (20C3M) in the form of daily data. In this study, GCM names follow a three-letter abbreviation convention similar to that in Kripalani et al. (2007a). Some of the GCMs were run multiple times. The naming conventions for both GCM name and run number (numerals) are given in Tables 2 and 3. There are a total of 39 realizations from different GCMs and runs (Tables 2 and 3) for the study region. They were extracted and interpolated to a common 2.5° grid using bilinear interpolation. There are 77 2.5° grid squares in the study region, and they are numbered in Fig. 1. These 2.5° grid squares are referred to as grid squares in the manuscript.
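The regridding step can be illustrated with a minimal sketch of bilinear interpolation at a single target point. It assumes a regular source grid with ascending coordinates; the function name `bilinear` and the sample values are hypothetical and are not taken from the CMIP3 processing used in the study.

```python
from bisect import bisect_right

def bilinear(lons, lats, field, lon, lat):
    """Bilinearly interpolate field[j][i] (lat index j, lon index i) to (lon, lat).

    Assumes lons and lats are ascending and the target lies inside the grid.
    """
    # Lower-left corner of the enclosing source cell, clamped so i+1 and j+1 exist.
    i = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    j = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    # Fractional position of the target inside the cell (0 to 1 in each direction).
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    south = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1 - ty) * south + ty * north

# Hypothetical 2 x 2 patch of a coarse model grid (precipitation in mm/day).
src_lons = [75.0, 78.0]
src_lats = [15.0, 18.0]
src_field = [[2.0, 4.0],
             [6.0, 8.0]]
value = bilinear(src_lons, src_lats, src_field, 76.5, 16.5)
print(value)
```

At the cell centre the result is simply the average of the four surrounding values; at a source grid point it reproduces the source value.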

Table 2 Names of the modeling group, country of origin, model name and version, acronym, run for the precipitation used in the study, and the reference
Table 3 The model characteristics considered: (1) horizontal resolutions of the GCMs (H high, M medium, L low), (2) convective scheme employed for precipitation parameterization (RAS relaxed Arakawa–Schubert, MC moist convection adjustment, MF mass flux-based, AS Arakawa–Schubert) and (3) flux correction at the ocean–atmosphere interface (N no flux correction, H heat, W water, M momentum)

In this study, India was divided into six precipitation zones (peninsular, west central, northwest, northeast, central northeast India, and the hilly region), as shown in Fig. 1b. The zones were defined by the Indian Institute of Tropical Meteorology (IITM) in Pune (www.tropmet.res.in) and are used as homogeneous precipitation zones (Dash et al. 2009). These are referred to as zones in the manuscript and discussed in Section 4.1 (Figs. 2 and 3).

Fig. 2

a Average daily precipitation for each month (mm/day), calculated for each of the six zones (shown in Fig. 1b). b Variation in the probability density functions (PDFs) estimated for each of the 77 grid squares (numbered in Fig. 1a) for the three categories (Annual, JJASO, and JFMAMND). The plots were prepared using observed data for the period 1961–2000

Fig. 3

Variations in SS in each category were studied using a few statistics calculated based on the SSs in each grid square. The statistics of SS calculated were median, mean, maximum (max), minimum (min), standard deviation (SD) and inter-quartile range (IQR). These statistics were calculated from SSs of all GCM realizations used in the study for three categories (Annual, JJASO, JFMAMND)

3 Methodology

In the study region, each grid square has one or more grid points (one time series for each grid point) of observed data (Fig. 1a). Most of the grid squares had either six or nine time series of observed precipitation. The boundary grid squares had fewer (one to five) time series of observed data (Fig. 1). For each grid square, a probability density function (PDF) of observed precipitation was estimated by pooling all the time series of observed data in that grid square. These are referred to as the "observed PDF" in the manuscript (Figs. 2b and 3). In calculating the observed PDFs, the values of the observed time series were not averaged. For each grid square and GCM run, a PDF was estimated using a single time series of simulated precipitation. These PDFs are referred to as the "GCM PDF" in the manuscript (Fig. 3).

The nonparametric observed and GCM PDFs were calculated using MATLAB (http://www.mathworks.com). Estimating the PDFs required a bin width \( \left({S}_{\mathrm{b}}\right) \), which was taken as 1 mm/day in this study. The frequency of each bin \( n \) was calculated and then normalized so that the area (or integral) under the histogram was equal to 1; that is, each bin count was divided by the product of the number of observations and the bin width. The observed PDF for each grid square was the plot of the normalized frequencies of observations \( \left({\mathrm{Fo}}_n\right) \), whereas the GCM PDF was the plot of the normalized frequencies of GCM simulations \( \left({\mathrm{Fg}}_n\right) \).
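The pooling and normalization described above can be sketched as follows. This is an illustrative Python sketch, not the MATLAB code used in the study; the function name `empirical_pdf`, the rounding of the bin count up to an integer, and the sample series are assumptions.

```python
import math

def empirical_pdf(series_list, bin_width=1.0, v_min=0.0, v_max=None):
    """Pool several daily series and return normalized bin frequencies.

    Frequencies are scaled so that sum(freq) * bin_width equals 1.
    Assumes all values are >= v_min.
    """
    # Pool the values of all series; they are not averaged across series.
    pooled = [v for series in series_list for v in series]
    if v_max is None:
        v_max = max(pooled)
    # Number of bins from the value range (rounded up here, an assumption).
    n_bins = max(1, math.ceil((v_max - v_min) / bin_width))
    counts = [0] * n_bins
    for v in pooled:
        # Values on or above the top edge are counted in the last bin.
        k = min(int((v - v_min) / bin_width), n_bins - 1)
        counts[k] += 1
    norm = len(pooled) * bin_width
    return [c / norm for c in counts]

# Two hypothetical observed series (mm/day) inside one grid square.
obs_a = [0.0, 0.2, 1.5, 3.9]
obs_b = [0.0, 0.7, 2.2, 0.4]
pdf = empirical_pdf([obs_a, obs_b])
print(pdf)
```

For a GCM PDF the same function would be called with a list containing the single simulated series for the grid square.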

The SS based on empirical PDFs developed by Perkins et al. (2007) was used because of its simplicity and applicability across variables, spatial scales, and time periods. For each bin, the minimum of the modeled \( \left({\mathrm{Fg}}_n\right) \) and observed \( \left({\mathrm{Fo}}_n\right) \) normalized frequencies was taken. The SS is the sum of these minimum frequency values over all bins (Eq. 1).

$$ \mathrm{SS}={\displaystyle \sum_{n=1}^{N_{\mathrm{b}}} \min \left({\mathrm{Fg}}_n,{\mathrm{Fo}}_n\right)} $$
(1)

\( {N}_{\mathrm{b}} \) is the number of bins and is calculated using Eq. 2.

$$ {N}_{\mathrm{b}}=\left({V}_{\max }-{V}_{\min}\right)/{S}_{\mathrm{b}} $$
(2)

where \( {V}_{\max } \) and \( {V}_{\min } \) are the maximum and minimum values of the precipitation, respectively. The SS ranges between 0 and 1. It is close to 1 when the observed PDF and the GCM PDF are similar, and it is close to 0 if the overlap is negligible. A limitation of the methodology is that as events become rarer, the failure of a model to simulate them becomes less important to the SS (Perkins et al. 2007).
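As a small worked illustration of Eq. 1, the sketch below sums the bin-wise minimum of two normalized frequency lists. It assumes a 1 mm/day bin width, so that the frequencies of each PDF sum to 1; the function name `skill_score` and the two PDFs are hypothetical.

```python
def skill_score(fg, fo):
    """Eq. 1: sum over bins of the smaller of the two normalized frequencies."""
    n = max(len(fg), len(fo))
    # Pad the shorter PDF with empty bins so both cover the same range.
    fg = list(fg) + [0.0] * (n - len(fg))
    fo = list(fo) + [0.0] * (n - len(fo))
    return sum(min(g, o) for g, o in zip(fg, fo))

fo = [0.625, 0.125, 0.125, 0.125]  # hypothetical observed PDF
fg = [0.25, 0.5, 0.125, 0.125]     # hypothetical GCM PDF
ss = skill_score(fg, fo)
print(ss)  # 0.625
```

Identical PDFs give an SS of 1, and PDFs with no common occupied bin give 0. Note that the last two bins contribute the same amount whether or not much larger rare values are missed, which is the limitation noted above.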

In this study, this limitation was overcome by modification of the SS methodology.

We found that in cases where a model overestimated precipitation in general and had very high values for rare precipitation amounts, significant differences occurred between the observed and modeled mean values. The SSs estimated for these models were nevertheless not the lowest, because these rare events carry little weight in the SS. In the modified approach adopted in this study, the models were screened and separated into two groups: models with and without significant differences between the mean values of the observed dataset \( \left(\overline{o}\right) \) and the model simulation \( \left(\overline{g}\right) \). Then the SSs were estimated using Eq. 3.

$$ \mathrm{SS}=\left\{\begin{array}{ll}{\displaystyle \sum_{n=1}^{N_{\mathrm{b}}} \min \left({\mathrm{Fg}}_n,{\mathrm{Fo}}_n\right)} & \mathrm{if}\ \left(\overline{o}-\overline{g}\right)\ \mathrm{is}\ \mathrm{not}\ \mathrm{significant}\\ {}\mathrm{user}\hbox{-} \mathrm{defined} & \mathrm{otherwise}\end{array}\right. $$
(3)

Models with significant differences were penalized and assigned an SS value such that they had the lowest SS. In this study, the gir01 model was identified as the model with significant differences during the screening process and was assigned an SS value 0.1 lower than that of the model with the lowest SS for the grid square.
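The penalty step can be sketched as below for one grid square. The screening test for a significant difference in means is not specified in the text, so the set of flagged realizations is simply passed in; the function name `apply_penalty` and the SS values are hypothetical.

```python
def apply_penalty(ss_by_run, flagged, offset=0.1):
    """Give each flagged run an SS `offset` below the lowest unflagged SS."""
    # Lowest SS among the runs that passed the screening.
    floor = min(ss for run, ss in ss_by_run.items() if run not in flagged)
    return {run: (floor - offset if run in flagged else ss)
            for run, ss in ss_by_run.items()}

# Hypothetical SS values for three realizations in one grid square.
scores = {"miu01": 0.75, "mpi01": 0.5, "gir01": 0.625}
adjusted = apply_penalty(scores, flagged={"gir01"})
print(adjusted)
```

Here the flagged realization would otherwise rank second, but after the adjustment it has the lowest SS in the grid square, while the unflagged values are unchanged.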

Based on the temporal scale, this analysis was performed for three categories: the monsoon season (JJASO, June to October), the non-monsoon season (JFMAMND, January to May plus November and December), and the entire year ("Annual"). SSs were estimated for each grid square.
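The split of a daily series into the three categories can be sketched as follows. The sketch assumes a standard Gregorian calendar (some GCMs use other calendars, which would need separate handling); the function name `split_by_category` and the placeholder data are hypothetical.

```python
from datetime import date, timedelta

MONSOON_MONTHS = {6, 7, 8, 9, 10}  # June to October (JJASO)

def split_by_category(start, values):
    """Split a daily series beginning on `start` into the three categories."""
    out = {"Annual": [], "JJASO": [], "JFMAMND": []}
    for offset, v in enumerate(values):
        month = (start + timedelta(days=offset)).month
        out["Annual"].append(v)
        # Every day belongs to exactly one of the two seasonal categories.
        season = "JJASO" if month in MONSOON_MONTHS else "JFMAMND"
        out[season].append(v)
    return out

# One non-leap year of placeholder values starting on 1 January 1961.
parts = split_by_category(date(1961, 1, 1), [0.0] * 365)
print({name: len(vals) for name, vals in parts.items()})
```

In a non-leap year, JJASO covers 153 days and JFMAMND the remaining 212.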

SS variations in each category and precipitation zone (described in Section 2) were studied using a few statistics calculated based on the SSs in each grid square. The statistics of SS calculated were maximum, minimum, median, mean, standard deviation, and inter-quartile range; for example, the category Annual would have 39 SS values (one for each GCM realization) in each grid square. From the 39 values, the statistics were calculated for each grid square. The entire Indian region had 77 values (one for each grid square) for a statistic. The results of the statistics estimated are presented in Sections 4.2 and 4.3, and Fig. 4.
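For a single grid square, the six statistics can be computed from the SS values of all realizations as in the sketch below. The choice of the sample standard deviation and of the "inclusive" quartile method are assumptions, since the text does not state which definitions were used; the function name `ss_summary` and the five SS values are hypothetical (the study has 39 per grid square).

```python
import statistics

def ss_summary(ss_values):
    """Summary statistics of the SS values in one grid square."""
    # Quartiles by the 'inclusive' method (an assumed definition).
    q1, _, q3 = statistics.quantiles(ss_values, n=4, method="inclusive")
    return {
        "max": max(ss_values),
        "min": min(ss_values),
        "median": statistics.median(ss_values),
        "mean": statistics.fmean(ss_values),
        "sd": statistics.stdev(ss_values),  # sample SD (an assumed definition)
        "iqr": q3 - q1,
    }

summary = ss_summary([0.5, 0.625, 0.75, 0.875, 1.0])
print(summary)
```

Applying this function to each of the 77 grid squares yields 77 values per statistic for a category.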

Fig. 4

Typical figures showing the variation in the probability density functions (PDFs) for the top five and last five models for the northwest zone and the three categories (Annual, JJASO, JFMAMND). The GCMs with top five and bottom five rankings were obtained from Table 4. The observed PDFs for the region are in black

To evaluate the performance of individual models in GCM realizations in each category, the SSs estimated for each grid square were ranked. For example, a grid square in a category (e.g., Annual) would have 39 SS values (one for each GCM realization). Ranks were assigned to these 39 SS values. The realization with the highest SS was assigned rank 1, and the realization with the lowest SS was assigned rank 39. Thus, in a category, each grid square and GCM realization was ranked between 1 and 39. For GCMs with multiple runs, the ranks were first averaged across the runs. Each of the 19 GCMs had 77 rank values (one for each grid square) for a category; for example, the GCM cc5 had five runs. The ranks of the five runs were averaged for each of the 77 grid squares, and these values in a category were again averaged to get the ranking of GCMs for the entirety of India and each of the six zones. Thus, a GCM had a rank for a category, zone, and all of India (Table 4).
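The ranking procedure can be sketched in a few lines. The sketch ranks realizations within each grid square, averages the ranks of runs belonging to the same GCM, and then averages over grid squares. Tie handling is not described in the text, so ties here simply follow sort order; the function names, model labels (`aaa`, `bbb`), and SS values are hypothetical.

```python
from collections import defaultdict

def rank_runs(ss_by_run):
    """Rank runs in one grid square; rank 1 is the highest SS."""
    ordered = sorted(ss_by_run, key=lambda run: ss_by_run[run], reverse=True)
    return {run: pos for pos, run in enumerate(ordered, start=1)}

def gcm_mean_ranks(ss_by_square, run_to_gcm):
    """Average ranks across runs of a GCM per square, then across squares."""
    per_gcm = defaultdict(list)
    for ss_by_run in ss_by_square:
        by_gcm = defaultdict(list)
        for run, rank in rank_runs(ss_by_run).items():
            by_gcm[run_to_gcm[run]].append(rank)
        for gcm, ranks in by_gcm.items():
            per_gcm[gcm].append(sum(ranks) / len(ranks))
    return {gcm: sum(vals) / len(vals) for gcm, vals in per_gcm.items()}

# Two hypothetical grid squares, one GCM with two runs and one with a single run.
run_to_gcm = {"aaa01": "aaa", "aaa02": "aaa", "bbb01": "bbb"}
squares = [
    {"aaa01": 0.9, "aaa02": 0.7, "bbb01": 0.8},
    {"aaa01": 0.9, "aaa02": 0.8, "bbb01": 0.5},
]
mean_ranks = gcm_mean_ranks(squares, run_to_gcm)
print(mean_ranks)  # {'aaa': 1.75, 'bbb': 2.5}
```

Restricting `squares` to the grid squares of one zone gives the zone-level value in the same way.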

Table 4 The ranking of the GCMs for the three categories (Annual, JJASO, JFMAMND), six zones (peninsular, west central, central northeast, northwest, northeast, hilly region), and for all of India

Subgroups of models that share common features were formed to connect SSs and model characteristics. The model characteristics considered for the analysis included (1) horizontal resolutions of the GCMs, (2) convective scheme employed for precipitation parameterization, and (3) flux correction at the ocean–atmosphere interface (Table 3). Based on horizontal resolution, we divided the models into three groups (high, medium, and low) and compared them with the SSs. In general, the horizontal grid spacings of the three groups are <2°, 2–3°, and >3° for the high-, medium-, and low-resolution groups, respectively (Kim et al. 2008). Based on the convective scheme used, the models were divided into four groups: RAS (relaxed Arakawa–Schubert), MC (moist convection adjustment), MF (mass flux-based), and AS (Arakawa–Schubert) (Kripalani et al. 2007a). Based on flux correction at the ocean–atmosphere interface, they were divided into groups with no flux correction (N), heat (H), water (W), or momentum (M) (Dai 2006; Kripalani et al. 2007a).

4 Results

4.1 Analysis of observed data

The average daily precipitation for each month (mm/day) was calculated for each zone using the observational dataset (Fig. 2a). The PDFs of the observed data for the six zones and the entire country (Fig. 2b) are provided. During July and August, the entire subcontinent comes under the influence of the monsoon. The monsoon starts retreating from northwestern India in early September, but it continues almost until December in the far south. The retreating monsoon is also responsible for precipitation in parts of the Indian peninsula (Roy 2009), as indicated by a peak in October in Fig. 2a. The northeast zone has the highest annual precipitation in the country. Cherrapunji and Mawsynram, the two well-known stations with the highest annual rainfall (Jenamani et al. 2006), are located in this region. During the dry summer months of March through May, there are also convective storms in certain parts of the subcontinent (Roy 2009) that cause higher precipitation in hilly regions. Orographic uplift of air causes heavy precipitation on the windward side of mountains. This is also the cause of high precipitation over the western coast in the peninsular region, which is on the windward side of the Western Ghats. The northwest zone encompasses the desert regions that have the least precipitation.

4.2 Pan-India assessment of SS from GCM realizations

The results of the statistics (maximum, minimum, median, mean, standard deviation, and inter-quartile range of GCM realizations) estimated from the SSs for each grid square (described in Section 3) are presented in Fig. 4. The variation in SSs among the 77 grid squares and the three categories is shown. In this subsection, the four values in parentheses represent the maximum, minimum, mean, and median SSs. Among the three categories, for the whole country, the SSs in the JFMAMND category were slightly better, as they had higher maximum, minimum, mean, and median SS values (~1, 0.88, 0.96, and 0.99), followed by Annual (0.99, 0.73, 0.90, and 0.96) and JJASO (0.98, 0.72, 0.87, and 0.92). The JJASO category had a slightly higher standard deviation in the maximum, minimum, mean, and median SSs (0.22, 0.12, 0.12, and 0.15) and inter-quartile range for the above statistics (0.42, 0.18, 0.29, and 0.24) compared with the JFMAMND [(0.20, 0.13, 0.11, and 0.17), (0.40, 0.16, 0.23, and 0.26)] and Annual categories [(0.18, 0.12, 0.07, and 0.14), (0.35, 0.16, 0.19, and 0.22)].

4.3 Regional assessment of skill score from GCM realizations

The SS statistics (maximum, minimum, median, mean, standard deviation, and inter-quartile range of GCM realizations) estimated for individual grid squares were averaged across the six zones in India (Fig. 3). This provides insight into the relative accuracies and inaccuracies in model simulation on a regional basis. The figure shows that SS statistics vary with grid square, zone, and category. In this subsection, the values in parentheses represent the SSs for the three categories (Annual, JJASO, JFMAMND). Among the zones, SSs in the northwest were better for these three categories. This zone had the highest mean SSs (0.87, 0.76, and 0.94), median SSs (0.90, 0.80, and 0.95), and maximum SSs (0.97, 0.94, and 0.99). The peninsular zone had the lowest mean SSs (0.66, 0.56, and 0.72) and minimum SSs (0.37, 0.33, and 0.31) for the three categories. This zone also had the lowest median SSs (0.69 and 0.53) and maximum SSs (0.88 and 0.83) for two categories (Annual and JJASO). The hilly region had the lowest mean SSs (0.72), median SSs (0.73), and maximum SSs (0.87) for the category JFMAMND. This zone had the second-lowest SSs for the three categories. The SS statistics for the rest of the zones (west central, central northeast, and northeast) were in between for the three categories. For the Annual category, the standard deviation and inter-quartile range of SS were lowest for the hilly region and highest for the peninsular zone. For the JJASO category, they were lowest for the hilly region and highest for the northwest zone. For the JFMAMND category, they were lowest for the northwest zone and highest for the peninsular zone.

4.4 Ranking of individual GCM

The ranking of GCMs is explained in Section 3 and presented in Table 4. The SSs among the different runs of a GCM were comparable (Fig. 5), so the ranks estimated for different simulations with a single model were averaged in this study. From Table 4, it can be observed that the ranking of GCMs varies with category and zone. No single model can be considered the best for all zones and categories. The models miu, mpi, mim, mri, and mih ranked in the top five for Annual for India and for zones such as peninsular, west central, and central northeast. The models miu, mpi, mim, and mri ranked in the top five for category JJASO for India and for zones such as peninsular, west central, central northeast, and northwest. The models miu, mpi, and mri ranked in the top five for category JFMAMND for India and for zones such as peninsular and northwest. The models (miu, mpi, mri, or mih) that performed well for India and most zones did not rank in the top five models for the hilly region and northeast zone for the three categories (except mpi, which ranked fifth for the JFMAMND category). The gir model had the lowest rank for all categories, zones, and India. Models gao and cs5 ranked in the bottom five for India and most zones, whereas in the hilly region they ranked in the top six for all three categories.

Fig. 5

Typical figures showing the variation in the probability density functions (PDFs) for the top five and bottom five models for the northeast zone and the three categories (Annual, JJASO, and JFMAMND). The GCMs with top five and bottom five rankings were obtained from Table 4. The observed PDFs for the region are in black

4.5 Sensitivity analysis

To test the sensitivity of the GCM rankings for the three categories, six zones, and all of India, PDFs of the GCMs in the top five and bottom five rankings were plotted. From the PDF plots, it was observed that the variability of the PDFs was sensitive to the ranks of the GCMs, the zones, and the categories. Typical PDFs are shown for the northwest, northeast, and peninsular zones in Figs. 4, 5, and 6. The PDFs from GCMs and observed precipitation for the six zones exhibited their lowest variability in the JFMAMND category (column 3, Figs. 4 through 6), followed by the Annual (column 1, Figs. 4 through 6) and JJASO (column 2, Figs. 4 through 6) categories. For all zones and categories, the variability of the observed PDFs was less than the variability of the PDFs from GCMs (considering top five and bottom five rankings). For all three categories and six zones (except JJASO for the northeast, peninsular, and hilly regions), the variability in PDFs from the top-five-ranked GCMs was lower than the variability from the bottom-five-ranked GCMs. In some zones (e.g., northeast), the top five GCMs do much better than the bottom five, whereas in others (e.g., northwest), the difference is less pronounced.

Fig. 6

Typical figures showing the variation in the probability density functions (PDFs) for the top five and bottom five models for the peninsular zone and the three categories (Annual, JJASO, and JFMAMND). The GCMs with top five and bottom five rankings were obtained from Table 4. The observed PDFs for the region are in black

The results of the subgroups of models that share common features (horizontal resolutions of the GCMs, convective scheme employed for precipitation parameterization, flux correction at the ocean–atmosphere interface) show no clear connection between SSs (obtained using the entire distribution) and model characteristics, because models with high SSs did not belong to a particular group in terms of horizontal resolution, convective scheme, and flux correction (Fig. S1a,b,c).

5 Discussion

Annamalai et al. (2007) evaluated the 18 models participating in the IPCC AR4 for the Asian summer monsoon (ASM) region using pattern correlations and root-mean-square differences (RMSDs) relative to the observed precipitation. They used seasonal averages (June to September) of precipitation climatology for 30 years (1971–2000). They found that six models (gf0, gf1, mim, mih, Hadcm3, and NCAR PCM) had larger pattern correlations and smaller RMSDs with observations. Similar results (high SSs) were observed in this study for India at daily timescales for two of the six models (mri and mpi). Models (gf0, gf1) that performed well in their study for the ASM region did not perform as well in the present study for India and the zones. This could be because these models had some significant systematic errors and therefore have difficulty capturing the regional details in precipitation over India, particularly the high precipitation along the west coast (Annamalai et al. 2007). The Hadcm3 and NCAR PCM models were not used in this study, so their performance was not compared. Annamalai et al. (2007) found the mri model simulation the most realistic in simulating the annual cycle of the ASM region. Preethi et al. (2010) found that the ing model had higher skill than mpi in simulating ISM climatology as well as inter-annual variability; however, the results of this study indicate that the mpi model had better SSs in simulating daily precipitation than the ing model for the JJASO category (except in the hilly region), the JFMAMND category (except the central northeast and northeast zones), and the Annual category (except the northeast zone). Among the seven models used in their study, ing and mpi are the two models common to this study. Kripalani et al. (2007b) used monthly precipitation values to calculate the annual cycle over the south Asian region and found the bar and ips models unable to simulate the annual cycle accurately. The results of this study show a higher skill for the ips model in the Annual category for the Indian region.

Among the different zones considered in India, SSs in the peninsular zone and hilly region were generally lower than in the rest of the zones. The results of this study show that considerable improvements in the hilly regions are desirable, for example, through improved representation of mountains and high terrain. A similar conclusion was reached by Dai (2006). The peninsular zone in this study has contrasting precipitation patterns, with the west coast of the zone having heavy precipitation and the southeast having very low precipitation (Preethi et al. 2010). Within the west coast of this zone, a mountain range (Western Ghats) runs parallel to the coast, and the axis of the range lies perpendicular to the prevailing summer-monsoon winds. The moisture-laden monsoon winds cause heavier rainfall on the windward side of the mountains than in the rain shadow on the leeward side of the range (Basu 2005; Suprit and Shankar 2008). The low SSs in the peninsular zone could be due to significant large-scale precipitation biases in the models, such as reduced precipitation along the western coast of India and excessive rain over the Indian peninsula observed by Bollasina and Nigam (2009). Their results were based on the comparison of seasonal precipitation (June through September) from models (gf1, mim, mpi, CCSM3, and Hadcm3) to observed precipitation. The grid square resolution in GCMs is too coarse to represent both the high and the low precipitation on the windward and leeward sides of the mountains along the west coast of the zone, which could be another reason for low SSs in this zone. Especially in mountain regions, simulated precipitation is highly dependent upon model resolution (Pan et al. 2011). The low performance of GISS-ER could be attributed to its coarse resolution and its inability to simulate extreme rainfall and the seasonal cycle (Vidyunmala 2008; Kharin et al. 2005).

Among the three categories considered in the study, JFMAMND had higher SSs compared with JJASO. The reason for this could be that the JFMAMND category has less precipitation than the JJASO category (Fig. 2a). The large spread in the PDFs from GCMs (column 2, Figs. 5 and 6) indicated that, in general, most GCMs were not able to capture the monsoon in high-rainfall zones (northeast and peninsular zones).

Sun et al. (2006) classified daily precipitation rates into two categories: light (1–10 mm/day) and heavy (>10 mm/day) precipitation. They studied seven fully coupled climate models and found that most of them overestimated the frequency of light precipitation and underestimated the frequency of heavy precipitation. The results (Figs. 4 and 5) from our study show that daily precipitation rates of 0–1 mm/day were underestimated by most models, and daily precipitation rates of 1–15 mm/day were overestimated by most models. The overestimation of 1–10 mm/day precipitation by most models agrees with the results of Sun et al. (2006), even for the smaller regions considered in this study. The 10–15 mm/day precipitation is also overestimated at smaller spatial scales in our study, whereas this range is underestimated by most models in the study by Sun et al. (2006). The difference could be because smaller regions and seasons are examined in this study. From the PDF analysis, it is unclear whether the higher precipitation rates (>30 mm/day) are underestimated, because the probability of their occurrence is very low.
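The comparison of frequencies by intensity class can be illustrated with a short sketch. It assumes class boundaries of below 1 mm/day, 1 to 10 mm/day (light), and above 10 mm/day (heavy), with the treatment of the boundary values chosen here for convenience; the function name `class_frequencies` and both series are hypothetical.

```python
def class_frequencies(values):
    """Fraction of days below 1, from 1 to 10, and above 10 mm/day."""
    n = len(values)
    below_one = sum(1 for v in values if v < 1.0)
    light = sum(1 for v in values if 1.0 <= v <= 10.0)
    heavy = sum(1 for v in values if v > 10.0)
    return {"below_1": below_one / n, "light": light / n, "heavy": heavy / n}

# Hypothetical observed and simulated daily series (mm/day).
observed = [0.0, 0.5, 2.0, 9.0, 12.0, 30.0, 0.2, 5.0]
simulated = [0.0, 1.5, 2.0, 9.0, 3.0, 11.0, 4.0, 5.0]
f_obs = class_frequencies(observed)
f_sim = class_frequencies(simulated)
# A positive bias means the class occurs more often in the simulation.
bias = {name: f_sim[name] - f_obs[name] for name in f_obs}
print(bias)
```

In this made-up case the simulation has too few days below 1 mm/day and too many light-precipitation days, which is the kind of pattern described above.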

No clear relationship between model characteristics (horizontal resolution, convective scheme, and flux correction) and SS could be discerned. One explanation is that several institutions have contributed a set of two or three climate models, and modeling groups share parts of code, input datasets, and expertise, so parts of the model bias may be similar in some or all models (Jun et al. 2008; Knutti et al. 2010). Therefore, it may not be possible to attribute high or low SSs to the specific GCM characteristics responsible for the biases. However, our results are in agreement with the results from Kharin et al. (2007), who found no statistically significant dependence of the magnitude of precipitation extremes on the model resolution in the tropics.

Model accuracy is also affected by the unpredictability of the Indian monsoon, which is influenced by a multitude of physical processes and interactions, orography, and its interaction with the circulation (e.g., ENSO). The various theories explaining the onset of the Indian monsoon are discussed in Chakraborty et al. (2006), and the challenges of modeling the monsoon are discussed in Turner and Annamalai (2012). Some of the problems of the AR4 models in simulating the monsoon climate are briefly stated below and are addressed in more detail in the cited references.

  • Most monsoon depressions in the Bay of Bengal cause extreme rainfall (>100 mm/day) in the region, but, due to the coarse resolution of the GCMs, it is unknown if the depressions are the reason for the extreme rainfall (Turner and Annamalai 2012).

  • The interannual variation of the summer monsoon rainfall over the Indian region is not correctly simulated by the GCMs (Gadgil et al. 2005). This could be due to the GCMs' inability to simulate the special nature of the SST–rainfall relationship over regions such as the West Pacific Ocean, the Bay of Bengal, and the South China Sea (Rajendran et al. 2012; Wang et al. 2005) and to improper simulation of monsoon-related teleconnections (Nanjundiah et al. 2013).

  • There are biases and discrepancies such as a double intertropical convergence zone (ITCZ) (Islam et al. 2013).

  • In nature, in the Indian monsoon zone, the continental tropical convergence zone (TCZ) is dominant, and the oceanic TCZ appears intermittently throughout the summer. Many models seem to have a tendency to get locked into either the oceanic or the continental TCZ (Gadgil and Sajani 1998).

  • Emissions of scattering and absorbing aerosols in the region are found to affect the monsoon climate (Chakraborty et al. 2004). The uncertainty in the level of these emissions and the limited ability to model their impact on the monsoon are both problems (Sajani et al. 2012; Turner and Annamalai 2012). Most models fail to simulate the link between the Equatorial Indian Ocean Oscillation (EQUINOO) and ISM (Nanjundiah et al. 2013). Rajeevan and Nanjundiah (2009) found overestimation of rainfall over the Equatorial Indian Ocean and errors in simulating the seasonal cycle of rainfall over both the Eastern and Western Equatorial Indian Oceans.

Some of the reasons for uncertainty of the AR4 models in simulating the monsoon climate stated in the literature are as follows:

  • The SST response to anthropogenic forcing affects the available moisture and varies among models, causing uncertainty (Turner and Annamalai 2012). Differences in the factors affecting monsoon–SST relationships, such as air–sea coupling and SST bias, could cause variability among models (Islam et al. 2013).

  • Rajeevan and Nanjundiah (2009) examined the role of SST and its interaction with cumulus convection. They found that the models showed compensation between errors in SST (underprediction) and errors in rainfall for a given SST (overprediction). This response differed among models and could be a cause of errors.

6 Conclusions

Daily precipitation from a suite of GCMs participating in the Intergovernmental Panel on Climate Change's Fourth Assessment Report (IPCC AR4) for the 20th-century climate (20C3M scenario) was evaluated for the Indian region. The SSs were estimated from the PDFs. The methodology from earlier studies was modified to take into account high extreme precipitation events simulated by GCMs. The SSs were estimated at every 2.5° × 2.5° grid square. Results are presented for three categories and six zones. The three categories are the monsoon season (JJASO: June to October), the non-monsoon season (JFMAMND: January to May, November, and December), and the entire year ("Annual"). The six precipitation zones considered are peninsular, west central, northwest, northeast, central northeast India, and the hilly region. Sensitivity analysis was performed for the three categories at three spatial scales: the 2.5° grid square, the zones, and the whole of India. The observational dataset for India is the 1° × 1° daily precipitation dataset prepared by the National Climate Centre of the IMD. The models were ranked based on the SS.
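The idea of scoring and ranking models from their PDFs can be illustrated with a short Python sketch. It assumes that the SS is a PDF-overlap measure, that is, the sum over bins of the smaller of the modeled and observed relative frequencies. It does not reproduce the modification for high extreme events described above. The 1 mm/day bin width, the 50 bins, the names `model_a`, `model_b`, and `model_c`, and all data values are hypothetical.

```python
def empirical_pdf(values, bin_width=1.0, n_bins=50):
    """Relative frequency per bin; totals beyond the last edge go in the last bin."""
    if not values:
        raise ValueError("need at least one daily value")
    counts = [0] * n_bins
    for value in values:
        if value < 0:
            raise ValueError("precipitation cannot be negative")
        index = min(int(value // bin_width), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]


def overlap_skill_score(model_pdf, observed_pdf):
    """Sum of the bin-wise minimum of two PDFs: 1 is identical, 0 is no overlap."""
    return sum(min(m, o) for m, o in zip(model_pdf, observed_pdf))


def rank_models(model_series, observed, bin_width=1.0, n_bins=50):
    """Return (name, score) pairs sorted from highest to lowest skill score."""
    observed_pdf = empirical_pdf(observed, bin_width, n_bins)
    scores = {
        name: overlap_skill_score(
            empirical_pdf(series, bin_width, n_bins), observed_pdf
        )
        for name, series in model_series.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up daily series (mm/day) for one grid square; not data from the study.
observed = [0.0, 0.0, 0.2, 0.5, 3.0, 12.0, 0.0, 45.0, 0.1, 7.0]
model_series = {
    "model_a": list(observed),  # identical to observations
    "model_b": [5.5] * 10,      # no overlap with the observed PDF
    "model_c": [0.0, 0.3, 1.2, 3.4, 3.9, 12.5, 0.0, 20.0, 0.8, 7.7],
}

ranking = rank_models(model_series, observed)
for name, score in ranking:
    print(f"{name}: SS = {score:.2f}")
```

A score computed this way depends on the bin width and on how the sparse upper tail is binned, so the choices made here should be treated as placeholders rather than as the settings used in this study.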

The results indicate that no single model performs best for all the categories and zones considered. The JFMAMND category had higher SSs than the JJASO category. In general, among the zones, the northwest zone had higher SSs, whereas the peninsular zone and the hilly region had lower SSs.

The models are ranked for the various categories and zones considered in this study. Impact groups could use this evaluation as a basis for choosing climate models for subsequent study. The models miu, mpi, and mri ranked in the top five for the three categories for India and for most zones, except the hilly region and the northeast zone. The gir model had the lowest rank for all categories, all zones, and all of India. The models gao and cs5 ranked in the bottom five for India and most zones, except in the hilly region, where they ranked in the top six for all three categories.

The PDFs varied with the rank of the GCM, the zone, and the category. Results show that most models underestimated daily precipitation rates of 0–1 mm/day and overestimated rates of 1–15 mm/day. The overestimation of 1–10 mm/day precipitation by most models agrees with the results of Sun et al. (2006), even for the smaller regions considered in this study; however, the overestimation of 10–15 mm/day precipitation at the smaller spatial scales in our study differs from Sun et al. (2006). The difference could arise because smaller regions and individual seasons are examined in this study.

We propose to study the subset of the best GCMs and to further analyze how their PDFs change under different climate change scenarios. Mean precipitation can be affected by a spectrum of temporal scales, and it is possible for a model to generate a "correct" seasonal mean value without properly capturing the underlying precipitation variability (DeMott et al. 2007). These issues are not frequently explored in model evaluation. This evaluation approach will be extended to temperature; research in this direction is underway.