BY-NC-ND 3.0 license Open Access Published by De Gruyter Open Access December 12, 2016

A best-fit probability distribution for the estimation of rainfall in northern regions of Pakistan

  • M. T. Amin, M. Rizwan and A. A. Alazba
From the journal Open Life Sciences

Abstract

This study was designed to find the best-fit probability distribution of annual maximum rainfall based on a twenty-four-hour sample in the northern regions of Pakistan using four probability distributions: normal, log-normal, log-Pearson type-III and Gumbel max. Based on the scores of goodness of fit tests, the normal distribution was found to be the best-fit probability distribution at the Mardan rainfall gauging station. The log-Pearson type-III distribution was found to be the best-fit probability distribution at the rest of the rainfall gauging stations. The maximum values of expected rainfall were calculated using the best-fit probability distributions and can be used by design engineers in future research.

1 Introduction

Pakistan is located in South Asia, in the northern and eastern hemispheres, at approximately 33.6667° N latitude and 73.1667° E longitude. Pakistan experiences a diversified climate throughout the year. The minimum temperature is as low as −25°C in northern areas, and the maximum temperature is as high as 55°C in southern areas. Most of Pakistan experiences a dry climate, while humid conditions prevail in northern areas. In Pakistan, monsoons and evaporation from western depressions are the sources of rainfall. Monsoons contribute 65 to 75% of the total rainfall in Pakistan. Rainfall, which feeds lakes and rivers, is the most vital natural source of water for humans, animals and crops. Predicting the future occurrence and distribution of rainfall from the amounts received in previous years has proved difficult, and the results have been unreliable. Hydrological events such as rainfall, which occurs as a natural phenomenon, are observed at the event scale. The efficient management and use of water resources can be enhanced by rainfall analyses using probability distributions and annual maximum daily rainfall [1]. The expected rainfall for different return periods is determined through probability and frequency analysis of rainfall data [2]. To reduce flood damage and to design and construct hydrologic projects such as dams, dykes, and urban drainage systems, the management and implementation of water resource strategies require reliable data regarding extreme events with high return periods [3]. Various probability distributions are currently used to predict expected rainfall for different return periods, as rainfall varies with time and location [4]. Frequency analyses of rainfall data have been performed for different return periods [5-9]. The expected rainfall values for different return periods, which may be greater or less than the recorded rainfall values, are estimated using a fitted distribution.
The damage caused by storms can be reduced by the precise estimation of extreme rainfall, leading to the efficient design of hydraulic structures. A number of probability models have been developed to depict the distribution of extreme rainfall at a site [3]. The choice of an appropriate distribution model is one of the major problems in engineering practice. The selection mainly depends on the available rainfall data at a particular site. To find a suitable distribution model that will provide accurate estimates of extreme rainfall, it is necessary to evaluate the available distribution models. The probability models most commonly used to estimate rainfall frequency are the normal, log-normal, log-Pearson type-III and Gumbel distributions. The objective of the study is to find the best-fit probability model and perform a probability analysis of 24-hour annual maximum rainfall in northern Pakistan, as rainfall in this area is the main source of water for the irrigation network in the country.

2 Material and methods

Probability distributions are basic concepts in statistics. They link the outcomes of statistical experiments to their probabilities of occurrence. Rainfall data from northern Pakistan were evaluated with four probability models to find the best-fit model: the normal (N), log-normal (LN), log-Pearson type-III (LP3) and Gumbel (EV I) probability models.

2.1 Normal distribution

The normal distribution is one of the most widely used continuous probability distributions. The probability density function (PDF) and cumulative distribution function (CDF) of the normal distribution are given by Eqs. (1) and (2), respectively:

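$$f(x) = \frac{\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]}{\sigma\sqrt{2\pi}} \tag{1}$$

$$F(x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right] \tag{2}$$

where ‘μ’ is the location parameter, ‘σ’ is the scale parameter and ‘Φ’ is the Laplace integral.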

In the normal distribution, the maximum value of expected rainfall (XT) corresponding to any return period (T) can be calculated using Eq. (3):

$$X_T = \bar{X}\left(1 + C_v K_T\right) \tag{3}$$

where ‘XT’ is the maximum value of expected rainfall, X¯ is the mean, ‘Cv’ is the coefficient of variation and ‘KT’ is the frequency factor, which depends on the return period and probability distribution. ‘KT’ is calculated using the following equation.

$$K_T = \frac{X_T - \mu}{\sigma} \tag{4}$$

The frequency factor (KT) is the same as the standard normal variate ‘z’, which is calculated using Eq. (5).

$$z = w - \frac{2.515517 + 0.802853\,w + 0.010328\,w^{2}}{1 + 1.432788\,w + 0.189269\,w^{2} + 0.001308\,w^{3}} \tag{5}$$

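In Eq. (5), ‘w’ can be expressed as follows:

$$w = \left[\ln\left(\frac{1}{p^{2}}\right)\right]^{1/2} \qquad (0 < p \le 0.50) \tag{6}$$

where ‘p’ is the exceedance probability (p = 1/T). When p > 0.5, 1 − p is substituted for ‘p’ in Eq. (6), and the resulting value of ‘z’ is given a negative sign.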
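As a minimal sketch of Eqs. (3), (5) and (6), assuming Python, the frequency factor and the corresponding rainfall estimate can be computed as shown here. The function names are illustrative and not part of the study; the mean and standard deviation in the example are the Mardan values from Table 1; the sign of z is flipped when p > 0.5, which is the usual convention for this approximation. The exact normal quantile from the standard library is printed alongside for comparison.

```python
import math
from statistics import NormalDist


def frequency_factor_normal(return_period: float) -> float:
    """Approximate the standard normal variate z for a return period T > 1 (Eqs. 5-6)."""
    p = 1.0 / return_period              # exceedance probability, p = 1/T
    flip = p > 0.5                       # Eq. (6) is defined for 0 < p <= 0.5
    if flip:
        p = 1.0 - p
    w = math.sqrt(math.log(1.0 / p**2))  # Eq. (6)
    numerator = 2.515517 + 0.802853 * w + 0.010328 * w**2
    denominator = 1.0 + 1.432788 * w + 0.189269 * w**2 + 0.001308 * w**3
    z = w - numerator / denominator      # Eq. (5)
    return -z if flip else z


def expected_rainfall_normal(mean: float, std: float, return_period: float) -> float:
    """Eq. (3): X_T = mean * (1 + Cv * K_T), which equals mean + K_T * std."""
    cv = std / mean
    return mean * (1.0 + cv * frequency_factor_normal(return_period))


if __name__ == "__main__":
    # Mardan mean and standard deviation (Table 1), used here only as example inputs.
    for T in (2, 5, 10, 20, 50, 100, 200):
        z_approx = frequency_factor_normal(T)
        z_exact = NormalDist().inv_cdf(1.0 - 1.0 / T)
        x_t = expected_rainfall_normal(77.57, 27.73, T)
        print(f"T={T:>3}  z_approx={z_approx:.4f}  z_exact={z_exact:.4f}  X_T={x_t:.2f}")
```

The rational approximation in Eq. (5) and the exact quantile should agree to about three decimal places, which is more than enough for rainfall estimates reported to 0.01 mm.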

2.2 Log-normal distribution

The log-normal distribution is the distribution of a random variable whose logarithm is normally distributed; that is, a random variable X is log-normally distributed when Y = ln(X) is normally distributed. The probability density function (PDF) and cumulative distribution function (CDF) of the log-normal distribution are given by Eqs. (7) and (8), respectively:

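$$f(x) = \frac{\exp\left[-\frac{1}{2}\left(\frac{\ln(x-\gamma)-\mu}{\sigma}\right)^{2}\right]}{(x-\gamma)\,\sigma\sqrt{2\pi}} \tag{7}$$

$$F(x) = \Phi\left(\frac{\ln(x-\gamma)-\mu}{\sigma}\right) = \frac{1}{2}\left[\operatorname{erfc}\left\{-\frac{\ln(x-\gamma)-\mu}{\sigma\sqrt{2}}\right\}\right] \tag{8}$$

where ‘μ’ is the shape parameter, ‘σ’ is the scale parameter, ‘γ’ is the location parameter and ‘Φ’ is the Laplace integral.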

The log-normal distribution assumes that Y = ln(X); therefore, the maximum value of expected rainfall (XT) corresponding to any return period (T) can be calculated using Eqs. (9) to (11):

$$X_T = \exp\left(Y_T\right) \tag{9}$$

$$Y_T = \bar{Y}\left(1 + C_{vy} K_T\right) \tag{10}$$

$$K_T = \frac{Y_T - \mu_y}{\sigma_y} \tag{11}$$

where Y¯ and ‘Cvy’ are the mean and coefficient of variation of ‘Y’, respectively. ‘KT’ is the frequency factor, which is the same as the standard normal variate and can be computed using Eq. (5).
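The log-normal calculation in Eqs. (9) to (11) can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is hypothetical, the two-parameter case (γ = 0) is assumed, the exact normal quantile from the standard library is used in place of the approximation in Eq. (5), and the example inputs are the Kalam log-normal parameters from Table 2.

```python
import math
from statistics import NormalDist


def expected_rainfall_lognormal(mu_y: float, sigma_y: float, return_period: float) -> float:
    """Eqs. (9)-(11): X_T = exp(Y_T), with Y_T = mu_y + K_T * sigma_y (requires T > 1)."""
    # K_T is the standard normal variate z for non-exceedance probability 1 - 1/T.
    k_t = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    # Equivalent to Y_bar * (1 + Cvy * K_T), because Cvy = sigma_y / Y_bar.
    y_t = mu_y + k_t * sigma_y
    return math.exp(y_t)


if __name__ == "__main__":
    # Kalam log-normal parameters from Table 2 (mu = 4.06, sigma = 0.375), example only.
    for T in (2, 10, 100):
        print(T, round(expected_rainfall_lognormal(4.06, 0.375, T), 2))
```

For T = 2 the frequency factor is zero, so the estimate reduces to exp(μy), the median of the fitted distribution.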

2.3 Log-Pearson type-III distribution

The log-Pearson type-III distribution has been widely and frequently used in hydrology and for hydrologic frequency analyses since the recommendation of this distribution by U.S. federal agencies. The probability density function (PDF) and cumulative distribution function (CDF) of the log-Pearson type-III distribution are calculated using Eqs. (12) and (13), respectively:

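$$f(x) = \frac{1}{x\,|\beta|\,\Gamma(\alpha)}\left(\frac{\ln(x)-\gamma}{\beta}\right)^{\alpha-1}\exp\left(-\frac{\ln(x)-\gamma}{\beta}\right) \tag{12}$$

$$F(x) = \frac{\Gamma_{(\ln(x)-\gamma)/\beta}(\alpha)}{\Gamma(\alpha)} \tag{13}$$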

where ‘α’, ‘β’ and ‘γ’ are shape, scale and location parameters, respectively.

In the log-Pearson type-III distribution, the maximum value of expected rainfall (XT) corresponding to any return period (T) can be calculated using Eq. (14):

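$$X_T = \operatorname{antilog}\left[\log\left(X_T\right)\right] \tag{14}$$

$$\log\left(X_T\right) = \bar{X} + K_T S_d \tag{15}$$

$$K_T = \frac{2}{C_s}\left[\left\{\left(z - \frac{C_s}{6}\right)\frac{C_s}{6} + 1\right\}^{3} - 1\right] \tag{16}$$

where X¯, ‘Sd’, and ‘Cs’ are the mean, standard deviation and coefficient of skewness of the log-transformed (base-10) rainfall data, respectively, ‘z’ is the standard normal variate from Eq. (5), and ‘KT’ is the frequency factor.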
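A short Python sketch of Eqs. (14) to (16) follows. The function names are illustrative, base-10 logarithms are assumed (consistent with the expressions reported in Table 4), and the exact normal quantile from the standard library stands in for Eq. (5). The skewness of the log-transformed data is not reported in the text, so the value of Cs in the usage example is purely hypothetical.

```python
from statistics import NormalDist


def frequency_factor_lp3(z: float, cs: float) -> float:
    """Eq. (16): frequency factor from the standard normal variate z and skewness Cs."""
    if abs(cs) < 1e-8:
        return z                  # limiting case: zero skewness reduces to the normal variate
    k = cs / 6.0
    return (2.0 / cs) * (((z - k) * k + 1.0) ** 3 - 1.0)


def expected_rainfall_lp3(log_mean: float, log_std: float, cs: float,
                          return_period: float) -> float:
    """Eqs. (14)-(15): X_T = 10 ** (log_mean + K_T * log_std), requires T > 1."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    return 10.0 ** (log_mean + frequency_factor_lp3(z, cs) * log_std)


if __name__ == "__main__":
    # log_mean and log_std follow the Kalam expression in Table 4;
    # cs = 0.1 is a hypothetical skewness chosen only to exercise the formula.
    for T in (2, 10, 100):
        print(T, round(expected_rainfall_lp3(1.77, 0.17, 0.1, T), 2))
```

Because Eq. (16) divides by Cs, the sketch falls back to KT = z when the skewness is numerically zero, which is the limit of the expression as Cs approaches zero.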

2.4 Gumbel (EV I) distribution

The Gumbel distribution, named in honor of Emil Gumbel and also known as the Extreme Value Type I (EV I) distribution, is a continuous probability distribution... This distribution can be applied to model maximum or minimum values (extreme values) of a random variable. The probability density function (PDF) and cumulative distribution function (CDF) of the Gumbel distribution are given by Eqs. (17) and (18), respectively:

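$$f(x) = \frac{1}{\sigma}\exp\left[-\frac{x-\mu}{\sigma} - \exp\left(-\frac{x-\mu}{\sigma}\right)\right] \tag{17}$$

$$F(x) = \exp\left[-\exp\left(-\frac{x-\mu}{\sigma}\right)\right] \tag{18}$$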

where ‘σ’ and ‘μ’ are the scale and location parameters, respectively.

The Gumbel distribution can be used to calculate the maximum value of expected rainfall (XT) corresponding to any return period (T) using Eq. (19):

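$$X_T = \bar{X}\left(1 + C_v K_T\right) \tag{19}$$

$$K_T = -\frac{\sqrt{6}}{\pi}\left[0.5772 + \ln\left\{\ln\left(\frac{T}{T-1}\right)\right\}\right] \tag{20}$$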

where X¯ is the mean, ‘Cv’ is the coefficient of variation and ‘KT’ is the frequency factor, which depends on the return period (T) and probability distribution.
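The Gumbel frequency-factor calculation can be sketched in Python as follows. The function names are illustrative. The last function inverts the CDF in Eq. (18) at F = 1 − 1/T, which should give nearly the same estimate as Eqs. (19) and (20) when the parameters come from the method of moments; the example uses the Kalam values from Tables 1 and 2 for that comparison.

```python
import math

EULER_GAMMA = 0.5772  # Euler-Mascheroni constant, rounded as in Eq. (20)


def frequency_factor_gumbel(return_period: float) -> float:
    """Eq. (20): K_T = -(sqrt(6)/pi) * (0.5772 + ln(ln(T / (T - 1)))), requires T > 1."""
    t = return_period
    return -(math.sqrt(6.0) / math.pi) * (EULER_GAMMA + math.log(math.log(t / (t - 1.0))))


def expected_rainfall_gumbel(mean: float, std: float, return_period: float) -> float:
    """Eq. (19): X_T = mean * (1 + Cv * K_T)."""
    cv = std / mean
    return mean * (1.0 + cv * frequency_factor_gumbel(return_period))


def gumbel_quantile(mu: float, sigma: float, return_period: float) -> float:
    """Invert Eq. (18) at F = 1 - 1/T: x = mu - sigma * ln(-ln(1 - 1/T))."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / return_period))


if __name__ == "__main__":
    # Kalam: mean and standard deviation from Table 1, Gumbel mu and sigma from Table 2.
    for T in (2, 10, 100):
        via_factor = expected_rainfall_gumbel(62.37, 23.45, T)
        via_cdf = gumbel_quantile(51.82, 18.28, T)
        print(T, round(via_factor, 2), round(via_cdf, 2))
```

The two routes differ only through rounding of the tabulated parameters, which is a quick consistency check on a fitted Gumbel model.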

3 Results and discussion

The northern area of Pakistan is surrounded by the Himalayan, Karakoram, Hindu Kush, and Pamir mountain ranges, with high peaks of between 6500 m and 8600 m. The snowmelt from these mountains, combined with glacier melt and monsoon rainfall, contributes to many rivers, most notably the Indus River, which Pakistan has relied on to develop an advanced irrigation canal network. However, the distribution and quantity of monsoon rainfall vary widely throughout the year, and rainfall occurs due to seasonal winds and western disturbances. In northern areas, such as Khyber Pukhtonkhuwa and Balochistan provinces, the maximum rainfall occurs during December to March, and in Punjab and Sindh, the maximum rainfall (50-75%) occurs during the monsoon season [10-15].

The 24-hour annual maximum rainfall data from six rainfall-gauging stations in northern Pakistan were used in this study. The locations of these stations are shown in Figure 1. A summary of the statistics is presented in Table 1. These statistical parameters are used to calculate the estimated 24-hour annual maximum rainfall in different return periods using different probability distributions. Of the six selected stations, Oghi has 46 years of rainfall data, spanning from 1961 to 2010. Three stations, including Kalam, Daggar and Mardan, have 44 years of rainfall data, spanning from 1962 to 2009, 1963 to 2010 and 1963 to 2010, respectively. Two stations, including Puran and Besham Qilla, have 38 years of rainfall data, spanning from 1963 to 2004 and 1969 to 2010, respectively.

Fig. 1 Locations of selected rainfall gauging stations.

Table 1

Summary of statistics from the selected rainfall gauging stations.

| Statistical parameter | Kalam | Oghi | Daggar | Mardan | Puran | Besham Qilla |
|---|---|---|---|---|---|---|
| Mean (mm) | 62.37 | 84.66 | 89.62 | 77.57 | 66.28 | 76.11 |
| Coefficient of skewness | 0.91 | 0.84 | 0.94 | 0.39 | 0.63 | 1.33 |
| Coefficient of variation | 0.38 | 0.34 | 0.35 | 0.36 | 0.27 | 0.36 |
| Standard deviation (mm) | 23.45 | 28.99 | 31.72 | 27.73 | 17.81 | 27.73 |
| Maximum value (mm) | 138.43 | 149.6 | 196.85 | 145 | 114.3 | 149.6 |
| Minimum value (mm) | 19.3 | 45.72 | 38.61 | 23 | 38.1 | 40.89 |
| Data collection years | 1962-2009 | 1961-2010 | 1963-2010 | 1963-2010 | 1963-2004 | 1969-2010 |
| Data collection period (years) | 44 | 46 | 44 | 44 | 38 | 38 |

The distribution of 24-hour maximum rainfall observed during different months of a year is shown in Figure 2. Figure 2 shows that Kalam and Besham Qilla received 42% and 21%, respectively, of observed rainfall in March. Oghi, Daggar and Puran received 37%, 32% and 23%, respectively, of observed rainfall in July. Mardan received 37% of observed rainfall in August. These results suggest that the maximum rainfall at these selected stations occurred between March and August.

Fig. 2 Distributions of 24-hour annual maximum rainfall in a year.

Four probability distributions (normal, log-normal, log-Pearson type-III and Gumbel) were used in this study. The parameters of probability distributions were calculated using the method of moments and are given in Table 2.
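As an illustration of the method of moments, the Python sketch below derives the Gumbel parameters from the sample mean and standard deviation using the standard moment relations σ = s√6/π and μ = mean − 0.5772σ. These relations are not written out in the text, so treat them as an assumption about how the Gumbel parameters were obtained; the inputs are the station statistics from Table 1, and the function name is illustrative.

```python
import math
from typing import Tuple

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

# (mean, standard deviation) per station, taken from Table 1.
STATION_STATS = {
    "Kalam": (62.37, 23.45),
    "Oghi": (84.66, 28.99),
    "Daggar": (89.62, 31.72),
    "Mardan": (77.57, 27.73),
    "Puran": (66.28, 17.81),
    "Besham Qilla": (76.11, 27.73),
}


def gumbel_moments(mean: float, std: float) -> Tuple[float, float]:
    """Method-of-moments Gumbel parameters: returns (sigma, mu)."""
    sigma = std * math.sqrt(6.0) / math.pi   # scale from the sample standard deviation
    mu = mean - EULER_GAMMA * sigma          # location from the sample mean
    return sigma, mu


if __name__ == "__main__":
    for station, (mean, std) in STATION_STATS.items():
        sigma, mu = gumbel_moments(mean, std)
        print(f"{station:<13} sigma={sigma:.2f}  mu={mu:.2f}")
```

For the normal distribution the method of moments is simpler still: μ and σ are the sample mean and standard deviation themselves.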

Table 2

Parameters of probability distributions at rainfall gauging stations.

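| Distribution | Parameter | Kalam | Oghi | Daggar | Mardan | Puran | Besham Qilla |
|---|---|---|---|---|---|---|---|
| Normal | Sigma (σ) | 23.45 | 28.99 | 31.72 | 27.73 | 17.81 | 27.73 |
| Normal | Mu (μ) | 62.37 | 84.66 | 89.62 | 77.57 | 66.28 | 76.11 |
| Log-normal | Sigma (σ) | 0.375 | 0.326 | 0.345 | 0.144 | 0.261 | 0.322 |
| Log-normal | Mu (μ) | 4.06 | 4.38 | 4.44 | 5.24 | 4.16 | 4.28 |
| Log-normal | Gamma (γ) | 0 | 0 | 0 | -112.34 | 0 | 0 |
| Log-Pearson type III | Alpha (α) | 51.24 | 67.56 | 13211 | 7.48 | 259.93 | 8.86 |
| Log-Pearson type III | Beta (β) | -0.053 | 0.04 | -0.003 | -0.144 | 0.016 | 0.11 |
| Log-Pearson type III | Gamma (γ) | 6.78 | 1.68 | 44.53 | 5.36 | -0.099 | 3.305 |
| Gumbel max | Sigma (σ) | 18.28 | 22.61 | 24.73 | 21.62 | 13.88 | 21.62 |
| Gumbel max | Mu (μ) | 51.82 | 71.61 | 75.34 | 65.09 | 58.27 | 63.64 |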

The four probability distributions were subjected to three goodness of fit tests (the Kolmogorov-Smirnov, chi-squared and Anderson-Darling tests) to determine the best-fitting probability distribution model at each rainfall gauging station. The tests were applied following the standard procedure described by several authors [16-18].
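To make the first of these tests concrete, the Python sketch below computes the Kolmogorov-Smirnov statistic, the largest gap between the empirical CDF of a sample and a fitted CDF. It is a generic illustration and not the software used in the study; the function name and the rainfall sample are hypothetical, and the fitted CDF is the normal distribution with the Mardan parameters from Table 2.

```python
from statistics import NormalDist
from typing import Callable, Iterable


def ks_statistic(sample: Iterable[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov statistic: largest gap between empirical and fitted CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the fitted CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical annual maxima in mm, for illustration only (not the station records).
    sample = [41.0, 55.5, 63.2, 70.1, 77.6, 84.3, 92.0, 101.4, 118.9, 140.2]
    fitted = NormalDist(mu=77.57, sigma=27.73)
    print(round(ks_statistic(sample, fitted.cdf), 4))
```

A smaller statistic indicates closer agreement between the sample and the candidate distribution, which is the basis for ordering the four distributions under this test.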

For each goodness of fit test, the probability distributions were ranked from one (best-fit) to four (least-fit).

Selection of the best-fit probability distribution is based on the total score from all the goodness of fit tests. The results of goodness of fit tests at each selected rainfall gauging station and for each probability distribution used in this study are shown in Table 3. Based on the results of the goodness of fit tests, the best-fit probability distribution and mathematical expression for the calculation of rainfall in different return periods at each gauging station are shown in Table 4.

Table 3

Results of goodness of fit tests.

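| Station | Distribution model | Kolmogorov-Smirnov test | Chi-squared test | Anderson-Darling test | Total |
|---|---|---|---|---|---|
| Kalam | Normal | 1 | 2 | 1 | 4 |
| Kalam | Log-normal | 3 | 4 | 2 | 9 |
| Kalam | Log-Pearson type III | 4 | 3 | 3 | 10 |
| Kalam | Gumbel | 2 | 1 | 4 | 7 |
| Oghi | Normal | 1 | 1 | 1 | 3 |
| Oghi | Log-normal | 2 | 3 | 2 | 7 |
| Oghi | Log-Pearson type III | 4 | 4 | 4 | 12 |
| Oghi | Gumbel | 3 | 2 | 3 | 8 |
| Daggar | Normal | 1 | 4 | 1 | 6 |
| Daggar | Log-normal | 3 | 1 | 3 | 7 |
| Daggar | Log-Pearson type III | 4 | 3 | 4 | 11 |
| Daggar | Gumbel | 2 | 2 | 2 | 6 |
| Mardan | Normal | 4 | 3 | 3 | 10 |
| Mardan | Log-normal | 1 | 1 | 2 | 4 |
| Mardan | Log-Pearson type III | 3 | 2 | 4 | 9 |
| Mardan | Gumbel | 2 | 4 | 1 | 7 |
| Puran | Normal | 3 | 1 | 1 | 5 |
| Puran | Log-normal | 2 | 4 | 3 | 9 |
| Puran | Log-Pearson type III | 4 | 3 | 4 | 11 |
| Puran | Gumbel | 1 | 2 | 2 | 5 |
| Besham Qilla | Normal | 1 | 1 | 1 | 3 |
| Besham Qilla | Log-normal | 3 | 2 | 2 | 7 |
| Besham Qilla | Log-Pearson type III | 4 | 4 | 4 | 12 |
| Besham Qilla | Gumbel | 2 | 3 | 3 | 8 |
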
Table 4

Best-fit distributions and mathematical expressions.

| Station | Best-fit distribution | Mathematical expression of best-fit distribution |
|---|---|---|
| Kalam | Log-Pearson type III | Log (XT) = 1.77 + 0.17 KT |
| Oghi | Log-Pearson type III | Log (XT) = 1.90 + 0.14 KT |
| Daggar | Log-Pearson type III | Log (XT) = 1.93 + 0.15 KT |
| Mardan | Normal | XT = 77.57 + 27.73 KT |
| Puran | Log-Pearson type III | Log (XT) = 1.81 + 0.12 KT |
| Besham Qilla | Log-Pearson type III | Log (XT) = 1.86 + 0.14 KT |
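
The tally behind Tables 3 and 4 can be sketched in a few lines of Python. The sketch sums the three test scores for each distribution and selects the distribution with the largest total at each station. That selection rule is inferred from the fact that it reproduces the choices in Table 4; it is not stated explicitly in the text, so it should be read as an assumption. The dictionary and function names are illustrative.

```python
# Scores from Table 3 as (Kolmogorov-Smirnov, chi-squared, Anderson-Darling).
SCORES = {
    "Kalam": {"Normal": (1, 2, 1), "Log-normal": (3, 4, 2),
              "Log-Pearson type III": (4, 3, 3), "Gumbel": (2, 1, 4)},
    "Oghi": {"Normal": (1, 1, 1), "Log-normal": (2, 3, 2),
             "Log-Pearson type III": (4, 4, 4), "Gumbel": (3, 2, 3)},
    "Daggar": {"Normal": (1, 4, 1), "Log-normal": (3, 1, 3),
               "Log-Pearson type III": (4, 3, 4), "Gumbel": (2, 2, 2)},
    "Mardan": {"Normal": (4, 3, 3), "Log-normal": (1, 1, 2),
               "Log-Pearson type III": (3, 2, 4), "Gumbel": (2, 4, 1)},
    "Puran": {"Normal": (3, 1, 1), "Log-normal": (2, 4, 3),
              "Log-Pearson type III": (4, 3, 4), "Gumbel": (1, 2, 2)},
    "Besham Qilla": {"Normal": (1, 1, 1), "Log-normal": (3, 2, 2),
                     "Log-Pearson type III": (4, 4, 4), "Gumbel": (2, 3, 3)},
}


def select_distribution(station_scores):
    """Return (selected distribution, totals) using the largest total score.

    Assumption: picking the largest total is what reproduces Table 4.
    """
    totals = {name: sum(scores) for name, scores in station_scores.items()}
    return max(totals, key=totals.get), totals


if __name__ == "__main__":
    for station, station_scores in SCORES.items():
        selected, totals = select_distribution(station_scores)
        print(f"{station:<13} {selected:<21} totals={totals}")
```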

The normal distribution provides the best-fit at the Mardan rainfall gauging station, while log-Pearson type-III provides the best-fit at the other rainfall gauging stations analyzed in this study. Probability density functions (PDF) and cumulative distribution functions (CDF) at the rainfall gauging stations were calculated using the best-fit distribution, i.e., the normal distribution at Mardan and the log-Pearson type-III distribution at the rest of the rainfall gauging stations, and are shown in Figures 3 and 4.

Fig. 3 PDFs of probability distributions at rainfall gauging stations.

Fig. 4 CDFs of probability distributions at rainfall gauging stations.

The rainfall estimates or maximum values of expected rainfall (mm) for return periods of 2, 5, 10, 20, 50, 100 and 200 years at the rainfall gauging stations were calculated using the best-fit distribution. The rainfall estimates are given in Table 5.

Table 5

Rainfall estimates at the rainfall gauging stations using the best-fit distribution.

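Rainfall estimates (mm) by return period (years):

| Station | Best-fit distribution | 2 | 5 | 10 | 20 | 50 | 100 | 200 |
|---|---|---|---|---|---|---|---|---|
| Kalam | Log-Pearson III | 59.29 | 80.50 | 93.56 | 105.41 | 119.93 | 130.32 | 140.33 |
| Oghi | Log-Pearson III | 79.15 | 105.31 | 123.23 | 140.88 | 164.52 | 182.93 | 201.95 |
| Daggar | Log-Pearson III | 84.56 | 113.32 | 132.01 | 149.69 | 172.38 | 189.35 | 206.30 |
| Mardan | Normal | 77.57 | 100.90 | 113.11 | 123.19 | 134.54 | 142.09 | 149.01 |
| Puran | Log-Pearson III | 63.71 | 79.86 | 90.16 | 99.83 | 112.15 | 121.32 | 130.47 |
| Besham Qilla | Log-Pearson III | 69.46 | 93.24 | 111.19 | 130.14 | 157.48 | 180.31 | 205.29 |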
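
The Mardan row of Table 5 can be checked directly from the expression in Table 4, since the normal frequency factor is just the standard normal variate. The Python sketch below does this with the exact quantile from the standard library rather than the approximation in Eq. (5), so small differences in the second decimal place are expected. The other rows cannot be reproduced the same way because the skewness of the log-transformed data is not reported. The names used are illustrative.

```python
from statistics import NormalDist

RETURN_PERIODS = (2, 5, 10, 20, 50, 100, 200)
# Mardan row of Table 5, in mm.
MARDAN_TABLE_5 = (77.57, 100.90, 113.11, 123.19, 134.54, 142.09, 149.01)


def mardan_estimate(return_period: float) -> float:
    """Table 4 expression for Mardan: X_T = 77.57 + 27.73 * K_T, with K_T = z."""
    k_t = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    return 77.57 + 27.73 * k_t


if __name__ == "__main__":
    for T, reported in zip(RETURN_PERIODS, MARDAN_TABLE_5):
        computed = mardan_estimate(T)
        print(f"T={T:>3}  computed={computed:7.2f}  reported={reported:7.2f}")
```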

4 Conclusions

Annual maximum rainfall data based on a 24-hour duration at six rainfall-gauging stations in northern Pakistan were used in this study. The purpose of the study was to find the best-fit probability distributions at northern rainfall gauging stations. The maximum values of expected rainfall or rainfall estimates calculated using a probability distribution that does not provide the best-fit may yield values that are higher or lower than the actual values. These calculations may be used to influence decisions relating to local economics and hydrologic safety systems.

The normal distribution provided the best-fit probability distribution at the Mardan rainfall gauging station based on the scores of the goodness of fit tests used in this study. The log-Pearson type-III distribution is the best-fit probability distribution at the rest of the rainfall gauging stations. The expected values of designed rainfall or rainfall estimates calculated using the best-fit probability distributions at the rainfall gauging stations might be used by design engineers to safely and feasibly design hydrologic projects.

Acknowledgements

The project was financially supported by King Saud University, Vice Deanship of Research Chairs.

Conflict of Interest: The authors declare no conflict of interest.

References

[1] Subudhi R., Probability analysis for prediction of annual maximum daily rainfall of Chakapada block of Kandhamal district in Orissa, Indian J. Soil Conser, 2007, 35, 84-85.

[2] Bhakar S. R., Iqbal M., Devanda M., Chhajed N., Bansal A. K., Probability analysis of rainfall at Kota, Indian J. Agri. Res, 2008, 42, 201-206.

[3] Tao D.Q., Nguyen V. T., Bourque A., On selection of probability distributions for representing extreme precipitations in Southern Quebec, Annual Conference of the Canadian Society for Civil Engineering, 5th-8th June 2002, 1-8. doi:10.1061/40644(2002)250

[4] Upadhaya A., Singh S. R., Estimation of consecutive day’s maximum rainfall by various methods and their comparison, Indian J. Soil Conserv., 1998, 26, 193-201.

[5] Bhakar S. R., Bansal A. N., Chhajed N., Purohit R. C., Frequency analysis of consecutive days maximum rainfall at Banswara, Rajasthan, India, ARPN J. Engg. Appl. Sci, 2006, 1, 64-67.

[6] Barkotulla M. A. B., Rahman M. S., Rahman M. M., Characterization and frequency analysis of consecutive days maximum rainfall at Boalia, Rajshahi and Bangladesh, J. Develop. Agri. Econ., 2009, 1, 121-126.

[7] Nemichandrappa M., Ballakrishnan P., Senthilvel S., Probability and confidence limit analysis of rainfall in Raichur region, Karnataka J. Agri. Sci., 2010, 23, 737-741.

[8] Manikandan M., Thiyagarajan G., Vijayakumar G., Probability analysis for estimating annual One day maximum rainfall in Tamil Nadu Agricultural University, Mad. Agri. J., 2011, 98 (1-3), 69-73.

[9] Vivekanandan N., Intercomparison of extreme value distributions for estimation of ADMR, Int. J. Appl. Engg. Technol., 2012, 2 (1), 30-37.

[10] Kazi S. A., Khan M. L., Variability of rainfall and its bearing on agriculture in the arid and semi-arid zones of West Pakistan, Pak. Geographic Rev., 1951, 6 (1), 40-63.

[11] FAO, Pakistan’s experience in rangeland rehabilitation and improvement, Food and Agriculture Organization of the UNO, 70, 1987.

[12] Khan J. A., The climate of Pakistan, Rehber Publishers, Karachi, Pakistan, 1993.

[13] Khan F. K., Pakistan geography, economy and people, Oxford University Press, Karachi, Pakistan, 2002.

[14] Kureshy K. U., Geography of Pakistan, National Book Service Lahore, Pakistan, 1998.

[15] Luo Q., Lin E., Agricultural vulnerability and adaptation in developing countries: the Asia-Pacific region, Climate Change, 1999, 43, 729-743. doi:10.1023/A:1005501517713

[16] Chowdhury J. U., Stedinger J. R., Goodness of fit tests for regional generalized extreme value flood distributions, Water Res., 1991, 27 (7), 1765-1777. doi:10.1029/91WR00077

[17] Adegboye O. S., Ipinyomi R. A., Statistical tables for class work and Examination, Tertiary publications Nigeria Limited, Ilorin, Nigeria, 1995, 5-11.

[18] Murray R.S., Larry J. S., Theory and problems of statistics, 3rd Edition, Tata Mc Graw – Hill Publishing Company Limited, New Delhi, India, 2000, 314-316.

Received: 2016-2-9
Accepted: 2016-8-21
Published Online: 2016-12-12
Published in Print: 2016-1-1

© 2016 M. T. Amin et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Downloaded on 25.4.2024 from https://www.degruyter.com/document/doi/10.1515/biol-2016-0057/html