Problems in the extreme value analysis
Introduction
Communities should be prepared for natural hazards such as storms, floods, earthquakes, landslides and avalanches; otherwise, these rare events can turn into disasters. History offers numerous examples of natural disasters that have killed many people and caused enormous economic losses. Recent examples include the East Asian tsunami in 2004 and the hurricane-induced flooding of New Orleans in 2005.
In some areas of the world, economic resources may limit adequate preparation for natural hazards. Errors may also occur in the engineering process, and climate change, which is not predictable from historical data, is a complicating factor. Frequently, however, the damage caused by a natural phenomenon is related to poor structural safety resulting simply from underestimating the risk.
Resources are wasted not only when damage occurs, but also when structures are over-designed. The decision concerns the balance between living with the risk and the cost of avoiding it. During the last century, a scientific approach called probabilistic design has been widely applied to optimize this balance. The first step is to identify the probability P that the magnitude of the phenomenon does not exceed a certain limit in one year. This probability can be expressed in terms of the return period R. The return period R is the time in years during which an annual maximum exceeds the assessed severity limit once, on average.
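The relation between the annual non-exceedance probability P and the return period R can be sketched as follows (a minimal illustration; the function name is my own, not from the paper):

```python
def return_period(p_annual: float) -> float:
    """Return period R (years) for an annual non-exceedance probability P.

    An annual exceedance probability of 1 - P means the limit is exceeded,
    on average, once every 1 / (1 - P) years.
    """
    if not 0.0 <= p_annual < 1.0:
        raise ValueError("P must be in [0, 1)")
    return 1.0 / (1.0 - p_annual)

# A 99% annual non-exceedance probability corresponds to the 100-year event.
print(round(return_period(0.99)))  # -> 100
```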
The allowable risk depends on the related human safety aspects, on the planned lifetime and cost of the structure, and on the indirect adverse effects of damage. One would, for example, wish to design a dam for a very low risk of a hazardous event, but a communication tower for an R on the order of its expected lifetime. Thus, assessing the allowable risk is a political and economic decision. The probability of exceedance to be applied in structural design is given in building codes.
Science steps in when the magnitude of the phenomenon that corresponds to the given probability of exceedance is estimated. For example, the question might be: what is the wind speed that is exceeded, on average, once in 100 years at the location? Answering such a question requires analysis of historical data of extreme wind speeds. However, the data are unlikely to cover more than a few decades, because reliable wind measurements do not go back further than that. The 100-year return wind speed may not have occurred even once during such an observation period. Taking the second example of a dam, the situation is much worse: earthquake events even remotely as severe as the chosen design event are most probably missing from any available data set. In terms of statistical theory, the problem is that the tail of the cumulative distribution function (CDF) cannot be readily estimated from the parent data, since these include little information in the range of the small probabilities that are of interest here.
Section snippets
The extreme value theory
To cope with the situation described above, the theory of extremes developed by Fisher and Tippett [1] is commonly utilized. The crux of the extreme value theory is that, ideally, the parent cumulative distribution function (CDF) of the variable need not be known, because the CDF of extremes of any parent distribution asymptotically approaches a known distribution as the sample size increases. In the case of seasonal extremes the CDF approaches one of three types of so-called Fisher–Tippett
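A typical application of this theory can be sketched as follows (my own illustration on synthetic data, not an example from the paper): fit the Type I (Gumbel) asymptote to a record of annual maxima and read off the 100-year return value as the 1 − 1/100 quantile.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic "annual maximum" record: 40 years of maxima, each the largest of
# 365 daily values drawn from a common parent distribution.
daily = rng.gamma(shape=2.0, scale=5.0, size=(40, 365))
annual_maxima = daily.max(axis=1)

# Fit the Gumbel (Fisher-Tippett Type I) asymptote by maximum likelihood.
loc, scale = stats.gumbel_r.fit(annual_maxima)

# The T-year return value is the (1 - 1/T) quantile of the fitted CDF.
x_100 = stats.gumbel_r.ppf(1.0 - 1.0 / 100.0, loc=loc, scale=scale)
print(f"100-year return value: {x_100:.1f}")
```

Note that this sketch assumes the asymptote already holds for the 40-year record; the sections below discuss why that assumption can fail.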
The problem with the plotting positions
An important part of the extreme value analysis is the method of assessing the non-exceedance probability P of the order-ranked data, i.e. the so-called plotting positions (see Fig. 1). Note that while these cumulative probabilities are called “plotting positions”, their determination is by no means limited to the classical graphical analysis on probability paper. On the contrary, plotting positions must be determined for any analysis of order-ranked extremes. Hence, the discussion that follows is
The solution to the plotting positions
Consider a variable x that has a probability density function f(x) and a cumulative distribution function F(x). Then a new variable F(xm), related to x by order ranking from the smallest (m = 1) to the largest (m = N) value, will have the probability density fm(F(xm)) given by

fm(F(xm)) = [N!/((m − 1)!(N − m)!)] F(xm)^(m−1) [1 − F(xm)]^(N−m)    (6)

where F(xm) is the cumulative distribution function of the order-ranked values (0 ⩽ F(xm) ⩽ 1). Eq. (6) can be derived by several approaches [2], [3], [8], [16].
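The density in Eq. (6) is that of a Beta(m, N − m + 1) variable, whose mean is m/(N + 1). This can be checked numerically (a sketch of my own, with an arbitrary exponential parent):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 10, 3          # sample size and rank (m = 1 is the smallest value)

# Draw many samples from an arbitrary parent (here exponential), order-rank
# them, and map the m-th smallest value through the parent CDF F.
samples = rng.exponential(size=(200_000, N))
x_m = np.sort(samples, axis=1)[:, m - 1]
F_xm = 1.0 - np.exp(-x_m)          # parent CDF of the Exponential(1)

# The simulated mean of F(x_m) agrees with the theoretical value m/(N + 1).
print(F_xm.mean(), m / (N + 1))
```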
Gumbel [2] showed that the mean of the sample cumulative frequency F(xm) is m/(N + 1),
Missing the unique nature of the probability positions
It was pointed out in Section 4 above that Eq. (5) associates the cumulative probability P with the mth rank in a sample of N. Gumbel’s result in Eq. (7) is distribution-free. The rest of the derivation in Section 4 is likewise independent of the underlying distribution. Thus, Eq. (12) underlines that the plotting positions are unique in the sense that Pm depends only on m and N. Consequently, the use of such plotting methods that depend on the distribution of the data or on the application of the
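For illustration, the distribution-free Weibull positions m/(N + 1) can be compared with two popular alternatives from the plotting-position literature, Hazen's (m − 0.5)/N and Gringorten's (m − 0.44)/(N + 0.12) (formulas quoted for comparison only; the paper argues against such alternatives):

```python
N = 5
for m in range(1, N + 1):
    weibull = m / (N + 1)                  # distribution-free mean of F(x_m)
    hazen = (m - 0.5) / N                  # classical alternatives that
    gringorten = (m - 0.44) / (N + 0.12)   # depend on extra assumptions
    print(f"m={m}: Weibull={weibull:.3f}  "
          f"Hazen={hazen:.3f}  Gringorten={gringorten:.3f}")
```

Even for this small N the formulas disagree noticeably at the extreme ranks, which is exactly where the return-level extrapolation is most sensitive.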
The problem with a small number of events per extreme
Even when the correct probability positions are used, present engineering practice often leads to significant errors due to improper application of the extreme value theory. This problem is illustrated here by an example of an extreme value analysis using two methods. The data in Fig. 2 are related to ice loads on power transmission lines. These data include the January 1998 ice storm, which was the worst natural disaster in Canadian history [39] due to the collapse of power lines. The radial ice
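The slow convergence behind this problem can be shown with an exact calculation (my own example, not the paper's ice-load data): for the maximum of n iid Exponential(1) variables the exact CDF is (1 − e^(−x))^n, while the Gumbel asymptote gives exp(−n e^(−x)); for a small number of events per extreme the two differ appreciably.

```python
import math

def exact_cdf_max(x: float, n: int) -> float:
    """Exact CDF of the maximum of n iid Exponential(1) variables."""
    return (1.0 - math.exp(-x)) ** n

def gumbel_asymptote(x: float, n: int) -> float:
    """Asymptotic (Fisher-Tippett Type I) approximation of the same CDF."""
    return math.exp(-n * math.exp(-x))

# With only n = 5 events per extreme the asymptote is still far off; the
# discrepancy shrinks only slowly as n grows.
for n in (5, 50, 500):
    x = math.log(n) + 1.0          # a point in the upper tail
    print(n, exact_cdf_max(x, n), gumbel_asymptote(x, n))
```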
The fitting method
In addition to the issues discussed above, the extreme value analysis requires choosing a fitting technique, which involves giving weights to the order-ranked data points. This problem has been widely discussed in the literature and reviewed, e.g., in [3], [58]. The choice of a fitting method generally causes smaller uncertainties than the problems of plotting positions and non-asymptotic behavior discussed in this paper. Nevertheless, some related issues are discussed in the following.
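One common fitting technique can be sketched as follows (unweighted least squares on Gumbel probability paper, shown for illustration rather than as the paper's recommendation): order-rank the data, attach the Weibull positions Pm = m/(N + 1), transform to η = −ln(−ln Pm), and fit a straight line.

```python
import numpy as np

rng = np.random.default_rng(2)
data = np.sort(rng.gumbel(loc=30.0, scale=4.0, size=50))  # synthetic annual maxima

N = len(data)
m = np.arange(1, N + 1)
P = m / (N + 1)                 # Weibull plotting positions
eta = -np.log(-np.log(P))       # transformation of P used on Gumbel paper

# On Gumbel paper the data follow x = loc + scale * eta; an unweighted
# least-squares line then gives simple estimates of the two parameters.
scale_hat, loc_hat = np.polyfit(eta, data, 1)
print(f"loc ~ {loc_hat:.1f}, scale ~ {scale_hat:.1f}")
```

Weighted variants differ only in the weights given to each order-ranked point, which is the choice discussed in this section.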
Firstly,
Discussion and conclusions
It was outlined above in Section 4 that the Weibull plotting formula P = m/(N + 1) follows directly from Gumbel’s distribution-free result for the mean of the sample cumulative frequency and from the classical definition of statistical probability. The probability P is not a random variable. Hence, calling its non-linear transformation η “a reduced variate”, as is customary, is misleading. A better term would be “reduced probability”. Furthermore, since P is not a variate, it would be more logical to
Acknowledgements
Thanks are due to M. Pajari for many fruitful discussions and comments on the manuscript, T. Kärnä for comments and K.F. Jones for providing data. This work was partly funded by the Environmental Cluster Research Program, Ministry of Environment, Finland.
References

- Towards better estimation of extreme winds. J Wind Eng Ind Aerodyn (1982).
- Gumbel re-visited – a new look at extreme value statistics applied to wind speeds. J Wind Eng Ind Aerodyn (1996).
- Unbiased plotting positions – a review. J Hydrol (1978).
- Extreme wind speeds in mixed climates revisited. J Wind Eng Ind Aerodyn (2003).
- The accuracy of design values predicted from extreme value analysis. J Wind Eng Ind Aerodyn (2001).
- A discussion on unbiased plotting positions for the general extreme value distribution. J Hydrol (1990).
- Alternative PWM-estimators of the Gumbel distribution. J Hydrol (2003).
- Self-determined probability-weighted moments method and its application to various distributions. J Hydrol (1997).
- The method of self-determined probability weighted moments revisited. J Hydrol (2002).
- An evaluation of the self-determined probability-weighted moment method for estimating extreme wind speeds. J Wind Eng Ind Aerodyn (2004).
- Improvements to the “Method of Independent Storms”. J Wind Eng Ind Aerodyn.
- Control curves for extreme value methods. J Wind Eng Ind Aerodyn.
- Modeling power line icing in freezing precipitation. Atmos Res.
- Improved extreme wind prediction for the United States. J Wind Eng Ind Aerodyn.
- Exact and general FT1 penultimate distributions of extreme wind speeds drawn from tail-equivalent Weibull parents. Struct Safe.
- Extreme value analysis of epoch maxima – convergence, and choice of asymptote. J Wind Eng Ind Aerodyn.
- Extreme wind load estimates based on the Gumbel distribution of dynamic pressures: an assessment. Struct Safe.
- Statistical analysis of high return period wind speeds. J Wind Eng Ind Aerodyn.
- Recent approaches to extreme value estimation with application to wind speeds. Part I: The Picklands method. J Wind Eng Ind Aerodyn.
- Statistics of extremes in hydrology. Adv Water Resour.
- Extreme wind speeds in mixed climates. J Ind Aerodyn.
- Extreme value prediction of snow avalanche runout. Cold Reg Sci Technol.
- The POT model described by the generalized Pareto distribution with Poisson arrival rate. J Hydrol.
- Limiting forms of the frequency distributions of the largest or smallest members of a sample. Proc Camb Philos Soc.
- Statistics of extremes.
- Extreme value theory in engineering.
- Decisions under uncertainty.
- Statistical inference using extreme order statistics. Ann Stat.
- Tests of the generalized Pareto distribution for predicting extreme wind speeds. J Appl Meteorol.
- A statistical theory of strength of materials. Ing Vet Ak Handl (Stockholm).
- A plotting rule for extreme probability paper. J Geophys Res.
- Another look at plotting positions. Comm Stat – Theor Meth.
- Plotting positions and economics of engineering planning. Proc Am Soc Civ Eng Hydraul Div.