The skill of multi-model seasonal forecasts of the wintertime North Atlantic Oscillation

Climate Dynamics

Abstract.

The skill of a set of wintertime North Atlantic Oscillation (NAO) seasonal predictions has been assessed in a multi-model ensemble framework. The multi-model approach consists of merging the ensemble hindcasts of four atmospheric general circulation models forced with observed sea surface temperatures to create a multi-model ensemble. Deterministic (ensemble-mean based) and probabilistic (categorical) NAO hindcasts have been considered. Two different sets of NAO indices have been used to create the hindcasts. The first set is defined as the projection of model anomalies onto the NAO spatial pattern obtained from atmospheric analyses. The second set obtains the NAO indices by standardizing the leading principal component of each single-model ensemble. Positive skill is found with both sets of indices, especially in the case of the multi-model ensemble. In addition, the NAO definition based upon the single-model leading principal component shows a higher skill than the hindcasts obtained using the projection method. Using the former definition, the multi-model ensemble shows statistically significant (at the 5% level) positive skill in a variety of probabilistic scoring measures. This is interpreted as a consequence of the projection method being less suitable because of the presence of errors in the spatial NAO patterns of the models. The positive skill of the seasonal NAO found here seems to be due not to the persistence of the long-term (decadal) variability specified in the initial conditions, but rather to a good simulation of the year-to-year variability. Nevertheless, most of the NAO seasonal predictability seems to be due to the correct prediction of particular cases such as the winter of 1989.
The higher skill of the multi-model has been explained on the basis of a more reliable description of large-scale tropospheric wave features by the multi-model ensemble, illustrating the potential of multi-model experiments to better identify mechanisms that explain seasonal variability in the atmosphere.

References

  • Ambaum MHP, Hoskins BJ (2002) The NAO troposphere–stratosphere connection. J Clim 15: 1969–1978

  • Ambaum MHP, Hoskins BJ, Stephenson DB (2001) Arctic Oscillation or North Atlantic Oscillation? J Clim 14: 3495–3507

  • Atger F (1999) The skill of ensemble prediction systems. Mon Weather Rev 127: 1941–1953

  • Bacon S, Carter DJT (1993) A connection between mean wave height and atmospheric pressure gradient in the North Atlantic. Int J Climatol 13: 423–436

  • Barnett TP (1985) Variations in near-global sea level pressure. J Atmos Sci 42: 478–501

  • Barnett TP (1995) Monte Carlo climate forecasting. J Clim 8: 1005–1022

  • Barnston AG, Livezey RE (1987) Classification, seasonality and persistence of low-frequency atmospheric circulation patterns. Mon Weather Rev 115: 1083–1126

  • Beniston M, Rebetez M (1996) Regional behavior of minimum temperatures in Switzerland for the period 1979–1993. Theor Appl Climatol 53: 231–243

  • Branković Č, Palmer TN (1997) Atmospheric seasonal predictability and estimates of ensemble size. Mon Weather Rev 125: 859–874

  • Branković Č, Palmer TN (2000) Seasonal skill and predictability of ECMWF PROVOST ensembles. Q J R Meteorol Soc 126: 2035–2068

  • Branković Č, Palmer TN, Ferranti L (1994) Predictability of seasonal atmospheric variations. J Clim 7: 217–237

  • Bretherton CS, Battisti DS (2000) An interpretation of the results from atmospheric general circulation models forced by the time history of the observed sea surface temperature distributions. Geophys Res Lett 27: 767–770

  • Castro-Díez Y, Pozo-Vázquez D, Rodrigo FS, Esteban-Parra MJ (2002) NAO and winter temperature variability in southern Europe. Geophys Res Lett 29(8): 1–14

  • Cayan DR (1992) Latent and sensible heat flux anomalies over the northern oceans: the connection to monthly atmospheric circulation. J Clim 5: 354–369

  • Czaja A, Frankignoul C (1999) Influence of the North Atlantic SST on the atmospheric circulation. Geophys Res Lett 26: 2969–2972

  • Déqué M (1991) Removing the model systematic error in extended range forecasting. Ann Geophys 9: 242–251

  • Déqué M (1997) Ensemble size for numerical seasonal forecasts. Tellus 49A: 74–86

  • Déqué M, Royer JF, Stroe R (1994) Formulation of Gaussian probability forecasts based on model extended-range integrations. Tellus 46A: 52–65

  • Deser C, Blackmon ML (1993) Surface climate variations over the North Atlantic Ocean during winter: 1900–1989. J Clim 6: 1743–1753

  • Doblas-Reyes FJ, Déqué M, Piedelièvre JP (2000) Multi-model spread and probabilistic seasonal forecasts in PROVOST. Q J R Meteorol Soc 126: 2069–2088

  • Dong BW, Sutton RT, Jewson SP, O'Neill A, Slingo JM (2000) Predictable winter climate in the North Atlantic sector during the 1997–1999 ENSO cycle. Geophys Res Lett 27: 985–988

  • Drévillon M, Cassou CH, Terray L (2003) Model study of the North Atlantic region atmospheric response to autumn tropical Atlantic sea-surface-temperature anomalies. Q J R Meteorol Soc 129: 2591–2611

  • Drévillon M, Terray L, Rogel P, Cassou C (2001) Mid latitude Atlantic SST influence on European winter climate variability in the NCEP reanalysis. Clim Dyn 18: 331–344

  • Elliott JR, Jewson SP, Sutton RT (2001) The impact of the 1997/98 El Niño event on the Atlantic Ocean. J Clim 14: 1069–1077

  • Epstein ES (1969a) Stochastic dynamic prediction. Tellus 21: 739–759

  • Epstein ES (1969b) A scoring system for probability forecasts of ranked categories. J Appl Meteorol 8: 985–987

  • Evans RE, Harrison MSJ, Graham RJ, Mylne KR (2000) Joint medium-range ensembles from the Met. Office and ECMWF systems. Mon Weather Rev 128: 3104–3127

  • Fang Z, Wallace JM (1994) Arctic sea-ice variability on a time scale of weeks and its relation to atmospheric forcing. J Clim 7: 1897–1914

  • Fritsch JM, Hilliker J, Ross J, Vislocky RL (2000) Model consensus. Weather Forecast 15: 571–582

  • Gibson JK, Kallberg P, Uppala S, Hernandez A, Nomura A, Serrano E (1997) ERA description. ECMWF re-analysis project report series 1. ECMWF Tech Rep, pp 872

  • Glowienka-Hensa R (1985) Studies on the variability of the Icelandic Low and Azores High between 1881 and 1983. Contrib Atmos Phys 58: 160–170

  • Graham RJ, Evans ADL, Mylne KR, Harrison MSJ, Robertson KB (2000) An assessment of seasonal predictability using atmospheric general circulation models. Q J R Meteorol Soc 126: 2211–2240

  • Harrison MSJ, Palmer TN, Richardson DS, Buizza R, Petroliagis T (1995) Joint ensembles from the UKMO and ECMWF models. In: Proc Seminar Predictability, ECMWF, Reading, UK, 2: 61–120

  • Hastenrath S (2002) Dipoles, temperature gradients, and tropical climate anomalies. Bull Am Meteorol Soc 83: 735–738

  • Hoerling MP, Hurrell JW, Xu T (2001) Tropical origins for recent North Atlantic climate change. Science 292: 90–92

  • Hoffman RN, Kalnay E (1983) Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus 35A: 100–118

  • Honda M, Nakamura H, Ukita J, Kousaka I, Takeuchi K (2001) Interannual seesaw between the Aleutian and Icelandic lows. Part I: seasonal dependence and life cycles. J Clim 14: 1029–1042

  • Hurrell JW (1995a) Decadal trends in the North Atlantic Oscillation regional temperatures and precipitation. Science 269: 676–679

  • Hurrell JW (1995b) Transient eddy forcing of the rotational flow during northern winter. J Atmos Sci 52: 2286–2301

  • Hurrell JW, van Loon H (1997) Decadal variations in climate associated with the North Atlantic oscillation. Clim Change 36: 301–326

  • Jolliffe IT, Stephenson DB, eds (2003) Forecast verification: a practitioner's guide in atmospheric science. Wiley and Sons, Chichester, UK, pp 240

  • Jones PD, Jönsson T, Wheeler D (1997) Extension to the North Atlantic Oscillation using early instrumental pressure observations from Gibraltar and south-west Iceland. Int J Climatol 17: 1433–1450

  • Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Leetmaa A, Reynolds B, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo KC, Ropelewski C, Wang J, Jenne R, Joseph D (1996) The NCEP/NCAR 40-year reanalysis project. Bull Am Meteorol Soc 77: 437–472

  • Kharin VV, Zwiers FW (2002) Climate prediction with multimodel ensembles. J Clim 15: 793–799

  • Kodera K, Chiba M, Koide H, Kitoh A, Nikaido Y (1996) Interannual variability of winter stratosphere and troposphere in the Northern Hemisphere. J Meteorol Soc Jpn 74: 365–382

  • Krishnamurti TN, Kishtawal CM, LaRow T, Bachiochi DR, Zhang Z, Williford CE, Gadgil S, Surendran S (1999) Improved weather and seasonal climate forecasts from multimodel superensemble. Science 285: 1548–1550

  • Krishnamurti TN, Kishtawal CM, Zhang Z, LaRow T, Bachiochi D, Williford CE (2000) Multimodel ensemble forecasts for weather and seasonal climate. J Clim 13: 4196–4216

  • Kumar A, Barnston AG, Hoerling MP (2001) Seasonal prediction, probabilistic verifications, and ensemble size. J Clim 14: 1671–1676

  • Lamb PJ, Peppler RA (1987) North Atlantic Oscillation: concept and an application. Bull Am Meteorol Soc 68: 1218–1225

  • Latif M, Barnett TP (1996) Decadal variability over the North Pacific and North America: dynamics and predictability. J Clim 9: 2407–2423

  • Leith CE (1974) Theoretical skill of Monte Carlo forecasts. Mon Weather Rev 102: 409–418

  • Loewe F (1937) A period of warm winters in western Greenland and the temperature see-saw between western Greenland and Europe. Q J R Meteorol Soc 63: 365–371

  • Luterbacher J, Schmutz C, Gyalistras D, Xoplaki E, Wanner H (1999) Reconstruction of monthly NAO and EU indices back to AD 1675. Geophys Res Lett 26: 2745–2748

  • Marsh R (2000) Recent variability of the North Atlantic thermohaline circulation inferred from surface heat and freshwater fluxes. J Clim 13: 3239–3260

  • Marshall JC, Molteni F (1993) Towards a dynamical understanding of weather regimes. J Atmos Sci 50: 1792–1818

  • Marshall JC, Johnson H, Goodman J (2001) A study of the interaction of the North Atlantic Oscillation with ocean circulation. J Clim 14: 1399–1421

  • Martineu C, Caneill JY, Sadourny R (1999) Potential predictability of European winters from the analysis of seasonal simulations with an AGCM. J Clim 12: 3033–3061

  • Massacand AC, Davies HU (2001) Interannual variability of European winter weather: the potential vorticity insight. Atmos Sci Lett doi:10.1006/asle.2001.0026, http://www.idealibrary.com/links/toc/asle/0/0/0

  • Molteni F, Cubasch U, Tibaldi S (1988) 30- and 60-day forecast experiments with the ECMWF spectral models. In: Chagas C, Puppi G (eds) Persistent meteo-oceanographic anomalies and teleconnections. Pontificae Academiae Scientiarum Scripta Varia, Vatican City, 69: 505–555

  • Moulin C, Lambert CE, Dulac F, Dayan U (1997) Atmospheric export of dust from North Africa: control by the North Atlantic Oscillation. Nature 387: 691–694

  • Murphy AH (1971) A note on the ranked probability score. J Appl Meteorol 10: 155–156

  • Murphy AH (1992) Climatology, persistence, and their linear combination as standards of reference in skill scores. Weather Forecast 7: 692–698

  • Murphy AH, Winkler RL (1987) A general framework for forecast verification. Mon Weather Rev 115: 1330–1338

  • Orsolini Y, Doblas-Reyes FJ (2003) Ozone signatures of climate patterns over the Euro-Atlantic sector in spring. Q J R Meteorol Soc (in press)

  • Palmer TN, Sun Z (1985) A modelling and observational study of the relationship between sea surface temperature anomalies in the northwest Atlantic and the atmospheric general circulation. Q J R Meteorol Soc 111: 947–975

  • Palmer TN, Anderson DLT (1994) The prospect for seasonal forecasting – a review paper. Q J R Meteorol Soc 120: 755–793

  • Palmer TN, Shukla J (2000) Editorial to DSP/PROVOST special issue. Q J R Meteorol Soc 126: 1989–1990

  • Palmer TN, Branković Č, Richardson DS (2000) A probability and decision-model analysis of PROVOST seasonal multi-model ensemble integrations. Q J R Meteorol Soc 126: 2013–2034

  • Pavan V, Doblas-Reyes FJ (2000) Multi-model seasonal forecasts over the Euro-Atlantic: skill scores and dynamic features. Clim Dyn 16: 611–625

  • Pavan V, Molteni F, Branković Č (2000) Wintertime variability in the Euro-Atlantic region in observations and in ECMWF seasonal ensemble experiments. Q J R Meteorol Soc 126: 2143–2173

  • Peng S, Whitaker JS (1999) Mechanisms determining the atmospheric response to midlatitude SST anomalies. J Clim 12: 1393–1408

  • Perlwitz J, Graf HF (1995) The statistical connection between tropospheric and stratospheric circulation of the Northern Hemisphere in winter. J Clim 8: 2281–2295

  • Qian B, Corte-Real J, Xu H (2000) Is the North Atlantic Oscillation the most important atmospheric pattern for precipitation in Europe? J Geophys Res 105: 11,901–11,910

  • Richardson DS (2001) Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. Q J R Meteorol Soc 127: 2473–2489

  • Rodwell MJ, Rowell DP, Folland CK (1999) Oceanic forcing of the wintertime North Atlantic oscillation and European climate. Nature 398: 320–323

  • Rogers JC (1990) Patterns of low-frequency monthly sea level pressure variability (1899–1986) and associated wave cyclone frequencies. J Clim 3: 1364–1379

  • Serreze MC, Carse F, Barry RG, Rogers JC (1997) Icelandic low cyclone activity: climatological features, linkages with the NAO, and relationship with recent changes in the Northern Hemisphere circulation. J Clim 10: 453–464

  • Shabbar A, Huang J, Higuchi K (2001) The relationship between the wintertime North Atlantic Oscillation and blocking episodes in the North Atlantic. Int J Climatol 21: 355–369

  • Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, New York, USA

  • Stefanova L, Krishnamurti TN (2002) Interpretation of seasonal climate forecast using Brier skill score, the Florida State University superensemble, and the AMIP-I dataset. J Clim 15: 537–544

  • Stephenson DB (1997) Correlation of spatial climate/weather maps and the advantages of using the Mahalanobis metric in prediction. Tellus 49A: 513–527

  • Stephenson DB (2000) Use of the "odds ratio" for diagnosing forecast skill. Weather Forecast 15: 221–232

  • Stephenson DB, Pavan V (2003) The North Atlantic Oscillation in coupled climate models: a CMIP1 evaluation. Clim Dyn (in press)

  • Stephenson DB, Pavan V, Bojariu R (2000) Is the North Atlantic Oscillation a random walk? Int J Climatol 20: 1–18

  • Stephenson DB, Wanner H, Brönnimann S, Luterbacher J (2003) The history of scientific research on the North Atlantic Oscillation. In: Hurrell JW, Kushnir Y, Ottersen G, Visbeck M (eds) The North Atlantic Oscillation. AGU Geophysical Monograph Series, 134

  • Stern W, Miyakoda K (1995) Feasibility of seasonal forecast inferred from multiple GCM simulations. J Clim 8: 1071–1085

  • Sutton R, Mathieu PP (2002) Response of the atmosphere–ocean mixed layer system to anomalous ocean heat flux convergence. Q J R Meteorol Soc 128:1259–1275

  • Swets JA (1973) The relative operating characteristic in psychology. Science 182: 990–1000

  • Thompson PD (1977) How to improve accuracy by combining independent forecasts. Mon Weather Rev 105: 228–229

  • Thornes JE, Stephenson DB (2001) How to judge the quality and value of weather forecast products. Meteorol Appl 8: 307–314

  • Tracton MS, Kalnay E (1993) Operational ensemble prediction at the National Meteorological Center. Practical aspects. Weather Forecast 8: 379–398

  • van Loon H, Rogers JC (1978) The seesaw in winter temperatures between Greenland and Northern Europe. Part I: general description. Mon Weather Rev 106: 296–310

  • Vislocky RL, Fritsch JM (1995) Improved model output statistics forecast through model consensus. Bull Am Meteorol Soc 76: 1157–1164

  • Walker GT (1924) Correlations in seasonal variations of weather IX. Mem. 24: 275–332, Indian Meteorological Department, Pune, India

  • Wallace JM, Gutzler DS (1981) Teleconnections in the geopotential height field during the Northern Hemisphere winter. Mon Weather Rev 109: 784–812

  • Wilks DS (1995) Statistical methods in the atmospheric sciences. Academic Press, (1st edn)

  • Zhang H, Casey T (2000) Verification of categorical probability forecasts. Weather Forecast 15: 80–89

Acknowledgements.

This study was undertaken when the first author worked at the Centre National de Recherches Météorologiques, Météo-France (Toulouse, France). VP has received support from the Progetto Strategico SINAPSI funded by the Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR) and Consiglio Nazionale di Ricerca (CNR). The authors wish to thank David Anderson, Magdalena Balmaseda, Michel Déqué, Thomas Jung, Alexia Massacand, Laura Ferranti, and Tim Palmer for reviews of early drafts and constructive advice. Richard Graham and an anonymous reviewer are especially acknowledged for their significant contribution to the improvement of the scientific quality and readability of the paper. This work was in part supported by the EU-funded DEMETER project (EVK2-1999-00197).

Author information

Corresponding author

Correspondence to F. J. Doblas-Reyes.

Appendix 1: scoring rules

A tool commonly used to evaluate the association between ensemble-mean hindcasts and verification is the time correlation coefficient. This measure is independent of the mean and variance of both variables. As in the rest of the study, different climatologies for hindcasts and verification were computed using the cross-validation technique, making the correlation estimator unbiased (Déqué 1997).
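The correlation of cross-validated anomalies can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "cross-validation" means a leave-one-out climatology (each year's anomaly is taken relative to the mean of all other years), and the two short series are hypothetical values invented for the example.

```python
import math

def cross_validated_anomalies(series):
    """Anomaly of each year relative to the mean of all *other* years
    (leave-one-out climatology; an assumption about the method)."""
    n = len(series)
    total = sum(series)
    return [x - (total - x) / (n - 1) for x in series]

def correlation(x, y):
    """Pearson time correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ensemble-mean hindcasts and verifying index values.
hindcast = [0.3, -1.2, 0.8, 1.5, -0.4, 0.1]
verification = [1.0, -2.5, 1.1, 3.9, -0.2, -0.6]

r = correlation(cross_validated_anomalies(hindcast),
                cross_validated_anomalies(verification))
print(f"correlation = {r:.3f}")
```

Because the correlation does not depend on the mean or variance of either series, rescaling or shifting the hindcasts leaves `r` unchanged.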

A set of verification measures has been used to assess the quality of the probabilistic hindcasts: the ranked probability skill score (RPSS), the receiver operating characteristic (ROC) area under the curve, the Peirce skill score (PSS), and the odds ratio skill score (ORSS). Most of them, along with estimates of the associated error, are described in Stephenson (2000), Zhang and Casey (2000), and Thornes and Stephenson (2001), to which the reader is referred for more specific definitions and properties.

The accuracy measure for RPSS is the ranked probability score (RPS). RPS was first proposed by Epstein (1969b) and simplified by Murphy (1971). This score for categorical probabilistic forecasts is a generalization of the Brier score for ranked categories. For J ranked categories, the RPS can be written:

$$ RPS({\mathbf{r}},{\mathbf{d}}) = {{1}\over{J - 1}} \sum\limits_{i = 1}^J {\left(\sum\limits_{k = 1}^i {r_k} - \sum\limits_{k = 1}^i {d_k} \right)} ^2 $$
(1)

where the vector \( \mathbf{r} = (r_1, \ldots, r_J) \), with \( \sum\nolimits_{k = 1}^J r_k = 1 \), represents an estimate of the forecast PDF and \( \mathbf{d} = (d_1, \ldots, d_J) \) corresponds to the verification PDF, where \( d_k \) is a delta function that equals 1 if category k occurs and 0 otherwise. By using cumulative probabilities, the RPS takes into account the ordering of the categories, though for finite ensemble sizes the estimated probabilities for the event to be in different categories strongly depend on the estimate of the category thresholds. RPS can be accumulated for several time steps or grid points over a region, or both. The RPSS expresses the relative improvement of the forecast against a reference score. The reference score used here is that of the climatological probability hindcast, which, under the assumption of a Gaussian distribution of the observations, is the forecast without any skill that minimises the RPS (Déqué et al. 1994). The RPSS is defined as:

$$ RPSS = 100 \left(1 - \frac{RPS_{\text{forecast}}} {RPS_{\text{climatol}}} \right) $$
(2)

This skill score is 100 for a perfect forecast, 0 for a probabilistic forecast that is no more accurate than a trivial forecast using long-term climatology, and negative for even worse forecasts, such as random or biased ones. To provide an estimate of the skill score significance, the calculations were repeated 100 times for a given time series (either a grid point or the NAO index). Each time, the order of the individual hindcasts was scrambled (this preserves the PDF of the variable) and the skill score was computed; the 5% upper threshold of the resulting skill distribution was then taken as the significance threshold.
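Equations (1) and (2) and the scrambling test can be sketched as below. This is an illustrative reading of the text rather than the authors' implementation: the function names, the three-category (tercile) setup, the sample probabilities, and the choice of which sorted value counts as the "5% upper threshold" are all assumptions.

```python
import random
from itertools import accumulate

def rps(r, d):
    """Ranked probability score, Eq. (1), for forecast probabilities r and
    verification vector d over J ranked categories."""
    J = len(r)
    return sum((cr - cd) ** 2
               for cr, cd in zip(accumulate(r), accumulate(d))) / (J - 1)

def rpss(forecasts, observations, climatology):
    """Eq. (2), with the RPS accumulated over all time steps."""
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations))
    rps_climatol = sum(rps(climatology, o) for o in observations)
    return 100.0 * (1.0 - rps_forecast / rps_climatol)

def significance_threshold(forecasts, observations, climatology,
                           n_iter=100, seed=0):
    """RPSS value exceeded by about 5% of hindcast sets whose time order
    has been scrambled (hypothetical helper, seeded for repeatability)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_iter):
        shuffled = list(forecasts)
        rng.shuffle(shuffled)
        scores.append(rpss(shuffled, observations, climatology))
    scores.sort()
    return scores[(95 * n_iter) // 100 - 1]

# Hypothetical tercile example: one-hot verifications and forecast PDFs.
climatology = [1 / 3, 1 / 3, 1 / 3]
observations = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
forecasts = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3],
             [0.2, 0.3, 0.5], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]

score = rpss(forecasts, observations, climatology)
threshold = significance_threshold(forecasts, observations, climatology)
print(f"RPSS = {score:.1f}, 5% threshold = {threshold:.1f}")
```

A hindcast set would be called significantly skilful in this sketch when `score` exceeds `threshold`.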

RPSS can be too stringent a measure of skill because it requires a correct estimate of a simplified PDF. Therefore, a set of simple accuracy measures for binary events is also used, based upon the hit rate H, or the relative number of times an event was forecast when it occurred, and the false alarm rate F, or the relative number of times the event was forecast when it did not occur (Jolliffe and Stephenson 2003). They are based on the likelihood-base rate factorization of the joint probability distribution of forecasts and verifications (Murphy and Winkler 1987). To derive them, a contingency table is computed, wherein the cells are occupied by the number of hits (a, number of cases when an event is forecast and is also observed), false alarms (b, number of cases the event is not observed but is forecast), misses (c, number of cases the event is observed but not forecast), and correct rejections (d, number of no-events correctly forecast) for every ensemble member. Then, the hit rate and the false alarm rate take the form:

$$ H = \frac{a} {a + c}\quad F = \frac{b} {b + d} $$
(3)

The previous scheme allows for the definition of a reliability measure, the bias B. Reliability is another attribute of forecast quality and corresponds to the ability of the forecast system to issue probabilities that, on average, equal the observed frequency of the event. The bias indicates whether the forecasts of an event are being issued at a higher rate than the frequency of observed events. It reads:

$$ B = \frac{a + b} {a + c} $$
(4)

A bias greater than 1 indicates over-forecasting, i.e., the model forecasts the event more often than it is observed. Conversely, a bias lower than 1 indicates under-forecasting.
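The contingency table and Eqs. (3) and (4) can be illustrated with a short sketch. The function names and the eight forecast/observation pairs are hypothetical; each pair stands for one binary forecast (for instance, one ensemble member in one winter) and the matching observation.

```python
def contingency_table(forecast_events, observed_events):
    """Return (a, b, c, d) = hits, false alarms, misses, correct rejections."""
    a = b = c = d = 0
    for forecast, observed in zip(forecast_events, observed_events):
        if forecast and observed:
            a += 1          # hit
        elif forecast:
            b += 1          # false alarm
        elif observed:
            c += 1          # miss
        else:
            d += 1          # correct rejection
    return a, b, c, d

def hit_rate(a, b, c, d):
    """Eq. (3): fraction of observed events that were forecast."""
    return a / (a + c)

def false_alarm_rate(a, b, c, d):
    """Eq. (3): fraction of non-events for which the event was forecast."""
    return b / (b + d)

def bias(a, b, c, d):
    """Eq. (4): forecast event frequency relative to observed frequency."""
    return (a + b) / (a + c)

# Hypothetical binary forecasts (1 = event forecast) and observations.
forecast_events = [1, 1, 0, 1, 0, 0, 1, 0]
observed_events = [1, 0, 0, 1, 1, 0, 0, 0]

table = contingency_table(forecast_events, observed_events)
print(table, hit_rate(*table), false_alarm_rate(*table), bias(*table))
```

In this example the event is forecast four times but observed three times, so the bias exceeds 1 (over-forecasting).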

The Peirce skill score (PSS) is a simple measure of skill that equals the difference between the hit rate and the false alarm rate:

$$ PSS = H - F $$
(5)

When the score is greater than zero, the hit rate exceeds the false alarm rate so that the closer the value of PSS to 1, the better. The standard error formula for this score assumes independence of hit and false alarm rates and, for large enough samples, it is computed as:

$$ \sigma _{PSS} = \sqrt{\frac{H(1 - H)} {a + c} + \frac{F(1 - F)} {b + d}} $$
(6)
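As a worked illustration of Eqs. (5) and (6), the sketch below computes the PSS and its large-sample standard error from contingency-table counts. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def peirce_skill_score(a, b, c, d):
    """Return (PSS, sigma_PSS) from hits a, false alarms b, misses c and
    correct rejections d, following Eqs. (5) and (6)."""
    H = a / (a + c)                      # hit rate
    F = b / (b + d)                      # false alarm rate
    pss = H - F
    sigma = math.sqrt(H * (1 - H) / (a + c) + F * (1 - F) / (b + d))
    return pss, sigma

# Hypothetical counts: H = 30/50 = 0.6 and F = 10/50 = 0.2.
pss, sigma = peirce_skill_score(a=30, b=10, c=20, d=40)
print(f"PSS = {pss:.2f} +/- {sigma:.3f}")
```

With these counts the score is about 0.4 with a standard error close to 0.09, so it lies several standard errors above zero.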

The odds ratio (OR) is an accuracy measure that compares the odds of making a good forecast (a hit) to the odds of making a bad forecast (a false alarm):

$$ OR = \frac{H} {1 - H} \frac{1 - F} {F} $$
(7)

The ratio is greater than one when the hit rate exceeds the false alarm rate, and is unity when forecast and reference values are independent. It has the advantage of being independent of the forecast bias. Furthermore, the natural logarithm of the odds ratio is asymptotically normally distributed with a standard error of \( 1/\sqrt{n_h} \), where

$$ \frac{1} {n_h} = \frac{1} {a} + \frac{1} {b} + \frac{1} {c} + \frac{1} {d} $$
(8)

To test whether there is any skill, one can test against the null hypothesis that the forecasts and verifications are independent with log odds of zero. A simple skill score, the odds ratio skill score (ORSS), ranging from –1 to +1, where a score of zero represents no skill, may be obtained from the odds ratio through the expression:

$$ ORSS = \frac{OR - 1} {OR + 1} = \frac{H - F} {H + F - 2HF} $$
(9)

Thornes and Stephenson (2001) provide a useful table with the minimum values of ORSS needed to have significant skill at different levels of confidence depending on the value of \( n_h \).
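Equations (7) to (9) can be checked numerically with the following sketch. The function name and counts are hypothetical, and the z-statistic at the end (log odds divided by its standard error) is one simple way to test against the null hypothesis of zero log odds; it is an assumption of this example, not a procedure stated in the text.

```python
import math

def odds_ratio_scores(a, b, c, d):
    """Return (OR, standard error of ln OR, ORSS) from contingency counts,
    following Eqs. (7) to (9)."""
    H = a / (a + c)
    F = b / (b + d)
    odds_ratio = (H / (1 - H)) * ((1 - F) / F)
    inv_n_h = 1 / a + 1 / b + 1 / c + 1 / d   # Eq. (8): this is 1/n_h
    log_or_stderr = math.sqrt(inv_n_h)        # equals 1/sqrt(n_h)
    orss = (odds_ratio - 1) / (odds_ratio + 1)
    return odds_ratio, log_or_stderr, orss

# Hypothetical counts giving H = 0.6 and F = 0.2.
a, b, c, d = 30, 10, 20, 40
odds_ratio, log_or_stderr, orss = odds_ratio_scores(a, b, c, d)
z = math.log(odds_ratio) / log_or_stderr      # assumed test statistic
print(f"OR = {odds_ratio:.2f}, ORSS = {orss:.3f}, z = {z:.2f}")
```

For these counts the odds ratio also equals ad/(bc), and both forms of Eq. (9) give the same ORSS.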

The ROC (Swets 1973) is a signal-detection curve plotting the hit rate against the false alarm rate for a specific event over a range of probability decision thresholds (Evans et al. 2000; Graham et al. 2000; Zhang and Casey 2000). In essence, it indicates the performance in terms of hit and false alarm rates stratified by the verification. A probability decision threshold converts probabilistic binary forecasts into deterministic binary forecasts. For each probability threshold, a contingency table is obtained from which the hit and false alarm rates are computed. For instance, consider a probability threshold of 10%. The event is forecast in those cases where the probability is equal to or greater than 10%. This calculation is repeated for thresholds of 20%, 30%, up to 100% (or any other selection of intervals, depending mainly on the ensemble size). Then, the hit rate is plotted against the false alarm rate to produce a ROC curve. Ideally, the hit rate will always exceed the false alarm rate and the curve will lie in the upper-left-hand portion of the diagram. Reducing the probability threshold increases the hit rate, but at the same time it also increases the false alarm rate. The standardized area enclosed beneath the curve is a simple accuracy measure associated with the ROC, with a range from 0 to 1. A system with no skill (made by either random or constant forecasts) will achieve hits at the same rate as false alarms, so its curve will lie along the 45° line and enclose a standardized area of 0.5. As the ROC is based upon a stratification by the verification, it provides no information about the reliability of the forecasts, and hence the curves cannot be improved by improving the climatology of the system. The skill score significance was assessed, as in the case of RPSS, by Monte Carlo methods.
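The construction of the ROC curve and its area can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and sample probabilities are hypothetical, thresholds run from 10% to 100% in steps of 10%, and the area is integrated with the trapezoidal rule after adding the (0, 0) and (1, 1) end points, which is one common convention rather than necessarily the one used in the paper.

```python
def roc_points(probabilities, observed_events, thresholds):
    """Return one (false alarm rate, hit rate) point per probability threshold."""
    points = []
    for threshold in thresholds:
        a = b = c = d = 0
        for p, observed in zip(probabilities, observed_events):
            forecast = p >= threshold     # event forecast if probability >= threshold
            if forecast and observed:
                a += 1
            elif forecast:
                b += 1
            elif observed:
                c += 1
            else:
                d += 1
        points.append((b / (b + d), a / (a + c)))
    return points

def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule, with the curve
    closed at (0, 0) and (1, 1)."""
    curve = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (f0, h0), (f1, h1) in zip(curve, curve[1:]):
        area += (f1 - f0) * (h0 + h1) / 2
    return area

thresholds = [k / 10 for k in range(1, 11)]   # 10%, 20%, ..., 100%

# Hypothetical forecast probabilities of the event and binary observations.
probabilities = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
observed_events = [1, 1, 0, 1, 0, 0]

area = roc_area(roc_points(probabilities, observed_events, thresholds))
print(f"ROC area = {area:.3f}")
```

A constant forecast probability yields only the points (0, 0) and (1, 1), so the area is 0.5, matching the no-skill diagonal described in the text.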

About this article

Cite this article

Doblas-Reyes, F.J., Pavan, V. & Stephenson, D.B. The skill of multi-model seasonal forecasts of the wintertime North Atlantic Oscillation. Climate Dynamics 21, 501–514 (2003). https://doi.org/10.1007/s00382-003-0350-4

  • DOI: https://doi.org/10.1007/s00382-003-0350-4
