A Bayesian state space modelling approach to probabilistic quantitative precipitation forecasting

https://doi.org/10.1016/j.jhydrol.2003.11.040Get rights and content

Abstract

The generation of very short range forecasts of precipitation in the 0–6 h time window is traditionally referred to as nowcasting. Most existing nowcasting systems essentially extrapolate radar observations in some manner, however, very few systems account for the uncertainties involved. Thus deterministic forecast are produced, which have a limited use when decisions must be made, since they have no measure of confidence or spread of the forecast. This paper develops a Bayesian state space modelling framework for quantitative precipitation nowcasting which is probabilistic from conception. The model treats the observations (radar) as noisy realisations of the underlying true precipitation process, recognising that this process can never be completely known, and thus must be represented probabilistically. In the model presented here the dynamics of the precipitation are dominated by advection, so this is a probabilistic extrapolation forecast. The model is designed in such a way as to minimise the computational burden, while maintaining a full, joint representation of the probability density function of the precipitation process. The update and evolution equations avoid the need to sample, thus only one model needs be run as opposed to the more traditional ensemble route. It is shown that the model works well on both simulated and real data, but that further work is required before the model can be used operationally.

Introduction

The provision of reliable forecasts of precipitation over the 0–6 h period at high spatial resolution, often described as nowcasting, is regarded as essential for flood forecasting by the Environment Agency (Golding, 2000). To adequately describe the spatial distribution and timing of precipitation over catchments with areas of the order of 10 km2, prediction over a spatial scale of the order of 1 km is required, with a temporal scale of the order of 10 min. In the foreseeable future forecasts of precipitation at such high resolutions will require the application of probabilistic methods since precipitation generating processes are very sensitive to the correct specification of initial and boundary conditions, which is unlikely to be resolved in operational meso-scale models for some time, largely due to problems of data assimilation and parameterisations.

Operationally, precipitation forecasts are produced over a variety of time and space scales, using a variety of methods We shall assume that nowcasting refers to the production of forecasts with lead times of 0–6 h (Golding, 2000), while short-term forecasting refers to lead times of 6–24 h (Collier, 2000). Beyond 24 h lead times it is widely accepted that NWP based approaches are optimal for providing precipitation forecasts (Golding, 2000). In the range of 0–24 h a variety of methods are used to provide quantitative precipitation forecasts including:

  • NWP based approaches (Kuligowski and Barros, 1999);

  • model output statistics techniques (Antolik, 2000, Fox and Collier, 2000);

  • purely statistical extrapolation methods, generally using radar data as a first guess (Grecu and Krajewski, 2000, Toth et al., 2000, Mellor et al., 2000);

  • expert system based approaches which model precipitation cell evolution (Pierce et al., 2000).

These different approaches have strengths and weaknesses for a range of forecast lead times, time scales and space scales. In the nowcasting range of 0–6 h NWP based forecasts are less often used because the assimilation and initialisation cycle of these models is of the order of 3 h (Golding, 2000) and there are many unresolved issues surrounding the assimilation of precipitation data into these models (Zou and Kuo, 1996). Thus, to produce nowcasts of precipitation, various statistical methods have been developed, which take advantage of the availability of high resolution radar. In the UK the radar system is only capable of providing information of instantaneous rainfall rates, so this is what is assumed to be available.

Wheater et al. (2000) review a series of elegant statistical spatio-temporal models for describing precipitation fields, based on a hierarchical decomposition of the precipitation into precipitation fields, bands and cells, each of which are modelled as point processes in space (precipitation fields), time (precipitation bands) or space and time (cells). Thus the models are defined as being continuous in space and time, and the actual precipitation field is described by a series of precipitation cells. The models can be fit to radar data, but this fitting is designed to estimate the hyper-parameters of the models, such as the rates of the corresponding Poisson processes. Thus the models provide a statistical characterisation of the precipitation field but are not directly suitable for use in operational forecasting.

In a similar vein Mellor et al. (2000) describe a statistical model for the production of a space-time precipitation field, based on their so called Modified Turning Bands method. This uses triangular prisms to define the location of the regions with potential to generate precipitation, and these are modulated by a paired sinusoidal function to produce temporal pulsing in the resulting precipitation cell generation potential field. Cells are generated from this potential as a Poisson process (the potential defines the rate of the process). Each precipitation cell is described by an inverted parabola, and has its own characteristic lifetime, width and maximal intensity. The model requires manual intervention to be fit to data and is tuned to frontal systems with a linear structure. By generating realisations of the Poisson processes ensembles can be generated, but the rather ad-hoc nature of the fitting of the model to data means that the operational use of this system is currently limited.

In the GANDOLF system, used operationally in the Met Office (Pierce et al., 2000), convection is represented using an object-oriented approach whereby each convective cell is identified and evolved through a life-cycle model, which includes initiation of daughter cells. The system uses data from a variety of sources including the Met Office mesoscale model, METEOSAT images and the radar network. While the GANDOLF system has been shown to produce reasonable convective precipitation forecasts (Pierce et al., 2000), there are several areas which could be improved. One of the most significant problems is that one single deterministic forecast is produced which, for such a non-linear system, is unlikely to characterise the true distribution of precipitation at a forecast lead time of say 3 h.

Georgakakos (2000) developed a hybrid model, which has aspects of the model output statistics approach, in that NWP derived fields are used extensively, for example to predict advection vectors, but also directly uses radar data (both near ground precipitation rates and vertically integrated water content) to produce a nowcast. The paper is novel in that a method is developed which also incorporates model uncertainty, through the specification of both observation and model errors. In that respect is it similar to the work described in this paper, however Georgakakos (2000) assumes a discrete state space (that is a grid based model, with numerical integration in the standard way) and thus can only represent the joint covariance matrix of the precipitation field in neighbouring cells, which is rather restrictive.

While there have been some interesting research developments in statistical modelling for precipitation forecasting, there remain many unanswered questions, in terms of what framework is most appropriate and how model uncertainty and observation uncertainty can be combined to capture the full probability distribution function of precipitation through space and time. In Section 1.2, one possible approach is reviewed.

To progress to the development of probabilistic models for precipitation, which can then be fed into a Bayesian hydrologic forecasting system such as presented in Krzysztofowicz (1999), we essentially have three options:

  • 1.

    use model output statistics techniques to post-process deterministic NWP (Antolik, 2000);

  • 2.

    use Monte Carlo (ensemble) methods with statistical or NWP models (Mellor et al., 2000);

  • 3.

    directly represent and evolve the probability density function of the precipitation process (this paper, and to some extent (Georgakakos, 2000)).

The first method, using model output statistics has a disadvantage, that the probability density function (pdf) of the precipitation process will be complex and state dependent, which may make parameterisation very difficult. In addition, many users (for instance those running distributed hydrological models) will require the joint pdf for precipitation over some spatial domain, so that the spatial structure of the precipitation is preserved, which will be very difficult to achieve.

The second option, based on sampling methods uses a large number of samples to characterise the pdf. The problem with these methods is that to correctly characterise the joint pdf of precipitation over a spatial domain may require many thousands of samples. Current operational ensemble prediction systems use around 100 samples, and are computationally very expensive, since each member must be integrated separately (Molteni et al., 1996). However ensembles maintain an advantage in that any model can readily be run as an ensemble, if the computational expense can be coped with. In addition it may be possible to post-process output to produce better approximations to the true forecast pdf (although this will again be very difficult for joint pdfs) (Mylne et al., 2002).

The third option is pursued in this paper, that is a model will be constructed which retains a probabilistic representation of the evolution of the precipitation field. Thus the model must propagate the full joint pdf of the precipitation field, and update this as observations become available. State space models provide a suitable framework, these being most readily presented in their natural Bayesian context. Any dynamical system can be represented in the state space modelling framework, which provides formalism for the updating and propagation of the pdf of the state of a system. In the linear, Gaussian case this is generally referred to as the Kalman filter. For non-linear and non-Gaussian systems there are a large variety of extensions to the Kalman filter (Chatfield, 1996), such as the extended Kalman filter, the ensemble Kalman filter (Evensen, 2001) and particle filter (Doucet et al., 2001) based methods.

State space models assume that the state of the system at time t, denoted xt, is not directly observed (it is a latent variable). The state is taken to evolve in time according to the (Markovian) state evolution equation:xt+1=f(xt)+ηt,where f() is the state evolution (or system) model which maps the state at time t to the new value at time t+1 and ηt is the system noise. This system noise represents model error, and can often be difficult to determine. The observations at time t, yt, are related to the state by the observation equation:yt=h(xt)+ϵt,where h() is the observation (or forward/sensor) model which maps the state variable to the observables. ϵt is the observation error and reflects the uncertainty in the observations which may arise from several sources, including incomplete knowledge of the observation process, such as arises when using radar images, as well as the intrinsic measurement uncertainty due to the observation system.

This framework is completely general, since there have been no restrictions imposed on f, h, η or ϵ. In the standard Kalman filter f and h are no longer functions but linear operators (matrices) and η and ϵ are assumed Gaussian, as is the initial distribution of the of the state x0. Thus for all future times the state will remain Gaussian distributed.

In this work the aim is to track the pdf of the state given all the previous observations, denoted by p(xt|Dt)=p(xt|yt,yt−1,…,y1).

The observation process is written as p(yt|xt,h), the conditional probability of the observations at time t given our estimate of the state at time t and the observation model, h. This reflects the assumption that the observation process in often not completely understood and that it has errors associated with it. The evolution of the system is given by p(xt+1|xt,f), the conditional probability of the state at time t+1 given our estimate of the state at time t and the system model, f. In Section 2 these are shown for the precipitation model.

In the Bayesian interpretation the state evolution (forecast) step becomes:p(xt|Dt−1)=∫p(xt|xt−1)p(xt−1|Dt−1)dxt−1.The state update (assimilation) step can then be written as:p(xt|Dt)=p(yt|xt)p(xt|Dt−1)∫p(yt|xt)p(xt|Dt−1)dxt,which can be seen as the application of Bayes theorem to the inverse problem of estimating the state given a set of observations, where the prior distribution, p(xt|Dt−1) comes from the state evolution step. These equations are then applied sequentially, to propagate the pdf of the state forward in time and update it given some observations, as illustrated in Fig. 1.

This general framework is very flexible and can adapted to almost every modelling situation. In Section 2 the state space modelling framework is extended to the probabilistic quantitative precipitation forecast model. In Section 3 the prior models are discussed and it is shown how these can be used generatively, to simulate from the model, even when no data has been seen. Section 4 discusses the results of the application of the model to both simulated (where we know the true parameters) and real data and conclusions are drawn in Section 5.

Section snippets

Model framework

The state space for the precipitation model must consist of the key variables necessary to forecast precipitation over the 0–3 h range. At a minimum there must be a representation of the instantaneous precipitation rate field, which we will denote R. To account for advection, the state space must be extended to include a vector field to represent the continuous spatial behaviour of the advection of the precipitation, which we denote u. The model dynamics are very simple, the evolution of R

Generative model

The model has been completely specified, although there remain some (hyper)parameters (the system noise (co)variances) which must be defined. Ideally these parameters would be estimated from data, using a method such as Kalman smoothing. Time constraints meant this was not possible for this prototype model, thus the values for these priors were set on the basis of expert judgement. This is consistent with the Bayesian framework adopted.

The system noise covariances were estimated using arguments

Results and discussion

The model was first tested on synthetic data generated from the model itself. This has the advantage that the true (generating) parameters are known and thus we can check the inference within the model is operating correctly. An example of a time series generated from the model, together with the fit to that data (the assimilated fields at the same time) is shown in Fig. 4. While this sort of testing cannot be used to infer the ability of the model to forecast real precipitation events, it does

Conclusions

This paper has described a fully probabilistic model for precipitation nowcasting. Wherever possible the assumptions which underlie the model have been explicitly stated, so that it is possible to appreciate the drawbacks and advantages of the model. A methodology is proposed for placing ad-hoc extrapolation based nowcasting models into a consistent probabilistic framework. There is much that could be criticised about the model, and the model deficiencies might be ranked in the following order:

Acknowledgements

Thanks are due to Clive Pierce of the UK Met Office and other members of the Stochastic Precipitation Nowcasting Panel for their comments and discussion about this work. Thanks are also due to Chris Collier for his helpful comments on radar errors at the review stage. I would like to acknowledge the help of MSc student Emmanuel Batail, who assisted in getting the model running on real data, and Lehel Csato who was always happy to discuss mathematical problems.

References (25)

  • C.G. Collier

    Developments in radar and remote-sensing methods for measuring and forecasting rain

    Philosophical Transactions of the Royal Society of London A

    (2002)
  • A. Doucet et al.

    Sequential Monte Carlo Methods in Practice

    (2001)
  • Cited by (13)

    View all citing articles on Scopus
    View full text