Nonparametric Bayesian volatility learning under microstructure noise

  • Original Paper
  • Statistics for Stochastic Processes
  • Japanese Journal of Statistics and Data Science

Abstract

In this work, we study the problem of learning the volatility under market microstructure noise. Specifically, we consider noisy discrete-time observations from a stochastic differential equation and develop a novel computational method to learn the diffusion coefficient of the equation. We take a nonparametric Bayesian approach, where we a priori model the volatility function as piecewise constant. Its prior is specified via the inverse Gamma Markov chain. Sampling from the posterior is accomplished by incorporating the Forward Filtering Backward Simulation algorithm in the Gibbs sampler. Good performance of the method is demonstrated on two representative synthetic data examples. We also apply the method to an EUR/USD exchange rate dataset. Finally, we present a limit result on the prior distribution.

Code availability

The computer code to reproduce the numerical examples in this article is available at Gugushvili et al. (2022).

Notes

  1. As of 2020, the data are no longer available from the Pepperstone website, but they can be obtained directly from the present authors. The data are stored as CSV files that contain the dates and times of transactions together with the bid and ask prices. The data over 2019 are available for download (after a free registration) at https://www.truefx.com/truefx-historical-downloads.

References

  • Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.

  • Brigo, D., & Mercurio, F. (2006). Interest rate models-theory and practice: With smile, inflation and credit (Vol. 2). Springer.

  • Cemgil, A. T., & Dikmen, O. (2007). Conjugate gamma Markov random fields for modelling nonstationary sources. In M. E. Davies, C. J. James, S. A. Abdallah, & M. D. Plumbley (Eds.), Independent component analysis and signal separation: 7th International Conference, ICA 2007, London, UK, September 9–12, 2007. Proceedings (pp. 697–705). Springer.

  • Cox, J. C., Ingersoll, J. E., & Ross, S. A. (1985). A theory of the term structure of interest rates. Econometrica, 53(2), 385–407.

  • Fan, J., & Gijbels, I. (1995). Data-driven bandwidth selection in local polynomial fitting: Variable bandwidth and spatial adaptation. Journal of the Royal Statistical Society, Series B, 57(2), 371–394.

  • Filipovic, D. (2009). Term-structure models. A graduate course. Springer.

  • Ghosal, S., & van der Vaart, A. (2017). Fundamentals of nonparametric Bayesian inference, Cambridge series in statistical and probabilistic mathematics (Vol. 44). Cambridge: Cambridge University Press.

  • Glasserman, P. (2004). Monte Carlo methods in financial engineering (Vol. 53). Springer.

  • Gugushvili, S., van der Meulen, F., Schauer, M., & Spreij, P. (2019). Fast and scalable non-parametric Bayesian inference for Poisson point processes. RESEARCHERS.ONE. https://www.researchers.one/article/2019-06-6

  • Gugushvili, S., van der Meulen, F., Schauer, M., & Spreij, P. (2019). Nonparametric Bayesian volatility estimation. In J. de Gier, C. E. Praeger, & T. Tao (Eds.), 2017 MATRIX annals (pp. 279–302). Springer.

  • Gugushvili, S., van der Meulen, F., Schauer, M., & Spreij, P. (2020). Nonparametric Bayesian estimation of a Hölder continuous diffusion coefficient. Brazilian Journal of Probability and Statistics, 34(3), 537–579. https://doi.org/10.1214/19-BJPS433.

  • Gugushvili, S., van der Meulen, F., Schauer, M., & Spreij, P. (2022). Julia code for nonparametric Bayesian volatility learning under microstructure noise. https://doi.org/10.5281/zenodo.6801410

  • Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6(2), 327–343.

  • Hoffmann, M., Munk, A., & Schmidt-Hieber, J. (2012). Adaptive wavelet estimation of the diffusion coefficient under additive error measurements. Annales de l’IHP Probabilités et Statistiques, 48(4), 1186–1216.

  • Ignatieva, K., & Platen, E. (2012). Estimating the diffusion coefficient function for a diversified world stock index. Computational Statistics & Data Analysis, 56(6), 1333–1349.

  • Jacod, J., Li, Y., Mykland, P., Podolskij, M., & Vetter, M. (2009). Microstructure noise in the continuous case: The pre-averaging approach. Stochastic Processes and their Applications, 119(7), 2249–2276.

  • Jacod, J., & Shiryaev, A. (2013). Limit theorems for stochastic processes (Vol. 288). Springer.

  • Kanaya, S., & Kristensen, D. (2016). Estimation of stochastic volatility models by nonparametric filtering. Econometric Theory, 32(4), 861–916.

  • Kloeden, P. E., & Platen, E. (1992). Numerical solution of stochastic differential equations, Applications of Mathematics (New York) (Vol. 23). Springer. https://doi.org/10.1007/978-3-662-12616-5.

  • Mancini, C., Mattiussi, V., & Renò, R. (2015). Spot volatility estimation using delta sequences. Finance and Stochastics, 19(2), 261–293.

  • Müller, P., & Mitra, R. (2013). Bayesian nonparametric inference—Why and how. Bayesian Analysis, 8(2), 269–302.

  • Müller, P., Quintana, F. A., Jara, A., & Hanson, T. (2015). Bayesian nonparametric data analysis. Springer Series in Statistics. Springer.

  • Munk, A., & Schmidt-Hieber, J. (2010). Lower bounds for volatility estimation in microstructure noise models. In Borrowing strength: Theory powering applications—A Festschrift for Lawrence D. Brown, IMS Collections (Vol. 6, pp. 43–55). Institute of Mathematical Statistics.

  • Musiela, M., & Rutkowski, M. (2005). Martingale methods in financial modelling. Stochastic modelling and applied probability (2nd ed., Vol. 36). Springer.

  • Mykland, P. A., & Zhang, L. (2009). Inference for continuous semimartingales observed at high frequency. Econometrica, 77(5), 1403–1445.

  • Mykland, P. A., & Zhang, L. (2012). The econometrics of high-frequency data. In Statistical methods for stochastic differential equations, Monographs on Statistics and Applied Probability (Vol. 124, pp. 109–190). CRC Press.

  • Papaspiliopoulos, O., Roberts, G. O., & Stramer, O. (2013). Data augmentation for diffusions. Journal of Computational and Graphical Statistics, 22(3), 665–688.

  • Petris, G., Petrone, S., & Campagnoli, P. (2009). Dynamic linear models with R. Use R! Springer.

  • Reiß, M. (2011). Asymptotic equivalence for inference on the volatility from noisy observations. Annals of Statistics, 39(2), 772–802.

  • Sabel, T., Schmidt-Hieber, J., & Munk, A. (2015). Spot volatility estimation for high-frequency data: Adaptive estimation in practice. Modeling and stochastic learning for forecasting in high dimensions, Lect. Notes Stat. (Vol. 217, pp. 213–241). Springer.

  • Silverman, B. W. (1986). Density estimation for statistics and data analysis. Monographs on statistics and applied probability. Chapman & Hall.

  • Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398), 528–550 (with discussion and a reply by the authors).

  • Tierney, L. (1994). Markov chains for exploring posterior distributions. Annals of Statistics, 22(4), 1701–1762.

  • van der Ploeg, A. P. C. (2006). Stochastic volatility and the pricing of financial derivatives. Ph.D. thesis, University of Amsterdam.

  • van der Meulen, F., & Schauer, M. (2017). Bayesian estimation of discretely observed multi-dimensional diffusion processes using guided proposals. Electronic Journal of Statistics, 11(1), 2358–2396.

  • Wilkinson, D. J. (2012). Metropolis Hastings MCMC when the proposal and target have differing support. https://darrenjw.wordpress.com/2012/06/04/metropolis-hastings-mcmc-when-the-proposal-and-target-have-differing-support/. Accessed 23 Dec 2017

  • Zhang, L., Mykland, P. A., & Aït-Sahalia, Y. (2005). A tale of two time scales: Determining integrated volatility with noisy high-frequency data. Journal of the American Statistical Association, 100(472), 1394–1411.

Funding

The research leading to the results in this paper has received funding from the European Research Council under ERC Grant Agreement 320637.

Author information

Corresponding author

Correspondence to Frank van der Meulen.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Details on update steps in the Gibbs sampler

1.1 Drawing \(x_{0:n}\)

We first describe how to draw the state vector \(x_{0:n}\) conditional on all other parameters in the model and the data \(y_{1:n}\). Note that for \(u_i\) in (5) we have by (4) that \(u_i \sim N(0,w_i)\), where

$$\begin{aligned} \begin{aligned} w_i&= \theta _k\varDelta _i, \quad i \in [(k-1)m+1,km], \quad k=1,\ldots ,N-1,\\ w_i&= \theta _N \varDelta _i, \quad i \in [(N-1)m+1,n]. \end{aligned} \end{aligned}$$
(19)

By Eq. (4.21) in Petris et al. (2009) (we omit dependence on \(\theta _{1:N}, \eta _v\) in our notation, as they stay fixed in this step),

$$\begin{aligned} p(x_{0:n} | y_{1:n})=\prod _{i=0}^n p( x_i | x_{i+1:n}, y_{1:n}), \end{aligned}$$

where the factor with \(i=n\) in the product on the right-hand side is the filtering density \(p( x_n | y_{1:n})\). This distribution is in fact \(N(\mu _n,C_n)\), with the mean \(\mu _n\) and variance \(C_n\) obtained from the Kalman recursions

$$\begin{aligned} \mu _i=\mu _{i-1}+K_i e_i,\quad C_i=K_i \eta _v, \quad i=1,\ldots ,n. \end{aligned}$$

Here

$$\begin{aligned} K_i =\frac{C_{i-1}+w_i }{C_{i-1} + w_i + \eta _v}, \quad i=1,\ldots ,n, \end{aligned}$$

is the Kalman gain. Furthermore, \( e_i=y_i-\mu _{i-1} \) is the one-step-ahead prediction error, also referred to as the innovation; see Petris et al. (2009), Section 2.7.2. This constitutes the forward pass of the FFBS algorithm.

Next, in the backward pass, one first draws \({\widetilde{x}}_n \sim N(\mu _n,C_n)\) and then, backwards in time, \({\widetilde{x}}_{n-1},\ldots ,{\widetilde{x}}_0\) from the densities \(p( x_i | {\widetilde{x}}_{i+1}, y_{1:n})\) for \(i=n-1,n-2,\ldots ,0\). It holds that \(p( x_i | {\widetilde{x}}_{i+1:n}, y_{1:n})=p( x_i | {\widetilde{x}}_{i+1}, y_{1:n})\), and the latter distribution is \(N(h_i,H_i)\), with

$$\begin{aligned} h_i=\mu _i+\frac{C_i}{C_i+w_{i+1}}({\widetilde{x}}_{i+1}-\mu _i), \quad H_i =\frac{C_i w_{i+1}}{C_i+w_{i+1}}. \end{aligned}$$

For every \(i\), these expressions depend only on the previously generated \({\widetilde{x}}_{i+1}\) and on quantities computed in the forward pass. The sequence \({\widetilde{x}}_0,{\widetilde{x}}_1,\ldots ,{\widetilde{x}}_n\) is a sample from \(p(x_{0:n}|y_{1:n})\). See Section 4.4.1 in Petris et al. (2009) for details on FFBS.
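
To make the two passes concrete, the following Julia sketch implements the recursions above for the local-level model of this section. It is only a minimal illustration, not the authors' implementation (which is available at Gugushvili et al. (2022)); the function name ffbs and the default values mu0 = 0 and C0 = 25, taken from the prior \(x_0 \sim N(0,25)\) in Sect. 1.4, are our own choices.

using Distributions

# Minimal sketch of the FFBS step for the local-level model of this section:
# x_i = x_{i-1} + u_i, u_i ~ N(0, w_i);  y_i = x_i + v_i, v_i ~ N(0, eta_v).
function ffbs(y::Vector{Float64}, w::Vector{Float64}, eta_v::Float64;
              mu0::Float64 = 0.0, C0::Float64 = 25.0)
    n = length(y)
    mu = Vector{Float64}(undef, n + 1)   # mu[i+1] holds the filtering mean mu_i
    C  = Vector{Float64}(undef, n + 1)   # C[i+1] holds the filtering variance C_i
    mu[1], C[1] = mu0, C0

    # Forward pass: Kalman recursions.
    for i in 1:n
        K = (C[i] + w[i]) / (C[i] + w[i] + eta_v)   # Kalman gain K_i
        e = y[i] - mu[i]                            # innovation e_i
        mu[i + 1] = mu[i] + K * e
        C[i + 1]  = K * eta_v
    end

    # Backward pass: sample x_n, x_{n-1}, ..., x_0.
    x = Vector{Float64}(undef, n + 1)    # x[i+1] holds the draw of x_i
    x[n + 1] = rand(Normal(mu[n + 1], sqrt(C[n + 1])))
    for i in n-1:-1:0
        denom = C[i + 1] + w[i + 1]
        h = mu[i + 1] + C[i + 1] / denom * (x[i + 2] - mu[i + 1])
        H = C[i + 1] * w[i + 1] / denom
        x[i + 1] = rand(Normal(h, sqrt(H)))
    end
    return x    # a draw of x_{0:n} from p(x_{0:n} | y_{1:n})
end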

1.2 Drawing \(\eta _v\), \(\theta _{1:N}\) and \(\zeta _{2:N}\)

Using the likelihood expression from Sect. 2.3 and the fact that \(\eta _v\sim {\text {IG}}(\alpha _v,\beta _v)\), one sees that the full conditional distribution of \(\eta _v\) is given by

$$\begin{aligned} \eta _v | x_{1:n}, y_{1:n} \sim {\text {IG}}\left( \alpha _v + \frac{n}{2}, \beta _v + \frac{1}{2} \sum _{i=1}^n (y_i-x_i)^2 \right) . \end{aligned}$$

Similarly, using the likelihood expression from Sect. 2.3 and the conditional distributions in (7), one sees that the full conditional distributions for \(\theta _{1:N}\) are

$$\begin{aligned} \theta _1 | \zeta _2,x_{1:n}&\sim \textrm{IG}\left( \alpha _1 + \alpha +\frac{m_1}{2},\, \beta _1 + \frac{\alpha }{ \zeta _{2}} + \frac{ Z_1}{2}\right) ,\\ \theta _k | \zeta _k,\zeta _{k+1},x_{1:n}&\sim \textrm{IG}\left( 2\alpha +\frac{m_k}{2},\, \frac{\alpha }{\zeta _k}+\frac{\alpha }{\zeta _{k+1}} + \frac{ Z_k}{2}\right) , \quad k=2,\dots ,N-1, \\ \theta _N | \zeta _N,x_{1:n}&\sim \textrm{IG}\left( \alpha +\frac{m_N}{2},\, \frac{\alpha }{ \zeta _N} + \frac{ Z_N}{2}\right) . \end{aligned}$$

The full conditional distributions for \(\zeta _{2:N}\) are

$$\begin{aligned} \zeta _k | \theta _k,\theta _{k-1} \sim {\text {IG}}\left( 2\alpha ,\frac{\alpha }{\theta _{k-1}}+\frac{\alpha }{ \theta _k}\right) , \quad k=2,\ldots ,N. \end{aligned}$$
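
A minimal Julia sketch of these conjugate updates is given below. The sufficient statistics \(Z_k\) and the bin sizes \(m_k\) are defined in the main text and are simply taken as inputs here; the function names and the storage convention for \(\zeta _{2:N}\) (a length-\(N\) vector whose first entry is unused) are our own.

using Distributions

# Convention: x has length n+1 and stores x_0, ..., x_n;
# zeta[k] stores zeta_k for k = 2, ..., N (zeta[1] is unused).
function update_eta_v(y, x, alpha_v, beta_v)
    n = length(y)
    rand(InverseGamma(alpha_v + n / 2,
                      beta_v + 0.5 * sum((y .- x[2:end]) .^ 2)))
end

function update_theta!(theta, zeta, Z, m, alpha, alpha1, beta1)
    N = length(theta)
    theta[1] = rand(InverseGamma(alpha1 + alpha + m[1] / 2,
                                 beta1 + alpha / zeta[2] + Z[1] / 2))
    for k in 2:N-1
        theta[k] = rand(InverseGamma(2alpha + m[k] / 2,
                                     alpha / zeta[k] + alpha / zeta[k + 1] + Z[k] / 2))
    end
    theta[N] = rand(InverseGamma(alpha + m[N] / 2, alpha / zeta[N] + Z[N] / 2))
    return theta
end

function update_zeta!(zeta, theta, alpha)
    for k in 2:length(zeta)
        zeta[k] = rand(InverseGamma(2alpha, alpha / theta[k - 1] + alpha / theta[k]))
    end
    return zeta
end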

1.3 Drawing \(\alpha \)

The unnormalised full conditional density of \(\alpha \) is

$$\begin{aligned} q(\alpha ) = \pi (\alpha )\left( \frac{\alpha ^{\alpha }}{\varGamma (\alpha )}\right) ^{2(N-1)} \exp \left( -\alpha \sum _{k=2}^N \frac{1}{\zeta _k}\left( \frac{1}{\theta _{k-1}} + \frac{1}{\theta _k}\right) \right) \prod _{k=2}^N \left( \theta _{k-1} \theta _k \zeta _k^2 \right) ^{-\alpha }. \end{aligned}$$

The corresponding normalised density is nonstandard, and a Metropolis-within-Gibbs step (see, e.g., Tierney 1994) is used to update \(\alpha \). The specific details are exactly the same as in Gugushvili et al. (2019b).
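
The Julia sketch below illustrates one way to carry out this step: a Gaussian random walk on \(\log \alpha \), with the Jacobian correction for the change of variables. It is an illustration under our own assumptions (in particular, that the prior \(\log \alpha \sim N(1,0.25)\) of Sect. 1.4 has variance 0.25, and that the step size tau is a free tuning parameter), not a reproduction of the exact proposal used in Gugushvili et al. (2019b).

using Distributions, SpecialFunctions   # loggamma

# Unnormalised log full conditional of alpha; pi(alpha) is taken to be the
# log-normal prior log(alpha) ~ N(1, 0.25) of Sect. 1.4 (0.25 read as a
# variance -- an assumption on our part).
function log_q(alpha, theta, zeta)
    N = length(theta)
    s = 0.0
    for k in 2:N
        s += (1 / theta[k - 1] + 1 / theta[k]) / zeta[k] +
             log(theta[k - 1] * theta[k] * zeta[k]^2)
    end
    logprior = logpdf(Normal(1.0, 0.5), log(alpha)) - log(alpha)
    return logprior + 2 * (N - 1) * (alpha * log(alpha) - loggamma(alpha)) - alpha * s
end

# One Metropolis-within-Gibbs update of alpha via a Gaussian random walk on
# log(alpha); the acceptance ratio includes the Jacobian of the log transform.
function update_alpha(alpha, theta, zeta; tau = 0.3)
    la_prop = log(alpha) + tau * randn()
    a_prop  = exp(la_prop)
    logA = log_q(a_prop, theta, zeta) - log_q(alpha, theta, zeta) +
           la_prop - log(alpha)
    return log(rand()) < logA ? a_prop : alpha
end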

1.4 Gibbs sampler

Settings for the Gibbs sampler in Sect. 4 are as follows: we used a vague specification \(\alpha _1,\beta _1\rightarrow 0\), and assumed that \(\log \alpha \sim N(1,0.25)\) and \(\eta _v \sim {\text {IG}}(0.3,0.3)\) in Sect. 4.1. For the Heston model in Sect. 4.2 we used the specification \(\eta _v \sim {\text {IG}}(0.001,0.001)\). Furthermore, we set \(x_0 \sim N(0,25)\). The Metropolis-within-Gibbs step to update the hyperparameter \(\alpha \) was performed via an independent Gaussian random walk proposal (with a correction as in Wilkinson (2012)), with the scaling chosen to ensure an acceptance rate of about \(30{-}50\%\). The Gibbs sampler was run for \(30\,000\) iterations, with the first third of the samples dropped as burn-in.
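
Combining the sketches above, one Gibbs sweep can be organised as follows. The helper sufficient_stats reconstructs \(Z_k\) and \(m_k\) from (19) and the form of the inverse Gamma conditionals; consult the main text and the authors' code (Gugushvili et al. 2022) for the exact definitions. The mapping bins[i], giving the bin index \(k\) of increment \(i\), is our own bookkeeping device.

# Per-bin statistics, reconstructed from (19) and the IG full conditionals:
# m_k = number of increments in bin k, Z_k = sum over bin k of (x_i - x_{i-1})^2 / Delta_i.
function sufficient_stats(x, Delta, bins, N)
    Z = zeros(N)
    m = zeros(Int, N)
    for i in eachindex(Delta)
        k = bins[i]
        Z[k] += (x[i + 1] - x[i])^2 / Delta[i]
        m[k] += 1
    end
    return Z, m
end

# One full Gibbs sweep; alpha_v = beta_v = 0.3 matches Sect. 4.1, while
# alpha1 = beta1 = 0 encodes the vague limit alpha_1, beta_1 -> 0.
function gibbs_sweep!(x, theta, zeta, eta_v, alpha, y, Delta, bins;
                      alpha_v = 0.3, beta_v = 0.3, alpha1 = 0.0, beta1 = 0.0)
    N = length(theta)
    w = [theta[bins[i]] * Delta[i] for i in eachindex(Delta)]   # Eq. (19)
    x .= ffbs(y, w, eta_v)                                      # Sect. 1.1
    eta_v = update_eta_v(y, x, alpha_v, beta_v)                 # Sect. 1.2
    Z, m = sufficient_stats(x, Delta, bins, N)
    update_theta!(theta, zeta, Z, m, alpha, alpha1, beta1)
    update_zeta!(zeta, theta, alpha)
    alpha = update_alpha(alpha, theta, zeta)                    # Sect. 1.3
    return eta_v, alpha
end
# In Sect. 4 the sampler performs 30 000 such sweeps and discards the first
# third as burn-in.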

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gugushvili, S., van der Meulen, F., Schauer, M. et al. Nonparametric Bayesian volatility learning under microstructure noise. Jpn J Stat Data Sci 6, 551–571 (2023). https://doi.org/10.1007/s42081-022-00185-9
