Structured volatility matrix estimation for non-synchronized high-frequency financial data

https://doi.org/10.1016/j.jeconom.2018.12.019

Abstract

Several large volatility matrix estimation procedures have been recently developed for factor-based Itô processes whose integrated volatility matrix consists of low-rank and sparse matrices. Their performance depends on the accuracy of input volatility matrix estimators. When estimating co-volatilities based on high-frequency data, one of the crucial challenges is non-synchronization for illiquid assets, which makes their co-volatility estimators inaccurate. In this paper, we study how to estimate the large integrated volatility matrix without using co-volatilities of illiquid assets. Specifically, we pretend that the co-volatilities for illiquid assets are missing, and estimate the low-rank matrix using a matrix completion scheme with a structured missing pattern. To further regularize the sparse volatility matrix, we employ the principal orthogonal complement thresholding method (POET). We also investigate the asymptotic properties of the proposed estimation procedure and demonstrate its advantages over using co-volatilities of illiquid assets. The advantages of our methods are also verified by an extensive simulation study and illustrated by high-frequency data for NYSE stocks.

Introduction

High-frequency financial data have provided researchers and practitioners with rich information for investigating asset pricing and market volatility dynamics. The analysis of high-frequency financial data also raises new analytic challenges. First, due to market inefficiencies such as bid–ask bounce, asymmetric information, and latency, stock prices are contaminated by micro-structural noise. If the micro-structural noise is not accounted for, estimators for integrated volatilities will diverge as the frequency increases (Aït-Sahalia et al., 2005). Second, the observation time points are not synchronized, which makes it hard to estimate co-volatilities, particularly for illiquid assets. Despite these challenges, several efficient estimation procedures have been developed. Examples include two-time scale realized volatility (TSRV) (Zhang et al., 2005), multi-scale realized volatility (MSRV) (Zhang, 2006, Zhang, 2011), wavelet estimator (Fan and Wang, 2007), pre-averaging realized volatility (PRV) (Christensen et al., 2010, Jacod et al., 2009), kernel realized volatility (KRV) (Barndorff-Nielsen et al., 2008, Barndorff-Nielsen et al., 2011), quasi-maximum likelihood estimator (QMLE) (Aït-Sahalia et al., 2010, Xiu, 2010), local method of moments (Bibinger et al., 2014), and robust pre-averaging realized volatility (Fan and Kim, 2018).
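To make the divergence caused by micro-structural noise concrete, the following minimal sketch simulates one day of noisy log-prices and compares the naive realized variance computed from every tick with the one computed from sparsely sampled returns. The Brownian price model with constant volatility, the i.i.d. Gaussian noise, and all parameter values (`daily_vol`, `noise_sd`, the one-second grid) are illustrative assumptions, not the setting of the paper.

```python
import math
import random

def simulate_noisy_log_prices(n, daily_vol, noise_sd, rng):
    """Return n + 1 noisy log-prices on an equally spaced grid over one day."""
    step_sd = daily_vol / math.sqrt(n)
    x = 0.0
    observed = [x + rng.gauss(0.0, noise_sd)]
    for _ in range(n):
        x += rng.gauss(0.0, step_sd)                    # latent efficient log-price
        observed.append(x + rng.gauss(0.0, noise_sd))   # add micro-structural noise
    return observed

def realized_variance(prices, skip=1):
    """Sum of squared returns using every `skip`-th observation."""
    sub = prices[::skip]
    return sum((b - a) ** 2 for a, b in zip(sub, sub[1:]))

rng = random.Random(0)
n = 23400            # assumed: one observation per second over 6.5 hours
daily_vol = 0.01     # assumed: integrated variance is daily_vol ** 2
noise_sd = 0.0005    # assumed noise standard deviation

prices = simulate_noisy_log_prices(n, daily_vol, noise_sd, rng)
true_iv = daily_vol ** 2
rv_all = realized_variance(prices)                # every tick
rv_sparse = realized_variance(prices, skip=300)   # 5-minute returns
print(true_iv, rv_sparse, rv_all)
```

Under these assumptions the expected noise contribution to the naive estimator is roughly twice the number of returns times the noise variance, so it grows with the sampling frequency, whereas the sparse version stays within a small multiple of the true integrated variance. The estimators listed above (TSRV, PRV, KRV, and others) are designed to remove this bias without discarding most of the data.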

When estimating co-volatilities, we often handle the non-synchronization problem with a synchronization scheme such as generalized sampling time (Aït-Sahalia et al., 2010), refresh time (Barndorff-Nielsen et al., 2011, Fan et al., 2012), previous tick (Wang and Zou, 2010, Zhang, 2011), or linear interpolation (Bibinger et al., 2014). See also Hayashi and Yoshida, 2005, Hayashi and Yoshida, 2011, Malliavin and Mancino, 2002, Malliavin et al., 2009, Mancino and Sanfelici, 2008, Park et al., 2016. These synchronization schemes guarantee that the errors coming from the non-synchronized observations become asymptotically negligible as the frequency increases. However, for illiquid assets, whose trading frequencies are relatively low, the errors may not be asymptotically negligible: the refresh times are too far apart to be useful, so estimators for co-volatilities can be inaccurate. This calls for investigating how to better estimate co-volatilities for illiquid assets. To do so, we need to appeal to structural aspects of the model.
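The effect of the refresh time scheme on an illiquid asset can be seen in a small sketch. The code below computes pairwise refresh times for two sorted lists of observation times and aligns prices by the previous-tick rule. The function names and the toy time stamps are hypothetical, and the sketch follows the usual definition of refresh times (each refresh time is the first moment at which both assets have traded since the previous one) rather than any specific implementation from the cited papers.

```python
from bisect import bisect_right

def refresh_times(times_a, times_b):
    """Pairwise refresh times for two sorted lists of observation times."""
    taus = []
    ia = ib = 0
    while ia < len(times_a) and ib < len(times_b):
        tau = max(times_a[ia], times_b[ib])   # both assets have traded by tau
        taus.append(tau)
        # move each pointer to the first observation strictly after tau
        ia = bisect_right(times_a, tau)
        ib = bisect_right(times_b, tau)
    return taus

def previous_tick(times, values, tau):
    """Most recent observed value at or before time tau."""
    return values[bisect_right(times, tau) - 1]

# Toy data (assumed): a liquid asset trades every second, an illiquid one twice.
liquid_times = [float(t) for t in range(1, 11)]
liquid_prices = [100.0 + 0.1 * t for t in range(1, 11)]
illiquid_times = [2.5, 7.5]

taus = refresh_times(liquid_times, illiquid_times)
synced_liquid = [previous_tick(liquid_times, liquid_prices, t) for t in taus]
print(taus, synced_liquid)
```

In this toy example the ten observations of the liquid asset collapse to as many synchronized points as the illiquid asset has trades, which is why co-volatility estimators involving illiquid assets are built from so few effective observations.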

A commonly used structure to account for cross-sectional dependence is the factor model. It was first used to estimate high-dimensional covariance matrices in Fan et al. (2008) for portfolio allocation and risk management, and it admits a low-rank plus sparse volatility matrix structure (Fan et al., 2013, Aït-Sahalia and Xiu, 2017, Fan et al., 2016a, Kim et al., 2018, Kong et al., 2018). When the number of assets is large, the latent factors can be accurately estimated. The performance of these factor-based estimators depends critically on the accuracy of the initial volatility matrix input. However, as discussed above, the co-volatility estimators for illiquid assets are inaccurate, due to the relatively long refresh times between any two illiquid assets. On the other hand, the special covariance structure implied by the factor model makes it possible to use the covariance information from liquid blocks to infer the covariances in illiquid blocks.
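The low-rank plus sparse structure can be illustrated with a simplified, POET-style decomposition: keep the leading principal components as the factor part and threshold the residual. This is only a sketch under stated assumptions. The function name `low_rank_plus_sparse`, the constant soft-threshold `tau`, and the simulated loadings are hypothetical, and the procedure of Fan et al. (2013) uses an adaptive, entry-dependent threshold rather than a single constant.

```python
import numpy as np

def low_rank_plus_sparse(gamma_hat, r, tau):
    """Split a symmetric matrix into a rank-r part and a soft-thresholded residual."""
    eigval, eigvec = np.linalg.eigh(gamma_hat)       # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:r]               # indices of the r largest
    theta_hat = (eigvec[:, top] * eigval[top]) @ eigvec[:, top].T
    resid = gamma_hat - theta_hat
    # soft-threshold the entries, then restore the diagonal unthresholded
    sigma_hat = np.sign(resid) * np.maximum(np.abs(resid) - tau, 0.0)
    np.fill_diagonal(sigma_hat, np.diag(resid))
    return theta_hat, sigma_hat

rng = np.random.default_rng(0)
p, r = 50, 3
B = rng.normal(size=(p, r))                  # hypothetical factor loadings
theta = B @ B.T                              # low-rank (factor) part
sigma = np.diag(rng.uniform(0.5, 1.0, p))    # sparse part, diagonal in this toy case
gamma = theta + sigma                        # noiseless input matrix

theta_hat, sigma_hat = low_rank_plus_sparse(gamma, r, tau=0.1)
rel_err = np.linalg.norm(theta_hat - theta) / np.linalg.norm(theta)
print(rel_err)
```

Here the input matrix is exact, so the recovered factor part is close to the truth. With a noisy input, the quality of both parts is limited by the accuracy of the input entries, which is the concern raised above for illiquid assets.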

How can we estimate co-volatilities for illiquid assets, which suffer from serious non-synchronization? In this paper, we appeal to the factor structure to infer these co-volatilities. The factor structure implies that the volatility matrix consists of a low-rank covariance matrix induced by the linear combinations of common factors and a sparse covariance matrix induced by idiosyncratic components. We investigate how to estimate the low-rank (or factor) volatility matrix without using co-volatility estimators for illiquid assets. Because the covariance matrix induced by the linear combinations of the common factors is low-rank, the sub-matrix corresponding to the illiquid assets is spanned by the column space of the remaining low-rank volatility sub-matrices and can be determined analytically from the sub-matrices that involve liquid assets. Thus, the problem of estimating the low-rank volatility matrix is related to the popular matrix completion problem (Candès and Recht, 2009, Koltchinskii et al., 2011), except that the entries (corresponding to the illiquid assets) are not ‘missing’ at random, but ‘missing’ (not used due to their inaccuracies) with a structured pattern (Cai et al., 2016). This structured pattern allows us to use the aforementioned analytical formula to estimate the factor-induced volatility sub-matrix that corresponds to illiquid assets. Then we estimate the sparse (or idiosyncratic) volatility matrix by subtracting the low-rank volatility estimator from the input volatility matrix estimator and applying the adaptive thresholding scheme to the resulting sparse volatility matrix estimator. A procedure of this kind is called Principal Orthogonal complEment Thresholding (POET) in Fan et al. (2013).
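The analytic recovery of the illiquid block can be checked numerically in the noiseless case. If the factor part is written in blocks with liquid assets first, a rank-r positive semi-definite matrix satisfies Θ22 = Θ21 Θ11⁺ Θ12, where Θ11⁺ is the pseudo-inverse of the liquid block, provided the liquid block itself has rank r. The sketch below verifies this identity on simulated loadings. The function name, the block sizes, and the loadings are hypothetical, and the estimator in the paper works with noisy inputs and may be constructed differently.

```python
import numpy as np

def complete_illiquid_block(theta11, theta12, r):
    """Recover the (2,2) block of a rank-r PSD matrix from its (1,1) and (1,2) blocks."""
    # rank-r pseudo-inverse of the liquid block via its leading eigenpairs
    eigval, eigvec = np.linalg.eigh(theta11)
    top = np.argsort(eigval)[::-1][:r]
    pinv11 = (eigvec[:, top] / eigval[top]) @ eigvec[:, top].T
    return theta12.T @ pinv11 @ theta12

rng = np.random.default_rng(1)
p1, p2, r = 30, 10, 3                  # liquid assets, illiquid assets, factors
B = rng.normal(size=(p1 + p2, r))      # hypothetical factor loadings
theta = B @ B.T                        # full low-rank matrix (never passed whole)

theta11 = theta[:p1, :p1]              # liquid-liquid block
theta12 = theta[:p1, p1:]              # liquid-illiquid block
theta22_hat = complete_illiquid_block(theta11, theta12, r)
print(np.max(np.abs(theta22_hat - theta[p1:, p1:])))
```

The illiquid-illiquid block is never used as an input here, which mirrors the idea of treating those co-volatilities as structurally missing.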

We investigate the asymptotic behavior of the proposed estimators for the volatility matrices that correspond to the linear combinations of factors, the idiosyncratic components, and the log-returns of assets. We assume that the high-frequency data are contaminated with micro-structural noise. We model the trading volumes of liquid and illiquid assets in an idealized way. We show explicitly when and where gains can be made by ignoring the co-volatilities of the illiquid assets.

The rest of the paper is organized as follows. Section 2 provides a factor-based diffusion process and data structure, and Section 3 reviews the pairwise refresh time scheme and the pre-averaging realized volatility estimation method. In Section 4, a large volatility matrix estimation procedure is proposed using a matrix completion scheme with the structured missing pattern, and its asymptotic properties are established. The advantages of the proposed method are demonstrated via a simulation study in Section 5 and illustrated by an application to NYSE stocks in Section 6. Proofs are collected in Section 7.

Section snippets

Model set-up

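We first define some notation. For any given vector $a$, $\mathrm{diag}(a)$ creates a diagonal matrix using the elements of $a$. For any given $d_1 \times d_2$ matrix $U = (U_{ij})$, $\|U\|_1 = \max_{1 \le j \le d_2} \sum_{i=1}^{d_1} |U_{ij}|$, $\|U\|_\infty = \max_{1 \le i \le d_1} \sum_{j=1}^{d_2} |U_{ij}|$, and $\|U\|_{\max} = \max_{i,j} |U_{ij}|$. The matrix spectral norm $\|U\|_2$ is the square root of the largest eigenvalue of $UU^\top$, and the Frobenius norm of $U$ is $\|U\|_F = \sqrt{\mathrm{tr}(U^\top U)}$. $U_{IJ}$ denotes the sub-matrix of $U$ formed by the rows and columns whose indices are in $I$ and $J$, respectively, where $I$ and $J$ are subsets of $\{1, \ldots, d_1\}$ and $\{1, \ldots, d_2\}$, respectively. We will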

Pairwise refresh method

To handle the non-synchronization problem, we can use synchronization schemes such as generalized sampling time (Aït-Sahalia et al., 2010), refresh time (Barndorff-Nielsen et al., 2011, Fan et al., 2012), and previous tick (Wang and Zou, 2010, Zhang, 2011) schemes, or some linear interpolation scheme (Bibinger et al., 2014). There are also estimation procedures that do not require aligning the data (Hayashi and Yoshida, 2005, Hayashi and Yoshida, 2011, Malliavin and Mancino, 2002, Malliavin et al., 2009

Low-rank volatility matrix estimation

Several large volatility matrix estimation procedures have been developed based on the factor model (Fan et al., 2016a, Aït-Sahalia and Xiu, 2017, Kim et al., 2018, Kong et al., 2018). Their performances may depend on the accuracy of the input volatility matrix estimator Γ̂. As discussed in Section 2, when it comes to estimating co-volatilities in high-frequency finance, one of the crucial issues is the non-synchronization problem. We use the pairwise refresh time defined in Definition 1 in

Consistency of estimators

To check the finite sample performance of the proposed estimator, we conducted a simulation study. The true log-stock price follows a continuous-time r-factor model defined in (2.1) with μ(t)=0. Let σ(t) be the Cholesky decomposition of the instantaneous volatility process $\varsigma(t) = (\varsigma_{ij}(t))_{1 \le i,j \le p}$. The diagonal elements of ς(t) follow four different processes such as geometric Ornstein–Uhlenbeck processes, the sum of two CIR processes (Barndorff-Nielsen, 2002, Cox et al., 1985), the volatility

Empirical applications

We collected intra-daily transaction prices of NYSE constituents from January to March in 2016 from the TAQ database in the Wharton Data Service (WRDS) system, 60 trading days in total. We excluded stocks that have fewer than 100 trading observations and chose the top 100 liquid stocks and the top 100 illiquid stocks as the candidates for our portfolio construction. We used the log-prices in seconds and excluded overnight returns to avoid dividend issuances and stock splits. To manage the

Proofs

Proof of Theorem 4.1

Similar to the proof of Theorem 4.1 of Fan and Kim (2018), we can show Θ̂11Θ11maxCMσs(p)p1+β1,n.

Now consider Θ̂12Θ12max. By Weyl’s Theorem, we have max1kr|ξ̂kξk|Γ̂12Θ122Γ̂12Γ122+Γ12Θ122p1p2Γ̂12Γ12max+Mσs(p). By Theorem 1.1 in Fan et al. (2016b), we have max1krv̂ksign(v̂k,vk)vkmaxCp1p2Γ̂12Γ12max+Mσp1p2max(s(p)p1,s(p)p2)Dξp1Cp2Γ̂12Γ12max+Mσmax(s(p)p1,s(p)p2)Dξ and max1krûksign(ûk,uk)ukmaxCp1p2Γ̂12Γ12max+Mσp1p2max(s(p)p1,s(p)p2)Dξp2Cp1Γ̂

Acknowledgments

The research of Jianqing Fan was supported in part by National Science Foundation (NSF) grant DMS-1712591 and a Princeton engineering innovation fund. The research of Donggyu Kim was supported in part by KAIST Settlement/Research Subsidies for Newly-hired Faculty grant G04170049. The bulk of the work was conducted while Donggyu Kim was a postdoctoral fellow at the Department of Operations Research and Financial Engineering, Princeton University.

References (42)

  • Park, S. et al. Estimating the quadratic covariation matrix for asynchronously observed high frequency stock returns corrupted by additive measurement error. J. Econometrics (2016)
  • Xiu, D. Quasi-maximum likelihood estimation of volatility with high frequency data. J. Econometrics (2010)
  • Zhang, L. Estimating covariation: Epps effect, microstructure noise. J. Econometrics (2011)
  • Aït-Sahalia, Y. et al. High-frequency covariance estimates with noisy and asynchronous financial data. J. Amer. Statist. Assoc. (2010)
  • Aït-Sahalia, Y. et al. How often to sample a continuous-time process in the presence of market microstructure noise. Rev. Financial Stud. (2005)
  • Bai, J. et al. Determining the number of factors in approximate factor models. Econometrica (2002)
  • Barndorff-Nielsen, O.E. Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B Stat. Methodol. (2002)
  • Barndorff-Nielsen, O.E. et al. Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica (2008)
  • Bibinger, M. et al. Estimating the quadratic covariation matrix from noisy observations: local method of moments and efficiency. Ann. Statist. (2014)
  • Cai, T. et al. Structured matrix completion with applications to genomic data integration. J. Amer. Statist. Assoc. (2016)
  • Candès, E.J. et al. Exact matrix completion via convex optimization. Found. Comput. Math. (2009)