Structured volatility matrix estimation for non-synchronized high-frequency financial data
Introduction
High-frequency financial data provide researchers and practitioners with rich information for investigating asset pricing and market volatility dynamics. Their analysis also poses new challenges. First, owing to market inefficiencies such as bid–ask bounce, asymmetric information, and latency, stock prices are contaminated by micro-structural noise. If this noise is not accounted for, estimators of integrated volatilities diverge as the sampling frequency increases (Aït-Sahalia et al., 2005). Second, the observation time points are not synchronized, which makes it hard to estimate co-volatilities, particularly for illiquid assets. Despite these challenges, several efficient estimation procedures have been developed. Examples include two-time scale realized volatility (TSRV) (Zhang et al., 2005), multi-scale realized volatility (MSRV) (Zhang, 2006, Zhang, 2011), the wavelet estimator (Fan and Wang, 2007), pre-averaging realized volatility (PRV) (Christensen et al., 2010, Jacod et al., 2009), kernel realized volatility (KRV) (Barndorff-Nielsen et al., 2008, Barndorff-Nielsen et al., 2011), the quasi-maximum likelihood estimator (QMLE) (Aït-Sahalia et al., 2010, Xiu, 2010), the local method of moments (Bibinger et al., 2014), and robust pre-averaging realized volatility (Fan and Kim, 2018).
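The effect of micro-structural noise on a naive estimator can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: constant volatility, i.i.d. Gaussian noise, and hypothetical parameter values (`sigma2`, `noise_sd`, `k`). It uses the basic two-time-scale form of Zhang et al. (2005), the averaged slow-scale realized variance minus a scaled fast-scale realized variance, without any small-sample adjustment.

```python
import numpy as np

def realized_variance(y, step=1, offset=0):
    """Sum of squared increments of y sampled every `step` ticks from `offset`."""
    sub = y[offset::step]
    return float(np.sum(np.diff(sub) ** 2))

def tsrv(y, k):
    """Basic two-time-scale estimator with slow scale k (no small-sample correction)."""
    n = len(y) - 1                       # number of tick-by-tick returns
    rv_all = realized_variance(y)        # fast scale: dominated by noise
    # Slow scale: average the realized variance over the k offset subgrids.
    rv_avg = np.mean([realized_variance(y, k, j) for j in range(k)])
    n_bar = (n - k + 1) / k              # average number of returns per subgrid
    return rv_avg - (n_bar / n) * rv_all

rng = np.random.default_rng(0)
n, sigma2, noise_sd = 23_400, 1e-4, 5e-4  # hypothetical values, one tick per second

x = np.cumsum(rng.normal(0.0, np.sqrt(sigma2 / n), n + 1))  # latent log-price
y = x + rng.normal(0.0, noise_sd, n + 1)                    # observed log-price

rv_naive = realized_variance(y)  # inflated by roughly 2 * n * noise_sd**2
rv_tsrv = tsrv(y, k=300)         # should be close to sigma2
```

In this setting the naive realized variance is dominated by the noise term, which grows with the number of observations, while the two-scale estimate stays near the true integrated volatility `sigma2`.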
When estimating co-volatilities, the non-synchronization problem is usually handled with a synchronization scheme such as generalized sampling time (Aït-Sahalia et al., 2010), refresh time (Barndorff-Nielsen et al., 2011, Fan et al., 2012), previous tick (Wang and Zou, 2010, Zhang, 2011), or linear interpolation (Bibinger et al., 2014). See also Hayashi and Yoshida, 2005, Hayashi and Yoshida, 2011, Malliavin and Mancino, 2002, Malliavin et al., 2009, Mancino and Sanfelici, 2008, Park et al., 2016. These schemes guarantee that the errors arising from non-synchronized observations become asymptotically negligible as the frequency increases. However, for illiquid assets, whose trading frequencies are relatively low, the errors may not be negligible: the intervals between refresh times are too long to be useful, so estimators of co-volatilities can be inaccurate. This creates a demand for better ways to estimate co-volatilities for illiquid assets, and to meet it we need to appeal to structural aspects of the model.
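The refresh time idea can be sketched for a single pair of assets as follows. This is a minimal illustration, not the paper's implementation: the function names and the toy data are hypothetical, and the sketch assumes sorted observation times and uses the previous-tick price at each refresh time.

```python
import numpy as np

def refresh_times(t1, t2):
    """Refresh times for two sorted arrays of observation times."""
    taus = []
    i = j = 0
    while i < len(t1) and j < len(t2):
        tau = max(t1[i], t2[j])          # both assets have traded by time tau
        taus.append(tau)
        # Move each asset to its first observation strictly after tau.
        i = int(np.searchsorted(t1, tau, side="right"))
        j = int(np.searchsorted(t2, tau, side="right"))
    return np.array(taus)

def previous_tick(times, prices, taus):
    """Latest observed price at or before each refresh time."""
    idx = np.searchsorted(times, taus, side="right") - 1
    return prices[idx]

# Toy data (hypothetical): a liquid asset traded every second, an illiquid one twice.
t_liquid = np.arange(1.0, 11.0)
p_liquid = 100.0 + np.arange(10.0)
t_illiquid = np.array([2.5, 7.5])

taus = refresh_times(t_liquid, t_illiquid)
synced_liquid = previous_tick(t_liquid, p_liquid, taus)
```

With these toy inputs the ten liquid observations collapse to two synchronized points, which shows how a single illiquid asset in a pair dictates the effective sample size.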
A commonly used structure to account for cross-sectional dependence is the factor model. It was first used to estimate high-dimensional covariance matrices in Fan et al. (2008) for portfolio allocation and risk management, and it admits a low-rank plus sparse volatility matrix structure (Fan et al., 2013, Aït-Sahalia and Xiu, 2017, Fan et al., 2016a, Kim et al., 2018, Kong et al., 2018). When the number of assets is large, the latent factors can be accurately estimated. The performance of these factor-based estimators depends critically on the accuracy of the initial volatility matrix input. However, as discussed above, the co-volatility estimators for illiquid assets are inaccurate because of the relatively long refresh times between any two illiquid assets. On the other hand, the special covariance structure implied by the factor model makes it possible to use the covariance information from the liquid blocks to infer that of the illiquid blocks.
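In symbols, the low-rank plus sparse structure and its liquid/illiquid block partition can be written roughly as below. The notation is assumed here for illustration and is not taken from the paper: $\boldsymbol{\Gamma}$ is the integrated volatility matrix of the $p$ assets, $\mathbf{B}$ is a $p \times r$ loading matrix (assumed constant), $\boldsymbol{\Sigma}_f$ is the $r \times r$ integrated factor volatility matrix, $\boldsymbol{\Gamma}^{s}$ is the sparse idiosyncratic part, and the subscripts $L$ and $I$ index liquid and illiquid assets.

```latex
\[
\begin{gathered}
\boldsymbol{\Gamma} = \boldsymbol{\Psi} + \boldsymbol{\Gamma}^{s},
\qquad
\boldsymbol{\Psi} = \mathbf{B}\,\boldsymbol{\Sigma}_{f}\,\mathbf{B}^{\top},
\qquad
\operatorname{rank}(\boldsymbol{\Psi}) \le r, \\[4pt]
\boldsymbol{\Psi} =
\begin{pmatrix}
\boldsymbol{\Psi}_{LL} & \boldsymbol{\Psi}_{LI} \\
\boldsymbol{\Psi}_{IL} & \boldsymbol{\Psi}_{II}
\end{pmatrix}
\end{gathered}
\]
```

Under this partition, the block $\boldsymbol{\Psi}_{II}$ is the one whose direct estimate is least reliable, since it relies only on pairs of illiquid assets.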
How can we estimate co-volatilities for illiquid assets, which suffer from a serious non-synchronization problem? In this paper, we appeal to the factor structure to infer these co-volatilities. The factor structure implies that the volatility matrix consists of a low-rank covariance matrix induced by linear combinations of common factors and a sparse covariance matrix induced by idiosyncratic components. We investigate how to estimate the low-rank (or factor) volatility matrix without using co-volatility estimators between illiquid assets. Because the covariance matrix induced by the linear combinations of the common factors is of low rank, the sub-matrix corresponding to the illiquid assets is spanned by the column space of the remaining low-rank volatility sub-matrices and can be determined analytically from the sub-matrices that involve liquid assets. Thus, the problem of estimating the low-rank volatility matrix is related to the popular matrix completion problem (Candès and Recht, 2009, Koltchinskii et al., 2011), except that the entries (corresponding to the illiquid assets) are not ‘missing’ at random, but ‘missing’ (not used due to their inaccuracies) with a structured pattern (Cai et al., 2016). This structured pattern allows us to use the aforementioned analytical formula to estimate the factor-induced volatility sub-matrix that corresponds to illiquid assets. We then estimate the sparse (or idiosyncratic) volatility matrix by subtracting the low-rank volatility estimator from the input volatility matrix estimator and applying an adaptive thresholding scheme to the result. A procedure of this kind is called Principal Orthogonal complEment Thresholding (POET) in Fan et al. (2013).
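The analytical recovery of the illiquid block can be illustrated on a noise-free toy example. The sketch below is not the paper's estimator. It assumes an exactly rank-`r` matrix, hypothetical dimensions, and the standard structured-completion identity A22 = A21 A11⁺ A12, which holds when the liquid block A11 has the same rank as the full matrix. The helper `rank_r_pinv` is a hypothetical name; it truncates to rank `r` because, with a noisy input, inverting small spurious eigenvalues would be unstable.

```python
import numpy as np

rng = np.random.default_rng(1)
p_liq, p_ill, r = 8, 4, 2             # hypothetical sizes: liquid, illiquid, factors
p = p_liq + p_ill

B = rng.normal(size=(p, r))           # factor loadings (assumed constant)
low_rank = B @ B.T                    # rank-r factor-induced matrix

liq = np.arange(p_liq)                # indices of liquid assets
ill = np.arange(p_liq, p)             # indices of illiquid assets

A11 = low_rank[np.ix_(liq, liq)]      # liquid-liquid block
A12 = low_rank[np.ix_(liq, ill)]      # liquid-illiquid block
A21 = low_rank[np.ix_(ill, liq)]      # illiquid-liquid block

def rank_r_pinv(m, r):
    """Pseudo-inverse of the best rank-r approximation of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(m)
    top = np.argsort(vals)[::-1][:r]  # positions of the r largest eigenvalues
    return (vecs[:, top] / vals[top]) @ vecs[:, top].T

# Illiquid-illiquid block recovered without ever reading it from low_rank.
A22_hat = A21 @ rank_r_pinv(A11, r) @ A12
A22_true = low_rank[np.ix_(ill, ill)]
```

In this exact low-rank setting `A22_hat` matches `A22_true` up to floating-point error. With estimated blocks in place of the exact ones, the recovery error would depend on how accurately the blocks involving liquid assets are estimated.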
We investigate the asymptotic behavior of the proposed estimators for the volatility matrices that correspond to the linear combinations of factors, the idiosyncratic components, and the log-returns of assets. We assume that the high-frequency data are contaminated with micro-structural noise. We adopt an idealized model for the trading volumes of liquid and illiquid assets, and we show explicitly when and where gains can be made by ignoring the co-volatilities of the illiquid assets.
The rest of the paper is organized as follows. Section 2 introduces a factor-based diffusion process and the data structure, and Section 3 reviews the pairwise refresh time scheme and the pre-averaging realized volatility estimation method. Section 4 proposes a large volatility matrix estimation procedure that uses a matrix completion scheme with the structured missing pattern and establishes its asymptotic properties. The advantages of the proposed method are demonstrated via a simulation study in Section 5 and illustrated by an application to NYSE stocks in Section 6. Proofs are collected in Section 7.
Model set-up
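We first define some notation. For any given vector $\mathbf{b}$, $\mathrm{diag}(\mathbf{b})$ creates a diagonal matrix using the elements of $\mathbf{b}$. For any given $d_1 \times d_2$ matrix $\mathbf{U}$, the matrix spectral norm $\|\mathbf{U}\|_2$ is the square root of the largest eigenvalue of $\mathbf{U}\mathbf{U}^\top$, and the Frobenius norm of $\mathbf{U}$ is $\|\mathbf{U}\|_F = \sqrt{\mathrm{tr}(\mathbf{U}^\top\mathbf{U})}$. $\mathbf{U}_{IJ}$ denotes the sub-matrix of $\mathbf{U}$ formed by rows and columns whose indices are in $I$ and $J$, respectively, where $I$ and $J$ are subsets of $\{1,\ldots,d_1\}$ and $\{1,\ldots,d_2\}$, respectively. We will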
Pairwise refresh method
To handle the non-synchronization problem, we can use synchronization schemes such as generalized sampling time (Aït-Sahalia et al., 2010), refresh time (Barndorff-Nielsen et al., 2011, Fan et al., 2012), and previous tick (Wang and Zou, 2010, Zhang, 2011), or a linear interpolation scheme (Bibinger et al., 2014). There are also estimation procedures that do not require the data to be aligned (Hayashi and Yoshida, 2005, Hayashi and Yoshida, 2011, Malliavin and Mancino, 2002, Malliavin et al., 2009
Low-rank volatility matrix estimation
Several large volatility matrix estimation procedures have been developed based on the factor model (Fan et al., 2016a, Aït-Sahalia and Xiu, 2017, Kim et al., 2018, Kong et al., 2018). Their performance may depend on the accuracy of the input volatility matrix estimator. As discussed in Section 2, when it comes to estimating co-volatilities in high-frequency finance, one of the crucial issues is the non-synchronization problem. We use the pairwise refresh time defined in Definition 1 in
Consistency of estimators
To check the finite sample performance of the proposed estimator, we conducted a simulation study. The true log-stock price follows a continuous-time -factor model defined in (2.1) with . Let be the Cholesky decomposition of the instantaneous volatility process . The diagonal elements of follow four different processes such as geometric Ornstein–Uhlenbeck processes, the sum of two CIR processes (Barndorff-Nielsen, 2002, Cox et al., 1985), the volatility
Empirical applications
We collected intra-daily transaction prices of NYSE constituents from January to March 2016, 60 trading days in total, from the TAQ database in the Wharton Research Data Services (WRDS) system. We excluded stocks with fewer than 100 trading observations and chose the top 100 liquid stocks and the top 100 illiquid stocks as the candidates for our portfolio construction. We used the log-prices in seconds and excluded overnight returns to avoid dividend issuances and stock splits. To manage the
Proofs
Proof of Theorem 4.1 Similar to the proof of Theorem 4.1 of Fan and Kim (2018), we can show Now consider . By Weyl’s Theorem, we have By Theorem 1.1 in Fan et al. (2016b), we have and
Acknowledgments
The research of Jianqing Fan was supported in part by National Science Foundation (NSF) grant DMS-1712591 and a Princeton engineering innovation fund. The research of Donggyu Kim was supported in part by KAIST Settlement/Research Subsidies for Newly-hired Faculty grant G04170049. The bulk of the work was conducted while Donggyu Kim was a postdoctoral fellow at the Department of Operations Research and Financial Engineering, Princeton University.
References (42)
- et al. Using principal component analysis to estimate a high dimensional factor model with high-frequency data. J. Econometrics (2017)
- et al. Multivariate realised kernels: consistent positive semi-definite estimators of the covariation of equity prices with noise and non-synchronous trading. J. Econometrics (2011)
- et al. Pre-averaging estimators of the ex-post covariance matrix in noisy diffusion models with non-synchronous data. J. Econometrics (2010)
- et al. High dimensional covariance matrix estimation using a factor model. J. Econometrics (2008)
- et al. Risks of large portfolios. J. Econometrics (2015)
- et al. Nonsynchronous covariation process and limit theorems. Stoch. Process. Appl. (2011)
- et al. Microstructure noise in the continuous case: the pre-averaging approach. Stoch. Process. Appl. (2009)
- et al. Sparse PCA based on high-dimensional Itô processes with measurement errors. J. Multivariate Anal. (2016)
- et al. Asymptotic theory for large volatility matrix estimation based on high-frequency financial data. Stoch. Process. Appl. (2016)
- et al. Robustness of Fourier estimator of integrated volatility in the presence of microstructure noise. Comput. Statist. Data Anal. (2008)
- Estimating the quadratic covariation matrix for asynchronously observed high frequency stock returns corrupted by additive measurement error. J. Econometrics
- Quasi-maximum likelihood estimation of volatility with high frequency data. J. Econometrics
- Estimating covariation: Epps effect, microstructure noise. J. Econometrics
- High-frequency covariance estimates with noisy and asynchronous financial data. J. Amer. Statist. Assoc.
- How often to sample a continuous-time process in the presence of market microstructure noise. Rev. Financial Stud.
- Determining the number of factors in approximate factor models. Econometrica
- Econometric analysis of realized volatility and its use in estimating stochastic volatility models. J. R. Stat. Soc. Ser. B Stat. Methodol.
- Designing realized kernels to measure the ex post variation of equity prices in the presence of noise. Econometrica
- Estimating the quadratic covariation matrix from noisy observations: local method of moments and efficiency. Ann. Statist.
- Structured matrix completion with applications to genomic data integration. J. Amer. Statist. Assoc.
- Exact matrix completion via convex optimization. Found. Comput. Math.