1 Introduction

The increasing frequency of financial collapses and periods of instability has drawn renewed attention to the models used to forecast risk. Risk overestimation generates an opportunity cost, as an unnecessary portion of capital is allocated to investment security. On the other hand, risk underestimation can lead to irreversible losses, as insufficient capital is set aside to fulfill the safety purpose.

Value at Risk (VaR) is the standard measure used to forecast market risk in the financial industry. VaR computes the maximum loss that is expected for a given period and significance level. For more details regarding VaR, we suggest Duffie and Pan (1997) and Jorion (2000). Although VaR has been historically criticized for not being a coherent risk measure,Footnote 1 the literature has recently paid attention to its statistical properties. Among these properties, we refer to robustness. According to Kou and Peng (2016), a risk measure is robust if it is stable to small deviations in the model and in the theoretical distribution of the data. Robust risk measures, used for regulatory purposes, help reduce the problem of regulatory arbitrage (Kou et al., 2013).Footnote 2 Another interesting statistical property that VaR satisfies is elicitability. A risk measure is elicitable when it minimizes the expected value of a score function (Ziegel, 2016; Acerbi & Szekely, 2017). This property is valuable in risk management because it allows determining, among competing models, the one that generates the best forecast.

The most common VaR estimation approach is the non-parametric one, known as Historical Simulation (HS). Pérignon and Smith (2010) verify that approximately 73% of banking institutions that disclose their VaR use this procedure. Other frequently applied methods are parametric ones, such as the models of the GARCH (Generalized Autoregressive Conditional Heteroskedasticity) family, and semiparametric ones, including Filtered Historical Simulation. However, VaR results are conditional on the model and its specification: two different models, or the same model under two different specifications, generate different risk values and consequently compromise decision-making. The uncertainty regarding model choice and specification leads to a risk referred to in the literature as model risk. Although there is no unanimous definition for this risk, regulatory agencies define it as the uncertainty present in the estimation process and in the choice of the estimation model (Reserve, 2011).

In order to minimize the impacts of model risk, many studies compare different VaR forecasting models. Research in this regard can be found in Kuester et al. (2006); Marinelli et al. (2007); Weiß (2013); Telmoudi et al. (2016); Patton et al. (2019), and Müller et al. (2022). Most of these studies focus on empirical data and univariate models. In practice, however, the models used for risk management are multivariate. Furthermore, studies such as Weiß (2013); Righi and Ceretta (2015), among others, use backtesting tools to compare different estimation procedures. Backtesting validates a given estimation procedure for a risk measure using historical data (Ziegel, 2016). For VaR, backtesting is generally used to compare predicted losses, that is, VaR forecasts, against actual losses over a given time horizon. See Kupiec (1995) and Christoffersen (1998) for examples of backtesting for VaR. However, these tests do not allow direct comparison and ranking of the performance of competing risk forecasting procedures (Gneiting, 2011; Ziegel, 2016). For elicitable measures (including VaR), a suitable procedure for comparing competing forecasting models is to use a scoring rule (Gneiting, 2011; Ziegel, 2016; Acerbi & Szekely, 2017).

In this sense, this work aims to analyze the performance of multivariate models to predict VaR. The estimation procedures considered are HS, copulas (C-Vine, D-Vine, and R-Vine),Footnote 3 and multivariate models from the GARCH family, including GO-GARCH (Generalized Orthogonal GARCH) and DCC-GARCH (Dynamic Conditional Correlation GARCH). We consider these models because they are the main estimation methods used in risk management studies (Pérignon & Smith, 2010; Müller & Righi, 2018; Silahli et al., 2019; Nagler et al., 2019). To assess the quality of the predictions, we consider descriptive statistics, such as mean and standard deviation, the realized loss function (an elicitable loss function), and a measure of model risk (Müller & Righi, 2020), which allows us to quantify the realized cost. The data set used in the study refers to the main stocks belonging to the Bovespa Index (Brazilian market index) from January 2012 to April 2022, totaling 2,559 daily observations for each stock. In the analysis, we use two rolling estimation windows and the main significance levels considered in the literature.

This research contributes to the literature investigating which model results in more reliable risk predictions. Previous studies mainly compare univariate risk prediction models. Moreover, the studies that evaluate multivariate models focus mainly on the bivariate case. An example is the study of Weiß (2013), which investigates multivariate GARCH models and bivariate copulas to predict VaR and ES (Expected Shortfall). Müller and Righi (2018) consider a set of models similar to the one used in our study. However, the authors focus on a numerical assessment, and a limitation of numerical analysis is that simulated returns do not reproduce the stylized facts of real stock returns as closely. Additionally, our study contributes to the literature seeking to identify the model with the lowest model risk to predict VaR. Estimation methods with lower model risk are essential for the financial system's stability. In addition to financial losses, this risk can compromise strategic decision-making and damage a financial institution's reputation (Reserve, 2011).

Regarding structure, the remainder of this paper is organized as follows: in Sect. 2, we expose the multivariate models that we use to forecast VaR; in Sect. 3, we exhibit the empirical procedures used to assess the VaR forecasts; in Sect. 4, we describe the empirical results; in Sect. 5, we summarize and conclude the paper.

2 Background

This section briefly describes the multivariate models that we consider in VaR forecasting. Consider X as a financial random variable, where \(X \ge 0\) is a gain and \(X < 0\) is a loss. X is defined in a space of random variables \({\mathcal {X}}(\Omega , {\mathcal {F}}, {\mathbb {P}})\). \(F_{X}\) represents the cumulative distribution function of X and \(F_{X}^{-1}\) its (left) inverse. VaR can be defined by:

$$\begin{aligned} VaR^{\alpha }(X) = -\inf \{x:F_{X}(x) \ge \alpha \} = -F_{X}^{-1}(\alpha ), \end{aligned}$$

where \(\alpha \ \in \ (0, 1)\) corresponds to the significance level. The negative sign expresses the measure as a monetary loss. VaR thus represents the maximum loss expected for a given period and significance level.
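
To make the definition operational, the following minimal Python sketch computes VaR by Historical Simulation, i.e., as the negative empirical \(\alpha \)-quantile of a return series (the input returns here are synthetic, for illustration only):

```python
import numpy as np

def var_hs(returns: np.ndarray, alpha: float = 0.01) -> float:
    """VaR by Historical Simulation: the negative empirical alpha-quantile.

    A positive output is a monetary loss, matching the sign convention above.
    """
    return -np.quantile(returns, alpha)

# Synthetic heavy-tailed daily returns, for illustration only.
rng = np.random.default_rng(0)
x = rng.standard_t(df=5, size=2559) * 0.01
print(var_hs(x, alpha=0.01))  # roughly a 3% daily loss for this sample
```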

2.1 Multivariate GARCH Models

Consider N series of log-returns, and let \(X_{t}\) be an \(N \times 1\) vector of returns. We represent \(X_t\) by:

$$\begin{aligned} X_{t}&= \mu _{t} + \epsilon _{t},\\ \epsilon _{t}&= {\textbf {H}}^{\frac{1}{2}}_{t,m} z_{t}, \end{aligned}$$

where \(t \in T\) refers to the period, \(\mu _{t}\) is the conditional mean (for simplicity of the definition, we assume \(\mu _t = 0\)), \(\epsilon _{t}\) is the error process, \({\textbf {H}}^{\frac{1}{2}}_{t,m}\) is a positive definite \(N \times N\) matrix, which is specific to each model m, and \(z_{t}\) is a white noise process. Besides that, \({\textbf {H}}_{t,m} = E[X_{t}X^{'}_{t}]\) is the conditional covariance matrix of \(X_{t}\), given past information, which can be computed from multivariate GARCH models. The main difference among multivariate GARCH models is the treatment used to obtain the conditional covariance matrix. This study focuses on the DCC-GARCH and GO-GARCH models. See Francq and Zakoian (2019) for a review of GARCH models.

2.1.1 DCC-GARCH

The DCC-GARCH model, presented by Engle (2002) and Tse and Tsui (2002), allows joint modeling of volatility and conditional correlation. Through this model, the conditional covariance matrix is obtained by:

$$\begin{aligned} {\textbf { H}}_{t} = {\textbf {D}}_{t} {\textbf {R}}_{t} {\textbf {D}}_{t}, \,\,\, {\textbf {D}}_{t} = diag\{\sqrt{\sigma _{i,t}}\}, \end{aligned}$$

where \({\textbf {H}}_{t}\) is the conditional covariance matrix, \({\textbf {D}}_{t}\) is the diagonal matrix of conditional standard deviations, which is computed with a univariate GARCH model for each asset i and period t. Moreover, \({\textbf {R}}_{t}\) is the correlation matrix containing the conditional correlation coefficients.

In general, \({\textbf {R}}_{t}\) can be defined as follows:

$$\begin{aligned} {\textbf {R}}_{t} = {\textbf {Q}}^{*-1}_{t} {\textbf {Q}}_{t} {\textbf {Q}}^{*-1}_{t}, \end{aligned}$$

where \({\textbf {Q}}_{t}\) is given by:

$$\begin{aligned} {\textbf {Q}}_{t} = (1 - a - b) {\varvec{\bar{Q}}} + a\epsilon _{t-1}\epsilon ^{'}_{t-1} + b{\textbf {Q}}_{t-1}, \end{aligned}$$

where \({\textbf {Q}}_{t}\) is the conditional covariance matrix of the standardized residuals, with unconditional covariance matrix \({\varvec{\bar{Q}}}\); the residuals are obtained by means of univariate GARCH models. a and b are non-negative parameters satisfying \(a + b < 1\), and \({\textbf {Q}}^{*}_{t}\) is a diagonal matrix containing the square root of the diagonal elements of \({\textbf {Q}}_{t}\).
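
A minimal sketch of this recursion follows; the standardized residuals eps and the parameters a and b are taken as given inputs (in practice they come from the univariate GARCH fits and the DCC likelihood, respectively):

```python
import numpy as np

def dcc_correlations(eps: np.ndarray, a: float, b: float) -> np.ndarray:
    """Conditional correlation matrices R_t from the DCC recursion.

    eps : (T, N) standardized residuals from the univariate GARCH fits.
    a, b: non-negative scalars with a + b < 1 (assumed already estimated).
    """
    T, N = eps.shape
    Q_bar = np.cov(eps, rowvar=False)   # unconditional covariance of residuals
    Q = Q_bar.copy()                    # initialize the recursion at Q_bar
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            e = eps[t - 1][:, None]
            Q = (1 - a - b) * Q_bar + a * (e @ e.T) + b * Q
        q = np.sqrt(np.diag(Q))         # diagonal of Q*_t
        R[t] = Q / np.outer(q, q)       # R_t = Q*^{-1} Q_t Q*^{-1}
    return R
```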

2.1.2 GO-GARCH

The GO-GARCH model, proposed by Van der Weide (2002), belongs to the family of factor GARCH models. This model is an extension of the Orthogonal Factor GARCH model (Alexander & Chibumba, 1997). The main assumption of the model is that the process \(X_{t}\) is governed by a linear combination of unobserved factors \(z_{t}\), which are independent and have zero mean. For this model, the covariance matrix of \(X_{t}\) is defined by:

$$\begin{aligned} {\textbf {H}}_{t} ={\textbf { W}} {\textbf {H}}^{z}_{t} {\textbf {W}}^{'}, \end{aligned}$$

where \({\textbf {H}}^{z}_{t}\) is the conditional variance matrix of the factors, which is obtained through univariate GARCH models. The positivity of \({\textbf {H}}_{t}\) results from the positivity of \({\textbf {H}}^{z}_{t}\), which is ensured by the positivity of the GARCH model estimators. \({\textbf {W}}\) is a constant, non-singular (invertible) parameter matrix.

Since \(E[X_{t}X^{'}_{t}] = {\textbf {H}} = {\textbf {W}}{} {\textbf {W}}^{'}\), Van der Weide (2002) uses a singular value decomposition to obtain \({\textbf {W}}\), as:

$$\begin{aligned} {\textbf {W}} = {\textbf {P}} \varLambda ^{\frac{1}{2}} {\textbf {U}}. \end{aligned}$$

The orthonormal eigenvectors of \({\textbf {H}}\) form the columns of \({\textbf {P}}\), and their respective eigenvalues make up the diagonal matrix \(\varLambda \). \({\textbf {U}}\) is an orthogonal matrix (\(N \times N\)) of eigenvectors of \({\textbf {H}}\), with determinant equal to 1. The matrices \({\textbf {P}}\) and \(\varLambda \) are obtained from the unconditional information in \({\textbf {H}}\), while \({\textbf {U}}\) depends on the conditional information in \({\textbf {H}}_{t}\). For more information regarding approaches that can be considered to compute \({\textbf {W}}\), we suggest Boswijk and Van Der Weide (2006) and Lanne and Saikkonen (2007).
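
As a rough numerical illustration, the unconditional part of \({\textbf {W}}\) can be recovered from the eigendecomposition of the sample covariance matrix; the sketch below sets \({\textbf {U}}\) to the identity for simplicity (in Sect. 3 we estimate it via ICA), so it captures only \({\textbf {P}} \varLambda ^{\frac{1}{2}}\):

```python
import numpy as np

def go_garch_W_unconditional(X: np.ndarray) -> np.ndarray:
    """Unconditional part of W = P Lambda^{1/2} U, with U set to the identity.

    X : (T, N) matrix of demeaned returns.
    """
    H = np.cov(X, rowvar=False)          # unconditional covariance matrix H
    eigval, P = np.linalg.eigh(H)        # columns of P: orthonormal eigenvectors
    return P @ np.diag(np.sqrt(eigval))  # W up to the rotation U

# Sanity check: W W' must reproduce H.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                               [0.0, 1.0, 0.3],
                                               [0.0, 0.0, 1.0]])
W = go_garch_W_unconditional(X)
print(np.allclose(W @ W.T, np.cov(X, rowvar=False)))  # True
```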

2.2 Copulas

Consider \(X_{1}, X_{2}, \cdots , X_{N}\) with cumulative distribution functions \(F_{1}, F_{2}, \cdots , F_{N}\) and inverse distribution functions \(F_{1}^{-1}, F_{2}^{-1}, \cdots , F_{N}^{-1}\). A copula C is an N-dimensional distribution function on \([0,1]^{N}\) with uniformly distributed marginals. Using Sklar's (1959) theorem, one can obtain the N-dimensional distribution function F from the copula by:

$$\begin{aligned} F(x_{1}, x_{2}, \cdots , x_{N}) = C(F_{1}(x_{1}), F_{2}(x_{2}), \cdots , F_{N}(x_{N})), \end{aligned}$$
(1)

for all \(x = (x_{1}, x_{2}, \cdots , x_{N})^{'} \in {\mathbb {R}}^{N}\). C is unique if the corresponding marginal distribution functions are continuous.

Given (1), C is obtained in the following way:

$$\begin{aligned} C(u_{1}, u_{2}, \cdots , u_{N}) = F(F_{1}^{-1}(u_{1}), F_{2}^{-1}(u_{2}), \cdots , F_{N}^{-1}(u_{N})), \quad (u_{1}, u_{2}, \cdots , u_{N}) \in [0, 1]^{N}, \end{aligned}$$

where \(F_{i}^{-1}\) is the inverse function of the marginal distribution function of \(F_{i}\). For a formal definition of multivariate copulas, we suggest Nelsen (2007).
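
As a concrete instance of this inversion, the bivariate Gaussian copula has the closed form \(C(u_{1}, u_{2}) = \Phi _{\rho }(\Phi ^{-1}(u_{1}), \Phi ^{-1}(u_{2}))\), where \(\Phi _{\rho }\) is the bivariate standard normal distribution with correlation \(\rho \); a minimal sketch, with \(\rho \) assumed known:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula(u1: float, u2: float, rho: float) -> float:
    """C(u1, u2) = Phi_rho(Phi^{-1}(u1), Phi^{-1}(u2)) for the Gaussian copula."""
    z = np.array([norm.ppf(u1), norm.ppf(u2)])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)

print(gaussian_copula(0.3, 0.7, rho=0.5))  # in (0, 1); equals 0.3 * 0.7 when rho = 0
```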

2.3 Vine Copula

Vine copulas are flexible models for describing multivariate dependence, constructed from a cascade of bivariate copulas. We work with three variations: R-Vines (Regular Vines), D-Vines, and C-Vines. As presented in Kurowicka and Cooke (2006), the density of the R-Vine copula is given by:

$$\begin{aligned} f(u_{1}, u_{2}, \cdots , u_{N}) = \prod ^{N}_{k=1} f_{k}(u_{k}) \prod ^{N-1}_{i=1} \prod _{e \in E_{i}}c_{j(e), k(e)|D(e)}\left\{ F(u_{j(e)}|{\textbf {u}}_{D(e)}), F(u_{k(e)}|{\textbf {u}}_{D(e)})\right\} , \end{aligned}$$

where \(E_{i}, i = 1, 2, \cdots , N-1\), are the edges, D(e) denotes the nodes that the bivariate copulas share, and j(e) and k(e) are the nodes that they do not share. The nodes j(e) and k(e) are called the conditioned nodes, D(e) indicates the conditioning set, and the union of the three is called the constraint set. \({\textbf {u}}_{D(e)}\) is a sub-vector of pseudo-observations, \(f_{k}\) is the marginal density of asset k, and c is the bivariate copula density, which can differ for each pair-copula.

In the case of D-Vine copula, the density can be represented by (Aas et al., 2009):

$$\begin{aligned} f(u_{1}, u_{2}, \cdots , u_{N}) = \prod ^{N}_{k=1} f_{k}(u_{k}) \prod ^{N-1}_{j=1} \prod ^{N-j}_{i=1}c_{i,i+j|i+1, \cdots , i+j-1}\left\{ F(u_{i}|u_{i+1}, \cdots , u_{i+j-1}), F(u_{i+j}|u_{i+1}, \cdots , u_{i+j-1})\right\} , \end{aligned}$$

where \({\textbf {u}} = (u_{1}, u_{2}, \cdots , u_{N})^{'} \in [0, 1]^{N}\) are pseudo-observations, the index j identifies the trees, and i runs over the edges within each tree. Similarly, the density of the C-Vine copula is given by:

$$\begin{aligned} f(u_{1}, u_{2}, \cdots , u_{N}) = \prod ^{N}_{k=1} f_{k}(u_{k}) \prod ^{N-1}_{j=1} \prod ^{N-j}_{i=1}c_{j,j+i|1, \cdots , j-1}\left\{ F(u_{j}|u_{1}, \cdots , u_{j-1}), F(u_{j+i}|u_{1}, \cdots , u_{j-1})\right\} . \end{aligned}$$

The difference between the D-Vine and C-Vine densities is that in the former, no node in any tree is connected to more than two edges, while in a C-Vine copula, each tree has a unique root node, which is connected to \(N-j\) edges. The conditional marginal distributions for cascade-like constructions, such as Vine copulas, can be obtained by (Joe, 1996; Czado et al., 2013):

$$\begin{aligned} F(u_{i}|{\textbf {u}}) = \frac{\partial C_{u_{i},u_{j} | {\textbf {u}}_{-j}} \{F(u_{i} | {\textbf {u}}_{-j}), F(u_{j}| {\textbf {u}}_{-j})\}}{\partial F(u_{j}| {\textbf {u}}_{-j})}, \end{aligned}$$

where \(i, j \in {\mathbb {N}}\), \(i \ne j\), and \(C_{u_{i},u_{j} | {\textbf {u}}_{-j}}\) is the dependency structure of \(u_{i}\) and \(u_{j}\) given \({\textbf {u}}_{-j}\).
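
For a Gaussian pair-copula with parameter \(\rho \), this conditional distribution (the so-called h-function) is available in closed form; the sketch below shows the building block that is applied recursively across the vine trees:

```python
import numpy as np
from scipy.stats import norm

def h_gaussian(u_i: float, u_j: float, rho: float) -> float:
    """h-function F(u_i | u_j) for a bivariate Gaussian pair-copula:
    Phi((Phi^{-1}(u_i) - rho * Phi^{-1}(u_j)) / sqrt(1 - rho^2))."""
    z_i, z_j = norm.ppf(u_i), norm.ppf(u_j)
    return norm.cdf((z_i - rho * z_j) / np.sqrt(1.0 - rho**2))

print(h_gaussian(0.3, 0.7, rho=0.5))  # a conditional probability in (0, 1)
```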

We refer the reader to Joe (2014) for a review on vine copulas and related topics.

3 Methodological Procedures

In this section, we describe the methodological procedures used in our empirical analysis. As the dataset to build the portfolios, we use stocks belonging to the Ibovespa index from January 2, 2012 to April 28, 2022, totaling 2,559 observations for each stock.Footnote 4 The period from 2012 to April 2022 allows us to analyze both calm and turbulent periods in the Brazilian market, including, for example, the crisis triggered by the COVID-19 outbreak. Besides that, our sample includes the period of the Russian invasion of Ukraine. We consider the Brazilian market because it is an emerging market and hosts the largest stock exchange in Latin America, and because of data availability. Emerging financial markets differ substantially from developed ones: in periods of financial instability, emerging markets are more affected than developed economies. Gencay and Selcuk (2004) comment that a significant part of total savings in developed economies is invested in emerging markets, either through hedge funds or mutual funds. Thus, the implications of modeling risk investments in emerging countries are not limited to investors residing in those countries. We compute the log-returns for each stock in our sample using the adjusted price series. According to the KPSS test (Kwiatkowski et al., 1992), the log-returns are stationary. For the sake of brevity, these results are omitted but are available on request.

To build the portfolios, we use two strategies: the equally weighted portfolio (naive diversification) and the minimum risk portfolio.Footnote 5 To control transaction and information costs, we limit the number of stocks in the portfolio with a cardinality constraint. We use the simplest model for the minimum risk portfolio with a cardinality constraint, since our aim is to check the impact of the weights on the performance of the risk prediction models.Footnote 6 This model is given by:

$$\begin{aligned}&\min _{W \in R^N,\, z \in \left\{ 0,1\right\} ^N} \rho \left( \sum _{i=1}^Nw_iX_i\right) \nonumber \\&\text {s.t.} \nonumber \\&\sum _{i=1}^Nw_i = 1,\nonumber \\&w_i \ge 0, \; \forall \; i = 1, \cdots , N,\nonumber \\&w_i \le z_i, \; \forall \; i = 1, \cdots , N,\nonumber \\&\sum _{i=1}^N z_i \le K, \end{aligned}$$
(2)

where \(X_{i} \in {\mathcal {X}}, i = 1, \cdots , N\), with N the number of stocks, is the return of the \(i\)-th stock that might compose the portfolio, \(w_{i}\) is the weight of the \(i\)-th stock, \(z_{i}\) is a binary decision variable that takes value 1 if the i-th stock is included and 0 (zero) otherwise, and K sets the maximum number of stocks allowed.Footnote 7 Our problem does not allow short selling \(\left( w_i \ge 0, \; \forall \; i = 1, \cdots , N\right) \) and requires that all capital be allocated \(\left( \sum _{i=1}^Nw_i = 1\right) \). These restrictions are common in portfolio optimization problems. See, for example, Righi and Borenstein (2018). The weights obtained by formulation (2) were not rebalanced. \(\rho (\cdot )\) refers to the Expected Shortfall (ES). We use this measure in the portfolio optimization problem because it is a subadditive/convex measure; for details, see Artzner et al. (1999). Thus, the risk of the combined position is less than or equal to the sum of the individual risks of the stocks. ES is quantified using Historical Simulation. We selected the HS method because our focus is not on selecting the most appropriate model for portfolio optimization. Furthermore, this method is widely used in academic studies and in the financial industry (Pérignon & Smith, 2010). We did not determine the portfolio weights using VaR because VaR does not respect the subadditivity axiom.
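
The objective of problem (2) can be sketched as follows; the binary cardinality constraints are omitted (we solve the full mixed-integer problem in the empirical analysis), so the code only evaluates the HS-based ES for a given weight vector w:

```python
import numpy as np

def es_hs(portfolio_returns: np.ndarray, alpha: float = 0.01) -> float:
    """Expected Shortfall by Historical Simulation: the average loss beyond VaR."""
    var = -np.quantile(portfolio_returns, alpha)
    tail = portfolio_returns[portfolio_returns <= -var]
    return -tail.mean()

def objective(w: np.ndarray, X: np.ndarray, alpha: float = 0.01) -> float:
    """Objective of problem (2): the ES of the weighted portfolio X @ w.

    X : (T, N) matrix of stock returns; w : (N,) weights with w >= 0, sum(w) = 1.
    """
    return es_hs(X @ w, alpha)
```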

The estimation methods used to predict VaR are the traditional HS, Vine copulas, and multivariate GARCH models, the last two groups being explained in Sect. 2. For the copula method and the multivariate GARCH models, we first model the mean (\(\mu _{i,t}\)) and conditional standard deviation (\(\sigma _{i,t}\)) of each univariate series, that is, of each stock i. The Ljung-Box test (Ljung & Box, 1978) indicated the presence of significant autocorrelation in the stocks' log-returns. Thus, we fit the conditional mean with an autoregressive (AR) model of order 1 with a constant. According to Garcia-Jorcano and Novales (2021), the AR(1) model is sufficient to produce serially uncorrelated innovations. We also applied the Ljung-Box test to the squared standardized residuals; the test values indicate a significant presence of conditional heteroskedasticity.Footnote 8 To model the conditional standard deviation, we consider the GARCH model. We use the AR-GARCH because the risk forecasting literature shows that the model presents good results in forecasting risk measures for univariate series. See, for instance, Hartz et al. (2006) and Garcia-Jorcano and Novales (2021). Furthermore, it is a common model to obtain the uniform marginal distributions, which are necessary for estimating the copulas and multivariate GARCH models (Müller & Righi, 2018).

The AR(p)-GARCH(q, s) model is represented by:

$$\begin{aligned}&X_{i,t} = \phi _0 + \sum _{l=1}^{p}\phi _l X_{i,t-l} + \epsilon _{i,t} = \mu _{i,t} + \epsilon _{i,t}, \nonumber \\&\epsilon _{i,t} = \sigma _{i,t}z_{i,t}, \qquad z_{i,t} \sim i.i.d. \, F(z_{i,t};\varvec{\theta }), \nonumber \\&\sigma _{i,t}^2 = a_0 + \sum _{j = 1}^{q}a_{j}\epsilon ^2_{i,t-j} + \sum _{k = 1}^{s}b_{k}\sigma ^2_{i,t-k}, \end{aligned}$$
(3)

where \(X_{i,t}\) is the return of stock i at time t, \(\phi _l\), for \(l = 1, \cdots , p\), with p the autoregressive order, are the parameters of the autoregressive model, \(\epsilon _{i,t}\) is the error term, and \(z_{i,t}\) is a white noise process with distribution \(F(z_{i,t};\varvec{\theta })\), where \(\varvec{\theta }\) is a vector of distribution parameters, including zero mean, unit variance, and any additional parameter that varies with the distribution. \(\mu _{i,t}\) and \(\sigma _{i,t}^2\) are the mean and conditional variance given past information of each stock i, and \(a_j\), for \(j = 1,\cdots , q\), as well as \(b_k\), for \(k = 1, \cdots , s\), are parameters of the GARCH model (\(a_0 >0\), \(a_j\ge 0\), \(b_k\ge 0\)), with q and s its orders. For F, we assume the normal (norm), Student's t (std), and skewed Student's t (sstd) distributions. We use the normal distribution as it is the most common and is often used in stock market analyses (Müller & Righi, 2018); it is fully described by its first two moments, the mean (\(\mu \)) and variance (\(\sigma ^2\)). We also consider the Student's t distribution because it captures the heavy-tailed behavior of financial stocks; it is described completely by the shape parameter (\(\nu \)). In addition to heavy tails, the skewed Student's t distribution allows modeling the asymmetry in financial stocks; it is described by the shape (\(\nu \)) and skewness (\(\xi \)) parameters. For more details, see Fernández and Steel (1998). For the probability density functions of the three distributions, we suggest Ghalanos (2020). To choose the lags of the AR-GARCH model, we compare Akaike Information Criterion (AIC) values. Our results suggest that the most suitable model to adjust the log-returns is the AR(1)-GARCH(1,1). We obtain the standardized residuals after forecasting the mean and standard deviation based on the adjusted AR(1)-GARCH(1,1) model, considering the normal, Student's t, and skewed Student's t distributions for \(z_{i,t}\). Then, we transform them into pseudo-observations \({\textbf {u}} \in [0 ,1]\) by applying the fitted cumulative distribution function of each series. This procedure is necessary to predict risk measures using copulas. Given the marginal distributions and the estimated parameters, we use the following algorithm to predict VaR with copulas (Aas & Berg, 2010; Righi & Ceretta, 2013), sketched in code after the steps:

(i) For each stock, we forecast the mean \(\mu _{i,t+1}\) and the conditional standard deviation \(\sigma _{i,t+1}\) using the AR(1)-GARCH(1,1) model.

(ii) We simulate N samples \(u_{i,T}\) with size T, where each i represents a stock in our sample, using Vine copulas.

(iii) Given \(u_{i,T}\), we generate N series \(z_{i,T}\) by inverting the marginal probability, as \(z_{i,T} = F^{-1}(u_{i,T})\).

(iv) For each stock i, we obtain the returns by \(r_{i,T} = \mu _{i,t+1} + \sigma _{i,t+1}z_{i,T}\).

(v) Based on \(r_{i,T}\) for each i, we calculate the portfolio returns \({W} {\textbf {r}}_T\), where \({W} = \left\{ w_1, w_2, \cdots , w_N\right\} \) is the vector of weights and \({\textbf {r}}_T = \left\{ r_{1,T}, r_{2,T}, \cdots , r_{N, T}\right\} \) are the log-returns of the stocks.

(vi) Then, we quantify the VaR forecast for \(t+1\) (VaR\(^{\alpha }_{t+1}\)) of the portfolio returns using Historical Simulation, i.e., the empirical distribution of the data.
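A compact sketch of steps (i)-(vi) follows. To keep it self-contained, the fitted Vine copula is replaced by a Gaussian copula simulator, and the one-step AR(1)-GARCH(1,1) forecasts mu_next and sigma_next are passed in as given (hypothetical) inputs, with standardized Student's t marginals:

```python
import numpy as np
from scipy.stats import norm, t as student_t

def var_copula(mu_next, sigma_next, w, R, nu=5.0, T=1000, alpha=0.01, seed=0):
    """Steps (i)-(vi): simulate dependent uniforms, invert the marginals,
    rebuild returns, aggregate the portfolio, take the empirical quantile.

    mu_next, sigma_next : (N,) one-step AR(1)-GARCH(1,1) forecasts (step (i)).
    w : (N,) portfolio weights; R : (N, N) copula correlation matrix.
    """
    rng = np.random.default_rng(seed)
    N = len(w)
    # (ii) simulate u from a Gaussian copula (stand-in for the fitted Vine).
    z = rng.multivariate_normal(np.zeros(N), R, size=T)
    u = norm.cdf(z)
    # (iii) invert the marginal: Student's t with nu d.o.f., scaled to unit variance.
    z_marg = student_t.ppf(u, df=nu) / np.sqrt(nu / (nu - 2.0))
    # (iv) rebuild returns r = mu + sigma * z for each stock.
    r = mu_next + sigma_next * z_marg
    # (v) portfolio returns and (vi) VaR via Historical Simulation.
    return -np.quantile(r @ w, alpha)

# Hypothetical two-stock inputs, for illustration only.
R = np.array([[1.0, 0.4], [0.4, 1.0]])
print(var_copula(np.zeros(2), np.array([0.01, 0.015]), np.array([0.5, 0.5]), R))
```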

The Vine copulas are constructed from a cascade of bivariate copulas (see Sect. 2.3). In this study, we choose the bivariate copulas using the AIC. The bivariate copulas considered were: Gaussian, Student's t, Clayton, Gumbel, Frank, Joe, BB1, BB6, BB7, BB8, as well as the rotated copulas Clayton (180 degrees), Gumbel (180 degrees), Joe (180 degrees), BB1 (180 degrees), BB6 (180 degrees), BB7 (180 degrees), BB8 (180 degrees), Clayton (90 degrees), Gumbel (90 degrees), Joe (90 degrees), BB1 (90 degrees), BB6 (90 degrees), BB7 (90 degrees), BB8 (90 degrees), Clayton (270 degrees), Gumbel (270 degrees), Joe (270 degrees), BB1 (270 degrees), BB6 (270 degrees), BB7 (270 degrees), and BB8 (270 degrees). We consider portfolios formed with 6 and 12 stocks, i.e., \(K = 6\) and 12; for T, we use the same value as the rolling estimation window, which was 500 and 1000 observations. Thus, we have a total of 1,559 out-of-sample observations, that is, predictions. The out-of-sample period coincides for both rolling estimation windows. For the rolling estimation window of 1000 observations, our first out-of-sample prediction was obtained considering data from January 2, 2012, to January 20, 2016. For the rolling estimation window of 500 observations, our first out-of-sample prediction was obtained considering data from January 13, 2014, to January 20, 2016. We exclude the first 500 observations for this rolling window so that the out-of-sample periods match; thus, we can compare the performance of the models for both rolling estimation windows. Furthermore, the period used for the first out-of-sample risk forecast was the one used to obtain the weights of the different portfolios. As significance levels, we use 1%, 2.5%, and 5% because they are the most common values in the literature, and regulatory agencies recommend 1% for VaR estimation (Basel Committee on Banking Supervision, 2013; Righi & Ceretta, 2015). For W, we use the vector of weights given by formulation (2). We also consider equal weights for the stocks selected in the optimization problem, i.e., \(\frac{1}{N}\). In the case of multivariate GARCH models, the difference is that \(z_{i,T}\) is obtained by employing these models instead of the copula method. For the GO-GARCH model, we estimate the \({\textbf {W}}\) matrix through independent component analysis (ICA). In the HS approach, the returns \(r_{i,T}\) of each stock i are taken from the empirical distribution of the data.

To describe the risk forecasts, we quantify the mean and standard deviation. The quality of the forecasts is assessed using the VaR score function, which is given by (Gneiting, 2011):

$$\begin{aligned} {\mathcal {L}}_{VaR}(X, Y) := \frac{1}{T}\sum ^{T}_{t=1}\left[ \alpha \left( x_{t+1}-y_{t+1} \right) ^+ + (1-\alpha )\left( x_{t+1}-y_{t+1}\right) ^- \right] , \end{aligned}$$

where \(x_{t+1} \in X\) are the observations for the out-of-sample period and \(y_{t+1} \in Y\) are the risk forecasts for this period. According to this criterion, the model with the lowest value is the most accurate in predicting VaR. The results of this function are called the realized loss.
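
A direct implementation of this score, with \((a)^{+} = \max (a, 0)\) and \((a)^{-} = \max (-a, 0)\), can be sketched as follows:

```python
import numpy as np

def realized_loss(x: np.ndarray, y: np.ndarray, alpha: float) -> float:
    """VaR score: mean of alpha*(x - y)^+ + (1 - alpha)*(x - y)^-.

    x : out-of-sample observations; y : VaR forecasts, on the same sign
    convention as the paper's series. Lower values indicate better forecasts.
    """
    d = x - y
    return np.mean(alpha * np.maximum(d, 0.0) + (1.0 - alpha) * np.maximum(-d, 0.0))
```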

Besides that, we quantify the realized cost, which is obtained in the following way:

$$\begin{aligned} \text {Cost}(X, Y) := \text {Cost}_{G, L}(X, Y) = \frac{1}{T}\sum ^{T}_{t=1}\left[ \left( x_{t+1} -y_{t+1}\right) ^+g_{t+1} + (x_{t+1} -y_{t+1})^-l_{t+1}\right] \, \end{aligned}$$
(4)

where \(x_{t+1}\in X\) and \(y_{t+1} \in Y\). Besides, \(g_{t+1} \in G\) represents costs from risk overestimation and \(l_{t+1} \in L\) costs from risk underestimation; G and L are positive random variables. Formulation (4) is based on the robust risk measurement approach proposed by Righi et al. (2020) and the model risk measure discussed in Müller and Righi (2020). The realized cost identifies the model with the best trade-off between the costs from risk overestimation and underestimation. As the overestimation and underestimation costs, we use the Selic interest rate (Selic) and the average interest rate for credit operations in Brazil (C.rate), respectively.Footnote 9 These rates represent, respectively, a risk-free investment with high liquidity, where the surplus over the capital requirement could be safely applied, and the average interest rate at which resources can be obtained when the capital requirement is not enough to cover losses. We convert both yield rates to daily frequency. Details of both series are available upon request. The results for the out-of-sample period are presented considering the complete out-of-sample period (Full sample), which comprises January 21, 2016, to April 28, 2022. Furthermore, we divide the out-of-sample period into two intervals: 2016 to 2019 and 2020 to April 2022. From 2016 to 2019, we have an interval that is free from the influence of the COVID-19 outbreak; this period also includes a relatively stable phase in the Brazilian market. The period from 2020 to April 2022 reflects the influence of the COVID-19 pandemic from its beginning until the end of the sample. In this way, we can verify whether the performance of the risk prediction models changed during the COVID-19 outbreak.
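
The realized cost (4) admits an equally direct implementation; the rate series g (Selic) and l (C.rate) must be aligned in time with the forecasts:

```python
import numpy as np

def realized_cost(x, y, g, l):
    """Realized cost (4): the (x - y)^+ term is priced at g_t (overestimation,
    e.g. Selic) and the (x - y)^- term at l_t (underestimation, e.g. C.rate)."""
    d = np.asarray(x) - np.asarray(y)
    return np.mean(np.maximum(d, 0.0) * np.asarray(g)
                   + np.maximum(-d, 0.0) * np.asarray(l))
```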

4 Results

This section presents and discusses the results. First, we describe the descriptive statistics of the stock data and portfolio returns over the sample period. We omit the descriptive results for the out-of-sample sub-periods for brevity, but they are available upon request. We then analyze the performance of the competing forecasting models using realized loss and realized cost. In Table 1, we report the descriptive statistics, which include mean, minimum, maximum, standard deviation, skewness, and excess kurtosis (E. kurtosis), of the stocks that belong to the Brazilian market index and are considered in our investigation. In general, the average return of the stocks is close to zero. Around 70% of the stocks show negative skewness, which indicates a fatter left tail of the distribution; in contrast, the positive skewness of the other stocks represents a long right tail. The average skewness of the stocks in the sample is negative. Excess kurtosis values greater than zero indicate that the data distribution is leptokurtic, that is, the stock returns distribution has heavy tails. The stocks' average excess kurtosis during the sample period is 10.4129. However, BRAP4Footnote 10 and SULA11Footnote 11 have the highest excess kurtosis values (72.8797 and 33.5521, respectively). PRIO3Footnote 12 is the stock with the highest standard deviation (4.8952), possibly indicating that it is the riskiest among those considered; it is a common stock, which is usually riskier than preferred stocks. The characteristics observed in the stocks are stylized facts of financial returns (Cont, 2001).

Table 1 Descriptive statistics of the log-returns of the sixty-three (63) stocks belonging to the Ibovespa index

In Table 2, we report the mean and standard deviation of the minimum risk and equally weighted portfolios with 6 and 12 stocks,Footnote 13 considering 500 and 1000 observations for weight estimation. Portfolio returns, like stock returns, have an average close to zero. The portfolios built with 12 stocks are less volatile than those with 6 stocks, i.e., they have a lower standard deviation, indicating that they are less risky: portfolios with more stocks, that is, more diversified ones, tend to reduce investment risk. For the portfolios with 6 stocks, the minimum risk portfolio has a higher standard deviation, indicating that it tends to be more volatile than the equally weighted portfolio. We also point out that the composition of the portfolios changes when we consider different numbers of stocks and observations to estimate the weights, which explains the differences in the descriptive statistics of the portfolios. For a visual analysis of the differences among portfolio returns, we present in Fig. 1 the evolution of the different portfolios for the full period (January 2012 to April 2022). Note that for the risk estimation considering 500 observations, we do not consider the first 500 observations; we did this so that the out-of-sample period coincides for portfolios where we use 500 and 1000 observations for risk prediction, allowing us to compare the results obtained for both rolling estimation windows. The illustration confirms the greater variability, and possibly greater risk, of the portfolios with 6 stocks compared to those with 12. For all portfolios, we notice an increase in volatility around March 2020, which can be explained by the COVID-19 outbreak. Ashraf (2020) and Vasileiou et al. (2021) point out that the decline in returns in financial markets occurred primarily as confirmed COVID-19 cases increased, an event that also coincided with the World Health Organization (WHO) officially declaring the COVID-19 outbreak a global pandemic. Furthermore, we observe the similarity of returns obtained through the two portfolio construction strategies: minimum risk and equally weighted.

Table 2 Descriptive statistics [Mean and standard deviation (SD)] of portfolios with 6 and 12 stocks
Fig. 1

Returns on portfolios built with 6 and 12 stocks, considering the minimum risk and equally weighted strategies. For portfolios with 6 and 12 stocks, the weights were computed using 500 and 1000 observations to estimate portfolio weights. The sampling period comprises daily data from January 2012 to April 2022. The portfolio returns are multiplied by 100

In Tables 3, 4, 5, and 6, we describe the mean, standard deviation (SD), realized loss (\({\mathcal {L}}_{VaR}\)), and realized cost (Cost) for the risk forecasts of portfolios with 6 and 12 stocks, considering \(\alpha = 1\%\) and 5%. The results obtained for \(\alpha = 2.5\%\) are described in the Appendix. We note, as expected, that the mean and standard deviation of the risk forecasts are higher for portfolios formed with 6 stocks: in a portfolio with 12 stocks, there is a greater reduction in diversifiable (non-systemic) risk than in a portfolio with 6 stocks. We also verify that the significance level of 1% presents higher risk estimates than \(\alpha = 5\%\), and the risk forecasts for the significance level of 2.5% lie between the forecasts obtained for \(\alpha = 1\%\) and 5%. This result is expected because more extreme levels are associated with higher losses, implying higher risk forecasts.

Table 3 Descriptive statistics [Mean and standard deviation (SD)], loss (\({\mathcal {L}}_{VaR}\)), and realized cost (Cost) of VaR forecasts, considering portfolios with 6 stocks and 500 observations to estimate the weights
Table 4 Descriptive statistics [Mean and standard deviation (SD)], loss (\({\mathcal {L}}_{VaR}\)), and realized cost (Cost) of VaR forecasts, considering portfolios with 12 stocks and 1000 observations to estimate the weights
Table 5 Descriptive statistics [Mean and standard deviation (SD)], loss (\({\mathcal {L}}_{VaR}\)), and realized cost (Cost) of VaR forecasts, considering portfolios with 12 stocks and 500 observations to estimate the weights
Table 6 Descriptive statistics [Mean and standard deviation (SD)], loss (\({\mathcal {L}}_{VaR}\)), and realized cost (Cost) of VaR forecasts, considering portfolios with 6 stocks and 1000 observations to estimate the weights

For the same significance level, we find that, in general, risk predictions tend to be higher for the smaller rolling estimation window (500). This result does not hold when we evaluate portfolios with 6 stocks and weights obtained from risk minimization. Some studies, such as Kuester et al. (2006); Alexander and Sheedy (2008), and Righi and Ceretta (2015), argue that larger estimation windows tend to improve the accuracy of risk predictions: smaller sample sizes are usually associated with estimation problems, such as bias in the estimators of the parameters of the models used to predict risk. For a comparison of the impact of different estimation windows on risk prediction, we recommend Righi and Ceretta (2016). Moreover, we notice that the risk statistics of the minimum risk and equally weighted portfolios differ, which is in line with the descriptive statistics of the portfolio returns. The equally weighted portfolio tends to be less risky when evaluating out-of-sample forecasts for the same rolling window and significance level. However, our intention here is not to compare the results of the portfolios built using both weighting strategies, but to evaluate the impact of different weights on selecting the most accurate model to predict risk.

Furthermore, we notice that the average risk forecasts for the sub-sample from 2020 to April 2022 tend to be higher than for the period from 2016 to 2019. This result was expected because the sample from 2020 to April 2022 includes the COVID-19 outbreak. For this period, we also observe an increase in the standard deviation of the risk forecasts. The higher standard deviation of the forecasts for 2020-2022 can be explained by the heavy losses around March 2020 and the market recovery after the shock, as can be seen in Fig. 1. The literature offers some reasons for the market recovery, including low interest rates and expansive actions taken by central banks (Cantú et al., 2021). Khalfaoui et al. (2021) and Rouatbi et al. (2021) identify that vaccination was also responsible for stabilizing stock markets: the implementation and acceleration of vaccination programs increased market confidence, spurring the stock market recovery (Khalfaoui et al., 2021).

We notice that, in general, the Vine models have the best results, with the main emphasis on the case in which the marginal distribution is the skewed Student's t. For example, considering the portfolio with 6 stocks and weights obtained from risk minimization with a sample of 500 observations, R-Vine\(_{sstd}\) is the model with the lowest realized loss for the full sample and the sub-sample 2016-2019 when we use VaR\(^{1\%}\) and VaR\(^{2.5\%}\). Vine copulas describe the multivariate relationship between stocks through a cascade of pair-copulas and the marginal densities (Kurowicka & Cooke, 2006). The variety of bivariate copulas that can be considered allows constructing a rich class of multivariate distributions, which models complex and asymmetric dependency structures (Geidosch & Fischer, 2016). Each of the Vine copulas considered provides a way of arranging the pair-copulas in a (hierarchical) tree structure, facilitating the analysis of multiple dependencies. Good performance of Vine copulas in forecasting risk measures was also observed in the Monte Carlo simulation performed by Müller and Righi (2020) and in the empirical analyses of Righi and Ceretta (2015) and Trucíos et al. (2020). The superiority of copula models over GARCH approaches, such as GO-GARCH, can be explained by the greater flexibility of copulas in modeling the dependence between variables (Kole et al., 2007). We also observe that the realized loss results for the Vine copulas tend to be similar for the same marginal distribution. Regarding the marginal distribution, the skewed Student's t tends to produce better results because it captures two important stylized facts of returns: asymmetry and excess kurtosis.

Although we perceive the superiority of copulas, the performance of the models tends to change as we consider different significance levels, rolling estimation windows, and analysis periods (full sample, 2016-2019, 2020-2022). Regarding significance levels, lower levels are associated with more extreme observations; for this reason, different models may more accurately accommodate the data characteristics at each level. The different performance results for the two rolling estimation windows can be explained by the fact that smaller sample sizes are, in general, more affected by bias in the estimators of the model parameters employed for risk forecasting. A similar result is found by Wong et al. (2012). Furthermore, when considering a rolling window of 1000 observations, we have more information about the dynamics of returns, which can influence the performance of the risk prediction models. When analyzing the out-of-sample periods used to present the results, we notice in many scenarios that the model with the lowest realized loss for the full sample tends to be the same as for the 2016-2019 period. As an illustration, for portfolios with 6 stocks and a rolling estimation window of 1000 observations, considering the minimum risk and equally weighted portfolios and \(\alpha = 2.5\%\), the lowest realized loss is presented by D-Vine\(_{sstd}\). However, some exceptions are found. For example, for a minimum risk portfolio with 6 stocks, a rolling estimation window of 1000 observations, and \(\alpha = 1\%\), the DCC model presents the best result, that is, the lowest realized loss, for the three out-of-sample periods. We also observe that the models' performance is generally not maintained across the two weighting strategies used to build the portfolios. However, when we consider portfolios with 6 stocks and a rolling estimation window of 1000 observations, the performance of the models tends to coincide for both weighting strategies (this result holds for the full sample and 2016-2019). From a practical point of view, this means that when the manager rebalances the portfolio, he does not need to worry about choosing a new risk prediction model.

In Fig. 2, we display the evolution of the risk estimates for the minimum risk portfolio with 6 stocks, \(\alpha = 1\%\), and a rolling estimation window of 500 observations. The plotted VaR had its sign inverted so that it can be compared to the negative portfolio returns. For brevity, we omit the illustrations for the other risk estimates; they are available upon request. Visually, the risk estimates obtained via the DCC and copula approaches, considering the normal and skewed Student's t distributions, tend to follow the behavior of the portfolio returns better, which corroborates the good results of both forecasting approaches according to the realized loss. The predictions obtained by the copula approach using the Student's t marginal distribution do not follow the evolution of the series and are visually volatile, which is confirmed by the standard deviation of the predictions obtained by these models. A similar evolution is displayed by the GO-GARCH model forecasts, mainly between 2016 and 2017. For the full sample and 2016-2019 results, the GO-GARCH model is among the models with the highest realized loss (worst result). This result does not corroborate the findings of Tiwari et al. (2020), who identify consistent results between GO-GARCH and DCC for risk analysis. Regarding HS, we observe that this approach does not follow the evolution of portfolio returns. Thus, as observed by Müller and Righi (2020), HS tends to overestimate the risk in calm periods and underestimate it in turbulent periods. One of the reasons for this is that HS responds slowly to changes in volatility and price movements (Pritsker, 2006). For a discussion of the use of HS in risk forecasting, we recommend Christoffersen and Gonçalves (2005) and Pritsker (2006).

Fig. 2

Returns and risk forecasts for portfolios built with 6 stocks, considering the minimum risk portfolio, \(\alpha = 1\%\), and a rolling estimation window of 500 observations. The out-of-sample period comprises data from January 2016 to April 2022. The risk forecasts are obtained by: R-Vine, C-Vine, and D-Vine, considering normal, Student's t, and skewed Student's t marginal distributions; the DCC model using normal and Student's t distributions; HS; and the GO-GARCH model. Note The solid line refers to the returns of the portfolio with 6 stocks. We obtain the portfolio weights considering a minimum risk portfolio (see Eq. (2)). The dashed lines refer to the risk forecasts, i.e., the VaR predictions obtained by the different models, considering \(\alpha = 1\%\) and a rolling estimation window of 500 observations

Regarding the realized cost results, we verify that they differ from those of the realized loss. For a significance level of 1%, the DCC model shows the lowest realized cost; this model also performs well for \(\alpha = 2.5\%\) and the equally weighted portfolio. Corroborating this, Weiß (2013) identifies that DCC models are not outperformed by copula models, in the bivariate sense, for estimating VaR and ES. In the other scenarios, R-Vine\(_{sstd}\) and D-Vine\(_{sstd}\) present the best results (lowest realized cost). Thus, DCC, R-Vine\(_{sstd}\), and D-Vine\(_{sstd}\) present the best trade-off between the costs from underestimation and overestimation. The highest values of the realized cost are identified for copula models with the Student's t distribution. The differences in the results identified by the two criteria can be explained by the fact that the VaR loss function penalizes more aggressively the observations for which returns exceed the risk estimates, and considers only forecasting errors rather than the costs associated with such errors (Righi et al., 2020). However, it is worth noting that the realized cost results are conditional on the costs from risk overestimation and underestimation paid by the manager. A thorough analysis of the costs incurred by incorrect risk forecasting is essential because incorrect forecasts reduce profitability (risk overestimation) and increase costs from unexpected and uncovered losses (risk underestimation).

5 Concluding Remarks

This study analyzed the performance of multivariate models to predict VaR. The models considered were HS, DCC-GARCH with multivariate normal and Student's t distributions, GO-GARCH, C-Vine, D-Vine, and R-Vine. For the copula models, we considered the normal, Student's t, and skewed Student's t distributions as marginal distributions. To evaluate the performance of the models, we considered minimum ES portfolios with a cardinality constraint and equally weighted portfolios obtained from the stocks belonging to the Ibovespa index, using 500 and 1000 observations to quantify the weights. We formed portfolios with 6 and 12 stocks. The risk forecasts were obtained using two rolling estimation windows (500 and 1000 observations) and significance levels of 1%, 2.5%, and 5%. To assess the VaR forecasts, we considered the realized loss and the realized cost. We quantified the VaR forecasts from January 2012 to April 2022.

In general, we observe that risk estimates are sensitive to the choice of significance level, weighting strategy, and estimation window. According to the realized loss, the copulas with skewed Student's t marginals tend to present the best performance. Another interesting result is that the model indicated by the realized loss, obtained via the score function, does not coincide with the best-performing model according to the realized cost. By that criterion, DCC, R-Vine\(_{sstd}\), and D-Vine\(_{sstd}\) present the best performance, i.e., these models have the best trade-off between the costs of overestimation and underestimation. Knowing the performance of risk estimates and their behavior under different scenarios is important for both regulators and investors, since risk is, along with return, one of the most important variables in financial decision-making. For future research, we recommend further investigating the impact of the weights of the stocks that make up the portfolio on the performance of risk forecasting models. We also suggest including other models and investigating portfolios with more stocks than those considered in this study. Finally, we recommend that future work consider data from other markets and different data frequencies, including weekly and intraday data.