Article

Modeling Risk for CVaR-Based Decisions in Risk Aggregation

by
Yuriy Zinchenko
1,2,* and
Alexandru V. Asimit
3
1
Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
2
Gurobi Optimization, LLC, Beaverton, OR 97008, USA
3
Bayes Business School, University of London, London EC1Y 8TZ, UK
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2023, 16(5), 266; https://doi.org/10.3390/jrfm16050266
Submission received: 4 March 2023 / Revised: 5 May 2023 / Accepted: 6 May 2023 / Published: 9 May 2023
(This article belongs to the Special Issue Risk Management and Forecasting Methods in Finance)

Abstract

Measuring aggregate risk is an important exercise for any risk-bearing carrier. It is not restricted to the evaluation of a known portfolio risk position only, and could include complying with regulatory requirements, quantifying diversification, etc. The main difficulty of risk aggregation is creating an underlying robust probabilistic model. The uncertainty in the individual risks is much lower in its complexity, as compared to modeling the dependence amongst the risks. As a result, it is often reasonable to assume that individual risks are modeled in a robust fashion, while the exact dependence remains unknown, yet some of its traits may be made available due to empirical evidence or "good practice". Our main contribution is to propose a numerical procedure that enables the identification of the worst possible dependence scenario, when the risk preferences are modeled by the conditional value-at-risk (CVaR) in the presence of dependence uncertainty. For portfolios with two risks, it is known that CVaR ordering coincides with the lower-orthant stochastic ordering of the underlying bivariate distributions. As a by-product of our analysis, we show that no such extensions are possible in higher dimensions.

1. Introduction

Risk aggregation is a well-known strategy to reduce the overall risk held by a financial institution, insurance company, or any other risk-bearing carrier. Risk portfolios are often a summation of individual risks (or lines of business) and the risk-bearing carrier is usually concerned with evaluating the risk position for this portfolio so that regulatory requirements or business targets (such as diversification, shareholder value management constraints, etc.) are met. Within the insurance and banking industries, there are regulatory requirements that financial institutions need to meet by maintaining an appropriate level of capital at all times. These calculations take into account multiple sources of risk and all other factors that contribute to changes in the company’s balance sheet within a specified period of time. Examples of such regulatory requirements include the international Basel II/III banking supervision guidelines (e.g., see BCBS 2016) and the Swiss Solvency Test that applies to all Swiss-based insurance and reinsurance companies (e.g., see Swiss Solvency Test 2006), where the risk measurements are performed via the well-known risk measure conditional value-at-risk (CVaR). This risk measure was introduced in the seminal paper of Rockafellar and Uryasev (2000) and has shown a clear computational advantage in OR applications. A risk aggregation application in the context of the European Union insurance regulations, known as Solvency II, is given in Asimit et al. (2016).
Many practical situations show that obtaining full knowledge of the dependence amongst a group of observed random variables is a very difficult task. It is an irrefutable fact that when modeling multivariate risks, the estimation error is weighted towards determining the dependence amongst the risks. Common practice has shown that individual risks are estimated with higher confidence as compared to the dependence model between the variables. Unlike estimating individual risks, fitting the dependence model typically presents a great challenge, especially due to data scarcity. As a result, decision makers usually commit to a somewhat arbitrarily chosen parametric model, but these ad hoc choices lead to inadequate evaluations of the overall risk. Therefore, it may be preferable to rely on qualitative information about the dependence and work with a notion of realistic weakest and strongest dependence models amongst the observed risks instead. For example, knowing that the risks are positively associated would imply that independence represents the weakest possible dependence, etc. Thus, it is more reasonable to assume that we have reliable models for the individual random variables coupled with some partial knowledge of their association.
Many attempts have been made to resolve the problem of risk valuation under uncertainty modeling and more specifically under dependence uncertainty. The literature on this topic is vast and we give only a brief account of the related work. One direction of research typically pursued in the OR literature is to adapt recent methodologies from so-called robust optimization. For example, in robust portfolio optimization, one typically assumes that a decision maker has some partial information about the joint distribution function amongst the risks. In order to incorporate the uncertainty, several notions of the worst-case risk measure have been proposed. For example, El Ghaoui et al. (2003) and Zymler et al. (2013) discuss this problem in the context of VaR-based optimal portfolio selection. The same problem is investigated in Zhu and Fukushima (2009) and Huang et al. (2010), where decisions are made on the worst-case CVaR; a related insurance setting is discussed in Asimit et al. (2017, 2019) and Balbás et al. (2011), while robust portfolio selection and related topics are addressed in manuscripts like Blanchet et al. (2017) or Fabozzi et al. (2010). Some attention has been devoted to computing bounds on CVaR with moment information. For example, Bertsimas et al. (2004) obtain sharp explicit bounds using the first two moments, while Bertsimas and Popescu (2002) develop a more general numerical convex-optimization-based approach. Recall that robust versions of the above moment-based models may be developed in principle, relying on so-called robust optimization techniques (e.g., Ben-Tal et al. 2009). Interesting connections between chance-constrained and robust optimization in relation to CVaR are established in Chen et al. (2009). Other risk measures (beyond CVaR) are available in the literature; e.g., the higher-moment risk measure that is investigated in Gómez et al. (2022).
The main contribution of our paper is to propose a method to evaluate sharp lower and upper bounds for the CVaR-based aggregate risk level under dependence uncertainty. Specifically, we assume that bounds on the cumulative multivariate distribution are available, as well as that we have the full knowledge of the individual risk distributions. Here, the partial information about dependence is given by the lower-orthant stochastic ordering type constraints. Arguably, the most practically relevant examples of such types of constraints are the so-called positive and negative quadrant dependence models. The practical advantage of using the above dependence models is that we can test the statistical significance of such properties (see Gijbels and Sznajder 2013). In other words, the validity of restricting the range of possible dependence models may be statistically verified using the observed data. The latter plausible dependence provides us with the main motivation to include lower-orthant type restrictions in our model.
From the methodological perspective, our numerical method is based on (convex) optimization techniques and specifically, bilinear and linear programming (LP). Interestingly, despite the associated optimization problem being bilinear—and thus non-convex—in nature, we show that the problem’s objective function still retains a strong structural property, namely, it is convex in every argument, and in turn, the convexity provides the basis for efficient computations. Despite a seeming symmetry of the two problems, evaluating the sharp lower bound on CVaR appears to be more of a challenge, as compared to computing the sharp upper bound. This is substantiated by both the complexity analysis of the proposed method, and the numerical results.
It is known that CVaR respects the so-called lower-orthant stochastic ordering for two-dimensional portfolios (chapter 6.2.6 of Denuit et al. (2005)). Yet no similar result has been established or disproved for higher dimensions. As a by-product of our analysis, using elementary LP techniques, we show that no such extensions are possible, and give insights as to why this is the case.
The paper is organized as follows. Section 2 presents our model for determining sharp upper and lower bounds on the CVaR-based aggregate risk level. Section 3 and Section 4 describe the approach used to compute the lower and upper bounds, respectively. Section 5 contains the numerical experiments and analysis, while Section 6 discusses the behaviour of CVaR in multivariate settings under the lower-orthant and other orderings. Our final comments and conclusions are summarized in Section 7.

2. Model Setting

The notation relies on lower case letters t, α, x, … for deterministic quantities and capital letters Z, X, … for random variables. Likewise, we use capital letters F_i, Π, … to denote functions. Bold letters such as x, i, and X are reserved for deterministic and random vectors, respectively; likewise, we use A, … for vector-valued functions. Capital script letters I, M, … are used for sets.
Let X = (X_1, …, X_n) denote an n-variate random vector, and let Z = Σ_{i=1}^n X_i be the sum of n possibly dependent risks. The VaR of a generic loss variable Z at confidence level α, VaR_α(Z), represents the α-quantile of Z. Mathematically, VaR_α(Z) := inf{ z : Pr(Z ≤ z) ≥ α }, where inf ∅ = ∞. The CVaR at confidence level α, CVaR_α(Z), evaluates the expected loss amount incurred under the worst 100 × (1 − α)% loss scenarios of Z. The CVaR has multiple formulations in the literature (Acerbi and Tasche 2002), but in the present paper, we only refer to the following representation (Rockafellar and Uryasev 2000),
CVaR_α(Z) := VaR_α(Z) + (1/(1 − α)) E[ Z − VaR_α(Z) ]_+ = min_t { t + (1/(1 − α)) E(Z − t)_+ },
where E(·) is the expectation and z_+ = max{z, 0}.
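For a discrete, equally weighted loss sample, the minimization over t in the representation above is attained at one of the sample points, so CVaR can be evaluated directly. The following minimal sketch (ours, not the authors' code; it assumes only NumPy and a plain equally weighted sample) illustrates the representation:

```python
import numpy as np

def cvar(z, alpha):
    """CVaR_alpha of an equally weighted discrete sample z via the
    Rockafellar-Uryasev representation min_t { t + E[(Z - t)_+]/(1 - alpha) };
    for a finite sample, the minimum is attained at one of the sample points."""
    z = np.asarray(z, dtype=float)
    candidates = [t + np.mean(np.maximum(z - t, 0.0)) / (1.0 - alpha) for t in z]
    return min(candidates)

# sanity check: with alpha = 0.75 and four equally likely losses,
# CVaR is the mean of the worst 25% of scenarios, i.e. the largest loss
print(cvar([1.0, 2.0, 3.0, 4.0], 0.75))   # 4.0
```

This brute-force scan over candidate t values is exactly what the constrained schemes of Section 3 refine, once the probability weights themselves become decision variables.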
Let us assume that we have a portfolio consisting of n risks X = (X_1, …, X_n). The cumulative distribution function (c.d.f.) of each individual risk X_i is F_i(·) and is assumed to be known for all 1 ≤ i ≤ n, and we write X_i ∼ F_i. Moreover, we assume that the dependence between the risks, i.e., the multivariate distribution F(x) = Pr(X ≤ x) of X, is unknown, but some prior knowledge about the association amongst the risks is available. Namely, the set of feasible distributions is given by
ℱ = { F :  F̲(x) ≤ F(x) ≤ F̄(x) for all x ∈ ℜ^n,  X_i ∼ F_i },
where F̲ and F̄ are some n-dimensional joint c.d.f.'s that define the set of acceptable dependence models. Note that the above assumption provides a lower and an upper bound for X in the lower-orthant stochastic ordering sense. Recall that two random vectors X and Y in ℜ^n are lower-orthant ordered, written X ⪯_lo Y, if Pr(X ≤ x) ≤ Pr(Y ≤ x) for all x ∈ ℜ^n. It is known that the comonotonic¹ dependence F_c(x) := min_{i=1,…,n} F_i(x_i) gives the sharp upper bound on the c.d.f. with prescribed marginals F_i. Thus, if there is no upper bound specified, without loss of generality we may set F̄(x) = F_c(x). On the other hand, given the marginals, it is impossible to construct a sharp lower bound on the c.d.f. for n ≥ 3. Thus, when the lower bound F̲ is not known a priori, it is not so clear what should be used in place of F̲, besides trivial choices.
The main aim of the paper is to compute sharp lower and upper bounds on CVaR α ( Z ) ,
inf_{F ∈ ℱ} CVaR_α(Z)    and    sup_{F ∈ ℱ} CVaR_α(Z),
where X ∼ F. We approximate the solutions to (2) by assuming the X_i's to be discrete random variables, i.e., by considering a sample of size m^n from our population X. Namely, it is assumed that X_i takes the values x_{i,1}, …, x_{i,m} with equal probability 1/m, but we do not know the joint probability amongst the risks, represented by the p.m.f.
p_{i_1, …, i_n} = Pr( X = (x_{1,i_1}, …, x_{n,i_n}) ),   for all 1 ≤ i_1, …, i_n ≤ m.
Note that if X is a continuous, compactly supported random vector, one can use the above discretization to approximate its distribution to within the desired accuracy by increasing m. The F_i-equivalent discrete marginal distributions are standardized and assumed to be uniform. This choice is motivated by the common sampling procedure using copulas, if parametric models for the marginal distributions are available. It is also motivated by practical considerations on the availability of historical data. The methods in the paper can easily be adapted to arbitrary marginals; however, this comes at the unnecessary expense of further complicating the notation. The c.d.f. bounds F̲ and F̄ are represented by discrete vectors ß̲ and ß̄. Likewise, we denote the aggregate risk sample by z, with z_i = z_{i_1,…,i_n} = Σ_{j=1}^n x_{j,i_j}, where the multi-index i = (i_1, …, i_n) runs over all m^n possible values i ∈ I with i_j ∈ {1, …, m}. The values of z_i are only partially ordered.
Thus, in order to approximate (2), we need to compute
CVaR̲_α := inf_p min_t  t + (1/(1 − α)) Σ_{i∈I} (z_i − t)_+ p_i
    s.t.  ß̲_i ≤ Σ_{j≤i} p_j ≤ ß̄_i,   for all i ∈ I,
          Σ_{i: i_j = k} p_i = 1/m,   for all j = 1, …, n,  k = 1, …, m,
          Σ_{i∈I} p_i = 1,   p ≥ 0,
and
CVaR̄_α := sup_p min_t  t + (1/(1 − α)) Σ_{i∈I} (z_i − t)_+ p_i
    s.t.  ß̲_i ≤ Σ_{j≤i} p_j ≤ ß̄_i,   for all i ∈ I,
          Σ_{i: i_j = k} p_i = 1/m,   for all j = 1, …, n,  k = 1, …, m,
          Σ_{i∈I} p_i = 1,   p ≥ 0,
where the multi-index inequalities j ≤ i are interpreted component-wise. Note that the marginal density constraints Σ_{i: i_j = k} p_i = 1/m are stated explicitly as part of the formulation, although we could absorb these constraints into tighter upper and lower c.d.f. bounds.
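To make the multi-index constraints concrete: the partial sums Σ_{j≤i} p_j are simply the entries of the discrete joint c.d.f., obtainable from the p.m.f. tensor by cumulative summation along every axis. A small illustrative sketch of ours (the independence p.m.f. is used purely as an example of a feasible point):

```python
import numpy as np

m, n = 4, 2
# a feasible p.m.f. with uniform marginals: the independence model
p = np.full((m, m), 1.0 / m**2)

# discrete joint c.d.f.: cumulative sums along every axis give sum_{j <= i} p_j
cdf = p.copy()
for axis in range(n):
    cdf = np.cumsum(cdf, axis=axis)

print(p.sum(axis=1))        # marginal of the first risk: all entries equal 1/m = 0.25
print(float(cdf[-1, -1]))   # total probability mass: 1.0
```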

3. Computable Lower Bound

3.1. Reduction to Parametric LP

Let us define the value function as
val̲(t) := inf_p  t + (1/(1 − α)) Σ_{i∈I} (z_i − t)_+ p_i
    s.t.  ß̲_i ≤ Σ_{j≤i} p_j ≤ ß̄_i,   for all i ∈ I,
          Σ_{i: i_j = k} p_i = 1/m,   for all j = 1, …, n,  k = 1, …, m,
          Σ_{i∈I} p_i = 1,   p ≥ 0,
and note that evaluating val _ ( t ) for a fixed t corresponds to solving an LP problem. This is critical to the design of our computational approach to solving (3), i.e., determining
CVaR̲_α = inf_t val̲(t).
Since the solution to a moderately sized LP problem can typically be computed in reasonable time, to obtain an initial sense of what range CVaR̲_α may fall into, one may simply compute a few values val̲(t_1), val̲(t_2), … for some sample values t_1, t_2, …. We extend this basic idea by combining it with a few more observations that follow. Recall that evaluating CVaR̲_α corresponds to solving a so-called bilinear optimization problem, which is notoriously difficult due to the inherent non-convexity of the objective, with potentially many local minima.
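As a concrete illustration, the sketch below (ours, not the authors' implementation) evaluates val̲(t) for a fixed t on a toy instance with n = 2, using SciPy's linprog in place of the CVX/Gurobi set-up employed later in the paper; the independence and comonotonic c.d.f.'s serve as example lower and upper bounds:

```python
import numpy as np
from scipy.optimize import linprog

def val_lower(t, z, lo, hi, alpha, m, n):
    """Evaluate val_(t): an LP over the joint p.m.f. p subject to c.d.f. bounds
    lo <= sum_{j<=i} p_j <= hi and uniform marginals.  z, lo, hi have shape (m,)*n.
    A sketch only; the paper's experiments use CVX/Gurobi instead of SciPy."""
    idx = list(np.ndindex(*([m] * n)))
    N = len(idx)
    # objective: (1/(1-alpha)) * (z_i - t)_+ ; the constant t is added at the end
    c = np.maximum(z.reshape(N) - t, 0.0) / (1.0 - alpha)
    # cumulative-sum operator C: (C p)_i = sum_{j <= i} p_j
    C = np.array([[all(jj <= ii for ii, jj in zip(i, j)) for j in idx] for i in idx], float)
    A_ub = np.vstack([C, -C])
    b_ub = np.concatenate([hi.reshape(N), -lo.reshape(N)])
    # uniform marginals: sum over {i : i_j = k} of p_i equals 1/m
    A_eq, b_eq = [], []
    for j in range(n):
        for k in range(m):
            A_eq.append(np.array([1.0 if i[j] == k else 0.0 for i in idx]))
            b_eq.append(1.0 / m)
    A_eq = np.array(A_eq)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * N, method="highs")
    return t + res.fun

# toy instance: n = 2, m = 3, uniform risk samples on [0, 1]
m, n, alpha = 3, 2, 0.8
x = (np.arange(m) + 0.5) / m
z = x[:, None] + x[None, :]                         # aggregate sample z_i
hi = np.minimum.outer(np.arange(1, m + 1) / m,      # comonotonic upper bound
                      np.arange(1, m + 1) / m)
lo = np.outer(np.arange(1, m + 1) / m,              # independence lower bound
              np.arange(1, m + 1) / m)
print(val_lower(float(np.median(z)), z, lo, hi, alpha, m, n))
```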

3.2. Compact Support in t

We now claim that in order to compute CVaR _ α , it is unnecessary to perform an exhaustive search over all possible values of t .
Theorem 1.
Denote t̲ = min_{i∈I} z_i and t̄ = max_{i∈I} z_i. Then, the following holds:
inf_t val̲(t) = min_{t ∈ [t̲, t̄]} val̲(t).
Proof. 
Assume first that t > t̄: since (z_i − t)_+ = 0 for all i, we have val̲(t) = t, and thus the value function val̲(t) is increasing for any t > t̄.
Consider now the case of a fixed t such that t < t̲. Let p denote an optimal probability distribution resolving val̲(t) at t, and p̲ denote an optimal probability distribution resolving val̲(t̲) at t̲. Denoting Δt = t̲ − t ≥ 0, we have
val̲(t) − val̲(t̲) = [ t + (1/(1 − α)) Σ_{i∈I} (z_i − t)_+ p_i ] − [ t̲ + (1/(1 − α)) Σ_{i∈I} (z_i − t̲)_+ p̲_i ]
    = −Δt + (1/(1 − α)) Σ_{i∈I} [ (z_i − t) p_i − (z_i − t − Δt) p̲_i ]
    = −Δt + (1/(1 − α)) Σ_{i∈I} [ (z_i − t)(p_i − p̲_i) + Δt p̲_i ]
    = (α/(1 − α)) Δt + (1/(1 − α)) Σ_{i∈I} (z_i − t)(p_i − p̲_i)  ≥  (α/(1 − α)) Δt  ≥  0,
where the last identity follows from the feasibility of p̲, namely Σ_{i∈I} p̲_i = 1, while the next-to-last inequality follows from p̲ being the optimal solution corresponding to t̲, which in turn implies
Σ_{i∈I} (z_i − t̲)_+ p_i ≥ Σ_{i∈I} (z_i − t̲)_+ p̲_i, and hence, since Σ_{i∈I} (p_i − p̲_i) = 0, also Σ_{i∈I} (z_i − t)(p_i − p̲_i) ≥ 0.
Therefore, val _ ( t ) is decreasing for t < t _ . Finally, since val _ ( t ) is a continuous function minimized over a compact set, we can replace inf with min.  □

3.3. Key Properties of the Value Function

Since evaluation of the value function can be reduced to an LP with a parametric objective, we can establish the next proposition.
Proposition 1.
The function val̲(t) is a piecewise linear, continuous function, concave on every subinterval [z_(ℓ), z_(ℓ+1)], where z_(ℓ) corresponds to a re-indexing of the z_i values in non-decreasing order, so that z_(ℓ) ≤ z_(ℓ+1) for all ℓ = 1, …, m^n − 1. Furthermore, val̲(t) has finitely many linear segments.
Proof. 
Observe that, restricting t ∈ [z_(ℓ), z_(ℓ+1)], we can write val̲(t) = t + (1/(1 − α)) v̲(t) with
v̲(t) := inf_{p, s̲, s̄}  Σ_{i: z_i ≥ z_(ℓ+1)} (z_i − t) p_i
    s.t.  Σ_{j≤i} p_j − s̲_i = ß̲_i,   for all i ∈ I,
          Σ_{j≤i} p_j + s̄_i = ß̄_i,   for all i ∈ I,
          Σ_{i: i_j = k} p_i = 1/m,   for all j = 1, …, n,  k = 1, …, m,
          Σ_{i∈I} p_i = 1,
          p, s̲, s̄ ≥ 0,
denoting the partial value function. In turn, determining v _ ( t ) may easily be recognised as a linear optimization problem in standard minimisation form
v̲(t) = min_u  (c + t Δc)ᵀ u   s.t.   A u = b,   u ≥ 0.
Note that u = (p; s̲; s̄) is a vector of variables of dimension d = 3 m^n, A : ℜ^d → ℜ^r is a linear map encoded as an r × d matrix with r = 2 m^n + m n + 1 rows, and b ∈ ℜ^r represents the affine equality constraints stated for v̲(t). The t-parametric objective c + t Δc corresponds to
c_i = z_i if z_i ≥ z_(ℓ+1) and c_i = 0 otherwise,   with   Δc_i = −1 if z_i ≥ z_(ℓ+1) and Δc_i = 0 otherwise,
where we allow a slight abuse of notation when indexing c and Δ c by the multi-index i .
Clearly, v̲(t) is a continuous piecewise linear concave function of t. By enumerating the total number of possible bases, standard LP sensitivity analysis implies that on a given subinterval t ∈ [z_(ℓ), z_(ℓ+1)], the function v̲(t), and therefore val̲(t), consists of at most (d choose r) linear segments. Since we have at most m^n − 1 such subintervals, we conclude that v̲(t) consists of at most (m^n + 1)·(d choose r) linear segments, which also includes the two end subintervals (−∞, z_(1)] and [z_(m^n), ∞).  □
The above bound on the number of linear segments comprising v̲(t) is very crude. Not only do we take a very pessimistic bound (d choose r) on the number of vertices of a very special polytope that describes the feasible probability distributions, we also ignore a special “monotonic” structure in the perturbations to the objective vector. Consequently, it is quite natural to expect the number of such segments to be much smaller.
The above proposition, based on classical sensitivity analysis for LP, albeit correct, may be misleading while designing a numerical scheme for minimising val _ ( t ) . Specifically, the asserted piecewise concavity of val _ ( t ) may suggest a potential existence of several local minima (see Figure 1a). We remedy this in the next theorem, which gives a complete characterisation of the partial value function. Along the way, we drastically reduce the upper bound on the number of linear segments comprising val _ ( t ) .
Theorem 2.
The function v̲(t) is continuous, non-negative and non-increasing, satisfying v̲(t) = 0 for t ≥ z_(m^n) and v̲′(t) = −1 for t ≤ z_(1). Moreover, v̲(t) is convex on ℜ and linear on every subinterval [z_(ℓ), z_(ℓ+1)].
Proof. 
Continuity and the tail-end behaviour of v̲(t) are established in the proofs of Proposition 1 and Theorem 1. Examining the variational formulation (5), we easily note the non-negativity and monotonicity of the partial value function, the latter due to the objective coefficients (z_i − t)_+ being monotone in t. Linearity on [z_(ℓ), z_(ℓ+1)] follows as a consequence of convexity—to be established shortly—and the piecewise concavity in Proposition 1. It remains to show convexity.
We show the convexity property by contradiction. First, introduce
v_p(t) := Σ_{i∈I} (z_i − t)_+ p_i
to be the partial value function restricted to a given feasible p. Observe that v_p is convex, piecewise linear and non-increasing, and its derivative v_p′(t), whenever defined, corresponds to the dot product of p with the corresponding sub-vector of Δc, as in (7). Thus, v_p′(t) is non-decreasing whenever defined. We also note that as t passes from the interval [z_(ℓ−1), z_(ℓ)] to [z_(ℓ), z_(ℓ+1)], the number of −1 entries, i.e., the non-zeros in Δc, is reduced by at least one.
If v̲(t) is strictly concave, there exists a cross-over point t_=, characterized by t_− < t_= < t_+ and the corresponding optimal distributions p_− and p_+ resolving (6), such that v_−(t_=) = v_+(t_=), with derivatives satisfying v_−′(t_−) > v_+′(t_−) and v_−′(t_+) > v_+′(t_+), where v_−(t) := v_{p_−}(t) and v_+(t) := v_{p_+}(t). Note that t_− and t_+ may be chosen close enough to t_= to warrant differentiability of the corresponding piecewise linear v_+, v_− on [t_−, t_=) and (t_=, t_+]. Furthermore, without loss of generality, we may assume that both v_−(t), v_+(t) have either at most one break-point at z_(ℓ) = t_= for some ℓ, with t_− ∈ (z_(ℓ−1), z_(ℓ)) and t_+ ∈ (z_(ℓ), z_(ℓ+1)), or no break-point at all, with t_−, t_+ ∈ (z_(ℓ), z_(ℓ+1)), as can be seen in Figure 2. By re-scaling and shifting t, we can also assume t_= = 0 and −t_− = t_+ = 1/2. With the above notation, we have v̲(t) = min{v_−(t), v_+(t)} for t ∈ [−1/2, 1/2] and v_−(t) > v_+(t) for t ∈ [−1/2, 0) ∪ (0, 1/2].
Denote p(τ) = τ p_− + (1 − τ) p_+, τ ∈ [0, 1]. Note that p(τ) is feasible, since the feasible region of (6) is convex. Further, let us examine v(τ) := v_{p(τ)}(1/2 − τ). By the fundamental theorem of calculus, we obtain
v(1/2) = v(0) + ∫_0^{1/2} v′(τ) dτ = v̲(t_+) + ∫_0^{1/2} (d/dτ) v_{p(τ)}(1/2 − τ) dτ < v̲(t_=),
where the inequality is due to the fact that |v_{p(τ)}′(τ)| < |v_+′(τ)| for all τ ∈ (0, 1/2), since p_−, and consequently p(τ), carries less probability mass over the support of Δc at t_+ as compared to p_+. Since v̲(t_=) is supposed to be the smallest value over all feasible p at t_=, the contradiction is conspicuous. This completes the proof.  □

3.4. Two Computational Approaches

We now present two computational schemes for computing the sharp lower bound CVaR̲_α on CVaR, given the constraints on the risks' c.d.f. The schemes are aimed at illustrating the advantages of exploiting the inherent structure of Problem (3), and range in order of complexity as well as perceived numerical efficiency. The latter is further substantiated in Section 5.

3.4.1. Naïve Scheme

Observe that the piecewise concavity of val̲(t) established in Proposition 1 implies that the minimum of the value function may only occur at the end points of each interval [z_(ℓ), z_(ℓ+1)]. Therefore, it suffices to compute val̲(z_i) for all i ∈ I and take the minimum value. This gives rise to the naïve scheme.
Clearly, the naïve scheme requires access to an LP solver and runs in finite time. However, it requires solving a large number—namely m^n—of (6)-type optimization problems, where the problem dimensions also grow proportionally to m^n. As a result, the procedure may become very computationally expensive even for modest values of m and n. Further effort can be put towards reducing the computational requirements imposed by the naïve scheme. For example, the LP problems for evaluating val̲(t) differ only in the objective function, and thus may be well-suited for so-called warm-start techniques, as in simplex-type algorithms. In turn, the use of warm-starting may speed up solution times.
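A compact sketch of the naïve scheme (ours), which simply scans the sample points and keeps the smallest value-function evaluation; any LP-based evaluator of val̲(t), such as the sketch in Section 3.1, can be plugged in as the callable:

```python
import numpy as np

def naive_lower_bound(z, val_lower):
    """Naive scheme: by the piecewise concavity of Proposition 1, the minimum
    of the value function can only occur at sample points z_i, so evaluate the
    value function at every z_i and keep the smallest value.  `val_lower` is
    any callable t -> val_(t), e.g. the LP-based sketch of Section 3.1."""
    return min(val_lower(float(t)) for t in np.unique(z))

# stand-in value function for demonstration only (unconstrained CVaR objective)
z = np.array([0.2, 0.5, 0.9, 1.4])
demo_val = lambda t: t + np.maximum(z - t, 0.0).mean() / (1.0 - 0.8)
print(naive_lower_bound(z, demo_val))   # 1.4
```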

3.4.2. Epigraph Scheme

Unlike the naïve scheme, here we aim to take full advantage of the uncovered convexity of the value function. This not only allows us to greatly reduce the computational efforts required to determine the exact value CVaR _ α , but also permits the introduction of an alternative termination criterion when only an approximate answer is required within some given absolute precision ε > 0 .
We recall that the epigraph of a convex function can be obtained as an intersection of half-spaces. In the case of a smooth function, the half-spaces correspond to tangent hyperplanes, and in the case of non-differentiable functions, one may use half-spaces defined by sub-gradients. Thus, given two consecutive values t_− < t_+ of t with appropriately defined derivatives val̲′(t) of val̲(t), with values v_− := val̲(t_−), v_+ := val̲(t_+) and derivatives v_−′ := val̲′(t_−) < 0, v_+′ := val̲′(t_+) > 0, we know that the minimal value val̲* of val̲(t) corresponds to some t* ∈ [t_−, t_+]. In addition, val̲* lies between min(v_−, v_+) and the common value of the two supporting hyperplanes at the point
t̃ = ( v_− − v_+ + v_+′ t_+ − v_−′ t_− ) / ( v_+′ − v_−′ ),
which is the intersection of the supporting hyperplanes v_−′ (t − t_−) + v_− and v_+′ (t − t_+) + v_+. This can be seen in Figure 3. To refine the interval [t_−, t_+] and our estimate of val̲*, we can take the mid-point of the interval and adjust either t_−, v_−, v_−′ or t_+, v_+, v_+′ accordingly.
Assuming that the data are given by α , m , n , ß _ , ß ¯ , and z , the scheme may be defined recursively as a function whose declaration is given below using MATLAB notation
function [t_−, t_+, v_−, v_+, v_−′, v_+′] = Epigraph(t_−, t_+, v_−, v_+, v_−′, v_+′),
and is defined as follows.
  • [Input:] t_− < t_+, v_−′ ≤ 0, v_+′ ≥ 0, v_−, v_+, problem data.
  • Set val̲* := min{v_−, v_+},
  • compute t̃ and v := val̲(t̃) by solving (5), recovering the optimal probability distribution p̃,
  • if v = val̲* then return,
  • set v′ := 1 + (1/(1 − α)) Σ_{i∈I} Δc_i p̃_i, with Δc corresponding to t̃ as in (7),
  • if v′ ≤ 0, set t_− := t̃, v_− := v, v_−′ := v′,
  • if v′ > 0, set t_+ := t̃, v_+ := v, v_+′ := v′,
  • invoke Epigraph(t_−, t_+, v_−, v_+, v_−′, v_+′).
  • [Output:] val̲* = min_{t ∈ [t_−, t_+]} val̲(t).
Clearly, in order to obtain CVaR̲_α, we need to invoke Epigraph(t_−, t_+, v_−, v_+, v_−′, v_+′) with initial values t_− = t̲ (= z_(1)) and t_+ = t̄ (= z_(m^n)). If one desires to terminate the procedure once an absolute precision ε is reached, such that the returned value val̲* satisfies |val̲* − CVaR̲_α| ≤ ε, it suffices to replace the termination criterion val̲* − v = 0 with min{ −v_−′ (t̃ − t_−), v_+′ (t_+ − t̃) } ≤ ε.
From the convexity of val̲(t) and its tail behaviour established in Theorem 2, we know that the fastest decrease rate of the value function does not exceed |1 − 1/(1 − α)| = α/(1 − α) and therefore,
| val̲(t) − val̲(t + Δt) | ≤ α Δt / (1 − α),   for all t and Δt ≥ 0.
Thus, in order to achieve an ε precision, it suffices to have t_+ − t_− ≤ ε (1 − α)/α. In turn, recalling that at every iteration of the scheme the interval [t_−, t_+] is halved, we conclude that the absolute ε precision can be attained in at most log_2( (1/ε) · α (t̄ − t̲)/(1 − α) ) recursive calls, where the dominant work belongs to solving an LP instance of the form (6).
In a nutshell, although the epigraph scheme still relies on solving multiple LP instances in order to recover CVaR̲_α for fixed n, its worst-case run-time is bounded from above by a polynomial function of the problem input. Furthermore, when an approximate solution is sufficient, one would expect the number of calls to the LP solver to be dramatically smaller than m^n, as compared to the naïve scheme. We also note that the epigraph procedure is defined recursively only in an attempt to improve the clarity of exposition. Clearly, the procedure can be unrolled into if ... else ... statements with no recursion. Just as with the naïve scheme, one may try to take advantage of the warm-starting capabilities of an LP solver in an attempt to speed up the computational times required.
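The following self-contained sketch (ours, not the authors' MATLAB code) illustrates the supporting-hyperplane idea behind the epigraph scheme on a generic convex piecewise linear objective; in the actual scheme, the value and derivative at t̃ are obtained by solving the LP (5)/(6) rather than from closed-form oracles:

```python
import numpy as np

def epigraph_min(val, dval, t_lo, t_hi, eps=1e-7):
    """Minimize a convex function val(t) on [t_lo, t_hi] using supporting
    hyperplanes, in the spirit of the epigraph scheme (a sketch only).
    dval(t) returns a subgradient; it must be negative at t_lo, positive at t_hi."""
    v_lo, v_hi = val(t_lo), val(t_hi)
    g_lo, g_hi = dval(t_lo), dval(t_hi)
    while True:
        # intersection of the two supporting lines gives a lower bound on the minimum
        t_tilde = (v_lo - v_hi + g_hi * t_hi - g_lo * t_lo) / (g_hi - g_lo)
        lower = v_lo + g_lo * (t_tilde - t_lo)
        upper = min(v_lo, v_hi)
        if upper - lower <= eps:
            return upper
        v, g = val(t_tilde), dval(t_tilde)
        if g <= 0:                       # minimum lies to the right of t_tilde
            t_lo, v_lo, g_lo = t_tilde, v, g
        else:                            # minimum lies to the left of t_tilde
            t_hi, v_hi, g_hi = t_tilde, v, g

# demo with a CVaR-like objective t + E[(Z - t)_+]/(1 - alpha) on a fixed sample
z, alpha = np.array([0.2, 0.5, 0.9, 1.4]), 0.8
val = lambda t: t + np.maximum(z - t, 0.0).mean() / (1 - alpha)
dval = lambda t: 1.0 - (z > t).mean() / (1 - alpha)
print(epigraph_min(val, dval, z.min(), z.max()))   # CVaR_0.8 of the sample: 1.4
```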

4. Computable Upper Bound

It turns out that despite apparent similarities between Problems (3) and (4), the complexity of evaluating CVaR _ α is quite different from that of CVaR ¯ α . Namely, the calculation of CVaR ¯ α is much simpler. We first establish an essential property that is needed for proving the main result of this section.
Proposition 2.
The max-value function val̄(t) = t + (1/(1 − α)) v̄(t), where
v̄(t) := max_p  Σ_{i∈I} (z_i − t)_+ p_i
    s.t.  ß̲_i ≤ Σ_{j≤i} p_j ≤ ß̄_i,   for all i ∈ I,
          Σ_{i: i_j = k} p_i = 1/m,   for all j = 1, …, n,  k = 1, …, m,
          Σ_{i∈I} p_i = 1,   p ≥ 0,
is convex in t.
Proof. 
For fixed non-negative p, the objective function Σ_{i∈I} (z_i − t)_+ p_i is convex in t. In turn, v̄(t) is obtained by taking a supremum of convex functions v_p(t), indexed by p, and therefore val̄(t) is convex as well, being a positive weighted sum of t and v̄(t).  □
Using convexity, we note that the epigraph-based scheme from Section 3.4.2 can readily be adapted to computing the sharp upper bound of CVaR α . Furthermore, using classical LP duality theory, finding the optimal t value corresponding to minimising val ¯ ( t ) may equivalently be reformulated as solving a linear optimization problem. Let
M_{j,k} = { i :  i_j = k,  i_{j′} = m for all j′ ≠ j }
and let M = ∪_{j,k} M_{j,k} denote the set of multi-indices corresponding to the sums of marginals, including the total probability mass. For simplicity, from now on, we assume that the c.d.f. bounds ß̄, ß̲ are consistent with the marginals, that is, ß̲_i ≤ k/m ≤ ß̄_i for all i ∈ M_{j,k}, where the marginal index sets are defined as in (8). If not, clearly, the problem of computing v̄(t) is infeasible.
For clarity of exposition, we first slightly modify our formulation of v̄(t) from above. Noting that the lower and upper bound requirements on the c.d.f. are clearly redundant for i ∈ M, they may simply be replaced with the more restrictive modified bounds ß̲′, ß̄′, where
ß̲′_i = k/m if i ∈ M_{j,k} for some j, k, and ß̲′_i = ß̲_i otherwise;   ß̄′_i = k/m if i ∈ M_{j,k} for some j, k, and ß̄′_i = ß̄_i otherwise.
We are now ready to formulate the main result of this section.
Theorem 3.
The upper bound defined in (4) can be computed as follows:
CVaR̄_α = min_{t, y̲, ȳ}  t + (1/(1 − α)) ( ȳᵀ ß̄′ − y̲ᵀ ß̲′ )
    s.t.  Σ_{j≥i} y̲_j − Σ_{j≥i} ȳ_j ≤ t − z_i,   for all i ∈ I,
          Σ_{j≥i} y̲_j − Σ_{j≥i} ȳ_j ≤ 0,   for all i ∈ I,
          y̲, ȳ ≥ 0,   t ∈ ℜ.
Proof. 
Note that for any fixed t, the problem of computing v ¯ ( t ) is equivalent to solving its dual
v̄*(t) := min_{y̲, ȳ}  ȳᵀ ß̄′ − y̲ᵀ ß̲′
    s.t.  Σ_{j≥i} ȳ_j − Σ_{j≥i} y̲_j ≥ (z_i − t)_+,   for all i ∈ I,
          y̲, ȳ ≥ 0,
where, by strong LP duality, we know that v̄*(t) = v̄(t). Furthermore, for any dual-feasible point (y̲, ȳ), by the weak duality property we have ȳᵀ ß̄′ − y̲ᵀ ß̲′ ≥ v̄(t).
Noting that the dual-feasible region may equivalently be rewritten as stated in the theorem, we finally observe that in order to compute the optimal t * that satisfies CVaR ¯ α = val ¯ ( t * ) , it suffices to solve the concurrent linear minimisation problem with respect to t and ( y _ , y ¯ ) .  □
Finally, once the optimal value t * is known, the corresponding optimal values of p can easily be computed by solving for v ¯ ( t * ) as a linear maximisation problem, if further desired.
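As an illustration of Theorem 3, the sketch below (ours) assembles and solves the concurrent LP for a toy bivariate instance with SciPy's linprog, which stands in for the CVX/Gurobi set-up used in the paper. The upper-orthant sums Σ_{j≥i} are encoded by an explicit 0–1 matrix:

```python
import numpy as np
from scipy.optimize import linprog

# toy instance: n = 2, m = 4, uniform risk samples on [0, 1]
m, alpha = 4, 0.75
x = (np.arange(m) + 0.5) / m
idx = list(np.ndindex(m, m))
N = len(idx)
z = np.array([x[i] + x[j] for i, j in idx])
Fm = np.arange(1, m + 1) / m                        # marginal c.d.f. values
lo = np.array([Fm[i] * Fm[j] for i, j in idx])      # independence lower bound
hi = np.array([min(Fm[i], Fm[j]) for i, j in idx])  # comonotonic upper bound
# (for these bounds, the marginal values are already pinned down at the boundary
#  multi-indices, so no further modification is needed in this toy example)

# upper-orthant summation operator: (U y)_i = sum_{j >= i} y_j
U = np.array([[all(jj >= ii for ii, jj in zip(i, j)) for j in idx] for i in idx], float)

# decision vector: [t, y_lower (N entries), y_upper (N entries)]
c = np.concatenate([[1.0], -lo / (1 - alpha), hi / (1 - alpha)])
A1 = np.hstack([-np.ones((N, 1)), U, -U])           # sum(y_lower - y_upper) - t <= -z_i
A2 = np.hstack([np.zeros((N, 1)), U, -U])           # sum(y_lower - y_upper) <= 0
A_ub = np.vstack([A1, A2])
b_ub = np.concatenate([-z, np.zeros(N)])
bounds = [(None, None)] + [(0, None)] * (2 * N)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
# the comonotonic coupling is feasible here, so we would expect roughly
# CVaR_alpha(X1) + CVaR_alpha(X2) = 0.875 + 0.875 = 1.75
print(res.fun)
```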

5. Numerical Results

In this section, we provide numerical illustrations of our findings from Section 3 and Section 4. First, we gauge how the computational requirements scale up with the problem dimensions and identify one critical bottleneck in Section 5.1. To do this, we compare two ways of implementing our approaches in MATLAB. One primarily relies on CVX with the embedded open-source solver SDPT3, chosen for the sake of simplicity. The other approach uses Gurobi Optimization, LLC (2023) and a direct problem formulation, as a potentially more efficient option. CVX removes the inconvenience of carefully formulating the LP (6) to near-standard form suitable for Gurobi, while potentially sacrificing some of the efficiencies. On the other hand, the user-provided direct specification of the underlying LP may be more of a challenge initially, but potentially gives some computational advantage when solving the problem. Next, we propose an approach that allows us to circumvent one of the main computational obstacles, and illustrate the refined methodology on a real-life inspired example in Section 5.2.

5.1. Verbatim Implementation

Our first goal is to obtain a sense of how the performance of our method scales up with problem dimensions, as well as to gauge if the modeling environment and the LP solver play a role. For this, we use a very modest Alienware laptop with a 2 core Intel i7 U640 CPU running at 1.2 GHz, 4 GB RAM, running Windows 7 x64, MATLAB R2013b, CVX 2.0, and Gurobi 5.5.
Regardless of the approach, we rely on solving (5) or its variant, where the dimensions of the problem grow proportionally to m^n—thus, polynomial in m and exponential in n. Specifically, for the standard LP form of the partial value evaluation (6), the number of variables and constraints grow as 3 m^n and 2 m^n + m n + 1, respectively, while the number of non-zeros in the matrix of coefficients describing the affine constraints is roughly 2 m^{2n} · ((m + 1)/(2m))^n.² Consequently, despite the fact that the fraction of non-zero entries in the matrix of affine coefficients corresponding to (6) decreases exponentially in n, the number of non-zeros still grows very rapidly with the number of risks. For instance, in the case of m = 100 and n = 3, one should expect to deal with a matrix containing more than 10^11 non-zero entries (of one), making solving such problems on a regular computer workstation prohibitively expensive. Even with the availability of super-computing resources, one probably has to resort to very specialized algorithms—e.g., Tardos (1986)—and linear algebra techniques to exploit the matrix sparsity structure efficiently for large values of m and n.
In Table 1 we report the average run-times for estimating the sharp upper and lower bounds for problems with varying n and m. For this and the other numerical experiments, for each dimension we generate 30 random problem instances, where the X_i sample values are chosen to be uniform between 0 and 1 for simplicity. CVX refers to only using CVX to formulate the LP sub-problem and pass it to a selected solver, while tensor-like notation is used inside the CVX code. CVX+ refers to us formulating the affine constraints of the LP in vectorized form and letting CVX only pass the data to the solver. Direct refers to us both formulating the problem and invoking the Gurobi solver directly, bypassing CVX. When not specified, α = 0.95 and ε = 10^{-7}.
Our first goal is to understand how the proposed methods scale with dimensions. As expected, the computational cost escalates very rapidly when dimensions m, and especially n, increase. We observe that the run-time heavily depends on the LP solver. For Gurobi, here we used the simplex option, while experimenting with the barrier gave inferior results on this dataset; we suspect that the latter can be attributed to being able to take advantage of a simplex warm-start. Even when no top-of-the-line commercial solver is available, one can compute some bounds with n < 3 in reasonable time for small values of m.
We also note that, in general, using CVX, as opposed to directly formulating the problem and feeding it into a solver, imposes some processing time overhead, especially for smaller problems. While formulating the matrix of affine constraints, we rely on MATLAB loops, which may potentially be sped up. Solving with n = 2 and m = 50 to within an ε = 10^{-7} precision by using the epigraph scheme, MATLAB takes about 100 s to form a single LP matrix of coefficients in the standard form, while solving all the subsequent LP problems takes roughly another 150 s.
For estimating the lower bound on CVaR α , between the two schemes, the epigraph-based method is a clear winner over the naïve approach. The solution times grow with n and m (see Table 1 and Table 2), as well as the desired precision ε (see Table 3b). By comparing the results in Table 2 and Table 4, we conclude that computing the sharp upper bound is generally cheaper, as compared to the lower bound. When computing an exact sharp upper bound, direct LP embedding is preferred.

5.2. Stylized Practice-Inspired Example

Computing CVaR sharp bounds under given marginals and lower-orthant stochastic ordering bounds on joint c.d.f.'s, and in particular the sharp lower bound, entails solving a non-convex (bilinear) optimization problem of potentially very high dimensionality. Namely, we seek to determine the extreme values over m^n variables representing the c.d.f. When attempting to scale up the model sizes n and m, we are faced with an obvious memory requirement issue. For instance, solving for n = 3, m = 100 in (3) entails formulating a model with over 10^11 non-zeros that requires almost 1000 GB of RAM if we operate in standard double-precision arithmetic. The RAM requirement grows as m^{2n} and it is reasonable to expect a significant growth in the computational effort required to solve the model as well.
However, it turns out that one could produce a much sparser equivalent representation of the lower and upper bound optimization models, allowing solving for sharp bounds with n = 3, m = 100 sized models in a reasonable time, i.e., a couple of hours, on reasonable hardware, i.e., a multi-core station with enough RAM. Next, we present this refined setup, along with a more practical illustration of our approach. The example is partly based on work carried out outside of this manuscript, and has been further stylized to avoid breaching any possible non-disclosure agreements. We focus on the lower bound computation as it is more challenging; the upper bound evaluation can be refined in a similar manner.
Assume an insurance company with a portfolio of three risks, located in (1) New York (NY), (2) Miami (FL) and (3) Houston (TX), for which the policy covers economic damages to certain buildings caused by hurricanes in these regions. The underwriter makes decisions based on the hurricane intensity estimates that in turn are predicted based on an atmospheric internal risk model. If X k with k { 1 , 2 , 3 } is the economic damage for the k-th risk in dollars, we know that X k is Pareto ( α k , λ k ) , so that the c.d.f. along with the first two moments are
F(x) = 1 − ( λ/(x + λ) )^α,   x ≥ 0,    E[X] = λ/(α − 1),    Var[X] = α λ² / ( (α − 2)(α − 1)² ),
with
α_1 = 5,  α_2 = 2.1,  α_3 = 2.7,  λ_1 = 7.92 × 10^6,  λ_2 = 1.11 × 10^7,  λ_3 = 7.36 × 10^6,
resulting in expected losses of USD 1.98 million, USD 10.07 million and USD 4.33 million, respectively. Further, for each risk, the coefficient of variation (CV), a well-known measure of risk, is 1.29, 4.58, and 1.96, so indeed the assets are risky, as expected. A large CV is expected for coverage in more risky regions.
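These figures are easy to reproduce from the stated Pareto parameterization; a quick check of ours using the moment formulas above:

```python
import numpy as np

# Pareto(alpha_k, lambda_k) moments as stated above:
# E[X] = lambda/(alpha - 1), Var[X] = alpha*lambda^2/((alpha - 2)*(alpha - 1)^2)
a = np.array([5.0, 2.1, 2.7])
lam = np.array([7.92e6, 1.11e7, 7.36e6])
mean = lam / (a - 1.0)                  # expected losses, in USD
cv = np.sqrt(a / (a - 2.0))             # coefficient of variation sqrt(Var)/E[X]
print(np.round(mean / 1e6, 2))          # expected losses in USD millions
print(np.round(cv, 2))                  # approx [1.29, 4.58, 1.96]
```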
The underwriter has empirical evidence (based on atmospheric observational data) to identify the marginal risk distributions, but does not have the knowledge to create a spatial dependence model across the risks located in the different regions. Geography-dependent ratings would hardly be available even to world-leading rating agencies. Therefore, the underwriter has to rely on the available domain knowledge to come up with aggregate risk estimates CVaR_α(X_1 + X_2 + X_3) based on the best possible information about the risk position.
It is clear that X k ’s are not negatively associated, and thus, a lower bound on the joint distribution, in terms of the lower-orthant (LO) stochastic ordering, can be given by the independence model,
F̲(x_1, x_2, x_3) = F_1(x_1) F_2(x_2) F_3(x_3),
where F_k is the c.d.f. of X_k, k = 1, 2, 3. The upper bound on the joint distribution, in terms of LO, assumes that the NY economic damages are independent of the other two, while the economic damages from Miami and Houston could be strongly positively dependent, i.e., comonotonic, and therefore
F̄(x_1, x_2, x_3) = F_1(x_1) min( F_2(x_2), F_3(x_3) ).
In terms of our CVaR lower bound formulation (3), the above can be encoded by discretizing the individual risks with some fixed m, so that x_{i,1}, …, x_{i,m}, i = 1, 2, 3, correspond to Pareto distribution sample values, or inverse Pareto c.d.f.'s evaluated at the mid-points (j − 1/2)/m, with
ß̲_{i_1,i_2,i_3} = (i_1/m) × (i_2/m) × (i_3/m),    ß̄_{i_1,i_2,i_3} = (i_1/m) × min( i_2/m, i_3/m ).
Our objective here is to evaluate the lower bound, specifically CVaR̲_{0.8}, for n = 3, m = 100.
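A sketch of ours of how the discretization just described can be assembled, using the inverse Pareto c.d.f. at the mid-point grid and the independence/partially comonotonic bounds; the resulting tensors feed directly into formulation (3):

```python
import numpy as np

def pareto_ppf(u, a, lam):
    """Inverse c.d.f. of the Pareto(a, lam) model used in the text,
    F(x) = 1 - (lam / (x + lam))**a for x >= 0."""
    return lam * ((1.0 - u) ** (-1.0 / a) - 1.0)

m = 100
a = [5.0, 2.1, 2.7]
lam = [7.92e6, 1.11e7, 7.36e6]
u = (np.arange(1, m + 1) - 0.5) / m                      # mid-point grid (j - 1/2)/m
x = [pareto_ppf(u, a[k], lam[k]) for k in range(3)]      # marginal sample values

# discretized c.d.f. bounds on the grid i_k/m: independence below,
# NY independent of the comonotonic (Miami, Houston) pair above
g = np.arange(1, m + 1) / m
lo = g[:, None, None] * g[None, :, None] * g[None, None, :]
hi = g[:, None, None] * np.minimum(g[None, :, None], g[None, None, :])
print(lo.shape, bool(np.all(lo <= hi)))                  # (100, 100, 100) True
```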
Using a sparse reformulation of (3), which we discuss next, this objective can indeed be achieved in a reasonable computation time, here in about 2 h, or 7554 s to be precise, yielding CVaR̲_{0.8} = 40.6 mn, corresponding to t* = 20,914,036, with the bound computed to within a relative precision of 2.5 × 10^{-7}. For this set of computational experiments we move to a more powerful machine, with an AMD EPYC 7313P 16-core processor and 256 GB RAM, running Ubuntu 22.04. To solve the subsequent LPs, we use Gurobi 10.0.1, where the model was implemented using Gurobi’s Python API, and benchmarked using Python 3.7. We want to emphasize that the chief enabling factor is the sparse reformulation that reduces the number of non-zeros in the model roughly by a square root, e.g., going from 10^11 to about 10^6 for n = 3, m = 100, allowing the formulation of the model in RAM as well as permitting vastly faster computations, which is further improved by moving to a powerful computer server. The code can be found on GitHub, as per Zinchenko (2023).
A number of further computational experiments were performed with varying sparsified model dimensions for both n and m and the run-times were recorded. A model with n = 3 ,   m = 50 could now be solved in about 550 s, while n = 3 ,   m = 150 is the current computational limit for the above machine. The solve time scales super-linearly with the problem dimensions. For lower dimensional models with n = 2 , as before, the run-times look more favourable; for instance n = 2 ,   m = 1000 could be solved in 808 s.
The sparse reformulation of (3) is built on the pivotal observation that joint c.d.f.'s can be defined recursively, using inclusion–exclusion formulas. Namely, if we introduce m^n auxiliary variables for the c.d.f. to represent
ß_i := Σ_{j≤i} p_j,
we can express the c.d.f. bound constraints as upper and lower bounds on ß , and more critically, define the c.d.f. quantities recursively. Namely, for n = 2 we have
ß_i − p_i = ß_{i_1−1} + ß_{i_2−1} − ß_{i_1−1, i_2−1},
and for n = 3 ,
ß_i − p_i = ß_{i_1−1} + ß_{i_2−1} + ß_{i_3−1} − ß_{i_1−1, i_2−1} − ß_{i_1−1, i_3−1} − ß_{i_2−1, i_3−1} + ß_{i_1−1, i_2−1, i_3−1},
where i = (i_1, i_2, i_3) and ß_{i_1−1} is shorthand notation for ß_{(i_1−1, i_2, i_3)}, and if some sub-index drops below 1, we replace the corresponding c.d.f. entry with 0. This necessitates only 5 and 9 non-zeros per constraint, respectively, as opposed to an average of m^n/2^n in the original model formulation carried out verbatim. To further promote sparsity, the marginals can be reformulated in terms of the c.d.f. auxiliary variables; for instance, for the first risk we can write
ß_{(j, m, m)} = j/m,   j = 1, …, m.
Thus, even though we gain another m^n variables in our formulation, the revised non-zero count grows as O(2^n m^n), as compared to the original O(m^{2n}/2^n). The construction can easily be extended and implemented for any n.
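The inclusion–exclusion recursion is easy to verify numerically; the following small sketch (ours) checks the n = 2 identity on an arbitrary bivariate p.m.f. and highlights why each constraint row stays sparse:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 5
p = rng.random((m, m)); p /= p.sum()                 # an arbitrary bivariate p.m.f.
cdf = np.cumsum(np.cumsum(p, axis=0), axis=1)        # c.d.f. entries sum_{j <= i} p_j

def F(i1, i2):
    """c.d.f. value with the convention that an out-of-range sub-index gives 0."""
    return 0.0 if (i1 < 0 or i2 < 0) else cdf[i1, i2]

# verify cdf[i] - p[i] = cdf[i1-1] + cdf[i2-1] - cdf[i1-1, i2-1] at every grid point
ok = all(
    np.isclose(cdf[i1, i2] - p[i1, i2],
               F(i1 - 1, i2) + F(i1, i2 - 1) - F(i1 - 1, i2 - 1))
    for i1 in range(m) for i2 in range(m)
)
print(ok)   # True: each constraint touches only 5 variables (4 c.d.f. entries and p_i)
```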
While for n = 3, moving beyond m = 100 becomes prohibitively expensive, an argument can be made that, from a practical point of view, this is perhaps also not so critical. It is hard to imagine a situation where the empirical marginals are known so precisely that it would necessitate spelling out marginal c.d.f. constraints at finer than 0.01 (= 1/m) granularity. It may also be more important to distill the extreme dependence trends for the unknown multivariate c.d.f. than to try to zero in on the very last digits of the CVaR bounds, and as such, our approach could provide a viable exploratory tool.

6. A Special Case and Its Higher-Dimensional Variants

In this section we investigate the question of whether CVaR ordering may be consistent with the ordering of the underlying distributions for higher-dimensional portfolios, i.e., n ≥ 3. We say that two n-dimensional random vectors X^(1), X^(2) have identical marginals if Pr(X_i^(1) ≤ x) = Pr(X_i^(2) ≤ x) for all i = 1, …, n and x ∈ ℜ. We first provide an alternative proof of a well-known result that CVaR respects the so-called lower-orthant stochastic ordering for n = 2 (Proposition 6.2.9 of Denuit et al. (2005)).
Theorem 4.
Let n = 2 and let X^(1) and X^(2) be two compactly supported random vectors with identical marginals and corresponding aggregate risks Z^(1) and Z^(2). If X^(1) ⪯_lo X^(2), then for any α ∈ (0, 1) we have that CVaR_α(Z^(1)) ≤ CVaR_α(Z^(2)).
Although the claim may be extended to a wide class of other risk measures, the previously known proofs of the above theorem rely on fairly exotic techniques from convex analysis. The theorem itself becomes interesting in view of the potential computational savings it may provide when comparing the aggregate risks of bivariate distributions with identical marginals satisfying the lower-orthant stochastic ordering.
A natural question is whether such an ordering is preserved in higher dimensions, n ≥ 3. We show that no such extension exists. In fact, one may argue that even the above result for n = 2 is unnatural and goes against the intuition of what should happen. To substantiate the latter point of view, we
  • give an alternative and self-contained proof of the classical result from Theorem 4,
  • state several potential extensions of such a result to higher dimensions, and
  • provide counter-examples to show that no such extensions are true for n ≥ 3.

6.1. An Alternative Proof for n = 2

We start by recalling the inclusion–exclusion type criterion (see, for example, Billingsley (1995)) that characterizes a c.d.f. The criterion ensures that the probability mass accumulated within any hypercube is non-negative, and is commonly referred to as the rectangle inequality.
For a fixed n, consider a right-continuous, non-decreasing F : ℜ^n → ℜ, such that lim_{x_i → −∞} F(x) = 0 for all i = 1, …, n, and lim_{x_1, …, x_n → ∞} F(x) = 1. Then F is a c.d.f. if and only if
Σ_{j_1=1}^{2} ⋯ Σ_{j_n=1}^{2} (−1)^{j_1 + ⋯ + j_n} F(ζ_{1,j_1}, ζ_{2,j_2}, …, ζ_{n,j_n}) ≥ 0,
for all ζ_{i,1} < ζ_{i,2}, i = 1, …, n.
In particular, the rectangle inequality guarantees the existence of a probability mass function (p.m.f.) given a candidate non-decreasing step-like function on ℜ^n. From now on, we consider an n-dimensional discrete random vector with values (x_{1,i_1}, …, x_{n,i_n}), 1 ≤ i_1, …, i_n ≤ m, placed on an m × m × ⋯ × m rectangular grid, and the corresponding p.m.f. p_i, i ∈ I. Thus, for n = 2 the above inequalities become
0 ≤ F(ζ_{1,2}, ζ_{2,2}) − F(ζ_{1,1}, ζ_{2,2}) − F(ζ_{1,2}, ζ_{2,1}) + F(ζ_{1,1}, ζ_{2,1}),  with the four terms abbreviated below as ϕ_{2,2}, ϕ_{1,2}, ϕ_{2,1} and ϕ_{1,1}, respectively,
and for n = 3 we have
0 ≤ F(ζ_{1,2}, ζ_{2,2}, ζ_{3,2}) − F(ζ_{1,1}, ζ_{2,2}, ζ_{3,2}) − F(ζ_{1,2}, ζ_{2,1}, ζ_{3,2}) − F(ζ_{1,2}, ζ_{2,2}, ζ_{3,1}) + F(ζ_{1,1}, ζ_{2,1}, ζ_{3,2}) + F(ζ_{1,1}, ζ_{2,2}, ζ_{3,1}) + F(ζ_{1,2}, ζ_{2,1}, ζ_{3,1}) − F(ζ_{1,1}, ζ_{2,1}, ζ_{3,1}),  with the terms abbreviated below as ϕ_{2,2,2}, ϕ_{1,2,2}, ϕ_{2,1,2}, ϕ_{2,2,1}, ϕ_{1,1,2}, ϕ_{1,2,1}, ϕ_{2,1,1} and ϕ_{1,1,1}, respectively,
where the ζ_{i,j_i} values correspond to the atoms on the grid, that is, ζ_{i,1} = x_{i,k_i} and ζ_{i,2} = x_{i,k_i′}, with k_i < k_i′, i = 1, 2, 3. The latter expressions can be abridged to
0 ≤ ϕ_{2,2} − ϕ_{1,2} − ϕ_{2,1} + ϕ_{1,1},
and
0 ≤ ϕ_{2,2,2} − ϕ_{1,2,2} − ϕ_{2,1,2} − ϕ_{2,2,1} + ϕ_{1,1,2} + ϕ_{1,2,1} + ϕ_{2,1,1} − ϕ_{1,1,1},
introducing ϕ_{j_1, j_2, …} = F(ζ_{1,j_1}, ζ_{2,j_2}, …). The summation sign pattern for the values of F, or equivalently ϕ_{j_1, j_2, …}, may best be illustrated graphically, as seen in Figure 4.
The following elementary, yet critical, observation can be made and is given as Proposition 3.
Proposition 3.
Along with a p.m.f. p ∈ ℜ^{m^n}, consider the c.d.f. ß ∈ ℜ^{m^n} and the survival function s ∈ ℜ^{m^n}, defined by the corresponding linear transformations
Π : p ↦ ß, defined as ß_i = Σ_{j≤i} p_j,   and   S : p ↦ s, defined as s_i = Σ_{j≥i} p_j.
Then, for any z ∈ ℜ^{m^n} we have
pᵀ z = Σ_{i∈I} p_i z_i = (Π^{−1} z)ᵀ (S p) = (S^{−1} z)ᵀ (Π p).
The proof of the above proposition, although somewhat tedious, simply relies on accounting for the indices in the summation Σ_{i∈I} p_i z_i. We also note that for n = 2, the c.d.f. ordering of two distributions with identical marginals is equivalent to the ordering of the survival functions.
Lemma 1.
Let the two bivariate discrete random variables X^(1) and X^(2) have identical marginals and corresponding c.d.f.'s ß^(1), ß^(2). Then, ß_i^(1) ≤ ß_i^(2) for all i, if and only if s_i^(1) ≤ s_i^(2) for all i.
The proof is a straightforward implication of the inclusion–exclusion type fact that
Pr(X_1 > x_1, X_2 > x_2) = 1 − Pr(X_1 ≤ x_1) − Pr(X_2 ≤ x_2) + Pr(X_1 ≤ x_1, X_2 ≤ x_2).
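Both Proposition 3 and the ordering equivalence above admit a quick numerical sanity check; the sketch below (ours) verifies the identity of Proposition 3 for n = 2 using explicit Π and S matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 2
idx = list(np.ndindex(*([m] * n)))
N = len(idx)
# Pi: p -> c.d.f. (lower-orthant sums);  S: p -> survival function (upper-orthant sums)
Pi = np.array([[all(jj <= ii for ii, jj in zip(i, j)) for j in idx] for i in idx], float)
S = np.array([[all(jj >= ii for ii, jj in zip(i, j)) for j in idx] for i in idx], float)

p = rng.random(N); p /= p.sum()         # an arbitrary bivariate p.m.f.
z = rng.random(N)                       # arbitrary aggregate-risk values

lhs = p @ z
print(bool(np.isclose(lhs, np.linalg.solve(Pi, z) @ (S @ p))))   # True
print(bool(np.isclose(lhs, np.linalg.solve(S, z) @ (Pi @ p))))   # True
```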
Finally, we are now able to prove Theorem 4.
Proof of Theorem 4.
The sub-problem (6) can be re-parameterized using the survival function s_i,
v̲(t) = inf_s  ( Π^{−1}(z − t)_+ )ᵀ s   ( ≡ inf_p (z − t)_+ᵀ S^{−1} S p )
    s.t.  S̲_i ≤ s_i ≤ S̄_i,   for all i ∈ I,
          s_i = (m − k + 1)/m,   for all i = (1, k) or (k, 1),  k = 1, …, m,
          s_{(1,1)} = 1,
          S^{−1} s ≥ 0,
where the survival function bounds S̲_i, S̄_i may easily be computed by applying an inclusion–exclusion type formula similar to that in Lemma 1.
We now make the critical observation that Π^{−1}(z − t)_+ ≥ 0 for all t. From the definition of the aggregate risk values z, we observe that Π^{−1}(z − t)_+ ≥ 0 if and only if the rectangle inequality (9) holds for all
ϕ_{1,1} = (x_{1,i_1} + x_{2,i_2} − t)_+ ≡ (z_{(i_1,i_2)} − t)_+,   ϕ_{1,2} = (x_{1,i_1} + x_{2,i_2′} − t)_+ ≡ (z_{(i_1,i_2′)} − t)_+,   ϕ_{2,1} = (x_{1,i_1′} + x_{2,i_2} − t)_+ ≡ (z_{(i_1′,i_2)} − t)_+,   ϕ_{2,2} = (x_{1,i_1′} + x_{2,i_2′} − t)_+ ≡ (z_{(i_1′,i_2′)} − t)_+,
with 1 ≤ i_1 < i_1′ ≤ m, 1 ≤ i_2 < i_2′ ≤ m and any t. Since x_{1,i_1} ≤ x_{1,i_1′} and x_{2,i_2} ≤ x_{2,i_2′}, we also have a partial ordering of the ϕ values, namely
ϕ_{1,1} ≤ ϕ_{1,2} ≤ ϕ_{2,2}   and   ϕ_{1,1} ≤ ϕ_{2,1} ≤ ϕ_{2,2}.
The range of all t values may clearly be partitioned into T_− = (−∞, z_{(i_1,i_2)}], T_= = (z_{(i_1,i_2)}, z_{(i_1′,i_2′)}) and T_+ = [z_{(i_1′,i_2′)}, ∞). Therefore, there are three possible cases.
(a)
t ∈ T_−: the rectangle inequality (9) clearly holds, as
0 = (x_{1,i_1′} + x_{2,i_2′} − t) − (x_{1,i_1} + x_{2,i_2′} − t) − (x_{1,i_1′} + x_{2,i_2} − t) + (x_{1,i_1} + x_{2,i_2} − t).
(b)
t ∈ T_=: the validity of the rectangle inequality may easily be established by assuming, without loss of generality, that z_{(i_1′,i_2)} ≤ z_{(i_1,i_2′)} and considering further sub-cases depending on where the value of t falls with respect to the z subintervals. For example, if t ∈ (z_{(i_1,i_2)}, z_{(i_1′,i_2)}], then ϕ_{1,1} = 0 > z_{(i_1,i_2)} − t, and thus the rectangle inequality results in positive mass. That is,
0 < (x_{1,i_1′} + x_{2,i_2′} − t) − (x_{1,i_1} + x_{2,i_2′} − t) − (x_{1,i_1′} + x_{2,i_2} − t) + (x_{1,i_1} + x_{2,i_2} − t)_+ .
(c)
t ∈ T_+: clearly the inequality holds, as all ϕ values are 0.
Therefore, Π^{−1}(z − t)_+ ≥ 0 indeed holds.
To complete the proof, consider problem (11), where S̲ and S̄ correspond to X^(1) and X^(2), respectively. Due to the non-negativity of the objective coefficients Π^{−1}(z − t)_+ in (11), clearly CVaR̲_α corresponds to X^(1). Similarly, considering a variant of (11) to evaluate the upper bound CVaR̄_α, we conclude that CVaR̄_α corresponds to X^(2). The fact that CVaR̲_α ≤ CVaR̄_α completes the proof.  □

6.2. A Few Possible Generalizations and Some Counter-Examples

We first provide the definitions of some stochastic orderings; for two multivariate risks X^(1), X^(2) ∈ ℜ^n we define
  • upper-orthant ordering  X^(1) ⪯_uo X^(2) if Pr(X^(1) > x) ≤ Pr(X^(2) > x) for all x ∈ ℜ^n;
  • lower-orthant ordering  X^(1) ⪯_lo X^(2) if Pr(X^(1) ≤ x) ≤ Pr(X^(2) ≤ x) for all x ∈ ℜ^n;
  • concordance ordering  X^(1) ⪯_co X^(2) if X^(1) ⪯_uo X^(2) and X^(1) ⪯_lo X^(2);
  • persistent ordering  X^(1) ⪯_po X^(2) if X^(1) ⪯_uo X^(2) and X^(2) ⪯_lo X^(1).
Note that, due to Lemma 1, the persistent ordering for n = 2 results in identical distributions, and is therefore not interesting to investigate in the bivariate case. Recall from the proof of Theorem 4 that, for n = 2, the persistence of the CVaR ordering relies on the implied upper-orthant stochastic ordering of the respective risks. Consequently, in search of an extension of such a result to n = 3, the following question appears to be a natural place to start: Is it true that for trivariate distributions X^(1) ⪯_uo X^(2), we have CVaR_α(Z^(1)) ≤ CVaR_α(Z^(2)) for all α ∈ (0, 1), with Z^(1), Z^(2) being the corresponding aggregate risks?
From now on, we fix n = 3 and the marginals of X^(1), X^(2) to be uniform. The key to constructing a counter-example to the above is the failure of the rectangle inequality (10) over the (z − t)_+ values. In turn, this results in a re-parameterized three-dimensional analogue of (11) that has both positive and negative objective coefficients Π^{−1}(z − t)_+ for some suitably chosen t. Specifically, consider the only relevant CVaR estimation values of t that correspond to z_i, i ∈ I, in accordance with Theorem 2. We claim that for a carefully chosen z, we can pick two values t_+ and t_± such that Π^{−1}(z − t_+)_+ ≥ 0, while Π^{−1}(z − t_±)_+ contains both positive and negative entries. As a consequence of Π^{−1}(z − t_±)_+ being sign-indeterminate, when estimating CVaR̲_α, CVaR̄_α with bounds S̲, S̄ corresponding to the respective distributions X^(1), X^(2), it is natural to expect that we may end up having CVaR̲_α < CVaR̄_α for some values of α, as well as CVaR̲_α > CVaR̄_α for other values of α. From here, it may simply suffice to pick “correct” scaling constants α in the extremal characterisations of CVaR.
To make this precise, consider the set of risk values for m = 2 and m = 3 in Table 5, with the aggregate risk values depicted on the hypercube lattice in Figure 5. Take t_+ = 0, t_± = 1. Clearly, rectangle inequality (10) holds at t_+ and results in negative mass at t_±, due to the fact that (ϕ_{1,1,1} − t)_+ > ϕ_{1,1,1} − t.
Now, we can use the z values to produce a desired counter-example for the upper-orthant ordering. To do so, we can form an LP problem to maximize the difference between two partial value-type estimates Σ_{i∈I} (z_i − t)_+ p_i^(1) and Σ_{i∈I} (z_i − t)_+ p_i^(2). We subject both p.m.f.s p^(1) and p^(2) to have identical marginals, and the resulting survival functions s^(1) = S p^(1) and s^(2) = S p^(2) to satisfy the upper-orthant ordering, i.e., s^(1) ≤ s^(2). The last problem can be solved for all t = z_i, i ∈ I, to extract an example.
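A sketch (ours) of this LP search for the upper-orthant case is given below; the risk values are chosen to be consistent with the aggregate sums quoted further down (Table 5 itself is not reproduced in this text), and SciPy's linprog stands in for the authors' solver:

```python
import numpy as np
from scipy.optimize import linprog

# risk values consistent with the aggregate sums 0, 1, 10, 11, 100, 101, 110, 111
# discussed below (m = 2, n = 3)
xs = [np.array([0.0, 1.0]), np.array([0.0, 10.0]), np.array([0.0, 100.0])]
idx = list(np.ndindex(2, 2, 2))
N = len(idx)
z = np.array([sum(xs[k][i[k]] for k in range(3)) for i in idx])

# S: p -> survival function (upper-orthant sums)
S = np.array([[all(jj >= ii for ii, jj in zip(i, j)) for j in idx] for i in idx], float)

# identical uniform marginals for both p.m.f.s
A_marg, b_marg = [], []
for k in range(3):
    for v in range(2):
        A_marg.append([1.0 if i[k] == v else 0.0 for i in idx])
        b_marg.append(0.5)
A_marg = np.array(A_marg)

best = None
for t in np.unique(z):
    w = np.maximum(z - t, 0.0)
    # variables: [p1 (N), p2 (N)]; maximize w.(p1 - p2), i.e. minimize -w.(p1 - p2)
    c = np.concatenate([-w, w])
    A_eq = np.block([[A_marg, np.zeros_like(A_marg)],
                     [np.zeros_like(A_marg), A_marg]])
    b_eq = np.concatenate([b_marg, b_marg])
    A_ub = np.hstack([S, -S])                  # upper-orthant ordering: S p1 <= S p2
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(N), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (2 * N), method="highs")
    gap = -res.fun
    if best is None or gap > best[0]:
        best = (gap, float(t))
# a strictly positive gap points to a candidate counter-example, which is then
# verified by direct CVaR calculations as in the text
print(best)
```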
A similar exercise can be carried out for the other stochastic orderings. Thus, for the sake of brevity, we give only a summary of our findings in Table 6 and Table 7. In order to verify the results, it suffices to perform a direct calculation. For instance, with respect to the upper-orthant ordering, observe that Z^(1) takes the values 0, 11, 101, and 110, and Z^(2) takes the values 1, 10, 100, and 111, all with equal probabilities of one-quarter. Consequently, CVaR_{0.1}(Z^(1)) = 61 2/3 and CVaR_{0.1}(Z^(2)) = 61 5/9. Further, one can verify that CVaR_α(Z^(1)) > (<) CVaR_α(Z^(2)) holds for any α < (>) 0.5.
Interestingly, the counter-examples to the lower/upper-orthant and persistent orderings require a distribution supported on the vertices of a single hypercube, that is, m = 2. On the other hand, the concordance ordering appears to require more degrees of freedom, e.g., m = 3. Note that in the latter case, due to the risks being potentially supported on m^n = 27 vertices, we present the example in a “sparse” format, as seen in Table 8.

7. Conclusions

The problem of finding the entire spectrum of values for CVaR of a sum of dependent random variables under dependence uncertainty could be approached in various ways. Under restrictive assumptions, analytical approaches are implementable, but the bounds are often loose, and occasionally, not sharp. Even if the sharpness issue is not present, the lower and upper bounds are typically attained under dependence models that are difficult to justify as feasible in practice, especially for portfolios consisting of many risks, since such extreme dependence models are not realistic.
Our contribution is two-fold. Firstly, we provide a first-in-its-class numerical method for constrained CVaR estimation, when the marginal distributions are known but only bounds are available for the joint distribution. This setting reflects the situation encountered with most observational data, where the dependence structure is rarely estimable with confidence even when multivariate observations are available. As a result, the sharp lower and upper bounds of the CVaR-based aggregate risk can be found. We analyse the complexity of the proposed methods for calculating these bounds and substantiate our findings via numerical illustrations. Our approach trivially generalizes to non-uniform marginals. Although the computational cost increases very rapidly with the number of risks, we believe that the method may still be used as a viable exploratory tool when dealing with a relatively large risk portfolio. Finally, we show how the run-times may be improved significantly by exploiting the very special structure of the underlying linear optimization problems at the formulation stage.
Secondly, it is known that CVaR respects the so-called lower-orthant stochastic ordering for two-dimensional portfolios, yet no similar result had been established or disproved for higher dimensions. As a by-product of our analysis, using elementary LP techniques, we show that no such extensions are possible. Specifically, we construct trivariate counter-examples that demonstrate the lack of aggregate risk monotonicity under the upper-orthant, lower-orthant, concordant, and persistent stochastic orderings. We also give a self-contained alternative proof for the bivariate case, and point out the exact reason why higher-dimensional extensions are not possible.

Author Contributions

The authors contributed equally to this article. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partly funded by Natural Sciences and Engineering Research Council Discovery grant RGPIN/07199-2019.

Data Availability Statement

The data are synthetic; the methodology for generating them is described in the manuscript.

Acknowledgments

We would like to thank Stephen Boyd and Lieven Vandenberghe, the authors of the free online course CVX 101 made available to the public through Stanford University, for an invaluable resource that helped sharpen our focus on convexity. We also thank Michael Grant for making the CVX modeling environment freely available to the general public, the authors of SDPT3 (Kim-Chuan Toh, Michael Todd, and Reha Tutuncu) for making their solver publicly available, and Gurobi for providing a free academic license for their top-of-the-line LP-MIP solver. The second author would also like to express gratitude to NSERC and PIMS for supporting this piece of research.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

1. For a multivariate vector $(X_1, \ldots, X_n)$, comonotonicity is formally defined as follows: there exist a random variable $Z$ and non-decreasing functions $f_k$, $1 \le k \le n$, such that $\Pr(X_k = f_k(Z)) = 1$ for all $1 \le k \le n$.
2. Intuitively, on average, for a uniform random integer between 1 and $m$, exactly $\tfrac{m+1}{2m} \cdot 100\%$ of the integers in $\{1, \ldots, m\}$ are less than or equal to the chosen number. Observing that the c.d.f. constraints have non-zeros at exactly such "lesser" sub-indices along each dimension, a proof can be established by induction.

References

  1. Acerbi, Carlo, and Dirk Tasche. 2002. On the Coherence of Expected Shortfall. Journal of Banking and Finance 26: 1487–503. [Google Scholar] [CrossRef]
  2. Asimit, Alexandru V., Alexandru M. Badescu, Steven Haberman, and Eun-Seok Kim. 2016. Efficient risk allocation within a non-life insurance group under Solvency II Regime. Insurance: Mathematics and Economics 66: 69–76. [Google Scholar] [CrossRef]
  3. Asimit, Alexandru V., Junlei Hu, and Yuantao Xie. 2019. Optimal Robust Insurance with a Finite Uncertainty Set. Insurance: Mathematics and Economics 87: 67–81. [Google Scholar] [CrossRef]
  4. Asimit, Alexandru V., Valeria Bignozzi, Ka Chun Cheung, Junlei Hu, and Eun-Seok Kim. 2017. Robust and Pareto Optimality of Insurance Contract. European Journal of Operational Research 262: 720–32. [Google Scholar] [CrossRef]
  5. Balbás, Alejandro, Beatriz Balbás, and Antonio Heras. 2011. Stable Solutions for Optimal Reinsurance Problems involving Risk Measures. European Journal of Operational Research 214: 796–804. [Google Scholar] [CrossRef]
  6. BCBS. 2016. Standards. In Minimum Capital Requirements for Market Risk. Basel Committee on Banking Supervision. Basel: Bank for International Settlements, January. [Google Scholar]
  7. Ben-Tal, Aharon, Laurent El Ghaoui, and Arkadi Nemirovski. 2009. Robust Optimization. Princeton: Princeton University Press. [Google Scholar]
  8. Bertsimas, Dimitris, and Ioana Popescu. 2002. On the Relation between Option and Stock Prices: A Convex Optimization Approach. Operations Research 50: 358–74. [Google Scholar] [CrossRef]
  9. Bertsimas, Dimitris, Geoffrey J. Lauprete, and Alexander Samarov. 2004. Shortfall as a Risk Measure: Properties, Optimization and Applications. Journal of Economic Dynamics and Control 28: 1353–81. [Google Scholar] [CrossRef]
  10. Billingsley, Patrick. 1995. Probability and Measure, 3rd ed. New York: John Wiley and Sons. [Google Scholar]
  11. Blanchet, Jose, Henry Lam, Qihe Tang, and Zhongyi Yuan. 2017. Applied Robust Performance Analysis for Actuarial Applications. Technical Report, Society of Actuaries. Available online: https://web.stanford.edu/~jblanche/papers/Robust_Actuarial.pdf (accessed on 5 May 2023).
  12. Chen, Wenqing, Melvyn Sim, Jie Sun, and Chung-Piaw Teo. 2009. From CVaR to Uncertainty Set: Implications in Joint Chance-Constrained Optimization. Operations Research 58: 470–85. [Google Scholar] [CrossRef]
  13. Denuit, Michel, Jan Dhaene, Marc Goovaerts, and Rob Kaas. 2005. Actuarial Theory for Dependent Risks: Measures, Orders and Models. Chichester: Wiley. [Google Scholar]
  14. El Ghaoui, Laurent, Maksim Oks, and Francois Oustry. 2003. Worst-case Value-at-risk and Robust Portfolio Optimization: A conic Programming Approach. Operations Research 51: 543–56. [Google Scholar] [CrossRef]
  15. Fabozzi, Frank J., Dashan Huang, and Guofu Zhou. 2010. Robust Portfolios: Contributions from Operations Research and Finance. Annals of Operations Research 176: 191–220. [Google Scholar] [CrossRef]
  16. Gijbels, Irène, and Dominik Sznajder. 2013. Positive Quadrant Dependence Testing and Constrained Copula Estimation. Canadian Journal of Statistics 41: 36–64. [Google Scholar] [CrossRef]
  17. Gómez, Fabio, Qihe Tang, and Zhiwei Tong. 2022. The Gradient Allocation Principle based on the Higher Moment Risk Measure. Journal of Banking & Finance 143: 106544. [Google Scholar]
  18. Gurobi Optimization, LLC. 2023. Gurobi Optimizer Reference Manual. Available online: https://www.gurobi.com (accessed on 5 May 2023).
  19. Huang, Dashan, Shushang Zhu, Frank J. Fabozzi, and Masao Fukushima. 2010. Portfolio Selection under Distributional Uncertainty: A Relative Robust CVaR Approach. European Journal of Operational Research 203: 185–94. [Google Scholar] [CrossRef]
  20. Rockafellar, R. Tyrrell, and Stanislav Uryasev. 2000. Optimization of Conditional Value-at-Risk. Journal of Risk 2: 21–41. [Google Scholar] [CrossRef]
  21. Swiss Solvency Test. 2006. FINMA SST Technisches Dokument. Available online: https://www.finma.ch/FinmaArchiv/bpv/download/e/SST_techDok_061002_E_wo_Li_20070118.pdf (accessed on 5 May 2023).
  22. Tardos, Éva. 1986. A Strongly Polynomial Algorithm to Solve Combinatorial Linear Programs. Operations Research 34: 250–56. [Google Scholar] [CrossRef]
  23. Zhu, Shushang, and Masao Fukushima. 2009. Worst-Case Conditional Value-at-Risk with Application to Robust Portfolio Management. Operations Research 57: 1155–68. [Google Scholar] [CrossRef]
  24. Zinchenko, Yuriy. 2023. CVaR Engine for a Sharp Lower Bound. GitHub Repository. Available online: https://github.com/yzinchenko/CVaR (accessed on 5 May 2023).
  25. Zymler, Steve, Daniel Kuhn, and Berç Rustem. 2013. Worst-case Value-at-risk of Nonlinear Portfolios. Management Science 59: 172–88. [Google Scholar] [CrossRef]
Figure 1. Perceived behavior of $\underline{\mathrm{val}}(t)$: (a) strict piecewise concavity, (b) convexity.
Figure 2. Hypothetical concavity of v _ ( t ) : (a) z ( ) = t = , (b) z ( ) t = .
Figure 3. Epigraph scheme.
Figure 4. Rectangle inequality summation sign pattern.
Figure 5. Trivariate aggregate risk values with m = 2.
Table 1. Average run-time (in seconds) for naïve and epigraph schemes for small size problems with ε = 10^-10.
n | m | SDPT3 Naïve CVX | SDPT3 Naïve CVX+ | SDPT3 Epigraph CVX | SDPT3 Epigraph CVX+ | Gurobi Naïve CVX | Gurobi Naïve CVX+ | Gurobi Naïve Direct | Gurobi Epigraph CVX | Gurobi Epigraph CVX+ | Gurobi Epigraph Direct
2 | 2 | 2.59 | 2.53 | 6.71 | 6.77 | 1.41 | 1.38 | 0.10 | 6.80 | 6.66 | 0.27
2 | 4 | 9.26 | 8.38 | 17.04 | 16.16 | 4.06 | 3.80 | 0.19 | 8.01 | 7.52 | 0.33
2 | 6 | 21.85 | 18.49 | 18.79 | 16.85 | 8.76 | 8.03 | 0.36 | 8.28 | 7.40 | 0.38
2 | 8 | 42.94 | 35.09 | 20.75 | 17.76 | 16.13 | 13.97 | 0.84 | 8.51 | 7.41 | 0.50
2 | 10 | 77.72 | 61.51 | 22.35 | 18.25 | 26.60 | 22.32 | 1.76 | 9.20 | 7.70 | 0.73
2 | 12 | 149.49 | 117.61 | 29.06 | 25.27 | 41.82 | 33.68 | 3.57 | 9.95 | 8.17 | 1.11
2 | 14 | 264.68 | 200.38 | 35.92 | 31.02 | 63.71 | 49.53 | 7.29 | 10.59 | 8.40 | 1.66
3 | 3 | 16.05 | 13.67 | 19.68 | 17.10 | 6.51 | 6.11 | 0.29 | 8.04 | 7.59 | 0.37
3 | 4 | 42.80 | 35.11 | 22.57 | 18.86 | 15.36 | 14.05 | 0.83 | 8.46 | 7.70 | 0.53
3 | 5 | 109.92 | 87.24 | 27.32 | 22.98 | 32.05 | 28.60 | 2.57 | 9.18 | 8.23 | 0.95
3 | 6 | 299.76 | 224.36 | 39.94 | 33.52 | 61.86 | 53.93 | 8.40 | 10.06 | 8.90 | 1.91
Table 2. Average run-time (in seconds) for the best lower bound estimation scheme (epigraph-based with direct Gurobi) for small to medium size problems with ε = 10^-7, with a run-time limit of 15 min.
n \ m | 3 | 6 | 9 | 12 | 15 | 20 | 30 | 40 | 50 | 60
2 | 0.24 | 0.30 | 0.48 | 0.91 | 1.71 | 4.76 | 28.08 | 93.44 | 240.52 | 543.52
3 | 0.29 | 1.62 | 16.87 | 114.93 | 600.53 | - | - | - | - | -
Table 3. Average run-time (in seconds) for the Gurobi-based epigraph scheme with respect to (a) α, with ε = 10^-5, and (b) ε.
(a) With respect to α (ε = 10^-5):
n | m | α = 0.1 | α = 0.9 | α = 0.99
2 | 20 | 3.47 | 3.93 | 4.21
2 | 30 | 20.36 | 23.36 | 25.61
2 | 40 | 68.34 | 74.08 | 85.97
3 | 6 | 1.16 | 1.37 | 1.46
3 | 9 | 11.83 | 13.90 | 14.47
3 | 12 | 77.88 | 95.16 | 100.65
(b) With respect to ε:
n | m | ε = 10^-3 | ε = 10^-5 | ε = 10^-7
2 | 20 | 3.47 | 3.96 | 4.56
2 | 30 | 19.24 | 22.29 | 27.53
2 | 40 | 64.46 | 79.07 | 92.04
3 | 6 | 1.16 | 1.35 | 1.62
3 | 9 | 12.22 | 14.14 | 16.84
3 | 12 | 81.13 | 96.21 | 122.45
Table 4. Average run-time (in seconds) for the upper bound using Gurobi-based epigraph and direct LP embedding methods, with ε = 10^-10 and a run-time limit of 15 min.
Method | n | m = 3 | 6 | 9 | 12 | 20 | 30 | 40 | 50 | 60
Epigraph | 2 | 0.32 | 0.35 | 0.60 | 1.14 | 5.88 | 35.60 | 139.0 | 354.4 | 688.6
Direct LP | 2 | 0.05 | 0.07 | 0.13 | 0.33 | 2.64 | 12.05 | 39.2 | 98.4 | 218.1
Epigraph | 3 | 1.94 | 21.71 | 72.5 | 802.2 | - | - | - | - | -
Direct LP | 3 | 0.67 | 7.41 | 44.6 | 190.9 | - | - | - | - | -
Table 5. Trivariate risk sample values.
Risk | x_{i,1} | x_{i,2} | x_{i,3}
i = 1 | 0 | 100 | 200
i = 2 | 0 | 10 | 20
i = 3 | 0 | 1 | 2
Table 6. Cases in which $\mathrm{CVaR}_{\alpha}(Z^{(1)}) > \mathrm{CVaR}_{\alpha}(Z^{(2)})$.
Ordering | α | i | p^(i) at (1,1,1) | (2,1,1) | (1,2,1) | (2,2,1) | (1,1,2) | (2,1,2) | (1,2,2) | (2,2,2)
uo | 0.1 | 1 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
uo | 0.1 | 2 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
lo | 0.9 | 1 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
lo | 0.9 | 2 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
po | 0.1 | 1 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
po | 0.1 | 2 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
Table 7. Cases in which $\mathrm{CVaR}_{\alpha}(Z^{(1)}) < \mathrm{CVaR}_{\alpha}(Z^{(2)})$.
Ordering | α | i | p^(i) at (1,1,1) | (2,1,1) | (1,2,1) | (2,2,1) | (1,1,2) | (2,1,2) | (1,2,2) | (2,2,2)
uo | 0.9 | 1 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
uo | 0.9 | 2 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
lo | 0.1 | 1 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
lo | 0.1 | 2 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
po | 0.9 | 1 | 1/4 | - | - | 1/4 | - | 1/4 | 1/4 | -
po | 0.9 | 2 | - | 1/4 | 1/4 | - | 1/4 | - | - | 1/4
Table 8. Trivariate concordant risks with $\mathrm{CVaR}_{.5}(Z^{(1)}) > \mathrm{CVaR}_{.5}(Z^{(2)})$ and $\mathrm{CVaR}_{.9}(Z^{(1)}) < \mathrm{CVaR}_{.9}(Z^{(2)})$.
i | (2,1,1) | (3,1,1) | (2,3,1) | (3,3,1) | (1,2,2) | (2,1,3) | (3,1,3) | (2,3,3) | (3,3,3)
z_i | 100 | 200 | 120 | 220 | 11 | 102 | 202 | 122 | 222
p_i^(1) | 1/6 | - | - | 1/6 | 1/3 | - | 1/6 | 1/6 | -
p_i^(2) | - | 1/6 | 1/6 | - | 1/3 | 1/6 | - | - | 1/6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
