On the statistical properties and tail risk of violent conflicts
Introduction
Since the middle of the last century, there has been a multidisciplinary interest in wars and armed conflicts, quantified in terms of casualties [1], [2], [3], [4], [5], [6], [7], [8]; other studies have covered the statistics of terrorism [9], [10], [11]. From a statistical point of view, recent contributions have attempted to show that the distribution of war casualties (or of the victims of terrorist attacks) tends to have heavy tails, characterized by a power law decay [9], [12]. Often, the analysis of armed conflicts falls within the broader study of violence [13], [4], with the aim of understanding whether we as humans are more or less violent and aggressive than in the past, and what role institutions have played in that respect. Accordingly, the public intellectual arena has witnessed active debates, such as the one between Steven Pinker [4] on one side and John Gray [14] on the other, on whether "the long peace" is a statistically established phenomenon or a mere statistical sampling error, characteristic of heavy-tailed phenomena; this paper corroborates the latter view.
Using a new data set containing 565 armed conflicts with more than 3000 casualties each over the period 1–2015 AD, we confirm that the distribution of war casualties exhibits a very heavy right tail. The tail is so heavy that, at first glance, war casualties could represent an infinite-mean phenomenon [15]. But should the distribution of war casualties have an infinite mean, the annihilation of the human species would be just a matter of time, and the sample properties we can compute from data would have no meaning at all in terms of statistical inference. In reality, a simple argument allows us to rule out the infiniteness of the mean: no event or series of events can kill more than the whole world population. The support of the distribution of war casualties is thus necessarily bounded, and the true mean cannot be infinite.
Let [L, H] be the support of the distribution of war casualties today. L cannot be smaller than 0, and we can safely fix it at some level L > 0 to ignore those small events that are not readily definable as armed conflicts [7]. As to H, its value cannot be larger than the world population, i.e. 7.2 billion people in 2015 [16] (today's world population can be safely taken as the upper bound, as humanity has never before reached similar numbers).
If Y is the random variable representing war casualties, its range of variation is very large and equal to H − L. Studying the distribution of Y can be difficult, given its bounded but extremely wide support. Finding the right parametric model among the set of possible ones is hard, while nonparametric approaches are more difficult to interpret from a risk analysis point of view.
Our approach is to transform the data in order to apply the powerful tools of extreme value theory. In particular, we suggest a special log-transformation of the data, different from the others in the literature. This allows us to use tools such as the Generalized Pareto approximation of tails [17], [18], simplifying the choice of the model to fit to our data.
Let L and H be respectively the lower and the upper bound of a random variable Y, and define the function φ(y) = L − H log((H − y)/(H − L)). It is easy to verify that
- 1.
φ is smooth: φ ∈ C∞,
- 2.
φ⁻¹(∞) = H,
- 3.
φ⁻¹(L) = φ(L) = L.
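As a minimal numerical sketch of this special log-transformation, under the assumption that it takes the form φ(y) = L − H log((H − y)/(H − L)) with L = 3000 and H = 7.2 billion as in the text (the function names `phi` and `phi_inv` are ours):

```python
import numpy as np

L = 3_000   # lower bound: smallest casualty count considered an armed conflict
H = 7.2e9   # upper bound: world population in 2015

def phi(y):
    """Dual transformation: maps the bounded support [L, H) onto [L, infinity)."""
    y = np.asarray(y, dtype=float)
    return L - H * np.log((H - y) / (H - L))

def phi_inv(z):
    """Inverse transformation: maps a dual observation back into [L, H)."""
    z = np.asarray(z, dtype=float)
    return H - (H - L) * np.exp((L - z) / H)

y = np.array([3_000.0, 1.0e6, 1.0e8])
z = phi(y)
assert np.allclose(phi_inv(z), y)   # phi is invertible on [L, H)
assert z[0] == L                    # phi(L) = L
```

For y much smaller than H the transformation is close to the identity, which is why the dual data and the rescaled data are approximately the same in the descriptive analysis below.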
By studying the tail properties of the dual distribution (the one with an infinite upper bound), using extreme value theory, we will be able to obtain, by reverting to the real distribution, what we call the shadow tail mean of war casualties. We will show that this mean can be 3 times larger than the sample mean, but nevertheless finite.
We assume that many observations are missing from the data set (because of under-reported conflicts), and we also take into consideration the fact that war casualties are only imprecise estimates [19], on which historians often disagree, without any ability to verify the assessments using period sources. For instance, the An Lushan rebellion, an event that took place in the eighth century, is thought to have killed 13 million people, but there is no precise or reliable methodology allowing us to trust that number, which could conceivably be one order of magnitude smaller. For a long time, indeed, the assessment of the drop in China's population was made on the basis of tax censuses, and the observed decline might be attributable to a post-rebellion drop in the number of surveyors and tax collectors rather than in the population itself [20].
Using resampling techniques, we show that our tail estimates are robust to changes in the quality and reliability of data. Our results and conclusions would replicate even if we missed almost a fifth of the data.
Finally, focusing on a more reliable subset covering the last 500 years of data, one cannot observe any specific trend in the number of conflicts, as large wars appear to follow a homogeneous Poisson process. This memorylessness in the data conflicts with the idea that war violence has declined over time, as proposed by Goldstein [2] and Pinker [4].
The data
The data set contains 565 events over the period 1–2015 AD; an excerpt is shown in Table 1. Events are generally armed conflicts, such as interstate wars and civil wars, with a few exceptions represented by violence against citizens perpetrated by the bloodiest dictatorships, such as Stalin’s and Mao Zedong’s regimes. These were included in order to be consistent with previous works about war victims and violence [4], [6].
In the words of Wallensteen and Sollenberg [7], “an armed conflict is a
Descriptive data analysis
Fig. 2 shows casualties over time and it is composed of four subfigures, two related to raw data and two related to rescaled data. Given Eq. (1), rescaled and dual data are approximately the same, hence there is no need to show further pictures. For each type of data, using the radius of the different bubbles, we show the relative size of each event in terms of victims with respect to the world population (A, B) and within our data set (C, D). Note that the choice of the type of data, or of the
Tail risk of armed conflicts
The previous data analysis suggests a heavy right tail for the distribution of war casualties, both for raw and rescaled data, and it is consistent with the existing literature [9], [12], [10]. Using extreme value theory, or EVT [18], [31], [35], we can model this tail by means of a Generalized Pareto Distribution (GPD). The choice is due to a result of Gnedenko [17], further extended by Pickands, Balkema and de Haan [38], [39].
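As a sketch of this modelling step, a GPD can be fitted to threshold exceedances with standard tools. The sample below is synthetic: the shape value 1.4, the sample size, and the 90% threshold are assumptions of ours for illustration, not the paper's estimates.

```python
import numpy as np
from scipy.stats import genpareto

# Illustrative heavy-tailed sample standing in for the (dual) casualty data.
rng = np.random.default_rng(42)
data = genpareto.rvs(c=1.4, scale=8.0e4, size=2_000, random_state=rng)

u = np.quantile(data, 0.90)        # a high threshold
excesses = data[data > u] - u      # exceedances over u

# Fit GPD(xi, sigma) to the excesses; the location is 0 by construction.
xi, _, sigma = genpareto.fit(excesses, floc=0)
print(f"shape xi = {xi:.2f}, scale sigma = {sigma:.3g}")
```

A fitted shape parameter above 1 is what would signal an apparently infinite-mean regime in the dual data.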
Consider a random variable Y with unknown distribution
Estimating the shadow mean and the high quantiles
Given the finite upper bound H, we know that the mean of Y must be finite as well. A simple idea is to estimate it by computing the sample mean or, if we are more interested in the tail behavior of war casualties, as in this paper, the conditional tail mean above a given threshold u, what in risk management is called the expected shortfall. For a minimum threshold equal to 145k, this value can be computed directly (remember that we work with rescaled data).
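The shadow-mean idea can be sketched numerically under assumed GPD parameters in dual space (the values of ξ, σ and the 145k threshold below are illustrative, echoing the text but not reproducing the paper's actual fit): simulate dual-tail observations, map each one back through the inverse transformation, which is bounded by H, and average.

```python
import numpy as np
from scipy.stats import genpareto

L, H = 3_000, 7.2e9

def phi_inv(z):
    # Inverse of the dual (log-)transformation: back from [L, inf) to [L, H).
    return H - (H - L) * np.exp((L - np.asarray(z, dtype=float)) / H)

# Assumed GPD parameters for the dual exceedances above u (illustrative only).
xi, sigma, u = 1.5, 2.0e6, 145_000

rng = np.random.default_rng(0)
dual_tail = u + genpareto.rvs(c=xi, scale=sigma, size=500_000, random_state=rng)

shadow_tail_mean = phi_inv(dual_tail).mean()   # finite by construction: phi_inv < H
print(f"shadow tail mean ~ {shadow_tail_mean:.3e}")
```

With ξ > 1 the dual sample mean keeps growing with the sample size, while the shadow tail mean stays below H, which is the point of the construction.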
However, in spite of its boundedness, the support
Missing data and imprecisions
An effective way of checking the robustness of our estimates to the “quality and reliability” of data is to use resampling techniques, which allow us to deal with non-precise observations [29] and possibly missing data. We have performed three different experiments:
- •
Using the jackknife, we have created 100k samples by randomly removing up to 20% of the observations lying above the 25k threshold. In more than 99% of the cases the estimated tail shape parameter ξ remains above 1. As expected, the shape parameter goes below the value 1 only if most
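This first experiment can be sketched as follows. The data here are synthetic and `jackknife_shape` is our own helper with a reduced number of replications; the paper uses 100k replications on the actual data set above the 25k threshold.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(7)
# Synthetic heavy-tailed sample standing in for the casualties above 25k
# (the true shape of 1.5 is an assumption of this sketch).
data = 25_000 + genpareto.rvs(c=1.5, scale=5.0e5, size=300, random_state=rng)

def jackknife_shape(sample, n_runs=200, max_drop=0.2, threshold=25_000):
    """Refit the GPD shape after randomly deleting up to max_drop of the sample."""
    shapes = []
    for _ in range(n_runs):
        k = rng.integers(0, int(max_drop * len(sample)) + 1)
        keep = rng.choice(len(sample), size=len(sample) - k, replace=False)
        xi, _, _ = genpareto.fit(sample[keep] - threshold, floc=0)
        shapes.append(xi)
    return np.array(shapes)

shapes = jackknife_shape(data)
print(f"runs with xi > 1: {(shapes > 1).mean():.1%}")
```

If the fitted shape stays above 1 across the vast majority of deletions, the heavy-tail conclusion is robust to missing or unreliable observations.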
Frequency of armed conflicts
Can we say something about the frequency of armed conflicts over time? Can we observe some trend?
In this section, we show that our data tend to support the idea that armed conflicts are likely to follow a homogeneous Poisson process [21], [6], especially when we focus on events with large casualties.
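The memorylessness claim can be illustrated with a minimal check on synthetic data (the decade-scale rate, sample size, and all names below are our assumptions, not estimates from the paper's data set):

```python
import numpy as np
from scipy.stats import kstest

# Synthetic inter-arrival times (in years) from a homogeneous Poisson process.
rng = np.random.default_rng(3)
inter_arrivals = rng.exponential(scale=10.0, size=300)

# Under a homogeneous Poisson process the inter-arrival times are iid
# exponential (memoryless); a KS test against the fitted exponential should
# not reject. Fitting the rate first makes the test conservative.
stat, pvalue = kstest(inter_arrivals, "expon", args=(0, inter_arrivals.mean()))
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.2f}")
```

A trend in the conflict rate, by contrast, would show up as systematically non-exponential inter-arrival times.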
The good GPD approximation allows us to use a well-known model of extreme value theory: the Peaks-over-Threshold, or POT. According to this approach, exceedances over a high threshold follow a
Acknowledgments
Captain Mark Weisenborn initially engaged in the fundamental task of compiling the data, checking across sources and linking each conflict to a narrative. We are extremely grateful for his work.
We also benefited from generous help on social networks where we put data for scrutiny, as well as advice from historians such as Ben Kiernan.
We thank Raphael Douady, Yaneer Bar Yam, Joe Norman, Alex (Sandy) Pentland and his lab for discussions and comments.
P. C. acknowledges the support by the Marie
References (44)
- P. Cirillo, Are your data really Pareto distributed?, Physica A (2013)
- J.A. Villaseñor-Alva, E. González-Estrada, A bootstrap goodness of fit test for the generalized Pareto distribution, Comput. Statist. Data Anal. (2009)
- D. Berlinski, The Devil's Delusion: Atheism and its Scientific Pretensions (2009)
- J.S. Goldstein, Winning the War on War: The Decline of Armed Conflict Worldwide (2011)
- J. Mueller, Retreat from Doomsday: The Obsolescence of Major War (1989)
- S. Pinker, The Better Angels of Our Nature: Why Violence Has Declined (2011)
- L.F. Richardson, Variation of the frequency of fatal quarrels with magnitude, J. Amer. Statist. Assoc. (1948)
- L.F. Richardson, Statistics of Deadly Quarrels (1960)
- P. Wallensteen, M. Sollenberg, Armed conflict 1989–2000, J. Peace Res. (2001)
- M. White, The Great Big Book of Horrible Things (2011)