On the statistical properties and tail risk of violent conflicts

https://doi.org/10.1016/j.physa.2016.01.050

Highlights

  • New methodology to deal with apparently infinite-mean phenomena.

  • The new method can be of interest for all researchers dealing with heavy-tailed but bounded random variables.

  • New application on war casualties’ data, using a novel data set that will be shared upon acceptance.

  • Interesting applications in other fields of science.

Abstract

We examine statistical pictures of violent conflicts over the last 2000 years, providing techniques for dealing with the unreliability of historical data.

We make use of a novel approach to deal with fat-tailed random variables with a remote but nonetheless finite upper bound, by defining a corresponding unbounded dual distribution (given that potential war casualties are bounded by the world population). This approach can also be applied to other fields of science where power laws play a role in modeling, like geology, hydrology, statistical physics and finance.

We apply methods from extreme value theory on the dual distribution and derive its tail properties. The dual method allows us to calculate the real tail mean of war casualties, which proves to be considerably larger than the corresponding sample mean for large thresholds, meaning severe underestimation of the tail risks of conflicts from naive observation. We analyze the robustness of our results to errors in historical reports.

We study inter-arrival times between tail events and find that no particular trend can be asserted.

All the statistical pictures obtained are at variance with the prevailing claims about “long peace”, namely that violence has been declining over time.

Introduction

Since the middle of the last century, there has been a multidisciplinary interest in wars and armed conflicts, quantified in terms of casualties [1], [2], [3], [4], [5], [6], [7], [8], while other studies have also covered the statistics of terrorism [9], [10], [11]. From a statistical point of view, recent contributions have attempted to show that the distribution of war casualties (or of victims of terrorist attacks) tends to have heavy tails, characterized by a power law decay [9], [12]. Often, the analysis of armed conflicts falls within the broader study of violence [13], [4], with the aim of understanding whether we as humans are more or less violent and aggressive than in the past, and what role institutions have played in that respect. Accordingly, the public intellectual arena has witnessed active debates, such as the one between Steven Pinker [4] on one side and John Gray [14] on the other, concerning whether “the long peace” is a statistically established phenomenon or a mere statistical sampling error characteristic of heavy-tailed phenomena; this paper corroborates the latter.

Using a new data set containing 565 armed conflicts with more than 3000 casualties over the period 1–2015 AD, we confirm that the distribution of war casualties exhibits a very heavy right tail. The tail is so heavy that, at first glance, war casualties could represent an infinite-mean phenomenon [15]. But should the distribution of war casualties have an infinite mean, the annihilation of the human species would be just a matter of time, and the sample properties we can compute from data would have no meaning at all in terms of statistical inference. In reality, a simple argument allows us to rule out the infiniteness of the mean: no event or series of events can kill more than the whole world population. The support of the distribution of war casualties is thus necessarily bounded, and the true mean cannot be infinite.

Let [L,H] be the support of the distribution of war casualties today. L cannot be smaller than 0, and we can safely fix it at some level L0 to ignore those small events that are not readily definable as armed conflict [7]. As to H, its value cannot be larger than the world population, i.e. 7.2 billion people in 2015 [16] (today’s world population can safely be taken as the upper bound, since humanity has never before reached similar numbers).

If Y is the random variable representing war casualties, its range of variation is very large and equal to $H - L$. Studying the distribution of Y can be difficult, given its bounded but extremely wide support. Finding the right parametric model among the set of possible ones is hard, while nonparametric approaches are more difficult to interpret from a risk analysis point of view.

Our approach is to transform the data in order to apply the powerful tools of extreme value theory. Since $H < \infty$, we suggest a special log-transformation of the data, different from the others in the literature. This allows us to use tools such as the Generalized Pareto approximation of tails [17], [18], simplifying the choice of the model to fit to our data.

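Let L and H be respectively the lower and the upper bound of a random variable Y, and define the function

$$\varphi(Y) = L - H \log\left(\frac{H - Y}{H - L}\right). \qquad (1)$$

It is easy to verify that

  • 1. $\varphi \in C^{\infty}$,

  • 2. $\varphi^{-1}(\infty) = H$,

  • 3. $\varphi^{-1}(L) = \varphi(L) = L$.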

Then $Z = \varphi(Y)$ defines a new random variable with lower bound L and an infinite upper bound. Notice that the transformation induced by $\varphi(\cdot)$ does not depend on any of the parameters of the distribution of Y, and that $\varphi(\cdot)$ is monotone. In what follows, we will call the distributions of Y and Z the real and the dual distribution, respectively.
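
As a minimal sketch of the transformation, assuming it has the form φ(Y) = L − H log((H − Y)/(H − L)), the snippet below implements φ and its inverse and maps a few values back and forth. The function names and the numerical bounds (L = 3000, H = 7.2 billion) are illustrative assumptions, not values prescribed by the paper.

```python
import math


def phi(y, L, H):
    """Dual transform: maps y in [L, H) to z in [L, +inf)."""
    if not (L <= y < H):
        raise ValueError("y must satisfy L <= y < H")
    return L - H * math.log((H - y) / (H - L))


def phi_inv(z, L, H):
    """Inverse transform: maps z in [L, +inf) back to y in [L, H)."""
    return H - (H - L) * math.exp((L - z) / H)


# Illustrative bounds (assumptions): L = 3000 casualties, H = 7.2e9 people.
L, H = 3_000.0, 7.2e9

for y in (3_000.0, 1e6, 1e8, 3e9):
    z = phi(y, L, H)
    back = phi_inv(z, L, H)
    print(f"y = {y:.3e}  ->  z = {z:.3e}  ->  back to y = {back:.3e}")
```

For values of y far below H the dual value z is almost identical to y, and the two only separate as y approaches the upper bound, where z grows without limit.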

By studying the tail properties of the dual distribution (the one with an infinite upper bound), using extreme value theory, we will be able to obtain, by reverting to the real distribution, what we call the shadow tail mean of war casualties. We will show that this mean can be 3 times larger than the sample mean, but nevertheless finite.
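
The idea of reverting from the dual to the real distribution can be illustrated numerically. The sketch below assumes that exceedances of Z above a threshold u follow a Generalized Pareto Distribution with shape xi and scale beta, and approximates E[φ⁻¹(Z) | Z ≥ u] with a midpoint rule over GPD quantiles. This is a numerical illustration under stated assumptions, not the closed-form derivation of the paper; the parameter values (xi = 1.5, beta = 1e5, L = 3000) are hypothetical.

```python
import math


def gpd_quantile(p, xi, beta):
    """Quantile of a GPD with shape xi != 0 and scale beta, for 0 <= p < 1."""
    return beta / xi * ((1.0 - p) ** (-xi) - 1.0)


def phi_inv(z, L, H):
    """Map a dual value z back to the bounded real scale [L, H)."""
    return H - (H - L) * math.exp((L - z) / H)


def shadow_tail_mean(u, xi, beta, L, H, n=100_000):
    """Midpoint-rule approximation of E[phi_inv(Z) | Z >= u].

    Assumes Z - u follows a GPD(xi, beta) above the threshold u.
    The integrand is bounded by H, so the result is finite even if xi >= 1.
    """
    total = 0.0
    for i in range(n):
        p = (i + 0.5) / n                      # midpoint of the i-th probability cell
        total += phi_inv(u + gpd_quantile(p, xi, beta), L, H)
    return total / n


# Hypothetical inputs, chosen only to exercise the function.
L, H = 3_000.0, 7.2e9
u, xi, beta = 145_000.0, 1.5, 1e5
print(f"approximate shadow tail mean: {shadow_tail_mean(u, xi, beta, L, H):.3e}")
```

Although a GPD with xi ≥ 1 has no finite mean on the dual scale, the value obtained after mapping back through the inverse transform stays below H.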

We assume that many observations are missing from the data set (because of under-reported conflicts), and we also take into consideration the fact that war casualties are just non-precise estimates [19], on which historians often have disputes, without anyone’s ability to verify the assessments using period sources. For instance, an event that took place in the eighth century, the An Lushan rebellion, is thought to have killed 13 million people, but there is no precise or reliable methodology to allow us to trust that number, which could conceivably be one order of magnitude smaller. For a long time, indeed, the drop in population in China was assessed on the basis of tax censuses, and the recorded drop might instead be attributable to a decline, in the aftermath of the rebellion, in the number of surveyors and tax collectors [20].

Using resampling techniques, we show that our tail estimates are robust to changes in the quality and reliability of data. Our results and conclusions would replicate even if we missed almost a fifth of the data.
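
A rough sketch of this kind of robustness check is given below. It deletes a random share of up to 20% of the observations, re-estimates the tail shape on each reduced sample, and reports how often the estimate stays at or above 1. Two simplifications are assumed: the data are synthetic Pareto draws rather than the conflict data set, and the Hill estimator is used as a simple stand-in for the tail fit used in the paper. All names and settings are hypothetical.

```python
import math
import random


def hill_xi(sample, k):
    """Hill estimate of the tail shape xi from the k largest observations."""
    top = sorted(sample, reverse=True)[: k + 1]
    return sum(math.log(x / top[k]) for x in top[:k]) / k


def resample_xi(data, n_rep=1000, max_drop=0.20, k_frac=0.2, seed=0):
    """Re-estimate xi after deleting a random share (up to max_drop) of the data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_rep):
        drop = rng.uniform(0.0, max_drop)              # share of points to delete
        keep = max(int(round(len(data) * (1.0 - drop))), 10)
        subset = rng.sample(data, keep)                # delete points at random
        k = max(int(k_frac * keep), 2)                 # number of upper order statistics
        estimates.append(hill_xi(subset, k))
    return estimates


# Synthetic heavy-tailed data (assumption): Pareto with tail index 0.5, i.e. xi = 2.
rng = random.Random(42)
synthetic = [3_000.0 * rng.paretovariate(0.5) for _ in range(565)]

xis = resample_xi(synthetic)
share = sum(x >= 1.0 for x in xis) / len(xis)
print(f"share of resamples with xi >= 1: {share:.3f}")
```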

Finally, focusing on a more reliable subset covering the last 500 years of data, one cannot observe any specific trend in the number of conflicts, as large wars appear to follow a homogeneous Poisson process. This memorylessness in the data conflicts with the idea that war violence has declined over time, as proposed by Goldstein  [2] and Pinker  [4].

The data

The data set contains 565 events over the period 1–2015 AD; an excerpt is shown in Table 1. Events are generally armed conflicts, such as interstate wars and civil wars, with a few exceptions represented by violence against citizens perpetrated by the bloodiest dictatorships, such as Stalin’s and Mao Zedong’s regimes. These were included in order to be consistent with previous works about war victims and violence  [4], [6].

In the words of Wallensteen and Sollenberg  [7], “an armed conflict is a

Descriptive data analysis

Fig. 2 shows casualties over time and it is composed of four subfigures, two related to raw data and two related to rescaled data. Given Eq. (1), rescaled and dual data are approximately the same, hence there is no need to show further pictures. For each type of data, using the radius of the different bubbles, we show the relative size of each event in terms of victims with respect to the world population (A, B) and within our data set (C, D). Note that the choice of the type of data, or of the

Tail risk of armed conflicts

The previous data analysis suggests a heavy right tail for the distribution of war casualties, both for raw and rescaled data, and it is consistent with the existing literature [9], [12], [10]. Using extreme value theory, or EVT [18], [31], [35], we can model this tail by means of a Generalized Pareto Distribution (GPD). The choice is due to a result of Gnedenko [17], further extended by Pickands, Balkema and de Haan [38], [39].
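
For readers who want the GPD in concrete form, the sketch below writes down its standard distribution function for exceedances above a threshold, together with its mean. The parameter names xi (shape) and beta (scale) are a notational assumption. The mean is finite only for xi < 1, which is why a shape estimate at or above 1 signals an apparently infinite-mean tail.

```python
import math


def gpd_cdf(z, xi, beta):
    """GPD distribution function for exceedances z >= 0 (scale beta > 0)."""
    if z < 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-z / beta)       # exponential limit case
    base = 1.0 + xi * z / beta
    if base <= 0.0:                            # only for xi < 0: past the upper endpoint
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_mean(xi, beta):
    """Mean of the GPD; infinite when xi >= 1."""
    return beta / (1.0 - xi) if xi < 1.0 else math.inf


for xi in (0.5, 1.0, 1.5):
    print(f"xi = {xi}: P(exceedance > 10 * beta) = {1.0 - gpd_cdf(10.0, xi, 1.0):.4f}, "
          f"mean = {gpd_mean(xi, 1.0)}")
```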

Consider a random variable Z with unknown distribution

Estimating the shadow mean and the high quantiles

Given the finite upper bound H, we know that $E[Y]$ must be finite as well. A simple idea is to estimate it by computing the sample mean or, if we are more interested in the tail behavior of war casualties, as in this paper, the conditional tail mean above a given threshold, $E[Y \mid Y \geq u]$, which in risk management is called expected shortfall. For a minimum threshold u equal to 145k, this value is $1.77 \times 10^{7}$ (remember that Y represents the rescaled data).
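
The naive estimator described here is simply the average of the observations at or above the threshold. A minimal sketch follows, taking the condition as Y ≥ u (an assumption about the threshold convention) and using invented toy numbers rather than the actual data.

```python
def tail_mean(sample, u):
    """Sample estimate of E[Y | Y >= u]: average of observations at or above u."""
    tail = [y for y in sample if y >= u]
    if not tail:
        raise ValueError("no observations at or above the threshold")
    return sum(tail) / len(tail)


# Toy casualty figures (invented for illustration only).
toy = [5_000, 20_000, 150_000, 400_000, 2_000_000, 30_000_000]

print(tail_mean(toy, 145_000))   # prints 8137500.0
```

Because a single very large observation dominates the average, this estimator is highly sensitive to whether the most extreme events happen to be in the sample.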

However, in spite of its boundedness, the support

Missing data and imprecisions

An effective way of checking the robustness of our estimates to the “quality and reliability” of data is to use resampling techniques, which allow us to deal with non-precise observations  [29] and possibly missing data. We have performed three different experiments:

  • Using jackknife, we have created 100k samples, by randomly removing up to 20% of the observations lying above the 25k threshold. In more than 99% of cases $\xi \geq 1$. As expected, the shape parameter $\xi$ goes below the value 1 only if most

Frequency of armed conflicts

Can we say something about the frequency of armed conflicts over time? Can we observe some trend?

In this section, we show that our data tend to support the idea that armed conflicts are likely to follow a homogeneous Poisson process  [21], [6], especially when we focus on events with large casualties.
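
Under a homogeneous Poisson process, the waiting times between events are independent and exponentially distributed, so their coefficient of variation should be close to 1 and their lag-1 autocorrelation close to 0. The sketch below computes both diagnostics on simulated event times; the helper names, the rate and the simulated data are assumptions for illustration and are not the tests used in the paper.

```python
import random
import statistics


def inter_arrival_times(event_times):
    """Gaps between consecutive events, after sorting the event times."""
    ts = sorted(event_times)
    return [b - a for a, b in zip(ts, ts[1:])]


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean (about 1 for exponential gaps)."""
    return statistics.pstdev(xs) / statistics.fmean(xs)


def lag1_autocorrelation(xs):
    """Correlation between each gap and the next one (about 0 if memoryless)."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den


# Simulated homogeneous Poisson process (assumption): 0.1 events per year.
rng = random.Random(1)
t, times = 0.0, []
for _ in range(2000):
    t += rng.expovariate(0.1)
    times.append(t)

gaps = inter_arrival_times(times)
print(f"CV of gaps: {coefficient_of_variation(gaps):.3f}")
print(f"lag-1 autocorrelation: {lag1_autocorrelation(gaps):.3f}")
```

A trend in the frequency of conflicts would show up as gaps that systematically lengthen or shorten over time, which would pull these diagnostics away from their Poisson benchmarks.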

The good GPD approximation allows us to use a well-known model of extreme value theory: the Peaks-over-Threshold, or POT. According to this approach, exceedances over a high threshold follow a

Acknowledgments

Captain Mark Weisenborn initially engaged in the fundamental task of compiling the data, checking across sources and linking each conflict to a narrative. We are extremely grateful for his work.

We also benefited from generous help on social networks where we put data for scrutiny, as well as advice from historians such as Ben Kiernan.

We thank Raphael Douady, Yaneer Bar Yam, Joe Norman, Alex (Sandy) Pentland and his lab for discussions and comments.

P. C. acknowledges the support by the Marie

References (44)

  • P. Cirillo

    Are your data really Pareto distributed?

    Physica A

    (2013)
  • J.A. Villasenor-Alva et al.

    A bootstrap goodness of fit test for the generalized Pareto distribution

    Comput. Statist. Data Anal.

    (2009)
  • D. Berlinski

    The Devil’s Delusion: Atheism and its Scientific Pretensions

    (2009)
  • J.L. Goldstein

    Winning the War on War: The Decline of Armed Conflict Worldwide

    (2011)
  • J.E. Mueller

    Retreat from Doomsday: The Obsolescence of Major War

    (1989)
  • S. Pinker

    The Better Angels of Our Nature: Why Violence Has Declined

    (2011)
  • L.F. Richardson

    Variation of the frequency of fatal quarrels with magnitude

    J. Amer. Statist. Assoc.

    (1948)
  • L.F. Richardson

    Statistics of Deadly Quarrels

    (1960)
  • P. Wallensteen et al.

    Armed conflict 1989–2000

    J. Peace Res.

    (2001)
  • M. White

    The Great Big Book of Horrible Things

    (2011)
  • A. Clauset et al.

    Estimating the historical and future probabilities of large terrorist events

    Ann. Appl. Stat.

    (2013)
  • A. Scharpf et al.

    Forecasting of the risk of extreme massacres in Syria

    Eur. Rev. Int. Stud.

    (2014)
  • Advances in terrorism risk analysis

    Risk Anal.

    (2013)
  • J.A. Friedman

    Using power laws to estimate conflict size

    J. Conflict. Resolut.

    (2015)
  • L.E. Cederman

    Modeling the size of wars: from billiard balls to sandpiles

    Amer. Polit. Sci. Rev.

    (2003)
  • J. Gray, Steven Pinker is wrong about violence and war. The Guardian, Friday 13 March, 2015. Available at:...
  • J. Nešlehová et al.

    Infinite mean models and the LDA for operational risk

    J. Oper. Risk

    (2006)
  • United Nations - Department of Economic and Social Affairs. 2015 Revision of World Population Prospects

    (2015)
  • D.V. Gnedenko

    Sur la distribution limite du terme maximum d’une série aléatoire

    Ann. of Math.

    (1943)
  • L. de Haan et al.

    Extreme Value Theory: An Introduction

    (2006)
  • M. Spagat et al.

    Estimating war deaths: An arena of contestation

    J. Conflict. Resolut.

    (2009)
  • BBC (2012). In our times: The An Lushan Rebellion....