Introduction

Emerging technologies that permit data collection at unprecedented scale can reveal patterns of individual and group behavior1,2,3,4,5 and they have propelled the development of data-driven mathematical models of human conflict. These include models of insurgent and terrorist activity6,7,8,9,10, global trends in violence across human history11,12,13, cultural and ethnic confrontation14,15,16,17 and the structure and dynamics of criminal organizations18,19,20.

Here, we turn our attention to violent extremist groups that make intensive use of Internet fora for recruitment, illegal financing, internal communication and, in fully radicalized individuals, operative support of real world violent activity21,22,23,24. For the past decade, extremists have been using online networks to socialize their members and to reinforce aspects of their radical culture25,26. These networks diminish the sense of isolation for members of extremist groups and give them access to a global community that enjoys the legal and religious approval of the leaders of their movements27. They also facilitate the acquisition of articles, books and audiovisual material with radical content that are created in clandestine or underground circles. Without access to these networks, obtaining such material would be much more difficult28.

Online fora are particularly important for extremist groups, as they play a key role in the terrorist recruitment process. These platforms have not only become a common medium for extremist organizations to spread their propaganda and obtain funding, but they have also transformed into the primary arena where the process of violent radicalization takes place29. Extremist fora allow users to reinforce their identities as militants through virtual activism and help to create a hierarchy within the group that can be easily monitored by the group leaders. In fact, it is possible to detect specific participation rules in these fora that allow users to improve their reputation within the group and their status within the community30. Those users who achieve the highest status may then be recruited to engage in real world operations21,22,23,24.

Although it is well documented that recruitment into online fora can lead to recruitment into terrorist organizations, it is important to emphasize that these processes are not the same. We are not claiming that online fora are a proxy for real world terrorism. Rather, we are claiming that they are an important entry point for some people who later engage in real world acts of violent extremism and it is therefore important to understand how these fora operate and how we might act to reduce their effectiveness.

Several thousand websites support and promote extremist discourse, but not all have the same impact. A very small number of sites exert a very large amount of influence because they are the only ones to directly receive material prepared by extremist groups31,32,33,34,35,36. On these fora one can find ideological diatribes, interviews and audio and video recordings of recent acts of violence. The other sites simply repeat and amplify the content disseminated by the most prominent extremist fora.

Extremist Internet fora operate in a hostile environment as their growing importance has caused them to come under scrutiny in recent years. Indeed, the conventional wisdom is that the longevity of an online community is inversely proportional to its importance in the radical universe. Organizations dedicated to monitoring extremist groups on the web note that 80% of the top websites to emerge between 2002 and 2004 have disappeared because they have been unable to withstand continuous harassment by security agencies, private groups and hackers37,38. These attacks are designed to infiltrate and boycott extremist fora39 and they have also succeeded in provoking mistrust and mutual accusations between rival groups21. As a result, administrators are suspicious of new users who are too active or who express a clear interest in contacting forum members in person. Some platforms have implemented stricter self-protection measures such as making their entire content available only to registered users, closing admission to new members, or accepting only new users who have been endorsed by a current one.

Actors seeking to disrupt recruitment to extremist fora face several technical questions40,41,42,43. Which fora should be targeted for an attack? What strategy should be used? Given that these attacks require costly human resources, it is important to assess the impact of various strategies. To do this we need data and a model that will help us to better understand how fora react when they are under stress. Answering these questions will shed light on an important debate: Should terrorist groups be allowed to use the Internet, or should they be banned from it44?

In order to analyze the potential impact of cybernetic attacks, we recorded the number of members registered in 10 different fora, as reported automatically by the forum platforms (see Table 1, Table 2 and Figure 1 for the list of fora analyzed and monitoring statistics). Recordings were performed at 12 AM GMT. Data collection started on 1/1/2008 for Forum 2, which is the earliest forum that we were able to detect, and spanned the next 4 years for the rest of the fora. The number of measurements per forum ranges from 7 to 49 and the sampling rate was constrained by the availability and accessibility of each forum. In Table 2 we provide the details of the data collected for each forum and overall statistics showing, among other things, the skewness of the distribution (notice that the mean number of members is significantly larger than the median). Forum 10 is not shown here, as it was discarded from our analysis for reasons discussed below.

Table 1 List of online fora analyzed. Forum names and URLs are available from the authors on request
Table 2 Monitoring statistics for the fora analyzed. The statistics are calculated for the full observation period and all fora, to convey the large variability in our ability to collect data across different fora
Figure 1

(TOP) Number of members participating in the 9 extremist fora under study.

Note that the fora initially grow exponentially and then saturate, growing at an exponential yet much slower rate. (BOTTOM) Fora time-shifted to match the early stages of the development of every forum and to give visual intuition about their dynamics. In particular, we aligned the early stages of all fora with Forum 8, with the exception of Forum 5, which was aligned with Forum 9 due to a similar volume of members.

The reported information was publicly accessible without login or participation required. No further checks on the validity of these reported figures were performed, so we cannot rule out systematic biases introduced by the forum administrators (e.g. multiple registrations by the same users, counting of inactive users, etc). During the observation period we did not detect any deliberate efforts to strategically misrepresent the data. However, given the possibly quite large incentives for misrepresentation, further work on the assessment of data veracity is certainly needed. We also cannot rule out problems arising from the left-censoring of our data, as we lack information about activity prior to the detection of a forum.

Although the observation period for the 10 fora spans 4 years, we were unable to record information on a daily basis, as we would find the fora inaccessible on many occasions – mostly de-activated with no website connectivity. The time-span for which these fora would be de-activated varied from days to months. Even though we may suspect that the fora under study were shut down as a result of an attack, we do not have direct evidence of this attack and cannot establish a causal link.

In selecting the sites for analysis, we focused attention on fora that constituted the “inner circle” or core of the violent extremist presence on the Internet. These fora 1) served as primary sources of dissemination of propaganda produced directly by extremist organizations, 2) had users that were implicated in real-world acts of terror and 3) had the largest following and influence in the extremist cyber-community31,32,33,34,35,36. In addition, we only included a particular forum if we detected at least one instance where sharing of text or multimedia material prepared by extremists took place during the observation period. These fora also contained normal discussion and sharing of non-extremist information among members.

We excluded from our analysis “secondary” fora that hosted radical content copied from other sites because they typically attract very few users. This also led us to discard one of the ten fora under study, run from Spain prior to August 2011. During its three-month existence, it only managed to attract 13 registered users despite its more than 7,800 discussion topics and 10,000 posts from other sites, which were uploaded daily by the site administrator45.

In Figure 1 we show the rate of growth for each forum and given the different initial start dates we also show these curves fitted to reduce the Euclidean distance to Forum 8, which had the most observations. These data suggest that fora exhibit a first phase of exponential growth followed by a saturation phase in which they continue to grow exponentially but at a much slower rate. This contrasts with the constant exponential growth (with probably a very large cutoff established by the size of the susceptible population) one might expect if members were recruited and allowed to join without restrictions (as in the classic susceptible-infected epidemiological model46,47). Case studies suggest that extremist fora control growth by limiting the number of participants48.
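The two-phase pattern described above can be checked on any membership series by fitting a straight line to the logarithm of the member count in each phase. The following is a minimal sketch on synthetic data, not the data of this study; the break point at day 100, the rates 0.04 and 0.002 and the helper name `log_linear_rate` are all illustrative assumptions.

```python
import math

def log_linear_rate(times, counts):
    """Least-squares slope of ln(count) versus time (an exponential growth rate)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den

# Synthetic, illustrative series (assumed values, not the paper's data):
# fast exponential growth until day 100, slower exponential growth afterwards.
days = list(range(0, 400, 20))
members = [
    50 * math.exp(0.04 * d) if d <= 100
    else 50 * math.exp(0.04 * 100) * math.exp(0.002 * (d - 100))
    for d in days
]

# Split the series at the assumed break point and fit each phase separately.
early = [(d, m) for d, m in zip(days, members) if d <= 100]
late = [(d, m) for d, m in zip(days, members) if d >= 100]
early_rate = log_linear_rate(*zip(*early))
late_rate = log_linear_rate(*zip(*late))

print(f"early rate: {early_rate:.4f} per day, late rate: {late_rate:.4f} per day")
```

With real observations the break point is not known in advance and would itself have to be estimated, for example by scanning candidate split days and keeping the one with the smallest total residual.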

In what follows, we describe a dynamic characterization for the evolution of one forum and then the population dynamics of multiple interacting fora. Our goal is to provide a plausible explanation for the empirical observations and to understand the impact of persistent disruptive attacks on the evolution of these online communities.

Results

We start by denoting N as the total population that is susceptible to becoming a forum member and let π1,0 be the rate of becoming a forum member. Within the forum itself there are several levels of status. We are interested in the number of users at each level, so let Xi = Xi(t) be the number of active members at level i. We assume that there is a process for advancement that allows users to go from one level to another. The rate of transition between levels is given by the transition probability matrix πi,j which denotes the probability per unit of time of moving from level j to level i.

For simplicity, we assume that transitions can only move upwards, hence πi,j = 0 for j > i. It may be the case that some users exit the process at different points through withdrawal (i.e. de-registration), inertia, lack of Internet access and so on. However, in our model, the probability of exit is absorbed into the transition probabilities from one level to the next one, where a probability of disengagement would yield a smaller probability of transition upwards. We also assume that the number of law enforcement and intelligence agents inhabiting these fora is much smaller than the number of genuine members, as a larger presence would alert the forum administrators and compromise the infiltration operation.

Suppose that there are L levels within the forum. At the highest level an individual may become fully recruited to engage in real world operations (O) or to become a recruiter (R). We denote by O and R the output of the model and the recruiters, respectively, produced by the forum. Recruitment is the bottleneck of the radicalization process, as the number of recruiters determines the rate of change of X, R and O. This is implicit in the evolution of XL, as in this equation, the members leaving the forum at level L depend on R, not on XL.

The dynamics of the populations within a particular forum can be written as

where 2 ≤ i ≤ L − 1 and pR,R and pO,R denote the transition rates to R and O, respectively. In these equations we assume that the number of recruiters is smaller than the number of members because there is a bottleneck in the conversion from being a forum member to being a recruiter (R) or engaging in real life terrorism (O). We also assume that the total population is much larger than the number of users in the forum, N ≫ X.

The model itself is a variation of a mass-action compartment model familiar in mathematical epidemiology46,47. Conceptual simplicity drives the choice of this model, though we must stress the caveat that changes in psychological state or activity may not necessarily be similar to changes in disease status.

Full observation of all the states L, O, R presents ethical, experimental49,50 and legal complications51, as it requires interaction with the Forum participants for long periods of time. Thus, we present a simplified version of the model where we merge all the levels of the fora into one: X = X1 + X2 + … + XL. Thus, we can rewrite the previous equations as

The initial conditions are X(0) = 0, R(0) = R0 ≥ 1, O(0) = 0. The main output of this model is dO/dt = pO,R R. Note that we are assuming that there is a fixed number of susceptible individuals, N, who may become members of the forum. We also need to take into account the fact that the success of a forum may depend on its popularity (for example, larger fora may be able to attract more new members). Thus, we make the assumption that π1,0 has an explicit dependence on the forum size as follows, π1,0(X) = pN(X)X, with pN(X) > 0 for X > 0.

Next, we address the possibility that as fora become more popular, they are more likely to draw the attention of anti-extremist authorities and other agencies and therefore may be attacked and shut down. We can model this by assuming that, when forum population (X) reaches a threshold θ, the forum is attacked and disabled, causing all its variables (X, R, O) to go to 0.

Forum administrators may try to balance two forces. On the one hand, they are aware that larger fora are more likely to be attacked, so they may try to control the number of users on the forum. On the other, they may want to maximize membership, making X as large as possible (i.e., just below the threshold θ).

For simplicity, suppose the transition rates to become a recruiter or terrorist are constant, such that pR,R = pR and pO,R = pO. Let X(t*) = X*, where X* is the target control condition for the number of members of the forum. The stationarity condition dX(t*)/dt = 0 implies that

where R* = R(t*). We extrapolate pN(X*) linearly in X and R, that is,

where is a positive constant so that pN(X, R) ≥ 0 for all X, R ≥ 0. Note that the stationarity condition causes the transition rate per time unit and forum participant pN to depend on both X and R. Because control is exerted by individual recruiters we assume condition (4) will be applied with a time delay τ > 0. This delay is one factor that makes it impossible for the recruiter to stay below the target X* indefinitely (see Methods, section “Implications of Delayed Control”).

We also need to take into account that the target set by the recruiters, X*, cannot be stationary over time, because the lack of attacks on the forum raises the expectations of the target values. We model this by a linear factor, λ, that increases the target value, X* over time. Figure 1 shows an example of the exponential growth rate of the forum and how it tends to decrease as the number of members increase. The equations for the controlled forum can be written as

where pN(X, R, X*) is given as in equation (4). Note that we have replaced N in the first equation by (N − X − R − O) because the individuals susceptible to becoming forum members are those who remain outside the forum.

It can be shown analytically that this model with control displays the double exponential behavior of the empirical data. There is fast exponential growth, at rate c, when forum administrators are not exerting control (when t ≈ τ, t > τ) and slower exponential growth, at rate p, when they are close to their control target (when t ≈ t*), i.e. c > p (see Methods, Section “Linear asymptotic behavior of the forum”). Moreover, there is additional, subtle evidence in the data that such control mechanisms exist: notice in Figure 1 that Forum 5 breaks out of an equilibrium around Day 1450 and becomes exponential again, suggesting that the control mechanism was discontinued at that point.

We also derived an alternative model without control in which the only limiting factor for growth of the forum was the size of the population N (see Methods, section “Specification of the model without control”). In other words, it is possible that the slowdown in growth may be driven by the fact that there are fewer susceptible individuals in the population as forum membership increases. However, in Figure 2 we use data from Forum 8 to demonstrate that the model with control outperforms the alternative model, even when we let N be a free parameter of the model that we can fit. We also show in Table 3 the values fitted to other fora in our data set, all of which indicate that the controlled model fits considerably better than the uncontrolled model. The parameter can be easily estimated from the equations that describe the initial growth of the forum (see Methods, section “Linear asymptotic behavior of the forum,” CASE 1). The parameters and can also be uniquely estimated by the saturation process of the forum (details in Methods section “Linear asymptotic behavior of the forum,” CASE 2). The rest of the parameters are obtained by minimizing the RMSE using the PGAPack library52. Note that the mean-field model we propose is based on a deterministic (not stochastic) dynamical system and therefore it has a unique solution for a particular set of parameters.
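The idea of choosing parameters by minimizing RMSE can be illustrated with a deliberately simple stand-in. The sketch below fits a single growth rate to synthetic data by exhaustive grid search; it does not use PGAPack, and the one-parameter exponential curve, the grid of candidate rates and the function names are assumptions made only for this example.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))

def exponential(rate, times, x0=10.0):
    """Toy one-parameter growth curve x0 * exp(rate * t) (assumed form)."""
    return [x0 * math.exp(rate * t) for t in times]

# Synthetic "observations" generated with a known rate of 0.03 per day.
times = list(range(0, 100, 10))
observed = exponential(0.03, times)

# Grid search over candidate rates; a genetic algorithm explores the same
# kind of objective when there are too many parameters for a grid.
candidate_rates = [k / 1000 for k in range(10, 51)]
best_rate = min(
    candidate_rates,
    key=lambda r: rmse(observed, exponential(r, times)),
)

print(f"best rate: {best_rate}, RMSE: {rmse(observed, exponential(best_rate, times)):.3f}")
```

Because the model is deterministic, each parameter set yields exactly one RMSE value, which is what makes this kind of direct comparison between a controlled and an uncontrolled model meaningful.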

Table 3 Parameter values of the model fitted to each individual forum by minimizing root mean square error (RMSE) for the controlled model. For comparison, the RMSE obtained after fitting the parameters of the uncontrolled model is also shown for each forum. Forum 5 is not included in the table as it breaks out of an equilibrium around Day 1450 and becomes exponential again, suggesting that the control mechanism was discontinued at that point
Figure 2

Fit of the model to the Forum 8 by minimizing root mean square error (RMSE) for the model with control (solid black line) and the uncontrolled one (solid blue).

Note that even when we add an additional parameter to the uncontrolled model (the number of susceptible individuals, N) the model without control cannot fit the data well (dashed line).

We now extend our model of a single forum to one that includes the interplay between several fora by linking them together in a model of competition. Let I be the total number of individuals, N(t) the number of individuals susceptible to entering a forum (that is, those who have not yet entered a forum at time t), NF the total number of fora. For simplicity let us assume that each individual can only belong to one forum. Then,

which means that the total number of individuals in the system is conserved. When a particular forum j is eliminated at time τ, then N(τ+) = N(τ) + Xj(τ) and Xj(τ+) = 0. Note that this may lead to a potential increase in the growth of other active fora. Forum j can reappear again (perhaps with a different name or identity) but with a lower equilibrium point, adjusted below the level that triggered the attack as Xj,*(τ+) = μXj,*(τ), where μ can vary in the range (0, 1). None of the fora know a priori what the threshold value θ = Xj(τ) is and this uncertainty creates a market for competition ruled by self-regulation. We model the reappearance time with a Poisson distribution with a rate of 1 event per time TR.
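The attack rule stated above (members return to the susceptible pool, the forum's variables go to zero and its target is scaled by μ) can be written as a small state update. This is a sketch under stated assumptions: the `Forum` class, the `active` flag and the function name are hypothetical, and the reappearance step, in which a forum would restart with at least one recruiter, is left out.

```python
from dataclasses import dataclass

@dataclass
class Forum:
    members: float      # X_j
    recruiters: float   # R_j
    output: float       # O_j
    target: float       # X_{j,*}, the administrators' target size
    active: bool = True

def attack_if_over_threshold(forum, susceptible, theta, mu):
    """Apply the attack rule to one forum and return the updated pool N.

    If the forum has reached the threshold theta, its members rejoin the
    susceptible pool, its state variables are reset to zero, and its target
    is multiplied by mu (0 < mu < 1) for when it reappears.
    """
    if forum.active and forum.members >= theta:
        susceptible += forum.members      # N(tau+) = N(tau) + X_j(tau)
        forum.target = mu * forum.target  # X_{j,*}(tau+) = mu * X_{j,*}(tau)
        forum.members = 0.0               # X_j(tau+) = 0
        forum.recruiters = 0.0
        forum.output = 0.0
        forum.active = False              # assumed flag: waits to reappear
    return susceptible

# Illustrative values only.
forum = Forum(members=14_000.0, recruiters=5.0, output=2.0, target=15_000.0)
pool = attack_if_over_threshold(forum, susceptible=1_000.0, theta=14_000.0, mu=0.99)
print(pool, forum.members, forum.target, forum.active)
```

Keeping the rule in one function makes it easy to apply to every forum at each time step of a multi-forum simulation.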

We make use of the conservation identity in equation (6) to derive the following expressions for rates of growth in each forum i = 1, …, NF:

Initially each forum has zero members, Xi(t = 0) = 0, at least 1 recruiter, Ri(t = 0) = 1, and produces no violent output, Oi(0) = 0. Note that each forum fixes its own target value Xi,*(t = 0), with its own estimation of the threshold, starting with a uniformly random distribution of thresholds at the beginning. Since fora are started at random times, we initialize pN(Xi(t − τ), Ri(t − τ), Xi,*) with some probability, κ.

As an illustration we conduct a numerical simulation of 50 fora, NF = 50, depicted in Figure 3. An external agent (for example, a counter-terrorism agency) allows fora to grow until they reach 14,000 members (θ = 14,000) but no forum administrator knows this a priori. In this example, we assume that administrators apply their control rules with a one day delay, τ = 1. When a forum is attacked and deactivated, its membership goes back to 0 and it reappears with an adjusted threshold that is μ = 0.99 times the former value. We also assume that on average 20 fora can be activated in a year, that is κ = 20/(NF × 365 days).
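The arithmetic behind these settings can be made explicit. The sketch below computes the daily activation probability κ from the values quoted above and shows how slowly a factor of μ = 0.99 lowers a forum's target; the initial target of 16,000 is an assumed value chosen only for illustration, and the helper name is hypothetical.

```python
# Values taken from the illustration in the text.
N_F = 50                   # number of fora
ACTIVATIONS_PER_YEAR = 20  # average number of fora activated per year
MU = 0.99                  # target reduction factor applied after each attack
THETA = 14_000             # policymaker threshold, unknown to administrators

# Per-forum, per-day activation probability.
kappa = ACTIVATIONS_PER_YEAR / (N_F * 365)

# Sanity check: summed over all fora and all days this recovers 20 per year.
expected_per_year = kappa * N_F * 365

def target_after_attacks(initial_target, attacks, mu=MU):
    """Target after a number of attacks, each multiplying it by mu."""
    return initial_target * mu ** attacks

# Hypothetical forum with an assumed initial target of 16,000: count the
# attacks needed before the scaled target drops below THETA.
attacks_needed = 0
while target_after_attacks(16_000, attacks_needed) >= THETA:
    attacks_needed += 1

print(f"kappa = {kappa:.6f} per forum per day")
print(f"attacks needed: {attacks_needed}")
```

This count ignores the growth of the target over time through λ, so it should be read only as a feel for the scale of μ, not as a prediction of the simulation.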

Figure 3

Simulated number of members of 25 randomly chosen fora out of 50 during a period of 10 years: I = 17,000,000, NF = 50, pR = 0.0026, , θ = 14,000, τ = 1 day and λ = 9.49.

The results show that fora are able to control the size of their membership for long periods of time, which is not a trivial problem from a distributed control point of view. Moreover, differences in forum lifetimes are based on the heterogeneity in their different control target thresholds, Xi,* and the variability in the parameter values , and . This effect leads to different rates of growth for different fora, even though policymakers exert the same pressure on all of them.

To illustrate the implications of self-regulation, we show in Figure 4 the dependence of total output and the number of fora disrupted as a function of the policy maker threshold, θ. Notably, total output increases nearly exponentially with the threshold, suggesting that even weak control mechanisms can have a big effect on reducing membership. However, the marginal effect of a stricter (lower θ) policy decreases dramatically as control is tightened. Note also that the number of fora disrupted decreases exponentially with θ, suggesting that efforts to tighten control will require an increasingly costly effort in detection and deactivation. Therefore, the amount of extremism reduced per unit cost of control will decline quickly with stricter policies both because of decreasing effectiveness of the attacks as well as increasing costs of detection/deactivation of fora.

Figure 4

Total output and number of forum disruptions versus the policymaker threshold θ (lower values indicate stricter policies) using the same set of parameters as in Figure 3.

Discussion

The model presented here reproduces the observed dynamics of violent extremist fora. In particular, the model shows (i) an initially steep and exponential increase in forum members, (ii) an intermediate interval in which the number of new members continues to grow exponentially but at a decreasing rate and (iii) a final stabilization period, in which the number of participants smoothly converges to (and eventually overshoots) a target number of members. Our analysis shows that the model is robust to varying assumptions about the recruitment and control process, accommodating any arbitrary number of status ranks in these fora and allowing for time delay in the implementation of the access control mechanism.

Even though we and other researchers lack detailed evidence about when and how forum self-controls might be implemented, it is known that extremist fora are the target of law enforcement organizations. Our model suggests that a simple strategic reaction to these attacks can explain dynamic changes in forum membership. A plausible explanation suggested by field-researchers is that the forum leaders are able to ‘broadcast’ their message to the population in a top down manner rather than across a well defined network structure, which is known to display a bursty, slow speed of diffusion via word-of-mouth53,54. Our model can also explain oddities in the empirical data, such as the observation that some fora stop growing for long periods of time, and, in some cases, they suddenly start growing after being stagnant (see Figure 1; Forum 5). This suggests that some fora occasionally eliminate or drastically alter their self-control mechanisms to allow for a new burst of growth.

The model also lets us test the effect of different strategies that might be employed by law enforcement agencies. The fact that membership increases exponentially with less strict policies means that stricter policies may be decreasingly effective at the margin. Moreover, the number of fora that must be attacked increases exponentially with stricter policies, suggesting that the costs of enforcing stricter policies may increase exponentially. Thus, we expect a sharp decline in the effectiveness of targeting additional fora of smaller size. As a result, we would recommend targeting policies aimed at occasionally disrupting a small number of the largest groups. Resources devoted to a stricter policy might be better used elsewhere. Our findings therefore back Berger's view that “strategy doesn't have to be an all-or-nothing proposition”55.

An important limitation of our model is that it is just a model: there may be other ways to explain the growth rates we observe. Our results yield a better fit than a simple growth model without control, suggesting that the fora have recruitment procedures that control access in order to keep the number of members below a time-dependent threshold. But it is possible that other processes, particularly those based on recruitment within social networks (rather than random recruitment from the population), may also fit the data. We also note that the limited size of our dataset may hinder our efforts to validate the proposed model, but it is important to emphasize that these data are difficult to collect due to ethical, experimental and legal impediments to monitoring the activity of extremist fora. Finally, one might expect a deliberate policy of limiting the rate of growth to be something that would itself be discussed on a forum, but during our observations we did not find any evidence of such discussions.

That said, if this model approximates the true underlying process, it suggests that even minimal efforts to target extremist fora may have a large effect on reducing their membership and therefore their capacity to recruit for real world violent acts. It also suggests that much of this effect is due to the self-control induced in the fora rather than the direct effect of temporarily dispersing the members of a given forum.

While our original aim was to better understand extremist groups, we note that our model might also be of use to those seeking to understand how legitimate fora react under stress. For example, there have been numerous recent examples of repressive regimes that seek to thwart the use of social media to mobilize legitimate opposition protest56,57 and in future work it would be interesting to see whether our model can explain changes in the membership and self-control mechanisms adopted by fora that are used in such protests.

Methods

Forum names and URLs are available from the authors on request.

Implications of delayed control

The evolution equation for X is

for and

for . The resulting model yields delayed control behavior that is like steering a boat: one tends to overshoot the target values because the action is implemented with a delay.
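The boat-steering intuition can be reproduced with a toy delayed-feedback system. The sketch below is not the forum model of this paper: it assumes the simplest possible controller, dX/dt = g (X* − X(t − τ)), integrated with explicit Euler steps, and all names and parameter values are illustrative.

```python
def simulate(target, gain, delay_steps, dt=0.1, steps=200):
    """Euler integration of dX/dt = gain * (target - X(t - delay)).

    Toy controller (assumed form): the correction applied at step k uses the
    value of X observed delay_steps earlier, with X = 0 before the start.
    """
    xs = [0.0]
    for k in range(steps):
        j = k - delay_steps
        seen = xs[j] if j >= 0 else 0.0  # stale observation of X
        xs.append(xs[k] + dt * gain * (target - seen))
    return xs

no_delay = simulate(target=100.0, gain=1.0, delay_steps=0)
delayed = simulate(target=100.0, gain=1.0, delay_steps=10)  # tau = 1 time unit

print(f"max without delay: {max(no_delay):.2f}")
print(f"max with delay:    {max(delayed):.2f}")
```

Without delay the trajectory approaches the target from below, while with delay the controller keeps pushing on the basis of outdated information and the trajectory passes the target before the correction takes effect.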

Since the smaller τ the better the control, we will consider the condition expressed in equation (8). The target value X* would be reached at time t* by the uncontrolled system. At this point,

by (5) and

by equation (2) and equation (3). It follows from equation (8) that up to O(τ²),

The fact that dX(t*)/dt < 0 for τ > 0 sufficiently small, shows that X(t) exceeds X* before the time t = t* (since ).

Linear asymptotic behavior of the forum

Let us linearize at the early stages of the formation of the fora and at later stages when forum membership reaches a high number. We know from Eq. (5) that

Replacing this in Eq. (8), we get

This equation is nonlinear due to the second term on the right hand side. With this equation we can now study two limiting cases.

CASE 1

If t ≈ τ, t > τ, then we may replace X* − X(t − τ) by X* in (12) since X(t − τ) ≪ X*. Eq. (12) becomes linear:

where

because and R0/X* ≪ 1 and

Hence

CASE 2

If t ≈ t*, then X* − X(t − τ) ≈ 0 according to Eq. (11). Eq. (12) also becomes linear, see Eq. (13), but now

Analogously to Case 1, the solution is

This shows that the population dynamics model is consistent with the double exponentials observed in the data. Moreover, it follows from Eq. 14 that X(t) starts growing at an exponential rate given by the parameter

By the time , the exponential growth rate implied by Eq. 15 is

Since N, X* ≫ 1 and on general grounds , , we conclude that c > p, i.e., the growth of forum members is much greater at the beginning than at the end of the time span (0, t*).

Specification of the model without control

The model without control can be simply written as

These equations show that the only mechanism the uncontrolled system has for saturation is the reduction of susceptible individuals (N − X − R − O) over time. However, Table 3 and Figure 2 indicate that even when N is small and an adjustable parameter, the controlled model fits the empirical data considerably better than the uncontrolled model.
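Saturation through depletion of susceptibles alone can be illustrated with the classic susceptible-infected form dX/dt = β X (N − X)/N. This is a minimal sketch of that textbook model, not necessarily the exact uncontrolled equations used here; the population size, the rate β and the function name are assumed values for illustration.

```python
def simulate_si(population, rate, x0=1.0, dt=1.0, steps=300):
    """Euler integration of the classic SI form dX/dt = rate * X * (N - X) / N."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * rate * x * (population - x) / population)
    return xs

trajectory = simulate_si(population=1000.0, rate=0.1)

# Early on almost everyone is susceptible, so growth is close to exponential
# with factor (1 + rate) per step; late growth stalls as susceptibles run out.
early_ratio = trajectory[1] / trajectory[0]
late_ratio = trajectory[-1] / trajectory[-2]

print(f"early growth factor: {early_ratio:.4f}")
print(f"late growth factor:  {late_ratio:.6f}")
print(f"final size: {trajectory[-1]:.1f} of 1000")
```

In this form growth stops only once nearly the whole susceptible population has joined, so the plateau is set by N rather than by any decision of the administrators.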