Introduction

Game theory is a framework for analysing the outcome of the strategic interaction between decision makers1. The fundamental concept is that of a Nash equilibrium, where no player can improve her payoff by a unilateral strategy change. Typically, the Nash equilibrium is considered to be the optimal outcome of a game; however, in social dilemmas the individually optimal outcome is at odds with the collectively optimal outcome2. This means that one player can improve her payoff at the expense of the other by unilaterally deviating, but if both deviate, they end up with lower payoffs. In this type of game, the mutually beneficial but non-Nash equilibrium strategy is called cooperation. However, in this context cooperation should not be interpreted as an interest in the welfare of others, as players only aim to secure a high payoff for themselves.

In this framework, payoff maximisation is considered to be rational, but when such rational players then seize every opportunity to gain at the opponent’s expense, they may counterintuitively both end up with low payoffs. A game that clearly exhibits this contradiction is the Traveler’s Dilemma. Since its formulation in 1994 by the economist Kaushik Basu3, the game has become one of the most studied in the economics literature. Additionally, it has been discussed in theoretical biology in the context of evolutionary game theory.

In general, the dilemma relies on the individuals’ incentive to undercut the opponent. More specifically, players are motivated to claim a lower value than their opponent to reach a higher payoff at the opponent’s expense. This incentive leads players into systematic mutual undercutting until the lowest possible payoff is reached, which is the unique Nash equilibrium. It seems paradoxical that players defined as rational in a game theoretical sense end up with such a poor outcome. Therefore, the question that naturally arises is how this poor outcome can be prevented and how cooperation can be achieved.

Addressing these questions can help us better understand price wars, which consist of the mutual undercutting of prices to gain market share. It can also provide insight into human behaviour, because economic experiments have shown that individuals prefer to choose the cooperative high payoff action instead of the Nash equilibrium4.

Our analysis focuses on showing that the Traveler’s Dilemma can be decomposed into a local and a global game. If the payoff optimisation is constrained to the local game, then players will inevitably end up in the Nash equilibrium. However, if players escape the local maximisation and optimise their payoff for the global game, they can reach the cooperative high payoff equilibrium.

Here, we show that the cooperative equilibrium can be reached in a game like the Traveler’s Dilemma due to diversity, which we define as the presence of suboptimal strategies. The appearance of strategies far from those of the residents allows the local maximisation process to be escaped, such that an optimisation at a global level takes place. Overall, this can lead to cooperation because when such “suboptimal strategies” play against each other, it is possible to reach higher payoffs, both collectively and individually.

Game

The Traveler’s Dilemma is a two-player game. Player i has to choose a claim, \(n_i\), from the action space, consisting of all integers on the interval \([L, U]\), where \(0 \le L < U\). The payoffs are determined as follows:

  • If both players, i and j, choose the same value (\(n_i = n_j\)), both get paid that value.

  • There is a reward parameter \(R>1\), such that if \(n_i < n_j\), then i receives \(n_i + R\) and j gets \(n_i - R\).

Thus, the payoff of player i playing against player j is

$$\begin{aligned} \pi _{ij} = {\left\{ \begin{array}{ll} n_i& \text { if } n_i = n_j\\ n_i + R& \text { if } n_i < n_j\\ n_j - R& \text { if } n_i > n_j \end{array}\right. } \end{aligned}$$
(1)

Thus, a player is better off choosing a slightly lower value than the opponent: when j plays \(n_j\), it is best for i to play \(n_j-1\). Iterating this reasoning, which we will call the stairway to hell, leads to the only Nash equilibrium of the game, \(\{L,L\}\), where both players choose the lowest possible claim. The classical game theory method to arrive at this equilibrium is called iterative elimination of dominated strategies5.
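
To make the payoff scheme of Eq. (1) and the stairway to hell concrete, the following minimal sketch (Python, with illustrative function names and the original parameters \(L=2\), \(U=100\), \(R=2\)) computes the payoffs and iterates best responses until no player wants to deviate.

```python
# Minimal sketch of Eq. (1) and the stairway to hell; function names are illustrative.

def payoff(n_i: int, n_j: int, R: int = 2) -> int:
    """Payoff of player i claiming n_i against player j claiming n_j (Eq. 1)."""
    if n_i == n_j:
        return n_i
    if n_i < n_j:
        return n_i + R   # the undercutting player receives the bonus
    return n_j - R       # the overcut player pays the penalty

def best_response(n_j: int, L: int = 2, U: int = 100, R: int = 2) -> int:
    """Best reply to a fixed opponent claim n_j."""
    return max(range(L, U + 1), key=lambda n_i: payoff(n_i, n_j, R))

# The "stairway to hell": iterating best responses walks the claims down to L.
claim = 100
while True:
    reply = best_response(claim)
    if reply == claim:
        break
    claim = reply
print(claim)  # 2, i.e. the Nash equilibrium claim {L, L}
```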

The game can be visualised through its payoff matrix (Fig. 1). For simplicity, we use the values from the original formulation: \(L=2\), \(U=100\) and \(R=2\). The payoff matrix shows that the Traveler’s Dilemma can be decomposed into a local and a global game. Let us begin with the local game. When the action space of the game is reduced to two adjacent actions n and \(n+1\) (black boxes in Fig. 1), the Traveler’s Dilemma with \(R=2\) is equivalent to the Prisoner’s Dilemma6. In general, for any value of R, the Traveler’s Dilemma becomes a Prisoner’s Dilemma for any pair of actions n and \(n+s\), where \( 1 \le s \le R-1 \). For example, for \(R=4\) the pairs of actions n and \(n+1\), n and \(n+2\), and n and \(n+3\) follow the same game structure as the Prisoner’s Dilemma. Therefore, the Traveler’s Dilemma consists of many embedded Prisoner’s Dilemmas. This means that at a local level the game is a Prisoner’s Dilemma.

If we now consider actions that are distant from each other in the action space, e.g. 2 and 100, we can observe a coordination game structure (gray boxes in Fig. 1), where \(\{100,100\}\) is payoff and risk dominant7,8. In general, any pair of actions n and \(n+s\), where \( R \le s \le U-n\), constitutes a coordination game. As a result, the Traveler’s Dilemma becomes a coordination game at a global level, which has different equilibria from the local game.
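
This decomposition can be checked directly. The short sketch below (illustrative code, not part of the original analysis) restricts the game to a pair of claims n and \(n+s\) and classifies the resulting \(2 \times 2\) game, asking whether undercutting strictly dominates (Prisoner’s Dilemma) or whether both symmetric profiles are equilibria (coordination game).

```python
# Restrict the Traveler's Dilemma to the two claims n and n + s and classify
# the resulting 2x2 game. Classification criteria are standard; names are ours.

def payoff(n_i, n_j, R):
    if n_i == n_j:
        return n_i
    return n_i + R if n_i < n_j else n_j - R

def classify_pair(n, s, R):
    low, high = n, n + s
    both_high = payoff(high, high, R)   # both play the high claim ("cooperate")
    undercut  = payoff(low, high, R)    # undercut a cooperator (temptation)
    both_low  = payoff(low, low, R)     # both play the low claim (punishment)
    overcut   = payoff(high, low, R)    # high claim against an undercutter (sucker's payoff)
    # Prisoner's Dilemma ordering: temptation > reward > punishment > sucker.
    # (reward > punishment holds automatically, since n + s > n.)
    if undercut > both_high and both_low > overcut:
        return "Prisoner's Dilemma"
    # Coordination game: both symmetric profiles are (weak) equilibria;
    # ">=" allows the tie at s = R.
    if both_high >= undercut and both_low >= overcut:
        return "coordination game"
    return "other"

for s in (1, 2, 3, 5):
    print(s, classify_pair(50, s, R=2))
# Expected: s = 1 gives a Prisoner's Dilemma (s <= R - 1),
# while s >= R gives a coordination game.
```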

Figure 1

Payoff matrix of the Traveler’s Dilemma. Visualisation of the payoff scheme described by Eq. (1). For simplicity, the action space is \( \{n_i \in {\mathbb {N}} \mid 2 \le n_i \le 100\}\) and the reward parameter is \(R=2\). The Traveler’s Dilemma can be decomposed into a local and a global game. At a local level the game is a Prisoner’s Dilemma (black boxes); this is the case when the action space is reduced to any pair of actions n and \(n+s\), where \( 1 \le s \le R-1 \). At the global level, we observe a coordination game (gray boxes), defined by any pair of actions n and \(n+s\), where \( R \le s \le U-n\).

Paradox

Social dilemmas appear paradoxical in the sense that self-interested competing players, when rationally playing the Nash equilibrium, end up with a payoff that clearly goes against their self-interest. But with the Traveler’s Dilemma, the paradox goes further, as suggested in its original formulation3. Classical game theory proposes \(\{L,L\}\) as the Nash equilibrium of the game. However, it seems unlikely and implausible that, with R being moderately low, say \(R=2\), individuals would play the Nash equilibrium. This has been confirmed in economic experiments, where individuals rather choose values close to the upper bound of the interval. Such experiments have also shown that the chosen value depends on the reward parameter (R), where an increasing value of R shifts players’ decisions towards \(\{L,L\}\)4. Nonetheless, classical game theory states that the Nash equilibrium of the game is independent of R.

Consequently, the aim of this paper is to seek and explore simple mechanisms through which the apparently non-rational cooperative behaviour can come about. We also examine the effect of the reward parameter on the game’s outcome. Given that the Traveler’s Dilemma paradox emerges in the classical game theory framework, we analyse the game using evolutionary game theory tools5,9,10. This dynamical approach allows us to explore adaptive behaviour outside of the stationary classical game theory framework. To be more precise, in this approach individuals dynamically adjust their actions according to their payoffs.

The key point, of course, is to understand how the system can converge to high claims. We show that this behaviour is possible because the Traveler’s Dilemma can be decomposed into a local and a global game. If the payoff maximisation is constrained to the local level, then the stairway to hell leads the system to the Nash equilibrium, given that locally the game is a Prisoner’s Dilemma. On the other hand, at a global level the game follows a coordination game structure, where the high claim actions are payoff dominant. Thus, for the system to reach a high claim equilibrium, the maximisation process needs to jump from the local to the global level.

Our analysis led us to identify the mechanism of diversity as responsible for enabling this jump and preventing players from going down the stairway to hell. This mechanism works on the idea that to reach a high claim equilibrium, players have to benefit from playing a high claim. For a population setting, it means that players need to have the chance to encounter opponents also playing high. From a learning model point of view, it refers to the belief that the opponent will also play high, at least with a certain probability. If the belief is shared by both players, they should both play high and reach the cooperative equilibrium. Here, we explore these two types of models to unveil the mechanism leading to cooperation.

Population based models unveil diversity as the cooperative mechanism via the effect of mutations on the game’s outcome. This is shown for the replicator-mutator equation and the Wright–Fisher model. Similarly, a two-player learning model approach, more in line with human reasoning, shows that if players are free to adopt a higher payoff action from a diverse action set during their introspection process, they can reach the cooperative equilibrium. This result is obtained using introspection dynamics.

Finally, we explain how diversity is the underlying mechanism that enables the convergence to high claims in previously proposed models. To be more precise, we show that diversity is required because it allows for the maximisation process to jump from the local to the global level.

Replicator–mutator equation

Formulation

The replicator-mutator equation is a generalization of the replicator dynamics, a fundamental model of selection in a population. The replicator dynamics describes the evolution of an infinite population of n different types whose frequencies are \(x_1,\ldots ,x_n\). To model selection, the reproduction rate of each type i is determined by its fitness \(f_i\), which is derived from the payoff matrix of interactions among individuals in the population11. In addition to the typical formulation of the replicator dynamics, the equation includes mutations via the mutation matrix, \(Q = (q_{ji})\), where \(q_{ji}\) is the probability that type j mutates to type i12,13. Consequently, the mutation matrix Q is a row-stochastic matrix. Thus the replicator-mutator equation is described as

$$\begin{aligned} \dot{x_i} = \sum ^{n}_{j=1}x_jf_j(\vec {x})q_{ji} - x_i\phi \quad i=1,\ldots ,n \end{aligned}$$
(2)

where \(\vec {x}=(x_1,x_2,\ldots ,x_n)\) and \(\phi =\sum ^{n}_{i=1}x_if_i(\vec {x})\) denotes the average population fitness.

To model mutations, we consider a uniform random probability of mutating to other types, i.e.

$$\begin{aligned} q_{ij}=\frac{q}{n-1} \quad (i \ne j), \qquad q_{ii} = 1-q, \qquad 1\le i, j \le n \end{aligned}$$
(3)

The mutation parameter q can take values from 0 to \(\frac{n-1}{n}\). When \(q=0\), the dynamics corresponds to the replicator dynamics, such that no mutations occur. For \(q=\frac{n-1}{n}\), type i is equally likely to change to any other type or to remain as it is14.
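
As an illustration, Eqs. (2) and (3) can be integrated numerically. The sketch below assumes that the fitness of a type is its expected payoff against the current population composition; the parameter values and integration horizon are chosen purely for illustration.

```python
# Numerical sketch of the replicator-mutator equation (Eq. 2) with the uniform
# mutation matrix of Eq. (3). Parameters and horizon are illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

L_claim, U_claim, R, q = 2, 100, 2.0, 0.2
claims = np.arange(L_claim, U_claim + 1)
n = len(claims)                                  # 99 types

# Payoff matrix A[i, j] of claim i against claim j (Eq. 1).
A = np.where(claims[:, None] == claims[None, :], claims[:, None],
             np.where(claims[:, None] < claims[None, :],
                      claims[:, None] + R, claims[None, :] - R))

# Uniform mutation matrix Q (Eq. 3): q_ii = 1 - q, q_ij = q / (n - 1).
Q = np.full((n, n), q / (n - 1))
np.fill_diagonal(Q, 1.0 - q)

def rhs(_, x):
    f = A @ x                                    # fitness of each type
    phi = x @ f                                  # average population fitness
    return (x * f) @ Q - phi * x                 # replicator-mutator equation

x0 = np.full(n, 1.0 / n)                         # uniform initial condition
sol = solve_ivp(rhs, (0.0, 200.0), x0, rtol=1e-8)
x_eq = sol.y[:, -1]
print("claim with highest frequency:", claims[np.argmax(x_eq)])
print("average claim:", float(claims @ x_eq))
```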

Results

The replicator dynamics limit (\(q=0\)) results are in agreement with classical game theory predictions. As expected, L is the claim that dominates the population regardless of the reward parameter value (R). Specifically, for this dynamics the stairway to hell takes place, where dominated strategies are systematically eliminated through extinction until only the Nash equilibrium claim remains in the population (Fig. 2a). This behaviour is related to the theorem proposed in Hofbauer and Sandholm15, where evolutionary game dynamics—like the replicator dynamics—successfully eliminate strictly dominated strategies, given that they fulfil a set of requirements.

When mutations are allowed (\(q\ne 0\)), a different behaviour can be observed. The system reaches a coexistence equilibrium, where there is a claim with a higher frequency than the others. No extinction or dominance of actions takes place, due to the sustained presence of mutation events. Nonetheless, two types of this behaviour can be identified: for type 1 the system converges to a coexistence equilibrium where the frequency values of all claims are evenly distributed (Fig. 2b). For type 2, there is a single claim that nearly dominates the population given that its frequency is much larger than the rest (Fig. 2c).

To characterise how the system’s equilibrium changes according to the parameter values, the claim with the highest frequency, the average claim and the average payoff were obtained for different reward (R) and mutation strength (q) values. Figure 2d shows that the claim with the highest frequency becomes larger for lower reward values (\(c_{highFreq} \sim 1/R\)). It also suggests that for reward values greater than \(R_{B}\approx 50\sqrt{q}\), the claim with the highest frequency is always 2. This result has been obtained by estimating the best fit for the boundary curve, due to the nature of the system of equations in question. As expressed by Eq. (2), the system consists of 99 coupled nonlinear differential equations. Consequently, a numerical approach is a natural method to find the solution of the system, and therefore a numerical fit is appropriate to estimate the boundary expression.

On the other hand, the results for the average claim of the population (Fig. 2e) imply that there is a boundary dividing the parameter space into regions in which the system exhibits either type 1 or type 2 behaviour. In particular, for \(R \lessapprox 20\), the boundary is \(R_{B}\approx 50\sqrt{q}\), and for \(R \gtrapprox 20\), the boundary is \(R_{B}\approx 50q+10\). That is, if \(R<R_{B}\), then type 1 is observed, while if \(R>R_{B}\), then type 2 is observed. These boundaries were also obtained by estimating the best fit for the curves. The same results can be found via the average payoff of the population, defined as \(\sum ^{n}_{i=1}x_if_i(\vec {x})\) (Fig. 2f). The reason for this similarity is the close relationship between the claims (\(n_i\) and \(n_j\)) and the payoffs (\(\pi _{ij}\)), as shown in Eq. (1).

In general terms, the boundaries suggest that there is a critical mutation parameter value (q) for every reward value. This means that there is a minimum rate of mutations required for the maximisation process to jump from the local to the global level. As expected, for high reward values the system mainly converges to L, as the local level takes over the payoff matrix (see Fig. 1).

Overall, the results present the same outcome as economic experiments in terms of the reward parameter, where a low R value is needed for high claims to be played most frequently. Additionally, they reveal that mutations are necessary for high claims to be advantageous in the population. This means that diversity should be maintained in the population for high claims to be favourable.

Figure 2

Replicator-mutator equation for the Traveler’s Dilemma. (a) Replicator dynamics limit (\(q=0\)). The dynamics show that dominated actions are systematically eliminated from the population. Only L remains and dominates the population. Without loss of generality, here the interval [2, 10] is considered to visualise the systematic elimination of actions on the same time scale. (b) Type 1 for the replicator-mutator dynamics. The system converges to a coexistence equilibrium where the frequency values of all claims are evenly distributed. (\(q=0.7\), \(R=2\), and \(c_{highFreq} = 96\)). (c) Type 2 for the replicator-mutator dynamics. There is a single claim that nearly dominates the population, as its frequency is much larger than the rest. (\(q=0.2\), \(R=30\), and \(c_{highFreq} = 2\)). (d) Claim with highest frequency, (e) average claim and (f) average payoff obtained by the replicator-mutator dynamics for different reward parameters (R) and mutation strength (q) values. Overall, the resulting dynamics reveal that mutations are necessary for high claims to be advantageous in the population. This means that diversity should be maintained in the population for high claims to be favourable.

Wright–Fisher model

Model

The basic model for the evolution of populations including genetic drift, mutation and selection is the Wright–Fisher model, see e.g.16,17,18,19. It is therefore natural to also analyse the Traveler’s Dilemma within that framework. In particular, our aim is to further explore how the population can converge to high claims.

The Wright–Fisher model describes the evolution of a population with discrete, non-overlapping generations of fixed size N. From the current generation, the next generation is obtained by sampling with replacement, with sampling weights proportional to the fitness of the individuals. For each sampled individual there is a probability \(\mu \) of mutation.

In our context, an individual is a player of the Traveler’s Dilemma with a certain value as her claim20, such that a mutation changes the value of the claim by \(\pm \delta \), unless this would result in a value smaller than L or larger than U, in which case L or U, respectively, is chosen.

The expected payoff of each individual is determined by playing the Traveler’s Dilemma with the other individuals present in the population, excluding self-interactions. The accumulated payoff \(\Pi \) then defines the fitness of an individual, that is, the sampling weight in the replacement scheme, via

$$\begin{aligned} f = e^{\rho \Pi } \end{aligned}$$
(4)

where the selection intensity coefficient \(\rho \) modulates the magnitude of the impact of fitness differences in the game dynamics. The larger the selection intensity, the stronger the influence of fitness differences and the less important the role of stochasticity in the dynamics.

The process of fitness dependent sampling with replacement and mutation probability \(\mu \) is repeated for t time steps. We fix the population size to \(N=100\) and consider \(\mu , \delta , \rho \) as variable parameters. We end the simulation after \(t=1000\) generations, because the dynamics converges well before that.
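
A minimal sketch of one generation of this process is given below. It assumes that a mutation shifts a claim by a uniformly drawn step of at most \(\pm \delta \) (our reading of the maximal mutation size) and that sampling weights follow Eq. (4); the parameter values are chosen only for illustration.

```python
# One Wright-Fisher generation: fitness-weighted sampling with replacement,
# followed by mutation. Mutation scheme and defaults are our assumptions.
import numpy as np

rng = np.random.default_rng(0)

def payoff(n_i, n_j, R):
    if n_i == n_j:
        return n_i
    return n_i + R if n_i < n_j else n_j - R

def accumulated_payoffs(pop, R):
    """Total payoff of each individual against all others (no self-interaction)."""
    return np.array([sum(payoff(a, b, R) for j, b in enumerate(pop) if j != i)
                     for i, a in enumerate(pop)])

def next_generation(pop, R=2, rho=0.1, mu=0.05, delta=10, L=2, U=100):
    Pi = accumulated_payoffs(pop, R)
    # Eq. (4); shifting by the maximum leaves the relative weights unchanged
    # and avoids numerical overflow.
    w = np.exp(rho * (Pi - Pi.max()))
    offspring = rng.choice(pop, size=len(pop), replace=True, p=w / w.sum())
    mutate = rng.random(len(pop)) < mu                      # mutation with probability mu
    steps = rng.integers(-delta, delta + 1, size=len(pop))  # step of at most +/- delta
    offspring[mutate] = np.clip(offspring[mutate] + steps[mutate], L, U)
    return offspring

pop = np.full(100, 100)                          # population of size N = 100
for _ in range(1000):                            # t = 1000 generations
    pop = next_generation(pop)
print("average claim after 1000 generations:", pop.mean())
```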

Results

For the Wright–Fisher model, a large mutation probability, \(\mu \), and a large maximal mutation size, \(\delta \), enable the convergence to high claims (see Fig. 3). The reason for this is that mutation introduces diversity into the population: larger mutation rates and sizes lead to greater diversity and, in particular, to the appearance of high claim individuals in the population. When players using these “suboptimal actions” play against each other, a higher payoff results, both collectively and individually.

In Fig. 3, it can be observed that for large \(\rho \), the system converges to a low claim in a larger region of the parameter space. The reason for this is that a high value of \(\rho \) reduces the population’s diversity: through the exponential payoff-to-fitness mapping of Eq. (4), \(\rho \) sets the amplification of fitness differences, so a large value of \(\rho \) magnifies even small fitness differences, that is, amplifies the differences between sampling weights. Consequently, for high \(\rho \) values, a global maximisation process can only appear if the mutation rate and size are sufficiently large. Otherwise, the maximisation process will be constrained to a local level, which produces the convergence to the Nash equilibrium via the stairway to hell. It can also be noted that when \(\rho \) is low, the level of stochasticity is large, meaning that noise, rather than payoff differences, dominates the game dynamics.

Additionally, the results show that for high reward values the population converges to lower claims, as is also observed in economic experiments. In particular, the outcome of the simulations suggests the following relationship between the parameters and the average claim: \(c_{avg} \sim (\mu \delta )/(R\rho )\). Similarly, it suggests that there is a boundary dividing the parameter space for which the system either converges to high claims or to L, defined by \(\delta = \alpha \mu ^{-b}\), where \(\alpha \) and b are constants such that \(\alpha \sim R\rho \) and \(b \sim \rho \) (see dotted lines in Fig. 3). This boundary results from the maximisation process shifting from the local to the global level. As suggested by the previous expression, the eccentricity of the boundary is directly proportional to the selection intensity coefficient. The reason for this is that the selection intensity is inversely proportional to the population’s diversity, so for lower values of \(\rho \) it is easier to reach a high claim equilibrium. However, when the maximal mutation size is low, the system cannot easily escape the local game and jump to the global game.

The boundary formula also suggests that when the reward value increases, the semi-major axis of the hyperbola that describes the boundary also increases, because for a greater reward value the region of the local game becomes larger (see Fig. 1). This means that greater values of \(\delta \) and \(\mu \) are needed to change from the local to the global maximisation process.

It should be noted that the average claim is equivalent to the average payoff of the population, given that a single claim dominates the population in the equilibrium.

On the whole, the Wright–Fisher model unveils that mere heterogeneity in the population is not sufficient to enable the convergence to high claims. This heterogeneity should be characterised by being diverse, i.e. there should be a presence of claims that are distant enough from each other in the action space, in order for the cooperative equilibrium to be reached. In fact, the model shows how the local maximisation process can be escaped thanks to diversity, introduced by mutations, in a population based model including genetic drift and selection.

Figure 3

Wright–Fisher model with selection and mutation for the Traveler’s Dilemma. The graphs show the average claim of a population of size \(N=100\) after \(t=1000\) time steps, for different reward and selection intensity values: (a) \(R=2, \rho =1.0\) (b) \(R=2, \rho =0.1\) (c) \(R=2, \rho =0.01\), and (d) \(R=5, \rho =1.0\) (e) \(R=5, \rho =0.1\) (f) \(R=5, \rho =0.01\). In general, a large mutation probability (\(\mu \)) and a large maximal mutation size (\(\delta \)) allow for convergence to high claims, given that these quantities are proportional to the diversity present in the population. Particularly, there is a boundary dividing the parameter space for which the system either converges to high claims or to L, as shown by the dotted lines. For the strong selection regime, the system tends to converge to low claims because a high value of \(\rho \) reduces the effective population diversity by magnifying even small payoff differences. For very low \(\rho \) values, the level of stochasticity increases, given that the selection intensity parameter modulates the impact of fitness differences in the game dynamics. Finally, for high reward values the population converges to lower claims, as it is observed in economic experiments.

Introspection dynamics

Model

Introspection dynamics is a learning model used to describe adaptive behaviour via introspective reasoning21,22. The basic idea is that a player, after considering her payoff in an encounter with an opponent, checks whether a randomly chosen alternative action (sampled from the uniform distribution on the interval \([L, U]\)) would have given her a better payoff against the given action chosen by her opponent. This process is called introspection. The alternative action is then adopted with a probability depending on the payoff difference. Thus, when the payoff of the realised action is \(\pi \), while the alternative action would have yielded \({\tilde{\pi }}\), the probability of adopting the latter action is computed using the Fermi function23,24,25

$$\begin{aligned} p = \frac{1}{1+e^{- \beta ({\tilde{\pi }} - \pi )}} \end{aligned}$$
(5)

with a selection intensity coefficient \(\beta \). For weak selection (\(\beta \rightarrow 0\)), payoffs are not important in the learning process, as any alternative action is randomly adopted. For strong selection (\(\beta \rightarrow \infty \)), players adopt the alternative action only if its payoff at least matches the realised payoff22. Therefore, \(\beta \) determines the magnitude of the impact that payoffs have on the game dynamics. The parameter \(\beta \) and the parameter \(\rho \) of the Wright–Fisher model have different effects on the game dynamics, even though they share the same interpretation.

In this learning model, there are only two players that from time to time perform the introspection process, i.e. consider the adoption of a counterfactual action given that their opponent’s action is fixed. For the Traveler’s Dilemma, the model is used in the context of a one-shot game, where players adjust their behaviour using introspection.
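
A single introspection step can be sketched as follows. The code assumes that the alternative claim is drawn uniformly from the full action set (possibly including the current claim) and that the two players update in alternation; both are simplifications made only for illustration.

```python
# One introspection step: compare a random alternative claim against the
# current one (opponent fixed) and adopt it with the Fermi probability (Eq. 5).
import math
import random

def payoff(n_i, n_j, R=2):
    if n_i == n_j:
        return n_i
    return n_i + R if n_i < n_j else n_j - R

def introspection_step(own, opponent, beta=1.0, L=2, U=100, R=2):
    alternative = random.randint(L, U)                     # counterfactual claim
    gain = payoff(alternative, opponent, R) - payoff(own, opponent, R)
    p = 1.0 / (1.0 + math.exp(-beta * gain))               # Fermi function, Eq. (5)
    return alternative if random.random() < p else own

# Two players updating in alternation from the lowest claim; purely illustrative.
a, b = 2, 2
for _ in range(10_000):
    a = introspection_step(a, b)
    b = introspection_step(b, a)
print(a, b)  # for beta = 1 and R = 2 the claims are expected to wander into
             # the high claim region (cf. Fig. 4a)
```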

While the proposed model may be stylised and coarse, the principle is not psychologically implausible, as humans can both evaluate putative alternatives and perform a coarse choice when the number of details in the decision process, here the number of possible claims, becomes too large.

The model thus assumes a rather simple cognitive mechanism, but as we shall see, it can lead to the emergence of cooperation. This is a novel approach and result in the context of the Traveler’s Dilemma, where noise and errors have been previously proposed as the cause for this behaviour.

Results

For the introspection dynamics, the system converges to large claims in the strong selection regime for low values of R (see Fig. 4a). The dynamics, however, does not converge to a specific value, but instead oscillates in the high claim region. This is expected from the way the optimisation process operates: first, the global optimisation is performed; then the model shifts to a local maximisation. Given that at a local level the system does not have a maximum, due to the stairway to hell, the system exhibits oscillations in the large claim region. For the weak selection regime, as expected, the system becomes increasingly stochastic, until it comes to fluctuate across the whole action space (see Fig. 4b).

The stationary distribution of actions, derived using the main analytical result of Couto, Giaimo and Hilbe22, allows us to calculate the average claim and average payoff for different values of the reward parameter (R) and the selection intensity coefficient (\(\beta \)). As shown in Fig. 4c,d, strong selection and a low reward value are requirements for the convergence to high claims. This result is in agreement with findings from economic experiments, where individuals choose high claims when the reward parameter is low. Such a decision is purely motivated by payoff maximisation, as in the case of strong selection, and not by mistakes or noise in the decision-making process.

To be more precise, for strong selection, the average claim follows the relation \(c_{avg} \sim 1/R\). In particular, \(R_{B} \approx 9/\beta \) describes the boundary such that for \(R > R_{B}\) the system always converges to 2. This boundary is obtained by estimating the best curve that fits the borderline. It should be noted that the complexity of the system is high: there are 99 possible actions per player, so computing the invariant distribution requires inverting a \(9801 \times 9801\) matrix. This calculation is done numerically, which makes a numerical fit an appropriate method to find the borderline. Overall, this boundary suggests that above a critical reward value the system converges to the lowest possible claim, because the maximisation process is constrained to the local level. This can be seen more clearly in Fig. 1, where it is explained that a larger R increases the region of the local game. On the other hand, for the weak selection regime, the dynamics is characterised by stochasticity for all values of R. It should be noted that the same general results hold for the average payoff, because the payoff values are tightly related to the claim values, as explained by Eq. (1).
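
As an illustration of this numerical route (and not of the analytical formula of Couto, Giaimo and Hilbe22), the sketch below builds the transition matrix of the introspection chain over joint claim pairs and extracts its stationary distribution. It assumes that one randomly chosen player updates per step and that the alternative claim is drawn uniformly from the full action set; a small upper bound U is used only to keep the matrix manageable (the full analysis uses \(U=100\), i.e. a \(9801 \times 9801\) matrix).

```python
# Stationary distribution of the introspection chain over joint claim pairs
# (a, b); update scheme and small U are our assumptions for illustration.
import numpy as np

L_claim, U_claim, R, beta = 2, 20, 2, 1.0
claims = np.arange(L_claim, U_claim + 1)
m = len(claims)

def payoff(n_i, n_j):
    if n_i == n_j:
        return n_i
    return n_i + R if n_i < n_j else n_j - R

def fermi(gain):
    return 1.0 / (1.0 + np.exp(-beta * gain))

# Transition matrix over states (a, b), indexed as a * m + b.
T = np.zeros((m * m, m * m))
for ia, a in enumerate(claims):
    for ib, b in enumerate(claims):
        s = ia * m + ib
        for ik, k in enumerate(claims):
            # Player 1 considers switching a -> k (probability 1/2 * 1/m * p).
            T[s, ik * m + ib] += 0.5 / m * fermi(payoff(k, b) - payoff(a, b))
            # Player 2 considers switching b -> k.
            T[s, ia * m + ik] += 0.5 / m * fermi(payoff(k, a) - payoff(b, a))
        T[s, s] += 1.0 - T[s].sum()   # remaining mass: the current pair is kept

# Stationary distribution: left eigenvector of T for eigenvalue 1.
vals, vecs = np.linalg.eig(T.T)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v = np.abs(v) / np.abs(v).sum()
avg_claim = sum(v[ia * m + ib] * (a + b) / 2
                for ia, a in enumerate(claims) for ib, b in enumerate(claims))
print("average claim in the stationary distribution:", avg_claim)
```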

Once again it is found that for the convergence to large claims some diversity is needed, that is, the presence of a certain fraction of large claim players or a sufficiently high probability that agents select large claims. The random adoption of an action that outperforms the current one in the introspection dynamics generates and maintains such diversity.

Thus, the parameters \(\rho \) and \(\beta \) have overall different effects on the system’s dynamics. For the Wright–Fisher scenario, high values of \(\rho \) prevent a global maximisation process by reducing the effective diversity of the population. On the other hand, \(\beta \) does not have any effect on the diversity for the introspection dynamics, because the alternative random strategy is always chosen from a uniform distribution regardless of the magnitude of \(\beta \). However, it should be noted that \(\rho \) and \(\beta \) share the same interpretation, as they are both parameters that modulate the impact of payoffs on the game dynamics. In particular, when \(\rho \) or \(\beta \) have low values, the level of stochasticity is large, which means that noise, rather than payoff differences, dominates the game dynamics.

Finally, we should note that, in contrast to our findings for the Traveler’s Dilemma, cooperation does not emerge in the Prisoner’s Dilemma when played using introspection dynamics. There, both players learn to defect at all times, that is, play Nash, in the limit of strong selection22. Therefore, diversity is only effective in promoting cooperation when the structure of the game allows for it. The Traveler’s Dilemma is composed of a local and a global game, where jumping from the local to the global level maximisation is required to reach a cooperative equilibrium.

Figure 4

Introspection dynamics for the Traveler’s Dilemma. Dynamics of the system for strong, (a) \(\beta =1.0\), and weak selection intensity, (b) \(\beta =0.1\), both for \(R=2\). For the strong selection regime, the system converges to large claims thanks to the global optimisation process. In particular, the system oscillates in the high claim region, given that at a local level the system does not have a maximum. On the other hand, for weak selection the system becomes increasingly stochastic, until it fluctuates across the whole action space. The stationary distribution of actions allows the analytical calculation of the (c) average claim and the (d) average payoff for different reward parameter (R) and selection intensity (\(\beta \)) values. It can be seen that strong selection and a low reward are needed for the convergence to high claims.

Comparison to other models

Previous models propose different methods to explain the convergence to high claims in the Traveler’s Dilemma. From our perspective, however, they operate on the same underlying mechanism to allow for cooperation, namely diversity enabling a global maximisation to dominate over a local optimisation and thereby preventing the stairway to hell.

Two popular models are Goeree and Holt26, and Manapat et al.27 The first is a dynamic learning model with a logit rule, for which a player’s decision probabilities are proportional to an exponential function of expected payoffs. In general, Goeree and Holt26 show that “noisy learning” allows for players to converge to high claims. This means that when players have the possibility to adopt non-Nash or “suboptimal” strategies, due to a “noisy learning process”, the system can converge to high values. From our perspective, this is equivalent to having diversity in the game such that a global maximisation process occurs.

In comparison to the logit rule learning model, the introspection dynamics allows a better understanding of the paradox posed by the Traveler’s Dilemma. The reason for this is that the introspection dynamics separates the notion of rationality from the mechanism of diversity. If a rational player is defined as one who maximises her own payoff, then for strong selection (\(\beta \rightarrow \infty \)) players act rationally, as they always choose the claim with the higher payoff in the introspection process. This allows diversity to be introduced through the randomly chosen alternative action of the introspection step. Conversely, the logit rule introduces diversity via errors in the payoff maximisation.

In terms of evolutionary game theory models, in the stochastic approach formulated by Manapat et al.27 mutations lead to high claims. Overall, the results we obtained are in agreement with theirs, as they also find it necessary to include mutations in their model to reach large values. Moreover, both approaches exhibit the same behaviour in terms of the selection intensity coefficient. Nonetheless, in Manapat et al. diversity is not identified as the general underlying mechanism for the convergence to high claims. We have explained in the Wright–Fisher model section how diversity enables the jump towards cooperation in a population, when the Traveler’s Dilemma is analysed using similar stochastic evolutionary dynamics.

Discussion

We have shown that diversity can enable cooperation to emerge in a game like the Traveler’s Dilemma. This mechanism avoids the paradox of classical game theory where supposedly rational players inevitably end up in a Nash equilibrium, which neither maximises the individual nor the collective payoff. As already pointed out by Kaushik Basu3, and as confirmed by economic experiments4, such behaviour seems implausible for human players when faced with this game.

To analyse the dilemma, we proposed its decomposition into a local and a global game. Locally, the game structure is that of a Prisoner’s Dilemma, while globally it is a coordination game. In consequence, for a cooperative high claim equilibrium to be reached, it is necessary to jump from the local to the global payoff maximisation. In order to explore this idea, we used a dynamical approach, including population based models and a learning model: the replicator-mutator equation and the Wright–Fisher model for the former, and introspection dynamics for the latter. Comparing the results allowed us to identify a general mechanism that may lead to the convergence to strategies far from the Nash equilibrium that leave the players much better off. Such a mechanism that favours cooperation is diversity. Here, diversity, understood as the presence of “suboptimal strategies”, allows the jump from the local to the global level maximisation. To be more precise, there should be a sustained heterogeneity characterised by claims that are distant from each other in the action space. Overall this can lead to cooperation because when such “suboptimal strategies” play against each other, it is possible to reach higher payoffs, both collectively and individually.

Additionally, we found that for all models the equilibrium claim is inversely proportional to the reward value (\(c \sim 1/R\)), which implies that R delimits the competition between the local and the global level. The stark borderline that appears in all models, dividing the convergence to high claims from the convergence to the lowest possible claim, is evidence for this. Indeed, R defines the region in the action space for which the Traveler’s Dilemma is no longer a Prisoner’s Dilemma and becomes a coordination game. This result is in agreement with economic experiments, where it is shown that for increasing reward values the chosen claim decreases4.

This perspective also allowed us to identify diversity as the main mechanism of previous models used to analyse the Traveler’s Dilemma26,27, although these models are very different.

Our approach may also help to better understand how humans play a game like the Traveler’s Dilemma4, typically settling on high claims instead of going to the Nash equilibrium. While the introspection dynamics is rather stylised and coarse, it is not completely implausible as humans also tend to first make a coarse decision when faced with a large number of options. And in a game like the Traveler’s Dilemma, a belief that opponents play “suboptimal” actions may become self-confirming when it leads to higher payoffs. From our perspective, it suffices that the probability to encounter a suboptimal high claim action is sufficiently high. This is ensured by diversity, as previously explained.

Overall, the mechanism of diversity proposed here benefits players both at a collective and at an individual level. Therefore, exploring other games that share the same characteristic of having conflicting interests between a local and a global payoff maximisation might be helpful to understand and induce cooperation in scenarios such as price wars: situations where undercutting the opponent is tempting, but reaching a cooperative equilibrium would be beneficial for everybody.