Correlation and unmediated cheap talk in repeated games with imperfect monitoring

  • Original Paper
International Journal of Game Theory

Abstract

This paper studies n-player \((n\ge 3)\) undiscounted repeated games with imperfect monitoring. We prove that all uniform communication equilibrium payoffs of a repeated game can be obtained as Nash equilibrium payoffs of the game extended by unmediated cheap talk. We also show that all uniform communication equilibrium payoffs of a repeated game can be reached as Nash equilibrium payoffs of the game extended by a pre-play correlation device and a cheap-talk procedure that only involves public messages; furthermore, in the case of imperfect public and deterministic signals, no cheap talk is conducted on the equilibrium path.

Notes

  1. There are real world examples of mediated communications that seem to help collusion or cooperation, such as those in many cartels (see Harrington 2006; Marshall and Marx 2012) and the World Trade Organization (see Bagwell and Staiger 2009).

  2. See also Mertens et al. (2014) for a similar characterization of uniform communication equilibrium payoff set for the two-player case.

  3. Uniform equilibrium is a widely adopted equilibrium notion for the undiscounted infinitely repeated game. It is an approximate Nash equilibrium of the undiscounted finitely repeated game with sufficiently many stages; it is also an approximate Nash equilibrium of the discounted infinitely repeated game for all large discount factors. See Mertens et al. (2014) for a comprehensive discussion of the uniform equilibrium concept.

  4. There is a large literature on collusion in repeated games. See Awaya and Krishna (2016) for a recent study.

  5. Starting from Barany (1992), there is also an extensive literature studying the implementation of correlated and communication equilibria in normal-form games by unmediated cheap talk. Subsequent contributions include Forges (1990), Ben-Porath (1998, 2003), Urbano and Vila (2002), Gerardi (2004), Krishna (2007), Vida and Forges (2013), etc. See also Forges (2009) and Halpern (2008) for detailed surveys and the references therein.

  6. In particular, we do not impose any computational restrictions on the players.

  7. See also the protocols of Chaum et al. (1988), and of Beaver (1991).

  8. Abraham et al. (2006), Abraham et al. (2008), and Heller (2010) study cheap-talk implementation of correlated equilibria with MPC protocols in normal-form games. Heller et al. (2012) extend the result to multistage games with perfect information. See also Dodis and Rabin (2007) and Katz (2008) for surveys on the connections between game theory and cryptography.

  9. Urbano and Vila (2004) also apply cryptographic tools to show that any correlated equilibrium payoff of two-player repeated games with imperfect monitoring can be implemented by unmediated communication under the assumption that players are computationally restricted.

  10. See Sect. 6.2 for further comparisons with the existing literature.

  11. It follows from the argument in Myerson (1982) (page 71) that in this formulation a communication device allows for multi-round communications at every stage game.

  12. Multi-round cheap talk in the current paper mainly serves two purposes: (1) detecting deviations: players can announce previous signals to check whether a deviation occurs; (2) facilitating multi-party computation: players exchange inputs for computation and then share outputs. It plays a different role than that in the literature on long cheap talk in sender-receiver games with incomplete information such as Aumann and Hart (2003) and Krishna and Morgan (2004), where random message exchanges improve information transmission as a coordination device.

  13. See Sorin (1990), Fudenberg and Levine (1991), and also Lemma 2.15 in Renault and Tomala (2011).

  14. Applying a revelation principle in Forges (1986) and Mertens et al. (2014), Renault and Tomala (2004) consider canonical communication equilibria of the repeated game. They do not address the question of how to get rid of the mediator, which is the main focus of the current paper.

  15. See also Renault and Tomala (2004) for details.

  16. In repeated games with imperfect monitoring, equilibrium play can be used as an endogenous correlation device; but the extent to which such correlation can replace the mediator is unknown in general. See Lehrer (1990, 1991) and Gossner and Tomala (2007) for some progress on this issue.

  17. This result also holds in the two-player case. See Lehrer (1992).

  18. Although we assume that there are at least three players throughout the paper, some results here hold for the two-player case too. Thus in this and the next sections, if a result requires at least three players then we emphasize it in the statement of the result.

  19. Recall that almost public cheap talk involves a pre-play plain conversation phase with public and private messages and only one round of public conversation at each subsequent stage, and that a cheap-talk extension involves several rounds of plain conversations with public and private messages at every stage.

  20. Both protocols are valid under the assumption that there are at least three players.

  21. Alternatively, one can adopt the protocols in Barany (1992) (for the case with four or more players) or Ben-Porath (1998) (for the case with three players) to prove Theorem 4.2, under the assumption that players can publicly verify their past communication records. Public verification means that each player has the option to ask for the revelation of past messages to all players, and these messages are assumed to be recorded. That is, all other players will detect any unilateral deviation of a player at the communication stage, if they choose to verify past records. Both Barany (1992) and Ben-Porath (1998) make this assumption in their constructions of the protocols.

  22. See Tomala (1998) for a characterization of pure-strategy uniform equilibria in repeated games with public monitoring.

  23. Such a result is known for extensive-form games with perfect information. See Myerson (1986).

  24. I thank an anonymous referee for encouraging me to include this example.

  25. This approachability argument closely follows the analysis in Example 3.7 in Renault and Tomala (2004).

  26. Since in this example player 3 only plays one of two actions in the punishment phase, it is enough to draw two numbers. In general, the device draws \(|A^i|\) numbers for each encryption.

  27. While in this example it is always optimal for player 4 to be truthful, in general we need to guarantee that players do not have incentives to announce invalid encoding keys.

  28. See also Renault (2000).

  29. Here \({\mathbf {1}}_{\{\cdot \}}\) denotes the Dirac measure.

  30. Note that this is not a problem in the 2-player case.

  31. For example, Awaya and Krishna (2016) show that in a class of discounted repeated games, public communication between stages can lead to more collusion.

  32. I thank an anonymous referee for encouraging this comparison.

  33. The application of the secure MPC is the same as in Heller et al. (2012); the “signal sub-phase” in our proof of Theorem 4.3 resembles the “monitoring of the previous stage” through public announcement in Heller et al. (2012) and Solan and Vieille (2002).

  34. See Compte (1998), Kandori and Matsushima (1998), Obara (2009), and Zheng (2008).

  35. See Sugaya (2015) and Sugaya and Wolitzky (2015) for recent developments on discounted repeated games with imperfect private monitoring, both with and without a mediator.

  36. This does not contradict the statement that the obedient and truthful strategy profile is a uniform equilibrium with the correlation device and public cheap talk: since recommendations are drawn from identical and independent distributions, the probability of such “unlucky” draws can be made arbitrarily small provided that there are sufficiently many stages, and expected payoffs in the repeated game are evaluated ex ante in a uniform equilibrium. See Lehrer (1992) and Mertens et al. (2014) for similar constructions.

  37. See also Lemma 2.13 in Renault and Tomala (2011).

  38. In the case with stochastic signals, punishments occur even on the equilibrium path. Thus one needs to modify \(P_{III}\) so that the correlated minmax action profile \(a^{-i}\) is drawn independently T times.

  39. Specifically, if player \(i+1\) announces a key \(l^i\) different from \(l^i[k]\), then the probability that his announcement is in the set \(\{l^i[1],\ldots ,l^i[K^i]\}\) is at most \((K^i-1)/(L^i-1)<\varepsilon /M\). That is, such a deviation is detected by all other players with probability at least \(1-\varepsilon /M\).

  40. The idea of putting a small probability on all possible actions to improve monitoring was first used in Lehrer (1992).

  41. Both (1) and (2) in this part are adapted from Renault and Tomala (2004), with the error terms adjusted to fit in the rest of the proofs.

  42. It is at least theoretically appealing to have infinitely many rounds of cheap talk as opposed to a given finite upper bound on the length. Since such cheap talk has no payoff consequence, one can think of players making moves “at infinity,” as in Aumann and Hart (2003) and Vida and Forges (2013). See Section 4.2 in Aumann and Hart (2003) for further discussion on the infinite length of the cheap talk phase. One distinction, compared with Aumann and Hart (2003) and Vida and Forges (2013), is that since we study infinitely repeated games, we always need unbounded cheap talk. Alternatively, one could weaken the pre-play cheap-talk requirement and let players perform MPC finitely many times at the beginning of each block that consists of T stages of play.

  43. The result then follows by applying Proposition 4.5 in Mertens et al. (2014).

  44. Note that by construction both \(l^i_t\) and \(k^i_t\) are uniformly distributed.

  45. Since draws are i.i.d., this follows from Chebyshev’s inequality or Azuma-Hoeffding inequality. See for example Renault (2000).

References

  • Abraham I, Dolev D, Gonen R, Halpern J (2006) Distributed computing meets game theory: robust mechanisms for rational secret sharing and multiparty computation. In: Proc. 25 ACM symp. Principles of distributed computing, pp 53–62

  • Abraham I, Dolev D, Halpern J (2008) Lower bounds on implementing robust and resilient mediators. In: TCC

  • Aumann R (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55:1–18

  • Aumann R, Hart S (2003) Long cheap talk. Econometrica 71:1619–1660

  • Aumann R, Maschler M (1995) Repeated games with incomplete information. The MIT Press, Cambridge

  • Awaya Y, Krishna V (2016) On communication and collusion. Am Econ Rev 106(2):285–315

  • Bagwell K, Staiger R (2009) The WTO: theory and practice, No. w15445. National Bureau of Economic Research

  • Barany I (1992) Fair distribution protocols or how players replace fortune. Math Oper Res 17:327–340

  • Beaver D (1991) Secure multiparty protocols and zero-knowledge proof systems tolerating a faulty minority. J Cryptol 4:75–122

  • Ben-Or M, Goldwasser S, Wigderson A (1988) Completeness theorems for non-cryptographic fault-tolerant distributed computation. In: Proc. 20 STOC ACM, pp 1–10

  • Ben-Porath E (1998) Correlation without mediation: expanding the set of equilibrium outcomes by cheap pre-play procedures. J Econ Theory 80:108–122

  • Ben-Porath E (2003) Cheap talk in games with incomplete information. J Econ Theory 108:45–71

  • Blackwell D (1951) Comparison of experiments. In: Proceedings of the second Berkeley symposium on mathematical statistics and probability, University of California Press, Berkeley and Los Angeles, pp 93–102

  • Blackwell D (1956) An analog of the minmax theorem for vector payoffs. Pac J Math 6:1–8

  • Chaum D, Crepeau C, Damgard I (1988) Multiparty unconditionally secure protocols. In: Proc. 20th symp. Theory of computing, pp 11–19

  • Compte O (1998) Communication in repeated games with imperfect private monitoring. Econometrica 66:597–626

  • Deb J, González-Díaz J, Renault J (2013) Uniform folk theorems in repeated anonymous random matching games, working paper

  • Dodis Y, Rabin T (2007) Cryptography and game theory. In: Nisan N, Roughgarden T, Tardos E, Vazirani V (eds) Algorithmic game theory. Cambridge University Press, Cambridge, pp 181–205

  • Forges F (1986) An approach to communication equilibria. Econometrica 54:1375–1385

  • Forges F (1990) Universal mechanisms. Econometrica 58:1341–1364

  • Forges F (2009) Correlated equilibrium and communication in games. In: Meyers R (ed) Encyclopedia of complexity and systems science. Springer, New York, pp 1587–1596

  • Fudenberg D, Levine D (1991) An approximate folk theorem with imperfect private information. J Econ Theory 54:26–47

  • Gerardi D (2004) Unmediated communication in games with complete and incomplete information. J Econ Theory 114:104–131

  • Gossner O, Tomala T (2007) Secret correlation in repeated games with imperfect monitoring. Math Oper Res 32:413–424

  • Halpern J (2008) Computer science and game theory: a brief survey. In: Durlauf S, Blume L (eds) The new Palgrave dictionary of economics, 2nd edn. Palgrave Macmillan, Basingstoke

  • Harrington J (2006) How do cartels operate? Foundations and trends in microeconomics. Now Publishers Inc, Breda

  • Heller Y (2010) Minority-proof cheap-talk protocols. Games Econ Behav 69:394–400

  • Heller Y, Solan E, Tomala T (2012) Communication, correlation and cheap-talk in games with public information. Games Econ Behav 74:222–234

  • Kandori M, Matsushima H (1998) Private observation, communication and collusion. Econometrica 66:627–652

  • Katz J (2008) Bridging game theory and cryptography: recent results and future directions. In: 5th TCC, Springer, pp 251–272

  • Kohlberg E (1975) Optimal strategies in repeated games with incomplete information. Int J Game Theory 4:7–24

  • Krishna RV (2007) Communication in games of incomplete information: two players. J Econ Theory 132:584–592

  • Krishna V, Morgan J (2004) The art of conversation: Eliciting information from experts through multi-stage communication. J Econ Theory 117:147–179

  • Lehrer E (1990) Nash equilibria of n-player repeated games with semi-standard information. Int J Game Theory 19:191–217

  • Lehrer E (1991) Internal correlation in repeated games. Int J Game Theory 19:431–456

  • Lehrer E (1992) Correlated equilibria in two-player repeated games with nonobservable actions. Math Oper Res 17:175–199

  • Marshall R, Marx L (2012) The economics of collusion: Cartels and bidding rings. The MIT Press, Cambridge

  • Mertens J-F, Sorin S, Zamir S (2014) Repeated games. Cambridge University Press, Cambridge

  • Myerson R (1982) Optimal coordination mechanisms in generalized principal-agent problems. J Math Econ 10:67–81

  • Myerson R (1986) Multistage games with communication. Econometrica 54:323–358

  • Obara I (2009) Folk theorem with communication. J Econ Theory 144:120–134

  • Rabin T, Ben-Or M (1989) Verifiable secret sharing and multiparty protocols with honest majority. In: ACM symp. theory of computing, pp 73–85

  • Renault J (2000) On two-player repeated games with lack of information on one side and state dependent signaling. Math Oper Res 25:552–572

  • Renault J, Tomala T (2004) Communication equilibrium payoffs in repeated games with imperfect monitoring. Games Econ Behav 49:313–344

  • Renault J, Tomala T (2011) General properties of long-run games. Dyn Games Appl 1:319–350

  • Solan E (2001) Characterization of correlated equilibria in stochastic games. Int J Game Theory 30:259–277

  • Solan E, Vieille N (2002) Correlated equilibrium in stochastic games. Games Econ Behav 38:362–399

  • Sorin S (1990) Supergames. In: Ichiishi T, Neyman A, Tauman Y (eds) Game theory and applications. Academic Press, Waltham, pp 43–63

  • Sugaya T (2015) The folk theorem in repeated games with private monitoring, working paper

  • Sugaya T, Wolitzky A (2015) On the equilibrium payoff set in repeated games with imperfect private monitoring, working paper

  • Tomala T (1998) Pure equilibria of repeated games with public observation. Int J Game Theory 27:93–109

  • Tomala T (2009) Perfect communication equilibria in repeated games with imperfect monitoring. Games Econ Behav 67:682–694

  • Urbano A, Vila J (2002) Computational complexity and communication: coordination in two-player games. Econometrica 70:1893–1927

  • Urbano A, Vila J (2004) Unmediated communication in repeated games with imperfect monitoring. Games Econ Behav 46:143–173

  • Vida P, Forges F (2013) Implementation of communication equilibria by correlated cheap talk: the two-player case. Theor Econ 8:95–123

  • Zheng B (2008) Approximate efficiency in repeated games with correlated private signals. Games Econ Behav 63:406–416

Acknowledgements

I am grateful to Paulo Barelli and Hari Govindan for their guidance and encouragement. I would like to thank the associate editor, two anonymous referees, Yu Awaya, Tilman Börgers, and Tristan Tomala for helpful comments and suggestions that improved the exposition of the paper. All remaining errors are my own.

Author information

Correspondence to Heng Liu.

Appendix A: Proofs

We first present the proofs of the results in the public monitoring case. In particular, to prove Proposition 5.2, we construct correlated equilibria in which no cheap talk is used on the equilibrium path. Next we explain how to generalize this proof to establish Proposition 4.1 in the private monitoring case. We then show that Theorem 4.2 follows from Proposition 4.1 and an application of the MPC protocols introduced in the Online Appendix. Finally, we sketch the proof of Theorem 6.1 (the stochastic-signal case).

1.1 Proof of Proposition 5.1

Fix a canonical communication equilibrium \(({\bar{c}},\tau ^*)\) with payoff \(w\in C\), where \({\bar{c}}\) is a canonical communication device and \(\tau ^*\) is truthful and obedient. Under public monitoring, all players observe the same public signal at each stage. It is therefore without loss to assume that the mediator observes the realized public signals. Thus, for each \(t\), \({\bar{c}}_t\) can be viewed as a mapping from the set of past recommendations and public signals \((a_1,s_1,\ldots ,a_{t-1},s_{t-1})\) to the set of distributions of actions \(\Delta (A)\).

Now we define an autonomous device \({\hat{c}}=({\hat{c}}_t)\), which does not observe the public signals. First set \({\hat{c}}_1={\bar{c}}_1\), draw \(a_1\) from \({\hat{c}}_1\), and recommend that each player i play \(a^i_1\). Next for each \(s_1\in S\), let \({\hat{c}}_2(a_1)(s_1)={\bar{c}}_2(a_1,s_1)\), that is, \({\hat{c}}_2\) maps A into \(\Delta (A)^S\). Then draw \((a_2(s_1))_{s_1\in S}\) from \({\hat{c}}_2(a_1)\) and inform each player i of the vector \((a^i_2(s_1))_{s_1\in S}\). Inductively, for each \(t\ge 2\) and \((s_1,\ldots ,s_{t-1})\), let \({\hat{c}}_t(a_1,\ldots ,a_{t-1})(s_1,\ldots ,s_{t-1})={\bar{c}}_t(a_1,s_1,a_2(s_1),s_2,\ldots , a_{t-1}(s_1,\ldots ,s_{t-2}),s_{t-1})\), so that \({\hat{c}}_t\) maps \(A^{t-1}\) into \(\Delta (A)^{S^{t-1}}\). Similarly, draw \((a_t(s_1,\ldots ,s_{t-1}))_{(s_1,\ldots ,s_{t-1})\in S^{t-1}}\) and inform each player of the corresponding vector of recommendations.

Given the constructed autonomous device \({\hat{c}}\), the obedient strategy of each player i is to play the recommended action \(a^i_t(s_1,\ldots ,s_{t-1})\) if the realized public history is \((s_1,\ldots ,s_{t-1})\) at stage t. Since \({\hat{c}}\) is defined without referring to the actual play, no player learns more information under \({\hat{c}}\) than under \({\bar{c}}\) after any realized history at any stage. It follows that the obedient strategy profile is an equilibrium of this extended game, with the corresponding equilibrium payoff w. This completes the proof.
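
The construction of \({\hat{c}}\) can be illustrated with a minimal Python sketch over a finite horizon. Everything named here (`predraw_plan`, `toy_device`, the two-action, two-signal game) is a hypothetical illustration and not part of the paper; the sketch also assumes that the draws for different public histories are made independently, and it represents the canonical device as a function from the mediator's history \((a_1,s_1,\ldots ,a_{t-1},s_{t-1})\) to a distribution over action profiles.

```python
import itertools
import random


def draw(dist, rng):
    """Sample one outcome from `dist`, a dict mapping outcome -> probability."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def predraw_plan(c_bar, signals, horizon, rng):
    """Pre-draw a recommendation a_t(s_1..s_{t-1}) for every public history.

    c_bar(history) returns a dict over action profiles, where `history` is the
    tuple (a_1, s_1, ..., a_{t-1}, s_{t-1}) the canonical device would observe.
    The result maps each public history (s_1, ..., s_{t-1}) to an action profile.
    """
    plan = {}
    for t in range(1, horizon + 1):
        for pub in itertools.product(signals, repeat=t - 1):
            # Rebuild the mediator's history from recommendations drawn earlier.
            history = []
            for k in range(t - 1):
                history.append(plan[pub[:k]])  # a_{k+1}(s_1..s_k)
                history.append(pub[k])         # s_{k+1}
            plan[pub] = draw(c_bar(tuple(history)), rng)
    return plan


def obedient_action(plan, player, public_history):
    """Obedient strategy: play the pre-drawn recommendation for this history."""
    return plan[tuple(public_history)][player]


def toy_device(history):
    """Hypothetical device: recommend (C, C) until a bad signal 'b' has occurred."""
    signals_seen = history[1::2]
    if "b" in signals_seen:
        return {("D", "D"): 1.0}
    return {("C", "C"): 1.0}


plan = predraw_plan(toy_device, ["g", "b"], horizon=3, rng=random.Random(0))
print(obedient_action(plan, 0, ["g", "b"]))
```

The plan is drawn once before play, so the device never needs to observe the realized signals; each player only looks up the entry for the public history that actually occurred.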

1.2 Proof of Proposition 5.2

Fix any \(w\in JR\) with \(w=u(p)\) for some \(p\in {\mathcal {P}}\). We construct a correlated equilibrium with public cheap talk such that the equilibrium payoff vector is w and cheap talk is conducted only off the equilibrium path. By Proposition 4.5 in Mertens et al. (2014), it suffices to show that for any \(\varepsilon _0>0\), there exist \(T\in \mathbb {N}\) and an \(\varepsilon _0\)-correlated equilibrium with public cheap talk of the T-period finitely repeated game with payoff within \(\varepsilon _0\) of w (see Note 37). Recall that \(M=\max _{i,a}|u^i(a)|\). Let \(\varepsilon =\varepsilon _0/(1+5M)\). Let \(T_1\) be such that \(T_1\ge \max \{T_0, (M/\varepsilon )^6\}\), where \(T_0\) is defined in the “approachability strategy” \(c^a\) at the end of Sect. 3 [see inequality (6)]. Let \(T_2\) be such that \(T_2\ge T_1/\varepsilon \) and, for any \(T\ge T_2\), \(T\exp (-\varepsilon \sqrt{T}/(n|A|))\le \varepsilon \). Fix \(T\ge T_2\).
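
To get a feel for the magnitudes involved, the following Python sketch computes one admissible choice of \(\varepsilon \), \(T_1\) and \(T_2\) from the conditions above. The function name `horizon_parameters` and the numerical inputs (in particular the value used for \(T_0\), which the paper defines through the approachability strategy) are assumptions made for illustration only.

```python
import math


def horizon_parameters(M, eps0, n, num_profiles, T0):
    """Return (eps, T1, T2, tail) satisfying the conditions in the text.

    num_profiles stands for |A|; T0 is taken as given (hypothetical input).
    """
    eps = eps0 / (1 + 5 * M)
    T1 = max(T0, math.ceil((M / eps) ** 6))
    c = eps / (n * num_profiles)

    def tail(T):
        # The quantity T * exp(-eps * sqrt(T) / (n |A|)) from the text.
        return T * math.exp(-c * math.sqrt(T))

    # tail is nonincreasing once sqrt(T) >= 2 / c, so starting the search
    # there guarantees the bound for every larger T as well.
    T2 = max(math.ceil(T1 / eps), math.ceil(4 / c ** 2))
    while tail(T2) > eps:
        T2 *= 2
    return eps, T1, T2, tail


eps, T1, T2, tail = horizon_parameters(M=1.0, eps0=0.5, n=3, num_profiles=8, T0=100)
print(eps, T1, T2, tail(T2))
```

Even for this small example the required horizon is in the tens of millions of stages, which is consistent with the result being about undiscounted (uniform) equilibria rather than short games.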

Define a correlation device \({\bar{c}}=(\Omega , P)\) as follows:

  • \(\Omega =\Omega _{I}\times \Omega _{II}\times \Omega _{III}\), where \(\Omega _{I}=\prod _{i\in N}\Omega ^i_{I}, \Omega ^i_{I}=\prod _{t=1}^T\Omega ^i_{I,t}, \Omega ^i_{I,t}=A^i_t\cup (\prod _{j\in N} A^j_t), \Omega _{II}=\prod _{i\in N}(\prod _{t\ge 1} \mathbb {N}^{S^{t-1}})\), and \(\Omega _{III}=\prod _{i\in N} A^{-i}\).

  • \(P=P_{I}\otimes P_{II}\otimes P_{III}\), where \(P_{I}\) is a probability on \(\Omega _{I}\), \(P_{II}\) is a probability on \(\Omega _{II}\), and \(P_{III}\) is a probability on \(\Omega _{III}\). \((\Omega _{I}, P_{I})\) generates the recommended actions for players to follow on the equilibrium path. \((\Omega _{II}, P_{II})\) is used by players to coordinate punishment in case a deviation is detected and there are multiple suspects. \((\Omega _{III}, P_{III})\) is used by players to punish any single player to his correlated minmax.

  • \(P_{III}=\otimes _{i\in N} P^i_{III}\), where for each \(i\), \(P^i_{III}\in \Delta (A^{-i})\) is any minimizer of the problem

    $$\begin{aligned} v^i=\min _{x^{-i} \in \Delta \left( A^{-i} \right) } \max _{a^i\in A^i} \sum _{a^{-i}} x^{-i}(a^{-i}) u^i(a^i,a^{-i}). \end{aligned}$$

    Each player i is informed of the action used to punish any player \(j\ne i\), so that player j will effectively be punished to his correlated minmax level by others. Note that here we only need to draw \(a^{-i}\) once because this punishment never occurs on the equilibrium path (see Note 38). Since the correlated minmax action profile against player i is drawn from \(P^{i}_{III}\in \Delta (A^{-i})\) independently of the actions that player i receives from \(P^j_{III}\) for all \(j\ne i\), on the equilibrium path player i does not learn any information about the actions recommended by \(P^{i}_{III}\) to other players when he is punished this way. We also note that \((\Omega _{III}, P_{III})\) can be discarded, since we allow players to use cheap talk in punishment phases and the correlated minmax of a single player can be achieved through cheap talk among all other players.

  • \(P_{II}\) is constructed from the “approachability strategy” \(c^a\) together with an authentication scheme as follows: Recall that the aim of \(P_{II}\) is to generate correlation to implement the punishment strategy in the corresponding communication equilibrium. Since the correlation device is only present before the game starts, we need it to generate a collection of recommended action profiles for each history \((a_1,s_1,\ldots ,a_{t-1},s_{t-1})\) of the mediator and then translate them to recommendations that only depend on the public history \((s_1,\ldots , s_t)\). Moreover, equally important is the design of an encoding-decoding scheme so that no player is able to learn the realized recommendations without the help from others.

    Specifically, given \(c^a\), the mediator first draws \(a_1\in A\) according to \(c^a_1\); then for each \(s_1\), she draws \(a_2(s_1)\in A\) from \(c^a_2(a_1,s_1)\); then for each \((s_1,s_2)\), she draws \(a_3(s_1,s_2)=a_3(a_1,a_2(s_1),s_1,s_2)\in A\) from \(c^a_3(a_1,s_1,a_2,s_2)\), and so on. That is, the mediator draws all recommendations for all possible public histories at the pre-play stage. Then the recommendations are encrypted and sent to each player as follows. For each i, let \(K^i=|A^i|\), write \(A^i=\{a^{i}[1],\ldots ,a^i[K^i]\}\), and pick an integer \(L^i\) large enough such that \(K^i<\varepsilon (L^i-1)/M\). Suppose that after history \((s_1,\ldots ,s_{t-1})\), the recommended action for player i is \(a^i[k]\). The mediator randomly chooses \(K^i\) different integers \(\{l^i[1],\ldots ,l^i[K^i]\}\) from the set \(\{1,2,\ldots ,L^i\}\), where \(l^i[\kappa ]\) corresponds to the \(\kappa \)-th action \(a^i[\kappa ]\) for player i. Then she sends the ordered sequence \((l^i[1],\ldots ,l^i[K^i])\) to player i and the integer \(l^i[k]\) to player \(i+1\) (modulo n). Finally, she takes a random permutation of \((l^i[1],\ldots ,l^i[K^i])\) and sends the resulting ordered sequence to all other players in \(N\setminus \{i,i+1\}\).

    By construction, player i’s recommendations from \((\Omega _{II}, P_{II})\) are encoded so that he does not know which actions are recommended unless player \(i+1\) broadcasts the decoding keys for i truthfully for all public histories; moreover, no player in \(N\setminus \{i,i+1\}\) learns any information about player i’s recommendations other than the fact that player \(i+1\) has announced valid keys. Finally, whenever player \(i+1\) broadcasts a number different from the true encoding key, all other players will find his announcement invalid with high probability (see Note 39).

  • \(P_I\), which generates the main path of play, is defined by the following procedure: Let \({\hat{p}}\in \Delta (A)\) be the convex combination of the distribution \(p\in \Delta (A)\) and the uniform distribution \({\mathcal {U}}\in \Delta (A)\), i.e., \({\hat{p}}=(1-\eta )p+\eta {\mathcal {U}}\), where \(\eta <\min \{\varepsilon /M, T^{-\frac{1}{6}}\}\) (see Note 40). The mediator draws a sequence of action profiles \((a_1,\ldots ,a_T)\in A^T\) independently and identically according to the distribution \({\hat{p}}\). Each player i is informed of the sequence of recommendations \((a^i_1,\ldots ,a^i_T)\). Moreover, independently at each stage \(t\le T\), (1) with probability \(\eta \), all players are informed of the action profile \(a_t\) instead of their own recommendations, and (2) with probability \(\eta ^2/n\), only \(n-1\) players are informed of \(a_t\); so that, for each player i, (1) conditioning on observing \(a_t\), he believes that with probability at least \(1-\eta \), all other players also observe \(a_t\); and (2) conditioning on observing \(a^i_t\), he believes that with probability at least \(\eta ^2/n\), all other players observe \(a_t\).

  • Each player i is supposed to follow the recommended actions \((a^i_1,\ldots , a^i_T)\) on the equilibrium path and switch to the recommended actions from \((\Omega _{II}, P_{II})\) once he detects a deviation from the main path, i.e., he observes \(a_t\) but \(s_t\ne f(a_t)\). In the latter case, a public cheap-talk phase is added at each stage: at the first punishment stage each player \(i+1\) truthfully broadcasts the encoding key \(l^i[k]\) for player i and follows the recommended action \(a^{i+1}[k']\) after learning the encoding key \(l^{i+1}[k']\) from player \(i+2\); at each later stage each player again truthfully broadcasts the encoding keys for every public history and follows the recommended actions. If any player fails to announce any encoding key or announces an invalid key, then all other players know the identity of that player. Since \((\Omega _{II}, P_{II})\) is constructed from the approachability strategy \(c^a\), if there are at least \(T_0\) stages in the punishment phase, then no player i can get an expected average payoff higher than \(w^i+\varepsilon \) from playing non-obediently.

    Finally, in cases where a single deviant is identified, that is, either a deviation from the recommended action is attributed to a single player or a deviation from the public cheap talk is detected, players switch to playing the corresponding recommended action from \((P_{III},\Omega _{III})\) without cheap talk.
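
The encoding of a single recommendation in \(P_{II}\) described above can be sketched in Python as follows. This is only an illustration under stated assumptions: players are indexed from 0 (so "player \(i+1\) modulo n" becomes `(i + 1) % n`), the same shuffled key list is sent to every player outside \(\{i,i+1\}\), and all function names (`smallest_L`, `encode_recommendation`, `decode`, `false_key_bound`) are hypothetical.

```python
import math
import random
from fractions import Fraction


def smallest_L(K, eps, M):
    """Smallest integer L with K < eps * (L - 1) / M."""
    return math.floor(K * M / eps) + 2


def encode_recommendation(k, K, L, n, i, rng):
    """Encode recommendation index k (0-based) for player i; return player -> message."""
    keys = rng.sample(range(1, L + 1), K)  # K distinct integers; keys[kappa] <-> action kappa
    shuffled = keys[:]
    rng.shuffle(shuffled)                  # hides which key matches which action
    messages = {i: tuple(keys), (i + 1) % n: keys[k]}
    for j in range(n):
        if j not in messages:
            messages[j] = tuple(shuffled)  # enough to check validity, not to decode
    return messages


def decode(ordered_keys, announced):
    """Player i recovers the action index, or None if the announced key is invalid."""
    return ordered_keys.index(announced) if announced in ordered_keys else None


def false_key_bound(K, L):
    """Upper bound (K - 1) / (L - 1) on a false key landing in the valid set."""
    return Fraction(K - 1, L - 1)


K, M, eps = 4, 1, Fraction(1, 10)
L = smallest_L(K, eps, M)
msgs = encode_recommendation(k=2, K=K, L=L, n=4, i=1, rng=random.Random(1))
print(decode(msgs[1], msgs[2]), false_key_bound(K, L))
```

Here player 1 holds the ordered keys, player 2 holds the single key that selects the recommended action, and players 0 and 3 can only verify that an announced key belongs to the valid set. A false announcement by player 2 passes this check with probability at most \((K^i-1)/(L^i-1)\), which is the bound stated in Note 39.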

Note that by construction:

  1. At each stage t where player i only observes \(a^i_t\) and other players play according to \(P_I\), if player i deviates in a way that could be detected, i.e., choosing an \({\hat{a}}^i_t\) such that \(f({\hat{a}}^i_t,a^{-i}_t)\ne f(a^i_t,a^{-i}_t)\) for some \(a^{-i}_t\), then the probability that all other players detect such a deviation is at least \(\eta ^3/(n|A|)\).

  2. At each stage t where player i observes \(a_t\) and other players play according to \(P_I\), it is possible that player i can profitably deviate without being detected. However, since \(\eta \) is small and there are fewer than \(\eta T\) such stages, the improvement in average payoff is at most \(\eta M <\varepsilon \).

  3. At each stage t where player i observes \(a_t\) and other players play according to \(P_I\), if player i deviates in a way that could be detected, then the probability that all other players detect such a deviation is at least \(1-\eta \). At each such stage with \(t<T-T_0\), the average continuation payoff of player i is at most \((1-\eta )(w^i+\varepsilon )+\eta M <w^i+2\varepsilon \) with probability at least \(1-\varepsilon \).

  4. The play in the last \(T_0\) stages has little impact on the T-stage average payoff: \(T_0\le \varepsilon T\).

    Finally, we need to show that along the main path of play, at those stages where each player i only observes \(a^i_t\), profitable deviations (measured by the T-stage average payoff) lead to punishment with high probability. Fix \(i\in N\). Define the random variable \(Z\in \{0,\ldots , T\}\) as the number of stages where player i deviates in a way that could have been detected by all other players but was not. Let z denote the realized value of Z. The idea in the following steps is that a small z does not give player i high payoffs and a large z occurs with small probability:

    1. (1)

      \(z\le \varepsilon T\). For each such z, there are at most \(z+1\) stages where player i deviates yet other players do not switch to the punishment phase, and we also drop the last \(T_0-1\) stages. That is, there are at most \(T_0+z\) stages at which player i’s stage game payoff is bounded only by M. At each of the remaining \(T-T_0-z\) stages, either player i deviates to indistinguishable actions, which is not profitable by assumption, or all other players are in the punishment phase, which is effective with probability at least \(1-\varepsilon \). In the former case, player i’s expected stage game payoff is bounded above by

      $$\begin{aligned} (1-\eta )x^i+\eta M \le w^i+\varepsilon . \end{aligned}$$

      In the latter case, player i’s expected average payoff is at most \(x^i+\varepsilon \) with probability at least \(1-\varepsilon \).

      Therefore, for each \(z\le \varepsilon T\), player i’s average payoff is bounded above by

      $$\begin{aligned} \frac{1}{T}\left[ (T_0+z)M + (T-T_0-z) \left( w^i+\varepsilon \right) \right] \le w^i+(1+2M)\varepsilon , \end{aligned}$$

      with probability at least \(1-\varepsilon \).

    2. (2)

      \(z > \varepsilon T\). For each such z, the probability that \(Z=z\) is at most

      $$\begin{aligned} \left( 1-\frac{\eta ^3}{n|A|}\right) ^z \le \left( 1-\frac{1}{n|A|\sqrt{T}}\right) ^z \le \exp \left( -\frac{z}{n|A|\sqrt{T}}\right) \le \exp \left( -\frac{\varepsilon \sqrt{T}}{n|A|}\right) . \end{aligned}$$

      Therefore, summing over the at most T values of z with \(z>\varepsilon T\), the probability that \(Z>\varepsilon T\) is at most

      $$\begin{aligned} T\exp \left( -\frac{\varepsilon \sqrt{T}}{n|A|}\right) \le \varepsilon . \end{aligned}$$

      Since M is an upper bound on the stage game payoffs, the contribution of this case to the unconditional average payoff is at most \(\varepsilon M\).Footnote 41

    3. (3)

      Summing up the two cases, player i’s average payoff is bounded above by

      $$\begin{aligned} (1-\varepsilon )(w^i+(1+2M)\varepsilon )+\varepsilon M + \varepsilon M \le w^i + (1+4M)\varepsilon \le w^i+\varepsilon _0. \end{aligned}$$

This completes the proof.
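As a numerical sanity check on the three steps above, the following Python sketch evaluates the case (1) bound, the tail bound of case (2), and the combined bound of step (3). All parameter values are illustrative assumptions and are not taken from the paper. The helper names are hypothetical, and the sketch does not model the additional requirement \(\eta ^3\ge 1/\sqrt{T}\) used in the first inequality of case (2).

```python
import math


def case_one_bound(T, T0, z, w, M, eps):
    """Average payoff bound in case (1): at most T0 + z stages pay up to M,
    the remaining T - T0 - z stages pay at most w + eps."""
    return ((T0 + z) * M + (T - T0 - z) * (w + eps)) / T


def tail_bound(T, eps, n, num_profiles):
    """Union bound T * exp(-eps * sqrt(T) / (n * |A|)) on P(Z > eps * T)."""
    return T * math.exp(-eps * math.sqrt(T) / (n * num_profiles))


def horizon_by_doubling(eps, n, num_profiles):
    """Double T from 1 until the tail bound falls to eps or below."""
    T = 1
    while tail_bound(T, eps, n, num_profiles) > eps:
        T *= 2
    return T


def combined_bound(w, M, eps):
    """Left-hand side of the inequality in step (3)."""
    return (1 - eps) * (w + (1 + 2 * M) * eps) + eps * M + eps * M


# Illustrative values only (assumptions, not from the paper).
n, num_profiles, w, M, eps = 3, 4, 1.0, 2.0, 0.1
T = horizon_by_doubling(eps, n, num_profiles)
T0 = z = int(eps * T)  # the extreme values allowed in case (1)
print(T, tail_bound(T, eps, n, num_profiles))
print(case_one_bound(T, T0, z, w, M, eps), w + (1 + 2 * M) * eps)
print(combined_bound(w, M, eps), w + (1 + 4 * M) * eps)
```

The doubling search only returns some horizon at which the tail bound holds, not the smallest one. Because \(T\exp (-c\sqrt{T})\) is decreasing once \(T>4/c^2\), the bound continues to hold for larger horizons.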

1.3 Proof of Proposition 4.1

The proof is similar to and generalizes the construction in the proof of Proposition 5.2. Here we give a detailed sketch. Specifically, fix any \(w\in JR\) with \(w=u(p)\) for some \(p\in {\mathcal {P}}\), and consider the correlation device \({\bar{c}}\) defined in Sect. A.2. Because now each player observes private signals along the play, a public cheap-talk phase is added at the beginning of each stage, where players broadcast their observed private signals at the previous stage.

The timing of this extended game is as follows. Before the game starts, the correlation device sends to each player sequences of recommended actions (or action profiles occasionally) and encoded instructions for punishment as in Sect. A.2. At each stage t, before choosing their actions \(a_t\), players simultaneously and publicly announce the private signals observed at stage \(t-1\), i.e., \(m_t=s_{t-1}\). In equilibrium, players always announce the true signals and follow the recommended actions at every stage. When a deviation is detected, i.e., one of the reported signals is not compatible with the recommended action profile, players switch to the punishment phase. The approachability strategy profile is implemented in the punishment phase as in Sect. A.2. At each stage of the punishment phase, players first simultaneously broadcast the observed private signals from the previous stage and the encoding keys for the punishment actions at this stage; then they choose actions based on these encoding keys as well as the instructions sent by the correlation device.

Now we verify that the above strategy profile is a Nash equilibrium of the game extended by the correlation device and public cheap talk. First, since \(p\in {\mathcal {P}}\), no undetectable deviation is profitable by the definition of \({\mathcal {P}}\). Second, following the argument in Sect. A.2, any \(\varepsilon \)-profitable deviation of player i, which either leads to an inconsistent report by this player or makes other players observe an inconsistent signal, will be detected by all other players with high probability if the time horizon T is large enough. Finally, consider the punishment phase, which again lasts for at least \(T_0\) stages. Since \(w\in JR\), if all players follow the punishment instructions, then player i’s expected payoff is at most \(w^i+\varepsilon \). Player i may also deviate during the punishment phase: he could (1) report a signal different from his observation, (2) choose an action different from the one suggested by the correlation device, or (3) announce an invalid encoding key. By the definition of the approachability strategy, the first two types of deviations cannot yield player i more than \(w^i+\varepsilon \); by the construction of the encoding scheme, the last type of deviation will be detected by all other players with high probability, which triggers a correlated minmax punishment of player i. Therefore, player i’s expected payoff is bounded above by \(w^i+\varepsilon \) during the punishment phase.
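To illustrate the detection rule ("one of the reported signals is not compatible with the recommended action profile"), here is a minimal Python sketch. It assumes a deterministic signal function and a checker who knows the full recommended profile, which in the construction is the case only at some stages. The signal function `toy_signal` and the helper names are hypothetical and are not part of the paper.

```python
def expected_signals(signal_fn, profile):
    """Signals each player should observe if `profile` is actually played."""
    return [signal_fn(k, profile) for k in range(len(profile))]


def inconsistent_reports(signal_fn, recommended, reports):
    """Indices of players whose broadcast signal differs from the signal
    implied by the recommended profile. A nonempty list means a deviation
    (in action or in report) has been detected."""
    expected = expected_signals(signal_fn, recommended)
    return [k for k, (e, r) in enumerate(zip(expected, reports)) if e != r]


def toy_signal(k, profile):
    """Hypothetical signal: player k observes the sum of the others' actions."""
    return sum(a for j, a in enumerate(profile) if j != k)


recommended = (0, 1, 0)
truthful_reports = expected_signals(toy_signal, recommended)

# Player 2 plays 1 instead of the recommended 0; everyone then reports truthfully.
played = (0, 1, 1)
reports_after_deviation = expected_signals(toy_signal, played)

print(inconsistent_reports(toy_signal, recommended, truthful_reports))
print(inconsistent_reports(toy_signal, recommended, reports_after_deviation))
```

In this toy example the inconsistent reports come from the players who observed the deviation, not from the deviant. This is why the check serves as a trigger for the punishment phase rather than as a way to identify the deviant.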

1.4 Proof of Theorem 4.2

Fix any \(w\in JR\) with \(w=u(p)\) for some \(p\in {\mathcal {P}}\). We adapt the construction in the proof of Proposition 4.1 (see Sect. A.3) to obtain a Nash equilibrium of the extended game with almost public cheap talk. Specifically, we replace the correlation device \({\bar{c}}\) in Sect. A.1 with a long pre-play cheap-talk phase, which generates the same set of recommendations and instructions as \({\bar{c}}\). Moreover, as in Sect. A.3, players announce their private signals from the previous stage at each public cheap-talk phase along the play.

During the pre-play cheap-talk phase, all players execute the secure MPC (see the Online Appendix) infinitely many times to generate (1) a sequence of recommended action profiles as \((\Omega _I,P_I)\), (2) a sequence of encoded punishment actions as \((\Omega _{II},P_{II})\), and (3) a sequence of correlated minmax punishment actions as \((\Omega _{III},P_{III})\). In other words, players have to talk for infinitely long before they start playing the repeated game, in order to generate the required sequences of actions for the entire subsequent play.Footnote 42 Note that for each probability distribution in \((\Omega _{I},P_{I}), (\Omega _{II},P_{II})\) and \((\Omega _{III},P_{III})\), the secure MPC runs independently. In particular, a deviation during the computation that draws action profiles from one distribution does not affect the computation for any other distribution. Therefore, the required sequences of actions can be computed in separate computation phases. If a deviation is detected during the pre-play cheap-talk phase, all the non-deviating players know the identity of the deviant; they then minmax the deviant permanently.

At the end of the pre-play cheap-talk phase, each player is informed only of his own sequences of recommendations, exactly as under the correlation device \({\bar{c}}\). The players then follow the recommended actions on the equilibrium path. The rest of the proof is the same as that in Sect. A.3.

1.5 Proof of Theorem 4.3

Fix any \(w\in JR\) with \(w=u(p)\) for some \(p\in {\mathcal {P}}\). We construct a cheap-talk phase at the beginning of each stage. For each \(t\ge 1\), the cheap-talk phase at stage t consists of three sub-phases:

  1. (1)

    Signal sub-phase: Each player i first broadcasts the private signals \(s^i_{t-1}\) observed from the previous stage of play. In the public monitoring case, the signal sub-phase is not needed.

  2. (2)

    Message sub-phase: Each player i then publicly announces the messages \(m^i_{t-1}\) that he sent and received during the computation sub-phase at stage \(t-1\). This and the previous sub-phase are used to check whether a deviation occurred in period \(t-1\). That is, players can recover the realized recommended action profile in period \(t-1\) from the messages \((m^i_{t-1})_i\) exchanged during the MPC in period \(t-1\). Combining this profile with the realized signals \((s^i_{t-1})_i\), they can check whether a deviation occurred in period \(t-1\).

  3. (3)

    Computation sub-phase: If no deviation is detected, then all players conduct the secure MPC to generate an action profile \(a_t=(a^i_t)_{i\in N}\) from the full-support distribution \({\hat{p}}\in \Delta (A)\) defined in Sect. A.2, so that each player i only learns his own component \(a^i_t\) of the profile. If a deviation is detected, then players switch to the punishment phase and conduct the secure MPC to generate the corresponding action profile from the approachability strategy; each player again only learns his own component of the profile.

Each player is supposed to report the true signals and messages and follow the computation protocol at each cheap-talk phase, and follow the computed action at each stage of play. As shown in the Online Appendix, deviations in the message or computation sub-phases are detected with high probability; therefore, players are willing to play the prescribed strategies in these sub-phases. Moreover, by definition a player’s profitable deviations either lead to a private signal of some other player that is different from what that player expects to observe, or result in a less informative signal, which does not match the signal that the player should have observed had he followed the recommendations. In either case, after detecting a deviation players switch to the joint punishment using approachability strategies, in which case players’ payoffs are no more than their equilibrium payoffs. Since a profitable deviation involves deviating at many stages, it follows from the same arguments as in Sect. A.4 that profitable deviations during the play are detected with high probability. Therefore, players are willing to follow the prescribed strategies during the signal sub-phase and follow the recommended actions at every stage. Hence, the above strategy profile is a uniform equilibrium of the cheap-talk extension.

1.6 Proof of Theorem 6.1

Fix any \(w\in C\) with \(u(p)=w\) for some \(p\in \Delta (A)\). We will construct a cheap talk extension of the T-period finitely repeated game and an \(\varepsilon _0\)-equilibrium with payoff within \(\varepsilon _0\) of w.Footnote 43 There are again two phases: main path and punishment. The play starts from the main path. The idea is to conduct statistical tests along the main path of play. Since signals are stochastic, along the main path players’ strategies are defined on blocks of stages. That is, within a block players choose the same mixed actions independently for a large number of stages, and then decide whether some player has deviated at the end of the block.

Specifically, let \({\tilde{p}}=(1-\eta )p+\eta {\mathcal {U}}\) for some \(\eta >0\). For each \(m\in \mathbb {Z}_+\), the block \(B_m\) on the main path consists of \(2^{q_{m}}\) stages, where \((q_m)_{m\ge 1}\) is an increasing sequence of integers. Independently at each stage t of the block \(B_m\), each player i first draws a number \(l^i_t\) uniformly from \(\{1,2,\ldots ,|S^i|\}\). Each private signal \(s^i_t\in S^i\) corresponds to a unique number between 1 and \(|S^i|\) via a one-to-one map \(k^i:S^i\rightarrow \{1,2,\ldots ,|S^i|\}\), so player i can encode the signal \(s^i_t\) with \(l^i_t\) to obtain the encoded signal \(k^i_t=k^i(s^i_t)+l^i_t\mod |S^i|\).Footnote 44 He then privately sends \(l^i_t\) to player \((i+1)\mod n\) and \(k^i_t\) to player \((i+2)\mod n\). After this message sharing, the players perform secure MPC to draw the recommended action profile from the distribution \({\tilde{p}}\).

At the end of the block, players publicly and simultaneously announce the encoded messages that they received during the block, which recovers the private signals in this block, and the messages sent and received for the secure MPC, which recovers the recommended action profiles in this block. Players are supposed to be truthful and obedient on the main path. They then compare the frequency of signals they would expect to observe given the recommended actions with the frequency of signals actually observed. If players follow the prescribed strategies, then the two frequencies will be very close with high probability, i.e., the distance between the two frequencies measured by the sup-norm with respect to all action profiles is less than \(\varepsilon _m\), where \((\varepsilon _m)_{m\ge 1}\) is a decreasing sequence of positive real numbers converging to zero. In this case the play stays on the main path and moves to the next block.Footnote 45 If some player has profitably deviated, then it is very likely that the distance between the two frequencies is larger than \(\varepsilon _m\); as a result, players switch to the punishment phase.
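The signal encoding described above works like a one-time pad: the pad \(l^i_t\) and the encoded signal \(k^i_t\) are each uninformative on their own, and the signal can be recovered only once both are announced. A minimal Python sketch follows. The function names are hypothetical, and mapping the residue 0 back to the index \(|S^i|\) is an assumption made here to keep signal indices in \(\{1,\ldots ,|S^i|\}\).

```python
def encode_signal(signal_index, pad, num_signals):
    """Encoded signal k_t^i = k^i(s_t^i) + l_t^i mod |S^i|."""
    return (signal_index + pad) % num_signals


def decode_signal(encoded, pad, num_signals):
    """Recover k^i(s_t^i) in {1, ..., |S^i|} from the encoded signal and the pad."""
    index = (encoded - pad) % num_signals
    return index if index != 0 else num_signals  # assumed convention: residue 0 means |S^i|


def recipients(i, n):
    """Player i sends the pad to player (i+1) mod n and the encoded signal
    to player (i+2) mod n."""
    return (i + 1) % n, (i + 2) % n


num_signals = 5
for signal_index in range(1, num_signals + 1):
    # As the pad ranges over {1, ..., |S^i|}, every residue appears exactly once,
    # so the encoded signal alone carries no information about the signal.
    residues = {encode_signal(signal_index, pad, num_signals)
                for pad in range(1, num_signals + 1)}
    assert residues == set(range(num_signals))

print(recipients(0, 3), recipients(2, 3))
```

With \(n\ge 3\) the two recipients are distinct from each other and from the sender, so no single player other than i holds both the pad and the encoded signal before the end-of-block announcements.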

Next consider the punishment phase. Since \(w\in JR\), by Blackwell’s approachability theorem for the stochastic signal case (see Mertens et al. 2014), there exists a communication device \({\tilde{c}}^{a}=({\tilde{c}}^a_t)_{t\ge 1}\) where \({\tilde{c}}^a_t:(A\times S)^{t-1}\rightarrow \Delta (A)\) such that, for any \(\varepsilon >0\), there exists \(T_0\) such that for any \(T>T_0, i\in N\), if all players other than i are truthful and obedient, i.e., \(\tau ^j=\tau ^{j*}, \forall j\in N\setminus \{i\}\), then for any \(\tau ^i\),

$$\begin{aligned} \frac{1}{T}{\mathbb {E}}\,_{{\tilde{c}}^a, \tau }\left[ \sum _{t=1}^T u^i(a_t) \right] \le w^i+\varepsilon . \end{aligned}$$

Whenever all players agree that a deviation has occurred, the play switches from the main path to the punishment phase immediately. At the beginning of each stage in the punishment phase, players first publicly announce the private signals from the previous stage and then perform secure MPC to generate the recommended action profiles according to \({\tilde{c}}^a\). If a deviation is detected during the computation, then the deviant is identified and is punished by correlated minmax strategies, which again can be implemented by cheap talk.

Finally, we briefly explain why the above strategy profile is an \(\varepsilon _0\)-equilibrium of the T-period repeated game extended by cheap talk, with \(T=\sum _{m=M_0}^{M_1} 2^{q_m}\) for sufficiently large integers \(M_0\) and \(M_1\). First, if players follow the prescribed strategy profile, then the probability that punishment happens is very small by the Azuma-Hoeffding inequality, and hence players’ expected payoffs are within \(\varepsilon _0\) of w. Second, notice that with the cheap-talk phases constructed above, at the stages within a block each player learns only his own private signals; only at the end of a block do all players learn all the private signals and recommended actions in the block. Therefore, each player has the same amount of information as in the case with a mediator in Renault and Tomala (2004). Following the argument in Renault (2000) or Renault and Tomala (2004), one can show that a profitable deviation by any player will be detected and punished with high probability, in which case no player can gain more than \(\varepsilon _0\) relative to his equilibrium payoff.
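The end-of-block test and the reason false alarms are rare can be sketched in Python as follows. This is a deliberate simplification: it treats signals within a block as i.i.d. draws from one fixed distribution and uses the plain Hoeffding inequality with a union bound over signals, whereas the proof applies the Azuma-Hoeffding inequality and takes the sup-norm over all action profiles. The function names, the signal distribution, and the threshold are illustrative assumptions.

```python
import math
import random
from collections import Counter


def sup_distance(expected, observed_counts, block_length):
    """Sup-norm distance between the expected signal distribution and the
    empirical signal frequencies in a block."""
    return max(abs(expected[s] - observed_counts.get(s, 0) / block_length)
               for s in expected)


def false_alarm_bound(num_signals, block_length, eps_m):
    """Hoeffding plus union bound on P(sup distance >= eps_m) when play is
    obedient: at most 2 |S| exp(-2 N eps_m^2), capped at 1."""
    return min(1.0, 2 * num_signals * math.exp(-2 * block_length * eps_m ** 2))


expected = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical signal distribution
eps_m = 0.05                               # hypothetical test threshold
rng = random.Random(0)

for q in (8, 10, 12):
    block_length = 2 ** q
    draws = rng.choices(list(expected), weights=list(expected.values()),
                        k=block_length)
    distance = sup_distance(expected, Counter(draws), block_length)
    print(q, distance, false_alarm_bound(len(expected), block_length, eps_m))
```

For a fixed threshold the bound decays exponentially in the block length \(2^{q_m}\), which is why the block lengths can grow fast enough for the thresholds \(\varepsilon _m\) to shrink to zero while the total false-alarm probability stays small.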

Cite this article

Liu, H. Correlation and unmediated cheap talk in repeated games with imperfect monitoring. Int J Game Theory 46, 1037–1069 (2017). https://doi.org/10.1007/s00182-017-0569-7
