
Consistent epistemic planning for multiagent deep reinforcement learning

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Multiagent cooperation in a partially observable environment without communication is difficult because of the uncertainty each agent faces. Traditional multiagent deep reinforcement learning (MADRL) algorithms fail to address this uncertainty. We propose a MADRL-based policy network architecture, the shared mental model-multiagent epistemic planning policy (SMM-MEPP), to resolve this issue. First, the architecture combines multiagent epistemic planning with MADRL to create a "perception–planning–action" multiagent epistemic planning framework, helping multiple agents better handle uncertainty in the absence of coordination. Second, by introducing mental models and representing them as neural networks, a parameter-sharing mechanism creates shared mental models that keep multiagent planning consistent without communication and improve the efficiency of cooperation. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (MAAC, MADDPG, and MAPPO) and conduct comparative experiments on multiagent cooperation tasks. The results show that the proposed method provides consistent planning for multiple agents and improves convergence speed or training performance in a partially observable environment without communication.
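The shared-mental-model idea can be sketched minimally: if every agent holds a reference to the same parameter set, identical observations map to identical beliefs and plans, with no messages exchanged. The sketch below is illustrative only, not the authors' implementation; the names (`SharedMentalModel`, `infer_belief`) are assumptions, and in SMM-MEPP the mental model is a trained neural network inside a MADRL policy rather than a fixed random projection.

```python
import numpy as np

class SharedMentalModel:
    """One parameter set shared by all agents (a stand-in for the
    mental-model neural network trained under parameter sharing)."""
    def __init__(self, obs_dim, belief_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(belief_dim, obs_dim))

    def infer_belief(self, obs):
        # Map a local observation to a belief/plan representation.
        return np.tanh(self.W @ obs)

class Agent:
    def __init__(self, mental_model):
        # Every agent holds a reference to the SAME model instance,
        # which is what keeps planning consistent without communication.
        self.mental_model = mental_model

    def plan(self, obs):
        return self.mental_model.infer_belief(obs)

obs_dim, belief_dim = 4, 3
shared = SharedMentalModel(obs_dim, belief_dim)
agents = [Agent(shared) for _ in range(3)]

obs = np.ones(obs_dim)
beliefs = [a.plan(obs) for a in agents]
# All agents derive the same belief from the same observation,
# because the mental-model parameters are shared.
```

In the paper's setting, this consistency is what the parameter-sharing mechanism buys: each agent plans independently over its own partial observation, yet all agents reason with the same learned model of the task.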


Data availability

The data that support the findings of this study are available from the corresponding author, Luo, upon reasonable request.


Funding

This work was supported by the National Key R&D Program of China (2018YFB1308300), the National Natural Science Foundation of China (62276028, U20A20167), the Beijing Natural Science Foundation (4202026), the Natural Science Foundation of Hebei Province (F202103079), and the Innovation Capability Improvement Plan Project of Hebei Province (22567626H).

Author information

Corresponding author

Correspondence to Shicheng Luo.

Ethics declarations

Conflict of interest

The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, P., Luo, S., Tian, L. et al. Consistent epistemic planning for multiagent deep reinforcement learning. Int. J. Mach. Learn. & Cyber. 15, 1663–1675 (2024). https://doi.org/10.1007/s13042-023-01989-1

