Abstract
Multiagent cooperation in a partially observable environment without communication is difficult because agents are uncertain about one another's states and intentions. Traditional multiagent deep reinforcement learning (MADRL) algorithms fail to address this uncertainty. We propose a MADRL-based policy network architecture, the shared mental model-multiagent epistemic planning policy (SMM-MEPP), to resolve this issue. First, this architecture combines multiagent epistemic planning with MADRL to create a “perception–planning–action” multiagent epistemic planning framework, helping multiple agents better handle uncertainty in the absence of coordination. Second, by introducing mental models and representing them as neural networks, a parameter-sharing mechanism is used to create shared mental models, maintain the consistency of multiagent planning without communication, and improve the efficiency of cooperation. Finally, we apply the SMM-MEPP architecture to three advanced MADRL algorithms (i.e., MAAC, MADDPG, and MAPPO) and conduct comparative experiments on multiagent cooperation tasks. The results show that the proposed method provides consistent planning for multiple agents and improves the convergence speed or training effect in a partially observable environment without communication.
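The parameter-sharing idea behind the shared mental model can be sketched in a few lines: every agent holds a reference to the *same* mental-model parameters, so identical observations yield identical belief inferences, which keeps planning consistent without any messages. This is a minimal illustrative sketch, not the paper's implementation; the class names (`SharedMentalModel`, `Agent`), dimensions, and the single-layer tanh model are all assumptions made for clarity.

```python
import numpy as np


class SharedMentalModel:
    """Hypothetical stand-in for the shared mental-model network in SMM-MEPP.

    One parameter set is shared (by reference) across all agents, mirroring
    the parameter-sharing mechanism described in the abstract.
    """

    def __init__(self, obs_dim, belief_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(obs_dim, belief_dim))

    def infer(self, obs):
        # Map a local observation to a belief embedding (one tanh layer).
        return np.tanh(obs @ self.W)


class Agent:
    """An agent with a shared 'planning' model and its own action head."""

    def __init__(self, shared_model, act_dim, seed):
        self.model = shared_model  # parameter sharing: same object, not a copy
        rng = np.random.default_rng(seed)
        self.policy_W = rng.normal(
            scale=0.1, size=(shared_model.W.shape[1], act_dim)
        )

    def act(self, obs):
        belief = self.model.infer(obs)   # consistent across agents
        logits = belief @ self.policy_W  # agent-specific policy
        return int(np.argmax(logits))


# Two agents sharing one mental model; given the same observation,
# their belief inferences coincide even though their policies differ.
smm = SharedMentalModel(obs_dim=4, belief_dim=8)
agents = [Agent(smm, act_dim=3, seed=i) for i in range(2)]
obs = np.ones(4)
beliefs = [a.model.infer(obs) for a in agents]
```

In a training loop, gradient updates to `smm.W` would be applied once and seen by every agent, which is what keeps the agents' epistemic planning aligned without communication.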
Data availability
The data that support the findings of this study are available from the corresponding author, Luo, upon reasonable request.
References
Alshehri A, Miller T, Sonenberg L (2021) Modeling communication of collaborative multiagent system under epistemic planning. Int J Intell Syst 36(10):5959–5980
Areces C, Fervari R, Saravia AR et al (2021) Uncertainty-based semantics for multi-agent knowing how logics. arXiv preprint arXiv:2106.11492
Baier C, Funke F, Majumdar R (2021) Responsibility attribution in parameterized Markovian models. In: Proceedings of the AAAI conference on artificial intelligence, pp 11734–11743
Bolander T, Andersen MB (2011) Epistemic planning for single-and multi-agent systems. J Appl Non-Class Logics 21(1):9–34
Buckingham D, Kasenberg D, Scheutz M (2020) Simultaneous representation of knowledge and belief for epistemic planning with belief revision. In: Proceedings of the international conference on principles of knowledge representation and reasoning, vol 17, pp 172–181
Chen L, Wang Y, Mo Y et al (2023) Multiagent path finding using deep reinforcement learning coupled with hot supervision contrastive loss. IEEE Trans Ind Electron 70(7):7032–7040. https://doi.org/10.1109/TIE.2022.3206745
Engesser T, Bolander T, Mattmüller R et al (2017) Cooperative epistemic multi-agent planning for implicit coordination. arXiv preprint arXiv:1703.02196
Fabiano F, Burigana A, Dovier A et al (2021) Multi-agent epistemic planning with inconsistent beliefs, trust and lies. In: Pham DN, Theeramunkong T, Governatori G et al (eds) PRICAI 2021: trends in artificial intelligence. Springer International Publishing, Cham, pp 586–597
Fabiano F, Srivastava B, Lenchner J et al (2021b) E-PDDL: a standardized way of defining epistemic planning problems. arXiv preprint arXiv:2107.08739
Foerster J, Assael IA, De Freitas N et al (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Proceedings of the 30th international conference on neural information processing systems, pp 2145–2153
Geffner H, Bonet B (2013) A concise introduction to models and methods for automated planning. In: Synthesis lectures on artificial intelligence and machine learning, vol 8, no 1, pp 1–141
Gurov D, Goranko V, Lundberg E (2022) Knowledge-based strategies for multi-agent teams playing against nature. Artif Intell 309:103728
He K, Banerjee B, Doshi P (2021) Cooperative-competitive reinforcement learning with history-dependent rewards. In: Proceedings of the 20th international conference on autonomous agents and multiagent systems, pp 602–610
Ikeda T, Shibuya T (2022) Centralized training with decentralized execution reinforcement learning for cooperative multi-agent systems with communication delay. In: 2022 61st annual conference of the Society of Instrument and Control Engineers (SICE). IEEE, pp 135–140
Iqbal S, Sha F (2019) Actor-attention-critic for multi-agent reinforcement learning. In: International conference on machine learning. PMLR, pp 2961–2970
Jain V, Kumar B (2023) QoS-aware task offloading in fog environment using multi-agent deep reinforcement learning. J Netw Syst Manag. https://doi.org/10.1007/s10922-022-09696-y
Jiang J, Lu Z (2018) Learning attentional communication for multi-agent cooperation. In: Proceedings of the 32nd international conference on neural information processing systems, pp 7265–7275
Kong X, Xin B, Liu F et al (2017) Revisiting the master–slave architecture in multi-agent deep reinforcement learning. arXiv preprint arXiv:1712.07305
Lowe R, Wu YI, Tamar A et al (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the 31st international conference on neural information processing systems, pp 6382–6393
Muise C (2014) Exploiting relevance to improve robustness and flexibility in plan generation and execution. University of Toronto (Canada), Toronto
Muise C, Belle V, Felli P et al (2022) Efficient multi-agent epistemic planning: teaching planners about nested belief. Artif Intell 302:103605. https://doi.org/10.1016/j.artint.2021.103605
Parnika P, Diddigi RB, Danda SKR et al (2021) Attention actor-critic algorithm for multi-agent constrained co-operative reinforcement learning. In: International conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
Rouse WB, Morris NM (1986) On looking into the black box: prospects and limits in the search for mental models. Psychol Bull 100(3):349
Rupprecht T, Wang Y (2022) A survey for deep reinforcement learning in Markovian cyber-physical systems: common problems and solutions. Neural Netw Off J Int Neural Netw Soc 153:13–36
Seo S, Kennedy-Metz LR, Zenati MA et al (2021) Towards an AI coach to infer team mental model alignment in healthcare. In: 2021 IEEE conference on cognitive and computational aspects of situation management (CogSIMA). IEEE, pp 39–44
Shibata K, Jimbo T, Matsubara T (2023) Deep reinforcement learning of event-triggered communication and consensus-based control for distributed cooperative transport. Robot Auton Syst 159:104307
Singh R, Sonenberg L, Miller T (2017) Communication and shared mental models for teams performing interdependent tasks. In: Coordination, organizations, institutions, and norms in agent systems XII: COIN 2016 international workshops, COIN@AAMAS, Singapore, May 9, 2016, COIN@ECAI, The Hague, The Netherlands, August 30, 2016, Revised Selected Papers. Springer, pp 81–97
Ulusoy A, Smith SL, Ding XC et al (2011) Optimal multi-robot path planning with temporal logic constraints. In: 2011 IEEE/RSJ international conference on intelligent robots and systems. IEEE, pp 3087–3092
Wan H, Fang B, Liu Y (2021) A general multi-agent epistemic planner based on higher-order belief change. Artif Intell 301:103562. https://doi.org/10.1016/j.artint.2021.103562
Wu J, Sun X, Zeng A et al (2021) Spatial intention maps for multi-agent mobile manipulation. In: 2021 IEEE international conference on robotics and automation (ICRA). IEEE, pp 8749–8756
Xu Y, Wei Y, Jiang K et al (2023) Multiple UAVs path planning based on deep reinforcement learning in communication denial environment. Mathematics. https://doi.org/10.3390/math11020405
Yang T, Tang H, Bai C et al (2021) Exploration in deep reinforcement learning: a comprehensive survey. arXiv preprint arXiv:2109.06668
Yu C, Velu A, Vinitsky E et al (2021) The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955
Zhou Y (2021) Ideology, censorship, and propaganda: unifying shared mental models. Available at SSRN 3821161
Funding
This work was supported by the National Key R&D Program of China (2018YFB1308300), the National Natural Science Foundation of China (62276028, U20A20167), the Beijing Natural Science Foundation (4202026), the Natural Science Foundation of Hebei Province (F202103079), and the Innovation Capability Improvement Plan Project of Hebei Province (22567626H).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, P., Luo, S., Tian, L. et al. Consistent epistemic planning for multiagent deep reinforcement learning. Int. J. Mach. Learn. & Cyber. 15, 1663–1675 (2024). https://doi.org/10.1007/s13042-023-01989-1