
Acta Armamentarii (兵工学报), 2023, Vol. 44, Issue 6: 1547-1563. doi: 10.12382/bgxb.2022.0711


Multi-Dimensional Decision-Making for UAV Air Combat Based on Hierarchical Reinforcement Learning

ZHANG Jiandong1, WANG Dinghan1, YANG Qiming1,*, SHI Guoqing1, LU Yi2, ZHANG Yaozhong1

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, China
  2. AVIC Shenyang Aircraft Design and Research Institute, Shenyang 110035, Liaoning, China
  • Received: 2022-08-13 Online: 2023-06-30
  • Funding:
    Natural Science Basic Research Program of Shaanxi Province (2022JQ-593); Key Research and Development Program of the Shaanxi Provincial Department of Science and Technology (2022GY-089)

Abstract:

To address intelligent decision-making in UAV air combat, a multi-dimensional decision-making model is built on a hierarchical reinforcement learning architecture. Autonomous decision-making is extended from single-dimensional maneuver decisions to multiple dimensions, including radar on/off switching, active jamming, formation change, target detection, target tracking, jamming avoidance, and weapon selection, so that the main stages of air combat are covered. To cope with the high state-space complexity and low learning efficiency that follow from this expansion, a group of meta-strategies is trained and established with the Soft Actor-Critic algorithm and expert experience. The traditional Option-Critic algorithm is also improved: the strategy termination function is redesigned and optimized to make strategy switching more flexible and to enable seamless switching among decision dimensions during air combat. Experimental results show that the model performs well in adversarial engagements across the multi-dimensional decisions of the full UAV air combat process. It can control the agent to switch flexibly among jamming, search, strike, and evasion strategies according to the battlefield situation, improving on the performance of the traditional algorithms and on the efficiency of solving complex decision-making problems.

Key words: UAV air combat, multi-dimensional decision-making, hierarchical reinforcement learning, Soft Actor-Critic algorithm, Option-Critic algorithm
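
The abstract does not give the form of the redesigned termination function, so the following is only a minimal sketch of the generic call-and-return scheme that option-based methods such as Option-Critic build on: the agent keeps its current strategy until a termination function signals a switch. Every name here (`OPTIONS`, `termination_prob`, `choose_option`, `run_episode`), the scalar `threat` input, and the sigmoid form are illustrative assumptions, not the authors' model; in the paper's setting the termination function and the policy over strategies would be learned rather than hand-written.

```python
import math
import random

# Hypothetical strategy labels, borrowed from two of the strategies named in the abstract.
OPTIONS = ["strike", "evade"]


def termination_prob(option, threat):
    """Toy termination function beta(state, option): a sigmoid of a hand-picked score.

    `threat` in [0, 1] is an assumed stand-in for the battlefield situation.
    "strike" becomes more likely to end as threat rises; "evade" as threat falls.
    """
    score = (0.5 - threat) if option == "evade" else (threat - 0.5)
    return 1.0 / (1.0 + math.exp(-8.0 * score))


def choose_option(threat):
    """Toy policy over options: evade under high threat, otherwise strike."""
    return "evade" if threat > 0.5 else "strike"


def run_episode(threats, rng):
    """Call-and-return execution: keep the current option until it terminates."""
    option = choose_option(threats[0])
    trace = [option]
    for threat in threats[1:]:
        # Sample termination; only re-select an option when the current one ends.
        if rng.random() < termination_prob(option, threat):
            option = choose_option(threat)
        trace.append(option)
    return trace


if __name__ == "__main__":
    print(run_episode([0.1, 0.1, 0.9, 0.9, 0.2], random.Random(0)))
```

A sharper termination function (a larger gain than the assumed 8.0) makes switching more responsive to the situation, while a flatter one makes the agent commit longer to its current strategy; that trade-off is what tuning the termination function controls.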