Authors:
- Jiahua Wang, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
- Ping Zhang, School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
- Yang Wang, State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550525, China
Applied Soft Computing, Volume 145, Issue C, Sep 2023. https://doi.org/10.1016/j.asoc.2023.110604
Published: 01 September 2023
Abstract
In recent years, deep reinforcement learning (DRL) has developed rapidly and has been applied to multi-UAV target tracking (MTT). However, DRL still faces challenges in data utilization and learning speed. To address these problems, this paper proposes a novel two-stage DRL-based multi-UAV decision-making method. Specifically, a sample generator that combines an artificial potential field with proportional–integral–derivative control produces expert experience data. On this basis, a two-stage reinforcement learning training method is introduced. In the first stage, the policy network and critic network are pre-trained on the expert data with a behavior cloning loss and an additional Q-value loss, which reduces ineffective exploration and speeds up learning. In the second (RL) stage, the average return of the most recent k excellent episodes is used to screen out the excellent experience generated by the agent itself, which then guides the policy network toward high-reward actions and thus improves data utilization efficiency. Extensive simulation experiments show that the method not only enables multiple UAVs to track the target continuously in obstacle environments but also significantly improves learning speed and convergence performance.
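To make the idea of an expert sample generator concrete, here is a minimal 2-D sketch of how an artificial potential field could be paired with a PID controller to label states with expert actions. It is an illustration under stated assumptions, not the authors' generator: the function and class names, the gains, the influence distance `d0`, and the choice of a heading-rate command as the action are all hypothetical.

```python
import math


def apf_direction(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=2.0):
    """Unit vector of the net potential-field force at `pos` (2-D).

    The attractive force pulls toward `goal`; each obstacle closer than
    the influence distance d0 adds a repulsive force. Gains are
    illustrative assumptions.
    """
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:
            # Gradient magnitude of the classic repulsive potential,
            # directed away from the obstacle.
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    return (0.0, 0.0) if norm == 0.0 else (fx / norm, fy / norm)


class PID:
    """Textbook discrete PID controller on a scalar error."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def expert_action(pos, heading, goal, obstacles, pid):
    """Heading-rate command (assumed action type) steering along the APF direction."""
    ux, uy = apf_direction(pos, goal, obstacles)
    desired = math.atan2(uy, ux)
    # Wrap the heading error into [-pi, pi) so the controller turns the short way.
    error = (desired - heading + math.pi) % (2.0 * math.pi) - math.pi
    return pid.step(error)
```

Calling `expert_action` at each simulation step and storing the resulting (state, action) pairs would give the kind of demonstration data that a pre-training stage can consume; how the paper actually defines states, actions, and gains is not stated in the abstract.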
Highlights
• A new decision-making framework is proposed for MTT in obstacle environments.
• TSDRL-EE makes full use of expert data and the excellent experience of the agent.
• TSDRL-EE has obvious advantages in learning speed and convergence effect.
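The second-stage screening of the agent's own excellent experience, which the abstract bases on the average return of the most recent k excellent episodes, might be sketched as follows. The class name, the strict "greater than the running average" rule, and the decision to accept the first episode unconditionally are assumptions made for illustration; the abstract does not give the exact criterion.

```python
from collections import deque


class ExcellentEpisodeFilter:
    """Keep episodes whose return beats the mean of the last k kept episodes.

    Hypothetical helper: the threshold rule and k are illustrative
    assumptions, not the paper's exact procedure.
    """

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        # Returns of the most recent k episodes judged excellent.
        self.recent_returns = deque(maxlen=k)

    def threshold(self):
        # With no baseline yet, any first episode is accepted.
        if not self.recent_returns:
            return float("-inf")
        return sum(self.recent_returns) / len(self.recent_returns)

    def offer(self, episode_return):
        """Return True and record the episode if it counts as excellent."""
        if episode_return > self.threshold():
            self.recent_returns.append(episode_return)
            return True
        return False


def screen(returns, k):
    """Indices of the episodes in `returns` that pass the filter, in order."""
    f = ExcellentEpisodeFilter(k)
    return [i for i, r in enumerate(returns) if f.offer(r)]


print(screen([1.0, 0.5, 2.0, 1.2, 3.0], k=2))  # prints [0, 2, 4]
```

In a training loop, the transitions of each accepted episode would be copied into a separate buffer that supplies the imitation-style guidance signal for the policy network. Because the threshold rises as better episodes arrive, the bar for "excellent" tightens over training.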
Published in
Applied Soft Computing, Volume 145, Issue C, Sep 2023, 1314 pages
ISSN: 1568-4946
Publisher: Elsevier Science Publishers B. V., Netherlands
Author Tags
- Multi-UAV
- DRL
- TD3
- Expert experience
- Target tracking
Qualifiers
- research-article