DRL-Based Dynamic Destroy Approaches for Agile-Satellite Mission Planning
"> Figure 1
<p>Agile satellite observation. Agile satellites with three-axis maneuvering capabilities must schedule a considerable number of remote sensing missions within a limited time frame. Targets beyond the observation range remain undetectable and are therefore considered invalid.</p> "> Figure 2
<p>The process of attention and aggregation operation in graph attention layer.</p> "> Figure 3
<p>Actor-network. The attention mechanism is incorporated into the network architecture, where the initial input tasks are embedded as graph nodes. These graph nodes undergo multiple layers of GAT networks to extract node features, ultimately obtaining the set of nodes to be destroyed in the next iteration.</p> "> Figure 4
<p>Agent–environment interaction in AEOS.</p> "> Figure 5
<p>Training curves for the Area and World scenarios. The two plots show the average scheduled rate of Area and World instances, respectively. The results were obtained by conducting experiments with 5 different random seeds. (<b>a</b>) Training process for Area targets. (<b>b</b>) Training process for World targets.</p> "> Figure 6
<p>Optimization scheduled rate performance contrast. Boxplot illustrating comparison based on 10 different instances. Each box represents the interquartile range (IQR) of the data distribution, with the median marked by a horizontal line inside the box. Triangle markers indicate the mean values of the test results. (<b>a</b>) Scene: Area. (<b>b</b>) Scene: World.</p> "> Figure 7
<p>Optimization Scheduled rate performance contrast. (<b>a</b>) Scene: Area. (<b>b</b>) Scene: World.</p> "> Figure 8
<p>Performance comparison of algorithms with 1500 iterations. Each instance size evaluated across 10 test scenarios. Note: The GA algorithm fails to find a feasible solution when the number of planning tasks exceeds 200. (<b>a</b>) Scene: Area instance size: 50. (<b>b</b>) Scene: Area instance size: 100. (<b>c</b>) Scene: Area instance size: 150. (<b>d</b>) Scene: Area instance size: 200. (<b>e</b>) Scene: World instance size: 100. (<b>f</b>) Scene: World instance size: 200. (<b>g</b>) Scene: World instance size: 300. (<b>h</b>) Scene: World instance size: 400.</p> "> Figure 8 Cont.
<p>Performance comparison of algorithms with 1500 iterations. Each instance size evaluated across 10 test scenarios. Note: The GA algorithm fails to find a feasible solution when the number of planning tasks exceeds 200. (<b>a</b>) Scene: Area instance size: 50. (<b>b</b>) Scene: Area instance size: 100. (<b>c</b>) Scene: Area instance size: 150. (<b>d</b>) Scene: Area instance size: 200. (<b>e</b>) Scene: World instance size: 100. (<b>f</b>) Scene: World instance size: 200. (<b>g</b>) Scene: World instance size: 300. (<b>h</b>) Scene: World instance size: 400.</p> ">
Abstract
1. Introduction
- A D3RL model is designed for the AEOS scheduling problem, focusing on large-scale scenarios. The state, action, and reward function of the underlying MDP are clearly defined.
- A GAT-based actor-network is established using the target clustering information. The clustering graph for the satellite mission is built as an adjacency matrix derived from the given constraints.
- Two application methods for practical agile-satellite mission planning are proposed: the DRL method and the DRL-LNS method. The DRL method uses the D3RL model for direct inference, while the DRL-LNS method embeds an improved dynamic destroy operator based on the D3RL model into a large neighborhood search.
- Comparison experiments are conducted between our algorithms and baseline algorithms (GA, ALNS). The results show that, in large-scale scenarios, the proposed methods converge in fewer iterations and achieve a higher task-scheduled rate than traditional heuristic algorithms. The DRL-LNS method outperforms the other algorithms in Area instances, while the DRL method performs better in World scenarios.
2. Problem Description
2.1. Problem Assumption
- Each task can be executed at most once, and its profit is obtained when the task is scheduled.
- The execution time windows of any two scheduled tasks cannot overlap, and each must lie within the corresponding visible time window.
- Satellite storage, power constraints, and image download processes are not considered, since resource limitations are not the key factor in short-term mission planning [6].
- Most tasks are regular tasks with equal profit.
- Only the single-orbit scenario of a single satellite is considered.
2.2. Variables
2.3. Mathematical Formulation
2.3.1. Constraint Conditions
- Conflict constraint: a satellite can perform only one mission at a time. The execution windows of any two scheduled tasks cannot cross, because the optical sensor cannot carry out two observation tasks simultaneously.
- Task execution uniqueness constraint: repetitive tasks are not considered, so each task can be observed at most once over all satellite observation periods. Each task has a set of candidate execution time windows, and a binary decision variable indicates whether the j-th execution time window associated with that task is selected; for each task, at most one execution time window can be chosen to fulfill its observation.
- Satellite-attitude maneuver time constraint: the actual observation time of a task must lie within one of its visible time windows, and between two consecutively scheduled tasks the satellite must have enough time to complete the attitude maneuver. A hedged mathematical sketch of these constraints is given after this list.
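The sketch below restates the three constraints in LaTeX. The symbols used here (binary variables, start times, durations, maneuver times, visible-window bounds) are introduced purely for illustration and may differ from the notation of the original formulation.

```latex
% Illustrative notation (assumed, not necessarily the authors' symbols):
%   x_i            : 1 if task T_i is scheduled, 0 otherwise
%   y_{ij}         : 1 if the j-th execution time window of task T_i is selected
%   st_i, d_i      : start time and duration of the selected execution window of T_i
%   trans_{ik}     : attitude maneuver time between consecutive tasks T_i and T_k
%   [vtw_i^s, vtw_i^e] : selected visible time window of task T_i
\begin{align*}
  \text{(conflict / maneuver)}\quad & st_i + d_i + trans_{ik} \le st_k
      && \forall\, i,k \ \text{scheduled consecutively, } T_i \ \text{before} \ T_k \\
  \text{(uniqueness)}\quad & \sum_{j} y_{ij} = x_i \le 1
      && \forall\, i \\
  \text{(visibility)}\quad & vtw_i^{s} \le st_i, \qquad st_i + d_i \le vtw_i^{e}
      && \forall\, i \ \text{with}\ x_i = 1
\end{align*}
```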
2.3.2. Optimization Objective
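Given that profit is collected exactly when a task is scheduled (Section 2.1), a plausible form of the optimization objective, stated in the illustrative notation above, is to maximize the total collected profit; this is a sketch, not necessarily the authors' exact expression.

```latex
\max_{x,\,y}\ \sum_{i} p_i\, x_i \qquad \text{subject to the constraints in Section~2.3.1}
```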
3. Modeling Process
3.1. Clustering Graph
- Iterate through all tasks and construct an adjacency matrix A initialized with zeros. If two original tasks u and v satisfy the time window constraint, set the corresponding entry of A to one. This process generates the edge set and constructs the initial graph.
- Examine each edge of the graph and remove it if its two endpoints do not meet the roll angle constraint. This results in the task clustering graph.
- The clustering algorithm concludes, yielding the final clustering graph G and its corresponding adjacency matrix A; a code sketch of this construction is given below.
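A minimal Python sketch of this two-step construction follows. The predicates `time_window_compatible` and `roll_angle_compatible` are illustrative placeholders for the paper's time-window and roll-angle checks, not the authors' actual implementation.

```python
import numpy as np

def build_clustering_graph(tasks, time_window_compatible, roll_angle_compatible):
    """Build the task clustering graph as an adjacency matrix.

    tasks: list of task records (e.g., dicts with time-window and angle fields).
    time_window_compatible(u, v): True if tasks u and v satisfy the time-window constraint.
    roll_angle_compatible(u, v): True if tasks u and v satisfy the roll-angle constraint.
    """
    n = len(tasks)
    A = np.zeros((n, n), dtype=int)            # adjacency matrix initialized with zeros

    # Step 1: connect tasks whose time windows are compatible.
    for u in range(n):
        for v in range(u + 1, n):
            if time_window_compatible(tasks[u], tasks[v]):
                A[u, v] = A[v, u] = 1

    # Step 2: prune edges whose endpoints violate the roll-angle constraint.
    for u in range(n):
        for v in range(u + 1, n):
            if A[u, v] and not roll_angle_compatible(tasks[u], tasks[v]):
                A[u, v] = A[v, u] = 0

    return A                                    # the final clustering graph G via its adjacency matrix
```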
3.2. GAT-Based Network
3.2.1. Graph Attention Layer
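A minimal single-head graph attention layer in the spirit of [38] can be sketched as follows; this is an illustrative PyTorch sketch of the attention-and-aggregation operation shown in Figure 2, not the authors' exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer (generic sketch in the spirit of [38])."""

    def __init__(self, in_dim, out_dim, negative_slope=0.2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared linear transformation W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scoring vector a
        self.leaky_relu = nn.LeakyReLU(negative_slope)

    def forward(self, h, adj):
        # h: (N, in_dim) node features; adj: (N, N) adjacency tensor of the clustering graph
        Wh = self.W(h)                                    # (N, out_dim)
        N = Wh.size(0)
        adj = adj.float() + torch.eye(N, device=Wh.device)   # self-loops avoid isolated rows
        Wh_i = Wh.unsqueeze(1).expand(N, N, -1)           # source-node copies
        Wh_j = Wh.unsqueeze(0).expand(N, N, -1)           # neighbor-node copies
        e = self.leaky_relu(self.a(torch.cat([Wh_i, Wh_j], dim=-1))).squeeze(-1)  # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))        # attend only along graph edges
        alpha = F.softmax(e, dim=-1)                      # normalized attention coefficients
        return F.elu(alpha @ Wh)                          # aggregate neighbor features
```

Stacking several such layers and attaching a per-node scoring head would mirror the actor-network outlined in Figure 3, which outputs the set of nodes to be destroyed in the next iteration.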
3.2.2. GAT-Based Actor-Network
3.3. Markov Decision Process Modeling
3.3.1. State Space
3.3.2. Action Space
- Initialize all the parameters, given a feasible initial decision vector and the corresponding initial state.
- “Destroy” process: the final set of removal points is determined from the actor-network output and the current state. The size of the removal point set is also controlled by the predetermined destroy rate and is computed similarly to Equation (11).
- “Repair” process: the repair process involves re-planning, where the repair operator TimeInsert inserts task nodes that satisfy the constraints into the destroyed solution and sets the corresponding decision variables to one, thereby generating a new state. TimeInsert is a designed repair operator that improves the destroyed solution by inserting new tasks in the time sequence of their arrival [19]. A code sketch of this destroy-and-repair step follows the list.
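The sketch below illustrates one destroy-and-repair step. The removal budget `ceil(destroy_rate * n_scheduled)` and the helpers `removal_scores` and `fits_constraints` are illustrative assumptions; the paper's Equation (11) and the TimeInsert operator [19] may differ in detail.

```python
import math

def destroy(schedule, removal_scores, destroy_rate):
    """Remove the highest-scoring scheduled tasks (scores come from the actor-network)."""
    scheduled = [t for t, selected in enumerate(schedule) if selected]
    k = math.ceil(destroy_rate * len(scheduled))          # removal budget (assumed rule)
    to_remove = sorted(scheduled, key=lambda t: removal_scores[t], reverse=True)[:k]
    for t in to_remove:
        schedule[t] = 0                                   # un-schedule the removed tasks
    return schedule, to_remove

def time_insert_repair(schedule, pending_tasks, fits_constraints):
    """Greedy repair in order of task arrival time (TimeInsert-style sketch)."""
    for task in sorted(pending_tasks, key=lambda t: t.arrival_time):
        if schedule[task.index] == 0 and fits_constraints(schedule, task):
            schedule[task.index] = 1                      # insert the task if all constraints hold
    return schedule
```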
3.3.3. Reward
3.3.4. Training Process
Algorithm 1. Training process based on PPO.
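At its core, the training process optimizes the clipped surrogate objective of PPO [33]. The function below is a minimal sketch of that loss, using the clip rate of 0.2 from the algorithm parameter table; it is not the authors' full training loop.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_rate=0.2):
    """Clipped surrogate policy loss of PPO [33] (minimal sketch).

    logp_new / logp_old: log-probabilities of the taken destroy actions under the
    current and the behavior policy; advantages: estimated advantages.
    """
    ratio = torch.exp(logp_new - logp_old)                               # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_rate, 1.0 + clip_rate) * advantages
    return -torch.min(unclipped, clipped).mean()                         # minimize the negative surrogate
```

In a full loop this loss would be combined with a value-function term and optimized for K epochs per batch, matching the epoch and batch-size settings in the parameter table.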
3.4. Applications
Algorithm 2. DRL-LNS.
1. Parameter initialization. The initial solution can be arbitrary. The initialization function checks the initial solution for compliance with the constraints and reorders it accordingly to obtain a feasible initial solution.
2. Destruction and repair process. Tasks are first removed from the current solution, and new tasks are then added to generate a new solution. The repair process employs a repair operator that randomly samples from the set of pending tasks when inserting new tasks. The destruction process uses the D3RL-based destruction operator, denoted as D.
3. Parameter update. The obtained solution is evaluated, and if the reward of the new solution improves, parameter updates are performed. We mainly use simulated annealing for solution selection.
4. Termination criterion determination. If the termination criterion is met, the algorithm ends and returns the best solution found; otherwise, the iteration continues to search for better solutions. A code sketch of this loop is given after the list.
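The loop below is a minimal sketch of these four steps, assuming the destroy and repair operators are supplied as callables (for example, the illustrative `destroy` and `time_insert_repair` sketches above, suitably wrapped). The initial and final temperatures and the iteration budget follow the parameter table; the geometric cooling rule is an assumption.

```python
import math
import random

def drl_lns(initial_solution, destroy_op, repair_op, evaluate,
            t_init=100.0, t_final=1.0, iterations=1500):
    """DRL-LNS loop sketch: learned destroy + repair + simulated-annealing acceptance.

    destroy_op(solution) -> partially destroyed solution (D3RL-based operator D)
    repair_op(solution)  -> repaired feasible solution
    evaluate(solution)   -> reward (e.g., total profit or scheduled rate)
    """
    current = list(initial_solution)                      # Step 1: feasible initial solution
    current_reward = evaluate(current)
    best, best_reward = list(current), current_reward
    temp = t_init
    cooling = (t_final / t_init) ** (1.0 / iterations)    # geometric cooling (assumed schedule)

    for _ in range(iterations):                           # Step 4: fixed iteration budget
        candidate = repair_op(destroy_op(list(current)))  # Step 2: destroy, then repair
        reward = evaluate(candidate)

        # Step 3: simulated-annealing acceptance of the candidate solution.
        if reward > current_reward or random.random() < math.exp((reward - current_reward) / temp):
            current, current_reward = candidate, reward
        if reward > best_reward:
            best, best_reward = list(candidate), reward

        temp *= cooling
    return best, best_reward
```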
4. Experimental Simulation
4.1. Simulation Scenarios
4.1.1. Task Scheduled Rate
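Since most tasks are assumed to have equal profit (Section 2.1), the task scheduled rate reported in the tables below can plausibly be read as the fraction of tasks that are scheduled; the expression below is an illustrative formalization under that assumption, not necessarily the authors' exact definition.

```latex
\text{Scheduled rate} \;=\; \frac{\text{number of scheduled tasks}}{\text{total number of tasks}} \;=\; \frac{\sum_i x_i}{N}
```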
4.1.2. Algorithm Parameters
4.2. Training Performance
Training Comparison
4.3. Testing Performance
4.3.1. Model Generalization Test
4.3.2. Comparison of Algorithms Test
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Singh, P.; Pandey, P.C.; Petropoulos, G.P.; Pavlides, A.; Srivastava, P.K.; Koutsias, N.; Deng, K.A.K.; Bao, Y. Hyperspectral remote sensing in precision agriculture: Present status, challenges, and future trends. In Hyperspectral Remote Sensing; Elsevier: Amsterdam, The Netherlands, 2020; pp. 121–146. [Google Scholar]
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A.; Zhang, L. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man-made disasters. Remote. Sens. Environ. 2021, 265, 112636. [Google Scholar] [CrossRef]
- De Bem, P.P.; de Carvalho Junior, O.A.; Fontes Guimarães, R.; Trancoso Gomes, R.A. Change detection of deforestation in the Brazilian Amazon using landsat data and convolutional neural networks. Remote Sens. 2020, 12, 901. [Google Scholar] [CrossRef]
- Chen, J.; Liu, H.; Hou, J.; Yang, M.; Deng, M. Improving building change detection in VHR remote sensing imagery by combining coarse location and co-segmentation. ISPRS Int. J. Geo-Inf. 2018, 7, 213. [Google Scholar] [CrossRef]
- Liu, R.; Kuffer, M.; Persello, C. The temporal dynamics of slums employing a CNN-based change detection approach. Remote Sens. 2019, 11, 2844. [Google Scholar] [CrossRef]
- Peng, G.; Dewil, R.; Verbeeck, C.; Gunawan, A.; Xing, L.; Vansteenwegen, P. Agile Earth Observation Satellite Scheduling: An Orienteering Problem with Time-Dependent Profits and Travel Times. Comput. Oper. Res. 2019, 111, 84–98. [Google Scholar] [CrossRef]
- Wang, X.; Wu, G.; Xing, L.; Pedrycz, W. Agile Earth Observation Satellite Scheduling Over 20 Years: Formulations, Methods, and Future Directions. IEEE Syst. J. 2021, 15, 3881–3892. [Google Scholar] [CrossRef]
- Lemaître, M.; Verfaillie, G.; Jouhaud, F.; Lachiver, J.M.; Bataille, N. Selecting and Scheduling Observations of Agile Satellites. Aerosp. Sci. Technol. 2002, 6, 367–381. [Google Scholar] [CrossRef]
- Chu, X.; Chen, Y.; Xing, L. A branch and bound algorithm for agile earth observation satellite scheduling. Discret. Dyn. Nat. Soc. 2017, 2017, 7345941. [Google Scholar] [CrossRef]
- Beaumet, G.; Verfaillie, G.; Charmeau, M.C. Feasibility of autonomous decision making on board an agile earth-observing satellite. Comput. Intell. 2011, 27, 123–139. [Google Scholar] [CrossRef]
- Sarkheyli, A.; Vaghei, B.G.; Bagheri, A. New tabu search heuristic in scheduling earth observation satellites. In Proceedings of the 2010 2nd International Conference on Software Technology and Engineering, San Juan, PR, USA, 3–5 October 2010; Volume 2, p. V2-199. [Google Scholar]
- Zhao, Y.; Du, B.; Li, S. Agile Satellite Mission Planning Via Task Clustering and Double-Layer Tabu Algorithm. Comput. Model. Eng. Sci. 2020, 122, 235–257. [Google Scholar] [CrossRef]
- Peng, G.; Song, G.; He, Y.; Yu, J.; Xiang, S.; Xing, L.; Vansteenwegen, P. Solving the Agile Earth Observation Satellite Scheduling Problem with Time-Dependent Transition Times. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1614–1625. [Google Scholar] [CrossRef]
- Tangpattanakul, P.; Jozefowiez, N.; Lopez, P. Multi-objective optimization for selecting and scheduling observations by agile earth observing satellites. In Proceedings of the Parallel Problem Solving from Nature-PPSN XII: 12th International Conference, Taormina, Italy, 1–5 September 2012; pp. 112–121. [Google Scholar]
- Geng, X.; Li, J.; Yang, W.; Gong, H. Agile satellite scheduling based on hybrid coding genetic algorithm. In Proceedings of the 2016 12th World Congress on Intelligent Control and Automation (WCICA), Guilin, China, 12–15 June 2016; pp. 2727–2731. [Google Scholar]
- Zhang, Z.; Zhang, N.; Feng, Z. Multi-satellite control resource scheduling based on ant colony optimization. Expert Syst. Appl. 2014, 41, 2816–2823. [Google Scholar] [CrossRef]
- Shaw, P. Using constraint programming and local search methods to solve vehicle routing problems. In Proceedings of the Principles and Practice of Constraint Programming—CP98: 4th International Conference, CP98, Pisa, Italy, 26–30 October 1998; pp. 417–431. [Google Scholar]
- Sonnerat, N.; Wang, P.; Ktena, I.; Bartunov, S.; Nair, V. Learning a large neighborhood search algorithm for mixed integer programs. arXiv 2021, arXiv:2107.10201. [Google Scholar]
- Liu, X.; Laporte, G.; Chen, Y.; He, R. An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time. Comput. Oper. Res. 2017, 86, 41–53. [Google Scholar] [CrossRef]
- Pisinger, D.; Ropke, S. Large neighborhood search. In Handbook of Metaheuristics; Springer: Cham, Switzerland, 2019; pp. 99–127. [Google Scholar]
- Mara, S.T.W.; Norcahyo, R.; Jodiawan, P.; Lusiantoro, L.; Rifai, A.P. A survey of adaptive large neighborhood search algorithms and applications. Comput. Oper. Res. 2022, 146, 105903. [Google Scholar] [CrossRef]
- Ropke, S.; Pisinger, D. An adaptive large neighborhood search heuristic for the pickup and delivery problem with time windows. Transp. Sci. 2006, 40, 455–472. [Google Scholar] [CrossRef]
- Pisinger, D.; Ropke, S. A general heuristic for vehicle routing problems. Comput. Oper. Res. 2007, 34, 2403–2435. [Google Scholar] [CrossRef]
- Hottung, A.; Tierney, K. Neural large neighborhood search for the capacitated vehicle routing problem. arXiv 2019, arXiv:1911.09539. [Google Scholar]
- da Costa, P.; Rhuggenaath, J.; Zhang, Y.; Akcay, A.; Kaymak, U. Learning 2-opt heuristics for routing problems via deep reinforcement learning. SN Comput. Sci. 2021, 2, 1–16. [Google Scholar] [CrossRef]
- Wu, G.; Wang, H.; Li, H.; Pedrycz, W.; Qiu, D.; Ma, M.; Liu, J. An adaptive Simulated Annealing-based satellite observation scheduling method combined with a dynamic task clustering strategy. arXiv 2014, arXiv:1401.6098. [Google Scholar]
- Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940. [Google Scholar]
- Wang, H.; Yang, Z.; Zhou, W.; Li, D. Online Scheduling of Image Satellites Based on Neural Networks and Deep Reinforcement Learning. Chin. J. Aeronaut. 2019, 32, 1011–1019. [Google Scholar] [CrossRef]
- Chen, M.; Chen, Y.; Chen, Y.; Qi, W. Deep Reinforcement Learning for Agile Satellite Scheduling Problem. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 126–132. [Google Scholar] [CrossRef]
- Huang, Y.; Mu, Z.; Wu, S.; Cui, B.; Duan, Y. Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning. Remote Sens. 2021, 13, 2377. [Google Scholar] [CrossRef]
- He, Y.; Xing, L.; Chen, Y.; Pedrycz, W.; Wang, L.; Wu, G. A Generic Markov Decision Process Model and Reinforcement Learning Method for Scheduling Agile Earth Observation Satellites. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1463–1474. [Google Scholar] [CrossRef]
- Zhang, C.; Vinyals, O.; Munos, R.; Bengio, S. A study on overfitting in deep reinforcement learning. arXiv 2018, arXiv:1804.06893. [Google Scholar]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Iacopino, C.; Harrison, S.; Brewer, A. Mission planning systems for commercial small-sat earth observation constellations. In Proceedings of the 9th International Workshop on Planning and Scheduling for Space (IWPSS), Buenos Aires, Argentina, 25–27 June 2015; pp. 45–52. [Google Scholar]
- He, L.; de Weerdt, M.; Yorke-Smith, N. Time/sequence-dependent scheduling: The design and evaluation of a general purpose tabu-based adaptive large neighbourhood search algorithm. J. Intell. Manuf. 2020, 31, 1051–1078. [Google Scholar] [CrossRef]
- Wei, L.; Chen, Y.; Chen, M.; Chen, Y. Deep reinforcement learning and parameter transfer based approach for the multi-objective agile earth observation satellite scheduling problem. Appl. Soft Comput. 2021, 110, 107607. [Google Scholar] [CrossRef]
- Wu, G.; Liu, J.; Ma, M.; Qiu, D. A two-phase scheduling method with the consideration of task clustering for earth observing satellites. Comput. Oper. Res. 2013, 40, 1884–1894. [Google Scholar] [CrossRef]
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
- Chen, M.; Gao, L.; Chen, Q.; Liu, Z. Dynamic Partial Removal: A Neural Network Heuristic for Large Neighborhood Search. arXiv 2020, arXiv:2005.09330. [Google Scholar]
- Jazzbin, J. Geatpy: The Genetic and Evolutionary Algorithm Toolbox with High Performance in Python. Available online: http://www.geatpy.com/ (accessed on 31 July 2020).
- Wouda, N.A.; Lan, L. ALNS: A Python implementation of the adaptive large neighbourhood search metaheuristic. J. Open Source Softw. 2023, 8, 5028. [Google Scholar] [CrossRef]
| Name | Description |
|---|---|
| | The i-th task |
| | A binary variable representing whether a task is selected |
| | The selected visible time window of a task |
| | The selected execution time window of a task, which is a subset of its visible time window |
| | Execution time of a task |
| | The profit of a task |
| | Satellite attitude maneuver time between two tasks |
| | Satellite observation angle for a task |
| | The set of execution time windows of a task |
| | The i-th embedded graph node |
| W | The weight matrix of the transformation |
| | Roll attitude angle from the satellite to task node i |
| | Pitch attitude angle from the satellite to task node i |
| | The state in the t-th iteration |
| | The action in the t-th iteration |
| | The set of removed action nodes |
| | The set of corresponding removal coefficients |
| | The reward function |
| R | The discounted cumulative reward |
Scenario | Region | Num of Points | Instance Sizes | Training Scenarios | Test Scenarios |
---|---|---|---|---|---|
Area | 3°N–53°N, 74°E–133°E | 400 | 50, 100, 150, 200 | 20 | 10 |
World | 65°S–65°N, 180°W–180°E | 600 | 100, 200, 300, 400 | 20 | 10 |
Parameters | Value |
---|---|
Learning rate | 0.001 |
Discount factor | 0.99 |
The clip rate | 0.2 |
Batch size | 16 |
Number of iterations in each scene | 3000 |
Number of training scenes | 20 |
Number of epochs K | 2 |
Destroy rate | 0.2 |
Initial temperature | 100 |
Final temperature | 1 |
Methods | Area 50 | Area 100 | Area 150 | Area 200 | World 100 | World 200 | World 300 | World 400 |
---|---|---|---|---|---|---|---|---|
train_50 | 0.5080 | 0.3460 | 0.3387 | 0.3070 | 0.6160 | 0.5980 | 0.6020 | 0.6215 |
train_500 | 0.5120 | 0.3660 | 0.3493 | 0.3230 | 0.6560 | 0.6080 | 0.6120 | 0.6280 |
train_5000 | 0.5360 | 0.3920 | 0.3827 | 0.3600 | 0.7020 | 0.6680 | 0.6667 | 0.6740 |
trained | 0.5280 | 0.3860 | 0.3600 | 0.3310 | 0.6720 | 0.6280 | 0.6240 | 0.6470 |
Average scheduled rate by scenario and algorithm:

Scenarios | GA | ALNS | DRL-LNS | DRL |
---|---|---|---|---|
Area_50 | 0.2640 | 0.4680 | 0.5000 | 0.4080 |
Area_100 | 0.2400 | 0.3760 | 0.4020 | 0.3640 |
Area_150 | 0.2080 | 0.3267 | 0.3747 | 0.3107 |
Area_200 | 0.0000 | 0.3010 | 0.3520 | 0.2840 |
World_100 | 0.6000 | 0.6620 | 0.7000 | 0.7020 |
World_200 | 0.0000 | 0.6150 | 0.6400 | 0.6660 |
World_300 | 0.0000 | 0.6093 | 0.6320 | 0.6640 |
World_400 | 0.0000 | 0.5795 | 0.5940 | 0.6350 |