Article

Optimization of the Stand Structure in Secondary Forests of Pinus yunnanensis Based on Deep Reinforcement Learning

1 School of Mathematics and Computer Science, Dali University, Dali 671003, China
2 Dali Forestry and Grassland Science Research Institute, Dali 671000, China
3 Institute of Remote Sensing and Geographic Information System, School of Earth and Space Sciences, Peking University, Beijing 100871, China
4 School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(12), 2181; https://doi.org/10.3390/f15122181
Submission received: 8 November 2024 / Revised: 29 November 2024 / Accepted: 10 December 2024 / Published: 11 December 2024
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
Figure 1. Location of the study area.
Figure 2. Diagram of the random selection process.
Figure 3. Diagram of the tree homogeneity index process.
Figure 4. Diagram of the spatial competition process.
Figure 5. Model of the DQN structure.
Figure 6. DQN algorithm for stand structure optimization.
Figure 7. Alterations in stand structure indices for different optimization scenarios in different plots. Note: U, W, CI, Cd, S, and Mc represent the optimized values of neighborhood comparison, uniform angle index, crown competition index, canopy density, stratification index, and complete mingling, respectively.
Figure 8. Optimization effect of each optimization scheme.
Figure 9. Felling decision effect of each optimization scheme. Note: the six axes represent the six stand structure optimization schemes A1–A6, and the five line colors represent the optimized objective function values of the six optimization schemes in the five plots P1–P5.
Figure A1. Optimization effect of each optimization scheme.

Abstract

This study proposes a multi-objective stand structure optimization scheme based on deep reinforcement learning, demonstrating the strengths of deep reinforcement learning in solving multi-objective optimization problems and providing innovative insights for sustainable forest management. Using the Pinus yunnanensis secondary forest in Southwest China as the research subject, we established a stand structure optimization model with stand spatial structure indexes as the optimization objectives and non-spatial structure indexes as the constraints. We optimized the stand structure by combining deep reinforcement learning with three tree-felling decisions: random selection, tree homogeneity index, and spatial competition. Simulated cutting experiments were conducted on circular plots (P1–P5) using deep reinforcement learning and reinforcement learning. The initial objective function values of all plots (0.2950, 0.2954, 0.3445, 0.3010, 0.3168) were effectively improved. The maximum objective function values after optimization by the deep reinforcement learning schemes (0.3815, 0.3701, 0.4301, 0.4599, 0.3689) were significantly better than those achieved by the reinforcement learning schemes (0.3394, 0.3579, 0.3986, 0.4321, 0.3556). Among these, the optimization scheme combining random selection and deep reinforcement learning showed the greatest average improvement across the five plots (29.73%), with its enhancement of the objective function value significantly surpassing that of other optimization schemes. This study applies deep reinforcement learning to stand structure optimization, proposing a new approach to solving multi-objective optimization problems in stand structure and providing a reference for forest health management in Southwest China.

1. Introduction

Pinus yunnanensis Franch., an important component of coniferous forests in southwestern China, is known for its fast growth, strong adaptability, wide range of timber uses, and high resin content. It is often used as a pioneer species for afforestation on barren hills, and it plays an indispensable role in the sustainable development of the economy, society, and the environment in the southwestern region [1]. Secondary forests of Pinus yunnanensis represent the principal forest type in southwestern China [2]. These secondary forests commonly face issues such as unreasonable stand structure, poor stability, low biodiversity, and high forest fire risk, highlighting a pressing need for stand structure optimization [3,4,5]. However, relevant research remains scarce. Therefore, optimizing the stand structure of Pinus yunnanensis secondary forests is of significant theoretical and practical value for forest health management in southwestern China.
Stand structure is a highly generalized and quantified state of a forest during its dynamic changes, serving as the foundation for determining its functions. It significantly influences the current health status and future development trajectory of the forest [6,7,8,9]. Effective optimization hinges on developing appropriate models and solving them accurately. Current optimization models primarily focus on structural optimization, with spatial structure as the main objective and non-spatial structure as constraints, utilizing selective logging as the primary regulatory method [10,11,12,13]. Therefore, selecting suitable stand structure indicators and swiftly and accurately identifying trees for selective logging are essential issues in stand structure optimization.
Stand structure optimization is a multi-objective optimization problem characterized by large systems, multiple objectives, and nonlinearity, resulting in high computational demands and difficulty finding solutions. Early approaches utilized linear programming and nonlinear programming algorithms [14,15]. However, these methods were only suitable for solving simple optimization models and had some drawbacks such as long solution times and high computational costs. Currently, intelligent optimization algorithms such as Monte Carlo [16,17], genetic algorithms [18,19,20], and particle swarm optimization [21,22] are predominantly used. These algorithms can find relatively optimal solutions within an acceptable range with fewer iterations, but they are limited by inherent defects, including a lack of accuracy, susceptibility to local optima, and significant volatility.
Recently, reinforcement learning has exhibited substantial advantages in handling multi-objective optimization problems due to its powerful autonomous learning capabilities and adaptability. It has found extensive application in areas like smart grids and intelligent transportation systems [23,24,25]. Reinforcement learning has also been applied to the multi-objective optimization of stand structure. For instance, Xuan [26] employed the Q-Learning algorithm to transform the tree selection behavior in stand structure optimization into the actions of an agent in reinforcement learning. Different felling decisions were compared using particle swarm optimization, Monte Carlo, and Q-Learning algorithms to verify the feasibility and superiority of reinforcement learning in solving stand structure optimization problems. However, when dealing with complex environmental issues, the trial-and-error cost of reinforcement learning increases, leading to unstable training and low generalization capabilities. Compared to reinforcement learning, deep reinforcement learning, which combines reinforcement learning with deep learning, offers excellent fast computation capabilities, better solution efficiency and stability in handling complex problems, and stronger generalization capabilities [27,28,29].
Therefore, we attempt to apply Deep Q Network (DQN) from deep reinforcement learning to optimize the stand structure of Pinus yunnanensis secondary forests. We simulate selective logging optimization across different plots, incorporating three different felling decisions, and use Q-Learning from reinforcement learning as a comparative algorithm. This study applies deep reinforcement learning to stand structure optimization, providing insights and references for the management decisions of Pinus yunnanensis secondary forests.

2. Materials and Methods

2.1. Study Areas

The data collection area is situated in Cangshan, Dali, Yunnan Province, southwestern China (25°34′–26°00′ N, 99°55′–100°12′ E), covering a total area of approximately 293 km² with an elevation ranging from 1966 m to 4122 m (Figure 1). The region belongs to a subtropical plateau monsoon climate zone, characterized by mild temperatures throughout the year and abundant sunshine. The area has an average yearly temperature of 16.1 °C and experiences abundant rainfall exceeding 1000 mm, with 84% of the annual precipitation occurring primarily between May and October. The predominant soil type is Hyperdystric Clayic Ferralsol (Ferric). The dominant tree species is Pinus yunnanensis Franch., accompanied by associated species such as Rhododendron microphyton Franch., Quercus acutissima Carruth., Pinus armandii Franch., Betula alnoides Buch.-Ham. ex D. Don, Vaccinium bracteatum Thunb., Gaultheria griffithiana Wight, Eurya nitida Korthals, Ternstroemia gymnanthera (Wight & Arn.) Bedd., Camellia sinensis (L.) O. Kuntze, and Quercus variabilis Blume.

2.2. Data Collection

This study, based on terrain conditions, stand characteristics, and the advantages of circular plots in forest surveys—such as easier establishment and positioning, better adaptability to complex terrains, and smaller edge errors for the same area—established fixed circular standard plots with radii ranging from 18 to 35 m at elevations of 2100–2400 m in Cangshan [30]. The geographic coordinates, slope, altitude, orientation, and radius of each plot were measured and documented. For all live trees with a diameter at breast height of 5 cm or greater (DBH ≥ 5 cm), basic tree attributes were documented, including species, relative coordinates, tree height, DBH, crown width (CW, the average of two measurements taken in the east–west and north–south directions), and crown length (CL, the vertical distance from the base of the trunk to the top of the tree crown). The relative coordinates of each tree base were precisely determined using a total station (GTS-2002). The basic information of the plots is shown in Table 1.

2.3. Determination of Spatial Structure Units and Edge Correction

The spatial relationships among trees were determined using the Voronoi diagram method [31]. In this approach, a reference tree serves as the focal point, with its adjacent trees delineated by the edges shared between the Voronoi polygons surrounding the reference tree. This method effectively captures the proximity relationships between trees and accurately represents their horizontal distribution. To minimize errors in calculating spatial structure parameters due to the potential splitting of edge trees by plot boundaries, a 2 m wide buffer zone was established along the plot perimeter [32]. In this zone, trees were only considered as neighbors and not as central trees when defining the spatial structure units of the stand.
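As an illustration of this neighbor definition, trees whose Voronoi cells share an edge can be extracted with scipy's `Voronoi`; the coordinates below are illustrative, and buffer-zone handling is omitted for brevity:

```python
# Sketch: deriving spatial structure units from a Voronoi diagram.
# Two trees are neighbors when their Voronoi cells share an edge (a ridge).
from collections import defaultdict
from scipy.spatial import Voronoi

def voronoi_neighbors(coords):
    """Map each tree index to the indices of trees whose cells share an edge."""
    vor = Voronoi(coords)
    neighbors = defaultdict(set)
    for p, q in vor.ridge_points:  # each ridge separates two input points
        neighbors[p].add(q)
        neighbors[q].add(p)
    return neighbors

# Four corner trees around one central reference tree (illustrative layout):
coords = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
nb = voronoi_neighbors(coords)
print(sorted(int(i) for i in nb[4]))  # the central tree neighbors all four corners
```

In a full implementation, trees inside the 2 m buffer would be kept in `coords` (so they shape the diagram) but skipped as reference trees when computing stand indexes.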

2.4. Stand Structure Indexes

2.4.1. Non-Spatial Structure Indexes

(1) Tree DBH Classes
Trees were classified according to different DBH classes, ensuring that the diversity of tree diameter classes remained consistent before and after cutting. In this study, trees were classified starting from a DBH of 6 cm, with a 2 cm interval for each diameter class.
(2) Number of Species
Maintaining consistent tree species diversity before and after cutting is essential to ensure no species are lost.
(3) Canopy Density ( C d )
A continuous canopy is typically one of the indicators of a reasonable stand structure. Generally, a canopy density of no less than 0.7 can be considered as continuous forest cover.
(4) Cutting Intensity
When conducting cutting, it is crucial to design the cutting intensity rationally. Typically, the annual cutting volume should be less than the annual growth volume. Based on previous studies [33,34], the cutting intensity for secondary forests of Pinus yunnanensis should be controlled within 35%.

2.4.2. Spatial Structure Indexes

(1) Neighborhood Comparison (U) [35]
The neighborhood comparison quantifies the degree of size differentiation and competition among trees, indicating the proportion of the n neighboring trees whose diameter is greater than that of the reference tree. Its expression is
$$U_i = \frac{1}{n}\sum_{j=1}^{n} k_{ij}$$
$U_i$ represents the neighborhood comparison for reference tree i. If the DBH of neighboring tree j is greater than that of reference tree i, then $k_{ij} = 1$; otherwise, $k_{ij} = 0$.
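A minimal sketch of this computation (the DBH values are illustrative, not survey data):

```python
# Neighborhood comparison U_i: the share of the n neighbors whose DBH
# exceeds the reference tree's DBH.
def neighborhood_comparison(ref_dbh, neighbor_dbhs):
    n = len(neighbor_dbhs)
    return sum(1 for d in neighbor_dbhs if d > ref_dbh) / n

# Reference tree of 20 cm DBH with four Voronoi neighbors:
u = neighborhood_comparison(20.0, [25.0, 15.0, 22.0, 18.0])
print(u)  # 2 of 4 neighbors are larger -> 0.5
```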
(2) Crown Competition Index ( C I ) [36]
To describe the competition among trees, we use characteristics such as C W and C L to calculate the crown overlap area, thereby reflecting the competition pressure experienced by trees during their growth. Its expression is
$$CI_i = \frac{1}{Z_i}\sum_{j=1}^{n} AO_{ij}\times\frac{L_j}{L_i}$$
$CI_i$ represents the crown competition index for reference tree i, and $Z_i$ represents the crown projection area of reference tree i. $L_j = H_j \times CW_j \times CL_j$ (the product of the height, CW, and CL of competing tree j), and $L_i = H_i \times CW_i \times CL_i$ (the product of the height, CW, and CL of reference tree i). $AO_{ij}$ represents the crown overlap area between reference tree i and competitor tree j [37]. If there is no overlap, $AO_{ij} = 0$. When there is overlap,
$$S_0 = \frac{CW_i^2}{2}\sum_{j=1}^{n}\arccos\!\left(\frac{q_j^2}{2CW_i^2}-1\right)-\frac{1}{4}\sum_{j=1}^{n} q_j\sqrt{4CW_i^2-q_j^2}$$

$$S_1 = \frac{1}{2}\sum_{j=1}^{n}\left[CW_j^2\arccos\!\left(1-\frac{4CW_i^2-q_j^2}{2CW_j^2}\right)-\frac{\sqrt{4CW_i^2-q_j^2}}{2}\sqrt{4CW_j^2-\left(4CW_i^2-q_j^2\right)}\right]$$

$$AO_{ij} = S_0 + S_1$$
$S_0$ represents the total shaded area of reference tree i by the n competitor trees, and $S_1$ represents the total shaded area of the n competitor trees by reference tree i. $q_j = \frac{L_{ij}^2-\left(CW_j^2-CW_i^2\right)}{L_{ij}}$, where $L_{ij}$ represents the distance between competitor tree j and reference tree i, $CW_i$ represents the crown width of reference tree i, $CW_j$ represents the CW of competitor tree j, and n represents the number of competitor trees.
(3) Stratification Index (S)
The story index [38] reflects the vertical distribution pattern and structural diversity of forest layers. It is defined as the product of the proportion of neighboring trees that do not belong to the same layer as the reference tree and the diversity of the layer structure within the spatial structural unit. However, this index is proposed under the assumption of a flat terrain without considering the influence of topography. Given the complex terrain and significant topographic variations in the study area, the traditional story index is not applicable. Therefore, we introduce the stratification index, which incorporates topographic considerations into the traditional story index.
In calculating the stratification index, the stand is divided into three vertical layers based on the dominant height: the upper layer consists of trees with heights greater than or equal to two-thirds of the dominant height; the middle layer consists of trees with heights between one-third and two-thirds of the dominant height; and the lower layer consists of trees with heights less than or equal to one-third of the dominant height. To account for the influence of topography on forest stratification, the dominant height is calculated using the 100 tallest trees per hectare as the dominant trees [39,40]. Its expression is
$$S_i = \frac{z_i}{3}\times\frac{1}{n}\sum_{j=1}^{n}\left(1-\frac{\left|FL_i-FL_j\right|}{\max\left(\left|FL_i-FL_j\right|,\,1\right)}\right)$$

$$FL_i=\begin{cases}-1, & H_i\le \frac{1}{3}H_d\\ 0, & \frac{1}{3}H_d< H_i< \frac{2}{3}H_d\\ 1, & H_i\ge \frac{2}{3}H_d\end{cases}$$

$$H_d=\frac{1}{100A}\sum_{i=1}^{100A}\left(H_{sort(i)}+E_{sort(i)}\right)$$
$S_i$ represents the stratification index for reference tree i, and $z_i$ denotes the number of layers within the spatial structure unit to which reference tree i belongs. $FL_i$ indicates the classification of reference tree i in the vertical stratification. $H_i$ represents the height of reference tree i, while $H_d$ denotes the dominant height. A stands for the plot area in hectares, $H_{sort(i)}$ is the height of the i-th tree among the tallest 100A trees per hectare, and $E_{sort(i)}$ indicates the relative elevation of the i-th tree among these 100A trees. The closer the stratification index is to 1, the more complex the vertical stratification of the stand.
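As a minimal sketch, the layer classification and the stratification formula can be implemented as follows; the layer coding (lower = −1, middle = 0, upper = 1) is our assumed convention, and the heights and dominant height are illustrative values rather than survey data:

```python
# Stratification index S_i for one reference tree, with layers defined
# relative to the dominant height H_d (thirds of H_d split the stand).
def layer(height, h_d):
    if height <= h_d / 3:
        return -1          # lower layer (assumed coding)
    if height >= 2 * h_d / 3:
        return 1           # upper layer
    return 0               # middle layer

def stratification_index(ref_h, neighbor_hs, h_d):
    fl_i = layer(ref_h, h_d)
    layers_present = {layer(h, h_d) for h in neighbor_hs} | {fl_i}
    z_i = len(layers_present)   # layers in the spatial structure unit
    n = len(neighbor_hs)
    total = 0.0
    for h in neighbor_hs:
        diff = abs(fl_i - layer(h, h_d))
        total += 1 - diff / max(diff, 1)
    return z_i / 3 * total / n

# Illustrative heights (m) with H_d = 21 m, so layer breaks at 7 m and 14 m:
s = stratification_index(ref_h=14.0, neighbor_hs=[4.0, 9.0, 15.0, 16.0], h_d=21.0)
print(round(s, 3))  # 0.5
```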
(4) Complete Mingling ( M c ) [41]
This is used to describe the degree of species isolation among trees while considering species diversity by calculating the evenness of the proportions of different species. Its expression is
$$Mc_i=\frac{M_i}{2}\left[1-\frac{1}{(n+1)^2}\sum_{j=1}^{s_i} n_j^2+\frac{n_i}{n}\right]$$
$Mc_i$ represents the complete mingling of reference tree i. $n_i$ is the number of different species among the neighboring trees, $n_j$ is the number of trees of the j-th species among the neighboring trees, and $s_i$ is the number of species within the spatial structure unit to which reference tree i belongs. $M_i$ represents the mingling degree of reference tree i, with $M_i = \frac{1}{n}\sum_{j=1}^{n} v_{ij}$. When reference tree i and neighboring tree j are of the same species, $v_{ij} = 0$; otherwise, $v_{ij} = 1$.
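A sketch of this computation for one reference tree; how $s_i$ and $n_i$ are counted here is our reading of the definitions above (species counted among the neighbors, with species absent from the neighborhood contributing zero to the sum), so treat those conventions as assumptions:

```python
from collections import Counter

# Complete mingling Mc_i for a single reference tree; species names
# are illustrative placeholders.
def complete_mingling(ref_species, neighbor_species):
    n = len(neighbor_species)
    # Simple mingling M_i: share of neighbors of a different species.
    m_i = sum(1 for sp in neighbor_species if sp != ref_species) / n
    counts = Counter(neighbor_species)      # n_j per species among neighbors
    n_i = len(counts)                       # distinct species among neighbors
    sum_sq = sum(c * c for c in counts.values())
    return m_i / 2 * (1 - sum_sq / (n + 1) ** 2 + n_i / n)

mc = complete_mingling("Pinus", ["Quercus", "Betula", "Quercus", "Pinus"])
print(round(mc, 5))  # 0.56625
```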
(5) Uniform Angle Index (W) [42]
Used to describe the spatial distribution pattern among trees, the uniform angle index is defined as the proportion of the α angles (the smaller angles between neighboring trees) that are less than the standard angle $\alpha_0$ ($\alpha_0 = \frac{360°}{n+1}$) out of the total number of angles formed. Its expression is
$$W_i=\frac{1}{n}\sum_{j=1}^{n} z_{ij}$$
$W_i$ represents the uniform angle index for reference tree i. When the j-th α angle is smaller than the standard angle $\alpha_0$, $z_{ij} = 1$; otherwise, $z_{ij} = 0$.
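A sketch of $W_i$ computed from stem coordinates; taking the smaller angle between adjacent sorted bearings is our reading of the α-angle definition, and the coordinates are illustrative:

```python
import math

# Uniform angle index W_i: bearings from the reference tree to its n
# neighbors are sorted, gaps between adjacent bearings give the alpha
# angles, and W_i is the share of alphas below alpha_0 = 360/(n+1).
def uniform_angle_index(ref_xy, neighbor_xys):
    n = len(neighbor_xys)
    bearings = sorted(
        math.degrees(math.atan2(y - ref_xy[1], x - ref_xy[0])) % 360
        for x, y in neighbor_xys
    )
    gaps = [bearings[(j + 1) % n] - bearings[j] for j in range(n)]
    gaps[-1] += 360                            # wrap-around gap
    alphas = [min(g, 360 - g) for g in gaps]   # always the smaller angle
    alpha0 = 360 / (n + 1)
    return sum(1 for a in alphas if a < alpha0) / n

# A perfectly regular cross pattern yields no alpha below 72 degrees:
w = uniform_angle_index((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)])
print(w)  # 0.0
```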

2.5. Optimization Model for the Cutting of Stand Structure

2.5.1. Constraints

The stand structure is subject to both spatial and non-spatial constraints. Spatial constraints require that spatial structure indexes do not deteriorate after optimization. Specifically, this means reducing the neighborhood comparison, uniform angle index, and crown competition index, while increasing the complete mingling and stratification index. The goal is to make the horizontal distribution of the stand tend toward randomness, enhance the degree of mingling and vertical richness, and reduce the levels of competition and size differentiation. Non-spatial constraints ensure that the number of species and tree diameter classes do not decrease after optimization, canopy density remains no less than 0.7, and cutting intensity is controlled within 35%. The constraint conditions are expressed as follows
$$s.t.\quad\begin{cases}\overline{Mc}\ge \overline{Mc_0}\\ \overline{S}\ge \overline{S_0}\\ \overline{U}\le \overline{U_0}\\ \overline{CI}\le \overline{CI_0}\\ \left|\overline{W}-0.496\right|\le \left|\overline{W_0}-0.496\right|\\ D = D_0\\ T = T_0\\ C_d \ge 0.7\\ N/N_0 \ge (100-35)/100\end{cases}$$
$\overline{Mc}$, $\overline{S}$, $\overline{U}$, $\overline{CI}$, $\overline{W}$, $D$, $T$, $C_d$, and $N$ represent the values of complete mingling, stratification index, neighborhood comparison, crown competition index, uniform angle index, diameter classes, tree species, canopy density, and total number of trees in the forest stand after optimization, respectively. $\overline{Mc_0}$, $\overline{S_0}$, $\overline{U_0}$, $\overline{CI_0}$, $\overline{W_0}$, $D_0$, $T_0$, and $N_0$ represent the corresponding values of complete mingling, stratification index, neighborhood comparison, crown competition index, uniform angle index, diameter classes, tree species, and total number of trees in the forest stand before optimization.
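A candidate cutting plan can be screened against these constraints with a simple feasibility check; the dictionary keys and the sample numbers below are illustrative assumptions, not the paper's implementation:

```python
# Feasibility check for a candidate cutting plan: spatial indexes must not
# deteriorate, species and diameter classes are preserved, canopy density
# stays >= 0.7, and cutting intensity stays within 35%.
def satisfies_constraints(before, after):
    """`before`/`after` hold stand-level values; keys are illustrative."""
    return (
        after["Mc"] >= before["Mc"]
        and after["S"] >= before["S"]
        and after["U"] <= before["U"]
        and after["CI"] <= before["CI"]
        and abs(after["W"] - 0.496) <= abs(before["W"] - 0.496)
        and after["D"] == before["D"]          # diameter classes preserved
        and after["T"] == before["T"]          # species preserved
        and after["Cd"] >= 0.7
        and after["N"] / before["N"] >= 0.65   # cutting intensity <= 35%
    )

before = {"Mc": 0.30, "S": 0.40, "U": 0.52, "CI": 1.20, "W": 0.55,
          "D": 12, "T": 6, "Cd": 0.85, "N": 200}
after = {"Mc": 0.35, "S": 0.45, "U": 0.50, "CI": 1.00, "W": 0.52,
         "D": 12, "T": 6, "Cd": 0.78, "N": 150}
print(satisfies_constraints(before, after))  # True
```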

2.5.2. Model Construction

Stand structure optimization is a multi-objective optimization problem. The sub-objectives are interrelated and constrained by each other, making it challenging to achieve optimal solutions for all sub-objectives simultaneously. Therefore, it is necessary to approach the problem holistically, planning and integrating multiple sub-objectives to achieve the overall optimization of the stand structure.
We selected five spatial structure indexes: uniform angle index, neighborhood comparison, complete mingling, stratification index, and crown competition index. Using a "multiplicative and divisive" approach, these spatial structure indexes are integrated into multi-objective planning to establish the objective function for the optimization model for cutting of stand structure:
$$\max L=\frac{1}{N}\sum_{i=1}^{N}\frac{\dfrac{1+Mc_i}{\delta_{Mc}}\cdot\dfrac{1+S_i}{\delta_S}}{\dfrac{1+U_i}{\delta_U}\cdot\dfrac{1+CI_i}{\delta_{CI}}\cdot\dfrac{1+\left|W_i-0.496\right|}{\delta_{\left|W-0.496\right|}}}$$
$W_i$, $Mc_i$, $S_i$, $U_i$, and $CI_i$ represent the uniform angle index, complete mingling, stratification index, neighborhood comparison, and crown competition index of the reference tree, respectively. $\delta_W$, $\delta_{Mc}$, $\delta_S$, $\delta_U$, and $\delta_{CI}$ are the standard deviations of these structural indexes. $N$ represents the overall tree count in the forest stand. After stand optimization, the current quality of spatial structure should not be reduced; the complete mingling and stratification index should increase; neighborhood comparison should decrease; and the crown competition index, indicating competition pressure among trees, should be reduced. The mean uniform angle index of an ideal stand should be within the range of [0.475, 0.517]. Taking the midpoint of this range, 0.496, the smaller the value of $|\overline{W}-0.496|$, the closer the stand’s horizontal distribution pattern is to a random distribution.
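As a sketch, the objective function can be evaluated from per-tree index lists; the index values below are illustrative, and the standard deviations are computed over the trees in the stand as in the formula:

```python
import statistics

# Objective function L: per-tree spatial indexes enter the numerator
# (mingling, stratification) and denominator (comparison, competition,
# deviation of the uniform angle index from 0.496).
def objective(mc, s, u, ci, w):
    n = len(mc)
    d_mc, d_s = statistics.pstdev(mc), statistics.pstdev(s)
    d_u, d_ci = statistics.pstdev(u), statistics.pstdev(ci)
    w_dev = [abs(x - 0.496) for x in w]
    d_w = statistics.pstdev(w_dev)
    total = 0.0
    for i in range(n):
        numer = (1 + mc[i]) / d_mc * (1 + s[i]) / d_s
        denom = (1 + u[i]) / d_u * (1 + ci[i]) / d_ci * (1 + w_dev[i]) / d_w
        total += numer / denom
    return total / n

# Three illustrative trees:
L = objective(
    mc=[0.2, 0.5, 0.4], s=[0.3, 0.6, 0.5],
    u=[0.5, 0.25, 0.75], ci=[1.0, 0.8, 1.4], w=[0.5, 0.25, 0.75],
)
print(L > 0)  # True
```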

2.5.3. Felling Decisions

(1) Random selection [43]
Under the constraint of cutting intensity, trees in the initial stand are randomly selected to form the set of trees for cutting (Figure 2).
(2) Tree homogeneity index [44]
Based on the five selected spatial structure indexes, the tree homogeneity index $L_i$ is calculated for each reference tree (trees in the buffer zone serve only as neighbors). The trees are then sorted in ascending order of $L_i$, forming the set of trees for cutting (Figure 3).

$$L_i=\frac{\dfrac{1+Mc_i}{\delta_{Mc}}\cdot\dfrac{1+S_i}{\delta_S}}{\dfrac{1+U_i}{\delta_U}\cdot\dfrac{1+CI_i}{\delta_{CI}}\cdot\dfrac{1+W_i}{\delta_W}}$$

$Mc_i$, $W_i$, $S_i$, $CI_i$, and $U_i$ denote the complete mingling, uniform angle index, stratification index, crown competition index, and neighborhood comparison of the reference tree, respectively. $\delta_W$, $\delta_{Mc}$, $\delta_S$, $\delta_{CI}$, and $\delta_U$ are the standard deviations of these structural indexes.
(3) Spatial competition [45]
The ideal range for the mean uniform angle index in a stand is [0.475, 0.517]. The smaller the value of | W ¯ − 0.496|, the closer the horizontal distribution pattern of the stand is to a random distribution. Trees with a uniform angle index significantly different from 0.496 are selected for cutting. Additionally, trees with a high neighborhood comparison and crown competition index are also designated for cutting to reduce competition pressure among trees, thus forming the set of trees for cutting (Figure 4).

2.6. Deep Reinforcement Learning Solution Algorithm

Deep reinforcement learning combines reinforcement learning with deep learning; it not only possesses the dynamic decision-making capabilities of reinforcement learning but also benefits from the rapid computation abilities of deep learning. This combination can overcome the computational difficulties and local optima issues associated with traditional heuristic algorithms while offering superior computational power and generalization capabilities compared to reinforcement learning.

2.6.1. Deep Reinforcement Learning Optimization Algorithms

We adopt the Deep Q Network (DQN) from deep reinforcement learning as the solution algorithm. DQN introduces a target network and an experience replay mechanism to mitigate gradient explosion and enhance training efficiency [46,47]. The agent interacts with the environment by observing states, selecting actions, and receiving rewards, forming the quadruple $(s, a, r, s')$, which is stored in the replay buffer. The experience replay mechanism improves training efficiency by decorrelating samples and allowing multiple updates from the same data. Both the main network $Q(s, a; \theta)$ and the target network $Q(s', a'; \theta')$ are composed of three fully connected layers. This structure is chosen for its simplicity, efficiency, and capability to approximate complex nonlinear functions, which makes it well suited for mapping state–action pairs to Q-values. Randomly sampled experiences are used to train the main network $Q(s, a; \theta)$, which predicts the current Q-values, while the target network $Q(s', a'; \theta')$ provides stable target Q-values for reference. The parameters $\theta$ of the main network are updated at every training step, while the parameters $\theta'$ of the target network are updated periodically by copying from the main network. This ensures the stability of the target Q-values and the training process. The agent employs the ϵ-greedy strategy to select actions, choosing the action with the highest Q-value output by the main network with a probability of 1 − ϵ, and exploring random actions with a probability of ϵ. Through continuous interaction, learning, and optimization, the agent gradually approaches the optimal policy, achieving efficient decision-making in complex environments (Figure 5).
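The ϵ-greedy selection step can be sketched in a few lines; the Q-values are illustrative, and the convention follows the text (greedy with probability 1 − ϵ):

```python
import random

# Epsilon-greedy action selection: with probability epsilon explore a
# random action, otherwise exploit the highest-Q action from the main
# network (Q-values here are illustrative placeholders).
def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                          # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit

q = [0.1, 0.7, 0.3]
print(epsilon_greedy(q, epsilon=0.0))  # always exploits -> action 1
```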
The target Q-value is calculated using the following formula:
$$Q_{target}=r+\gamma \max_{a'} Q\left(s',a';\theta'\right)$$
$r$ represents the reward obtained by the agent after taking an action. $\gamma$ represents the discount factor, indicating the agent’s emphasis on future rewards. $s'$ and $a'$ represent the next state and next action, respectively, and $\theta'$ denotes the parameters of the target network.
The loss function is calculated using the following formula:
$$L=E\left[\left(Q_{target}-Q(s,a;\theta)\right)^2\right]$$
$a$ and $s$ denote the action and state, respectively, and $\theta$ denotes the parameters of the main network.
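The target and loss computations for a single transition can be sketched without a deep learning framework; the reward and Q-values are illustrative, and the `done` flag (no bootstrapping at episode end) is a standard addition not spelled out in the text:

```python
# DQN target and squared-error loss for one transition; gamma matches
# the paper's setting of 0.9.
def td_target(reward, gamma, next_q_values, done=False):
    if done:
        return reward                       # no bootstrap at episode end
    return reward + gamma * max(next_q_values)

def squared_loss(target, predicted_q):
    return (target - predicted_q) ** 2

target = td_target(reward=1.0, gamma=0.9, next_q_values=[0.2, 0.5])
print(round(target, 4))                     # 1.45
print(round(squared_loss(target, 1.0), 4))  # 0.2025
```

In training, this squared error would be averaged over a sampled batch of 32 transitions and minimized by gradient descent on the main network's parameters.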

2.6.2. Algorithms for Solving the Model

We transform the behavior of selecting trees for cutting in the stand structure optimization into the agent’s action selection in deep reinforcement learning. While interacting with the environment, the agent determines whether to execute the cutting action by balancing exploration and exploitation. The trees for cutting are selected using a felling decision, and the objective function value of the stand after cutting is simulated and calculated to assess the optimization quality. Consequently, the agent assigns an appropriate reward to the action, which influences subsequent action choices. The agent persistently engages with the environment to accumulate experience, saving it in a buffer. Data batches are drawn from the buffer to adjust the neural network parameters, allowing the agent to respond more efficiently to changes in the stand structure and make more adaptive decisions. Gradually, the agent identifies the set of trees for cutting corresponding to the maximum reward value, thereby enhancing the optimization effect and the efficiency of the stand structure.
The DQN solution for the stand structure optimization is illustrated in Figure 6. The pseudocode for implementing DQN in the stand structure optimization is provided in Appendix A.

3. Results

To verify the practical application effectiveness of the deep reinforcement learning solution algorithm in multi-objective stand structure optimization and to measure the influence of the felling decision on model performance, we conducted simulated cutting experiments on real stands in five standard plots with different densities and site conditions. Three different felling decisions and two different solution algorithms, DQN and Q-Learning, were employed, resulting in six stand structure optimization schemes. The optimization effects of these different schemes in solving the stand structure optimization problem are compared (Table 2).

3.1. Parameter Configuration

The parameter configurations for the solution algorithms used in the experiment are presented in Table 3. The initial iteration count and the maximum iteration count for the solution algorithms are set to 0 and 10,000, respectively. Throughout the optimization process, the stand structure’s state at various periods is abstracted as a sequence. At the start of each iteration, the agent begins from the initial state (state = 0). The agent engages with the environment to decide whether to move forward (state = state + 1) or backward (state = state − 1). An iteration ends when the agent reaches the final state (state_max = 100).
To compare the performance of DQN and Q-Learning, the same parameters were used for both algorithms (γ = 0.9, lr = 0.01, ϵ = 0.9). Additionally, the experience replay buffer capacity for DQN was set to 10,000, the batch size for data sampling was set to 32, and a three-layer fully connected network with a hidden layer size of 24 was used. These parameter settings were derived from the optimal results after multiple experimental adjustments.

3.2. Results of Simulated Cutting Optimization

As shown in Figure 7, after cutting adjustments based on different optimization schemes, the quality of stand structure indices in each plot improved to varying degrees while meeting the constraint conditions, effectively enhancing the stand structure in each plot. The mean uniform angle index of the stands slightly narrowed the gap to 0.496, bringing the overall horizontal distribution of the plots closer to a random pattern. The complete mingling of the stands improved to varying extents across all plots, with plot P4 showing the most significant increase, reaching a maximum of 83.61%. This substantial improvement is likely due to the initial complete mingling of P4 being the lowest at only 0.00276, indicating an extremely low mingling state close to zero mingling. Consequently, the optimization resulted in a noticeable enhancement. The crown competition index of each plot significantly decreased, indicating that the competition among trees was alleviated to some extent after optimization. Additionally, the stratification index of each plot improved, suggesting that the vertical structure of the forest layers became richer, enhancing the vertical distribution pattern. The decrease in neighborhood comparison across all plots was minimal, which is likely related to the fact that the average neighborhood comparison in each plot was already in an intermediate competition level, limiting the potential for further improvement.

3.3. Algorithm Performance

We applied six stand structure optimization schemes to model the cutting optimization on five plots. The optimal objective function values obtained for each scheme and plot are shown in Table 4. Figure 8 depicts the convergence status of each optimization scheme when addressing the stand structure optimization problem across different plots.
As shown in Table 4, the objective function values for each plot improved to varying degrees after optimization. The values for plots P1 to P5 increased from 0.2950, 0.2954, 0.3445, 0.3010, and 0.3168 to a maximum of 0.3815, 0.3701, 0.4301, 0.4599, and 0.3689, respectively, with the highest increase reaching 29.73%. In the simulation optimization experiments for these five plots, under the same plot and felling decision, the objective function values obtained using the DQN algorithm were higher than those obtained using the Q-Learning algorithm. This demonstrates the feasibility of deep reinforcement learning in handling stand structure optimization problems and highlights its superiority over reinforcement learning in terms of optimization effectiveness.
As shown in Figure 8, the number of iterations required for convergence varies among the plots. This variation is related to the distinct stand structure conditions and site characteristics of each plot. In most plots, the deep reinforcement learning algorithm converged faster than the reinforcement learning algorithm under the different optimization schemes. Notably, within the same algorithm, plot P4 exhibited the greatest improvement, while plot P5 showed the least. This is partly because plot P4 has the highest canopy density (0.8941) and plot P5 the lowest (0.7529); under the constraint conditions, more trees could be cut in plot P4 than in plot P5, resulting in a more pronounced improvement for plot P4.

3.4. Influence of Felling Decisions

As shown in Figure 9, the DQN algorithm combined with the random selection felling decision (scheme A2) produces the most significant optimization effect. It exhibits the highest improvement in objective function values across all plots, significantly outperforming the Q-Learning algorithm combined with any of the felling decisions in the same plots. Additionally, A2 generally converges faster than the Q-Learning algorithm combined with random selection (scheme A1). This demonstrates the strong compatibility of deep reinforcement learning and random selection in stand structure optimization, offering advantages in both optimization effectiveness and speed.
The optimization effects of the DQN algorithm combined with the schemes based on the tree homogeneity index and spatial competition (A4 and A6) are 20.40% and 21.22%, respectively. These are significantly better than the improvement rates of 16.47% and 18.22% achieved by combining the Q-Learning algorithm with the same decisions (A3 and A5). This indicates that the DQN algorithm has stronger optimization capability and higher efficiency under the same felling decisions. Within the same algorithm, the optimization effects of A3 and A4 versus A5 and A6 vary across plots, but overall the schemes based on the tree homogeneity index slightly outperform those based on spatial competition. This suggests that the tree homogeneity index, as a more holistic evaluation metric, can guide the optimization process more accurately. In contrast, the schemes based on spatial competition, while accounting for competitive relationships among trees, are less comprehensive and thus show slightly inferior optimization effects. However, the spatial competition-based schemes may perform better in specific plots, indicating that the most suitable optimization scheme should be chosen according to the specific circumstances in practical applications.

4. Discussion

Optimizing forest stand structure is a critical challenge in sustainable forest management. The main issues involve scientifically constructing quantitative models of stand structure and designing appropriate solution algorithms. The quantitative model of forest stand structure, through the design of non-spatial and spatial structural constraints, provides essential support for stand structure optimization: as the objective function value of the model increases, the overall stand structure becomes more optimal. Traditional methods and common intelligent optimization algorithms often struggle to achieve good results efficiently and accurately because of their large computational requirements and long solving times. With the swift advancement of artificial intelligence technology, however, reinforcement learning has become a valuable tool in forest management, owing to its automated decision-making and ability to adapt dynamically [48]. Previous studies have introduced reinforcement learning to forest stand structure optimization and demonstrated its superiority over traditional intelligent optimization algorithms [26,49,50,51]. Nevertheless, reinforcement learning has shortcomings, such as instability in solving and weak generalization capability [52,53]. Deep reinforcement learning, which integrates deep learning, excels at handling high-dimensional and complex decision problems. It leverages neural networks to automatically extract complex patterns from large datasets, thereby optimizing decision strategies and enhancing multi-objective optimization. Therefore, this study applies deep reinforcement learning to optimize forest stand structure, using reinforcement learning, which has been shown to outperform traditional intelligent optimization algorithms, as the baseline for comparison. We conducted simulated cutting optimization experiments using survey data from five circular plots in Pinus yunnanensis secondary forests in Southwest China.
The results showed that deep reinforcement learning significantly outperformed reinforcement learning in terms of optimization efficiency and accuracy. This research provides valuable insights and methodologies for managing and optimizing Pinus yunnanensis secondary forest stands in southwest China. Furthermore, due to its robust generalization abilities, deep reinforcement learning can provide important insights for managing forests in different regions or with various tree species.
After the simulated cutting optimization experiments with the different schemes, the forest stand structure in each plot was significantly improved. The horizontal distribution pattern of the trees shifted closer to randomness, the degree of species mingling increased markedly, competition pressure within the stands was reduced, and the vertical distribution pattern of the trees became more varied. Compared to reinforcement learning, the deep reinforcement learning algorithm achieved a more significant improvement in the objective function value. Combined with the different felling decisions, the improvement rates for the deep reinforcement learning optimization schemes were 29.73%, 20.40%, and 21.22%, respectively, significantly higher than the corresponding reinforcement learning rates of 20.62%, 16.47%, and 18.22%. This not only demonstrates the significant optimization advantage of deep reinforcement learning algorithms but also reflects their superiority in handling complex decision-making problems [54,55,56].
Additionally, the deep reinforcement learning algorithm exhibited a faster convergence rate during the optimization process. From the optimization results of each plot, it is evident that the maximum objective function values achieved by the deep reinforcement learning schemes (0.3815, 0.3701, 0.4301, 0.4599, 0.3689) were higher than those achieved by the reinforcement learning schemes (0.3394, 0.3579, 0.3986, 0.4321, 0.3556). In contrast to reinforcement learning, deep reinforcement learning leverages neural networks to extract valuable features from extensive datasets. This automatic feature extraction not only reduces errors but also improves the model's capacity to represent the data, allowing the algorithm to identify the optimal solution more rapidly when dealing with the complex decision-making challenges of stand structure optimization [57,58]. However, because neural networks must handle complex environments and large-scale state spaces, deep reinforcement learning has higher time and space complexity than reinforcement learning. Therefore, minimizing time and space complexity while preserving its strong performance on multi-objective optimization problems will be a crucial focus for future research.
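The network configuration listed in Table 3 (a three-layer fully connected network with hidden size 24 and ReLU activations) corresponds to a small Q-network. A minimal NumPy sketch of its forward pass is given below; the state encoding, the He-style weight initialization, and the omitted training step (Adam on an MSE loss in the paper) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_qnet(n_state, n_actions, hidden=24):
    """Three fully connected layers (He-initialized weights, zero biases)."""
    def layer(n_in, n_out):
        return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)
    return [layer(n_state, hidden), layer(hidden, hidden), layer(hidden, n_actions)]

def q_forward(params, state):
    """Map a stand-state feature vector to one Q-value per candidate action."""
    h = np.asarray(state, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)     # ReLU hidden layers
    W, b = params[-1]
    return h @ W + b                       # linear output layer
```

The agent then picks the action with the highest Q-value (or a random one under the ϵ-greedy strategy).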
In forest stand structure optimization, the felling decision is a critical factor influencing optimization efficiency. Among the felling decisions tested, the combination of random selection with deep reinforcement learning showed the most significant optimization effect, with the greatest increase in objective function values. Compared to the other two felling decisions, random selection provided more sample data and a wider range of candidate trees, allowing deep reinforcement learning to learn the optimization characteristics of the stand structure more comprehensively and thus achieve better results [59,60]. Similarly, the compatibility of reinforcement learning with random selection was also the best, consistent with the conclusions of Xuan et al. [26]. The combination of deep reinforcement learning with the felling decisions based on the tree homogeneity index and spatial competition was also superior to the corresponding reinforcement learning combinations, but the improvement was smaller than with random selection. This indicates that while deep reinforcement learning is compatible with these two felling decisions, further work on parameter tuning and model optimization is needed to realize its full potential. Additionally, exploring more felling decisions within the deep reinforcement learning framework could further enhance the effectiveness of stand structure optimization.
Although this study did not directly use traditional intelligent optimization algorithms for simulation experiments, Xuan et al. [26] compared reinforcement learning with traditional intelligent optimization algorithms such as Monte Carlo and Particle Swarm Optimization and verified that reinforcement learning achieved better optimization results. Our study demonstrates that deep reinforcement learning, which outperforms standard reinforcement learning, shows higher computational efficiency and stronger adaptability. Compared to traditional intelligent optimization algorithms, deep reinforcement learning is more effective and reliable in handling complex optimization problems such as forest stand structure optimization.

5. Conclusions

This study introduces deep reinforcement learning to optimize stand structure by establishing an objective function based on the uniform angle index, complete mingling, stratification index, neighborhood comparison, and crown competition index. Three felling decisions were each combined with reinforcement learning and deep reinforcement learning for experimental comparison, verifying the superiority of deep reinforcement learning in stand structure optimization. Deep reinforcement learning casts the selection of trees for cutting as the actions of an agent trained through continuous trial and error, integrating neural networks for rapid computation. Furthermore, the experience replay mechanism improves the stability and effectiveness of the optimization process. This approach offers better performance and stronger generalization than traditional reinforcement learning. The study provides new solutions and technical support for multi-objective optimization of stand structure, with broad application prospects. The felling decision significantly influences the optimization effect, with random selection combined with deep reinforcement learning (A2) performing best among all optimization schemes. Therefore, choosing an appropriate felling decision is important for achieving optimal results in stand structure optimization.
In future research, the following aspects need to be improved as deep reinforcement learning algorithms are adopted more extensively: (1) We employed only the basic DQN algorithm. Further research is needed to determine whether more advanced algorithms and their improvements are more effective for multi-objective stand structure optimization. (2) Besides cutting, replanting is a significant measure in stand structure optimization. Future research will focus on coordinating cutting and replanting using multiple agents; multi-agent deep reinforcement learning will be a key area for optimizing stand structure. (3) Stand structure optimization is a continuous process that requires several adjustments to progressively approach the ideal state. While optimizing the current stand structure, the future stand structure must also be considered: by using stand structure prediction models to forecast future stand factors and structural indices, the stand can be gradually steered toward the ideal structure. (4) For experimental comparison, plots of different sizes were uniformly assigned 2 m buffer zones. Whether this buffer size is reasonable, and whether different plot sizes should use different buffer sizes, requires further research and experimentation. (5) GIS can provide precise spatial data analysis and visualization capabilities, aiding the effective planning and implementation of cutting and replanting activities; applying GIS to stand structure optimization will therefore also be part of future research.

Author Contributions

Conceptualization, J.Z. and J.W.; methodology, J.Z. and J.W.; software, Y.C.; validation, J.W., Y.C. and B.W.; formal analysis, J.Z.; investigation, J.Z.; resources, J.W.; data curation, J.Z., J.W., and J.Y.; writing—original draft preparation, J.Z.; writing—review and editing, J.W. and B.W.; visualization, J.W., J.Y., Y.C. and B.W.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 32460389), the Yunnan Fundamental Research Projects (grant number 202201AT070006), and the Yunnan Postdoctoral Research Fund Projects (grant number ynbh20057).

Data Availability Statement

The data supporting the findings of this study are available within the article. Additional information can be obtained by contacting the corresponding author.

Acknowledgments

The authors sincerely thank all the members of the research group for their invaluable assistance in data collection. Your dedication and hard work were essential to the successful completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The following is the pseudocode for applying DQN to forest stand structure optimization. Here, S represents states, A represents actions, ϵ represents the exploration rate for the ϵ-greedy strategy, α represents the learning rate, γ represents the discount factor, MAX_EPISODES represents the maximum number of episodes, REPLAY represents the replay buffer, and MAIN_NET and TARGET_NET represent the main network and the target network, respectively, which are used to predict Q-values and stabilize training. DQN allows the agent to gradually learn the optimal cutting approach through multiple iterations and Q-value updates, enabling it to identify an appropriate set of trees for cutting and thereby achieve forest stand structure optimization.
Algorithm A1: DQN Algorithm for Stand Structure Optimization
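Since Algorithm A1 is reproduced as an image in the published version, the following is an illustrative Python sketch of the same loop: ϵ-greedy action selection, experience replay, and a periodically synchronized target network. The toy environment, reward values, and the linear Q-function standing in for the paper's neural network are placeholders, not the authors' implementation.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Experience replay: store transitions, sample random mini-batches."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

class LinearQNet:
    """Tiny linear Q-function standing in for the paper's neural network."""
    def __init__(self, n_features, n_actions, lr=0.01):
        self.W = np.zeros((n_features, n_actions))
        self.lr = lr

    def q_values(self, s):
        return s @ self.W

    def update(self, s, a, target):
        # one gradient step on the squared TD error for action a
        td_error = target - self.q_values(s)[a]
        self.W[:, a] += self.lr * td_error * s

    def copy_from(self, other):
        self.W = other.W.copy()

class ToyStandEnv:
    """Placeholder environment: random stand features, random reward,
    two actions (cut / keep the current candidate tree)."""
    n_features, n_actions = 4, 2

    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return np.random.rand(self.n_features)

    def step(self, action):
        self.t += 1
        return np.random.rand(self.n_features), np.random.rand(), self.t >= self.horizon

def train(env, episodes=50, gamma=0.9, eps=0.9, eps_decay=0.9998,
          eps_min=0.01, batch_size=32, sync_every=20):
    main = LinearQNet(env.n_features, env.n_actions)
    target = LinearQNet(env.n_features, env.n_actions)
    target.copy_from(main)
    buffer, step = ReplayBuffer(), 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(env.n_actions)
            else:
                a = int(np.argmax(main.q_values(s)))
            s2, r, done = env.step(a)
            buffer.push(s, a, r, s2, done)
            # learn from a random mini-batch once the buffer is warm
            if len(buffer) >= batch_size:
                for bs, ba, br, bs2, bdone in buffer.sample(batch_size):
                    tgt = br if bdone else br + gamma * np.max(target.q_values(bs2))
                    main.update(bs, ba, tgt)
            eps = max(eps_min, eps * eps_decay)
            s, step = s2, step + 1
            if step % sync_every == 0:
                target.copy_from(main)   # periodic target-network sync
    return main, eps
```

The hyperparameter defaults (γ = 0.9, lr = 0.01, ϵ = 0.9 with decay 0.9998 and floor 0.01, buffer size 10000, batch size 32) follow Table 3; the target-network sync interval is an assumption.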

Appendix B

Canopy density is one of the key characteristics used to describe the state of forest ecosystems and is often an important metric for controlling cutting intensity. Among the various methods for calculating canopy density, the crown projection method, which takes the ratio of the total crown projection area to the total sample plot area, is one of the most accurate. We utilized the spatial analysis and computation capabilities of GIS to calculate canopy density with the following formula:

Cd = Sc / Sr

where Sc represents the total crown projection area and Sr represents the total area of the sample plot.
Algorithm A2: Canopy Density Calculation
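Since Algorithm A2 is reproduced as an image in the published version, here is a minimal sketch of the same ratio. It replaces the GIS overlay with Monte Carlo point sampling and approximates crown projections as circles; both are simplifying assumptions for illustration (overlapping crowns are counted once, as in the union of projections).

```python
import numpy as np

def canopy_density(crowns, plot_radius, n_samples=200_000, seed=0):
    """Estimate Cd = Sc / Sr for a circular plot centred at the origin.
    `crowns` is a list of (x, y, crown_radius) circles."""
    rng = np.random.default_rng(seed)
    # uniform sampling inside the circular plot (area Sr)
    r = plot_radius * np.sqrt(rng.random(n_samples))
    theta = 2.0 * np.pi * rng.random(n_samples)
    px, py = r * np.cos(theta), r * np.sin(theta)
    covered = np.zeros(n_samples, dtype=bool)
    for cx, cy, cr in crowns:
        covered |= (px - cx) ** 2 + (py - cy) ** 2 <= cr ** 2
    # fraction of the plot covered by at least one crown = Sc / Sr
    return float(covered.mean())
```

For example, a single crown of radius 5 m centred in a plot of radius 10 m covers one quarter of the plot area, so the estimate should be close to 0.25.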

Appendix C

Figure A1 shows the time required for each optimization scheme to reach convergence across the different plots. Analyzing the runtime and convergence performance of the schemes shows that, compared to Q-Learning, the DQN algorithm generally achieves higher objective function values but in most cases requires a longer runtime (as in plot P2 of Figure A1). This is likely because the neural networks introduced by DQN make the method more capable on complex problems, yielding higher objective function values at the cost of increased runtime. Nevertheless, in some cases the DQN algorithm achieves higher objective function values than Q-Learning while also requiring a shorter runtime (as in scheme A2 of plot P1 in Figure A1). This indicates that with proper adjustment and configuration of the neural network, DQN has the potential to outperform Q-Learning in both convergence performance and runtime efficiency.
Figure A1. Optimization effect of each optimization scheme.

References

  1. Xu, Y.; Woeste, K.; Cai, N.; Kang, X.; Li, G.; Chen, S.; Duan, A. Variation in needle and cone traits in natural populations of Pinus yunnanensis. J. For. Res. 2016, 27, 41–49. [Google Scholar] [CrossRef]
  2. Wang, L.; Huang, X.; Su, J. Tree species diversity and stand attributes differently influence the ecosystem functions of Pinus yunnanensis secondary forests under the climate context. Sustainability 2022, 14, 8332. [Google Scholar] [CrossRef]
  3. Yang, G. Study on the Division of Forest Fire Danger Grade in Cangshan Mountain of Dali City. J. Fujian For. Sci. Technol. 2015, 42, 138–142. [Google Scholar]
  4. Ding, Y.; Zang, R. Effects of thinning on the demography and functional community structure of a secondary tropical lowland rain forest. J. Environ. Manag. 2021, 279, 111805. [Google Scholar] [CrossRef] [PubMed]
  5. Zaizhi, Z. Status and perspectives on secondary forests in tropical China. J. Trop. For. Sci. 2001, 13, 639–651. [Google Scholar]
  6. Huang, X.; Su, J.; Li, S.; Liu, W.; Lang, X. Functional diversity drives ecosystem multifunctionality in a Pinus yunnanensis natural secondary forest. Sci. Rep. 2019, 9, 6979. [Google Scholar] [CrossRef]
  7. Ali, A. Forest stand structure and functioning: Current knowledge and future challenges. Ecol. Indic. 2019, 98, 665–677. [Google Scholar] [CrossRef]
  8. Tang, C.Q.; Shen, L.Q.; Han, P.B.; Huang, D.S.; Li, S.; Li, Y.F.; Song, K.; Zhang, Z.Y.; Yin, L.Y.; Yin, R.H.; et al. Forest characteristics, population structure and growth trends of Pinus yunnanensis in Tianchi National Nature Reserve of Yunnan, southwestern China. Veg. Classif. Surv. 2020, 1, 7–20. [Google Scholar] [CrossRef]
  9. Von Gadow, K.; Hui, G. Characterizing forest spatial structure and diversity. In Sustainable Forestry in Temperate Regions; Björk, L., Ed.; SUFOR, University of Lund: Lund, Sweden, 2002; pp. 20–30. [Google Scholar]
  10. Liu, S. Research on the Analysis and Multi-Objective Intelligent Optimization of Stand Structure of Natural Secondary Forest. Ph.D. Thesis, Central South University of Forestry and Technology, Changsha, China, 2017. [Google Scholar]
  11. Dong, L.; Bettinger, P.; Liu, Z. Optimizing neighborhood-based stand spatial structure: Four cases of boreal forests. For. Ecol. Manag. 2022, 506, 119965. [Google Scholar] [CrossRef]
  12. Qing, D.; Peng, J.; Li, J.; Liu, S.; Deng, Q. Genetic algorithm to solve the optimization problem of stand spatial structure. J. For. Environ. 2022, 42, 434–441. [Google Scholar]
  13. Chi, P.; Zhu, K.; Li, J.; Ai, W.; Huang, J.; Qing, D. Dynamic Multi-Objective Optimization Model for Forest Spatial Structure with Environmental Detection Mechanism. In Proceedings of the 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Zhangjiajie, China, 10–12 August 2019; pp. 1635–1642. [Google Scholar]
  14. Hof, J.G.; Kent, B.M. Nonlinear programming approaches to multistand timber harvest scheduling. For. Sci. 1990, 36, 894–907. [Google Scholar] [CrossRef]
  15. Barahona, F.; Weintraub, A.; Epstein, R. Habitat dispersion in forest planning and the stable set problem. Oper. Res. 1992, 40, S14–S21. [Google Scholar] [CrossRef]
  16. Haight, R.G.; Travis, L.E. Wildlife conservation planning using stochastic optimization and importance sampling. For. Sci. 1997, 43, 129–139. [Google Scholar] [CrossRef]
  17. Boston, K.; Bettinger, P. An analysis of Monte Carlo integer programming, simulated annealing, and tabu search heuristics for solving spatial harvest scheduling problems. For. Sci. 1999, 45, 292–301. [Google Scholar] [CrossRef]
  18. Okasha, N.M.; Frangopol, D.M. Lifetime-oriented multi-objective optimization of structural maintenance considering system reliability, redundancy and life-cycle cost using GA. Struct. Saf. 2009, 31, 460–474. [Google Scholar] [CrossRef]
  19. Wang, J.; Wu, B.; Liang, Q. Forest Thinning Subcompartment Intelligent Selection Based on Genetic Algorithm. Sci. Silvae Sin. 2017, 53, 63–72. [Google Scholar]
  20. Fotakis, D.G.; Sidiropoulos, E.; Myronidis, D.; Ioannou, K. Spatial genetic algorithm for multi-objective forest planning. For. Policy Econ. 2012, 21, 12–19. [Google Scholar] [CrossRef]
  21. Qiu, H.; Zhang, H.; Lei, K.; Hu, X.; Yang, T.; Jiang, X. A New Tree-Level Multi-Objective Forest Harvest Model (MO-PSO): Integrating Neighborhood Indices and PSO Algorithm to Improve the Optimization Effect of Spatial Structure. Forests 2023, 14, 441. [Google Scholar] [CrossRef]
  22. Li, J.; Zhang, H.; Liu, S.; Kuang, Z.; Wang, C.; Zang, H.; Cao, X. A space optimization model of water resource conservation forest in Dongting Lake based on improved PSO. Acta Ecol. Sin. 2013, 33, 4031–4040. [Google Scholar]
  23. Wiering, M.A.; Van Otterlo, M. Reinforcement learning. Adapt. Learn. Optim. 2012, 12, 729. [Google Scholar]
  24. Tahvonen, O.; Suominen, A.; Malo, P.; Viitasaari, L.; Parkatti, V.P. Optimizing high-dimensional stochastic forestry via reinforcement learning. J. Econ. Dyn. Control 2022, 145, 104553. [Google Scholar] [CrossRef]
  25. Zou, F.; Yen, G.G.; Tang, L.; Wang, C. A reinforcement learning approach for dynamic multi-objective optimization. Inf. Sci. 2021, 546, 815–834. [Google Scholar] [CrossRef]
  26. Xuan, S.; Wang, J.; Chen, Y. Reinforcement Learning for Stand Structure Optimization of Pinus yunnanensis Secondary Forests in Southwest China. Forests 2023, 14, 2456. [Google Scholar] [CrossRef]
  27. Li, K.; Zhang, T.; Wang, R. Deep Reinforcement Learning for Multiobjective Optimization. IEEE Trans. Cybern. 2021, 51, 3103–3114. [Google Scholar] [CrossRef] [PubMed]
  28. Waschneck, B.; Reichstaller, A.; Belzner, L.; Altenmüller, T.; Bauernhansl, T.; Knapp, A.; Kyek, A. Optimization of global production scheduling with deep reinforcement learning. Procedia CIRP 2018, 72, 1264–1269. [Google Scholar] [CrossRef]
  29. Oroojlooyjadid, A.; Nazari, M.; Snyder, L.V.; Takáč, M. A deep q-network for the beer game: Deep reinforcement learning for inventory optimization. Manuf. Serv. Oper. Manag. 2022, 24, 285–304. [Google Scholar] [CrossRef]
  30. Packalen, P.; Strunk, J.; Maltamo, M.; Myllymäki, M. Circular or square plots in ALS-based forest inventories—Does it matter? Forestry 2023, 96, 49–61. [Google Scholar] [CrossRef]
  31. Liu, H.; Dong, X.; Meng, Y.; Gao, T.; Mao, L.; Gao, R. A novel model to evaluate spatial structure in thinned conifer-broadleaved mixed natural forests. J. For. Res. 2023, 34, 1881–1898. [Google Scholar] [CrossRef]
  32. Gadow, K. Beziehungen zwischen Winkelmaß und Baumabständen. Forstwiss. Centralbl. 2003, 122, 127–137. [Google Scholar]
  33. Han, M.; Li, L.; Zheng, W.; Su, J.; Li, W.; Gong, J.; Zheng, S. Effects of different intensity of thinning on the improvement of middle-aged Yunnan pine stand. J. Cent. South Univ. For. Technol. 2011, 31, 27–33. [Google Scholar]
  34. Su, J.; Li, L.; Zheng, W.; Yang, W.; Han, M.; Huang, Z.; Xu, P.; Feng, Z. Effect of Intermediate Cutting Intensity on Growth of Pinus yunnanensis Plantation. West. For. Sci. 2010, 39, 27–32. [Google Scholar]
  35. Aguirre, O.; Hui, G.; von Gadow, K.; Jiménez, J. An analysis of spatial forest structure using neighbourhood-based variables. For. Ecol. Manag. 2003, 183, 137–145. [Google Scholar] [CrossRef]
  36. Wang, J.M. Study on Decision Technology of Tending Felling for Larix principis-Rupprechtii Plantation Forest. Ph.D. Thesis, Beijing Forestry University, Beijing, China, 2017. [Google Scholar]
  37. Bao, Q.; Liang, X.; Weng, G. The concept of tree crown overlap and its calculating methods of area. J. Northeast. For. Univ. 1995, 23, 103–109. [Google Scholar]
  38. Zhou, C.; Liu, D.; Chen, K.; Hu, X.; Lei, X.; Feng, L.; Zhang, Y.; Zhang, H. Spatial structure dynamics and maintenance of a natural mixed forest. Forests 2022, 13, 888. [Google Scholar] [CrossRef]
  39. Lei, X.; Zhu, G.; Lu, J. Top Height Estimation for Mixed Spruce-fir-deciduous Over-logged Forests. For. Res. 2018, 31, 36–41. [Google Scholar]
  40. Sharma, M.; Amateis, R.L.; Burkhart, H.E. Top height definition and its effect on site index determination in thinned and unthinned loblolly pine plantations. For. Ecol. Manag. 2002, 168, 163–175. [Google Scholar] [CrossRef]
  41. Sheng, Q.; Dong, L.; Chen, Y.; Liu, Z. Selection of the Optimal Timber Harvest Based on Optimizing Stand Spatial Structure of Broadleaf Mixed Forests. Forests 2023, 14, 2046. [Google Scholar] [CrossRef]
  42. Zhang, G.; Hui, G.; Zhao, Z.; Hu, Y.; Wang, H.; Liu, W.; Zang, R. Composition of basal area in natural forests based on the uniform angle index. Ecol. Inform. 2018, 45, 1–8. [Google Scholar] [CrossRef]
  43. Tang, M.; Tang, S.; Lei, X.; Li, X. Study on spatial structure optimizing model of stand selection cutting. Sci. Silvae Sin. 2004, 40, 25–31. [Google Scholar]
  44. Yu, Y.T. Study on Forest Structure of Different Recovery Stages and Optimization Models of Natural Mixed Spruce-Fir Secondary Forests on Selective Cutting. Ph.D. Thesis, Beijing Forestry University, Beijing, China, 2019. [Google Scholar]
  45. Zhang, G.; Hui, G.; Zhang, G.; Zhao, Z.; Hu, Y. Telescope method for characterizing the spatial structure of a pine-oak mixed forest in the Xiaolong Mountains, China. Scand. J. For. Res. 2019, 34, 751–762. [Google Scholar] [CrossRef]
  46. Fan, J.; Wang, Z.; Xie, Y.; Yang, Z. A theoretical analysis of deep Q-learning. In Proceedings of the Learning for Dynamics and Control (PMLR), Virtual, 10–11 June 2020; pp. 486–489. [Google Scholar]
  47. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  48. Malo, P.; Tahvonen, O.; Suominen, A.; Back, P.; Viitasaari, L. Reinforcement learning in optimizing forest management. Can. J. For. Res. 2021, 51, 1393–1409. [Google Scholar] [CrossRef]
  49. Chen, Y.T.; Chang, C.T. Multi-coefficient goal programming in thinning schedules to increase carbon sequestration and improve forest structure. Ann. For. Sci. 2014, 71, 907–915. [Google Scholar] [CrossRef]
  50. Zhou, Z.; Yang, D.; Liu, H. Improved Sparrow Search Algorithm in Optimizing Spatial Structure of Forest Stands. J. Northeast. For. Univ. 2023, 51, 68–73. [Google Scholar]
  51. Qing, D.; Zhang, X.; Li, J.; Guo, R.; Deng, Q. Spatial Structure Optimization of Natural Forest Based on Bee Colony-particle Swarm Algorithm. J. Syst. Simul. 2020, 32, 371. [Google Scholar]
  52. Wang, C.; Zhang, X.; Zou, Z.; Wang, S. On Path Planning of Unmanned Ship Based on Q-Learning. Ship Ocean Eng. 2018, 47, 168–171. [Google Scholar]
  53. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  54. Joo, H.; Lim, Y. Traffic signal time optimization based on deep Q-network. Appl. Sci. 2021, 11, 9850. [Google Scholar] [CrossRef]
  55. Liang, Z.; Yang, R.; Wang, J.; Liu, L.; Ma, X.; Zhu, Z. Dynamic constrained evolutionary optimization based on deep Q-network. Expert Syst. Appl. 2024, 249, 123592. [Google Scholar] [CrossRef]
  56. Yang, Y.; Juntao, L.; Lingling, P. Multi-robot path planning based on a deep reinforcement learning DQN algorithm. CAAI Trans. Intell. Technol. 2020, 5, 177–183. [Google Scholar] [CrossRef]
  57. Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274. [Google Scholar]
  58. Dulac-Arnold, G.; Evans, R.; van Hasselt, H.; Sunehag, P.; Lillicrap, T.; Hunt, J.; Mann, T.; Weber, T.; Degris, T.; Coppin, B. Deep reinforcement learning in large discrete action spaces. arXiv 2015, arXiv:1512.07679. [Google Scholar]
  59. Shresthamali, S.; Kondo, M.; Nakamura, H. Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning. J. Low Power Electron. Appl. 2022, 12, 53. [Google Scholar] [CrossRef]
  60. Hu, T.; Luo, B.; Yang, C. Multi-objective optimization for autonomous driving strategy based on Deep Q Network. Discov. Artif. Intell. 2021, 1, 11. [Google Scholar] [CrossRef]
Figure 1. Location of the study area.
Figure 2. Diagram of the random selection process.
Figure 3. Diagram of tree homogeneity index process.
Figure 4. Diagram of the spatial competition process.
Figure 5. Model of DQN structure.
Figure 6. DQN algorithm for stand structure optimization.
Figure 7. Alterations in stand structure indices for different optimization scenarios in different plots. Note: U, W, CI, Cd, S and Mc represent the optimized values of neighborhood comparison, uniform angle index, crown competition index, canopy density, stratification index, and complete mingling, respectively.
Figure 8. Optimization effect of each optimization scheme.
Figure 9. Felling decision effect of each optimization scheme. Note: The six axes represent the six stand structure optimization schemes A1–A6, and the five line colors represent the optimized objective function values of the six optimization schemes in the five plots P1–P5.
Table 1. Basic information of the sample plots.

| Sample Plots | Altitude | Slope (°) | Slope Dir. | Mean DBH (cm) | Mean Height (m) | Sample Plot Radius (m) | Tree Species Composition | Stand Density (Trees·ha−1) |
| P1 | 2254 | 13.45 | E | 17.10 | 11.97 | 35 | 8PY2PA-BA-RM-TG | 1603 |
| P2 | 2273 | 16.16 | S | 13.79 | 9.39 | 32 | 7PY3PA | 2182 |
| P3 | 2205 | 17.70 | NE | 14.50 | 9.30 | 20 | 7PY3PA+BA-QA-VB-GG | 2109 |
| P4 | 2138 | 5.10 | NE | 14.26 | 10.94 | 19 | 10PY-QA | 2618 |
| P5 | 2253 | 15.25 | SE | 16.03 | 9.57 | 30 | 8PY1PA+QA-VB-BA-CS | 2631 |

Note: E, East; S, South; NE, North-East; SE, South-East; PY, Pinus yunnanensis; PA, Pinus armandii; BA, Betula alnoides; RM, Rhododendron microphyton; TG, Ternstroemia gymnanthera; QA, Quercus acutissima; VB, Vaccinium bracteatum; GG, Gaultheria griffithiana; CS, Camellia sinensis; DBH, diameter at breast height. The column "Tree Species Composition" was calculated by tree number.
Table 2. Six optimization schemes for forest stand structure.
| Felling Decision | Q-Learning | DQN |
|---|---|---|
| Random selection | A1 | A2 |
| Tree homogeneity index | A3 | A4 |
| Spatial competition | A5 | A6 |
Table 3. Optimizing algorithm parameter configuration.
| Algorithm | Setting | Meaning |
|---|---|---|
| DQN | W = 0 | Initial iteration |
| | W_max = 10000 | Upper limit of iterations |
| | state = 0 | Agent's initial location |
| | state_max = 100 | The agent's farthest permitted move distance |
| | layer = 3 | Neural network depth |
| | hidden_size = 24 | Size of the hidden layers in the three-layer fully connected network |
| | optimizer = Adam | Adam optimizer, which adaptively adjusts learning rates |
| | activation_fn = ReLU | ReLU activation function |
| | loss_fn = MSE | Mean squared error (MSE) loss function |
| | buffer_size = 10000 | Replay buffer capacity |
| | batch_size = 32 | Batch size sampled from the replay buffer |
| | γ = 0.9 | Discount factor |
| | lr = 0.01 | Learning rate |
| | ε = 0.9 | Initial exploration rate for the ε-greedy strategy |
| | ε_decay = 0.9998 | Exploration decay rate |
| | ε_min = 0.01 | Minimum exploration rate for the ε-greedy strategy |
| | a = 150, b = 10, c = 100, d = 1, e = 200, f = 300 | Reward and punishment values |
| Q-Learning | W = 0 | Initial iteration |
| | W_max = 10000 | Upper limit of iterations |
| | state = 0 | Agent's initial location |
| | state_max = 100 | The agent's farthest permitted move distance |
| | γ = 0.9 | Discount factor |
| | lr = 0.01 | Learning rate |
| | ε = 0.9 | Initial exploration rate for the ε-greedy strategy |
| | ε_decay = 0.9998 | Exploration decay rate |
| | ε_min = 0.01 | Minimum exploration rate for the ε-greedy strategy |
| | a = 150, b = 10, c = 100, d = 1, e = 200, f = 300 | Reward and punishment values |
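The ε-greedy exploration schedule shared by both algorithms in Table 3 (ε = 0.9, ε_decay = 0.9998, ε_min = 0.01, over W_max = 10000 iterations) can be sketched as follows. This is a minimal illustration of the schedule only; the Q-values passed to `select_action` are placeholders, not the paper's trained network.

```python
import random

# Exploration parameters taken from Table 3.
EPSILON, EPSILON_DECAY, EPSILON_MIN = 0.9, 0.9998, 0.01

def select_action(q_values, epsilon):
    """ε-greedy: explore with probability ε, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                       # explore
    return max(range(len(q_values)), key=q_values.__getitem__)       # exploit

epsilon = EPSILON
for _ in range(10000):  # W_max iterations
    _action = select_action([0.0, 0.5, 0.2], epsilon)  # placeholder Q-values
    epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)

# After 10,000 iterations, ε ≈ 0.9 × 0.9998^10000 ≈ 0.122, still above ε_min,
# so exploration decays smoothly but never fully stops within one run.
```

With this decay rate, the agent still explores about 12% of the time at the iteration limit, which is consistent with a schedule meant to keep some late-stage exploration rather than converge to the ε_min floor.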
Table 4. Objective function values before and after simulated optimization for each optimization scheme.
| Scheme | P1 | P2 | P3 | P4 | P5 | Scheme Average | Scheme Improvement Extent |
|---|---|---|---|---|---|---|---|
| Before optimizing | 0.2950 | 0.2954 | 0.3445 | 0.3010 | 0.3168 | 0.3105 | |
| A1 | 0.3392 | 0.3579 | 0.3986 | 0.4321 | 0.3412 | 0.3738 | 20.62% |
| A2 | 0.3813 | 0.3701 | 0.4301 | 0.4599 | 0.3689 | 0.4021 | 29.73% |
| A3 | 0.3250 | 0.3539 | 0.3860 | 0.4062 | 0.3339 | 0.3610 | 16.47% |
| A4 | 0.3356 | 0.3593 | 0.4022 | 0.4193 | 0.3503 | 0.3733 | 20.40% |
| A5 | 0.3394 | 0.3506 | 0.3833 | 0.4028 | 0.3556 | 0.3664 | 18.22% |
| A6 | 0.3469 | 0.3576 | 0.3895 | 0.4236 | 0.3600 | 0.3755 | 21.22% |
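As a check on how the last column of Table 4 is derived: the reported percentages match the mean of the per-plot relative improvements, not the ratio of the scheme averages (e.g., 0.3738/0.3105 − 1 would give 20.39%, not the tabulated 20.62%). A minimal sketch using the A1 and A2 rows:

```python
# Reproducing the "Scheme Improvement Extent" column of Table 4.
# Values are taken directly from the table; only A1 and A2 are shown.

before = [0.2950, 0.2954, 0.3445, 0.3010, 0.3168]  # P1-P5 before optimizing

after = {
    "A1": [0.3392, 0.3579, 0.3986, 0.4321, 0.3412],
    "A2": [0.3813, 0.3701, 0.4301, 0.4599, 0.3689],
}

def improvement_extent(before, after):
    """Mean of per-plot relative improvements across the five plots."""
    return sum((a - b) / b for a, b in zip(after, before)) / len(before)

for scheme, values in after.items():
    print(f"{scheme}: {improvement_extent(before, values):.2%}")
# A1: 20.62%, A2: 29.73% -- matching the tabulated values
```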
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, J.; Wang, J.; Yin, J.; Chen, Y.; Wu, B. Optimization of the Stand Structure in Secondary Forests of Pinus yunnanensis Based on Deep Reinforcement Learning. Forests 2024, 15, 2181. https://doi.org/10.3390/f15122181

