
CN117332693A - Slope stability evaluation method based on DDPG-PSO-BP algorithm - Google Patents

Slope stability evaluation method based on DDPG-PSO-BP algorithm Download PDF

Info

Publication number
CN117332693A
CN117332693A (application CN202311337010.XA)
Authority
CN
China
Prior art keywords
pso
algorithm
network
ddpg
particle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311337010.XA
Other languages
Chinese (zh)
Inventor
秦浩东
李文荣
杨跃光
张晓宸
张彬
王敩青
廖玉琴
毛强
张怿宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Southern Power Grid Corp Ultra High Voltage Transmission Co Electric Power Research Institute
Original Assignee
China Southern Power Grid Corp Ultra High Voltage Transmission Co Electric Power Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Southern Power Grid Corp Ultra High Voltage Transmission Co Electric Power Research Institute filed Critical China Southern Power Grid Corp Ultra High Voltage Transmission Co Electric Power Research Institute
Priority to CN202311337010.XA priority Critical patent/CN117332693A/en
Publication of CN117332693A publication Critical patent/CN117332693A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a slope stability evaluation method based on a DDPG-PSO-BP algorithm, comprising the following steps: selecting slope data samples; constructing a BP neural network with one or more hidden layers from the slope data samples; initializing the parameters of the particle swarm optimization (PSO) algorithm and the learning factors c_1 and c_2 of the particles; performing iterative PSO-BP training and updating the positions and velocities of all particles in the swarm; outputting the trained PSO-BP algorithm model and passing the prediction error, G_best and the maximum iteration number as state information into a deep reinforcement learning DDPG algorithm model, which outputs action information according to the state information and updates the learning factors c_1 and c_2; and returning the updated c_1 and c_2 to the PSO-BP algorithm model and using the finally trained DDPG-PSO-BP algorithm model to evaluate slope stability. By optimizing the learning factors of the particle swarm algorithm with the deep reinforcement learning DDPG algorithm, the invention improves the variation trend of the PSO-BP algorithm parameters during iteration, so that the trained model achieves higher prediction accuracy.

Description

Slope stability evaluation method based on DDPG-PSO-BP algorithm
Technical Field
The invention relates to the technical field of slope stability prediction, in particular to a slope stability evaluation method based on a DDPG-PSO-BP algorithm.
Background
Evaluation and prediction of slope stability are core tasks of slope engineering. Existing methods include the limit equilibrium method, the elastoplastic theory method and others; these methods have rigorous procedures and mature theory and are widely used in practical engineering. However, the factors influencing slope stability are numerous, the selection of rock and soil mass parameters involves a certain degree of subjectivity, and slope stability prediction exhibits large uncertainty and strong nonlinearity, so it cannot be expressed by an exact mathematical model or formula.
The BP neural network has simple initial parameter selection and strong self-adaptive and fault-tolerance capabilities for handling complex nonlinear relations, but it tends to fall into local minima and converges slowly; these problems can be alleviated by the particle swarm optimization algorithm.
In the particle swarm optimization algorithm, the learning factors are very important parameters and strongly influence the convergence process and results. When the learning factors are small, the particles move slowly through the search space and pay more attention to their individual optimal solutions, so convergence is slow; when the learning factors are large, the particle velocities change greatly and the particles pay more attention to the global optimal solution, which increases the possibility that the algorithm oscillates and skips over the global optimum. In the conventional particle swarm optimization algorithm, however, researchers usually use fixed values, which greatly limits the search speed and effectiveness of the algorithm.
Therefore, a new slope stability evaluation and prediction model that can optimize the learning factors of the particle swarm algorithm needs to be established to improve the accuracy and reliability of evaluation and prediction.
Disclosure of Invention
In view of the shortcomings of the prior art, the main purpose of the invention is to provide a slope stability evaluation method based on a DDPG-PSO-BP algorithm, so as to solve one or more problems in the prior art.
The technical scheme of the invention is as follows:
a slope stability evaluation method based on a DDPG-PSO-BP algorithm comprises the following steps:
S1: selecting slope data samples, and converting the sample data type;
S2: determining the structure of the BP neural network, and constructing a BP neural network with one or more hidden layers according to the slope data samples;
S3: integrating the PSO algorithm into the BP neural network, initializing the parameters of the particle swarm algorithm according to the established BP neural network, and initializing the learning factors c_1 and c_2 of the particles;
S4: performing iterative PSO-BP training, updating the individual optimal solution vector P_best of each particle and the global optimal solution vector G_best of the swarm by comparing fitness values, and updating the positions and velocities of all particles in the swarm;
S5: when the PSO-BP algorithm reaches a preset end condition, passing the prediction error, G_best and the maximum iteration number as state information into a deep reinforcement learning DDPG algorithm model, outputting action information according to the state information, and updating the learning factors c_1 and c_2;
S6: returning the updated learning factors c_1 and c_2 to the PSO-BP algorithm model, updating it to obtain new velocities and positions, obtaining the finally trained DDPG-PSO-BP algorithm model, and evaluating slope stability.
In some embodiments, converting the sample data type includes:
normalizing the slope data samples:
X' = a + (b - a)·(X - X_min) / (X_max - X_min)
where X and X' are the values of a slope data sample before and after the calculation; X_max and X_min are the maximum and minimum values of each column of the input and output sample data; and a, b are constants.
In some embodiments, in S2, determining the structure of the BP neural network and constructing the BP neural network with one or more hidden layers according to the slope data samples includes:
initializing the network parameters, and setting the numbers of neurons in the input, hidden and output layers of the BP neural network;
determining the numbers of input-layer and output-layer nodes of the BP neural network according to the slope sample data, and selecting the number of hidden-layer nodes according to the formula:
m = √(l + n) + α
where l, m and n are the numbers of nodes in the input, hidden and output layers of the BP neural network, respectively, α is an adjusting constant, and α = 1, 2, 3, ..., 10;
setting the activation function, with Sigmoid selected as the activation function:
f(x) = 1 / (1 + e^(-x))
where e^(-x) is the exponential function of the natural constant e;
selecting the mean square error function as the loss function:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
In some embodiments, in S3, initializing the parameters of the particle swarm algorithm according to the established BP neural network includes:
determining the population size, maximum number of iterations, position range and velocity range, and initializing the learning factors c_1 and c_2 and the position and velocity vectors;
constructing a single-particle network and a particle-swarm network, comprising:
(1) Selecting the spatial dimension D:
D = l×m + m×n + m + n
where l, m and n are the numbers of nodes in the input, hidden and output layers of the BP neural network, respectively;
(2) In the D-dimensional space, expressing the position X_i and velocity V_i of the i-th particle as:
X_i = (x_i1, x_i2, ..., x_iD), i ∈ [1, 2, ..., N]
V_i = (v_i1, v_i2, ..., v_iD), i ∈ [1, 2, ..., N]
where x_i1, x_i2, ..., x_iD are the components of the position vector of particle i in the D-dimensional space at a given iteration, and v_i1, v_i2, ..., v_iD are the components of its velocity vector; the position and velocity of each particle are limited by the maximum position X_max and maximum velocity V_max, with X_i ∈ [-X_max, X_max] and V_i ∈ [-V_max, V_max].
In some embodiments, in S4, updating the positions and velocities of all particles in the particle swarm includes:
during each PSO-BP iteration, substituting the position of a particle into the BP neural network to obtain a predicted value;
substituting the predicted value into the fitness function to obtain the fitness value of the particle, and comparing fitness values to find the individual optimal position P_best of each particle in the space and the global optimal value G_best of the swarm;
updating the individual optimal solution vector P_best and the global optimal solution vector G_best of the swarm according to the back-propagation capability of the BP neural network, and updating the positions and velocities of all particles in the swarm.
In some embodiments, the fitness function is the mean square error function, and the fitness value of each particle is calculated according to the mean square error formula:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
In some embodiments, finding the individual optimal position P_best of each particle and the global optimal value G_best of the swarm includes:
each particle independently searching for its own optimal position in the space up to the current iteration step, the best position found, i.e. the individual extremum, being denoted P_best:
P_best = (p_i1, p_i2, ..., p_iD), i ∈ [1, 2, ..., N]
where p_i1, p_i2, ..., p_iD are the components of the historical optimal position of particle i, i.e. the best solution found by the i-th particle after a given number of iterations;
in the whole particle swarm, the global optimal position reached by all particles during the search so far, i.e. the global optimum, being denoted G_best:
G_best = (p_g1, p_g2, ..., p_gD)
where p_g1, p_g2, ..., p_gD are the components of the historical optimal position of the swarm g, i.e. the best solution over the whole particle swarm after a given number of iterations;
all particles in the swarm adjusting their velocities and positions according to their individual extrema and the global optimum, using the update formulas:
v_{i+1} = ω·v_i + c_1·r_1·(P_best - x_i) + c_2·r_2·(G_best - x_i)
x_{i+1} = x_i + v_{i+1}
where ω is the inertia weight, decreasing linearly with the iteration number; v_{i+1} is the next velocity vector and v_i the current velocity vector; x_{i+1} is the next position vector and x_i the current position vector; c_1 and c_2 are the learning factors; r_1 and r_2 are random numbers in the range [0, 1]; ω_max is the maximum inertia weight, ω_min the minimum inertia weight, t the iteration step, and t_max the maximum number of iteration steps.
In some embodiments, in S5, the preset end condition is reaching the target convergence accuracy or the maximum number of iterations, the target convergence accuracy being judged by the mean square error; if the condition is met the calculation terminates, otherwise the iteration count is incremented by 1 and the algorithm returns to the previous step.
In some embodiments, in S5, the DDPG algorithm model comprises a Critic network Q(·|θ^Q), an Actor network μ(·|θ^μ), a Target Critic network Q'(·|θ^Q') and a Target Actor network μ'(·|θ^μ'), wherein
the Critic network update process comprises:
calculating the action in state s' with the Target Actor network:
a' = μ'(s'|θ^μ')
where a' is the action of the Target Actor network μ'(·|θ^μ') in state s';
calculating the target value of the state-action pair (s, a) with the Target Critic network:
y = r + γ(1 - done)·Q'(s', a'|θ^Q')
where y is the target value, r the immediate reward, γ the discount factor, done the task completion flag, Q'(·|θ^Q') the Target Critic network and s' the next state;
calculating the evaluation value q of the state-action pair (s, a) with the Critic network;
minimizing the difference L_c between the evaluation value and the target value by gradient descent, thereby updating the parameters of the Critic network:
L_c = (y - q)^2
where y is the target value and q the predicted value;
the Actor network update process comprises: calculating the action a in state s with the Actor network:
a = μ(s|θ^μ)
where μ(·|θ^μ) is the Actor network and s the state;
calculating the evaluation value q of the state-action pair (s, a) with the Critic network:
q = Q(s, a|θ^Q)
where Q(·|θ^Q) is the Critic network and a the action in state s;
finally, maximizing the accumulated expected return by gradient ascent, thereby updating the parameters of the Actor network;
the Target Critic network being updated as:
θ^Q' = τ·θ^Q + (1 - τ)·θ^Q'
where θ^Q is a parameter of the Critic network and θ^Q' a parameter of the Target Critic network;
the Target Actor network being updated as:
θ^μ' = τ·θ^μ + (1 - τ)·θ^μ'
where τ is the update weight, θ^μ is a parameter of the Actor network and θ^μ' a parameter of the Target Actor network.
In some embodiments, in S6, the maximum number of iterations of the DDPG algorithm model is preset; when the DDPG algorithm model reaches the maximum round, the learning factors c_1 and c_2 obtained from the convergence of the DDPG algorithm model are returned to the PSO-BP algorithm model, which is updated to obtain the new positions and velocities of all particles in the swarm, i.e. the optimal weights and biases of the BP neural network, yielding the finally trained DDPG-PSO-BP algorithm model.
In some embodiments, further comprising: s7: benchmark test verification specifically includes:
(1) The verification method comprises the following steps: selecting a plurality of groups of side slope data, wherein one part of the side slope data is used as a training sample to train and learn the BP neural network, and the other part of the side slope data is used as a test sample to test the feasibility of the DDPG-PSO-BP algorithm model;
(2) Prediction accuracy: selecting a mean square error function as a prediction error;
(3) Comparing and analyzing the prediction results of the PSO-BP algorithm model and the DDPG-PSO-BP algorithm model.
Compared with the prior art, the invention has the beneficial effects that: the invention provides a slope stability evaluation method based on a DDPG-PSO-BP algorithm, which optimizes learning factors in a particle swarm algorithm by using a DDPG algorithm of deep reinforcement learning.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are provided only for the purposes of illustration and description and are not intended to limit the scope of the invention, which is defined by the claims.
FIG. 1 is a schematic flow chart of a slope stability evaluation method based on a DDPG-PSO-BP algorithm according to some embodiments of the present invention;
FIG. 2 is a plot of return value versus learning round number for the DDPG algorithm model;
FIG. 3 is a schematic diagram showing the comparison of fitness values of a PSO-BP algorithm model and a DDPG-PSO-BP algorithm model;
FIG. 4 is a schematic diagram showing the comparison of the prediction results of the PSO-BP algorithm model and the DDPG-PSO-BP algorithm model;
FIG. 5 is a schematic diagram showing the correlation between the PSO-BP algorithm model prediction result and test data;
FIG. 6 is a schematic diagram showing correlation between the model prediction result of the DDPG-PSO-BP algorithm and test data.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the embodiments and the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
It should be understood that the terms "comprises/comprising," "consists of …," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product, apparatus, process, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product, apparatus, process, or method as desired. Without further limitation, an element defined by the phrases "comprising/including …," "consisting of …," and the like, does not exclude the presence of other like elements in a product, apparatus, process, or method that includes the element.
It is further understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship based on that shown in the drawings, merely to facilitate describing the present invention and to simplify the description, and do not indicate or imply that the devices, components, or structures referred to must have a particular orientation, be configured or operated in a particular orientation, and are not to be construed as limiting the present invention.
Deep Deterministic Policy Gradient (DDPG) algorithm is an online deep reinforcement learning algorithm specially solving the problem of continuous control, and has the advantages of high convergence rate, direct optimization of continuous action and the like.
According to the invention, the DDPG algorithm is introduced into the PSO-BP algorithm, and the learning factors in the particle swarm algorithm are optimized through the deep reinforcement learning DDPG algorithm, so that the variation trend of the PSO-BP algorithm parameters in the iterative process is improved and the trained model predicts slope stability with higher accuracy.
The implementation of the present invention will be described in detail with reference to the preferred embodiments.
The invention provides a slope stability evaluation method based on a DDPG-PSO-BP algorithm, which mainly comprises the following steps. S1: selecting slope data samples and converting the sample data type. S2: determining the structure of the BP neural network and constructing a BP neural network with one or more hidden layers from the slope data samples. S3: integrating the PSO algorithm into the BP neural network, and initializing the parameters of the particle swarm algorithm, including the learning factors c_1 and c_2 of the particles, according to the established BP neural network. S4: performing iterative PSO-BP training, updating the individual optimal solution vector P_best of each particle and the global optimal solution vector G_best of the swarm by comparing fitness values, and updating the positions and velocities of all particles in the swarm. S5: when the PSO-BP algorithm reaches the preset end condition, outputting the trained PSO-BP algorithm model, passing the prediction error, G_best and the maximum iteration number as state information into the deep reinforcement learning DDPG algorithm model, outputting action information according to the state information, and updating the learning factors c_1 and c_2. S6: returning the updated learning factors c_1 and c_2 to the trained PSO-BP algorithm model and updating it to obtain the DDPG-PSO-BP algorithm model, which is then used to evaluate slope stability.
The slope stability evaluation method provided by the invention can accurately and effectively predict and evaluate the safety of the slope, and is beneficial to finding and solving potential problems in advance, so that the safety and stability of the slope are improved.
Specifically, in S1, slope data samples are selected and the sample data type is converted, eliminating the differences in order of magnitude and dimension between the groups of data.
In the invention, six geological-condition parameters that influence slope stability are selected: the unit weight of the soil, cohesion, internal friction angle, slope angle, slope height and pore water pressure.
The unit weight of the soil is the weight of soil per unit volume and directly influences the sliding force of the slope; other conditions being equal, the greater the sliding force, the higher the risk of instability. Cohesion is the shear strength of the failure surface under zero normal stress. The internal friction angle reflects the friction and interlocking between soil particles as they move against each other; cohesion and the internal friction angle are the most important factors affecting slope stability. The slope angle is the angle between the slope surface and the horizontal plane. The slope height is the vertical height from the slope crest to the horizontal plane at the slope angle. Pore water pressure is the pressure of groundwater in the soil or rock; it acts between particles or in the pores and influences the soil weight. All of these are important parameters affecting slope stability.
By using these parameters to predict slope stability, the current safety condition of the slope can be judged accurately, which facilitates timely treatment and maintenance.
Furthermore, the invention normalizes the slope data samples before training:
X' = a + (b - a)·(X - X_min) / (X_max - X_min)
where X and X' are the values of a slope data sample before and after the calculation; X_max and X_min are the maximum and minimum values of each column of the input and output sample data; and a, b are constants.
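As a concrete illustration of this preprocessing step, the following Python sketch applies column-wise min-max normalization under the assumption that the constants a and b define the target range (here defaulted to [0, 1]); the function name and array layout are illustrative and not taken from the patent.

```python
import numpy as np

def normalize_columns(X, a=0.0, b=1.0):
    """Min-max normalize each column of the slope sample matrix into [a, b].

    X: 2-D array, rows are slope samples, columns are the geological factors.
    a, b: constants defining the target range (assumed interpretation).
    """
    X = np.asarray(X, dtype=float)
    x_min = X.min(axis=0)                                      # per-column minimum
    x_max = X.max(axis=0)                                      # per-column maximum
    span = np.where(x_max - x_min == 0, 1.0, x_max - x_min)    # avoid division by zero
    return a + (b - a) * (X - x_min) / span
```

For example, normalize_columns(samples) would map each input column (unit weight, cohesion, internal friction angle, slope angle, slope height, pore pressure) into [0, 1] before training.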
S2, determining the structure of the BP neural network, and constructing the BP neural network with single or multiple hidden layers according to the slope data sample comprises the following steps:
initializing network parameters, and setting the number of neurons of an input layer, a hidden layer and an output layer of the BP neural network.
In this embodiment, the geological condition factors affecting slope stability (soil unit weight, cohesion, internal friction angle, slope angle, slope height and pore water pressure) are taken as input values of the BP neural network after normalized data preprocessing, and the slope safety factor is taken as the output value.
The numbers of input-layer and output-layer nodes of the BP neural network are determined from the slope sample data, and the number of hidden-layer nodes is selected according to the formula:
m = √(l + n) + α
where l, m and n are the numbers of nodes in the input, hidden and output layers of the BP neural network, respectively, and α is an adjusting constant with α = 1, 2, 3, ..., 10.
In this embodiment, l is selected according to the number of input data, n is selected according to the number of output data, and α is selected according to the error in the iterative process.
An activation function is set; the invention selects the most widely used Sigmoid function as the activation function:
f(x) = 1 / (1 + e^(-x))
where e^(-x) is the exponential function of the natural constant e.
In this embodiment, the mean square error (MSE) function is selected as the loss function:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
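A minimal NumPy sketch of this network structure is given below, showing the hidden-node rule, the Sigmoid activation and the MSE loss described above; the weight-initialization scheme, the linear output layer and all function names are assumptions for illustration rather than details prescribed by the patent.

```python
import numpy as np

def hidden_nodes(l, n, alpha=3):
    """Empirical rule m = sqrt(l + n) + alpha, with alpha in 1..10."""
    return int(np.sqrt(l + n)) + alpha

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_bp(l, m, n, rng=np.random.default_rng(0)):
    """Random initial weights and biases for a single-hidden-layer BP network (assumed scheme)."""
    return {
        "W1": rng.normal(scale=0.1, size=(l, m)), "b1": np.zeros(m),
        "W2": rng.normal(scale=0.1, size=(m, n)), "b2": np.zeros(n),
    }

def forward(params, X):
    """Forward pass: input -> Sigmoid hidden layer -> linear output (assumed output layer)."""
    h = sigmoid(X @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def mse(y, t):
    """Loss E = (1/n) * sum((y_i - t_i)^2)."""
    return float(np.mean((np.asarray(y) - np.asarray(t)) ** 2))
```

With six input factors and one output (the safety factor), one would use l = 6, n = 1 and m = hidden_nodes(6, 1, alpha).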
In S3, the PSO algorithm is fused into the BP neural network, each particle representing a network, and initializing the parameters of the particle swarm algorithm includes:
determining the population size N, the position range and the velocity range; initializing the positions X_i and velocities V_i of the particles; setting the iteration count t, the maximum number of iterations t_max and the inertia weight ω; and initializing the learning factors c_1 and c_2.
Further, a single-particle network and a particle-swarm network are established according to the parameters of the BP neural network used to initialize the PSO algorithm, and a suitable spatial dimension D is selected. Assuming a group of N particles in a D-dimensional space, the dimension D is determined by:
D = l×m + m×n + m + n
where l, m and n are the numbers of nodes in the input, hidden and output layers of the neural network, respectively.
Further, in the D-dimensional space, the position X_i and velocity V_i of the i-th particle can be expressed as:
X_i = (x_i1, x_i2, ..., x_iD), i ∈ [1, 2, ..., N]
V_i = (v_i1, v_i2, ..., v_iD), i ∈ [1, 2, ..., N]
In this embodiment the position and velocity of each particle are limited by the maximum position X_max and maximum velocity V_max, with X_i ∈ [-X_max, X_max] and V_i ∈ [-V_max, V_max], to avoid blind searching of the space.
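A small sketch of this particle encoding follows, assuming each particle's position vector is simply the flattened weights and biases of the BP network; the bound values x_max and v_max are placeholders, not values from the patent.

```python
import numpy as np

def particle_dimension(l, m, n):
    """D = l*m + m*n + m + n: all weights and biases of the BP network."""
    return l * m + m * n + m + n

def init_swarm(N, D, x_max=1.0, v_max=0.1, rng=np.random.default_rng(0)):
    """Initialize N particles with positions in [-x_max, x_max] and velocities in [-v_max, v_max]."""
    X = rng.uniform(-x_max, x_max, size=(N, D))
    V = rng.uniform(-v_max, v_max, size=(N, D))
    return X, V

def decode(position, l, m, n):
    """Unpack one position vector into the BP network's weight matrices and bias vectors."""
    i = 0
    W1 = position[i:i + l * m].reshape(l, m); i += l * m
    W2 = position[i:i + m * n].reshape(m, n); i += m * n
    b1 = position[i:i + m]; i += m
    b2 = position[i:i + n]
    return W1, b1, W2, b2
```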
In S4, iterative PSO-BP training is performed and the positions and velocities of all particles in the swarm are updated, which includes:
during each PSO-BP iteration, substituting the position of a particle into the BP neural network to obtain a predicted value;
substituting the predicted value into the fitness function to obtain the fitness value of the particle, and comparing fitness values to find the individual optimal position P_best of each particle in the space and the global optimal value G_best of the swarm;
updating the individual optimal solution vector P_best and the global optimal solution vector G_best according to the back-propagation adjustment of weights and thresholds in the BP neural network, and updating the positions and velocities of all particles in the swarm.
Further, each particle independently searches for its own optimal position in the space up to the current iteration step; the best position found, i.e. the individual extremum, is denoted P_best:
P_best = (p_i1, p_i2, ..., p_iD), i ∈ [1, 2, ..., N]
where p_i1, p_i2, ..., p_iD are the components of the historical optimal position of particle i, i.e. the best solution found by the i-th particle (individual) after a given number of iterations.
Each particle shares information with the other particles of the swarm; in the whole particle swarm, the global optimal position reached by all particles during the search so far, i.e. the global optimum, is denoted G_best:
G_best = (p_g1, p_g2, ..., p_gD)
where p_g1, p_g2, ..., p_gD are the components of the historical optimal position of the swarm g, i.e. the best solution over the whole particle swarm after a given number of iterations.
Further, all particles in the swarm adjust their velocities and positions according to their individual extrema and the global optimum, using the update formulas:
v_{i+1} = ω·v_i + c_1·r_1·(P_best - x_i) + c_2·r_2·(G_best - x_i)
x_{i+1} = x_i + v_{i+1}
where ω is the inertia weight, which decreases linearly with the iteration number:
ω = ω_max - (ω_max - ω_min)·t / t_max
In these formulas, v_{i+1} is the next velocity vector and v_i the current velocity vector; x_{i+1} is the next position vector and x_i the current position vector; c_1 and c_2 are collectively called the learning factors, the former being the empirical coefficient of the individual particle and the latter the empirical coefficient of the particle swarm; r_1 and r_2 are random numbers in the range [0, 1]; ω_max is the maximum inertia weight, ω_min the minimum inertia weight, t the iteration step, and t_max the maximum number of iteration steps.
Further, all particles in the swarm are traversed, the slope sample data are input for network propagation, and the fitness value of each particle is calculated with the mean square error formula:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
In this embodiment, the current fitness value of a particle is compared with that particle's previous individual best: if the current value is better, the current position is assigned to P_best; otherwise the single-particle network is kept according to the original individual best. The current fitness value of the particle is then compared with the best value of all particles in the population: if the current value is better, the position of the current particle is assigned to G_best; otherwise the particle-swarm network is kept according to the global historical best. Finally the current P_best is compared with G_best: if P_best is better, the particle's current P_best is assigned to G_best; otherwise the particle-swarm network is updated according to the original G_best.
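A minimal sketch of one PSO iteration is given below, following the velocity/position update, the linearly decreasing inertia weight ω = ω_max - (ω_max - ω_min)·t/t_max and the P_best/G_best comparison logic described above; the fitness function is passed in as a callable, and all parameter values shown are illustrative assumptions.

```python
import numpy as np

def pso_step(X, V, p_best, p_best_fit, g_best, g_best_fit, fitness, t, t_max,
             c1=2.0, c2=2.0, w_max=0.9, w_min=0.4, x_max=1.0, v_max=0.1,
             rng=np.random.default_rng(0)):
    """One PSO iteration: evaluate fitness, update P_best/G_best, then velocities and positions."""
    fit = np.array([fitness(x) for x in X])           # fitness (MSE) of each particle

    improved = fit < p_best_fit                        # update individual bests
    p_best[improved] = X[improved]
    p_best_fit[improved] = fit[improved]

    if fit.min() < g_best_fit:                         # update the global best
        g_best, g_best_fit = X[fit.argmin()].copy(), float(fit.min())

    w = w_max - (w_max - w_min) * t / t_max            # linearly decreasing inertia weight
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (p_best - X) + c2 * r2 * (g_best - X)
    V = np.clip(V, -v_max, v_max)                      # enforce velocity bound
    X = np.clip(X + V, -x_max, x_max)                  # enforce position bound
    return X, V, p_best, p_best_fit, g_best, g_best_fit
```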
In S5, when the PSO-BP algorithm reaches the preset end condition, the prediction error, G_best and the maximum iteration number are passed as state information into the deep reinforcement learning DDPG algorithm model, which then outputs action information, namely the optimized learning factors c_1 and c_2, according to the state information.
In this embodiment, the preset end condition of the PSO-BP algorithm model is reaching the target convergence accuracy or the maximum number of iterations. The target convergence accuracy is judged by the mean square error: if it is reached, the calculation terminates; otherwise the iteration count is incremented by 1 and the algorithm returns to the previous step.
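This end-condition check can be written as a one-line helper; the target MSE and maximum-iteration values below are placeholders, not values specified by the patent.

```python
def pso_finished(mse_value, iteration, target_mse=1e-3, max_iter=200):
    """Preset end condition: target convergence accuracy (judged by MSE) or maximum iterations."""
    return mse_value <= target_mse or iteration >= max_iter
```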
Deep reinforcement learning mainly concerns how an agent interacts with an environment: the agent receives the state of the environment and takes an action according to some policy; the environment updates the state according to the agent's action and gives a reward; and the agent updates its policy according to the reward.
The optimization process of the learning factors in the PSO-BP algorithm model can be modeled as a Markov Decision Process (MDP), represented by the tuple <S, A, P, R, γ>.
Here S is the state space, namely the prediction error, G_best and the maximum iteration number in the present invention, representing the set of all possible states of the agent in the environment; A is the action space, namely the learning factors c_1 and c_2, representing the set of all possible actions the agent can take in the environment; P is the state-transition probability matrix, where P(s'|s, a) is the probability of transitioning to the new state s' after taking action a in state s; R is the reward function, designed in the present invention as r = k_1·(1/prediction error) + k_2·(1/iteration number), and R(s, a) is the immediate reward obtained after taking action a in state s; γ ∈ [0, 1] is the discount factor, representing the influence of future rewards on the current decision.
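The reward design above can be written directly as a small function; the weighting coefficients k_1 and k_2 are not specified in the patent, so the defaults here are placeholders.

```python
def reward(pred_error, n_iterations, k1=1.0, k2=1.0, eps=1e-8):
    """R = k1 * (1 / prediction error) + k2 * (1 / iteration number), as designed above.

    k1, k2 are assumed weighting coefficients; eps guards against division by zero.
    """
    return k1 / (pred_error + eps) + k2 / (n_iterations + eps)
```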
Further, deep reinforcement learning involves a state value function and an action value function. The state value function is the expected return obtained by following policy π from a given state s:
V_π(s) = E[G_t | S_t = s]
where E denotes the expectation, G_t the return and S_t the state.
V_π(s) measures the expected sum of rewards obtained by the agent from state s to the terminal state and thereby guides the update of the policy.
The action value function is the expected return obtained by taking action a in state s and then following policy π:
Q_π(s, a) = E[G_t | S_t = s, A_t = a]
where A_t is the action.
Q_π(s, a) measures the expected sum of rewards obtained by the agent from taking action a in state s until the terminal state.
In one round, the deep reinforcement learning agent obtains c_1 and c_2 from the state information of the PSO-BP algorithm model; the PSO-BP algorithm model predicts again with the current c_1 and c_2 to obtain a reward value, and the state, action and reward information are stored in the experience buffer; the DDPG algorithm model then extracts a batch of data from the experience buffer to update its networks. The DDPG algorithm model iterates in this loop until the maximum number of rounds is reached, yielding the converged optimal c_1 and c_2.
The experience buffer uses the experience replay technique. Experience replay keeps the distribution of training samples stable and improves training stability. It consists of two steps, "storage" and "replay": storage saves each experience in the form (s_t, a_t, r_{t+1}, s_{t+1}, done) in an experience pool; replay samples one or more pieces of experience data from the pool according to some rule.
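A minimal experience-replay buffer matching the (s_t, a_t, r_{t+1}, s_{t+1}, done) storage format described above is sketched below; the capacity and batch size are illustrative defaults.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience pool with uniform random sampling ("storage" and "replay")."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, min(batch_size, len(self.buffer)))
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```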
The DDPG algorithm model in this step comprises four networks: a Critic network Q(·|θ^Q), an Actor network μ(·|θ^μ), a Target Critic network Q'(·|θ^Q') and a Target Actor network μ'(·|θ^μ'). The Critic network updates its parameters θ^Q by minimizing the error between the evaluation value and the target value, and the Actor network updates its parameters θ^μ by maximizing the accumulated expected return.
Further, the Critic network update process is as follows. The action in state s' is calculated with the Target Actor network:
a' = μ'(s'|θ^μ')
where a' is the action of the Target Actor network in state s'.
The target value of the state-action pair (s, a) is calculated with the Target Critic network:
y = r + γ(1 - done)·Q'(s', a'|θ^Q')
where y is the target value, r the immediate reward, γ the discount factor, done the task completion flag, Q'(·|θ^Q') the Target Critic network and s' the next state.
The evaluation value q of the state-action pair (s, a) is calculated with the Critic network.
The difference L_c between the evaluation value and the target value is minimized by gradient descent, thereby updating the parameters of the Critic network:
L_c = (y - q)^2
where y is the target value and q the predicted value.
Further, the Actor network update process is as follows. The action a in state s is calculated with the Actor network:
a = μ(s|θ^μ)
where μ(·|θ^μ) is the Actor network and s the state.
The evaluation value q of the state-action pair (s, a) is calculated with the Critic network:
q = Q(s, a|θ^Q)
where Q(·|θ^Q) is the Critic network and a the action in state s.
Finally, the accumulated expected return, i.e. the evaluation value q, is maximized by gradient ascent, thereby updating the parameters of the Actor network.
The Target Critic network is updated as:
θ^Q' = τ·θ^Q + (1 - τ)·θ^Q'
where θ^Q is a parameter of the Critic network and θ^Q' a parameter of the Target Critic network.
The Target Actor network is updated as:
θ^μ' = τ·θ^μ + (1 - τ)·θ^μ'
where τ is the update weight, θ^μ is a parameter of the Actor network and θ^μ' a parameter of the Target Actor network.
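The four-network update described above (Critic target y = r + γ(1 - done)·Q'(s', μ'(s')), Critic loss (y - q)^2, Actor gradient ascent on Q(s, μ(s)), and soft target updates θ' ← τθ + (1 - τ)θ') can be sketched in PyTorch as follows. The network sizes, learning rates, batch-tensor shapes and the 3-dimensional state / 2-dimensional action interpretation are illustrative assumptions, not values from the patent, and in practice the actor output would be scaled into a valid (c_1, c_2) range.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

state_dim, action_dim, gamma, tau = 3, 2, 0.99, 0.005   # assumed state summary and (c1, c2) action
actor, critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor, target_critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG update from a batch of transitions (tensors of shape [batch, dim] / [batch, 1])."""
    with torch.no_grad():
        a2 = target_actor(s2)                                                     # a' = mu'(s')
        y = r + gamma * (1 - done) * target_critic(torch.cat([s2, a2], dim=1))    # target value
    q = critic(torch.cat([s, a], dim=1))                                          # evaluation value q
    critic_loss = nn.functional.mse_loss(q, y)                                    # L_c = (y - q)^2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()                  # ascent on Q(s, mu(s))
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for net, target in ((critic, target_critic), (actor, target_actor)):          # soft target updates
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```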
It is easy to understand that the DDPG algorithm in reinforcement learning is an online deep reinforcement learning algorithm under the Actor-Critic framework; in addition to the Actor and Critic networks, it uses a set of Target Actor and Target Critic networks for estimating the targets. Using the target networks reduces oscillation and instability of the estimated target values.
In S6, the maximum number of iterations of the DDPG algorithm model is preset. When the DDPG algorithm model reaches the maximum round, the learning factors c_1 and c_2 obtained from its convergence are returned to the PSO-BP algorithm model, which is updated to obtain the new positions and velocities of all particles in the swarm, i.e. the optimal weights and biases of the BP neural network, yielding the finally trained DDPG-PSO-BP algorithm model.
According to the invention, the finally trained DDPG-PSO-BP algorithm model is used to evaluate slope stability, overcoming the problems of the traditional PSO-BP algorithm model such as poor local search capability, slow convergence speed and low result accuracy.
The invention also includes S7: benchmark test verification, which specifically comprises the following.
(1) Verification method: to verify the prediction performance of the DDPG-PSO-BP algorithm model, the invention performs model training and data prediction on actual slope data. Several groups of slope data are selected; one part is used as training samples to train the BP neural network, and the other part is used as test samples to verify the feasibility of the DDPG-PSO-BP algorithm model.
In a specific embodiment, 85 groups of side slope data are selected in total, the whole sample data are divided into two parts, 70 groups of data are used as training samples, training learning is carried out on a network, and 15 groups of data are used as test samples for checking the feasibility of the network model constructed by the invention.
(2) Prediction accuracy: to evaluate the model performance more accurately, the mean square error (MSE) function is selected as the prediction error:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
In a specific embodiment, accuracy prediction on the slope data is performed with both the PSO-BP algorithm model and the DDPG-PSO-BP algorithm model. In the reinforcement learning training stage, the return value is often used to judge the convergence and performance of the algorithm; as shown in FIG. 2, the return value increases with the number of rounds and finally converges and stabilizes at about 25 after roughly 600 rounds. As shown in FIG. 3, the prediction error on the test data is 0.0141 with the PSO-BP algorithm model and 0.0026 with the DDPG-PSO-BP algorithm model. The prediction accuracy of the DDPG-PSO-BP algorithm model is therefore higher.
(3) Comparing and analyzing the prediction results of the PSO-BP algorithm model and the DDPG-PSO-BP algorithm model.
In a specific embodiment, referring to FIGS. 4 to 6: FIG. 4 compares the prediction results of the traditional PSO-BP algorithm model and the DDPG-PSO-BP algorithm model, and the DDPG-PSO-BP curve is closer to the actual curve; FIG. 5 shows the correlation between the PSO-BP model predictions and the test data, with a correlation of 0.59938; FIG. 6 shows the correlation between the DDPG-PSO-BP model predictions and the test data, with a correlation of 0.92607. The predictions of the DDPG-PSO-BP algorithm model are therefore closer to the test data and better than those of the traditional PSO-BP algorithm model.
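The two figures of merit used in this comparison, the mean-square prediction error and the correlation between predictions and test data, can be computed as below; the array names are placeholders.

```python
import numpy as np

def evaluate(predictions, test_labels):
    """Return (MSE, Pearson correlation) between model predictions and the test data."""
    predictions = np.asarray(predictions, dtype=float)
    test_labels = np.asarray(test_labels, dtype=float)
    mse = float(np.mean((predictions - test_labels) ** 2))
    corr = float(np.corrcoef(predictions, test_labels)[0, 1])
    return mse, corr
```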
The invention provides a slope stability prediction method based on an improved DDPG-PSO-BP algorithm, which optimizes a learning factor in a particle swarm algorithm speed iteration formula by using a DDPG algorithm of deep reinforcement learning.
It is easy to understand by those skilled in the art that the above preferred embodiments can be freely combined and overlapped without conflict.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (11)

1. A slope stability evaluation method based on a DDPG-PSO-BP algorithm is characterized by comprising the following steps:
S1: selecting slope data samples, and converting the sample data type;
S2: determining the structure of the BP neural network, and constructing a BP neural network with one or more hidden layers according to the slope data samples;
S3: integrating the PSO algorithm into the BP neural network, initializing the parameters of the particle swarm algorithm according to the established BP neural network, and initializing the learning factors c_1 and c_2 of the particles;
S4: performing iterative PSO-BP training, updating the individual optimal solution vector P_best of each particle and the global optimal solution vector G_best of the swarm by comparing fitness values, and updating the positions and velocities of all particles in the swarm;
S5: when the PSO-BP algorithm reaches a preset end condition, passing the prediction error, G_best and the maximum iteration number as state information into a deep reinforcement learning DDPG algorithm model, outputting action information according to the state information, and updating the learning factors c_1 and c_2;
S6: returning the updated learning factors c_1 and c_2 to the PSO-BP algorithm model, updating it to obtain new velocities and positions, obtaining the finally trained DDPG-PSO-BP algorithm model, and evaluating slope stability.
2. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S1, converting the sample data type includes:
normalizing the slope data samples:
X' = a + (b - a)·(X - X_min) / (X_max - X_min)
where X and X' are the values of a slope data sample before and after the calculation; X_max and X_min are the maximum and minimum values of each column of the input and output sample data; and a, b are constants.
3. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S2, determining the structure of the BP neural network and constructing the BP neural network with one or more hidden layers according to the slope data samples comprises:
initializing the network parameters, and setting the numbers of neurons in the input, hidden and output layers of the BP neural network;
determining the numbers of input-layer and output-layer nodes of the BP neural network according to the slope sample data, and selecting the number of hidden-layer nodes according to the formula:
m = √(l + n) + α
where l, m and n are the numbers of nodes in the input, hidden and output layers of the BP neural network, respectively, α is an adjusting constant, and α = 1, 2, 3, ..., 10;
setting the activation function, with Sigmoid selected as the activation function:
f(x) = 1 / (1 + e^(-x))
where e^(-x) is the exponential function of the natural constant e;
selecting the mean square error function as the loss function:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
4. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S3, initializing the parameters of the particle swarm algorithm according to the established BP neural network includes:
determining the population size, maximum number of iterations, position range and velocity range, and initializing the learning factors c_1 and c_2 and the position and velocity vectors;
constructing a single-particle network and a particle-swarm network, comprising:
(1) Selecting the spatial dimension D:
D = l×m + m×n + m + n
where l, m and n are the numbers of nodes in the input, hidden and output layers of the BP neural network, respectively;
(2) In the D-dimensional space, expressing the position X_i and velocity V_i of the i-th particle as:
X_i = (x_i1, x_i2, ..., x_iD), i ∈ [1, 2, ..., N]
V_i = (v_i1, v_i2, ..., v_iD), i ∈ [1, 2, ..., N]
where x_i1, x_i2, ..., x_iD are the components of the position vector of particle i in the D-dimensional space at a given iteration, and v_i1, v_i2, ..., v_iD are the components of its velocity vector; the position and velocity of each particle are limited by the maximum position X_max and maximum velocity V_max, with X_i ∈ [-X_max, X_max] and V_i ∈ [-V_max, V_max].
5. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 4, wherein in S4, updating the positions and velocities of all particles in the particle swarm comprises:
during each PSO-BP iteration, substituting the position of a particle into the BP neural network to obtain a predicted value;
substituting the predicted value into the fitness function to obtain the fitness value of the particle, and comparing fitness values to find the individual optimal position P_best of each particle in the space and the global optimal value G_best of the swarm;
updating the individual optimal solution vector P_best and the global optimal solution vector G_best according to the back-propagation capability of the BP neural network, and updating the positions and velocities of all particles in the swarm.
6. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 5, wherein the fitness function is the mean square error function, and the fitness value of each particle is calculated according to the mean square error formula:
E = (1/n) Σ_{i=1}^{n} (y_i - t_i)^2
where E is the total error, n is the number of samples, i indexes the data, y_i is the network output value and t_i is the label value.
7. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 5, wherein finding the individual optimal position P_best of each particle and the global optimal value G_best of the swarm comprises:
each particle independently searching for its own optimal position in the space up to the current iteration step, the best position found, i.e. the individual extremum, being denoted P_best:
P_best = (p_i1, p_i2, ..., p_iD), i ∈ [1, 2, ..., N]
where p_i1, p_i2, ..., p_iD are the components of the historical optimal position of particle i, i.e. the best solution found by the i-th particle after a given number of iterations;
in the whole particle swarm, the global optimal position reached by all particles during the search so far, i.e. the global optimum, being denoted G_best:
G_best = (p_g1, p_g2, ..., p_gD)
where p_g1, p_g2, ..., p_gD are the components of the historical optimal position of the swarm g, i.e. the best solution over the whole particle swarm after a given number of iterations;
all particles in the swarm adjusting their velocities and positions according to their individual extrema and the global optimum, using the update formulas:
v_{i+1} = ω·v_i + c_1·r_1·(P_best - x_i) + c_2·r_2·(G_best - x_i)
x_{i+1} = x_i + v_{i+1}
where ω is the inertia weight, decreasing linearly with the iteration number; v_{i+1} is the next velocity vector and v_i the current velocity vector; x_{i+1} is the next position vector and x_i the current position vector; c_1 and c_2 are the learning factors; r_1 and r_2 are random numbers in the range [0, 1]; ω_max is the maximum inertia weight, ω_min the minimum inertia weight, t the iteration step, and t_max the maximum number of iteration steps.
8. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S5 the preset end condition is reaching the target convergence accuracy or the maximum number of iterations, the target convergence accuracy being judged by the mean square error; if the condition is met the calculation terminates, otherwise the iteration count is incremented by 1 and the algorithm returns to the previous step.
9. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S5 the DDPG algorithm model comprises a Critic network Q(·|θ^Q), an Actor network μ(·|θ^μ), a Target Critic network Q'(·|θ^Q') and a Target Actor network μ'(·|θ^μ'), wherein
the Critic network update process comprises:
calculating the action in state s' with the Target Actor network:
a' = μ'(s'|θ^μ')
where a' is the action of the Target Actor network μ'(·|θ^μ') in state s';
calculating the target value of the state-action pair (s, a) with the Target Critic network:
y = r + γ(1 - done)·Q'(s', a'|θ^Q')
where y is the target value, r the immediate reward, γ the discount factor, done the task completion flag, Q'(·|θ^Q') the Target Critic network and s' the next state;
calculating the evaluation value q of the state-action pair (s, a) with the Critic network;
minimizing the difference L_c between the evaluation value and the target value by gradient descent, thereby updating the parameters of the Critic network:
L_c = (y - q)^2
where y is the target value and q the predicted value;
the Actor network update process comprises: calculating the action a in state s with the Actor network:
a = μ(s|θ^μ)
where μ(·|θ^μ) is the Actor network and s the state;
calculating the evaluation value q of the state-action pair (s, a) with the Critic network:
q = Q(s, a|θ^Q)
where Q(·|θ^Q) is the Critic network and a the action in state s;
finally, maximizing the accumulated expected return by gradient ascent, thereby updating the parameters of the Actor network;
the Target Critic network being updated as:
θ^Q' = τ·θ^Q + (1 - τ)·θ^Q'
where θ^Q is a parameter of the Critic network and θ^Q' a parameter of the Target Critic network;
the Target Actor network being updated as:
θ^μ' = τ·θ^μ + (1 - τ)·θ^μ'
where τ is the update weight, θ^μ is a parameter of the Actor network and θ^μ' a parameter of the Target Actor network.
10. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, wherein in S6 the maximum number of iterations of the DDPG algorithm model is preset; when the DDPG algorithm model reaches the maximum round, the learning factors c_1 and c_2 obtained from the convergence of the DDPG algorithm model are returned to the PSO-BP algorithm model, which is updated to obtain the new positions and velocities of all particles in the swarm, i.e. the optimal weights and biases of the BP neural network, yielding the finally trained DDPG-PSO-BP algorithm model.
11. The slope stability evaluation method based on the DDPG-PSO-BP algorithm according to claim 1, further comprising: s7: benchmark test verification specifically includes:
(1) The verification method comprises the following steps: selecting a plurality of groups of side slope data, wherein one part of the side slope data is used as a training sample to train and learn the BP neural network, and the other part of the side slope data is used as a test sample to test the feasibility of the DDPG-PSO-BP algorithm model;
(2) Prediction accuracy: selecting a mean square error function as a prediction error;
(3) Comparing and analyzing the prediction results of the PSO-BP algorithm model and the DDPG-PSO-BP algorithm model.
CN202311337010.XA 2023-10-16 2023-10-16 Slope stability evaluation method based on DDPG-PSO-BP algorithm Pending CN117332693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311337010.XA CN117332693A (en) 2023-10-16 2023-10-16 Slope stability evaluation method based on DDPG-PSO-BP algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311337010.XA CN117332693A (en) 2023-10-16 2023-10-16 Slope stability evaluation method based on DDPG-PSO-BP algorithm

Publications (1)

Publication Number Publication Date
CN117332693A true CN117332693A (en) 2024-01-02

Family

ID=89278999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311337010.XA Pending CN117332693A (en) 2023-10-16 2023-10-16 Slope stability evaluation method based on DDPG-PSO-BP algorithm

Country Status (1)

Country Link
CN (1) CN117332693A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117828403A (en) * 2024-01-03 2024-04-05 浙江丰源泵业有限公司 Water pump fault prediction and diagnosis method based on machine learning

Similar Documents

Publication Publication Date Title
CN109858647B (en) Regional flood disaster risk evaluation and estimation method coupled with GIS and GBDT algorithm
WO2022083009A1 (en) Customized product performance prediction method based on heterogeneous data error compensation fusion
CN109492814A (en) A kind of Forecast of Urban Traffic Flow prediction technique, system and electronic equipment
CN110806759A (en) Aircraft route tracking method based on deep reinforcement learning
CN103105246A (en) Greenhouse environment forecasting feedback method of back propagation (BP) neural network based on improvement of genetic algorithm
CN114692310B (en) Dueling DQN-based virtual-real fusion primary separation model parameter optimization method
CN113177675B (en) Air conditioner cooling load prediction method based on longicorn group algorithm optimization neural network
CN112633591B (en) Space searching method and device based on deep reinforcement learning
CN113983646A (en) Air conditioner interaction end energy consumption prediction method based on generation countermeasure network and air conditioner
CN117268391B (en) Intelligent planning method and system for deformed aircraft based on target layered architecture
CN116451556A (en) Construction method of concrete dam deformation observed quantity statistical model
CN115016534A (en) Unmanned aerial vehicle autonomous obstacle avoidance navigation method based on memory reinforcement learning
CN112464567A (en) Intelligent data assimilation method based on variational and assimilative framework
CN108832663A (en) The prediction technique and equipment of the generated output of micro-capacitance sensor photovoltaic generating system
CN117332693A (en) Slope stability evaluation method based on DDPG-PSO-BP algorithm
CN109754122A (en) A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction
CN112381282A (en) Photovoltaic power generation power prediction method based on width learning system
CN115374933A (en) Intelligent planning and decision-making method for landing behavior of multi-node detector
CN111008790A (en) Hydropower station group power generation electric scheduling rule extraction method
CN116680477A (en) Personalized problem recommendation method based on reinforcement learning
CN115964898A (en) Bignty game confrontation-oriented BC-QMIX on-line multi-agent behavior decision modeling method
CN113379063B (en) Whole-flow task time sequence intelligent decision-making method based on online reinforcement learning model
CN115938104A (en) Dynamic short-time road network traffic state prediction model and prediction method
CN116415700A (en) Wind energy numerical forecasting method and device combining artificial intelligence
CN118195853A (en) Intelligent campus information management method and system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination