CN114148349B - Vehicle personalized car-following control method based on generative adversarial imitation learning - Google Patents
Vehicle personalized car-following control method based on generative adversarial imitation learning
- Publication number
- CN114148349B (application CN202111568497.3A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- following
- neural network
- simulation
- personalized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
- B60W60/0015—Planning or execution of driving tasks specially adapted for safety
- B60W60/0016—Planning or execution of driving tasks specially adapted for safety of the vehicle or its occupants
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/14—Adaptive cruise control
- B60W30/143—Speed control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2520/00—Input parameters relating to overall vehicle dynamics
- B60W2520/10—Longitudinal speed
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2520/00—Input parameters relating to overall vehicle dynamics
- B60W2520/10—Longitudinal speed
- B60W2520/105—Longitudinal acceleration
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2540/00—Input parameters relating to occupants
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/40—Dynamic objects, e.g. animals, windblown objects
- B60W2554/404—Characteristics
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/80—Spatial relation or speed relative to objects
- B60W2554/802—Longitudinal distance
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W2554/00—Input parameters relating to objects
- B60W2554/80—Spatial relation or speed relative to objects
- B60W2554/804—Relative longitudinal speed
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Traffic Control Systems (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
Abstract
The application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps: establishing a simulated car-following environment that comprises a road model, a host vehicle and a lead vehicle; setting different speed profiles for the lead vehicle in the simulation environment; carrying out simulated-driving following tests under the different lead-vehicle speed profiles, collecting driving data of the host vehicle and the lead vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to build a driver following data set; constructing a vehicle personalized car-following control model from the driver following data set by a generative adversarial imitation learning method; and performing personalized car-following control of the vehicle with the model. The application solves the technical problem that the hand-crafted reward functions of existing deep-reinforcement-learning following control cannot objectively and comprehensively reflect a driver's driving habits.
Description
Technical Field
The application relates to the technical field of automatic driving, and in particular to a vehicle personalized car-following control method based on generative adversarial imitation learning.
Background
Automatic driving technology has developed from early constant-speed cruise control, through adaptive cruise control, toward full autonomy; throughout this progression, autonomous car-following control has been one of the key technologies of both active vehicle safety and automated driving.
Existing autonomous following control technologies fall into two main classes: model-based control and data-driven control. Model-based control establishes a vehicle kinematics/dynamics model, describes the collision risk of the vehicle's longitudinal motion, and regulates the longitudinal acceleration by constrained optimization against indexes such as following efficiency and passenger comfort during car following. Thanks to the rapid development of chip computing power, simulation technology and AI, deep reinforcement learning offers a brand-new approach to automatic-driving control strategies: by setting a reward function and letting an agent optimize its control strategy through trial-and-error interaction with a simulation environment, it reduces the cost of system dynamics modeling and parameter tuning; and by adding driving-habit indexes to the reward function and learning from actual driver data, the following control can better match the driving habits of different drivers.
However, at present the reward function is still generally formulated from subjective judgments about following-system performance, and it is difficult for it to objectively and comprehensively reflect the implicit relation between the system's state space and its output; following control based on conventional deep reinforcement learning therefore has certain limitations in personalization and driving pleasure.
Disclosure of Invention
In view of the defects in the prior art, the application provides a vehicle personalized car-following control method based on generative adversarial imitation learning, so as to solve the technical problem that the hand-crafted reward functions of existing deep-reinforcement-learning-based following control cannot objectively and comprehensively reflect a driver's driving habits.
The technical scheme adopted by the application is a vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps:
establishing a simulated car-following environment, wherein the environment comprises a road model, a host vehicle and a lead vehicle;
setting different speed profiles for the lead vehicle in the simulated car-following environment;
carrying out simulated-driving following tests in the simulation environment under the different lead-vehicle speed profiles, collecting driving data of the host vehicle and the lead vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to establish a driver following data set;
constructing a vehicle personalized car-following control model from the driver following data set by a generative adversarial imitation learning method;
and performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
Further, establishing the simulated car-following environment includes:
building the simulated car-following environment on an automatic-driving simulation platform;
modeling the host-vehicle dynamics with vehicle-dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic-flow model.
Further, the different speed profiles of the lead vehicle include: constant-speed driving, decelerating driving, emergency braking and random speed variation.
Further, the following state includes: the relative distance d between the host vehicle and the lead vehicle, their relative speed v_r, and the host-vehicle speed v_h; the action is the host-vehicle longitudinal acceleration a_h.
Further, constructing the vehicle personalized car-following control model includes:
establishing a policy generation neural network with the relative distance and relative speed between the host vehicle and the lead vehicle and the host-vehicle speed as inputs and the host-vehicle longitudinal acceleration as output;
establishing a discrimination neural network with the relative distance, relative speed, host-vehicle speed and host-vehicle longitudinal acceleration as inputs and a truth value between false and true as output;
a continuous follow-up segment is obtained from a uniform sampling of the driver dataset:
wherein Respectively representing the m-th step following state and the actual action of a driver;
inputting the obtained continuous following segment into a strategy generation neural network to interact with a simulation environment to obtain a simulation following segment: wherein />Respectively represent the following shape of the nth step of the simulation processThe state and the strategy generate a neural network output action;
inputting the simulated following segment into the discrimination neural network, which judges how realistic the output of the policy generation neural network is;
training the policy generation neural network with a plurality of continuous following segments;
training the discrimination neural network with a plurality of simulated following segments;
and updating the discrimination neural network parameters by gradient descent.
Further, the policy generation neural network has 3 input-layer neurons, corresponding to the relative distance, the host-vehicle speed and the relative speed; 1 output-layer neuron, the host-vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each. The policy generation neural network is expressed as:
f = π(a|s; ω)
where a denotes the action, a = [a_h]; s denotes the vehicle state, s = [d, v_h, v_r]; and ω denotes the policy generation neural network parameters.
Further, the discrimination neural network has 4 input-layer neurons, corresponding to the relative distance, the host-vehicle speed, the relative speed and the host-vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s denotes the vehicle state, s = [d, v_h, v_r]; a denotes the action, a = [a_h]; and θ denotes the discrimination neural network parameters.
Further, when a continuous following segment is obtained by uniform sampling from the driver data set, the initial host-vehicle state and the lead-vehicle trajectory in the segment are taken as the simulation scenario; with its current parameters, the policy generation neural network performs probability sampling of each action to control the interaction of the host vehicle with the environment; the simulation stops when a stopping condition is met, and the simulated following segment data are recorded. The stopping conditions include:
the sample data have been read to the end;
the two vehicles collide;
the host-vehicle speed is less than or equal to 0.
Further, when the discrimination neural network judges how realistic the output of the policy generation neural network is, the cross entropy is defined as the step-k return function r_k = log D(s_k, a_k; θ); substituting the agent-environment interaction trajectory into the return function yields a trajectory with a return at each step:
τ = {(s_1^G, a_1^G, r_1), (s_2^G, a_2^G, r_2), ..., (s_n^G, a_n^G, r_n)}
where s_n^G and a_n^G respectively denote the following state and the policy network's output action at step n of the simulation.
Further, when training the policy generation neural network and the discrimination neural network, the objective function of the policy generation neural network is:
J(ω | ω_now) = E[ Σ_t ( π(a_t|s_t; ω) / π(a_t|s_t; ω_now) ) · r_t ]
where ω denotes the network parameters to be updated and ω_now denotes the current network parameters;
the loss function of the discrimination neural network is:
G(θ) = -(1/m) Σ_{k=1}^{m} log D(s_k^E, a_k^E; θ) - (1/n) Σ_{k=1}^{n} log(1 - D(s_k^G, a_k^G; θ))
where m is the number of sampling points of the continuous following segment and n is the number of sampling points of the simulated following segment;
the discrimination neural network parameters are updated by gradient descent according to:
θ_new = θ_old - λ · ∂G(θ)/∂θ
where θ_old denotes the current and θ_new the updated discrimination network parameters, and λ is the learning rate.
According to the above technical scheme, the beneficial technical effects of the application are as follows:
1. No reward function needs to be defined manually; through the generation network and the discrimination network, the policy network better matches the driver's behavior characteristics, so that the following control strategy better matches the driver's driving habits.
2. Driver driving data are collected in different simulated scenarios on a driving simulator, so the system structure is simple, the cost is low, and data collection is safer than on a real road.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of the vehicle personalized car-following control flow according to an embodiment of the present application;
FIG. 2 is a block diagram of the vehicle personalized car-following control strategy according to an embodiment of the present application;
FIG. 3 is a structural diagram of the policy generation neural network according to an embodiment of the present application;
FIG. 4 is a structural diagram of the discrimination neural network according to an embodiment of the present application;
fig. 5 is a schematic diagram of the truncation function according to an embodiment of the present application.
Detailed Description
Embodiments of the technical scheme of the present application will be described in detail below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present application, and thus are merely examples, and are not intended to limit the scope of the present application.
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs.
Examples
This embodiment provides a vehicle personalized car-following control method based on generative adversarial imitation learning, as shown in fig. 1, which specifically comprises the following steps:
Step 1, establishing a simulated car-following environment, wherein the environment comprises a road model, a host vehicle and a lead vehicle.
In a specific embodiment, the automatic-driving simulation platform PreScan is adopted to build the simulated car-following environment, which comprises a road model, a host vehicle and a lead vehicle. Host-vehicle dynamics modeling is carried out with the vehicle-dynamics simulation software CarSim, yielding a host-vehicle CarSim model. For surrounding vehicles, a random traffic-flow model is used to describe their motion. In this embodiment, the host vehicle is the vehicle under autonomous personalized following control, and the lead vehicle is the vehicle located in the same lane as the host vehicle, ahead of it, and followed by it.
Step 2, setting different speed profiles for the lead vehicle in the simulated car-following environment.
Specifically, the different lead-vehicle speed profiles include constant-speed driving, decelerating driving, emergency braking and random speed variation; by setting different lead-vehicle speed profiles, different following conditions of the host vehicle can be simulated.
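As an illustration, the four profiles might be generated as in the following sketch; the durations, initial speed and acceleration bounds are assumed values, not fixed by this application, and the 0.2 s step matches the preferred 5 Hz sampling rate used later:

```python
# Sketch of the four lead-vehicle speed profiles (assumed parameter values).
import numpy as np

def lead_speed_profile(kind, duration_s=10.0, dt=0.2, v0=15.0, seed=0):
    """Return (t, v): a lead-vehicle speed trace in m/s sampled every dt seconds."""
    t = np.arange(0.0, duration_s, dt)
    if kind == "constant":
        v = np.full_like(t, v0)
    elif kind == "decelerate":
        v = np.maximum(v0 - 1.0 * t, 0.0)              # mild, constant deceleration
    elif kind == "emergency_brake":
        v = np.maximum(v0 - 6.0 * t, 0.0)              # hard braking down to a stop
    elif kind == "random":
        rng = np.random.default_rng(seed)
        accel = rng.uniform(-1.5, 1.5, size=t.shape)   # random accelerations
        v = np.maximum(v0 + np.cumsum(accel) * dt, 0.0)
    else:
        raise ValueError(f"unknown profile kind: {kind}")
    return t, v
```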
Step 3, carrying out simulated-driving following tests in the simulation environment under the different lead-vehicle speed profiles, collecting driving data of the host vehicle and the lead vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to establish the driver following data set.
In a specific embodiment, a Logitech G29 is adopted as the simulated driving device in the simulated car-following environment; the driver operates the steering wheel and the accelerator/brake pedals, and the driving simulator collects the steering-wheel angle, accelerator-pedal opening and brake-pedal opening signals and transmits them to the host-vehicle CarSim model.
According to the different lead-vehicle speed profiles, the driver reproduces different following conditions in the simulated car-following environment, and multiple simulated-driving following tests are carried out. Driving data of the host vehicle and the lead vehicle in the same lane are collected to obtain the following state, which includes: the relative distance d between the host vehicle and the lead vehicle, their relative speed v_r, the host-vehicle speed v_h, and the host-vehicle longitudinal acceleration a_h. The relative distance, host-vehicle speed and relative speed are selected as the vehicle state s, i.e. s = [d, v_h, v_r]; the host-vehicle longitudinal acceleration is selected as the action a, i.e. a = [a_h].
In a specific embodiment, the collected test data form a number of distinct continuous following segments: k simulated driving tests are carried out, each of a different duration (for example, 8 seconds for the first and 10 seconds for the second); each test corresponds to one continuous following segment, and each segment contains the four driving signals of relative distance, relative speed, host-vehicle speed and host-vehicle longitudinal acceleration. When collecting test data the sampling frequency is not restricted; at the preferred frequency of 5 Hz, the 8-second first test yields 40 sampling points and the 10-second second test yields 50.
A plurality of continuous following segments are selected as the driver data set Γ, specifically:
Γ = {τ^(1), τ^(2), ..., τ^(k)}
where τ = [s_1, a_1, s_2, a_2, ..., s_m, a_m], k is the number of continuous following segments, and m is the number of sampling points in each segment.
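A minimal sketch of assembling Γ from the logged runs follows; the dictionary layout and helper names are illustrative assumptions, not a format prescribed by the application:

```python
# Build the driver data set Γ from k logged runs of differing length.
import numpy as np

def make_dataset(runs):
    """runs: list of (states, actions); states has shape (m_i, 3) = [d, v_h, v_r],
    actions has shape (m_i, 1) = [a_h], both sampled at 5 Hz."""
    dataset = []
    for states, actions in runs:
        assert states.shape[0] == actions.shape[0]   # one action per state
        dataset.append({"s": np.asarray(states, dtype=np.float32),
                        "a": np.asarray(actions, dtype=np.float32)})
    return dataset                                   # Γ = {τ^(1), ..., τ^(k)}

def sample_segment(dataset, rng):
    """Uniform sampling of one continuous following segment τ^E from Γ."""
    return dataset[rng.integers(len(dataset))]
```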
Step 4, constructing the vehicle personalized car-following control model from the driver following data set by the generative adversarial imitation learning method.
As shown in fig. 2, constructing the vehicle personalized car-following control model specifically includes:
step 4.1, taking the relative distance, the relative speed and the speed of the main vehicle and the front vehicle as input, taking the longitudinal acceleration of the main vehicle as output, and establishing a strategy to generate a neural network
In the present embodimentIn the method, the longitudinal acceleration a= [ a ] of the main vehicle is adopted h ]In a specific embodiment, the longitudinal acceleration range of the host vehicle is set to-3 m/s 2 ≤a h ≤3m/s 2 。
The structure of the policy generation neural network is shown in fig. 3: it has 3 input-layer neurons, corresponding to the relative distance, the host-vehicle speed and the relative speed; 1 output-layer neuron, the host-vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each. The policy generation neural network is expressed as:
f = π(a|s; ω)
where a denotes the action, a = [a_h]; s denotes the vehicle state, s = [d, v_h, v_r]; and ω denotes the policy generation neural network parameters, including the number of network layers and the number of neurons per layer.
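For concreteness, the 3-5-5-1 policy network could be sketched in PyTorch as below; the tanh activations and the Gaussian output head are assumptions (the application fixes only the layer sizes, the ±3 m/s² action range, and that actions are sampled probabilistically):

```python
# Policy generation network π(a|s; ω): 3 inputs [d, v_h, v_r] -> 1 output a_h.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(3, 5), nn.Tanh(),   # hidden layer 1: 5 neurons
            nn.Linear(5, 5), nn.Tanh(),   # hidden layer 2: 5 neurons
            nn.Linear(5, 1),              # output: mean longitudinal acceleration
        )
        self.log_std = nn.Parameter(torch.zeros(1))   # learned exploration noise

    def dist(self, s):
        mean = 3.0 * torch.tanh(self.body(s))         # keep the mean in [-3, 3] m/s²
        return torch.distributions.Normal(mean, self.log_std.exp())

    def act(self, s):
        d = self.dist(s)
        a = d.sample()                                # probability sampling of a_h
        return a.clamp(-3.0, 3.0), d.log_prob(a)
```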
Step 4.2, establishing the discrimination neural network, with the relative distance, relative speed, host-vehicle speed and host-vehicle longitudinal acceleration as inputs and a truth value between false and true as output.
The structure of the discrimination neural network is shown in fig. 4: it has 4 input-layer neurons, corresponding to [d, v_h, v_r, a_h], and 1 output-layer neuron whose value lies in (0, 1); the closer the output is to 1, the more "true" (the action came from the driver), and the closer to 0, the more "false" (the action was generated by the policy generation neural network). It has 2 hidden layers with 5 neurons each. The discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s denotes the vehicle state, s = [d, v_h, v_r]; a denotes the action, a = [a_h]; and θ denotes the discrimination neural network parameters, including the number of network layers and the number of neurons per layer.
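A matching sketch of the 4-5-5-1 discrimination network, with a sigmoid head so that p_a ∈ (0, 1); the tanh activations are again an assumption:

```python
# Discrimination network D(s, a; θ): 4 inputs [d, v_h, v_r, a_h] -> p_a in (0, 1).
import torch
import torch.nn as nn

class DiscriminatorNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 5), nn.Tanh(),     # hidden layer 1: 5 neurons
            nn.Linear(5, 5), nn.Tanh(),     # hidden layer 2: 5 neurons
            nn.Linear(5, 1), nn.Sigmoid(),  # p_a: 1 ≈ driver action, 0 ≈ generated
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```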
Step 4.3, obtaining a continuous following segment by uniform sampling from the driver data set.
In this embodiment, the continuous segment obtained by uniform sampling from the driver data set is recorded as: τ^E = {(s_1^E, a_1^E), (s_2^E, a_2^E), ..., (s_m^E, a_m^E)}, where s_m^E and a_m^E respectively denote the following state and the driver's actual action at step m.
Uniform sampling from the driver data set means that each of its segments is drawn with equal probability.
Step 4.4, inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment; the simulated following segment is then input to the discrimination neural network, which judges how realistic the output of the policy generation neural network is.
The initial host-vehicle state and the lead-vehicle trajectory of the continuous following segment obtained in step 4.3 are taken as the simulation scenario. Denoting the current policy network parameters by ω_now, the policy generation neural network π(a|s; ω_now) performs probability sampling of each action to control the interaction of the host vehicle with the environment; the simulation stops when a stopping condition is met, yielding the simulated following segment τ^G = {(s_1^G, a_1^G), (s_2^G, a_2^G), ..., (s_n^G, a_n^G)}, where s_n^G and a_n^G respectively denote the following state and the policy network's output action at step n of the simulation.
the condition for stopping the simulation is any one of the following:
finishing the reading of the sample data;
the two vehicles collide (the relative distance d is less than or equal to 0);
speed v of host vehicle h ≤0。
In this step, m and n represent the lengths of two consecutive heel fragments, respectively;
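In this sketch, simple point-mass kinematics stand in for the CarSim host model of the embodiment, and PolicyNet is the sketch given above:

```python
# Replay the lead-vehicle speed trace of a sampled driver segment while the
# current policy controls the host vehicle; returns the simulated segment τ^G.
import numpy as np
import torch

def rollout(policy, segment, v_lead, dt=0.2):
    d, v_h = float(segment["s"][0, 0]), float(segment["s"][0, 1])  # initial host state
    states, actions = [], []
    for v_l in v_lead:                         # loop ends when sample data run out
        v_r = v_l - v_h
        s = torch.tensor([d, v_h, v_r])
        with torch.no_grad():
            a, _ = policy.act(s)               # probability sampling of a_h
        a = float(a)
        states.append([d, v_h, v_r])
        actions.append([a])
        v_h += a * dt                          # forward-Euler host-vehicle update
        d += v_r * dt
        if d <= 0.0 or v_h <= 0.0:             # collision, or host vehicle stopped
            break
    return np.array(states, np.float32), np.array(actions, np.float32)
```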
When the discrimination neural network judges how realistic the output of the policy generation neural network is, in a specific embodiment the cross entropy is defined as the step-k return function:
r_k = log D(s_k, a_k; θ)
Substituting the agent-environment interaction trajectory into the return function yields a trajectory with a return at each step:
τ = {(s_1^G, a_1^G, r_1), (s_2^G, a_2^G, r_2), ..., (s_n^G, a_n^G, r_n)}
where s_n^G and a_n^G respectively denote the following state and the policy network's output action at step n of the simulation.
In this way, the discrimination neural network judges how realistic the output of the policy generation neural network is: the closer the output is to 1, the more "true" the action (it matches driver behavior); the closer to 0, the more "false" (the action was generated by the policy generation neural network).
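The return labeling can then be written compactly; this sketch uses the DiscriminatorNet above, and the small epsilon guard against log(0) is an added numerical assumption:

```python
# Label each simulated step with the surrogate return r_k = log D(s_k, a_k; θ).
import torch

def label_returns(disc, states, actions, eps=1e-8):
    s = torch.as_tensor(states)        # (n, 3) simulated following states
    a = torch.as_tensor(actions)       # (n, 1) policy-generated actions
    with torch.no_grad():
        p = disc(s, a).squeeze(-1)     # p_a in (0, 1): 1 ≈ "driver-like"
    return torch.log(p + eps)          # r_k is larger when the step looks human
```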
Step 4.5, training the policy generation neural network with a plurality of continuous following segments.
A plurality of continuous following segments can be obtained through step 4.3, and these are used to train the policy generation neural network. Specifically, the Proximal Policy Optimization (PPO) method is adopted to update the policy generation neural network during training, with the objective function defined as:
J(ω | ω_now) = E[ Σ_t ( π(a_t|s_t; ω) / π(a_t|s_t; ω_now) ) · r_t ]
where ω denotes the network parameters to be updated, ω_now denotes the current network parameters, t is the step index, π() is the policy generation neural network, and r_t is the return function.
A clip-based proximal policy optimization objective is designed, with the updated performance target:
J_clip(ω | ω_now) = E[ Σ_t min( ρ_t · r_t, clip(ρ_t, 1 - ε, 1 + ε) · r_t ) ], where ρ_t = π(a_t|s_t; ω) / π(a_t|s_t; ω_now)
The policy generation neural network parameters are then updated as:
ω_new = argmax_ω J_clip(ω | ω_now)
where ω_new denotes the updated network parameters, clip(x, a, b) = min(max(x, a), b) is the truncation function shown in fig. 5, and ε is the truncation parameter. In this embodiment, Adam-based stochastic gradient ascent is used to solve for ω_new.
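One PPO-style update under this clipped objective might look like the following sketch, using the per-step returns r_k in place of an advantage estimate as described above; ε = 0.2 is an assumed value:

```python
# Single clipped-ratio policy update; opt is an Adam optimizer over policy params.
import torch

def ppo_update(policy, opt, states, actions, returns, old_log_probs, eps=0.2):
    s = torch.as_tensor(states)
    a = torch.as_tensor(actions)
    new_log_probs = policy.dist(s).log_prob(a).squeeze(-1)
    ratio = torch.exp(new_log_probs - old_log_probs)      # π(a|s; ω) / π(a|s; ω_now)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
    objective = torch.min(ratio * returns, clipped * returns).sum()
    opt.zero_grad()
    (-objective).backward()                               # gradient ascent on J_clip
    opt.step()
```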
Step 4.6, training the discrimination neural network with a plurality of simulated following segments.
Through step 4.5, a plurality of simulated following segments are obtained, and these are used to train the discrimination neural network. Specifically, the loss function during training is defined as:
G(θ) = -(1/m) Σ_{k=1}^{m} log D(s_k^E, a_k^E; θ) - (1/n) Σ_{k=1}^{n} log(1 - D(s_k^G, a_k^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment. The smaller the first term, the closer D(s^E, a^E; θ) is to 1 on driver data; the smaller the second term, the closer D(s^G, a^G; θ) is to 0 on generated data. The smaller the discriminator loss function G, the better the discriminator can tell whether a system input came from the driver or was generated by the decision network.
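Written out as code, the two-term loss looks as follows; a single plain-SGD step on it realizes exactly the θ_new = θ_old - λ·∂G(θ)/∂θ update of step 4.7 (the epsilon guard is an added numerical assumption):

```python
# Discriminator loss G(θ): driver pairs pushed toward 1, generated pairs toward 0.
import torch

def discriminator_loss(disc, s_E, a_E, s_G, a_G, eps=1e-8):
    p_E = disc(s_E, a_E)               # m driver (expert) state-action pairs
    p_G = disc(s_G, a_G)               # n policy-generated state-action pairs
    return -torch.log(p_E + eps).mean() - torch.log(1.0 - p_G + eps).mean()
```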
Step 4.7, updating the discrimination neural network parameters by gradient descent.
Specifically, the parameter update formula is:
θ_new = θ_old - λ · ∂G(θ)/∂θ
where θ_old denotes the current and θ_new the updated discrimination network parameters, λ is the learning rate, and ∂G(θ)/∂θ is the derivative of the loss with respect to θ.
Steps 4.3-4.7 are executed repeatedly: a plurality of continuous following segments are obtained from the driver data set, the agent continually interacts with the environment in trial and error, and the policy generation neural network and the discrimination neural network are trained until convergence. The policy generation neural network obtained at convergence is the vehicle personalized car-following control model, and its output is the personalized following control strategy. The alternation of these steps is sketched below.
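This sketch reuses PolicyNet, DiscriminatorNet, rollout, label_returns, discriminator_loss and ppo_update from above; the iteration count and learning rates are illustrative, and the per-segment lead-speed traces v_leads are an assumed logging detail:

```python
# Alternating GAIL-style training loop over steps 4.3-4.7.
import numpy as np
import torch

def train(dataset, v_leads, iters=2000, lr_pi=3e-4, lr_d=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    policy, disc = PolicyNet(), DiscriminatorNet()
    opt_pi = torch.optim.Adam(policy.parameters(), lr=lr_pi)
    opt_d = torch.optim.SGD(disc.parameters(), lr=lr_d)    # plain gradient descent
    for _ in range(iters):
        i = int(rng.integers(len(dataset)))
        seg = dataset[i]                                   # expert segment τ^E
        s_G, a_G = rollout(policy, seg, v_leads[i])        # simulated segment τ^G
        # Steps 4.6/4.7: gradient-descent step on the discriminator loss G(θ).
        loss = discriminator_loss(disc,
                                  torch.as_tensor(seg["s"]), torch.as_tensor(seg["a"]),
                                  torch.as_tensor(s_G), torch.as_tensor(a_G))
        opt_d.zero_grad(); loss.backward(); opt_d.step()
        # Step 4.5: PPO ascent on the returns r_k = log D(s_k, a_k; θ).
        returns = label_returns(disc, s_G, a_G)
        with torch.no_grad():
            old_lp = policy.dist(torch.as_tensor(s_G)).log_prob(
                torch.as_tensor(a_G)).squeeze(-1)
        ppo_update(policy, opt_pi, s_G, a_G, returns, old_lp)
    return policy                                          # the personalized model
```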
Step 5, performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
With the technical scheme of this embodiment, no reward function needs to be defined manually: through the generation network and the discrimination network, the policy network better matches the driver's behavior characteristics, so the following control strategy better matches the driver's driving habits.
At the same time, driver driving data are collected in different simulated scenarios on a driving simulator; the system structure is simple, the cost is low, and data collection is safer than driving on a real road.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present application. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application and are intended to be included within the scope of the appended claims and description.
Claims (8)
1. A vehicle personalized car-following control method based on generative adversarial imitation learning, comprising the following steps:
establishing a simulated car-following environment, wherein the environment comprises a road model, a host vehicle and a lead vehicle;
setting different speed profiles for the lead vehicle in the simulated car-following environment;
carrying out simulated-driving following tests in the simulation environment under the different lead-vehicle speed profiles, collecting driving data of the host vehicle and the lead vehicle to obtain continuous following segments, and selecting a plurality of continuous following segments to establish a driver following data set;
constructing a vehicle personalized car-following control model from the driver following data set by a generative adversarial imitation learning method, wherein constructing the vehicle personalized car-following control model comprises: establishing a policy generation neural network with the relative distance and relative speed between the host vehicle and the lead vehicle and the host-vehicle speed as inputs and the host-vehicle longitudinal acceleration as output; establishing a discrimination neural network with the relative distance, relative speed, host-vehicle speed and host-vehicle longitudinal acceleration as inputs and a truth value between false and true as output; obtaining a continuous following segment by uniform sampling from the driver data set: τ^E = {(s_1^E, a_1^E), (s_2^E, a_2^E), ..., (s_m^E, a_m^E)}, where s_m^E and a_m^E respectively denote the following state and the driver's actual action at step m; inputting the obtained continuous following segment into the policy generation neural network to interact with the simulation environment, obtaining a simulated following segment: τ^G = {(s_1^G, a_1^G), (s_2^G, a_2^G), ..., (s_n^G, a_n^G)}, where s_n^G and a_n^G respectively denote the following state and the policy network's output action at step n of the simulation; and inputting the simulated following segment into the discrimination neural network, which judges how realistic the output of the policy generation neural network is;
training the policy generation neural network with a plurality of continuous following segments, wherein the objective function of the policy generation neural network during training is:
J(ω' | ω_now) = E[ Σ_t ( π(a_t|s_t; ω') / π(a_t|s_t; ω_now) ) · r_t ]
where ω' denotes the policy generation neural network parameters to be updated, ω_now denotes the current policy generation neural network parameters, t is the step index, π() is the policy generation neural network, and r_t is the return function;
training the discrimination neural network with a plurality of simulated following segments, wherein the loss function of the discrimination neural network during training is:
G(θ) = -(1/m) Σ_{k=1}^{m} log D(s_k^E, a_k^E; θ) - (1/n) Σ_{k=1}^{n} log(1 - D(s_k^G, a_k^G; θ))
where m is the number of sampling points of the continuous following segment, D() is the discrimination neural network, and n is the number of sampling points of the simulated following segment;
updating the discrimination neural network parameters by gradient descent, with the parameter update formula:
θ_new = θ_old - λ · ∂G(θ)/∂θ
where θ_old denotes the current discrimination neural network parameters, θ_new denotes the updated discrimination neural network parameters, λ is the learning rate, and ∂G(θ)/∂θ denotes the derivative with respect to θ; and
performing personalized car-following control of the vehicle with the vehicle personalized car-following control model.
2. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein establishing the simulated car-following environment comprises:
building the simulated car-following environment on an automatic-driving simulation platform;
modeling the host-vehicle dynamics with vehicle-dynamics simulation software;
and describing the motion of surrounding vehicles with a random traffic-flow model.
3. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the different speed profiles of the lead vehicle include: constant-speed driving, decelerating driving, emergency braking and random speed variation.
4. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein the following state includes: the relative distance d between the host vehicle and the lead vehicle, their relative speed v_r, and the host-vehicle speed v_h; and the action is the host-vehicle longitudinal acceleration a_h.
5. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the policy generation neural network has 3 input-layer neurons, corresponding to the relative distance, the host-vehicle speed and the relative speed; 1 output-layer neuron, the host-vehicle longitudinal acceleration; and 2 hidden layers with 5 neurons each; the policy generation neural network is expressed as:
f = π(a|s; ω)
where a denotes the action, a = [a_h]; s denotes the vehicle state, s = [d, v_h, v_r]; and ω denotes the policy generation neural network parameters.
6. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 4, wherein the discrimination neural network has 4 input-layer neurons, corresponding to the relative distance, the host-vehicle speed, the relative speed and the host-vehicle longitudinal acceleration, and 1 output-layer neuron whose value lies in (0, 1); it has 2 hidden layers with 5 neurons each; the discrimination neural network is expressed as:
p_a = D(s, a; θ) ∈ (0, 1)
where s denotes the vehicle state, s = [d, v_h, v_r]; a denotes the action, a = [a_h]; and θ denotes the discrimination neural network parameters.
7. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein when a continuous following segment is obtained by uniform sampling from the driver data set, the initial host-vehicle state and the lead-vehicle trajectory in the segment are taken as the simulation scenario; the policy generation neural network, with its current parameters, performs probability sampling of each action to control the interaction of the host vehicle with the environment; the simulation stops when a stopping condition is met, and the simulated following segment data are recorded; the stopping conditions include:
the sample data have been read to the end;
the two vehicles collide;
the host-vehicle speed is less than or equal to 0.
8. The vehicle personalized car-following control method based on generative adversarial imitation learning of claim 1, wherein when the discrimination neural network judges how realistic the output of the policy generation neural network is, the cross entropy is defined as the step-k return function r_k = log D(s_k, a_k; θ); substituting the agent-environment interaction trajectory into the return function yields a trajectory with a return at each step:
τ = {(s_1^G, a_1^G, r_1), (s_2^G, a_2^G, r_2), ..., (s_n^G, a_n^G, r_n)}
where s_n^G and a_n^G respectively denote the following state and the policy network's output action at step n of the simulation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111568497.3A CN114148349B (en) | 2021-12-21 | 2021-12-21 | Vehicle personalized car-following control method based on generative adversarial imitation learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111568497.3A CN114148349B (en) | 2021-12-21 | 2021-12-21 | Vehicle personalized car-following control method based on generative adversarial imitation learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN114148349A CN114148349A (en) | 2022-03-08 |
CN114148349B true CN114148349B (en) | 2023-10-03 |
Family
ID=80451718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111568497.3A Active CN114148349B (en) | 2021-12-21 | 2021-12-21 | Vehicle personalized following control method based on generation of countermeasure imitation study |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114148349B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117698685B (en) * | 2024-02-06 | 2024-04-09 | 北京航空航天大学 | Dynamic scene-oriented hybrid electric vehicle self-adaptive energy management method |
CN118560530B (en) * | 2024-08-02 | 2024-10-01 | 杭州电子科技大学 | Multi-agent driving behavior modeling method based on generative adversarial imitation learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109624986A (en) * | 2019-03-01 | 2019-04-16 | 吉林大学 | Learning cruise control system and method for driving style based on mode switching
CN109733415A (en) * | 2019-01-08 | 2019-05-10 | 同济大学 | Human-like automatic driving car-following model based on deep reinforcement learning
CN111483468A (en) * | 2020-04-24 | 2020-08-04 | 广州大学 | Unmanned vehicle lane-change decision-making method and system based on adversarial imitation learning
CN111795700A (en) * | 2020-06-30 | 2020-10-20 | 浙江大学 | Unmanned vehicle reinforcement learning training environment construction method and training system thereof |
CN111982137A (en) * | 2020-06-30 | 2020-11-24 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for generating route planning model |
CN112201069A (en) * | 2020-09-25 | 2021-01-08 | 厦门大学 | Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver |
CN112580149A (en) * | 2020-12-22 | 2021-03-30 | 浙江工业大学 | Vehicle-following model generation method based on generative adversarial network and driving duration
CN113010967A (en) * | 2021-04-22 | 2021-06-22 | 吉林大学 | Intelligent automobile in-loop simulation test method based on mixed traffic flow model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109284280B (en) * | 2018-09-06 | 2020-03-24 | 百度在线网络技术(北京)有限公司 | Simulation data optimization method and device and storage medium |
JP2022516383A (en) * | 2018-10-16 | 2022-02-25 | ファイブ、エーアイ、リミテッド | Autonomous vehicle planning |
2021-12-21: application CN202111568497.3A filed in China; granted as CN114148349B (status: Active).
Also Published As
Publication number | Publication date |
---|---|
CN114148349A (en) | 2022-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhu et al. | Typical-driving-style-oriented personalized adaptive cruise control design based on human driving data | |
CN106874597B (en) | highway overtaking behavior decision method applied to automatic driving vehicle | |
CN110949398B (en) | Method for detecting abnormal driving behavior of first-vehicle drivers in vehicle formation driving | |
CN107264534B (en) | Based on the intelligent driving control system and method for driver experience's model, vehicle | |
CN113010967B (en) | Intelligent automobile in-loop simulation test method based on mixed traffic flow model | |
CN110321954A (en) | The driving style classification and recognition methods of suitable domestic people and system | |
CN111845701A (en) | HEV energy management method based on deep reinforcement learning in car following environment | |
CN111332362B (en) | Intelligent steer-by-wire control method integrating individual character of driver | |
CN111775949A (en) | Personalized driver steering behavior assisting method of man-machine driving-sharing control system | |
CN109709956A (en) | A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding | |
CN114148349B (en) | Vehicle personalized following control method based on generation of countermeasure imitation study | |
CN104462716B (en) | A kind of the brain-computer interface parameter and kinetic parameter design method of the brain control vehicle based on people's bus or train route model | |
CN113581182B (en) | Automatic driving vehicle lane change track planning method and system based on reinforcement learning | |
CN111204348A (en) | Method and device for adjusting vehicle running parameters, vehicle and storage medium | |
CN111267830A (en) | Hybrid power bus energy management method, device and storage medium | |
CN111783943B (en) | LSTM neural network-based driver braking strength prediction method | |
CN113901718A (en) | Deep reinforcement learning-based driving collision avoidance optimization method in following state | |
Selvaraj et al. | An ML-aided reinforcement learning approach for challenging vehicle maneuvers | |
CN116432448B (en) | Variable speed limit optimization method based on intelligent network coupling and driver compliance | |
CN110320916A (en) | Consider the autonomous driving vehicle method for planning track and system of occupant's impression | |
CN117719535A (en) | Human feedback automatic driving vehicle interactive self-adaptive decision control method | |
CN116629114A (en) | Multi-agent model training method, system, computer equipment and storage medium | |
Xu et al. | Modeling Lateral Control Behaviors of Distracted Drivers for Haptic-Shared Steering System | |
CN113033902B (en) | Automatic driving lane change track planning method based on improved deep learning | |
Liu et al. | Personalized Automatic Driving System Based on Reinforcement Learning Technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |