
CN109709956A - Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle - Google Patents

Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle

Info

Publication number
CN109709956A
CN109709956A
Authority
CN
China
Prior art keywords
car-following
data
ttc
headway
ngsim
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811600366.7A
Other languages
Chinese (zh)
Other versions
CN109709956B (en)
Inventor
王雪松 (Xuesong Wang)
朱美新 (Meixin Zhu)
孙平 (Ping Sun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201811600366.7A
Publication of CN109709956A
Application granted
Publication of CN109709956B
Legal status: Active
Anticipated expiration

Landscapes

  • Traffic Control Systems (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention develops a multi-objective optimized car-following algorithm for autonomous vehicle speed control. The algorithm proposes a car-following speed-control model based on deep reinforcement learning that does not merely imitate human driving but directly optimizes driving safety, efficiency, and comfort. A reward function reflecting driving safety, efficiency, and comfort is constructed from time-to-collision (TTC), the empirical time-headway distribution, and jerk. The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the modeled car-following behavior is compared with the behavior observed in the NGSIM empirical data. Through testing and trial and error in a simulated environment, the reinforcement-learning agent learns to control vehicle speed safely, comfortably, and efficiently by maximizing cumulative reward. The results show that, compared with human drivers in the real world, the proposed car-following speed-control algorithm exhibits better safe, efficient, and comfortable driving capability.

Description

Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle
Technical field
The present invention relates to the field of car-following control for autonomous driving, and in particular to a multi-objective optimized car-following algorithm for autonomous vehicle speed control.
Background technique
Car-following control is an important component of intelligent decision making in autonomous driving. It covers speed selection under free driving, maintenance of the inter-vehicle gap over time, and braking in emergencies. While autonomous vehicles and human drivers coexist, autonomous vehicles that make car-following decisions similar to those of human drivers (so-called human-like decisions) will improve passenger comfort and trust, and will also help other traffic participants better understand and predict the behavior of autonomous vehicles, enabling safe interaction between autonomous and human driving. Traditional car-following models, however, have many limitations when applied to automatic car-following control: their flexibility and accuracy are limited, they are difficult to generalize to driving scenarios beyond the calibration data, and when applied to autonomous driving they cannot reflect the driving styles and driving scenarios of actual drivers.
Deep Reinforcement Learning (DRL) is widely used in industrial manufacturing, simulation, robot control, optimization and scheduling, game playing, and other fields. Its basic idea is to learn the optimal policy for completing a task by maximizing the cumulative reward the agent obtains from its environment. DRL methods focus on learning a strategy that solves the problem rather than fitting the data, so their generalization ability is stronger, providing a reference for car-following control of autonomous vehicles.
Summary of the invention
The purpose of the present invention is a multi-objective optimized car-following algorithm for autonomous vehicle speed control. The algorithm proposes a car-following speed-control model that directly optimizes driving safety, efficiency, and comfort. A reward function reflecting driving safety, efficiency, and comfort is constructed from time-to-collision (TTC), the empirical time-headway distribution, and jerk. The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the modeled car-following behavior is compared with the behavior observed in the NGSIM empirical data. Through testing and trial and error in a simulated environment, the reinforcement-learning agent learns to control vehicle speed safely, comfortably, and efficiently by maximizing cumulative reward. The results show that, compared with human drivers in the real world, the proposed car-following speed-control algorithm exhibits better safe, efficient, and comfortable driving capability.
The technical scheme adopted by the invention is as follows:
A multi-objective optimized car-following algorithm for autonomous vehicle speed control, comprising the following steps:
Step 1: Obtain data. Using data from the NGSIM project, car-following events are extracted according to criteria such as the leading and following vehicles remaining in the same lane and the car-following event lasting longer than 15 seconds. The extracted events are divided into a training set and a test set.
Step 2: Construct the reward function. Feature quantities reflecting the objectives of car-following control (safety, comfort, efficiency) are proposed.
Step 2.1: Safety is reflected by time-to-collision (TTC). TTC denotes the time remaining before the two vehicles would collide: TTCn(t) = Sn−1,n(t)/ΔVn−1,n(t), where Sn−1,n(t) is the spacing between the leading vehicle n−1 and the following vehicle n, and ΔVn−1,n(t) is their relative (closing) speed. Based on the NGSIM empirical data, a safety threshold of 7 seconds is determined and the TTC feature is constructed such that if TTC is below 7 seconds, the feature is negative; as TTC approaches zero, the feature approaches negative infinity, imposing the most severe punishment on near-collision situations.
Step 2.2: Driving efficiency is measured by time headway. Analysis shows that a lognormal distribution fits the distribution of the acquired training data, with probability density function f(x) = (1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²)), x > 0. From the extracted data, the log-mean μ and log-standard-deviation σ of the variable x are estimated as 0.4226 and 0.4365, respectively. The time-headway feature is constructed as the probability density value of the estimated lognormal headway distribution: Fheadway = flognormal(headway | μ = 0.4226, σ = 0.4365). Under this feature, a time headway of about 1.3 seconds corresponds to a high feature value, while too-long or too-short headways correspond to low feature values; the feature therefore rewards the gap keeping typical of high-flow traffic while punishing dangerous or overly distant gap keeping.
Step 2.3: Driving comfort is measured by jerk, the rate of change of acceleration, from which the comfort feature Fjerk is constructed.
Step 2.4: Establish the comprehensive reward function. Following the steps above, R = w1·FTTC + w2·Fheadway + w3·Fjerk, where w1, w2, w3 are the feature coefficients, all set to 1.
Step 3: Train the model. In each training run, the car-following events in the data are simulated in sequence. Training is repeated several times, and the model that obtains the maximum average reward on the test data is selected as the final model.
Step 4: Evaluate the model. Using indices such as TTC, headway, and jerk, the NGSIM data are compared with the car-following behavior simulated by the DDPG model.
The invention has the following advantages:
1. The developed car-following control logic can be applied to the development of autonomous vehicles;
2. The algorithm model does not imitate human driving but directly optimizes driving safety, efficiency, and comfort.
Detailed description of the invention
Fig. 1 is the flow chart of the invention.
Fig. 2 compares driving safety between the NGSIM data and the DDPG model.
Fig. 3 compares driving comfort between the NGSIM data and the DDPG model.
Specific embodiment
The algorithm proposes a car-following speed-control model based on deep reinforcement learning. Rather than imitating human driving, the model directly optimizes driving safety, efficiency, and comfort. A reward function reflecting driving safety, efficiency, and comfort is constructed from time-to-collision (TTC), the empirical time-headway distribution, and jerk. The model is trained on real driving data from the Next Generation Simulation (NGSIM) project, and the modeled car-following behavior is compared with the behavior observed in the NGSIM empirical data. Through testing and trial and error in a simulated environment, the reinforcement-learning agent learns to control vehicle speed safely, comfortably, and efficiently by maximizing cumulative reward. The results show that, compared with human drivers in the real world, the proposed car-following speed-control algorithm exhibits better safe, efficient, and comfortable driving capability.
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The steps are as follows:
Step 1: Obtain data. Using data from the Next Generation Simulation (NGSIM) project, car-following events are extracted according to criteria such as the leading and following vehicles remaining in the same lane and the car-following event lasting longer than 15 seconds. The extracted events are divided into a training set and a test set.
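The event extraction in Step 1 can be sketched as follows, assuming an NGSIM-style trajectory table with hypothetical `vehicle_id`, `leader_id`, `lane_id`, and `frame` columns sampled at 10 Hz; the real NGSIM column names, the "leader 0 means no leader" convention, and any additional filters are assumptions, not taken from the patent text:

```python
# Sketch of Step 1: extract car-following events longer than 15 s in which
# the same follower stays behind the same leader in the same lane.
# Column names and the leader_id == 0 "no leader" convention are assumptions.
import pandas as pd

def extract_following_events(df: pd.DataFrame, min_duration_s: float = 15.0,
                             dt: float = 0.1) -> list:
    """Return trajectory segments satisfying the car-following criteria."""
    events = []
    for (veh, leader), g in df.groupby(["vehicle_id", "leader_id"]):
        if leader == 0:  # assumed convention: 0 means no leading vehicle
            continue
        g = g.sort_values("frame")
        # start a new segment whenever frames are non-consecutive
        # or the lane changes
        new_segment = (g["frame"].diff() != 1) | (g["lane_id"].diff() != 0)
        for _, seg in g.groupby(new_segment.cumsum()):
            if len(seg) * dt > min_duration_s:
                events.append(seg)
    return events
```

Each surviving segment corresponds to one car-following event, which can then be assigned to the training or test set.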
Step 2: Construct the reward function. Feature quantities reflecting the objectives of car-following control (safety, comfort, efficiency) are proposed.
Step 2.1: Safety is reflected by time-to-collision (TTC). TTC denotes the time remaining before the two vehicles would collide: TTCn(t) = Sn−1,n(t)/ΔVn−1,n(t), where Sn−1,n(t) is the spacing between the leading vehicle n−1 and the following vehicle n, and ΔVn−1,n(t) is their relative (closing) speed. Based on the NGSIM empirical data, a safety threshold of 7 seconds is determined and the TTC feature is constructed such that if TTC is below 7 seconds, the feature is negative; as TTC approaches zero, the feature approaches negative infinity, imposing the most severe punishment on near-collision situations.
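A minimal sketch of the Step 2.1 safety feature. The patent gives the feature formula only as an image, so the log-ratio form `log(TTC / 7)` below is an assumption; it matches the described behavior (zero at or above the 7-second threshold, negative below it, tending to negative infinity as TTC approaches zero):

```python
import math

TTC_THRESHOLD = 7.0  # seconds, safety threshold determined from NGSIM data

def ttc(spacing_m: float, closing_speed_ms: float) -> float:
    """Time-to-collision: spacing divided by closing speed
    (follower speed minus leader speed). Infinite when not closing."""
    if closing_speed_ms <= 0:
        return math.inf
    return spacing_m / closing_speed_ms

def f_ttc(ttc_s: float) -> float:
    """Safety feature: 0 when TTC >= 7 s, negative below 7 s,
    approaching -inf as TTC -> 0 (assumed log-ratio form)."""
    if 0 < ttc_s < TTC_THRESHOLD:
        return math.log(ttc_s / TTC_THRESHOLD)
    return 0.0
```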
Step 2.2: Driving efficiency is measured by time headway. Analysis shows that a lognormal distribution fits the distribution of the acquired training data, with probability density function f(x) = (1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²)), x > 0. From the extracted data, the log-mean μ and log-standard-deviation σ of the variable x are estimated as 0.4226 and 0.4365, respectively. The time-headway feature is constructed as the probability density value of the estimated lognormal headway distribution: Fheadway = flognormal(headway | μ = 0.4226, σ = 0.4365). Under this feature, a time headway of about 1.3 seconds corresponds to a high feature value, while too-long or too-short headways correspond to low feature values; the feature therefore rewards the gap keeping typical of high-flow traffic while punishing dangerous or overly distant gap keeping.
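Step 2.2's efficiency feature is the fitted lognormal density evaluated at the current headway. A self-contained sketch with the μ and σ reported in the text; note that the density peaks at exp(μ − σ²) ≈ 1.26 s, consistent with the "about 1.3 seconds" remark:

```python
import math

MU, SIGMA = 0.4226, 0.4365  # fitted log-mean and log-standard-deviation

def lognormal_pdf(x: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Lognormal probability density, defined for x > 0."""
    if x <= 0:
        return 0.0
    coeff = 1.0 / (x * sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))

def f_headway(headway_s: float) -> float:
    """Efficiency feature: density of the fitted headway distribution,
    high near ~1.3 s and low for very short or very long headways."""
    return lognormal_pdf(headway_s)
```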
Step 2.3: Driving comfort is measured by jerk, the rate of change of acceleration, from which the comfort feature Fjerk is constructed.
Step 2.4: Establish the comprehensive reward function. From steps 2.1, 2.2, and 2.3 above, R = w1·FTTC + w2·Fheadway + w3·Fjerk, where w1, w2, w3 are the feature coefficients, all set to 1.
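Steps 2.3 and 2.4 combine into the scalar reward. The jerk-feature formula is image-only in the source, so the scaled quadratic penalty below is an assumption (a common choice in DRL car-following work); the weighted sum and the unit weights w1 = w2 = w3 = 1 follow the text:

```python
def f_jerk(jerk_ms3: float) -> float:
    """Comfort feature: penalize large jerk magnitudes.
    The quadratic form and the 3600 scale are assumptions."""
    return -(jerk_ms3 ** 2) / 3600.0

def reward(f_ttc_val: float, f_headway_val: float, f_jerk_val: float,
           w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    """Step 2.4: R = w1*F_TTC + w2*F_headway + w3*F_jerk, weights all 1."""
    return w1 * f_ttc_val + w2 * f_headway_val + w3 * f_jerk_val
```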
Step 3: Train the model. In each training run, the car-following events in the data are simulated in sequence. Training is repeated several times, and the model that obtains the maximum average reward on the test data is selected as the final model.
Step 4: Evaluate the model. Using indices such as TTC, headway, and jerk, the NGSIM data are compared with the car-following behavior simulated by the DDPG model.
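Steps 3 and 4 reduce to repeated training with model selection on held-out reward. In this sketch, `train_once` and the per-event scoring are hypothetical placeholders for the DDPG training loop and the simulation, which the patent does not spell out:

```python
def evaluate(model, test_events) -> float:
    """Mean cumulative reward of `model` over held-out events.
    Here a model is any callable mapping an event to its episode reward."""
    return sum(model(ev) for ev in test_events) / len(test_events)

def select_best_model(train_once, test_events, n_runs: int = 5):
    """Step 3: repeat training several times and keep the model with
    the highest average reward on the test data."""
    best_model, best_score = None, float("-inf")
    for run in range(n_runs):
        model = train_once(run)        # placeholder for one DDPG run
        score = evaluate(model, test_events)
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```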
Embodiment
By comparing the empirical NGSIM data with the car-following behavior produced by the DDPG model, it is verified that the model can follow the leading vehicle safely, efficiently, and comfortably.
Obtain data. Using data from the NGSIM project, car-following events are extracted according to criteria such as the leading and following vehicles remaining in the same lane and the car-following event lasting longer than 15 seconds.
For driving safety, one car-following event was randomly selected from the NGSIM data set. Fig. 2 shows the observed speed, spacing, and acceleration, together with the corresponding index values generated by the DDPG model. After 10 seconds, the driver in the NGSIM data drove with a very small following distance, whereas the DDPG model maintained a following gap of about 10 meters.
For driving comfort, one car-following event was randomly selected from the NGSIM data set. Fig. 3 shows the observed speed, spacing, acceleration, and jerk values, together with the corresponding index values generated by the DDPG model. The driver in the NGSIM data produced frequent acceleration changes and large jerk values while driving, whereas the DDPG model maintained a nearly constant acceleration and produced low jerk values.
Based on the above, the proposed car-following speed-control algorithm exhibits better safe, efficient, and comfortable driving capability than the human drivers in NGSIM.

Claims (1)

1. A multi-objective optimized car-following algorithm for autonomous vehicle speed control, characterized in that the steps are as follows:
Step 1: obtain data; using data from the NGSIM project, extract car-following events according to criteria such as the leading and following vehicles remaining in the same lane and the car-following event lasting longer than 15 seconds; divide the extracted events into training data and test data;
Step 2: construct the reward function; propose feature quantities reflecting the objectives of car-following control (safety, comfort, efficiency);
Step 2.1: reflect safety by time-to-collision (TTC); TTC denotes the time remaining before the two vehicles would collide: TTCn(t) = Sn−1,n(t)/ΔVn−1,n(t), where Sn−1,n(t) is the inter-vehicle spacing and ΔVn−1,n(t) is the relative (closing) speed; based on the NGSIM empirical data, determine a safety threshold of 7 seconds and construct the TTC feature such that if TTC is below 7 seconds the feature is negative and approaches negative infinity as TTC approaches zero, imposing the most severe punishment on near-collision situations;
Step 2.2: measure driving efficiency by time headway; analysis shows that a lognormal distribution fits the distribution of the acquired training data, with probability density function f(x) = (1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²)), x > 0; from the extracted data, the log-mean μ and log-standard-deviation σ of the variable x are estimated as 0.4226 and 0.4365, respectively; construct the time-headway feature as the probability density value of the estimated lognormal headway distribution: Fheadway = flognormal(headway | μ = 0.4226, σ = 0.4365); under this feature, a time headway of about 1.3 seconds corresponds to a high feature value, while too-long or too-short headways correspond to low feature values, so the feature rewards the gap keeping typical of high-flow traffic while punishing dangerous or overly distant gap keeping;
Step 2.3: measure driving comfort by jerk, the rate of change of acceleration, from which the comfort feature Fjerk is constructed;
Step 2.4: establish the comprehensive reward function; from the steps above, R = w1·FTTC + w2·Fheadway + w3·Fjerk, where w1, w2, w3 are the feature coefficients, all set to 1;
Step 3: train the model; in each training run, simulate the car-following events in the data in sequence; repeat training several times and select, as the final model, the model that obtains the maximum average reward on the test data;
Step 4: evaluate the model; using indices such as TTC, headway, and jerk, compare the NGSIM data with the car-following behavior simulated by the DDPG model.
CN201811600366.7A 2018-12-26 2018-12-26 Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle Active CN109709956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811600366.7A CN109709956B (en) 2018-12-26 2018-12-26 Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle


Publications (2)

Publication Number Publication Date
CN109709956A true CN109709956A (en) 2019-05-03
CN109709956B CN109709956B (en) 2021-06-08

Family

ID=66258357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811600366.7A Active CN109709956B (en) 2018-12-26 2018-12-26 Multi-objective optimized following algorithm for controlling speed of automatic driving vehicle

Country Status (1)

Country Link
CN (1) CN109709956B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100185369A1 (en) * 2009-01-19 2010-07-22 Jung-Woong Choi Automatic transmission
CN101606794A (en) * 2009-07-17 2009-12-23 梁秀芬 A kind of dynamic cinema seat equipment
CN102955884A (en) * 2012-11-23 2013-03-06 同济大学 Safety distance calibration method in full-speed areas during following operation of high-speed train
CN103101559A (en) * 2013-02-16 2013-05-15 同济大学 Full-speed field train interval real-time control method based on car-following behavior quality evaluation
CN103248545A (en) * 2013-05-28 2013-08-14 北京和利时电机技术有限公司 Ethernetcommunication method and system for special effect broadcast system of dynamic cinema
CN105654779A (en) * 2016-02-03 2016-06-08 北京工业大学 Expressway construction area traffic flow coordination control method based on vehicle-road and vehicle-vehicle communication
CN106926844A (en) * 2017-03-27 2017-07-07 西南交通大学 A kind of dynamic auto driving lane-change method for planning track based on real time environment information
CN108313054A (en) * 2018-01-05 2018-07-24 北京智行者科技有限公司 The autonomous lane-change decision-making technique of automatic Pilot and device and automatic driving vehicle
CN108387242A (en) * 2018-02-07 2018-08-10 西南交通大学 Automatic Pilot lane-change prepares and executes integrated method for planning track
CN108492398A (en) * 2018-02-08 2018-09-04 同济大学 The method for early warning that drive automatically behavior based on accelerometer actively acquires
CN108932840A (en) * 2018-07-17 2018-12-04 北京理工大学 Automatic driving vehicle urban intersection passing method based on intensified learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MEIXIN ZHU, et al.: "Modeling car-following behavior on urban expressways in Shanghai: A naturalistic driving study", Transportation Research Part C *
XUESONG WANG, et al.: "Drivers' rear end collision avoidance behaviors under different levels of situational urgency", Transportation Research Part C *
WANG Xuesong, ZHU Meixin, XING Yilun: "Influence of collision avoidance warning on car-following behavior based on naturalistic driving data", Journal of Tongji University (Natural Science) *
WANG Xuesong, ZHU Meixin, CHEN Ming: "Dimension reduction and multivariate analysis of variance of drivers' forward collision avoidance behavior characteristics", Journal of Tongji University (Natural Science) *
WANG Xuesong, et al.: "Comparative study of road traffic accident data collection technologies in China and the United States", China Safety Science Journal *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321605A (en) * 2019-06-19 2019-10-11 中汽研(天津)汽车工程研究院有限公司 A kind of human-computer interaction coordination control strategy based on Multiple Velocity Model PREDICTIVE CONTROL
CN110347043A (en) * 2019-07-15 2019-10-18 武汉天喻信息产业股份有限公司 A kind of intelligent driving control method and device
CN110347043B (en) * 2019-07-15 2023-03-10 武汉天喻信息产业股份有限公司 Intelligent driving control method and device
CN110488802A (en) * 2019-08-21 2019-11-22 清华大学 A kind of automatic driving vehicle dynamic behaviour decision-making technique netted under connection environment
CN110716562A (en) * 2019-09-25 2020-01-21 南京航空航天大学 Decision-making method for multi-lane driving of unmanned vehicle based on reinforcement learning
WO2021073523A1 (en) * 2019-10-15 2021-04-22 同济大学 Method for estimating road capacity and connected automatic driving vehicle equivalent coefficient
TWI745120B (en) * 2019-10-18 2021-11-01 日商豐田自動車股份有限公司 Vehicle control system, vehicle control device, and control method for a vehicle
CN112677984A (en) * 2019-10-18 2021-04-20 丰田自动车株式会社 Method for generating vehicle control data, vehicle control device, and vehicle control system
EP3809340A1 (en) * 2019-10-18 2021-04-21 Toyota Jidosha Kabushiki Kaisha Method of generating vehicle control data, vehicle control device, and vehicle control system
US11673556B2 (en) 2019-10-18 2023-06-13 Toyota Jidosha Kabushiki Kaisha Method of generating vehicle control data, vehicle control device, and vehicle control system
AU2020256407B2 (en) * 2019-10-18 2022-03-03 Toyota Jidosha Kabushiki Kaisha Method of generating vehicle control data, vehicle control device, and vehicle control system
CN112677984B (en) * 2019-10-18 2024-08-13 丰田自动车株式会社 Method for generating control data for vehicle, control device for vehicle, and control system for vehicle
CN112698578A (en) * 2019-10-22 2021-04-23 北京车和家信息技术有限公司 Automatic driving model training method and related equipment
CN112698578B (en) * 2019-10-22 2023-11-14 北京车和家信息技术有限公司 Training method of automatic driving model and related equipment
CN110843746A (en) * 2019-11-28 2020-02-28 的卢技术有限公司 Anti-lock brake control method and system based on reinforcement learning
WO2021165113A1 (en) * 2020-02-17 2021-08-26 Psa Automobiles Sa Method for training at least one algorithm for a control device of a motor vehicle, method for optimising traffic flow in a region, computer program product, and motor vehicle
CN112201069A (en) * 2020-09-25 2021-01-08 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112201069B (en) * 2020-09-25 2021-10-29 厦门大学 Deep reinforcement learning-based method for constructing longitudinal following behavior model of driver
CN112614344B (en) * 2020-12-14 2022-03-29 中汽研汽车试验场股份有限公司 Hybrid traffic system efficiency evaluation method for automatic driving automobile participation
CN112614344A (en) * 2020-12-14 2021-04-06 中汽研汽车试验场股份有限公司 Hybrid traffic system efficiency evaluation method for automatic driving automobile participation
CN113353102A (en) * 2021-07-08 2021-09-07 重庆大学 Unprotected left-turn driving control method based on deep reinforcement learning
CN113353102B (en) * 2021-07-08 2022-11-25 重庆大学 Unprotected left-turn driving control method based on deep reinforcement learning
CN113954865A (en) * 2021-09-22 2022-01-21 吉林大学 Following control method for automatically driven vehicle in ice and snow environment
CN113954865B (en) * 2021-09-22 2023-11-10 吉林大学 Following control method for automatic driving vehicle in ice and snow environment
CN113901718A (en) * 2021-10-11 2022-01-07 长安大学 Deep reinforcement learning-based driving collision avoidance optimization method in following state
CN113954874A (en) * 2021-11-03 2022-01-21 同济大学 Automatic driving control method based on improved intelligent driver model
CN114056332B (en) * 2022-01-14 2022-04-12 清华大学 Intelligent automobile following decision and control method based on cognitive risk balance
CN114056332A (en) * 2022-01-14 2022-02-18 清华大学 Intelligent automobile following decision and control method based on cognitive risk balance
CN115123159A (en) * 2022-06-27 2022-09-30 重庆邮电大学 AEB control method and system based on DDPG deep reinforcement learning

Also Published As

Publication number Publication date
CN109709956B (en) 2021-06-08

Similar Documents

Publication Publication Date Title
CN109709956A (en) A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
CN109733415B (en) Anthropomorphic automatic driving and following model based on deep reinforcement learning
CN108595823B (en) Autonomous main vehicle lane changing strategy calculation method combining driving style and game theory
CN102109821B (en) System and method for controlling adaptive cruise of vehicles
CN109726804B (en) Intelligent vehicle driving behavior personification decision-making method based on driving prediction field and BP neural network
CN107168303A (en) A kind of automatic Pilot method and device of automobile
CN112906126B (en) Vehicle hardware in-loop simulation training system and method based on deep reinforcement learning
CN111409648B (en) Driving behavior analysis method and device
CN110956851B (en) Intelligent networking automobile cooperative scheduling lane changing method
CN115457782B (en) Automatic driving vehicle intersection conflict-free cooperation method based on deep reinforcement learning
Ni et al. A unified perspective on traffic flow theory, part III: validation and benchmarking
CN113901718A (en) Deep reinforcement learning-based driving collision avoidance optimization method in following state
CN114896869A (en) Automatic driving test scene generation method based on personalized driver model
CN115257789A (en) Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment
CN117610681A (en) Automatic driving automobile decision-making method based on imitation learning and discrete reinforcement learning
CN114148349B (en) Vehicle personalized following control method based on generation of countermeasure imitation study
CN114492043B (en) Personalized driver following modeling method considering perception limited characteristics
CN114802306A (en) Intelligent vehicle integrated decision-making system based on man-machine co-driving concept
CN117877245A (en) Novel heterogeneous mixed traffic flow model grading evaluation and construction method
Guo et al. Modeling, learning and prediction of longitudinal behaviors of human-driven vehicles by incorporating internal human DecisionMaking process using inverse model predictive control
Teng et al. Car following model based on driving risk field for vehicle infrastructure cooperation
CN112801149B (en) Multi-vehicle-queue control method based on deep reinforcement learning
Liu et al. Personalized Automatic Driving System Based on Reinforcement Learning Technology
Wei et al. A learning-based autonomous driver: emulate human driver's intelligence in low-speed car following
CN116127853A (en) Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant