Disclosure of Invention
The technical problems to be solved by the invention are as follows:
urban traffic jam frequently occurs, so that the time cost of resident trip is increased, and the experience of trip is reduced. At present, the research aiming at the space-time correlation of traffic jam still has the advantages of dividing and treating time and space, and the function of excavating jam evolution is not realized. The invention provides a regional traffic jam propagation track mining method based on a cross-transaction space-time association rule, which can be used for mining a source region of a jam and a range and a direction of jam propagation, so that a jam evacuation scheme can be prepared in advance, traffic flows in the jam region can be shunted in time, and traffic conditions of other regions are prevented from being influenced by the jam propagation.
In order to achieve the above object, the present invention provides a method for mining a propagation trajectory of regional traffic congestion,
step 1: dividing an urban road area of a research area into grid areas according to 1km by 1km grids;
step 2: calculating average speed and flow data of the grid area;
and step 3: standardizing the average speed and flow data information of each grid area;
and 4, step 4: the normalized average speed and flow data of each grid area are clustered to obtain the traffic state classification of each grid area, and the traffic parameter data set of each grid area is divided by using a clustering algorithm fusing 'kmeans + +' and FCM to generate 4 classifications, namely unblocked, basically unblocked, slightly blocked and blocked. The traffic parameter data set comprises average speed and flow data for a grid area;
and 5: marking whether the grid area is congested or not by using '0-1', if the grid area is congested in a certain period, marking as 1, otherwise, marking as 0; extracting the time slot when the road is congested and the position information of the congestion, wherein the position information is the number information of the grid area;
step 6: and constructing a sliding window transaction set according to the time reference point in a classified manner, applying a cross-transaction space-time association rule to the sliding window transaction set, and excavating a regional traffic jam propagation track.
The traffic data is collected by a bayonet device arranged near an intersection, and two adjacent bayonet devices in the same direction form 1 road section; firstly, counting the number of road sections and bayonet codes in each grid area; secondly, counting the average speed and the traffic flow of vehicles passing through each road section in a certain time period to obtain the average speed of the road section in the time period; and finally, averaging the average speeds of all road sections in the grid area to obtain the average speed of the grid area, and summing the flow data of all road sections in the grid to obtain the flow data of the grid area.
The time granularity of counting the average speed and traffic flow of vehicles passing through each road section is 15 minutes, and each day is divided into 96 time periods.
The average speed of the vehicles in each road section is calculated as follows:
first, an average speed calculation formula when the ith vehicle passes through the section R is calculated as follows:
wherein,
the average speed of the ith vehicle passing through the road section R; l
RIs the length of the road segment R;
time when the ith vehicle passes through the road section R;
secondly, the average speed of the vehicles on the road section is calculated according to the following formula:
wherein, Vt,RIs the average speed of the road section R over the time period t; t is a time interval number, and t is 0,1,2, … …, 95; the number of vehicles passing through the road section R in the t period is nvol;
the calculation formula of the average velocity of each mesh region is as follows:
wherein, Vgrid,tThe average speed of grid area grid in a time period t; v. oft,RThe average speed of the R road section in the t period is obtained; t is a time interval number, and t is 0,1,2, … …, 95; num is the number of segments within the grid area grid.
The formula for calculating the traffic flow passing through the bayonet m by the time period t is as follows:
wherein, vehiclem,t0Is at t0A vehicle passing through the gate m in time; t is ti、tjRepresenting the start and end times of a period t; q. q.st,mThe flow rate passing through the bayonet m is a time period t; t is a time interval number, and t is 0,1,2, … …, 95;
the calculation formula of the flow data of the grid area is as follows:
wherein Q isgrid,tRepresenting the average flow of grid in a t period; num represents the number of checkpoints within the grid region grid.
The step6 specifically comprises the following steps:
step1, inputting a multi-element time sequence merging set, a minimum support degree, a minimum confidence degree, a time granularity size lambda and the length omega of a sliding time window;
step2, constructing a sliding time window transaction set according to the time reference point in a classified manner, and applying a cross-transaction space-time association rule to the sliding time window transaction set;
aiming at a certain time reference point, the steps of Step3-Step5 are carried out:
step3, counting the support degree of each item set in the sliding window transaction set, and screening according to the minimum support degree to obtain a frequent 1-item set of the time reference point;
step4, recursion is carried out on the frequent 1-item set by using a connecting Step and a pruning Step to form a frequent 2-item set, and then a frequent k-item set is calculated in sequence;
step5, calculating the confidence coefficient of the frequent k-item set, and screening according to the minimum confidence coefficient to generate a strong association rule;
step6, merging strong association rules obtained by mining omega time reference points, merging if a certain rule back piece of a previous time reference point is the same as a certain rule front piece of a next time reference point, and mining a plurality of strong association rules so as to mine the regional traffic jam propagation track.
In Step3, the support factor calculation formula is as follows:
frq (I (o ', t)) is a support degree count of the congestion item set I with a time period t and a position o'; n [ t ] is the number of transactions occurring in time period t; and o' is the location information of the congested area.
Step4, connecting Step: remembering frequent k-item set as Lk,LkFrom a set of candidate k-items CkFurther screening results in a candidate k-term set CkFrom Lk-1Self-ligation to generate, Lk-1When all subsets are connected with each other, space constraint, time constraint and attribute constraint are required to be met, the constraints are met, self-connection is carried out, and a candidate k-item set C is formedk。
Step4, pruning Step: ckIs a frequent k-term set LkSuperset of (C)kMay not be all frequent, deleting items that are not likely to be a frequent k-item set first; if one waitsK-1 item set of the selected k-item set is not in Lk-1In this case, the candidate k-item set is unlikely to become a frequent item set, and is selected from CkThe middle deletion is a pruning process.
In Step5, the confidence calculation formula is as follows:
wherein,
o' and o "represent two different location information.
The invention has the beneficial effects that:
the method and the system utilize the historical traffic data set for clustering, determine the congestion state and mark the congestion state for the traffic data. After the time and the place of the congestion are comprehensively considered, the congestion space-time data is used for constructing a sliding window transaction set, a cross-transaction space-time association rule is applied to the transaction set, and the propagation track of the congestion between the regions is mined. According to the invention, the propagation track of congestion in a certain area can be predicted through the historical traffic state of a road section, the specific probability is displayed, and the transmission of the congestion in time and space is explored from different dimensions by crossing transaction space-time association rules, so that the evolution rule of the congestion is obtained. The analysis result can provide a travel scheme for travelers and a traffic control direction for traffic managers, and congestion is prevented from spreading to adjacent areas. The data used by the method is easy to collect, and the congestion rule is quickly and conveniently acquired by a big data mining method, so that the method has good practical application value.
Detailed Description
The invention is described in further detail below with reference to figures 1-4.
The basic concepts involved in the algorithm are first introduced:
definition 1: congestion time series aggregation: let S be { S ═ S1,s2,...sn}。TiIs a set of congestion zones, which is a set of observation values, T, for S in a time period ii={s1(i),s2(i),...sn(i) And (4) defining a multi-element time sequence merging set D as: d ═ T1,T2,...Tm}. D Each set of observations TiAs a transaction set, each is assigned an identification number TID.
And based on the congestion time sequence collection in the definition 1, recombining the congestion time sequence collection according to the time sequence to form a data base of the cross-transaction space-time association rule in the definition 2.
Definition 2: cross-transaction spatiotemporal association rules: is provided with
∑={e1(0),e1(1),...e1(ω-1),e2(0),e2(1),...e2(ω-1)...eu(0),...eu(ω -1) } is a set of congestion entries that are each sequence observations in the multi-element time-series merged set D, ω being the length of the sliding time window of D. Let time s (s is more than or equal to 1 and less than or equal to n-omega +1) be a reference time reference point of D, and if time s + gamma (gamma is more than or equal to 0 and less than or equal to omega-1) congestion item eiIf it occurs, mark ei(γ) belongs to Ts. Each congestion item ei(γ) assigning an identification number TID. Then the cross-transaction spatiotemporal association rule is an implication in the form of X → Y, and at the same time the following conditions are satisfied:
1)X∈∑,Y∈∑;
6)X=[ti,oi,v1],Y=[tj,oj,v1]x and Y satisfy the constraint defined in definition 5, v1Indicates congestion, oi、ojTwo distinct regions.
Definition 3:
item and item set: let F ═ F1,F2,......Fp) For a collection of all items, called item sets (items), each FbThe term (b) ═ 1, 2.... p) refers to an item, and the number of items contained in F (l) ═ k refers to a k-item set, for example, the set { westbill congestion, college road congestion, beijing south station congestion } is a 3-item set.
Definition 4:
sliding time window transaction set: and arranging the transaction sets according to the time sequence, wherein the sliding time window transaction set is formed by item sets corresponding to the continuous omega time points. The time period t ═ 7:00-7:15,7:15-7:30, … …,21:45-22:00 studied]The time series set S has n time series, each time series including a time period t. Constructing Σ from definitions 1 and 2, generates a sliding time window transaction set with different time reference points. If the set sliding time window is too large, the calculation efficiency of the algorithm is influenced; however, if the sliding time window is small, the influence of the duration of the time on the study content cannot be mined. So the appropriate ω is selected from the study purpose and transaction data set size. As shown in FIG. 4, wherein TiSet of congested areas for time period I, Ii,1And the congestion area set of the first time series in the period i is shown, n is the total number of the time series, omega is the length of a sliding time window, and TID is the number of the transaction set.
Definition 5: and (3) time constraint:
wherein, △ t is the time interval of the difference between two research periods, | △ t | ═ time-step, tα dα th period on day dβ kβ th time period of the z-th day, and the lambda is the time particle size and the continuity of the reaction time.
And (3) space constraint:
after the congestion event occurs, the congestion event spreads to the surrounding areas, namely, the congestion event is spatially adjacent to the surrounding areas, and the spatial relationship is shown in fig. 3.
And (4) attribute constraint:
wherein o isα(i is 1,2, … …, n) indicates a certain spatial position, 0 and 1 indicate whether the traffic is congested, and 0 indicates that the traffic is uncongested in a certain area in an adjacent time slot.
Definition 6:
the support degree is as follows: and (3) deforming based on the support degree of the classical association rule, and calculating the probability of the occurrence of an item set containing time, space and attribute constraints in the transaction set, so that the final rule can reflect the evolution rule of congestion in space and time. And when the support degree of the calculated k-item set is greater than the set minimum support degree, namely the k-item set is a frequent k-item set, and the k-item set is called a candidate k-item set before being compared with the minimum support degree. The support calculation formula is as follows:
frq (I (o ', t)) is a support degree count of the congestion item set I with a time period t and a position o'; n [ t ] is the number of transactions occurring in time period t; and o' is the location information of the congested area.
Definition 7:
confidence coefficient: based on the support obtained by definition 6, calculating the confidence of the cross-transaction spatio-temporal association rule, wherein the calculation formula is as follows:
wherein
o' and o "represent two different location information.
As shown in fig. 1 to 4, the method for mining an area traffic jam propagation track according to the present invention includes the following steps:
step 1: dividing an urban road area of a research area into grid areas according to 1km by 1km grids;
step 2: calculating average speed and flow data of the grid area;
the traffic data is collected by the gate equipment arranged near the intersection, and two adjacent gate equipment in the same direction form 1 road section; firstly, counting the number of road sections and bayonet codes in each grid area; secondly, counting the average speed and the traffic flow of vehicles passing through each road section in a certain time period to obtain the average speed of the road section in the time period; finally, averaging the average speeds of all road sections in the grid area to obtain the average speed of the grid area, and summing the flow data of all road sections in the grid to obtain the flow data of the grid area; counting the time granularity of the average speed and the traffic flow of vehicles passing through each road section to be 15 minutes, dividing each day into 96 time periods, and obtaining the speed and the flow data of each time period of each grid area according to the method;
the average speed of the vehicles in each road section is calculated as follows:
first, an average speed calculation formula when the ith vehicle passes through the section R is calculated as follows:
wherein,
is the ith vehicleAverage speed of the vehicle as it passes through section R; l
RInputting longitude and latitude information of upstream and downstream checkpoints of the road section in a mode of calling a Baidu map API (application program interface) for the length of the road section R, so as to obtain the distance between the upstream and downstream checkpoints of the road section;
is the time when the ith vehicle passes through the section R.
Secondly, the average speed of the vehicles on the road section is calculated according to the following formula:
wherein, Vt,RIs the average speed of the road section R over the time period t; t is a time interval number, where t is 0,1,2, … …,95, that is, the time interval is divided by taking 15min as an interval, one day is divided into 96 time intervals, and the time span for acquiring data is d days; the number of vehicles passing through the road section R in the t period is nvol;
the calculation formula of the average velocity of each mesh region is as follows:
wherein, Vgrid,tThe average speed of grid area grid in a time period t; v. oft,RThe average speed of the R road section in the t period is obtained; t is a time interval number, and t is 0,1,2, … …, 95; num is the number of segments within the grid area grid.
The formula for calculating the traffic flow passing through the bayonet m by the time period t is as follows:
wherein, vehiclem,t0Is at t0A vehicle passing through the gate m in time; t is ti、tjRepresenting the start and end times of a period t; q. q.st,mThe flow rate passing through the bayonet m is a time period t; t is the slot number, and t is 0,1,2, … …, 95.
The calculation formula of the flow data of the grid area is as follows:
wherein Q isgrid,tRepresenting the average flow of grid in a t period; num represents the number of checkpoints within the grid region grid.
And step 3: and standardizing the average speed and flow data information of each grid area.
And 4, step 4: the normalized average speed and flow data of each grid area are clustered to obtain the traffic state classification of each grid area, and the traffic parameter data set of each grid area is divided by using a clustering algorithm fusing 'kmeans + +' and FCM to generate 4 classifications, namely unblocked, basically unblocked, slightly blocked and blocked. The traffic parameter data set includes average speed and flow data for a grid area.
And 5: marking whether the grid area is congested or not by using '0-1', if the grid area is congested in a certain period, marking as 1, otherwise, marking as 0; and extracting the time slot when the road is jammed and the position information of the jam, wherein the position information is the number information of the grid area.
Step 6: and (4) constructing a sliding window transaction set according to definition 4 in a classification mode according to the time reference points, applying a cross-transaction space-time association rule to the sliding window transaction set, and excavating a regional traffic jam propagation track. The method specifically comprises the following steps:
step 1: inputting a multi-element time sequence merging set, a minimum support degree, a minimum confidence degree, a time granularity size lambda and the length omega of a sliding time window.
Step 2: a sliding time window transaction set is constructed according to definition 4, sorted by temporal reference point, as shown in figure 4. A cross-transaction spatiotemporal association rule is applied to a sliding time window transaction set.
Taking a certain time reference point as an example:
step 3: the support count is counted for each set of items in the sliding window transaction set. And obtaining the frequent 1-item set of the time reference point according to the minimum support degree screening. The support calculation formula is shown as follows:
frq (I (o ', t)) is a support degree count of the congestion item set I with a time period t and a position o'; n [ t ] is the number of transactions occurring in time period t; and o' is the location information of the congested area.
Step 4: and recursion is carried out on the frequent 1-item set by using a connecting step and a pruning step to form a frequent 2-item set, then the frequent k-item sets are sequentially calculated, and the interaction of congestion on a time line is explored. Setting omega to be 2, therefore, defining time-step to be 15min in 5, namely mining the current time period tαAnd t isα+1Frequent 2-item sets between periods.
Wherein the connection step is as follows: remembering frequent k-item set as Lk,LkFrom a set of candidate k-items CkFurther screening results in a candidate k-term set CkFrom Lk-1Self-ligation to generate, Lk-1When all subsets are connected with each other, whether the spatial property of the connection item satisfies the spatial constraint needs to be considered, as shown in fig. 3; and whether the time constraint is satisfied, as in equation 8. The congestion propagation characteristics are considered while the attribute constraints need to be satisfied, as in equation 9. Can self-connect to form a candidate k-item set C only when the above factors are satisfiedk。
Pruning: ckIs a frequent k-term set LkA superset of (i), i.e. CkMay not be all frequent, so deleting items that are not likely to be a frequent k-item set first is beneficial to reducing computational costs. If the k-1 item set of a candidate k-item set is not in Lk-1Then the candidate k-term set is unlikely to become a frequent term set, and is thus driven from CkThe middle deletion is a pruning process.
Step 5: and calculating the confidence coefficient of the frequent k-item set, and screening according to the minimum confidence coefficient to generate a strong association rule.
Step 6: and (3) merging the strong association rules obtained by mining omega time reference points, namely merging if the back part of a certain rule of the former time reference point is the same as the front part of a certain rule of the latter time reference point, mining a plurality of strong association rules, and mining the regional traffic jam propagation track as shown in definition 2.
The confidence calculation formula is as follows:
wherein,
o' and o "represent two different location information.
Those not described in detail in this specification are within the skill of the art.