CN103035123B - Method and system for acquiring abnormal data in traffic trajectory data
- Publication number
- CN103035123B (CN201210572392.XA)
- Authority
- CN
- China
- Prior art keywords
- gps data
- sub
- area
- data points
- threshold
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Traffic Control Systems (AREA)
Abstract
Description
Technical field
The invention belongs to the technical field of intelligent transportation, and in particular relates to a method and system for acquiring abnormal data in traffic trajectory data.
Background
In recent years the number of motor vehicles in cities has grown rapidly, causing problems such as traffic jams and difficulty parking or hailing a taxi that seriously affect the quality of everyday travel. Urban traffic networks are also becoming more complex, placing ever higher demands on traffic management systems. Intelligent transportation is a central part of the blueprint for future smart cities. Its core is understanding how motor vehicles actually travel and using that understanding for road planning and traffic management, which necessarily rests on large-scale analysis of real driving data from large numbers of vehicles. The spread of in-vehicle GPS devices in urban traffic has made it feasible to obtain vehicle positions. Taxis are a typical example: every taxi sends the GPS data recorded while driving back to the taxi company's server to support dispatching, so a large volume of GPS information covering the urban traffic network can be collected. Research and service development based on taxi driving data is under way in government, industry, and academia.
The first problem in using massive vehicle GPS data is how to find and handle the abnormal elements in it. Abnormal data has several causes. First, the limited accuracy of civilian GPS devices scatters GPS data within some range around the true position. Second, problems inherent to GPS positioning, such as the multipath effect, can produce GPS data with large deviations. Third, a GPS device may simply not be working properly for lack of maintenance, which is fairly common across a large number of vehicles. Using erroneous GPS data distorts the results of subsequent traffic analysis, so removing abnormal elements from GPS data is important.
Prior-art methods for removing abnormal elements from GPS data mainly rely on information other than the GPS data itself to judge whether the data is good or bad. For example, the extent of the city can be obtained, including the boundary of the city served by the taxis and the areas inside it that vehicles cannot enter, such as mountains, rivers, lakes, and sea, so that GPS data lying outside the boundary or inside such an area can be marked as bad. The average speed of a vehicle can also be calculated from its GPS data; if the value is abnormally large, for example far above the urban speed limit for motor vehicles, the GPS data can be considered erroneous. In addition, many studies use road matching, a technique that uses urban geographic information (GIS) to match every GPS data point to a specific city road and marks GPS data that cannot be reasonably matched as erroneous.
Of these methods, the one based on city extent is very coarse and cannot judge GPS data that lies within the drivable area. The average-speed method can identify part of the erroneous data caused by multipath and by malfunctioning GPS devices, but its range of application is narrow. Road matching with GIS not only requires accurate GIS road information but also needs a large amount of probability analysis during matching, which hurts its performance.
In summary, prior-art methods for removing abnormal elements from GPS data need information other than the GPS data to judge whether the data is good or bad, and the judgment process is complicated.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a method for acquiring abnormal data in traffic trajectory data, aimed at solving the problem that prior-art methods for removing abnormal elements from GPS data need information other than the GPS data to judge whether the data is good or bad, and that the judgment process is complicated.
To achieve this purpose, the embodiments of the present invention provide the following technical solutions:
An embodiment of the present invention is implemented as a method for acquiring abnormal data in traffic trajectory data, the method comprising:
acquiring the GPS data of the traffic trajectory to be analyzed;
determining the minimum area over which the GPS data is distributed, and dividing that area into a plurality of sub-areas of equal area;
counting the number of GPS data points contained in each sub-area;
analyzing the virtual path that matches the GPS data points in each sub-area;
determining, from the number of GPS data points contained in each sub-area and the virtual path, whether the GPS data in each sub-area is abnormal data.
An embodiment of the present invention also provides a system for acquiring abnormal data in traffic trajectory data, the system comprising:
a GPS data acquisition unit, configured to acquire the GPS data of the traffic trajectory to be analyzed;
a determination unit, configured to determine the minimum area over which the GPS data is distributed and to divide that area into a plurality of sub-areas of equal area;
a statistics unit, configured to count the number of GPS data points contained in each sub-area;
an analysis unit, configured to analyze the virtual path that matches the GPS data points in each sub-area;
an abnormal data acquisition unit, configured to determine, from the number of GPS data points contained in each sub-area and the virtual path, whether the GPS data in each sub-area is abnormal data.
Compared with the prior art, the embodiments of the present invention have the following beneficial effect: the GPS data of the traffic trajectory to be analyzed is acquired, the minimum area over which the GPS data is distributed is determined and divided into a plurality of sub-areas of equal area, the number of GPS data points in each sub-area is counted, the virtual path matching the GPS data points in each sub-area is analyzed, and whether the GPS data in each sub-area is abnormal is determined from the point count and the virtual path. The judgment of abnormal data does not depend on any additional database, which reduces the cost of using a database and the cost of the data analysis process, and the implementation is simple.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for acquiring abnormal data in traffic trajectory data provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of determining whether the GPS data in each sub-area is abnormal data, provided by Embodiment 1 of the present invention;
Fig. 3a is a schematic diagram, provided by Embodiment 1 of the present invention, of traffic trajectory data after abnormal data has been removed by a prior-art method;
Fig. 3b is a schematic diagram, provided by Embodiment 1 of the present invention, of traffic trajectory data after abnormal data has been removed by the method of the present invention;
Fig. 4 is a structural diagram of the system for acquiring abnormal data in traffic trajectory data provided by Embodiment 2 of the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the present invention and do not limit it.
This embodiment provides a method for acquiring abnormal data in traffic trajectory data, the method comprising:
acquiring the GPS data of the traffic trajectory to be analyzed;
determining the minimum area over which the GPS data is distributed, and dividing that area into a plurality of sub-areas of equal area;
counting the number of GPS data points contained in each sub-area;
analyzing the virtual path that matches the GPS data points in each sub-area;
determining, from the number of GPS data points contained in each sub-area and the virtual path, whether the GPS data in each sub-area is abnormal data.
An embodiment of the present invention also provides a system for acquiring abnormal data in traffic trajectory data, the system comprising:
a GPS data acquisition unit, configured to acquire the GPS data of the traffic trajectory to be analyzed;
a determination unit, configured to determine the minimum area over which the GPS data is distributed and to divide that area into a plurality of sub-areas of equal area;
a statistics unit, configured to count the number of GPS data points contained in each sub-area;
an analysis unit, configured to analyze the virtual path that matches the GPS data points in each sub-area;
an abnormal data acquisition unit, configured to determine, from the number of GPS data points contained in each sub-area and the virtual path, whether the GPS data in each sub-area is abnormal data.
The implementation of the present invention is described in detail below with reference to specific embodiments:
Embodiment 1
Fig. 1 shows the flow chart of the method for acquiring abnormal data in traffic trajectory data provided by Embodiment 1 of the present invention, detailed as follows:
In S101, the GPS data of the traffic trajectory to be analyzed is acquired;
In this embodiment, GPS data is usually expressed in the form (longitude, latitude) and can be treated as points on a two-dimensional map drawn in longitude and latitude.
In S102, the minimum area over which the GPS data is distributed is determined, and that area is divided into a plurality of sub-areas of equal area;
In this embodiment, S102 may be implemented as follows:
1. Determine the minimum rectangular area over which the GPS data is distributed, with the vertical sides of the rectangle parallel to the meridians and the horizontal sides parallel to the parallels of latitude;
2. Divide the minimum rectangular area into a plurality of sub-areas using a family of equidistant meridians and a family of equidistant parallels, each sub-area being a rectangle.
In this embodiment, the minimum area and the sub-areas may be square, circular, rectangular, and so on; the shape of the minimum area may be chosen according to actual processing needs.
In this embodiment, the area may be divided into sub-areas by a grid. Preferably an equally spaced grid is used so that all sub-areas have the same size, which allows one consistent criterion to be applied to every sub-area in later steps. The equally spaced grid also lets the size of each sub-area be set freely. Preferably the longest side of each rectangular sub-area is less than 5 times the average road width, so that each sub-area contains essentially only one road trajectory and that trajectory is approximately straight within the sub-area; the sub-area must not, however, be so small that the direction of the road trajectory is lost. For example, the longest side of each sub-area may be set to no more than 200 meters, roughly five times the average width of an urban motor-vehicle road.
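As a rough illustration of the sizing rule above, the following sketch converts a target cell edge in meters into longitude and latitude steps. The figure of about 111,320 m per degree of latitude and the cosine scaling for longitude are a common spherical approximation, not something the patent specifies; the function name and the sample latitude (roughly that of Shenzhen) are hypothetical.

```python
import math

# Assumption: spherical-Earth approximation, about 111.32 km per degree of latitude.
METERS_PER_DEG_LAT = 111_320.0

def cell_steps_deg(cell_edge_m, ref_lat_deg):
    """Return (lon_step, lat_step) in degrees for a cell about cell_edge_m on each side."""
    lat_step = cell_edge_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    lon_step = cell_edge_m / (METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat_deg)))
    return lon_step, lat_step

# 200 m cells at a reference latitude of 22.5 degrees north (illustrative value).
lon_step, lat_step = cell_steps_deg(200.0, 22.5)
print(lon_step, lat_step)
```

Because the longitude step depends on latitude, a grid of equidistant meridians and parallels gives cells of only approximately equal ground area; over a single city the difference is small.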
An example of the division is given below, without limitation to this case. Suppose the minimum rectangular area of the geographic region over which the GPS data is distributed is $A$, with vertical sides parallel to the meridians and horizontal sides parallel to the parallels of latitude. The longitude range of $A$ is $(h_{\min}, h_{\max})$ and its latitude range is $(l_{\min}, l_{\max})$. The families of equidistant meridians and equidistant parallels divide $A$ into the set of sub-areas $\{a_i \mid i = 1, 2, \ldots, N \times M\}$, where $i$ is the unique number the system assigns to each sub-area, $M$ is the number of sub-areas divided along the meridian direction, $N$ is the number of sub-areas divided along the parallel direction, and each sub-area is itself a rectangle.
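The division in this example can be sketched as below. This is an illustration under assumptions: the row-major numbering, the clamping of points on the upper edges, and the names `bounding_box`, `cell_id`, `cols`, and `rows` are all hypothetical, and the sketch assumes the bounding box has nonzero width and height.

```python
def bounding_box(points):
    """Smallest axis-aligned rectangle (h_min, h_max, l_min, l_max) around (lon, lat) points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), max(lons), min(lats), max(lats)

def cell_id(lon, lat, box, cols, rows):
    """Map a point to a sub-area number in 1..cols*rows (row-major order, an assumption)."""
    h_min, h_max, l_min, l_max = box
    # Points lying exactly on the upper edges are clamped into the last column or row.
    col = min(int((lon - h_min) / (h_max - h_min) * cols), cols - 1)
    row = min(int((lat - l_min) / (l_max - l_min) * rows), rows - 1)
    return row * cols + col + 1

# Illustrative coordinates only.
pts = [(114.00, 22.50), (114.10, 22.60), (114.03, 22.58)]
box = bounding_box(pts)
ids = [cell_id(lon, lat, box, cols=50, rows=50) for lon, lat in pts]
print(ids)
```

Computing the index arithmetically avoids testing each point against every sub-area boundary, which matters when the grid has many cells.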
In S103, the number of GPS data points contained in each sub-area is counted;
In this embodiment, within the minimum rectangular area with longitude range $(h_{\min}, h_{\max})$ and latitude range $(l_{\min}, l_{\max})$, the boundaries of all sub-areas $a_i$ are determined, each GPS data point is assigned to the sub-area it belongs to, and the number of GPS data points contained in each sub-area is then counted.
In this embodiment, because the volume of traffic trajectory data is huge, the GPS data points are preferably processed in parallel during counting.
Further, the Hadoop parallel data processing system may be used to analyze and organize the GPS data points: the Map operation in Hadoop assigns the sub-area numbers, and the Reduce operation obtains the number of GPS data points contained in each sub-area. The result is a series of small files, each named after a sub-area number and containing the information about the GPS data in that sub-area, for example the number of GPS data points, the longitude and latitude of each GPS data point, and other additional information, such as the timestamp of each point and the number of the vehicle it belongs to, which are needed when analyzing taxi trajectories.
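The Map and Reduce roles described above can be imitated in plain Python as follows. This is only a sketch of the data flow, not Hadoop's actual API; the record fields (`lon`, `lat`, `taxi`, `ts`), the helper names, and the toy cell function are hypothetical.

```python
from collections import defaultdict

def map_phase(records, cell_of):
    """Emit (cell_key, record) pairs; cell_of maps (lon, lat) to a sub-area key."""
    for rec in records:
        yield cell_of(rec["lon"], rec["lat"]), rec

def reduce_phase(pairs):
    """Group records by sub-area and count them, as a Reduce step would."""
    groups = defaultdict(list)
    for key, rec in pairs:
        groups[key].append(rec)
    return {key: {"count": len(recs), "points": recs} for key, recs in groups.items()}

def toy_cell(lon, lat):
    """Toy keying by 0.01-degree cells, for illustration only."""
    return (int(lon * 100), int(lat * 100))

records = [
    {"lon": 114.051, "lat": 22.541, "taxi": "A", "ts": 1},
    {"lon": 114.058, "lat": 22.549, "taxi": "B", "ts": 2},
    {"lon": 114.121, "lat": 22.601, "taxi": "A", "ts": 3},
]
cells = reduce_phase(map_phase(records, toy_cell))
for key, info in sorted(cells.items()):
    print(key, info["count"])
```

Each entry of `cells` plays the part of one of the small per-sub-area files: it holds the point count together with the points and their extra fields.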
In S104, the virtual path that matches the GPS data points in each sub-area is analyzed.
In this embodiment, the virtual path matching the GPS data points in each sub-area is preferably obtained by linear regression analysis. For example, linear regression over the GPS data points of a sub-area yields the slope of the virtual path that best fits those points. If a real road exists in the sub-area, the fitted virtual path should coincide with it; conversely, if no real road exists in the sub-area, all the GPS data points in it are erroneous and the direction of the resulting virtual path is random.
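A minimal sketch of this fitting step, assuming ordinary least squares of latitude on longitude (the patent says only "linear regression analysis"), is shown below. The function name is hypothetical, the input is assumed non-empty, and returning `None` for a perfectly north-south set of points is a choice made for this sketch.

```python
def fit_virtual_path(points):
    """Least-squares line lat = slope * lon + intercept through (lon, lat) points.

    Returns None when all longitudes coincide, because a vertical path
    cannot be expressed by regressing latitude on longitude.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    if sxx == 0:
        return None
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

print(fit_virtual_path([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]))
```

For roads running close to north-south, swapping the roles of longitude and latitude or using an orthogonal fit would be more stable; that is a design consideration beyond what the source states.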
In this embodiment, the virtual path matching the GPS data points in each sub-area is again preferably obtained by parallel processing.
Further, the Hadoop framework mentioned in S103 may again be used; specifically, the regression analysis may be placed in the Reduce stage to obtain the virtual path matching the GPS data points in the current sub-area.
In S105, whether the GPS data in each sub-area is abnormal data is determined from the number of GPS data points contained in each sub-area and the virtual path.
In this embodiment, S105 may be implemented as follows. The idea is that a sub-area must contain enough data points before it can be analyzed further. If a sub-area contains a very large number of data points, it is considered highly reliable and can be judged directly to contain correct information; conversely, if a sub-area contains very few data points, the GPS data it contains can be judged directly to be abnormal. Fig. 2 shows the flow chart, provided by Embodiment 1 of the present invention, of determining whether the GPS data in each sub-area is abnormal data, detailed as follows:
In S201, it is judged whether the number of GPS data points in the current sub-area is greater than the first threshold; if not, S205 is executed; if so, S202 is executed.
In S202, it is judged whether the number of GPS data points in the current sub-area is greater than the second threshold, where the first threshold is smaller than the second threshold; if not, S203 is executed; if so, S204 is executed.
In S203, it is judged whether the standard deviation of the perpendicular distances from all GPS data points in the current sub-area to the virtual path is greater than the standard deviation threshold; if not, S204 is executed; if so, S205 is executed;
In this embodiment, the standard deviation of the perpendicular distances from all GPS data points in the current sub-area to the virtual path is calculated as follows: compute the perpendicular distance from each GPS data point to the virtual path, compute the mean of those distances, and from the individual distances and their mean compute the standard deviation.
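The calculation just described can be sketched as follows, assuming the virtual path is given as a line lat = slope * lon + intercept. The point-to-line distance formula is standard geometry; the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the source does not say whether the population or sample form is meant, and the function names are hypothetical.

```python
import math
from statistics import pstdev

def perpendicular_distances(points, slope, intercept):
    """Distance from each (x, y) point to the line y = slope * x + intercept."""
    norm = math.hypot(slope, 1.0)
    return [abs(slope * x - y + intercept) / norm for x, y in points]

def distance_std(points, slope, intercept):
    """Population standard deviation of the perpendicular distances."""
    return pstdev(perpendicular_distances(points, slope, intercept))

pts = [(0.0, 1.0), (1.0, 3.0), (2.0, 1.0), (3.0, 3.0)]
print(distance_std(pts, slope=0.0, intercept=0.0))
```

With raw longitude and latitude the result is in degrees; to compare it against a threshold expressed in meters, the coordinates would first need to be projected or scaled.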
In this embodiment, if all GPS data points lie exactly on a real road, the virtual path obtained by regression coincides with the real path, and the perpendicular distance from every GPS data point to the virtual path should be less than half the road width, so the resulting standard deviation should likewise be below the standard deviation threshold. A large standard deviation indicates a high probability that no real path exists in the sub-area.
In S204, all GPS data points in the current sub-area are judged to be normal GPS data points.
In S205, all GPS data points in the current sub-area are judged to be abnormal GPS data points;
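Steps S201 to S205 amount to a small decision function, sketched below. The function and parameter names are hypothetical; the comparisons follow the flow as described (counts at or below the first threshold are abnormal, counts above the second threshold are normal, and the standard deviation decides the cases in between).

```python
def classify_cell(count, dist_std, first_threshold, second_threshold, std_threshold):
    """Return "normal" or "abnormal" for one sub-area, following S201 to S205."""
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold must be smaller than the second")
    if count <= first_threshold:      # S201 fails, go to S205
        return "abnormal"
    if count > second_threshold:      # S202 passes, go to S204
        return "normal"
    # S203: a medium-sized sub-area is judged by the spread around its virtual path.
    return "abnormal" if dist_std > std_threshold else "normal"

# Illustrative threshold values only.
print(classify_cell(3, 0.0, 10, 500, 0.0002))
print(classify_cell(120, 0.0001, 10, 500, 0.0002))
print(classify_cell(120, 0.0010, 10, 500, 0.0002))
print(classify_cell(900, 0.0010, 10, 500, 0.0002))
```

Note that the standard deviation is consulted only for counts between the two thresholds, so the regression for very sparse or very dense sub-areas could be skipped.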
The three thresholds above, namely the first threshold, the second threshold, and the standard deviation threshold, are determined by statistical analysis of the information from all sub-areas. One analysis method that can be used is the interval ratio method, in which the thresholds are chosen so that the number of GPS data points falling in each interval delimited by the thresholds meets a given ratio.
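The source does not spell out the interval ratio method, so the following is only one possible reading: pick a count threshold such that the sub-areas at or below it together hold at least a chosen share of all GPS data points. The function name, the shares, and the sample counts are hypothetical, and the list of counts is assumed to be non-empty.

```python
def threshold_for_point_share(cell_counts, share):
    """Smallest count t such that sub-areas with count <= t hold at least `share` of all points."""
    total = sum(cell_counts)
    running = 0
    for c in sorted(cell_counts):
        running += c
        if running >= share * total:
            return c
    return max(cell_counts)

counts = [1, 1, 2, 6, 10, 80]  # points per sub-area, illustrative
first = threshold_for_point_share(counts, 0.03)
second = threshold_for_point_share(counts, 0.15)
print(first, second)
```

The same idea could be applied to the per-sub-area standard deviations to pick the standard deviation threshold.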
The results of processing traffic trajectory data with the abnormal data acquisition method of this embodiment are given below. The data comes from the operating records of 20,000 taxis in Shenzhen over one month, August 2010. Each taxi sends one message per minute to the data processing center, in the format (latitude, longitude, passenger status, time). The data is stored as plain text files, each holding the time-ordered GPS information of one taxi. The full data set comprises 20,000 text files with a total size of about 50 GB. Using the method of the present invention together with the Hadoop distributed file processing system, error detection and removal over the full data set can be completed in no more than 20 minutes (depending on the performance of the servers used to store and analyze the data), with better results than traditional methods. Fig. 3a is a schematic diagram of the traffic trajectory data after abnormal data has been removed by a prior-art method; Fig. 3b is a schematic diagram of the traffic trajectory data after abnormal data has been removed by the method of the present invention. With the method of the present invention, about 20% of all data points were removed, eliminating most of the data points lying outside plausible roads.
In this embodiment, the GPS data of the traffic trajectory to be analyzed is acquired, the minimum area over which the GPS data is distributed is determined and divided into a plurality of sub-areas of equal area, the number of GPS data points in each sub-area is counted, the virtual path matching the GPS data points in each sub-area is analyzed, and whether the GPS data in each sub-area is abnormal is determined from the point count and the virtual path. The judgment of abnormal data does not depend on any additional database, which reduces the cost of using a database and the cost of the data analysis process, and the implementation is simple.
In addition, the technical means used in the judgment process of this embodiment bring the following beneficial effects:
1. Dividing the area into many fine-grained sub-areas reduces the coupling between sub-areas, so the analysis of massive data can be carried out in parallel; processing the GPS data of each sub-area in parallel effectively improves the efficiency of abnormal data judgment.
2. Dividing the city area into a rectangular grid scaled to urban road width ensures that the road trajectory contained in each rectangular sub-area is essentially unique and straight.
3. The combined judgment, which uses both the accumulated number of GPS data points in a sub-area and a statistical analysis of the virtual path, effectively improves the accuracy of abnormal data judgment.
Embodiment 2
Fig. 4 shows the structural diagram of the system for acquiring abnormal data in traffic trajectory data provided by Embodiment 2 of the present invention. For ease of description, only the parts related to this embodiment are shown. The system may be a software unit, a hardware unit, or a combined software and hardware unit built into a traffic trajectory processing device.
The system comprises a GPS data acquisition unit 41, a determination unit 42, a statistics unit 43, an analysis unit 44, and an abnormal data acquisition unit 45.
The GPS data acquisition unit 41 is configured to acquire the GPS data of the traffic trajectory to be analyzed;
the determination unit 42 is configured to determine the minimum area over which the GPS data is distributed and to divide that area into a plurality of sub-areas of equal area;
the statistics unit 43 is configured to count the number of GPS data points contained in each sub-area;
the analysis unit 44 is configured to analyze the virtual path that matches the GPS data points in each sub-area;
the abnormal data acquisition unit 45 is configured to determine, from the number of GPS data points contained in each sub-area and the virtual path, whether the GPS data in each sub-area is abnormal data.
Optionally, the determination unit 42 includes:
A determining module, configured to determine the smallest rectangular area over which the GPS data is distributed, where the vertical sides of the rectangle are parallel to the meridians and the horizontal sides are parallel to the parallels of latitude;
A dividing module, configured to divide the smallest rectangular area into a plurality of sub-areas using a family of equidistant meridians and a family of equidistant parallels, each sub-area being a rectangle.
Optionally, each sub-area is a rectangle whose longest side is shorter than twice the average road width.
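One possible way to pick the grid resolution so that every sub-area satisfies this size constraint is sketched below. The conversion of roughly 111,320 m per degree of latitude, the spherical-Earth cosine correction for longitude, and the function name `grid_dimensions` are assumptions made for illustration; the patent does not specify how the road width is converted into a grid spacing.

```python
import math
from typing import Tuple

# Approximate length of one degree of latitude in meters (assumption).
METERS_PER_DEGREE = 111_320.0


def grid_dimensions(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                    avg_road_width_m: float) -> Tuple[int, int]:
    """Return (n_cols, n_rows) so that each cell side is shorter than 2 x road width."""
    if avg_road_width_m <= 0:
        raise ValueError("average road width must be positive")
    limit_m = 2.0 * avg_road_width_m
    # A degree of longitude is longest at the latitude nearest the equator,
    # so sizing the columns there keeps every cell in the rectangle under the limit.
    if min_lat <= 0.0 <= max_lat:
        widest_lat = 0.0
    else:
        widest_lat = min(abs(min_lat), abs(max_lat))
    width_m = (max_lon - min_lon) * METERS_PER_DEGREE * math.cos(math.radians(widest_lat))
    height_m = (max_lat - min_lat) * METERS_PER_DEGREE
    # floor(x) + 1 is strictly greater than x, so side / count < limit_m.
    n_cols = math.floor(width_m / limit_m) + 1
    n_rows = math.floor(height_m / limit_m) + 1
    return n_cols, n_rows
```

For a rectangle of about 0.01 degrees on each side and a 10 m average road width, this yields a grid of a few thousand cells, each under 20 m on its longest side.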
Optionally, the analysis unit 44 is specifically configured to obtain, through linear regression analysis, the virtual path that matches the GPS data points in each sub-area.
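The regression step can be sketched as an ordinary least-squares line fit over the points of one sub-area, followed by the standard deviation of the perpendicular distances to that line. Two choices below are assumptions rather than details from the patent: the fit regresses on whichever axis has the larger spread (so that north-south roads do not produce an infinite slope), and the population standard deviation is used. The names `fit_virtual_path` and `distance_std` are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]        # (x, y), e.g. (longitude, latitude)
Line = Tuple[float, float, float]  # a*x + b*y + c = 0 with a^2 + b^2 = 1


def fit_virtual_path(points: List[Point]) -> Line:
    """Least-squares line through the points, returned in normalized form."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx >= syy:
        # Regress y on x: y = k*x + m, rewritten as k*x - y + m = 0.
        k = sxy / sxx if sxx > 0 else 0.0
        a, b = k, -1.0
    else:
        # Regress x on y: x = k*y + m, rewritten as x - k*y - m = 0.
        k = sxy / syy
        a, b = 1.0, -k
    # The fitted line always passes through the centroid of the points.
    c = -(a * mean_x + b * mean_y)
    norm = math.hypot(a, b)
    return a / norm, b / norm, c / norm


def distance_std(points: List[Point], line: Line) -> float:
    """Population standard deviation of perpendicular distances to the line."""
    a, b, c = line
    distances = [abs(a * x + b * y + c) for x, y in points]
    return statistics.pstdev(distances)
```

Because the line is stored in normalized form, the perpendicular distance of a point is simply the absolute value of the line expression evaluated at that point.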
Optionally, the abnormal data acquisition unit 45 is specifically configured to judge whether the number of GPS data points in the current sub-area is greater than a first threshold;
If the number is less than or equal to the first threshold, all GPS data points in the current sub-area are abnormal GPS data points;
If the number is greater than the first threshold, the unit then judges whether the number of GPS data points in the current sub-area is greater than a second threshold;
If the number is less than or equal to the second threshold, the unit judges whether the standard deviation of the perpendicular distances between all GPS data points in the current sub-area and the virtual path is greater than a standard deviation threshold: if it is less than or equal to the standard deviation threshold, all GPS data points in the current sub-area are normal GPS data points; if it is greater than the standard deviation threshold, all GPS data points in the current sub-area are abnormal GPS data points. Here the first threshold is less than the second threshold;
If the number is greater than the second threshold, all GPS data points in the current sub-area are normal GPS data points.
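The three-way decision described for unit 45 can be written as a short Python function. This is a sketch of the stated rules only: the function name, the parameter names, and the choice to pass in a precomputed standard deviation are illustrative assumptions, and the patent gives no concrete threshold values.

```python
def is_sub_area_abnormal(point_count: int,
                         distance_std: float,
                         first_threshold: int,
                         second_threshold: int,
                         std_threshold: float) -> bool:
    """Return True if all GPS points of the sub-area should be treated as abnormal.

    distance_std is the standard deviation of the perpendicular distances
    between the sub-area's points and its virtual path.
    """
    if first_threshold >= second_threshold:
        raise ValueError("the first threshold must be less than the second threshold")
    # Too few points: the whole sub-area is treated as abnormal.
    if point_count <= first_threshold:
        return True
    # Plenty of points: the whole sub-area is treated as normal.
    if point_count > second_threshold:
        return False
    # In between: decide by how tightly the points follow the virtual path.
    return distance_std > std_threshold
```

With hypothetical thresholds of 5 and 50 points, a sub-area holding 3 points is flagged as abnormal regardless of its spread, one holding 80 points is accepted, and one holding 20 points is accepted only if its distance spread does not exceed the standard deviation threshold.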
The system for acquiring abnormal data in traffic trajectory data provided by this embodiment of the present invention can be used with the corresponding method of Embodiment 1 above. For details, refer to the description of Embodiment 1, which is not repeated here.
It should be noted that the units in the above system embodiment are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc.
The above describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210572392.XA CN103035123B (en) | 2012-12-25 | 2012-12-25 | Abnormal data acquisition methods and system in a kind of traffic track data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210572392.XA CN103035123B (en) | 2012-12-25 | 2012-12-25 | Abnormal data acquisition methods and system in a kind of traffic track data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103035123A CN103035123A (en) | 2013-04-10 |
CN103035123B true CN103035123B (en) | 2016-01-20 |
Family
ID=48021983
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210572392.XA Active CN103035123B (en) | 2012-12-25 | 2012-12-25 | Abnormal data acquisition methods and system in a kind of traffic track data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103035123B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104035985B (en) * | 2014-05-30 | 2017-07-07 | 同济大学 | A kind of method for digging towards Fundamental Geographic Information System abnormal data |
CN104915671A (en) * | 2015-06-23 | 2015-09-16 | 中国矿业大学 | FGAK (Fast Global Alignment Kernels) based abnormal trajectory detection method |
CN105355042B (en) * | 2015-10-23 | 2017-09-19 | 东南大学 | A road network extraction method based on taxi GPS |
CN107884795B (en) * | 2016-09-30 | 2021-06-29 | 厦门雅迅网络股份有限公司 | Method and system for judging entering and exiting areas based on GPS |
CN108242145B (en) * | 2016-12-26 | 2020-10-16 | 阿里巴巴(中国)有限公司 | Abnormal track point detection method and device |
CN107330085B (en) * | 2017-07-03 | 2020-07-17 | 上海世脉信息科技有限公司 | Method for judging, identifying and correcting error position of fixed sensor in big data environment |
CN107633674A (en) * | 2017-09-14 | 2018-01-26 | 王淑芳 | A kind of emphasis commerial vehicle exception tracing point elimination method and system |
CN109831744B (en) * | 2017-11-23 | 2021-05-07 | 腾讯科技(深圳)有限公司 | Abnormal track identification method and device and storage equipment |
CN108198418B (en) * | 2017-12-29 | 2019-03-19 | 徐欢 | A kind of traffic big data acquisition system of combination key point |
CN110058276A (en) * | 2019-02-27 | 2019-07-26 | 北京三快在线科技有限公司 | Abnormal point judgment method and device |
CN110160539A (en) * | 2019-05-28 | 2019-08-23 | 北京百度网讯科技有限公司 | Map-matching method, calculates equipment and medium at device |
CN111127891B (en) * | 2019-12-27 | 2021-04-27 | 中国交通通信信息中心 | Road network checking method based on floating car track big data |
CN112182133B (en) * | 2020-09-29 | 2022-02-15 | 南京北斗创新应用科技研究院有限公司 | A Ship Loitering Detection Method Based on AIS Data |
CN114019861B (en) * | 2021-10-29 | 2024-07-23 | Oppo广东移动通信有限公司 | GPS module control method and device, storage medium and electronic equipment |
CN116052417B (en) * | 2022-12-21 | 2024-08-20 | 浙江零跑科技股份有限公司 | Driving prediction method, device, equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0880120A2 (en) * | 1997-05-24 | 1998-11-25 | Daimler-Benz Aktiengesellschaft | Method for detecting and signalling traffic position data |
CN202075864U (en) * | 2011-04-28 | 2011-12-14 | 北京市劳动保护科学研究所 | Abnormal traffic state automatic detection system |
CN102521973A (en) * | 2011-12-28 | 2012-06-27 | 昆明理工大学 | Road matching method for mobile phone switching positioning |
CN102800191A (en) * | 2012-07-31 | 2012-11-28 | 北京世纪高通科技有限公司 | Traffic evaluation method and device |
Non-Patent Citations (3)
Title |
---|
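Recognition of abnormal behavior of traffic targets based on trajectory analysis; Li Mingzhi, et al.; 《电视技术》 (Video Engineering); 20120102; Vol. 36 (No. 1); pp. 106-112 * |
Outlier detection algorithm based on the local outlier degree of trajectory points; Liu Liangxu, et al.; 《计算机学报》 (Chinese Journal of Computers); 20111031; Vol. 34 (No. 10); pp. 1966-1975 * |
Classification and recognition of moving-target trajectories; Pan Qiming, et al.; 《火力与指挥控制》 (Fire Control & Command Control); 20091130; Vol. 34 (No. 11); pp. 79-83 * |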
Also Published As
Publication number | Publication date |
---|---|
CN103035123A (en) | 2013-04-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |