CN104462187B - Method for verifying the validity of crowdsourced data based on maximum likelihood ratio - Google Patents
- Publication number
- CN104462187B (application CN201410568300.XA; earlier publication CN104462187A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a method for verifying the validity of crowdsourced data based on the maximum likelihood ratio, comprising the following steps: obtaining by experiment the prior probability that an untrained ordinary person misjudges a given observed component; the server classifying all accumulated data by observed value; for all data of the same measured value, computing the probability density function by kernel density estimation and computing the confidence probability; the server waiting for users to upload new data; a measurer taking multiple measurements with a mobile terminal to obtain a set of data and uploading it to the server together with the observed component the measurer determined by his or her own observation; the server comparing the user's data with the database and computing the likelihood reliability of the set of data with the maximum-likelihood-ratio-based validity verification method; and the server deciding whether to accept the set of data, paying a reward according to its reliability, updating the database for that measured value, and recomputing the probability density function and the confidence probability.
Description
Technical Field
The present invention relates to the field of communication technology, and in particular to a method for verifying the validity of crowdsourced data based on the maximum likelihood ratio.
Background
Crowdsourcing has broad prospects in smartphone applications. With the rapid development of Internet technology, the number of individuals on the network is growing quickly and the connections among them are becoming ever closer. Crowdsourcing services have emerged in this environment. How to build crowdsourcing service platforms effectively and promote the sharing of resources across society is an important problem for next-generation Internet research.
Information providers today often adopt a crowdsourcing incentive mechanism: the work of collecting information is handed to dispersed users, who are rewarded for the information or services they provide. For example, when someone wants to know how congested a road section is, information from users currently on that section is both faster and more accurate than information obtained by surveyors sent by the provider. Mobile phone sensing is developing rapidly, and smartphones now carry a wide variety of sensors, such as accelerometers, GPS, proximity sensors, and cameras. Using the sensors on these dispersed users' smartphones to collect the required information and upload it to the provider is becoming an increasingly common approach.
Despite its many advantages, crowdsourcing has unavoidable drawbacks. Because the measurers are not professionally trained, the observation error of the measured data is generally larger, and the validity of individual data varies more than for data obtained by traditional methods. In extreme cases, a measurer who is unfamiliar with the test object, or who operates the device incorrectly, produces data that deviate severely from normal levels, and using such data damages the validity of the sample.
This error is specific to crowdsourcing scenarios and is hereinafter called observation error; all other errors are called measurement error. Both kinds of error can usually be compensated by a larger sample size, but the aim here is to evaluate and compare crowdsourced data quantitatively using probability theory and, further, to select the subset with higher relative validity, that is, the subset with smaller observation error.
A search of the prior art found that M. Ramadan et al., in "Implementation and evaluation of cooperative video streaming for mobile devices" (International Symposium on Personal, Indoor and Mobile Radio Communications, 2008), proposed a video sharing mechanism based on cooperative downloading. That mechanism requires all participating users to know one another and to form a wireless LAN actively, which greatly restricts its application scenarios. L. Keller et al., in "MicroCast: cooperative video streaming on smartphones" (International Conference on Mobile Systems, Applications, and Services, 2012), proposed a mechanism that accelerates cooperative video downloading through wireless communication between phones. That mechanism requires all participating users to want the same video, a condition that is rarely met, so it is also severely limited.
Summary of the Invention
In view of the defects in the prior art, the object of the present invention is to provide a method for verifying the validity of crowdsourced data based on the maximum likelihood ratio, which uses the large amount of data already accumulated in the server database to screen valid data more effectively and to reduce the judgment bias caused by recording erroneous data.
The method for verifying the validity of crowdsourced data based on the maximum likelihood ratio provided by the present invention comprises the following steps:
Step 1: obtain the prior probability $p_{lj}$ by experiment, where $p_{lj}$ is the probability that an untrained measurer judges an observed component $j$ to be $l$;
Step 2: the server classifies all accumulated data by observed value; for all data of the same measured value $j$, it computes the probability density function by kernel density estimation and computes the confidence probability $\alpha_j$;
Step 3: the server waits for users to upload new data;
Step 4: measurer $i$ takes multiple measurements with a mobile terminal to obtain a set of data, which is uploaded to the server together with the observed component determined by the measurer's own observation;
Step 5: the server compares the data provided by the user with the database and computes the likelihood reliability of the set of data;
Step 6: the server decides whether to accept the set of data and pays a reward according to its reliability; if the server accepts the data, it returns to step 2, updates the database for measured value $j$, and recomputes the probability density function and the confidence probability $\alpha_j$ by the method of step 2.
Preferably, step 1 comprises the following steps:
Step 1.1: in the training process of indoor positioning based on Wi-Fi signal strength, the measurer must determine his or her own position in the room, which introduces observation error; this error is abstracted as the error in estimating the distances to the two nearest walls from a point in the room;
Step 1.2: determine the prior probability $p_{lj}$ in a single prior experiment and apply it to all indoor positioning activities; specifically, have several measurers judge their own position $l$ at certain fixed points $j$ in a room without distance references, and take the distribution of their judgments as $p_{lj}$;
Step 1.3: for any $p_{lj}$ that cannot be determined by a prior experiment, the Kronecker delta may be used:
$$p_{lj} = \delta_{lj} = \begin{cases} 1, & l = j \\ 0, & l \neq j \end{cases}$$
where $\delta_{lj}$ denotes the Kronecker delta.
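Step 1 can be sketched as a small frequency count. This is a minimal illustration rather than the patented procedure: the function name `estimate_prior`, the 0-based component indices, and the input format of `(true_j, reported_l)` pairs are all assumptions introduced here.

```python
from collections import Counter


def estimate_prior(judgments, n_components):
    """Estimate p[l][j], the probability that true component j is reported as l.

    judgments: iterable of (true_j, reported_l) pairs with 0-based indices
    (a hypothetical input format). Components without any experimental data
    fall back to the Kronecker delta, as in step 1.3.
    """
    counts = Counter(judgments)
    p = [[0.0] * n_components for _ in range(n_components)]
    for j in range(n_components):
        total = sum(counts[(j, l)] for l in range(n_components))
        for l in range(n_components):
            if total == 0:
                p[l][j] = 1.0 if l == j else 0.0  # Kronecker delta fallback
            else:
                p[l][j] = counts[(j, l)] / total
    return p
```

Each column `j` of the result sums to 1, so it can be read directly as the distribution of reported components for a measurer standing at point `j`.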
Preferably, step 2 comprises the following steps:
Step 2.1: each observed component in the server database corresponds to an accumulated data set $D_j$, $j = 1, 2, 3, \ldots, N$, where $N$ is the total number of observed components. The elements $D_j^k$, $k = 1, 2, 3, \ldots, T$, of $D_j$ follow the distribution $f_j(x)$, where $T = |D_j|$ is the number of data for the observed component and $f_j(x)$ is the probability density function of observed component $j$; $T \gg M$, where $M$ is the number of data a measurer uploads at one time. Then
$$f_j(x) \approx \frac{1}{T} \sum_{k=1}^{T} K_h\left(x - D_j^k\right)$$
where $K_h$ denotes the kernel function and $x$ the data variable;
Step 2.2: let $n_s(x) = \left|\{k : |x - D_j^k| < h\}\right|$, that is, $n_s(x)$ is the number of data already in the database within $[x-h, x+h]$, where $h$ is the bandwidth of the kernel function $K_h$;
$n_s(x)$ can take $T+1$ values and follows the distribution
$$P\left(n_s(x) = n_s\right) = \binom{T}{n_s} \left(2h f_j(x)\right)^{n_s} \left(1 - 2h f_j(x)\right)^{T - n_s}$$
where $P(\cdot)$ is the probability mass function of $n_s(x)$, $n_s$ is a possible value and may be any of $0, 1, \ldots, T$, $\binom{T}{n_s}$ is the number of ways of choosing $n_s$ elements from $T$ distinct elements, and $h$ is the bandwidth of the kernel function $K_h$;
Step 2.3: determine the expectation of $r_{il}$ from the database size and use this expectation as the confidence probability $\alpha$, where $r_{il}$ denotes the probability density that the data uploaded by observer $i$ belong to observed component $l$. Because different observed values have different amounts of accumulated data, each observed value has its own confidence probability $\alpha_j$.
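The density estimate and the window count of step 2 can be sketched as follows, assuming the uniform kernel that the text names as a preferred choice. The function names and the plain-list data layout are illustrative assumptions, not part of the source.

```python
def count_in_window(data, x, h):
    """n_s(x): number of stored samples D_j^k with |x - D_j^k| < h."""
    return sum(1 for d in data if abs(x - d) < h)


def uniform_kde(data, x, h):
    """Uniform-kernel density estimate at x, equal to n_s(x) / (2 * h * T)."""
    if not data:
        raise ValueError("data must contain at least one sample")
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    return count_in_window(data, x, h) / (2.0 * h * len(data))
```

With a uniform kernel the estimate is just the fraction of stored samples inside the window divided by the window width, which is why the count `n_s(x)` and the density are so closely tied in the text.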
Preferably, step 4 comprises the following steps:
Step 4.1: the measurer obtains a set of $M$ data, written as
$$\{x_1^i, x_2^i, \ldots, x_M^i\}$$
which is the set of data obtained by measurer $i$ through repeated measurement of the same observed component; $j$ denotes the true value of the component to be observed for this set of $M$ data, $j \in \{1, 2, 3, \ldots, N\}$, where $N$ is the total number of observed components; $x_t^i$, the $t$-th datum uploaded by measurer $i$, follows the distribution $f_j(x)$ of component $j$;
Step 4.2: the observation error manifests as the measurer judging $j$ to be $j'$ and reporting $j'$ to the server, so that the uploaded data are recorded as $x_t^{ij'}$, $t = 1, \ldots, M$.
Preferably, step 5 comprises the following steps:
Step 5.1: after obtaining the data, the server computes all $\{r_{il}\}$:
[equation not reproduced in the source]
where $M$ is the number of data a measurer uploads at one time, $f(\cdot)$ is the probability density function of an observed component, $l$ is the index of a candidate observed component, $x_t^{ij'}$ is the $t$-th datum uploaded by observer $i$, who judged it to be observed component $j'$, and $N$ is the total number of observed components. The physical meaning of $r_{il}$ is the probability density that the uploaded data belong to observed component $l$; evidently $r_{il}$ is largest when $l = j$;
Step 5.2: define a parameter:
[equation not reproduced in the source]
where $\alpha_j$ is the confidence probability and $p_{lj'}$ is the probability that the measurer judges observed component $j'$ to be observed component $l$. When $\alpha_j = 1$, the parameter is the logarithm of the maximum possible probability density of the measured data; evidently, for data sets of the same length, the one with the larger parameter value is more credible;
Step 5.3: using this parameter, the validity of all crowdsourced data can be ranked, and the top few taken as needed.
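The ranking idea of step 5 can be sketched as below. The source text does not reproduce the expressions for $r_{il}$ or for the ranking parameter, so this sketch rests on two stated assumptions: $r_{il}$ is taken to be the likelihood of the upload under component $l$ normalised over all components, and uploads are ranked by their largest log-likelihood, which matches only the $\alpha_j = 1$ case described above. The weighting by $\alpha_j$ and $p_{lj'}$ is omitted, and all function names are hypothetical.

```python
import math


def log_likelihoods(samples, densities, floor=1e-300):
    """Return sum_t log f_l(x_t) for every candidate component l."""
    return [sum(math.log(max(f(x), floor)) for x in samples) for f in densities]


def normalized_likelihoods(log_liks):
    """Assumed form of r_il: likelihood under l divided by the sum over all l."""
    peak = max(log_liks)  # subtract the peak to avoid underflow in exp
    weights = [math.exp(v - peak) for v in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


def rank_uploads(uploads, densities):
    """Order equal-length uploads by their largest log-likelihood, best first."""
    scored = [(max(log_likelihoods(s, densities)), name)
              for name, s in uploads.items()]
    return [name for _, name in sorted(scored, reverse=True)]
```

Here `densities` is a list of callables, one per observed component, such as kernel density estimates built from the server database. Working in log space keeps the product over the $M$ samples numerically stable.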
Preferably,
in step 2.1, the kernel function is taken to be the uniform kernel
$$K_h(u) = \begin{cases} \dfrac{1}{2h}, & |u| < h \\ 0, & \text{otherwise} \end{cases}$$
where $h$ is small enough that the data are approximately uniformly distributed within the bandwidth, so that the probability of a datum falling in this region is $P_s = P(|x - D_j^k| < h) = f(x)\,2h$;
in step 2.3, all data for which [expression not reproduced in the source] are worth adopting. One way to compute the expectation $E\{r_{il}\}$ of $r_{il}$ is:
[equation not reproduced in the source]
where $f_l(x_t)$ is the probability density of observed component $l$ at $x_t$, $l$ denotes the $l$-th observed component, $t$ denotes the $t$-th datum uploaded by the observer, $M$ is the number of data a measurer uploads at one time, $!$ denotes the factorial, $e$ is the base of the natural logarithm, and $P_s = P(|x - D_j^k| < h) = f(x_i)\,2h$, with $f(x_i)$ obtained by kernel density estimation. The expression contains no variable other than $T$, so it fixes the relation between the confidence probability $\alpha_j$ and the database size $T$.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention can correct the judgment error of crowdsourced-data observers through a prior experiment;
2. The present invention can evaluate the validity of newly received crowdsourced data against an existing reliable data set, and thus decide reasonably which new data to keep and which to discard.
Brief Description of the Drawings
Other features, objects, and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawing:
Fig. 1 is a flow chart of the steps of the present invention.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to understand the invention further, but do not limit it in any form. It should be noted that those of ordinary skill in the art can make various variations and improvements without departing from the concept of the invention, and these all fall within its scope of protection.
The present invention provides a method for verifying the validity of crowdsourced data based on the maximum likelihood ratio, comprising the following steps: obtaining by experiment the prior probability that an untrained ordinary person misjudges a given observed component; the server classifying all accumulated data by observed value; for all data of the same measured value, computing the probability density function by kernel density estimation and computing the confidence probability; the server waiting for users to upload new data; a measurer taking multiple measurements with a mobile terminal to obtain a set of data and uploading it to the server together with the observed component the measurer determined by his or her own observation; the server comparing the user's data with the database and computing the likelihood reliability of the set of data with the maximum-likelihood-ratio-based validity verification method; and the server deciding whether to accept the set of data, paying a reward according to its reliability, updating the database for that measured value, and recomputing the probability density function and the confidence probability.
Specifically, the present invention provides a method for verifying the validity of crowdsourced data based on the maximum likelihood ratio, which uses the large amount of data already accumulated in the server database to screen valid data more effectively and to reduce the judgment bias caused by recording erroneous data.
Referring to Fig. 1, the present invention is implemented through the following technical solution, which comprises the following steps:
Step 1: obtain the prior probability $p_{lj}$ by experiment, which is the probability that an untrained ordinary person judges an observed component $j$ to be $l$.
Step 2: the server classifies all accumulated data by observed value. For all data of the same measured value $j$, it computes the probability density function by kernel density estimation and computes the confidence probability $\alpha_j$.
Step 3: the server waits for users to upload new data.
Step 4: measurer $i$ takes multiple measurements with a mobile terminal to obtain a set of data, which is uploaded to the server together with the observed component determined by the measurer's own observation.
Step 5: the server compares the data provided by the user with the database and computes the likelihood reliability of the set of data with the maximum-likelihood-ratio-based validity verification method.
Step 6: the server decides whether to accept the set of data and pays a reward according to its reliability; if the server accepts the data, it returns to step 2, updates the database for measured value $j$, and recomputes the probability density function and the confidence probability $\alpha_j$ by the method of step 2.
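The accept-and-update loop of steps 5 and 6 can be sketched as a single function. This is a hedged illustration only: the source does not say how the accept decision is made, so the fixed `threshold`, the pluggable `score_fn`, and the dictionary-of-lists database are all assumptions, and the reward payment is left out.

```python
def handle_upload(database, reported_j, samples, score_fn, threshold):
    """Score one upload, then accept or reject it for the reported component.

    database: dict mapping a component index to its list of stored samples.
    score_fn(stored, new): hypothetical reliability score, larger is better.
    threshold: hypothetical acceptance cut-off, not taken from the source.
    """
    score = score_fn(database.get(reported_j, []), samples)
    accepted = score >= threshold
    if accepted:
        # Step 6: extend the stored data; densities and the confidence
        # probability would then be recomputed from the larger data set.
        database.setdefault(reported_j, []).extend(samples)
    return accepted, score
```

Passing the scoring rule in as `score_fn` keeps the loop independent of the particular reliability measure, which the text computes from the kernel density estimates.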
The implementation of the present invention is described in more detail below.
Step one: suppose the server needs to measure a certain quantity through crowdsourced data, and that this quantity comprises several observed components. Because of observation error, the measurer misjudges an observed component $j$ as another observed component $l$ with probability $p_{lj}$. The prior probability $p_{lj}$ is first obtained by experiment.
For example, in the training process of indoor positioning based on Wi-Fi signal strength, the measurer must determine his or her own position in the room, which introduces observation error. This error can be abstracted as the error in estimating the distances to the two nearest walls from a point in the room. A single prior experiment suffices to determine the distribution $p_{lj}$, which can then be applied to all indoor positioning activities: recruit a large number of volunteers to judge their own position $l$ at certain fixed points $j$ in a room without obvious distance references, and take the distribution of their judgments as $p_{lj}$.
If $p_{lj}$ cannot be determined by a prior experiment, the Kronecker delta function may be used.
Step two: each observed component in the server database corresponds to an accumulated data set $D_j$, $j = 1, 2, 3, \ldots, N$, whose elements $D_j^k$, $k = 1, 2, 3, \ldots, T$, follow the distribution $f_j(x)$, where $T = |D_j|$ is the size of the data set. Assume that $f_j(x)$ can be recovered with sufficient accuracy by kernel density estimation. Then
$$f_j(x) \approx \frac{1}{T} \sum_{k=1}^{T} K_h\left(x - D_j^k\right)$$
The kernel function may take any other form; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention. For example, take the kernel function to be the uniform kernel, $K_h(u) = \frac{1}{2h}$ for $|u| < h$ and $0$ otherwise, with $h$ small enough that the data are approximately uniformly distributed within the bandwidth; the probability of a datum falling in this region is then $P_s = P(|x - D_j^k| < h) = f_j(x)\,2h$.
Let $n_s(x) = \left|\{k : |x - D_j^k| < h\}\right|$, that is, the number of data already in the database within $[x-h, x+h]$. $n_s(x)$ can take the $T+1$ values $0, 1, \ldots, T$, and its distribution satisfies
$$P\left(n_s(x) = n_s\right) = \binom{T}{n_s} P_s^{\,n_s} \left(1 - P_s\right)^{T - n_s}$$
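The distribution of the window count can be checked numerically. The sketch below treats each of the $T$ stored samples as falling in the window independently with probability $P_s = 2h f(x)$, which makes $n_s(x)$ binomial; the function name and argument order are illustrative assumptions.

```python
from math import comb


def window_count_pmf(n_s, T, density, h):
    """P(n_s(x) = n_s) for T samples, each in the window with P_s = 2*h*density."""
    p_s = 2.0 * h * density
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("2*h*density must lie in [0, 1]; choose a smaller h")
    return comb(T, n_s) * p_s ** n_s * (1.0 - p_s) ** (T - n_s)
```

Summing the values for `n_s` from 0 to `T` gives 1, which confirms that the count has exactly `T + 1` possible values.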
Because different observed components accumulate different amounts of data, they have different confidence probabilities $\alpha_j$. The confidence probability $\alpha_j$ measures how much the data uploaded by a user are worth adopting, where the uploaded data are those that user $i$ uploads and judges to be observed component $j'$. If $r_{il}$ denotes the probability density that these data belong to observed component $l$, the expectation of $r_{il}$ can serve as the confidence probability $\alpha$. One way to compute $E\{r_{il}\}$ is:
[equation not reproduced in the source]
where $P_s = P(|x - D_j^k| < h) = f(x_i)\,2h$ and $f(x_i)$ is obtained by kernel density estimation. The expression contains no variable other than $T$, so it fixes the relation between the confidence probability $\alpha_j$ and the database size $T$.
Step three: the server waits for users to upload new data.
Step four: measurer $i$ obtains a set of $M$ data for a measured component, written as
$$\{x_1^i, x_2^i, \ldots, x_M^i\}$$
where $j$ denotes the true value of the measured component for this set of data, $j \in \{1, 2, 3, \ldots, N\}$, and $x_t^i$ follows the distribution $f_j(x)$ of component $j$. The observation error manifests as the measurer judging observed component $j$ to be $j'$ and reporting $j'$ to the server, so that the uploaded data are recorded as $x_t^{ij'}$, $t = 1, \ldots, M$.
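The observation-error model of step four can be simulated by drawing the reported component from the prior. This is a minimal sketch under assumptions: `prior[l][j]` holds $p_{lj}$ with 0-based indices, and the function name is hypothetical.

```python
import random


def report_component(true_j, prior, rng):
    """Draw the reported component j' for a measurer whose true component is j.

    prior[l][j] is the probability that true component j is reported as l.
    rng is a random.Random instance, passed in so results are reproducible.
    """
    n = len(prior)
    weights = [prior[l][true_j] for l in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]
```

With the Kronecker delta as the prior the reported component always equals the true one, so any mismatch in a simulation comes entirely from the experimentally measured $p_{lj}$.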
Step five: after obtaining the data, the server computes all $\{r_{il}\}$:
[equation not reproduced in the source]
Evidently $r_{il}$ is largest when $l = j$. Define the parameter:
[equation not reproduced in the source]
Using this parameter, the system can rank the validity of all crowdsourced data and take the top few as needed.
The environmental parameters of this embodiment are:
Mobile terminals: six Android smartphones, all Nexus 4, each with a 1.5 GHz Snapdragon APQ8064 CPU and 2 GB of RAM, all running Android Jelly Bean (4.2). The six smartphones were used in parallel as test phones for indoor positioning.
Server: an Acer 4930G laptop with a dual-core Intel Core processor, 2 GB of memory, and a 2 GHz clock frequency.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these specific embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410568300.XA (CN104462187B) | 2014-10-22 | 2014-10-22 | Method for verifying the validity of crowdsourced data based on maximum likelihood ratio |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410568300.XA (CN104462187B) | 2014-10-22 | 2014-10-22 | Method for verifying the validity of crowdsourced data based on maximum likelihood ratio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104462187A | 2015-03-25 |
CN104462187B | 2017-09-08 |
Family
ID=52908223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410568300.XA (CN104462187B, active) | Method for verifying the validity of crowdsourced data based on maximum likelihood ratio | 2014-10-22 | 2014-10-22 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462187B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105187139B (en) * | 2015-09-30 | 2018-01-23 | 中国人民解放军后勤工程学院 | A kind of outdoor radio signal reception strength map constructing method based on intelligent perception |
CN111865332B (en) * | 2020-08-04 | 2021-07-27 | 北京航空航天大学 | Reliable extraction method of low-confidence matrix and high-performance error detection and correction method for satellite-based ADS-B |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7278028B1 (en) * | 2003-11-05 | 2007-10-02 | Evercom Systems, Inc. | Systems and methods for cross-hatching biometrics with other identifying data |
CN100391255C (en) * | 2002-08-19 | 2008-05-28 | 纳格拉影像股份有限公司 | Digital Home Network Key Validity Verification Method |
CN101345601A (en) * | 2007-07-13 | 2009-01-14 | 华为技术有限公司 | A decoding method and decoder |
CN102546972A (en) * | 2010-12-28 | 2012-07-04 | 中国移动通信集团公司 | Effectiveness verification method, equipment and system for advertisement feedback information |
CN103312698A (en) * | 2013-05-24 | 2013-09-18 | 成都秦川科技发展有限公司 | Off-line data validity verifying method |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100391255C (en) * | 2002-08-19 | 2008-05-28 | 纳格拉影像股份有限公司 | Digital Home Network Key Validity Verification Method |
US7278028B1 (en) * | 2003-11-05 | 2007-10-02 | Evercom Systems, Inc. | Systems and methods for cross-hatching biometrics with other identifying data |
CN101345601A (en) * | 2007-07-13 | 2009-01-14 | 华为技术有限公司 | A decoding method and decoder |
CN102546972A (en) * | 2010-12-28 | 2012-07-04 | 中国移动通信集团公司 | Effectiveness verification method, equipment and system for advertisement feedback information |
CN103312698A (en) * | 2013-05-24 | 2013-09-18 | 成都秦川科技发展有限公司 | Off-line data validity verifying method |
Also Published As
Publication number | Publication date |
---|---|
CN104462187A (en) | 2015-03-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||