NL2025323B1 - Data preprocessing method and apparatus for data fusion - Google Patents
- Publication number
- NL2025323B1
- Authority
- NL
- Netherlands
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0224—Process history based detection method, e.g. whereby history implies the availability of large amounts of data
- G05B23/024—Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
Abstract
A preprocessing method and apparatus for data fusion. Data measured by two different types of sensors are obtained in an arbitrary manner, and the data measured by the two different types of sensors are respectively set as target data and reference data. Features of the target data and features of the reference data are obtained based on the target data and the reference data. A target feature description set and a reference feature description set are obtained. A similarity between the target data and the reference data is obtained based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set. Whether the similarity is greater than a preset critical threshold is determined based on the similarity, so as to determine whether the target data and the reference data can be fused, thus improving the data fusion efficiency. [FIG 1]
Description
[0001] This application relates to the field of data fusion technologies, and in particular, to a data preprocessing method and apparatus for data fusion.
[0002] A multi-sensing system obtains comprehensive and complete information about an objective fact by using a plurality of sensors. For example, in an expressway application scenario, when an objective fact such as the traffic flow needs to be analyzed, real-time measurements are usually performed by providing a geomagnetic sensor and a light sensor, and analyses are then performed based on the measured data to obtain information about the traffic flow. For a multi-sensing system, the obtained data are varied and complex, and the data provided by the various types of sensors have different features. For different analysis requirements, in order to improve the credibility and utilization of the data, the various sensors and the data measured thereby are usually appropriately controlled and fused by using the data fusion technology.
[0003] In addition to determining the authenticity of the data, the key point of the data fusion technology lies in determining the weights of the various sensor data. Therefore, in the prior art, a preprocessing method is usually used to preprocess the measurement data before the data fusion. In a common data preprocessing method, the weights of the measurement data of the various sensors, which are to be used during the data fusion, are decided mainly by calculating the measurement variances of the sensors, so as to achieve the data fusion.
[0004] However, during the research process of the present invention, the applicant found that in the data preprocessing method in the prior art, the weights of all of the sensor measurement data in the multi-sensing system are calculated before the data fusion, and all of the sensor data are then fused based on the weights. In actual applications, the various types of sensor data in the multi-sensing system have different features. When an objective fact needs to be analyzed, it may not be required to perform the data fusion on all of the sensor measurement data. The data preprocessing method in the prior art cannot determine which sensors' data can be fused and which cannot, but fuses all of the sensor measurement data. This may increase the computational complexity and reduce the data fusion efficiency.
[0005] To resolve problems that the data preprocessing method in the prior art reduces the data fusion efficiency, and an error occurs between the result obtained by the data fusion and the result that is actually required, this application discloses a data preprocessing method and apparatus for data fusion, according to the following embodiments.
[0006] According to a first aspect of this application, a data preprocessing method for data fusion is disclosed, including: obtaining first measurement data, where the first measurement data include data measured by different types of sensors; obtaining data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively setting the data measured by the two different types of sensors as target data and reference data; obtaining features of the target data based on the target data, and obtaining features of the reference data based on the reference data; obtaining a target feature description set based on the features of the target data, and obtaining a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes status of the features of the target data, and the reference feature description set is a set that describes status of the features of the reference data; obtaining a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and determining, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
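The claimed steps can be sketched as a small pipeline. This is a hedged illustration only, not the patented implementation: `extract_features`, `describe`, and the particular feature statuses are hypothetical stand-ins for the feature extraction, description-set construction, and similarity model described in the later paragraphs.

```python
# Hypothetical sketch of the claimed preprocessing pipeline.
# All helpers are illustrative stand-ins, not the patented method.

def extract_features(data):
    # Illustrative "features" of a measurement series: min, max, mean.
    return {"min": min(data), "max": max(data), "mean": sum(data) / len(data)}

def describe(features):
    # Illustrative "feature description set": one (feature, status) pair
    # per feature, with a made-up qualitative status.
    return {(name, "high" if value > 0 else "low")
            for name, value in features.items()}

def similarity(set_a, set_b, alpha=0.5, beta=0.5):
    # Ratio model matching the formula given later in paragraph [0008].
    both = len(set_a & set_b)
    denom = both + alpha * len(set_a - set_b) + beta * len(set_b - set_a)
    return both / denom if denom else 0.0

def can_fuse(target_data, reference_data, threshold=0.5):
    # Steps of the first aspect: features -> description sets -> similarity
    # -> comparison against a preset critical threshold.
    set_a = describe(extract_features(target_data))
    set_b = describe(extract_features(reference_data))
    return similarity(set_a, set_b) > threshold
```

With this toy feature definition, two series whose feature statuses agree are judged fusible, while series with disjoint statuses are not.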
[0007] Optionally, the obtaining of the similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set, includes: establishing a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data; and obtaining the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0008] Optionally, the similarity between the target data and the reference data is obtained according to the following formula:
$$Sf(a,b)=\frac{f(A\cap B)}{f(A\cap B)+\alpha f(A-B)+\beta f(B-A)},\quad \alpha>0,\ \beta>0,$$
where $Sf(a,b)$ represents the similarity between the target data and the reference data, $a$ represents the features of the target data, $b$ represents the features of the reference data, $A$ represents the target feature description set, $B$ represents the reference feature description set, $f(A\cap B)$ represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, $f(A-B)$ represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, $f(B-A)$ represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, $\alpha$ represents an attention degree to the features of the reference data, and $\beta$ represents an attention degree to the features of the target data.
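The similarity formula of paragraph [0008] can be illustrated numerically. The description sets below are invented examples (statuses a geomagnetic sensor and a light sensor might describe); note that unequal attention degrees make the measure asymmetric.

```python
# Numeric illustration of the ratio-model similarity of paragraph [0008].
# The description sets and attention degrees are made-up examples.

def feature_similarity(A, B, alpha, beta):
    """f(.) counts statuses in each region of the two description sets."""
    n_both = len(A & B)      # f(A ∩ B)
    n_a_only = len(A - B)    # f(A - B)
    n_b_only = len(B - A)    # f(B - A)
    return n_both / (n_both + alpha * n_a_only + beta * n_b_only)

# Target feature description set (hypothetical geomagnetic-sensor statuses)
A = {"vehicle_present", "vehicle_count", "speed", "heading"}
# Reference feature description set (hypothetical light-sensor statuses)
B = {"vehicle_present", "vehicle_count", "brightness"}

s_ab = feature_similarity(A, B, alpha=0.8, beta=0.2)  # 2 / (2 + 0.8*2 + 0.2*1)
s_ba = feature_similarity(B, A, alpha=0.8, beta=0.2)  # 2 / (2 + 0.8*1 + 0.2*2)
```

Because `alpha != beta`, swapping the roles of target and reference changes the result, which is the asymmetry paragraph [0038] motivates.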
[0009] Optionally, before the obtaining of the first measurement data, the method further includes: obtaining initial measurement data, where the initial measurement data include initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; evaluating the initial measurement data to obtain an evaluation value of the initial measurement data; establishing, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; and obtaining a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and setting the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0010] Optionally, the evaluating of the initial measurement data to obtain the evaluation value of the initial measurement data includes: setting initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured for multiple times by the target sensor; obtaining single-time target initial data of the target sensor based on the target initial data, and classifying the single-time target initial data into odd-numbered initial data and even-numbered initial data; obtaining a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtaining a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; obtaining a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; obtaining an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and obtaining the evaluation data of the initial measurement data, where the evaluation data of the initial measurement data include evaluation values of all of the initial measurement data.
[0011] Optionally, the obtaining of the partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, includes: obtaining the partial fusion result of the single-time target initial data according to the following formulas:
$$\bar{x}=\frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\bar{x}_1+\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\bar{x}_2$$
and
$$\sigma=\sqrt{\frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}},$$
where $\bar{x}$ represents the mean fusion value, $\sigma$ represents the standard deviation fusion value, $\bar{x}_1$ represents the mean of the odd-numbered initial data, $\bar{x}_2$ represents the mean of the even-numbered initial data, $\sigma_1$ represents the standard deviation of the odd-numbered initial data, and $\sigma_2$ represents the standard deviation of the even-numbered initial data.
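The formulas of paragraph [0011] have the form of the standard inverse-variance (minimum-variance) fusion of two estimates; the sketch below assumes that reading of the garbled source equations. The sample data are invented.

```python
import statistics

def partial_fusion(samples):
    """Split a measurement series into odd- and even-numbered readings and
    fuse the two halves with inverse-variance weighting, as in paragraph
    [0011]. Illustrative sketch, not the patented implementation."""
    odd = samples[0::2]   # 1st, 3rd, 5th, ... readings
    even = samples[1::2]  # 2nd, 4th, 6th, ... readings
    x1, x2 = statistics.mean(odd), statistics.mean(even)
    s1, s2 = statistics.stdev(odd), statistics.stdev(even)
    denom = s1**2 + s2**2
    # Mean fusion value: each half weighted by the other's variance.
    x_fused = (s2**2 / denom) * x1 + (s1**2 / denom) * x2
    # Standard deviation fusion value: never larger than min(s1, s2).
    s_fused = (s1**2 * s2**2 / denom) ** 0.5
    return x_fused, s_fused
```

A useful sanity check on this reading: the fused mean lies between the two half-means, and the fused standard deviation is smaller than either half's, which is why the result can serve as an evaluation value for the sensor.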
[0012] Optionally, the establishing of the confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data, for the same type of sensors, includes: obtaining, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; obtaining a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and establishing the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
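The confidence-matrix step of paragraph [0012] can be sketched as follows. The patent does not disclose the exact confidence-distance definition, so the Gaussian kernel below is an assumed stand-in; any monotone map from evaluation-value distance to confidence would fit the same structure.

```python
import math

def confidence_matrix(evals, sigma=1.0):
    """Confidence-level matrix for one sensor type, built from pairwise
    distances between evaluation values. The Gaussian kernel is an
    assumption; the patent only requires a confidence distance."""
    n = len(evals)
    return [[math.exp(-((evals[i] - evals[j]) ** 2) / (2 * sigma**2))
             for j in range(n)] for i in range(n)]

def most_confident(evals, sigma=1.0):
    """Index of the sensor whose readings are most supported by its
    peers (largest row sum of the confidence matrix)."""
    m = confidence_matrix(evals, sigma)
    sums = [sum(row) for row in m]
    return sums.index(max(sums))

# Three sensors of one type; the third reading is an outlier and should
# not be selected as the relatively higher-confidence sensor.
idx = most_confident([10.1, 10.0, 14.2])
```

The selected sensor's evaluation value would then be set as the first measurement data, per paragraph [0009].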
[0013] According to a second aspect of this application, a data preprocessing apparatus for data fusion is disclosed, including: a first data obtaining module, configured to obtain first measurement data, where the first measurement data include data measured by different types of sensors; a second data obtaining module, configured to obtain data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively set the data measured by the two different types of sensors as target data and reference data; a feature obtaining module, configured to obtain features of the target data based on the target data, and obtain features of the reference data based on the reference data; a feature description set obtaining module, configured to obtain a target feature description set based on the features of the target data, and obtain a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes status of the features of the target data, and the reference feature description set is a set that describes status of the features of the reference data; a similarity obtaining module, configured to obtain a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and a determining module, configured to determine, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0014] Optionally, the similarity obtaining module includes: a model establishment unit, configured to establish a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data; and a similarity obtaining unit, configured to obtain the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0015] Optionally, the apparatus further includes: an initial data obtaining module, configured to obtain initial measurement data, where the initial measurement data are initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; an evaluation module, configured to evaluate the initial measurement data to obtain an evaluation value of the initial measurement data; a confidence level matrix establishment module, configured to establish, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and a first data setting module, configured to obtain a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and set the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0016] Optionally, the similarity obtaining module is further configured to obtain the similarity between the target data and the reference data according to the following formula:
$$Sf(a,b)=\frac{f(A\cap B)}{f(A\cap B)+\alpha f(A-B)+\beta f(B-A)},\quad \alpha>0,\ \beta>0,$$
where $Sf(a,b)$ represents the similarity between the target data and the reference data, $a$ represents the features of the target data, $b$ represents the features of the reference data, $A$ represents the target feature description set, $B$ represents the reference feature description set, $f(A\cap B)$ represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, $f(A-B)$ represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, $f(B-A)$ represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, $\alpha$ represents an attention degree to the features of the reference data, and $\beta$ represents an attention degree to the features of the target data.
[0017] Optionally, the evaluation module includes: a target initial data setting unit, configured to set initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured for multiple times by the target sensor; a single-time target initial data obtaining unit, configured to obtain single-time target initial data of the target sensor based on the target initial data, and classify the single-time target initial data into odd-numbered initial data and even-numbered initial data; a first calculation unit, configured to obtain a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtain a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; a partial fusion result obtaining unit, configured to obtain a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; a first evaluation value obtaining unit, configured to obtain an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and a second evaluation value obtaining unit, configured to obtain the evaluation data of the initial measurement data, where the evaluation data of the initial measurement data include evaluation values of all of the initial measurement data.
[0018] Optionally, the partial fusion result obtaining unit is further configured to obtain the partial fusion result of the single-time target initial data according to the following formulas:
$$\bar{x}=\frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\bar{x}_1+\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\bar{x}_2$$
and
$$\sigma=\sqrt{\frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}},$$
where $\bar{x}$ represents the mean fusion value, $\sigma$ represents the standard deviation fusion value, $\bar{x}_1$ represents the mean of the odd-numbered initial data, $\bar{x}_2$ represents the mean of the even-numbered initial data, $\sigma_1$ represents the standard deviation of the odd-numbered initial data, and $\sigma_2$ represents the standard deviation of the even-numbered initial data.
[0019] Optionally, the confidence level matrix establishment module includes: a third evaluation value obtaining unit, configured to obtain, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; a confidence distance obtaining unit, configured to obtain a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and a confidence level matrix establishment unit, configured to establish the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0020] This application discloses a preprocessing method and apparatus for data fusion. In the method, data measured by two different types of sensors are obtained in an arbitrary manner, and the data measured by the two different types of sensors are respectively set as the target data and the reference data. The features of the target data and the features of the reference data are obtained based on the target data and the reference data. Subsequently, the target feature description set and the reference feature description set are obtained. A similarity between the target data and the reference data is then obtained based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set. Finally, whether the similarity is greater than the preset critical threshold is determined based on the similarity, so as to determine whether the target data and the reference data can be fused.
[0021] Before the data fusion, the data preprocessing method in the prior art cannot determine which sensors' data can be fused and which cannot, but fuses all of the sensor measurement data. As a result, the computational complexity during the data fusion is increased and the data fusion efficiency is reduced. Compared with the prior art, the data preprocessing method and apparatus disclosed in this application can obtain, at a feature level, the similarity between the sensor measurement data based on the features of the sensor measurement data, so as to determine, based on the similarity, whether the measurement data of any two sensors can be fused. In actual applications, for the multi-sensing system, the data preprocessing method disclosed in this application can determine, before the data fusion, which sensors' measurement data can be fused, so that the measurement data of those sensors can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity of the data fusion and improving the data fusion efficiency.
[0022] To more clearly describe the technical solutions of this application, the accompanying drawings to be used in the embodiments are briefly illustrated below. It is apparent that persons of ordinary skill in the art can also derive other accompanying drawings from these accompanying drawings without creative effort.
[0023] FIG 1 is a schematic workflow diagram of a data preprocessing method for data fusion according to an embodiment of this application; and
[0024] FIG 2 is a schematic structural diagram of a data preprocessing apparatus for data fusion according to an embodiment of this application.
[0025] To resolve problems that the data preprocessing method in the prior art reduces the data fusion efficiency, and an error occurs between the result obtained by the data fusion and the result that is actually required, this application discloses a data preprocessing method and apparatus for data fusion, according to the following embodiments.
[0026] Referring to the schematic workflow diagram shown in FIG 1, a first embodiment of this application discloses a data preprocessing method for data fusion, including the following steps.
[0027] Step S11: Obtaining first measurement data, where the first measurement data include data measured by different types of sensors.
[0028] Step S12: Obtaining data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively setting the data measured by the two different types of sensors as target data and reference data.
[0029] Step S13: Obtaining features of the target data based on the target data, and obtaining features of the reference data based on the reference data.
[0030] Step S14: Obtaining a target feature description set based on the features of the target data, and obtaining a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes the status of the features of the target data, and the reference feature description set is a set that describes the status of the features of the reference data.
[0031] People have different understandings of complex objective objects, and thus their abstractions of the corresponding objective objects also differ. Inevitably, knowledge elements that do not conform to "definition" and "incompleteness" may occur, finally resulting in a problem that descriptions of data for the objective objects or an attribute thereof are inconsistent.
The knowledge element refers to a knowledge unit that cannot be further segmented and has a complete knowledge expression. To determine whether data measured by any two sensors can be fused, the data preprocessing method disclosed in this application constructs a knowledge element base by obtaining the features of the target data and the features of the reference data, and further determines whether descriptions of the target data and the reference data are consistent at a feature level, so as to enable descriptions of data used during the data fusion to be consistent at the feature level. For example, when processing the measurement data of the light sensor, because the light sensor is mainly configured to measure a shadow parameter and a brightness parameter, for the data measured by the light sensor, shadow and brightness may serve as features of the measurement data of the light sensor, and data related to the shadow and data related to the brightness may be used as feature description sets by extracting, from the measurement data, relevant data for describing the shadow and relevant data for describing the brightness.
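The light-sensor example above can be made concrete. The feature names ("shadow", "brightness") come from the text; the measurement records and values below are invented for illustration.

```python
# Concrete rendering of the light-sensor example. The feature names are
# taken from the text; all numeric values are invented.

# Raw measurement records from a (hypothetical) light sensor
light_measurements = [
    {"shadow": 0.72, "brightness": 310.0},
    {"shadow": 0.68, "brightness": 295.0},
]

# Features of the measurement data
features = ("shadow", "brightness")

# Feature description sets: the relevant data describing each feature,
# extracted from the measurement records
description_sets = {
    f: [record[f] for record in light_measurements] for f in features
}
```

In this rendering, `description_sets["shadow"]` collects the shadow-related data and `description_sets["brightness"]` the brightness-related data, matching the extraction step described in the paragraph.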
[0032] Step S15: Obtaining a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set.
[0033] Step S16: Determining, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0034] To obtain comprehensive and complete information about an environment or an objective object, the multi-sensing system may include multiple types of sensors. For example, the multi-sensing system in an expressway application usually includes an infrared sensor, an ultrasonic sensor, a piezoelectric sensor, a light sensor, a geomagnetic sensor, and the like. For a certain analysis requirement, for example, the traffic flow, the measurement data of the light sensor and the geomagnetic sensor are fused during the data fusion. However, with the data preprocessing method in the prior art, it cannot be determined before the data fusion which sensors' data can be fused and which cannot, and all of the sensor measurement data are fused instead. This results in increased computational complexity during the data fusion and reduced data fusion efficiency. Compared with the prior art, the data preprocessing method and apparatus disclosed in this application can obtain the similarity between the sensor measurement data based on the features of the sensor measurement data, and determine, based on the similarity, whether the measurement data of any two sensors can be fused. In actual applications, for the multi-sensing system, the data preprocessing method disclosed in this application can determine, before the data fusion, which sensors' measurement data can be fused, so that the measurement data of such sensors can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity of the data fusion and improving the data fusion efficiency.
[0035] Further, the obtaining of the similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set, includes: establishing a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data.
[0036] An objective object or system may be abstracted as a model m, and the model m may be expressed as a knowledge element Km by studying the common form of knowledge expression of the model m. Assume that Nm is a conceptual name of the objective object or system, where Nm is essentially a set of vocabularies having the same or similar meanings; Am is a set describing the attributes of the features of the objective object and how to describe these attributes, and is classified into a set of qualitative status descriptions and a set of measurable quantitative status descriptions; and Rm is a set describing the association relationships between the attributes, where r ∈ Rm indicates a mapping relationship on Am × Am. The relevant relationships may be classified into qualitative relevant relationships and quantitative relevant relationships. A relationship knowledge element model can be abstracted by summarizing the common features of the relevant relationships. The knowledge element Km corresponding to the model m may be expressed as: Km = (Nm, Am, Rm).
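The triple Km = (Nm, Am, Rm) can be rendered as a small data structure. Only the triple structure comes from the text; the field contents below (traffic-flow names, attributes, and relationships) are invented examples.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeElement:
    """Km = (Nm, Am, Rm) as described in paragraph [0036].
    Field contents are illustrative; the text fixes only the triple."""
    Nm: set    # synonymous concept names of the objective object/system
    Am: dict   # attribute name -> "qualitative" or "quantitative" status
    Rm: list   # relationships (attr_a, attr_b, kind) on Am x Am

# A hypothetical knowledge element for the traffic-flow example
km = KnowledgeElement(
    Nm={"traffic flow", "vehicle flow"},
    Am={"vehicle_count": "quantitative", "congested": "qualitative"},
    Rm=[("vehicle_count", "congested", "quantitative")],
)
```

Two sensors' knowledge elements could then be compared at the feature level by applying the similarity function of paragraph [0037] to their Am sets.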
[0037] Obviously, the following properties may be obtained according to the definition of a similarity function Sf of the knowledge element:

Sf(x, y) ∈ [0, 1];

Sf(x, y) = 0, if and only if x and y are irrelevant;

Sf(x, y) = 1, if and only if x and y are the same; and

Sf(x, y) = Sf(y, x),

where x and y represent different knowledge elements, and may be understood as the features of the target data and the features of the reference data disclosed above in this application.
[0038] It may be learned from the foregoing definitions that it is very convenient to calculate the similarity by using a geometrical principle. However, due to its symmetry, the geometrical similarity model is unsuitable for comparing the many objective objects in real life that have asymmetrical features. Similarity comparison models for asymmetrical features have wider applications, and on the basis of the feature similarity model, research on extended applications of the similarity model continues to develop in depth. The feature similarity model is used in this application. The model enumerates the features of the objective object as sets, and obtains a similarity between the feature sets by defining a function over the feature elements, so as to further characterize the feature-level similarity between the target data and the reference data.
[0039] The obtaining of the similarity further includes obtaining the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0040] Further, the similarity between the target data and the reference data is obtained according to the following formula:

Sf(a, b) = f(A ∩ B) / ( f(A ∩ B) + α·f(A − B) + β·f(B − A) ), α > 0, β > 0,

where Sf(a, b) represents the similarity between the target data and the reference data, a represents the features of the target data, b represents the features of the reference data, A represents the target feature description set, B represents the reference feature description set, f(A ∩ B) represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, f(A − B) represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, f(B − A) represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, α represents an attention degree to the features of the reference data, and β represents an attention degree to the features of the target data. The attention degree is a value preset according to actual application situations. For a certain analysis requirement, when determining which measurement data may be fused, the attention degree represents the importance levels of different sensor measurement data. For example, for the analysis requirement of the traffic flow, when determining the similarity between the measurement data of two sensors, i.e. the light sensor and the geomagnetic sensor, if an operator considers that the measurement data of the light sensor is particularly important to the analysis of the traffic flow, the attention degree to the features of the measurement data of the light sensor may be manually preset to be relatively greater, and the attention degree to the features of the measurement data of the geomagnetic sensor may be manually preset to be relatively smaller.
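The set-ratio formula of paragraph [0040] can be sketched as follows, taking f as set cardinality. The status names in the example sets are invented for illustration and are not from the patent.

```python
# Hedged sketch of the feature similarity Sf(a, b) of paragraph [0040]:
# a Tversky-style ratio over status-description sets, with f taken as cardinality.

def feature_similarity(A, B, alpha=0.5, beta=0.5):
    """Sf(a, b) = f(A∩B) / (f(A∩B) + alpha*f(A−B) + beta*f(B−A)).

    A, B        -- status-description sets of the target / reference features
    alpha, beta -- preset attention degrees (both > 0)
    """
    A, B = set(A), set(B)
    common = len(A & B)   # f(A ∩ B): statuses in both sets
    only_a = len(A - B)   # f(A − B): statuses only in the target set
    only_b = len(B - A)   # f(B − A): statuses only in the reference set
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Illustrative status sets for two sensors observing the same road segment
target_set = {"vehicle_present", "speed_high", "lane_1"}
reference_set = {"vehicle_present", "speed_high", "lane_2"}
sim = feature_similarity(target_set, reference_set, alpha=0.5, beta=0.5)
```

With two shared statuses and one unshared status per set, sim is 2 / (2 + 0.5 + 0.5) ≈ 0.667; identical sets yield 1 and disjoint sets yield 0, matching the properties in paragraph [0037].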
[0041] In actual applications, each objective fact is a multi-dimensional complex having a plurality of features. To calculate the similarity for an objective fact with such multi-dimensional complex features, the feature similarity model for each feature may be linearly weighted so as to obtain a synthetic multi-feature similarity model of the objective object. Assuming that the target data has n features, the similarity between the i-th feature of the target data and the reference data is calculated as Sf_i(a, b), and the similarity between the target data and the reference data is:

Sf(a, b) = Σ_{i=1}^{n} ω_i · Sf_i(a, b),

where ω_i represents the weight of the i-th feature, and may be preset according to actual application situations.
[0042] When evaluating the value of the similarity, a critical threshold μ is set, with 0 < μ < 1. When Sf(a, b) > μ, it indicates that there is a large possibility that the target data and the reference data are measurement data of the same object, and the target data and the reference data can be fused. When Sf(a, b) ≤ μ, it indicates that the target data and the reference data are not measurement data of the same object, and the target data and the reference data cannot be fused. Specifically, the critical threshold μ is set according to the degree of similarity required in actual applications. For example, if the requirements on the similarity are relatively high, the critical threshold μ may be set to 0.8; when the requirements on the similarity are relatively low, the critical threshold μ may be appropriately set smaller.
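The weighted combination of paragraph [0041] and the threshold test of paragraph [0042] can be sketched together. The per-feature similarities, weights, and threshold below are illustrative values, not taken from the patent.

```python
# Sketch of paragraphs [0041]-[0042]: combine per-feature similarities with
# preset weights, then compare the result against the critical threshold mu.

def combined_similarity(per_feature_sims, weights):
    """Sf(a, b) = sum_i w_i * Sf_i(a, b); weights are preset per application."""
    assert len(per_feature_sims) == len(weights)
    return sum(w * s for w, s in zip(weights, per_feature_sims))


def can_fuse(similarity, mu=0.8):
    """The data can be fused when the similarity exceeds the critical threshold."""
    return similarity > mu


sims = [0.9, 0.7, 0.95]    # Sf_i for three features of the target data (illustrative)
weights = [0.5, 0.3, 0.2]  # preset weights omega_i, summing to 1 (illustrative)
s = combined_similarity(sims, weights)  # 0.45 + 0.21 + 0.19 = 0.85
```

With μ = 0.8, the combined similarity of 0.85 exceeds the threshold, so this pair of measurement data would be passed on to the fusion stage.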
[0043] The sensors are easily subject to environmental interference during the data collection process. Therefore, the data measured by the sensors usually have deviations, which affects the accuracy and stability of the data fusion. For this reason, according to the data preprocessing method disclosed in this application, in order to improve the accuracy of the first measurement data, errors in the initial data measured by the sensors are removed before the first measurement data are obtained.
[0044] Further, before the obtaining of the first measurement data, the method further includes: obtaining initial measurement data, where the initial measurement data include initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; evaluating the initial measurement data to obtain an evaluation value of the initial measurement data; establishing, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and obtaining a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and setting the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0045] Further, the evaluating of the initial measurement data to obtain the evaluation value of the initial measurement data includes: setting initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured multiple times by the target sensor; obtaining single-time target initial data of the target sensor based on the target initial data, and classifying the single-time target initial data into odd-numbered initial data and even-numbered initial data, where the odd-numbered initial data are x₁, x₃, … and include T₁ pieces of data in total, and the even-numbered initial data are x₂, x₄, … and include T₂ pieces of data in total; and obtaining a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtaining a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data.
[0046] Specifically, the mean of the odd-numbered initial data is obtained according to the following formula:

x̄₁ = (1/T₁) Σ_{t₁=1}^{T₁} x_{t₁},

where in the odd-numbered initial data, x_{t₁} represents the t₁-th piece of data, and t₁ ∈ [1, T₁].

[0047] The mean of the even-numbered initial data is obtained according to the following formula:

x̄₂ = (1/T₂) Σ_{t₂=1}^{T₂} x_{t₂},

where in the even-numbered initial data, x_{t₂} represents the t₂-th piece of data, and t₂ ∈ [1, T₂].

[0048] The standard deviation of the odd-numbered initial data is obtained according to the following formula:

σ₁ = √( (1/T₁) Σ_{t₁=1}^{T₁} (x_{t₁} − x̄₁)² )

[0049] The standard deviation of the even-numbered initial data is obtained according to the following formula:

σ₂ = √( (1/T₂) Σ_{t₂=1}^{T₂} (x_{t₂} − x̄₂)² )
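The odd/even split and the statistics of paragraphs [0045]–[0049] can be sketched as follows; the readings are illustrative values, and the standard deviation follows the population form (divide by the count) shown in the formulas above.

```python
# Sketch of paragraphs [0045]-[0049]: split one pass of a sensor's readings into
# odd- and even-numbered subsequences and compute the mean and standard
# deviation of each subsequence.
import math


def odd_even_stats(readings):
    odd = readings[0::2]    # x1, x3, ...  (T1 pieces)
    even = readings[1::2]   # x2, x4, ...  (T2 pieces)

    def mean_std(xs):
        m = sum(xs) / len(xs)
        # population standard deviation, matching the 1/T factor above
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
        return m, s

    return mean_std(odd), mean_std(even)


# Six illustrative readings from one measurement pass of a single sensor
readings = [10.1, 9.9, 10.3, 10.0, 9.8, 10.2]
(m1, s1), (m2, s2) = odd_even_stats(readings)
```

Here m1 and s1 correspond to x̄₁ and σ₁, and m2 and s2 to x̄₂ and σ₂.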
[0050] The evaluating of the initial measurement data further includes: obtaining a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; obtaining an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and obtaining the evaluation value of the initial measurement data, where the evaluation value of the initial measurement data includes the evaluation values of all of the initial measurement data.
[0051] Further, the obtaining of the partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data includes: obtaining the partial fusion result of the single-time target initial data according to the following formulas:

x̂ = ( σ₂² / (σ₁² + σ₂²) ) x̄₁ + ( σ₁² / (σ₁² + σ₂²) ) x̄₂, and

σ̂ = √( σ₁² σ₂² / (σ₁² + σ₂²) ),

where x̂ represents the mean fusion value, σ̂ represents the standard deviation fusion value, x̄₁ represents the mean of the odd-numbered initial data, x̄₂ represents the mean of the even-numbered initial data, σ₁ represents the standard deviation of the odd-numbered initial data, and σ₂ represents the standard deviation of the even-numbered initial data.
[0052] Specifically, a deduction formula of the mean fusion value x̂ is:

x̂ = [ σ₂²/(σ₁²+σ₂²)   σ₁²/(σ₁²+σ₂²) ] · [ x̄₁ ; x̄₂ ],

where [ σ₂²/(σ₁²+σ₂²)   σ₁²/(σ₁²+σ₂²) ] is a matrix with one row and two columns, and [ x̄₁ ; x̄₂ ] is the column vector of the two means.
[0053] A deduction formula of the standard deviation fusion value σ̂ is:

1/σ̂² = 1/σ₁² + 1/σ₂²
[0054] A variance fusion value σ̂² may be easily obtained based on the standard deviation fusion value:

σ̂² = σ₁² σ₂² / (σ₁² + σ₂²)
[0055] The evaluation value of the target initial data that is obtained based on the partial fusion result of the single-time target initial data is:

x̃ = (1/n) Σ_{j=1}^{n} x̂_j,

where x̃ represents the evaluation value of the target initial data, x̂_j represents the mean fusion value of the initial data measured at the j-th time by the target sensor, and the target sensor measures n times in total.
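The inverse-variance fusion of paragraphs [0051]–[0054] and the averaging of paragraph [0055] can be sketched as follows; the input statistics are illustrative values.

```python
# Sketch of paragraphs [0051]-[0055]: fuse the odd/even statistics of one pass
# into a mean fusion value and a standard deviation fusion value, then average
# the fused means over n passes to obtain the evaluation value.
import math


def partial_fusion(m1, s1, m2, s2):
    """x_hat = s2^2/(s1^2+s2^2)*m1 + s1^2/(s1^2+s2^2)*m2,
    sigma_hat = sqrt(s1^2*s2^2/(s1^2+s2^2))  (inverse-variance weighting)."""
    v1, v2 = s1 ** 2, s2 ** 2
    x_hat = (v2 * m1 + v1 * m2) / (v1 + v2)
    sigma_hat = math.sqrt(v1 * v2 / (v1 + v2))
    return x_hat, sigma_hat


def evaluation_value(fused_means):
    """x_tilde = (1/n) * sum of the n per-pass mean fusion values."""
    return sum(fused_means) / len(fused_means)


# One pass: odd-half mean 10.0 (std 0.2), even-half mean 10.2 (std 0.1)
x_hat, sigma_hat = partial_fusion(10.0, 0.2, 10.2, 0.1)
```

Note that the fused mean is pulled toward the less noisy half (the even half here), and the fused standard deviation is smaller than either input standard deviation, which is the error-removing effect described in paragraph [0056].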
[0056] To improve accuracy of the initial measurement data, single-time measurement data of a single sensor is partially fused, so as to remove errors of the single-time measurement data; and then the evaluation value of the initial measurement data is obtained, so as to remove errors of the initial measurement data of all of the sensors.
[0057] Because different sensors suffer from different interference when collecting data, the credibility of the data they measure also differs. This problem also exists within the same type of sensors. In order to screen out the measurement data having relatively higher credibility, for the same type of sensors, a confidence level matrix of the measurement data is subsequently established by calculating a confidence distance between the sensors, and the confidence level matrix is then used to determine which sensors among the same type of sensors measure data with relatively higher credibility.
[0058] Further, the establishing, for the same type of sensors, of the confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data includes: obtaining, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and obtaining a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors.
[0059] Specifically, assuming that a sensor E and a sensor F are the same type of sensors, the evaluation value of the initial measurement data of the sensor E is x̃_E and the standard deviation fusion value is σ̂_E; and the evaluation value of the initial measurement data of the sensor F is x̃_F and the standard deviation fusion value is σ̂_F. The confidence distance d_EF between the two sensors is obtained according to the following formula:

d_EF = 2 ∫_{x̃_E}^{x̃_F} p_F(x) dx,

where p_F(x) is defined as the probability density function of x̃_F, that is,

p_F(x) = ( 1 / (√(2π) σ̂_F) ) exp( −(1/2) ( (x − x̃_F) / σ̂_F )² ),

and exp represents the exponential function that uses the natural constant e as its base.
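The confidence distance of paragraph [0059] has a closed form via the Gaussian error function: assuming p_F is centered at x̃_F with standard deviation σ̂_F, twice the integral from x̃_E to x̃_F reduces to erf((x̃_F − x̃_E) / (√2 · σ̂_F)). The sketch below uses that closed form; the numeric values are illustrative.

```python
# Hedged sketch of the confidence distance d_EF of paragraph [0059]:
# d_EF = 2 * Integral_{x_E}^{x_F} p_F(x) dx, with p_F ~ N(x_F, sigma_F^2),
# which evaluates to erf((x_F - x_E) / (sqrt(2) * sigma_F)).
import math


def confidence_distance(x_e, x_f, sigma_f):
    """Closed form of the Gaussian integral; the sign follows the integration
    direction, and the magnitude may be used when only the distance matters."""
    return math.erf((x_f - x_e) / (math.sqrt(2) * sigma_f))


# Two same-type sensors: evaluation values 10.0 and 10.1, sigma_F = 0.1
d = confidence_distance(10.0, 10.1, 0.1)
```

Identical evaluation values give a distance of 0, and the distance approaches 1 as the evaluation values drift many standard deviations apart.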
[0060] The establishing of the confidence level matrix further includes: establishing the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0061] Assuming that the same type of sensors includes K sensors, that is, the same object is measured by the K sensors, then the confidence level matrix D is established based on the obtained confidence distances, as shown below:

D = | d₁₁ d₁₂ ⋯ d₁ₖ |
    | d₂₁ d₂₂ ⋯ d₂ₖ |
    |  ⋮    ⋮   ⋱   ⋮  |
    | dₖ₁ dₖ₂ ⋯ dₖₖ |

where the subscript k runs up to K.
[0062] In view of the foregoing K sensors that belong to the same type, it is assumed that the sensor E is the E-th sensor. The elements in the E-th row of the established confidence level matrix D represent the confidence distances from the E-th sensor to all of the remaining sensors. A sum of the elements in this row is obtained. If the sum of the elements in this row is relatively greater than the sum of the elements in each of the remaining rows, it indicates that the data measured by the E-th sensor is trusted by most sensors; in this case, the data measured by the E-th sensor has a relatively higher confidence level. Otherwise, it indicates that the data measured by the E-th sensor is not trusted by most sensors; in this case, the possibility that the data measured by the E-th sensor is real data is relatively small. Specifically, when determining from the confidence level matrix whether data measured by a sensor has a relatively higher confidence level among the same type of sensors, it may be determined, for this type of sensors, how many sensors' data are finally required to be fused, according to the requirements in actual applications. For example, suppose there are ten sensors that belong to the same type. During the subsequent data fusion, if the measurement data of merely five sensors are required to be fused, then the five rows with relatively greater element sums are selected from the ten rows of the confidence level matrix, and the sensors represented by these five rows are used as the sensors having relatively higher confidence levels.
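The row-sum selection rule of paragraphs [0061]–[0062] can be sketched as follows; the 3×3 matrix of pairwise distances is illustrative, and the selection follows the rule stated above (larger row sums indicate higher confidence).

```python
# Sketch of paragraphs [0061]-[0062]: given the K-by-K confidence level matrix,
# select the n sensors whose rows have the largest element sums.

def select_trusted(distance_matrix, n_select):
    """Return the (sorted) indices of the n_select sensors whose matrix rows
    have the largest element sums, i.e. the sensors trusted by most others."""
    row_sums = [sum(row) for row in distance_matrix]
    ranked = sorted(range(len(row_sums)), key=lambda i: row_sums[i], reverse=True)
    return sorted(ranked[:n_select])


# Three same-type sensors; D[i][j] is an illustrative confidence distance d_ij
D = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.7],
    [0.2, 0.1, 0.0],
]
trusted = select_trusted(D, 2)
```

Here the third sensor's row sum (0.3) is far below the others (1.7 and 1.6), so only the first two sensors are kept for the subsequent data fusion.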
[0063] According to the data preprocessing method for data fusion disclosed in this application, error data are removed by using a data-level error-removing algorithm based on the confidence distance, and sensor measurement data with consistent descriptions are then obtained at the feature level based on a feature-level normalization algorithm of knowledge elements. In this way, the sensor measurement data can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity and improving the data fusion efficiency.
[0064] Apparatus embodiments disclosed in this application are described below, and may be used to implement the method embodiments of this application. For details of the apparatus embodiments of this application that are not disclosed, reference may be made to the method embodiments of this application.
[0065] Correspondingly, referring to the schematic structural diagram shown in FIG. 2, another embodiment of this application discloses a data preprocessing apparatus for data fusion, including: a first data obtaining module 10, configured to obtain first measurement data, where the first measurement data include data measured by different types of sensors; a second data obtaining module 20, configured to obtain data measured by any two different types of sensors based on the first measurement data, and respectively set the data measured by the two different types of sensors as target data and reference data; a feature obtaining module 30, configured to obtain features of the target data based on the target data, and obtain features of the reference data based on the reference data; a feature description set obtaining module 40, configured to obtain a target feature description set based on the features of the target data, and obtain a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes the status of the features of the target data, and the reference feature description set is a set that describes the status of the features of the reference data; a similarity obtaining module 50, configured to obtain a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and a determining module 60, configured to determine, based on the similarity, whether the similarity is greater than a preset critical threshold, where if the determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0066] Further, the similarity obtaining module includes: a model establishment unit, configured to establish a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to represent a similarity between the features of the target data and the features of the reference data; and a similarity obtaining unit, configured to obtain the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0067] Further, the apparatus further includes: an initial data obtaining module, configured to obtain initial measurement data, where the initial measurement data are initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; an evaluation module, configured to evaluate the initial measurement data to obtain an evaluation value of the initial measurement data; a confidence level matrix establishment module, configured to establish, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and a first data setting module, configured to obtain a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and set the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0068] Further, the similarity obtaining module is further configured to obtain the similarity between the target data and the reference data according to the following formula:

Sf(a, b) = f(A ∩ B) / ( f(A ∩ B) + α·f(A − B) + β·f(B − A) ), α > 0, β > 0,

where Sf(a, b) represents the similarity between the target data and the reference data, a represents the features of the target data, b represents the features of the reference data, A represents the target feature description set, B represents the reference feature description set, f(A ∩ B) represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, f(A − B) represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, f(B − A) represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, α represents an attention degree to the features of the reference data, and β represents an attention degree to the features of the target data.
[0069] Optionally, the evaluation module includes: a target initial data setting unit, configured to set initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured multiple times by the target sensor; a single-time target initial data obtaining unit, configured to obtain single-time target initial data of the target sensor based on the target initial data, and classify the single-time target initial data into odd-numbered initial data and even-numbered initial data; a first calculation unit, configured to obtain a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtain a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; a partial fusion result obtaining unit, configured to obtain a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; a first evaluation value obtaining unit, configured to obtain an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and a second evaluation value obtaining unit, configured to obtain the evaluation value of the initial measurement data, where the evaluation value of the initial measurement data includes the evaluation values of all of the initial measurement data.
[0070] Further, the partial fusion result obtaining unit is further configured to obtain the partial fusion result of the single-time target initial data according to the following formulas:

x̂ = ( σ₂² / (σ₁² + σ₂²) ) x̄₁ + ( σ₁² / (σ₁² + σ₂²) ) x̄₂, and

σ̂ = √( σ₁² σ₂² / (σ₁² + σ₂²) ),

where x̂ represents the mean fusion value, σ̂ represents the standard deviation fusion value, x̄₁ represents the mean of the odd-numbered initial data, x̄₂ represents the mean of the even-numbered initial data, σ₁ represents the standard deviation of the odd-numbered initial data, and σ₂ represents the standard deviation of the even-numbered initial data.
[0071] Further, the confidence level matrix establishment module includes: a third evaluation value obtaining unit, configured to obtain, for a same type of sensors, an evaluation value of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; a confidence distance obtaining unit, configured to obtain a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and a confidence level matrix establishment unit, configured to establish the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0072] This application is described in detail above in combination with specific implementations and exemplary embodiments, but these descriptions cannot be understood as limitations to this application. A person skilled in the art understands that various equivalent replacements, modifications, or improvements may be made to the technical solutions and implementations of this application without departing from the spirit and scope of this application, and these equivalent replacements, modifications, or improvements all fall within the scope of this application. The protection scope of this application is subject to appended claims.
Claims (10)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910291404.3A CN109766958B (en) | 2019-04-12 | 2019-04-12 | A kind of data preprocessing method and device for data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2025323A NL2025323A (en) | 2020-10-15 |
NL2025323B1 true NL2025323B1 (en) | 2020-12-22 |
Family
ID=66460306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2025323A NL2025323B1 (en) | 2019-04-12 | 2020-04-09 | Data preprocessing method and apparatus for data fusion |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109766958B (en) |
DE (1) | DE102020110028A1 (en) |
NL (1) | NL2025323B1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021134564A1 (en) * | 2019-12-31 | 2021-07-08 | Siemens Aktiengesellschaft | Method and device for processing sensor data |
CN112003891B (en) * | 2020-07-16 | 2022-09-06 | 山东派蒙机电技术有限公司 | Multi-sensing data fusion method for intelligent networked vehicle controller |
CN114528276B (en) * | 2022-02-21 | 2024-01-19 | 新疆能源翱翔星云科技有限公司 | Big data acquisition, storage and management system and method based on artificial intelligence |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104637371B (en) * | 2015-03-06 | 2017-06-30 | 中国农业大学 | A kind of method being embedded into ontologies in game model |
WO2019066841A1 (en) * | 2017-09-28 | 2019-04-04 | Intel Corporation | Multimodal sensing in autonomous driving vehicles with self-healing capabilities |
CN109556615B (en) * | 2018-10-10 | 2022-10-04 | 吉林大学 | Driving map generation method based on multi-sensor fusion cognition of automatic driving |
-
2019
- 2019-04-12 CN CN201910291404.3A patent/CN109766958B/en active Active
-
2020
- 2020-04-09 NL NL2025323A patent/NL2025323B1/en not_active IP Right Cessation
- 2020-04-09 DE DE102020110028.0A patent/DE102020110028A1/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
CN109766958B (en) | 2019-07-05 |
NL2025323A (en) | 2020-10-15 |
DE102020110028A1 (en) | 2020-10-15 |
CN109766958A (en) | 2019-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wilson et al. | Predictive inequity in object detection | |
NL2025323B1 (en) | Data preprocessing method and apparatus for data fusion | |
CN107633265B (en) | Data processing method and device for optimizing credit evaluation model | |
US9164022B2 (en) | Neighborhood thresholding in mixed model density gating | |
US9721213B2 (en) | Information matching apparatus, method of matching information, and computer readable storage medium having stored information matching program | |
US9753968B1 (en) | Systems and methods for detection of anomalous entities | |
US11656174B2 (en) | Outlier detection for spectroscopic classification | |
Ghazal et al. | Data Mining and Exploration: A Comparison Study among Data Mining Techniques on Iris Data Set | |
KR101953190B1 (en) | A multidimensional recursive learning process and system used to discover complex dyadic or multiple counterparty relationships | |
WO2024067387A1 (en) | User portrait generation method based on characteristic variable scoring, device, vehicle, and storage medium | |
WO2019200739A1 (en) | Data fraud identification method, apparatus, computer device, and storage medium | |
Gupta | An efficient feature subset selection approach for machine learning | |
CN114036531A (en) | Multi-scale code measurement-based software security vulnerability detection method | |
McFee et al. | Hierarchical Evaluation of Segment Boundary Detection. | |
CN113128329A (en) | Visual analytics platform for updating object detection models in autonomous driving applications | |
KR102336679B1 (en) | Index normalization based probability distribution selection method for model selection | |
CN103678709B (en) | Recommendation system attack detection method based on time series data | |
US7469186B2 (en) | Finding usable portion of sigmoid curve | |
CN117857202A (en) | Multi-dimensional security assessment method for information system | |
KR20210091591A (en) | An electronic device including evaluation operation of originated technology | |
CN117036781A (en) | Image classification method based on tree comprehensive diversity depth forests | |
CN115187064A (en) | Qingdao city property development index analysis based on principal component and clustering method | |
Jandová et al. | Age verification using random forests on facial 3D landmarks | |
Fong et al. | Incremental methods for detecting outliers from multivariate data stream | |
Sriram et al. | Exploratory data analysis using artificial neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM | Lapsed because of non-payment of the annual fee |
Effective date: 20230501 |