NL2025323B1 - Data preprocessing method and apparatus for data fusion - Google Patents
- Publication number
- NL2025323B1
- Authority
- NL
- Netherlands
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0224—Process history based detection method, e.g. whereby history implies the availability of large amounts of data
- G05B23/024—Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
Abstract
A preprocessing method and apparatus for data fusion. Data measured by two different types of sensors are obtained in an arbitrary manner, and the data measured by the two different types of sensors are respectively set as target data and reference data. Features of the target data and features of the reference data are obtained based on the target data and the reference data. A target feature description set and a reference feature description set are obtained. A similarity between the target data and the reference data is obtained based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set. Whether the similarity is greater than a preset critical threshold is determined based on the similarity, so as to determine whether the target data and the reference data can be fused, thus improving the data fusion efficiency. [FIG 1]
Description
[0001] This application relates to the field of data fusion technologies, and in particular, to a data preprocessing method and apparatus for data fusion.
[0002] A multi-sensing system obtains comprehensive and complete information about an objective fact by using a plurality of sensors. For example, in an expressway application scenario, when an objective fact such as the traffic flow needs to be analyzed, real-time measurements are usually performed by providing a geomagnetic sensor and a light sensor, and analyses are then performed based on the measured data to obtain information about the traffic flow. For a multi-sensing system, the obtained data are varied and complex, and the data provided by the various types of sensors have different features. For different analysis requirements, in order to improve the credibility and utilization of the data, the various sensors and the data measured thereby are usually appropriately controlled and fused by using the data fusion technology.
[0003] In addition to determining the authenticity of the data, the key point of the data fusion technology lies in determining the weights of the various sensor data. Therefore, in the prior art, a preprocessing method is usually used to preprocess the measurement data before the data fusion. In a common data preprocessing method, the weights of the measurement data of the various sensors, which are to be used during the data fusion, are decided mainly by calculating the measurement variances of the sensors, so as to achieve the data fusion.
[0004] However, during the research process of the present invention, the applicant found that in the data preprocessing method in the prior art, the weights of all of the sensor measurement data in the multi-sensing system are calculated before the data fusion, and all of the sensor data are then fused based on the weights. In actual applications, the various types of sensor data in the multi-sensing system have different features. When an objective fact needs to be analyzed, it may not be required to perform the data fusion on all of the sensor measurement data. The data preprocessing method in the prior art cannot determine which sensors' data can be fused and which cannot, but fuses all of the sensor measurement data. This may increase the computational complexity and reduce the data fusion efficiency.
[0005] To resolve problems that the data preprocessing method in the prior art reduces the data fusion efficiency, and an error occurs between the result obtained by the data fusion and the result that is actually required, this application discloses a data preprocessing method and apparatus for data fusion, according to the following embodiments.
[0006] According to a first aspect of this application, a data preprocessing method for data fusion is disclosed, including: obtaining first measurement data, where the first measurement data include data measured by different types of sensors; obtaining data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively setting the data measured by the two different types of sensors as target data and reference data; obtaining features of the target data based on the target data, and obtaining features of the reference data based on the reference data; obtaining a target feature description set based on the features of the target data, and obtaining a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes status of the features of the target data, and the reference feature description set is a set that describes status of the features of the reference data; obtaining a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and determining, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
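The claimed steps can be sketched as a small pipeline. This is a hedged illustration only, not the patented implementation: `extract_features`, `describe`, and the particular feature statuses are hypothetical stand-ins for the feature extraction, description-set construction, and similarity model described in the later paragraphs.

```python
# Hypothetical sketch of the claimed preprocessing pipeline.
# All helpers are illustrative stand-ins, not the patented method.

def extract_features(data):
    # Illustrative "features" of a measurement series: min, max, mean.
    return {"min": min(data), "max": max(data), "mean": sum(data) / len(data)}

def describe(features):
    # Illustrative "feature description set": one (feature, status) pair
    # per feature, with a made-up qualitative status.
    return {(name, "high" if value > 0 else "low")
            for name, value in features.items()}

def similarity(set_a, set_b, alpha=0.5, beta=0.5):
    # Ratio model matching the formula given later in paragraph [0008].
    both = len(set_a & set_b)
    denom = both + alpha * len(set_a - set_b) + beta * len(set_b - set_a)
    return both / denom if denom else 0.0

def can_fuse(target_data, reference_data, threshold=0.5):
    # Steps of the first aspect: features -> description sets -> similarity
    # -> comparison against a preset critical threshold.
    set_a = describe(extract_features(target_data))
    set_b = describe(extract_features(reference_data))
    return similarity(set_a, set_b) > threshold
```

With this toy feature definition, two series whose feature statuses agree are judged fusible, while series with disjoint statuses are not.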
[0007] Optionally, the obtaining of the similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set, includes: establishing a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data; and obtaining the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0008] Optionally, the similarity between the target data and the reference data is obtained according to the following formula:
$$Sf(a,b)=\frac{f(A\cap B)}{f(A\cap B)+\alpha f(A-B)+\beta f(B-A)},\quad \alpha>0,\ \beta>0,$$
where $Sf(a,b)$ represents the similarity between the target data and the reference data, $a$ represents the features of the target data, $b$ represents the features of the reference data, $A$ represents the target feature description set, $B$ represents the reference feature description set, $f(A\cap B)$ represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, $f(A-B)$ represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, $f(B-A)$ represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, $\alpha$ represents an attention degree to the features of the reference data, and $\beta$ represents an attention degree to the features of the target data.
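The similarity formula of paragraph [0008] can be illustrated numerically. The description sets below are invented examples (statuses a geomagnetic sensor and a light sensor might describe); note that unequal attention degrees make the measure asymmetric.

```python
# Numeric illustration of the ratio-model similarity of paragraph [0008].
# The description sets and attention degrees are made-up examples.

def feature_similarity(A, B, alpha, beta):
    """f(.) counts statuses in each region of the two description sets."""
    n_both = len(A & B)      # f(A ∩ B)
    n_a_only = len(A - B)    # f(A - B)
    n_b_only = len(B - A)    # f(B - A)
    return n_both / (n_both + alpha * n_a_only + beta * n_b_only)

# Target feature description set (hypothetical geomagnetic-sensor statuses)
A = {"vehicle_present", "vehicle_count", "speed", "heading"}
# Reference feature description set (hypothetical light-sensor statuses)
B = {"vehicle_present", "vehicle_count", "brightness"}

s_ab = feature_similarity(A, B, alpha=0.8, beta=0.2)  # 2 / (2 + 0.8*2 + 0.2*1)
s_ba = feature_similarity(B, A, alpha=0.8, beta=0.2)  # 2 / (2 + 0.8*1 + 0.2*2)
```

Because `alpha != beta`, swapping the roles of target and reference changes the result, which is the asymmetry paragraph [0038] motivates.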
[0009] Optionally, before the obtaining of the first measurement data, the method further includes: obtaining initial measurement data, where the initial measurement data include initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; evaluating the initial measurement data to obtain an evaluation value of the initial measurement data; establishing, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; and obtaining a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and setting the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0010] Optionally, the evaluating of the initial measurement data to obtain the evaluation value of the initial measurement data includes: setting initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured for multiple times by the target sensor; obtaining single-time target initial data of the target sensor based on the target initial data, and classifying the single-time target initial data into odd-numbered initial data and even-numbered initial data; obtaining a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtaining a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; obtaining a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; obtaining an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and obtaining the evaluation data of the initial measurement data, where the evaluation data of the initial measurement data include evaluation values of all of the initial measurement data.
[0011] Optionally, the obtaining of the partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, includes: obtaining the partial fusion result of the single-time target initial data according to the following formulas:
$$\bar{x}=\frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\bar{x}_1+\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\bar{x}_2$$
and
$$\sigma=\sqrt{\frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}},$$
where $\bar{x}$ represents the mean fusion value, $\sigma$ represents the standard deviation fusion value, $\bar{x}_1$ represents the mean of the odd-numbered initial data, $\bar{x}_2$ represents the mean of the even-numbered initial data, $\sigma_1$ represents the standard deviation of the odd-numbered initial data, and $\sigma_2$ represents the standard deviation of the even-numbered initial data.
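The formulas of paragraph [0011] have the form of the standard inverse-variance (minimum-variance) fusion of two estimates; the sketch below assumes that reading of the garbled source equations. The sample data are invented.

```python
import statistics

def partial_fusion(samples):
    """Split a measurement series into odd- and even-numbered readings and
    fuse the two halves with inverse-variance weighting, as in paragraph
    [0011]. Illustrative sketch, not the patented implementation."""
    odd = samples[0::2]   # 1st, 3rd, 5th, ... readings
    even = samples[1::2]  # 2nd, 4th, 6th, ... readings
    x1, x2 = statistics.mean(odd), statistics.mean(even)
    s1, s2 = statistics.stdev(odd), statistics.stdev(even)
    denom = s1**2 + s2**2
    # Mean fusion value: each half weighted by the other's variance.
    x_fused = (s2**2 / denom) * x1 + (s1**2 / denom) * x2
    # Standard deviation fusion value: never larger than min(s1, s2).
    s_fused = (s1**2 * s2**2 / denom) ** 0.5
    return x_fused, s_fused
```

A useful sanity check on this reading: the fused mean lies between the two half-means, and the fused standard deviation is smaller than either half's, which is why the result can serve as an evaluation value for the sensor.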
[0012] Optionally, the establishing of the confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data, for the same type of sensors, includes: obtaining, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; obtaining a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and establishing the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
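The confidence-matrix step of paragraph [0012] can be sketched as follows. The patent does not disclose the exact confidence-distance definition, so the Gaussian kernel below is an assumed stand-in; any monotone map from evaluation-value distance to confidence would fit the same structure.

```python
import math

def confidence_matrix(evals, sigma=1.0):
    """Confidence-level matrix for one sensor type, built from pairwise
    distances between evaluation values. The Gaussian kernel is an
    assumption; the patent only requires a confidence distance."""
    n = len(evals)
    return [[math.exp(-((evals[i] - evals[j]) ** 2) / (2 * sigma**2))
             for j in range(n)] for i in range(n)]

def most_confident(evals, sigma=1.0):
    """Index of the sensor whose readings are most supported by its
    peers (largest row sum of the confidence matrix)."""
    m = confidence_matrix(evals, sigma)
    sums = [sum(row) for row in m]
    return sums.index(max(sums))

# Three sensors of one type; the third reading is an outlier and should
# not be selected as the relatively higher-confidence sensor.
idx = most_confident([10.1, 10.0, 14.2])
```

The selected sensor's evaluation value would then be set as the first measurement data, per paragraph [0009].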
[0013] According to a second aspect of this application, a data preprocessing apparatus for data fusion is disclosed, including: a first data obtaining module, configured to obtain first measurement data, where the first measurement data include data measured by different types of sensors; a second data obtaining module, configured to obtain data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively set the data measured by the two different types of sensors as target data and reference data; a feature obtaining module, configured to obtain features of the target data based on the target data, and obtain features of the reference data based on the reference data; a feature description set obtaining module, configured to obtain a target feature description set based on the features of the target data, and obtain a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes status of the features of the target data, and the reference feature description set is a set that describes status of the features of the reference data; a similarity obtaining module, configured to obtain a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and a determining module, configured to determine, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0014] Optionally, the similarity obtaining module includes: a model establishment unit, configured to establish a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data; and a similarity obtaining unit, configured to obtain the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0015] Optionally, the apparatus further includes: an initial data obtaining module, configured to obtain initial measurement data, where the initial measurement data are initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; an evaluation module, configured to evaluate the initial measurement data to obtain an evaluation value of the initial measurement data; a confidence level matrix establishment module, configured to establish, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and a first data setting module, configured to obtain a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and set the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0016] Optionally, the similarity obtaining module is further configured to obtain the similarity between the target data and the reference data according to the following formula:
$$Sf(a,b)=\frac{f(A\cap B)}{f(A\cap B)+\alpha f(A-B)+\beta f(B-A)},\quad \alpha>0,\ \beta>0,$$
where $Sf(a,b)$ represents the similarity between the target data and the reference data, $a$ represents the features of the target data, $b$ represents the features of the reference data, $A$ represents the target feature description set, $B$ represents the reference feature description set, $f(A\cap B)$ represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, $f(A-B)$ represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, $f(B-A)$ represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, $\alpha$ represents an attention degree to the features of the reference data, and $\beta$ represents an attention degree to the features of the target data.
[0017] Optionally, the evaluation module includes: a target initial data setting unit, configured to set initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured for multiple times by the target sensor; a single-time target initial data obtaining unit, configured to obtain single-time target initial data of the target sensor based on the target initial data, and classify the single-time target initial data into odd-numbered initial data and even-numbered initial data; a first calculation unit, configured to obtain a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtain a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; a partial fusion result obtaining unit, configured to obtain a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; a first evaluation value obtaining unit, configured to obtain an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and a second evaluation value obtaining unit, configured to obtain the evaluation data of the initial measurement data, where the evaluation data of the initial measurement data include evaluation values of all of the initial measurement data.
[0018] Optionally, the partial fusion result obtaining unit is further configured to obtain the partial fusion result of the single-time target initial data according to the following formulas:
$$\bar{x}=\frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}\bar{x}_1+\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}\bar{x}_2$$
and
$$\sigma=\sqrt{\frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}},$$
where $\bar{x}$ represents the mean fusion value, $\sigma$ represents the standard deviation fusion value, $\bar{x}_1$ represents the mean of the odd-numbered initial data, $\bar{x}_2$ represents the mean of the even-numbered initial data, $\sigma_1$ represents the standard deviation of the odd-numbered initial data, and $\sigma_2$ represents the standard deviation of the even-numbered initial data.
[0019] Optionally, the confidence level matrix establishment module includes: a third evaluation value obtaining unit, configured to obtain, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors, based on the evaluation value of the initial measurement data; a confidence distance obtaining unit, configured to obtain a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and a confidence level matrix establishment unit, configured to establish the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0020] This application discloses a preprocessing method and apparatus for data fusion. In the method, data measured by two different types of sensors are obtained in an arbitrary manner, and the data measured by the two different types of sensors are respectively set as the target data and the reference data. The features of the target data and the features of the reference data are obtained based on the target data and the reference data. Subsequently, the target feature description set and the reference feature description set are obtained. A similarity between the target data and the reference data is then obtained based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set. Finally, whether the similarity is greater than the preset critical threshold is determined based on the similarity, so as to determine whether the target data and the reference data can be fused.
[0021] Before the data fusion, the data preprocessing method in the prior art cannot determine which sensors' data can be fused and which cannot, but fuses all of the sensor measurement data. As a result, the computational complexity during the data fusion is increased and the data fusion efficiency is reduced. Compared with the prior art, the data preprocessing method and apparatus disclosed in this application can obtain, at a feature level, the similarity between the sensor measurement data based on the features of the sensor measurement data, so as to determine, based on the similarity, whether the measurement data of any two sensors can be fused. In actual applications, for the multi-sensing system, the data preprocessing method disclosed in this application can determine, before the data fusion, which sensors' measurement data can be fused, so that the measurement data of those sensors can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity of the data fusion and improving the data fusion efficiency.
[0022] To more clearly describe the technical solutions of this application, the accompanying drawings to be used in the embodiments are briefly illustrated below. It is apparent that persons of ordinary skill in the art can also derive other accompanying drawings from these accompanying drawings without creative effort.
[0023] FIG 1 is a schematic workflow diagram of a data preprocessing method for data fusion according to an embodiment of this application; and
[0024] FIG 2 is a schematic structural diagram of a data preprocessing apparatus for data fusion according to an embodiment of this application.
[0025] To resolve problems that the data preprocessing method in the prior art reduces the data fusion efficiency, and an error occurs between the result obtained by the data fusion and the result that is actually required, this application discloses a data preprocessing method and apparatus for data fusion, according to the following embodiments.
[0026] Referring to the schematic workflow diagram shown in FIG 1, a first embodiment of this application discloses a data preprocessing method for data fusion, including the following steps.
[0027] Step S11: Obtaining first measurement data, where the first measurement data include data measured by different types of sensors.
[0028] Step S12: Obtaining data measured by two different types of sensors in an arbitrary manner based on the first measurement data, and respectively setting the data measured by the two different types of sensors as target data and reference data.
[0029] Step S13: Obtaining features of the target data based on the target data, and obtaining features of the reference data based on the reference data.
[0030] Step S14: Obtaining a target feature description set based on the features of the target data, and obtaining a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes the status of the features of the target data, and the reference feature description set is a set that describes the status of the features of the reference data.
[0031] People have different understandings of complex objective objects, and thus their abstractions of the corresponding objective objects also differ. Inevitably, knowledge elements that do not conform to "definition" and "incompleteness" may occur, finally resulting in a problem that descriptions of data for the objective objects or an attribute thereof are inconsistent.
The knowledge element refers to a knowledge unit that cannot be further segmented and has a complete knowledge expression. To determine whether data measured by any two sensors can be fused, the data preprocessing method disclosed in this application constructs a knowledge element base by obtaining the features of the target data and the features of the reference data, and further determines whether descriptions of the target data and the reference data are consistent at a feature level, so as to enable descriptions of data used during the data fusion to be consistent at the feature level. For example, when processing the measurement data of the light sensor, because the light sensor is mainly configured to measure a shadow parameter and a brightness parameter, for the data measured by the light sensor, shadow and brightness may serve as features of the measurement data of the light sensor, and data related to the shadow and data related to the brightness may be used as feature description sets by extracting, from the measurement data, relevant data for describing the shadow and relevant data for describing the brightness.
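The light-sensor example above can be made concrete. The feature names ("shadow", "brightness") come from the text; the measurement records and values below are invented for illustration.

```python
# Concrete rendering of the light-sensor example. The feature names are
# taken from the text; all numeric values are invented.

# Raw measurement records from a (hypothetical) light sensor
light_measurements = [
    {"shadow": 0.72, "brightness": 310.0},
    {"shadow": 0.68, "brightness": 295.0},
]

# Features of the measurement data
features = ("shadow", "brightness")

# Feature description sets: the relevant data describing each feature,
# extracted from the measurement records
description_sets = {
    f: [record[f] for record in light_measurements] for f in features
}
```

In this rendering, `description_sets["shadow"]` collects the shadow-related data and `description_sets["brightness"]` the brightness-related data, matching the extraction step described in the paragraph.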
[0032] Step S15: Obtaining a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set.
[0033] Step S16: Determining, based on the similarity, whether the similarity is greater than a preset critical threshold, where if a determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0034] To obtain comprehensive and complete information about an environment or an objective object, the multi-sensing system may include multiple types of sensors. For example, the multi-sensing system in an expressway application usually includes an infrared sensor, an ultrasonic sensor, a piezoelectric sensor, a light sensor, a geomagnetic sensor, and the like. For a certain analysis requirement, for example, the traffic flow, the measurement data of the light sensor and the geomagnetic sensor are fused during the data fusion. However, with the data preprocessing method in the prior art, it cannot be determined before the data fusion which sensors' data can be fused and which cannot, and all of the sensor measurement data are fused instead. This results in increased computational complexity during the data fusion and reduced data fusion efficiency. Compared with the prior art, the data preprocessing method and apparatus disclosed in this application can obtain the similarity between the sensor measurement data based on the features of the sensor measurement data, and determine, based on the similarity, whether the measurement data of any two sensors can be fused. In actual applications, for the multi-sensing system, the data preprocessing method disclosed in this application can determine, before the data fusion, which sensors' measurement data can be fused, so that the measurement data of such sensors can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity of the data fusion and improving the data fusion efficiency.
[0035] Further, the obtaining of the similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set, includes: establishing a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to characterize a similarity between the features of the target data and the features of the reference data.
[0036] An objective object or system may be abstracted as a model m, and the model m may be expressed as a knowledge element Km by studying the common form of knowledge expression of the model m. Assume that Nm is a conceptual name of the objective object or system, where Nm is essentially a set of vocabularies having the same or similar meanings; Am is a set describing the attributes of the features of the objective object and how to describe these attributes, and is classified into a set of qualitative status descriptions and a set of measurable quantitative status descriptions; and Rm is a set describing the association relationships between the attributes, where r ∈ Rm indicates a mapping relationship on Am × Am. The relevant relationships may be classified into qualitative relevant relationships and quantitative relevant relationships. A relationship knowledge element model can be abstracted by summarizing the common features of the relevant relationships. The knowledge element Km corresponding to the model m may be expressed as: Km = (Nm, Am, Rm).
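The triple Km = (Nm, Am, Rm) can be rendered as a small data structure. Only the triple structure comes from the text; the field contents below (traffic-flow names, attributes, and relationships) are invented examples.

```python
from dataclasses import dataclass

@dataclass
class KnowledgeElement:
    """Km = (Nm, Am, Rm) as described in paragraph [0036].
    Field contents are illustrative; the text fixes only the triple."""
    Nm: set    # synonymous concept names of the objective object/system
    Am: dict   # attribute name -> "qualitative" or "quantitative" status
    Rm: list   # relationships (attr_a, attr_b, kind) on Am x Am

# A hypothetical knowledge element for the traffic-flow example
km = KnowledgeElement(
    Nm={"traffic flow", "vehicle flow"},
    Am={"vehicle_count": "quantitative", "congested": "qualitative"},
    Rm=[("vehicle_count", "congested", "quantitative")],
)
```

Two sensors' knowledge elements could then be compared at the feature level by applying the similarity function of paragraph [0037] to their Am sets.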
[0037] Obviously, the following properties may be obtained according to the definition of a similarity function Sf of the knowledge element:

Sf(x, y) ∈ [0, 1];

Sf(x, y) = 0, if and only if x and y are irrelevant;

Sf(x, y) = 1, if and only if x and y are the same; and

Sf(x, y) = Sf(y, x),

where x and y represent different knowledge elements, and may be understood as the features of the target data and the features of the reference data disclosed above in this application.
[0038] It may be learned from the foregoing definitions that it is very convenient to calculate the similarity by using a geometrical principle. However, due to its symmetry, the geometrical similarity model is unsuitable for comparing the many objective objects in real life that have asymmetrical features. Similarity comparison models for asymmetrical features have wider applications, and on the basis of the feature similarity model, research on extended applications of the similarity model continues to develop in depth. The feature similarity model is used in this application. The model enumerates the features of the objective object as sets, and obtains a similarity between the feature sets by defining a function over the feature elements, so as to further characterize the feature-level similarity between the target data and the reference data.
[0039] The obtaining of the similarity further includes obtaining the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0040] Further, the similarity between the target data and the reference data is obtained according to the following formula:

Sf(a, b) = f(A ∩ B) / ( f(A ∩ B) + α·f(A − B) + β·f(B − A) ), α > 0, β > 0,

where Sf(a, b) represents the similarity between the target data and the reference data, a represents the features of the target data, b represents the features of the reference data, A represents the target feature description set, B represents the reference feature description set, f(A ∩ B) represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, f(A − B) represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, f(B − A) represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, α represents an attention degree to the features of the reference data, and β represents an attention degree to the features of the target data. The attention degree is a value preset according to actual application situations. For a certain analysis requirement, when determining which measurement data may be fused, the attention degree represents the importance levels of different sensor measurement data. For example, for the analysis requirement of the traffic flow, when determining the similarity between the measurement data of two sensors, i.e. the light sensor and the geomagnetic sensor, if an operator considers that the measurement data of the light sensor is particularly important to the analysis of the traffic flow, the attention degree to the features of the measurement data of the light sensor may be manually preset to be relatively greater, and the attention degree to the features of the measurement data of the geomagnetic sensor may be manually preset to be relatively smaller.
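The set-ratio formula of paragraph [0040] can be sketched as follows, taking f as set cardinality. The status names in the example sets are invented for illustration and are not from the patent.

```python
# Hedged sketch of the feature similarity Sf(a, b) of paragraph [0040]:
# a Tversky-style ratio over status-description sets, with f taken as cardinality.

def feature_similarity(A, B, alpha=0.5, beta=0.5):
    """Sf(a, b) = f(A∩B) / (f(A∩B) + alpha*f(A−B) + beta*f(B−A)).

    A, B        -- status-description sets of the target / reference features
    alpha, beta -- preset attention degrees (both > 0)
    """
    A, B = set(A), set(B)
    common = len(A & B)   # f(A ∩ B): statuses in both sets
    only_a = len(A - B)   # f(A − B): statuses only in the target set
    only_b = len(B - A)   # f(B − A): statuses only in the reference set
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Illustrative status sets for two sensors observing the same road segment
target_set = {"vehicle_present", "speed_high", "lane_1"}
reference_set = {"vehicle_present", "speed_high", "lane_2"}
sim = feature_similarity(target_set, reference_set, alpha=0.5, beta=0.5)
```

With two shared statuses and one unshared status per set, sim is 2 / (2 + 0.5 + 0.5) ≈ 0.667; identical sets yield 1 and disjoint sets yield 0, matching the properties in paragraph [0037].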
[0041] In actual applications, each objective fact is a multi-dimensional complex having a plurality of features. To calculate the similarity for an objective fact with such multi-dimensional complex features, the feature similarity model for each feature may be linearly weighted so as to obtain a synthetic multi-feature similarity model of the objective object. Assuming that the target data has n features, the similarity between the i-th feature of the target data and the reference data is calculated as Sf_i(a, b), and the similarity between the target data and the reference data is:

Sf(a, b) = Σ_{i=1}^{n} ω_i · Sf_i(a, b),

where ω_i represents the weight of the i-th feature, and may be preset according to actual application situations.
[0042] When evaluating the value of the similarity, a critical threshold μ is set, with 0 < μ < 1. When Sf(a, b) > μ, it indicates that there is a large possibility that the target data and the reference data are measurement data of the same object, and the target data and the reference data can be fused. When Sf(a, b) ≤ μ, it indicates that the target data and the reference data are not measurement data of the same object, and the target data and the reference data cannot be fused. Specifically, the critical threshold μ is set according to the degree of similarity required in actual applications. For example, if the requirements on the similarity are relatively high, the critical threshold μ may be set to 0.8; when the requirements on the similarity are relatively low, the critical threshold μ may be appropriately set smaller.
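The weighted combination of paragraph [0041] and the threshold test of paragraph [0042] can be sketched together. The per-feature similarities, weights, and threshold below are illustrative values, not taken from the patent.

```python
# Sketch of paragraphs [0041]-[0042]: combine per-feature similarities with
# preset weights, then compare the result against the critical threshold mu.

def combined_similarity(per_feature_sims, weights):
    """Sf(a, b) = sum_i w_i * Sf_i(a, b); weights are preset per application."""
    assert len(per_feature_sims) == len(weights)
    return sum(w * s for w, s in zip(weights, per_feature_sims))


def can_fuse(similarity, mu=0.8):
    """The data can be fused when the similarity exceeds the critical threshold."""
    return similarity > mu


sims = [0.9, 0.7, 0.95]    # Sf_i for three features of the target data (illustrative)
weights = [0.5, 0.3, 0.2]  # preset weights omega_i, summing to 1 (illustrative)
s = combined_similarity(sims, weights)  # 0.45 + 0.21 + 0.19 = 0.85
```

With μ = 0.8, the combined similarity of 0.85 exceeds the threshold, so this pair of measurement data would be passed on to the fusion stage.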
[0043] The sensors are easily subject to environmental interference during the data collection process. Therefore, the data measured by the sensors usually have deviations, which affects the accuracy and stability of the data fusion. For this reason, according to the data preprocessing method disclosed in this application, in order to improve the accuracy of the first measurement data, errors in the initial data measured by the sensors are removed before the first measurement data are obtained.
[0044] Further, before the obtaining of the first measurement data, the method further includes: obtaining initial measurement data, where the initial measurement data include initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; evaluating the initial measurement data to obtain an evaluation value of the initial measurement data; establishing, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and obtaining a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and setting the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0045] Further, the evaluating of the initial measurement data to obtain the evaluation value of the initial measurement data includes: setting initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured multiple times by the target sensor; obtaining single-time target initial data of the target sensor based on the target initial data, and classifying the single-time target initial data into odd-numbered initial data and even-numbered initial data, where the odd-numbered initial data are x₁, x₃, … and include T₁ pieces of data in total, and the even-numbered initial data are x₂, x₄, … and include T₂ pieces of data in total; and obtaining a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtaining a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data.
[0046] Specifically, the mean of the odd-numbered initial data is obtained according to the following formula:

x̄₁ = (1/T₁) Σ_{t₁=1}^{T₁} x_{t₁},

where in the odd-numbered initial data, x_{t₁} represents the t₁-th piece of data, and t₁ ∈ [1, T₁].

[0047] The mean of the even-numbered initial data is obtained according to the following formula:

x̄₂ = (1/T₂) Σ_{t₂=1}^{T₂} x_{t₂},

where in the even-numbered initial data, x_{t₂} represents the t₂-th piece of data, and t₂ ∈ [1, T₂].

[0048] The standard deviation of the odd-numbered initial data is obtained according to the following formula:

σ₁ = √( (1/T₁) Σ_{t₁=1}^{T₁} (x_{t₁} − x̄₁)² )

[0049] The standard deviation of the even-numbered initial data is obtained according to the following formula:

σ₂ = √( (1/T₂) Σ_{t₂=1}^{T₂} (x_{t₂} − x̄₂)² )
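The odd/even split and the statistics of paragraphs [0045]–[0049] can be sketched as follows; the readings are illustrative values, and the standard deviation follows the population form (divide by the count) shown in the formulas above.

```python
# Sketch of paragraphs [0045]-[0049]: split one pass of a sensor's readings into
# odd- and even-numbered subsequences and compute the mean and standard
# deviation of each subsequence.
import math


def odd_even_stats(readings):
    odd = readings[0::2]    # x1, x3, ...  (T1 pieces)
    even = readings[1::2]   # x2, x4, ...  (T2 pieces)

    def mean_std(xs):
        m = sum(xs) / len(xs)
        # population standard deviation, matching the 1/T factor above
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
        return m, s

    return mean_std(odd), mean_std(even)


# Six illustrative readings from one measurement pass of a single sensor
readings = [10.1, 9.9, 10.3, 10.0, 9.8, 10.2]
(m1, s1), (m2, s2) = odd_even_stats(readings)
```

Here m1 and s1 correspond to x̄₁ and σ₁, and m2 and s2 to x̄₂ and σ₂.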
[0050] The evaluating of the initial measurement data further includes: obtaining a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; obtaining an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and obtaining the evaluation value of the initial measurement data, where the evaluation value of the initial measurement data includes the evaluation values of all of the initial measurement data.
[0051] Further, the obtaining of the partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data includes: obtaining the partial fusion result of the single-time target initial data according to the following formulas:

x̂ = ( σ₂² / (σ₁² + σ₂²) ) x̄₁ + ( σ₁² / (σ₁² + σ₂²) ) x̄₂, and

σ̂ = √( σ₁² σ₂² / (σ₁² + σ₂²) ),

where x̂ represents the mean fusion value, σ̂ represents the standard deviation fusion value, x̄₁ represents the mean of the odd-numbered initial data, x̄₂ represents the mean of the even-numbered initial data, σ₁ represents the standard deviation of the odd-numbered initial data, and σ₂ represents the standard deviation of the even-numbered initial data.
[0052] Specifically, a deduction formula of the mean fusion value x̂ is:

x̂ = [ σ₂²/(σ₁²+σ₂²)   σ₁²/(σ₁²+σ₂²) ] · [ x̄₁ ; x̄₂ ],

where [ σ₂²/(σ₁²+σ₂²)   σ₁²/(σ₁²+σ₂²) ] is a matrix with one row and two columns, and [ x̄₁ ; x̄₂ ] is the column vector of the two means.
[0053] A deduction formula of the standard deviation fusion value σ̂ is:

1/σ̂² = 1/σ₁² + 1/σ₂²
[0054] A variance fusion value σ̂² may be easily obtained based on the standard deviation fusion value:

σ̂² = σ₁² σ₂² / (σ₁² + σ₂²)
[0055] The evaluation value of the target initial data that is obtained based on the partial fusion result of the single-time target initial data is:

x̃ = (1/n) Σ_{j=1}^{n} x̂_j,

where x̃ represents the evaluation value of the target initial data, x̂_j represents the mean fusion value of the initial data measured at the j-th time by the target sensor, and the target sensor measures n times in total.
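The inverse-variance fusion of paragraphs [0051]–[0054] and the averaging of paragraph [0055] can be sketched as follows; the input statistics are illustrative values.

```python
# Sketch of paragraphs [0051]-[0055]: fuse the odd/even statistics of one pass
# into a mean fusion value and a standard deviation fusion value, then average
# the fused means over n passes to obtain the evaluation value.
import math


def partial_fusion(m1, s1, m2, s2):
    """x_hat = s2^2/(s1^2+s2^2)*m1 + s1^2/(s1^2+s2^2)*m2,
    sigma_hat = sqrt(s1^2*s2^2/(s1^2+s2^2))  (inverse-variance weighting)."""
    v1, v2 = s1 ** 2, s2 ** 2
    x_hat = (v2 * m1 + v1 * m2) / (v1 + v2)
    sigma_hat = math.sqrt(v1 * v2 / (v1 + v2))
    return x_hat, sigma_hat


def evaluation_value(fused_means):
    """x_tilde = (1/n) * sum of the n per-pass mean fusion values."""
    return sum(fused_means) / len(fused_means)


# One pass: odd-half mean 10.0 (std 0.2), even-half mean 10.2 (std 0.1)
x_hat, sigma_hat = partial_fusion(10.0, 0.2, 10.2, 0.1)
```

Note that the fused mean is pulled toward the less noisy half (the even half here), and the fused standard deviation is smaller than either input standard deviation, which is the error-removing effect described in paragraph [0056].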
[0056] To improve accuracy of the initial measurement data, single-time measurement data of a single sensor is partially fused, so as to remove errors of the single-time measurement data; and then the evaluation value of the initial measurement data is obtained, so as to remove errors of the initial measurement data of all of the sensors.
[0057] Because different sensors suffer from different interference when collecting data, the credibility of the data they measure also differs. This problem also exists within the same type of sensors. In order to screen out the measurement data having relatively higher credibility, for the same type of sensors, a confidence level matrix of the measurement data is subsequently established by calculating a confidence distance between the sensors, and the confidence level matrix is then used to determine which sensors among the same type of sensors measure data with relatively higher credibility.
[0058] Further, the establishing, for the same type of sensors, of the confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data includes: obtaining, for the same type of sensors, the evaluation value of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and obtaining a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors.
[0059] Specifically, assuming that a sensor E and a sensor F are the same type of sensors, the evaluation value of the initial measurement data of the sensor E is x̃_E and the standard deviation fusion value is σ̂_E; and the evaluation value of the initial measurement data of the sensor F is x̃_F and the standard deviation fusion value is σ̂_F. The confidence distance d_EF between the two sensors is obtained according to the following formula:

d_EF = 2 ∫_{x̃_E}^{x̃_F} p_F(x) dx,

where p_F(x) is defined as the probability density function of x̃_F, that is,

p_F(x) = ( 1 / (√(2π) σ̂_F) ) exp( −(1/2) ( (x − x̃_F) / σ̂_F )² ),

and exp represents the exponential function that uses the natural constant e as its base.
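The confidence distance of paragraph [0059] has a closed form via the Gaussian error function: assuming p_F is centered at x̃_F with standard deviation σ̂_F, twice the integral from x̃_E to x̃_F reduces to erf((x̃_F − x̃_E) / (√2 · σ̂_F)). The sketch below uses that closed form; the numeric values are illustrative.

```python
# Hedged sketch of the confidence distance d_EF of paragraph [0059]:
# d_EF = 2 * Integral_{x_E}^{x_F} p_F(x) dx, with p_F ~ N(x_F, sigma_F^2),
# which evaluates to erf((x_F - x_E) / (sqrt(2) * sigma_F)).
import math


def confidence_distance(x_e, x_f, sigma_f):
    """Closed form of the Gaussian integral; the sign follows the integration
    direction, and the magnitude may be used when only the distance matters."""
    return math.erf((x_f - x_e) / (math.sqrt(2) * sigma_f))


# Two same-type sensors: evaluation values 10.0 and 10.1, sigma_F = 0.1
d = confidence_distance(10.0, 10.1, 0.1)
```

Identical evaluation values give a distance of 0, and the distance approaches 1 as the evaluation values drift many standard deviations apart.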
[0060] The establishing of the confidence level matrix further includes: establishing the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0061] Assuming that the same type of sensors includes K sensors, that is, the same object is measured by the K sensors, then the confidence level matrix D is established based on the obtained confidence distances, as shown below:

D = | d₁₁ d₁₂ ⋯ d₁ₖ |
    | d₂₁ d₂₂ ⋯ d₂ₖ |
    |  ⋮    ⋮   ⋱   ⋮  |
    | dₖ₁ dₖ₂ ⋯ dₖₖ |

where the subscript k runs up to K.
[0062] In view of the foregoing K sensors that belong to the same type, it is assumed that the sensor E is the E-th sensor. The elements in the E-th row of the established confidence level matrix D represent the confidence distances from the E-th sensor to all of the remaining sensors. A sum of the elements in this row is obtained. If the sum of the elements in this row is relatively greater than the sum of the elements in each of the remaining rows, it indicates that the data measured by the E-th sensor is trusted by most sensors; in this case, the data measured by the E-th sensor has a relatively higher confidence level. Otherwise, it indicates that the data measured by the E-th sensor is not trusted by most sensors; in this case, the possibility that the data measured by the E-th sensor is real data is relatively small. Specifically, when determining from the confidence level matrix whether data measured by a sensor has a relatively higher confidence level among the same type of sensors, it may be determined, for this type of sensors, how many sensors' data are finally required to be fused, according to the requirements in actual applications. For example, suppose there are ten sensors that belong to the same type. During the subsequent data fusion, if the measurement data of merely five sensors are required to be fused, then the five rows with relatively greater element sums are selected from the ten rows of the confidence level matrix, and the sensors represented by these five rows are used as the sensors having relatively higher confidence levels.
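The row-sum selection rule of paragraphs [0061]–[0062] can be sketched as follows; the 3×3 matrix of pairwise distances is illustrative, and the selection follows the rule stated above (larger row sums indicate higher confidence).

```python
# Sketch of paragraphs [0061]-[0062]: given the K-by-K confidence level matrix,
# select the n sensors whose rows have the largest element sums.

def select_trusted(distance_matrix, n_select):
    """Return the (sorted) indices of the n_select sensors whose matrix rows
    have the largest element sums, i.e. the sensors trusted by most others."""
    row_sums = [sum(row) for row in distance_matrix]
    ranked = sorted(range(len(row_sums)), key=lambda i: row_sums[i], reverse=True)
    return sorted(ranked[:n_select])


# Three same-type sensors; D[i][j] is an illustrative confidence distance d_ij
D = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.7],
    [0.2, 0.1, 0.0],
]
trusted = select_trusted(D, 2)
```

Here the third sensor's row sum (0.3) is far below the others (1.7 and 1.6), so only the first two sensors are kept for the subsequent data fusion.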
[0063] According to the data preprocessing method for data fusion disclosed in this application, error data are removed by using a data-level error-removing algorithm based on the confidence distance, and sensor measurement data with consistent descriptions are then obtained at the feature level based on a feature-level normalization algorithm of knowledge elements. In this way, the sensor measurement data can be fused in a targeted manner during the data fusion, thereby reducing the computational complexity and improving the data fusion efficiency.
[0064] Apparatus embodiments disclosed in this application are described below, and may be used to implement the method embodiments of this application. For details of the apparatus embodiments of this application that are not disclosed, reference may be made to the method embodiments of this application.
[0065] Correspondingly, referring to the schematic structural diagram shown in FIG. 2, another embodiment of this application discloses a data preprocessing apparatus for data fusion, including: a first data obtaining module 10, configured to obtain first measurement data, where the first measurement data include data measured by different types of sensors; a second data obtaining module 20, configured to obtain data measured by any two different types of sensors based on the first measurement data, and respectively set the data measured by the two different types of sensors as target data and reference data; a feature obtaining module 30, configured to obtain features of the target data based on the target data, and obtain features of the reference data based on the reference data; a feature description set obtaining module 40, configured to obtain a target feature description set based on the features of the target data, and obtain a reference feature description set based on the features of the reference data, where the target feature description set is a set that describes the status of the features of the target data, and the reference feature description set is a set that describes the status of the features of the reference data; a similarity obtaining module 50, configured to obtain a similarity between the target data and the reference data based on the features of the target data, the features of the reference data, the target feature description set, and the reference feature description set; and a determining module 60, configured to determine, based on the similarity, whether the similarity is greater than a preset critical threshold, where if the determining result is yes, it is determined that the target data and the reference data can be fused, and if the determining result is no, it is determined that the target data and the reference data cannot be fused.
[0066] Further, the similarity obtaining module includes: a model establishment unit, configured to establish a feature similarity model based on the features of the target data and the features of the reference data, where the feature similarity model is configured to represent a similarity between the features of the target data and the features of the reference data; and a similarity obtaining unit, configured to obtain the similarity between the target data and the reference data based on the target feature description set, the reference feature description set, and the feature similarity model.
[0067] Further, the apparatus further includes: an initial data obtaining module, configured to obtain initial measurement data, where the initial measurement data are initial data measured by all of the sensors, all of the sensors including different types of sensors, and each type including a plurality of sensors; an evaluation module, configured to evaluate the initial measurement data to obtain an evaluation value of the initial measurement data; a confidence level matrix establishment module, configured to establish, for a same type of sensors, a confidence level matrix of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; and a first data setting module, configured to obtain a sensor having a relatively higher confidence level among the same type of sensors based on the confidence level matrix, and set the evaluation value of the initial measurement data of the sensor having the relatively higher confidence level as the first measurement data.
[0068] Further, the similarity obtaining module is further configured to obtain the similarity between the target data and the reference data according to the following formula:

Sf(a, b) = f(A ∩ B) / ( f(A ∩ B) + α·f(A − B) + β·f(B − A) ), α > 0, β > 0,

where Sf(a, b) represents the similarity between the target data and the reference data, a represents the features of the target data, b represents the features of the reference data, A represents the target feature description set, B represents the reference feature description set, f(A ∩ B) represents the quantity of statuses that belong to both the target feature description set and the reference feature description set, f(A − B) represents the quantity of statuses that belong to the target feature description set but not to the reference feature description set, f(B − A) represents the quantity of statuses that belong to the reference feature description set but not to the target feature description set, α represents an attention degree to the features of the reference data, and β represents an attention degree to the features of the target data.
[0069] Optionally, the evaluation module includes: a target initial data setting unit, configured to set initial data measured by a target sensor as target initial data based on the initial measurement data, where the target sensor is any one of all of the sensors, and the target initial data include data measured multiple times by the target sensor; a single-time target initial data obtaining unit, configured to obtain single-time target initial data of the target sensor based on the target initial data, and classify the single-time target initial data into odd-numbered initial data and even-numbered initial data; a first calculation unit, configured to obtain a mean and a standard deviation of the odd-numbered initial data based on the odd-numbered initial data, and obtain a mean and a standard deviation of the even-numbered initial data based on the even-numbered initial data; a partial fusion result obtaining unit, configured to obtain a partial fusion result of the single-time target initial data based on the mean and the standard deviation of the odd-numbered initial data and the mean and the standard deviation of the even-numbered initial data, where the partial fusion result includes a mean fusion value and a standard deviation fusion value; a first evaluation value obtaining unit, configured to obtain an evaluation value of the target initial data based on the partial fusion result of the single-time target initial data; and a second evaluation value obtaining unit, configured to obtain the evaluation value of the initial measurement data, where the evaluation value of the initial measurement data includes the evaluation values of all of the initial measurement data.
[0070] Further, the partial fusion result obtaining unit is further configured to obtain the partial fusion result of the single-time target initial data according to the following formulas:

x̂ = ( σ₂² / (σ₁² + σ₂²) ) x̄₁ + ( σ₁² / (σ₁² + σ₂²) ) x̄₂, and

σ̂ = √( σ₁² σ₂² / (σ₁² + σ₂²) ),

where x̂ represents the mean fusion value, σ̂ represents the standard deviation fusion value, x̄₁ represents the mean of the odd-numbered initial data, x̄₂ represents the mean of the even-numbered initial data, σ₁ represents the standard deviation of the odd-numbered initial data, and σ₂ represents the standard deviation of the even-numbered initial data.
[0071] Further, the confidence level matrix establishment module includes: a third evaluation value obtaining unit, configured to obtain, for a same type of sensors, an evaluation value of the initial measurement data of the same type of sensors based on the evaluation value of the initial measurement data; a confidence distance obtaining unit, configured to obtain a confidence distance between the same type of sensors based on the evaluation value of the initial measurement data of the same type of sensors; and a confidence level matrix establishment unit, configured to establish the confidence level matrix of the initial measurement data of the same type of sensors based on the confidence distance.
[0072] This application is described in detail above in combination with specific implementations and exemplary embodiments, but these descriptions cannot be understood as limitations to this application. A person skilled in the art understands that various equivalent replacements, modifications, or improvements may be made to the technical solutions and implementations of this application without departing from the spirit and scope of this application, and these equivalent replacements, modifications, or improvements all fall within the scope of this application. The protection scope of this application is subject to appended claims.
Claims (10)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910291404.3A CN109766958B (en) | 2019-04-12 | 2019-04-12 | A kind of data preprocessing method and device for data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2025323A NL2025323A (en) | 2020-10-15 |
NL2025323B1 true NL2025323B1 (en) | 2020-12-22 |
Family
ID=66460306
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
NL2025323A NL2025323B1 (en) | 2019-04-12 | 2020-04-09 | Data preprocessing method and apparatus for data fusion |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109766958B (en) |
DE (1) | DE102020110028A1 (en) |
NL (1) | NL2025323B1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021134564A1 (en) * | 2019-12-31 | 2021-07-08 | Siemens Aktiengesellschaft | Method and device for processing sensor data |
CN112003891B (en) * | 2020-07-16 | 2022-09-06 | 山东派蒙机电技术有限公司 | Multi-sensing data fusion method for intelligent networked vehicle controller |
CN114528276B (en) * | 2022-02-21 | 2024-01-19 | 新疆能源翱翔星云科技有限公司 | Big data acquisition, storage and management system and method based on artificial intelligence |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104637371B (en) * | 2015-03-06 | 2017-06-30 | 中国农业大学 | A kind of method being embedded into ontologies in game model |
WO2019066841A1 (en) * | 2017-09-28 | 2019-04-04 | Intel Corporation | Multimodal sensing in autonomous driving vehicles with self-healing capabilities |
CN109556615B (en) * | 2018-10-10 | 2022-10-04 | 吉林大学 | Driving map generation method based on multi-sensor fusion cognition of automatic driving |
-
2019
- 2019-04-12 CN CN201910291404.3A patent/CN109766958B/en active Active
-
2020
- 2020-04-09 NL NL2025323A patent/NL2025323B1/en not_active IP Right Cessation
- 2020-04-09 DE DE102020110028.0A patent/DE102020110028A1/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
CN109766958B (en) | 2019-07-05 |
NL2025323A (en) | 2020-10-15 |
DE102020110028A1 (en) | 2020-10-15 |
CN109766958A (en) | 2019-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wilson et al. | Predictive inequity in object detection | |
NL2025323B1 (en) | Data preprocessing method and apparatus for data fusion | |
CN107633265B (en) | Data processing method and device for optimizing credit evaluation model | |
US9164022B2 (en) | Neighborhood thresholding in mixed model density gating | |
US9721213B2 (en) | Information matching apparatus, method of matching information, and computer readable storage medium having stored information matching program | |
US9753968B1 (en) | Systems and methods for detection of anomalous entities | |
US11656174B2 (en) | Outlier detection for spectroscopic classification | |
Ghazal et al. | Data Mining and Exploration: A Comparison Study among Data Mining Techniques on Iris Data Set | |
KR101953190B1 (en) | A multidimensional recursive learning process and system used to discover complex dyadic or multiple counterparty relationships | |
WO2024067387A1 (en) | User portrait generation method based on characteristic variable scoring, device, vehicle, and storage medium | |
WO2019200739A1 (en) | Data fraud identification method, apparatus, computer device, and storage medium | |
Gupta | An efficient feature subset selection approach for machine learning | |
CN114036531A (en) | Multi-scale code measurement-based software security vulnerability detection method | |
McFee et al. | Hierarchical Evaluation of Segment Boundary Detection. | |
CN113128329A (en) | Visual analytics platform for updating object detection models in autonomous driving applications | |
KR102336679B1 (en) | Index normalization based probability distribution selection method for model selection | |
CN103678709B (en) | Recommendation system attack detection method based on time series data | |
US7469186B2 (en) | Finding usable portion of sigmoid curve | |
CN117857202A (en) | Multi-dimensional security assessment method for information system | |
KR20210091591A (en) | An electronic device including evaluation operation of originated technology | |
CN117036781A (en) | Image classification method based on tree comprehensive diversity depth forests | |
CN115187064A (en) | Qingdao city property development index analysis based on principal component and clustering method | |
Jandová et al. | Age verification using random forests on facial 3D landmarks | |
Fong et al. | Incremental methods for detecting outliers from multivariate data stream | |
Sriram et al. | Exploratory data analysis using artificial neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM | Lapsed because of non-payment of the annual fee |
Effective date: 20230501 |