Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a decoding-free transmission method for compressed data oriented to edge computing, which analyzes and classifies compressed data directly, without decoding. On the premise of accurate data analysis and processing, this greatly improves computing efficiency, saves computing resources, and improves the transmission performance of mass data in cloud computing.
In order to achieve the purpose, the invention adopts the technical scheme that:
an edge-computing-oriented decoding-free transmission method for compressed data, comprising:
step 1, collecting data of different time slots on an edge fixed network or a mobile network at an initial node;
step 2, adopting a classifiable manifold learning data compression algorithm to compress the collected data, and comprising the following steps:
step 2.1, respectively storing the data of different time slots collected in the step 1 in different arrays;
step 2.2, constructing a neighbor weighted graph G for the data in each array, recording a pair of data with adjacent relation as a data pair, and recording the adjacent relation between the characteristic attributes corresponding to the data pair;
step 2.3, constructing a global measurement characteristic matrix, and solving the global measurement characteristic matrix by using a geodesic distance algorithm to obtain the shortest paths of all data pairs in the neighbor weighted graph G;
step 2.4, capturing feature mapping embedding, and mapping the data features from a high-dimensional space to a low-dimensional space;
step 2.5, obtaining the feature mapping results of all time slots and combining them as the feature mapping compression representation result for the edge network node in the next time slot;
and step 3, transmitting the result combination to the edge network node.
In step 1, data on the edge fixed network or mobile network is collected by time slot, and the characteristic attributes of the data are recorded. In step 2.1, the data from different time slots of the edge fixed network or mobile network are recorded in separate arrays.
In step 2.2, the neighbor weighted graph G is constructed by the ε-neighbor method, as follows:
given a threshold ε, if ||x_i − x_j||_2 ≤ ε, then data x_i and data x_j are neighbor points with a neighbor relation and are recorded as a data pair, where x_i corresponds to characteristic attribute a_i and x_j corresponds to characteristic attribute a_j. Connect x_i and x_j by an edge whose weight L is set to the Euclidean distance d_x(i, j) between x_i and x_j; the data pair also records that a_i and a_j have a neighbor relation. Here ||·||_2 denotes the Euclidean norm.
In step 2.3, the global metric feature matrix is expressed as D_G = (d_G(i, j))_{n×n}, and the geodesic distance algorithm solves it iteratively by the update:

d_G(i, j) = min{ d_G(i, j), d_G(i, k) + d_G(k, j) },  k ∈ N

where D_G is the n×n matrix composed of the shortest paths of all data pairs in the neighbor weighted graph G, d_G(i, j) denotes the geodesic distance between x_i and x_j, N denotes the data set, d_G(i, k) denotes the geodesic distance between data x_i and data x_k, and d_G(k, j) denotes the geodesic distance between data x_k and data x_j.
In step 2.4, let

τ(D_G) = −H S H / 2

where S is the squared geodesic distance matrix, S = (d_G(i, j)²)_{n×n}, H is the centering matrix, H = I − (1/n)11ᵀ, with I the unit matrix of the same order as D_G, and τ denotes the matrix transformation operator over the neighbor weighted graph G. Find all eigenvalues and eigenvectors of the matrix τ(D_G), sort them from large to small by eigenvalue, select the first d eigenvalues λ1, λ2, …, λd, and use their corresponding eigenvectors u1, u2, …, ud to form the matrix U = [u1, u2, …, ud]. The final feature mapping embedding result is then:

Y = [√λ1·u1, √λ2·u2, …, √λd·ud]ᵀ

where Y denotes the low-dimensional feature mapping embedding result, i.e., the feature mapping result.
Compared with the prior art, the invention has the beneficial effects that:
the data compression method adopted by the invention omits the decoding operation of the compressed data in the data processing process by utilizing the characteristic mapping compression algorithm, and finally obviously optimizes the data transmission and processing time and saves the calculation and communication resources on the basis of improving the classification accuracy.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
As shown in fig. 1, the present invention is a decoding-free transmission method for compressed data oriented to edge computing, which mainly includes the following steps:
step 1, at an initial node, collecting data of time slots 1-t on an edge fixed network or a mobile network, and recording the characteristic attribute and the corresponding data value of the data.
Step 2: to reduce the scale of transmitted data and optimize transmission efficiency, a classifiable manifold learning data compression algorithm is adopted at the initial node to perform feature mapping on the data, effectively achieving data compression while keeping the data classifiable and providing the premise for edge nodes to acquire data efficiently. The method comprises the following steps:
and 2.1, respectively storing the data of the time slots 1 to t collected in the step 1 in different arrays, namely storing the data of the time slot 1 in the array 1, storing the data of the time slot 2 in the arrays 2 and … …, and so on.
And 2.2, constructing a neighbor weighted graph G for the data in each array, recording a pair of data with adjacent relation as a data pair, and recording the adjacent relation between the characteristic attributes corresponding to the data pair.
The neighbor weighted graph G is constructed by the ε-neighbor method, as follows:
in a given array, given a threshold ε, if ||x_i − x_j||_2 ≤ ε, then data x_i and data x_j are neighbor points with a neighbor relation and are recorded as a data pair, where the characteristic attribute corresponding to x_i is a_i and the characteristic attribute corresponding to x_j is a_j. Connect x_i and x_j by an edge whose weight L is set to the Euclidean distance d_x(i, j) between x_i and x_j; the data pair also records that a_i and a_j have a neighbor relation. Here ||·||_2 denotes the Euclidean norm.
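The ε-neighbor construction above can be sketched as follows; this is a minimal illustration (the function name and toy data are assumptions, not from the patent):

```python
import numpy as np

def epsilon_neighbor_graph(X, eps):
    """Build the neighbor weighted graph G by the epsilon-neighbor method.

    X   : (n, m) array, one row per data point x_i.
    eps : the threshold epsilon.
    Returns W with W[i, j] = Euclidean distance d_x(i, j) when
    ||x_i - x_j||_2 <= eps (the pair is a data pair), np.inf otherwise.
    """
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))   # pairwise Euclidean distances
    W = np.where(d <= eps, d, np.inf)      # keep only neighbor edges
    np.fill_diagonal(W, 0.0)               # a point is at distance 0 from itself
    return W

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
W = epsilon_neighbor_graph(X, eps=1.5)
# x0 and x1 form a data pair (distance 1.0); x2 has no neighbors here
```

Edges absent from G are marked with infinity so that the subsequent shortest-path step can operate directly on this matrix.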
Step 2.3: construct the global metric feature matrix, expressed as:

D_G = (d_G(i, j))_{n×n}

Solve the global metric feature matrix with the geodesic distance algorithm to obtain the shortest paths of all data pairs in the neighbor weighted graph G, using the update:

d_G(i, j) = min{ d_G(i, j), d_G(i, k) + d_G(k, j) },  k ∈ N

where D_G is the n×n matrix composed of the shortest paths of all data pairs in the neighbor weighted graph G, d_G(i, j) denotes the geodesic distance between x_i and x_j, N denotes the data set, d_G(i, k) denotes the geodesic distance between data x_i and data x_k, and d_G(k, j) denotes the geodesic distance between data x_k and data x_j.
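The shortest-path update above is the Floyd–Warshall relaxation; a minimal sketch (function name and example graph are assumptions for illustration):

```python
import numpy as np

def geodesic_distances(W):
    """Solve D_G by the update d_G(i,j) = min{d_G(i,j), d_G(i,k) + d_G(k,j)}.

    W : (n, n) weighted adjacency matrix of graph G (np.inf where no edge).
    Returns the n x n matrix of shortest-path (geodesic) distances.
    """
    D = W.copy()
    for k in range(len(D)):
        # relax every pair (i, j) through the intermediate point x_k
        D = np.minimum(D, D[:, k:k+1] + D[k:k+1, :])
    return D

W = np.array([[0.0, 1.0, np.inf],
              [1.0, 0.0, 2.0],
              [np.inf, 2.0, 0.0]])
D_G = geodesic_distances(W)   # D_G[0, 2] becomes 3.0 via the middle point
```

Pairs left at infinity by the ε-neighbor step acquire finite geodesic distances whenever a path through intermediate neighbors exists.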
And 2.4, capturing feature mapping embedding, and mapping the data features from a high-dimensional space to a low-dimensional space.
Specifically, let

τ(D_G) = −H S H / 2

where S is the squared geodesic distance matrix, S = (d_G(i, j)²)_{n×n}, H is the centering matrix, H = I − (1/n)11ᵀ, with I the unit matrix of the same order as D_G, and τ denotes the matrix transformation operator over the neighbor weighted graph G. Using knowledge of matrix analysis, solve for all eigenvalues and eigenvectors of the matrix τ(D_G), sort them from large to small by eigenvalue, select the first d eigenvalues λ1, λ2, …, λd, and use their corresponding eigenvectors u1, u2, …, ud to form the matrix U = [u1, u2, …, ud]. The final feature mapping embedding result is then:

Y = [√λ1·u1, √λ2·u2, …, √λd·ud]ᵀ

where Y denotes the low-dimensional feature mapping embedding result, i.e., the feature mapping result.
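Step 2.4 can be sketched as below, assuming the standard ISOMAP/classical-MDS construction of τ(D_G) = −HSH/2 (the function name and toy distance matrix are illustrative assumptions):

```python
import numpy as np

def feature_mapping_embedding(D_G, d):
    """Compute Y from tau(D_G) = -H S H / 2 as in step 2.4.

    S is the element-wise squared geodesic distance matrix and
    H = I - (1/n) 11^T is the centering matrix of the same order as D_G.
    The d largest eigenpairs give Y = [sqrt(l_1) u_1, ..., sqrt(l_d) u_d]^T.
    """
    n = len(D_G)
    S = D_G ** 2
    H = np.eye(n) - np.ones((n, n)) / n
    tau = -H @ S @ H / 2.0
    lam, U = np.linalg.eigh(tau)            # ascending order for symmetric tau
    order = np.argsort(lam)[::-1][:d]       # indices of the d largest eigenvalues
    lam_d = np.maximum(lam[order], 0.0)     # clip tiny negative values
    return np.sqrt(lam_d)[:, None] * U[:, order].T   # d x n embedding Y

# three collinear points with exact geodesic distances 0 / 1 / 2
D_G = np.array([[0.0, 1.0, 2.0],
                [1.0, 0.0, 1.0],
                [2.0, 1.0, 0.0]])
Y = feature_mapping_embedding(D_G, d=1)   # recovers a 1-D line (up to sign)
```

For this toy case the embedded coordinates reproduce the original pairwise geodesic distances exactly, since the points already lie on a one-dimensional manifold.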
Step 2.5: compute the feature mapping results Y1 to Yt of time slots 1 to t in turn by the above method, take (Y1, Y2, …, Yt) as the feature mapping compression representation result for the edge network node at time slot t+1, and transmit it to the edge network node.
During transmission, the invention can adopt an Options-based data processing method to perform temporal abstraction at the central node, using Options to output compressed-data classification results for different time slots. Thus, on the premise of data compression, classification accuracy and efficiency are effectively improved, and computing and communication resources are saved.
In the present invention, Options are defined on an MDP (Markov Decision Process), and each option is represented by a triple ⟨I, π, β⟩, where I denotes the initiation state set, π denotes the policy, and β denotes the termination condition (terminal state set). For each state s ∈ S, p(s′, k) denotes the probability that the option, executed by the compressed data classification model, terminates in state s′ after k steps; after terminating in one state, the next state is entered. The state transition function of state s with respect to an option o is:

p(s′ | s, o) = Σ_k p(s′, k)·γ^k

where γ denotes the discount factor and s′ denotes the next state. This is used to describe the classification output results of the compressed data classification model on different time scales.
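The discounted option-level transition model can be illustrated with a small sketch; the termination-probability table and function name below are purely illustrative assumptions:

```python
# Sketch of the option-level state transition model
# p(s' | s, o) = sum_k p(s', k) * gamma**k, where p(s', k) is the probability
# that the option terminates in state s' after exactly k steps and gamma is
# the discount factor.

def option_transition(p_terminate, gamma):
    """p_terminate: {next_state: {k: prob of terminating there after k steps}}.
    Returns the discounted transition weight for each next state."""
    return {s: sum(p * gamma ** k for k, p in steps.items())
            for s, steps in p_terminate.items()}

# toy example: the option ends in s1 after 1 step (prob 0.5)
# or in s2 after 2 steps (prob 0.5)
weights = option_transition({"s1": {1: 0.5}, "s2": {2: 0.5}}, gamma=0.9)
# s1: 0.5 * 0.9 = 0.45; s2: 0.5 * 0.9**2 = 0.405
```

Longer options are discounted more heavily, which is what lets a single model describe classification outputs at different time scales.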
Referring to fig. 2, one embodiment of the present invention uses two data sets: data from the Roosevelt National Forest in northern Colorado, and activity data from a number of healthy elderly people, as shown in tables 1 and 2, respectively. Data set one, the Roosevelt National Forest in northern Colorado, has 4 wilderness area types and includes 6 types of forest cover, each determined by 12 data features; the twelve characteristic attributes of the forest include factors such as Elevation, Aspect, Slope, Hillshade_9am, Wilderness_Area, and Soil_Type. Data set two records the changes in daily activity of elderly people through sensors attached to jackets worn by several elderly subjects; the sensor recordings are: time, acceleration reading (g) on the frontal axis, acceleration reading (g) on the vertical axis, acceleration reading (g) on the lateral axis, Sensor ID, signal strength, Phase, and Frequency. The collected attribute features are then stored in arrays. For data set one, several arrays of size 12 are set, storing the 12 characteristic attributes of the forest for different time slots. For data set two, the array size is set to 8, storing the attributes of 8 elderly activity readings.
TABLE 1 four wilderness region information table
Table 2 data set information table
Datasets 1 | Size  | Datasets 2 | Size
DS_10      | 7000  | DS_10      | 4000
DS_20      | 14000 | DS_20      | 8000
DS_30      | 21000 | DS_30      | 12000
DS_40      | 28000 | DS_40      | 16000
DS_50      | 35000 | DS_50      | 20000
After obtaining the feature mapping compression representation results by the above method, classification models for the compressed data are built with a BP neural network and an SVM respectively, the data of the two compressed data sets are classified, and the classification results are compared with the original classes to verify the classification performance on compressed data.
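As a simplified stand-in for the BP / SVM classifiers described above (which are not reproduced here), the comparison of predicted labels against original classes can be illustrated with a nearest-centroid classifier over the compressed features; all names and toy data are assumptions:

```python
import numpy as np

def nearest_centroid_accuracy(Y, labels):
    """Y: (n, d) compressed features; labels: (n,) original class labels.
    Classify each point by its nearest class centroid and return the
    fraction of predictions that match the original classes."""
    classes = np.unique(labels)
    centroids = np.array([Y[labels == c].mean(axis=0) for c in classes])
    dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return float((pred == labels).mean())

Y = np.array([[0.0], [0.1], [5.0], [5.1]])   # toy compressed features
labels = np.array([0, 0, 1, 1])              # original classes
acc = nearest_centroid_accuracy(Y, labels)   # 1.0 on this separable toy data
```

The point of the check is the same as in the embodiment: if compression preserved class structure, a classifier trained on the compressed representation should match the original labels.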
To verify the effectiveness of the present invention, two comparison schemes were set up: comparison scheme one is data compression based on an autoencoder, and comparison scheme two performs no compression on the original input data. The scheme of the invention is denoted by the symbols ISOMAP-C@BP and ISOMAP-C@SVM. For comparison scheme one, classification models are built with a BP neural network and an SVM respectively: CAE@BP denotes the scheme that builds a classification model for the compression result with the BP neural network algorithm, and CAE@SVM denotes the scheme that builds it with the SVM. In comparison scheme two, the original input data are classified directly by the BP and SVM classifiers, denoted Basic@BP and Basic@SVM. The evaluation indexes adopted are accuracy and efficiency.
Table 3 table of accuracy comparison under data set 1
Table 4 table of accuracy comparison under data set 2
Table 5 run time comparison table under data set 1
Table 6 specific run time comparison table under dataset 2
With the proposed classifiable data compression mechanism, the invention significantly reduces data transmission and processing time while improving classification accuracy by 2.2%.
While the invention has been described in detail with reference to specific embodiments thereof, it will be understood that the invention is not limited to the details of construction and the embodiments set forth herein. For a person skilled in the art to which the invention pertains, several simple deductions or substitutions may be made without departing from the spirit of the invention and the scope of protection defined by the claims, which shall be regarded as belonging to the scope of protection of the invention.