CN113656480A - Data integration method and system for heterogeneous data source - Google Patents
- Publication number
- CN113656480A (application number CN202110952257.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- information
- heterogeneous
- module
- source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A data integration method and system for heterogeneous data sources, relating to the technical field of networks. The integration method comprises the following steps: (S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; (S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; (S3) applying the integrated multi-source heterogeneous data information for collection, control, communication, application, operation and maintenance, diagnosis or data display. The data integration system comprises a data source, an integration module and an application module; the output of the data source is connected to the input of the integration module, and the output of the integration module is connected to the input of the application module. The invention enables analysis and diagnosis of heterogeneous data integration and improves the data integration and application capability of heterogeneous data sources.
Description
Technical Field
The present invention relates to the field of network technologies, and in particular, to a data integration method and system for heterogeneous data sources.
Background
With the development of database technology and the popularization of networks, on the one hand, massive amounts of data are stored in heterogeneous databases, forming information islands that hinder data sharing; on the other hand, as global market competition intensifies, more and more information systems need to share the data held in heterogeneous databases. This requires the data information to be integrated.
Currently, there are generally two methods for integrating heterogeneous databases. The first is to migrate the original data to a new data management system; to integrate different types of data, some non-traditional data types must be converted to new data types. Many relational database vendors provide this kind of functionality. The disadvantage of this approach is that, once the data management system is upgraded, the application software associated with the original data must either be discarded or be redeveloped to suit the new data management system. Migration to a new system is therefore generally not a practical solution. The second method is to integrate heterogeneous databases using middleware, which does not require changing how the original data is stored and managed. The middleware sits between the heterogeneous database systems (data layer) and the application programs (application layer): downward, it coordinates the database systems; upward, it provides applications that access the integrated data with a unified data schema and a common data access interface. The applications of each database still carry out their own tasks, while the middleware system focuses mainly on providing a high-level retrieval service over the heterogeneous data sources. However, this retrieval method is inefficient.
Data integration is in essence a typical ETL process. How to extract data from a source database through a reader plug-in, convert it, and finally write it into a big data center has become a technical problem to be solved urgently.
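The extract, convert and write stages described above can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for the source database and the big data center; the table names `sensor_raw` and `sensor_clean`, the column layout and all function names are hypothetical and are not taken from the patent.

```python
import sqlite3

def read_source(conn):
    """Reader step: extract raw rows from the source database."""
    return conn.execute("SELECT id, name, reading FROM sensor_raw").fetchall()

def transform(rows):
    """Conversion step: normalise names and convert text readings to float."""
    return [(rid, name.strip().lower(), float(reading)) for rid, name, reading in rows]

def write_target(conn, rows):
    """Writer step: load the converted rows into the target store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sensor_clean (id INTEGER, name TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO sensor_clean VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl():
    # Hypothetical source with untidy text data.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE sensor_raw (id INTEGER, name TEXT, reading TEXT)")
    source.executemany(
        "INSERT INTO sensor_raw VALUES (?, ?, ?)",
        [(1, " Pump-A ", "3.5"), (2, "VALVE-B", "7")],
    )
    # Hypothetical target standing in for the data center.
    target = sqlite3.connect(":memory:")
    write_target(target, transform(read_source(source)))
    return target.execute(
        "SELECT id, name, reading FROM sensor_clean ORDER BY id"
    ).fetchall()
```

Keeping the three stages as separate functions mirrors the reader plug-in, conversion and write steps, so a different source type would only need a different reader.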
Disclosure of Invention
To address the above technical defects, the invention discloses a data integration method and system for heterogeneous data sources, which enable analysis and diagnosis of heterogeneous data integration and improve the data integration and application capability of heterogeneous data sources.
To achieve this purpose, the invention adopts the following technical scheme:
a data integration method for heterogeneous data sources, comprising the steps of:
(S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; analyzing the heterogeneous data source integration information with an improved harmony search optimization algorithm; diagnosing fault information arising during heterogeneous data source integration with a long short-term memory (LSTM) neural network algorithm; and realizing interaction and communication between different data channels;
(S3) applying the integrated multi-source heterogeneous data information for collection, control, communication, application, operation and maintenance, diagnosis or data display.
As a further technical scheme of the invention, the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model.
As a further technical scheme of the invention, the improved harmony search optimization algorithm comprises the following steps:
step one: defining the collected multi-source heterogeneous data information:
in formula (1), f(x) is the objective function for evaluating the multi-source heterogeneous integrated information; x_i is a variable that affects the evaluation of the multi-source heterogeneous integrated information; X_i is the value range over which the multi-source heterogeneous integrated information is evaluated; and N is the number of variables in the evaluation function. The parameters that the HS algorithm needs to solve the optimization problem, such as the size of the harmony vector set (harmony memory, HM), the HMCR and the maximum number of iterations, are also defined in this step.
Step two: generation of the HM: the harmony vector set stores all solution vectors, together with the objective function values obtained for them when the multi-source heterogeneous integrated information is evaluated in each iteration. It is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting evaluation matrix B is expressed as follows:
step three: generation of a new harmony: in this step, each element of the new harmony vector is generated either by reusing a parameter element stored in the HM or by assigning a random value from the data range X used in step two, according to the HMCR probability; for this purpose, a random number between 0 and 1 is drawn:
if the random number falls within the HMCR probability (itself a value between 0 and 1), the new vector element is picked from the corresponding parameter elements in the HM; otherwise, the new vector element is selected at random from the value range of the parameter variable instead of from the HM;
step four: HM updating: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated solution vector; this value is then compared with the objective function values of the solution vectors in the HM; if the objective function value of the newly generated solution vector is better than the worst objective function value in the HM, the new harmony solution vector replaces the worst harmony vector, which is deleted from the HM; thus, a better solution vector is stored in the HM;
step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends, and the best vector found in the HM is taken as the final solution of the multi-source heterogeneous integrated information evaluation; if the criterion is not met, steps three and four are repeated.
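Steps one to five can be sketched as a basic harmony search loop. This is a minimal illustration under stated assumptions: it minimises the objective, it omits pitch adjustment (PAR) and the MDP-based improvement, and the function name `harmony_search` and its parameters `hm_size`, `hmcr`, `max_iter` and `seed` are hypothetical rather than taken from the patent.

```python
import random

def harmony_search(f, bounds, hm_size=10, hmcr=0.9, max_iter=2000, seed=0):
    """Minimise f over box bounds with a plain harmony search (no PAR, no MDP)."""
    rng = random.Random(seed)
    # Step two: fill the HM with random solution vectors and their objective values.
    hm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hm_size)]
    scores = [f(x) for x in hm]
    # Step five: repeat steps three and four until the iteration limit is reached.
    for _ in range(max_iter):
        # Step three: build a new harmony element by element.
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Reuse the i-th element of a randomly chosen stored harmony.
                new.append(hm[rng.randrange(hm_size)][i])
            else:
                # Draw a fresh random value from the allowed range X_i.
                new.append(rng.uniform(lo, hi))
        # Step four: replace the worst stored harmony if the new one is better.
        new_score = f(new)
        worst = max(range(hm_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            hm[worst], scores[worst] = new, new_score
    best = min(range(hm_size), key=scores.__getitem__)
    return hm[best], scores[best]
```

For example, `harmony_search(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)])` searches for the minimum of a two-variable sphere function; a real evaluation function for integration information would take the place of the lambda.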
As a further technical scheme of the invention, an MDP model is introduced in the harmony generation process.
As a further technical scheme of the invention, the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method based on a single LSTM block, and comprises the following steps:
(1) inputting, deleting and reading multi-source heterogeneous integrated information, so that the information is processed and continuously updated and the information screening capability is improved; let C_t denote the stored information of the heterogeneous data information, f_t the removed information, i_t the inflowing information, and O_t the outflowing information;
(2) calling a Sigmoid function, and calculating an output function of a single LSTM block, wherein the function formula is as follows:
where t denotes the different data nodes in the neural network model; W_[i, f, C, O] are the parameter weight matrices of the neural network model; b_[i, f, C, O] are the bias vectors associated with those weight matrices; X is the input operating data of the multi-source heterogeneous integrated information; and Y is the output fault diagnosis data for the operation of the multi-source heterogeneous integrated information;
(3) reading the stored information output by the single LSTM block, wherein the output function is as follows:
where tanh is the hyperbolic tangent function and ⊙ denotes element-wise multiplication within the neural network node.
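The gate computation in steps (1) to (3) can be sketched as one forward step of a single LSTM block. The sketch assumes the standard LSTM formulation (sigmoid gates f_t, i_t and O_t, a tanh candidate, and an element-wise state update), which may differ in detail from the patent's own formulas; the function names and the dictionary layout of `W` and `b` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a single LSTM block; W and b are dicts keyed by 'i', 'f', 'C', 'O'."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]      # information-removal gate
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]      # information-inflow gate
    c_hat = [math.tanh(a) for a in affine(W["C"], z, b["C"])]  # candidate stored information
    o_t = [sigmoid(a) for a in affine(W["O"], z, b["O"])]      # information-outflow gate
    # Stored information: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Block output: gated, squashed stored information (element-wise products).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Each weight matrix has one row per hidden unit and one column per entry of the concatenated vector `[h_prev, x_t]`. In a diagnosis setting, the output `h_t` would be passed to a classifier, which is outside this sketch.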
As a further technical scheme of the invention, a Softmax classification function is added to the long short-term memory (LSTM) neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the different kinds of multi-source heterogeneous data information are labeled y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the input multi-source heterogeneous integrated information under each application, and the probability p of occurrence under the j-th application is given by the following formula:
in formula (6), θ is the parameter matrix with which the neural network model calculates the probability, and θ_j is the column vector associated with the j-th class of the multi-source heterogeneous integrated information; the regularized cross-entropy loss function J is then minimized to obtain the optimal value of θ, and can be expressed as:
in formula (7), λ and M are the model parameters of the function J used to meet the regularization requirement; the softmax classification function classifies the operating data sample x_t of the multi-source heterogeneous integrated information by the following formula:
y_t = arg max p (8)
by classifying and evaluating the multi-source heterogeneous integrated information under different applications, the rapid classification is realized, and the operation and control capacity of the multi-source heterogeneous integrated information is improved.
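The classification step can be sketched as follows. The sketch assumes the common softmax form p(y = j | x) = exp(θ_j · x) / Σ_l exp(θ_l · x) and a mean cross-entropy loss with an L2 penalty weighted by λ; the patent's formulas (6) and (7) may differ in detail. The function names are hypothetical, and classes are indexed from 0 rather than 1.

```python
import math

def softmax_probs(theta, x):
    """theta is a list of K class vectors theta_j; returns p(y = j | x) for each j."""
    scores = [sum(t * v for t, v in zip(theta_j, x)) for theta_j in theta]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def regularized_loss(theta, samples, lam):
    """Mean cross-entropy over the M samples plus an L2 penalty weighted by lam."""
    M = len(samples)
    nll = -sum(math.log(softmax_probs(theta, x)[y]) for x, y in samples) / M
    penalty = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return nll + penalty

def classify(theta, x):
    """Formula (8): return the class index with the largest probability."""
    probs = softmax_probs(theta, x)
    return max(range(len(probs)), key=probs.__getitem__)
```

Minimising `regularized_loss` over `theta`, for instance by gradient descent, would give the trained parameter matrix; the optimiser itself is left out of the sketch.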
In order to solve the technical problems, the invention also adopts the following technical scheme:
a data integration system of heterogeneous data sources, comprising:
a data source, externally connected to a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a collection of multiple database systems and is used to realize sharing of and transparent access to multi-source heterogeneous data information; the heterogeneous data interface is used to transfer or exchange information among different databases;
an integration module, used to integrate the interrelated distributed heterogeneous data sources so that users can access them transparently; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router and an integrated output interface; the integrated control module is connected to the channel communication analysis module, the information integration diagnosis module, the first channel conversion module and the integrated output interface respectively; the multi-source heterogeneous networking architecture is connected to the first channel conversion module through the router; the output of the first channel conversion module is connected to the input of the second channel conversion module; and the output of the second channel conversion module is connected to the input of the integrated output interface; and
an application module; the application module comprises a heterogeneous data storage control module, and a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control which are connected with the heterogeneous data storage control module, wherein the fault warning module is connected with an LED lamp.
The output of the data source is connected to the input of the integration module, and the output of the integration module is connected to the input of the application module.
As a further technical scheme of the invention, the integrated control module is a dual-core processor comprising a DSP computing module and an ARM computing module, wherein the DSP computing module is a data module based on the TMS321VC5501 model, and the ARM computing module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory (LSTM) neural network algorithm module; the single LSTM block structure in the LSTM neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t and an information outflow gate O_t.
As a further technical solution of the present invention, the channel communication analysis module includes a first program medium and an improved harmony search optimization algorithm program stored on the first medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module includes a second program medium and a long short-term memory (LSTM) neural network algorithm program stored on the second medium, and is used for diagnosing fault information in the heterogeneous data source integration process.
As a further technical solution of the present invention, the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
Positive and beneficial effects:
1. The invention integrates a plurality of collected data sources that differ in data structure, access mode and form, realizes the transfer or exchange of information among different databases, and improves the integration and application capability of heterogeneous network data information.
2. An improved harmony search optimization algorithm is used to analyze the integration information of the heterogeneous data sources, which improves the capability of data integration analysis.
3. A long short-term memory (LSTM) neural network algorithm is used to diagnose fault information in the heterogeneous data source integration process and to realize interaction and communication between different data channels, which improves the capability of data integration fault diagnosis.
4. The integrated multi-source heterogeneous data information is applied for collection, control, communication, application, operation and maintenance, diagnosis or display, which improves the data application capability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a schematic flow chart illustrating a data integration method of heterogeneous data sources according to the present invention;
FIG. 2 is a schematic flow chart of an improved harmony search optimization algorithm in the data integration method of the heterogeneous data source according to the present invention;
FIG. 3 is a schematic diagram of a data integration apparatus for heterogeneous data sources according to the present invention;
FIG. 4 is a diagram illustrating the integration module hardware architecture of a data integration apparatus of heterogeneous data sources according to the present invention;
FIG. 5 is a diagram illustrating the application module hardware architecture of a data integration apparatus of heterogeneous data sources according to the present invention;
FIG. 6 is a schematic diagram of a single LSTM block in the long short-term memory neural network algorithm module in the data integration apparatus of a heterogeneous data source according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
Multi-mode integration and fusion of multi-source heterogeneous data sources: the method is compatible with batch data integration, real-time data integration and CDC data synchronization for various data source systems such as DB, MPP, NoSQL, file systems, HTTP and FTP. To implement the above technical solution, the following embodiments are adopted, where it is noted that:
the Markov decision process model refers to the Markov Decision Process (MDP); the harmony vector set refers to the Harmony Memory (HM); the harmony memory considering rate refers to the Harmony Memory Considering Rate (HMCR); the pitch adjusting rate refers to the Pitch Adjusting Rate (PAR); and long short-term memory refers to Long Short-Term Memory (LSTM).
Example one
As shown in FIGS. 1 and 2, a data integration method for heterogeneous data sources includes the following steps:
(S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; analyzing the heterogeneous data source integration information with an improved harmony search optimization algorithm; diagnosing fault information arising during heterogeneous data source integration with a long short-term memory (LSTM) neural network algorithm; and realizing interaction and communication between different data channels;
(S3) applying the integrated multi-source heterogeneous data information for collection, control, communication, application, operation and maintenance, diagnosis or data display.
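The transparent access described in step (S2) can be illustrated with a small adapter sketch: each source type is wrapped so that it yields records in one common shape, and a single integration layer answers queries across all of them. This is only one possible reading of "transparent access"; the class names `SqlSource`, `CsvSource` and `IntegrationLayer` and the `device`/`value` record layout are hypothetical.

```python
import csv
import io
import sqlite3

class SqlSource:
    """Adapter for a relational source (the table name is trusted in this sketch)."""
    def __init__(self, conn, table):
        self.conn, self.table = conn, table

    def records(self):
        cur = self.conn.execute(f"SELECT device, value FROM {self.table}")
        return [{"device": d, "value": float(v)} for d, v in cur]

class CsvSource:
    """Adapter for a file-like source holding CSV text with a header row."""
    def __init__(self, text):
        self.text = text

    def records(self):
        reader = csv.DictReader(io.StringIO(self.text))
        return [{"device": r["device"], "value": float(r["value"])} for r in reader]

class IntegrationLayer:
    """Gives callers one query interface over all registered sources."""
    def __init__(self, sources):
        self.sources = list(sources)

    def query(self, predicate=lambda r: True):
        # The caller never sees which source a record came from.
        return [r for s in self.sources for r in s.records() if predicate(r)]
```

A caller would build the layer once, for example `IntegrationLayer([SqlSource(conn, "readings"), CsvSource(text)])`, and then filter with `query(lambda r: r["device"] == "pump")` without knowing where each record is stored.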
In the present invention, the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model; that is, the H-BIM model is optimized with an improved HS optimization algorithm based on the Markov Decision Process (MDP). HS is a population-based meta-heuristic algorithm that keeps a group of solutions in a harmony vector set (Harmony Memory, HM). When heterogeneous communication information samples are trained in each iteration, a set of optimization parameters is applied to the HM to obtain a new harmony vector governed by the harmony memory considering rate (HMCR) and the pitch adjusting rate (PAR), from which the optimal solution is obtained. The HS algorithm can be divided into four steps: HM initialization; generation of a new harmony; addition of the newly generated harmony to the HM (provided its fitness is better than the worst fitness value in the previous HM); and checking of a termination criterion (e.g., a maximum number of iterations). The principle of the HS algorithm is to seek, during the heterogeneous data information optimization process, the best solution as determined by an objective function, so that the H-BIM model analyzes heterogeneous data information efficiently and to a high standard.
In the present invention, the improved harmonic search optimization algorithm comprises the following steps:
the method comprises the following steps: defining collected multi-source heterogeneous data information:
in formula (1), f (x) refers to an objective function of multi-source heterogeneous integrated information evaluation; x is the number ofiRefers to variables, X, that affect the evaluation of multi-source heterogeneous integrated informationiThe method refers to a multi-source heterogeneous integrated information evaluation area range; n refers to the number of variables in the multi-source heterogeneous integrated information evaluation function; defining parameters such as harmonic vector set size, HMCR, maximum iteration number and the like required for solving an optimization problem in the HS algorithm;
step two: generation of the harmony memory (HM): the HM, i.e. the harmony vector set, stores all solution vectors together with the objective function values obtained when the multi-source heterogeneous integrated information is evaluated in each iteration; the HM is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting multi-source heterogeneous integrated information evaluation information forms a matrix B, represented as follows:
step three: generation of a new harmony: in this step, each element of the new harmony vector is generated, according to the HMCR probability, either by taking the corresponding multi-source heterogeneous integrated information parameter element from the HM or by assigning a random value from the data range X of the multi-source heterogeneous integrated information used in step two; for this purpose, a random number between 0 and 1 is selected:
if the randomly generated number falls within the HMCR probability, which lies between 0 and 1, the new vector element is picked from the multi-source heterogeneous integrated information parameter elements in the HM; if the randomly generated number does not fall within the HMCR probability, the new vector element is selected at random from the range of the parameter variables that affect the multi-source heterogeneous integrated information data set instead of being selected from the HM;
step four: HM updating: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated multi-source heterogeneous integrated information parameter solution vector; this value is then compared with the objective function values of the solution vectors in the HM; if the objective function value of the newly generated solution vector is better than the worst objective function value in the HM, the newly generated harmony solution vector replaces the harmony vector with the worst objective function value, and the worst solution vector is deleted from the HM; thus, a better solution vector is stored in the HM;
step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends, and the best vector found in the HM is used as the final solution of the multi-source heterogeneous integrated information evaluation; if the criterion is not met, steps three and four are repeated.
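Steps two to five can be illustrated with a minimal Python sketch of a basic harmony search. It is not the patented implementation: the function name `harmony_search`, the parameters `hms`, `hmcr`, `max_iter` and `seed`, the fixed iteration budget used as the termination criterion, and the choice of minimisation as "better" are all assumptions made for illustration, and the MDP-based improvement is not included.

```python
import random

def harmony_search(objective, bounds, hms=10, hmcr=0.9, max_iter=2000, seed=0):
    """Minimise `objective` over the box `bounds` (hypothetical helper)."""
    rng = random.Random(seed)
    # Step two: fill the harmony memory (HM) with random solution vectors
    # and keep each vector together with its objective function value.
    memory = []
    for _ in range(hms):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        memory.append((objective(x), x))
    # Step five (assumed criterion): stop after a fixed number of iterations.
    for _ in range(max_iter):
        # Step three: build the new harmony element by element.
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Pick the j-th element of a randomly chosen vector in the HM.
                new.append(rng.choice(memory)[1][j])
            else:
                # Otherwise draw a fresh random value from the variable range.
                new.append(rng.uniform(lo, hi))
        # Step four: replace the worst vector in the HM if the new one is better.
        new_value = objective(new)
        worst = max(range(len(memory)), key=lambda k: memory[k][0])
        if new_value < memory[worst][0]:
            memory[worst] = (new_value, new)
    # Return the best (value, vector) pair found in the HM.
    return min(memory, key=lambda entry: entry[0])
```

A call such as `harmony_search(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` returns the best objective value and the corresponding vector; because a vector is only ever replaced by a better one, the best value in the HM never gets worse from one iteration to the next.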
In the invention, the MDP model is introduced in the harmony generation process. In a specific embodiment, it is assumed that the empirical sample set G of the information parameters of the heterogeneous data source generated by the MDP model is:
G=[(s,a),(s′,r)]=[G1,G2] (4)
wherein G1 and G2 correspond to x1 and x2, respectively. Since the subsequent state function is a continuation of the preceding finite state function, G1 and G2 are similar; the concept of relative entropy (KL) is therefore introduced, and the similarity of G1 and G2 is expressed as:
in formula (5), P and Q correspond to G1 and G2, respectively; p and q are the function values in P and Q, respectively; and i represents the independent variable of the relative entropy function for the heterogeneous data source evaluation.
It follows from formula (5) that DKL = 0 if P = Q. That is, when the generated state-action function pair is highly similar to the generated subsequent-state-reward function pair, the relative entropy of the two approaches 0, the heterogeneous data source evaluation objective function of the MDP model attains its global minimum, and the quality of the trained heterogeneous data source information parameter samples is high.
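The property DKL = 0 for P = Q can be checked numerically. The sketch below assumes that formula (5), which is not reproduced in this text, is the standard discrete relative entropy, i.e. the sum over i of p(i)·log(p(i)/q(i)); the function name `kl_divergence` and the list-based representation of P and Q are illustrative choices only.

```python
import math

def kl_divergence(p, q):
    """Relative entropy of two discrete distributions (assumed standard form)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue          # a term with p(i) = 0 contributes nothing
        if q_i == 0.0:
            return math.inf   # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

For identical distributions every logarithm is log(1) = 0, so the result is exactly 0; the value grows as the two distributions move apart.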
In the invention, the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method implemented on the basis of a single LSTM block, and comprises the following steps:
(1) inputting, deleting and reading the multi-source heterogeneous integrated information, so that the multi-source heterogeneous integrated information is processed, the information is continuously updated, and the information screening capability is improved; let C_t denote the stored information of the heterogeneous data information, f_t the removed information, i_t the inflowing information, and O_t the outflowing information;
(2) calling a Sigmoid function, and calculating an output function of a single LSTM block, wherein the function formula is as follows:
wherein t represents different network node parameter data nodes in the neural network model, W [ i, f, C, O ] represents a parameter weight matrix in the neural network model, b [ i, f, C, O ] represents offset vectors of different node weight matrices in the neural network model, X represents input multi-source heterogeneous integrated information operation data information parameters, and Y represents multi-source heterogeneous integrated information operation fault diagnosis data output parameters;
(3) reading a single LSTM block to output storage information, wherein the output function is as follows:
where tanh is the hyperbolic tangent function, and the multiplication in the formula is an element-by-element multiplication within the neural network node.
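Steps (1) to (3) can be sketched as one forward step of a single LSTM block. The formulas of the patent are not reproduced in this text, so the sketch assumes the standard LSTM formulation (sigmoid gates, a tanh candidate state, and element-wise products); the names `lstm_step`, `affine`, `h_prev` and `c_prev`, and the storage of W and b as dictionaries keyed by 'i', 'f', 'C' and 'O', are illustrative assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def affine(W, b, v):
    # Matrix-vector product plus bias, with W given as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a single LSTM block (assumed standard equations)."""
    v = h_prev + x_t                                           # concatenated input
    f_t = [sigmoid(z) for z in affine(W['f'], b['f'], v)]      # removal gate
    i_t = [sigmoid(z) for z in affine(W['i'], b['i'], v)]      # inflow gate
    c_hat = [math.tanh(z) for z in affine(W['C'], b['C'], v)]  # candidate state
    o_t = [sigmoid(z) for z in affine(W['O'], b['O'], v)]      # outflow gate
    # Stored information: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Block output: element-wise product of the outflow gate and tanh(C_t).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the stored information is simply halved at each step; this gives a quick sanity check of the gate wiring.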
In the invention, a Softmax classification function is added to the long short-term memory (LSTM) neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the class of the multi-source heterogeneous data information is y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the data information under each application of the input multi-source heterogeneous integrated information, and the probability p of occurrence under the j-th application is assumed to be given by the following formula:
in formula (8), θ is the parameter matrix used by the neural network model to calculate the probability, and θ_j is the column vector of data related to the j-th class in the multi-source heterogeneous integrated information; the regularized cross-entropy loss function J is then used to obtain the optimal value of θ, and the output expression can be as follows:
in formula (9), λ and M are the regularization model parameters of the function J, introduced to meet the regularization calculation requirement; the softmax classification function classifies the multi-source heterogeneous integrated information operation data sample x_t by the following formula:
y_t = arg max p (10)
by classifying and evaluating the multi-source heterogeneous integrated information under different applications, the rapid classification is realized, and the operation and control capacity of the multi-source heterogeneous integrated information is improved.
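The classification procedure can be sketched in Python as follows. Formulas (8) and (9) are not reproduced in this text, so the sketch assumes the usual softmax probability exp(θ_j·x_t) / Σ exp(θ_l·x_t) and a cross-entropy loss averaged over M samples with an L2 penalty weighted by λ; the function names, the storage of each θ_j as a row of `theta`, and the exact form of the penalty are assumptions for illustration.

```python
import math

def softmax_probs(theta, x):
    """Class probabilities for sample x; theta[j] plays the role of theta_j."""
    scores = [sum(w * v for w, v in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)                        # subtract the maximum for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(theta, x):
    """Formula (10): return the class in {1, ..., K} with the largest probability."""
    probs = softmax_probs(theta, x)
    return 1 + max(range(len(probs)), key=probs.__getitem__)

def regularized_loss(theta, samples, lam):
    """Assumed form of J: mean negative log-likelihood plus an L2 penalty."""
    m = len(samples)
    nll = -sum(math.log(softmax_probs(theta, x)[y - 1]) for x, y in samples) / m
    penalty = 0.5 * lam * sum(w * w for row in theta for w in row)
    return nll + penalty
```

With an all-zero θ every class is equally likely, so the loss equals log K regardless of the samples, which is a convenient baseline when checking a training loop.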
Example two
As shown in fig. 3-6, a data integration system of heterogeneous data sources includes:
a data source, which is externally connected with a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a set of multiple database systems and is used for realizing the sharing and transparent access of multi-source heterogeneous data information; the heterogeneous data interface is used for realizing information transmission or interaction among different databases;
an integration module, which is used for integrating mutually related distributed heterogeneous data sources so that a user can access the data sources in a transparent manner; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router and an integrated output interface; the integrated control module is respectively connected with the channel communication analysis module, the information integration diagnosis module, the first channel conversion module and the integrated output interface; the multi-source heterogeneous networking architecture is connected with the first channel conversion module through the router; the output end of the first channel conversion module is connected with the input end of the second channel conversion module, and the output end of the second channel conversion module is connected with the input end of the integrated output interface; and
an application module, which comprises a heterogeneous data storage control module and, connected with the heterogeneous data storage control module, a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control, wherein the fault warning module is connected with an LED lamp.
The output end of the data source is connected with the input end of the integration module, and the output end of the integration module is connected with the input end of the application module.
In the invention, the integrated control module is a dual-core processor comprising a DSP calculation module and an ARM calculation module, wherein the DSP calculation module is a data module based on the TMS321VC5501 model, and the ARM calculation module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory (LSTM) neural network algorithm module; as shown in FIG. 3, a single LSTM block in the LSTM neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t and an information outflow gate O_t.
In the invention, the channel communication analysis module comprises a first program medium and an improved harmony search optimization algorithm program arranged on the first program medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module comprises a second program medium and a long short-term memory (LSTM) neural network algorithm program arranged on the second program medium, and is used for diagnosing fault information in the heterogeneous data source integration process.
In the invention, the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.
Claims (10)
1. A data integration method for heterogeneous data sources, characterized in that the method comprises the following steps:
(S1) a plurality of data sources with different data structures, access modes and forms are collected, and information among different databases is transferred or interacted; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the mutually related distributed heterogeneous data sources, so that the user can access the data sources in a transparent manner; the heterogeneous data source integration information is analyzed with an improved harmony search optimization algorithm; a long short-term memory (LSTM) neural network algorithm is used for diagnosing fault information in the heterogeneous data source integration process; and the interaction and communication of different data channels are realized;
(S3) applying the integrated multi-source heterogeneous data information for collection, control, communication, application, operation and maintenance, diagnosis or data display.
2. The data integration method of the heterogeneous data source according to claim 1, wherein: the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model.
3. The data integration method of the heterogeneous data source according to claim 2, wherein: the improved harmony search optimization algorithm comprises the following steps:
step one: defining the collected multi-source heterogeneous data information:
in formula (1), f(x) refers to the objective function of the multi-source heterogeneous integrated information evaluation; x_i refers to the variables that affect the multi-source heterogeneous integrated information evaluation; X_i refers to the range of the multi-source heterogeneous integrated information evaluation region; N refers to the number of variables in the multi-source heterogeneous integrated information evaluation function; the parameters required by the HS algorithm for solving the optimization problem, such as the harmony vector set size, the HMCR and the maximum number of iterations, are also defined;
step two: generation of the harmony memory (HM): the HM, i.e. the harmony vector set, stores all solution vectors together with the objective function values obtained when the multi-source heterogeneous integrated information is evaluated in each iteration; the HM is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting multi-source heterogeneous integrated information evaluation information forms a matrix B, represented as follows:
step three: generation of a new harmony: in this step, each element of the new harmony vector is generated, according to the HMCR probability, either by taking the corresponding multi-source heterogeneous integrated information parameter element from the HM or by assigning a random value from the data range X of the multi-source heterogeneous integrated information used in step two; for this purpose, a random number between 0 and 1 is selected:
if the randomly generated number falls within the HMCR probability, which lies between 0 and 1, the new vector element is picked from the multi-source heterogeneous integrated information parameter elements in the HM; if the randomly generated number does not fall within the HMCR probability, the new vector element is selected at random from the range of the parameter variables that affect the multi-source heterogeneous integrated information data set instead of being selected from the HM;
step four: HM updating: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated multi-source heterogeneous integrated information parameter solution vector; this value is then compared with the objective function values of the solution vectors in the HM; if the objective function value of the newly generated solution vector is better than the worst objective function value in the HM, the newly generated harmony solution vector replaces the harmony vector with the worst objective function value, and the worst solution vector is deleted from the HM; thus, a better solution vector is stored in the HM;
step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends, and the best vector found in the HM is used as the final solution of the multi-source heterogeneous integrated information evaluation; if the criterion is not met, steps three and four are repeated.
4. The data integration method of the heterogeneous data source according to claim 3, wherein: an MDP model is introduced in the harmony generation process.
5. The data integration method of the heterogeneous data source according to claim 1, wherein: the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method implemented on the basis of a single LSTM block, and comprises the following steps:
(1) inputting, deleting and reading the multi-source heterogeneous integrated information, so that the multi-source heterogeneous integrated information is processed, the information is continuously updated, and the information screening capability is improved; let C_t denote the stored information of the heterogeneous data information, f_t the removed information, i_t the inflowing information, and O_t the outflowing information;
(2) calling a Sigmoid function, and calculating an output function of a single LSTM block, wherein the function formula is as follows:
wherein t represents different network node parameter data nodes in the neural network model, W [ i, f, C, O ] represents a parameter weight matrix in the neural network model, b [ i, f, C, O ] represents offset vectors of different node weight matrices in the neural network model, X represents input multi-source heterogeneous integrated information operation data information parameters, and Y represents multi-source heterogeneous integrated information operation fault diagnosis data output parameters;
(3) reading a single LSTM block to output storage information, wherein the output function is as follows:
where tanh is the hyperbolic tangent function, and the multiplication in the formula is an element-by-element multiplication within the neural network node.
6. The data integration method of the heterogeneous data source according to claim 5, wherein: a Softmax classification function is added to the long short-term memory (LSTM) neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the class of the multi-source heterogeneous data information is y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the data information under each application of the input multi-source heterogeneous integrated information, and the probability p of occurrence under the j-th application is assumed to be given by the following formula:
in formula (6), θ is the parameter matrix used by the neural network model to calculate the probability, and θ_j is the column vector of data related to the j-th class in the multi-source heterogeneous integrated information; the regularized cross-entropy loss function J is then used to obtain the optimal value of θ, and the output expression can be as follows:
in formula (7), λ and M are the regularization model parameters of the function J, introduced to meet the regularization calculation requirement; the softmax classification function classifies the multi-source heterogeneous integrated information operation data sample x_t by the following formula:
y_t = arg max p (8)
by classifying and evaluating the multi-source heterogeneous integrated information under different applications, the rapid classification is realized, and the operation and control capacity of the multi-source heterogeneous integrated information is improved.
7. A data integration system for heterogeneous data sources, characterized by comprising:
a data source, which is externally connected with a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a set of multiple database systems and is used for realizing the sharing and transparent access of multi-source heterogeneous data information; the heterogeneous data interface is used for realizing information transmission or interaction among different databases;
an integration module, which is used for integrating mutually related distributed heterogeneous data sources so that a user can access the data sources in a transparent manner; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router and an integrated output interface; the integrated control module is respectively connected with the channel communication analysis module, the information integration diagnosis module, the first channel conversion module and the integrated output interface; the multi-source heterogeneous networking architecture is connected with the first channel conversion module through the router; the output end of the first channel conversion module is connected with the input end of the second channel conversion module, and the output end of the second channel conversion module is connected with the input end of the integrated output interface; and
an application module, which comprises a heterogeneous data storage control module and, connected with the heterogeneous data storage control module, a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control, wherein the fault warning module is connected with an LED lamp;
the output end of the data source is connected with the input end of the integration module, and the output end of the integration module is connected with the input end of the application module.
8. The data integration system of heterogeneous data sources of claim 7, wherein: the integrated control module is a dual-core processor comprising a DSP calculation module and an ARM calculation module, wherein the DSP calculation module is a data module based on the TMS321VC5501 model, and the ARM calculation module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory (LSTM) neural network algorithm module; a single LSTM block in the LSTM neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t and an information outflow gate O_t.
9. The data integration system of heterogeneous data sources of claim 7, wherein:
the channel communication analysis module comprises a first program medium and an improved harmony search optimization algorithm program arranged on the first program medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module comprises a second program medium and a long short-term memory (LSTM) neural network algorithm program arranged on the second program medium, and is used for diagnosing fault information in the heterogeneous data source integration process.
10. The data integration system of heterogeneous data sources of claim 7, wherein: the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110952257.7A CN113656480B (en) | 2021-08-19 | 2021-08-19 | Data integration method and system of heterogeneous data sources |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113656480A (en) | 2021-11-16 |
CN113656480B (en) | 2024-09-24 |
Family
ID=78481186
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110952257.7A Active CN113656480B (en) | 2021-08-19 | 2021-08-19 | Data integration method and system of heterogeneous data sources |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113656480B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118014065A (en) * | 2024-01-30 | 2024-05-10 | 新疆泽智信息技术有限公司 | Multi-mode heterogeneous admission data integration method based on knowledge graph |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104168569A (en) * | 2014-07-15 | 2014-11-26 | 哈尔滨工程大学 | Dynamic frequency spectrum distribution method of cognitive heterogeneous network |
CN105163325A (en) * | 2015-09-25 | 2015-12-16 | 重庆工商大学 | Heterogeneous directed sensor network deployment method |
CN111098312A (en) * | 2018-12-12 | 2020-05-05 | 广东鼎义互联科技股份有限公司 | Window government affairs service robot |
US20210209388A1 (en) * | 2020-01-06 | 2021-07-08 | The Research Foundation For The State University Of New York | Fakecatcher: detection of synthetic portrait videos using biological signals |
Also Published As
Publication number | Publication date |
---|---|
CN113656480B (en) | 2024-09-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Li et al. | Random search and reproducibility for neural architecture search | |
CN113905391B (en) | Integrated learning network traffic prediction method, system, equipment, terminal and medium | |
CN110175628A (en) | A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation | |
Liu et al. | A survey on computationally efficient neural architecture search | |
CN112686376A (en) | Node representation method based on timing diagram neural network and incremental learning method | |
CN113808396A (en) | Traffic speed prediction method and system based on traffic flow data fusion | |
Bi et al. | Large-scale network traffic prediction with LSTM and temporal convolutional networks | |
CN117034100A (en) | Self-adaptive graph classification method, system, equipment and medium based on hierarchical pooling architecture | |
CN117786602A (en) | Long-period multi-element time sequence prediction method based on multi-element information interaction | |
CN113656480A (en) | Data integration method and system for heterogeneous data source | |
CN116542701A (en) | Carbon price prediction method and system based on CNN-LSTM combination model | |
WO2023274213A1 (en) | Data processing method and related apparatus | |
CN114463596A (en) | Small sample image identification method, device and equipment of hypergraph neural network | |
CN117421657B (en) | Method and system for screening and learning picture samples with noise labels based on oversampling strategy | |
CN117913808A (en) | Distributed photovoltaic power generation prediction method and device | |
CN116992940A (en) | SAR image multi-type target detection light-weight method and device combining channel pruning and knowledge distillation | |
Hao et al. | Architecture self-attention mechanism: Nonlinear optimization for neural architecture search | |
CN112699271B (en) | Recommendation method for improving retention time of user video website | |
CN115904728A (en) | Memory consumption value estimation method and device, terminal equipment and storage medium | |
CN115081609A (en) | Acceleration method in intelligent decision, terminal equipment and storage medium | |
CN114818945A (en) | Small sample image classification method and device integrating category adaptive metric learning | |
CN114401496A (en) | Video information rapid processing method based on 5G edge calculation | |
CN111382191A (en) | Machine learning identification method based on deep learning | |
Narkhede et al. | Towards compressed and efficient CNN architectures via pruning | |
CN117151229B (en) | Cloud reasoning method and system based on cloud side architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||