CN113656480A - Data integration method and system for heterogeneous data source - Google Patents

Data integration method and system for heterogeneous data source

Info

Publication number
CN113656480A
Authority
CN
China
Prior art keywords
data
information
heterogeneous
module
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110952257.7A
Other languages
Chinese (zh)
Other versions
CN113656480B (en
Inventor
唐华
徐海鹏
张华�
丁英峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Qincheng Health Technology Co Ltd
Original Assignee
Shandong Qincheng Health Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Qincheng Health Technology Co Ltd filed Critical Shandong Qincheng Health Technology Co Ltd
Priority to CN202110952257.7A priority Critical patent/CN113656480B/en
Publication of CN113656480A publication Critical patent/CN113656480A/en
Application granted granted Critical
Publication of CN113656480B publication Critical patent/CN113656480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data integration method and system for heterogeneous data sources, relating to the technical field of networks. The integration method comprises the following steps: (S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; (S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; (S3) applying the integrated multi-source heterogeneous data information to collection, control, communication, application, operation and maintenance, diagnosis or data display. The data integration system for heterogeneous data sources comprises a data source, an integration module and an application module; the output end of the data source is connected to the input end of the integration module, and the output end of the integration module is connected to the input end of the application module. The invention enables analysis and diagnosis of heterogeneous data integration and improves the data integration and application capability for heterogeneous data sources.

Description

Data integration method and system for heterogeneous data source
Technical Field
The present invention relates to the field of network technologies, and in particular, to a data integration method and system for heterogeneous data sources.
Background
With the development of database technology and the spread of networks, massive amounts of data are stored in heterogeneous databases, forming information islands that hinder data sharing; at the same time, as global market competition intensifies, more and more information systems need to share the data held in heterogeneous databases. This requires the data to be integrated.
Currently, there are generally two methods for integrating heterogeneous databases. The first is to migrate the original data to a new data management system; to integrate different types of data, some non-traditional data types must be converted to new data types. Many relational database vendors provide such functionality. The disadvantage of this approach is that, once the data management system is upgraded, the application software associated with the original data must either be discarded or be redeveloped to suit the new data management system. Migration to a new system is therefore generally not a practical solution. The second method is to integrate heterogeneous databases using middleware, which does not require changing how the original data is stored and managed. The middleware sits between the heterogeneous database systems (data layer) and the application programs (application layer); downward, it coordinates the database systems, and upward, it provides a uniform data schema and a general data access interface to applications that access the integrated data. The applications of each database still carry out their own tasks, and the middleware system mainly focuses on providing a high-level retrieval service for the heterogeneous data sources. However, this retrieval method is inefficient.
Data integration is in essence a typical ETL process. How to extract data from a source database through a reader plug-in, convert it, and finally write it into a big data center has become a technical problem to be solved urgently.
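The ETL flow mentioned above (reader plug-in, conversion, write to a data center) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in and not part of the patent: the two reader functions, the target schema with `record_id`, `value` and `source`, and the in-memory list that plays the role of the data center.

```python
import csv
import io
import json

def read_csv_source(text):
    """Reader plug-in for a CSV-like source: yields one dict per row."""
    yield from csv.DictReader(io.StringIO(text))

def read_json_source(text):
    """Reader plug-in for a JSON-lines source: yields one dict per line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

def transform(record, source_name):
    """Map a source record onto a common (hypothetical) target schema."""
    return {
        "record_id": str(record.get("id") or record.get("record_id")),
        "value": float(record["value"]),
        "source": source_name,
    }

def run_etl(sources, writer):
    """Extract from every source, transform each record, pass it to the writer."""
    for name, (reader, payload) in sources.items():
        for record in reader(payload):
            writer(transform(record, name))

warehouse = []  # stand-in for the big data center
run_etl(
    {
        "csv_db": (read_csv_source, "id,value\n1,36.6\n2,37.1\n"),
        "json_db": (read_json_source, '{"record_id": 3, "value": "38.0"}\n'),
    },
    warehouse.append,
)
```

Each source keeps its own format; only the reader and the field mapping in `transform` know about it, so adding a source means adding one reader.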
Disclosure of Invention
In view of the above technical defects, the invention discloses a data integration method and system for heterogeneous data sources, which enables analysis and diagnosis of heterogeneous data integration and improves the data integration and application capability for heterogeneous data sources.
To achieve the above purpose, the invention adopts the following technical scheme:
a data integration method for heterogeneous data sources, comprising the steps of:
(S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; in this step, the heterogeneous data source integration information is analyzed with an improved harmony search optimization algorithm, fault information arising during heterogeneous data source integration is diagnosed with a long short-term memory (LSTM) neural network algorithm, and interaction and communication between different data channels are realized;
(S3) applying the integrated multi-source heterogeneous data information to collection, control, communication, application, operation and maintenance, diagnosis or data display.
As a further technical scheme of the invention, the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model.
As a further technical scheme of the invention, the improved harmony search optimization algorithm comprises the following steps:
Step one: defining the collected multi-source heterogeneous data information:
Figure BDA0003218951380000021
In formula (1), f(x) is the objective function for evaluating the multi-source heterogeneous integrated information; x_i is a variable that affects the evaluation; X_i is the range over which the multi-source heterogeneous integrated information is evaluated; and N is the number of variables in the evaluation function. The parameters required by the HS algorithm to solve the optimization problem, such as the size of the harmony vector set, the HMCR and the maximum number of iterations, are also defined in this step.
Step two: generating the HM: the harmony vector set stores all solution vectors together with the objective function values obtained when the multi-source heterogeneous integrated information is evaluated in each iteration. It is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting evaluation matrix B is expressed as follows:
Figure BDA0003218951380000022
Step three: generating a new harmony: in this step, each element of the new harmony vector is generated, according to the HMCR probability, either by taking an element already stored in the HM or by assigning a random value from the data range X used in step two; for this purpose, a random number between 0 and 1 is drawn:
Figure BDA0003218951380000023
If the random number falls within the HMCR probability (a value between 0 and 1), the new vector element is picked from the corresponding elements stored in the HM; otherwise, the new vector element is selected at random from the data range of the variable instead of from the HM;
Step four: updating the HM: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated solution vector, and the value is compared with the objective function values of the solution vectors in the HM; if the value of the new solution vector is better than the worst value in the HM, the new harmony vector replaces the worst harmony vector, which is deleted from the HM; a better solution vector is thus stored in the HM;
Step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends and the best vector found in the HM is taken as the final solution of the multi-source heterogeneous integrated information evaluation; otherwise, steps three and four are repeated.
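Steps one to five can be sketched as a basic harmony search loop. This is a minimal illustration under stated assumptions, not the patented "improved" variant: it minimises the objective, omits pitch adjustment (PAR) and the MDP-based refinement, and the function name `harmony_search`, the parameter `hms` (harmony memory size) and the `sphere` test objective are all hypothetical.

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, max_iter=2000, seed=0):
    """Minimise f over box bounds [(lo, hi), ...] with basic harmony search."""
    rng = random.Random(seed)
    # Step two: fill the harmony memory (HM) with random solution vectors.
    hm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(x) for x in hm]
    for _ in range(max_iter):  # Step five: stop after a fixed iteration budget.
        # Step three: build the new harmony element by element.
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                new.append(rng.choice(hm)[i])    # take the value from the HM
            else:
                new.append(rng.uniform(lo, hi))  # draw a fresh value from X_i
        # Step four: replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        new_score = f(new)
        if new_score < scores[worst]:
            hm[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return hm[best], scores[best]

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best, score = harmony_search(sphere, [(-5.0, 5.0)] * 2)
```

The fixed seed keeps the run reproducible; a real evaluation function for integrated information would replace `sphere`.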
As a further technical scheme of the invention, an MDP model is introduced in the harmony generation process.
As a further technical scheme of the invention, the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method based on a single LSTM block, and comprises the following steps:
(1) inputting, deleting and reading the multi-source heterogeneous integrated information, so that the information is processed and continuously updated and the information screening capability is improved; C_t denotes the stored information, f_t the removed information, i_t the inflowing information, and O_t the outflowing information of the heterogeneous data;
(2) calling a Sigmoid function and calculating the output function of the single LSTM block, wherein the function formula is as follows:
Figure BDA0003218951380000031
where t indexes the data nodes in the neural network model, W[i, f, C, O] are the parameter weight matrices of the neural network model, b[i, f, C, O] are the offset (bias) vectors of the corresponding weight matrices, X is the input operation data of the multi-source heterogeneous integrated information, and Y is the output fault diagnosis data of the multi-source heterogeneous integrated information;
(3) reading the stored information output by the single LSTM block, wherein the output function is as follows:
Figure BDA0003218951380000032
where tanh is the hyperbolic tangent function and ⊙ denotes element-wise multiplication in the neural network node.
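One forward step of a single LSTM block can be sketched in plain Python. The formula images are not reproduced in this text, so the code assumes the textbook LSTM formulation: every gate acts on the concatenation of the previous output and the current input. The helper names `affine` and `lstm_step` and the dict layout of `W` and `b` are illustrative choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, y_prev, c_prev, W, b):
    """One step of a single LSTM block (textbook form, assumed here).

    W and b are dicts keyed by "i", "f", "C", "O"; each W[k] acts on the
    concatenation [y_prev, x_t].
    """
    z = y_prev + x_t  # list concatenation
    i_t = [sigmoid(a) for a in affine(W["i"], z, b["i"])]      # information inflow gate
    f_t = [sigmoid(a) for a in affine(W["f"], z, b["f"])]      # information removal gate
    o_t = [sigmoid(a) for a in affine(W["O"], z, b["O"])]      # information outflow gate
    c_hat = [math.tanh(a) for a in affine(W["C"], z, b["C"])]  # candidate storage
    # Element-wise update of the stored information C_t.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Element-wise read-out of the block output Y_t.
    y_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return y_t, c_t

# Tiny example: one input, one hidden unit, all weights and biases zero.
W = {k: [[0.0, 0.0]] for k in ("i", "f", "C", "O")}
b = {k: [0.0] for k in ("i", "f", "C", "O")}
y_t, c_t = lstm_step([1.0], [0.0], [1.0], W, b)
```

With zero weights every gate equals 0.5 and the candidate is 0, so the stored value is halved; this makes the gate arithmetic easy to check by hand.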
As a further technical scheme of the invention, a Softmax classification function is added to the LSTM neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the class label satisfies y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the data information under each application of the input multi-source heterogeneous integrated information, and the probability p of occurrence under the j-th application is given by the following formula:
Figure BDA0003218951380000041
In formula (6), θ is the parameter matrix used by the neural network model to calculate the probability, and θ_j is the column vector associated with the j-th class in the multi-source heterogeneous integrated information; the regularized cross-entropy loss function J is then used to obtain the optimal value of θ, and can be expressed as:
Figure BDA0003218951380000042
In formula (7), λ and M are the regularization model parameters of the function J; the softmax classification function classifies the operation data sample x_t of the multi-source heterogeneous integrated information by the following formula:
y_t = arg max p (8)
By classifying and evaluating the multi-source heterogeneous integrated information under different applications, rapid classification is realized, and the operation and control capability for the multi-source heterogeneous integrated information is improved.
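The classification step can be illustrated with a short softmax sketch. The images for formulas (6) and (7) are not available here, so the code assumes the usual softmax regression form with an L2 penalty weighted by λ and a mean over the samples; the function names are hypothetical, and classes are indexed from 0 rather than from 1 as in the text.

```python
import math

def softmax_probs(theta, x):
    """p(y = j | x; theta) for each class j; theta holds one weight row per class."""
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def regularised_loss(theta, samples, lam):
    """Mean cross-entropy over (x, y) samples plus an L2 penalty (assumed form)."""
    m = len(samples)
    data_term = -sum(math.log(softmax_probs(theta, x)[y]) for x, y in samples) / m
    penalty = 0.5 * lam * sum(t * t for row in theta for t in row)
    return data_term + penalty

def classify(theta, x):
    """Formula (8): return the class index with the largest probability."""
    p = softmax_probs(theta, x)
    return max(range(len(p)), key=p.__getitem__)

theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # K = 3 classes, 2 features
label = classify(theta, [2.0, 0.5])
```

In practice θ would be fitted by minimising `regularised_loss` with a gradient method; that training loop is left out of the sketch.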
In order to solve the technical problems, the invention also adopts the following technical scheme:
a data integration system of heterogeneous data sources, comprising:
a data source; the data source is externally connected with a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a collection of various database systems and is used for sharing and transparently accessing multi-source heterogeneous data information; the heterogeneous data interface is used for transferring or exchanging information among different databases;
an integration module, used for integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router and an integrated output interface; the integrated control module is connected to the channel communication analysis module, the information integration diagnosis module, the first channel conversion module and the integrated output interface respectively; the multi-source heterogeneous networking architecture is connected to the first channel conversion module through the router; the output end of the first channel conversion module is connected to the input end of the second channel conversion module, and the output end of the second channel conversion module is connected to the input end of the integrated output interface; and
an application module; the application module comprises a heterogeneous data storage control module, and a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control which are connected with the heterogeneous data storage control module, wherein the fault warning module is connected with an LED lamp.
The output end of the data source is connected to the input end of the integration module, and the output end of the integration module is connected to the input end of the application module.
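The chain from data source to integration module to application module can be pictured with a small software sketch. The patent describes hardware modules, so the three classes below are only a loose, hypothetical analogy of the data flow; none of the class or method names come from the source.

```python
class DataSource:
    """Stand-in for one heterogeneous database behind the data interface."""
    def __init__(self, name, rows):
        self.name, self.rows = name, rows

    def fetch(self):
        return list(self.rows)

class IntegrationModule:
    """Merges rows from all sources and tags each row with its origin."""
    def __init__(self, sources):
        self.sources = sources

    def integrate(self):
        return [dict(row, source=s.name) for s in self.sources for row in s.fetch()]

class ApplicationModule:
    """Consumes integrated rows; here it only counts rows per source."""
    def display(self, rows):
        summary = {}
        for row in rows:
            summary[row["source"]] = summary.get(row["source"], 0) + 1
        return summary

sources = [
    DataSource("relational", [{"k": 1}, {"k": 2}]),
    DataSource("document", [{"k": 3}]),
]
summary = ApplicationModule().display(IntegrationModule(sources).integrate())
```

The application module never touches a source directly, which is the transparent access the text describes.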
As a further technical scheme of the invention, the integrated control module is a dual-core processor comprising a DSP computing module and an ARM computing module, wherein the DSP computing module is a data module based on the TMS321VC5501 model and the ARM computing module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory (LSTM) neural network algorithm module; the single LSTM block in the LSTM neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t and an information outflow gate O_t.
As a further technical solution of the present invention, the channel communication analysis module includes a first program medium and an improved harmony search optimization algorithm program stored on the first program medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module comprises a second program medium and an LSTM neural network algorithm program stored on the second program medium, and is used for diagnosing fault information arising during heterogeneous data source integration.
As a further technical solution of the present invention, the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
Positive and beneficial effects:
1. The invention integrates a plurality of collected data sources that differ in data structure, access mode and form, realizes the transfer or exchange of information among different databases, and improves the integration and application capability for heterogeneous network data information.
2. An improved harmony search optimization algorithm is used to analyze the heterogeneous data source integration information, which improves the capability of data integration analysis.
3. An LSTM neural network algorithm is used to diagnose fault information arising during heterogeneous data source integration, and interaction and communication between different data channels are realized, which improves fault diagnosis for data integration.
4. The integrated multi-source heterogeneous data information is applied to collection, control, communication, application, operation and maintenance, diagnosis or display, which improves the data application capability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a schematic flow chart illustrating a data integration method of heterogeneous data sources according to the present invention;
FIG. 2 is a schematic flow chart of an improved harmony search optimization algorithm in the data integration method of the heterogeneous data source according to the present invention;
FIG. 3 is a schematic diagram of a data integration apparatus for heterogeneous data sources according to the present invention;
FIG. 4 is a diagram illustrating an integrated module hardware architecture of a data integration apparatus of heterogeneous data sources according to the present invention;
FIG. 5 is a diagram illustrating an application module hardware architecture of a data integration apparatus of heterogeneous data sources according to the present invention;
FIG. 6 is a schematic diagram of a single LSTM block in the long short-term memory neural network algorithm module of the data integration apparatus for heterogeneous data sources according to the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, and it should be understood that the embodiments described herein are merely for the purpose of illustrating and explaining the present invention and are not intended to limit the present invention.
Multi-mode integration and fusion of multi-source heterogeneous data sources: the method is compatible with batch data integration, real-time data integration and CDC data synchronization for various data source systems such as DB, MPP, NoSQL, file systems, HTTP and FTP. To implement the above technical scheme, the following embodiments are adopted, in which the following terms are used:
the Markov decision process model refers to the Markov Decision Process, MDP; the harmony vector set refers to the Harmony Memory, HM; the harmony memory considering rate refers to the Harmony Memory Considering Rate, HMCR; the pitch adjusting rate refers to the Pitch Adjusting Rate, PAR; and long short-term memory refers to Long Short-Term Memory, LSTM.
Example one
As shown in FIGS. 1 and 2, a data integration method for heterogeneous data sources includes the following steps:
(S1) collecting a plurality of data sources that differ in data structure, access mode and form, and transferring or exchanging information among the different databases; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the interrelated distributed heterogeneous data sources so that users can access them transparently; in this step, the heterogeneous data source integration information is analyzed with an improved harmony search optimization algorithm, fault information arising during heterogeneous data source integration is diagnosed with a long short-term memory (LSTM) neural network algorithm, and interaction and communication between different data channels are realized;
(S3) applying the integrated multi-source heterogeneous data information to collection, control, communication, application, operation and maintenance, diagnosis or data display.
In the present invention, the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model; that is, the H-BIM model is optimized with an improved HS optimization algorithm based on the Markov Decision Process (MDP). HS is a population-based meta-heuristic algorithm that keeps a group of solutions in a harmony vector set (Harmony Memory, HM). When the heterogeneous communication information samples are trained in each iteration, a new harmony vector is generated from the HM under the control of the harmony memory considering rate (HMCR) and the pitch adjusting rate (PAR), and the optimal solution is sought in this way. The HS algorithm can be divided into four steps: initializing the HM; generating a new harmony; adding the newly generated harmony to the HM, provided its fitness is higher than the worst fitness value in the previous HM; and checking a termination criterion (e.g., a maximum number of iterations). The principle of the HS algorithm is to seek, during the optimization of heterogeneous data information, the best solution as determined by an objective function, so that the H-BIM model analyzes heterogeneous data information efficiently and to a high standard.
In the present invention, the improved harmony search optimization algorithm comprises the following steps:
Step one: defining the collected multi-source heterogeneous data information:
Figure BDA0003218951380000071
In formula (1), f(x) is the objective function for evaluating the multi-source heterogeneous integrated information; x_i is a variable that affects the evaluation; X_i is the range over which the multi-source heterogeneous integrated information is evaluated; and N is the number of variables in the evaluation function. The parameters required by the HS algorithm to solve the optimization problem, such as the size of the harmony vector set, the HMCR and the maximum number of iterations, are also defined in this step;
Step two: generating the HM: the harmony vector set stores all solution vectors together with the objective function values obtained when the multi-source heterogeneous integrated information is evaluated in each iteration. It is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting evaluation matrix B is expressed as follows:
Figure BDA0003218951380000072
Step three: generating a new harmony: in this step, each element of the new harmony vector is generated, according to the HMCR probability, either by taking an element already stored in the HM or by assigning a random value from the data range X used in step two; for this purpose, a random number between 0 and 1 is drawn:
Figure BDA0003218951380000081
If the random number falls within the HMCR probability (a value between 0 and 1), the new vector element is picked from the corresponding elements stored in the HM; otherwise, the new vector element is selected at random from the data range of the variable instead of from the HM;
Step four: updating the HM: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated solution vector, and the value is compared with the objective function values of the solution vectors in the HM; if the value of the new solution vector is better than the worst value in the HM, the new harmony vector replaces the worst harmony vector, which is deleted from the HM; a better solution vector is thus stored in the HM;
Step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends and the best vector found in the HM is taken as the final solution of the multi-source heterogeneous integrated information evaluation; otherwise, steps three and four are repeated.
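The images for formulas (1) to (3) are not reproduced in this text. For orientation only, the forms these formulas take in the standard harmony search literature are sketched below; whether the patent uses exactly these expressions is an assumption, as are the choice of minimisation and the symbol HMS for the harmony memory size.

```latex
% (1) optimisation problem, assuming minimisation
\min_{x} f(x) \quad \text{subject to } x_i \in X_i,\; i = 1, 2, \dots, N

% (2) harmony memory holding HMS solution vectors and their objective values
B = \begin{bmatrix}
x_1^{1}   & x_2^{1}   & \cdots & x_N^{1}   & f(x^{1})   \\
x_1^{2}   & x_2^{2}   & \cdots & x_N^{2}   & f(x^{2})   \\
\vdots    & \vdots    & \ddots & \vdots    & \vdots     \\
x_1^{HMS} & x_2^{HMS} & \cdots & x_N^{HMS} & f(x^{HMS})
\end{bmatrix}

% (3) memory consideration for each element of the new harmony
x_i^{\text{new}} \leftarrow
\begin{cases}
x_i^{\text{new}} \in \{x_i^{1}, x_i^{2}, \dots, x_i^{HMS}\} & \text{with probability } HMCR \\
x_i^{\text{new}} \in X_i & \text{with probability } 1 - HMCR
\end{cases}
```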
In the invention, an MDP model is introduced into the harmony generation process. In a specific embodiment, the empirical sample set G of heterogeneous data source information parameters generated by the MDP model is assumed to be:
G = [(s, a), (s′, r)] = [G1, G2] (4)
where G1 and G2 correspond to x1 and x2, respectively. Since the subsequent state function is a continuation of the previous finite state function, G1 and G2 are similar to each other; the concept of relative entropy (KL divergence) is therefore introduced, and the similarity of G1 and G2 is expressed as:
D_{KL}(P \| Q) = \sum_i p(i) \log \frac{p(i)}{q(i)} (5)
In formula (5), P and Q correspond to G1 and G2, respectively; p and q are the function values of P and Q, respectively; and i is the independent variable of the relative entropy function used in the heterogeneous data source evaluation.
It follows from formula (5) that D_KL = 0 if P = Q. When the pair of generated state and action functions is highly similar to the pair of generated subsequent state and reward functions, their relative entropy approaches 0, the heterogeneous data source evaluation objective function of the MDP model attains its global minimum, and the quality of the trained heterogeneous data source information parameter samples is correspondingly high.
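The behaviour of the relative entropy described above can be checked numerically. The sketch assumes P and Q are discrete probability distributions given as lists of equal length with strictly positive entries in Q; the function name `kl_divergence` and the sample distributions are illustrative.

```python
import math

def kl_divergence(p, q):
    """Relative entropy D_KL(P || Q) of two discrete distributions."""
    # Terms with p_i = 0 contribute nothing, by the usual convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

same = kl_divergence([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])  # identical distributions
diff = kl_divergence([0.5, 0.5], [0.9, 0.1])            # dissimilar distributions
```

Identical distributions give exactly zero, and the value grows as the two distributions move apart, which is why a small value indicates similar sample pairs.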
In the invention, the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method based on a single LSTM block, and comprises the following steps:
(1) inputting, deleting, and reading the multi-source heterogeneous integrated information, so that the information is processed and continuously updated and the information screening capability is improved; let C_t be the stored information of the heterogeneous data, f_t the removed information, i_t the inflowing information, and O_t the outflowing information;
(2) calling a Sigmoid function, and calculating an output function of a single LSTM block, wherein the function formula is as follows:
Figure BDA0003218951380000091
where t indexes the data nodes of the network in the neural network model; W_[i, f, C, O] denotes the parameter weight matrices of the neural network model; b_[i, f, C, O] denotes the bias vectors associated with the weight matrices of the different nodes; X denotes the input multi-source heterogeneous integrated information operation data; and Y denotes the output multi-source heterogeneous integrated information operation fault diagnosis data;
(3) reading a single LSTM block to output storage information, wherein the output function is as follows:
Figure BDA0003218951380000092
where tanh is the hyperbolic tangent function, and the product in the formula denotes element-wise multiplication within the neural network node.
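Formulas (6) and (7) appear only as images in the source, so the following sketch uses the commonly published LSTM gate equations, which may differ in detail from the patent's figures. The helper names `sigmoid`, `affine`, and `lstm_step`, the dictionary layout of `W` and `b`, and the concatenation of the previous output with the current input are assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W and vectors b, v."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, y_prev, c_prev, W, b):
    """One step of a single LSTM block.

    x_t:    input vector X at node t
    y_prev: previous output vector Y
    c_prev: previous stored information C
    W, b:   dicts keyed by "i", "f", "C", "O" (assumed layout)
    Returns the new output y_t and the new stored information c_t.
    """
    v = y_prev + x_t  # list concatenation [y_prev, x_t]
    f_t = [sigmoid(z) for z in affine(W["f"], b["f"], v)]      # removal gate
    i_t = [sigmoid(z) for z in affine(W["i"], b["i"], v)]      # inflow gate
    c_hat = [math.tanh(z) for z in affine(W["C"], b["C"], v)]  # candidate
    o_t = [sigmoid(z) for z in affine(W["O"], b["O"], v)]      # outflow gate

    # Element-wise update of the stored information.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Element-wise product of the outflow gate and tanh of the stored state.
    y_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return y_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the stored information is simply halved at each step; this gives a quick sanity check of the wiring.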
In the invention, a Softmax classification function is added to the long short-term memory neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the class of each item of multi-source heterogeneous data information is y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the input multi-source heterogeneous integrated information under each application, and the probability p of occurrence under the j-th application is assumed to be given by the following formula:
Figure BDA0003218951380000093
In equation (8), θ is the parameter matrix with which the neural network model calculates the probability, and θ_j is the column vector associated with the j-th class of data in the multi-source heterogeneous integrated information; the normalized cross-entropy loss function J is then used to obtain the optimal value of θ, and can be expressed as:
Figure BDA0003218951380000094
In formula (9), λ and M are the normalization model parameters of the function J, which serve the regularization requirement; the softmax classification function classifies the multi-source heterogeneous integrated information operation data sample x_t by the following formula:
y_t = arg max p (10)
by classifying and evaluating the multi-source heterogeneous integrated information under different applications, the rapid classification is realized, and the operation and control capacity of the multi-source heterogeneous integrated information is improved.
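The classification procedure can be illustrated with the short Python sketch below. Formulas (8) and (9) are images in the source, so the code assumes the usual softmax probability and a cross-entropy loss with an L2 penalty weighted by λ/2 and averaged over M samples; that reading of λ and M, and the names `softmax_probs`, `classify`, and `regularized_loss`, are assumptions rather than the patent's stated definitions.

```python
import math


def softmax_probs(theta, x):
    """Class probabilities p_j for sample x; theta holds one row per class."""
    scores = [sum(t * x_i for t, x_i in zip(row, x)) for row in theta]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(theta, x):
    """Formula (10): return the class in {1, ..., K} with the largest p."""
    p = softmax_probs(theta, x)
    return max(range(len(p)), key=p.__getitem__) + 1


def regularized_loss(theta, samples, lam):
    """Cross-entropy averaged over M samples plus an L2 penalty (assumed form).

    samples: list of (x, y) pairs with y in {1, ..., K}
    lam:     regularization weight, standing in for the patent's lambda
    """
    M = len(samples)
    nll = -sum(math.log(softmax_probs(theta, x)[y - 1]) for x, y in samples) / M
    penalty = (lam / 2.0) * sum(t * t for row in theta for t in row)
    return nll + penalty


if __name__ == "__main__":
    theta = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # K = 3 classes
    print(classify(theta, [2.0, 0.5]))
    print(regularized_loss(theta, [([2.0, 0.5], 1), ([0.1, 3.0], 2)], lam=0.1))
```

Minimizing the loss over θ, for example by gradient descent, would yield the parameter matrix used in the arg max step; the optimizer itself is outside the scope of this sketch.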
Example two
As shown in fig. 3-6, a data integration system of heterogeneous data sources includes:
a data source, which is externally connected with a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a collection of multiple database systems and is used for sharing and transparently accessing multi-source heterogeneous data information; the heterogeneous data interface is used for transferring or exchanging information among different databases;
an integration module, which is used for integrating mutually related distributed heterogeneous data sources so that a user can access the data sources transparently; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router, and an integrated output interface; the integrated control module is connected with the channel communication analysis module, the information integration diagnosis module, the first channel conversion module, and the integrated output interface, respectively; the multi-source heterogeneous networking architecture is connected with the first channel conversion module through the router; the output end of the first channel conversion module is connected with the input end of the second channel conversion module; and the output end of the second channel conversion module is connected with the input end of the integrated output interface; and
an application module; the application module comprises a heterogeneous data storage control module, and a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control which are connected with the heterogeneous data storage control module, wherein the fault warning module is connected with an LED lamp.
The output end of the data source is connected with the input end of the integration module, and the output end of the integration module is connected with the input end of the application module.
In the invention, the integrated control module is a dual-core processor comprising a DSP calculation module and an ARM calculation module, wherein the DSP calculation module is a data module based on the TMS321VC5501 model, and the ARM calculation module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory neural network algorithm module; as shown in FIG. 3, the single LSTM block structure in the long short-term memory neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t, and an information outflow gate O_t.
In the invention, the channel communication analysis module comprises a first program medium and an improved harmony search optimization algorithm program stored on the first medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module comprises a second program medium and a long short-term memory neural network algorithm program stored on the second medium, and is used for diagnosing fault information in the heterogeneous data source integration process.
In the invention, the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
Although specific embodiments of the present invention have been described above, it will be understood by those skilled in the art that these specific embodiments are merely illustrative and that various omissions, substitutions and changes in the form of the detail of the methods and systems described above may be made by those skilled in the art without departing from the spirit and scope of the invention. For example, it is within the scope of the present invention to combine the steps of the above-described methods to perform substantially the same function in substantially the same way to achieve substantially the same result. Accordingly, the scope of the invention is to be limited only by the following claims.

Claims (10)

1. A data integration method of heterogeneous data sources is characterized in that: the method comprises the following steps:
(S1) collecting a plurality of data sources with different data structures, access modes, and forms, and transferring or exchanging information among different databases; the data acquisition modes include, but are not limited to: an SMS network, a GPRS network, a CDMA wireless network, or a fiber optic network;
(S2) integrating the mutually related distributed heterogeneous data sources so that the user can access the data sources transparently; analyzing the heterogeneous data source integration information using an improved harmony search optimization algorithm; diagnosing fault information in the heterogeneous data source integration process using a long short-term memory neural network algorithm; and realizing interaction and communication between different data channels;
(S3) applying the integrated multi-source heterogeneous data information for collection, control, communication, application, operation and maintenance, diagnosis, or data display.
2. The data integration method of the heterogeneous data source according to claim 1, wherein: the improved harmony search optimization algorithm is an optimization algorithm based on a Markov decision process model.
3. The data integration method of the heterogeneous data source according to claim 2, wherein: the improved harmony search optimization algorithm comprises the following steps:
step one: defining the collected multi-source heterogeneous data information:
Figure FDA0003218951370000011
in formula (1), f(x) is the objective function of the multi-source heterogeneous integrated information evaluation; x_i is a variable that affects the multi-source heterogeneous integrated information evaluation; X_i is the range of the multi-source heterogeneous integrated information evaluation region; n is the number of variables in the multi-source heterogeneous integrated information evaluation function; the parameters required by the HS algorithm to solve the optimization problem, such as the harmony vector set size, the HMCR, and the maximum number of iterations, are also defined;
step two: generation of the HM: the harmony vector set stores all solution vectors and the corresponding values of the evaluation objective function output during the multi-source heterogeneous integrated information evaluation in each iteration; the harmony vector set is filled with randomly generated values of the variables that affect the multi-source heterogeneous integrated information, and the resulting multi-source heterogeneous integrated information evaluation matrix B is expressed as follows:
Figure FDA0003218951370000012
step three: generation of a new harmony: in this step, each element of the new harmony vector of multi-source heterogeneous integrated information parameters is generated, according to the HMCR probability, either from the multi-source heterogeneous integrated information parameter elements of the HM or by assigning a random value from the multi-source heterogeneous integrated information data range X used in step two; for this purpose, a random number between 0 and 1 is selected:
Figure FDA0003218951370000021
if the randomly generated number falls within the HMCR probability, which lies between 0 and 1, the new vector element is picked from the multi-source heterogeneous integrated information parameter elements in the HM; if it does not, the new multi-source heterogeneous integrated information parameter vector element is selected at random from the data range of the parameter variables rather than from the HM;
step four: HM update: in this stage, the heterogeneous data evaluation objective function is calculated for the newly generated multi-source heterogeneous integrated information parameter solution vector; this value is then compared with the objective function values of the solution vectors in the HM; if the objective function value of the newly generated solution vector is better than the worst objective function value in the HM, the new harmony solution vector replaces the harmony vector with the worst objective function value, and that worst solution vector is deleted from the HM; in this way, better solution vectors are retained in the HM;
step five: repeating steps three and four until the termination criterion is met: if the criterion is met, the iterative training ends, and the best vector found in the HM is taken as the final solution of the multi-source heterogeneous integrated information estimation; if the criterion is not met, steps three and four are repeated.
4. The data integration method of the heterogeneous data source according to claim 3, wherein: an MDP model is introduced in the harmony generation process.
5. The data integration method of the heterogeneous data source according to claim 1, wherein: the long short-term memory (LSTM) neural network algorithm is a fault diagnosis method based on a single LSTM block, and comprises the following steps:
(1) inputting, deleting, and reading the multi-source heterogeneous integrated information, so that the information is processed and continuously updated and the information screening capability is improved; let C_t be the stored information of the heterogeneous data, f_t the removed information, i_t the inflowing information, and O_t the outflowing information;
(2) calling a Sigmoid function, and calculating an output function of a single LSTM block, wherein the function formula is as follows:
Figure FDA0003218951370000022
where t indexes the data nodes of the network in the neural network model; W_[i, f, C, O] denotes the parameter weight matrices of the neural network model; b_[i, f, C, O] denotes the bias vectors associated with the weight matrices of the different nodes; X denotes the input multi-source heterogeneous integrated information operation data; and Y denotes the output multi-source heterogeneous integrated information operation fault diagnosis data;
(3) reading a single LSTM block to output storage information, wherein the output function is as follows:
Figure FDA0003218951370000031
where tanh is the hyperbolic tangent function, and the product in the formula denotes element-wise multiplication within the neural network node.
6. The data integration method of the heterogeneous data source according to claim 5, wherein: a Softmax classification function is added to the long short-term memory neural network algorithm, and the classification method is as follows:
the multi-source heterogeneous data information to be classified is represented by [x_t, y_t], where the class of each item of multi-source heterogeneous data information is y_t ∈ {1, 2, …, K}; the softmax classification function evaluates the input multi-source heterogeneous integrated information under each application, and the probability p of occurrence under the j-th application is assumed to be given by the following formula:
Figure FDA0003218951370000032
in equation (6), θ is the parameter matrix with which the neural network model calculates the probability, and θ_j is the column vector associated with the j-th class of data in the multi-source heterogeneous integrated information; the normalized cross-entropy loss function J is then used to obtain the optimal value of θ, and can be expressed as:
Figure FDA0003218951370000033
in formula (7), λ and M are the normalization model parameters of the function J, which serve the regularization requirement; the softmax classification function classifies the multi-source heterogeneous integrated information operation data sample x_t by the following formula:
y_t = arg max p (8)
by classifying and evaluating the multi-source heterogeneous integrated information under different applications, the rapid classification is realized, and the operation and control capacity of the multi-source heterogeneous integrated information is improved.
7. A data integration system for heterogeneous data sources, characterized in that the system comprises:
a data source, which is externally connected with a heterogeneous database and a heterogeneous data interface; the heterogeneous database is a collection of multiple database systems and is used for sharing and transparently accessing multi-source heterogeneous data information; the heterogeneous data interface is used for transferring or exchanging information among different databases;
an integration module, which is used for integrating mutually related distributed heterogeneous data sources so that a user can access the data sources transparently; the integration module comprises an integrated control module, a channel communication analysis module, an information integration diagnosis module, a first channel conversion module, a second channel conversion module, a multi-source heterogeneous networking architecture, a router, and an integrated output interface; the integrated control module is connected with the channel communication analysis module, the information integration diagnosis module, the first channel conversion module, and the integrated output interface, respectively; the multi-source heterogeneous networking architecture is connected with the first channel conversion module through the router; the output end of the first channel conversion module is connected with the input end of the second channel conversion module; and the output end of the second channel conversion module is connected with the input end of the integrated output interface; and
an application module; the application module comprises a heterogeneous data storage control module, and a fault warning module, an operation and maintenance management module, a visual display module, a dynamic monitoring module, a fault warning module and a heterogeneous diagnosis output interface control which are connected with the heterogeneous data storage control module, wherein the fault warning module is connected with an LED lamp;
the output end of the data source is connected with the input end of the integration module, and the output end of the integration module is connected with the input end of the application module.
8. The data integration system of heterogeneous data sources of claim 7, wherein: the integrated control module is a dual-core processor with a DSP computing module and an ARM computing module, wherein the DSP computing module is a data module based on the TMS321VC5501 model, and the ARM computing module is a data module based on the S3C-44BOX model; the channel communication analysis module is an improved harmony search optimization algorithm module, and the information integration diagnosis module is a long short-term memory neural network algorithm module; the single LSTM block structure in the long short-term memory neural network algorithm module comprises a storage module C_t, an information removal gate f_t, an information inflow gate i_t, and an information outflow gate O_t.
9. The data integration system of heterogeneous data sources of claim 7, wherein:
the channel communication analysis module comprises a first program medium and an improved harmony search optimization algorithm program stored on the first medium, and is used for analyzing the heterogeneous data source integration information;
the information integration diagnosis module comprises a second program medium and a long short-term memory neural network algorithm program stored on the second medium, and is used for diagnosing fault information in the heterogeneous data source integration process.
10. The data integration system of heterogeneous data sources of claim 7, wherein: the first channel conversion module is provided with an SDN controller; the second channel conversion module is provided with an ASON controller.
CN202110952257.7A 2021-08-19 2021-08-19 Data integration method and system of heterogeneous data sources Active CN113656480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110952257.7A CN113656480B (en) 2021-08-19 2021-08-19 Data integration method and system of heterogeneous data sources

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110952257.7A CN113656480B (en) 2021-08-19 2021-08-19 Data integration method and system of heterogeneous data sources

Publications (2)

Publication Number Publication Date
CN113656480A (en) 2021-11-16
CN113656480B (en) 2024-09-24

Family

ID=78481186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110952257.7A Active CN113656480B (en) 2021-08-19 2021-08-19 Data integration method and system of heterogeneous data sources

Country Status (1)

Country Link
CN (1) CN113656480B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118014065A (en) * 2024-01-30 2024-05-10 新疆泽智信息技术有限公司 Multi-mode heterogeneous admission data integration method based on knowledge graph

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104168569A (en) * 2014-07-15 2014-11-26 哈尔滨工程大学 Dynamic frequency spectrum distribution method of cognitive heterogeneous network
CN105163325A (en) * 2015-09-25 2015-12-16 重庆工商大学 Heterogeneous directed sensor network deployment method
CN111098312A (en) * 2018-12-12 2020-05-05 广东鼎义互联科技股份有限公司 Window government affairs service robot
US20210209388A1 (en) * 2020-01-06 2021-07-08 The Research Foundation For The State University Of New York Fakecatcher: detection of synthetic portrait videos using biological signals

Also Published As

Publication number Publication date
CN113656480B (en) 2024-09-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant