CN108647266A

CN108647266A - Method for fast distributed storage and interaction of heterogeneous data

Info

Publication number: CN108647266A
Application number: CN201810399691.5A
Authority: CN
Inventors: 陈新碧
Original assignee: Chongqing Bazemun Zhe Zhe Network Technology Co Ltd
Current assignee: Chongqing Bazemun Zhe Zhe Network Technology Co Ltd
Priority date: 2018-04-28
Filing date: 2018-04-28
Publication date: 2018-10-12

Abstract

A method for fast distributed storage and interaction of heterogeneous data. The data are dispersed across multiple independent devices in a scalable architecture in which several storage servers share the storage load. This improves the reliability, availability and access efficiency of the system and also makes it easy to extend. The optimized query algorithm of the invention uses a keyword count-sorting strategy, which shortens query time.

Description

Method for fast distributed storage and interaction of heterogeneous data

Technical field

The present invention relates to the technical field of data processing, and in particular to a method for the distributed storage and real-time interactive processing of heterogeneous data.

Background art

During enterprise informatization, a large number of functional applications are integrated into the enterprise information portal system, and they must be managed centrally and uniformly to meet the needs of shared data applications. However, these applications differ in many respects, including development language, development platform, operating system, database management system and network communication protocol. Among these, database differences are the most prominent: different system data sources and application requirements lead to differences in data structure, and because heterogeneous databases differ in how they access and share data, real-time sharing between them is difficult to achieve. How to realize distributed storage and real-time interactive processing of heterogeneous data is therefore a current technical problem.

Summary of the invention

The object of the present invention is to provide a method for fast distributed storage and interaction of heterogeneous data that solves the problems of distributed storage and real-time interactive processing of heterogeneous data and realizes real-time sharing between data.

The object of the present invention is achieved by the following technical solution, which comprises the following steps:

1) Split the heterogeneous data, store it in the data-center cache, and number the data classes in the cache;

2) Perform de-redundancy processing on the operating-condition data in the cache;

3) Calculate the proportion of each data class in the total data volume, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of a given class and $S$ is the total data volume;

4) Set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values $n_1, n_2, \ldots, n_k$, which are integers greater than 0; the number of thresholds and of $n$ values, and their magnitudes, are set according to actual needs;

5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, $n_1$ classes of data are stored together in the same slave server; if $P'_1 < P_i < P'_2$, $n_2$ classes of data are stored together in the same slave server; and so on; if $P_i > P'_n$, this class of data is stored across $n_k$ slave servers;

6) Establish a heterogeneous index table according to the storage addresses of the distributed data;

7) Receive the query request sent by the client and extract keywords from the target content to be retrieved;

8) Search level by level for the locations of the keywords using the heterogeneous index table;

9) According to the database configuration information, dispatch the query result to the corresponding storage database and extract the required data from that database;

10) Summarize the required data extracted in step 9) and return it to the client.
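Steps 3) to 5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names are hypothetical, it assumes one group size per threshold interval (so k = n + 1), and, because the method only states strict inequalities, it assumes that a proportion exactly equal to a threshold falls into the lower interval.

```python
from bisect import bisect_left

def proportions(sizes):
    """Step 3: P_i = S_i / S for every data class i, so the P_i sum to 1."""
    total = sum(sizes.values())
    return {cls: s / total for cls, s in sizes.items()}

def allocate(sizes, thresholds, group_sizes):
    """Steps 4-5 (sketch): map data classes to slave servers.

    thresholds:  [P'_1, ..., P'_n], strictly increasing inside (0, 1)
    group_sizes: [n_1, ..., n_k]; ASSUMPTION: k = n + 1, one entry per interval
    Returns one list of class labels per slave server.
    """
    if len(group_sizes) != len(thresholds) + 1:
        raise ValueError("need one group size per threshold interval")
    buckets = [[] for _ in group_sizes]
    for cls, p in sorted(proportions(sizes).items()):
        # Index j of the interval P'_j < p < P'_(j+1). ASSUMPTION: a tie with
        # a threshold goes to the lower interval (bisect_left semantics).
        buckets[bisect_left(thresholds, p)].append(cls)
    servers = []
    # Small classes: up to n_j classes share one slave server.
    for n_j, classes in zip(group_sizes[:-1], buckets[:-1]):
        for start in range(0, len(classes), n_j):
            servers.append(classes[start:start + n_j])
    # Large classes (P_i > P'_n): each one is split over n_k slave servers.
    n_k = group_sizes[-1]
    for cls in buckets[-1]:
        servers.extend([f"{cls}#{shard}"] for shard in range(n_k))
    return servers

sizes = {"a": 5, "b": 5, "c": 10, "d": 30, "e": 50}
print(allocate(sizes, [0.08, 0.4], [2, 1, 3]))
# [['a', 'b'], ['c'], ['d'], ['e#0'], ['e#1'], ['e#2']]
```

With these illustrative values, classes a and b (5 % each) share one slave server, c and d fall into the middle interval where each class gets its own server, and the largest class e is split over three servers.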

Further, the heterogeneous index table in step 6) is established as follows:

6-1) Extract keywords from the new data set and preprocess them to obtain the query count of each keyword in the data set;

6-2) Sort the keywords by query count in ascending order to form a count table;

6-3) Based on the count table, build the index level by level to form the index table; each level of the index table contains the corresponding keywords and their data object information;

6-4) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
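The patent does not specify how keywords are extracted, how a query count is computed, or how many index levels are built. The following Python sketch of step 6) therefore assumes whitespace tokenization, takes the query count of a keyword to be the number of data objects that contain it (as the scan in the embodiment suggests), and uses a single posting-list level beneath the count table; `extract_keywords` and `build_index` are hypothetical names.

```python
from collections import defaultdict

def extract_keywords(text):
    """6-1: hypothetical keyword extractor (lower-cased whitespace tokens)."""
    return set(text.lower().split())

def build_index(records):
    """Build a count table and a keyword index for one new data set.

    records: {(source_db, record_id): text}, the data objects to index
    Returns (count_table, index):
      count_table: [(keyword, query_count), ...] in ascending count order (6-2)
      index:       keyword -> sorted [(source_db, record_id), ...] (6-3, 6-4)
    """
    postings = defaultdict(set)
    for location, text in records.items():
        for kw in extract_keywords(text):
            postings[kw].add(location)
    # ASSUMPTION: query count = number of data objects containing the keyword.
    # Ties are broken alphabetically so that the table order is deterministic.
    count_table = sorted(((kw, len(locs)) for kw, locs in postings.items()),
                         key=lambda item: (item[1], item[0]))
    # The (source_db, record_id) pairs act as the mapping back to the source databases.
    index = {kw: sorted(locs) for kw, locs in postings.items()}
    return count_table, index
```

For the records `{("db1", 1): "pump pressure", ("db2", 5): "pump temperature", ("db2", 6): "Temperature alarm"}`, the count table starts with the rare keywords `alarm` and `pressure` (count 1) and ends with `pump` and `temperature` (count 2).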

Further, the specific steps of searching level by level for the keyword locations using the heterogeneous index table in step 8) are as follows:

8-1) Map the client's query request onto the keyword library of the index, converting the original query into a target query;

8-2) Sort the keywords in the query by their counts in the count table;

8-3) Read the query keywords one by one in ascending order of count and search the index table level by level from top to bottom until the matching keyword is found.
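Steps 8-1) to 8-3) can be sketched as an AND query over a count table and keyword index of the kind built in step 6). This assumes that every query keyword must match, that a keyword missing from the index means there is no result, and that the index is a plain keyword-to-locations dictionary; `search` is a hypothetical name, and the patent's multi-level traversal is reduced to a single level here.

```python
def search(query, count_table, index):
    """Steps 8-1..8-3 (sketch): AND query, rarest keyword first.

    count_table: [(keyword, count), ...] in ascending count order
    index:       keyword -> list of (source_db, record_id) locations
    """
    rank = {kw: r for r, (kw, _) in enumerate(count_table)}
    terms = set(query.lower().split())
    # 8-1: map the raw query onto the index's keyword library.
    # ASSUMPTION (AND semantics): an unknown keyword means nothing can match.
    if not terms or any(t not in rank for t in terms):
        return []
    # 8-2: order the query keywords by their position in the count table.
    ordered = sorted(terms, key=rank.__getitem__)
    # 8-3: start from the rarest keyword and narrow the candidate set.
    result = set(index[ordered[0]])
    for kw in ordered[1:]:
        result &= set(index[kw])
        if not result:
            break  # early exit: no location can match any more
    return sorted(result)
```

Starting from the rarest keyword keeps the candidate set small from the first step, which is the usual motivation for ordering posting-list intersections by list length.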

Further, the specific method of summarizing the required data extracted in step 10) and returning it to the client is:

Extract the required data from the corresponding data sets according to the data mapping relationships and summarize it, convert the extracted data into the required data format, and return it to the client.

Further, numbering the data classes in the cache in step 1) comprises the following steps:

1-1) Preprocess the raw data collected from the industrial system, i.e., split the raw operating-condition data, verify data legality, extract the logical associations between different data, and convert data formats;

1-2) Store the preprocessed operating-condition data in the cache;

1-3) Number the data classes in the cache.
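Step 1) could look roughly like the following Python sketch. The record layout (`source`, `type`, `value` fields), the numeric-value legality check and the numbering of classes in order of first appearance are all assumptions for illustration; the extraction of logical associations between data is omitted.

```python
def preprocess_and_cache(raw_records, required=("source", "type", "value")):
    """Steps 1-1..1-3 (sketch). raw_records: iterable of dicts (hypothetical layout).

    Returns (class_numbers, cache): class_numbers maps a data type to its
    number, and cache maps that number to the cleaned records of the class.
    """
    class_numbers, cache = {}, {}
    for rec in raw_records:
        # 1-1 legality check: drop records that lack a required field.
        if any(rec.get(f) is None for f in required):
            continue
        # 1-1 format conversion. ASSUMPTION: values are numeric measurements.
        try:
            value = float(rec["value"])
        except (TypeError, ValueError):
            continue
        # 1-3 number each data class (split by "type") in order of first appearance.
        number = class_numbers.setdefault(rec["type"], len(class_numbers) + 1)
        # 1-2 store the cleaned record in the cache under its class number.
        cache.setdefault(number, []).append({**rec, "value": value})
    return class_numbers, cache
```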

Further, the de-redundancy processing of the operating-condition data in the cache in step 2) comprises the following steps:

2-1) Using preset data priorities, filter out the non-critical information in the operating-condition data and discard it;

2-2) Extract the repeated common information from the operating-condition data;

2-3) Compress the operating-condition data with a lossless compression algorithm.
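Step 2) might be sketched as follows. The per-field priority table, the rule that fields identical across all records form the common information, and the choice of zlib (DEFLATE) as the lossless algorithm are assumptions; the patent names none of them. The hypothetical `restore` function is included to show that the retained fields survive the round trip unchanged.

```python
import json
import zlib

def deduplicate(records, priority, min_priority):
    """Steps 2-1..2-3 (sketch) for one cached data class.

    priority:     field name -> preset priority (hypothetical scheme)
    min_priority: fields below this priority count as non-critical
    Returns zlib-compressed JSON holding the shared fields once plus per-record rows.
    """
    # 2-1: discard non-critical fields (fields without a priority get 0).
    kept = [{k: v for k, v in r.items() if priority.get(k, 0) >= min_priority}
            for r in records]
    # 2-2: ASSUMPTION: fields with the same value in every record are the
    # "repeated common information" and are stored only once.
    common = {}
    if kept:
        common = {k: v for k, v in kept[0].items()
                  if all(k in r and r[k] == v for r in kept[1:])}
    rows = [{k: v for k, v in r.items() if k not in common} for r in kept]
    payload = json.dumps({"common": common, "rows": rows}, sort_keys=True)
    # 2-3: lossless compression (zlib chosen here only as an example).
    return zlib.compress(payload.encode("utf-8"))

def restore(blob):
    """Inverse of deduplicate: decompress and re-expand the shared fields."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [{**data["common"], **row} for row in data["rows"]]
```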

Further, in step 5), the slave servers store data according to data temperature, and multiple classes of data may correspond to the same node. The storage space of a data node is divided by temperature into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity. When fresh data is updated, the first tier holds the data updated within a certain number of minutes or accessed most frequently; the second tier holds the data updated within a certain number of days or accessed most frequently; and the third tier holds the data updated or accessed most frequently within a preset time period. The data temperature is determined by the access frequency and access time of the industrial process operating-condition data.
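The three-tier placement by data temperature can be sketched as a classification function. The window lengths, the access-count thresholds and the rule of taking the later of the last update and the last access are hypothetical choices, since the method leaves the certain number of minutes, the certain number of days and the preset period to the operator; in this sketch everything that is neither recent nor frequently accessed falls into the third tier.

```python
from datetime import datetime, timedelta

# Hypothetical tier parameters; the method leaves the exact values open.
TIER1_WINDOW = timedelta(minutes=30)  # "a certain number of minutes"
TIER2_WINDOW = timedelta(days=7)      # "a certain number of days"
TIER1_HITS = 100  # ASSUMPTION: access count that counts as "most frequent"
TIER2_HITS = 10

def tier(last_update, last_access, hits, now):
    """Return 1 (high speed, small capacity), 2 (fast, medium capacity)
    or 3 (medium speed, large capacity) from access time and frequency."""
    age = now - max(last_update, last_access)
    if age <= TIER1_WINDOW or hits >= TIER1_HITS:
        return 1
    if age <= TIER2_WINDOW or hits >= TIER2_HITS:
        return 2
    return 3
```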

By adopting the above technical solution, the present invention has the following advantages:

The distributed storage system of the present invention disperses data across multiple independent devices in a scalable architecture and shares the storage load among several storage servers, which improves the reliability, availability and access efficiency of the system and makes it easy to extend. The real-time interactive processing method improves data-processing efficiency and enables real-time processing. The keyword count-sorting strategy saves storage space and computation and shortens index construction time. Hierarchical data queries based on query counts improve query efficiency. The memory database system combines an in-memory database efficiently with a disk database: the disk database compensates for the shortcomings of the in-memory database, and the two are linked to each other, which improves the real-time performance of the whole system and reduces its operating load.

Other advantages, objects and features of the present invention will be set forth to some extent in the following description, and to some extent will be apparent to those skilled in the art upon study of what follows, or may be learned from practice of the present invention. The objects and other advantages of the present invention may be realized and obtained through the following description and claims.

Description of the drawings

The description of the drawings of the present invention is as follows.

Fig. 1 is a schematic diagram of the architecture of the present invention;

Fig. 2 is a schematic flowchart of the distributed storage of the present invention.

Detailed description of the embodiments

The invention will be further described with reference to the accompanying drawings and examples.

A method for fast distributed storage and interaction of heterogeneous data comprises the following steps:

2) Perform de-redundancy processing on the operating-condition data in the cache;

5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, $n_1$ classes of data are stored together in the same slave server; if $P'_1 < P_i < P'_2$, $n_2$ classes of data are stored together in the same slave server; and so on; if $P_i > P'_n$, this class of data is stored across $n_k$ slave servers;

The heterogeneous index table is established as follows:

6-1) Extract keywords from the new data set to obtain a keyword set;

6-2) Scan the new data set for each keyword in the keyword set to obtain the query count of each keyword;

6-3) Sort the keywords by query count in ascending order and assign each keyword a sequence number in that order;

6-4) Build the bottom-level nodes in the order of keyword counts, then build the index level by level to form the index table; each level of the index table contains the corresponding keywords and their data object information;

6-5) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.

7) Receive the query request and extract keywords from the target content to be retrieved;

The locations of the keywords are searched as follows:

8-1) Map the client's search keywords onto the keyword library of the index, converting the original query into a target query;

8-2) Look up the count table to obtain the count sequence number of each query keyword;

8-3) Read the query keywords one by one in ascending order of count sequence number and search the index table level by level from top to bottom until the matching keyword is found.

10) Summarize the required data extracted in step 9) and return it to the client;

The specific steps are as follows:

10-1) Summarize the extracted data, encapsulate it into a document of unified format using Extensible Markup Language (XML), and return it to the client;

10-2) The client parses the document content and converts it into the required data format.
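Steps 10-1) and 10-2) can be illustrated with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`result`, `record`, `field`, `name`, `count`) are hypothetical, and the client-side required data format is assumed to be a list of dictionaries of strings.

```python
import xml.etree.ElementTree as ET

def encapsulate(rows):
    """10-1 (sketch): wrap the summarized rows in one XML document.
    The element names <result>, <record> and <field> are hypothetical."""
    root = ET.Element("result", count=str(len(rows)))
    for row in rows:
        rec = ET.SubElement(root, "record")
        for name, value in row.items():
            field = ET.SubElement(rec, "field", name=name)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")

def parse(document):
    """10-2 (sketch): client side, turn the XML back into a list of dicts of strings."""
    root = ET.fromstring(document)
    return [{f.get("name"): f.text for f in rec.findall("field")}
            for rec in root.findall("record")]
```

Because every value travels as XML text, the client receives strings and must convert them to its own types, which is the format conversion that step 10-2) refers to.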

The optimized query algorithm of the present invention uses a keyword count-sorting strategy, which shortens query time; splitting the heterogeneous data and storing it in a distributed manner improves data-processing speed.

Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its purpose and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims

1. A method for fast distributed storage and interaction of heterogeneous data, characterized by comprising the following steps:

2) Perform de-redundancy processing on the operating-condition data in the cache;

3) Calculate the proportion of each data class in the total data volume, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of a given class and $S$ is the total data volume;

4) Set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values $n_1, n_2, \ldots, n_k$, which are integers greater than 0; the number of thresholds and of $n$ values, and their magnitudes, are set according to actual needs;

5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, $n_1$ classes of data are stored together in the same slave server; if $P'_1 < P_i < P'_2$, $n_2$ classes of data are stored together in the same slave server; and so on; if $P_i > P'_n$, this class of data is stored across $n_k$ slave servers;

9) According to the database configuration information, dispatch the query result to the corresponding storage database and extract the required data from that storage database;

2. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that establishing the heterogeneous index table in step 6) comprises the following steps:

6-1) Extract keywords from the new data set and preprocess them to obtain the query count of each keyword in the data set;

6-3) Based on the count table, build the index level by level to form the index table; each level of the index table contains the corresponding keywords and their data object information;

6-4) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.

3. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that searching level by level for the keyword locations using the heterogeneous index table in step 8) comprises the following steps:

8-1) Map the client's query request onto the keyword library of the index, converting the original query into a target query;

8-3) Read the query keywords one by one in ascending order of count and search the index table level by level from top to bottom until the matching keyword is found.

4. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the specific method of summarizing the extracted required data and returning it to the client in step 10) is:

Extract the required data from the corresponding data sets according to the data mapping relationships and summarize it, convert the extracted data into the required data format, and return it to the client.

5. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that numbering the data classes in the cache in step 1) comprises the following steps:

1-1) Preprocess the raw data collected from the industrial system, i.e., split the raw operating-condition data, verify data legality, extract the logical associations between different data, and convert data formats;

1-2) Store the preprocessed operating-condition data in the cache;

1-3) Number the data classes in the cache.

6. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the de-redundancy processing of the operating-condition data in the cache in step 2) comprises the following steps:

2-1) Using preset data priorities, filter out the non-critical information in the operating-condition data and discard it;

2-2) Extract the repeated common information from the operating-condition data;

7. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that, in step 5), the slave servers store data according to data temperature, and multiple classes of data may correspond to the same node; the storage space of a data node is divided by temperature into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity; when fresh data is updated, the first tier holds the data updated within a certain number of minutes or accessed most frequently, the second tier holds the data updated within a certain number of days or accessed most frequently, and the third tier holds the data updated or accessed most frequently within a preset time period; the data temperature is determined by the access frequency and access time of the industrial process operating-condition data.