CN108647266A - Method for fast distributed storage and interaction of heterogeneous data - Google Patents
Method for fast distributed storage and interaction of heterogeneous data
- Publication number
- CN108647266A (application CN201810399691.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- keyword
- distributed storage
- index table
- heterogeneous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method for fast distributed storage and interaction of heterogeneous data. Data are dispersed across multiple independent devices in a scalable system architecture, and multiple storage servers share the storage load, which improves the reliability, availability and access efficiency of the system and makes it easy to extend. The optimized query algorithm used by the invention adopts a keyword count sorting strategy, which shortens query time.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a method for distributed storage and real-time interactive processing of heterogeneous data.
Background art
In the course of enterprise informatization, a large number of functional applications are integrated into the enterprise information portal system and must be managed in a centralized, unified way to meet the needs of shared-data applications. These applications, however, differ in many respects: development language, development platform, operating system, database management system, network communication protocol, and so on. Database differences are the most prominent. Different system data sources and application requirements lead to differences in data structure, and because heterogeneous databases differ in how they access and share data, real-time sharing between data cannot be achieved well. How to realize distributed storage and real-time interactive processing of heterogeneous data is therefore a current technical problem.
Summary of the invention
The object of the invention is to provide a method for fast distributed storage and interaction of heterogeneous data that solves the problems of distributed storage and real-time interactive processing of heterogeneous data and realizes real-time sharing between data.
The object of the invention is achieved by the following technical solution, whose specific steps are:
1) Split the heterogeneous data, store it in the data-center cache, and number the data classes in the cache;
2) Perform de-redundancy processing on the working-condition data in the cache;
3) Compute the proportion of each class of data in the total data volume, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of class $i$ and $S$ is the total data volume;
4) Set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values $n_1, n_2, \ldots, n_k$, each an integer greater than 0; the number of thresholds, the number of $n_i$, and their values are chosen according to actual needs;
5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, store $n_1$ kinds of data on the same slave server; if $P'_1 < P_i < P'_2$, store $n_2$ kinds of data on the same slave server; and so on; if $P_i > P'_n$, store this kind of data across $n_k$ slave servers;
6) Build a heterogeneous index table from the storage addresses of the distributed data;
7) Receive the query request sent by the user terminal and extract keywords from the search target content;
8) Look up the location of the keywords level by level in the heterogeneous index table;
9) Distribute the queried information to the corresponding data storage database according to the database configuration information, and extract the required data from that database;
10) Aggregate the required data extracted in step 9) and return it to the user terminal.
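Steps 3) to 5) can be illustrated with a minimal Python sketch. The function names, the sample volumes, and the rule that a proportion exactly equal to a threshold falls into the lower band are assumptions made for illustration. The patent does not specify them, nor how the number of thresholds relates to the number of values n_1, ..., n_k; the sketch assumes one n value per band.

```python
from bisect import bisect_left


def proportions(volumes):
    """Step 3: P_i = S_i / S for every data class."""
    total = sum(volumes.values())
    return {name: size / total for name, size in volumes.items()}


def band_of(p, thresholds):
    """Step 5: index of the band that proportion p falls into.

    Band 0 means p < P'_1; band len(thresholds) means p > P'_n.
    Assumption: a value equal to a threshold goes to the lower band.
    """
    return bisect_left(thresholds, p)


def placement(volumes, thresholds, n_values):
    """Map each data class to (band, n), where n is the value set in step 4.

    Assumption: n_values has one entry per band, i.e. len(thresholds) + 1.
    """
    if len(n_values) != len(thresholds) + 1:
        raise ValueError("expected one n value per band")
    plan = {}
    for name, p in proportions(volumes).items():
        band = band_of(p, thresholds)
        plan[name] = (band, n_values[band])
    return plan


# Hypothetical sample: three data classes and two thresholds.
volumes = {"sensor": 50, "log": 300, "video": 650}
plan = placement(volumes, thresholds=[0.1, 0.5], n_values=[4, 2, 3])
```

In this sample the small `sensor` class lands in the lowest band, so it would share a slave server with other small classes, while the large `video` class lands in the top band and would be spread over several slave servers.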
Further, the heterogeneous index table in step 6) is built as follows:
6-1) Extract keywords from the new data set and preprocess them to obtain each keyword's query count in the data set;
6-2) Sort the keywords by query count in ascending order to form a count table;
6-3) Based on the count table, build the index level by level to form the index table; each level of the index table contains its keywords and the corresponding data object information;
6-4) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
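One possible reading of steps 6-1) to 6-4) is sketched below in Python. Several choices are assumptions not taken from the patent: keywords are extracted by whitespace splitting, a keyword's query count is the number of records that contain it, and one index level is created per distinct count value. The record fields (`id`, `text`, `location`) are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Build a count table, a level-by-level index and a location map."""
    # 6-1) Extract keywords and collect the records each keyword occurs in.
    postings = defaultdict(set)
    for rec in records:
        for keyword in set(rec["text"].lower().split()):
            postings[keyword].add(rec["id"])

    # 6-2) Count table: keywords sorted by count, smallest first
    # (ties broken alphabetically so the order is deterministic).
    count_table = sorted(
        ((kw, len(ids)) for kw, ids in postings.items()),
        key=lambda pair: (pair[1], pair[0]),
    )

    # 6-3) One index level per distinct count; each level maps
    # keyword -> ids of the data objects that contain it.
    levels = []
    for keyword, count in count_table:
        if not levels or levels[-1]["count"] != count:
            levels.append({"count": count, "entries": {}})
        levels[-1]["entries"][keyword] = sorted(postings[keyword])

    # 6-4) Mapping from data object id to its storage location.
    locations = {rec["id"]: rec["location"] for rec in records}
    return count_table, levels, locations


# Hypothetical records stored on two slave servers.
records = [
    {"id": 1, "text": "pump pressure alarm", "location": "slave-1/db_a"},
    {"id": 2, "text": "pump temperature", "location": "slave-2/db_b"},
    {"id": 3, "text": "pump pressure", "location": "slave-1/db_a"},
]
count_table, levels, locations = build_index(records)
```

With this layout, rare keywords sit in the top levels, so a lookup that starts from the rarest keyword of a query touches the smallest posting lists first.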
Further, the level-by-level lookup of keyword locations in the heterogeneous index table in step 8) proceeds as follows:
8-1) Map the user terminal's query request onto the keyword library of the index, so that the original query is mapped to a target query;
8-2) Sort the keywords in the query by their counts in the count table;
8-3) Read the keywords in the query one by one in ascending order of count and search the index table level by level from top to bottom to find the matching keywords.
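Steps 8-1) to 8-3) might look like the following Python sketch. The `vocabulary` dictionary that maps raw query terms onto index keywords, the representation of the index as a list of per-level dictionaries, and the decision to silently skip unknown terms are all assumptions for illustration.

```python
def lookup(query_terms, vocabulary, counts, levels):
    """Return (keyword, object ids) pairs in the order the keywords are read."""
    # 8-1) Map the original query onto the index's keyword library.
    mapped = [vocabulary.get(term.lower(), term.lower()) for term in query_terms]
    # Drop duplicates and terms the index does not know (assumption).
    known = [kw for kw in dict.fromkeys(mapped) if kw in counts]

    # 8-2) Order the query keywords by their count in the count table.
    known.sort(key=lambda kw: counts[kw])

    # 8-3) Read keywords from smallest count upwards and scan the
    # index levels from top to bottom until the keyword is found.
    hits = []
    for keyword in known:
        for level in levels:
            if keyword in level:
                hits.append((keyword, level[keyword]))
                break
    return hits


# Hypothetical index: levels are ordered from smallest to largest count.
counts = {"alarm": 1, "pressure": 2, "pump": 3}
levels = [{"alarm": [1]}, {"pressure": [1, 3]}, {"pump": [1, 2, 3]}]
vocabulary = {"pumps": "pump"}

hits = lookup(["Pumps", "alarm", "valve"], vocabulary, counts, levels)
```

The rarest keyword (`alarm`) is resolved first, and the unknown term `valve` is ignored.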
Further, in step 10) the required data are aggregated and returned to the user terminal as follows:
The required data are extracted from the corresponding data sets according to the data mapping relations and aggregated; the extracted data are converted into the required data format and returned to the user terminal.
Further, the numbering of data classes in the cache in step 1) proceeds as follows:
1-1) Preprocess the collected raw industrial system data, that is, split the raw working-condition data, verify data validity, extract the logical associations between different data, and convert data formats;
1-2) Store the preprocessed working-condition data in the cache;
1-3) Number the data classes in the cache.
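As a rough illustration of steps 1-1) to 1-3), the Python sketch below assumes a hypothetical raw line format `device;name=value;...`, treats a non-numeric value as invalid, keeps the device name on every record as the logical association between measurements, and numbers data classes in order of first appearance. None of these details come from the patent.

```python
def preprocess(raw_lines):
    """1-1) Split raw lines, validate values and convert them to floats."""
    records = []
    for line in raw_lines:
        device, *fields = line.strip().split(";")
        for field in fields:
            name, sep, value = field.partition("=")
            if not sep:
                continue  # malformed field: no "name=value" pair
            try:
                number = float(value)  # format conversion
            except ValueError:
                continue  # validity check failed: drop the field
            records.append({"device": device, "class": name, "value": number})
    return records


def cache_and_number(records):
    """1-2) and 1-3): store records in a cache keyed by data class number."""
    class_numbers = {}
    cache = {}
    for rec in records:
        name = rec["class"]
        if name not in class_numbers:
            class_numbers[name] = len(class_numbers) + 1
        cache.setdefault(class_numbers[name], []).append(rec)
    return cache, class_numbers


# Hypothetical raw working-condition data.
raw = ["pump01;pressure=3.2;temperature=41", "pump02;pressure=bad;flow=7"]
cache, class_numbers = cache_and_number(preprocess(raw))
```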
Further, the de-redundancy processing of the working-condition data in the cache in step 2) proceeds as follows:
2-1) Using preset data priorities, filter out the non-critical information in the working-condition data and discard it;
2-2) Extract the repeated common information of the working-condition data;
2-3) Compress the working-condition data with a lossless compression algorithm.
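The three de-redundancy steps can be sketched in Python as follows. The patent names no concrete algorithm, so the per-field priority table, the JSON layout and the use of zlib (DEFLATE) as the lossless compressor are assumptions chosen for illustration.

```python
import json
import zlib


def deduplicate(records, priority, min_priority):
    """Filter, factor out common fields, then compress losslessly."""
    # 2-1) Discard fields whose preset priority is below the cut-off.
    kept = [
        {k: v for k, v in rec.items() if priority.get(k, 0) >= min_priority}
        for rec in records
    ]

    # 2-2) Extract fields that repeat with the same value in every record.
    common = {}
    if kept:
        for key, value in kept[0].items():
            if all(key in rec and rec[key] == value for rec in kept):
                common[key] = value
    rows = [{k: v for k, v in rec.items() if k not in common} for rec in kept]

    # 2-3) Lossless compression of the remaining payload.
    payload = json.dumps({"common": common, "rows": rows}, sort_keys=True)
    return zlib.compress(payload.encode("utf-8"))


def restore(blob):
    """Inverse of the compression and common-field extraction."""
    doc = json.loads(zlib.decompress(blob).decode("utf-8"))
    return [{**doc["common"], **row} for row in doc["rows"]]


# Hypothetical working-condition records with a low-priority debug field.
records = [
    {"plant": "A", "unit": "bar", "value": 3.2, "debug": "x"},
    {"plant": "A", "unit": "bar", "value": 3.4, "debug": "y"},
]
priority = {"plant": 2, "unit": 2, "value": 3, "debug": 0}
blob = deduplicate(records, priority, min_priority=1)
```

Only step 2-1) loses information, and it does so deliberately; `restore(blob)` recovers every field that survived the priority filter.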
Further, in step 5) the slave servers store data according to data temperature, and several classes of data may map to the same node. The storage space of a data node is divided by temperature into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity. When fresh data are updated, the first tier holds the data most recently updated or most frequently accessed within a certain number of minutes, the second tier holds the data most recently updated or most frequently accessed within a certain number of days, and the third tier holds the data updated or most frequently accessed within a pre-arranged time period. The data temperature is determined by the access frequency and access time of the industrial process working-condition data.
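A minimal Python sketch of temperature-based tiering is given below. The window lengths (10 minutes and 7 days), the capacity of the first tier, and the rule that the most frequently accessed items claim the fast tier first are illustrative assumptions; the patent only states that temperature depends on access frequency and access time.

```python
def assign_tiers(items, now, hot_window=600, warm_window=7 * 86400,
                 hot_capacity=2):
    """Place each item in tier 1 (fastest, smallest), 2 or 3 (largest)."""
    tiers = {1: [], 2: [], 3: []}
    # Most frequently accessed items are considered first, so they get
    # the limited space of the fastest tier (assumed policy).
    for item in sorted(items, key=lambda it: -it["hits"]):
        age = now - item["last_access"]  # seconds since last update/access
        if age <= hot_window and len(tiers[1]) < hot_capacity:
            tiers[1].append(item["key"])
        elif age <= warm_window:
            tiers[2].append(item["key"])
        else:
            tiers[3].append(item["key"])
    return tiers


# Hypothetical items with last-access timestamps (seconds) and hit counts.
now = 1_000_000
items = [
    {"key": "a", "last_access": now - 60, "hits": 50},
    {"key": "b", "last_access": now - 120, "hits": 10},
    {"key": "c", "last_access": now - 300, "hits": 30},
    {"key": "d", "last_access": now - 3 * 86400, "hits": 5},
    {"key": "e", "last_access": now - 30 * 86400, "hits": 100},
]
tiers = assign_tiers(items, now)
```

Item `e` has the most hits overall but was last touched a month ago, so it falls to the third tier; item `b` is recent but is displaced from the first tier by the more frequently accessed `a` and `c`.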
By adopting the above technical solution, the invention has the following advantages:
The distributed storage system of the invention disperses data across multiple independent devices. It uses a scalable system architecture in which multiple storage servers share the storage load, which improves the reliability, availability and access efficiency of the system and makes it easy to extend. The real-time interactive processing method improves data processing efficiency and enables real-time processing. The keyword count sorting strategy saves data space and computation and shortens index construction time. Hierarchical data queries guided by query counts improve query efficiency. The memory database system that is built combines an in-memory database with an on-disk database efficiently: the disk database compensates for the shortcomings of the in-memory database and the two are linked to each other, which improves the real-time performance of the whole system and reduces its operating load.
Other advantages, objects and features of the invention will be set forth to some extent in the description that follows, and to some extent will be apparent to those skilled in the art upon examination of the following, or may be learned from practice of the invention. The objects and other advantages of the invention can be realized and attained through the following description and claims.
Description of the drawings
The drawings of the invention are described as follows.
Fig. 1 is an architecture diagram of the invention;
Fig. 2 is a schematic flow diagram of the distributed storage of the invention.
Specific embodiments
The invention is further described below with reference to the drawings and embodiments.
A method for fast distributed storage and interaction of heterogeneous data, with the following specific steps:
1) Split the heterogeneous data, store it in the data-center cache, and number the data classes in the cache;
2) Perform de-redundancy processing on the working-condition data in the cache;
3) Compute the proportion of each class of data in the total data volume, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of class $i$ and $S$ is the total data volume;
4) Set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values $n_1, n_2, \ldots, n_k$, each an integer greater than 0; the number of thresholds, the number of $n_i$, and their values are chosen according to actual needs;
5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, store $n_1$ kinds of data on the same slave server; if $P'_1 < P_i < P'_2$, store $n_2$ kinds of data on the same slave server; and so on; if $P_i > P'_n$, store this kind of data across $n_k$ slave servers;
6) Build a heterogeneous index table from the storage addresses of the distributed data;
The heterogeneous index table is built as follows:
6-1) Extract keywords from the new data set to obtain a keyword set;
6-2) Search the new data set for each keyword in the keyword set to obtain the keyword's query count;
6-3) Sort the keywords by query count in ascending order and label each keyword with its sequence number;
6-4) Build the upper-level nodes according to the keyword count order and build the index level by level to form the index table; each level of the index table contains its keywords and the corresponding data object information;
6-5) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
7) Receive the query request sent by the user terminal and extract keywords from the search target content;
8) Look up the location of the keywords level by level in the heterogeneous index table;
The location of the keywords is looked up as follows:
8-1) Map the query keywords from the user terminal onto the keyword library of the index, so that the original query is mapped to a target query;
8-2) Search the count table to obtain the count sequence number of each query keyword;
8-3) Read the keywords in the query one by one in ascending order of count sequence number and search the index table level by level from top to bottom to find the matching keywords.
9) Distribute the queried information to the corresponding data storage database according to the database configuration information, and extract the required data from that database;
10) Aggregate the required data extracted in step 9) and return it to the user terminal;
The specific steps are:
10-1) Aggregate the extracted data, encapsulate it with Extensible Markup Language (XML) as a document in a unified format, and return it to the user terminal;
10-2) The user terminal parses the document content and converts it into the required data format.
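Steps 10-1) and 10-2) can be sketched with Python's standard `xml.etree.ElementTree` module. The element names `result` and `record` and the sample row are hypothetical, and the sketch assumes that every field name is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def to_xml(rows):
    """10-1) Aggregate extracted rows into one XML document (server side)."""
    root = ET.Element("result")
    for row in rows:
        record = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(record, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(document):
    """10-2) Parse the document back into dictionaries (user terminal side).

    Values come back as strings; converting them into the format the
    terminal needs is left to the caller.
    """
    root = ET.fromstring(document)
    return [{child.tag: child.text for child in record}
            for record in root.findall("record")]


# Hypothetical rows extracted from two different source databases.
rows = [{"device": "pump01", "pressure": 3.2},
        {"device": "pump02", "pressure": 2.9}]
document = to_xml(rows)
parsed = from_xml(document)
# Terminal-side conversion into the required format, here floats.
pressures = [float(rec["pressure"]) for rec in parsed]
```

Because every source database is rendered into the same document shape, the user terminal needs only one parser regardless of where the data came from.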
The optimized query algorithm of the invention uses a keyword count sorting strategy, which shortens query time; splitting the heterogeneous data and storing it in a distributed manner improves data processing speed.
Finally, the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art will understand that the technical solution of the invention may be modified or equivalently replaced without departing from the purpose and scope of the technical solution, and all such modifications shall be covered by the claims of the invention.
Claims (7)
1. A method for fast distributed storage and interaction of heterogeneous data, characterized by the following specific steps:
1) Split the heterogeneous data, store it in the data-center cache, and number the data classes in the cache;
2) Perform de-redundancy processing on the working-condition data in the cache;
3) Compute the proportion of each class of data in the total data volume, $P_i = S_i / S$, with $\sum_i P_i = 1$, where $S_i$ is the data volume of class $i$ and $S$ is the total data volume;
4) Set a threshold vector $P'_1, P'_2, \ldots, P'_n$ with $0 < P'_1 < \cdots < P'_n < 1$, and set the values $n_1, n_2, \ldots, n_k$, each an integer greater than 0; the number of thresholds, the number of $n_i$, and their values are chosen according to actual needs;
5) Compare $P_i$ with $P'_1, P'_2, \ldots, P'_n$: if $P_i < P'_1$, store $n_1$ kinds of data on the same slave server; if $P'_1 < P_i < P'_2$, store $n_2$ kinds of data on the same slave server; and so on; if $P_i > P'_n$, store this kind of data across $n_k$ slave servers;
6) Build a heterogeneous index table from the storage addresses of the distributed data;
7) Receive the query request sent by the user terminal and extract keywords from the search target content;
8) Look up the location of the keywords level by level in the heterogeneous index table;
9) Distribute the queried information to the corresponding data storage database according to the database configuration information, and extract the required data from that database;
10) Aggregate the required data extracted in step 9) and return it to the user terminal.
2. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the heterogeneous index table in step 6) is built as follows:
6-1) Extract keywords from the new data set and preprocess them to obtain each keyword's query count in the data set;
6-2) Sort the keywords by query count in ascending order to form a count table;
6-3) Based on the count table, build the index level by level to form the index table; each level of the index table contains its keywords and the corresponding data object information;
6-4) Establish the mapping between the index table and the source databases, so that the location of the data can be obtained from the index information.
3. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the level-by-level lookup of keyword locations in the heterogeneous index table in step 8) proceeds as follows:
8-1) Map the user terminal's query request onto the keyword library of the index, so that the original query is mapped to a target query;
8-2) Sort the keywords in the query by their counts in the count table;
8-3) Read the keywords in the query one by one in ascending order of count and search the index table level by level from top to bottom to find the matching keywords.
4. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that in step 10) the required data are aggregated and returned to the user terminal as follows:
The required data are extracted from the corresponding data sets according to the data mapping relations and aggregated; the extracted data are converted into the required data format and returned to the user terminal.
5. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the numbering of data classes in the cache in step 1) proceeds as follows:
1-1) Preprocess the collected raw industrial system data, that is, split the raw working-condition data, verify data validity, extract the logical associations between different data, and convert data formats;
1-2) Store the preprocessed working-condition data in the cache;
1-3) Number the data classes in the cache.
6. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that the de-redundancy processing of the working-condition data in the cache in step 2) proceeds as follows:
2-1) Using preset data priorities, filter out the non-critical information in the working-condition data and discard it;
2-2) Extract the repeated common information of the working-condition data;
2-3) Compress the working-condition data with a lossless compression algorithm.
7. The method for fast distributed storage and interaction of heterogeneous data according to claim 1, characterized in that in step 5) the slave servers store data according to data temperature, and several classes of data may map to the same node; the storage space of a data node is divided by temperature into three tiers: high speed with small capacity, fast with medium capacity, and medium speed with large capacity; when fresh data are updated, the first tier holds the data most recently updated or most frequently accessed within a certain number of minutes, the second tier holds the data most recently updated or most frequently accessed within a certain number of days, and the third tier holds the data updated or most frequently accessed within a pre-arranged time period; the data temperature is determined by the access frequency and access time of the industrial process working-condition data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810399691.5A CN108647266A (en) | 2018-04-28 | 2018-04-28 | Method for fast distributed storage and interaction of heterogeneous data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810399691.5A CN108647266A (en) | 2018-04-28 | 2018-04-28 | Method for fast distributed storage and interaction of heterogeneous data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108647266A true CN108647266A (en) | 2018-10-12 |
Family
ID=63748529
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810399691.5A Withdrawn CN108647266A (en) | Method for fast distributed storage and interaction of heterogeneous data | 2018-04-28 | 2018-04-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647266A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492004A (en) * | 2018-10-29 | 2019-03-19 | 广东开放大学(广东理工职业学院) | A kind of number fishery isomeric data storage method, system and device |
CN111026721A (en) * | 2019-11-12 | 2020-04-17 | 上海麦克风文化传媒有限公司 | Temperature data storage method |
CN113254427A (en) * | 2021-07-15 | 2021-08-13 | 深圳市同富信息技术有限公司 | Database expansion method and device |
CN115934794A (en) * | 2022-11-30 | 2023-04-07 | 二十一世纪空间技术应用股份有限公司 | Elastic management method for mass multi-source heterogeneous remote sensing space data query |
CN116303833A (en) * | 2023-05-18 | 2023-06-23 | 联通沃音乐文化有限公司 | OLAP-based vectorized data hybrid storage method |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492004A (en) * | 2018-10-29 | 2019-03-19 | 广东开放大学(广东理工职业学院) | A kind of number fishery isomeric data storage method, system and device |
CN111026721A (en) * | 2019-11-12 | 2020-04-17 | 上海麦克风文化传媒有限公司 | Temperature data storage method |
CN113254427A (en) * | 2021-07-15 | 2021-08-13 | 深圳市同富信息技术有限公司 | Database expansion method and device |
CN115934794A (en) * | 2022-11-30 | 2023-04-07 | 二十一世纪空间技术应用股份有限公司 | Elastic management method for mass multi-source heterogeneous remote sensing space data query |
CN115934794B (en) * | 2022-11-30 | 2024-05-24 | 二十一世纪空间技术应用股份有限公司 | Elastic management method for massive multi-source heterogeneous remote sensing space data query |
CN116303833A (en) * | 2023-05-18 | 2023-06-23 | 联通沃音乐文化有限公司 | OLAP-based vectorized data hybrid storage method |
CN116303833B (en) * | 2023-05-18 | 2023-07-21 | 联通沃音乐文化有限公司 | OLAP-based vectorized data hybrid storage method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20181012 |