CN102890678A

CN102890678A - Gray-code-based distributed data layout method and query method

Info

Publication number: CN102890678A
Application number: CN2011102030050A
Authority: CN
Inventors: 周敏奇; 周傲英
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2011-07-20
Filing date: 2011-07-20
Publication date: 2013-01-23

Abstract

The invention belongs to the technical field of a database, and discloses a gray-code-based distributed data layout method. The method comprises the steps of: dividing a range of each attribute into a plurality of equal portions; encoding according to a gray code order; marking a certain attribute value of a tuple comprising a plurality of attributes through the gray codes of the equal portions of the attribute value, namely an index code of the attribute value; forming an index key value of the tuple by mixing the index code of each attribute value in the tuple, wherein the tuple achieves the distributed data layout according to the order of the gray codes, the distributed data layout is deployed on a distributed system, the bitmap index of content perception is achieved at a host computer terminal of the system and the content perception is stored in a file name, and physical storage of data and statistical index of the data are achieved on a slave terminal of the system. The invention also discloses a query method employing the database formed by means of the method. The data layout obtained by the method can meet the requirements of data processing such as exact matching search, range search, multi-dimensional search, multi-attribute search and aggregated analysis, and the method is high in disc access efficiency.

Description

A kind of distributed data layout method and querying method based on Gray code

Technical field

The invention belongs to database technical field, be specifically related to a kind of efficient distributed data layout method and querying method.

Background technology

Trace back to 20 century 70s, succeeding in developing of IBM System R system and the Ingres of Berkeley University system, proved the superiority of relational database system processing business data.The eighties subsequently, the derivative IBM DB2 of model thus, Sybase SQL Server, Oracle Database, INFORMIX-SQL etc. be take issued transaction (OLTP) and are the flourish of main Database Systems, make Database Systems obtain abundant commercialization, and formed huge marketable value.Again to the nineties, the integration historical data that W.H.Inmon proposes, realize the data warehouse of the commercial intelligence services such as commercial planning, decision support by methods such as on-line analysis (OLAP), data minings, for brand-new chapter has been opened up in the application of Database Systems, and promote the development of word management, Data Stream Processing etc.According to IDC, investigate, through the development of more than ten years, the market value of global commerce intellectual analysis in 2008 has reached 77.84 hundred million dollars, accounts for 38% of global metadata base management system market value 204.79 hundred million dollars [2], and keeping annual growth more than 10.6%, showing huge development potentiality.Yet this Database Systems framework that is close to 30 years history, one size fits all (one size fits all) seems that in face of current demand haltingly data warehouse, word management, Data Stream Processing etc. are just being found new solution.Cloud computing system---representing that mass data analysis process system of new generation shows up prominently, and Google, the success of the companies such as Amazon is feasibility, the validity of this type systematic of tentative confirmation.

Trace to its source, why the relational database system of one size fits all can't be competent at the demand that current data is analyzed, and is following three aspects:

1) extensibility of existing relational database system is poor.Between 30 years, relational database system has been done much research and trial aspect extensibility.Data base machine (as IBM System/38) from centralized, the client/server Database Systems (as Micro DBMS) towards large scale computer to the data base-oriented system is again to parallel database system, comprise shared drive system (as Volcano), shared disc system (as Oracle), without shared system (as Teradata, Oracle-nCUBE), its parallel processing capability, extensibility improve.Current, the parallel database system of the out to out that can see is the DB2 Extended Enterprise Edition that can support 512 nodes.Yet such yardstick expanded is difficult to adapt to the needs that current data is analyzed: (a) the exponential growth of pending data volume.Data analysis application is based on extensive historical data, comprises the data from a plurality of departments, a plurality of operating databases.For example, the data warehouse of eBay has that 5 PB data, MalMart have the 2.5PB data, Bank of America-National Trust & Savings Association has the 1.5PB data, and these data are according to one times of the every 12-18 of a Moore's Law monthly increment.(b) impromptu property (ad hoc) and the complicacy of inquiry, analysis task.Current business intelligence data analysis (as Client Attrition of Credit Card, Insurance Fraud behavioural analysis) usually adopts on-line analysis to process the high-level data query analysis technology such as (OLAP), data mining, enterprise performance management, forecast analysis, word excavation and realizes decision support function, and its query analysis task is very complicated.Visible, mass data storage and complex query analyzing and processing have been brought great challenge to the existing database system.

2) each inter-module imbalance of development of computing machine.Table 1 has provided the state of development of nearest 30 years each inter-modules of computing machine.Known according to table 1, between 30 years, the addressing speed of disk is only reduced to 5ms by 50ms, and bandwidth has also only improved 150 times, far lags behind the speed of development of other assembly.Jim Gray claims the disk of a 20TB of random access to need the time of 1 year in paper, and sequential access can improve 500 times of speed, and disk (randow addressing) has progressively deteriorated to tape (sequential addressing).Therefore to optimize random access (as adopted B-Tree, cache etc.), be the development that main relational database system progressively is not suitable with current computer hardware in early days, mass data query analysis processing aspect seems particularly outstanding.Mike Strongbraker also claims thus and revise this system architecture.

The state of development of each assembly of table 1. computing machine

3) optimization aim difference.The main transaction optimizing of relational database system is processed (being OLTP) and is served that to take merchandising and service be main business activity.This type systematic adopts usually by row (row-wise, n-aray storage model) storage mode, and---record of Coutinuous store---inserts, upgrades, revises to realize efficient data---writes optimization; And improve treatment effeciency by strategies such as B-tree index, multithreadings; And the concurrent control based on lock, the Restoration Mechanism based on record are to guarantee the ACID characteristic of database.Business intelligence system (data warehouse) needs optimization data analyzing and processing (being OLAP) and serves that to take MRP, decision support be main mass data query analysis task.This type systematic adopts star, snowflake schema store historical data to analyze (once deposit in repeatedly and read) usually---reads to optimize; Improve complicated, impromptu query analysis ability by bitmap index, snapshot data, view; And the requirement (as weak consistency, final consistency) of reduction data ACID, to improve the concurrency of data query analyzing and processing.

The present invention has overcome single index in above-mentioned prior art can't support that multiattribute inquiry, multiattribute continuous data physical store are too disperseed, the defect of distributed storage poor expandability, a kind of distributed data layout method and querying method based on Gray code proposed, the present invention is a kind of data layout method of efficient and Highly Scalable, effectively keep the locality of multidimensional data and, in conjunction with the master/slave structure of distributed file system, realize the enhanced scalability of data storage by Gray code.The inventive method obtains data layout can meet the demand that the data such as exact match is searched, scope is searched, multi-Dimensional Range is searched, multiattribute is searched, polymerization analysis are processed, and has higher disk access efficiency.

Summary of the invention

The present invention proposes a kind of distributed data layout method based on Gray code, comprise the steps:

The first step, the codomain scope of each attribute is divided into to a plurality of equal portions, each equal portions is according to the Gray code sequential encoding, including thus the Gray code that a certain property value of the tuple of a plurality of described attributes can be worth the described equal portions of institute's subordinate by this is identified, the index codes that is called this property value, other property value of this tuple can obtain index codes by identical method, finally by the index codes of shuffling each property value in this tuple and then an index key assignments that forms this tuple;

Second step, described tuple can realize the distributed data layout according to the order of described Gray code, described distributed data layout is deployed in distributed system, realize the ratio index of the picture of perception of content in the host side of described system, and, with the filename storage, realize the statistical index of physical store and the data of data at the slave end of described system.

Wherein, described method also comprises the 3rd step, when take in the Single document that the ratio index of the picture of perception of content is filename the tuple of storing while reaching the capacity that described file sets, described file is split into two files, wherein, the ratio index of the picture of the perception of content of original is split into the ratio index of the picture of two sub-perception of content, and distributes to respectively two files after division.

Wherein, described distributed system comprises HDFS.

Wherein, the ratio index of the picture of described perception of content is incorporated to the metadata management node of described distributed file system.

Wherein, described statistical index distributed storage is in back end.

The invention allows for a kind of method that the database that utilizes above-mentioned method to form is inquired about, comprise the following steps:

Calculate the index codes that meets predetermined inquiry constraint condition by Karnaugh map;

By the method for shuffling with identical in claim 1-5, described index codes is shuffled into to the index key assignments;

Fetch all tuples with described index key assignments; And

The described tuple of fetching is filtered to remove unwanted tuple.

Wherein, described inquiry comprises exact match, partial match query.

Wherein, described inquiry comprises multi-Dimensional Range inquiry and multiattribute range query.

Wherein, the restrictive condition in described inquiry can provide for the combination of any attribute item in data tuple or any attribute item, and described restrictive condition is the scope restriction.

Wherein, by ratio index of the picture and the statistical index of described perception of content, realize described approximate aggregate query, by accessing concrete tuple, realize aggregate query.

The present invention proposes a kind of distributed data layout method, this data layout method is introduced Gray code, and data are according to Gray code sequential placement, and the data access expense in the time of can effectively reducing range query, because significantly improved the efficiency of data sequential access.

The present invention is comprehensively in conjunction with the master/slave framework that has the distributed system of increasing income (Hadoop) now, form with filename in the main frame (being called Namenode) of storage data metadata (meta data) is stored the Gray code index, the content of stored data in this document can effectively be indicated in this index, in slave (being called Datanode) with the data in Gray code order store files.Sequential access rate when these data can effectively improve the data area access.

Below explanation some concept and definition relevant with the present invention.

karnaugh map:karnaugh map is a kind of diagrammatic representation of logical function.The Karnaugh map of a logical function is exactly that each minterm in this minimum of a function item expression formula is correspondingly inserted in a graticule, and this graticule is called Karnaugh map.By the ultimate principle of Karnaugh map abbreviation logical function, exactly above-mentioned logic basis and graphic feature are combined, merged by the adjacent lattice " circle " characterizing Adjacentminterm on Karnaugh map together, reached the purpose that replaces some minterms with a simple term “and”.

the distributed system of increasing income:the Hadoop system is a distributed system architecture, by the Apache foundation, is developed.The user can be in the situation that do not understand distributed bottom details, the exploitation distributed program.Take full advantage of cluster high-speed computation and mass memory.Hadoop has realized a distributed file system (Hadoop Distributed File System), is called for short HDFS.HDFS has the characteristics of high fault tolerance, and design is used for being deployed on cheap hardware.And its data of providing high transmission rates to visit application program, be applicable to processing the super large data set.HDFS has relaxed the requirement of POSIX with the data in the form access file system that realizes stream.

HDFS :for external client, HDFS is just as a traditional hierarchical file system.Can create, delete, move or Rename file, etc.But the framework of HDFS is based on one group of specific node and builds, this is to be determined by it self characteristics.These nodes comprise NameNode, and it provides Metadata Service in HDFS inside; DataNode, it provides the data physical store for HDFS.Owing to only there being a NameNode, so this is the problem that there is single point failure in HDFS.The size of piece (being generally 64MB) and the number of blocks copied are determined by client computer when creating file.NameNode can control the All Files operation.All communications of HDFS inside are measured TCP/IP agreement all.

NameNode :nameNode is the host node in the HDFS file system.It is in charge of file system title space and controls the access of external client.Whether NameNode determines File Mapping on the copy block on DataNode.For modal 3 copy block, first copy block is stored on the different nodes of same frame, and last copy block is stored on certain node of different frames.Note, need you to understand aggregated structure here.

Actual I/O affairs, not through NameNode, only have the metadata of the File Mapping that means DataNode and piece through NameNode.When external client sends request while require creating file, NameNode can be usingd the DataNode IP address of first copy of block identification and this piece as response.This NameNode also can notify other will receive the DataNode of the copy of this piece.

NameNode in a file that is called FsImage storage all about the information in file system title space.This file and a log file that comprises all affairs (being EditLog here) will be stored on the local file system of NameNode.FsImage and EditLog file also need reproduction replica, in case file corruption or NameNode system loss.

DataNode :dataNode is also the software moved on a common independent machine in the HDFS example.The Hadoop cluster comprises a NameNode and a large amount of DataNode.DataNode is usually with the form tissue of frame, and frame couples together all systems by a switch.The hypothesis of Hadoop is: the transmission speed between the frame internal node is faster than the transmission speed of frame intermediate node.

The DataNode response is from the read-write requests of HDFS client computer.They are gone back response creation, delete and copy the order from the piece of NameNode.NameNode relies on regular heartbeat (heartbeat) message from each DataNode.Every message all comprises a piece report, and NameNode can be according to this report checking piece mapping and alternative document system metadata.If DataNode can not send heartbeat message, NameNode will take reclamation activities, again be replicated in the piece of losing on this node.

gray code:gray code (Gray code), make again cyclic binary code or reflected binary code can only identify 0 and 1 in digital display circuit, various data will be converted to binary code and just can be processed, Gray code is a kind of non-weighted code, adopt absolute addressing mode, typical case Gray code be a kind of single step self-complementing code with reflection characteristic and cycle characteristics, the possibility of significant error appears in its circulation, single step characteristic while having eliminated direct access, its reflection, self-complementary characteristic make negate very convenient.Gray code belongs to Reliability codes, is the minimized coded system of a kind of mistake.

In the present invention, the Gray code data layout has following characteristics:

Any two have a potential difference not the corresponding tuple of first group coding of (one-bit difference) the Numeric Attributes value or belong to same section, or belong to adjacent segment.

First group coding with same prefix, establish the figure place corresponding to same Numeric Attributes in their not identical suffix and mostly be k most, and the Numeric Attributes value of their corresponding tuples differs at most 2k section.

Be generated as mapping one by one based on shuffling the join index key assignments.

Index key assignments based on Gray code has extensibility.Expanding here refer to when all index key assignments have p-bit to expand to (p+1)-bit, and wherein the p-bit prefix is identical with previous p-bit index key assignments, and thus, during the division of index key assignments, data are without movement.

Gray code is compared with binary coding, and unified range query can reduce at most general random disk access.

The accompanying drawing explanation

The schematic diagram that Fig. 1 is data storage shelves structure in the inventive method.

Fig. 2 is the schematic diagram that in the inventive method, codomain is cut apart.

Fig. 3 is the schematic diagram that in the inventive method, the index key assignments is cut apart.

Fig. 4 is that in the inventive method, order reads the chart of raising.

Embodiment

Embodiment 1: overview

Tables of data be projected into according to demand one group of projection , wherein each projection is comprised of one group of attribute,

.Without loss of generality, we are from the details projection data layout, and suppose that this projection is by set of properties form wherein 0 <<,.

In order to adapt to current popular distributed file system framework (as HDFS etc.), our data layout has adopted client/server equally, as shown in Figure 1.Main frame (being Namenode) stores the perception of content index (content-aware index) of indicating by Gray code, can directly inquiry request be navigated to specific data block according to this index thus, and then realizes efficient inquiry.Each tables of data burst after slave (being Datanodes) is cut apart by the indexed sequential storage in distributed mode.Each burst has head /load data two parts content.Store the statistical information of data recording in this burst for head; And the data recording for depositing according to indexed sequential of storing in load data.

Embodiment 2: the bitmap index of perception of content

The perception of content bitmap index proposed in the present invention has double action: the first, be used to indicate the content (that is, the effect of bitmap index) that multidimensional data records; The second, be used to indicate data and be recorded in the position in the projection of corresponding data table.Basic, this index is created according to the content recorded.Below introduce according to the method that records the content creating index.

The coding of Numeric Attributes value:

Native system is supported all kinds of Numeric Attributes, comprises the segmentation Numeric Attributes of the serial number type attribute of known codomain scope, known codomain scope and the Numeric Attributes of unknown codomain scope.

Serial number type attribute for known codomain scope, evenly be divided into some sections by the codomain of attribute, and the power that is 2 of the hop count after dividing, and its size determines (introducing in the time of will drawing multiattribute data in place of matchmakers) by system.For example, for codomain, be [1 ,100] attribute, when being divided into 4 sections, its result is ?<[1 ,25] ,(25 ,50] ,(50 ,75] ,(75 ,100] }.For the segmentation Numeric Attributes of known codomain scope, each sectionally smooth join of attribute item codomain is got up, then spliced continuum evenly is divided into to some sections, and the power that is 2 of the hop count after dividing.For example, for codomain, be [1 ,4] ∪[5 ,17] ∪[31 ,36] attribute, when being divided into 4 sections, its result is ?<[1 ,4] [5 ,7] ,(7 ,12] ,(12 ,17] ,[31 ,36] }.

For the Numeric Attributes of unknown codomain scope, use formula 1 by the codomain of any range be mapped to a specific scope [ ?Π/2 ,+ Π/2].Simultaneously, formula 1 can make the homogenising more distributed on the codomain of data after mapping.

( )?=?arctan( × ) (1)

In formula 1, ?original property value, ?an amplifying parameters, and ( ?) be the smooth value (smooth) after mapping. ?according to the prior knowledge of property distribution, set.We know, formula 1 is scope [0 ,1] be mapped to [0 ,Π/4], and scope [1 ,+ ∞] be mapped to [Π/4 ,Π/2].Arrange ?=1/ ?, wherein ?be the mid point of codomain scope, what usually in the codomain scope after data-mapping, distribute is more even.If without any the prior knowledge distributed about codomain, arrange ?=1.After completing the codomain range mappings, by scope [ ?Π/2 ,+ Π/2] carry out decile, and the hop count of the dividing power that is 2.For example, incite somebody to action [ ?Π/2 ,+ Π/2] be divided into 4 sections, result is ?<[ ?Π/2 ,Π/4] ,( ?Π/4 ,0] ,(0 ,+ Π/4] ,(+Π/4 ,+ Π/2] }.

After the Numeric Attributes codomain is divided into to 2 a power section, after each section number according to the order of sequence, use Gray code (Gray Code) is encoded, as shown in Fig. 2.

Calculate Gray code corresponding to tuple Numeric Attributes and can be divided into two steps, the first, calculate this tuple attributes value ?the binary number of the section of falling into , in Fig. 2 at attribute

on binary coding corresponding to the section of falling into be 001; The second, convert this binary coding to Gray code by algorithm 1, in Fig. 2

at attribute

gray code be 001.

The coding of text genotype property value:

For text genotype property value, it is encoded to the result of value being carried out to the rear delivery of Hash (hash), = ?( ?) 2 ^m.In realization, native system is used SHA-1 as hash function.Obviously, the coding of same alike result value is identical, thereby can be mapped on identical overlay network node.

The index multiattribute data:

During multiattribute tuple generating indexes key assignments, when being used how many positions to be coded in system initialization, each attribute item sets.If do not set, each attribute is divided equally ?bit-identify.A given tuple, the method for introducing according to a upper trifle, calculate its Gray code to each property value respectively.Native system adopts and shuffles coding (shuffle-based), and the coding of each attribute is broken up rear generating indexes key assignments.When shuffling, the coding of each attribute is spaced successively, with first of first another attribute coding of splicing of the coding of an attribute, the like until all codings have all been rearranged.For example, record

include three attributes, be respectively , and on these three attributes, the Gray code of the correspondence of analog value is respectively " 011 ", " 010 " and " 101 ".Shuffle rear record

corresponding index is " 001110101 ", i.e. attribute

one of index codes splicing attribute

one of the code of index, more then attribute

one of index codes, by that analogy.

Embodiment 3: index construct

Effective storage of data in the present invention has been described in the present embodiment, by minimum extra storage expense, has realized the perception of content index.

The distributed file system of current popular and generally use (as, HDFS, GFS etc.) all there is master/slave framework, because this framework can effectively be simplified the design of distributed file system.In host side, store metadata (that is, filename, file storage location, file size, backup etc.), and, at the slave end, store load data (that is the data that, truly need storage).Master/slave framework based on above-mentioned distributed file system, native system has proposed the deployment of two layer indexs, and wherein one deck is for the project content index, is called the projection layer secondary index; And one deck is for the burst content indexing in addition, be called the sliced layer secondary index.

As what mention in embodiment 1, arbitrary projection be divided into a component sheets by level, wherein each fragment Coutinuous store is in one group of consecutive data block of allocating in advance.Storage architecture based on distributed file system, can be stored in host side with the form of filename by the projection layer secondary index, not only can realize the indication to content in burst thus, and this index is without any extra storage overhead simultaneously.Due to the data in burst or projection according to Gray code sequential storage (that is, in burst or projection the index of data form one section continuous Gray code scope).Therefore, by this Gray code scope initial-stop value can effectively mean the data of storing in this burst or projection.For this reason, native system is used the projection title to add the file name of the initial Gray code of this scope as this burst or projection.As shown in Figure 1, the filename of burst (as ?1) as metadata store in Namenode, this document name comprises that projection id( ) and scope initial-stop Gray code (being 0000-0111).

For the load (as storage overhead, access expense etc.) that effectively reduces Namenode, native system is stored in the index of burst level in Datenode with distributed way, in order to safeguard the statistical information of data in corresponding burst.The sliced layer secondary index is stored in the head of each burst, and comprises two parts content.First is the index key assignments, stores the corresponding index key assignments recorded in burst; Second portion be the respective index key-value pair answer record quantity.Thus, this index information can obtain two parts major function, and the first is used to indicate the quantity that records had in order to the index key assignments, and it two is the positions in burst that are used to indicate each index key assignments corresponding record.In Fig. 1 ?head in 1 are example.Second assembly (0001,1) means corresponding to the index key assignments " 0001 " record is arranged, and from the Sub_clause 11 record of this burst.

Distributed index with centralized/distributed structure/architecture has more advantage, as: 1) by the metadata of having stored, store the index key assignments of foregoing perception in Namenode, and then avoided extra index stores expense; 2) this index key assignments has succinct centralized architecture, and then has simplified the process of data recording retrieval; 3) this index framework has extensibility preferably, because jumbo index key Data-Statistics information-distribution type is stored on Datanode.The data layout strategy of burst inside below is discussed.

Embodiment 4: data layout

Provided the details of index creation, maintenance algorithm in above-described embodiment, the present embodiment describes data recording in detail and how to insert in corresponding burst.

Without loss of generality, below with record

insert projection for example describes data layout in detail, for more representative, suppose projection herein include one or more burst, and physical store in distributed file system (as, GFS, HDFS etc.).After having generated corresponding index key assignments, this record can find the burst be inserted into according to corresponding index key assignments.TupleInsert(in algorithm 2) function has provided detail.The record that specifically includes three levels in this function inserts (being ProjectInsert(), SegmentInsert() and BlockInsert()).SegmentInsert() function is responsible for finding the target burst, require the corresponding index key assignments that records of storing in this target burst to comprise the index key assignments that is inserted into record, and call SegmentCheck() method, whether need division (split) in order to detect this burst.Concrete burst is cut apart details and is referred to the 5th joint.BlockInsert() method is found target data block and is adjusted the index of corresponding burst level.Finally, be recorded in target data block and find corresponding position to be inserted, and physical store is in corresponding burst.Can further improve the performance that above-mentioned wall scroll records insertion algorithm, because record plug-in type at every turn, all need a disk randow addressing.Native system supports batch records to insert equally, and can increase substantially performance.Before carrying out the batch records insertion, be inserted in the internal memory that at first record be pre-stored in corresponding Datanode, and sorted according to their the index key assignments of correspondence.For example, to be recorded

corresponding index key assignments is respectively " 0110 ", " 1000 " and " 0001 ", do that three be inserted into be recorded in corresponding Datatnode with

after sequence, be stored in internal memory.The record of storing in internal memory can be realized by MERGING/SORTING ALGORITHM (merge-sort) insertion of batch records after reaching certain quantity.

Embodiment 5: the burst splitting-up method

Stored when full when the data block of distributing for certain burst, after needing this burst is split into to two bursts, recorded insertion.Specifically, can be divided into two class divisions, i.e. data block division and the division of index key assignments.

If it is full to be inserted into the data block that the burst of record stores, the multipair index range of key values of answering of this burst includes more than one index key assignments simultaneously, so executing data block splitting (being the SegmentCheck(in algorithm 2) method).For the division of concrete data block, at first find in middle position (as the BlockSplit() method of index range of key values corresponding to data block to be divided '), this index key assignments is that corresponding index key assignments is recorded at the data block middle part.Data block can be split into to two according to this index key assignments, one of them corresponding index range of key values is [., '], another be ['+1 .], and record move in corresponding data block.

If the data block that burst to be divided distributes is full, the corresponding index range of key values of this burst only includes an index key assignments simultaneously, will implement so the division of index key assignments.Owing to need to the scope that only includes an index key assignments being divided, so at first need this index range is enlarged (herein for double).The index key value generation method used according to native system at first needed the record attribute codomain is cut apart again before division, was divided into 2 times of original interval, finally will cause the index key assignments regenerated to increase by one with originally comparing.Provide the object lesson of index codes division in Fig. 3 (a).When the index key assignments divides, shuffle order according to the index key assignments, suppose attribute

being selected as field of definition cuts apart again.So for recording 2 at attribute

on the index key assignments split into 00 from 0, and then cause the index key assignments to split into 010 by 01, as shown in the figure.Similar, all records in burst 2 need to regenerate the index key assignments.After completing the index key assignments and regenerating, this burst just can split into two bursts as the data block division.It should be noted that: also can change the burst that record is stored after the index key assignments division based on Gray code, after division, record 1,2 and still be stored in distribution 1, be IP fragmentation and reassembly as shown in Figure 3 (b) herein.Index key assignments based on Gray code has extensibility preferably, and the data that need to move during the file division are less.

Embodiment 6: data manipulation

The data manipulation that in the present invention, system is supported comprises that exact match is searched, part matched and searched, scope are searched, the part matching range is searched, approximate polymerization is searched, polymerization is searched etc.

The concrete data for projection of below take is set forth all kinds of query processings as example, and in this projection, every record includes two Numeric Attributes (1,2), and codomain is successive range for [0,10].And continuous codomain normalization is arrived to scope [0,1] generating indexes key assignments afterwards.

Exact match, part matched and searched:

Here to search be that the inquiry packet submitted to is containing the constraint condition to each property value exact match to exact match.For example, can submit to following exact match to search, 1: 1=4,2=8.And the part matched and searched refers in submit Query the definite matching constraint condition comprised the part property value.For example, 2: 1=4,2=.Carrying out this type of searches, at first the identical method (participating in the 4.3rd) of take is each attribute value generation index codes, 1: 1=01, and 2=10 and 2: 1=01,2=, wherein mean on corresponding position (bit) that can be both 1 can be also 0.Then, fetch all tuples with above-mentioned index key assignments, 1=0110 and 2=0 1.Finally, also need the tuple to fetching to do the later stage filtration, because include a part of unwanted result in the result of returning.

Multi-Dimensional Range inquiry and multiattribute range query:

The definition of multi-Dimensional Range inquiry and multiattribute range query is similar to the definition of exact match inquiry and partial match query, difference only is that the restrictive condition of inquiry is range constraint, for example, can be as the multi-Dimensional Range inquiry 1: 1≤1<5 of giving a definition, 6<2≤10, multiattribute range query 2: 1≤1<5,2=.Then at first the process of execution inquiry also comparing class seemingly, done the later stage according to the corresponding record of index key assignments and filtered.For example, calculate gained 1=01 and 2=0.

Aggregate query and approximate aggregate query:

System in the present invention support the operation of some common aggregate queries (as,,,,, etc.), approximate aggregate query refers to the approximation that efficiently provides aggregate query here.Here, approximate aggregate query is mainly realized by the perception of content index.The inquiry of below take describes as example.For this aggregate query, at first calculate the index key assignments that meets restrictive condition.According to this index key assignments and recording frequency corresponding to this index key assignments, then the corresponding property value of index key assignments and recording frequency are multiplied each other, can obtain the result that approximate polymerization is searched.After needing to access all records for the result of definite aggregate query, just can provide.

Embodiment 7: analyze

The data processing performance of the described analysis of the present embodiment based on above-mentioned data layout.

Sequential access:

In foreword, point out, when front disk addressing time and Disk bandwidth have become the bottleneck of mass data access.If but can decrease disk addressing number of times during data access, the efficiency of data access can be greatly enhanced so.Simultaneously, range query is addressing operation commonly used while processing mass data.Here analyze the data layout strategy according to above-mentioned optimization, while processing mass data, performance can obtain how many liftings.Fortunately, the data layout based on Gray code has the characteristic that keeps data locality for multiattribute, and then has increased the possibility of data sequential access.Fig. 4 has provided the comparison aspect sequentially reading in lifting of data layout based on Gray code and traditional binary coded data layout.The data recording of storing in the burst shown in Fig. 4 include two attributes (as, 1,2), and the codomain of each attribute is carried out to 4 deciles.Distinguish Gray code and binary-coded index key assignments by different fonts in figure.Mainly provide the comparative result of range query (that is, multi-Dimensional Range inquiry and multiattribute range query) in this section.Obviously, to three inquiries given in figure (1,2,3), order reads aspect and all has greatly improved, and three Query Results are respectively by mark *, #, and % is distinguished.For example, the quantity that reads at random for inquiry 1 has 2 to reduce to 1.

Index stores:

The storage efficiency of this section metric data.Due to the data layout strategy provided of native system, each burst not storage is full, because need to improve the efficiency that data are inserted.But consider when every new data records is inserted, split into two for storing a full minute sector-meeting, wherein each burst has all been stored the record of half capacity.As can be seen here, the data storage efficiency of system of the present invention is at least 50%.

Claims

1. the distributed data layout method based on Gray code, is characterized in that, comprises the steps:

2. data layout method according to claim 1, it is characterized in that, described method also comprises the 3rd step, when take in the Single document that the ratio index of the picture of perception of content is filename the tuple of storing while reaching the capacity that described file sets, described file is split into two files, wherein, the ratio index of the picture of the perception of content of original is split into the ratio index of the picture of two sub-perception of content, and distributes to respectively two files after division.

3. data layout method according to claim 1, is characterized in that, described distributed system comprises HDFS.

4. data layout method according to claim 1, is characterized in that, the ratio index of the picture of described perception of content incorporated to the metadata management node of described distributed file system.

5. data layout method according to claim 1, is characterized in that, described statistical index distributed storage is in back end.

6. the method that the database that utilizes method as described as any one in claim 1-5 to form is inquired about, is characterized in that, comprises the following steps:

Fetch all tuples with described index key assignments; And

The described tuple of fetching is filtered to remove unwanted tuple.

7. querying method as claimed in claim 6, is characterized in that, described inquiry comprises exact match, partial match query.

8. querying method as claimed in claim 6, is characterized in that, described inquiry comprises multi-Dimensional Range inquiry and multiattribute range query.

9. querying method as claimed in claim 6, is characterized in that, the restrictive condition in described inquiry can provide for the combination of any attribute item in data tuple or any attribute item, and described restrictive condition is the scope restriction.

10. querying method as described as claim 1-5, is characterized in that, by ratio index of the picture and the statistical index of described perception of content, realizes described approximate aggregate query, by accessing concrete tuple, realizes aggregate query.