Detailed description of the invention
Here will illustrate exemplary embodiment in detail, its example represents in the accompanying drawings.Explained below relates to
During accompanying drawing, unless otherwise indicated, the same numbers in different accompanying drawings represents same or analogous key element.Following exemplary embodiment
Described in embodiment do not represent all embodiments consistent with the application.On the contrary, they are only with the most appended
The example of the apparatus and method that some aspects that described in detail in claims, the application are consistent.
It is only merely for describing the purpose of specific embodiment at term used in this application, and is not intended to be limiting the application.
" a kind of ", " described " and " being somebody's turn to do " of singulative used in the application and appended claims is also intended to include majority
Form, unless context clearly shows that other implications.It is also understood that term "and/or" used herein refers to and wraps
Any or all containing one or more projects of listing being associated may combination.
Although should be appreciated that in the application possible employing term first, second, third, etc. to describe various information, but this
A little information should not necessarily be limited by these terms.These terms are only used for same type of information is distinguished from each other out.Such as, without departing from
In the case of the application scope, the first information can also be referred to as the second information, and similarly, the second information can also be referred to as
One information.Depend on linguistic context, word as used in this " if " can be construed to " ... time " or " when ...
Time " or " in response to determining ".
As it is shown in figure 1, Fig. 1 is the application flow process according to a kind of data query method shown in an exemplary embodiment
Figure, comprises the following steps 101 to 104:
In a step 101, obtaining inquiry request, described inquiry request includes querying condition.
In a step 102, by default file index determines whether there is the data meeting described querying condition.
In step 103, if there are not the data meeting described querying condition, then terminate inquiry.
At step 104, if there are the data meeting described querying condition, by default data directory at initial data
File finds out the data meeting described querying condition.
The querying method that the embodiment of the present application is provided, can be when setting up index to raw data file, and foundation is used for
Quickly whether the data of location file have the file index of the data meeting condition, if determining according to file index and there is number
According to, recycling data directory inquires about this data meeting querying condition;If not existing, then can directly terminate inquiry.The application is real
Execute example can first pass through file index determine whether there is meet condition data ought not in the presence of, then can directly terminate inquiry,
Make a look up without in all data files, therefore can be effectively improved search efficiency.
Wherein, described file index generates previously according to raw data file, and its record has the property value of index attributes to exist
Minima in computer code and maximum.
Described by default file index determines whether there is the data meeting described querying condition, including:
Judge the sizes values of described querying condition whether between the minima and maximum of described index attributes, if so,
Then determine and there are the data meeting described querying condition, if not, it is determined that there are not the data meeting described querying condition.
In the embodiment of the present application, the determination of index attributes in file index, can be civilian with initial data according to actual needs
Part and flexible configuration, after determining the index attributes for creating file index, file index can be at raw data file
Create the when of loading warehouse-in.One file index can comprise a kind of index attributes, in actual applications, and can basis
Need to create one or more file index, when inquiry, can determine whether in multiple file indexes according to querying condition
There are the data meeting condition.
In file index, record has index attributes value minima in computer code and maximum, refers to initial data
In file, the maximum of the property value corresponding to the index attributes of pieces of data and minima.Such as, at a relevant student letter
In the file of breath, including 100 student data, this document includes one " name " attribute, is somebody's turn to do corresponding to " name " attribute
Property value is the name of student." name " attribute is defined as index attributes, then index attributes value in computer code
Little value and maximum, refer to compare the name of each bar student data, determine the maximum of name in 100 student data
Value and minima.Owing to the computer code of data can compare size, can be by recorded in querying condition and file index
Maximum and minima compare, thus determined whether there is the data meeting querying condition by file index.
In an optional implementation, file index has also recorded exhaustive value quantity and exhaustive value;Described exhaustive value
Quantity is the number of different attribute value corresponding to index attributes in described raw data file, when described exhaustive value quantity is less than pre-
If during threshold value, described exhaustive value includes all properties value that in described raw data file, index attributes is corresponding, when described exhaustive
When value quantity is not less than predetermined threshold value, described exhaustive value is empty.
When the data directory by presetting finds out the data meeting described querying condition in raw data file, if
Described exhaustive value is not empty, then obtain the property value corresponding with described querying condition, then root in exhaustive value according to querying condition
In raw data file, the data meeting described querying condition are found out according to property value and data directory.
The following is a kind of of file index to illustrate:
Minima |
|
Maximum |
|
Exhaustive value quantity |
|
Exhaustive value |
|
In the embodiment of the present application, in file, any attribute of data can create index, can root when in use
Create according to needs.Such as, having a file about student data, every student data includes " student number ", " name ", " property
Not ", " age " etc..When using name attribute to create file index, name attribute value in file can be read, according to title
The computer code of property value, determines maximum and the minima of each name attribute value in whole part file.If using " property
Not " attribute creates file index, and " sex " property value that can read all student data in file contrasts, and obtains maximum
Value and minima.
Assume there is a file index:
Minima |
Li Si |
Maximum |
Zhang San |
Exhaustive value quantity |
|
Exhaustive value |
|
When the querying condition of input is " title=king five ", first according to file index, determine that " king five " are minima " Lee
Four " and between maximum " Zhang San ", therefore can determine that and file exists the data meeting this querying condition, then recycle data
Index search goes out to meet the data of this querying condition.
When the querying condition of input is " title=Zheng Tao ", owing to " Zheng Tao " " does not opens in minima " Li Si " and maximum
Three " between, therefore can determine that and file does not exist the data meeting this querying condition, inquiry can be terminated.
Wherein, when exhaustive value negligible amounts, such as, gender attribute is utilized to create file index, due to gender attribute value only
Two kinds, then can store both gender attribute values in exhaustive value.
When exhaustive value quantity is more, file has 50,000 student data, utilize name attribute to create file index, different
Student has different name attribute values, it is assumed that have 40,000 9 thousand different name attribute values, owing to quantity is relatively big, has exceeded pre-
If threshold value, the most exhaustive value can be empty.
By exhaustive value quantity and exhaustive value, can be when determining the data that existence meets querying condition, according to inquiry bar
Part obtains the property value corresponding with described querying condition in exhaustive value, civilian at initial data further according to property value and data directory
Part finds out the data meeting described querying condition, therefore can improve the inquiry velocity of data.
Wherein, the attribute as index can be one, it is also possible to is the combination of multiple attribute.Such as, there is portion relevant
The file of student data, every student data includes " student number ", " name ", " sex ", " age " etc., it is assumed that use " name "
" age " combination one file index of establishment:
Minima |
Li Si, 17 |
Maximum |
Zhang San, 15 |
Exhaustive value quantity |
|
Exhaustive value |
|
In this file index, during maximizing and minima, can be " name " and " age " group
Closing contrast, contrast algorithm is exemplified below:
Have two students of A, B, need contrast size (name and the combined index at age that contrast A here are less than B):
Next data directory is illustrated.
In the embodiment of the present application, described data directory can include head and data block list;Described data block list bag
Including multiple data block, the corresponding data of each data block, each data block record has index attributes value and this data in phase
Side-play amount in the raw data file answered;Described head record has the size of data block and total number of data block, every number
Size according to block is identical.Described data block, according to two hierarchy numbers preset, pre-loaded one or more is in two points of positions
Data block is in internal memory, and the data directory that recycling is preset finds out the number meeting described querying condition in raw data file
According to time, use binary chop mode, internal memory finds out data block or the data block range meeting querying condition.
As an embodiment, data directory can use structure as shown in Figure 2.Such as, index value 1 and data-bias
Amount 1 just constitutes a data block, that is to say, it is assumed that have 100 data in raw data file, has been equal to 100 data blocks.
When the attribute comprised in querying condition in data directory, data directory can be used to make a look up, otherwise use biography
The mode of the data scanning of system scans initial data.When querying condition for being more than, be more than or equal to, be less than, be less than or equal to, being equal to,
The when of like or in, the mode of binary chop can be used to search, when querying condition for being not equal to, not like, not in
When, then the mode can being scanned data directory is searched.
In order to reduce disk I/O (input/output, Input/Output), data directory can be carried out special caching side
Formula, can reduce disk I/O binary chop when.Generally in data directory, each data block is smaller, but disk
IO minimum unit is sector, more much larger than data block.The when of carrying out binary chop in data directory, if repeatedly carrying out retaking of a year or grade
Take little data block and can reduce disk I/O efficiency, therefore can protect in advance needing the data block accessed during a part of binary chop
It is stored to internal memory as caching, the when of lookup, first determines a little data block range in the buffer, the most just according to determining
Data block range reads the data block of this scope from disk and makes a look up.
Such as, raw data file has 1,000,000 data, sets up data directory in the manner previously described, according to two points of thoughts,
The data block being saved in internal memory is in the data block of two points of positions:
Layer0:(0+100 ten thousand)/2,=50 ten thousand
Layer1:(0+50 ten thousand)/2,=25 ten thousand (500,000+100 ten thousand)/2,=75 ten thousand
Layer2:(0+25 ten thousand)/2=12.5 ten thousand (250,000+50 ten thousand)/2=37.5 ten thousand (500,000+75 ten thousand)/2=62.5 ten thousand
(750,000+100 ten thousand)/2=87.5 ten thousand
......
Layer that is two hierarchy number, refer to all data blocks in data directory by the number of plies of two points, such as, two hierarchy numbers are 1
Layer, represents and all data blocks is equally divided into 21Part;Two hierarchy numbers are 2 layers, represent and all data blocks are equally divided into 22Part;Two
Hierarchy number is 3 layers, represents and all data blocks are equally divided into 23Part, by that analogy.The Layer used is the highest, is saved in internal memory
Data the most, inquire about the fastest.In actual applications, the number of plies of two points can determine according to actual needs.
Assuming to comprise 10 records in data, set up data block list in the manner previously described, this Database Lists includes
10 data blocks, can determine two cached subindex blocks afterwards as required.
These 10 are recorded as 123456789 10;The present embodiment determines as required data block is divided into 22
Part, that is to say and all data blocks are carried out two-layer two points, all data blocks are divided into 4 parts, i.e. data directory caches in internal memory
Data block only include data block 3, data block 5 and data block 7.
When needs carry out binary chop to data block 2, first carry out binary chop for the first time, in data directory, data
Block list includes 10 data blocks, then binary chop is to search data block 5 in the buffer for the first time;Due to the data block searched
2 are less than data block 5, carry out second time binary chop, find data block 3 in the buffer.
Owing to data block 2 is less than data block 3, but in data block 2 not caching, therefore can work as and find data block 3
When, it may be determined that go out data block 2 less than data block 3, so data block 2 is in the range of data block 1 to data block 3.Root
Index block size recorded in head according to data directory, determines that the length of " scope of data block 1 to data block 3 " is " (3-
1) * data block size ", therefore can read data block 2 rapidly according to this length, and then obtain the required data searched.
From above-described embodiment, Database Systems can starting when according to data block in data directory by two points
The number of plies, load two subindex data blocks (that is to say the data block being in two points of positions by putting in order) in internal memory.When looking into
When finding out the data block range meeting querying condition, compare taking up room and presetting of data base corresponding to described data block range
Configuration Values, if described in take up room less than described preset configuration value, then the data block corresponding to described data block range is loaded
To internal memory, the mode of recycling binary chop finds out the data block meeting querying condition.
When queried, it is possible to use binary chop determines the data block meeting querying condition, and may find N step
When, determine that searched data block does not has in internal memory, " data block size is multiplied by lookup position now can to use formula
Put " obtain the data block position range in data directory that needs to access, then according to data directory position in disk,
Position the data block relevant position scope in disk that these needs access, further according to the further binary chop of this position range,
Determine the data block of required lookup, and then according to data offset, from raw data file, read required data.
Wherein, after determining data block position range in disk, when reading out this data block, it can be basis
Above-mentioned position range directly searches data from the data directory being stored in disk, it is also possible to be by corresponding to this position range
All data blocks be all loaded in internal memory, in internal memory, carry out binary chop afterwards further, determine searched data block.
In an optional implementation, a Configuration Values can be preset, such as, can be the numerical value such as 4KB (kilobytes).If
No more than this Configuration Values that takes up room of data block corresponding to this position range, then load the data corresponding to this position range
Block is in internal memory, then carries out binary chop in internal memory;If taking up room more than being somebody's turn to do of the data block corresponding to this position range
Configuration Values, then directly search data block in disk.Generally the IO minimum unit of disk is not byte, but sector, if one
The byte number of sector is 4K, then when the data block corresponding to this position range take up room no more than this Configuration Values time,
The data reading several byte in disk and the sector reading a 4K byte are as broad as long, it is possible to by this position range institute
Corresponding data block loads internal memory, then carries out binary chop from internal memory, thus prevents repeatedly calling disk and read data.
Corresponding with the embodiment of preceding method, present invention also provides the embodiment of corresponding system.As it is shown on figure 3, Fig. 3
Being the application block diagram according to a kind of data query system shown in an exemplary embodiment, described system includes:
Acquisition module 31, is used for obtaining inquiry request, and described inquiry request includes querying condition.
Determine module 32, for by default file index determines whether there is the number meeting described querying condition
According to.
Enquiry module 33, if for there are not the data meeting described querying condition, then terminating inquiry;If existing and meeting institute
State the data of querying condition, utilize the data directory preset to find out the number meeting described querying condition in raw data file
According to.
In an optional implementation, described file index generates previously according to raw data file, described file
Index record has the property value of index attributes minima in computer code and maximum;
Described determine module, be additionally operable to:
Judge the sizes values of described querying condition whether between described minima and maximum, if, it is determined that exist
Meet the data of described querying condition, if not, it is determined that there are not the data meeting described querying condition.
In an optional implementation, described file index has also recorded exhaustive value quantity and exhaustive value;Described thoroughly
Act value quantity is the number of different attribute value corresponding to index attributes in described raw data file, when described exhaustive value quantity is low
When predetermined threshold value, described exhaustive value includes all properties value that in described raw data file, index attributes is corresponding, when described
When exhaustive value quantity is not less than predetermined threshold value, described exhaustive value is empty.
Described enquiry module be additionally operable to the file index by presetting find out in raw data file meet described
During the data of querying condition, if described exhaustive value is not empty, then obtains in exhaustive value and described inquiry bar according to querying condition
The property value that part is corresponding, finds out in raw data file further according to property value and data directory and meets described querying condition
Data.
In an optional implementation, described data directory includes head and data block list;Described data block arranges
Table includes multiple data block, and the corresponding data of each data block, each data block record has index attributes value and this data
Side-play amount in raw data file;Described head record has the size of data block and total number of data block, each data
The size of block is identical.
Described data block is according to two hierarchy numbers preset, and pre-loaded one or more data blocks being in two points of positions exist
In internal memory, described lookup module be additionally operable to utilize preset data directory find out in raw data file meet described in look into
During the data of inquiry condition, use the mode of binary chop, internal memory finds out data block or the data block meeting querying condition
Scope.
In an optional implementation, described lookup module is additionally operable to when finding out the data block meeting querying condition
During scope, relatively the taking up room and preset configuration value of data base corresponding to described data block range, if described in take up room little
In described preset configuration value, then after the data block corresponding to described data block range being loaded onto internal memory, recycle binary chop
Mode find out the data block meeting querying condition.
In said system, the function of modules and the process that realizes of effect specifically refer to corresponding step in said method
Realize process, do not repeat them here.
For system embodiment, owing to it corresponds essentially to embodiment of the method, so relevant part sees method in fact
The part executing example illustrates.System embodiment described above is only schematically, wherein said as separating component
The module illustrated can be or may not be physically separate, and the parts shown as module can be or can also
It not physical module, i.e. may be located at a place, or can also be distributed on multiple mixed-media network modules mixed-media.Can be according to reality
Need to select some or all of module therein to realize the purpose of the application scheme.Those of ordinary skill in the art are not paying
In the case of going out creative work, i.e. it is appreciated that and implements.
Those skilled in the art, after considering the invention that description and practice are applied for here, will readily occur to its of the application
Its embodiment.The application is intended to any modification, purposes or the adaptations of the application, these modification, purposes or
Person's adaptations is followed the general principle of the application and includes the common knowledge in the art that the application does not applies for
Or conventional techniques means.Description and embodiments is considered only as exemplary, and the true scope of the application and spirit are by following
Claim is pointed out.
It should be appreciated that the application is not limited to precision architecture described above and illustrated in the accompanying drawings, and
And various modifications and changes can carried out without departing from the scope.Scope of the present application is only limited by appended claim.
The foregoing is only the preferred embodiment of the application, not in order to limit the application, all essences in the application
Within god and principle, any modification, equivalent substitution and improvement etc. done, should be included within the scope of the application protection.