[summary of the invention]
Based on this, be necessary to provide a kind of fuzzy query method that can improve search efficiency.
A kind of fuzzy query method, comprises the following steps:
According to the mapping relations between the index value preset and data item, global index is set up to database table;
Described global index is sorted;
Obtain key word of the inquiry, between the index area adopting interval binary search to mate with described key word of the inquiry;
Obtain the index value in described index area, obtain the data item corresponding with described index value according to described mapping relations, and described data item is returned as Query Result.
Preferably, the mapping relations between described index value and data item are the mapping relations of the corresponding data item of multiple index value.
Preferably, the step between the index area that mates with described key word of the inquiry of the interval binary search of described employing is specially:
Dichotomy is adopted to locate the match index of mating with described key word of the inquiry;
With described match index for end points obtains between described index area border.
Preferably, the mapping relations between the index value preset of described basis and data item also comprise after setting up global index to database table:
The inquiry times of index value is stored in described global index;
The index value that inquiry times described in buffer memory is greater than threshold value and the data item corresponding to index value being greater than threshold value with described inquiry times.
Preferably, after described acquisition key word of the inquiry, also comprise:
Search the index value mated with described key word of the inquiry in the buffer, and the data item corresponding with the described index value found in buffer memory is returned as Query Result, and the inquiry times of the index value mated with described key word of the inquiry is added 1.
In addition, there is a need to provide a kind of fuzzy query system that can improve search efficiency.
A kind of fuzzy query system, comprises with lower module:
Index module, for setting up global index according to the mapping relations between the index value preset and data item to database table;
Order module, for sorting described global index;
Search module, for obtaining key word of the inquiry, between the index area adopting interval binary search to mate with described key word of the inquiry;
Value module, for obtaining the index value in described index area, obtaining the data item corresponding with described index value according to described mapping relations, and described data item being returned as Query Result.
Preferably, the mapping relations between described index value and data item are the mapping relations of the corresponding data item of multiple index value.
Preferably, search described in module comprise for adopt dichotomy locate the match index of mating with described key word of the inquiry the first submodule and for second submodule of described match index for the border that end points obtains between described index area.
Preferably, also comprise the inquiry times for storing index value in described global index, and the cache module of the index value that is greater than threshold value of inquiry times described in buffer memory and the data item corresponding to index value that is greater than threshold value with described inquiry times.
Preferably, describedly search module also for searching the index value mated with described key word of the inquiry in the buffer, and the data item corresponding with the described index value found in buffer memory returned as Query Result, and the inquiry times of the index value mated with described key word of the inquiry is added 1.
Above-mentioned fuzzy query method and system first establish unified global index to tables of data, between the index area that the index that then the lower dichotomy of complexity service time carries out Search and Orientation and Keywords matching is formed, make use of continuity and the order of index itself efficiently, thus the time complexity of inquiry is reduced when sacrificing segment space complexity (for storing index), improve inquiry velocity, thus improve the efficiency of fuzzy query.
[embodiment]
As shown in Figure 1, in one embodiment, a kind of fuzzy query method, comprises the following steps:
Step S102, sets up global index according to the mapping relations between the index value preset and data item to database table.
Database is made up of multiple database table, stores many data records in database table, and a data record is a data item.Such as, user message table and organizational chart two table is stored in database.The data stored in user message table are the relevant information of user; The data stored in organizational chart are the relevant information of each department in the composition structure of company.
In one embodiment, first determining that query manipulation in database is frequent and increase, delete and database table that renewal rewards theory is less, is then the definition of data item key words of these search operations frequently in database table.Preferably, can be the multiple key word of definition of data item.Then key word is set up global index in a database as index value.Index value and data item have mapping relations, preferably, are many-to-one mapping relations, i.e. the corresponding data item of multiple index value.And for the situation with identical key word of multiple data item, multiple identical index value can be used to map multiple different data item respectively.
Such as, for index value " lm ", both can be the abbreviation at " dawn ", also can be the abbreviation of " Li Meng ".Now, two index values " lm " need only be set up respectively corresponding " dawn " and " Li Meng ".
Step S104, sorts global index.
In one embodiment, after global index establishes, then according to the ordering rule preset, global index is sorted.Preferably, the ordering rule preset is lexcographical order, namely first index value is converted to character string (Chinese character first converts phonetic to), then sorts according to the lexcographical order of character string.Such as: be [a, ab, abc, abe, b] after index value sequence [ab, a, abe, b, abc] sorts from small to large according to lexcographical order.
Step S106, obtains key word of the inquiry, between the index area adopting interval binary search to mate with key word of the inquiry.
Step S108, obtains the index value in index area, obtains the data item corresponding with index value, and data item returned as Query Result according to mapping relations.
Interval dichotomy (RBS, RangeBinarySearch) for a kind of based on the data search method of traditional dichotomy.Compared with traditional dichotomy, traditional binary search is the position determining key word of the inquiry place in one sequence, and interval dichotomy locates the continuum that all matching values of mating with key word of the inquiry form.
In one embodiment, key word of the inquiry is obtained, between the index area adopting interval binary search to mate with key word of the inquiry: (1) adopts dichotomy to locate the match index of mating with key word of the inquiry; (2) with this match index for end points obtains border between this index area (such as, the index value that in the continuous integral number sequence formed by the subscript that the border between index area during storage of array index value sequence is array, the left and right end points of certain continuum is corresponding, such as the continuum [a that array index is formed, b], then a is the left end point in this interval, and b is the right endpoint in this interval).
Concrete, be that end points obtains the step on the border between index area preferably: the left and right end points adopting dichotomy to obtain respectively between the index area mated with key word of the inquiry for end points with this match index with match index.
The detailed process of above-mentioned steps S106 is described with a concrete example below.In this example, if the index value sequence in global index is stored in array E, and arrange from small to large by lexcographical order.Expect that the key word searched is key.During fuzzy search, the matching relationship of key and index value is: if index value character string is using key as bebinning character (comprising index value character string and the identical situation of key), then this index value mates with key word key, otherwise, do not mate.
The method that two character strings compare size by lexcographical order is: the ASCII character value (being converted into phonetic for Chinese character) first comparing the initial character of two character strings, and the character string that initial character ASCII character value is large is larger; If the ASCII character value of initial character is equal, then continue to compare next bit character, until last position.If relatively to a certain position time, certain character string ends up, then giving tacit consent to this ASCII character of this character string is 0 (namely certainty is minimum).Such as, according to lexcographical order, a < b, a < ab, abcde < abcf.
The detailed process of searching in step S106 between the index area that mates with key word of the inquiry is:
(1) initialization integer vernier variable low, mid and high, make low=0, high=E.length-1, wherein, E.length represents the length of array E.Be designated as LOW under making the left end point between index area again, under right endpoint, be designated as HIGH, and be designated as LOW=HIGH=-1 under supposing initial left and right end points.
(2) as low <=high, circulation performs (3), otherwise, perform (4).
(3) assignment mid=(low+high)/2, compares the size of key and index value E [mid], and E [mid] is for being designated as the element of mid under in array E:
If by aforesaid lexcographical order relatively after, key < E [mid], then E [mid] drops on the right side of the right endpoint between index area, namely E [mid] is larger than the right endpoint in index interval, thus the index value that all subscripts are greater than mid-1 can be eliminated, then assignment high=mid-1 return (2) and continue to compare;
If by aforesaid lexcographical order relatively after, key > E [mid] and key and E [mid] do not mate, then E [mid] drops on the left side of the left end point between index area, namely E [mid] is less than the left end point in index interval, thus the index value that all subscripts are less than mid+1 can be eliminated, then assignment low=mid+1 return (2) and continue to compare;
If by aforesaid lexcographical order relatively after, if key <=E [mid] and key and E [mid] mate, then E [mid] drops in index area, and namely E [mid] is match index.
Then, perform subprocess S3 and S4, and by the rreturn value assignment of S3 to LOW, by the rreturn value assignment of S4 to HIGH.Subprocess S3 and S4 is respectively used to left end point and the right endpoint of locating index interval.
(4) judge the value of LOW, if the value of LOW is-1, then return null value, namely do not find match index; If LOW is <=HIGH, then return the subscript LOW of the left end point between index area and the subscript HIGH of right endpoint.
Subprocess S3 is for obtaining the left end point between index area, and detailed process is:
(a) initialization integer vernier variable s3_low=low, s3_high=mid, s3_mid=0 (low and high is herein the local vernier variable in abovementioned steps S106).
B (), as s3_low < s3_high, performs (c), otherwise, perform (d).
C () assignment s3_mid=(s3_low+s3_high)/2, compares key and E [s3_mid]:
If key and E [s3_mid] does not mate, then subscript is less than s3_mid+1 element in the past, i.e. s3_low=s3_mid+1.
If key and E [s3_mid] mates, then eliminate the element that subscript is greater than s3_mid-1, i.e. s3_high=s3_mid-1, performs (b) (final (c) will enter (d) because of s3_low=s3_high after terminating).
D () compares key and E [s3_low], if key mates E [s3_low], return s3_low, otherwise return s3_low+1.
Subprocess S4 is for obtaining the right endpoint between index area, and detailed process is:
(I) s4_low=mid, s4_high=high, s4_mid=0 (low and high is herein the local vernier variable in abovementioned steps S106) during initialization integer variable.
(II) as s4_low < s4_high, perform (III), otherwise, perform (IV).
(III) assignment s4_mid=(s4_low+s4_high)/2, the relatively size of key and E [s4_mid], if key and E [s4_mid] does not mate, then eliminate all in array E under be marked on the later element of s4_mid-1, i.e. s4_high=s4_mid-1.
If key and E [s4_mid] mates, then eliminate the record before s4_mid+1, i.e. s4_low=s4_mid+1, performs (II) (final, (III) will perform because of s4_low=s4_high (IV) after terminating).
(IV) compare the size of key and E [s4_low], if key and E [s4_low] coupling, then return s4_low, otherwise return s4_low-1.
It should be noted that, in other embodiments, can also preferably adjust according to the length of key word of the inquiry for the method between the index area that above-mentioned location is mated with key word of the inquiry.Namely, if certain key word of the inquiry longer (relative to index value), then can estimate and show that less between Matching band (key word of the inquiry is longer, then be greater than most of index value in global index, therefore the index value matched is also relatively less), then after determining match index, directly adopt the left and right end points of sequential search method (comparing one by one successively in order) position matching index.
Such as, if during certain inquiry, only include two index values between the Matching band mated with key word of the inquiry, then compare through three times the left and right end points can determined between Matching band according to sequential search method.If now still employing dichotomy determines the left and right end points between Matching band, when the length of index value sequence is longer, then need two just can obtain result several times.The above-mentioned method of the left and right end points of sequential search legal position match index that adopts when key word of the inquiry is longer can accelerate inquiry velocity.
Be located by step S106 and keyword match index area between border after, then according to aforesaid mapping relations, the all data item corresponding with all index values in index area are found out, generates a Query Result list, then Query Result list is returned as Query Result.
When fuzzy query result is a lot, and when the interface showing Query Result is less, usually disposable all Query Results can not be showed user.And in some cases, user is often only concerned about modal or follows the immediate result of fuzzy search.Therefore, also can comprise after according to the mapping relations preset global index being set up to database table: the inquiry times storing index value in global index; The index value that caching query number of times is greater than threshold value and the data item corresponding to index value being greater than threshold value with inquiry times.
In one embodiment, also can comprise after obtaining key word of the inquiry: search the index value mated with this key word of the inquiry in the buffer, and the data item corresponding with the index value found in buffer memory returned as Query Result, and the inquiry times of the index value mated with key word of the inquiry is added 1.
Such as, the number of times of key word " scb " coupling " market department " is maximum, and the number of times mating name " Shen Changbin " is less.Therefore, when receiving the inquiry request that key word is " scb ", first finding the data item " market department " that the inquiry times of " scb " correspondence is maximum in the buffer, then first " market department " data item being returned.Meanwhile, can continue to perform aforesaid step S106 in global index, finally all fuzzy query results be returned.This processing mode user is waited for the time of Query Result is shorter, efficiently utilize the time of user's browse queries result, improve Consumer's Experience.
As shown in Figure 3, in one embodiment, a kind of fuzzy query system, comprises index module 102, order module 104, searches module 106 and value module 108, wherein:
Index module 102 is for setting up global index according to the mapping relations between the index value preset and data item to database table.
Database is made up of multiple database table, stores many data records in database table, and a data record is a data item.Such as, user message table and organizational chart two table is stored in database.The data stored in user message table are the relevant information of user; The data stored in organizational chart are the relevant information of each department in the composition structure of company.
In one embodiment, index module 102 is first determined that query manipulation in database is frequent and increases, deletes and database table that renewal rewards theory is less, is then the definition of data item key words of these search operations frequently in database table.Preferably, index module 102 can be the multiple key word of definition of data item.Then key word is set up global index in a database as index value.Index value and data item have mapping relations, preferably, are many-to-one mapping relations, i.e. the corresponding data item of multiple index value.And for the situation with identical key word of multiple data item, multiple identical index value can be used to map multiple different data item respectively.
Such as, for index value " lm ", both can be the abbreviation at " dawn ", also can be the abbreviation of " Li Meng ".Now, two index values " lm " need only be set up respectively corresponding " dawn " and " Li Meng ".
Order module 104 is for sorting global index.
In one embodiment, after global index establishes, then according to the ordering rule preset, global index is sorted.Preferably, the ordering rule preset is lexcographical order, namely first index value is converted to character string (Chinese character first converts phonetic to), then sorts according to the lexcographical order of character string.Such as: be [a, ab, abc, abe, b] after index sequence [ab, a, abe, b, abc] sorts from small to large according to lexcographical order.
Enquiry module 106 for obtaining key word of the inquiry, between the index area adopting interval binary search to mate with key word of the inquiry.
Value module 108, for obtaining the index value in index area, obtains the data item corresponding with index value according to mapping relations, and data item is returned as Query Result.
Interval dichotomy (RBS, RangeBinarySearch) for a kind of based on the data search method of traditional dichotomy.Compared with traditional dichotomy, traditional binary search is the position determining key word of the inquiry place in one sequence, and interval dichotomy locates the continuum that all matching values of mating with key word of the inquiry form.
In one embodiment, search module comprise for adopt interval dichotomy locate the match index of mating with key word of the inquiry the first submodule (not shown) and for being that end points obtains border between index area (such as with match index, the index value that in the continuous integral number sequence formed by the subscript that the border between index area during storage of array index value sequence is array, the left and right end points of certain continuum is corresponding, such as the continuum [a that array index is formed, b], then a is the left end point in this interval, b is the right endpoint in this interval) the second submodule (not shown).
Concrete, the left and right end points that the second submodule adopts dichotomy to obtain respectively between the index area mated with key word of the inquiry with this match index for end points.
Detailed process between the index area that enquiry module 106 adopts interval binary search to mate with key word of the inquiry please refer to the query script that aforesaid step S106 describes, and does not repeat them here.
It should be noted that, in other embodiments, the second submodule, when locating between the index area mated with key word of the inquiry, can also preferably adjust according to the length of key word of the inquiry.Namely, if certain key word of the inquiry longer (relative to index value), then can estimate and show that less between Matching band (key word of the inquiry is longer, then be greater than most of index value in global index, therefore the index value matched is also relatively less), then after determining match index, directly adopt the left and right end points of sequential search method (comparing one by one successively in order) position matching index.
Such as, if during certain inquiry, only include two index values between the Matching band mated with key word of the inquiry, then compare through three times the left and right end points can determined between Matching band according to sequential search method.If now still employing dichotomy determines the left and right end points between Matching band, when the length of index value sequence is longer, then need two just can obtain result several times.The above-mentioned method of the left and right end points of sequential search legal position match index that adopts when key word of the inquiry is longer can accelerate inquiry velocity.
Enquiry module 106 located and keyword match index area between border after, value module 108 is according to aforesaid mapping relations, the all data item corresponding with all index values in index area are found out, generate a Query Result list, then Query Result list is returned as Query Result.
When fuzzy query result is a lot, and when the interface showing Query Result is less, usually disposable all Query Results can not be showed user.And in some cases, user is often only concerned about modal or follows the immediate result of fuzzy search, therefore, preferably, in one embodiment, also comprise the inquiry times for storing index value in aforesaid global index, and the cache module (not indicating in figure) of the caching query number of times index value that is greater than threshold value and the data item corresponding to index value that is greater than threshold value with inquiry times.
In one embodiment, search module also for searching the index value mated with key word of the inquiry in the buffer, and the data item corresponding with the index value found in buffer memory returned as Query Result, and the inquiry times of the index value mated with key word of the inquiry is added 1.
Such as, the number of times of key word " scb " coupling " market department " is maximum, and the number of times mating name " Shen Changbin " is less.Therefore, when receiving the inquiry request that key word is " scb ", first finding the data item " market department " that the inquiry times of " scb " correspondence is maximum in the buffer, then first " market department " data item being returned.Meanwhile, enquiry module 106 can continue in global index, inquire about all key words mated with key word of the inquiry, and all fuzzy query results return by last value module 108.This processing mode user is waited for the time of Query Result is shorter, efficiently utilize the time of user's browse queries result, improve Consumer's Experience.
The above embodiment only have expressed several embodiment of the present invention, and it describes comparatively concrete and detailed, but therefore can not be interpreted as the restriction to the scope of the claims of the present invention.It should be pointed out that for the person of ordinary skill of the art, without departing from the inventive concept of the premise, can also make some distortion and improvement, these all belong to protection scope of the present invention.Therefore, the protection domain of patent of the present invention should be as the criterion with claims.