CN108021689A - The method inquired about using the IVkNN algorithms based on MapReduce - Google Patents
The method inquired about using the IVkNN algorithms based on MapReduce Download PDFInfo
- Publication number
- CN108021689A CN108021689A CN201711358195.7A CN201711358195A CN108021689A CN 108021689 A CN108021689 A CN 108021689A CN 201711358195 A CN201711358195 A CN 201711358195A CN 108021689 A CN108021689 A CN 108021689A
- Authority
- CN
- China
- Prior art keywords
- mapreduce
- key
- value pair
- data set
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
This divisional application discloses the method that a kind of IVkNN algorithm of use based on MapReduce is inquired about, and belongs to cloud computing, big data field, is for solving the problems, such as:Available data querying method index efficiency is improved, technical essential is:Host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and SjThe key-value pair that each partition informations of ∈ S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its subregion id, and effect is:Algorithm index efficiency can be improved so that MapReduce can be no longer influenced by room and time influence.
Description
The application for application number 2017109759745, applying date 2017-10-19, denomination of invention " based on MapReduce with
The divisional application of the extensive nearest Neighbor of row's Thiessen polygon ".
Technical field
The invention belongs to cloud computing, big data field, is related to a kind of can be effectively improved under distributed environment and inquires about effect
The MapReduce indexes of rate.
Background technology
MapReduce is a kind of currently a popular programming framework based on cloud platform, it can handle and generate large-scale number
According to collection, it supports data-intensive application using the shared cluster of nothing.Processing step is specially:In distributed cache system
In, it is that key/value pair among one group is generated in map functions, according to phase by MapReduce tasks when handling a key/value pair
With middle key merge all medians, each map is independently of each other operations, i.e., all maps can be held parallel
OK.One group " reducers " of MapReduce can perform reduction operation, and the output that the Map with identical key is operated at the same time may be used
With reduction to same reducer.But one reduction procedure of isolated operation may be such that inefficiency;
MapReduce can be used for supporting than traditional more massive data processing of commerce server cluster, it can be
The data of a PB quantity can be only handled in a few houres, carrying out data directory using MapReduce has preferable application
Prospect.However, parallel processing of the existing Index Algorithm due to not adapting to MapReduce, builds the time consumption of index not
Enough ideals, scalability are bad.
The content of the invention
In order to improve available data querying method index efficiency, the present invention provides following scheme:
A kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, includes the following steps:
S1. the row's of falling Voronoi indexes based on MapReduce are constructed;
S2. subregion is carried out to data set R and S using the row's of falling Voronoi indexes;
S3. the IVKNN based on MapReduce is used to carry out distribution kNN inquiries.
Construction is as follows the step of arranging Voronoi indexes based on MapReduce:
Two datasets R is given in S1.1.d dimension spaces and S, Hadoop carry out burst, part mappers is parallel at the same time to be transported
OK, in MapReduce tasks, using the reducer of acquiescence, before map functions are started, generation is obtained using pre- clustering algorithm
Table point p, and be loaded into the main memory of each map;
S1.2. in each map treatment progress, the burst of input is read using TextInputForma successively,
TextInputFormat reads data into the example of Mapper from file, calculate respectively m object r in data set R with respectively
N object s in the distance between point p, data set S is represented with representing the distance between point p, and by range data collection R
J-th of object s in i-th of object r and data set SjImmediate representative point PijSelect and be gathered in Voronoi cells
VCiIn, subregion VC1~VCm, the i order for forming m VC are taken all over 1~m, also, are worked as i orders and taken one in 1~m
During occurrence, under the value of i, j orders take all values all over 1~n, obtain the immediate Voronoi for representing point of m storage
Cell;Output<VCi, List (Pij)>It is right;
Known query point p, differentiates its closest VCiOr most some neighbouring VCiCollection, it is most adjacent described in mapper outputs
Near VCiOr most some neighbouring VCiCollect r, s object in corresponding raw data set R and/or S, and export closest VCi
Or most some neighbouring VCiThe id of collection;
S1.3., mapper is output to the file system of Hadoop.
In the step S1.2, j-th in i-th of object r and data set S in the R by range data collection
Object sjImmediate representative point PijSelect and be gathered in Voronoi cells VCiIn, formed m VC subregion VC1~
The process of VCm is as follows:
First object r in data set R1With first object s in data set S1The immediate point that represents as P11, number
According to first object r in collection R1With second object s in data set S2It is immediate represent point as
P12, first object r in data set R1With n-th of object s in data set SnImmediate represent a little
For P1n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V1In;
······
M-th of object r in data set RmWith first object s in data set S1The immediate point that represents as Pm1, number
According to m-th of object r in collection RmWith second object s in data set S2The immediate point that represents as Pm2,
M-th of object r in data set RmWith n-th of object s in data set SnThe immediate point that represents as Pmn, the n closest
Representative point select and be gathered in Voronoi cells VmIn.
The step of being inquired about using the IVkNN algorithms based on MapReduce is as follows:
S1. the partition value file for including data set R and S is put according to IVI;
S2. host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and Sj∈
The key-value pair that each partition informations of S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its point
Area id, for each object s ∈ S, if dist (s, P)<Dist (r, P), then map functions also create one group of new key-value pair;
Reducer iteratively reads R from MapperiAnd SiIn all the points, by VCiIdentical key-value pair into dividing group,
KNN inquiries are carried out, check SjAll subregions after, reducer output kNN (r, S), export key-value pair<R, kNN (r, S)>As
KNN query results.
The row's of falling Voronoi indexes include two parts:Master index, it includes all cluster centres;Second index,
It include being stored in each VC to as queue.
Beneficial effect:The present invention is that one kind handles kNN inquiries using MapReduce, and can using Voronoi diagram
Parallelization suitable for MapReduce is handled, so as to improving algorithm index efficiency so that MapReduce can be no longer influenced by
Room and time influences.
Brief description of the drawings
Fig. 1 is Voronoi diagram;
Fig. 2 is Voronoi diagram index;
Fig. 3 is that influence of the node data change to RSD and SDS data collection structure index time in MRIV structures contrasts
Figure.
Specific embodiment party
Embodiment 1:A kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, wherein:
MapReduce is existing programming model, and for the concurrent operation of large-scale dataset, described method includes following steps:
S1. construct based on MapReduce the row's of falling Voronoi indexes (Thiessen polygon index,
InvertedVoronoiIndex,IVI);
S2. using fall row Voronoi indexes to data set R and S progress subregion, obtain VC subregions, two subregions be by
Need to carry out when establishing of Voronoi diagram in the later stage, Voronoi diagram needs two subregions to combine, thus carries out in this step
The subregions of two datasets;
S3. the IVKNN based on MapReduce is used to carry out distribution kNN (nearest neighbor algorithm) inquiries, IVKNN is to utilize IVI
Fall row Voronoi indexes.
Wherein:Construction is as follows the step of arranging Voronoi indexes based on MapReduce:
Two datasets R is given in S1.d dimension spaces and S, Hadoop (distributed system architecture) carry out burst, portion
Divide mappers (a kind of existing coding and decoding) while parallel operation, in MapReduce tasks, use the reducer of acquiescence
(subtask merging process), before map functions are started, obtains representing point p, and be loaded into each map using pre- clustering algorithm
In the main memory of (subtask, which is decomposed, to be performed);
Wherein:Inside, is clustered the data clusters of point by the acquisition methods of representative, definite internal cluster and consecutive points,
Select cluster centre after cluster to be indexed, required data are to cluster a consecutive points for connection with internal, are clustered with this inside
Point is the center of circle, and circle is established comprising adjacent cluster centre point, and Delaunay triangles are used as the triangle of circumscribed circle using this circle
Two different inside cluster points are established Delaunay triangles in this method, the two Delaunay triangles by shape respectively
Delaunay triangulation network is established by common ground of consecutive points, data object is divided into several big subregions, selects wherein one cluster
Point is represented as representing a little, each object being divided is to be clustered in a Voronoi unit, each Voronoi grids
In contain object id.
S2. in each map treatment progress, read successively using TextInputForma (a kind of existing read mode)
The burst of input is taken, TextInputFormat reads data into the example of Mapper from file, calculates respectively in data set R
The distance between the distance between m object r and each representative point p, n object s in data set S and representative point p, and will be away from
From j-th of object s in i-th of the object r and data set S in data set RjImmediate representative point PijSelect and be gathered in
Voronoi cells VCiIn, subregion VC1~VCm, the i order for forming m VC (subregion of Voronoi diagram) are taken all over 1
~m, also, when i orders take an occurrence in 1~m, under the value of i, j orders take all values all over 1~n, obtain
The immediate Voronoi cells for representing point of m storage;Output<VCi, List (Pij)>It is right;
I.e.:First object r in data set R1With first object s in data set S1It is immediate represent point as
P11, first object r in data set R1With second object s in data set S2It is immediate represent point as
P12, first object r in data set R1With n-th of object s in data set SnImmediate represent a little
For P1n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V1In;
Second object r in data set R2With first object s in data set S1The immediate point that represents as P21, number
According to second object r in collection R2With second object s in data set S2It is immediate represent point as
P22, second object r in data set R2With n-th of object s in data set SnImmediate represent a little
For P2n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V2In;
······
M-th of object r in data set RmWith first object s in data set S1The immediate point that represents as Pm1, number
According to m-th of object r in collection RmWith second object s in data set S2The immediate point that represents as Pm2,
M-th of object r in data set RmWith n-th of object s in data set SnThe immediate point that represents as Pmn, the n closest
Representative point select and be gathered in Voronoi cells VmIn;
Known query point p, differentiates its closest VCiOr most some neighbouring VCiCollection, it is most adjacent described in mapper outputs
Near VCiOr most some neighbouring VCiCollect r, s object in corresponding raw data set R and/or S, and export closest VCi
Or most some neighbouring VCiThe id of collection;
S3., mapper is output to the file system of Hadoop.
The step of being inquired about using the IVkNN algorithms based on MapReduce is as follows:
S1. the partition value file for including data set R and S is put according to IVI (arranging Voronoi indexes);
S2. host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and Sj∈
The key-value pair that each partition informations of S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its point
Area id, is worth and is made of k and v, for each object s ∈ S, if dist (s, Pij)<Dist (r, Pij), then map functions also create
One group of new key-value pair;
Reducer iteratively reads R from MapperiAnd SjIn all the points, by VCiIdentical key-value pair into dividing group,
KNN inquiries are carried out, check SjAll subregions after, reducer output kNN (r, S), export key-value pair<R, kNN (r, S)>As
KNN query results.
The IVI, IVI include two parts:Master index, it includes all cluster centres;Second index, it includes storage
There are each VC to as queue.
Embodiment 2:The present embodiment can implement as independent technical solution or as in embodiment 1 each scheme into one
Step explanation, present embodiments provides a kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, this
Method is a kind of highly effective algorithm that kNN inquiries are handled using Voronoi diagram based on MapReduce, it can also solve to move
Dynamic medical call system meets wireless penetration, networking, mobile this future developing trend.The present embodiment is also for the prior art
In deficiency improved, possess good high efficiency and scalability.To achieve these goals, the present embodiment is used
Technical solution perform step it is as follows:Carry out MapReduce and arrange building for the extensive NN Query index of Thiessen polygon
It is vertical;MapReduce is a kind of currently a popular programming framework based on cloud platform, it can handle and generate large data collection,
It supports data-intensive application using the shared cluster of nothing.Processing step is specially:In distributed cache system, by
MapReduce tasks are that key/value pair among one group is generated in map functions, according to identical when handling a key/value pair
Middle key merges all medians.Each map is independently of each other operations, i.e., all maps can be performed parallel ---
Although actually this is limited to separate data source or data source CPU quantity nearby.Likewise, one group " reducers " can be performed
Reduction operation, the output that the Map with identical key is operated at the same time can be with reduction to same reducer.Although with serial calculation
Method is compared, this process it is possible that inefficiency (because must run a multiple rather than reduction procedure), but
MapReduce can be used for supporting than traditional more massive data processing of commerce server cluster, it can be only several small
When the interior data that can handle a PB quantity, we are mainly used in Spark systems in practice, have been obtained obvious
Improved efficiency, the method execution efficiency more traditional than before use improve more than 12%.
The present embodiment includes two parts using MapReduce structures IVI, IVI:In master index, including all clusters
The heart;Second index, including be stored in each VC to as queue.Inverted index is for effectively index position and query object
Data object in adjacent queue.When a given inquiry, we can differentiate closest VC or most some are neighbouring
VC collection.Then the corresponding queue element (QE)s of these VC are included to come, so as to obtain kNN query results.Due to Voronoi diagram
It can be obtained with merging multiple Voronoi diagrams (VP) by splitting, be suitable for so construction falls to arrange Voronoi indexes
MapReduce model.The Voronoi for particularly every sub- VP being merged to the end.
Construction falls to arrange Voronoi indexes and concretely comprises the following steps:
S1. give and two datasets R and S are given in d dimension spaces.Hadoop peace default mechanisms carry out burst.Some
Mappers parallel operations at the same time.In MapReduce tasks, we use the reducer given tacit consent to.Start map functions it
Before, we obtain representing point p using quick pre- clustering algorithm, and are loaded into the main memory of each map.
S2. in each map treatment progress, it will read the burst of input using TextInputFormat successively
(pressing the input format in distributed file system), TextInputFormat can read data to the reality of Mapper from file
In example.Each r, the distance between s objects and p points are calculated, and by r, s distributes to immediate representative point P. in algorithm
In 2-3 rows, each point is collected in a Voronoi cell, it will be produced into m Voronoi cell, in algorithm
It can be exported in 4-6 rows<VCm, List (Pi)>Right, mapper output raw data sets (R or S) arrive each of hithermost subregion
A object r, s and its subregion VCmId.
Finally, in algorithm 8-10 rows, it would be desirable to needed according to what is controlled oneself by customized
Mapper is output to the file system of Hadoop by MultipleOutputFormat functions.It is determined how task result
Write back in the lasting storage of bottom.Voronoi index of the structure based on MapReduce is described in detail in we in algorithm 1
The algorithm pseudo code of structure.Using IVI, if given one represents a little, we can start MapReduce tasks and come into line number
According to subregion and collect some data messages of each subregion.
In practice, we first by varying clustered node number, the number change from 2 to 32, is based on to assess structure
The row's of falling Voronoi (MRIV) index efficiency of MapReduce.Fig. 2 illustrates node data change to RSD and SDS data collection structure
Index the influence of time.In Fig. 2, index structure almost linearly increases with the increase of clustered node number.Practice result is also demonstrate,proved
Understanding the structure of MRIV indexes needs less time, and final Voronoi can be obtained by merging multiple Voronoi subgraphs
Therefore handled suitable for MapReduce parallelizations.In addition, IVI employs inverted index structure, its scalability is better than its other party
Method, to sum up, indexes large-scale spatial data, and it is most suitable method to arrange Voronoi indexes.
Embodiment 3:In today that social security service develops rapidly at a high speed, the living standard of people increasingly improves,
Also become more hommization and personalization for the demand of medical services.Also there are more and more people to need more at the same time
Prompt and perfect medical services.
With mobile communication and the rapid growth based on location-based service correlation technique, cloud computing, big data, Internet of Things, shifting
The technologies such as dynamic calculating and space orientation are progressively ripe, and such as GPS, camera, blue-teeth data are also constantly increasing, and emerge in large numbers
Substantial amounts of spatial data, this to be faced with huge challenge in the storage and processing of various spatial datas or object.Therefore, with
The development of IT application process, electronic health record, nursing call center system, extensive medical data base etc. in industry of medical care
Using being also rapidly developing, improve work efficiency, improve medical services, Economy type medicine cost etc. played it is more and more
Effect.
But China's geographical environment difference is huge, economic development is uneven, medical resource skewness weighing apparatus, developed regions with
Outlying district is compared, and medical level is there is also very big difference, while as rural area is to the at full speed of the industries such as urban migration, tourism
Development so that exponentially type increases on the basis of script population mobility is big, is frequently encountered just to a place, runs into disease
Disease is unknown to where see a doctor, and stands in the queue to register it is more likely that need several months ahead of time to buy tickets, tosses about multiple hospitals by bus, most
The problem of a large amount of manpower financial capacities have been wasted on road at last, and disease is not got timely medical treatment but.Being frequently encountered needs emergency treatment
When, do not know but around have what hospital, which hospital can handle this dangerous situation, which hospital position is more preferable closer to, service,
Because the delay time at stop, cause treatment not in time, or even tragedy because of delay treatment and lethal can occur.But with the big data epoch
Arrival, there is the mode that more related medical resources are inquired about.Medical resource inquiry makes patient more easily find certainly
Oneself required data, for example, patient can inquire the hospital nearest from oneself by medical resource, since it is convenient just
Doctor.Based on the starting point, we have designed and Implemented the invention.
In the present embodiment, to the big rule based on MapReduce with row's Thiessen polygon described in above-mentioned two embodiment
The concrete application of mould nearest Neighbor is made an explanation.
In one embodiment, the process returned based on following steps realization index, request processing, data:
S1. hospital, which establishes, is used to perform based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon
System;
S2. user initiates to ask by ambulatory medical device to hospital;
S3. when hospital receives user's request, agreed to, and user requested data is returned into user.
Medical knowledge is sent specific to patient to ask for or during medical aid request, its mobile calls method is as follows:
S1. hospital, which establishes, is used to perform based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon
System, so that user can effectively improve search efficiency under distributed environment.
S2. patient user, when being badly in need of medical knowledge and/or medical aid, can be exhaled outside hospital by portable medical
Equipment is made to send respective request to hospital.
S3. when hospital receives user's request, row's Thiessen polygon index is transferred down, performs distribution Skyline or kNN
Deng spatial query algorithms (being not limited to these algorithms), agreement decision-making is carried out if meeting condition, and user requested data is returned
Back to user.
In another embodiment, the extensive NN Query side based on MapReduce with row's Thiessen polygon
Method is applied to medical material dispatching, it realizes that medical resource dispatching includes the following steps:
S1. medical resource supplier is established to perform and is looked into based on MapReduce and the extensive neighbour of row's Thiessen polygon
The system of inquiry, and medical resource data are uploaded in systems;
S2. when user needs medical resource, to medical resource, supplier files a request;
S3. medical resource supplier gives two medical resource data sets, and Hadoop carries out burst, each map processing
In process, the burst of input is read using TextInputForma successively, calculates each object and the medical treatment needed for user
The distance between resource, and the object searched out is distributed to closest to the medical resource needed for user, by from needed for user
The nearest object of medical resource is selected and is gathered in Voronoi cells VmIn, export on the medical resource needed for user most
Neighbouring VCiOr most some neighbouring VCiThe id of collection, transfers data to user.
In another embodiment, handheld device medical resource demand client access mobile Internet, and provided with medical treatment
Source supplier establishes contact, and medical resource demand client sends medical resource requirement request, medical treatment money to medical resource supplier
The spatial geographic information of source demand client be collected to for perform described in embodiment 1-2 based on MapReduce with fall arrange
In the system of the extensive nearest Neighbor of Thiessen polygon, and the spatial geographic information of medical resource logistic car is by the system
System collection in real time, the system is using medical resource demand customer information as representative point, with medical resource position and medical resource thing
It is data set to flow lorry position data, by the logistic car geo-spatial data collection of medical resource progress burst, using based on
MapReduce and the extensive nearest Neighbor of the row's of falling Thiessen polygon are apart from medical resource demand client to find
The medical resource logistics truck position and medical resource position of arest neighbors.
As being explained further for such scheme, which is also referred to as medical dispensing vehicle, handheld device
Medical resource demand client is by the wireless network based on 2G/3G/4G modes or WIFI, while accessing mobile Internet
Establish and contact with medical resource supplier, medical resource demand client sends to medical resource supplier and asks, medical treatment dispatching doctor
Treat resource dispensing vehicle driver after the system is logged in, by spatial geographic information (including the positional information of oneself and medical treatment provide
Source demand customer information) medical resource supplier MapReduce is sent to arranging the extensive NN Query of Thiessen polygon
System, spatial geographic information spatial geographic information disclosed in system is to medical dispensing vehicle driver establish index.Medical resource needs
Ask client to obtain the information of present position by mobile phone, the positional information got is sent to system, system uses
MapReduce and the extensive NN Query index technology for arranging Thiessen polygon, are believed in the medical resource of magnanimity with client
Cease for representative point, using the sum data of logistics van as data set, logistics van data set is subjected to burst, finds distance symbol
The optimal logistics van of customer information is closed, the medical resource dispensing vehicle driver inquired appears in the doctor of medical resource demand client
Treat on resource requirement client software map interface, medical resource demand client selects medical resource dispensing vehicle driver, Ke Yixiang
System, which sends request or directly dials the mode of medical resource dispensing vehicle driver's phone, sends request, and server is by medical resource
Demand client request is forwarded to medical resource dispensing vehicle driver, and medical resource dispensing vehicle driver receipt is to after sending request, to asking
Ask and handle it and return to medical resource supplier, medical resource demand Client handset is sent to by medical resource supplier.
This can greatly improve search efficiency.In medical resource dispatching, such case is very common, if complete is manually one really
Very big workload.If but can be significantly using the extensive nearest Neighbor based on MapReduce and row's Thiessen polygon
The manual working time is reduced, reduces unnecessary expenditures.
Claims (1)
1. the method that a kind of IVkNN algorithm of use based on MapReduce is inquired about, its characterization step step are as follows:S1. root
The partition value file for including data set R and S is put according to IVI;S2. host node is loaded files into distributed caching, Mappers
R is read from distributed cachingi∈ R and SjThe key-value pair that each partition informations of ∈ S are formed, map functions generate new key-value pair,
For each object r ∈ R, wherein key is its subregion id, for each object s ∈ S, if dist (s, P)<Dist (r, P), then
Map functions also create one group of new key-value pair;Reducer iteratively reads R from MapperiAnd SiIn all the points, by VCi's
Identical key-value pair carries out kNN inquiries into group is divided, and checks SjAll subregions after, reducer output kNN (r, S), output
Key-value pair<R, kNN (r, S)>As kNN query results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711358195.7A CN108021689A (en) | 2017-10-19 | 2017-10-19 | The method inquired about using the IVkNN algorithms based on MapReduce |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711358195.7A CN108021689A (en) | 2017-10-19 | 2017-10-19 | The method inquired about using the IVkNN algorithms based on MapReduce |
CN201710975974.5A CN107844532A (en) | 2017-10-19 | 2017-10-19 | Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710975974.5A Division CN107844532A (en) | 2017-10-19 | 2017-10-19 | Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108021689A true CN108021689A (en) | 2018-05-11 |
Family
ID=61662375
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711358195.7A Pending CN108021689A (en) | 2017-10-19 | 2017-10-19 | The method inquired about using the IVkNN algorithms based on MapReduce |
CN201710975974.5A Pending CN107844532A (en) | 2017-10-19 | 2017-10-19 | Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710975974.5A Pending CN107844532A (en) | 2017-10-19 | 2017-10-19 | Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN108021689A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115203159A (en) * | 2022-07-25 | 2022-10-18 | 北京字跳网络技术有限公司 | Data storage method and device, computer equipment and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109918446A (en) * | 2019-02-25 | 2019-06-21 | 湖北金拓维信息技术有限公司 | A kind of distribution Thiessen polygon parallel constructing method |
CN110175546B (en) * | 2019-05-15 | 2022-02-25 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN118535652B (en) * | 2024-07-25 | 2024-11-01 | 卓世智星(青田)元宇宙科技有限公司 | Big data storage method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488679A (en) * | 2013-08-14 | 2014-01-01 | 大连大学 | Inverted grid index-based car-sharing system under mobile cloud computing environment |
CN106708989A (en) * | 2016-12-14 | 2017-05-24 | 大连大学 | Spatial time sequence data stream application-based Skyline query method |
-
2017
- 2017-10-19 CN CN201711358195.7A patent/CN108021689A/en active Pending
- 2017-10-19 CN CN201710975974.5A patent/CN107844532A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488679A (en) * | 2013-08-14 | 2014-01-01 | 大连大学 | Inverted grid index-based car-sharing system under mobile cloud computing environment |
CN106708989A (en) * | 2016-12-14 | 2017-05-24 | 大连大学 | Spatial time sequence data stream application-based Skyline query method |
Non-Patent Citations (5)
Title |
---|
CHANGQING JI等: "Inverted Grid-based kNN Query Processing with MapReduce", 《IEEE XPLORE》 * |
CHANGQING JI等: "Inverted Voronoi-Based kNN Query Processing with MapReduce", 《IEEE XPLORE》 * |
刘彪: "空间数据库中基于MapReduce的kNN算法研究", 《中国优秀硕士学位论文信息科技辑》 * |
吴晓兵: "基于Voronoi图的分布式反最近邻查询方法研究", 《中国优秀硕士学位论文信息科技辑》 * |
李钰: "基于MapReduce的空间数据RkNN算法研究", 《中国优秀硕士学位论文信息科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115203159A (en) * | 2022-07-25 | 2022-10-18 | 北京字跳网络技术有限公司 | Data storage method and device, computer equipment and storage medium |
CN115203159B (en) * | 2022-07-25 | 2024-06-04 | 北京字跳网络技术有限公司 | Data storage method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN107844532A (en) | 2018-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108021690A (en) | Arrange Voronoi indexes | |
Xu et al. | Blockchain-based cloudlet management for multimedia workflow in mobile cloud computing | |
Sashi et al. | Dynamic replication in a data grid using a modified BHR region based algorithm | |
CN108021689A (en) | The method inquired about using the IVkNN algorithms based on MapReduce | |
CN106708989A (en) | Spatial time sequence data stream application-based Skyline query method | |
CN107766496A (en) | Based on MapReduce and the extensive NN Query system for arranging Thiessen polygon | |
CN105760470A (en) | Medical calling system based on spatial reverse nearest neighbor query in cloud computing environment | |
CN109074304A (en) | The data distribution system of optimization | |
CN102981913B (en) | Inference control method and inference control system with support on large-scale distributed incremental computation | |
CN111400033B (en) | Platform resource cost allocation method and device, storage medium and computer equipment | |
Miao et al. | Task assignment with efficient federated preference learning in spatial crowdsourcing | |
CN111698332A (en) | Method, device and equipment for distributing business objects and storage medium | |
CN113300982A (en) | Resource allocation method, device, system and storage medium | |
CN108280175A (en) | The row's of the falling space index method divided based on medical services region | |
CN107818147A (en) | Distributed temporal index system based on Voronoi diagram | |
CN105761037A (en) | Logistics scheduling method based on space reverse neighbor search under cloud computing environment | |
Yang et al. | Efficient knowledge management for heterogeneous federated continual learning on resource-constrained edge devices | |
Zhang et al. | A real-time distributed cluster storage optimization for massive data in internet of multimedia things | |
CN108153910B (en) | Establishing distributed space-time multidimensional indexing system for mobile medical service | |
CN107832479A (en) | Medical aid request mobile calls method | |
Sun et al. | Big data trip classification on the New York City taxi and Uber sensor network | |
CN108804502A (en) | Big data inquiry system, method, computer equipment and storage medium | |
Deng | Database task processing optimization based on performance evaluation and machine learning algorithm. | |
Yang et al. | Joint heterogeneity-aware personalized federated search for energy efficient battery-powered edge computing | |
CN117035980A (en) | Resource borrowing evaluation method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180511 |
|
RJ01 | Rejection of invention patent application after publication |