[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN108021689A - The method inquired about using the IVkNN algorithms based on MapReduce - Google Patents

The method inquired about using the IVkNN algorithms based on MapReduce Download PDF

Info

Publication number
CN108021689A
CN108021689A CN201711358195.7A CN201711358195A CN108021689A CN 108021689 A CN108021689 A CN 108021689A CN 201711358195 A CN201711358195 A CN 201711358195A CN 108021689 A CN108021689 A CN 108021689A
Authority
CN
China
Prior art keywords
mapreduce
key
value pair
data set
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711358195.7A
Other languages
Chinese (zh)
Inventor
季长清
汪祖民
秦静
吴锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University filed Critical Dalian University
Priority to CN201711358195.7A priority Critical patent/CN108021689A/en
Publication of CN108021689A publication Critical patent/CN108021689A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2237Vectors, bitmaps or matrices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

This divisional application discloses the method that a kind of IVkNN algorithm of use based on MapReduce is inquired about, and belongs to cloud computing, big data field, is for solving the problems, such as:Available data querying method index efficiency is improved, technical essential is:Host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and SjThe key-value pair that each partition informations of ∈ S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its subregion id, and effect is:Algorithm index efficiency can be improved so that MapReduce can be no longer influenced by room and time influence.

Description

The method inquired about using the IVkNN algorithms based on MapReduce
The application for application number 2017109759745, applying date 2017-10-19, denomination of invention " based on MapReduce with The divisional application of the extensive nearest Neighbor of row's Thiessen polygon ".
Technical field
The invention belongs to cloud computing, big data field, is related to a kind of can be effectively improved under distributed environment and inquires about effect The MapReduce indexes of rate.
Background technology
MapReduce is a kind of currently a popular programming framework based on cloud platform, it can handle and generate large-scale number According to collection, it supports data-intensive application using the shared cluster of nothing.Processing step is specially:In distributed cache system In, it is that key/value pair among one group is generated in map functions, according to phase by MapReduce tasks when handling a key/value pair With middle key merge all medians, each map is independently of each other operations, i.e., all maps can be held parallel OK.One group " reducers " of MapReduce can perform reduction operation, and the output that the Map with identical key is operated at the same time may be used With reduction to same reducer.But one reduction procedure of isolated operation may be such that inefficiency;
MapReduce can be used for supporting than traditional more massive data processing of commerce server cluster, it can be The data of a PB quantity can be only handled in a few houres, carrying out data directory using MapReduce has preferable application Prospect.However, parallel processing of the existing Index Algorithm due to not adapting to MapReduce, builds the time consumption of index not Enough ideals, scalability are bad.
The content of the invention
In order to improve available data querying method index efficiency, the present invention provides following scheme:
A kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, includes the following steps:
S1. the row's of falling Voronoi indexes based on MapReduce are constructed;
S2. subregion is carried out to data set R and S using the row's of falling Voronoi indexes;
S3. the IVKNN based on MapReduce is used to carry out distribution kNN inquiries.
Construction is as follows the step of arranging Voronoi indexes based on MapReduce:
Two datasets R is given in S1.1.d dimension spaces and S, Hadoop carry out burst, part mappers is parallel at the same time to be transported OK, in MapReduce tasks, using the reducer of acquiescence, before map functions are started, generation is obtained using pre- clustering algorithm Table point p, and be loaded into the main memory of each map;
S1.2. in each map treatment progress, the burst of input is read using TextInputForma successively, TextInputFormat reads data into the example of Mapper from file, calculate respectively m object r in data set R with respectively N object s in the distance between point p, data set S is represented with representing the distance between point p, and by range data collection R J-th of object s in i-th of object r and data set SjImmediate representative point PijSelect and be gathered in Voronoi cells VCiIn, subregion VC1~VCm, the i order for forming m VC are taken all over 1~m, also, are worked as i orders and taken one in 1~m During occurrence, under the value of i, j orders take all values all over 1~n, obtain the immediate Voronoi for representing point of m storage Cell;Output<VCi, List (Pij)>It is right;
Known query point p, differentiates its closest VCiOr most some neighbouring VCiCollection, it is most adjacent described in mapper outputs Near VCiOr most some neighbouring VCiCollect r, s object in corresponding raw data set R and/or S, and export closest VCi Or most some neighbouring VCiThe id of collection;
S1.3., mapper is output to the file system of Hadoop.
In the step S1.2, j-th in i-th of object r and data set S in the R by range data collection Object sjImmediate representative point PijSelect and be gathered in Voronoi cells VCiIn, formed m VC subregion VC1~ The process of VCm is as follows:
First object r in data set R1With first object s in data set S1The immediate point that represents as P11, number According to first object r in collection R1With second object s in data set S2It is immediate represent point as P12, first object r in data set R1With n-th of object s in data set SnImmediate represent a little For P1n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V1In;
······
M-th of object r in data set RmWith first object s in data set S1The immediate point that represents as Pm1, number According to m-th of object r in collection RmWith second object s in data set S2The immediate point that represents as Pm2, M-th of object r in data set RmWith n-th of object s in data set SnThe immediate point that represents as Pmn, the n closest Representative point select and be gathered in Voronoi cells VmIn.
The step of being inquired about using the IVkNN algorithms based on MapReduce is as follows:
S1. the partition value file for including data set R and S is put according to IVI;
S2. host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and Sj∈ The key-value pair that each partition informations of S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its point Area id, for each object s ∈ S, if dist (s, P)<Dist (r, P), then map functions also create one group of new key-value pair;
Reducer iteratively reads R from MapperiAnd SiIn all the points, by VCiIdentical key-value pair into dividing group, KNN inquiries are carried out, check SjAll subregions after, reducer output kNN (r, S), export key-value pair<R, kNN (r, S)>As KNN query results.
The row's of falling Voronoi indexes include two parts:Master index, it includes all cluster centres;Second index, It include being stored in each VC to as queue.
Beneficial effect:The present invention is that one kind handles kNN inquiries using MapReduce, and can using Voronoi diagram Parallelization suitable for MapReduce is handled, so as to improving algorithm index efficiency so that MapReduce can be no longer influenced by Room and time influences.
Brief description of the drawings
Fig. 1 is Voronoi diagram;
Fig. 2 is Voronoi diagram index;
Fig. 3 is that influence of the node data change to RSD and SDS data collection structure index time in MRIV structures contrasts Figure.
Specific embodiment party
Embodiment 1:A kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, wherein: MapReduce is existing programming model, and for the concurrent operation of large-scale dataset, described method includes following steps:
S1. construct based on MapReduce the row's of falling Voronoi indexes (Thiessen polygon index, InvertedVoronoiIndex,IVI);
S2. using fall row Voronoi indexes to data set R and S progress subregion, obtain VC subregions, two subregions be by Need to carry out when establishing of Voronoi diagram in the later stage, Voronoi diagram needs two subregions to combine, thus carries out in this step The subregions of two datasets;
S3. the IVKNN based on MapReduce is used to carry out distribution kNN (nearest neighbor algorithm) inquiries, IVKNN is to utilize IVI Fall row Voronoi indexes.
Wherein:Construction is as follows the step of arranging Voronoi indexes based on MapReduce:
Two datasets R is given in S1.d dimension spaces and S, Hadoop (distributed system architecture) carry out burst, portion Divide mappers (a kind of existing coding and decoding) while parallel operation, in MapReduce tasks, use the reducer of acquiescence (subtask merging process), before map functions are started, obtains representing point p, and be loaded into each map using pre- clustering algorithm In the main memory of (subtask, which is decomposed, to be performed);
Wherein:Inside, is clustered the data clusters of point by the acquisition methods of representative, definite internal cluster and consecutive points, Select cluster centre after cluster to be indexed, required data are to cluster a consecutive points for connection with internal, are clustered with this inside Point is the center of circle, and circle is established comprising adjacent cluster centre point, and Delaunay triangles are used as the triangle of circumscribed circle using this circle Two different inside cluster points are established Delaunay triangles in this method, the two Delaunay triangles by shape respectively Delaunay triangulation network is established by common ground of consecutive points, data object is divided into several big subregions, selects wherein one cluster Point is represented as representing a little, each object being divided is to be clustered in a Voronoi unit, each Voronoi grids In contain object id.
S2. in each map treatment progress, read successively using TextInputForma (a kind of existing read mode) The burst of input is taken, TextInputFormat reads data into the example of Mapper from file, calculates respectively in data set R The distance between the distance between m object r and each representative point p, n object s in data set S and representative point p, and will be away from From j-th of object s in i-th of the object r and data set S in data set RjImmediate representative point PijSelect and be gathered in Voronoi cells VCiIn, subregion VC1~VCm, the i order for forming m VC (subregion of Voronoi diagram) are taken all over 1 ~m, also, when i orders take an occurrence in 1~m, under the value of i, j orders take all values all over 1~n, obtain The immediate Voronoi cells for representing point of m storage;Output<VCi, List (Pij)>It is right;
I.e.:First object r in data set R1With first object s in data set S1It is immediate represent point as P11, first object r in data set R1With second object s in data set S2It is immediate represent point as P12, first object r in data set R1With n-th of object s in data set SnImmediate represent a little For P1n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V1In;
Second object r in data set R2With first object s in data set S1The immediate point that represents as P21, number According to second object r in collection R2With second object s in data set S2It is immediate represent point as P22, second object r in data set R2With n-th of object s in data set SnImmediate represent a little For P2n, a immediate representatives of the n, which are put, to be selected and is gathered in Voronoi cells V2In;
······
M-th of object r in data set RmWith first object s in data set S1The immediate point that represents as Pm1, number According to m-th of object r in collection RmWith second object s in data set S2The immediate point that represents as Pm2, M-th of object r in data set RmWith n-th of object s in data set SnThe immediate point that represents as Pmn, the n closest Representative point select and be gathered in Voronoi cells VmIn;
Known query point p, differentiates its closest VCiOr most some neighbouring VCiCollection, it is most adjacent described in mapper outputs Near VCiOr most some neighbouring VCiCollect r, s object in corresponding raw data set R and/or S, and export closest VCi Or most some neighbouring VCiThe id of collection;
S3., mapper is output to the file system of Hadoop.
The step of being inquired about using the IVkNN algorithms based on MapReduce is as follows:
S1. the partition value file for including data set R and S is put according to IVI (arranging Voronoi indexes);
S2. host node is loaded files into distributed caching, and Mappers reads R from distributed cachingi∈ R and Sj∈ The key-value pair that each partition informations of S are formed, map functions generate new key-value pair, and for each object r ∈ R, wherein key is its point Area id, is worth and is made of k and v, for each object s ∈ S, if dist (s, Pij)<Dist (r, Pij), then map functions also create One group of new key-value pair;
Reducer iteratively reads R from MapperiAnd SjIn all the points, by VCiIdentical key-value pair into dividing group, KNN inquiries are carried out, check SjAll subregions after, reducer output kNN (r, S), export key-value pair<R, kNN (r, S)>As KNN query results.
The IVI, IVI include two parts:Master index, it includes all cluster centres;Second index, it includes storage There are each VC to as queue.
Embodiment 2:The present embodiment can implement as independent technical solution or as in embodiment 1 each scheme into one Step explanation, present embodiments provides a kind of extensive nearest Neighbor based on MapReduce with row's Thiessen polygon, this Method is a kind of highly effective algorithm that kNN inquiries are handled using Voronoi diagram based on MapReduce, it can also solve to move Dynamic medical call system meets wireless penetration, networking, mobile this future developing trend.The present embodiment is also for the prior art In deficiency improved, possess good high efficiency and scalability.To achieve these goals, the present embodiment is used Technical solution perform step it is as follows:Carry out MapReduce and arrange building for the extensive NN Query index of Thiessen polygon It is vertical;MapReduce is a kind of currently a popular programming framework based on cloud platform, it can handle and generate large data collection, It supports data-intensive application using the shared cluster of nothing.Processing step is specially:In distributed cache system, by MapReduce tasks are that key/value pair among one group is generated in map functions, according to identical when handling a key/value pair Middle key merges all medians.Each map is independently of each other operations, i.e., all maps can be performed parallel --- Although actually this is limited to separate data source or data source CPU quantity nearby.Likewise, one group " reducers " can be performed Reduction operation, the output that the Map with identical key is operated at the same time can be with reduction to same reducer.Although with serial calculation Method is compared, this process it is possible that inefficiency (because must run a multiple rather than reduction procedure), but MapReduce can be used for supporting than traditional more massive data processing of commerce server cluster, it can be only several small When the interior data that can handle a PB quantity, we are mainly used in Spark systems in practice, have been obtained obvious Improved efficiency, the method execution efficiency more traditional than before use improve more than 12%.
The present embodiment includes two parts using MapReduce structures IVI, IVI:In master index, including all clusters The heart;Second index, including be stored in each VC to as queue.Inverted index is for effectively index position and query object Data object in adjacent queue.When a given inquiry, we can differentiate closest VC or most some are neighbouring VC collection.Then the corresponding queue element (QE)s of these VC are included to come, so as to obtain kNN query results.Due to Voronoi diagram It can be obtained with merging multiple Voronoi diagrams (VP) by splitting, be suitable for so construction falls to arrange Voronoi indexes MapReduce model.The Voronoi for particularly every sub- VP being merged to the end.
Construction falls to arrange Voronoi indexes and concretely comprises the following steps:
S1. give and two datasets R and S are given in d dimension spaces.Hadoop peace default mechanisms carry out burst.Some Mappers parallel operations at the same time.In MapReduce tasks, we use the reducer given tacit consent to.Start map functions it Before, we obtain representing point p using quick pre- clustering algorithm, and are loaded into the main memory of each map.
S2. in each map treatment progress, it will read the burst of input using TextInputFormat successively (pressing the input format in distributed file system), TextInputFormat can read data to the reality of Mapper from file In example.Each r, the distance between s objects and p points are calculated, and by r, s distributes to immediate representative point P. in algorithm In 2-3 rows, each point is collected in a Voronoi cell, it will be produced into m Voronoi cell, in algorithm It can be exported in 4-6 rows<VCm, List (Pi)>Right, mapper output raw data sets (R or S) arrive each of hithermost subregion A object r, s and its subregion VCmId.
Finally, in algorithm 8-10 rows, it would be desirable to needed according to what is controlled oneself by customized Mapper is output to the file system of Hadoop by MultipleOutputFormat functions.It is determined how task result Write back in the lasting storage of bottom.Voronoi index of the structure based on MapReduce is described in detail in we in algorithm 1 The algorithm pseudo code of structure.Using IVI, if given one represents a little, we can start MapReduce tasks and come into line number According to subregion and collect some data messages of each subregion.
In practice, we first by varying clustered node number, the number change from 2 to 32, is based on to assess structure The row's of falling Voronoi (MRIV) index efficiency of MapReduce.Fig. 2 illustrates node data change to RSD and SDS data collection structure Index the influence of time.In Fig. 2, index structure almost linearly increases with the increase of clustered node number.Practice result is also demonstrate,proved Understanding the structure of MRIV indexes needs less time, and final Voronoi can be obtained by merging multiple Voronoi subgraphs Therefore handled suitable for MapReduce parallelizations.In addition, IVI employs inverted index structure, its scalability is better than its other party Method, to sum up, indexes large-scale spatial data, and it is most suitable method to arrange Voronoi indexes.
Embodiment 3:In today that social security service develops rapidly at a high speed, the living standard of people increasingly improves, Also become more hommization and personalization for the demand of medical services.Also there are more and more people to need more at the same time Prompt and perfect medical services.
With mobile communication and the rapid growth based on location-based service correlation technique, cloud computing, big data, Internet of Things, shifting The technologies such as dynamic calculating and space orientation are progressively ripe, and such as GPS, camera, blue-teeth data are also constantly increasing, and emerge in large numbers Substantial amounts of spatial data, this to be faced with huge challenge in the storage and processing of various spatial datas or object.Therefore, with The development of IT application process, electronic health record, nursing call center system, extensive medical data base etc. in industry of medical care Using being also rapidly developing, improve work efficiency, improve medical services, Economy type medicine cost etc. played it is more and more Effect.
But China's geographical environment difference is huge, economic development is uneven, medical resource skewness weighing apparatus, developed regions with Outlying district is compared, and medical level is there is also very big difference, while as rural area is to the at full speed of the industries such as urban migration, tourism Development so that exponentially type increases on the basis of script population mobility is big, is frequently encountered just to a place, runs into disease Disease is unknown to where see a doctor, and stands in the queue to register it is more likely that need several months ahead of time to buy tickets, tosses about multiple hospitals by bus, most The problem of a large amount of manpower financial capacities have been wasted on road at last, and disease is not got timely medical treatment but.Being frequently encountered needs emergency treatment When, do not know but around have what hospital, which hospital can handle this dangerous situation, which hospital position is more preferable closer to, service, Because the delay time at stop, cause treatment not in time, or even tragedy because of delay treatment and lethal can occur.But with the big data epoch Arrival, there is the mode that more related medical resources are inquired about.Medical resource inquiry makes patient more easily find certainly Oneself required data, for example, patient can inquire the hospital nearest from oneself by medical resource, since it is convenient just Doctor.Based on the starting point, we have designed and Implemented the invention.
In the present embodiment, to the big rule based on MapReduce with row's Thiessen polygon described in above-mentioned two embodiment The concrete application of mould nearest Neighbor is made an explanation.
In one embodiment, the process returned based on following steps realization index, request processing, data:
S1. hospital, which establishes, is used to perform based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon System;
S2. user initiates to ask by ambulatory medical device to hospital;
S3. when hospital receives user's request, agreed to, and user requested data is returned into user.
Medical knowledge is sent specific to patient to ask for or during medical aid request, its mobile calls method is as follows:
S1. hospital, which establishes, is used to perform based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon System, so that user can effectively improve search efficiency under distributed environment.
S2. patient user, when being badly in need of medical knowledge and/or medical aid, can be exhaled outside hospital by portable medical Equipment is made to send respective request to hospital.
S3. when hospital receives user's request, row's Thiessen polygon index is transferred down, performs distribution Skyline or kNN Deng spatial query algorithms (being not limited to these algorithms), agreement decision-making is carried out if meeting condition, and user requested data is returned Back to user.
In another embodiment, the extensive NN Query side based on MapReduce with row's Thiessen polygon Method is applied to medical material dispatching, it realizes that medical resource dispatching includes the following steps:
S1. medical resource supplier is established to perform and is looked into based on MapReduce and the extensive neighbour of row's Thiessen polygon The system of inquiry, and medical resource data are uploaded in systems;
S2. when user needs medical resource, to medical resource, supplier files a request;
S3. medical resource supplier gives two medical resource data sets, and Hadoop carries out burst, each map processing In process, the burst of input is read using TextInputForma successively, calculates each object and the medical treatment needed for user The distance between resource, and the object searched out is distributed to closest to the medical resource needed for user, by from needed for user The nearest object of medical resource is selected and is gathered in Voronoi cells VmIn, export on the medical resource needed for user most Neighbouring VCiOr most some neighbouring VCiThe id of collection, transfers data to user.
In another embodiment, handheld device medical resource demand client access mobile Internet, and provided with medical treatment Source supplier establishes contact, and medical resource demand client sends medical resource requirement request, medical treatment money to medical resource supplier The spatial geographic information of source demand client be collected to for perform described in embodiment 1-2 based on MapReduce with fall arrange In the system of the extensive nearest Neighbor of Thiessen polygon, and the spatial geographic information of medical resource logistic car is by the system System collection in real time, the system is using medical resource demand customer information as representative point, with medical resource position and medical resource thing It is data set to flow lorry position data, by the logistic car geo-spatial data collection of medical resource progress burst, using based on MapReduce and the extensive nearest Neighbor of the row's of falling Thiessen polygon are apart from medical resource demand client to find The medical resource logistics truck position and medical resource position of arest neighbors.
As being explained further for such scheme, which is also referred to as medical dispensing vehicle, handheld device Medical resource demand client is by the wireless network based on 2G/3G/4G modes or WIFI, while accessing mobile Internet Establish and contact with medical resource supplier, medical resource demand client sends to medical resource supplier and asks, medical treatment dispatching doctor Treat resource dispensing vehicle driver after the system is logged in, by spatial geographic information (including the positional information of oneself and medical treatment provide Source demand customer information) medical resource supplier MapReduce is sent to arranging the extensive NN Query of Thiessen polygon System, spatial geographic information spatial geographic information disclosed in system is to medical dispensing vehicle driver establish index.Medical resource needs Ask client to obtain the information of present position by mobile phone, the positional information got is sent to system, system uses MapReduce and the extensive NN Query index technology for arranging Thiessen polygon, are believed in the medical resource of magnanimity with client Cease for representative point, using the sum data of logistics van as data set, logistics van data set is subjected to burst, finds distance symbol The optimal logistics van of customer information is closed, the medical resource dispensing vehicle driver inquired appears in the doctor of medical resource demand client Treat on resource requirement client software map interface, medical resource demand client selects medical resource dispensing vehicle driver, Ke Yixiang System, which sends request or directly dials the mode of medical resource dispensing vehicle driver's phone, sends request, and server is by medical resource Demand client request is forwarded to medical resource dispensing vehicle driver, and medical resource dispensing vehicle driver receipt is to after sending request, to asking Ask and handle it and return to medical resource supplier, medical resource demand Client handset is sent to by medical resource supplier. This can greatly improve search efficiency.In medical resource dispatching, such case is very common, if complete is manually one really Very big workload.If but can be significantly using the extensive nearest Neighbor based on MapReduce and row's Thiessen polygon The manual working time is reduced, reduces unnecessary expenditures.

Claims (1)

1. the method that a kind of IVkNN algorithm of use based on MapReduce is inquired about, its characterization step step are as follows:S1. root The partition value file for including data set R and S is put according to IVI;S2. host node is loaded files into distributed caching, Mappers R is read from distributed cachingi∈ R and SjThe key-value pair that each partition informations of ∈ S are formed, map functions generate new key-value pair, For each object r ∈ R, wherein key is its subregion id, for each object s ∈ S, if dist (s, P)<Dist (r, P), then Map functions also create one group of new key-value pair;Reducer iteratively reads R from MapperiAnd SiIn all the points, by VCi's Identical key-value pair carries out kNN inquiries into group is divided, and checks SjAll subregions after, reducer output kNN (r, S), output Key-value pair<R, kNN (r, S)>As kNN query results.
CN201711358195.7A 2017-10-19 2017-10-19 The method inquired about using the IVkNN algorithms based on MapReduce Pending CN108021689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711358195.7A CN108021689A (en) 2017-10-19 2017-10-19 The method inquired about using the IVkNN algorithms based on MapReduce

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711358195.7A CN108021689A (en) 2017-10-19 2017-10-19 The method inquired about using the IVkNN algorithms based on MapReduce
CN201710975974.5A CN107844532A (en) 2017-10-19 2017-10-19 Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710975974.5A Division CN107844532A (en) 2017-10-19 2017-10-19 Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon

Publications (1)

Publication Number Publication Date
CN108021689A true CN108021689A (en) 2018-05-11

Family

ID=61662375

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201711358195.7A Pending CN108021689A (en) 2017-10-19 2017-10-19 The method inquired about using the IVkNN algorithms based on MapReduce
CN201710975974.5A Pending CN107844532A (en) 2017-10-19 2017-10-19 Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710975974.5A Pending CN107844532A (en) 2017-10-19 2017-10-19 Based on MapReduce and the extensive nearest Neighbor for arranging Thiessen polygon

Country Status (1)

Country Link
CN (2) CN108021689A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203159A (en) * 2022-07-25 2022-10-18 北京字跳网络技术有限公司 Data storage method and device, computer equipment and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918446A (en) * 2019-02-25 2019-06-21 湖北金拓维信息技术有限公司 A kind of distribution Thiessen polygon parallel constructing method
CN110175546B (en) * 2019-05-15 2022-02-25 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium
CN118535652B (en) * 2024-07-25 2024-11-01 卓世智星(青田)元宇宙科技有限公司 Big data storage method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488679A (en) * 2013-08-14 2014-01-01 大连大学 Inverted grid index-based car-sharing system under mobile cloud computing environment
CN106708989A (en) * 2016-12-14 2017-05-24 大连大学 Spatial time sequence data stream application-based Skyline query method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103488679A (en) * 2013-08-14 2014-01-01 大连大学 Inverted grid index-based car-sharing system under mobile cloud computing environment
CN106708989A (en) * 2016-12-14 2017-05-24 大连大学 Spatial time sequence data stream application-based Skyline query method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHANGQING JI等: "Inverted Grid-based kNN Query Processing with MapReduce", 《IEEE XPLORE》 *
CHANGQING JI等: "Inverted Voronoi-Based kNN Query Processing with MapReduce", 《IEEE XPLORE》 *
刘彪: "空间数据库中基于MapReduce的kNN算法研究", 《中国优秀硕士学位论文信息科技辑》 *
吴晓兵: "基于Voronoi图的分布式反最近邻查询方法研究", 《中国优秀硕士学位论文信息科技辑》 *
李钰: "基于MapReduce的空间数据RkNN算法研究", 《中国优秀硕士学位论文信息科技辑》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203159A (en) * 2022-07-25 2022-10-18 北京字跳网络技术有限公司 Data storage method and device, computer equipment and storage medium
CN115203159B (en) * 2022-07-25 2024-06-04 北京字跳网络技术有限公司 Data storage method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN107844532A (en) 2018-03-27

Similar Documents

Publication Publication Date Title
CN108021690A (en) Arrange Voronoi indexes
Xu et al. Blockchain-based cloudlet management for multimedia workflow in mobile cloud computing
Sashi et al. Dynamic replication in a data grid using a modified BHR region based algorithm
CN108021689A (en) The method inquired about using the IVkNN algorithms based on MapReduce
CN106708989A (en) Spatial time sequence data stream application-based Skyline query method
CN107766496A (en) Based on MapReduce and the extensive NN Query system for arranging Thiessen polygon
CN105760470A (en) Medical calling system based on spatial reverse nearest neighbor query in cloud computing environment
CN109074304A (en) The data distribution system of optimization
CN102981913B (en) Inference control method and inference control system with support on large-scale distributed incremental computation
CN111400033B (en) Platform resource cost allocation method and device, storage medium and computer equipment
Miao et al. Task assignment with efficient federated preference learning in spatial crowdsourcing
CN111698332A (en) Method, device and equipment for distributing business objects and storage medium
CN113300982A (en) Resource allocation method, device, system and storage medium
CN108280175A (en) The row&#39;s of the falling space index method divided based on medical services region
CN107818147A (en) Distributed temporal index system based on Voronoi diagram
CN105761037A (en) Logistics scheduling method based on space reverse neighbor search under cloud computing environment
Yang et al. Efficient knowledge management for heterogeneous federated continual learning on resource-constrained edge devices
Zhang et al. A real-time distributed cluster storage optimization for massive data in internet of multimedia things
CN108153910B (en) Establishing distributed space-time multidimensional indexing system for mobile medical service
CN107832479A (en) Medical aid request mobile calls method
Sun et al. Big data trip classification on the New York City taxi and Uber sensor network
CN108804502A (en) Big data inquiry system, method, computer equipment and storage medium
Deng Database task processing optimization based on performance evaluation and machine learning algorithm.
Yang et al. Joint heterogeneity-aware personalized federated search for energy efficient battery-powered edge computing
CN117035980A (en) Resource borrowing evaluation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180511

RJ01 Rejection of invention patent application after publication