[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3459104.3459192acmotherconferencesArticle/Chapter ViewAbstractPublication PagesiseeieConference Proceedingsconference-collections
research-article

kNN-join Query Processing Algorithm on Mapreduce for Large Amounts of Data

Published: 20 July 2021 Publication History

Abstract

Recently, the amount of data is rapidly increasing with the continuous development of computation and communication capabilities. So, it has been actively studied for the effective data analysis schemes of the large amounts of data on MapReduce which supports efficient parallel data processing for large-scale data. Among various queries for analysing data, k nearest neighbour (kNN) join query, which aims to combine the k nearest neighbours of each point of dataset R with those from another dataset S, has been considered typical. However, existing kNN join schemes on MapReduce require high computation cost for constructing and managing index structures. To solve the problems, we propose a kNN-join query processing algorithm on MapReduce for analysing large-scale data. First, our algorithm can reduce the overhead for constructing the index structure by using the seed-based dynamic partitioning. Second, it can reduce the computational overhead to find candidate partitions by using the average distance between a pair of neighbouring seeds. We show that our algorithm outperforms the existing scheme in terms of the query processing time.

References

[1]
J. Dean, S. Ghemawat, “MapReduce: Simplified data processing on large clusters”, Communications of the ACM, Vol. 51, No. 1, pp. 107-113, 2008.
[2]
D. Jiang, BC. Ooi, L. Shi, S. Wu, “The performance of MapReduce: An in-depth study”, Proceedings of the VLDB Endowment, Vol 3.1-2, pp. 472-483, 2010.
[3]
C. Ji, Y. Li, W. Qiu, U. Awada, K. Li, “Big Data Processing in Cloud Computing Environments”, ISPAN 2012, pp.17-23, 2012.
[4]
C. Bohm, HP. Kriegel, “A cost model and index architecture for the similarity join”, ICDE 2001, pp. 411-420, 2001.
[5]
C. Xia, H. Lu, BC. Ooi, J. Hu, “Gorder: An efficient method for knn join processing”, VLDB 2004, pp. 756-767, 2004.
[6]
W. Lu., Y. Shen, Y. Chen, BC. Ooi, “Efficient processing of k nearest neighbor joins using MapReduce”, Proceedings of the VLDB Endowment, Vol. 5(10), pp. 1016–1027, 2012.
[7]
C. Yu, B. Cui, S. Wang, J. Su, “Efficient index-based knn join processing for high-dimensional data”, Information and Software Technology, Vol. 49, No.4, pp. 332-344, 2007.
[8]
B. Yao, F. Li, P. Kumar, “K nearest neighbor queries and knn-joins in large relational databases (almost) for free”, ICDE 2010, pp 4-15, 2010.
[9]
R. Vernica, M. J. Carey, C. Li, “Efficient parallel set-similarity joins using MapReduce”, SIGMOD 2010, pp. 495–506, 2010.
[10]
A. Metwally, C. Faloutsos, “V-smart-join: A scalable MapReduce framework for all-pair similarity joins of multisets and vectors”, PVLDB 2012, Vol. 5(8), pp. 704–715, 2012.
[11]
R. Walpole, R. Myers, S. Myers, K. Ye, “Probability and Statistics for Engineers and Scientists”, Prentice-Hall 20

Cited By

View all
  • (2024)A Cluster-Based Approach to kNN Join Over Batch-Dynamic High-Dimensional DataAdvanced Data Mining and Applications10.1007/978-981-96-0814-0_6(81-96)Online publication date: 13-Dec-2024
  • (2023)Efficient continuous kNN join over dynamic high-dimensional dataWorld Wide Web10.1007/s11280-023-01204-926:6(3759-3794)Online publication date: 11-Sep-2023

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
ISEEIE 2021: 2021 International Symposium on Electrical, Electronics and Information Engineering
February 2021
644 pages
ISBN:9781450389839
DOI:10.1145/3459104
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 July 2021

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ISEEIE 2021

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)0
Reflects downloads up to 11 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)A Cluster-Based Approach to kNN Join Over Batch-Dynamic High-Dimensional DataAdvanced Data Mining and Applications10.1007/978-981-96-0814-0_6(81-96)Online publication date: 13-Dec-2024
  • (2023)Efficient continuous kNN join over dynamic high-dimensional dataWorld Wide Web10.1007/s11280-023-01204-926:6(3759-3794)Online publication date: 11-Sep-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format.

HTML Format

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media