More Web Proxy on the site http://driver.im/

research-article

Criss-Cross Hash Joins: Design and Analysis

Authors:

Stanley ZiontsAuthors Info & Claims

IEEE Transactions on Knowledge and Data Engineering, Volume 13, Issue 4

Pages 637 - 653

https://doi.org/10.1109/69.940737

Published: 01 July 2001 Publication History

Abstract

Join processing in relational database systems continues to be a difficult and challenging problem. In this research, we propose a criss-cross hash join strategy that draws from both hashing and indexing techniques, inheriting the advantages of each. To facilitate the criss-cross hash join, a simple data structure, termed page map, is introduced. The page maps aid in reducing the hashing effort incurred in the current hash based join methods. Furthermore, the page maps implicitly capture and exploit the possible inherent order among tuples in the relations, however partial it may be, to achieve superior performance. As the proposed methodology relies on the hashing scheme, the page maps are simpler, more compact, and easier to maintain than the traditional data structures associated with index based join methods. We develop the ideas intuitively first, followed by a formal development of the concepts and the algorithms. A detailed probabilistic analysis of the algorithms is presented and their performance is assessed through extensive empirical investigations. The empirical analysis suggests significant performance improvements over the current state-of-the-art hybrid hash method, especially in the presence of possible inherent order.

References

[1]

D.A. Bell D.H.O. Ling and S. McClean, “Pragmatic Estimation of Join Sizes and Attribute Correlations,” Proc. Conf. Data Eng., 1989.

Digital Library

[2]

M.W. Blasgen and K.P. Eswaran, “Storage and Access in Relational Databases,” IBM Systems J., vol. 16, no. 4, 1977.

Digital Library

[3]

K. Bratbergsengen, “Hashing Methods and Relational Algebra Operations,” Proc. Conf. Very Large Databases, 1984.

Digital Library

[4]

J.S.J. Chen and V.O.K. Li, “Optimizing Joins in Fragmented Database Systems on a Broadcast Local Network,” IEEE Trans. Software Eng., vol. 15, no. 1, Jan. 1989.

Digital Library

[5]

M.S. Chen M.L. Lo P.S. Yu and H.E. Young, “Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins,” IEEE Trans. Knowledge and Data Eng., vol. 7, no. 4, Aug. 1995.

Digital Library

[6]

M.S. Chen and P.S. Yu, “Interleaving a Join Sequence with SemiJoins in Distributed Query Processing,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 5, May 1992.

Digital Library

[7]

J. Cheng D. Haderle R. Hedges B. Iyer T. Messinger and C. Mohan, “An Efficient Hybrid Join Algorithm: A DB2 Prototype,” Proc. Seventh Int'l Conf. Data Eng., 1991.

Digital Library

[8]

B.C. Desai, “Performance of a Composite Attribute and Join Index,” IEEE Trans. Software Eng., vol. 15, no. 2, Feb. 1989.

Digital Library

[9]

D.J. DeWitt R.H. Katz F. Olken L.D. Shapiro and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.

Digital Library

[10]

R.D. Gopal R. Ramesh and Z. Zionts, “Join Processing in Relational Databases: Analysis, Design, and Prototype Development,” working paper, State Univ. of New York at Buffalo, 1998.

Digital Library

[11]

L.R. Gotlieb, “Computing Joins of Relations,” Proc. ACM-SIGMOD Int'l Conf. Management of Data, 1975.

Digital Library

[12]

P. Goyal H.F. Li E. Regener and F. Sadri, “Scheduling of Page Fetches in Join Operations Using Bc-Trees,” Proc. Conf. Data Eng., 1988.

Digital Library

[13]

H.I. Hsiao M.S. Chen and P.S. Yu, “On Parallel Execution of Multiple Pipelined Hash Joins,” Proc. ACM SIGMOD, 1994.

Digital Library

[14]

P. HweeHwa M.J. Carey and M. Livny, “Partially Preemptible Hash Joins,” Proc. ACM SIGMOD, 1993.

Digital Library

[15]

Y.E. Ioannidis and S. Christodoulakis, “Optimal Histograms for Limiting Worst-Case Error Propogation in the Size of Join Results,” ACM Trans. Database Systems, vol. 18, no. 4, 1993.

Digital Library

[16]

Y.E. Ioannidis and Y.C. Kang, “Left-Deep vs. Bushy Trees: An Analysis of Strategy Spaces and its implications for Query Optimization,” Proc. ACM SIGMOD, 1991.

Digital Library

[17]

M. Kitsuregawa H. Tanaka and T. Moto-Oka, “Application of Hash to Database Machine and Its Architecture,” New Generation Computing, 1983.

[18]

D.E. Knuth, The Art of Computer Programming. Addison-Wesley, 1973.

Digital Library

[19]

T. Lehman and M. Carey, “Query Processing in Main Memory Database Systems,” Proc. SIGMOD, 1986.

Digital Library

[20]

M.L. Lo M.S. Chen C.V. Ravishanker and P.S. Yu, “On Optimal Processor Allocation to Support Pipelined Hash Joins,” Proc. ACM SIGMOD, 1993.

Digital Library

[21]

M. Matysiak, “Efficient Optimization of Large Join Queries Using Tabu Search,” Information Sciences, vol. 83, 1995.

Digital Library

[22]

K.P. Mikkilineni and S.Y.W. Su, “An Evaluation of Relational Join Algorithms in a Pipelined Query Processing Environment,” IEEE Trans. Software Eng., vol. 14, no. 6, June 1988.

Digital Library

[23]

P. Mishra and M. Eich, “Join Processing in Relational Databases,” ACM Computing Surveys, 1992.

Digital Library

[24]

M. Negri and G. Pelagatti, “Distributive Join—A New Algorithm for Joining Relations,” ACM Trans. Database Systems, vol. 16, no. 4, 1991.

Digital Library

[25]

S. Pramanik and D. Vineyard, “Optimizing Join Queries in Distributed Databases,” IEEE Trans. Software Eng., vol. 14, no. 9, 1988.

Digital Library

[26]

D.J. Reid, “Optimal Distributed Execution of Join Queries,” Computers and Math. with Applications, vol. 27, no. 11, 1994.

[27]

N. Roussopoulos, “An Incremental Access Method for ViewCache: Concepts, Algorithms, and Cost Analysis,” ACM Trans. Database Systems, vol. 16, no. 3, 1991.

Digital Library

[28]

L.D. Shapiro, “Join Processing in Database Systems with Large Main Memories,” ACM Trans. Database Systems, vol. 11, no. 3, 1986.

Digital Library

[29]

D. Shasha and T.L. Wang, “Optimizing Equijoin Queries in Distributed Databases where Relations Are Hash Partitioned,” ACM Trans. Database Systems, vol. 16, no. 2, 1991.

Digital Library

[30]

E.J. Shekita and M.J. Carey, “A Performance Evaluation of Pointer-Based Joins,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 1990.

Digital Library

[31]

J.W. Stamos and H.C. Young, “A Symmetrical Fragment and Replicate Algorithm for Distributed Joins,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 12, Dec. 1993.

Digital Library

[32]

A. Swami, “Optimization of Large Join Queries: Combining Heuristics with Combinatorial Techniques,” Proc. ACM SIGMOD, 1989.

Digital Library

[33]

Y.C. Tay, “On Optimality of Strategies for Multiple Joins,” J. ACM, vol. 40, no. 5, 1993.

Digital Library

[34]

J.D. Ullman, “Implementation of Logical Query Languages for Databases,” ACM Trans. Database Systems, vol. 10, no. 2, 1985.

Digital Library

[35]

P. Valduriez, “Join Indices,” ACM Trans. Database Systems, vol. 12, no. 2, 1987.

Digital Library

[36]

J.L. Wolf P.S. Yu J. Turek and D.M. Dias, “A Parallel Hash Join Algorithm for Managing Data Skew,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 12, 1993.

Digital Library

[37]

C. Zaniolo, “The Representation and Deductive Retrieval of Complex Objects,” Proc. 11th VLDB, 1985.

Cited By

Ullmann J(2007)Partition search for non-binary constraint satisfactionInformation Sciences: an International Journal10.1016/j.ins.2007.03.030177:18(3639-3678)Online publication date: 1-Sep-2007
https://dl.acm.org/doi/10.1016/j.ins.2007.03.030

Index Terms

Criss-Cross Hash Joins: Design and Analysis
1. Information systems
  1. Information retrieval

Recommendations

Faster joins, self-joins and multi-way joins using join indices
Ranked Reverse Nearest Neighbor Search

Given a set of data points P and a query point q in a multidimensional space, Reverse Nearest Neighbor (RNN) query finds data points in P whose nearest neighbors are q. Reverse k-Nearest Neighbor (RkNN) query (where k ≥ 1) generalizes RNN query to find ...
Faster joins, self-joins and multi-way joins using join indices
Special issue: next generation information technologies and systems

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering

IEEE Transactions on Knowledge and Data Engineering Volume 13, Issue 4

July 2001

190 pages

ISSN:1041-4347

Issue’s Table of Contents

Copyright © Copyright © 2001 IEEE. All Rights Reserved.

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 July 2001

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
0
Total Downloads

Downloads (Last 12 months)0
Downloads (Last 6 weeks)0

Reflects downloads up to 03 Jan 2025

Other Metrics

View Author Metrics

Citations

Cited By

Ullmann J(2007)Partition search for non-binary constraint satisfactionInformation Sciences: an International Journal10.1016/j.ins.2007.03.030177:18(3639-3678)Online publication date: 1-Sep-2007
https://dl.acm.org/doi/10.1016/j.ins.2007.03.030

View Options

View options

Media

Figures

Other

Tables

View Issue’s Table of Contents