[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Criss-Cross Hash Joins: Design and Analysis

Published: 01 July 2001 Publication History

Abstract

Join processing in relational database systems continues to be a difficult and challenging problem. In this research, we propose a criss-cross hash join strategy that draws from both hashing and indexing techniques, inheriting the advantages of each. To facilitate the criss-cross hash join, a simple data structure, termed page map, is introduced. The page maps aid in reducing the hashing effort incurred in the current hash based join methods. Furthermore, the page maps implicitly capture and exploit the possible inherent order among tuples in the relations, however partial it may be, to achieve superior performance. As the proposed methodology relies on the hashing scheme, the page maps are simpler, more compact, and easier to maintain than the traditional data structures associated with index based join methods. We develop the ideas intuitively first, followed by a formal development of the concepts and the algorithms. A detailed probabilistic analysis of the algorithms is presented and their performance is assessed through extensive empirical investigations. The empirical analysis suggests significant performance improvements over the current state-of-the-art hybrid hash method, especially in the presence of possible inherent order.

References

[1]
D.A. Bell D.H.O. Ling and S. McClean, “Pragmatic Estimation of Join Sizes and Attribute Correlations,” Proc. Conf. Data Eng., 1989.
[2]
M.W. Blasgen and K.P. Eswaran, “Storage and Access in Relational Databases,” IBM Systems J., vol. 16, no. 4, 1977.
[3]
K. Bratbergsengen, “Hashing Methods and Relational Algebra Operations,” Proc. Conf. Very Large Databases, 1984.
[4]
J.S.J. Chen and V.O.K. Li, “Optimizing Joins in Fragmented Database Systems on a Broadcast Local Network,” IEEE Trans. Software Eng., vol. 15, no. 1, Jan. 1989.
[5]
M.S. Chen M.L. Lo P.S. Yu and H.E. Young, “Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins,” IEEE Trans. Knowledge and Data Eng., vol. 7, no. 4, Aug. 1995.
[6]
M.S. Chen and P.S. Yu, “Interleaving a Join Sequence with SemiJoins in Distributed Query Processing,” IEEE Trans. Parallel and Distributed Systems, vol. 3, no. 5, May 1992.
[7]
J. Cheng D. Haderle R. Hedges B. Iyer T. Messinger and C. Mohan, “An Efficient Hybrid Join Algorithm: A DB2 Prototype,” Proc. Seventh Int'l Conf. Data Eng., 1991.
[8]
B.C. Desai, “Performance of a Composite Attribute and Join Index,” IEEE Trans. Software Eng., vol. 15, no. 2, Feb. 1989.
[9]
D.J. DeWitt R.H. Katz F. Olken L.D. Shapiro and M.R. Stonebraker, “Implementation Techniques for Main Memory Database Systems,” Proc. ACM SIGMOD, 1984.
[10]
R.D. Gopal R. Ramesh and Z. Zionts, “Join Processing in Relational Databases: Analysis, Design, and Prototype Development,” working paper, State Univ. of New York at Buffalo, 1998.
[11]
L.R. Gotlieb, “Computing Joins of Relations,” Proc. ACM-SIGMOD Int'l Conf. Management of Data, 1975.
[12]
P. Goyal H.F. Li E. Regener and F. Sadri, “Scheduling of Page Fetches in Join Operations Using Bc-Trees,” Proc. Conf. Data Eng., 1988.
[13]
H.I. Hsiao M.S. Chen and P.S. Yu, “On Parallel Execution of Multiple Pipelined Hash Joins,” Proc. ACM SIGMOD, 1994.
[14]
P. HweeHwa M.J. Carey and M. Livny, “Partially Preemptible Hash Joins,” Proc. ACM SIGMOD, 1993.
[15]
Y.E. Ioannidis and S. Christodoulakis, “Optimal Histograms for Limiting Worst-Case Error Propogation in the Size of Join Results,” ACM Trans. Database Systems, vol. 18, no. 4, 1993.
[16]
Y.E. Ioannidis and Y.C. Kang, “Left-Deep vs. Bushy Trees: An Analysis of Strategy Spaces and its implications for Query Optimization,” Proc. ACM SIGMOD, 1991.
[17]
M. Kitsuregawa H. Tanaka and T. Moto-Oka, “Application of Hash to Database Machine and Its Architecture,” New Generation Computing, 1983.
[18]
D.E. Knuth, The Art of Computer Programming. Addison-Wesley, 1973.
[19]
T. Lehman and M. Carey, “Query Processing in Main Memory Database Systems,” Proc. SIGMOD, 1986.
[20]
M.L. Lo M.S. Chen C.V. Ravishanker and P.S. Yu, “On Optimal Processor Allocation to Support Pipelined Hash Joins,” Proc. ACM SIGMOD, 1993.
[21]
M. Matysiak, “Efficient Optimization of Large Join Queries Using Tabu Search,” Information Sciences, vol. 83, 1995.
[22]
K.P. Mikkilineni and S.Y.W. Su, “An Evaluation of Relational Join Algorithms in a Pipelined Query Processing Environment,” IEEE Trans. Software Eng., vol. 14, no. 6, June 1988.
[23]
P. Mishra and M. Eich, “Join Processing in Relational Databases,” ACM Computing Surveys, 1992.
[24]
M. Negri and G. Pelagatti, “Distributive Join—A New Algorithm for Joining Relations,” ACM Trans. Database Systems, vol. 16, no. 4, 1991.
[25]
S. Pramanik and D. Vineyard, “Optimizing Join Queries in Distributed Databases,” IEEE Trans. Software Eng., vol. 14, no. 9, 1988.
[26]
D.J. Reid, “Optimal Distributed Execution of Join Queries,” Computers and Math. with Applications, vol. 27, no. 11, 1994.
[27]
N. Roussopoulos, “An Incremental Access Method for ViewCache: Concepts, Algorithms, and Cost Analysis,” ACM Trans. Database Systems, vol. 16, no. 3, 1991.
[28]
L.D. Shapiro, “Join Processing in Database Systems with Large Main Memories,” ACM Trans. Database Systems, vol. 11, no. 3, 1986.
[29]
D. Shasha and T.L. Wang, “Optimizing Equijoin Queries in Distributed Databases where Relations Are Hash Partitioned,” ACM Trans. Database Systems, vol. 16, no. 2, 1991.
[30]
E.J. Shekita and M.J. Carey, “A Performance Evaluation of Pointer-Based Joins,” Proc. ACM SIGMOD Int'l Conf. Management of Data, 1990.
[31]
J.W. Stamos and H.C. Young, “A Symmetrical Fragment and Replicate Algorithm for Distributed Joins,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 12, Dec. 1993.
[32]
A. Swami, “Optimization of Large Join Queries: Combining Heuristics with Combinatorial Techniques,” Proc. ACM SIGMOD, 1989.
[33]
Y.C. Tay, “On Optimality of Strategies for Multiple Joins,” J. ACM, vol. 40, no. 5, 1993.
[34]
J.D. Ullman, “Implementation of Logical Query Languages for Databases,” ACM Trans. Database Systems, vol. 10, no. 2, 1985.
[35]
P. Valduriez, “Join Indices,” ACM Trans. Database Systems, vol. 12, no. 2, 1987.
[36]
J.L. Wolf P.S. Yu J. Turek and D.M. Dias, “A Parallel Hash Join Algorithm for Managing Data Skew,” IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 12, 1993.
[37]
C. Zaniolo, “The Representation and Deductive Retrieval of Complex Objects,” Proc. 11th VLDB, 1985.

Cited By

View all
  • (2007)Partition search for non-binary constraint satisfactionInformation Sciences: an International Journal10.1016/j.ins.2007.03.030177:18(3639-3678)Online publication date: 1-Sep-2007

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 13, Issue 4
July 2001
190 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 July 2001

Author Tags

  1. Database
  2. hash
  3. index.
  4. join
  5. query processing
  6. relational architecture

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2007)Partition search for non-binary constraint satisfactionInformation Sciences: an International Journal10.1016/j.ins.2007.03.030177:18(3639-3678)Online publication date: 1-Sep-2007

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media