Join processing in relational database systems

January 1993

Author:
Ram Danturti Gopal
State Univ. of New York at Buffalo, Buffalo

Publisher:

State University of New York at Buffalo
Computer Science Department 226 Bell Hall Buffalo, NY
United States

Order Number:UMI Order No. GAX94-20149

Bibliometrics

Abstract

An extremely important issue in the efficient implementation of a relational database system is query optimization. Query optimization is concerned with the minimization of the amount of information that must be accessed from a database in order to respond to a user's query. In this research, we address the issue of join processing in relational database systems. Join is an important query which arises frequently, both in traditional and in emerging database applications. The need for efficient join processing algorithms is strongly underscored in the literature as it is an expensive query operation which directly limits the wide spread use of database systems in real world applications.

The central premise of this research is that optimization of the sequence in which the data pages of the underlying relations are accessed can lead to a significant improvement in the join execution cost over the current methods. Accordingly, we develop three approaches for join processing. In the first two approaches, we develop a tree representation consisting of the data pages of the relations. The tree representation reveals the structural properties of the join and provides a framework to identify the optimal page access sequence that minimizes the join execution cost. We formulate the join as a tree traversal process and accordingly develop efficient tree traversal algorithms. The third approach draws its inspiration from the hybrid hash algorithm, which is considered the state-of-the-art join processing method. In this approach, we segment the relations into zones. The join is materialized by processing the zones sequentially. We theoretically show that the proposed approach is guaranteed to outperform the hybrid hash method. The proposed approaches utilize indices on the participating relations to accelerate the join.

The algorithms developed in this research are procedurally straight forward and are easily transportable to any database management system. The data structures needed for the algorithms are simple, and easy to create and manage. Our computational experience suggests that the proposed approaches can lead to an order of magnitude improvement over current approaches to join processing, and thus constitute an important contribution to the theory and practise of relational database management.

Cited By

Gopal R, Ramesh R and Zionts S (2001). Criss-Cross Hash Joins, IEEE Transactions on Knowledge and Data Engineering, 13:4, (637-653), Online publication date: 1-Jul-2001.

Contributors

Ram Danturti Gopal
Warwick Business School
- Publication Years1992 - 2024
- Publication counts68
- Citation count958
- Available for Download13
- Downloads (cumulative)43,388
- Downloads (12 months)1,700
- Downloads (6 weeks)423
- Average Downloads per Article3,338
- Average Citation per Article14
View Full Profile

Index Terms

Join processing in relational database systems
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
  2. Information retrieval
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Recommendations

Query processing over object views of relational data

This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it was stored in an object-oriented (OO) database. A ...
Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems

The problem of optimal query processing in distributed database systems was shown to be NP-hard. This means that heuristic algorithms are necessary to solve the query processing problem. In this paper, we describe algorithms to improve the solutions ...
An Optimal Algorithm for Processing Distributed Star Queries

The problem of optimal query processing in distributed database systems was shown to be NP-hard. However, for a special type of queries called star queries, we have developed a polynomial optimal algorithm. Semijoin tactics are applied for query ...

Browse Theses

Sections

Cited By

Index Terms

Query processing over object views of relational data

Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems

An Optimal Algorithm for Processing Distributed Star Queries

Sections

Cited By

Save to Binder

Index Terms

Recommendations

Query processing over object views of relational data

Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems

An Optimal Algorithm for Processing Distributed Star Queries