An extremely important issue in the efficient implementation of a relational database system is query optimization. Query optimization is concerned with the minimization of the amount of information that must be accessed from a database in order to respond to a user's query. In this research, we address the issue of join processing in relational database systems. Join is an important query which arises frequently, both in traditional and in emerging database applications. The need for efficient join processing algorithms is strongly underscored in the literature as it is an expensive query operation which directly limits the wide spread use of database systems in real world applications.
The central premise of this research is that optimization of the sequence in which the data pages of the underlying relations are accessed can lead to a significant improvement in the join execution cost over the current methods. Accordingly, we develop three approaches for join processing. In the first two approaches, we develop a tree representation consisting of the data pages of the relations. The tree representation reveals the structural properties of the join and provides a framework to identify the optimal page access sequence that minimizes the join execution cost. We formulate the join as a tree traversal process and accordingly develop efficient tree traversal algorithms. The third approach draws its inspiration from the hybrid hash algorithm, which is considered the state-of-the-art join processing method. In this approach, we segment the relations into zones. The join is materialized by processing the zones sequentially. We theoretically show that the proposed approach is guaranteed to outperform the hybrid hash method. The proposed approaches utilize indices on the participating relations to accelerate the join.
The algorithms developed in this research are procedurally straight forward and are easily transportable to any database management system. The data structures needed for the algorithms are simple, and easy to create and manage. Our computational experience suggests that the proposed approaches can lead to an order of magnitude improvement over current approaches to join processing, and thus constitute an important contribution to the theory and practise of relational database management.
Index Terms
- Join processing in relational database systems
Recommendations
Query processing over object views of relational data
This paper presents an approach to object view management for relational databases. Such a view mechanism makes it possible for users to transparently work with data in a relational database as if it was stored in an object-oriented (OO) database. A ...
Improvement Algorithms for Semijoin Query Processing Programs in Distributed Database Systems
The problem of optimal query processing in distributed database systems was shown to be NP-hard. This means that heuristic algorithms are necessary to solve the query processing problem. In this paper, we describe algorithms to improve the solutions ...
An Optimal Algorithm for Processing Distributed Star Queries
The problem of optimal query processing in distributed database systems was shown to be NP-hard. However, for a special type of queries called star queries, we have developed a polynomial optimal algorithm. Semijoin tactics are applied for query ...