Processing multi-join queries

March 1998

Author:
Roberto Javier Bayardo

Publisher:

University of Texas at Austin
Computer Science Dept. Taylor Hall 2.124 Austin, TX
United States

Order Number:UMI Order No. GAX98-02826

Bibliometrics

Abstract

This dissertation presents a theoretical, analytical, and empirical investigation into novel techniques for processing multi-join query. Unlike most previous work on multi-join query support, this dissertation is not solely concerned with the particular mechanism used to find the best plan. Instead, it also seeks to provide the appropriate functionality for ensuring that a good query plan exists. In most work on multi-join query processing, the set of potential query plans is consistently assumed to be identical to that considered by commercial systems intended to process only small-join queries. Here, this convention is challenged by identifying new techniques for executing multi-join queries more efficiently.Query processors often treat each join as a localized binary operation that accepts two tables as input and returns a third. Our thesis is that join operations should be executed more globally. That is, given a query requiring several join operations, we should look at executing the several joins in a cooperative manner. We demonstrate several such techniques that improve the efficiency of multi-join query answering. For each technique, we develop methods for predicting its cost and evaluate its performance on carefully designed suites of randomly generated queries. We also develop techniques for processing multi-join queries as if their answer sets are an unending stream of data--the goal being to materialize the first answers as quickly as possible. These techniques are necessary for dealing with queries whose answer sets are unmanageable (e.g. too large), or queries where only the first few answers are required. For these queries, it becomes particularly important to execute joins cooperatively since localized join processing will inevitably produce far more information than the system is capable of handling, or is needed by the user.

Contributors

Roberto Javier J Bayardo
Google LLC
- Publication Years1994 - 2009
- Publication counts35
- Citation count4,452
- Available for Download20
- Downloads (cumulative)25,621
- Downloads (12 months)1,469
- Downloads (6 weeks)160
- Average Downloads per Article1,281
- Average Citation per Article127
View Full Profile

Index Terms

Processing multi-join queries
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Recommendations

Combining Joint and Semi-Join Operations for Distributed Query Processing

The application of a combination of join and semi-join operations to minimize the amount of data transmission required for distributed query processing is discussed. Specifically, two important concepts that occur with the use of join operations as ...
Interleaving a Join Sequence with Semijoins in Distributed Query Processing

The problem of combining join and semijoin reducers for distributed query processing is studied. An approach based on interleaving a join sequence with beneficial semijoins is proposed. A join sequence is mapped into a join sequence tree first. The join ...
Join and multi-join processing in data integration systems

Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on ...

Browse Theses

Sections

Index Terms

Combining Joint and Semi-Join Operations for Distributed Query Processing

Interleaving a Join Sequence with Semijoins in Distributed Query Processing

Join and multi-join processing in data integration systems

Sections

Save to Binder

Index Terms

Recommendations

Combining Joint and Semi-Join Operations for Distributed Query Processing

Interleaving a Join Sequence with Semijoins in Distributed Query Processing

Join and multi-join processing in data integration systems