This dissertation presents a theoretical, analytical, and empirical investigation into novel techniques for processing multi-join query. Unlike most previous work on multi-join query support, this dissertation is not solely concerned with the particular mechanism used to find the best plan. Instead, it also seeks to provide the appropriate functionality for ensuring that a good query plan exists. In most work on multi-join query processing, the set of potential query plans is consistently assumed to be identical to that considered by commercial systems intended to process only small-join queries. Here, this convention is challenged by identifying new techniques for executing multi-join queries more efficiently.Query processors often treat each join as a localized binary operation that accepts two tables as input and returns a third. Our thesis is that join operations should be executed more globally. That is, given a query requiring several join operations, we should look at executing the several joins in a cooperative manner. We demonstrate several such techniques that improve the efficiency of multi-join query answering. For each technique, we develop methods for predicting its cost and evaluate its performance on carefully designed suites of randomly generated queries. We also develop techniques for processing multi-join queries as if their answer sets are an unending stream of data--the goal being to materialize the first answers as quickly as possible. These techniques are necessary for dealing with queries whose answer sets are unmanageable (e.g. too large), or queries where only the first few answers are required. For these queries, it becomes particularly important to execute joins cooperatively since localized join processing will inevitably produce far more information than the system is capable of handling, or is needed by the user.
Index Terms
- Processing multi-join queries
Recommendations
Combining Joint and Semi-Join Operations for Distributed Query Processing
The application of a combination of join and semi-join operations to minimize the amount of data transmission required for distributed query processing is discussed. Specifically, two important concepts that occur with the use of join operations as ...
Interleaving a Join Sequence with Semijoins in Distributed Query Processing
The problem of combining join and semijoin reducers for distributed query processing is studied. An approach based on interleaving a join sequence with beneficial semijoins is proposed. A join sequence is mapped into a join sequence tree first. The join ...
Join and multi-join processing in data integration systems
Query processing in a data integration system is complicated by a lack of quality statistics about the data, unpredictable and bursty data transfer rates, and slow or unavailable data sources. Conventional query processing algorithms, which are based on ...