Relational database system users retrieve data by specifying what they want, not how it should be found. A 'query optimizer' is responsible for determining how the desired data will be found. For each step in a query there may exist several candidate algorithms which could be used to perform the required operation. The execution times for these algorithms depend on the statistical properties of the user data to be processed, and may differ widely. The execution cost of each algorithm is estimated using statistics on the user data and cost models. The algorithm with lowest estimated cost is used.
This decision process is subject to error. Bayesian decision theory is used here to quantify the expected improvement in the decision if the choice of algorithm were postponed until additional information is acquired. The expected value of additional information is compared with the information acquisition cost before the additional information is actually acquired. Additional information, gained from sources like sampled user data, should be obtained whenever it is expected to yield a significant net reduction in execution time.
Computational methods for determining the expected value of sample information (EVSI) are explored. These methods allow for the application of these concepts to a wide range of query optimization problems. Three such problems in database query optimization are presented with example computations of the expected value of sample information.
Cited By
- Haas P and Swami A Sequential sampling procedures for query size estimation Proceedings of the 1992 ACM SIGMOD international conference on Management of data, (341-350)
- Haas P and Swami A (2019). Sequential sampling procedures for query size estimation, ACM SIGMOD Record, 21:2, (341-350), Online publication date: 1-Jun-1992.
Recommendations
View determinacy for preserving selected information in data transformations
When transforming data one often wants certain information in the data source to be preserved, i.e.,we identify parts of the source data and require these parts to be transformed without loss of information. We characterize the preservation of selected ...