Abstract
ptimizing join queries is a major problem in distributed database systems, particularly when files are replicated and copies stored at different nodes in the network. A distributed query optimization algorithm must select file copies and determine how and where those files will be processed. Process decisions include which files to reduce via semijoins, if any, the sites at which to perform join operations, and the order in which to perform those join operations. We extend the scope of distributed query optimization research by develop-ing a model that, for the first time, includes all of these design decisions and considers both communication and local processing costs. We develop a genetic algorithm-based solution procedure for this model which quickly determines efficient query processing plans. We demonstrate that ignoring local processing costs or restricting join processing to the result site, as commonly done in prior research, can result in inefficient query execution plans.
Similar content being viewed by others
References
P.M.G. Apers, A.R. Hevner and S.B. Yao, Optimization algorithms for distributed queries, IEEE Transactions on Software Engineering SE-9(1983)57–68.
K. Bennett, M.C. Ferris and Y.E. Ioannidis, A genetic algorithm for database query optimization, Proceedings of the 4th International Conference on Genetic Algorithms, San Diego, CA, July 1991, pp. 400–407.
P.A. Bernstein and D.W. Chiu, Using semi-joins to solve relational queries, Journal of the ACM 28(1981)25–40.
P.A. Bernstein et al., Query processing in a system for distributed databases (SDD-1), ACM Transactions on Database Systems 6(1981)602–625.
M.-S. Chen and P.S. Yu, Interleaving a join sequence with semijoins in distributed query processing, IEEE Transactions on Parallel and Distributed Systems 3(1992)611–621.
M.-S. Chen and P.S. Yu, Combining join and semijoin operations for distributed query processing, IEEE Transactions on Knowledge and Data Engineering 5(1993)534–542.
M.-S. Chen and P.S. Yu, A graph theoretical approach to determine a join reducer sequence in distributed query processing, IEEE Transactions on Knowledge and Data Engineering 6(1994) 152–165.
J.S.J. Chen and V.O.K. Li, Optimizing joins in fragmented database systems on a broadcast local network, IEEE Transactions on Software Engineering 15(1989)26–38.
S. Christodoulakis, Implications of certain assumptions in database performance evaluation, ACM Transactions on Database Systems 9(1984)163–186.
D.W. Cornell and P.S. Yu, Site assignment for relations and join operations in the distributed transaction processing environment, Proceedings of the 4th International Conference on Data Engineering, 1988, pp. 100–108.
D.W. Cornell, and P.S. Yu, On optimal site assignment for relations in the distributed database environment, IEEE Transactions on Software Engineering 15(1989)1004–1009.
L. Davis (ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
K.A. De Jong, Genetic-algorithm-based learning, in: Machine Learning, Y. Kodratoff and R.S. Michalski, eds., Morgan Kaufmann, 1990, pp. 611–638.
R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Benjamin/Cummings, 1989.
D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, 1989.
A.R. Hevner and S.B. Yao, Query processing in distributed database systems, IEEE Transactions on Software Engineering SE-5(1979)177–187.
A.R. Hevner and S.B. Yao, Querying distributed databases on local area networks, Proceedings of the IEEE 75(1987)563–572.
J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, 1975.
E.S.H. Hou, N. Ansari and H. Ren, A genetic algorithm for multiprocessor scheduling, IEEE Transactions on Parallel and Distributed Systems 5(1994)113–120.
S. Lafortune and E. Wong, A state transition model for distributed query processing, ACM Transactions on Database Systems 11(1986)294–322.
E.-P. Lim and J. Srivastava, Query optimization/processing in federated database systems, Proceedings of Conference on Information and Knowledge Management, 1993.
G.M. Lohman et al., Query processing in R*, in: Query Processing in Database Systems, W. Kim et al., eds., Springer, Berlin, 1985, pp. 31–47.
H. Lu and M.J. Carey, Some experimental results on distributed join algorithms in a local network, Proceedings of the 10th International Conference on VLDB, Stockholm, 1985, pp. 292–304.
L.F. Mackert and G.M. Lohman, R* optimizer validation and performance evaluation for distributed queries, Proceedings of the 12th International Conference on VLDB, Kyoto, 1986, pp. 149–159.
S.T. March and S. Rho, Allocating data and operations to nodes in distributed database design, IEEE Transactions on Knowledge and Data Engineering 7(1995)305–317.
T.P. Martin, K.H. Lam and J.L. Russell, Evaluation of site selection algorithms for distributed query processing, Computer Journal 33(1990)61–70.
P. Mishra and M.H. Eich, Join processing in relational databases, ACM Computing Surveys 24 (1992)63–113.
I. Oliver, D. Smith and J.H. Holland, A study of permutation crossover operators on the traveling salesman problem, Proceedings of the 2nd International Conference on Genetic Algorithms, Cambridge, MA, July 28–31, 1987, pp. 224–230.
C.B. Petty, M.R. Leuze and J.J. Grefenstette, A parallel genetic algorithm, Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 155–161.
S. Pramanik and D. Vineyard, Optimizing join queries in distributed databases, IEEE Transactions on Software Engineering 14(1988)1319–1326.
S. Rho and S.T. March, A nested genetic algorithm for distributed database design, Proceeding of the 27th Hawaii International Conference on System Sciences, January 1994.
S. Rho, Distributed database design: Allocation of data and operations to nodes in distributed database systems, unpublished Ph.D. Dissertation, University of Minnesota, May 1995.
A. Segev, Optimization of join operations in horizontally partitioned database systems, ACM Transactions on Database Systems 11(1986)48–80.
A. Segev, Strategies for distributed query optimization, Information Sciences 54(1991)67–88.
D. Shasha and T. Wang, Optimizing equijoin queries in distributed databases where relations are hash partitioned, ACM Transactions on Database Systems 16(1991)279–308.
G. Syswerda, Uniform crossover in genetic algorithm, Proceedings of the 3rd International Conference on Genetic Algorithms, 1989, pp. 2–9.
R. Tanese, Parallel genetic algorithms for a hypercube, Proceedings of the 2nd International Conference on Genetic Algorithms, 1987, pp. 177–183.
D. Whitley, T. Starkweather and D. Fuquay, Problems and traveling salesmen: The genetic edge recombination operator, Proceedings of the 3rd International Conference on Genetic Algorithms, Morgan Kaufmann, 1989.
H. Yoo and S. Lafortune, An intelligent search method for query optimization by semijoins, IEEE Transactions on Knowledge and Data Engineering 1(1989)226–237.
C.T. Yu and C.C. Chang, On the design of a distributed query processing algorithm, Proceedings of the ACM-SIGMOD International Conference on the Management of Data, San Jose, 1983, pp. 30–39.
C.T. Yu and C.C. Chang, Distributed query processing, ACM Computing Surveys 16(1984)399–433.
Rights and permissions
About this article
Cite this article
Rho, S., March, S.T. Optimizing distributed join queries: A genetic algorithm approach. Annals of Operations Research 71, 199–228 (1997). https://doi.org/10.1023/A:1018967414664
Issue Date:
DOI: https://doi.org/10.1023/A:1018967414664