[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Distance browsing in spatial databases

Published: 01 June 1999 Publication History

Abstract

We compare two different techniques for browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object. The conventional approach is one that makes use of a k-nearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m < k neighbors are needed, the k-nearest neighbor algorithm has to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the k + 1st neighbor can be obtained without having to calculate the k + 1 nearest neighbors from scratch. The incremental approach is useful when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. We present a general incremental nearest neighbor algorithm that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [Rousseopoulos et al. 1995]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm usually outperforms the k-nearest neighber algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that at any step in its execution the incremental nearest neighbor algorithm is optimal with respect to the spatial data structure that is employed. Furthermore, based on some simplifying assumptions, we prove that in two dimensions the number of distance computations and leaf nodes accesses made by the algorithm for finding k neighbors is O(k + k).

References

[1]
AOKI, P. M. 1998. Generalizing "search" in generalized search trees. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 380-389.
[2]
AREF, W. G. AND SAMET, H. 1992. Uniquely reporting spatial objects: Yet another operation for comparing spatial data structures. In Proceedings of the 5th Symposium on Spatial Data Handling (Charleston, SC, Aug.). 178-189.
[3]
AREF, W. G. AND SAMET, H. 1993. Estimating selectivity factors of spatial operations. In Optimization in Databases - Proceedings of the 5th International Workshop on Foundations of Models and Languages for Data and Objects (Aigen, Austria, Sept.). 31-40.
[4]
ARYA, S., MOUNT, D. M., NETANYAHU, N. S., SILVERMAN, R., AND WU, A. Y. 1998. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45, 6, 891-923.
[5]
BECKER, L. AND G TING, R. H. 1992. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. Database Syst. 17, 2 (June 1992), 247-303.
[6]
BECKMANN, N., KRIEGEL, H.-P., SCHNEIDER, R., AND SEEGER, B. 1990. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23-25, 1990), H. Garcia-Molina, Ed. ACM Press, New York, NY, 322-331.
[7]
BENTLEY, g. L. 1975. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 9 (Sept.), 509-517.
[8]
BERCHTOLD, S., B HM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '97, Tucson, AZ, May 12-14, 1997), A. Mendelzon and Z.M. zsoyoglu, Eds. ACM Press, New York, NY, 78-86.
[9]
BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high-dimensional data. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 28-39.
[10]
BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Process. Lett. 45, 2 (Feb. 26, 1993), 95-99.
[11]
BRIN, S. 1995. Near neighbor search in large metric space. In Proceedings of the 21st International Conference on Very Large Data Bases (VLDB '95, Zurich, Sept.). 574-584.
[12]
BRODER, A. J. 1990. Strategies for efficient incremental nearest neighbor search. Pattern Recogn. 23, 1/2 (Jan. 1990), 171-178.
[13]
BURKHARD, W. A. AND KELLER, R. 1973. Some approaches to best-match file searching. Commun. ACM 16, 4 (Apr.), 230-236.
[14]
CIACCIA, P., PATELLA, M., AND ZEZULA, P. 1997. M-tree: An efficient access method for similarity search in metric spaces. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB '97, Athens, Greece, Aug.). 426-435.
[15]
COMER, D. 1979. The ubiquitous B-tree. ACM Comput. Surv. 11, 2 (June), 121-137.
[16]
EASTMAN, C. M. AND ZEMANKOVA, M. 1982. Partially specified nearest neighbor searches using k-d-trees. Inf. Process. Lett. 15, 2 (Sept.), 53-56.
[17]
ESPERAN~A, C. AND SAMET, H. 1997. Orthogonal polygons as bounding structures in filter-refine query processing strategies. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 197-220.
[18]
FALOUTSOS, C. AND LIN, K. 1995. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In Proceedings of the ACM SIGMOD Conference on Management of Data (San Jose, CA, May). ACM Press, New York, NY, 163-174.
[19]
FRANK, A. U. AND BARRERA, R. 1989. The Fieldtree: a data structure for geographic information systems. In Proceedings of the First Symposium on Design and Implementation of Large Spatial Databases (SSD'89, Santa Barbara, CA, July), A. Buchmann, O. G nther, T. R. Smith, and Y. F. Wang, Eds. Springer-Verlag, New York, 29-44.
[20]
FREDMAN, M. L., SEDGEWICK, R., SLEATOR, D. D., AND TARJAN, R. E. 1986. The pairing heap: a new form of self-adjusting heap. Algorithmica 1, 1 (Jan. 1986), 111-129.
[21]
FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226.
[22]
FUKUNAGA, K. AND NARENDRA, P. M. 1975. A branch and bound algorithm for computing. IEEE Trans. Comput. 24, 7 (July), 750-753.
[23]
G NTHER, O. AND NOLTEMEIER, H. 1991. Spatial database indices for large extended objects. In Proceedings of the Seventh International Conference on Data Engineering (Kobe, Japan). IEEE Computer Society Press, Los Alamitos, CA, 520-526.
[24]
GUTTMAN, A. 1984. R-trees: A dynamic index structure for spatial searching. In Proceedings of the ACM SIGMOD Annual Meeting on Management of Data (SIGMOD '84, Boston, MA, June18-21). ACM, New York, NY, 47-57.
[25]
HAFNER, J., SAWHNEY, H., AND EQUITZ, W. E. AL. 1995. Efficient color histogram indexing for quadratic form distance functions. IEEE Trans. Pattern Anal. Mach. Intell. 17, 7 (July), 729-736.
[26]
HENRICH, A. 1994. A distance-scan algorithm for spatial access structures. In Proceedings of the Second ACM Workshop on Geographic Information Systems (Gaithersburg, MD, Dec.). ACM Press, New York, NY, 136-143.
[27]
HENRICH, A. 1998. The LSDh-tree: an access structure for feature vectors. In Proceedings of the 14th International Conference on Data Engineering (Orlando, FL, Feb.). IEEE Computer Society Press, Los Alamitos, CA, 362-369.
[28]
HENRICH, A., SIX, H.-W., AND WIDMAYER, P. 1989. The LSD tree: spatial access to multidimensional and non-point objects. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB '89, Amsterdam, The Netherlands, Aug 22-25), R. P. van de Riet, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, 45-53.
[29]
HJALTASON, G. R. AND SAMET, H. 1995. Ranking in spatial databases. In Proceedings of the Fourth International Symposium on Advances in Spatial Databases (SSD'95, Portland, ME, Aug.), M. J. Egenhofer and J. R. Herring, Eds. Springer-Verlag, New York, 83-95.
[30]
HOEL, E. G. AND SAMET, H. 1991. Efficient processing of spatial queries in line segment databases. In Proceedings of the 2nd symposium on Advances in Spatial Databases (SSD '91, Zurich, Switzerland, Aug. 28-30, 1991), O. G nther and H.-J. Schek, Eds. Springer Lecture Notes in Computer Science, vol. 525. Springer-Verlag, New York, NY, 237-256.
[31]
KAMEL, I. AND FALOUTSOS, C. 1993. On packing R-trees. In Proceedings of the Second International Conference on Information and Knowledge Management (CIKM '93, Washington, DC, Nov. 1-5 1993), B. Bhargava, T. Finin, and Y. Yesha, Eds. ACM Press, New York, NY, 490-499.
[32]
KAMEL, I. AND FALOUTSOS, C. 1994. Hilbert R-tree: An improved R-tree using fractals. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.). VLDB Endowment, Berkeley, CA, 500-509.
[33]
KAMGAR-PARSI, B. AND KANAL, L. N. 1985. An improved branch and bound algorithm for computing k -nearest neighbors. Pattern Recogn. Lett. 3, 1 (Jan.).
[34]
KANTH, K. V. R., AGRAWAL, D., AND SINGH, A. 1998. Dimensionality reduction for similarity searching in dynamic databases. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 237-248.
[35]
KATAYAMA, N. AND SATOH, S. 1997. The SR-tree: an index structure for high-dimensional nearest neighbor queries. In Proceedings of the International ACM Conference on Management of Data (SIGMOD '97, May). ACM, New York, NY, 369-380.
[36]
KORN, F., SIDIROPOULOS, N., FALOUTSOS, C., SIEGEL, E., AND PROTOPAPAS, Z. 1996. Fast nearest neighbor search in medical image databases. In Proceedings of the 22nd International Conference on Very Large Data Bases (VLDB '96, Mumbai, India, Sept.). 215-226.
[37]
K_RIEGEL, H.-P., SCHMIDT, T., AND SEIDL, T. 1997. 3-D similarity search by shape approximation. In Proceedings of the Fifth International Symposium on Advances in Spatial Databases (SSD'97, Berlin, July), M. Scholl and A. Voisard, Eds. Springer-Verlag, New York, 11-28.
[38]
LINDENBAUM, M. AND SAMET, H. 1995. A probabilistic analysis of trie-based sorting of large collections of line segments. TR-3455. University of Maryland at College Park, College Park, MD.
[39]
LOMET, D. AND SALZBERG, B. 1989. A robust multi-attribute search structure. In Proceedings of the Fifth IEEE International Conference on Data Engineering (Los Angeles, CA, Feb. 1989). 296-304.
[40]
MURALIKRISHNA, M. AND DEWITT, D. J. 1988. Equi-depth multidimensional histograms. In Proceedings of the Conference on Management of Data (SIGMOD '88, Chicago, IL, June 1-3, 1988), H. Boral and P.-A. Larson, Eds. ACM Press, New York, NY, 28-36.
[41]
MURPHY, O J AND SELKOW, S M 1986. The efficiency of using k-d trees for finding nearest neighbors in discrete space. Inf. Process. Lett. 23, 4 (Nov. 8, 1986), 215-218.
[42]
NELSON, R. C AND SAMET, H. 1986. A consistent hierarchical representation for vector data. SIGGRAPH Comput. Graph. 20, 4 (Aug. 1986), 197-206.
[43]
BUREAU OF THE CENSUS, 1989. Tiger/Line precensus files. Bureau of the Census, Washington DC.
[44]
ROBINSON, J. T. 1981. The k-d-b-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD 1981 International Conference on Management of Data (Ann Arbor, MI, Apr. 29-May 1). ACM Press, New York, NY, 10-18.
[45]
ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD '95, San Jose, CA, May 23-25), M. Carey and D. Schneider, Eds. ACM Press, New York, NY, 71-79.
[46]
ROUSSOPOULOS, N. AND LEIFKER, D. 1985. Direct spatial search on pictorial databases using packed R-trees. In Proceedings of the ACM SIGMOD Conference on Management of Data (SIGMOD, Austin, TX, May). ACM Press, New York, NY, 17-31.
[47]
SAMET, H. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA.
[48]
SEIDL, T. AND KRIEGEL, H.-P. 1998. Optimal multi-step k-nearest neighbor search. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '98, Seattle, WA, June 1-4, 1998), L. Haas, P. Drew, A. Tiwary, and M. Franklin, Eds. ACM Press, New York, NY, 154-165.
[49]
SELINGER, P. G., ASTRAHAN, M. M., LORIE, R. A., AND PRICE, T. G. 1979. Access path selection in a relational database management system. In Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD '79, Boston, MA, May 30-June 1). ACM Press, New York, NY, 23-34.
[50]
SELLIS, T., ROUSSOPOULOS, N., AND FALOUTSOS, C. 1987. The R+-tree: A dynamic index for multi-dimensional objects. In Proceedings of the 13th International Conference on Very Large Data Bases (Brighton, UK, Sept.). 71-79.
[51]
SPROULL, R. F. 1991. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6, 4, 579-589.
[52]
UHLMANN, J. K. 1991. Satisfying general proximity/similarity queries with metric trees. Inf. Process. Lett. 40, 4 (Nov.), 175-179.
[53]
WANG, T. L. AND SHASHA, D. 1990. Query processing for distance metrics. In Proceedings of the 16th VLDB Conference on Very Large Data Bases (VLDB, Brisbane, Australia). VLDB Endowment, Berkeley, CA, 602-613.
[54]
WHITE, D. A. AND JAIN, R. 1996. Algorithms and strategies for similarity retrieval. Tech. Rep. VCL-96-101. University of California at San Diego, La Jolla, CA.
[55]
WHITE, D. A. AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering (New Orleans, LA). IEEE Press, Piscataway, NJ, 516-523.

Cited By

View all
  • (2024)Consumer Price Index Forecasting in Turkey: A Comparison of Deep Learning and Machine Learning ApproachesIğdır Üniversitesi Sosyal Bilimler Dergisi10.54600/igdirsosbilder.1386274Online publication date: 10-May-2024
  • (2024)Does ESG Predict Business Failure in Brazil? An Application of Machine Learning TechniquesRisks10.3390/risks1212018512:12(185)Online publication date: 25-Nov-2024
  • (2024)Efficient Distance Queries on Non-point DataACM Transactions on Spatial Algorithms and Systems10.1145/369819411:1(1-37)Online publication date: 2-Oct-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 24, Issue 2
June 1999
142 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/320248
  • Editor:
  • Won Kim
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 1999
Published in TODS Volume 24, Issue 2

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. R-trees
  2. distance browsing
  3. hiearchical spatial data structures
  4. nearest neighbors
  5. ranking

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)375
  • Downloads (Last 6 weeks)51
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Consumer Price Index Forecasting in Turkey: A Comparison of Deep Learning and Machine Learning ApproachesIğdır Üniversitesi Sosyal Bilimler Dergisi10.54600/igdirsosbilder.1386274Online publication date: 10-May-2024
  • (2024)Does ESG Predict Business Failure in Brazil? An Application of Machine Learning TechniquesRisks10.3390/risks1212018512:12(185)Online publication date: 25-Nov-2024
  • (2024)Efficient Distance Queries on Non-point DataACM Transactions on Spatial Algorithms and Systems10.1145/369819411:1(1-37)Online publication date: 2-Oct-2024
  • (2024)DB-LSH 2.0: Locality-Sensitive Hashing With Query-Based Dynamic BucketingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.329583136:3(1000-1015)Online publication date: 1-Mar-2024
  • (2024)Processing Conflict-Aware $k$ Nearest Neighbor Queries in Euclidean Space2024 IEEE International Conference on Big Data and Smart Computing (BigComp)10.1109/BigComp60711.2024.00035(175-182)Online publication date: 18-Feb-2024
  • (2024)Distributed Processing of Budget-based Neighboring Object Group Queries2024 IEEE/ACIS 9th International Conference on Big Data, Cloud Computing, and Data Science (BCD)10.1109/BCD61269.2024.10743106(54-58)Online publication date: 16-Jul-2024
  • (2024)On approximate near-neighbors search under the (continuous) Fréchet distance in higher dimensionsInformation Processing Letters10.1016/j.ipl.2023.106405183:COnline publication date: 1-Jan-2024
  • (2024)Efficient processing of all neighboring object group queries with budget range constraint in road networksComputing10.1007/s00607-024-01260-7106:5(1359-1393)Online publication date: 1-May-2024
  • (2023)BDMCA: a big data management system for Chinese auditingPeerJ Computer Science10.7717/peerj-cs.13179(e1317)Online publication date: 13-Apr-2023
  • (2023)SIMD-ified R-tree Query Processing and OptimizationProceedings of the 31st ACM International Conference on Advances in Geographic Information Systems10.1145/3589132.3625610(1-10)Online publication date: 13-Nov-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media