[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

The TV-tree: an index structure for high-dimensional data

Published: 01 October 1994 Publication History

Abstract

We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, using additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such "varying length" feature vectors. Finally, we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, which saves up to 80% in disk accesses.

References

[1]
Agrawal, R., Faloutsos, C., and Swami, A. Efficient similarity search in sequence databases. FODO Conference, Evanston, IL, 1993.
[2]
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. A basic local alignment tool. Journal of Molecular Biology, 215(13):403-410, 1990.
[3]
Angell, R.C., Freund, G.E., and Willet, P. Automatic spelling correction using a trigram similarity measure. Information Processing and Management, 19(4):255-261, 1983.
[4]
Arya, M., Cody, W., Faloutsos, C., Richardson, J., and Toga, A. Qbism: A prototype 3-D medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, 1993.
[5]
Aurenhammer, F. Voronoi diagrams: A survey of a fundamental geometric data structure. ACM Computing Surveys, 23(3):345-405, 1991.
[6]
Beckmann, N., Kriegel, H.-P., Schneider, R., and Seeger, B. The R*-tree: An efficient and robust access method for points and rectangles. ACM SIGMOD, Atlantic City, NJ, 1990.
[7]
Bentley, J.L., Weide, B.W., and Yao, A.C. Optimal expected-time algorithms for closest-point problems. ACM Transactions on Mathematical Software, 6(4):563-580, 1980.
[8]
Brinkhoff, T., Kriegel, H.-P., and Seeger, B. Efficient processing of spatial joins using R-trees. Proceedings of the ACM SIGMOD, Washington, DC, 1993.
[9]
Chatfield, C. The Analysis of Time Series: An Introduction. London: Chapman and Hall, 1984. Third edition.
[10]
Friedman, J.H., Baskett, F., and Shustek, L.H. An algorithm for finding nearest neighbors. IEEE Transactions on Computers, C-24(10):1000-1006, 1975.
[11]
Fukunaga, K. Introduction to Statistical Pattern Recognition. New York: Academic Press, 1990.
[12]
Fukunaga, K. and Narendra, P.M. A branch and bound algorithm for computing k-nearest neighbors. IEEE Transactions on Computers, C-24(7):750-753, 1975.
[13]
Greene, D. An implementation and performance analysis of spatial data access methods. Proceedings of Data Engineering, Boston, MA, 1989.
[14]
Guttman, A. R-trees: A dynamic index structure for spatial searching. Proceedings of the ACM SIGMOD, 1984.
[15]
Hamming, R.W. Digital Filters. Englewood Cliffs, NJ: Prentice-Hall, 1977.
[16]
Hartigan, J.A. Clustering algorithms. New York: John Wiley & Sons, 1975.
[17]
Hoel, E.G. and Samet, H. A qualitative comparison study of data structures for large line segment databases. Proceedings of the ACM SIGMOD Conference, San Diego, CA, 1992.
[18]
Hunter, G.M. and Steiglitz, K. Operations on images using quad trees. IEEE Transactions on PAMI, 1(2):145-153 (1979).
[19]
Jagadish, H.V. Spatial search with polyhedra. Proceedings of the Sixth IEEE International Conference on Data Engineering, Los Angeles, CA, 1990.
[20]
Jagadish, H.V. A retrieval technique for similar shapes. Proceedings of the ACM SIGMOD Conference, Denver, CO, 1991.
[21]
Kamel, I. and Faloutsos, C. Hilbert R-tree: An improved R-tree using fractals. Systems Research Center (SRC) TR-93-19, University of Maryland, College Park, MD, 1993.
[22]
Kukich, K. Techniques for automatically correcting words in text. ACM Computing Surveys, 24(4):377-440, 1992.
[23]
Mandelbrot, B. Fractal Geometry of Nature. New York: W.H. Freeman, 1977.
[24]
Murtagh, F. A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4):354-359, 1983.
[25]
Narasimhalu, A.D. and Christodoulakis, S. Multimedia information systems: The unfolding of a reality. IEEE Computer, 24(10):6-8, 1991.
[26]
Niblack, W., Barber, R., Equitz, W., Flickner, M., Glasman, E., Petkovic, D., Yanker, P., Faloutsos, C., and Taubin, G. The qbic project: Querying images by content using color, texture, and shape. SPIE 1993 International Symposium on Electronic Imaging: Science and Technology Conference 1908, Storage and Retrieval for Image and Video Databases, San Jose, CA, 1993. Also available as IBM Research Report RJ 9203 (81511), 1993.
[27]
Nievergelt, J., Hinterberger, H., and Sevcik, K.C. The grid file: An adaptable, symmetric, multikey file structure. ACM TODS, 9(1):38-71, 1984.
[28]
Orenstein, J.A. and Manola, F.A. Probe spatial data modeling and query processing in an image database application. IEEE Transactions on Software Engineering, 14(5):611-629, 1988.
[29]
Ruskai, M.B., Beylkin, G., Coifman, R., Daubechies, I., Mallat, S., Meyer, Y., and Raphael, L. Wavelets and Their Applications. Boston: Jones and Bartlett Publishers, 1992.
[30]
Salton, G. and Wong, A. Generation and search of clustered files. ACM TODS, 3(4):321-346, 1978.
[31]
Samet, H. The Design and Analysis of Spatial Data Structures. Reading, MA: Addison-Wesley, 1989.
[32]
Schroeder, M. Fractals, Chaos, Power Laws: Minutes From an Infinite Paradise. New York: W.H. Freeman and Company, 1991.
[33]
Wallace, G.K. The jpeg still picture compression standard. CACM, 34(4):31-44, 1991.

Cited By

View all
  • (2023)A new near-linear time algorithm for k-nearest neighbor search using a compressed cover treeProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618780(9267-9311)Online publication date: 23-Jul-2023
  • (2023)ARKGraph: All-Range Approximate K-Nearest-Neighbor GraphProceedings of the VLDB Endowment10.14778/3603581.360360116:10(2645-2658)Online publication date: 1-Jun-2023
  • (2022)Indexing Metric Spaces for Exact Similarity SearchACM Computing Surveys10.1145/353496355:6(1-39)Online publication date: 7-Dec-2022
  • Show More Cited By
  1. The TV-tree: an index structure for high-dimensional data

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image The VLDB Journal — The International Journal on Very Large Data Bases
    The VLDB Journal — The International Journal on Very Large Data Bases  Volume 3, Issue 4
    Spatial Database Systems
    October 1994
    184 pages

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Publication History

    Published: 01 October 1994

    Author Tags

    1. query by content
    2. similarity retrieval
    3. spatial index

    Qualifiers

    • Article

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)139
    • Downloads (Last 6 weeks)33
    Reflects downloads up to 13 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)A new near-linear time algorithm for k-nearest neighbor search using a compressed cover treeProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618780(9267-9311)Online publication date: 23-Jul-2023
    • (2023)ARKGraph: All-Range Approximate K-Nearest-Neighbor GraphProceedings of the VLDB Endowment10.14778/3603581.360360116:10(2645-2658)Online publication date: 1-Jun-2023
    • (2022)Indexing Metric Spaces for Exact Similarity SearchACM Computing Surveys10.1145/353496355:6(1-39)Online publication date: 7-Dec-2022
    • (2020)DeltaPQProceedings of the VLDB Endowment10.14778/3424573.342458013:13(3603-3616)Online publication date: 27-Oct-2020
    • (2020)Outlier DetectionACM Computing Surveys10.1145/338102853:3(1-37)Online publication date: 12-Jun-2020
    • (2020)Monotonic Cardinality Estimation of Similarity Selection: A Deep Learning ApproachProceedings of the 2020 ACM SIGMOD International Conference on Management of Data10.1145/3318464.3380570(1197-1212)Online publication date: 11-Jun-2020
    • (2018)PS-tree-based efficient boolean expression matching for high-dimensional and dense workloadsProceedings of the VLDB Endowment10.14778/3291264.329127012:3(251-264)Online publication date: 1-Nov-2018
    • (2017)Efficient Ad-Hoc Graph Inference and Matching in Biological DatabasesProceedings of the 2017 ACM International Conference on Management of Data10.1145/3035918.3035929(359-373)Online publication date: 9-May-2017
    • (2017)MSQLThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-017-0481-626:6(829-854)Online publication date: 1-Dec-2017
    • (2016)Comparative Evaluation of Various Indexing Techniques of Geospatial Vector Data for Processing in Distributed Computing EnvironmentProceedings of the 9th Annual ACM India Conference10.1145/2998476.2998493(167-172)Online publication date: 21-Oct-2016
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Full Access

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media