[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/223784.223812acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free access

FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets

Published: 22 May 1995 Publication History

Abstract

A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [8]); the nearest-neighbor or best-match query, etc.However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points.This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dis-similarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for.We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS) [51]; although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set.

References

[1]
Rakesh Agrawal, Christos Faloutsos, and Arun Swami. Efficient similarity search in sequence databases. In Foundations of Data Organization and Algorithms (FODO) Conference, Evanston, Illinois, October 1993. also available through anonymous ftp, from olympos.cs.umd.edu: ftp/pub/TechReports/fodo.ps.
[2]
Rakesh Agrawal, Tomasz Imielinski, and Arun Swami. Mining association rules between sets of items in large databases. Proc. A CM SIGMOD, pages 207-216, May 1993.
[3]
Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules in large databases. Proc. of VLDB Conf., pages 487-499, September 1994.
[4]
S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. A basic local alignment search tool. Journal of Molecular Biology, 215(3):403-410, 1990.
[5]
Manish Arya, William Cody, Christos FaIoutsos, Joel Richardson, and Arthur Toga. Qbism: a prototype 3-d medical image database system. IEEE Data Engineering Bulletin, 16(1):38-42, March 1993.
[6]
Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, and Sun Wu. Proximity matching using fixed queries trees. In M. Crochemore and D. Gusfield, editors, 5th Combinatorial Pattern Matching, LNCS 807, pages 198-212. Springer-Verlag, Asilomar, CA, June 1994.
[7]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The r*-tree: an efficient and robust access method for points and rectangles. A CM SIGMOD, pages 322-331, May 1990.
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. Multi-step processing of spatial joins. A CM SIGMOD, pages 197-208, May 1994.
[9]
Thomas Brinkhoff, Hans-Peter Kriegel, and Bernhard Seeger. Efficient processing of spatial joins using r-trees. Proc. of A CM SIGMOD, pages 237-246, May 1993.
[10]
W.A. Burkhard and R.M. Keller. Some approaches to best-match file searching. Comm. of the A CM (CA CM), 16(4):230-236, April 1973.
[11]
Mathematical Committee on Physical and NSF Engineering Sciences. Grand Challenges" High Performance Computing and Communications. National Science Foundation, 1992. The FY 1992 U.S. Research and Development Program.
[12]
R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
[13]
Susan T. Dumais. Latent semantic indexing (LSI) and TREC-2. In D. K. Harman, editor, The Second Text Retrieval Conference (TREC-2), pages 105-115, Gaithersburg, MD, March 1994. NIST. Special Publication 500- 215.
[14]
C. Faloutsos and S. Roseman. Fractals for secondary key retrieval. Eighth A CM SIGA CT-SIGMOD- SIGART Symposium on Principles of Database Systems (PODS), pages 247-252, March 1989. also available as UMIACS-TR-89-47 and CS-TR-2242.
[15]
Christos Faloutsos and King-Ip (David) Lin. Fastmap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. Cs-tr- 3383 umiacs-tr-94-132 isr tr 94-80, Dept. of Computer Science, Univ. of Maryland, College Park, 1994. also available from mosaic (URL ftp: //olympos.cs.umd.edu /pub/TechReports/sigmoc195.ps).
[16]
Peter W. Foltz and Susan T. Dumais. Personalized information delivery: an analysis of information filtering methods. Comm. of A CM (CA CM), 35(12):51-60, December 1992.
[17]
Keinosuke Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990. 2nd Edition.
[18]
I. Gargantini. An effective way to represent quadtrees. Comm. of ACM (CACM), 25(12):905-910, December 1982.
[19]
G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, second edition, 1989.
[20]
A. Guttman. R-trees: a dynamic index structure for spatial searching. Proc. A CM SIGMOD, pages 47-57, June 1984.
[21]
John A. Hartigan. Clustering Algorithms. John WHey &; Sons, 1975.
[22]
K. Hinrichs and J. Nievergelt. The grid file: a data structure to support proximity queries on spatial objects. Proc. of the WG'83 (Intern. Workshop on Graph Theoretic Concepts in Computer Science), pages 100-113, 1983.
[23]
H.V. Jagadish. Linear clustering of objects with multiple attributes. A CM ~qlGMOD Conf., pages 332- 342, May 1990.
[24]
H.V. Jagadish. Spatial search with polyhedra. Proc. Sixth IEEE Int'l Conf. on Data Engineering, February 1990.
[25]
H.V. Jagadish. A retrieval technique for similar shapes. Proc. A CM SIGMOD Conf., pages 208-217, May 1991.
[26]
Mark A. Jones, Guy A. Story, and Bruce W. Ballard. Integrating multiple knowledge sources in a bayesian ocr post-processor. In First International Conference on Document Analysis and Recognition, Saint-Malo, France, September 1991. to appear.
[27]
Ibrahim Kamel and Christos Fa/outsos. Hilbert rtree: an improved r-tree using fractals. In Proc. of VLDB Conference, pages 500-509, Santiago, Chile, September 1994.
[28]
Joseph B. Kruskal. Nonmetric multidimensional scaling. Psychometrika, 29"1-27, 1964.
[29]
Joseph B. Kruskal and Myron Wish. Multidimensional scaling. SAGE publications, Beverly Hills, 1978.
[30]
Karen Kukich. Techniques for automatically correcting words in text. A CM Computing Surveys, 24(4):377-440, December 1992.
[31]
David B. Lomet and Betty Salzberg. The hb-tree: a multiattribute indexing method with good guaranteed performance. A CM TODS, 15(4):625-658, December 1990.
[32]
F. Murtagh. A survey of recent advances in hierarchical clustering Mgorithms. The Computer Journal, 26(4):354-359, 1983.
[33]
A. Desai Narasimhalu and Stavros Christodoulakis. Multimedia information systems: the unfolding of a reality. IEEE Computer, 24(10):6-8, October 1991.
[34]
Raymond T. Ng and Jiawei Han. Efficient and effective clustering methods for spatial data mining. Proc. of VLDB Conf., pages 144-155, September 1994.
[35]
Wayne Niblack, Ron Barber, Will Equitz, Myron Flickner, Eduardo Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, and Gabriel Taubin. The QBIC project: Querying images by content using color, texture and shape. SPIE 1993 Intl. Symposium on Electronic Imaging: Science and Technology, Conf. 1908, Storage and Retrieval for Image and Video Databases, February 1993. Also available as IBM Research Report RJ 9203 (81511), Feb. 1, 1993, Computer Science.
[36]
J. Nievergelt, H. Hinterberger, and K.C. Sevcik. The grid file: an adaptable, symmetric multikey file structure. ACM TODS, 9(1):38-7}, March 1984.
[37]
J. Orenstein. Spatial query processing in an objectoriented database system. Proc. ACM SIGMOD, pages 326-336, May 1986.
[38]
J.A. Orenstein. A comparison of spatial query processing techniques for native and parameter spaces. Proc. of A CM SIGMOD Conf., pages 343-352, 1990.
[39]
William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, 1988.
[40]
A. Ravishankar Rao and Jerry Lohse. Identifying high level features of texture perception. In SPIE Conference, San Jose, February 1992.
[41]
A. Kimball Romney, Roger N. Shepa~d, and Sara Beth Nel:love. Multidimensional scaling: Theory and applications in the behavioral sciences : vol II- Applications. Seminar Press, New York, 1972.
[42]
Nick Roussopoulos, Steve Kelley, and F. Vincent. Nearest neighbor queries. In Proc. of the 1995 A CM- SIGMOD Conference, San Jose, CA, May 1995. to appear.
[43]
G. Salton and M.J. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, 1983.
[44]
David Sankoff and Joseph B. Kruskal. Time Warps, String Edits and Macromolecules: the Theory and Practice o} Sequence Comparisons. Addison-Wesley Publishing Company, Inc., Reading, MA, 1983.
[45]
T. Sellis, N. Roussopoulos, and C. Faloutsos. The r+ tree: a dynamic index for multi-dimensional objects, in Proc. 13th International Conference on VLDB, pages 507-518, England, September 1987. also available as SRC-TR-87-32, UMIACS-TR-8%3, CS-TR-1795.
[46]
M. Shapiro. The choice of reference points in bestmatch file searching. Comm. of the A CM (CA CM), 20(5):339-343, May 1977.
[47]
Dennis Shasha and Tsong-Li Wang. New techniques for best-match retrieval. A CM TOIS, 8(2):140-158, April 1990.
[48]
R. N. Shepard. The analysis of proximities: Multidimensional scaling with an unknown distance i and ii. Psychometrika, 27:125-140, 219-246, 1962.
[49]
Gilbert Strang. Linear Algebra and its Applications. Academic Press, 1980. 2nd edition.
[50]
A.W. Toga, P.K. Banerjee, and E.M. Santori. Warping 3d models for interbrain comparisons. Neurosc. A bs., 16:247, 1990.
[51]
W. S. Torgerson. Multidimensional scaling: I. theory and method. Psychometrika, 17:401-419, 1952.
[52]
C.J. Van-Rijsbergen. information Retrieval. Butterworths, London, England, 1979. 2nd edition.
[53]
Dimitris Vassiliadis. The input-state space approach to the prediction of auroral geomagnetic activity from solar wind variables. Int. Workshop on Applications o/ Artificial Intelligence in Solar Terrestrial Physics, September 1993.
[54]
Stephen Wolfram. Mathematica. Addison Wesley, 1991. Second Edition.
[55]
Forrest W. Young. Multidimensional scaling : Hzstory, Theory and Applications. Lawrence Erlbaum associates, Hillsdale, New Jersey, 1987.

Cited By

View all
  • (2024)Map Reading and Analysis with GPT-4V(ision)ISPRS International Journal of Geo-Information10.3390/ijgi1304012713:4(127)Online publication date: 11-Apr-2024
  • (2024)iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36988142:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Sensor Network Based on Raspberry Pi for Safety System in Coal MinesProceedings of the 2024 3rd International Symposium on Intelligent Unmanned Systems and Artificial Intelligence10.1145/3669721.3675584(299-302)Online publication date: 17-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '95: Proceedings of the 1995 ACM SIGMOD international conference on Management of data
June 1995
508 pages
ISBN:0897917316
DOI:10.1145/223784
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 May 1995

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS95

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)419
  • Downloads (Last 6 weeks)61
Reflects downloads up to 07 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Map Reading and Analysis with GPT-4V(ision)ISPRS International Journal of Geo-Information10.3390/ijgi1304012713:4(127)Online publication date: 11-Apr-2024
  • (2024)iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor SearchProceedings of the ACM on Management of Data10.1145/36988142:6(1-26)Online publication date: 20-Dec-2024
  • (2024)Sensor Network Based on Raspberry Pi for Safety System in Coal MinesProceedings of the 2024 3rd International Symposium on Intelligent Unmanned Systems and Artificial Intelligence10.1145/3669721.3675584(299-302)Online publication date: 17-May-2024
  • (2024)Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project HealthACM Transactions on Software Engineering and Methodology10.1145/363025233:3(1-22)Online publication date: 14-Mar-2024
  • (2024)Trading Off Scalability, Privacy, and Performance in Data SynthesisIEEE Access10.1109/ACCESS.2024.336655612(26642-26654)Online publication date: 2024
  • (2024)Multidimensional scaling for big dataAdvances in Data Analysis and Classification10.1007/s11634-024-00591-9Online publication date: 13-Apr-2024
  • (2024)A FastMap-Based Framework for Efficiently Computing Top-K Projected CentralityMachine Learning, Optimization, and Data Science10.1007/978-3-031-53969-5_13(158-173)Online publication date: 16-Feb-2024
  • (2023)SurCoProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618810(10034-10052)Online publication date: 23-Jul-2023
  • (2023)CoSOV1Net: A Cone- and Spatial-Opponent Primary Visual Cortex-Inspired Neural Network for Lightweight Salient Object DetectionSensors10.3390/s2314645023:14(6450)Online publication date: 17-Jul-2023
  • (2023)Model Review: A PROMISEing OpportunityProceedings of the 19th International Conference on Predictive Models and Data Analytics in Software Engineering10.1145/3617555.3617876(64-68)Online publication date: 8-Dec-2023
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media