[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Access Structures for Angular Similarity Queries

Published: 01 November 2006 Publication History

Abstract

Angular similarity measures have been utilized by several database applications to define semantic similarity between various data types such as text documents, time-series, images, and scientific data. Although similarity searches based on Euclidean distance have been extensively studied in the database community, processing of angular similarity searches has been relatively untouched. Problems due to a mismatch in the underlying geometry as well as the high dimensionality of the data make current techniques either inapplicable or their use results in poor performance. This brings up the need for effective indexing methods for angular similarity queries. We first discuss how to efficiently process such queries and propose effective access structures suited to angular similarity measures. In particular, we propose two classes of access structures, namely, Angular-sweep and Cone-shell, which perform different types of quantization based on the angular orientation of the data objects. We also develop query processing algorithms that utilize these structures as dense indices. The proposed techniques are shown to be scalable with respect to both dimensionality and the size of the data. Our experimental results on real data sets from various applications show two to three orders of magnitude of speedup over the current techniques.

References

[1]
R.K. Ando, “Latent Semantic Space: Iterative Scaling Improves Precision of Inter-Document Similarity Measurement,” Proc. 23rd ACM SIGIR Conf., pp. 216-223, 2000.
[2]
M. Andrec, P. Du, and R.M. Levy, “Protein Structural Motif Recognition via NMR Residual Dipolar Couplings,” J. Am. Chemical Soc., no. 123, pp. 1222-1229, 2001.
[3]
D. Androutsos, K. Plataniotis, and A. Venetsanopoulos, “A Novel Vector-Based Approach to Color Image Retrieval Using a Vector Angular-Based Distance Measure,” Computer Vision and Image Understanding, vol. 75, nos. 1/2, pp. 46-58, Aug. 1999.
[4]
A. Baruffolo, “R-Trees for Astronomical Data Indexing,” ASP Conf. Ser., Astronomical Data Analysis Software and Systems VII, pp.375, 1999.
[5]
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger, “The ${\rm R}^{\ast}$ Tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 322-331, May 1990.
[6]
S. Berchtold, C. Bohm, H. Jagadish, H. Kriegel, and J. Sander, “Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces,” Proc. 16th Int'l Conf. Data Eng., pp. 577-588, 2000.
[7]
S. Berchtold, C. Bohm, H.V. Jagadish, H.-P. Kriegel, and J. Sander, “Independent Quantization: An Index Compression Technique for High-Dimensional Data Spaces,” Proc. Int'l Conf. Data Eng., pp.577-588, 2000.
[8]
S. Berchtold, C. Bohm, D. Keim, and H. Kriegel, “A Cost Model for Nearest Neighbor Search in High-Dimensional Data Space,” Proc. ACM Symp. Principles of Database Systems, pp. 78-86, June 1997.
[9]
S. Berchtold, C. Bohm, and H.-P. Kriegel, “The Pyramid-Technique: Towards Breaking the Curse of Dimensionality,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 142-153, June 1998.
[10]
P.V. Biron and D.H. Kraft, “New Methods for Relevance Feedback: Improving Information Retrieval Performance,” Proc. ACM Symp. Applied Computing, pp. 482-487, 1995.
[11]
I. Choi, J. Kwon, and S. Kim, “Local Feature Frequency Profile: A Method to Measure Structural Similarity in Proteins,” Proc. Nat'l Aacademy of Sciences of the US, no. 101, pp. 3797-3802, Mar. 2004.
[12]
P. Ciaccia, M. Patella, and P. Zezula, “M-Tree: An Efficient Access Method for Similarity Search in Metric Space,” Proc. 23rd Very Large Data Bases Conf., pp. 426-435, 1997.
[13]
A. Debelack, J. Dehn, L. Muchinsky, and D. Smith, “Next Generation Air Traffic Control Automation,” IBM Systems J., vol. 24, no. 1, pp. 63-77, 1995.
[14]
S.C. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas, and R.A. Harshman, “Indexing by Latent Semantic Analysis,” J. Am. Soc. Information Science, vol. 41, no. 6, pp. 391-407, 1990.
[15]
D. Hull, “Improving Text Retrieval for the Routing Problem Using Latent Semantic Indexing,” Proc. 17th ACM-SIGIR Conf., pp. 282-291, 1994.
[16]
M.P.T. Doom and M. Raymer, “GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation,” Proc. Genetic and Evolutionary Computation Conf., pp. 426-437, May 2004.
[17]
A. Dumitras and A.N. Venetsanopoulos, “Angular Map-Driven Snakes with Application to Object Shape Description in Color Images,” IEEE Trans. Image Processing, vol. 10, no. 12, pp. 1851-1859, Dec. 2001.
[18]
H. Ferhatosmanoglu, D. Agrawal, and A.E. Abbadi, “Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching,” Proc. Int'l Conf. Data Eng., pp. 608-615, Mar. 1999.
[19]
H. Ferhatosmanoglu, D. Agrawal, and A.E. Abbadi, “Efficient Processing of Conical Queries,” Proc. ACM Conf. Information and Knowledge Management (CIKM), pp. 1-8, 2001.
[20]
W.M.G. Dummy, “Introduction to Statistics for Bioinformatics, with Applications to Expression Microarrays,” CSB Tutorial Sessions, 2003.
[21]
A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 47-57, 1984.
[22]
F. Korn, H.V. Jagadish, and C. Faloutsos, “Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences,“ Proc. ACM-SIGMOD Int'l Conf. Management of Data, pp. 289-300, 1997.
[23]
P. Kostiuk, M. Adams, D. Allinger, G. Rosch, and J. Kuchar, “An Integrated Safety Analysis Methodology for Emerging Air Transport Technologies,” NASA/CR-1998-207661, p. 66, Apr. 1998.
[24]
LEDAS, Leicester Database and Archive Service,
[25]
P.O.B.M.B. Eisen, P.T. Spellman, and D. Botstein, “Cluster Analysis and Display of Genome-Wide Expression Patterns,” Proc. Nat'l Aacademy of Sciences of the US., no. 95, pp. 14863-14868, Dec. 1998.
[26]
B.S. Manjunath, “Airphoto Dataset,”
[27]
Z.D. Michael, W. Berry, and E.R. Jessup, “Matrices, Vector Spaces and Information Retrieval,” SIAM Rev., vol. 41, no. 2, pp. 335-362, 1999.
[28]
J. Nievergelt, H. Hans, and K.C. Sevcik, “The Grid File: An Adaptable, Symmetric Multikey File Structure,” ACM Trans. Database Systems, vol. 9, no. 1, pp. 38-71, 1984.
[29]
D.K.P.E. Trahanias and A.N. Venetsanopoulos, “Directional Processing of Color Images: Theory and Experimental Results,” IEEE Trans. Image Processing, vol. 5, no. 6, pp. 868-880, Oct. 1996.
[30]
G. Qian, S. Sural, Y. Gu, and S. Pramanik, “Similarity between Euclidean and Cosine Angle Distance for Nearest Neighbor Queries,” Proc. ACM Symp. Applied Computing, pp. 1232-1237, 2004.
[31]
A. Razdan, “Wavelet Correlation Coefficient of Strongly Correlated Financial Time Series,” ArXiv:Cond-Mat, no. 2, Oct. 2003.
[32]
Y. Sakurai, M. Yoshikawa, S. Uemura, and H. Kojima, “The A-Tree: An Index Structure for High-Dimensional Spaces Using Relative Approximation,” Proc. 26th Int'l Conf. Very Large Data Bases, pp. 516-526, 2000.
[33]
G. Salton and M.J. McGill, Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983.
[34]
A. Singhal, “Modern Information Retrieval: A Brief Overview,” IEEE Data Eng. Bull., vol. 24, no. 4, pp. 35-43, 2001.
[35]
V. Subrahmanian, Principles of Multimedia Database Systems. San Francisco: Morgan Kaufmann, 1999.
[36]
A.S. Szalay, J. Gray, and A.R. Thakar, “The sdss Skyserver: Public Access to the Sloan Digital Sky Server Data,” Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 570-581, 2002.
[37]
C. Tang, Z. Xu, and M. Mahalingam, “pSearch: Information Retrieval in Structured Overlays,” Proc. ACM HotNets-I, Oct. 2002.
[38]
P.E. Trahanias and A.N. Venetsanopoulos, “Vector Directional Filters: A New Class of Multichannel Image Processing Filters,” IEEE Trans. Image Processing, vol. 2, no. 4, pp. 528-534, Oct. 1993.
[39]
D.R. Warn, “Lighting Controls for Synthetic Images,” Computer Graphics, vol. 17, no. 3, pp. 13-21, 1983.
[40]
R. Weber, H.-J. Schek, and S. Blott, “A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces,” Proc. Int'l Conf. Very Large Data Bases, pp.194-205, Aug. 1998.
[41]
A. Wicenec and M. Albrecht, “Methods for Structuring and Searching Very Large Catalogs,” ASP Conf. Ser., Astronomical Data Analysis Software and Systems VII, no. 145, p. 512, 1998.
[42]
R. Wilkinson and P. Hingston, “Using Cosine Distance in a Neural Network for Document Retrieval,” Proc. ACM SIGIR Conf., pp.202-210, 1991.
[43]
P. Zhang, Y. Huang, S. Shekhar, and V. Kumar, “Correlation Analysis of Spatial Time Series Datasets: A Filter-and-Refine Approach,” Proc. Seventh Pacific-Asia Conf. Knowledge Discovery and Data Mining, 2003.
[44]
P. Zhang, Y. Huang, S. Shekhar, and V. Kumar, “Exploiting Spatial Autocorrelation to Efficiently Process Correlation-Based Similarity Queries,” Proc. Eighth Int'l Symp. Spatial and Temporal Databases, July 2003.

Cited By

View all
  • (2023)Generative adversarial networks in electrocardiogram synthesisArtificial Intelligence in Medicine10.1016/j.artmed.2023.102632143:COnline publication date: 1-Sep-2023
  • (2015)Diversity-Aware Top-k Publish/Subscribe for Text StreamProceedings of the 2015 ACM SIGMOD International Conference on Management of Data10.1145/2723372.2749451(347-362)Online publication date: 27-May-2015
  • (2015)Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataundefinedOnline publication date: 27-May-2015
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 18, Issue 11
November 2006
140 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 November 2006

Author Tags

  1. Angular query
  2. angular similarity measures
  3. high-dimensional data.
  4. indexing
  5. performance

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 04 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Generative adversarial networks in electrocardiogram synthesisArtificial Intelligence in Medicine10.1016/j.artmed.2023.102632143:COnline publication date: 1-Sep-2023
  • (2015)Diversity-Aware Top-k Publish/Subscribe for Text StreamProceedings of the 2015 ACM SIGMOD International Conference on Management of Data10.1145/2723372.2749451(347-362)Online publication date: 27-May-2015
  • (2015)Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataundefinedOnline publication date: 27-May-2015
  • (2014)Feature selection for fault level diagnosis of planetary gearboxesAdvances in Data Analysis and Classification10.1007/s11634-014-0168-48:4(377-401)Online publication date: 1-Dec-2014
  • (2012)The study of different similarity measure techniques in recognition of handwritten charactersProceedings of the International Conference on Advances in Computing, Communications and Informatics10.1145/2345396.2345524(781-787)Online publication date: 3-Aug-2012
  • (2011)Go with the flowProceedings of the First international ICST conference on Theory and practice of algorithms in (computer) systems10.5555/1987334.1987344(81-91)Online publication date: 18-Apr-2011
  • (2010)iPocProceedings of the 11th international conference on Web-age information management10.5555/1884017.1884060(345-356)Online publication date: 15-Jul-2010

View Options

View options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media