[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/1401890.1401918acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Structured metric learning for high dimensional problems

Published: 24 August 2008 Publication History

Abstract

The success of popular algorithms such as k-means clustering or nearest neighbor searches depend on the assumption that the underlying distance functions reflect domain-specific notions of similarity for the problem at hand. The distance metric learning problem seeks to optimize a distance function subject to constraints that arise from fully-supervised or semisupervised information. Several recent algorithms have been proposed to learn such distance functions in low dimensional settings. One major shortcoming of these methods is their failure to scale to high dimensional problems that are becoming increasingly ubiquitous in modern data mining applications. In this paper, we present metric learning algorithms that scale linearly with dimensionality, permitting efficient optimization, storage, and evaluation of the learned metric. This is achieved through our main technical contribution which provides a framework based on the log-determinant matrix divergence which enables efficient optimization of structured, low-parameter Mahalanobis distances. Experimentally, we evaluate our methods across a variety of high dimensional domains, including text, statistical software analysis, and collaborative filtering, showing that our methods scale to data sets with tens of thousands or more features. We show that our learned metric can achieve excellent quality with respect to various criteria. For example, in the context of metric learning for nearest neighbor classification, we show that our methods achieve 24% higher accuracy over the baseline distance. Additionally, our methods yield very good precision while providing recall measures up to 20% higher than other baseline methods such as latent semantic analysis.

References

[1]
R. A. Baeza-Yates and B. A. Ribeiro-Neto. Modern Information Retrieval. ACM Press / Addison-Wesley, 1999.
[2]
S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, March 2004.
[3]
J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic Metric Learning. In ICML, June 2007.
[4]
S. C. Deerwester, S. T. Dumais, T. K. Landauer, G.W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science, 41(6):391--407, 1990.
[5]
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience Publication, 2000.
[6]
A. Globerson and S. Roweis. Metric Learning by Collapsing Classes. In NIPS, 2005
[7]
G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, second edition, 1989.
[8]
J. Ha, C. Rossbach, J. Davis, I. Roy, D. Chen, H. Ramadan, and E. Witchel. Improved Error Reporting for Software that Uses Black Box Components. In Programming Language Design and Implementation (PLDI), 2007.
[9]
B. Kulis, M. Sustik, and I. S. Dhillon. Learning Low-rank Kernel Matrices. In ICML, 2006.
[10]
K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. In NIPS, 2005.
[11]
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. In NIPS, 2002.

Cited By

View all
  • (2024)Reinforcement Learning for Efficient Power Systems Planning: A Review of Operational and Expansion StrategiesEnergies10.3390/en1709216717:9(2167)Online publication date: 1-May-2024
  • (2023)Automatic MILP solver configuration by learning problem similaritiesAnnals of Operations Research10.1007/s10479-023-05508-x339:1-2(909-936)Online publication date: 14-Jul-2023
  • (2021)A new measure between sets of probability distributions with applications to erratic financial behaviorJournal of Statistical Mechanics: Theory and Experiment10.1088/1742-5468/ac3d912021:12(123404)Online publication date: 23-Dec-2021
  • Show More Cited By

Index Terms

  1. Structured metric learning for high dimensional problems

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2008
    1116 pages
    ISBN:9781605581934
    DOI:10.1145/1401890
    • General Chair:
    • Ying Li,
    • Program Chairs:
    • Bing Liu,
    • Sunita Sarawagi
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 August 2008

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. algorithms
    2. experimentation

    Qualifiers

    • Research-article

    Conference

    KDD08

    Acceptance Rates

    KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)6
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Dec 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Reinforcement Learning for Efficient Power Systems Planning: A Review of Operational and Expansion StrategiesEnergies10.3390/en1709216717:9(2167)Online publication date: 1-May-2024
    • (2023)Automatic MILP solver configuration by learning problem similaritiesAnnals of Operations Research10.1007/s10479-023-05508-x339:1-2(909-936)Online publication date: 14-Jul-2023
    • (2021)A new measure between sets of probability distributions with applications to erratic financial behaviorJournal of Statistical Mechanics: Theory and Experiment10.1088/1742-5468/ac3d912021:12(123404)Online publication date: 23-Dec-2021
    • (2020)Reinforcement Learning based Evolutionary Metric Filtering for High Dimensional Problems2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA)10.1109/ICMLA51294.2020.00045(226-233)Online publication date: Dec-2020
    • (2020)Deep Metric Learning for text data based on Triplet NetworkIOP Conference Series: Materials Science and Engineering10.1088/1757-899X/806/1/012038806(012038)Online publication date: 5-May-2020
    • (2018)High-dimensional similarity learning via dual-sparse random projectionProceedings of the 27th International Joint Conference on Artificial Intelligence10.5555/3304889.3305078(3005-3011)Online publication date: 13-Jul-2018
    • (2018)Global-Local Temporal Saliency Action PredictionIEEE Transactions on Image Processing10.1109/TIP.2017.275114527:5(2272-2285)Online publication date: May-2018
    • (2018)Discriminative Transformation Learning for Fuzzy Sparse Subspace ClusteringIEEE Transactions on Cybernetics10.1109/TCYB.2017.272954248:8(2218-2231)Online publication date: Aug-2018
    • (2017)Efficient stochastic optimization for low-rank distance metric learningProceedings of the Thirty-First AAAI Conference on Artificial Intelligence10.5555/3298239.3298375(933-939)Online publication date: 4-Feb-2017
    • (2017)Sparse Compositional Local Metric LearningProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining10.1145/3097983.3098153(1097-1104)Online publication date: 13-Aug-2017
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media