[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
article
Free access

Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and Nonmetric Spaces

Published: 01 December 2004 Publication History

Abstract

We present a novel approach for embedding general metric and nonmetric spaces into low-dimensional Euclidean spaces. As opposed to traditional multidimensional scaling techniques, which minimize the distortion of pairwise distances, our embedding algorithm seeks a low-dimensional representation of the data that preserves the structure (geometry) of the original data. The algorithm uses a hybrid criterion function that combines the pairwise distortion with what we call the geometric distortion. To assess the geometric distortion, we explore functions that reflect geometric properties. Our approach is different from the Isomap and LLE algorithms in that the discrepancy in distributional information is used to guide the embedding. We use clustering algorithms in conjunction with our embedding algorithm to direct the embedding process and improve its convergence properties.We test our method on metric and nonmetric data sets, and in the presence of noise. We demonstrate that our method preserves the structural properties of embedded data better than traditional MDS, and that its performance is robust with respect to clustering errors in the original data. Other results of the paper include accelerated algorithms for optimizing the standard MDS objective functions, and two methods for finding the most appropriate dimension in which to embed a given set of data.

References

[1]
D. K. Agrafiotis and H. Xu. A self-organizing principle for learning nonlinear manifolds. Proceedings of the National Academy of Arts and Sciences, 99:15869-15872, 2002.
[2]
I. Apostol and W. Szpankowski. Indexing and mapping of proteins using a modified nonlinear Sammon projection. Journal of Computational Chemistry, 20:1049-1059, 1999.
[3]
W. Basalaj. Incremental multidimensional scaling method for database visualization. In Visual Data Exploration and Analysis VI (Proceedings of the SPIE), volume 3643, pages 149-158, 1999.
[4]
M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral eigenmaps for embedding and clustering. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems, volume 14, pages 585-591. The MIT Press, 2002.
[5]
M. Blatt, S. Wiseman, and E. Domani. Data clustering using a model granular magnet. Neural Computation, 9:1805-1842, 1997.
[6]
T. F. Cox and M. A. A. Cox. Multidimensional Scaling. Chapman and Hall CRC, second edition, 2001.
[7]
D. L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Arts and Sciences, 100:5591-5596, 2003.
[8]
S. Dubnov, R. El-Yaniv, Y. Gdalyahu, E. Schneidman, N. Tishby, and G. Yona. A new nonparametric pairwise clustering algorithm based on iterative estimation of distance profiles. Machine Learning, 47:35-61, 2002.
[9]
R. El-Yaniv, S. Fine, and N. Tishby. Agnostic classification of Markovian sequences. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems, volume 10, pages 465-471. The MIT Press, 1998.
[10]
Y. Gdalyahu, D. Weinshall, and M. Werman. A randomized algorithm for pairwise clustering. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems, volume 11, pages 424-430. The MIT Press, 1999.
[11]
J. C. Gower. Some distance properties of latent root and vector methods in multivariate analysis. Biometrika, 53:325-338, 1966.
[12]
H. Hotelling. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24:417-441, 498-520, 1933.
[13]
A. J. Kearsley, R. A. Tapia, and M. W. Trosset. The solution of the metric STRESS and SSTRESS problems in multidimensional scaling using Newton's method. Computational Statistics, 13: 369-396, 1998.
[14]
R. W. Klein and R. C. Dubes. Experiments in projection and clustering by simulated annealing. Pattern Recognition, 22:213-220, 1989.
[15]
H. Klock and J. M. Buhmann. Multidimensional scaling by deterministic annealing. In Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 245-260, 1997.
[16]
S. Kullback. Information Theory and Statistics. John Wiley and Sons, 1959.
[17]
E. Levina and P. Bickel. The earth mover's distance is the Mallows distance: Some insights from statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, pages 251-256, 2001.
[18]
J. Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151, 1991.
[19]
N. Linial, E. London, and Yu. Rabinovich. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15:215-245, 1995.
[20]
S. W. Malone and M. W. Trosset. A study of the stationary configurations of the SSTRESS criterion for metric multidimensional scaling. Technical Report 00-06, Department of Computational & Applied Mathematics, Rice University, 2000.
[21]
W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, second edition, 1993.
[22]
V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of non-metric proximity data. Technical Report IAI-TR-2002-5, University of Bonn, Informatik III, 2002.
[23]
S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
[24]
Y. Rubner, C. Tomasi, and L. B. Guibas. A metric for distributions with applications to image databases. In Proceedings of the Sixth IEEE International Conference on Computer Vision, pages 59-66, 1998.
[25]
J. W. Sammon. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18:401-409, 1969.
[26]
L. K. Saul and S. T. Roweis. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4:119-155, 2003.
[27]
J. Shi and J. Malik. Normalized cuts and image segmentation. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 731-737, 1997.
[28]
P. S. Smith. Threshold validity for mutual neighborhood clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 15:89-92, 1993.
[29]
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
[30]
Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. PAMI, 15:1101-1113, 1993.
[31]
G. Yona. Methods for Global Organization of the Protein Sequence Space. PhD thesis, The Hebrew University, Jerusalem, Israel, 1999.
[32]
G. Young and A. S. Householder. Discussion of a set of points in terms of their mutual distances. Psychometrika, 3:19-22, 1938.

Cited By

View all
  • (2024)ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion MapIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345638731:1(294-304)Online publication date: 9-Sep-2024
  • (2019)Geometric Representations of Dichotomous Ordinal DataGraph-Theoretic Concepts in Computer Science10.1007/978-3-030-30786-8_16(205-217)Online publication date: 19-Jun-2019
  • (2014)Local ordinal embeddingProceedings of the 31st International Conference on International Conference on Machine Learning - Volume 3210.5555/3044805.3044987(II-847-II-855)Online publication date: 21-Jun-2014

Index Terms

  1. Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and Nonmetric Spaces

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image The Journal of Machine Learning Research
      The Journal of Machine Learning Research  Volume 5, Issue
      12/1/2004
      1571 pages
      ISSN:1532-4435
      EISSN:1533-7928
      Issue’s Table of Contents

      Publisher

      JMLR.org

      Publication History

      Published: 01 December 2004
      Published in JMLR Volume 5

      Qualifiers

      • Article

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)44
      • Downloads (Last 6 weeks)8
      Reflects downloads up to 11 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)ModalChorus: Visual Probing and Alignment of Multi-Modal Embeddings via Modal Fusion MapIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345638731:1(294-304)Online publication date: 9-Sep-2024
      • (2019)Geometric Representations of Dichotomous Ordinal DataGraph-Theoretic Concepts in Computer Science10.1007/978-3-030-30786-8_16(205-217)Online publication date: 19-Jun-2019
      • (2014)Local ordinal embeddingProceedings of the 31st International Conference on International Conference on Machine Learning - Volume 3210.5555/3044805.3044987(II-847-II-855)Online publication date: 21-Jun-2014

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media