
Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis

Published: 01 May 2007

Abstract

Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality-preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.
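The abstract describes LFDA procedurally: build Fisher-style within-class and between-class scatter matrices whose pairwise terms are reweighted by a within-class affinity, then obtain the embedding from a generalized eigenvalue problem. The sketch below illustrates that recipe in NumPy/SciPy; it is not the paper's reference implementation. The function name lfda and the parameters n_components, knn, and reg are illustrative choices, the within-class affinity uses the local-scaling heuristic of Zelnik-Manor and Perona (distance to the k-th nearest neighbour within the class), and a small ridge term is added purely for numerical stability.

```python
# Minimal, illustrative sketch of LFDA as summarised in the abstract.
# Assumptions (not from the paper): function/parameter names, the local-scaling
# affinity with knn=7, and the ridge term reg for numerical stability.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lfda(X, y, n_components=2, knn=7, reg=1e-6):
    """X: (n, d) samples, y: (n,) class labels.
    Returns T of shape (d, n_components); embed data as X @ T."""
    n, d = X.shape
    Ww = np.zeros((n, n))           # local within-class weights
    Wb = np.full((n, n), 1.0 / n)   # weight 1/n for every cross-class pair

    for c in np.unique(y):
        idx = np.where(y == c)[0]
        nc = len(idx)
        dist = cdist(X[idx], X[idx])
        k = min(knn, nc - 1)
        # local scaling: sigma_i = distance to the k-th nearest neighbour in the class
        sigma = np.sort(dist, axis=1)[:, k] if k > 0 else np.ones(nc)
        sigma = np.maximum(sigma, 1e-12)
        A = np.exp(-dist ** 2 / np.outer(sigma, sigma))  # within-class affinity
        Ww[np.ix_(idx, idx)] = A / nc
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / nc)

    # S = 1/2 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = X^T (D - W) X
    def scatter(W):
        L = np.diag(W.sum(axis=1)) - W
        return X.T @ L @ X

    Sw, Sb = scatter(Ww), scatter(Wb)
    # generalized eigenvalue problem  Sb v = lambda Sw v; keep leading eigenvectors
    evals, evecs = eigh(Sb, Sw + reg * np.eye(d))
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order]
```

Under these assumptions, T = lfda(X, y, n_components=2) yields a d-by-2 transformation and X @ T a two-dimensional embedding of the kind used in the paper's visualization experiments; the kernelized extension mentioned in the abstract is not attempted in this sketch.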

Published In

The Journal of Machine Learning Research, Volume 8 (May 2007), 1150 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 May 2007, in JMLR Volume 8

