
Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis

Published: 01 May 2007

Abstract

Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in high-dimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called locality-preserving projection (LPP) can work well with multimodal data due to its locality-preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to non-linear dimensionality reduction scenarios by applying the kernel trick.
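The abstract describes LFDA procedurally: build Fisher-style within-class and between-class scatter matrices whose pairwise terms are reweighted by a within-class affinity, then obtain the embedding from a generalized eigenvalue problem. The sketch below illustrates that recipe in NumPy/SciPy; it is not the paper's reference implementation. The function name lfda and the parameters n_components, knn, and reg are illustrative choices, the within-class affinity uses the local-scaling heuristic of Zelnik-Manor and Perona (distance to the k-th nearest neighbour within the class), and a small ridge term is added purely for numerical stability.

```python
# Minimal, illustrative sketch of LFDA as summarised in the abstract.
# Assumptions (not from the paper): function/parameter names, the local-scaling
# affinity with knn=7, and the ridge term reg for numerical stability.
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lfda(X, y, n_components=2, knn=7, reg=1e-6):
    """X: (n, d) samples, y: (n,) class labels.
    Returns T of shape (d, n_components); embed data as X @ T."""
    n, d = X.shape
    Ww = np.zeros((n, n))           # local within-class weights
    Wb = np.full((n, n), 1.0 / n)   # weight 1/n for every cross-class pair

    for c in np.unique(y):
        idx = np.where(y == c)[0]
        nc = len(idx)
        dist = cdist(X[idx], X[idx])
        k = min(knn, nc - 1)
        # local scaling: sigma_i = distance to the k-th nearest neighbour in the class
        sigma = np.sort(dist, axis=1)[:, k] if k > 0 else np.ones(nc)
        sigma = np.maximum(sigma, 1e-12)
        A = np.exp(-dist ** 2 / np.outer(sigma, sigma))  # within-class affinity
        Ww[np.ix_(idx, idx)] = A / nc
        Wb[np.ix_(idx, idx)] = A * (1.0 / n - 1.0 / nc)

    # S = 1/2 * sum_ij W_ij (x_i - x_j)(x_i - x_j)^T = X^T (D - W) X
    def scatter(W):
        L = np.diag(W.sum(axis=1)) - W
        return X.T @ L @ X

    Sw, Sb = scatter(Ww), scatter(Wb)
    # generalized eigenvalue problem  Sb v = lambda Sw v; keep leading eigenvectors
    evals, evecs = eigh(Sb, Sw + reg * np.eye(d))
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order]
```

Under these assumptions, T = lfda(X, y, n_components=2) yields a d-by-2 transformation and X @ T a two-dimensional embedding of the kind used in the paper's visualization experiments; the kernelized extension mentioned in the abstract is not attempted in this sketch.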

Published In

The Journal of Machine Learning Research, Volume 8 (May 2007), 1150 pages
ISSN: 1532-4435
EISSN: 1533-7928

Publisher

JMLR.org

Publication History

Published: 01 May 2007, in JMLR Volume 8

