Feature extraction

In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretation. Feature extraction is related to dimensionality reduction. (Wiki)

Overview
- A survey of dimensionality reduction techniques - C.O.S. Sorzano, J. Vargas, A. Pascual-Montano
- Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review (2019) - Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi, Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley

PCA - Principal Component Analysis (Wiki)
- On lines and planes of closest fit to systems of points in space (1901) - Karl Pearson
- Supervised PCA: Prediction by Supervised Principal Components (2006) - Eric Bair, Trevor Hastie, Debashis Paul, Robert Tibshirani
- Sparse PCA (sklearn)

DPCA - Dual Principal Component Analysis

KPCA - Kernel Principal Component Analysis (sklearn, Wiki)
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem (1998) - Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller
- Kernel PCA for Novelty Detection (2006) - Heiko Hoffmann
- Robust Kernel Principal Component Analysis - Minh Hoai Nguyen, Fernando De la Torre

IPCA - Incremental (online) PCA (CRAN, sklearn)

ICA - Independent Component Analysis (Wiki)
- Independent Component Analysis: Algorithms and Applications (2000) - Aapo Hyvärinen, Erkki Oja
- Independent Component Analysis (2001) - free ebook - Aapo Hyvärinen, Juha Karhunen, Erkki Oja
- FastICA (sklearn)

FLDA - Fisher's Linear Discriminant Analysis (supervised) (Wiki)
Similar to PCA, FLDA computes a projection of the data along a direction; however, rather than maximizing the variance of the data, FLDA uses label information to find a projection that maximizes the ratio of between-class variance to within-class variance. (Source) A minimal scikit-learn sketch contrasting PCA, FLDA, and factor analysis follows the Factor Analysis entry below.
- The Use of Multiple Measurements in Taxonomic Problems (1936) - R. A. Fisher
- The Utilization of Multiple Measurements in Problems of Biological Classification (1948) - requires registration - C. Radhakrishna Rao
- PCA versus LDA (2001) - Aleix M. Martinez, Avinash C. Kak
- Package: MASS includes lda (CRAN)
- Package: sda (CRAN)

KFLDA - Kernel Fisher Linear Discriminant Analysis

MDS - Multidimensional Scaling (Wiki)
- Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis (1964) - J. B. Kruskal
- An Analysis of Classical Multidimensional Scaling (2019) - Anna Little, Yuying Xie, Qiang Sun
- Packages: sklearn

Isomap (Homepage, Wiki)
- A Global Geometric Framework for Nonlinear Dimensionality Reduction (2000) - Joshua B. Tenenbaum, Vin de Silva, John C. Langford
- Packages: dimRed, sklearn

Latent Dirichlet Allocation
- Online Learning for Latent Dirichlet Allocation (2010) - Matthew D. Hoffman, David M. Blei, Francis Bach

Factor Analysis (Wiki, sklearn)
This technique reduces a large number of variables to a smaller number of factors. The observed values are expressed as functions of a number of possible causes in order to find which are the most important. The observations are assumed to be generated by a linear transformation of lower-dimensional latent factors plus added Gaussian noise. (Source)
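To make the contrast between PCA's unsupervised variance criterion, FLDA's supervised class-separation criterion, and the latent-factor model of factor analysis concrete, here is a minimal scikit-learn sketch. The Iris data and the parameter choices are illustrative assumptions, not taken from the papers cited above.

```python
# Minimal sketch: PCA vs. FLDA (LDA) vs. Factor Analysis on the Iris data.
# All three map 4 features to 2, but with different objectives:
#   - PCA maximizes the variance of the projected data (unsupervised),
#   - LDA maximizes between-class / within-class variance (uses labels y),
#   - FactorAnalysis fits a linear latent-factor model with Gaussian noise.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

Z_pca = PCA(n_components=2).fit_transform(X)                            # labels ignored
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # labels used
Z_fa = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)  # latent factors

print(Z_pca.shape, Z_lda.shape, Z_fa.shape)  # (150, 2) for each method
```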
t-SNE (Homepage, Wiki, CRAN, sklearn)
- Visualizing Data using t-SNE (2008) - Laurens van der Maaten, Geoffrey Hinton
- Accelerating t-SNE using Tree-Based Algorithms (2014) - Laurens van der Maaten

Tree-SNE - Hierarchical t-SNE (Code)
- Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE (2020) - Isaac Robinson, Emma Pierce-Hoffman

Let-SNE
- Let-SNE: A Hybrid Approach to Data Embedding and Visualization of Hyperspectral Imagery (2020) - Megh Shukla, Biplab Banerjee, Krishna Mohan Buddhiraju

LLE - Locally Linear Embedding
Constructs a k-nearest-neighbor graph, similar to Isomap, and then tries to locally represent every data sample x_i as a weighted sum of its k nearest neighbors. (Source) A minimal scikit-learn sketch of LLE and HLLE appears at the end of this page.

HLLE - Hessian Eigenmapping
Projects data to a lower dimension while preserving the local neighborhood, like LLE, but uses the Hessian operator to better achieve this result, hence the name. (Source)

Laplacian Eigenmap - Spectral Embedding

Maximum Variance Unfolding

NMF - Non-negative Matrix Factorization

UMAP - Uniform Manifold Approximation and Projection (Code, GPU version)
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2018) - Leland McInnes, John Healy, James Melville

Trimap (Code, PyPI)
- TriMap: Large-scale Dimensionality Reduction Using Triplets (2019) - Ehsan Amid, Manfred K. Warmuth

Autoencoders (Wiki)

SOM - Self-Organizing Maps or Kohonen Maps (Wiki)
- Self-Organized Formation of Topologically Correct Feature Maps (1982) - Teuvo Kohonen

Sammon's Mapping

SDE - Semi-definite Embedding

LargeVis
- Visualizing Large-scale and High-dimensional Data (2016) - Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei

Software

R
- dimRed (CRAN)
- dyndimred (CRAN)
- intrinsicDimension (CRAN)
- Rdimtools (Paper, CRAN)

Python
- scikit-learn
- umap-learn (Homepage, PyPI)

JavaScript
- tsne (NPM)
- umap-js (NPM)
- dimred (NPM)

C++
- tapkee (Code)

Web
- StatSim (Vis)
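To complement the Python entries above, here is a minimal scikit-learn sketch of LLE and its Hessian variant (HLLE), as described in the LLE and HLLE entries; the swiss-roll data and the neighborhood and dimension settings are illustrative assumptions, not recommendations from the cited sources.

```python
# Minimal sketch: LLE and Hessian LLE (HLLE) on a synthetic swiss roll.
# Both build a k-nearest-neighbor graph and reconstruct each sample from its
# neighbors; 'hessian' additionally uses a local Hessian estimator and
# requires n_neighbors > n_components * (n_components + 3) / 2.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, method="standard")
hlle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, method="hessian")

Z_lle = lle.fit_transform(X)    # (1000, 2)
Z_hlle = hlle.fit_transform(X)  # (1000, 2)
print(Z_lle.shape, Z_hlle.shape)
```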