Feature extraction

In machine learning, pattern recognition, and image processing, feature extraction starts from an initial set of measured data and builds derived values (features) intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps, and in some cases leading to better human interpretation. Feature extraction is related to dimensionality reduction. (Wiki)

Overview
- A survey of dimensionality reduction techniques - C.O.S. Sorzano, J. Vargas, A. Pascual-Montano
- Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review (2019) - Benyamin Ghojogh, Maria N. Samad, Sayema Asif Mashhadi, Tania Kapoor, Wahab Ali, Fakhri Karray, Mark Crowley

PCA - Principal Component Analysis (Wiki)
- On lines and planes of closest fit to systems of points in space (1901) - Karl Pearson
- Supervised PCA: Prediction by Supervised Principal Components (2006) - Eric Bair, Trevor Hastie, Debashis Paul, Robert Tibshirani
- Sparse PCA (sklearn)

DPCA - Dual Principal Component Analysis

KPCA - Kernel Principal Component Analysis (sklearn, Wiki)
- Nonlinear Component Analysis as a Kernel Eigenvalue Problem (1998) - Bernhard Schölkopf, Alexander Smola, Klaus-Robert Müller
- Kernel PCA for Novelty Detection (2006) - Heiko Hoffmann
- Robust Kernel Principal Component Analysis - Minh Hoai Nguyen, Fernando De la Torre

IPCA - Incremental (online) PCA (CRAN, sklearn)

ICA - Independent Component Analysis (Wiki)
- Independent Component Analysis: Algorithms and Applications (2000) - Aapo Hyvärinen, Erkki Oja
- Independent Component Analysis (2001) - free ebook - Aapo Hyvärinen, Juha Karhunen, Erkki Oja
- FastICA (sklearn)

FLDA - Fisher's Linear Discriminant Analysis (supervised) (Wiki)
Similar to PCA, FLDA computes a projection of the data along a direction; however, rather than maximizing the variance of the data, FLDA uses label information to find a projection that maximizes the ratio of between-class variance to within-class variance. (Source) A minimal scikit-learn sketch contrasting PCA, FLDA, and factor analysis follows the Factor Analysis entry below.
- The Use of Multiple Measurements in Taxonomic Problems (1936) - R. A. Fisher
- The Utilization of Multiple Measurements in Problems of Biological Classification (1948) - requires registration - C. Radhakrishna Rao
- PCA versus LDA (2001) - Aleix M. Martinez, Avinash C. Kak
- Package: MASS includes lda (CRAN)
- Package: sda (CRAN)

KFLDA - Kernel Fisher Linear Discriminant Analysis

MDS - Multidimensional Scaling (Wiki)
- Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis (1964) - J. B. Kruskal
- An Analysis of Classical Multidimensional Scaling (2019) - Anna Little, Yuying Xie, Qiang Sun
- Packages: sklearn

Isomap (Homepage, Wiki)
- A Global Geometric Framework for Nonlinear Dimensionality Reduction (2000) - Joshua B. Tenenbaum, Vin de Silva, John C. Langford
- Packages: dimRed, sklearn

Latent Dirichlet Allocation
- Online Learning for Latent Dirichlet Allocation (2010) - Matthew D. Hoffman, David M. Blei, Francis Bach

Factor Analysis (Wiki, sklearn)
This technique reduces a large number of variables to a smaller number of factors. The observed values are expressed as functions of a number of possible causes in order to find which are the most important. The observations are assumed to be generated by a linear transformation of lower-dimensional latent factors plus added Gaussian noise. (Source)
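To make the contrast between PCA's unsupervised variance criterion, FLDA's supervised class-separation criterion, and the latent-factor model of factor analysis concrete, here is a minimal scikit-learn sketch. The Iris data and the parameter choices are illustrative assumptions, not taken from the papers cited above.

```python
# Minimal sketch: PCA vs. FLDA (LDA) vs. Factor Analysis on the Iris data.
# All three map 4 features to 2, but with different objectives:
#   - PCA maximizes the variance of the projected data (unsupervised),
#   - LDA maximizes between-class / within-class variance (uses labels y),
#   - FactorAnalysis fits a linear latent-factor model with Gaussian noise.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

Z_pca = PCA(n_components=2).fit_transform(X)                            # labels ignored
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # labels used
Z_fa = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)  # latent factors

print(Z_pca.shape, Z_lda.shape, Z_fa.shape)  # (150, 2) for each method
```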
t-SNE (Homepage, Wiki, CRAN, sklearn)
- Visualizing Data using t-SNE (2008) - Laurens van der Maaten, Geoffrey Hinton
- Accelerating t-SNE using Tree-Based Algorithms (2014) - Laurens van der Maaten

Tree-SNE - Hierarchical t-SNE (Code)
- Tree-SNE: Hierarchical Clustering and Visualization Using t-SNE (2020) - Isaac Robinson, Emma Pierce-Hoffman

Let-SNE
- Let-SNE: A Hybrid Approach to Data Embedding and Visualization of Hyperspectral Imagery (2020) - Megh Shukla, Biplab Banerjee, Krishna Mohan Buddhiraju

LLE - Locally Linear Embedding
Constructs a k-nearest-neighbor graph, similar to Isomap, and then tries to locally represent every data sample x_i as a weighted sum of its k nearest neighbors. (Source) A minimal scikit-learn sketch of LLE and HLLE appears at the end of this page.

HLLE - Hessian Eigenmapping
Projects data to a lower dimension while preserving the local neighborhood, like LLE, but uses the Hessian operator to better achieve this result, hence the name. (Source)

Laplacian Eigenmap - Spectral Embedding

Maximum Variance Unfolding

NMF - Non-negative Matrix Factorization

UMAP - Uniform Manifold Approximation and Projection (Code, GPU version)
- UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction (2018) - Leland McInnes, John Healy, James Melville

Trimap (Code, PyPI)
- TriMap: Large-scale Dimensionality Reduction Using Triplets (2019) - Ehsan Amid, Manfred K. Warmuth

Autoencoders (Wiki)

SOM - Self-Organizing Maps or Kohonen Maps (Wiki)
- Self-Organized Formation of Topologically Correct Feature Maps (1982) - Teuvo Kohonen

Sammon's Mapping

SDE - Semi-definite Embedding

LargeVis
- Visualizing Large-scale and High-dimensional Data (2016) - Jian Tang, Jingzhou Liu, Ming Zhang, Qiaozhu Mei

Software

R
- dimRed (CRAN)
- dyndimred (CRAN)
- intrinsicDimension (CRAN)
- Rdimtools (Paper, CRAN)

Python
- scikit-learn
- umap-learn (Homepage, PyPI)

JavaScript
- tsne (NPM)
- umap-js (NPM)
- dimred (NPM)

C++
- tapkee (Code)

Web
- StatSim (Vis)
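To complement the Python entries above, here is a minimal scikit-learn sketch of LLE and its Hessian variant (HLLE), as described in the LLE and HLLE entries; the swiss-roll data and the neighborhood and dimension settings are illustrative assumptions, not recommendations from the cited sources.

```python
# Minimal sketch: LLE and Hessian LLE (HLLE) on a synthetic swiss roll.
# Both build a k-nearest-neighbor graph and reconstruct each sample from its
# neighbors; 'hessian' additionally uses a local Hessian estimator and
# requires n_neighbors > n_components * (n_components + 3) / 2.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, method="standard")
hlle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, method="hessian")

Z_lle = lle.fit_transform(X)    # (1000, 2)
Z_hlle = hlle.fit_transform(X)  # (1000, 2)
print(Z_lle.shape, Z_hlle.shape)
```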