ℓ1 Major Component Detection and Analysis (ℓ1 MCDA) in Three and Higher Dimensional Spaces
Figure 1. Sample from one Gaussian distribution.
Figure 2. Sample from one Student t distribution.
Figure 3. Sample from four overlaid Gaussian distributions.
Figure 4. Sample from four overlaid Student t distributions.
Figure 5. Sample from one Gaussian distribution with 10% artificial outliers.
Figure 6. Sample from one Student t distribution with 10% artificial outliers.
Figure 7. Sample from four Gaussian distributions with 10% artificial outliers.
Figure 8. Sample from four Student t distributions with 10% artificial outliers.
Abstract
1. Introduction
1. Calculate the central point of the data and subtract it out of the data.
2. Find a local neighborhood of a direction.
3. Calculate the quadratic surface that best fits the data in the current local neighborhood in the ℓ1 norm (a sketch of such a fit appears after this list).
4. Determine the location of the maximum of the quadratic surface over the local neighborhood. If the location of the maximum is strictly inside the local neighborhood, go to Step 5; otherwise, move to the direction on the boundary of the local neighborhood and return to Step 2.
5. Calculate a new best-fit quadratic surface on a larger neighborhood of the direction identified in Step 4. Output the maximum of the new quadratic surface.
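As an illustration of Steps 3 and 4, the sketch below fits a quadratic in the ℓ1 (least-absolute-deviations) sense by linear programming and then locates its maximum over the local window. This is not the authors' code; the LP formulation, the function names, and the reduction to a one-dimensional window are simplifications for illustration only.

```python
# Minimal sketch of an l1-norm (least-absolute-deviations) quadratic fit on a window,
# followed by locating the maximum of the fitted quadratic over that window.
import numpy as np
from scipy.optimize import linprog

def l1_quadratic_fit(theta, r):
    """Fit r ~ a*theta^2 + b*theta + c minimizing the sum of absolute residuals (an LP)."""
    theta, r = np.asarray(theta, float), np.asarray(r, float)
    m = len(theta)
    X = np.column_stack([theta**2, theta, np.ones(m)])      # design matrix
    cost = np.concatenate([np.zeros(3), np.ones(m)])        # variables: [a, b, c, e_1..e_m]
    # e_i >= |r_i - X_i @ p|  encoded as two inequality blocks
    A_ub = np.block([[X, -np.eye(m)], [-X, -np.eye(m)]])
    b_ub = np.concatenate([r, -r])
    bounds = [(None, None)] * 3 + [(0, None)] * m
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:3]                                        # assumes the LP solves successfully

def quadratic_max_on_window(coef, lo, hi):
    """Location and value of the maximum of a*t^2 + b*t + c on [lo, hi]."""
    a, b, c = coef
    candidates = [lo, hi]
    if a < 0:                                  # concave: interior vertex is a candidate
        candidates.append(-b / (2 * a))
    candidates = [t for t in candidates if lo <= t <= hi]
    vals = [a * t**2 + b * t + c for t in candidates]
    i = int(np.argmax(vals))
    return candidates[i], vals[i]

# Example: noisy concave profile on the window [0, 1]; maximizer is found near 0.6
t = np.linspace(0.0, 1.0, 41)
r = -2.0 * (t - 0.6) ** 2 + 1.0 + 0.05 * np.random.default_rng(1).standard_normal(t.size)
print(quadratic_max_on_window(l1_quadratic_fit(t, r), t.min(), t.max()))
```

If the returned maximizer lies strictly inside the window, the procedure moves on to the refinement step; otherwise the window is re-centered at the boundary point, as in Step 4 above.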
2. High Dimensional Major Component Detection and Analysis (MCDA)
1. Calculate the central point of the data and subtract it out of the data (see the sketch after this list).
2. Calculate the major directions of the point cloud resulting from Step 1.
3. Calculate the radial spread of the point cloud in various directions, in particular in directions where the radial spread is maximal.
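The sketch below illustrates Step 1 and the direction/length ("polar") decomposition on which Steps 2 and 3 operate. Using the coordinatewise median as the central point is an assumption made here for illustration only; the paper's own centering rule is the subject of Section 2.1, and Section 2.2 makes the direction search precise.

```python
# Sketch of centering and of the n-dimensional analogue of polar form.
# The coordinatewise median as central point is an illustrative assumption.
import numpy as np

def center_and_polarize(points):
    """Subtract a robust central point, then split each point into a unit direction
    and a length."""
    points = np.asarray(points, float)            # shape (N, n)
    center = np.median(points, axis=0)            # assumed: coordinatewise median
    centered = points - center
    lengths = np.linalg.norm(centered, axis=1)
    nonzero = lengths > 0
    directions = np.zeros_like(centered)
    directions[nonzero] = centered[nonzero] / lengths[nonzero, None]
    return center, directions, lengths
```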
2.1. Calculation of the Central Point
2.2. Calculation of the Major Directions and Median Radii in Those Directions
1. Transform each centered data point into "polar" form, that is, into a direction and a length.
2. Choose a direction at which to start.
3. Choose a positive integer I that represents the size of the neighborhood of a direction (including the given direction itself). Determine the neighborhood of the current direction by choosing the I directions that are nearest to it in a given angular measure.
4. For each direction in the neighborhood determined in Step 3, obtain its own neighborhood of the I nearest directions and calculate the median of the lengths of the data points whose directions lie in that neighborhood (a sketch of Steps 3-5 appears after this list).
5. Determine the maximum of these median lengths. If the maximum is achieved at the current direction, go to Step 6. Otherwise, move to the direction at which the maximum is achieved and return to Step 3.
6. Refine the location and value of the local maximum of the median radius: calculate a larger neighborhood of directions around the locally maximal direction identified in Step 5. The median of the directions in this neighborhood is the calculated major direction of the distribution, and the median of the corresponding lengths is the estimate of the median radius in that major direction.
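The sketch below illustrates Steps 3-5 as a hill climb over directions. It assumes the data have already been reduced to unit directions and lengths (as in the sketch in the overview above). Nearest directions are found here with a k-d tree (the reference list includes the classical k-d-tree search of Friedman et al.); since the Euclidean distance between unit vectors is monotone in the angle between them, this matches an angular nearest-neighbor rule. The neighborhood size I, the starting index, and all function names are illustrative, and the Step 6 refinement is only indicated by a comment.

```python
# Sketch of the median-radius hill climb over directions (Steps 3-5).
import numpy as np
from scipy.spatial import cKDTree

def median_radius(tree, directions, lengths, j, I):
    """Median length over the I directions nearest to direction j (including j)."""
    _, idx = tree.query(directions[j], k=I)
    return np.median(lengths[idx])

def find_major_direction(directions, lengths, j0=0, I=50, max_iter=1000):
    """Move from direction to direction until the current direction maximizes the
    median radius within its own I-direction neighborhood."""
    tree = cKDTree(directions)
    j = j0
    for _ in range(max_iter):
        _, nbrs = tree.query(directions[j], k=I)          # neighborhood of j (Step 3)
        med = [median_radius(tree, directions, lengths, i, I) for i in nbrs]  # Step 4
        best = nbrs[int(np.argmax(med))]                  # Step 5
        if best == j:                                     # local maximum reached
            break
        j = best
    # Step 6 (refinement) would recompute direction and radius on a larger neighborhood of j.
    return directions[j], median_radius(tree, directions, lengths, j, I)

# Example (illustrative): window of about 1% of the sample size
# dir_hat, radius_hat = find_major_direction(dirs, lens, I=int(0.01 * len(lens)))
```

The knn values reported with the later tables appear to play the role of this neighborhood size, expressed also as a percentage of the sample size.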
3. Computational Experiments
- n-dimensional Gaussian, without and with additional artificial outliers;
- n-dimensional Student t (degree of freedom = 1), without and with additional artificial outliers;
- Four superimposed n-dimensional Gaussians, without and with additional artificial outliers;
- Four superimposed n-dimensional Student t (degree of freedom = 1) distributions, without and with additional artificial outliers (a data-generation sketch follows this list).
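For orientation, a hedged sketch of how such test data can be generated is given below. The anisotropy of the cloud, the 10% outlier mechanism, and all numerical settings are placeholders chosen for illustration; they are not the paper's exact experimental settings.

```python
# Hedged sketch of the kinds of synthetic data described above (placeholder settings).
import numpy as np

rng = np.random.default_rng(0)

def sample_cloud(N, n, student_t=False, df=1, scale=None):
    """One elongated n-dimensional cloud: Gaussian, or Student t with the given df."""
    if scale is None:
        scale = np.ones(n)
        scale[0] = 5.0                      # placeholder: major axis along the first coordinate
    z = rng.standard_normal((N, n))
    if student_t:
        # multivariate t via Gaussian divided by sqrt(chi^2 / df)
        w = rng.chisquare(df, size=(N, 1)) / df
        z = z / np.sqrt(w)
    return z * scale

def add_outliers(points, fraction=0.10, spread=50.0):
    """Replace a fraction of the points by artificial uniform outliers (placeholder scheme)."""
    N, n = points.shape
    k = int(fraction * N)
    idx = rng.choice(N, size=k, replace=False)
    points = points.copy()
    points[idx] = rng.uniform(-spread, spread, size=(k, n))
    return points

# Four superimposed clouds can be formed by concatenating shifted copies of sample_cloud(...).
# Example: 10 000 points from a 10-dimensional Student t (df = 1) cloud with 10% outliers
data = add_outliers(sample_cloud(10_000, 10, student_t=True, df=1))
```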
n | MCDA: av. abs_err of angle | MCDA: av. rel_err of length | Croux + Ruiz-Gazen: av. abs_err of angle | Croux + Ruiz-Gazen: av. rel_err of length | Brooks et al.: av. abs_err of angle | Brooks et al.: av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0112 | 0.0314 | 0.0289 | 0.0681 | 0.0556 | 0.0675
8 | 0.0233 | 0.2498 | 0.0399 | 0.2177 | 0.0118 | 0.1586
10 | 0.0277 | 0.3129 | 0.0308 | 0.2683 | 0.0090 | 0.2256
30 | 0.0315 | 0.5772 | 0.0112 | 0.5308 | 0.0052 | 0.5099
50 | 0.0363 | 0.6539 | 0.0136 | 0.6269 | – | –
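The error measures in these tables are taken here to be the angle between the estimated and true major directions and the relative error of the estimated length; the short sketch below shows how such quantities can be computed. This is an interpretation consistent with the column headings, not a quotation of the paper's formulas.

```python
# Sketch of the two error measures assumed for the tables above and below.
import numpy as np

def angle_error(d_est, d_true):
    """Unsigned angle between two directions, ignoring the sign of the direction vectors."""
    c = abs(np.dot(d_est, d_true) / (np.linalg.norm(d_est) * np.linalg.norm(d_true)))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def relative_length_error(r_est, r_true):
    """Relative error of the estimated length with respect to the true length."""
    return abs(r_est - r_true) / abs(r_true)
```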
n | MCDA: av. abs_err of angle | MCDA: av. rel_err of length | Croux + Ruiz-Gazen: av. abs_err of angle | Croux + Ruiz-Gazen: av. rel_err of length | Brooks et al.: av. abs_err of angle | Brooks et al.: av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0276 | 0.0921 | 0.0346 | 0.1224 | 0.0557 | 0.1310
8 | 0.0423 | 0.2035 | 0.0483 | 0.2113 | 0.0118 | 0.1800
10 | 0.0441 | 0.2717 | 0.0455 | 0.2625 | 0.0090 | 0.2352
30 | 0.0446 | 0.5367 | 0.0113 | 0.5163 | 0.0053 | 0.4869
50 | 0.0448 | 0.6350 | 0.0140 | 0.6100 | – | –
n | MCDA: av. abs_err of angle | MCDA: av. rel_err of length | Croux + Ruiz-Gazen: av. abs_err of angle | Croux + Ruiz-Gazen: av. rel_err of length | Brooks et al.: av. abs_err of angle | Brooks et al.: av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0119 | 0.0373 | 0.0318 | 0.0759 | 0.0560 | 0.0654
8 | 0.0236 | 0.2512 | 0.0499 | 0.2292 | 0.0211 | 0.1588
10 | 0.0279 | 0.3134 | 0.0549 | 0.2781 | 0.0170 | 0.2289
30 | 0.0322 | 0.5785 | 0.0990 | 0.5376 | 0.0072 | 0.5077
50 | 0.0375 | 0.6601 | 0.1393 | 0.6308 | – | –
n | MCDA: av. abs_err of angle | MCDA: av. rel_err of length | Croux + Ruiz-Gazen: av. abs_err of angle | Croux + Ruiz-Gazen: av. rel_err of length | Brooks et al.: av. abs_err of angle | Brooks et al.: av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0276 | 0.0969 | 0.0366 | 0.1140 | 0.0554 | 0.1515
8 | 0.0424 | 0.2038 | 0.0556 | 0.2054 | 0.0214 | 0.1714
10 | 0.0441 | 0.2725 | 0.0616 | 0.2775 | 0.0170 | 0.2314
30 | 0.0467 | 0.5376 | 0.1009 | 0.5238 | 0.0072 | 0.4833
50 | 0.0449 | 0.6351 | 0.1390 | 0.6142 | – | –
 | MCDA | Croux + Ruiz-Gazen | Brooks et al.
---|---|---|---
Table 1 | 3.77 | 4.20 | 236.03
Table 2 | 3.64 | 4.50 | 239.32
Table 3 | 3.55 | 4.46 | 286.29
Table 4 | 3.21 | 4.49 | 278.69
n | knn = 640 (2%): av. abs_err of angle | knn = 640 (2%): av. rel_err of length | knn = 320 (1%): av. abs_err of angle | knn = 320 (1%): av. rel_err of length | knn = 160 (0.5%): av. abs_err of angle | knn = 160 (0.5%): av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0124 | 0.0499 | 0.0089 | 0.0374 | 0.0121 | 0.0428
10 | 0.0226 | 0.0904 | 0.0294 | 0.0666 | 0.0364 | 0.0805
50 | 0.0269 | 0.5323 | 0.0352 | 0.4931 | 0.0448 | 0.4518
n | knn = 640 (2%): av. abs_err of angle | knn = 640 (2%): av. rel_err of length | knn = 320 (1%): av. abs_err of angle | knn = 320 (1%): av. rel_err of length | knn = 160 (0.5%): av. abs_err of angle | knn = 160 (0.5%): av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0120 | 0.0552 | 0.0155 | 0.0794 | 0.0211 | 0.1420
10 | 0.0225 | 0.3225 | 0.0318 | 0.2458 | 0.0437 | 0.1777
50 | 0.0250 | 0.6220 | 0.0335 | 0.5814 | 0.0430 | 0.5355
n | knn = 640 (2%): av. abs_err of angle | knn = 640 (2%): av. rel_err of length | knn = 320 (1%): av. abs_err of angle | knn = 320 (1%): av. rel_err of length | knn = 160 (0.5%): av. abs_err of angle | knn = 160 (0.5%): av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0152 | 0.0396 | 0.0099 | 0.0406 | 0.0171 | 0.0493
10 | 0.0271 | 0.2614 | 0.0208 | 0.3044 | 0.0173 | 0.3615
50 | 0.0379 | 0.6275 | 0.0308 | 0.6566 | 0.0261 | 0.6920
n | knn = 640 (2%): av. abs_err of angle | knn = 640 (2%): av. rel_err of length | knn = 320 (1%): av. abs_err of angle | knn = 320 (1%): av. rel_err of length | knn = 160 (0.5%): av. abs_err of angle | knn = 160 (0.5%): av. rel_err of length
---|---|---|---|---|---|---
3 | 0.0279 | 0.1639 | 0.0184 | 0.0941 | 0.0173 | 0.0501
10 | 0.0452 | 0.1671 | 0.0336 | 0.2570 | 0.0238 | 0.3308
50 | 0.0499 | 0.5333 | 0.0388 | 0.5851 | 0.0293 | 0.6198
4. Conclusions and Future Work
- Use of alternative principles for calculation of the central point (for example, for V-shaped distributions or more complicated asymmetric distributions with spokes for which the median of the data is not a meaningful central point);
- Properties of MCDA in higher dimensions;
- Ability of MCDA to deal with missing data.
Acknowledgments
Author Contributions
Conflicts of Interest
References
- Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002. [Google Scholar]
- Candès, E.J.; Li, X.; Ma, Y.; Wright, J. Robust Principal Component Analysis; Technical Report No. 13; Department of Statistics, Stanford University: Stanford, CA, USA, 2009. [Google Scholar]
- Choulakian, V. L1-Norm projection pursuit principal component analysis. Comput. Stat. Data Anal. 2006, 50, 1441–1451. [Google Scholar] [CrossRef]
- Croux, C.; Filzmoser, P.; Fritz, H. Robust sparse principal component analysis. Technometrics 2013, 55, 202–214. [Google Scholar] [CrossRef]
- Croux, C.; Ruiz-Gazen, A. High breakdown estimators for principal components: The projection-pursuit approach revisited. J. Multivar. Anal. 2005, 95, 206–226. [Google Scholar] [CrossRef]
- Ke, Q.; Kanade, T. Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming. IEEE Conf. Comput. Vis. Pattern Recognit. 2005, 1, 739–746. [Google Scholar]
- Kwak, N. Principal component analysis based on L1-norm maximization. IEEE Trans. Pattern Anal. 2008, 30, 1672–1680. [Google Scholar] [CrossRef] [PubMed]
- Lin, Z.; Chen, M.; Wu, L.; Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrix. Available online: http://arxiv.org/abs/1009.5055 (accessed on 18 October 2013).
- Xu, H.; Caramanis, C.; Mannor, S. Outlier-robust PCA: the high-dimensional case. IEEE Trans. Inf. Theory 2013, 59, 546–572. [Google Scholar] [CrossRef]
- Gribonval, R.; Nielsen, M. Sparse approximations in signal and image processing. Signal Process. 2006, 86, 415–416. [Google Scholar] [CrossRef] [Green Version]
- Lai, M.-J.; Wang, J. An unconstrained ℓq minimization with 0 < q ≤ 1 for sparse solution of under-determined linear systems. SIAM J. Optim. 2010, 21, 82–101. [Google Scholar]
- Candès, E.J.; Wakin, M.B. An introduction to compressive sampling. IEEE Signal Process. Mag. 2008, 25, 21–30. [Google Scholar] [CrossRef]
- Chartrand, R. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Process. Lett. 2007, 14, 707–710. [Google Scholar] [CrossRef]
- Auquiert, P.; Gibaru, O.; Nyiri, E. Fast polynomial spline interpolation algorithm with shape-preserving properties. Comput. Aided Geom. Des. 2011, 28, 65–74. [Google Scholar]
- Yu, L.; Jin, Q.; Lavery, J.E.; Fang, S.-C. Univariate cubic L1 interpolating splines: Spline functional, window size and analysis-based algorithm. Algorithms 2010, 3, 311–328. [Google Scholar] [CrossRef]
- Tian, Y.; Jin, Q.; Lavery, J.E.; Fang, S.-C. ℓ1 major component detection and analysis (ℓ1 MCDA): Foundations in two dimensions. Algorithms 2013, 6, 12–28. [Google Scholar] [CrossRef]
- Brooks, J.P.; Dulá, J.H.; Boone, E.L. A pure L1-norm principal component analysis. Comput. Stat. Data Anal. 2013, 61, 83–98. [Google Scholar] [CrossRef] [PubMed]
- Deng, Z.; Luo, J. FANGroup-Fuzzy And Neural Group at North Carolina State University. Available online: http://www.ise.ncsu.edu/fangroup/index.htm (accessed on 8 August 2014).
- MATLAB Release 2012a; The MathWorks Inc.: Natick, MA, USA, 2012.
- Friedman, J.H.; Bentley, J.L.; Finkel, R.A. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 1977, 3, 209–226. [Google Scholar] [CrossRef]
- Filzmoser, P.; Fritz, H.; Kalcher, K. pcaPP: Robust PCA by Projection Pursuit. Available online: http://cran.r-project.org/web/packages/pcaPP/index.html (accessed on 22 March 2013).
- R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2011. [Google Scholar]
- IBM ILOG CPLEX Optimization. Available online: http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/ (accessed on 16 March 2014).
© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).