[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

H-PARAFAC: Hierarchical Parallel Factor Analysis of Multidimensional Big Data

Published: 01 April 2017 Publication History

Abstract

It has long been an important issue in various disciplines to examine massive multidimensional data superimposed by a high level of noises and interferences by extracting the embedded multi-way factors. With the quick increases of data scales and dimensions in the big data era, research challenges arise in order to (1) reflect the dynamics of large tensors while introducing no significant distortions in the factorization procedure and (2) handle influences of the noises in sophisticated applications. A hierarchical parallel processing framework over a GPU cluster, namely H-PARAFAC, has been developed to enable scalable factorization of large tensors upon a “divide-and-conquer” theory for Parallel Factor Analysis (PARAFAC). The H-PARAFAC framework incorporates a coarse-grained model for coordinating the processing of sub-tensors and a fine-grained parallel model for computing each sub-tensor and fusing sub-factors. Experimental results indicate that (1) the proposed method breaks the limitation on the scale of multidimensional data to be factorized and dramatically outperforms the traditional counterparts in terms of both scalability and efficiency, e.g., the runtime increases in the order of $n^2$ when the data volume increases in the order of $n^3$ , (2) H-PARAFAC has potentials in refraining the influences of significant noises, and (3) H-PARAFAC is far superior to the conventional window-based counterparts in preserving the features of multiple modes of large tensors.

References

[1]
P. Comon, “Independent component analysis, a new concept?” Signal Process., vol. Volume 36, no. Issue 3, pp. 287–314, 1994.
[2]
N. M. Faber, L. Buydens, and G. Kateman, “Generalized rank annihilation method. I: Derivation of eigenvalue problems,” J. Chemometrics, vol. Volume 8, no. Issue 2, pp. 147–154, 1994.
[3]
F. Estienne, N. Matthijs, D. L. Massart, P. Ricoux, and D. Leibovici, “Multi-way modelling of high-dimensionality electroencephalographic data,” Chemometrics Intell. Laboratory Syst., vol. Volume 58, no. Issue 1, pp. 59–72, 2001.
[4]
J. P. Cunningham and B. M. Yu, “Dimensionality reduction for large-scale neural recordings,” Nat. Neuroscience Rev., vol. Volume 17, no. Issue 11, pp. 1500–1509, 2014.
[5]
X. Lu, Y. Yuan, and X. Zheng, “Image super-resolution via double sparsity regularized manifold learning,” IEEE Trans. Circuits Syst. Video Technol., vol. Volume 23, no. Issue 12, pp. 2022–2033, 2013.
[6]
X. Lu, Y. Yuan, and P. Yan, “Alternatively constrained dictionary learning for image super-resolution,” IEEE Trans. Cybern., vol. Volume 44, no. Issue 3, pp. 366–377, 2014.
[7]
J. Jiang, R. Hu, Z. Wang, Z. Han, and J. Ma, “Facial image hallucination through coupled-layer neighbor embedding,” IEEE Trans. Circuits Syst. Video Technol., vol. Volume 26, no. Issue 9, pp. 1674–1684, 2016.
[8]
K. Pearson, “LIII. On lines and planes of closest fit to systems of points in space,” Philosoph. Mag. Series 6, vol. Volume 2, no. Issue 11, pp. 559–572, 1901.
[9]
A. Hyvärinen and E. Oja, “Independent component analysis: Algorithms and applications,” Neural Netw., vol. Volume 13, no. Issue 4, pp. 411–430, 2000.
[10]
S. Lahabar and P. J. Narayanan, “Singular value decomposition on GPU using CUDA,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2009, pp. 1–10.
[11]
R. Bro, “PARAFAC tutorial and applications,” Chemometrics Intell. Laboratory Syst., vol. Volume 38, no. Issue 2, pp. 149–171, 1997.
[12]
R. A. Harshman and M. E. Lundy, “PARAFAC: Parallel factor analysis,” Comput. Statistics Data Anal., vol. Volume 18, no. Issue 1, pp. 39–72, 1994.
[13]
J. Wang, X. Li, C. Lu, L. J. Voss, J. P. Barnard, and J. W. Sleigh, “Characteristics of evoked potential multiple EEG recordings in patients with chronic pain by means of parallel factor analysis,” Comput. Math. Methods Med., vol. Volume 12, 2012, Art. no. 279560.
[14]
A. H. Phan, P. Tichavsky, and A. Cichocki, “Fast alternating LS algorithms for high order CANDECOMP/PARAFAC tensor factorizations,” IEEE Trans. Signal Process., vol. Volume 61, no. Issue 19, pp. 4836–4846, 2013.
[15]
L. R. Tucker, “Some mathematical notes on three-mode factor analysis,” Psychometrika, vol. Volume 31, no. Issue 3, pp. 279–311, 1966.
[16]
M. Morup, L. K. Hansen, C. S. Herrmann, J. Parnas, and S. M. Arnfred, “Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG,” NeuroImage, vol. Volume 29, no. Issue 3, pp. 938–947, 2006.
[17]
D. Chen, et al., “Fast and scalable multi-way analysis of neural data,” IEEE Trans. Comput., vol. Volume 64, no. Issue 3, pp. 707–719, 2015.
[18]
A. H. Phan and A. Cichocki, “PARAFAC algorithms for large-scale problems,” Neurocomputing, vol. Volume 74, no. Issue 11, pp. 1970–1984, 2011.
[19]
Nvidia CUDA Programming Guide, <institution>NVIDIA</institution>, Santa Clara, CA, USA, 2015.
[20]
J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue, vol. Volume 6, no. Issue 2, pp. 40–53, 2008.
[21]
M. Rajih, P. Comon, and R. A. Harshman, “Enhanced line search: A novel method to accelerate PARAFAC,” SIAM J. Matrix Anal. Appl., vol. Volume 30, no. Issue 3, pp. 1128–1147, 2008.
[22]
A. Cichocki, R. Zdunek, and S. Amari, “Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization,” in Proc. 7th Int. Conf. Ind. Compon. Anal. Signal Separation, 2007, pp. 169–176.
[23]
F. Cong et al., “Low-rank approximation based nonnegative multi-way array decomposition on event-related potentials,” Int. J. Neural Syst., vol. Volume 24, no. Issue 8, 2014, Art. no. 1440005.
[24]
G. Zhou, A. Cichocki, and S. Xie, “Fast nonnegative matrix/tensor factorization based on low-rank approximation,” IEEE Trans. Signal Process., vol. Volume 60, no. Issue 6, pp. 2928–2940, Jun. 2012.
[25]
A. H. Phan and A. Cichocki, “Advances in PARAFAC using parallel block decomposition,” in Neural Information Processing . Berlin, Germany: Springer, 2009, pp. 323–330.
[26]
U. Kang, E. Papalexakis, A. Harpale, and C. Faloutsos, “GigaTensor: Scaling tensor analysis up by 100 times—Algorithms and discoveries,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 316–324.
[27]
J. H. Choi and S. Vishwanathan, “DFacTo: Distributed factorization of tensors,” in Proc. Neural Inf. Process. Syst., 2014, pp. 1296–1304.
[28]
E. E. Papalexakis, C. Faloutsos, and N. D. Sidiropoulos, “ParCube: Sparse parallelizable tensor decompositions,” in Proc. Eur. Conf. Mach. Learn. Knowl. Discovery Databases, 2012, pp. 521–536.
[29]
S. Smith, N. Ravindran, N. D. Sidiropoulos, and G. Karypis, “SPLATT: Efficient and parallel sparse tensor-matrix multiplication,” in Proc. IEEE Int. Parallel Distrib. Process. Symp., 2015, pp. 61–70.
[30]
A. H. Phan, P. Tichavsky, and A. Cichocki, “CANDECOMP/PARAFAC decomposition of high-order tensors through tensor reshaping,” IEEE Trans. Signal Process., vol. Volume 61, no. Issue 19, pp. 4847–4860, 2012.
[31]
A. Cichocki, R. Zdunek, A. H. Phan, and S.-i. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation . Hoboken, NJ, USA: Wiley, 2009.
[32]
P. Tichavský, A. H. Phan, and Z. Koldovský, “Cramér-rao-induced bounds for CANDECOMP/PARAFAC tensor decomposition,” IEEE Trans. Signal Process., vol. Volume 61, no. Issue 8, pp. 1986–1997, 2013.
[33]
F. Cong, Q. Lin, L. Kuang, X. Gong, P. Astikainen, and T. Ristaniemi, “Tensor decomposition of EEG signals: A brief review,” J. Neuroscience Methods, vol. Volume 248, pp. 59–69, 2015.
[34]
L. Wang, W. Song, and P. Liu, “Link the remote sensing big data to the image features via wavelet transformation,” Cluster Comput., vol. Volume 19, no. Issue 2, pp. 793–810, 2016.
[35]
F. Cong et al., “Benefits of multi-domain feature of mismatch negativity extracted by non-negative tensor factorization from EEG collected by low-density array,” Int. J. Neural Syst., vol. Volume 22, no. Issue 6, pp. 1415–1428, 2012.

Cited By

View all
  • (2022)Tensor Multi-Clustering Parallel Intelligent Computing Method Based on Tensor Chain DecompositionComputational Intelligence and Neuroscience10.1155/2022/73961852022Online publication date: 1-Jan-2022
  • (2021)Dimensionality Reduction Methods for Brain Imaging Data AnalysisACM Computing Surveys10.1145/344830254:4(1-36)Online publication date: 3-May-2021
  • (2021)Fast and Memory-Efficient Tucker Decomposition for Answering Diverse Time Range QueriesProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining10.1145/3447548.3467290(725-735)Online publication date: 14-Aug-2021
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems  Volume 28, Issue 4
April 2017
310 pages

Publisher

IEEE Press

Publication History

Published: 01 April 2017

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Tensor Multi-Clustering Parallel Intelligent Computing Method Based on Tensor Chain DecompositionComputational Intelligence and Neuroscience10.1155/2022/73961852022Online publication date: 1-Jan-2022
  • (2021)Dimensionality Reduction Methods for Brain Imaging Data AnalysisACM Computing Surveys10.1145/344830254:4(1-36)Online publication date: 3-May-2021
  • (2021)Fast and Memory-Efficient Tucker Decomposition for Answering Diverse Time Range QueriesProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining10.1145/3447548.3467290(725-735)Online publication date: 14-Aug-2021
  • (2021)Incremental Factorization of Big Time Series Data with Blind Factor ApproximationIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2019.293168733:2(569-584)Online publication date: 1-Feb-2021
  • (2020)Listen, Look, and Find the OneACM Transactions on Multimedia Computing, Communications, and Applications10.1145/338054916:2(1-20)Online publication date: 22-May-2020
  • (2020)Retracted on May 30, 2024: Gait Recognition as a Service for Unobtrusive User Identification in Smart SpacesACM Transactions on Internet of Things10.1145/33757991:1(1-21)Online publication date: 2-Mar-2020
  • (2020)Secure search for encrypted personal health records from big data NoSQL databases in cloudComputing10.1007/s00607-019-00762-z102:6(1521-1545)Online publication date: 1-Jun-2020
  • (2020)Transferring fashion to surveillance with weak labelsNeural Computing and Applications10.1007/s00521-020-05528-935:18(13021-13035)Online publication date: 23-Nov-2020
  • (2019)A Performance Model for GPU Architectures that Considers On-Chip Resources: Application to Medical Image RegistrationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2019.290521330:9(1947-1961)Online publication date: 1-Sep-2019
  • (2019)Outlier Dirichlet Mixture MechanismIEEE Transactions on Information Forensics and Security10.1109/TIFS.2018.289080814:8(1975-1987)Online publication date: 1-Aug-2019
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media