[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Incremental tensor analysis: Theory and applications

Published: 27 October 2008 Publication History

Abstract

How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branchcustomer sales information? And more generally, how to summarize high-order data cubes (tensors)? How to incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, rule identification in numerous settings like streaming data, text, graphs, social networks, and many more settings. However, they have only two orders (i.e., matrices, like author and keyword in the previous example).
We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In paricular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity.
We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets.

References

[1]
Aggarwal, C. C., Han, J., Wang, J., and Yu, P. S. 2003. A framework for clustering evolving data streams. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 81--92.
[2]
Agrawal, R., Imieliński, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 207--216.
[3]
Brin, S. and Page, L. 1998. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 30, 1--7, 107--117.
[4]
Ding, C. and Ye, J. 2005. Two-Dimensional singular value decomposition (2dsvd) for 2d maps and images. In SDM.
[5]
Domingos, P. and Hulten, G. 2000. Mining high-speed data streams. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 71--80.
[6]
Domingos, P. and Richardson, M. 2001. Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 57--66.
[7]
Drineas, P. and Mahoney, M. W. 2005. A randomized algorithm for a tensor-based generalization of the SVD. Tech. Rep. YALEU/DCS/TR-1327, Yale University.
[8]
Enron Dataset. 2004. U.C. Berkeley Enron email analysis. http://bailando.sims.berkeley.edu/enron_email.html.
[9]
Foltz, P. W. and Dumais, S. T. 1992. Personalized information delivery: An analysis of information filtering methods. Commun. ACM 35, 12, 51--60.
[10]
Ganti, V., Gehrke, J., and Ramakrishnan, R. 2002. Mining data streams under block evolution. SIGKDD Explor. 3, 2, 1--10.
[11]
Guha, S., Gunopulos, D., and Koudas, N. 2003. Correlating synchronous and asynchronous data streams. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 529--534.
[12]
Guha, S., Meyerson, A., Mishra, N., Motwani, R., and O'Callaghan, L. 2003. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. 15, 3, 515--528.
[13]
Haykin, S. 1991. Adaptive Filter Theory, 2nd ed. Prentice-Hall, Upper Saddle River, NJ.
[14]
He, X., Cai, D., and Niyogi, P. 2005. Tensor subspace analysis. In Proceeding of the Conference on Advance, in Neural Information Processing Systems (NIPS).
[15]
Indyk, P., Koudas, N., and Muthukrishnan, S. 2000. Identifying representative trends in massive time series data sets using sketches. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB). Morgan Kaufmann, San Francisco, CA, 363--372.
[16]
Jolliffe, I. 2002. Principal Component Analysis. Springer.
[17]
Kannan, R., Vempala, S., and Veta, A. 2000. On clusterings-good, bad and spectral. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS). IEEE Computer Society, Washington, DC, 367.
[18]
Karypis, G. and Kumar, V. 1999. Parallel multilevel series k-way partitioning scheme for irregular graphs. SIAM Rev. 41, 2, 278--300.
[19]
Kleinberg, J. M. 1999. Authoritative sources in a hyperlinked environment. J. ACM 46, 5, 604--632.
[20]
Kolda, T. G. 2006. Multilinear operators for higher-order decompositions. Tech. Rep. SAND2006-2081, Sandia National Laboratory/April.
[21]
Kolda, T. G. and Bader, B. W. 2008 Tensor decompositions and applications. SIAM Rev. (in revision). http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SAND2007-6702.pdf.
[22]
Kolda, T. G., Bader, B. W., and Kenny, J. P. 2005. Higher-Order Web link analysis using multilinear algebra. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 242--249.
[23]
Korn, F., Labrinidis, A., Kotidis, Y., and Faloutsos, C. 2000. Quantifiable data-mining using ratio rules. The VLDB. 8, 3-4, 254--266.
[24]
Kroonenberg, P. and Leeuw, J. D. 1980. Principal component analysis of three-mode data by means of alternating least square algorithms. Psychometrika 45.
[25]
Kroonenberg, P. M. 1983. Three-Mode Principal Component Analysis. Theory and Applications. DWO Press.
[26]
Lathauwer, L. D., Moor, B. D., and Vandewalle, J. 2006. On the best rank-1 and rank-(r1,r2,. . .,rn) approximation of higher-order tensors. SIAM. Matrix Anal. Appl. 21.
[27]
Mahoney, M. W., Maggioni, M., and Drineas, P. 2006. Tensor-Cur decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 327--336.
[28]
Papadimitriou, C. H., Tamaki, H., Raghavan, P., and Vempala, S. 1998. Latent semantic indexing: A probabilistic analysis. In Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, 159--168.
[29]
Papadimitriou, S., Brockwell, A., and Faloutsos, C. 2003. Adaptive, hands-off stream mining. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB). Endowment, 560--571.
[30]
Papadimitriou, S., Sun, J., and Faloutsos, C. 2005. Streaming pattern discovery in multiple time-series. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). Endowment, 697--708.
[31]
Sakurai, Y., Papadimitriou, S., and Faloutsos, C. 2005. Braid: Stream mining through group lag correlations. In Proceeding of the ACM/International Conference on Management of Data, 599--610.
[32]
Sanguansat, P., Asdornwised, W., Jitapunkul, S., and Marukatat, S. 2006. Two-Dimensional linear discriminant analysis of principle component vectors for face recognition. IEICE - Trans. Inf. Syst. E89-D, 7, 2164--2170.
[33]
Shashua, A. and Levin, A. 2001. Linear image coding for regression and classification using the tensor-rank principle. In Proceedings of the IEEE Conference on Computer vision and Patteroo Recongnistion CVPR (1). IEEE Computer Society, 42--49.
[34]
Smilde, A., Bro, R., and Geladi, P. 2004. Multi-Way Analysis: Applications in the Chemical Sciences. Wiley, West Sussex, UK.
[35]
Sun, J., Papadimitriou, S., and Yu, P. S. 2006a. Window-Based tensor analysis on high-dimensional and multi-aspect streams. In Proceedings of the 6th International Conference on Data Mining (ICDM). IEEE Computer Society, Washington, DC, 1076--1080.
[36]
Sun, J., Tao, D., and Faloutsos, C. 2006a. Beyond streams and graphs: Dynamic tensor analysis. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 374--383.
[37]
Tao, D., Li, X., Wu, X., and Maybank, S. J. 2007. General tensor discriminant analysis and Gabor features for gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29, 10, 1700--1715.
[38]
Tucker, L. R. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279--311.
[39]
Universität Trier. 2008. http://www.informatik.uni-trier.de/~ley/db/.
[40]
Van Loan, C. F. 2000. The ubiquitous Kronecker product. J. Comput. Appl. Math. 123, 85--100.
[41]
Vasilescu, M. A. O. and Terzopoulos, D. 2002. Multilinear analysis of image ensembles: Tensorfaces. In Proceedings of the 7th European Conference on Computer Vision-Part I (ECCV). Springer, London, UK, 447--460.
[42]
Wang, H., Fan, W., Yu, P. S., and Han, J. 2003. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). ACM, New York, 226--235.
[43]
Xu, D., Yan, S., Zhang, L., Zhang, H.-J., Liu, Z., and Shum, H.-Y. 2005. Concurrent subspaces analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattem Recognition (CVPR).
[44]
Ye, J. 2005. Generalized low rank approximations of matrices. Mach. Learn. 61, 1-3, 167--191.
[45]
Yi, B.-K., Sidiropoulos, N., Johnson, T., Jagadish, H., Faloutsos, C., and Biliris, A. 2000. Online data-mining for co-evolving time sequences. In Proceedings of the 16th International Conference on Data Engineering (ICDE). IEEE Computer Society, Washington, DC, 13.
[46]
Zhao, L. and Zaki, M. J. 2005. Tricluster: An effective algorithm for mining coherent clusters in 3D microarray data. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, New York, 694--705.
[47]
Zhu, Y. and Shasha, D. 2002. Statstream: Statistical monitoring of thousands of data streams in real time. In Proceedings of the International Conference on Voy Large Databases (VLDB). 358--369.

Cited By

View all
  • (2025)Fusion of generative adversarial networks and non-negative tensor decomposition for depression fMRI data analysisInformation Processing & Management10.1016/j.ipm.2024.10396162:2(103961)Online publication date: Mar-2025
  • (2024)Deep Factor Learning for Accurate Brain Neuroimaging Data Analysis on Discrimination for Structural MRI and Functional MRIIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2023.325257721:4(582-595)Online publication date: Jul-2024
  • (2024)MUNPE:Multi-view uncorrelated neighborhood preserving embedding for unsupervised feature extractionKnowledge-Based Systems10.1016/j.knosys.2024.111421287(111421)Online publication date: Mar-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Knowledge Discovery from Data
ACM Transactions on Knowledge Discovery from Data  Volume 2, Issue 3
October 2008
124 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/1409620
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2008
Accepted: 01 May 2008
Revised: 01 March 2008
Received: 01 October 2007
Published in TKDD Volume 2, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Tensor
  2. multilinear algebra
  3. stream mining

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)65
  • Downloads (Last 6 weeks)7
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Fusion of generative adversarial networks and non-negative tensor decomposition for depression fMRI data analysisInformation Processing & Management10.1016/j.ipm.2024.10396162:2(103961)Online publication date: Mar-2025
  • (2024)Deep Factor Learning for Accurate Brain Neuroimaging Data Analysis on Discrimination for Structural MRI and Functional MRIIEEE/ACM Transactions on Computational Biology and Bioinformatics10.1109/TCBB.2023.325257721:4(582-595)Online publication date: Jul-2024
  • (2024)MUNPE:Multi-view uncorrelated neighborhood preserving embedding for unsupervised feature extractionKnowledge-Based Systems10.1016/j.knosys.2024.111421287(111421)Online publication date: Mar-2024
  • (2024)A novel multi-modal incremental tensor decomposition for anomaly detection in large-scale networksInformation Sciences10.1016/j.ins.2024.121210681(121210)Online publication date: Oct-2024
  • (2024)Unsupervised deep frequency-channel attention factorization to non-linear feature extraction: A case study of identification and functional connectivity interpretation of Parkinson’s diseaseExpert Systems with Applications10.1016/j.eswa.2023.122853243(122853)Online publication date: Jun-2024
  • (2024)Tracking tensor ring decompositions of streaming tensorsComputational and Applied Mathematics10.1007/s40314-024-03019-444:1Online publication date: 3-Dec-2024
  • (2024)MPCA and MDA via Einstein productComputational and Applied Mathematics10.1007/s40314-024-02866-543:6Online publication date: 2-Aug-2024
  • (2024)Incremental algorithms for truncated higher-order singular value decompositionsBIT10.1007/s10543-023-01004-764:1Online publication date: 8-Jan-2024
  • (2023)A Distributed Framework for Predictive Analytics Using Big Data and MapReduce Parallel ProgrammingMathematical Problems in Engineering10.1155/2023/60488912023(1-10)Online publication date: 1-Feb-2023
  • (2023)A Contemporary and Comprehensive Survey on Streaming Tensor DecompositionIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.323087435:11(10897-10921)Online publication date: 1-Nov-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media