TensorCast: forecasting and mining with coupled tensors

Miguel Araujo^1,2,4,
Pedro Ribeiro^2,3,
Hyun Ah Song ORCID: orcid.org/0000-0001-6337-9036¹ &
…
Christos Faloutsos¹

886 Accesses
8 Citations
Explore all metrics

Abstract

Given an heterogeneous social network, can we forecast its future? Can we predict who will start using a given hashtag on twitter? Can we leverage side information, such as who retweets or follows whom, to improve our membership forecasts? We present TensorCast, a novel method that forecasts time-evolving networks more accurately than current state-of-the-art methods by incorporating multiple data sources in coupled tensors. TensorCast is (a) scalable, being linearithmic on the number of connections; (b) effective, achieving over 20% improved precision on top-1000 forecasts of community members; (c) general, being applicable to data sources with different structure. We run our method on multiple real-world networks, including DBLP, epidemiology data, power grid data, and a Twitter temporal network with over 310 million nonzeros, where we predict the evolution of the activity of the use of political hashtags.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

£29.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Institutional subscriptions

Large-scale agent-based simulations of online social networks

Article 18 June 2022

Hashkat: large-scale simulations of online social networks

Article 04 February 2017

ONE-M: Modeling the Co-evolution of Opinions and Network Connections

Notes

One of the experiments in Sect. 5 deals with this scenario.
For instance, we know that the second biggest element is one of $\varvec{a}_{2}\varvec{b}_1\varvec{s}_1$, $\varvec{a}_{1}\varvec{b}_{2}\varvec{s}_1$ or $\varvec{a}_{1}\varvec{b}_1\varvec{s}_{2}$.
Remember that $\varvec{A}$ and $\varvec{B}$ are nonnegative matrices. In the worst-case, the score of the Kth biggest element is taken from a single power-law and the contribution of the rest of the factors is 0, hence $K^{-\alpha _m}$ is a lower-bound for the Kth biggest value.
In the worst-case scenario, this element is at position x in every of the factors.
We consider the sum of the nonzeros of both tensors.
Note that the quality of absolute precision numbers is affected by (1) how imbalanced the two classes are and (2) the cost of false positives. An improvement from 2 to 5% precision might imply that 1 out of 20 phone-calls we make target a potential customer versus every 1 in 50.
https://www.tycho.pitt.edu/.
https://www.cdc.gov/mmwr/preview/mmwrhtml/00056803.htm.

References

Acar E, Aykut-Bingol C, Bingol H, Bro R, Yener B (2007) Multiway analysis of epilepsy tensors. Bioinformatics 23(13):i10–i18
Article Google Scholar
Araujo M, Günnemann S, Mateos G, Faloutsos C (2014) Beyond blocks: hyperbolic community detection. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 50–65
Bader BW, Kolda TG et al (2015) Matlab tensor toolbox version 2.6. http://www.sandia.gov/tgkolda/TensorToolbox/. Accessed 3 Jan 2018
Baiocchi G, Distaso W (2003) GRETL: econometric software for the gnu generation. J Appl Econom 18(1):105–110
Article Google Scholar
Beutel A, Talukdar PP, Kumar A, Faloutsos C, Papalexakis EE, Xing EP (2014) Flexifact: scalable flexible factorization of coupled tensors on hadoop. IN: SDM, SIAM, pp 109–117
Box GE, Pierce DA (1970) Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J Am Stat Assoc 65(332):1509–1526
Article MathSciNet MATH Google Scholar
Breunig MM, Kriegel H-P, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: ACM sigmod record, vol 29, ACM, pp 93–104
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv (CSUR) 41(3):15
Article Google Scholar
Cheng J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd international conference on world wide web, ACM, pp 925–936
Dunlavy DM, Kolda TG, Acar E (2011) Temporal link prediction using matrix and tensor factorizations. ACM Trans Knowl Discov Data (TKDD) 5(2):10
Google Scholar
Ermiş B, Cemgil AT, Acar E (2013) Generalized coupled symmetric tensor factorization for link prediction. In: 2013 21st signal processing and communications applications conference (SIU), IEEE, pp 1–4
Gao S, Denoyer L, Gallinari P (2011) Temporal link prediction by integrating content and structure information. In: Proceedings of the 20th ACM international conference on information and knowledge management, ACM, pp 1169–1174
Grimmett G, Stirzaker D (2001) Probability and random processes. Oxford University Press, Oxford
MATH Google Scholar
Guha S, Mishra N, Roy G, Schrijvers O (2016) Robust random cut forest based anomaly detection on streams. In: International conference on machine learning, pp 2712–2721
Harshman RA (1970) Foundations of the PARAFAC procedure: models and conditions for an“explanatory” multi-modal factor analysis. University of California, Los Angeles
He Z, Xie S, Zdunek R, Zhou G, Cichocki A (2011) Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering. IEEE Trans Neural Netw 22(12):2117–2131
Article Google Scholar
Iasemidis LD, Sackellares JC (1996) Review: chaos theory and epilepsy. Neuroscientist 2(2):118–126
Article Google Scholar
Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500
Article MathSciNet MATH Google Scholar
Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 426–434
Koren Y (2010) Collaborative filtering with temporal dynamics. Commun ACM 53(4):89–97
Article Google Scholar
Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. In: Advances in neural information processing systems, pp 556–562
Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: European conference on principles of data mining and knowledge discovery, Springer, pp 133–145
Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inf Sci Technol 58(7):1019–1031
Article Google Scholar
Matsubara Y, Sakurai Y, Faloutsos C, Iwata T, Yoshikawa M (2012) Fast mining and forecasting of complex time-stamped events. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 271–279
Menon AK, Elkan C (2011) Link prediction via matrix factorization. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 437–452
Neyshabur B, Srebro N (2014) On symmetric and asymmetric LSHS for inner product search. arXiv preprint arXiv:1410.5518
Papadimitriou S, Kitagawa H, Gibbons PB, Faloutsos C (2003) Loci: fast outlier detection using the local correlation integral. In: Proceedings of the 19th international conference on data engineering, 2003, IEEE, pp 315–326
Papalexakis EE, Faloutsos C, Sidiropoulos ND (2012) Parcube: sparse parallelizable tensor decompositions. In: Joint European conference on machine learning and knowledge discovery in databases, Springer, pp 521–536
Pasta MQ, Jan Z, Sallaberry A, Zaidi F (2013) Tunable and growing network generation model with community structures. In: 2013 third international conference on cloud and green computing (CGC), IEEE, pp 233–240
Ram P, Gray AG (2012) Maximum inner-product search using cone trees. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 931–939
Scheffer M, Bascompte J, Brock WA, Brovkin V, Carpenter SR, Dakos V, Held H, Van Nes EH, Rietkerk M, Sugihara G (2009) Early-warning signals for critical transitions. Nature 461(7260):53
Article Google Scholar
Sharan U, Neville J (2008) Temporal-relational classifiers for prediction in evolving domains. In: 2008 eighth IEEE international conference on data mining, IEEE, pp 540–549
Shi C, Li Y, Zhang J, Sun Y, Philip SY (2017) A survey of heterogeneous information network analysis. IEEE Trans Knowl Data Eng 29(1):17–37
Article Google Scholar
Shrivastava A, Li P (2014) Improved asymmetric locality sensitive hashing (ALSH) for maximum inner product search (MIPS). arXiv preprint arXiv:1410.5410
Şimşekli U, Ermiş B, Cemgil, AT, Acar E (2013) Optimal weight learning for coupled tensor factorization with mixed divergences. In: 21st European signal processing conference (EUSIPCO 2013), IEEE, pp 1–5
Song H A, Hooi B, Jereminov M, Pandey A, Pileggi L, Faloutsos C (2017) Powercast: mining and forecasting power grid sequences. In: Ceci M, Hollmén J, Todorovski L, Vens C, Džeroski S (eds) Machine learning and knowledge discovery in databases. ECML PKDD 2017. Lecture Notes in Computer Science, vol 10535. Springer, Cham, pp 606–621
Sun J, Tao D, Faloutsos C (2006) Beyond streams and graphs: dynamic tensor analysis. In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 374–383
Tao D, Maybank S, Hu W, Li X (2005) Stable third-order tensor representation for color image classification. In: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence, IEEE Computer Society, pp 641–644
Teflioudi C, Gemulla R, Mykytiuk O (2015) Lemp: fast retrieval of large entries in a matrix product. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, ACM, pp 107–122
Walker PB, Gilpin S, Fooshee S, Davidson I (2015) Constrained tensor decomposition via guidance: increased inter and intra-group reliability in FMRI analyses. In: International conference on augmented cognition, Springer, pp 361–369
Welling M, Weber M (2001) Positive tensor factorization. Pattern Recognit Lett 22(12):1255–1261
Article MATH Google Scholar
Xie Z, Li X, Wang X (2007) A new community-based evolving network model. Phys A Stat Mech Appl 384(2):725–732
Article Google Scholar
Yılmaz YK (2012) Generalized tensor factorization. Ph.D. thesis, Citeseer
Zellner A (1962) An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc 57(298):348–368
Article MathSciNet MATH Google Scholar
Zhou X, Xiang L, Xiao-Fan W (2008) Weighted evolving networks with self-organized communities. Commun Theor Phys 50(1):261
Article MATH Google Scholar

Download references

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. IIS-1247489, and by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053. This work was also financed by the ERDF European Regional Development Fund through the Operational Programme for Competitiveness and Internationalization - COMPETE 2020 Programme within Project POCI-01-0145-FEDER-006961, and by FCT Fundao para a Cincia e a Tecnologia (Portuguese Foundation for Science and Technology) as part of Project UID/EEA/50014/2013. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, the Army Research Laboratory, the U.S. Government, or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Miguel Araujo, Hyun Ah Song & Christos Faloutsos
Computer Science Department, INESC-TEC, Porto, Portugal
Miguel Araujo & Pedro Ribeiro
University of Porto, Porto, Portugal
Pedro Ribeiro
Feedzai, Lisbon, Portugal
Miguel Araujo

Authors

Miguel Araujo
View author publications
You can also search for this author in PubMed Google Scholar
Pedro Ribeiro
View author publications
You can also search for this author in PubMed Google Scholar
Hyun Ah Song
View author publications
You can also search for this author in PubMed Google Scholar
Christos Faloutsos
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Miguel Araujo.

Appendix: multiplicative updates of coupled tensors factorization

The nonnegative coupled tensor factorization problem

$$\begin{aligned} \min _{\varvec{A,B,C,T}}\left||\varvec{\mathcal {X}} - \displaystyle \sum _f \varvec{a_f} \circ \varvec{b_f} \circ \varvec{t_f} \right||_F^2 + \lambda \left||\varvec{\mathcal {Y}} - \displaystyle \sum _f \varvec{a_f} \circ \varvec{c_f} \circ \varvec{t_f}\right||_F^2 \end{aligned}$$

is well studied and its multiplicative update equations have been previously described in the literature (e.g., considering the dispersion parameter $\lambda $ [35]). The solution can be found by iteratively updating

$$\begin{aligned} \varvec{A}&\leftarrow \varvec{A} \otimes \frac{\varvec{\mathcal {X}_{(1)}}(\varvec{B}\odot \varvec{T}) + \lambda \varvec{\mathcal {Y}_{(1)}}(\varvec{C}\odot \varvec{T})}{\varvec{A}(\varvec{B}\odot \varvec{T})^t(\varvec{B}\odot \varvec{T})+\lambda \varvec{A}(\varvec{C}\odot \varvec{T})^t(\varvec{C}\odot \varvec{T})} \\ \varvec{B}&\leftarrow \varvec{B} \otimes \frac{\varvec{\mathcal {X}_{(2)}}(\varvec{A}\odot \varvec{T})}{\varvec{B}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})} \\ \varvec{C}&\leftarrow \varvec{C} \otimes \frac{\varvec{\mathcal {Y}_{(2)}}(\varvec{A}\odot \varvec{T})}{\varvec{C}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})}\\ \varvec{T}&\leftarrow \varvec{T} \otimes \frac{\varvec{\mathcal {X}_{(3)}}(\varvec{A}\odot \varvec{B}) + \lambda \varvec{\mathcal {Y}_{(3)}}(\varvec{A}\odot \varvec{C})}{\varvec{T}(\varvec{A}\odot \varvec{B})^t(\varvec{A}\odot \varvec{B})+\lambda \varvec{T}(\varvec{A}\odot \varvec{C})^t(\varvec{A}\odot \varvec{C})} \end{aligned}$$

The problem is not as well understood when one of the factorizations is symmetric, e.g., $\displaystyle \varvec{\hat{\mathcal {Y}}} = \sum _f \varvec{a_f} \circ \varvec{a_f} \circ \varvec{t_f}$, as this is no longer a linear problem.

Welling and Weber [41] note the need for a scaling exponent (for the simple, non-coupled case):

$$\begin{aligned} \varvec{A}&\leftarrow \varvec{A} \otimes \left( \frac{\varvec{\mathcal {X}_{(1)}}(\varvec{A}\odot \varvec{T})}{\varvec{A}(\varvec{A}\odot \varvec{T})^t(\varvec{A}\odot \varvec{T})}\right) ^{1/d} \end{aligned}$$

which should be at least 1 / 2 for the matrix case, although no proof is provided. To the best of our knowledge, the best theoretical bound is 1 / 3 when the matrix is semi-definite positive [16]. Empirical results (for the coupled case) indicate that removing the exponent ($d=1$) might eliminate the convergence guarantees, but even small perturbations converge (e.g., 0.98 in [11]).

We recommend an exponent of 1 / 3, as convergence is exponentially fast in any case.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Araujo, M., Ribeiro, P., Song, H.A. et al. TensorCast: forecasting and mining with coupled tensors. Knowl Inf Syst 59, 497–522 (2019). https://doi.org/10.1007/s10115-018-1223-9

Download citation

Received: 04 January 2018
Revised: 11 March 2018
Accepted: 10 May 2018
Published: 24 May 2018
Issue Date: 04 June 2019
DOI: https://doi.org/10.1007/s10115-018-1223-9

TensorCast: forecasting and mining with coupled tensors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Large-scale agent-based simulations of online social networks

Hashkat: large-scale simulations of online social networks

ONE-M: Modeling the Co-evolution of Opinions and Network Connections

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix: multiplicative updates of coupled tensors factorization

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

TensorCast: forecasting and mining with coupled tensors

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Large-scale agent-based simulations of online social networks

Hashkat: large-scale simulations of online social networks

ONE-M: Modeling the Co-evolution of Opinions and Network Connections

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendix: multiplicative updates of coupled tensors factorization

Appendix: multiplicative updates of coupled tensors factorization

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation