Abstract
How frequently are computer jobs submitted to an industrial-scale datacenter? We investigate a trace of job requests and executions collected from a large-scale industrial datacenter; the trace spans nearly half a terabyte. In this paper, we discover and explain two surprising patterns in the inter-arrival time (IAT) of job requests: (a) multiple periodicities and (b) multi-level bundling effects. To model these patterns, we propose a novel generative process, the Hierarchical Bundling Model (HiBM). HiBM is able to mimic the multiple components of the IAT distribution and to simulate job requests with the same statistical properties as the real data. We also provide a systematic approach to estimating the parameters of HiBM.
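The abstract contrasts real job-request IATs with the Poisson baseline, under which IATs are exponential. The following is a minimal sketch, under stated assumptions, of how one might quantify that departure from a list of request timestamps. It computes IATs, then compares them to an exponential fit using the coefficient of variation (exactly 1 for exponential gaps) and a Kolmogorov-Smirnov distance. The generator `bundled_arrivals` and its parameters (`bundle_rate`, `mean_size`, `spread`) are hypothetical. It is a single-level bundling toy, not HiBM, whose structure and estimation procedure the abstract does not describe, and it models neither periodicities nor multiple bundling levels.

```python
import math
import random
import statistics


def inter_arrival_times(timestamps):
    """Gaps between consecutive request timestamps (sorted first)."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def poisson_arrivals(rate, horizon, rng):
    """Homogeneous Poisson process on [0, horizon]: i.i.d. exponential gaps."""
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return out
        out.append(t)


def bundled_arrivals(bundle_rate, mean_size, spread, horizon, rng):
    """Hypothetical one-level bundling toy (NOT the paper's HiBM): bundle start
    times are Poisson, and each bundle emits a geometric number of requests
    scattered uniformly within `spread` seconds of its start."""
    p = 1.0 / mean_size
    out = []
    for start in poisson_arrivals(bundle_rate, horizon, rng):
        size = 1
        while rng.random() > p:  # geometric on {1, 2, ...} with mean 1/p
            size += 1
        out.extend(start + rng.uniform(0.0, spread) for _ in range(size))
    return out


def coefficient_of_variation(xs):
    """std/mean; equals 1 for exponential IATs, exceeds 1 for bursty traffic."""
    return statistics.pstdev(xs) / statistics.fmean(xs)


def ks_distance_to_exponential(xs):
    """Kolmogorov-Smirnov distance between the empirical IAT distribution and
    an exponential whose rate is fitted by maximum likelihood (1/mean)."""
    xs = sorted(xs)
    n = len(xs)
    lam = 1.0 / statistics.fmean(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-lam * x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(0)
poisson_iat = inter_arrival_times(poisson_arrivals(1.0, 20_000, rng))
bundled_iat = inter_arrival_times(
    bundled_arrivals(0.01, 10, 0.5, 200_000, rng))

for name, iat in [("Poisson", poisson_iat), ("bundled", bundled_iat)]:
    print(f"{name}: CV={coefficient_of_variation(iat):.2f}, "
          f"KS={ks_distance_to_exponential(iat):.3f}")
```

With these illustrative settings, the Poisson trace should show a CV near 1 and a small KS distance. The bundled trace should show a CV well above 1 and a large KS distance, because most of its gaps are near-zero within-bundle gaps that a single exponential cannot capture. Because the rate is estimated from the same data, the KS distance is best read as a descriptive measure here, not as a formal test statistic.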
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Juan, DC., Li, L., Peng, HK., Marculescu, D., Faloutsos, C. (2014). Beyond Poisson: Modeling Inter-Arrival Time of Requests in a Datacenter. In: Tseng, V.S., Ho, T.B., Zhou, ZH., Chen, A.L.P., Kao, HY. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2014. Lecture Notes in Computer Science, vol 8444. Springer, Cham. https://doi.org/10.1007/978-3-319-06605-9_17
DOI: https://doi.org/10.1007/978-3-319-06605-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06604-2
Online ISBN: 978-3-319-06605-9
eBook Packages: Computer Science, Computer Science (R0)