DOI: 10.1109/CCGrid.2014.97
research-article

V for vicissitude: the challenge of scaling complex big data workflows

Published: 26 May 2014

Abstract

In this paper we present the scaling of BTWorld, our MapReduce-based approach to observing and analyzing the global BitTorrent network, which we have been monitoring for the past 4 years. BTWorld currently provides a comprehensive and complex set of queries implemented in Pig Latin, with data dependencies between them; these queries translate into several MapReduce jobs whose execution times and input sizes both follow heavy-tailed distributions. Processing more than 1 TB of BitTorrent data with our BTWorld workflow required an in-depth analysis of the entire software stack and the design of a complete optimization cycle. We analyze our system from both theoretical and experimental perspectives, and we show how we attained a data-processing scale 15 times larger than in our previous results.
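Pig compiles a Pig Latin script into a directed acyclic graph of MapReduce jobs, and a job can start only after the jobs that produce its inputs have finished. The following is a minimal sketch of that dependency ordering, not BTWorld's actual implementation: it groups a set of dependent queries into stages with Python's standard `graphlib.TopologicalSorter`. The query names in `WORKFLOW` and the helper `execution_stages` are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each query maps to the set of queries whose output
# it reads. These names are illustrative, not BTWorld's real queries.
WORKFLOW = {
    "tracker_stats": set(),
    "swarm_sizes": {"tracker_stats"},
    "active_hashes": {"tracker_stats"},
    "top_swarms": {"swarm_sizes", "active_hashes"},
}


def execution_stages(deps):
    """Group queries into stages: every query in a stage depends only on
    queries from earlier stages, so a stage's jobs could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # queries whose inputs are all done
        stages.append(ready)
        sorter.done(*ready)  # mark them finished, unblocking their dependents
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(WORKFLOW)):
        print(f"stage {i}: {', '.join(stage)}")
    # prints:
    # stage 0: tracker_stats
    # stage 1: active_hashes, swarm_sizes
    # stage 2: top_swarms
```

Here `swarm_sizes` and `active_hashes` share a stage because both depend only on `tracker_stats`. In this stage-by-stage model, a single long-running job delays every later stage.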



Published In

CCGRID '14: Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
May 2014
987 pages
ISBN: 9781479927838

Publisher

IEEE Press


Conference

CCGrid '14


Article Metrics

  • Total Citations: 0
  • Total Downloads: 11
  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 12 Dec 2024
