DOI: 10.1145/3126908.3126913
research-article

Efficient process mapping in geo-distributed cloud data centers

Published: 12 November 2017

Abstract

Recently, various applications, including data analytics and machine learning, have been developed for geo-distributed cloud data centers. For these applications, the way parallel processes are mapped to physical nodes (i.e., "process mapping") can significantly affect performance because communication cost is non-uniform in geo-distributed environments. While process mapping has been widely studied in grid/cluster environments, few existing studies have considered the problem in geo-distributed cloud environments. In this paper, we propose a novel model to formulate the geo-distributed process mapping problem and develop a new method to efficiently find a near-optimal solution. Our algorithm considers both the network communication performance of the geo-distributed data centers and the communication matrix of the target application. Evaluation results from real experiments on Amazon EC2 and from simulations demonstrate that our proposal achieves a significant performance improvement (50% on average) over state-of-the-art algorithms.
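To make the problem statement concrete, the following minimal sketch shows how a mapping can be scored from the two inputs the abstract names: an application communication matrix and a measure of network cost between data centers. It is an illustration only, not the model or the algorithm proposed in the paper. The function names (`mapping_cost`, `best_mapping`), the cost definition (traffic volume multiplied by a per-unit inter-site cost), and all numbers are assumptions made for this example, and the exhaustive search is practical only for a handful of processes.

```python
from itertools import permutations


def mapping_cost(comm, net, mapping):
    """Sum, over ordered process pairs, of traffic volume times the
    per-unit cost between the sites the two processes are placed on.

    comm[i][j]  -- assumed traffic volume from process i to process j
    net[a][b]   -- assumed per-unit communication cost from site a to site b
    mapping[i]  -- site index on which process i is placed
    """
    n = len(comm)
    return sum(
        comm[i][j] * net[mapping[i]][mapping[j]]
        for i in range(n)
        for j in range(n)
        if i != j
    )


def best_mapping(comm, net, slots):
    """Brute-force search over all assignments of processes to node slots.

    slots lists the site index of every available node, so [0, 0, 1, 1]
    means two nodes in site 0 and two nodes in site 1.
    """
    best = None
    for perm in permutations(slots):
        cost = mapping_cost(comm, net, perm)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best


# Hypothetical example: processes 0 and 2 exchange a lot of data,
# as do processes 1 and 3; cross-site traffic is 10x more expensive.
comm = [
    [0, 1, 9, 0],
    [1, 0, 0, 9],
    [9, 0, 0, 1],
    [0, 9, 1, 0],
]
net = [
    [1, 10],
    [10, 1],
]
slots = [0, 0, 1, 1]

naive_cost = mapping_cost(comm, net, slots)       # processes placed in rank order
best_cost, placement = best_mapping(comm, net, slots)
print(naive_cost, best_cost, placement)
```

In this toy instance, placing processes in rank order splits both heavily communicating pairs across sites, while the search co-locates each pair and lowers the total cost. Real instances need a heuristic rather than enumeration, since the number of assignments grows factorially with the number of processes.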

Published In

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2017
801 pages
ISBN:9781450351140
DOI:10.1145/3126908
  • General Chair: Bernd Mohr
  • Program Chair: Padma Raghavan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud computing
  2. geo-distributed data centers
  3. process mapping

Qualifiers

  • Research-article

Acceptance Rates

SC '17 paper acceptance rate: 61 of 327 submissions (19%)
Overall acceptance rate: 1,516 of 6,373 submissions (24%)

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Dec 2024.

Cited By

  • (2024) Carbon-Aware and Fault-Tolerant Migration of Deep Learning Workloads in the Geo-Distributed Cloud. 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 494-501. DOI: 10.1109/CLOUD62652.2024.00062. Online publication date: 7-Jul-2024
  • (2022) Throughput-Conscious Energy Allocation and Reliability-Aware Task Assignment for Renewable Powered In-Situ Server Systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:3, 516-529. DOI: 10.1109/TCAD.2021.3068095. Online publication date: Mar-2022
  • (2021) Efficient Replica Migration Scheme for Distributed Cloud Storage Systems. IEEE Transactions on Cloud Computing 9:1, 155-167. DOI: 10.1109/TCC.2018.2858792. Online publication date: 1-Jan-2021
  • (2021) AI-oriented Workload Allocation for Cloud-Edge Computing. 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 555-564. DOI: 10.1109/CCGrid51090.2021.00065. Online publication date: May-2021
  • (2019) Spread-n-share. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3295500.3356152. Online publication date: 17-Nov-2019
  • (2019) Privacy Regulation Aware Process Mapping in Geo-Distributed Cloud Data Centers. IEEE Transactions on Parallel and Distributed Systems, 1-1. DOI: 10.1109/TPDS.2019.2896894. Online publication date: 2019
  • (2019) Wide-Area Spark Streaming: Automated Routing and Batch Sizing. IEEE Transactions on Parallel and Distributed Systems 30:6, 1434-1448. DOI: 10.1109/TPDS.2018.2880189. Online publication date: 1-Jun-2019
  • (2019) Exploring the Potential of Elastic Computing Clusters in Geo-Distributed Data Centers with Fast Fabric Interconnection. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 937-944. DOI: 10.1109/HPCC/SmartCity/DSS.2019.00135. Online publication date: Aug-2019
