DOI: 10.1109/CloudCom.2012.6427527
Article

Data analytics in the cloud with flexible MapReduce workflows

Published: 03 December 2012

Abstract

Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially, but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. Developing an application with MapReduce requires installing, configuring, or accessing a specific framework such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore, composing the multiple phases of a data analytic application requires specifying all the phases and their orchestration. The original MapReduce model and environment lack flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. We present experimental results from using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.
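To make the division of labor in the abstract concrete, the following is a minimal, single-process sketch of the MapReduce pattern applied to a toy text mining task (word counting). It is not the AWARD API and not the authors' implementation: the names `map_words`, `reduce_counts`, `shuffle`, and `run_mapreduce` are hypothetical, and the grouping step stands in for what a real framework would do across distributed workers. Only the map and reduce functions are user-defined here, mirroring the abstract's point that the end user supplies just those algorithms.

```python
from collections import defaultdict
from itertools import chain


def map_words(document):
    """User-defined map: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word.lower(), 1) for word in document.split()]


def reduce_counts(word, counts):
    """User-defined reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def shuffle(pairs):
    """Group intermediate values by key (done here in memory; an assumption,
    since a real framework would partition this across nodes)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def run_mapreduce(documents, map_fn, reduce_fn):
    """Hypothetical driver: map every document, group by key, reduce each group."""
    intermediate = chain.from_iterable(map_fn(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_fn(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["cloud data analytics", "data workflows in the cloud"]
    print(run_mapreduce(docs, map_words, reduce_counts))
```

In a workflow setting such as the one the abstract describes, a driver like `run_mapreduce` would correspond to a single workflow node, and several such nodes could be chained so that one phase's output becomes the next phase's input.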

Cited By

  • (2020) Multiple Workflows Scheduling in Multi-tenant Distributed Systems. ACM Computing Surveys 53:1 (1-39). DOI: 10.1145/3368036. Online publication date: 6-Feb-2020
  • (2016) Design science research contribution to business intelligence in the cloud - A systematic literature review. Future Generation Computer Systems 63:C (108-122). DOI: 10.1016/j.future.2015.11.014. Online publication date: 1-Oct-2016

Information & Contributors

Information

Published In

CLOUDCOM '12: Proceedings of the 2012 IEEE 4th International Conference on Cloud Computing Technology and Science (CloudCom)
December 2012
926 pages
ISBN:9781467345118

Publisher

IEEE Computer Society

United States

Publication History

Published: 03 December 2012

Author Tags

  1. Awards activities
  2. Cloud
  3. Cloud computing
  4. Computational modeling
  5. Corporate acquisitions
  6. Data models
  7. MapReduce
  8. Programming
  9. Text mining
  10. Workflow

Qualifiers

  • Article
