DOI: 10.1145/2534248.2534257
research-article

Time-bound analytic tasks on large datasets through dynamic configuration of workflows

Published: 17 November 2013

Abstract

Domain experts are often untrained in big data technologies, which limits their ability to exploit the data they have available. Workflow systems hide the complexities of high-end computing and software engineering by offering pre-packaged analytic steps combined into multi-step methods commonly used by experts. A current limitation of workflow systems is that they do not take user deadlines into account: they run the workflows selected by the user, however long execution takes. This is impractical when large datasets are at stake, since users often prefer to see an answer sooner even if it has lower precision or quality. In this paper, we present an extension to workflow systems that enables them to take user deadlines into account by automatically generating alternative workflow candidates and ranking them according to performance estimates. The system makes these estimates based on workflow performance models created from workflow executions, and uses semantic technologies to reason about workflow options. Possible workflow candidates are presented to the user in a compact manner, ranked according to their runtime estimates. We have implemented this approach in the WOOT system, which combines and extends capabilities from the WINGS semantic workflow system and the Apache OODT (Object Oriented Data Technology) workflow execution system.
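
The ranking idea in the abstract can be illustrated with a minimal sketch. It is not the WOOT implementation: it assumes a simple linear runtime model fitted by least squares to past executions, and the function names (`fit_linear_model`, `rank_candidates`) and the candidate names (`exact`, `approximate`) are hypothetical. The abstract does not specify the form of the performance models.

```python
from typing import Dict, List, Tuple

Run = Tuple[int, float]      # (input size, observed runtime in seconds)
Model = Tuple[float, float]  # (intercept, slope) of a linear runtime model


def fit_linear_model(runs: List[Run]) -> Model:
    """Least-squares fit of runtime = intercept + slope * size.

    Assumption: runtime grows roughly linearly with input size, and the
    runs cover at least two distinct input sizes.
    """
    n = len(runs)
    mean_x = sum(x for x, _ in runs) / n
    mean_y = sum(y for _, y in runs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in runs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in runs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rank_candidates(
    models: Dict[str, Model], size: int, deadline: float
) -> List[Tuple[str, float]]:
    """Return candidates whose estimated runtime meets the deadline,
    ordered from fastest to slowest estimate."""
    estimates = [(name, b + m * size) for name, (b, m) in models.items()]
    feasible = [(name, t) for name, t in estimates if t <= deadline]
    return sorted(feasible, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical execution logs for two alternative workflow candidates.
    models = {
        "exact": fit_linear_model([(100, 12.0), (200, 22.0), (400, 42.0)]),
        "approximate": fit_linear_model([(100, 3.0), (200, 5.0), (400, 9.0)]),
    }
    for name, seconds in rank_candidates(models, size=1000, deadline=60.0):
        print(f"{name}: about {seconds:.0f} s")
```

A real system would also weigh the quality lost by choosing a faster candidate; this sketch filters and orders by estimated runtime only.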

Cited By

  • (2017) Towards Automating Data Narratives. Proceedings of the 22nd International Conference on Intelligent User Interfaces, pp. 565-576. DOI: 10.1145/3025171.3025193. Online publication date: 7-Mar-2017.
  • (2017) Constraint-Driven Dynamic Workflow for Automation of Big Data Analytics Based on GraphPlan. 2017 IEEE International Conference on Web Services (ICWS), pp. 357-364. DOI: 10.1109/ICWS.2017.120. Online publication date: Jun-2017.
  • (2014) Teaching parallelism without programming. Proceedings of the Workshop on Education for High-Performance Computing, pp. 42-48. DOI: 10.1109/EduHPC.2014.12. Online publication date: 16-Nov-2014.

      Published In

      WORKS '13: Proceedings of the 8th Workshop on Workflows in Support of Large-Scale Science
      November 2013
      133 pages
      ISBN: 9781450325028
      DOI: 10.1145/2534248

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. OODT
      2. WINGS
      3. performance
      4. semantic workflows
      5. workflows

      Qualifiers

      • Research-article

      Conference

      SC13

      Acceptance Rates

      WORKS '13 Paper Acceptance Rate 13 of 16 submissions, 81%;
      Overall Acceptance Rate 30 of 54 submissions, 56%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 3
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 20 Jan 2025.
