[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/3079079.3079082acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
research-article
Public Access

Enabling scalability-sensitive speculative parallelization for FSM computations

Published: 14 June 2017 Publication History

Abstract

Finite state machines (FSMs) are the backbone of many applications, but are difficult to parallelize due to their inherent dependencies. Speculative FSM parallelization has shown promise on multicore machines with up to eight cores. However, as hardware parallelism grows (e.g., Xeon Phi has up to 288 logical cores), a fundamental question raises: How does the speculative FSM parallelization scale as the number of cores increases? Without answering this question, existing methods for speculative FSM parallelization simply choose to use all available cores, which might not only waste computing resources, but also result in suboptimal performance.
In this work, we conduct a systematic scalability analysis for speculative FSM parallelization. Unlike many other parallelizations which can be modeled by the classic Amdahl's law or its simple extensions, speculative FSM parallelization scales unconventionally due to the non-deterministic nature of speculation and the cost variations of misspeculation. To address these challenges, this work introduces a spectrum of scalability models that are customized to the properties of specific FSMs and the underlying architecture. The models, for the first time, precisely capture the scalability of speculative parallelization for different FSM computations, and clearly show the existence of a "sweet spot" in terms of the number of cores employed by the speculative FSM parallelization to achieve the optimal performance. To make the scalability models practical, we develop S3, a <u>s</u>calability-<u>s</u>ensitive <u>s</u>peculation framework for FSM parallelization. For any given FSM, S3 can automatically characterize its properties and analyze its scalability, hence guide speculative parallelization towards the optimal performance and more efficient use of computing resources. Evaluations on different FSMs and architectures confirm the accuracy of the proposed models and show that S3 achieves significant speedup (up to 5X) and energy savings (up to 77%).

References

[1]
Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman. 1986. Compilers, Principles, Techniques. Addison wesley Boston.
[2]
Gene M Amdahl. 1967. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the April 18--20, 1967, spring joint computer conference. ACM, 483--485.
[3]
S. Browne, C. Deane, G. Ho, and P. Mucci. 1999. PAPI: A portable interface to hardware performance counters. In Proceedings of Department of Defense HPCMP Users Group Conference.
[4]
Sutapa Datta and Subhasis Mukhopadhyay. 2015. A grammar inference approach for predicting kinase specific phosphorylation sites. PloS one 10, 4 (2015), e0122294.
[5]
C. Ding, X. Shen, K. Kelsey, C. Tice, R. Huang, and C. Zhang. 2007. Software Behavior-oriented Parallelization. In PLDI.
[6]
Paul Dlugosch, Dave Brown, Paul Glendenning, Michael Leventhal, and Harold Noyes. 2014. An efficient and scalable semiconductor architecture for parallel automata processing. IEEE Transactions on Parallel and Distributed Systems 25, 12 (2014), 3088--3098.
[7]
M. Feng, R. Gupta, and Y. Hu. 2011. SpiceC: Scalable parallelism via implicit copying and explicit commit. In Proceedings of the ACM SIGPLAN Symposium on Principles Practice of Parallel Programming.
[8]
John L Gustafson. 1988. Reevaluating Amdahl's law. Commun. ACM 31, 5 (1988), 532--533.
[9]
Maurice Herlihy and Eric Koskinen. 2008. Transactional Boosting: A Methodology for Highly-concurrent Transactional Objects. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'08).
[10]
Adolfy Hoisie, Olaf Lubeck, and Harvey Wasserman. 2000. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications. The International Journal of High Performance Computing Applications 14, 4 (2000), 330--346.
[11]
C. Jones, R. Liu, L. Meyerovich, K. Asanovic, and R. Bodik. 2009. Parallelizing the web browser. In HotPar.
[12]
B. Kaplan. Speculative parsing path. (????). http://bugzilla.mozilla.org.
[13]
S. Klein and Y. Wiseman. 2003. Parallel huffman decoding with applications to jpeg files. Jounal of Computing 46, 5 (2003).
[14]
Shmuel Tomi Klein and Yair Wiseman. 2003. Parallel Huffman decoding with applications to JPEG files. Comput. J. 46, 5 (2003), 487--497.
[15]
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2007. Optimistic Parallelism Requires Abstractions. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '07).
[16]
Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, and Jonathan Turner. 2006. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In ACM SIGCOMM Computer Communication Review, Vol. 36. ACM, 339--350.
[17]
Vipin Kumar and Anshul Gupta. 1991. Analysis of Scalability of Parallel Algorithms and Architectures: A Survey. In Proceedings of the 5th International Conference on Supercomputing (ICS '91). ACM, New York, NY, USA, 396--405.
[18]
Richard E. Ladner and Michael J. Fischer. 1980. Parallel Prefix Computation. J. ACM 27, 4 (Oct. 1980), 831--838.
[19]
Dan Lin, Nigel Medforth, Kenneth S Herdy, Arrvindh Shriraman, and Rob Cameron. 2012. Parabix: Boosting the efficiency of text processing on commodity processors. In High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on. IEEE, 1--12.
[20]
Weifeng Liu and Brian Vinter. 2015. Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors. Parallel Comput. 49 (2015), 179--193.
[21]
D. Luchaup, R. Smith, C. Estan, and S. Jha. 2009. Multi-byte regular expression matching with speculation. In RAID.
[22]
Todd Mytkowicz, Madanlal Musuvathi, and Wolfram Schulte. 2014. Data-Parallel Finite-State Machines. In ASPLOS '14: Proceedings of 19th International Conference on Architecture Support for Programming Languages and Operating Systems. ACM Press.
[23]
Gonzalo Navarro. 2001. NR-grep: a fast and flexible pattern-matching tool. Software: Practice and Experience 31, 13 (2001), 1265--1312.
[24]
Yinfei Pan, Ying Zhang, Kenneth Chiu, and Wei Lu. 2007. Parallel xml parsing using meta-dfas. In e-Science and Grid Computing, IEEE International Conference on. IEEE, 237--244.
[25]
Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, M. Amber Hassaan, Rashid Kaleem, Tsung-Hsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The Tao of Parallelism in Algorithms. In Proceedings of the 32Nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '11).
[26]
P. Prabhu, G. Ramalingam, and K. Vaswani. 2010. Safe Programmable Speculative Parallelism. In Proceedings of ACM SIGPLAN Conference on Programming Languages Design and Implementation.
[27]
Junqiao Qiu, Zhijia Zhao, and Bin Ren. 2016. MicroSpec: Speculation-centric fine-grained parallelization for FSM computations. In Parallel Architecture and Compilation Techniques (PACT), 2016 International Conference on. IEEE, 221--233.
[28]
C.G. Quinones, C. Madriles, J. Sanchez, P. Marcuello, A. Gonzalez, and D. M. Tullsen. 2005. Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices. In PLDI.
[29]
A. Raman, H. Kim, T. R. Mason, T. B. Jablin, and D. I. August. 2010. Speculative parallelization using software multi-threaded transactions. In Proceedings of the international conference on Architectural support for programming languages and operating systems.
[30]
Lawrence Rauchwerger and David A. Padua. 1995. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In Proceedings of the ACM SIGPLAN'95 Conference on Programming Language Design and Implementation (PLDI), La Jolla, California, USA, June 18--21, 1995. 218--232.
[31]
Bin Ren, Gagan Agrawal, James R Larus, Todd Mytkowicz, Tomi Poutanen, and Wolfram Schulte. 2013. SIMD parallelization of applications that traverse irregular data structures. In Code Generation and Optimization (CGO), 2013 IEEE/ACM International Symposium on. IEEE, 1--10.
[32]
Martin Roesch et al. 1999. Snort: Lightweight Intrusion Detection for Networks. In LISA, Vol. 99. 229--238.
[33]
Indranil Roy and Srinivas Aluru. 2014. Finding motifs in biological sequences using the micron automata processor. In Parallel and Distributed Processing Symposium, 2014 IEEE 28th International. IEEE, 415--424.
[34]
Priti Shankar, Amitava Dasgupta, Kaustubh Deshmukh, and B Sundar Rajan. 2003. On viewing block codes as finite automata. Theoretical Computer Science 290, 3 (2003), 1775--1797.
[35]
C. Tian, M. Feng, V. Nagarajan, and R. Gupta. 2008. Copy Or Discard Execution Model For Speculative Parallelization On Multicores. In Proceedings of the International Symposium on Microarchitecture.
[36]
Fang Yu, Zhifeng Chen, Yanlei Diao, TV Lakshman, and Randy H Katz. 2006. Fast and memory-efficient regular expression matching for deep packet inspection. In Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems. ACM, 93--102.
[37]
Zhijia Zhao and Xipeng Shen. 2015. On-the-Fly Principled Speculation for FSM Parallelization. In Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '15, Istanbul, Turkey, March 14--18, 2015. 619--630.
[38]
Zhijia Zhao, Bo Wu, and Xipeng Shen. 2014. Challenging the "Embarrassingly Sequential": Parallelizing Finite State Machine-Based Computations through Principled Speculation. In ASPLOS '14: Proceedings of 19th International Conference on Architecture Support for Programming Languages and Operating Systems. ACM Press.
[39]
Y. Zu, M. Yang, Z. Xu, L. Wang, X. Tian, K. Peng, and Q. Dong. 2009. GPU-based NFA implementation for memory efficient high speed regular expression matching. In PPoPP '12: Proceedings of the ACM SIGPLAN symposium on Principles and practice of parallel programming. 129--140.

Cited By

View all
  • (2024)ngAP: Non-blocking Large-scale Automata Processing on GPUsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624848(268-285)Online publication date: 27-Apr-2024
  • (2023)Corrigendum: Coupled cluster theory on modern heterogeneous supercomputersFrontiers in Chemistry10.3389/fchem.2023.125651011Online publication date: 15-Aug-2023
  • (2023)Coupled cluster theory on modern heterogeneous supercomputersFrontiers in Chemistry10.3389/fchem.2023.115452611Online publication date: 14-Jun-2023
  • Show More Cited By

Index Terms

  1. Enabling scalability-sensitive speculative parallelization for FSM computations

      Recommendations

      Comments

      Please enable JavaScript to view thecomments powered by Disqus.

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      ICS '17: Proceedings of the International Conference on Supercomputing
      June 2017
      300 pages
      ISBN:9781450350204
      DOI:10.1145/3079079
      • General Chairs:
      • William D. Gropp,
      • Pete Beckman,
      • Program Chairs:
      • Zhiyuan Li,
      • Francisco J. Cazorla
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 14 June 2017

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. FSM
      2. finite state machine
      3. parallelization
      4. scalability
      5. speculation

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      ICS '17
      Sponsor:

      Acceptance Rates

      Overall Acceptance Rate 629 of 2,180 submissions, 29%

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)93
      • Downloads (Last 6 weeks)9
      Reflects downloads up to 11 Dec 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)ngAP: Non-blocking Large-scale Automata Processing on GPUsProceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 110.1145/3617232.3624848(268-285)Online publication date: 27-Apr-2024
      • (2023)Corrigendum: Coupled cluster theory on modern heterogeneous supercomputersFrontiers in Chemistry10.3389/fchem.2023.125651011Online publication date: 15-Aug-2023
      • (2023)Coupled cluster theory on modern heterogeneous supercomputersFrontiers in Chemistry10.3389/fchem.2023.115452611Online publication date: 14-Jun-2023
      • (2023)Asynchronous Automata Processing on GPUsProceedings of the ACM on Measurement and Analysis of Computing Systems10.1145/35794537:1(1-27)Online publication date: 2-Mar-2023
      • (2023)Parallel Pattern Matching over Brotli Compressed Network Traffic2023 IEEE 22nd International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)10.1109/TrustCom60117.2023.00079(477-484)Online publication date: 1-Nov-2023
      • (2022)GSpecPal: Speculation-Centric Finite State Machine Parallelization on GPUs2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00053(481-491)Online publication date: May-2022
      • (2021)Scalable structural index construction for JSON analyticsProceedings of the VLDB Endowment10.14778/3436905.343692614:4(694-707)Online publication date: 22-Feb-2021
      • (2021)Scalable FSM parallelization via path fusion and higher-order speculationProceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3445814.3446705(887-901)Online publication date: 19-Apr-2021
      • (2020)Reliability Analysis for Unreliable FSM ComputationsACM Transactions on Architecture and Code Optimization10.1145/337745617:2(1-23)Online publication date: 29-May-2020
      • (2020)Why GPUs are Slow at Executing NFAs and How to Make them FasterProceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems10.1145/3373376.3378471(251-265)Online publication date: 9-Mar-2020
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Login options

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media