DOI: 10.1145/3445814.3446739

BayesPerf: minimizing performance monitoring errors using Bayesian statistics

Published: 17 April 2021

Abstract

Hardware performance counters (HPCs) that measure low-level architectural and microarchitectural events provide dynamic contextual information about the state of the system. However, HPC measurements are error-prone because of nondeterminism (e.g., undercounting due to event multiplexing, or OS interrupt-handling behaviors). In this paper, we present BayesPerf, a system for quantifying uncertainty in HPC measurements by using a domain-driven Bayesian model that captures microarchitectural relationships between HPCs to jointly infer their values as probability distributions. We provide the design and implementation of an accelerator that allows for low-latency and low-power inference of the BayesPerf model for x86 and ppc64 CPUs. BayesPerf reduces the average error in HPC measurements from 40.1% to 7.6% when events are being multiplexed. We illustrate the value of BayesPerf in real-time decision-making with a simple example of scheduling PCIe transfers.
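
To make the multiplexing problem concrete, the following is a minimal sketch and not the BayesPerf model itself. It assumes the common linear extrapolation for a multiplexed counter (raw count scaled by enabled time over running time) and then combines that noisy estimate with a second, independent estimate of the same quantity using a precision-weighted Gaussian update. The function names, the variances, and the idea that a second estimate is available from a related event are all illustrative assumptions.

```python
def scale_multiplexed(raw_count, time_enabled, time_running):
    """Extrapolate a multiplexed counter: raw * time_enabled / time_running.

    The counter was only scheduled on hardware for time_running out of
    time_enabled, so the raw count is scaled up linearly. This assumes the
    event rate was constant, which is the source of the estimation error.
    """
    if time_running <= 0:
        raise ValueError("counter never ran; no estimate is possible")
    return raw_count * time_enabled / time_running


def fuse_gaussian(mean_a, var_a, mean_b, var_b):
    """Combine two independent Gaussian estimates of the same quantity.

    Returns the posterior mean and variance (precision-weighted average).
    """
    precision = 1.0 / var_a + 1.0 / var_b
    mean = (mean_a / var_a + mean_b / var_b) / precision
    return mean, 1.0 / precision


# Hypothetical numbers: the event was scheduled for a quarter of the interval.
scaled = scale_multiplexed(raw_count=1_000, time_enabled=4_000, time_running=1_000)

# Hypothetical second estimate derived from a related, better-measured event.
mean, var = fuse_gaussian(scaled, 400.0**2, 4_400.0, 100.0**2)
print(f"scaled estimate: {scaled:.0f}")
print(f"fused estimate:  {mean:.0f} (std {var ** 0.5:.0f})")
```

The fused variance is always smaller than either input variance, which is the basic reason that relating counters to one another can tighten the uncertainty on a multiplexed measurement.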

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Accelerator
  2. Error Correction
  3. Error Detection
  4. Performance Counter
  5. Probabilistic Graphical Model
  6. Sampling Errors

Qualifiers

  • Article

Conference

ASPLOS '21

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Cited By

  • (2024) Efficient Cross-platform Multiplexing of Hardware Performance Counters via Adaptive Grouping. ACM Transactions on Architecture and Code Optimization 21:1 (1-26). DOI: 10.1145/3629525. Online publication date: 19-Jan-2024
  • (2024) CBANA: A Lightweight, Efficient, and Flexible Cache Behavior Analysis Framework. IEEE Transactions on Computers 73:9 (2262-2274). DOI: 10.1109/TC.2024.3416747. Online publication date: Sep-2024
  • (2024) An Empirical Study of Performance Interference: Timing Violation Patterns and Impacts. 2024 IEEE 30th Real-Time and Embedded Technology and Applications Symposium (RTAS) (320-333). DOI: 10.1109/RTAS61025.2024.00033. Online publication date: 13-May-2024
  • (2023) Methodologies Based on Hardware Performance Counters for Supporting Cybersecurity. Contemporary Challenges for Cyber Security and Data Privacy (108-129). DOI: 10.4018/979-8-3693-1528-6.ch007. Online publication date: 16-Oct-2023
  • (2023) μConAdapter. Proceedings of the 2023 ACM Symposium on Cloud Computing (427-442). DOI: 10.1145/3620678.3624980. Online publication date: 30-Oct-2023
  • (2023) On the impact of hardware-related events on the execution of real-time programs. Design Automation for Embedded Systems 27:4 (275-302). DOI: 10.1007/s10617-023-09281-9. Online publication date: 31-Dec-2023
  • (2023) Strategies and software support for the management of hardware performance counters. Software: Practice and Experience 53:10 (1928-1957). DOI: 10.1002/spe.3236. Online publication date: 17-Jul-2023
  • (2022) Characterizing Job Microarchitectural Profiles at Scale: Dataset and Analysis. Proceedings of the 51st International Conference on Parallel Processing (1-11). DOI: 10.1145/3545008.3545026. Online publication date: 29-Aug-2022
  • (2022) Cache Antagonists Identification: A Practice from Alibaba Colocation Datacenter. 2022 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW) (7-12). DOI: 10.1109/ISSREW55968.2022.00031. Online publication date: Oct-2022