DOI: 10.5555/2523721.2523750
research-article

An empirical model for predicting cross-core performance interference on multicore processors

Published: 07 October 2013

Abstract

Despite their widespread adoption in cloud computing, multicore processors are heavily under-utilized in terms of computing resources. To avoid the potential for negative and unpredictable interference, co-location of a latency-sensitive application with others on the same multicore processor is disallowed, leaving many cores idle and causing low machine utilization. To enable co-location while providing QoS guarantees, it is challenging but important to predict performance interference between co-located applications.
This research is driven by two key insights. First, the performance degradation of an application can be represented as a predictor function of the aggregate pressures on shared resources from all cores, regardless of which applications are co-running and what their individual pressures are. Second, a predictor function is piecewise rather than non-piecewise as in prior work, which allows different types of dominant contention factors to be captured more accurately by different subfunctions in its different subdomains. Based on these insights, we propose a two-phase regression approach for building a predictor function efficiently. Validation using a large number of benchmarks and nine real-world datacenter applications on three different platforms shows that the approach is also precise, with an average error not exceeding 0.4%. When applied to the nine datacenter applications, our approach improves overall resource utilization from 50% to 88% at the cost of 10% QoS degradation.
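The two insights can be illustrated with a small sketch. The abstract does not spell out the regression procedure, so the code below is not the paper's method: it assumes a single shared resource, a piecewise-linear predictor with two subdomains, and a brute-force search for the split point. All names (`aggregate_pressure`, `fit_piecewise`, `PiecewisePredictor`) are hypothetical, and the two steps used here (choose a split, then fit each piece by least squares) are a stand-in that need not match the paper's two phases.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Line = Tuple[float, float]  # (slope, intercept)


def aggregate_pressure(per_core: Sequence[float]) -> float:
    """Total pressure on one shared resource, summed over all cores.

    Only the sum matters to the predictor, not which core contributed what.
    """
    return float(sum(per_core))


def fit_line(points: Sequence[Tuple[float, float]]) -> Line:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:  # all x values equal: fall back to a constant
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def squared_error(points: Sequence[Tuple[float, float]], line: Line) -> float:
    """Sum of squared residuals of the points against the line."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


@dataclass
class PiecewisePredictor:
    breakpoint: float  # pressure at which the second subdomain starts
    low: Line          # subfunction for pressure < breakpoint
    high: Line         # subfunction for pressure >= breakpoint

    def predict(self, pressure: float) -> float:
        """Predicted degradation for a given aggregate pressure."""
        slope, intercept = self.low if pressure < self.breakpoint else self.high
        return slope * pressure + intercept


def fit_piecewise(xs: Sequence[float], ys: Sequence[float],
                  min_points: int = 2) -> PiecewisePredictor:
    """Try every split of the sorted samples and keep the lowest-error one."""
    points = sorted(zip(xs, ys))
    if len(points) < 2 * min_points:
        raise ValueError("not enough samples for two subdomains")
    best_error = None
    best_model = None
    for i in range(min_points, len(points) - min_points + 1):
        left, right = points[:i], points[i:]
        low, high = fit_line(left), fit_line(right)
        error = squared_error(left, low) + squared_error(right, high)
        if best_error is None or error < best_error:
            best_error = error
            best_model = PiecewisePredictor(right[0][0], low, high)
    return best_model


# Synthetic training data (made up for illustration): no degradation until
# the shared resource saturates, then degradation grows linearly.
pressures = [float(i) for i in range(20)]
degradation = [0.0 if p < 10 else 2.0 * (p - 10.0) for p in pressures]
model = fit_piecewise(pressures, degradation)

# Query with the summed pressure of three hypothetical co-runners.
predicted = model.predict(aggregate_pressure([3.0, 4.0, 8.0]))
```

A single non-piecewise line fitted to the same samples would smear the flat and the rising regions together; splitting the domain lets each subfunction follow whichever contention factor dominates in its range.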

      Published In

      PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
      October 2013
      422 pages
      ISBN:9781479910212

      Publisher

      IEEE Press

      Author Tags

      1. cross-core performance interference
      2. memory subsystems
      3. multicore processors
      4. performance analysis
      5. prediction model

      Qualifiers

      • Research-article

      Acceptance Rates

      PACT '13 Paper Acceptance Rate 36 of 208 submissions, 17%;
      Overall Acceptance Rate 121 of 471 submissions, 26%

      Cited By

      • (2023) Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing. ACM Transactions on Computer Systems 42:1-2 (1-37). DOI: 10.1145/3630006. Online publication date: 18-Nov-2023
      • (2020) Contention-aware application performance prediction for disaggregated memory systems. Proceedings of the 17th ACM International Conference on Computing Frontiers (49-59). DOI: 10.1145/3387902.3392625. Online publication date: 11-May-2020
      • (2020) Rhythm. Proceedings of the Fifteenth European Conference on Computer Systems (1-17). DOI: 10.1145/3342195.3387534. Online publication date: 15-Apr-2020
      • (2019) EMBA. Proceedings of the 48th International Conference on Parallel Processing (1-12). DOI: 10.1145/3337821.3337863. Online publication date: 5-Aug-2019
      • (2018) Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing (328-340). DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018
      • (2018) Lazygraph. ACM SIGPLAN Notices 53:1 (276-289). DOI: 10.1145/3200691.3178508. Online publication date: 10-Feb-2018
      • (2018) Lazygraph. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (276-289). DOI: 10.1145/3178487.3178508. Online publication date: 10-Feb-2018
      • (2018) Rate-based thermal, power, and co-location aware resource management for heterogeneous data centers. Journal of Parallel and Distributed Computing 112:P2 (126-139). DOI: 10.1016/j.jpdc.2017.04.015. Online publication date: 1-Feb-2018
      • (2018) Non-clairvoyant online scheduling of synchronized jobs on virtual clusters. The Journal of Supercomputing 74:6 (2353-2384). DOI: 10.1007/s11227-018-2262-4. Online publication date: 1-Jun-2018
      • (2017) Reducing Load Imbalance of Virtual Clusters via Reconfiguration and Adaptive Job Scheduling. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (992-999). DOI: 10.1109/CCGRID.2017.60. Online publication date: 14-May-2017
