research-article

An empirical model for predicting cross-core performance interference on multicore processors

Authors:

Wensen Yang

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Pages 201 - 212

Published: 07 October 2013

Abstract

Despite their widespread adoption in cloud computing, multicore processors are heavily under-utilized in terms of computing resources. To avoid the potential for negative and unpredictable interference, co-location of a latency-sensitive application with others on the same multicore processor is disallowed, leaving many cores idle and causing low machine utilization. To enable co-location while providing QoS guarantees, it is challenging but important to predict performance interference between co-located applications.

This research is driven by two key insights. First, the performance degradation of an application can be represented as a predictor function of the aggregate pressures on shared resources from all cores, regardless of which applications are co-running and what their individual pressures are. Second, a predictor function is piecewise rather than non-piecewise as in prior work, thereby enabling different types of dominant contention factors to be more accurately captured by different subfunctions in its different subdomains. Based on these insights, we propose to adopt a two-phase regression approach to efficiently building a predictor function. Validation using a large number of benchmarks and nine real-world datacenter applications on three different platforms shows that our approach is also precise, with an average error not exceeding 0.4%. When applied to the nine datacenter applications, our approach improves overall resource utilization from 50% to 88% at the cost of 10% QoS degradation.
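The two insights above can be illustrated with a small sketch. It is not the paper's two-phase regression procedure, which the abstract does not spell out; it only shows what a piecewise predictor over aggregate pressure might look like. The single scalar pressure, the two-segment linear form, the synthetic samples, and every function name here are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Line = Tuple[float, float]            # (slope, intercept)
Model = Tuple[float, Line, Line]      # (breakpoint, left line, right line)


def aggregate_pressure(per_core: Sequence[float]) -> float:
    """Assumed aggregation: sum the pressure contributed by each core."""
    return sum(per_core)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a*x + b; returns (a, b, squared error)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0
    b = my - a * mx
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_piecewise(samples: List[Tuple[float, float]], min_pts: int = 3) -> Model:
    """Try every split of the sorted samples into two subdomains and keep
    the pair of lines with the lowest total squared error."""
    pts = sorted(samples)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    best = None
    for k in range(min_pts, len(pts) - min_pts + 1):
        a1, b1, e1 = fit_line(xs[:k], ys[:k])
        a2, b2, e2 = fit_line(xs[k:], ys[k:])
        if best is None or e1 + e2 < best[0]:
            breakpoint_x = (xs[k - 1] + xs[k]) / 2
            best = (e1 + e2, breakpoint_x, (a1, b1), (a2, b2))
    if best is None:
        raise ValueError("not enough samples for two segments")
    return best[1], best[2], best[3]


def predict(model: Model, pressure: float) -> float:
    """Evaluate the subfunction whose subdomain contains `pressure`."""
    bp, (a1, b1), (a2, b2) = model
    return a1 * pressure + b1 if pressure <= bp else a2 * pressure + b2


if __name__ == "__main__":
    # Synthetic data: gentle degradation up to pressure 10, steep beyond it.
    data = [(float(x), 0.01 * x if x <= 10 else 0.05 * x - 0.4)
            for x in range(21)]
    model = fit_piecewise(data)
    # Two different co-runner mixes with the same aggregate pressure
    # receive the same predicted degradation.
    print(predict(model, aggregate_pressure([3.0, 1.0])))
    print(predict(model, aggregate_pressure([2.0, 2.0])))
```

Because the predictor takes only the aggregate as input, it does not need to be refitted for each combination of co-runners, which is the point of the first insight. The exhaustive breakpoint search is chosen here for brevity and would need replacing for multi-dimensional pressures.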

Cited By

Zhao L, Cui Y, Yang Y, Zhou X, Qiu T, Li K, Bao Y (2023). Component-distinguishable Co-location and Resource Reclamation for High-throughput Computing. ACM Transactions on Computer Systems 42:1-2 (1-37). DOI: 10.1145/3630006. Online publication date: 18-Nov-2023
https://dl.acm.org/doi/10.1145/3630006
Zacarias F, Nishtala R, Carpenter P, Palesi M, Palermo G, Graves C, Arima E (2020). Contention-aware application performance prediction for disaggregated memory systems. Proceedings of the 17th ACM International Conference on Computing Frontiers (49-59). DOI: 10.1145/3387902.3392625. Online publication date: 11-May-2020
https://dl.acm.org/doi/10.1145/3387902.3392625
Zhao L, Yang Y, Zhang K, Zhou X, Qiu T, Li K, Bao Y, Bilas A, Magoutis K, Markatos E, Kostic D, Seltzer M (2020). Rhythm. Proceedings of the Fifteenth European Conference on Computer Systems (1-17). DOI: 10.1145/3342195.3387534. Online publication date: 15-Apr-2020
https://dl.acm.org/doi/10.1145/3342195.3387534

Index Terms

An empirical model for predicting cross-core performance interference on multicore processors
1. General and reference
  1. Cross-computing tools and techniques
    1. Design
2. Hardware
  1. Hardware validation

Information & Contributors

Information

Published In

PACT '13: Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

October 2013

422 pages

ISBN:9781479910212

Conference Chair: Christian Fensch (University of Edinburgh, UK)
General Chair: Michael O'Boyle (University of Edinburgh, UK)
Program Chairs: André Seznec (INRIA Rennes, France), François Bodin (IRISA/CAPS Entreprise, France)

Sponsors

IFIP WG 10.3: IFIP WG 10.3
IEEE TCCA: IEEE Computer Society Technical Committee on Computer Architecture
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE CS TCPP: IEEE Computer Society Technical Committee on Parallel Processing

Publisher

IEEE Press

Publication History

Published: 07 October 2013

Qualifiers

Research-article

Acceptance Rates

PACT '13 Paper Acceptance Rate: 36 of 208 submissions, 17%

Overall Acceptance Rate: 121 of 471 submissions, 26%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 20
Total Downloads: 305
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 05 Mar 2025

Citations

Cited By

Xiang Y, Ye C, Wang X, Luo Y, Wang Z (2019). EMBA. Proceedings of the 48th International Conference on Parallel Processing (1-12). DOI: 10.1145/3337821.3337863. Online publication date: 5-Aug-2019
https://dl.acm.org/doi/10.1145/3337821.3337863
Zhao J, Cui H, Zhang Y, Xue J, Feng X (2018). Revisiting Loop Tiling for Datacenters. Proceedings of the 2018 International Conference on Supercomputing (328-340). DOI: 10.1145/3205289.3205306. Online publication date: 12-Jun-2018
https://dl.acm.org/doi/10.1145/3205289.3205306
Wang L, Zhuang L, Chen J, Cui H, Lv F, Liu Y, Feng X (2018). Lazygraph. ACM SIGPLAN Notices 53:1 (276-289). DOI: 10.1145/3200691.3178508. Online publication date: 10-Feb-2018
https://dl.acm.org/doi/10.1145/3200691.3178508
Wang L, Zhuang L, Chen J, Cui H, Lv F, Liu Y, Feng X, Krall A, Gross T (2018). Lazygraph. Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (276-289). DOI: 10.1145/3178487.3178508. Online publication date: 10-Feb-2018
https://dl.acm.org/doi/10.1145/3178487.3178508
Oxley M, Jonardi E, Pasricha S, Maciejewski A, Siegel H, Burns P, Koenig G (2018). Rate-based thermal, power, and co-location aware resource management for heterogeneous data centers. Journal of Parallel and Distributed Computing 112:P2 (126-139). DOI: 10.1016/j.jpdc.2017.04.015. Online publication date: 1-Feb-2018
https://dl.acm.org/doi/10.1016/j.jpdc.2017.04.015
Khorandi S, Sharifi M (2018). Non-clairvoyant online scheduling of synchronized jobs on virtual clusters. The Journal of Supercomputing 74:6 (2353-2384). DOI: 10.1007/s11227-018-2262-4. Online publication date: 1-Jun-2018
https://dl.acm.org/doi/10.1007/s11227-018-2262-4
Khorandi S, Ghiasvand S, Sharifi M (2017). Reducing Load Imbalance of Virtual Clusters via Reconfiguration and Adaptive Job Scheduling. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (992-999). DOI: 10.1109/CCGRID.2017.60. Online publication date: 14-May-2017
https://dl.acm.org/doi/10.1109/CCGRID.2017.60
