Cache performance of operating system and multiprogramming workloads

Editor: Anita K. Jones

Authors: Anant Agarwal, John Hennessy, Mark Horowitz

ACM Transactions on Computer Systems (TOCS), Volume 6, Issue 4

Pages 393 - 431

https://doi.org/10.1145/48012.48037

Published: 01 November 1988

Abstract

Large caches are necessary in current high-performance computer systems to provide the required high memory bandwidth. Because a small decrease in cache performance can result in significant system performance degradation, accurately characterizing the performance of large caches is important. Although measurements on actual systems have shown that operating systems and multiprogramming can affect cache performance, previous studies have not focused on these effects. We have developed a program tracing technique called ATUM (Address Tracing Using Microcode) that captures realistic traces of multitasking workloads including the operating system. Examining cache behavior using these traces from a VAX processor shows that both the operating system and multiprogramming activity significantly degrade cache performance, with an even greater proportional impact on large caches. From a careful analysis of the causes of this degradation, we explore various techniques to reduce this loss. While seemingly little can be done to mitigate the effect of system references, multitasking cache miss activity can be substantially reduced with small hardware additions.
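
The multiprogramming interference described in the abstract can be illustrated with a toy trace-driven simulation. The sketch below is not ATUM and does not use the paper's VAX traces: the direct-mapped cache model, the synthetic loop traces, the parameter values, and the names `miss_ratio`, `loop_trace`, and `interleave` are all assumptions made for illustration.

```python
from typing import List, Optional


def miss_ratio(addresses: List[int], num_lines: int, line_bytes: int = 16) -> float:
    """Miss ratio of a direct-mapped cache over a list of byte addresses."""
    tags: List[Optional[int]] = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes      # which memory block the address falls in
        index = block % num_lines       # cache line the block maps to
        tag = block // num_lines        # distinguishes blocks sharing a line
        if tags[index] != tag:
            misses += 1
            tags[index] = tag           # replace whatever was there
    return misses / len(addresses) if addresses else 0.0


def loop_trace(base: int, size: int, passes: int, stride: int = 4) -> List[int]:
    """Synthetic trace: sweep `size` bytes starting at `base`, `passes` times."""
    return [base + off for _ in range(passes) for off in range(0, size, stride)]


def interleave(a: List[int], b: List[int], quantum: int) -> List[int]:
    """Round-robin two traces, switching every `quantum` references."""
    out: List[int] = []
    for i in range(0, max(len(a), len(b)), quantum):
        out.extend(a[i:i + quantum])
        out.extend(b[i:i + quantum])
    return out


# Hypothetical setup: a 1 KB cache and two processes whose 1 KB working sets
# map onto the same cache lines.
CACHE_LINES = 64
proc_a = loop_trace(base=0x00000, size=1024, passes=10)
proc_b = loop_trace(base=0x10000, size=1024, passes=10)

alone = miss_ratio(proc_a, CACHE_LINES)
mixed = miss_ratio(interleave(proc_a, proc_b, quantum=256), CACHE_LINES)
print(f"alone: {alone:.3f}  multiprogrammed: {mixed:.3f}")
```

In this contrived case each process running alone misses only on its first sweep, while the interleaved run refills the cache after every switch. Real workloads, such as those traced in the paper, behave far less uniformly.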

Reviews

Reviewer: Wilfred J. Hansen

Cache memory between main memory and the processor increases both performance and hardware cost. Analysis of the trade-offs for proposed hardware is difficult and is usually done by simulation driven from some imagined or measured workload. In this paper, the workload is measured on a VAX using microcode modifications called ATUM, which extract samples of several hundred thousand references. Although it lacks careful definition of the hardware designs considered, the paper does an excellent job of describing analysis techniques and wielding them to study various trade-offs, especially in the presence of system references and multiprogramming. These are both demonstrated to reduce performance through increased working set size and reduced reference locality. The best trade-off is reported to result when process identifiers are associated with each cache entry to distinguish the multiple address spaces. The many measures defined and analytic techniques described should make this paper a foundation for further work. One reservation is that the programs measured are heavily representative of computer engineering: about half are compilations, but the rest are diagnostics, hardware simulations, and hardware design tools. Especially important omissions are input/output-intensive and interactive programs, both of which would increase the diversity of references and thus further reduce performance. Even so, replication of this work will eventually lead to a robust basis for future designs.
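
The process-identifier idea mentioned in the review can be sketched as a comparison between two ways of handling a context switch in a virtually addressed cache: flushing the cache, or tagging each entry with the owning process. This is a minimal illustration under assumed parameters, not the hardware design evaluated in the paper. The function `count_misses`, the synthetic trace, and the cache geometry are hypothetical, and the two processes are deliberately given working sets that map to different cache lines, which is the favorable case for tagging.

```python
from typing import Iterable, List, Optional, Tuple

Ref = Tuple[int, int]  # (process id, virtual address)


def count_misses(trace: Iterable[Ref], num_lines: int, line_bytes: int,
                 use_pids: bool) -> int:
    """Count misses in a direct-mapped, virtually addressed cache.

    use_pids=False: entries carry no process id, so the whole cache is
    invalidated whenever the running process changes.
    use_pids=True: each entry's tag includes the process id, so entries
    survive context switches until they are evicted by a conflict.
    """
    tags: List[Optional[Tuple[int, int]]] = [None] * num_lines
    running: Optional[int] = None
    misses = 0
    for pid, vaddr in trace:
        if not use_pids and running is not None and pid != running:
            tags = [None] * num_lines            # flush on context switch
        running = pid
        block = vaddr // line_bytes
        index = block % num_lines
        tag = (pid if use_pids else 0, block // num_lines)
        if tags[index] != tag:
            misses += 1
            tags[index] = tag
    return misses


# Hypothetical workload: two processes alternate for 10 rounds; process 1
# sweeps virtual bytes 0..511 and process 2 sweeps 512..1023 each quantum.
trace: List[Ref] = []
for _ in range(10):
    trace += [(1, a) for a in range(0, 512, 4)]
    trace += [(2, a) for a in range(512, 1024, 4)]

flush_misses = count_misses(trace, num_lines=64, line_bytes=16, use_pids=False)
pid_misses = count_misses(trace, num_lines=64, line_bytes=16, use_pids=True)
print(f"flush on switch: {flush_misses} misses, PID tags: {pid_misses} misses")
```

With flushing, every quantum starts from an empty cache. With process-identifier tags, only the first touch of each line misses in this example. When two processes compete for the same lines, tagging alone does not remove those conflict misses.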

Published In

ACM Transactions on Computer Systems, Volume 6, Issue 4

Nov. 1988

101 pages

ISSN: 0734-2071

EISSN: 1557-7333

DOI: 10.1145/48012

Publisher

Association for Computing Machinery

New York, NY, United States
