Cache performance of operating system and multiprogramming workloads

Published: 01 November 1988

Abstract

Large caches are necessary in current high-performance computer systems to provide the required high memory bandwidth. Because a small decrease in cache performance can result in significant system performance degradation, accurately characterizing the performance of large caches is important. Although measurements on actual systems have shown that operating systems and multiprogramming can affect cache performance, previous studies have not focused on these effects. We have developed a program tracing technique called ATUM (Address Tracing Using Microcode) that captures realistic traces of multitasking workloads including the operating system. Examining cache behavior using these traces from a VAX processor shows that both the operating system and multiprogramming activity significantly degrade cache performance, with an even greater proportional impact on large caches. From a careful analysis of the causes of this degradation, we explore various techniques to reduce this loss. While seemingly little can be done to mitigate the effect of system references, multitasking cache miss activity can be substantially reduced with small hardware additions.

Reviews

Wilfred J. Hansen

Cache memory between main memory and the processor increases both performance and hardware cost. Analysis of the trade-offs for proposed hardware is difficult and is usually done by simulation driven from some imagined or measured workload. In this paper, the workload is measured on a VAX using microcode modifications called ATUM, which extract samples of several hundred thousand references. Although it lacks careful definition of the hardware designs considered, the paper does an excellent job of describing analysis techniques and wielding them to study various trade-offs, especially in the presence of system references and multiprogramming. These are both demonstrated to reduce performance through increased working set size and reduced reference locality. The best trade-off is reported to result when process identifiers are associated with each cache entry to distinguish the multiple address spaces. The many measures defined and analytic techniques described should make this paper a foundation for further work. One reservation is that the programs measured are heavily representative of computer engineering: about half are compilations, but the rest are diagnostics, hardware simulations, and hardware design tools. Especially important omissions are input/output-intensive and interactive programs, both of which would increase the diversity of references and thus further reduce performance. Even so, replication of this work will eventually lead to a robust basis for future designs.
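
The effect the review singles out, tagging each cache entry with a process identifier so that several address spaces can share the cache, can be illustrated with a toy simulation. The sketch below is not the paper's method and uses no ATUM data: the trace is synthetic, and the class and function names (`DirectMappedCache`, `round_robin_trace`, `simulate`) and all sizes are hypothetical. It also assumes that a cache without process identifiers must be flushed on every context switch, which is one common alternative rather than something the review states.

```python
class DirectMappedCache:
    """Toy direct-mapped cache addressed by block number (hypothetical model)."""

    def __init__(self, num_sets, use_pid_tags):
        self.num_sets = num_sets
        self.use_pid_tags = use_pid_tags
        self.lines = {}          # set index -> stored tag, or (pid, tag)
        self.current_pid = None
        self.misses = 0

    def access(self, pid, block_addr):
        # Assumption: without PID tags, entries from different address spaces
        # cannot be told apart, so the cache is flushed on a context switch.
        if not self.use_pid_tags and pid != self.current_pid:
            self.lines.clear()
        self.current_pid = pid

        index = block_addr % self.num_sets
        tag = block_addr // self.num_sets
        stored = (pid, tag) if self.use_pid_tags else tag
        if self.lines.get(index) != stored:
            self.misses += 1
            self.lines[index] = stored


def round_robin_trace(num_procs, blocks_per_proc, quanta, passes_per_quantum):
    """Synthetic trace: each process repeatedly sweeps its own block range."""
    for quantum in range(quanta):
        pid = quantum % num_procs
        base = pid * blocks_per_proc
        for _ in range(passes_per_quantum):
            for offset in range(blocks_per_proc):
                yield pid, base + offset


def simulate(use_pid_tags, num_procs=2, blocks_per_proc=64, num_sets=128,
             quanta=10, passes_per_quantum=4):
    """Return the miss count for the synthetic round-robin workload."""
    cache = DirectMappedCache(num_sets, use_pid_tags)
    for pid, block in round_robin_trace(num_procs, blocks_per_proc,
                                        quanta, passes_per_quantum):
        cache.access(pid, block)
    return cache.misses


if __name__ == "__main__":
    print("flush on switch:", simulate(use_pid_tags=False), "misses")
    print("PID-tagged:     ", simulate(use_pid_tags=True), "misses")
```

With the default parameters the trace contains 2560 references. The flushing cache misses 640 times, because each of the 10 quanta starts cold, while the PID-tagged cache misses only 128 times, once per block. The gap depends entirely on the two working sets fitting in the cache at once; a real workload would also show interference between processes that this sketch deliberately avoids.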

Published In

ACM Transactions on Computer Systems, Volume 6, Issue 4
Nov. 1988
101 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/48012

Publisher

Association for Computing Machinery

New York, NY, United States
