More Web Proxy on the site http://driver.im/

tutorial

Dynamic Performance Profiling of Cloud Caches

Authors:

Trausti Saemundsson,

Hjortur Bjornsson,

Gregory Chockler,

Ymir VigfussonAuthors Info & Claims

SOCC '14: Proceedings of the ACM Symposium on Cloud Computing

Pages 1 - 14

https://doi.org/10.1145/2670979.2671007

Published: 03 November 2014 Publication History

Abstract

Large-scale in-memory object caches such as memcached are widely used to accelerate popular web sites and to reduce burden on backend databases. Yet current cache systems give cache operators limited information on what resources are required to optimally accommodate the present workload. This paper focuses on a key question for cache operators: how much total memory should be allocated to the in-memory cache tier to achieve desired performance?

We present our Mimir system: a lightweight online profiler that hooks into the replacement policy of each cache server and produces graphs of the overall cache hit rate as a function of memory size. The profiler enables cache operators to dynamically project the cost and performance impact from adding or removing memory resources within a distributed in-memory cache, allowing "what-if" questions about cache performance to be answered without laborious offline tuning. Internally, Mimir uses a novel lock-free algorithm and lookup filters for quickly and dynamically estimating hit rate of LRU caches.

Running Mimir as a profiler requires minimal changes to the cache server, thanks to a lean API. Our experiments show that Mimir produces dynamic hit rate curves with over 98% accuracy and 2--5% overhead on request latency and throughput when Mimir is run in tandem with memcached, suggesting online cache profiling can be a practical tool for improving provisioning of large caches.

References

[1]

Redis key-value store. http://redis.io.

[2]

G. Almási, C. Caşcaval, and D. A. Padua. Calculating stack distances efficiently. SIGPLAN Notices, 38(2 supplement):37--43, June 2002.

Digital Library

[3]

B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In SIGMETRICS '12, pages 53--64, 2012.

Digital Library

[4]

R. Azimi, L. Soares, M. Stumm, T. Walsh, and A. D. Brown. PATH: page access tracking to improve memory management. In ISMM '07, pages 31--42, 2007.

Digital Library

[5]

S. Bansal and D. S. Modha. CAR: Clock with Adaptive Replacement. pages 187--200, 2004.

Digital Library

[6]

L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78--101, 1966.

Digital Library

[7]

B. T. Bennett and V. J. Kruskal. LRU stack processing. IBM Journal of Research and Development, 19(4):353--357, July 1975.

Digital Library

[8]

E. Berg and E. Hagersten. StatCache: A probabilistic approach to efficient and accurate data locality analysis. In Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS '04, pages 20--27, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7803-8385-0.

Digital Library

[9]

N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. SIGPLAN Not., 45 (5):257--268, Jan. 2010.

Digital Library

[10]

G. Chockler, G. Laden, and Y. Vigfusson. Data caching as a cloud service. In LADIS '10, pages 18--21, 2010.

Digital Library

[11]

G. Chockler, G. Laden, and Y. Vigfusson. Design and implementation of caching services in the cloud. IBM Journal of Research and Development, 55(6):9:1--9:11, 2011.

Digital Library

[12]

B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In SoCC '10, pages 143--154, 2010.

Digital Library

[13]

F. J. Corbato. Festschrift: In Honor of P. M. Morse, chapter A Paging Experiment with the Multics System, pages 217--228. MIT Press, 1969.

[14]

P. J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, 6(1):64--84, Jan. 1980.

Digital Library

[15]

C. Ding and Y. Zhong. Predicting whole-program locality through reuse distance analysis. In PLDI '03, pages 245--257, 2003.

Digital Library

[16]

A. Dragojević, D. Narayanan, O. Hodson, and M. Castro. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 401--414, Berkeley, CA, USA, 2014. USENIX Association. ISBN 978-1-931971-09-6.

Digital Library

[17]

M. R. Ebling, L. B. Mummert, and D. C. Steere. Overcoming the network bottleneck in mobile computing. In WMCSA '94, pages 34--36, 1994.

Digital Library

[18]

B. Fan, H. Lim, D. G. Andersen, and M. Kaminsky. Small cache, big effect: Provable load balancing for randomly partitioned cluster services. In Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC '11, pages 23:1--23:12, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0976-9.

Digital Library

[19]

B. Fan, D. G. Andersen, and M. Kaminsky. MemC3: Compact and concurrent MemCache with dumber caching and smarter hashing. In NSDI '13, pages 385--398, 2013.

Digital Library

[20]

L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw., 8(3):281--293, June 2000. ISSN 1063-6692.

Digital Library

[21]

B. Fitzpatrick. Distributed caching with memcached. Linux journal, (124):72--74, 2004.

Digital Library

[22]

S. Hart, E. Frachtenberg, and M. Berezecki. Predicting Memcached throughput using simulation and modeling. In Proceedings of the 2012 Symposium on Theory of Modeling and Simulation - DEVS Integrative M&S Symposium, TMS/DEVS '12, pages 40:1--40:8, San Diego, CA, USA, 2012. Society for Computer Simulation International. ISBN 978-1-61839-786-7.

Digital Library

[23]

M. Hines, A. Gordon, M. Silva, D. Da Silva, K. D. Ryu, and M. Ben-Yehuda. Applications know best: Performance-driven memory overcommit with Ginkgo. In CloudCom'11, pages 130--137, 2011.

Digital Library

[24]

Y.-J. Hong and M. Thottethodi. Understanding and mitigating the impact of load imbalance in the memory caching tier. In Proceedings of the 4th ACM Symposium on Cloud Computing, SOCC '13, pages 13:1--13:17, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2428-1.

Digital Library

[25]

Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li. An analysis of Facebook photo caching. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 167--181, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2388-8.

Digital Library

[26]

J. Hwang and T. Wood. Adaptive performance-aware distributed memory caching. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13), pages 33--43, San Jose, CA, 2013. USENIX. ISBN 978-1-931971-02-7.

[27]

S. Jiang and X. Zhang. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. ACM SIGMETRICS Performance Evaluation Review, 30(1):31--42, June 2002.

Digital Library

[28]

S. Jiang, F. Chen, and X. Zhang. CLOCK-Pro: an effective improvement of the CLOCK replacement. In ATEC'05, pages 35--35, 2005.

Digital Library

[29]

S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Geiger: monitoring the buffer cache in a virtual machine environment. In ASPLOS XII, pages 14--24, 2006.

Digital Library

[30]

D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 654--663, New York, NY, USA, 1997. ACM. ISBN 0-89791-888-6.

Digital Library

[31]

J. M. Kim, J. Choi, J. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. A low-overhead high-performance unified buffer management scheme that exploits sequential and looping references. In OSDI '00, pages 9--9, 2000.

Digital Library

[32]

H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 429--444, Seattle, WA, Apr. 2014. USENIX Association. ISBN 978-1-931971-09-6.

Digital Library

[33]

Y. Lu, T. Abdelzaher, C. Lu, and G. Tao. An Adaptive Control Framework for QoS Guarantees and its Application to Differentiated Caching Services.

[34]

R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78--117, June 1970.

Digital Library

[35]

N. Megiddo and D. Modha. ARC: A self-tuning, low overhead replacement cache. In FAST '03, pages 115--130, 2003.

Digital Library

[36]

R. Nishtala, H. Fugal, S. Grimm, et al. Scaling Memcache at Facebook. In NSDI '13, pages 385--398, 2013.

Digital Library

[37]

R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. In SOSP '95, pages 79--95, 1995.

Digital Library

[38]

M. Rajashekhar and Y. Yue. Caching with twemcache. http://engineering.twitter.com/2012/07/caching-with-twemcache.html.

[39]

H. S. Stone, J. Turek, and J. L. Wolf. Optimal partitioning of cache memory. IEEE Transactions on Computers, 41(9): 1054--1068, 1992.

Digital Library

[40]

A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB '06, pages 1081--1092, 2006.

Digital Library

[41]

G. E. Suh, L. Rudolph, and S. Devadas. Dynamic Cache Partitioning for Simultaneous Multithreading Systems. In PDCS'01, pages 116--127, 2001.

[42]

D. K. Tam, R. Azimi, L. B. Soares, and M. Stumm. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. In ASPLOS XIV, pages 121--132, 2009.

Digital Library

[43]

V. Venkataramani, Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, J. Hoon, S. Kulkarni, N. Lawrence, M. Marchukov, D. Petrov, and L. Puzar. TAO: How Facebook serves the social graph. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 791--792, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1247-9.

Digital Library

[44]

C. A. Waldspurger. Memory resource management in VMware ESX server. In OSDI '02, 2002.

Digital Library

[45]

A. Wiggins and J. Langstone. Enhancing the scalability of memcached. http://software.intel.com/en-us/articles/enhancing-the-scalability-of-memcached-0.

[46]

T. Yang, E. D. Berger, S. F. Kaplan, and J. E. B. Moss. CRAMM: virtual memory support for garbage-collected applications. In OSDI '06, pages 103--116, 2006.

Digital Library

[47]

W. Zhao and Z. Wang. Dynamic memory balancing for virtual machines. In VEE '09, pages 21--30, 2009.

Digital Library

[48]

W. Zhao, X. Jin, Z. Wang, X. Wang, Y. Luo, and X. Li. Low cost working set size tracking. In ATC'11, pages 17--23, 2011.

Digital Library

[49]

P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic tracking of page miss ratio curve for memory management. In ASPLOS XI, pages 177--188, 2004.

Digital Library

[50]

T. Zhu, A. Gandhi, M. Harchol-Balter, and M. A. Kozuch. Saving cash by using less cache. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Ccomputing, Hot-Cloud'12, Berkeley, CA, USA, 2012. USENIX Association.

Digital Library

Cited By

Park HQiu ZGanger GAmvrosiadis GWitchel EArpaci-Dusseau ARossbach CKeeton K(2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
https://dl.acm.org/doi/10.1145/3694715.3695972
Sultan SShakiba KLee AChen PStumm M(2024)TTLs Matter: Efficient Cache Sizing with TTL-Aware Miss Ratio Curves and Working Set SizesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650066(387-404)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650066
Li YHu HLei CZhou XQian W(2024)Hill-Cache: Adaptive Integration of Recency and Frequency in Caching with Hill-Climbing2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00302(3947-3960)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00302
Show More Cited By

Index Terms

Dynamic Performance Profiling of Cloud Caches
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Recommendations

Modeling LRU cache with invalidation

Least Recently Used (LRU) is a very popular caching replacement policy. It is very easy to implement and offers good performance, especially when data requests are temporally correlated, as in the case of web traffic.When the data content can change ...
An Empirical Analysis on Memcached's Replacement Policies
MEMSYS '23: Proceedings of the International Symposium on Memory Systems

The performance of large-scale web services heavily relies on the hit ratio of the key-value caches. One core component of a high-performance key-value cache is the replacement policy. A right replacement policy can help the caching system achieve a ...
An Associativity Threshold Phenomenon in Set-Associative Caches
SPAA '23: Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures

In an α-way set-associative cache, the cache is partitioned into disjoint sets of size α, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences

SOCC '14: Proceedings of the ACM Symposium on Cloud Computing

November 2014

383 pages

ISBN:9781450332521

DOI:10.1145/2670979

Conference Chairs:
Ed Lazowska,
Doug Terry,
Program Chairs:
Remzi H. Arpaci-Dusseau,
Johannes Gehrke

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2014

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Tutorial
Research
Refereed limited

Conference

SOCC '14

Sponsor:

SOCC '14: ACM Symposium on Cloud Computing

November 3 - 5, 2014

WA, Seattle, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

35
Total Citations
View Citations
694
Total Downloads

Downloads (Last 12 months)25
Downloads (Last 6 weeks)2

Reflects downloads up to 30 Dec 2024

Other Metrics

View Author Metrics

Citations

Cited By

Park HQiu ZGanger GAmvrosiadis GWitchel EArpaci-Dusseau ARossbach CKeeton K(2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
https://dl.acm.org/doi/10.1145/3694715.3695972
Sultan SShakiba KLee AChen PStumm M(2024)TTLs Matter: Efficient Cache Sizing with TTL-Aware Miss Ratio Curves and Working Set SizesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650066(387-404)Online publication date: 22-Apr-2024
https://dl.acm.org/doi/10.1145/3627703.3650066
Li YHu HLei CZhou XQian W(2024)Hill-Cache: Adaptive Integration of Recency and Frequency in Caching with Hill-Climbing2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00302(3947-3960)Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00302
Marini LCarra D(2024)Can Miss Ratio Curves Predict Cache Performance?2024 IEEE International Conference on Cloud Engineering (IC2E)10.1109/IC2E61754.2024.00018(101-106)Online publication date: 24-Sep-2024
https://doi.org/10.1109/IC2E61754.2024.00018
Xiao JXiang YWang XLuo YPimentel AWang ZGallivan KNikolopoulos DBeivide RGallopoulos E(2023)FLORIA: A Fast and Featherlight Approach for Predicting Cache PerformanceProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593740(25-36)Online publication date: 21-Jun-2023
https://dl.acm.org/doi/10.1145/3577193.3593740
Kurniawan DWang RZulkifli KWiranata FBent JVigfusson YGunawi HAamodt TJerger NSwift M(2023)EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation SystemsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575718(281-294)Online publication date: 27-Jan-2023
https://dl.acm.org/doi/10.1145/3575693.3575718
Bender MDeLayo DKuszmaul BKuszmaul WWest EAgrawal KShun J(2023)Increment - and - Freeze: Every Cache, Everywhere, All of the TimeProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591085(129-139)Online publication date: 17-Jun-2023
https://dl.acm.org/doi/10.1145/3558481.3591085
Wang YYang JWang Z(2023)Multi-Tenant In-Memory Key-Value Cache Partitioning Using Efficient Random Sampling-Based LRU ModelIEEE Transactions on Cloud Computing10.1109/TCC.2023.330088911:4(3601-3618)Online publication date: Oct-2023
https://doi.org/10.1109/TCC.2023.3300889
Seyri APan AVamanan BLippautz MChisnall D(2022)MemSweeper: virtualizing cluster memory management for high memory utilization and isolationProceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management10.1145/3520263.3534651(15-28)Online publication date: 14-Jun-2022
https://dl.acm.org/doi/10.1145/3520263.3534651
Lim SPark D(2022)Efficient Stack Distance Approximation Based on Workload CharacteristicsIEEE Access10.1109/ACCESS.2022.318032710(59792-59805)Online publication date: 2022
https://doi.org/10.1109/ACCESS.2022.3180327
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents