[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2670979.2671007acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
tutorial

Dynamic Performance Profiling of Cloud Caches

Published: 03 November 2014 Publication History

Abstract

Large-scale in-memory object caches such as memcached are widely used to accelerate popular web sites and to reduce burden on backend databases. Yet current cache systems give cache operators limited information on what resources are required to optimally accommodate the present workload. This paper focuses on a key question for cache operators: how much total memory should be allocated to the in-memory cache tier to achieve desired performance?
We present our Mimir system: a lightweight online profiler that hooks into the replacement policy of each cache server and produces graphs of the overall cache hit rate as a function of memory size. The profiler enables cache operators to dynamically project the cost and performance impact from adding or removing memory resources within a distributed in-memory cache, allowing "what-if" questions about cache performance to be answered without laborious offline tuning. Internally, Mimir uses a novel lock-free algorithm and lookup filters for quickly and dynamically estimating hit rate of LRU caches.
Running Mimir as a profiler requires minimal changes to the cache server, thanks to a lean API. Our experiments show that Mimir produces dynamic hit rate curves with over 98% accuracy and 2--5% overhead on request latency and throughput when Mimir is run in tandem with memcached, suggesting online cache profiling can be a practical tool for improving provisioning of large caches.

References

[1]
Redis key-value store. http://redis.io.
[2]
G. Almási, C. Caşcaval, and D. A. Padua. Calculating stack distances efficiently. SIGPLAN Notices, 38(2 supplement):37--43, June 2002.
[3]
B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In SIGMETRICS '12, pages 53--64, 2012.
[4]
R. Azimi, L. Soares, M. Stumm, T. Walsh, and A. D. Brown. PATH: page access tracking to improve memory management. In ISMM '07, pages 31--42, 2007.
[5]
S. Bansal and D. S. Modha. CAR: Clock with Adaptive Replacement. pages 187--200, 2004.
[6]
L. A. Belady. A study of replacement algorithms for a virtual-storage computer. IBM Systems Journal, 5(2):78--101, 1966.
[7]
B. T. Bennett and V. J. Kruskal. LRU stack processing. IBM Journal of Research and Development, 19(4):353--357, July 1975.
[8]
E. Berg and E. Hagersten. StatCache: A probabilistic approach to efficient and accurate data locality analysis. In Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS '04, pages 20--27, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7803-8385-0.
[9]
N. G. Bronson, J. Casper, H. Chafi, and K. Olukotun. A practical concurrent binary search tree. SIGPLAN Not., 45 (5):257--268, Jan. 2010.
[10]
G. Chockler, G. Laden, and Y. Vigfusson. Data caching as a cloud service. In LADIS '10, pages 18--21, 2010.
[11]
G. Chockler, G. Laden, and Y. Vigfusson. Design and implementation of caching services in the cloud. IBM Journal of Research and Development, 55(6):9:1--9:11, 2011.
[12]
B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In SoCC '10, pages 143--154, 2010.
[13]
F. J. Corbato. Festschrift: In Honor of P. M. Morse, chapter A Paging Experiment with the Multics System, pages 217--228. MIT Press, 1969.
[14]
P. J. Denning. Working sets past and present. IEEE Transactions on Software Engineering, 6(1):64--84, Jan. 1980.
[15]
C. Ding and Y. Zhong. Predicting whole-program locality through reuse distance analysis. In PLDI '03, pages 245--257, 2003.
[16]
A. Dragojević, D. Narayanan, O. Hodson, and M. Castro. FaRM: Fast remote memory. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 401--414, Berkeley, CA, USA, 2014. USENIX Association. ISBN 978-1-931971-09-6.
[17]
M. R. Ebling, L. B. Mummert, and D. C. Steere. Overcoming the network bottleneck in mobile computing. In WMCSA '94, pages 34--36, 1994.
[18]
B. Fan, H. Lim, D. G. Andersen, and M. Kaminsky. Small cache, big effect: Provable load balancing for randomly partitioned cluster services. In Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC '11, pages 23:1--23:12, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0976-9.
[19]
B. Fan, D. G. Andersen, and M. Kaminsky. MemC3: Compact and concurrent MemCache with dumber caching and smarter hashing. In NSDI '13, pages 385--398, 2013.
[20]
L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: A scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw., 8(3):281--293, June 2000. ISSN 1063-6692.
[21]
B. Fitzpatrick. Distributed caching with memcached. Linux journal, (124):72--74, 2004.
[22]
S. Hart, E. Frachtenberg, and M. Berezecki. Predicting Memcached throughput using simulation and modeling. In Proceedings of the 2012 Symposium on Theory of Modeling and Simulation - DEVS Integrative M&S Symposium, TMS/DEVS '12, pages 40:1--40:8, San Diego, CA, USA, 2012. Society for Computer Simulation International. ISBN 978-1-61839-786-7.
[23]
M. Hines, A. Gordon, M. Silva, D. Da Silva, K. D. Ryu, and M. Ben-Yehuda. Applications know best: Performance-driven memory overcommit with Ginkgo. In CloudCom'11, pages 130--137, 2011.
[24]
Y.-J. Hong and M. Thottethodi. Understanding and mitigating the impact of load imbalance in the memory caching tier. In Proceedings of the 4th ACM Symposium on Cloud Computing, SOCC '13, pages 13:1--13:17, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2428-1.
[25]
Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li. An analysis of Facebook photo caching. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 167--181, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2388-8.
[26]
J. Hwang and T. Wood. Adaptive performance-aware distributed memory caching. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13), pages 33--43, San Jose, CA, 2013. USENIX. ISBN 978-1-931971-02-7.
[27]
S. Jiang and X. Zhang. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. ACM SIGMETRICS Performance Evaluation Review, 30(1):31--42, June 2002.
[28]
S. Jiang, F. Chen, and X. Zhang. CLOCK-Pro: an effective improvement of the CLOCK replacement. In ATEC'05, pages 35--35, 2005.
[29]
S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Geiger: monitoring the buffer cache in a virtual machine environment. In ASPLOS XII, pages 14--24, 2006.
[30]
D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC '97, pages 654--663, New York, NY, USA, 1997. ACM. ISBN 0-89791-888-6.
[31]
J. M. Kim, J. Choi, J. Kim, S. H. Noh, S. L. Min, Y. Cho, and C. S. Kim. A low-overhead high-performance unified buffer management scheme that exploits sequential and looping references. In OSDI '00, pages 9--9, 2000.
[32]
H. Lim, D. Han, D. G. Andersen, and M. Kaminsky. MICA: A holistic approach to fast in-memory key-value storage. In Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, NSDI'14, pages 429--444, Seattle, WA, Apr. 2014. USENIX Association. ISBN 978-1-931971-09-6.
[33]
Y. Lu, T. Abdelzaher, C. Lu, and G. Tao. An Adaptive Control Framework for QoS Guarantees and its Application to Differentiated Caching Services.
[34]
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78--117, June 1970.
[35]
N. Megiddo and D. Modha. ARC: A self-tuning, low overhead replacement cache. In FAST '03, pages 115--130, 2003.
[36]
R. Nishtala, H. Fugal, S. Grimm, et al. Scaling Memcache at Facebook. In NSDI '13, pages 385--398, 2013.
[37]
R. H. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. In SOSP '95, pages 79--95, 1995.
[38]
M. Rajashekhar and Y. Yue. Caching with twemcache. http://engineering.twitter.com/2012/07/caching-with-twemcache.html.
[39]
H. S. Stone, J. Turek, and J. L. Wolf. Optimal partitioning of cache memory. IEEE Transactions on Computers, 41(9): 1054--1068, 1992.
[40]
A. J. Storm, C. Garcia-Arellano, S. S. Lightstone, Y. Diao, and M. Surendra. Adaptive self-tuning memory in DB2. In VLDB '06, pages 1081--1092, 2006.
[41]
G. E. Suh, L. Rudolph, and S. Devadas. Dynamic Cache Partitioning for Simultaneous Multithreading Systems. In PDCS'01, pages 116--127, 2001.
[42]
D. K. Tam, R. Azimi, L. B. Soares, and M. Stumm. RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. In ASPLOS XIV, pages 121--132, 2009.
[43]
V. Venkataramani, Z. Amsden, N. Bronson, G. Cabrera III, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, J. Hoon, S. Kulkarni, N. Lawrence, M. Marchukov, D. Petrov, and L. Puzar. TAO: How Facebook serves the social graph. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD '12, pages 791--792, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1247-9.
[44]
C. A. Waldspurger. Memory resource management in VMware ESX server. In OSDI '02, 2002.
[45]
A. Wiggins and J. Langstone. Enhancing the scalability of memcached. http://software.intel.com/en-us/articles/enhancing-the-scalability-of-memcached-0.
[46]
T. Yang, E. D. Berger, S. F. Kaplan, and J. E. B. Moss. CRAMM: virtual memory support for garbage-collected applications. In OSDI '06, pages 103--116, 2006.
[47]
W. Zhao and Z. Wang. Dynamic memory balancing for virtual machines. In VEE '09, pages 21--30, 2009.
[48]
W. Zhao, X. Jin, Z. Wang, X. Wang, Y. Luo, and X. Li. Low cost working set size tracking. In ATC'11, pages 17--23, 2011.
[49]
P. Zhou, V. Pandey, J. Sundaresan, A. Raghuraman, Y. Zhou, and S. Kumar. Dynamic tracking of page miss ratio curve for memory management. In ASPLOS XI, pages 177--188, 2004.
[50]
T. Zhu, A. Gandhi, M. Harchol-Balter, and M. A. Kozuch. Saving cash by using less cache. In Proceedings of the 4th USENIX Conference on Hot Topics in Cloud Ccomputing, Hot-Cloud'12, Berkeley, CA, USA, 2012. USENIX Association.

Cited By

View all
  • (2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
  • (2024)TTLs Matter: Efficient Cache Sizing with TTL-Aware Miss Ratio Curves and Working Set SizesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650066(387-404)Online publication date: 22-Apr-2024
  • (2024)Hill-Cache: Adaptive Integration of Recency and Frequency in Caching with Hill-Climbing2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00302(3947-3960)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SOCC '14: Proceedings of the ACM Symposium on Cloud Computing
November 2014
383 pages
ISBN:9781450332521
DOI:10.1145/2670979
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 November 2014

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. LRU
  2. caching
  3. hit-rate curves
  4. memcached
  5. miss-rate curves
  6. profiling

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

SOCC '14
Sponsor:
SOCC '14: ACM Symposium on Cloud Computing
November 3 - 5, 2014
WA, Seattle, USA

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)25
  • Downloads (Last 6 weeks)2
Reflects downloads up to 30 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON CacheProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles10.1145/3694715.3695972(347-368)Online publication date: 4-Nov-2024
  • (2024)TTLs Matter: Efficient Cache Sizing with TTL-Aware Miss Ratio Curves and Working Set SizesProceedings of the Nineteenth European Conference on Computer Systems10.1145/3627703.3650066(387-404)Online publication date: 22-Apr-2024
  • (2024)Hill-Cache: Adaptive Integration of Recency and Frequency in Caching with Hill-Climbing2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00302(3947-3960)Online publication date: 13-May-2024
  • (2024)Can Miss Ratio Curves Predict Cache Performance?2024 IEEE International Conference on Cloud Engineering (IC2E)10.1109/IC2E61754.2024.00018(101-106)Online publication date: 24-Sep-2024
  • (2023)FLORIA: A Fast and Featherlight Approach for Predicting Cache PerformanceProceedings of the 37th International Conference on Supercomputing10.1145/3577193.3593740(25-36)Online publication date: 21-Jun-2023
  • (2023)EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation SystemsProceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 210.1145/3575693.3575718(281-294)Online publication date: 27-Jan-2023
  • (2023)Increment - and - Freeze: Every Cache, Everywhere, All of the TimeProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591085(129-139)Online publication date: 17-Jun-2023
  • (2023)Multi-Tenant In-Memory Key-Value Cache Partitioning Using Efficient Random Sampling-Based LRU ModelIEEE Transactions on Cloud Computing10.1109/TCC.2023.330088911:4(3601-3618)Online publication date: Oct-2023
  • (2022)MemSweeper: virtualizing cluster memory management for high memory utilization and isolationProceedings of the 2022 ACM SIGPLAN International Symposium on Memory Management10.1145/3520263.3534651(15-28)Online publication date: 14-Jun-2022
  • (2022)Efficient Stack Distance Approximation Based on Workload CharacteristicsIEEE Access10.1109/ACCESS.2022.318032710(59792-59805)Online publication date: 2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media