The techniques in Table 4 show a range of applications and implementations of cache re-partitioning in recent literature. We compare each against the others and against our technique, CASHT, along the following features: the partition allocation algorithm; whether partitions are split between cores (C) or threads (T); which cache dimension (set or way) is partitioned; how partitions are enforced; whether enforcement is hardware (hw)- or software (sw)-based; how cache behavior is profiled; when re-partitioning occurs; and the overhead.

UCP tracks per-workload cache hit curves to compute and rebalance cache allocations every 5M cycles [32]. UCP introduced the lookahead algorithm to the cache-partitioning problem, and many later works adopt that algorithm as a foundation [35, 45, 48], but UCP carries a large overhead and does not scale well as core counts grow.
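For concreteness, a minimal sketch of lookahead-style allocation follows, assuming per-core hit curves of the kind UCP's utility monitors (UMONs) provide; the data layout and names are our illustrative choices, not UCP's implementation.

```python
# Minimal sketch of UCP's lookahead allocation [32]; names are illustrative.
# hits[c][w] is the profiled hit count for core c when granted w ways
# (a UMON-style hit curve of length total_ways + 1).
def lookahead(hits, total_ways):
    n = len(hits)
    alloc = [1] * n                      # every core keeps at least one way
    remaining = total_ways - n
    while remaining > 0:
        best_core, best_gain, best_step = None, -1.0, 0
        for c in range(n):
            # Marginal utility: best average hit gain per extra way,
            # looking ahead past local plateaus in the hit curve.
            for step in range(1, remaining + 1):
                if alloc[c] + step >= len(hits[c]):
                    break
                gain = (hits[c][alloc[c] + step] - hits[c][alloc[c]]) / step
                if gain > best_gain:
                    best_core, best_gain, best_step = c, gain, step
        if best_core is None:            # no core can accept more ways
            break
        alloc[best_core] += best_step
        remaining -= best_step
    return alloc
```

Granting whole windows of ways at the best average gain is what distinguishes lookahead from simple greedy allocation: it can step past a flat region of a hit curve to reach a later jump in utility.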
COLORIS leverages page coloring via a custom page allocator to partition the cache along sets [49], but requires modifications to the operating system.
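The set-partitioning idea rests on the overlap between physical page-number bits and cache set-index bits; below is a minimal sketch, assuming a 4 KB page and an LLC geometry chosen purely for illustration.

```python
# Sketch of set partitioning via page coloring, as in COLORIS [49];
# the cache geometry below is an illustrative assumption.
PAGE_SIZE = 4096                               # bytes
LINE_SIZE = 64                                 # bytes
NUM_SETS = 8192                                # e.g., a 4 MB, 8-way LLC
SETS_PER_PAGE = PAGE_SIZE // LINE_SIZE         # contiguous sets one page maps to
NUM_COLORS = NUM_SETS // SETS_PER_PAGE         # distinct page colors (128 here)

def page_color(phys_addr: int) -> int:
    """Color = the physical page-number bits that also index the cache set."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def allowed(core: int, phys_addr: int, colors_per_core: int) -> bool:
    # A coloring allocator hands each core a disjoint color range, so cores
    # can never map to each other's cache sets.
    base = core * colors_per_core
    return base <= page_color(phys_addr) < base + colors_per_core
```

Because color assignment happens at page-allocation time, enforcement needs no cache hardware support, which is why the OS changes are unavoidable.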
Kpart exploits the adoption of way partitioning in commercial silicon to partition the last-level cache among clusters of workloads, using IPC (plus miss rates and bandwidth) to cluster the workloads (sketched below) before passing the result to the lookahead algorithm [11]. Kpart clusters applications to avoid the diminishing returns of partitioning a few ways among many cores, which is not the goal of CASHT. Further, Kpart without clustering amounts to a software adaptation of UCP, since the lookahead algorithm determines partition sizes at each re-partition step, so we believe comparing against UCP is sufficient.
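The following rough sketch shows how such a clustering step composes with lookahead; the greedy grouping, normalized feature vector, and distance threshold are our stand-ins for Kpart's actual clustering procedure.

```python
# Illustrative sketch of a KPart-style flow [11]: merge workloads with
# similar profiles so lookahead() (from the earlier sketch) splits ways
# among a few clusters instead of many cores. Not KPart's exact method.
def cluster_workloads(profiles, threshold=0.15):
    """profiles: one (ipc, miss_rate, bandwidth) tuple per workload,
    with features assumed normalized to [0, 1]."""
    clusters = []                          # each cluster: list of workload ids
    for wid, prof in enumerate(profiles):
        for cl in clusters:
            rep = profiles[cl[0]]          # compare against a representative
            dist = sum(abs(a - b) for a, b in zip(prof, rep))
            if dist < threshold:           # similar enough to share a partition
                cl.append(wid)
                break
        else:
            clusters.append([wid])
    return clusters                        # lookahead() then sizes each cluster
```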
Cooperative partitioning addresses orphaned lines and smooths the transitions after a re-partition step, modifying lookahead to allow blocks to be disabled altogether [42].
The reuse-locality-aware cache algorithm (ROCA) partitions the cache between reused and not-reused blocks, leveraging a reuse predictor to determine partition placement at insertion [37]. This differs from the approach partitioning algorithms generally take, but the problem reduces to identifying blocks by prediction rather than by CPU ID, so most way-based schemes can adapt to it.
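A minimal sketch of such prediction-driven placement follows, assuming a PC-indexed table of 2-bit saturating counters as the reuse predictor; this predictor choice is ours for illustration, not necessarily ROCA's.

```python
# Sketch of ROCA-style insertion [37]: a reuse predictor, not the
# requesting CPU, decides which partition a filled block lands in.
REUSE_TABLE = [1] * 1024                   # 2-bit saturating counters, PC-indexed

def predict_reused(pc: int) -> bool:
    return REUSE_TABLE[pc % 1024] >= 2

def on_fill(pc, block, reused_part, other_part):
    # Place the block in the protected (reused) partition only if the
    # predictor expects it to be referenced again.
    target = reused_part if predict_reused(pc) else other_part
    target.append(block)

def on_evict(pc: int, was_reused: bool):
    # Train the predictor when a block leaves the cache.
    i = pc % 1024
    REUSE_TABLE[i] = min(3, REUSE_TABLE[i] + 1) if was_reused \
        else max(0, REUSE_TABLE[i] - 1)
```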
The Gradient-based Cache-partitioning Algorithm (GPA) enables continuous partition adjustment at cache-block granularity by tuning the rate at which blocks are placed in the cache in a protected (or vulnerable) state [13]. However, the gradient sets themselves can harm a portion of the cache, because they are deliberately driven with both beneficial and detrimental behavior; CASHT avoids this with PSA (Section 3).
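A sketch of the gradient mechanism under our illustrative assumptions follows: one sampled set in 64 per gradient direction, and hill-climbing on a single global protect rate. The constants and sampling scheme are ours, not GPA's exact parameters.

```python
# Sketch of gradient-based adjustment in the spirit of GPA [13].
import random

p, delta, step = 0.50, 0.05, 0.01          # global protect rate and nudges
hits_up = hits_down = 0

def insert_state(set_id: int) -> str:
    if set_id % 64 == 0:
        rate = p + delta                   # "up" gradient sets over-protect
    elif set_id % 64 == 1:
        rate = p - delta                   # "down" gradient sets under-protect
    else:
        rate = p                           # follower sets use the global rate
    return "protected" if random.random() < rate else "vulnerable"

def record_hit(set_id: int) -> None:
    global hits_up, hits_down
    if set_id % 64 == 0:
        hits_up += 1
    elif set_id % 64 == 1:
        hits_down += 1

def end_of_epoch() -> None:
    global p, hits_up, hits_down
    # Hill-climb: move the global rate toward the better-performing sample.
    p = min(1.0, max(0.0, p + (step if hits_up > hits_down else -step)))
    hits_up = hits_down = 0
```

The sketch also makes the drawback visible: one gradient direction is deliberately mis-tuned at all times, so the sets sampled for it pay a steady penalty.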
Machine learned cache (MLC) partitions both the L2 and L3 caches via a trained reinforcement-learning model, Q-learning, enabling smart and collaborative partitioning decisions among per-thread Markov agents [18]. Though MLC and CASHT both take advantage of learning algorithms, MLC partitions both L2 and L3 to achieve its performance gains on a system that assumes Simultaneous Multi-Threading, which CASHT does not.
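For readers unfamiliar with Q-learning, a minimal tabular sketch follows; the state, action set, and reward are our illustrative assumptions rather than MLC's formulation (e.g., an agent might observe its allocation and miss rate, act by growing or shrinking its way count, and be rewarded by its IPC change).

```python
# Minimal tabular Q-learning sketch in the spirit of MLC [18];
# the formulation below is illustrative, not MLC's exact design.
import random
from collections import defaultdict

ACTIONS = (-1, 0, +1)                      # shrink, hold, or grow by one way
Q = defaultdict(float)                     # Q[(state, action)] value table
alpha, gamma, eps = 0.1, 0.9, 0.1          # learning rate, discount, exploration

def choose(state):
    if random.random() < eps:              # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    # Standard Q-learning update toward reward plus discounted best next value.
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```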