CIDR: A cost-effective in-line data reduction system for terabit-per-second scale SSD arrays
2019 IEEE International Symposium on High Performance Computer …, 2019•ieeexplore.ieee.org
An SSD array, a storage system consisting of multiple SSDs per node, has become a design choice for implementing a fast primary storage system, and modern storage architects now aim to achieve terabit-per-second scale performance with the next-generation SSD array. To reduce storage cost and improve device endurance, such an SSD array must employ data reduction schemes (i.e., deduplication, compression) that provide high data reduction capability at minimum cost. However, existing data reduction schemes do not scale with the fast-increasing performance of an SSD array, because they consume a prohibitive amount of CPU resources (e.g., software-based schemes), deliver a low data reduction ratio (e.g., SSD-device-wide deduplication), or cannot cost-effectively adapt to workload changes in datacenters (e.g., ASIC-based acceleration). In this paper, we propose CIDR, a novel FPGA-based, cost-effective data reduction system that enables an SSD array to achieve terabit-per-second scale storage performance. Our key ideas are as follows. First, we decouple data reduction related computing tasks from the unscalable host CPUs by offloading them to a scalable array of FPGA boards. Second, we employ a centralized, node-wide metadata management scheme to achieve high data reduction across the whole SSD array. Third, our FPGA-based reconfiguration adapts to different workload patterns by dynamically balancing the amount of software and hardware tasks running on CPUs and FPGAs, respectively. For evaluation, we built an example CIDR prototype that achieves up to 12.8 GB/s (0.1 Tbps) on one FPGA. CIDR outperforms the baseline by up to 2.47x for a write-only workload and by an expected 3.2x for a mixed read-write workload. We show CIDR's scalability toward Tbps-scale performance by measuring a two-FPGA CIDR and projecting the performance impact of adding more FPGAs.
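To make the two data reduction schemes named in the abstract concrete, the following is a minimal software sketch of chunk-level deduplication followed by compression, using a single fingerprint table shared by all writes as a stand-in for node-wide metadata. It is not CIDR's implementation: the class name ReductionStore, the 4 KiB chunk size, and the choice of SHA-256 and zlib are all assumptions made for illustration, and nothing here models the FPGA offloading.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed chunk size in bytes; not taken from the paper


class ReductionStore:
    """Toy store (hypothetical): one fingerprint table shared by every write."""

    def __init__(self):
        self.chunks = {}         # fingerprint -> compressed unique chunk
        self.logical_bytes = 0   # bytes the client asked to write
        self.physical_bytes = 0  # bytes actually kept after reduction

    def write(self, data: bytes) -> list:
        """Split data into fixed-size chunks and return their fingerprints."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.logical_bytes += len(chunk)
            if fingerprint not in self.chunks:
                # Deduplication miss: compress and store the unique chunk once.
                compressed = zlib.compress(chunk)
                self.chunks[fingerprint] = compressed
                self.physical_bytes += len(compressed)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe: list) -> bytes:
        """Rebuild the original data from its list of fingerprints."""
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in recipe)

    def reduction_ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        if self.physical_bytes == 0:
            return 0.0
        return self.logical_bytes / self.physical_bytes


store = ReductionStore()
payload = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
recipe = store.write(payload)
restored = store.read(recipe)
```

In this sketch, fingerprinting and compression both run on the host CPU for every chunk, which is the kind of per-byte work the abstract describes moving off the CPUs. Keeping one table for all writes, rather than one per device, lets duplicates be detected regardless of where a chunk would be placed.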