5.1 Numerical Simulation
Comparison approaches:. We compare
ClusterRT to two state-of-the-art approaches: SRS [
32] and ERS [
39], both of which aim to accelerate the redundancy transitioning for RS codes. We elaborate on the main ideas of SRS and ERS as below:
–
SRS [
32]: It establishes the distributions of stripes based on the pre-fixed coding parameters
\((k,m, k^{\prime })\) and eliminates the data relocation for the first transitioning operation.
–
ERS [
39]: It also eliminates the data relocation for the first transitioning operation and reduces parity update traffic via enlarging the encoding matrix in advance.
Experimental setup:. We first carry out numerical simulation by disabling the functionalities of network transmissions and storage operations. We mainly consider the general erasure coding configurations. We deploy a \((k,m)\) code onto a storage system and increase the value of \(k\), while keeping the value of \(m\) constant. We also adopt the following default configurations unless otherwise specified. We set the number of stripes to 100,000 and distribute them across 40 clusters. We set the chunk size to 64 MB. We then measure the average transitioning traffic induced to stretch a stripe in a transitioning operation.
Experiment A.1 (reduction on the cross-cluster transitioning traffic):. We first evaluate the cross-cluster transitioning traffic generated by each approach to achieve successive transitioning. We performed 15 transitioning operations by increasing the value of
\(k\) from 6 to 96 and tested the cross-cluster transitioning traffic under three approaches. It can be seen from Figure
12 that
ClusterRT greatly reduces the cross-cluster transitioning traffic compared with SRS and ERS.
As shown in Figure
12,
ClusterRT maintains a small cross-cluster transitioning traffic during the successive transitioning operations, while SRS and ERS only achieve a small cross-cluster transitioning traffic during the first transitioning operation. This is mainly because both SRS and ERS maintain an enlarged stripe layout before the first transitioning, thus eliminating the data chunk relocation traffic during the first transitioning. However, in the successive transitioning operations, SRS and ERS require chunk relocation to adjust the stripe layout. Therefore, in the successive transitioning operation, SRS and ERS will incur large traffic of cross-cluster transmission when adjusting the layout, thus introducing additional traffic. Furthermore, compared with SRS, ERS reduces the traffic of parity chunk updates through its coding algorithm.
As a comparison, ClusterRT utilizes the maximum flow algorithm to facilitate data chunk relocation and minimize cross-cluster relocation traffic. Additionally, to minimize the cross-cluster update traffic, ClusterRT places the parity chunks of decomposed stripes and stretched stripes within the same cluster in advance, which reduces the number of data chunks to be read. Consequently, compared to ERS and SRS, ClusterRT can reduce 94.0% and 96.2% of the transitioning traffic on average, respectively.
Experiment A.2 (breakdown analysis of cross-cluster transitioning traffic):. We then evaluate the breakdown analysis of the cross-cluster transitioning traffic to learn the impact of different operations performed in redundancy transitioning. According to Figure
13, we make two observations. First,
ClusterRT successfully eliminates the cross-cluster relocation traffic across all redundancy transitioning operations by establishing maximum flows to allocate data chunks. Second, compared to ERS and SRS,
ClusterRT always consumes the lower-parity update traffic, up to 75.0% and 96.1%, respectively, indicating that the parity-coordinated update approach can effectively achieve relatively low-parity update traffic. In summary, the two techniques in
ClusterRT are both effective and complementary without comprising the effectiveness of each other.
Experiment A.3 (impact of number of clusters):. We increase the number of clusters from 10 to 50 and measure the cross-cluster transitioning traffic under different erasure coding paradigms.
Figure
14 shows that the cross-cluster transitioning traffic of
ClusterRT maintains 0.1 while the number of clusters keeps changing. The reason is that
ClusterRT can eliminate cross-cluster data chunk relocation using the maximum flow algorithm in the vast majority of cases, and the parity-coordinated update approach only requires a few data chunks to be transmitted across clusters. Thus,
ClusterRT can keep a relatively low cross-cluster transitioning traffic during successive redundancy transitioning. On the contrary, the traffic of ERS and SRS stays high because the cost of layout adjustment increases with the number of clusters. To conclude,
ClusterRT achieves at most 97.6% and 97.5% lower cross-cluster traffic compared with ERS and SRS, respectively.
Experiment A.4 (load balance):. We evaluate the load balancing ratio of the three approaches. Figure
15 shows that
ClusterRT effectively balances the transitioning traffic loaded across clusters, since it adopts a download bandwidth balance layout that each cluster downloads the same amount of data chunks. Statistically, the average load balancing ratio for both ERS and SRS is 2.4 across all the successive transitioning operations, while
ClusterRT achieves a load balancing ratio of 1, which corresponds to the optimal.
Experiment A.5 (update performance):. We study the load balancing ratio and cross-cluster update traffic for two data placement approaches:
ClusterRT’s placement (see Section
3.3) and the random placement. This experiment aims to validate that
ClusterRT’s strategy of placing all parity chunks of a stripe within the same cluster does not downgrade performance under normal mode. We use YCSB [
9] to generate workloads, which consist of 1 million transactions with a read/update ratio of 50:50. These workloads include both Zipfian (
\(\alpha =0.99\)) and Uniform distributions to encompass a diverse range of access patterns.
Figures
16(a) and
16(b) show that both
ClusterRT and the random placement exhibit similar load balancing ratios across all erasure coding parameters, regardless of whether the workload follows a Zipfian or Uniform distribution. Consequently, it indicates that
ClusterRT’s parity placement does not lead to the concentration of update traffic in any specific cluster. Moreover, by placing all parity chunks of a stripe within a cluster,
ClusterRT can effectively reduce cross-cluster update traffic. This is because
ClusterRT only needs to transmit the data delta chunks across clusters to the parity cluster, such that the parity chunks can be updated internally. Figures
16(c) and
16(d) show that compared to the random placement,
ClusterRT achieves an average reduction of 64.2% in cross-cluster traffic for update operations.
Experiment A.6 (performance for LRC codes):. We measure the cross-cluster transitioning traffic generated by ClusterRT and the random placement method for LRCs, where the random placement distributes the chunks of every stripe at random under the requirement of cluster-level fault tolerance. We perform 12 transitioning operations by increasing the value of \(k\) from 4 to 60 while keeping the values of \(l\) and \(g\) both untouched. We measure the resulting cross-cluster transitioning traffic.
Figure
17 shows that
ClusterRT greatly reduces the cross-cluster transitioning traffic compared with the random placement.
ClusterRT introduces a small amount of cross-cluster transitioning traffic during successive transitioning operations. This is because
ClusterRT employs the maximum flow to minimize the cross-cluster relocation traffic. Furthermore, by placing both global and local parity chunks of the same transitioning group within the same cluster,
ClusterRT significantly reduces the cross-cluster traffic when updating parity chunks. Overall, compared to the random placement,
ClusterRT can reduce 77.2% of cross-cluster traffic on average during transitioning.
Experiment A.7 (comparison with convertible codes):. We finally measure the cross-cluster transitioning traffic introduced by
ClusterRT and convertible codes [
22,
23]. We realize stripe merge to meet the conversion requirements of convertible codes by setting the parameter
\(k^{\prime }\) in the
\((k, m, k^{\prime })\)-transitioning to be a multiple of
\(k\). Figure
18 shows the cross-cluster transitioning traffic of
ClusterRT and convertible codes under different
\(k\) and
\(k^{\prime }\). We can observe that compared to convertible codes,
ClusterRT gains lower cross-cluster transitioning traffic, mainly because it uses the maximum flow algorithm to reduce cross-cluster relocation traffic. Besides,
ClusterRT pre-locates the parity chunks of decomposed stripes and stretched stripes within the same cluster to reduce cross-cluster update traffic. Moreover, convertible codes only optimize access costs in the parity update, without considering the optimization of cross-cluster network traffic and data relocation. In summary,
ClusterRT achieves a reduction in cross-cluster transitioning traffic compared to convertible codes by 32.4% to 66.1%.
5.2 Testbed Experiments
We conduct further evaluation of
ClusterRT on Alibaba Cloud ECS [
4] to uncover its performance in a real-world cloud data center. Specifically, we allocate 19 virtual machine instances (with the type of
ecs.g7.large), where each instance is equipped with 2vCPU (2.7 GHz 3rd Intel Xeon Scalable Processors) and 8 GB memory. The operating system is Ubuntu 18.04, and we measure via
iperf that the network bandwidth between any two instances is around 10 Gb/s.
Experimental setup:. We deploy the
ClusterRT coordinator in one instance acting as the metadata server. We choose RS(6,3) as the default erasure coding paradigm and take the other 18 instances as the storage nodes with
ClusterRT agents atop. We also adopt the following default configurations unless otherwise specified. The chunk size is set to 64 MB and the value of
\(k\) is successively increased from 6 to 15 according to the state-of-the-art erasure coding deployments [
13]. Before the experiment, we deployed 300 stripes, and each cluster contains three nodes. We then use Linux tool
tc to throttle the cross-cluster network bandwidth between any two clusters from 1 Gb/s to 2 Gb/s (cross-cluster network bandwidth is set to 1 Gb/s by default) and measure the
average transitioning time per stripe (defined as the time needed to stretch a stripe on average). We repeat each experiment for five runs and plot the average results, as well as the error bars indicating the maximum and minimum values across the experiments (some may be invisible as they are very small).
Experiment B.1 (transitioning time):. We first measure the transitioning time when the value of
\(k\) increases from 6 to 15. We fix the intra-cluster network bandwidth to 10 Gb/s and the cross-cluster network bandwidth to 1 Gb/s (the ratio of the intra-cluster transfer speed to the cross-cluster transfer speed is around 10:1). Figure
19 shows the results.
Figure
19 implies that
ClusterRT has a stable and short transitioning time, while the transitioning time of SRS and ERS shows an upward trend with the increase of
\(k\). First of all, SRS always generates the longest transitioning time in successive transitioning operations. This is mainly because SRS consumes more data chunk relocation traffic to adjust the layout and needs to read more data chunks for the parity chunk update. Second, we find that
ClusterRT requires a shorter transitioning time than ERS even in the first transitioning operation, and much shorter in the successive transitioning operation. The main reason is that
ClusterRT does not need to relocate a large number of data chunks across the cluster to adjust the layout as ERS does. Meanwhile,
ClusterRT aggregates parity chunks in the same cluster, thus reducing the cross-cluster reading of data chunks during parity updates. In addition,
ClusterRT further reduces the tail latency by balancing the download traffic of each cluster. Compared to ERS and SRS,
ClusterRT can reduce 70.4% and 88.4% of the transitioning time on average, respectively.
Experiment B.2 (computation time):. We run the experiment in an instance on Alibaba Cloud ECS and study the computation time (i.e., the total time of generating the transitioning solutions) by increasing the number of stripes. Figure
20 shows the results.
We find that the computation time of ClusterRT to generate the transitioning scheme is always marginal, meaning that the computation time is negligible compared with the transmission time of the transitioning operation.
Experiment B.3 (impact of chunk size):. We also study the transitioning time under different chunk sizes, which are varied from 16 MB to 64 MB. Here, we set the cross-cluster network bandwidth to 1 Gb/s. Figure
21 implies that the transitioning time increases with the chunk size. We can observe that with the increase of chunk size,
ClusterRT reduces more transitioning time compared to SRS and ERS. This improvement can be attributed to the limited cross-cluster network bandwidth, combined with the extensive cross-cluster chunk transfers introduced in ERS and SRS during redundancy transitioning. Larger chunk sizes exacerbate these limitations, hence prolonging transitioning times. Statistically,
ClusterRT reduces the transitioning time by 33.3% to 88.9% and 73.3% to 93.2% compared to ERS and SRS, respectively.
Experiment B.4 (impact of cross-cluster network bandwidth):. We finally investigate the impact of cross-cluster network bandwidth, which is increased from 1 Gb/s to 10 Gb/s by using
tc. Figure
22 shows that the transitioning time decreases when the cross-cluster network bandwidth gets higher. Besides,
ClusterRT outperforms both ERS and SRS under different cross-cluster network bandwidths. At the same time, when the cross-cluster network bandwidth is more stringent,
ClusterRT performs better, as it reduces the cross-cluster transitioning traffic and balances the transitioning traffic. We observe that as the cross-cluster network bandwidth increases, the decrease in redundancy transitioning time gained by
ClusterRT becomes less significant compared to ERS and SRS in most cases. However, in some specific cases (such as the (12,3,15)-transitioning with a cross-cluster network bandwidth of 10 Gb/s),
ClusterRT exhibits a more significant reduction in redundancy transitioning time compared to ERS and SRS. This is because the performance bottleneck of ERS and SRS shifts to the intra-cluster network bandwidth, compromising the performance improvement. To summarize,
ClusterRT reduces the transitioning times of ERS and SRS by 7.6% to 88.9% and 73.3% to 93.8%, respectively.