1 Introduction
Power consumption remains an important issue in integrated circuit design. The problem is especially visible in digital circuits that process data, e.g., microprocessors, digital signal processors, and graphics processing units. Addition and multiplication are probably the most important and frequent operations in calculations. Their importance grows with the development of microprocessors and computers, not to mention digital signal processing. Increasing computing power is still required. This is clearly visible in portable devices (e.g., tablets, smartphones, smartwatches, and other new devices), which are equipped with more and more features that not long ago belonged to computers only. Therefore, scientists and engineers must develop new materials, devices, and design methods to create new circuits and systems that will meet the growing expectations of users. Knowledge of the behavior and parameters of arithmetic circuits is needed to perform this task. This article attempts to answer some questions regarding the design of adders in the context of low-power design.
The importance of this topic is demonstrated by the large number of works on the design of low-power adders. Adder design has many aspects concerning circuit parameters, e.g., area, speed, and energy. Many researchers deal with 1-bit full adders designed in various technologies. A well-described conventional CMOS circuit consisting of 28 transistors can be found in [1]. Another way to design the full adder is to use transmission gates to build XOR and XNOR gates and then the adder itself [1]. This fundamental design has evolved over the years, and different technologies used to reduce the power consumption of adders can be found. Some designs take advantage of Pass-Transistor Logic for low-power design [2–4]. Skillful use of these features can be found in hybrid adders [5–7]. The bridge-style design used for adders can be found in [8]. Dual value logic is a kind of pass-transistor logic and is also used in the design of adders [9, 10]. Other techniques used to build adders can be found in the literature, e.g., gate diffusion input (GDI) logic [11, 12] or adiabatic logic [13].
Many works are devoted to multi-bit adders. The techniques mentioned above can be found in the design of parallel adders. In [14], the authors use CMOS, GDI, and modified GDI to design Kogge-Stone adders [15] with various bit lengths. In [16], full-swing GDI cells are used to build a Carry Look Ahead adder. The design of a 16-bit adder in 40 nm technology shows better parameters than the traditional CMOS counterpart. Other authors investigated adders realized in adiabatic logic [17, 18]. An interesting comparison of adder design across different types of adders and technology nodes can be found in [19]. The authors designed six commonly known parallel adders in four technology nodes using automatic synthesis and present an assessment of the primary parameters (area, power, and delay). Similar considerations are presented in [20], where the authors used standard cells to implement a few kinds of adders. In both cases, the parameters of the adders are assessed using commercial tools (Cadence and Synopsys, respectively). In [21], the authors, based on Spice-like simulation results, present a comparative study of a 2-bit adder in terms of speed, power dissipation, and area to identify the best architecture. Some of the above-mentioned papers report the simulation conditions, but there is no information about activity or the probability distribution of input vector changes. At most, the authors used a square wave during the tests. Some authors design optimized adders intended for specific tasks by combining various techniques [22, 23]. In [24, 25], the authors deal with parallel prefix adders (PPAs) with a view to reducing power dissipation but use the traditional power model with the switching activity factor to describe circuit activity. In [26], the authors present a comparative analysis of multi-bit Ripple Carry Adders (serial) built with a 1-bit full adder designed in various styles. A novelty is the use of a more accurate power model than the one commonly utilized.
Commonly known, regular PPA structures are used to build other circuits, e.g., approximate adders [27]. The authors tested different adder topologies (Brent-Kung, Kogge-Stone, Han-Carlson, Ladner-Fischer, and Sklansky) in the precise part of the approximate adder to find the optimal trade-off between performance and power dissipation. In [28], the authors proposed a special carry generate tree with the possibility of power gating to reduce power consumption. The authors of the mentioned works most often used the classical structures of parallel adders. However, modifications also appear in some works. In [29], the authors used hybrid radix-2 and radix-3 prefix nodes and cloning nodes to achieve better performance and area of a 16-bit adder. In [30], the authors proposed a modification of the classical approach to PPA design by using certain Boolean combinations leading to better adder properties. In [31], a review of parallel prefix adders is presented. However, the mentioned works concern typical, regular architectures, and the authors use the traditional power model to evaluate power consumption. An interesting approach to choosing the best adder topology is presented in [32]. The authors incorporate machine learning to automatically select the architecture of adders and multipliers. However, they utilize commonly known adder structures.
In the case of the look-ahead adder, there are several examples of elements used with different parameters [33, 34]. It can be seen that the widely used Kogge-Stone adder [35] is fast indeed, but at the cost of significantly higher power consumption compared to the rest. The authors of [36] show that, e.g., in the Brent-Kung adder architecture, the power consumption can be further reduced by changing the bit truncation method. In [37], it can be seen that current designs such as the Ladner-Fischer adder can be further modified to reduce the number of logic gates and therefore the current consumption.
The papers reported above usually deal with regular structures that are well described in the literature and commonly used. Knowles presents a kind of summary of regular adder structures in his paper [38]. In this paper, however, diverse structures are examined: not only those commonly known from the literature, but also all of the possibilities to build them using n-valency cells. Obviously, regular structures [38] were considered, but irregular ones with a variable block valency are especially interesting. In particular, it is of interest whether cells with a valency greater than 2 could be useful for low-power design. The motivation of the article is to find, already in the design phase, a suitable structure for a specific scenario of input data to be processed by an adder, considering the power consumption of the adder. Thus, different data activities are considered. Moreover, unlike in most works, an extended power dissipation model [39] is used to better assess power consumption. Finally, certain conclusions for low-power design are drawn. Therefore, the article presents an exhaustive analysis of all possible structures of a carry generation block in a PPA created with n-valency cells for an N-bit adder. The article shows an idea for controlling the design process of low-power adders that targets the reduction of power dissipation. Only the dynamic power dissipation of gates is taken into account, but the presented design framework can be complemented with issues characteristic of a particular technology or design style of gates, even in such exceptional solutions as those described in [40]. This idea was first announced at the Mixdes conference [41], and this paper is an extended version.
The novelty of the proposed solution lies in the use of irregular PPA structures that outperform the regular PPAs for specific datasets. Several new architectures were proposed, evaluated, and compared with the legacy ones, clearly showing the differences in performance depending on the datasets. The datasets include values of phase-shifted sine functions that are used in digital signal processing.
The article is organized as follows. Section 2 provides basic information on Parallel Prefix Adders and a description of the extended power model of static CMOS gates used in this study. The next section presents a description of the building blocks needed for varied adder designs. Section 4 describes the input data used for the adder simulations. Section 5 presents a detailed description of the structures of the adders tested and the results of the simulations. Observations and discussion of designs focused on low power are presented in Section 6 with conclusions and remarks.
4 Test Vectors Preparation
Power consumption in static CMOS circuits strongly depends on the activity profile of the input signals. The extended model of power consumption presented in Section 2.2 utilizes the probability of input vector changes (gate driving way) as a measure of circuit activity. Thus, some distributions of input vector changes are needed to test adders. Obviously, the easiest case is the uniform distribution, which can be used as a general or reference one; a random distribution gives similar results. In many cases, however, adders work with specific kinds of data. In this work, 27 distributions of input vector changes were prepared based on assumed values of the added numbers. The following dependency scenarios between the summed data were considered. At the beginning, two series (S1, S2) of disjoint numbers were randomly generated (20,000 vectors for 4-bit adders, doubled each time the number of bits increased by 1). The first series included numbers from 0 to half of the range (\(0 \div 0.5\cdot 2^{N}-1\)) and the second included values from half of the range to the end (\(0.5\cdot 2^{N}-1 \div 2^{N}-1\)), assuming the N-bit adder (Figure 14). Histograms for these 4-bit series are shown in Figure 15. Next, consecutive numbers were merged into one 2N-bit vector, using all possible combinations: A = S1, B = S2; A = S2, B = S1; A = S1, B = S1; and A = S2, B = S2, giving four scenarios of input vector distributions. Finally, for the first case, the distribution of 8-bit (merged) vector changes is presented in Figure 16.
The next four cases were prepared in a similar way, but new series of disjoint vectors were generated each time. Another four distributions were created with a small overlap of the number ranges, e.g., S1 = [0 \(\div\) 60% max], S2 = [40% max \(\div\) max], and the series were again generated each time. In total, 12 distributions were obtained.
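The preparation of the first disjoint scenario (A = S1, B = S2) can be sketched as follows. This is only an illustration under stated assumptions: the uniform draw within each half-range, the split point at 2^(N-1), the placement of A in the upper bits of the merged vector, and all function names are hypothetical choices and are not taken from the original test setup.

```python
import random
from collections import Counter

def make_series(n_bits, count, lower_half, rng):
    """Draw `count` random n-bit numbers from the lower or upper half of the range.

    Assumption: the halves are 0 .. 2^(N-1)-1 and 2^(N-1) .. 2^N-1.
    """
    half = 1 << (n_bits - 1)
    lo, hi = (0, half - 1) if lower_half else (half, (1 << n_bits) - 1)
    return [rng.randint(lo, hi) for _ in range(count)]

def merge(a_series, b_series, n_bits):
    """Merge operands into one 2N-bit vector (assumed layout: A in the upper bits)."""
    return [(a << n_bits) | b for a, b in zip(a_series, b_series)]

def change_distribution(vectors):
    """Estimate the probability of each (previous, next) input vector change."""
    counts = Counter(zip(vectors, vectors[1:]))
    total = len(vectors) - 1
    return {pair: c / total for pair, c in counts.items()}

rng = random.Random(0)          # fixed seed so the sketch is repeatable
N = 4
s1 = make_series(N, 20000, True, rng)
s2 = make_series(N, 20000, False, rng)
vectors = merge(s1, s2, N)      # scenario A = S1, B = S2
dist = change_distribution(vectors)
```

Swapping the arguments of `merge` gives the remaining three combinations; the overlapping scenarios would only need different range limits in `make_series`.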
Moreover, another scenario was considered: the addition of sinusoidal waveforms. Fifteen cases were prepared for further tests. The final distribution for the first one, merged sin(t) and sin(2·t) with maximum amplitude (Figure 17), is presented in Figure 18. All the cases considered for sinusoidal waveform addition are described in Table 3 using a simplified equation.
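A sinusoidal operand pair such as sin(t) and sin(2·t) could be turned into adder input vectors roughly as sketched below. The offset-binary mapping to unsigned N-bit codes, the number of samples per period, and the function names are assumptions made for illustration; the exact parameters of the fifteen cases are those listed in Table 3 and are not reproduced here.

```python
import math

def quantize_sine(n_bits, samples, freq_mult=1.0, phase=0.0, amplitude=1.0):
    """Sample amplitude*sin(freq_mult*t + phase) over one period of t and map it
    to unsigned n-bit codes (assumed offset-binary: -1 -> 0, +1 -> 2^N - 1)."""
    max_code = (1 << n_bits) - 1
    codes = []
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x = amplitude * math.sin(freq_mult * t + phase)
        codes.append(round((x + 1.0) / 2.0 * max_code))
    return codes

N = 4
a_codes = quantize_sine(N, 64)                 # sin(t), maximum amplitude
b_codes = quantize_sine(N, 64, freq_mult=2.0)  # sin(2*t), maximum amplitude
# Merged 2N-bit input vectors (assumed layout: A in the upper bits).
sine_vectors = [(a << N) | b for a, b in zip(a_codes, b_codes)]
```

Because both operands follow smooth, correlated trajectories, only a small subset of the possible 2N-bit vectors and vector changes appears in `sine_vectors`.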
The scenarios of input vector distributions presented above show that some vectors may be absent at an adder input (Figures 16 and 18). On the other hand, the switching activity of all inputs is greater than zero. For the probability distribution shown in Figure 18, the switching activities of the inputs are 0.60, 0.28, 0.12, 0.04, 0.30, 0.14, 0.06, and 0.02, starting from the LSB. The extended power model of gates used in this work naturally takes into account spatial and temporal correlations between signals and does not need any additional measures or coefficients.
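For reference, the per-input switching activity of a vector sequence can be estimated as sketched below. The sketch assumes the conventional definition (the fraction of consecutive vector pairs in which a given bit toggles); the function name and the tiny example sequence are hypothetical.

```python
def switching_activity(vectors, width):
    """Return the toggle probability of each input bit, LSB first.

    Assumed definition: number of transitions in which the bit changes,
    divided by the total number of consecutive vector pairs.
    """
    transitions = len(vectors) - 1
    toggles = [0] * width
    for prev, curr in zip(vectors, vectors[1:]):
        diff = prev ^ curr              # bits that changed in this transition
        for bit in range(width):
            toggles[bit] += (diff >> bit) & 1
    return [t / transitions for t in toggles]

# Tiny 2-bit example: bit 0 toggles in every transition, bit 1 only once.
example = [0b00, 0b01, 0b00, 0b11]
activity = switching_activity(example, 2)
```

Such a per-bit figure discards the correlations between inputs, which is why a distribution of whole-vector changes carries more information than a list of switching activities.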
6 Results and Discussion
All circuits described in the previous section were simulated in the MATLAB environment for the distributions of input vector changes described in Section 4. In total, there were 28 cases: uniform, 12 with series of numbers, and 15 with sinusoidal signals. The results, i.e., the total equivalent capacitance of the PG tree, for all 101 circuits are collected in Table 4, Table 5, and Table 6 for 4-, 5-, and 6-bit adders, respectively. Adders containing black cells are marked in gray. Adders consisting solely of gray cells are marked in white. The values in the tables are indicated with blue stripes for easier comparison. The minimum values are written in green italics and the maximum values are written in red bold. In addition, the minimum values excluding the Ripple Carry Adder (sum4b_3p7, sum5b_4p25, and sum6b_5p25) are marked in yellow. Thus, parallel adders can be easily compared.
The equivalent capacitance values collected in the tables can be easily recalculated to power consumption using the well-known equation \(P = f C_{eqv} V_{dd}^2\), where \(C_{eqv}\) is the equivalent capacitance, \(f\) is the frequency of the circuit input vector changes, and \(V_{dd}\) is the supply voltage of the circuit. For comparing implementations of adders with each other, the equivalent capacitance alone is sufficient, and it is discussed in the following paragraphs.
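A short numerical sketch shows why the equivalent capacitance is sufficient for comparison: at a fixed frequency and supply voltage, the relative power saving equals the relative capacitance saving. The capacitance, frequency, and voltage values below are hypothetical and are not taken from the tables.

```python
def dynamic_power(c_eqv, f, vdd):
    """P = f * C_eqv * Vdd^2, in watts for farads, hertz, and volts."""
    return f * c_eqv * vdd ** 2

def relative_saving(c_ref, c_new):
    """Fraction by which c_new is lower than c_ref."""
    return (c_ref - c_new) / c_ref

# Hypothetical equivalent capacitances of two adder structures.
c_a, c_b = 120e-15, 90e-15      # farads
f, vdd = 100e6, 1.2             # hertz, volts (assumed operating point)

p_a = dynamic_power(c_a, f, vdd)
p_b = dynamic_power(c_b, f, vdd)

# f and Vdd^2 cancel, so both ratios describe the same saving.
power_saving = (p_a - p_b) / p_a
cap_saving = relative_saving(c_a, c_b)
```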
To show trends and to support the analysis, the results collected in the above table for the 6-bit adders are additionally presented in the graphs shown in Figure 24.
The first observation from the results presented in the above tables is that the RCA (sum4b_3p7, sum5b_4p25, and sum6b_5p25) is the best realization in many cases. This is expected because the RCA uses the smallest number of transistors; on the other hand, it has the largest number of levels and, consequently, usually the largest delay. From an activity point of view, such a serial structure calculates carry signals only when necessary. Thus, it could be treated as a borderline case, although, as the test results show, the boundary is not sharp. The second-best realization (marked in yellow) is usually only a few percent worse than the RCA, mostly about 1% or 2%. However, these differences are larger for 4-bit adders and reach up to 7.9% for distribution sin11. The RCA is not the best solution in all cases. Among the 28 scenarios of input activity considered, the RCA turned out to be worse in seven to eight cases, and in the other cases it is only slightly better than the second solution.
Considering the obtained data, especially for 5- and 6-bit adders, it can be seen that structures utilizing black cells generally have higher power consumption. This is clearly visible in the middle and on the right of the graphs (starting from sum6b_2p18c) presented in Figure 24. However, some exceptions exist, and some implementations with black cells have power consumption similar to that of implementations with gray cells only. For example, adders sum5b_2p7c, sum5b_3p17c, sum6b_3p33c, sum6b_3p42c, or sum6b_3p47c have power consumption comparable to that of adders without black cells. This is because they have a similar structure, only one black cell, and small redundancies. One exception that could be pointed out here is sum6b_2p34c, which has two black cells and bigger redundancy than the ones previously mentioned, but for sinusoidal distributions it has better power properties than for serial distributions.
Generally, power consumption in the sinusoidal distribution cases (“sinN”) is lower than in the serial distribution cases (“serN”). The exception is “ser4,” which is comparable with the sinusoidal cases. In addition, it can be observed that for sinusoidal distributions some adders with black cells have power consumption similar to that of adders with gray cells only. This is caused by the sparse distribution of input vector changes (see Figure 18). Only a small portion of the possible vector changes occur and, in consequence, structures with black cells can have power consumption similar to those that consist of gray cells only. This happens when only one black cell is inside the adder (sum6b_21c, _23c, _32c). Therefore, if black cells are to be used, such structures require careful analysis of activity and power consumption. The structure of an adder should be carefully adapted to the nature of the input vector changes. A more detailed analysis that gives insight into the activity of the adder structure could be done, but the present analysis can already be useful for low-power design. From the above observations, the conclusion is that black cells should be avoided because they perform redundant calculations and thus increase power consumption.
The number of levels has a smaller influence on power consumption than the structure of an adder, especially the presence of black cells. Nonetheless, it can be observed that adders with more levels have lower power consumption. The reason is that an adder with a smaller number of levels requires the use of cells with higher valency. Such cells have more transistors, usually causing larger power consumption, but some exceptions can be found. Again, it depends on the type of data that are summed. Interestingly, the CLA adder, considered to be very fast because it has a single level, has quite high power consumption, although lower than that of adders with many black cells. It uses the most complex gates, and the carry signal of each bit is calculated separately. This means that the input signals sometimes change without affecting the output, while still recharging the capacitances of internal nodes and consuming energy. The extended model used in this work takes this phenomenon into account very well.
Among the adder structures known from the literature, the Brent-Kung adder (sum4b_2p5c, sum5b_3p18c, and sum6b_3p15c) and the Kogge-Stone adder (sum4b_2p6c, sum5b_3p23c, and sum6b_3p11c) can be found in the analyzed structures. It can be observed that neither is the best realization in terms of power consumption for the presented scenarios of input data activity. The Kogge-Stone adders are worse because they have many nodes and more redundancies occur. At the first level, there are many valency-2 cells that are switched unproductively; such signals are not propagated to the circuit outputs. By modifying the Kogge-Stone adder so that some calculations are shifted to the second level (obtaining Brent-Kung), it is possible to reduce the power consumption by one-third. The adder sum6b_49c has a power consumption about 10 \(\div\) 23% lower than Kogge-Stone. It is a modified version of the Kogge-Stone adder and consists of a lower number of cells, but some calculations are still redundant. Therefore, lower power consumption occurs when there are no redundant nodes in non-regular structures.
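The difference in node count between a serial carry chain and a Kogge-Stone tree can be illustrated with the textbook generate/propagate recurrences. The sketch below is not the simulation model used in this work: it assumes a carry-in of 0, uses only valency-2 nodes, computes both G and P in every Kogge-Stone node for simplicity (without distinguishing gray from black cells), and its function names are hypothetical.

```python
def generate_propagate(a, b, n):
    """Bitwise generate (a AND b) and propagate (a XOR b) signals, LSB first."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p

def ripple_carries(a, b, n):
    """Serial chain: carry out of bit i from the carry out of bit i-1."""
    g, p = generate_propagate(a, b, n)
    carries, nodes = [g[0]], 0
    for i in range(1, n):
        carries.append(g[i] | (p[i] & carries[i - 1]))
        nodes += 1
    return carries, nodes

def kogge_stone_carries(a, b, n):
    """Kogge-Stone prefix tree: the combining distance doubles at each level."""
    G, P = generate_propagate(a, b, n)
    nodes, d = 0, 1
    while d < n:
        new_G, new_P = G[:], P[:]
        for i in range(d, n):
            new_G[i] = G[i] | (P[i] & G[i - d])   # group generate
            new_P[i] = P[i] & P[i - d]            # group propagate
            nodes += 1
        G, P = new_G, new_P
        d *= 2
    return G, nodes

rca_c, rca_nodes = ripple_carries(0b1011, 0b0110, 4)
ks_c, ks_nodes = kogge_stone_carries(0b1011, 0b0110, 4)
```

Both functions return the same carries, but in this formulation the Kogge-Stone tree evaluates more nodes per addition than the serial chain, which is consistent with the redundancy argument above.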
Table 7 presents a comparison of the best and worst realizations of the adder versus input activity scenarios. In this comparison, RCA is not taken into account. The differences increase with the size of the adder, because a bigger adder gives more opportunities to consider more complex structures. The results show that it is possible to reduce the power consumed by the PG tree of adders by several dozen percent (up to 60%) when the proper structure is chosen.