
Open access

Comparative Analysis of Dynamic Power Consumption of Parallel Prefix Adder

Author:

Ireneusz Brzozowski

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 3

Article No.: 49, Pages 1 - 22

https://doi.org/10.1145/3651984

Published: 22 April 2024


Abstract

The Newcomb-Benford law, also known as Benford's law, is the law of anomalous numbers stating that in many real-life numerical datasets, including physical and statistical ones, the leading digit is likely to be small. This irregularity of numbers observed in nature raises the question of whether the arithmetic logic unit, responsible for performing calculations in computers, is optimal. Are there other architectures, less regular than the commonly used Parallel Prefix Adders (PPAs), that perform better, especially on datasets that are not purely random but irregular? In this article, structures of a propagate-generate tree are compared, including regular and irregular configurations: regular, irregular, with gray cells only, with both gray and black cells, and with higher-valency cells. Performance is evaluated in terms of energy consumption, using the extended power model of static CMOS gates. The model is based on changes of input vectors and naturally takes spatio-temporal correlations into account. The energy parameters of the designed cells were calculated from electrical (Spice) simulations. Designs and simulations were done in the Cadence environment, and the power dissipation was calculated in MATLAB. The results clearly show that some PPA structures perform much better for specific types of numerical data. Careless design can more than double the power consumption. The novel PPA architectures described in this work might find practical applications in specialized adders dealing with specific numerical datasets, such as the sine functions commonly used in digital signal processing.

1 Introduction

Power consumption remains an important issue in integrated circuit design. The problem is especially visible in digital circuits that process data, e.g., microprocessors, digital signal processors, and graphics processing units. Addition and multiplication are probably the most important and frequent operations in calculations, and their importance grows with the development of microprocessors and computers, not to mention digital signal processing. Ever-increasing computing power is still required. This is clearly visible in portable devices (e.g., tablets, smartphones, smartwatches, and other new devices), which are equipped with more and more features that not long ago belonged to computers only. Therefore, scientists and engineers must develop new materials, devices, and design methods to create circuits and systems that meet the growing expectations of users. Knowledge of the behavior and parameters of arithmetic circuits is needed to perform this task. This article attempts to answer some questions regarding the design of adders in the context of low-power design.

The importance of this topic is evidenced by the large number of works on the design of low-power adders. Adder design has many aspects concerning circuit parameters, e.g., area, speed, and energy. Many researchers deal with 1-bit full adders designed in various technologies. A well-described conventional CMOS circuit consisting of 28 transistors can be found in [1]. Another way to design the full adder is to use transmission gates to build XOR and XNOR gates and, from them, the adder [1]. This fundamental design has evolved over the years, and various techniques for reducing the power consumption of adders can be found. Some designs take advantage of pass-transistor logic for low-power design [2–4]. Skillful use of these features can be found in hybrid adders [5–7]. The bridge-style design applied to adders can be found in [8]. Dual value logic is a kind of pass-transistor logic and is also used in the design of adders [9, 10]. Other techniques used to build adders can be found in the literature, e.g., gate diffusion input (GDI) logic [11, 12] or adiabatic logic [13].

Many works are devoted to multi-bit adders, and the techniques mentioned above also appear in the design of parallel adders. In [14], the authors use CMOS, GDI, and modified GDI to design Kogge-Stone adders [15] of various bit lengths. In [16], full-swing GDI cells are used to build a Carry Look Ahead adder; the 16-bit design in 40 nm technology shows better parameters than its traditional CMOS counterpart. Other authors investigated adders realized in adiabatic logic [17, 18]. An interesting comparison of adder designs across different adder types and technology nodes can be found in [19]. The authors designed six commonly known parallel adders in four technology nodes using automatic synthesis and present an assessment of the primary parameters (area, power, and delay). Similar considerations are presented in [20], where the authors used standard cells to implement several kinds of adders. In both cases, the parameters of the adders are assessed using commercial tools (Cadence and Synopsys, respectively). In [21], the authors, based on Spice-like simulation results, present a comparative study of a 2-bit adder in terms of speed, power dissipation, and area to identify the best architecture. Some of the above-mentioned papers report the simulation conditions but give no information about the activity or the probability distribution of input vector changes; at most, the authors used a square wave during the tests. Some authors design optimized adders intended for specific tasks by combining various techniques [22, 23]. In [24, 25], the authors deal with parallel prefix adders from the perspective of power dissipation reduction but use the traditional power model, with the switching activity factor describing circuit activity. In [26], the authors present a comparative analysis of multi-bit (serial) Ripple Carry Adders built from 1-bit full adders designed in various styles; a novelty there is the use of a more accurate power model than the commonly used one.

Commonly known regular PPA structures are also used to build other circuits, e.g., approximate adders [27]. The authors tested different adder topologies (Brent-Kung, Kogge-Stone, Han-Carlson, Ladner-Fischer, and Sklansky) in the precise part of an approximate adder to find the best trade-off between performance and power dissipation. In [28], the authors proposed a special carry generate tree with the possibility of power gating to reduce power consumption. The authors of these works most often used the classical structures of parallel adders, although modifications appear in some works. In [29], the authors used hybrid radix-2 and radix-3 prefix nodes and cloning nodes to achieve better performance and area of a 16-bit adder. In [30], the authors proposed a modification of the classical approach to PPA design that uses certain Boolean combinations leading to better adder properties. In [31], a review of parallel prefix adders is presented. However, these works concern typical, regular architectures, and the authors use the traditional power model to evaluate power consumption. An interesting approach to choosing the best adder topology is presented in [32]: the authors use machine learning to automatically select the architecture of adders and multipliers, but they consider only commonly known adder structures.

For look-ahead adders, several examples built from elements with different parameters can be found [33, 34]. These examples show that the widely used Kogge-Stone adder [35] is indeed fast, but at the cost of significantly higher power consumption than the others. The authors of [36] show that, e.g., in the Brent-Kung adder architecture, power consumption can be further reduced by changing the bit truncation method. In [37], it is shown that existing designs such as the Ladner-Fischer adder can be further modified to reduce the number of logic gates and therefore the current consumption.

The papers reported above usually deal with regular structures that are well described in the literature and commonly used. Knowles summarizes regular adder structures in [38]. In this paper, by contrast, diverse structures are examined: not only those commonly known from the literature, but all possible structures that can be built from n-valency cells. Regular structures [38] are naturally included, but irregular ones with a variable block valency are of particular interest. In particular, the question is whether cells with a valency greater than 2 can be useful for low-power design. The motivation of the article is to find, already in the design phase, a structure suited to a specific scenario of input data processed by an adder, considering the power consumption of the adder. Thus, different data activities are considered. Moreover, unlike most authors, an extended power dissipation model [39] is used to better assess power consumption. Finally, certain conclusions for low-power design are drawn. In summary, the article presents an exhaustive analysis of all possible structures of the carry generation block in an N-bit PPA built with n-valency cells, and outlines an approach to guiding the design of adders toward reduced power dissipation. Only the dynamic power dissipation of gates is taken into account, but the presented design framework can be complemented with issues characteristic of a particular technology or gate design style, even for such exceptional solutions as those described in [40]. This idea was first presented at the Mixdes conference [41], and this paper is an extended version.

The novelty of the proposed solution lies in the irregular PPA structures that outperform regular PPAs for specific datasets. Several new architectures were proposed, evaluated, and compared with the legacy ones, clearly showing how their performance differs across datasets. The datasets include values of phase-shifted sine functions, which are used in digital signal processing.

The article is organized as follows. Section 2 provides basic information on Parallel Prefix Adders and describes the extended power model of static CMOS gates used in this study. Section 3 describes the building blocks needed for the various adder designs. Section 4 describes the input data used in the adder simulations. Section 5 presents a detailed description of the tested adder structures and the simulation results. Observations and a discussion of low-power designs are presented in Section 6, together with conclusions and remarks.

2 Background on Adders and Power Dissipation Model

2.1 Parallel Prefix Adders

Multi-bit binary addition can be realized simply by cascading 1-bit full adders: the carry-output signal of each stage is connected to the carry-input of the next stage, creating the Carry-Ripple Adder (CRA). The speed of this adder depends strongly on the number of bits, because the carry signal may have to propagate from the first bit to the last.
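The cascading principle can be illustrated with a minimal Python sketch, assuming unsigned N-bit operands given as integers; the helper names `full_adder` and `ripple_carry_add` are hypothetical. Besides the sum, the sketch reports the longest run of stages through which a carry rippled, which is what makes the CRA delay grow with the number of bits.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, n_bits: int) -> tuple[int, int, int]:
    """Add two unsigned n-bit integers by cascading full adders.

    Returns (sum, carry_out, longest_ripple), where longest_ripple is the
    longest run of consecutive stages that passed an incoming carry on.
    """
    carry, total = 0, 0
    run = longest = 0
    for i in range(n_bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, next_carry = full_adder(a, b, carry)
        total |= s << i
        # The carry ripples through stage i if it arrives and a XOR b = 1.
        run = run + 1 if (carry and a ^ b) else 0
        longest = max(longest, run)
        carry = next_carry
    return total, carry, longest


print(ripple_carry_add(0b1111, 0b0001, 4))  # worst case for 4 bits: (0, 1, 3)
```

Adding 1 to 0b1111 is the worst case: the carry generated at bit 0 has to pass through every remaining stage.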

Another way to perform addition is to look ahead and predict the carry-out of a multi-bit group to speed up the circuit. Usually, the operands are divided into multi-bit groups, for which two signals are calculated, the propagate and generate signals, which indicate whether the group will propagate a carry-in or will generate a carry-out. The Carry-Skip Adder (CSA), which incorporates this idea, shortens the critical path by computing a propagate signal for each group and using it to skip over long carry ripples. The Carry-Lookahead Adder (CLA) is similar, but it computes both group generate and propagate signals to avoid waiting for a ripple to determine whether the first group generates a carry [42].

Therefore, multi-bit binary addition, apart from cascading 1-bit full adders, can be realized as a parallel prefix operation. Starting from the full adder, recall the definitions of the generate (G), propagate (P), and kill (K) signals. The adder generates a carry when \(C_{\rm out}\) is 1 independently of \(C_{\rm in}\): \(G = A \cdot B\), where A and B are the inputs. The adder kills a carry when \(C_{\rm out}\) is 0 independently of \(C_{\rm in}\): \(K = \bar{A} \cdot \bar{B} = \overline {A + B}\). The adder propagates a carry, i.e., produces a carry-out if and only if it receives a carry-in, when exactly one input is 1: \(P = A \oplus B\). These signals have the same meaning for a group spanning bits \(i \ldots j\), inclusive. The group signals can therefore be defined recursively for \(i \ge k > j\) as follows:

\begin{equation} \begin{split} {{G}_{i:j}} &= {{G}_{i:k}} + {{P}_{i:k}} \cdot {{G}_{k - 1:j}},\\ {{P}_{i:j}} &= {{P}_{i:k}} \cdot {{P}_{k - 1:j}}, \end{split} \tag{1} \end{equation}

with the base case

\begin{equation} \begin{split} {{G}_{i:i}} &= {{G}_i} = {{A}_i} \cdot {{B}_i},\\ {{P}_{i:i}} &= {{P}_i} = {{A}_i} \oplus {{B}_i}, \end{split} \tag{2} \end{equation}

and \({{G}_i},{{P}_i}\) are the bitwise generate and propagate signals. Note that the carry-out at the i-th position, \({{G}_{{\rm{OUT}}i}}\), is equal to the group generate signal \({{G}_{i:0}}\). Hence, group generate signals and carries can be used interchangeably. Finally, the sum for the i-th bit can be calculated as

\begin{equation} {{S}_i} = {{P}_i} \oplus {{G}_{i - 1:0}}. \tag{3} \end{equation}

Thus, summation can be defined as a three-step operation, consisting of precomputation (calculation of bitwise \({{G}_i},{{P}_i}\)), prefix calculation (PG block, PG tree), and postcomputation (summation \({{S}_i}\)) [42]. This second step (prefix calculation) is our main point of interest in the context of power consumption.
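The three steps can be traced with a short Python sketch of Equations (1) to (3). It assumes unsigned operands and no carry-in (so \(G_{-1:0} = 0\) for bit 0), and it uses a serial scan for the prefix step; the function names are hypothetical. Because the valency-2 cell operation is associative, any PG tree (Sklansky, Kogge-Stone, or an irregular one) produces the same group signals, and only the order of combination, and hence the hardware and its activity, changes.

```python
def pg_cell(hi, lo):
    """Valency-2 cell, Equation (1): combines (G, P) of group i:k with group k-1:j."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(x, y, n_bits):
    """Three-step addition of two unsigned n-bit integers (no carry-in assumed)."""
    # 1) Precomputation, Equation (2): bitwise G_i = A_i·B_i and P_i = A_i XOR B_i.
    bitwise = []
    for i in range(n_bits):
        a, b = (x >> i) & 1, (y >> i) & 1
        bitwise.append((a & b, a ^ b))
    # 2) Prefix calculation: (G_{i:0}, P_{i:0}) for every i. A serial scan is used
    #    for brevity; any PG tree gives the same result because the cell
    #    operation is associative.
    groups = [bitwise[0]]
    for i in range(1, n_bits):
        groups.append(pg_cell(bitwise[i], groups[i - 1]))
    # 3) Postcomputation, Equation (3): S_i = P_i XOR G_{i-1:0}, with 0 for i = 0.
    total = 0
    for i in range(n_bits):
        carry_in = groups[i - 1][0] if i > 0 else 0
        total |= (bitwise[i][1] ^ carry_in) << i
    return total, groups[-1][0]   # (sum, carry-out G_{n-1:0})


print(prefix_add(0b1011, 0b0110, 4))
```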

The PG block of parallel prefix adders is usually drawn as a graph of black and gray cells, whose meaning is shown in Figure 1 [42]. The cells perform the operations described by Equation (1). In this case the cells are valency-2, i.e., each cell combines a pair of smaller groups.

Fig. 1.

If more groups of signals are combined in one cell, a higher-valency PG block is obtained. It requires more complex gates but can reduce the number of stages. For example, the valency-4 PG block is described by

\begin{equation} \begin{split} {{G}_{i:j}} &= {{G}_{i:k}} + {{P}_{i:k}}\cdot {{G}_{k - 1:l}} + {{P}_{i:k}}\cdot {{P}_{k - 1:l}}\cdot {{G}_{l - 1:m}} \\ &\quad + {{P}_{i:k}}\cdot {{P}_{k - 1:l}}\cdot {{P}_{l-1:m}}\cdot {{G}_{m - 1:j}},\\ {{P}_{i:j}} &= {{P}_{i:k}}\cdot {{P}_{k - 1:l}}\cdot {{P}_{l - 1:m}}\cdot {{P}_{m - 1:j}} \end{split} \tag{4} \end{equation}

for \(i \ge k > l > m > j\). The generate signal \({{G}_{i:j}}\) can be written in a factored form and realized by a single AND-OR gate:

\begin{equation} {{G}_{i:j}} = {{G}_{i:k}} + {{P}_{i:k}} \cdot \left( {{{G}_{k - 1:l}} + {{P}_{k - 1:l}} \cdot \left( {{{G}_{l - 1:m}} + {{P}_{l - 1:m}} \cdot {{G}_{m-1:j}}} \right)} \right), \tag{5} \end{equation}

and finally, the symbol of the cell and the schematic corresponding to Equation (4) are presented in Figure 2 [42].

Fig. 2.
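A quick way to confirm that the valency-4 cell is consistent with the valency-2 definition is an exhaustive truth-table check. The Python sketch below (function names are hypothetical) compares the sum-of-products form of Equation (4), the factored AND-OR form of Equation (5), and a two-level tree of valency-2 cells from Equation (1) covering the same span \(i{:}j\), for all 256 input combinations.

```python
from itertools import product


def valency2(hi, lo):
    """Equation (1): combine group (G, P) pairs, more significant group first."""
    return hi[0] | (hi[1] & lo[0]), hi[1] & lo[1]


def valency4_eq4(g, p):
    """Equation (4), sum-of-products form; g, p = (i:k, k-1:l, l-1:m, m-1:j)."""
    g3, g2, g1, g0 = g
    p3, p2, p1, p0 = p
    G = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
    return G, p3 & p2 & p1 & p0


def valency4_eq5(g, p):
    """Equation (5), factored AND-OR form of the same generate signal."""
    g3, g2, g1, g0 = g
    p3, p2, p1, p0 = p
    return g3 | (p3 & (g2 | (p2 & (g1 | (p1 & g0))))), p3 & p2 & p1 & p0


def check_valency4():
    """Exhaustively compare one valency-4 cell with a 2-level tree of valency-2 cells."""
    for bits in product((0, 1), repeat=8):
        g, p = bits[:4], bits[4:]
        upper = valency2((g[0], p[0]), (g[1], p[1]))   # group i:l
        lower = valency2((g[2], p[2]), (g[3], p[3]))   # group l-1:j
        tree = valency2(upper, lower)
        if not (valency4_eq4(g, p) == valency4_eq5(g, p) == tree):
            return False
    return True


print(check_valency4())  # True
```

The trade-off discussed above is visible here: one valency-4 cell replaces two levels of valency-2 cells, at the price of a larger complex gate.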

When designing a PPA in a specific technology (e.g., static CMOS), a designer has only a few options for the first level (bitwise \(G_i, P_i\) signals) and the last one (summation \(S_i\)), but the PG block can be realized in many versions. Therefore, the focus is on the PG block, which can easily be drawn as a diagram of black and gray cells for different adders. For instance, Figure 3 shows the PG diagram of an 8-bit Sklansky adder. The vertical axis shows the level and the horizontal axis the bit position.

Fig. 3.

Some papers in the literature do not distinguish between cell types and mark all cells with the same color. This is possible because the type of each cell is easy to determine. If the group computed by a cell does not extend down to the first input (LSB), its propagate signal is still needed by subsequent cells and must be calculated, so the cell is black [38]; otherwise, only the generate signal is needed and the cell is gray. Other works use more elaborate notations for the diagram of the propagation block that convey more information [43].
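The classification rule can be applied mechanically. The Python sketch below builds the cells of an N-bit Sklansky PG tree, like the one in Figure 3, and marks each cell gray when its resulting group reaches bit 0 and black otherwise. The level numbering and the function name are our own assumptions and may not match the figure's drawing conventions exactly.

```python
def sklansky_cells(n_bits):
    """Cells of an n-bit Sklansky PG tree, classified as black or gray.

    n_bits must be a power of two. Returns (cells, spans) where each cell is
    (level, bit, group_low_bit, colour) and spans[i] is the lowest bit of the
    group finally available at bit i (0 for every bit in a complete tree).
    """
    assert n_bits > 0 and (n_bits & (n_bits - 1)) == 0, "power of two expected"
    low = list(range(n_bits))           # bit i currently holds group i:low[i]
    cells = []
    level, half = 1, 1
    while half < n_bits:
        block = 2 * half
        new_low = low[:]
        for i in range(n_bits):
            if i % block >= half:
                j = (i // block) * block + half - 1   # top bit of the lower half-block
                assert low[i] - 1 == j                 # groups i:low[i] and j:low[j] are adjacent
                new_low[i] = low[j]
                # Rule from the text: a group reaching the LSB needs only G (gray);
                # otherwise its P is still needed later (black).
                colour = "gray" if low[j] == 0 else "black"
                cells.append((level, i, low[j], colour))
        low = new_low
        level, half = level + 1, 2 * half
    return cells, low


cells, spans = sklansky_cells(8)
for level in (1, 2, 3):
    print(level, [(bit, colour) for lvl, bit, _, colour in cells if lvl == level])
```

For 8 bits the sketch yields 12 cells, 7 of them gray, since every bit except bit 0 needs exactly one cell that produces its carry \(G_{i:0}\).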

2.2 Extended Power Model of Static CMOS Gates

The power dissipation model of gates used in this work takes advantage of information about changes of the vector of circuit primary inputs. In this sense it extends the traditional model usually found in the literature, which uses the switching activity and a constant node capacitance. However, a detailed analysis of static CMOS gate behavior during switching shows that power dissipation depends on the reason for the gate switching. This is due to the reconfiguration of internal parasitic capacitances, which leads to different current flows for different changes of the input vector. Therefore, changes of the whole input vector, not of independent input signals as in the traditional model, should be considered as the activity measure. Moreover, power is consumed even when a change at an input does not produce a change at the output. Thus, a new model of the energy consumed by static CMOS gates was introduced in [39]. The model consists of capacitors representing the amount of energy consumed as a function of the reason for gate switching, called the gate driving way (Figure 4). These capacitors represent equivalent capacitances, calculated from the current flowing directly from the supply source or through previous gates for each possible change of the input vector. Their values depend on the gate driving way.

Fig. 4.

The traditional model assumes that CMOS gates consume energy only during switching, but in fact power is consumed even when the gate output is stable while the input vector changes. Therefore, all possible input vector changes are essential and should be considered. The table in Figure 5 shows all driving ways for a 2-input gate; rising and falling edges of the input signals are denoted by arrows. This leads to the second component of the model: the probability of the gate driving way. This parameter numerically describes the likelihood of a particular change of the input vector and thus expresses the activity of the input data.

Fig. 5.
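The probability of a driving way can be estimated from a cycle-by-cycle sequence of input vectors. The Python sketch below is only an illustration with hypothetical function names. It assumes that probabilities are taken per clock cycle, so a cycle without a change is kept as a \((v, v)\) pair (its equivalent capacitance is zero in the model). A helper prints each driving way in the arrow notation of Figure 5. A 2-input gate has \(2^2 = 4\) input vectors and therefore \(4 \cdot 3 = 12\) possible changes between distinct vectors.

```python
from collections import Counter


def driving_way_probabilities(vectors):
    """Estimate p(dw) from a cycle-by-cycle sequence of input vectors.

    A driving way is taken as the ordered pair (present vector, next vector).
    Assumption: probabilities are per clock cycle, so cycles with no change
    are kept as (v, v) pairs, which contribute no energy in the model.
    """
    pairs = Counter(zip(vectors, vectors[1:]))
    total = sum(pairs.values())
    return {dw: n / total for dw, n in pairs.items()}


def arrows(present, nxt):
    """Per-input notation: '↑' rising, '↓' falling, otherwise the stable level."""
    return "".join(
        "↑" if (a, b) == (0, 1) else "↓" if (a, b) == (1, 0) else str(a)
        for a, b in zip(present, nxt)
    )


seq = [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)]
for (present, nxt), p in driving_way_probabilities(seq).items():
    print(arrows(present, nxt), p)
```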

The model parameters, i.e., the equivalent capacitances, are calculated from the simulated currents flowing through the gate terminals. Each gate is thus characterized by a set of tables, one for each input and supply node. However, since the equivalent capacitances of all terminals are indexed by the same gate driving way, the values for all terminals can be summed, giving one table with the total equivalent capacitance of the gate.

The probability of a gate driving way, as a circuit activity factor, describes the contribution of a particular equivalent capacitance to the total power dissipation of the whole circuit. Therefore, the model is characterized by the following equation:

\begin{equation} {{C}_{T\_equ}}\left( g \right) = \sum\limits_{d{{w}_g}} {{c}_{t\_equ}}\left( {d{{w}_g}} \right) \cdot p\left( {d{{w}_g}} \right), \tag{6} \end{equation}

where \(C_{T\_equ}(g)\) is the total equivalent capacitance of gate g, \(c_{t\_equ}(dw_g)\) is the total equivalent capacitance (summed over all terminals) for driving way \(dw_g\), and \(p(dw_g)\) is the probability of that driving way.

The probabilities of the driving ways of the circuit primary inputs must, of course, be known as the activity measure of the circuit. Then, using the methods developed to calculate the probabilities of gate driving ways, the activity factors of all nodes in a circuit can be calculated in one cycle [44]. The extended model is used in this work because it naturally takes into account spatio-temporal dependencies in the circuit, giving a more accurate assessment than the traditional model.
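What these methods compute can be illustrated by brute force on a tiny circuit. The Python sketch below is not the method of [44]. It simply evaluates every gate for the present and next primary-input vectors of each change, then accumulates the probability of every resulting driving way at each gate, which is feasible only for small circuits. The two-NAND circuit, the `x0`, `x1`, ... naming, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def gate_driving_ways(input_dist, circuit):
    """Brute-force driving-way probabilities at the inputs of every gate.

    input_dist: {(present, next): probability} over primary-input vectors.
    circuit: list of (name, function, input_names) in topological order;
             primary inputs are named 'x0', 'x1', ... in vector order.
    Returns {gate_name: {(present_inputs, next_inputs): probability}}.
    """
    result = defaultdict(lambda: defaultdict(float))
    for (present, nxt), prob in input_dist.items():
        states = []
        for vec in (present, nxt):
            nodes = {f"x{i}": bit for i, bit in enumerate(vec)}
            for name, func, ins in circuit:
                nodes[name] = func(*(nodes[n] for n in ins))
            states.append(nodes)
        for name, _, ins in circuit:
            dw = tuple(tuple(s[n] for n in ins) for s in states)
            result[name][dw] += prob
    return {name: dict(ways) for name, ways in result.items()}


def nand(a, b):
    return 1 - (a & b)


# Hypothetical example circuit: n1 = NAND(x0, x1), y = NAND(n1, x2).
circuit = [("n1", nand, ("x0", "x1")), ("y", nand, ("n1", "x2"))]
vectors = list(product((0, 1), repeat=3))
changes = [(v, w) for v in vectors for w in vectors if v != w]
uniform = {dw: 1 / len(changes) for dw in changes}
ways = gate_driving_ways(uniform, circuit)
print(len(ways["n1"]), "driving ways observed at n1")
```

Even with uniformly distributed primary-input changes, the internal gate sees a non-uniform distribution; for example, its inputs stay unchanged with probability 1/7, which is exactly the kind of effect the extended model accounts for.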

3 Building Blocks of Adders

3.1 Cells for Adders Design

Equations (1), (4), and (5) directly show that AND-OR complex gates and AND gates are the best candidates for implementing the PG tree of parallel adders. On the other hand, static CMOS technology naturally uses inverting gates, e.g., NOT, NAND, and so forth. Therefore, AND-OR-INVERT (AOI) and OR-AND-INVERT (OAI) complex gates can be used alternately to avoid additional inverters in the PG tree. This is possible because rules similar to De Morgan's laws hold for complex gates:

\begin{equation} \begin{split} \overline{AOI\left( {{x}_i} \right)} &= OAI\left( \overline{{x}_i} \right),\\ \overline{OAI\left( {{x}_i} \right)} &= AOI\left( \overline{{x}_i} \right). \end{split} \tag{7} \end{equation}

The above equations show that negation of the function AOI is the function OAI executed for negated variables, and vice versa, negation of OAI is AOI of negated variables.
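Equation (7) can be checked exhaustively for the smallest pair of complex gates in Table 1. The Python sketch below (function names are hypothetical) verifies both identities for AOI21 and OAI21. It also shows the practical consequence: an OAI21 gate fed with the inverted outputs of a preceding AOI level directly produces the true-polarity generate signal \(G_{hi} + P_{hi} \cdot G_{lo}\), so no inverters are needed between levels.

```python
from itertools import product


def aoi21(a, b, c):
    """AOI21 (Table 1): y = NOT(a + b·c)."""
    return 1 - (a | (b & c))


def oai21(a, b, c):
    """OAI21 (Table 1): y = NOT(a·(b + c))."""
    return 1 - (a & (b | c))


def check_equation_7():
    for a, b, c in product((0, 1), repeat=3):
        # NOT AOI(x) == OAI(NOT x) and NOT OAI(x) == AOI(NOT x)
        if 1 - aoi21(a, b, c) != oai21(1 - a, 1 - b, 1 - c):
            return False
        if 1 - oai21(a, b, c) != aoi21(1 - a, 1 - b, 1 - c):
            return False
    return True


def gray_cell_inverted_inputs(g_hi_bar, p_hi_bar, g_lo_bar):
    """Gray cell on an OAI level: inputs arrive inverted from an AOI level,
    output is the true-polarity G_hi + P_hi·G_lo (by Equation (7))."""
    return oai21(g_hi_bar, p_hi_bar, g_lo_bar)


print(check_equation_7())  # True
```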

This work focuses on the PG block and its power consumption. However, the first stage of the parallel prefix adder, which computes the bitwise generate and propagate signals (Equation (2)), needs AND and EX-OR gates. The third stage, calculating the final sum, also needs an EX-OR gate. In static CMOS circuits, NAND and complex gates can be used according to Equation (7), but the design of the EX-OR gate is more problematic. It can therefore be realized as a series connection of NOR and AOI gates, \(y = \overline {\overline {a + b} + a \cdot b}\). The first stage (calculating \(P_i\) and \(G_i\)) was implemented as shown in Figure 6, and the final summation circuit is shown in Figure 7. In this work, the first and last stages are the same in all adders, so only the structures of the prefix computation (PG) block vary, and their power consumption is assessed.

Fig. 6.

Fig. 7.
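The NOR plus AOI realization of EX-OR can be verified with a four-row truth table. In the Python sketch below, `xor_nor_aoi` feeds the NOR output into the OR input of an AOI21 gate whose AND inputs are \(a\) and \(b\). The `sum_bit` helper, which applies the same structure to Equation (3), is a hypothetical illustration; the exact gate-level circuits used are the ones in Figures 6 and 7.

```python
def nor2(a, b):
    """2-input NOR."""
    return 1 - (a | b)


def aoi21(a, b, c):
    """AOI21 (Table 1): y = NOT(a + b·c)."""
    return 1 - (a | (b & c))


def xor_nor_aoi(a, b):
    """EX-OR as a NOR gate followed by an AOI21 gate: y = NOT(NOT(a + b) + a·b)."""
    return aoi21(nor2(a, b), a, b)


def sum_bit(p_i, g_prev):
    """Final stage, Equation (3): S_i = P_i XOR G_{i-1:0}, using the same structure."""
    return xor_nor_aoi(p_i, g_prev)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_nor_aoi(a, b))
```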

To build various PG blocks, a set of gates was designed so as to allow easy and fast creation of the final adder layouts. In addition to the inverter and the NAND and NOR gates, AOI and OAI complex gates were designed. Because the PG blocks analyzed here are built from cells of various valencies, the complex gates have many inputs (see Equation (5)). The boundary case is the one-level adder, the Carry Look Ahead adder, which for n bits needs a valency-n cell. This results in \(2n - 1\) complex gate inputs (e.g., 11 inputs for a 6-bit adder). The schematic diagrams of AOI and OAI gates show that their pull-up and pull-down networks are dual: the pull-up network in AOI has the same topology as the pull-down network in OAI, and vice versa. This duality is also visible in the equations that describe the complex gates (Table 1). Example schematics of the complex gates of the valency-3 cell are shown in Figure 8. To reduce the capacitance of the gate output node, the number of transistors connected to it was kept as small as possible.

Table 1.

Cell name	Function	Number of transistors	Layout width [μm]
AOI21	\(y = \overline {a + b \cdot c}\)	6	3.48
AOI2111	\(y = \overline {a + b \cdot ( {c + d \cdot e} )}\)	10	5.20
AOI211111	\(y = \overline {a + b \cdot ( {c + d \cdot ( {e + f \cdot g} )} )}\)	14	6.92
AOI21111111	\(y = \overline {a + b \cdot ( {c + d \cdot ( {e + f \cdot ( {g + h \cdot i} )} )} )}\)	18	8.64
AOI2111111111	\(y = \overline {a + b \cdot ( {c + d \cdot ( {e + f \cdot ( {g + h \cdot ( {i + j \cdot k} )} )} )} )}\)	24	10.36
OAI21	\(y = \overline {a \cdot ( {b + c} )}\)	6	3.88
OAI2111	\(y = \overline {a \cdot ( {b + c \cdot ( {d + e} )} )}\)	10	5.60
OAI211111	\(y = \overline {a \cdot ( {b + c \cdot ( {d + e \cdot ( {f + g} )} )} )}\)	14	7.33
OAI21111111	\(y = \overline {a \cdot ( {b + c \cdot ( {d + e \cdot ( {f + g \cdot ( {h + i} )} )} )} )}\)	18	9.04
OAI2111111111	\(y = \overline {a \cdot ( {b + c \cdot ( {d + e \cdot ( {f + g \cdot ( {h + i \cdot ( {j + k} )} )} )} )} )}\)	24	10.76

Table 1. Complex Gates Designed for Creation of Adders
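The naming and structure in Table 1 follow a regular pattern that can be generated programmatically. The Python sketch below (the function name is hypothetical) builds the name, input list, and nested function of the AOI gate for a valency-n generate cell, which has \(2n - 1\) inputs. It reproduces the AOI rows of Table 1 for \(n = 2\) to \(6\); transistor counts and layout widths are not modelled.

```python
def aoi_chain(valency):
    """Name, inputs and function of the AOI gate for a valency-n generate cell.

    Follows the pattern of Table 1: AOI21, AOI2111, ... with
    y = NOT(a + b·(c + d·(...))) and 2n - 1 inputs.
    """
    if valency < 2:
        raise ValueError("valency must be at least 2")
    k = 2 * valency - 1                                  # number of gate inputs
    inputs = [chr(ord("a") + i) for i in range(k)]
    expr = f"{inputs[k - 3]} + {inputs[k - 2]}·{inputs[k - 1]}"
    for i in range(k - 5, -1, -2):                       # wrap outwards: x + y·(...)
        expr = f"{inputs[i]} + {inputs[i + 1]}·({expr})"
    return "AOI2" + "1" * (k - 2), inputs, f"NOT({expr})"


for n in range(2, 7):
    name, inputs, func = aoi_chain(n)
    print(f"{name:<14} {len(inputs):>2} inputs  y = {func}")
```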

Fig. 8.

Another problem related to the layout design of the gates is finding an order of inputs that allows all NMOS transistors and all PMOS transistors to be placed on a single unbroken strip of the respective diffusion layer. This reduces the area occupied by the gate layout. To solve the problem, a common Euler path must be found for both the NMOS and the PMOS networks. Such paths were found for all designed gates, as is clearly visible in the gate layouts (Figures 9 and 10).

Fig. 9.

Fig. 10.
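The search for a common Euler path can be sketched for AOI21 by brute force. In the Python sketch below, each transistor is an edge labelled with its gate input, connecting two circuit nodes; the internal node names `n1` and `n2` are hypothetical. Each ordering of the inputs is then tested to see whether it traces one continuous path through both the pull-down and the pull-up network. This is a simplified illustration of the classical criterion, not the layout procedure used for the cells in this work, and it is practical only for small gates.

```python
from itertools import permutations

# Series-parallel networks of AOI21, y = NOT(a + b·c), as labelled edges
# (transistor gate input -> pair of circuit nodes). Node names are hypothetical.
PULL_DOWN = {"a": ("y", "gnd"), "b": ("y", "n1"), "c": ("n1", "gnd")}
PULL_UP = {"a": ("y", "n2"), "b": ("n2", "vdd"), "c": ("n2", "vdd")}


def is_trail(edges, order):
    """True if the edges, taken in this order, form one continuous path."""
    first = edges[order[0]]
    for start in first:                      # try both orientations of the first edge
        node, ok = start, True
        for label in order:
            u, v = edges[label]
            if node == u:
                node = v
            elif node == v:
                node = u
            else:
                ok = False
                break
        if ok:
            return True
    return False


def common_euler_paths(pull_down, pull_up):
    """All input orderings that are Euler paths of both networks."""
    labels = sorted(pull_down)
    return [
        order for order in permutations(labels)
        if is_trail(pull_down, order) and is_trail(pull_up, order)
    ]


paths = common_euler_paths(PULL_DOWN, PULL_UP)
print(paths)
```

Orderings such as (b, a, c) are rejected because, in the pull-up network, transistor a is in series with the parallel pair b, c and cannot sit between them without breaking the diffusion strip.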

For practical use, all designed cells have the same height, so they can be used as standard cells, and their supply lines (vdd, gnd) are aligned with each other. The input and output terminals are created in metal 3 for easy further connections, while internal lines are made in metal 1 and occasionally in metal 2. In total, 17 gates were designed: NOT; 2-, 3-, and 4-input NAND and NOR; and the complex gates described in Table 1. The height of the cells is 5.24 μm, and their widths are given in the fourth column of Table 1.

The cell layouts were designed using Cadence Virtuoso in UMC 180 nm CMOS technology. NMOS transistors use the minimum dimensions, while the width of PMOS transistors was increased by a factor of 4.74 to obtain a symmetrical voltage transfer characteristic. Exemplary cell layouts are presented in Figure 9. The two largest layouts are presented in Figure 10, where the duality of the pull-up (PMOS) and pull-down (NMOS) networks can be observed.

After functional verification of the designed cells, netlists with parasitic elements (capacitors and resistors) were generated for further simulations and calculation of the equivalent capacitance for the extended power model of gates.

3.2 Energy Parameters of Gates Used in Adders Design

The extended power model described in Section 2.2 requires the equivalent capacitance values of a cell for all its possible driving ways, Equation (6). To cover all possible changes of the input vectors, test benches with suitable input sources were prepared. For an N-input gate there are \(2^N(2^N - 1)\) changes of the input vector, i.e., driving ways, so for gates with more inputs the simulation can take considerable time. The rise and fall times of the input signals were chosen so that short-circuit power did not occur. Moreover, since this work focuses on dynamic power dissipation, a 180 nm technology, in which static losses can be neglected, was chosen. The equivalent capacitances of all terminals of a gate were calculated and then summed to obtain the total value used to calculate the power dissipation of a circuit. It is, of course, possible to consider the individual components of power dissipation separately (input, internal, and so forth), but the total value is what matters here. Nevertheless, all components can be of interest, and they are presented in Figure 11: the first three on the left are the input equivalent capacitances, the next one corresponds to the supply terminal, and the last one is the total equivalent capacitance (the sum of all previous components).

Fig. 11.

The results of the equivalent capacitance calculation are naturally collected in tables. For example, the total equivalent capacitances of the AOI21 and OAI21 gates are presented in Table 2.

Table 2.

AOI21 (rows: present vector; columns: next vector)
Present \ Next	000	001	010	011	100	101	110	111
000	0.00	1.32	1.99	7.09	3.46	4.85	5.11	7.06
001	1.91	0.00	4.13	5.25	5.35	3.51	7.04	5.65
010	2.00	3.39	0.00	5.41	5.38	6.86	3.84	5.79
011	8.95	6.66	7.39	0.00	7.02	4.82	4.82	2.02
100	4.50	5.81	5.72	6.54	0.00	1.31	1.45	3.02
101	6.40	4.48	8.46	4.67	1.91	0.00	3.37	1.59
110	6.59	7.82	5.27	4.52	1.91	3.22	0.00	1.44
111	8.23	6.29	7.04	2.83	4.90	1.72	1.75	0.00

OAI21 (rows: present vector; columns: next vector)
Present \ Next	000	001	010	011	100	101	110	111
000	0.00	1.25	1.47	2.70	3.18	7.47	6.49	8.27
001	1.92	0.00	3.39	1.47	5.22	5.71	8.42	6.99
010	1.92	3.16	0.00	1.25	5.22	7.91	4.52	5.86
011	3.73	1.86	1.86	0.00	7.21	6.20	6.65	4.56
100	2.01	3.33	3.56	4.88	0.00	5.69	4.90	6.86
101	7.98	5.70	8.01	5.78	7.39	0.00	4.08	1.88
110	6.57	7.77	4.23	5.45	6.00	3.70	0.00	1.35
111	8.37	6.45	6.09	4.20	7.86	1.90	2.12	0.00

Table 2. Total Equivalent Capacitance \(C_{Ltot}\) Extracted for the AOI21 and OAI21 Gates [fF]
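Equation (6) reduces to a weighted sum over a table such as Table 2. The Python sketch below uses the AOI21 values from Table 2. As an example activity, it assumes that every change between two different input vectors is equally likely, so each of the \(2^3(2^3 - 1) = 56\) driving ways has probability 1/56. Any other distribution, e.g., one estimated from real data, can be passed in the same 8 × 8 layout. The function names are hypothetical.

```python
# Total equivalent capacitance of AOI21 in fF from Table 2
# (rows: present vector 000..111, columns: next vector 000..111).
AOI21_CTOT = [
    [0.00, 1.32, 1.99, 7.09, 3.46, 4.85, 5.11, 7.06],
    [1.91, 0.00, 4.13, 5.25, 5.35, 3.51, 7.04, 5.65],
    [2.00, 3.39, 0.00, 5.41, 5.38, 6.86, 3.84, 5.79],
    [8.95, 6.66, 7.39, 0.00, 7.02, 4.82, 4.82, 2.02],
    [4.50, 5.81, 5.72, 6.54, 0.00, 1.31, 1.45, 3.02],
    [6.40, 4.48, 8.46, 4.67, 1.91, 0.00, 3.37, 1.59],
    [6.59, 7.82, 5.27, 4.52, 1.91, 3.22, 0.00, 1.44],
    [8.23, 6.29, 7.04, 2.83, 4.90, 1.72, 1.75, 0.00],
]


def total_equivalent_capacitance(c_table, p_table):
    """Equation (6): C_T_equ(g) = sum over driving ways of c_t_equ(dw) * p(dw)."""
    return sum(
        c * p
        for c_row, p_row in zip(c_table, p_table)
        for c, p in zip(c_row, p_row)
    )


def uniform_changes(n_inputs):
    """Equal probability for every change between two different input vectors."""
    m = 2 ** n_inputs
    p = 1.0 / (m * (m - 1))
    return [[0.0 if i == j else p for j in range(m)] for i in range(m)]


c_uniform = total_equivalent_capacitance(AOI21_CTOT, uniform_changes(3))
print(f"C_T_equ(AOI21), uniform changes: {c_uniform:.2f} fF")
```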

Numbers collected in tables are difficult to analyze, especially for gates with more inputs. Therefore, the results for larger gates are presented as color maps. For example, the total equivalent capacitances of the AOI2111 and OAI2111 gates are shown in Figure 12, and those of a bigger gate in Figure 13. In these graphs, areas with smaller (blue) and higher (brown/red) values can be seen, arranged according to the output logic state of the gate. For AOI2111, the output is logic 1 for input vectors 0 to 10 and logic 0 for the rest; the dual situation occurs for the OAI gate. These regions are marked in the figures. For input vector changes that switch the output from 0 to 1, the equivalent capacitance takes larger values (top left), because the output capacitance is charged to the Vdd voltage. The opposite changes give lower values: the output capacitance is then discharged and less power is consumed from the supply, but the contribution of the input capacitances of the gate (bottom right) can be noticed. The first three graphs in Figure 11 show the input capacitances of the AOI21 gate, and a similar situation occurs for larger gates. Additionally, the graphs in Figure 12 show that changes of input vectors that do not affect the gate output also cause power consumption. The equivalent capacitance values there are lower, but not zero (bottom left and top right). The reason is the current flowing as a result of charging or discharging the internal parasitic capacitances of the gate. The contributions of the input capacitances are better visible in Figure 13 as a characteristic rectangle, whose size depends on the position of the switched input.

Fig. 12.

Fig. 13.
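The output-state regions of the color maps follow directly from the gate function. The Python sketch below assumes that the first input \(a\) is the most significant bit of the vector index, which is consistent with the statement above. It confirms that AOI2111 outputs logic 1 exactly for vectors 0 to 10 and counts the driving ways that switch the output up, switch it down, or leave it stable. The function names are hypothetical.

```python
from itertools import product


def aoi2111(a, b, c, d, e):
    """AOI2111 (Table 1): y = NOT(a + b·(c + d·e))."""
    return 1 - (a | (b & (c | (d & e))))


def output_states(gate, n_inputs):
    """Gate output for every input vector index; the first input is the MSB."""
    return [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]


def classify(states):
    """Count driving ways (present != next) by their effect on the gate output."""
    counts = {"0->1": 0, "1->0": 0, "stable": 0}
    for present, nxt in product(range(len(states)), repeat=2):
        if present == nxt:
            continue
        before, after = states[present], states[nxt]
        counts["stable" if before == after else f"{before}->{after}"] += 1
    return counts


states = output_states(aoi2111, 5)
print([v for v, y in enumerate(states) if y == 1])
print(classify(states))
```

More than half of the 992 driving ways (530) leave the output unchanged. These are the lower but non-zero regions of the maps, which the traditional model ignores.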

The dependencies described above and visible in Figures 12 and 13 can also be observed for the other gates with more inputs that are used to build the adders. Since subsequent gates were designed based on the previous ones, these similar patterns are a natural consequence of this approach and indicate that the gates are well designed.

4 Test Vectors Preparation

Power consumption in static CMOS circuits strongly depends on the activity profile of the input signals. The extended power model presented in Section 2.2 uses the probability of input vector changes (gate driving ways) as the measure of circuit activity. Thus, distributions of input vector changes are needed to test the adders. The simplest case is the uniform distribution, which can be used as a general or reference case; a random distribution gives similar results. However, adders often process specific kinds of data. In this work, 27 distributions of input vector changes were prepared based on assumed values of the added numbers. The following dependency scenarios between the summed data were considered. First, two series (S1, S2) of disjoint numbers were randomly generated (20,000 vectors for 4-bit adders, doubled for each additional bit). Assuming an N-bit adder, the first series included numbers from 0 to half of the range (0 \(\div\) 0.5·2^N–1) and the second included values from half of the range to the end (0.5·2^N–1 \(\div\) 2^N–1) (Figure 14). Histograms of these series for 4 bits are shown in Figure 15. Next, consecutive numbers were merged into one 2N-bit vector using all possible combinations, A = S₁, B = S₂; A = S₂, B = S₁; A = S₁, B = S₁; and A = S₂, B = S₂, giving four scenarios of input vector distributions. For the first case, the distribution of changes of the 8-bit merged vector is presented in Figure 16.

Fig. 14.

Fig. 15.

Fig. 16.

The next four cases were prepared in the same way, but new series of disjoint numbers were generated for each case. Another four distributions were created with a small overlap between the ranges, e.g., S₁ = [0 \(\div\) 60% max] and S₂ = [40% max \(\div\) max], again with new series generated for each case. In total, 12 distributions were obtained.

Another scenario considered was the addition of sinusoidal waveforms, for which fifteen cases were prepared. The distribution obtained for the first case, in which sin(t) and sin(2·t) with maximum amplitude are merged (Figure 17), is presented in Figure 18. All sinusoidal cases are described in Table 3 using simplified equations.

Fig. 17.

Fig. 18.

Table 3. Description of Sinusoidal Vector Change Distributions

Name	Simplified equation
dist_sin1	sin(t) + sin(2⋅t)
dist_sin2	sin(t) + sin(3⋅t)
dist_sin3	sin(t) + 0.5⋅sin(2⋅t)
dist_sin4	sin(t) + 0.5⋅sin(3⋅t)
dist_sin5	0.5⋅sin(t) + sin(2⋅t)
dist_sin6	sin(t) + 0.2⋅sin(t)
dist_sin7	sin(t) + sin(t−0.12)
dist_sin8	0.5⋅sin(t) + 0.5⋅sin(t−0.12)
dist_sin9	sin(t) + 0.5⋅sin(t−0.12)
dist_sin10	sin(t) + sin(t+0.12)
dist_sin11	sin(t) + cos(t)
dist_sin12	sin(t) + cos(2⋅t)
dist_sin13	sin(t) + cos(3⋅t)
dist_sin14	sin(t) + 0.5⋅cos(t)
dist_sin15	sin(t) + 0.5⋅cos(2⋅t)
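Operand streams for these cases might be produced as in the minimal sketch below. It assumes, hypothetically, that each waveform is sampled uniformly over one period of t and mapped to an unsigned N-bit code with −1 → 0 and +1 → 2^N − 1; the sampling rate and quantization actually used are not stated here, and `quantize` and `sine_operands` are illustrative names.

```python
import math


def quantize(x, n_bits):
    """Map x in [-1, 1] to an unsigned N-bit code (offset binary, assumed)."""
    max_code = 2 ** n_bits - 1
    code = round((x + 1.0) / 2.0 * max_code)
    return min(max(code, 0), max_code)  # clip to the valid code range


def sine_operands(a_wave, b_wave, n_bits, samples):
    """Sample two waveforms over one period of t and quantize each to N bits."""
    a, b = [], []
    for k in range(samples):
        t = 2 * math.pi * k / samples
        a.append(quantize(a_wave(t), n_bits))
        b.append(quantize(b_wave(t), n_bits))
    return a, b


# dist_sin1 and dist_sin11 from Table 3: A = sin(t), B = sin(2t) or cos(t)
a1, b1 = sine_operands(math.sin, lambda t: math.sin(2 * t), 4, 1000)
a11, b11 = sine_operands(math.sin, math.cos, 4, 1000)
merged1 = [(a << 4) | b for a, b in zip(a1, b1)]  # 8-bit input vectors
```

Because consecutive samples of a smooth waveform differ only slightly, most of the 2^16 possible changes of 8-bit vectors never occur, which is the sparseness visible in Figure 18.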

The input vector distributions presented above show that some vectors may never appear at an adder input (Figures 16 and 18). Nevertheless, the switching activity of every input is greater than zero. For the probability distribution shown in Figure 18, the switching activities of the inputs, starting from the LSB, are 0.60, 0.28, 0.12, 0.04, 0.30, 0.14, 0.06, and 0.02. The extended power model of gates used in this work naturally takes into account spatial and temporal correlations between signals and does not need any additional measures or coefficients.
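Per-input switching activities such as those quoted above can be derived from a distribution of vector changes. The sketch below assumes that switching activity means the probability that an input toggles between consecutive vectors (some definitions count only 0→1 transitions); the toy distribution is illustrative, not data from the article.

```python
def switching_activity(change_dist, n_bits):
    """Per-input toggle probability from P(previous -> next); index 0 is the LSB.

    alpha_i = sum over changes (u -> v) of P(u -> v) * [bit i of (u XOR v)].
    """
    activity = [0.0] * n_bits
    for (u, v), prob in change_dist.items():
        diff = u ^ v
        for i in range(n_bits):
            activity[i] += prob * ((diff >> i) & 1)
    return activity


# Toy distribution over 2-bit vectors (illustrative values only)
toy = {(0b00, 0b01): 0.5, (0b01, 0b11): 0.25, (0b11, 0b11): 0.25}
print(switching_activity(toy, 2))  # → [0.5, 0.25]
```

These marginal figures discard the joint behavior of the inputs, which is why the extended model works with the full change distribution instead.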

5 Tests

5.1 Evaluation Framework

For large digital circuits, Spice-based simulations are impractical because of the calculation time. Therefore, a two-level approach is used: gates are simulated precisely at the electrical (Spice) level to obtain their power models as accurately as possible, and digital simulations are then used to speed up the evaluation. The evaluation framework used in this work is presented in Figure 19. The left part of the figure shows the tools used to design the gates (the building blocks of adders) and to extract their energy parameters. The Cadence Spectre simulator was used for electrical simulations of the gates under all possible changes of input vectors. The results (the terminal currents) were used to calculate the equivalent capacitances of the gates in MATLAB.
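The exact definition used by the extended model is given in Section 2.2. As a hedged sketch, assuming the common definition in which the equivalent capacitance of one input-vector change is the energy drawn from the supply divided by V_dd^2 (so that C_eqv = Q / V_dd, with Q the integrated supply current), the step from a simulated current waveform to C_eqv looks like this. Unlike the framework, which uses the currents of all terminals including the inputs, this sketch uses only the supply current, and the pulse values are made up.

```python
def supply_charge(times, currents):
    """Charge drawn from the supply: trapezoidal integral of i(t) over the transition."""
    charge = 0.0
    for k in range(1, len(times)):
        charge += 0.5 * (currents[k] + currents[k - 1]) * (times[k] - times[k - 1])
    return charge


def equivalent_capacitance(times, currents, vdd):
    """C_eqv = E / Vdd^2 with E = Vdd * Q, i.e. C_eqv = Q / Vdd (assumed definition)."""
    return supply_charge(times, currents) / vdd


# Illustrative triangular supply-current pulse: 2 mA peak, 1 ns wide, Vdd = 1.0 V
t = [0.0, 0.5e-9, 1.0e-9]
i_dd = [0.0, 2e-3, 0.0]
c = equivalent_capacitance(t, i_dd, 1.0)  # about 1 pF
```

Repeating this for every (previous, next) input vector of a gate yields the kind of per-change table plotted in Figures 12 and 13.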

Fig. 19.

The second part of the framework, also implemented in MATLAB, was used to assess the adders. Its input data are the energy parameters of the gates, the adder netlists, and the probabilities of input vector changes; its final result is the total equivalent capacitance of each adder.
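The adder-level computation can be sketched as a probability-weighted sum. This minimal illustration assumes, hypothetically, that each gate's energy parameters are stored as a table of C_eqv indexed by the (previous, next) input vector of that gate. A logic simulation maps every primary-input change to the local input change of each gate, so correlations between the gate inputs are captured without extra coefficients. The `Gate` class, its fields, and the C_eqv values are illustrative, not the author's MATLAB data structures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Gate:
    func: Callable[..., int]             # Boolean function of the gate inputs
    inputs: List[str]                    # nets driving the gate, first pin = MSB of its vector
    output: str                          # net driven by the gate
    c_eqv: Dict[Tuple[int, int], float]  # C_eqv per (previous, next) gate input vector


def unpack(vector, names):
    """Split an integer input vector into named bits (first name = MSB)."""
    width = len(names)
    return {name: (vector >> (width - 1 - i)) & 1 for i, name in enumerate(names)}


def pack(nets, names):
    """Pack the values of the named nets into an integer (first name = MSB)."""
    value = 0
    for name in names:
        value = (value << 1) | nets[name]
    return value


def simulate(netlist, primary):
    """Logic simulation of a topologically ordered netlist."""
    nets = dict(primary)
    for gate in netlist:
        nets[gate.output] = gate.func(*(nets[n] for n in gate.inputs))
    return nets


def total_c_eqv(netlist, input_names, change_dist):
    """Sum over input changes u -> w of P(u -> w) times the C_eqv of every gate."""
    total = 0.0
    for (u, w), prob in change_dist.items():
        before = simulate(netlist, unpack(u, input_names))
        after = simulate(netlist, unpack(w, input_names))
        for gate in netlist:
            key = (pack(before, gate.inputs), pack(after, gate.inputs))
            total += prob * gate.c_eqv.get(key, 0.0)  # missing entries treated as zero
    return total


def inverter(a):
    """Logic inverter used in the toy example."""
    return 1 - a


# Illustrative two-inverter chain with made-up C_eqv values (fF)
chain = [
    Gate(inverter, ["a"], "n1", {(0, 1): 2.0, (1, 0): 1.0}),
    Gate(inverter, ["n1"], "y", {(0, 1): 2.0, (1, 0): 1.0}),
]
dist = {(0, 1): 0.5, (1, 1): 0.5}
print(total_c_eqv(chain, ["a"], dist))  # → 1.5
```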

5.2 Circuits Under Test

The building blocks presented in Section 3 were used to design as many parallel adders as possible. Not only the regular structures described in the literature [36, 41] but also all possible constructions using cells with valency greater than 2 were considered. As a starting point, only 4-, 5-, and 6-bit circuits were considered. In parallel prefix adders, the first stage (bitwise propagate and generate) and the third stage (final summation) can be realized in only a few ways, whereas the second stage (carry propagation) offers many possibilities. Thus, the main focus of this article is on the PG block. Following Section 2.1, the structure of the PG block is represented, for simplicity, by a dashed diagram (see Figure 3), and as many such structures as possible were created for the tests. The corresponding diagrams are shown in Figure 20 for 4-bit adders and in Figure 21 for 5-bit adders.
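The three stages and the role of cell valency can be made concrete with a small functional model. In this minimal sketch, a PG tree is encoded (hypothetically) as a list of levels, each holding cells (destination bit, source bits from high to low); a cell of valency k merges k adjacent group spans. The four 4-bit encodings are illustrative and are not claimed to match the exact sum4b_* diagrams.

```python
def bitwise_pg(a, b, n):
    """First stage: bit generate g_i = a_i AND b_i, bit propagate p_i = a_i XOR b_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    return g, p


def combine(spans):
    """Group (G, P) of adjacent spans listed from most to least significant.

    The valency of the cell is len(spans). A gray cell needs only G; a black
    cell also produces P. Both are computed here for simplicity.
    """
    G, P = spans[-1]
    for g, p in reversed(spans[:-1]):
        G, P = g | (p & G), p & P
    return G, P


def prefix_add(a, b, n, levels):
    """Evaluate a PG tree given as levels of cells (dest, [sources, high to low])."""
    g, p = bitwise_pg(a, b, n)
    node = list(zip(g, p))
    for level in levels:
        new = list(node)  # cells within one level work in parallel
        for dest, sources in level:
            new[dest] = combine([node[dest]] + [node[s] for s in sources])
        node = new
    carry = [0] + [node[i][0] for i in range(n)]  # c_0 = 0, c_(i+1) = G_(i:0)
    total = sum((p[i] ^ carry[i]) << i for i in range(n))  # third stage: s_i = p_i XOR c_i
    return total | (carry[n] << n)  # append the carry-out bit


# Some 4-bit PG trees (illustrative encodings)
RCA = [[(1, [0])], [(2, [1])], [(3, [2])]]  # serial, 3 levels
KOGGE_STONE = [[(1, [0]), (2, [1]), (3, [2])], [(2, [0]), (3, [1])]]
BRENT_KUNG = [[(1, [0]), (3, [2])], [(2, [1]), (3, [1])]]
CLA = [[(1, [0]), (2, [1, 0]), (3, [2, 1, 0])]]  # one level, valency up to 4

for tree in (RCA, KOGGE_STONE, BRENT_KUNG, CLA):
    assert all(prefix_add(a, b, 4, tree) == a + b for a in range(16) for b in range(16))
```

Counting cells per tree (`sum(len(level) for level in tree)`) gives 3 for the RCA, 5 for the Kogge-Stone-style tree, 4 for the Brent-Kung-style tree, and 3 for the CLA, which trades levels for higher-valency cells.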

Fig. 20.

Fig. 21.

The 6-bit adders are divided into two groups: the diagrams of adders built with gray cells only are shown in Figure 22, and those with both types of cells are presented in Figure 23. The structures are named consistently: “xp” means that the PG block has x levels, and the letter “c” indicates that black cells are used in the structure.

Fig. 22.

Fig. 23.

The diagrams above include realizations with various numbers of levels that use gray and black cells with valency from 2 up to 6. Some structures are regular, and others are not; among them are structures well described in the literature (e.g., in [38]).

6 Results and Discussion

All circuits described in the previous section were simulated in MATLAB for the distributions of input vector changes described in Section 4. In total, 28 cases were considered: the uniform distribution, 12 distributions based on series of numbers, and 15 based on sinusoidal signals. The results (the total equivalent capacitance of the PG tree) for all 101 circuits are collected in Table 4, Table 5, and Table 6 for 4-, 5-, and 6-bit adders, respectively. Adders containing black cells are marked in gray, and adders consisting solely of gray cells are marked in white. The values in the tables are also shown as blue bars for easier comparison. The minimum values are written in green italics and the maximum values in red bold. In addition, the minimum values excluding the ripple-carry adder (sum4b_3p7, sum5b_4p25, and sum6b_5p25) are marked in yellow, so that the parallel adders can be easily compared.

Table 4.

Table 5.

Table 6.

The equivalent capacitances collected in the tables above can be easily converted to power consumption using the well-known equation \(P = f\,C_{eqv}\,V_{dd}^{2}\), where \(C_{eqv}\) is the equivalent capacitance, \(f\) is the frequency of the circuit input vector changes, and \(V_{dd}\) is the supply voltage of the circuit. Since \(f\) and \(V_{dd}\) are the same for all implementations, \(C_{eqv}\) alone is sufficient for comparing adders with each other, and it is used in the discussion that follows.
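A short numeric check of the conversion, with illustrative values (100 fF, 100 MHz, 1.2 V) that are not taken from the tables, also shows why comparing \(C_{eqv}\) is enough: with a shared \(f\) and \(V_{dd}\), the ratio of powers equals the ratio of equivalent capacitances.

```python
def dynamic_power(c_eqv, f, vdd):
    """P = f * C_eqv * Vdd^2 (SI units: farads, hertz, volts -> watts)."""
    return f * c_eqv * vdd ** 2


# Illustrative values only: 100 fF, 100 MHz, 1.2 V
p = dynamic_power(100e-15, 100e6, 1.2)  # about 14.4 microwatts
# With f and Vdd shared, the power ratio equals the C_eqv ratio
ratio = dynamic_power(150e-15, 100e6, 1.2) / dynamic_power(100e-15, 100e6, 1.2)
```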

To show trends more clearly, the results for the 6-bit adders (Table 6) are additionally plotted in Figure 24.

Fig. 24.

The first observation from the results in the tables is that the RCA (sum4b_3p7, sum5b_4p25, and sum6b_5p25) is the best realization in many cases. This is expected, because it is built from the smallest number of transistors; on the other hand, it has the largest number of levels and, consequently, usually the largest delay. From an activity point of view, such a serial structure calculates carry signals only when necessary. The RCA could therefore be treated as a borderline case, but, as the test results show, this boundary is not sharp. The second-best realization (marked in yellow) is usually only a few percent worse than the RCA, mostly by about 1% or 2%. These differences are larger for 4-bit adders, reaching up to 7.9% for the sin11 distribution. Moreover, the RCA is not the best solution in all cases: in seven to eight of the 28 input activity scenarios considered, it turned out to be worse, and in the remaining cases it is only slightly better than the second-best solution.

The data, especially for 5- and 6-bit adders, show that structures using black cells generally consume more power. This is clearly visible in the middle and right parts of the graphs in Figure 24 (starting from sum6b_2p18c). However, there are exceptions in which implementations with black cells consume about as much power as those with gray cells only, for example, sum5b_2p7c, sum5b_3p17c, sum6b_3p33c, sum6b_3p42c, and sum6b_3p47c. These adders have a similar structure, only one black cell, and little redundancy. A further exception is sum6b_2p34c, which has two black cells and more redundancy than the adders mentioned above, yet it has better power properties for the sinusoidal distributions than for the serial ones.

Generally, power consumption for the sinusoidal distributions (“sinN”) is lower than for the serial distributions (“serN”); the exception is “ser4,” which is comparable with the sinusoidal cases. In addition, for the sinusoidal distributions, some adders with black cells consume about as much power as adders with gray cells only. This is caused by the sparse distribution of input vector changes (see Figure 18): only a small portion of the possible vector changes occur, so structures with black cells can have power consumption similar to that of structures consisting of gray cells only. This happens when the adder contains only one black cell (sum6b_21c, _23c, _32c). Therefore, if black cells are to be used, such structures require careful analysis of activity and power consumption. The adder structure should be carefully adapted to the nature of the input vector changes; a more detailed analysis giving insight into the activity inside the adder structure is possible, but the present analysis is already useful for low-power design. The overall conclusion from these observations is that black cells should be avoided, because they perform redundant calculations and ultimately increase power consumption.

The number of levels has a smaller influence on power consumption than the structure of an adder, in particular the presence of black cells. Nonetheless, adders with more levels tend to consume less power. The reason is that an adder with fewer levels requires cells with higher valency, which have more transistors and usually consume more power, although some exceptions can be found; again, this depends on the type of data being summed. Interestingly, the CLA, considered very fast because it has a single level, has quite high power consumption, though lower than that of adders with many black cells. It uses the most complex gates, and the carry signal of each bit is calculated separately. As a result, the input signals sometimes change without affecting the output, charging and discharging the capacitances of internal nodes and consuming energy. The extended model used in this work captures this phenomenon well.

Among the analyzed structures, two adders known from the literature can be found: the Brent-Kung adder (sum4b_2p5c, sum5b_3p18c, and sum6b_3p15c) and the Kogge-Stone adder (sum4b_2p6c, sum5b_3p23c, and sum6b_3p11c). Neither is the best realization in terms of power consumption for the input activity scenarios considered. The Kogge-Stone adders are worse because they have many nodes and more redundancy: at the first level there are many valency-2 cells that switch unproductively, producing signal changes that do not propagate to the circuit outputs. Modifying the Kogge-Stone adder by shifting some calculations to the second level (which yields the Brent-Kung adder) can reduce the power consumption by one-third. The adder sum6b_49c, a modified version of the Kogge-Stone adder with fewer cells but still some redundant calculations, consumes about 10 \(\div\) 23% less power than the Kogge-Stone adder. Therefore, the lowest power consumption is obtained with non-regular structures that contain no redundant nodes.

Table 7 compares the best and worst realizations of the adder (excluding the RCA) for each input activity scenario, as the ratio of the minimum to the maximum total equivalent capacitance. The differences increase with adder size, because larger adders allow a wider range of more complex structures. The results show that choosing the proper structure can reduce the power consumed by the PG tree of an adder by several tens of percent (up to 60%).

Table 7. Min/Max Total Equivalent Capacitance for Adders [%] (RCA Excluded)

Adder size	unif	ser1	ser2	ser3	ser4	ser5	ser6	ser7	ser8	ser9	ser10	ser11	ser12	sin1	sin2	sin3	sin4	sin5	sin6	sin7	sin8	sin9	sin10	sin11	sin12	sin13	sin14	sin15
4-bit	59	58	58	71	71	59	58	63	62	58	58	61	62	72	70	67	67	66	69	67	63	67	67	70	71	64	64	73
5-bit	51	49	49	62	63	49	49	56	56	50	50	53	54	55	58	58	60	54	61	54	53	57	54	60	57	54	54	58
6-bit	44	43	43	49	49	42	42	46	46	43	43	45	45	42	44	40	42	41	43	44	41	45	44	43	44	42	41	44
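Because each entry in Table 7 is a min/max ratio, the achievable saving in a scenario is 100% minus that ratio. The short script below, which uses only the values from Table 7, recovers the largest saving per adder size and the scenario in which it occurs.

```python
COLUMNS = ["unif"] + [f"ser{i}" for i in range(1, 13)] + [f"sin{i}" for i in range(1, 16)]

# Min/max total equivalent capacitance ratios [%] from Table 7 (RCA excluded)
TABLE_7 = {
    "4-bit": [59, 58, 58, 71, 71, 59, 58, 63, 62, 58, 58, 61, 62,
              72, 70, 67, 67, 66, 69, 67, 63, 67, 67, 70, 71, 64, 64, 73],
    "5-bit": [51, 49, 49, 62, 63, 49, 49, 56, 56, 50, 50, 53, 54,
              55, 58, 58, 60, 54, 61, 54, 53, 57, 54, 60, 57, 54, 54, 58],
    "6-bit": [44, 43, 43, 49, 49, 42, 42, 46, 46, 43, 43, 45, 45,
              42, 44, 40, 42, 41, 43, 44, 41, 45, 44, 43, 44, 42, 41, 44],
}


def largest_reduction(ratios):
    """Best-case saving of the best vs worst structure: 100% minus the smallest ratio."""
    idx = min(range(len(ratios)), key=ratios.__getitem__)  # first minimum
    return 100 - ratios[idx], COLUMNS[idx]


for size, ratios in TABLE_7.items():
    saving, column = largest_reduction(ratios)
    print(f"{size}: up to {saving}% lower C_eqv ({column})")
```

The savings grow from 42% (4-bit) to 51% (5-bit) and 60% (6-bit, sin3), which is the trend with adder size described above.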

7 Conclusions

The first conclusion is that, in adder structures with more levels, redundancy should be avoided: the same signals should not be calculated again, and, if possible, disjoint input groups should be used. The second is to use as few nodes as possible while keeping delay (the number of levels) in mind. Moreover, using cells with higher valency does not guarantee a reduction in power consumption. The best low-power structure of the PG logic in a PPA depends on the kind of processed data; thus, low-power design requires a careful and detailed analysis of the adder structure with respect to the expected data scenarios.

In this article, only dynamic power dissipation was considered; static power consumption will be taken into account in future work. Dynamic power consumption remains important, especially in fast arithmetic circuits. The analysis started with the smallest adders to understand the relationship between their structure and power consumption. The extended power model is helpful for this purpose because it takes into account the internal spatio-temporal dependencies, at the cost of somewhat more memory during computation. Although this topic needs further work, the conclusions from the obtained results can already be applied in low-power designs. This article considers both the commonly known regular PPA structures and irregular structures that have not been analyzed so far. The results show the potential of irregular structures, especially for specific input data distributions. The key finding of the research is that non-standard, unbalanced PPAs can outperform regular PPAs when dealing with specific datasets.

References

[1] N. H. E. Weste and K. Eshraghian. 1993. Principles of CMOS VLSI Design (2nd ed.). Addison-Wesley.
