

# A LOW POWER HIGH SPEED HYBRID HIGH RADIX ENCODING FOR GENERATING THE PARTIAL PRODUCTS OF A SIGNED MULTIPLIER

G.Aruna kumari<sup>1</sup>, K. Gowthami<sup>2</sup>, G. Mahesh<sup>3</sup>

<sup>1</sup>Assistant Professor, <sup>2</sup>Assistant Professor, <sup>3</sup>Assistant Professor, ECE Department, Anantha Lakshmi Institute of Technology and Sciences, Ananthapuramu, Andhra Pradesh, India.

ABSTRACT—Approximate computing forms a design alternative that exploits the intrinsic error resilience of various applications and produces energy-efficient circuits with small accuracy loss. In this paper, we propose an approximate hybrid high radix encoding for generating the partial products in signed multiplications that encodes the most significant bits with the accurate radix-4 encoding and the least significant bits with an approximate higher radix encoding. The approximations are performed by rounding the high radix values to their nearest power of two. The proposed technique can be configured to achieve the desired energy-accuracy tradeoffs. Compared with the accurate radix-4 multiplier, the proposed multipliers deliver up to 56% energy and 55% area savings when operating at the same frequency, while the imposed error is bounded by a Gaussian distribution with near-zero average. Moreover, the proposed multipliers are compared with state-of-the-art inexact multipliers, outperforming them by up to 40% in energy consumption for similar error values. Finally, we demonstrate the scalability of our technique.

Index Terms—Approximate computing, error resiliency, low power, radix encodings, signed multipliers.

## **1. INTRODUCTION**

In modern embedded systems and data centers, energy efficiency is a critical design concern. Considering that a large number of application domains exhibit an inherent error tolerance, e.g., digital signal processing (DSP), image processing, data analytics, and data mining [1], [2], approximate computing appears as an effective solution to reduce their power dissipation. In approximate computing, error has been viewed as a commodity that can be traded for significant gains in cost (e.g., power, energy, and performance) [3], and therefore, it constitutes a promising design paradigm targeting energy-efficient systems by greatly decreasing the power consumption of inherently error-resilient applications. Specifically, approximate computing exploits the innate error tolerance of the respective applications and deliberately relaxes the correctness of some computations, in order to decrease their power consumption and/or accelerate their execution.

Recently, aiming to take advantage of these benefits, significant research has been conducted in the field of approximate hardware circuits. The main targets are arithmetic units, e.g., adders and multipliers, which are the core components in many embedded devices and hardware accelerators. Extensive research is reported on approximate adders [4]–[8], providing significant gains in terms of delay and power dissipation. However, research activity on approximate multipliers [9]–[21] is less comprehensive than that on approximate adders. In inexact multipliers, approximations can be applied on the partial product generation [16]–[18], as well as on the partial product accumulation [9]–[11], [13]. Approximations on the partial product generation and approximations on their accumulation are synergistic, and can be applied in collaboration in order to achieve higher power reduction [17], [18], [22]. Although significant research has been conducted on the partial product accumulation, research activity on the approximation of the partial product generation is still limited. Finally, another limitation of the existing approximate multipliers is that the majority of them (see [9], [10], [12], [19], [20]) do not examine signed multiplication.

In this paper, targeting the design of inexact multipliers by applying approximations on the partial product generation, we propose a novel approximate hybrid high radix encoding. In the proposed technique, the most significant bits (MSBs) of the multiplicand are encoded using the radix-4 encoding, whereas the $k$ least significant bits (LSBs) are encoded using a radix-$2^k$ encoding (with $k \ge 4$). To simplify the increased complexity induced by the proposed hybrid encoding, the circuit for generating the partial products is approximated by altering its truth table accordingly. Thus, the number of partial products decreases significantly and simpler tree architectures are used for their accumulation, reducing the multiplier's energy consumption, area, and critical path delay. The major contributions of this paper are summarized as follows.

1) We propose and enable the application of hybrid high radix encodings for the generation of energy-efficient approximate multipliers, overcoming the increased hardware complexity of very high radix encodings.

2) The proposed technique can be applied to any multiplier architecture and is configurable, enabling the user to select the optimal per-application energy-error tradeoff.

3) An analytical error analysis is conducted, proving that the output error of the proposed technique is bounded and predictable. Such a rigorous error analysis leads to precise and a priori error estimation for any input distribution, without the need for time-consuming simulations.

4) We show that the proposed technique outperforms related state-of-the-art approximate signed multipliers in terms of hardware and accuracy, achieving up to 40% less energy dissipation for similar error values.

More specifically, the proposed technique is applied to a $16 \times 16$ bit multiplier and is evaluated using industrial-strength tools, i.e., Synopsys Design Compiler, PrimeTime, and Mentor Graphics ModelSim. Compared with the accurate multiplier, the proposed technique delivers up to 55% energy and area reduction, for a mean relative error of up to 0.93%.

## **2. LITERATURE SURVEY**

V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, "Analysis and characterization of inherent application resilience for approximate computing" [1]: Approximate computing is an emerging design paradigm that enables highly efficient hardware and software implementations by exploiting the inherent resilience of applications to inexactness in their computations. Previous work in this area has demonstrated the potential for significant performance and energy improvements, but largely consists of ad hoc techniques that have been applied to a small number of applications. Taking approximate computing closer to mainstream adoption requires (i) a deeper understanding of inherent application resilience across a broader range of applications, (ii) tools that can quantitatively establish the inherent resilience of an application, and (iii) methods to quickly assess the potential of various approximate computing techniques for a given application. The authors make two key contributions in this direction. Their primary contribution is the analysis and characterization of inherent application resilience present in a suite of 12 widely used applications from the domains of recognition, data mining, and search. Based on this analysis, they present several new insights into the nature of resilience and its relationship to various key application characteristics. To facilitate the analysis, they propose a systematic framework for Application Resilience Characterization (ARC) that (a) partitions an application into resilient and sensitive parts and (b) characterizes the resilient parts using approximation models that abstract a wide range of approximate computing techniques. These insights can help shape further research in the area of approximate computing, while automatic resilience characterization frameworks such as ARC can greatly aid designers in the adoption of approximate computing.

## **3. PROPOSED WORK**

### 1. Related Work

In this section, prior works in the field of approximate multipliers, related to our proposed design, are discussed. As far as approximate adders are concerned, [4] produces approximate adders by simplifying the logic used in the adder. Gupta et al. [5] design imprecise full adder cells by approximating their logic function and then use them to build approximate adders. Nevertheless, it is not clear how these adders can be used in different tree architectures and how the error scales in the case of multi-operand accumulation. Reference [6] proposes an approximate adder that consists of an accurate and an inaccurate part. Reference [7] designs a multiplier in which the LSBs of the additions are approximated by applying bitwise OR to the respective input bits. In [8], a fast approximate adder is produced by limiting the carry propagation, based on a proof that the longest carry chain in an n-bit adder is $\log n$. However, despite the fact that all these techniques demonstrate the benefit of approximate computing, their fixed functionality and low-level design limit further improvements in efficiency.

Regarding the approximations in multiplication schemes, Kulkarni et al. [9] propose an underdesigned $2 \times 2$ inaccurate multiplier block to produce the partial products, and then use it as a building block to design larger multipliers. This technique is characterized by high error and by hardware overhead for the error detection and correction, due to the small building block. Momeni et al. [10] propose two approximate 4:2 compressors to accumulate the partial products by modifying the respective accurate truth table, and then use them to build approximate multipliers. The negative aspect of this technique is that there is no parameter to adjust the accuracy of the multiplier. Lin and Lin [11] introduce a high-accuracy approximate $4 \times 4$ Wallace tree multiplier by employing a 4:2 approximate counter that reduces the partial product stages, and then use it to build larger multipliers. However, the delay is large due to the array structure of the multiplier. Kyaw et al. [12] propose a multiplier that divides the operands in two parts: an accurate multiplication-based part that includes the MSBs and an approximate nonmultiplication-based part for the LSBs that does not generate partial products. This design delivers significant reductions in power and delay, but only for specific input combinations. Moreover, Liu et al. [13] propose an approximate multiplier with configurable error recovery that uses inaccurate fast adders for the partial product additions. Although this multiplier delivers low power, the error imposed cannot be predicted, as it depends on the carry propagation. Narayanamoorthy et al. [14] statically split the operands in three m-bit segments and perform the multiplication utilizing the segment that contains the most significant nonzero bit. However, this approach exhibits limited scalability, as m should be at least half the operand bit-width in order to keep the accuracy within acceptable limits. Reference [15] extended this idea to enable dynamic range multiplications. The dynamic partition technique requires extra components for the signed multiplications, adding significant hardware overhead. Recently, Zendegani et al. [21] proposed a multiplier that rounds the input operands to the nearest exponent of two. Finally, [23] replaces the floating-point operations with fixed-point ones and, by applying the proposed stochastic rounding, achieves good accuracy results in training deep neural networks while delivering high energy savings by limiting the data precision representation.

The modified Booth encoding is commonly used in signed multipliers [16]–[18], [24], [25]. Although these techniques perform fast multiplications, the number of partial products is not reduced in most cases, in contrast with our design. Zervakis et al. [16] introduce the partial product perforation technique, where they omit the generation of some partial products based on the modified Booth encoding. Jiang et al. [17] propose an approximate radix-8 Booth multiplier that uses an approximate adder for producing $\pm 3A$, and combine this idea with the truncation method. Recently, Liu et al. [18] designed approximate modified Booth encoders.

Another significant aspect of this paper is that the error imposed depends only on the configuration parameter $k$, and as a result, it can be calculated without the need for exhaustive simulations. Consequently, a precise estimation of the output quality can be extracted for the application's inputs, giving the flexibility to target the maximum energy reduction for a specific error bound.

### 2. Approximate Hybrid High Radix Multipliers

High radix encodings reduce the number of partial products, and as a result, their accumulation requires smaller trees, leading to energy, area, and/or delay savings. However, high radix encodings require complex encoding and partial product generation circuits, thus negating the benefits of the partial product reduction. In this section, the proposed hybrid high radix encoding and the approximations performed for simplifying its circuit complexity are presented. In the proposed technique, the multiplicand $B$ is encoded using the approximate high radix encoding, generating $\tilde{B}$, and the approximate multiplication $A \cdot \tilde{B}$ is performed. Finally, its adaptation to inexact 16-bit hardware multipliers is described, and a qualitative analysis is conducted to estimate the potential area gains.

#### A. Hybrid High Radix Encoding

In the proposed hybrid high radix encoding, $B$ is divided into two groups: the MSB part of $n-k$ bits and the LSB part of $k$ bits. The configuration parameter $k \ge 4$ is an even number, namely, $k = 2m$ with $m \in \mathbb{Z}$ and $m \ge 2$. The MSB part is encoded using the radix-4 (modified Booth) encoding, while the LSB part is encoded with the high radix-$2^k$ encoding:

$$B = -b_{n-1}2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i = \sum_{j=k/2}^{n/2-1} y_j^{R4} 4^j + y_0^{R2^k}, \quad k \ge 4 \tag{1}$$

where

$$y_j^{R4} = -2b_{2j+1} + b_{2j} + b_{2j-1} \tag{2}$$

and

$$y_0^{R2^k} = -2^{k-1}b_{k-1} + 2^{k-2}b_{k-2} + \dots + b_0. \tag{3}$$
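As a quick sanity check of (1)-(3), the following Python sketch encodes a signed n-bit integer into its radix-4 digits and its single radix-$2^k$ digit, and then recombines them. The function names (`to_bits`, `hybrid_encode`, `decode`) are illustrative and do not come from the paper; only the digit formulas follow the equations above.

```python
def to_bits(value, n):
    """Two's-complement bits b[0..n-1] of a signed n-bit integer."""
    return [(value >> i) & 1 for i in range(n)]


def radix4_digit(b, j):
    """Eq. (2): y_j = -2*b[2j+1] + b[2j] + b[2j-1]."""
    return -2 * b[2 * j + 1] + b[2 * j] + b[2 * j - 1]


def radix2k_digit(b, k):
    """Eq. (3): the k LSBs interpreted as a signed k-bit number."""
    return -(b[k - 1] << (k - 1)) + sum(b[i] << i for i in range(k - 1))


def hybrid_encode(value, n=16, k=6):
    """Exact hybrid encoding of Eq. (1): radix-4 digits for j = k/2 .. n/2-1
    plus one radix-2^k digit for the k LSBs."""
    b = to_bits(value, n)
    msb_digits = {j: radix4_digit(b, j) for j in range(k // 2, n // 2)}
    return msb_digits, radix2k_digit(b, k)


def decode(msb_digits, lsb_digit):
    """Recombine the digits according to the right-hand side of Eq. (1)."""
    return sum(y * 4 ** j for j, y in msb_digits.items()) + lsb_digit
```

For example, with n = 16 and k = 6, `hybrid_encode(1000)` yields $y_5^{R4} = 1$ (all other radix-4 digits are 0) and $y_0^{R64} = -24$, consistent with $1000 = 4^5 - 24$.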

The radix-4 encoding includes $(n-k)/2$ digits $y_j^{R4} \in \{0, \pm 1, \pm 2\}$, while $y_0^{R2^k} \in \{0, \pm 1, \pm 2, \pm 3, \dots, \pm(2^{k-1}-1), -2^{k-1}\}$ corresponds to the radix-$2^k$ encoding. Overall, $B$ is encoded with $(n-k)/2+1$ digits. The above hybrid high radix encoding is characterized by increased logic complexity, due to the high radix values of $y_0^{R2^k}$ that are not powers of two, and thus, an approximate version is proposed. However, in order to retain high accuracy, the radix-4 encoding of the MSBs is performed accurately. In particular, in the approximate encoding, all the values that are not powers of two, as well as the $k-4$ smallest powers of two, are rounded to the nearest of the four largest powers of two or 0, so that the sum of all the values of the approximate digit $\hat{y}_0^{R2^k}$ is 0. We choose to keep only the four largest powers of two, so that the radix-$2^k$ encoding circuit requires only about double the area of the accurate radix-4 encoder. Therefore, $B$ is approximately encoded as follows:

$$\tilde{B} = \sum_{j=k/2}^{n/2-1} y_j^{R4} 4^j + \hat{y}_0^{R2^k}, \quad k \ge 4 \tag{4}$$

where

$$y_j^{R4} \in \{0, \pm 1, \pm 2\} \tag{5}$$

and

$$\hat{y}_0^{R2^k} \in \{0, \pm 2^{k-4}, \pm 2^{k-3}, \pm 2^{k-2}, \pm 2^{k-1}\}. \tag{6}$$

| Input $b_{2j+1}$ | Input $b_{2j}$ | Input $b_{2j-1}$ | R4 digit $y_j^{R4}$ | Output $sign_j$ | Output $\times 2_j$ | Output $\times 1_j$ |
|------------------|----------------|------------------|---------------------|-----------------|---------------------|---------------------|
| 0                | 0              | 0                | 0                   | 0               | 0                   | 0                   |
| 0                | 0              | 1                | 1                   | 0               | 0                   | 1                   |
| 0                | 1              | 0                | 1                   | 0               | 0                   | 1                   |
| 0                | 1              | 1                | 2                   | 0               | 1                   | 0                   |
| 1                | 0              | 0                | -2                  | 1               | 1                   | 0                   |
| 1                | 0              | 1                | -1                  | 1               | 0                   | 1                   |
| 1                | 1              | 0                | -1                  | 1               | 0                   | 1                   |
| 1                | 1              | 1                | 0                   | 1               | 0                   | 0                   |

TABLE I Accurate Radix-4 Encoding Table

TABLE II Approximate Radix-$2^k$ Encoding Table

| R$2^k$ digit $y_0^{R2^k}$                    | $\hat{y}_0^{R2^k}$ | sign | $\times 2^{k-1}$ | $\times 2^{k-2}$ | $\times 2^{k-3}$ | $\times 2^{k-4}$ |
|----------------------------------------------|--------------------|------|------------------|------------------|------------------|------------------|
| $[0, 2^{k-5})$                               | 0                  | 0    | 0                | 0                | 0                | 0                |
| $[2^{k-5}, 2^{k-4}+2^{k-5})$                 | $2^{k-4}$          | 0    | 0                | 0                | 0                | 1                |
| $[2^{k-4}+2^{k-5}, 2^{k-3}+2^{k-4})$         | $2^{k-3}$          | 0    | 0                | 0                | 1                | 0                |
| $[2^{k-3}+2^{k-4}, 2^{k-2}+2^{k-3})$         | $2^{k-2}$          | 0    | 0                | 1                | 0                | 0                |
| $[2^{k-2}+2^{k-3}, 2^{k-1})$                 | $2^{k-1}$          | 0    | 1                | 0                | 0                | 0                |
| $[-2^{k-1}, -2^{k-2}-2^{k-3})$               | $-2^{k-1}$         | 1    | 1                | 0                | 0                | 0                |
| $[-2^{k-2}-2^{k-3}, -2^{k-3}-2^{k-4})$       | $-2^{k-2}$         | 1    | 0                | 1                | 0                | 0                |
| $[-2^{k-3}-2^{k-4}, -2^{k-4}-2^{k-5})$       | $-2^{k-3}$         | 1    | 0                | 0                | 1                | 0                |
| $[-2^{k-4}-2^{k-5}, -2^{k-5})$               | $-2^{k-4}$         | 1    | 0                | 0                | 0                | 1                |
| $[-2^{k-5}, 0)$                              | 0                  | 1    | 0                | 0                | 0                | 0                |



Table I presents the accurate radix-4 encoding. The output signals sign<sub>j</sub>,  $\times 1_j$ , and  $\times 2_j$  define the radix-4 digit  $y_j^{R4}$ . Their logic equations are the following:

$$sign_{j} = b_{2j+1} \tag{7}$$

$$\times 1_j = b_{2j-1} \oplus b_{2j} \tag{8}$$

$$\times 2_{j} = (b_{2j+1} \oplus b_{2j}) \cdot \overline{(b_{2j-1} \oplus b_{2j})}. \tag{9}$$
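The signals of Table I can be checked exhaustively against the digit definition in (2). The short Python sketch below does this for all eight input combinations; the function names are illustrative, and the $\times 2_j$ signal is written with the second factor complemented, which is the form that reproduces the $\times 2_j$ column of Table I.

```python
from itertools import product


def radix4_signals(b_hi, b_mid, b_lo):
    """Encoder outputs for the bit triple (b_{2j+1}, b_{2j}, b_{2j-1})."""
    sign = b_hi
    x1 = b_lo ^ b_mid
    # Second factor complemented so that only 011 and 100 select 2A (Table I).
    x2 = (b_hi ^ b_mid) & (1 - (b_lo ^ b_mid))
    return sign, x1, x2


def digit_from_signals(sign, x1, x2):
    """Rebuild the radix-4 digit from the one-hot magnitude and the sign."""
    magnitude = x1 + 2 * x2
    return -magnitude if sign else magnitude


def check_table_one():
    """Return True when the signals agree with Eq. (2) for all 8 inputs."""
    return all(
        digit_from_signals(*radix4_signals(hi, mid, lo)) == -2 * hi + mid + lo
        for hi, mid, lo in product((0, 1), repeat=3)
    )
```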

Table II presents the approximate radix- $2^k$  encoding. The logic equations of the encoding signals that define the radix- $2^k$  digit  $\hat{y}_0^{R2^k}$  are the following:

$$\operatorname{sign} = b_{k-1} \tag{10}$$

$$\times 2^{k-4} = (\overline{b}_{k-1} \cdot \overline{b}_{k-2} \cdot \overline{b}_{k-3} + b_{k-1} \cdot b_{k-2} \cdot b_{k-3}) \cdot (b_{k-4} \oplus b_{k-5}) \tag{11}$$

$$\begin{aligned} \times 2^{k-3} &= \overline{b}_{k-1} \cdot \overline{b}_{k-2} \cdot (\overline{b}_{k-3} \cdot b_{k-4} \cdot b_{k-5} + b_{k-3} \cdot \overline{b}_{k-4}) \\ &\quad + b_{k-1} \cdot b_{k-2} \cdot (b_{k-3} \cdot \overline{b}_{k-4} \cdot \overline{b}_{k-5} + \overline{b}_{k-3} \cdot b_{k-4}) \end{aligned} \tag{12}$$

$$\begin{aligned} \times 2^{k-2} &= \overline{b}_{k-2} \cdot b_{k-3} \cdot (b_{k-1} + b_{k-4}) \\ &\quad + b_{k-2} \cdot \overline{b}_{k-3} \cdot (\overline{b}_{k-1} + \overline{b}_{k-4}) \end{aligned} \tag{13}$$

$$\times 2^{k-1} = \overline{b}_{k-1} \cdot b_{k-2} \cdot b_{k-3} + b_{k-1} \cdot \overline{b}_{k-2} \cdot \overline{b}_{k-3}. \tag{14}$$
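The relation between Table II and the encoder signals can be illustrated with a small Python sketch that implements the rounding twice: once as a range lookup and once as sum-of-products logic on the five bits $b_{k-1}, \dots, b_{k-5}$. The helper names are hypothetical. The placement of the complemented literals in the logic version is an assumption chosen so that the signals reproduce the ranges of Table II, and the sketch assumes $k \ge 6$ (the values evaluated in this paper).

```python
# Ranges of Table II in units of 2^(k-5): (lower, upper, rounded value).
BINS = ((1, 3, 2), (3, 6, 4), (6, 12, 8), (12, 16, 16),
        (-16, -12, -16), (-12, -6, -8), (-6, -3, -4), (-3, -1, -2))


def approx_digit_table(y, k):
    """Round the exact signed k-bit digit y following the ranges of Table II."""
    unit = 1 << (k - 5)
    v = y // unit  # floor division: index of the 2^(k-5)-wide bin holding y
    for lower, upper, rounded in BINS:
        if lower <= v < upper:
            return rounded * unit
    return 0  # bins [-1, 0) and [0, 1) are rounded to zero


def approx_digit_logic(y, k):
    """Same rounding, expressed with the encoder signals sign and x2^(k-1..k-4)."""
    b = [(y >> i) & 1 for i in range(k)]
    b1, b2, b3, b4, b5 = (b[k - i] for i in range(1, 6))  # b_{k-1} .. b_{k-5}

    def inv(x):
        return 1 - x

    sign = b1
    x_k4 = ((inv(b1) & inv(b2) & inv(b3)) | (b1 & b2 & b3)) & (b4 ^ b5)
    x_k3 = ((inv(b1) & inv(b2) & ((inv(b3) & b4 & b5) | (b3 & inv(b4))))
            | (b1 & b2 & ((b3 & inv(b4) & inv(b5)) | (inv(b3) & b4))))
    x_k2 = ((inv(b2) & b3 & (b1 | b4))
            | (b2 & inv(b3) & (inv(b1) | inv(b4))))
    x_k1 = (inv(b1) & b2 & b3) | (b1 & inv(b2) & inv(b3))
    magnitude = ((x_k1 << (k - 1)) | (x_k2 << (k - 2))
                 | (x_k3 << (k - 3)) | (x_k4 << (k - 4)))
    return -magnitude if sign else magnitude
```

Comparing the two functions over every k-bit input is a cheap way to confirm that a given set of logic equations and a given rounding table describe the same encoder.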

The effectiveness of the approximate hybrid radix encoding technique is explored with its application to 16-bit signed numbers, for $k = 6, 8, 10$, namely, the LSBs are encoded using the radix-64, radix-256, and radix-1024 encoding, respectively.

Fig. 1. Partial product generator based on (a) accurate radix-4 encoding and the approximate (b) radix-64, (c) radix-256, and (d) radix-1024 encoding. $a_i$: $i$-th bit of operand $A$.

Fig. 2. Partial product tree based on the hybrid encoding of accurate radix-4 and approximate (a) radix-64, (b) radix-256, and (c) radix-1024 encoding. The legend distinguishes the partial product bits from the approximate high radix encoding, the partial product bits from the accurate radix-4 encoding, the inverted MSBs of the partial products, and the sign factors.

In the radix-64 encoding, the bits of B are grouped as in

$$y_7^{R4}: b_{15}b_{14}b_{13}, \quad y_6^{R4}: b_{13}b_{12}b_{11}, \quad y_5^{R4}: b_{11}b_{10}b_{9}, \quad y_4^{R4}: b_{9}b_{8}b_{7}, \quad y_3^{R4}: b_{7}b_{6}b_{5}, \quad y_0^{R64}: b_{5}b_{4}b_{3}b_{2}b_{1}b_{0}. \tag{15}$$

The values of the digit $y_0^{R64}$ that are not powers of two, i.e., $\pm 3$, $\pm 5$, $\pm 6$, $\pm 7$, $\pm 9$, ..., $\pm 15$, $\pm 17$, ..., $\pm 31$, are rounded to $\pm 4$, $\pm 8$, $\pm 16$, or $\pm 32$, while the smallest powers of two, i.e., $\pm 1$ and $\pm 2$, are rounded to 0 or $\pm 4$. In the radix-1024 encoding, the bits of B are grouped as follows:

$$y_7^{R4}: b_{15}b_{14}b_{13}, \quad y_6^{R4}: b_{13}b_{12}b_{11}, \quad y_5^{R4}: b_{11}b_{10}b_{9}, \quad y_0^{R1024}: b_{9}b_{8}b_{7}b_{6}b_{5}b_{4}b_{3}b_{2}b_{1}b_{0}. \tag{16}$$

Similarly, the nonpowers of two are rounded to $\pm 64$, $\pm 128$, $\pm 256$, or $\pm 512$, and the smallest powers of two ($\pm 1$, $\pm 2$, $\pm 4$, $\pm 8$, $\pm 16$, $\pm 32$) are rounded to 0 or $\pm 64$. The encoder's inputs are the bits $b_9, b_8, \dots, b_0$, the approximate radix-1024 digit is $\hat{y}_0^{R1024} \in \{0, \pm 64, \pm 128, \pm 256, \pm 512\}$, and the output signals that define $\hat{y}_0^{R1024}$ are sign, $\times 64$, $\times 128$, $\times 256$, and $\times 512$.
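As a worked instance of the radix-64 case ($n = 16$, $k = 6$), consider the arbitrarily chosen value $B = 1000$. The numbers below follow directly from the digit definitions in (2) and (3), with the rounding step reading the Table II row for $-2^{k-2}$ as the interval $[-24, -12)$:

$$\begin{aligned} B &= 1000 = 0000\,0011\,1110\,1000_2, \\ y_0^{R64} &= 101000_2 = -32 + 8 = -24, \\ \sum_{j=3}^{7} y_j^{R4} 4^j &= 1 \cdot 4^5 = 1024 \quad (y_5^{R4} = 1,\ \text{all other radix-4 digits are } 0), \\ \hat{y}_0^{R64} &= -16 \quad \text{since } -24 \in [-24, -12), \\ \tilde{B} &= 1024 - 16 = 1008, \qquad \tilde{B} - B = 8. \end{aligned}$$

In this example the approximation changes only the LSB digit, and the multiplicand is perturbed by 0.8% of its value.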

#### **B.** Partial Product Generation

In the proposed hybrid encoding, the $n-k$ MSBs of $B$ are encoded with the accurate radix-4 encoding, while the $k$ LSBs are encoded with an approximate radix-$2^k$ encoding. The accurate radix-4 encoder produces the signals defined in (7)–(9), whereas the approximate high radix encoder produces the signals of (10)–(14). Since the $n/2$ partial products of the accurate radix-4 multiplier are replaced by $(n-k)/2 + 1$ partial products, there is an overall reduction of $k/2 - 1$ partial products generated in the multiplication $A \cdot B$.

TABLE III Partial Products per Radix Encoding

| Radix Encoding | Partial Products                           |
|----------------|--------------------------------------------|
| Radix-4        | $0, \pm A, \pm 2A$                         |
| Radix-64       | $0, \pm 4A, \pm 8A, \pm 16A, \pm 32A$      |
| Radix-256      | $0, \pm 16A, \pm 32A, \pm 64A, \pm 128A$   |
| Radix-1024     | $0, \pm 64A, \pm 128A, \pm 256A, \pm 512A$ |

In Fig. 1, four partial product generators are presented, i.e., the circuit of the accurate radix-4 encoding and the ones of the three approximate high radix encodings discussed in Section III-A. The partial products created from each encoding are shown in Table III. In addition, the three hybrid high radix encodings create the partial product trees shown in Fig. 2. The trees also include the encoding's correction term (constant terms and sign factors). The implementation of the partial product accumulation can be chosen by the designer. In this paper, an accurate Wallace tree [26] is used to implement the partial product's sum, whereas the two outputs produced by the Wallace tree are added using a prefix (fast) adder.

Overall, the multiplication circuit consists of the stages of operand hybrid radix encoding, partial product generation, partial product accumulation, and final addition. The proposed approximate multipliers are named RAD$2^k$, showing the selected approximate high radix encoding, e.g., RAD64, RAD256, and RAD1024.
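The four stages above can be mimicked at the behavioral level with a short Python sketch. It is only a functional model under stated assumptions: the function names are hypothetical, the partial products are summed with ordinary integer addition instead of a Wallace tree and prefix adder, and the LSB digit is rounded with the ranges of Table II (assuming $k \ge 6$).

```python
# Ranges of Table II in units of 2^(k-5): (lower, upper, rounded value).
BINS = ((1, 3, 2), (3, 6, 4), (6, 12, 8), (12, 16, 16),
        (-16, -12, -16), (-12, -6, -8), (-6, -3, -4), (-3, -1, -2))


def approx_lsb_digit(y, k):
    """Approximate radix-2^k digit for the exact signed k-bit digit y."""
    unit = 1 << (k - 5)
    v = y // unit
    for lower, upper, rounded in BINS:
        if lower <= v < upper:
            return rounded * unit
    return 0


def rad_partial_products(a, b, n=16, k=6):
    """Partial products of A * B~: (n-k)/2 accurate radix-4 terms plus
    one approximate radix-2^k term."""
    bits = [(b >> i) & 1 for i in range(n)]
    products = []
    for j in range(k // 2, n // 2):
        y = -2 * bits[2 * j + 1] + bits[2 * j] + bits[2 * j - 1]
        products.append(y * a * 4 ** j)
    y0 = -(bits[k - 1] << (k - 1)) + sum(bits[i] << i for i in range(k - 1))
    products.append(approx_lsb_digit(y0, k) * a)
    return products


def rad_multiply(a, b, n=16, k=6):
    """Behavioral model of the RAD2^k multiplier (accumulation is exact)."""
    return sum(rad_partial_products(a, b, n, k))
```

With $n = 16$ and $k = 6$, `rad_multiply(3, 1000)` returns 3024 (that is, $3 \times 1008$) instead of the exact 3000, and it accumulates 6 partial products rather than the 8 of the accurate radix-4 multiplier.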

#### C. Unit Gate Model

The advantage of the approximate hybrid high radix multipliers is their simple logic, resulting in fast operation and low power consumption. Although the encoding circuits add overhead, it is insignificant because of the approximations made. This overhead is also offset by the low-area partial product generators and by the reduced number of partial products.



TABLE IV UNIT GATES AREA OF THE RADIX ENCODERS AND PARTIAL PRODUCTS GENERATORS

| Circuit                                        | Unit Gates [27] |
|------------------------------------------------|-----------------|
| Radix-4 Encoder                                | 5.5             |
| Radix-2 <sup>k</sup> Encoder                   | 41.5            |
| Radix-4 Partial Product Generator              | 5               |
| Radix-2 <sup>k</sup> Partial Product Generator | 9               |

TABLE V UNIT GATES AREA SAVINGS OF RAD MULTIPLIERS

| Multiplier Stage               | ACCR4 | RAD64  | RAD256 | RAD1024 |
|--------------------------------|-------|--------|--------|---------|
| Radix-4 Encoding               | 44    | 25     | 20     | 15      |
| Radix-2 <sup>k</sup> Encoding  | -     | 41.5   | 41.5   | 41.5    |
| Radix-4 PP Gener.              | 640   | 400    | 320    | 240     |
| Radix-2 <sup>k</sup> PP Gener. | -     | 180    | 198    | 216     |
| PP Accumulation                | 784   | 560    | 448    | 336     |
| Final Addition                 | 400   | 400    | 400    | 400     |
| <b>Total Unit Gates</b>        | 1868  | 1606.5 | 1427.5 | 1248.5  |
| Reduction %                    | 100   | 14.00  | 23.58  | 33.16   |

In order to give a theoretical evaluation of the proposed multipliers, an area gate model is included. The area evaluation is performed by using the unit gate model of [27]: an XOR-2 gate counts as 2 unit gates, an AND-2 or an OR-2 gate is equal to 1 unit gate, and a NOT gate is equal to 0.5 unit gate. According to this model, Table IV presents the number of unit gates of each circuit used in our multipliers. Overall, the accurate radix-4 n-bit multiplier uses $n/2$ radix-4 encoders and $n^2/2$ radix-4 partial product generators to produce the $n/2$ n-bit partial products. Similarly, the proposed approximate radix-$2^k$ multiplier, with $k \ge 4$, requires $(n-k)/2$ radix-4 encoders, 1 radix-$2^k$ encoder, $n(n-k)/2$ radix-4 partial product generators, and $n + k - 2$ radix-$2^k$ partial product generators.
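The circuit counts stated above can be turned into a small Python sketch that reproduces the partial product generator rows of Table V from the per-circuit costs of Table IV. The function and dictionary names are illustrative; the sketch covers only the generator rows and the "Reduction %" row, and takes the totals of Table V as given inputs rather than recomputing every stage.

```python
# Unit gates per partial product generator cell, taken from Table IV.
PPG_UNIT_GATES = {"r4_ppg": 5, "r2k_ppg": 9}


def circuit_counts(n, k=None):
    """Number of encoders and generator cells; k=None means accurate radix-4."""
    if k is None:
        return {"r4_encoder": n // 2, "r2k_encoder": 0,
                "r4_ppg": n * n // 2, "r2k_ppg": 0}
    return {"r4_encoder": (n - k) // 2, "r2k_encoder": 1,
            "r4_ppg": n * (n - k) // 2, "r2k_ppg": n + k - 2}


def ppg_unit_gates(n, k=None):
    """Unit gates of the radix-4 and radix-2^k partial product generators."""
    counts = circuit_counts(n, k)
    return (counts["r4_ppg"] * PPG_UNIT_GATES["r4_ppg"],
            counts["r2k_ppg"] * PPG_UNIT_GATES["r2k_ppg"])


def reduction_percent(total, baseline=1868):
    """Area reduction relative to the accurate radix-4 total of Table V."""
    return round(100 * (baseline - total) / baseline, 2)
```

For $n = 16$, `ppg_unit_gates(16, 6)` gives 400 and 180 unit gates, matching the RAD64 column, and `reduction_percent(1606.5)` gives 14.0.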

## **4. RESULTS**







[Figure: screenshot of a synthesis timing report. Legible fields: total number of paths / destination ports: 249 / 46; source: `B<1>`; destination: m2/count 1 (FF); data path: `B<1>` to m2/count 1; columns for fanout, gate delay, and net delay.]
| Destination Cinc<br>Data Path: BcD<br>Dell:in-Not                                                                                     | to n2/count<br>fencet                     | Gate<br>Delay                                                   | Delay<br>1.690                                        | Logical Name (Set Name)                                                                             |  |  |  |
| Destination Circ<br>Data Path: BCI><br>Cell::in->cct<br>IBUP:1->0                                                                     | te m2/count<br>fancat<br>5<br>1           | Gate<br>Delay<br>1.116<br>0.612                                 | lelay<br>1.691<br>1.505                               | Logical Name (Net Name)<br><br>B_1_IBUT (B_1_IBNT)                                                  |  |  |  |
| Destination Cloc<br>Data Path: Bcl><br>Cell:in->out<br>IBOF:1->0<br>LUT4:10->0                                                        | te m2/count<br>fanout<br>5<br>1<br>1      | f<br>Gate<br>Delay<br>1.106<br>1.612<br>1.612                   | Delay<br>1.691<br>1.515<br>1.515                      | logical Same (Set Same)<br><br>8_1_ENTF (5_1_ENTF)<br>m2/count_mam0006C3511 (m2/count_mam0006C3511) |  |  |  |
| Destination Circ<br>Data Path: B(1)<br>Cell:in-Yout<br>HUT:1-YO<br>LUT4:11-YO<br>LUT4:11-YO                                           | te m2/count<br>fenout<br>5<br>1<br>1<br>1 | -<br>Gate<br>Delay<br>1.106<br>0.612<br>0.612<br>0.612          | Delay<br>1.690<br>1.509<br>1.509<br>1.509             | logical Same (Set Same)<br>                                                                         |  |  |  |
| Destination Circ<br>Data Path: B(1)<br>Cell:in->out<br>HUT:1->0<br>LUT4:10->0<br>LUT4:10->0<br>LUT4:10->0<br>LUT4:10->0               | te m2/count<br>fenout<br>5<br>1<br>1<br>1 | -<br>Gate<br>Delay<br>1.106<br>0.612<br>0.612<br>0.612          | lelay<br>1.69<br>1.59<br>1.59<br>1.59<br>1.59<br>1.59 | Logical Same (Set Same)<br>                                                                         |  |  |  |
| Destination Circ<br>Data Fath: B(1)<br>Chil:in->out<br>HNT:1->0<br>LUT4:11->0<br>LUT4:11->0<br>LUT4:11->0<br>LUT4:11->0<br>LUT4:11->0 | te m2/count<br>fenout<br>5<br>1<br>1<br>1 | J<br>Gate<br>Delay<br>1.106<br>1.612<br>1.612<br>1.612<br>1.612 | lelay<br>1.69<br>1.59<br>1.59<br>1.59<br>1.59<br>1.59 | Logical Same (Set Same)<br>                                                                         |  |  |  |


| Cell:in->out | Fanout | Gate delay (ns) | Net delay (ns) | Logical name (net name) |
|--------------|--------|-----------------|----------------|-------------------------|
| FDRSE:C->Q   | 30     | 0.514           | 1.224          | `m4/o_0` (`m4/o_0`) |
| LUT3:I0->O   | 2      | 0.612           | 0.383          | `Sh203_SW0` (`N67`) |
| LUT4:I3->O   | 1      | 0.612           | 0.481          | `Sh183` (`Sh18_bdd2`) |
| LUT4:I2->O   | 2      | 0.612           | 0.383          | `Sh26260` (`Sh26_bdd0`) |
| LUT4:I3->O   | 1      | 0.612           | 0.426          | `Sh2611` (`Sh26`) |
| LUT2:I1->O   | 1      | 0.612           | 0.000          | `m7/Madd_s_lut<10>` (`m7/Madd_s_lut<10>`) |
| MUXCY:S->O   | 1      | 0.404           | 0.000          | `m7/Madd_s_cy<10>` (`m7/Madd_s_cy<10>`) |
| MUXCY:CI->O  | 1      | 0.052           | 0.000          | `m7/Madd_s_cy<11>` (`m7/Madd_s_cy<11>`) |
| MUXCY:CI->O  | 1      | 0.052           | 0.000          | `m7/Madd_s_cy<12>` (`m7/Madd_s_cy<12>`) |
| MUXCY:CI->O  | 1      | 0.052           | 0.000          | `m7/Madd_s_cy<13>` (`m7/Madd_s_cy<13>`) |
| MUXCY:CI->O  | 1      | 0.052           | 0.000          | `m7/Madd_s_cy<14>` (`m7/Madd_s_cy<14>`) |
| MUXCY:CI->O  | 1      | 0.052           | 0.000          | `m7/Madd_s_cy<15>` (`m7/Madd_s_cy<15>`) |
| XORCY:CI->O  | 26     | 0.699           | 1.223          | `m7/Madd_s_xor<16>` (`L<16>`) |
| LUT3:I0->O   | 2      | 0.612           | 0.449          | `m8/Sh42_SW1` (`N35`) |
| LUT3:I1->O   | 3      | 0.612           | 0.481          | `m8/Sh40` (`m8/Sh40`) |
| LUT3:I2->O   | 1      | 0.612           | 0.454          | `m8/Sh116_SW0` (`m8/Sh11220`) |
| LUT4:I3->O   | 1      | 0.612           | 0.454          | `m8/Sh116` (`D<20>`) |
| LUT4:I3->O   | 1      | 0.612           | 0.000          | `m8/mux<5>211` (`m8/mux<5>21`) |
| MUXF5:I1->O  | 1      | 0.278           | 0.357          | `m8/mux<5>21_f5` (`D<5>`) |
| OBUF:I->O    |        | 3.169           |                | `P_5_OBUF` (`P<5>`) |
| Total        |        | 17.758          |                | 11.441ns logic, 6.317ns route (64.4% logic, 35.6% route) |

## CONCLUSION

In this paper, we propose an approximate hybrid high radix encoding for generating the partial products of a signed multiplier. The MSBs of the multiplicand are encoded with the accurate radix-4 encoding, while its k LSBs are encoded with an approximate high radix-2<sup>k</sup> encoding, where k is a configuration parameter that adjusts the tradeoff between accuracy and energy consumption. The error of the proposed technique follows a Gaussian distribution with near-zero average. Compared with state-of-the-art approximate multipliers, the proposed ones constitute a better approximate design alternative, outperforming them in both energy consumption and accuracy. Moreover, we demonstrated the efficiency of the proposed multipliers when applied in real-life applications. Finally, the proposed technique is scalable, delivering higher energy savings for the same error as the multiplier's size increases.
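The hybrid encoding summarized above can be illustrated with a small behavioral model. The following Python sketch is not the hardware design of this paper; it only mimics the arithmetic under stated assumptions. The function names (`nearest_power_of_two`, `hybrid_encode`, `hybrid_multiply`) are hypothetical, the operand width `n` and the parameter `k` are assumed to be even, and the rounding rule (nearest power of two, ties rounded away from zero) is a simplification that may differ from the exact mapping used in the proposed circuits.

```python
def nearest_power_of_two(v: int) -> int:
    """Round a signed integer to the nearest power of two, keeping its sign.

    Zero stays zero; ties (e.g. 3, 6) round to the larger magnitude.
    This rounding rule is an assumption made for illustration.
    """
    if v == 0:
        return 0
    sign = 1 if v > 0 else -1
    mag = abs(v)
    lower = 1 << (mag.bit_length() - 1)  # largest power of two <= mag
    upper = lower << 1
    return sign * (lower if mag - lower < upper - mag else upper)


def bit(x: int, i: int) -> int:
    """Bit i of x in two's complement (Python ints sign-extend on >>)."""
    return (x >> i) & 1


def hybrid_encode(b: int, n: int, k: int):
    """Split an n-bit signed operand into one radix-2^k digit (k LSBs)
    and exact radix-4 (modified Booth) digits for the remaining MSBs."""
    assert n % 2 == 0 and k % 2 == 0 and 0 < k < n
    assert -(1 << (n - 1)) <= b < (1 << (n - 1))
    # High-radix digit: the k LSBs read as a signed k-bit value.
    low = b & ((1 << k) - 1)
    y0 = low - (1 << k) if low >> (k - 1) else low
    # Radix-4 digits: -2*b[2j+1] + b[2j] + b[2j-1], each in {-2, ..., 2}.
    r4 = [-2 * bit(b, 2 * j + 1) + bit(b, 2 * j) + bit(b, 2 * j - 1)
          for j in range(k // 2, n // 2)]
    return y0, r4


def hybrid_multiply(a: int, b: int, n: int, k: int, approximate: bool = True) -> int:
    """Sum the partial products a*digit*weight; optionally approximate
    the high-radix digit by its nearest power of two."""
    y0, r4 = hybrid_encode(b, n, k)
    if approximate:
        y0 = nearest_power_of_two(y0)
    product = a * y0
    for idx, d in enumerate(r4):
        product += a * d * (1 << (k + 2 * idx))  # weight 4^j with j = k/2 + idx
    return product


exact = hybrid_multiply(10, 53, n=8, k=4, approximate=False)   # 530
approx = hybrid_multiply(10, 53, n=8, k=4, approximate=True)   # 520
```

In this example the four LSBs of 53 form the high-radix digit 5, which is rounded to 4, so the approximate product differs from the exact one by 10 × (4 − 5) = −10. With `approximate=False` the model reproduces the exact product, which shows that the error comes only from rounding the single high-radix digit.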

## REFERENCES

[1] V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, "Analysis and characterization of inherent application resilience for approximate computing," in Proc. ACM/IEEE Design Autom. Conf., May 2013, pp. 1–9.



[2] S. T. Chakradhar and A. Raghunathan, "Best-effort computing: Re-thinking parallel software and hardware," in Proc. ACM/IEEE Design Autom. Conf., Jun. 2010, pp. 865–870.

[3] A. Lingamneni, C. Enz, K. Palem, and C. Piguet, "Highly energy-efficient and quality-tunable inexact FFT accelerators," in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2014, pp. 1–4.

[4] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, "Low-power digital signal processing using approximate adders," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.

[5] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, and K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," in Proc. 17th IEEE/ACM Int. Symp. Low-Power Electron. Design, Aug. 2011, pp. 409–414.

[6] N. Zhu, W. L. Goh, W. Zhang, K. S. Yeo, and Z. H. Kong, "Design of low-power high-speed truncation-error-tolerant adder and its application in digital signal processing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 8, pp. 1225–1229, Aug. 2010.

[7] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, "Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.

[8] A. K. Verma, P. Brisk, and P. Ienne, "Variable latency speculative addition: A new paradigm for arithmetic circuit design," in Proc. Design, Autom. Test Eur., Mar. 2008, pp. 1250–1255.

[9] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power with an underdesigned multiplier architecture," in Proc. 24th Int. Conf. VLSI Design, Jan. 2011, pp. 346–351.

[10] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, "Design and analysis of approximate compressors for multiplication," IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.

