
Introduction


The most accurate power analysis tools operate at the circuit level, but even with switch-level rather than device-level modelling, circuit-level tools have significant drawbacks: they are either too slow or require too much memory, which makes them impractical for large chips. Most of them are simulators, such as SPICE, that designers have used for many years as performance analysis tools. Because of these disadvantages, gate-level power estimation tools have begun to gain acceptance, and faster, probabilistic techniques have gained a foothold. This comes with a trade-off, however: the speedup is achieved at the cost of accuracy, especially in the presence of correlated signals. Over the years it has been recognized that the biggest wins in low-power design do not come from circuit- and gate-level optimizations; architecture, system, and algorithm optimizations tend to have the largest impact on power consumption. Tool developers have therefore shifted their attention toward high-level power analysis and optimization tools.


Motivation


It is well known that more significant power reductions are possible when optimizations are made at higher levels of abstraction, such as the architectural and algorithmic levels, than at the circuit or gate level.[1] This provides the motivation for developers to focus on new architectural-level power analysis tools. It does not imply that lower-level tools are unimportant; rather, each layer of tools provides a foundation upon which the next level can be built. The abstractions of the estimation techniques at a lower level can be carried to a higher level and applied again with slight modifications.

Advantages of doing power estimation at RTL or architectural level

  • The register-transfer-level (RTL) input description of the design allows the designer to make optimizations and trade-offs very early in the design flow.
  • The presence of functional blocks in an RTL description makes the complexity of the architectural design much more manageable, even for large chips, because its granularity is much coarser than that of the corresponding gate- or circuit-level descriptions.

Architecture-Level Tools


There are two primary techniques for architectural power analysis:

Gate Equivalents[2]


This technique is based on the concept of gate equivalents. The complexity of a chip architecture can be described approximately in terms of gate equivalents, where the gate-equivalent count specifies the average number of reference gates required to implement a particular function. The total power required for the function is estimated by multiplying the approximate number of gate equivalents by the average power consumed per gate. The reference gate can be any gate, e.g. a 2-input NAND gate.
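For example, assuming a hypothetical function mapped onto 5,000 gate equivalents with an assumed average power of 10 µW per reference gate (both values are illustrative only), the estimate would be

P \approx 5000 \times 10\,\mu\text{W} = 50\,\text{mW}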

Examples of the Gate Equivalent technique

  • Class-Independent Power Modeling: This technique tries to estimate chip area, speed, and power dissipation based on information about the complexity of the design in terms of gate equivalents. The functionality is divided among different blocks, but no distinction is made regarding the functionality of the blocks, i.e. it is class independent. This is the technique used by the Chip Estimation System (CES).
Steps:
  1. Identify the functional blocks such as counters, decoders, multipliers, memories, etc.
  2. Assign a complexity in terms of gate equivalents. The number of GEs for each unit type is either taken directly as an input from the user or fed from a library. The power is then estimated as:
P = \sum_{i \in \{\text{fns}\}} GE_i \left( E_{typ} + C_{L_i} V_{dd}^2 \right) f \, A_{int_i}
where E_{typ} is the assumed average energy dissipated by a gate equivalent when active. The activity factor, A_{int_i}, denotes the average percentage of gates switching per clock cycle and is allowed to vary from function to function. The capacitive load, C_{L_i}, is a combination of fan-out loading as well as wiring. An estimate of the average wire length can be used to calculate the wiring capacitance; this estimate is provided by the user and cross-checked using a derivative of Rent's Rule. A code sketch of this estimate appears after the list of assumptions below.
Assumptions:
  1. A single reference gate is taken as the basis for all the power estimates, without taking into consideration different circuit styles, clocking strategies, or layout techniques.
  2. The activity factors, i.e. the percentage of gates switching per clock cycle, are assumed to be fixed regardless of the input patterns.
  3. Typical gate switching energy is characterized by a completely random, uniform-white-noise (UWN) distribution of the input data. This implies that the power estimate is the same whether the circuit is idle or at maximum load, since the UWN model ignores how different input distributions affect the power consumption of gates and modules.[3]
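As a rough illustration of how such an estimate might be computed, the following Python sketch implements the equation above for a handful of hypothetical blocks. All block names, gate-equivalent counts, activity factors, capacitances, and technology constants are illustrative assumptions, not data from any real design or tool.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One functional block, described by hypothetical parameters."""
    name: str
    ge: int          # GE_i: complexity in reference-gate (gate-equivalent) units
    a_int: float     # A_int_i: fraction of gates switching per clock cycle
    c_load: float    # C_L_i: average capacitive load per gate equivalent (F)

# Assumed technology constants (illustrative values only).
E_TYP = 0.1e-12   # E_typ: average switching energy per gate equivalent (J)
VDD = 5.0         # supply voltage (V)
F_CLK = 50e6      # clock frequency f (Hz)

def class_independent_power(blocks):
    """P = sum over functions of GE_i * (E_typ + C_L_i * Vdd^2) * f * A_int_i."""
    return sum(b.ge * (E_TYP + b.c_load * VDD ** 2) * F_CLK * b.a_int
               for b in blocks)

blocks = [
    Block("counter", ge=120, a_int=0.5, c_load=20e-15),
    Block("decoder", ge=300, a_int=0.3, c_load=20e-15),
    Block("multiplier", ge=4000, a_int=0.2, c_load=20e-15),
]
print(f"Estimated total power: {class_independent_power(blocks):.3e} W")
```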


  • Class-Dependent Power Modeling: This approach is slightly better than the previous one, since it applies customized estimation techniques to the different classes of functional blocks, namely logic, memory, interconnect, and clock (hence the name), thereby improving the modelling accuracy. The power estimation is done in a very similar manner to the class-independent case. The basic switching energy is based on a three-input AND gate and is calculated from technology parameters, e.g. gate width, t_{ox}, and metal width, provided by the user. For memory, for example, the power dissipated in the bit lines is modelled as:
P_{bitlines} = \frac{N_{col}}{2} \left( L_{col} C_{wire} + N_{row} C_{cell} \right) V_{dd} V_{swing}


where C_{wire} denotes the bit-line wiring capacitance per unit length and C_{cell} denotes the loading due to a single cell hanging off the bit line. As the equation shows, the power consumption of the bit lines is related to the number of columns (N_{col}) and rows (N_{row}) in the memory array. The clock capacitance is based on the assumption of an H-tree distribution network. Activity is modelled using a UWN model. (A code sketch of this bit-line model follows the list of disadvantages below.)
Disadvantages:
  1. Circuit activity is not modelled accurately, since a single overall activity factor is assumed for the entire chip and is supplied by the user, which makes it hard to trust. In reality, activity factors vary throughout the chip, so the estimates are prone to error. Even if the model gives a correct estimate for the total power consumed by the chip, the module-wise power distribution is fairly inaccurate.
  2. Even when the chosen activity factor yields the correct total power, the breakdown of power into logic, clock, memory, etc. is less accurate. Therefore, this tool is not much of an improvement over CES.
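A minimal Python sketch of the bit-line expression above is given below; the array dimensions, bit-line length, capacitance values, and voltage swing are illustrative assumptions.

```python
def bitline_power(n_col, n_row, l_col, c_wire, c_cell, vdd, v_swing):
    """P_bitlines = (N_col / 2) * (L_col*C_wire + N_row*C_cell) * Vdd * Vswing.

    The N_col/2 factor reflects the UWN assumption that, on average, half
    the bit lines switch on each access. The expression is implemented
    exactly as given above; any access-frequency scaling needed to obtain
    average power is left to the caller.
    """
    return (n_col / 2) * (l_col * c_wire + n_row * c_cell) * vdd * v_swing

# A hypothetical 256 x 256 array with assumed capacitances and a reduced
# bit-line voltage swing.
p_bit = bitline_power(n_col=256, n_row=256,
                      l_col=512e-6,     # bit-line length (m), assumed
                      c_wire=200e-12,   # wiring capacitance (F/m), assumed
                      c_cell=1e-15,     # per-cell loading (F), assumed
                      vdd=5.0, v_swing=1.0)
print(f"Bit-line term (per access, before frequency scaling): {p_bit:.3e}")
```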

Precharacterized Cell Libraries


This technique further customizes the power estimation of the various functional blocks by using a separate power model for logic, memory, and interconnect. It suggests a Power Factor Approximation (PFA) method for individually characterizing an entire library of functional blocks, such as multipliers and adders, instead of using a single gate-equivalent model for all "logic" blocks.
The power over the entire chip is approximated by the expression:

P = \sum_{i \in \{\text{all blocks}\}} K_i G_i f_i


where K_i is the PFA proportionality constant that characterizes the i-th functional element, G_i is the measure of hardware complexity, and f_i denotes the activation frequency.

Example

For a multiplier, the hardware complexity G_i is related to the square of the input word length, i.e. N^2, where N is the word length. The activation frequency is the rate at which the algorithm performs multiplies, denoted by f_{mult}, and the PFA constant, K_{mult}, is extracted empirically from past multiplier designs and has been shown to be about 15 fW/bit^2·Hz for a 1.2 µm technology at 5 V. The resulting power model for the multiplier, on the basis of the above assumptions, is:

P_{mult} = K_{mult} N^2 f_{mult}
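As a worked example, the short Python sketch below evaluates this model using the K_{mult} value quoted above; the word length and multiply rate are assumed purely for illustration.

```python
# PFA estimate for a multiplier: P_mult = K_mult * N^2 * f_mult.
K_MULT = 15e-15   # PFA constant from the text: 15 fW/(bit^2*Hz), 1.2 um at 5 V
N = 16            # input word length in bits (assumed for illustration)
F_MULT = 20e6     # multiply rate in Hz (assumed for illustration)

p_mult = K_MULT * N ** 2 * F_MULT   # 15 fW/(bit^2*Hz) * 256 bit^2 * 20 MHz
print(f"PFA multiplier power: {p_mult * 1e6:.1f} uW")  # ~76.8 uW
```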


Advantages:

  • Customization is possible in terms of whatever complexity parameters are appropriate for a given block. For example, for a multiplier the square of the word length is appropriate; for memory, the storage capacity in bits is used; and for I/O drivers the word length alone is adequate.

Weakness:

  • There is an implicit assumption that the inputs do not affect multiplier activity: since the PFA constant K_{mult} is taken to be a constant, it can only capture the intrinsic internal activity of the multiply operation averaged over past designs, not input-dependent variations.

The estimation error (relative to switch-level simulation) has been measured for a 16×16 multiplier, and it is observed that when the dynamic range of the inputs does not fully occupy the word length of the multiplier, the UWN model becomes extremely inaccurate.[4] Granted, good designers attempt to maximize word-length utilization. Still, errors in the range of 50-100% are not uncommon. These results clearly suggest a flaw in the UWN model.


References
