power consumption optimization for vlsi designs using icc dc eda tools and cmos 32nm edk synopsys technology


<span class="text_page_counter">Trang 1</span><div class="page_container" data-page="1">

MINISTRY OF EDUCATION AND TRAINING

<b>HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION FACULTY FOR HIGH QUALITY TRAINING </b>

<b>GRADUATION PROJECT </b>

<b>COMPUTER ENGINEERING TECHNOLOGY</b>

<b>POWER CONSUMPTION OPTIMIZATION FOR VLSI DESIGNS USING ICC & DC EDA TOOLS AND CMOS 32NM </b>

<b>EDK SYNOPSYS TECHNOLOGY </b>

<b>LECTURER: PHD. PHAM VAN KHOA</b>

<b>STUDENT: NGUYEN LE GIA LAM, PHU QUOC HUY</b>

<b>Ho Chi Minh City, December 2023</b>

</div><span class="text_page_counter">Trang 2</span><div class="page_container" data-page="2">

<i><b> HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY AND EDUCATION </b></i>

<b>FACULTY OF INTERNATIONAL EDUCATION </b>

<b>GRADUATION PROJECT </b>

<b>NGUYỄN LÊ GIA LÂM, Student ID: 19119066</b>

<b>PHÙ QUỐC HUY, Student ID: 19119022 </b>

Ho Chi Minh City, December 2023

<b>POWER CONSUMPTION OPTIMIZATION FOR VLSI DESIGNS USING ICC & DC EDA TOOLS AND CMOS </b>

<b>32NM EDK SYNOPSYS TECHNOLOGY </b>

</div><span class="text_page_counter">Trang 3</span><div class="page_container" data-page="3">

<small>THE SOCIALIST REPUBLIC OF VIETNAM </small>

<b><small>Independence - Freedom - Happiness </small></b>

<b>PROJECT ASSIGNMENT </b>

Student name: NGUYEN LE GIA LAM Student ID: 19119066

Major: COMPUTER ENGINEERING TECHNOLOGY Class: 19119CLA1

Supervisor: PHAM VAN KHOA, Ph.D Phone number: 0943722143

Date of assignment: September 3<small>rd</small>, 2023 Date of submission: January 6<small>th</small>, 2024

1. Project title: POWER CONSUMPTION OPTIMIZATION FOR VLSI DESIGNS USING ICC & DC EDA TOOLS AND CMOS 32NM EDK SYNOPSYS TECHNOLOGY

2. Initial materials provided by the advisor: Documents, such as papers and books, that pertain to power in VLSI and low-power techniques.

3. Content of the project:

• Fundamentals of power reduction techniques, especially Clock Gating (CG)

• Applying the CG technique to an educational microprocessor provided by Synopsys, using the EDA tools Design Compiler and IC Compiler

• Analyzing the power reduction obtained, with the aim of reducing power consumption to a good level without changing the chip's functions or making the chip defective

• Completing the Physical Design Flow from the Synthesis stage to the Routing stage to produce the layout of the design

4. Final product: Full layout of the design, Area and Power Reports

<b>CHAIR OF THE PROGRAM </b>

(Sign with full name)

<b>ADVISOR </b>

(Sign with full name)

</div><span class="text_page_counter">Trang 4</span><div class="page_container" data-page="4">

<b>DISCLAIMER </b>

We hereby declare that in this final report, "POWER CONSUMPTION OPTIMIZATION FOR VLSI DESIGNS USING ICC & DC EDA TOOLS AND CMOS 32NM EDK SYNOPSYS TECHNOLOGY", the simulations and study findings are accurate and were carried out entirely under the direction of the instructor, Ph.D. PHAM VAN KHOA. The report does not duplicate any other source. The document also includes a number of cited and carefully labeled references. We assume full responsibility for this pledge before the department, faculty, and school.

<b>Student </b>

<b>NGUYEN LE GIA LAM PHU QUOC HUY </b>

</div><span class="text_page_counter">Trang 5</span><div class="page_container" data-page="5">

<b>ACKNOWLEDGEMENTS </b>

First, we would like to express our sincere gratitude to the Faculty for High Quality Training and the School Board of the Ho Chi Minh City University of Technology and Education for creating ideal conditions for us to pursue our project.

Additionally, we would like to extend our sincere gratitude to the department's head, Ph.D. Pham Van Khoa, who consistently monitors the learning environment and supports and develops growth possibilities for every student generation.

Last but not least, because of our limited expertise and time, mistakes in this work are unavoidable. Your comments and ideas are appreciated as we work to improve this project.

Regards and many thanks for your assistance.

<b>NGUYEN LE GIA LAM PHU QUOC HUY </b>

</div><span class="text_page_counter">Trang 6</span><div class="page_container" data-page="6">

2.3.2 Clock Gating technique ... 10

2.4 Standard Design Flow ... 15

2.4.1 Front End Flow ... 15

2.4.2 Back End Flow ... 16

</div><span class="text_page_counter">Trang 7</span><div class="page_container" data-page="7">

Chapter 4: RESULT ... 42

4.1 Layout of the design ... 42

4.2 Area and Power Consumption Results ... 44

4.2.1 Gated registers report ... 44

</div><span class="text_page_counter">Trang 8</span><div class="page_container" data-page="8">

<b>LIST OF FIGURES </b>

Figure 2.1 pMOS and nMOS physical structure ... 4

Figure 2.2 nMOS and pMOS model ... 5

Figure 2.3 NOT gate ... 5

Figure 2.4 Switching power concept ... 7

Figure 2.5 Short-circuit power concept ... 8

Figure 2.6 Clock Gating model [8]... 10

Figure 2.7 Latch-OR sample ... 11

Figure 2.8 Latch-AND sample ... 12

Figure 2.9 Clock Gating cell with Latched Pos Edge Control Post [9] ... 12

Figure 2.10 Clock Gating cell with Latched Neg Edge Control Post [9] ... 13

Figure 2.11 Clock Gating cell with Latched Pos Edge Control Pre [9] ... 13

Figure 2.12 Clock Gating cell with Latched Neg Edge Control Pre [9] ... 14

Figure 2.13 Front end flow ... 15

Figure 2.14 Back end flow and Synopsys tools related... 17

Figure 2.15 I/O ports ... 21

Figure 2.16 Rectangular Rings ... 22

Figure 2.17 Via ... 22

Figure 2.18 Parallel and cross routing grid ... 23

Figure 2.19 Keepout Margin ... 23

Figure 2.20 Design Compiler Interface ... 24

Figure 2.21 IC Compiler Interface ... 25

Figure 3.1 Chiptop General Blocks ... 26

Figure 3.2 Chiptop I/Os ... 27

Figure 3.3 Chiptop Interconnection ... 28

Figure 3.4 Register with Feedback MUX ... 29

Figure 3.5 Register with Latch-And ICG cell ... 30

Figure 3.6 Register Bank sample ... 30

Figure 3.7 Design Compiler process flow ... 31

</div><span class="text_page_counter">Trang 9</span><div class="page_container" data-page="9">

Figure 3.8 source and set libraries command ... 31

Figure 3.9 analyze command ... 32

Figure 3.10 read_file command ... 32

Figure 3.11 set_operating_conditions command ... 32

Figure 3.12 set_load command ... 32

Figure 3.13 set_driving_cell command ... 32

Figure 3.14 source constraints file command ... 33

Figure 3.15 set_clock_gating_registers command ... 33

Figure 3.16 link command ... 34

Figure 3.17 compile command ... 34

Figure 3.18 check number of ICG cells command... 34

Figure 3.19 report_area command ... 34

Figure 3.20 report_power command ... 34

Figure 3.21 write ddc file command ... 35

Figure 3.22 write Verilog file command ... 35

Figure 3.23 load script file command ... 35

Figure 3.24 set TLU file and Tech file ... 35

Figure 3.25 import_designs command ... 35

Figure 3.26 ICC based on physical design flow ... 36

Figure 3.27 create_floorplan command ... 37

Figure 3.28 set_keepout_margin command ... 37

Figure 3.29 create_rectangular_rings command ... 37

Figure 3.30 preroute_standard_cells command ... 38

Figure 3.31 set_fp_rail_constraints commands ... 38

Figure 3.32 synthesize_fp_rail command ... 38

Figure 3.33 commit_fp_rail command ... 39

Figure 3.34 create_fp_placement command ... 39

Figure 3.35 set_dont_touch_placement command ... 39

Figure 3.36 insert_stdcell_filler command ... 39

</div><span class="text_page_counter">Trang 10</span><div class="page_container" data-page="10">

Figure 3.37 place_opt command ... 39

Figure 3.38 clock_opt command ... 40

Figure 3.39 preroute_standard_cells in routing ... 40

Figure 3.40 active zroute commands ... 40

Figure 3.41 route_opt command ... 41

Figure 3.42 route_eco command ... 41

Figure 4.1 Macros, I/O ports and Standard Cells ... 42

Figure 4.2 Keepout margin ... 43

Figure 4.3 Rectangular Rings and Power Straps ... 43

Figure 4.4 Via of the design ... 44

Figure 4.5 Percentage of area changed in 4 main blocks ... 47

Figure 4.6 Power Reduction Percentage Comparison Graph ... 49

</div><span class="text_page_counter">Trang 11</span><div class="page_container" data-page="11">

<b>LIST OF TABLES </b>

Table 2.1 NOT gate truth table... 5

Table 2.2 CMOS 32nm technology table ... 6

Table 2.3 Design’s operating table ... 6

Table 2.4 Clock Gating cell with Latched Pos Edge Control Post [9] ... 12

Table 2.5 Clock Gating cell with Latched Neg Edge Control Post [9] ... 13

Table 2.6 Clock Gating cell with Latched Pos Edge Control Pre [9] ... 14

Table 2.7 Clock Gating cell with Latched Neg Edge Control Pre [9] ... 14

Table 4.1 Clock gating summary ... 44

Table 4.2 Clock gating detailed placement ... 45

Table 4.3 Detailed Area Report ... 46

Table 4.4 Power Consumption Report ... 47

</div><span class="text_page_counter">Trang 13</span><div class="page_container" data-page="13">

<b>Chapter 1: INTRODUCTION </b>

<b>1.1 Introduction </b>

Nowadays, Internet of Things (IoT) technology is growing rapidly, and in the coming years billions of things, devices, and systems will be connected with each other through the internet. To make this emerging technology self-sustainable, green and uninterrupted energy is needed to power the different nodes of an IoT system, which is only possible with energy harvesting schemes. [1] One of the main benefits of low-power design is longer battery life. Power is one of the main design constraints, affecting high-end systems as well as portable computers and IoT devices, although power should not be prioritized over performance during the design process. Low power consumption is also one of the primary goals for any integrated circuit, especially in Very Large-Scale Integration (VLSI), a kind of Integrated Circuit (IC) that integrates a very large number of interconnected transistors on a single small chip. [2]

Clock gating is a commonly employed technique in CMOS circuits to improve power efficiency, and broad surveys of the technique have been published. Each clock pulse drives flip-flop transitions from 0 to 1 and from 1 to 0, which increases the switching activity and hence consumes more power. [3] The primary goal of clock gating is to stop supplying circuits with unnecessary clock pulses that would not alter the output. The methods discussed in those articles cover the theory, block diagrams, and comparisons between techniques.

The complexity of an IC's physical design has increased because of all of the aforementioned developments. The physical design of an integrated circuit includes gate-level netlist synthesis, floor planning, power planning, placement, clock tree synthesis, routing, and physical verification. To close the design productivity gap while maintaining design quality, designers must adopt low-power techniques.

In this project, we present the basic theory of each technique based on previous research papers. Additionally, the team applies the Clock Gating technique to a microprocessor. On this basis, the results of the technique when applied in digital circuit design are compared and evaluated.

</div><span class="text_page_counter">Trang 14</span><div class="page_container" data-page="14">

<b>1.2 Objective </b>

The project aims to present the fundamental concepts of the Clock Gating technique. In addition, we apply the Clock Gating technique to a microprocessor at the physical design stage using the EDA tools Design Compiler and IC Compiler. The results are then evaluated and analyzed against the plain design without clock gating.

<b>1.3 Limitation </b>

Our project was implemented on an educational design provided by Synopsys. Because the project covers only physical design (back end), and because of the supplier's technology security restrictions, clock gating is not written at the RTL level; instead, our team uses two EDA tools, Design Compiler and IC Compiler, to apply the Clock Gating technique to the provided chip. The power consumption results are also estimated in Design Compiler, so they are only at an ideal level (best-case and worst-case conditions are not considered).

In addition, timing analysis is omitted because it requires a lot of specialized knowledge and related tools.

<b>1.4 Research method </b>

The research methods used in this project are:

Analysis and synthesis of theoretical knowledge: analyzing the difficulties and synthesizing all the relevant theories before applying technique.

Simulation-based research: using software (EDA tools) to observe and analyze the processes.

<b>1.5 Object and the scope of the study </b>

Objects of the report: the fundamentals of the Clock Gating technique, the application of this technique to a real microprocessor using the EDA tools Design Compiler and IC Compiler, and the analysis of the power reduction obtained.

Scope of the study: investigating the Clock Gating (CG) technique and showing that applying it to the microprocessor chip at the physical design stage can reduce power consumption to a good level without changing the chip's functions or making the chip defective or unable to work.

<b>1.6 Outline </b>

The study team has tried to present the information logically so that readers can quickly grasp the subject matter, methodology, and operation. The report is organized into the following chapters:

CHAPTER 1: INTRODUCTION Presenting an overview of current research on the Clock Gating technique, together with the objectives, objects, and scope of the study.

</div><span class="text_page_counter">Trang 15</span><div class="page_container" data-page="15">

CHAPTER 2: BACKGROUND Presenting background knowledge about Clock gating technique, CMOS technology and design flows

CHAPTER 3: SYSTEM DESIGN Presenting system requirements, block diagrams and block functions, the design of the system with EDA tools, and algorithmic flowcharts.

CHAPTER 4: RESULT AND EVALUATION Presenting the power reduction results and comparing them with related work.

CHAPTER 5: CONCLUSION AND FUTURE WORK Presenting conclusions for final project, stating the advantages and disadvantages of the topic, the errors that the team made while implementing and giving directions for future development.

</div><span class="text_page_counter">Trang 16</span><div class="page_container" data-page="16">

<b>Chapter 2: BACKGROUND </b>

<b>2.1 CMOS Technology </b>

<b>2.1.1 Introduction about CMOS </b>

CMOS (short for complementary metal-oxide-semiconductor) is the semiconductor technology used in most modern integrated circuits (ICs), also known as chips or microchips. A Metal-Oxide-Semiconductor (MOS) structure is created by superimposing several layers of conducting and insulating materials to form a sandwich-like structure. Metal-oxide-semiconductor field-effect transistors (MOSFETs) are transistors whose operation is controlled by electric fields. The structure consists of three layers: the metal gate electrode, the insulating oxide (SiO2) layer, and the p-type bulk semiconductor (Si), called the substrate. [4]

There are two primary types of MOSFETs: p-channel MOS (pMOS) and n-channel MOS (nMOS). In an nMOS transistor, the source and drain use an n-type semiconductor, and the substrate uses a p-type semiconductor. A pMOS transistor takes the opposite approach (shown in Figure 2.1).

<i>Figure 2.1 pMOS and nMOS physical structure </i>

In summary, the MOS transistor's gate regulates the current that flows from the source to the drain. As a simplification, MOS transistors can be thought of as basic ON/OFF switches. An nMOS transistor is ON and has a conducting path from source to drain when its gate is high (1). The nMOS transistor is OFF and almost no current flows from source to drain when the gate is low (0). The converse is true for a pMOS transistor, which is ON when the gate is low and OFF when the gate is high. This switch model is illustrated in Figure 2.2.

</div><span class="text_page_counter">Trang 17</span><div class="page_container" data-page="17">

<i>Figure 2.2 nMOS and pMOS model </i>

CMOS logic ICs combine MOSFETs in various ways to implement logic functions. A CMOS inverter, which is built from one nMOS and one pMOS transistor, is shown in the figure below.

<i>Figure 2.3 NOT gate </i>

When input A is 0, the pMOS transistor is ON and the nMOS transistor is OFF. The output Y is then connected to VDD rather than GND, so it is pulled up to 1. Conversely, when A = 1, the nMOS is ON and the pMOS is OFF, so Y is pulled down to 0. This is summarized in Table 2.1.

<i>Table 2.1 NOT gate truth table </i>
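The switch model described above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the transistors are treated as ideal switches, and no electrical behavior (delay, leakage) is modeled.

```python
def nmos_conducts(gate: int) -> bool:
    """Ideal nMOS switch: conducts when its gate is 1."""
    return gate == 1


def pmos_conducts(gate: int) -> bool:
    """Ideal pMOS switch: conducts when its gate is 0."""
    return gate == 0


def cmos_inverter(a: int) -> int:
    """Switch-level inverter: the pMOS pulls Y up to VDD, the nMOS pulls Y down to GND."""
    pull_up = pmos_conducts(a)
    pull_down = nmos_conducts(a)
    # For a valid input exactly one of the two transistors conducts.
    assert pull_up != pull_down
    return 1 if pull_up else 0


if __name__ == "__main__":
    print("A | Y")
    for a in (0, 1):
        print(f"{a} | {cmos_inverter(a)}")
```

Running the script prints the two rows of the NOT gate truth table.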

First, we give an overview of the key parameters of the 32 nm CMOS technology, such as integrated circuit (IC) complexity, gate length, switching delay, and supply voltage. The need to fit additional functions into a given silicon area is what keeps driving advances in CMOS technology. The features of the general transistors in the 32 nm general-purpose technology are as follows: the poly contacted pitch is 0.126 µm, Lpoly is 28 nm, and the inversion gate oxide thickness is 1.2/1.4 nm for nFET and pFET, respectively.

</div><span class="text_page_counter">Trang 18</span><div class="page_container" data-page="18">

In order to maximize drivability at low voltage and minimize active power, the power supply is set at Vdd = 0.9 V. Details are given in the table below.

<i>Table 2.2 CMOS 32nm technology table </i>

In this thesis, we use the SAED_EDK32/28_CORE-SAED Digital Standard Cell Library provided by Synopsys to design and implement low-power techniques in a 32nm CPU. The library is characterized for 1.05V operation. Its detailed operating conditions are shown in the table below.

<i>Table 2.3 Design’s operating table </i>

</div><span class="text_page_counter">Trang 19</span><div class="page_container" data-page="19">

Finally, the two power components of a CMOS circuit are:

• Dynamic Power

• Static Power (Leakage Power)

<b>2.1.1 Dynamic Power </b>

Dynamic power is the power consumed when the device is active, that is, when signals are changing values [6]. It combines two types of power: switching power and internal power (also called short-circuit power).

𝑃<sub>𝑑𝑦𝑛𝑎𝑚𝑖𝑐 </sub>= 𝑃<sub>𝑠𝑤𝑖𝑡𝑐ℎ𝑖𝑛𝑔 </sub>+ 𝑃<sub>𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙 </sub>

Switching power is the power needed to charge and discharge a gate’s output capacitance, which comes mainly from transistors and interconnect wires.

<i>Figure 2.4 Switching power concept </i>

Switching power depends on the clock frequency (𝑓<sub>𝑐𝑙𝑘</sub>), the load capacitance (C), and the supply voltage (𝑉<sub>𝐷𝐷</sub>). The switching power can be expressed as:
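A commonly used textbook form of this expression, stated here as an assumption rather than as the authors' exact equation, is given below. The switching activity factor α (the fraction of clock cycles in which the node toggles) is not named in the surrounding text and is introduced only for illustration.

```latex
P_{switching} = \alpha \, C \, V_{DD}^{2} \, f_{clk}
```

The quadratic dependence on the supply voltage is the reason voltage reduction is so effective for dynamic power, while the linear dependence on the clock frequency is what clock gating exploits.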

</div><span class="text_page_counter">Trang 20</span><div class="page_container" data-page="20">

<i>Figure 2.5 Short-circuit power concept </i>

Since power is calculated from the current and the supply voltage, the internal power is:

𝑃<sub>𝑖𝑛𝑡𝑒𝑟𝑛𝑎𝑙 </sub>= 𝑡<sub>𝑠𝑐</sub>*𝑉<sub>𝐷𝐷</sub>*𝐼<sub>𝑝𝑒𝑎𝑘</sub>*𝑓<sub>𝑐𝑙𝑘</sub>

where 𝑡<sub>𝑠𝑐</sub> is the duration of the short-circuit current and 𝐼<sub>𝑝𝑒𝑎𝑘</sub> is the peak short-circuit (crowbar) current. Several techniques are used to optimize dynamic power; they target the frequency and voltage terms in the formulas. Clock gating and multi-voltage are two widely used low-power methods:

• The clock gating technique drives the clock frequency seen by an idle block to zero, which also drives its dynamic power to zero. Different forms of clock gating are used in many IC designs.

• Another approach is multi-voltage: since some blocks in an IC do not always have to run at high performance, they can be given a lower supply voltage than the others to reduce power consumption.
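The effect of the two methods on the formulas can be sketched numerically. The Python snippet below is a minimal illustration, not a power analysis: the switching-power form α·C·V_DD²·f_clk and its activity factor `alpha` are assumptions (the text names only frequency, capacitance, and voltage), and all block parameters are hypothetical values chosen for the example.

```python
def switching_power(alpha, c_load, v_dd, f_clk):
    """Assumed textbook form: P_switching = alpha * C * V_DD^2 * f_clk."""
    return alpha * c_load * v_dd ** 2 * f_clk


def internal_power(t_sc, v_dd, i_peak, f_clk):
    """P_internal = t_sc * V_DD * I_peak * f_clk, as in the formula above."""
    return t_sc * v_dd * i_peak * f_clk


def dynamic_power(alpha, c_load, v_dd, f_clk, t_sc, i_peak):
    """P_dynamic = P_switching + P_internal."""
    return (switching_power(alpha, c_load, v_dd, f_clk)
            + internal_power(t_sc, v_dd, i_peak, f_clk))


if __name__ == "__main__":
    # Hypothetical block parameters, for illustration only.
    block = dict(alpha=0.2, c_load=50e-12, t_sc=20e-12, i_peak=1e-3)
    f_clk = 100e6

    baseline = dynamic_power(v_dd=1.05, f_clk=f_clk, **block)
    # Clock gating: the block receives a clock in only 25% of the cycles.
    gated = dynamic_power(v_dd=1.05, f_clk=0.25 * f_clk, **block)
    # Multi-voltage: same clock, lower supply for a non-critical block.
    low_vdd = dynamic_power(v_dd=0.85, f_clk=f_clk, **block)

    for name, p in (("baseline", baseline), ("clock gated", gated), ("lower VDD", low_vdd)):
        print(f"{name:12s}: {p * 1e3:.4f} mW ({100 * p / baseline:.1f}% of baseline)")
```

Because both terms are proportional to the clock frequency, gating the clock for a fraction of the cycles scales the dynamic power of the block by the same fraction, whereas lowering the supply voltage acts quadratically on the switching term.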

<b>2.1.2 Static power </b>

Static power, also called leakage power, is present even when the switching activity is zero and does not depend on the clock frequency [7]. Leakage currents have different sources:

• Subthreshold leakage: caused by the current that flows from the drain to the source when the transistor is not turned completely OFF. It is a major component of leakage power in semiconductor devices.

• Gate leakage: caused by the current that flows directly through the dielectric material from the gate to the body.

• Junction leakage from source/drain diffusions: caused by the current that flows through the reverse-biased diodes, from the n-type drain to the grounded p-type substrate of the nMOS and from the n-well to the p-type drain of the pMOS.

</div><span class="text_page_counter">Trang 21</span><div class="page_container" data-page="21">

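To a good approximation, the subthreshold leakage current is:

𝐼<sub>𝑠𝑢𝑏</sub> = µ𝐶<sub>𝑜𝑥</sub>𝑉<sub>𝑡ℎ</sub><sup>2</sup> (𝑊/𝐿) 𝑒<sup>(𝑉<sub>𝐺𝑆</sub> − 𝑉<sub>𝑇</sub>)/(𝑛𝑉<sub>𝑡ℎ</sub>)</sup>

where 𝑊 and 𝐿 are the transistor dimensions, 𝑉<sub>𝑡ℎ</sub> is the thermal voltage kT/q, 𝑉<sub>𝐺𝑆</sub> is the gate-source voltage, 𝑉<sub>𝑇</sub> is the threshold voltage, and 𝑛 is a process-dependent parameter.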

Power gating and Multi-𝑉<sub>𝑇</sub> are two well-known techniques used to minimize leakage power.
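To see why the threshold voltage is such an effective lever on leakage, the standard subthreshold model I_sub = µC_ox·V_th²·(W/L)·e^((V_GS − V_T)/(n·V_th)) can be evaluated for two threshold voltages. This is a rough sketch under stated assumptions: V_th is the thermal voltage kT/q, and the values of `mu_cox`, `w_over_l`, `n`, and both threshold voltages are hypothetical, not taken from the SAED library.

```python
import math


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage kT/q in volts."""
    k_boltzmann = 1.380649e-23      # J/K
    q_electron = 1.602176634e-19    # C
    return k_boltzmann * temp_kelvin / q_electron


def subthreshold_current(mu_cox, w_over_l, v_gs, v_t, n=1.5, temp_kelvin=300.0):
    """Subthreshold current from the standard exponential model (assumed form)."""
    v_th = thermal_voltage(temp_kelvin)
    return mu_cox * v_th ** 2 * w_over_l * math.exp((v_gs - v_t) / (n * v_th))


if __name__ == "__main__":
    # Hypothetical device parameters, for illustration only.
    params = dict(mu_cox=1e-4, w_over_l=2.0, v_gs=0.0)
    low_vt = subthreshold_current(v_t=0.30, **params)
    high_vt = subthreshold_current(v_t=0.40, **params)
    print(f"low-Vt  cell leakage: {low_vt:.3e} A")
    print(f"high-Vt cell leakage: {high_vt:.3e} A")
    print(f"ratio low/high      : {low_vt / high_vt:.1f}x")
```

Because the threshold voltage sits in the exponent, a modest increase in threshold voltage reduces the OFF-state current by a large factor, which is the motivation behind the Multi-Vt technique.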

<b>2.3 Power reduction techniques </b>

<b>2.3.1 Overview </b>

Currently, there is a greater need for low-power circuits due to the growing trend toward small devices, which means that research on lowering power dissipation in VLSI circuits is needed. This study aims to provide a brief overview of the power reduction approaches used in industry today at the design abstraction level. Power dissipation increases as more and more systems are integrated on a chip, which increases the number of transistors. Reliability and battery life, particularly in portable devices, are two key concerns of low-power design. Numerous conventional methods, including power gating, clock gating, variable frequency, variable supply voltage, and variable device threshold, are employed to address these issues. Many more recent techniques, targeting dynamic power reduction, leakage power reduction, back biasing, and others, have also been used.

Some common power management techniques are summarized below:

• Clock Gating: a power-saving technique in semiconductor microelectronics that switches off the clock to idle circuits. It is commonly used in electronic systems to lower dynamic power consumption by turning off buses, controllers, bridges, and portions of processors.

• Power Gating: a power reduction method in integrated circuit design that cuts off the current to circuit blocks that are not in use. Besides lowering stand-by (leakage) power, power gating has the advantage of allowing Iddq testing.

• Multi Vdd: used to save both the dynamic and the static power of the design. With this technique, the chip is built with several supply voltages, and different functional blocks run at different supply voltages.

• Multi Vt: multiple threshold voltage techniques use both low-Vt and high-Vt cells, with lower-threshold gates on the critical path and higher-threshold gates off the critical path. With this methodology, performance is enhanced without requiring more power.

</div><span class="text_page_counter">Trang 22</span><div class="page_container" data-page="22">

<b>2.3.2 Clock Gating technique </b>

<b>2.3.2.1 Concept of Clock Gating technique </b>

In some designs, there are logic blocks that could be shut off when they have no work to do, yet the clock signal keeps toggling in every clock cycle. The clock network often presents a huge capacitive load, so the clock is a major source of dynamic power dissipation. One of the most effective ways to reduce this power is clock gating. It is a critical technique that can significantly improve the power consumption, performance, and reliability of an ASIC design.

The Clock Gating technique selectively disables, or suppresses, transitions from propagating to the portions of the clock path selected by the clock-gating circuits. Because unnecessary transitions are not propagated while the clock is gated, the savings come primarily from the reduced switching capacitance in the clock network and the reduced switching activity in the logic fed by the storage elements. The Clock Gating technique is illustrated in the figure below.

<i>Figure 2.6 Clock Gating model [8]</i>

Determining the appropriate location and timing for clock gating is a problem in power optimization. In the early days of RT-level design, engineers wrote clock gating circuits explicitly in the RTL. This method is error-prone, since it is easy to construct a clock gating circuit that malfunctions during gating and causes functional faults. Today, most libraries include dedicated clock gating cells that the synthesis tool can recognize. The combination of explicit clock gating cells and automatic insertion makes clock gating a simple and dependable way to reduce power. Because modern design tools can identify circuits where clock gating can be inserted without changing the function of the logic, no change to the RTL is required to implement this style of clock gating.
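One pattern that automatic insertion typically targets is a register with a load enable, which is otherwise built with a feedback multiplexer that reloads the old value. The cycle-level Python sketch below is an assumption-laden illustration (function names and the input trace are hypothetical, and the register is one bit wide): it shows that replacing the feedback MUX with a gated clock leaves the register contents unchanged while reducing the number of clock edges the flip-flop receives.

```python
def run_mux_register(d_values, enables, q0=0):
    """Register with a feedback MUX: clocked every cycle, reloads its own Q when EN=0."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(d_values, enables):
        clock_edges += 1          # the flip-flop sees every clock edge
        q = d if en else q        # the MUX selects new data or the fed-back value
        trace.append(q)
    return trace, clock_edges


def run_gated_register(d_values, enables, q0=0):
    """Same register with the MUX removed and the clock gated by EN."""
    q, trace, clock_edges = q0, [], 0
    for d, en in zip(d_values, enables):
        if en:                    # the gated clock pulses only when EN=1
            clock_edges += 1
            q = d
        trace.append(q)
    return trace, clock_edges


if __name__ == "__main__":
    d = [1, 0, 1, 1, 0, 1, 0, 0]
    en = [1, 0, 0, 1, 0, 0, 0, 1]
    mux_trace, mux_edges = run_mux_register(d, en)
    gated_trace, gated_edges = run_gated_register(d, en)
    print("same Q sequence:", mux_trace == gated_trace)
    print("clock edges at the flip-flop:", mux_edges, "vs", gated_edges)
```

The two models produce the same Q sequence by construction, which is the sense in which the transformation does not change the function of the logic.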

It is good design practice to turn off the clock when it is not needed. Automatic clock gating is supported by modern EDA tools, which identify the circuits where clock gating can be applied and insert Integrated Clock Gating (ICG) cells into them. There are many architectures of ICG cells; two commonly used types are the OR-type clock gate (Figure 2.7), which is preceded by a positive level-sensitive latch, and the AND-type clock gate (Figure 2.8), which is preceded by a negative level-sensitive latch.

</div><span class="text_page_counter">Trang 23</span><div class="page_container" data-page="23">

<b>2.3.2.2 Integrated Clock Gating Cells </b>

An ICG cell stops the clock from propagating through it when a low clock enable signal is applied to it. It is also the main technique in our thesis. When a large group of logic cells does not need to operate, we use the ICG cell to interrupt the transmission of the clock signal. This is accomplished by applying a clock enable signal, generated internally in the block, to the EN pin of the ICG cell. Because the clock signal has the highest switching activity, the clock tree contributes significantly to dynamic power. By stopping the clock signal from propagating beyond it, the ICG cell helps reduce the design's dynamic power consumption.

The cell in Figure 2.8 has an AND gate preceded by a negative level-sensitive latch. Both test enable (TE) and enable (EN) are active high, and the output clock is low when inactive. Its complement is the latch-OR cell, in which an OR gate is preceded by a positive level-sensitive latch. The enable and TE signals are again both active high, while the output clock is high when inactive. If both the enable and TE signals are active low, an AND gate should be used rather than a NOR gate.

<i>Figure 2.7 Latch-OR sample </i>

The Latch-AND based ICG cell provides a glitch-free gated clock output: the clock passes through only when the enable signal is high and is stopped when the enable signal is low.
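The glitch-free property can be illustrated with a small behavioral model. The Python sketch below is not the library cell: it assumes ideal zero-delay gates, omits the TE pin, and samples the signals four times per clock cycle; the waveforms are hypothetical and chosen so that EN changes while the clock is high.

```python
def latch_and_icg(clk, en):
    """Latch-AND ICG: negative level-sensitive latch on EN, followed by an AND gate."""
    latched_en = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:
            latched_en = e              # latch is transparent while the clock is low
        gclk.append(c & latched_en)     # latch holds its value while the clock is high
    return gclk


def naive_and_gate(clk, en):
    """Plain AND gating without a latch, for comparison."""
    return [c & e for c, e in zip(clk, en)]


def high_pulse_widths(signal):
    """Widths, in samples, of each run of consecutive 1s."""
    widths, run = [], 0
    for s in signal:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


if __name__ == "__main__":
    # Three clock cycles, four samples each: low, low, high, high.
    clk = [0, 0, 1, 1] * 3
    # EN rises and later falls in the middle of a clock-high phase.
    en = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    print("ICG output  :", latch_and_icg(clk, en))
    print("naive output:", naive_and_gate(clk, en))
    print("ICG pulse widths  :", high_pulse_widths(latch_and_icg(clk, en)))
    print("naive pulse widths:", high_pulse_widths(naive_and_gate(clk, en)))
```

In this trace the latch-based cell emits one full-width clock pulse, while the plain AND gate emits two truncated pulses, which is the kind of glitch that can cause functional faults in the clocked registers.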

</div><span class="text_page_counter">Trang 24</span><div class="page_container" data-page="24">

<i>Figure 2.8 Latch-AND sample </i>

In this thesis, we use both the Latch-AND and Latch-OR based ICG cells from the SAED_EDK32/28_CORE – SAED library.

<b>2.4.2.2 Integrated Clock Gating Cell in SAED_EDK32/28_CORE – SAED library </b>

<b>Clock Gating cell with Latched Pos Edge Control Post </b>

</div><span class="text_page_counter">Trang 25</span><div class="page_container" data-page="25">

<b>Clock Gating cell with Latched Neg Edge Control Post </b>

Logic Symbol

<i>Figure 2.10 Clock Gating cell with Latched Neg Edge Control Post [9] </i>

Truth Table

<i>Table 2.5 Clock Gating cell with Latched Neg Edge Control Post [9] </i>

<b>Clock Gating cell with Latched Pos Edge Control Pre </b>

Logic Symbol

<i>Figure 2.11 Clock Gating cell with Latched Pos Edge Control Pre [9] </i>

</div><span class="text_page_counter">Trang 26</span><div class="page_container" data-page="26">

Truth Table

<i>Table 2.6 Clock Gating cell with Latched Pos Edge Control Pre [9] </i>

<b>Clock Gating cell with Latched Neg Edge Control Pre </b>

</div><span class="text_page_counter">Trang 27</span><div class="page_container" data-page="27">

<b>2.4 Standard Design Flow </b>

<b>2.4.1 Front End Flow </b>

Finding a solution for a particular issue or opportunity and turning it into an RTL circuit description is the responsibility of the frontend flow. The phases of the frontend flow are shown in detail in Figure 2.13 below.

<small>Flow steps shown: RTL Coding, Integration and Verification (Succeed?)</small>

<i>Figure 2.13 Front end flow </i>

<b>2.4.1.1 Problem to Solution Specification </b>

Each project commences with a challenge to address, an opportunity to seize, or an area that requires enhancement. The designer must conceptualize an abstract solution for that challenge, which, at this point, may or may not be linked to any particular implementation technology.

<b>2.4.1.2 High-level architecture </b>

The subsequent phase involves designing a system by breaking it down into high-level blocks, each with its distinct function, and defining their interconnections. For instance, when designing a microprocessor, this entails dividing the design into components like the ALU, instruction decoder, registers, and so forth.

</div><span class="text_page_counter">Trang 28</span><div class="page_container" data-page="28">

<b>2.4.1.3 Low-level functional specification </b>

During this phase, the designer must elucidate the function of each block and outline its implementation. It can be advantageous to depict these blocks using functional or behavioral descriptions.

<b>2.4.1.4 RTL Coding </b>

Using a Hardware Description Language, like Verilog, each block is described in depth. Next, each block's functionality is converted into language-specific synthesizable constructs.

<b>2.4.1.5 Integration and Functional verification </b>

This is the phase where the operational aspects of the design are subjected to simulation or validation, spanning all levels of abstraction. To ensure that the RTL code aligns with the functional requirements, each block must undergo verification. Once all blocks meet their specifications, the next step is to integrate them at the top level and validate the overall system's functionality. This involves the use of testbenches, which generate input test vectors to stimulate either the individual blocks or the top-level functionality.

<b>2.4.2 Back End Flow </b>

Realizing a circuit physically is the responsibility of the backend phase, which transforms the RTL circuit description into a GDSII layout file. Synthesis and Place & Route are the two main phases of the backend process. The following diagram depicts the backend workflow alongside the relevant Synopsys tools.

</div><span class="text_page_counter">Trang 29</span><div class="page_container" data-page="29">

<small>Design flow steps shown: Synthesis, Verification, Place & Route, Static Timing Analysis (Timing OK?), Integration and DRC/LVS verification. Synopsys tools shown: Design Compiler, Custom Designer + Hercules.</small>

<i>Figure 2.14 Back end flow and Synopsys tools related </i>

</div><span class="text_page_counter">Trang 30</span><div class="page_container" data-page="30">

<b>2.4.2.1 Synthesis </b>

Synthesis plays the role of transforming the RTL description into a structural netlist at the gate level. This netlist encompasses the instantiation of all components, including standard cells and macros, that constitute the circuit, along with their interconnections, while adhering to the design constraints related to timing and area. Synthesis can be broken down into three main steps: Translation, Optimization, and Mapping

In Design Compiler (DC), the tool first loads the RTL description into memory and translates it into an unmapped netlist. DC then maps this netlist onto cells from the target library, taking the design requirements and the particular design environment into account, and optimizes the result so that the design constraints are satisfied. Because synthesis has only a limited view of physical properties, all clock, set, and reset signals are treated as ideal during this phase. Finally, a set of reports is generated, and a gate-level netlist is exported for use by the subsequent place-and-route tool.
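
The translation, mapping, and optimization steps can be illustrated with a toy Python sketch. It is not how Design Compiler works internally: the nested-tuple netlist format, the two-cell library (`NAND2`, `INV`), and the single double-inverter rule are assumptions chosen to keep the example small.

```python
from itertools import product


def map_to_library(expr):
    """Mapping: rewrite generic AND/OR/NOT gates using only NAND2 and INV."""
    if isinstance(expr, str):  # primary input
        return expr
    op, *args = expr
    args = [map_to_library(a) for a in args]
    if op == "NOT":
        return ("INV", args[0])
    if op == "AND":
        return ("INV", ("NAND2", args[0], args[1]))
    if op == "OR":
        return ("NAND2", ("INV", args[0]), ("INV", args[1]))
    raise ValueError(f"unknown gate {op}")


def optimize(expr):
    """Optimization: remove back-to-back inverters, INV(INV(x)) -> x."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    args = [optimize(a) for a in args]
    if op == "INV" and not isinstance(args[0], str) and args[0][0] == "INV":
        return args[0][1]
    return (op, *args)


def evaluate(expr, values):
    """Evaluate a netlist (generic or mapped) for one input assignment."""
    if isinstance(expr, str):
        return values[expr]
    op, *args = expr
    bits = [evaluate(a, values) for a in args]
    if op in ("NOT", "INV"):
        return 1 - bits[0]
    if op == "AND":
        return bits[0] & bits[1]
    if op == "OR":
        return bits[0] | bits[1]
    if op == "NAND2":
        return 1 - (bits[0] & bits[1])
    raise ValueError(f"unknown gate {op}")


def count_cells(expr):
    """Number of gate instances in the netlist."""
    if isinstance(expr, str):
        return 0
    return 1 + sum(count_cells(a) for a in expr[1:])


# "Translation" result: an unmapped netlist for (NOT a) OR (b AND c).
unmapped = ("OR", ("NOT", "a"), ("AND", "b", "c"))
mapped = map_to_library(unmapped)
optimized = optimize(mapped)
print(count_cells(mapped), "cells before optimization,",
      count_cells(optimized), "after")
```

The truth-table comparison in `evaluate` plays the role of an equivalence check between the unmapped and mapped netlists; a production tool would use formal methods rather than enumeration.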

<b>2.4.2.2 Place and route </b>

The backend step known as Place & Route is in charge of turning the gate-level netlist produced during synthesis into a physical design. The Place & Route stage comprises five different procedures, namely Design Planning, Placement, Clock Tree Synthesis, Routing, and Chip Finishing.

Design Planning involves configuring the specific tool's environment. Placement entails arranging all macros and cells within a predetermined space, typically occurring in two phases. The first phase, Coarse Placement, strategically positions standard cells to optimize timing and congestion, with no consideration for overlap prevention. The subsequent phase, known as Legalize, addresses overlap issues by relocating overlapping cells to the nearest available space. Floor planning produces assigned blocks and enables early estimates of interconnect length, circuit delay, and chip performance. [10]
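
The legalization step described above can be sketched for a single placement row. This is a simplified greedy sweep, not the algorithm used by IC Compiler: the cell names, the site-based coordinates, and the rule of only pushing cells to the right are assumptions for illustration.

```python
def legalize_row(cells, row_width):
    """Remove overlaps in one row.

    cells: list of (name, desired_x, width) in placement-site units,
    as produced by a coarse placement that ignores overlaps.
    Returns a dict mapping each cell name to its legal x position.
    """
    placed = {}
    cursor = 0  # first free site to the right of the cells placed so far
    for name, desired_x, width in sorted(cells, key=lambda c: c[1]):
        x = max(desired_x, cursor)  # keep the cell in place unless it overlaps
        if x + width > row_width:
            raise ValueError(f"row overflow while placing {name}")
        placed[name] = x
        cursor = x + width
    return placed


# Hypothetical coarse placement: U1, U2 and U3 overlap each other.
coarse = [("U1", 0, 3), ("U2", 2, 2), ("U3", 2, 4), ("U4", 12, 2)]
legal = legalize_row(coarse, row_width=20)
print(legal)
```

A real legalizer also considers moving cells to the left or to neighbouring rows so that the total displacement, and therefore the timing impact, stays small.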

Clock Tree Synthesis (CTS) entails constructing a balanced buffer tree for all high-fanout clock nets to mitigate violations related to clock skew, maximum transition time, capacitance, and setup and hold times.

Routing is responsible for designing the wires necessary to connect all circuit cells while adhering to the manufacturing process's rules. These connections are established using multiple metal layers stacked on top of each other, interconnected through vias.

Because routing introduces RC parasitic effects that cause delay, signal noise, and higher IR drop, it can negatively impact timing, transition, and capacitance margins. Clock signals are usually routed first and placed in the middle metal layers, away from the noisy power rails of the standard cells, to reduce these parasitic effects. The three routing stages are as

</div><span class="text_page_counter">Trang 31</span><div class="page_container" data-page="31">

follows: Global Routing (planning the routes for the nets), Track Assignment (assigning nets to specific metal layers), and Search & Repair (correcting violations).
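
The delay caused by the RC parasitics of a routed wire, mentioned above, is commonly estimated with the Elmore delay model. The sketch below applies it to a wire split into equal RC segments. The per-micrometre resistance and capacitance are placeholder values, not figures from the 32 nm EDK.

```python
def elmore_delay(r_per_um, c_per_um, length_um, segments=100, c_load=0.0):
    """Elmore delay at the far end of a wire modelled as an RC ladder.

    Each node contributes (resistance between driver and node) * (node cap).
    Units follow the inputs, e.g. ohm * fF gives femtoseconds.
    """
    r_seg = r_per_um * length_um / segments
    c_seg = c_per_um * length_um / segments
    delay = 0.0
    upstream_r = 0.0
    for i in range(segments):
        upstream_r += r_seg
        node_c = c_seg + (c_load if i == segments - 1 else 0.0)
        delay += upstream_r * node_c
    return delay


# Assumed wire parameters: 1.0 ohm/um and 0.2 fF/um.
short_wire = elmore_delay(1.0, 0.2, length_um=100)
long_wire = elmore_delay(1.0, 0.2, length_um=200)
print(f"100 um: {short_wire:.1f} fs, 200 um: {long_wire:.1f} fs")
```

Because both the resistance and the capacitance grow with length, doubling an unloaded wire roughly quadruples its delay, which is one reason long nets are buffered.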

<b>2.4.2.3 Static timing analysis </b>

Static Timing Analysis (STA) is a method for obtaining precise timing data without circuit simulation. It is used to find clock skew and the slow paths that limit the operating frequency, as well as to detect setup and hold timing violations. With tools such as Synopsys PrimeTime, STA can be run on a physical design for different corners. PrimeTime takes the post-layout netlist as input, together with standard cell and parasitic data, and produces a variety of reports that make timing violations easier to find.
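
The core of the setup and hold checks can be shown on a small timing graph. The following Python sketch propagates arrival times between two flip-flops and computes both slacks. The graph, the delay values, and the ideal zero-skew clock are assumptions for illustration and do not come from any PrimeTime report.

```python
# Hypothetical timing graph between two flip-flops; delays in ns are made up.
EDGES = [
    ("FF1/Q", "U1/Y", 0.30),
    ("FF1/Q", "U2/Y", 0.20),
    ("U1/Y", "U3/Y", 0.45),
    ("U2/Y", "U3/Y", 0.25),
    ("U3/Y", "FF2/D", 0.10),
]


def arrival(edges, start, end, start_time, pick):
    """Arrival time at `end`; pick=max gives the latest, pick=min the earliest."""
    fanin = {}
    for src, dst, delay in edges:
        fanin.setdefault(dst, []).append((src, delay))
    cache = {start: start_time}

    def visit(node):
        if node not in cache:
            cache[node] = pick(visit(src) + d for src, d in fanin[node])
        return cache[node]

    return visit(end)


CLOCK_PERIOD, CLK_TO_Q, T_SETUP, T_HOLD = 1.20, 0.15, 0.08, 0.05  # assumed, ns

latest = arrival(EDGES, "FF1/Q", "FF2/D", CLK_TO_Q, max)
earliest = arrival(EDGES, "FF1/Q", "FF2/D", CLK_TO_Q, min)

# Ideal clock (zero skew) assumed for both checks.
setup_slack = (CLOCK_PERIOD - T_SETUP) - latest
hold_slack = earliest - T_HOLD
print(f"setup slack = {setup_slack:.2f} ns, hold slack = {hold_slack:.2f} ns")
```

A negative setup slack means the slowest path limits the clock frequency, while a negative hold slack means the fastest path changes the captured data too early.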

As mentioned previously, these timing issues can often be resolved by introducing buffers or adjusting cell sizes. PrimeTime aids in pinpointing the locations where these modifications are required and allows for testing their effectiveness. Once a list of new buffers and resized cells is compiled, these alterations must be implemented within ICC. Subsequently, another round of parasitic extraction and STA is conducted to evaluate the results. This iterative process continues until no further violations are detected.

<b>2.4.2.4 Post-layout analysis and verification </b>

Once again, a Formality check is carried out to verify the logical equivalence between the post-layout netlist and the RTL description. The large number of transistors within a circuit can draw enough current to pull the supply voltage below the margin required for proper circuit operation. IR-drop analysis is employed to examine the power grid's strength and ensure it can maintain the minimum required voltage level.
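
The idea behind IR-drop analysis can be shown with Ohm's law on a single power strap. In this sketch the strap is fed from one end and modelled as a chain of equal resistive segments, with one cell drawing current at each node. The supply voltage, segment resistance, tap currents, and 5% margin are assumed values, and a tool such as PrimeRail solves the full two-dimensional grid instead.

```python
def rail_voltages(vdd, r_segment, tap_currents):
    """Voltage at each tap of a strap fed from one end.

    tap_currents: current drawn at each node (A), ordered from the supply.
    """
    voltages = []
    v = vdd
    remaining = sum(tap_currents)  # current entering the first segment
    for i_tap in tap_currents:
        v -= r_segment * remaining  # drop across the segment feeding this tap
        voltages.append(v)
        remaining -= i_tap
    return voltages


VDD = 1.05              # assumed nominal supply, volts
MIN_ALLOWED = 0.95 * VDD  # assumed 5% IR-drop budget

volts = rail_voltages(VDD, r_segment=0.5, tap_currents=[0.01, 0.01, 0.02])
worst = min(volts)
print("tap voltages:", [round(v, 4) for v in volts])
print("grid OK" if worst >= MIN_ALLOWED else "IR-drop violation")
```

The cell farthest from the supply sees the lowest voltage, which is why additional straps are usually added in regions with high current demand.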

Synopsys PrimeRail is the designated tool for producing reports on IR-drop and electromigration (EM) analyses. Subsequently, PrimeTime PX, an extension of PrimeTime, assumes the responsibility of conducting power analyses to estimate the circuit's power consumption across various corners. This tool is capable of calculating both dynamic and static power consumption, encompassing the entire design or focusing on the power consumption of individual standard cells or macros.
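
The two power components mentioned above follow the standard first-order CMOS expressions, switching power as activity times capacitance times voltage squared times frequency, and leakage power as leakage current times voltage. The sketch below evaluates them per cell and for the whole design. The cell list and every numeric value are hypothetical, and PrimeTime PX relies on characterized library data rather than these simplified formulas.

```python
def dynamic_power(activity, c_load, vdd, freq):
    """Switching power in watts: alpha * C * V^2 * f."""
    return activity * c_load * vdd ** 2 * freq


def static_power(i_leak, vdd):
    """Leakage power in watts: I_leak * V."""
    return i_leak * vdd


VDD, FREQ = 1.05, 500e6  # assumed supply (V) and clock frequency (Hz)

# Hypothetical cells: (name, switching activity, load capacitance F, leakage A)
CELLS = [
    ("U1", 0.10, 2e-15, 5e-9),
    ("U2", 0.25, 3e-15, 8e-9),
]

report = {
    name: (dynamic_power(a, c, VDD, FREQ), static_power(i, VDD))
    for name, a, c, i in CELLS
}
total_dynamic = sum(p[0] for p in report.values())
total_static = sum(p[1] for p in report.values())
print(f"dynamic: {total_dynamic:.3e} W, static: {total_static:.3e} W")
```

The quadratic dependence on the supply voltage is why voltage-related techniques have such a strong effect on dynamic power.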

<b>2.4.2.5 DRC/LVS verification </b>

The final stage is to create a complete file that contains the whole design layout. To do this, the standard cell layouts are merged with the IC Compiler layout in Custom Designer, which produces the final, complete graphical layout file. Hercules, the Synopsys tool for Layout Versus Schematic (LVS) checking and Design Rule Checking (DRC), is then launched. Design Rule Checking evaluates whether the foundry's geometric and connectivity rules are followed. These rules may cover minimum metal width, metal-to-well and well-to-well separation, the antenna effect, and metal fill density, among other factors. Conversely, Layout Versus Schematic,

</div><span class="text_page_counter">Trang 32</span><div class="page_container" data-page="32">

or LVS, assesses if the physical circuit corresponds with the initial circuit schematic. The schematic representation is usually a CDL netlist, while the layout is represented in the GDSII format.
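
Two of the geometric rules named above, minimum width and minimum spacing, can be checked with a few lines of Python on axis-aligned rectangles. The shape names, coordinates, and rule values are invented for this sketch and are not taken from any foundry rule deck; Hercules works on the full GDSII geometry with a much larger rule set.

```python
from itertools import combinations


def width_violations(rects, min_width):
    """Shapes whose narrower side is below the minimum width."""
    return [name for name, (x1, y1, x2, y2) in rects.items()
            if min(x2 - x1, y2 - y1) < min_width]


def spacing(a, b):
    """Edge-to-edge distance between two rectangles (0 if they touch or overlap)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx ** 2 + dy ** 2) ** 0.5


def spacing_violations(rects, min_spacing):
    """Pairs of separate shapes that are closer than the minimum spacing.

    Touching or overlapping shapes are treated as one connected shape.
    """
    return [(n1, n2) for (n1, r1), (n2, r2) in combinations(rects.items(), 2)
            if 0 < spacing(r1, r2) < min_spacing]


# Hypothetical metal-1 shapes as (x1, y1, x2, y2) in nanometres.
METAL1 = {
    "M1_a": (0, 0, 100, 50),
    "M1_b": (0, 80, 100, 130),
    "M1_c": (200, 0, 230, 200),
}

print("width:", width_violations(METAL1, min_width=50))
print("spacing:", spacing_violations(METAL1, min_spacing=50))
```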

<b>2.4.2.6 Summary </b>

Throughout this chapter, we have outlined the standard design flow using Synopsys tools. However, it's crucial to note that this conventional flow does not place a specific emphasis on power-related characteristics. In this flow, every design is essentially treated as a single-supply design. Therefore, it becomes imperative to explore how these tools can be adapted to accommodate power characteristics and the integration of power gating cells. It's worth highlighting that the implementation of power gating primarily revolves around the physical aspect of design. As a result, the key points of impact for this dissertation are expected to be in the Synthesis and Place & Route Analysis.

<b>2.5 Layout Components </b>

<b>2.5.1 Macros </b>

Memory blocks are referred to as macro cells. These IPs, which are created by a separate analog design team, can be used during the design process when creating the floorplan.

There are three types of macros: hard, firm, and soft.

Hard macros are defined in GDS or LEF files. The term is used only for hardware IPs, such as memories. A hard macro is similar to a block-level design that has already been silicon-tested and optimized for PPA (Power, Performance, Area). When placing hard macros, we are limited to moving, rotating, and flipping them; nothing inside them can be altered. Hard macros are accessible only through their pins, and their RTL cannot be modified.

Soft macros are defined in synthesizable RTL. Since RTL is technology-independent, these macros are likewise not tied to a specific process. Soft macros are therefore more flexible than hard and firm macros, because their RTL can be edited before moving on to later phases.

<b>2.5.2 I/O ports </b>

I/O is a function for exchanging data and signals between external devices and a microcontroller. Basic operation includes "read" and "write" by the CPU. Peripheral circuits dedicated to external devices are prepared, and they perform input, output, and communication of data. The port connects the CPU to a peripheral device via a hardware interface or to the network via a network interface.

</div><span class="text_page_counter">Trang 33</span><div class="page_container" data-page="33">

<i>Figure 2.15 I/O ports </i>

<b>2.5.3 Cells </b>

A Standard Cell is a group of transistors and their interconnect structures that provides a Boolean logic function (NOT, AND, OR, XOR, NOR…) or a storage function (flip-flop or latch).

The logical view of a Cell is its Boolean logic function; for combinational logic, this takes the form of a truth table or Boolean algebra equation, and for sequential logic, it takes the shape of a state transition table.
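
The two kinds of logical view can be written down directly. The sketch below builds the truth table of a two-input NAND cell and the state transition table of a D flip-flop. The helper name `truth_table` and the dictionary layout are choices made for this example only.

```python
from itertools import product


def truth_table(func, n_inputs):
    """Map every input combination to the output of a combinational cell."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}


# Combinational cell: two-input NAND.
nand2 = truth_table(lambda a, b: 1 - (a & b), 2)

# Sequential cell: D flip-flop, (current Q, D) -> next Q at the active clock edge.
dff_next_state = {(q, d): d for q, d in product((0, 1), repeat=2)}

for inputs, out in nand2.items():
    print("NAND2", inputs, "->", out)
for (q, d), q_next in dff_next_state.items():
    print(f"DFF Q={q} D={d} -> Q+={q_next}")
```

The flip-flop table needs the current state as an extra column, which is what distinguishes a state transition table from a plain truth table.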

There are several types of Standard Cells:

• Buffer (inverting and non-inverting)

• Combinational (AND, OR, NOR, NOT…)

• Arithmetic (XOR, full-adder, half-adder)

• Sequential (latches, clock-gating components, D-type flip-flops…)

<b>2.5.4 Rectangular rings and power straps </b>

VDD and VSS power rings, also called rectangular rings, are formed around the core and the macros and are split across horizontal and vertical layers. In addition, power straps, which tap power from the rings to the core-area rails (special routes), are created for the macros so that the power requirement of the entire design is met.

</div>