# Pulsed Flip-Flop with Dual Dynamic Node for Low Power using Embedded Logic

C. Aishwarya\*, J.R. Beny\* and R. Rajasekaran\*

Abstract: In this paper, we introduce a new dual dynamic node hybrid flip-flop (DDFF) and a novel embedded logic module (DDFF-ELM) based on DDFF. The proposed designs eliminate the large capacitance present in the precharge node of several state-of-the-art designs by following a split dynamic node structure to separately drive the output pull-up and pull-down transistors. The DDFF offers a power reduction of up to 37% and 30% compared to the conventional flip-flops at 25% and 50% data activities, respectively. The aim of the DDFF-ELM is to reduce pipeline overhead. It presents an area-, power-, and speed-efficient method to incorporate complex logic functions into the flip-flop. The performance comparisons made in a 90 nm UMC process show a power reduction of 27% compared to the semi-dynamic flip-flop, with no degradation in speed performance. The leakage power and process-voltage-temperature variations of various designs are studied in detail and are compared with the proposed designs. Also, DDFF and DDFF-ELM are compared with other state-of-the-art designs by implementing a 4-b synchronous counter and a 4-b Johnson up-down counter. The performance improvements indicate that the proposed designs are well suited for modern high-performance designs where power dissipation and latching overhead are of major concern.

Keywords : Embedded logic, flip-flops, high-speed, leakage power, low-power.

#### 1. INTRODUCTION

Technology and speed are always moving forward, from small-scale integration to large-scale and very-large-scale integration (VLSI), and from megahertz (MHz) to gigahertz (GHz). System requirements rise along with this continuous advance in technology and operating speed. In synchronous systems, high speed has been achieved using advanced pipelining techniques. In modern deep-pipelined architectures, pushing the speed further up demands a lower pipeline overhead. This overhead is the latency associated with the pipeline elements, such as the flip-flops and latches. Extensive work has been devoted to improving the performance of flip-flops in the past few decades [1]–[3], [8]–[14], [16].

Hybrid latch flip-flop (HLFF) [1] and semi-dynamic flip-flop (SDFF) [2] are considered the classic high-performance flip-flops. They possess a hybrid architecture that combines the merits of dynamic and static structures. In addition, SDFF has a distinctive capability of incorporating logic very efficiently because, unlike the true single-phase clocked (TSPC) latch of Yuan and Svensson [3], only one transistor is driven by the data input. This greatly helps in reducing the pipeline overhead, since the delay and area associated with one or more logic stages preceding the flip-flop can be eliminated. Several hybrid flip-flop designs have been proposed in the past decade, all aiming at reduction of power, delay, and area [8]–[17].

A recent paper [4] introduced a flip-flop architecture named cross charge control flip-flop (XCFF), which has considerable advantages over SDFF and HLFF in both power and speed. It uses a split dynamic node to reduce the precharge capacitance, which is one of the most important reasons for the large power consumption in most of the conventional designs. But this structure still has some drawbacks. Redundant power is dissipated when the data does not switch for more than one clock (CLK) cycle. Also, the large hold-time requirement makes the design of timing-critical systems with XCFF an involved process. Finally, despite having a single data-driven transistor, embedding logic into XCFF is not very efficient because of the susceptibility to charge sharing at the internal dynamic nodes.

In this paper, we propose a new dual dynamic node hybrid flip-flop (DDFF) and a novel embedded logic module (DDFF-ELM). Both of them eliminate the drawbacks of XCFF. The new designs are free from the unwanted transitions that result when the data input is stable at zero. DDFF-ELM presents a speed-, area-, and power-efficient method to reduce the pipeline overhead. The performance of modern high-performance flip-flops is compared with that of DDFF at different data activities. The post-layout simulation results in a 90 nm UMC process show that the DDFF saves 8% and 10% of the total power dissipated at 50% and 25% data activities, respectively, when compared with XCFF. The proposed DDFF-ELM has a maximum power reduction of about 27% compared to its counterpart in SDFF.

<sup>\*</sup> Assistant Professor, Dept. of EEE, SNS College of Technology, Coimbatore. *E-mail : aishu.mouli@gmail.com, jrbenyje@gmail.com, rare457@gmail.com* 

#### 2. ANALYSIS OF FLIP-FLOP ARCHITECTURES

A large number of flip-flops and latches have been published in the past few decades. They can be grouped under the static and dynamic design styles. The former includes the master-slave designs, such as the transmission-gate-based master-slave flip-flop in [5] and the PowerPC 603 master-slave latch of Gerosa et al. [6]. They dissipate comparatively lower power and have a low clock-to-output (CLK-Q) delay. In a synchronous system, however, the delay overhead associated with the latching elements is expressed by the data-to-output (D-Q) delay rather than the CLK-Q delay [7]. Here, D-Q delay refers to the sum of the CLK-Q delay and the setup time of the flip-flop. The static designs mentioned earlier lack a low D-Q delay because of their large positive setup time. Also, most of them are susceptible to flow-through resulting from CLK overlap.
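The definition of D-Q delay given above can be written compactly. The symbols $t_{D\text{-}Q}$, $t_{CLK\text{-}Q}$, and $t_{setup}$ are notation introduced here for illustration only:

```latex
t_{D\text{-}Q} = t_{CLK\text{-}Q} + t_{setup}
```

Under this relation, a large positive $t_{setup}$ inflates the D-Q delay even when the CLK-Q delay is small, whereas a flip-flop with a negative setup time has a D-Q delay smaller than its CLK-Q delay.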



Figure 1: PowerPC 603 flip-flop

PowerPC 603 (Fig. 1) is one of the most efficient classic static structures. It has the advantages of a low-power keeper structure and a low-latency direct path. As mentioned earlier, the large D-Q delay resulting from the positive setup time is one of the disadvantages of this design. Also, the large data and CLK node capacitances make the design inferior in performance. The second category of flip-flop design, the dynamic flip-flops, includes the modern high-performance flip-flops [1]–[3], [8]–[15]. There are purely dynamic designs as well as pseudo-dynamic structures. The latter, which have an internal precharge structure and a static output, deserve special attention because of their distinctive performance improvements.

They are called semi-dynamic or hybrid structures because they consist of a dynamic frontend and a static output. HLFF (Fig. 2) and SDFF (Fig. 3) fall under this category. They benefit from the CLK overlap to perform the latching operation. SDFF is the fastest classic hybrid structure, but it is not efficient as far as power consumption is concerned because of the large CLK load as well as the large precharge capacitance. HLFF is not the fastest but has lower power consumption than SDFF. The longer stack of nMOS transistors at the output node (Fig. 2) makes it slower than SDFF and causes a large hold-time requirement. This large positive hold-time requirement makes the integration of HLFF into complex circuits a difficult process. It is also inefficient in embedding logic.



Figure 2: HLFF

The major sources of power dissipation in the conventional semi-dynamic designs are the redundant data transitions and the large precharge capacitance. Many attempts have been made to reduce the redundant data transitions in flip-flops [8]–[13]. The conditional data mapping flip-flop (CDMFF) shown in Fig. 4 is one of the most efficient among them. It uses an output feedback structure to conditionally feed the data to the flip-flop. This reduces overall power dissipation by eliminating unwanted transitions when a redundant event is predicted [12]. Since there are no added transistors in the pull-down nMOS stack, the speed performance is not greatly affected. But the presence of three stacked nMOS transistors at the output node, similar to HLFF, and the presence of conditional structures in the critical path increase the hold-time requirement and the D-Q delay of the flip-flop. Also, the additional transistors added for the conditional circuitry make the flip-flop bulky and increase power dissipation at higher data activities.

The large precharge capacitance in a wide variety of designs results from the fact that both the output pull-up and the pull-down transistors are driven by this precharge node. These transistors, which drive large output loads, contribute most of the capacitance at this node. This common drawback of many conventional designs was considered in the design of XCFF (Fig. 5). It reduces the power dissipation by splitting the dynamic node into two, each one separately driving the output pull-up or pull-down transistor.
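As a first-order illustration of why the precharge capacitance matters, the standard CMOS switching-power expression can be used. It is not a formula given in this paper, and the symbols $\alpha$ (switching activity of the node), $C$ (node capacitance), $V_{DD}$ (supply voltage), and $f$ (clock frequency) are introduced here for illustration only:

```latex
P_{dyn} = \alpha \, C \, V_{DD}^{2} \, f
```

Under this approximation, splitting the precharge node means that the capacitance switched in a given cycle is that of one of the two smaller nodes rather than the combined gate load of both output transistors.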

Since only one of the two dynamic nodes is switched during one CLK cycle, the total power consumption is considerably reduced without any degradation in speed. XCFF also has a comparatively lower CLK driving load. One of the major drawbacks of this design is the redundant precharge at nodes X2 and X1 for data patterns containing more 0s and 1s, respectively. In addition to the large hold-time requirement resulting from the conditional shutoff mechanism, a low-to-high transition in the CLK when the data is held low can cause charge sharing at node X1. This can trigger an erroneous transition at the output unless the inverter pair INV1-2 is carefully skewed. This effect of charge sharing becomes uncontrollably large when complex functions are embedded into the design.



Figure 3: Semi-dynamic flip-flop (SDFF)

The conditional shutoff mechanism provided in SDFF (Fig. 3) is robust. It is capable of producing a smaller sampling window by skewing the inverters and the NAND gate in the conditional shutoff path.



Figure 4: CDMFF

#### 3. PROPOSED DDFF

Fig. 6 shows the proposed DDFF architecture. Node X1 is pseudo-dynamic, with a weak inverter acting as a keeper, whereas node X2, unlike in the XCFF, is purely dynamic. An unconditional shutoff mechanism is provided at the frontend instead of the conditional one in XCFF. The operation of the flip-flop can be divided into two phases: 1) the evaluation phase, when CLK is high, and 2) the precharge phase, when CLK is low. The actual latching occurs during the 1–1 overlap of CLK and CLKB during the evaluation phase. If D is high prior to this overlap period, node X1 is discharged through NM0-2. This switches the state of the cross-coupled inverter pair INV1-2, causing node X1B to go high and output QB to discharge through NM4.



Figure 5: XCFF

The low level at node X1 is retained by the inverter pair INV1-2 for the rest of the evaluation phase, during which no latching occurs. Thus, node X2 is held high throughout the evaluation period by the pMOS transistor PM1. As the CLK falls low, the circuit enters the precharge phase and node X1 is pulled high through PM0, switching the state of INV1-2. During this period node X2 is not actively driven by any transistor; it stores the charge dynamically. The outputs at nodes QB and Q maintain their voltage levels through INV3-4.

If D is zero prior to the overlap period, node X1 remains high and node X2 is pulled low through NM3 as the CLK goes high. Thus, node QB is charged high through PM2 and NM4 is held off. At the end of the evaluation phase, as the CLK falls low, node X1 remains high and X2 stores the charge dynamically. The architecture exhibits negative setup time, since the short transparency period defined by the 1–1 overlap of CLK and CLKB allows the data to be sampled even after the rising edge of the CLK, before CLKB falls low [7].
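The two phases described above can be summarized with a small behavioral sketch. It is a logic-level abstraction written for illustration, not a transistor-level model of the proposed circuit: the class name `DDFFModel`, the helper `run`, and the initial node values are assumptions, and timing, the overlap window, and charge leakage are ignored.

```python
from dataclasses import dataclass


@dataclass
class DDFFModel:
    """Logic-level sketch of the DDFF internal nodes (1 = high, 0 = low)."""

    x1: int = 1  # pseudo-dynamic node, precharged high
    x2: int = 1  # purely dynamic node (initial value is an assumption)
    qb: int = 1  # complementary output QB (initial value is an assumption)

    @property
    def q(self) -> int:
        """Output Q is the complement of QB."""
        return 1 - self.qb

    def evaluate(self, d: int) -> None:
        """CLK high: D is sampled during the CLK/CLKB overlap."""
        if d:
            self.x1 = 0  # X1 discharges through NM0-2
            self.x2 = 1  # X2 is held high by PM1
            self.qb = 0  # X1B goes high, QB discharges through NM4
        else:
            self.x1 = 1  # X1 stays high
            self.x2 = 0  # X2 is pulled low through NM3
            self.qb = 1  # QB is charged high through PM2

    def precharge(self) -> None:
        """CLK low: X1 is pulled high through PM0; X2 and QB keep their values."""
        self.x1 = 1


def run(bits):
    """Clock a data sequence through the model and return Q after each edge."""
    ff = DDFFModel()
    outputs = []
    for d in bits:
        ff.evaluate(d)
        outputs.append(ff.q)
        ff.precharge()
    return outputs


if __name__ == "__main__":
    print(run([1, 0, 0, 1, 1]))
```

In this abstraction Q simply follows the sampled D, while X1 returns high in every precharge phase and X2 keeps whatever value it had at the end of evaluation.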



Figure 6: Proposed DDFF



Figure 7: Result of DDFF

#### 4. PROPOSED ELM

As mentioned earlier, the major advantage of the SDFF is the capability to incorporate complex logic functions efficiently. The efficiency in terms of speed and area comes from the fact that an N-input function can be realized in a positive-edge-triggered structure using a pull-down network (PDN) consisting of N transistors, as shown in Fig. 8(*a*).

Compared to the discrete combination of a static gate and a flip-flop, this embedded structure offers a very fast and small implementation. Although SDFF is capable of offering efficiency in terms of speed and area, it is not a good solution as far as power consumption is concerned. Few attempts have been made to design a flip-flop that can incorporate logic efficiently in terms of power, speed, and area. The double-pulsed set-conditional-reset flip-flop (DPSCRFF) [15] is one of the flip-flops capable of incorporating logic. But this structure has an explicit pulse generator to generate two pulses from the global CLK, which can cause large power consumption even when there is no data transition. Also, the three-inverter delay between the two pulses, p1 and p2 [15], causes a direct path between the supply rails and a large glitch at the output when the data input remains high for more than one CLK cycle. In addition, the highly asymmetric timing of the design and the large hold-time requirements prevent it from being directly cascaded without the use of additional buffers. Another flip-flop design aiming at efficient logic embedding is presented. The revised structure of the proposed dual dynamic node hybrid flip-flop with logic embedding capability (DDFF-ELM) is shown in Fig. 8(b). Note that in the revised model, the transistor driven by the data input is replaced by the PDN and the clocking scheme in the frontend is changed. The reason for this change in clocking is the charge sharing, which becomes uncontrollable as the number of nMOS transistors in the stack increases.





Figure 8: Flip-flops with embedded logic (a) SDFF (b) Proposed DDFF-ELM

In the proposed structure [Fig. 8(b)], since a low-to-high transition of CLKB occurs when CLK is low, node X1 is held high by PM0, making this design free from charge sharing. The operation of the logic element is similar to that of the proposed DDFF.
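The idea of replacing the data-driven transistor with an N-transistor pull-down network can be sketched at the logic level. The representation below (nested tuples, the names `conducts`, `transistor_count`, and `captured_q`) is hypothetical and chosen only for illustration. It assumes, following the DDFF description, that Q goes high when the PDN conducts and discharges node X1.

```python
def conducts(network, inputs):
    """Return True if the nMOS pull-down network conducts.

    A network is either an input name (one transistor) or a tuple
    ("series", ...) / ("parallel", ...) of sub-networks.
    """
    if isinstance(network, str):
        return bool(inputs[network])
    kind, *branches = network
    if kind == "series":
        return all(conducts(b, inputs) for b in branches)
    if kind == "parallel":
        return any(conducts(b, inputs) for b in branches)
    raise ValueError(f"unknown network type: {kind}")


def transistor_count(network):
    """Number of nMOS devices in the PDN: one per input occurrence."""
    if isinstance(network, str):
        return 1
    return sum(transistor_count(b) for b in network[1:])


def captured_q(network, inputs):
    """Value of Q after the clock edge (assumption: Q is high when the PDN conducts)."""
    return 1 if conducts(network, inputs) else 0


# Two-input examples: a series stack realizes AND, parallel devices realize OR.
AND2 = ("series", "a", "b")
OR2 = ("parallel", "a", "b")

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            values = {"a": a, "b": b}
            print(a, b, captured_q(AND2, values), captured_q(OR2, values))
```

A series stack grows with the number of inputs, which is the situation in which the text notes that charge sharing becomes harder to control.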

Table 1 shows that the proposed flip-flop has the lowest PDP among the group. It gives 29%, 10%, and 7% reduction in total power dissipation compared to SDFF, PowerPC, and XCFF, respectively, along with comparable speed performance. In order to estimate the size of the flip-flops, the number of transistors used and the total layout area of various designs are provided. The proposed flip-flop uses the least number of devices.

Table 2 gives the performance comparison of the ELM with various embedded functions. The results show that the proposed ELM gives speed performance comparable to that of the SDFF-ELM. The DDFF-ELM exhibits 15% and 22% lower delay for AND and OR logic, respectively. As expected, the power performance of the proposed ELM is superior to that of the SDFF.



Figure 9: Result of DDFF-ELM

Table 1 Data Activity

| Flip-Flop   | Number of Transistors | Total Power (nW) | D-Q Delay (ns) | PDP (fJ) |
|-------------|-----------------------|------------------|----------------|----------|
| PowerPC 603 | 22                    | 1.3429           | 195.42         | 3.75     |
| HLFF        | 20                    | 1.9658           | 191.17         | 2.62     |
| SDFF        | 23                    | 2.1132           | 188.26         | 3.97     |
| CDMFF       | 22                    | 1.2885           | 199.37         | 2.56     |
| XCFF        | 21                    | 1.3119           | 195.42         | 2.45     |
| DDFF        | 18                    | 1.0472           | 197.31         | 2.06     |

Table 2 Performance Comparisons

| Function | SDFF-ELM(D-Q) | DDFF-ELM(D-Q) | SDFF-ELM(NS) | DDFF-ELM(D-Q) |
|----------|---------------|---------------|--------------|---------------|
| AND      | 160.331       | 148.242       | 1.8792       | 1.5784        |
| OR       | 127.892       | 138.231       | 1.6352       | 1.5689        |

As the total power dissipated in the flip-flop depends on the data activity, an illustration of the power dissipated at data activities of 100%, 25%, and 0% is given in Fig. 11.

A data activity of 100% corresponds to the 101010... data pattern, a 50% data activity corresponds to the 11001100... data pattern, and so on. In order to analyze the performance of the flip-flop in the absence of any data switching, the power dissipation corresponding to 0% data activity for the 11111... and 00000... data patterns is also provided. The results show that the proposed design consumes the lowest total power for 100% and 0% (0000...) data activity. As mentioned earlier, the small precharge node, CLK-input, and data-input capacitances make the proposed flip-flop power efficient at higher data rates. At 25% data activity, CDMFF dissipates the lowest power because the conditional structure eliminates the redundant transitions. For the 11111... data pattern, DDFF consumes higher power than the XCFF, CDMFF, and PowerPC flip-flops. This is because of the unconditional shutoff mechanism provided in the frontend, but the power is still less than that of SDFF and HLFF. As mentioned earlier, the 00000... data pattern causes large redundant power dissipation in XCFF because of the unwanted activity at node X2. Since this redundancy is eliminated, DDFF provides superior performance for this data pattern.
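The mapping between bit patterns and data activity can be checked with a short sketch. The function name `data_activity` is hypothetical. The pattern is treated as one period of a repeating stream, and the 25% pattern shown is an assumption, since the text gives explicit patterns only for 100%, 50%, and 0%.

```python
def data_activity(pattern: str) -> float:
    """Fraction of clock cycles in which the data bit changes.

    The pattern is treated as one period of a repeating stream, so the
    last bit is also compared with the first bit.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    n = len(pattern)
    transitions = sum(pattern[i] != pattern[(i + 1) % n] for i in range(n))
    return transitions / n


if __name__ == "__main__":
    # Patterns for 100%, 50%, and 0% follow the text; "11110000" for 25%
    # is an illustrative assumption.
    for bits in ("10101010", "11001100", "11110000", "11111111", "00000000"):
        print(bits, f"{data_activity(bits):.0%}")
```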





Figure 10: Power vs. applied voltage





Figure 12: 4-Bit Johnson Counter Using DDFF

Figure 13 shows the result of the 4-bit Johnson counter. The proposed dual dynamic node hybrid flip-flops are connected in cascade, with the Q-bar output of the last flip-flop fed back to the input of the first flip-flop. Initially, all flip-flops are reset to "0000". When the first clock pulse is applied, the output becomes "1000", and so on.
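The counting sequence described above can be reproduced with a minimal logic-level sketch. The function name `johnson_sequence` and its parameters are hypothetical. Each flip-flop is modeled as an ideal edge-triggered element, so none of the electrical behavior of the DDFF is captured.

```python
def johnson_sequence(width: int = 4, steps: int = 8):
    """Return the states of a Johnson counter, starting from all zeros.

    On each clock pulse the register shifts by one position and the
    first stage loads the complement (Q bar) of the last stage.
    """
    state = [0] * width
    sequence = ["".join(str(bit) for bit in state)]
    for _ in range(steps):
        feedback = 1 - state[-1]          # Q bar of the last flip-flop
        state = [feedback] + state[:-1]   # shift toward the last stage
        sequence.append("".join(str(bit) for bit in state))
    return sequence


if __name__ == "__main__":
    print(" -> ".join(johnson_sequence()))
```

For a 4-bit counter the state returns to "0000" after eight clock pulses, so the sequence has 2N = 8 distinct states.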



Figure 13: Result of 4 Bit Johnson Counter using DDFF

#### 5. CONCLUSION

In this paper, a new low-power DDFF and a novel DDFF-ELM were proposed. An analysis of the overlap period required to select the proper pulse width was provided in order to make the design process simpler. The proposed DDFF eliminates the redundant power dissipation present in the XCFF. A comparison of the proposed flip-flop with the conventional flip-flops showed that it exhibits lower power dissipation along with comparable speed performance. The post-layout simulation results showed an improvement in PDP of about 10% compared to the XCFF at 25% data activity. By eliminating the charge sharing, the revised structure of the proposed flip-flop, DDFF-ELM, is capable of efficiently incorporating complex logic into the flip-flop. The presented ELM outperforms the SDFF in CLK driving power and in internal power dissipation. A power reduction of approximately 26% was observed when basic functions were embedded. The leakage and PVT variation performances of the flip-flops were studied in detail. The efficiency of the flip-flop and the ELM was further highlighted using a 4-b synchronous counter and a 4-b Johnson up-down counter, respectively. The results indicate that the proposed architectures are well suited for modern high-performance designs where area, delay overhead, and power dissipation are of major concern.

#### 6. REFERENCES

- 1. H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in Proc. IEEE ISSCC Dig. Tech. Papers, Feb. 1996, pp. 138–139.
- 2. F. Klass, "Semi-dynamic and dynamic flip-flops with embedded logic," in Proc. Symp. VLSI Circuits Dig. Tech. Papers, Honolulu, HI, Jun. 1998, pp. 108–109.
- 3. J. Yuan and C. Svensson, "New single-clock CMOS latches and flip-flops with improved speed and power savings," IEEE J. Solid-State Circuits, vol. 32, no. 1, pp. 62–69, Jan. 1997.
- 4. A. Hirata, K. Nakanishi, M. Nozoe, and A. Miyoshi, "The cross charge control flip-flop: A low-power and high-speed flip-flop suitable for mobile application SoCs," in Proc. Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2005, pp. 306–307.
- 5. J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 2003.
- 6. G. Gerosa, S. Gary, C. Dietz, P. Dac, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito, N. Tai, S. Litch, J. Eno, J. Golab, N. Vanderschaaf, and J. Kahle, "A 2.2 W, 80 MHz superscalar RISC microprocessor," IEEE J. Solid-State Circuits, vol. 29, no. 12, pp. 1440–1452, Dec. 1994.
- 7. V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- 8. B.-S. Kong, S.-S. Kim, and Y.-H. Jun, "Conditional-capture flip-flop for statistical power reduction," IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
- 9. N. Nedovic and V. G. Oklobdzija, "Hybrid latch flip-flop with improved power efficiency," in Proc. Symp. Integr. Circuits Syst. Design, 2000, pp. 211–215.
- 10. N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Conditional pre-charge techniques for power-efficient dual-edge clocking," in Proc. Int. Symp. Low-Power Electron. Design, 2002, pp. 56–59.
- 11. P. Zhao, T. K. Darwish, and M. A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004.
- 12. C. K. Teh, M. Hamada, T. Fujita, H. Hara, N. Ikumi, and Y. Oowaki, "Conditional data mapping flip-flops for low-power and high-performance systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 14, no. 12, pp. 1379–1383, Dec. 2006.
- 13. S. H. Rasouli, A. Khademzadeh, A. Afzali-Kusha, and M. Nourani, "Low-power single- and double-edge-triggered flip-flops for high-speed applications," Proc. Inst. Elect. Eng. Circuits Devices Syst., vol. 152, no. 2, pp. 118–122, Apr. 2005.
- 14. H. Mahmoodi, V. Tirumalashetty, M. Cooke, and K. Roy, "Ultra low power clocking scheme using energy recovery and clock gating," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 33–44, Jan. 2009.
- 15. A. Ma and K. Asanovic, "A double-pulsed set-conditional-reset flip-flop," Laboratory for Computer Science, Massachusetts Inst. Technology, Cambridge, Tech. Rep. MIT-LCS-TR-844, May 2002.
- 16. O. Sarbishei and M. Maymandi-Nejad, "Power-delay efficient overlap based charge-sharing free pseudo-dynamic D flip-flops," in Proc. IEEE Int. Symp. Circuits Syst., May 2007, pp. 637–640.
- 17. O. Sarbishei and M. Maymandi-Nejad, "A novel overlap-based logic cell: An efficient implementation of flip-flops with embedded logic," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 2, pp. 222–231, Feb. 2010.
- 18. M. Hansson and A. Alvandpour, "Comparative analysis of process variation impact on flip-flop power-performance," in Proc. IEEE Int. Symp. Circuits Syst., May 2007, pp. 3744–3747.
- 19. S. Yang, W. Wolf, N. Vijaykrishnan, Y. Xie, and W. Wang, "Accurate stacking effect macro-modeling of leakage power in sub-100 nm circuits," in Proc. IEEE 18th Int. Conf. VLSI Design, Jan. 2005, pp. 165–170.
- 20. Y.-F. Tsai, D. Duarte, N. Vijaykrishnan, and M. J. Irwin, "Implications of technology scaling on leakage reduction techniques," in Proc. Design Autom. Conf., Jun. 2003, pp. 187–190.