## 40 Gbit/s limiting output buffer in 80 nm CMOS

## G. Sialm, C. Kromer, T. Morf, F. Ellinger and H. Jäckel

A 40 Gbit/s, 1 V limiting output buffer for an AC-coupled 50 $\Omega$ load with a differential output swing of 660 mV and a gain of 18 dB is presented. A power consumption of only 24 mW and a simulated risetime of 11 ps are achieved by means of a systematic buffer optimisation.

Introduction: The aggressive downscaling of transistor gate lengths enables the design of CMOS circuits operating at 40 Gbit/s. To route 40 Gbit/s signals from chip to chip in parallel Tbit/s links, or from chip to measurement equipment, small and power-efficient 40 Gbit/s buffers are required. Recently published circuits include a 40 Gbit/s CMOS amplifier [1], a 40 Gbit/s multiplexer and demultiplexer [2], and drivers at 30 Gbit/s [3]. However, they suffer from modest output swings, which reduce noise immunity in broadband interconnects. Moreover, the decreasing supply voltages required to prevent oxide breakdown make the development of drivers with a large output swing, $V_{\text{out}}$, challenging. Therefore, the conventional figure of merit (FOM), power consumption per data rate ($P/B$), is extended by including $V_{\text{out}}$, resulting in $V_{\text{out}}/(P/B)$. In this Letter, a five-stage differential-pair output buffer with common-mode feedback (CMFB) for an AC-coupled 50 $\Omega$ load is presented in 80 nm CMOS. The circuit reaches an FOM of 1100 mV/(mW/Gbit/s), which is more than seven times larger than the highest FOM reported previously; see Table 1.

Table 1: Driver comparison

| FOM [mV/(mW/Gbit/s)] | Technology        | Data rate $B$ [Gbit/s] | $P$ [mW] | Differential $V_{\text{out}}$ [mV] | Ref.      |
|----------------------|-------------------|------------------------|----------|------------------------------------|-----------|
| 85.7                 | 150 nm GaAs pHEMT | 40                     | 2800     | 6000                               | [5]       |
| 153.3                | 130 nm CMOS       | 12                     | 43.85    | 560                                | [6]       |
| 120                  | 130 nm CMOS       | 30                     | 150      | 600                                | [3]       |
| 1100                 | 80 nm CMOS        | 40                     | 24       | 660                                | This work |
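As a cross-check of Table 1, the extended figure of merit $V_{\text{out}}/(P/B)$ can be recomputed from the tabulated swing, power and data rate. The following is a minimal sketch; the function name `fom` and the row labels are illustrative and not taken from the Letter.

```python
def fom(v_out_mv, power_mw, rate_gbps):
    """Extended figure of merit V_out / (P / B) in mV/(mW/Gbit/s)."""
    return v_out_mv / (power_mw / rate_gbps)


# (label, differential V_out [mV], P [mW], B [Gbit/s]) copied from Table 1
drivers = [
    ("150 nm GaAs pHEMT [5]", 6000, 2800, 40),
    ("130 nm CMOS [6]", 560, 43.85, 12),
    ("130 nm CMOS [3]", 600, 150, 30),
    ("80 nm CMOS (this work)", 660, 24, 40),
]

results = {name: fom(v, p, b) for name, v, p, b in drivers}
for name, value in results.items():
    print(f"{name}: {value:.1f} mV/(mW/Gbit/s)")

# Ratio of this work to the best previously reported FOM in the table
best_prior = max(v for n, v in results.items() if "this work" not in n)
ratio = results["80 nm CMOS (this work)"] / best_prior
print(f"improvement over best prior entry: {ratio:.2f}x")
```

The recomputed values agree with the tabulated FOM column to within rounding, and the ratio to the best prior entry is slightly above seven.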

Circuit design: The 50  $\Omega$  buffer is implemented in IBM's standard CU-08 CMOS process with eight metal layers. The design goal is to demonstrate a 50  $\Omega$  driver with a supply voltage of only 1 V and a differential  $V_{\text{out}} > 500 \text{ mV}$  with an input voltage swing of  $V_{\text{in}} = 300 \text{ mV}$  at a data rate of 40 Gbit/s and a risetime  $t_r$  of about 10 ps, low power consumption and small area.

In the following, the design procedure of the driver in Fig. 1 is explained. The transistor width of the main driver is determined by (i) the 50 $\Omega$ output matching requirements and (ii) the common-mode voltage $V_{\text{CM}}$ required for an optimal ratio of gain to power consumption ($G/P$). Owing to the large output capacitance of 190 fF (of which 100 fF is pad capacitance), the dominant pole is at the output. Using shunt peaking with a 1 dB peak in the frequency response and a total load of 25 $\Omega$, the minimum achievable risetime $t_r$ is 7.5 ps.



Fig. 1 Circuit schematics of implemented driver

ELECTRONICS LETTERS 15th September 2005 Vol. 41 No. 19

To meet the matching requirements a large transistor is necessary ($w = 28\ \mu\text{m}$). Therefore, the main task of the pre-driver is to step down this large input capacitance without increasing the previously calculated $t_r$ too strongly. Consequently, the number of pre-driver stages is optimised in such a way that, for a given total capacitance step-down ratio $F_{\text{tot}}$ of 7 (the ratio between the input capacitances of the main driver and the pre-driver), a minimal $t_r$ and $P$ result. For this $F_{\text{tot}}$, which is defined by the minimum transistor width, the optimal number of pre-driver stages is 4. Because of the large bias resistor, the minimum transistor width is restricted by the maximum peaking inductance [4]. The maximum inductance ($\leq 2.1$ nH) is in turn limited by layout constraints and driver chip area. A driver chip area of only 50 $\mu$m $\times$ 80 $\mu$m has been achieved. This leads to a transistor width and an input capacitance of the pre-driver of 4 $\mu$m and 8 fF, respectively.
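The step-down numbers above can be tied together with a short calculation. This is only a sketch under two assumptions that the Letter does not state: that input capacitance scales linearly with transistor width, and that the taper is uniform across the four pre-driver stages. All variable names are illustrative.

```python
# Values quoted in the text
w_main_um = 28.0      # main driver transistor width
w_pre_um = 4.0        # first pre-driver transistor width
c_in_pre_ff = 8.0     # input capacitance of the first pre-driver stage
n_pre_stages = 4      # optimal number of pre-driver stages

# Total step-down ratio; equals the capacitance ratio if C_in scales with width
f_tot = w_main_um / w_pre_um

# ASSUMPTION: uniform taper, i.e. the same capacitance ratio between
# consecutive stages (pre-driver 1 -> 2 -> 3 -> 4 -> main driver).
f_stage = f_tot ** (1.0 / n_pre_stages)

# Resulting input capacitance at each stage input, ending at the main driver
c_in_ff = [c_in_pre_ff * f_stage ** k for k in range(n_pre_stages + 1)]

print(f"F_tot = {f_tot:.1f}, per-stage ratio = {f_stage:.3f}")
print("input capacitances [fF]:", [round(c, 1) for c in c_in_ff])
```

Under these assumptions each stage drives a load about 1.6 times its own input capacitance, and the main driver presents about 56 fF, which follows directly from $F_{\text{tot}} = 7$ and the 8 fF pre-driver input.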

However, the minimum achievable  $t_r$  of a non-limiting five-stage driver with shunt peaking and a 1 dB overall peak in the frequency response is mainly determined by the RC constant of the output stage (dominant pole), yielding a  $t_r$  of about 10.5 ps. The reason is that for a five-stage driver a 1.72 times higher bandwidth per stage is required than for the same driver consisting only of a main driver. This factor corresponds roughly to the bandwidth enhancement factor for shunt peaking that yields a maximally flat frequency response [4].
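The quoted $t_r$ of about 10.5 ps is close to what a plain single-pole estimate gives for the output node. The sketch below assumes a 10% to 90% risetime definition and a first-order RC response; the Letter does not say how its value was obtained, so this is a plausibility check rather than the authors' calculation.

```python
import math

r_load_ohm = 25.0      # total load quoted in the text
c_out_f = 190e-15      # output capacitance, including 100 fF pad capacitance

tau_s = r_load_ohm * c_out_f                 # RC time constant of the output pole

# ASSUMPTION: 10%-90% risetime of a single-pole response, t_r = ln(9) * RC
t_r_ps = math.log(9.0) * tau_s * 1e12

# Corresponding 3 dB bandwidth of the unpeaked output pole
f_3db_ghz = 1.0 / (2.0 * math.pi * tau_s) / 1e9

print(f"tau = {tau_s * 1e12:.2f} ps, t_r = {t_r_ps:.2f} ps, f_3dB = {f_3db_ghz:.1f} GHz")
```

With these inputs the estimate comes out at roughly 10.4 ps, consistent with the statement that the output RC constant dominates the five-stage risetime.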

To maintain a $t_r$ of about 10.5 ps independent of process variations, the input signal is additionally limited. For a given $V_{\text{in}}$, the maximum $t_r$ improvement due to signal limiting is determined by the total gain of the driver. The total gain is in turn defined by $V_{\text{CM}}$ of a stage ($V_{\text{CMin}} = V_{\text{CMout}}$) and by the number of stages. As the number of stages is fixed, the minimal $t_r$ and $P$ can be achieved by optimising $V_{\text{CM}}$ for a maximum $G/P$. Systematic simulations have shown that this can be accomplished for $V_{\text{GS}} = V_{\text{DS}} = 0.5 \text{ V}$. Together with the saturation voltage of the current sources of 0.1 V, the input and output $V_{\text{CM}}$ of a stage are 0.6 V. The simulated $t_r$ reduction due to signal limiting for the 300 mV input signal used in the measurements is 19%. This corresponds to an effective bandwidth improvement of 25%.
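The bias point and the link between risetime reduction and bandwidth can be reproduced with simple arithmetic. The sketch assumes that effective bandwidth is inversely proportional to risetime, which is a first-order rule and not something the Letter states; variable names are illustrative.

```python
# Bias point quoted in the text
v_ds = 0.5     # V, drain-source voltage of the differential pair (= V_GS)
v_sat = 0.1    # V, saturation voltage of the tail current source
v_cm = v_ds + v_sat          # input/output common-mode voltage of a stage

# Simulated risetime reduction due to limiting
t_r_reduction = 0.19

# ASSUMPTION: effective bandwidth scales as 1 / t_r
bw_gain = 1.0 / (1.0 - t_r_reduction) - 1.0

print(f"V_CM = {v_cm:.1f} V")
print(f"bandwidth improvement under the 1/t_r assumption: {bw_gain * 100:.1f} %")
```

The inverse-proportionality rule gives an improvement of about 23%, in the same range as the 25% quoted in the text.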

The implemented CMFB prevents small common-mode shifts at the driver input from causing large common-mode shifts at the output, which could clip the signal. The CMFB is realised with an operational transconductance amplifier (OTA). In the worst case, the open-loop feedback gain is 37 dB and the phase margin is 100°.



Fig. 2 Measured and simulated differential voltage gain and output impedance

Results: All measurements were performed on wafer and include the pad capacitance. Fig. 2 shows that the small-signal driver bandwidth is 24 GHz without limiting. However, with signal limiting and $V_{\text{in}} \geq 300 \text{ mV}$ the effective bandwidth is, as explained, 30 GHz (+25%), which is sufficient for an NRZ data rate of 40 Gbit/s. Measured and simulated eye diagrams are reproduced in Fig. 3a; they show clearly open eyes at 40 Gbit/s. The 40 Gbit/s multiplexer used as a source has a differential $V_{\text{out}}$, $t_r$ and rms jitter of 610 mV, 9 ps and 1.2 ps, respectively. The measured differential $V_{\text{out}}$ of the 50 $\Omega$ driver is 660 mV after correction for cable losses. The measured $t_r$ and rms jitter of the driver, including multiplexer and cabling at 40 Gbit/s, are 17 ps and 1.56 ps, respectively. Modelling the experimentally determined response of the measurement setup yields a total $t_r$ of 18 ps, including the driver, which is in good agreement with the measurements. The simulated $t_r$ of the buffer driven by the multiplexer with $t_r = 9$ ps, as shown in Fig. 3b, is 11 ps and thus close to the initial estimates.
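To put the reported numbers in relation to the bit period, the following sketch computes the unit interval at 40 Gbit/s, the effective bandwidth after limiting, and the risetimes as a fraction of the unit interval. It uses only values quoted above; the variable names are illustrative, and no acceptance threshold is implied.

```python
bit_rate_gbps = 40.0
ui_ps = 1e3 / bit_rate_gbps            # unit interval of one NRZ bit in ps

bw_small_signal_ghz = 24.0             # measured small-signal bandwidth (Fig. 2)
bw_effective_ghz = bw_small_signal_ghz * 1.25   # +25 % from signal limiting

# Effective bandwidth relative to the bit rate
bw_to_rate = bw_effective_ghz / bit_rate_gbps

# Risetimes relative to the unit interval
t_r_simulated_ps = 11.0                # buffer driven by the 9 ps multiplexer
t_r_measured_ps = 17.0                 # including multiplexer and cabling
frac_simulated = t_r_simulated_ps / ui_ps
frac_measured = t_r_measured_ps / ui_ps

print(f"UI = {ui_ps:.0f} ps, effective BW = {bw_effective_ghz:.0f} GHz "
      f"({bw_to_rate:.2f} x bit rate)")
print(f"t_r / UI: simulated {frac_simulated:.2f}, measured {frac_measured:.2f}")
```

The effective bandwidth equals 0.75 times the bit rate, and the simulated risetime occupies a little under half of the 25 ps unit interval.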



Fig. 3 Eye diagrams at 40 Gbit/s

a Measurement and simulation (black line) of driver including losses and delays of measurement equipment

b Simulation of driver including only multiplexer with 9 ps risetime

Conclusion: We have successfully demonstrated a  $40$  Gbit/s low power driver with a large differential output swing of 660 mV at only 1 V supply voltage.

© IEE 2005, 15 June 2005. Electronics Letters online no: 20052172. doi: 10.1049/el:20052172

G. Sialm, C. Kromer, F. Ellinger and H. Jäckel (Swiss Federal Institute of Technology (ETH) Zurich, Electronics Laboratory, 8092 Zurich, Switzerland)

E-mail: gion.sialm@id.ethz.ch

T. Morf (IBM Research, Zurich Research Laboratory, 8803 Rüschlikon, Switzerland)

## References

- 1 Galal, S., and Razavi, B.: '40 Gbit/s amplifier and ESD protection circuits in 0.18-µm CMOS technology'. IEEE ISSCC Dig., 2004, pp. 480–481
- 2 Kehrer, D., and Wohlmuth, H.D.: '40 Gbit/s 2:1 multiplexer and 1:2 demultiplexer in 120 nm CMOS'. IEEE ISSCC Dig., 2003, pp. 345–346
- 3 Westergaard, P., Dickson, T.O., and Voinigescu, S.P.: 'A 1.5V 20/30 Gb/s CMOS backplane driver with digital pre-emphasis'. IEEE Custom Integrated Circuits Conf., Orlando, FL, USA, 2004, pp. 23–26
- 4 Mohan, S.S., del Mar Hershenson, M., Boyd, S.P., and Lee, T.H.: 'Bandwidth extension in CMOS with optimized on-chip inductors', IEEE J. Solid-State Circuits, 2000, 35, (3), pp. 346–355
- 5 McPherson, D.S., Pera, F., Tazlauanu, M., and Voinigescu, S.P.: 'A 3-V fully differential distributed limiting driver for 40 Gbit/s optical transmission systems', IEEE J. Solid-State Circuits, 2003, 38, (9), pp. 1485–1496
- 6 Kim, J.K., and Kalkur, T.S.: 'High-speed current mode logic amplifier using positive feedback and feed-forward source follower techniques for high-speed CMOS I/O buffer', IEEE J. Solid-State Circuits, 2005, 40, (3), pp. 796–802