Chapter 2
Recent Trends in Bias Temperature Instability


Abstract The paradigm shifts occurring in the past few years in our understanding of BTI are reviewed. Among the most significant ones is the shift from perceiving NBTI in terms of the Reaction-Diffusion model to analyzing BTI with the tools originally developed for describing low-frequency noise. This includes the interpretation of the time, temperature, voltage, and duty cycle dependences of BTI. It is further demonstrated that a wealth of information about defect properties can be obtained from deeply-scaled devices, and that this information can allow projection of variability issues of future deeply downscaled CMOS devices. The chapter is concluded by showing the most promising technological solutions to alleviate both PBTI and NBTI.

1 Introduction

Among the critical reliability issues facing present and future deeply downscaled CMOS devices is the so-called Bias Temperature Instability (BTI). While BTI in nFET devices was generally ascribed to charge trapping in the high-k portion of the gate oxide, the interpretation of BTI in pFET devices still generates controversy [1–7]. This phenomenon in pFET devices has been previously described by diffusion of hydrogen from and back into substrate/gate oxide interface states [8]. The so-called Reaction–Diffusion model based on this assumption is still popular, especially in the design community [9], despite being inconsistent with some crucial observations [10].

In this chapter, the elementary definitions and experimental observations of BTI are first briefly reviewed. One of the most intriguing properties of BTI (the lack of a characteristic time scale, especially in pFETs) is then argued to point to a dispersion in the underlying mechanism [5, 7, 11]. In CMOS technologies, a response on many time scales is typical for low-frequency noise [and its manifestation as Random Telegraph Noise (RTN) in deeply-scaled devices], suggesting that (the principal component of) BTI is in fact caused by the same defects [12, 13]. The link between BTI and low-frequency noise is then further developed: it is shown that many properties of gate oxide defects can be directly extracted from BTI relaxation measurements in deeply-scaled devices [14], and the noise-inspired model capable of fully describing these properties is reviewed. Afterwards it is shown how the understanding of gate oxide defect properties can be used to explain variability of BTI in deeply-scaled technologies, as well as possible technological solutions for both nFETs and pFETs.

2 Brief Overview of BTI

BTI is a consequence of charging of defect states in the gate oxide and at its interface [2]. The defects can be either pre-existing or generated during device operation. The trapped charge results in a shift of the device parameters, such as its threshold voltage $V_{th}$, channel mobility, transconductance, and subthreshold slope, and generally a decrease of the FET’s drive current. The name is derived from the phenomenon being strongly accelerated by temperature $T$ and gate bias $V_G$. BTI in n-channel FET devices, which are typically biased in circuits at positive $V_G$, is referred to as Positive BTI (PBTI), while Negative BTI (NBTI) takes place in p-channel FETs. Constant $V_G$ stress bias is often referred to as static, or “DC” BTI, while periodically interrupted $V_G$ stress is called “AC”, or dynamic BTI.

3 Static BTI

Figure 2.1 illustrates the typical gradual shift of pFET threshold voltage $\Delta V_{th}$ during accelerated stress at elevated $T$ [5]. The stress data are typically measured at several $V_G$’s and $\Delta V_{th}$ is extrapolated to 10 years at the circuit operating voltage $V_{DD}$ (or $V_{DD} + 10 \%$). The extrapolated $\Delta V_{th}$ must be below a given value (typically 30 or 50 mV) for the technology to qualify.

This simple extrapolation procedure is, however, complicated by $\Delta V_{th}$ decreasing immediately after the stress bias is removed, as illustrated in Fig. 2.1 [15]. As will be discussed below, this recovery, or relaxation, component $R$ typically proceeds simultaneously on many time scales, making it difficult to determine its beginning or end and thus to separate it from the final non-recoverable, or permanent, component $P$ [2, 6]. This $\Delta V_{th}$ relaxation is thus a crucial problem for BTI measurement, interpretation, and extrapolation.

Fig. 2.1 Shift in pFET threshold voltage is observed during negative gate bias stress. When the stress bias is removed, a recovery of the effect is seen (note: $V_{th} \sim 0$ V for this device). Inset illustrates the biases applied at FET terminals during the BTI measurement.

![Diagram showing shift in pFET threshold voltage during negative gate bias stress.](image)

Fig. 2.2 The $V_{th}$ shifts due to 50 % AC unipolar NBTI stress in pFETs are seen independent of frequency in the entire frequency range of 1 Hz–2 GHz. The $V_{th}$ shift of the corresponding DC NBTI stress obtained on an identical device is shown for comparison. Inset: Micrograph of the on-chip circuit for the DC and AC BTI measurements consisting of a ring oscillator, a frequency divider, a buffer, a pass-gate-based multiplexer, and the device under test [17].

![Diagram showing AC and DC BTI measurements.](image)

4 Dynamic BTI

In many CMOS applications, such as logic, the majority of the FETs are constantly switched and thus exposed to dynamic stress [16]. Figure 2.2 documents that NBTI is present at frequencies up to the GHz range, i.e., there does not appear to be any “cut-off” time constant of the degradation mechanism above $\sim 1$ ns [17]. Furthermore, the AC bias signal reduces BTI with respect to the DC stress. This provides some additional reliability margin, which can be factored in during the application design phase [9].

Fig. 2.3 Total degradation $\Delta V_{th}$ after 6000 s of unipolar NBTI stress shows a distinctive dependence on the duty factor $DF$. In particular, a weak dependence, or a “plateau” between $\sim 10$ and $\sim 90 \%$ is observed, complemented by rapid $\Delta V_{th}$ increase for the outermost $DF$ values. Data at different relaxation times are shown.

In an arbitrary FET of an arbitrary digital circuit, the average probability of a signal being high can vary between 0 and 100 %. The dependence of BTI on the duty cycle (called duty factor or $DF$ here) thus needs to be studied. A NBTI $\Delta V_{th}$-$DF$ dependence with an inflection point around $DF \sim 50 \%$, first reported in [17], is shown in Fig. 2.3 [12].

5 Similarity Between BTI Relaxation and Low-Frequency Noise

Long, log($t$)-like behavior of $\Delta V_{th}$ without a characteristic time scale is typically observed in both the initial portion of NBTI degradation [13, 18] and the recovery phase. Figure 2.4 illustrates that the rate of recovery $d\Delta V_{th}/dt_{relax}$ [7], extracted from the log($t_{relax}$)-like $\Delta V_{th}$ NBTI relaxation transient after even a very short, 0.1 s stress, follows $1/t_{relax}$ for over seven decades. Such behavior is a signature of states with discharging time constants covering as many decades [19].

Incidentally, superposition of states with widely distributed time scales is the standard explanation of the 1/f noise spectra [20], which are clearly observed in our pFETs (Fig. 2.4). This obvious similarity leads us to argue that the same states with widely distributed time scales in fact play a fundamental role in both NBTI and noise measurements.
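The connection between widely distributed time constants and the $1/t_{relax}$ recovery rate can be checked numerically. The following is a minimal sketch, assuming equally weighted states that each discharge exponentially and whose emission time constants are log-uniformly distributed; the range of time constants and the observation window are illustrative choices, not values from the measurements.

```python
import numpy as np

# Assumed, illustrative parameters (not taken from the measured devices).
tau = np.logspace(-9, 9, 1801)   # emission time constants, log-uniform, in s
t = np.logspace(-5, 2, 71)       # relaxation times in s, well inside the tau range

# Each initially charged state discharges as exp(-t/tau); all weights are equal.
occupancy = np.exp(-t[:, None] / tau[None, :])
dvth = occupancy.mean(axis=1)    # normalized Delta V_th remaining at time t

# Recovery rate -d(dvth)/dt, evaluated analytically state by state.
rate = (occupancy / tau[None, :]).mean(axis=1)

# Slope of log(rate) versus log(t); a value of -1 corresponds to 1/t_relax.
slope = np.polyfit(np.log10(t), np.log10(rate), 1)[0]

# Change of dvth per decade of time; a constant value means log(t)-like decay.
per_decade = np.diff(dvth[::10])

print(f"log-log slope of recovery rate: {slope:.3f}")
print("change per decade:", np.round(per_decade, 4))
```

Under these assumptions the remaining shift drops by the same amount in every decade of time, which is the log-like relaxation, and the recovery rate falls off as $1/t_{relax}$.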

6 Semi-quantitative Model for BTI Relaxation

In order to visualize this common property it is beneficial to consider an equivalent circuit representing states with widely distributed time scales [19]. Note that in either NBTI relaxation or 1/f noise measurements, no maximum or minimum “cut-off” times are typically observed [12]. For the sake of simplicity it is therefore assumed here that the time constants are log-uniformly distributed from times much shorter than the switching time of a pFET to very long times, corresponding to the lifetime of a CMOS application. Such states with widely distributed time scales are then represented by “RC” elements in Fig. 2.5, with the total FET $\Delta V_{th}$ being proportional to the sum of voltages (“occupancies”) on all capacitors. For the sake of simplicity, it is assumed that all RC elements have the same weight and can be partially occupied, which emulates the behavior of a large-area device. Most properties of the recoverable component can be reproduced when the ohmic resistors in Fig. 2.5 are replaced with a non-linear component (simulated by two diodes with different parameters, see Fig. 2.5), which emulates different charging (i.e., capture) and discharging (i.e., emission) time constants of each defect [21]. Such a circuit correctly reproduces the $DF$ dependence (Fig. 2.6, cf. Fig. 2.3) and also the log-like relaxation (Fig. 2.4a) and the log-like initial phase of stress (not shown) [19].

Fig. 2.4 (a) A characteristic long, log-like $\Delta V_{th}$ relaxation trace is observed after even short (pulse-like) NBTI stress. The rate of recovery $d\Delta V_{th}/dt_{relax}$ following $\sim 1/t_{relax}$ for $\sim 7$ decades is a signature of states with discharging time constants covering as many decades. (b) Gate-referred noise spectra measured on the same (unstressed) devices show clear $1/f$ dependence, routinely explained by a superposition of states with widely distributed time scales.

Fig. 2.5 (a) An equivalent circuit with exponentially increasing capacitances used to emulate defect states with widely distributed time scales, such as those active in low-frequency noise. (b) The same circuit modified to account for charging (i.e., capture) and discharging (i.e., emission) time constants being voltage dependent, represented by asymmetric diodes. The sum of voltages on capacitors is assumed to be proportional to FET $\Delta V_{th}$.

Fig. 2.6 The plateau in $DF$ dependence of $R$ is also qualitatively well reproduced by the equivalent circuit in Fig. 2.5, as is the decrease with increasing relaxation time (cf. Fig. 2.3, which, however, shows the sum of $R$ and $P$).
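The duty-factor behavior can also be explored with a numerical stand-in for the equivalent circuit. The sketch below is a simplified illustration and not the circuit of Fig. 2.5 itself. It assumes that each state has independent, log-uniformly distributed capture and emission time constants, that capture acts only during the on-phase and emission only during the off-phase, and that the switching period is much shorter than every time constant, so that cycle-averaged rates can be used. The function name, the range of time constants, and the relaxation delay are hypothetical; only the 6000 s stress time is taken from Fig. 2.3.

```python
import numpy as np

def recoverable_shift(df, t_stress=6000.0, t_relax=1e-3,
                      log_tau_min=-6.0, log_tau_max=8.0, n=141):
    """Normalized recoverable shift R for a duty factor df between 0 and 1."""
    tau = np.logspace(log_tau_min, log_tau_max, n)
    tau_c, tau_e = np.meshgrid(tau, tau, indexing="ij")
    a = df / tau_c             # cycle-averaged capture rate
    b = (1.0 - df) / tau_e     # cycle-averaged emission rate
    k = a + b
    # Solution of dp/dt = a*(1 - p) - b*p with p(0) = 0 at the end of stress.
    occ = a / k * (1.0 - np.exp(-k * t_stress))
    # Emission during the relaxation delay before the measurement.
    occ = occ * np.exp(-t_relax / tau_e)
    return occ.mean()

duty_factors = [0.0, 0.1, 0.5, 0.9, 1.0]
shifts = [recoverable_shift(df) for df in duty_factors]
for df, r in zip(duty_factors, shifts):
    print(f"DF = {df:4.0%}  R = {r:.3f}")
```

With these assumptions the computed shift changes only weakly between duty factors of 10 % and 90 % and rises steeply toward 100 %, qualitatively similar to the plateau described for Fig. 2.6.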

7 Properties of Individual Defects

Figure 2.7 shows two typical $\Delta V_{th}$ relaxation transients following positive $V_G$ stress on a single $70 \times 90 \text{ nm}^2$ nMOSFET (i.e., corresponding to PBTI). In contrast to the continuous relaxation curves obtained on large devices, a quantized $\Delta V_{th}$ transient is observed in the deeply-scaled devices. In such devices, the relaxation is observed to proceed in discrete voltage steps, with each step corresponding to discharging of a single oxide defect [12, 22, 23]. Upon repeated perturbation, each defect shows up in the relaxation trace with a characteristic “fingerprint” consisting of its discharge, or emission time, and its voltage step [14].

Figure 2.8 shows the two-dimensional histogram of the heights and the emission times of the steps when the experiment was repeated 70 times at the same stressing and relaxing condition as in Fig. 2.7 [14]. In Fig. 2.8, four clusters are clearly formed that correspond to four active defects in the time window of the experimental setup.

**Fig. 2.7** Characteristic $\Delta V_{th}$ transients of a single $70 \times 90$ nm$^2$ 1 nm-SiO$_2$/1.8 nm-HfSiO nMOSFET device stressed at 25 °C and $V_G = 2.8$ V for 184 ms. Four discrete drops are observed indicating the existence of four active traps at this stress condition.

**Fig. 2.8** Two-dimensional histograms (TDDS spectra) of the heights and emission times of the steps extracted from 70 $\Delta V_{th}$ transients of the particular device of Fig. 2.7 at (a) 25 °C and (b) 50 °C. Four clusters are formed that shift horizontally to shorter emission times with increasing temperature. Note that trap #3 disappears from the experimental window at 50 °C.

The emission times of each defect are stochastically distributed and follow an exponential distribution. This allows us to determine the average emission time $\tau_e$. The capture time of each trap can be obtained by varying the stress (i.e., charging) time from 240 ms down to 2 ms. The intensity of the cluster decreases with decreasing stress time when the characteristic capture time is in the range of the stress time. The fit of the intensity to $P_c = 1 - \exp(-t_{stress}/\tau_c)$ lets us calculate the average capture time $\tau_c$. This technique is known as Time Dependent Defect Spectroscopy (TDDS) [14].
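The two extraction steps described above can be sketched on synthetic data. The code below is only an illustration under stated assumptions: the "measurements" are generated from a hypothetical defect with made-up time constants, the function names are invented for this example, and the capture time is found by a simple least-squares grid search rather than by any particular fitting routine used in the original work.

```python
import numpy as np

def estimate_tau_e(emission_times):
    """Mean emission time; for exponentially distributed times this is tau_e."""
    return float(np.mean(emission_times))

def fit_tau_c(t_stress, p_c, candidates=None):
    """Least-squares fit of P_c = 1 - exp(-t_stress / tau_c) by grid search."""
    if candidates is None:
        candidates = np.logspace(-4, 1, 501)   # trial tau_c values in s
    t_stress = np.asarray(t_stress, dtype=float)
    p_c = np.asarray(p_c, dtype=float)
    model = 1.0 - np.exp(-t_stress[None, :] / candidates[:, None])
    sse = ((model - p_c[None, :]) ** 2).sum(axis=1)
    return float(candidates[np.argmin(sse)])

# Synthetic data for one hypothetical defect (not data from the chapter).
rng = np.random.default_rng(0)
true_tau_e, true_tau_c = 0.05, 0.02            # s, assumed for illustration
n_traces = 70                                  # repetitions per stress time

emission_times = rng.exponential(true_tau_e, size=n_traces)
t_stress = np.array([2e-3, 5e-3, 10e-3, 20e-3, 50e-3, 100e-3, 240e-3])
# Cluster intensity: fraction of traces in which the defect was charged.
charged = rng.binomial(n_traces, 1.0 - np.exp(-t_stress / true_tau_c))
p_c = charged / n_traces

print("estimated tau_e:", estimate_tau_e(emission_times))
print("estimated tau_c:", fit_tau_c(t_stress, p_c))
```

With only 70 traces per condition the estimates scatter noticeably around the assumed values, which illustrates why each condition has to be repeated many times.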

In Fig. 2.8, an identical experiment was repeated at 50 °C on the same device.
Note the large horizontal shift of the clusters to shorter emission times with only a
25 °C temperature increase. The Arrhenius plots of the emission and capture times
obtained at \( T \) from 10 to 50 °C (not shown) provide activation energies of 0.48 eV
for emission and 0.25 eV for capture. Similarly thermally activated capture and
emission times are also observed in both nFET and pFET (i.e., corresponding to
NBTI) with conventional SiO₂ gate oxide [14, 23, 24]. One can therefore conclude
for all these cases that *both emission and capture in both electron and hole gate
oxide traps are without a doubt thermally activated processes*. This experimental
fact is incompatible with direct elastic tunneling theories widely used in different
oxide trap characterization techniques and calculations. Consequently, a new model
that takes into account this thermal dependence has to be considered.
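The extraction of an activation energy from an Arrhenius plot can be written out explicitly. The sketch below assumes a single-activation-energy law $\tau = \tau_0 \exp(E_a / k_B T)$ and uses the 0.48 eV emission value quoted above; the prefactor $\tau_0$, the temperature grid, and the function names are hypothetical and serve only to generate and refit synthetic data.

```python
import numpy as np

K_B = 8.617333e-5   # Boltzmann constant in eV/K

def arrhenius_tau(temp_c, e_a, tau_0):
    """Time constant at temperature temp_c (in deg C) for activation energy e_a (eV)."""
    temp_k = np.asarray(temp_c, dtype=float) + 273.15
    return tau_0 * np.exp(e_a / (K_B * temp_k))

def fit_activation_energy(temp_c, tau):
    """Linear fit of ln(tau) versus 1/(k_B T); the slope is E_a in eV."""
    temp_k = np.asarray(temp_c, dtype=float) + 273.15
    slope, intercept = np.polyfit(1.0 / (K_B * temp_k), np.log(tau), 1)
    return float(slope), float(np.exp(intercept))

temps = np.array([10.0, 20.0, 30.0, 40.0, 50.0])        # deg C
tau_e = arrhenius_tau(temps, e_a=0.48, tau_0=1e-10)     # tau_0 is hypothetical
e_a_fit, tau_0_fit = fit_activation_energy(temps, tau_e)

# Ratio of emission times at 25 and 50 deg C for E_a = 0.48 eV.
speedup = float(arrhenius_tau(25.0, 0.48, 1.0) / arrhenius_tau(50.0, 0.48, 1.0))

print(f"fitted E_a = {e_a_fit:.3f} eV")
print(f"emission is about {speedup:.1f} times faster at 50 C than at 25 C")
```

Under this assumption, a 0.48 eV activation energy shortens the emission time roughly fourfold between 25 °C and 50 °C, consistent with a large horizontal shift of the clusters on a logarithmic time axis.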

8 Modeling Properties of Individual Defects

A model of the above-described properties of individual gate oxide defects can be
constructed by drawing on the above similarities with low-frequency and Random
Telegraph Noise (RTN) [25]. An example of the configuration coordinate diagram of
the model is shown in the inset of Fig. 2.9. Four different configurations of the defect
are considered [14]. Two of the states are electrically neutral while two of them
correspond to the singly positively charged state. In each charge state the defect is
represented by a double well, with the first of the two states being the equilibrium
state and the other a secondary (metastable) minimum. The time dynamics of the
defect can be described by a simple stochastic Markov process. Broadly, transition
rates between states involving charge transfer assume (1) tunneling between the
substrate and the defect, and (2) nonradiative multiphonon (NMP) theory, which
has been often applied to explain RTN [26, 27]. Introduction of the NMP theory
naturally explains the temperature dependence of both capture and emission time
constants observed in the previous section. The wide distribution of time scales is
then readily described by a distribution of the overlaps of the potential wells (i.e., a
distribution of “potential barriers”) [14].

The crucial extension of the NMP theory is the assumption of the relative position
of the potential wells changing with gate bias [14], quite naturally introducing the
required strong \( V_G \) dependence. As documented in Fig. 2.9, the model successfully
describes the bias as well as the temperature dependences of the characteristic time
constants. It is also noted that, contrary to techniques for the analysis of RTN, which
only allow monitoring the defect behavior in a rather narrow time window, TDDS
can be used to study the defects' capture and emission times over an extremely
wide range.

Fig. 2.9 Simulated capture and emission time constants (lines) compared with the experimental TDDS values obtained on SiO$_2$ pFETs (symbols) during NBTI stressing at 125 and 175 °C and varying $V_G$. The experimental occupation probability of the charged state $f_p$ is also indicated. The configuration coordinate diagram is shown in the inset (dashed line: neutral defect state; solid line: charged state potential).

Fig. 2.10  Simulated RTN, stress, and recovery behavior of a nano-scale device using a stochastic solution algorithm of the proposed model. (a) At the threshold voltage ($V_{g1}$), the RTN is dominated by defect #5 with the occasional contribution from defect #3. Defects #1, #2, and #4 remain positively charged within the ‘simulation/experimental’ window. (b) During stress ($V_{g2}$), the capture times are dramatically reduced by the higher (more negative) gate voltage and the defects #3 and #5 become predominantly positively charged ($t_c \ll t_e$). Defects #1, #2, and #4 start producing RTN. (c) During recovery (back at $V_{g1}$), trapped charge is subsequently lost and the dynamic equilibrium behavior is gradually restored.

It has been previously argued that the phenomenon called NBTI relaxation in pFET devices is in fact just a different facet of the well-known low-frequency noise in these devices. While the low-frequency noise corresponds to the channel/gate dielectrics system being in the state of dynamic equilibrium, NBTI relaxation corresponds to the perturbed system returning to this equilibrium [28]. Figure 2.10 then illustrates this concept on a simulated example of a deeply scaled pFET containing only five active defects [23]. In particular it shows that the same defects can be responsible for RTN as well as for the NBTI relaxation and the (initial phase of) NBTI stress.
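The stochastic picture of a single defect can be illustrated with a short simulation. The sketch below is a deliberate simplification: it reduces the four-state defect to two states (neutral and charged) with fixed mean capture and emission times, whereas the model reviewed above includes metastable states and bias-dependent rates. The time constants and function names are hypothetical.

```python
import numpy as np

def simulate_defect(tau_c, tau_e, duration, charged=False, rng=None):
    """Simulate a two-state defect; return transition times and the state entered."""
    rng = np.random.default_rng() if rng is None else rng
    t, times, states = 0.0, [0.0], [charged]
    while True:
        # The dwell time is exponential, with the mean time of the state being left.
        t += rng.exponential(tau_e if charged else tau_c)
        if t >= duration:
            break
        charged = not charged
        times.append(t)
        states.append(charged)
    return np.array(times), np.array(states, dtype=bool)

def charged_fraction(times, states, duration):
    """Fraction of the interval [0, duration] spent in the charged state."""
    dwell = np.diff(np.append(times, duration))
    return float(dwell[states].sum() / duration)

rng = np.random.default_rng(1)
duration = 2e4                                  # s, illustrative
# Comparable capture and emission times give RTN-like switching (hypothetical values).
times, states = simulate_defect(tau_c=1.0, tau_e=3.0, duration=duration, rng=rng)
occupancy = charged_fraction(times, states, duration)
expected = 3.0 / (1.0 + 3.0)                    # tau_e / (tau_c + tau_e)

print(f"simulated occupancy {occupancy:.3f}, expected {expected:.3f}")
```

In this simplified picture, stress corresponds to rerunning the simulation with a much shorter capture time, so that the defect is predominantly charged, and recovery corresponds to starting from the charged state with the original time constants.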

9 BTI Distribution in Deeply-Scaled FETs

As CMOS devices scale toward atomic dimensions, device parameters become statistically distributed. Similarly, parameter shifts during device operation, once studied in terms of the average value only, will have to be described in terms of their distribution functions. The understanding of the properties of individual defects helps us to explain this distribution. Namely, much like in the case of random telegraph noise (RTN) [29, 30], the down-steps $\Delta V_{th}$ due to individual discharging events are observed to be exponentially distributed (Fig. 2.11). The exponential distribution of single-charge $\Delta V_{th}$ can be understood if non-uniformities in the pFET channel due to random dopant fluctuations (RDF) are considered [28–30]. A single discharging event in many devices routinely exceeded 15 mV, and in several devices exceeded 30 mV, the NBTI lifetime criterion presently used by some groups. For comparison, $\Delta V_{th}$ of less than 2 mV would be expected based on a simple charge sheet approximation. The large observed step height amplitude is due to the aggressively scaled dimensions of the pFETs used [28, 31].

Assuming the lateral locations of the trapped charges are uncorrelated, the overall $\Delta V_{th}$ distribution can be readily expressed as a convolution of individual exponential distributions [28, 31]. An actual population of stressed devices will consist of devices with a different number $n$ of oxide defects in each device, with $n$ being Poisson distributed [12, 22, 28]. The total $\Delta V_{th}$ distribution can therefore be obtained as

$$F_N (\Delta V_{th}, \eta) = \sum_{n=0}^{\infty} \frac{e^{-N} N^n}{n!} \left[ 1 - \frac{n}{n!} \Gamma(n, \Delta V_{th}/\eta) \right], \quad (2.1)$$

where $N$ is the mean number of defects in the FET gate oxide, related to the oxide trap (surface) density $N_{ot}$ as $N = W L N_{ot}$, and $\eta$ is the mean $\Delta V_{th}$ step of a single charged defect, i.e., the scale of the exponential distribution. The CDF is plotted in Fig. 2.12 for several values of $N$. For comparison, measured total $\Delta V_{th}$ distributions for three different stress times from Ref. [22] are fitted very well by the derived analytical description.
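Eq. (2.1) is straightforward to evaluate numerically. The sketch below assumes that $\Gamma$ in Eq. (2.1) denotes the upper incomplete gamma function with arguments $n$ and $\Delta V_{th}/\eta$, so that the bracket is the CDF of a sum of $n$ exponentially distributed steps, and that $\eta$ is the mean single-defect step. For integer $n$ the ratio $\Gamma(n, x)/(n-1)!$ equals $e^{-x} \sum_{k=0}^{n-1} x^k/k!$, which avoids any special-function library. The function names, the truncation of the sum, and the numerical values are illustrative.

```python
import math

def erlang_cdf(n, x):
    """CDF of the sum of n unit-mean exponential steps: 1 - Gamma(n, x)/(n-1)!."""
    if n == 0:
        return 1.0                   # no defect: the shift is zero with certainty
    term, total = 1.0, 1.0           # running x^k/k! and its partial sum
    for k in range(1, n):
        term *= x / k
        total += term
    return 1.0 - math.exp(-x) * total

def total_shift_cdf(dvth, eta, mean_defects, n_max=200):
    """Poisson-weighted CDF of the total shift, truncated at n_max defects."""
    x = dvth / eta
    weight = math.exp(-mean_defects)     # Poisson probability of n = 0
    cdf = 0.0
    for n in range(n_max + 1):
        cdf += weight * erlang_cdf(n, x)
        weight *= mean_defects / (n + 1)  # Poisson probability of n + 1
    return cdf

# Illustrative numbers only: eta in mV and mean defect count are assumptions.
eta_mv, mean_defects = 5.0, 2.0
for dvth_mv in (0.0, 10.0, 30.0, 50.0):
    fraction = total_shift_cdf(dvth_mv, eta_mv, mean_defects)
    print(f"fraction of devices with shift <= {dvth_mv:4.1f} mV: {fraction:.4f}")
```

At zero shift the CDF equals $e^{-N}$, the fraction of devices that contain no charged defect, and $1 - F_N$ gives the fraction of devices exceeding a chosen lifetime criterion.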

The advantage of describing the total $\Delta V_{th}$ distribution in terms of Eq. (2.1) is its relative simplicity and the tangibility of the variables. The analytical description allows, among other things, the calculation of NBTI threshold voltage shifts in an unlimited population of devices, a feat practically impossible through device simulations.

10 Technological Solutions

Once the underlying BTI mechanisms are understood, the defect properties can be modified to beneficial ends. Below, two possible technological solutions for both PBTI and NBTI are discussed.

**Fig. 2.13** A significant reduction of PBTI threshold voltage shift is observed in planar nFETs with La passivation (“La”) over the reference stack (“ref”) without passivation. Simplified power-law projection to 10 years shows passivated stack having sufficient reliability ($\Delta V_{th} < 30$ mV) at $\sim$5 MV/cm operating field.

11 Improving PBTI with Rare-Earth Incorporation

PBTI was considered a minor problem in technologies based on SiO$_2$. It arose as a reliability issue when high-k materials were incorporated into the gate stack. However, when rare earths were introduced to adjust the nFET initial threshold voltage, this issue was mitigated, as can be seen in Fig. 2.13. A significant reduction of PBTI is observed in planar nFETs with lanthanum with respect to a lanthanum-free reference [32].

Positive BTI in nFETs with high-k materials like HfO$_2$ has been linked to oxygen vacancies, which produce a defect level in the upper part of the oxide band gap. Group III elements compensate unpaired electrons around the oxygen vacancy in HfO$_2$ and the defects are “passivated” by being pushed up toward the conduction band minimum [33]. Such states are not easily accessible to nFET channel electrons, resulting in the significant reduction of negative charge capture in the stack and hence the reduction of PBTI.

12 Improving NBTI in High-Mobility SiGe pFETs

Reduction of gate stack EOT, which is one of the most efficient ways to improve FET performance, enhances NBTI due to increased oxide electric field. As a consequence, 10 year lifetime can be guaranteed for sub-1 nm EOT Si pFETs only at gate overdrive voltages far below the expected operating voltages (Fig. 2.14).

Another way to improve FET performance is the use of high-mobility substrates, such as buried-channel SiGe [34]. Because of the valence band offset between the SiGe and the Si cap (see inset of Fig. 2.14), inversion channel holes are confined in the SiGe layer, which therefore acts as a quantum well (QW) for holes. The Si cap lowers the inversion capacitance as compared to the accumulation capacitance. For these devices it is therefore necessary to report the capacitance-equivalent thickness in inversion ($T_{inv}$, evaluated at $V_G - V_{th} = 0.6$ V), which is affected by the thickness of the Si cap [35].

As can be seen from Fig. 2.14, SiGe-based device gate stacks significantly increase the operating gate overdrive while still guaranteeing 10 year device lifetime, and at the moment seem to be the only solution to the NBTI issue for sub-1 nm EOT devices. It has been recently observed that both increasing the Ge content in the channel and increasing the SiGe QW thickness reduce NBTI. Most intriguingly, a reduction of Si cap thickness also diminishes NBTI [35]. The most likely hypothesis explaining all three trends appears to be the energetic decoupling of the buried channel and the gate oxide defects [36].

**Conclusions**

In this chapter some of the shifts occurring in the past few years in our understanding of BTI were reviewed. Among the most significant ones is the shift from perceiving NBTI in terms of the Reaction–Diffusion model to analyzing BTI with the tools originally developed for describing low-frequency noise. This includes the interpretation of the time, temperature, voltage, and duty cycle dependences of BTI. It was further demonstrated that a wealth of information about defect properties can be obtained from deeply-scaled devices, and that this information can allow interpretation of variability issues of future deeply downscaled CMOS devices. This theme was complemented by showing the most promising technological solutions to alleviate both PBTI and NBTI.

References

Circuit Design for Reliability
Reis, R.; Cao, Y.; Wirth, G. (Eds.)
2015, VI, 272 p. 190 illus., 132 illus. in color., Hardcover