The invention of integrated circuits and the continuing progress in their manufacturing processes are the fundamental engines for the implementation of semiconductor technologies that support today’s information society. The vast majority of today’s microelectronic applications exploit the well-established CMOS process and fabrication technology, which exhibits high reliability. During the past few decades, this fact has enabled the design of highly complex systems consisting of several million components, where each of these components could be deemed fundamentally reliable, without the need for extensive redundancy.

The steady downscaling of CMOS technology has led to the development of devices with nanometer dimensions. Future integrated circuits are expected to be made of emerging nanodevices and their associated interconnects. The expected higher probabilities of failure, as well as the higher sensitivities to noise and variations, could make future integrated circuits prohibitively unreliable. The systems to be fabricated will be made of unreliable components, and achieving 100% correctness of operation will not only be extremely costly but may turn out to be impossible. Overall, reliability is emerging as one of the major threats to the design of future integrated computing systems. Building reliable systems out of unreliable components requires increased cooperative involvement of logic designers and architects, where high-level techniques rely upon lower-level support based on novel modeling that includes component and system reliability as design parameters.

In the first part, this book presents the state of the art in circuits and systems, architectures, and methodologies focusing on the enhancement of the reliability of digital integrated circuits. This research field spans over 60 years, with a remarkable revival of interest in recent years, evidenced by a growing body of literature in the form of books and scholarly articles. This revival comes as a reaction to an expected difficult transition from CMOS technology, which is widely perceived as very reliable, to nanotechnology, which in contrast has proven very unreliable. Circuit- and system-level solutions are proposed to overcome high defect density. Their performance is discussed in the context of a trade-off, where reliability is suggested as a design parameter to be considered in addition to the widely used triplet of delay, area, and power.
Reliability, fault models, and fault tolerance are presented in Chapter 2, establishing the major concepts further discussed in the book. Chapter 3 gives an overview of the nanotechnologies that are considered for the fabrication of future integrated circuits. This work focuses on the device level and addresses technologies that are still in relative infancy. Nanoelectronic devices prove to be very sensitive to their environment, during fabrication and operation, and ultimately unreliable, thereby motivating the stringent need for solutions to fabricate reliable systems. Fault-tolerant circuits, architectures, and systems are explored in Chapter 4, presenting solutions provided in the early days of CMOS as well as recent techniques. Reliability evaluation, including historical developments as well as recent methodologies and their supporting software tools, is presented in Chapter 5.

In the second part of the book, original circuit- and system-level solutions are presented and analyzed. Chapter 6 presents an architecture suitable for circuit-level and gate-level redundant module implementation that exhibits significant immunity to permanent and random failures as well as to unwanted fluctuations of the fabrication parameters. It is based on a four-layer feed-forward topology that uses averaging and thresholding as the core voter mechanisms. The architecture, with both fixed and adaptable thresholds, is compared to triple and $R$-fold modular redundancy techniques, and its superiority is demonstrated through numerical simulations as well as analytical developments. Its applicability in single-electron-based nanoelectronics is analyzed and demonstrated.
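
As a rough illustration of how an averaging-and-thresholding voter differs from the majority voter of classic modular redundancy, the following minimal Python sketch may help. It is not the four-layer architecture of Chapter 6; the function names, the threshold of 0.5, and the normalized output levels between 0 and 1 are assumptions chosen purely for illustration.

```python
def majority_vote(bits):
    """Majority voter: return 1 when more than half of the logic inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def averaging_threshold_vote(levels, threshold=0.5):
    """Average the (possibly degraded) replica output levels, then threshold.

    `levels` are assumed to be normalized analog values in [0, 1];
    `threshold` is a hypothetical fixed decision level.
    """
    mean = sum(levels) / len(levels)
    return 1 if mean > threshold else 0


# Three replicas of a signal that should be logic 1; the third one has failed.
print(majority_vote([1, 1, 0]))
print(averaging_threshold_vote([0.9, 0.8, 0.1]))
```

In this toy setting both voters mask the single faulty replica; the averaging voter additionally tolerates replicas whose outputs are degraded rather than fully flipped, as long as the mean stays on the correct side of the threshold.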

A novel general method enabling the introduction of fault tolerance and evaluation of the circuit and architecture reliability is proposed in Chapter 7. The method is based on the modeling of probability density functions (PDFs) of unreliable components and their subsequent evaluation for a given reliability architecture. PDF modeling, presented for the first time in the context of realistic technology and arbitrary circuit size, is based on a novel reliability evaluation algorithm and offers scalability, speed, and accuracy. Fault modeling has also been developed to support PDF modeling.

In the third part of the book, a new methodology that introduces reliability into existing design flows is proposed. The methodology, presented in Chapter 8, consists of partitioning the full system under design into reliability-optimal partitions and applying reliability evaluation and optimization at the local and system levels. The system-level reliability improvement offered by different fault-tolerant techniques is studied in depth. Optimal partition size analysis and redundancy optimization have been performed for the first time in the context of a large-scale system, showing that a target reliability can be achieved with low to moderate redundancy factors ($R < 50$), even for high defect densities (device failure rates up to $10^{-3}$).

The optimal window of application of each fault-tolerant technique with respect to defect density is presented as a way to find the optimum design trade-off between reliability, power, and area. $R$-fold modular redundancy with distributed voting and an averaging voter is selected as the most promising candidate for implementation in trillion-transistor logic systems.
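
To give a feeling for how the redundancy factor $R$ trades against reliability, the sketch below evaluates the textbook probability that a strict majority of $R$ independent replicas is correct. It assumes a perfect voter, statistically independent failures, and odd $R$. These simplifications, as well as the function name, are illustrative assumptions and do not reproduce the partition-based analysis of Chapter 8.

```python
from math import comb


def majority_reliability(r_module, R):
    """Probability that a strict majority of R independent replicas is correct.

    r_module: probability that a single replica produces the correct output.
    R: redundancy factor (assumed odd); the voter is assumed to be perfect.
    """
    need = R // 2 + 1  # smallest number of correct replicas forming a majority
    return sum(
        comb(R, k) * r_module**k * (1 - r_module) ** (R - k)
        for k in range(need, R + 1)
    )


# Compare a few redundancy factors for a replica reliability of 0.9.
for R in (3, 5, 7):
    print(R, majority_reliability(0.9, R))
```

For $R = 3$ this expression reduces to the familiar triple modular redundancy formula $3r^2 - 2r^3$, where $r$ is the reliability of a single replica.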

The renewed interest in reliability shown by the community of micro- and nanoelectronics researchers and developers is fully justified. The advent of novel methodologies enabling the development of reliable systems made of unreliable devices is key to sustaining consumer and industry demand for integrated systems with improved performance, lower cost, and lower power dissipation. This ultimate goal must be tackled at several levels of VLSI abstraction simultaneously, where improvements at the lower levels provide benefits at the higher levels. Finally, the upper levels, including the compiler and software, should also be included in a common effort to reach this ambitious goal.

Lausanne
June 2010

Miloš Stanisavljević
Alexandre Schmid
Yusuf Leblebici