Overview
- Outlines potential critical faults in modern computer systems and what must change to address them
- Explains how to design system software for next generation computers with wider applications and greater efficiency
- Shows how implemented system software support makes maintenance easier while increasing reliability and performance
- With new chapters on computer system performance, resilience and resilient architecture simulators and system software
Table of contents (20 chapters)
About this book
This book addresses the question of how system software should be designed to account for faults, and which fault tolerance features it should provide for the highest reliability. This second edition of Software Design for Resilient Computer Systems is thoroughly updated with the newest advice on software resilience. With additional chapters on computer system performance and system resilience, as well as online resources, the new edition is ideal for researchers and industry professionals.
The authors first show how the system software interacts with the hardware to tolerate faults. They analyze and further develop the theory of fault tolerance to understand the different ways to increase the reliability of a system, with special attention to the role of system software in this process. They further develop the general algorithm of fault tolerance (GAFT) with its three main processes: hardware checking, preparation for recovery, and the recovery procedure. For each of the three processes, they analyze the requirements and properties theoretically and give possible implementation scenarios and the system software support required. Based on the theoretical results, the authors derive an Oberon-based programming language with direct support for the three processes of GAFT. In the last part of the book, they introduce a simulator and use it as a proof of concept, in terms of both features and performance, of a novel fault-tolerant processor architecture (ERRIC) and its newly developed runtime system. Due to the wide-reaching nature of the content, this book applies to a host of industries and research areas, including military, aviation, intensive health care, industrial control, and space exploration.
Authors and Affiliations
About the authors
Prof Eugene Zouev is currently a professor at Innopolis University, Russia. Eugene graduated from Moscow State University and defended his PhD there (1976 and 1999, respectively). He has been involved in many research and industrial projects in system software, programming languages and their compilers.
Among his achievements are a full ISO-compliant C++ compiler (2000, Moscow, Russia), a Zonnon language compiler for .NET (ETH Zurich 2000-2006) developed with and under the supervision of Prof Niklaus Wirth (Turing Award) and Prof J. Gutknecht, and many other projects. His involvement in the EU-funded project ONBASS (2004-09) became the next step in the research and development summarised to some extent in this book.
Dr. Thomas Kaegi-Trachsel received his PhD in 2012 at ETH Zurich in the area of system software for embedded systems (under the supervision of Prof. Schagaev). He is currently a Senior Researcher at Ergon Informatics, Switzerland.
Bibliographic Information
Book Title: Software Design for Resilient Computer Systems
Authors: Igor Schagaev, Eugene Zouev, Kaegi Thomas
DOI: https://doi.org/10.1007/978-3-030-21244-5
Publisher: Springer Cham
eBook Packages: Engineering, Engineering (R0)
Copyright Information: The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2020
Hardcover ISBN: 978-3-030-21243-8 (Published: 19 July 2019)
Softcover ISBN: 978-3-030-21246-9 (Published: 14 August 2020)
eBook ISBN: 978-3-030-21244-5 (Published: 09 July 2019)
Edition Number: 2
Number of Pages: XVIII, 308
Number of Illustrations: 42 b/w illustrations, 133 illustrations in colour
Topics: Communications Engineering, Networks, Circuits and Systems, Software Engineering, Performance and Reliability, Quality Control, Reliability, Safety and Risk