  • Book
  • © 2010

Stochastic Models for Fault Tolerance

Restart, Rejuvenation and Checkpointing

Authors: Katinka Wolter

Buying options

eBook USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide
Hardcover Book USD 54.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide

Table of contents (10 chapters)

  1. Front Matter
    Pages i-xvi
  2. Introduction
    1. Front Matter
      Pages 1-1
    2. Basic Concepts and Problems
      • Katinka Wolter
      Pages 3-12
    3. Task Completion Time
      • Katinka Wolter
      Pages 13-31
  3. Restart
    1. Front Matter
      Pages 33-33
    2. Applicability Analysis of Restart
      • Katinka Wolter
      Pages 35-50
    3. Moments of Completion Time Under Restart
      • Katinka Wolter
      Pages 51-93
    4. Meeting Deadlines Through Restart
      • Katinka Wolter
      Pages 95-115
  4. Software Rejuvenation
    1. Front Matter
      Pages 117-120
  5. Checkpointing
    1. Front Matter
      Pages 167-168
    2. Checkpointing Systems
      • Katinka Wolter
      Pages 171-176
    3. Stochastic Models for Checkpointing
      • Katinka Wolter
      Pages 177-236
    4. Summary, Conclusion and Outlook
      • Katinka Wolter
      Pages 237-240
  6. Back Matter
    Pages 241-269

About this book

As modern society relies on the fault-free operation of complex computing systems, system fault-tolerance has become an indispensable requirement. Therefore, we need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware elements. Redundancy patterns are commonly used for this purpose, providing either redundancy in space or redundancy in time.

Wolter’s book details methods of redundancy in time that need to be issued at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right time for different fault-tolerance mechanisms like restart, rejuvenation and checkpointing. Restart indicates the pure system restart, rejuvenation denotes the restart of the operating environment of a task, and checkpointing includes saving the system state periodically and reinitializing the system at the most recent checkpoint upon failure of the system. Her presentation includes a brief introduction to the methods, their detailed stochastic description, and also aspects of their efficient implementation in real-world systems.
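To make the timeout selection problem concrete, here is a minimal sketch that is not taken from the book. It assumes a task whose duration per attempt follows a small discrete distribution, and it uses a standard renewal argument: if an attempt is aborted and retried whenever it exceeds the timeout, the expected completion time is (E[min(X, timeout)] + P(X > timeout) · restart_cost) / P(X ≤ timeout). The function name, the example distribution, and the restart cost are all illustrative assumptions.

```python
from fractions import Fraction


def expected_completion_with_restart(outcomes, timeout, restart_cost=0):
    """Expected completion time when an attempt is aborted and retried
    each time it has not finished within `timeout`.

    outcomes: list of (duration, probability) pairs for a single attempt
              (hypothetical input format; probabilities must sum to 1).
    restart_cost: extra time paid for every aborted attempt.
    """
    # Probability that a single attempt finishes before the timeout fires.
    p_success = sum(p for d, p in outcomes if d <= timeout)
    if p_success == 0:
        raise ValueError("no attempt can finish within the timeout")
    # Mean time consumed by one attempt, truncated at the timeout.
    mean_truncated = sum(min(d, timeout) * p for d, p in outcomes)
    # Attempts are independent, so the number of attempts is geometric.
    return (mean_truncated + (1 - p_success) * restart_cost) / p_success


# Assumed example: the task usually takes 1 time unit, but occasionally 100.
durations = [(1, Fraction(9, 10)), (100, Fraction(1, 10))]

no_restart = expected_completion_with_restart(durations, timeout=100)
restart_at_1 = expected_completion_with_restart(durations, timeout=1)
restart_at_1_costly = expected_completion_with_restart(
    durations, timeout=1, restart_cost=2
)
print(no_restart, restart_at_1, restart_at_1_costly)
```

In this assumed example a short timeout cuts off the rare long runs, so restarting lowers the expected completion time; a timeout that is too short, or a high restart cost, would erode that gain, which is the trade-off the timeout selection problem addresses.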

The book is targeted at researchers and graduate students in system dependability, stochastic modeling and software reliability. Readers will find here an up-to-date overview of the key theoretical results, making this the only comprehensive text on stochastic models for restart-related problems.

Reviews

From the reviews:

“Wolter’s textbook presents … three issues that will interest specialists in distributed systems and software design: restarting, rejuvenation, and checkpointing. … The book’s strength is its ability to systematically gather different models that are rarely presented together in one place. It is also admirable how clearly … the difficult material on reliability is presented. The work is intended for experienced readers … . The content is up to date; in fact, many of the analyses are quite new and based on Wolter’s own work.” (Piotr Cholda, ACM Computing Reviews, November, 2010)

“It is comprehensive and self-contained as it includes everything one needs to understand and apply the models and algorithms represented. Even the probability distributions used in those models are briefly, yet satisfactorily, explained in the appendices. All in all, I can recommend this book as a handbook not only to researchers and practitioners who work in this field, but also to students as a textbook.” (Fevzi Belli, Zentralblatt MATH, Vol. 1209, 2011)

Authors and Affiliations

  • Inst. Informatik, Humboldt-Universität Berlin, Berlin, Germany

    Katinka Wolter

About the author

Katinka Wolter is an assistant professor at Humboldt-University, Berlin, Germany, working with the Computer Architecture and Communication Group since April 2002. She is principal investigator of two research projects funded by the German research council and teaches courses on performance analysis of communication systems and dependability evaluation. Prior to her current position, she was a visiting researcher at Hewlett-Packard Labs in Palo Alto, CA, USA. Her research interests include dependability evaluation of service-oriented architectures and wireless computer networks, as well as stochastic models for representing data in those systems and stochastic models for improving dependability through restart-based techniques.
