Textbook Draft

Chapter  Link  Name
1        PDF   Classical Dependability Techniques & Modern Computing Systems: Where and how do they meet?
2        PDF   Hardware and Software Error Detection and Example Applications
3        PDF   Processor Level Detection and Recovery
4        PDF   Data Analysis
5        PDF   Software Detection
6        PDF   Reliable Networked and Distributed Systems
7        PDF   Checkpointing and Rollback Error Recovery
8        PDF   Checkpointing Large-Scale Systems
9        PDF   Internals of Fault Injection Techniques
10       PDF   Safeguarding Current Technologies

Probability Review

Item  Link  Name
1     Link  Prof. Hajek’s course notes for ECE313
2     Link  Prof. Iyer’s lecture slides for ECE313

  • D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems - Design and Evaluation, Digital Press, 1998, 3rd edition.
  • M. Singhal and N.G. Shivaratri, Advanced Concepts in Operating Systems, McGraw-Hill, 1994.
  • D. K. Pradhan, ed., Fault-Tolerant Computer System Design, New Jersey: Prentice-Hall, 1996.
  • B. W. Johnson, Design and Analysis of Fault-Tolerant Digital Systems, Addison Wesley, 1989.
  • M. R. Lyu, ed., McGraw-Hill Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
  • M. R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, 1995.
  • K.P. Birman, Building Secure and Reliable Network Applications, Manning, 1996.
  • K.S. Trivedi, Probability and Statistics with Reliability, Queuing and Computer Science Applications, John Wiley & Sons, 2nd edition, 2002.
  • P. Jalote, Fault Tolerance in Distributed Systems, Prentice-Hall, Inc., 1994.
  • M. Shooman, Probabilistic Reliability: An Engineering Approach, McGraw-Hill, 1968.