Students can work in groups of two or three for the final project. The final project is an open-ended research project that can target the design of reliable hardware or software systems. Projects dealing with evaluation of reliable systems using analytical models or measurements are also encouraged. Several project ideas and conference/journal links are listed below to help you get started.

The requirements for the project are:

  1. Initial project proposal
  2. Two presentations
    • Mid-term Report: A short presentation on your initial progress, including a critique of the literature.
    • Final Presentation: A presentation covering your initial goals, method/approach, results achieved, and major accomplishments.
  3. Final report

Project Team Signup

One member of each project team should sign up the team using this form by 9/11, before the start of class.

Project Ideas

  • Failure Data Analysis from hyperscale systems
    • Performance-Reliability models
    • Machine-learning-based failure prediction
    • Network failure localization
  • Self-Driving Cars
    • Failure data (Disengagement) analysis
    • Fault Injection into Neural Network inference processors (NVIDIA Xavier)
  • Reliability (and Security) of Cyber-Physical Systems
    • Power grids
    • Blue Waters cooling and power distribution system
    • Raven surgical robot
  • Large-scale network reliability (Fault injection)
    • Cray Gemini and Aries networks
    • SDN

Conferences and Journals for Project Ideas


  • International Conference on Dependable Systems and Networks (DSN)
  • Symposium on Reliable Distributed Systems (SRDS)
  • International Symposium on Software Reliability Engineering (ISSRE)
  • European Dependable Computing Conference (EDCC)
  • Symposium on Operating Systems Design and Implementation (OSDI)
  • Symposium on Networked Systems Design and Implementation (NSDI)
  • Symposium on Operating Systems Principles (SOSP)
  • International Symposium on Computer Architecture (ISCA)
  • International Symposium on Microarchitecture (MICRO)
  • Special Interest Group on Performance Evaluation (SIGMETRICS)


  • IEEE Transactions on Dependable and Secure Computing
  • IEEE Transactions on Computers (special issues on Fault-Tolerant Computing)
  • IEEE Transactions on Software Engineering
  • IEEE Transactions on Parallel and Distributed Systems
  • IEEE Transactions on Reliability
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • IEEE Computer
  • IEEE Software
  • IEEE Micro
  • ACM Transactions on Computer Systems
  • Communications of the ACM