Distributed Systems and Applications : Fault Tolerance - WMM53S3
Number of hours
Lectures : 18.0
Laboratory works : 12.0
ECTS : 3.0
Reliable distributed systems rest on several mechanisms, such as leader election, (ordered) broadcast, and consensus. This course introduces the main algorithms used to implement these mechanisms, as well as the design techniques used to limit the impact of software or hardware failures. We present several algorithms and give some examples of basic correctness proofs. Moreover, we study how the different assumptions that can be made about a system (synchrony, faults, etc.) affect the design of distributed algorithms.
Contact Renaud LACHAIZE
The course is structured in two parts:
A - Distributed algorithms and agreement [7 lectures, Renaud Lachaize]. This part covers distributed algorithms and the engineering of distributed applications: study of the algorithms that are at the basis of reliable distributed systems, and proofs that these algorithms are correct.
B - Fault tolerance [3 lectures, Lorena Anghel]. This part focuses on the main design techniques to limit the impact of software or hardware failures: fault avoidance; robustness; N-version programming; recovery block techniques; acceptance tests; retry; checkpoints and rollback.
Centralized operating systems; networks; elements of probability.