Next: Solving consensus using S
Up: Lecture 7
Previous: Unreliable Failure Detectors for Systems*
Each process will have its own failure detector, which it can
query like an oracle. The failure detector will then return a list of
the processes it suspects to be faulty. What makes this approach interesting
is that the detectors can make mistakes. In fact, they can make arbitrarily
many mistakes and still, if some minimal conditions are satisfied,
they can be used to solve consensus.
The first property our failure detector should satisfy is some sort of
completeness property, which roughly means that if a process
is faulty it should also be suspected by the detector.
Definition 12478
(Completeness)
-
Strong completeness means that eventually every process that
crashes is permanently suspected by all detectors.
-
Weak completeness means that eventually every process that
crashes is permanently suspected by some detector.
Notice that with weak completeness the information that a process
is crashed is somewhere in the system, but it is unknown where.
For instance, some detectors can say that p is down while others
can say that p is up and this can go on forever.
The second kind of property we would like to have is accuracy ,
which means that, if a process is suspected, it should also be faulty. We shall
distinguish four cases.
Definition 12481
(Accuracy)
-
Perpetual strong accuracy means that no
correct process is ever suspected.
-
Perpetual weak accuracy means that some of the correct
processes are never suspected.
-
Eventual strong accuracy means that eventually no correct
process is ever suspected.
-
Eventual weak accuracy means that eventually some of the
correct processes are never suspected.
In the perpetual case, the detectors are guaranteed
to work correctly from the beginning, whereas in the eventual case they
are guaranteed to do so only from some point on.
Notice that this leaves still a lot of uncertainty.
For instance, a detector can say that both p and q are correct
while in fact one of them is down. Also, with eventual accuracy
the moment after which the detectors work properly in not known.
In general ``each local failure detector module
can repeatedly add and remove correct processes from its
list of suspects [and] some correct process may be erroneously suspected
to have crashed by all other processes throughout the entire execution''
[11]. Still, if any combination of completeness
and accuracy properties is satisfied, consensus can be attained.
The above properties can be combined in eight ways, yielding the
detectors shown in table 7.1. It is left as an exercise to
show that weak and strong completeness are actually equivalent in
strength, which means that the eight different detectors collapse into
four.
Table 7.1:
The eight different combinations of detector properties.
P stands for perfect; Q for quasi-perfect;
S for strong; and W for weak.
The sign
comes from temporal logic, and should be
pronounced ``eventual''.
| |
4c|accuracy |
|
|
|
| |
2c|perpetual |
2c|eventual |
|
|
| |
strong |
weak |
strong |
weak |
| strong completeness |
P |
S |
P |
S |
| weak completeness |
Q |
W |
Q |
W |
In fact, the whole table collapse to just two entries.
We shall prove the following.
-
W guarantees (n-1)-resiliency, i.e. there is an (n-1)-resilient
asynchronous protocol for consensus using a failure detector which
satisfies weak completeness and perpetual weak accuracy .
-
There is no f resilient protocol using
for f<n/2; and
-
guarantees f-resiliency for any f<n/2,
i.e. there is an f-resilient (f<n/2) asynchronous
protocol for consensus using a failure detector which
satisfies weak completeness and eventual weak accuracy .
We will give an (n-1)-resilient protocol for consensus with the aid
of the S-detector; since strong completeness
and weak completeness are equivalent this will establish
the fact that W is enough for (n-1)-consensus. and hence
that P=W.
Then we will show that it is impossible to reach
consensus with n/2 faulty processors if we only have the
P-detector. Since S is the weakest detector with perpetual accuracy,
and
P is the strongest detector with eventual accuracy, this highlights
the big difference between these two accuracy properties.
Next lecture, we will show that consensus can be
reached even with the
W-detector, if the number of faulty processors
is less than n/2. We remind the reader that we are considering
only crash failures.
Altogether this shows that in reality there are only two
cases two consider: P=Q=S=W and
.
We should also point out that
is the weakest possible
failure detector for solving consensus in the sense that
any failure detector that solves consensus must satisfy
weak completeness and eventual weak accuracy .
The difficult and interesting proof can be found in [12].
As a final comment we remark that the protocols we will be discussing
do not require that the failure detector properties hold
forever. It is enough that they hold ``for sufficiently long time''.
In this sense, failure detectors are a neat abstraction of the time-out
mechanisms which are used in practice. In fact, several results
obtained for models assuming various degrees
of mild synchronicity can be recast and proven in terms of failure detectors
[11].
Next: Solving consensus using S
Up: Lecture 7
Previous: Unreliable Failure Detectors for Systems*
Viggo Kann
Sat Dec 20 23:41:16 MET 1997