Theorem 12173
There is no 1-resilient deterministic protocol for consensus in message-passing, asynchronous systems. That is, no protocol can tolerate 1 crash failure.
This result highlights the enormous difference existing between synchronous and asynchronous systems. As we saw, if the system is synchronous up to n/3 - 1 malicious faults can be tolerated- the protocol works correctly in spite of the arbitrary behaviour of (almost) one third of the processes. If the system is asynchronous however, not even one crash failure- a much more benign kind of fault- can be tolerated. This result does depends on the asynchronous nature of the system and not on other assumptions. For instance, the same impossibility result (and in fact essentially the same proof) extends to shared-memory systems, where processes communicate by asynchronously reading and writing a shared memory accessible to all (see, for instance, [4,27]).
We now turn to the proof of the Theorem 5.1.
First off, we give a formalization of the
model. Message passing is performed by means of Send and Receive operations.
There is a buffer of messages B containing messages that have been
sent but not yet received. The buffer contains pairs of the form (p,m) where m is the
message and p is the recipient.
``sends'' the message m to process p by adding (p,m)
to the buffer. To get a message, q samples the buffer with
. If there are no messages for q, q will
simply receive a null marker,
. If there are messages ready for q
in the buffer, either q will receive any one of them--
say m, which will cause (q,m) to be removed from the buffer-- or q will
receive
. However, each message is received within a finite
number of attempts. The message delivery discipline can be assumed to be anything.
To make the lower bound result more interesting we assume that message delivery is fair in the
following sense:
if a message (p,m) is sent, and if p performs
infinitely many
operations, then m will be delivered to p
after finitely many attempts.
A process is modeled as a deterministic automaton,
possibly with infinitely many states. A process p works in steps; in each step,
p performs
, computes (in a finite time) the next
state, and sends a finite number of messages. Moreover, we may
assume that each correct process goes on like this forever; we can
always modify our protocol to satisfy this.
We continue with some definitions: A configuration C is a
vector containing the process states and the message buffer content.
An event is a pair e=(p,m) where p is a process and m is
a message or the null marker,
. The event e=(p,m) is
applicable to a configuration C if (p,m) is in the
buffer, or if
, i.e.
is always applicable.
When e = (p,m) is applicable to C,
the reception of m by p defines the next step in the
system. This will give a new configuration, which we write as
e(C). We let e1 e2 (C) denote e2(e1(C)), and so forth.
We record the following definition for future reference.
Definition 12221
A run (from C) is a (possibly empty) sequence of events that can be applied in turn, starting from C.
Finally, a configuration D is accessible if it can be reached
from an initial configuration, i.e. if
for some
initial configuration C and run
.From now on, when we talk of a
configuration, we mean an accessible configuration.
We classify configurations into three categories:
Definition 12233
We say that a run is admissible if each process, with possibly one exception, takes an infinite number of steps.
We are ready now for the the proof, which is by by contradiction. The contradiction is reached by assuming that a 1-resilient protocol exists and by exhibiting an admissible run of the protocol where all processes remain forever undecided, therefore violating Limited Dithering. Notice that an admissible run embodies the notion of fairness since each process, except perhaps one, takes infinitely many steps without deciding.
Lemma 12234
There is a bivalent initial configuration.
Proof. Let C0 be the initial configuration where xi = 0 for all
i, and for
, let Cj be the initial
configuration where xi = 1 for
and xi = 0
for all remaining i. Notice that the non-triviality property
implies that C0 is 0-valent and Cn is 1-valent. We now show
that at least one configuration among
is
bi-valent. For if not, let j be the lowest number such that
Cj is 1-valent. Obviously, Cj-1 must then be 0-valent.
Since we suppose our protocol to be 1-resilient, we can let pj
be dead from the beginning; there is still a finite run
(thus not involving pj) from Cj such that a decision is
made. But xj is the only input bit where Cj-1 and Cj
differ. Therefore,
is a run also from Cj-1, and in
particular, it will lead to the same decision there. This will
contradict either Cj-1 being 0-valent or Cj being 1-valent.![]()
The following is an easy technical lemma which will come handy.
Lemma 12270
(Commutativity Lemma)
Let
and
be runs from C, and suppose that
the set of processes taking steps in
is disjoint from
the set of processes taking steps in
. Then
and
are both runs from C, and
they lead to the same configuration.
Proof. Homework.![]()
Let C be a configuration and e=(p,m) an event applicable to C.
Notice that if e is applicable to C it remains applicable
until its corresponding step is executed.
A run
is e-free if it does not contain e.
Lemma 12285
Let C be bivalent, and let e be any applicable event. Then
there is a (possibly empty) e-free run
such
that
is bivalent.
With this lemma we can prove the theorem.
To see this, construct an admissible run by starting with an initial
bivalent configuration C0, whose existence is guaranteed by
Lemma 5.1, as follows.
Let
be any ordering of the processes.
Pick any applicable event e1=(p1,m1) and apply the lemma, thereby obtaining
a second bivalent configuration
.To make things more concrete, e1=(p1,m1) can be selected such that m1
was the first message ever sent to p1 or
if there is none.
Then, apply the lemma again by picking an event e2=(p2,m2) applicable to C1
(notice
), obtaining another bivalent configuration
. As before, m2 can be the first message ever sent
to p2 still present in the buffer.
By proceeding in a round robin fashion- making pi perform step e1+kn (k ge 0)-
we obtain an infinite admissible run where
every process remains forever undecided and yet takes infinitely many steps.
Proof. Assume there is no such
. Then e(C) must be uni-valent;
assume without loss of generality that it is 0-valent.
We want to show the existence of an e-free run
leading from C to a
configuration D such that e(D) is 1-valent. This means that on
the way from C to D, there must be two neighboring
configurations A and B such that B = f(A), where f is some
event where e(A) is 0-valent and
e(B) is 1-valent. We illustrate the situation in
Figure 5.1.
To prove the existence of such a run notice first that
since C is bi-valent, there is a run
such that
is 1-valent. Then, we distinguish between two cases:
Focus then on A and B=f(A) (refer always to Figure 5.1).
We first show that the processes taking steps in e and f
cannot be different. For if they were, according to Lemma
5.2, we would have e(B) = e(f(A)) =
f(e(A)). This is impossible, since e(B) is 1-valent, and e(A)
is 0-valent.
Thereafter, we show that we will still end up with a
contradiction. Let p be the process taking steps in e and
f. Since our protocol is supposed to be 1-resilient, there is a
run
in which p does not take any steps, leading from A
to some configuration R where a decision has been
taken. See Figure 5.2.
However, the decision can neither be 0 nor 1: