next up previous contents
Next: Impossibility Proofs Up: Lecture 2 Previous: Lecture 2

Byzantine Agreement

Byzantium, 1453 AD. The city of Constantinople, the last remnants of the hoary Roman Empire, is under siege. Powerful Ottoman battalions are camped around the city on both sides of the Bosporus, poised to launch the next, perhaps final, attack. Sitting in their respective camps, the generals are meditating. Because of the redoubtable fortifications, no battalion by itself can succeed; the attack must be carried out by several of them together or otherwise they would be thrusted back and incur heavy losses that would infuriate the Grand Sultan. Worse, that would jeopardize the prospects of a defeated general to become Vizier. The generals can agree on a common plan of action by communicating thanks to the messenger service of the Ottoman Army which can deliver messages within an hour, certifying the identity of the sender and preserving the content of the message. Some of the generals however, are secretly conspiring against the others. Their aim is to confuse their peers so that an insufficient number of generals is deceived into attacking. The resulting defeat will enhance their own status in the eyes of the Grand Sultan. The generals start shuffling messages around, the ones trying to agree on a time to launch the offensive, the others trying to split their ranks...

Menlo Park, 1982 AD. The situation above describes a classical coordination problem in distributed computing known as byzantine agreement which was introduced in two seminal papers by Lamport, Pease and Shostak [23,30]. Broadly stated, a basic problem in distributed computing is this: Can a set of concurrent processes achieve coordination in spite of the faulty behaviour of some of them? The faults to be tolerated can be of various kinds. The most stringent requirement for a fault-tolerant protocol is to be resilient to so-called byzantine failures: a faulty process can behave in any arbitrary way, even conspire together with other faulty processes in an attempt to make the protocol work incorrectly. The identity of faulty processes is unknown, reflecting the fact that faults can (and do) happen unpredictably.

What kind of applications need this kind of malicious fault to be handled? Part of the answer can be found in the footnote of the original paper by Lamport, Shostack and Pease listing the sponsors: NASA, the Ballistic Missile Defense Systems Command and the Army Research Office. Besides the usual applications to killing and terrorizing people, protocols that withstand byzantine failures are useful in any situation where it is vital that a system performs correctly in spite of the malfunctioning of some its subparts, regardless of the type of malfunctioning. Examples include aircraft control systems and applications of computer technology to medicine. According to Nancy Lynch's book ``the agreement problem is a simplified version of a problem that originally arose in the development of on-board aircraft control systems. In this problem, a collection of processors, each with access to a separate altimeter, and some of which may be faulty, attempt to agree on the airplane's altitude. Byzantine agreement algorithms have also been incorporated into the hardware of fault-tolerant multiprocessor systems; there, they are used to help a small collection of processors to carry out identical computations, agreeing on the results at every step. This redundancy allows the processors to tolerate the (Byzantine) failure of one processor. Byzantine agreement algorithms are also useful in processor fault diagnosis, where they can permit a collection of processors to agree on which of their number have failed (and should therefore be replaced or ignored)'' [27].

The problem we want to study is whether coordination among processes is at all possible in spite of byzantine faults. We begin with a precise formulation of the problem.

We are given a set G of processes, called generals, which is partitioned into two sets L and T of, respectively, loyal generals and traitors. Henceforth, t denotes the number of traitors and n the overall number of processes. Each general Gi has an input bit xi- Gi's initial assessment- and must produce an output bit di called the decision value of Gi. The underlying communication mechanism is

As we shall see, each of these assumptions has a big impact on the problem. The question we ask is whether there exists a protocol satisfying the following conditions.
Non-triviality : If all generals have the same input bit b then, the only possible decision value of the loyal generals is b. More formally, $\forall G_i, x_i=b \; \Rightarrow \; \forall G_j \in L, d_j=b$.
Agreement : The loyal generals should agree on the decision. That is, $\forall G_i,G_j\in L, \; d_i=d_j$.
Limited bureaucracy : The protocol must terminate.
The first requirement is a technical condition to disallow trivial constant solutions; if all generals have the same input 0 (resp. 1), then all loyal generals must decide 0 (resp. 1). The output condition refers to loyal generals only because faulty processors can behave arbitrarily. The second condition captures the essence of the problem, while the third is an obvious requirement that must be satisfied for the protocol to be of any use. Note however a subtle point. The requirement can be relaxed as follows:
Limited dithering: Eventually all loyal generals should come to a decision.
This is less stringent than limited-bureaucracy because a process might commit to a decision value and yet keep participating to the protocol which might even run forever. We shall see later examples of this.

next up previous contents
Next: Impossibility Proofs Up: Lecture 2 Previous: Lecture 2
Viggo Kann
Sat Dec 20 23:41:16 MET 1997