Position Paper for the ACM SigOps Workshop 1988
-----------------------------------------------

Andrew Birrell
Digital Equipment Corporation
Systems Research Center
130 Lytton Avenue
Palo Alto, CA 94301, U.S.A.


1. The Desire for Autonomy
--------------------------

Why build a distributed system? Why not spend the same money and effort
on a centralized system? There are several attractions to sticking with
a centralized system.

a) It's easier to build. Building powerful centralized systems doesn't
   require much research. The technologies are well understood, even if
   the requirements include high availability in the presence of
   failures.

b) It performs better. A distributed system necessarily involves a
   static partition of the total resources into smaller boxes. A
   centralized system can partition dynamically, responding to the
   time-varying offered load.

c) It's easier to manage. You employ one system manager for the
   centralized system, and management is his job; it's what he gets
   paid for. Distributed systems typically make each user a system
   manager, even though it's not viewed as a productive part of their
   job. This isn't a necessary consequence of distribution, but it
   seems to happen with today's systems.

So the advocates of distributed systems must be able to justify why
their customers should forgo these benefits. The possibility that
distributed systems research might be fun is not enough. Advocates of
distribution often give variants of the following justifications.

d) The same-cost distributed system will be more powerful. This
   argument says that distributed systems are presently more
   cost-effective.

e) The distributed system will allow for high-bandwidth user
   interaction. This argument says that economical high-bandwidth links
   must be shorter than the distance to the machine room.

f) The entire system must be widely dispersed geographically. This is a
   special case of (e); long-distance high-bandwidth links are indeed
   prohibitively expensive.

g) The users will be more productive if they are insulated from
   failures of centrally managed resources. This is part of the
   "autonomy" argument.

h) Users of a distributed system can choose to be their own system
   managers, if they wish to configure their part of the system in
   non-standard ways. This is the positive side of (c) above. This is
   also part of the autonomy argument.

I am unaware of published numbers that support the cost-effectiveness
argument (d). Indeed, the empirical evidence provided by IBM's share
price indicates that (d) is false, despite the evidence in support of
(d) provided by Sun Microsystems' share price.

There is a little truth to (e). Experiments with running the X window
protocol across 9600 baud lines (with data compression) are
disappointing; that bandwidth appears to be inadequate to keep the
screens up to date in the style to which we have become accustomed.
But only just inadequate. A 19.2K line would probably be fine for
800 x 1000 black and white displays, and a 56K line would be easy. A
0.5M line would certainly be good enough for high-quality color. These
speeds are easy to achieve within a local area. This indicates that
user interaction would be served well by a centralized system with X
window terminals connected to it by a medium-speed, star-shaped local
area network.
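
(As a rough sanity check on these figures - the following arithmetic is
illustrative only, and not part of the original argument - consider how
long a single raw full-screen repaint would take at each of the quoted
link speeds. Even the 0.5M line needs more than a second per raw
repaint, so the adequacy of these links rests on the X protocol sending
mostly compact drawing requests rather than whole bitmaps.)

    # Illustrative arithmetic only: time to transmit one raw repaint of
    # an 800 x 1000 monochrome (one bit per pixel) screen over the link
    # speeds quoted above.  Protocol overhead and compression ignored.

    FRAME_BITS = 800 * 1000 * 1      # one full screen, one bit per pixel

    for name, bits_per_second in [("9600 baud", 9600),
                                  ("19.2K",     19200),
                                  ("56K",       56000),
                                  ("0.5M",      500000)]:
        seconds = FRAME_BITS / bits_per_second
        print("%10s: %7.1f seconds per raw full-screen repaint"
              % (name, seconds))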

Argument (f) is real. But most of the distributed systems we have seen
don't work satisfactorily across wide areas either.

This leaves us with the two parts of the autonomy argument as the
dominant rationale for building distributed systems. Even if the other
arguments were partially true, autonomy would still stand out as an
important reason for distribution, probably the major one apart from
wide-area geographic distribution.


2. Loss of Autonomy
-------------------

Some of the hardest design issues that we face revolve around the
conflict between this desire for autonomy, and the inevitable loss of
autonomy that comes from access to shared services.

The simplest distributed systems are formed from network
interconnection of totally autonomous machines. These systems offer
network services that are used only in response to explicit user
requests (such as file transfers in the FTP style, or explicit remote
login). In such systems there is little loss of autonomy: each machine,
be it a workstation, a VAX 8800 or a large mainframe, will be as usable
in the presence of network or server failures as it would have been
without the addition of network services. The effect of failure of a
network service will be at most to cause the explicit user request to
fail. The user is denied only access to the failed resource, and it is
necessarily a resource that the user is explicitly aware of.

As distributed systems have become more sophisticated, the machines
have come to depend on their services more intimately. At the extreme,
the entire distributed system behaves as a single distributed
time-sharing system, vulnerable to the failure of its components. When
sufficient components fail, the entire system becomes monolithically
unavailable. This effect was once described (by Leslie Lamport) as "a
distributed system is one where my program can crash because of some
computer that I've never heard of".

There is room for argument about whether failure of a network component
should be able to cause failure of an individual workstation. However,
it seems clear (to me) that the failure of a network component
shouldn't be able to cause failure of a large system such as a
mainframe or a building full of workstations. In other words: failure
of a small part must not be able to cause failure of a much larger
part. In other words: for large enough components of the system,
autonomy is a requirement, not just a desire.

One approach to avoiding this loss of autonomy is to work hard at
reducing the probability of network-based failures to an "acceptably"
low value. We believe this does not solve the problem. The available
techniques (including replication) can reduce some failure modes, but
they have no impact on several key failures, notably: administrative
errors, replicated bugs and environmental problems (air conditioning
design, or the city's electricity supply).

Another approach is to replicate each necessary service wherever you
care about retaining autonomy. For example, you could run a name server
replica and a satellite time receiver on each mainframe. This would be
acceptable from the point of view of the extra resources it would
consume on the mainframe (or workstation cluster). But it is likely to
be unacceptable to the name service because of the vast number of
replicas it would create. And even this technique doesn't handle the
problems of administrative failure and replicated bugs.


3. Regaining Autonomy
---------------------

One technique would, if feasible, solve this problem.
We could arrange that for each critical service, if the network service
is unavailable but is required for continued local operation, then an
acceptable substitute is available locally. A trivial example of this
is that if the network time service is unavailable, we will use the
local clock instead.

We (Mike Burrows, Roger Needham, Michael Schroeder and the present
author) have been investigating the practicality of alleviating the
autonomy problem by a general facility known (hereafter) as "stashing".
A stash is akin to a cache, in that it contains results of previous
accesses to network services. But the distinguishing feature is that
the values in the stash will be used in place of the network service if
the service is unavailable. In other words, values in a stash are most
useful at the times when they cannot be verified; values in a cache
just optimize the verification process. A cache contains a reasonably
arbitrary selection of things recently referenced. A stash is a more
principled structure, containing things that are believed to be
important to the continued well-being of the local machine. For
example, even though you haven't logged in since yesterday, you would
expect your mainframe to still retain the information about your user
account - even if the name service is down.

In the case of the name service we propose to stash the arguments and
results of all name service calls. This is not the same as having a
copy of part of the name service name-space; there is no reason in
general to suppose that there are units of name-space that correspond
properly to what is used on a particular machine. In ordinary
operation, with the name service in place, we shall use it and only
update the stash, not consult it. If the name service is unavailable,
no updates are allowed to the name service data in the stash. (There is
also some attraction to using the stashed data as a name service cache,
but that is not relevant here.)

Files could be stashed. The stashed file data would participate in any
general purpose file caching scheme. The stash would include all the
files necessary to the running of the machine. Hopefully, most of these
are immutable. If mutable stashed files are also used as a cache, then
they must participate in file caching schemes (next section), with the
additional proviso that they cannot just be "invalidated" - having a
file stashed means that it's always available. Most likely, a stashed
file would not be writable when the file service is unavailable.

File directory entries could be stashed, or entire file directories. If
stashed directory data is used as a cache, it must participate in the
caching schemes (next section); if not, then updates must be written
through to the backing file service. A stashed directory could be
updated even if the backing file service is unavailable. In this case
the updates would be written through to the file service when the file
service is next available. The file service could resolve any resulting
conflicts using a LOCUS(tm) style algorithm.

We have a number of half-baked ideas on what the rules are for stashing
a name service value or a file or directory entry. For example, stash
it for thirty days after it has been used on each of two successive
days.
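
To make the name-service rule above concrete, here is a minimal sketch
(in Python, purely for illustration). "NameService", the lookup()
signature and the ServiceUnavailable error are hypothetical stand-ins
for whatever interfaces a real system would provide; the only point is
the discipline that the stash is updated, never consulted, while the
service is up, and consulted, never updated, while it is down.

    # Minimal sketch of the name-service stashing rule described above.
    # NameService, lookup() and ServiceUnavailable are hypothetical
    # stand-ins, not part of any real interface.

    class ServiceUnavailable(Exception):
        """The network name service cannot be reached."""

    class StashingNameService:
        def __init__(self, network_service):
            self.network = network_service
            self.stash = {}   # arguments of past calls -> their results

        def lookup(self, *args):
            try:
                result = self.network.lookup(*args)
            except ServiceUnavailable:
                # Service down: answer from the stash if we can.  The
                # stashed name data is never updated while the service
                # is unavailable.
                if args in self.stash:
                    return self.stash[args]    # possibly out of date
                raise
            # Ordinary operation: use the service and only update the
            # stash, never consult it.
            self.stash[args] = result
            return result

A real stash would of course live on local stable storage, so that it
survives the very failures it is meant to mask, and would be pruned by
retention rules like the thirty-day heuristic mentioned above.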
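
The write-through rule for stashed directories could be sketched in the
same spirit. "FileService" and its write() call are again assumptions
made for illustration; conflict resolution between machines is left to
the file service itself (LOCUS-style), as in the text.

    # Rough sketch of the write-through rule for stashed directories:
    # updates made while the file service is unavailable are queued and
    # replayed when it next becomes available.  FileService, write()
    # and ServiceUnavailable are assumptions, as in the previous sketch.

    class ServiceUnavailable(Exception):
        """The backing file service cannot be reached."""

    class StashedDirectory:
        def __init__(self, name, file_service):
            self.name = name
            self.service = file_service
            self.entries = {}   # stashed directory entries
            self.pending = []   # updates awaiting write-through

        def update(self, entry, value):
            # A stashed directory may be updated even while the file
            # service is unavailable; the update is queued until then.
            self.entries[entry] = value
            self.pending.append((entry, value))
            self.write_through()

        def write_through(self):
            # Replay queued updates in order; stop quietly if the file
            # service is still unavailable and try again later.
            while self.pending:
                entry, value = self.pending[0]
                try:
                    self.service.write(self.name, entry, value)
                except ServiceUnavailable:
                    return
                self.pending.pop(0)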

4. Conclusions
--------------

Autonomy is a critical part of the argument for local-area distributed
systems. Our present experience is that as distributed systems become
more powerful, autonomy diminishes. Fault-tolerant software and
hardware configurations can alleviate, but not eliminate, this problem.

The technique of stashing offers some hope. But we do not yet
understand whether stashing is feasible. Stashing replaces failure with
out-of-date answers; we do not know whether this is acceptable to users
and systems. We do not understand the appropriate strategies for
building stashes for particular applications.


Acknowledgement
---------------

These views and ideas (such as they are) arose out of meetings amongst
Mike Burrows, Roger Needham, Michael Schroeder and the present author.
But any deficiencies in the presentation are probably my fault, and I
have not checked that the other participants still hold these views:
they might disagree radically by now.