Position Paper for SigOps European Workshop, 1992

Andrew D. Birrell
Digital Equipment Corporation
Systems Research Center
130 Lytton Avenue, Palo Alto, CA 94301
U.S.A.
Phone +11 415 853 2214

For more than ten years, many people (including myself) have been advocating the use of "Remote Procedure Calls" (RPC) as the primary technique for communication among the components of a distributed system. Many researchers have explored the design space for RPC, and they have written many worthy papers on what they discovered. Many commercial vendors have constructed and purveyed RPC systems. Many standards committees have added their own distinctive values to the enterprise. Where has all this effort got us? It is time to consider the successes and failures, benefits and costs, of our advocacy of RPC.

1. The arguments
-------------

Anyone who has written a substantial application that communicates with another piece of software outside its own address space (across the network, or to a separate address space on the same machine) has felt the seduction of RPC. Especially when writing in a high-level language with a rich type system, you quickly notice the tedium of converting between your beautiful data structures and the bland bytes offered by the underlying communication protocols (such as TCP streams or Unix pipes). Surely the computer can do this conversion for you?

Anyone who has written more than one server program has noticed a regularity in the techniques for dispatching threads of control to incoming client requests. Such programmers probably treasure little subroutine packages for keeping track of the resources their servers have issued to clients, complete with mechanisms for clients to free resources and for servers to reclaim them by timeout if the clients fail.
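The resource-tracking packages described above follow a recognizable pattern: a table of resources handed out to clients, which clients may free explicitly and which the server reclaims once a client stops showing signs of life. The following is a minimal sketch of that pattern, assuming a lease-based design in which clients periodically renew their handles. The class and method names (`LeaseTable`, `issue`, `renew`, `free`, `reap`) are hypothetical and are not taken from any particular RPC system.

```python
import itertools
import time
from typing import Any, Callable, Dict, List, Tuple


class LeaseTable:
    """Hypothetical server-side table of resources issued to clients.

    Each resource is held under a lease that the client must renew;
    the server reclaims resources whose leases have expired.
    """

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock  # injectable so timeouts can be tested deterministically
        self._next_handle = itertools.count(1)
        # handle -> (resource, expiry time)
        self._entries: Dict[int, Tuple[Any, float]] = {}

    def issue(self, resource: Any) -> int:
        """Record a resource handed to a client and return an opaque handle."""
        handle = next(self._next_handle)
        self._entries[handle] = (resource, self.clock() + self.lease_seconds)
        return handle

    def renew(self, handle: int) -> bool:
        """Extend a lease; returns False if the handle was already reclaimed."""
        entry = self._entries.get(handle)
        if entry is None:
            return False
        self._entries[handle] = (entry[0], self.clock() + self.lease_seconds)
        return True

    def free(self, handle: int) -> Any:
        """Client explicitly releases a resource; returns it, or None if unknown."""
        entry = self._entries.pop(handle, None)
        return None if entry is None else entry[0]

    def reap(self) -> List[Any]:
        """Reclaim every resource whose lease has expired (e.g. its client failed)."""
        now = self.clock()
        expired = [h for h, (_, expiry) in self._entries.items() if expiry <= now]
        return [self._entries.pop(h)[0] for h in expired]

    def __contains__(self, handle: int) -> bool:
        return handle in self._entries


if __name__ == "__main__":
    now = [0.0]  # a fake clock, advanced by hand
    table = LeaseTable(lease_seconds=10.0, clock=lambda: now[0])
    a = table.issue("file A")
    b = table.issue("file B")
    now[0] = 5.0
    table.renew(a)       # client A is alive; its lease now expires at 15
    now[0] = 12.0
    print(table.reap())  # B's lease expired at 10, so this prints ['file B']
```

A server would call `reap` periodically, perhaps from a background thread. Leases are only one way to detect failed clients; the paper's later mention of distributed garbage collection refers to another.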
Finally, application programmers using the communication libraries of the 1970's keep reading in the research literature about the wondrous results that researchers in universities and industrial research labs are obtaining with their RPC systems. They read about "the world's fastest distributed system", and can't understand why their own system performs at one tenth of the speed claimed for others' systems in the public literature.

So these are the attractions of RPC:

- simplicity: making a remote invocation is no more complex than the familiar old procedure call. The application programmer doesn't need to get involved with issues of data representation, managing connections, or matching up datagram replies with requests;

- commonality: a single investment in a top-quality RPC system will produce a collection of common communication paradigms that you can use for many applications. The RPC system will embody the known good solutions to numerous common problems encountered in building distributed applications;

- performance: since everyone is going to use RPC, we can invest substantial effort in making this particular communication path go extremely fast.

2. Observed Reality
----------------

The RPC systems produced by researchers come close to meeting these lofty aspirations. Several groups have produced RPC systems with quite good performance: round-trip latency of less than 5000 instruction times, and bulk data transfer throughput of 75% or more of the available network bandwidth. Several research groups have produced RPC systems that are good enough to ease their programmers' burdens substantially, so that the programmer can actually concentrate on his own research goals, which do not include questions of byte order on the Ethernet.
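The "simplicity" attraction is essentially a list of chores that a generated client stub takes over from the programmer. The following is a minimal sketch of two of them, marshalling arguments into bytes in a fixed byte order and matching datagram replies to requests. It assumes a hypothetical wire format (a 4-byte call ID, a 2-byte procedure number, then two 32-bit integer arguments, packed with Python's `struct` module in network byte order) and an in-memory stand-in for the transport. The names `ClientStub`, `LoopbackTransport`, `server_dispatch`, and the procedure `add` are illustrative only, and the sketch omits retransmission, connection management, and threading.

```python
import itertools
import struct
from collections import deque
from typing import Callable, Deque, Optional

# Hypothetical wire format, network byte order ("!"):
#   request: call_id (uint32), proc (uint16), x (int32), y (int32)
#   reply:   call_id (uint32), result (int32)
REQUEST = struct.Struct("!IHii")
REPLY = struct.Struct("!Ii")

PROC_ADD = 1  # hypothetical procedure number


def server_dispatch(packet: bytes) -> bytes:
    """Server stub: unmarshal a request, run the procedure, marshal the reply."""
    call_id, proc, x, y = REQUEST.unpack(packet)
    if proc != PROC_ADD:
        raise ValueError(f"unknown procedure {proc}")
    return REPLY.pack(call_id, x + y)  # result must fit in an int32


class LoopbackTransport:
    """In-memory stand-in for a datagram channel to the server (hypothetical)."""

    def __init__(self, dispatch: Callable[[bytes], bytes]) -> None:
        self._dispatch = dispatch
        self.inbox: Deque[bytes] = deque()  # replies waiting for the client

    def send(self, packet: bytes) -> None:
        # Deliver the request and queue the server's reply for the client.
        self.inbox.append(self._dispatch(packet))

    def receive(self) -> Optional[bytes]:
        return self.inbox.popleft() if self.inbox else None


class ClientStub:
    """Client stub: makes a remote call look like a local procedure call."""

    def __init__(self, send: Callable[[bytes], None],
                 receive: Callable[[], Optional[bytes]]) -> None:
        self._send = send
        self._receive = receive
        self._call_ids = itertools.count(1)

    def add(self, x: int, y: int) -> int:
        call_id = next(self._call_ids)
        self._send(REQUEST.pack(call_id, PROC_ADD, x, y))
        while True:
            packet = self._receive()
            if packet is None:
                raise TimeoutError(f"no reply for call {call_id}")
            reply_id, result = REPLY.unpack(packet)
            if reply_id == call_id:
                return result
            # Otherwise this is a late duplicate reply to an earlier call
            # (e.g. one that was retransmitted), so discard it.


if __name__ == "__main__":
    transport = LoopbackTransport(server_dispatch)
    stub = ClientStub(transport.send, transport.receive)
    print(stub.add(2, 3))  # prints 5
```

To the caller, `stub.add(2, 3)` reads like an ordinary procedure call. Byte order, packet layout, and reply matching all live in code that an RPC system would generate once from an interface description.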
A few research groups have made progress on including common paradigms beyond mere procedure call in their RPC systems: for example, issuing handles that are refreshed and timed out automatically, or that are controlled by distributed garbage collection. Some have found comfortable techniques for merging streaming mechanisms with their RPC designs, broadening the scope of RPC solutions considerably.

The RPC systems produced commercially, and the RPC designs that emerge from standards committees, are a sadder case. They seem to fall short of the targets for which the researchers had been competing. Uniformly, commercial RPC systems are a lot slower than the results published in the research literature: generally by a factor of 5 or 10. That's a lot of instructions! Uniformly, commercial RPC systems are more complicated and difficult to use than the research systems; or at least, than the researchers say their systems are. Commercial RPC systems are mostly minimal: they often omit the more elaborate features espoused by their research brethren. Standards designs seem to tend toward complexity and elaborateness.

The overall effect has been that while many, probably the majority, of researchers in distributed systems agree on the utility of the RPC paradigm (they happily dispute the boundaries of its usefulness, but few claim that it is fundamentally misguided), real-world commercial programmers are left wondering what all the fuss was about. A designer of a TP system wouldn't dream of using a request-response protocol that consumes 15000 of his valuable instructions, let alone 30000.

3. Analysis
--------

It is time to consider the several aspects of this paradox.

- Why do so many commercial RPC enterprises fall short of the standards that researchers seem to find so easy?

- What are the really valuable paradigms that have emerged from our use of RPC? If they're so valuable, why haven't they been embodied in more commercial designs?
- What are the key techniques for achieving communication at high efficiency? In a year when vendors are manufacturing 200 MIPS single-chip processors, is a 5000-instruction round trip really needed?

- Effective use of RPC seems to be inextricably linked with the availability of lightweight multi-threading within an address space. Is this really true? Why are commercial systems so slow to offer this feature? Will commercial threads be another feature where reality falls short of the researchers' promises?

I'm not going to attempt to answer these questions in this position paper. But now is a good time for a well-qualified group to discuss them. We need to learn the answers, consider whether we can help the commercial world do better in this area, and apply the lessons to our next great research insight.