Position Paper for SigOps European Workshop, 1992

Andrew D. Birrell
Digital Equipment Corporation
Systems Research Center
130 Lytton Avenue,
Palo Alto, CA 94301
U.S.A.

Phone +1 415 853 2214

For more than ten years, many people (including myself) have been
advocating the use of "Remote Procedure Calls" (RPC) as the primary
technique for communication among the components of a distributed system.
Many researchers have explored the design space for RPC, and they have written
many worthy papers on what they discovered.  Many commercial vendors have
constructed and purveyed RPC systems.  Many standards committees have
added their own distinctive values to the enterprise.  Where has all this
effort got us?

It is time to consider the successes and failures, benefits and costs, of
our advocacy of RPC.


1. The Arguments
   -------------

Anyone who has written a substantial application that communicates with
another piece of software outside of its own address space (across the
network, or to a separate address space in the same machine) has felt the
seduction of RPC.  Especially when writing in a high level language with a
rich type system, you quickly notice the tedium of converting between your
beautiful data structures and the bland bytes offered by the underlying
communication protocols (such as TCP streams or Unix pipes).  Surely the
computer can do this conversion for you?
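
To make the tedium concrete, here is a minimal sketch of the hand-written
conversion that an RPC stub generator would otherwise produce. The `Point`
record, its wire layout, and the function names are hypothetical, chosen only
for illustration; they are not taken from any particular RPC system.

```python
import struct
from dataclasses import dataclass

# Hypothetical application record; not from any real RPC system.
@dataclass
class Point:
    x: int
    y: int
    label: str

# Assumed wire layout: two signed 32-bit integers and an unsigned 16-bit
# length, all in network (big-endian) byte order, followed by the label
# as UTF-8 bytes.
_HEADER = "!iiH"

def marshal_point(p: Point) -> bytes:
    """Flatten a Point into bytes suitable for a TCP stream or a pipe."""
    raw = p.label.encode("utf-8")
    return struct.pack(_HEADER, p.x, p.y, len(raw)) + raw

def unmarshal_point(data: bytes) -> Point:
    """Rebuild a Point from the bytes produced by marshal_point."""
    x, y, n = struct.unpack_from(_HEADER, data, 0)
    start = struct.calcsize(_HEADER)
    return Point(x, y, data[start:start + n].decode("utf-8"))
```

Every new record type needs another pair of routines like these, kept in step
by hand on both sides of the connection. That repetition is what a stub
generator removes.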

Anyone who has written more than one server program has noticed a
regularity in the techniques for dispatching threads of control to
incoming client requests.  And such programmers probably treasure little
subroutine packages for keeping track of resources that their servers have
issued to clients, complete with mechanisms for clients to free resources,
and for servers to reclaim them by timeout if the clients fail.
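
One such "little subroutine package" might look like the sketch below. It is
an illustration under assumptions, not a description of any existing library:
the `LeaseTable` name, its methods, and the choice of integer handles with a
fixed time-to-live are all hypothetical.

```python
import time

class LeaseTable:
    """Tracks handles issued to clients and reclaims them by timeout.

    Hypothetical helper for illustration only. The clock is injectable so
    that the timeout behaviour can be exercised without sleeping.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry = {}      # handle -> time at which the lease lapses
        self._next_handle = 0

    def issue(self):
        """Issue a fresh handle whose lease lasts ttl_seconds."""
        self._next_handle += 1
        self._expiry[self._next_handle] = self._clock() + self._ttl
        return self._next_handle

    def refresh(self, handle):
        """Extend a lease still in the table; False if the handle is unknown."""
        if handle not in self._expiry:
            return False
        self._expiry[handle] = self._clock() + self._ttl
        return True

    def release(self, handle):
        """Explicit free by a well-behaved client; False if already gone."""
        return self._expiry.pop(handle, None) is not None

    def reap(self):
        """Reclaim every lapsed handle (for example, after a client failed)."""
        now = self._clock()
        dead = [h for h, t in self._expiry.items() if t <= now]
        for h in dead:
            del self._expiry[h]
        return dead
```

A server would call `reap()` periodically and release whatever resources
stand behind the returned handles.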

Finally, application programmers using the communication libraries of the
1970s keep reading in the research literature of the wondrous results
being obtained by researchers in their RPC systems in the universities and
industrial research labs.  They read about "the world's fastest distributed
system", and can't understand why their own system is performing at one
tenth of the speed claimed for others' systems in the public literature.

So these are the attractions of RPC:

  -  simplicity: making a remote invocation is no more complex than the
     familiar old procedure call. The application programmer doesn't need
     to get involved with issues of data representation, or managing
     connections, or matching up datagram replies with requests;

  -  commonality: a single investment in a top quality RPC system will
     produce a collection of common communication paradigms that you can
     use for many applications. The RPC system will embody the known good
     solutions to numerous common problems encountered in building
     distributed applications;

  -  performance: since everyone is going to use RPC, we can invest
     substantial effort in making this particular communication path
     go extremely fast.
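
As an illustration of the bookkeeping the "simplicity" argument promises to
hide, the sketch below matches datagram replies to outstanding requests by
call identifier. It assumes a JSON message format and a `PendingCalls` helper
that are both invented for this example; a real RPC runtime would use its own
wire format and would also handle retransmission and timeouts.

```python
import itertools
import json

class PendingCalls:
    """Matches replies, possibly out of order, to outstanding requests.

    Hypothetical helper; the JSON message layout is an assumption made
    for the sake of the example.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}     # call id -> procedure name

    def make_request(self, proc, args):
        """Build a request datagram and remember that it is outstanding."""
        call_id = next(self._ids)
        self._pending[call_id] = proc
        message = {"id": call_id, "proc": proc, "args": args}
        return json.dumps(message).encode("utf-8")

    def accept_reply(self, datagram):
        """Return (proc, result) for a matching reply, or None for a
        duplicate or stray datagram, which is simply dropped."""
        message = json.loads(datagram.decode("utf-8"))
        proc = self._pending.pop(message["id"], None)
        if proc is None:
            return None
        return proc, message["result"]
```

Without RPC, each application carries its own variant of this table. With
RPC, the caller sees only an ordinary procedure call that returns a result.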


2. Observed Reality
   ----------------

The RPC systems produced by researchers come close to meeting these lofty
aspirations.  Several groups have produced RPC systems with quite good
performance: round-trip latency of less than 5000 instruction times, and
bulk data transfer throughput of 75% or more of the available network
bandwidth.  Several research groups have produced RPC systems that are good
enough to ease their programmers' burdens substantially, so that the programmer
can actually concentrate on his own research goals, which do not include
questions of byte-order on the Ethernet.  A few research groups have made
progress on including common paradigms beyond mere procedure call in their
RPC systems - for example, issuing handles that get refreshed and timed
out automatically, or that are controlled by distributed garbage
collection.  Some have found comfortable techniques for merging streaming
mechanisms with their RPC designs, broadening the scope of the RPC
solutions considerably.

The RPC systems produced commercially, and the RPC designs that emerge from
standards committees, are a sadder case.  They seem to fall short of the
targets for which the researchers had been competing.  Uniformly,
commercial RPC systems are a lot slower than the results published in the
research literature: generally a factor of 5 or 10 slower.  That's a lot of
instructions!  Uniformly, commercial RPC systems are more complicated and
difficult to use than the research systems; or at least, than the
researchers say their systems are.  Commercial RPC systems mostly are
minimal: they often omit the more elaborate features espoused by their
research brethren. Standards designs seem to tend toward complexity and
elaborateness.

The overall effect has been that while many, probably the majority, of
researchers in distributed systems agree on the utility of the RPC paradigm
(they happily dispute about the boundaries of its usefulness, but few claim
that it is fundamentally misguided), the real-world commercial programmers
are left wondering what all the fuss was about.  A designer of a TP system
wouldn't dream of using a request-response protocol that consumes 15000
of his valuable instructions, far less 30000.


3. Analysis
   --------

It is time to consider the several aspects of this paradox.

  - Why do so many commercial RPC enterprises fall short of the standards
    that researchers seem to find so easy?

  - What are the really valuable paradigms that have emerged from our use
    of RPC?  If they're so valuable, why haven't they been embodied in more
    commercial designs?

  - What are the key techniques for achieving communication at high
    efficiency?  In a year when vendors are manufacturing 200-MIPS
    single-chip processors, is a 5000 instruction round-trip really
    needed?

  - Effective use of RPC seems to be inextricably linked with the
    availability of lightweight multi-threading within an address space. Is
    this really true?  Why are commercial systems so slow to offer this
    feature?  Will commercial threads be another feature where the reality
    falls short of the researchers' promises?
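
To put the instruction counts quoted in this paper into units of time, the
sketch below converts them at the 200 MIPS rate mentioned above. It is
back-of-envelope arithmetic under two loud assumptions: that the processor
sustains its rated MIPS, and that wire and device latency are ignored. The
function names are invented for the example.

```python
def round_trip_microseconds(instructions, mips):
    """Processor time for one round trip, in microseconds.

    One MIPS is one instruction per microsecond, so the cost is simply
    instructions / mips. Assumes the rated MIPS is actually sustained
    and ignores network latency entirely.
    """
    return instructions / mips

def round_trips_per_second(instructions, mips):
    """Upper bound on round trips per second if the processor did nothing else."""
    return (mips * 1e6) / instructions

# Instruction counts are the ones quoted in the text.
for count in (5000, 15000, 30000):
    print(count, round_trip_microseconds(count, 200), "microseconds")
```

On these assumptions a 5000-instruction round trip costs 25 microseconds of
processor time, and the 15000 and 30000 instruction figures cost 75 and 150
microseconds respectively.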

I'm not going to attempt to offer answers to these questions in this
position paper.  But now is a good time for a well-qualified group to
discuss them.  We need to learn the answers, consider whether we can
help the commercial world to do better in this area, and apply the lessons
to our next great research insight.