A.D. Birrell and B. J. Nelson
3.3 Complicated Calls
As mentioned above, the transmitter of a packet is responsible for retransmitting it until it is acknowledged. When it retransmits, it modifies the packet to request an explicit acknowledgment. This handles lost packets, long-duration calls, and long gaps between calls. When the caller is satisfied with its acknowledgments, the caller process waits for the result packet. While waiting, however, the caller periodically sends a probe packet to the callee, which the callee is expected to acknowledge. This allows the caller to notice if the callee has crashed or if there is some serious communication failure, and to notify the user of an exception. Provided these probes continue to be acknowledged, the caller will wait indefinitely, happy in the knowledge that the callee is (or claims to be) working on the call. In our implementation the first of these probes is issued after a delay of slightly more than the approximate round-trip time between the machines. The interval between probes increases gradually until, after about 10 minutes, the probes are being sent once every five minutes. Each probe is subject to retransmission strategies similar to those used for other packets of the call. So if there is a communication failure, the caller will be told about it fairly soon, relative to the total time the caller has been waiting for the result of the call. Note that this detects only failures in the communication levels: it will not detect whether the callee has deadlocked while working on the call. This is in keeping with our principle of making RPC semantics similar to local procedure call semantics. We have language facilities available for watching a process and aborting it if this seems appropriate; these facilities are just as suitable for a process waiting on a remote call.
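The probe schedule described above can be illustrated with a short Python sketch. The paper gives only its qualitative shape, so the constants here are assumptions: the first delay is taken as 1.2 times the round-trip time, each later delay grows by a hypothetical factor of 1.5, and delays are capped at five minutes. The names `probe_delays` and `probe_times` are illustrative, not taken from the original implementation, and these particular constants are not claimed to reproduce the "about 10 minutes" figure exactly.

```python
from typing import Iterator, List


def probe_delays(rtt: float, growth: float = 1.5,
                 cap: float = 300.0) -> Iterator[float]:
    """Yield the delay (in seconds) before each successive probe.

    Assumed schedule: the first delay is 1.2 * rtt ("slightly more than"
    the round-trip time), each later delay grows by the hypothetical
    factor `growth`, and no delay exceeds `cap` (five minutes).
    """
    delay = 1.2 * rtt
    while True:
        yield min(delay, cap)
        delay = min(delay * growth, cap)  # back off gradually, then hold


def probe_times(rtt: float, wait: float) -> List[float]:
    """Times, measured from when the caller starts waiting for the
    result, at which probes would be sent during `wait` seconds."""
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    times: List[float] = []
    now = 0.0
    delays = probe_delays(rtt)
    while True:
        now += next(delays)
        if now > wait:
            return times
        times.append(now)
```

For example, `probe_times(0.01, 3600.0)` lists probe times for an hour-long call between machines 10 ms apart: closely spaced at first, then settling at one every 300 seconds. Retransmission of an individual lost probe, which the text says follows the ordinary per-packet strategy, is omitted from this sketch.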
A possible alternative strategy for retransmissions and acknowledgments is to have the recipient of a packet spontaneously generate an acknowledgment if it does not generate the next packet significantly sooner than the expected retransmission interval. This would save the retransmission of a packet when dealing with long-duration calls or large gaps between calls. We decided that saving this packet was not a large enough gain to merit the extra cost of detecting that the spontaneous acknowledgment was needed. In our implementation this extra cost would take the form of maintaining an additional data structure to enable an extra process in the server to generate the spontaneous acknowledgment when appropriate, plus the computational cost of the extra process deciding when to generate the acknowledgment. In particular, it would be difficult to avoid incurring extra cost when the acknowledgment is not needed. There is no analogous extra cost to the caller, since the caller necessarily has a retransmission algorithm in case the call packet is lost.
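The packet saving at stake can be made concrete with a toy count. This Python sketch is an illustrative model, not part of the original design: it counts only the packets exchanged before the transmitter learns its packet arrived, ignores later probes and lost acknowledgments, and introduces a hypothetical `margin` parameter for how much sooner than the retransmission interval the recipient must reply to skip a spontaneous acknowledgment.

```python
def extra_packets(delay: float, interval: float,
                  margin: float, spontaneous: bool) -> int:
    """Count packets, beyond the normal call/result packets, exchanged
    before the transmitter knows its packet arrived (toy model).

    delay       -- time until the recipient would send its next packet,
                   which implicitly acknowledges the received one
    interval    -- the transmitter's retransmission interval
    margin      -- hypothetical: how much sooner than `interval` the
                   recipient must reply to skip a spontaneous ack
    spontaneous -- True models the rejected alternative strategy
    """
    if spontaneous:
        # Recipient sends an unsolicited ack at (interval - margin)
        # unless its next packet has already gone out by then.
        return 1 if delay > interval - margin else 0
    # Chosen strategy: the transmitter retransmits, requesting an
    # explicit acknowledgment, and the recipient acknowledges it.
    return 2 if delay > interval else 0
```

For a long call (`delay` well above `interval`) the chosen strategy costs two packets and the alternative one, which is the single saved retransmission the text mentions. When `delay` falls between `interval - margin` and `interval`, the alternative sends an acknowledgment the transmitter never needed; the model does not capture the bookkeeping cost of the extra server process that the text identifies as the main objection.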
If the arguments (or results) are too large to fit in a single packet, they are sent in multiple packets, with each but the last requesting explicit acknowledgment. Thus, when transmitting a large call argument, packets are sent alternately by the caller and callee, with the caller sending data packets and the callee responding with acknowledgments. This allows the implementation to use only one packet buffer at each end for the call, and avoids the necessity of including the buffering and flow control strategies found in normal bulk data transfer protocols. To permit duplicate elimination, each of these multiple data packets within a call has a call-relative sequence number. Figure 4 shows the packet sequences for complicated calls.
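The alternating, one-packet-at-a-time exchange can be sketched in Python as a stop-and-wait transfer with duplicate elimination by sequence number. Everything concrete here is an assumption for illustration: the `DataPacket` layout and its `please_ack` flag, the tiny 4-byte `max_payload`, and the random-loss link are hypothetical stand-ins rather than the paper's packet format. The final packet's acknowledgment, which in the protocol comes implicitly from the result packet, is simplified to "stop once it is delivered".

```python
import random
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class DataPacket:
    seq: int          # call-relative sequence number, starting at 1
    please_ack: bool  # True if an explicit acknowledgment is requested
    data: bytes


def fragment(args: bytes, max_payload: int) -> Iterator[DataPacket]:
    """Split marshalled arguments into packets; every packet except the
    last requests an explicit acknowledgment."""
    n = max(1, -(-len(args) // max_payload))  # ceiling division
    for seq in range(1, n + 1):
        start = (seq - 1) * max_payload
        yield DataPacket(seq, seq < n, args[start:start + max_payload])


class Callee:
    """Receiving end: reassembles the arguments and drops duplicates."""

    def __init__(self) -> None:
        self.next_seq = 1
        self.received = bytearray()

    def deliver(self, pkt: DataPacket) -> Optional[int]:
        """Accept a packet; return the sequence number acknowledged, if any."""
        if pkt.seq == self.next_seq:  # new packet: keep its data
            self.received += pkt.data
            self.next_seq += 1
        # A duplicate (seq < next_seq) is discarded but acknowledged again,
        # since the earlier acknowledgment may have been lost.
        if pkt.please_ack and pkt.seq < self.next_seq:
            return pkt.seq
        return None


def send_call(args: bytes, callee: Callee, loss: float,
              rng: random.Random, max_payload: int = 4) -> int:
    """Stop-and-wait transfer over a simulated lossy link; returns the
    number of data-packet transmissions, including retransmissions."""
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    sent = 0
    for pkt in fragment(args, max_payload):  # sender holds one packet at a time
        while True:
            sent += 1
            if rng.random() < loss:          # data packet lost: retransmit
                continue
            ack = callee.deliver(pkt)
            if not pkt.please_ack:           # last packet: the result would ack it
                break
            if ack == pkt.seq and rng.random() >= loss:  # ack survived the link
                break
    return sent
```

Running `send_call(b"abcdefghij", Callee(), 0.3, random.Random(1))` sends three data packets plus whatever retransmissions the simulated losses force; lost acknowledgments produce duplicates that the callee discards by sequence number, so the reassembled arguments come out intact. Because the sender never advances until the current packet is acknowledged, neither side needs more than one packet's worth of in-flight state, which is the property the text credits with avoiding bulk-transfer buffering and flow control.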