morse@quark.mpr.ca (Daryl Morse) (03/14/91)
Now that we have been thoroughly reminded of some of the differences between Amoeba and Mach, I have a question regarding the throughput of the Mach RPC. First a bit of background...

Peterson, et al, published a paper in the May 1990 issue of IEEE Computer entitled "The x-kernel: A platform for accessing Internet resources." In that paper, a number of RPC throughput figures were given for x-kernel, Mach, and several other OSes. (I don't have it handy right now to give a full list, however.)

Tanenbaum, et al, published a paper in the December 1990 issue of Communications of the ACM, entitled "Experiences with the Amoeba Distributed Operating System." In that paper, RPC throughput figures were given for Amoeba, Cedar, x-kernel, V, Topaz, Sprite, and Mach. (The figure for Mach was obtained from Peterson's paper.) The throughput for Mach is between 3 and 10 times slower than that of the other OSes.

My question is simple, though its answer likely is not: Why is the throughput of the Mach RPC so much slower than the other OSes? Are the respective RPCs different enough that throughput is a meaningless "apples and oranges" comparison? Has the Mach RPC simply not been optimized as heavily as that of the other OSes?

Thanks.

--
Daryl Morse                  | Voice : (604) 293-5476
MPR Teltech Ltd.             | Fax   : (604) 293-5787
8999 Nelson Way, Burnaby, BC | E-Mail: morse@quark.mpr.ca
Canada, V5A 4B5              | quark.mpr.ca!morse@uunet.uu.net
bill@cs.columbia.edu (Bill Schilit) (03/15/91)
Our measurements of Mach RPC show that TCP is the dominant cost when sending a synchronous RPC consisting of a small request and a 4K response. On an i386 machine with an AT bus running Mach 2.5, we tried to account for the measured round-trip time of 33 ms by computing the cost of Ethernet and bus transfer and separately measuring TCP loopback and Mach RPC without going over the network. The results are shown below.

    4K RPC
    ------
    Packets                  7        (observed)
    Bytes                 4738        (observed)

    AT bus (times 2)       9.5 ms     (computed)
    Ethernet               3.8 ms     (computed)
    TCP loopback          12   ms     (measured)
    Mach IPC (times 2)     4   ms     (measured)
    Total                 29.3 ms

    Network RPC           33   ms     (measured)

Breaking the time down into these four components gets pretty close to the observed network RPC time.

These tests are fully described in "Adaptive remote paging for mobile computing", TR CUCS-004-91, available by anonymous FTP from cs.columbia.edu::pub/reports.

- Bill
--
Bill Schilit
Columbia University Computer Science Department
bill@cs.columbia.edu
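The time budget in Bill Schilit's table above can be re-added with a few lines of Python. This is only a sketch of the arithmetic: the 10 Mbit/s Ethernet rate is an assumption (the post does not state it, though it is consistent with the 3.8 ms computed figure), framing overhead is ignored, and the names `ethernet_ms` and `budget` are made up for illustration.

```python
# Re-add the 4K RPC time budget from the table above.
# Assumption: classic 10 Mbit/s Ethernet; framing overhead is ignored.
ETHERNET_BITS_PER_SEC = 10_000_000

OBSERVED_BYTES = 4738            # bytes on the wire, from the post
MEASURED_NETWORK_RPC_MS = 33.0   # measured round trip, from the post

# Component costs in milliseconds, copied from the post.
COMPONENTS_MS = {
    "AT bus (times 2)": 9.5,
    "Ethernet": 3.8,
    "TCP loopback": 12.0,
    "Mach IPC (times 2)": 4.0,
}


def ethernet_ms(total_bytes, bits_per_sec=ETHERNET_BITS_PER_SEC):
    """Serialization time in ms for total_bytes at the given bit rate."""
    return total_bytes * 8 / bits_per_sec * 1000.0


def budget(components_ms, measured_ms):
    """Return (sum of components, unaccounted time, share of each component)."""
    total = sum(components_ms.values())
    shares = {name: ms / total for name, ms in components_ms.items()}
    return total, measured_ms - total, shares


if __name__ == "__main__":
    total, unaccounted, shares = budget(COMPONENTS_MS, MEASURED_NETWORK_RPC_MS)
    print(f"Ethernet (recomputed): {ethernet_ms(OBSERVED_BYTES):.2f} ms")
    print(f"Sum of components:     {total:.1f} ms")
    print(f"Unaccounted:           {unaccounted:.1f} ms")
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"  {name:<20} {share:6.1%}")
```

Under these assumptions TCP loopback is the largest single share of the accounted time, which matches the claim that TCP dominates.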
fkittred@bbn.com (Fletcher Kittredge) (03/15/91)
In article <MORSE.91Mar14091659@quark.mpr.ca> morse@quark.mpr.ca (Daryl Morse) writes:
>
>Now that we have been thoroughly reminded of some of the differences
>between Amoeba and Mach, I have a question regarding the throughput of
>the Mach RPC. First a bit of background...
...
>Why is the
>throughput of the Mach RPC so much slower than the other OSes? Are the
>respective RPCs different enough that throughput is a meaningless
>"apples and oranges" comparison? Has the Mach RPC simply not been
>optimized as heavily as that of the other OSes?
>

The answer is that this is an example of comparing an old version of a piece of software with a new version of a competing package. Given the dates cited in your article, the Mach RPC tested would be version 2.5 or lower. For Mach 3.0, the RPC system was completely rewritten by Richard Draves. One of the results was a real increase in speed. You will want to read the paper "A Revised IPC Interface", Richard Draves, Proceedings of the October 1990 "Machnix" Conference, Burlington, Vt.

An example of the increase in performance is that the null RPC now takes 125 microseconds instead of 210 microseconds.

>Thanks.

You're welcome ;-),
fletcher

>
>--
>Daryl Morse                  | Voice : (604) 293-5476
>MPR Teltech Ltd.             | Fax   : (604) 293-5787
>8999 Nelson Way, Burnaby, BC | E-Mail: morse@quark.mpr.ca
>Canada, V5A 4B5              | quark.mpr.ca!morse@uunet.uu.net

Fletcher Kittredge
Platforms and Tools Group, BBN Software Products
10 Fawcett Street, Cambridge, MA. 02138
617-873-3465 / fkittred@bbn.com / fkittred@das.harvard.edu
Richard.Draves@cs.cmu.edu (03/16/91)
> Excerpts from netnews.comp.os.mach: 15-Mar-91 Re: Mach RPC Throughput...
> Fletcher Kittredge@bbn.c (1629)

> The answer is that this is an example of comparing an old version of
> a piece of software with a new version of a competing package. Given
> the dates cited in your article, the Mach RPC tested would be version
> 2.5 or lower. For Mach 3.0, the RPC system was completely re-written
> by Richard Draves. One of the results was a real increase in speed.
> You will be wanting to read the paper "A Revised IPC Interface", Richard
> Draves, Proceedings of the October 1990 "Machnix" Conference, Burlington
> Vt.

I rewrote the kernel IPC code. I believe the Amoeba/Mach comparison was looking at network RPC throughput. We are looking at ways to improve the performance of network RPCs for Mach 3.0.

Rich
schmidt@crimee.ics.uci.edu (Doug Schmidt) (03/16/91)
In article <63274@bbn.BBN.COM> fkittred@spca.bbn.com (Fletcher Kittredge) writes:
++ You will be wanting to read the paper "A Revised IPC Interface", Richard
++ Draves, Proceedings of the October 1990 "Machnix" Conference, Burlington
++ Vt.
Speaking of documentation... Can someone please inform me where to ftp
the latest postscript sources of the CMU Mach documentation, e.g., the
kernel interface manual, the cthreads manual, etc. I've looked on
cs.cmu.edu, but only the Mach 3.0 micro-kernel sources seem to be
there. Is the documentation located someplace else?
Thanks,
Doug
--
His life was gentle, and the elements so | Douglas C. Schmidt
Mixed in him that nature might stand up | (schmidt@ics.uci.edu)
And say to all the world: "This was a man." | (714) 856-4101
-- In loving memory of Terry Williams (1971-1991)|
ast@cs.vu.nl (Andy Tanenbaum) (03/16/91)
In article <MORSE.91Mar14091659@quark.mpr.ca> morse@quark.mpr.ca (Daryl Morse) writes:
>My question is simple, though its answer likely is not: Why is the
>throughput of the Mach RPC so much slower than the other OSes? Are the
>respective RPCs different enough that throughput is a meaningless
>"apples and oranges" comparison? Has the Mach RPC simply not been
>optimized as heavily as that of the other OSes?

I am sure Rick can give the definitive answer for Mach, but I can speak for Amoeba, and I think by implication for some of the others. An RPC in Amoeba follows the following path:

1. User process issues an RPC system call and traps to the kernel
2. Kernel mucks about with headers and sends a packet to the dest CPU
3. Dest kernel unmucks the headers and passes the packet to the user
4. User inspects the packet and traps to the kernel to send reply
5. More muck, another packet sent
6. Src machine gets the reply packet and passes it back to the user

These 6 steps take 1.1 msec on a Sun 3/60. The protocol used on the Ethernet is a straightforward protocol we have designed. It is not IP. No external servers of any kind are involved. I believe that Mach involves external servers in the process, which of course is fatal for the performance. Thus we are comparing apples to apples. From the user's viewpoint, what is being measured is the time to send a message from itself as client to a remote server over the Ethernet, and get the reply back.

Nevertheless, Amoeba processes can speak TCP/IP when desired. There is an external TCP/IP server available. To speak TCP/IP, a client does an RPC with the TCP/IP server and effectively says: "Please send this data as an IP packet to such and such an IP address." This gives full connectivity with the Internet, but also means that the normal (local) case goes much faster.

The loss of performance when going through the TCP/IP server is not so important because usually the TCP connections go over a narrow-band wide-area link anyway, so there is no way to get high performance no matter what. In essence, we have chosen to optimize the local case, and accepted worse performance when one specifically wishes to speak TCP/IP. I believe that Mach has chosen to do things differently.

Andy Tanenbaum (ast@cs.vu.nl)
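The quantity being compared in this thread, the user-to-user round trip of a null RPC, can be illustrated with a small timing harness. The sketch below is an assumption-laden stand-in: it uses UDP over the loopback interface through the host's ordinary IP stack, so it is neither Amoeba's protocol nor Mach's, and the function names `serve` and `null_rpc_times` are hypothetical. It only shows what is timed: send a header-only request, wait for the reply, record the elapsed time.

```python
import socket
import statistics
import struct
import threading
import time


def serve(sock, stop):
    """Echo each request header back to its sender until stop is set."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            request, client = sock.recvfrom(64)
        except socket.timeout:
            continue
        sock.sendto(request, client)  # the reply carries only the header


def null_rpc_times(n=200):
    """Return n round-trip times (seconds) for header-only request/reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    stop = threading.Event()
    worker = threading.Thread(target=serve, args=(server, stop), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    samples = []
    try:
        for seq in range(n):
            header = struct.pack("!I", seq)  # 4-byte sequence number
            start = time.perf_counter()
            client.sendto(header, server.getsockname())
            reply, _ = client.recvfrom(64)
            samples.append(time.perf_counter() - start)
            if reply != header:
                raise RuntimeError("reply does not match request")
    finally:
        stop.set()
        worker.join()
        client.close()
        server.close()
    return samples


if __name__ == "__main__":
    times = null_rpc_times()
    print(f"min    {min(times) * 1e6:8.1f} us")
    print(f"median {statistics.median(times) * 1e6:8.1f} us")
```

Numbers from such a harness depend heavily on the machine and on the protocol stack in the path, which is the point several posters make about comparing figures measured over different transports.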
ast@cs.vu.nl (Andy Tanenbaum) (03/17/91)
In article <4bsEGja00hsQI6K19M@cs.cmu.edu> Richard.Draves@cs.cmu.edu writes:
>I rewrote the kernel IPC code. I believe the Amoeba/Mach comparison was
>looking at network RPC throughput. We are looking at ways to improve
>the performance of network RPCs for Mach 3.0.

What are the current figures for the 3.0 microkernel for sending a null message from user space on one machine over the Ethernet to another user process and then back, i.e. the null RPC time? Also, what is the maximum user-to-user bandwidth in 3.0? If possible, what are they on Sun 3/60s, to compare them with the numbers I published in the Dec. 1990 CACM?

Andy Tanenbaum (ast@cs.vu.nl)
bob@MorningStar.Com (Bob Sutterfield) (03/19/91)
In article <9332@star.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
   The protocol used [by Amoeba] on the Ethernet ... is not IP ...
   The loss of performance when going through the TCP/IP server is not
   so important because usually the TCP connections go over a
   narrow-band wide-area link anyway, so there is no way to get
   high-performance no matter what. In essence, we have chosen to
   optimize the local case, and accepted worse performance when one
   specifically wishes to speak TCP/IP.

What about when the non-"local" case involves RPC with a machine on a
different Ethernet in the next room, accessible via a high-bandwidth
IP router? IP is useful in environments other than wide area
networks. Local distributed computing environments might involve
multiple networks connected by routers, rather than bridges or
repeaters, and it seems that you're designing in a penalty for Amoeba
users whose clusters grow too big for one Ethernet.
morse@quark.mpr.ca (Daryl Morse) (03/19/91)
In article <9332@star.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
> In article <MORSE.91Mar14091659@quark.mpr.ca> morse@quark.mpr.ca (Daryl Morse) writes:
> >My question is simple, though its answer likely is not: Why is the
> >throughput of the Mach RPC so much slower than the other OSes? Are the
> >respective RPCs different enough that throughput is a meaningless
> >"apples and oranges" comparison? Has the Mach RPC simply not been
> >optimized as heavily as that of the other OSes?

> I am sure Rick can give the definitive answer for Mach, but I can speak for
> Amoeba, and I think by implication for some of the others. An RPC in
> Amoeba follows the following path:

> 1. User process issues an RPC system call and traps to the kernel
> 2. Kernel mucks about with headers and sends a packet to the dest CPU
> 3. Dest kernel unmucks the headers and passes the packet to the user
> 4. User inspects the packet and traps to the kernel to send reply
> 5. More muck, another packet sent
> 6. Src machine gets the reply packet and passes it back to the user

> These 6 steps take 1.1 msec on a Sun 3/60. The protocol used on the
> Ethernet is a straightforward protocol we have designed. It is not IP.

A number of people both posted, and replied directly, that the Mach RPC runs over UDP/IP. In that light, the comparison was most certainly one of "apples to oranges", but not for the reason given below. Unless I am mistaken, that point was not very clearly indicated in either of the comparisons. It would have been helpful if it had been.

> No external servers of any kind are involved. I believe that Mach
> involves external servers in the process, which of course is fatal for
> the performance. Thus we are comparing apples to apples. From the user's

According to one posted reply <bill@cs.columbia.edu (Bill Schilit)>, the difference is a result of the different transport, rather than the fact that an external server is utilized (i.e., the transport, not the architecture of Mach).

> viewpoint, what is being measured is the time to send a message from itself
> as client to a remote server over the Ethernet, and get the reply back.

> Nevertheless, Amoeba processes can speak TCP/IP when desired. There is an
> external TCP/IP server available. To speak TCP/IP, a client does an RPC
> with the TCP/IP server and effectively says: "Please send this data as an
> IP packet to such and such an IP address." This gives full connectivity
> with the Internet, but also means that the normal (local) case goes much
> faster. The loss of performance when going through the TCP/IP server is
> not so important because usually the TCP connections go over a narrow-band
> wide-area link anyway, so there is no way to get high-performance no matter
> what. In essence, we have chosen to optimize the local case, and accepted
> worse performance when one specifically wishes to speak TCP/IP. I believe
> that Mach has chosen to do things differently.

Perhaps you can post some results of Amoeba RPC throughput over TCP/IP, so we can see an "apples to apples" comparison? That would likely be a more "fair" comparison.

> Andy Tanenbaum (ast@cs.vu.nl)

I would also like to see an "oranges to oranges" comparison, namely one where the Mach RPC runs over a less-expensive transport, as in the "normal (local) case" for Amoeba. One respondent, who asked to be identified only as a "highly placed source", hinted that such a comparison might soon be possible:

>However, you might be interested to hear that X-kernel is being
>integrated into Mach as a basic, core system component. The new "Mach
>network message server" will actually be the X-kernel, ported to Mach.
>So, once this is running, I would assume that Mach will run at
>X-kernel speeds.

You are correct. I am interested. Perhaps someone is willing to offer some tangible comments on that?

--
Daryl Morse                  | Voice : (604) 293-5476
MPR Teltech Ltd.             | Fax   : (604) 293-5787
8999 Nelson Way, Burnaby, BC | E-Mail: morse@quark.mpr.ca
Canada, V5A 4B5              | quark.mpr.ca!morse@uunet.uu.net
gdtltr@brahms.udel.edu (root@research.bdi.com (Systems Research Supervisor)) (03/19/91)
In article <BOB.91Mar18130146@volitans.MorningStar.Com> bob@MorningStar.Com (Bob Sutterfield) writes:
=>In article <9332@star.cs.vu.nl> ast@cs.vu.nl (Andy Tanenbaum) writes:
=>   The protocol used [by Amoeba] on the Ethernet ... is not IP ...
=>   The loss of performance when going through the TCP/IP server is not
=>   so important because usually the TCP connections go over a
=>   narrow-band wide-area link anyway, so there is no way to get
=>   high-performance no matter what. In essence, we have chosen to
=>   optimize the local case, and accepted worse performance when one
=>   specifically wishes to speak TCP/IP.
=>
=>What about when the non-"local" case involves RPC with a machine on a
=>different Ethernet in the next room, accessible via a high-bandwidth
=>IP router? IP is useful in environments other than wide area
=>networks. Local distributed computing environments might involve
=>multiple networks connected by routers, rather than bridges or
=>repeaters, and it seems that you're designing in a penalty for Amoeba
=>users whose clusters grow too big for one Ethernet.

I don't want to speak for Dr. Tanenbaum, but I like to pop in every once in a while and pretend that I know what I am talking about. :-)

I believe that Amoeba 4.0 deals with the multiple LAN case with a Fast Local Internet Protocol (FLIP). This adds a hint to a capability to determine which subnet the service is on. This still wouldn't handle IP-specific hardware, but it would handle more general high-bandwidth gateways. Not all of us want to be slaves to existing protocols, especially if there is greater performance to be gained through newer ones.

Gary Duzan
Time Lord
Third Regeneration
--
gdtltr@brahms.udel.edu
   _o_      ----------------------      _o_
 [|o o|]  Two CPU's are better than one; N CPU's would be real nice.  [|o o|]
  |_o_|     Disclaimer: I AM Brain Dead Innovations, Inc.     |_o_|
schmidt@crimee.ics.uci.edu (Doug Schmidt) (03/20/91)
In article <MORSE.91Mar18110412@quark.mpr.ca> morse@quark.mpr.ca (Daryl Morse) writes:
++ "normal (local) case" for Amoeba. One respondent, who asked to be
++ identified only as a "highly placed source" hinted that such a
++ comparison might soon be possible:
++
++ >However, you might be interested to hear that X-kernel is being
++ >integrated into Mach as a basic, core system component. The new "Mach
++ >network message server" will actually be the X-kernel, ported to Mach.
++ >So, once this is running, I would assume that Mach will run at
++ >X-kernel speeds.
Hum, did your highly placed source indicate whether the x-kernel
support would run in the kernel or in user-space as another server?
It seems that would make a big difference in terms of any significant
speed-up, since a major win of the x-kernel is that it avoids context
switches when moving from user-space to the kernel and vice versa.
Does anyone have any further info about this?
Doug
--
His life was gentle, and the elements so | Douglas C. Schmidt
Mixed in him that nature might stand up | (schmidt@ics.uci.edu)
And say to all the world: "This was a man." | (714) 856-4101
-- In loving memory of Terry Williams (1971-1991)|
dpj@CS.CMU.EDU (Daniel Julin) (03/20/91)
As indicated in several earlier posts, the major cost of the current implementation of Mach IPC (RPC) appears to be the low-level transport protocol used. The "standard" Mach netmsgserver uses TCP/IP over UNIX sockets. TCP was selected because it provided the best combination of performance and robustness on large, complicated networks.

Note that this issue of robustness is particularly important at CMU. There are over 1700 machines on the CMU network, with almost 800 in the School of Computer Science alone, of which more than 500 are running Mach. On an average day, few of these machines might be actively engaged in doing Mach network RPC's, but the chances that two communicating machines are on the same physical cable are quite low. In addition, we have to contend with a lot of background traffic, and non-negligible packet losses.

We have been thinking about using different transport protocols in different situations for quite a while, but never got around to it. For the future, we are investigating the use of the x-kernel "virtual protocols" mechanism to fulfill this function.

Our plans for the x-kernel are not finalized, but current prototypes assume a system running at user-level in a Mach task. However, we are also working on user-level device drivers, which should take care of the argument against paying a protection boundary crossing for network access.

Finally, since someone asked, there is no "serious" netmsgserver distributed with Mach 3.0. I have put together an adaptation of the netmsgserver from the 2.5 system to use TCP sockets emulated in the UNIX server, but this is clearly not an interesting long-term solution. Again, in the longer term, we hope to use x-kernel technology instead.

======================================================================
Daniel Julin                                          dpj@cs.cmu.edu
School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213
======================================================================
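The idea of "using different transport protocols in different situations" can be sketched as a per-destination choice: a cheap protocol when the peer is on the same subnet, TCP otherwise. The example below is a hypothetical illustration only. It is not the x-kernel "virtual protocols" interface or anything from the netmsgserver; the function name, the transport labels, and the documentation-range addresses are all made up, and "same subnet" is merely a rough proxy for "same physical cable".

```python
import ipaddress

# Hypothetical transport labels; not names used by Mach or the x-kernel.
LIGHTWEIGHT_LAN = "lightweight-lan"
ROBUST_TCP = "tcp"


def choose_transport(local_interface, peer_address):
    """Pick a transport for peer_address.

    local_interface is the local address with prefix length, for example
    "192.0.2.10/24". Peers inside that subnet get the lightweight
    transport; everything else falls back to TCP for robustness.
    """
    interface = ipaddress.ip_interface(local_interface)
    peer = ipaddress.ip_address(peer_address)
    if peer in interface.network:
        return LIGHTWEIGHT_LAN
    return ROBUST_TCP


if __name__ == "__main__":
    # Addresses are from the documentation ranges, chosen for illustration.
    print(choose_transport("192.0.2.10/24", "192.0.2.77"))
    print(choose_transport("192.0.2.10/24", "198.51.100.5"))
```

A real selector would also have to account for loss rates and background traffic, which is the robustness concern raised above for large networks.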