af@spice.cs.cmu.edu (Alessandro Forin) (12/21/89)
In article <14246@jumbo.dec.com>, discolo@jumbo.DEC.COM (Anthony Discolo) writes:
> Does anyone have benchmarks that compare Mach to BSD and/or Ultrix
> in the areas of RPC/networking/scheduling?
>
> Any pointers would be greatly appreciated.
>
> Anthony

Seems to me a large part of the question wants to compare apples and oranges.

1- Do you really believe BSD and derivatives have an RPC mechanism?
   Are you thinking of Sun's/Apollo's/... RPCs, or what?
   How do you believe they can be correctly compared to Mach's IPC?

2- How do you "measure" a scheduler?  On a multiprocessor?  Under which
   load?  Do you believe the functionalities of the Mach scheduler
   (e.g. user processor allocation, handoff scheduling, fixed/timeshare
   priorities) are in any way comparable to U*x?

If you have a more precise definition of the comparisons you want to make,
I'd be glad to provide you with an answer.

Assuming for now you just want *some* comparison of Ultrix & Mach 2.5 on a
pmax, here is some data collected some time ago on my pmax, comparing Mach
and Ultrix under the same conditions (machine, disk, environment,
programs & input, buffer cache, network, time of day, etc etc etc).
[Since we have made progress since, I sometimes added the numbers I get
_right_now_ on the same machine, but clearly under different & less
controlled conditions.]

As far as networking goes, when I tested our ethernet device driver I was
satisfied by measuring a binary ftp of a large file (the Ultrix image :-)
into /dev/null using Ultrix's ftp, multiuser, on the same thinwiring.
Between two Ultrix pmaxen I got no better than about 190 kb/sec; between
two Mach pmaxen I got up to 300 kb/sec, as measured by ftp itself.

We do have internal test programs for Mach IPC.
Here are the results I get right_now on my pmax (multiuser, X11 running,
my emacs & news programs in the background, some 20 assorted system
servers for the fancy CMU environment).

binding to host testarossa
Test syscall:        Iters:10000 E:228  U:31  S:197
Test localLoop:      Iters:10000 E:0    U:0   S:0
Test localNull:      Iters:10000 E:0    U:0   S:0
Test localAdd:       Iters:10000 E:16   U:15  S:0
Test localBigIn:     Iters:10000 E:203  U:203 S:0
Test localBigOut:    Iters:10000 E:203  U:203 S:0
Test localBigInOut:  Iters:10000 E:390  U:390 S:0
Test remoteNull:     Iters:10000 E:1985 U:15  S:1125
Test remoteAdd:      Iters:10000 E:2062 U:141 S:985
Test remoteBigIn:    Iters:10000 E:2797 U:172 S:1234
Test remoteBigOut:   Iters:10000 E:2016 U:141 S:1000
Test remoteBigInOut: Iters:10000 E:2765 U:390 S:1266

The localXXX entries are for local procedure calls, remoteXXX are for the
same operation invoked on a server process.  Add just adds two numbers,
BigIn passes (by value) as IN parameter a 200-byte string, BigOut returns
it (by value), BigInOut does both.  All tests ran 10000 times;
Elapsed/User/System times are as given, e.g. a null RPC takes about 200
microseconds, elapsed.

My standard use of the machine is for compilation, so here are two
compilation benchmarks: a small one and a large one.

Compilation benchmark, small programs.

                Ultrix          Mach
Elapsed         17.9            15.4
Breakdown:      1.5 real        1.2 real
                2.4 real        2.1 real
                1.5 real        1.1 real
                1.7 real        1.5 real
                2.0 real        1.7 real
                2.0 real        1.7 real
                1.6 real        1.3 real
                2.7 real        2.4 real
                1.6 real        1.4 real

Mach kernel compilation benchmark.

                Ultrix          Mach            Mach-nbc
Elapsed         2325.0          2139.0          2180.0
User            1404.0          1471.6          1388.8
System          413.0           323.5           351.1
Utilization     78%             83%             77%
I/O             7120+17397      3278+17756      291+6271

[The Mach-nbc entry is for a no-buffer-cache kernel.]

Some specific U*x tests are performed by a suite of little programs we
wrote ourselves.

                Ultrix          Mach            right_now
Elapsed         15.1 secs       15.6 secs
exec            16 ms           12 ms
touch           0.566 ms        0.684 ms        [0.440]
fork            4.960 ms        4.218 ms        [3.125]
getpid          2.4 secs        2.8 secs        [1.9]
iocall          8.1 secs        8.2 secs
puzzle          0.6 secs        0.6 secs
pgtest          1.0 secs        1.3 secs

File system performance is compared by Satya's benchmark (see SOSP87).

Satya's Filesystem benchmark results (1) On local disk

                                        Ultrix          Mach
Total elapsed                           121 secs        108 secs
Phase I:   Creating directories         2 secs          4 secs
Phase II:  Copying files                9 secs          9 secs
Phase III: Recursive directory stats    9 secs          7 secs
Phase IV:  Scanning each file           18 secs         9 secs
Phase V:   Compilation                  83 secs         79 secs

Hope this helps,
sandro-

Disclaimer: The Ultrix group has different ideas on what should be tested
and by which benchmarks.  Their opinions might be very different from
ours.  The opinions of our users seem to agree with our own opinions.
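[A note on the little test programs above: the `getpid' entry is
presumably just a tight system-call loop.  Here is a minimal sketch of
such a test -- a guess at the shape of the CMU program, not its source;
the iteration count is arbitrary.]

/*
 * Sketch of a getpid()-style system-call overhead test.  This is only a
 * guess at the shape of the test, not the CMU program itself.  (On some
 * modern systems the C library may cache getpid(), which would defeat it.)
 */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int
main(void)
{
        struct timeval t0, t1;
        long i, n = 100000;
        double elapsed;

        gettimeofday(&t0, (struct timezone *)0);
        for (i = 0; i < n; i++)
                (void) getpid();        /* about the cheapest system call around */
        gettimeofday(&t1, (struct timezone *)0);

        elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("getpid: %ld calls in %.2f secs, %.1f usecs/call\n",
               n, elapsed, elapsed * 1e6 / n);
        return 0;
}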
grunwald@foobar.colorado.edu (Dirk Grunwald) (12/21/89)
Does the version of MACH that's working at CMU use the MIPS ECOFF loader
format?  If not, does it use the BSD loader formats?  Or a standard Mach
format?  That, in and of itself, would be useful - I'd like to get
debugging information from Gcc/G++ to work.  If it's not using their
loader format, whence come the compilers and assemblers and loaders?

Dirk Grunwald -- Univ. of Colorado at Boulder
        (grunwald@foobar.colorado.edu)
        (grunwald@boulder.colorado.edu)
jg@max.crl.dec.com (Jim Gettys) (12/21/89)
I don't understand why your off-machine RPC performance is so poor; you do
decently with on-machine RPC (relative to other RPC implementations like
that at DECSRC), but your off-machine (over the net) case is only
comparable to what a MicroVAX running Topaz can do, or just running over
TCP (X round-trip times on Ultrix over the net using TCP are less than a
factor of two worse (around 3.6 ms)).  Where is the bottleneck?

- Jim Gettys
  Digital Equipment Corporation
  Cambridge Research Laboratory
Richard.Draves@CS.CMU.EDU (12/22/89)
Off-machine RPCs are relatively slow because they aren't handled directly
by the kernel.  A user-level process, called the netmsgserver, handles
network IPC.  When a task on machine A sends a message to a task on
machine B, the message really goes to the netmsgserver on A, which sends
it to the netmsgserver on B, which sends it to the final destination.
(These intermediaries are transparent to the user tasks on A & B.
Messages are sent to capabilities which don't have any location
information in their names.)

In sum, not an architecture motivated by performance.

Rich
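[To make the interposition Rich describes concrete, here is a rough
sketch of the kind of user-level relay loop involved.  This is NOT the
netmsgserver, and it uses plain BSD sockets rather than Mach IPC; it just
shows that every message passes through the relay's own buffer, which is
where the extra copies and context switches come from.]

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAXMSG 8192

/* Forward whatever arrives on `from' (think: local client endpoint)
 * to `to' (think: TCP connection to the peer machine's relay). */
void
relay(int from, int to)
{
        char buf[MAXMSG];
        int n;

        while ((n = read(from, buf, sizeof buf)) > 0)   /* copy: kernel -> relay */
                if (write(to, buf, n) != n) {           /* copy: relay -> kernel */
                        perror("relay: write");
                        return;
                }
}

int
main(void)
{
        int sv[2];
        static const char msg[] = "null RPC request\n";

        /* Stand-ins: a socketpair plays "client -> relay", and
         * stdout plays "relay -> wire". */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
                perror("socketpair");
                return 1;
        }
        if (fork() == 0) {                      /* the "client" task */
                write(sv[0], msg, sizeof msg - 1);
                _exit(0);
        }
        close(sv[0]);
        relay(sv[1], 1);                        /* the interposed hop */
        return 0;
}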
discolo@jumbo.pa.dec.com (Anthony Discolo) (12/22/89)
In article <7372@pt.cs.cmu.edu>, af@spice.cs.cmu.edu (Alessandro Forin) writes:
> We do have internal test programs for Mach IPC.
> Here are the results I get right_now on my pmax (multiuser, X11
> running, my emacs & news programs in the background, some 20 assorted
> systems servers for the fancy CMU environment).
>
> binding to host testarossa
> Test syscall:        Iters:10000 E:228  U:31  S:197
> Test localLoop:      Iters:10000 E:0    U:0   S:0
> Test localNull:      Iters:10000 E:0    U:0   S:0
> Test localAdd:       Iters:10000 E:16   U:15  S:0
> Test localBigIn:     Iters:10000 E:203  U:203 S:0
> Test localBigOut:    Iters:10000 E:203  U:203 S:0
> Test localBigInOut:  Iters:10000 E:390  U:390 S:0
> Test remoteNull:     Iters:10000 E:1985 U:15  S:1125
> Test remoteAdd:      Iters:10000 E:2062 U:141 S:985
> Test remoteBigIn:    Iters:10000 E:2797 U:172 S:1234
> Test remoteBigOut:   Iters:10000 E:2016 U:141 S:1000
> Test remoteBigInOut: Iters:10000 E:2765 U:390 S:1266
>
> The localXXX entries are for local procedure calls, remoteXXX
> are for the same operation invoked on a server process.
> Add just adds two numbers, BigIn passes (by value) as IN parameter a
> 200-byte string, BigOut returns it (by value), BigInOut does both.
> All tests ran 10000 times, Elapsed/User/System times are as given,
> e.g. a null RPC takes 200 microseconds, elapsed.

Thanks.  Just a couple of questions...  Do the localXXX entries refer to
inter-address-space/intra-machine procedure calls?  Do the remoteXXX
entries refer to inter-machine procedure calls?  Do the BigInOut
procedures touch their arguments?

Anthony
-----
Anthony Discolo
DEC Systems Research Center
130 Lytton Ave.
Palo Alto, CA 94301
ARPA: discolo@src.DEC.COM
af@spice.cs.cmu.edu (Alessandro Forin) (12/22/89)
In article <1482@crltrx.crl.dec.com>, jg@max.crl.dec.com (Jim Gettys) writes:
> I don't understand why your off-machine RPC performance is so poor;
...
> your off machine (over the net) case is only comparable to what a
> MicroVAX running Topaz can do, or just running over TCP (X round trip
> times on Ultrix over the net using TCP are less than a factor of two
> worse (around 3.6 ms)).  Where is the bottleneck?
> - Jim Gettys
> Digital Equipment Corporation
> Cambridge Research Laboratory

I believe there is a deep misunderstanding here, and re-reading my post I
realize that I am largely responsible for it.

[On the other hand, where did you hear of a system that does an RPC over
the ether in 200 usecs?  From the SOSP proceedings I see that the
official score seems to be:

        Cedar:  1.1 MILLIsecs/call      Dorado
        Amoeba: 1.4                     Tadpole (68020)
        V:      2.5                     Sun 3/75
        Topaz:  2.7                     Firefly (5-way multi)
        Sprite: 2.8                     Sun 3/75
        Topaz:  4.8                     Firefly (mono)
]

The RPC numbers I gave are LOCAL: the output from the program is
definitely misleading in its use of the terms "local" and "remote".
What the author meant is "local" for a normal procedure call (in the same
address space), "remote" for a different thread but on the same machine.
[The results should be the same for threads in the same or in separate
address spaces; the test is across separate address spaces.]

The table is also misleading in that times are not normalized.
Here are the normalized numbers, all times in MICROSECONDS per call.

Test syscall:        E:22  U:3  S:19
Test localLoop:      E:0   U:0  S:0
Test localNull:      E:0   U:0  S:0
Test localAdd:       E:1   U:1  S:0
Test localBigIn:     E:20  U:20 S:0
Test localBigOut:    E:20  U:20 S:0
Test localBigInOut:  E:39  U:39 S:0
Test remoteNull:     E:198 U:1  S:112
Test remoteAdd:      E:206 U:14 S:98
Test remoteBigIn:    E:279 U:17 S:123
Test remoteBigOut:   E:201 U:14 S:100
Test remoteBigInOut: E:276 U:39 S:126

What the test wants to show is the approximate ratio between local (LPC?)
and remote procedure call (RPC!) on the given machine, which in this case
turns out to be slightly better than a factor of 10 slower, which is not
too bad given the amount of optimization the MIPS guys put into their
compilers.
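[For the curious: the normalization above is nothing more than total
time divided by iterations.  A harness of roughly this shape produces
such numbers; this is a sketch, not the actual CMU test program, and
do_call() is just a stand-in for whichever stub is under test.]

/*
 * Sketch of an RPC timing harness in the style of the numbers above.
 * do_call() is a placeholder for the stub under test (localAdd,
 * remoteNull, ...); it is not a real interface.
 */
#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

#define ITERS 10000

extern void do_call(void);      /* hypothetical stub under test */

void
time_test(char *name)
{
        struct timeval t0, t1;
        struct rusage r0, r1;
        long i, e, u, s;

        getrusage(RUSAGE_SELF, &r0);
        gettimeofday(&t0, (struct timezone *)0);
        for (i = 0; i < ITERS; i++)
                do_call();
        gettimeofday(&t1, (struct timezone *)0);
        getrusage(RUSAGE_SELF, &r1);

        /* microseconds per call: elapsed, user, system */
        e = ((t1.tv_sec - t0.tv_sec) * 1000000L
             + t1.tv_usec - t0.tv_usec) / ITERS;
        u = ((r1.ru_utime.tv_sec - r0.ru_utime.tv_sec) * 1000000L
             + r1.ru_utime.tv_usec - r0.ru_utime.tv_usec) / ITERS;
        s = ((r1.ru_stime.tv_sec - r0.ru_stime.tv_sec) * 1000000L
             + r1.ru_stime.tv_usec - r0.ru_stime.tv_usec) / ITERS;
        printf("Test %s: E:%ld U:%ld S:%ld (usecs/call)\n", name, e, u, s);
}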
It is well known that Mach network IPC is not very fast; we have always
stressed functionality over performance, e.g. the ability to transparently
replace transport protocols from TCP to UDP to VMTP to ... whatever, with
only trivial changes in a single user-level process.  Maybe some day we
will put some work into getting it fast, but right now we are not in the
race.  Besides, Mach IPC is a no-cheat IPC system, with all the security
measures necessary for a true multiuser/time-shared machine.  For
instance, I would expect the guys at Trusted Information Systems to see
about the same performance on their Secure Mach.  Comparisons with
insecure systems like Topaz are therefore inappropriate, with all due
respect to all the painful work they did to get their very good numbers.
On a multiprocessor.  [Same goes for Amoeba or the V kernel.]  We also do
well on transferring large volumes of data; see for instance what the
NeXT box does with bitmaps.

Anyways, here are the times I get between two similar pmaxen, on the same
cable, right_now, multiuser etc etc.  These times ARE for network IPC and
ARE normalized as above in usecs per call.  And Rick WILL kill me for
handing them out.  [Take them as lower bounds, on a young machine.]

binding to host rvb
Test remoteNull:     E:6563 U:15 S:203
Test remoteAdd:      E:6421 U:0  S:265
Test remoteBigIn:    E:7172 U:47 S:141
Test remoteBigOut:   E:6563 U:15 S:94
Test remoteBigInOut: E:7140 U:94 S:266

Why is it so slow?  Because the path is

Machine-A: client -> kernel -> network_server -> kernel -> ether
Machine-B: ether -> kernel -> network_server -> kernel -> server

that is, the network_servers "interpose" between kernels.  On other
systems the path is typically something like

Machine-A: client -> kernel -> ether
Machine-B: ether -> kernel -> server

which saves 4 `copy' operations.  Topaz then cheats by only `copying'
once, from the client's stack into a preallocated packet (and then back)
in user-visible memory.  The packet is then handed over to the device
driver by reference.  If I had to guess, I'd say this way they could do
something like 500 MICROsecs/call on a multiprocessor pmax.  And if they
double-cheated by using the Washington version they'd probably do even
better.  The term `copy' above includes context-switching overhead, if
applicable.

As far as X is concerned, I take the word of the people at MIT that "It
runs visibly faster under Mach than under Ultrix".  I am no X guru; for
me the machine is so fast anyway that I can't see any difference.  But I
would indeed expect the better ether driver to have some positive
effects.  I'd be glad to run under Mach the benchmark you used to get
the "3.6 msecs" figure and report the findings; where do I ftp it from?
BTW, did you see the TeX previewer for X11 by Eric Cooper on a pmax?

sandro-

PS: I am setting up a TAR file with the benchmarks I mentioned, except of
course the Mach sources, for which you can substitute your favorite BSD
kernel.  By tomorrow it should be available by anonymous FTP on host
testarossa.mach.cs.cmu.edu [128.2.250.252] in the directory /usr/pub
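[Returning to the single-copy trick described above: the general shape
of it -- a guess at the technique, not Topaz or Mach source, with all
names invented -- is that the stub marshals its arguments straight into
a preallocated, driver-visible packet buffer and hands the driver only a
reference to it.]

/*
 * Sketch of the "copy once into a preallocated packet" idea.  Names and
 * layout are invented for illustration; this is not Topaz (or Mach) code.
 */
#include <string.h>

#define NPKT    16
#define PKTDATA 1400

struct packet {
        int     len;
        char    data[PKTDATA];          /* user-visible, driver-visible */
};

static struct packet pool[NPKT];        /* preallocated at startup */
static int next_free;

static void
driver_enqueue(struct packet *p)        /* stub: a real driver would DMA from p->data */
{
        (void)p;
}

void
rpc_send(void *args, int len)
{
        struct packet *p = &pool[next_free];

        next_free = (next_free + 1) % NPKT;
        if (len > PKTDATA)
                len = PKTDATA;          /* sketch only: real code would fragment */
        memcpy(p->data, args, len);     /* the one and only copy */
        p->len = len;
        driver_enqueue(p);              /* hand over by reference, no recopy */
}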
Rick.Rashid@CS.CMU.EDU (12/22/89)
Actually, Rich is only partly correct.  On most systems the cost of the
4.3BSD networking code (which is used by the netmsgserver) dominates all
other costs.  The netmsgserver has support for several protocols, but the
standard one used at CMU (and by NeXT) is based on TCP connections.

There is code in the kernel for "short-circuiting" established Mach IPC
connections directly to the TCP code, so when this option is used there
is no "intermediate" netmsgserver processing for each message.
Unfortunately the cost of the IP/TCP code more than makes up the
difference.  The last time I looked, the cost of IP output was as high as
1600 VAX instructions by itself.

We have experimented with various higher-performance protocols such as
David Cheriton's VMTP.  A now somewhat dated version of VMTP is an option
in our system and the netmsgserver knows how to use it.  That version of
VMTP was never really stable in CMU's complex network environment,
though, so it has only been used for experimentation.  It is likely that
the most recent VMTP release will be re-integrated and tested in the next
few months.

Anyone interested in experimenting with the use of alternate protocols or
network driving code in Mach can do so relatively easily by modifying the
existing netmsgserver (which is already set up to handle 3 different
protocols); the kernel's short-circuit code can also be easily connected
to alternate network driver code.
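[The "modify the netmsgserver to add a protocol" route amounts to filling
in a small table of entry points per transport.  The sketch below is
invented for illustration -- these are not the netmsgserver's actual
names or interfaces -- but it shows why swapping TCP for VMTP or anything
else can stay confined to one user-level program.]

/*
 * Sketch of a per-protocol dispatch table of the sort a multi-protocol
 * message server might use.  All names here are invented; this is not
 * the netmsgserver's actual interface.
 */
#include <string.h>

struct transport_ops {
        char    *name;
        int     (*open)(char *host);            /* returns a handle */
        int     (*send)(int h, void *msg, int len);
        int     (*recv)(int h, void *msg, int maxlen);
        void    (*close)(int h);
};

/* One entry per supported protocol; adding a protocol means adding a
 * table here, not touching the rest of the server (or the kernel).
 * The three ops structures are hypothetical and defined elsewhere. */
extern struct transport_ops tcp_ops, udp_ops, vmtp_ops;

static struct transport_ops *transports[] = {
        &tcp_ops, &udp_ops, &vmtp_ops, 0
};

struct transport_ops *
transport_lookup(char *name)
{
        struct transport_ops **t;

        for (t = transports; *t; t++)
                if (strcmp((*t)->name, name) == 0)
                        return *t;
        return 0;
}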
Richard.Draves@CS.CMU.EDU (12/22/89)
I still don't understand what Sandro's numbers measure, so I won't try
to comment on them.
Excerpts from mail: 21-Dec-89 Re: Mach performance? [Long]
Rick.Rashid@CS.CMU.EDU (1425)
> Actually, Rich is only partly correct.
I didn't mention the "short-circuit" path in my brief description of
remote IPC because I don't think it is usable. It is an experimental
option. I don't even know if the code would still compile if one turned
on the option. It certainly isn't in use anywhere. NeXT tried to put
the short-circuit code into their production system and found it was too
buggy to use; they had to back it out.
I think the short-circuit code was a successful experiment. The
improved times it produced confirmed that the netmsgserver is a
bottleneck in remote Mach IPC.
Rich
jg@max.crl.dec.com (Jim Gettys) (12/23/89)
The 3.6 millisecond figure (round trip using TCP) between two PMAXen on a
local net was measured with the x11perf program, which you can get off of
expo.lcs.mit.edu or gatekeeper.dec.com.  It is performing a no-op X
request, and timing the response (I think it is just measuring elapsed
time on XSync X library calls).  The current version of x11perf is
version 1.2.  It will also be on the X11R4 distribution.

- Jim
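[For anyone who wants to reproduce roughly that measurement without
fetching x11perf, a bare-bones round-trip loop with Xlib looks something
like the sketch below.  This is not x11perf; each iteration queues a
no-op request and then waits out the XSync round trip, so it is a rough
upper bound on the request/reply time.]

/*
 * Sketch of a no-op round-trip timing loop in the spirit of the x11perf
 * measurement described above (this is not x11perf source).
 */
#include <stdio.h>
#include <sys/time.h>
#include <X11/Xlib.h>

int
main(int argc, char **argv)
{
        Display *dpy;
        struct timeval t0, t1;
        long i, n = 1000;
        double usecs;

        if ((dpy = XOpenDisplay(argc > 1 ? argv[1] : 0)) == 0) {
                fprintf(stderr, "cannot open display\n");
                return 1;
        }
        gettimeofday(&t0, (struct timezone *)0);
        for (i = 0; i < n; i++) {
                XNoOp(dpy);             /* queue a no-op request */
                XSync(dpy, False);      /* flush and wait for the server */
        }
        gettimeofday(&t1, (struct timezone *)0);
        usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
        printf("%ld round trips, %.0f usecs each\n", n, usecs / n);
        XCloseDisplay(dpy);
        return 0;
}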
pcg@rupert.cs.aber.ac.uk (Piercarlo Grandi) (12/23/89)
In article <QZYJhvu00hYP4Qa1N2@cs.cmu.edu> Richard.Draves@CS.CMU.EDU writes:

    Excerpts from mail: 21-Dec-89 Re: Mach performance? [Long]
    Rick.Rashid@CS.CMU.EDU (1425)

    > Actually, Rich is only partly correct.

    I didn't mention the "short-circuit" path in my brief description of
    remote IPC because I don't think it is usable.  It is an experimental
    option.  I don't even know if the code would still compile if one
    turned on the option.  It certainly isn't in use anywhere.  NeXT
    tried to put the short-circuit code into their production system and
    found it was too buggy to use; they had to back it out.

I have been aware of Rashid's Accent IPC for almost ten years, and about
five years ago ported it to System V and did a netmsgserver.  It is quite
possible, and actually I think (I did not actually do it) fairly easy, to
move the netmsgserver into the kernel, so that context switch time is
essentially nullified.  This is essentially what 4.xBSD does; by default
their netmsgserver is stuck right in the kernel.

Not many seem to have noticed that this actually is not the only option
you have under 4.xBSD; indeed there are two alternatives:

1) All user programs could use the Unix domain, where you can (modulo
some bugs) send file descriptors between processes.  All processes open
Unix domain connections to a network server process, and this is the only
one that opens TCP/IP or whatever connections to the outside world.  You
can use the existing TCP sockets, or use raw IP sockets and reimplement
TCP in the server, or whatever.  This is exactly like in Mach.  (A small
sketch of the descriptor passing this relies on appears at the end of
this article.)

2) An unimplemented feature of 4.xBSD is user-implemented IPC domains.
The idea was to give user processes the ability to register with the
kernel as servers for sockets of some particular domain, and the kernel
would pass to that process all operations on sockets of that domain.
This facility has never been implemented, just like 4.xBSD wrappers (it
actually is related to them).

Interestingly enough, option 1) is possible also under streams, and I am
quite sure that the two crucial points of Rashid's IPC, the ability to
send file/port descriptors with messages and the access to global
addresses only through address-space-local file/port descriptors, have
been inspired in both cases by Accent (even if both points are
circumvented under 4.xBSD by direct access to a kernel-based
netmsgserver).

    I think the short-circuit code was a successful experiment.  The
    improved times it produced confirmed that the netmsgserver is a
    bottleneck in remote Mach IPC.

Naturally all this netmsgserver trouble happens because of a fundamental
limitation of Accent/Mach, the inability for threads to change address
space and, possibly, to have multiple address spaces mapped together
(yes, I know about sharing address spaces, it's not quite the same
thing).  I suspect that these limitations are there also possibly because
otherwise the architecture would be very different from the Unix one, and
CMU have been badly burnt with Accent, which was too unlike Unix.

Context switching for RPC implies three distinct overheads: security
checking, address space switching, thread switching.  There are therefore
three possible levels of extra sophistication beyond Accent/Mach (which
is already two levels beyond 4.xBSD):

1) If it were possible for threads to jump between address spaces, the
thread switching overhead would be nullified.

2) If it were possible to map multiple address spaces together, even
address space switching would be nullified.

3) If it were possible to inform the OS that an address space trusted
another, security checking in that direction would be eliminated.

As an historical note, Multics had all three for communication between
*rings* in the same address space, and even had support in hardware to do
3) in the reverse direction.  Capability machines with a single global
address space for all threads are best of course, and since security
checking is automagically done in both directions by hardware, point 3)
is moot.

An OS called Psyche (from Rochester, not by chance) allows you to do all
three things on fairly conventional (non-capability, non-ring) hardware;
you can then have your netmsgserver co-mapped with your user address
space, the thread that wants to do the network RPC just jumps to the
netmsgserver code, and since you say that you trust it, half of the
checking is eliminated as well.  I think that this should give excellent
performance; I have had correspondence with other people working on
similar lines (e.g. from AT&T), and from our limited data it is apparent
you do not pay much more than for a local procedure call (and maybe even
less than an intra-address-space thread rendezvous, as you don't have
synch and thread switch costs).

I have been working (since 1983, on and off... but I have now apparently
found a way to switch to full time for this) on something that does 1) by
default, but will only do 2) for selected, statically configured,
modules, and 3) only for the kernel.  While this is a less general
mechanism than Psyche, I think that the Psyche mechanism is excessively
fine grained for my tastes, and I'd rather be more restrictive, and not
even offer the option to do 2) and 3) in a general way.

There is of course a difference in perspective: I am a minimalist, and I
don't want to add mechanisms that are not relevant, or may even encourage
programming styles at variance with my target environment, distributed
systems, where it is important to cut overheads, but also to encourage
the programmer to be aware of communication boundaries, and not to expect
to be able to map address spaces together, as they may be on different
machines.  It is *possible* to support transparently distributed shared
memory, indeed it is in principle possible and fairly easy with the
existing Accent/Mach architecture, but I think that hiding communication
boundaries, while attractive from a conceptual point of view, would also
hide the underlying reality in terms of cost and reliability.  Also, most
current machines do not have long enough virtual addresses that you can
expect to map many user address spaces together.

The Psyche people have of course a completely different attitude, and I
would dare say that to them points 2) and especially 3) are the most
important, because their target is a NUMA machine, that is a
not-too-loosely coupled multiprocessor and not a (possibly wide-area)
distributed system (Mach seems ever more oriented to very closely coupled
multiprocessors), and where hardware effort has been expended to provide
some credible, efficient illusion of global shared memory, which should
be exploited.

In other words, my reckoning is that while in current Accent/Mach kernels
there is a "short-circuit" path as an exception to the normal mechanism,
the (sw) architecture should be such that such a thing is actually the
standard.
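[To make option 1) above concrete, here is a small sketch of the
descriptor-passing step it relies on, written with the msg_control/
SCM_RIGHTS form of sendmsg(); 4.3BSD spelled the same thing with the
older msg_accrights field.  This is an illustration, not code from any
of the systems discussed.]

/*
 * Pass an open file descriptor to another process over a Unix-domain
 * socket, as in option 1) above.
 */
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

int
send_fd(int sock, int fd)
{
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cm;
        char dummy = 0;
        union {
                struct cmsghdr cm;
                char space[CMSG_SPACE(sizeof(int))];
        } cmsgbuf;

        memset(&msg, 0, sizeof msg);
        iov.iov_base = &dummy;          /* must send at least one data byte */
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = &cmsgbuf;
        msg.msg_controllen = CMSG_SPACE(sizeof(int));

        cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;     /* "here is a capability" */
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0);  /* receiver gets its own descriptor */
}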
-- Piercarlo "Peter" Grandi | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk Dept of CS, UCW Aberystwyth | UUCP: ...!mcvax!ukc!aber-cs!pcg Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
Richard.Draves@CS.CMU.EDU (12/26/89)
Excerpts from netnews.comp.os.mach: 23-Dec-89 Re: Mach performance? [Long]
Piercarlo Grandi@rupert. (7140)

    > It is quite possible and actually I think (I did
    > not actually do it) fairly easy to move the netmsgserver in the
    > kernel, so that context switch time is essentially nullified.

I wouldn't want to move the netmsgserver into the Mach kernel.  The
netmsgserver is a pretty hefty program.  As Rick mentioned, it can handle
multiple network protocols.  It has security capabilities, to protect
port rights.  (See Robert Sansom's thesis.)  It is the base-level name
server.  (Once two parties on different machines have exchanged send
rights, they can transmit more port rights back and forth, but the
netmsgserver has to be involved in the initial bootstrap that sets up the
first remote send right.)  It reformats data in messages (like fixing
byte order), according to the type descriptors in the message and the
hardware architectures involved.

The "short-circuit" experiment did demonstrate that it is feasible to
avoid the netmsgserver on performance-critical paths.  I think this is
much more palatable than moving the entire netmsgserver into the kernel,
although it does compromise the Mach IPC model to some extent.
(Theoretically, the netmsgserver is just another user task and doesn't
need any special kernel support.)

Rich
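[To illustrate the reformatting step Rich mentions: driven by a per-item
type descriptor, the server swaps bytes element by element when the two
architectures disagree on byte order.  The descriptor layout below is
invented for illustration -- Mach messages carry their own msg_type
information -- and this is not the netmsgserver's code.]

/*
 * Sketch of type-descriptor-driven byte-order fixing (assuming 32-bit
 * ints).  Not the netmsgserver's actual code or data layout.
 */
struct type_desc {
        int     elt_size;       /* bytes per element: 1, 2 or 4 */
        int     count;          /* number of elements */
};

static unsigned int
swap32(unsigned int x)
{
        return ((x >> 24) & 0xff) | ((x >> 8) & 0xff00) |
               ((x << 8) & 0xff0000) | ((x << 24) & 0xff000000);
}

static unsigned short
swap16(unsigned short x)
{
        return ((x >> 8) & 0xff) | ((x << 8) & 0xff00);
}

/* Convert `data' in place if the sender's byte order differs from ours. */
void
fix_byte_order(char *data, struct type_desc *t,
               int sender_is_big_endian, int we_are_big_endian)
{
        int i;

        if (sender_is_big_endian == we_are_big_endian || t->elt_size == 1)
                return;                 /* nothing to do */
        for (i = 0; i < t->count; i++, data += t->elt_size) {
                if (t->elt_size == 2)
                        *(unsigned short *)data = swap16(*(unsigned short *)data);
                else if (t->elt_size == 4)
                        *(unsigned int *)data = swap32(*(unsigned int *)data);
        }
}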
peter@ficc.uu.net (Peter da Silva) (12/28/89)
In article <IZYFjfK00hYPB2M7si@cs.cmu.edu> Richard.Draves@CS.CMU.EDU writes:
> Off-machine RPCs are relatively slow because they aren't handled
> directly by the kernel. A user-level process, called the netmsgserver,
> handles network IPC.

Which is as it should be, right?  One of the design goals of Mach was to
move stuff like this out of the kernel, then improve performance by
speeding up context switches and clever use of virtual memory.

I might be all wet on this, but by making the user pages containing the
data to be sent copy-on-write and mapping them into the netmsgserver you
could get rid of all the extra copies.  And context switch time should
already be quite low.  Merely a SMOP.
-- 
`-_-'  Peter da Silva.  +1 713 274 5180.  <peter@ficc.uu.net>.
 'U`   Also <peter@ficc.lonestar.org> or <peter@sugar.lonestar.org>.
"It was just dumb luck that Unix managed to break through the Stupidity
Barrier and become popular in spite of its inherent elegance."
        -- gavin@krypton.sgi.com
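[A crude way to see the "map it, don't copy it" point Peter raises, on
stock Unix: the sketch below uses a plain shared anonymous mapping across
fork() -- not the Mach copy-on-write machinery he is talking about, and
certainly not how the netmsgserver works -- but it does show the
"server" looking at the "client's" message in place, with no data copy
between the two processes.]

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int
main(void)
{
        char *msgbuf;
        size_t len = 4096;

        /* Shared anonymous mapping; some older systems spell this MAP_ANON. */
        msgbuf = mmap(0, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (msgbuf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        strcpy(msgbuf, "null RPC request");     /* "client" builds the message */
        if (fork() == 0) {
                /* "netmsgserver": sees the message in place, no copy. */
                printf("server read: %s\n", msgbuf);
                _exit(0);
        }
        wait((int *)0);
        return 0;
}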
ast@cs.vu.nl (Andy Tanenbaum) (12/29/89)
In article <7387@pt.cs.cmu.edu> af@spice.cs.cmu.edu (Alessandro Forin) writes:
> From the SOSP proceedings I see that
> the official score seems to be:
>       Cedar:  1.1 MILLIsecs/call      Dorado
>       Amoeba: 1.4                     Tadpole (68020)
>       V:      2.5                     Sun 3/75
>       Topaz:  2.7                     Firefly (5-way multi)
>       Sprite: 2.8                     Sun 3/75
>       Topaz:  4.8                     Firefly (mono)

I think it is important that everyone realize that RPC protocols are
basically CPU limited.  Thus when making comparisons, one has to
normalize for CPU speed.  In the Oct 1988 Operating Systems Review I
wrote an article claiming that Amoeba was the fastest distributed system
in the world ON ITS CLASS OF HARDWARE (68020 at 16 MHz -- essentially Sun
3/50 type machines).  This has been widely misunderstood.

In the above list one might get the impression that the Cedar folks wrote
better software (1.1 < 1.4 etc.).  It is important to note that the
effective speed of the Dorado hardware is something like 3 times faster
than the Sun 3/50.  Getting a 20% speed gain by using 3 times faster
hardware is nice, but not Guinness Book of Records stuff.  If one only
looks at raw speed, then Kermit running on a Cray-3 is going to beat the
pants off everybody.

Conclusion: When making rankings like the above, one must normalize for
CPU speed, or at least quote it.  It would be interesting to see an
honest list, including Mach.  I believe that Amoeba is still #1 in RPC
performance and also in reading from a remote file server (677
kilobytes/sec).  How fast can a Mach user program read data continuously
from a remote machine over the Ethernet (assuming a 100% hit rate in the
file server's cache, to factor out disk speed)?

Finally, one also has to be careful that one is measuring the same thing.
The Amoeba times are user-to-user, not kernel-to-kernel, using no special
tricks, no microcode assist, no special hardware boards, etc.  Just plain
vanilla Ethernet.  I can only hope the others measure the same thing.

Andy Tanenbaum (ast@cs.vu.nl)

P.S. With a little luck, Amoeba will be made available during the course
of 1990.  I'll post an announcement to comp.os.misc when the time comes.
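[Taking Andy's own figures, the arithmetic he is asking for is simple:
the Dorado is said to be about 3 times the effective speed of a Sun 3/50,
so Cedar's 1.1 ms scales to roughly 3.3 ms on Sun-3-class hardware,
against Amoeba's 1.4 ms measured directly on such a machine.  A trivial
sketch of that scaling follows; the only two ratios shown are the ones
quoted in this thread, and anything else would have to be filled in.]

/*
 * Trivial sketch of CPU-speed normalization: scale each measured RPC
 * time by the relative speed of the hardware it ran on, so results are
 * comparable on a common (Sun 3/50-class) basis.
 */
#include <stdio.h>

struct result {
        char   *system;
        double  raw_ms;         /* measured RPC time, milliseconds */
        double  cpu_ratio;      /* CPU speed relative to a Sun 3/50 */
};

int
main(void)
{
        struct result r[] = {
                { "Cedar (Dorado)", 1.1, 3.0 }, /* ~3x a Sun 3/50, per above */
                { "Amoeba (68020)", 1.4, 1.0 }, /* measured on Sun 3/50-class */
        };
        int i, n = sizeof r / sizeof r[0];

        for (i = 0; i < n; i++)
                printf("%-16s raw %.1f ms -> %.1f ms on Sun 3/50-class hardware\n",
                       r[i].system, r[i].raw_ms, r[i].raw_ms * r[i].cpu_ratio);
        return 0;
}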