af@spice.cs.cmu.edu (Alessandro Forin) (12/22/89)
Archive-name: benchmark/af-rpc
Original-posting-by: af@spice.cs.cmu.edu (Alessandro Forin)
Original-subject: Re: Mach performance? [Long]
Archive-site: testarossa.mach.cs.cmu.edu [128.2.250.252]
Archive-directory: /usr/pub
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

In article <1482@crltrx.crl.dec.com>, jg@max.crl.dec.com (Jim Gettys) writes:
> I don't understand why your off-machine RPC performance is so poor; ...
> your off machine (over the net) case is only comparable to what a
> MicroVAX running
> Topaz can do, or just running over TCP (X round trip times on Ultrix
> over the net using TCP
> are less than a factor of two worse (around 3.6 ms)). Where is the
> bottle-neck?
> - Jim Gettys
> Digital Equipment Corporation
> Cambridge Research Laboratory

I believe there is a deep misunderstanding here, and re-reading my post
I realize that I am largely responsible for it.

[On the other hand, where did you hear of a system that does an RPC over
the ether in 200 usecs ???? From the SOSP proceedings I see that the
official score seems to be:

    Cedar:   1.1 MILLIsecs/call   Dorado
    Amoeba:  1.4                  Tadpole (68020)
    V:       2.5                  Sun 3/75
    Topaz:   2.7                  Firefly (5-way multi)
    Sprite:  2.8                  Sun 3/75
    Topaz:   4.8                  Firefly (mono)
]

The RPC numbers I gave are LOCAL: the output from the program is
definitely misleading in its use of the terms "local" and "remote".
What the author meant is "local" for a normal procedure call (in the
same address space), "remote" for a call to a different thread on the
same machine.
[The results should be the same for threads in the same or in separate
address spaces; the test is across separate address spaces.]

The table is also misleading in that times are not normalized.
Here are the normalized numbers, all times in MICROSECONDS per call.

    Test syscall:         E:22    U:3    S:19
    Test localLoop:       E:0     U:0    S:0
    Test localNull:       E:0     U:0    S:0
    Test localAdd:        E:1     U:1    S:0
    Test localBigIn:      E:20    U:20   S:0
    Test localBigOut:     E:20    U:20   S:0
    Test localBigInOut:   E:39    U:39   S:0
    Test remoteNull:      E:198   U:1    S:112
    Test remoteAdd:       E:206   U:14   S:98
    Test remoteBigIn:     E:279   U:17   S:123
    Test remoteBigOut:    E:201   U:14   S:100
    Test remoteBigInOut:  E:276   U:39   S:126

What the test wants to show is the approximate ratio between local
(LPC?) and remote procedure call (RPC!) on the given machine, which in
this case turns out to be slightly better than a factor of 10 slower.
That is not too bad given the amount of optimization the MIPS guys put
in their compilers.

It is well known that Mach network IPC is not very fast; we have always
stressed functionality over performance, e.g. the ability to
transparently replace transport protocols from TCP to UDP to VMTP to ...
whatever with only trivial changes in a single user-level process.
Maybe some day we will put some work into getting it fast, but right now
we are not in the race.

Besides, Mach IPC is a no-cheat IPC system, with all the security
measures necessary for a true multiuser/time-shared machine.
For instance, I would expect the guys at Trusted Information Systems to
see about the same performance on their Secure Mach.
Comparisons with unsecure systems like Topaz are therefore
inappropriate, with all due respect to all the painful work they did to
get their very good numbers. On a multiprocessor.
[Same goes for Amoeba or V-kernel]
We also do well on transferring large volumes of data, see for instance
what the NeXT box does with bitmaps.

Anyways, here are the times I get between two similar pmaxen, on the
same cable, right_now, multiuser etc etc. These times ARE for network
IPC and ARE normalized as above in usecs per call.
And Rick WILL kill me for handing them out.
[Take them as lower-bounds, on a young machine]

    binding to host rvb
    Test remoteNull:      E:6563  U:15   S:203
    Test remoteAdd:       E:6421  U:0    S:265
    Test remoteBigIn:     E:7172  U:47   S:141
    Test remoteBigOut:    E:6563  U:15   S:94
    Test remoteBigInOut:  E:7140  U:94   S:266

Why is it so slow? Because the path is

    Machine-A: client -> kernel -> network_server -> kernel -> ether
    Machine-B: ether -> kernel -> network_server -> kernel -> server

that is, the network_servers "interpose" between kernels.
On other systems the path is typically something like

    Machine-A: client -> kernel -> ether
    Machine-B: ether -> kernel -> server

which saves 4 `copy' operations. Topaz then cheats by only `copying'
once, from the client's stack into a preallocated packet (and then back)
in user-visible memory. The packet is then handed over to the device
driver by reference.
If I had to guess, I'd say this way they could do something like
500 MICROsecs/call on a multiprocessor pmax. And if they double-cheated
by using the Washington version they'd probably do even better.

The term `copy' above includes context-switching overhead, if applicable.

As far as X is concerned, I take the word of the people at MIT that
"It runs visibly faster under Mach than under Ultrix".
I am no X guru; for me the machine is so fast anyways that I can't see
any difference. But I would indeed expect the better ether driver to
have some positive effects.
I'd be glad to run under Mach the benchmark you used to get the
"3.6 msecs" figure and report the findings. Where do I ftp it from?

BTW, did you see the tex previewer for X11 by Eric Cooper on a pmax ?

sandro-

PS: I am setting up a TAR file with the benchmarks I mentioned, except
of course the Mach sources, for which you can substitute your favorite
BSD kernel. By tomorrow it should be available by anonymous FTP on host
testarossa.mach.cs.cmu.edu [128.2.250.252] in the directory /usr/pub
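
The two remote tables in this post (same machine, and between two machines over the ether) can be compared mechanically. What follows is a minimal sketch and not part of the original benchmark: it parses result lines of the form `Test name: E:.. U:.. S:..` and computes, for each remote test, the ratio of the network E figure to the same-machine E figure. The names `parse_results` and `elapsed_ratio` are made up for illustration, and reading E as elapsed time is an assumption, since the post does not define the columns.

```python
import re

# Same-machine "remote" figures quoted in the post (usecs per call).
SAME_MACHINE = """
Test remoteNull: E:198 U:1 S:112
Test remoteAdd: E:206 U:14 S:98
Test remoteBigIn: E:279 U:17 S:123
Test remoteBigOut: E:201 U:14 S:100
Test remoteBigInOut: E:276 U:39 S:126
"""

# Network IPC figures between two machines, also from the post.
NETWORK = """
Test remoteNull: E:6563 U:15 S:203
Test remoteAdd: E:6421 U:0 S:265
Test remoteBigIn: E:7172 U:47 S:141
Test remoteBigOut: E:6563 U:15 S:94
Test remoteBigInOut: E:7140 U:94 S:266
"""

# One result line: test name followed by the E, U and S columns.
LINE = re.compile(r"Test (\w+):\s+E:(\d+)\s+U:(\d+)\s+S:(\d+)")


def parse_results(text):
    """Return {test_name: (E, U, S)} for every result line found in text."""
    return {m.group(1): tuple(int(g) for g in m.group(2, 3, 4))
            for m in LINE.finditer(text)}


def elapsed_ratio(network, same_machine):
    """Return {test_name: network E / same-machine E} for tests in both tables."""
    return {name: network[name][0] / same_machine[name][0]
            for name in network if name in same_machine}


if __name__ == "__main__":
    ratios = elapsed_ratio(parse_results(NETWORK), parse_results(SAME_MACHINE))
    for name, ratio in ratios.items():
        print(f"{name:16s} {ratio:5.1f}x")
```

With the figures quoted above, the network case comes out roughly 26 to 33 times slower than the same-machine case, depending on the test.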
af@spice.cs.cmu.edu (Alessandro Forin) (12/22/89)
Archive-name: benchmark/af-rpc-how-to-get
Original-posting-by: af@spice.cs.cmu.edu (Alessandro Forin)
Original-subject: Re: tests
Archive-site: testarossa.mach.cs.cmu.edu [128.2.250.252]
Archive-directory: /usr/pub
Archive-files: mach_tests.TAR.Z
Reposted-by: emv@math.lsa.umich.edu (Edward Vielmetti)

The tests I mentioned in my post are now available.
Since our ftp daemons are very concerned with security I suggest you do
not try anything fancy, just follow this script:

    ftp testarossa.mach.cs.cmu.edu
    # if your name lookup fails try:
    # ftp 128.2.250.252
    ...login: anonymous
    ..Passwd: guest
    ftp> cd /usr/pub
    ftp> ls
    ftp> binary
    ftp> get mach_tests.TAR.Z
    ftp> quit

Then you should uncompress the file and un-tar it.

Have a Merry Christmas,
sandro-
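
For anyone who would rather script the session than type it, the same steps can be sketched with Python's standard `ftplib`. This is an illustration only: the host, directory and file name are copied from the post and may no longer be reachable, and the function name `fetch_tests` and its `ftp_factory` parameter are hypothetical, introduced here so the steps can be exercised without a live server.

```python
from ftplib import FTP

HOST = "testarossa.mach.cs.cmu.edu"  # the post suggests 128.2.250.252 if name lookup fails
REMOTE_DIR = "/usr/pub"
FILENAME = "mach_tests.TAR.Z"


def fetch_tests(ftp_factory=FTP, local_path=FILENAME):
    """Mirror the manual session: log in, cd, ls, then a binary-mode get.

    ftp_factory is a hypothetical hook (defaults to ftplib.FTP) that takes
    a host name and returns a connected FTP-like object.
    """
    ftp = ftp_factory(HOST)                  # ftp testarossa.mach.cs.cmu.edu
    try:
        ftp.login("anonymous", "guest")      # login: anonymous / Passwd: guest
        ftp.cwd(REMOTE_DIR)                  # cd /usr/pub
        listing = ftp.nlst()                 # ls
        with open(local_path, "wb") as out:
            # retrbinary transfers in binary mode, covering "binary" + "get"
            ftp.retrbinary("RETR " + FILENAME, out.write)
    finally:
        ftp.quit()                           # quit
    return listing


# Usage (needs network access to a live server):
# print(fetch_tests())
```

The downloaded file still has to be uncompressed and un-tarred afterwards, as in the manual procedure.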