sauer@auschs.UUCP (Charlie Sauer) (11/21/88)
I had hoped to just sit back and watch this discussion, but...

- there seems to be little distinction between the original machine and the upgrades last year and this year
- there are some key points which have been insufficiently noticed in the discussion, e.g., the role of optimizing compilers
- there have been some significant inaccuracies, e.g., with respect to the implementation and impact of the VRM. (Contrary to one assertion, the VRM is not half assembly code, but mostly PL.8 and C. Though the VMI has negative impact in some applications, in many cases that is more than compensated for by the benefits of the paging code in the VRM and the real time capabilities in the VRM.)

So, I'll try to provide some perspective.

The main reason, in my opinion, that the original machine seemed/was slow when it appeared in Spring of '86 is that the design decisions were made, for the most part, with the plan that the machine ship in Winter of '84. If we had been able to hold that plan, then it would have seemed much faster.

The main limitations in performance in the machine when released were

- the compilers, both for AIX and 4.2/RT, had very little optimization capability. Thus they violated a fundamental concept of RISC, that of exploiting the processor with highly optimizing compilers. In AIX 1.1, we began providing global optimizing C and Fortran compilers based on pcc/f77 but incorporating HCR's Portable Code Optimizer. In '87, a C compiler based on the PL.8 compiler (the "Advanced C Compiler") became available as an AIX option. Correspondingly, the Metaware High C compiler became available with 4.2/RT and 4.3/RT. These compilers provided roughly 2 to 1 improvements in performance in many applications.
- there was no built in floating point hardware and the optional floating point accelerator was not very fast
- the disk controller used an ST-506 interface and had no DMA capability

Though the I/O bus was only 16 bits, it had a number of implementation enhancements over the AT bus, e.g., 32 bit burst/buffering extensions, and that bus is not used as the main memory bus, so I don't think that bus is a major bottleneck in the machine. It is able to sustain more than 2 megabytes/sec in DMA transfers, and that is adequate for most applications on a machine of that processor speed.

The original processor was a 6 MHz NMOS implementation. It had several instructions which were multiple cycle but should have been one cycle, and was not able to pipeline loads and stores with virtual memory enabled. On our standard internal CPU kernel, it comes in at 2.1 MIPS.

Last year we started shipping a new processor implementation which reduces the above cited multiple cycle instructions to single cycle instructions, is able to pipeline loads and stores with virtual memory enabled, and has a number of other minor improvements. The 10 MHz CMOS implementation of that chip comes in at 4.5 MIPS on the above cited kernel. Primarily due to memory shortages, we were unable to provide adequate quantities of machines with that processor until early this year. In July we started shipping machines with a 12.5 MHz version of the new implementation. Though these machines are not as fast as some high end workstations, we think they are very competitive in price/performance, as do others. See David Wilson's dollar/Khornerstone ratings in the May Unix Review, for example.

Besides reimplementing the processor and providing optimizing compilers, we provided a 20 MHz 68881 standard with the 10 MHz CMOS processor and provided an optional floating point accelerator using the ADSP 3210 and 3221. With the 12.5 MHz machines, the standard floating point unit is based on those parts.
We also started providing DMA controllers for both ESDI and SCSI disks, with caching on the controller cards.

For those that care, RT models 10, 15, 20 and 25 have the 6 MHz NMOS processor. The 115 and 125 have the 10 MHz CMOS processor, and the 130 and 135 have the 12.5 MHz CMOS processor.

Those are the machines that we have shipped. I think it is well known that we are working on follow on machines which support the Micro-Channel. Other than that, I don't think much is publicly known about those machines, so I won't say anything more about them now.
--
Charlie Sauer  IBM AES/ESD, D75/802   uucp:   cs.utexas.edu!ibmaus!sauer
               11400 Burnet Road      822:    @CS.UTEXAS.EDU:sauer@ibmaus.uucp
               Austin, Texas 78758    aesnet: sauer@auschs
               (512) 823-3692         vnet:   SAUER at AUSVM6
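As a rough check on the processor figures in the post above, the sketch below separates how much of the quoted MIPS improvement comes from the higher clock and how much from the per-clock changes (single cycle instructions, pipelined loads and stores). The only inputs are the 6 MHz / 2.1 MIPS and 10 MHz / 4.5 MIPS figures from the post; the variable names are illustrative, and the split assumes MIPS on the internal kernel scales linearly with clock, which the post does not state.

```python
# Figures quoted in the post for the "standard internal CPU kernel".
nmos = {"mhz": 6.0, "mips": 2.1}    # original NMOS processor
cmos = {"mhz": 10.0, "mips": 4.5}   # 10 MHz CMOS reimplementation


def mips_per_mhz(cpu):
    """Instructions per clock, expressed as MIPS per MHz."""
    return cpu["mips"] / cpu["mhz"]


clock_gain = cmos["mhz"] / nmos["mhz"]                    # gain from clock alone
total_gain = cmos["mips"] / nmos["mips"]                  # overall MIPS gain
per_clock_gain = mips_per_mhz(cmos) / mips_per_mhz(nmos)  # gain per cycle

print(f"clock gain:     {clock_gain:.2f}x")
print(f"per-clock gain: {per_clock_gain:.2f}x")
print(f"total gain:     {total_gain:.2f}x")
```

Under that assumption, the reimplementation accounts for roughly a 1.29x per-clock improvement on top of the 1.67x clock increase.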
sauer@auschs.UUCP (Charlie Sauer) (11/30/88)
Since my previous posting, I've received several mail requests to post benchmark results. Here's what I have so far from the group responsible for running various benchmarks:

RT PERFORMANCE
________________________________________________________________________________
|        |      |      | FLOAT |      | *DHRY- |  WHETSTONES   |    LINPACK    |
| MODEL  | PROC | MHZ  | POINT | NOTE | STONES |    KWHETS     |    KFLOPS     |
|        |      |      |       |      |        |   SP  |   DP  |   SP  |   DP  |
________________________________________________________________________________
| RT 025 | ROMP |  5.9 | FPA   |      |   4000 |   985 |   730 |   196 |   118 |
|    125 | APC  | 10.0 | FPA2  |  1   |   8300 |  1893 |  1733 |   400 |   360 |
|    125 | APC  | 10.0 | FPA2  |  2   |   8300 |  1893 |  1733 |  1040 |   700 |
|    135 | APC  | 12.4 | *     |  1   |  10400 |  2298 |  2087 |   490 |   410 |
|    135 | APC  | 12.4 | *     |  2   |  10400 |  2298 |  2087 |  1210 |   810 |
________________________________________________________________________________
* DHRYSTONE = V1.1 W/ ADVANCED C COMPILER
1. LINPACK PERFORMANCE = UNROLLED BLAS
2. LINPACK PERFORMANCE = CODED BLAS
--
Charlie Sauer  IBM AES/ESD, D75/802   uucp:   cs.utexas.edu!ibmaus!sauer
               11400 Burnet Road      822:    @CS.UTEXAS.EDU:sauer@ibmaus.uucp
               Austin, Texas 78758    aesnet: sauer@auschs
               (512) 823-3692         vnet:   SAUER at AUSVM6
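To make the table above easier to read, here is a small sketch that compares the Model 135 with the Model 125 against their clock ratio, and coded BLAS against unrolled BLAS for double precision LINPACK. All numbers are copied from the table; the dictionary layout and function names are illustrative only and not part of the original posting.

```python
# Figures copied from the RT PERFORMANCE table in the post above.
CLOCK_MHZ = {"125": 10.0, "135": 12.4}

# model -> results; LINPACK is double precision KFLOPS as (unrolled, coded) BLAS
RESULTS = {
    "125": {"dhrystones": 8300, "kwhets_dp": 1733, "linpack_dp": (360, 700)},
    "135": {"dhrystones": 10400, "kwhets_dp": 2087, "linpack_dp": (410, 810)},
}


def scaling(metric):
    """Ratio of the Model 135 result to the Model 125 result for one metric."""
    return RESULTS["135"][metric] / RESULTS["125"][metric]


def coded_blas_speedup(model):
    """How much faster coded BLAS runs than unrolled BLAS (LINPACK, DP)."""
    unrolled, coded = RESULTS[model]["linpack_dp"]
    return coded / unrolled


clock_ratio = CLOCK_MHZ["135"] / CLOCK_MHZ["125"]
print(f"clock ratio 135/125: {clock_ratio:.2f}")
for metric in ("dhrystones", "kwhets_dp"):
    print(f"{metric} ratio 135/125: {scaling(metric):.2f}")
for model in RESULTS:
    print(f"Model {model}: coded BLAS is {coded_blas_speedup(model):.2f}x unrolled")
```

On these figures the Dhrystone result tracks the clock ratio closely, and coded BLAS comes out a little under 2x unrolled BLAS on both models.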
butcher@g.gp.cs.cmu.edu (Lawrence Butcher) (12/01/88)
Mr. Sauer's benchmark people should be aware that the current version of the Dhrystone benchmark is version 2.0. Version 1.1 numbers are not timely.

We have a bunch of RT's. We use the Metaware High C compiler version 1.4R, fall back to pcc when mc generates incorrect code, and run MACH. The Dhrystone numbers Mr. Sauer quotes are so far from the ones that I measure here that I wonder if we are measuring different things. If any reader can get their RT to run the benchmark significantly faster than I can, please send me mail. We would love to know what to do to get a C compiler that doubles the speed of our programs.

My measured Dhrystone numbers (Dhry 2.0) are (roughly):

2191 for the (RT, ROMP, 6150, Model 025, at 6 MHz) using mc
3176 for the 6152 using mc
3551 for the (APC, 6151, 125, at 10 MHz) using pcc
4474 for the (APC, 6151, 125 at 10 MHz) using mc

The following are my personal feelings only.

The 025, running the software available to me, is about as fast as a SUN 3/50 while running the Dhrystone program. However my experience is that an 025 machine with 4 megs of RAM, a disk, and MACH, is totally useless. It cannot run X10, an xterm, and a single outgoing telnet at the same time. Yes, I am saying that it is not as useful as a terminal. A diskless Sun 3/50 running SUN OS 3.5, suntools, emacs, and pcc is comfortable. The 125 is about as fast as a SUN 3/60, again running the software we have here. High end 80386 and middle-end 68030 systems should be about twice as fast as the 3/60.

I also have access to a MIPS M500 running at 8 MHz. It Dhrystones at 12000+. This number may be hard for others to verify, because I don't think that MIPS makes 8 MHz chips any more. The Performance Semi version of the R3000 is available at 16, 20, and 25 MHz. This chip is commercially available and there is a C compiler available for it which generates correct code.
Of course we shouldn't use the existence of a fast MIPS processor to try to predict the speed of future IBM machines. Just the speed of MIPS machines.

I am disappointed that there is no version of GCC for the RT. IBM could allocate one person to do a port and give away the results, but it seems that they would rather have a company sell a compiler without source.

The APA8 display was a joke, and I am disappointed that IBM believes they fixed things with the APA16 display. Although both look great on a desk when they are shut off, both are too small to use for editing.

I am disappointed with the IBM 2-button mouse. It takes tremendous force to push the buttons. We had an informal contest and only one person here could hold both buttons down while lifting the mouse off the table. (For people who have not had the pleasure of using the IBM mouse, you hold both buttons down at once to emulate holding the middle button down on a real mouse).

I absolutely cannot believe that the keyboard clicks when you hit the shift or control key. I wish that the caps lock key and the control key were the same size so we could switch the key caps. Thank you someone for that layout. Everyone here rebinds CTRL to the right place. And the ESC key??

I am disappointed that the RT expansion bus is an AT bus. The AT form factor limits the size of peripheral cards, and limits the power they have available. Ever heard of an SMD controller for the AT?? 16 line serial line card??

These problems will be worse with smaller MCA cards. We will throw away the RT boxes when they are obsolete. We will NOT throw away the SUN 3/160 boxes when the 3/160 is no longer interesting.

Many of the things that I dislike about the RT could be fixed in the future. Today we give APC's to people who cannot afford SUNs. We give old RT's to people to punish them :-)

	Lawrence Butcher @g.gp.cs.cmu.edu
--
njs@scifi.UUCP (Nicholas J. Simicich) (12/03/88)
In article <3736@pt.cs.cmu.edu> butcher@g.gp.cs.cmu.edu (Lawrence Butcher) writes:
>Mr. Sauer's benchmark people should be aware that the current version of the
>Dhrystone benchmark is version 2.0. Version 1.1 numbers are not timely.
>
>We have a bunch of RT's. We use the Metaware High C compiler version 1.4R,
>fall back to pcc when mc generates incorrect code, and run MACH. The
>Dhrystone numbers Mr. Sauer quotes are so far from the ones that I measure
>here that I wonder if we are measuring different things. If any reader can
>get their RT to run the benchmark significantly faster than I can, please
>send me mail. We would love to know what to do to get a C compiler that
>doubles the speed of our programs.
>
(.....)
>
>The 025, running the software available to me, is about as fast as a SUN 3/50
>while running the Dhrystone program. However my experience is that an 025
>machine with 4 megs of RAM, a disk, and MACH, is totally useless. It cannot
>run X10, an xterm, and a single outgoing telnet at the same time. Yes, I am
>saying that it is not as useful as a terminal. A diskless Sun 3/50 running
>SUN OS 3.5, suntools, emacs, and pcc is comfortable. The 125 is about as
>fast as a SUN 3/60, again running the software we have here. High end 80386
>and middle-end 68030 systems should be about twice as fast as the 3/60.

At IBM T.J. Watson Research, we have a number of RT's running Mach. A simple C program running CPU bound with a working set of around 300k running niced makes it impossible to do any other work on the machine. This does not happen on either the AOS or AIX machines we have. The operating system seems to be the sole difference I can come up with. I believe that the Mach operating system runs well on a number of other machines and suspect that it is simply a matter of tuning.

I typically run X 10, GNUemacs, outgoing telnets under Xterm, and so forth.
I also have other people logging in to my system through telnet, am the server for some DS style remote mounts, and so forth. Admittedly, I have more memory and an APC. But I first brought up AIX on a 3 meg 025, and it was serviceable, just a tad slow.

(.....)
>I am disappointed that there is no version of GCC for the RT. IBM could
>allocate one person to do a port and give away the results, but it seems
>that they would rather have a company sell a compiler without source.

As far as I knew, someone in Project Athena has had a working code generator for GCC since last year, but there were (at the time) technical reasons why the code generated was incorrect, even though it generated the correct results. Something about the granularity on the intermediate code pass. I have no idea what the current status is.

(.....)
>I am disappointed with the IBM 2-button mouse. It takes tremendous force
>to push the buttons. We had an informal contest and only one person here
>could hold both buttons down while lifting the mouse off the table. (For
>people who have not had the pleasure of using the IBM mouse, you hold both
>buttons down at once to emulate holding the middle button down on a real
>mouse).

I believe that there is a great deal of similarity between the IBM RT mouse and the older Microsoft mice. The mechanisms seem to be similar, as does the button pushing force. I can hold the buttons down while using the mouse and lifting it off of the table surface, and frequently do, since I allocate the cover of a Usenix 4.2 manual as my mouse surface. Then again, I have large hands. People can pick up a Microsoft mouse and judge for themselves.

Personally, I think that one button is the right number of buttons on the mouse. But this doesn't fit the X model.

>I absolutely cannot believe that the keyboard clicks when you hit the shift
>or control key. I wish that the caps lock key and the control key were the
>same size so we could switch the key caps. Thank you someone for that
>layout. Everyone here rebinds CTRL to the right place. And the ESC key??

Neither can I, as mine doesn't click when I push ctrl or shift. But I run AIX. The clicking is under software control, not hardware control, so I suspect that this is a problem with MACH or the AOS porting base again. I won't comment on key placement.

>I am disappointed that the RT expansion bus is an AT bus. The AT form factor
>limits the size of peripheral cards, and limits the power they have available.
>Ever heard of an SMD controller for the AT?? 16 line serial line card??

I use the Anvil Systems Stallion 16 line card. It has a 186 on it, and all of the cooking and stuff is offloaded to the card. Communication to the card is at the ioctl()/read()/write() level. Requires a special I/O driver, of course, which is available from Anvil for AIX. This is not a product endorsement. At the Unix Expo, I saw a lot of 16 line serial cards that ran on the AT bus, as well as some that fit in the smaller form factor of the MCA bus card. Of course, they required special connectors.

We sell a SCSI adapter, but not an SMD adapter, as far as I know, although I understand that you can get conversion cards.

>These problems will be worse with smaller MCA cards. We will throw away the
>RT boxes when they are obsolete. We will NOT throw away the SUN 3/160 boxes
>when the 3/160 is no longer interesting.

Throw one my way? I'll be glad to drive over and pick it up. :-)

>Many of the things that I dislike about the RT could be fixed in the future.
>Today we give APC's to people who cannot afford SUNs. We give old RT's to
>people to punish them :-)
>
> Lawrence Butcher @g.gp.cs.cmu.edu
>--

I believe that you are correct in your assessment: the problems are fixable. Some are already fixed, in that I think that the 19 inch Megapel is enough real-estate to edit on, and that other problems you mentioned are being fixed, through our efforts and through the efforts of third parties.
I also believe that at least some of these problems you mention are software related, perhaps even Mach specific, and that the RT can't be blamed for them. Since this has gotten away from comp.arch, I've directed followups to comp.sys.ibm.pc.rt. -- Nick Simicich --- uunet!bywater!scifi!njs --- njs@ibm.com (Internet)
rick@pcrat.UUCP (Rick Richardson) (12/03/88)
In article <3736@pt.cs.cmu.edu> butcher@g.gp.cs.cmu.edu (Lawrence Butcher) writes:
>Mr. Sauer's benchmark people should be aware that the current version of the
>Dhrystone benchmark is version 2.0. Version 1.1 numbers are not timely.
>
>My measured Dhrystone numbers (Dhry 2.0) are (roughly):
>
>2191 for the (RT, ROMP, 6150, Model 025, at 6 MHz) using mc
>3176 for the 6152 using mc
>3551 for the (APC, 6151, 125, at 10 MHz) using pcc
>4474 for the (APC, 6151, 125 at 10 MHz) using mc

Looking back on the last list of 1.1 results I put together, it seems pretty clear that Sauer's "Advanced C" numbers are running with all possible optimization turned on. Which makes them pretty much useless as an indication of anything other than that the optimizer people have been at work.

Dhrystone 2.1 is not as easily fooled by optimizers. How about posting some 2.1 numbers for the record?
--
Rick Richardson | JetRoff "di"-troff to LaserJet Postprocessor|uunet!pcrat!dry2
PC Research,Inc.| Mail: uunet!pcrat!jetroff; For anon uucp do:|for Dhrystone 2
uunet!pcrat!rick| uucp jetroff!~jetuucp/file_list ~nuucp/.    |submission forms.
jetroff Wk2200-0300,Sa,Su ACU {2400,PEP} 12013898963 "" \d\r\d ogin: jetuucp
rpd@RPD.MACH.CS.CMU.EDU (Richard Draves) (12/05/88)
In article <447@scifi.UUCP> njs@scifi.UUCP (Nicholas J. Simicich) writes:
>At IBM T.J. Watson Research, we have a number of RT's running Mach. A
>simple C program running CPU bound with a working set of around 300k
>running niced makes it impossible to do any other work on the machine.
>This does not happen on either the AOS or AIX machines we have. The
>operating system seems to be the sole difference I can come up with.
>I believe that the Mach operating system runs well on a number of
>other machines and suspect that it is simply a matter of tuning.

I know of one performance gotcha with Mach on RTs. The RT MMU only allows sharing of segments. Mach VM is more general and allows sharing of pages. However, it should still notice when what is being shared is in fact an entire segment (notably, text segments) and use a common segment to implement the sharing. However, it doesn't do this. (The interface between the machine-independent and machine-dependent VM code makes it difficult to figure out that this is possible/desirable.) Instead, each address space is composed of different segments. Because the RT architecture only allows a page to be in one segment at a time, when a process uses a shared page it may take a "translation fault" which moves the page into the right segment. These faults are pretty expensive; on a Model 25 RT they take more than a millisecond. For example, every time our csh runs a command about 80 of these faults occur. Rich Sanzi recently greatly improved the translation-fault handling time, but it is still an unfortunate performance hit.

I dug out my copy of Dhrystone 1.1 and tried to reproduce Sauer's numbers.

            Sauer   Draves
Model 25     4000     3270
Model 125    8300     7855
Model 135   10400     9765

I used hc2.1d and ran the tests single-user. Problems with VM don't explain the discrepancies. (I wonder why the Model 25 number is especially far off?) Is there some compiler better than hc2.1d? Do AIX and AOS get different numbers?

Rich Draves
--
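The translation-fault behaviour described in the post above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not the Mach pmap code or the real RT MMU interface, the class and function names are hypothetical, and the only rule it models is the one stated in the post, that a page can belong to one segment at a time. It counts how often a page has to be moved between segments when two address spaces alternate over the same shared pages, and contrasts that with both using one common segment.

```python
class ToySegmentedMMU:
    """Toy model: a physical page can be mapped in only one segment at a time."""

    def __init__(self):
        self.owner = {}               # physical page -> segment currently mapping it
        self.translation_faults = 0   # number of times a page had to change segment

    def touch(self, segment, page):
        """Access `page` through `segment`, remapping it if another segment holds it."""
        current = self.owner.get(page)
        if current is None:
            self.owner[page] = segment       # first mapping, not counted as a remap
        elif current != segment:
            self.translation_faults += 1     # page moves to the accessing segment
            self.owner[page] = segment


def simulate(shared_pages, rounds, share_segment=False):
    """Two address spaces take turns touching the same shared pages."""
    mmu = ToySegmentedMMU()
    if share_segment:
        segments = ("common_text", "common_text")   # one segment used by both
    else:
        segments = ("parent_text", "child_text")    # a private segment each
    for _ in range(rounds):
        for segment in segments:
            for page in range(shared_pages):
                mmu.touch(segment, page)
    return mmu.translation_faults


FAULT_COST_MS = 1.0   # lower bound quoted in the post for a Model 25

faults = simulate(shared_pages=80, rounds=1)
print(f"private segments: {faults} faults, at least {faults * FAULT_COST_MS:.0f} ms")
print(f"common segment:   {simulate(80, 1, share_segment=True)} faults")
```

With the post's figures (about 80 faults per command at more than a millisecond each), the overhead is on the order of 80 ms or more per command, which a common segment would avoid in this simplified model.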
njs@scifi.UUCP (Nicholas J. Simicich) (12/05/88)
In article <3764@pt.cs.cmu.edu> rpd@RPD.MACH.CS.CMU.EDU (Richard Draves) writes:
..............
>I dug out my copy of Dhrystone 1.1 and tried to reproduce Sauer's numbers.
>            Sauer   Draves
>Model 25     4000     3270
>Model 125    8300     7855
>Model 135   10400     9765
>
>I used hc2.1d and ran the tests single-user. Problems with VM don't explain
>the discrepancies. (I wonder why the Model 25 number is especially far off?)
>Is there some compiler better than hc2.1d? Do AIX and AOS get different
>numbers?
..............

I haven't asked Charlie, but I suspect that he would have used the Advanced C Compiler under AIX for his figures. I haven't compared the compilers, personally. This is a totally different compiler with totally different numbers, I presume. Charlie?
--
Nick Simicich --- uunet!bywater!scifi!njs --- njs@ibm.com (Internet)
friedl@vsi.COM (Stephen J. Friedl) (12/06/88)
In article <628@pcrat.UUCP>, rick@pcrat.UUCP (Rick Richardson) writes:
< [...]
< Which makes them pretty much useless as an indication
< of anything other than that the optimizer people have been
< at work.
<
< Dhrystone 2.1 is not as easily fooled by optimizers. How
< about posting some 2.1 numbers for the record?
This looks like the traditional battle between the people
who make tank armor and those who make anti-tank weapons...
--
Stephen J. Friedl 3B2-kind-of-guy friedl@vsi.com
V-Systems, Inc. attmail!vsi!friedl
Santa Ana, CA USA +1 714 545 6442 {backbones}!vsi!friedl
Nancy Reagan on my new '89 Mustang GT Convertible: "Just say WOW!"
sauer@auschs.UUCP (Charlie Sauer) (12/06/88)
In article <471@scifi.UUCP>, njs@scifi.UUCP (Nicholas J. Simicich) writes:
> I haven't asked Charlie, but I suspect that he would have used the
> Advanced C Compiler under AIX for his figures. I haven't compared the
> compilers, personally. This is a totally different compiler with
> totally different numbers, I presume. Charlie?

Yes, it was the Advanced C Compiler, with AIX 2.2, that was used for those numbers, as a footnote (Advanced C, which implies AIX) in the posting indicated.
--
C.H. Sauer  IBM Advanced Workstations Div.  uucp:   cs.utexas.edu!ibmaus!sauer
            11400 Burnet Road, D75/802      822:    @CS.UTEXAS.EDU:sauer@ibmaus.uucp
            Austin, Texas 78758-2502        aesnet: sauer@auschs
            (512) 823-3692                  vnet:   SAUER at AUSVM6