mash@mips.COM (John Mashey) (03/11/89)
In article <93452@sun.uucp> garner@sun.UUCP (Robert Garner) writes:
.....
>The judgement so far is that the 22% improvement must be coming from
>the FORTRAN version. As I've never seen a FORTRAN version of
>Dhrystone, does anyone at Intel have the source that they could
>post on the net? Or was the reference to the Fortran
>compiler a typo? (The Performance brief remarks:
>"Dhrystone was developed in ADA by R. Weicker in 1984.
>Fortran and C versions of the benchmark are more commonly used.")
>It will be interesting to see how pointers and structures are handled.
>Also, which Fortran library routine was used to do the string copies?

If it is indeed true that this is no typo, it is fascinating,
as it is well-known that C's byte-by-byte copy can add 25-30% over
(for example) PASCAL, or anything that has fixed-length character strings.
(On an R3000, 34% of Dhrystone is in strcmp & strcpy.)
Of course it depends on what kind of code-generation is done, also,
i.e., in-line versus out-of-line.

In reading the Intel i860(TM) performance document, it is interesting
to note that 7 benchmarks are presented:
	Dhrystone 1.1 and 2.1, Stanford Integer, SP & DP Whetstone,
	and FORTRAN and Coded LINPACK.
Of these, the document claims that Dhrystone was in FORTRAN, and
Stanford and LINPACK were simulated (with zero-wait-state memory;
this makes little difference to Stanford, as it mostly fits in the cache.)

That means that, in terms of published benchmarks that seem apples-to-apples
comparable, the sum total is: SP & DP Whetstone.

On Feb 27, Green Hills announced it was shipping C and FORTRAN compilers
for the i860, so it does seem a little strange that C wouldn't have
been used. Maybe it is a typo, and there's some other reason.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:	408-991-0253 or 408-720-1700, x253
USPS:	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
clif@intelca.intel.com (Ken Shoemaker) (03/14/89)
In article <15074@winchester.mips.COM>, mash@mips.COM (John Mashey) writes:
> In article <93452@sun.uucp> garner@sun.UUCP (Robert Garner) writes:
> .....
> >The judgement so far is that the 22% improvement must be coming from
> >the FORTRAN version. As I've never seen a FORTRAN version of

Mashey writes
>
> If it is indeed true that this is no typo, it is fascinating,
> as it is well-known that C's byte-by-byte copy can add 25-30% over
> (for example) PASCAL, or anything that has fixed-length character strings.
> (On an R3000, 34% of Dhrystone is in strcmp & strcpy.)

> In reading the Intel i860(TM) performance document, it is interesting
> to note that 7 benchmarks are presented:
>	Dhrystone 1.1 and 2.1, Stanford Integer, SP & DP Whetstone,
>	and FORTRAN and Coded LINPACK.
>
> That means that, in terms of published benchmarks that seem apples-to-apples
> comparable, the sum total is: SP & DP Whetstone.

The i860 CPU benchmark report had a TYPO: the Dhrystone benchmark used
the Green Hills C compiler, not FORTRAN.
Sorry to disappoint everyone who thought that we were getting great
Dhrystone numbers by rewriting the benchmark in FORTRAN.

As for the simulated numbers versus actual numbers: we have an excellent
correlation (within 3%) between simulated numbers and actual numbers.

My speculation (note the word speculation) as to why the Dhrystone
numbers are so good is:

	Clock Frequency
	128-bit loads for string instructions
	The clocks/instruction is 1 (I imagine other RISC chips
	approach 1 clock/instruction but don't actually obtain it)

Clif Purkiser
Intel Corp.

The above views are mine and don't represent Intel's official position.
mash@mips.COM (John Mashey) (03/14/89)
In article <210@intelca.intel.com> clif@intelca.intel.com (Ken Shoemaker) writes:
...
>The i860 CPU benchmark report had a TYPO: the Dhrystone benchmark used
>the Green Hills C compiler, not FORTRAN.
>Sorry to disappoint everyone who thought that we were getting great
>Dhrystone numbers by rewriting the benchmark in FORTRAN.
>
>As for the simulated numbers versus actual numbers: we have an excellent
>correlation (within 3%) between simulated numbers and actual numbers.
>
>My speculation (note the word speculation) as to why the Dhrystone
>numbers are so good is:
>
>	Clock Frequency
>	128-bit loads for string instructions
>	The clocks/instruction is 1 (I imagine other RISC chips
>	approach 1 clock/instruction but don't actually obtain it)

Thanks for the correction; that certainly saves wasting some time.

1) Can you say any more words on simulations? I.e., everybody
understands that the memory system is irrelevant for almost-100%-cache-hit
programs [Dhrystone, Stanford, Whetstone], but we'd be surprised that
a 5-wait-state machine (the measured one) and the zero-wait-state machine
(the simulated one) would be within 3% on DP LINPACK, given the
speed of the basic FP ops. Could the zero-wait-state thing also be a typo?

2) OK, I give up. There must be something unbelievably clever going on
to use 128-bit loads for C-language string operations. I've looked
at the i860 Programmer's Reference Manual a bunch, trying to figure
out how to use either the FP unit or the graphics unit to do this.
The string copy on page 9-5 of the manual is the "natural" strcpy
(which doesn't use anything but byte load/store, and takes about 5 cycles/
byte). I haven't been able to find anything like "branch on any byte zero",
and the 860 doesn't have unaligned word operations. For a fair test, you MUST
use str* that only assume byte alignment of operands, and
you can't inline the str*. The only place I can think of using 128-bit
loads is in the structure-copy, and it shouldn't be used there,
unless structures whose largest entities are words are always aligned
to 4-word boundaries, which seems unlikely.

3) Anyway, various people at various companies still can't figure
out why the number can reasonably be this high, under the
normal rules, UNLESS there's some really slick trick for
getting strcpy and strcmp down around 2 cycles/byte.
There just aren't enough differences between an R3000 and an 860,
on this benchmark, to account for this otherwise.
[Everything fits in the caches; an 860 wins some places, an R3000
wins in some places; the R3000 has essentially no write stalls on this
benchmark, so the difference between write-thru and writeback is irrelevant;
etc. Since something like 40% of the time is spent in str*, and the
rest is spread around, it's really the major place to look.]

Maybe somebody at Intel would care to post the str* routines
and educate us?
--
-john mashey
cprice@mips.COM (Charlie Price) (03/16/89)
In article <93088@sun.uucp> garner@sun.UUCP (Robert Garner) writes:
>Question: What's going on with the i860 Dhrystone/MHz ratio?
>
>The Intel "i860 Processor Performance" brief--Release 1.0, March 89--shows
>82,900 Dhrystones/sec for version 1.1 for a scaled 40-MHz i860.
>With compiler improvements and elimination of "errata on the current
>stepping of the i860 processor", it says they expect to push the value to 90K.
>
>According to the paper, 69K Dhrystones/sec were measured on a Compaq 386/20
>add-in card with a 33-MHz i860 and 8MB of SCRAM (0-wait cycles for hits,
>5-W for read miss, and 2-W for write misses).
>
>As a sanity check to a similar micro-architecture implementation (split
>i&d caches, 1-cycle load/store), the 25-MHz R3000 value is 42,300
>Dhrystones/sec (w/ MIPS -O3, i.e., interprocedural register allocation).
>
>Since the i860 integer/cache micro-architecture is so similar to
>the R3000 integer/cache micro-architecture, and assuming that Intel's
>compiler technology is not significantly better than MIPSCo's,
>shouldn't an i860 value equal a scaled R3000 value?
>
>Scaling the 25-MHz R3000 value up to 40 MHz gives 67,680 Dhrystones/sec.
>So where did Intel get the extra 22% ?

The 25 MHz R3000 can actually get 51,800 1.1 dhrystones per second.
This number scales to 82,800 at 40 MHz, within 100 of Intel's figure.
The 51.8K number, however, is beyond the spirit of the benchmark.

What do dhrystone numbers mean?
MIPS has maintained for a long time that they are not especially
meaningful numbers.
Our existing results show that you need a lot of context to
have any idea what a number is telling you.

MIPS has just released Performance Brief 3.6 (March 89 --
I assume that Mashey will post it sometime)
and it has an interesting set of numbers for dhrystone.
There are numbers for two different compiler releases
and one number for the newer compilers with assembly language
versions of strcpy() and strcmp().

dhrystone 1.1 -- M/2000-8 (25 MHz R3000) (numbers are Kilo-loops)

                          Default opt   -O     -O3    -O4
1.31 compiler              32.4 (K)    39.7   42.3   45.3
2.00 compiler              32.6 (K)    39.7   43.1   46.7
2.00 with new str rtns                        47.4
2.00 with new str rtns                               51.8
  (my measurement, not in Brief)

dhrystone 2.1 -- M/2000-8 (25 MHz R3000) (numbers are Kilo-loops)

                          Default opt   -O     -O3    -O4
1.31 compiler              33.0 (K)    36.7   38.8   42.8
2.00 compiler              32.4 (K)    36.7   39.4   43.2

>(1) More aggressive compiler optimizations?
>MIPSCo's -O4 value, which according to MIPSCo's Performance Brief
>"is beyond the spirit of the benchmark", is 45,300 Dhrystones/sec.
>Scaled to 40 MHz, this still falls short at 72,480 Dhrystones/sec.

MIPS believes (and says in the Performance Brief) that -O4 is
beyond the spirit of the benchmark.
We include the -O4 results to show what is possible,
and for illumination of what a dhrystone figure without
context can mean since not all quoted figures are

The "-O4" optimizations give 7.1% for the 1.31 compiler,
8.4% for the 2.00 compiler with string routines in C,
and 9.3% for the 2.00 compiler with strcmp() and strcpy()
in assembler.

If you don't know exactly what the optimizations are,
it is hard to say what a dhrystone number might mean.

>(2) Faster string copy/compare with graphics instructions?

This can be quite important.
For dhrystone 1.1 on an M/2000, assembly-language routines give
a 9.3% increase for -O3 (within the spirit of the benchmark),
and an 11.6% increase for -O4 (too much optimization).

I believe that these lib routines are just assembler,
not especially tuned for the dhrystone string lengths.
If you wanted faster dhrystone numbers, you could get
them with string routines that worked generally, but especially
well for the specific length operations that dhrystone does.

Again, if you don't know much about the libraries, you can't
determine what the dhrystone number is telling you.

The range of dhrystone figures for a single MIPS machine
might be interesting because it tells you something about the compilers,
but a single "dhrystones for this machine" figure just doesn't mean much.
--
Charlie Price    cprice@mips.com        (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA   94086
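
The clock-scaling and percentage arithmetic used in the two posts above can be sketched in a few lines of C. This is only an illustration: the function names are made up for the sketch, and linear scaling with clock rate is itself an assumption that holds only while the benchmark stays cache-resident.

```c
#include <assert.h>

/* Scale a measured Dhrystones/sec figure from one clock rate to another.
   Assumes throughput is proportional to clock frequency, which is only
   plausible for a program that runs almost entirely out of the caches. */
double scale_by_clock(double dhry_per_sec, double from_mhz, double to_mhz)
{
    assert(from_mhz > 0.0);
    return dhry_per_sec * to_mhz / from_mhz;
}

/* Percentage gain of one result over another, e.g. -O4 over -O3. */
double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}
```

With the thread's figures, scale_by_clock(42300.0, 25.0, 40.0) gives the 67,680 that Garner quotes and scale_by_clock(45300.0, 25.0, 40.0) gives his 72,480, while percent_gain(42.3, 45.3) comes out near the 7.1% quoted for the 1.31 compiler.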
hanko@masscomp.UUCP (Jim Hanko) (03/17/89)
In article <15226@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <210@intelca.intel.com> clif@intelca.intel.com (Ken Shoemaker) writes:
>...
>>The i860 CPU benchmark report had a TYPO: the Dhrystone benchmark used
>>the Green Hills C compiler, not FORTRAN.
>>My speculation (note the word speculation) as to why the Dhrystone
>>numbers are so good is:
...
>>
>>	128-bit loads for string instructions
>
>
>2) OK, I give up. There must be something unbelievably clever going on
>to use 128-bit loads for C-language string operations. ...
>... For a fair test, you MUST
                      ^^^^^^^^
>use str* that only assume byte alignment of operands, and
>you can't inline the str*. ...
>
>3) Anyway, various people at various companies still can't figure
>out why the number can reasonably be this high, under the
>normal rules, UNLESS there's some really slick trick for
>getting strcpy and strcmp down around 2 cycles/byte.

A couple of years ago I investigated the output of the Green Hills C
compiler on the Dhrystone benchmark (for a different architecture). I
remember being somewhat surprised to see that the compiler had inlined
the strcpy calls. It could do this since most of the calls were of the
form:

	strcpy(x, "a constant string");

I believe that it did not actually copy the bytes from memory but
loaded long immediate values and stored them.

Although strcpy is extensively called with string constants in
Dhrystone, this is relatively rare in real programs. Therefore, such a
compiler feature seems to be targeted specifically to Dhrystone.

I can't say that the Intel version of the compiler has this
"optimization" (or, if it did, that Intel knew about it), but this may
explain the high numbers. Can anyone with access to the compiler check this?

I think it would clearly be unfair to compare Dhrystone numbers where this
trick was used to those where a strcpy subroutine was called.

- #include <std/disclaimer>
Jim Hanko   {uunet|decvax|harvard|mit-eddie}!masscomp!hanko
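
What Hanko describes can be pictured roughly as follows. This is a hedged sketch, not the Green Hills output: the 7-character constant "BENCHMK", the helper pack4, and the function names are all hypothetical, and the memcpy calls merely stand in for the word stores a compiler would emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack four characters into a 32-bit word in memory order, so the sketch
   does not depend on the host's byte order. A compiler would fold each
   call below into a single immediate constant. */
static uint32_t pack4(char a, char b, char c, char d)
{
    const char bytes[4] = { a, b, c, d };
    uint32_t w;
    memcpy(&w, bytes, sizeof w);
    return w;
}

/* Out-of-line form: strcpy loads every source byte and tests it for NUL. */
void copy_by_call(char *dst)
{
    assert(dst != NULL);
    strcpy(dst, "BENCHMK");              /* hypothetical constant, 7 chars + NUL */
}

/* Inlined form: the source is a literal, so its bytes and its length are
   known at compile time. Two word stores, no source loads, no loop. */
void copy_by_stores(char *dst)
{
    const uint32_t w0 = pack4('B', 'E', 'N', 'C');
    const uint32_t w1 = pack4('H', 'M', 'K', '\0');
    assert(dst != NULL);
    memcpy(dst, &w0, sizeof w0);         /* stands in for one 32-bit store */
    memcpy(dst + 4, &w1, sizeof w1);     /* stands in for a second store   */
}

/* Returns 1 when both forms leave identical bytes in the destination. */
int copies_match(void)
{
    char a[8], b[8];
    copy_by_call(a);
    copy_by_stores(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

The two forms produce the same bytes, but the second does none of the per-byte work that the benchmark is nominally timing, which is the substance of the fairness complaint.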
chase@Ricerca.orc.olivetti.com (David Chase) (03/17/89)
In article <955@masscomp.UUCP> hanko@masscomp.UUCP (Jim Hanko) writes:
>Although strcpy is extensively called with string constants in
>Dhrystone, this is relatively rare in real programs. Therefore, such a
>compiler feature seems to be targeted specifically to Dhrystone.
>
>I think it would clearly be unfair to compare Dhrystone numbers where this
>trick was used to those where a strcpy subroutine was called.

Get real. Nobody with half a brain should trust silly little
benchmark programs that reduce performance to a single number.
Develop benchmarks based on real code that does real work, and perhaps
compiler writers will target all those "unfair"
optimizations at code that people actually use. Procedure inlining is
"not in the spirit of Dhrystone", but it would be stupid not to use it
for real programs if it was reasonably implemented.

When I want to compare processors, I run the programs that I use every
day. The one that works best on those is the one that works best for me.

David
aglew@mcdurb.Urbana.Gould.COM (03/18/89)
>Although strcpy is extensively called with string constants in >Dhrystone, this is relatively rare in real programs. > >Jim Hanko {uunet|decvax|harvard|mit-eddie}!masscomp!hanko Have you got a source for this, or can you post numbers?
jesup@cbmvax.UUCP (Randell Jesup) (03/18/89)
In article <15226@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>In article <210@intelca.intel.com> clif@intelca.intel.com (Ken Shoemaker) writes:
>...
>>The i860 CPU benchmark report had a TYPO: the Dhrystone benchmark used
>>the Green Hills C compiler, not FORTRAN.
>>Sorry to disappoint everyone who thought that we were getting great
>>Dhrystone numbers by rewriting the benchmark in FORTRAN.
...
>>My speculation (note the word speculation) as to why the Dhrystone
>>numbers are so good is:
>>
>>	Clock Frequency
>>	128-bit loads for string instructions
>>	The clocks/instruction is 1 (I imagine other RISC chips
>>	approach 1 clock/instruction but don't actually obtain it)
...
>2) OK, I give up. There must be something unbelievably clever going on
>to use 128-bit loads for C-language string operations. I've looked
...
>doesn't have unaligned word operations. For a fair test, you MUST
>use str* that only assume byte alignment of operands, and
>you can't inline the str*. The only place I can think of using 128-bit
>loads is in the structure-copy, and it shouldn't be used there,
>unless structures whose largest entities are words are always aligned
>to 4-word boundaries, which seems unlikely.

	Actually, I think the statement "Green Hills C" was the giveaway.
We use Green Hills C here at Commodore for Amiga OS work, and got bitten
recently because the compiler was set up with the "dhrystone" optimizer
turned on, without our knowing it. This causes mis-aligned strcpy()s to
bus-fault on the 68000, since it (a) assumes string sources AND destinations
are ALWAYS word-aligned, and (b) inlines strcpy, even though in general
Green Hills doesn't do inlining.

	So I suspect the differences are being caused by the "dhrystone" switch
in Green Hills.

--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
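
The alignment assumption Jesup describes can be made concrete with a small sketch. It assumes the 68000's 2-byte word alignment; the names is_word_aligned, guarded_copy, and guarded_copy_demo are invented for the illustration, and the memcpy calls stand in for 16-bit loads and stores. It shows the check that the "dhrystone" setting evidently skipped, not what Green Hills actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True when p sits on a 2-byte boundary (68000 word alignment). */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 1u) == 0;
}

/* Copy n bytes, taking the word-at-a-time path only when BOTH pointers
   are word-aligned. Returns 1 if the wide path was taken, 0 otherwise. */
int guarded_copy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (is_word_aligned(dst) && is_word_aligned(src)) {
        size_t i;
        for (i = 0; i + 2 <= n; i += 2) {
            uint16_t w;
            memcpy(&w, src + i, sizeof w);   /* stands in for a 16-bit load  */
            memcpy(dst + i, &w, sizeof w);   /* stands in for a 16-bit store */
        }
        if (i < n)
            dst[i] = src[i];                 /* odd trailing byte */
        return 1;
    }
    memcpy(dst, src, n);                     /* byte-safe fallback */
    return 0;
}

/* Exercises both paths; returns 1 if each copied correctly. */
int guarded_copy_demo(void)
{
    uint16_t sbuf[8] = { 0 }, dbuf[8] = { 0 };   /* 2-byte-aligned storage */
    char *src = (char *)sbuf;
    char *dst = (char *)dbuf;
    int wide, narrow;

    strcpy(src, "i860");                         /* 5 bytes including NUL */
    wide = guarded_copy(dst, src, 5);            /* both pointers aligned */
    if (wide != 1 || strcmp(dst, "i860") != 0)
        return 0;
    narrow = guarded_copy(dst + 1, src, 5);      /* odd destination address */
    return narrow == 0 && strcmp(dst + 1, "i860") == 0;
}
```

Dropping the alignment test saves a few instructions per call, which matters when the strings are short, but on a machine that faults on misaligned word access it turns any odd-addressed string into a bus error.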
mash@mips.COM (John Mashey) (03/18/89)
In article <39388@oliveb.olivetti.com> chase@Ricerca.UUCP (David Chase) writes:
>In article <955@masscomp.UUCP> hanko@masscomp.UUCP (Jim Hanko) writes:
....
>>I think it would clearly be unfair to compare Dhrystone numbers where this
>>trick was used to those where a strcpy subroutine was called.
...
>Get real. Nobody with half a brain should trust silly little
>benchmark programs that reduce performance to a single number.
>Develop benchmarks based on real code that does real work, and perhaps
>compiler writers will target all those "unfair"
>optimizations at code that people actually use. Procedure inlining is
>"not in the spirit of Dhrystone", but it would be stupid not to use it
>for real programs if it was reasonably implemented.
>
>When I want to compare processors, I run the programs that I use every
>day. The one that works best on those is the one that works best for me.

A lot of us do this, and we also try to publish it.

So far, Intel:
	-published the results of EXACTLY ONE integer benchmark (Dhrystone
	1.1 & 2.1) actually measured on this machine (@ 33MHz)
	-features this number prominently in its marketing claims
	(well, actually, it features the numbers that would be gotten at 40MHz,
	or sometimes 50MHz)
	-uses it frequently to claim superiority over other processors
	-in its performance document, describes Dhrystone WITHOUT THE SLIGHTEST
	TRACE OF CAVEATS about the care with which these results must be
	interpreted, despite the fact that the Dhrystone sources give such
	caveats, and that the Dhrystone table in the Intel document comes
	straight from a document that takes great pains to warn the reader
	to be very careful about interpretation.

However, Olivetti is not Intel.
Since you work at a site well-known to be working on i860s, perhaps you
could suggest for us the "programs that you use every day" that you
run on an i860, and performance thereof, so we could have a better shot
at evaluating its performance. You could really help the cause of
realistic benchmarking by posting sources of real programs
(if any could be public domain) and their i860 times.....

Anyway, it's no wonder that many users of computers trust vendors about
as far as they can throw them....
--
-john mashey
mash@mips.COM (John Mashey) (03/18/89)
In article <6326@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
...
>	Actually, I think the statement "Green Hills C" was the giveaway.
>We use Green Hills C here at Commodore for Amiga OS work, and got bitten
>recently because the compiler was set up with the "dhrystone" optimizer
>turned on, without our knowing it. This causes mis-aligned strcpy()s to
>bus-fault on the 68000, since it (a) assumes string sources AND destinations
>are ALWAYS word-aligned, and (b) inlines strcpy, even though in general
>Green Hills doesn't do inlining.
>
>	So I suspect the differences are being caused by the "dhrystone" switch
>in Green Hills.

Can you say more? Do you mean that this is a compile-time switch,
but the default was set up to do the strcpy this way? (You clearly must be
able to turn it off, since some real programs will fail if this is done
generally. Maybe it only inlines strcpy when all the conditions are right
(and this was a bug)?) Or is this something gen'd into some compilers,
but not others? In any case, do you (or anybody) know the option names
for turning this effect ON/OFF? (maybe they're different across CPUs?)
--
-john mashey
cquenel@polyslo.CalPoly.EDU (56 more school days) (03/19/89)
In article <6326@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>	Actually, I think the statement "Greenhills C" was the giveaway.
>We use Greenhills C here at Commodore for Amiga OS work, and got bitten recently
>because the compiler was set up with the "dhrystone" optimizer turned on,
>without our knowing it. This causes mis-aligned strcpy()s to bus-fault on
>68000, since it (a) assumes string sources AND destinations are ALWAYS word-
>aligned, and (b) inlines strcpy, even though in general greenhills doesn't
>do inlining.

Ghreenstones -- The benchmark number corresponding to
		Dhrystones 1.1 run through a "dhrystone" optimizing
		Greenhills compiler. Known to be 2 to 3 times
		higher than it should be.

--chris
(I didn't write this. You can't sue me, I don't have any money!
 P.S. Greenhills deserves it.)
--
@---@  ------------------------------------------------------------------  @---@
\. ./  | Chris Quenelle (The First Lab Rat)  cquenel@polyslo.calpoly.edu |  \. ./
 \ /   |                  Better Red than dead !                         |   \ /
==o==  ------------------------------------------------------------------  ==o==
w-colinp@microsoft.UUCP (Colin Plumb) (03/19/89)
mash@mips.COM (John Mashey) wrote:
> 2) OK, I give up. There must be something unbelievably clever going on
> to use 128-bit loads for C-language string operations. I've looked
> at the i860 Programmer's Reference Manual a bunch, trying to figure
> out how to use either the FP unit or the graphics unit to do this.

Yeah... the Z-buffer check instructions could be used for this, but they're
only available in 16 and 32-bit versions, and you have to test the bits from
the psr, two cycles. And even that would only be 64 bits at a time.

> The string copy on page 9-5 of the manual is the "natural" strcpy
> (which doesn't use anything but byte load/store, and takes about 5 cycles/
> byte). I haven't been able to find anything like "branch on any byte zero",
> and the 860 doesn't have unaligned word operations. For a fair test,
> you MUST use str* that only assume byte alignment of operands, and
> you can't inline the str*. The only place I can think of using 128-bit
> loads is in the structure-copy, and it shouldn't be used there,
> unless structures whose largest entities are words are always aligned
> to 4-word boundaries, which seems unlikely.

Well, you can quickly cobble together some code using the

	((x-0x01010101)&~x)&0x80808080 != 0

trick that works on words at a time. This would help in Dhrystone which,
as has been observed, has unnaturally long strings. If you get this going
with a bit of alternation to allow for load latency, you can get strlen down
to about 5 cycles/word. Strcmp and strcpy would be slower, but would probably
be bandwidth-limited.

As for structure copies, what you want is for all structures 4 words or larger
in size to always be 4-word aligned. Intel suggests the stack is kept this
way, for just the same reason. (I admit this is starting to enter the realm
of declining returns - you can waste a lot of memory this way - but it is
still feasible.)

> Maybe somebody at Intel would care to post the str* routines
> and educate us?

I posted the instruction set - it's an exercise for the reader. :-)
--
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me. I never do." - The Doctor
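
A minimal C sketch of the word-at-a-time zero-byte test quoted in the post above. The function names are made up for the illustration. Note that the expression needs explicit parentheses when written in C, because `!=` binds more tightly than `&`. The scan works on an array of 32-bit words so that it never reads outside its buffer; a real strlen would also have to deal with the unaligned head of the string and then locate the NUL inside the word that tested positive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero exactly when at least one of the four bytes of x is zero.
   Subtracting 1 from each byte sets its top bit only if the byte was
   zero (borrow) or already had its top bit set; "& ~x" discards the
   latter case for bytes below any borrow. */
int has_zero_byte(uint32_t x)
{
    return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
}

/* Index of the first word containing a zero byte, or nwords if none. */
size_t first_word_with_nul(const uint32_t *words, size_t nwords)
{
    size_t i;
    assert(words != NULL || nwords == 0);
    for (i = 0; i < nwords; i++) {
        if (has_zero_byte(words[i]))
            break;
    }
    return i;
}
```

One test per word replaces four per-byte tests, which is where the hoped-for gain on long strings comes from; as Mashey points out, though, it only applies once the operands are known to be word-aligned.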
chase@Ozona.orc.olivetti.com (David Chase) (03/21/89)
In article <15475@winchester.mips.COM> mash@mips.COM (John Mashey) writes:
>Since you work at a site well-known to be working on i860s, perhaps you
>could suggest for us the "programs that you use every day"

X11, GNU emacs, make, cc (as, ld), tex, dvi2ps, iptex (widely
accessible), m2c, m2l, m2make, m3cfe, m3be (not accessible, not all
portable either).

>that you run on an i860,

None -- several very important ones only run on 68K boxes, and have
not even been ported to SunOS 4.*. I tend to be conservative in
moving to new hardware and software.

>and performance thereof,

I'm not sure, but I don't think I could even if I had the numbers.
When in doubt, I don't comment publicly on Olivetti endeavors, or on
companies with whom we appear to be working. I seem to recall (from
my student days) agreements with DEC and IBM barring me from
publishing benchmarks or bug reports for products not yet released to
the rest of the world, so I'll assume that such an agreement probably
holds here.

In general, I think that I am I/O- and bad software-bound, not
CPU-bound. (That is, I am CPU bound, but only because some critical
software is Really Stupid. It would be helped by a faster cpu, but
that solution (though practical and probably the one I'll use) offends
me.)

Benchmarks will appear via e-mail.

David
cramer@sun.com (Sam Cramer) (03/21/89)
In article <15475@winchester.mips.COM>, mash@mips (John Mashey) writes:
>So far, Intel:
>	-published the results of EXACTLY ONE integer benchmark (Dhrystone
>	1.1 & 2.1) actually measured on this machine (@ 33MHz)
>	-features this number prominently in its marketing claims
>	(well, actually, it features the numbers that would be gotten at 40MHz,
>	or sometimes 50MHz)
>	-uses it frequently to claim superiority over other processors
>	-in its performance document, describes Dhrystone WITHOUT THE SLIGHTEST
>	TRACE OF CAVEATS about the care with which these results must be
>	interpreted, despite the fact that the Dhrystone sources give such
>	caveats, and that the Dhrystone table in the Intel document comes
>	straight from a document that takes great pains to warn the reader
>	to be very careful about interpretation.

Intel is not the only company to do this. In the "DECstation 3100
Performance Summary" distributed by DEC, the SINGLE integer benchmark shown
is Dhrystone 2.1. The text that accompanies these results contains no
warning regarding the tendency of Dhrystone to overstate "real-world"
integer performance. In fact, the Dhrystone results are used to calculate
"price-performance" ratios relative to two Sun machines (a vital metric for
those users who spend all day running Dhrystones). This section of the
document (titled "CASE and Dhrystone"!) goes on to imply that these
Dhrystone results promise good performance on a CASE workload.

It seems that when DEC acquired RISC technology from MIPS, they overlooked
MIPS's benchmarking know-how.

Sam Cramer	sun!cramer	cramer@sun.com
sclafani@jumbo.dec.com (Michael Sclafani) (03/21/89)
In article <95013@sun.Eng.Sun.COM>, cramer@sun.com (Sam Cramer) writes:
> Intel is not the only company to do this. In the "DECstation 3100
> Performance Summary" distributed by DEC, the SINGLE integer benchmark shown
> is Dhrystone 2.1. The text that accompanies these results contains no
> warning regarding the tendency of Dhrystone to overstate "real-world"
> integer performance.

From the DECstation 3100 Performance Summary / Part 2: Performance Details:

"Dhrystone is widely available, easy to run and is arguably the industry's
most popular Integer benchmark. Unfortunately, the result obtained is
difficult to fairly compare amongst differing computing architectures and is
almost as sensitive to how the Dhrystone executable image is compiled and
linked as it is to the underlying processor speed. The benchmark
documentation presents a set of ground rules for building and executing
Dhrystone. Today, the accepted practice is to run the benchmark under any
environment you wish, as long as the environment is clearly described and
procedure inlining compiler optimization is not employed."

"Dhrystone does not seem to be the best indication of application
performance and is unusual in the following respects:

  + Unusually low dynamic nesting depth of function calls
  + Unusually low number of instructions executed per function call
  + Large percentage of time spent in "strcpy" and "strcmp" routines,
    processing unusually large character strings
  + Character strings are typically alignable on a word boundary
  + Does not show how the use of shared libraries in a real workload with
    multiple concurrent applications affects performance

Results for the Sun-3/60 are not reported because the data in
[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7,
1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which
employs procedure inlining.

We include the Dhrystone benchmark in our performance evaluation because of
its popularity, but warn against using it as the sole basis of comparing
system performance and of accepting results that don't explicitly label how
the benchmark was built and what optimizations were exploited."

The performance summary and technical information are available via
anonymous ftp as compressed postscript from gatekeeper.dec.com in ~ftp/pub:

	ds3100_perf.1a.ps.Z
	ds3100_perf.1b.ps.Z
	ds3100_perf.2.ps.Z
	ds3100_tech.ps.Z

The summary includes Linpack, Whetstone, DR Labs CPU2, Livermore FORTRAN
Kernels, Dhrystone (2.1 AND 1.1), SPICE 2G6, Doduc, Dynamic Graphics TOP
Benchmark, and X11 graphics benchmarks.

Please note that I am not a Digital spokescritter, and any opinions
presented or errors committed are my own.
--
Michael Sclafani        \\\ Digital Equipment Corporation
sclafani@src.dec.com    \\\ Systems Research Center, Palo Alto, CA
(415) 854-7569 (home)   \\\ (415) 853-2271 (work)
david@sun.com (Academy of Pathetic Mail) (03/22/89)
In article <13641@jumbo.dec.com> sclafani@jumbo.dec.com (Michael Sclafani) writes: >Results for the Sun-3/60 are not reported because the data in >[Presentation on Benchmarks given at Sun User Group Conference, Dec 5-7, >1988 by Sun Microsystems, Inc.] uses compiler optimization level 4 which >employs procedure inlining. FYI, this is incorrect. Current Sun C compilers do not perform procedure inlining at any optimization level. -- David DiGiacomo, Sun Microsystems, Mt. View, CA sun!david david@sun.com
cramer@sun.com (Sam Cramer) (03/22/89)
Mr. Sclafani is quite right - I overlooked the section toward the back of the document which he quotes. My apologies for mistakenly claiming that the "DECstation 3100 Performance Summary" contains no caveats regarding Dhrystone. Nonetheless, it is a bit curious that such a flawed benchmark (the results of which are described in the section I missed as "difficult to fairly compare amongst differing computing architectures") is used as the basis of a price/performance comparison. Sam Cramer sun!cramer cramer@sun.com
jg@jumbo.dec.com (Jim Gettys) (03/23/89)
In article <95013@sun.Eng.Sun.COM> cramer@sun.com (Sam Cramer) writes:
>Intel is not the only company to do this. In the "DECstation 3100
>Performance Summary" distributed by DEC, the SINGLE integer benchmark shown
>is Dhrystone 2.1. The text that accompanies these results contains no
>warning regarding the tendency of Dhrystone to overstate "real-world"
>integer performance. In fact, the Dhrystone results are used to calculate
>"price-performance" ratios relative to two Sun machines (a vital metric for
>those users who spend all day running Dhrystones). This section of the
>document (titled "CASE and Dhrystone"!) goes on to imply that these
>Dhrystone results promise good performance on a CASE workload.
>
>It seems that when DEC acquired RISC technology from MIPS, they overlooked
>MIPS's benchmarking know-how.

If you get Mashey's latest performance brief (Issue 3.6), you will find
pretty well complete DECstation 3100 performance numbers for his entire
suite. I sent them to John within a few days of announcement. (I ran
his suite last fall myself).

Our marketeers blew the original summary; I was to review it before it
saw the light of day, but was in the process of moving from California
to Massachusetts. Due to some unfortunate problems getting a copy printed
when I arrived in Cambridge, I was unable to get this problem fixed (only one
integer benchmark) before the first version of the memo saw the light of day.

To first order, you can easily estimate its performance given the fact
it is a 16.667 MHz R2000. Due to differences in the memory subsystem, it is
slightly slower than the MIPS M/120-5.

It was the typical problem of people working to deadlines overlooking
things. The summary had to make a printer's deadline for announcement.
The marketing folks promised to update the original briefs within a couple
of weeks of the original announcement; I believe you will find that current
ones are better.
				- Jim Gettys
mash@mips.COM (John Mashey) (03/23/89)
In article <13645@jumbo.dec.com> jg@jumbo.UUCP (Jim Gettys) writes:
>If you get Mashey's latest performance brief (Issue 3.6), you will find
>pretty well complete DECstation 3100 performance numbers for his entire
>suite. I sent them to John within a few days of announcement. (I ran
>his suite last fall myself).

Yes. Between Uniforum & being knocked flat with the flu, it's taken a while
to post this, although we gave a lot out at Uniforum. I've got
a couple of typos and broken references to fix, and then I'll post 3.7 (soon).
--
-john mashey