fouts@orville.nas.nasa.gov (Marty Fouts) (11/17/87)
In article <916@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>
>0) At the rate of speed this business moves, writers and editors
>are hard-pressed to keep up, even when they try very hard.

I would feel for these editors, except that they do a generally bad job,
both on articles such as those cited and in the way they handle news
releases.  Daily papers, which work against a deadline of less than 24
hours, handle news release rewrites with more accuracy than most industry
monthlies or weeklies.  It wouldn't bother me if it happened occasionally,
but every press release I've seen about my organization has come out badly
garbled.  Knowing the lack of accuracy in what I can check, I've forced
myself to doubt everything I read in these magazines, which makes me like
the man with two clocks.

>Since many of the trade rags are controlled circulation, you
>can't usefully threaten to cancel your subscription!

Actually you can; it just takes a lot more threats before they do
something.  Controlled circulation magazines make their money off of
their advertisers, and the advertising rate depends heavily on how well
the market is targeted.  If enough people quit reading a bad trade
magazine, it will quit being published.

>5) In general, it is hopeless to improve some of the rags, which are
>little above the National Enquirer.  Some of the magazines try very hard,
>even to having their own benchmark suites which they want to watch
>running on a real machine.

A word about magazine benchmarking suites.  Byte magazine ran an article
in the July 1987 issue comparing the 80386 and the 68020 on a suite of
five benchmarks.  All five were flawed in ways that the readership of
this group is well familiar with, but my favorite is one called "float",
which contained code like:

    #define CONST1 3.141597E0
    #define CONST2 1.7839032E4
    #define COUNT  10000

    double a, b, c;
    int i;

    a = CONST1;
    b = CONST2;
    for (i = 0; i < COUNT; ++i)
    {
        c = a * b;  /* These two statements are repeated a total of 12 times */
        c = c / a;  /* "So that the loop overhead is dominated by work" */
    }

where the for loop is supposed to measure the C library's ability to do
double precision floating point.  Over half the compilers I have tried
this code on recognize the loop invariance and constant propagation and
generate code that either statically allocates a, b, and c or emits
simple store instructions at run time, making the code three runtime
instructions.  (Which happen outside the timing loop . . .)
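To make the point concrete, here is a minimal sketch of what such a
compiler is entitled to reduce the whole benchmark to (the printf() is my
addition, so that the result isn't discarded outright):

    #include <stdio.h>

    #define CONST1 3.141597E0
    #define CONST2 1.7839032E4

    main()
    {
        double c;

        /* Every iteration computes the same value, so a compiler doing
         * loop-invariant code motion and constant folding can compute it
         * once (or at compile time) and drop the loop entirely.
         */
        c = (CONST1 * CONST2) / CONST1;
        printf("%f\n", c);
        return 0;
    }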
ram%shukra@Sun.COM (Renu Raman, Sun Microsystems) (11/17/87)
Recently a friend of mine, while hunting for a 386-based PC, was given a
copy of a page from PC WEEK that had benchmarks of various 386 boards.
Apart from the usual VAX-relative MIPS, the sieve, etc., curiously enough
they had a NOOP(!!) number: the time it takes to execute a no-op.  Of
course the NOOP number was not compared with VAXen, but a comparison of
the various 386 boards was given.  I thought this might be interesting
within the context of this topic.

That reminded me of the old joke about trumping NOOPs in a CRAY.  Soon we
may have machine-X-relative NOOP speeds and ....  [Actually there may be
a good use for NOOPs.]

---------------------
Renu Raman                          ARPA: ram@sun.com
Sun Microsystems                    UUCP: {ucbvax,seismo,hplabs}!sun!ram
M/S 5-40, 2500 Garcia Avenue, Mt. View, CA 94043
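A NOOP number presumably comes from something like the following sketch;
the asm() escape is compiler-specific, and (in the spirit of this thread)
note that an optimizer is free to delete the empty calibration loop:

    #include <stdio.h>
    #include <time.h>

    #define COUNT 1000000L

    main()
    {
        long i;
        clock_t t0, t1, t2;

        t0 = clock();
        for (i = 0; i < COUNT; ++i)   /* calibration: loop overhead only */
            ;
        t1 = clock();
        for (i = 0; i < COUNT; ++i)   /* loop overhead plus one no-op */
            asm("nop");
        t2 = clock();

        printf("one NOOP = %g seconds\n",
               ((double)(t2 - t1) - (double)(t1 - t0)) /
                   COUNT / CLOCKS_PER_SEC);
        return 0;
    }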
mash@mips.UUCP (John Mashey) (11/18/87)
In article <3425@ames.arpa> fouts@orville.nas.nasa.gov.UUCP (Marty Fouts) writes:
>In article <916@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>>0) At the rate of speed this business moves, writers and editors
>>are hard-pressed to keep up, even when they try very hard.
>I would feel for these editors, except they do a generally bad job,....

Perhaps this leads to something useful: perhaps (either here or in some
other newsgroup) we should all post examples of what we think are
inaccurate or accurate reporting, and/or good/bad benchmarking.  This
would at least give other people calibrations on believability.

>>Since many of the trade rags are controlled circulation, you
>>can't usefully threaten to cancel your subscription!
>Actually you can, it just takes a lot more threats for them to do
>something....

Unfortunately, if you are a vendor, you MUST continue to get these
things, in self-defense if nothing else...

>>5) In general, it is hopeless to improve some of the rags, which are
>>little above the National Enquirer.  Some of the magazines try very hard,
>>even to having their own benchmark suites which they want to watch
>>running on a real machine.
>
>A word about magazine benchmarking suites.  Byte magazine had an...
>They were all flawed in ways that the readership of this group is well...

Good point.  I'd rate magazines on the following levels (somewhat akin to
the old UNIX novice->guru scale):

1) Novice: believes all vendor mips & flops ratings; publishes same
without even cursory checks.  Thinks whetstones are what you sharpen
knives with.  Doesn't know the difference between single and double
precision.  Not really trying, and glad to hype unsupported claims.

2) Beginner: at least labels vendor mips ratings as "claimed".  Has heard
of LINPACK and the other commonly used benchmarks, and even has some idea
of what they measure, at least that some are integer and some are
floating point.  May still count NOOPs/second.

3) Intermediate: at least has some benchmarks, and wants to see them run
on real machines.  Its benchmarks may have silly flaws, but it can at
least tell the difference between a 4.7MHz 8088 and a 20MHz 386.  A few
of the benchmarks might even be useful, if interpreted carefully.
Trying.

4) Advanced: knows the difference between LINPACK and the Livermore
Loops.  Either has its own (useful) benchmarks, or gives credence to the
more realistic ones that are generally available.  Knows when a geometric
mean should be used.  Trying hard.

5) Wizard: not only does all of 4), but is competent at spotting
benchmark oddities.  Understands the surprises of caches and optimizing
compilers.  Understands the reasons for skepticism and publishes same.
Has a good idea when somebody has set HZ wrong (see the sketch below).
Knows when disk benchmarks fit in the cache.  Verifies claimed numbers by
watching them run, and verifies vendor claims about other vendors'
performance by calling the other vendors.  Exhorts people to be
skeptical.  Trying very hard.

I'd put some of the Byte stuff in 3).  Digital Review I'd put in 4):
despite the fact that there are a few silly tests in the 33-test suite,
most of it correlates pretty well with some kinds of computing, and it
actually has a few real programs in it.

Anyway, I'd encourage everybody to write letters to editors, both good
and bad: how else is anything going to change if we don't give them
feedback?
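For the curious, here is a minimal sketch of the HZ problem, with
do_benchmark() standing in for whatever the workload is.  times() reports
CPU time in clock ticks, and HZ is the number of ticks per second; a
benchmark that hard-codes the wrong value gets every number it prints
wrong by the same ratio:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/times.h>
    #include <sys/param.h>      /* supplies HZ on many systems */

    #ifndef HZ
    #define HZ 60               /* the dangerous guess: on a machine that
                                   really ticks 100 times a second, this
                                   inflates every reported time by 2/3 */
    #endif

    extern void do_benchmark(); /* hypothetical workload */

    main()
    {
        struct tms start, finish;

        times(&start);
        do_benchmark();
        times(&finish);

        printf("user time: %f sec\n",
               (double)(finish.tms_utime - start.tms_utime) / HZ);
        return 0;
    }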
-- 
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
ccplumb@watmath.waterloo.edu (Colin Plumb) (11/19/87)
In article <925@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>Perhaps this leads to something useful: perhaps (either here or in
>some other newsgroup), we should all post examples of what we think
>are inaccurate or accurate reporting, and/or good/bad benchmarking.
>This would at least give other people calibrations on believability.

I, for one, would appreciate this greatly.  While I know that 90% of
everything is bullshit, and 99% of benchmarks, I can't just disregard
them entirely.  Some figures should just be thrown out (like naive
work-loops that the compiler optimizes away), but sometimes I need to
extract some sense from them.  If some experts here could post a
detailed critique, I could learn a great deal.

I know everyone likes to compare their machines to VAXen on small
benchmarks because the frequency of procedure calls is disproportionately
high, which makes the VAX suffer (a sketch of the kind of benchmark I
mean follows my signature), but what else should I look out for?  It's a
jungle out there...

And in the benchmark skill list:

>5) Wizard: not only does all of 4), but is competent at spotting
>benchmark oddities.  Understands the surprises of caches and optimizing
>compilers.  Understands reasons for skepticism and publishes same.
>Has good idea when somebody sets HZ wrong.  Knows when disk benchmarks
>fit in cache.  Verifies claimed numbers by watching them run,
>and verifies vendor claims regarding other vendor performance by calling
>the other vendors.  Exhorts people to be skeptical.  Trying very hard.

(blush)...  What's "HZ"?  The closest I can come is Hertz, but that's
usually not settable by anything but a motherboard swap.  :-)
-- 
	-Colin (watmath!ccplumb)

Zippy says:
Did I say I was a sardine?  Or a bus???
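For concreteness, the kind of call-bound toy benchmark in question might
look like the sketch below (recursive Fibonacci is just an illustrative
stand-in, not a program from any of the suites discussed): nearly all of
its time goes to call/return overhead, so a machine with an expensive
call instruction, such as the VAX's CALLS, looks disproportionately bad.

    #include <stdio.h>

    /* Two recursive calls per node of the call tree and almost no real
     * computation: the score mostly measures procedure-call cost rather
     * than anything an application actually does.
     */
    int fib(n)
    int n;
    {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    main()
    {
        printf("fib(24) = %d\n", fib(24));
        return 0;
    }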