[comp.arch] myths & magazines

fouts@orville.nas.nasa.gov (Marty Fouts) (11/17/87)

In article <916@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>
>0) At the rate of speed this business moves, writers and editors
>are hard-pressed to keep up, even when they try very hard.

I would feel for these editors, except that they do a generally bad
job, both in articles such as those cited and in the way they handle
news releases.  Daily papers, which work against a less-than-24-hour
deadline, handle news release rewrites with more accuracy than most
industry monthly or weekly papers.  It wouldn't bother me if it
happened occasionally, but every press release I've seen about my
organization has come out badly garbled.  Knowing how inaccurate the
coverage is in the cases I can check, I've forced myself to doubt
everything I read in these magazines, which makes me like the man
with two clocks.

>Since many of the trade rags are controlled circulation, you
>can't usefully threaten to cancel your subscription!
>

Actually you can; it just takes a lot more threats before they do
something.  Controlled circulation magazines make their money off of
their advertisers, and the advertising rate depends heavily on
how well the market is targeted.  If enough people quit reading a bad
trade magazine, it will quit being published.

>5) In general, it is hopeless to improve some of the rags, which are
>little above the National Enquirer.  Some of the magazines try very hard,
>even to having their own benchmark suites which they want to watch
>running on a real machine.

A word about magazine benchmarking suites.  Byte magazine ran an
article in the July 1987 issue containing a benchmark comparison
of the 80386 and the 68020 built around a suite of five benchmarks.
They were all flawed in ways that the readership of this group is all
too familiar with, but my favorite is one called float, which
contained code like:

#define CONST1 3.141597E0
#define CONST2 1.7839032E4
#define COUNT 10000

main()
{
  double a, b, c;
  int i;

  a = CONST1;
  b = CONST2;
  for (i = 0; i < COUNT; ++i) {
    c = a * b;  /* These two statements are repeated a total of 12 times */
    c = c / a;  /* "So that the loop overhead is dominated by work" */
  }
}

where the for loop is supposed to measure the C library's ability to do
double precision floating point.  Over half the compilers I have tried
this code on recognize the loop invariance, apply constant propagation,
and either allocate a, b, and c statically or emit simple store
instructions at run time, reducing the code to three runtime
instructions.  (Which happen outside the timing loop . . .)
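
For concreteness, here is my guess at the residue such a compiler
leaves behind (a sketch, not the actual output of any particular
compiler):

/* The whole "benchmark" after optimization: a, b, and c never
 * change inside the loop, so (a * b) / a folds to a compile-time
 * constant and the emptied loop is deleted outright.
 */
a = CONST1;                     /* runtime instruction #1: store */
b = CONST2;                     /* runtime instruction #2: store */
c = (CONST1 * CONST2) / CONST1; /* #3: store of a folded constant */
/* no loop remains -- nothing left for the timer to measure */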

ram%shukra@Sun.COM (Renu Raman, Sun Microsystems) (11/17/87)

     Recently a friend of mine, while hunting for a 386-based PC, was
     given a copy of a page from PCWEEK that had benchmarks of
     various 386 boards.  Apart from the usual VAX-relative mips, the
     sieve, etc., curiously enough they had a NOOP(!!) number - showing
     the time it takes to execute a noop.  Of course the noop number
     was not compared with VAXen, but a comparison of various 386
     boards was given.  Thought this might be interesting within the
     context of this topic.

     That reminded me of the old joke about trumping NOOPS in a CRAY.
     Soon we may have machine X relative NOOP speeds and ....

     [Actually there may be a good use for NOOPS]
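
     [A guess at what such a NOOP benchmark boils down to; PCWEEK
     doesn't print its source, so the details below are assumed.
     Time a loop around a no-op, then subtract the cost of the same
     loop with an empty body:]

#include <stdio.h>
#include <time.h>

#define COUNT 1000000L

main()
{
	long i;
	clock_t t0, t1, t2;

	t0 = clock();
	for (i = 0; i < COUNT; i++)
		;		/* empty body: pure loop overhead */
	t1 = clock();
	for (i = 0; i < COUNT; i++)
		asm("nop");	/* one no-op per trip; asm() is compiler-specific */
	t2 = clock();
	/* an optimizing compiler may, of course, delete both loops --
	 * which is exactly the sort of thing this thread is about */
	printf("noop time: %f usec\n",
	    ((t2 - t1) - (t1 - t0)) * 1.0e6 / (COUNT * (double) CLOCKS_PER_SEC));
	return 0;
}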

---------------------
   Renu Raman				ARPA:ram@sun.com
   Sun Microsystems			UUCP:{ucbvax,seismo,hplabs}!sun!ram
   M/S 5-40, 2500 Garcia Avenue,
   Mt. View,  CA 94043

mash@mips.UUCP (John Mashey) (11/18/87)

In article <3425@ames.arpa> fouts@orville.nas.nasa.gov.UUCP (Marty Fouts) writes:
>In article <916@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:

>>0) At the rate of speed this business moves, writers and editors
>>are hard-pressed to keep up, even when they try very hard.

>I would feel for these editors, except they do a generally bad job,....

Perhaps this leads to something useful: perhaps (either here or in
some other newsgroup), we should all post examples of what we think
are inaccurate or accurate reporting, and/or good/bad benchmarking.
This would at least give other people calibrations on believability.

>>Since many of the trade rags are controlled circulation, you
>>can't usefully threaten to cancel your subscription!

>Actually you can, it just takes a lot more threats for them to do
>something....

Unfortunately, if you are a vendor, you MUST continue to get these
things in self-defense, if nothing else...
>
>>5) In general, it is hopeless to improve some of the rags, which are
>>little above the National Enquirer.  Some of the magazines try very hard,
>>even to having their own benchmark suites which they want to watch
>>running on a real machine.
>
>A word about magazine benchmarking suites.  Byte magazine had an...
>They were all flawed in ways that the readership of this group is well

Good point.  I'd rate magazines on the following levels (somewhat
akin to the old UNIX novice->guru scale):

1) Novice: believes all vendor mips & flops ratings, publishes same
without even cursory checks.  Thinks whetstones are what you sharpen
knives with.  Doesn't know the difference between single and double precision.
Not really trying, and glad to hype unsupported claims.

2) Beginner: at least labels vendor mips ratings as "claimed".
Has heard of LINPACK and other commonly used benchmarks, and even has
some idea of what they measure, at least that some are integer and some
are floating point.  May still count NOOPS/second.

3) Intermediate: at least has some benchmarks, and wants to see them run
on real machines.  Benchmarks may have silly flaws, but can at least tell the
difference between a 4.7MHz 8088 and a 20MHz 386.  A few benchmarks might
even be useful, if interpreted carefully. Trying.

4) Advanced: knows the difference between LINPACK and Livermore Loops.
Either has own (useful) benchmarks, or gives credence to the more realistic ones
that are generally available.  Knows when geometric mean should be used.
Trying hard.

5) Wizard: not only does all of 4), but is competent at spotting
benchmark oddities.  Understands the surprises of caches and optimizing
compilers.  Understands reasons for skepticism and publishes same.
Has a good idea when somebody sets HZ wrong (see the sketch after this
list).  Knows when disk benchmarks
fit in cache.  Verifies claimed numbers by watching them run,
and verifies vendor claims regarding other vendor performance by calling
the other vendors.  Exhorts people to be skeptical.  Trying very hard.
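
On the HZ point, a sketch of the trap (assuming the traditional UNIX
times() interface; HZ, the clock tick rate, traditionally comes from
<sys/param.h>):

#include <sys/param.h>		/* defines HZ, the clock tick rate */
#include <sys/types.h>
#include <sys/times.h>

/* times() reports CPU time in ticks; dividing by HZ converts to
 * seconds.  If a benchmark hardwires HZ at 60 on a machine whose
 * clock really ticks at 100, every number it prints is silently
 * off by the same factor of 100/60.
 */
double
elapsed_seconds(before, after)
	struct tms *before, *after;
{
	return (after->tms_utime - before->tms_utime) / (double) HZ;
}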

I'd put some of the Byte stuff in 3).  Digital Review I'd put in 4:
despite the fact that there are a few silly tests in the 33-test suite,
most of it correlates pretty well with some kinds of computing,
and it actually has a few real programs in it.

Anyway, I'd encourage everybody to write letters to editors, both good
and bad: how else is anything going to change if we don't give them feedback?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ccplumb@watmath.waterloo.edu (Colin Plumb) (11/19/87)

In article <925@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>Perhaps this leads to something useful: perhaps (either here or in
>some other newsgroup), we should all post examples of what we think
>are inaccurate or accurate reporting, and/or good/bad benchmarking.
>This would at least give other people calibrations on believability.

I, for one, would appreciate this greatly.  While I know that 90% of
everything is bullshit, and 99% of benchmarks, I can't just disregard
them entirely.  Some figures should just be thrown out (like naive
work-loops that the compiler optimizes out), but sometimes I need to
extract some sense from them.  If some experts here could post a detailed
critique, I could learn a great deal.  I know everyone likes to compare
their machines to VAXen on small benchmarks because the frequency of
procedure calls is disproportionately high, which makes the VAX suffer,
but what else should I look out for?  It's a jungle out there...

And in the benchmark skill list:
>5) Wizard: not only does all of 4), but is competent at spotting
>benchmark oddities.  Understands the surprises of caches and optimizing
>compilers.  Understands reasons for skepticism and publishes same.
>Has good idea when somebody sets HZ wrong.  Knows when disk benchmarks
>fit in cache.  Verifies claimed numbers by watching them run,
>and verifies vendor claims regarding other vendor performance by calling
>the other vendors.  Exhorts people to be skeptical.  Trying very hard.

(blush)... What's "HZ"?  The closest I can come is Hertz, but that's
usually not settable by anything but a motherboard swap. :-)
--
	-Colin (watmath!ccplumb)

Zippy says:
Did I say I was a sardine?  Or a bus???