[net.micro.mac] Harmonic Series Benchmark

jimb@amd.UUCP (Jim Budler) (05/25/85)

Someone on arpanet proposed a floating point benchmark of timing the
sum of the first 10,000 terms of the harmonic series, i.e. for 
i = 1 to 10000; sum = sum + 1/i;
He listed times for Megamax = long, Aztec = 36 seconds; MacModula = 9
seconds; and MacFortran = 4 seconds.
He also stated the right answer was 9.787613 determined from a Vax 11/780
taking 0.02 seconds.

Now either I'm doing it wrong or something, because I didn't get that
answer on an IBM 3081.  But anyway, here are my results, with the
actual output of the Macintosh using Mac C from Consulair.
----------------------------------------------------------------------
Calculating the sum of 10,000 terms of the harmonic series
Vax 11/780 with FPU took 0.02 seconds, answer was 9.787613
The originator of this problem assumed this was the right answer!
IBM 3081 UTS System III took 0.00? seconds, the answer was 9.787606
Valid 68000 4.1c BSD workstation took 23 seconds, the answer was 9.787606

This is an Apple Macintosh(TM) with Mac C(TM),
	using Double, 64 bit precision:
Time = 34.48 Seconds
This answer is 9.787606

Bill Duvall, Consulair, recommends using extended
precision for computation as it is the native mode
of the SANE and Mac C floating point, so:


This is an Apple Macintosh(TM) with Mac C(TM),
	using Extended, 80 bit precision:
Time = 25.15 Seconds
This answer is 9.787606
--------------------------------------------------------------
The times and answers for the IBM3081, Valid and of course the Mac
are mine.  At least, if my math is wrong, it's consistent, so the 
relative times and accuracies are meaningful.

It's obvious to me that Bill Duvall was correct when he said to 
do the calculations in extended and convert the results for
printing.  It also looks like decent times for the Mac.
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

jimb@amd.UUCP (Jim Budler) (05/25/85)

In article <> jimb@amd.UUCP (Jim Budler) writes:
>Someone on arpanet proposed a floating point benchmark of timing the
>sum of the first 10,000 terms of the harmonic series, i.e. for 
>i = 1 to 10000; sum = sum + 1/i;
>
>He also stated the right answer was 9.787613 determined from a Vax 11/780
>taking 0.02 seconds.
>
>Now either I'm doing it wrong or something, because I didn't get that
>answer on an IBM 3081.  But anyway, here are my results, with the
>actual output of the Macintosh using Mac C from Consulair.

Well, I now know the answer to my own question.  If you use float
as opposed to double, the answer on a Vax is 9.787613.  If you
use double the answer is 9.787606, the same as I got on the IBM, Valid
and Mac.  I'll have to go back and try float on the Mac.
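
For anyone who wants to repeat the comparison, here is a minimal sketch in
portable C (not the Mac C source mentioned above).  The function names are
made up for illustration, and it assumes a machine whose float carries
about 24 bits of precision, so the last digit of the float sum may differ
elsewhere.

```c
#include <stdio.h>

/* Sum 1/1 + 1/2 + ... + 1/n in single precision, smallest index first. */
float harmonic_float(long n)
{
    float sum = 0.0f;
    long i;

    for (i = 1; i <= n; i++)
        sum += 1.0f / (float)i;   /* every step rounds to about 7 digits */
    return sum;
}

/* The same loop with a double accumulator (about 16 digits). */
double harmonic_double(long n)
{
    double sum = 0.0;
    long i;

    for (i = 1; i <= n; i++)
        sum += 1.0 / (double)i;
    return sum;
}

/* Print both sums to six decimal places, as in the postings above. */
void print_harmonic_sums(long n)
{
    printf("float  = %.6f\n", harmonic_float(n));
    printf("double = %.6f\n", harmonic_double(n));
}
```

Calling print_harmonic_sums(10000) prints both sums to six places; the
double figure should come out as 9.787606.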
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

g-inners@gumby.UUCP (05/26/85)

> He also stated the right answer was 9.787613 determined from a Vax 11/780
> taking 0.02 seconds.
> 
> Now either I'm doing it wrong or something, because I didn't get that
> answer on an IBM 3081.  But anyway, here are my results, with the
> actual output of the Macintosh using Mac C from Consulair.
> ----------------------------------------------------------------------

I suspect the difference (9.787613 vs 9.787606) is due to differences
in the number of bits of precision.  The VAX has one bit more precision
in many cases due to the 'implied one' immediately after the binary
point.  Since normalized mantissas always start with a '1' bit, the
VAX doesn't store it.  It stores another bit at the end instead.
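
As a rough illustration of the 'implied one' idea, the sketch below picks
apart an IEEE 754 single.  That format is an assumption: it is what most
current C compilers use for float, it is not the VAX format, and in IEEE
the hidden bit sits just before the binary point rather than just after
it.  The function names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Return the 23 fraction bits actually stored in an IEEE 754 single.
   Assumes float is the 32-bit IEEE format, which the C standard does
   not promise and which is not the VAX format. */
uint32_t stored_fraction_bits(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* copy the raw bit pattern */
    return bits & 0x007FFFFFu;        /* low 23 bits hold the fraction */
}

/* Bits of precision the format provides: one more than it stores. */
int float_precision_bits(void)
{
    return FLT_MANT_DIG;              /* 24 for IEEE 754 single */
}
```

Under that assumption stored_fraction_bits(1.0f) is zero even though the
significand of 1.0 is not: the leading 1 is simply never written down,
which is where the extra bit of precision comes from.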

The VAX result is probably better than the 3081 or Mac.
				-- Mike Inners

jimb@amd.UUCP (Jim Budler) (05/27/85)

In article <> jimb@amd.UUCP (Jim Budler) writes:
>In article <> jimb@amd.UUCP (Jim Budler) writes:
>>Someone on arpanet proposed a floating point benchmark of timing the
>>sum of the first 10,000 terms of the harmonic series, i.e. for 
>>i = 1 to 10000; sum = sum + 1/i;
>>
>>He also stated the right answer was 9.787613 determined from a Vax 11/780
>>taking 0.02 seconds.
>>
>>Now either I'm doing it wrong or something
>
>Well, I now know the answer to my own question.

I went back and took another closer look at the problem.  The poster
of the problem specified 32 bits for the problem.  I redid my
program, using floats (32 bit), doubles (64 bit), and extended (80
bit).  In addition I reread Bill Duvall's words on the floating point
and eliminated the casts from the program.  
i.e. was:	j += ( 1.0 / (float) i ) where i is long.
     is:	j += ( 1.0 / i)

This resulted in a savings of about 5 seconds on the double, and
3 seconds on the float.

Here are my final results.  I'll post the source in net.sources.mac.
Actual output of the program.
-------------------------------------------------------------------
Calculating the sum of 10,000 terms of the harmonic series
Vax 11/780 with FPU took 0.02 seconds the answer was 9.787613
This result from originator of problem. Assumed "right" answer.

This is a Macintosh with Consulair Mac C 2.0,
using float, 32 bit precision:

Time = 29.43 Seconds	Sum = 9.787613

Using Doubles:
IBM3081 with UTS System III took 0.00? seconds,
	the answer was 9.787606.
A Valid 68000 CAD workstation 4.1c BSD took 23 seconds,
	the answer was 9.787606.

This is a Macintosh with Consulair Mac C 2.0,
using double, 64 bit precision:

Time = 29.87 Seconds	Sum = 9.787606

Bill Duvall of Consulair recommends using extended,
80 bit precision for computation with Mac C as that is the
base mode of his implementation of the SANE/IEEE numerics.

This is a Macintosh with Consulair Mac C 2.0,
using extended, 80 bit precision:

Time = 25.15 Seconds	Sum = 9.787606
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

brouille@Shasta.ARPA (05/28/85)

Before starting to argue as to which machine has the exact result, we should
first discuss the algorithm.
 
Calculating the harmonic by:
 
    for  ( i=1; i <= 10000; i++ )
        j += ( 1.0 / i );
 
is clumsy. When i reaches the big numbers, a lot of low-order significant digits
will be lost when we perform the addition.  For example, let's suppose that
the machine has 8 decimal digits of accuracy, and that the sum for
i = 1 to 9499 is  9.7325632.   Adding 1.0/9500 (1.0526316E-4) to our sum
will be done by first adjusting the small number to 0.0001053, and then
adding it to 9.7325632.  We have lost quite a few significant digits in
the process.
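
The same effect can be watched in binary.  The sketch below is only an
illustration: it assumes float is a 24-bit format such as IEEE single
(not the 8-decimal-digit machine of the example), and the two function
names are invented here.

```c
/* Return how much of `term` actually arrives when it is added to `sum`
   in single precision. */
float absorbed_part(float sum, float term)
{
    volatile float after = sum + term;   /* volatile forces rounding to float */
    return after - sum;
}

/* Return the part of `term` lost to rounding in that one addition. */
float lost_part(float sum, float term)
{
    return term - absorbed_part(sum, term);
}
```

With the figures from the example, absorbed_part(9.7325632f, 1.0f / 9500)
comes back slightly different from 1.0f / 9500; the difference, which
lost_part() returns, is what the running sum never sees.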
 
A better way is to write:
 
    for  ( i=10000; i > 0; i-- )
        j += ( 1.0 / i );
 
In this case, the Vax/780 comes up with the answer 9.787604 (and not 9.787613)!
I would be curious to see what the results are on other systems.
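
To try both orders on another system, something like the following
should do.  It is a sketch with made-up function names, written for plain
float; a compiler that quietly keeps the sum in a wider register may hide
the difference.

```c
/* Forward order: the big terms come first, so the late, tiny terms are
   added to an already large sum and lose their low-order bits. */
float harmonic_forward(int n)
{
    float sum = 0.0f;
    int i;

    for (i = 1; i <= n; i++)
        sum += 1.0f / (float)i;
    return sum;
}

/* Reverse order: the tiny terms are accumulated first, while the running
   sum is still small enough to keep most of their bits. */
float harmonic_reverse(int n)
{
    float sum = 0.0f;
    int i;

    for (i = n; i > 0; i--)
        sum += 1.0f / (float)i;
    return sum;
}
```

Printing harmonic_forward(10000) and harmonic_reverse(10000) with "%.6f"
gives the two numbers to compare.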

			Jean-Luc Brouillet

jww@bonnie.UUCP (Joel West) (05/28/85)

> > He also stated the right answer was 9.787613 determined from a Vax 11/780
> > taking 0.02 seconds.
> > 
> > Now either I'm doing it wrong or something, because I didn't get that
> > answer on an IBM 3081.  But anyway, here are my results, with the
> > actual output of the Macintosh using Mac C from Consulair.
> > ----------------------------------------------------------------------
> 
> I suspect the difference (9.787613 vs 9.787606) is due to differences
> in the number of bits of precision.  The VAX has one bit more precision
> in many cases due to the 'implied one' immediately after the binary
> point.  

Actually, it's worse than that.  The IBM S/360 architecture is "hexadecimal
normalized", which means that "1" is stored as "0x1000...." in the mantissa
field, with 3 leading (non-significant!) zero bits.  This reduces precision
by 1/2 to 1 digit vs. most comparable 32- and 64-bit floating point formats
(including the VAX, which the author notes squeezes out an extra bit
of precision).  
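
For the curious, here is a small sketch of how a 32-bit S/360 word is put
together: one sign bit, a 7-bit excess-64 exponent that is a power of 16,
and a 24-bit fraction.  The function names are invented for illustration,
and it covers only the short (32-bit) format.

```c
#include <stdint.h>

/* Decode a 32-bit IBM System/360 hexadecimal float:
   1 sign bit, 7-bit excess-64 exponent (a power of 16),
   and a 24-bit fraction read as 0.ffffff in hex. */
double ibm32_to_double(uint32_t word)
{
    int      sign     = (int)(word >> 31);
    int      exponent = (int)((word >> 24) & 0x7F) - 64;
    uint32_t fraction = word & 0x00FFFFFFu;
    double   value    = (double)fraction / 16777216.0;  /* fraction / 2^24 */
    int      e;

    for (e = exponent; e > 0; e--)       /* scale up by 16 per step */
        value *= 16.0;
    for (e = exponent; e < 0; e++)       /* or down, for small numbers */
        value /= 16.0;
    return sign ? -value : value;
}

/* Count the zero bits in front of the first 1 of the 24-bit fraction.
   A normalized number has 0 to 3 of them; they carry no information. */
int wasted_leading_bits(uint32_t word)
{
    uint32_t fraction = word & 0x00FFFFFFu;
    int count = 0;

    while (count < 24 && !(fraction & (0x00800000u >> count)))
        count++;
    return count;
}
```

The word 0x41100000 decodes to 1.0, and wasted_leading_bits() reports 3
for it, which are the three non-significant zeros described above.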

The accuracy of the original (1964? 65?) algorithm was so abysmal that IBM
had to re-engineer the machines in the field to add a "guard digit", which
is an extra hex digit in 32-bit floating (7 hex mantissa digits) format
that exists only during intermediate computations and is rounded when
stored in 32-bits.  I'm told a UCLA professor was very mad at IBM for
changing their hardware, because he could no longer reproduce the
(erroneous) 5-digit numbers in his book... :-)
-- 
	Joel West				     (619) 457-9681
	CACI, Inc. - Federal 3344 N. Torrey Pines Ct La Jolla 92037
	jww@bonnie.UUCP (ihnp4!bonnie!jww)
	westjw@nosc.ARPA

   "The best is the enemy of the good" - A. Mullarney

jimb@amdcad.UUCP (Jim Budler) (05/30/85)

In article <385@gumby.UUCP> g-inners@gumby.UUCP writes:
>I suspect the difference (9.787613 vs 9.787606) is due to differences
>in the number of bits of precision.  The VAX has one bit more precision
>...
 It is.  I used doubles, the poster used floats. 9.787606 is a more
accurate answer.
>...
>The VAX result is probably better than the 3081 or Mac.

No, the answer for the Vax, the IBM 3081, and Consulair C are the same
given the same conditions:
	Floats(1->10000) = 9.787613
	Floats(10000->1) = 9.787604   (Less roundoff error)
	Doubles		 = 9.787606   (Direction doesn't matter)

A friend of mine ran my source code on Megamax C and got the same answers.  It
took 64 seconds for floats, and 39 seconds for doubles.

I don't know why the original poster of the problem stated he got the wrong
answer from Megamax, but his times were comparable for Floats.
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

"... Don't sue me, I'm just the piano player!...."

guido@boring.UUCP (05/30/85)

I have unpacked the "Harmonics" benchmark program from the net, and
was rather upset about what was left of my favourite language, "C".

(Readers not prepared for flames, please skip to "Comments", below.)

----------
> #Options D=300

What's this?  can't we just pass the options to the C compiler any more?

----------
> // To make it fit 128K Mac

Is this a new form of comment?  Even if the compiler allows it, don't
you ever dare use it.  Even if you can't restrain yourself totally,
DON'T USE IT IN PROGRAMS YOU ARE POSTING TO THE NET!
(Hint, hint, as Chuqui would add.)

----------
> extended seconds;

Huh?  Is this a recent extension to ANSI C that I have missed?

----------
> int fd = open(outfile, 1);
> fprintf(fd, "Calculating the sum of 10,000 terms of the harmonic series\n");

Er, umm, I thought that files to be used with fprintf and friends should be opened
with *fopen*, not with *open*.  On some systems (including that obscure
system whose name starts with U, ends with an x and has nothing but footnotes
in between) this makes a lot of differences, ya know.

----------
And now for the worst offense to C's syntax:

> printf("Calculating the sum of 10,000 terms of the harmonic series\n");
> /* Lots of printf's */
> register int i;
> register float j = 0;

I am sure that when I last looked, declarations should strictly *precede*
statements.

----------
>     printf("\nThis is a Macintosh with Consulair Mac C 2.0,\n");

WRONG.  This Macintosh (i.e., mine) is running SUMacC.
----------
(End flame)
==========

Comments

I won't comment on programming style, but I think that a program that is
meant as a benchmark could at least have the decency to use only portable
features of a language.  Where this is not possible (e.g., when the timing
difference for using 'extended' numbers is exercised), #ifdefs should
enclose the non-portable sections, and appropriate warnings made in comments.

Alternatively, it is possible that this was one of the first C programs
ever written by its author.  In this case, considering the burden already
placed on the net by inappropriate or duplicate postings, it would have
been better not to post it at all.  Consider this article as a short
lesson in portability.

Admittedly, it is quite possible that the manufacturers of Consulair C
have done their best to hide the fact that there are other versions of
C than theirs, and not provided any clues to the differences.  The fact
that they added #Options, one-line comments and the 'extended' data type
makes one expect the worst.  One suspects that they have never been exposed
to a real C compiler (I mean one running under U**x).  How come they're
such a big success?

BTW, SUMacC floating point (at least what you get when you use 'double' or
'float' variables) is not SANE, and it has some peculiarities, but the
timing was about 22 seconds, independent of whether float or double was
used.  The answer seemed right, at least for 'double'.  (Just to prove
that I *did* manage to correct all the bugs mentioned above.)

	Guido van Rossum, CWI, Amsterdam
	guido@mcvax.UUCP

(Remember, there is more to the Mac than raw computing power.)

jimb@amdcad.UUCP (Jim Budler) (06/03/85)

In article <6437@boring.UUCP> guido@boring.UUCP (Guido van Rossum) writes:
>was rather upset about what was left of my favourite language, "C".
>(Readers not prepared for flames, please skip to "Comments", below.)
>> #Options D=300
>What's this?  can't we just pass the options to the C compiler any more?
>
What do you think that is?  An ice cream bar?
>----------
>> extended seconds;
>Huh?  Is this a recent extension to ANSI C that I have missed?
I didn't post an ANSI C program, I posted a Mac C program.
>----------
>with *fopen*, not with *open*.  On some systems (including that obscure
>system whose name starts with U, ends with an x and has nothing but footnotes
>in between) this makes a lot of differences, ya know.
I posted the program to net.sources.MAC, not net.U**x
>----------
>> /* Lots of printf's */
They said what I wanted them to say.

>I am sure that when I last looked, declarations should strictly *precede*
>statements.
It's not true in Mac C.
>
>----------
>>     printf("\nThis is a Macintosh with Consulair Mac C 2.0,\n");
>
>WRONG.  This Macintosh (i.e., mine) is running SUMacC.

And here I thought that your Vax was running SUMacC.
>Comments
>
>I won't comment on programming style, but I think that a program that is
Oh I thought you did.
>meant as a benchmark could at least have the decency to use only portable
>features of a language.  Where this is not possible (e.g., when the timing
>difference for using 'extended' numbers is exercised), #ifdefs should
>enclose the non-portable sections, and appropriate warnings made in comments.
I posted a Mac C source to net.sources.MAC, next time you see a posting
that says Mac C or Megamax C or something maybe you should consider it
as Forth or something.  

>...			  In this case, considering the burden already
>placed on the net by inappropriate or duplicate postings, it would have
Your posting was larger than my program and probably took longer to write.
>been better not to post it at all.  Consider this article as a short
>lesson in portability.
I hate to say it but almost none of the code posted to net.sources.mac is
portable.  I have to change include file names, toolbox calls, and many
other things.
>
>Admittedly, it is quite possible that the manufacturers of Consulair C
>have done their best to hide the fact that there are other versions of
>C than theirs, and not provided any clues to the differences.  The fact
>that they added #Options, one-line comments and the 'extended' data type
>makes one expect the worst.  One suspects that they have never been exposed
>to a real C compiler (I mean one running under U**x).  How come they're
>such a big success?
Extended data type was created by IEEE not Consulair. 
One line comments were under discussion by ANSI.
#Options is a preprocessor command which is needed because Mac C doesn't 
run under either 'sh' or 'csh' and thus has NO command line to parse.
What makes you think U**x compilers are so great?  A 4.2 program doesn't work
on a v7 or a SysV.......
>
>BTW, SUMacC floating point (at least what you get when you use 'double' or
>'float' variables) is not SANE, and it has some peculiarities, but the
>timing was about 22 seconds, independent of whether float or double was
>used.  The answer seemed right, at least for 'double'.  (Just to prove
>that I *did* manage to correct all the bugs mentioned above.)
The program ran as written on the compiler it was written for, and you
call these *bugs*???

I'm really glad you liked my program and I am so sorry that you won't be
able to compile any of my sources I decide to post, because I'm using
a native Macintosh language which contains calls not found in ANSI C
such as:

	GetResInfo()
	SetResInfo()
	OpenResourceFile()

Or are you saying that I shouldn't post programs to net.sources.MAC unless 
they'll run on a U**x machine?

What garbage.
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

"... Don't sue me, I'm just the piano player!...."

g-inners@gumby.UUCP (06/06/85)

(...lots of flamage deleted...)
> Or are you saying that I shouldn't post programs to net.sources.MAC unless 
> they'll run on a U**x machine?
> 
> What garbage.

No, the garbage is posting a 'benchmark' that runs on only one machine.
Benchmarks are intended to be comparative.  At their best, they run
without modification on all machines to be compared.  At worst, the mods
are minor.  Extensive changes (such as removing implementation extensions)
invalidate any comparison. 

What you posted wasn't a benchmark, it was an example program for
Mac C.  Don't advertise falsely, and you won't get flames!
				-- Mike Inners

jimb@amdcad.UUCP (Jim Budler) (06/07/85)

In article <392@gumby.UUCP> g-inners@gumby.UUCP writes:
>No, the garbage is posting a 'benchmark' that runs on only one machine.
>
>What you posted wasn't a benchmark, it was an example program for
>Mac C.  Don't advertise falsely, and you won't get flames!

What I posted was the program I used to generate times posted in
net.micro.mac, so people would know how I got the times in what I thought
was a better manner than (to quote one of the other posters) "... the
times are clocked between printf statements."

The reason I posted anything was because I spent much time trying
to find the _exact_ algorithm used by the original poster. His
verbal description was certainly far less useful than my source code.

The _actual_ algorithm of the benchmark is only one line of
code in each precision:

	for (i = 1; i <= 10000; i++) j += 1.0/i;

If that couldn't be determined by a quick scan of the code I guess
you couldn't READ C.

What do all you people do?  A C program comes over the net, you don't even
read it, but pass it through your compiler to see if it will break????

The ONLY reason that the program contained the name of the compiler
so many times was that the code started as one algorithm in double
which was then copied and pasted twice for each precision, then doubled 
again to get the output into a file as well as the screen.

A simple mechanical effort to duplicate a reasonable test of floating point
accuracy and speed has turned into a major issue. 

Your comments about a benchmark requiring portability are also rot.  The
byte, sieve of erat... benchmark is a Benchmark.  The C program to run
it is a Benchmark IMPLEMENTATION.  The Harmonic Benchmark was described
as " the sum of the first 10,000 terms of the harmonic series, with an
answer of 9.787613." I wrote an IMPLEMENTATION, and added the  doubles
answer of 9.787606.

In the intervening week before the flames started coming in, many
interesting and informative comments appeared on the net:

	1.) The series is more accurate if run in reverse.  I added
	this to my program (without posting the source), and found that
	the Mac retains accuracy in doubles.

	The new answers are 9.787604 in float, and 9.787606 in double.

	2.) We discovered that the minimum time for the series using
	SANE is 16.5 seconds.

	3.) An IBM PC AT is reported to take 1/3 the time my program 
	took.  BUT since the reporter didn't describe his method
	of timing his implementation we really don't know the exact
	speed difference.
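
Since the timing method matters that much, here is one way it could be
pinned down in portable C.  This is a sketch, not the Mac C program that
produced the times above: it uses the standard clock() call, and the
function name and the repeat count are made up for illustration.

```c
#include <time.h>

/* Time `repeats` runs of the 10,000-term loop and return the CPU seconds
   used for one run.  The sum is handed back through *sum_out, and kept in
   a volatile, so that the compiler cannot discard the loop. */
double time_harmonic(int repeats, double *sum_out)
{
    volatile double sum = 0.0;
    clock_t start, stop;
    int r, i;

    start = clock();                     /* clock started after any setup */
    for (r = 0; r < repeats; r++) {
        sum = 0.0;
        for (i = 1; i <= 10000; i++)
            sum += 1.0 / i;
    }
    stop = clock();                      /* clock stopped before any printing */

    *sum_out = sum;
    return (double)(stop - start) / CLOCKS_PER_SEC / repeats;
}
```

On a fast machine one pass is far below the resolution of clock(), so
repeats has to be large enough for the total to be measurable; it must
be at least 1.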

I will admit to only TWO, count them ONE, TWO errors in that program
of any concern:
	1.) The use of // comments allowed by my compiler but not K&R.
	2.) Placing a declaration in the body of a block, again allowed
	by my compiler, but not K&R.

I do NOT want to hear again that extended is not K&R, because the use
of extended had a purpose within the meaning of the program.

Now, I'm tired of all this flaming so please direct the rest of them 
to the wastebasket.
-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

"... Don't sue me, I'm just the piano player!...."

bytebug@pertec.UUCP (roger long) (06/26/85)

I don't know what is taking the Mac so long!  Our system, running at 10MHz
with wait states due to memory management, ran the following code in 10.6
seconds.

	main()
	{
		register i;
		double j;

		for (i=10000; i>0; i--)
			j += (1.0 / i);
		printf("result = %9.6f\n", j);
	}

The result I got was 9.787606.

A 68020 running at 12.5MHz with a 16-bit memory bus ran it in 6.3 seconds.

A 68020 running at 16.667MHz with 32-bit memory bus ran it in 2.8 seconds.
-- 
	roger long
	pertec computer corp
	{ucbvax!unisoft | scgvaxd | trwrb | felix}!pertec!bytebug

beaucham@uiucuxc.Uiuc.ARPA (07/01/85)

Re the harmonic series benchmark--

I got 0.6u+0s on the VAX 11/780, 2.7 seconds on my IBM PC AT (with 80287 fpu),
and 0.7u+.2s on a 32016 machine running Genix (with 32081 fpu, 8MHz).

For the Genix execution to work I had to initialize j=0.
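
The change amounts to something like this.  It is a sketch of the same
loop wrapped in a function; the function name is invented here and is not
from the posted program.

```c
/* Same loop as the posted program, with the accumulator explicitly
   zeroed instead of left to chance. */
double reverse_harmonic(int terms)
{
    int i;
    double j = 0.0;   /* an uninitialized automatic has an indeterminate value */

    for (i = terms; i > 0; i--)
        j += 1.0 / i;
    return j;
}
```

Without the initializer the result depends on whatever happened to be on
the stack, which is why the program can appear to work on one system and
not on another.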

All computers gave the same result that you got.

                       Jim Beauchamp  {ihnp4,pur-ee}!uiucdcs!uiucuxc!beaucham