[net.lang.c] C language hacking

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/11/84)

As long as we're proposing our favorite extensions to C (not that
they should be in the forthcoming ANSI standard, but perhaps the
standard should anticipate that these extensions might be made
some day):

I find that frequently I need both the quotient and remainder of
two integers:	a / b	a % b
Since most (maybe all) machine architectures compute these at the
same time, it sure would be nice to be able to get both results
rather than wastefully reexecuting precisely the same instructions
a second time.  This becomes particularly bothersome when "a" & "b"
are fairly complicated expressions.  I have no idea what a good
syntax for such an operation would be.

It would also be nice if sin( x ) and cos( x ) could be computed
simultaneously with reduced cost.  I doubt if this is possible
but would like to know if it is.

guido@mcvax.UUCP (Guido van Rossum) (11/12/84)

>It would also be nice if sin( x ) and cos( x ) could be computed
>simultaneously with reduced cost.  I doubt if this is possible
>but would like to know if it is.

Can't you use (sin x)**2 + (cos x)**2 = 1 ?  This suggests that
abs cos x = root(1 - (sin x)**2); surely something can be said about
the sign given the range x is in.  This assumes computing a square root
is faster than computing a sine.  This is as close as you can g;
the type polynomial approximations used for computing sine/cosine
are not symmetrical around pi/4.  (I doubt that they are symmetrical
around any point, since their precision is only relevant between
0 and pi/2.)

Anyway, I wouldn't want to add such a function to the standard library.
A specialized numerical library could provide one, however.

BTW, does anybody know whether the NAG library of numerical algorithms
has been translated into C?  This surely would make a great standard!
Thesis: a language standard should not incorporate standards
	for numerical functions.  It should only provide the raw
	material to allow a numerical library to be written.
(I believe the ANSI Committee is well aware of this, and only defines
the most basic numerical functions.  If only compiler writers would
provide decent implementations of these... (and of printf %e/%f/%g).)

--
	Guido van Rossum, "Stamp Out BASIC" Committee, CWI, Amsterdam, Holland
	guido@mcvax.UUCP

"Life is like a sewer.  What you get out of it, depends on what you put
 into it."

quiroz@rochester.UUCP (Cesar Quiroz) (11/12/84)

>(from a recent posting :) 
> I find that frequently I need both the quotient and remainder of
> two integers:	a / b	a % b
> Since most (maybe all) machine architectures compute these at the
> same time, it sure would be nice to be able to get both results
> rather than wastefully reexecuting precisely the same instructions
> a second time.  This becomes particularly bothersome when "a" & "b"
> are fairly complicated expressions.  I have no idea what a good
> syntax for such an operation would be.
> 
> It would also be nice if sin( x ) and cos( x ) could be computed
> simultaneously with reduced cost.  I doubt if this is possible
> but would like to know if it is.

Both are reasonable expectations, but seem to have a place in a standard
library rather than in the syntax of the language.  

The sin/cos simultaneous computing is something you can  program in C
completely.  Getting both quotient and remainder at the same time cannot
(at least not without incurring a procedure call, which denies the
advantage) and you may have a point if you expect the compiler to 
recognize something like:

intdiv(a,b,&q,&r);

and expand it in line.  However, it may be too much additional overhead
for the gains.  In such a case, you may feel forced to write assem...
(no I didn't mean to type that!).

Cesar

alan@apollo.uucp (Alan Lehotsky) (11/14/84)

Regarding the suggestion of being able to get SIN and COS or (DIV and REM) at
the same time - all you need is a good compiler!  

    I believe that the VAX/VMS FORTRAN compiler looks for computations of SIN(X)
and COS(X) and calls a special entrypoint into the run-time-library which returns
BOTH values at once.  Similarly, I designed (but never implemented) a modification
to the BLISS compiler which would have done the same thing for DIV and REM - I don't
know if this was actually finished after I left DEC.

    By the way, I'm unaware of the existance of any "good" C compilers.  [ A "good C
    compiler"  is definitely an oxymoron!]


<< Put down that flamethrower!!>>
Al Lehotsky

andrew@orca.UUCP (Andrew Klossner) (11/14/84)

	"It would also be nice if sin( x ) and cos( x ) could be
	computed simultaneously with reduced cost.  I doubt if this is
	possible but would like to know if it is."

Sure:

	sine = sin(x);
	cosine = sqrt(1.0 - sine*sine);

Sqrt() is lots cheaper than cos().

:-)

  -- Andrew Klossner   (decvax!tektronix!orca!andrew)       [UUCP]
                       (orca!andrew.tektronix@csnet-relay)  [ARPA]

geoff@desint.UUCP (Geoff Kuenning) (11/15/84)

In article <5715@brl-tgr.ARPA> Doug Gwyn <gwyn@brl-tgr.ARPA> writes:

>I find that frequently I need both the quotient and remainder of
>two integers:	a / b	a % b
>Since most (maybe all) machine architectures compute these at the
>same time, it sure would be nice to be able to get both results
>rather than wastefully reexecuting precisely the same instructions
>a second time.  This becomes particularly bothersome when "a" & "b"
>are fairly complicated expressions.  I have no idea what a good
>syntax for such an operation would be.

This is the business of the optimizer, not the programmer.  Even a peephole
optimizer should be able to handle:

	x = a / b;
	y = a % b;

and any optimizer that can handle common subexpressions can take care of this
even when a and b are complex expressions.

C already has too many "features" that are basically ways for the
programmer to compensate for poor optimizers.  Let's move into the 20th
century here.

>It would also be nice if sin( x ) and cos( x ) could be computed
>simultaneously with reduced cost.  I doubt if this is possible
>but would like to know if it is.

It is.  The obvious way is to make use of the identity

    sin (x) == sqrt (1 - cos (x) * cos (x))

which can be computed slightly faster than sin(x) on some architectures.
I think that there are also numerical algorithms that generate both
functions at once, though I am out of my field here.  I know I have run into
a routine (in the Evans and Sutherland Picture System library?) named 'sincos',
which returned both values at once for use in rotation calculations, but it
may have been fixed-point.
-- 

	Geoff Kuenning
	First Systems Corporation
	...!ihnp4!trwrb!desint!geoff

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/16/84)

>> 	It would also be nice if sin( x ) and cos( x ) could be
>> 	computed simultaneously with reduced cost.  I doubt if this is
>> 	possible but would like to know if it is.

> 	sine = sin(x);
> 	cosine = sqrt(1.0 - sine*sine);

I wish people would check things out before posting them.  On our
4.2BSD system, the sqrt() approach is actually SLOWER than using
cos().  (Under UNIX System V it is very slightly faster.)  Now,
if I wanted cos(x)^2, I would certainly use 1 - sin(x)^2, which
I already knew about (I too went to high school).

What I had in mind was more along the lines of a CORDIC algorithm
or some other obscure but useful approach.

john@x.UUCP (John Woods) (11/16/84)

>Regarding the suggestion of being able to get SIN and COS or (DIV and REM) at
>the same time - all you need is a good compiler!  
> 
>     By the way, I'm unaware of the existance of any "good" C compilers.
>     [ A "good C compiler"  is definitely an oxymoron!]

When I worked at Lincoln Labs, the performance of the UNIX (4.1) f77 was
compared against the VMS FORTRAN, in (I believe) an application for digital
signal processing.  VMS FORTRAN came out (as I recall) as twice as fast as
the f77 compiler (resulting code, not speed of compilation).

A 4.1BSD C version of the same program was 10% faster than the VMS FORTRAN.

The result is left to the reader as an exercise. :-)
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1114
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

If your puppy goes off in the next room,
is it because of the explosive charge?		[y][n]

john@x.UUCP (John Woods) (11/16/84)

> 
> 	"It would also be nice if sin( x ) and cos( x ) could be
> 	computed simultaneously with reduced cost.  I doubt if this is
> 	possible but would like to know if it is."
> 
On a more serious note than my last message, how about:

circular( value, sinptr, cosptr) double value, *sinptr, *cosptr;
{
	/* compute both and store, saving function overhead and sharing
	   some commonality of code */
}

It's easy.  Buggering the compiler to do it solves *one* case fast, at the
expense of still having garbage code.  Reminds me of compiler writers who
detect the famous FORTRAN "Whetstone" benchmark and output hand-written
assembly code for the whole thing, so as to achieve artificially high scores..
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1114
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

If your puppy goes off in the next room,
is it because of the explosive charge?		[y][n]

ken@turtlevax.UUCP (Ken Turkowski) (11/16/84)

> >It would also be nice if sin( x ) and cos( x ) could be computed
> >simultaneously with reduced cost.  I doubt if this is possible
> >but would like to know if it is.
> 
> It is.  The obvious way is to make use of the identity
> 
>     sin (x) == sqrt (1 - cos (x) * cos (x))
> 
> which can be computed slightly faster than sin(x) on some architectures.
> I think that there are also numerical algorithms that generate both
> functions at once, though I am out of my field here.  I know I have run into
> a routine (in the Evans and Sutherland Picture System library?) named
> 'sincos', which returned both values at once for use in rotation
> calculations, but it may have been fixed-point.
> 
> 	Geoff Kuenning

There is a class of algorithms, called CORDIC (for COordinate Rotation
DIgital Computer), which calculates sine and cosine simultaneously with
linear convergence in fixed point with no multiplications.  There are
one test, two shifts, and three adds per bit of precision.

It is very easily written entirely in C (except for frmul(), a simple
fractional multiplication).

The original paper is:

    Volder, Jack E.
    "The CORDIC Trigonometric Computing Technique",
    IRE Trans. on Electronic Computers,
    Sept. 1959, pp.330-334

Extension to other functions (multiplication, division, sin, cos, tan,
arctan, sinh, cosh, tanh, arctanh, ln, exp, and square root) is covered
in:

    Walther, J.S.,
    "A Unified Algorithm for Elementary Functions",
    Proc. AFIPS 1971 Spring Joint Computer Conference,
    pp. 379-385

Theory of CORDIC generalizations is discussed in:

    Chen, Tien Chi,
    "Automatic Computation of Exponentials, Logarithms, Ratios,
    and Sqaure Roots",
    IBM J. Res. Develop.,
    July 1972, pp. 380-388
-- 
Ken Turkowski @ CADLINC, Palo Alto, CA
UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

david@imd.UUCP (11/18/84)

>        sine = sin(x);
>        cosine = sqrt(1.0 - sine*sine);
>Sqrt() is lots cheaper than cos().

But is it the positive or negative root of the sqrt?
By the time that was figured out, one probably could have called cos().

ken@turtlevax.UUCP (Ken Turkowski) (11/18/84)

> >> 	It would also be nice if sin( x ) and cos( x ) could be
> >> 	computed simultaneously with reduced cost.  I doubt if this is
> >> 	possible but would like to know if it is.
> 
> > 	sine = sin(x);
> > 	cosine = sqrt(1.0 - sine*sine);
> 
> I wish people would check things out before posting them.  On our
> 4.2BSD system, the sqrt() approach is actually SLOWER than using
> cos().  (Under UNIX System V it is very slightly faster.)  Now,
> if I wanted cos(x)^2, I would certainly use 1 - sin(x)^2, which
> I already knew about (I too went to high school).
> 
> What I had in mind was more along the lines of a CORDIC algorithm
> or some other obscure but useful approach.

In Tom Duff's SIGGRAPH '84 tutorial on Numerical Methods for Computer
Graphics, he writes (quote until end of article):

Calculating Euclidean distance probably accounts for 90% of the square
roots expended in computer graphics applications.  Aside from the
obvious geometric uses, most illumination models require that the unit
normal to each surface be computed at each pixel.

[Moler-Morrison, "Replacing Square Roots by Pythagorean Sums", IBM
Journal of Research and Development", vol. 27, no. 6, pp. 577-581, Nov.
1983] describes an algorithm for computing sqrt(a^2+b^2) which is fast,
robust and portable.  The naive approach to this problem (just writing
sqrt(a*a+b*b) ) has the problem that for many cases when the result is
a representable floating point number, the intermediate results may
overflow or underflow.  This seriously restricts the usable range of a
and b.  Moler and Morrison's method never causes harmful overflow or
underflow when the result is in range.  The algorithm has _____cubic
convergence, and depending on implementation details may be even faster
than sqrt(a*a+b*b).  Here is a C function that implements their
algorithm:

double hypot(p,q)
double p,q;
{
    double r, s;
    if (p < 0) p = -p;
    if (q < 0) q = -q;
    if (p < q) { r = p; p = q; q = r; }
    if (p == 0) return 0;
    for (;;) {
	r = q / p;
	r *= r;
	s = r + 4;
	if (s == 4)
	    return p;
	r /= s;
	p += 2 * r * p;
	q *= r;
    }
}

This routine produces a result good to 6.5 digits after 2 iterations,
to 20 digits after 3 iterations, and to 62 digits after 4 iterations.
In normal use you would probably unwind the loop as many times as you
need and omit the test altogether.  Moler and Morrison's paper
describes the generalization of this algorithm to n dimensions, and
exhibits a related square root algorithm which also has cubic
convergence.  [Dubrelle, "A Class of Numerical Methods for the
Computation of Pythagorean Sums", IBM Journal of Research and
Development, vol. 27, no. 6, pp. 582-589, Nov. 1983] gives a geometric
justification of the algorithm and presents a set of generalizations
that have arbitrarily large order of convergence.
-- 
Ken Turkowski @ CADLINC, Palo Alto (soon Menlo Park), CA
UUCP: {amd,decwrl,flairvax,nsc}!turtlevax!ken
ARPA: turtlevax!ken@DECWRL.ARPA

grunwald@uiucdcsb.UUCP (11/19/84)

Actually, one of the cute things about the CORDIC method is that it computes
the sin/cos by rotating a vector.

It turns out that you can use the CORDIC methods to rotate vectors for graphics
without needing to compute the Sin/Cos at all. But this does mean 3 adds
and shifts per bit for every vector rotated. A multiplication takes only 1 add
and shift per bit, but you need to do four multiplies for rotation. A classmate
made a VLSI chip to do just this in a VLSI design class I took.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/20/84)

> > 	"It would also be nice if sin( x ) and cos( x ) could be
> > 	computed simultaneously with reduced cost.  ...
> 
> It's easy.  Buggering the compiler to do it solves *one* case fast, at the
> expense of still having garbage code.

Sorry, John, but you missed the point of my inquiry.  I was not
suggesting that the compiler should have anything to do with this.
I want a speedy algorithm that computes both functions simultaneously.
The majority of the time spent is NOT in the function linkage but
rather in the guts of the computation (which is usually range
reduction followed by a rational approximation).

The reason why this matters should be pretty obvious to anyone who
does graphics programming for a living.

Mike Niswonger <CNISWONGER@SIMTEL20.ARPA> (11/21/84)

Doug,
	I have run into the same situation as you in often needing both
sin and cos in doing coordinate transformations.  However, after looking
at the algorithims presently used, only a small amount of time could be
saved.
	Most commercial FORTRAN compilers use a 4 or 5 term Chebychev app-
roximation.  Unfortunately, odd powers are used for sine and even terms
are used for the cosine (or is it the other way around?? never could re-
member).  The only actual duplicated code would be where the input angle
is reduced ("unwrapped") to its reference angle.  On the machine that I
normally use, a Gould 32/8780, an SIN or a COS take about 7.5 uS, with
a SQRT taking about 8 uS.  These times are including the FORTRAN overhead.
	>>sigh!!<< With something that simple, I decided not to shave any
more - it wasn't worth the effort.		-- Mike
.

-------

dik@turing.UUCP (11/30/84)

>The reason why this matters should be pretty obvious to anyone who
>does graphics for a living.
In that case it is simple.  You do not need large parts of the fancy stuff
executed by standard sine and cosine routines; e.g. you will not need a
16 digit precision (as delivered by the standard Vax Unix sine and cosine).
Moreover, you may make quite sure that the argument to these functions
is within a range like [-pi,pi], so range reduction is also much simpler
to do.

It is in general a pain that all standard functions in C deliver a result
in double precision (what?, oh yes, the attempt is made) although you need
it in single precision, which results in a waste of time.
-- 
dik t. winter
centrum voor wiskunde en informatica
postbus 4079
1009 AB  amsterdam
nederland
+31 20 592 4102 (polish your dutch)
UUCP: decvax!mcvax!turing!dik

dik@turing.UUCP (11/30/84)

>					A multiplication takes only 1 add
> and shift per bit, .....

Oh yes, but you can have a lot of parallelism there, so the time needed
for a 32*32 bit multiply is comparable to about 5 adds.  In contrast, the
CORDIC methods allow also some parallelism, but the timing is still
one bit in about one add time.

-- 
dik t. winter
centrum voor wiskunde en informatica
postbus 4079
1009 AB  amsterdam
nederland
+31 20 592 4102 (polish your dutch)
UUCP: decvax!mcvax!turing!dik