brooks@lll-crg.UUCP (Eugene D. Brooks III) (09/25/85)
Regarding the performance of the Caltech C compiler on single
precision floating point operations:
This compiler implemented both single and double precision register
variables and delivered quite a substantial speed improvement over
the standard Unix compiler.  I recorded speed increases of up to a
factor of 2.5 for some vector operations implemented as unrolled loops
in C, for example:
	register float *a, *b, *c;
	int dim;	/* vector length, assumed to be at least 8 */
	dim /= 8;	/* number of complete 8-element blocks */
	do {
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
		*a++ = *b++ + *c++;
	} while(--dim > 0);
With appropriate handling of the dim%8 remainder, this went like the
devil.  Other operations (dot product, etc.) obtained similar speed
improvements.  Inspection of the resulting assembler code showed that one
could not do better by writing in assembler.
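
For anyone who wants to try this, here is a rough sketch of how the
complete routines might look with the dim%8 remainder handled.  The
function names vadd and vdot, the size_t length argument and the const
qualifiers are my own assumptions for illustration, not the code that was
actually timed, and I make no claim about what speedup a given compiler
will deliver on it.

```c
#include <stddef.h>

/* Hypothetical helper: a[i] = b[i] + c[i] for i in [0, n). */
void vadd(float *a, const float *b, const float *c, size_t n)
{
    size_t blocks = n / 8;   /* complete 8-element blocks */
    size_t rest = n % 8;     /* leftover elements, 0..7 */

    /* Unrolled main loop: eight adds per trip. */
    while (blocks-- > 0) {
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
    }

    /* Remainder loop for the dim%8 part. */
    while (rest-- > 0)
        *a++ = *b++ + *c++;
}

/* Hypothetical helper: single precision dot product of x and y. */
float vdot(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    size_t blocks = n / 8;
    size_t rest = n % 8;

    /* Unrolled main loop: eight multiply-adds per trip. */
    while (blocks-- > 0) {
        sum += x[0] * y[0];
        sum += x[1] * y[1];
        sum += x[2] * y[2];
        sum += x[3] * y[3];
        sum += x[4] * y[4];
        sum += x[5] * y[5];
        sum += x[6] * y[6];
        sum += x[7] * y[7];
        x += 8;
        y += 8;
    }

    /* Remainder loop for the dim%8 part. */
    while (rest-- > 0)
        sum += *x++ * *y++;

    return sum;
}
```

Testing the block count at the top of the loop, rather than with a
do/while, means vectors shorter than 8 (including length 0) fall straight
through to the remainder loop instead of running the unrolled body once.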