brooks@lll-crg.UUCP (Eugene D. Brooks III) (09/25/85)
With regard to the performance of the Caltech C compiler on single-precision floating-point operations: this compiler implemented both single- and double-precision register variables and delivered quite a substantial speed improvement over the standard Unix compiler. I recorded speedups of up to a factor of 2.5 for some vector operations implemented as unrolled loops in C, for example:

    register float *a, *b, *c;
    int dim;

    dim /= 8;               /* number of 8-element blocks; assumes dim >= 8 */
    do {
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
    } while (--dim > 0);    /* runs the body exactly dim times */

With the appropriate handling of the dim % 8 leftover elements, this went like the devil. Other operations, such as the dot product, obtained similar speed improvements. Inspection of the resulting assembler code showed that one could not do better by writing in assembler.
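The following is a minimal sketch of what "appropriate handling of the dim % 8 part" might look like, together with a dot product unrolled the same way. The function names `vadd8` and `dot8` are hypothetical and not from the original post. The sketch assumes the cleanup is a simple element-at-a-time loop after the unrolled blocks, and it uses pre-test `while` loops so that lengths below 8 (including 0) are handled without a special case.

```c
#include <assert.h>

/* Hypothetical helper: a[i] = b[i] + c[i] for i in [0, dim).
 * The main loop is unrolled 8 ways; a cleanup loop handles the
 * dim % 8 leftover elements one at a time. */
void vadd8(register float *a, register float *b, register float *c, int dim)
{
    register int blocks, rest;

    assert(dim >= 0);
    blocks = dim / 8;   /* number of full 8-element blocks */
    rest = dim % 8;     /* leftover elements after the blocks */

    while (blocks-- > 0) {
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
        *a++ = *b++ + *c++;
    }
    while (rest-- > 0)  /* cleanup: at most 7 iterations */
        *a++ = *b++ + *c++;
}

/* Hypothetical helper: dot product of x and y over dim elements,
 * unrolled the same way and accumulated in a register float. */
float dot8(register float *x, register float *y, int dim)
{
    register float sum = 0.0f;
    register int blocks, rest;

    assert(dim >= 0);
    blocks = dim / 8;
    rest = dim % 8;

    while (blocks-- > 0) {
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
        sum += *x++ * *y++;
    }
    while (rest-- > 0)
        sum += *x++ * *y++;
    return sum;
}
```

With 19 elements, for instance, both functions run two unrolled blocks (16 elements) and then three cleanup iterations. Whether a given compiler actually keeps `sum` and the pointers in registers is up to the compiler; `register` is only a hint.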