ciemo@bananapc.wpd.sgi.com.SGI.COM (Dave Ciemiewicz) (05/20/89)
To use pointers or array indexes, that is the question. The SGI (MIPS) compilers for the 4D are optimization monsters. The optimizer is "smart" enough to recognize special-case uses of array indexes and convert them to pointers internally. This removes the infamous multiply within your loop when using array indices. In fact, there are cases where indexes generate more efficient code than the corresponding code using pointers. The following C source file demonstrates this.

----- optimizer.c -------------------------------------------------------------
/* 1 */extern v3f(float [3]);
/* 2 */
/* 3 */dovertices_indexed(float v[][3], unsigned int len) {
/* 4 */    register unsigned int i;
/* 5 */
/* 6 */    for (i=0; i<len; i++) {
/* 7 */        v3f(v[i]);
/* 8 */    }
/* 9 */}
/* 10 */
/* 11 */dovertices_pointer(float v[][3], unsigned int len) {
/* 12 */    register float ** p;
/* 13 */
/* 14 */    for (p=v; p != v+len*3; p+=3) {
/* 15 */        v3f(*p);
/* 16 */    }
/* 17 */}
----- optimizer.c -------------------------------------------------------------

"cc -O2 -c optimizer.c" will generate an optimized object file, "optimizer.o". Use "dis optimizer.o" to disassemble the object code produced. The array-indexed code generates the following inner body of the loop:

    [test.c: 7]   0x38:  0c000000  jal    v3f
  * [test.c: 7]   0x3c:  02202021  move   a0,s1
    [test.c: 8]   0x40:  2610000c  addiu  s0,s0,12
    [test.c: 8]   0x44:  0212082b  sltu   at,s0,s2
    [test.c: 8]   0x48:  1420fffb  bne    at,zero,0x38
  * [test.c: 8]   0x4c:  2631000c  addiu  s1,s1,12

a0 is the argument to the function call v3f. Yes, it does get set *after* the jump to v3f is initiated. A jump takes more than a cycle to complete, so there is a delay: a delay slot is left open for an executable instruction, and there is enough time to set up the argument after initiating the jump but before completing it. (Tricky, huh?) s1 holds the address of v[i], s0 holds i*12 (the byte offset of v[i], since each float[3] row is 12 bytes), and s2 holds len*12. Guess where the multiply went.
Six instructions were generated for the loop, so the loop body takes 6 cycles (not counting the time spent inside v3f).

The pointer version generates the following inner body of the loop:

    [test.c: 15]  0x98:  8e040000  lw     a0,0(s0)
    [test.c: 15]  0x9c:  0c000000  jal    v3f
  * [test.c: 15]  0xa0:  00000000  nop
    [test.c: 16]  0xa4:  2610000c  addiu  s0,s0,12
    [test.c: 16]  0xa8:  1630fffb  bne    s1,s0,0x98
  * [test.c: 16]  0xac:  00000000  nop

a0 is the argument to the function call v3f. This time a0 is loaded from memory instead of being copied from a register. The load has a delay slot of its own, which the jump fills; the jump's delay slot is in turn filled by a no-op. s0 is p, and s1 is v+len*3. The final instruction is a no-op filling the branch delay slot.

Only 4 "real" instructions were generated for the inner loop; in a different code example the 2 no-ops could be replaced by useful instructions. The execution time for this example is the same as before, 6 cycles. However, the load instruction can miss in the data cache (on the first load at least), which adds extra cycles, with the penalty depending on the CPU type.

With the different setups involved in the two types of array traversal (indexes versus pointers) and the potential for data cache misses, it is difficult to say that one style is clearly superior to the other. Which style wins depends on the instructions performed in the loop and even on the CPU executing it.

From an aesthetic point of view, I prefer array indexes; they are easier to understand than pointers. My overall recommendation is to avoid language tricks when developing code. After the code is working, use the profiler to locate areas that might benefit from the tricks.

-- 
Dave (commonplace) "Boldly going where no one cares to go." Ciemiewicz (incomprehensible) ciemo (infamous)
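The index-to-pointer conversion described above can be sketched in portable C. This is an editor's illustration, not Dave's code and not the MIPS compiler's actual output: v3f_stub is a hypothetical stand-in for the IRIS GL v3f() call that only records the address it receives, and dovertices_pointer here declares p as float (*)[3], the type v actually decays to, instead of float **. With that type the loop steps one 12-byte row at a time and stops at v + len, and *p names the row itself, so no memory load is needed to produce the argument (the same effect as the v3f(p) correction in the reply below).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IRIS GL v3f() call: it only records the
   vertex address it receives so the two loops can be compared. */
#define MAX_CALLS 16
static const float *seen[MAX_CALLS];
static size_t nseen = 0;

static void v3f_stub(const float vtx[3])
{
    assert(nseen < MAX_CALLS);   /* keep the recording buffer in bounds */
    seen[nseen++] = vtx;
}

/* Indexed form, as in dovertices_indexed: v[i] implies a byte offset of
   i * sizeof v[0], i.e. i * 12 for a float[3] row. */
static void dovertices_indexed(float v[][3], unsigned int len)
{
    unsigned int i;
    for (i = 0; i < len; i++)
        v3f_stub(v[i]);
}

/* Hand-reduced form: a running row pointer bumped by one row (12 bytes)
   per trip, so no multiply is left in the loop.  Because p has type
   float (*)[3], *p is the row itself and decays to its address; no
   memory load is needed to form the argument. */
static void dovertices_pointer(float v[][3], unsigned int len)
{
    float (*p)[3];
    for (p = v; p != v + len; p++)
        v3f_stub(*p);
}
```

Both functions hand v3f_stub the same sequence of row addresses; what the thread is arguing about is only the machine code each form turns into.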
tarolli@dragon.wpd.sgi.com (Gary Tarolli) (05/22/89)
In article <33239@sgi.SGI.COM>, ciemo@bananapc.wpd.sgi.com.SGI.COM (Dave Ciemiewicz) writes:
> To use pointers or array indexes, that is the question. The SGI (MIPS)
> compilers for the 4D are optimization monsters. The optimizer is "smart"
> enough to recognize special case use of array indexes and convert them to
> pointers internally. This removes the infamous multiply within your loop
> when using array indices. The fact of the matter is that there are cases
> where use of indexes will generate more efficient code than the corresponding
> code using pointers. The following is a C source file which demonstrates
> this.
>
> ----- optimizer.c -------------------------------------------------------------
> /* 1 */extern v3f(float [3]);
> /* 2 */
> /* 3 */dovertices_indexed(float v[][3], unsigned int len) {
> /* 4 */    register unsigned int i;
> /* 5 */
> /* 6 */    for (i=0; i<len; i++) {
> /* 7 */        v3f(v[i]);
> /* 8 */    }
> /* 9 */}
> /* 10 */
> /* 11 */dovertices_pointer(float v[][3], unsigned int len) {
> /* 12 */    register float ** p;
> /* 13 */
> /* 14 */    for (p=v; p != v+len*3; p+=3) {
> /* 15 */        v3f(*p);
> /* 16 */    }
> /* 17 */}
> ----- optimizer.c -------------------------------------------------------------

If I am not mistaken, the second example should be "v3f(p);" and not "v3f(*p);". That removes the offending load-word instruction, and with it the possibility of a cache miss, and makes the second loop more efficient.

However, I agree with Dave: arrays are generally as efficient as pointers, and they are easier to read. It also makes sense to write with arrays and then use prof to isolate your bottlenecks.

That said, there is one case where arrays are much faster than pointers on the MIPS CPU: when you are unrolling (unwinding) an auto-incremented pointer loop.
For example,

    for (i=0; i<end; i+=4) {
        to[i] = from[i];
        to[i+1] = from[i+1];
        to[i+2] = from[i+2];
        to[i+3] = from[i+3];
    }

is much better than

    while (to < end) {
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
    }

because MIPS doesn't have an auto-incrementing register instruction, so the second loop pays 2 extra adds (one for to, one for from) per line of code, which the first loop avoids. So sometimes it is better to use base+offset than *p++. Sometimes ... on certain machines ... when the moon is full ....
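A runnable version of the two unrolled loops, as a sketch under assumptions that are not in the post: the names copy_indexed and copy_autoinc are hypothetical, both copy n floats, and each adds a cleanup loop for counts that are not a multiple of 4. Whether the per-statement pointer adds actually survive in the generated code depends on the compiler (a modern optimizer will often fold them into offsets), so check the disassembly, for example with dis as earlier in the thread, before relying on either form.

```c
#include <stddef.h>

/* Base+offset unrolling (the first loop above): one index update per
   four copies; each access is a fixed offset from to/from. */
static void copy_indexed(float *to, const float *from, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   /* i + 4 <= n avoids unsigned wraparound */
        to[i]     = from[i];
        to[i + 1] = from[i + 1];
        to[i + 2] = from[i + 2];
        to[i + 3] = from[i + 3];
    }
    for (; i < n; i++)             /* leftover 0..3 elements */
        to[i] = from[i];
}

/* Auto-increment unrolling (the second loop above): written literally,
   each *to++ = *from++ implies two pointer adds on a CPU without an
   auto-increment addressing mode, eight per trip. */
static void copy_autoinc(float *to, const float *from, size_t n)
{
    float *end  = to + n;
    float *end4 = to + (n - n % 4);   /* last position reachable in steps of 4 */
    while (to < end4) {
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
        *to++ = *from++;
    }
    while (to < end)                  /* leftover 0..3 elements */
        *to++ = *from++;
}
```

Both functions produce identical results; the design point is only how many address updates each iteration needs.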