self@BAYES.ARC.NASA.GOV (Matthew Self) (03/11/89)
This is on a sun3-os4, but the problem first appears in the dataflow RTL dump, so it is probably machine-independent. Consider the simple array copy function below which is suitable for strength reduction optimization: void copy (int *a, int *b, int n) { int i; for (i = 0; i < n; i++) b[i] = a[i]; } gcc -O yields: L11: movel a1@(d0:l:4),a0@(d0:l:4) addql #1,d0 cmpl d0,d1 jgt L11 gcc -O -fstrength-reduce successfully optimizes this to: L11: movel a0@+,a1@+ addql #1,d0 cmpl d0,d1 jgt L11 But look what happens as soon as the index variable is a short: void copy (int *a, int *b, short n) { short i; for (i = 0; i < n; i++) b[i] = a[i]; } gcc -O yields just about the same code as before: L11: movel a1@(d0:w:4),a0@(d0:w:4) addqw #1,d0 cmpw d0,d1 jgt L11 but this time gcc -O -fstrength-reduce fails to optimize this any further. This may be hard to solve, since two iteration variables won't necessarily remain in lockstep unless they overflow in the same way. For example, if we have an array with 2^16 elements, we might rely on the unsigned short index variable wrapping around. Strength reduction could easily introduce subtle bugs in this case. What we really need to solve this problem is enough smarts to recognize that this loop will run exactly n times, with i taking on the values 0 to n-1. Then we can verify that overflow won't occur (since n is also a short), and thus strength reduction is OK. This would be useful for other reasons, since we might then be able to make use of cheaper tests or funny decrement and branch instructions by shifting the loop to run from n-1 to 0. We might also be able to reverse the order of loops so as to use post-increment rather than pre-decrement addressing if that is faster (or available at all). I am interested in working on this problem, if someone who is more familiar with the dataflow code also wants to.... --Matthew