spies@eastrg2.cray.com (John Spies) (09/27/90)
Tim, Try the function ISMAX. It's documented in the unicos math and scientific library reference(sr-2081) manual and also by doing man ismax. It should be callable from c and most of those routines are fairly well optimized by our developement group. J.S.
desj@idacrd.UUCP (David desJardins) (09/28/90)
From article <16279@thorin.cs.unc.edu>, by cullip@vangogh.cs.unc.edu (Timothy Cullip): > I'm reasonably new to the Cray and am trying to do something > simple: find the maximum element in an array. You have picked a very interesting example to start off with. The code you submit is actually nearly optimal (asymptotically, it is optimal, if compiled correctly) in the case of a randomly-ordered array. Note that it can easily be modified to return the location of the maximum as well as the value by saving it when it occurs. The alternative is to use the Cray library routine ISMAX(3SCI). This has both advantages and disadvantages over your routine. The big disadvantage is that it will be a factor of two-plus slower in the randomly-ordered case (as I discovered the first time I tried to use it in an application where I cared about speed). It uses a search algorithm which is a common paradigm for Cray programming. Generate a vector with the first 64 elements of your array. Then read the next 64 elements; compare each to the corres- ponding element of your vector, and choose the larger. Repeat until your array is exhausted. One of the 64 elements of your vector will be the maximum. Of course, you have to deal with fragments which are not multiples of 64, and further it is not trivial to write efficient high-level code for this algorithm (the actual library routine is written in assembler). It takes only a little knowledge of Cray hardware to realize that the above algorithm, when perfectly optimized, is about two times slower than your algorithm, as long as you stay in your inner loop. Your code just does one-plus chimes of work on each vector, setting the vector mask and then testing it for nonzero (the memory fetch and floating subtract should be overlapped or chained); the Cray code does two-plus chimes, setting the vector mask and then always a vector merge. The "plus" is then to go back and find the maximum, which requires looking again at every 64th element. If you want to be clever you can avoid a stride of 64 here either by setting the vector length to 63 or (more complicated) by using a nonunit stride near n/64 in the first pass. In the random-long-vector case, therefore, your code will be much better. The break out of the vector loop only occurs, on average, about lg n times out of n/64 vector iterations, which is asymptotically negligible. The Cray code has the distinct advantage, however, that its worst case is the same as its average case. Your worst case is incredibly bad; if the array happens to be monotonically increasing, I would expect your code to be hundreds of times slower than in the random case. You may also pay more overhead than you like when the array is fairly short. (Solving this problem efficiently on short arrays is difficult. But in general you should almost never attack a problem this way in a time-critical application with short arrays, but rather find 64 maxima in parallel.) It might make sense to merge the two algorithms by implementing the Cray algorithm but with a check for zero-vector-mask. This would mean that in the case that none of the 64 array elements is greater than the corresponding vector element (still "almost always" in the asymptotic random case), that a single chime would suffice, while the worst case would not be much worse than the the existing Cray code. Also, of course, if you want only the value of the maximum and not its location, a version of the Cray code without that feature would be slightly faster (this is more significant on a Cray-2 than on an [XY]-MP, because the stride-64 problem is greater). -- David desJardins
hennebry@plains.NoDak.edu (Michael J. Hennebry) (10/06/90)
Try something like this: float findmax(a, size) float a[]; int size; { int j, inc, inc2; for(inc=1, inc2=2; inc<size; inc=inc2, inc2<<=1) { for(j=0; j<size; j+=inc2) { if(a[j] < a[j+inc]) a[j]=a[j+inc]; } } return a[0]; } this works if size is a power of two and you don't mind destroying a. whether it is efficient depends on whether the compiler can tell that the inner loop is vectorizable. This depends on inc, inc2 > 0 another possibility is float findmax(a, work, size) float[], work[]; int size; { int j, inc, inc2; float *to, *from, *temp; from=a; to=work; for(inc=1, inc2=2; inc<size; inc=inc2, inc2<<=1, temp=from, from=to, to=temp) { for(j=0; j<size; j+=inc2) { if(from[j] < from[j+inc]) to[j]=from[j+inc]; else to[j]=from[j]; } } return from[0]; } Having two 'arrays' may cause the compiler to infer that the inner loop is vectorizable. -- Mike hennebry@plains.NoDak.edu "Megalomania is such a *wonderful* illness." -- Asmodeus Mogart