pmontgom@sonia.math.ucla.edu (Peter Montgomery) (12/17/89)
In article <28413@amdcad.AMD.COM> davec@cayman.amd.com (Dave Christie) writes:
>
>Quite true, but (IMHO) for most applications, int<->fp traffic is not very
>high, so having to explicitly move some data around is no big deal ... as
>long as it is done reasonably efficiently so that you don't blow one or two
>applications out of the water.
>
	I have written a multiple precision integer arithmetic package.
Its time-critical routines are conditionally compiled so the algorithms
can be individually optimized for different machines (yes, I resemble
Herman Rubin in wanting full access to machine instructions from
high-level languages).  On some machines, almost all arithmetic is
integer.  Recently, while porting my program to an Alliant, I
vectorized some of these codes.

	A frequent primitive operation requires finding integers X, Y
such that X*2**30 + Y = A*B + C, where A, B, C are given (vectors of)
integers up to 2**30 (2**30 is the radix for the multiple-precision
arithmetic).  The Alliant has a vectorized integer multiply instruction,
but only for the lower 32 bits of a product.  Hence I can get
Y = IAND(A*B + C, 2**30 - 1) with vectorized integer operations.
To get X (the upper half of the product) using vector operations,
I use floating point, such as

	X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y))*0.5**30)

(DBLE = convert integer to double, NINT = convert double to nearest
integer).  This statement uses 4 conversions between integer and
floating point while doing only 3 floating point operations.  The
vectorized DBLE is compiled inline, but the vectorized NINT is done in
a subroutine; the NINT subroutine uses 10% of my program's time (though
total program time is down from before vectorization).

	BTW, I have asked the X3J3 committee to add this operation
(i.e., given A, B, C, D, all nonnegative, and either A < D or B < D,
find X, Y such that X*D + Y = A*B + C) to Fortran 8x.
--------
	Peter Montgomery
	pmontgom@MATH.UCLA.EDU
	Department of Mathematics, UCLA, Los Angeles, CA 90024
davec@proton.amd.com (Dave Christie) (12/18/89)
In article <2100@sunset.MATH.UCLA.EDU> pmontgom@math.ucla.edu (Peter Montgomery) writes:
#In article <28413@amdcad.AMD.COM> davec@cayman.amd.com (Dave Christie) writes:
#>
#>Quite true, but (IMHO) for most applications, int<->fp traffic is not very
#>high, so having to explicitly move some data around is no big deal ... as
#>long as it is done reasonably efficiently so that you don't blow one or two
#>applications out of the water.
#>
#	I have written a multiple precision integer arithmetic package.
#Its time-critical routines are conditionally compiled so
#the algorithms can be individually optimized for different machines
#(yes, I resemble Herman Rubin in wanting full access to machine
#instructions from high-level languages).  On some machines,

Yes, multiple precision seems to be one of the applications you may want
to be careful about (Herman Rubin replied as well!).  My rash
generalization comes from a former life at a TLA mainframe company,
where I did detailed instruction frequency measurements of a wide range
of real-world Fortran programs - such things as structural analysis,
flight simulation, ECAD, and many other flavors (but not likely multiple
precision - notice I'm not using the words "representative" or
"comprehensive" ;-).  This indicated (among many other things) that the
frequency of converts would not be a big factor in deciding whether to
go with separate or shared registers, were one to design a new
general-purpose ISA - other factors carry more weight.  Of course, your
mileage may vary.  What one sees fit to do generally depends on what one
thinks will bring in the most bucks, right?

#A frequent primitive operation requires finding integers X, Y such that
#X*2**30 + Y = A*B + C where A, B, C are given (vectors of) integers
#up to 2**30 (2**30 is the radix for multiple-precision arithmetic).
#The Alliant has a vectorized integer multiply instruction, but
#only for the lower 32 bits of a product.
#Hence I can get
#Y = IAND(A*B + C, 2**30 - 1) with vectorized integer operations.
#To get X (upper half of product) using vector operations,
#I use floating point, such as
#
#	X = NINT((DBLE(A)*DBLE(B) + DBLE(C - Y))*0.5**30)
#
#(DBLE = convert integer to double, NINT = convert double to nearest integer).
#This statement uses 4 conversions between integer and floating point
#while doing only 3 floating point operations.

This sounds like a nice extreme case.  Now I'm curious - does anyone
know, or care to figure out (I'm not curious enough to take the time
myself :-), whether the simultaneous integer/fp issue capability of the
i860 would more than compensate for the penalty on converts caused by
the separate int/fp registers, even on this extreme piece of code?

#	Peter Montgomery
#	pmontgom@MATH.UCLA.EDU
#	Department of Mathematics, UCLA, Los Angeles, CA 90024
------------
Dave Christie
The opinions are mine, the facts belong to everyone ....
oops - sorry, the facts are proprietary.....