aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (09/30/89)
(Is this a good place for computer arithmetic questions?)

I just read a paper by Deming in the 8th Computer Arithmetic Symposium which
basically deflates some of the proposed alternate arithmetics.  One is
level-index, where a floating point number consists of three fields (pointer,
exponent, and mantissa) and the pointer indicates how many bits are "borrowed"
from the mantissa to expand the exponent and prevent overflow or underflow.
Another is what I call the meta-exponential form, where a number is expressed
as exp(exp(...(exp(m)))) and you store the number of exponentiations necessary
to bring the value m within the desired range.

Level-index has a much larger range than normal FP, so it overflows more
rarely, while the meta-exponential form can be shown to be closed under
+, -, *, and / for a given number of bits in the representation.  I.e., both
of these representations trade range for reduced relative precision at the
extrema of the range.  Deming shows how this tradeoff moves the complexity of
coding reliable numerical software from avoiding overflow to handling
roundoff: reduced precision makes rounding error analysis more complicated.

QUESTION: Don't the same arguments apply to IEEE floating point with
denormalized numbers?  I.e., don't denormalized numbers complicate roundoff
error analysis in the same way reduced precision complicates the other
arithmetics?

Deming suggests a sticky register which tracks the least relative precision
ever used in the calculation of intermediate results, which will give you a
worst-case rounding error.  Would such a register be worthwhile for tracking
the most extremely denormalized IEEE FP number encountered?  Does anyone do
this sort of thing?
--
Andy "Krazy" Glew, Motorola MCD, aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.  {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
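The sticky-register idea could be prototyped in software.  A minimal sketch
(the class name and API are mine, not from the paper), assuming Python floats
are IEEE 754 doubles, which records the most extremely denormalized
intermediate result seen in a calculation:

```python
import sys

MIN_NORMAL = sys.float_info.min          # smallest normal double, 2^-1022

class StickyDenormRegister:
    """Hypothetical software sticky register: remembers the most
    extremely denormalized intermediate result recorded so far."""
    def __init__(self):
        self.worst = None
    def record(self, x):
        if x != 0.0 and abs(x) < MIN_NORMAL:             # x is denormalized
            if self.worst is None or abs(x) < abs(self.worst):
                self.worst = x
        return x

reg = StickyDenormRegister()
x, y, z = 3e-308, 2.9e-308, 2.0
d = reg.record(x - y)        # the difference underflows into the denorm range
result = reg.record(d * z)
assert reg.worst is not None and abs(reg.worst) < MIN_NORMAL
```

Inspecting `reg.worst` after a run gives a crude bound on how much relative
precision the worst intermediate result could have carried.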
ccplumb@rose.waterloo.edu (Colin Plumb) (10/03/89)
(I'm going a bit out on a limb, since someone with more experience than I may
prove all my ideas total nonsense, but this is what I learned in conversation
with a member of the IEEE 754 standards committee.)

In article <AGLEW.89Sep29135055@chant.urbana.mcd.mot.com>
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) writes:
> Deming shows how this tradeoff moves the complexity of coding
> reliable numerical software from avoiding overflow to handling
> roundoff: reduced precision makes rounding error analysis more
> complicated.

True.  There are some really useful axioms binary floating point obeys that,
say, IBM-style base-16 arithmetic doesn't.  E.g., (a+b)/2 lies in the closed
interval between a and b.  In base 16, if a and b both have the high bit of
the high 4-bit digit set, then adding them causes you to shift by 4 bits at
once, dropping 3 off the bottom.  Dividing by 2 then shifts 0's in and makes a
mess of things.  This can happen anywhere you can lose more than 1 bit of
mantissa in an addition step, such as variable-size exponent encodings.

> QUESTION: Don't the same arguments apply to IEEE floating point with
> denormalized numbers?  I.e., don't denormalized numbers complicate roundoff
> error analysis in the same way reduced precision complicates the other
> arithmetics?

Surprisingly, no... they improve things!  I've seen a letter from no less
eminent an authority than Our Lord Knuth retracting his opposition to
denormalised numbers.  This is because denormalised numbers let you add and
subtract near the lower end of the expressible range without losing absolute
precision.

Consider a representation without denormalised numbers.  There is some
minimum exponent, 2^-min, which can be multiplied by a mantissa from
1.000...000 to 1.111...111.  The difference made by a 1 in the least
significant bit of the mantissa is 2^-min * 0.000...001, i.e.
2^-(min+mantsize).
You can add and subtract a lot of these units, but once you get down to
1x2^-min, the next jump is 2^-min all the way down to zero.  Rather annoying!
Denormalised numbers let you express the difference between any two
representable numbers with as much absolute accuracy as the less precise of
the inputs.  Rather useful for fiddling with the last few bits of an error
term in some messy polynomial approximation or whatever.

> Deming suggests a sticky register which tracks the least relative precision
> ever used in the calculation of intermediate results, which will give you a
> worst-case rounding error.  Would such a register be worthwhile for tracking
> the most extremely denormalized IEEE FP number encountered?  Does anyone do
> this sort of thing?

I don't know... generally those who are really concerned about such things do
interval arithmetic, keeping two answers at all stages which the true answer
is guaranteed to lie between.  There are problems with covariance (even if x
is x +/- epsilon, x-x is exactly zero, not 0 +/- 2*epsilon), but it provides
good worst-case bounds.

Addition and multiplication do rather different things to error bounds.  For
the former, an absolute error bound is best; for the latter, a relative one.
Mixing the two leads to all sorts of messy analysis.  This is one of the
reasons that specifying the rounding mode in the instruction rather than in a
mode register is A Good Thing.
--
	-Colin
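Colin's claim about exact differences is easy to check on any IEEE 754
machine; Python floats are IEEE doubles, so a small sketch near the bottom of
the normal range will do:

```python
import sys

min_normal = sys.float_info.min       # 2^-1022, smallest normal double

x = min_normal * 1.5
y = min_normal * 1.25
d = x - y                             # 0.25 * 2^-1022: a denormalized number

assert 0.0 < d < min_normal           # d lies in the denormal range
assert d == min_normal * 0.25         # the difference is represented exactly
assert y + d == x                     # so no absolute precision was lost
```

On flush-to-zero hardware (no denormals) d would be 0.0, and the 2^-min "jump
to zero" described above would swallow the whole difference.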
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (10/03/89)
A while back I posted concerning the following paper:

    "On Error Analysis in Arithmetic with Varying Relative Precision",
    James W. Demmel (Courant), Proc. 8th Symp. Comp. Arith., 1987, pp. 148-152.

(I incorrectly named the author as Deming, instead of Demmel.  My apologies
to both Messrs. Deming and Demmel.)

I asked whether the arguments against varying relative precision in
level-index types of arithmetic do not also apply to denormalized numbers in
IEEE floating point.  I have received many answers, both by email and news,
of the form "Of course not - denorms preserve absolute precision on +/-".
This is true enough.  But isn't it also true that denorms lose relative
precision?

E.g., if I compute (x-y)*z and x-y produces a denorm, then, instead of
relative precision related to 1/2^M, where there are M bits in the mantissa,
do you not have relative precision related to 1/2^D, where there are only D
valid bits in the denormalized difference?  In fact, if you permit
denorm - denorm, might you not be reducing relative accuracy to 1/2 (since
you can conceivably have only one significant denormalized bit)?

Note that I am not pushing the alternative, which would be to make
(x-y)*z => 0*z => 0, which may have much lower relative precision.  If
anything, I was wondering whether the sticky precision register would be
useful.
--
Andy "Krazy" Glew, Motorola MCD, aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.  {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
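The worry about D valid bits can be made concrete: deep in the denormal range
the unit in the last place stays fixed at 2^-1074, so relative precision
degrades as the value shrinks.  A sketch (assuming IEEE doubles;
math.ulp needs Python 3.9+):

```python
import math
import sys

min_normal = sys.float_info.min     # 2^-1022
tiny = 5e-324                       # smallest denormal, 2^-1074

x = min_normal
y = min_normal - 3 * tiny           # three denormal ulps below x
d = x - y                           # exactly 3 * 2^-1074: only 2 valid bits
assert d == 3 * tiny

# A normal double carries about 2^-53 relative precision; here the spacing
# near d is a third of d itself, so D (the valid bits) has dropped to 2.
assert math.ulp(d) / d > 0.25
assert math.ulp(1.0) / 1.0 < 1e-15
```

The subtraction itself is exact (no information is lost), but any subsequent
rounding near d can only be accurate to about one part in three.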
dik@cwi.nl (Dik T. Winter) (10/04/89)
In article <AGLEW.89Oct3123439@chant.urbana.mcd.mot.com>
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) writes:
> E.g., if I compute (x-y)*z and x-y produces a denorm, then, instead of
> relative precision related to 1/2^M, where there are M bits in the
> mantissa, do you not have relative precision related to 1/2^D, where there
> are only D valid bits in the denormalized difference?

This is only true if the multiplication produces a denorm too.  Note that if
underflow to zero occurs, the relative precision would be 0.  Numerical
analysis has always had to take care of underflow and near-underflow.  But
the use of denorms simplifies matters in a number of cases (and does not make
it more difficult in the others).  The major advantage of denorms is, in my
opinion, that a statement like:

	if (x - y != 0.0) z = x / (x - y);

will not cause a divide-by-zero trap but gives an (approximately) correct
result.
--
dik t. winter, cwi, amsterdam, nederland
INTERNET : dik@cwi.nl
BITNET/EARN: dik@mcvax
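This guard can be tried directly (Python floats are IEEE doubles with
gradual underflow, so x != y guarantees x - y != 0.0):

```python
import sys

min_normal = sys.float_info.min              # 2^-1022

def guarded(x, y):
    # The test really does protect the division when denorms are present.
    if x - y != 0.0:
        return x / (x - y)
    return None

x = min_normal * 1.5
y = min_normal * 1.25
assert x != y
assert guarded(x, y) == 6.0      # x - y is the denormal 2^-1024, not zero
```

On flush-to-zero hardware x - y would be 0.0 here and, without the guard, the
division would trap exactly as described.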