phipps@fortune.UUCP (Clay Phipps) (09/11/84)
> ... the floating point on the VAX ... makes perfect sense
> when you look at what you can do with it.
> The 32 and 64 bit floating point formats
> are identical in the first 32 bits.
> The 64 bit format just adds more precision bits.
> This allows a user to reference a 64 bit floating point value
> as if it were a 32 bit one without any conversion.
> It also allows a user
> to reference a 32 bit floating point value
> as if it were a 64 bit floating point one without any conversion.

Strange, this sounds like a description of the floating point formats
on the IBM 360, an architecture that preceded the VAX by a few years
(about 14 years, actually: 1964 versus 1978).

Many numerical analysis types think that the IBM 360 floating point
is one of the worst.  Among other things, I think many people believe
that they are entitled to a few more exponent bits when they *double*
the size of each data value (from 32 to 64 bits).  The hexadecimal
(i.e., 4 bits instead of 1 bit) normalization on the 360 reduces the
effective precision of floating-point numbers and is another source
of irritation to numerical analysis people; I'm not sure what the VAX
does on this one.

-- 
Clay Phipps
-- 
{ amd hplabs!hpda sri-unix ucbvax!amd } !fortune!phipps
{ ihnp4 cbosgd decvax!decwrl!amd harpo allegra }
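The format sharing the quoted poster describes looks roughly like
this in outline (a C sketch of mine, not DEC code; the struct names
are made up, and I am ignoring the VAX's PDP-11-style 16-bit word
order within a longword):

#include <stdint.h>

/* F_floating: 1 sign bit, 8-bit excess-128 exponent, 23 fraction bits.
 * D_floating: the same first 32 bits, plus 32 more fraction bits.    */
struct vax_f { uint32_t sign_exp_frac; };
struct vax_d { uint32_t sign_exp_frac;    /* identical to vax_f      */
               uint32_t frac_low; };      /* extra precision only    */

/* D -> F is just truncation of the low fraction bits: no rounding
 * and no exponent adjustment, since the exponent fields agree.     */
struct vax_f d_to_f(struct vax_d d)
{
    struct vax_f f = { d.sign_exp_frac };
    return f;
}

/* F -> D is exact: append zero fraction bits. */
struct vax_d f_to_d(struct vax_f f)
{
    struct vax_d d = { f.sign_exp_frac, 0 };
    return d;
}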
haapanen@watdcsu.UUCP (Tom Haapanen [DCS]) (09/12/84)
The 360/370/30XX/43XX series IBM machines use hexadecimal
normalization.  This is the pits.  The VAX uses binary normalization.
This is MUCH better.

Tom Haapanen
University of Waterloo
{allegra,decvax,ihnp4,utzoo}!watmath!watdcsu!haapanen

	...thanks to CS375 (Numerical Analysis)...
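Why it's the pits, in a few lines of C (a sketch of mine, not IBM or
DEC code): hexadecimal normalization only guarantees the leading
4-bit *digit* of the fraction is nonzero, so up to 3 of the leading
fraction bits can be zeros that carry no information.

#include <stdio.h>

/* Count the leading zero bits in a nonzero hex digit (1..15).
 * Under hex normalization these bits are wasted; under binary
 * normalization the leading bit is always 1 (and the VAX does
 * not even bother to store it). */
static int wasted_bits(unsigned lead_digit)
{
    int n = 0;
    for (unsigned mask = 0x8; mask && !(lead_digit & mask); mask >>= 1)
        n++;
    return n;
}

int main(void)
{
    for (unsigned d = 1; d <= 15; d++)
        printf("leading hex digit %2u: %d wasted bit(s)\n",
               d, wasted_bits(d));
    return 0;
}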
dmmartindale@watcgl.UUCP (Dave Martindale) (09/12/84)
The VAX (whose F and D floating formats were copied from the PDP-11)
has quite reasonable floating point: normalization is binary and the
leading '1' bit is not stored, so it gets several bits more precision
than the IBM three-sickly, and it takes some care to round properly.
Its main weakness is a relatively small exponent range, fixed by the
G and H floating point types introduced a few years ago.  I believe
VAX floating point was proposed as one of the competing candidates
for the IEEE 754 floating-point standard.

If you want to see a really interesting floating-point format, read
up on what was eventually adopted for the IEEE 754 standard.  It
starts out similar to VAX floating point in representation, but adds
gradual underflow for numbers too small to normalize, and
representations for infinity and NaN (Not a Number - basically an
undefined result).  There is an extended-precision format defined for
partial results in internal registers.  Rounding can be done to the
nearest representable value, towards 0 (truncation), or towards plus
or minus infinity.  All the special cases of doing arithmetic on
operands of zero, infinity, and NaN are specified, as are the actions
taken if a result overflows or underflows or an operation is invalid.
Lots of neat stuff.

It is nice to see someone make a hardware design more complex in
order to make the software simpler and more straightforward (and more
likely to be correct).
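For the curious, here is roughly what those features look like from a
program on a machine that implements the standard (sketched in C; the
<fenv.h> rounding-control interface is a later standard C facility,
not part of the IEEE proposal itself):

#include <fenv.h>
#include <float.h>
#include <stdio.h>

#pragma STDC FENV_ACCESS ON

int main(void)
{
    volatile double zero = 0.0, one = 1.0, three = 3.0;

    /* Infinity and NaN are ordinary representable values, not traps. */
    double inf  = one / zero;     /* +infinity */
    double qnan = inf - inf;      /* NaN: basically an undefined result */
    printf("inf = %g  nan = %g  (nan == nan) = %d\n",
           inf, qnan, qnan == qnan);

    /* Gradual underflow: a result too small to normalize becomes a
     * denormal and loses precision bit by bit instead of flushing
     * straight to zero. */
    volatile double tiny = DBL_MIN / 4.0;
    printf("DBL_MIN/4 = %g, still > 0: %d\n", tiny, tiny > 0.0);

    /* The four rounding directions. */
    fesetround(FE_DOWNWARD);   printf("1/3 down:    %.17f\n", one / three);
    fesetround(FE_UPWARD);     printf("1/3 up:      %.17f\n", one / three);
    fesetround(FE_TOWARDZERO); printf("1/3 to zero: %.17f\n", one / three);
    fesetround(FE_TONEAREST);  printf("1/3 nearest: %.17f\n", one / three);
    return 0;
}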
rcb@rti-sel.UUCP (09/14/84)
The IBM (Itty Bitty Machines) floating point is NOT the same as the
VAX floating point format.  The IBM format is hex based, and numbers
must be normalized so that the highest hex digit is not zero.  The
VAX floating point is binary based, which means that the highest
normalized fraction digit (a single bit) must not be zero, which
means that it must be one.  The VAX takes advantage of this fact by
not storing that bit at all.

Also, any number that is a power of 2 (i.e. 1, 2, 4, .5, .25, etc.)
has no bits set in the stored fraction, and the floating point
hardware can take advantage of this.  A benchmark shows that 100
million floating multiplications take 45 seconds when these special
values are used and 2.5 minutes when any old numbers are used.

And, for the numerical analysis types that want greater range, they
have 2 options on the VAX.  The standard D format double floating
gives .29*10**-38 to 1.7*10**38 with 16 digits precision.  G format
floating point also uses 64 bits, but gives a range of .56*10**-308
to .9*10**308 with 15 digits precision.  And for you people who
really like big numbers, there is H format floating point, which uses
128 bits to give a range of .84*10**-4932 to .59*10**4932 with a
mighty 33 digits of precision.  Enough to satisfy even the most
maniacal numerical analyst.

Randy Buckland
Research Triangle Institute
...!mcnc!rti!rcb
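To make the hidden bit concrete, here is a sketch (mine, in C, taking
the F format longword as its two 16-bit words so memory byte order
doesn't matter) of how an F format value decodes:

#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a VAX F_floating value from its two 16-bit words.
 * Word 0: sign (bit 15), excess-128 exponent (bits 14-7),
 *         high 7 fraction bits (bits 6-0).
 * Word 1: low 16 fraction bits.
 * The fraction is 0.1fff...f binary -- the leading 1 after the
 * binary point is the hidden bit and is never stored. */
static double decode_f(uint16_t w0, uint16_t w1)
{
    int      sign = (w0 >> 15) & 1;
    int      exp  = (w0 >> 7) & 0xFF;
    uint32_t frac = ((uint32_t)(w0 & 0x7F) << 16) | w1;

    if (exp == 0)
        return 0.0;  /* true zero if sign==0; with sign==1 this is a
                        "reserved operand" that traps on a real VAX */

    double f = 0.5 + frac / 16777216.0;   /* hidden bit + frac/2^24 */
    double v = ldexp(f, exp - 128);
    return sign ? -v : v;
}

int main(void)
{
    /* 1.0: exponent 129, fraction field all zero.  Every power of 2
     * looks like this, which is what the fast multiply exploits. */
    printf("%g\n", decode_f(129 << 7, 0));
    return 0;
}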