stephen@estragon.uchicago.edu (Stephen P Spackman) (09/05/90)
Please excuse the quasi-flamage....

Am I really dense, or have I completely missed the point? Why are we burning silicon on floating point arithmetic when we could have fast 128-bit INTEGER arithmetic? Why do all the great arguments for RISC suddenly evaporate when FP looms?

Seems to me that 90% of the floating point code I've seen had a dynamic range that was low *and potentially known to the compiler*; it could have been directly compiled into scaled integers. Of what's left, much was pretty weird stuff with unpredictable behaviour, and the exponent in fixed-format FP was not big ENOUGH; maybe it should have been broken out into a separately specifiable (and independently computed) exponent ([int, long]float, you see).

Ok, so we can do the full analysis; maybe there're a couple of normalisation-shifts that deserve instructions, but I am *so tired* of having the silicon on my PC and my workstation wasted on floating point when all I want is Emacs and Unix and X to be tolerably fast (which on a Sparc they aren't). Sniff. Sorry about that.

But it really does seem to me that floating point is an INCREDIBLY arcane and domain-specific hack; it has nothing approaching the utility of arbitrary-precision integer arithmetic, bitblt, finite-field arithmetic, graph-rewrite, unification or just more registers.

Of course, I'm not into weather-prediction. Then again, I only ever met one programmer who was.

Somehow it seems to me that we are being led by the nose by marketing types and the ghosts of languages past.

stephen p spackman     stephen@estragon.uchicago.edu     312.702.3982
amull@Morgan.COM (Andrew P. Mullhaupt) (09/05/90)
In article <STEPHEN.90Sep5000536@estragon.uchicago.edu>, stephen@estragon.uchicago.edu (Stephen P Spackman) writes:
> Please excuse the quasi-flamage....
>
> Am I really dense, or have I completely missed the point? Why are we
> burning silicon on floating point arithmetic when we could have fast
> 128-bit INTEGER arithmetic? Why do all the great arguments for RISC
> suddenly evaporate when FP looms?

I can only answer your second question.

> Seems to me that 90% of the floating point code I've seen had a
> dynamic range that was low *and potentially known to the compiler*; it

It would seem you haven't seen much code for scientific computation, or many subroutine libraries. If I compile, say, an inner product subroutine, am I then supposed to go around remembering what scaling is appropriate and what length vectors it can handle?

> could have been directly compiled into scaled integers. Of what's
> left, much was pretty weird stuff with unpredictable behaviour and the
> exponent in fixed format FP was not big ENOUGH and maybe it should
> have been broken out into a separately specifiable (and independently
> computed) exponent ([int, long]float, you see).

_You_ might consider Linpack pretty weird stuff, but I think this would put you in a decided minority.

> Ok, so we can do the full analysis; maybe there're a couple of
> normalisation-shifts that deserve instructions, but I am *so tired* of
> having the silicon on my PC and my workstation wasted on floating
> point when all I want is Emacs and Unix and X to be tolerably fast
> (which on a Sparc they aren't). Sniff.

I keep my emacs and X on a Sun 3 where they belong. The RISC machines have better things to do than message passing.

> Of course, I'm not into weather-prediction. Then again, I only ever
> met one programmer who was.

NCAR is actually a large place. Weather prediction is a billion dollar problem. Damage due to severe storms, floods, and droughts is usually in the billions _every_ year.
Our best attempts in this regard still fall short, but you can't argue that they're not worth it. It may be possible to predict El Nino (that's what my old Ph.D. advisor is into these days...), and there is a lot of California exposed to this one. Naturally, a billion dollar problem can justify quite a few programmers.

> Somehow it seems to me that we are being led by the nose by marketing
> types and the ghosts of languages past.

There are a lot of ghosts put into the market, mostly by marketing types. When we buy a machine, we find out if it can run our proprietary software, and if so, how fast. Floating point performance is a big piece of the picture, along with our need for massive I/O throughput and our evaluation of the operating system. Sure, we need good integer performance, but we're not even remotely likely to consider replacing floating point with fixed arithmetic. It's a bad trade-off for us on any of the present machines.

Later,
Andrew Mullhaupt

Disclaimer: opinions expressed here are my own.
aglew@crhc.uiuc.edu (Andy Glew) (09/07/90)
>Is it so absurd to suggest, in sum, that exposing separate mantissa
>and exponent to the optimiser might result in *speedup* due to
>constant propagation and expression-rearrangement, while at the same
>time increasing expressivity by allowing an INDEPENDENT choice of
>mantissa and exponent sizes?

Someone at MIT(?) (MS thesis?) had a paper on "Micro-optimization of floating point" in a conference within the last few years.

Note that some of the same ideas can be applied to hardware: for example, you can combine the normalization post-shift with the alignment pre-shift for forwarded FP operands, saving a cycle.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]
stephen@estragon.uchicago.edu (Stephen P Spackman) (09/07/90)
Let me try to rephrase my question in a less controversial form. And please understand that I *do* appreciate the importance of weather forecasting, and the fact that there is a genuine *need* for supercomputing for that and all the other applications that involve the simulation of large dynamical systems.

But it does seem to me that most cycles really do go on operating systems and user interface, and putting integrated floating point into a silly little workstation like a Sparc or an 80486 machine is serious overkill (and I'm almost serious: these machines aren't fast enough for editing anymore) and a poor use of gates. Especially since, I conjecture, emulation could theoretically be faster.

So here's the restatement: SUPPOSE that hardware floating point were NOT a given. Suppose we were completely free to choose how to support the user. But users have these needs involving non-integral numbers. What should we do?

Well, any fast conventional architecture will have good support for non-negative integers, at least - they're needed for addressing. And it'll have good support for function call - that's needed to execute code. So the question is, what ELSE do we have to put in to get good coverage?

All I'm thinking is that an FPU *may not* be the best answer. It's 1990 now; we can rely on the compiler. Language-driven architecture is dead (though language-tuned architecture is another story). Semantic gap is, if not a myth, then a strength - compilers need elbowroom in which to optimise.

Is it so absurd to suggest that outside of the supercomputer market, scaled integers might honestly be a better solution for the problems that need solving (assuming that there is proper compiler support for them, and it isn't all hand-coded at every step)?

Is it so absurd to suggest that there might be PARTS of floating point instructions that, in the hands of a good optimiser, might be used to generate better code than their wholes (remembering the VAX CALLS linkage... :-)?
Is it so absurd to suggest, in sum, that exposing separate mantissa and exponent to the optimiser might result in *speedup* due to constant propagation and expression-rearrangement, while at the same time increasing expressivity by allowing an INDEPENDENT choice of mantissa and exponent sizes?

Is it so absurd to suggest that the effort and the silicon that go into an FPU might be better spent on supporting some other datatype that is of more _general_ applicability?

Maybe this has all been tried and I just haven't heard about it. If *that's* the case, please point me at some references. But I have the distinct impression that we're coasting along on research that was done thirty years ago or more, and that may need to be updated in the light of changing technology.

stephen p spackman     stephen@estragon.uchicago.edu     312.702.3982
billms@dip.eecs.umich.edu (Bill Mangione-Smith) (09/07/90)
In article <AGLEW.90Sep6233218@dwarfs.crhc.uiuc.edu> aglew@crhc.uiuc.edu (Andy Glew) writes:
>
>Someone at MIT(?) (MS thesis?) had a paper on "Micro-optimization of
>floating point" in a conference within the last few years.

William Dally, ASPLOS III.

>Andy Glew

bill
--
-------------------------------
Bill Mangione-Smith
billms@eecs.umich.edu
khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) (09/08/90)
In article <STEPHEN.90Sep6215928@estragon.uchicago.edu> stephen@estragon.uchicago.edu (Stephen P Spackman) writes:
> ... All I'm thinking is that an FPU *may not* be the best answer. It's
> 1990 now; we can rely on the compiler. Language-driven architecture is
> dead (though language-tuned architecture is another story). Semantic
> gap is, if not a myth, then a strength - compilers need elbowroom in
> which to optimise.
>
> Is it so absurd to suggest that outside of the supercomputer market,
> ...
It isn't absurd to think about it. The literature contains such
thoughts over the years, going back at least 15 years. Probably more.
The consensus view of those who cast the little bugger, though, has so far been that this isn't a good idea.
Often folks employ the heuristic that any instruction which gets used
frequently, say 3+% of the time, has certainly earned its keep. FP
instructions satisfy that. There are all sorts of other data points
also.
Go, formalize your proposal, gather statistics from "real" programs
(spec, perfect club, US steel, etc.) using both the conventional and
your special compiler (and possibly other candidate special compilers)
on a variety of machines, and publish the results and your conclusion.
--
----------------------------------------------------------------
Keith H. Bierman kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33 | (415 336 2648)
Mountain View, CA 94043
sritacco@hpdmd48.boi.hp.com (Steve Ritacco) (09/08/90)
> putting integrated floating point into a silly little workstation like
> a Sparc or an 80486 machine is serious overkill (and I'm almost
> serious: these machines aren't fast enough for editing anymore) and a
> poor use of gates. Especially since, I conjecture, emulation could
> theoretically be faster.

This I agree with. Especially when we are talking about the possibility of single processors executing multiple instructions per cycle.

> Is it so absurd to suggest that there might be PARTS of floating point
> instructions that, in the hands of a good optimiser, might be used to
> generate better code than their wholes (remembering the VAX CALLS
> linkage... :-)?
>
> Is it so absurd to suggest, in sum, that exposing separate mantissa
> and exponent to the optimiser might result in *speedup* due to
> constant propagation and expression-rearrangement, while at the same
> time increasing expressivity by allowing an INDEPENDENT choice of
> mantissa and exponent sizes?

Very true; who needs IEEE format anyway? Give me a processor capable of doing a few arithmetic instructions in a single cycle, with a single-cycle multiply, and I think you've got it. Let's use all the FPU silicon to do more needed operations, and good floating point could fall out anyway.

> Is it so absurd to suggest that the effort and the silicon that go
> into an FPU might be better spent on supporting some other datatype
> that is of more _general_ applicability?

Might be worth a try.
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/09/90)
In article <14900015@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:
\\\
>Very true, who need IEEE format anyway. Give me a processor capable of
 ^^^^^^^^^^^^^^^^
>doing a few arithmetic instructions in a single cycle, with a single
>cycle multiply, and I think you've got it. Lets use all the FPU silicon
\\\

Good G*d! The IEEE std _was_ intended to produce some kind of uniformity in results across different kinds of h/w. (Rather like the idea of getting results accurate to the _bit_ in FP calcs.)

Now, _some_ of us wouldn't go _near_ an FP calc in a month of weekends, but some of us like to waste cycles rather than $M simulating the ``real'' world (where physics tells us we really don't _need_ FP since it's all discrete; FP is just a handy hack) and sundry other pursuits.

However, I'm sure the DSP guys _love_ this kind of idea. The trouble comes when they try to port their whizz-bang code to some other processor (e.g. when their current chip/fabricator is superseded).

-Kym Horsell
usenet@nlm.nih.gov (usenet news poster) (09/09/90)
In article <14900015@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes (and quotes):
>> putting integrated floating point into a silly little workstation like
>> a Sparc or an 80486 machine is serious overkill ...
>
>> Is it so absurd to suggest, in sum, that exposing separate mantissa
>> and exponent to the optimiser might result in *speedup* due to
>> constant propagation and expression-rearrangement

The chained multiply and add FP hardware in processors like the IBM 6000 effectively does this. The marginal gain of putting off resolution of the exponent by more than every other operation is going to be small.

>> while at the same
>> time increasing expressivity by allowing an INDEPENDENT choice of
>> mantissa and exponent sizes?
>
>Very true, who need IEEE format anyway.

The market is in simulation and modeling. Everything from stockbrokers running econometric models to chemists looking at molecules. IEEE format has proven to be a reasonable balance which allows you to write general purpose tools that function over a wide range of input values. Between 32-bit integer, 32/64-bit FP, and an occasional algorithmic tweak, the vast majority of data can be reasonably well represented. Custom fixed point formats have a place in DSP, where performance is critical and you have the advantage of knowing exactly where the input data is coming from and what values will be acceptable.

>Give me a processor capable of
>doing a few arithmetic instructions in a single cycle, with a single
>cycle multiply, and I think you've got it.

A lot of the "superscalar" marketing hype is really just FP coprocessors. Take away the load/store operations in a superscalar RISC (those used to be part of the CISC instruction anyway) and the FPU, and what have you got left? ~one op/cycle.

>Lets use all the FPU silicon
>to do more needed operations and good floating point could fall out anyway.
Matching similar levels of integration for the IPU and FPU, I have yet to see software emulation of FP that comes anywhere close to the speed of a hardware FPU.

David States
gillies@m.cs.uiuc.edu (09/10/90)
1. Does the tail wag the dog?

2. Who needs floating point when *everyone* knows that operating systems code is the most important / most prevalent type of software?

Does anyone see the similarity in the two statements above? How about these [circa 1970]:

1. Does the tail wag the dog?

2. Who needs BitBlt support when *everyone* knows that user interfaces are dominated by 24*80 displays and job control languages?
kahn@batcomputer.tn.cornell.edu (Shahin Kahn) (09/11/90)
In article <KHB.90Sep7103618@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) writes:
>Go, formalize your proposal, gather statistics from "real" programs
>(spec, perfect club, US steel, etc.) using both the conventional and

"Real" programs? Programs cease to be "real" as soon as they become benchmarks. Sometimes it is not the code that is "real", it is the data-set. Very few of these codes are real as far as supercomputing goes. Most of them fit within the whole of 16 MBytes and take the whole of several minutes to execute.

Shahin.
jgk@osc.COM (Joe Keane) (09/11/90)
In article <KHB.90Sep7103618@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) writes:
>Often folks employ the heuristic that any instruction which gets used
>frequently, say 3+% of the time has certainly earned its keep. FP
>instructions satisfy that. There are all sorts of other data points
>also.
>
>Go, formalize your proposal, gather statistics from "real" programs
>(spec, perfect club, US steel, etc.) using both the conventional and
>your special compiler (and possibly other candidate special compilers)
>on a variety of machines and publish the results and your conclusion.

This doesn't work. If you get statistics from C programs and make a machine based on that, you'll get a C machine. Given the way C is, this machine will be good at subroutine calls, floating-point arithmetic, and pointer dereferencing. Conversely, it will probably be not so good at co-routines, multi-precision arithmetic, and associative lookup.

If you optimize the machine based on the language, and then adjust the language based on what is efficient on the machine, you're stuck in a loop. It's time to get out.
jgk@osc.COM (Joe Keane) (09/11/90)
In article <3961@bingvaxu.cc.binghamton.edu> vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
>Good G*d! The IEEE std _was_ intended to produce some kind of uniformity
>in results across different kind of h/w. (A rather like the idea
>of getting results accurate to the _bit_ in FP calcs).

You could argue that this is bad, not good. Suppose you ran a computation on machine X and it gave the answer -37.69; then you moved to machine Y and it gave the same answer. This might give you some unfounded confidence that the answer is actually -37.69. Actually the right answer could be 2.00, but you don't know that. It used to be that if you wanted to check an answer you would run it on a different machine, but now this doesn't do much for you.

In fact you could argue that if you're going to do rounding, the best way to do it is randomly. A couple of free-running oscillators on your FP chip wouldn't take up too much space. If you did this, the error in the expected value could actually be much lower than 1 LSB. If you ran the same program 20 times, you'd get 20 different answers. However, if these agreed well, you'd have good reason to believe that the answer is right.

I'm not arguing that multiple runs are a substitute for good numerical analysis, but they can point out that something is drastically wrong. The floating point on machine X may be much better than that on machine Z, but if you get a terrible answer from machine Z, I wouldn't trust the answer from machine X so much either.

I'm sort of playing devil's advocate here. Actually I think the IEEE standard is very good as far as FP goes. However, if you're dealing with inherently inaccurate computations, a little diversity may be a good thing.
geoff@hls0.hls.oz (Geoff Bull) (09/12/90)
In article <14900015@hpdmd48.boi.hp.com> sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:
>Very true, who need IEEE format anyway. Give me a processor capable of

IEEE format is one of the better things that has happened to the industry. You seem to have forgotten the bad old days when numerical programming was a black art, and programs would give different answers on different machines.

--
Geoff Bull (Senior Engineer)    Phone : (+61 48) 68 3490
Highland Logic Pty. Ltd.        Fax   : (+61 48) 68 3474
348-354 Argyle St               ACSnet: geoff@hls0.hls.oz.au
Moss Vale, 2577, AUSTRALIA
gillies@m.cs.uiuc.edu (09/12/90)
/* Written 9:59 pm Sep 6, 1990 by stephen@estragon.uchicago.edu in m.cs.uiuc.edu:comp.arch */
> But it does seem to me that most cycles really do go on operating
> systems and user interface, and putting integrated floating point into
> a silly little workstation like a Sparc or an 80486 machine is serious
> overkill.

My favorite device-independent screen language is Display PostScript (is there another one that is device independent?). This is user interface code, and Display PostScript really beats up on a floating point chip.

Once we have all machines running floating point, I think a whole new class of applications will emerge to take advantage of this feature. Remember, the software development biz is now dominated by the common denominator (typically PCs). Most floating-point intensive applications are unthinkable on today's integer workstation.