wmb@MITCH.ENG.SUN.COM (08/28/90)
> This proposal [making a floating point stack optional] seems to
> guarantee that anyone wishing to write a "Standard" application
> using floating-point will have to write everything twice. Then,
> of course, at the beginning of your standard program you can test
> the environment and decide which copy of the application to run.
> (I invite counter-examples.)

That is what I used to think too, until I figured out the "trick".

It turns out that the real problem is with "mixed stack" operations, where you need to simultaneously deal with data stack and floating point stack values. If the floating point data is kept on the data stack, then can you access integer data underneath it?

The solution turns out to be remarkably simple:

Suppose that we have a function FSTKCELLS

    FSTKCELLS  ( n -- ncells )
        ncells is the number of data stack items occupied
        by n floating point numbers.

If there is a separate floating point stack, FSTKCELLS would be DROP 0. Otherwise, it might be NOOP or 2* or 4* or whatever is correct, considering the relative sizes of integers and floating point numbers.

Given this function, mixed stack operations can be portably expressed as a (usually trivial) calculation involving FSTKCELLS and PICK .

Of course, this is more cumbersome than doing nothing at all, but in an absolute sense, it is reasonably simple, effective, and less trouble than maintaining 2 versions of the code.

Fortunately, mixed-stack operations turn out to be relatively infrequent, and can often be avoided altogether by judicious use of variables.

Mitch
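[Editorial sketch] The FSTKCELLS idea above might look like the following on a system with a separate floating point stack. This is only an illustration under stated assumptions: INT-UNDER-FLOAT is a hypothetical name, and the float literal is written in the later ANS style (1.5e0), which may differ from the systems discussed in this thread.

```forth
\ Sketch only: this variant assumes a SEPARATE floating point stack.
\ On a unified-stack system the body would instead be NOOP, 2*, 4*, ...
: FSTKCELLS ( n -- ncells )  DROP 0 ;

\ Hypothetical helper: copy the integer lying "under" one float.
\ Separate stack:  ( n -- n n ) ( F: r -- r )
\ Unified stack:   ( n r -- n r n )
: INT-UNDER-FLOAT  1 FSTKCELLS PICK ;

7 1.5e0 INT-UNDER-FLOAT   \ the integer 7 is copied; the float is untouched
```

With a separate stack the PICK index is 0, so the word behaves like DUP; on a unified stack the index is however many cells one float occupies.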
ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/28/90)
>> This proposal [making a floating point stack optional] seems to >> guarantee that anyone wishing to write a "Standard" application >> using floating-point will have to write everything twice. Then, >> of course, at the beginning of your standard program you can test >> the environment and decide which copy of the application to run. >> (I invite counter-examples.) > >That is what I used to think too, until I figured out the "trick". > >The solution turns out to be remarkably simple: > >Suppose that we have a function FSTKCELLS > > FSTKCELLS ( n -- ncells ) > ncells is the number of data stack items occupied > by a n floating point numbers. > >Given this function, mixed stack operations can be portably expressed as >a (usually trivial) calculation involving FSTKCELLS and PICK . > >Of course, this is more cumbersome than doing nothing at all, but in >an absolute sense, it is reasonably simple, effective, and less trouble >than maintaining 2 versions of the code. > >Fortunately, mixed-stack operations turn out to be relatively infrequent, >and can often be avoided altogether by judicious use of variables. > >Mitch Sorry, Mitch, I have to disagree strongly on this one. There is a big difference between floating point code that uses a separate stack and that which doesn't. The end result of the compromise in the proposal that was adopted is that, rather than some of the existing floating point code needing to be rewritten to be portable, ALL OF IT WILL HAVE TO BE. From the standpoint of portability, all existing floating point code is broken. Period. I admired the handling of the Floored vs. Truncated division compromise, but this one sucks hot rocks from hell. Mixed stack operations are quite common in astrophysical applications, especially image processing, where, in our system here at Goddard, the image is stored in an indexed N-dimensional array. The indices are integers kept on the data stack, the floating point array data on a separate floating point stack. 
The indices may be computed in another word entirely (e.g. a word to convert celestial coordinates to pixel location), therefore the ordering of the words in a program is very dependent on knowing if a separate stack exists.

To illustrate the difference in coding style consider data stored in an array defined like so:

    F_IMAGE  ( rows cols -- )      compiling
             ( row col -- addr )   executing
        Allocate a two-dimensional floating point array of width 'cols'
        and height 'rows'. At runtime, the defined array takes the index
        values 'row' and 'col' and converts them to the appropriate memory
        address of the element in that row and column.

Say we want a 3-by-3 pixel box average of the data in order to generate a smoothed image. For most of the image, one just adds up the values of the three pixels immediately above the reference pixel, the three pixels below it, the pixels to either side and the reference pixel itself, then divides by 9.0. At the borders, 2-by-3 and 3-by-2 pixel boxes are averaged.

    512 512 F_IMAGE Galaxy   ( This is the image array used below. Assume )
                             ( an image has already been loaded into it.  )

    ( Central routine for computing a 3-by-3 pixel box average of the image
      in array Galaxy. To complete the actual box averaging, similar
      routines are used that compute averages of 2-by-3 and 3-by-2 boxes
      at the borders. )
    : _box-average  ( F: -- Ave-Value  D: i0 j0 --- )
       ( Total the values for the three pixels in the row above )
       over 1- over 1-  ( i0-1 j0-1 )  Galaxy F@
       over 1- over     ( i0-1 j0   )  Galaxy F@ F+
       over 1- over 1+  ( i0-1 j0+1 )  Galaxy F@ F+
       ( Total the values for the three pixels in the row below )
       over 1+ over 1-  ( i0+1 j0-1 )  Galaxy F@ F+
       over 1+ over     ( i0+1 j0   )  Galaxy F@ F+
       over 1+ over 1+  ( i0+1 j0+1 )  Galaxy F@ F+
       ( Total pixels to left and right and finally center, divide by 9 )
       2dup 1-          ( i0   j0-1 )  Galaxy F@ F+
       2dup 1+          ( i0   j0+1 )  Galaxy F@ F+
                        ( i0   j0   )  Galaxy F@ F+
       9.0 F/ ;

As you can see, with the separate floating point stack, the pixel values accumulate in the FP stack and don't interfere with the calculations of the index values. Writing the same code for FP values on the data stack would result in a much more complex set of stack manipulations, or the intermediate value would have to be stored in a variable. Either way, it would be much less efficient.

I scanned several dozen screens of code with various floating-point calculations and encountered many instances where the existence of a separate stack is vital to running the code.

There is also the problem of programmers, like myself, that get a little sloppy about fetch and store operations, seeing as how with a separate stack it doesn't matter what order the value and address are referenced, e.g.

    12.0 XVAL F!

is precisely equivalent to

    XVAL 12.0 F!

when there is a separate FP stack, but not when floats are stored on the data stack. This kind of thing can get very hard to find and fix if both methods are allowed and have to be planned for.

Testing for whether there is or isn't an FP stack, and writing code to operate on either, just will not be worth the effort. The end result will be a lot of code that depends on one method alone. All hope of portability is lost, in my opinion.

This is one area where the TC should have bitten the bullet and made a definite decision for or against a separate stack.
Obviously I would prefer the former, but I could live with the latter, as long as it was definite. This time the move to an all-encompassing compromise may well prove disastrous.

The division question provided a simple answer that leaves current code portable with a few simple definitions for / , MOD , etc. This latest decision will require rewrites of ALL floating point code, regardless. What could the rationale have possibly been?

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET: ZMLEB@SCFVM Internet: zmleb@scfvm.gsfc.nasa.gov
-- I'm only a contractor, don't blame me for the tax rates and don't blame
-- the government for my statements.
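[Editorial sketch] The fetch/store ordering problem raised in this post can be made concrete. The following is a minimal sketch, assuming the F! and FVARIABLE words as they ended up in the published ANS Forth floating-point word set; SET-XVAL is a hypothetical name, and the literal is written as 12e0 rather than 12.0 because later standard systems require an exponent marker.

```forth
FVARIABLE XVAL            \ storage for one floating point number

\ Portable ordering: the value is pushed first, the address last.
\ Separate stack:  F! ( f-addr -- ) ( F: r -- )
\ Unified stack:   F! ( r f-addr -- )
: SET-XVAL ( F: r -- )  XVAL F! ;

12e0 SET-XVAL

\ Works only with a separate floating point stack, because on a
\ unified stack the address would end up underneath the float:
\   XVAL 12e0 F!
```

Code that always pushes the value before the address runs unchanged under either model; the reversed order is the habit that would have to be hunted down.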
wmb@MITCH.ENG.SUN.COM (08/28/90)
> The end result of the compromise in the proposal that was adopted is that, > rather than some of the existing floating point code needing to be > rewritten to be portable, ALL OF IT WILL HAVE TO BE. From the standpoint > of portability, all existing floating point code is broken. Period. Yeah, I guess so. OTOH, you could just declare that the code has an environmental dependency on a floating point stack, which is no worse than the current situation (in which there is no floating point standard). (Yeah, yeah, I know, the environmental dependency sucks too) > I admired the handling of the Floored vs. Truncated division compromise, > but this one sucks hot rocks from hell. ... > I scanned several dozen screens of code with various floating-point > calculations and encountered many instances where the existence > of a separate stack is vital to running the code. > This is one area where the TC should have bitten the bullet and made a > definite decision for or against a separate stack. Sigh. What to do? The committee seemed to be pretty much set on a separate stack, but then Phil Koopman came and made an impassioned and eloquent plea for not requiring a separate stack. Sometimes I feel like this is a "damned if you do, damned if you don't" situation. My personal position favors specifying a separate stack, but Phil was pretty persuasive. I convinced myself that I could manage to write new portable code in the ambiguous situation, and at that point my position softened somewhat. Mitch Bradley, wmb@Eng.Sun.COM (getting somewhat weary of defending one half of the Forth community against the other half, and vice versa)
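[Editorial sketch] Declaring an environmental dependency, as suggested above, can be done in code on later systems. The ANS Forth standard as eventually published defines an environmental query string FLOATING-STACK that yields the maximum depth of a separate floating point stack, or zero when floats are kept on the data stack. The words SEPARATE-FSTACK? and REQUIRE-FSTACK below are hypothetical names, and treating an unrecognized query as "no separate stack" is an assumption of this sketch.

```forth
\ True if the system reports a separate floating point stack.
: SEPARATE-FSTACK? ( -- flag )
  S" FLOATING-STACK" ENVIRONMENT?     \ ( -- n true | false )
  IF 0<> ELSE FALSE THEN ;            \ unknown query counted as "no"

\ Refuse to continue on a system that cannot satisfy the dependency.
: REQUIRE-FSTACK ( -- )
  SEPARATE-FSTACK? 0=
  ABORT" This code needs a separate floating point stack" ;
```

A program written for a separate stack could call REQUIRE-FSTACK once at load time instead of failing obscurely later.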
ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/28/90)
>> This is one area where the TC should have bitten the bullet and made a
>> definite decision for or against a separate stack.
>
>Sigh. What to do? The committee seemed to be pretty much set on a
>separate stack, but then Phil Koopman came and made an impassioned
>and eloquent plea for not requiring a separate stack.

Ok Phil, speak up. We know you're out there. Come out now and no one will get hurt.

Explain this morbid fear of a separate floating point stack. I presume this is related to implementing floating point on a Forth chip. Give specific examples where the separate stack makes such an impact on performance in your case that making everyone else rewrite all their floating point code becomes necessary.

Come on, 'fess up. Let's hear it. Speak now or forever hold your peace.

(Uh ... if you haven't noticed yet, the above is light-hearted sarcasm, even though the subject I inquire about is quite serious).

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET: ZMLEB@SCFVM Internet: zmleb@scfvm.gsfc.nasa.gov
-- I'm only a contractor, don't blame me for the tax rates and don't blame
-- the government for my statements.
koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/28/90)
In article <9008281431.AA00691@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes: > Ok Phil, speak up. We know you're out there. Come out now and noone > will get hurt. This is a summary (to the extent that I can recall) of the reasons for allowing using the data stack for floating point data that I presented to the ANSI Forth meeting in Melbourne back in May. That discussion appears to have provided the impetus for the changes to the BASIS at the latest meeting, but I have not been personally involved since May. 1) There is common practice for both using a separate floating point stack and a unified data/floating point stack. Historically, separate floating point stacks have come into use because of implementation considerations on specific platforms (e.g. the 80287). Coprocessor stacks can have problems (such as handling stack overflows when reals are passed as subroutine parameters). On some platforms, a separate floating point stack is very expensive, because there is no on-chip register available for use as a pointer. The fact that there is common practice for both separated and unified stacks is what creates the issue. 2) I do not know of any stack-based machines (sometimes called "Forth machines") that support separated floating point stacks. When last I checked, the consensus seemed to be that separate stacks are not likely to be added, either. Certainly, a floating point stack can be emulated in memory, but it will be very slow compared to a *single-cycle* floating point operation that is likely to be found on 32-bit hardware. Therefore, it is quite likely that users of such machines will have strong incentive to use a unified stack approach. Harris floating point software assumes a unified stack. I predict that users of stack machines will ignore any requirement for using a separate floating point stack. A separate on-chip stack is quite expensive not only in silicon real estate, but also in terms of increased context switching time. 
3) As Mitch has pointed out, in a great many cases code can be written to be insensitive to the stack model. In those cases where such code is extremely inefficient, portable code could use conditional compilation to provide two versions. My guess is that such code is very limited in size when viewed in the context of an entire application (and, if speed is that important, it's probably in assembler anyway). Also, much code is written with the loop variables in local variables or on the return stack (so, the sequence OVER 1+ OVER 1- for image processing could just as easily be I 1+ J 1-). 4) One motivator for separate stacks is that 16-bit integers are not the same size as 32-bit reals. On 32-bit hardware, this problem goes away: single and double precision for reals and ints are the same size. (80-bit reals are brought to you courtesy of Intel, and are uncommon elsewhere). If you are really serious about fast floating point (i.e., single-cycle F* and F+), you probably should be using a 32-bit machine, so I do not weight this reason heavily. I do not know whether a separate or unified stack is "best". One of my criteria will be which one a C compiler can use best for stack machines (but, the jury is still out). I requested that the standard not preclude use of a unified stack. Phil Koopman koopman@greyhound.ece.cmu.edu Arpanet 2525A Wexford Run Rd. Wexford, PA 15090 Senior scientist at Harris Semiconductor, and adjunct professor at CMU. I don't speak for them, and they don't speak for me.
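[Editorial sketch] Point 3 above suggests that the box-average example could use loop indices instead of OVER/1+ juggling. The following is one way that might look, not a reconstruction of anyone's actual code. IMG, PIXEL, ROW0, COL0 and BOX-AVERAGE are hypothetical names, the 8-by-8 array stands in for the Galaxy image, border handling is omitted, and float literals use the later ANS notation.

```forth
8 CONSTANT #ROWS   8 CONSTANT #COLS        \ small hypothetical image
CREATE IMG  #ROWS #COLS * 1+ FLOATS ALLOT  \ one spare float for alignment
: PIXEL ( row col -- f-addr )  SWAP #COLS * + FLOATS IMG FALIGNED + ;

VARIABLE ROW0   VARIABLE COL0              \ reference pixel

\ Average of the 3-by-3 box centred on ( row col ).
\ Integers are only ever pushed on top of the running total, never
\ fetched from beneath it, so the same source suits either stack model.
: BOX-AVERAGE ( row col -- ) ( F: -- ave )
  COL0 !  ROW0 !
  0e0                                      \ running total
  2 -1 DO                                  \ J = row offset -1..1
    2 -1 DO                                \ I = column offset -1..1
      ROW0 @ J +  COL0 @ I +  PIXEL F@ F+
    LOOP
  LOOP
  9e0 F/ ;
```

The trade-off is the one the thread argues about: two variables and nine loop iterations replace the unrolled stack arithmetic, which costs some speed but removes the dependence on where floats are kept.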
a684@mindlink.UUCP (Nick Janow) (08/29/90)
koopman@a.gp.cs.cmu.edu (Philip Koopman) writes: > As Mitch has pointed out, in a great many cases code can be written to be > insensitive to the stack model. In those cases where such code is extremely > inefficient, portable code could use conditional compilation to provide two > versions. My guess is that such code is very limited in size when viewed in > the context of an entire application (and, if speed is that important, it's > probably in assembler anyway). Your argument could also apply to using a separate stack only--and offering an optional stack-machine coded version for the speed-critical sections. :-) > I do not know whether a separate or unified stack is "best". One of my > criteria will be which one a C compiler can use best for stack machines (but, > the jury is still out). I requested that the standard not preclude use of a > unified stack. While I respect Harris and appreciate the support they are giving FORTH, I don't think that the ANS standard should be set just to suit C compilers running on an RTX2000. You've just admitted that code for a separate FP stack might be more viable in the marketplace. Part of the reasons for the present method was to accommodate Harris--and you don't even know if that's what you want? Maybe you Harris engineers could brainstorm a bit on the issue...before the ANS FORTH is engraved in stone. The present FP method (separate and combined stacks) was decided upon after lengthy discussion. However, there were 15 or fewer people present (some of whom were less than experts on the issue) and there were not that many good arguments put forward in the proposals, so it was not a massive consensus of the entire FORTH community. Despite the consensus (I think it was 14 in favour, 1 {me} abstaining), I felt that the mood was "This is the best compromise we can come up with at this time. Let's see what the reaction is." 
To anyone interested in the FP issue: if you've got a new slant on the issue or a convincing argument for a particular method, SEND IT IN! Post comments, ideas, etc here too; maybe something better can come out of the discussion. If you're not happy with the present compromise, offer something constructive in order to change it. If you can't offer a better solution, admit it and stop complaining.
a684@mindlink.UUCP (Nick Janow) (08/29/90)
As I see it, the present compromise on FP numbers makes applications portable; the same FP code will run on an RTX2001 and an 8086&8087. In return for this portability, the programmers must accept the restraint that once a FP number is placed on the stack (you must assume the data stack), anything below it on the stack cannot be accessed. Programmers for the RTX series must accept this restriction as well; do any want to comment on how that affects their work? Anyone who doesn't follow this is writing non-portable code.

I don't know how difficult the restriction on data stack access will be, especially for large FP applications. Locals and the return stack can solve some of the problems, but I'd like to hear the comments of programmers who use FP heavily in their work _after_ they try it under the new restrictions, using locals and the return stack.

koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:

> Historically, separate floating point stacks have come into use because of
> implementation considerations on specific platforms (e.g. the 80287).

There's another reason for a separate stack: the length of a FP number differs from application to application (and from machine to machine). Using conditional FP words (Flength 48 = IF MAKE OVER 3 CELLS ...) sounds very messy to me. :(

> One motivator for separate stacks is that 16-bit integers are not the same
> size as 32-bit reals. On 32-bit hardware, this problem goes away: single and
> double precision for reals and ints are the same size. (80-bit reals are
> brought to you courtesy of Intel, and are uncommon elsewhere). If you are
> really serious about fast floating point (i.e., single-cycle F* and F+), you
> probably should be using a 32-bit machine, so I do not weight this reason
> heavily.

<<<The Macintosh has 80 bit FP routines in ROM.>>>

One idea I had during the discussion of FP handling was the option of having a separate FP stack _and_ allowing single-cell FP operations on the data stack.
In this manner, single-cell FP could be handled either way, or if separate words are used for each operation (Ff+ and Fd+) you could have both. This would allow fast single-cell operations and slower but more accurate 80 bit operations at the same time. Either one could use a coprocessor. While this method would allow separate-stack programmers and combined (32 bit) stack programmers to write efficient code, it wouldn't allow equally fast implementation on the other's systems. Also, it would make systems with stacks narrower than a FP number (32 bits?) unhappy. Of course, if the portability of an application was a high priority, the restrictions of the present compromise could still be followed. If the restrictions weren't followed, the program will be ANS standard, environment dependent. I find this method appealing for some reason, but I have to admit that the present ANS compromise seems better. Under the restrictions, the ANS code will run on stack machines or non-stack machines equally fast. Wait a minute! If I remember correctly, the ANS compromise says something along the lines of "The default is separate FP stack, with the restriction of not accessing stack items below a FP item." Well, if you obey that restriction, why bother calling it a separate stack? If I sound confused on the issue of FP, that's okay, because I am. :-) Anyways, I wanted to put out the idea of separate FP stack _and_ single cell FP operations. If nothing else, it may help clarify why certain options are workable/unworkable. I find writing these things out helps me clarify my own thoughts. Maybe it will also inspire others to come up with other ideas that haven't been considered yet.
peter@ficc.ferranti.com (Peter da Silva) (08/29/90)
In article <10340@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes: > 4) One motivator for separate stacks is that 16-bit integers are not > the same size as 32-bit reals. On 32-bit hardware, this problem > goes away: single and double precision for reals and ints are the same size. Except for 64-bit reals (AKA double precision), which bring the whole thing back again. (If you say you don't need DP, I'll believe you... but you're not the whole world). -- Peter da Silva. `-_-' +1 713 274 5180. 'U` peter@ferranti.com
ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/29/90)
Phil Koopman <koopman@greyhound.ece.cmu.edu> writes: >1) There is common practice for both using a separate floating > point stack and a unified data/floating point stack. How common is the unified stack, I wonder? Mitch mentioned that the TC was pretty much settled on the separate stack idea, before settling on allowing both. That says to me that separate stacks are more common than unified. > On some platforms, a separate floating point stack is very expensive, > because there is no on-chip register available for use as a pointer. On many platforms in common use (VAX, 68000), there is an on-chip register available. The lack of one on one current generation of Forth chip should hamstring portability of floating point software on all? I think I can safely say that the vast majority of current working FP code is implemented on conventional CPUs. >2) I do not know of any stack-based machines (sometimes called "Forth > machines") that support separated floating point stacks. When > last I checked, the consensus seemed to be that separate stacks > are not likely to be added, either. Certainly, a floating point > stack can be emulated in memory, but it will be very slow compared > to a *single-cycle* floating point operation that is likely to be > found on 32-bit hardware. Are there any stack-based machines with on-chip floating point capability; floating point that executes basic instructions such as F+ and F* in a single cycle? If so, I can reluctantly accept your arguments (maybe). If not, and they are in the planning stage only, what is preventing the inclusion of an FP stack register and associated stack memory? Given the parallel nature of stack-based machine architecture, I would think -- as a complete novice when it comes to hardware design -- that it would be faster to perform an FP addition and memory store operation in a single cycle, than to perform an FP addition in one cycle, a swap in one cycle, and a store in one cycle. 
> Harris floating point software assumes a unified stack. To reiterate, is Harris floating point software founded on hardware FP operations or software "emulation" of floating point. If the former, I take as my example the VAX, where FP is provided in the CPU itself. The additional stack manipulations required by unified stack far outweigh the savings of a single register for holding the FP stack pointer. Keeping FP values on a separate stack eliminates extraneous SWAPs, OVERs, etc. A separate FP stack is a clear winner on this platform. If the latter, I take as my example the lowly 6502, for which I have implemented a set of software floating point routines. I found that amongst all of the bit-manipulation gymnastics, accessing a memory variable to get the FP stack pointer falls into the noise level when counting cycles. From my own experience, the separate stack becomes an advantage, or at the very least, no disadvantage. Coupled with the much greater ease in writing code, I am a stone-cold-separate-floating-stack advocate. Perhaps I should stand before the TC to make my own "empassioned plea". Nah, it's already too late. >3) As Mitch has pointed out, in a great many cases code can be written > to be insensitive to the stack model. Note the future tense used here. In effect, the past has been discarded. What I fear is that the effort to write stack-model-insensitive code will be too great, resulting in "ANS Standard Floating Point" meaning nothing. I understand your arguments (I hope), but let me couple your predictions of Forth-in-hardware developers ignoring floating point with a prediction of my own: I predict that *all* Forth developers will ignore the standard and write their code for the system they use and disregard portability considerations. The end result will be that we will use the same words (a major step forward in itself), but the words will mean different things. So much for Forth as a portable scientific application language. 
>4) One motivator for separate stacks is that 16-bit integers are not
> the same size as 32-bit reals. On 32-bit hardware, this problem
> goes away: single and double precision for reals and ints are the same size.
> (80-bit reals are brought to you courtesy of Intel, and are uncommon
> elsewhere). If you are really serious about fast floating point
> (i.e., single-cycle F* and F+), you probably should be using a
> 32-bit machine, so I do not weight this reason heavily.

One motivator behind the ANS Standards effort was to release the "16-bit barrier" in Forth-83. This opens the door for vendors to develop systems that are compatible across platforms with disparate word lengths. Writing portable FP code, already a problem on 32-bit platforms, becomes nearly impossible on 16-bit platforms using 32-bit FP numbers when the FP values are stored on the data stack. This problem goes away completely when a separate stack is used for FP values.

Whether I am "really serious" about floating point calculations or not, being able to write FP code on a VAX and run it unchanged on an Apple // (which I can easily do, using a separate stack) is a powerful indication of the portability afforded by this scheme.

>I do not know whether a separate or unified stack is "best". One of
>my criteria will be which one a C compiler can use best for stack machines
>(but, the jury is still out). I requested that the standard not preclude
>use of a unified stack.

Oh, I thought we were talking about *Forth*. Seriously, your point about how a C compiler would implement FP operations is well taken. Harris' success could well be based on the efficiency of a C compiler on your chip. But you admit that the jury is still out. This is a direct conflict with the common-practice argument. Has Harris ever implemented a separate floating point stack in order to make quantitative judgements?
Those judgements should be made against other platforms performing the same operations, as opposed to your own platform. Could you still outperform the competition even with a separate FP stack? Perhaps Harris is so concerned with single-cycle operations that multi-cycle operations are too much an anathema? > Phil Koopman koopman@greyhound.ece.cmu.edu Arpanet -- Lee Brotzman (FIGI-L Moderator) -- BITNET: ZMLEB@SCFVM Internet: zmleb@scfvm.gsfc.nasa.gov -- "Between an idea and implementation, is software." -- Curse from Hubble -- Space Telescope engineer.
koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/29/90)
In article <9008290355.AA19589@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes: > How common is the unified stack, I wonder? It depends on what you mean by common. Since most Forth programmers these days probably run on systems with 80287 hardware stacks (or 80287 emulator software), the answer may be uncommon. My personal experience has been almost uniformly with unified stacks. The fact that the ANS Forth folks passed the resolution without my being at the meeting suggests that they now believe it is common enough to warrant consideration. > On many platforms in common use (VAX, 68000), there is an on-chip > register available. The lack of one on one current generation of Forth > chip should hamstring portability of floating point software on all? > I think I can safely say that the vast majority of current working FP > code is implemented on conventional CPUs. Not all conventional CPUs have an available (or at least convenient) register for the FP pointer. The 80x86 could use BX or perhaps DI, but requires an SS: override instruction for non-tiny memory models. First-generation micros don't have a spare register (6502, 8080/Z80, 6800, 68HC11?). You may think that small/old micros don't matter much, but other folks (especially in embedded control) do. > Are there any stack-based machines with on-chip floating point capability; > floating point that executes basic instructions such as F+ and F* in a single > cycle? No announced products that I know of. But, they will clearly come some day. > If so, I can reluctantly accept your arguments (maybe). If not, > and they are in the planning stage only, what is preventing the inclusion > of an FP stack register and associated stack memory? The memory takes a lot of chip area, which makes chips more expensive. It also represents more context to be saved for context switches, thus degrading real time performance. 
Even a separate pointer register is that much more context to save (especially since on many systems there will probably be limit registers as well).

The realities of the marketplace are that C performance will drive most design decisions (not ANS Forth compliance). Stack architectures will have a separate FP stack on-chip only if it *significantly* helps C run times. I believe that this will be true of companies besides Harris in the future.

> Given the parallel
> nature of stack-based machine architecture, I would think -- as a complete
> novice when it comes to hardware design -- that it would be faster to perform
> an FP addition and memory store operation in a single cycle, than to
> perform an FP addition in one cycle, a swap in one cycle, and a store in
> one cycle.

Inexpensive memory is slower than floating point operations. The major limit to supercomputers these days is not fast floating point, but rather memory bandwidth. I'll trade stack twiddling for fetches and stores any day. Perhaps this is not optimal on current hardware, but it is the wise long-term path. As CPU speeds increase, reducing demands on memory will become more important too.

> To reiterate, is Harris floating point software founded on hardware FP
> operations or software "emulation" of floating point.

Both (at least in the planning stages).

> If the latter, I take as my example the lowly 6502, for which I have
> implemented a set of software floating point routines. I found that amongst
> all of the bit-manipulation gymnastics, accessing a memory variable to get
> the FP stack pointer falls into the noise level when counting cycles.

How about FDUP, FSWAP, etc.? Here you are paying a proportionally larger penalty for memory-based pointer manipulations.

> From my own experience, the separate stack becomes an advantage, or at
> the very least, no disadvantage. Coupled with the much greater ease in
> writing code, I am a stone-cold-separate-floating-stack advocate.
> Perhaps I should stand before the TC to make my own "empassioned plea". > Nah, it's already too late. I didn't consider it an "empassioned plea" myself. Just a statement of fact. Harris (and other stack machine vendors, to the best of my knowledge) don't plan on supporting a separate hardware floating point stack. The other reasons I gave in my previous post were what I perceive as the pro-unified stack point of view. It is up to the TC to sort out the facts and reach a wise decision. > Oh, I thought we were talking about *Forth*. Seriously, your point > about how a C compiler would implement FP operations is well taken. Harris' > success could well be based on the efficiency of a C compiler on your chip. > But you admit that the jury is still out. This is a direct conflict with > the common-practice argument. No conflict at all. The common practice argument should be restricted to Forth. The jury is out on C. I have no performance numbers, and am therefore unwilling to preclude future use of a unified stack (OR, a split stack). The issue is whether the TC wants to take into account the very likely possibility (based on size, memory, and context switching considerations) that unified stacks will be significantly more efficient on future stack machines. We all know that many Forth programmers value efficiency above conformance to a standard. Phil Koopman koopman@greyhound.ece.cmu.edu Arpanet 2525A Wexford Run Rd. Wexford, PA 15090 Senior scientist at Harris Semiconductor, and adjunct professor at CMU. I don't speak for them, and they don't speak for me.
koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/30/90)
In article <0XI5RNG@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <10340@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
> > 4) One motivator for separate stacks is that 16-bit integers are not
> >    the same size as 32-bit reals. On 32-bit hardware, this problem
> >    goes away: single and double precision for reals and ints are the same size.
>
> Except for 64-bit reals (AKA double precision), which bring the whole thing
> back again. (If you say you don't need DP, I'll believe you... but you're
> not the whole world).

What I meant was:

    single int, single float = 32 bits
    double int, double float = 64 bits

    DUP  = FDUP
    DDUP = FDDUP   (or, 2DUP = F2DUP, or whatever...)

So, double reals are no worse than double integers (and, in fact, the
data type doesn't really matter much for stack manipulations any more).

Of course, this is only for 32 bit machines...

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.
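
The size equivalence described in the post above can be sketched in Forth. This is only an illustration, assuming a hypothetical unified-stack system with 32-bit cells where a single float occupies one cell and a double float two. FDUP and FDDUP are the names used in the post; FSWAP and FDSWAP are added here by analogy and are not from the post. Do not load this on a system that already keeps floats on a separate stack.

```forth
\ Sketch only: hypothetical unified-stack Forth, 32-bit cells.
\ A single float takes one cell, a double float takes two, so the
\ floating point stack words reduce to the ordinary integer ones.

: FDUP    ( r -- r r )             DUP ;    \ single float: same as DUP
: FDDUP   ( dr -- dr dr )          2DUP ;   \ double float: same as 2DUP (DDUP)

\ Added by analogy (assumption, not from the post):
: FSWAP   ( r1 r2 -- r2 r1 )       SWAP ;
: FDSWAP  ( dr1 dr2 -- dr2 dr1 )   2SWAP ;
```

On a 16-bit system the same definitions would not hold, since a 32-bit float would no longer be the size of a single cell.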
toma@tekgvs.LABS.TEK.COM (Tom Almy) (08/30/90)
This is getting tiresome.

LMI Forths put floating point numbers on the parameter stack. When
incorporating floating point operations into my Native Code Compiler, I
wanted separate stacks to take advantage of the additional performance
(speed, code size, and accuracy) of stack operations in the 80x87.

For compatibility I rewrote the existing LMI package to use the
coprocessor stack. After changing the primitives, very few additional
changes had to be made. It hardly affects good application code at all
(by *good* I mean *readable*, and readable code does very little stack
manipulation).

Hey folks, it's not that difficult to write code that works in both
environments; what hurts portability is not having a standard for the
primitive functions! So let's have a standard that lets this be an
implementation detail, where it belongs! If it makes sense for an
80x86/7 system to have separate stacks and for a Forth chip system to
have one stack, that's fine!

Consider the problems we have had in existing standards which specify
division working in ways other than the hardware, or word sizes that
don't match reality.

Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply
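
One way to write a mixed-stack operation that works in both environments is the FSTKCELLS word that Mitch Bradley described earlier in this thread. What follows is a minimal sketch, not anyone's actual implementation. It assumes a system with a separate floating point stack, so FSTKCELLS is DROP 0; a unified-stack system would define it as NOOP, 2*, or whatever matches its float size. UNDER-FLOAT is a hypothetical name chosen for illustration.

```forth
\ FSTKCELLS ( n -- ncells )
\ ncells is the number of data stack cells occupied by n floats.
\ Assumption: floats live on a separate stack here, so the answer is 0.
: FSTKCELLS  ( n -- ncells )  DROP 0 ;

\ UNDER-FLOAT (hypothetical helper): copy the integer that lies beneath
\ one floating point number to the top of the data stack.
\   separate stack:  ( n -- n n )  ( F: r -- r )
\   unified stack:   ( n r -- n r n )
: UNDER-FLOAT  1 FSTKCELLS PICK ;
```

Only the definition of FSTKCELLS changes between systems; UNDER-FLOAT and any code built on it stay the same.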
BARTHO@CGEUGE54.BITNET ("PAUL BARTHOLDI, OBSERVATOIRE DE GENEVE") (08/30/90)
Hi friends,

> Thanks, Philip, for the response. You have made several good points,
> although I still find room to quibble (don't we all? :-). I don't
> see much point in going back and forth on this with 100 lines of text
> at a pop (did you hear that, OOF debaters?).

100% agree! Thanks again.

> Let me just reiterate my basic point about why the decision to allow
> either a single or a separate floating point stack in a Standard System is
> a very bad one.

Ditto!

> It is difficult, if not impossible, to write floating point code that will
> run portably on each of the following implementations:
>
>    16-bit integers, 32- and 64-bit FP, unified stack
>    16-bit integers, 32- and 64-bit FP, separate stack
>    32-bit integers, 32- and 64-bit FP, unified stack
>    32-bit integers, 32- and 64-bit FP, separate stack
>

I was one of those who suggested, back in Rochester in 1981, that we
have separate stacks. My reasons were (and still are):

- To keep the full precision of the 80x87 and similar chips (including
  9911 etc), data MUST stay as long as possible on the hardware stack;

- Mixing integers and floating point on the same stack is (from
  experience: I have had FP in my Forth since 1976 ...) either trivial,
  because they don't interfere, or terrible, because the right data is
  never at the right place on the stack, so the code becomes full of
  'unnecessary' ROT, ROLL etc.

- The computing speed advantage may be real for some hardware (80x87
  for example) but was not important in my case (HP1000 with FP
  microcode).

Interestingly, the 'trivial' case is independent of unified or separate
stack, while the 'terrible' one is almost 100% solved with separate
stacks.

It is very clear to me that we need a single choice, either separate or
unified (I prefer separate; why, see above!). I don't want to write and
DEBUG code for both versions (with conditionals etc. the problem is not
technical, but practical!)

> The end result will be that although the code may be ANSI Standard Forth, there
> is no guarantee whatsoever that floating point operations will be portable.
> Granted, this is the same as our current position, but some solution should
> have been arrived at; instead the TC did nothing.
> Even though I am strongly in favor of the separate stack for coding ease
> and -- yes -- computing speed, I would have accepted a Standard that forced
> the FP numbers to the data stack. It is the lack of a definitive answer that
> is the real error.
>
> -- Lee Brotzman (FIGI-L Moderator)

Please, MAKE a decision NOW (we should have done it in 1981)!

Regards, Paul.

+--------------------------------------------------------------+
| Dr Paul Bartholdi          bartho@cgeuge54.bitnet            |
| Observatoire de Geneve     bartho@obs.unige.ch               |
| 51, chemin des Maillettes  02284682161350::bartho (psi)      |
| CH-1290 Sauverny           20579::ugobs::bartho              |
| Switzerland                +41 22 755 39 83 (fax)            |
|                            +41 22 755 26 11 (phone)          |
|                            +45 419 209 obsg ch (telex)       |
+--------------------------------------------------------------+
wmb@MITCH.ENG.SUN.COM (12/06/90)
Besides Harris, I believe LMI (one of the largest Forth vendors) uses a
floating point implementation without a separate FP stack.

I agree that it might be worthwhile to revisit the floating point stack
issue.

Personally, I don't think it's very likely that we will see a
commercial Forth chip with integrated floating point any time in the
near future. It would be nice, but I wouldn't bet on it happening.
Doing floating point right in hardware is not as easy as building an
integer processor, and I just don't see where the investment capital is
going to come from.

Mitch Bradley, wmb@Eng.Sun.COM
cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/07/90)
In article <9012061505.AA20237@ucbvax.Berkeley.EDU>, wmb@MITCH.ENG.SUN.COM writes:
> Personally, I don't think it's very likely that we will see a
> commercial Forth chip with integrated floating point any time
> in the near future. It would be nice, but I wouldn't bet on
> it happening. Doing floating point right in hardware is not
> as easy as building an integer processor, and I just don't see
> where the investment capital is going to come from.
>
> Mitch Bradley, wmb@Eng.Sun.COM

Yes, but does the standard address a device-independent way of
interfacing with an FP chip?

I'd like that functionality so my PCx FORTH could use, "in a standard
way", say, an 8087, etc.

--Clyde