[comp.lang.forth] Floating point stack

wmb@MITCH.ENG.SUN.COM (08/28/90)

> This proposal [making a floating point stack optional] seems to
> guarantee that anyone wishing to write a "Standard" application
> using floating-point will have to write everything twice.  Then,
> of course, at the beginning of your standard program you can test
> the environment and decide which copy of the application to run.
> (I invite counter-examples.)

That is what I used to think too, until I figured out the "trick".

It turns out that the real problem is with "mixed stack" operations,
where you need to simultaneously deal with data stack and floating
point stack values.  If the floating point data is kept on the data
stack, then can you access integer data underneath it?

The solution turns out to be remarkably simple:

Suppose that we have a function FSTKCELLS

	FSTKCELLS  ( n -- ncells )
		ncells is the number of data stack items occupied
		by n floating point numbers.

If there is a separate floating point stack, FSTKCELLS would be DROP 0.
Otherwise, it might be NOOP or 2* or 4* or whatever is correct, considering
the relative sizes of integers and floating point numbers.

Given this function, mixed stack operations can be portably expressed as
a (usually trivial) calculation involving FSTKCELLS and PICK .
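
As a minimal sketch (not taken from any particular system; the name
INT-UNDER-FLOAT is made up for illustration, and a zero-based PICK as in
Forth-83 is assumed), a word that copies the integer lying under one
floating point number could be written:

	\ Keep exactly ONE of these two definitions, to match the system:
	: FSTKCELLS  ( n -- ncells )  DROP 0 ;   \ separate floating point stack
	\ : FSTKCELLS  ( n -- ncells )  2* ;     \ unified stack, 2 cells per float

	\ Unified:  ( n r -- n r n )     Separate:  ( n -- n n ) ( F: r -- r )
	: INT-UNDER-FLOAT   1 FSTKCELLS PICK ;

With a separate stack this reduces to 0 PICK , i.e. DUP ; with two-cell
floats on the data stack it becomes 2 PICK , which skips over the float.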

Of course, this is more cumbersome than doing nothing at all, but in
an absolute sense, it is reasonably simple, effective, and less trouble
than maintaining 2 versions of the code.

Fortunately, mixed-stack operations turn out to be relatively infrequent,
and can often be avoided altogether by judicious use of variables.

Mitch

ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/28/90)

>> This proposal [making a floating point stack optional] seems to
>> guarantee that anyone wishing to write a "Standard" application
>> using floating-point will have to write everything twice.  Then,
>> of course, at the beginning of your standard program you can test
>> the environment and decide which copy of the application to run.
>> (I invite counter-examples.)
>
>That is what I used to think too, until I figured out the "trick".
>
>The solution turns out to be remarkably simple:
>
>Suppose that we have a function FSTKCELLS
>
>	FSTKCELLS  ( n -- ncells )
>		ncells is the number of data stack items occupied
>		by a n floating point numbers.
>
>Given this function, mixed stack operations can be portably expressed as
>a (usually trivial) calculation involving FSTKCELLS and PICK .
>
>Of course, this is more cumbersome than doing nothing at all, but in
>an absolute sense, it is reasonably simple, effective, and less trouble
>than maintaining 2 versions of the code.
>
>Fortunately, mixed-stack operations turn out to be relatively infrequent,
>and can often be avoided altogether by judicious use of variables.
>
>Mitch

   Sorry, Mitch, I have to disagree strongly on this one.  There is a big
difference between floating point code that uses a separate stack and that
which doesn't.  The end result of the compromise in the proposal that was
adopted is that, rather than some of the existing floating point code needing
to be rewritten to be portable, ALL OF IT WILL HAVE TO BE.  From the standpoint
of portability, all existing floating point code is broken.  Period.
   I admired the handling of the Floored vs. Truncated division compromise,
but this one sucks hot rocks from hell.
   Mixed stack operations are quite common in astrophysical applications,
especially image processing, where, in our system here at Goddard, the image is
stored in an indexed N-dimensional array.  The indices are integers kept
on the data stack, the floating point array data on a separate floating point
stack.  The indices may be computed in another word entirely (e.g. a word to
convert celestial coordinates to pixel location), therefore the ordering of the
words in a program is very dependent on knowing if a separate stack exists.
   To illustrate the difference in coding style consider data stored in
an array defined like so:

   F_IMAGE  ( rows cols -- )    compiling
            ( row col -- addr ) executing
      Allocate a two-dimensional floating point array of width 'cols' and
   height 'rows'.  At runtime, the defined array takes the index values
   'row' and 'col' and converts them to the appropriate memory address of
   the element in that row and column.

Say we want a 3-by-3 pixel box average of the data in order to generate a
smoothed image.  For most of the image, one just adds up the values of the
3 pixels immediately above the reference pixel, the three pixels below it,
the pixels to either side and the reference pixel itself, then divides by 9.0.
At the borders, 2-by-3 and 3-by-2 pixel boxes are averaged.

   512 512 F_IMAGE Galaxy    ( This is the image array used below.  Assume )
                             ( an image has already been loaded into it.   )

   ( Central routine for computing a 3-by-3 pixel box average of the image in
     array Galaxy.  To complete the actual box averaging, similar routines are
     used that compute averages of 2-by-3 and 3-by-2 boxes at the borders. )

   : _box-average  ( F:  -- Ave-Value   D:  i0 j0 --- )
       ( Total the values for the three pixels in the row above )
       over 1- over 1-  ( i0-1 j0-1 ) Galaxy F@
       over 1- over     ( i0-1 j0   ) Galaxy F@ F+
       over 1- over 1+  ( i0-1 j0+1 ) Galaxy F@ F+
       ( Total the values for the three pixels in the row below )
       over 1+ over 1-  ( i0+1 j0-1 ) Galaxy F@ F+
       over 1+ over     ( i0+1 j0   ) Galaxy F@ F+
       over 1+ over 1+  ( i0+1 j0+1 ) Galaxy F@ F+
       ( Total pixels to left and right and finally center, divide by 9 )
               2dup 1-  ( i0   j0-1 ) Galaxy F@ F+
               2dup 1+  ( i0   j0+1 ) Galaxy F@ F+
                        ( i0   j0   ) Galaxy F@ F+ 9.0 F/
   ;

   As you can see, with the separate floating point stack, the pixel values
accumulate in the FP stack and don't interfere with the calculations of the
index values.  Writing the same code for FP values on the data stack would
result in a much more complex set of stack manipulations, or the intermediate
value would have to be stored in a variable.  Either way, it would be much
less efficient.  I scanned several dozen screens of code with various
floating-point calculations and encountered many instances where the existence
of a separate stack is vital to running the code.
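
   A minimal sketch of the variable-based alternative.  Nothing here is from
the Goddard code: box-sum and +pixel are made-up names, FVARIABLE and the
9.0E literal form are assumed to be available, and Galaxy is the array
defined above.  Because no integer is touched while a float sits on the
stack, the same source should behave the same under either stack model, at
the cost of extra memory accesses for every pixel:

   FVARIABLE box-sum    ( running total of the pixel values )

   ( Add the pixel at row, col to the running total )
   : +pixel  ( row col -- )   Galaxy F@  box-sum F@ F+  box-sum F! ;

   ( The average is left wherever the system keeps its floats )
   : _box-average  ( i0 j0 -- )
       0.0E box-sum F!
       2 -1 DO                        ( J = row offset    -1, 0, 1 )
          2 -1 DO                     ( I = column offset -1, 0, 1 )
             over J +  over I +  +pixel
          LOOP
       LOOP
       2drop  box-sum F@ 9.0E F/
   ;
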
   There is also the problem of programmers, like myself, who get a little
sloppy about fetch and store operations, seeing as how with a separate stack
it doesn't matter what order the value and address are referenced, e.g.
12.0 XVAL F!  is precisely equivalent to  XVAL 12.0 F!  when there is a
separate FP stack, but not when floats are stored on the data stack.  This
kind of thing can get very hard to find and fix if both methods are allowed
and have to be planned for.
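
   A sketch of the two cases, assuming XVAL was created with FVARIABLE , that
F! takes ( r addr -- ) on a unified stack, and writing the literal in the
12.0E form:

      ( Separate FP stack: address and value never share a stack )
      12.0E XVAL F!    ( D: addr       F: 12.0   stores 12.0 in XVAL )
      XVAL 12.0E F!    ( D: addr       F: 12.0   same state, same result )

      ( Unified stack: F! expects the address on top of the float )
      12.0E XVAL F!    ( D: 12.0 addr            stores 12.0 in XVAL )
      XVAL 12.0E F!    ( D: addr 12.0            float taken as the address )
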
   Testing for whether there is or isn't an FP stack, and writing code to
operate on either, just will not be worth the effort.  The end result will
be a lot of code that depends on one method alone.  All hope of portability
is lost, in my opinion.
   This is one area where the TC should have bitten the bullet and made a
definite decision for or against a separate stack.  Obviously I would prefer
the former, but I could live with the latter, as long as it was definite.
This time the move to an all-encompassing compromise may well prove
disastrous.
   The division question provided a simple answer that leaves current code
portable with a few simple definitions for / , MOD , etc.  This latest decision
will require rewrites of ALL floating point code, regardless.  What could the
rationale have possibly been?

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET:   ZMLEB@SCFVM          Internet: zmleb@scfvm.gsfc.nasa.gov
-- I'm only a contractor, don't blame me for the tax rates and don't blame
-- the government for my statements.

wmb@MITCH.ENG.SUN.COM (08/28/90)

> The end result of the compromise in the proposal that was adopted is that,
> rather than some of the existing floating point code needing to be
> rewritten to be portable, ALL OF IT WILL HAVE TO BE.  From the standpoint
> of portability, all existing floating point code is broken.  Period.

Yeah, I guess so.  OTOH, you could just declare that the code has an
environmental dependency on a floating point stack, which is no worse
than the current situation (in which there is no floating point standard).
(Yeah, yeah, I know, the environmental dependency sucks too)

> I admired the handling of the Floored vs. Truncated division compromise,
> but this one sucks hot rocks from hell.  ...
> I scanned several dozen screens of code with various floating-point
> calculations and encountered many instances where the existence
> of a separate stack is vital to running the code.

> This is one area where the TC should have bitten the bullet and made a
> definite decision for or against a separate stack.

Sigh.  What to do?  The committee seemed to be pretty much set on a
separate stack, but then Phil Koopman came and made an impassioned
and eloquent plea for not requiring a separate stack.

Sometimes I feel like this is a "damned if you do, damned if you don't"
situation.

My personal position favors specifying a separate stack, but Phil was
pretty persuasive.  I convinced myself that I could manage to write
new portable code in the ambiguous situation, and at that point my
position softened somewhat.

Mitch Bradley, wmb@Eng.Sun.COM

(getting somewhat weary of defending one half of the Forth community
 against the other half, and vice versa)

ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/28/90)

>> This is one area where the TC should have bitten the bullet and made a
>> definite decision for or against a separate stack.
>
>Sigh.  What to do?  The committee seemed to be pretty much set on a
>separate stack, but then Phil Koopman came and made an impassioned
>and eloquent plea for not requiring a separate stack.

   Ok Phil, speak up.  We know you're out there.  Come out now and no one
will get hurt.
   Explain this morbid fear of a separate floating point stack.  I presume
this is related to implementing floating point on a Forth chip.  Give specific
examples where the separate stack makes such an impact on performance in
your case, that making everyone else rewrite all their floating point code
becomes necessary.
   Come on, 'fess up.  Let's hear it.  Speak now or forever hold your peace.

   (Uh ... if you haven't noticed yet, the above is light-hearted sarcasm,
even though the subject I inquire about is quite serious).

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET:   ZMLEB@SCFVM          Internet: zmleb@scfvm.gsfc.nasa.gov
-- I'm only a contractor, don't blame me for the tax rates and don't blame
-- the government for my statements.

koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/28/90)

In article <9008281431.AA00691@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes:
>    Ok Phil, speak up.  We know you're out there.  Come out now and no one
> will get hurt.

This is a summary (to the extent that I can recall) of
the reasons for allowing using the data stack for
floating point data that I presented to the ANSI Forth 
meeting in Melbourne back in May.  That discussion appears
to have provided the impetus for the changes to the BASIS at the
latest meeting, but I have not been personally involved since May.

1) There is common practice for both using a separate floating
 point stack and a unified data/floating point stack.  Historically,
 separate floating point stacks have come into use because of 
 implementation considerations on specific platforms (e.g. the 80287).
 Coprocessor stacks can have problems (such as handling stack overflows 
 when reals are passed as subroutine parameters).
 On some platforms, a separate floating point stack is very expensive,
 because there is no on-chip register available for use as a pointer.
 The fact that there is common practice for both separated and unified
 stacks is what creates the issue.

2) I do not know of any stack-based machines (sometimes called "Forth
 machines") that support separated floating point stacks.  When
 last I checked, the consensus seemed to be that separate stacks
 are not likely to be added, either.  Certainly, a floating point
 stack can be emulated in memory, but it will be very slow compared
 to a *single-cycle* floating point operation that is likely to be
 found on 32-bit hardware.  Therefore, it is quite likely that users
 of such machines will have strong incentive to use a unified stack
 approach.  Harris floating point software assumes a unified stack.
 I predict that users of stack machines will ignore any requirement
 for using a separate floating point stack.
 A separate on-chip stack is quite expensive not only in silicon real
 estate, but also in terms of increased context switching time.

3) As Mitch has pointed out, in a great many cases code can be written
 to be insensitive to the stack model.  In those cases where such code
 is extremely inefficient, portable code could use conditional compilation
 to provide two versions.  My guess is that such code is very limited
 in size when viewed in the context of an entire application (and, if
 speed is that important, it's probably in assembler anyway).
 Also, much code is written with the loop variables in local variables
 or on the return stack (so, the sequence OVER 1+ OVER 1- for image
 processing could just as easily be I 1+ J 1-).

4) One motivator for separate stacks is that 16-bit integers are not
 the same size as 32-bit reals.  On 32-bit hardware, this problem
 goes away: single and double precision for reals and ints are the same size.
 (80-bit reals are brought to you courtesy of Intel, and are uncommon
  elsewhere).  If you are really serious about fast floating point
 (i.e., single-cycle F* and F+), you probably should be using a
 32-bit machine, so I do not weight this reason heavily.
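
A minimal sketch of the conditional compilation mentioned in point 3,
assuming the system provides [IF] [ELSE] [THEN] .  The flag SEPARATE-FSTACK?
and the word F!-UNDER are hypothetical names, not part of any proposal, and
the unified branch is only valid where a float occupies exactly one cell:

   -1 CONSTANT SEPARATE-FSTACK?   \ hypothetical: use 0 on a unified system

   SEPARATE-FSTACK? [IF]
      \ The address is alone on the data stack, so F! can use it directly.
      : F!-UNDER  ( addr -- ) ( F: r -- )   F! ;
   [ELSE]
      \ The float lies on top of the address; bring the address up first.
      : F!-UNDER  ( addr r -- )   SWAP F! ;
   [THEN]

Only the few words that really mix the two kinds of data need a second
version; everything built on top of them can stay common.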

I do not know whether a separate or unified stack is "best".  One of
my criteria will be which one a C compiler can use best for stack machines
(but, the jury is still out).  I requested that the standard not preclude
use of a unified stack.

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.

a684@mindlink.UUCP (Nick Janow) (08/29/90)

koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:

> As Mitch has pointed out, in a great many cases code can be written to be
> insensitive to the stack model.  In those cases where such code is extremely
> inefficient, portable code could use conditional compilation to provide two
> versions.  My guess is that such code is very limited in size when viewed in
> the context of an entire application (and, if speed is that important, it's
> probably in assembler anyway).

Your argument could also apply to using a separate stack only--and offering an
optional stack-machine coded version for the speed-critical sections.  :-)

> I do not know whether a separate or unified stack is "best".  One of my
> criteria will be which one a C compiler can use best for stack machines (but,
> the jury is still out).  I requested that the standard not preclude use of a
> unified stack.

While I respect Harris and appreciate the support they are giving FORTH, I
don't think that the ANS standard should be set just to suit C compilers
running on an RTX2000.  You've just admitted that code for a separate FP stack
might be more viable in the marketplace.  Part of the reason for the present
method was to accommodate Harris--and you don't even know if that's what you
want?  Maybe you Harris engineers could brainstorm a bit on the issue...before
the ANS FORTH is engraved in stone.

The present FP method (separate and combined stacks) was decided upon after
lengthy discussion.  However, there were 15 or fewer people present (some of
whom were less than experts on the issue) and there were not that many good
arguments put forward in the proposals, so it was not a massive consensus of
the entire FORTH community.  Despite the consensus (I think it was 14 in
favour, 1 {me} abstaining), I felt that the mood was "This is the best
compromise we can come up with at this time.  Let's see what the reaction is."

To anyone interested in the FP issue: if you've got a new slant on the issue or
a convincing argument for a particular method, SEND IT IN!  Post comments,
ideas, etc here too; maybe something better can come out of the discussion.  If
you're not happy with the present compromise, offer something constructive in
order to change it.  If you can't offer a better solution, admit it and stop
complaining.

a684@mindlink.UUCP (Nick Janow) (08/29/90)

As I see it, the present compromise on FP numbers makes applications portable;
the same FP code will run on an RTX2001 and an 8086&8087.  In return for this
portability, the programmers must accept the restraint that once a FP number is
placed on the stack (you must assume the data stack), anything below it on the
stack cannot be accessed.  Programmers for the RTX series must accept this
restriction as well; do any want to comment on how that affects their work?
Anyone who doesn't follow this is writing non-portable code.

I don't know how difficult the restriction on data stack access will be,
especially for large FP applications.  Locals and the return stack can solve
some of the problems, but I'd like to hear the comments of programmers who use
FP heavily in their work _after_ they try it under the new restrictions, using
locals and the return stack.

koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:

> Historically, separate floating point stacks have come into use because of
> implementation considerations on specific platforms (e.g. the 80287).

There's another reason for a separate stack: the length of a FP number differs
from application to application (and from machine to machine).  Using
conditional FP words (Flength 48 = IF MAKE OVER 3 CELLS ...) sounds very messy
to me.  :(

> One motivator for separate stacks is that 16-bit integers are not the same
> size as 32-bit reals.  On 32-bit hardware, this problem goes away: single and
> double precision for reals and ints are the same size. (80-bit reals are
> brought to you courtesy of Intel, and are uncommon  elsewhere).  If you are
> really serious about fast floating point (i.e., single-cycle F* and F+), you
> probably should be using a 32-bit machine, so I do not weight this reason
> heavily.
      <<<The Macintosh has 80 bit FP routines in ROM.>>>

One idea I had during the discussion of FP handling was the option of having a
separate FP stack _and_ allowing single-cell FP operations on the data stack.
In this manner, single-cell FP could be handled either way, or if separate
words are used for each operation (Ff+ and Fd+) you could have both.  This
would allow fast single-cell operations and slower but more accurate 80 bit
operations at the same time.  Either one could use a coprocessor.

While this method would allow separate-stack programmers and combined (32 bit)
stack  programmers to write efficient code, it wouldn't allow equally fast
implementation on the other's systems.  Also, it would make systems with stacks
narrower than a FP number (32 bits?) unhappy.  Of course, if the portability of
an application was a high priority, the restrictions of the present compromise
could still be followed.  If the restrictions weren't followed, the program
would be ANS standard but environment dependent.

I find this method appealing for some reason, but I have to admit that the
present ANS compromise seems better.  Under the restrictions, the ANS code will
run on stack machines or non-stack machines equally fast.

Wait a minute!  If I remember correctly, the ANS compromise says something
along the lines of "The default is separate FP stack, with the restriction of
not accessing stack items below a FP item."  Well, if you obey that
restriction, why bother calling it a separate stack?  If I sound confused on
the issue of FP, that's okay, because I am.  :-)

Anyways, I wanted to put out the idea of separate FP stack _and_ single cell FP
operations.  If nothing else, it may help clarify why certain options are
workable/unworkable.  I find writing these things out helps me clarify my own
thoughts.  Maybe it will also inspire others to come up with other ideas that
haven't been considered yet.

peter@ficc.ferranti.com (Peter da Silva) (08/29/90)

In article <10340@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
> 4) One motivator for separate stacks is that 16-bit integers are not
>  the same size as 32-bit reals.  On 32-bit hardware, this problem
>  goes away: single and double precision for reals and ints are the same size.

Except for 64-bit reals (AKA double precision), which bring the whole thing
back again. (If you say you don't need DP, I'll believe you... but you're
not the whole world).
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) (08/29/90)

Phil Koopman <koopman@greyhound.ece.cmu.edu> writes:
>1) There is common practice for both using a separate floating
> point stack and a unified data/floating point stack.

   How common is the unified stack, I wonder?  Mitch mentioned that the
TC was pretty much settled on the separate stack idea, before settling
on allowing both.  That says to me that separate stacks are more common
than unified.

> On some platforms, a separate floating point stack is very expensive,
> because there is no on-chip register available for use as a pointer.

   On many platforms in common use (VAX, 68000), there is an on-chip
register available.  The lack of one on one current generation of Forth
chip should hamstring portability of floating point software on all?
I think I can safely say that the vast majority of current working FP
code is implemented on conventional CPUs.

>2) I do not know of any stack-based machines (sometimes called "Forth
> machines") that support separated floating point stacks.  When
> last I checked, the consensus seemed to be that separate stacks
> are not likely to be added, either.  Certainly, a floating point
> stack can be emulated in memory, but it will be very slow compared
> to a *single-cycle* floating point operation that is likely to be
> found on 32-bit hardware.

   Are there any stack-based machines with on-chip floating point capability;
floating point that executes basic instructions such as F+ and F* in a single
cycle?  If so, I can reluctantly accept your arguments (maybe).  If not,
and they are in the planning stage only, what is preventing the inclusion
of an FP stack register and associated stack memory?  Given the parallel
nature of stack-based machine architecture, I would think -- as a complete
novice when it comes to hardware design -- that it would be faster to perform
an FP addition and memory store operation in a single cycle, than to
perform an FP addition in one cycle, a swap in one cycle, and a store in
one cycle.

> Harris floating point software assumes a unified stack.

   To reiterate, is Harris floating point software founded on hardware FP
operations or software "emulation" of floating point?
   If the former, I take as my example the VAX, where FP is provided in the
CPU itself.  The additional stack manipulations required by unified stack far
outweigh the savings of a single register for holding the FP stack pointer.
Keeping FP values on a separate stack eliminates extraneous SWAPs, OVERs, etc.
A separate FP stack is a clear winner on this platform.
   If the latter, I take as my example the lowly 6502, for which I have
implemented a set of software floating point routines.  I found that amongst
all of the bit-manipulation gymnastics, accessing a memory variable to get
the FP stack pointer falls into the noise level when counting cycles.
   From my own experience, the separate stack becomes an advantage, or at
the very least, no disadvantage.  Coupled with the much greater ease in
writing code, I am a stone-cold-separate-floating-stack advocate.
   Perhaps I should stand before the TC to make my own "impassioned plea".
Nah, it's already too late.

>3) As Mitch has pointed out, in a great many cases code can be written
> to be insensitive to the stack model.

   Note the future tense used here.  In effect, the past has been discarded.
What I fear is that the effort to write stack-model-insensitive code will
be too great, resulting in "ANS Standard Floating Point" meaning nothing.
   I understand your arguments (I hope), but let me couple your predictions
of Forth-in-hardware developers ignoring floating point with a prediction
of my own:  I predict that *all* Forth developers will ignore the standard
and write their code for the system they use and disregard portability
considerations.  The end result will be that we will use the same words
(a major step forward in itself), but the words will mean different things.
   So much for Forth as a portable scientific application language.

>4) One motivator for separate stacks is that 16-bit integers are not
> the same size as 32-bit reals.  On 32-bit hardware, this problem
> goes away: single and double precision for reals and ints are the same size.
> (80-bit reals are brought to you courtesy of Intel, and are uncommon
>  elsewhere).  If you are really serious about fast floating point
> (i.e., single-cycle F* and F+), you probably should be using a
> 32-bit machine, so I do not weight this reason heavily.

    One motivator behind the ANS Standards effort was to release the "16-bit
barrier" in Forth-83.  This opens the door for vendors to develop systems
that are compatible across platforms with disparate word lengths.  Writing
FP code that is portable on 32-bit platforms is a problem already; it becomes
nearly impossible on 16-bit platforms using 32-bit FP numbers when the FP
values are stored on the data stack.  This problem goes away completely when a
separate stack is used for FP values.
    Whether I am "really serious" about floating point calculations or not,
being able to write FP code on a VAX and run it unchanged on an Apple //
(which I can easily do, using a separate stack) is a powerful indication
of the portability afforded by this scheme.

>I do not know whether a separate or unified stack is "best".  One of
>my criteria will be which one a C compiler can use best for stack machines
>(but, the jury is still out).  I requested that the standard not preclude
>use of a unified stack.

    Oh, I thought we were talking about *Forth*.  Seriously, your point
about how a C compiler would implement FP operations is well taken.  Harris'
success could well be based on the efficiency of a C compiler on your chip.
But you admit that the jury is still out.  This is a direct conflict with
the common-practice argument.  Has Harris ever implemented a separate floating
point stack in order to make quantitative judgements?  Those judgements should
be made against other platforms performing the same operations, as opposed to
your own platform.  Could you still outperform the competition even with a
separate FP stack?  Perhaps Harris is so concerned with single-cycle
operations that multi-cycle operations are too much an anathema?

>  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet

-- Lee Brotzman (FIGI-L Moderator)
-- BITNET:   ZMLEB@SCFVM          Internet: zmleb@scfvm.gsfc.nasa.gov
-- "Between an idea and implementation, is software." -- Curse from Hubble
-- Space Telescope engineer.

koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/29/90)

In article <9008290355.AA19589@ucbvax.Berkeley.EDU>, ZMLEB@SCFVM.GSFC.NASA.GOV (Lee Brotzman) writes:
>    How common is the unified stack, I wonder?
It depends on what you mean by common.  Since most Forth programmers
these days probably run on systems with 80287 hardware stacks (or
80287 emulator software), the answer may be uncommon.  My personal
experience has been almost uniformly
with unified stacks.  The fact that the ANS Forth folks passed the
resolution without my being at the meeting suggests that they now
believe it is common enough to warrant consideration.

>    On many platforms in common use (VAX, 68000), there is an on-chip
> register available.  The lack of one on one current generation of Forth
> chip should hamstring portability of floating point software on all?
> I think I can safely say that the vast majority of current working FP
> code is implemented on conventional CPUs.
Not all conventional CPUs have an available (or at least convenient)
register for the FP pointer.  The 80x86 could use BX or perhaps DI,
but requires an SS: override instruction for non-tiny memory models.
First-generation micros don't have a spare register (6502, 8080/Z80,
6800, 68HC11?).  You may think that small/old micros don't matter
much, but other folks (especially in embedded control) do.

>    Are there any stack-based machines with on-chip floating point capability;
> floating point that executes basic instructions such as F+ and F* in a single
> cycle? 
No announced products that I know of.  But, they will clearly come
some day.

> If so, I can reluctantly accept your arguments (maybe).  If not,
> and they are in the planning stage only, what is preventing the inclusion
> of an FP stack register and associated stack memory?
The memory takes a lot of chip area, which makes chips more expensive.
It also represents more context to be saved for context switches,
thus degrading real time performance.  Even a separate pointer register
is that much more context to save (especially since on many systems
there will probably be limit registers as well).
The realities of the marketplace are that C performance will drive most
design decisions (not ANS Forth compliance).  Stack architectures will
have a separate FP stack on-chip only if they *significantly* help C
run times.  I believe that this will be true of companies besides
Harris in the future.  

>  Given the parallel
> nature of stack-based machine architecture, I would think -- as a complete
> novice when it comes to hardware design -- that it would be faster to perform
> an FP addition and memory store operation in a single cycle, than to
> perform an FP addition in one cycle, a swap in one cycle, and a store in
> one cycle.
Inexpensive memory is slower than floating point operations.  The major
limit to supercomputers these days is not fast floating point, but rather
memory bandwidth.  I'll trade stack twiddling for fetches and stores
any day.  Perhaps this is not optimal on current hardware, but it is
the wise long-term path.  As CPU speeds increase, reducing demands on
memory will become more important too.

>    To reiterate, is Harris floating point software founded on hardware FP
> operations or software "emulation" of floating point?
Both (at least in the planning stages).

>    If the latter, I take as my example the lowly 6502, for which I have
> implemented a set of software floating point routines.  I found that amongst
> all of the bit-manipulation gymnastics, accessing a memory variable to get
> the FP stack pointer falls into the noise level when counting cycles.
How about FDUP, FSWAP, etc.?  Here you are paying a proportionally larger
penalty for memory-based pointer manipulations.

>    From my own experience, the separate stack becomes an advantage, or at
> the very least, no disadvantage.  Coupled with the much greater ease in
> writing code, I am a stone-cold-separate-floating-stack advocate.
>    Perhaps I should stand before the TC to make my own "impassioned plea".
> Nah, it's already too late.
I didn't consider it an "impassioned plea" myself.  Just a statement of
fact.  Harris (and other stack machine vendors, to the best of my
knowledge) don't plan on supporting a separate hardware floating point
stack.  The other reasons I gave in my previous post were what I
perceive as the pro-unified stack point of view.  It is up to
the TC to sort out the facts and reach a wise decision.

>     Oh, I thought we were talking about *Forth*.  Seriously, your point
> about how a C compiler would implement FP operations is well taken.  Harris'
> success could well be based on the efficiency of a C compiler on your chip.
> But you admit that the jury is still out.  This is a direct conflict with
> the common-practice argument. 
No conflict at all.  The common practice argument should be restricted
to Forth.  The jury is out on C.  I have no performance numbers, and am
therefore unwilling to preclude future use of a unified stack (OR, a
split stack).  The issue is whether the TC wants to take into account
the very likely possibility (based on size, memory,  and context switching
considerations) that unified stacks will be significantly more
efficient on future stack machines.  We all know that many Forth
programmers value efficiency above conformance to a standard.

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.

koopman@a.gp.cs.cmu.edu (Philip Koopman) (08/30/90)

In article <0XI5RNG@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> In article <10340@pt.cs.cmu.edu> koopman@a.gp.cs.cmu.edu (Philip Koopman) writes:
> > 4) One motivator for separate stacks is that 16-bit integers are not
> >  the same size as 32-bit reals.  On 32-bit hardware, this problem
> >  goes away: single and double precision for reals and ints are the same size.
> 
> Except for 64-bit reals (AKA double precision), which bring the whole thing
> back again. (If you say you don't need DP, I'll believe you... but you're
> not the whole world).

What I meant was  single int, single float = 32 bits
                  double int, double float = 64 bits
 DUP = FDUP
DDUP = FDDUP   (or, 2DUP = F2DUP, or whatever...)

So, double reals are no worse than double integers (and, in fact,
 the data type doesn't really matter much for stack manipulations any more).
Of course, this is only for 32 bit machines...
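
A minimal sketch of what that means in code, assuming a unified stack
on a 32-bit system where a single float occupies one cell and a double
float occupies two.  The U-prefixed names are hypothetical, chosen so
they do not clobber a system's own FDUP and friends:

```forth
\ Assumption: unified stack, single float = 1 cell, double float = 2 cells.
\ Under that assumption the float stack words are just the integer ones.
: UFDUP    ( r -- r r )            DUP ;     \ single float
: UFSWAP   ( r1 r2 -- r2 r1 )      SWAP ;
: UFDROP   ( r -- )                DROP ;
: UDFDUP   ( dr -- dr dr )         2DUP ;    \ double float, two cells
: UDFSWAP  ( dr1 dr2 -- dr2 dr1 )  2SWAP ;
```

None of this holds on a 16-bit system, or where floats live on a
separate stack.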

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
Senior scientist at Harris Semiconductor, and adjunct professor at CMU.
I don't speak for them, and they don't speak for me.

toma@tekgvs.LABS.TEK.COM (Tom Almy) (08/30/90)

This is getting tiresome.

LMI Forths put floating point numbers on the parameter stack. When 
incorporating floating point operations into my Native Code Compiler,
I wanted separate stacks to take advantage of the additional performance
(speed, code size, and accuracy) of stack operations in the 80x87.
For compatibility I rewrote the existing LMI package to use the
coprocessor stack.

After changing the primitives, very few additional changes had to be made.
It hardly affects good application code at all (by *good* I mean *readable*
and readable code does very little stack manipulation).

Hey folks, it's not that difficult to write code that works in both 
environments; what hurts portability is not having a standard for the
primitive functions!

So let's have a standard that lets this be an implementation detail, where
it belongs! If it makes sense for an 80x86/7 system to have separate stacks
and for a Forth chip system to have one stack, that's fine! Consider the
problems we have had in existing standards which specify division working
in ways other than the hardware, or wordsizes that don't match reality.
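
As an illustration of the kind of code meant here (a sketch only, not
taken from the LMI package): a definition that touches nothing but
floating point values, through the floating point primitives alone,
reads the same whether those values sit on the data stack or on a
separate one.  HYPOT is a made-up name, and the sketch assumes the
system supplies FDUP, FSWAP, F*, F+ and FSQRT.

```forth
\ Hypothetical example: length of the hypotenuse, sqrt(x*x + y*y).
\ No integer items are mixed in, so no assumption is made about
\ which stack the floating point values live on.
: HYPOT  ( F: x y -- r )
   FDUP F*            \ y*y
   FSWAP FDUP F*      \ x*x
   F+ FSQRT ;
```

For example, 3E0 4E0 HYPOT leaves 5 wherever the system keeps its
floating point values.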

Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply

BARTHO@CGEUGE54.BITNET ("PAUL BARTHOLDI, OBSERVATOIRE DE GENEVE") (08/30/90)

Hi friends,

>    Thanks, Philip, for the response.  You have made several good points,
> although I still find room to quibble (don't we all? :-).  I don't
> see much point in going back and forth on this with 100 lines of text
> at a pop (did you hear that, OOF debaters?).
       100% agree!  Thanks again.
>    Let me just reiterate my basic point about why the decision to allow
> either a single or a separate floating point stack in a Standard System is
> a very bad one.
       idem !
>    It is difficult, if not impossible, to write floating point code that will
> run portably on each of the following implementations:
>
>    16-bit integers, 32- and 64-bit FP, unified stack
>    16-bit integers, 32- and 64-bit FP, separate stack
>    32-bit integers, 32- and 64-bit FP, unified stack
>    32-bit integers, 32- and 64-bit FP, separate stack
>
I was one of those who suggested back in Rochester 1981 to have separate
stacks.  My reasons were (and still are) : - To keep full precision of 80x87
and similar chips (including 9911 etc), data MUST stay as long as possible
on the hardware stack;  - mixing on the same stack integer and floating is
(from experience, I have FP on my forth since 1976 ...) either trivial because
they don't interfere, or terrible because the right data is never at the right
place on the stack, so the code becomes full of 'unnecessary' ROT ROLL etc.
- The computing speed may be real for some hardware (80x87 for example) but
was not important in my case (HP1000 with FP microcode).

Interestingly, the 'trivial' case is independent of unified or separate stack,
while the 'terrible' is almost 100% solved with separate ones.  It is very
clear to me that we need a single choice, either separate or unified (I prefer
separate; why, see above!).  I don't want to write and DEBUG code for both
versions (with conditionals etc the problem is not technical, but practical!)

> The end result will be that although the code may be ANSI Standard Forth,
> there is no guarantee whatsoever that floating point operations will be
> portable.
> Granted, this is the same as our current position, but some solution should
> have been arrived at; instead the TC did nothing.
>    Even though I am strongly in favor of the separate stack for coding ease
> and -- yes -- computing speed, I would have accepted a Standard that forced
> the FP numbers to the data stack.  It is the lack of a definitive answer that
> is the real error.
>
> -- Lee Brotzman (FIGI-L Moderator)

Please, TAKE a decision NOW  (we should have done it in 1981)!

                              Regards,                  Paul.

     +--------------------------------------------------------------+
     |  Dr Paul Bartholdi             bartho@cgeuge54.bitnet        |
     |  Observatoire de Geneve        bartho@obs.unige.ch           |
     |  51, chemin des Maillettes     02284682161350::bartho (psi)  |
     |  CH-1290 Sauverny              20579::ugobs::bartho          |
     |  Switzerland                   +41 22 755 39 83       (fax)  |
     |                                +41 22 755 26 11     (phone)  |
     |                                +45 419 209 obsg ch  (telex)  |
     +--------------------------------------------------------------+

wmb@MITCH.ENG.SUN.COM (12/06/90)

Besides Harris, I believe LMI (one of the largest Forth vendors) uses
a floating point implementation without a separate FP stack.

I agree that it might be worthwhile to revisit the floating point
stack issue.

Personally, I don't think it's very likely that we will see a
commercial Forth chip with integrated floating point any time
in the near future.  It would be nice, but I wouldn't bet on
it happening.  Doing floating point right in hardware is not
as easy as building an integer processor, and I just don't see
where the investment capital is going to come from.

Mitch Bradley, wmb@Eng.Sun.COM

cwpjr@cbnewse.att.com (clyde.w.jr.phillips) (12/07/90)

In article <9012061505.AA20237@ucbvax.Berkeley.EDU>, wmb@MITCH.ENG.SUN.COM writes:
> Personally, I don't think it's very likely that we will see a
> commercial Forth chip with integrated floating point any time
> in the near future.  It would be nice, but I wouldn't bet on
> it happening.  Doing floating point right in hardware is not
> as easy as building an integer processor, and I just don't see
> where the investment capital is going to come from.
> 
> Mitch Bradley, wmb@Eng.Sun.COM

Yes, but does the standard address a device-independent way of
interfacing with an FP chip?

I'd like that functionality so my PCx FORTH could use, say, an 8087
"in a standard way".

--Clyde