[comp.lang.forth] New Directions: 'I' is broken in Forth

dwp@willett.UUCP (Doug Philips) (01/19/90)

Let me say right now, that I am *not* proposing any of this for ANSI Forth.

Also, the issue I want to get into here is independent of the issue
I want to raise in my other 'New Directions' post (just in case you see it
first).

The problem, IMHO, with 'I' is that it is context sensitive, as in

    DO ... I ... DO ... I ... LOOP ... LOOP

In this example, 'I' is not referring to the same thing every place it
is mentioned.  State-smart words are something else in Forth that
are potentially ambiguous, but at least they attempt to have the
same semantics.  (There are arguments for/against state-smartness,
but I don't want to get sidetracked into that issue here.)
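
For a concrete instance of the pattern above, here is a minimal sketch in
standard Forth. The word name NEST is made up for illustration. The two
occurrences of I look identical but read different loops:

```forth
\ Hypothetical example word: the same name I yields a different
\ loop's index depending on where it appears.
: NEST  ( -- )
   3 0 DO
      I .              \ here I is the outer index: 0, 1, 2
      2 0 DO
         I .           \ here I is the inner index: 0, 1
      LOOP
   LOOP ;

NEST   \ prints: 0 0 1 1 0 1 2 0 1
```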

How would I do 'I'?  My first attempt would be to have I, J, etc. be
immediate words that compile code which grabs the current index from
wherever it is that DO ... LOOP stashes it.  I would not use a
stack in the way that the return stack is used, in order to avoid the
problem I mentioned above.  In the interests of efficiency, I might
even want to hack the dictionary so that, from the word being compiled,
there is a known way to access the loop parameters for the loops it
contains.  I haven't given the actual implementation scads of concentrated
thought, but there oughta be a way to do it that is both clean *and* fast.

There would be, IMHO, two major gains by doing this.  First, the
restrictions about using R>, >R before/during/after DO LOOPS would
be lifted.  Second, the semantics of I J etc. would be simpler.
(Isn't forth's philosophy that simpler is better :-).  
(The second one is what motivated this post, but the first one is just
as important, now that I consider it.)

I have seen reference to 'control stacks' (probably in Forth Dimensions, but
I can't recall) for solving this problem.  Has anyone been using them or
are they just an academic exercise?  and if 'yes', have control stacks
been found to introduce new problems?   (Set aside 'efficiency', for the
moment, I mean semantic problems or difficulties).

I'm beginning to be of the opinion that DO LOOP and its use of the
return stack is not an elegant design, but rather one that was hacked
on top of the existing machinery.

I am also beginning to be of the opinion that Forth is flexible enough to
allow you to 'do it right' and still be efficient.

		-Doug

---
Fastest: (willett!dwp@gateway.sei.cmu.edu OR ...!sei!willett!dwp)
...!{uunet,nfsun,sei}!willett!dwp  [in a pinch: dwp@vega.fac.cs.cmu.edu]

toma@tekgvs.LABS.TEK.COM (Tom Almy) (01/23/90)

In article <285.UUL1.3#5129@willett.UUCP> dwp@willett.UUCP (Doug Philips) writes:
[Doug proposes an intelligent "IMMEDIATE" loop index function instead of the
 current stupid one]
>There would be, IMHO, two major gains by doing this.  First, the
>restrictions about using R>, >R before/during/after DO LOOPS would
>be lifted.  Second, the semantics of I J etc. would be simpler.

In my Native Code Compilers (which compile complete colon definitions as
CODE words) DO LOOP constructs are compiled in two passes (everything else is
single pass BTW). The word DO (or ?DO) causes a scan ahead to the matching
LOOP (or +LOOP), looking for any usage of the loop index. Depending on how
many arguments of DO are literals, the presence of I, and termination by LOOP
or +LOOP, one of about a half dozen different looping structures is compiled.
If I does not appear, the loop index is not even maintained -- a simple 
count down to zero loop (using a register) is used. Sometimes the index is
maintained in a register, sometimes on the return stack. If the limit is
a literal, then it need not be saved, otherwise it needs to be saved on
the return stack. "I" is smart enough to compile the correct code.

If the index is in a register, and the limit is a literal then >R and R>
can be used across a loop boundary, but the programmer can never be certain.
Then again, the 83 standard prevents making such assumptions -- and it is a
good thing because it allows me to do far more optimizations than I would
otherwise be able.

BTW, both my NCC and your scheme would have a tough time handling this
(illegal but common) abomination:

: y  r> I . >r ;

: x  ." Prints 1 to 10: "  11 1 do  y  loop ;

>I have seen reference to 'control stacks' (probably in Forth Dimensions, but
>I can't recall) for solving this problem.  Has anyone been using them or
>are they just an academic exercise?

STOIC had a separate loop stack, and thus got around the problem.

>and if 'yes', have control stacks
>been found to introduce new problems?   (Set aside 'efficiency', for the
>moment, I mean semantic problems or difficulties).

It didn't seem to cause any problems at all.

>I'm beginning to be of the opinion that DO LOOP and its use of the
>return stack is not an elegant design, but rather one that was hacked
>on top of the existing machinery.

Unlike the Fortran arithmetic if statement, the Forth DO LOOP doesn't seem
to map onto any hardware. And if anything, having two stacks (let alone three)
rather than a single stack causes hardware problems. If you avoid the use
of the return stack (a recommendation I heartily gave students in my Forth
classes) then it works just fine!  Just use the return stack to help juggle
values on the parameter stack, or better yet use a variable or two.

>I am also beginning to be of the opinion that Forth is flexible enough to
>allow you to 'do it right' and still be efficient.

Well, yes. My NCC generates tighter code than even the best C compiler
around.

Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply

BARTHO@CGEUGE54.BITNET (PAUL BARTHOLDI) (01/23/90)

>Let me say right now, that I am *not* proposing any of this for ANSI Forth.
> ...
>The problem, IMHO, with 'I' is that it is context sensitive, as in

First, just a question not related to the subject:  for us non-Americans,
could you explain the meaning of IMHO and BTW, which we now find in all
messages?

>
>    DO ... I ... DO ... I ... LOOP ... LOOP
>
>In this example, 'I' is not referring to the same thing every place it
>is mentioned.  State-smart words are something else in Forth that
>are potentially ambiguous, but at least they attempt to have the
>same semantics.  (There are arguments for/against state-smartness,
>but I don't want to get sidetracked into that issue here.)

I never found this a real problem; on the contrary.  Wherever you are, you
can use 'I' for the current loop index, irrespective of the outer loops,
and then 'J', 'K' etc. for the next outer ones.

>...
>There would be, IMHO, two major gains by doing this.  First, the
>restrictions about using R>, >R before/during/after DO LOOPS would
>be lifted.  Second, the semantics of I J etc. would be simpler.
>(Isn't forth's philosophy that simpler is better :-).
>(The second one is what motivated this post, but the first one is just
>as important, now that I consider it.)

100% right. The mess of confusing the return stack with loops and index
is simply crazy, as is the use of the return stack for anything but return
addresses.

>I have seen reference to 'control stacks' (probably in Forth Dimensions, but
>I can't recall) for solving this problem.  Has anyone been using them or
>are they just an academic exercise?  and if 'yes', have control stacks
>been found to introduce new problems?   (Set aside 'efficiency', for the
>moment, I mean semantic problems or difficulties).

I never used >R or R> except when using someone else's programs.  One of the
first things I did about 15 years ago, starting with forth, was to have a
separate stack for the do-loops, and a little later to build into the 'basic'
forth a stack structure that works like 'value' (or 'quant' ...).  So whenever
I need a stack I build one with  nn STACK <stack-name (of length nn)>.  Then
I push values onto that stack with  ... INTO <stack-name>.  <stack-name>
pushes its top value onto the normal parameter stack (like a 'value') and
the stack can be cleared with RESET <stack-name>.  Most of my programs
have 2-4 such stacks for various uses (like redirecting i/o between screen,
printer, plotter etc).
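
The user-defined stacks described above can be approximated in standard
Forth. The sketch below is a postfix variant with made-up names (PUSH, TOP,
POP, CLEAR-STACK, and the example stack DEVICES); it does not reproduce the
prefix INTO and RESET syntax, and it does no bounds checking.

```forth
\ Hypothetical sketch: a named stack kept in its own memory.
\ Layout: first cell holds the depth, the items follow it.
: STACK  ( n "name" -- )  CREATE  0 ,  CELLS ALLOT ;

: PUSH  ( x stk -- )   1 OVER +!  DUP @ CELLS + ! ;   \ bump depth, store x
: TOP   ( stk -- x )   DUP @ CELLS + @ ;              \ read the top item
: POP   ( stk -- x )   DUP TOP  SWAP -1 SWAP +! ;     \ read top, drop it
: CLEAR-STACK  ( stk -- )  0 SWAP ! ;                 \ depth back to zero

4 STACK DEVICES            \ room for four cells
1 DEVICES PUSH
2 DEVICES PUSH
DEVICES TOP .              \ prints: 2
DEVICES POP DROP
DEVICES TOP .              \ prints: 1
DEVICES CLEAR-STACK
```

Because such a stack lives outside the return stack, words called from
inside a loop can read it without caring how deeply they are nested in calls.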

Another advantage of the separate do-loops stack is that the indices can be
used in any command called inside the do-loops.  For example, you can have
   : .i I . cr ;  then   : x 10 0 do .i loop ;
which I find quite useful.

One important point to remember is that the do-loops stack MUST be cleared
(resetting the pointers) whenever a command aborts.

>I'm beginning to be of the opinion that DO LOOP and its use of the
>return stack is not an elegant design, but rather one that was hacked
>on top of the existing machinery.

probably not hacked, but designed that way by Chuck Moore under very
strong constraints for space.  This is also in Chuck's style!

>I am also beginning to be of the opinion that Forth is flexible enough to
>allow you to 'do it right' and still be efficient.

I agree with you, but talking about forth, what is 'right' ?  (see also
the discussion about a 'pure' postfix 'do' ...)

                               regards,   Paul Bartholdi, Geneva Observatory.

wmb@SUN.COM (Mitch Bradley) (01/23/90)

BTW stands for "By the Way"; it serves to introduce a new topic that
may be somewhat related to the original topic, but which breaks the
train of thought.

IMHO stands for "In My Humble Opinion", which means nothing except that
perhaps the writer is not really very humble after all.

Mitch

wmb@SUN.COM (Mitch Bradley) (01/23/90)

> 100% right. The mess of confusing the return stack with loops and index
> is simply crazy, as is the use of the return stack for anything but return
> addresses.

More generally, Forth has a very bad tendency to use the same word
to mean two or more different things, taking advantage of the fact
that it just happens to work on a particular implementation.

My pet peeve is the use of COUNT for incrementing a pointer through
a byte array.  This is a common usage but I think it is disgusting.

A more profound problem is the use of the same arithmetic and comparison
operators for integer arithmetic and also for address arithmetic.
In many cases, integers and addresses have quite different properties.

This kind of thing severely hinders portability.  Better to bite the
bullet and call a spade a spade, rather than saying that it's like
a shovel on your machine and leaving it at that.


> >I'm beginning to be of the opinion that DO LOOP and its use of the
> >return stack is not an elegant design, but rather one that was hacked
> >on top of the existing machinery.
>
> probably not hacked, but designed that way by Chuck Moore under very
> strong constraints for space.  This is also in Chuck's style!

The last I heard, Chuck had repented of having invented DO .. LOOP ,
preferring FOR .. NEXT instead.  I believe that he felt strongly enough
about it to have lobbied for the removal of DO .. LOOP from the ANSI
standard.

Actually, I think that Chuck would be happier if he could rewrite the
history books every 6 months.

> probably not hacked, ...

I vote for "hacked".  Forth is full of hacks.  It's an amazing combination
of sublime elegance and hideous hackery.

Mitch

bouma@cs.purdue.EDU (William J. Bouma) (01/24/90)

>The last I heard, Chuck had repented of having invented DO .. LOOP ,
>preferring FOR .. NEXT instead.  I believe that he felt strongly enough
>about it to have lobbied for the removal of DO .. LOOP from the ANSI
>standard.
>
>Mitch

  Besides the names, what is the difference?

  There is nothing wrong with the do-loop in forth other than the ugly
  way the index variable is handled. In my system you must explicitly
  declare the index variable after the 'do'. At compile time the do
  grabs the next word and makes sure it is a variable. It then compiles
  code to store the initial value in the memory location of that variable.
  Loop likewise compiles code to update that variable.

  ok, : junk variable x 10 1 do x x @ . loop ;
  ok, junk
  1 2 3 4 5 6 7 8 9
  ok, quit
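
  On a system without that extension, the same effect can be approximated
  in standard Forth with an ordinary variable and BEGIN .. WHILE .. REPEAT.
  This is only a sketch of the idea, not the implementation described above;
  the variable is declared outside the definition, which is an assumption
  forced by standard Forth.

```forth
VARIABLE X                 \ the loop index lives in an ordinary variable

: JUNK  ( -- )
   1 X !                   \ store the initial value
   BEGIN  X @ 10 <  WHILE  \ test before the body: an empty range runs zero times
      X @ .
      1 X +!               \ update the variable at the bottom of the loop
   REPEAT ;

JUNK   \ prints: 1 2 3 4 5 6 7 8 9
```

  Nothing is left on the return stack while the body runs, so >R, R> and
  EXIT are unaffected by the loop.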

-- 
Bill <bouma@cs.purdue.edu>  ||  ...!purdue!bouma

wmb@SUN.COM (01/24/90)

>  Besides the names, what is the difference? [ between DO .. LOOP and
>  FOR .. NEXT ]

Explanation by example:

    0  FOR  ." x"  NEXT   -->   x
    1  FOR  ." x"  NEXT   -->   xx
    2  FOR  ." x"  NEXT   -->   xxx
   -1  FOR  ." x"  NEXT   -->   xxxxxxxxxxxxxxxxxxxxxxxxxxx (and so on!)

This is efficient to implement in hardware; for instance, it maps pretty
much directly onto the 680x0 DBRA instruction.
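
For systems that lack FOR .. NEXT, the counting rule in the examples above
can be sketched in standard Forth on top of the return stack. This only
illustrates the semantics (count down, and leave after the pass in which
the count was zero); it is not how any particular system or chip implements
it. Some Forths already provide FOR and NEXT, in which case these
definitions would shadow them. XS and PASSES are made-up test words.

```forth
: FOR   ( n -- )   \ at compile time: open the loop; count goes to the return stack
   POSTPONE >R  POSTPONE BEGIN ; IMMEDIATE

: NEXT  ( -- )     \ at compile time: close the loop
   POSTPONE R>  POSTPONE DUP  POSTPONE 1-  POSTPONE >R   \ fetch count, save count-1
   POSTPONE 0=  POSTPONE UNTIL                           \ leave once the count was 0
   POSTPONE R>  POSTPONE DROP ; IMMEDIATE                \ discard the spent count

: XS      ( n -- )         FOR  [CHAR] x EMIT  NEXT ;
: PASSES  ( n -- passes )  0 SWAP  FOR  1+  NEXT ;       \ count the iterations

0 XS        \ prints: x
2 XS        \ prints: xxx
3 PASSES .  \ prints: 4
```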


>  There is nothing wrong with the do-loop in forth other than the ugly
>  way the index variable is handled.

Some things wrong with the standard Forth DO .. LOOP:

   Can't EXIT from inside a DO ... LOOP

   Can't intermix use of the return stack and loop indices

   Termination conditions for +LOOP with positive and negative arguments
     are not symmetrical.

   It always executes at least once.  This is nearly always the wrong
     thing.  ?DO helps, but ?DO doesn't work well with negative arguments
     to +LOOP
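
The last two items can be seen in a small sketch. It assumes a system with
?DO and the rule that +LOOP stops when the index crosses the boundary
between limit-1 and limit (the ANS Forth wording, which is believed to match
Forth-83 on this point). The word names are made up for illustration.

```forth
: UP       ( limit start -- )  ?DO  I .  LOOP ;
: DOWN     ( limit start -- )  ?DO  I .  -1 +LOOP ;
: DOWN-DO  ( limit start -- )  DO   I .  -1 +LOOP ;

3 0 UP       \ prints: 0 1 2      counting up, the limit is excluded
0 3 DOWN     \ prints: 3 2 1 0    counting down, the limit is included
0 0 UP       \ prints nothing: ?DO skips the loop when limit = start
0 0 DOWN     \ prints nothing, although the downward rule would give 0
0 0 DOWN-DO  \ prints: 0          plain DO gives the downward result
```

So ?DO fixes the empty case for upward loops but removes a legitimate
single pass from downward ones.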


I use DO .. LOOP all the time, but there are definitely some problems
with it.


Mitch

bouma@cs.purdue.EDU (William J. Bouma) (01/25/90)

>    0  FOR  ." x"  NEXT   -->   x
>    1  FOR  ." x"  NEXT   -->   xx
>    2  FOR  ." x"  NEXT   -->   xxx
>
>This is efficient to implement in hardware; for instance, it maps pretty
>much directly onto the 680x0 DBRA instruction.

  ok, that is a good reason in some cases. But do loop can do the same
  stuff and do +loop is more general than this. Plus you have access to the
  counting index inside do loop. I can't understand how Chuck could argue
  to throw out do loop in favor of for next, (as you said in your previous
  post), since there doesn't seem to be that much difference. It doesn't
  seem very intuitive to have 0 FOR .. NEXT iterate one time. This is the
  same problem do loop has in always executing at least once.


>>  There is nothing wrong with the do-loop in forth other than the ugly
>>  way the index variable is handled.
>
>Some things wrong with the standard Forth DO .. LOOP:
>
>   Can't EXIT from inside a DO ... LOOP
>
>   Can't intermix use of the return stack and loop indices

  These are both side effects of the crummy way the index is handled. In
  my system these are not problems as my index doesn't sit on the return
  stack taking up space.


>   Termination conditions for +LOOP with positive and negative arguments
>     are not symmetrical.
>
>   It always executes at least once.  This is nearly always the wrong
>     thing.  ?DO helps, but ?DO doesn't work well with negative arguments
>     to +LOOP

  These were just stupid design decisions. I agree the functionality of
  do loop should be changed, but that is no reason to trash the construct
  altogether. m n DO LOOP should count from n to m-1 and should not execute
  at all if m <= n. Likewise m n DO -1 +LOOP should count from n to m+1 ...
-- 
Bill <bouma@cs.purdue.edu>  ||  ...!purdue!bouma