[comp.lang.forth] Postfixer FORTH

DAVID@PENNDRLS.BITNET (08/15/90)

How about this for a more postfix FORTH syntax:

" SomeWord" Is 20 Constant
" *2"       Is : 2 * ;

'Is' factors out Create without changing the compiler significantly.
The Name field still gets laid down first.  Defining words then
lay down a Code field at the current location, which is
conveniently right where it should go.

Another idea I've been fooling with arose from a previous discussion
of Vocabularies as string recognizers.  The argument was to enlarge
the scope of Number by implementing Vocabularies as generalized string
recognizers, with most of them doing it through table lookup in the
dictionary, but Number and similar routines doing it algorithmically.
For instance, you could allow "x" to be recognized as a character
constant just as 5 is recognized as a numerical constant.

So, how about a recognizer that is a mixture?  Suppose we have a word
that recognizes tokens of the form 'Something' as being a reference
to the CFA of the word Something?  The Interpreter would leave the
CFA on the stack; the Compiler would compile it as a literal.  We
could then say

  Defer FooBar
  'Fred' 'FooBar' Is
  : FooFred   'Fred' 'FooBar' Is ;

(to use Is in its more common meaning).  I think this is much
prettier.  And more postfix.

Once started down this path, though, you start to get into a lot of
delimited strings.  For instance, "Something" could be Something as a
string constant, <Something> could be the Body address of Something,
and [Something] could Compile Something even if Something is Immediate.
I'm not sure I want to go that far, but .  .  .

  "Fred"     Is Immediate : Fred's Action ;
  "CompFred" Is : Some Extra [Fred] Actions ;
  "FooBar"   Is Defered
  "FooFred"  Is : 'Fred' <FooBar> ! ;

Now for the Big Question.  Would this still be FORTH?  I think it
would be.  But it would *not* be Chuck Moore Forth, or ANS FORTH.

-- R. David Murray    (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)

dwp@willett.pgh.pa.us (Doug Philips) (08/16/90)

In <9008152018.AA23733@ucbvax.Berkeley.EDU>, DAVID@PENNDRLS.BITNET writes:

> How about this for a more postfix FORTH syntax:
> 
> " SomeWord" Is 20 Constant
> " *2"       Is : 2 * ;
> 
> 'Is' factors out Create without changing the compiler significantly.
> The Name field still gets laid down first.  Defining words then
> lay down a Code field at the current location, which is
> conveniently right where it should go.

I was going to reply that I thought your 'Is' was actually infix and
not postfix.  However, I think I'm too confused to say that with confidence.
The point of postfix is that parameters are passed to functions and that
the functions don't/shouldn't care how the parameters are created.
I think your 'Is' example shows that physical adjacency in the
dictionary is not necessary for pure factoring.  I agree with:
	20 Constant		( and )
	: 2 * ;
being words that lay down code somewhere.  I think that for 'Is' to be
strictly postfix, the address of the code to be associated with the
word should be on the stack along with the name for the word.  What I
think you have in your 'Is' is a form of deferred word that is
immediately resolved.

> Another idea I've been fooling with arose from a previous discussion
> of Vocabularies as string recognizers.  The argument was to enlarge
> the scope of Number by implementing Vocabularies as generalized string
> recognizers, with most of them doing it through table lookup in the
> dictionary, but Number and similar routines doing it algorithmically.
> For instance, you could allow "x" to be recognized as a character
> constant just as 5 is recognized as a numerical constant.

In March 90, on comp.lang.misc Darren (Last name forgotten) "new@udel.edu"
proposed a language he was working on called '2ol' which was just what
you are talking about.  The parser is a sequence of functions that
each get a crack at the current token until one of them claims to have
processed it.

Of course, you could always pervert NUMBER to do something like that.
The interpreter doesn't "know" that NUMBER only parses numbers, it
just passes tokens to it.  NUMBER could be a jump off point for
what you are talking about (but then perhaps a better name would be less
confusing.)
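
The "each function gets a crack at the token" scheme can be sketched in a few lines of standard Forth.  This is only an illustration, not 2ol and not anything posted in this thread: the names CHAR-TOKEN?, TICK-TOKEN?, TOKEN-CLASSIFIERS and CLASSIFY are invented here, and the sketch assumes a Forth-2012 style system (Gforth, say) with the String word set.  Each classifier either claims the token and returns a value plus true, or hands the token back with false.

```forth
\ Sketch only: all names below are hypothetical.
\ Each classifier: ( c-addr u -- x true | c-addr u false )

: char-token? ( c-addr u -- char true | c-addr u false )  \ "x" -> char
  dup 3 = if
    over c@ [char] " = if
      over 2 + c@ [char] " = if
        drop 1+ c@ true exit
      then
    then
  then false ;

: tick-token? ( c-addr u -- xt true | c-addr u false )    \ 'Name' -> xt
  dup 3 < if false exit then
  over c@ [char] ' <> if false exit then
  2dup + 1- c@ [char] ' <> if false exit then
  2dup 1 /string 1-                 \ strip the two quote marks
  dup pad c!  pad 1+ swap move      \ counted copy of Name at PAD
  pad find if nip nip true else drop false then ;

\ The chain: a table of xts tried in order until one claims the token.
create token-classifiers  ' char-token? ,  ' tick-token? ,
2 constant #classifiers

: classify ( c-addr u -- x true | c-addr u false )
  #classifiers 0 do
    token-classifiers i cells + @ execute
    if true unloop exit then
  loop false ;
```

With this loaded, `s" 'dup'" classify` should hand back the execution token of DUP and a true flag, while an unclaimed token comes back untouched with false, which is where a real interpreter would fall through to the dictionary search or to NUMBER.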

> Once started down this path, though, you start to get into a lot of
> delimited strings.  For instance, "Something" could be Something as a
> string constant, <Something> could be the Body address of Something,
> and [Something] could Compile Something even if Something is Immediate.
> I'm not sure I want to go that far, but .  .  .
> 
> Now for the Big Question.  Would this still be FORTH?  I think it
> would be.  But it would *not* be Chuck Moore Forth, or ANS FORTH.

I think it would be '2ol'.  Whether or not that is Forth, I'm not sure.
I think you could do something like that *in* Forth...

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]

DAVID@PENNDRLS.BITNET (08/18/90)

>> How about this for a more postfix FORTH syntax:
>>
>> " SomeWord" Is 20 Constant
>> " *2"       Is : 2 * ;
>>
>> 'Is' factors out Create without changing the compiler significantly.
>> The Name field still gets laid down first.  Defining words then
>> lay down a Code field at the current location, which is
>> conveniently right where it should go.

>I was going to reply that I thought your 'Is' was actually infix and
>not postfix.  However, I think I'm too confused to say that with confidence.
>The point of postfix is that parameters are passed to functions and that
>the functions don't/shouldn't care how the parameters are created.
>I think your 'Is' example shows that physical adjacency in the
>dictionary is not necessary for pure factoring.  I agree with:
>	20 Constant		( and )
>	: 2 * ;
>being words that lay down code somewhere.  I think that for 'Is' to be
>strictly postfix, the address of the code to be associated with the
>word should be on the stack along with the name for the word.  What I
>think you have in your 'Is' is a form of deferred word that is
>immediately resolved.

Actually my logic was as follows: Is is a word that takes a string and
builds an index entry that points to Here.  Constant and : are 'defining
words' that lay down a Code Field and Body.  Remember that the index is
not required for FORTH execution.  The compiler looks up words in the
Index and compiles the address to which the index entry points.  Like
data types in FORTH, it is the programmer's responsibility to see to it
that Is was called only to label an address where he or she subsequently
put a Code Field.  Is doesn't care, and neither does the compiler.  In
fact, if one were to define the system such that the index was located
in a different area than the dictionary, one could use Is to label an
arbitrary memory location, such as defining a label in an assembler
program:

  " 2Push" Is Code:  BX Push,  " 1Push" Is AX Push,
    " Next" Is ..... ;Code

(to crib an example from my memory of F83).

But I agree with you: 'postfix' vs 'prefix' vs 'infix' seems to be
a much subtler problem than I ever imagined!

-- R. David Murray    (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)

dwp@willett.pgh.pa.us (Doug Philips) (08/20/90)

In <9008180333.AA11956@ucbvax.Berkeley.EDU>, DAVID@PENNDRLS.BITNET writes:
> Actually my logic was as follows: Is is a word that takes a string and
> builds an index entry that points to Here.  Constant and : are 'defining
> words' that lay down a Code Field and Body.  Remember that the index is
> not required for FORTH execution.  The compiler looks up words in the
> Index and compiles the address to which the index entry points.  Like
> data types in FORTH, it is the programmer's responsibility to see to it
> that Is was called only to label an address where he or she subsequently
> put a Code Field.  Is doesn't care, and neither does the compiler.

Ok, I think I understand a bit better what you're trying to do.  I think
that you have removed the prefix nature of CREATE, because the name comes
first.  Unfortunately the binding for the name still comes after.  There is
an implicit parameter, HERE, which is what binds the name to the code that
defines it.

First, the implicit "HERE" parameter means that if the code
which is defining the word contains immediate words that invoke Is, you're
out of luck.  I'm not quite sure if the use of global variables is or isn't
a deciding factor for post-fix-ness.  I would be inclined to say no, because
*all* the parameters should be on the stack.  This has the nice side-effect
that the words are reentrant.

> In
> fact, if one were to define the system such that the index was located
> in a different area than the dictionary, one could use Is to label an
> arbitrary memory location, such as defining a label in an assembler
> program:
> 
>   " 2Push" Is Code:  BX Push,  " 1Push" Is AX Push,
>     " Next" Is ..... ;Code

Second, I think that the requirement for separate dictionary and code
spaces is overly restrictive and unnecessary.  As for the specifics of your
labelled code example, a question:

    Does Is rip out the previous string from the definition, or is " an
    IMMEDIATE word in the assembler vocabulary (assuming you use
    vocabularies)?

Aside from allowing arbitrary labelling of words in code, I don't get
the advantage to using Is to label an arbitrary memory location.
If Is binds its name to HERE, I don't see quite how the location
could be truly arbitrary.  If Is could define a truly arbitrary memory
location, wouldn't that mean the location's address would have to be
on the stack?  Seems like that is a more cleanly post-fix solution.
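
Both readings of Is can be sketched side by side, assuming the index lives apart from the dictionary.  None of this is David's implementation: INDEX-HEAD, LABEL-AT, LABEL-IS and FIND-LABEL are names invented for the sketch, the index is kept on the heap (Memory-Allocation word set) so that building an entry never moves HERE, and COMPARE comes from the String word set.  LABEL-AT takes the address on the stack; LABEL-IS is the HERE-bound form layered on top of it.

```forth
\ Sketch only: all names below are hypothetical.
variable index-head   0 index-head !     \ newest index entry, 0 if none

\ Heap entry layout: link | target address | name length | name chars
: label-at ( addr c-addr u -- )          \ fully postfix: address on stack
  dup 3 cells + allocate throw >r
  index-head @ r@ !                      \ link to the previous entry
  dup r@ 2 cells + !                     \ store the name length
  r@ 3 cells + swap move                 \ copy the name characters
  r@ cell+ !                             \ store the target address
  r> index-head ! ;

: label-is ( c-addr u -- )               \ HERE is the implicit parameter
  here -rot label-at ;

: find-label ( c-addr u -- addr true | false )
  index-head @
  begin dup while
    >r 2dup  r@ 3 cells +  r@ 2 cells + @  compare 0= if
      2drop r> cell+ @ true exit
    then
    r> @
  repeat
  drop 2drop false ;

align  s" Spot" label-is  42 ,           \ Spot names the cell laid down next
```

After the last line, `s" Spot" find-label` returns the address of the cell holding 42.  Since LABEL-IS is a one-line wrapper, the HERE dependence is confined to that wrapper and LABEL-AT itself has no hidden state beyond the list head.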

How about this example of a purely postfix version of CODE:

	( vocabulary manipulations as necessary go here...)
	: CODE:   ( -- cfa-able-address ) ... ; IMMEDIATE

	CODE: BX PUSH
	CODE: AX PUSH
	CODE: ( Machinations for NEXT... ) ;Code
	" Next" Is
	" 1Push" Is
	" 2Push" Is

My claim is that if you have more stack-ed addresses than you can keep
track of, you are doing something wrong.  It may be that with CODE: you are
justified in performing arbitrary complex operations and that if normal
Forth factoring were an issue you wouldn't be using CODE: in the first
place.  Of course, in my example, I assume that CODE: doesn't do anything
obnoxious like create a code prolog or do any alignment, etc.  Of course
those restrictions would apply equally to your example.

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]

DAVID@PENNDRLS.BITNET (08/22/90)

>> in a different area than the dictionary, one could use Is to label an
>> arbitrary memory location, such as defining a label in an assembler
>> program:
>>
>>   " 2Push" Is Code:  BX Push,  " 1Push" Is AX Push,
>>     " Next" Is ..... ;Code
>
>Second, I think that the requirement for separate dictionary and code
>spaces is overly restrictive and unnecessary.  As for the specifics of your
>labelled code example, a question:
>
>    Does Is rip out the previous string from the definition, or is " an
>    IMMEDIATE word in the assembler vocabulary (assuming you use
>    vocabularies)?

I thought the assembler was Interpreted code in all systems?  Is this
not true?  So '"' is interactive and leaves on the stack the address of
a copy of the string up to the next '"', which 'Is' eats and lays down
the index entry, and then 'BX Push,' assembles the next instruction into
the dictionary, and so on.

>Aside from allowing arbitrary labelling of words in code, I don't get
>the advantage to using Is to label an arbitrary memory location.
>If Is binds its name to HERE, I don't see quite how the location
>could be truly arbitrary.  If Is could define a truly arbitrary memory
>location, wouldn't that mean the location's address would have to be
>on the stack?  Seems like that is a more cleanly post-fix solution.

You are absolutely right about this.  I take back my implication that
my Is construct is in any way superior to { code . . . }  " name" Def.
It is in fact inferior (but easier, I think, to implement).

>How about this example of a purely postfix version of CODE:
>
>	( vocabulary manipulations as necessary go here...)
>	: CODE:   ( -- cfa-able-address ) ... ; IMMEDIATE
>
>	CODE: BX PUSH
>	CODE: AX PUSH
>	CODE: ( Machinations for NEXT... ) ;Code
>	" Next" Is
>	" 1Push" Is
>	" 2Push" Is
>
>My claim is that if you have more stack-ed addresses than you can keep
>track of, you are doing something wrong.  It may be that with CODE: you are

No problem, if we are talking about separate code and index spaces:

CODE: " 2Push" Is BX PUSH
CODE: " 1Push" Is AX PUSH
CODE: " Next"  Is ( Machinations for NEXT... ) ;Code

This rewrite should make it obvious that we have factored out the
HERE dependence only from Is/Def, and that CODE: (etc) still have it.
This seems inherent in a state based system like FORTH (as opposed
to a purely functional system).

>place.  Of course, in my example, I assume that CODE: doesn't do anything
>obnoxious like create a code prolog or do any alignment, etc.  Of course
>those restrictions would apply equally to your example.

I'm not sure why you say this.  I am assuming Is does not compile
anything into the dictionary, but my Code: is free to lay down a CFA
and whatever code prefix it may find necessary, since it is called only
once, at the beginning of the Word.  Only "2Push" could be referenced
as a FORTH word, the other two could only be used as assembly labels.
For that reason I don't really consider my example valid code.

Anyway, I concede that

  { some FORTH words }  " Name" Def

is more postfix than my Is.

-- R. David Murray    (DAVID@PENNDRLS.BITNET, DAVID@PENNDRLS.UPENN.EDU)

peter@ficc.ferranti.com (Peter da Silva) (08/22/90)

In article <1560.UUL1.3#5129@willett.pgh.pa.us> dwp@willett.pgh.pa.us (Doug Philips) writes:
> 	( vocabulary manipulations as necessary go here...)
> 	: CODE:   ( -- cfa-able-address ) ... ; IMMEDIATE

> 	CODE: BX PUSH
> 	CODE: AX PUSH
> 	CODE: ( Machinations for NEXT... ) ;Code
> 	" Next" Is
> 	" 1Push" Is
> 	" 2Push" Is

That's pretty good. I was about to point out you had invented PostScript,
but that language doesn't allow multiple entry points. Still, it's pretty
close to:

	{ ... } /Next def

Still, it does demonstrate that at least at this level the PostScript
full postfix form has its advantages...
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

peter@ficc.ferranti.com (Peter da Silva) (08/23/90)

The following is in FIG-forth. I still fail to see the point of dumping
state. It's also untested and written by a rusty Forth coder. " xxx" is
assumed to leave the address of a counted string on top of the stack.

: { state @ if compile (branch) then compile (docol) here state @ 34 ;
: } ?comp 34 ?match state ! compile (;s)
    state @ if dup here over - swap ! literal then ;
: def tib @ in @ count drop tib ! 0 in ! [compile] : in ! tib !
  [ , ] [compile] ; ;
: {ifelse} if swap then drop execute ; ( use as { ... } { ... } flag {ifelse} )
: {while} >r begin i execute until r> drop ; ( use as { ... flag } {while} )
: {do} rot >r do i j execute loop r> drop ; ( use as { ... } hi lo {do} )
...
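
For comparison, roughly the same idea on a current standard system, where :NONAME already yields an anonymous definition.  This is a sketch under assumptions, not a port of the FIG-forth code above: DEF and IFELSE are names chosen here, the Forth-2012 words COMPILE, and EVALUATE are assumed, and the name is assembled in PAD.  DEF starts a colon definition by evaluating ": name", compiles a call to the quotation, and closes the definition.

```forth
\ Sketch only: DEF and IFELSE are hypothetical names.
: def ( xt c-addr u -- )            \ bind a name to an xt, name given last
  rot >r                            \ keep the xt clear of colon-sys
  s" : " pad swap move              \ build ": name" in PAD
  dup >r  pad 2 + swap move
  pad r> 2 + evaluate               \ ":" parses the name, starts compiling
  r> compile,                       \ body is a call to the quotation
  postpone ; ;

: ifelse ( xt-true xt-false flag -- )   \ PostScript-style argument order
  if drop else nip then execute ;

:noname ( n -- 2n ) 2 * ;  s" double" def
```

After this, `double` behaves like any other colon definition, and `:noname ... ; :noname ... ; flag ifelse` runs the first quotation when the flag is true.  The extra call through the quotation costs one level of nesting, which the FIG-forth version avoids by compiling in place.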
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

dwp@willett.pgh.pa.us (Doug Philips) (08/24/90)

In <9008230629.AA29489@ucbvax.Berkeley.EDU>, DAVID@PENNDRLS.BITNET writes:

> I thought the assembler was Interpreted code in all systems?  Is this
> not true?  So '"' is interactive and leaves on the stack the address of
> a copy of the string up to the next '"', which 'Is' eats and lays down
> the index entry, and then 'BX Push,' assembles the next instruction into
> the dictionary, and so on.

Oops.  I forgot that.  I guess the only difficulty might be if you were
using a subroutine threaded system with an optimizer in it, such as
Andrew Scott describes in the article "Extensible Optimizing Compiler"
(Forth Dimensions, Volume XII, Number 2).  I'm not sure what would happen
to a label on an instruction that "disappeared".  Maybe the label would
just inhibit optimization "across it"?  Probably just an academic question.

> You are absolutely right about this.  I take back my implication that
> my Is construct is in any way superior to { code . . . }  " name" Def.
> It is in fact inferior (but easier, I think, to implement).

You could use it in something perverted like:

	: Long-Word
	    ( stuff )
	    [ " Short-Word" Is ]
	    ( more stuff )
	    ;
Of course that would require the dictionary to be out of the way of the
code, a system dependent thing.  On the other hand if you really need to
tail merge two definitions you are doing non-portable stuff anyway.
(I can't easily justify why you'd do something like the above instead of
going to assembly code though.)

> CODE: " 2Push" Is BX PUSH
> CODE: " 1Push" Is AX PUSH
> CODE: " Next"  Is ( Machinations for NEXT... ) ;Code
> 
> >place.  Of course, in my example, I assume that CODE: doesn't do anything
> >obnoxious like create a code prolog or do any alignment, etc.  Of course
> >those restrictions would apply equally to your example.
> 
> I'm not sure why you say this.  I am assuming Is does not compile
> anything into the dictionary, but my Code: is free to lay down a CFA
> and whatever code prefix it may find necessary, since it is called only
> once, at the beginning of the Word.  Only "2Push" could be referenced
> as a FORTH word, the other two could only be used as assembly labels.
> For that reason I don't really consider my example valid code.

Ok. I guess I was still stuck on:

> From: DAVID@PENNDRLS.BITNET
> Message-ID: <9008152018.AA23733@ucbvax.Berkeley.EDU>
> Date: 15 Aug 90 15:13:41 GMT
> Reply-To: DAVID%PENNDRLS.BITNET@SCFVM.GSFC.NASA.GOV
> 
> How about this for a more postfix FORTH syntax:
> 
> " SomeWord" Is 20 Constant
> " *2"       Is : 2 * ;
> 
> 'Is' factors out Create without changing the compiler significantly.
> The Name field still gets laid down first.  Defining words then
> lay down a Code field at the current location, which is
> conveniently right where it should go.

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]


dwp@willett.pgh.pa.us (Doug Philips) (08/24/90)

In <IRD5U21@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
>                     I was about to point out you had invented PostScript,
> but that language doesn't allow multiple entry points. Still, it's pretty
> close to:
> 
> 	{ ... } /Next def

Aside from some twisted cases (Duff's device, tail-merged code)
I don't see much need for multiple entry points.  Usually that is done
by factoring.  Perhaps something like:

	: CREATE
		WORD
	ENTRY:
		( rest of CREATE )
	;
	" $CREATE" Is

But that still doesn't look as clean to me as:

	: CREATE
	    WORD
	    $CREATE
	;

> Still, it does demonstrate that at least at this level the PostScript
> full postfix form has its advantages...

Now if only ANSI will let me have the cake and eat it too, where "the cake"
is a clean postfix way to do the ugly prefix stuff we have now, and "eat it
too" means using those clean words to build the existing words, for backward
compatibility and for user-at-the-keyboard convenience.
I think I got involved with Forth too late for this standards go-around.

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]


andrew@idacom.uucp (Andrew Scott) (08/25/90)

Doug Philips writes:
> Oops.  I forgot that.  I guess the only difficulty might be if you were
> using a subroutine threaded system with an optimizer in it, such as
> Andrew Scott describes in the article "Extensible Optimizing Compiler"
> (Forth Dimensions, Volume XII, Number 2).  I'm not sure what would happen
> to a label on an instruction that "disappeared".  Maybe the label would
> just inhibit optimization "across it"?  Probably just an academic question.

I hope I understand the issue, as I'm jumping into this discussion thread a
bit late, but you *did* refer to my article.  :-)

The question of preventing optimizing across labels is not so academic.  If
you think of a branch destination like a label, the optimizer queue must be
"flushed" before the branch is resolved.

For example, suppose the phrase DUP >R was optimized to a single instruction.
The following code illustrates the problem:

: FOO   ( x \ y -- )
    HORIZ? IF  OVER  ELSE  DUP  THEN     ( x coord for horiz., y for vertical )
    >R ...
;

THEN doesn't really compile any code - it only resolves a branch.  There is no
code between DUP and >R, but they must not be combined.

My solution was to compile a "do nothing" word that has an optimization rule
that "does nothing" (i.e. doesn't actually lay down code):

: NOTHING ;

SEQ:  NOTHING  IS:  ( nothing ) ;

The optimization rule does in fact complete any pending sequences, as NOTHING
never appears in any other rule.

The definition of THEN becomes:

: THEN     ' NOTHING COMPILE-TOKEN  >RESOLVE ; IMMEDIATE

(Our tick is state smart.  Here's a vote for the proposal for COMPILE-TOKEN !)

I force a "compilation" of NOTHING whenever I have to stop optimization.  In
your discussion, the same method could be used whenever internal labels need
to be generated.
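
A toy model of the flush may make the point concrete.  It is not the optimizer from the Forth Dimensions article: "code" here is just a buffer of small integer tokens, the queue is one token deep, and the names EMIT-TOK, FLUSH-QUEUE and TOK, are invented for the sketch.  The only rule is that DUP followed immediately by >R fuses into one token; FLUSH-QUEUE plays the part that compiling NOTHING plays above.

```forth
\ Sketch only: token values and all names below are hypothetical.
1 constant t-dup   2 constant t->r   3 constant t-dup>r
create code-buf 32 cells allot
variable #code     0 #code !
variable pending   0 pending !       \ 0 means the queue is empty

: emit-tok ( tok -- ) code-buf #code @ cells + !  1 #code +! ;
: flush-queue ( -- ) pending @ ?dup if emit-tok 0 pending ! then ;

: tok, ( tok -- )                    \ "compile" one token via the queue
  dup t->r =  pending @ t-dup =  and if
    drop  0 pending !  t-dup>r emit-tok  exit   \ fuse DUP >R
  then
  flush-queue
  dup t-dup = if pending ! else emit-tok then ;

\ Straight-line code: the pair is fused into a single token.
t-dup tok,  t->r tok,  flush-queue
\ A branch target between them (as THEN would be): nothing is fused.
t-dup tok,  flush-queue  t->r tok,  flush-queue
```

The first sequence lays down the fused token alone; the second lays down DUP and >R separately, because the queue was emptied at the point where the branch resolves.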

I hope this all made some sense...
-- 
Andrew Scott	| mail:		andrew@idacom.uucp
		| - or -	{att, watmath, ubc-cs}!alberta!idacom!andrew
		| - or -	uunet!myrias!aunro!idacom!andrew

dwp@willett.pgh.pa.us (Doug Philips) (08/29/90)

In <1990Aug24.221345.14475@idacom.uucp>, andrew@idacom.uucp (Andrew Scott) writes:
> I hope I understand the issue, as I'm jumping into this discussion thread a
> bit late, but you *did* refer to my article.  :-)

(I was quite surprised when you actually showed up on the net.  Another
lurker out in the open!)

> The question of preventing optimizing across labels is not so academic.  If
> you think of a branch destination like a label, the optimizer queue must be
> "flushed" before the branch is resolved.

Just as I thought.  Thanks for the example.  The code in question was
using multiple entry points, but the issues are the same (I think).
(It matters not where you COME-FROM, but where you are GOING-TO).

I do have a question about your system.  For example:

IF ( code-series-1 ) ELSE ( code-series-2 ) THEN ( code-series-3 )

would it be reasonable, possible, sane, to be able to push copies of
code-series-3 into each branch and see if the optimizer can do better that
way?  If the ELSE was missing you could pretend it was there but empty.  You
can decide when to stop the code motion/duplication based on the length of
the longest of the two optimizer sequences which start with either
code-series-1 or code-series-2.

> I hope this all made some sense...

It did to me, if that is any consolation.

-Doug

---
Preferred: ( dwp@willett.pgh.pa.us  OR  ...!{sei,pitt}!willett!dwp )
Daily: ...!{uunet,nfsun}!willett!dwp  [last resort: dwp@vega.fac.cs.cmu.edu]