[comp.lang.forth] ZEN 15a

UNBCIC@BRFAPESP.BITNET (05/23/91)

Ok. So I decided to convert my full-screen block editor to ZEN. I didn't finish
the conversion but I found some interesting problems.

    The first problem (not a big deal) is -->. The second problem is AT-XY.
It's defined as DROP SPACES. I think it's better to leave AT-XY out of ZEN than
to define it this way. The third problem, the one that stopped me, is with 8-bit
characters.

    My editor is provided with an accent filter, so I can enter all Portuguese
characters easily. Well, to make my program more legible, I did things like:

CHAR ^ CONSTANT _portuguese_name_of_the_accent_

But this doesn't work with ZEN. CHAR leaves nothing on the stack, the accent
is ignored (not this one; the ones above 127), and CONSTANT is ignored. The word
after CONSTANT is then interpreted, returning an error. I have two questions
about this:
1) What is happening?
2) Should this happen on an ANS Forth? May this happen on an ANS Forth? (BASIS
15, of course)

                              (8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET

Mitch.Bradley@ENG.SUN.COM (05/24/91)

> Ok. So I decided to convert my full-screen block editor to ZEN. I didn't
> finish the conversion but I found some interesting problems.
>
>    The first problem (not a big deal) is -->.

        : -->  REFILL DROP  ;

REFILL is a generalization of --> and QUERY that works on any input source,
returning a flag indicating whether or not the input buffer could be refilled.

> The second problem is AT-XY.
> It's defined as DROP SPACES. I think it's better to leave AT-XY out of ZEN
> than to define it this way.

Tough call.  I can see both sides of this argument.

> The third problem, the one that stopped me is with 8-bit characters.
>
>    My editor is provided with an accent filter, so I can enter all Portuguese
> characters easily. Well, to make my program more legible, I did things like:
>
> CHAR ^ CONSTANT _portuguese_name_of_the_accent_
>
> But this doesn't work with ZEN. CHAR leaves nothing on the stack, the
> accent is ignored (not this one; the ones above 127), and CONSTANT is
> ignored. The word after CONSTANT is then interpreted, returning an error.
> I have two questions about this:
> 1) What is happening?

Something in the input stream mechanism is filtering out non-standard
characters.  My copy of zen15 is on another machine, so I can't say
for sure, but I would guess that the problem is either in EXPECT or
in the "skip delimiters" portion of WORD.  You might try searching for
"127" in the source code.

> 2) Should this happen on an ANS Forth? May this happen on an ANS Forth?
> (BASIS 15, of course)

It certainly *may* happen.  The character set for the source code of
a standard program is the set of printable characters in the 7-bit
ASCII set.  Use of any other character causes the program to have an
environmental dependency (in this case, on the system supporting the
Portuguese character set).  This seems reasonable to me; your program
source code would not look very legible on my system, since I don't
have Portuguese characters.

The question about whether or not it *should* happen is a matter of
market economics.  The implementor can choose whether to try to support
extended character sets, or whether to accept only the standard set.
Since Zen is a fairly minimal system without many environment-dependent
extensions, and is intended to illustrate the basics of ANS Forth,
I'm not surprised that Martin chose to restrict the character set.

I expect that successful commercial systems will adopt a different
approach, supporting international character sets where available.

A standard ANS Forth system is not required to reject non-printable
characters in blocks, nor is it required to accept them.  The characters
whose meanings are precisely defined in the context of block source
code are the space character and the ASCII characters with codes from
33 to 126.

Mitch.Bradley@Eng.Sun.COM

UNBCIC@BRFAPESP.BITNET (05/24/91)

=> From: Mitch.Bradley@ENG.SUN.COM
=> Subject: RE: ZEN 15A
=> > Ok. So I decided to convert my full-screen block editor to ZEN. I didn't
=> > finish the conversion but I found some interesting problems.
=> >
=> >    The first problem (not a big deal) is -->.
=>
=>      : -->  REFILL DROP  ;
=>
=> REFILL is a generalization of --> and QUERY that works on any input source,
=> returning a flag indicating whether or not the input buffer could be refilled.

Thanks. But that was not really important (ZEN has THRU).

=> > The second problem is AT-XY.
=> > It's defined as DROP SPACES. I think it's better to leave AT-XY out of ZEN
=> > than to define it this way.
=>
=> Tough call.  I can see both sides of this argument.

I could understand if it just ignored Y, but it doesn't test which column you
are in! AT-XY, in ZEN, is SPACES with a different stack behavior.

=> > The third problem, the one that stopped me is with 8-bit characters.
=> >
=> >    My editor is provided with an accent filter, so I can enter all
=> > Portuguese characters easily. Well, to make my program more legible, I
=> > did things like:
=> >
=> > CHAR ^ CONSTANT _portuguese_name_of_the_accent_
=> >
=> > But this doesn't work with ZEN. CHAR leaves nothing on the stack, the
=> > accent is ignored (not this one; the ones above 127), and CONSTANT is
=> > ignored. The word after CONSTANT is then interpreted, returning an error.
=> > I have two questions about this:
=> > 1) What is happening?
=>
=> Something in the input stream mechanism is filtering out non-standard
=> characters.  My copy of zen15 is on another machine, so I can't say
=> for sure, but I would guess that the problem is either in EXPECT or
=> in the "skip delimiters" portion of WORD.  You might try searching for
=> "127" in the source code.

The source file exists. There is no problem with EXPECT (it doesn't accept
characters above 126, but that's normal). WORD? Maybe. But I would expect
CONSTANT either to return an error (nothing on the stack) or to define the
following word. That's not happening. The interpreter tries to execute (or
interpret as a number) the word that follows CONSTANT. Strange...

Search for "127"? Why?

=> > 2) Should this happen on an ANS Forth? May this happen on an ANS Forth?
=> > (BASIS 15, of course)
=>
=> It certainly *may* happen.  The character set for the source code of
=> a standard program is the set of printable characters in the 7-bit
=> ASCII set.  Use of any other character causes the program to have an
=> environmental dependency (in this case, on the system supporting the
=> Portuguese character set).  This seems reasonable to me; your program
=> source code would not look very legible on my system, since I don't
=> have Portuguese characters.

Ok with me.

=> The question about whether or not it *should* happen is a matter of
=> market economics.  The implementor can choose whether to try to support
=> extended character sets, or whether to accept only the standard set.
=> Since Zen is a fairly minimal system without many environment-dependent
=> extensions, and is intended to illustrate the basics of ANS Forth,
=> I'm not surprised that Martin chose to restrict the character set.
=>
=> I expect that successful commercial systems will adopt a different
=> approach, supporting international character sets where available.
=>
=> A standard ANS Forth system is not required to reject non-printable
=> characters in blocks, nor is it required to accept them.  The characters
=> whose meanings are precisely defined in the context of block source
=> code are the space character and the ASCII characters with codes from
=> 33 to 126.

Thanks for the information.

=> Mitch.Bradley@Eng.Sun.COM

                              (8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET

Mitch.Bradley@ENG.SUN.COM (05/24/91)

> I could understand if [ AT-XY ] just ignored Y, but he doesn't test in
> which column you are!

Good point.  Based on that, I agree that  DROP SPACES  is a bad definition
of AT-XY .
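
To make the objection concrete, here is a minimal sketch of a fallback that
ignores Y but does account for the current column. It is not Zen's code:
CUR-COL, TTY-EMIT, TTY-CR, TTY-SPACES and TTY-AT-XY are hypothetical names,
and the sketch assumes all output goes through these words so that the
column count stays accurate.

```forth
decimal
variable cur-col   0 cur-col !           \ column we believe the cursor is in

: tty-emit   ( char -- )  emit  1 cur-col +! ;
: tty-cr     ( -- )       cr  0 cur-col ! ;
: tty-spaces ( n -- )     0 max  dup spaces  cur-col +! ;

\ Move to column x on a terminal without cursor addressing; y is ignored.
: tty-at-xy  ( x y -- )
   drop
   dup cur-col @ < if  tty-cr  then      \ target lies to the left: new line
   cur-col @ -  tty-spaces ;             \ pad from the current column to x
```

With plain DROP SPACES, asking for column 10 while already in column 6 would
leave the cursor in column 16; the sketch above emits only the 4 spaces that
are missing.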

> > CHAR ^ CONSTANT _portuguese_name_of_the_accent_
> >

> WORD? Maybe. But I would expect that CONSTANT
> returns an error (nothing on the stack) or define the following word.
> That's not happening. The interpreter tries to execute (or interpret as
> a number) the word that follows CONSTANT. Strange...

You're right.  It's probably WORD .  When interpreting from a text file,
the parsing function of the interpreter (WORD) is *supposed* to ignore
non-printable characters, because many popular text file formats use
various control characters as "white space" (e.g. tab, formfeed, linefeed,
return, word processor formatting characters).  CHAR is trying to parse
a "visible" word, and it is skipping the Portuguese character as "white
space", picking up the word "CONSTANT" as the argument of CHAR .  Then
the interpreter tries to interpret the word "_portuguese_name...".

One might argue that it would be better to skip just control characters,
rather than all non-printing characters.  That is certainly how I would
implement it.
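
The two skipping policies described above can be sketched as follows. The
names PRINTABLE? , BLANK? , SKIP-STRICT , SKIP-LENIENT and DEMO are invented
for illustration and are not taken from Zen; code 226 merely stands in for
some accented 8-bit character. The sketch assumes a system that provides
WITHIN and /STRING .

```forth
decimal
: printable? ( char -- flag )  33 127 within ;   \ codes 33..126
: blank?     ( char -- flag )  33 < ;            \ space and control codes

\ Policy 1: everything outside 33..126 counts as white space.
: skip-strict ( c-addr u -- c-addr' u' )
   begin  dup if  over c@ printable? 0=  else  false  then
   while  1 /string  repeat ;

\ Policy 2: only space and control characters count as white space.
: skip-lenient ( c-addr u -- c-addr' u' )
   begin  dup if  over c@ blank?  else  false  then
   while  1 /string  repeat ;

\ Input resembling " <accent> X": space, an 8-bit code, space, X
create demo  bl c,  226 c,  bl c,  char X c,

demo 4 skip-strict   drop c@ .   \ code of the first character kept (strict)
demo 4 skip-lenient  drop c@ .   \ code of the first character kept (lenient)
```

Under the strict policy the scan stops at X (in Daniel's case, at CONSTANT),
so that is what CHAR would take as its argument; under the lenient policy the
scan stops at the 8-bit character itself.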

However, I would also argue that the question is moot in the context of
a standard program without environmental dependencies.

> Search for "127"? Why?

"127 AND" is often used after KEY to remove junk like parity bits
and shift bits.  This technique, although quite common, is bogus,
because throwing away high bits doesn't necessarily result in a meaningful
7-bit ASCII character.  Instead it may for example transform a code that
means the "F7" function key into the letter "T", a behavior for which I
can think of no justification.
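
A one-line illustration of the arithmetic: masking with 127 simply clears
the high bit, so an unrelated 8-bit code can come out as an ordinary letter.
The value 212 is chosen only because it maps to "T"; it is not claimed to be
the code any particular keyboard sends for F7, and STRIP-HIGH-BIT is a
hypothetical name for the idiom.

```forth
decimal
: strip-high-bit ( char -- char' )  127 and ;   \ the "127 AND" idiom

212 strip-high-bit emit    \ 212 - 128 = 84, the ASCII code of T
```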

Anyway, I don't think this is what is happening in your case; I bet the
"skip" portion of WORD is looking for characters outside the range of
codes 33 to 126 .

However, I wouldn't be surprised to find "KEY 127 AND" in the definition
of EXPECT ; that phrase was in an earlier version of Zen.

The correct phrase should be something like this:

126 constant max-graphic        \ Value depends on system character set

        ...
        key  dup bl max-graphic  between  if   ( char )
           <insert character in buffer>
        else
           <process as editing character>
        then
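
For readers who want to try the idea, here is a self-contained sketch of the
same test. BETWEEN is not a standard word, so it is defined here in terms of
WITHIN ; LINE-BUF , LINE-LEN , INSERT-CHAR and HANDLE-KEY are hypothetical
names, and the "editing character" branch simply discards the character. The
word takes its character from the stack rather than from KEY so that it can
be exercised without a keyboard.

```forth
decimal
126 constant max-graphic          \ value depends on system character set

: between  ( n lo hi -- flag )  1+ within ;      \ true if lo <= n <= hi
: graphic? ( char -- flag )     bl max-graphic between ;

create line-buf  80 chars allot
variable line-len   0 line-len !

: insert-char ( char -- )         \ append to the buffer if there is room
   line-len @ 80 < if
      line-buf line-len @ chars + c!   1 line-len +!
   else  drop  then ;

: handle-key ( char -- )
   dup graphic? if
      insert-char                 \ graphic character: goes into the buffer
   else
      drop                        \ editing character: ignored in this sketch
   then ;

char A handle-key   13 handle-key   line-buf line-len @ type
```

On a system with a larger character set, only MAX-GRAPHIC would need to
change (for example to 255), which is the point of naming the constant.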

Mitch.Bradley@Eng.Sun.COM

UNBCIC@BRFAPESP.BITNET (05/28/91)

ZEN15a is an implementation of BASIS 15 (not all of it, but many of the word
sets). The code size is around 13 Kb (as it doesn't resize its segments, you
will need a larger EXE).

I got my copy from Doug Philips. (Phillips?)

                              (8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET