UNBCIC@BRFAPESP.BITNET (05/23/91)
Ok. So I decided to convert my full-screen block editor to ZEN. I didn't
finish the conversion, but I found some interesting problems.

The first problem (not a big deal) is -->.

The second problem is AT-XY. It's defined as DROP SPACES. I think it's better
to leave AT-XY out of ZEN than to define it this way.

The third problem, the one that stopped me, is with 8-bit characters.

My editor is provided with an accent filter, so I can enter all Portuguese
characters easily. Well, to make my program more legible, I did things like:

    CHAR ^ CONSTANT _portuguese_name_of_the_accent_

But this doesn't work with ZEN. The CHAR leaves nothing on the stack, the
accent is ignored (not this one, the >127 ones), and CONSTANT is ignored. The
word after CONSTANT is then interpreted, returning an error. I have two
questions about this:

1) What is happening?
2) Should this happen on an ANS Forth? May this happen on an ANS Forth?
   (BASIS 15, of course)

(8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET
Mitch.Bradley@ENG.SUN.COM (05/24/91)
> Ok. So I decided to convert my full-screen block editor to ZEN. I didn't
> finish the conversion but I found some interesting problems.
>
> The first problem (not a big deal) is -->.

    : --> REFILL DROP ;

REFILL is a generalization of --> and QUERY that works on any input source,
returning a flag indicating whether or not the input buffer could be refilled.

> The second problem is AT-XY.
> It's defined as DROP SPACES. I think it's better to leave AT-XY out of ZEN
> than to define it this way.

Tough call. I can see both sides of this argument.

> The third problem, the one that stopped me is with 8-bit characters.
>
> My editor is provided with an accent filter, so I can enter all Portuguese
> characters easily. Well, to make my program more legible, I did things like:
>
> CHAR ^ CONSTANT _portuguese_name_of_the_accent_
>
> But this doesn't work with ZEN. The CHAR leaves nothing on the stack, the
> accent is ignored (not this, the >127 ones), and CONSTANT is ignored. The
> word after constant is then interpreted, returning an error. I have two
> questions about this:
> 1) What is happening?

Something in the input stream mechanism is filtering out non-standard
characters. My copy of zen15 is on another machine, so I can't say
for sure, but I would guess that the problem is either in EXPECT or
in the "skip delimiters" portion of WORD. You might try searching for
"127" in the source code.

> 2) Should this happen on an ANS Forth? May this happen on an ANS Forth?
> (BASIS 15, of course)

It certainly *may* happen. The character set for the source code of
a standard program is the set of printable characters in the 7-bit
ASCII set. Use of any other character causes the program to have an
environmental dependency (in this case, on the system supporting the
Portuguese character set). This seems reasonable to me; your program
source code would not look very legible on my system, since I don't
have Portuguese characters.

The question about whether or not it *should* happen is a matter of
market economics. The implementor can choose whether to try to support
extended character sets, or whether to accept only the standard set.
Since Zen is a fairly minimal system without many environment-dependent
extensions, and is intended to illustrate the basics of ANS Forth,
I'm not surprised that Martin chose to restrict the character set.

I expect that successful commercial systems will adopt a different
approach, supporting international character sets where available.

A standard ANS Forth system is not required to reject non-printable
characters in blocks, nor is it required to accept them. The characters
whose meanings are precisely defined in the context of block source
code are the space character and the ASCII characters with codes from
33 to 126.

Mitch.Bradley@Eng.Sun.COM
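
As a rough illustration of the character range described above, the following sketch counts the characters in a buffer that fall outside the space character and codes 33 to 126. The names `standard-char?`, `nonstandard-count` and `sample` are invented for this example and are not part of ZEN or BASIS 15; `WITHIN` is assumed to be available (it is in the CORE EXT word set of the final ANS standard), and one character is assumed to occupy one address unit.

```forth
\ Sketch only: standard-char?, nonstandard-count and sample are hypothetical
\ names, not words from ZEN or BASIS 15.  Assumes WITHIN ( n lo hi -- flag ).
DECIMAL

: standard-char? ( char -- flag )
   32 127 WITHIN ;            \ true for space (32) and graphic codes 33..126

: nonstandard-count ( c-addr u -- n )
   0 ROT ROT                  \ put a running count under the string
   OVER + SWAP                \ ( n end start )
   ?DO
      I C@ standard-char? 0= IF 1+ THEN
   LOOP ;

\ A three-character test buffer: "A", an arbitrary 8-bit code, and a tab.
CREATE sample 3 CHARS ALLOT
65  sample           C!
200 sample 1 CHARS + C!
9   sample 2 CHARS + C!

sample 3 nonstandard-count .  \ displays 2
```

Run over a line or a block of source, a nonzero result would flag an environmental dependency of the kind discussed in this message.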
UNBCIC@BRFAPESP.BITNET (05/24/91)
=> From: Mitch.Bradley@ENG.SUN.COM
=> Subject: RE: ZEN 15A

=> > Ok. So I decided to convert my full-screen block editor to ZEN. I didn't
=> > finish the conversion but I found some interesting problems.
=> >
=> > The first problem (not a big deal) is -->.
=>
=> : --> REFILL DROP ;
=>
=> REFILL is a generalization of --> and QUERY that works on any input source,
=> returning a flag indicating whether or not the input buffer could be
=> refilled.

Thanks. But that was really not important (ZEN has THRU).

=> > The second problem is AT-XY.
=> > It's defined as DROP SPACES. I think it's better to leave AT-XY out of ZEN
=> > than to define it this way.
=>
=> Tough call. I can see both sides of this argument.

I could understand if it just ignored Y, but it doesn't test which column
you are in! AT-XY, in ZEN, is SPACES with a different stack behavior.

=> > The third problem, the one that stopped me is with 8-bit characters.
=> >
=> > My editor is provided with an accent filter, so I can enter all Portuguese
=> > characters easily. Well, to make my program more legible, I did things
=> > like:
=> >
=> > CHAR ^ CONSTANT _portuguese_name_of_the_accent_
=> >
=> > But this doesn't work with ZEN. The CHAR leaves nothing on the stack, the
=> > accent is ignored (not this, the >127 ones), and CONSTANT is ignored. The
=> > word after constant is then interpreted, returning an error. I have two
=> > questions about this:
=> > 1) What is happening?
=>
=> Something in the input stream mechanism is filtering out non-standard
=> characters. My copy of zen15 is on another machine, so I can't say
=> for sure, but I would guess that the problem is either in EXPECT or
=> in the "skip delimiters" portion of WORD. You might try searching for
=> "127" in the source code.

The source file exists. There is no problem with EXPECT (it doesn't accept
characters >126, but that's normal). WORD? Maybe. But I would expect
CONSTANT either to return an error (nothing on the stack) or to define the
following word. That's not happening. The interpreter tries to execute (or
interpret as a number) the word that follows CONSTANT. Strange...

Search for "127"? Why?

=> > 2) Should this happen on an ANS Forth? May this happen on an ANS Forth?
=> > (BASIS 15, of course)
=>
=> It certainly *may* happen. The character set for the source code of
=> a standard program is the set of printable characters in the 7-bit
=> ASCII set. Use of any other character causes the program to have an
=> environmental dependency (in this case, on the system supporting the
=> Portuguese character set). This seems reasonable to me; your program
=> source code would not look very legible on my system, since I don't
=> have Portuguese characters.

Ok with me.

=> The question about whether or not it *should* happen is a matter of
=> market economics. The implementor can choose whether to try to support
=> extended character sets, or whether to accept only the standard set.
=> Since Zen is a fairly minimal system without many environment-dependent
=> extensions, and is intended to illustrate the basics of ANS Forth,
=> I'm not surprised that Martin chose to restrict the character set.
=>
=> I expect that successful commercial systems will adopt a different
=> approach, supporting international character sets where available.
=>
=> A standard ANS Forth system is not required to reject non-printable
=> characters in blocks, nor is it required to accept them. The characters
=> whose meanings are precisely defined in the context of block source
=> code are the space character and the ASCII characters with codes from
=> 33 to 126.

Thanks for the information.

=> Mitch.Bradley@Eng.Sun.COM

(8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET
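
To make the AT-XY complaint concrete, here is the definition reported for ZEN in this thread next to one possible alternative. This is a sketch under stated assumptions, not a proposal for ZEN itself: `zen-at-xy`, `.dec` and `ansi-at-xy` are names invented here, and the alternative only works on a terminal that honours ANSI/VT100 cursor-positioning sequences (on MS-DOS, one with ANSI.SYS loaded), which neither the thread nor BASIS 15 guarantees.

```forth
DECIMAL

\ The behaviour reported for ZEN: discard y and emit x spaces from wherever
\ the cursor happens to be.  (Renamed here to avoid redefining AT-XY.)
: zen-at-xy ( x y -- )  DROP SPACES ;

\ Hypothetical helper: print an unsigned number in decimal, with no
\ trailing space, whatever BASE currently is.
: .dec ( u -- )
   BASE @ >R DECIMAL
   0 <# #S #> TYPE
   R> BASE ! ;

\ Hypothetical alternative for ANSI/VT100 terminals only.
\ Sends ESC [ row ; column H with 1-based coordinates.
: ansi-at-xy ( x y -- )       \ x = column, y = row, both counted from 0
   27 EMIT [CHAR] [ EMIT
   1+ .dec                    \ row
   [CHAR] ; EMIT
   1+ .dec                    \ column
   [CHAR] H EMIT ;
```

With the first definition, `10 5 zen-at-xy` called twice in a row moves the cursor 20 columns; with the second, the cursor ends up in the same place both times, provided the terminal assumption holds.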
Mitch.Bradley@ENG.SUN.COM (05/24/91)
> I could understand if [ AT-XY ] just ignored Y, but he doesn't test in
> which column you are!

Good point. Based on that, I agree that DROP SPACES is a bad definition
of AT-XY .

> > CHAR ^ CONSTANT _portuguese_name_of_the_accent_
>
> WORD? Maybe. But I would expect that CONSTANT
> returns an error (nothing on the stack) or define the following word.
> That's not happening. The interpreter tries to execute (or interpret as
> a number) the word that follows CONSTANT. Strange...

You're right. It's probably WORD . When interpreting from a text file,
the parsing function of the interpreter (WORD) is *supposed* to ignore
non-printable characters, because many popular text file formats use
various control characters as "white space" (e.g. tab, formfeed,
linefeed, return, word processor formatting characters).

CHAR is trying to parse a "visible" word, and it is skipping the
Portuguese character as "white space", picking up the word "CONSTANT"
as the argument of CHAR . Then the interpreter tries to interpret the
word "_portuguese_name...".

One might argue that it would be better to skip just control characters,
rather than all non-printing characters. That is certainly how I would
implement it. However, I would also argue that the question is moot in
the context of a standard program without environmental dependencies.

> Search for "127"? Why?

"127 AND" is often used after KEY to remove junk like parity bits
and shift bits. This technique, although quite common, is bogus, because
throwing away high bits doesn't necessarily result in a meaningful 7-bit
ASCII character. Instead it may for example transform a code that means
the "F7" function key into the letter "T", a behavior for which I can
think of no justification.

Anyway, I don't think this is what is happening in your case; I bet
the "skip" portion of WORD is looking for characters outside the
range of codes 33 to 126 .
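
The mis-parse described above can be imitated on a string held in memory. This is only a model of the suspected behaviour, not ZEN's actual WORD: `graphic?`, `skip-nongraphic`, `graphic-run`, `next-token`, `src-line` and `demo` are hypothetical names, 200 is an arbitrary stand-in for an accented character code, and `/STRING` is assumed from the STRING word set.

```forth
DECIMAL
: graphic? ( char -- flag )  33 127 WITHIN ;   \ codes 33..126

\ Drop leading characters outside 33..126 (the suspected "skip" phase).
: skip-nongraphic ( c-addr u -- c-addr' u' )
   BEGIN  DUP WHILE  OVER C@ graphic? 0= WHILE  1 /STRING  REPEAT THEN ;

\ Length of the leading run of graphic characters.
: graphic-run ( c-addr u -- n )
   0 ROT ROT  OVER + SWAP ?DO
      I C@ graphic? 0= IF LEAVE THEN  1+
   LOOP ;

\ Split off the next token, leaving the rest of the string underneath.
: next-token ( c-addr u -- c-addr2 u2 c-addr1 u1 )
   skip-nongraphic
   2DUP graphic-run >R        \ R: token length
   OVER R@                    \ ( c-addr u c-addr u1 )
   2SWAP R> /STRING 2SWAP ;

\ The 17 characters that follow "CHAR " in the failing line, with the
\ accent replaced by code 200.
CREATE src-line 32 CHARS ALLOT

: demo ( -- )
   S" ? CONSTANT accent" src-line SWAP MOVE   \ "?" is only a placeholder
   200 src-line C!                            \ overwrite it with the 8-bit code
   src-line 17 next-token TYPE                \ the token CHAR would receive
   2DROP ;                                    \ discard the unparsed remainder

demo                          \ displays CONSTANT
```

In this model the 8-bit code is consumed as if it were white space, so the token handed to CHAR is `CONSTANT` and the name that was meant to be defined is left in the input for the interpreter, which matches the symptom reported in the thread.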

However, I wouldn't be surprised to find "KEY 127 AND" in the definition
of EXPECT ; that phrase was in an earlier version of Zen. The correct
phrase should be something like this:

    126 constant max-graphic   \ Value depends on system character set
    ...
    key dup bl max-graphic between if
       ( char ) <insert character in buffer>
    else
       <process as editing character>
    then

Mitch.Bradley@Eng.Sun.COM
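
A runnable variant of the phrase above, under stated assumptions: `between` is not a BASIS 15 word, so it is defined here in terms of `WITHIN`; `masked-key` and `classify-key` are hypothetical names, and the two branches only report what a real EXPECT would do with the key. The code 212 is an arbitrary stand-in for a non-ASCII key code, chosen because masking it with 127 yields 84, the letter T.

```forth
DECIMAL
: between ( n lo hi -- flag )  1+ WITHIN ;   \ inclusive at both ends

126 CONSTANT max-graphic      \ value depends on system character set

\ The masking idiom criticised above: it silently turns 212 into 84 ("T").
: masked-key ( code -- char )  127 AND ;

\ The range test: only codes in BL..max-graphic are treated as text.
: classify-key ( code -- )
   DUP BL max-graphic between IF
      ." store " EMIT
   ELSE
      ." editing/control code " .
   THEN ;

212 masked-key EMIT CR        \ displays T
212 classify-key CR           \ reports an editing/control code
CHAR T classify-key CR        \ reports a character to store
```

The first line shows how masking manufactures a letter the user never typed; the range test instead routes the same code to the non-text branch, where the system can decide what to do with it.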
UNBCIC@BRFAPESP.BITNET (05/28/91)
ZEN15a is an implementation of BASIS 15 (not all of it, but many of the
word sets). The code size is around 13 Kb (as it doesn't resize its
segments, you will need a larger EXE). I got my copy from Doug Philips.
(Phillips?)

(8-DCS)
Daniel C. Sobral
UNBCIC@BRFAPESP.BITNET