[comp.lang.forth] Input stream questions

wmb@MITCH.ENG.SUN.COM (02/13/91)

> I am posting this proposal (which was passed at the most recent
> meeting of X3J14, because everything was deleted from it except for
> (16)) so that anyone and everyone can respond to the questions raised
> in it.

Another way of looking at this is that the proposal itself was not
actionable (i.e. there is no specific "Proposal" section, and
"Discussion" is not in itself actionable).

Nevertheless, we tried very hard to give it the "benefit of the doubt".
I read the discussion section at the meeting (it arrived in the middle
of the meeting, and I didn't have much time), found the one issue that
appeared to represent a real technical problem, and created an
actionable proposal out of that issue.

There are some things that could indeed use additional clarification,
although I believe that the letter of the Basis document is correct.
For those issues, we solicit proposals for specific improved wording.

Also, having had time to carefully reconsider all these points, I believe
that there is one more technical issue that needs to be addressed (see
the end of item 5.)  I don't think this is a serious problem, though;
rather, it is a usage restriction that should be noted.

> One member of the X3J14 Technical Committee (TC) has agreed to
> respond point-by-point to these questions.  Those responses will also
> be posted.

I'll shorten the loop by posting the response directly.  Read on.

Mitch Bradley, wmb@Eng.Sun.COM

> ----------------------------------------------------------------------
> ANSI ASC X3 / X3J14 Forth Technical Proposal    TP91-1082
> ----------------------------------------------------------------------
> Title:  Clarify BASIS on the subject of the Forth input stream.
> ----------------------------------------------------------------------
> Related Proposals:      TP90-833, TP91-1070
> ----------------------------------------------------------------------
> Keyword(s):     input stream, block file, implementation-defined
> default block space, text file
> ----------------------------------------------------------------------
> Forth word(s):  #TIB ( >IN BLK EVALUATE INCLUDE INCLUDE-FILE LOAD
> LOAD-FILE QUERY REFILL SOURCE-FILE SPAN TIB \
> ----------------------------------------------------------------------
> Abstract:       Clarify BASIS on the subject of the Forth input stream.
>  BASIS is still unclear on the specific input stream ``entitlements''
> to the point that we have heard differing interprpreter, 5.3.2
> addressable memory, 5.3.4.1 Input Stream, 8.1.006 #TIB, 8.1.0080 (,
> 8.1.0560 >IN, 8.1.0790 BLK, 8.1.1360 EVALUATE, 10.1.1713 INCLUDE,
> 10.1.1717 INCLUDE-FILE, 11.1.1790 LOAD, 10.2.1792 LOAD-FILE, 8.2.2040
> QUERY, 8.2.2125 REFILL, 10.1.2218 SOURCE-FILE, 8.2.2240 SPAN, 8.1.2290
> TIB, and 8.2.2535 \ (plus other new sections).
> ----------------------------------------------------------------------
> Discussion:
>
> In the following discussion the term ``legitimate'' means ``portable,
> consistent with the intention of X3J14, and within the boundaries of
> good taste.''
>
> 1)      There is a question about the general nature of #TIB.  Which of
> the following are legitimate and correct in ANS Forth?
>
> : stream-empty?  ( -- flag )  BLK @     : stream-empty?  ( -- flag )
>   IF   1024                               #TIB @  >IN @  = ;
>   ELSE #TIB @
>   THEN  >IN @ = ;
>
>        It appears, given the letter of BASIS, that the first ought to
> be correct in all cases, but is #TIB also meant to hold the input stream
> length?

The first is indeed correct.  #TIB is only valid if BLK is 0.  Reference:
5.3.4.1 Input Stream (Basis 14).

> 2)      There is a question about the general nature of TIB.  Which of
> the following are legitimate and correct in ANS Forth?
>
> : stream  ( -- c-addr u )  BLK @  ?DUP  : stream  ( -- c-addr u )
>   IF   BLOCK  1024                        TIB  #TIB @ ;
>   ELSE TIB  #TIB @
>   THEN ;
>
>         It appears, given the letter of BASIS, that the first ought to
> be correct in all cases, but is TIB also meant to specify the beginning
> of the input stream (i.e., does it call BLOCK)?  Also, does the value
> returned by TIB ever change, e.g. during EVALUATE, INCLUDE, or
> INCLUDE-FILE, or do the values of #TIB and >IN change relative to it?

The first is correct.  TIB and BLOCK are completely unrelated.  The
value of TIB may or may not change, at the discretion of the implementor.
For example, a valid implementation could have a fixed TIB buffer, and save
its contents somewhere else when the input stream nests to a text file
or to an EVALUATEd string.  Another valid implementation could change
the address returned by TIB whenever the input stream nests.

The second implementation is more likely: the standard imposes no
restriction on the length of the string that can be EVALUATEd, so in
the first implementation the fixed TIB buffer would have to be
arbitrarily large.

Consequently, a standard program should be written so it doesn't care
whether or not TIB changes across input stream boundaries.  This should
not be a problem; just use TIB anytime you need that address, rather
than saving a copy of its value somewhere.
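
To illustrate, here is a minimal sketch of the "ask for TIB each time"
style.  The name rest-of-line is made up for this example (it is not a
Basis word), and the word is only meaningful while BLK contains 0:

```forth
\ Sketch only: rest-of-line is a hypothetical name, not a Basis word.
\ Valid only while BLK contains 0 (input is not coming from a block).
: rest-of-line  ( -- c-addr u )
  TIB  >IN @ CHARS +       \ fetch TIB now; address of first unparsed char
  #TIB @  >IN @ - ;        \ number of characters not yet parsed
```

Because TIB is executed on every call, the word does not care whether
the system moved the buffer when the input stream nested.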

Reference: Basis 14 does not say or imply that TIB is a fixed buffer.

> 3)      There is a question about what >IN holds.  Are the following
> legitimate and correct in ANS Forth?
>
> : initialize-stream  ( -- )     : skip-stream  ( -- )
>   0 >IN ! ;                       #TIB @  >IN ! ;
>
>         That is, can anything but a previously saved value of >IN be
> legitimately stored in >IN by a user?

Both are legitimate and correct.  This was not always the case, but
it was changed as part of the text file input stream work about a
year ago (there were some editorial errors that prevented it from
being correctly presented in the first couple of Basis documents
after the change was approved).

The confusion arose because of a rationale box below the glossary entry
for >IN .  That glossary entry is incorrect, and will not appear in
Basis 15 (Reference: proposal TP-1035).

Also, the entitlements for setting >IN are clarified in new wording
for section 5.3.4.1, as a result of proposal TP-1032.

Reference: 5.3.4.1

> 4)      There is a question about what #TIB holds.  Is the following
> legitimate and correct in ANS Forth?
>
> : truncate-stream  ( -- )
>   >IN @  #TIB ! ;
>
>         That is, can #TIB ever be legitimately modified directly by a
> user?

No and no.  Basis is not crystal clear on this prohibition.  Specific
wording to make this clear would be appreciated.  TIB and #TIB should
be considered "read only" for user programs.

The paragraph in 5.3.4.1 that seems to imply that #TIB is alterable
has been replaced.

> 5)      There is a question about what BLK holds.  Is the following
> legitimate and correct in ANS Forth?
>
> : ans-query  ( -- )
>   TIB +n ACCEPT  #TIB !  0 BLK !  0 >IN ! ;

No, because
  1) You don't necessarily know the size of the buffer that the current
     value of TIB points to.
  2) Storing into #TIB is bogus (the system can do it, but not the user).

>         That is, (in addition to the above questions) can BLK ever be
> legitimately modified directly by a user?

Yes, BLK can be modified.  The new wording for section 5.3.4.1 (alluded to
above) specifically says so.

>  If not, what does it mean to
> say that ``QUERY is semantically equivalent to [that] sequence...''
> (BASIS section 8.2.2040 QUERY)?

It means that the results of executing QUERY are as described by that
code.   It doesn't mean that the user is allowed to write that code
or similar code.

QUERY really ought to be described in English instead of in code.

Actually, QUERY is somewhat bogus; suppose you write and execute:

        : FOO " QUERY" EVALUATE  ;

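This could overwrite the definition of FOO !  QUERY accepts a line into
TIB, and in an implementation that points TIB at the string being
EVALUATEd, that string is the one stored inside FOO.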

This is worth a specific proposal; if I had noticed this, I would
have brought it up at the last meeting (I didn't have much time
to evaluate this proposal/discussion as it came in fairly late
in the meeting).

> 6)      There is a question about accessing the contents of the input
> stream.  Is any of the following legitimate and correct in ANS Forth?
>
> : uppercase  ( c-addr u -- )  ... ;  ( converts string to uppercase )
> : uppercase-next-word  ( -- )
>   >IN  DUP @ 2>R  BL PARSE  uppercase  2R> SWAP ! ;
>
> : remaining-stream  ( c-addr u -- )
>   stream  >IN @ CHARS  TUCK - >R  +  R> ;
> : CHAR  ( "ccc" -- char )
>   remaining-stream 0= ABORT" Stream empty."  C@
>   BL WORD DROP ;
>
>         Given that uppercase-next-word will not generally work if the
> input stream is coming from a block (and what about EVALUATE?), can the
> contents of the input stream ever be legitimately modified directly by a
> user (uppercase-next-word, above)?  Can the contents of the input stream
> ever be legitimately manipulated in any way directly by a user (CHAR,
> above), or is it only ever legitimate to parse from the input stream?

The remaining-stream / CHAR example is fine, but the uppercase-next-word
example has problems.  For instance, suppose the input stream is coming
from a string in read-only memory
        e.g. : FOO " UPPERCASE-NEXT-WORD xyzzy" EVALUATE  ;
where FOO has been precompiled into ROM.

> If it is not legitimate for a user to directly calculate addresses
> within the input stream without parsing, can a user ever do anything
> legitimate with the value of  TIB?

You can directly calculate an address and then use C@ to read the character
at that address, but writing into the input stream is not guaranteed to
work.
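
One way around the problem, sketched below, is to copy the parsed text
into a private buffer and modify the copy.  The names word-buf,
/word-buf and upcase are made up for this example, the 64-character
limit is arbitrary, and the case conversion assumes ASCII:

```forth
\ Sketch only: word-buf, /word-buf and upcase are hypothetical names.
64 CONSTANT /word-buf
CREATE word-buf  /word-buf CHARS ALLOT

: upcase  ( char -- char' )            \ ASCII lower case to upper case
  DUP [CHAR] a [CHAR] z 1+ WITHIN IF  32 -  THEN ;

: uppercase-next-word  ( "ccc" -- c-addr u )
  BL PARSE  /word-buf MIN              \ string lies inside the input stream
  DUP >R  word-buf SWAP CMOVE  R>      \ copy it out; the stream is only read
  word-buf SWAP                        \ ( c-addr u ) of the private copy
  2DUP CHARS OVER + SWAP
  ?DO  I C@ upcase I C!  1 CHARS +LOOP ;
```

This version only ever reads from the input stream, so it does not
matter whether the text came from a block, a file, or a string in ROM.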

Note that the CHAR example, although legitimate and correct, does not
skip leading delimiters, so it may or may not be what you want.

> 7)      There is a question about parsing.  Is the following legitimate
> and correct in ANS Forth?
>
> : CHAR  ( "ccc" -- char )
>   BL WORD  COUNT 0= ABORT" Stream empty."  C@ ;

No problem.  Totally okay.

>         Section 4.0575 Parsing says, ``If the current input stream is
> empty or contains no characters other than the delimiter, the string is
> empty.''  Is an empty string-implementation defined, or is the count
> zero?

The count is zero.  This was unclear in the definitions of WORD (8.1.2450)
and PARSE (8.2.2008) and we passed proposal TP-1025 to fix it.

> 8)      Is it true that, ``words using TIB and modifying the contents
> of BLK, >IN, or #TIB are responsible for maintaining the integrity of
> the input stream specification and its contents''  (changes added)?
> Ought section 5.3.4.1 Input Stream paragraph four (from where the above
> was taken) to be so modified?  Ought there to be more prose in that
> section about nesting and un-nesting the input stream and if so, what
> prose?

Proposal TP-1032 totally rewrote the section in question, with the
intention of making it clear what you can and cannot do to the input
stream.  I think the new prose is clear (John Hayes wrote it).  Your
mileage may vary.

> 9)      There is a question about CURRENT-FILE and SOURCE-FILE.  We
> have heard from the TC that CURRENT-FILE is obsolete, that SOURCE-FILE
> is obsolete, and that they are the same!  We most definitely believe
> (and assume) that CURRENT-FILE is not obsolete, that SOURCE-FILE (as a
> named value) is obsolete, and that they are definitely not the same
> thing (see TP91-1070).  C'mon, which is it guys?

The whole block-file/block/text-file thing has been cleaned up and
clarified.  Several of us stayed up until 2AM one night to make sure
everybody is on the same wavelength.  As a result, we passed TP-1070
(by the same authors as TP-1082, on which I am commenting), so that:

     BLOCK-FID is a variable containing the fileid of the file containing
     the block space, or containing 0 for the implementation-defined
     default block space.  (This used to be called CURRENT-FILE but that
     name confused everybody.)  It is possible to store into this variable.

     SOURCE-FILE returns the fileid of the file that is the source of
     text file input, or 0 if the input is coming from the keyboard, or
     -1 if the input is coming from an EVALUATEd string.  SOURCE-FILE
     may not be directly modified by the user; it is set by the system
     as a result of executing INCLUDE-FILE (and related words) or
     EVALUATE .

     BLOCK-FID determines the source of blocks, regardless of the value
     of SOURCE-FILE, and SOURCE-FILE determines the source of text
     input lines, regardless of the value contained in BLOCK-FID .

> 10)     The wording of BASIS section 8.2.2125 REFILL is unclear and /
> or incorrect.  When BLK contains a non-zero value and SOURCE-FILE
> returns a non-zero value, section 8.2.2125 specifies that REFILL takes
> the action ``0 >IN !  1 BLK +! 0.''  First, does that means that the
> system next interprets the block specified by the new value in BLK in
> the ``block space'' specified by CURRENT-FILE (regardless of the value
> in SOURCE-FILE)?  Second, if CURRENT-FILE contains a non-zero value,
> shouldn't REFILL return other than zero, or should the system simply
> abort if a user attempts a REFILL beyond the end of a block file?

This was indeed a problem.  TP-1048 fixed it.  Thanks for bringing
up the issue.  REFILL can now return "sorry, didn't work" for blocks.

> 11)     Is SOURCE-FILE useful in any way to a user?  It is used in
> BASIS to help explain the way INCLUDE, INCLUDE-FILE, and REFILL work,
> but it can only be modified by INCLUDE, INCLUDE-FILE and little can be
> done with it by a user except defining:  : INTERPRETING-TEXT-FILE?  ( --
> flag )  SOURCE-FILE 0<> ;.  We cannot see the need for that system value
> to be given a name and will help rewrite BASIS in terms of ``source
> file'' (the file whose text gets interpreted by INCLUDE and
> INCLUDE-FILE), removing any reference to SOURCE-FILE.

This observation is correct, and the example of how it may be useful
is also correct.  However, it is my belief that there is value in
exposing SOURCE-FILE to the user, because specific systems may choose
to provide other file operations in addition to the set of standard
ones.  It is nice to have a standard name for the word that returns
the fileid of the input file; every text-file-based system is going
to have the word anyway, so they might as well use the same name!

> 12)     Section 10.1.1029 CURRENT-FILE refers to ``current block file''
> and ``implementation-defined default block space,'' neither of which are
> defined.  A ``block file'' probably means ``whenever the contents of
> CURRENT-FILE are non-zero'' and differs from the implementation-defined
> default block space only in that the concept of end-of-file is
> meaningful for a block file.  Ought ``block file,'' ``current block
> file,'' and ``implementation-defined default block space'' be defined in
> the Definition of Terms and if so, how?

Agreed (except that the name is now BLOCK-FID).  Specific wording would
be appreciated.

Actually, the concept of "end-of-file" may be meaningful for the
implementation-defined default block space for some implementations.

"implementation-defined default block space" is the place where blocks
come from if the program doesn't do anything to explicitly change it
(like storing a new fileid in BLOCK-FID, or executing LOAD-FILE).

> 13)     There are some unclear aspects of the specification of comments
> (BASIS sections 8.1.0080( and 8.2.2535 \).  Section 8.1.0080 refers to
> ``text file,'' but that term is nowhere defined.  It probably means
> ``whenever the contents of BLK are zero and the source fileid is
> non-zero.''  Ought ``text file'' be defined in the Definition of Terms?

Yes, it should.  Again, a proposal suggesting specific wording would
be appreciated.

> Section 8.2.235 refers to ``line,'' but section 4.0510 (the definition
> of the term ``line'') is inadequate to describe the lines in blocks (and
> block files) versus those in text files (it describes how ``lines'' are
> displayed).  What is meant by a line within a block (e.g. is it 64
> characters) and to what does section 4.0510 refer?  Ought the wording of
> section 4.0510 be improved and if so, how?

It was changed by TP-1021.  The current wording says something to the
effect that a line is what you see on one line of a display, and that
BLOCK source is conventionally represented as 16 lines of 64 characters.
I don't have the actual wording that we agreed on written down, but it will
be in Basis 15.

> 14)     There are some questions about (.
>       - Section 8.1.0080 ( spells out that a null comment, (), is
> allowed so that ( cannot be implemented in terms of WORD (though
> sections 8.1.0190 ." and 8.2.0200 .( do not specify that behavior).
> Does that behavior need a rationale note?

Probably.  Please write one.  By the way, ( and ." and .( and S" and
C" can be implemented in terms of PARSE, which was one of the reasons
for including the word PARSE .
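
As a rough sketch of what I mean, here is a single-line comment word
built on PARSE.  It is named (( only so that it does not redefine the
system's own ( ; the name is made up for this example:

```forth
\ Sketch only: (( is a hypothetical name standing in for ( .
\ It handles comments that end on the same line.
: ((  ( "ccc<paren>" -- )
  [CHAR] ) PARSE       \ take everything up to the next right paren
  2DROP ; IMMEDIATE    \ and throw it away
```

PARSE does not skip leading delimiters, so an empty comment such as
(( ) simply yields a zero-length string.  WORD would skip the first
right paren as a leading delimiter and keep looking for another one.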

>       - Sections 8.1.0080 ( and 8.2.0200 .( specify that those words
> are immediate.  Is that a mistake and if not, why not?

What we really mean by this is that the compilation semantics and
the execution semantics of ( and .( are identical.

For non-immediate words, the compilation semantics is to add the
execution semantics to the current definition.

For most other immediate words, there are distinctly-different
compilation semantics and execution semantics.  For example,
the compilation semantics of IF involve pushing something on
the control flow stack and compiling a transfer-of-control, whereas
the execution semantics involve the data stack.

Considering the small number of words like ( , with identical
compilation and execution semantics, it would probably be a
good idea to just spell it out in the glossary entry.

Note, however, that this is a non-technical issue, in that the
clarification does not change the meaning.  Thus, such a change
could be made after the document goes to the dpANS review stage.

>       - Does X3J14 really want to require ( to comment to the end of a
> text file?

This point has been argued over and over, and we could never get an
overwhelming majority.  The vote was consistently about 65% for multi-line
comments, 35% against (my surveys of users suggest a much stronger
majority in favor of multi-line comments).  The ultimate decision was
to put the issue to a letter ballot (to which ALL committee members,
not just the ones present at a particular meeting, MUST respond).
Then we would just go with a simple majority of that vote.  Everybody
at the meeting was willing to go along with that, because nobody felt
so strongly about the issue that they were willing to make a federal
case of it.

> 15)     Is the following legitimate and correct in ANS Forth and does
> it immediately exit the current input stream?
>
> : EXIT-INPUT  ( -- )  BLK @
>   IF   1024 >IN !
>   ELSE POSTPONE \
>        BEGIN
>             REFILL 0=
>        UNTIL
>   THEN ;

Yes.

> 16)     We asked the question of the Input Stream Working Group, ``why
> does section 11.1.1790 LOAD have the stack diagram ( i*w -- j*w )?'' and
> got explanations as to what the ``i*x'' mean.  To us, that stack comment
> says, ``LOAD takes a variable number of arguments and leaves a variable
> (and potentially different) number of arguments.''  Didn't LOAD used to
> take a block number?

The stack diagram should be  ( i*w u -- j*w ) , where "u" is the block
number.  A similar stack diagram applies to LOAD-FILE .  The i*w and
j*w imply that LOAD and LOAD-FILE may interpret code that has the
effect of popping, pushing, or modifying arbitrary stack items.
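
The same kind of stack diagram can be seen at work with EVALUATE, which
I use in this sketch as a stand-in for LOAD so that no block file is
needed.  The name demo is made up for this example:

```forth
\ Sketch only: demo is a hypothetical name; EVALUATE stands in for LOAD.
: demo  ( -- n )
  3 4              \ the i*w: items already on the stack
  S" +" EVALUATE ; \ the interpreted text pops two items and pushes one
```

Executing demo leaves 7 on the stack: two items went in underneath the
string, and the interpreted code left one item behind.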

This discussion item was "promoted" from a discussion item to an actual
proposal.

> ----------------------------------------------------------------------
> Date:           February 1, 1991
> Submitted by:   David C. Petty and Bent Schmidt-Nielsen
> Address:        Post Office Box Two
>                 Cambridge,  MA    02140-0001
> Internet:       dcp@world.std.com
> Telephone:      +1(617)492-1232         FAX:    +1(617)491-2345
>
> ANSI ASC X3 / X3J14 Forth Standards Committee
> 111 North Sepulveda Boulevard, Suite 300  Manhattan Beach,  CA
> 90266-6861
> -----------------------------------------------------------------------
> X3J14/87-021    12/04/1987

Thanks for bringing up these issues.


The following is a general request directed at all proposal writers:

It is *REALLY* helpful if proposals are broken down into smaller pieces.
The logistic difficulties of dealing with "omnibus" proposals are severe.
This hurts the chances of the proposal.

Also, proposals suggesting specific wording are much easier to handle
than proposals to the effect of "you guys should do something about this".
Basically, in order to act on the latter, the proposal has to go back
in the pile, and somebody has to volunteer to take ownership of the issue
and write up some specific wording that can be acted on.  Speaking from
personal experience as a person who has done more than my share of such
volunteering, it is a pain in the butt to do this, and the time it takes
to do it sometimes causes me to miss the discussion on other issues.


Mitch Bradley, wmb@Eng.Sun.COM