[gnu.g++.lib.bug] BUG in istream input near end of file

jj@idris.id.dk (Jesper Joergensen [ris]) (10/17/89)

To:
    Doug Lea (author of the "stream" classes in the GNU libg++ library)

From:
    Jesper Jorgensen
    Department of Computer Science
    Building 345 West, room 179
    Technical University of Denmark
    DK-2800 Lyngby
    DENMARK

Subject:
    BUG in the "istream" class operators and member functions,
    when applied to input near end-of-file (NOTICE the latter).

Software Versions:
    GNU libg++   version 1.35.1
    GNU g++      version 1.35.0
    GNU gcc      version 1.35
    all received via the  ftp  server  "freja.diku.dk" (address 129.142.96.1)



Dear Doug,

   I've located a BUG in the input operators of the "istream" class, which
   bites under the following conditions:

     given an "istream" where a sequence of characters valid for some input
     operation is IMMEDIATELY followed by <end-of-file> (i.e. no <end-of-line>
     in between), applying that input operation to the stream will
     return a result as if the last character were duplicated.

   Example:
     If the characters remaining in the input stream  "cin"  are:
       234<end-of-file>
     then  "int number ; cin >> number"  will set "number" equal to 2344.

   NOTICE:
     that the result is the same for an "istream" set up to read from a file and
     for an "istream" set up to read from a character buffer (try it yourself).

   I've tried to analyze the problem and found the cause of the error, which
   is a combination of the following facts (from source code in "stream.cc"):

     (1) The "istream" member function "get" defined as
           istream& istream::get(char& c)
         will (if the stream is in state "_good") get the next character from
         the underlying "streambuf" via the member function "sgetc"; then one
         of two things happens:
     (1a) if the value returned is EOF then the "istream" is put into state
          "_eof" AND NO CHARACTER IS RETURNED (reference &c is untouched).
     (1b) otherwise the character returned is put into the reference &c and
          the buffer pointer advanced via the "streambuf" member function
          "stossc" and the state remains unchanged (i.e. "_good").

     (2) The complex input for integers defined as
           istream& istream::operator >> (long& y)
         uses "get" to fetch character by character, but tests the "istream"
         state before "get" is called. So when the last character before
         <end-of-file> is read by "get" the state is still "_good" (1b) and
         the loop will be reexecuted, resulting in the same character being
         used since "get" doesn't touch the argument variable (1a).
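
   The interaction of (1) and (2) can be sketched with a small stand-in
   class. This is NOT the libg++ source: "MockIstream", its members and
   "read_long_buggy" are hypothetical names that only imitate the behavior
   described above (state tested before "get", argument untouched at EOF).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the 1.35-era istream; it only imitates (1a)/(1b).
struct MockIstream {
    std::string data;
    std::size_t pos;
    bool eof_state;

    explicit MockIstream(const std::string& d) : data(d), pos(0), eof_state(false) {}

    bool good() const { return !eof_state; }

    MockIstream& get(char& c) {
        if (good()) {
            if (pos >= data.size())
                eof_state = true;   // (1a): enter "_eof", leave c untouched
            else
                c = data[pos++];    // (1b): deliver the character and advance
        }
        return *this;
    }
};

// Loop shape from (2): the state is tested BEFORE get(), and the character
// is used without checking whether get() actually delivered one.
long read_long_buggy(MockIstream& s) {
    long y = 0;
    char ch = 0;
    while (s.good()) {
        s.get(ch);
        if (ch < '0' || ch > '9') break;
        y = y * 10 + (ch - '0');
    }
    return y;
}

// Small self-check mirroring the example in the report.
void demo_buggy_loop() {
    MockIstream at_eof("234");       // digits immediately followed by EOF
    assert(read_long_buggy(at_eof) == 2344);
    MockIstream with_nl("234\n");    // a newline hides the problem
    assert(read_long_buggy(with_nl) == 234);
}
```

   With a trailing newline the loop stops on the newline before EOF is ever
   hit, which is why the problem only shows up when the digits are the very
   last characters of the stream.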

   (The amount of detail in the above description might seem overwhelming,
    but I didn't want to risk missing anything.)

   It seems clear to me that the major problem is that "get" does not set the
   "istream" state "_eof" until an attempt has been made to read past the
   end of the stream. If this is the correct operation for "get" then the
   "istream" state must be tested by the users of "get" AFTER the call to
   validate that a character was actually read before it is used (Why doesn't
   it return something then ???).

   However, if "get" sets the "istream" state "_eof" when the last character
   has been read from the stream (like in PASCAL for instance), then the
   callers need only test BEFORE "get" is invoked. The "_eof" condition must
   be tested and set after the "get" has invoked "stossc" to advance the
   buffer.

   I have looked in Bjarne Stroustrup's book (The C++ Programming Language,
   Addison-Wesley, July 1987, ISBN 0-201-12078-X) for a clear definition of
   when an input stream should enter state "_eof", but no such definition
   is available. However, the description at the bottom of page 238 section
   8.4.2 specifies that a stream used as a test should succeed only if the
   state is "_good". Such a test is used in the scanner function "get_token"
   in the calculator example in section 3.1 to test for success after a read
   operation; hence it indicates that the state should remain "_good" even if
   the stream has emptied. This conclusion contradicts the fact that the
   introduction to stream states in section 8.4.2 specifies that an input
   operation in state "_good" should succeed, unless the "get" function isn't
   meant to return a stream at all, but merely should return an integer
   indicating the status of its own operation and not implicitly the stream's
   final state.

   As you can see I'm pretty confused and I fear that I've made you confused
   as well (my colleagues consider me an expert in that field). BUT my final
   suggestion is that the state "_eof" should be set when the next character
   to be read is <end-of-file> (like in PASCAL) and then "get" shouldn't return
   a stream but an integer indicating the success of the operation alone.
   This implies minimal rewriting of the complex input operations (but do you
   mind checking them out anyway) and nothing contradicts Stroustrup. Anyway
   you may have better reasons for doing something else, since you're much
   more informed about the various versions and standards of C++ than I am
   (I know only Stroustrup's book and your source code).
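
   The suggestion above might look roughly like the following sketch. It is
   only an illustration of the proposal, not code from libg++ or from
   Stroustrup: "PascalStyleStream" and "read_long_pascal" are hypothetical
   names, and treating an empty stream as already being in state "_eof" is
   an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stream that enters "_eof" as soon as the NEXT character to be
// read is end-of-file (PASCAL-like), instead of after a failed read.
struct PascalStyleStream {
    std::string data;
    std::size_t pos;
    bool eof_state;

    // Assumption: an empty stream starts out in the "_eof" state.
    explicit PascalStyleStream(const std::string& d)
        : data(d), pos(0), eof_state(d.empty()) {}

    bool good() const { return !eof_state; }

    // Returns 1 if a character was delivered, 0 otherwise (an integer status
    // for this call only, rather than the stream itself).
    int get(char& c) {
        if (!good()) return 0;
        c = data[pos++];
        if (pos >= data.size()) eof_state = true;  // test AFTER advancing
        return 1;
    }
};

// With this convention, testing the state BEFORE each get() is sufficient.
long read_long_pascal(PascalStyleStream& s) {
    long y = 0;
    char ch = 0;
    while (s.good()) {
        s.get(ch);
        if (ch < '0' || ch > '9') break;
        y = y * 10 + (ch - '0');
    }
    return y;
}

void demo_pascal_style() {
    PascalStyleStream s("234");
    assert(read_long_pascal(s) == 234);  // no duplicated last digit
    assert(!s.good());                   // "_eof" set after the last character
}
```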

   Please mail me as soon as you make a move, because I intend to make a "by
   hand patch" as soon as you have found a solution. If you could give me a
   reason for your move as well, I will appreciate it very much (so that I can
   stop bothering about this).


      Thanks in advance
         Jesper Jorgensen

PS: Where is the istream member function (user defined conversion)
      istream::operator int()
    that returns the test result (state == _good) for an argument stream ???
    (See section 6.3.2 and 8.4.2 in Stroustrup)

dl@G.OSWEGO.EDU (Doug Lea) (10/17/89)

Thanks for the helpful bug report.

The problem was that several of the istream op >> functions
used the construct

while (good()) { get(ch) ... }

when instead they should have done

if (good()) { while (get(ch)) { ... } ... }

Sorry for the slip.

Streams DO use the unix convention that you don't know if you've
reached EOF until you've tried to read a char and failed, but this
behavior is localized to get(char).
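
A minimal sketch of the corrected loop shape, using the modern standard
std::istream as a stand-in (an assumption: this is not the libg++ 1.36.0
code, and "read_digits" is a hypothetical helper). The result of get(ch)
is tested after each call, so a failed read at EOF is never used:

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Reads a run of decimal digits. The stream state is checked once up front,
// then the result of every get(ch) is tested before ch is used.
long read_digits(std::istream& in) {
    long y = 0;
    char ch;
    if (in.good()) {
        while (in.get(ch)) {
            if (ch < '0' || ch > '9') {
                in.putback(ch);  // leave the non-digit for the next reader
                break;
            }
            y = y * 10 + (ch - '0');
        }
    }
    return y;
}

void demo_fixed_loop() {
    std::istringstream in("234");  // digits immediately followed by EOF
    assert(read_digits(in) == 234);

    // EOF is only detected by a read that fails, not by reading the last char.
    std::istringstream one("7");
    char c = 0;
    one.get(c);
    assert(c == '7' && !one.eof());
    one.get(c);
    assert(one.eof());
}
```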

These are all fixed for the next release (1.36.0), which, at long
last, should be out within a few days.

AT&T 2.0-compatible iostream classes are in the works, but won't make it
into 1.36.0. They should be ready within a month or so.

-Doug