jj@idris.id.dk (Jesper Joergensen [ris]) (10/17/89)
To: Doug Lea (author of the "stream" classes in the GNU libg++ library)

From: Jesper Jorgensen
      Department of Computer Science
      Building 345 West, room 179
      Technical University of Denmark
      DK-2800 Lyngby
      DENMARK

Subject: BUG in the "istream" class operators and member functions, when applied to input near end-of-file (NOTICE the latter).

Software Versions:
      GNU libg++ version 1.35.1
      GNU g++ version 1.35.0
      GNU gcc version 1.35
      all received via the ftp server "freja.diku.dk" (address 129.142.96.1)

Dear Doug,

I've located a BUG in the input operators of the "istream" class, which bites under the following conditions: given an "istream" where a sequence of characters valid for some input operation is IMMEDIATELY followed by <end-of-file> (i.e. no <end-of-line> in between), applying that input operation to the stream returns a result as if the last character was duplicated.

Example: If the characters remaining in the input stream "cin" are:

      234<end-of-file>

then

      int number ; cin >> number

will set "number" equal to 2344.

NOTICE that the result is the same for an "istream" set up to read from a file and for an "istream" set up to read from a character buffer (try it yourself).

I've tried to analyze the problem and found the cause of the error, which is a combination of the following facts (from source code in "stream.cc"):

(1) The "istream" member function "get", defined as

      istream& istream::get(char& c)

will (if the stream is in state "_good") get the next character from the underlying "streambuf" via the member function "sgetc". Then one of two things happens:

  (1a) if the value returned is EOF, then the "istream" is put into state "_eof" AND NO CHARACTER IS RETURNED (reference &c is untouched).

  (1b) otherwise the character returned is put into the reference &c, the buffer pointer is advanced via the "streambuf" member function "stossc", and the state remains unchanged (i.e. "_good").
(2) The complex input for integers, defined as

      istream& istream::operator >> (long& y)

uses "get" to fetch character by character, but tests the "istream" state before "get" is called. So when the last character before <end-of-file> is read by "get", the state is still "_good" (1b) and the loop will be re-executed, resulting in the same character being used again, since "get" doesn't touch the argument variable (1a).

(The amount of detail in the above description might seem overwhelming, but I didn't want to risk missing anything.)

It seems clear to me that the major problem is that "get" does not set the "istream" state "_eof" until an attempt has been made to read past the end of the stream. If this is the correct operation for "get", then the "istream" state must be tested by the users of "get" AFTER the call, to validate that a character was actually read before it is used (Why doesn't it return something then ???). However, if "get" sets the "istream" state "_eof" when the last character has been read from the stream (like in PASCAL for instance), then the callers need only test BEFORE "get" is invoked. In that case the "_eof" condition must be tested and set after "get" has invoked "stossc" to advance the buffer.

I have looked in Bjarne Stroustrup's book (The C++ Programming Language, Addison-Wesley, July 1987, ISBN 0-201-12078-X) for a clear definition of when an input stream should enter state "_eof", but no such definition is available. However, the description at the bottom of page 238, section 8.4.2, specifies that a stream used as a test should succeed only if the state is "_good". Such a test is used in the scanner function "get_token" in the calculator example in section 3.1 to test for success after a read operation; hence it indicates that the state should remain "_good" even if the stream has emptied.
This conclusion contradicts the fact that the introduction to stream states in section 8.4.2 specifies that an input operation in state "_good" should succeed, unless the "get" function isn't meant to return a stream at all, but merely should return an integer indicating the status of its own operation and not implicitly the stream's final state.

As you can see I'm pretty confused, and I fear that I've made you confused as well (my colleagues consider me an expert in that field). BUT my final suggestion is that the state "_eof" should be set when the next character to be read is <end-of-file> (like in PASCAL), and then "get" shouldn't return a stream but an integer indicating the success of the operation alone. This implies minimal rewriting of the complex input operations (but do you mind checking them out anyway) and nothing contradicts Stroustrup.

Anyway, you may have better reasons for doing something else, since you're much more informed about the various versions and standards of C++ than I am (I know only Stroustrup's book and your source code).

Please mail me as soon as you make a move, because I intend to make a "by hand patch" as soon as you have found a solution. If you could give me a reason for your move as well, I will appreciate it very much (so that I can stop bothering about this).

Thanks in advance

Jesper Jorgensen

PS: Where is the istream member function (user defined conversion)

      istream::operator int()

that returns the test result (state == _good) for an argument stream ??? (See sections 6.3.2 and 8.4.2 in Stroustrup)
dl@G.OSWEGO.EDU (Doug Lea) (10/17/89)
Thanks for the helpful bug report. The problem was that several of the istream op >> functions used the construct

      while (good()) { get(ch) ... }

when instead they should have done

      if (good()) { while (get(ch)) { ... } ... }

Sorry for the slip. Streams DO use the unix convention that you don't know if you've reached EOF until you've tried to read a char and failed, but this behavior is localized to get(char).

These are all fixed for the next release (1.36.0), which, at long last, should be out within a few days.

AT&T 2.0-compatible iostream classes are in the works, but won't make it into 1.36.0. They should be ready within a month or so.

-Doug