EGNILGES@pucc.Princeton.EDU (Ed Nilges) (06/17/91)
According to the description of the standard fgets library function
(reference 1), you are not guaranteed a newline at the end of every
line... that is, you'll get one at the end of the LAST line (or is
it the last-but-one line, such that the last line is zero length?
enquiring minds want to know) only if it's there in the file.

I guess it's the old IBMer in me, who wants the end of a line to
be the Edge of the World, but this seems a tad bogus, especially
if one is writing a lexical analyser where such issues are
important.  Is there a true line reader in C?  One that would
slap on an end of line at the end of the last line if it needed
it?
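A minimal sketch of the behaviour in question, reading from stdin and
reporting every line that fgets hands back without a trailing newline:

    /* sketch: report lines that fgets returns without a '\n' */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[256];
        long n = 0;

        while (fgets(line, sizeof line, stdin) != NULL) {
            size_t len = strlen(line);

            n++;
            if (len == 0 || line[len - 1] != '\n')
                printf("line %ld has no trailing newline\n", n);
        }
        return 0;
    }

(A line longer than the buffer shows up the same way, of course, since
fgets stops after sizeof line - 1 characters.)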
bhoughto@pima.intel.com (Blair P. Houghton) (06/17/91)
In article <12847@pucc.Princeton.EDU> EGNILGES@pucc.Princeton.EDU writes:
>According to the description of the standard fgets library function
>(reference 1), you are not guaranteed a newline at the end of every
>line... that is, you'll get one at the end of the LAST line (or is
>it the last-but-one line, such that the last line is zero length?
>enquiring minds want to know) only if it's there in the file.
>I guess it's the old IBMer in me, who wants the end of a line to
>be the Edge of the World, but this seems a tad bogus, especially
>if one is writing a lexical analyser where such issues are
>important.  Is there a true line reader in C?  One that would
>slap on an end of line at the end of the last line if it needed
>it?

As confusing as that was, I think I got it.

Yes, fgets at eof may get a line with no newline.  Hence:

    while ( fgets(s, sizeof s, stream) )
        /* not eof */
        process(s);

    /* reached iff eof */
    if ( strlen(s) != 0 ) {
        /* there's something left to process */
        if ( s[strlen(s) - 1] != '\n' )
            /* it has no newline */
            strcat(s,"\n");
        process(s);
    }

But notice also that the size of the gotten string is limited to the
number of chars specified in the second argument to fgets.  I.e., fgets
may also get a line with no newline when that line is longer than the
length you desire.  If the line is actually longer than that, the
balance will be read on the next call to fgets (possibly also without a
newline).

Why?  Because fgets' responsibility is to fill an array with bytes, not
to alter them.  It should be the programmer's responsibility to
maintain the semantics of input data.

                --Blair
                  "process("what next?")"
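A sketch (not from the post above) of one way to tell those two
no-newline cases apart: ask feof after the short read.

    /* sketch: separating "line longer than the buffer" from
     * "last line of the file had no newline" */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char s[16];                /* deliberately tiny buffer */

        while (fgets(s, sizeof s, stdin) != NULL) {
            size_t len = strlen(s);

            if (len > 0 && s[len - 1] == '\n')
                printf("whole line: %s", s);
            else if (feof(stdin))
                printf("final piece, file ends without a newline: %s\n", s);
            else
                printf("piece of a long line, more to come: %s\n", s);
        }
        return 0;
    }

(A line that exactly fills the buffer right at end-of-file still gets
reported as a long-line piece; the next call then simply returns NULL.)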
kers@hplb.hpl.hp.com (Chris Dollin) (06/17/91)
Ed Nilges says (about fgets and the optional newline):

   be the Edge of the World, but this seems a tad bogus, especially
   if one is writing a lexical analyser where such issues are
   important.

If I were writing a lexical analyser in C, I certainly would not first
read in the entire line, not even with fgets; I'd read in characters as
required.  (How big a buffer should I allocate for fgets?  What do I do
on line overflow?  These are questions I wish to unask.)  Is the
overhead of reading characters with fgetc really so large?

(I suppose if the lexis is suitably bizarre, you may need lots of
putback, and being able to just backbump the line index is easy.  I
find it a crying shame that stdio doesn't mandate arbitrary putback.
Still, it's not as bad as Lisp - at least C has an excuse.)
--
Regards,                ``GC's should take less than 0.1 second''
Chris Dollin.
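As an illustration of the character-at-a-time style, a small sketch
that scans unsigned decimal numbers with getc and the single character
of putback that ungetc does guarantee (the token grammar is invented
for the example):

    /* sketch: scan unsigned decimal numbers with getc and one ungetc */
    #include <stdio.h>
    #include <ctype.h>

    /* read the next number from fp into *out; return 0 at end of input */
    static int next_number(FILE *fp, long *out)
    {
        int c;

        do {
            c = getc(fp);
            if (c == EOF)
                return 0;
        } while (!isdigit(c));

        *out = 0;
        while (c != EOF && isdigit(c)) {
            *out = *out * 10 + (c - '0');
            c = getc(fp);
        }
        if (c != EOF)
            ungetc(c, fp);    /* one character of putback, as guaranteed */
        return 1;
    }

    int main(void)
    {
        long v;

        while (next_number(stdin, &v))
            printf("%ld\n", v);
        return 0;
    }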
datangua@watmath.waterloo.edu (David Tanguay) (06/17/91)
In article <4739@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
>    while ( fgets(s, sizeof s, stream) )
>        /* not eof */
>        process(s);
>
>    /* reached iff eof */
>    if ( strlen(s) != 0 ) {
>        /* there's something left to process */
>        if ( s[strlen(s) - 1] != '\n' )
>            /* it has no newline */
>            strcat(s,"\n");
>        process(s);
>    }

Huh?  4.9.7.2: "If end-of-file is encountered and no characters have
been read into the array, the contents of the array remain unchanged
and a null pointer is returned."

fgets does not return NULL for eof when there are characters read, so
(barring I/O errors) the above code will process the last "line" twice.
--
David Tanguay       datanguay@watmath.waterloo.edu       Thinkage, Ltd.
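One way to avoid that double processing, sketched here rather than
taken from the thread, is to do the newline check inside the loop,
since the loop only ends once fgets has read nothing at all:

    /* sketch: supply the missing final newline inside the read loop */
    #include <stdio.h>
    #include <string.h>

    static void process(const char *line)
    {
        fputs(line, stdout);
    }

    int main(void)
    {
        char s[BUFSIZ];

        while (fgets(s, sizeof s, stdin) != NULL) {
            size_t len = strlen(s);

            if ((len == 0 || s[len - 1] != '\n') && len + 1 < sizeof s)
                strcat(s, "\n");    /* no newline came in; add one */
            process(s);
        }
        return 0;
    }

(Note that this also tacks a newline onto the fragment of an over-long
line; with a fixed-size buffer that case still needs separate handling.)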
EGNILGES@pucc.Princeton.EDU (Ed Nilges) (06/17/91)
In article <4739@inews.intel.com>, bhoughto@pima.intel.com (Blair P. Houghton) writes:
>
>As confusing as that was, I think I got it.

The only confusion resulted from the misuse of a line reader in a
lexical analyzer, which is a character-by-character sort of thing.  A
minor source of confusion was the omission of the reference.  It was
ANSI C: A Lexical Guide, published by the Mark Williams Company.

I took the advice of Mr. Ken Yap down in Australia at CSIRO, and this
morning completely altered the lexical analyzer to use getc and ungetc.
It considerably simplified the code.  The use of a line reader in the
first place was the unfortunate byproduct of having an IBM, unit-record
background.  Yes, there may be a performance penalty on IBM mainframe C
implementations, in which case getc and ungetc can be hand-rolled
around a unit-record reader for efficiency.

No, I don't want to use lexx.  I do not like the code it generates and
(once I get rid of these subtle tendencies to think in IBMerese) I
believe I can write more efficient code for the language I am
lexxicating.

Thanks to Mr. Ken Yap and the rest of the gang on comp.lang.c for your
patience.
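For what it's worth, a rough sketch of what hand-rolling getc and
ungetc around a record reader might look like; fgets stands in here for
whatever unit-record interface the host system actually provides, and
the names are invented for the example:

    /* sketch: getc/ungetc-style access layered over a record reader;
     * fgets stands in for the real unit-record interface */
    #include <stdio.h>
    #include <string.h>

    #define RECLEN 256

    struct recstream {
        FILE *fp;
        char  rec[RECLEN];    /* current record */
        int   len;            /* characters in rec */
        int   pos;            /* next character to hand out */
        int   pushed;         /* one pushed-back character, or EOF */
    };

    static void rs_init(struct recstream *rs, FILE *fp)
    {
        rs->fp = fp;
        rs->len = rs->pos = 0;
        rs->pushed = EOF;
    }

    static int rs_getc(struct recstream *rs)
    {
        int c;

        if (rs->pushed != EOF) {
            c = rs->pushed;
            rs->pushed = EOF;
            return c;
        }
        while (rs->pos >= rs->len) {              /* refill from a record */
            if (fgets(rs->rec, sizeof rs->rec, rs->fp) == NULL)
                return EOF;
            rs->len = (int)strlen(rs->rec);
            rs->pos = 0;
        }
        return (unsigned char)rs->rec[rs->pos++];
    }

    static int rs_ungetc(struct recstream *rs, int c)
    {
        return rs->pushed = c;                    /* one level, like stdio */
    }

    int main(void)
    {
        struct recstream rs;
        int c;

        rs_init(&rs, stdin);
        c = rs_getc(&rs);
        if (c != EOF)
            rs_ungetc(&rs, c);        /* peek, then put it back */
        while ((c = rs_getc(&rs)) != EOF)
            putchar(c);
        return 0;
    }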
bhoughto@pima.intel.com (Blair P. Houghton) (06/18/91)
In article <1991Jun17.120927.3802@watmath.waterloo.edu> datangua@watmath.waterloo.edu (David Tanguay) writes:
>In article <4739@inews.intel.com> bhoughto@pima.intel.com (Blair P. Houghton) writes:
> >    while ( fgets(s, sizeof s, stream) )
> >        process(s);
> >    if ( strlen(s) != 0 ) {
> >        /* there's something left to process */
>
>Huh?  4.9.7.2: "If end-of-file is encountered and no characters have
>been read [...etc...]

Urp!  Remind me:

    a.  To test code before I post it.
    b.  Not to do it this way...

Spoiler alert.  If there's anything you don't want to know, don't turn
the page...

                --Blair
                  "Uh, er, uh, Bob made me do it.  Yeah, that's it.
                   Everyone thinks that Agent Cooper is the only one
                   he's infested, but he's got me in his clutches now,
                   too.  Yeah.  That's the ticket..."
robert@isgtec.UUCP (Robert Osborne) (06/18/91)
In article <4739@inews.intel.com>, bhoughto@pima.intel.com (Blair P. Houghton) writes:
> Yes, fgets at eof may get a line with no newline.
>
> Hence:
>
>    while ( fgets(s, sizeof s, stream) )
>        /* not eof */
>        process(s);

Well you really want...

    if( fgets(s, sizeof s, stream) ) {
        do {
            process(s);
        } while ( fgets(s, sizeof s, stream) );
    }

or something similar.

>    /* reached iff eof */
>    if ( strlen(s) != 0 ) {
>        /* there's something left to process */
>        if ( s[strlen(s) - 1] != '\n' )
>            /* it has no newline */
>            strcat(s,"\n");
>        process(s);
>    }

This has two str??? calls too many...

    /* reached iff eof */
    if ( (s_length = strlen(s)) != 0 ) {
        /* there's something left to process */
        if ( s[s_length - 1] != '\n' ) {
            /* it has no newline */
            s[s_length] = '\n';
            s[s_length + 1] = '\0';
        }
        process(s);
    }

The str??? calls are OFTEN used in this manner, and this is a very
common optimization that can be made in string handlers.  I once cut
the running time of a key piece of UI functionality from an
intolerable >10 seconds to an almost bearable <5 seconds by performing
this kind of "optimization".

> But notice also that the size of the gotten string is
> limited to the number of chars specified in the second
> argument to fgets.

And this would be intolerable in a parser.  I'm surprised Blair didn't
mention this (although he did solve the problem asked).

Rob.
--
Robert A. Osborne   ...uunet!utai!lsuc!isgtec!robert or robert@isgtec.uucp
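For completeness, a rough sketch of the "true line reader" the original
question asked for: it grows its buffer as needed, so there is no fixed
line-length limit, and it guarantees that the returned line ends in a
newline even when the file's last line does not.  The function name,
the starting size, and the doubling policy are just choices made for
this sketch.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* sketch: a line reader with no fixed length limit that always
     * returns a line ending in '\n'; returns NULL at end of file (or
     * if allocation fails), and the caller frees the result */
    char *read_line(FILE *fp)
    {
        size_t cap = 128, len = 0;
        char *buf = malloc(cap);

        if (buf == NULL)
            return NULL;

        while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
            len += strlen(buf + len);
            if (len > 0 && buf[len - 1] == '\n')
                return buf;                    /* a complete line */
            if (len + 2 > cap) {               /* room for '\n' and '\0' */
                char *grown = realloc(buf, cap * 2);

                if (grown == NULL) {
                    free(buf);
                    return NULL;
                }
                buf = grown;
                cap *= 2;
            }
        }

        if (len == 0) {                        /* nothing left: genuine eof */
            free(buf);
            return NULL;
        }
        buf[len] = '\n';                       /* last line had no newline */
        buf[len + 1] = '\0';
        return buf;
    }

    int main(void)
    {
        char *line;

        while ((line = read_line(stdin)) != NULL) {
            fputs(line, stdout);
            free(line);
        }
        return 0;
    }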