mnc@m10ux.UUCP (Michael Condict) (07/13/88)
I'm not sure whether this belongs here or in comp.unix.wizards, but it is probably a problem with most implementations of the C stdio library, so here it is:

Many of you are probably aware of sscanf's bad reputation for execution time, which is surprising given that it does no I/O, right? Wrong! The AT&T Sys V Rel 2 implementation of sscanf (and presumably earlier versions) DOES do I/O, or at least it tries to.

Look at sscanf in scanf.c and at _filbuf in filbuf.c. Note that sscanf fakes up a FILE structure so that getc can be called on the string. It sets the _IOREAD flag in the FILE structure to indicate that the string is read-only, and it sets the fd number to _NFILE to indicate an illegal fd, i.e., that no I/O should be done.

Well, eventually, if getc runs off the end of the buffer while trying to satisfy a scanf format item, as happens during:

    sscanf("1234", "%d", &i);

then _filbuf will be called to refill the buffer. _filbuf does not notice that the _file field of the FILE struct is set to _NFILE, so it actually calls read on the illegal fd. The read returns an error, but only after hundreds or thousands of wasted instructions. This can easily add 20% to your process's CPU time if you are using sscanf repeatedly.

The fix is simple. Insert the following before the test of the _IOREAD flag in _filbuf:

    if (iop->_file >= _NFILE)
        return(EOF);

I've just checked the BSD implementation and it doesn't have this problem, so BSD Vaxen and Suns are probably okay. Amdahl UTS (System V Rel 1) definitely does have the problem.

--
Michael Condict    {ihnp4|vax135|cuae2}!m10ux!mnc
AT&T Bell Labs     (201)582-5911    MH 3B-416
Murray Hill, NJ
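
The refill path and the proposed guard can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the System V source: every toy_ name, the TOY_NFILE value, and the struct layout are hypothetical stand-ins, and the simulated read merely counts how often it is called. It models a getc-style reader that runs off the end of a string buffer, with and without the descriptor check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NFILE 20    /* hypothetical stand-in for _NFILE */
#define TOY_EOF   (-1)

/* Toy stand-in for the stdio FILE struct; field names are illustrative. */
struct toy_file {
    const unsigned char *ptr;   /* next character to hand out */
    int cnt;                    /* characters left in the buffer */
    int file;                   /* descriptor; TOY_NFILE marks a string stream */
};

static int toy_read_calls;      /* number of simulated read() calls */

/* Simulated read(): always fails, as a read on an illegal fd would. */
static int toy_read(int fd)
{
    (void)fd;
    toy_read_calls++;
    return -1;
}

/* Refill routine. When guarded is nonzero, apply the proposed check first. */
static int toy_filbuf(struct toy_file *iop, int guarded)
{
    if (guarded && iop->file >= TOY_NFILE)
        return TOY_EOF;         /* string stream: nothing to refill */
    (void)toy_read(iop->file);  /* the wasted call the post describes */
    return TOY_EOF;
}

/* getc-style accessor: hand out a byte, or call the refill routine. */
static int toy_getc(struct toy_file *iop, int guarded)
{
    if (--iop->cnt < 0)
        return toy_filbuf(iop, guarded);
    return *iop->ptr++;
}

/*
 * Parse an unsigned decimal number from s the way a "%d" conversion would
 * consume characters, store it in *value, and return how many simulated
 * read() calls were made along the way.
 */
int toy_count_reads(const char *s, int guarded, int *value)
{
    struct toy_file f;
    int c, n = 0;

    assert(s != NULL && value != NULL);
    f.ptr = (const unsigned char *)s;
    f.cnt = (int)strlen(s);
    f.file = TOY_NFILE;
    toy_read_calls = 0;

    /* Keep reading digits; the scan only stops at a non-digit or EOF. */
    while ((c = toy_getc(&f, guarded)) >= '0' && c <= '9')
        n = n * 10 + (c - '0');

    *value = n;
    return toy_read_calls;
}
```

Calling toy_count_reads("1234", 0, &v) exercises the unguarded path, where the scan has to hit end-of-buffer to learn the number is finished and therefore triggers one simulated read. Passing 1 as the second argument applies the check and avoids the call, while the parsed value is the same either way.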