[net.lang.c] ugliness in scanf

cdl@mplvax.UUCP (Carl Lowenstein) (05/06/85)

The same ugly undocumented behavior has shown up with the stdio library
using 3 C compilers on 3 operating systems (DECUS C on RT-11, cc on SysV
(3b2), and cc on 4.2BSD (vax)).  First to quote the documentation:
"These functions return . . . a short count for . . . illegal data items"
(SysV).  ". . . if conversion was intended, it was frustrated by an
inappropriate character in the input." (4.2BSD).

Ok, but the character pointer is never advanced past that inappropriate
character, so the poor user's program is either stuck in an infinite
loop or else it has to advance the pointer to get going again.
Surely others have noticed this in the past.  

Below is a little test program which shows a workaround.  Without the
getchar(), it will loop until you get tired of watching it.  Try it with
bad octal digits like 8,9,a,b . . .
/*-------------------------------------------------------------------------*/
/*	scanft.c	*/
/*
 *	look at bug in scanf
 */
#include <stdio.h>
#include <stdlib.h>	/* exit()	*/

int main(void)
{
	int i, k;

	for (;;) {
		printf("\n number: ");
		k = scanf("%o", &i);
		printf("scanf returns %d\n",k);
		if (k == EOF) break;
		if (k == 0){
			i = getchar();	/* flush a character	*/
			printf("	choked on '%c'\n",i);
			continue;	/* go back and ask again	*/
		}
		printf("value = %o\n", i);
	} 
	exit(0);
}
/*-------------------------------------------------------------------------*/

-- 
	carl lowenstein		marine physical lab	u.c. san diego
	{ihnp4|decvax|akgua|dcdwest|ucbvax}	!sdcsvax!mplvax!cdl

gwyn@Brl.ARPA (VLD/VMB) (05/08/85)

That's not a bug, it's a feature.  How else would you be able
to determine what comes next when a scanf stops prematurely?
If it ate the "failing" character, you could never see what it
was.  I think the routine was designed on the assumption that
the programmer would not be so stupid as to keep trying to
scan a chunk of input over & over with the same failing format.
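
The point about the failing character staying in the stream can be
sketched as below.  This is only an illustration, not code from the
thread: read_octal() and its parameters are made-up names, and throwing
away the rest of the line after a mismatch is one assumed recovery
policy among several.

```c
#include <stdio.h>

/*
 * Hypothetical helper: try to read one octal number from `in`.
 * Returns 1 on success, 0 on a matching failure, EOF at end of input.
 * On a matching failure the offending character is still unread, so
 * we can look at it (stored in *bad) before discarding the rest of
 * that input line.
 */
int read_octal(FILE *in, unsigned *value, int *bad)
{
	int k = fscanf(in, "%o", value);
	int c;

	if (k == 0) {
		*bad = getc(in);	/* the character scanf left behind */
		c = *bad;
		while (c != '\n' && c != EOF)
			c = getc(in);	/* skip to end of line */
	}
	return k;
}
```

A caller would loop on read_octal(stdin, &v, &bad), report `bad` when
the result is 0, and stop when the result is EOF.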

cdl@mplvax.UUCP (Carl Lowenstein) (05/09/85)

In article <10496@brl-tgr.ARPA> gwyn@Brl.ARPA (VLD/VMB) writes:
>That's not a bug, it's a feature.  How else would you be able
>to determine what comes next when a scanf stops prematurely?
>If it ate the "failing" character, you could never see what it
>was.  I think the routine was designed on the assumption that
>the programmer would not be so stupid as to keep trying to
>scan a chunk of input over & over with the same failing format.

*mild flame*

This programmer is so stupid as to expect to find the behavior
of scanf documented in the manual.

*unflame*

-- 
	carl lowenstein		marine physical lab	u.c. san diego
	{ihnp4|decvax|akgua|dcdwest|ucbvax}	!sdcsvax!mplvax!cdl

zben@umd5.UUCP (05/12/85)

In article <190@mplvax.UUCP> cdl@mplvax.UUCP (Carl Lowenstein) writes:
>This programmer is so stupid as to expect to find the behavior
>of scanf documented in the manual.

Ye Gods! Expect the behavior of system primitives to be DOCUMENTED in
the MANUAL??  Why, why, that's as bad as expecting meaningful diagnostics
from the system language compiler!  "Error in conditional" indeed...

Clearly this poor person is from a 'dinosaur' environment, probably an
IBM 370 or Univac 1100 system, where people actually take more than 10
seconds to document what they have done, and where you have a ghost of a
chance of finding out **ANYTHING** from the manuals, as opposed to having
to prostrate yourself before a Unix Guru (read "high priest") to get the
real scoop...

Clearly I'm more than a little burned by being called a 'high priest' for
merely spending 15 years reading Univac manuals and system code, to get to
the point where I can *answer* questions from users too *lazy* to *read*
the manuals...  Still, this sort of silliness is exactly why I have a
hard time believing that Unix and C are "for real".

At this point I find Unix and C to be at the halfway point in the reality
spectrum between my real Univac 1100 work and trying to do systems programs
in Applesoft Basic...
-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

matt@oddjob.UUCP (Matt Crawford) (05/12/85)

In article <190@mplvax.UUCP> cdl@mplvax.UUCP (Carl Lowenstein) writes:
>In article <10496@brl-tgr.ARPA> gwyn@Brl.ARPA (VLD/VMB) writes:
>>If it ate the "failing" character, you could never see what it
>>was.  I think the routine was designed on the assumption that
>>the programmer would not be so stupid as to keep trying to
>>scan a chunk of input over & over with the same failing format.
>
>*mild flame*
>
>This programmer is so stupid as to expect to find the behavior
>of scanf documented in the manual.
>
>*unflame*
>	carl lowenstein		marine physical lab	u.c. san diego

THIS programmer is not too arrogant to open the manual before telling
someone what's not in it:

SCANF(3S)	    UNIX Programmer's Manual		SCANF(3S)

     For example, .....

	  int i; float x; char name[50];
	  scanf("%2d%f%*d%[1234567890]", &i, &x, name);

     with input

	  56789	0123 56a72

     will assign 56 to i, 789.0	to x, skip `0123', and place the
     string `56\0' in name.  The next call to getchar will return
     `a'.                    ------------------------------------
     ----
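
The manual's example can be tried against a string with sscanf, as in
the sketch below.  Two things here are assumptions and not part of the
manual text: a space is added before %[ (in standard C, %[ does not
skip leading white space, so the space is needed for the same result),
and a field width of 49 is added to protect name[].  The function name
manual_example() and the use of %n to find the stopping point are also
just for illustration.

```c
#include <stdio.h>

/*
 * Hypothetical wrapper around the manual's example.
 * Returns the number of items assigned; when all three are assigned,
 * *next receives the first character that was left unconsumed.
 */
int manual_example(const char *input, int *i, float *x,
		   char name[50], char *next)
{
	int pos = 0;
	int k = sscanf(input, "%2d%f%*d %49[0123456789]%n",
		       i, x, name, &pos);

	/* %n is only reached if every earlier directive matched */
	*next = (k == 3) ? input[pos] : '\0';
	return k;
}
```

With the input "56789 0123 56a72" this assigns 56 to i, 789.0 to x,
"56" to name, and reports 'a' as the next character, which matches
what the manual says getchar would return.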


If you make a mistake you can (a) admit it, (b) shut up, or (c)
prolong the argument and provide more entertainment.  I will choose
course (a) and admit that I am making a mistake by posting anything
at all on this subject.
_____________________________________________________
Matt		University	crawford@anl-mcs.arpa
Crawford	of Chicago	ihnp4!oddjob!matt

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (05/12/85)

> Clearly I'm more than a little burned by being called a 'high priest' for
> merely spending 15 years reading Univac manuals and system code, to get to
> the point where I can *answer* questions from users too *lazy* to *read*
> the manuals...  Still, this sort of silliness is exactly why I have a
> hard time believing that Unix and C are "for real".

UNIX was developed by and for intelligent programmers.

geoff@burl.UUCP (geoff) (05/13/85)

> >the programmer would not be so stupid as to keep trying to
> >scan a chunk of input over & over with the same failing format.
> 
> *mild flame*
> 
> This programmer is so stupid as to expect to find the behavior
> of scanf documented in the manual.
> 
> *unflame*
> 
> -- 
> 	carl lowenstein		marine physical lab	u.c. san diego
> 	{ihnp4|decvax|akgua|dcdwest|ucbvax}	!sdcsvax!mplvax!cdl

how about the bottom of page 2 of scanf documentation (V5.2)--

"Scanf conversion terminates at EOF, at the end of the control string, or
when an input character conflicts with the control string.  In the latter
case, the offending character is left unread in the input stream."

I can only surmise that you have a different version of the manual --
it does seem quite clear.
	geoff sherwood

jack@boring.UUCP (05/13/85)

Ahh, the joys of scanf.....

Something I've tried about every year in the last decade but
haven't got to work on any machine is the following :

main() {
    char buf[64];

    printf("Gimme string -");
    scanf("%s\n", buf);
    ...
I tried leaving the \n out, putting a space in its place,
putting a space before the %s, everything.
Never, though, have I succeeded in reading the *first* string from
stdin with scanf (the rest is no problem). So, every time I need
to do this, I fiddle with scanf for an hour or so, and then
replace the scanf with an fgets() or gets().

Question: Am I asking impossible things from scanf, or am I just
soooooo very stupid that I haven't found out how to do this in
many many years????
(I would prefer answers in the form 'it is impossible',
but I'll settle for 'you are stupid', if accompanied by an explanation
*why* I am stupid).

Note that this is about reading the *first* string from stdin.
After that, things are fine, as long as you're careful where
to scan the \n's, etc.
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (05/14/85)

> Something I've tried about every year in the last decade but
> haven't got to work on any machine is the following :
> 
> main() {
>     char buf[64];
> 
>     printf("Gimme string -");
>     scanf("%s\n", buf);
>     ...

Try:
	#include <stdio.h>
	/*ARGSUSED*/
	main(argc, argv)
		char	*argv[];
	{
		char	buf[64];

		(void)printf("Gimme string -");
		(void)scanf("%[^\n]", buf);
		(void)getchar();	/* eat NL */
		...
	}

It is important not to begin or end the format string with whitespace,
since that causes ALL whitespace in the input stream at that point to
be skipped.  In particular, trying to consume the newline with the
format statement will cause you to have to type extra stuff before the
first line scan is considered complete, and leading whitespace on the
second line will be eaten.
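
The effect of trailing whitespace in the format can be seen without a
terminal by scanning a string and asking %n where the scan stopped.
This is a sketch only; the two function names are invented for the
comparison, and the width of 63 is an added safeguard for a 64-byte
buffer.

```c
#include <stdio.h>

/* Offset at which "%s" alone stops, or -1 if nothing matched. */
int offset_plain(const char *input)
{
	char word[64];
	int pos = -1;

	if (sscanf(input, "%63s%n", word, &pos) != 1)
		return -1;
	return pos;
}

/* Same scan, but with a newline at the end of the format. */
int offset_trailing_newline(const char *input)
{
	char word[64];
	int pos = -1;

	/* "\n" in a format means "skip ANY run of whitespace" */
	if (sscanf(input, "%63s\n%n", word, &pos) != 1)
		return -1;
	return pos;
}
```

For the input "abc\n   def", offset_plain() stops at offset 3 (the
newline), while offset_trailing_newline() stops at offset 7 (the 'd'),
having eaten the newline and the leading blanks of the second line.
On a terminal the same rule means scanf cannot return until it has
seen a non-whitespace character, hence the extra typing.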

By the way, what happens if someone types a very long line in response
to your prompt?  (This sort of thing caused some really bad security
loopholes in older UNIX systems.)  The safe way to input a line is
with fgets() (NOT gets()).
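
A minimal sketch of the fgets() approach follows.  The helper name
read_line() and the policy of truncating an overlong line and throwing
away the excess are assumptions for the example; a real program might
prefer to report the overflow.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one line from `in` into buf (at most
 * size-1 characters), without the trailing newline.
 * Returns 1 if a line was read, 0 at end of input or on error.
 * Characters that do not fit are discarded up to the newline.
 */
int read_line(FILE *in, char *buf, size_t size)
{
	size_t len;
	int c;

	if (size == 0 || fgets(buf, (int)size, in) == NULL)
		return 0;

	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n') {
		buf[len - 1] = '\0';	/* whole line fit; drop NL */
	} else {
		/* line was too long (or ended at EOF): eat the rest */
		while ((c = getc(in)) != '\n' && c != EOF)
			;
	}
	return 1;
}
```

Unlike gets(), fgets() is told the buffer size, so a very long line
cannot overrun buf.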

geoff@utcs.UUCP (Geoff Collyer) (05/14/85)

In article <10600@brl-tgr.ARPA> gwyn@brl-tgr.ARPA (Doug Gwyn) writes:
>> ... I can *answer* questions from users too *lazy* to *read*
>> the manuals...
>
>UNIX was developed by and for intelligent programmers.

printf(3S) and scanf(3S) are incomplete and slippery specifications.
If you doubt this, try writing the code from the manual pages.  The
current manual pages (and source code?) seem to be descended from the
v6 Portable C library (-lp) and have never been substantially
modified.

The v7 printf(3S) implies that one can supply a format specifier of %lu
to print an unsigned long int.  The v7 C compiler doesn't support
unsigned longs, yet %lu will print a long int as if it were unsigned.
It is possible to express this in C (assuming twos-complement
representation) by heroic measures.

What should printf do when given the format specifier %017s and a
string shorter than 17 characters?  I read printf(3S) as saying that
printf will pad with zeroes, though the v7 printf (at least) pads with
blanks and Dennis Ritchie has argued that this is desirable behaviour.

Various releases of System III or V don't support zero padding when the field
width begins with a zero.  AT&T has converted this incompatible
behaviour from a bug into a feature by documenting it (at least in
System V).  To date, ANSI has wisely sided against AT&T in this case.

scanf(3S) implies that inappropriate characters in the input will be
left unread, but this is not possible, given stdio's (zero or) one
character of pushback, for pathological input such as 3.4e-z under %f;
the best one can do is to push back the z, though all of e-z should be
pushed-back.
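
The pushback limit only bites when scanning a stream.  If the text is
already in memory, nothing needs to be pushed back, and strtod() can
report exactly where the number ended.  The sketch below swaps in
strtod() for %f purely as an illustration; parse_double() is a made-up
name, not anything from stdio.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a double from the start of s.
 * Returns a pointer to the first unconsumed character, or NULL if
 * s does not begin with a number.
 */
const char *parse_double(const char *s, double *out)
{
	char *end;
	double v = strtod(s, &end);

	if (end == s)
		return NULL;	/* no conversion performed */
	*out = v;
	return end;
}
```

Given "3.4e-z", strtod() converts 3.4 and leaves the returned pointer
at the 'e', so all of "e-z" is still available to the caller.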

At a quick glance, the draft ANSI C library write-ups for printf and
scanf seem better than the UNIX manual pages, though still not as
explicit as I would like.
-- 
"All I'm after is just a *mediocre* brain,
 something like the president of the AT&T Company." - Alan Turing

zben@umd5.UUCP (05/15/85)

In article <714@oddjob.UUCP> matt@oddjob.UUCP (Matt Crawford) writes:
>In article <190@mplvax.UUCP> cdl@mplvax.UUCP (Carl Lowenstein) writes:
>>In article <10496@brl-tgr.ARPA> gwyn@Brl.ARPA (VLD/VMB) writes:
>>>
>>>If it ate the "failing" character, you could never see what it
>>>was.  I think the routine was designed on the assumption that
>>>the programmer would not be so stupid as to keep trying to
>>>scan a chunk of input over & over with the same failing format.
>>
>>This programmer is so stupid as to expect to find the behavior
>>of scanf documented in the manual.
>>
>THIS programmer is not too arrogant to open the manual before telling
>someone what's not in it:
>
>SCANF(3S)	    UNIX Programmer's Manual		SCANF(3S)
>
>     For example, .....
>
>	  int i; float x; char name[50];
>	  scanf("%2d%f%*d%[1234567890]", &i, &x, name);
>
>     with input
>
>	  56789	0123 56a72
>
>     will assign 56 to i, 789.0	to x, skip `0123', and place the
>     string `56\0' in name.  The next call to getchar will return
>     `a'.                    ------------------------------------
>     ----
>

The same documentation appears on our 2.9BSD system - I guess it is the same
on 4.xBSD - and yes, a reasonable person should be able, after scratching his
head for a while, to figure out what is happening.

How much time do you waste scratching your head?

The following mail arrived and I think it germane:

-------------------------------------------------------------

I get tired of people saying that UNIX & C are not documented.  There are
a few undocumented features of programs, but they are that way because they
might go away, and shouldn't be used (yet).  E.g., the VPATH variable in
make.  But all the system functions are documented *quite well*.  Take
the scanf manual page:

----
Scanf conversion terminates at EOF, at the end of the control string, or when
an input character conflicts with the control string.  In the latter case, the
offending character is left unread in the input stream.

Scanf returns the number of successfully matched and assigned input items;
this number can be zero in the event of an early conflict between an input
character and the control string.  If the input ends before the first conflict
or conversion, EOF is returned.
----
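
The three kinds of return value described in that excerpt can be seen
with a one-line wrapper around sscanf.  The wrapper name scan_octal()
is invented for the example; %o is used because it is the conversion
from the original test program.

```c
#include <stdio.h>

/*
 * Hypothetical wrapper: scan one octal number from a string and
 * pass back whatever sscanf returns.
 *   1    one item matched and assigned
 *   0    early conflict between the input and the format
 *   EOF  input ended before any conversion or conflict
 */
int scan_octal(const char *text, unsigned *value)
{
	return sscanf(text, "%o", value);
}
```

scan_octal("17", &v) returns 1 and sets v to 017; scan_octal("9", &v)
returns 0 because '9' is not an octal digit; scan_octal("", &v)
returns EOF.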

If that isn't *painfully obvious*, I don't know what is.  Maybe you're using
4.2BSD; if you do, I apologize.  That system is a total hack munged by grad
students and the documentation is even worse.  This excerpt comes from SVR2.
Since it is a production system, it has to be well-documented, and it is.

						Michael Baldwin
						AT&T Bell Labs

-------------------------------------------------------------

Now *this* is adequate documentation...

Re: "high priest" thing.  It's very easy to tell the most vicious form of
Polack joke, until you really become friends with a Pole.  It must similarly
be very easy to eliminate upon "high priests", until you are confronted with
one.  Maturity consists in large measure of doing what is right in preference
to doing what is easy.

Nuf said?

-- 
Ben Cranston  ...{seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben  zben@umd2.ARPA

cdl@mplvax.UUCP (Carl Lowenstein) (05/15/85)

In article <687@burl.UUCP> geoff@burl.UUCP (geoff) writes:
>
>how about the bottom of page 2 of scanf documentation (V5.2)--
>
>"Scanf conversion terminates at EOF, at the end of the control string, or
>when an input character conflicts with the control string.  In the latter
>case, the offending character is left unread in the input stream."
>
>I can only surmise that you have a different version of the manual --
>it does seem quite clear.
>	geoff sherwood

You're right.  It is quite clear.  Unfortunately, it isn't in the 4.2BSD
manual, the v7 manual, or the Decus manual.  Since I have all these and SVR2
too in different places, it's easy to get confused.  I wish I could find
the original stdio document from v6 to see whether that sentence got
dropped along the way, or was recently added to prevent people like me
from provoking discussions unnecessarily.

-- 
	carl lowenstein		marine physical lab	u.c. san diego
	{ihnp4|decvax|akgua|dcdwest|ucbvax}	!sdcsvax!mplvax!cdl

guy@sun.uucp (Guy Harris) (05/15/85)

Now you know why, the few times I've ever used "scanf", I read the string
into a buffer and used "sscanf"...

	Guy Harris
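
The read-a-line-then-sscanf approach might look like the sketch below.
It is an illustration only: next_octal(), the 128-byte buffer, and the
policy of silently skipping bad lines are all assumptions, and a line
longer than the buffer would be seen as more than one line.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read lines from `in` until one starts with an
 * octal number.  Returns 1 and stores the number in *value, or EOF
 * when the input runs out.  *rejected counts the lines skipped.
 * A bad line can never wedge the loop, because fgets() has already
 * consumed it before sscanf() looks at it.
 */
int next_octal(FILE *in, unsigned *value, int *rejected)
{
	char line[128];

	*rejected = 0;
	while (fgets(line, sizeof line, in) != NULL) {
		if (sscanf(line, "%o", value) == 1)
			return 1;
		++*rejected;	/* no number here; try the next line */
	}
	return EOF;
}
```

Since the bad text is sitting in `line`, the caller could also be
handed the whole rejected line for an error message, which is more
than one character of pushback allows.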