[comp.lang.c] In defense of scanf

bph@buengc.BU.EDU (Blair P. Houghton) (06/15/89)

In article <4529@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>scs@adam.pika.mit.edu (Steve Summit) writes:
>>It is only a miserable problem when scanf
>>is being used for interactive user input, which is what everybody
>>uses it for.

"Eeep", and "yoicks!"

>Anyone using scanf directly for interactive input... or for any input at
>all... should have their head examined.
>
>The only really safe way to use scanf() without freaking out the casual
>user of your code is to do something like this:
>
>	fgets(buffer, sizeof buffer, stdin);
>	sscanf(buffer, fmt, args...);

I'll go along with the "don't use it for interactive input" idea, but
not the "for any input at all"...

When filtering tabular data from files, or dealing in a situation where
a precise syntax is necessary, the `fgets(..); sscanf(..)' doublet just
adds uncertainty and complexity to a simple problem to which scanf is
suited ideally.  If there is an error reading something in that case,
you usually force a barf.  It's irrelevant whether the input gets
discarded.

The point is, don't reject scanf() just because it's unsuited to a problem
you aren't solving.
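
For readers weighing the two approaches, here is a minimal sketch of the `fgets(..); sscanf(..)' doublet quoted above, applied to tabular input.  The record layout ("name count value"), the struct, and the function names are assumptions made up for illustration; nothing in the thread specifies them.

```c
#include <stdio.h>

/* Hypothetical record layout for illustration: "name count value". */
struct record {
    char   name[32];
    int    count;
    double value;
};

/* Parse one already-read line; returns 1 on success, 0 on a malformed line. */
int parse_record(const char *line, struct record *r)
{
    /* %31s bounds the string field; sscanf returns the number of
       conversions that succeeded, so anything but 3 is an error. */
    return sscanf(line, "%31s %d %lf", r->name, &r->count, &r->value) == 3;
}

/* Read records line by line; returns the number of good lines and
   stores the number of rejected lines in *bad. */
int read_records(FILE *fp, struct record *out, int max, int *bad)
{
    char buffer[256];
    int n = 0;

    *bad = 0;
    while (n < max && fgets(buffer, sizeof buffer, fp) != NULL) {
        if (parse_record(buffer, &out[n]))
            n++;
        else
            (*bad)++;   /* the bad line was consumed whole by fgets */
    }
    return n;
}
```

Because fgets consumes exactly one line per call, a malformed line costs one record rather than an indeterminate amount of input.  Whether that matters for a file you intend to reject on the first error is the point under debate.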

				--Blair
				  "We pay csh for used car's..."

peter@ficc.uu.net (Peter da Silva) (06/15/89)

In article <3145@buengc.BU.EDU>, bph@buengc.BU.EDU (Blair P. Houghton) writes:
> When filtering tabular data from files, or dealing in a situation where
> a precise syntax is necessary, the `fgets(..); sscanf(..)' doublet just
> adds uncertainty and complexity to a simple problem to which scanf is
> suited ideally.

Eeep and Yoicks yourself. I suspect this is another religious issue, but given
scanf's habit of trashing indeterminate amounts of input and ignoring newlines
if you have anything wrong with your format string, well... precision and
scanf just don't belong in the same sentence.

I tend to stick with strspn() and strtok(), myself.
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

mpl@cbnewsl.ATT.COM (michael.p.lindner) (06/16/89)

In article <4563@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> In article <3145@buengc.BU.EDU>, bph@buengc.BU.EDU (Blair P. Houghton) writes:
> > When filtering tabular data from files, ....
> > .... scanf is
> > suited ideally.
>
> I tend to stick with strspn() and strtok(), myself.
>
> Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Hope you never have to do anything complex.  If you call strtok on a string
in the middle of a strtok of another string it trashes its state information
on the first string (a little known feature), which can cause extremely
elusive bugs.  My previous project got stuck with this when we started using
some library code which called it.  For this reason, I avoid strtok like the
plague in all but the simplest applications.
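
One way around the shared-state problem is a tokenizer that keeps its position in a variable owned by the caller.  The sketch below is a hypothetical helper (next_token is not a standard function) built only on strspn() and strcspn(); count_words is an illustrative caller that nests one scan inside another.

```c
#include <string.h>

/* Hypothetical helper, not a standard function: returns the next token
   found at *cursor, or NULL when none remain.  All scanning state lives
   in the caller's cursor, so two scans can be interleaved safely.
   Like strtok, it writes '\0' into the string and treats a run of
   delimiters as a single separator. */
char *next_token(char **cursor, const char *delims)
{
    char *start, *end;

    if (*cursor == NULL)
        return NULL;
    start = *cursor + strspn(*cursor, delims);   /* skip leading delimiters */
    if (*start == '\0') {
        *cursor = NULL;
        return NULL;
    }
    end = start + strcspn(start, delims);        /* find the end of the token */
    if (*end == '\0') {
        *cursor = NULL;                          /* that was the last token */
    } else {
        *end = '\0';
        *cursor = end + 1;
    }
    return start;
}

/* Count the words in every ';'-separated clause, nesting one scan
   inside another, which is the case that trips up strtok. */
int count_words(char *text)
{
    char *outer = text, *clause, *inner;
    int words = 0;

    while ((clause = next_token(&outer, ";")) != NULL) {
        inner = clause;
        while (next_token(&inner, " ") != NULL)
            words++;
    }
    return words;
}
```

Library code that uses a helper like this cannot disturb a scan in progress in its caller, because there is no hidden static pointer to overwrite.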

Mike Lindner
attunix!mpl

daveh@marob.masa.com (Dave Hammond) (06/17/89)

In article <824@cbnewsl.ATT.COM> mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
>> > When filtering tabular data from files, ....
>>
>in the middle of a strtok of another string it trashes its state information
>on the first string (a little known feature), which can cause extremely
>elusive bugs.

Not only that, since it delimits the returned token by replacing the
terminating blank with a null, you are forced to work with a copy
of the input line, if for some reason the complete line must survive
the call to strtok().

I prefer using strpbrk(line, " \t\n") and either copying, or just
peeking at the token, whichever is required.
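
A rough sketch of that approach, assuming whitespace-delimited tokens: peek_token is a hypothetical name, and the strspn() call that skips leading blanks is an addition not mentioned in the post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the first whitespace-delimited token of
   line into out (at most outsize-1 characters) without modifying line.
   Returns the token's full length as found in line. */
size_t peek_token(const char *line, char *out, size_t outsize)
{
    const char *start = line + strspn(line, " \t\n");  /* skip leading blanks */
    const char *end = strpbrk(start, " \t\n");         /* first delimiter, or NULL */
    size_t len = end ? (size_t)(end - start) : strlen(start);
    size_t n;

    if (outsize > 0) {
        n = len < outsize - 1 ? len : outsize - 1;     /* truncate to fit */
        memcpy(out, start, n);
        out[n] = '\0';
    }
    return len;
}
```

Since the line is only read, never written, it survives intact and can be passed as a const string.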

--
Dave Hammond
daveh@marob.masa.com

diamond@diamond.csl.sony.junet (Norman Diamond) (06/20/89)

In article <1134@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:

>since strtok places NULs in the string, the
>environment was getting corrupted for the child.

>Neither of these are bugs -- they are documented parts of the
>function -- but nevertheless we have been hit with these gotchas.

You mean that these are design bugs instead of coding bugs.
They are documented bugs instead of undocumented bugs.
Just like gets() has some documented design bugs.

Funny, existing practices that consisted of documented bugs really
have been standardized.  Only existing practices that consisted of
quasi-documented but necessary features have been omitted from the
standardization.

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
 The above opinions are claimed by your machine's init process (pid 1), after
 being disowned and orphaned.  However, if you see this at Waterloo, Stanford,
 or Anterior, then their administrators must have approved of these opinions.

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/28/89)

In article <10397@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>You mean that these are design bugs instead of coding bugs.
>They are documented bugs instead of undocumented bugs.
>Just like gets() has some documented design bugs.
>Funny, existing practices that consisted of documented bugs really
>have been standardized.  Only existing practices that consisted of
>quasi-documented but necessary features have been omitted from the
>standardization.

I don't know what you mean by that last sentence.

Certainly some of the features inherited from the Base Documents
were misdesigned in the eyes of many of us, including perhaps a
majority of X3J11.  Here are the most likely alternatives that
faced the committee:
	(1) Omit the misdesigned function from the Standard.
	(2) Specify a different behavior for the function than
	    it had in existing practice, to correct the problem.
	(3) Add a newly named function with an improved design.
	(4) Standardize the function the way it actually exists.
Obviously, some of these alternatives are mutually exclusive.

It should be pretty obvious what the pros and cons are for each of
these alternatives.  Since the primary charter of X3J11 was to
standardize existing practice, when it was clear and unambiguous,
alternative (4) was used for essentially all the functions in the
Base Documents.  Alternative (3) was avoided except when there was
a pressing need, as with the localization support.  Unlike many
so-called "standardization" committees, X3J11 did not feel their
job was to design a lot of new, unproven stuff then push it as a
"standard".  Alternative (1) for the most part would have been
defaulting on the committee's primary responsibility.  Alternative
(2) would have caused major compatibility and transition problems.

I have to say that I resent the tone of your criticism.  X3J11
did an excellent job of standardizing the C programming language,
and you could have participated if you had chosen to do so.  There
were many factors that had to be carefully evaluated in arriving
at the final specification.  It is easy to imply that you could
have done better yourself, but I seriously doubt it.

jss@hector.UUCP (Jerry Schwarz) (06/28/89)

In article <10397@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:

[Some discussion of design flaws in "strtok" and "gets" omitted]

>
>Funny, existing practices that consisted of documented bugs really
>have been standardized.  Only existing practices that consisted of
>quasi-documented but necessary features have been omitted from the
>standardization.
>

I strongly object to the tone of the above paragraph.  It suggests
(without coming right out and saying it) that the deliberations of
the ANSI C committee were subject to some systematic effect that
damaged the design of ANSI C without suggesting what that influence
was.  Was it incompetence, improper goals, maliciousness, greed,
haste, or something else? Since no specific charges are made they
can't be refuted.

Probably nobody agrees with all the decisions made by the committee.
(I happen to agree with it on "strtok" and disagree on "gets", but
that isn't particularly relevant.)

For the record, I never served on the committee although I know some
of the people who have.

Jerry Schwarz

peter@ficc.uu.net (Peter da Silva) (07/22/89)

In article <824@cbnewsl.ATT.COM>, mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
> In article <4563@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes
  about scanf:
> > I tend to stick with strspn() and strtok(), myself.

> Hope you never have to do anything complex.  If you call strtok on a string
> in the middle of a strtok of another string it trashes its state information
> on the first string (a little known feature), which can cause extremely
> elusive bugs.

Sounds like something too fancy for scanf, too. But thanks for the info... I've
never gotten that complex with strtok. By that time I'm usually stepping
through the string by hand (while (strchr(legalchars, *s)) s++;)...

[files away information that there are broken implementations of strtok...]
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

mcdonald@uxe.cso.uiuc.edu (07/22/89)

Re:Re:Re:Re:Re:Re:Problems with scanf (and strtok)

I once (and only once!) wrote a "commercial" program. It was for use by
dumb students (well, relatively: they had made it halfway through a
junior-level Quantum Mechanics course and survived many 3-dimensional
PDEs, Hermite, Legendre, and associated Laguerre polynomials, and
mathematical induction). I watched them use it; some of them, even in
1985, had never touched a computer before. I finally gave up trying to use
ANY canned input routine, and wrote my own that scanned the input
character by character, giving a hopefully appropriate, meaningful
error message as they typed the offending character (not requiring
a carriage return before giving a message).

For programs that only I use, I use scanf all the time.

For programs I buy, I like quick and efficient error messages.
Maybe we need a new acronym: WYDIIWYSI: When you do it is when you
see it! It still bothers me to see a C compiler issue 100 error
messages when the program contains only one bug!

Doug McDonald

friedl@vsi.COM (Stephen J. Friedl) (07/23/89)

In article <4563@ficc.uu.net>, peter@ficc.uu.net (Peter da Silva) writes:
> I tend to stick with strspn() and strtok(), myself.

In article <824@cbnewsl.ATT.COM>, mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
> Hope you never have to do anything complex.  If you call strtok on a string
> in the middle of a strtok of another string it trashes its state information
> on the first string (a little known feature), which can cause extremely
> elusive bugs.

In addition, strtok() considers multiple occurrences of the
separating token to be one, so you can't use it for obvious
things like parsing an /etc/passwd line.
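
For passwd-style lines, where an empty field is meaningful, a splitter built on strchr() keeps the empty fields that strtok() would swallow.  This is only a sketch; split_fields is a hypothetical name, and the separator is assumed to be a non-NUL character.

```c
#include <string.h>

/* Hypothetical helper: split line in place on sep (assumed non-NUL),
   keeping empty fields.  Stores up to max field pointers in fields[]
   and returns the number stored; any text past the last stored field
   is ignored.  Like strtok, it overwrites separators with '\0'. */
int split_fields(char *line, int sep, char **fields, int max)
{
    int n = 0;
    char *p = line;

    while (n < max) {
        fields[n++] = p;        /* a field may be the empty string */
        p = strchr(p, sep);
        if (p == NULL)
            break;              /* no more separators: last field stored */
        *p++ = '\0';            /* terminate this field, step past sep */
    }
    return n;
}
```

Given "root::0:0::/:/bin/sh", this yields seven fields, two of them empty, whereas strtok() with ":" would report five.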

One more thing.  If you use strtok to pick apart an environment
variable, be sure to copy the environment string somewhere before
you tear into it.  We had a program whose child processes were
always failing, and it drove us nuts until we realized that it
was strtok again.  We were picking apart $PATH earlier in the
program, and since strtok places NULs in the string, the
environment was getting corrupted for the child.
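
A minimal sketch of the copy-first habit, with strtok() run on a private copy so the caller's string is never touched.  The function name count_path_dirs is hypothetical, and counting components stands in for whatever real work is done per directory.

```c
#include <stdlib.h>
#include <string.h>

/* Count the ':'-separated components of s without disturbing s:
   strtok runs on a private copy.  Returns -1 if the copy cannot
   be allocated. */
int count_path_dirs(const char *s)
{
    char *copy, *tok;
    int n = 0;

    copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);

    /* strtok writes NULs into copy, not into the caller's string */
    for (tok = strtok(copy, ":"); tok != NULL; tok = strtok(NULL, ":"))
        n++;

    free(copy);
    return n;
}
```

A caller would pass the result of getenv("PATH") after checking it for NULL; the string owned by the environment is left intact for any child process.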

Neither of these are bugs -- they are documented parts of the
function -- but nevertheless we have been hit with these gotchas.

     Steve

-- 
Stephen J. Friedl / V-Systems, Inc. / Santa Ana, CA / +1 714 545 6442 
3B2-kind-of-guy   / friedl@vsi.com  / {attmail, uunet, etc}!vsi!friedl
                                          ---> vsi!bang!friedl <-- NEW
"Friends don't let friends run Xenix" - me

dal@midgard.Midgard.MN.ORG (Dale Schumacher) (08/04/89)

In article <4596@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
|In article <824@cbnewsl.ATT.COM>, mpl@cbnewsl.ATT.COM (michael.p.lindner) writes:
|> Hope you never have to do anything complex.  If you call strtok on a string
|> in the middle of a strtok of another string it trashes its state information
|> on the first string (a little known feature), which can cause extremely
|> elusive bugs.
|
|Sounds like something too fancy for scanf, too. But thanks for the info... I've
|never gotten that complex with strtok. By that time I'm usually stepping
|through the string by hand (while (strchr(legalchars, *s)) s++;)...

Be careful here.  Since strchr() will match the '\0' at the end of legalchars,
you may walk right out of the string if all characters are legal!  I use a
macro like the following:

#define	IN_SET(set,c)	((c) && strchr((set), (c)))

Note, this is NOT a "safe" macro, so be sure c has no side-effects...
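
Where the double evaluation of c is a worry, the same test can be written as a function.  This is a sketch; in_set and skip_set are hypothetical names, not library routines.

```c
#include <string.h>

/* Function form of the IN_SET test: c is evaluated exactly once, and
   the terminating '\0' is never treated as a member of set. */
int in_set(const char *set, int c)
{
    return c != '\0' && strchr(set, c) != NULL;
}

/* Hypothetical helper: skip the leading run of characters found in set.
   The loop stops at the end of s even if every character is legal. */
const char *skip_set(const char *s, const char *set)
{
    while (in_set(set, *s))
        s++;
    return s;
}
```

The standard strspn() computes the length of the same leading run, so s + strspn(s, set) is an equivalent one-liner when only the skip is needed.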

|[files away information that there are broken implementations of strtok...]

The operation of strtok() described above is NOT broken, it's documented.
It is also somewhat less useful than it could be due to its "interesting"
quirks, but it IS defined as working that way.

nather@ut-emx.UUCP (Ed Nather) (08/04/89)

In article <1122@midgard.Midgard.MN.ORG>, dal@midgard.Midgard.MN.ORG (Dale Schumacher) writes:
> The operation of strtok() described above is NOT broken, it's documented.
> It is also somewhat less useful than it could be due to its "interesting"
> quirks, but it IS defined as working that way.

Gosh, that makes programming really easy!  Just throw something together,
document all the bugs, and you're done!

In my view, the operation of strtok() --- and, to a considerable extent, the
operation of scanf() --- are both broken, documentation notwithstanding.  I
have totally avoided scanf() for 8 years, and will continue to do so.  I
wrote my own small versions of strtok() after reading its description, so I 
have never used the "official" one.

-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin