[comp.lang.c] gets

maart@cs.vu.nl (Maarten Litmaath) (11/11/88)

In article <14447@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
\In article <339@igor.Rational.COM> dsb@Rational.COM (David S. Bakin) writes:
\>[What's going on?  The article I'm replying to was signed by Chris Torek of
\> uunet!mimsy!chris but the headers say it is from ok@quintus.uucp???]
\
\I have no idea why that happened.

It's the `virus'! (Who said it was dead? :-)

And now the real point: let's stop complaining about the gets(3) semantics of
not checking buffer boundaries; this is precisely what was intended.
Does anyone suggest doing away with strcpy() too?
Or /bin/rm, being destructive?
-- 
George Bush:                          |Maarten Litmaath @ VU Amsterdam:
             Capt. Slip of the Tongue |maart@cs.vu.nl, mcvax!botter!maart

guy@auspex.UUCP (Guy Harris) (11/12/88)

>And now the real point: let's stop complaining about the gets(3) semantics of
>not checking buffer boundaries; this is precisely what was intended.

"Intended" in what sense?  Somebody put it in there so that people would
deliberately write programs using it, and thus would write programs that
could be made to fail by sending them lines longer than they expect? 
Or somebody put it in there so that you could avoid the nasty run-time
overhead of checking string bounds?

The former is not a good reason for doing something, so the complaints
are justified; the latter isn't all that good either, since 1) I suspect
the cost of checking the string bounds is pretty low and 2) most of the
applications I know of that read input have no control over the form of
the input, and thus could be made to exhibit buggy behavior, if they
used "gets()", just by handing them a line longer than they expect. 

maart@cs.vu.nl (Maarten Litmaath) (11/15/88)

In article <434@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
\>And now the real point: let's stop complaining about the gets(3) semantics of
\>not checking buffer boundaries; this is precisely what was intended.
\
\"Intended" in what sense?  Somebody put it in there so that people would
\deliberately write programs using it, and thus would write programs that
\could be made to fail by sending them lines longer than they expect? 

Hey people! I just found out somebody put a nasty little program in /bin!
It's called `rm'. If you type `rm *', all your files will disappear!

\Or somebody put it in there so that you could avoid the nasty run-time
\overhead of checking string bounds?

Of course! If you want security, use fgets()!
-- 
fcntl(fd, F_SETFL, FNDELAY):          |Maarten Litmaath @ VU Amsterdam:
      let's go weepin' in the corner! |maart@cs.vu.nl, mcvax!botter!maart

achut@unisoft.UUCP (Achut Reddy) (11/15/88)

In article <1643@solo11.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>And now the real point: let's stop complaining about the gets(3) semantics of
>not checking buffer boundaries; this is precisely what was intended.
>Does anyone suggest doing away with strcpy() too?
>Or /bin/rm, being destructive?

No, there is a fundamental difference between gets(3) and all the other
functions that don't check buffer boundaries.  That difference is that 
the other functions *can* be used safely if the programmer exercises
some care.  He has complete control over the arguments he passes to these
functions, and can ensure that his buffers don't overflow.  When gets(3)
is used, however, to get input from any source which the user has
control over (e.g., stdin), then a correct program *cannot* be written.
(The user can always enter an input line longer than the buffer length.)

gets(3) should not be used, and hopefully it can be phased out.
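
A minimal sketch of the bounded alternative, assuming standard fgets()
semantics.  The name bounded_gets() is hypothetical and not part of any
library; the point is only that the caller states the buffer size, which
gets() gives no way to do.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from fp into buf (capacity size).
 * Unlike gets(), it never stores more than size bytes.  Returns buf,
 * or NULL at end of file or on error.  The trailing newline is removed;
 * any part of the line that does not fit is read and discarded. */
char *bounded_gets(char *buf, int size, FILE *fp)
{
    size_t len;
    int c;

    if (size <= 0 || fgets(buf, size, fp) == NULL)
        return NULL;
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line: drop the newline */
    } else {
        /* line too long, or input ended without a newline: skip the rest */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return buf;
}
```

A call such as bounded_gets(line, sizeof line, stdin) then takes the place
of gets(line); an over-long line is truncated instead of overwriting memory.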

Achut Reddy

rkl1@hound.UUCP (K.LAUX) (11/16/88)

	Well, I suppose that if gets () is capable of overflowing the buffer,
the way to go would be to read the input one character at a time and check
for buffer overflow oneself.  It would be trivial to write a function to do
this, and you only have to do it once and use it from then on instead of gets ().
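
One way such a function might look, as a rough sketch only.  The name
read_line_checked() and its return convention are assumptions made for
illustration; they do not come from any poster's actual code.

```c
#include <stdio.h>

/* Hypothetical gets() replacement: reads characters one at a time and
 * stops storing them once buf (capacity size, size >= 1) is full.
 * Returns the number of characters stored, or -1 if end of file is hit
 * before any character of a line has been read. */
int read_line_checked(char *buf, int size, FILE *fp)
{
    int c, n = 0, seen = 0;

    while ((c = getc(fp)) != EOF && c != '\n') {
        seen = 1;
        if (n < size - 1)
            buf[n++] = (char)c;     /* store only while there is room */
        /* otherwise the buffer is full and the character is dropped */
    }
    if (c == EOF && !seen)
        return -1;
    buf[n] = '\0';
    return n;
}
```

The overflow check is the single comparison against size - 1; everything
else is what gets() already has to do.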

--rkl

guy@auspex.UUCP (Guy Harris) (11/16/88)

>\"Intended" in what sense?  Somebody put it in there so that people would
>\deliberately write programs using it, and thus would write programs that
>\could be made to fail by sending them lines longer than they expect? 
>
>Hey people! I just found out somebody put a nasty little program in /bin!
>It's called `rm'. If you type `rm *', all your files will disappear!

You've totally missed the point.

Somebody might want all the files in a given directory to disappear.  I
have difficulty imagining anybody who *wants* to write a program that
blows up when you feed too-long lines at it.

>\Or somebody put it in there so that you could avoid the nasty run-time
>\overhead of checking string bounds?
>
>Of course!

"Of course somebody put it in so you can avoid string bounds checking?"
I dispute this.  Got any references handy to prove your assertion?

>If you want security, use fgets()!

I want security.  I want everyone *else* to want security.  I don't want
programs that die randomly if they get handed lines that are too long. 
Programs like that are rude.

maart@cs.vu.nl (Litmaath Maarten) (11/17/88)

In article <453@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
\>\"Intended" in what sense?  Somebody put it in there so that people would
\>\deliberately write programs using it, and thus would write programs that
\>\could be made to fail by sending them lines longer than they expect? 
\>
\>Hey people! I just found out somebody put a nasty little program in /bin!
\>It's called `rm'. If you type `rm *', all your files will disappear!
\
\You've totally missed the point.
\
\Somebody might want all the files in a given directory to disappear.  I
\have difficulty imagining anybody who *wants* to write a program that
\blows up when you feed too-long lines at it.

All right. I tried `clarification through exaggeration', and obviously I failed.
If you want to copy part of a string into a buffer, do you complain you can't
give a count to strcpy()? Or do you say: hey, strcpy() doesn't do what I want,
let's use another function (which happens to be strncpy())?
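
For illustration, a small sketch of the strncpy() case.  The wrapper name
copy_prefix() is hypothetical.  Note that strncpy() by itself does not
write a terminating '\0' when the source has as many characters as the
count or more, so the sketch adds one explicitly.

```c
#include <string.h>

/* Hypothetical helper: copy at most dstsize - 1 characters of src into
 * dst and always terminate the result. */
char *copy_prefix(char *dst, const char *src, size_t dstsize)
{
    if (dstsize == 0)
        return dst;                 /* no room even for the '\0' */
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';        /* strncpy may have left this unset */
    return dst;
}
```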

\>\Or somebody put it in there so that you could avoid the nasty run-time
\>\overhead of checking string bounds?
\>
\>Of course!
\
\"Of course somebody put it in so you can avoid string bounds checking?"
\I dispute this.  Got any references handy to prove your assertion?

No no! "Of course somebody put it in so you can avoid the nasty run-time
overhead!" Indeed, one could doubt if the writer of gets() really had that
very point in mind, but it sure comes in handy right now.

\>If you want security, use fgets()!
\
\I want security.  I want everyone *else* to want security.  I don't want
\programs that die randomly if they get handed lines that are too long. 
\Programs like that are rude.

So use fgets()! You're right insofar as public utilities should use fgets()
(or something equivalent), but there ARE cases in which you can be absolutely
sure how stdin is formatted (or cases in which one simply says: if stdin is
badly formatted, bad luck - YOU f*ck around, YOU get the core dump!).
Read Doug Gwyn's articles on this subject.
-- 
fcntl(fd, F_SETFL, FNDELAY):          |Maarten Litmaath @ VU Amsterdam:
      let's go weepin' in the corner! |maart@cs.vu.nl, mcvax!botter!maart

kenny@m.cs.uiuc.edu (11/17/88)

/* Written 11:16 am  Nov 15, 1988 by rkl1@hound.UUCP in m.cs.uiuc.edu:comp.lang.c */
	Well, I suppose that if gets () is capable of overflowing the buffer,
the way to go would be to read the input one character at a time and check
for buffer overflow oneself.  It would be trivial to write a function to do
this, and you only have to do it once and use it from then on instead of gets ()
/* End of text from m.cs.uiuc.edu:comp.lang.c */

How convenient that the designers of the standard C library have done
this for us already.  It's called fgets().

Please, can we go to another topic?  I've got one:  do fseek(), fread(),
and fwrite() allow a forward read after a write?  Does it matter
whether it's a text or binary file?

A-T

pmech@oucsace.cs.OHIOU.EDU (Paul J. Mech) (11/17/88)

In article <2747@hound.UUCP>, rkl1@hound.UUCP (K.LAUX) writes:
> 
> 	Well, I suppose that if gets () is capable of overflowing the buffer,
> the way to go would be to read the input one character at a time and check
> for buffer overflow oneself.  It would be trivial to write a function to do
> this, and you only have to do it once and use it from then on instead of gets ()
> 
> --rkl

Agreed, it was one of the first functions I wrote (I called mine getln()).
I just don't see why some people are damning gets(). I still use gets()
whenever I am writing a quick and dirty program that I will swiftly (read
immediately) discard after use, or for some aid that only I will have
access to with all the data being known not to cause an overflow. If a
customer is to get within spitting distance of it, or if I pass it on to
another programmer, I use getln(), and include the (rather trivial) source.
Despite what seems to be the hysteria of the moment, gets() is useful. But
like most things associated with 'C', you have to be careful how you use
it.

pjm

guy@auspex.UUCP (Guy Harris) (11/18/88)

>All right. I tried `clarification through exaggeration', and obviously I
>failed.

Yes, because it didn't clarify - the two cases were different.

>No no! "Of course somebody put it in so you can avoid the nasty run-time
>overhead!" Indeed, one could doubt if the writer of gets() really had that
>very point in mind, but it sure comes in handy right now.

Oh, really?  I doubt that as well.  The extra time spent making sure you
don't overflow the buffer is likely to be washed away by the time spent
reading the data in the first place.

>So use fgets()! You're right insofar as public utilities should use fgets()
>(or something equivalent), but there ARE cases in which you can be absolutely
>sure how stdin is formatted

But in many cases where you think you can be sure, you may be in for a
rude surprise.  Consider program A which writes a file and program B
which reads it.  Of COURSE you wrote both programs, so of COURSE they
agree on conventions, right?  Well, unless you're sure 1) you didn't
screw up and 2) you'll be maintaining both programs, you may find, to
your surprise, that someday they may *not* agree....

>(or cases in which one simply says: if stdin is badly formatted, bad
>luck - YOU f*ck around, YOU get the core dump!).

Unfortunately, I don't trust people to do this only on programs that
only they will use.  I also don't trust people, including myself, to
remember that this program responds rather poorly to input lines that
are too long....

henry@utzoo.uucp (Henry Spencer) (11/20/88)

In article <4700029@m.cs.uiuc.edu> kenny@m.cs.uiuc.edu writes:
>>... It would be trivial to write a function to do
>>this, and you only have to do it once...
>
>How convenient that the designers of the standard C library have done
>this for us already.  It's called fgets().

Note also that a good implementation of fgets (the ones in many old C
libraries are not terribly good) will almost certainly be much faster
than the quick ten-minute hack that you might throw together.  Never
re-invent the wheel unnecessarily; yours may have corners.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jwr@scotty.UUCP (Dier Retlaw Semaj) (11/22/88)

In article <1403@unisoft.UUCP> achut@unisoft.UUCP (Achut Reddy) writes:
<In article <1643@solo11.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
<<And now the real point: let's stop complaining about the gets(3) semantics of
<<not checking buffer boundaries; this is precisely what was intended.
<<Does anyone suggest doing away with strcpy() too?
<<Or /bin/rm, being destructive?
<
<No, there is a fundamental difference between gets(3) and all the other
<functions that don't check buffer boundaries.  That difference is that 
<the other functions *can* be used safely if the programmer exercises
<some care.  He has complete control over the arguments he passes to these
<functions, and can ensure that his buffers don't overflow.

What about sprintf() & fprintf()?
The user does not have *complete control* over these functions.

-- 

Dier R. Semaj	{ames,cmcl2,rutgers}!rochester!kodak!fedsys!wally!jwr


logan@vsedev.VSE.COM (James Logan III) (11/23/88)

In article <644@scotty.UUCP> jwr@scotty.UUCP (Dier Retlaw Semaj) writes:
>
>What about sprintf() & fprintf()?
>The user does not have *complete control* over these functions.

True, sprintf() could write beyond the end of the string passed
as its first parameter.  But I don't see what damage fprintf()
would do, unless it does not check its internal buffer boundaries
as it expands the format string.  If that's the problem then
printf() would have the same problem.  Anyone here seen the
source?   

Hey, let's do away with printf! :-)

			-Jim

-- 
Jim Logan		logan@vsedev.vse.com
(703) 892-0002		uucp:	..!uunet!vsedev!logan
			inet:	logan%vsedev.vse.com@uunet.uu.net

gandalf@csli.STANFORD.EDU (Juergen Wagner) (11/23/88)

printf, fprintf, et al. all use an internal buffer of finite size. May I
quote from the manual:

	BUGS
	     Very wide fields (>128 characters) fail.

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

chris@mimsy.UUCP (Chris Torek) (11/23/88)

In article <6544@csli.STANFORD.EDU> gandalf@csli.STANFORD.EDU
(Juergen Wagner) writes:
>printf, fprintf, et al. all use an internal buffer of finite size. May I
>quote from the manual:
>
>	BUGS
>	     Very wide fields (>128 characters) fail.

You quote from *a* manual, not *the* manual: this bug is gone in 4.3tahoe.
printf format widths may be arbitrarily large (to MAXINT).  Ridiculous
field widths are handled correctly: e.g., %30000.15000f produces a 30000
character field with 15000 digits of precision, of which the last >14500
will definitely all be zero.  (They are faked.  Getting this right is
difficult; we went through a number of iterations, with Keith Bostic
doing the hard part of floating point formatting and me supplying
perverse test cases.  We finally settled on faking ridiculous precisions,
to avoid blowing away stacks.  Hmm :-) ....)  The manual entry might
still have that BUGS section; I forget whether we got it updated.  A
proper BUGS section is appended below.

Anyway, more or less back to the subject (now that I have finished my
mixed veggies and started the spaghetti water boiling [still hungry]):
the BUGS quote above really means that wider fields are truncated, or
at least that is what the 4.xBSD (x < 3tahoe) vax _doprnt.s tried to do.

[from my printf.3s]
BUGS
     The conversion formats %D, %O, and %U are not standard and
     are provided only for backward compatibility.  The effect of
     padding the %p format with zeros (either by the `0' flag or
     by specifying a precision), and the benign effect (i.e.,
     none) of the `#' flag on %n and %p conversions, as well as
     other nonsensical combinations such as %Ld, are not stan-
     dard; such combinations should be avoided.


-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

guy@auspex.UUCP (Guy Harris) (11/24/88)

>What about sprintf() & fprintf()?
>The user does not have *complete control* over these functions.

"fprintf()" is, as stated in another article, irrelevant; if the buffer
fills up, the standard I/O library will generally write it out and
continue.

As for "sprintf":

Users do have control over the format operations they use with these
functions.

The "%d", "%i", "%o", "%u", "%x", "%X", "%e", "%E", "%f", "%g", and "%G"
conversions, and their "l"-prefixed equivalents (and "h"-prefixed and
"L"-prefixed equivalents, in the dpANS upon which K&R Second Edition is
based), have, on most implementations, a maximum length of output that
they can generate; you can probably pick a number that's "big enough"
for all implementations you're likely to run into (e.g., "%d" is
unlikely to produce more digits than are in -2^64 (the "-" is for the
minus sign) and is extremely unlikely to produce more digits than are in
-2^128).

"%s" can take a "precision" argument that specifies the *maximum* number
of characters to be produced.
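
A short sketch of the precision technique, assuming the usual "%.*s" form
in which the precision is passed as an int argument.  The function name
format_name() and the "name=" prefix are made up for the example.

```c
#include <stdio.h>

/* Hypothetical example: format "name=<name>" into out (capacity outsize).
 * The precision given through "%.*s" is the maximum number of characters
 * sprintf() takes from name, so the result cannot outgrow the buffer.
 * Returns the number of characters written, or -1 if out is too small
 * to hold even the fixed prefix. */
int format_name(char *out, int outsize, const char *name)
{
    /* room left for name; sizeof "name=" already counts the '\0' */
    int room = outsize - (int)sizeof "name=";

    if (room < 0)
        return -1;      /* a negative precision would mean "no limit" */
    return sprintf(out, "name=%.*s", room, name);
}
```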

"%c" only produces one character.

The only tricky one appears to be the dpANS's "%p", and you can
probably, in most implementations, just say something like "it's
unlikely to produce more than 128 characters"; if a pointer value takes
128 characters to dump, you may want to consider not dumping it....

Thus, if you don't use "%s" by itself, you can compute a "maximum
length" for the output of "sprintf" that should work, as stated, on most
implementations.  If you do use "%s" by itself, you do at least have
control over what argument matches it, so you can use "strlen" to find
out how many characters it will generate.
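
The length computation might be sketched as follows.  The names
INT_DIGITS_MAX and format_labelled() are hypothetical, and the digit bound
is a deliberately generous estimate rather than an exact figure.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Upper bound on what "%d" can produce for an int: each decimal digit
 * carries more than 3 bits, so bits/3 + 1 digits suffice, plus one
 * character for a minus sign. */
#define INT_DIGITS_MAX (sizeof(int) * CHAR_BIT / 3 + 2)

/* Hypothetical helper: returns a malloc()ed "label: value" string, or
 * NULL if memory is exhausted.  The caller frees the result. */
char *format_labelled(const char *label, int value)
{
    /* strlen() for the bare "%s", 2 for ": ", the "%d" bound, 1 for '\0' */
    size_t need = strlen(label) + 2 + INT_DIGITS_MAX + 1;
    char *out = malloc(need);

    if (out != NULL)
        sprintf(out, "%s: %d", label, value);
    return out;
}
```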

If the format string is generated at run time (e.g., inside an
interpreter for a programming language that includes a "printf"
construct), you can consider scanning the string, computing the maximum
length (which may, as indicated in the previous paragraph, require you
to scan the arguments that match "%s"s), and proceeding from there. 

gandalf@csli.STANFORD.EDU (Juergen Wagner) (11/24/88)

In article <14705@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <6544@csli.STANFORD.EDU> gandalf@csli.STANFORD.EDU
>(Juergen Wagner) writes:
>>printf, fprintf, et al. all use an internal buffer of finite size.
>>...
>You quote from *a* manual, not *the* manual: this bug is gone in 4.3tahoe.
>...

Hmm... but this doesn't ensure that my program is compatible if I want to
run it under different *IX flavors. Until I actually run printf with extreme
test cases, I might not know if I have the fancy or the vanilla one.
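
Such a test might look roughly like this.  It is only a sketch: the name
probe_field_width() is hypothetical, and it checks one symptom (how many
characters come out), not everything a given printf could get wrong.

```c
#include <stdio.h>

/* Hypothetical probe: prints the number 1 in a field of the given width
 * to a temporary file and returns how many characters actually arrived,
 * or -1 if the temporary file could not be created. */
long probe_field_width(int width)
{
    FILE *fp = tmpfile();
    long count = 0;

    if (fp == NULL)
        return -1;
    fprintf(fp, "%*d", width, 1);
    rewind(fp);                     /* flushes, then allows reading back */
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

If probe_field_width(200) returns something other than 200, the library
at hand is presumably one of the vanilla ones.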

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

ok@quintus.uucp (Richard A. O'Keefe) (11/24/88)

In article <1251@vsedev.VSE.COM> logan@vsedev.VSE.COM (James Logan III) writes:
>In article <644@scotty.UUCP> jwr@scotty.UUCP (Dier Retlaw Semaj) writes:
>>What about sprintf() & fprintf()?
>>The user does not have *complete control* over these functions.
>True, sprintf() could write beyond the end of the string passed
>as its first parameter.  But I don't see what damage fprintf()
>would do, unless it does not check its internal buffer boundaries
>as it expands the format string.

Oddly enough, there used to be versions of *printf() around that
_could_ corrupt the stack in a gets()-like way.  If I remember correctly,
the magic number was something like 127 characters of plain text.  That
is, if you had too much text before you came to the next %, your stack
could be damaged.  That version of _doprnt copied the plain text to the
stack, and wrote it from there.  Now, this _was_ documented in the manual,
but it was very easy to overlook that.

Speaking of which, I just noticed that the man page on the system I am
using now says
	BUGS
	    Very wide fields (>128 characters) fail.
I do not know whether this is the same bug.

And people wonder why I wrote my own printf()...

jbayer@ispi.UUCP (Jonathan Bayer) (11/25/88)

In article <644@scotty.UUCP>, jwr@scotty.UUCP (Dier Retlaw Semaj) writes:
> In article <1403@unisoft.UUCP> achut@unisoft.UUCP (Achut Reddy) writes:
> <In article <1643@solo11.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
> <
> <No, there is a fundamental difference between gets(3) and all the other
> <functions that don't check buffer boundaries.  That difference is that 
> <the other functions *can* be used safely if the programmer exercises
> <some care.  He has complete control over the arguments he passes to these
> <functions, and can ensure that his buffers don't overflow.
> 
> What about sprintf() & fprintf()?
> The user does not have *complete control* over these functions.

With proper care the user *does* have complete control.  Simply specify
a length for each var being printed.

gets is different in that the length of the input is not under the
programmer's control.  If gets is used in a program to which data is piped,
the program is part of a secure system, and unsecured data can be piped to
it, then it is possible to break it.

Jonathan Bayer
Intelligent Software Products, Inc.