[comp.sources.d] v05i053: A "safe" replacement for gets

john@basser.oz (John Mackin) (11/18/88)

In article <674@quintus.UUCP>, Brandon Allbery writes:

> Posting-number: Volume 5, Issue 53
> Submitted-by: "A. Nonymous" <ok@quintus.UUCP>
> Archive-name: getsafe
> 
> [Aaaaagh.  I always suspected gets() was a potential bomb.  How about
> 
> #define gets(s) fgets(s, sizeof s, stdin)
> 
> as a quick fix?  ++bsa]

Just in case anyone is actually considering doing this,
I thought I had better point out it is a total disaster.
As in:

	char *p;

	...
	gets(p);

which under Brandon's suggestion will expand to

	fgets(p, sizeof p, stdin)

which is very unlikely indeed to do what anyone would want.
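
For anyone who wants to see the size the macro actually passes, here is a small sketch. QUICK_GETS, QUICK_GETS_SIZE and the two helper functions are illustrative names, not part of any real library. The only point is what `sizeof s` evaluates to after expansion. Note that the pointer case goes wrong even when p is properly initialised to point at a BUFSIZ array.

```c
#include <stdio.h>   /* BUFSIZ, fgets, size_t */

/* The proposed quick fix, renamed so it does not collide with the real gets(). */
#define QUICK_GETS(s)      fgets(s, sizeof s, stdin)
/* The size argument QUICK_GETS would hand to fgets() for a given s. */
#define QUICK_GETS_SIZE(s) (sizeof s)

/* s names an array: sizeof gives the whole buffer, as intended. */
size_t size_seen_for_array(void)
{
    char line[BUFSIZ];
    return QUICK_GETS_SIZE(line);        /* == BUFSIZ */
}

/* s is a pointer, even one aimed at the same kind of buffer. */
size_t size_seen_for_pointer(void)
{
    char line[BUFSIZ];
    char *p = line;
    return QUICK_GETS_SIZE(p);           /* == sizeof(char *) */
}
```

With a 4-byte (or 8-byte) pointer, the expanded call tells fgets() the buffer holds 4 (or 8) bytes, so each call reads at most 3 (or 7) characters.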

John Mackin, Basser Department of Computer Science,
             University of Sydney, Sydney, Australia

john@basser.oz.AU (john%basser.oz.AU@UUNET.UU.NET)
{uunet,mcvax,ukc,nttlab}!munnari!basser.oz!john

jfh@rpp386.Dallas.TX.US (John F. Haugh II) (11/19/88)

In article <674@quintus.UUCP> ok@quintus.UUCP writes:
>[Aaaaagh.  I always suspected gets() was a potential bomb.  How about
>
>#define gets(s) fgets(s, sizeof s, stdin)
>
>as a quick fix?  ++bsa]

No, if `s' is `char *s' instead of `char s[BUFSIZ]', sizeof s == some
small number [ 2 or 4 or something like that ].

A more correct solution would be to re-write gets() to expect a buffer
of size BUFSIZ, or else have the buffer size passed as an argument.
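
A minimal sketch of the second option, assuming the caller passes the buffer size and wants gets()-style behavior (newline stripped). gets_n is a hypothetical name. It simply wraps fgets() and truncates an over-long line, leaving the rest of that line in the stream.

```c
#include <stdio.h>
#include <string.h>

/*
 * gets_n: hypothetical gets() replacement that is told the buffer size.
 * Reads at most size-1 characters of one line from fp, removes the
 * trailing newline the way gets() does, and returns buf.  Returns NULL
 * at end of file with nothing read, on a read error, or if size is not
 * positive.  A line longer than the buffer is truncated and the rest of
 * it stays in the stream for the next call.
 */
char *gets_n(char *buf, int size, FILE *fp)
{
    if (buf == NULL || size < 1)
        return NULL;
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* cut at the newline, if any */
    return buf;
}
```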
-- 
John F. Haugh II                        +----------Quote of the Week:----------
VoiceNet: (214) 250-3311   Data: -6272  | "Okay, so maybe Berkeley is in north-
InterNet: jfh@rpp386.Dallas.TX.US       |   ern California." -- Henry Spencer
UucpNet : <backbone>!killer!rpp386!jfh  +--------------------------------------

gandalf@csli.STANFORD.EDU (Juergen Wagner) (11/20/88)

I am wondering how long the discussion will last... The alternatives were:

o  Nuke gets()
   this will break the standard and a number of programs.

o  Don't use gets()
   Probably the best suggestion. Everybody who doesn't like gets should
   just stop using it.

o  # define gets(s) fgets(...)
   sizeof s yields sizeof(char *) if s is a pointer rather than an array. That will
   cause unexpected effects. :-)

o  replace gets by something more intelligent
   We arrive at fgets. The "more intelligent" will certainly mean some kind
   of overflow checking. That's already available in fgets.

o  leave everything as it is

Well, this discussion reminds me of the "rm *" discussions which come up
every four months or so. The result: umpteen plus one 'best' solutions to
the problem. UNIX happens to provide very powerful tools, and nobody keeps
the super-user from putting a /bin/rm -fr / into his/her .login, but that's
not the point. The question is: how do some people, under some circumstances,
protect themselves against themselves? The answer is: it depends! Probably
all of the above answers are optimal in some sense. In some situations, where
you know the file format exactly, gets is just what you want. In other cases,
the #define provides shorthand for the overflow-checking version. In yet other
cases you may need a wrapper that complains when the line is actually longer
than you thought it would be. And so on....

All that is, of course, my personal opinion.
make flame | mail wagner@arisia.xerox.com

-- 
Juergen Wagner		   			gandalf@csli.stanford.edu
						 wagner@arisia.xerox.com

ok@quintus.uucp (Richard A. O'Keefe) (11/22/88)

In article <8709@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (John F. Haugh II) writes:
>In article <674@quintus.UUCP> ok@quintus.UUCP writes:
>>[Aaaaagh.  I always suspected gets() was a potential bomb.  How about
>>#define gets(s) fgets(s, sizeof s, stdin)
>>as a quick fix?  ++bsa]

Just in case anyone was confused (I would have been), the quoted material
is from ++bsa, _not_ from ok@quintus.  It turns out that my improved
getsafe() was still subject to attack:  I'll post the revised version soon.

daveb@geaclib.UUCP (David Collier-Brown) (11/22/88)

From article <6508@csli.STANFORD.EDU>, by gandalf@csli.STANFORD.EDU (Juergen Wagner):
> I am wondering how long the discussion will last... The alternatives were:
[ several alternatives] 
> o  replace gets by something more intelligent
>    We arrive at fgets. The "more intelligent" will certainly mean some kind
>    of overflow checking. That's already available in fgets.

  o  replace gets with something much more intelligent.
     You hack up a copy of getline((:-)) to use a hidden
     buffer, strip \n and read from stdin.
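
One way such a replacement might look, assuming a single hidden buffer is acceptable (so it is not reentrant, and each call clobbers the previous line). This is not the getline mentioned below; gets_hidden_from and gets_hidden are hypothetical names, and the growth policy (start at 128 bytes, then double) is an arbitrary choice.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical gets() replacement: the caller passes no buffer at all.
 * The line is kept in a hidden static buffer that grows with realloc()
 * as needed, and the trailing newline is stripped.  The returned pointer
 * is valid only until the next call.  Returns NULL at end of file with
 * nothing read, or if memory runs out (the partial line is then lost).
 */
char *gets_hidden_from(FILE *fp)
{
    static char *buf = NULL;
    static size_t cap = 0;
    size_t len = 0;
    int c;

    for (;;) {
        c = getc(fp);
        if (len + 1 >= cap) {            /* keep room for c and the '\0' */
            size_t newcap = cap ? cap * 2 : 128;
            char *p = realloc(buf, newcap);

            if (p == NULL)
                return NULL;
            buf = p;
            cap = newcap;
        }
        if (c == EOF || c == '\n')
            break;
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return NULL;                     /* nothing left to read */
    buf[len] = '\0';
    return buf;
}

/* The stdin flavor, for code that used to call gets(). */
char *gets_hidden(void)
{
    return gets_hidden_from(stdin);
}
```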

--dave () c-b
  ps: I posted a "getline" function some months ago which just
      keeps reallocating space as it needs it. This should NOT be
      a standard library function, but it could provide the means
      of transparently replacing gets.  Of course, people will then
      start depending on its implementation... Sigh.
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 Interleaf Canada Inc. |
 1550 Enterprise Rd.   | HE's so smart he's dumb.
 Mississauga, Ontario  |       --Joyce C-B

meissner@xyzzy.UUCP (Michael Meissner) (11/28/88)

In article <6508@csli.STANFORD.EDU> wagner@arisia.xerox.com (Juergen Wagner)
writes:
| I am wondering how long the discussion will last... The alternatives were:
	...	/* extraneous suggestions deleted */

| o  replace gets by something more intelligent
|    We arrive at fgets. The "more intelligent" will certainly mean some kind
|    of overflow checking. That's already available in fgets.

To my way of thinking, both gets and fgets are brain damaged, in that
they encourage using fixed-size buffers.  I think we should be
migrating programmers to something that allocates buffers and, if the
buffer overflows, realloc()s a bigger buffer.  Whether or not the
previous buffer is reused should be an option available to the
programmer.  No matter what buffer size you choose, you will run into
situations where you either waste a lot of space on pessimistic
assumptions or run into input that is larger than expected.

In most of the code I've looked at, when the programmer did use fgets,
either no check was made to see whether the newline actually made it
into the buffer, or, if a check was made via strchr (or index, for BSD
types), the code assumed that NULL is never returned.
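
For the record, a sketch of what the check ought to look like. read_line_checked and its status codes are hypothetical names. The point is that a NULL from strchr has to be handled, and that it does not always mean the line was too long: the last line of a file may lack a newline, or the line may have exactly filled the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for read_line_checked(). */
enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/*
 * Read one line with fgets() (size must be at least 2) and check whether
 * the newline made it into buf.  When strchr() returns NULL:
 *   - the last line of the file had no newline  -> LINE_OK
 *   - the line exactly filled buf, newline next  -> LINE_OK
 *   - the line really was longer than buf        -> LINE_TOO_LONG, and
 *     the rest of it is discarded so the next call starts on a new line.
 * The newline, if present, is removed; buf keeps the (possibly truncated) text.
 */
enum line_status read_line_checked(char *buf, int size, FILE *fp)
{
    char *nl;
    int c;

    if (fgets(buf, size, fp) == NULL)
        return LINE_EOF;

    nl = strchr(buf, '\n');
    if (nl != NULL) {                 /* common case: whole line read */
        *nl = '\0';
        return LINE_OK;
    }

    c = getc(fp);                     /* what stopped fgets()? */
    if (c == EOF || c == '\n')
        return LINE_OK;
    while (c != EOF && c != '\n')     /* skip the overlong remainder */
        c = getc(fp);
    return LINE_TOO_LONG;
}
```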

I think such a readline function should look something like this
(using ANSI C prototypes):

	#include <stddef.h>	/* size_t */
	#include <stdio.h>	/* FILE */

	typedef struct {
		size_t	line_alloc_max;	/* current max # bytes allocated */
		size_t	line_num_chars;	/* # chars in this line or 0 */
		char	line_buffer[1];	/* start of line buffer */
	} line_t;

	extern line_t *readline( FILE *stream, line_t *oldline );

If readline returns NULL, it means an error or end of file, and if
oldline is not NULL, it means reuse the line buffer passed in.  Except
for the initial allocation, it should be faster than the fgets
solution, since the newline is already located.

You can also get fancy (like I've done sometimes), and define another
argument that gives a comment character (like '#' in shell scripts),
that automatically discards anything after the comment character, and
also break the line into whitespace separated fields, since reading
tabular data is a common use for fgets.
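
A sketch of the comment-stripping and field-splitting step, written as a separate function that works on a line already in memory rather than as extra arguments to the reader. split_fields is a hypothetical name. It understands no quoting, so a comment character inside a field still starts a comment.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical post-processing for a line already read into memory:
 * cut it at the first comment character (unless comment is '\0'), then
 * split the rest into whitespace-separated fields, in place.  Stores at
 * most max pointers into fields[] and returns how many were stored.
 */
int split_fields(char *line, char comment, char **fields, int max)
{
    char *p;
    int n = 0;

    if (comment != '\0' && (p = strchr(line, comment)) != NULL)
        *p = '\0';                               /* discard the comment */

    p = line;
    while (n < max) {
        while (isspace((unsigned char)*p))       /* skip blanks */
            p++;
        if (*p == '\0')
            break;
        fields[n++] = p;                         /* start of a field */
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p != '\0')
            *p++ = '\0';                         /* terminate the field */
    }
    return n;
}
```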

-- 
Michael Meissner, Data General.

Uucp:	...!mcnc!rti!xyzzy!meissner
Arpa:	meissner@dg-rtp.DG.COM   (or) meissner%dg-rtp.DG.COM@relay.cs.net

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (11/29/88)

In article <2055@xyzzy.UUCP> meissner@xyzzy.UUCP (Michael Meissner) writes:
: To my way of thinking, both gets and fgets are brain damaged, in that
: they encourage using fixed-size buffers.  I think we should be
: migrating programmers to something that allocates buffers and, if the
: buffer overflows, realloc()s a bigger buffer.  Whether or not the
: previous buffer is reused should be an option available to the
: programmer.  No matter what buffer size you choose, you will run into
: situations where you either waste a lot of space on pessimistic
: assumptions or run into input that is larger than expected.

I'm inclined to agree with this.  For the quick-and-dirty program, fgets()
is fine, but I'm appalled at the number of Unix utilities that have fixed
line length limits.  There's simply no excuse for it in this day and age.
(Red-faced, he thinks about patch, which has a limit of 1024 characters...)

: If readline returns NULL, it means an error or end of file, and if
: oldline is not NULL, it means reuse the line buffer passed in.  Except
: for the initial allocation, it should be faster than the fgets
: solution, since the newline is already located.

You can make it even faster by slurping the appropriate iob values into
registers and bypassing getc().  (Don't do this at home, kids.)  For an
example, see the str_gets() routine in perl.  While it's true that you
*shouldn't* cheat on the iob structure, you can get away with it almost
everywhere.  An #ifdef will handle other situations.

: You can also get fancy (like I've done sometimes), and define another
: argument that gives a comment character (like '#' in shell scripts),
: that automatically discards anything after the comment character, and
: also break the line into whitespace separated fields, since reading
: tabular data is a common use for fgets.

Ack.  Pfft.

You could do this for a particular application, but don't put it into the
general routine.  The perl routine mentioned does do alternate record
terminator stuff (including paragraph mode), but you can still keep
everything in registers (on a Vax) while doing so.  I don't think I'd
have put the extra functionality into that routine otherwise.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov
"He who toys with the most wins dies."