[comp.lang.c] Retiring gets

geoff@utstat.uucp (Geoff Collyer) (11/08/88)

The recent exposure of the security bug in the 4BSD fingerd caused by
use of gets(3) reminded me that gets is a bug waiting to happen and
should be stamped out.  I have deleted gets from my stdio implementation
(my first ANSI incompatibility!), the folks at Bell Labs Research have
deleted gets from their C library, now it's your turn.  We need to get
the next ANSI C standard, the relevant POSIX standard(s), the next
edition of the SVID, the next System V, the next 4BSD, the next SunOS
and the next release from your favourite C vendor to delete gets.  Let
your vendor know that you want to see gets deleted from its next
release, delete gets.o from your C library, move gets.o to -lgets,
define gets(s) as "gets is unsafe; use fgets(3)"<><><> in your stdio.h;
do whatever you can to help.

If your vendor protests your reasonable request, point out that gets,
as part of stdio, is a decade-old backward compatibility hack for
compatibility with the Sixth Edition UNIX Portable I/O Library, which
was utterly replaced by stdio no later than 1979.  Accept no excuses;
converting programs from using gets to fgets is largely mechanical,
and stripping trailing newlines is trivial to code yourself.
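A minimal sketch of that mechanical conversion, assuming the caller reads into a fixed-size array. The helper name `fgets_nl` is hypothetical, not part of any C library: it calls fgets and then strips the trailing newline so the result looks like what gets would have returned.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets and strip the trailing
 * newline, so a call site can change from gets(buf) to
 * fgets_nl(buf, sizeof buf, stdin) with no other edits.
 * Returns buf, or NULL on end-of-file or error (as fgets does). */
char *fgets_nl(char *buf, int size, FILE *fp)
{
    char *nl;

    if (fgets(buf, size, fp) == NULL)
        return NULL;
    nl = strchr(buf, '\n');
    if (nl != NULL)
        *nl = '\0';     /* whole line fit: drop the newline */
    return buf;
}
```

If no newline was found, either the line was longer than the buffer (the rest is still waiting in the stream) or the file ended without a final newline; unlike gets, nothing is written past the end of buf.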

With your help, we can stamp out gets in our lifetimes.
-- 
Geoff Collyer	utzoo!utstat!geoff, geoff@utstat.toronto.edu

usenet@cps3xx.UUCP (Usenet file owner) (11/09/88)

in article <1988Nov8.054845.23998@utstat.uucp>, geoff@utstat.uucp (Geoff Collyer) says:
$ 
$ The recent exposure of the security bug in the 4BSD fingerd caused by
$ use of gets(3) reminded me that gets is a bug waiting to happen and
$ should be stamped out. 

This may be a naive question, or perhaps I haven't followed the right
stories, but what is the problem with using gets versus fgets?

John H. Lawitzke      UUCP: ...rutgers!mailrus!frith!fciiho!jhl
Michigan Farm Bureau        ...decvax!purdue!mailrus!frith!fciiho!jhl
Insurance Group             ...uunet!frith!jhl

vfm6066@dsacg3.UUCP (John A. Ebersold) (11/10/88)

In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>
>This may be a naive question, or perhaps I haven't followed the right
>stories, but what is the problem with using gets versus fgets?
>

One can feed a VERY long string to gets(3), since gets will keep reading
characters until receipt of a newline and does not check for overflow of the
receiving buffer.  The VERY long string would cause a program to
malfunction in some way that is not clear to me.

Maybe overwriting the stack?

chase@Ozona.orc.olivetti.com (David Chase) (11/10/88)

You should also consider retiring certain features of 'scanf' and
'fscanf'.  A call along the lines of

    scanf("%s", junk);

is perfectly able to scribble past the end of 'junk'.  I'm not sure if
there are other holes like this built in to the standard i/o library;
it wouldn't hurt to check.  (I've never been a real fan of 'scanf',
but it does seem marginally more useful and harder to replace than
'gets').
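<br>For comparison, a sketch of the bounded form, assuming a 10-byte buffer: a field width in the `%s` conversion caps how many characters scanf stores, leaving room for the terminating NUL. The function name `read_word` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: read one whitespace-delimited word into a
 * 10-byte buffer.  The width 9 in "%9s" caps the characters stored,
 * leaving one byte for the terminating '\0'; any excess characters
 * stay in the stream and are returned by the next read. */
int read_word(FILE *fp, char word[10])
{
    return fscanf(fp, "%9s", word) == 1;   /* 1 on success, 0 otherwise */
}
```

Note that an overlong word is split rather than rejected, so a program that cares must check whether the next character is whitespace.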

David

lvc@cbnews.ATT.COM (Lawrence V. Cipriani) (11/10/88)

In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>This may be a naive question, or perhaps I haven't followed the right
>stories, but what is the problem with using gets versus fgets?

The only argument to gets() is a character pointer, or buffer; fgets() has
a FILE*, a character buffer, and most importantly a count.  Used properly,
this prevents writing past the end of the buffer.  Since gets() doesn't
have the count, this could be used to read past the end of some buffer,
say buf, in fingerd.  Morris managed to get just the right "data" to go
past the end of buf so that the program behavior was modified the way he
needed.  Usually reading data past the end of a buffer gives you a fatal
error and your process dies.  In this case (I'm really reaching here), the
stack was modified, say change the return address, to do "something special"
like go around some permission checks.  Neat, very neat.  Now will someone
please send me a copy of Morris's program :-)

>John H. Lawitzke      UUCP: ...rutgers!mailrus!frith!fciiho!jhl

-- 
Larry Cipriani, AT&T Network Systems, Columbus OH,
Path: att!cbnews!lvc    Domain: lvc@cbnews.ATT.COM

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/10/88)

In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>We need to get ... the next release from your favourite C vendor to
>delete gets.

gets() is deliberately required for ANSI C standard conformance because
a LOT of existing code relies on it.  Any vendor who omits this function
will not be standard conforming and will not sell its compiler to those
(expected to be MANY customers) who specify standard conformance.

I sympathize with the desire to encourage conversion to fgets(), but
attempts to force this down programmers' throats are misguided.  This
is an EDUCATIONAL issue and should be handled as one.  Otherwise you
will be as effective as the Libertarians were with their politics-
before-public-education approach.  Even if your philosophy is right,
you should get others to go along with it BEFORE trying to force them
to conform to it.

By the way, have you removed scanf() from your C library as well?  Or
sprintf()?  Or strcpy()?  They can be misused in the same way as gets().
Let us know how happy your customers are once ALL such routines are gone.

I think the appropriate treatment of gets() is to omit it from the
documentation or to document it with "Unless you have sufficient control
over the data being read to be sure that it will not overflow the buffer,
use fgets".  But leave it in the library.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/10/88)

In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>This may be a naive question, or perhaps I haven't followed the right
>stories, but what is the problem with using gets versus fgets?

If you don't know for sure that the input line will fit the buffer
you've allocated for it, gets() can overrun the buffer (with random
consequences).  However, if your program can be sure that the line
will fit, there is nothing wrong with using gets().

geoff@utstat.uucp (Geoff Collyer) (11/10/88)

I wrote:
> The recent exposure of the security bug in the 4BSD fingerd caused by
> use of gets(3) reminded me that gets is a bug waiting to happen and
> should be stamped out.

Apparently a lot of people have still not heard the details of the
recent Internet worm (or "virus" as the media called it).  The 4BSD
fingerd had a bug which permitted its invoker to obtain a root shell.
The bug was that fingerd used gets to read a line of input from its
network connection, and gets is unable to check that the input line
fits within the buffer handed to gets, so a suitably-constructed line of
input to fingerd steps on other variables, confusing fingerd.

The above is merely preamble; the point I want to make is that gets is
inherently unsafe due to its inability to check for overrun of the
buffer provided to it.  There is no reason to use gets, and there are
good reasons to avoid gets.

Let's kill gets now, before it strikes again.
-- 
Geoff Collyer	utzoo!utstat!geoff, geoff@utstat.toronto.edu

chris@mimsy.UUCP (Chris Torek) (11/10/88)

In article <2044@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence
V. Cipriani) writes:

[re recent Internet `worm'; note that R. T. Morris Jr. is still merely
the `alleged' perpetrator---we have to give him the benefit of the
doubt, first; *then* we can rip his arms off :-) ]

>... in fingerd.  Morris managed to get just the right "data" to go
>past the end of buf so that the program behavior was modified the way he
>needed.  Usually reading data passed the end of a buffer gives you a fatal
>error and your process dies.  In this case (I'm really reaching here), the
>stack was modified, say change the return address, to do "something special"
>like go around some permission checks.

You may be reaching, but you are right.  The fingerd attack wrote more
bytes than there were in the buffer passed to gets(); the `extra' bytes
were a hand-crafted stack that `returned' into the stack, into the
buffer itself.  The part just before the hand-crafted stack contained
code to call execve("/bin/sh", (char **)0, (char **)0).  (There
were in fact ASCII NUL characters embedded in this code; curiously,
gets() reads and stores NULs in its search for '\n'.)

This attack failed if you had made any changes to fingerd or to the C
library start-up code such that the buffer was in a different place on
the stack.  I myself had expanded the buffer, so that there was plenty
of room for the `extra' bytes.  (Hurrah for local modifications! :-) )
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/10/88)

Not being a Real Programmer (tm), I had to look in the Unix manual to see
what the fuss was about.  The gist of the entry for gets(3) is:

	NAME		gets, fgets - get a string from a stream
	SYNOPSIS	char *gets(s)

	gets reads characters from the standard input stream, stdin,
	into  the  array pointed to by s, until a new-line character
	is read or an end-of-file  condition  is  encountered.

In other words, gets will read an *arbitrarily large* amount of data from
the file and place it in memory, beginning at &(s[0]).  Presumably the
programmer must guess a suitable amount of memory to allocate for s, then
pray that no-one ever runs his program on a file with very long lines.

Words fail me.

rds95@leah.Albany.Edu (Robert Seals) (11/10/88)

In article <8841@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
> >We need to get ... the next release from your favourite C vendor to
> >delete gets.
> 
> I think the appropriate treatment of gets() is to omit it from the
> documentation or ...

I suppose there might be reasons to do this, but 1) it smells real bad
already, and 2) is kinda dishonest, and 3) is annoying.
My objections to omitting documentation are mostly moral, while my objection
to gets() et al. is functionally borne out...

rob

snafu@ihlpm.ATT.COM (00704a-Wallis) (11/11/88)

In article <8841@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
... text deleted ...
> By the way, have you removed scanf() from your C library as well?  Or
> sprintf()?  Or strcpy()?  They can be misused in the same way as gets().
> Let us know how happy your customers are once ALL such routines are gone.
> 
....

Actually, I don't understand the argument that
gets() should be removed because it can overrun
the buffer. What's to prevent the following (and
how is it different from gets?):

	char	some_string[10];

	fgets( some_string, 2147483647, stdin );


--
Dave Wallis
AT&T Network Systems
Lisle, IL 60532

att!ihlpm!snafu

evil@arcturus.UUCP (Wade Guthrie) (11/11/88)

In article <1988Nov8.054845.23998@utstat.uucp>, geoff@utstat.uucp (Geoff Collyer) writes:
> The recent exposure of the security bug in the 4BSD fingerd caused by
> use of gets(3) reminded me that gets is a bug waiting to happen and
> should be stamped out.

gets is bad? There is a problem?  Please explain this to us unknowledgeable
types before it's too late.


Wade Guthrie
Rockwell International
Anaheim, CA

(Rockwell doesn't necessarily believe / stand by what I'm saying; how could
they when *I* don't even know what I'm talking about???)

gregg@ihlpb.ATT.COM (Wonderly) (11/11/88)

From article <8847@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn ):
> In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>>This may be a naive question, or perhaps I haven't followed the right
>>stories, but what is the problem with using gets versus fgets?
> 
> If you don't know for sure that the input line will fit the buffer
> you've allocated for it, gets() can overrun the buffer (with random
> consequences).  However, if your program can be sure that the line
> will fit, there is nothing wrong with using gets().

I believe that the right thing to do is to use a new function called
nlfgets (str, size, fp), that does exactly as gets(3).  The biggest
concern that most people have about moving from gets to fgets is the
added hassle of doing a 

	if ((t = strchr (buf, '\n')) != NULL)
		*t = 0;

This seems to be a lot of work when you may be processing thousands of
lines of code.  I have written this exact function many times just to
have the benefit of no strchr() call.
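A sketch of the proposed `nlfgets(str, size, fp)`, assuming it should behave like fgets except that the newline is consumed but not stored. The return convention (str, or NULL at end-of-file with nothing read) is an assumption modeled on fgets, not something specified in the post.

```c
#include <stdio.h>

/* Sketch of nlfgets: like fgets, but the newline is read and discarded
 * rather than stored, so no strchr() pass is needed afterwards.
 * Stores at most size-1 characters plus a terminating '\0' (size must
 * be at least 1).  Returns str, or NULL if end-of-file or an error
 * occurs before any character is read.  (Assumed semantics.) */
char *nlfgets(char *str, int size, FILE *fp)
{
    int c = 0;      /* anything but EOF, in case the loop never runs */
    int n = 0;

    while (n < size - 1 && (c = getc(fp)) != EOF && c != '\n')
        str[n++] = (char)c;
    str[n] = '\0';
    if (c == EOF && n == 0)
        return NULL;
    return str;
}
```

One consequence of discarding the newline: a line of exactly size-1 characters fills the buffer before the newline is seen, so the next call returns an empty string, and the caller cannot tell a split line from a real blank one.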


-- 
It isn't the DREAM that NASA's missing...  DOMAIN: gregg@ihlpb.att.com
It's a direction!                          UUCP:   att!ihlpb!gregg

rob@pbhyf.PacBell.COM (Rob Bernardo) (11/11/88)

Doug Gwyn:
+By the way, have you removed scanf() from your C library as well?  Or
+sprintf()?  Or strcpy()?  They can be misused in the same way as gets().
+Let us know how happy your customers are once ALL such routines are gone.

Wallis:
+Actually, I don't understand the argument that
+gets() should be removed because it can overrun
+the buffer. What's to prevent the following (and
+how is it different from gets?):
+
+	char	some_string[10];
+
+	fgets( some_string, 2147483647, stdin );

I think we need to make a distinction between three similar but different
situations.

1. One set of functions (e.g. gets()) deal with file input of indeterminate
   size.
2. Other functions (e.g. fgets()) deal with file input of limited size.
3. Yet other functions (e.g. strcpy()) deal with data internal to the program.

In order to guarantee that the buffer used by functions of type 1 will not
overflow, the programmer has to guarantee something *outside* the program:
that none of the lines in the file being read will ever exceed the buffer size.
Often the programmer cannot guarantee this.

But with functions of type 2 and 3, the programmer merely has to size things
*within* the program appropriately. The programmer *always* has the
capability to do this.

Scanf() and fscanf() can fit into type 1 or type 2 depending on whether
a field width is used in each conversion specification.
-- 
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email:     ...![backbone]!pacbell!pbhyf!rob   OR  rob@pbhyf.PacBell.COM
Office:    (415) 823-2417  Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301  R Bar JB, Concord, California

les@chinet.chi.il.us (Leslie Mikesell) (11/11/88)

In article <8841@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>By the way, have you removed scanf() from your C library as well?  Or
>sprintf()?  Or strcpy()?  They can be misused in the same way as gets().
>Let us know how happy your customers are once ALL such routines are gone.

With gets() and strcpy() a safe alternative exists.  Is everyone really
going to write their own safe versions of scanf() and sprintf()?  I always
wondered why the standard library versions have no way to control the
size of the output - maybe real programmers like core dumps?

Les Mikesell

les@chinet.chi.il.us (Leslie Mikesell) (11/11/88)

In article <9054@ihlpb.ATT.COM> gregg@ihlpb.ATT.COM (Wonderly) writes:

>I believe that the right thing to do is to use a new function called
>nlfgets (str, size, fp), that does exactly as gets(3). 

>....  I have written this exact function many times just to
>have the benefit of no strchr() call.

Same here, but I made it return the number of characters read so that
it can also avoid the strlen() call (which will be incorrect anyway
if there were any nulls in the line).  Why does any function return a
pointer that you obviously already knew?  Seems like it would only
be useful if you wanted to nest function calls and ignore errors. 
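A sketch of the count-returning variant described above. The name `read_line_count` is hypothetical. Because the caller gets an explicit length, a line containing NUL bytes is not cut short the way a later strlen() would cut it.

```c
#include <stdio.h>

/* Hypothetical variant that returns the number of characters stored
 * (excluding the discarded newline and the added '\0'), or -1 if
 * end-of-file or an error occurs before anything is read.  With an
 * explicit count, embedded NUL bytes are preserved and reported.
 * size must be at least 1. */
long read_line_count(char *buf, long size, FILE *fp)
{
    int c = 0;
    long n = 0;

    while (n < size - 1 && (c = getc(fp)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return (c == EOF && n == 0) ? -1 : n;
}
```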

Les Mikesell

daveh@marob.MASA.COM (Dave Hammond) (11/11/88)

In article <32301@oliveb.olivetti.com> chase@Ozona.UUCP (David Chase) writes:
>You should also consider retiring certain features of 'scanf' and
>'fscanf'.  A call along the lines of
>
>    scanf("%s", junk);
>
>is perfectly able to scribble past the end of 'junk'.  I'm not sure if
>there are other holes like this built in to the standard i/o library;
>it wouldn't hurt to check.  (I've never been a real fan of 'scanf',
>but it does seem marginally more useful and harder to replace than
>'gets').

Carrying this line of thought forward, it would seem that Mr. Chase
advocates retiring any library call which requires that the programmer
take responsibility for providing enough buffer space to handle the
data resulting from the call in question.

IMHO, if the programmer is aware that the library call does not know
about buffer length (which is obvious when no length parameter is passed
to the call), then it is the programmer's responsibility to ensure that
(a) the buffer is of an appropriate length for his/her application, or
(b) if an appropriate length can not be determined, the call should *not*
be used.

[the unnecessary overhead of calling scanf("%s") instead of fgets()
 or a getc() loop might also be pointed out -- but I suspect the example
 was nothing more than that]

Dave Hammond
  UUCP: ...!uunet!masa.com!{marob,dsix2}!daveh
DOMAIN: daveh@marob.masa.com
----------------------------------------------------------------------------

guy@auspex.UUCP (Guy Harris) (11/12/88)

>What's to prevent the following...

Nothing, other than intelligence on the part of the programmer. 
However, unless your application can guarantee that the input will
*never* have overly-long lines (or can hand a buffer *so* immense to
"gets" that it won't matter - but consider how big a buffer might well
have to be), there's nothing to prevent a blowup in a program using
"gets()".

I don't know that I'd argue that "gets()" should be removed, especially
since it's in the dpANS.  I would, however, argue that it should *never*
be used.

jwr@scotty.UUCP (Dier Retlaw Semaj) (11/12/88)

In article geoff@utstat.uucp (Geoff Collyer) writes:
>
>The recent exposure of the security bug in the 4BSD fingerd caused by
>use of gets(3) reminded me that gets is a bug waiting to happen and
>should be stamped out.

Would someone please explain to me what the problem with gets(3) is,
or point me in the appropriate direction of something that would.
I'd be interested in finding out.

Thank you.

-- 

Dier R. Semaj	{ames,cmcl2,rutgers}!rochester!kodak!fedsys!wally!jwr

--

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (11/12/88)

In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
| The recent exposure of the security bug in the 4BSD fingerd caused by
| use of gets(3) reminded me that gets is a bug waiting to happen and
| should be stamped out.  I have deleted gets from my stdio implementation

I hate to say this, but C allows many things which are unsafe. The
problem is not the language, or the library, but that people make bad
choices about their selection of features.

If you stamp out gets you will see postings of dozens of "public domain
replacements" for the gets features "left out of BSD 4.17" or whatever.
I don't disagree for a moment with your sentiment, and I see the
problem, but I think you will have better luck educating your users on
how to use the language than taking away all the parts with sharp edges.

The best way to get rid of gets is to offer a better alternative. I
wrote a "getsn" routine which looks like fgets but avoids putting the
newline in the buffer in the first place, and I would expect to find
that hundreds of others have done it, too. There is no way to strip the
newline as quickly as not putting it in the buffer in the first place.

| With your help, we can stamp out gets in our lifetimes.

From our header files and our libraries, but not from our programmers'
hearts (unfortunately).

| -- 
| Geoff Collyer	utzoo!utstat!geoff, geoff@utstat.toronto.edu


-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

mcdonald@uxe.cso.uiuc.edu (11/12/88)

>gets() is deliberately required for ANSI C standard conformance because
>a LOT of existing code relies on it.  Any vendor who omits this function
>will not be standard conforming and will not sell its compiler to those
>(expected to be MANY customers) who specify standard conformance.

How about fixing this, and the scanf and strcpy problems as well,
by a little outside-the-standard kludge? (Okay, I realize that every
time I suggest something like this, somebody tries to roast me,
but I am flameproof.) That is

#pragma _MAX_STRING_LENGTH=256  /*or some other suitable number*/

and the compiler would call special versions of gets, strcpy,
and cohorts, that stopped at such a maximum. Now I am not sure whether
the result of overrun would have to be a fatal error or whether
it could just stop copying, but that would at least prevent 
old bugs from biting too bad.

ggs@ulysses.homer.nj.att.com (Griff Smith) (11/12/88)

In article <9054@ihlpb.ATT.COM>, gregg@ihlpb.ATT.COM (Wonderly) writes:
> I believe that the right thing to do is to use a new function called
> nlfgets (str, size, fp), that does exactly as gets(3).  The biggest
> concern that most people have about moving from gets to fgets is the
> added hassle of doing a 
| 
> 	if ((t = strchr (buf, '\n')) != NULL)
> 		*t = 0;
| 
> This seems to be a lot of work when you may be processing thousands of
> lines of code.  I have written this exact function many times just to
> have the benefit of no strchr() call.
> -- 
> It isn't the DREAM that NASA's missing...  DOMAIN: gregg@ihlpb.att.com
> It's a direction!                          UUCP:   att!ihlpb!gregg

I think this misses the point.  Gets guarantees that you will read all
the characters in a line, but forces you to write insecure programs.
One must also make the dubious assumption that the first null
encountered is the terminal null rather than a null in the file.  Your
variation avoids the security problem, but preserves the ambiguity of
nulls.  It also adds another ambiguity: if someone hands you a line
that is longer than your buffer, you gratuitously break it into two
lines since you don't know where the newline is.  Fgets avoids all
these problems by marking the end of a line with newline.  Proper use
requires that you call fgets until you find the newline.  You may need
to use malloc as you discover that the line is much larger than
anticipated.  Fgets does have one annoying flaw: it should return the
character count instead of the worthless pointer to the destination.
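A minimal sketch of "call fgets until you find the newline, growing the buffer as needed", assuming the caller frees the result. The function name `read_whole_line`, the 128-byte starting size and the doubling strategy are illustrative choices, not part of the post.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one complete line of any length: call fgets repeatedly until
 * the newline (or end-of-file) turns up, doubling the buffer with
 * realloc whenever it fills.  Returns a malloc'd string that keeps the
 * newline if there was one, or NULL at end-of-file (nothing read) or
 * on allocation failure.  The caller must free the result.
 * Embedded NUL bytes are not handled (strlen stops at them). */
char *read_whole_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                 /* found the end of the line */
        if (len == cap - 1) {           /* buffer full: grow and go on */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
    }
    if (len == 0) {                     /* end-of-file before any data */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

Keeping the newline in the result is deliberate: it lets the caller distinguish a complete line from a final unterminated one, which is the property of fgets defended above.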

If you complain that all this fuss is unnecessary, since all reasonable
input will fit in the buffer you provided, you are really saying you
don't like to write correct programs.  I sometimes settle for `partially
correct': a program must either operate as specified or stop.  Breaking
lines isn't even partially correct.
-- 
Griff Smith	AT&T (Bell Laboratories), Murray Hill
Phone:		1-201-582-7736
UUCP:		{most AT&T sites}!ulysses!ggs
Internet:	ggs@ulysses.att.com

geoff@utstat.uucp (Geoff Collyer) (11/12/88)

> From: gwyn@smoke.BRL.MIL (Doug Gwyn )
> 
> gets() is deliberately required for ANSI C standard conformance because
> a LOT of existing code relies on it.

That's the whole point, Doug.  People *should* fix their existing code;
it's unsafe.

> Any vendor who omits this function
> will not be standard conforming and will not sell its compiler to those
> (expected to be MANY customers) who specify standard conformance.

Once the standards are changed, their code *will* be standard-conforming.

> Even if your philosophy is right, you should get others to go along with
> it BEFORE trying to force them to conform to it.

That's what I'm trying to do now: get people to agree, and then act on
that agreement.

> By the way, have you removed scanf() from your C library as well?  Or
> sprintf()?  Or strcpy()?  They can be misused in the same way as gets().

No, I have not; all of these functions *can* be used safely, though it
does take a little extra care.  The point is that gets() *can* *not* be
used safely; a dedicated opponent can *always* defeat a program that
reads with gets().

> I think the appropriate treatment of gets() is to omit it from the
> documentation or to document it with "Unless you have sufficient control
> over the data being read to be sure that it will not overflow the buffer,
> use fgets".

It is in general not possible to have sufficient control over the input
data.  Remember the old maxim "Never trust any input." or, as Kernighan
and Plauger put it in Elements of Programming Style, "Make sure input
cannot violate the limits of the program.".  It is *not* *possible* to
ensure that input cannot violate the limits of a program which uses
gets().  Someone can always provide input longer than the program's
gets() buffer.
-- 
Geoff Collyer	utzoo!utstat!geoff, geoff@utstat.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/12/88)

In article <2566@ihlpm.ATT.COM> snafu@ihlpm.ATT.COM (00704a-Wallis) writes:
-Actually, I don't understand the argument that
-gets() should be removed because it can overrun
-the buffer. What's to prevent the following (and
-how is it different from gets?):
-	char	some_string[10];
-	fgets( some_string, 2147483647, stdin );

The main difference is that the above example would immediately
raise a flag in the mind of almost any competent programmer reading
the code, whereas we have not yet attained that degree of awareness
concerning gets() on uncontrolled sources of input.

strcpy() also is widely abused, so my mentioning it was not spurious.
The solution is not to ban potentially dangerous tools, but to ensure
that people are properly trained in their safe use.

scs@athena.mit.edu (Steve Summit) (11/12/88)

In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>...gets is a bug waiting to happen and should be stamped out.

Getting rid of gets is an excellent idea.  I'm all for backwards
compatibility and not breaking existing code, but it's got to be
conscientiously written existing code, and to my way of thinking
no reasonable program should ever have been using gets.
(Apologies and condolences to those of you who do, and to the
original implementor.)

As an interim measure, why not recode gets with an implicit
maximum buffer size of, say, 512?  That is, implement it as if by

	char *gets(buf)
	char *buf;
	{
	return fgets(buf, 512, stdin);
	}

except with the requisite newline-stripping code added.  I doubt
this would break many programs, particularly since 512 is a
common buffer size anyway.  Programs that use bigger buffers will
just have to use fgets, if they're not already.
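Filling in the newline stripping the post leaves out, a capped replacement might be sketched as follows. The names `fgets_capped`, `gets_capped` and `GETS_MAX` are hypothetical, and the core takes a FILE * only so that it can be exercised on a file. Discarding the remainder of an overlong line is an added design choice (so that each call consumes one whole line, as gets does), not part of the suggestion. Like the suggestion itself, it is only safe if every caller's buffer really holds at least 512 bytes.

```c
#include <stdio.h>
#include <string.h>

#define GETS_MAX 512    /* assumed minimum size of every caller's buffer */

/* Capped stand-in for gets(): stores at most GETS_MAX-1 characters and
 * strips the newline.  Design choice (an assumption, not from the post):
 * if the line is too long, the rest of it is read and discarded, so the
 * next call starts on the next line, as it would with gets(). */
char *fgets_capped(char *buf, FILE *fp)
{
    char *nl;
    int c;

    if (fgets(buf, GETS_MAX, fp) == NULL)
        return NULL;
    if ((nl = strchr(buf, '\n')) != NULL) {
        *nl = '\0';                       /* whole line fit */
    } else {
        while ((c = getc(fp)) != EOF && c != '\n')
            ;                             /* skip the overlong remainder */
    }
    return buf;
}

/* Drop-in for gets(): still unsafe if buf is smaller than GETS_MAX. */
char *gets_capped(char *buf)
{
    return fgets_capped(buf, stdin);
}
```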

After we get rid of gets, we should get rid of calloc(n, size),
which doesn't really do anything for you that malloc(n * size)
doesn't do.  (This is not a security hole, just a quality-of-life
issue.)  calloc's only claim to fame is specious; its zero fill
property is misunderstood by many programmers and is sufficiently
useless that it can easily be replaced by bzero and/or memset for
those few instances that truly require filling with bytes of zero.
(Recall that such a zero fill does not necessarily result in NULL
pointers or 0.0 floating-point values, in the common case where
arrays or structures are being allocated.)

Finally, I'd not mourn the passing of scanf -- not just %s, but
all of it.  It just doesn't work robustly enough for its common
usage: interactive user input.  (For example, scanf("%d %d") gives
you no way of prodding the user if he only types one number;
newlines are acceptable whitespace, so the user can keep banging
the return key and getting nowhere because scanf hasn't returned
and your program can't say "please type two numbers" even if it
wants to.  A related case is scanf("%s") to read command lines:
you'd like to print another prompt if the user hits return
without typing a command, but you can't, again because scanf
doesn't return.)  sscanf can remain for picking apart strings
(perhaps read with fgets) while leaving the calling program in
control for error handling, and fscanf can remain for reading
carefully formatted data from files.
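A sketch of that division of labor, assuming line-oriented interactive input; `read_two_ints` is a hypothetical name. Each line is read with fgets and picked apart with sscanf, so a blank line or a lone number returns control to the program, which can then re-prompt instead of leaving scanf waiting.

```c
#include <stdio.h>

/* Prompt until a line containing two integers is entered.  Reading the
 * whole line first means a blank line or a single number comes back to
 * the program, which can re-prompt.  Returns 1 on success, 0 at
 * end-of-file.  Pass prompt == NULL to suppress prompting.  Lines longer
 * than the buffer are split and each piece is parsed separately. */
int read_two_ints(FILE *in, const char *prompt, int *a, int *b)
{
    char line[256];

    for (;;) {
        if (prompt != NULL) {
            fputs(prompt, stdout);
            fflush(stdout);
        }
        if (fgets(line, sizeof line, in) == NULL)
            return 0;
        if (sscanf(line, "%d %d", a, b) == 2)
            return 1;
        if (prompt != NULL)
            prompt = "please type two numbers: ";
    }
}
```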

(I'm not seriously suggesting getting rid of scanf; I know how
many programs use it.  To my mind, however, there are no good
uses of scanf, but as long as it exists, people are going to keep
using it, because it is extremely convenient.)

                                            Steve Summit
                                            scs@adam.pika.mit.edu

mesard@bbn.com (Wayne Mesard) (11/13/88)

From article <9054@ihlpb.ATT.COM>, by gregg@ihlpb.ATT.COM (Wonderly):
> The biggest
> concern that most people have about moving from gets to fgets is the
> added hassle of doing a 
> 
> 	if ((t = strchr (buf, '\n')) != NULL)
> 		*t = 0;
> 
> This seems to be a lot of work when you may be processing thousands of
> lines of code.

Or have this new function return a pointer to the _last_ char read,
instead of redundantly returning its first param.  This would require
exactly no extra work on the part of the library routine or the client
program.

-- 
void *Wayne_Mesard();         MESARD@BBN.COM         BBN, Cambridge, MA

henry@utzoo.uucp (Henry Spencer) (11/13/88)

In article <8841@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>I think the appropriate treatment of gets() is to omit it from the
>documentation or to document it...  But leave it in the library.

Actually, my suggestion to Geoff (which he did mention, note) was that
it ought to go into a separate backwards-compatibility library.  That
way it's available, *but* you have to ask for it explicitly.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (11/13/88)

In article <8847@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>... if your program can be sure that the line
>will fit, there is nothing wrong with using gets().

That's a large "if" in most cases, however.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (11/13/88)

In article <2566@ihlpm.ATT.COM> snafu@ihlpm.ATT.COM (00704a-Wallis) writes:
>... What's to prevent the following (and
>how is it different from gets?):
>	fgets( some_string, 2147483647, stdin );

Programmers with IQs larger than their waistlines?

Nobody can protect against stupid programmers.  But gets doesn't even give
you a chance to be smart.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

henry@utzoo.uucp (Henry Spencer) (11/13/88)

In article <6927@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes:
>... Is everyone really
>going to write their own safe versions of scanf() and sprintf()?  I always
>wondered why the standard library versions have no way to control the
>size of the output - maybe real programers like core dumps?

ANSI C came within a hair's breadth of including a length-limited sprintf.
(There are length-limiting provisions in scanf, if you read the manual.)
If there had been any prior experience with it, it probably would have
made it.  Sigh.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

peter@ficc.uu.net (Peter da Silva) (11/13/88)

In article <2566@ihlpm.ATT.COM>, snafu@ihlpm.ATT.COM (00704a-Wallis) writes:
> Actually, I don't understand the argument that
> gets() should be removed because it can overrun
> the buffer. What's to prevent the following (and
> how is it different from gets?):

> 	char	some_string[10];

> 	fgets( some_string, 2147483647, stdin );

This is a program bug... the programmer specified the wrong buffer size.
Unlike the case of gets, you can limit the read to the buffer size. In all
the other routines with the gets problem, a program can be written that will
not allow any buffer overflow:

	char buffer[10];

	sprintf(buffer, "%.9s", ptr);
	fscanf(fp, "%9s", buffer);
	fgets(buffer, 10, fp);

The problem is that there is no way to limit how much I/O gets will do.
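Because fgets is told the buffer size, the caller can also detect a line that did not fit: the buffer then holds no newline, and the rest of the line is still waiting in the stream. Below is a minimal sketch, assuming the caller wants the excess discarded. read_line_checked is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (size bytes) with fgets.
 * Returns 1 if the whole line fit, 0 if it was too long (the rest of
 * the line is discarded up to and including the newline), or EOF at
 * end of input. */
int read_line_checked(char *buf, int size, FILE *fp)
{
	int c;

	assert(buf != NULL && size > 1);
	if (fgets(buf, size, fp) == NULL)
		return EOF;
	if (strchr(buf, '\n') != NULL)
		return 1;		/* whole line, newline included */
	c = getc(fp);
	if (c == '\n' || c == EOF)
		return 1;		/* text fit exactly, or last line had no newline */
	/* buffer filled before the newline: discard the rest of the line */
	while ((c = getc(fp)) != EOF && c != '\n')
		;
	return 0;
}
```

Returning a separate status for over-long lines lets the caller decide whether to reject the input or carry on with the truncated text.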
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation
"Have you hugged  U  your wolf today?"     uunet.uu.net!ficc!peter
Disclaimer: My typos are my own damn business.   peter@ficc.uu.net

awm@gould.doc.ic.ac.uk (Aled Morris) (11/14/88)

I was going to suggest the following as a replacement for "gets":

	#define gets(buf) fgets(buf, sizeof(buf), stdin)

since all the examples I've seen of "gets" in use have been:

	char buf[10];
	...
	gets(buf);

But of course it won't work: (a) gets drops the newline at the end while fgets
keeps it, and (b) someone may have written:

	char *buf;
	buf = malloc(10);
	gets(buf);

(although the #define would be "fine" in this case: with 4-byte pointers,
sizeof(buf) is 4, so it would read at most 3 characters :-)

I guess there isn't an easy answer :-(  (but you didn't need me to tell you
that)

Aled Morris
systems programmer

    mail: awm@doc.ic.ac.uk    |    Department of Computing
    uucp: ..!ukc!icdoc!awm    |    Imperial College
    talk: 01-589-5111x5085    |    180 Queens Gate, London  SW7 2BZ

scs@athena.mit.edu (Steve Summit) (11/14/88)

In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>...gets is a bug waiting to happen and should be stamped out.

Getting rid of gets is an excellent idea.  I'm all for backwards
compatibility and not breaking existing code, but it's got to be
conscientiously written existing code, and to my way of thinking
no reasonable program should ever have been using gets.
(Apologies and condolences to those of you who do, and to the
original implementor.)

As an interim measure, why not recode gets with an implicit
maximum buffer size of, say, 512?  That is, implement it as if by

	char *gets(buf)
	char *buf;
	{
	return fgets(buf, 512, stdin);
	}

except with the requisite newline-stripping code added.  I doubt
this would break many programs, particularly since 512 is a
common buffer size anyway.  Programs that use bigger buffers will
just have to use fgets, if they don't already.
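Summit's interim gets, with the newline stripping filled in, might look like the sketch below. It is an illustration under assumptions: capped_gets and GETS_MAX are hypothetical names, and the stream is passed explicitly so the function can be tried on any FILE rather than only stdin. The caller's buffer must still hold at least GETS_MAX bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GETS_MAX 512	/* hypothetical name for the implicit limit */

/* Hypothetical gets replacement: reads at most GETS_MAX - 1 characters
 * from fp into buf and, like gets, removes the trailing newline.
 * Returns buf, or NULL at end of input or on error. */
char *capped_gets(char *buf, FILE *fp)
{
	char *nl;

	assert(buf != NULL && fp != NULL);
	if (fgets(buf, GETS_MAX, fp) == NULL)
		return NULL;
	nl = strchr(buf, '\n');
	if (nl != NULL)
		*nl = '\0';	/* strip the newline, as gets does */
	return buf;
}
```

Unlike gets, a line longer than 511 characters comes back in pieces over successive calls instead of overrunning the buffer.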

After we get rid of gets, we should get rid of calloc(n, size),
which doesn't really do anything for you that malloc(n * size)
doesn't do.  (This is not a security hole, just a quality-of-life
issue.)  calloc's only claim to fame is specious; its zero fill
property is misunderstood by many programmers and is sufficiently
useless that it can easily be replaced by bzero and/or memset for
those few instances that truly require filling with bytes of zero.
(Recall that such a zero fill does not necessarily result in NULL
pointers or 0.0 floating-point values, in the common case where
arrays or structures are being allocated.)
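The parenthetical is the portability issue. calloc, like memset with 0, produces all-bits-zero bytes, and the standard does not promise that this is a null pointer or 0.0. Below is a minimal sketch that allocates with malloc and assigns each member, assuming a hypothetical struct node. Assigning NULL or 0.0 makes the compiler supply the correct representation. The check on n guards the n * size multiplication against wraparound.

```c
#include <assert.h>
#include <stdlib.h>

struct node {			/* hypothetical example structure */
	struct node *next;
	double weight;
	int count;
};

/* Hypothetical helper: allocate n nodes with null pointers and zero
 * numbers, without assuming all-bits-zero means NULL or 0.0.
 * Returns NULL if n is 0, if n * sizeof(struct node) would wrap
 * around, or if malloc fails. */
struct node *alloc_nodes(size_t n)
{
	struct node *v;
	size_t i;

	if (n == 0 || n > (size_t)-1 / sizeof *v)
		return NULL;
	v = malloc(n * sizeof *v);
	if (v == NULL)
		return NULL;
	for (i = 0; i < n; i++) {
		v[i].next = NULL;	/* compiler supplies the null representation */
		v[i].weight = 0.0;
		v[i].count = 0;
	}
	return v;
}
```
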

Finally, I'd not mourn the passing of scanf -- not just %s, but
all of it.  It just doesn't work robustly enough for its common
usage: interactive user input.  (For example, scanf("%d %d") gives
you no way of prodding the user if he only types one number;
newlines are acceptable whitespace, so the user can keep banging
the return key and getting nowhere because scanf hasn't returned
and your program can't say "please type two numbers" even if it
wants to.  A related case is scanf("%s") to read command lines:
you'd like to print another prompt if the user hits return
without typing a command, but you can't, again because scanf
doesn't return.)  sscanf can remain for picking apart strings
(perhaps read with fgets) while leaving the calling program in
control for error handling, and fscanf can remain for reading
carefully formatted data from files.

(I'm not seriously suggesting getting rid of scanf; I know how
many programs use it.  To my mind, however, there are no good
uses of scanf, but as long as it exists, people are going to keep
using it, because it is extremely convenient.)
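The fgets-then-sscanf approach described above returns control to the program after every line, so the program can prompt again when the user presses Return without typing two numbers. This is a sketch under assumptions: read_two_ints, the 256-byte line buffer and the prompt text are illustrative, and a line longer than the buffer is simply treated as more than one line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prompt on out (if non-NULL) until a line read
 * from in contains two integers.  Returns 1 with *a and *b set, or 0
 * at end of input. */
int read_two_ints(FILE *in, FILE *out, int *a, int *b)
{
	char line[256];

	assert(in != NULL && a != NULL && b != NULL);
	for (;;) {
		if (out != NULL) {
			fputs("please type two numbers: ", out);
			fflush(out);
		}
		if (fgets(line, sizeof line, in) == NULL)
			return 0;
		/* sscanf works on the line already in hand, so an empty or
		 * incomplete line just fails here and we prompt again */
		if (sscanf(line, "%d %d", a, b) == 2)
			return 1;
	}
}
```
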

                                            Steve Summit
                                            scs@adam.pika.mit.edu

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/14/88)

In article <1988Nov11.232629.15414@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>> From: gwyn@smoke.BRL.MIL (Doug Gwyn )
>> gets() is deliberately required for ANSI C standard conformance because
>> a LOT of existing code relies on it.

>That's the whole point, Doug.  People *should* fix their existing code;
>it's unsafe.

Bullshit.  When I use gets() I use it safely.

>> Any vendor who omits this function
>> will not be standard conforming and will not sell its compiler to those
>> (expected to be MANY customers) who specify standard conformance.

>Once the standards are changed, their code *will* be standard-conforming.

The standard is not going to change.  This proposal has been debated and
rejected by X3J11 on more than one occasion.  (See my first sentence
quoted above.)

>> Even if your philosophy is right, you should get others to go along with
>> it BEFORE trying to force them to conform to it.

>That's what I'm trying to do now: get people to agree, and then act on
>that agreement.

It has already been tried, and failed.

>> By the way, have you removed scanf() from your C library as well?  Or
>> sprintf()?  Or strcpy()?  They can be misused in the same way as gets().

>No, I have not; all of these functions *can* be used safely, though it
>does take a little extra care.  The point is that gets() *can* *not* be
>used safely; a dedicated opponent can *always* defeat a program that
>reads with gets().

I already said "bullshit" to this so I need not repeat it here.

gets() has legitimate uses.  It is in the library Base Document.
It is widely used in existing code (sometimes safely, sometimes not).
It stays.

You seem to want to protect the programmer who is too stupid to
protect himself.  This is a dangerous thing to attempt where C is
concerned.  My god, pointers can really be abused -- maybe we
better get rid of them too.

The right thing to do, as I said before, is to educate craftsmen
in the proper use of their tools so they don't hurt themselves or
their customers.

ok@quintus.uucp (Richard A. O'Keefe) (11/14/88)

In article <7963@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>>...gets is a bug waiting to happen and should be stamped out.
>
>Getting rid of gets is an excellent idea.  I'm all for backwards
>compatibility and not breaking existing code, but it's got to be
>conscientiously written existing code, and to my way of thinking
>no reasonable program should ever have been using gets.
>(Apologies and condolences to those of you who do, and to the
>original implementor.)

When I am writing a program for my own use to process my own data
sets which I _know_ have reasonable lines, why the d---l shouldn't
I use gets()?  If I am writing a program for _other_ people to use,
I have an obligation to try to make it reasonably robust, but a lot
of my C programs are there for a day (I find it easier to write C
than awk, better debugging tools to start with... -- would a lint
for awk be called lawk?).

I have just posted a "safe gets" to comp.sources.misc.

desnoyer@Apple.COM (Peter Desnoyers) (11/15/88)

Perhaps I'm being naive, but wouldn't changing
  char buf[x];  gets( buf);
to
  char * buf;   buf = malloc( x);  gets( buf);
eliminate most (not all) of the security hole associated with gets()?
The problem seems to be not only the use of gets(), but the use of
temporary arrays on the stack to hold the output of dangerous
functions. If you keep the buffer off the stack you make it much more
difficult to exploit gets()'s unsafeness. (unless all you want to do
is make the program crash.)

				Peter Desnoyers

shankar@hpclscu.HP.COM (Shankar Unni) (11/15/88)

> gets() has legitimate uses.  It is in the library Base Document.
> It is widely used in existing code (sometimes safely, sometimes not).
> It stays.

Exactly how do you use gets "safely"? There is really no way to stop gets
from overwriting the end of your buffer, unless you fiddle around with the
internals of stdio. Or do you "know" that your buffer is large enough? (This
might be acceptable for a limited situation, like when you or a trusted
co-program is writing stuff that you're reading from stdin, but in a more
general case, it's impossible). 

The thing about gets is that until now, the hazards of using it have not been
adequately advertised. There is no mention in any book or reference on C
about how gets can be perverted to blow away your application. It does occur
to most C programmers ultimately that there is "something wrong" with gets
when you cannot specify the max length to read in, but the magnitude of the
problem rarely sinks in.

This is why the suggestion of moving gets() to a compatibility library
sounds so good: this gives you the opportunity of making C programmers
re-evaluate their use of gets(), and replace it with fgets() if they are
unsure of the security and integrity implications of using gets().

But then, C programmers are such a spoilt bunch (sigh!). They scream and
moan at the least little trouble they are put to :-(.
----
Shankar.

chase@Ozona.orc.olivetti.com (David Chase) (11/15/88)

In article <20588@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
>Perhaps I'm being naive, but wouldn't changing
>  char buf[x];  gets( buf);
>to
>  char * buf;   buf = malloc( x);  gets( buf);
>eliminate most (not all) of the security hole associated with gets()?

In practice it would make invasion difficult.  Do bear in mind that it
might not make it impossible; memory allocation may look like a black
box to you, but with a little care purposeful overwriting is possible
(for example, in testing a new garbage collector we discovered a bug
tickled by the collector that (independent of input) consistently
appeared at the same line while examining a structure at the same
address.  The cause?  Overrunning of a data structure in the heap.)
(Yes, a garbage collector for C --
           ~ftp/sun-source/gc.shar@titan.rice.edu
It works on Sun3s, Sun4s, Vaxes.  Send mail with subject "help" to
"archive-server@rice.edu" if you lack FTP access.)

What I fail to understand is why you couldn't just as easily write

  char * buf;   buf = malloc(x);  fgets(buf, x, stdin);

(yes, I know that fgets leaves the newline in the string)

People say again and again "but I know how big the input is in my
programs, so it's safe to use 'gets'".  If you know how big the input
is, then you might as well say it.  People talk about performing
certain hand-optimizations in a habitual way; is it too much to ask
people to acquire habits that make their programs more robust?
Optimizing a correct program is easier than correcting an optimized
program (more fun, too).

David

geoff@utstat.uucp (Geoff Collyer) (11/15/88)

> From: gwyn@smoke.BRL.MIL (Doug Gwyn )
> 
> Bullshit.  When I use gets() I use it safely.

Okay, Doug, let's take this again from the top.  I'll use simple words
and try to make myself utterly clear, and I won't even abuse your
ancestry or swear at you, which I think is awfully polite of me, under
the circumstances.

To be proven: gets(3) should be abolished.

Any program which uses gets(3) can be corrupted by giving it a
long-enough input line.  There is no protection possible against such an
attack, other than sh's trick of making the gets buffer the last object
in the data segment, catching the resulting SIGSEGV signal, growing the
data segment and returning from the signal catcher, and this is
certainly not portable to Cray-1s and Sun-3s, for example.  gets is
probably unique among C library functions because it cannot be used
safely, no matter how hard you wish or how hard you work.  Thus there
seems little point (aside from writing unsafe programs) in continuing to
support gets in standards and C libraries.  QED
-- 
Geoff Collyer	utzoo!utstat!geoff, geoff@utstat.toronto.edu

jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/15/88)

In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <1988Nov11.232629.15414@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
> 
> >That's the whole point, Doug.  People *should* fix their existing code;
> >it's unsafe.
> 
> Bullshit.  When I use gets() I use it safely.

Please give us an example of your safe use of gets().
-- 
Richard Kennaway
School of Information Systems, University of East Anglia, Norwich, U.K.
uucp:	...mcvax!ukc!uea-sys!jrk	Janet:	kennaway@uk.ac.uea.sys

jackson@freyja (Jerry Jackson) (11/16/88)

In article <32596@oliveb.olivetti.com>, chase@Ozona (David Chase) writes:

>(Yes, a garbage collector for C --
>           ~ftp/sun-source/gc.shar@titan.rice.edu
>It works on Sun3s, Sun4s, Vaxes.  Send mail with subject "help" to
>"archive-server@rice.edu" if you lack FTP access.)
>

I've written garbage collectors for lisp and have a pretty good idea
what is involved... I can't imagine what this does, but I'm pretty sure
it's something very different.  Could someone please explain what this
program does?

Thanks,

Jerry Jackson

dhesi@bsu-cs.UUCP (Rahul Dhesi) (11/16/88)

In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff
Collyer) writes:
>gets is
>probably unique among C library functions because it cannot be used
>safely, no matter how hard you wish or how hard you work.

Well, now, suppose we (a) write lines of known length to a file that is
not writable by anybody else, (b) open that file for input and make it
our standard input and (c) use gets with a buffer that is known to be
big enough to hold any line in that file.

Thus gets is probably unique among C library functions because it can
be used safely if you try hard enough, but it should not be used
anyway, because most practical uses of it are not safe.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

jimp@cognos.uucp (Jim Patterson) (11/16/88)

In article <7963@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>After we get rid of gets, we should get rid of calloc(n, size),
>which doesn't really do anything for you that malloc(n * size)
>doesn't do.  calloc's only claim to fame is specious; its zero fill
>property is misunderstood by many programmers and is sufficiently
>useless that it can easily be replaced by bzero and/or memset for
>those few instances that truly require filling with bytes of zero.
>(Recall that such a zero fill does not necessarily result in NULL
>pointers or 0.0 floating-point values, in the common case where
>arrays or structures are being allocated.)

I've seen this point made many times, and it's even in the ANSI draft
standard that null pointers don't HAVE to be all-bits-zero (as opposed
to the "null pointer constant", which IS required to be 0).
Realistically, though, are there REALLY C implementations out there
which don't take binary 0 to be a NULL pointer, or a floating-point
datum of all zero bits to be other than 0.0? I would be very
interested in hearing about such systems.

I know of at least one system where the system convention is not 0;
Data General MV systems have instructions which take -1 as the null
pointer value, and this has persisted through many system call
conventions as well. However, the C implementation still considers a
null pointer to be 0 even though this requires quite a bit of "glue"
around some system calls to interface between the two formats.
Requiring that the "null pointer constant" be 0, as ANSI C does, just
makes any other implementation painfully difficult (and is begging for
problems when porting software as well).

I don't consider calloc() specious; if you have a large table to
allocate even memset() can be too much overhead if you can do it
better. Explicitly setting all elements of a table to the appropriate
sort of 0, while maximally portable, is definitely even less efficient
(assuming your memset implementation isn't completely out to lunch).
If efficiency isn't a problem, fine, but often it is.

Where a good implementation of calloc() can shine is in virtual memory
(VM) environments where it can avoid actually faulting in the pages
that you allocate. On many VM systems you can do this using a
demand-page-zero page type which is allocated and cleared to zero when
it's first referenced (VAX VMS is one system that supports this).  You
can't take advantage of this using malloc() and memset() (or explicit
initialization). You are forced to fault in the entire area to clear
it to zero, even though if it's a large area much of it will likely be
faulted out again before you reference it again (if you do).

It's worth noting that pre-clearing memory shouldn't be considered
wasted overhead on the part of the OS. It's an important security
precaution, to prevent other system users from poking through memory
that used to belong to someone else and which could contain sensitive
information. This may not be important to all users, but it is to
many.
-- 
Jim Patterson                              Cognos Incorporated
UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
                                           Ottawa, Ont  K1G 3Z4

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/16/88)

In article <660023@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes:
>Or do you "know" that your buffer is large enough?

Sometimes one does know exactly this.

Of course you don't know it for a general-purpose utility whose stdin
can be directed from random places.  So you don't use gets() then.

>The thing about gets is that until now, the hazards of using it have not been
>adequately advertised. There is no mention in any book or reference on C
>about how gets can be perverted to blow away your application.

The potential for abuse of gets() was quite well known before the virus
attack.  For instance, I don't think anyone on X3J11 was unaware of it.
I'm pretty sure this has been discussed in comp.lang.c (INFO-C) more
than once before.

To take just two of the standard C texts:

Harbison & Steele, "C: A Reference Manual":
	The use of gets can be dangerous because it is always possible
	for the input length to exceed the storage available in the
	character array.

Plum, "Reliable Data Structures in C":
	Since it provides no means of specifying the size of the
	receiving string, it can seldom be used in reliable programs.
	Furthermore, it gives no convenient way to tell whether a
	newline was present in the input.  The fgets function is more
	reliable, but oftentimes awkward to use.

From the Rationale for Draft Proposed American National Standard for
Information Systems -- Programming Language C:
	4.9.7.2  The fgets function
	This function subsumes gets, which has no limit to prevent
	storage overwrite on arbitrary input (see section 4.9.7.7).

>But then, C programmers are such a spoilt bunch (sigh!). They scream and
>moan at the least little trouble they are put to :-(.

I will complain if you try to enforce your notions of proper style on me,
or try to protect me from myself.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/16/88)

In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>To be proven: ...

Did you really get through math classes with such a notion of "proof"?
"Any program ... can be corrupted ..." and "There is no protection
possible ..." and "it cannot be used safely ..." are simply stated,
not demonstrated.  In fact they're wrong.  I routinely use gets() in
an utterly safe manner.  I'll let you try to figure out how this can be.
(Hint:  Examine your notion of what is "always" possible.)

ok@quintus.uucp (Richard A. O'Keefe) (11/16/88)

In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>Any program which uses gets(3) can be corrupted by giving it a
>long-enough input line.  There is no protection possible against such an
>attack

There is a false assumption in this, namely that an attacker can control
the input to every program.  If I have a program which _only_ I have
permission to execute, and I _always_ use it in a pipeline (or in a
command script), and the preceding program in the pipeline (or script)
always generates sufficiently short lines, it is safe to use gets().
The input to such a program is _every_ bit as much under my control as
the source argument of strcpy().

daveh@marob.MASA.COM (Dave Hammond) (11/16/88)

In article <225800090@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>How about fixing this, and the scanf and strcpy problems as well,
>by a little outside-the-standard kludge? (Okay, I realize that every
>time I suggest something like this, somebody tries to roast me,
>but I am flameproof.) That is
>
>#pragma _MAX_STRING_LENGTH=256  /*or some other suitable number*/
>
>and the compiler would call special versions of gets, strcpy,
>and cohorts, that stopped at such a maximum. Now I am not sure whether
>the result of overrun would have to be a fatal error or whether
>it could just stop copying, but that would at least prevent 
>old bugs from biting too bad.

Sorry, but I fail to see where this (and a previous article suggesting
a 512 byte limit) helps the problem if the programmer uses a buffer
whose length is smaller than MAX_STRING_LENGTH.  The result is still
going to be an overflowed buffer, which is still going to be wrong.

The best solution is to have the programmer instruct the function as
to the *true* buffer length, and this can only be done with a function
which expects a length parameter (eg fgets()).

Dave Hammond
  UUCP: ...!uunet!masa.com!{marob,dsix2}!daveh
DOMAIN: daveh@marob.masa.com
----------------------------------------------------------------------------

henry@utzoo.uucp (Henry Spencer) (11/17/88)

In article <8902@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>... In fact they're wrong.  I routinely use gets() in
>an utterly safe manner...

Well, "utterly safe" if you're always very careful that part A of your
program preserves the length limits that part B is relying on.  Personally
I prefer slightly more robust programming, especially when there's no
significant difference in convenience or efficiency.

news@ism780c.isc.com (News system) (11/17/88)

In article <682@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>
>There is a false assumption in this, namely that an attacker can control
>the input to every program.  If I have a program which _only_ I have
>permission to execute, and I _always_ use it in a pipeline (or in a
>command script), and the preceding program in the pipeline (or script)
>always generates sufficiently short lines, it is safe to use gets().
>The input to such a program is _every_ bit as much under my control as
>the source argument of strcpy().

No one worries much about a program written by Mr O'Keefe that can be
executed only by Mr O'Keefe.  What worries most people is programs
distributed for public use that are written by someone who is unaware of the
'gets problem'.  Simply admonishing programmers (of publicly available
software) to avoid making the 'gets mistake' is less effective than removing
gets from the library.

I would like to suggest a library routine to replace gets say,
safegets(buffer,count), which for lines no longer than count would behave
like gets, and for lines longer than count would place the first count-1
characters of the line into the buffer followed by a '\0'.  The value
returned by safegets is the line length (or EOF).
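Here is one possible reading of this proposal, as a sketch. Where the description is silent, the sketch makes these assumptions: count is the buffer size, the newline is dropped as gets does, the rest of an over-long line is read and discarded, and the returned length is that of the whole line, so the caller can detect truncation by checking whether it is count or more. fsafegets is a hypothetical variant with an explicit stream, and its length saturates at INT_MAX rather than wrapping around.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical: read one line from fp.  At most count-1 characters are
 * stored in buffer, always followed by '\0'; the newline is not stored.
 * Characters beyond count-1 are read and discarded.  Returns the full
 * line length (newline excluded, saturated at INT_MAX), or EOF if end
 * of input comes before any character. */
int fsafegets(char *buffer, int count, FILE *fp)
{
	int c;
	int len = 0;

	assert(buffer != NULL && count > 0);
	while ((c = getc(fp)) != EOF && c != '\n') {
		if (len < count - 1)
			buffer[len] = (char)c;
		if (len < INT_MAX)
			len++;		/* saturate instead of wrapping around */
	}
	if (c == EOF && len == 0)
		return EOF;
	buffer[len < count - 1 ? len : count - 1] = '\0';
	return len;
}

/* The proposed two-argument interface, reading from stdin. */
int safegets(char *buffer, int count)
{
	return fsafegets(buffer, count, stdin);
}
```
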

    Marv Rubinstein

john@frog.UUCP (John Woods) (11/17/88)

In article <660023@hpclscu.HP.COM>, shankar@hpclscu.HP.COM (Shankar Unni) writes:
> > gets() has legitimate uses.  It is in the library Base Document.
> > It is widely used in existing code (sometimes safely, sometimes not).
> > It stays.
> Exactly how do you use gets "safely"?

The only case I can think of is when you have a process that fork()s, and the
parent feeds the child stuff which is guaranteed to fit into the buffer.

I used to think that parsing machine-generated output files was another case.
Then one day my program for analyzing /usr/spool/uucp/SYSLOG started blowing
out because I had run out of space during a uucp transfer...

gets.  A clock-tick of convenience.  A process-lifetime of regret.  :-)
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Science does not remove the TERROR of the Gods!

chris@mimsy.UUCP (Chris Torek) (11/18/88)

In article <4509@aldebaran.UUCP> jimp@cognos.uucp (Jim Patterson) writes:
>Realistically, though, are there REALLY C implementations out there
>which don't take binary 0 to be a NULL pointer, or a floating-point
>datum of all zero bits to be other than 0.0?

The S1 project at LLL built such a machine, and the people working
on it eventually gave in and made all-bits-zero be a nil pointer.
It was less work than fixing all the incorrect programs.

>Data General MV systems have instructions which take -1 as the null
>pointer value....  the C implementation still considers a
>null pointer to be 0 even though this requires quite a bit of "glue"
>around some system calls to interface between the two formats.
>Requiring that the "null pointer constant" be 0, as ANSI C does, just
>makes any other implementation painfully difficult (and is begging for
>problems when porting software as well).

It is neither particularly painful nor difficult, but it is indeed
begging to expose all the old bugs (similar to what Sun did when
porting 4.2BSD onto their hardware, where *(char *)0 was not 0, but
rather `segmentation fault').

>Where a good implementation of calloc() can shine is in virtual memory
>(VM) environments where it can avoid actually faulting in the pages
>that you allocate. ... demand-page-zero page type ... (VAX VMS is one
>system that supports this).

4BSD Unix also supports it.

While this is true, it is also true that malloc() can avoid faulting in
the pages too, if you simply leave them unset.  For bounded operations
(i.e., you are not going to go referencing the uninitialised memory)
this is just as efficient: pages not used are not touched.  Of course,
unset memory is a good place for bugs to hide.

(If you want to get really silly, memset() could ask the O/S to map out
any full pages, marking them as `c'-fill, where c is what memset is to
fill with.  I wonder if this would actually ever pay off?)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/18/88)

In article <4509@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
>Requiring that the "null pointer constant" be 0, as ANSI C does, just
>makes any other implementation painfully difficult (and is begging for
>problems when porting software as well).

Please get your facts straight before complaining.  C has always
allowed a null pointer constant to be written as 0.  ANSI C merely
makes (void*)0 a valid alternative way to write a null pointer
constant.  (K&R C didn't have void*.)

The contexts where a null pointer constant is being used aren't
all that hard for a compiler to determine, and it can generate
whatever code is necessary for such cases.  By no means is an
all-0-bit representation forced on the implementation.

>It's worth noting that pre-clearing memory shouldn't be considered
>wasted overhead on the part of the OS. It's an important security
>precaution, to prevent other system users from poking through memory
>that used to belong to someone else and which could contain sensitive
>information. This may not be important to all users, but it is to
>many.

All the UNIX implementations I know of arrange for extended program
break memory (heap) and stack to be zeroed.  It would be even safer
to zero it just before relinquishing process ownership of it.

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/18/88)

In article <1988Nov16.184238.16375@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <8902@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>... In fact they're wrong.  I routinely use gets() in
>>an utterly safe manner...
>Well, "utterly safe" if you're always very careful that part A of your
>program preserves the length limits that part B is relying on.  Personally
>I prefer slightly more robust programming, especially when there's no
>significant difference in convenience or efficiency.

Why work harder when gets() does exactly what one needs?

Another safe use is for small "one-shot" test programs etc. that are
to be used only by persons and procedures that will not exceed the
limits.  I've written quite a few of these over the years and they
have never had their buffers overrun, because nobody who is in a
position to do so (me, usually) has the least interest in doing so.

ok@quintus.uucp (Richard A. O'Keefe) (11/19/88)

In article <19278@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
>I would like to suggest a library routine to replace gets say,
>safegets(buffer,count), which for lines no longer than count would behave
>like gets, and for lines longer than count would place the first count-1
>characters of the line into the buffer followed by a '\0'.  The value
>returned by safegets is the line length (or EOF).

Believing that co-operation is more constructive than criticism, I posted
just such a routine to comp.sources.misc a couple of days ago, called
getsafe().  The return value is the number of characters in the line
_including_ the \n, or 0 for EOF.

However, a couple of other people on the net have pointed out problems with
my code, such as the possibility of someone supplying >2**32 characters of
input so that the counter would wrap around, and some things to be done for
dpANS compatibility.  I have included these changes, and in a day or two
(in case anyone else spots something wrong) will post the revised version.

Trying to make getsafe() absolutely foolproof (and portable) has been an
educational experience for me.  I have come to the conclusion that there
is something _worse_, far far worse, than gets(), and that is the routines
which people took great care to make safe, but because of C's under-
specified integer arithmetic, aren't.  (Leaving aside the fact that in
UNIX it is _impossible_ for a C program to be sure of getting the right
value of errno -- and no, 'volatile' doesn't fix that, it just stops the
compiler making it worse.)

henry@utzoo.uucp (Henry Spencer) (11/20/88)

In article <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>>Well, "utterly safe" if you're always very careful that part A of your
>>program preserves the length limits that part B is relying on...
>
>Why work harder when gets() does exactly what one needs?

What "work harder"?  It's a few more characters of typing.

>Another safe use is for small "one-shot" test programs etc...

Agreed, provided one is careful to destroy those programs after their
one shot is fired.  Such programs have a depressing tendency to persist,
and even to end up in 4BSD distributions... :-(

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/20/88)

In article <1988Nov19.214209.27406@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>What "work harder"?  It's a few more characters of typing.

It's considerably more than "a few characters".  Enough so that if I
didn't have gets() I'd write one and add it to my personal library.
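To make the disagreement concrete, here is a minimal sketch of what replacing gets(buf) with fgets() can involve. The bare conversion, fgets(buf, sizeof buf, stdin) followed by stripping the newline, is the "few more characters". Deciding what to do with the rest of an over-long line is where more work comes in. The name fgets_nonl() and the policy of discarding the remainder of a long line are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp into buf (size bytes).
   Like gets(), the trailing '\n' is removed; unlike gets(), at most
   size-1 characters are stored. If the line is longer than that, the
   rest of it is read and thrown away (one possible policy).
   Returns buf, or NULL at EOF or on a read error. */
char *fgets_nonl(char *buf, int size, FILE *fp)
{
    size_t n;

    if (fgets(buf, size, fp) == NULL)
        return NULL;
    n = strcspn(buf, "\n");
    if (buf[n] == '\n') {
        buf[n] = '\0';          /* the "few more characters" */
    } else {
        int c;                  /* line too long (or EOF): skip to its end */
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
    return buf;
}
```

Converting gets(line) then becomes fgets_nonl(line, sizeof line, stdin), which is only correct when line is an array rather than a pointer.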

jas@ernie.Berkeley.EDU (Jim Shankland) (11/21/88)

In article <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
[In effect:  sometimes gets() really is safe, or sufficient:  e.g., in programs
whose input is known a priori, or in small, one-shot test programs, or ....]

>Why work harder when gets() does exactly what one needs?

But how much harder do you end up working without gets()?  Using fgets()
isn't exactly 5 years of hard labor.  gets() just doesn't seem to
provide much added value, and is almost never safe.  (I've certainly
written some small, one-shot test programs that ended up being so useful
that lots of people had the opportunity to gag at my "one-shot" code.)

Jim

tanner@cdis-1.uucp (Dr. T. Andrews) (11/21/88)

In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
) Bullshit.  When I use gets() I use it safely.

I suspect that I am far from the only one who would be most
interested in learning to use gets(3) safely.
-- 
...!bikini.cis.ufl.edu!ki4pv!cdis-1!tanner  ...!bpa!cdin-1!cdis-1!tanner
or...  {allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!tanner

meissner@xyzzy.UUCP (Usenet Administration) (11/22/88)

In article <4509@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
	/* stuff deleted */
| I know of at least one system where the system convention is not 0;
| Data General MV systems have instructions which take -1 as the null
| pointer value, and this has persisted through many system call
| conventions as well. However, the C implementation still considers a
| null pointer to be 0 even though this requires quite a bit of "glue"
| around some system calls to interface between the two formats.
| Requiring that the "null pointer constant" be 0, as ANSI C does, just
| makes any other implementation painfully difficult (and is begging for
| problems when porting software as well).

Sigh....  Yes the MV does have some queue instructions that take -1
for a null pointer.  However, the general NULL pointer as defined by
the C library is all 0's, as it is for other DG languages.  Whatever
other faults we have (three pointer types, etc.), a non-zero NULL is
not one of them.

And yes there are some system calls that want -1 in pointer fields as
a special value, there are also system calls that want you to do a
logical OR with the high bit set.  Such is life.....

-- 
Michael Meissner, Data General.

Uucp:	...!mcnc!rti!xyzzy!meissner
Arpa:	meissner@dg-rtp.DG.COM   (or) meissner%dg-rtp.DG.COM@relay.cs.net
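The distinction in this exchange is between the null pointer constant in C source (0 or NULL), which the compiler converts to whatever representation the machine uses, and the bit pattern of a null pointer, which need not be all zeros. A minimal sketch of code that stays correct either way; the struct node type and node_new() are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical list node with a pointer member. */
struct node {
    struct node *next;
    int value;
};

/* Portable: assigning NULL (or a literal 0) to a pointer yields the
   machine's null pointer, whatever its bit pattern. Clearing the
   struct with calloc() or memset() would set every byte to zero, which
   is a null pointer only where null is represented as all-zero bits. */
struct node *node_new(int value)
{
    struct node *n = malloc(sizeof *n);

    if (n != NULL) {
        n->next = NULL;
        n->value = value;
    }
    return n;
}
```

Initializing with `struct node z = { 0 };` is also portable, because initialization is defined in terms of values rather than bytes, so z.next is a true null pointer.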

daveb@gonzo.UUCP (Dave Brower) (11/22/88)

>Jas writes:
>In <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>
>[In effect:  sometimes gets() really is safe, or sufficient:  e.g., in
>programs whose input is known a priori, or in small, one-shot test
>programs, or ....]
>
>>Why work harder when gets() does exactly what one needs?
>
>But how much harder do you end up working without gets()?  Using fgets()
>isn't exactly 5 years of hard labor.  gets() just doesn't seem to
>provide much added value, and is almost never safe.  (I've certainly
>written some small, one-shot test programs that ended up being so useful
>that lots of people had the opportunity to gag at my "one-shot" code.)

Jasbo obviously doesn't want to tell the story here, so I will, with
minor embellishments.

Once upon a time, the 'Zbo tried to figure out how the large character
writing on a VT100 family terminal worked.  In order to do so, he wrote
a little test program that took the command line arguments like echo,
and spat them out with the right escape sequences and multi-line
duplication to correctly drive the terminal.

It is lost to history who started it, but there followed a brief period
of "writebig" wars, with surreal messages in large letters appearing at
random times on compatriots' screens, to humorous effect.

"Wonderful!" said the workmates, who quickly snatched the program for
use in a messaging service that would send out the clarion call to go to
lunch in nice big letters.

"But! But! But!  It's a hack!", said 'Zbo, "I don't want to support it!"

And the users said, "Oh pleeaze, 'Zbo, it's so handy, please don't take
it away."  And then they whispered, "it would be awfully nice if it
would center lines on the screen"

"No! No! No!", said 'Zbo, "It's a hack!  I don't want to be responsible!
Next thing I know, people will start asking for documentation!"

But the users cajoled and begged, and twisted the 'Zbo's arm, and
writebig was changed to center lines.

Then one day at tea, the Ceferino Lamb innocently inquired about the
neat announcement program that wrote letters double high and double
wide.  "Writebig, huh.  Where's the man page?"  And the 'Zbo let out a
quiet scream.

Moral:  There is _never_ a one shot test program.

(Corollaries are left as an exercise for the reader.)

-dB

[ This is one of the few times I've ever disagreed with Doug Gwyn.
  Don't use gets().  It is the work of the devil. ]




-- 
If life was like the movies, the music would match the picture.

{sun,mtxinu,hoptoad}!rtech!gonzo!daveb		daveb@gonzo.uucp

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/23/88)

In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>Moral:  There is _never_ a one shot test program.

That is simply untrue.  I've written scads of them over the years,
probably an average of one per week.

badri@valhalla.ee.rochester.edu (Badri Lokanathan) (11/24/88)

In article <8959@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
> >Moral:  There is _never_ a one shot test program.
> 
> That is simply untrue.  I've written scads of them over the years,
> probably an average of one per week.

I hate to add to a discussion that is going nowhere, but I must say
I agree with Doug. As part of my research I design and implement many
algorithms, most of which are modules for a bigger package. Almost all
of them have a

#ifdef DEBUG_MAIN
int main(void) {
  /* ... test driver ... */
  return 0;
}
#endif /* DEBUG_MAIN */

built into them for stand-alone debugging. Here, gets, puts and scanf are the
easiest way to do I/O, and I use them all the time. It does not make any
sense to worry about safe gets, coz' this part of the code is never going to
be used by anybody for purposes other than testing. Quick and easy is the way
to go.
-- 
"Don't blame me for wanting more         {) badri@ee.rochester.edu
 The facts are too hard to ignore       //\\ {ames,cmcl2,columbia,cornell,
 I'm scared to death of poverty        ///\\\ garp,harvard,ll-xn,rutgers}!
 I only want what's best for me."-UB40   /\    rochester!ur-valhalla!badri
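A sketch of the #ifdef DEBUG_MAIN arrangement described in this post, with a driver that reads its input through fgets() instead of gets(). The module function count_words() is hypothetical, chosen only to have something to test. Because main() exists only when the file is compiled with -DDEBUG_MAIN, the same file can be linked into the larger package unchanged.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical module function under test: counts
   whitespace-separated words in s. */
size_t count_words(const char *s)
{
    size_t n = 0;
    int in_word = 0;

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}

#ifdef DEBUG_MAIN
/* Stand-alone test driver, compiled only with -DDEBUG_MAIN.
   fgets(buf, sizeof buf, stdin) is barely more typing than gets(buf),
   and an over-long line is split into pieces instead of overrunning buf. */
int main(void)
{
    char buf[256];

    while (fgets(buf, sizeof buf, stdin) != NULL)
        printf("%lu\n", (unsigned long)count_words(buf));
    return 0;
}
#endif /* DEBUG_MAIN */
```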

barmar@think.COM (Barry Margolin) (11/24/88)

In article <8959@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
]In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
]>Moral:  There is _never_ a one shot test program.
]That is simply untrue.  I've written scads of them over the years,
]probably an average of one per week.

OK, how about this one:

Moral: You can never be sure that a program will be a one-shot test
program.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

rob@pbhyf.PacBell.COM (Rob Bernardo) (11/24/88)

In article <1606@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
+ Almost all of them have a
+
+#ifdef DEBUG_MAIN
+main() {
+  .
+  .
+}
+#endif DEBUG_MAIN
+
+built into them for stand-alone debugging. ...  It does not make any
+sense to worry about safe gets, coz' this part of the code is never going to
+be used by anybody for purposes other than testing. Quick and easy is the way
+to go.

Um, er, wasn't it a debug part of sendmail that had its security hole that
many people compiled in anyway?
-- 
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email:     ...![backbone]!pacbell!pbhyf!rob   OR  rob@pbhyf.PacBell.COM
Office:    (415) 823-2417  Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301  R Bar JB, Concord, California

mcdonald@uxe.cso.uiuc.edu (11/24/88)

>I suspect that I am far from the only one who would be most
>interested in learning to use gets(3) safely.
-
I don't understand how you can get something like

    gets(3);

past a compiler. Isn't 'gets' supposed to take a char * argument,
not an int literal?

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/24/88)

In article <32095@think.UUCP> barmar@kulla.think.com.UUCP (Barry Margolin) writes:
-OK, how about this one:
-Moral: You can never be sure that a program will be a one-shot test
-program.

Still not true.

rob@pbhyf.PacBell.COM (Rob Bernardo) (11/25/88)

In article <225800095@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
+>I suspect that I am far from the only one who would be most
+>interested in learning to use gets(3) safely.
+I don't understand how you can get something like
+    gets(3);
+past a compiler. Isn't 'gets' supposed to take a char * argument,
+not an int literal?

You win the Gracie Allen Award of C!

Reminds me of something that happened in code walkthrough. One of the reviewers
noticed that all the exit statements were:
	exit(2);
and asked the programmer why an exit value of two was used regardless of
the exit conditions. The programmer replied, "That's what it says on
the top of the man page."


-- 
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email:     ...![backbone]!pacbell!pbhyf!rob   OR  rob@pbhyf.PacBell.COM
Office:    (415) 823-2417  Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301  R Bar JB, Concord, California

atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (11/26/88)

In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>Let your vendor know that you want to see gets deleted from its next
>release, delete gets.o from your C library, move gets.o to -lgets,
>define gets(s) as "gets is unsafe; use fgets(3)"<><><> in your stdio.h;
>do whatever you can to help.
>
>If your vendor protests your reasonable request, point out that gets,
>as part of stdio, is a decade-old backward compatibility hack for
>compatibility with the Sixth Edition UNIX Portable I/O Library, which
>was utterly replaced by stdio no later than 1979.  Accept no excuses;
>converting programs from using gets to fgets is largely mechanical,
>and stripping trailing newlines is trivial to code yourself.
>
While the vendor may sympathize with the reasoning, the mechanics
of the US Federal bureaucracy work against this.  As long as
gets() is in an official ANSI standard, it will be in the validation
suites.  Part of the boilerplate used in sales contracts to
the US government is that the compiler must be an officially
validated compiler (lawyers and accountants don't care about
the dangers of GETS/FGETS, just that it be "certified").  In other
words, once the ANSI standard gets passed and someone gets themselves
declared an official certifier, you can't sell your compiler to
a US Federal department without such certification.  That is a
lot of revenue for a vendor to give up to satisfy your request.

henry@utzoo.uucp (Henry Spencer) (11/27/88)

In article <22402@watmath.waterloo.edu> atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) writes:
>... As long as
>gets() is in the an official ANSI standard, it will be in a the validation
>suites.  Part of the boiler plate used in sales contracts to the
>the US government is that the compiler must be an officially
>validated compiler...

It is not necessary for a vendor to give up validation-suite compliance for
the sake of discouraging use of gets().  How a compiler is invoked is
compiler-specific in any case; putting gets() in a separate library and
requiring that it be explicitly included (e.g. with "-lunsafe") retains
compliance (and the ability to compile broken old programs) while still
pushing in the right direction.
-- 
Sendmail is a bug,             |     Henry Spencer at U of Toronto Zoology
not a feature.                 | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

daveb@geaclib.UUCP (David Collier-Brown) (11/27/88)

> In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>>If your vendor protests your reasonable request, point out that gets,
>>as part of stdio, is a decade-old backward compatibility hack for
>>compatibility with the Sixth Edition UNIX Portable I/O Library, which
>>was utterly replaced by stdio no later than 1979.  

From article <22402@watmath.waterloo.edu>, by atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]):
> While the vendor may sympathize with the reasoning, the mechanics
> of the the US Federal bureaucracy work against this.  As long as
> gets() is in the an official ANSI standard, it will be in a the validation
> suites.

  (Hi, Allan!)
  This raises the interesting, and possibly invidious, question of
why the ANSI C standard includes gets...  It may prove advisable to
ask for its elimination on the next (NOT! current) round of
standardization, and a request from the (U.S.) DOD Computer Security
Center (sic) for an exception in the validation suite...

--dave
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 Interleaf Canada Inc. |
 1550 Enterprise Rd.   | HE's so smart he's dumb.
 Mississauga, Ontario  |       --Joyce C-B

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/27/88)

In article <1988Nov27.005945.29173@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>It is not necessary for a vendor to give up validation-suite compliance for
>the sake of discouraging use of gets().  How a compiler is invoked is
>compiler-specific in any case; putting gets() in a separate library and
>requiring that it be explicitly included (e.g. with "-lunsafe") retains
>compliance (and the ability to compile broken old programs) while still
>pushing in the right direction.

This is dumb, dumb, dumb.  Now you want C vendors to have to support
multiple levels of compilation, one which is harder to invoke for the
standard C environment and one that is just like it except it's missing
gets() from the C library.  This is NOT a "push in the right direction";
it adds complexity merely because some people hate a particular function.
I happen to dislike several library functions for reasons similar to
those put forth against gets(); should vendors also segregate those out
into a -lgwyn_disapproved library?  Why would that be any more absurd
than your suggestion?

Might I suggest that you simply add to whatever code-quality checks you
perform something along the following lines and LEAVE C ALONE:
	grep -n '[^a-zA-Z0-9_]gets[^a-zA-Z0-9_]' /dev/null "$@" && \
		echo "Henry thinks you shouldn't be using gets()."
That is the right place to apply your notions of proper coding style.

It amazes me how ready people are to jump onto a totally irrelevant
bandwagon in the aftermath of the Internet virus/worm attack.  If
you really think that lack of gets() in somebody's C library would
have prevented the attack, you're quite mistaken.  A programmer who
made the mistake that allowed the virus to enter through the 4BSD
finger daemon would very likely have been equally careless with
numerous other language and operating system facilities.  In fact
there have been several such security holes discovered so far, and
the famous virus/worm exploited only a couple of them to enter
systems.  You cannot fix the security problems by removing every
function that somebody misuses from the C library; there wouldn't
be many left if you took that approach.  Learn to use what's there
wisely, and when there isn't a canned function suitable for the job
invent one (preferably nicely designed and published so it will
eventually be a candidate for addition to the standard library).

I avoid use of gets() in general-purpose input code, but I still
want it in the C library for the times when it IS appropriate and
useful.  If vendors really are so stupid as to try to make it
hard to find, they're going to have trouble convincing me that
they want to sell C implementations to me.  Of course if necessary
I would immediately cons up a public-domain implementation, add it
to the deficient libraries, and spread it around for others in the
same boat.  The net result would have been just a lot of extra
trouble to get back to the point from which we started.

To repeat my main point: gets() is NOT a problem.  Programmers who
don't think clearly enough about what they are doing ARE the problem.
You cannot solve the real problem by working on the non-problem.
It's at best a waste of time and potentially a nuisance; at worst it
draws attention away from real causes for lack of system security and
gives people a false sense of security, on the misperception that the
problem has been properly dealt with.

peter@ficc.uu.net (Peter da Silva) (11/28/88)

In article <7008@cdis-1.uucp>, tanner@cdis-1.uucp (Dr. T. Andrews) writes:
> In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> ) Bullshit.  When I use gets() I use it safely.

> I suspect that I am far from the only one who would be most
> interested in learning to use gets(3) safely.

Um, wear a prophylactic and use a sterile needle?
-- 
Peter da Silva  `-_-'  Ferranti International Controls Corporation
"Have you hugged  U  your wolf today?"     uunet.uu.net!ficc!peter
Disclaimer: My typos are my own damn business.   peter@ficc.uu.net

gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/28/88)

In article <3449@geaclib.UUCP> daveb@geaclib.UUCP (David Collier-Brown) writes:
>  This raises the interesting, and possibly invidious, question of
>why the ANSI C standard includes gets...

It's there because it is useful and much existing code relies on its
existence.  It was specified in the library base document.  There was
not sufficient committee support for its removal.  We've been over all
this before.

>It may prove advisable to ask for its elimination on the next (NOT!
>current) round of standardization,

I don't know what this is intended to refer to.  The proposed ANSI C
standard is complete at this point and is expected to be adopted without
alteration (although perhaps with additions) by ISO.  The committee
officially tasked with standardization did NOT deem it advisable to
eliminate gets().  This is a CLOSED ISSUE insofar as the standards
process is concerned.  (That's why it's so annoying to me to hear
it being discussed on the net as though anything was really going to,
or needed to, be done about the current state of gets() in the C
standard.  Don't use it if you don't like it, and propagandize your
friends to not use it if you wish, but stop suggesting that the
standards committees deal with it.  We already have.  It stays.)

>	and a request from the (U.S) DOD Computer Security
>Center (sic) for an exception in the validation suite...

I don't know what model you have for how standards work.  To conform
to an ANSI standard, the requirements of the standard must be met.
There are no provisions for "exceptions".

Now, a FIPS can say anything it wants, no matter how silly, and
products specified as FIPS-xxx compliant are expected to meet its
requirements.  An example of this is FIPS-151, which took the IEEE
1003.1 not-yet-standard (Draft 12) as its starting point then added
a collection of more specific requirements to it, the result being
that no planned vendor POSIX implementation was likely to meet the
FIPS without the vendor's plans being revised.  It is not clear
that this really served anyone's interests, and it is to be hoped
that in the case of C any relevant FIPS would not attempt to alter
the technical requirements set forth in the ANSI standard.  There
is much less excuse for this with C than with POSIX, because POSIX
had numerous explicit options that I suppose NBS felt obliged to
nail down.  What is optional in the proposed ANSI C standard are
just those things that COULD NOT be made more specific without
unjustifiably excluding important compilation/execution
environments.  There are practically NO "political options" like
POSIX had.  This was by design, as was the absence of "levels" of
conformance and the prohibitions against name-space pollution.

henry@utzoo.uucp (Henry Spencer) (11/30/88)

I won't do a point-by-point rebuttal of Doug's long posting, partly
because this is obviously a semi-religious issue.  I will content myself
with observing that saying "it's all just a matter of coding style"
ignores the fact that there are objective differences between coding
styles:  some *are* better than others.  Many people, notably including
those at a certain Bell Labs site of some historical significance, seem
to agree with Geoff and me that gets() is an error-prone and unnecessary
function whose use should be firmly discouraged.  This would not magically
solve all our problems, but it would eliminate one superfluous sharp edge
from widely-used software.
-- 
SunOSish, adj:  requiring      |     Henry Spencer at U of Toronto Zoology
32-bit bug numbers.            | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

eao@zeus.umu.se (12/03/88)

When I retire gets() I would like a function like this fgetline() to replace
it. Are there any drawbacks I have missed? Or is recursion too simple to use
in problems like this (parsing input in search of a newline)?

/*
 * char *fgetline(file) 
 *	FILE *file;
 * returns a null-terminated line read from file (without its trailing
 * newline), allocated with *some_malloc
 */

#include <stdio.h>
#include <string.h>	/* strchr, strncpy */

/*
 * Size of chunks read with fgets. This constant could be freely altered 
 * to achieve optimal efficiency. (Try 1 :-)
 */
#define BUFFSIZE 512

static char *head, *tail; 
static long size;
static FILE *stream;

void storetail(buff, tailsize)
	char *buff;
	long tailsize;
{
long headsize;
extern char *(*some_malloc)();
headsize = size;
size += tailsize;
head = (*some_malloc)(size + 1);
tail = head + headsize;
strncpy(tail,buff, tailsize);
return;
}

static void getchunk()
{
char buff[BUFFSIZE+1], *strchr();
static char *s;
s = fgets(buff, BUFFSIZE+1, stream);
if (s == NULL) 
	if (size == 0)
		/* Do nothing */;
	else
		storetail(buff, 0);
else 	{
	s = strchr(buff, '\n');
	if (s != NULL) { /* Newline has been read */
		*s = 0;
		storetail(buff, s - buff);
		}
	else { /* Newline is still to be seen. Read more */
		size += BUFFSIZE;
		getchunk();
		tail -= BUFFSIZE;
		strncpy(tail, buff, BUFFSIZE);
		}
	}
return;
}

char *fgetline(file)
	FILE *file;
{
size = 0;
stream = file;
getchunk();
if (size == 0)
	return NULL;
else	{
	head[size] = 0;
	return head;
	}
}

Erik Marklund	+90-16 63 30	
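The recursive idea in the posting above can be kept while passing the state through parameters instead of static variables. The idea is to read fixed-size chunks into buffers on the stack and allocate the result once, at the deepest level, then copy each chunk into place as the recursion unwinds. This is a minimal sketch, not the posted code, and fgetline_r() and getchunk_r() are hypothetical names. It also tries to handle a case worth checking in the posted version: there, an empty line leaves size at 0, so fgetline() returns NULL just as it does at EOF. Recursion depth still grows with line length (one CHUNK-byte frame per chunk), which is inherent to this approach.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 512   /* bytes read per recursion level; arbitrary */

/* Reads the rest of the current line. 'before' is how many bytes the
   callers above this level already hold in their own buffers. The
   deepest level allocates one buffer for the whole line; each level
   then copies its chunk into place as the recursion unwinds. */
static char *getchunk_r(FILE *fp, size_t before, size_t *total, int *saw_any)
{
    char buff[CHUNK];
    size_t n = 0;
    int c;
    char *line;

    /* Read up to CHUNK bytes, stopping at '\n' (not stored) or EOF. */
    while (n < CHUNK && (c = getc(fp)) != EOF) {
        *saw_any = 1;
        if (c == '\n')
            break;
        buff[n++] = (char)c;
    }

    if (n == CHUNK) {
        /* Chunk full and no newline yet: the line continues. */
        line = getchunk_r(fp, before + n, total, saw_any);
    } else {
        /* Newline or EOF: allocate once for the whole line. */
        *total = before + n;
        line = malloc(*total + 1);
        if (line != NULL)
            line[*total] = '\0';
    }
    if (line != NULL && n > 0)
        memcpy(line + before, buff, n);
    return line;
}

/* Returns a malloc'd line without its '\n' (the caller frees it), or
   NULL at EOF or if malloc fails. An empty line yields "", not NULL.
   If lenp is non-NULL it receives the length, which stays correct even
   when the line contains '\0' bytes. */
char *fgetline_r(FILE *fp, size_t *lenp)
{
    size_t total = 0;
    int saw_any = 0;
    char *line = getchunk_r(fp, 0, &total, &saw_any);

    if (!saw_any) {         /* EOF before reading anything */
        free(line);
        return NULL;
    }
    if (line != NULL && lenp != NULL)
        *lenp = total;
    return line;
}
```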

bright@Data-IO.COM (Walter Bright) (12/06/88)

In article <649@umecs.cs.umu.se> eao@zeus.umu.se () writes:
>When I retire gets() I would like a function like this fgetline() to replace
>it. Are there any drawbacks I have missed? or is recursion to simple to use
>in problems like this? (To parse a input in search of newline.)
>/*
> * char *fgetline(file) 
> *	FILE *file;
> * returns a null terminated line from stdin allocated with *some_malloc
> */
> [ code deleted for brevity ]

My objections to the code presented are:
	1. It depends on static variables. This makes it non-reentrant, and
	   therefore a bug waiting to happen on multi-threaded systems
	   like OS/2.
	2. If a 0 byte is read, the behavior is undefined.

So I present this:

   size_t fgetline(FILE *file, char **pbuffer, size_t *pbufsize);

Semantics:
	Reads a line from the file. The end of the line is defined by
	reading a \n, or encountering the EOF. If a \n was read, it's
	included in the read line. 0s may also be read, and are included
	in the read line, thus the count of bytes read that's returned
	may be larger than that obtained by strlen(*pbuffer).

Input:
	file		input stream pointer
	pbuffer		pointer to the buffer pointer. If the buffer pointer
			is NULL, one is malloc'd. The buffer pointer must
			be NULL or point to data allocated by malloc, realloc
			or calloc.
	pbufsize	pointer to variable containing the allocated length
			of the buffer
Output:
	*pbuffer	If the buffer needs to be realloc'd, this is set
			to the new buffer.
	*pbufsize	Set to the size of the buffer, which may be larger
			than the actual amount of data in the buffer.
Errors:
	If EOF or an error occurs after part of a line has been read,
	it is treated as the end of the line.
	If no bytes have been read, 0 is returned.

	If malloc or realloc run out of memory, fgetline will return what
	it's already got, and errno will be set.
Returns:
	number of bytes read into *pbuffer, excluding terminating 0

Example (in ANSI C):

	#include <stdio.h>
	#include <stdlib.h>	/* free()	*/

	size_t fgetline(FILE *file, char **pbuffer, size_t *pbufsize);

	void typefile(FILE *f)	/* copy file to stdout	*/
	{	char *buffer = NULL;
		size_t buflen = 0;
		size_t linelen;

		while (1)
		{	linelen = fgetline(f,&buffer,&buflen);
			if (linelen == 0)	/* error or EOF	*/
				break;
			if (fwrite(buffer,1,linelen,stdout) != linelen)
				break;		/* error	*/
		}
		free(buffer);
	}

Put a quarter in the juke,
Boogie 'till yah puke.
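Only the interface of fgetline() is specified above. One way it might be implemented is sketched here under stated assumptions. The 80-byte initial allocation and the doubling growth policy are arbitrary choices. The out-of-memory case relies on realloc() setting errno, which POSIX requires but ISO C does not. Reading with getc() rather than fgets() is what lets embedded 0 bytes be stored and counted. A very similar interface was later standardized in POSIX.1-2008 as getline(), which returns ssize_t and -1 at EOF.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the fgetline() interface described above.
   Reads bytes until '\n' (which is stored) or EOF, growing *pbuffer
   with realloc() as needed. Returns the number of bytes stored, not
   counting the terminating 0; embedded 0 bytes are stored and counted. */
size_t fgetline(FILE *file, char **pbuffer, size_t *pbufsize)
{
    size_t len = 0;
    int c;

    if (*pbuffer == NULL)
        *pbufsize = 0;              /* nothing allocated yet */

    for (;;) {
        /* Ensure room for one more byte plus the terminating 0. */
        if (len + 1 >= *pbufsize) {
            size_t newsize = *pbufsize ? *pbufsize * 2 : 80;
            char *p;

            if (newsize <= *pbufsize)   /* size arithmetic overflowed */
                break;
            p = realloc(*pbuffer, newsize); /* realloc(NULL, n) acts as malloc */
            if (p == NULL)
                break;              /* out of memory: keep what we have */
            *pbuffer = p;
            *pbufsize = newsize;
        }
        c = getc(file);
        if (c == EOF)               /* EOF or read error ends the line */
            break;
        (*pbuffer)[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (*pbuffer != NULL && len < *pbufsize)
        (*pbuffer)[len] = '\0';
    return len;
}
```

Growing the buffer before each getc() keeps the invariant len < *pbufsize, so the terminating 0 always fits, even when realloc() fails partway through a line.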