geoff@utstat.uucp (Geoff Collyer) (11/08/88)
The recent exposure of the security bug in the 4BSD fingerd caused by use of gets(3) reminded me that gets is a bug waiting to happen and should be stamped out. I have deleted gets from my stdio implementation (my first ANSI incompatibility!), the folks at Bell Labs Research have deleted gets from their C library, now it's your turn. We need to get the next ANSI C standard, the relevant POSIX standard(s), the next edition of the SVID, the next System V, the next 4BSD, the next SunOS and the next release from your favourite C vendor to delete gets. Let your vendor know that you want to see gets deleted from its next release, delete gets.o from your C library, move gets.o to -lgets, define gets(s) as "gets is unsafe; use fgets(3)"<><><> in your stdio.h; do whatever you can to help. If your vendor protests your reasonable request, point out that gets, as part of stdio, is a decade-old backward compatibility hack for compatibility with the Sixth Edition UNIX Portable I/O Library, which was utterly replaced by stdio no later than 1979. Accept no excuses; converting programs from using gets to fgets is largely mechanical, and stripping trailing newlines is trivial to code yourself. With your help, we can stamp out gets in our lifetimes. -- Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
usenet@cps3xx.UUCP (Usenet file owner) (11/09/88)
in article <1988Nov8.054845.23998@utstat.uucp>, geoff@utstat.uucp (Geoff Collyer) says:
$
$ The recent exposure of the security bug in the 4BSD fingerd caused by
$ use of gets(3) reminded me that gets is a bug waiting to happen and
$ should be stamped out.
This may be a naive question, or perhaps I haven't followed the right
stories, but what is the problem with using gets versus fgets?
John H. Lawitzke UUCP: ...rutgers!mailrus!frith!fciiho!jhl
Michigan Farm Bureau ...decvax!purdue!mailrus!frith!fciiho!jhl
Insurance Group ...uunet!frith!jhl
vfm6066@dsacg3.UUCP (John A. Ebersold) (11/10/88)
In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>This may be a naive question, or perhaps I haven't followed the right
>stories, but what is the problem with using gets versus fgets?
One can feed a VERY long string to gets(3), since gets will keep reading characters until receipt of a newline and does not check for overflow of the receiving buffer. The VERY long string would cause a program to malfunction in some way that is not clear to me. Maybe overwriting the stack?
chase@Ozona.orc.olivetti.com (David Chase) (11/10/88)
You should also consider retiring certain features of 'scanf' and 'fscanf'. A call along the lines of scanf("%s", junk); is perfectly able to scribble past the end of 'junk'. I'm not sure if there are other holes like this built in to the standard i/o library; it wouldn't hurt to check. (I've never been a real fan of 'scanf', but it does seem marginally more useful and harder to replace than 'gets'). David
lvc@cbnews.ATT.COM (Lawrence V. Cipriani) (11/10/88)
In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes:
>This may be a naive question, or perhaps I haven't followed the right
>stories, but what is the problem with using gets versus fgets?
The only argument to gets() is a character pointer, or buffer; fgets() has a FILE*, a character buffer, and most importantly a count. Used properly, this prevents writing past the end of the buffer. Since gets() doesn't have the count, this could be used to read past the end of some buffer, say buf, in fingerd.
Morris managed to get just the right "data" to go past the end of buf so that the program behavior was modified the way he needed. Usually reading data past the end of a buffer gives you a fatal error and your process dies. In this case (I'm really reaching here), the stack was modified, say by changing the return address, to do "something special" like go around some permission checks.
Neat, very neat. Now will someone please send me a copy of Morris's program :-)
>John H. Lawitzke UUCP: ...rutgers!mailrus!frith!fciiho!jhl
-- Larry Cipriani, AT&T Network Systems, Columbus OH, Path: att!cbnews!lvc Domain: lvc@cbnews.ATT.COM
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/10/88)
In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: >We need to get ... the next release from your favourite C vendor to >delete gets. gets() is deliberately required for ANSI C standard conformance because a LOT of existing code relies on it. Any vendor who omits this function will not be standard conforming and will not sell its compiler to those (expected to be MANY customers) who specify standard conformance. I sympathize with the desire to encourage conversion to fgets(), but attempts to force this down programmers' throats are misguided. This is an EDUCATIONAL issue and should be handled as one. Otherwise you will be as effective as the Libertarians were with their politics- before-public-education approach. Even if your philosophy is right, you should get others to go along with it BEFORE trying to force them to conform to it. By the way, have you removed scanf() from your C library as well? Or sprintf()? Or strcpy()? They can be misused in the same way as gets(). Let us know how happy your customers are once ALL such routines are gone. I think the appropriate treatment of gets() is to omit it from the documentation or to document it with "Unless you have sufficient control over the data being read to be sure that it will not overflow the buffer, use fgets". But leave it in the library.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/10/88)
In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes: >This may be a naive question, or perhaps I haven't followed the right >stories, but what is the problem with using gets versus fgets? If you don't know for sure that the input line will fit the buffer you've allocated for it, gets() can overrun the buffer (with random consequences). However, if your program can be sure that the line will fit, there is nothing wrong with using gets().
geoff@utstat.uucp (Geoff Collyer) (11/10/88)
I wrote: > The recent exposure of the security bug in the 4BSD fingerd caused by > use of gets(3) reminded me that gets is a bug waiting to happen and > should be stamped out. Apparently a lot of people have still not heard the details of the recent Internet worm (or "virus" as the media called it). The 4BSD fingerd had a bug which permitted its invoker to obtain a root shell. The bug was that fingerd used gets to read a line of input from its network connection, and gets is unable to check that the input line fits within the buffer handed to gets, so a suitably-constructed line of input to fingerd steps on other variables, confusing fingerd. The above is merely preamble; the point I want to make is that gets is inherently unsafe due to its inability to check for overrun of the buffer provided to it. There is no reason to use gets, and there are good reasons to avoid gets. Let's kill gets now, before it strikes again. -- Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
chris@mimsy.UUCP (Chris Torek) (11/10/88)
In article <2044@cbnews.ATT.COM> lvc@cbnews.ATT.COM (Lawrence V. Cipriani) writes:
[re recent Internet `worm'; note that R. T. Morris Jr. is still merely the `alleged' perpetrator---we have to give him the benefit of the doubt, first; *then* we can rip his arms off :-) ]
>... in fingerd. Morris managed to get just the right "data" to go
>past the end of buf so that the program behavior was modified the way he
>needed. Usually reading data past the end of a buffer gives you a fatal
>error and your process dies. In this case (I'm really reaching here), the
>stack was modified, say change the return address, to do "something special"
>like go around some permission checks.
You may be reaching, but you are right. The fingerd attack wrote more bytes than there were in the buffer passed to gets(); the `extra' bytes were a hand-crafted stack that `returned' into the stack, into the buffer itself. The part just before the hand-crafted stack contained code to call execve("/bin/sh", (char **)0, (char **)0). (There were in fact ASCII NUL characters embedded in this code; curiously, gets() reads and stores NULs in its search for '\n'.)
This attack failed if you had made any changes to fingerd or to the C library start-up code such that the buffer was in a different place on the stack. I myself had expanded the buffer, so that there was plenty of room for the `extra' bytes. (Hurrah for local modifications! :-) )
-- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/10/88)
Not being a Real Programmer (tm), I had to look in the Unix manual to see what the fuss was about. The gist of the entry for gets(3) is: NAME gets, fgets - get a string from a stream SYNOPSIS char *gets(s) gets reads characters from the standard input stream, stdin, into the array pointed to by s, until a new-line character is read or an end-of-file condition is encountered. In other words, gets will read an *arbitrarily large* amount of data from the file and place it in memory, beginning at &(s[0]). Presumably the programmer must guess a suitable amount of memory to allocate for s, then pray that no-one ever runs his program on a file with very long lines. Words fail me.
rds95@leah.Albany.Edu (Robert Seals) (11/10/88)
In article <8841@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes: > In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: > >We need to get ... the next release from your favourite C vendor to > >delete gets. > > I think the appropriate treatment of gets() is to omit it from the > documentation or ... I suppose there might be reasons to do this, but 1) it smells real bad already, and 2) is kinda dishonest, and 3) is annoying. My objections to omitting documentation are mostly moral, while my objection to gets() et al. is functionally borne out... rob
snafu@ihlpm.ATT.COM (00704a-Wallis) (11/11/88)
In article <8841@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes: ... text deleted ... > By the way, have you removed scanf() from your C library as well? Or > sprintf()? Or strcpy()? They can be misused in the same way as gets(). > Let us know how happy your customers are once ALL such routines are gone. > .... Actually, I don't understand the argument that gets() should be removed because it can overrun the buffer. What's to prevent the following (and how is it different from gets?): char some_string[10]; fgets( some_string, 2147483647, stdin ); -- Dave Wallis AT&T Network Systems Lisle, IL 60532 att!ihlpm!snafu
evil@arcturus.UUCP (Wade Guthrie) (11/11/88)
In article <1988Nov8.054845.23998@utstat.uucp>, geoff@utstat.uucp (Geoff Collyer) writes: > The recent exposure of the security bug in the 4BSD fingerd caused by > use of gets(3) reminded me that gets is a bug waiting to happen and > should be stamped out. gets is bad? There is a problem? Please explain this to us unknowledgable types before it's too late. Wade Guthrie Rockwell International Anaheim, CA (Rockwell doesn't necessarily believe / stand by what I'm saying; how could they when *I* don't even know what I'm talking about???)
gregg@ihlpb.ATT.COM (Wonderly) (11/11/88)
From article <8847@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn ): > In article <1031@cps3xx.UUCP> usenet@cps3xx.UUCP (Usenet file owner) writes: >>This may be a naive question, or perhaps I haven't followed the right >>stories, but what is the problem with using gets versus fgets? > > If you don't know for sure that the input line will fit the buffer > you've allocated for it, gets() can overrun the buffer (with random > consequences). However, if your program can be sure that the line > will fit, there is nothing wrong with using gets(). I believe that the right thing to do is to use a new function called nlfgets (str, size, fp), that does exactly as gets(3). The biggest concern that most people have about moving from gets to fgets is the added hassle of doing a if ((t = strchr (buf, '\n')) != NULL) *t = 0; This seems to be a lot of work when you may be processing thousands of lines of code. I have written this exact function many times just to have the benefit of no strchr() call. -- It isn't the DREAM that NASA's missing... DOMAIN: gregg@ihlpb.att.com It's a direction! UUCP: att!ihlpb!gregg
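A minimal sketch of the nlfgets(str, size, fp) idea described above. The poster's own implementation is not shown, so the body here is an assumption: a getc() loop that never stores the newline, which avoids the later strchr() call. Note that a line longer than size-1 characters comes back in pieces across successive calls.

```c
#include <stdio.h>

/* Hypothetical sketch of nlfgets(str, size, fp): read at most size-1
 * characters of one line into str, never storing the newline.
 * Returns str, or NULL if end-of-file or an error occurs before any
 * character is read. */
char *nlfgets(char *str, int size, FILE *fp)
{
    int c = 0;
    int n = 0;

    if (size <= 0)
        return NULL;
    /* Stop when the buffer is full, at end-of-file, or at the newline. */
    while (n < size - 1 && (c = getc(fp)) != EOF && c != '\n')
        str[n++] = (char)c;
    str[n] = '\0';
    if (c == EOF && n == 0)
        return NULL;
    return str;
}
```

A caller would use it just like fgets(), for example nlfgets(buf, sizeof buf, stdin), and can skip the newline-stripping step.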
rob@pbhyf.PacBell.COM (Rob Bernardo) (11/11/88)
Doug Gwyn:
+By the way, have you removed scanf() from your C library as well? Or
+sprintf()? Or strcpy()? They can be misused in the same way as gets().
+Let us know how happy your customers are once ALL such routines are gone.
Wallis:
+Actually, I don't understand the argument that
+gets() should be removed because it can overrun
+the buffer. What's to prevent the following (and
+how is it different from gets?):
+
+ char some_string[10];
+
+ fgets( some_string, 2147483647, stdin );
I think we need to make a distinction between three similar but different situations.
1. One set of functions (e.g. gets()) deal with file input of indeterminate size.
2. Other functions (e.g. fgets()) deal with file input of limited size.
3. Yet other functions (e.g. strcpy()) deal with data internal to the program.
In order to guarantee that the buffer used by functions of type 1 will not overflow, the programmer has to guarantee something *outside* the program: that none of the lines in the file being read will ever exceed the buffer size. Often the programmer cannot guarantee this.
But with functions of type 2 and 3, the programmer merely has to size things *within* the program appropriately. The programmer *always* has the capability to do this.
Scanf() and fscanf() can fit into type 1 or type 2 depending on whether a field width is used in each conversion specification.
-- Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library Email: ...![backbone]!pacbell!pbhyf!rob OR rob@pbhyf.PacBell.COM Office: (415) 823-2417 Room 4E750A, San Ramon Valley Administrative Center Residence: (415) 827-4301 R Bar JB, Concord, California
les@chinet.chi.il.us (Leslie Mikesell) (11/11/88)
In article <8841@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>By the way, have you removed scanf() from your C library as well? Or
>sprintf()? Or strcpy()? They can be misused in the same way as gets().
>Let us know how happy your customers are once ALL such routines are gone.
With gets() and strcpy() a safe alternative exists. Is everyone really going to write their own safe versions of scanf() and sprintf()? I always wondered why the standard library versions have no way to control the size of the output - maybe real programmers like core dumps?
Les Mikesell
les@chinet.chi.il.us (Leslie Mikesell) (11/11/88)
In article <9054@ihlpb.ATT.COM> gregg@ihlpb.ATT.COM (Wonderly) writes: >I believe that the right thing to do is to use a new function called >nlfgets (str, size, fp), that does exactly as gets(3). >.... I have written this exact function many times just to >have the benefit of no strchr() call. Same here, but I made it return the number of characters read so that it can also avoid the strlen() call (which will be incorrect anyway if there were any nulls in the line). Why does any function return a pointer that you obviously already knew? Seems like it would only be useful if you wanted to nest function calls and ignore errors. Les Mikesell
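A sketch of the count-returning variant described above, under the assumption that it otherwise behaves like a bounded, newline-dropping line reader. The name read_line_count and the -1 end-of-file convention are hypothetical. Because the length is returned, the caller does not need strlen(), which would stop early at an embedded NUL.

```c
#include <stdio.h>

/* Hypothetical variant: store up to size-1 characters of one line in buf
 * (newline dropped, NUL appended) and return how many were stored,
 * or -1 if end-of-file or an error occurs before any character is read.
 * An empty line therefore returns 0, which is distinct from -1. */
int read_line_count(char *buf, int size, FILE *fp)
{
    int c = 0;
    int n = 0;

    if (size <= 0)
        return -1;
    while (n < size - 1 && (c = getc(fp)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';
    return (c == EOF && n == 0) ? -1 : n;
}
```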
daveh@marob.MASA.COM (Dave Hammond) (11/11/88)
In article <32301@oliveb.olivetti.com> chase@Ozona.UUCP (David Chase) writes:
>You should also consider retiring certain features of 'scanf' and
>'fscanf'. A call along the lines of
>
> scanf("%s", junk);
>
>is perfectly able to scribble past the end of 'junk'. I'm not sure if
>there are other holes like this built in to the standard i/o library;
>it wouldn't hurt to check. (I've never been a real fan of 'scanf',
>but it does seem marginally more useful and harder to replace than
>'gets').
Carrying this line of thought forward, it would seem that Mr. Chase advocates retiring any library call which requires that the programmer take responsibility for providing enough buffer space to handle the data resulting from the call in question.
IMHO, if the programmer is aware that the library call does not know about buffer length (which is obvious when no length parameter is passed to the call), then it is the programmer's responsibility to ensure that (a) the buffer is of an appropriate length for his/her application, or (b) if an appropriate length can not be determined, the call should *not* be used.
[the unnecessary overhead of calling scanf("%s") instead of fgets() or a getc() loop might also be pointed out -- but I suspect the example was nothing more than that]
Dave Hammond UUCP: ...!uunet!masa.com!{marob,dsix2}!daveh DOMAIN: daveh@marob.masa.com
guy@auspex.UUCP (Guy Harris) (11/12/88)
>What's to prevent the following...
Nothing, other than intelligence on the part of the programmer.
However, unless your application can guarantee that the input will
*never* have overly-long lines (or can hand a buffer *so* immense to
"gets" that it won't matter - but consider how big a buffer might well
have to be), there's nothing to prevent a blowup in a program using
"gets()".
I don't know that I'd argue that "gets()" should be removed, especially
since it's in the dpANS. I would, however, argue that it should *never*
be used.
jwr@scotty.UUCP (Dier Retlaw Semaj) (11/12/88)
In article geoff@utstat.uucp (Geoff Collyer) writes:
>The recent exposure of the security bug in the 4BSD fingerd caused by
>use of gets(3) reminded me that gets is a bug waiting to happen and
>should be stamped out.
Would someone please explain to me what the problem with gets(3) is, or point me in the direction of something that would? I'd be interested in finding out. Thank you.
-- Dier R. Semaj {ames,cmcl2,rutgers}!rochester!kodak!fedsys!wally!jwr --
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (11/12/88)
In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
| The recent exposure of the security bug in the 4BSD fingerd caused by
| use of gets(3) reminded me that gets is a bug waiting to happen and
| should be stamped out. I have deleted gets from my stdio implementation
I hate to say this, but C allows many things which are unsafe. The problem is not the language, or the library, but that people make bad choices about their selection of features. If you stamp out gets you will see postings of dozens of "public domain replacements" for the gets features "left out of BSD 4.17" or whatever.
I don't disagree for a moment with your sentiment, and I see the problem, but I think you will have better luck educating your users on how to use the language than taking away all the parts with sharp edges.
The best way to get rid of gets is to offer a better alternative. I wrote a "getsn" routine which looks like fgets but avoids putting the newline in the buffer in the first place, and I would expect to find that hundreds of others have done it, too. There is no way to strip the newline as quickly as not putting it in the buffer in the first place.
| With your help, we can stamp out gets in our lifetimes.
From our header files and our libraries, but not from our programmers' hearts (unfortunately).
| --
| Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
-- bill davidsen (wedu@ge-crd.arpa) {uunet | philabs}!steinmetz!crdos1!davidsen "Stupidity, like virtue, is its own reward" -me
mcdonald@uxe.cso.uiuc.edu (11/12/88)
>gets() is deliberately required for ANSI C standard conformance because >a LOT of existing code relies on it. Any vendor who omits this function >will not be standard conforming and will not sell its compiler to those >(expected to be MANY customers) who specify standard conformance. How about fixing this, and the scanf and strcpy problems as well, by a little outside-the-standard kludge? (Okay, I realize that every time I suggest something like this, somebody tries to roast me, but I am flameproof.) That is #pragma _MAX_STRING_LENGTH=256 /*or some other suitable number*/ and the compiler would call special versions of gets, strcpy, and cohorts, that stopped at such a maximum. Now I am not sure whether the result of overrun would have to be a fatal error or whether it could just stop copying, but that would at least prevent old bugs from biting too bad.
ggs@ulysses.homer.nj.att.com (Griff Smith) (11/12/88)
In article <9054@ihlpb.ATT.COM>, gregg@ihlpb.ATT.COM (Wonderly) writes: > I believe that the right thing to do is to use a new function called > nlfgets (str, size, fp), that does exactly as gets(3). The biggest > concern that most people have about moving from gets to fgets is the > added hassle of doing a | > if ((t = strchr (buf, '\n')) != NULL) > *t = 0; | > This seems to be a lot of work when you may be processing thousands of > lines of code. I have written this exact function many times just to > have the benefit of no strchr() call. > -- > It isn't the DREAM that NASA's missing... DOMAIN: gregg@ihlpb.att.com > It's a direction! UUCP: att!ihlpb!gregg I think this misses the point. Gets guarantees that you will read all the characters in a line, but forces you to write insecure programs. One must also make the dubious assumption that the first null encountered is the terminal null rather than a null in the file. Your variation avoids the security problem, but preserves the ambiguity of nulls. It also adds another ambiguity: if someone hands you a line that is longer than your buffer, you gratuitously break it into two lines since you don't know where the newline is. Fgets avoids all these problems by marking the end of a line with newline. Proper use requires that you call fgets until you find the newline. You may need to use malloc as you discover that the line is much larger than anticipated. Fgets does have one annoying flaw: it should return the character count instead of the worthless pointer to the destination. If you complain that all this fuss is unnecessary, since all reasonable input will fit in the buffer you provided, you are really saying you don't like to write correct programs. I sometimes settle for `partially correct': a program must either operate as specified or stop. Breaking lines isn't even partially correct. 
-- Griff Smith AT&T (Bell Laboratories), Murray Hill Phone: 1-201-582-7736 UUCP: {most AT&T sites}!ulysses!ggs Internet: ggs@ulysses.att.com
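The "call fgets until you find the newline, using malloc as the line grows" approach described above might be sketched as follows. The function name, the initial capacity of 128 and the doubling policy are assumptions made for illustration, not anything specified in the post.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: read one whole line of any length by calling fgets repeatedly
 * and growing the buffer until the newline (or end-of-file) is seen.
 * Returns a malloc'ed string without the newline, which the caller frees,
 * or NULL at end-of-file with nothing read or on allocation failure.
 * Like fgets itself, it cannot tell an embedded NUL from the terminator. */
char *read_whole_line(FILE *fp)
{
    size_t cap = 128, len = 0;
    char *line = malloc(cap);

    if (line == NULL)
        return NULL;
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n') {
            line[len - 1] = '\0';        /* complete line: drop newline */
            return line;
        }
        if (len + 1 == cap) {            /* buffer full, no newline yet */
            char *bigger = realloc(line, cap * 2);
            if (bigger == NULL) {
                free(line);
                return NULL;
            }
            line = bigger;
            cap *= 2;
        }
    }
    if (len == 0) {                      /* end-of-file before any data */
        free(line);
        return NULL;
    }
    return line;                         /* final line had no newline */
}
```

Unlike a fixed-size reader, this never splits a long line in two; the cost is that the caller owns the returned memory and must free() it.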
geoff@utstat.uucp (Geoff Collyer) (11/12/88)
> From: gwyn@smoke.BRL.MIL (Doug Gwyn ) > > gets() is deliberately required for ANSI C standard conformance because > a LOT of existing code relies on it. That's the whole point, Doug. People *should* fix their existing code; it's unsafe. > Any vendor who omits this function > will not be standard conforming and will not sell its compiler to those > (expected to be MANY customers) who specify standard conformance. Once the standards are changed, their code *will* be standard-conforming. > Even if your philosophy is right, you should get others to go along with > it BEFORE trying to force them to conform to it. That's what I'm trying to do now: get people to agree, and then act on that agreement. > By the way, have you removed scanf() from your C library as well? Or > sprintf()? Or strcpy()? They can be misused in the same way as gets(). No, I have not; all of these functions *can* be used safely, though it does take a little extra care. The point is that gets() *can* *not* be used safely; a dedicated opponent can *always* defeat a program that reads with gets(). > I think the appropriate treatment of gets() is to omit it from the > documentation or to document it with "Unless you have sufficient control > over the data being read to be sure that it will not overflow the buffer, > use fgets". It is in general not possible to have sufficient control over the input data. Remember the old maxim "Never trust any input." or, as Kernighan and Plauger put it in Elements of Programming Style, "Make sure input cannot violate the limits of the program.". It is *not* *possible* to ensure that input cannot violate the limits of a program which uses gets(). Someone can always provide input longer than the program's gets() buffer. -- Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/12/88)
In article <2566@ihlpm.ATT.COM> snafu@ihlpm.ATT.COM (00704a-Wallis) writes:
-Actually, I don't understand the argument that
-gets() should be removed because it can overrun
-the buffer. What's to prevent the following (and
-how is it different from gets?):
- char some_string[10];
- fgets( some_string, 2147483647, stdin );
The main difference is that the above example would immediately
raise a flag in the mind of almost any competent programmer reading
the code, whereas we have not yet attained that degree of awareness
concerning gets() on uncontrolled sources of input.
strcpy() also is widely abused, so my mentioning it was not spurious.
The solution is not to ban potentially dangerous tools, but to ensure
that people are properly trained in their safe use.
scs@athena.mit.edu (Steve Summit) (11/12/88)
In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: >...gets is a bug waiting to happen and should be stamped out. Getting rid of gets is an excellent idea. I'm all for backwards compatibility and not breaking existing code, but it's got to be conscientiously written existing code, and to my way of thinking no reasonable program should ever have been using gets. (Apologies and condolences to those of you who do, and to the original implementor.) As an interim measure, why not recode gets with an implicit maximum buffer size of, say, 512? That is, implement it as if by char *gets(buf) char *buf; { return fgets(buf, 512, stdin); } except with the requisite newline-stripping code added. I doubt this would break many programs, particularly since 512 is a common buffer size anyway. Programs that use bigger buffers will just have to use fgets, if they're not already. After we get rid of gets, we should get rid of calloc(n, size), which doesn't really do anything for you that malloc(n * size) doesn't do. (This is not a security hole, just a quality-of-life issue.) calloc's only claim to fame is specious; its zero fill property is misunderstood by many programmers and is sufficiently useless that it can easily be replaced by bzero and/or memset for those few instances that truly require filling with bytes of zero. (Recall that such a zero fill does not necessarily result in NULL pointers or 0.0 floating-point values, in the common case where arrays or structures are being allocated.) Finally, I'd not mourn the passing of scanf -- not just %s, but all of it. It just doesn't work robustly enough for its common usage: interactive user input. (For example, scanf("%d %d") gives you no way of prodding the user if he only types one number; newlines are acceptable whitespace, so the user can keep banging the return key and getting nowhere because scanf hasn't returned and your program can't say "please type two numbers" even if it wants to. 
A related case is scanf("%s") to read command lines: you'd like to print another prompt if the user hits return without typing a command, but you can't, again because scanf doesn't return.) sscanf can remain for picking apart strings (perhaps read with fgets) while leaving the calling program in control for error handling, and fscanf can remain for reading carefully formatted data from files. (I'm not seriously suggesting getting rid of scanf; I know how many programs use it. To my mind, however, there are no good uses of scanf, but as long as it exists, people are going to keep using it, because it is extremely convenient.) Steve Summit scs@adam.pika.mit.edu
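The fgets-then-sscanf pattern suggested above could look roughly like this. The helper name read_two_ints, the 128-byte line buffer and the return convention are assumptions for illustration. The point is that the function returns after every line, so the calling program can print "please type two numbers" and try again.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from fp and try to parse two ints.
 * Returns 2 on success, 0 or 1 if the line held fewer numbers,
 * and EOF at end of input. Input longer than the line buffer is
 * left for the next call. */
int read_two_ints(FILE *fp, int *a, int *b)
{
    char line[128];
    int n;

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;
    n = sscanf(line, "%d %d", a, b);
    /* sscanf reports EOF for a blank line; treat that as zero numbers. */
    return n == EOF ? 0 : n;
}
```

An interactive caller would loop on read_two_ints(stdin, &x, &y), reprompting whenever the result is 0 or 1 and stopping when it is 2 or EOF.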
mesard@bbn.com (Wayne Mesard) (11/13/88)
From article <9054@ihlpb.ATT.COM>, by gregg@ihlpb.ATT.COM (Wonderly): > The biggest > concern that most people have about moving from gets to fgets is the > added hassle of doing a > > if ((t = strchr (buf, '\n')) != NULL) > *t = 0; > > This seems to be a lot of work when you may be processing thousands of > lines of code. Or have this new function return a pointer to the _last_ char read, instead of redundantly returning its first param. This would require exactly no extra work on the part of the library routine or the client program. -- void *Wayne_Mesard(); MESARD@BBN.COM BBN, Cambridge, MA
henry@utzoo.uucp (Henry Spencer) (11/13/88)
In article <8841@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >I think the appropriate treatment of gets() is to omit it from the >documentation or to document it... But leave it in the library. Actually, my suggestion to Geoff (which he did mention, note) was that it ought to go into a separate backwards-compatibility library. That way it's available, *but* you have to ask for it explicitly. -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (11/13/88)
In article <8847@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >... if your program can be sure that the line >will fit, there is nothing wrong with using gets(). That's a large "if" in most cases, however. -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (11/13/88)
In article <2566@ihlpm.ATT.COM> snafu@ihlpm.ATT.COM (00704a-Wallis) writes: >... What's to prevent the following (and >how is it different from gets?): > fgets( some_string, 2147483647, stdin ); Programmers with IQs larger than their waistlines? Nobody can protect against stupid programmers. But gets doesn't even give you a chance to be smart. -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (11/13/88)
In article <6927@chinet.chi.il.us> les@chinet.chi.il.us (Leslie Mikesell) writes: >... Is everyone really >going to write their own safe versions of scanf() and sprintf()? I always >wondered why the standard library versions have no way to control the >size of the output - maybe real programers like core dumps? ANSI C came within a hairsbreadth of including a length-limited sprintf. (There are length-limiting provisions in scanf, if you read the manual.) If there had been any prior experience with it, it probably would have made it. Sigh. -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
peter@ficc.uu.net (Peter da Silva) (11/13/88)
In article <2566@ihlpm.ATT.COM>, snafu@ihlpm.ATT.COM (00704a-Wallis) writes:
> Actually, I don't understand the argument that
> gets() should be removed because it can overrun
> the buffer. What's to prevent the following (and
> how is it different from gets?):
> char some_string[10];
> fgets( some_string, 2147483647, stdin );
This is a program bug... the programmer specified the wrong buffer size. Unlike the case of gets, you can limit the read to the buffer size. In all the other routines with the gets problem, a program can be written that will not allow any buffer overflow:
	char buffer[10];
	sprintf(buffer, "%.9s", ptr);
	fscanf(fp, "%9s", buffer);
	fgets(buffer, 10, fp);
The problem is that there is no way to limit how much I/O gets will do.
-- Peter da Silva `-_-' Ferranti International Controls Corporation "Have you hugged U your wolf today?" uunet.uu.net!ficc!peter Disclaimer: My typos are my own damn business. peter@ficc.uu.net
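A small sketch of the three bounded calls, wrapped in hypothetical helper functions so each can be tried on its own. One detail worth keeping straight: the printf family limits a string with a precision ("%.9s"), while the scanf family limits it with a field width ("%9s"). In each case the limit is one less than the buffer size, leaving room for the terminating NUL.

```c
#include <stdio.h>

/* Hypothetical helpers; every dst below is assumed to hold 10 bytes. */

/* printf family: the precision caps how many characters of src are copied. */
void copy_bounded(char dst[10], const char *src)
{
    sprintf(dst, "%.9s", src);
}

/* scanf family: the field width caps how many characters are stored.
 * Returns 1 if a word was read, otherwise 0 or EOF. */
int scan_word_bounded(char dst[10], FILE *fp)
{
    return fscanf(fp, "%9s", dst);
}

/* fgets takes the full buffer size and stores at most size-1 characters. */
char *read_line_bounded(char dst[10], FILE *fp)
{
    return fgets(dst, 10, fp);
}
```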
awm@gould.doc.ic.ac.uk (Aled Morris) (11/14/88)
I was going to suggest the following as a replacement for "gets":
	#define gets(buf) fgets(buf, sizeof(buf), stdin)
since all the examples I've seen of "gets" in use have been:
	char buf[10];
	...
	gets(buf);
But of course it won't work: (a) gets drops the newline at the end, fgets keeps it, and (b) maybe someone has written:
	char *buf;
	buf = malloc(10);
	gets(buf);
(although the #define would be fine in this case, it would read only 4 characters :-)
I guess there isn't an easy answer :-( (but you didn't need me to tell you that)
Aled Morris systems programmer mail: awm@doc.ic.ac.uk | Department of Computing uucp: ..!ukc!icdoc!awm | Imperial College talk: 01-589-5111x5085 | 180 Queens Gate, London SW7 2BZ
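The sizeof pitfall behind point (b) can be shown in isolation. In this sketch (function names are hypothetical), sizeof applied to an array gives the array's size, but applied to a pointer it gives the size of the pointer itself, whatever the size of the memory it points to. With a 4-byte pointer, fgets would be told the buffer holds 4 bytes and would store at most 3 characters plus the NUL.

```c
#include <stddef.h>

/* What sizeof(buf) yields when buf is a real array of 10 chars. */
size_t size_seen_for_array(void)
{
    char buf[10];
    return sizeof(buf);     /* size of the whole array: 10 */
}

/* What sizeof(buf) yields when buf is only a pointer to a buffer. */
size_t size_seen_for_pointer(void)
{
    char *buf = NULL;
    return sizeof(buf);     /* size of the pointer, not of any buffer */
}
```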
scs@athena.mit.edu (Steve Summit) (11/14/88)
In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>...gets is a bug waiting to happen and should be stamped out.
Getting rid of gets is an excellent idea. I'm all for backwards
compatibility and not breaking existing code, but it's got to be
conscientiously written existing code, and to my way of thinking
no reasonable program should ever have been using gets.
(Apologies and condolences to those of you who do, and to the
original implementor.)
As an interim measure, why not recode gets with an implicit
maximum buffer size of, say, 512? That is, implement it as if by
    char *gets(buf)
    char *buf;
    {
        return fgets(buf, 512, stdin);
    }
except with the requisite newline-stripping code added. I doubt
this would break many programs, particularly since 512 is a common
buffer size anyway. Programs that use bigger buffers will just have
to use fgets, if they don't already.
After we get rid of gets, we should get rid of calloc(n, size),
which doesn't really do anything for you that malloc(n * size)
doesn't do. (This is not a security hole, just a quality-of-life
issue.) calloc's only claim to fame is specious; its zero fill
property is misunderstood by many programmers and is sufficiently
useless that it can easily be replaced by bzero and/or memset for
those few instances that truly require filling with bytes of zero.
(Recall that such a zero fill does not necessarily result in NULL
pointers or 0.0 floating-point values, in the common case where
arrays or structures are being allocated.)
Finally, I'd not mourn the passing of scanf -- not just %s, but all
of it. It just doesn't work robustly enough for its common usage:
interactive user input. (For example, scanf("%d %d") gives you no
way of prodding the user if he only types one number; newlines are
acceptable whitespace, so the user can keep banging the return key
and getting nowhere because scanf hasn't returned and your program
can't say "please type two numbers" even if it wants to.
A related case is scanf("%s") to read command lines: you'd like to print another prompt if the user hits return without typing a command, but you can't, again because scanf doesn't return.) sscanf can remain for picking apart strings (perhaps read with fgets) while leaving the calling program in control for error handling, and fscanf can remain for reading carefully formatted data from files. (I'm not seriously suggesting getting rid of scanf; I know how many programs use it. To my mind, however, there are no good uses of scanf, but as long as it exists, people are going to keep using it, because it is extremely convenient.) Steve Summit scs@adam.pika.mit.edu
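The fgets-then-sscanf approach mentioned above might look like the following sketch. The name read_two_ints, the 128-byte line buffer, and the return convention are assumptions made for illustration. Because the line is read first, control returns to the caller after every newline, so a blank or incomplete line can be answered with a fresh prompt.

```c
#include <assert.h>
#include <stdio.h>

/* Read one line from fp and try to parse exactly two integers from it.
   Returns 1 if the line holds two integers and nothing else (stored in
   *a and *b), 0 if the line is blank or malformed, -1 at end of file or
   on error. Lines longer than 127 characters are read in pieces. */
int read_two_ints(FILE *fp, int *a, int *b)
{
    char line[128];
    char extra;                         /* catches trailing junk */

    assert(fp != NULL && a != NULL && b != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    /* A result of 2 means both numbers converted and no further
       non-blank character followed them on the line. */
    return sscanf(line, "%d %d %c", a, b, &extra) == 2;
}
```

A caller would loop while the result is 0, printing something like "please type two numbers" each time, and stop on 1 or -1.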
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/14/88)
In article <1988Nov11.232629.15414@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>> From: gwyn@smoke.BRL.MIL (Doug Gwyn )
>> gets() is deliberately required for ANSI C standard conformance because
>> a LOT of existing code relies on it.
>That's the whole point, Doug. People *should* fix their existing code;
>it's unsafe.
Bullshit. When I use gets() I use it safely.
>> Any vendor who omits this function
>> will not be standard conforming and will not sell its compiler to those
>> (expected to be MANY customers) who specify standard conformance.
>Once the standards are changed, their code *will* be standard-conforming.
The standard is not going to change. This proposal has been debated and
rejected by X3J11 on more than one occasion. (See my first sentence
quoted above.)
>> Even if your philosophy is right, you should get others to go along with
>> it BEFORE trying to force them to conform to it.
>That's what I'm trying to do now: get people to agree, and then act on
>that agreement.
It has already been tried, and failed.
>> By the way, have you removed scanf() from your C library as well? Or
>> sprintf()? Or strcpy()? They can be misused in the same way as gets().
>No, I have not; all of these functions *can* be used safely, though it
>does take a little extra care. The point is that gets() *can* *not* be
>used safely; a dedicated opponent can *always* defeat a program that
>reads with gets().
I already said "bullshit" to this so I need not repeat it here.
gets() has legitimate uses. It is in the library Base Document.
It is widely used in existing code (sometimes safely, sometimes not).
It stays.
You seem to want to protect the programmer who is too stupid to protect
himself. This is a dangerous thing to attempt where C is concerned.
My god, pointers can really be abused -- maybe we better get rid of
them too. The right thing to do, as I said before, is to educate
craftsmen in the proper use of their tools so they don't hurt themselves
or their customers.
ok@quintus.uucp (Richard A. O'Keefe) (11/14/88)
In article <7963@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes: >In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: >>...gets is a bug waiting to happen and should be stamped out. > >Getting rid of gets is an excellent idea. I'm all for backwards >compatibility and not breaking existing code, but it's got to be >conscientiously written existing code, and to my way of thinking >no reasonable program should ever have been using gets. >(Apologies and condolences to those of you who do, and to the >original implementor.) When I am writing a program for my own use to process my own data sets which I _know_ have reasonable lines, why the d---l shouldn't I use gets()? If I am writing a program for _other_ people to use, I have an obligation to try to make it reasonably robust, but a lot of my C programs are there for a day (I find it easier to write C than awk, better debugging tools to start with... -- would a lint for awk be called lawk?). I have just posted a "safe gets" to comp.sources.misc.
desnoyer@Apple.COM (Peter Desnoyers) (11/15/88)
Perhaps I'm being naive, but wouldn't changing char buf[x]; gets( buf); to char * buf; buf = malloc( x); gets( buf); eliminate most (not all) of the security hole associated with gets()? The problem seems to be not only the use of gets(), but the use of temporary arrays on the stack to hold the output of dangerous functions. If you keep the buffer off the stack you make it much more difficult to exploit gets()'s unsafeness. (unless all you want to do is make the program crash.) Peter Desnoyers
shankar@hpclscu.HP.COM (Shankar Unni) (11/15/88)
> gets() has legitimate uses. It is in the library Base Document. > It is widely used in existing code (sometimes safely, sometimes not). > It stays. Exactly how do you use gets "safely"? There is really no way to stop gets from overwriting the end of your buffer, unless you fiddle around with the internals of stdio. Or do you "know" that your buffer is large enough? (This might be acceptable for a limited situation, like when you or a trusted co-program is writing stuff that you're reading from stdin, but in a more general case, it's impossible). The thing about gets is that until now, the hazards of using it have not been adequately advertised. There is no mention in any book or reference on C about how gets can be perverted to blow away your application. It does occur to most C programmers ultimately that there is "something wrong" with gets when you cannot specify the max length to read in, but the magnitude of the problem rarely sinks in. This is why the suggestion of moving gets() to a compatibility library sounds so good: this gives you the opportunity of making C programmers re-evaluate their use of gets(), and replace it with fgets() if they are unsure of the security and integrity implications of using gets(). But then, C programmers are such a spoilt bunch (sigh!). They scream and moan at the least little trouble they are put to :-(. ---- Shankar.
chase@Ozona.orc.olivetti.com (David Chase) (11/15/88)
In article <20588@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes: >Perhaps I'm being naive, but wouldn't changing > char buf[x]; gets( buf); >to > char * buf; buf = malloc( x); gets( buf); >eliminate most (not all) of the security hole associated with gets()? In practice it would make invasion difficult. Do bear in mind that it might not make it impossible; memory allocation may look like a black box to you, but with a little care purposeful overwriting is possible (for example, in testing a new garbage collector we discovered a bug tickled by the collector that (independent of input) consistently appeared at the same line while examining a structure at the same address. The cause? Overrunning of a data structure in the heap.) (Yes, a garbage collector for C -- ~ftp/sun-source/gc.shar@titan.rice.edu It works on Sun3s, Sun4s, Vaxes. Send mail with subject "help" to "archive-server@rice.edu" if you lack FTP access.) What I fail to understand is why you couldn't just as easily write char * buf; buf = malloc(x); fgets(buf, x, stdin); (yes, I know that fgets leaves the newline in the string) People say again and again "but I know how big the input is in my programs, so it's safe to use 'gets'". If you know how big the input is, then you might as well say it. People talk about performing certain hand-optimizations in a habitual way; is it too much to ask people to acquire habits that make their programs more robust? Optimizing a correct program is easier than correcting an optimized program (more fun, too). David
geoff@utstat.uucp (Geoff Collyer) (11/15/88)
> From: gwyn@smoke.BRL.MIL (Doug Gwyn ) > > Bullshit. When I use gets() I use it safely. Okay, Doug, let's take this again from the top. I'll use simple words and try to make myself utterly clear, and I won't even abuse your ancestry or swear at you, which I think is awfully polite of me, under the circumstances. To be proven: gets(3) should be abolished. Any program which uses gets(3) can be corrupted by giving it a long-enough input line. There is no protection possible against such an attack, other than sh's trick of making the gets buffer the last object in the data segment, catching the resulting SIGSEGV signal, growing the data segment and returning from the signal catcher, and this is certainly not portable to Cray-1s and Sun-3s, for example. gets is probably unique among C library functions because it cannot be used safely, no matter how hard you wish or how hard you work. Thus there seems little point (aside from writing unsafe programs) in continuing to support gets in standards and C libraries. QED -- Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
jrk@s1.sys.uea.ac.uk (Richard Kennaway CMP RA) (11/15/88)
In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes: > In article <1988Nov11.232629.15414@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: > > >That's the whole point, Doug. People *should* fix their existing code; > >it's unsafe. > > Bullshit. When I use gets() I use it safely. Please give us an example of your safe use of gets(). -- Richard Kennaway School of Information Systems, University of East Anglia, Norwich, U.K. uucp: ...mcvax!ukc!uea-sys!jrk Janet: kennaway@uk.ac.uea.sys
jackson@freyja (Jerry Jackson) (11/16/88)
In article <32596@oliveb.olivetti.com>, chase@Ozona (David Chase) writes: >(Yes, a garbage collector for C -- > ~ftp/sun-source/gc.shar@titan.rice.edu >It works on Sun3s, Sun4s, Vaxes. Send mail with subject "help" to >"archive-server@rice.edu" if you lack FTP access.) > I've written garbage collectors for lisp and have a pretty good idea what is involved... I can't imagine what this does, but I'm pretty sure it's something very different. Could someone please explain what this program does? Thanks, Jerry Jackson
dhesi@bsu-cs.UUCP (Rahul Dhesi) (11/16/88)
In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes: >gets is >probably unique among C library functions because it cannot be used >safely, no matter how hard you wish or how hard you work. Well, now, suppose we (a) write lines of known length to a file that is not writable by anybody else, (b) open that file for input and make it our standard input and (c) use gets with a buffer that is known to be big enough to hold any line in that file. Thus gets is probably unique among C library functions because it can be used safely if you try hard enough, but it should not be used anyway, because most practical uses of it are not safe. -- Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
jimp@cognos.uucp (Jim Patterson) (11/16/88)
In article <7963@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes: >After we get rid of gets, we should get rid of calloc(n, size), >which doesn't really do anything for you that malloc(n * size) >doesn't do.calloc's only claim to fame is specious; its zero fill >property is misunderstood by many programmers and is sufficiently >useless that it can easily be replaced by bzero and/or memset for >those few instances that truly require filling with bytes of zero. >(Recall that such a zero fill does not necessarily result in NULL >pointers or 0.0 floating-point values, in the common case where >arrays or structures are being allocated.) I've seen this point made many times, and it's even in the ANSI draft standard that null pointers don't HAVE to be all-bits-zero (as opposed to the "null pointer constant", which IS required to be 0). Realistically, though, are there REALLY C implementations out there which don't take binary 0 to be a NULL pointer, or a floating-point datum of all zero bits to be other than 0.0? I would be very interested in hearing about such systems. I know of at least one system where the system convention is not 0; Data General MV systems have instructions which take -1 as the null pointer value, and this has persisted through many system call conventions as well. However, the C implementation still considers a null pointer to be 0 even though this requires quite a bit of "glue" around some system calls to interface between the two formats. Requiring that the "null pointer constant" be 0, as ANSI C does, just makes any other implementation painfully difficult (and is begging for problems when porting software as well). I don't consider calloc() specious; if you have a large table to allocate even memset() can be too much overhead if you can do it better. 
Explicitly setting all elements of a table to the appropriate sort of 0, while maximally portable, is definitely even less efficient (assuming your memset implementation isn't completely out to lunch). If efficiency isn't a problem, fine, but often it is. Where a good implementation of calloc() can shine is in virtual memory (VM) environments where it can avoid actually faulting in the pages that you allocate. On many VM systems you can do this using a demand-page-zero page type which is allocated and cleared to zero when it's first referenced (VAX VMS is one system that supports this). You can't take advantage of this using malloc() and memset() (or explicit initialization). You are forced to fault in the entire area to clear it to zero, even though if it's a large area much of it will likely be faulted out again before you reference it again (if you do). It's worth noting that pre-clearing memory shouldn't be considered wasted overhead on the part of the OS. It's an important security precaution, to prevent other system users from poking through memory that used to belong to someone else and which could contain sensitive information. This may not be important to all users, but it is to many. -- Jim Patterson Cognos Incorporated UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707 PHONE:(613)738-1440 3755 Riverside Drive Ottawa, Ont K1G 3Z4
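The "maximally portable" alternative mentioned above, setting each element to the appropriate kind of zero, can be sketched as follows. The struct entry type and the alloc_table name are hypothetical. The overflow check on n * size is an addition of this sketch, not something raised in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct entry {              /* hypothetical table element */
    char  *name;
    double weight;
};

/* Allocate n entries and initialize each member explicitly, without
   relying on all-bits-zero being a null pointer or 0.0.
   Returns NULL if n is 0, the size overflows, or malloc fails. */
struct entry *alloc_table(size_t n)
{
    struct entry *table;
    size_t i;

    if (n == 0 || n > (size_t)-1 / sizeof *table)
        return NULL;                    /* n * size would overflow */
    table = malloc(n * sizeof *table);
    if (table == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        table[i].name = NULL;           /* a true null pointer */
        table[i].weight = 0.0;          /* a true floating-point zero */
    }
    return table;
}
```

The cost is the one described in the post: the loop touches every page of the table, so a lazily zero-filled allocation cannot be exploited this way.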
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/16/88)
In article <660023@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes: >Or do you "know" that your buffer is large enough? Sometimes one does know exactly this. Of course you don't know it for a general-purpose utility whose stdin can be directed from random places. So you don't use gets() then. >The thing about gets is that until now, the hazards of using it have not been >adequately advertised. There is no mention in any book or reference on C >about how gets can be perverted to blow away your application. The potential for abuse of gets() was quite well known before the virus attack. For instance, I don't think anyone on X3J11 was unaware of it. I'm pretty sure this has been discussed in comp.lang.c (INFO-C) more than once before. To take just two of the standard C texts: Harbison & Steele, "C: A Reference Manual": The use of gets can be dangerous because it is always possible for the input length to exceed the storage available in the character array. Plum, "Reliable Data Structures in C": Since it provides no means of specifying the size of the receiving string, it can seldom be used in reliable programs. Furthermore, it gives no convenient way to tell whether a newline was present in the input. The fgets function is more reliable, but oftentimes awkward to use. From the Rationale for Draft Proposed American National Standard for Information Systems -- Programming Language C: 4.9.7.2 The fgets function This function subsumes gets, which has no limit to prevent storage overwrite on arbitrary input (see section 4.9.7.7). >But then, C programmers are such a spoilt bunch (sigh!). They scream and >moan at the least little trouble they are put to :-(. I will complain if you try to enforce your notions of proper style on me, or try to protect me from myself.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/16/88)
In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>To be proven: ...
Did you really get through math classes with such a notion of "proof"?
"Any program ... can be corrupted ..." and "There is no protection
possible ..." and "it cannot be used safely ..." are simply stated,
not demonstrated. In fact they're wrong. I routinely use gets() in
an utterly safe manner. I'll let you try to figure out how this can be.
(Hint: Examine your notion of what is "always" possible.)
ok@quintus.uucp (Richard A. O'Keefe) (11/16/88)
In article <1988Nov14.220842.3980@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>Any program which uses gets(3) can be corrupted by giving it a
>long-enough input line. There is no protection possible against such an
>attack
There is a false assumption in this, namely that an attacker can control
the input to every program. If I have a program which _only_ I have
permission to execute, and I _always_ use it in a pipeline (or in a
command script), and the preceding program in the pipeline (or script)
always generates sufficiently short lines, it is safe to use gets().
The input to such a program is _every_ bit as much under my control as
the source argument of strcpy().
daveh@marob.MASA.COM (Dave Hammond) (11/16/88)
In article <225800090@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>How about fixing this, and the scanf and strcpy problems as well,
>by a little outside-the-standard kludge? (Okay, I realize that every
>time I suggest something like this, somebody tries to roast me,
>but I am flameproof.) That is
>
>#pragma _MAX_STRING_LENGTH=256 /*or some other suitable number*/
>
>and the compiler would call special versions of gets, strcpy,
>and cohorts, that stopped at such a maximum. Now I am not sure whether
>the result of overrun would have to be a fatal error or whether
>it could just stop copying, but that would at least prevent
>old bugs from biting too bad.
Sorry, but I fail to see where this (and a previous article suggesting
a 512 byte limit) helps the problem if the programmer uses a buffer
whose length is smaller than MAX_STRING_LENGTH. The result is still
going to be an overflowed buffer, which is still going to be wrong.
The best solution is to have the programmer instruct the function as to
the *true* buffer length, and this can only be done with a function
which expects a length parameter (eg fgets()).
Dave Hammond
UUCP: ...!uunet!masa.com!{marob,dsix2}!daveh
DOMAIN: daveh@marob.masa.com
----------------------------------------------------------------------------
henry@utzoo.uucp (Henry Spencer) (11/17/88)
In article <8902@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >... In fact they're wrong. I routinely use gets() in >an utterly safe manner... Well, "utterly safe" if you're always very careful that part A of your program preserves the length limits that part B is relying on. Personally I prefer slightly more robust programming, especially when there's no significant difference in convenience or efficiency. -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
news@ism780c.isc.com (News system) (11/17/88)
In article <682@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>
>There is a false assumption in this, namely that an attacker can control
>the input to every program. If I have a program which _only_ I have
>permission to execute, and I _always_ use it in a pipeline (or in a
>command script), and the preceding program in the pipeline (or script)
>always generates sufficiently short lines, it is safe to use gets().
>The input to such a program is _every_ bit as much under my control as
>the source argument of strcpy().
No one worries much about a program written by Mr O'Keefe that can be
executed only by Mr O'Keefe. What worries most people is programs
distributed for public use that are written by someone who is unaware of
the 'gets problem'. Simply admonishing programmers (of publicly available
software) to avoid making the 'gets mistake' is less effective than
removing gets from the library.
I would like to suggest a library routine to replace gets, say
safegets(buffer,count), which for lines no longer than count would behave
like gets, and for lines longer than count would place the first count-1
characters of the line into the buffer followed by a '\0'. The value
returned by safegets is the line length (or EOF).
Marv Rubinstein
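The proposal leaves a few details open, so the following is only one possible reading of it, not the author's implementation. Assumptions of this sketch: the routine is named safegets_from and takes the stream as a parameter; count is the full capacity of the buffer, so at most count-1 characters are stored; the remainder of an over-long line is read and discarded; the returned length excludes the newline; and the length counter saturates at LONG_MAX instead of wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Read one line from fp into buffer (capacity count bytes), dropping the
   newline. Characters that do not fit are consumed and discarded.
   Returns the full length of the line, or EOF if end of file is reached
   before any character is read. */
long safegets_from(FILE *fp, char *buffer, size_t count)
{
    long length = 0;        /* characters seen on this line */
    size_t stored = 0;      /* characters actually kept */
    int c;

    assert(fp != NULL && buffer != NULL && count > 0);
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (stored + 1 < count)         /* leave room for the '\0' */
            buffer[stored++] = (char)c;
        if (length < LONG_MAX)          /* saturate rather than wrap */
            length++;
    }
    buffer[stored] = '\0';
    if (c == EOF && length == 0)
        return EOF;
    return length;
}
```

A caller can detect truncation by comparing the return value with count-1: a larger value means part of the line was dropped.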
john@frog.UUCP (John Woods) (11/17/88)
In article <660023@hpclscu.HP.COM>, shankar@hpclscu.HP.COM (Shankar Unni) writes: > > gets() has legitimate uses. It is in the library Base Document. > > It is widely used in existing code (sometimes safely, sometimes not). > > It stays. > Exactly how do you use gets "safely"? The only case I can think of is when you have a process that fork()s, and the parent feeds the child stuff which is guaranteed to fit into the buffer. I used to think that parsing machine-generated output files was another case. Then one day my program for analyzing /usr/spool/uucp/SYSLOG started blowing out because I had run out of space during a uucp transfer... gets. A clock-tick of convenience. A process-lifetime of regret. :-) -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu Science does not remove the TERROR of the Gods!
chris@mimsy.UUCP (Chris Torek) (11/18/88)
In article <4509@aldebaran.UUCP> jimp@cognos.uucp (Jim Patterson) writes: >Realistically, though, are there REALLY C implementations out there >which don't take binary 0 to be a NULL pointer, or a floating-point >datum of all zero bits to be other than 0.0? The S1 project at LLL built such a machine, and the people working on it eventually gave in and made all-bits-zero be a nil pointer. It was less work than fixing all the incorrect programs. >Data General MV systems have instructions which take -1 as the null >pointer value.... the C implementation still considers a >null pointer to be 0 even though this requires quite a bit of "glue" >around some system calls to interface between the two formats. >Requiring that the "null pointer constant" be 0, as ANSI C does, just >makes any other implementation painfully difficult (and is begging for >problems when porting software as well). It is neither particularly painful nor difficult, but it is indeed begging to expose all the old bugs (similar to what Sun did when porting 4.2BSD onto their hardware, where *(char *)0 was not 0, but rather `segmentation fault'). >Where a good implementation of calloc() can shine is in virtual memory >(VM) environments where it can avoid actually faulting in the pages >that you allocate. ... demand-page-zero page type ... (VAX VMS is one >system that supports this). 4BSD Unix also supports it. While this is true, it is also true that malloc() can avoid faulting in the pages too, if you simply leave them unset. For bounded operations (i.e., you are not going to go referencing the uninitialised memory) this is just as efficient: pages not used are not touched. Of course, unset memory is a good place for bugs to hide. (If you want to get really silly, memset() could ask the O/S to map out any full pages, marking them as `c'-fill, where c is what memset is to fill with. I wonder if this would actually ever pay off?) 
-- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/18/88)
In article <4509@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
>Requiring that the "null pointer constant" be 0, as ANSI C does, just
>makes any other implementation painfully difficult (and is begging for
>problems when porting software as well).
Please get your facts straight before complaining. C has always allowed
a null pointer constant to be written as 0. ANSI C merely makes (void*)0
a valid alternative way to write a null pointer constant. (K&R C didn't
have void*.) The contexts where a null pointer constant is being used
aren't all that hard for a compiler to determine, and it can generate
whatever code is necessary for such cases. By no means is an all-0-bit
representation forced on the implementation.
>It's worth noting that pre-clearing memory shouldn't be considered
>wasted overhead on the part of the OS. It's an important security
>precaution, to prevent other system users from poking through memory
>that used to belong to someone else and which could contain sensitive
>information. This may not be important to all users, but it is to
>many.
All the UNIX implementations I know of arrange for extended program
break memory (heap) and stack to be zeroed. It would be even safer to
zero it just before relinquishing process ownership of it.
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/18/88)
In article <1988Nov16.184238.16375@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <8902@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >>... In fact they're wrong. I routinely use gets() in >>an utterly safe manner... >Well, "utterly safe" if you're always very careful that part A of your >program preserves the length limits that part B is relying on. Personally >I prefer slightly more robust programming, especially when there's no >significant difference in convenience or efficiency. Why work harder when gets() does exactly what one needs? Another safe use is for small "one-shot" test programs etc. that are to be used only by persons and procedures that will not exceed the limits. I've written quite a few of these over the years and they have never had their buffers overrun, because nobody who is in a position to do so (me, usually) has the least interest in doing so.
ok@quintus.uucp (Richard A. O'Keefe) (11/19/88)
In article <19278@ism780c.isc.com> marv@ism780.UUCP (Marvin Rubenstein) writes:
>I would like to suggest a library routine to replace gets say,
>safegets(buffer,count), which for lines no longer than count would behave
>like gets, and for lines longer than count would place the first count-1
>characters of the line into the buffer followed by a '\0'. The value
>returned by safegets is the line length (or EOF).
Believing that co-operation is more constructive than criticism, I posted
just such a routine to comp.sources.misc a couple of days ago, called
getsafe(). The return value is the number of characters in the line
_including_ the \n, or 0 for EOF. However, a couple of other people on the
net have pointed out problems with my code, such as the possibility of
someone supplying >2**32 characters of input so that the counter would
wrap around, and some things to be done for dpANS compatibility. I have
included these changes, and in a day or two (in case anyone else spots
something wrong) will post the revised version.
Trying to make getsafe() absolutely foolproof (and portable) has been an
educational experience for me. I have come to the conclusion that there is
something _worse_, far far worse, than gets(), and that is the routines
which people took great care to make safe, but because of C's
under-specified integer arithmetic, aren't. (Leaving aside the fact that in
UNIX it is _impossible_ for a C program to be sure of getting the right
value of errno -- and no, 'volatile' doesn't fix that, it just stops the
compiler making it worse.)
henry@utzoo.uucp (Henry Spencer) (11/20/88)
In article <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >>Well, "utterly safe" if you're always very careful that part A of your >>program preserves the length limits that part B is relying on... > >Why work harder when gets() does exactly what one needs? What "work harder"? It's a few more characters of typing. >Another safe use is for small "one-shot" test programs etc... Agreed, provided one is careful to destroy those programs after their one shot is fired. Such programs have a depressing tendency to persist, and even to end up in 4BSD distributions... :-( -- Sendmail is a bug, | Henry Spencer at U of Toronto Zoology not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/20/88)
In article <1988Nov19.214209.27406@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >What "work harder"? It's a few more characters of typing. It's considerably more than "a few characters". Enough so that if I didn't have gets() I'd write one and add it to my personal library.
jas@ernie.Berkeley.EDU (Jim Shankland) (11/21/88)
In article <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: [In effect: sometimes gets() really is safe, or sufficient: e.g., in programs whose input is known a priori, or in small, one-shot test programs, or ....] >Why work harder when gets() does exactly what one needs? But how much harder do you end up working without gets()? Using fgets() isn't exactly 5 years of hard labor. gets() just doesn't seem to provide much added value, and is almost never safe. (I've certainly written some small, one-shot test programs that ended up being so useful that lots of people had the opportunity to gag at my "one-shot" code.) Jim
tanner@cdis-1.uucp (Dr. T. Andrews) (11/21/88)
In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
) Bullshit. When I use gets() I use it safely.
I suspect that I am far from the only one who would be most
interested in learning to use gets(3) safely.
--
...!bikini.cis.ufl.edu!ki4pv!cdis-1!tanner ...!bpa!cdin-1!cdis-1!tanner
or... {allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!tanner
meissner@xyzzy.UUCP (Usenet Administration) (11/22/88)
In article <4509@aldebaran.UUCP> jimp@cognos.UUCP (Jim Patterson) writes: /* stuff deleted */ | I know of at least one system where the system convention is not 0; | Data General MV systems have instructions which take -1 as the null | pointer value, and this has persisted through many system call | conventions as well. However, the C implementation still considers a | null pointer to be 0 even though this requires quite a bit of "glue" | around some system calls to interface between the two formats. | Requiring that the "null pointer constant" be 0, as ANSI C does, just | makes any other implementation painfully difficult (and is begging for | problems when porting software as well). Sigh.... Yes the MV does have some queue instructions that take -1 for a null pointer. However, the general NULL pointer as defined by the C library is all 0's, as it is for other DG languages. Whatever other faults we have (three pointer types, etc.), a non-zero NULL is not one of them. And yes there are some system calls that want -1 in pointer fields as a special value, there are also system calls that want you to do a logical OR with the high bit set. Such is life..... -- Michael Meissner, Data General. Uucp: ...!mcnc!rti!xyzzy!meissner Arpa: meissner@dg-rtp.DG.COM (or) meissner%dg-rtp.DG.COM@relay.cs.net
daveb@gonzo.UUCP (Dave Brower) (11/22/88)
>Jas writes:
>In <8915@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>
>[In effect: sometimes gets() really is safe, or sufficient: e.g., in
>programs whose input is known a priori, or in small, one-shot test
>programs, or ....]
>
>>Why work harder when gets() does exactly what one needs?
>
>But how much harder do you end up working without gets()? Using fgets()
>isn't exactly 5 years of hard labor. gets() just doesn't seem to
>provide much added value, and is almost never safe. (I've certainly
>written some small, one-shot test programs that ended up being so useful
>that lots of people had the opportunity to gag at my "one-shot" code.)

Jasbo obviously doesn't want to tell the story here, so I will, with
minor embellishments.

Once upon a time, the 'Zbo tried to figure out how the large character
writing on a VT100 family terminal worked. In order to do so, he wrote a
little test program that took the command line arguments like echo, and
spat them out with the right escape sequences and multi-line duplication
to correctly drive the terminal.

It is lost to history who started it, but there followed a brief period
of "writebig" wars, with surreal messages in large letters appearing at
random times on compatriots' screens, to humorous effect.

"Wonderful!" said the workmates, who quickly snatched the program for use
in a messaging service that would send out the clarion call to go to lunch
in nice big letters.

"But! But! But! It's a hack!", said 'Zbo, "I don't want to support it!"

And the users said, "Oh pleeaze, 'Zbo, it's so handy, please don't take it
away." And then they whispered, "it would be awfully nice if it would
center lines on the screen"

"No! No! No!", said 'Zbo, "It's a hack! I don't want to be responsible!
Next thing I know, people will start asking for documentation!"

But the users cajoled and begged, and twisted the 'Zbo's arm, and
writebig was changed to center lines.

Then one day at tea, the Ceferino Lamb innocently inquired about the neat
announcement program that wrote letters double high and double wide.
"Writebig, huh. Where's the man page?"

And the 'Zbo let out a quiet scream.

Moral: There is _never_ a one shot test program.

(Corollaries are left as an exercise to the reader.)

-dB

[ This is one of the few times I've ever disagreed with Doug Gwyn. Don't
use gets(). It is the work of the devil. ]
--
If life was like the movies, the music would match the picture.
{sun,mtxinu,hoptoad}!rtech!gonzo!daveb daveb@gonzo.uucp
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/23/88)
In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
>Moral: There is _never_ a one shot test program.
That is simply untrue. I've written scads of them over the years,
probably an average of one per week.
badri@valhalla.ee.rochester.edu (Badri Lokanathan) (11/24/88)
In article <8959@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
> >Moral: There is _never_ a one shot test program.
>
> That is simply untrue. I've written scads of them over the years,
> probably an average of one per week.
I hate to add to a discussion that is going nowhere, but I must say I
agree with Doug. As part of my research I design and implement many
algorithms, most of which are modules for a bigger package. Almost all of
them have a

#ifdef DEBUG_MAIN
main() {
 .
 .
}
#endif DEBUG_MAIN

built into them for stand-alone debugging. Here, gets, puts, scanf are the
easiest way of I/O and I use them all the time. It does not make any
sense to worry about safe gets, coz' this part of the code is never going to
be used by anybody for purposes other than testing. Quick and easy is the way
to go.
--
"Don't blame me for wanting more {) badri@ee.rochester.edu
The facts are too hard to ignore //\\ {ames,cmcl2,columbia,cornell,
I'm scared to death of poverty ///\\\ garp,harvard,ll-xn,rutgers}!
I only want what's best for me."-UB40 /\ rochester!ur-valhalla!badri
barmar@think.COM (Barry Margolin) (11/24/88)
In article <8959@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
]In article <471@gonzo.UUCP> daveb@gonzo.UUCP (Dave Brower) writes:
]>Moral: There is _never_ a one shot test program.
]That is simply untrue. I've written scads of them over the years,
]probably an average of one per week.
OK, how about this one:
Moral: You can never be sure that a program will be a one-shot test
program.
Barry Margolin
Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
rob@pbhyf.PacBell.COM (Rob Bernardo) (11/24/88)
In article <1606@valhalla.ee.rochester.edu> badri@valhalla.ee.rochester.edu (Badri Lokanathan) writes:
+ Almost all of them have a
+
+#ifdef DEBUG_MAIN
+main() {
+ .
+ .
+}
+#endif DEBUG_MAIN
+
+built into them for stand-alone debugging. ... It does not make any
+sense to worry about safe gets, coz' this part of the code is never going to
+be used by anybody for purposes other than testing. Quick and easy is the way
+to go.
Um, er, wasn't it a debug part of sendmail that had its security hole that
many people compiled in anyway?
--
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email: ...![backbone]!pacbell!pbhyf!rob OR rob@pbhyf.PacBell.COM
Office: (415) 823-2417 Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301 R Bar JB, Concord, California
mcdonald@uxe.cso.uiuc.edu (11/24/88)
>I suspect that I am far from the only one who would be most
>interested in learning to use gets(3) safely.
I don't understand how you can get something like
 gets(3);
past a compiler. Isn't 'gets' supposed to take a char * argument,
not an int literal?
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/24/88)
In article <32095@think.UUCP> barmar@kulla.think.com.UUCP (Barry Margolin) writes:
-OK, how about this one:
-Moral: You can never be sure that a program will be a one-shot test
-program.
Still not true.
rob@pbhyf.PacBell.COM (Rob Bernardo) (11/25/88)
In article <225800095@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
+>I suspect that I am far from the only one who would be most
+>interested in learning to use gets(3) safely.
+I don't understand how you can get something like
+ gets(3);
+past a compiler. Isn't 'gets' supposed to take a char * argument,
+not an int literal?
You win the Gracie Allen Award of C!
Reminds me of something that happened in a code walkthrough. One of the reviewers
noticed that all the exit statements were:
exit(2);
and asked the programmer why an exit value of two was used regardless of
the exit conditions. The programmer replied, "That's what it says on
the top of the man page."
--
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email: ...![backbone]!pacbell!pbhyf!rob OR rob@pbhyf.PacBell.COM
Office: (415) 823-2417 Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301 R Bar JB, Concord, California
atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) (11/26/88)
In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>Let your vendor know that you want to see gets deleted from its next
>release, delete gets.o from your C library, move gets.o to -lgets,
>define gets(s) as "gets is unsafe; use fgets(3)"<><><> in your stdio.h;
>do whatever you can to help.
>
>If your vendor protests your reasonable request, point out that gets,
>as part of stdio, is a decade-old backward compatibility hack for
>compatibility with the Sixth Edition UNIX Portable I/O Library, which
>was utterly replaced by stdio no later than 1979. Accept no excuses;
>converting programs from using gets to fgets is largely mechanical,
>and stripping trailing newlines is trivial to code yourself.
>
While the vendor may sympathize with the reasoning, the mechanics
of the US Federal bureaucracy work against this. As long as
gets() is in an official ANSI standard, it will be in the validation
suites. Part of the boilerplate used in sales contracts to
the US government is that the compiler must be an officially
validated compiler (lawyers and accountants don't care about
the dangers of GETS/FGETS, just that it be "certified").
In other words, once the ANSI standard gets passed and someone gets
themselves declared an official certifier, you can't sell your
compiler to a US Federal department without such certification.
That is a lot of revenue for a vendor to give up to satisfy your request.
henry@utzoo.uucp (Henry Spencer) (11/27/88)
In article <22402@watmath.waterloo.edu> atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]) writes:
>... As long as
>gets() is in an official ANSI standard, it will be in the validation
>suites. Part of the boilerplate used in sales contracts to
>the US government is that the compiler must be an officially
>validated compiler...
It is not necessary for a vendor to give up validation-suite compliance for
the sake of discouraging use of gets(). How a compiler is invoked is
compiler-specific in any case; putting gets() in a separate library and
requiring that it be explicitly included (e.g. with "-lunsafe") retains
compliance (and the ability to compile broken old programs) while still
pushing in the right direction.
--
Sendmail is a bug, | Henry Spencer at U of Toronto Zoology
not a feature. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
daveb@geaclib.UUCP (David Collier-Brown) (11/27/88)
> In article <1988Nov8.054845.23998@utstat.uucp> geoff@utstat.uucp (Geoff Collyer) writes:
>>If your vendor protests your reasonable request, point out that gets,
>>as part of stdio, is a decade-old backward compatibility hack for
>>compatibility with the Sixth Edition UNIX Portable I/O Library, which
>>was utterly replaced by stdio no later than 1979.
From article <22402@watmath.waterloo.edu>, by atbowler@watmath.waterloo.edu (Alan T. Bowler [SDG]):
> While the vendor may sympathize with the reasoning, the mechanics
> of the US Federal bureaucracy work against this. As long as
> gets() is in an official ANSI standard, it will be in the validation
> suites.
(Hi, Allan!)
This raises the interesting, and possibly invidious, question of
why the ANSI C standard includes gets...
It may prove advisable to ask for its elimination on the next (NOT!
current) round of standardization, and a request from the (U.S) DOD Computer
Security Center (sic) for an exception in the validation suite...
--dave
--
David Collier-Brown. | yunexus!lethe!dave
Interleaf Canada Inc. |
1550 Enterprise Rd. | HE's so smart he's dumb.
Mississauga, Ontario | --Joyce C-B
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/27/88)
In article <1988Nov27.005945.29173@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>It is not necessary for a vendor to give up validation-suite compliance for
>the sake of discouraging use of gets(). How a compiler is invoked is
>compiler-specific in any case; putting gets() in a separate library and
>requiring that it be explicitly included (e.g. with "-lunsafe") retains
>compliance (and the ability to compile broken old programs) while still
>pushing in the right direction.

This is dumb, dumb, dumb. Now you want C vendors to have to support
multiple levels of compilation, one which is harder to invoke for the
standard C environment and one that is just like it except it's missing
gets() from the C library. This is NOT a "push in the right direction";
it adds complexity merely because some people hate a particular function.
I happen to dislike several library functions for reasons similar to
those put forth against gets(); should vendors also segregate those out
into a -lgwyn_disapproved library? Why would that be any more absurd
than your suggestion?

Might I suggest that you simply add to whatever code-quality checks you
perform something along the following lines and LEAVE C ALONE:

	grep -n '[^a-zA-Z_]gets[^a-zA-Z_]' /dev/null "$@" && \
		echo "Henry thinks you shouldn't be using gets()."

That is the right place to apply your notions of proper coding style.

It amazes me how ready people are to jump onto a totally irrelevant
bandwagon in the aftermath of the Internet virus/worm attack. If you
really think that lack of gets() in somebody's C library would have
prevented the attack, you're quite mistaken. A programmer who made
the mistake that allowed the virus to enter through the 4BSD finger
daemon would very likely have been equally careless with numerous other
language and operating system facilities. In fact there have been
several such security holes discovered so far, and the famous virus/worm
exploited only a couple of them to enter systems.

You cannot fix the security problems by removing every function that
somebody misuses from the C library; there wouldn't be many left if
you took that approach. Learn to use what's there wisely, and when
there isn't a canned function suitable for the job invent one
(preferably nicely designed and published so it will eventually be a
candidate for addition to the standard library).

I avoid use of gets() in general-purpose input code, but I still want
it in the C library for the times when it IS appropriate and useful.
If vendors really are so stupid as to try to make it hard to find,
they're going to have trouble convincing me that they want to sell C
implementations to me. Of course if necessary I would immediately
cons up a public-domain implementation, add it to the deficient
libraries, and spread it around for others in the same boat. The net
result would have been just a lot of extra trouble to get back to the
point from which we started.

To repeat my main point: gets() is NOT a problem. Programmers who
don't think clearly enough about what they are doing ARE the problem.
You cannot solve the real problem by working on the non-problem.
It's at best a waste of time and potentially a nuisance; at worst it
draws attention away from real causes for lack of system security and
gives people a false sense of security, on the misperception that the
problem has been properly dealt with.
peter@ficc.uu.net (Peter da Silva) (11/28/88)
In article <7008@cdis-1.uucp>, tanner@cdis-1.uucp (Dr. T. Andrews) writes:
> In article <8876@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn ) writes:
> ) Bullshit. When I use gets() I use it safely.
> I suspect that I am far from the only one who would be most
> interested in learning to use gets(3) safely.
Um, wear a prophylactic and use a sterile needle?
--
Peter da Silva `-_-' Ferranti International Controls Corporation
"Have you hugged U your wolf today?" uunet.uu.net!ficc!peter
Disclaimer: My typos are my own damn business. peter@ficc.uu.net
gwyn@smoke.BRL.MIL (Doug Gwyn ) (11/28/88)
In article <3449@geaclib.UUCP> daveb@geaclib.UUCP (David Collier-Brown) writes:
> This raises the interesting, and possibly invidious, question of
>why the ANSI C standard includes gets...

It's there because it is useful and much existing code relies on
its existence. It was specified in the library base document.
There was not sufficient committee support for its removal.
We've been over all this before..

>It may prove advisable to ask for its elimination on the next (NOT!
>current) round of standardization,

I don't know what this is intended to refer to. The proposed ANSI C
standard is complete at this point and is expected to be adopted
without alteration (although perhaps with additions) by ISO. The
committee officially tasked with standardization did NOT deem it
advisable to eliminate gets(). This is a CLOSED ISSUE insofar as
the standards process is concerned. (That's why it's so annoying
to me to hear it being discussed on the net as though anything was
really going to, or needed to, be done about the current state of
gets() in the C standard. Don't use it if you don't like it, and
propagandize your friends to not use it if you wish, but stop
suggesting that the standards committees deal with it. We already
have. It stays.)

> and a request from the (U.S) DOD Computer Security
>Center (sic) for an exception in the validation suite...

I don't know what model you have for how standards work.
To conform to an ANSI standard, the requirements of the standard
must be met. There are no provisions for "exceptions".

Now, a FIPS can say anything it wants, no matter how silly, and
products specified as FIPS-xxx compliant are expected to meet its
requirements. An example of this is FIPS-151, which took the
IEEE 1003.1 not-yet-standard (Draft 12) as its starting point
then added a collection of more specific requirements to it, the
result being that no planned vendor POSIX implementation was likely
to meet the FIPS without the vendor's plans being revised.
It is not clear that this really served anyone's interests, and
it is to be hoped that in the case of C any relevant FIPS would
not attempt to alter the technical requirements set forth in the
ANSI standard. There is much less excuse for this with C than
with POSIX, because POSIX had numerous explicit options that I
suppose NBS felt obliged to nail down. What is optional in the
proposed ANSI C standard are just those things that COULD NOT be
made more specific without unjustifiably excluding important
compilation/execution environments. There are practically NO
"political options" like POSIX had. This was by design, as was
the absence of "levels" of conformance and the prohibitions
against name-space pollution.
henry@utzoo.uucp (Henry Spencer) (11/30/88)
I won't do a point-by-point rebuttal of Doug's long posting, partly because
this is obviously a semi-religious issue. I will content myself with
observing that saying "it's all just a matter of coding style" ignores the
fact that there are objective differences between coding styles: some *are*
better than others. Many people, notably including those at a certain Bell
Labs site of some historical significance, seem to agree with Geoff and me
that gets() is an error-prone and unnecessary function whose use should be
firmly discouraged. This would not magically solve all our problems, but
it would eliminate one superfluous sharp edge from widely-used software.
--
SunOSish, adj: requiring | Henry Spencer at U of Toronto Zoology
32-bit bug numbers. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
eao@zeus.umu.se (12/03/88)
When I retire gets() I would like a function like this fgetline() to replace
it. Are there any drawbacks I have missed? or is recursion too simple to use
in problems like this? (To parse an input in search of newline.)
/*
 * char *fgetline(file)
 * FILE *file;
 * returns a null terminated line from stdin allocated with *some_malloc
 */
#include <stdio.h>
/*
 * Size of chunks read with fgets. This constant could be freely altered
 * to achieve optimal efficiency. (Try 1 :-)
 */
#define BUFFSIZE 512
static char *head, *tail;
static long size;
static FILE *stream;

void storetail(buff, tailsize)
char *buff;
long tailsize;
{
    long headsize;
    extern char *(*some_malloc)();

    headsize = size;
    size += tailsize;
    head = (*some_malloc)(size + 1);
    tail = head + headsize;
    strncpy(tail,buff, tailsize);
    return;
}

static void getchunk()
{
    char buff[BUFFSIZE+1], *strchr();
    static char *s;

    s = fgets(buff, BUFFSIZE+1, stream);
    if (s == NULL)
        if (size == 0)
            /* Do nothing */;
        else
            storetail(buff, 0);
    else {
        s = strchr(buff, '\n');
        if (s != NULL) { /* Newline has been read */
            *s = 0;
            storetail(buff, s - buff);
        }
        else { /* Newline is still to be seen. Read more */
            size += BUFFSIZE;
            getchunk();
            tail -= BUFFSIZE;
            strncpy(tail, buff, BUFFSIZE);
        }
    }
    return;
}

char *fgetline(file)
FILE *file;
{
    size = 0;
    stream = file;
    getchunk();
    if (size == 0)
        return NULL;
    else {
        head[size] = 0;
        return head;
    }
}
Erik Marklund +90-16 63 30
bright@Data-IO.COM (Walter Bright) (12/06/88)
In article <649@umecs.cs.umu.se> eao@zeus.umu.se () writes:
>When I retire gets() I would like a function like this fgetline() to replace
>it. Are there any drawbacks I have missed? or is recursion too simple to use
>in problems like this? (To parse an input in search of newline.)
>/*
> * char *fgetline(file)
> * FILE *file;
> * returns a null terminated line from stdin allocated with *some_malloc
> */
> [ code deleted for brevity ]
My objections to the code presented are:
1. It depends on static variables. This makes it non-reentrant, and
   therefore a bug waiting to happen on multi-threaded systems like OS2.
2. If a 0 byte is read, the behavior is undefined.
So I present this:
    size_t fgetline(FILE *file, char **pbuffer, size_t *pbufsize);
Semantics:
    Reads a line from the file. The end of the line is defined by reading
    a \n, or encountering the EOF. If a \n was read, it's included in the
    read line. 0s may also be read, and are included in the read line,
    thus the count of bytes read that's returned may be larger than that
    obtained by strlen(*pbuffer).
Input:
    file        input stream pointer
    pbuffer     pointer to the buffer pointer. If the buffer pointer is
                NULL, one is malloc'd. The buffer pointer must be NULL or
                point to data allocated by malloc, realloc or calloc.
    pbufsize    pointer to variable containing the allocated length of
                the buffer
Output:
    *pbuffer    If the buffer needs to be realloc'd, this is set to the
                new buffer.
    *pbufsize   Set to the size of the buffer, which may be larger than
                the actual amount of data in the buffer.
Errors:
    If EOF or an error occurs while a partially read line is being read,
    it is treated as the end of the line. If no bytes are read yet, 0 is
    returned.
    If malloc or realloc run out of memory, fgetline will return what
    it's already got, and errno will be set.
Returns:
    number of bytes read into *pbuffer, excluding terminating 0
Example (in ANSI C):
    void typefile(FILE *f) /* copy file to stdout */
    {
        char *buffer = NULL;
        size_t buflen = 0;
        size_t linelen;

        while (1) {
            linelen = fgetline(f,&buffer,&buflen);
            if (linelen == 0) /* error or EOF */
                break;
            if (fwrite(buffer,1,linelen,stdout) != linelen)
                break; /* error */
        }
        free(buffer);
    }
Put a quarter in the juke,
Boogie 'till yah puke.
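The post above gives only the interface of fgetline(), not a body. What follows is one possible sketch of an implementation that follows the stated semantics; it is not Walter Bright's code. The initial size of 128, the doubling growth policy, the use of ungetc() on allocation failure, and setting errno to ENOMEM (a POSIX value, assumed to be available) are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch of fgetline() as specified in the post above.
 * Reads up to and including '\n' (or up to EOF) into *pbuffer,
 * growing the buffer with realloc as needed, and returns the
 * number of bytes stored, not counting the terminating 0.
 */
size_t fgetline(FILE *file, char **pbuffer, size_t *pbufsize)
{
    size_t len = 0;
    int c;

    if (*pbuffer == NULL)
        *pbufsize = 0;              /* no buffer yet: ignore stale size */

    while ((c = getc(file)) != EOF) {
        /* Make room for this byte plus the terminating 0. */
        if (len + 2 > *pbufsize) {
            size_t newsize = *pbufsize ? *pbufsize * 2 : 128;
            char *p = realloc(*pbuffer, newsize);
            if (p == NULL) {
                ungetc(c, file);    /* keep the byte for a later call */
                errno = ENOMEM;     /* assumption: POSIX errno value */
                break;              /* return what we already have */
            }
            *pbuffer = p;
            *pbufsize = newsize;
        }
        (*pbuffer)[len++] = (char)c; /* 0 bytes are stored like any other */
        if (c == '\n')
            break;
    }
    if (len > 0)
        (*pbuffer)[len] = '\0';
    return len;
}
```

Because all state lives in the caller's variables, this version has no static data, which addresses the reentrancy objection raised in the post; the same buffer is reused from line to line and only grows when a longer line arrives.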