[comp.lang.c] What's so bad about scanf anyway???

avery@netcom.UUCP (Avery Colter) (11/11/90)

In the self-teaching course I have here, scanf is the most often used
input function. I don't see gets used much at all.

And indeed, gets only seems of much advantage when you want to take in
a whole line into one string.

Otherwise, scanf can take individual numbers and put them directly into
numerical variables. With gets, you'd have to first manually parse the
line, and then use strtol to translate them into numbers.

I didn't see puts used for printing strings to screen much either.
printf was the function of choice.

-- 
Avery Ray Colter    {apple|claris}!netcom!avery  {decwrl|mips|sgi}!btr!elfcat
(415) 839-4567   "Fat and steel: two mortal enemies locked in deadly combat."
                                     - "The Bending of the Bars", A. R. Colter

gordon@osiris.cso.uiuc.edu (John Gordon) (11/11/90)

	Scanf() is bad because if you use it to directly get user input, and
the user types in something different than scanf() is expecting, it screws
up.  A better scheme is to store user input in an intermediate buffer and
sscanf() the buffer.

roy%cybrspc@cs.umn.edu (Roy M. Silvernail) (11/11/90)

avery@netcom.UUCP (Avery Colter) writes:

> In the self-teaching course I have here, scanf is the most often used
> input function. I don't see gets used much at all.
> 
> And indeed, gets only seems of much advantage when you want to take in
> a whole line into one string.

The problem with scanf() is that it can behave unpredictably when you
give it badly formatted input. It's better, IMHO, to gets() a whole
line, check its validity and _then_ sscanf() it into the target
variables. (no need for strtol() or similar, since sscanf() looks at the
validated string just as scanf() would have looked at the original
input) It just makes things more bullet-resistant.

--
Roy M. Silvernail |+|  roy%cybrspc@cs.umn.edu  |+| #define opinions ALL_MINE;
main(){float x=1;x=x/50;printf("It's only $%.2f, but it's my $%.2f!\n",x,x);}
"This is cyberspace." -- Peter da Silva  :--:  "...and I like it here!" -- me

jak@sactoh0.SAC.CA.US (Jay A. Konigsberg) (11/12/90)

In article <16582@netcom.UUCP> avery@netcom.UUCP (Avery Colter) writes:
>In the self-teaching course I have here, scanf is the most often used
>input function. I don't see gets used much at all.
>
>And indeed, gets only seems of much advantage when you want to take in
>a whole line into one string.
>
IMHO gets() or getchar() is better for input because the programmer
has greater control over what is being input. Specifically, if the
programmer wants a float value and a character is input gets() won't
error on it, scanf() will. My argument goes mainly to bullet-proofing
programs.

>Otherwise, scanf can take individual numbers and put them directly into
>numerical variables. With gets, you'd have to first manually parse the
>line, and then use strtol to translate them into numbers.
>
Generally, I'll use scanf() when reading from a file that a program
has created. Then scanf() is superior.

>I didn't see puts used for printing strings to screen much either.
>printf was the function of choice.
>
puts()/fputs() is generally faster than printf() and should be used
when possible. However, I will confess that printf() is much more
commonly used. Perhaps it's because printf() will handle all cases
and puts() will handle only the plain-string case.


-- 
-------------------------------------------------------------
Jay @ SAC-UNIX, Sacramento, Ca.   UUCP=...pacbell!sactoh0!jak
If something is worth doing, it's worth doing correctly.

rjc@uk.ac.ed.cstr (Richard Caley) (11/12/90)

In article <VXogs2w163w@cybrspc> roy%cybrspc@cs.umn.edu (Roy M. Silvernail) writes:

    The problem with scanf() is that it can behave unpredictably when you
    give it badly formatted input. It's better, IMHO, to gets() a whole
    line, check its validity and _then_ sscanf() it into the target
    variables.

Maybe it was just a typo, but repeat after me

	`GETS is EVIL'

This has been an unpaid announcement by paranoids anonymous.

--
rjc@uk.ac.ed.cstr			_O_
					 |<

zvs@bby.oz.au (Zev Sero) (11/12/90)

Roy   = roy%cybrspc@cs.umn.edu (Roy M. Silvernail)
Avery = avery@netcom.UUCP (Avery Colter)

Avery> In the self-teaching course I have here, scanf is the most often used
Avery> input function. I don't see gets used much at all.

Roy> The problem with scanf() is that it can behave unpredictably when you
Roy> give it badly formatted input. It's better, IMHO, to gets() a whole
Roy> line, check its validity and _then_ sscanf() it into the target

When you are not absolutely, 100% sure that the input from stdin will
be what the program expects (e.g. when stdin is coming from a user, or
even from a file if you didn't generate it yourself), scanf() is a bad
idea, for the reason Roy mentioned.  When stdin is a terminal (as it
is in almost all cases), you must expect the user to type absolutely
anything that pops into its putative brain, or simply to lean on the
keyboard and give you a nice random string!

But for exactly the same reason, you should never, never, never use
gets().  The gets() function does not check how many characters it
reads.  It just keeps going until it sees a newline.  If the array
you're storing the thing in overflows, tough bikkies.  H&S warn
against the use of gets() for this reason, but I was flabbergasted to
see a textbook published by Microsoft which consistently used gets()
for input.  The safe way to read input from a user is to use fgets()
and sscanf().

  char buf[1000];
  int i;
  if (!fgets (buf, sizeof buf, stdin)) {
    [complain]
  }
  i = sscanf (buf, [whatever]);
  if (i != [the right number]) {
    [complain]
  }


Only use scanf() and/or gets() when you are sure that the program is
only ever called as a pipe from another program which you know will
not produce any surprises, or with stdin coming from a file generated
by such a program.
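
One way the skeleton above might be filled in, offered as a sketch rather
than as anything from the original post. The function name read_int_line,
the choice of a single integer, and the return convention are all
assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from `in` and parse a single int.
 * Returns 1 on success, 0 if the line did not start with an integer,
 * and EOF if no line could be read. */
int read_int_line(FILE *in, int *out)
{
    char buf[1000];

    if (!fgets(buf, sizeof buf, in))
        return EOF;                 /* end of file or read error */
    if (sscanf(buf, "%d", out) != 1)
        return 0;                   /* bad line; it is already consumed */
    return 1;
}
```

Because the whole line is consumed before it is parsed, a bad line cannot
be seen twice: the caller can complain and simply call the function again.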
---
				Zev Sero  -  zvs@bby.oz.au
If a compiler emits correct code purely by divine guidance and
has no memory at all, it can still be a C compiler.
				-   Chris Torek

imp@marvin.Solbourne.COM (Warner Losh) (11/12/90)

In article <VXogs2w163w@cybrspc> roy%cybrspc@cs.umn.edu (Roy M. Silvernail) writes:
>It's better, IMHO, to gets() a whole line, check its validity and _then_ sscanf()

True.  However, I'd use fgets().  See below.

>It just makes things more bullet-resistant.

gets() is a bad function to use when you don't have total control over
the input (like a user typing at a program).  Since it can't check to
see if the input line is too large for the buffer, "bad things" can
happen as a result.  One vector of the Internet Worm/Virus/Whatever
used the fact that the finger daemon used gets and was running as
root to cause some trouble....

Warner

--
Warner Losh		imp@Solbourne.COM
How does someone declare moral bankruptcy?

chris@mimsy.umd.edu (Chris Torek) (11/12/90)

Whenever you deal with (significant pause, change of voice tone) users
(shudder) ( :-) ) you must think of all the things that could possibly
go wrong.  It is impossible to make any system completely foolproof---
fools are too ingenious---but it is usually not too hard to make a system
fool-resistant.

Consider the following three ways to read and print a series of integers:

	way_the_first() {
		int i;
		while (scanf("%d", &i) != EOF)
			printf("I got %d.\n", i);
	}

	way_the_second() {
		int i;
		char inbuf[10];
		while (gets(inbuf))
			printf("I got %d.\n", atoi(inbuf));
	}

	way_the_third() {
		int i;
		char inbuf[10];
		while (fgets(inbuf, sizeof inbuf, stdin))
			printf("I got %d.\n", atoi(inbuf));
	}

The first is susceptible to a number of problems.  Foremost, however,
is the user typing the wrong thing.  If the user enters the word `one'
instead of the digit `1', the loop runs forever, because the letter `o'
is not a digit and the scanf LEAVES IT BEHIND in the input stream.
Each call to scanf() finds another `o' and then puts it back.
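
The "left behind" behaviour can be observed directly. The sketch below
uses fscanf() on an arbitrary stream so it can be tried on a file instead
of the keyboard; the helper name scan_and_peek is hypothetical.

```c
#include <stdio.h>

/* Try to read an int with fscanf(), then report which character is
 * waiting in the stream without consuming it.  Returns the fscanf()
 * result; *left receives the next character, or EOF if there is none. */
int scan_and_peek(FILE *in, int *value, int *left)
{
    int r = fscanf(in, "%d", value);
    int c = getc(in);

    if (c != EOF)
        ungetc(c, in);              /* put it back for the next reader */
    *left = c;
    return r;
}
```

Called repeatedly on input that begins with `one', this should keep
returning 0 with *left equal to 'o', which is why a loop that only tests
for EOF never ends.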

The second is susceptible to a different kind of problem.  If a user
types in `supercalifragilisticexpialidocious', your program does something
completely unpredictable, because you have asked the computer to shove
35 characters (34 plus '\0') into a 10 character buffer.  It Just Ain't
Gonna Work.

The third, while imperfect, is the best of the three.  There is nothing
the user can type in (other than special system functions that, say,
trap into a debugger) that will cause the program to run wild.

This is why we (`we' == comp.lang.c posters who have seen it before)
recommend using fgets to read input.  Scanf and gets are both rather
fragile; fgets is not.  (Once you have some input, you can pick it
apart however you like, including via sscanf.)
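
One of the imperfections of the third way is that atoi() quietly returns 0
for a line such as `one'. A possible refinement, sketched here with
strtol(), reports the failure instead; the helper name line_to_int and its
return convention are assumptions, not part of the post above.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert a whole line to an int.
 * Returns 1 on success; 0 if there are no digits, if junk follows the
 * number, or if the value does not fit in an int. */
int line_to_int(const char *line, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line)
        return 0;                       /* no digits at all, e.g. "one" */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                       /* out of range for an int */
    while (isspace((unsigned char)*end))
        end++;                          /* allow trailing newline/blanks */
    if (*end != '\0')
        return 0;                       /* trailing junk, e.g. "12abc" */
    *out = (int)v;
    return 1;
}
```

The line itself would still come from fgets(), as in way_the_third.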
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 405 2750)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

kimcm@diku.dk (Kim Christian Madsen) (11/12/90)

rjc@uk.ac.ed.cstr (Richard Caley) writes:

>In article <VXogs2w163w@cybrspc> roy%cybrspc@cs.umn.edu (Roy M. Silvernail) writes:

>    The problem with scanf() is that it can behave unpredictably when you
>    give it badly formatted input. It's better, IMHO, to gets() a whole
>    line, check its validity and _then_ sscanf() it into the target
>    variables.

>Maybe it was just a typo, but repeat after me

>	`GETS is EVIL'

>This has been an unpaid announcement by paranoids anonymous.

gets can get you into a lot of trouble if used in a non-controlled
manner, e.g. for user input. Then you'd be better off using fgets or
reading char-by-char with getchar() or family. But using scanf() for user
input is asking for trouble!

				Kim Chr. Madsen

gwyn@smoke.brl.mil (Doug Gwyn) (11/12/90)

In article <4300@sactoh0.SAC.CA.US> jak@sactoh0.SAC.CA.US (Jay A. Konigsberg) writes:
>IMHO gets() or getchar() is better for input because the programmer
>has greater control over what is being input. Specifically, if the
>programmer wants a float value and a character is input gets() won't
>error on it, scanf() will. My argument goes mainly to bullet-proofing
>programs.

If you really are concerned about bulletproofing, don't use gets() unless
you have control over the length of the lines being scanned.  Use fgets()
instead, if the input comes from some uncontrolled source (like a human).

sarima@tdatirv.UUCP (Stanley Friesen) (11/13/90)

In article <16582@netcom.UUCP> avery@netcom.UUCP (Avery Colter) writes:
>In the self-teaching course I have here, scanf is the most often used
>input function. I don't see gets used much at all.

Sounds like a poorly designed course.  using scanf for user input is very
dangerous.  Why?  Because scanf keeps reading until its entire input list
is fulfilled or EOF is reached.  It treats NL as *white* *space*. 
Thus given the invocation:
scanf("%d %d %d", &i, &j, &k);
a user can become very frustrated if he only types in two integers, followed
by a NL (or RETURN). The computer will just *sit* there and do nothing.
No error message about incomplete input, no prompt, no output, no nothing.
And unless the poor user can intuit that the computer wants another number,
he is stuck.  Bleah.

>And indeed, gets only seems of much advantage when you want to take in
>a whole line into one string.

Or if you want to make sure that the computer can respond to the user after
every input line.

>Otherwise, scanf can take individual numbers and put them directly into
>numerical variables. With gets, you'd have to first manually parse the
>line, and then use strtol to translate them into numbers.

Hardly, you just use sscanf on the string read by gets.  Since sscanf treats
end-of-string as EOF, this will not get stuck like scanf.  Now, if sscanf
returns an input count lower than expected you can print a "Usage:" message
to the user explaining clearly that you need that third number.  Voila,
much less frustration, and a more friendly, conversational program.
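
A small sketch of that count check, operating on a line that has already
been read. The helper name parse_three is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parse three ints from one line.
 * Returns how many were found (0 to 3).  End-of-string stops sscanf(),
 * so a short line cannot make it wait for more input. */
int parse_three(const char *line, int *i, int *j, int *k)
{
    int n = sscanf(line, "%d %d %d", i, j, k);

    return n < 0 ? 0 : n;   /* sscanf() returns EOF for an empty line */
}
```

A caller that gets back anything other than 3 can print its usage message
and ask again, instead of sitting silently as scanf() would.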

>I didn't see puts used for printing strings to screen much either.
>printf was the function of choice.

Agreed here. There is little reason to use puts unless the output string
is already formatted.
-- 
---------------
uunet!tdatirv!sarima				(Stanley Friesen)

rob@b15.INGR.COM (Rob Lemley) (11/13/90)

In <1990Nov12.050450.7194@Solbourne.COM> imp@marvin.Solbourne.COM (Warner Losh) writes:

>True.  However, I'd use fgets().  See below.
 . . .
>gets() is a bad function to use when you don't have total control over
>the input (like a user typing at a program).  Since it can't check to
>see if the input line is too large for the buffer, "bad things" can
>happen as a result.

Another bad:

Both gets() and fgets() (SysV R3) will blindly read in NULL chars
(ascii zero's).  Since gets() and fgets() return no info about the
number of chars read (unless you use ftell() maybe?), you might throw
away a whole or partial line of input (and never know about it!).
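
If the byte count matters, one option is to read the line with getc() and
return the length explicitly. This is only a sketch; the name
read_line_counted and the -1 convention are assumptions.

```c
#include <stdio.h>

/* Hypothetical helper: like fgets(), but returns the number of bytes
 * stored (not counting the terminating '\0'), so embedded zero bytes do
 * not hide the rest of the line.  Returns -1 at end of file with nothing
 * read.  `size` must be at least 1. */
long read_line_counted(FILE *in, char *buf, size_t size)
{
    size_t n = 0;
    int c = 0;

    while (n + 1 < size && (c = getc(in)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    if (n == 0 && c == EOF)
        return -1;
    return (long)n;
}
```

With the count in hand, the caller can compare it against strlen(buf) to
notice a line that contains a zero byte.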

Rob
--
Rob Lemley
System Consultant, Scanning Software, Intergraph, Huntsville, AL
rcl@b15.ingr.com		OR		...!uunet!ingr!b15!rob
205-730-1546

roy%cybrspc@cs.umn.edu (Roy M. Silvernail) (11/13/90)

imp@marvin.Solbourne.COM (Warner Losh) writes:

> gets() is a bad function to use when you don't have total control over
> the input (like a user typing at a program).  Since it can't check to
> see if the input line is too large for the buffer, "bad things" can
> happen as a result.

Thank you! I hadn't thought of this possibility. Anything I can do to
make my stuff more fool-resistant... (in anticipation of the
new-model-year improved fools ;-)
--
Roy M. Silvernail |+|  roy%cybrspc@cs.umn.edu  |+| #define opinions ALL_MINE;
main(){float x=1;x=x/50;printf("It's only $%.2f, but it's my $%.2f!\n",x,x);}
"This is cyberspace." -- Peter da Silva  :--:  "...and I like it here!" -- me

hilfingr@rama.cs.cornell.edu (Paul N. Hilfinger) (11/13/90)

I have been following this discussion with some interest, but I am
still a little puzzled about a few things.

1. Chris Torek displayed the following code to illustrate why scanf is
"fragile"

>	way_the_first() {
>		int i;
>		while (scanf("%d", &i) != EOF)
>			printf("I got %d.\n", i);
>	}

and said that the foremost problem is that "if the user enters the
word `one' instead of the digit `1', the loop runs forever, because
the letter `o' is not a digit and the scanf LEAVES IT BEHIND in the
input stream."

Can't argue with that, but are we criticizing the best example of the
use of scanf?  What are everyone's comments on the following?

	way_the_first_and_a_half() {
	    for (;;) { /* Please no flaming about how to do infinite loops */
	        int i;
		int r = scanf("%d", &i);
		if (r == EOF) break;
		if (r == 1)
		    printf("I got %d.\n", i);
                else
		    (void) getchar();
	    }
	}

I know of one obvious problem.  For the illegal input `--1', scanf
reads the first `-', finds an error, and then getchar reads and throws
away the second `-'.  This could be corrected by using something more
elaborate than getchar() for error correction.  On the other hand,
let's say that my goal is just to produce code that detects errors and
recovers from them adequately (in particular, without blowing up),
even if its choice of recovery is not always perfect.

2. Several contributors have suggested the use of sscanf after using
fgets.  This has problems, since sscanf won't tell you where in its
input string it stopped reading.  Fortunately, there are strtod,
strtol, etc., but they still leave the problem that the newline
character is not just whitespace when using fgets.  One must make
annoying provisions for ends of lines that are not necessary
when input is treated as a continuous stream of characters.  Do any of
you have nice ways of dealing with these problems? 
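
One possible approach to point 2, sketched with strtol(): its end pointer
says where parsing stopped, and since it skips leading whitespace the
trailing newline only needs handling once, at the end of the line. The
helper name parse_longs is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: pull up to `max` integers out of one line.
 * Returns the count stored, or -1 if anything other than whitespace is
 * left unparsed (including numbers beyond `max`).  Range checking is
 * omitted to keep the sketch short. */
int parse_longs(const char *line, long *out, int max)
{
    const char *p = line;
    int n = 0;

    while (n < max) {
        char *end;
        long v = strtol(p, &end, 10);

        if (end == p)
            break;              /* no number at this position */
        out[n++] = v;
        p = end;                /* strtol() reports where it stopped */
    }
    while (isspace((unsigned char)*p))
        p++;                    /* '\n' counts as whitespace here */
    return *p == '\0' ? n : -1;
}
```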

Thanks for your help.

Paul Hilfinger

rjc@uk.ac.ed.cstr (Richard Caley) (11/13/90)

In article <1990Nov12.112032.22979@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:

    rjc@uk.ac.ed.cstr (Richard Caley) writes:

    >In article <VXogs2w163w@cybrspc> roy%cybrspc@cs.umn.edu (Roy M. Silvernail) writes:

    >    The problem with scanf() is that it can behave unpredictably when you
    >    give it badly formatted input. It's better, IMHO, to gets() a whole
    >    line, check its validity and _then_ sscanf() it into the target
    >    variables.

    >Maybe it was just a typo, but repeat after me

    >	`GETS is EVIL'

    >This has been an unpaid announcement by paranoids anonymous.

    gets can get you into a lot of trouble if used in a non-controlled
    manner, e.g. for user input. Then you'd be better off using fgets or
    reading char-by-char with getchar() or family. But using scanf() for user
    input is asking for trouble!


Sorry for being confusing, I wasn't defending scanf, I was just
pointing out the fact that gets is never useful.

--
rjc@uk.ac.ed.cstr		_O_
				 |<

henry@zoo.toronto.edu (Henry Spencer) (11/14/90)

In article <16582@netcom.UUCP> avery@netcom.UUCP (Avery Colter) writes:
>In the self-teaching course I have here, scanf is the most often used
>input function. I don't see gets used much at all.

It is not surprising that an introductory course will focus on doing things
the easy way rather than the better but more complex way, for the sake of
not confusing beginners.

As others have discussed at length, the problem with scanf is a poor and
inflexible design that gives you little control over the situation when
unexpected input is encountered.  Pulling in a line with fgets (not gets!)
and then picking it apart with sscanf makes clean error recovery much
easier.

>I didn't see puts used for printing strings to screen much either.
>printf was the function of choice.

People frequently draw this analogy, but it is false and misleading.
Printf works very well for output because its inputs are C data, very
tightly constrained by the language and the machine, and the free-form
version is what it is *generating*.  The situation is not symmetrical;
scanf is faced with a very different and much harder problem.
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

karl@ima.isc.com (Karl Heuer) (11/14/90)

In article <48257@cornell.UUCP> hilfingr@cs.cornell.edu (Paul N. Hilfinger) writes:
>2. Several contributors have suggested the use of sscanf after using
>fgets.  This has problems, since sscanf won't tell you where in its
>input string it stopped reading.

Fixed in ANSI C, via the `%n' specifier.
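
A minimal sketch of the `%n' technique. The helper name parse_int_prefix
is made up; the only library behaviour relied on is that `%n' stores the
number of characters consumed so far and does not add to the return
count.

```c
#include <stdio.h>

/* Hypothetical helper: parse one int from the front of `s` and report
 * how many characters sscanf() consumed, leading whitespace included.
 * Returns 1 on success, 0 otherwise. */
int parse_int_prefix(const char *s, int *value, int *consumed)
{
    int n = 0;

    if (sscanf(s, "%d%n", value, &n) != 1)
        return 0;           /* %n does not count toward the result */
    *consumed = n;
    return 1;
}
```

After a successful call, s + *consumed points at the first character that
was not part of the number, so parsing can continue from there.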

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint

aduncan@rhea.trl.oz (Allan Duncan) (11/15/90)

From article <VXogs2w163w@cybrspc>, by roy%cybrspc@cs.umn.edu (Roy M. Silvernail):
> The problem with scanf() is that it can behave unpredictably when you
> give it badly formatted input. It's better, IMHO, to gets() a whole
> line, check its validity and _then_ sscanf() it into the target
> variables. (no need for strtol() or similar, since sscanf() looks at the
> validated string just as scanf() would have looked at the original
> input) It just makes things more bullet-resistant.
                                   ^^^^^^^^^^^^^^^^
I hope you are really using fgets(..., stdin) rather than gets(...) -
there are a lot of _system_ things out there that can be broken by just
keeping on typing till the buffer is overflowed!

Allan Duncan	ACSnet	a.duncan@trl.oz
(03) 541 6708	ARPA	a.duncan%trl.oz.au@uunet.uu.net
		UUCP	{uunet,hplabs,ukc}!munnari!trl.oz!a.duncan
Telecom Research Labs, PO Box 249, Clayton, Victoria, 3168, Australia.

jon@jonlab.UUCP (Jon H. LaBadie) (11/16/90)

In article <1990Nov12.014850.14475@melba.bby.oz.au>, zvs@bby.oz.au (Zev Sero) writes:
> 
> But for exactly the same reason, you should never, never, never use
> gets().  The gets() function does not check how many characters it
> reads.  It just keeps going until it sees a newline.  If the array
> you're storing the thing in overflows, tough bikkies.

This question is asked regarding input from terminals only.

I've a vague recollection that declaring input arrays to be BUFSIZ
in length provides some protection to overflow by gets(3C).

Is this just "conventional wisdom", or does something in the choice
of BUFSIZ for a particular system ensure any overflow protection?

Jon

-- 
Jon LaBadie
{att, princeton, bcr, attmail!auxnj}!jonlab!jon

gwyn@smoke.brl.mil (Doug Gwyn) (11/16/90)

In article <879@jonlab.UUCP> jon@jonlab.UUCP (Jon H. LaBadie) writes:
>Is this just "conventional wisdom", or does something in the choice
>of BUFSIZ for a particular system ensure any overflow protection?

gets() will input arbitrarily long lines.
The only thing really special about BUFSIZ in this regard is that
many UNIX text editors do not support lines longer than that, so
text files containing longer lines are rarely encountered (but
not impossible).

henry@zoo.toronto.edu (Henry Spencer) (11/17/90)

In article <879@jonlab.UUCP> jon@jonlab.UUCP (Jon H. LaBadie) writes:
>I've a vague recollection that declaring input arrays to be BUFSIZ
>in length provides some protection to overflow by gets(3C).

Nope.  Except insofar as making the arrays longer reduces the probability
of somebody overflowing them.  There is no magic associated with BUFSIZ.
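
Since no buffer size is magic, a program using fgets() may still want to
notice when a line did not fit. A sketch, assuming the policy is to throw
the excess away; the helper name read_line and the `truncated' flag are
inventions for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets().  If the line did not
 * fit, discard the rest of it and set *truncated to 1.
 * Returns 1 if a line was read, 0 at end of file. */
int read_line(FILE *in, char *buf, size_t size, int *truncated)
{
    int c;

    *truncated = 0;
    if (!fgets(buf, (int)size, in))
        return 0;
    if (strchr(buf, '\n') == NULL) {
        /* No newline stored: drain up to the end of the line.  Any
         * character dropped here means data was lost. */
        while ((c = getc(in)) != EOF && c != '\n')
            *truncated = 1;
    }
    return 1;
}
```

With a 10-character buffer, the 34-letter word from earlier in the thread
comes back as its first nine letters with the flag set, and the next call
starts cleanly on the following line.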
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/20/90)

In article <879@jonlab.UUCP> jon@jonlab.UUCP (Jon H. LaBadie) wrote
> I've a vague recollection that declaring input arrays to be BUFSIZ
> in length provides some protection to overflow by gets(3C).
In article <1990Nov16.165203.18786@zoo.toronto.edu>,
 henry@zoo.toronto.edu (Henry Spencer) replied
: Nope.  Except insofar as making the arrays longer reduces the probability
: of somebody overflowing them.  There is no magic associated with BUFSIZ.

The original question asked specifically about input from terminals.
Some operating systems (UNIX, VMS, OS/2, others) place a limit on the
number of characters in a line entered at a keyboard.  In OS/2 it's 255.
The POSIX standard defines a parameter, I think it's MAXCANON or something
like that.  The limit has typically been 255, but there's no reason it
couldn't be more.  Since each read() from the keyboard is going to be
stored in a stdio buffer, BUFSIZ had better be at least as large as this
limit, so declaring your arrays that big should be enough to handle
terminal input.  Except...

gets() will keep on reading from stdin until it hits a \n or an EOF.
Lines entered from a keyboard _normally_ end with a \n, but they don't
have to.  Let <EOF> represent your end-of-file character on a UNIX system
and let <junk70> represent 70 printing characters.  Then
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><EOF>
	<junk70><RET>
will result in gets() seeing a line with 701 characters, refilling the
stdio buffer several times.  (I've tried this.  It works.)  Since VMS
returns a record to the caller when you hit <RET> _or_ a function key,
I imagine that it might be possible to play a similar trick in VMS.

So the answer is, for your own private use, yes you can get away with
using BUFSIZ as a limit for keyboard input, but don't dare do that
in a program you sell to customers.

-- 
I am not now and never have been a member of Mensa.		-- Ariadne.

epames@eos.ericsson.se (Michael Salmon) (11/20/90)

In article <4319@goanna.cs.rmit.oz.au> Richard A. O'Keefe writes:
>gets() will keep on reading from stdin until it hits a \n or an EOF.
>Lines entered from a keyboard _normally_ end with a \n, but they don't
>have to.  Let <EOF> represent your end-of-file character on a UNIX system

This is getting a long way from the original point but I thought I
should respond to this statement. In C EOF is not a character, its
value is such that it can *NEVER* be present in a file. In all the C
implementations that I have ever seen EOF has been represented by the
value -1 but it can be any integer and one of the traps that I think
everyone falls for at some time is to presume that getc() returns a
character rather than an integer.  EOF is never 255 or ^D, it is in
fact a read() that returns 0 as the byte count. N.B. the least
significant char of the usual representation of -1 is 255.
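
The getc() trap can be shown with a short sketch. The function name
count_bytes is hypothetical; the point is only the type of `c'. (The C
standard requires EOF to be a negative int, which is why an int is needed
to hold every byte value plus EOF.)

```c
#include <stdio.h>

/* Count the bytes remaining on `in`.  `c` must be an int: stored in a
 * char, the byte 0xFF could compare equal to EOF and end the loop early,
 * or, where plain char is unsigned, EOF could never match at all. */
long count_bytes(FILE *in)
{
    long n = 0;
    int c;

    while ((c = getc(in)) != EOF)
        n++;
    return n;
}
```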

Michael Salmon
L.M.Ericsson
Stockholm

richard@aiai.ed.ac.uk (Richard Tobin) (11/21/90)

In article <1990Nov20.123036.11103@ericsson.se> epames@eos.ericsson.se writes:
>should respond to this statement. In C EOF is not a character, its

You seem to have misunderstood Richard O'Keefe's point.  By EOF,
Richard meant the C #defined constant, normally -1.  By <EOF>, he
meant the key you press to send end-of-file (perhaps ^D), which is why
he said:

>>Let <EOF> represent your end-of-file character on a UNIX system

When you type <EOF> after a <linefeed> (or another <EOF>) it results
in read() returning zero, and getc() returning EOF.  This is the
"normal" use of the <EOF> key.

When you type <EOF> at other times (eg after typing some letters), it
causes the line to be made available for read()ing, just as <linefeed>
does.  However, there is no \n character (or ^D or whatever) appended
to the data.  Typing

    abc<EOF>

results in read() returning 3 and getting the characters 'a', 'b', and
'c'.  Thus you can type in something that's a line from the point of
view of the tty driver, but doesn't end with '\n' and isn't a line
from the point of view of fgets(), which does something like

    while(--count > 0 && (c = getc(file)) != EOF  && c != '\n')

completely ignoring the boundaries of data returned by read().
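
Fleshed out, a loop of that shape might look like the following. This is
a sketch of the general idea, not the source of any particular library's
fgets(); the name sketch_fgets is made up.

```c
#include <stdio.h>

/* A sketch of an fgets()-like function built on getc().  It stores at
 * most count-1 characters, keeps the '\n' if one is read, and returns
 * NULL if end of file arrives before any character is stored. */
char *sketch_fgets(char *buf, int count, FILE *file)
{
    char *p = buf;
    int c = 0;

    if (count <= 0)
        return NULL;
    while (--count > 0 && (c = getc(file)) != EOF) {
        *p++ = (char)c;
        if (c == '\n')
            break;
    }
    *p = '\0';
    if (p == buf && c == EOF)
        return NULL;
    return buf;
}
```

Nothing in the loop knows how many read() calls it took to supply the
characters, which is why `abc<EOF>' followed by more typing is seen as
part of one longer line.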

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/22/90)

In article <1990Nov20.123036.11103@ericsson.se>, epames@eos.ericsson.se (Michael Salmon) writes:
: In article <4319@goanna.cs.rmit.oz.au> Richard A. O'Keefe writes:
: >gets() will keep on reading from stdin until it hits a \n or an EOF.
: >Lines entered from a keyboard _normally_ end with a \n, but they don't
: >have to.  Let <EOF> represent your end-of-file character on a UNIX system

: This is getting a long way from the original point but I thought I
: should respond to this statement.

It would have been better to try understanding it first.
I wrote
	Let <EOF> represent your end-of-file character on a UNIX system.
People who have a UNIX system and don't know what I'm talking about should
read the "stty" manual page.  If they still don't understand, they should
find someone who does understand and get an explanation.  That or refrain
from posting.  The point is, of course, that the way you signal end-of-file
from a keyboard in UNIX is by typing a particular character.   Q: _Which_
character?  A: _You_ get to pick.  Some people use End-of-Transmission (^D).
Some people like ^Z.  I've seen ^Y used.

: In C EOF is not a character, its
: value is such that it can *NEVER* be present in a file.

So flipping what?  I said nothing whatsoever about C's EOF macro.
I was talking about the key you type at the keyboard.  This is totally
independent of C's EOF macro and the two values have nothing in common.
What I was doing was exhibiting a method of tricking gets() into reading
an arbitrary number of characters from the keyboard as one "line"; that
method relies on the malicious typist typing his <EOF> character from time
to time, whichever character he has selected for that purpose.

-- 
I am not now and never have been a member of Mensa.		-- Ariadne.

epames@eos.ericsson.se (Michael Salmon) (11/22/90)

In article <3797@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>You seem to have misunderstood Richard O'Keefe's point.  By EOF,
>Richard meant the C #defined constant, normally -1.  By <EOF>, he
>meant the key you press to send end-of-file (perhaps ^D), which is why
>he said:
>
>>>Let <EOF> represent your end-of-file character on a UNIX system

^D is *NOT* an eof character, it is a command to the tty driver to send
the contents of the input buffer, the same as your ERASE etc. characters
are special commands to the tty driver. By typing ^D when there are no
characters in the input buffer you are sending 0 characters which is the
end of file condition.

Michael Salmon
L.M.Ericsson
Stockholm

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/23/90)

I wrote
 Let <EOF> represent your end-of-file character on a UNIX system

In article <1990Nov22.071319.3222@ericsson.se>, epames@eos.ericsson.se (Michael Salmon) writes:
> ^D is *NOT* an eof character, it is a command to the tty driver ...

With the utmost possible respect, may I suggest that since the context
was "UNIX system"s, we take the UNIX manuals as authoritative?  From
"man 7 termio":
          Normally, terminal input is processed in units of lines.  A
          line is delimited by a new-line (ASCII LF) character, an
          end-of-file (ASCII EOT) character, or an end-of-line
          character.

The SVID release 2 has the same text, and speaks of
	The ERASE, KILL, and EOF characters ...

So when I wrote of an "end-of-file character" I was using *precisely*
the terminology blessed by the SVID, which nowhere calls it a "command".
-- 
I am not now and never have been a member of Mensa.		-- Ariadne.

epames@eos.ericsson.se (Michael Salmon) (11/23/90)

In article <4354@goanna.cs.rmit.oz.au> Richard A. O'Keefe writes:
>I wrote
> Let <EOF> represent your end-of-file character on a UNIX system
>
....
>
>The SVID release 2 has the same text, and speaks of
>	The ERASE, KILL, and EOF characters ...
>
>So when I wrote of an "end-of-file character" I was using *precisely*
>the terminology blessed by the SVID, which nowhere calls it a "command".

I agree that that is what the manual says and I think it is unfortunate
as it doesn't mean end of file as defined by gets() etc. I quote below from
SunOS man page for termio.

     EOF       (CTRL-D or ASCII EOT) may be used to  generate  an
               end-of-file  from  a terminal.  When received, all
               the characters waiting to be read are  immediately
               passed  to the program, without waiting for a NEW-
               LINE, and the EOF is discarded.   Thus,  if  there
	       are no characters waiting, which is to say the EOF
               occurred at the beginning of a line, zero  charac-
               ters  will  be  passed back, which is the standard
               end-of-file indication.

Strictly my own opinions.
    Michael Salmon
    L.M.Ericsson
    Stockholm

gwyn@smoke.brl.mil (Doug Gwyn) (11/24/90)

In article <1990Nov22.071319.3222@ericsson.se> epames@eos.ericsson.se writes:
>^D is *NOT* an eof character, it is a command to the tty driver to send
>the contents of the input buffer, ...

The character that in "cooked" mode is interpreted by the terminal driver to
act as an invisible line delimiter is normally called the EOF character in
UNIX user documentation.  While most people map ^D to this control function,
the choice is programmable.

In any event, the (cooked mode) input sequence
	A B <EOF> C D <NL> E F <EOF> <EOF> G H <NL> <EOF> I J
(where <EOF> is often ^D and <NL> is often ^M)
results in six "packets" being inserted into the terminal input queue:
	A B
	C D \n
	E F
	<empty>
	G H \n
	<empty>
with the two characters I and J still buffered for canonical (erase/kill)
processing.
The first subsequent read() on the terminal (assuming that several characters
are requested for the read count) will return the two characters:
	A B
The second such read() will return the three characters:
	C D \n
the third such read() will return:
	E F
The fourth read() will return:
	<empty>
The fifth read() will return:
	G H \n
The sixth read() will return:
	<empty>
(That is the only use for the EOF character that most UNIX users are
aware of.)
The seventh read() will block until a line delimiter is input.
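
For anyone who wants to watch this happen, here is a minimal sketch, not
taken from any system's sources. The helper name show_reads() is made up for
illustration. It calls read() until the first zero-length return and reports
the size of each chunk along the way. On a terminal in cooked mode I would
expect the chunk sizes to follow the packets listed above, but as noted,
implementations vary.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read fd until the first zero-length read and
 * report the size of each chunk that read() hands back.
 * Returns the number of non-empty reads, or -1 on a read() error. */
int show_reads(int fd, FILE *out)
{
    char buf[256];
    int chunks = 0;

    assert(out != NULL);
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0)
            return -1;      /* read() failed */
        if (n == 0) {
            /* Zero bytes: the conventional end-of-file indication. */
            fprintf(out, "read() returned 0: end-of-file indication\n");
            return chunks;
        }
        chunks++;
        fprintf(out, "read() returned %ld byte(s)\n", (long)n);
    }
}
```

Calling show_reads(STDIN_FILENO, stdout) from main() and typing the sequence
above is the intended experiment. It stops at the fourth read(), the first
empty one.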

The standard I/O functions need to be prepared to deal with this behavior,
generally by having input operations loop until enough data is obtained to
satisfy the implementation request (i.e. up to a \n for gets() or until the
requested count is satisfied for fread()).  While doing this, a read() that
returns 0 characters is conventionally interpreted as an "end of file"
indication.  While most applications will not read past an EOF indication,
on stream-like input channels such as terminals this might be a reasonable
thing to do under some circumstances.
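
As a rough illustration of that looping, here is a sketch of an fgets()-like
routine built directly on read(). The name read_line() and its return
convention are my own invention, and reading one byte at a time is chosen
only to keep the example short; a real stdio implementation buffers its
input. The loop keeps going across short reads until it has a \n, and it
treats a read() of 0 characters as end of file.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical fgets()-like helper built directly on read().
 * Stores at most cap-1 bytes in buf, stopping after a '\n' or at the
 * first zero-length read, and always terminates buf with '\0'.
 * Returns the number of bytes stored (0 means the end-of-file
 * indication arrived before any data), or -1 on a read() error. */
long read_line(int fd, char *buf, size_t cap)
{
    size_t len = 0;

    assert(buf != NULL && cap > 0);
    while (len + 1 < cap) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0)
            return -1;      /* read() failed */
        if (n == 0)
            break;          /* zero-length read: treat as end of file */
        buf[len++] = c;
        if (c == '\n')
            break;          /* a complete line has been collected */
    }
    buf[len] = '\0';
    return (long)len;
}
```

With the input sequence shown earlier, I would expect an <EOF> typed in the
middle of a line to be invisible to this loop (the first call yields
"ABCD\n"), while the second call yields "EF" with no newline because the
next read() returns 0.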

Anyway, this is the intended UNIX behavior.  There are undoubtedly
variations even among UNIX implementations, and other operating systems
may have significantly different terminal input support.

richard@aiai.ed.ac.uk (Richard Tobin) (11/27/90)

In article <1990Nov22.071319.3222@ericsson.se> epames@eos.ericsson.se writes:

>>>>Let <EOF> represent your end-of-file character on a UNIX system

>^D is *NOT* an eof character, it is a command to the tty driver to send
>the contents of the input buffer,

I thought I had made it quite clear what happens when you type ^D.

Are you trying to make a substantial point here, or are you just 
quibbling about the term "end-of-file character"?

When Richard O'Keefe says "end-of-file character" he means "the
character you type when you want to cause the program to see an
end-of-file condition".  Just like "erase character" (a term you used
yourself) means "the character you press when you want to erase a
character", and "suspend character" means "the character you press
when you want to suspend your program".

If your point is that the behaviours after newline and after other
characters are really the same, i.e. send the waiting characters, of
which there may be zero or more, then yes, that's true,
but it's normally more useful to distinguish these cases.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

epames@eos.ericsson.se (Michael Salmon) (11/27/90)

In article <3819@skye.ed.ac.uk> Richard Tobin writes:
>In article <1990Nov22.071319.3222@ericsson.se> epames@eos.ericsson.se writes:
>
>>>>>Let <EOF> represent your end-of-file character on a UNIX system
>
>>^D is *NOT* an eof character, it is a command to the tty driver to send
>>the contents of the input buffer,
>
>I thought I had made it quite clear what happens when you type ^D.
>
>Are you trying to make a substantial point here, or are you just 
>quibbling about the term "end-of-file character"?

I think that the substantial point is that there is no "end-of-file
character". End of file is a read() of zero characters; when reading
from a terminal this can be achieved by typing ^D (usually) on an
empty line. The end of file indication requires both conditions.
Getting back to gets(), it behaved exactly as I expected it would and
as the manuals say it should.

Solely the opinion of
    Michael Salmon
    L.M.Ericsson
    Stockholm

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (11/28/90)

>Let <EOF> represent your end-of-file character on a UNIX system

In article <1990Nov27.110005.7203@ericsson.se>, epames@eos.ericsson.se (Michael Salmon) writes:
> I think that the substantial point is that there is no "end-of-file
> character".

According to the UNIX manuals, there *IS*.  The end of file character is
a character you type on the keyboard.  Nobody has ever claimed that read()
or gets() or getchar() *return* this character to the caller, or that they
themselves ever see it.  Never mind whether the name is confusing, that *IS*
the name used in the UNIX manuals.

To those who understood the point the first time, sorry to have troubled
you.  To anyone who still thinks 'that there is no "end-of-file character"'
on a UNIX system, do us all a favour, *read* *the* *fine* *manuals*.
-- 
I am not now and never have been a member of Mensa.		-- Ariadne.