[comp.sources.bugs] Bug in csh

greim@sbsvax.UUCP (Michael Greim) (07/03/89)

In article <192@dg.dg.com> in comp.bugs.sys5 rec@dg.dg.com (Robert Cousins)
reported a bug in csh.
Here is a description of it and a fix.

Absorb, apply and enjoy,
	-mg

Symptoms:

>I have found that there are several basic bugs in the Cshell history
>mechanism which can be quite problematic.  These were originally called
>to my attention by the work of a couple of researchers at the U of W in
>Madison.  Bug 1:
>
>	!a%999999999f
>
>which causes 999999999 spaces to be output to the screen.  (Fewer on
>16 bit machines.)  It seems that the error message in the history
>module goes through the first argument of a printf() call and can be
>interpreted as a format.  A second bug, which happens on fewer machines
>takes place with:
>
>	!a%f%f
>
>which can cause csh to dump core from a floating point error.  
[...]

Diagnosis:

	When printing an error message the csh function "error" passes
	its first argument to its printf. If this string contains a "%",
	printf tries to evaluate the following as a format accessing its
	argument list, which is empty quite probably. This will cause
	unpredictable behaviour, like core dumps, strange numbers, ...

Therapy:
	Apply the following context diff to your source of 43BSD csh.
	This fix works like this:
	- in error only a string ending in "@" is passed as format string
		to printf.
	- only internal error messages needing arguments need format specifiers,
		so I add a trailing "@" to any such message
	- any string not ending with "@" is passed to printf with "%s" as first
		parameter.

*** sh.dol.c.old	Mon Jul  3 15:18:22 1989
--- sh.dol.c	Mon Jul  3 15:12:39 1989
***************
*** 159,165
  			if (c == c1)
  				break;
  			if (c == '\n' || c == DEOF)
! 				error("Unmatched %c", c1);
  			if ((c & (QUOTE|TRIM)) == ('\n' | QUOTE))
  				--wp, ++i;
  			if (--i <= 0)

--- 159,165 -----
  			if (c == c1)
  				break;
  			if (c == '\n' || c == DEOF)
! 				error("Unmatched %c@", c1);		/* mg 3-jul-89 */
  			if ((c & (QUOTE|TRIM)) == ('\n' | QUOTE))
  				--wp, ++i;
  			if (--i <= 0)
*** sh.err.c.old	Mon Jul  3 15:18:36 1989
--- sh.err.c	Mon Jul  3 15:16:10 1989
***************
*** 34,39
  {
  	register char **v;
  	register char *ep;
  
  	/*
  	 * Must flush before we print as we wish output before the error

--- 34,40 -----
  {
  	register char **v;
  	register char *ep;
+ 	register int i;		/* mg 3-jul-89 */
  
  	/*
  	 * Must flush before we print as we wish output before the error
***************
*** 53,58
  	/*
  	 * A zero arguments causes no printing, else print
  	 * an error diagnostic here.
  	 */
  	if (s)
  		printf(s, arg), printf(".\n");

--- 54,65 -----
  	/*
  	 * A zero arguments causes no printing, else print
  	 * an error diagnostic here.
+ 	 * mg 3-jul-89 : If error is called with s containing a part of
+ 	 * the users input, and if this contains a "%", printf tries to evaluate
+ 	 * this as a format reference. Try : "!a%100s".
+ 	 * I fixed this: internal error messages which should be processed as
+ 	 * format strings, will end in "@", all else is NOT interpreted as a
+ 	 * format string.
  	 */
  	if (s) {
  		i = strlen (s) - 1;
***************
*** 54,61
  	 * A zero arguments causes no printing, else print
  	 * an error diagnostic here.
  	 */
! 	if (s)
! 		printf(s, arg), printf(".\n");
  
  	didfds = 0;		/* Forget about 0,1,2 */
  	if ((ep = err) && errspl) {

--- 61,74 -----
  	 * format strings, will end in "@", all else is NOT interpreted as a
  	 * format string.
  	 */
! 	if (s) {
! 		i = strlen (s) - 1;
! 		if (s[i] == '@') {
! 			s[i] = '\0';
! 			printf(s, arg), printf(".\n");
! 		} else
! 			printf("%s\n", s);
! 	}
  
  	didfds = 0;		/* Forget about 0,1,2 */
  	if ((ep = err) && errspl) {
*** sh.func.c.old	Mon Jul  3 15:18:55 1989
--- sh.func.c	Mon Jul  3 15:12:55 1989
***************
*** 1004,1010
  	while (*cp && *cp == *str)
  		cp++, str++;
  	if (*cp)
! 		error("Bad scaling; did you mean ``%s''?", str0);
  }
  
  plim(lp, hard)

--- 1004,1010 -----
  	while (*cp && *cp == *str)
  		cp++, str++;
  	if (*cp)
! 		error("Bad scaling; did you mean ``%s''?@", str0); /* mg 3-jul-89 */
  }
  
  plim(lp, hard)
*** sh.glob.c.old	Mon Jul  3 15:19:06 1989
--- sh.glob.c	Mon Jul  3 15:13:10 1989
***************
*** 139,145
  			if (gpathp != gpath + 1) {
  				*gpathp = 0;
  				if (gethdir(gpath + 1))
! 					error("Unknown user: %s", gpath + 1);
  				(void) strcpy(gpath, gpath + 1);
  			} else
  				(void) strcpy(gpath, value("home"));

--- 139,145 -----
  			if (gpathp != gpath + 1) {
  				*gpathp = 0;
  				if (gethdir(gpath + 1))
! 					error("Unknown user: %s@", gpath + 1); /* mg 3-jul-89 */
  				(void) strcpy(gpath, gpath + 1);
  			} else
  				(void) strcpy(gpath, value("home"));
*** sh.sem.c.old	Mon Jul  3 15:19:21 1989
--- sh.sem.c	Mon Jul  3 15:13:22 1989
***************
*** 401,405
  		return;
  	if ((stb.st_mode & S_IFMT) == S_IFCHR)
  		return;
! 	error("%s: File exists", cp);
  }

--- 401,405 -----
  		return;
  	if ((stb.st_mode & S_IFMT) == S_IFCHR)
  		return;
! 	error("%s: File exists@", cp);	/* mg 3-jul-89 */
  }
-- 
Michael Greim    Email : greim@sbsvax.informatik.uni-saarland.dbp.de
                 or    : ...!uunet!unido!sbsvax!greim
# include <disclaimers/std.h>

argv%eureka@Sun.COM (Dan Heller) (07/05/89)

Why are people so stuck on using printf?  Michael Greim finds a bug
in csh because it misuses printf, yet the fix (altho it works) continues
to use printf -- I have nothing against using printf, but this is a very
costly function when you compare it to something like fputs or puts.
(Have you ever seen the source to printf()?)  But more importantly, it
causes severe bugs with programs that are sometimes hard to trace.

In article greim@sbsvax.UUCP (Michael Greim) writes:
> ! 	if (s) {
> ! 		i = strlen (s) - 1;
> ! 		if (s[i] == '@') {
> ! 			s[i] = '\0';
> ! 			printf(s, arg), printf(".\n");
> ! 		} else
> ! 			printf("%s\n", s);
> ! 	}

How about this:
		if (s[i] == '@') {
			s[i] = '\0';
			printf(s, arg), puts(".");
		} else
			puts(s);

Now don't get me wrong, I realize that this is a trivial "simple" thing
that one might say, "give me a break."  But the fact that people are not
as conscientious about how printf is used is the reason that the bugs
recently found is csh are created.  A much more important bug that I've
found as a result of the same errors is when programs write out data to
files using fprintf.  As soon as there is a %s in the data written, you
just created a junk file.  And this is also the type of bug that doesn't
present itself frequently.

dan <island!argv@sun.com>
-----
My postings reflect my opinion only -- When on the net, I speak for no company.

chk@dciem.dciem.dnd.ca (C. Harald Koch) (07/05/89)

In article <113630@sun.Eng.Sun.COM> island!argv@sun.com (Dan Heller) writes:
>Why are people so stuck on using printf?

Because everybody's first C program, taken from K&R, is:

main()
{
	printf("Hello, World!\n");
}

From step one, we are taught to use it for everything.

-- 
          C. Harald Koch          NTT Systems, Inc., Toronto, Ontario          
     chk@gpu.utcs.utoronto.ca,  chk@dretor.dciem.dnd.ca,  chk@chkent.UUCP
University: Like a software house, except the software's free, and it's usable,
and it works, and if it breaks they'll quickly tell you how to fix it, and ...

greim@sbsvax.UUCP (Michael Greim) (07/06/89)

In article <113630@sun.Eng.Sun.COM>, argv%eureka@Sun.COM (Dan Heller) writes:
> Why are people so stuck on using printf?  Michael Greim finds a bug
> in csh because it misuses printf, yet the fix (altho it works) continues
> to use printf -- I have nothing against using printf, but this is a very
> costly function when you compare it to something like fputs or puts.
> (Have you ever seen the source to printf()?)  But more importantly, it
> causes severe bugs with programs that are sometimes hard to trace.
1.) I did not find the bug in printf, Robert Cousins did. I rather found
	a fix for it.
2.) Yes stdio's printf is very costly, but ...
3.) ... csh uses its own printf. In fact, printf just calls _doprnt.
	_doprnt calls strout to emit parts of its stuff, which in turn calls
	putchar. There is no puts or fputs.
4.) Of course I could have used putchar. But for the strings I would
	have needed a loop -> more code and error messages occur fairly seldom,
	so there is not much gain in trying to safe some microseconds by taking the
	shortcut for the calling sequence.
5.)	_doprnt is written in VAX assembler. So on other machines it must either
	be rewritten in the appropriate assembler, or a C implementation of printf
	must be used.
	There might not even be a putchar available after all.
	In my testversion I use the C printf from vi.
6.) Yes, I have seen the innards of printf. In my own programs I try to
	avoid printf (and scanf) if possible. (See "strings", recently published
	in comp.sources.misc)

> Now don't get me wrong, I realize that this is a trivial "simple" thing
> that one might say, "give me a break."  But the fact that people are not
> as conscientious about how printf is used is the reason that the bugs
> recently found is csh are created.  A much more important bug that I've
> found as a result of the same errors is when programs write out data to
> files using fprintf.  As soon as there is a %s in the data written, you
> just created a junk file.  And this is also the type of bug that doesn't
Normally one gets a core dump fairly quick.
How can you create a file using
	fprintf ("%s", n);
? (Which is what I think you are writing about)
And then there is lint ...

	-mg
-- 
Michael Greim    Email : greim@sbsvax.informatik.uni-saarland.dbp.de
                 or    : ...!uunet!unido!sbsvax!greim
[.signature removed by the board of censors for electronic mail's main
executive computer because it contained a four letter word ("word")]

argv%eureka@Sun.COM (Dan Heller) (07/07/89)

In article <771@sbsvax.UUCP> greim@sbsvax.UUCP (Michael Greim) writes:
> In article <113630@sun.Eng.Sun.COM>, argv%eureka@Sun.COM (Dan Heller) writes:
> > Why are people so stuck on using printf?  Michael Greim finds a bug

Basically, I apologize for the "flame" that people seem to think I
posted.  I wasn't flaming really.  I just meant to point out that
puts() will never be slower than printf(), so to avoid "other bugs"
(see below), it's safer to use puts when that's what you really meant.
Sorry, Michael.

> >   As soon as there is a %s in the data written, you
> > just created a junk file.  And this is also the type of bug that doesn't
> Normally one gets a core dump fairly quick.
> How can you create a file using
> 	fprintf ("%s", n);
> ? (Which is what I think you are writing about)
> And then there is lint ...

No, I was talking about the type of bug that lint can't catch.  That is,
I see this type of usage all the time:

    ...
    send_to_file("this is a string");
    ...

send_to_file(s)
char *s;
{
    extern FILE *fp;
    fprintf(fp, s);
}

It is this type of bug that eventually catches up to you.  For example,
a hypothetical mail program could extract the return address of a user
and try to print it:

    extern char *get_return_address();
    char *addr;

    addr = get_return_address(message_3);
    ...
    printf(addr);

Now, suppose the address that was returned was:

    argv%island@sun.com

What do you suppose will happen?  This is the type of warning I was
trying to convey to people.  Sorry if I implied anything else...


dan <island!argv@sun.com>
-----
My postings reflect my opinion only -- not the opinion of any company.

allyn@hp-sdd.hp.com (Allyn Fratkin) (07/10/89)

In article <769@sbsvax.UUCP>, greim@sbsvax.UUCP (Michael Greim) writes:
> 	This fix works like this:
> 	- in error only a string ending in "@" is passed as format string
> 		to printf.
> 	- only internal error messages needing arguments need format specifiers,
> 		so I add a trailing "@" to any such message
> 	- any string not ending with "@" is passed to printf with "%s" as first
> 		parameter.

nothing personal, but what a disgusting hack.  you use @ as the *last* 
character of the string to indicate the presence of a second parameter?
wouldn't it have been much easier to use the first char?

but why didn't you just change the calls
	error(singlearg);
to 
	error("%s", singlearg);

seems like it would have been much cleaner.

(you could have done it with sed!  but i won't get into that now :-) )
-- 
 From the virtual mind of Allyn Fratkin            allyn@sdd.hp.com
                          San Diego Division       - or -
                          Hewlett-Packard Company  uunet!ucsd!hp-sdd!allyn

greim@sbsvax.UUCP (Michael Greim) (07/10/89)

In article <2218@hp-sdd.hp.com>, allyn@hp-sdd.hp.com (Allyn Fratkin) writes:
> In article <769@sbsvax.UUCP>, greim@sbsvax.UUCP (Michael Greim) writes:
> > 	This fix works like this:
> > 	- in error only a string ending in "@" is passed as format string
> > 		to printf.
> > 	- only internal error messages needing arguments need format specifiers,
> > 		so I add a trailing "@" to any such message
> > 	- any string not ending with "@" is passed to printf with "%s" as first
> > 		parameter.
> 
> nothing personal, but what a disgusting hack.  you use @ as the *last* 
> character of the string to indicate the presence of a second parameter?
> wouldn't it have been much easier to use the first char?
I never denied that it was a disgusting hack. Most changes in csh will
result in such a thing.
When I decided on how to fix it, I had the following goals in mind:
- The fixes should be to as little files as possible. If people have
	done changes to csh, they may have to do the new fixes by hand, and
	will be really pissed off, if they have so much to do. (Our csh
	contains a lot of changes, e.g. it has a version command, which prints
	out csh's version, some special csh variables which change certain
	(mis-)behaviours; the format of "time" is changed ...)
- there are a lot more invocations to error with one argument than
	there are to error with two arguments.
- No new function unless necessary.
- No new global variables unless necessary.
- Didn't want to spend too much time on it.
- I did not use "@" as the first character, because the command
	!@%10s would then still produce 10 spaces, as error would see "@%10s",
	think: "aha, it starts with @, so it is a format string", call
	printf ("%10s", ...   which will produce 10 space, and maybe a core
	dump.
	I decided on "@" because no valid error message from inside csh
	can end with a "@".

> 
> but why didn't you just change the calls
> 	error(singlearg);
> to 
> 	error("%s", singlearg);
> 
> seems like it would have been much cleaner.
Maybe, but there are several error function, which package the messages
in several ways. If one uses "error(fmt, arg)", one must replace some
calls to seterr2 and seterrc, one must trace possible calls to "error(av)"
"Perror" or to "error" of the form "error(err)" to see,
where the "av" really comes from, and probably change some other calls too,
if the "av" can contain a "%".

I preferred the simple, though ugly way, as I did not want to spend a lot of
time on the investigation whether any alternative might yield a more
beautiful fix.

>  From the virtual mind of Allyn Fratkin            allyn@sdd.hp.com
>                           San Diego Division       - or -
>                           Hewlett-Packard Company  uunet!ucsd!hp-sdd!allyn

	-mg
-- 
Michael Greim    Email : greim@sbsvax.informatik.uni-saarland.dbp.de
                 or    : ...!uunet!unido!sbsvax!greim
[.signature removed by the board of censors for electronic mail's main
executive computer because it contained a four letter word ("word")]

greim@sbsvax.UUCP (Michael Greim) (07/10/89)

In article <114074@sun.Eng.Sun.COM>, argv%eureka@Sun.COM (Dan Heller) writes:
= In article <771@sbsvax.UUCP> greim@sbsvax.UUCP (Michael Greim) writes:
= > In article <113630@sun.Eng.Sun.COM>, argv%eureka@Sun.COM (Dan Heller) writes:
= > > Why are people so stuck on using printf?  Michael Greim finds a bug
= 
= Basically, I apologize for the "flame" that people seem to think I
= posted.  I wasn't flaming really.  I just meant to point out that
= puts() will never be slower than printf(), so to avoid "other bugs"
= (see below), it's safer to use puts when that's what you really meant.
= Sorry, Michael.

No offense taken.
As to "puts() will never be slower than printf()": I recall reading
somewhere that some people think about implementing puts() using printf().

= 
= No, I was talking about the type of bug that lint can't catch.  That is,
= I see this type of usage all the time:
= 
=     ...
=     send_to_file("this is a string");
=     ...
= 
= send_to_file(s)
= char *s;
= {
=     extern FILE *fp;
=     fprintf(fp, s);
= }
You must look at some strange progams :-)

I aggree that this is bad practice and people should be made aware of
it.

	-mg
-- 
Michael Greim    Email : greim@sbsvax.informatik.uni-saarland.dbp.de
                 or    : ...!uunet!unido!sbsvax!greim
[.signature removed by the board of censors for electronic mail's main
executive computer because it contained a four letter word ("word")]

frank@zen.co.uk (Frank Wales) (07/12/89)

In article <774@sbsvax.UUCP> greim@sbsvax.UUCP (Michael Greim) writes:
>As to "puts() will never be slower than printf()": I recall reading
>somewhere that some people think about implementing puts() using printf().

If I ever found that a C library I was using had puts() (or any other
straightforward string function) implemented by calling printf(), I'd
be round at the vendor's factory gates with a large shovel to reprogram
the clown responsible.  Just in case you know someone who's thinking
about it.
--
Frank Wales, Systems Manager,        [frank@zen.co.uk<->mcvax!zen.co.uk!frank]
Zengrange Ltd., Greenfield Rd., Leeds, ENGLAND, LS9 8DB. (+44) 532 489048 x217 

greim@sbsvax.UUCP (Michael Greim) (07/14/89)

In article <1640@zen.co.uk>, frank@zen.co.uk (Frank Wales) writes:
> In article <774@sbsvax.UUCP> greim@sbsvax.UUCP (Michael Greim) writes:
> >As to "puts() will never be slower than printf()": I recall reading
> >somewhere that some people think about implementing puts() using printf().
> 
> If I ever found that a C library I was using had puts() (or any other
> straightforward string function) implemented by calling printf(), I'd
> be round at the vendor's factory gates with a large shovel to reprogram
> the clown responsible.  Just in case you know someone who's thinking
> about it.
If it isn't too far away, I'd join you :-)
I've dug in our archives, and lo! I have found the reference in question.
Here it is:

#From: bostic@ucbvax.BERKELEY.EDU (Keith Bostic)
#Newsgroups: comp.bugs.4bsd
#Subject: Re: [n]eqn(1) speedup
#Keywords: [n]eqn munches too much CPU time
#Message-ID: <25589@ucbvax.BERKELEY.EDU>
#Date: 11 Aug 88 17:52:19 GMT
#References: <583@sbsvax.UUCP>
#Organization: University of California at Berkeley
#Lines: 13
#Posted: Thu Aug 11 18:52:19 1988
#
#In article <583@sbsvax.UUCP>, bs@sbsvax.UUCP (Bernard Sieloff) writes:
#> Subject: [n]eqn(1) speedup
#> Input to eqn with no effect to eqn itself will be passed unaltered to
#> stdout, but by calling printf()!
#> Eqn behavior is heavily sped up, if the "passing"-printf() calls  are
#> changed to fputs() calls.
#
#This is implementation dependent.  On the current BSD system, printf
#is significantly faster, for strings that don't contain percent signs,
#than [f]puts calls.  I expect to make [f]puts just call printf in the
#near future.
#
#Keith Bostic

> --
> Frank Wales, Systems Manager,        [frank@zen.co.uk<->mcvax!zen.co.uk!frank]
> Zengrange Ltd., Greenfield Rd., Leeds, ENGLAND, LS9 8DB. (+44) 532 489048 x217 

	-mg
-- 
Michael Greim    Email : greim@sbsvax.informatik.uni-saarland.dbp.de
                 or    : ...!uunet!unido!sbsvax!greim
[.signature removed by the board of censors for electronic mail's main
executive computer because it contained a four letter word ("word")]