[comp.bugs.4bsd] ungetc will put characters back into an _IOSTRG input stream

casey@lll-crg.llnl.gov (Casey Leedom) (02/13/89)

Description:
	Ungetc will put characters back into an _IOSTRG input stream.  This
	will cause a segmentation violation if the string is in read only
	memory (shared text segment).  For example:

		if (!some_option) {
			opt = getenv(...);
			some_option = opt ? opt : "some_string";
		}
		...
		sscanf(some_option, "some_format", ...);

	In some cases, some_option will refer to the constant string
	"some_string".  Since sscanf uses ungetc to back up over input
	it decides it hadn't wanted to look at yet, we write into the
	constant string "some_string".

Repeat-By:
	Use gcc to compile the following program without the
	-fwritable-strings option:

	main()
	{
		int i, j;
	
		sscanf("4x4", "%dx%d", &i, &j);
		printf("%d %d\n", i, j);
	}

Fix:
	Debatable.  I fixed ungetc, but who knows?  Maybe we do want to
	be able to do ungetc's on a string argument.  Currently _IOSTRG
	is only used internally for sscanf and sprintf, so changing ungetc
	doesn't break anything now.

	Another option would be to replace all uses of ungetc in doscan
	with fseek(iop, -1L, 1) but I don't think fseek works on _IOSTRG
	arguments and I think it's a lot more expensive to call.

	Another option is to separate out the IO stream backup function
	from putting a [possibly different] character back on the stream.
	But this involves either defining a new interface, or extending
	the definition of ungetc's interface to accept something like
	ungetc(__IOS_BACKUP);

	And Finally, we could have read only streams and simply have
	sscanf set that flag in addition to _IOSTRG.  But this would add
	extra processing for every putc, etc.

	*** lib/libc/stdio/ungetc.c-dist	Wed Mar 26 18:08:43 1986
	--- lib/libc/stdio/ungetc.c	Sun Feb 12 13:45:30 1989
	***************
	*** 18,24 ****
	  			return (EOF);
	  
	  	iop->_cnt++;
	! 	*--iop->_ptr = c;
	  
	  	return (c);
	  }
	--- 18,27 ----
	  			return (EOF);
	  
	  	iop->_cnt++;
	! 	if (iop->_flag & _IOSTRG)
	! 		--iop->_ptr;
	! 	else
	! 		*--iop->_ptr = c;
	  
	  	return (c);
	  }

kre@cs.mu.oz.au (Robert Elz) (02/14/89)

In article <20282@lll-winken.LLNL.GOV>, casey@lll-crg.llnl.gov (Casey Leedom) writes:
> Description:
> 	Ungetc will put characters back into an _IOSTRG input stream. 
> Fix:
> 	Debatable.

Maybe better to have ungetc() look at the character that's in the
buffer - if its the same as the one being pushed back (which it
always will be in sscanf()) then don't do the write.  ungetc() is
rare enough that the extra mem read & test aren't going to matter.

That way you don't get a segmentaion violation with write only strings
if you're just doing sscanf(), but you will if you try random other
ungetc's on a string (somehow).

You may also win doing this with normal writeable buffers - if the buffer
is currently mapped shared read, copy on write, you may be able to avoid the
copy (yes, I know, that will happen once a century...)

kre

aida@porthos.csl.sri.com (Hitoshi Aida) (02/14/89)

In article <20282@lll-winken.LLNL.GOV> casey@lll-crg.llnl.gov (Casey Leedom) writes:
>Fix:
>	Debatable.  I fixed ungetc, but who knows?  Maybe we do want to
>	be able to do ungetc's on a string argument.  Currently _IOSTRG
>	is only used internally for sscanf and sprintf, so changing ungetc
>	doesn't break anything now.
>
>	Another option would be to replace all uses of ungetc in doscan
>	with fseek(iop, -1L, 1) but I don't think fseek works on _IOSTRG
>	arguments and I think it's a lot more expensive to call.
>
>	Another option is to separate out the IO stream backup function
>	from putting a [possibly different] character back on the stream.
>	But this involves either defining a new interface, or extending
>	the definition of ungetc's interface to accept something like
>	ungetc(__IOS_BACKUP);

I think this is the correct option.  I prefer a new interface.
Anyway, for the moment, following is more consistent.

	*** lib/libc/stdio/ungetc.c-dist	Wed Mar 26 18:08:43 1986
	--- lib/libc/stdio/ungetc.c	Sun Feb 12 13:45:30 1989
	***************
	*** 18,24 ****
	  			return (EOF);
	  
	  	iop->_cnt++;
	! 	*--iop->_ptr = c;
	  
	  	return (c);
	  }
	--- 18,31 ----
	  			return (EOF);
	  
	  	iop->_cnt++;
	! 	if (iop->_flag & _IOSTRG) {
	! 		if ((*--iop->_ptr & 0377) != c) {
	!			iop->_cnt--;
	!			iop->_ptr++;
	!			return (EOF);
	!		}
	! 	} else
	! 		*--iop->_ptr = c;
	  
	  	return (c);
	  }

Hitoshi AIDA (aida@csl.sri.com)
Computer Science Laboratory, SRI International

jack@cwi.nl (Jack Jansen) (02/14/89)

I ran into this problem as well, some time ago.

My fix was to have ungetc() only push the character back if it
isn't there already, i.e.
	if( *--fp->_ptr != ch )
		*fp->_ptr = ch;
Makes sscanf work with read-only strings, and still keeps the
semantics the same.
-- 
--
Fight war, not wars			| Jack Jansen, jack@cwi.nl
Destroy power, not people! -- Crass	| (or mcvax!jack)

ditto@cbmvax.UUCP (Michael "Ford" Ditto) (02/14/89)

In article <20282@lll-winken.LLNL.GOV> casey@lll-crg.llnl.gov (Casey Leedom) writes:
>	! 	if (iop->_flag & _IOSTRG)
>	! 		--iop->_ptr;
>	! 	else
>	! 		*--iop->_ptr = c;

As an alternative to the above four lines, why not:

	if (*--iop->_ptr != (char)c)
		*iop->ptr = c;

This will give anyone ungetc'ing something weird into a readonly string
the segmentation fault they deserve, and still allow ungetc'ing into
a writable string (even though nobody should be doing that).
-- 
					-=] Ford [=-

"The number of Unix installations	(In Real Life:  Mike Ditto)
has grown to 10, with more expected."	ford@kenobi.cts.com
- The Unix Programmer's Manual,		...!sdcsvax!crash!elgar!ford
  2nd Edition, June, 1972.		ditto@cbmvax.commodore.com

karl@haddock.ima.isc.com (Karl Heuer) (02/22/89)

The solutions I've seen all involve changing ungetc().  It would also be
possible to change the scanf() internals to just decrement _ptr instead of
calling ungetc(); I believe that, in the usual stdio implementation, this
would do the right thing for both string files and real files.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint