[comp.unix.wizards] obscure questions on sprintf

amit@cybvax0.UUCP (Amit Green) (09/09/87)

Hello,

I am implementing a secure version of sprintf(3S), something that won't
randomly overrun buffers.

Until now I have been using a large buffer, calling sprintf(3S), and
checking that it didn't overwrite the last element of the buffer, in
which a '\0' had been placed, aborting with a "memory corrupted by
sprintf" error if it did.
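
For concreteness, that check looks roughly like the sketch below.  The
names checked_sprintf and BIGBUF are made up for illustration, and the
use of <stdarg.h> and vsprintf() is an assumption about what the local
library provides.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

#define BIGBUF 4096	/* assumed size; "large" relative to expected output */

static char bigbuf[BIGBUF];

/*
 * Format into bigbuf and return it.  A '\0' sentinel is placed in the
 * last element beforehand; if it has been overwritten afterwards, the
 * output ran past the end of the buffer, so abort.
 */
char *
checked_sprintf(const char *fmt, ...)
{
	va_list ap;

	bigbuf[BIGBUF - 1] = '\0';	/* sentinel in the last element */
	va_start(ap, fmt);
	(void) vsprintf(bigbuf, fmt, ap);
	va_end(ap);
	if (bigbuf[BIGBUF - 1] != '\0') {
		(void) fprintf(stderr, "memory corrupted by sprintf\n");
		abort();
	}
	return (bigbuf);
}
```

Note that this only detects the overrun after it has happened, and a
'\0' produced by "%c" could land on the sentinel and hide one.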

Although workable, and probably reasonably proof against errors, I
prefer to implement something more secure.

Since we don't have source, it's a bit hard to find out how sprintf(3S)
works in some of the more obscure areas.

I would like help from anyone who can legally divulge it* (See
footnote).

I especially want to know what the output of the %e, %f, and %g formats
is with various field widths and precisions.

Looking at "nm -pg /lib/libc.a" seems to indicate that _doprnt(3S) does
not call any of the ecvt(3) routines, as I had expected, making
emulating these routines rather difficult.

Unless I can find some way to exactly match the output of these formats
with all the different field options, I am going to reconstruct a "%"
format (with field widths under 128 characters, to avoid overrunning
internal buffers), call sprintf(3S) with it on a small buffer, and then
copy the result with bounds checking to the user buffer.  I would
prefer a different way if possible, thus this note.
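
A minimal sketch of that fallback for a single floating-point
conversion follows.  The names bounded_fmt_double and MAXFIELD and the
512-byte scratch size are assumptions for illustration (the size is
meant to be ample for an IEEE double at these limits); clamping the
width and precision also means the result can differ from what plain
sprintf(3S) would have produced.

```c
#include <stdio.h>
#include <string.h>

#define MAXFIELD 127	/* keep field widths under 128 characters */

/*
 * Format val with conversion conv ('e', 'f' or 'g') into dst without
 * writing more than dstlen bytes.  Returns 0 on success, -1 if the
 * conversion is not handled or the result does not fit.
 */
int
bounded_fmt_double(char *dst, size_t dstlen, int conv, int width,
    int prec, double val)
{
	char spec[32];	/* rebuilt format, e.g. "%-127.127e" */
	char tmp[512];	/* small scratch buffer (size is an assumption) */
	size_t n;

	if (conv != 'e' && conv != 'f' && conv != 'g')
		return (-1);
	if (width > MAXFIELD)
		width = MAXFIELD;
	if (width < -MAXFIELD)
		width = -MAXFIELD;
	if (prec < 0)
		prec = 0;
	if (prec > MAXFIELD)
		prec = MAXFIELD;
	/* a negative width comes out as the "-" flag, e.g. "%-10.3f" */
	(void) sprintf(spec, "%%%d.%d%c", width, prec, conv);
	(void) sprintf(tmp, spec, val);
	n = strlen(tmp) + 1;		/* include the trailing '\0' */
	if (n > dstlen)
		return (-1);		/* would overrun the user buffer */
	(void) memcpy(dst, tmp, n);
	return (0);
}
```

For example, bounded_fmt_double(buf, sizeof (buf), 'f', 8, 2, x) stands
in for sprintf(buf, "%8.2f", x).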

I have been testing the sprintf(3S) implementation on our BSD 4.2
system (actually Ultrix 1.?), and have come to the following
conclusions on the other, simpler formats:

1.  "%*s", -10, "hi"		Will left adjust "hi".
    "%-*s", -10, "hi"		Will right adjust "hi" (the two negatives
				cancel)

2.  "%-05d", 2			Will ignore the zero-padding.

3.  "%05d", -2			Will zero-pad after the "-" sign.

4.  "%05#x", 2			Will zero-pad after the "0x" prefix.

6.  "%5%"			Will actually put the "%" in the specified
				width, left/right justified, etc.

7.  "%05s", "hi"		Will zero-pad the string.  I expected it
				not to, but on second thought, this does
				have some uses.
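
A small check along the following lines can be used to compare another
library against items 1 through 4.  It is only a sketch: the function
names are made up, the expected strings are what a Standard C library
should produce rather than output captured from this system, and item 4
is written as "%#05x" so that the flags precede the width.  The
"%-*s", "%5%" and "%05s" cases are left out because their results vary
between libraries.

```c
#include <stdio.h>
#include <string.h>

/* Format one int with fmt; report and return 1 if it differs from want. */
static int
differs(const char *fmt, int val, const char *want)
{
	char buf[64];

	(void) sprintf(buf, fmt, val);
	if (strcmp(buf, want) == 0)
		return (0);
	(void) printf("%s: got \"%s\", wanted \"%s\"\n", fmt, buf, want);
	return (1);
}

/* Run the checks; returns the number of cases that did not match. */
int
check_simple_formats(void)
{
	char buf[64];
	int bad = 0;

	/* 1. a negative "*" width left adjusts: "hi" plus 8 blanks */
	(void) sprintf(buf, "%*s", -10, "hi");
	if (strcmp(buf, "hi        ") != 0)
		bad++;
	bad += differs("%-05d", 2, "2    ");	/* 2. "-" overrides "0" */
	bad += differs("%05d", -2, "-0002");	/* 3. zeros follow the sign */
	bad += differs("%#05x", 2, "0x002");	/* 4. zeros follow "0x" */
	return (bad);
}
```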

The rest of the formats seem to be as expected; any illogical fields,
such as "%-0.#ls", are ignored.

Please respond by mail; I doubt many people on the net are interested
in this.   Thank you.


					{mit-eddie,harvard}!cybvax0!amit
					Amit Green

Footnote
=========
	*That is, I believe looking at AT&T code and explaining its
algorithm is not allowed due to trade-secret status.  However, if anyone
with a minix system can send me source code for just the relevant parts
legally, please do so [Does minix even do floating point?  I somehow
get the feeling it might not].  Thank you.

ron@topaz.rutgers.edu (Ron Natalie) (09/09/87)

Checking for a null to find the end of the buffer is not likely
to work at all.  The first argument to sprintf is not required
to have anything in it at all (and in many cases is all zeros).

-Ron

chris@mimsy.UUCP (Chris Torek) (09/11/87)

[Our sys file was broken for a while, and I think this did not go out
to the net in general.  Apologies if this is a repeat.]

In article <1484@cybvax0.UUCP> amit@cybvax0.UUCP (Amit Green) writes:
>I am implementing a secure version of sprintf(3S), something that won't
>randomly overrun buffers.

A curious coincidence....

>Since we don't have source, it's a bit hard to find out how sprintf(3S)
>works in some of the more obscure areas. ... I would like help from
>anyone who can legally divulge it* (See footnote).

This is no help, as sprintf() works differently in different Unixes.
4BSD has a flag called _IOSTRG, I/O to string, but this is not in
fact ever tested, and writing to a struct _iobuf that has _IOSTRG
and _IOWRT set can surprise you (it certainly surprised *me*).  I
believe at least one version of Unix uses special static buffer
structures to tell `file' buffers from `string' buffers.  All of
which brings me to. . . .

I have reimplemented stdio for 4.3BSD with a new interface.  All of
the old documented functions are still there with the same interface;
some undocumented functions are gone, and (due to widespread use)
some still remain.  There is one new interface:

	FILE *funopen(
		void *p,	/* `char *p' in pre-ANSI-C version */
		int (*readfn)(void *p, char *buf, int n),
		int (*writefn)(void *p, const char *buf, int n),
		long (*seekfn)(void *p, off_t off, int whence),
		int (*closefn)(void *p));

There are two aliases:

	#define	fropen(p, fn) funopen(p, fn, NULL, NULL, NULL)
	#define	fwopen(p, fn) funopen(p, NULL, fn, NULL, NULL)
	/* nb. there are casts in the pre-ANSI-C version */

The functions `readfn', `writefn', `seekfn', and `closefn' are
called to perform read, write, seek, and close operations; they
are expected to return the same values as a corresponding read,
write, seek, or close call if the argument `p' were a file descriptor.
The pointer `p' is never examined, and is passed to the four
functions; it serves as an identifier, if required.  If any of
readfn, writefn, or seekfn are given as NULL, the corresponding
operation is disabled; if closefn is NULL, no special operation
is done at close time beyond flushing any buffered text.

Note that given this interface, it is possible (though not necessarily
efficient) to build any other interface, including the original
stdio interface.  In particular, it is possible to implement both
a `safe sprintf' and a `dynamic string sprintf'.  The former is
not as efficient as it might be, and for this and a few other
reasons, it might be best to present one more interface:

	FILE *memopen(char *addr, unsigned int len, char *mode);
	/* e.g., f = memopen(buf, sizeof (buf), "w") */

This would permit a `safe sprintf' as (any variation on) the following:

	int
	ssprintf(s, len, fmt, va_alist)
		char *s;
		unsigned int len;
		char *fmt;
		va_dcl
	{
		va_list ap;
		FILE *f;

		if ((f = memopen(s, len, "w")) == NULL)
			return (-1);	/* could not write */
		va_start(ap);
		(void) vfprintf(f, fmt, ap);
		va_end(ap);
		(void) putc(0, f);
		if (ferror(f)) {
			/* buffer overflow */
			(void) fclose(f);
			return (1);
			/* note that string is not \0 terminated */
		}
		(void) fclose(f);	/* release the stream on success too */
		return (0);	/* all OK */
	}

Note that memopen() is purely for efficiency: ssprintf() can still
be implemented without it, as long as one has funopen().
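
As a rough sketch of what the funopen() route needs, here is a write
function with the `writefn' signature given above that copies into a
fixed-size string and refuses anything that would not fit.  The names
struct strbuf and strbuf_write are hypothetical, made up for this
example.

```c
#include <string.h>

struct strbuf {			/* hypothetical bookkeeping for `p' */
	char *base;		/* caller's string */
	unsigned int len;	/* total bytes available at base */
	unsigned int used;	/* bytes written so far */
	int overflow;		/* set once a write did not fit */
};

/*
 * Matches int (*writefn)(void *p, const char *buf, int n).
 * Returns n like write(2) on success, or -1 if the data would run
 * past the end of the string; nothing is copied in that case.
 */
int
strbuf_write(void *p, const char *buf, int n)
{
	struct strbuf *sb = p;

	if (n < 0 || (unsigned int)n > sb->len - sb->used) {
		sb->overflow = 1;
		return (-1);
	}
	(void) memcpy(sb->base + sb->used, buf, (size_t)n);
	sb->used += (unsigned int)n;
	return (n);
}
```

One would then presumably open the stream with
fwopen((void *)&sb, strbuf_write) and proceed as in ssprintf() above.
Since stdio buffers its output, the overflow would most likely show up
only once the stream is flushed, so the error check belongs after an
fflush().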

I hope that this interface, and perhaps my implementation, will be
distributed with some future release of 4BSD (if there is in fact
such a release [hedge, hedge]).  If you have strong feelings on
memopen, or on the names for any of the new functions, you might
do well to send me mail by October or so, `before the concrete sets'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris