amit@cybvax0.UUCP (Amit Green) (09/09/87)
Hello,

I am implementing a secure version of sprintf(3S), one that won't randomly overrun buffers. Until now I have been using a large buffer: I place a '\0' in its last element, call sprintf(3S), and check that the '\0' has not been overwritten, aborting with a "memory corrupted by sprintf" error if it has. Although this is workable, and probably reasonably proof against errors, I would prefer to implement something more secure.

Since we don't have source, it's a bit hard to find out how sprintf(3S) works in some of the more obscure areas. I would like help from anyone who can legally divulge it* (see footnote). I especially want to know what the output of the %e, %f, and %g formats is with various field widths and precisions.

Looking at "nm -pg /lib/libc.a" seems to indicate that _doprnt(3S) does not call any of the ecvt(3) routines, as I had expected it to, which makes emulation rather difficult. Unless I can find some way to match the output of these formats exactly for all the different field options, I am going to reconstruct each "%" format (with field widths under 128 characters, to avoid overrunning the internal buffers), call sprintf(3S) on it with a small buffer, and then copy the result with bounds checking to the user buffer. I would prefer a different way if possible, thus this note.

I have been testing the sprintf(3S) implementation on our BSD 4.2 system (actually Ultrix 1.?), and have come to the following conclusions about the other, simpler formats:

1. "%*s", -10, "hi"
	Will left adjust "hi".
   "%-*s", -10, "hi"
	Will right adjust "hi" (the two negatives cancel).
2. "%-05d", 2
	Will ignore the zero-padding.
3. "%05d", -2
	Will zero-pad after the "-" sign.
4. "%05#x", 2
	Will zero-pad after the "0x" prefix.
6. "%5%"
	Will actually put the "%" in the specified width, left/right justified, etc.
7. "%05s", "hi"
	Will zero-pad the string. I expected it not to, but on second thought, this does have some uses.
The rest of the formats seem to be as expected; any illogical fields, such as "%-0.#ls", are ignored.

Please respond by mail; I doubt many people on the net are interested in this.

Thank you.

{mit-eddie,harvard}!cybvax0!amit
Amit Green

Footnote
========
*That is, I believe looking at AT&T code and explaining its algorithm is not allowed, due to its trade-secret status. However, if anyone with a minix system can legally send me source code for just the relevant parts, please do so. [Does minix even do floating point? I somehow get the feeling it might not.]

Thank you.
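The integer and string cases in the numbered list above can be checked against any given C library with a small harness along these lines. This is only a sketch under stated assumptions: it uses snprintf(), a C99 function that was not available on the systems discussed here; the helper names show_str, show_int, and show_cases are hypothetical; and item 4 is written as "%#05x", with the flags placed before the width. The "%5%" and "%05s" cases are left out because their behaviour varies between implementations.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helpers, not part of any library: format one case into
 * `out' (never writing more than `outlen' bytes) and return `out'.
 */
const char *
show_str(char *out, size_t outlen, int width, const char *s)
{
	assert(out != NULL && outlen > 0);
	(void) snprintf(out, outlen, "%*s", width, s);
	return (out);
}

const char *
show_int(char *out, size_t outlen, const char *fmt, int v)
{
	assert(out != NULL && outlen > 0);
	(void) snprintf(out, outlen, fmt, v);
	return (out);
}

/* Print each case between brackets so that the padding is visible. */
void
show_cases(void)
{
	char b[32];

	printf("[%s]\n", show_str(b, sizeof (b), -10, "hi"));	/* item 1 */
	printf("[%s]\n", show_int(b, sizeof (b), "%-05d", 2));	/* item 2 */
	printf("[%s]\n", show_int(b, sizeof (b), "%05d", -2));	/* item 3 */
	printf("[%s]\n", show_int(b, sizeof (b), "%#05x", 2));	/* item 4 */
}
```

Calling show_cases() on the system under test and comparing the bracketed output against the list shows quickly whether that library agrees with the behaviour reported for items 1 through 4.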
ron@topaz.rutgers.edu (Ron Natalie) (09/09/87)
Checking for a null to find the end of the buffer is not likely to work at all. The first argument to sprintf is not required to have anything in it at all (and in many cases is all zeros).

-Ron
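For reference, the check described in the original post (a '\0' planted in the last element of a large private buffer, followed by a bounds-checked copy) might be sketched as below. This is an illustration, not anyone's actual code: the name guarded_sprintf and the size SCRATCH_LEN are hypothetical, it uses ANSI <stdarg.h> rather than <varargs.h>, and it returns -1 where the original post aborts.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SCRATCH_LEN 4096	/* hypothetical size of the "large buffer" */

/*
 * Format into a large private buffer whose last byte holds a '\0'
 * sentinel, then copy to the caller with a bounds check.
 * Returns 0 if OK, 1 if the result does not fit in dst, and -1 if the
 * sentinel was overwritten.  The check is after the fact: by then the
 * memory past scratch is already damaged, and an overrun that happens
 * to store a 0 in the last byte goes unnoticed.
 */
int
guarded_sprintf(char *dst, size_t dstlen, const char *fmt, ...)
{
	static char scratch[SCRATCH_LEN];
	va_list ap;
	size_t n;

	assert(dst != NULL && fmt != NULL);
	scratch[SCRATCH_LEN - 1] = '\0';	/* plant the sentinel */
	va_start(ap, fmt);
	(void) vsprintf(scratch, fmt, ap);	/* unbounded on purpose */
	va_end(ap);
	if (scratch[SCRATCH_LEN - 1] != '\0')
		return (-1);	/* sprintf ran past the end of scratch */
	n = strlen(scratch);	/* safe: the sentinel is still in place */
	if (n >= dstlen)
		return (1);	/* would overrun the caller's buffer */
	memcpy(dst, scratch, n + 1);	/* includes the terminating '\0' */
	return (0);
}
```

Note that the sentinel lives in the private scratch buffer, not in the buffer the caller passes in, so nothing is assumed about the caller's buffer contents.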
chris@mimsy.UUCP (Chris Torek) (09/11/87)
[Our sys file was broken for a while, and I think this did not go out to the net in general. Apologies if this is a repeat.]

In article <1484@cybvax0.UUCP> amit@cybvax0.UUCP (Amit Green) writes:
>I am implementing a secure version of sprintf(3S), something that won't
>randomly overrun buffers.

A curious coincidence....

>Since we don't have source; its a bit hard to find how sprintf(3S)
>works in some of the more obscure areas. ... I would like help from
>anyone who can legally divulge it* (See footnote).

That would be no help, as sprintf() works differently in different Unixes. 4BSD has a flag called _IOSTRG, I/O to string, but this is not in fact ever tested, and writing to a struct _iobuf that has _IOSTRG and _IOWRT set can surprise you (it certainly surprised *me*). I believe at least one version of Unix uses special static buffer structures to tell `file' buffers from `string' buffers.

All of which brings me to. . . .

I have reimplemented stdio for 4.3BSD with a new interface. All of the old documented functions are still there with the same interface; some undocumented functions are gone, and (due to widespread use) some still remain. There is one new interface:

	FILE *funopen(
		void *p,	/* `char *p' in pre-ANSI-C version */
		int (*readfn)(void *p, char *buf, int n),
		int (*writefn)(void *p, const char *buf, int n),
		long (*seekfn)(void *p, off_t off, int whence),
		int (*closefn)(void *p));

There are two aliases:

	#define fropen(p, fn) funopen(p, fn, NULL, NULL, NULL)
	#define fwopen(p, fn) funopen(p, NULL, fn, NULL, NULL)
	/* nb. there are casts in the pre-ANSI-C version */

The functions `readfn', `writefn', `seekfn', and `closefn' are called to perform read, write, seek, and close operations; they are expected to return the same values as a corresponding read, write, seek, or close call would if the argument `p' were a file descriptor. The pointer `p' is never examined by stdio; it is simply passed to the four functions, where it serves as an identifier, if required.
If any of readfn, writefn, or seekfn is given as NULL, the corresponding operation is disabled; if closefn is NULL, no special operation is done at close time beyond flushing any buffered text.

Note that given this interface, it is possible (though not necessarily efficient) to build any other interface, including the original stdio interface. In particular, it is possible to implement both a `safe sprintf' and a `dynamic string sprintf'. The former is not as efficient as it might be, and for this and a few other reasons, it might be best to present one more interface:

	FILE *memopen(char *addr, unsigned int len, char *mode);
	/* e.g., f = memopen(buf, sizeof (buf), "w") */

This would permit a `safe sprintf' as (any variation on) the following:

	#include <stdio.h>
	#include <varargs.h>

	int
	ssprintf(s, len, fmt, va_alist)
		char *s;
		unsigned int len;
		char *fmt;
		va_dcl
	{
		va_list ap;
		FILE *f;

		if ((f = memopen(s, len, "w")) == NULL)
			return (-1);	/* could not write */
		va_start(ap);
		(void) vfprintf(f, fmt, ap);
		va_end(ap);
		(void) putc(0, f);	/* terminating \0 */
		if (ferror(f)) {	/* buffer overflow */
			(void) fclose(f);
			return (1);	/* note that string is not \0 terminated */
		}
		(void) fclose(f);	/* release the stream on success as well */
		return (0);	/* all OK */
	}

Note that memopen() is purely for efficiency: ssprintf() can still be implemented without it, as long as one has funopen().

I hope that this interface, and perhaps my implementation, will be distributed with some future release of 4BSD (if there is in fact such a release [hedge, hedge]). If you have strong feelings on memopen, or on the names for any of the new functions, you might do well to send me mail by October or so, `before the concrete sets'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
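On systems that provide fmemopen() (standardized in POSIX.1-2008, and roughly the role proposed for memopen() above), the same return convention can be sketched as follows. This is an assumption-laden illustration rather than the implementation described in this article: the name ssprintf_fmem is hypothetical, it uses ANSI <stdarg.h>, and it detects overflow from the count returned by vfprintf() instead of relying on ferror(), since stdio may buffer the output until the stream is flushed.

```c
#define _POSIX_C_SOURCE 200809L	/* for fmemopen() */
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Returns 0 if OK, 1 on overflow, -1 if the stream could not be opened.
 * On overflow the contents of s are unspecified and may not be \0
 * terminated.
 */
int
ssprintf_fmem(char *s, size_t len, const char *fmt, ...)
{
	va_list ap;
	FILE *f;
	int n;

	assert(s != NULL && fmt != NULL);
	if (len == 0 || (f = fmemopen(s, len, "w")) == NULL)
		return (-1);	/* could not write */
	va_start(ap, fmt);
	n = vfprintf(f, fmt, ap);	/* bytes wanted, or < 0 on error */
	va_end(ap);
	(void) fclose(f);	/* flushes whatever fits into s */
	if (n < 0 || (size_t)n >= len)
		return (1);	/* no room for the text plus its \0 */
	s[n] = '\0';	/* terminate explicitly rather than rely on the library */
	return (0);	/* all OK */
}
```

A call such as ssprintf_fmem(buf, sizeof (buf), "x=%d", 42) then either leaves a terminated string in buf and returns 0, or returns nonzero without ever writing past buf + len.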