[comp.bugs.sys5] NOFILES bug in System V.3

Walter_James_DeReu@cup.portal.com (11/22/88)

I've lost the posting, but a System V.3 problem was reported earlier
which occurred when NOFILES exceeded 20, files were opened via system
calls (bypassing fopen()), and the stdio routines were used.  The _bufendtab
array is indexed by the UNIX file descriptor but dimensioned under the
assumption that file descriptors won't exceed 20, and memory following
that array gets corrupted.

I thought I could work around this by ensuring that fopen'd files had
file descriptors < 20 while using larger file descriptors for files being
accessed via direct calls to open(), read(), and write().  This almost
worked, but I encountered a problem when I called sscanf() with a
file open as file descriptor 20.

I don't have source, but after much time in sdb I believe I know what is
happening.  Sscanf() apparently dummies up a FILE struct and then calls
_doscan() to do the real work.  It uses the string being scanned as the
"buffer" in the FILE struct and sets the file descriptor to 20.  When
_doscan() reaches the end of your string, it calls _filbuf() to read the
next block from the file into the "buffer".  If file descriptor 20 isn't
an open file, the read fails, _doscan() returns, and everything is fine.
But if a file is open for reading as file descriptor 20, it will be
read into the "buffer" -- which is really the string being scanned.
Furthermore, the number of bytes read into the "buffer" is _bufendtab[20] -
the address of the buffer.  In my case, this was 0 - 0x7ffffsomething,
with an unsigned result of about 2 billion characters!  This does
a bit of damage to the stack.
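
The size of that read can be checked with a few lines of C.  This is
only a sketch of the arithmetic: fill_length() is a hypothetical helper,
and 0x7ffff000 is an illustrative stack address standing in for the
"0x7ffffsomething" above, assuming 32-bit unsigned arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the byte count that results from computing
 * (end-of-buffer pointer - buffer address) in 32-bit unsigned arithmetic.
 * With an end pointer of 0, the subtraction wraps modulo 2^32.
 */
uint32_t fill_length(uint32_t bufend, uint32_t bufaddr)
{
	return bufend - bufaddr;
}
```

With bufend = 0 and bufaddr = 0x7ffff000, the result is 0x80001000, or
2147487744 bytes, which is consistent with the "about 2 billion" figure.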

This can be demonstrated with the following program:

	#include <stdio.h>
	#include <string.h>
	main()
	{
		char buf1[100];
		char buf2[10];
		char buf3[100];
		int i;

		strcpy(buf1, "Buffer 1");
		strcpy(buf2, "01/02/03");
		strcpy(buf3, "Buffer 3");
		while (dup(0) != 20)
			;
		sscanf(buf2, "%d/%d/%d", &i, &i, &i);
		printf("Buf 1: %s\nBuf 2: %s\nBuf 3: %s\n", buf1, buf2, buf3);
	}

You must have NOFILES configured for more than 20 (or the while-dup
will loop forever).  On the two machines I have tested (AT&T 3B15
and an 80386 running Interactive UNIX) the sscanf() will cause a
read from stdin.  If you type a fairly long string, sscanf() will
trash both buf2 and buf3 before returning.  If you type a REALLY
long string or redirect your input to a large file, a core dump ensues.

I hesitate to call this a bug because I believe AT&T has documented that
you can't use stdio in programs which use more than 20 files.  On the other
hand, certain standard UNIX programs (such as the C preprocessor) dump
core if they inherit 15 or so open files and are given a task to perform
which requires them to open more than 5 files (such as compiling a program
with deeply nested #includes).

I am currently working around this by arranging for /dev/null to be
opened as file descriptor 20.  This keeps my buffer from getting trashed
but is hardly elegant.  Does anyone have a better solution?
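
One way the /dev/null workaround might be arranged is sketched below.
The post does not say how it was actually done; reserve_dummy_fd() and
STDIO_DUMMY_FD are hypothetical names, and the sketch assumes a system
that provides dup2().

```c
#include <fcntl.h>
#include <unistd.h>

/* Descriptor that sscanf() is reported to use in this thread. */
#define STDIO_DUMMY_FD 20

/*
 * Park /dev/null on descriptor 20 so that a stray read() through the
 * dummy FILE struct sees end-of-file at once instead of real data.
 * Returns 0 on success, -1 on failure.
 */
int reserve_dummy_fd(void)
{
	int fd = open("/dev/null", O_RDONLY);

	if (fd < 0)
		return -1;
	if (fd != STDIO_DUMMY_FD) {
		/* dup2() silently closes whatever was on 20 before. */
		if (dup2(fd, STDIO_DUMMY_FD) < 0) {
			close(fd);
			return -1;
		}
		close(fd);
	}
	return 0;
}
```

Because dup2() replaces anything already open on descriptor 20, a call
like this belongs at the very start of the program, before any other
files are opened.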

johnl@n3dmc.UU.NET (John Limpert) (11/23/88)

In article <11616@cup.portal.com> Walter_James_DeReu@cup.portal.com writes:
>I am currently working around this by arranging for /dev/null to be
>opened as file descriptor 20.  This keeps my buffer from getting trashed
>but is hardly elegant.  Does anyone have a better solution?

I ran into this when writing some code that emulated UNIX system calls
on a real-time system.  The problem is the brain-damaged implementation
of sprintf and sscanf: they both use file descriptor #20 and assume
the read and write system calls will fail due to the illegal descriptor.
This trick saves a few bytes of code at the cost of generating bogus
and useless system calls.  The 'illegal' descriptor number could be
changed or the implementation of sprintf and sscanf could be fixed.

-- 
John A. Limpert
UUCP:	johnl@n3dmc.UUCP, johnl@n3dmc.UU.NET, uunet!n3dmc!johnl

guy@auspex.UUCP (Guy Harris) (11/24/88)

>Sscanf() apparently dummies up a FILE struct and then calls
>_doscan() to do the real work.  It uses the string being scanned as the
>"buffer" in the FILE struct and sets the file descriptor to 20.

This is true.  Some means of indicating that the FILE struct in question
refers to a string, rather than a file, is necessary; you have just
discovered why the method used by System V Releases 2 and 3 (and perhaps
earlier releases) is wrong.  The one used by 4BSD (and perhaps by V7 or
32V) is better; it uses the _IOSTRG flag to indicate that the FILE
struct refers to a string.
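
The flag-based idea can be sketched roughly as follows.  This is not the
4BSD source: struct ministream, MS_STRING, and the ms_* functions are
hypothetical names invented for illustration, with MS_STRING playing the
role that _IOSTRG plays in the description above.

```c
#include <stdio.h>	/* EOF */
#include <string.h>	/* strlen */
#include <unistd.h>	/* read */

#define MS_STRING 0x01	/* hypothetical: stream is backed by a string */

struct ministream {	/* hypothetical; not the real FILE layout */
	const unsigned char *ptr;	/* next byte to hand out */
	int cnt;			/* bytes left at ptr */
	int flags;
	int fd;				/* unused when MS_STRING is set */
	unsigned char buf[64];
};

/* Refill routine, called only when the buffer is empty. */
static int ms_fill(struct ministream *s)
{
	ssize_t n;

	if (s->flags & MS_STRING)
		return EOF;	/* string exhausted: never touch a descriptor */
	n = read(s->fd, s->buf, sizeof s->buf);
	if (n <= 0)
		return EOF;
	s->ptr = s->buf;
	s->cnt = (int)n - 1;
	return *s->ptr++;
}

int ms_getc(struct ministream *s)
{
	if (s->cnt > 0) {
		s->cnt--;
		return *s->ptr++;
	}
	return ms_fill(s);
}

/* Set up a stream that reads from a string rather than a file. */
void ms_open_string(struct ministream *s, const char *str)
{
	s->ptr = (const unsigned char *)str;
	s->cnt = (int)strlen(str);
	s->flags = MS_STRING;
	s->fd = -1;
}
```

The point of the flag is that end-of-string is decided inside the
library, so no read() is ever issued on a made-up descriptor number.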

An even better one would permit you to create standard I/O streams that
can refer to arbitrary objects, with "put buffer" and "get buffer"
operations; this would 1) permit you to use more than "sprintf" and
"sscanf" on strings, and 2) might permit you to do things like support
string objects that grow dynamically as needed, as well as fixed-length
strings.  I think Chris Torek has done such a version of standard I/O,
which may appear in a future BSD release.
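
A rough sketch of what a "put buffer" operation on an arbitrary object
could look like is given below.  It is not Chris Torek's implementation
or any BSD interface; struct ostream, struct dynstr, put_fn, and the
functions shown are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "put buffer" operation: 0 on success, -1 on error. */
typedef int (*put_fn)(void *cookie, const char *data, size_t n);

struct ostream {	/* hypothetical output stream */
	put_fn put;
	void *cookie;	/* the object the stream writes to */
};

struct dynstr {		/* a string that grows as needed */
	char *data;
	size_t len;
	size_t cap;
};

/* "Put buffer" for a dynstr: append n bytes, growing the storage. */
static int dynstr_put(void *cookie, const char *data, size_t n)
{
	struct dynstr *d = cookie;

	if (d->len + n + 1 > d->cap) {
		size_t ncap = d->cap ? d->cap : 16;
		char *p;

		while (ncap < d->len + n + 1)
			ncap *= 2;
		p = realloc(d->data, ncap);
		if (p == NULL)
			return -1;
		d->data = p;
		d->cap = ncap;
	}
	memcpy(d->data + d->len, data, n);
	d->len += n;
	d->data[d->len] = '\0';
	return 0;
}

/* Write a string through whatever object backs the stream. */
int os_puts(struct ostream *o, const char *s)
{
	return o->put(o->cookie, s, strlen(s));
}
```

A fixed-length string would simply supply a different put function that
returns -1 once its buffer is full; the stream code itself would not
need to know which kind of object it is writing to.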