[net.unix] possible bug in soabort

goldfarb@ucf-cs.UUCP (Ben Goldfarb) (09/20/84)

[]

I've been chasing this one for a while to no avail, and our copy of
net.bugs.4bsd is missing a lot of bug reports because of an expire
problem.  I can give only sketchy details on what is happening at the
moment, but I hope someone else out there has seen the problem and can
point me to a fix.

There seems to be a race somewhere in the socket abort code for AF_UNIX
SOCK_STREAM sockets.  This simple program causes the system to hang
after the failed accept():

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

struct sockaddr_un name2, name1 = {AF_UNIX, "socktest"};

main()
{
	int s, s1, ns, i, j, length;
	char *sendm = "Hello";
	char recvm[10];

	length = strlen(name1.sun_path)+2;
	s = socket(AF_UNIX, SOCK_STREAM, 0);
	bind(s, &name1, length);
	listen(s, 5);
	s1 = socket(AF_UNIX, SOCK_STREAM, 0);
	connect(s1, &name1, length);
	send(s1, sendm, 6);
	printf("sizeof(name2) = %d\n", sizeof(name2));
	if ((ns = accept(s, &name2, sizeof(name2))) < 0)
		exit(perror("accept"));
	recv(ns, recvm, sizeof recvm);
	printf("Received %s\n", recvm);
	unlink("socktest");
}

----
(I realize that the third parameter to accept() is incorrect: it should
be a pointer to an int holding the size of the address buffer, not the
size itself.  But this code is almost identical to that which a student
wrote, causing the system to hang.  If any user can do this to the
system, it is well worth chasing down.)
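
For comparison, here is a rough sketch of the same round trip with
accept() called the intended way, i.e. with the address of a length
variable that has been set to the size of the address buffer.  It is not
a fix for the hang, only an illustration of the call.  It assumes a
modern POSIX system (socklen_t, four-argument send()/recv()), and the
helper names make_unix_addr(), accept_peer() and roundtrip() are made up
for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: fill in an AF_UNIX address for `path`.
 * Returns the length to pass to bind()/connect(), or 0 if the
 * path does not fit in sun_path. */
socklen_t make_unix_addr(struct sockaddr_un *addr, const char *path)
{
	assert(addr != NULL && path != NULL);
	if (strlen(path) >= sizeof(addr->sun_path))
		return 0;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strcpy(addr->sun_path, path);
	return (socklen_t)sizeof(*addr);
}

/* Hypothetical helper: accept() with a value-result length.  The third
 * argument is a POINTER to the length, preset to the buffer size. */
int accept_peer(int listener, struct sockaddr_un *peer)
{
	socklen_t len = sizeof(*peer);

	assert(peer != NULL);
	memset(peer, 0, sizeof(*peer));
	return accept(listener, (struct sockaddr *)peer, &len);
}

/* Same sequence as the test program above, with every call checked.
 * Returns 0 if "Hello" makes it across, -1 otherwise. */
int roundtrip(const char *path)
{
	struct sockaddr_un addr, peer;
	char buf[16];
	int listener = -1, client = -1, conn = -1, ok = -1;
	socklen_t len = make_unix_addr(&addr, path);

	if (len == 0)
		return -1;
	unlink(path);	/* remove a stale socket file from an earlier run */

	listener = socket(AF_UNIX, SOCK_STREAM, 0);
	client = socket(AF_UNIX, SOCK_STREAM, 0);
	if (listener < 0 || client < 0)
		goto done;
	if (bind(listener, (struct sockaddr *)&addr, len) < 0 ||
	    listen(listener, 5) < 0 ||
	    connect(client, (struct sockaddr *)&addr, len) < 0 ||
	    send(client, "Hello", 6, 0) != 6)
		goto done;

	conn = accept_peer(listener, &peer);
	if (conn >= 0 && recv(conn, buf, sizeof(buf), 0) == 6 &&
	    strcmp(buf, "Hello") == 0)
		ok = 0;
done:
	if (conn >= 0)
		close(conn);
	if (client >= 0)
		close(client);
	if (listener >= 0)
		close(listener);
	unlink(path);
	return ok;
}
```

Calling roundtrip("socktest") from main() and checking for a zero return
exercises the whole sequence.  On 4.2BSD the length variable would be a
plain int rather than a socklen_t.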

I've been unable to get a decent core dump because I have to use the
console to halt the system.  I've used the console to look at where
the system is hanging and it is continually in soabort(), uipc_usrreq(),
and associated routines.  This happens on both a 730 and a 780.

When I enable SYSCALLTRACE in the kernel, the timing is altered such 
that the bug doesn't appear and the process terminates normally.

Has anyone had the misfortune of running into this one?  If so, I'd
appreciate your sharing your experiences with me.

-- 
Ben Goldfarb
University of Central Florida
uucp: {duke,decvax,princeton}!ucf-cs!goldfarb
ARPA: goldfarb.ucf-cs@csnet.relay
csnet: goldfarb@ucf

ron@BRL-TGR.ARPA (09/26/84)

From:      Ron Natalie <ron@BRL-TGR.ARPA>

There's a bogus table entry that initially caused AF_UNIX to work
the way you describe.  Someone with a complete set of the bugs
could probably tell you which line in the proto lists to change.

-Ron