mouse@mcgill-vision.UUCP (der Mouse) (05/11/87)
Index: Kernel (probably sys/uipc_usrreq.c), mtXinu 4.3+NFS
Description:
Under certain timing conditions, calling connect() on a socket
in the AF_UNIX domain after the process that was listening on
it has closed it (e.g., by dying; when a process dies, all its
file descriptors get closed) will cause a panic:
"trap type 8, code = d05904c2".
I have no idea whether this is present in vanilla 4.3. If so
it is probably in a different form. Certainly the fix I found
is specific to mtXinu's 4.3+NFS.
Repeat-By:
Run the following program:
#include <stdio.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
char *sname = "/tmp/foo";
int s;
struct sockaddr_un sun;
int childpid;
int u1cnt;
sigusr1()
{
    u1cnt ++;
}
main()
{
    sun.sun_family = AF_UNIX;
    strcpy(sun.sun_path,sname);
    u1cnt = 0;
    signal(SIGUSR1,sigusr1);
    childpid = fork();
    switch (childpid)
    { case -1:
        perror("fork");
        exit(1);
        break;
      case 0:
        child();
        break;
      default:
        parent();
        break;
    }
}
mkill(pid,sig)
int pid;
int sig;
{
    if (pid != 1) /* insurance....don't want to KILL init! */
    { kill(pid,sig);
    }
}
child()
{
    int c;
    int s2;
    int suns;
    unlink(sname);
    s = socket(AF_UNIX,SOCK_STREAM,0);
    if (bind(s,&sun,sizeof(sun)) < 0)
    { perror("bind");
die:
      sleep(1);
      mkill(getppid(),SIGKILL);
      sleep(1);
      unlink(sname);
      exit(0);
    }
    if (listen(s,10) < 0)
    { perror("listen");
      goto die;
    }
    kill(getppid(),SIGUSR1);
    for (c=0;c<10;c++)
    { suns = sizeof(sun);
      s2 = accept(s,&sun,&suns);
      if (s2 < 0)
      { perror("accept");
        goto die;
      }
      close(s2);
    }
    close(s);
    sleep(10);
    printf("child done\n");
    exit(0);
}
parent()
{
    while (u1cnt == 0)
    { pause();
    }
    while (1)
    { s = socket(AF_UNIX,SOCK_STREAM,0);
      if (connect(s,&sun,sizeof(sun)) < 0)
      { perror("connect");
die:
        sleep(1);
        mkill(childpid,SIGKILL);
        sleep(1);
        unlink(sname);
        exit(0);
      }
      printf("@");
      fflush(stdout);
      close(s);
    }
}
Watch it print ten @ signs, and then watch your console print
messages about trap type 8 and a segmentation fault panic. I am
not sure, but if your file systems are really busy (in terms of
directory lookups per second) I think the crash might not
reproduce, presumably because the vnode gets pushed out of the
cache before the next connect(). Try it single-user.
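For comparison, here is a rough sketch of what the same situation
should look like on a kernel without this problem: connecting to
a socket file whose listener has already closed is expected to
fail cleanly (normally with ECONNREFUSED) rather than panic. This
is not part of the test program above. It assumes a modern POSIX
system with ANSI prototypes, and the helper names
make_stale_socket() and try_connect() are made up for the
illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an AF_UNIX address for `path` (hypothetical helper). */
static void fill_addr(struct sockaddr_un *addr, const char *path)
{
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strncpy(addr->sun_path, path, sizeof(addr->sun_path) - 1);
}

/*
 * Bind and listen on `path`, then close the listening socket.
 * The socket file stays behind in the file system, which is the
 * state the parent in the test program runs into.
 * Returns 0 on success, -1 on failure.
 */
static int make_stale_socket(const char *path)
{
    struct sockaddr_un addr;
    int s;

    fill_addr(&addr, path);
    unlink(path);
    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(s, 10) < 0) {
        close(s);
        return -1;
    }
    close(s);   /* listener goes away, name remains */
    return 0;
}

/*
 * Try to connect to `path`.
 * Returns 0 if the connect succeeded, otherwise the errno value.
 */
static int try_connect(const char *path)
{
    struct sockaddr_un addr;
    int s, err;

    fill_addr(&addr, path);
    s = socket(AF_UNIX, SOCK_STREAM, 0);
    if (s < 0)
        return errno;
    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        err = errno;
        close(s);
        return err;
    }
    close(s);
    return 0;
}
```

Calling make_stale_socket("/tmp/foo") and then
try_connect("/tmp/foo") should hand back an error code on a
healthy system; on the affected kernel the equivalent connect()
is where the panic happens.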
Fix:
The problem appears to be that when the socket file descriptor
is closed, the vnode is released with VN_RELE(). However, this
does not clear the v_socket member of the struct vnode. If the
connect() is done soon enough that the vnode is still found in
the cache, v_socket will still be set, but it will be pointing
to a struct socket that has had many important fields destroyed
by unp_detach, the AF_UNIX close routine (I think the struct
socket will also have been freed, but that doesn't matter at
the moment). I changed unp_detach (in sys/uipc_usrreq.c) from
unp_detach(unp)
    register struct unpcb *unp;
{
    if (unp->unp_vnode) {
        VN_RELE(unp->unp_vnode);
        unp->unp_vnode = 0;
    }
to
unp_detach(unp)
    register struct unpcb *unp;
{
    if (unp->unp_vnode) {
        unp->unp_vnode->v_socket = 0;
        VN_RELE(unp->unp_vnode);
        unp->unp_vnode = 0;
    }
With this fix, our system survived the above program 7 times
out of 7; without the fix, it crashed 2 out of 2 times (I don't
feel like trying lots of crashes). This is not counting the
crash (of a different system, same software) that made me start
looking in the first place.
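To make the stale pointer easier to see, here is a small
user-space model of the two versions of the detach step. It is
only a sketch: the struct and function names (model_vnode,
model_vn_rele, detach_original, detach_fixed and so on) are
invented for the illustration and are much simpler than the real
kernel structures. The one assumption carried over from the
analysis above is that releasing the vnode drops a reference but
leaves the vnode itself sitting in the cache.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structs. */
struct model_socket {
    int listening;
};

struct model_vnode {
    int refcount;
    struct model_socket *v_socket;  /* back-pointer to the socket */
};

struct model_unpcb {
    struct model_vnode *unp_vnode;
};

/* Models VN_RELE(): drops a reference, vnode stays cached. */
static void model_vn_rele(struct model_vnode *vp)
{
    vp->refcount--;
}

/* Detach as before the change: v_socket is left untouched. */
static void detach_original(struct model_unpcb *unp)
{
    if (unp->unp_vnode) {
        model_vn_rele(unp->unp_vnode);
        unp->unp_vnode = NULL;
    }
}

/* Detach with the change: clear the back-pointer first. */
static void detach_fixed(struct model_unpcb *unp)
{
    if (unp->unp_vnode) {
        unp->unp_vnode->v_socket = NULL;
        model_vn_rele(unp->unp_vnode);
        unp->unp_vnode = NULL;
    }
}

/*
 * What a later lookup of the cached vnode would see: nonzero if
 * the vnode still claims to have a socket attached.
 */
static int cached_vnode_has_socket(const struct model_vnode *vp)
{
    return vp->v_socket != NULL;
}
```

After detach_original() the cached vnode still reports a socket
even though nothing owns it any more, which is the situation a
quick connect() would trip over. After detach_fixed() the lookup
sees no socket and can fail normally.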
der Mouse
(mouse@mcgill-vision.uucp)