spahni@cui.UUCP (SPAHNI Stephane) (12/23/87)
I have a question about sockets on Unix: I tried to use a socket between
two processes, and regularly crash the system (i.e. the system is looping
somewhere in the kernel and is doing nothing else !). The only way to
recover is to halt the system (^P HALT on Vax, L1-A on Sun) and reboot
it !

The case is simple to reproduce. It occurs when the following operations
are done:

   slave program          master program
   -------------          --------------
   create socket          create socket
                          bind()
                          listen()
                          accept()
   connect()
   transfer               transfer
   close()                (loop on unterminated operation)
   connect()
   transfer
   close()
                          Abort program (^C or kill -9)

When the master program is killed, the system never terminates the
operation...

The two programs that I wrote are included below. I just compile them,
execute the master (rcv.c) in the background, and start the slave (send.c)
twice. Then I bring the master to the foreground and abort it.

Could someone see if I have a mistake in my system calls ? Or is it
really a bug in the kernel ?

   Stephane Spahni
   Centre Universitaire d'Informatique
   University of Geneva - Switzerland

PS: The hardware/software on which I have tried these programs:

      Vax 11/780  running 4.2bsd
      Sun 3/160   running SunOS 3.2
      Sun 3/60    running SunOS 3.4
      Sun 4/xx    running SunOS 3.2Beta

Please respond to:  mcvax!cernvax!cui!spahni  (uucp)
                    spahni@cui.unige.ch       (x.400)

*********
* rcv.c *  (master)
*********

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>

int errno;

main ()
{
  int code;
  int s, snew;
  struct sockaddr *name;
  struct sockaddr_un sock;
  char buffer[1024];
  int cpt, ind;

  printf ("Try to create socket...");
  if ((s = socket (AF_UNIX, SOCK_STREAM, 0)) == -1) {
    printf ("Error !\n");
    exit (1);
  }
  else printf ("done.\n");

  printf ("Try to bind...");
  sock.sun_family = AF_UNIX;
  strcpy (sock.sun_path, "NameServerSocket");
  if (bind (s, &sock, strlen (sock.sun_path) + 2) != 0) {
    printf ("Error !\n");
    exit (2);
  }
  else printf ("done.\n");

  printf ("Wait for incoming calls...");
  listen (s, 1);
  printf ("done.\n");

  printf ("Accept connection...");
  if ((snew = accept (s, NULL, 0)) == -1) {
    printf ("Error !\n");
    exit (3);
  }
  else printf ("done.\n");

  fcntl(snew,F_SETFL,O_NDELAY) ;
  do {
    while ((cpt = read (snew, buffer, sizeof(buffer))) > 0)
      for (ind = 0; ind < cpt; ind++)
        printf ("%4d", buffer[ind] & 0xff);
  } while (errno == 0 || errno == 35); /* 35 = operation would block */

  perror("Error returned");
  printf("\ncpt = %d , errno = %d\n", cpt,errno);
  close (s);
}

**********
* send.c *  (slave)
**********

/* Slave program: try to connect to the owner of a socket and send him
   bytes from 0 to 255 (twice each byte) */

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

main ()
{
  int code, cpt, ind;
  int s;
  struct sockaddr *name;
  struct sockaddr_un sock;
  char c;

  printf ("Try to create socket...");
  if ((s = socket (AF_UNIX, SOCK_STREAM, 0)) == -1) {
    printf ("Error !\n");
    exit (1);
  }
  else printf ("done.\n");

  printf ("Try to connect...");
  sock.sun_family = AF_UNIX;
  strcpy (sock.sun_path, "NameServerSocket");
  if ((code = connect (s, &sock, strlen(sock.sun_path) + 2)) != 0) {
    printf ("Error !\n");
    exit (3);
  }
  else printf ("done.\n");

  for (ind = 0; ind < 255; ind++) {
    c = ind;
    cpt = write (s, &c, 1);
    cpt = write (s, &c, 1);
  }

  printf ("Close the socket.\n");
  close (s);
}
jas@llama.rtech.UUCP (Jim Shankland) (01/08/88)
In article <88@cui.UUCP> spahni@cui.UUCP (SPAHNI Stephane) writes:
>I have a question about sockets on Unix: I tried to use a socket between
>two processes, and regularly crash the system (i.e. the system is looping
>somewhere in the kernel and is doing nothing else !). The only way to
>recover is to halt the system (^P HALT on Vax, L1-A on Sun) and reboot
>it !

Looks like a kernel bug I've encountered before.  Somewhere in the UNIX
domain socket code are lines to this effect:

	/* Close any half-open connections of an exiting process. */
	while (the list of half-open connections != NULL)
		soclose( head-of-list );

This loop has the property that it will either execute not at all, or
forever, since the list pointer is not advanced.

Solution (well, workaround): use Internet domain sockets instead of UNIX
domain sockets.  They're only half as fast, even intra-machine, but they
DO work.

Jim Shankland
 ..!ihnp4!cpsc6a!\
                  sun!rtech!jas
 .!ucbvax!mtxinu!/