[comp.unix.questions] System crash when using sockets

spahni@cui.UUCP (12/23/87)

I have a question about sockets on Unix: I tried to use a socket between
two processes, and regularly crash the system (i.e. the system is looping
somewhere in the kernel and is doing nothing else !). The only way to
recover is to halt the system (^P HALT on Vax, L1-A on Sun) and reboot
it !
The case is simple to reproduce. It happens when the following operations
are done:

     slave program                master program
     -------------                --------------
     create socket                 create socket
                                       bind()
                                      listen()
                                      accept()
       connect()
       transfer                      transfer
        close()                      (loop on unterminated operation)
       connect()
       transfer
        close()
                                   Abort program (^C or kill -9)

When the master program is killed, the system never completes the
operation...

The two programs that I wrote are included below. I just compile them,
run the master (rcv.c) in the background, and start the slave (send.c)
twice. Then I bring the master to the foreground and abort it.
Can someone see whether I have made a mistake in my system calls? Or is it
really a bug in the kernel?

                                       Stephane Spahni
                                       Centre Universitaire d'Informatique
                                       University of Geneva - Switzerland
                                       
PS: The hardware/software on which I have tried these programs:
     Vax 11/780 running 4.2bsd
     Sun 3/160  running SunOS 3.2
     Sun 3/60   running SunOS 3.4
     Sun 4/xx   running SunOS 3.2Beta

Please respond to:
  mcvax!cernvax!cui!spahni (uucp)
  spahni@cui.unige.ch (x.400)

*********
* rcv.c * (master)
*********

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>

#include <errno.h>   /* declares errno; replaces the private "int errno;" definition */

main ()

{
int code;
int s, snew;
struct sockaddr *name;
struct sockaddr_un sock;
char   buffer[1024];
int    cpt, ind;

  printf ("Try to create socket...");
  if ((s = socket (AF_UNIX, SOCK_STREAM, 0)) == -1) {
    printf ("Error !\n"); exit (1); }
    else printf ("done.\n");

  printf ("Try to bind...");
  sock.sun_family = AF_UNIX;
  strcpy (sock.sun_path, "NameServerSocket");
  if (bind (s, &sock, strlen (sock.sun_path) + 2) != 0) {
    printf ("Error !\n"); exit (2); }
    else printf ("done.\n");

  printf ("Wait for incoming calls...");
  listen (s, 1);
  printf ("done.\n");

  printf ("Accept connection...");
  if ((snew = accept (s, NULL, 0)) == -1) {
    printf ("Error !\n"); exit (3); }
    else printf ("done.\n");

  fcntl(snew,F_SETFL,O_NDELAY) ;

  do {
    while ((cpt = read (snew, buffer, sizeof(buffer))) > 0) 
      for (ind = 0; ind < cpt; ind++) printf ("%4d", buffer[ind] & 0xff);
    } while (errno == 0 || errno == 35); /* 35 = operation would block */

  perror("Error returned");
  printf("\ncpt = %d , errno = %d\n", cpt,errno);
  close (s);
}

**********
* send.c * (slave)
**********

/* Slave program: try to connect to the owner of a socket and send him
   bytes from 0 to 255 (twice each byte) */

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
main ()

{
int code, cpt, ind;
int s;
struct sockaddr *name;
struct sockaddr_un sock;
char   c;
 
  printf ("Try to create socket...");
  if ((s = socket (AF_UNIX, SOCK_STREAM, 0)) == -1) {
    printf ("Error !\n"); exit (1); }
    else printf ("done.\n");

  printf ("Try to connect...");
  sock.sun_family = AF_UNIX;
  strcpy (sock.sun_path, "NameServerSocket");
  if ((code = connect (s, &sock, strlen(sock.sun_path) + 2)) != 0) {
    printf ("Error !\n"); exit (3); }
    else printf ("done.\n");

  for (ind = 0; ind <= 255; ind++) {   /* 0 to 255 inclusive, as the comment above says */
    c = ind;
    cpt = write (s, &c, 1);
    cpt = write (s, &c, 1);
    }

  printf ("Close the socket.\n"); 
  close (s);
}

jamesa%betelgeuse@Sun.COM (James D. Allen) (01/06/88)

In article <86@cui.UUCP>,  Stephane Spahni writes:
> I have a question about sockets on Unix: I tried to use a socket between
> two processes, and regularly crash the system (i.e. the system is looping
> somewhere in the kernel and is doing nothing else !). The only way to
> recover is to halt the system (^P HALT on Vax, L1-A on Sun) and reboot
> it !

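Stephane's program fails to distinguish read() returning 0 (EOF) from
read() failing with errno 35 (EWOULDBLOCK).  In the former case, the other end
has closed its connection so it's time to try another accept().  Since a
read() that returns 0 leaves errno untouched (still 0 or 35), his do-while
loop never exits.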

This is no excuse for the kernel hang of course.  (The faulty zombie disables
interrupts; otherwise it would just be another "something is hung (won't die)".)
The fact that this bug has not been fixed yet suggests that very few
programmers out there are trying to "accept connections".

The hang arises when soclose() tries to empty so_q but can't.  I don't have an
authoritative fix but sofree() will eventually do the cleanup so just deleting
the first block of code in soclose() lets my machine run Stephane's "test".

For you kids who want to duplicate Stephane's crash, the following should
suffice:

#include <sys/types.h>
#include <sys/socket.h>

short ugly[] = { AF_UNIX, 'GU', 'YL', 0 };

main()
{
	int sr = socket(AF_UNIX, SOCK_STREAM, 0);
	int sw = socket(AF_UNIX, SOCK_STREAM, 0);

	bind(sr, ugly, 6);
	listen(sr, 1);
	connect(sw, ugly, 6);
}
/* -- james allen  */
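
For anyone who would rather not rely on implementation-defined
multi-character constants, the same bind/listen/connect-without-accept
sequence can be sketched with struct sockaddr_un.  This is only an
illustration under assumptions: the function name connect_without_accept
and the socket path passed to it are hypothetical, and the explicit close
order is an assumption about what process exit does in the program above.
On a kernel without the soclose() problem it should simply return 0.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Queue one connection on a listening AF_UNIX socket, never accept() it,
 * then close both ends.  Returns 0 if every step succeeded, -1 otherwise.
 */
int connect_without_accept(const char *path)
{
    struct sockaddr_un addr;
    int sr, sw, rc = -1;

    if (strlen(path) >= sizeof addr.sun_path)
        return -1;                      /* path does not fit in sun_path */
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    sr = socket(AF_UNIX, SOCK_STREAM, 0);   /* listener */
    sw = socket(AF_UNIX, SOCK_STREAM, 0);   /* client */
    if (sr != -1 && sw != -1) {
        unlink(path);                   /* remove a stale socket file */
        if (bind(sr, (struct sockaddr *)&addr, sizeof addr) == 0 &&
            listen(sr, 1) == 0 &&
            connect(sw, (struct sockaddr *)&addr, sizeof addr) == 0)
            rc = 0;                     /* connection now sits unaccepted */
    }

    /* Close the listener first: this close must discard the queued
       connection (assumed to be the step that hung the old kernels). */
    if (sr != -1) close(sr);
    if (sw != -1) close(sw);
    unlink(path);
    return rc;
}
```

A caller would invoke it as connect_without_accept("SomeSocketName") from
a directory where it is allowed to create the socket file.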