tbray@watdragon.waterloo.edu (Tim Bray) (02/19/90)
To state the problem simply: standard BSD-style co-operative socket setup via socket(), bind(), listen(), socket(), connect(), accept(). The calls work fine and the two processes are talking. The problem is that when the server does a select() that includes the new connection in the bitmask right after it's established, it may see an exception condition advertised. This is not the case on `real' Berkeley systems. However, if that exception is ignored, the two processes can proceed to use the connection just fine.

The details: a fairly standard TCP/IP application (*every* function return value is checked; the checks are suppressed here for brevity). The server process does the following:

    int s;
    struct sockaddr_in server;

    s = socket(AF_INET, SOCK_STREAM, 0);
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = INADDR_ANY;
    server.sin_port = 0;
    bind(s, &server, sizeof(server));
    length = sizeof(server);
    getsockname(s, &server, &length);
    /* advertise the socket id (server.sin_port) */

    /* ... later ... */
    listen(s, 5);

    /* in event loop */
    new_client = accept(s, 0, 0);
    /* and chatter away */

The client does the following:

    hp = gethostbyname(host);
    server.sin_family = AF_INET;
    bcopy(hp->h_addr, &server.sin_addr, hp->h_length);
    /* kernel is the advertised socket from the server */
    server.sin_port = htons(kernel);
    session = socket(AF_INET, SOCK_STREAM, 0);
    connect(session, &server, sizeof(server));

Everything's OK now; the client and server are talking. Now the server drops into a loop where, among other things, it does a non-blocking select() on several input files, including the just-established 'new_client', to see what's up. The first time is very soon after the accept() call just above. Depending on circumstances, the client may have sent a message through the socket, which may or may not have arrived.

Imagine the server's surprise when, on 386/ix 2.0.2, TCP/IP 1.1.2, select() says there's an exception! errno isn't set.
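
For anyone wanting to reproduce the check, the non-blocking select() described above can be sketched roughly as follows. This is not the poster's code: `poll_once` and `struct poll_result` are hypothetical names invented for illustration, and the sketch assumes a system that provides `<sys/select.h>`. It asks select() about one descriptor with a zero timeout and reports separately whether the descriptor came back in the read set and in the exception set.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper type, not from the original post. */
struct poll_result {
    int nready;      /* return value of select(); -1 on error */
    int readable;    /* nonzero if fd is in the returned read set */
    int exceptional; /* nonzero if fd is in the returned exception set */
};

/* Poll a single descriptor without blocking. */
static struct poll_result poll_once(int fd)
{
    struct poll_result r = { 0, 0, 0 };
    fd_set rfds, efds;
    struct timeval tv = { 0, 0 };   /* zero timeout: return at once */

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_ZERO(&efds);
    FD_SET(fd, &rfds);
    FD_SET(fd, &efds);

    r.nready = select(fd + 1, &rfds, NULL, &efds, &tv);
    if (r.nready > 0) {
        /* The sets are only meaningful when select() reports ready fds. */
        r.readable = FD_ISSET(fd, &rfds) != 0;
        r.exceptional = FD_ISSET(fd, &efds) != 0;
    }
    return r;
}
```

Called on a freshly accepted stream socket with nothing pending, one would normally expect `readable` and `exceptional` both to be zero; on BSD-derived stacks the exception set is typically used to signal out-of-band data. The behaviour reported in this post corresponds to `exceptional` coming back nonzero on that first call.
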
So I tried doing a read() of one byte on the exception-labelled file; it failed with EBADMSG (the sys_errlist message, however, does not correspond to the comment in sys/errno.h, grrr). This code is known to work on Sun, Ultrix, 4.3BSD, Sequent, etc.

In a fit of desperation, I put in a hack saying 'Ignore exceptional conditions on that first select()' (a sick, twisted thing to do) and everything went just fine. All this ugly stuff is hidden down in low-level routines so that the software that's really doing work knows nothing about sockets or such ugliness. So one smart(?) answer would be to recode it all using Real System Five stuff...

Well anyhow,
Tim Bray
Open Text Systems, Waterloo, Ont.
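
The 'ignore exceptional conditions on that first select()' hack could look something like the sketch below. The post does not show the actual code, so everything here is an assumption: `struct conn`, its `polled_once` field, and `exception_counts` are hypothetical names. The idea is simply to keep one flag per connection and discard whatever the exception set says the first time the connection is polled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection record; field names are illustrative. */
struct conn {
    int fd;           /* descriptor returned by accept() */
    int polled_once;  /* 0 until the first select() result has been seen */
};

/*
 * Decide whether an exception reported by select() should be acted on.
 * The report from the first select() after accept() is always discarded;
 * later reports are passed through unchanged.
 */
static int exception_counts(struct conn *c, int select_says_exception)
{
    int first;

    assert(c != NULL);
    first = !c->polled_once;
    c->polled_once = 1;

    if (first)
        return 0;               /* the workaround: ignore the first report */
    return select_says_exception != 0;
}
```

The obvious cost of this approach is that a genuine exceptional condition, such as out-of-band data arriving before the first poll, would be silently dropped as well.
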