robert@SPAM.ISTC.SRI.COM (Robert Allen) (04/07/88)
Recently I was experimenting with client and server processes that
use TCP sockets for Inter Process Communication (IPC), under SunOS 3.4
with the 4.2 BSD networking kernel.  I discovered several points which
I think have some relevance to this list, so I thought I'd mention them
and open the topic for discussion.

Note: I discuss the issues first.  I've included dumps from netstat
after the discussion to show exactly what I saw; they are labeled
appropriately.

Background
----------

    - 1 Server process listening at a port number for connections
      from processes attempting connections at that port.  The
      backlog given to the UNIX listen() call is 5.

    - 30 Client processes attempting, somewhat sequentially, to
      connect to the Server at the agreed-upon port number.

Issues
------

    - Due to UNIX implementation issues, the Server can accept up to
      26 client connections.  Beyond that point the limit on the
      maximum number of open file descriptors per process is exceeded.

    - If the Server attempts to support a 27th client connection then
      two things happen simultaneously:

        1. The Server hears the connection request and attempts an
           accept().  The accept() fails due to the UNIX restriction
           on the maximum number of open file descriptors per process.
           Thus, the Server cannot accept the 27th connection.

        2. The Client makes a connect() call, and the call succeeds!
           A "netstat -a" shows the TCP connection for that socket as
           "ESTABLISHED".  The connections in the listen() backlog are
           all ESTABLISHED as well.  These connections can be picked up
           by the Server if it closes some of its open file descriptors
           (previously accepted connections) and does some more
           accept()s.

    - The "netstat -a" output shows that the "Recv-Q" is filling up,
      but since the Server process cannot accept the connection, it
      cannot read from it, and hence the system cannot free the data
      that has been buffered.
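The accept() failure described in item 1 can be provoked on purpose by lowering the per-process descriptor limit. What follows is only a minimal sketch, assuming a POSIX system that has getrlimit()/setrlimit() (this is not the SunOS 3.4 setup described above). The names make_listener() and accept_errno_at_fd_limit() are made up for illustration. The sketch only reports the errno that accept() returns; it does not try to show what happens to the pending connection afterwards.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/resource.h>
#include <sys/socket.h>

/* Hypothetical helper: TCP listener on 127.0.0.1, kernel-chosen port,
 * backlog of 5 as in the experiment.  Returns the fd or -1. */
static int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                 /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns the errno from accept() once no descriptor is left, 0 if
 * accept() unexpectedly succeeded, or -1 if the setup itself failed. */
int accept_errno_at_fd_limit(void)
{
    struct sockaddr_in addr;
    struct rlimit old, low;
    int lfd, cfd, sfd, err = 0;

    lfd = make_listener(&addr);
    if (lfd < 0)
        return -1;
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) {
        close(lfd);
        return -1;
    }
    /* The client's connect() completes; the connection now waits in
     * the listen backlog. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getrlimit(RLIMIT_NOFILE, &old) < 0) {
        close(cfd);
        close(lfd);
        return -1;
    }
    /* socket() hands out the lowest free descriptor, so every fd below
     * cfd is in use.  Capping the limit at cfd + 1 leaves none free. */
    low = old;
    low.rlim_cur = (rlim_t)cfd + 1;
    if (setrlimit(RLIMIT_NOFILE, &low) < 0) {
        close(cfd);
        close(lfd);
        return -1;
    }
    sfd = accept(lfd, NULL, NULL);      /* needs a new fd: should fail */
    if (sfd < 0)
        err = errno;
    else
        close(sfd);

    setrlimit(RLIMIT_NOFILE, &old);     /* put the old limit back */
    close(cfd);
    close(lfd);
    return err;
}
```

Calling accept_errno_at_fd_limit() from a small main() and printing strerror() of the result should report the "too many open files" condition on systems that behave as assumed here.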

Questions
---------

Some people will probably point out that what I've discovered is an
excellent reason to implement a session-layer protocol on top of the
sockets between the Client and Server here.  This is the same
conclusion I've arrived at.  However, up until now I supposed that
the UNIX socket implementation *provided* a session-layer protocol.
While I was obviously wrong in this instance, I'm curious to know
whether I was assigning the socket interface more significance than
it merited (i.e. it *isn't* a session-layer protocol), or whether this
is a (pardon the word) 'bug' in the implementation.  I'm not trying to
cast aspersions on the implementation with this question!  Although
this test was done on a Sun 3, I will soon be repeating it on a
Hewlett-Packard 9000 series computer, and I don't know what results
I'll get there.  If this is a bug, is it a 4.2 bug?  A 4.3 bug?  A
Sun bug?

On a more direct note, what do the entries in the "Recv-Q" mean?  Does
each element correspond to a TCP message, or an mbuf, or what?  How
high can the queue go (to the TCP high-water mark)?  What happens when
it reaches that point: does the system crash, or do the Clients'
write() calls block?

Comments/Explanations will be much appreciated,

Robert Allen, robert@spam.istc.sri.com
415-859-2143 (work phone, days)

/****************************************************************************/
/* In all the output below only sockets used for the Client or Server are   */
/* listed.  The listen socket at port 3030 is the Server listen socket.     */
/* The command used in all cases is "netstat -a"                            */
/****************************************************************************/

/* The following netstat was taken as the 30 Client processes were
 * attempting to connect to the server process.  Some are established
 * and some are still getting established.
 */

Active connections (including servers)
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp        0      0  milk10.3030        milk10.1643        ESTABLISHED
tcp        0      0  milk10.1643        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1642     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1641        ESTABLISHED
tcp        0      0  milk10.1641        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1640     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1639        ESTABLISHED
tcp        0      0  milk10.1639        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1638     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1637        ESTABLISHED
tcp        0      0  milk10.1637        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1636     localhost.sunrpc   TIME_WAIT
tcp        0      0  localhost.1635     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1634        ESTABLISHED
tcp        0      0  milk10.1634        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1633     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1632        ESTABLISHED
tcp        0      0  milk10.1632        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1631     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1630        ESTABLISHED
tcp        0      0  milk10.1630        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1629     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1628        ESTABLISHED
tcp        0      0  milk10.1628        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1627     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1626        ESTABLISHED
tcp        0      0  milk10.1626        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1625     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1624        ESTABLISHED
tcp        0      0  milk10.1624        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1623     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1622        ESTABLISHED
tcp        0      0  milk10.1622        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1621     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1620        ESTABLISHED
tcp        0      0  milk10.1620        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1619     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1618        ESTABLISHED
tcp        0      0  milk10.1618        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1617     localhost.sunrpc   TIME_WAIT
tcp        0      0  milk10.3030        milk10.1616        ESTABLISHED
tcp        0      0  milk10.1616        milk10.3030        ESTABLISHED
tcp        0      0  localhost.1615     localhost.sunrpc   TIME_WAIT
tcp        0      0  localhost.1614     localhost.sunrpc   TIME_WAIT
tcp        0      0  *.3030             *.*                LISTEN

/* A few seconds later.  All sockets which the Server can accept on are
 * full.  All clients' connect() calls have returned 0, which indicates
 * that they have connected.  Note that 4 connections which show "ESTABLISHED"
 * have "Recv-Q"'s of 11.  These are the Clients which think they are
 * connected to the Server based on the connect() call return of 0, but
 * which in actuality are still in the listen backlog of the Server.  Left
 * alone, the Clients will be able to send data through the socket, and will
 * get no UNIX error indications, because they are connected to the peer TCP
 * entity.
 */

Active connections (including servers)
Proto Recv-Q Send-Q  Local Address      Foreign Address    (state)
tcp        0      0  localhost.1677     localhost.sunrpc   TIME_WAIT
tcp       11      0  milk10.3030        milk10.1675        ESTABLISHED
tcp        0      0  milk10.1675        milk10.3030        ESTABLISHED
tcp       11      0  milk10.3030        milk10.1673        ESTABLISHED
tcp        0      0  milk10.1673        milk10.3030        ESTABLISHED
tcp       11      0  milk10.3030        milk10.1671        ESTABLISHED
tcp        0      0  milk10.1671        milk10.3030        ESTABLISHED
tcp       11      0  milk10.3030        milk10.1669        ESTABLISHED
tcp        0      0  milk10.1669        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1667        ESTABLISHED
tcp        0      0  milk10.1667        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1665        ESTABLISHED
tcp        0      0  milk10.1665        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1663        ESTABLISHED
tcp        0      0  milk10.1663        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1661        ESTABLISHED
tcp        0      0  milk10.1661        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1659        ESTABLISHED
tcp        0      0  milk10.1659        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1657        ESTABLISHED
tcp        0      0  milk10.1657        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1655        ESTABLISHED
tcp        0      0  milk10.1655        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1653        ESTABLISHED
tcp        0      0  milk10.1653        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1651        ESTABLISHED
tcp        0      0  milk10.1651        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1649        ESTABLISHED
tcp        0      0  milk10.1649        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1647        ESTABLISHED
tcp        0      0  milk10.1647        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1645        ESTABLISHED
tcp        0      0  milk10.1645        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1643        ESTABLISHED
tcp        0      0  milk10.1643        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1641        ESTABLISHED
tcp        0      0  milk10.1641        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1639        ESTABLISHED
tcp        0      0  milk10.1639        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1637        ESTABLISHED
tcp        0      0  milk10.1637        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1634        ESTABLISHED
tcp        0      0  milk10.1634        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1632        ESTABLISHED
tcp        0      0  milk10.1632        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1630        ESTABLISHED
tcp        0      0  milk10.1630        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1628        ESTABLISHED
tcp        0      0  milk10.1628        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1626        ESTABLISHED
tcp        0      0  milk10.1626        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1624        ESTABLISHED
tcp        0      0  milk10.1624        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1622        ESTABLISHED
tcp        0      0  milk10.1622        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1620        ESTABLISHED
tcp        0      0  milk10.1620        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1618        ESTABLISHED
tcp        0      0  milk10.1618        milk10.3030        ESTABLISHED
tcp        0      0  milk10.3030        milk10.1616        ESTABLISHED
tcp        0      0  milk10.1616        milk10.3030        ESTABLISHED
tcp        0      0  *.3030             *.*                LISTEN
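
The state shown in the second dump (connect() has returned 0 and 11 bytes are sitting in the Recv-Q of a connection the Server has not accepted) can be reproduced in a single process. This is a rough sketch only, assuming a POSIX sockets API and a working loopback interface. The names make_listener() and late_accept_demo() and the "hello world" payload are invented for the example; the original Client and Server code is not shown in this message.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: TCP listener on 127.0.0.1, kernel-chosen port,
 * backlog of 5 as in the experiment.  Returns the fd or -1. */
static int make_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0;                 /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Connect and send 11 bytes BEFORE the server side calls accept(),
 * then accept late and count what had been queued.
 * Returns the number of bytes read, or -1 on error. */
int late_accept_demo(void)
{
    struct sockaddr_in addr;
    char buf[64];
    ssize_t n;
    int lfd, cfd, sfd, total = 0;

    lfd = make_listener(&addr);
    if (lfd < 0)
        return -1;
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0) {
        close(lfd);
        return -1;
    }
    /* Nobody has called accept() yet, but connect() and write() both
     * succeed: the peer TCP completes the handshake and queues data. */
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        write(cfd, "hello world", 11) != 11) {
        close(cfd);
        close(lfd);
        return -1;
    }
    shutdown(cfd, SHUT_WR);             /* so the reader sees EOF */

    /* A "netstat -a" at this point would be the place to look for the
     * queued bytes.  Now accept late and drain them. */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) {
        close(cfd);
        close(lfd);
        return -1;
    }
    while ((n = read(sfd, buf, sizeof(buf))) > 0)
        total += (int)n;

    close(sfd);
    close(cfd);
    close(lfd);
    return n < 0 ? -1 : total;
}
```

If the sketch behaves as assumed, late_accept_demo() returns the same 11 bytes the client wrote before the accept() happened, matching the Recv-Q value in the dump.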
chris@GYRE.UMD.EDU (Chris Torek) (04/07/88)
If the server attempts an accept() call which returns with an EMFILE
(`too many open files') error, the pending connection should NOT be
established.  I.e., what you are seeing is a bug.  4.3BSD does not
have this bug (see /sys/sys/uipc_syscalls.c$accept(), around line 137).

Chris