thomson@utcsrgv.UUCP (Brian Thomson) (12/21/83)
Index:  sys/uipc_socket.c h/socketvar.h 4.2BSD

Description:
    If you do a select() for writing on a non-blocking SOCK_STREAM
    socket, and there is some send queue buffer space available, it will
    tell you the socket can be written.  But sosend() insists that all
    writes to non-blocking sockets be atomic, and will return EWOULDBLOCK
    if there is not enough buffer space for the entire write to go in one
    shot.  This behaviour is OK for non-stream sockets, but streams should
    allow partial writes.  A couple of distributed utilities agree with
    me ...

Repeat-by:
    Both rlogind(8C) and telnetd(8C) are prepared for partial socket
    writes.  Try this:

        % rlogin localhost
        < message of the day >
        % cat /usr/dict/words
        <blah>
        <blah>
        <blah>
        ~^Z             (i.e. suspend the rlogin locally)
        Stopped
        % jobs
        [1]  Stopped    rlogin localhost
        %

    An iostat at this point will show (unless you happen to exactly fill
    the send queue) that your system is being eaten alive by rlogind.

Fix:
    Allow partial writes to non-blocking sockets unless the underlying
    protocol is atomic.  This is consistent with the behaviour of
    non-blocking ttys, which are a good model for stream-oriented
    sockets.
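A minimal sketch of the mismatch described above, written as a user-space C model rather than kernel code.  The function names are hypothetical; `space`, `resid` and `hiwat` stand in for sbspace(&so->so_snd), uio->uio_resid and so->so_snd.sb_hiwat.  The first function models only the space term of the test select() applies; the second follows the distributed sosend() path for a non-blocking socket, where SS_NBIO alone makes sosendallatonce() true.

```c
#include <errno.h>

/* Hypothetical user-space model of the two 4.2BSD checks; not kernel
 * code.  "space" stands in for sbspace(&so->so_snd), "resid" for
 * uio->uio_resid and "hiwat" for so->so_snd.sb_hiwat. */

/* Space term of the select() test: writable once any space is free. */
int model_sowriteable(int space)
{
    return space > 0;
}

/* Distributed sosend() on a non-blocking socket: SS_NBIO makes
 * sosendallatonce() true, so the whole write must fit in one go.
 * Returns bytes accepted, or a negated errno value. */
int model_old_sosend_nbio(int space, int resid, int hiwat)
{
    if (resid > hiwat)
        return -EMSGSIZE;       /* could never fit in one shot */
    if (space <= 0 || space < resid)
        return -EWOULDBLOCK;    /* does not fit right now */
    return resid;               /* whole write queued at once */
}
```

With, say, 100 bytes free and a 1024-byte write pending, model_sowriteable(100) is true while model_old_sosend_nbio(100, 1024, 2048) fails with EWOULDBLOCK.  A program that loops on select() and write() therefore never sleeps and, while the reading side is stopped, never makes progress: the rlogind spin shown above.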
In file /sys/h/socketvar.h, change:

    #define sosendallatonce(so) \
        (((so)->so_state & SS_NBIO) || ((so)->so_proto->pr_flags & PR_ATOMIC))

to

    #define sosendallatonce(so) \
        ((so)->so_proto->pr_flags & PR_ATOMIC)

In file /sys/sys/uipc_socket.c, routine sosend(), diff -c shows:

***************
*** 281,286
      register int space;
      int len, error = 0, s, dontroute;
      struct sockbuf sendtempbuf;

      if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
          return (EMSGSIZE);
--- 287,293 -----
      register int space;
      int len, error = 0, s, dontroute;
      struct sockbuf sendtempbuf;
+     int sentsome = 0;

      if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
          return (EMSGSIZE);
***************
*** 324,329
              goto release;
          }
          mp = &top;
      }
      if (uio->uio_resid == 0) {
          splx(s);
--- 331,337 -----
              goto release;
          }
          mp = &top;
+         sentsome = 1;
      }
      if (uio->uio_resid == 0) {
          splx(s);
***************
*** 336,342
          if (space <= 0 ||
              sosendallatonce(so) && space < uio->uio_resid) {
              if (so->so_state & SS_NBIO)
!                 snderr(EWOULDBLOCK);
              sbunlock(&so->so_snd);
              sbwait(&so->so_snd);
              splx(s);
--- 344,353 -----
          if (space <= 0 ||
              sosendallatonce(so) && space < uio->uio_resid) {
              if (so->so_state & SS_NBIO)
!                 if(sentsome)
!                     { splx(s); goto release; }
!                 else
!                     snderr(EWOULDBLOCK);
              sbunlock(&so->so_snd);
              sbwait(&so->so_snd);
              splx(s);

Reservation:
    You should probably HOLD OFF installing this change until it gets
    batted about the net a bit.  The original behaviour appears to have
    been quite deliberate, and although I do think it's wrong, I'd like to
    give someone in the know a chance to explain the unobvious reason that
    it was right in the first place!
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson
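The patched non-blocking path can be modelled the same way.  This is a hypothetical sketch, not the kernel routine: it assumes nothing drains the send queue during the call, and it moves min(space, resid) bytes per pass instead of building mbuf chains.  `atomic` stands in for PR_ATOMIC, which after the change is the only thing that makes sosendallatonce() true, and `sentsome` mirrors the new local variable in the diff.

```c
#include <errno.h>

/* Hypothetical model of the patched non-blocking path in sosend();
 * not the kernel code.  The send queue has "space" bytes free and is
 * not drained during the call.  "atomic" models PR_ATOMIC.
 * Returns bytes accepted, or a negated errno value. */
int model_new_sosend_nbio(int space, int resid, int hiwat, int atomic)
{
    int sent = 0;
    int sentsome = 0;           /* mirrors the new local in the patch */

    /* sosendallatonce() is now true only for atomic protocols. */
    if (atomic && resid > hiwat)
        return -EMSGSIZE;
    while (resid > 0) {
        if (space <= 0 || (atomic && space < resid)) {
            if (sentsome)
                return sent;    /* partial write: report what went in */
            return -EWOULDBLOCK;
        }
        int chunk = space < resid ? space : resid;
        space -= chunk;         /* chunk handed to the protocol */
        resid -= chunk;
        sent += chunk;
        sentsome = 1;
    }
    return sent;
}
```

For a stream socket with 100 bytes free, a 1024-byte write now returns 100 instead of EWOULDBLOCK, and a write larger than the high-water mark no longer draws EMSGSIZE.  A PR_ATOMIC socket in the same state still gets EWOULDBLOCK, as before.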
ka@hou3c.UUCP (Kenneth Almquist) (12/22/83)
It looks like Brian's proposed change would make writes to pipes non-atomic.
                                        Kenneth Almquist
thomson@utcsrgv.UUCP (Brian Thomson) (01/17/84)
It's been almost a month since I suggested allowing partial writes to
non-blocking stream sockets.  Faithful readers will remember that such a
change cures a tendency of rlogind(8C) and telnetd(8C) to devour cpu
cycles.  I also asked for comments.  Let's delve into the ol' newsbag and
see what lively discussion ensued:

>From: ka@hou3c.UUCP (Kenneth Almquist)
>Message-ID: <148@hou3c.UUCP>
>Date: Thu, 22-Dec-83 10:17:31 EST
>Organization: Bell Labs, Holmdel, NJ
>
>It looks like Brian's proposed change would make writes to pipes non-atomic.
>                                        Kenneth Almquist

Well ... yes and no.  It only affects writes to pipes whose write ends
have been marked non-blocking.  Such writes are indeed rendered
non-atomic, but I don't know how important that is.

But note that ordinary, synchronous pipes may not be 'atomic' in 4.2,
depending on your definition of that term.  Two reasonable definitions
are:

  1) No other process attempting to write on the socket (pipe) can
     intersperse data.
  2) The written data will be buffered in the kernel at one go, so the
     write can complete without a corresponding read being performed.

Previous Unix systems guaranteed 1) if the write was less than PIPSIZ,
and guaranteed 2) if the pipe was not full and the write was less than
PIPSIZ.

On synchronous stream sockets, the distributed 4.2 guarantees property 1)
for writes of any size, but 2) only happens if there is adequate space in
the pipe's send queue.  My modifications do not affect this behaviour.

On non-blocking stream sockets, distributed 4.2 REQUIRES that all writes
to pipes be less than PIPSIZ (the limit also exists for non-pipes but may
differ in value).  It guarantees either that both 1) and 2) will be
satisfied or that the write will fail: with EMSGSIZE if it is too big, or
with EWOULDBLOCK if the send queue lacks room for all of it right now.
My modifications remove the size requirement for all stream sockets by
permitting partial writes, but sacrifice atomicity by both definitions.
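For comparison only, on a modern POSIX system rather than 4.2BSD: the POSIX write() specification, which postdates this thread, makes a non-blocking pipe write of at most PIPE_BUF bytes all-or-nothing (EAGAIN if it does not fit) and lets a larger one go in partially.  The sketch assumes a Linux- or BSD-like system whose pipe capacity is well under 4 MB; `pipe_demo()` and `struct pipe_result` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

#define BIG_WRITE (4L * 1024 * 1024)  /* assumed larger than the pipe */

/* Results of the hypothetical pipe_demo() below. */
struct pipe_result {
    ssize_t big;        /* return value of the oversized write */
    int small_errno;    /* errno from a PIPE_BUF write into the full pipe */
    ssize_t small;      /* PIPE_BUF write after the pipe is drained */
};

int pipe_demo(struct pipe_result *r)
{
    static char buf[BIG_WRITE];
    int fds[2];
    ssize_t got, total = 0;

    if (pipe(fds) == -1)
        return -1;
    /* Mark only the write end non-blocking, as in the thread. */
    fcntl(fds[1], F_SETFL, fcntl(fds[1], F_GETFL) | O_NONBLOCK);

    /* More than PIPE_BUF bytes: POSIX allows a partial write. */
    r->big = write(fds[1], buf, BIG_WRITE);

    /* PIPE_BUF bytes or fewer: all-or-nothing, so EAGAIN when full. */
    r->small_errno = 0;
    if (write(fds[1], buf, PIPE_BUF) == -1)
        r->small_errno = errno;

    /* Drain what the big write queued (the read end still blocks). */
    while (total < r->big && (got = read(fds[0], buf, BIG_WRITE)) > 0)
        total += got;

    /* With the pipe empty, the same small write goes in whole. */
    r->small = write(fds[1], buf, PIPE_BUF);

    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

On such a system the oversized write returns a positive count smaller than the request, the PIPE_BUF-byte write into the now-full pipe fails with EAGAIN without transferring anything, and the same write succeeds whole once the pipe has been drained.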
I also make select() and send() agree on whether a socket can be written,
which was the original intent of the change.

If it becomes necessary, atomicity can be restored by employing the
(currently unused) so->so_snd.sb_lowat field.  Set it to PIPSIZ-1 for
pipes, and leave it 0 for everyone else, then change the definition of
sowriteable() to include the term

        sbspace(&so->so_snd) > so->so_snd.sb_lowat

instead of

        sbspace(&so->so_snd) > 0

But I don't want to start using a field that the Berkeley people may
have designs for, and besides, I'm not so sure that anything is broken
enough to need fixing.  Just how dependent are people on atomic pipes,
and what kind of atomicity do they need?

Hmm ... I seem to have reached the bottom of the ol' newsbag.  Not such a
lively discussion after all.  Maybe nobody cares.  Sniff.
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson
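A minimal sketch of the proposed sowriteable() change, again as a hypothetical user-space model.  `space` and `lowat` stand in for sbspace(&so->so_snd) and so->so_snd.sb_lowat, only the space term of the macro is modelled, and MODEL_PIPSIZ is an assumed value chosen for illustration rather than one taken from the 4.2BSD sources.

```c
/* Hypothetical model of the proposed sowriteable() change; not kernel
 * code.  MODEL_PIPSIZ is an assumed value, for illustration only. */
#define MODEL_PIPSIZ 4096

/* Distributed space term: writable when any space is free. */
int model_sowriteable_old(long space)
{
    return space > 0;
}

/* Proposed space term: writable only when space exceeds sb_lowat. */
int model_sowriteable_lowat(long space, long lowat)
{
    return space > lowat;
}

/* Proposed settings: PIPSIZ-1 for pipes, 0 for every other socket. */
long model_lowat_for(int is_pipe)
{
    return is_pipe ? MODEL_PIPSIZ - 1 : 0;
}
```

With the low-water mark left at 0 the new term reduces to the old one, so only pipes change: a pipe with 100 bytes free is no longer reported writable, while one with a full MODEL_PIPSIZ bytes free is.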
thomson@utcsrgv.UUCP (Brian Thomson) (01/18/84)
Oops.  I said ...

        "On synchronous stream sockets, the distributed 4.2 guarantees
        [no interspersal of data from contending writes on the same
        socket] for writes of any size..."

Sorry.  I was, uh, mistaken.  Ok, ok, to not mince words, I was WRONG.
It does no such thing: the send queue is not locked continuously during
long writes.
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson
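The correction can be pictured with a deterministic toy model (hypothetical, not kernel code).  The sosend() context in the patch shows the send buffer being released with sbunlock() before sbwait(), so a writer that fills the queue and sleeps lets a second writer in.  The sketch assumes strict alternation between two writers and a reader that empties the queue each time a writer sleeps; real scheduling is not that tidy.

```c
#include <string.h>

/* Hypothetical, deterministic model of two processes doing blocking
 * writes on one stream socket.  Each writer appends until the send
 * queue (capacity "cap") is full, then sleeps; the lock is not held
 * while it sleeps, so the other writer runs next.  The reader drains
 * the queue whenever a writer sleeps.  Writer A sends "na" 'A's and
 * writer B sends "nb" 'B's; out receives what the reader sees and must
 * hold at least na + nb + 1 bytes. */
void model_interleave(int na, int nb, int cap, char *out)
{
    int left[2] = { na, nb };
    const char tag[2] = { 'A', 'B' };
    int turn = 0, n = 0;

    while (left[0] > 0 || left[1] > 0) {
        if (left[turn] > 0) {
            int chunk = left[turn] < cap ? left[turn] : cap;
            memset(out + n, tag[turn], chunk);  /* fill the queue; */
            n += chunk;                         /* reader drains it */
            left[turn] -= chunk;
        }
        turn = !turn;   /* current writer sleeps or finishes; switch */
    }
    out[n] = '\0';
}
```

Two 8-byte writes through a 4-byte queue come out as AAAABBBBAAAABBBB, which breaks definition 1); two 3-byte writes that each fit in the queue come out unmixed as AAABBB.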