thomson@utcsrgv.UUCP (Brian Thomson) (12/21/83)
Index: sys/uipc_socket.c h/socketvar.h 4.2BSD
Description:
If you do a select() for writing on a non-blocking
SOCK_STREAM socket, and there is some send queue buffer space
available, it will tell you the socket can be written.
But sosend() insists that all writes to non-blocking sockets
be atomic, and will return EWOULDBLOCK if there is not enough
buffer space for the entire write to go in one shot.
This behaviour is OK for non-stream sockets, but streams
should allow partial writes. A couple of distributed utilities
agree with me ...
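For anyone who wants the user-level side spelled out, here is a minimal
sketch in C of one select()-then-write step on a non-blocking descriptor.
The helper names write_some() and set_nonblocking() are hypothetical and
are not taken from rlogind or telnetd, and the code assumes a POSIX-style
environment rather than the 4.2BSD headers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: mark a descriptor non-blocking. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: wait until select() reports fd writable, then
 * attempt a single write.  Returns the number of bytes accepted (which
 * may be fewer than len), 0 if the write would block even though
 * select() said the descriptor was writable, or -1 on any other error.
 */
ssize_t write_some(int fd, const char *buf, size_t len)
{
    fd_set wfds;
    ssize_t n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0)
        return -1;

    n = write(fd, buf, len);
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return 0;       /* select() and write() disagreed */
    return n;
}
```

A caller prepared for partial writes advances its buffer by the return
value and calls write_some() again. If the return value is 0 every time,
the caller spins without making progress, which is the disagreement
described above.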
Repeat-by:
Both rlogind(8C) and telnetd(8C) are prepared for partial
socket writes. Try this:
% rlogin localhost
< message of the day >
% cat /usr/dict/words
<blah>
<blah>
<blah>
~^Z (i.e. suspend the rlogin locally)
Stopped
% jobs
[1] Stopped rlogin localhost
%
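An iostat at this point will show (unless you happen to exactly fill
the send queue) that your system is being eaten alive by rlogind:
select() keeps reporting the socket writable, the write keeps failing
with EWOULDBLOCK, and rlogind retries immediately.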
Fix:
Allow partial writes to non-blocking sockets unless the
underlying protocol is atomic. This is consistent with the
behaviour of non-blocking ttys, which are a good model
for stream-oriented sockets.
In file /sys/h/socketvar.h, change:
#define sosendallatonce(so) \
(((so)->so_state & SS_NBIO) || ((so)->so_proto->pr_flags & PR_ATOMIC))
to
#define sosendallatonce(so) \
((so)->so_proto->pr_flags & PR_ATOMIC)
In file /sys/sys/uipc_socket.c, routine sosend(), diff -c shows:
***************
*** 281,286
register int space;
int len, error = 0, s, dontroute;
struct sockbuf sendtempbuf;
if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
return (EMSGSIZE);
--- 287,293 -----
register int space;
int len, error = 0, s, dontroute;
struct sockbuf sendtempbuf;
+ int sentsome = 0;
if (sosendallatonce(so) && uio->uio_resid > so->so_snd.sb_hiwat)
return (EMSGSIZE);
***************
*** 324,329
goto release;
}
mp = &top;
}
if (uio->uio_resid == 0) {
splx(s);
--- 331,337 -----
goto release;
}
mp = &top;
+ sentsome = 1;
}
if (uio->uio_resid == 0) {
splx(s);
***************
*** 336,342
if (space <= 0 ||
sosendallatonce(so) && space < uio->uio_resid) {
if (so->so_state & SS_NBIO)
! snderr(EWOULDBLOCK);
sbunlock(&so->so_snd);
sbwait(&so->so_snd);
splx(s);
--- 344,353 -----
if (space <= 0 ||
sosendallatonce(so) && space < uio->uio_resid) {
if (so->so_state & SS_NBIO)
! if(sentsome)
! { splx(s); goto release; }
! else
! snderr(EWOULDBLOCK);
sbunlock(&so->so_snd);
sbwait(&so->so_snd);
splx(s);
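The effect of the sentsome flag can be seen in a simplified userland
model of the patched decision, sketched here in C. This is not the
kernel code: struct model_sock and model_send() are hypothetical names,
only non-blocking sockets are modeled, and mbuf handling, locking and
spl levels are left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the few socket fields the decision uses. */
struct model_sock {
    int nbio;    /* models so_state & SS_NBIO */
    int atomic;  /* models so_proto->pr_flags & PR_ATOMIC */
    int space;   /* models free bytes in the send queue */
};

/*
 * Model of a non-blocking send with the patch applied.
 * Returns the number of bytes queued and sets *error to 0 or
 * EWOULDBLOCK.  EWOULDBLOCK is reported only if nothing was sent.
 */
int model_send(struct model_sock *so, int resid, int *error)
{
    int sent = 0, sentsome = 0, chunk;

    assert(so->nbio);   /* blocking sockets are not modeled */
    *error = 0;
    while (resid > 0) {
        /* Same test as the patched sosend(): no room at all, or an
         * atomic protocol without room for the whole write. */
        if (so->space <= 0 || (so->atomic && so->space < resid)) {
            if (!sentsome)
                *error = EWOULDBLOCK;
            break;      /* partial write: return what was queued */
        }
        chunk = resid < so->space ? resid : so->space;
        so->space -= chunk;
        resid -= chunk;
        sent += chunk;
        sentsome = 1;
    }
    return sent;
}
```

With 100 bytes free, a 300-byte write on a stream socket queues 100
bytes and returns without error; the same write on an atomic protocol
queues nothing and gets EWOULDBLOCK, as before the patch.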
Reservation:
You should probably HOLD OFF installing this change until
it gets batted about the net a bit. The original behaviour appears
to have been quite deliberate, and although I do think it's wrong,
I'd like to give someone in the know a chance to explain the
unobvious reason that it was right in the first place!
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson
ka@hou3c.UUCP (Kenneth Almquist) (12/22/83)
It looks like Brian's proposed change would make writes to pipes non-atomic.
Kenneth Almquist
thomson@utcsrgv.UUCP (Brian Thomson) (01/17/84)
It's been almost a month since I suggested allowing partial writes
to non-blocking stream sockets. Faithful readers will remember that
such a change cures a tendency of rlogind(8C) and telnetd(8C) to devour
cpu cycles. I also asked for comments. Let's delve into the ol' newsbag
and see what lively discussion ensued:

>From: ka@hou3c.UUCP (Kenneth Almquist)
>Message-ID: <148@hou3c.UUCP>
>Date: Thu, 22-Dec-83 10:17:31 EST
>Organization: Bell Labs, Holmdel, NJ
>
>It looks like Brian's proposed change would make writes to pipes non-atomic.
> Kenneth Almquist

Well ... yes and no. It only affects writes to pipes whose write ends
have been marked non-blocking. Such writes are indeed rendered non-atomic,
but I don't know how important that is.

But note that ordinary, synchronous pipes may not be 'atomic' in 4.2
depending on your definition of that term. Two reasonable definitions are:
1) No other process attempting to write on the socket (pipe) can
intersperse data.
2) The written data will be buffered in the kernel at one go, so the
write can complete without a corresponding read being performed.

Previous Unix systems guaranteed 1) if the write was less than PIPSIZ,
and guaranteed 2) if the pipe was not full and the write was less than
PIPSIZ.

On synchronous stream sockets, the distributed 4.2 guarantees property 1)
for writes of any size, but 2) only happens if there is adequate space in
the pipe's send queue. My modifications do not affect this behaviour.

On non-blocking stream sockets, distributed 4.2 REQUIRES that all writes
to pipes be less than PIPSIZ (the limit also exists for non-pipes but may
differ in value). It guarantees either that both 1) and 2) will be
satisfied or the write will return with an EMSGSIZE error.
My modifications remove the size requirement for all stream sockets
by permitting partial writes, but sacrifice atomicity by both definitions.
I also make select() and send() agree on whether a socket can be written,
which was the original intent of the change.

If it becomes necessary, atomicity can be restored by employing
the (currently unused) so->so_snd.sb_lowat field. Set it to PIPSIZ-1
for pipes, and leave it 0 for everyone else, then change the definition
of sowriteable() to include the term
sbspace(&so->so_snd) > so->so_snd.sb_lowat
instead of
sbspace(&so->so_snd) > 0
But I don't want to start using a field that the Berkeley people may have
designs for, and besides, I'm not so sure that anything is broken enough
to need fixing. Just how dependent are people on atomic pipes, and what
kind of atomicity do they need?

Hmmm ... I seem to have reached the bottom of the ol' newsbag.
Not such a lively discussion after all. Maybe nobody cares. Sniff.
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson
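The sb_lowat suggestion in the 01/17/84 post can be illustrated with a
small userland model in C. Everything here is a sketch under stated
assumptions: struct model_sockbuf, model_sbspace() and model_writeable()
are hypothetical names, the PIPSIZ value is picked for illustration only,
and only the space term of the writability test is modeled (the real
macros consider more than free byte count).

```c
#include <assert.h>

#define PIPSIZ 4096     /* value assumed for illustration only */

/* Hypothetical stand-in for the send-queue fields involved. */
struct model_sockbuf {
    int sb_cc;      /* bytes currently queued */
    int sb_hiwat;   /* maximum bytes the queue may hold */
    int sb_lowat;   /* proposed low-water mark */
};

/* Free space in the queue, by byte count only. */
static int model_sbspace(const struct model_sockbuf *sb)
{
    return sb->sb_hiwat - sb->sb_cc;
}

/*
 * Proposed space term: report writable only when free space exceeds
 * sb_lowat.  With sb_lowat == 0 this is the existing "space > 0" test.
 */
static int model_writeable(const struct model_sockbuf *sb)
{
    assert(sb->sb_lowat >= 0);
    return model_sbspace(sb) > sb->sb_lowat;
}
```

With sb_hiwat set to PIPSIZ and sb_lowat set to PIPSIZ-1, the model
reports a pipe writable only when its queue is completely empty, so any
write of up to PIPSIZ bytes then fits in one go. With sb_lowat left at 0,
a single free byte is enough, which matches the current behaviour.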
thomson@utcsrgv.UUCP (Brian Thomson) (01/18/84)
Oops. I said ...
"On synchronous stream sockets, the distributed 4.2 guarantees
[no interspersal of data from contending writes on the same socket]
for writes of any size..."
Sorry. I was, uh, mistaken. Ok, ok, to not mince words, I was WRONG.
It does no such thing: the send queue is not locked continuously during
long writes.
--
Brian Thomson, CSRG Univ. of Toronto
{linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!thomson