robert@SRI-SPAM.ARPA.UUCP (06/06/86)
Hello there, I have a question regarding the implementation of TCP used on the SUN computers. Specifically the question concerns version 2.2 of SUN UNIX, but will no doubt extend to their later (and earlier) releases.

I have had occasion to use non-blocking sockets (TCP links) as a link across the Internet between two or more SUNs. Empirically, I have discovered that there is a limit of 2048 bytes which can be written in a single non-blocking write. Anything more than that and the error "EMSGSIZE" is returned, which indicates that the internal buffer is only 2048 bytes. Note that if I use blocking sockets, the size is unlimited. There is also a limit of 4096 bytes total which can be held "in the pipe" before the receiving side of the socket must do a read to clear the internal buffers.

In a client/server relationship, the server reads the bytes which the client writes into the socket. I've noticed with non-blocking sockets, when more than 2K bytes have been written, that successive calls to recv() or read() by the server return 2K-byte chunks. This leads me to believe that the SUN implementation can only handle a maximum of 2K bytes per transfer, and only two of these before a read must be done. I've talked with SUN tech support and they tell me that this "seems to be" the case, although without talking directly to an engineer I could not get a good idea of why.

Finally, my question is this: why is there a 2K limit, and can it be changed, perhaps with a kernel reconfiguration? It seems to me that putting such a limit on the sockets is a poor implementation for that layer, which should not restrict data size. Shouldn't such 'packetizing' be done at a lower layer than the one sockets are the entry point to? Or are sockets actually a lower-layer interface than I think?

Any comments, pointers, or commiserations would be greatly appreciated.

Robert Allen, (415) 859-2143, robert@sri-spam.ARPA
root@BU-CS.BU.EDU.UUCP (06/06/86)
There are two questions in your query.

1.

> Finally, my question is this. Why is there a 2K limit, and
> can it be changed, perhaps with a kernel reconfiguration?

It almost certainly can be changed given sources; perhaps it can be changed otherwise, or perhaps that is a suggestion that might be reasonable for binary sites. Perhaps for performance reasons it doesn't make sense to change this (what is a better number, and why? Simply that it would make it easier for you to ignore the fact that you are doing non-blocking I/O is not a great reason.)

2.

> It seems to me that putting such a limit on the sockets is a poor
> implementation for that layer, which should not restrict data size.
> Shouldn't such 'packetizing' be done at a lower layer than what
> sockets are the entry point to? Or are sockets actually a lower
> layer interface than I think?

This is a different question. Most importantly, you state that this occurs only with non-blocking I/O. In the first place, this is not 'packetizing'. You provided 2K bytes and it received 2K bytes (and made it available to your consumer process.) The fact that they are both the same size is neither a coincidence nor a fault of the implementation. Something like packetizing might be implied had you read less than 2K and found the difference thrown on the floor (I guess; not sure there is any way this term can be worked into this conversation sensibly.)

At some point a resource runs out; there is no way around that. On blocking (normal mode) I/O this results in the process being blocked, waiting until resources are available. When you requested non-blocking I/O the system still had to do *something* when resources ran out (as I said, exactly how much of this resource should be allocated to a given process is an entirely different question.) What would you suggest it do? Ignore your I/O? Do it "anyhow"? Block you "anyhow"? I think it is doing the right thing (telling you immediately that I/O is not possible right now.)
I think this was simply necessitated by allowing the user to specify non-blocking I/O. There is no fundamental difference between this and blocking I/O; the reason it felt "unlimited" in the latter case is only that the system took the responsibility for making you wait. As soon as you requested non-blocking I/O you stated that you wished to take that responsibility on yourself (i.e. you probably had something else to do while waiting.)

> Robert Allen

-Barry Shein, Boston University
mike@BRL.ARPA.UUCP (06/06/86)
Sockets themselves buffer a limited amount of data. The size of this socket buffering varies, depending on the type of socket (i.e., pipe, TCP, UDP, etc). The system-wide default for TCP and UDP can be changed in 4.2 and 4.3 VAX UNIX systems by using ADB to poke the kernel variables tcp_sendspace, tcp_recvspace, udp_sendspace, and udp_recvspace. These variables become parameters to the soreserve() routine each time a connection is opened.

Note that in 4.2 the maximum size is 31K; in 4.3 it is 64K-1. These values are stored in the socket structure in SHORT ints; in 4.2, if the sign bit comes on, the world will break, hence the 31K limit rather than 32K or 64K-1. These variables cannot casually be widened to LONG ints, because that would cause the socket struct not to fit into an MBUF, which is currently necessary, if a tad inelegant. 31K of buffering on each end is, in fact, quite reasonable.

Also note that in 4.3 there is a parameter to the setsockopt() system call that can be used to adjust these values on a per-connection basis, rather than having to change the system-wide defaults. For those of you that have this, it is the preferred method of increasing buffering. I don't know at which SUN version the 4.3 capabilities will emerge, but almost certainly not in SUN 2.x (but I don't actually know).

Best, -Mike Muuss
AUERBACH@SRI-CSL.ARPA (Karl Auerbach) (06/06/86)
This may be incorrect, but I have heard that 4.2/4.3 based TCP implementations set the push flag for every user write. This somewhat transforms the TCP stream into packets as viewed by the receiver. I'm sure someone out there knows whether this is true or not. --karl-- (Karl Auerbach, Epilogue Technology Corp.) auerbach@sri-csl.arpa -------
Casner@USC-ISIB.ARPA (Stephen Casner) (06/09/86)
I have found that with Sun release 2.0 "the world will break" if I patch tcp_sendspace and tcp_recvspace to be only 16K, not 32K. At 15K it works, but 16K crashes. I would like very much to get up to 31K because we need that much for efficient use of the Wideband Satellite Network. Any clue as to how I might get there? -- Steve Casner -------
nowicki%rose@SUN.COM (Bill Nowicki) (06/10/86)
> I have found that with Sun release 2.0 "the world will break" if I
> patch tcp_sendspace and tcp_recvspace to be only 16K, not 32K. At
> 15K it works, but 16K crashes.

Please clarify what you mean by "world will break". Last week I did some tests and tried various sizes like 16K and 32K without any "crashes".

There is a severe performance problem if one side has a large recvspace and the other has a sendspace that is less than or equal to one quarter of the recvspace. The "silly window syndrome avoidance feature" then takes effect, and you can send only five times the sendspace per second. For example, if a machine with increased recvspace (8K) sends to a machine with the default sendspace of 2K, then throughput is 10 Kbytes/second, about twenty times slower than 2K sending to 2K! This is probably true for 4.2 in general, but I have not tried the 4.3BSD SWS-avoidance code.

Note that my experiments are using Sun-3s running 3.x -- you should upgrade from 2.x as soon as you can, since 3.x has several of the 4.2BSD bugs removed.

-- Bill Nowicki, Sun Microsystems
Casner@USC-ISIB.ARPA.UUCP (06/10/86)
I borrowed the quote about the world breaking from Mike Muuss. What I meant by it was that with tcp_sendspace and tcp_recvspace patched to 16K in a 2.0 kernel, the system crashed during the process of booting. I don't remember the details of the crash, but I believe it was a panic or the like. I'd like very much to have my machine upgraded to 3.x, but first we have to buy more memory => more swap space => no user file space => more NFS space... -- Steve Casner -------
mike@BRL.ARPA (Mike Muuss) (06/12/86)
Well, I guess for SUN 2.0, the magic number must be 16k-1. At least this is an improvement :-) -M