scw@cepu.UUCP (Stephen C. Woods) (09/28/84)
There seems to be a major problem with the 2.9BSD implementation of
uucico on 1200 baud lines.  Talking to two different 2.9BSD systems,
we're only getting:

    With ucla-an (09/27-15:25) total data 294648 bytes 12924 secs = 22.80 CPS
    With ucla-an (09/27-15:25) recieved data 0 bytes 0 secs
    With ucla-an (09/27-15:25) sent data 294648 bytes 12924 secs = 22.80 CPS
    With ucla-ci (09/27-15:25) total data 1874341 bytes 85608 secs = 21.89 CPS
    With ucla-ci (09/27-15:25) recieved data 4422 bytes 279 secs = 15.85 CPS
    With ucla-ci (09/27-15:25) sent data 1869919 bytes 85329 secs = 21.91 CPS

Typically, with other systems (V7, System III, 4.xBSD, and Locus) that I
talk to at 1200 baud, the gross transfer rate is ~100 CPS.  Even the V7
system that I talk to at 300 baud is faster (~27 CPS).

Has anyone else seen this problem?  If so, are there any fixes
available?  Please respond directly to me; I'll post a summary (and any
fixes that we discover) to the net.  Thanks in advance.
--
Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology)
uucp:   { {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcrdcf}!cepu!scw
ARPA:   cepu!scw@ucla-cs
location: N 34 3' 9.1" W 118 27' 4.3"
jerry@oliveb.UUCP (Jerry Aguirre) (10/02/84)
I ran into the same problem of uucico slowness when I installed Unix
2.9BSD.  Comparing the sources to the version we were running before, I
came to the conclusion that the problem was the sleep(1) call in the
file pk1.c:

    for (nchars = 0; nchars < n; nchars += ret) {
            if (nchars > 0)
                    sleep(1);
            ret = read(fn, b, n - nchars);
            if (ret == 0) {
                    alarm(0);
                    return(-1);
            }
            PKASSERT(ret > 0, "PKCGET READ", "", ret);
            b += ret;
    }

Apparently this call was added to improve efficiency.  The idea is that
if it takes more than one read to receive a packet, then you are wasting
time and should sleep to allow some other process to run.  Meanwhile the
line will buffer up characters, so you can read a whole packet in one
read instead of a single character.

This sounds good until you analyze what is really happening!  The
condition of receiving only a few (1) characters only happens when your
system is lightly loaded, so you are trying to improve response time
when it is good and not doing a damn thing when it is bad.  The upshot
of this is that during peak load your uucp will run at full speed, and
during the wee small hours of the night it will limp along.

Also, the faster the line speed, the slower the data transfers.  This
sleep is probably OK at 300 baud (Yech!  primitive), where it would
indeed cut down on the overhead.  I noticed little degradation on my
system at 1200 baud.  But on our internal 9600 baud links the result was
terrible!  Monitoring the line showed:

    blink (packet sent)  [delay 1 second]
    blink (ack sent)     [delay 1 second]

The result is a packet sent once every 2 seconds.

There is a modification to this code that checks the line speed in order
to determine whether to do a sleep.  The test is whether the speed is
less than 4800 baud, so I doubt it will do anything for you.  The better
fix is to "nap" for a period less than a second.  This requires the
"nap" system call, which is nonstandard.  Another alternative is to
compile the pk(2) routines into the kernel.
I would suggest that you just comment out the sleep or use your 2.8 uucp
source.  In comparing our version to the 2.9 one, I decided that our
"old" source was more up to date.
scw@cepu.UUCP (10/03/84)
I'm reposting this, as it appears that it didn't get out of my system
(or even into my system).

Problem:    2.9BSD uucico is VERY slow.  No, let me rephrase that:
            2.9BSD uucico is *UNBELIEVABLY* slow.

Repeat-by:  tail -20 /usr/spool/uucp/SYSLOG, and divide the number of
            bytes transferred by the time it took.  Typical rates for a
            1200 baud line will be ~22 bytes/sec.

Fix-by:     Remove the sleep(1) at line 375 in pk1.c.

Cause:      It appears to be an incomplete removal of the cpu.speedup
            patch from the news.2.10 distribution.  A full
            reinstallation of the cpu.speedup patch has been posted to
            net.sources.

Thanks to the following, who all pointed out the same error:

    ihnp4!ut-sally!harvard!wjh12!sob
    trwrb!wlbr!jm
    ihnp4!harvard!tardis!ddl

and to ...cepu!ucla-cime!kyle and ...cepu!ucla-an!stan for letting me
use their systems for testing and implementation of the patch.
--
Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology)
uucp:   { {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcrdcf}!cepu!scw
ARPA:   cepu!scw@ucla-cs
location: N 34 3' 9.1" W 118 27' 4.3"
honey@down.FUN (code 101) (10/04/84)
here is a note that appeared in this forum about 18 months ago.
(original authorship is lost to historians.)  it pre-dates 4.2bsd uucp
and honey danber, but does explain how the sleep/nap/select code found
its way into pkcget().

        peter
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Under certain circumstances, you may find that when 2 or 3 uucicos are
running on your system, they are eating up all the CPU time, and system
performance suffers horribly.  If this is your problem, you can do a
"vmstat 5" and watch the system call and context switch counters.  If
they are both very high whenever 2 or more uucicos are running (100-200
system calls/second, over 100 context switches), chances are that the
problem is as follows.

When another system is sending you a file, your uucico reads characters
from the line.  The read returns whatever is there waiting, or if
nothing is waiting, waits for one character and returns.  Since uucico
usually wants 64 characters at a time, at 1200 baud it's quite common to
read these in 1 or 2 character pieces.  Each uucico will read 1 or 2
characters, wake up the user process, and go back for more; there won't
be any, so it hangs and gives up the CPU.  A very short time later
(often within the same clock tick) there will be a character available,
the process will wake up, read one character, and try again.

This modification is very simple.  If the first read returned fewer
characters than requested, before doing another read the process will
sleep for one second.  Then, when it wakes up, there will probably be as
many characters waiting as it needs.

This modification makes a big difference when you are RECEIVING a file
from another system.  It won't make much difference when you are SENDING
a file, because the user process doesn't usually have to hang to write
to the line, and when it does, the high/low water mark mechanism in the
tty driver keeps it from waking up too often.  This change is intended
for a V7 or 4BSD system.
It may not help much on System V, because uucp uses a USG tty driver
feature to make it wake up only every 6 characters.

The amount this fix helps depends a LOT on the baud rate.  Since it is
sleeping while it had been reading characters, it is reasonable to
expect the file to get transferred more slowly than before.  This might,
in turn, lead to increased phone bills.

Some experimentation receiving a file over a hardwired link is detailed
here.  The file received is /etc/termcap, which is 66405 bytes long.
The local system is a 4.1BSD VAX 11/750; the remote system is a UNIX 5.0
VAX 11/750.  The link is over a Develcon dataswitch.  Both systems were
almost idle, although when another uucico did start up, it didn't seem
to affect the numbers.  The commands

    uucp -r othersys!~/termcap ~uucp/termcap
    time /usr/lib/uucp/uucico -r1 -sothersys

were given.  "type" is the type of uucico run: "old" is without the
sleep, "sleep" has a sleep(1) added after every short read, and "nap" is
the same as sleep except that at 4800 baud and higher the sleep is for
less than one second (the parameter is the number of milliseconds).
"user" and "sys" are the user and system CPU times from the time
command, in seconds.  "elapsed" is the time, in seconds, to transfer the
file, taken from /usr/spool/uucp/SYSLOG.  (Elapsed time does not include
time to get the connection going or close it down, just to transfer the
file.)  "%" is the percentage of the system the uucico command took,
from the time command.

    type      speed   user     sys   elapsed     %
    old        1200   35.3   120.8       606   21%
    sleep      1200   14.2    35.9       609    7%
    old        2400   27.4   115.8       305   31%
    sleep      2400   13.2    35.0       351    9%
    old        4800   23.9   116.0       152   57%
    sleep      4800   14.4    40.3       338   12%
    old        9600   14.4    68.1        79   42%
    nap 60     9600   14.6    52.7        97   39%
    nap 100    9600   14.9    48.5       113   32%
    nap 200    9600   15.0    47.1       127   37%
    sleep      9600   12.0    46.1       279   15%

It is clear that at 2400 baud or less, the load on the system was cut
considerably, while the penalty in slower transfer speed is negligible.
At 9600 baud, the sleep version (equivalent to nap 1000) cut the system
load by about 1/3, but the elapsed time shot way up.  (It takes much
less than 1 second to accumulate 64 characters at 9600 baud.)  At 4800
baud the results are somewhere in between: the system time was cut by a
factor of 3, but the elapsed time doubled.

Putting in shorter naps at 9600 baud brought the elapsed time down,
while increasing the system load moderately.  Essentially, the system
time remained constant when any sleeping was done.  The difference in
percentage of the system used shows that, in effect, the same work was
spread out over different lengths of time.  This results in a tradeoff
that can only be evaluated by each site in terms of its priorities.

An added complication is that most V7 and 4BSD systems do not have a way
to sleep for less than a second.  4.2BSD has the select system call, or
you may have installed a nap system call or the Cornell fast timer
driver.  Otherwise, your only choices are either sleep(1) or nothing.
The napms call below calls a routine in the new curses, to sleep for
around 1 clock tick (60 ms).

If your top priority is to keep system response good, it is recommended
that you do the sleep(1) no matter what the baud rate is.  If your top
priority is to make 9600 baud transfers go as quickly as possible, you
should do the sleep for 1200 baud or less, and otherwise do nothing.  If
you want a well balanced compromise and have a high resolution sleep or
nap or select available, the changes shown here are appropriate.

This change is trivial except for the change to conn.c to make the baud
rate available to the packet driver.  The code dealing with the speed is
different in different versions of UUCP.  If you have trouble with
conn.c, search for the string "speed" and look for a conveniently
available integer version of the speed.  The variable linebaudrate is a
global integer, exported to pk1.c for purposes of this test.
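For systems that do have 4.2BSD's select(2), a sub-second nap can be
sketched as below.  This is a stand-in written for illustration, not the
curses napms() routine the note refers to; it simply asks select to wait
out a timeout with no descriptors to watch.

```c
#include <sys/time.h>
#include <sys/select.h>

/* Sleep for approximately ms milliseconds using select(2). */
int napms(int ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000L;

    /* No read, write, or exception sets: select just waits. */
    return select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &tv);
}
```

With no descriptors pending, select returns 0 when the timeout expires,
so the call behaves like sleep() with millisecond resolution.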
The changes shown here are for the 4.1BSD version of UUCP.

*** conn.c      Wed Jun  4 01:47:12 1980
--- conn.c.new  Sat Apr  2 18:13:25 1983
***************
*** 85,90
        char *D_calldev;
        int D_speed;
  } Devs [MAXDEV];
  char Devbuff[MAXDCH];
--- 85,91 -----
        char *D_calldev;
        int D_speed;
  } Devs [MAXDEV];
+ int linebaudrate;
  char Devbuff[MAXDCH];
***************
*** 344,349
        alarm(0);
        fflush(stdout);
        fixline(dcf, pd->D_speed);
        DEBUG(4, "Forked %d ", pid);
        DEBUG(4, "Wait got %d ", nw);
        DEBUG(4, "Status %o\n", lt);
--- 345,351 -----
        alarm(0);
        fflush(stdout);
        fixline(dcf, pd->D_speed);
+       linebaudrate = pd->D_speed;
        DEBUG(4, "Forked %d ", pid);
        DEBUG(4, "Wait got %d ", nw);
        DEBUG(4, "Status %o\n", lt);

*** pk1.c       Mon May 28 00:44:06 1979
--- pk1.c.new   Sat Apr  2 18:16:02 1983

[This is routine pkcget, near the end of pk1.c.]

***************
*** 335,340
  char *b;
  {
        int nchars, ret;
        if (setjmp(Getjbuf)) {
                Ntimeout++;
--- 335,341 -----
  char *b;
  {
        int nchars, ret;
+       extern int linebaudrate;
        if (setjmp(Getjbuf)) {
                Ntimeout++;
***************
*** 343,349
        }
        signal(SIGALRM, cgalarm);
!       for (nchars = 0; nchars < n; nchars += ret) {
                alarm(PKTIME);
                ret = read(fn, b, n - nchars);
                if (ret == 0) {
--- 344,350 -----
        }
        signal(SIGALRM, cgalarm);
!       for (nchars = 0; nchars < n; ) {
                alarm(PKTIME);
                ret = read(fn, b, n - nchars);
                if (ret == 0) {
***************
*** 352,357
        }
        PKASSERT(ret > 0, "PKCGET READ %d", ret);
        b += ret;
  }
  alarm(0);
  return(0);
--- 353,364 -----
        }
        PKASSERT(ret > 0, "PKCGET READ %d", ret);
        b += ret;
+       nchars += ret;
+       if (nchars < n)
+               if (linebaudrate > 0 && linebaudrate < 4800)
+                       sleep(1);
+               else
+                       napms(60);
  }
  alarm(0);
  return(0);
mark@cbosgd.UUCP (Mark Horton) (10/04/84)
The sleep seems to be a (broken) variation of an idea I posted long ago.
Unfortunately, the code you posted is clearly wrong.  The right way to
do it is to sleep after the read returns, if it returned short.  The
code posted unconditionally sleeps before trying the first read.

Measurements showed that at 1200 baud it cut way down on system load
(and this really makes a difference - a uucico at 1200 baud only adds
about .2 to our load average instead of 1 like it used to) with almost
no effect on throughput rates.  But at 9600 baud it hurts throughput
drastically (since 1 second is far too long to sleep at 9600 baud) and
is a bad idea.  At 4800 baud it's a close call, and a local decision
should be made.
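The loop structure Mark describes can be sketched as follows.  The name
pkt_read is illustrative only (the real routine is pkcget() in pk1.c),
and the alarm()/PKTIME timeout handling of the original is omitted: the
point is simply that the sleep comes after a short read, never before
the first one, so a packet that arrives whole costs no delay.

```c
#include <unistd.h>

/* Read exactly n bytes from fd into b, sleeping only after a read
 * that came back short, and only at line speeds below 4800 baud. */
int pkt_read(int fd, char *b, int n, int baud)
{
    int nchars, ret;

    for (nchars = 0; nchars < n; nchars += ret) {
        ret = read(fd, b + nchars, n - nchars);
        if (ret <= 0)
            return -1;              /* EOF or read error: give up */
        if (nchars + ret < n && baud > 0 && baud < 4800)
            sleep(1);               /* short read: let more input queue up */
    }
    return nchars;
}
```

When a full 64-byte packet is already buffered, the first read satisfies
the whole request and the function returns without sleeping at all.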
larry@decvax.UUCP (Larry Cohen) (10/07/84)
Here at decvax we will do anything to speed up uucico.  We also noticed
that in the pk routines, several reads were necessary to pull a packet
off the network.  On the average there were about 3 context switches per
packet on a 1200 baud line.  Of course this varied with the time of day
and the speed of the line.

One experiment we tried was a modified Berknet line discipline.  This
line discipline reads a specified number of characters and does not
return until they all have arrived.  Internally it uses circular queues;
it does not use clists at all.

The improvement was pretty good.  I don't have the exact figures with me
(I'm at home 4 months after I ran the experiments), but it was something
in the ballpark of a 20% improvement in overall throughput and a 33%
reduction in context switches.  Less time was spent in the system, and
the user time was less also.  There was no reduction in performance on
9600 baud lines.  I hope to try running it on decvax before long.

                                        -Larry Cohen
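The "read N characters and don't return until they all arrive" behavior
Larry describes, like the USG 6-character wakeup mentioned earlier,
survives today as the termios VMIN parameter.  A hedged sketch in modern
POSIX terms (a stand-in, not the historical ioctl(TCSETA) code; the
function name is made up for illustration):

```c
#include <termios.h>

/* With ICANON off, VMIN tells the tty driver how many characters a
 * read() must collect before returning, so the reading process wakes
 * once per burst rather than once per character. */
int set_read_burst(int fd, int nchars)
{
    struct termios t;

    if (tcgetattr(fd, &t) < 0)
        return -1;                  /* fd is not a terminal */
    t.c_lflag &= ~ICANON;           /* raw reads, no line editing */
    t.c_cc[VMIN]  = nchars;         /* wait for this many characters... */
    t.c_cc[VTIME] = 0;              /* ...with no inter-character timeout */
    return tcsetattr(fd, TCSANOW, &t);
}
```

Setting VMIN to the packet size gets the one-read-per-packet effect in
the driver itself, with no sleeps in the user process.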
jerry@oliveb.UUCP (Jerry Aguirre) (10/09/84)
What happened to the idea of putting the pk routines into the kernel?
The man pages still advertise the kernel version of the pk routines, but
I know of no system that uses them.  Does anybody have them working?  Is
there a reason not to put them in the kernel?

                                        Jerry Aguirre
    {hplabs|fortune|idi|ihnp4|ios|tolerant|allegra|tymix}!oliveb!jerry