[net.bugs.uucp] 2.9 BSD uucico slowness

scw@cepu.UUCP (Stephen C. Woods) (09/28/84)

**<- squished bugs
There seems to be a major problem with the 2.9BSD implementation of uucico.
On 1200 baud lines talking to two different 2.9BSD systems we're only getting:

With ucla-an (09/27-15:25) total data 294648 bytes 12924 secs = 22.80 CPS
With ucla-an (09/27-15:25) received data 0 bytes 0 secs
With ucla-an (09/27-15:25) sent data 294648 bytes 12924 secs = 22.80 CPS

With ucla-ci (09/27-15:25) total data 1874341 bytes 85608 secs = 21.89 CPS
With ucla-ci (09/27-15:25) received data 4422 bytes 279 secs = 15.85 CPS
With ucla-ci (09/27-15:25) sent data 1869919 bytes 85329 secs = 21.91 CPS

Typically with other systems (V7, sysIII, 4.xBSD, and Locus) that I
talk to at 1200 baud the gross transfer rate is ~100 CPS.  Even the V7
system that I talk to at 300 baud is faster (~27 CPS).

Has anyone else seen this problem?  If so, are there any fixes available?
Please respond directly to me; I'll post a summary (and any fixes that we
discover) to the net.
Thanks in advance.
-- 
Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology)
uucp:	{ {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcrdcf}!cepu!scw
ARPA: cepu!scw@ucla-cs location: N 34 3' 9.1" W 118 27' 4.3"

jerry@oliveb.UUCP (Jerry Aguirre) (10/02/84)

I ran into the same problem of uucico slowness when I installed Unix
2.9BSD.  In comparing the sources to the version we were running before,
I came to the conclusion that the problem was the sleep(1) call in
the file pk1.c:
	for (nchars = 0; nchars < n; nchars += ret) {
		if (nchars > 0)
			sleep(1);
		ret = read(fn, b, n - nchars);
		if (ret == 0) {
			alarm(0);
			return(-1);
		}
		PKASSERT(ret > 0, "PKCGET READ", "", ret);
		b += ret;
	}
Apparently this call was added to improve efficiency.  The idea is that
if it takes more than one read to receive a packet then you are wasting
time and should sleep to allow some other process to run.  Meanwhile
the line will buffer up characters so you can read a whole packet in
one read instead of a single character.

This sounds good until you analyze what is really happening!  The
condition of receiving only a few (often just 1) characters only happens
when your system is lightly loaded, so you are trying to improve response
time when it is good and not doing a damn thing when it is bad.  The upshot
of this is that during peak load your uucp will run at full speed and
during the wee small hours of the night it will limp along.  Also the
faster the line speed, the slower the data transfers.

This sleep is probably ok at 300 baud (Yech! primitive) where it would
indeed cut down on the overhead.  I noticed little degradation on my
system at 1200 baud.  But on our internal 9600 baud links the result
was terrible!  Monitoring the line showed:

    blink(packet sent) [delay 1 second] blink(ack sent) [delay 1
    second]

The result is a packet sent once every 2 seconds.

There is a modification to this code that checks the line speed in
order to determine whether to do a sleep.  The test is whether the
speed is less than 4800 baud, so I doubt it will do anything for you.
The better fix is to "nap" for a period less than a second.  This
requires the "nap" system call which is nonstandard.  Another
alternative is to compile the pk(2) routines into the kernel.

I would suggest that you just comment out the sleep or use your 2.8
uucp source.  In comparing our version to the 2.9 I decided that our
"old" source was more up to date.

scw@cepu.UUCP (10/03/84)

*<eat it>
I'm reposting this, as it appears that it didn't get out of my system 
(or even into my system).

Problem	    2.9BSD uucico is VERY slow
	    No, let me rephrase that: 2.9BSD uucico is *UNBELIEVABLY* slow.

Repeat-by   tail -20 /usr/spool/uucp/SYSLOG, divide the number of bytes
	    transferred by the time it took.  Typical rates for a 1200
	    baud line will be ~22 bytes/sec.

Fix-by	    Remove the sleep(1) at line 375 in pk1.c

Cause	    It appears to be an incomplete removal of the cpu.speedup patch
	    from the news.2.10 dist.

	    A full reinstallation of the cpu.speedup patch has been posted to
	    net.sources.

Thanks to the following, who all pointed out the same error:
ihnp4!ut-sally!harvard!wjh12!sob
trwrb!wlbr!jm
ihnp4!harvard!tardis!ddl
and to ...cepu!ucla-cime!kyle and ...cepu!ucla-an!stan for letting me use
their systems for testing and implementation of the patch.

-- 
Stephen C. Woods (VA Wadsworth Med Ctr./UCLA Dept. of Neurology)
uucp:	{ {ihnp4, uiucdcs}!bradley, hao, trwrb, sdcrdcf}!cepu!scw
ARPA: cepu!scw@ucla-cs location: N 34 3' 9.1" W 118 27' 4.3"

honey@down.FUN (code 101) (10/04/84)

here is a note that appeared in this forum about 18 months ago.
(original authorship is lost to historians.)  it pre-dates 4.2bsd
uucp and honey danber, but does explain how the sleep/nap/select
code found its way into pkcget().

	peter

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Under certain circumstances, you may find that when 2 or 3 uucicos
are running on your system, they are eating up all the CPU time,
and system performance suffers horribly.  If this is your problem,
you can do a "vmstat 5" and watch the system calls and context switches
counters.  If they are both very high whenever 2 or more uucicos
are running (100-200 system calls/second, over 100 context switches),
chances are that the problem is as follows:

When another system is sending you a file, your uucico reads characters
from the line.  The read returns whatever is there waiting, or if
nothing is waiting, waits for one character and returns.  Since uucico
usually wants 64 characters at a time, at 1200 baud it's quite common
to read these in 1 or 2 character pieces.  A character arrives and wakes
up the user process, which reads 1 or 2 characters, goes back for more,
finds nothing waiting, and so blocks and gives up the CPU.  A very short
time later (often within the same clock tick) another character becomes
available, the process wakes up, reads one character, and tries again.

This modification is very simple.  If the first read returned fewer
characters than requested, before doing another read, the process
will sleep for one second.  Then, when it wakes up, there will probably
be as many characters waiting as it needs.

This modification makes a big difference when you are RECEIVING a file
from another system.  It won't make much difference when you are
SENDING a file, because the user process doesn't usually have to hang
to write to the line, and when it does, the high/low water mark
mechanism in the tty driver keeps it from waking up too often.
This change is intended for a V7 or 4BSD system.  It may not
help much on System V, because uucp uses a USG tty driver feature
to make it wake up only every 6 characters.
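
[The USG feature referred to is the termio VMIN/VTIME mechanism.  The
sketch below shows the general idea; it is untested, the routine name is
made up for illustration, and real uucp sources differ from version to
version.]

	#include <termio.h>

	/*
	 * Illustration only -- not from any uucp distribution.
	 * Put a line into non-canonical mode and ask the USG tty
	 * driver to wake the reader only when 6 characters have
	 * arrived, or when input pauses, instead of on every character.
	 */
	setminchars(fd)
	int fd;
	{
		struct termio t;

		ioctl(fd, TCGETA, &t);
		t.c_lflag &= ~(ICANON | ECHO);	/* raw-ish input */
		t.c_cc[VMIN] = 6;	/* return after 6 chars ...          */
		t.c_cc[VTIME] = 1;	/* ... or 0.1 sec between characters */
		ioctl(fd, TCSETA, &t);
	}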

The amount this fix helps depends a LOT on the baud rate.  Since the
process now sleeps during time it formerly spent reading characters, it is
reasonable to expect the file to get transferred more slowly than before.  This
might, in turn, lead to increased phone bills.  Some experimentation
receiving a file over a hardwired link is detailed here.  The file
received is /etc/termcap, which is 66405 bytes long.  The local system
is a 4.1BSD VAX 11/750; the remote system is a UNIX 5.0 VAX 11/750.
The link is over a Develcon dataswitch.  Both systems were almost
idle, although when another uucico did start up, it didn't seem to
affect the numbers.  The commands
	uucp -r othersys!~/termcap ~uucp/termcap
	time /usr/lib/uucp/uucico -r1 -sothersys
were given.

"type" is the type of uucico run: "old" is without the sleep, "sleep"
has a sleep(1) added after every short read, "nap" is the same as
sleep except that at 4800 baud and higher the sleep is for less than
one second (the parameter is the number of milliseconds).  "user" and
"sys" are the user and system CPU times from the time command, in
seconds.  "elapsed" is the time, in seconds, to transfer the file,
taken from /usr/spool/uucp/SYSLOG.  (Elapsed time does not include time to
get the connection going or close it down, just to transfer the file.)
"%" is the percentage of the system the uucico command took, from the
time command.

type	speed	user	sys	elapsed	%

old	1200	35.3	120.8	606	21%
sleep	1200	14.2	 35.9	609	 7%

old	2400	27.4	115.8	305	31%
sleep	2400	13.2	 35.0	351	 9%

old	4800	23.9	116.0	152	57%
sleep	4800	14.4	 40.3	338	12%

old	9600	14.4	 68.1	79	42%
nap 60	9600	14.6	 52.7	97	39%
nap 100	9600	14.9	 48.5	113	32%
nap 200	9600	15.0	 47.1	127	37%
sleep	9600	12.0	 46.1	279	15%

It is clear that at 2400 baud or less, the load on the system was
cut considerably, while the penalty in slower transfer speed is
negligible.  At 9600 baud, the sleep version (equivalent to nap 1000)
cut the system load by about 1/3, but the elapsed time shot way up.
(It takes much less than 1 second to accumulate 64 characters at
9600 baud.) At 4800 baud the results are somewhere in between.
The system time was cut by a factor of 3, but the elapsed time doubled.

Putting in shorter naps at 9600 baud brought the elapsed time down, while
increasing the system load moderately.  Essentially, the system time
remained constant when any sleeping was done.  The difference in
percentage of the system used shows that, in effect, the same work
was spread out over different lengths of time.  This results in a tradeoff
that can only be evaluated by each site in terms of its own priorities.

An added complication is that most V7 and 4BSD systems do not have
a way to sleep for less than a second.  4.2BSD has the select system
call, or you may have installed a nap system call or the Cornell fast
timer driver.  Otherwise, your only choices are either sleep(1) or
nothing.  The napms call below calls a routine in the new curses, to
sleep for around 1 clock tick (60 ms).
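
[For a 4.2BSD site without the new curses, a napms substitute can be
faked up with select(); the sketch below is untested, and older systems
declare the select masks as int pointers rather than fd_set pointers.]

	#include <sys/types.h>
	#include <sys/time.h>

	/*
	 * Sleep for roughly ms milliseconds using the 4.2BSD select()
	 * call, for systems with neither nap() nor the new curses.
	 * Sketch only; no error checking.
	 */
	napms(ms)
	int ms;
	{
		struct timeval tv;

		tv.tv_sec = ms / 1000;
		tv.tv_usec = (ms % 1000) * 1000;
		(void) select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &tv);
	}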

If your top priority is to keep system response good, it is recommended
that you do the sleep(1) no matter what the baud rate is.  If your top
priority is to make 9600 baud transfers go as quickly as possible, you
should do the sleep for 1200 baud or less, and otherwise do nothing.
If you want a well balanced compromise and have a high resolution sleep
or nap or select available, the changes shown here are appropriate.

This change is trivial except for the change to conn.c to make the
baud rate available to the packet driver.  The code dealing with the
speed is different in different versions of UUCP.  If you have trouble
with conn.c, search for the string "speed" and look for a conveniently
available integer version of the speed.  The variable linebaudrate is
a global integer, exported to pk1.c for purposes of this test.  The
changes shown here are for the 4.1BSD version of UUCP.

*** conn.c	Wed Jun  4 01:47:12 1980
--- conn.c.new	Sat Apr  2 18:13:25 1983
***************
*** 85,90
  	char *D_calldev;
  	int D_speed;
  	} Devs [MAXDEV];
  
  char Devbuff[MAXDCH];
  

--- 85,91 -----
  	char *D_calldev;
  	int D_speed;
  	} Devs [MAXDEV];
+ int linebaudrate;
  
  char Devbuff[MAXDCH];
  
***************
*** 344,349
  	alarm(0);
  	fflush(stdout);
  	fixline(dcf, pd->D_speed);
  	DEBUG(4, "Forked %d ", pid);
  	DEBUG(4, "Wait got %d ", nw);
  	DEBUG(4, "Status %o\n", lt);

--- 345,351 -----
  	alarm(0);
  	fflush(stdout);
  	fixline(dcf, pd->D_speed);
+ 	linebaudrate = pd->D_speed;
  	DEBUG(4, "Forked %d ", pid);
  	DEBUG(4, "Wait got %d ", nw);
  	DEBUG(4, "Status %o\n", lt);
*** pk1.c	Mon May 28 00:44:06 1979
--- pk1.c.new	Sat Apr  2 18:16:02 1983
[This is routine pkcget, near the end of pk1.c.]
***************
*** 335,340
  char *b;
  {
  	int nchars, ret;
  
  	if (setjmp(Getjbuf)) {
  		Ntimeout++;

--- 335,341 -----
  char *b;
  {
  	int nchars, ret;
+ 	extern int linebaudrate;
  
  	if (setjmp(Getjbuf)) {
  		Ntimeout++;
***************
*** 343,349
  	}
  	signal(SIGALRM, cgalarm);
  
! 	for (nchars = 0; nchars < n; nchars += ret) {
  		alarm(PKTIME);
  		ret = read(fn, b, n - nchars);
  		if (ret == 0) {

--- 344,350 -----
  	}
  	signal(SIGALRM, cgalarm);
  
! 	for (nchars = 0; nchars < n; ) {
  		alarm(PKTIME);
  		ret = read(fn, b, n - nchars);
  		if (ret == 0) {
***************
*** 352,357
  		}
  		PKASSERT(ret > 0, "PKCGET READ %d", ret);
  		b += ret;
  	}
  	alarm(0);
  	return(0);

--- 353,364 -----
  		}
  		PKASSERT(ret > 0, "PKCGET READ %d", ret);
  		b += ret;
+ 		nchars += ret;
+ 		if (nchars < n)
+ 			if (linebaudrate > 0 && linebaudrate < 4800)
+ 				sleep(1);
+ 			else
+ 				napms(60);
  	}
  	alarm(0);
  	return(0);

mark@cbosgd.UUCP (Mark Horton) (10/04/84)

The sleep seems to be a (broken) variation of an idea I posted long ago.
Unfortunately, the code you posted is clearly wrong.  The right way
to do it is, after the read returns, if it returned short, to sleep.
The code posted unconditionally sleeps before trying the first read.

Measurements showed that at 1200 baud it cut way down on system load
(and this really makes a difference - a uucico at 1200 baud only adds
about .2 to our load average instead of 1 like it used to) with almost
no effect on throughput rates, but at 9600 baud it hurts throughput
drastically (since 1 second is far too long to sleep at 9600 baud)
and is a bad idea.  At 4800 it's a close call, and a local decision
should be made.
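
[In outline, the loop with the sleep in the right place -- only after a
read that came back short, never before the first read -- looks roughly
like this.  This is a sketch distilled from the diff posted above, not
verbatim source from any distribution.]

	for (nchars = 0; nchars < n; ) {
		alarm(PKTIME);
		ret = read(fn, b, n - nchars);
		if (ret == 0) {
			alarm(0);
			return(-1);
		}
		PKASSERT(ret > 0, "PKCGET READ %d", ret);
		b += ret;
		nchars += ret;
		if (nchars < n)		/* short read: let more characters arrive */
			sleep(1);	/* or nap/napms/select on faster lines */
	}
	alarm(0);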

larry@decvax.UUCP (Larry Cohen) (10/07/84)

Here at decvax we will do anything to speed up uucico.
We also noticed that in the pk routines, several "reads"
were necessary to pull a packet off the network.  On the
average there were about 3 context switches per packet on
a 1200 baud line.  Of course this varied with the time of
day and speed of the line.  One "experiment" we tried was a modified
Berknet line discipline.  This line discipline would read a specified
number of characters and not return until they all had arrived.
Internally it used circular queues rather than clists.
The improvement was pretty good.
I don't have the exact figures with me (I'm at home 4 months
after I ran the experiments) but it was something in the ballpark
of a 20% improvement in overall throughput and a 33% reduction in
context switches.  Less time was spent in the system, and the user
time was less also.  There was no reduction in performance on 9600
baud lines.  I hope to try running it on decvax before long.
						-Larry Cohen

jerry@oliveb.UUCP (Jerry Aguirre) (10/09/84)

What happened to the idea of putting the pk routines into the
kernel?  The man pages still advertise the kernel version of the
pk routines but I know of no system that uses them.  Does anybody
have them working?  Is there a reason not to put them in the
kernel?

					    Jerry Aguirre
{hplabs|fortune|idi|ihnp4|ios|tolerant|allegra|tymix}!oliveb!jerry