[comp.dcom.modems] Telebit transfer rate problem

brad@island.uu.net (Bradley Mabe) (03/18/89)

We seem to be having a curious problem with our Telebit modems.
When connecting to another site via uucp in PEP mode we get an average 
send rate of around 1200 bytes per second, but an average receive rate 
of only 500 bytes per second. The modems are connected to a Sun 2/170 via 
a 16-channel Systech multiplexer and we are running SunOS 3.2.  I have
configured uucico to run at 19200 and have the serial ports set up at
19200 as well.  Below are our register settings:

E1 F1 M1 Q0 T V1 X1     Version BA4.00
S00=001 S01=000 S02=043 S03=013 S04=010 S05=008 S06=002 S07=080 S08=002 S09=010
S10=010 S11=070 S12=050 
S45=255 S47=004 S48=000 S49=000
S50=000 S51=255 S52=002 S53=004 S54=003 S55=000 S56=017 S57=019 S58=000 S59=000
S60=000 S61=255 S62=003 S63=001 S64=000 S65=000 S66=001 S67=000 S68=255 
S90=000 S91=000 S92=001 S95=000 
S100=000 S101=000 S102=000 S104=000
S110=001 S111=255 S112=001 
S121=000 
N0:
N1:
N2:
N3:
N4:
N5:
N6:
N7:
N8:
N9:

Any help with this problem will be greatly appreciated.
Thanks
----------------------------------------------------------------------------
|                      Bradley Mabe.  "Born to be mild"                    |
|               Island Graphics Corp.  4000 Civic Center Drive             |
|                   San Rafael, Ca 94903   (415) 491-1000                  |
|                        {uunet,sun,well}!island!brad                      |
----------------------------------------------------------------------------

root@libcmp.UUCP (Super user) (03/20/89)

You may wish to check the setting of your S111 register: if both Telebits
are configured for S111=255, then you have failed to select a valid protocol.

The Telebit which is acting as the originator needs to select a protocol,
e.g. S111=30, and the unit which is answering the call will then configure
for the same protocol if S111=255 on that unit.

I have noted that a PEP connection without a protocol is at least 20% less
efficient. I suggest that you configure your dialer to select a protocol for
dial-out contacts, and leave S111=255 to handle incoming calls.
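
For example (an illustrative Hayes-style dial string only; substitute your
own dialer's syntax, and note the number here is just the guest line from
my signature):

	ATS111=30DT14072996947

This sets S111 immediately before dialing, so the originating unit names
the protocol and the answering unit, left at S111=255, follows along.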


Ed Unrein		(peora!rtmvax!libcmp!ecu)
Liberty Computers, Inc.	(407) 293-6346 office voice number
UNIX guest account	(407) 299-6947 TELEBIT (8N1) login: guest
UUCP: libcmp Any (speed any) 14072996947 ogin:-""-ogin:-""-ogin:nuucp

vixie@decwrl.dec.com (Paul A Vixie) (03/21/89)

There are a few things to consider when examining your performance on
Telebit modems over UUCP.

First, the Telebit has some buffering.  Since it's spoofing the sender on
ACK's, the sender thinks it's sent the last packet a long time (sometimes
five to ten seconds) before the receiver actually receives the last packet.
The buffering is not a bug; it's part of what lets the Telebit spoof the
UUCP protocol.  You can get a more accurate measurement of bytes-per-second
if you send huge files, since the extra five or ten seconds will make very
little difference in the (bytes/time) calculation.  1MB is a good round
number.  And as Erik Fair has pointed out in the past, 1MB is an excellent
news batch size to use over Telebit modems, since the end-of-file
processing causes five or ten seconds of idle time on the modem, so you
want to minimize that by sending fewer files, which means sending bigger
files.
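
To put numbers on that (my arithmetic, using round figures): at 1000 cps,
a 50KB file holds the line for 50 seconds, so ten seconds of spoofing
slack drags the measured rate down to 50000/60, about 830 cps, a 17%
error.  Against a 1MB file the same ten seconds costs less than 1%.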

Second, most or all serial-port hardware (and their device drivers) are
designed to send data very quickly (like: filling up your screen when you
get into an editor) but receive it rather slowly (like: you typing at your
editor).  When a Telebit modem tries to dump data in at 1800 CPS, there's
a good chance that your system will miss a lot of it.  You're probably
dealing with an interrupt per incoming character - on your Sun 2/170, you
are definitely dealing with one interrupt per character.  Some serial-port
hardware has a FIFO for incoming data, which helps a little, but the system
sometimes still ends up interrupting at its maximum rate since it's going
to have another interrupt right after it gets back from emptying the FIFO.
A high-water threshold that you can turn on when input data starts coming
at you Real Fast is a good thing, and in fact there are serial boards on
the market that have the feature, but the people who do the device drivers
for these boards don't want to deal with interesting questions like: how
do you know data is coming in Real Fast?  How do you know when it slows
down?  What happens to the rest of the ports on the board when you turn
on the high-water interrupt block?
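
To make the per-character cost concrete, here is a minimal sketch of the
usual receive-interrupt path (the device structure, register names, and
the ttyinput() call are hypothetical stand-ins, not any particular
board's driver):

	struct uart {			/* hypothetical device registers */
		volatile int status;	/* RX_READY set while FIFO full  */
		volatile int data;	/* a read pops one character     */
		struct tty  *tty;
	};

	/* Drain whatever the FIFO holds on each interrupt.  Even so,
	 * the next incoming character interrupts again immediately,
	 * so at 1800 CPS you still field interrupts at nearly the
	 * character rate unless a high-water threshold holds them off.
	 */
	void
	rxintr(struct uart *u)
	{
		while (u->status & RX_READY) {	/* FIFO not empty     */
			int c = u->data;	/* read pops one char */
			ttyinput(c, u->tty);	/* per-char upstream  */
		}
	}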

I keep asking Mike @Telebit to put a thinwire ethernet port on the next
Telebit modem product, since the ethernet hardware on most systems will
do intelligent things like DMA with big blocks of data.  But then Telebit
would have to deal with high-water thresholds, plus telnet or whatever
protocol was to be used on the ethernet.

I have a feeling that serial hardware isn't going to get any better, either.

Summary:

(1) send big files if you want an accurate bytes/sec number; 1MB is
   enough and is a good idea for news batches for other reasons.

(2) expect someone's serial-receive performance to be the limiting
   factor in a PEP-UUCP transfer; if you can max out the modem, you
   have not one but two very unusual systems.
--
Paul Vixie
Work:    vixie@decwrl.dec.com    decwrl!vixie    +1 415 853 6600
Play:    paul@vixie.sf.ca.us     vixie!paul      +1 415 864 7013

romain@pyramid.pyramid.com (Romain Kang) (03/27/89)

In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
| When connecting to another site via uucp in PEP mode we get an average 
| send rate of around 1200 bytes per second, but an average receive rate 
| of only 500 bytes per second. The modems are connected to a Sun 2/170 via 
| a 16-channel Systech multiplexer and we are running SunOS 3.2.

Since no one else has responded, I guess I'll stick my neck out:  Maybe
a 2/170+Systech doesn't have the horsepower to receive any faster.
Remember, one reason 't' protocol was invented was that VAX 11/780's
running 'g' peaked out at about 900 bytes/sec over TCP/IP sockets, with
the data DMA'd in, avoiding the tty overhead.  Here you have a CPU
that's probably slower; each character received means a CPU interrupt
and a couple of context switches, and has to filter through the raw tty
interface.  Sending may be that much more CPU-efficient.

I see several possible solutions:
	1. Get a faster processor (perhaps not easy for you).
	2. Get a modernized UUCP.  A SysV termio interface with VMIN
	   and VTIME set to reasonable values, or a high-resolution
	   sleep (like select()) might be used to avoid unnecessarily
	   scheduling uucico (see the sketch after this list).
	3. Get a smart terminal server to handle the TrailBlazer.  Some
	   time back, Rutgers hacked their Cisco servers to allow UUCP
	   logins to use port 540, thus avoiding the tty overhead.  I
	   think I was able to get 1350-1400 cps before this feature
	   disappeared.
	4. Put 'g' protocol in the kernel, where God meant it to be.
	   I think Peter Honeyman once said you couldn't be considered
	   serious about UUCP if you hadn't hacked your kernel for it
	   somehow.
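
As a sketch of what option 2 buys you (assuming a SysV termio interface;
the VMIN/VTIME values are only illustrative):

	#include <termio.h>

	/* Configure fd so read() returns once 64 characters arrive, or
	 * once input pauses for 0.2s after at least one has arrived.
	 * uucico then gets scheduled once per burst, not per character.
	 */
	int
	set_burst_read(int fd)
	{
		struct termio t;

		if (ioctl(fd, TCGETA, &t) < 0)
			return -1;
		t.c_lflag &= ~(ICANON | ECHO);	/* non-canonical input */
		t.c_cc[VMIN]  = 64;		/* "enough characters" */
		t.c_cc[VTIME] = 2;		/* tenths of a second  */
		return ioctl(fd, TCSETA, &t);
	}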

Two years ago, I cited the VAX statistics and claimed the UUCP spoof
was no substitute for intelligent front-end hardware and more efficient
protocols.  Oops.  Today, someone with an 80386 box and a decent serial
card can do 'g' protocol faster than a '780.  Time to find a predictable
occupation, like migrant farm worker...
--
"Eggheads unite! You have nothing to lose but your yolks!"  -Adlai Stevenson

jeff@tc.fluke.COM (Jeff Stearns) (03/28/89)

In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
>We seem to be having a curious problem with our Telebit modems.
>When connecting to another site via uucp in PEP mode we get an average 
>send rate of around 1200 bytes per second, but an average receive rate 
>of only 500 bytes per second. The modems are connected to a Sun 2/170 via 
>a 16-channel Systech multiplexer and we are running SunOS 3.2. ...

You have discovered that the Sun kernel burns a lot of cycles servicing
interrupts from the Systech serial interface when performing raw input.
You can confirm this by running vmstat while uucico is receiving files;
note the very high percentage of system time.

Sun has known this for quite some time.  I doubt that they'll ever do anything
more about it; they probably feel that the hardware is nearly obsolete.

When I pointed this problem out to Sun in 1987, they suggested that I upgrade
to a Sun-3 and run a custom device driver for ttya or ttyb.  (I believe
that they called it "CONSULT-HISPEED".)  This "solution" wasn't viable for us,
so I can't report on whether it works or not.
-- 
    Jeff Stearns        John Fluke Mfg. Co, Inc.               (206) 356-5064
    jeff@tc.fluke.COM   {uw-beaver,microsoft,sun}!fluke!jeff
						  
PS - Calling all users of the Vitalink TransLAN IV Ethernet bridge! Please
     drop me a line.

wtm@neoucom.UUCP (Bill Mayhew) (03/28/89)

It has been my experience that quite a few implementations of Unix
have pretty crummy tty drivers, especially on the receive side of
the coin.  The lack of optimization is probably due to the fact
that most software engineers forget that anything other than a
human being might be typing characters in.  Most tty drivers
generate a CPU interrupt per character received.  With Unix this
is nasty, as it might mean that several context switches take place
for each character received.  If you've got it, run vmshow or
vmstat while uucico is running; you'll probably see that system time
is very high.

Our solution was to dig our old AT&T Unix PC out of the closet and
front-end it onto our VAX.  The Unix PC manages pretty good transfers,
as it is essentially a zero-user machine whose only function is to
do uucp.  The Unix PC tty driver seems to be reasonably well written
too, which would make sense given that the machine is sold by a
company whose main interest is telecommunications.  We found this
fix simpler than attempting to redo the BSD drivers on the VAXen.
The Unix PC -> VAX transfers take place at a low baud rate to avoid
pushing the load up too high on the VAX.

Actually, the above fix, though seemingly arcane, is reasonable
since the current market price of a Unix PC is less than an
internet-discounted Trailblazer.

I like the idea that someone mentioned of putting a thinwire
ethernet port on a Trailblazer.

Bill

stacy@mcl.UUCP (Stacy L. Millions) (03/29/89)

In article <64160@pyramid.pyramid.com>, romain@pyramid.pyramid.com (Romain Kang) writes:
> Time to find a predictable occupation, like migrant farm worker...

Does that mean you can predict weather and farm commodity prices? :-)

-stacy

-- 

"You should not drink and bake."
				- Arnold Schwarzenegger, _Raw Deal_
S. L. Millions                                            ..!tmsoft!mcl!stacy

root@texbell.UUCP (root) (03/29/89)

In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
> When connecting to another site via uucp in PEP mode we get an average 
> send rate of around 1200 bytes per second, but an average receive rate 
> of only 500 bytes per second.

In article <64160@pyramid.pyramid.com> romain@pyramid.pyramid.com (Romain Kang)
 writes:
> each character received means a CPU interrupt
> and a couple of context switches, and has to filter through the raw tty
> interface.

In article <7457@fluke.COM> jeff@tc.fluke.COM (Jeff Stearns) writes:
> You have discovered that the Sun kernel burns a lot of cycles servicing
> interrupts from the Systech serial interface when performing raw input.

Question: How much greater is the CPU load when receiving uucp than
when transmitting a file?  Or is the disparity mainly due to Telebit
buffering?
--

Greg

guy@auspex.UUCP (Guy Harris) (03/31/89)

>It has been my experience that quite a few implementations of Unix
>have pretty crummy tty drivers, especially on the receive side of
>the coin.  The lack of optimization is probably due to the fact
>that most software engineers forget that anything other than a
>human being might be typing characters in.  Most of the tty drivers
>generate a CPU interrupt per character received.

That's not a software engineer's problem, that's a hardware engineer's
problem - the serial port hardware doesn't buffer up characters.

>With Unix this is nasty, as it might mean that several context switches
>take place for each character received.

That has nothing to do with the number of interrupts; in UNIX systems,
interrupts tend to be serviced in whatever context was running at the
time the interrupt occurred.  Even if you have a streams driver, streams
modules would tend to be run in the same context.

The problem is that you get a *wakeup* for every character received;
that's where the context switches come from.  There are at least two
ways around this:

	1) When you receive a character, buffer it a while in the driver
	   and see whether any more come in just after it.  Only wake up
	   the process waiting for input when enough characters come in,
	   or more than some amount of time elapses after "the last one"
	   comes in.  (A sketch of this approach follows the list.)

	   SunOS, for example, does this on its CPU serial ports; when
	   running high-speed UUCP input (38.4 kbps), the difference
	   between receiving on a CPU serial port and, say, an ALM-1 is
	   noticeable.

	   SunOS 4.0 does this on all serial ports, in order to fix some
	   problems with streams flow control.  4.0 had some other
	   problems with streams flow control (limits being set too low,
	   and CPU serial port driver sending "runt" streams messages
	   upstream); 4.0.1 fixes those.  This significantly reduced the
	   CPU overhead for 19.2 kbps receiving on ALM-1 ports (ALM-1s
	   don't run at 38.4), for example.

	   (Also, converting to streams reduced CPU overhead some more -
	   I suspect it was because the old line discipline interface
	   required one procedure call from the driver to the line
	   discipline on every character, while the streams code
	   requires one call per streams message, and if the characters
	   are coming in thick and fast the driver tries to pack 16 or
	   so characters per streams message.)

	2) If you have VMIN and VTIME support in your tty driver, set
	   VMIN and VTIME to do a similar sort of buffering (VMIN
	   specifies "enough characters", and VTIME specifies "some
	   amount of time").

4.4BSD should give you VMIN and VTIME, since it'll have a
POSIX-compliant tty driver.  Perhaps the 4.4BSD UUCP will use VMIN and
VTIME.  To some degree, sticking delays into the UUCP receive code
gives the same result; I think the 4.3BSD UUCP does so. 

(SunOS 4.0 already gives you VMIN and VTIME, although the UUCPs first
tested were the current SunOS UUCPs, which still use the old tty driver
"ioctl"s and thus don't use VMIN nor VTIME.  As I remember, using the
version of Honey DanBer UUCP slated for 4.1, which *does* use VMIN and
VTIME, the CPU overhead dropped a bit more.)

rpw3@amdcad.AMD.COM (Rob Warnock) (03/31/89)

In article <1330@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
+---------------
| >...Most of the tty drivers generate an CPU interrupt per character received.
| That's not a software engineer's problem, that's a hardware engineer's
| problem - the serial port hardware doesn't buffer up characters.
+---------------

Well, it's also a Unix problem. Unix was originally written when terminals
were *slooow*. (Can you still spell ASR-33?) Thus it never bothered anybody
that the kernel cut all interrupts off for many milliseconds at a time.
(The most serious offenders tend to be in the disk buffer cache search
and in the time-of-day-crosses-a-second/minute/hour/day code.)

Thus with higher speed lines, regardless of the *efficiency* (see below)
of the TTY driver implementation, the rest of the kernel simply doesn't
accommodate the low *latency* requirement of these speeds.

+---------------
| >With Unix this is nasty, as it might mean that several context switches
| >take place for each character received.
| That has nothing to do with the number of interrupts;
| The problem is that you get a *wakeup* for every character received;
| that's where the context switches come from.  There are at least two
| ways around this:
+---------------

...and Guy describes what I call "pseudo-DMA with dallying", and VMIN/VTIME.

These are both *efficiency* optimizations, and while quite worth
it in terms of efficiency [esp. pseudo-DMA + dallying], they can't
help high-speed character input if the rest of the kernel breaks the
*latency* requirement.

[To put some numbers on it, since most serial ports these days have
3, maybe as little as 2, bytes of buffering, you can tolerate the TTY
interrupts being shut off for at *most* 3 character times, and 1 char
time is safer. With 19200 baud async, that's about 1/2 millisecond.
But I have seen "production" Unix kernels which held "splhigh()" for
tens of milliseconds!]
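
[The arithmetic: an async character is 10 bits including start and stop,
so 19200 baud is 1920 chars/sec, about 520 microseconds per character.]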

The solution is to fix the latency breakers, *then* apply the mentioned
efficiency changes. A straightforward way to do that (known to many kernel
hackers, but by no means all) is one I recently described at length in
comp.arch; for those who don't read that group, a condensed version:

You split interrupt service into a "first-level"/hardware-oriented/
assembly-language part, and a "second-level"/software-oriented/C-language
part. You leave the "real" hardware interrupts always enabled (especially
during 2nd-level handlers, system calls, etc.). When an interrupt occurs,
all you do is clear the interrupting hardware, grab whatever really volatile
data there might be [e.g., a just-received async character], and queue up
a task block naming the 2nd-level handler to run -- if it's even needed
("soft"-DMA can often just stash the data in a buffer and dismiss).

The Unix "splXXX()" [Set Priority Level] routines are modified to manipulate
a *software* notion of priority, which is respected by the 2nd-level routines
and system-call level code (but not the hardware), but they never turn off
the *hardware* enables.
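
In condensed C (all names here are invented, and a real first-level
handler would be partly assembly; this is only a sketch of the scheme):

	int soft_spl;			/* software priority level      */

	int
	spltty(void)			/* no longer masks hardware     */
	{
		int old = soft_spl;

		soft_spl = SPL_TTY;	/* 2nd-level code honors this;  */
		return old;		/* the hardware never sees it   */
	}

	/* 1st-level handler: a few instructions with real interrupts
	 * masked.  Clear the chip, soft-DMA the byte into a buffer,
	 * and queue the 2nd-level handler only if it is needed.
	 */
	void
	sio_hard_intr(struct sio *chip)
	{
		int c = chip->data;	/* grab the volatile character */

		chip->csr = CLR_INTR;	/* quiet the hardware          */
		stash(c);		/* soft-DMA; often dismiss here */
		if (need_second_level(chip))
			post_task(sio_soft_intr);
	}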

Benefits:

1. The hardware interrupts are disabled only for the brief moment when a
   1st-level handler is running.
   
[You will be amazed how good your CPU's interrupt response time *really*
is -- especially if it's one of the new RISCs. Even older CISCs can handle
astounding numbers of interrupts per second. For example, a certain PDP8-based
terminal front-end handled 10,000 chars/sec *through* the node, interrupt
per char. 68000's do better. 29000's do *lots* better.]

2. The 1st-level tasks can usually be done in a few assembly instructions
   without saving very much CPU state; the 2nd-level tasks need a full
   C context, reentrant and "interruptable" -- a lot more state. Since
   interrupts are often "bursty", the two-level structure saves state
   *once* for several interrupts, a significant efficiency gain. In fact,
   interrupt handling gets more efficient the higher the interrupt rate.

3. Most interrupts from "character" devices can be handled entirely in
   the 1st-level handlers as "soft-DMA", or "pseudo-DMA", thus lessening
   further the number of full CPU state saves done. [This is the main
   benefit of Guy's first point.]

Applying the above to a Version 7 Unix port to a 5.5 MHz 68000 (years ago),
we were able to take a system which could hardly do a single 2400-baud UUCP
and get it to cheerfully handle three simultaneous 9600-baud UUCPs! ...and
with no change to the hardware: interrupt-per-character SIO chips.

[Sadly, I must admit that the reason that same system could never do even
*one* 19200-baud UUCP is that after we had achieved such a speedup, management
wouldn't let us spend the time to find out where the remaining latency-breaker
for 19200 was... somewhere in the once-a-second clock stuff, we thought.
Thus my Telebit is locked at 9600, not 19200. (*sigh*)]

To hammer the point home, there are three conflicting goals in doing "real-
time" work [and yes, Unix I/O *is* "real-time"!]: latency, efficiency, and
throughput. UNLESS YOU ARE VERY CAREFUL and explicitly pay attention to
"balance", efforts to improve one often have adverse effects on the others.


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403