brad@island.uu.net (Bradley Mabe) (03/18/89)
We seem to be having a curious problem with our Telebit modems. When connecting to another site via uucp in PEP mode we get an average send rate of around 1200 bytes per second, but an average receive rate of only 500 bytes per second. The modems are connected to a Sun 2/170 via a 16-channel Systech multiplexer and we are running SunOS 3.2. I have configured uucico to run at 19200 and have the serial ports set up at 19200 as well.

Below are our register settings:

E1 F1 M1 Q0 T V1 X1     Version BA4.00
S00=001 S01=000 S02=043 S03=013 S04=010 S05=008 S06=002 S07=080
S08=002 S09=010 S10=010 S11=070 S12=050 S45=255 S47=004 S48=000
S49=000 S50=000 S51=255 S52=002 S53=004 S54=003 S55=000 S56=017
S57=019 S58=000 S59=000 S60=000 S61=255 S62=003 S63=001 S64=000
S65=000 S66=001 S67=000 S68=255 S90=000 S91=000 S92=001 S95=000
S100=000 S101=000 S102=000 S104=000 S110=001 S111=255 S112=001
S121=000
N0: N1: N2: N3: N4: N5: N6: N7: N8: N9:

Any help with this problem will be greatly appreciated. Thanks

----------------------------------------------------------------------------
| Bradley Mabe.                "Born to be mild"                           |
| Island Graphics Corp.        4000 Civic Center Drive                     |
| San Rafael, Ca 94903         (415) 491-1000                              |
| {uunet,sun,well}!island!brad                                             |
----------------------------------------------------------------------------
root@libcmp.UUCP (Super user) (03/20/89)
You may wish to check the settings of your "S111" register; if both Telebits are configured for S111=255, then you have failed to select a valid protocol. The Telebit which is acting as the originator needs to select a protocol, i.e. S111=30, and the unit which is answering the call will then configure for the same protocol if S111=255 on that unit. I have noted that a PEP connection without a protocol is at least 20% less efficient. I suggest that you configure your dialer to select a protocol for dial-out contacts, and leave your S111=255 to handle incoming calls.

Ed Unrein  (peora!rtmvax!libcmp!ecu)
Liberty Computers, Inc.
(407) 293-6346  office voice number
UNIX guest account  (407) 299-6947  TELEBIT (8N1)  login: guest
UUCP: libcmp Any (speed any) 14072996947 ogin:-""-ogin:-""-ogin:nuucp
vixie@decwrl.dec.com (Paul A Vixie) (03/21/89)
There are a few things to consider when examining your performance on Telebit modems over UUCP.

First, the Telebit has some buffering. Since it's spoofing the sender on ACKs, the sender thinks it's sent the last packet a long time (sometimes five to ten seconds) before the receiver actually receives the last packet. The buffering is not a bug; it's part of what lets the Telebit spoof the UUCP protocol. You can get a more accurate measurement of bytes-per-second if you send huge files, since the extra five or ten seconds will make very little difference in the (bytes/time) calculation. 1MB is a good round number. And as Erik Fair has pointed out in the past, 1MB is an excellent news batch size to use over Telebit modems, since the end-of-file processing causes five or ten seconds of idle time on the modem; you want to minimize that by sending fewer files, which means sending bigger files.

Second, most or all serial-port hardware (and their device drivers) are designed to send data very quickly (like: filling up your screen when you get into an editor) but receive it rather slowly (like: you typing at your editor). When a Telebit modem tries to dump data in at 1800 CPS, there's a good chance that your system will miss a lot of it. You're probably dealing with an interrupt per incoming character - on your Sun 2/170, you are definitely dealing with one interrupt per character. Some serial-port hardware has a FIFO for incoming data, which helps a little, but the system sometimes still ends up interrupting at its maximum rate, since it's going to have another interrupt right after it gets back from emptying the FIFO. A high-water threshold that you can turn on when input data starts coming at you Real Fast is a good thing, and in fact there are serial boards on the market that have the feature, but the people who do the device drivers for these boards don't want to deal with interesting questions like: how do you know data is coming in Real Fast? How do you know when it slows down? What happens to the rest of the ports on the board when you turn on the high-water interrupt block?

I keep asking Mike @Telebit to put a thinwire ethernet port on the next Telebit modem product, since the ethernet hardware on most systems will do intelligent things like DMA with big blocks of data. But then Telebit would have to deal with high-water thresholds, plus telnet or whatever protocol was to be used on the ethernet. I have a feeling that serial hardware isn't going to get any better, either.

Summary: (1) send big files if you want an accurate bytes/sec number; 1MB is enough and is a good idea for news batches for other reasons. (2) expect someone's serial-receive performance to be the limiting factor in a PEP-UUCP transfer; if you can max out the modem, you have not one but two very unusual systems.
--
Paul Vixie
Work:  vixie@decwrl.dec.com    decwrl!vixie    +1 415 853 6600
Play:  paul@vixie.sf.ca.us     vixie!paul      +1 415 864 7013
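[A back-of-the-envelope sketch of the measurement effect Vixie describes, in Python for brevity. The 1400 cps link rate and 8-second end-of-transfer stall are illustrative guesses, not Telebit specifications:]

```python
# Apparent UUCP throughput when a fixed end-of-transfer stall (the
# spoofed-ACK tail) is folded into the elapsed time.  The link rate
# (1400 cps) and tail (8 s) are made-up round numbers for illustration.

def apparent_cps(nbytes, link_cps, tail_secs):
    return nbytes / (nbytes / link_cps + tail_secs)

# A 20 KB file: the fixed tail dominates, and the modem looks slow.
print(round(apparent_cps(20_000, 1400, 8)))      # -> 897
# A 1 MB file: the tail is lost in the noise of a ~12-minute transfer.
print(round(apparent_cps(1_000_000, 1400, 8)))   # -> 1384
```

[Which is the arithmetic behind "send huge files if you want an accurate number": the same modem measures 897 cps or 1384 cps depending only on file size.]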
romain@pyramid.pyramid.com (Romain Kang) (03/27/89)
In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
| When connecting to another site via uucp in pep mode we get an average
| send rate of around 1200 bytes per second, but an average receive rate
| of only 500 bytes per second. The modems are connected to a sun 2/170 via
| a 16 channel systec multiplexer and we are running sunOS 3.2.

Since no one else has responded, I guess I'll stick my neck out: Maybe a 2/170+Systech doesn't have the horsepower to receive any faster. Remember, one reason 't' protocol was invented was that VAX 11/780's running 'g' peaked out at about 900 bytes/sec over TCP/IP sockets, with the data DMA'd in, avoiding the tty overhead. Here you have a CPU that's probably slower; each character received means a CPU interrupt and a couple of context switches, and has to filter through the raw tty interface. Sending may be that much more CPU-efficient.

I see several possible solutions:

1. Get a faster processor (perhaps not easy for you).

2. Get a modernized UUCP. A SysV termio interface with VMIN and VTIME set to reasonable values, or a high-resolution sleep (like select()) might be used to avoid unnecessarily scheduling uucico.

3. Get a smart terminal server to handle the TrailBlazer. Some time back, Rutgers hacked their Cisco servers to allow UUCP logins to use port 540, thus avoiding the tty overhead. I think I was able to get 1350-1400 cps before this feature disappeared.

4. Put 'g' protocol in the kernel, where God meant it to be. I think Peter Honeyman once said you couldn't be considered serious about UUCP if you hadn't hacked your kernel for it somehow.

Two years ago, I cited the VAX statistics and claimed the UUCP spoof was no substitute for intelligent front-end hardware and more efficient protocols. Oops. Today, someone with an 80386 box and a decent serial card can do 'g' protocol faster than a '780. Time to find a predictable occupation, like migrant farm worker...
--
"Eggheads unite!  You have nothing to lose but your yolks!"  -Adlai Stevenson
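[A sketch of solution 2's select() idea, in Python rather than the SysV C it would really be written in: block once until data is ready, then drain whatever has accumulated, instead of waking uucico per character. A pipe stands in for the serial line; the names and timeout are illustrative:]

```python
# Wait for input with select(), then read a whole burst in one call.
# One wakeup per burst, not one per character.
import os
import select

def read_burst(fd, chunk=4096, timeout=0.2):
    """Block until data is ready (or timeout expires), then drain it."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return b''                     # timed out: nothing arrived
    return os.read(fd, chunk)          # one wakeup, many characters

r, w = os.pipe()
os.write(w, b'a burst of characters')  # simulate the modem dumping data
print(read_burst(r))                   # -> b'a burst of characters'
```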
jeff@tc.fluke.COM (Jeff Stearns) (03/28/89)
In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
>We seem to be having a curious problem with our telebit modems.
>When connecting to another site via uucp in pep mode we get an average
>send rate of around 1200 bytes per second, but an average receive rate
>of only 500 bytes per second. The modems are connected to a sun 2/170 via
>a 16 channel systec multiplexer and we are running sunOS 3.2.
 ...

You have discovered that the Sun kernel burns a lot of cycles servicing interrupts from the Systech serial interface when performing raw input. You can confirm this by running vmstat while uucico is receiving files; note the very high percentage of system time.

Sun has known this for quite some time. I doubt that they'll ever do anything more about it; they probably feel that the hardware is nearly obsolete. When I pointed this problem out to Sun in 1987, they suggested that I upgrade to a Sun-3 and run a custom device driver for ttya or ttyb. (I believe that they called it "CONSULT-HISPEED".) This "solution" wasn't viable for us, so I can't report on whether it works or not.
--
    Jeff Stearns       John Fluke Mfg. Co, Inc.   (206) 356-5064
    jeff@tc.fluke.COM  {uw-beaver,microsoft,sun}!fluke!jeff

PS - Calling all users of the Vitalink TransLAN IV Ethernet bridge! Please drop me a line.
wtm@neoucom.UUCP (Bill Mayhew) (03/28/89)
It has been my experience that quite a few implementations of Unix have pretty crummy tty drivers, especially on the receive side of the coin. The lack of optimization is probably due to the fact that most software engineers forget that anything other than a human being might be typing characters in. Most of the tty drivers generate a CPU interrupt per character received. With Unix this is nasty, as it might mean that several context switches take place for each character received. If you've got it, run vmstat while uucico is running; you'll probably see that the system time is very high.

Our solution was to dig our old AT&T Unix PC out of the closet and front-end it onto our Vax. The Unix PC manages pretty good transfers, as it is essentially a zero-user machine whose only function is to do uucp. The Unix PC tty driver seems to be reasonably well written too, which would make sense given that the machine is sold by a company whose main interest is telecommunication. We found this fix simpler than attempting to re-do the BSD drivers on the Vaxen. The Unix PC -> Vax transfers take place at a low baud rate to avoid shooting the load up too high on the Vax.

Actually, the above fix, though seemingly arcane, is reasonable since the current market price of a Unix PC is less than an internet-discounted TrailBlazer. I like the idea that someone mentioned of putting a thinwire ethernet port on a TrailBlazer.

Bill
stacy@mcl.UUCP (Stacy L. Millions) (03/29/89)
In article <64160@pyramid.pyramid.com>, romain@pyramid.pyramid.com (Romain Kang) writes:
> Time to find a predictable occupation, like migrant farm worker...

Does that mean you can predict weather and farm commodity prices? :-)

-stacy
--
"You should not drink and bake." - Arnold Schwarzenegger, _Raw Deal_
S. L. Millions                                      ..!tmsoft!mcl!stacy
root@texbell.UUCP (root) (03/29/89)
In article <640@island.uu.net> brad@island.uu.net (Bradley Mabe) writes:
> When connecting to another site via uucp in pep mode we get an average
> send rate of around 1200 bytes per second, but an average receive rate
> of only 500 bytes per second.

In article <64160@pyramid.pyramid.com> romain@pyramid.pyramid.com (Romain Kang) writes:
> each character received means a CPU interrupt
> and a couple of context switches, and has to filter through the raw tty
> interface.

In article <7457@fluke.COM> jeff@tc.fluke.COM (Jeff Stearns) writes:
> You have discovered that the Sun kernel burns a lot of cycles servicing
> interrupts from the Systech serial interface when performing raw input.

Question: How much greater is the CPU load when receiving uucp than when transmitting a file? Or is the disparity mainly due to Telebit buffering?
--
Greg
guy@auspex.UUCP (Guy Harris) (03/31/89)
>It has been my experience that quite a few implementations of Unix
>have pretty crummy tty drivers, especially on the receive side of
>the coin.  The lack of optimization is probably due to the fact
>that most software engineers forget that anything other than a
>human being might be typing characters in.  Most of the tty drivers
>generate a CPU interrupt per character received.

That's not a software engineer's problem, that's a hardware engineer's problem - the serial port hardware doesn't buffer up characters.

>With Unix this is nasty, as it might mean that several context switches
>take place for each character received.

That has nothing to do with the number of interrupts; in UNIX systems, interrupts tend to be serviced in whatever context was running at the time the interrupt occurred. Even if you have a streams driver, streams modules would tend to be run in the same context. The problem is that you get a *wakeup* for every character received; that's where the context switches come from. There are at least two ways around this:

1) When you receive a character, buffer it a while in the driver and see whether any more come in just after it. Only wake up the process waiting for input when enough characters come in, or when more than some amount of time elapses after "the last one" comes in. SunOS, for example, does this on its CPU serial ports; when running high-speed UUCP input (38.4KB), the difference between receiving on a CPU serial port and, say, an ALM-1 is noticeable. SunOS 4.0 does this on all serial ports, in order to fix some problems with streams flow control. 4.0 had some other problems with streams flow control (limits being set too low, and the CPU serial port driver sending "runt" streams messages upstream); 4.0.1 fixes those. This significantly reduced the CPU overhead for 19.2KB receiving on ALM-1 ports (ALM-1s don't run at 38.4), for example.

(Also, converting to streams reduced CPU overhead some more - I suspect it was because the old line discipline interface required one procedure call from the driver to the line discipline on every character, while the streams code requires one call per streams message, and if the characters are coming in thick and fast the driver tries to pack 16 or so characters per streams message.)

2) If you have VMIN and VTIME support in your tty driver, set VMIN and VTIME to do a similar sort of buffering (VMIN specifies "enough characters", and VTIME specifies "some amount of time"). 4.4BSD should give you VMIN and VTIME, since it'll have a POSIX-compliant tty driver. Perhaps the 4.4BSD UUCP will use VMIN and VTIME. To some degree, sticking delays into the UUCP receive code gives the same result; I think the 4.3BSD UUCP does so. (SunOS 4.0 already gives you VMIN and VTIME, although the UUCPs first tested were the current SunOS UUCPs, which still use the old tty driver "ioctl"s and thus use neither VMIN nor VTIME. As I remember, using the version of Honey DanBer UUCP slated for 4.1, which *does* use VMIN and VTIME, the CPU overhead dropped a bit more.)
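[The VMIN/VTIME scheme in point 2 can be sketched with Python's termios wrapper, which exposes the same POSIX fields a C uucico would set with ioctl/tcsetattr. A pty stands in for the serial line; the values 64 and 2 are illustrative, not tuned:]

```python
# Batched raw reads via VMIN/VTIME: ask the tty driver to wake the
# reader only after "enough characters" (VMIN) have arrived, or after
# the line goes quiet for VTIME tenths of a second.
import os
import pty
import termios
import tty

def set_bulk_read(fd, vmin=64, vtime=2):
    """Configure fd for batched non-canonical reads."""
    tty.setraw(fd)                      # raw mode: no line editing
    attrs = termios.tcgetattr(fd)
    attrs[6][termios.VMIN] = vmin       # "enough characters"
    attrs[6][termios.VTIME] = vtime     # "some amount of time"
    termios.tcsetattr(fd, termios.TCSANOW, attrs)

master, slave = pty.openpty()           # pty stands in for a real tty
set_bulk_read(slave)
attrs = termios.tcgetattr(slave)
print(attrs[6][termios.VMIN], attrs[6][termios.VTIME])   # -> 64 2
```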
rpw3@amdcad.AMD.COM (Rob Warnock) (03/31/89)
In article <1330@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
+---------------
| >...Most of the tty drivers generate a CPU interrupt per character received.
| That's not a software engineer's problem, that's a hardware engineer's
| problem - the serial port hardware doesn't buffer up characters.
+---------------

Well, it's also a Unix problem. Unix was originally written when terminals were *slooow*. (Can you still spell ASR-33?) Thus it never bothered anybody that the kernel cut all interrupts off for many milliseconds at a time. (The most serious offenders tend to be in the disk buffer cache search and in the time-of-day-crosses-a-second/minute/hour/day code.) Thus with higher speed lines, regardless of the *efficiency* (see below) of the TTY driver implementation, the rest of the kernel simply doesn't accommodate the low *latency* requirement of these speeds.

+---------------
| >With Unix this is nasty, as it might mean that several context switches
| >take place for each character received.
| That has nothing to do with the number of interrupts;
| The problem is that you get a *wakeup* for every character received;
| that's where the context switches come from.  There are at least two
| ways around this:
+---------------

...and Guy describes what I call "pseudo-DMA with dallying", and VMIN/VTIME. These are both *efficiency* optimizations, and while quite worth it in terms of efficiency [esp. pseudo-DMA + dallying], they can't help high-speed character input if the rest of the kernel breaks the *latency* requirement. [To put some numbers on it: since most serial ports these days have 3, maybe as little as 2, bytes of buffering, you can tolerate the TTY interrupts being shut off for at *most* 3 character times, and 1 char time is safer. With 19200 baud async, that's about 1/2 millisecond. But I have seen "production" Unix kernels which held "spl_high()" for tens of milliseconds!]

The solution is to fix the latency breakers, *then* apply the mentioned efficiency changes. A straightforward way to do that (known to many kernel hackers, but by no means all) I recently described at length in comp.arch, but for those who don't read that group, a condensed version:

You split interrupt service into a "first-level"/hardware-oriented/assembly-language part, and a "second-level"/software-oriented/C-language part. You leave the "real" hardware interrupts always enabled (especially during 2nd-level handlers, system calls, etc.). When an interrupt occurs, all you do is clear the interrupting hardware, grab whatever really volatile data there might be [e.g., a just-received async character], and queue up a task block naming the 2nd-level handler to run -- if it's even needed ("soft"-DMA can often just stash the data in a buffer and dismiss). The Unix "splXXX()" [Set Priority Level] routines are modified to manipulate a *software* notion of priority, which is respected by the 2nd-level routines and system-call level code (but not the hardware); they never turn off the *hardware* enables.

Benefits:

1. The hardware interrupts are disabled only for the brief moment when a 1st-level handler is running. [You will be amazed how good your CPU's interrupt response time *really* is -- especially if it's one of the new RISCs. Even older CISCs can handle astounding numbers of interrupts per second. For example, a certain PDP8-based terminal front-end handled 10,000 chars/sec *through* the node, interrupt per char. 68000's do better. 29000's do *lots* better.]

2. The 1st-level tasks can usually be done in a few assembly instructions without saving very much CPU state; the 2nd-level tasks need a full C context, reentrant and "interruptable" -- a lot more state. Since interrupts are often "bursty", the two-level structure saves state *once* for several interrupts, a significant efficiency gain. In fact, interrupt handling gets more efficient the higher the interrupt rate.

3. Most interrupts from "character" devices can be handled entirely in the 1st-level handlers as "soft-DMA", or "pseudo-DMA", thus lessening further the number of full CPU state saves done. [This is the main benefit of Guy's first point.]

Applying the above to a Version 7 Unix port to a 5.5 MHz 68000 (years ago), we were able to take a system which could hardly do a single 2400-baud UUCP and get it to cheerfully handle three simultaneous 9600-baud UUCPs! ...and with no change to the hardware: interrupt-per-character SIO chips. [Sadly, I must admit that the reason that same system could never do even *one* 19200-baud UUCP is that after we had achieved such a speedup, management wouldn't let us spend the time to find out where the remaining latency-breaker for 19200 was... somewhere in the once-a-second clock stuff, we thought. Thus my Telebit is locked at 9600, not 19200. (*sigh*)]

To hammer the point home: there are three conflicting goals in doing "real-time" work [and yes, Unix I/O *is* "real-time"!]: latency, efficiency, and throughput. UNLESS YOU ARE VERY CAREFUL and explicitly pay attention to "balance", efforts to improve one often have adverse effects on the others.

Rob Warnock
Systems Architecture Consultant
UUCP:  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:  (415)572-2607
USPS:  627 26th Ave, San Mateo, CA 94403
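[A toy model of Rob's two-level split, in Python since the real thing lives in kernel assembly and C. The "first-level handler" does the only work performed with interrupts off - stashing the character in a ring buffer and queueing the second-level task at most once per burst - while the "second-level" task later drains the whole burst under full context. All names and sizes are illustrative:]

```python
# Two-level interrupt handling, simulated: cheap per-interrupt stash,
# one expensive context-bearing drain per burst ("pseudo-DMA").
RING_SIZE = 256
ring = [0] * RING_SIZE
head = tail = 0
pending = False          # has a second-level task already been queued?

def first_level_rx(ch):
    """Runs once per 'interrupt': a handful of instructions, no wakeup."""
    global head, pending
    ring[head % RING_SIZE] = ch
    head += 1
    if not pending:
        pending = True   # queue the 2nd-level handler exactly once

def second_level_rx():
    """Runs later, once per burst, with full state saved just once."""
    global tail, pending
    batch = bytes(ring[i % RING_SIZE] for i in range(tail, head))
    tail = head
    pending = False
    return batch

for ch in b'burst!':     # six "interrupts", but only one task queued
    first_level_rx(ch)
print(second_level_rx()) # -> b'burst!'
```

[The point the simulation makes is Rob's benefit 2: six interrupts cost six cheap stashes plus one full-context drain, and the ratio only improves as bursts get longer.]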