[comp.unix.questions] Major problems with A/UX serial drivers?

alexis@panix.uucp (Alexis Rosen) (06/26/91)

(Since some serial guru outside of the A/UX world may have relevant info,
I'm posting this to comp.unix.questions, but followups to comp.unix.aux only.)

I've been having major grief with A/UX serial port behavior for quite a
while now. Unlike previous problems (which remain unresolved), these
latest problems make the entire machine utterly unusable when they occur.

I may not have all of the relevant information I'd like, but the seriousness
of this matter is prompting me to post early. I will follow up with more info
as it becomes available. The machine is a Mac IIx with 8MB RAM, running
A/UX 2.0.1. It has 4 disks totalling about 1200MB and two CommCard 4-port
serial boards.

Ever since the upgrade to 2.0.1, or a little after, we have experienced
system crashes every few days. They'd be more frequent, but we reboot
whenever we notice the symptoms, provided we're around when they appear.
The crashes are all clist panics. The pre-crash symptoms are loss of
character input or output. For long periods, terminals will go dead, only to
behave normally for a second or two, and then go dead again. This behavior
is observable on the console, in the CommandShell windows, and on built-in
and add-on serial ports. The machine itself seems fine, though: running
processes behave normally and log no errors, the file system's OK, etc.
Of course uucico doesn't like things because it can't talk anymore.

We are pretty sure this has nothing to do with the Mac environment. We
rebooted a few days ago and never invoked the MacOS, even to run /etc/Login.
Yet we just had serious "character disease" occur again fifteen minutes ago.
(I managed to reboot, after trying to type "/etc/reboot" for about six
minutes.)

The big question is whether or not the add-on serial ports are causing
problems. But in a way, it's irrelevant. They worked fine in 2.0 (at least
in this respect), and if they're broken now then A/UX is broken. On the
other hand, it may be that the serial drivers for the built-in ports are
bad in 2.0.1. In that case, we'll know in a few days, because tonight we
are moving all modems off the built-in ports onto the cards. (For those who
might ask: the CommCard drivers make no use of anything past u-dot.)

As for why I suspect the serial ports: well, first of all, clists are used
by the serial drivers, and not much else. And the "character disease" seems
to be exactly what you'd see if you had major clist problems. (But I'm not
an expert here and am open to additional or contrary information.)
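
A toy model may make the clist reasoning concrete. In classic Unix tty
code, all terminal lines draw their character blocks from one kernel-wide
free list, so a driver that fails to return blocks would eventually starve
every line at once: console, built-in ports, and add-on ports alike. The
sketch below is only an illustration of that idea. The class, the function,
and the "leak" are hypothetical; nothing here is A/UX kernel code, and
whether 2.0.1 actually leaks blocks is not established.

```python
class CblockPool:
    """Toy model of a kernel-wide pool of fixed-size character blocks."""

    def __init__(self, nblocks):
        self.free = nblocks

    def alloc(self):
        """Take one block from the shared pool; return False if it is empty."""
        if self.free == 0:
            return False
        self.free -= 1
        return True

    def release(self):
        """Return one block to the shared pool."""
        self.free += 1


def simulate(nblocks, rounds, leak_every):
    """Move `rounds` blocks of tty traffic through a pool of `nblocks`.

    Every `leak_every`-th block is never returned (the hypothetical bug).
    Returns the number of rounds in which a tty found the pool empty.
    """
    pool = CblockPool(nblocks)
    starved = 0
    for i in range(1, rounds + 1):
        if not pool.alloc():
            starved += 1      # no block available: this I/O request stalls
            continue
        if i % leak_every != 0:
            pool.release()    # normal path: block goes back to the pool
    return starved
```

With a slow leak, the pool looks healthy for a long time and then fails for
everyone at once. That would fit a busy system hitting the problem in days
while a quiet one takes weeks.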

The other reason is that this system, Panix, is (as c.u.aux readers probably
know by now :-) likely the biggest byte-moving A/UX system around, excepting
maybe the ftp machine at Apple (but that's all Ethernet). So I think we're
hitting a condition in a few days that other systems wouldn't hit for weeks,
and by then they'd probably have been rebooted for other reasons.

As I said, I'll post more once we've experimented with not using the built-in
ports at all. But in the meantime, if anybody at all has any ideas or
suggestions, no matter how off the wall, I'd like to hear them. And do any
other A/UX sites which use serial I/O ever see problems like this?

Also, while I'm criticizing A/UX's serial drivers... I have finally figured
out something that has puzzled me (and others) for months now. A few months
ago, there was some debate about whether getty would properly back off of a
serial port that cu, kermit, or uucico (or whatever) had dialed out on. I
maintained that this works fine, as did several other people. Others, however,
said that this did _not_ work and that the getty would start talking to the
comm program or uucico. I thought that they were wrong, but they're not.

The key difference on Panix seems to be whether or not we're using the
built-in serial ports. If we use the CommCard ports, getty behaves politely
and backs off. If we use the built-ins, getty and uucico clobber each other.
This suggests to me that the built-in drivers are broken. They fail in some way,
though I can't figure out how. (At first I thought that getty and uucico
only look at the lock files in /usr/spool/uucp. But perhaps they try to lock
the special files? Does this involve the driver? As you can tell I have only
the faintest idea what I'm talking about here.)
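
For reference, the lock-file half of that question can be sketched without
involving the driver at all. The code below shows the conventional UUCP-style
scheme: create LCK..<device> exclusively, write the owner's PID into it, and
treat a lock whose owner is dead as stale. It is a minimal sketch under
assumptions: the PID is stored as 10-character ASCII (HoneyDanBer style;
some older UUCPs wrote it in binary), and the function names are invented
for illustration. Which format and directory the A/UX 2.0.1 getty and uucico
really use is not confirmed here.

```python
import errno
import os


def lock_path(spool_dir, device):
    """Build the conventional lock name, e.g. LCK..tty0 for /dev/tty0."""
    return os.path.join(spool_dir, "LCK.." + os.path.basename(device))


def pid_alive(pid):
    """Probe a PID with signal 0, which tests existence without signalling."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        return err.errno != errno.ESRCH   # EPERM: exists, but not ours
    return True


def try_lock(spool_dir, device):
    """Try to claim `device`; return True if we now hold the lock."""
    path = lock_path(spool_dir, device)
    for _ in range(2):                    # second pass only after a stale lock
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
        except FileExistsError:
            try:
                with open(path) as f:
                    owner = int(f.read().strip())
            except (OSError, ValueError):
                return False              # unreadable contents: assume busy
            if pid_alive(owner):
                return False              # live owner: back off politely
            try:
                os.unlink(path)           # owner is gone: clear the stale lock
            except FileNotFoundError:
                pass                      # someone else cleared it first
            continue
        with os.fdopen(fd, "w") as f:
            f.write("%10d\n" % os.getpid())   # assumed HDB-style ASCII PID
        return True
    return False


def unlock(spool_dir, device):
    """Release a lock previously taken with try_lock()."""
    os.unlink(lock_path(spool_dir, device))
```

Nothing in this scheme touches the special file itself, so if both programs
follow it, the driver should not matter. That the built-in ports behave
differently hints that something beyond the lock file is involved, such as
how an open of the port blocks while waiting for carrier.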

Lastly, I'm not sure what to ascribe this to, but it's probably the serial
drivers: I don't think that the timing of modem control signals is being done
right. In particular, we often see that modems will answer a call, and their
DCD lights will go on, but the corresponding getty will never wake up. On
the other end, when people log out, sometimes they'll get another getty,
instead of seeing the line hang up on them, as it should.
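
One way to narrow this down is to compare the modem's DCD light with what
the driver itself reports. On systems that provide the TIOCMGET ioctl, the
sketch below reads the modem-control bits for a port and names the lines
that are asserted. This assumes a present-day POSIX-like system with that
ioctl; whether A/UX 2.0.1 offers it is an assumption, and the function names
are invented for illustration.

```python
import fcntl
import os
import struct
import termios


def decode_modem_bits(bits):
    """Name the control lines that are asserted in a TIOCMGET bitmask."""
    names = [
        ("DCD", termios.TIOCM_CAR),
        ("DTR", termios.TIOCM_DTR),
        ("DSR", termios.TIOCM_DSR),
        ("RTS", termios.TIOCM_RTS),
        ("CTS", termios.TIOCM_CTS),
        ("RI", termios.TIOCM_RNG),
    ]
    return [name for name, mask in names if bits & mask]


def read_modem_bits(device):
    """Return the driver's view of the modem-control lines for `device`."""
    # O_NONBLOCK keeps the open from sleeping until carrier appears.
    fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)
    try:
        buf = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]
    finally:
        os.close(fd)
```

If the modem shows carrier but "DCD" is missing from
decode_modem_bits(read_modem_bits(port)), the driver never saw the
transition, which would explain a getty that stays asleep.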


All in all the behavior of serial ports under A/UX is an entirely dreadful
mess. I know that these are not perfect bug reports, and I will continue to
gather more information, but I'm hoping that someone will turn up something,
and that someone in A/UX development will say "Aha! I know where that bug
is!" and proceed to fix it. Well, it could happen...

---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
alexis@panix.com
{cmcl2,apple}!panix!alexis