[comp.unix.sysv386] TCP/IP dies and won't resurrect

jimmy@denwa.info.com (Jim Gottlieb) (09/29/90)

We have a network of 386 machines, mostly running Interactive 2.0.2
with Interactive 1.2 TCP/IP and NFS.  The network is still in the
experimental stage.  We have the root filesystem (/) of each of nine
machines mounted on the tenth, but there is almost no traffic across
the network.

The problem is that at random we find that all network services on a
particular machine have died.  I bring the machine down to init level
2, then back up to 3.  At this point I get flooded with error messages,
starting with:

cp: cannot create /dev/netsched
cp: no such device
tcp timer failed

I have looked at the permissions and major/minor numbers of netsched,
udp, arp, ip, and tcp in the /dev directory and they look the same as
on working machines.  I have tried rm'ing them so that the programs can
recreate them.  I also tried running /etc/conf/bin/idmknod.  Even when I
get past this problem, I then get errors about Streams.
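One way to make the "they look the same as on working machines" check repeatable is to dump the permissions and major/minor numbers of the relevant device nodes on each machine and diff the output.  The following is a minimal sketch only, assuming a Python 3 interpreter is available (it would not have been on a stock 386/ix 2.0.2 box) and that the nodes live directly under /dev with the names given above.  The function name describe_nodes is hypothetical, not part of any Interactive tool.

```python
import os
import stat


def describe_nodes(names, devdir="/dev"):
    """Return one line per node: name, permission string, major,minor.

    Hypothetical helper for comparing device nodes across machines;
    assumes the nodes sit directly under devdir.
    """
    lines = []
    for name in names:
        path = os.path.join(devdir, name)
        try:
            st = os.lstat(path)  # lstat: don't follow symlinks
        except FileNotFoundError:
            lines.append(f"{name}: missing")
            continue
        mode = stat.filemode(st.st_mode)  # e.g. "crw-rw-rw-"
        if stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode):
            dev = st.st_rdev
            lines.append(f"{name}: {mode} {os.major(dev)},{os.minor(dev)}")
        else:
            lines.append(f"{name}: {mode} (not a device node)")
    return lines


if __name__ == "__main__":
    # Node names taken from the post; adjust for your installation.
    for line in describe_nodes(["netsched", "udp", "arp", "ip", "tcp"]):
        print(line)
```

Running this on a failing machine and on a working one and comparing the two listings would show any node that is missing, is not a device node, or has a different mode or major/minor pair.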

The only solution I've found is to reload Streams, TCP/IP, and NFS in
that order.  Any better solutions would be welcomed.  Thanks.

cpcahil@virtech.uucp (Conor P. Cahill) (09/29/90)

In article <533@denwa.uucp> jimmy@denwa.info.com (Jim Gottlieb) writes:
>The problem is that at random we find that all network services on a
>particular machine have died.  I bring the machine down to init level
>2, then back up to 3.  At this point I get flooded with error messages,
>starting with:

In 386/ix 2.0.2 this is a known bug (as far as I am concerned).  The correct
way to recover from it is to bring the system down (not just to run level 2)
and power cycle the machine (the power cycle may not be necessary, but you
do have to go all the way down).

>The only solution I've found is to reload Streams, TCP/IP, and NFS in
>that order.  Any better solutions would be welcomed.  Thanks.

You definitely shouldn't have to reload.  We had this kind of problem at
least once a week when we were running 2.0.2.

The real answer is to upgrade to 2.2, where the problem appears to have
gone away.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170