[comp.protocols.tcp-ip.ibmpc] Intermittent Novell+BYU+PD+CUTCP hangups

djc@duke.cs.duke.edu (David J. Cherveny) (12/08/90)

In the last couple of months, we have started using CUTCP, packet drivers
and the BYU Novell-packet driver interface to do TCP/IP Telnet from our Novell
workstations.  It all works fine... most of the time.

Intermittently, a workstation will lose touch with the server and give
the good ole IGNORE, RETRY, ABORT message.  Retrying does not help.

I've rebuilt the ni5210 driver with the DAN code turned off as it is known
to cause problems with Novell nets.  I've heard the "free" version 2.1 BYU
code can cause some hangs, but I'm unwilling to pay for the 3.0 version just
yet.

Until recently, we thought it was occurring only on workstations using the
packet driver setup.  However, it has now occurred on NORMALLY CONFIGURED
workstations as well.

It seems to happen in rashes.  For an hour all hell will break loose,
then it will be quiet for days, then it will happen again.

It does NOT happen to the STARLAN connected workstations.

We've used NETWATCH during these spells but haven't noticed anything unusual.

We are connected to a campus "backbone" network with a MAC level bridge.

It is starting to sound like a HW problem to me since unmodified workstations
are involved.

Does anyone have any suggestions on how to proceed?



HW and SW Details:

	DOS 4.0 on AT&T PCs
	Micom/Interlan NI5210 cards in workstations

	Novell 286 Netware SFT v2.15 rev c
	NI5210 version 7 packet driver with DAN code OFF
	BYU Novell-Packet driver interface V 2.1
	CUTCP 2.2d

	Network HW is Synoptics Lattice Net UTP from a 110 box.



David Cherveny
Duke University Medical Center
djc@hodgkin.mc.duke.edu		(919)684-6804

trier@cwlim.INS.CWRU.Edu (Stephen C. Trier) (12/08/90)

In article <660597854@lear.cs.duke.edu> djc@duke.cs.duke.edu (David J. Cherveny) writes:
>Intermittently, a workstation will lose touch with the server and give
>the good ole IGNORE, RETRY, ABORT message.  Retrying does not help.

I know that bug!!!  We're having exactly the same problems here.  They
started happening in August, about the time we started widespread use
of the packet drivers.

I spent a week watching our network analyzer for traces of what was going
on and found nothing.  We've been swapping cards, changing cables for the
servers, and everything else we could think of.

The problem seems to affect only computers that haven't sent network
packets for ten minutes or so, which leads me to suspect that the server
keep-alive packets are somehow getting dropped.

Hardware used: Just about anything that runs MS-DOS 3.0 or higher.  Many
machines are AT&T 6386's, but we have bunches of Zeniths and PS/2's, too.
The file servers are all AT&T 6386's.  Ethernet cards are mostly AT&T
Starlan-10 Fiber NAU's, but we've also got bunches of 3c503's, 3c523's,
and Cabletron 1020 and 1040 cards.  The servers use 3c505's.

Software: CWRU-PC/IP (local PC/IP descendant) and BYU's packet driver
IPX, version 2.1.  The servers are running Advanced Netware 2.15 rev C,
2.15 rev A, and 2.15 rev 0.

It's nice to know that we aren't imagining this.  Does anyone have any
ideas where to start looking?  The failures are random, which makes
watching with the net. analyzer a little difficult.

-- 
Stephen Trier                              Case Western Reserve University
Work: trier@cwlim.ins.cwru.edu             Information Network Services
Home: sct@seldon.clv.oh.us               %% Any opinions above are my own. %%

nelson@sun.soe.clarkson.edu (Russ Nelson) (12/09/90)

In article <1990Dec8.061321.11400@usenet.ins.cwru.edu> trier@cwlim.INS.CWRU.Edu (Stephen C. Trier) writes:

   In article <660597854@lear.cs.duke.edu> djc@duke.cs.duke.edu (David J. Cherveny) writes:
   >Intermittently, a workstation will lose touch with the server and give
   >the good ole IGNORE, RETRY, ABORT message.  Retrying does not help.

   I know that bug!!!  We're having exactly the same problems here.  They
   started happening in August, about the time we started widespread use
   of the packet drivers.

Kelly McDonald, the author of the BYU Packet driver shell says:

   The problem:
      Idle workstations logged into Novell servers periodically come
   up with an error message stating that they have lost a connection
   to their logged in file server and their connection is no longer
   valid.

   The cause:
      Other stations (besides the idle one) that are running the packet
   driver shell sometimes incorrectly respond to the "watchdog" packet
   sent out to the idle station from the server to see if it is still
   alive. The incorrect response causes the server to close the
   connection to the idle station.  When the user of the idle station
   tries to access the server again, the error message is generated.
   (As far as we can tell, this only occurs with Netware 286 or
   earlier servers.)
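
To make the failure mode above concrete, here is a toy model of it in Python.
Everything in it is an assumption made for illustration: the Server class,
the reply codes, and the check_sender option are hypothetical and are not
taken from NetWare or from the BYU shell.  It only shows how a reply from the
wrong station can cost an idle station its connection.

```python
# Toy model only: names, reply codes, and server logic are assumptions
# for illustration, not NetWare's actual implementation.
from dataclasses import dataclass, field

VALID = "Y"    # assumed reply meaning "this connection is still in use"
INVALID = "N"  # assumed reply meaning "no such connection here"


@dataclass
class Server:
    # connection number -> node address of the station that owns it
    connections: dict = field(default_factory=dict)

    def login(self, conn, node):
        """Record that `node` owns connection number `conn`."""
        self.connections[conn] = node

    def handle_watchdog_reply(self, conn, reply, from_node=None,
                              check_sender=False):
        """Process a reply to a watchdog poll for connection `conn`."""
        if conn not in self.connections:
            return
        if check_sender and from_node != self.connections[conn]:
            return  # ignore replies from stations that do not own `conn`
        if reply != VALID:
            del self.connections[conn]  # server closes the connection


server = Server()
server.login(1, "idle-station")
server.login(2, "busy-station")

# The busy station wrongly answers the poll that was meant for connection 1.
server.handle_watchdog_reply(1, INVALID, from_node="busy-station")
idle_station_dropped = 1 not in server.connections
```

In this model the idle station notices nothing until it next tries to use
connection 1, which matches the symptom described.  The check_sender flag
shows one hypothetical defence (ignoring replies from stations that do not
own the connection); whether any real server does this is not known here.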


There would seem to be several solutions:

  o License the 3.0 Packet driver shell from Kelly McDonald.  He has
    licensed it back from Atlantix for use by degree-granting American
    universities only.  He wants several thousand dollars, which I'm sure
    merely reflects *his* cost from Atlantix.

  o Re-implement the packet driver shell and copyleft it.  This requires the
    use of Novell's device driver kit, which costs $7,500.  Now, that's
    a heap of money.  Perhaps we could convince a manufacturer who's already
    bought it to let someone use theirs.  That might be difficult, as Novell
    requires a nondisclosure agreement.  Perhaps we should form an ad-hoc
    consortium to write a freely copyable packet driver shell?

  o Wait until Novell writes their ODI-over-packet-driver interface.

  o Switch to another LAN operating system that supports the packet drivers,
    you know, ???????.  Hmmm...  That would seem to be a problem.  Perhaps
    we could convince Artisoft or whomever to include packet driver support?

--
--russ (nelson@clutx [.bitnet | .clarkson.edu])  FAX 315-268-7600
It's better to get mugged than to live a life of fear -- Freeman Dyson
I joined the League for Programming Freedom, and I hope you'll join too.

Jan.Engvald@ldc.lu.se (Jan Engvald LDC) (12/09/90)

>   >Intermittently, a workstation will lose touch with the server and give
>   >the good ole IGNORE, RETRY, ABORT message.  Retrying does not help.
> 
>   I know that bug!!!  We're having exactly the same problems here.  They
>   started happening in August, about the time we started widespread use
>   of the packet drivers.
> 
>Kelly McDonald, the author of the BYU Packet driver shell says:
> 
>   The problem:
>      Idle workstations logged into Novell servers periodically come
>   up with an error message stating that they have lost a connection
>   to their logged in file server and their connection is no longer
>   valid.
> 
>   The cause:
>      Other stations (besides the idle one) that are running the packet
>   driver shell sometimes incorrectly respond to the "watchdog" packet
>   sent out to the idle station from the server to see if it is still
>   alive. The incorrect response causes the server to close the
>   connection to the idle station.  When the user of the idle station
>   tries to access the server again, the error message is generated.
>   (As far as we can tell, this only occurs with Netware 286 or
>   earlier servers.)

Does anybody have more details on the cause proposed above?

Reading between the lines I get the impression that the bad station sends
a response to the server with a from address that is not its own. Is it
the Ethernet address or the IPX address or both?

We have been plagued by this aborted communication ever since June. We have
been running packet drivers with the BYU driver for several years, so it
is hard to believe that either of those is the cause. In late May, however,
we got the Novell 3.01 rev A shells, and I would guess that they have
something to do with the error. Rev D of NETx does not seem to help
with this error. I have seen rumors of a rev B of the 3.01 IPX; it might help.
Is there any anonymous FTP server with IPX 3.01 rev B?

If a new IPX does not help and the problem really is a wrong from address,
it would be easy, as a temporary fix, to build a special packet driver
version that forces the correct from address on every Novell packet.
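
As a rough sketch of that temporary fix, written in Python for readability
rather than as real packet driver code: the function names are hypothetical,
and the sketch assumes Novell frames can be recognised either by Ethernet
type 0x8137 or by "raw" 802.3 framing, where the IPX checksum bytes FF FF
follow the length field directly.  It rewrites only the Ethernet source
address.

```python
# Sketch only: function names and frame handling are hypothetical and
# are not taken from any existing packet driver.
ETHERTYPE_IPX = 0x8137              # Ethernet II type used for Novell IPX
RAW_8023_IPX_MARKER = b"\xff\xff"   # IPX checksum field after an 802.3 length
MAX_8023_LENGTH = 1500              # values up to this are a length, not a type


def is_novell_frame(frame: bytes) -> bool:
    """Return True if the frame looks like it carries a Novell IPX packet."""
    if len(frame) < 16:
        return False
    type_or_len = int.from_bytes(frame[12:14], "big")
    if type_or_len == ETHERTYPE_IPX:
        return True
    # "Raw" 802.3 framing: a length field followed directly by the IPX header.
    return (type_or_len <= MAX_8023_LENGTH
            and frame[14:16] == RAW_8023_IPX_MARKER)


def force_source_address(frame: bytes, my_addr: bytes) -> bytes:
    """Overwrite the Ethernet source address of outgoing Novell frames."""
    if len(my_addr) != 6:
        raise ValueError("Ethernet address must be 6 bytes")
    if not is_novell_frame(frame):
        return frame  # leave TCP/IP and other traffic untouched
    # Layout: destination (bytes 0-5), source (bytes 6-11), type/length (12-13).
    return frame[:6] + my_addr + frame[12:]
```

Note that the IPX header carries its own source node field, which this
sketch does not touch.  If the bad from address turns out to be the IPX one
rather than the Ethernet one, that field would need the same treatment.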
                                             
Jan Engvald, Lund University Computing Center
________________________________________________________________________
   Address: Box 783                E-mail: Jan.Engvald@ldc.lu.se
            S-220 07 LUND     Earn/Bitnet: xjeldc@seldc52
            SWEDEN           (Span/Hepnet: Sweden::Gemini::xjeldc)
    Office: Soelvegatan 18         VAXPSI: psi%2403732202020::xjeldc
 Telephone: +46 46 107458          (X.400: C=se; A=TeDe; P=Sunet; O=lu;
   Telefax: +46 46 138225                  OU=ldc; S=Engvald; G=Jan)
     Telex: 33533 LUNIVER S