[comp.sys.proteon] FAL 1.2/BTI/Proteon problem

swb@CHUMLEY.TN.CORNELL.EDU (12/15/88)

>From IBMTCP-L@cunyvm Thu Dec 15 01:34:06 1988
Date:         Wed, 14 Dec 88 12:59:24 CST
Reply-To: IBM TCP/IP For VM List <IBMTCP-L@cunyvm>
Sender: IBM TCP/IP For VM List <IBMTCP-L@cunyvm>
From: "David L. Merrifield" <DM06900@uafsysb>
Subject:      FAL 1.2/BTI/Proteon problem
To: Scott Brim <SWB@tcgould.TN.cornell.EDU>

Before we go crazy here, I thought I'd broadcast this cry for help on the
net to see if anyone else has experienced similar behavior or had an opinion
on the solution to our problem.

Apology:  Please forgive the length of this inquiry.

Environment:  FAL 1.2 with David Lippke's driver, IBM 4381-14 with BTI ELC,
   Proteon p4200 router at Release 8.0, thin-ethernet connecting the two.

What works:  We can ping/telnet/ftp from the IBM host to any other host on
   the ethernet, including PCs, Unix minis, etc.  All of these other hosts
   can ping/telnet/ftp the Proteon p4200 router.

What doesn't work:  We are unable to ping/telnet/ftp from the IBM host to
   the Proteon p4200 router.  All we get are timeouts, even after repeated
   attempts.

Details:  Our traces through the FAL code show that (on a ping, for example)
   FAL attempts to resolve the hardware (ethernet) address of the Proteon
   by sending an ARP Request, broadcast on the ethernet.  FAL never receives
   the ARP Reply coming from the Proteon.
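
For readers who want to see what that broadcast looks like on the wire, here
is a minimal Python sketch of an ARP Request frame for IPv4 over Ethernet.
It is an illustration only, not FAL code; the function name
build_arp_request and the MAC/IP addresses are made up for the example.

```python
import struct

ETHERTYPE_ARP = 0x0806
BROADCAST = b"\xff" * 6  # Ethernet broadcast address

def build_arp_request(src_mac: bytes, src_ip: bytes, target_ip: bytes) -> bytes:
    """Return an Ethernet frame (no FCS, no padding) carrying an ARP Request."""
    eth_header = BROADCAST + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: 1 = request (a reply uses 2)
        src_mac,      # sender hardware address
        src_ip,       # sender protocol address
        b"\x00" * 6,  # target hardware address: unknown, this is the question
        target_ip,    # target protocol address
    )
    return eth_header + arp

# Hypothetical addresses, chosen only for the example.
frame = build_arp_request(
    bytes.fromhex("020000000001"),
    bytes([192, 0, 2, 1]),
    bytes([192, 0, 2, 2]),
)
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP packet
```

The matching ARP Reply has the same layout with opcode 2, is addressed to the
requester rather than broadcast, and carries the answer in the sender
hardware address field.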

   With our limited debugging tools, we have determined that the Proteon is
   sending an ARP Reply, but there may be something screwy with the packet
   because the BTI never sends it down the channel to the FAL code.

   We contacted BTI and they indicated that the packet may not be passing
   the hardware diagnostic check for validity in the BTI box.  We just
   happen to have two IBM 4381s and two BTI boxes (both exhibiting the same
   symptoms), so we used one to monitor the activity between the other 4381
   and the Proteon.  We ran the IPLable diagnostic program that BTI
   supplied with the box, putting the BTI in promiscuous mode and dumping
   all of the packets on the ethernet.  We were unable to see any ARP Reply
   packets from the Proteon.

   BTI then gave us a zap to the diagnostic program that places the box in
   both promiscuous and diagnostic mode, which lets us see all packets on
   the ethernet, even those that fail the hardware error checking.  When we
   ran the program, we could *finally* see the ARP Reply packets, *but* we
   noticed that the packet lengths reported by the program were not 60 bytes
   (as expected), but were varying lengths, usually in the range of 22-28
   bytes.  This would seem to indicate that the Proteon is sending out
   invalid ARP Reply packets, which the BTI is rejecting.
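
The length arithmetic behind that expectation can be sketched in a few lines
of Python. This assumes the diagnostic program reports frame lengths without
the 4-byte FCS, which the expected figure of 60 suggests but which we have
not confirmed; the helper names pad_to_minimum and is_runt are made up for
the illustration.

```python
ETH_HEADER_LEN = 14      # destination MAC + source MAC + EtherType
ARP_IPV4_LEN = 28        # ARP packet for IPv4 over Ethernet
MIN_FRAME_NO_FCS = 60    # 64-byte Ethernet minimum minus the 4-byte FCS

def pad_to_minimum(frame: bytes) -> bytes:
    """Zero-pad a frame up to the minimum length; longer frames are unchanged."""
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")

def is_runt(length: int) -> bool:
    """True if a reported length (without FCS) is below the Ethernet minimum."""
    return length < MIN_FRAME_NO_FCS

# Placeholder bytes standing in for a real header + ARP packet.
unpadded = bytes(ETH_HEADER_LEN + ARP_IPV4_LEN)
print(len(unpadded), len(pad_to_minimum(unpadded)))  # 42 60

for reported in (22, 28, 60):
    print(reported, "runt" if is_runt(reported) else "ok")
```

Note that under the same assumption, lengths of 22-28 bytes are shorter than
the 42 bytes that the header plus ARP packet need even before padding.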

   We contacted Proteon and they weren't of much help.  They seem to think
   that if the other hosts on the ethernet can trade ARP requests and replies
   with the p4200, then there can't be anything wrong with their box.

Similar environments:  We have contacted another site that has an identical
   configuration to ours, and they aren't having the problem.

And in conclusion:  (Thanks for staying with me and reading this far  :-)
   Is there anyone who might have any suggestions?
------------------------------------------------------------------------
David L. Merrifield                  Bitnet:  DM06900@UAFSYSB
University of Arkansas               Phone:   (501) 575-2901

lekash@ORVILLE.NAS.NASA.GOV (John Lekashman) (12/16/88)

We had a somewhat similar (but not identical) problem, where an Amdahl
running UTS could not complete the ARP transaction.
It turned out to be a bug in the driver, where it would clobber
the incoming ARP packet with a retransmit of the ARP request.
Someone was being frugal with buffers, and it did not work if
the timing was wrong.

We used the workaround of having a couple of VAXes supply the
ARP response for the broken machine, until we got the driver
fixed.
					john

CLIFF@UCBCMSA.BITNET (Cliff Frost {415} 642-5360) (12/16/88)

> Before we go crazy here, I thought I'd broadcast this cry for help on the
> net to see if anyone else has experienced similar behavior or had an opinion
> on the solution to our problem.

Well, we've seen something similar with Proteon Ethernet boards and our
Ungermann-Bass boards in PCs.  But, you are running thin-wire and that
may change things.

Anyway, the problem we saw was that on certain small packets the Proteon
ethernet controller pads the packet with nulls to a greater length.
Our UB cards won't pick up these packets, even though they are perfectly
valid.  Because of this we can't put Proteon Ethernet boards into most
of our routers and are staying with Interlan boards.  (I'm not sure what the
status of a fix from UB is, but we have so many of their boards out
there it'll take a long time before we can fix them all.)
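
To illustrate why extra padding should be harmless, here is a minimal Python
sketch of a receiver that reads the ARP packet from its fixed offsets and
ignores whatever trails it. It is not how any of the boards mentioned here
work internally; parse_arp and the addresses are made up for the example.

```python
import struct

ETH_HEADER_LEN = 14
ARP_IPV4_LEN = 28
ETHERTYPE_ARP = 0x0806
ARP_FORMAT = "!HHBBH6s4s6s4s"

def parse_arp(frame: bytes) -> dict:
    """Parse the ARP packet in an Ethernet frame, ignoring trailing pad bytes."""
    if len(frame) < ETH_HEADER_LEN + ARP_IPV4_LEN:
        raise ValueError("frame too short to hold an ARP packet")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_ARP:
        raise ValueError("not an ARP frame")
    fields = struct.unpack_from(ARP_FORMAT, frame, ETH_HEADER_LEN)
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = fields
    return {"oper": oper, "sha": sha, "spa": spa, "tha": tha, "tpa": tpa}

# A hypothetical ARP Reply (opcode 2) between two made-up hosts.
host = bytes.fromhex("020000000001")
router = bytes.fromhex("020000000002")
reply = (
    host + router + struct.pack("!H", ETHERTYPE_ARP)
    + struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, 2,
                  router, bytes([192, 0, 2, 2]),
                  host, bytes([192, 0, 2, 1]))
)

normal = reply.ljust(60, b"\x00")      # padded to the usual minimum
overpadded = reply.ljust(62, b"\x00")  # padded with a couple of extra nulls

print(parse_arp(normal) == parse_arp(overpadded))  # True
```

Both frames decode to the same reply, since nothing after the 28-byte ARP
packet is ever consulted.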

If you have Proteon Ethernet boards, you might try an Interlan one and
see if it makes a difference.
        Cliff

jch@SONNE.TN.CORNELL.EDU (Jeffrey C Honig) (12/16/88)

I just searched high and low for a mail message from Proteon I had seen
about this problem to no avail.  Proteon has a padding problem with
their p4215 Proteon Ethernet boards and software version 8.0.  Minimum
length packets are padded with a couple of extra bytes.  This is
supposedly fixed in the next release.  This does not affect the
p4213/p4214 Interlan boards where the padding is done by the Ethernet
controller. 

This won't account for the short packets seen by the BTI box though.  Do
you have any other way of looking at Ethernet packets to verify that
there is really something wrong with them?  Netwatch or Lanwatch on a PC
or etherfind or tcpdump on a Sun would be good tools.

Do you have the latest and greatest FAL driver from David?  He found a
bug dealing with short packets a couple of weeks ago.  I don't think it
relates to this problem though.

Jeff