[comp.dcom.lans] DEC LAN Bridges

dorl@vms.macc.wisc.edu (Michael Dorl - MACC) (10/19/88)

We have been having a rash of recent problems that would best
be explained by the fact that DEC LAN-100 Bridges stop forwarding
all 1-s broadcast traffic.  Problems include tcp/ip sites that
do not see routed routing traffic and tcp/ip sites that can not
initiate a session to a machine on the other side of a bridge.

I'd be interested in hearing from anyone who has experienced
similar problems or who has any insight into the problem.

Michael Dorl (608) 262-0466
dorl@vms.macc.wisc.edu
dorl@wiscmacc.bitnet

chris@wucfua.wustl.edu (Chris Myers) (10/19/88)

In article <773@dogie.edu> dorl@vms.macc.wisc.edu (Michael Dorl - MACC) writes:
>We have been having a rash of recent problems that would best
>be explained by the fact that DEC LAN-100 Bridges stop forwarding
>all 1-s broadcast traffic.  Problems include tcp/ip sites that
>do not see routed routing traffic and tcp/ip sites that can not
>initiate a session to a machine on the other side of a bridge.
>
>I'd be interested in hearing from anyone who has experienced
>similar problems or who has any insight into the problem.
>
>Michael Dorl (608) 262-0466

Washington University has experienced this problem in its network several times
in the past few months.  It seems largely attributable to one of two causes:

   1)  A LAN Bridge 100 with version 1 ROMs.  There is a bug that causes a
       PERMANENT MANAGEMENT entry to be placed in the forwarding database that
       blocks the FF-FF-FF-FF-FF-FF address whenever the bridge is initialized.
       Get the version 2.0 ROM field upgrade kits to fix the problem.

   2)  There are (apparently) some confused nodes on the network that will
       send a packet with a source address of FF-FF-FF-FF-FF-FF when they boot.
       The LAN Bridges learn this source address, of course, and proceed to
       block most of the broadcast traffic on the network.  For some unknown
       reason DEC decided that the all FF's address should not be expired from
       the bridging database, unlike all other entries.  The only way to fix
       this is to use the Remote Bridge Management Software (RBMS) from DEC
       and remove the forwarding entry for FF-FF-FF-FF-FF-FF.
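
The aging behaviour described in item 2 can be illustrated with a small sketch. This is not DEC's firmware: the class name, the 120-second aging time, and the example station address are all assumptions, chosen only to mirror what the post describes (every learned entry ages out except the all-FF one, which has to be removed by hand).

```python
BROADCAST = "FF-FF-FF-FF-FF-FF"


class ForwardingTable:
    """Toy forwarding database with aging (hypothetical, not DEC's code)."""

    def __init__(self, max_age=120.0):
        self.max_age = max_age      # assumed aging time, in seconds
        self.entries = {}           # address -> (port, time last seen)

    def learn(self, address, port, now):
        """Record the port on which a source address was seen."""
        self.entries[address] = (port, now)

    def expire(self, now):
        """Drop stale entries, except the all-FF entry, which never ages out."""
        for address, (_port, seen) in list(self.entries.items()):
            if address == BROADCAST:
                continue            # mirrors the behaviour described in the post
            if now - seen > self.max_age:
                del self.entries[address]

    def remove(self, address):
        """Manual removal, the role RBMS plays in the post."""
        self.entries.pop(address, None)


table = ForwardingTable()
table.learn("08-00-2B-00-00-01", port=1, now=0.0)   # made-up station address
table.learn(BROADCAST, port=1, now=0.0)             # the bogus source address
table.expire(now=1000.0)
print(sorted(table.entries))    # only the all-FF entry is left
table.remove(BROADCAST)
print(sorted(table.entries))    # empty again after manual removal
```

In this model the ordinary station entry disappears on its own, while the broadcast entry survives until `remove` is called.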

We have decided that the best way to deal with the problem is to run a batch
job every 15 minutes or so that checks a bridge to see if it has learned the all
FF's address and, if so, removes it on all working bridges.  There are
better solutions (routers, smart bridges) that we are looking at seriously.

Chris Myers
Software Engineer
Washington University Office of the Network Coordinator

davew@gvgpsa.GVG.TEK.COM (David C. White) (10/20/88)

In article <773@dogie.edu> dorl@vms.macc.wisc.edu (Michael Dorl - MACC) writes:
>We have been having a rash of recent problems that would best
>be explained by the fact that DEC LAN-100 Bridges stop forwarding
>all 1-s broadcast traffic.  Problems include tcp/ip sites that
>do not see routed routing traffic and tcp/ip sites that can not
>initiate a session to a machine on the other side of a bridge.


This is caused by the bridge seeing a src address of ff-ff-ff-ff-ff-ff
on one side or the other of the bridge.  Once it sees this src address,
it will no longer forward broadcast traffic in one direction; which
direction depends on which side of the bridge the address was seen on.
One very easy way to lock up the DEC bridge is to ping the broadcast
address for the network.  There is a fix on the way from DEC to get
around this problem.  Pester your service engineer to get the updated
firmware for the bridge.  It isn't released yet, but he may be able to
get it.  In the meantime you will be stuck reinitializing the
bridge(s) whenever this happens.

I would also question what the real cause of the problem is, in other
words, what device on your network is sending out messages with
ff-ff-ff-ff-ff-ff as the source address?  Find the culprit(s) and
fix them also.
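
One way to hunt for the culprit is to scan captured frames for an illegal source address. The following is a minimal sketch, assuming the capture is available as a list of raw Ethernet frames (destination in bytes 0-5, source in bytes 6-11); the function names and the two sample frames are made up for illustration. It relies on the fact that a legal Ethernet source address must have the group (multicast) bit, the low-order bit of the first octet, clear.

```python
def source_is_illegal(frame: bytes) -> bool:
    """True if the frame's source address has the group (multicast) bit set."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    src = frame[6:12]
    return bool(src[0] & 0x01)


def format_mac(addr: bytes) -> str:
    """Render six octets in the hyphenated style used in this thread."""
    return "-".join(f"{b:02X}" for b in addr)


def find_suspects(frames):
    """Yield (index, source, destination) for every frame with an illegal source."""
    for i, frame in enumerate(frames):
        if source_is_illegal(frame):
            yield i, format_mac(frame[6:12]), format_mac(frame[0:6])


# Two made-up frames: a normal broadcast and a frame that is all ones.
good = bytes.fromhex("ffffffffffff" "08002b000001" "0806") + bytes(46)
bad = bytes([0xFF]) * 60
for hit in find_suspects([good, bad]):
    print(hit)
```

A scan like this only flags when and where the frame appeared. If the whole frame is ones, nothing in it identifies the sender, so the timestamp and the segment it was captured on are the only leads.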
-- 
Dave White	Grass Valley Group, Inc.   PHONE: +1 916.478.3052
P.O. Box 1114  	Grass Valley, CA  95945    davew@gvgpsa.gvg.tek.com

alan@cunixc.columbia.edu (Alan Crosswell) (10/20/88)

This sounds like a problem we saw several months ago; I posted about it
to this group and received several replies.  The problem
appears to be that a sick host sends a packet whose SOURCE address is
the ethernet broadcast address.  This of course violates the ethernet
spec. The lanbridge doesn't bother checking for this case and merrily
enters this source address into its forwarding database.  So now what
you have is a forwarding entry that says the broadcast address is on
one side of the bridge.  The result is that broadcasts go thru in one
direction but not the other, so ARP, etc.  work in one direction but
not the other.
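
The one-way failure can be reproduced with a toy model of a two-port learning bridge. This is a sketch under stated assumptions, not a description of the LAN Bridge 100 internals: the class, the port names "A" and "B", and the station addresses are hypothetical, and the model simply treats the broadcast address like any other learned address.

```python
BROADCAST = "FF-FF-FF-FF-FF-FF"


class TwoPortBridge:
    """Toy learning bridge with ports 'A' and 'B' (hypothetical model)."""

    def __init__(self):
        self.table = {}     # address -> port it was last seen on

    def receive(self, port, src, dst):
        """Learn src, then return True if the frame is forwarded to the other port."""
        self.table[src] = port          # no sanity check on src, as described
        known_port = self.table.get(dst)
        if known_port == port:
            return False                # destination believed local: filter it
        return True                     # unknown or on the other side: forward


bridge = TwoPortBridge()
# A sick host on segment A sends a frame whose source is the broadcast address.
bridge.receive("A", src=BROADCAST, dst="08-00-2B-00-00-02")
# Broadcasts from A are now filtered; broadcasts from B still cross.
print(bridge.receive("A", src="08-00-2B-00-00-01", dst=BROADCAST))  # False
print(bridge.receive("B", src="08-00-2B-00-00-03", dst=BROADCAST))  # True
```

Rejecting any source address with the group bit set before the `self.table[src] = port` line would avoid the bad entry in this model.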

If you have RBMS, you can confirm this by issuing a command to the bridge
along the lines of:

	SHOW FORW PHYS ADDR FF-FF-FF-FF-FF-FF

To clear the forwarding entry, issue:

	REMOVE FORW PHYS ADDR FF-FF-FF-FF-FF-FF

If you don't have RBMS, simply power-cycle the bridge until the next time
you get hosed.

Here's a VMS batch job that I run hourly to keep our bridges squeaky clean:

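$! Echo each command into the batch log
$ SET VERIFY
$! Keep only the ten most recent versions of the log file
$ PURGE/KEEP=10 RBMS_CLEANUP.LOG
$! Run RBMS; the lines without a leading $ are its input
$ RBMS
USE KNOWN BR
SHOW FORW PHYS ADDR FF-FF-FF-FF-FF-FF
REMOVE FORW PHYS ADDR FF-FF-FF-FF-FF-FF
$! Resubmit this procedure to run again in one hour
$ SUBMIT/AFTER="+1:00:00"/RESTART RBMS_CLEANUP
$ EXIT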

Pretty gross, Huh?

/a

ron@ron.rutgers.edu (Ron Natalie) (10/20/88)

Your lan bridges are too new, but not new enough.  Some bogus host sourced
a packet from the broadcast address.  The LANBRIDGE now thinks that the
all ones address is local to a segment and doesn't forward it everywhere.
Call DEC.  The major problem is that ARP stops working.

-ROn

cyrus@pprg.unm.edu (Tait Cyrus) (10/20/88)

Dave White asks:
>I would also question what the real cause of the problem is, in other
>words, what device on your network is sending out messages with
>ff-ff-ff-ff-ff-ff as the source address?  Find the culprit(s) and
>fix them also.

We, like everyone else, are being bitten by this DEC LANBridge bug.
We have even been able to capture the bogus packet.
Unfortunately, the packet contained ALL 1's (i.e. ff ff ff ff ff ....)
so there was no way to tell which "box" sent it.

I have, though, some idea as to which "boxes" are sending these
packets.  In 5 days, 3 of these packets have been seen on our net
coming from different places on the net (i.e. different "boxes").
Looking at ruptimes and such, I noticed that a machine had just been
rebooted (15 minutes after the bogus packet).  A possibility is that
this machine sent the bogus packet upon reboot/shutdown/crash.  I
chalked this up to coincidence.  A few days later I saw two more
bogus packets (100 seconds apart) from a DIFFERENT part of the net, a
net where a "box" was having new software installed.  Again I chalked
this up to coincidence.

Ok, so far nothing "really" out of the ordinary.  Well, it turns out
that both "boxes" are IBM PC/RT's, one running AIX and the other having
BSD 4.3 installed on it.  I am NOT saying that the IBM PC/RT (with UB
ethernet boards) is causing the problems, it is that I find it fairly
interesting that there is such a coincidence.

While watching the net, I have walked up to an IBM PC/RT and powered
it off allowing it to reboot; no bogus packets seen.  I have shut the
IBM down (gracefully); again no bogus packets.

Has anyone seeing these bogus packets noticed similar circumstances?
Could it be the IBM PC/RT?  Could it be the UB ethernet board?  Could I be
looking at events that are pure coincidence?  Am I full of it? :-) 

Comments/ideas/suggestions/flames/etc ?????

---
    @__________@    W. Tait Cyrus   (505) 277-0806
   /|         /|    University of New Mexico
  / |        / |    Dept of ECE - Parallel Processing Research Group
 @__|_______@  |    Albuquerque, New Mexico 87131
 |  |       |  |
 |  |  hc   |  |    e-mail:
 |  @.......|..@       cyrus@pprg.unm.edu
 | /        | /
 @/_________@/

goeran@ae.chalmers.se (Goran Bengtson) (10/27/88)

In article <23652@pprg.unm.edu>, cyrus@pprg.unm.edu (Tait Cyrus) writes:

> Ok, so far nothing "really" out of the ordinary.  Well, it turns out
> that both "boxes" are IBM PC/RT's, one running AIX and the other having
> BSD 4.3 installed on it.  I am NOT saying that the IBM PC/RT (with UB
> ethernet boards) is causing the problems, it is that I find it fairly
> interesting that there is such a coincidence.
> 
> While watching the net, I have walked up to an IBM PC/RT and powered
> it off allowing it to reboot; no bogus packets seen.  I have shut the
> IBM down (gracefully); again no bogus packets.
> 
> Has anyone seeing these bogus packets noticed similar circumstances?
> Could it be the IBM PC/RT?  Could it be the UB ethernet board?  Could I be
> looking at events that are pure coincidence?  Am I full of it? :-) 

PC/RT with UB boards (at least some models) IS a source.  We have
confirmed that. 

The UB board sends a packet from its internal memory when initialized
and/or started.  A warm restart may give a packet with ANY content
(usually part of the last packet seen before shutdown) so you can get
ANY source address in that packet.  A cold restart usually gives 1's or
0's (dynamic ram...).

We think that it IS possible to initialize the board without causing
this problem.  We have not seen it from PC/RT running AIX,  only from
PC/RT's running BSD 4.3 (with or without Andrew File System).
ifconfig down/up causes a random packet to be sent!

Our temporary fix was to make sure that a KNOWN packet (short,  but with
legal source and destination address) is transmitted when the board is
initialized.
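
What such a KNOWN packet might look like can be sketched as follows. This is only an illustration, not the actual driver change: sending the frame to the station's own address, the 0x9000 type field (commonly used for Ethernet loopback testing), the helper names, and the example station address are all assumptions. The frame is padded to 60 bytes, the Ethernet minimum excluding the 4-byte FCS.

```python
import struct

MIN_FRAME = 60   # minimum Ethernet frame length, not counting the 4-byte FCS


def parse_mac(text: str) -> bytes:
    """Convert a hyphenated address such as 08-00-2B-00-00-01 to six octets."""
    addr = bytes(int(part, 16) for part in text.split("-"))
    if len(addr) != 6:
        raise ValueError("MAC address must have six octets")
    return addr


def build_known_frame(station: str, ethertype: int = 0x9000) -> bytes:
    """Build a short, padded frame addressed from the station to itself."""
    src = parse_mac(station)
    if src[0] & 0x01:
        raise ValueError("source must be an individual (non-group) address")
    # Destination and source are both the station's own address (assumption).
    header = src + src + struct.pack("!H", ethertype)
    return header.ljust(MIN_FRAME, b"\x00")


frame = build_known_frame("08-00-2B-00-00-01")   # made-up station address
print(len(frame), frame[:14].hex())
```

The group-bit check means the helper refuses to build exactly the kind of frame that confuses the bridges.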
-- 
Goran Bengtson				Email:	goeran@ae.chalmers.se
Dept. of Applied Electronics
Chalmers Univ. of Technology		Phone:  +46 31 721825	(int)
S-412 96 Gothenburg				031 721825	(nat)
Sweden