[comp.sys.sun] LANCE/ethernet problem.

apollo@bu.edu (Douglas Chan) (05/29/90)

We're getting a lot of the following pairs of error messages:
 le0: Received packet with ENP bit in rmd cleared
 le0: Received packet with STP bit in rmd cleared

It doesn't seem to be causing any problems, it is just an annoyance...

I've called Sun about it and they say it is caused by other systems on the
network generating packet sizes greater than 1515.  I've tried tracking
down the source(s) w/ etherfind but it doesn't seem to find anything
generating packets larger than 1515.

Just for your info, the system is a 4/380 (re: 4/330), SunOS-4.0.3.

-Doug
apollo@raven.bu.edu

glenn@uunet.uu.net (Glenn Herteg) (05/30/90)

We also receive the

	vmunix: le0: Received packet with ENP bit in rmd cleared
	vmunix: le0: Received packet with STP bit in rmd cleared

messages periodically.  They almost always come as three pairs together.
I've never tracked them down, but it might be difficult to even try -- our
network has both Suns (3/50s and 3/60s, 4.0.1 and 4.0.3) and PCs talking
totally different protocols on the same (thinnet) wire, with repeaters
connecting many, many segments into a gigantic octopus.

Looking back over my /var/adm/messages files from the last month, I see
the times at which these messages occur as follows:

	May  1 17:37:57 (Tuesday)
	May  3 19:15:47 (Thursday)
	May  7 17:07:18 (Monday)
	May  8 18:57:45 (Tuesday)
	May 10 17:07:57 (Thursday)
	May 14 17:16:38 (Monday)
	May 15 17:21:09 (Tuesday)
	May 17 18:20:30 (Thursday)
	May 24 18:21:19 (Thursday)

The timing of these events suggests to me a common end-of-day phenomenon:
perhaps one of our PC users is turning off the power, and the machine
squelches as it dies.  I have no proof of this, but it might be worth
checking out.  Or perhaps it is due to noise as someone opens a segment
momentarily to install or deinstall equipment.  Just why Monday, Tuesday,
and Thursday should be favorite days, almost on a regular basis, I don't
know.
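
The weekday and hour pattern in the list above can be checked mechanically.
A minimal sketch in Python, assuming the year is 1990 (the log lines carry
no year) and using only the nine timestamps quoted above; the names STAMPS
and tally are made up for illustration:

```python
from collections import Counter
from datetime import datetime

MONTHS = {"May": 5}  # only the month that appears in the list above
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

STAMPS = [
    "May  1 17:37:57", "May  3 19:15:47", "May  7 17:07:18",
    "May  8 18:57:45", "May 10 17:07:57", "May 14 17:16:38",
    "May 15 17:21:09", "May 17 18:20:30", "May 24 18:21:19",
]

def tally(stamps, year=1990):
    """Count events per weekday and per hour of day."""
    by_day, by_hour = Counter(), Counter()
    for stamp in stamps:
        month, day, clock = stamp.split()
        hour, minute, second = (int(x) for x in clock.split(":"))
        when = datetime(year, MONTHS[month], int(day), hour, minute, second)
        by_day[DAYS[when.weekday()]] += 1
        by_hour[when.hour] += 1
    return by_day, by_hour

by_day, by_hour = tally(STAMPS)
print(dict(by_day))
print(dict(by_hour))
```

Every event in the sample falls between 17:00 and 19:59, which is
consistent with the end-of-day guess but says nothing about the cause.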

pm@cs.city.ac.uk (Pete Mellor) (05/31/90)

In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:

> We're getting a lot of the following pairs of error messages:
>  le0: Received packet with ENP bit in rmd cleared
>  le0: Received packet with STP bit in rmd cleared

> It doesn't seem to be causing any problems, it is just an annoyance...

Funny! I've been getting exactly the same thing ever since I started using
the Sun on a network. (Sun 3/50 running OS 3.5, but the 3/80s running
4.0.3 get similar messages.) It scared the heck out of me the first time I
saw it, but now I just ignore it.

In case it's relevant, I usually see a batch of 20 or more of these pairs
stacked up on the screen waiting for me in the morning, particularly on
Monday. (They appear on the screen while the machine is logged out.)
Strangely, they very rarely appear on the console window while I am using
the machine.

> I've called Sun about it and they say it is caused by other systems on the
> network generating packet sizes greater than 1515.  I've tried tracking
> down the source(s) w/ etherfind but it doesn't seem to find anything
> generating packets larger than 1515.

I don't wish to sound ignorant, but what's magic about 1515? I've done no
investigation at all, on the principle "If it's not broken, why fix it?",
but if any network wizard out there is willing to educate me, I'd like to
know what's going on, too!

Peter Mellor,
Centre for Software Reliability,
City University,
Northampton Square,
London EC1V 0HB

  Tel.: +44 (0)71-253-4399 Ext. 4162/3/1
  Fax.: +44 (0)71-253-3861
E-mail: p.mellor@uk.ac.city (JANET)

tktk@physics.att.com (06/01/90)

In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:

> We're getting a lot of the following pairs of error messages:
>  le0: Received packet with ENP bit in rmd cleared
>  le0: Received packet with STP bit in rmd cleared

> It doesn't seem to be causing any problems, it is just an annoyance...

> I've called Sun about it and they say it is caused by other systems on the
> network generating packet sizes greater than 1515.  I've tried tracking
> down the source(s) w/ etherfind but it doesn't seem to find anything
> generating packets larger than 1515.

 Pete Mellor responds (v9n189):
> I don't wish to sound ignorant, but what's magic about 1515?

Well, this all makes some small amount of sense, because we have seen this
problem too.  It began to happen only after we put up NFS on some
SPARCstations.  Turns out that NFS packets are logically large and need to
be fragmented down to the max Ethernet size of 1526 bytes (suspiciously
close to the 1515 number above).  The fragmented packets are sent
lickety-split one after another, which is normally not a problem.  It
turns out we had a bad thinwire port on a repeater which propagated trash
on the net after each packet.  Because packets are normally widely spaced,
the trash was normally ignored, but in the case of the fragmented packets
the trash landed on top of a packet and the above errors showed up.

Not every machine showed the problem at the same time (which might have
something to do with propagation delay?).  Any kind of noise on a very busy
net would probably turn up the same problem.  etherfind shows the size of
the Ethernet packet, not the logical IP packet; I think fragmented packets
are preceded by an *.
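
The sizes involved can be worked through in a few lines.  A rough sketch in
Python, assuming standard Ethernet framing and IPv4 without options; the
8192-byte NFS transfer size is an assumption (it is not stated above), and
the RPC/NFS headers are ignored for simplicity:

```python
import math

ETH_HEADER = 14    # destination address + source address + type
ETH_FCS = 4        # frame check sequence
ETH_PREAMBLE = 8   # preamble plus start-of-frame delimiter
IP_MTU = 1500      # largest IP datagram carried in one Ethernet frame
IP_HEADER = 20     # IPv4 header without options

def fragment_sizes(datagram_payload, mtu=IP_MTU, ip_header=IP_HEADER):
    """Return the IP payload bytes carried by each fragment, in order."""
    if datagram_payload <= 0:
        raise ValueError("payload must be positive")
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    count = math.ceil(datagram_payload / per_fragment)
    sizes = [per_fragment] * (count - 1)
    sizes.append(datagram_payload - per_fragment * (count - 1))
    return sizes

frame = ETH_HEADER + IP_MTU + ETH_FCS
print(frame, frame + ETH_PREAMBLE)

# ASSUMPTION: one 8192-byte NFS transfer plus an 8-byte UDP header.
udp_datagram = 8 + 8192
print(fragment_sizes(udp_datagram))
```

Under these assumptions a full frame is 1518 bytes, or 1526 counting the
preamble, and one such NFS transfer goes out as six back-to-back frames.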

Terry Kovacs
AT&T Bell Labs
tktk@physics.att.com

sah@cs.purdue.edu (Sean Hershberger) (06/01/90)

In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:

> We're getting a lot of the following pairs of error messages:
>  le0: Received packet with ENP bit in rmd cleared
>  le0: Received packet with STP bit in rmd cleared

> It doesn't seem to be causing any problems, it is just an annoyance...

> I've called Sun about it and they say it is caused by other systems on the
> network generating packet sizes greater than 1515.  I've tried tracking
> down the source(s) w/ etherfind but it doesn't seem to find anything
> generating packets larger than 1515.

I've seen this same problem as a result of noisy twisted pair connections.
In our particular case we fixed the problem by replacing the RJ-11 and
RJ-45 connectors we were using.  We had been using connectors meant for
stranded wire on solid wire.  The result was bad connections which
worked but generated a lot of noise.

Sean Hershberger		sah@arthur.cs.purdue.edu
Computer Hardware Engr.
Purdue University

poffen@sj.ate.slb.com (Russ Poffenberger) (06/05/90)

In article <8324@brazos.Rice.edu> pm@cs.city.ac.uk (Pete Mellor) writes:
>X-Sun-Spots-Digest: Volume 9, Issue 189, message 5
>
>In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:
>
>> We're getting a lot of the following pairs of error messages:
>>  le0: Received packet with ENP bit in rmd cleared
>>  le0: Received packet with STP bit in rmd cleared

I had exactly the same problem and finally tracked it down to a thick-wire
ethernet transceiver cable not seating properly on a system on the
network.  What tracked it down for me was the flurry of messages every time
we did a backup of that particular machine over the network. I checked the
date/time stamps of the messages in /var/adm/messages on the system
showing the messages and matched them to the date/time stamps in my backup
log, and they coincided perfectly with the backup of one of the machines.
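
The same matching can be done with a short script.  A minimal sketch in
Python; the host names, timestamps, and the one-minute slack are all
invented for illustration, and real /var/adm/messages and backup logs
would need their own parsing:

```python
from datetime import datetime, timedelta

def parse(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# HYPOTHETICAL data: times of le0 messages and per-host backup windows.
ERRORS = [
    "1990-06-04 23:10:12",
    "1990-06-04 23:10:40",
    "1990-06-04 23:41:05",
]
BACKUPS = {
    "hostA": ("1990-06-04 22:00:00", "1990-06-04 22:45:00"),
    "hostB": ("1990-06-04 23:00:00", "1990-06-04 23:50:00"),
}

def suspects(errors, backups, slack=timedelta(minutes=1)):
    """Count how many error messages fall inside each backup window."""
    hits = {}
    for host, (start, end) in backups.items():
        lo, hi = parse(start) - slack, parse(end) + slack
        hits[host] = sum(lo <= parse(e) <= hi for e in errors)
    return hits

print(suspects(ERRORS, BACKUPS))
```

A host whose backup window collects most of the messages is the first
place to check cabling; a match is a hint, not proof.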

This is something to look for on ALL ethernet networks out there. Check
the integrity of your transceiver cable connections. You may very well find
that the connectors don't seat deeply enough when mated. This is due to
the apparent lack of standardization in the length of the locking "posts"
on the connectors.  You may find that one or more of the washers that
space the posts out need to be removed to get adequate seating.

This of course varies with the make of cable and such; your mileage may vary.

Russ Poffenberger               DOMAIN: poffen@sj.ate.slb.com
Schlumberger Technologies       UUCP:   {uunet,decwrl,amdahl}!sjsca4!poffen
1601 Technology Drive		CIS:	72401,276
San Jose, Ca. 95110             (408)437-5254

nickw@sol1.harlqn.co.uk (Nick Walton) (06/05/90)

In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:

> We're getting a lot of the following pairs of error messages:
>  le0: Received packet with ENP bit in rmd cleared
>  le0: Received packet with STP bit in rmd cleared

> It doesn't seem to be causing any problems, it is just an annoyance...

Out of interest, I've seen this message on a network of three Sparc 4C
workstations running SunOS 4.0.3c that I was looking after a while ago.
Everything was OK until we hooked up to the outside world, i.e. DECNET,
using Sun's DNI software.  The DEC system manager said it was a DEC problem
and he'd asked DEC to look into it. The connection to DECNET used thickwire
ethernet, and the DECNET netted together a number of PCs, VAXes and
possibly 386is, if that's of any use. The machines crashed occasionally,
usually whilst they were not being used, but whether this problem was the
cause I don't know.

Nick Walton

dowell@flamingo.metaphor.com (Craig Dowell) (06/06/90)

In article <8509@brazos.Rice.edu> nickw@sol1.harlqn.co.uk (Nick Walton) writes:
>X-Sun-Spots-Digest: Volume 9, Issue 194, message 9
>In v9n179, buengc!apollo@bu.edu (Douglas Chan) writes:
>> We're getting a lot of the following pairs of error messages:
>>  le0: Received packet with ENP bit in rmd cleared
>>  le0: Received packet with STP bit in rmd cleared

Nobody, that I have read, has gone into the details of what the Lance is
really telling us, sooooo ...

The rmd mentioned is a Receive Message Descriptor -- an element of the
Lance receive ring.  The rmd specifies things like where the receive
buffer is, how long it is, and whether the Lance "owns" it, and it includes
a byte for the Lance to write status.  ENP and STP are bits in that status
byte.

The Lance will scatter a packet into multiple buffers if the first buffer
is not big enough to receive the whole packet.  STP (Start Of Packet)
means that the buffer related to this rmd is the first buffer of a
scattered packet.  ENP (ENd of Packet) means that this rmd points to the
last buffer to receive data from the packet.

Many systems don't allow scattered packet data and therefore any rmd
without STP and ENP set is an unusual (error?) condition.  Lance drivers
that I have seen will just pitch rmds without both STP and ENP set.  Why
the Sun needs to print the message?  Dunno.
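
The STP/ENP logic described above can be sketched as follows.  This is an
illustration in Python, not Sun's driver code: the bit positions are given
from memory of the Am7990 data sheet (the status byte being the upper byte
of RMD1) and should be checked there, and the function names are invented:

```python
# ASSUMPTION: status-byte layout as recalled from the Am7990 data sheet.
OWN, ERR, FRAM, OFLO = 0x80, 0x40, 0x20, 0x10
CRC, BUFF, STP, ENP = 0x08, 0x04, 0x02, 0x01

def classify(status):
    """Say which part of a packet this descriptor's buffer holds."""
    stp, enp = bool(status & STP), bool(status & ENP)
    if stp and enp:
        return "whole packet"
    if stp:
        return "first buffer of chain"
    if enp:
        return "last buffer of chain"
    return "middle buffer of chain"

def accept(status):
    """Policy described above: keep only packets that fit in one buffer."""
    return (status & (STP | ENP)) == (STP | ENP)

def complaints(status):
    """Messages a driver might log, one per cleared bit."""
    msgs = []
    if not status & ENP:
        msgs.append("Received packet with ENP bit in rmd cleared")
    if not status & STP:
        msgs.append("Received packet with STP bit in rmd cleared")
    return msgs

for status in (STP | ENP, STP, ENP, 0):
    print(hex(status), classify(status), accept(status), complaints(status))
```

On this reading, a descriptor with both bits clear would produce the pair
of messages seen in the logs, which fits a packet (or noise) running past
the end of one receive buffer.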