[comp.unix.wizards] de0: buffer unavailable

mullen@nrl-css.arpa (Preston Mullen) (08/13/87)

A Vax 750 running Ultrix 1.1 (groan) started to give off zillions of
	de0: buffer unavailable
messages on the console (one or two every few seconds).  It seems to
correlate with network traffic for that machine, as might be expected
if the DEUNA is involved, but specifically what can cause this?  This
is a machine that worked OK for a very long time.  No known software
changes were made around the time these messages started appearing.

Replacing the DEUNA made no difference.

The machine usually still seems to communicate OK on the network
(telnet, ftp, rcp, rsh, rlogin, lpd).

Thanks for any ideas.

forys@sigi.Colorado.EDU (Jeff Forys) (08/14/87)

In article <8776@brl-adm.ARPA> mullen@nrl-css.arpa (Preston Mullen) writes:
> A Vax 750 running Ultrix 1.1 (groan) started to give off zillions of
>	de0: buffer unavailable

When a packet arrives, and your DEUNA is out of receive buffers, the
packet is dropped, and a flag is set in one of the CSRs.  The DEUNA
driver checks the bit in this CSR and displays the above error if
it's set.

> what can cause this?  This is a machine that worked OK for a very
> long time.  No known software changes were made around the time
> these messages started appearing.

If you bombard the DEUNA with packets (with, for example, a Sun), it'll
start dropping them 'cause the 11/750 doesn't grab them in time.  If you
just got connected to a busy network, that would explain why they
suddenly started to show up.  Also, it may have been happening all along,
but a change in which message priorities you log brought them into
the limelight.  They are only considered WARNINGs, *not* ERRORs.  In fact,
under 4.3 (I don't have the 4.2 driver handy), you have to explicitly turn
on DEUNA debugging (in the driver) before you even get to see them!

>Replacing the DEUNA made no difference.

They are all the same.  :-)

>The machine usually still seems to communicate OK on the network
>(telnet, ftp, rcp, rsh, rlogin, lpd).

Right, and you shouldn't notice much.  It may be a little slower, and
a few broadcast packets (e.g. rwho) will be dropped, but if it's
important, it'll be re-transmitted by the higher level protocols.

In summary, since it's working, I don't think it's anything to worry
about.  It only *sounds* bad; in reality, it isn't (zillions?? :-).
---
Jeff Forys @ UC/Boulder Engineering Research Comp Cntr (303-492-4991)
forys@Boulder.Colorado.EDU  -or-  ..!{hao|nbires}!boulder!forys

steve@dartvax.UUCP (Steve Campbell) (08/16/87)

In article <8776@brl-adm.ARPA> mullen@nrl-css.arpa (Preston Mullen) writes:
>A Vax 750 running Ultrix 1.1 (groan) started to give off zillions of
>	de0: buffer unavailable
>messages on the console (one or two every few seconds)....

Precisely the same thing happened here.  Ultrix support people suggested
changing the value of NRCV (the number of receive buffers) from 4 to 8
in /sys/data/if_de_data.c and making a new kernel.  Even binary licensees
can do this.  It was an intelligent suggestion, but by itself it didn't
solve the problem.  But I left it in anyway.  [4.3BSD has a value of 7.]

Running "netstat 10" on the affected 750 showed that once a minute
there was a burst of broadcast packets, roughly equal in number to the
number of TCP/IP hosts on the ethernet.  We then went to a Sun that's
on the wire and ran the etherfind(8C) program, which watches all
packets on the ethernet and prints out their type, source, destination,
etc.  [This is a VERY handy tool.  Does anyone have something
comparable for VAXen under BSD??]  The cause of the problem jumped out
at us: one host (not the 750) was using 0xFF as the host part of the
destination address on the rwho broadcasts it was sending out.  That is, it was using
the new 4.3 broadcast address, while everyone else, including the 750,
is still using the old address (0x0).  See "Installing & Operating" in
your BSD documentation for details.  And as soon as that host blurted
out that "bad" broadcast packet, every other host on the wire answered
with a broadcast packet of its own.  I do not know exactly what those
packets were supposed to be doing; maybe someone else can explain.
But once we fixed that one errant host, everything went back to normal,
and the 750 even shut up.

						Steve Campbell
						Dartmouth College

narten@purdue.edu (Thomas Narten) (08/20/87)

The message means that a burst of packets was received faster than the
cpu can process them. The device driver makes a small number of
buffers available to the DEUNA. A burst of packets may use them all up
if the cpu cannot process packets fast enough.
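A back-of-the-envelope estimate of how many buffers a burst needs, assuming purely for illustration that a burst of $B$ packets arrives at a steady rate $\lambda$ while the CPU frees buffers at a steady rate $\mu < \lambda$. The burst lasts $B/\lambda$, during which the CPU frees about $\mu B/\lambda$ buffers, so the backlog the DEUNA must hold is roughly:

```latex
n_{\text{needed}} \approx B - \frac{\mu}{\lambda}\,B = B\left(1 - \frac{\mu}{\lambda}\right)
```

The estimate grows linearly with $B$. If a burst is roughly one packet per host on the wire, no small fixed number of receive buffers keeps up as the network grows, so stopping the burst at its source is the real fix.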

You could try raising the number of buffers the DEUNA has to play
with, but it probably won't help. Look for the constants NXMT and NRCV.

Most likely, your problems are caused by broadcast storms and other
such bogus packet bursts. That is the problem you really need to
tackle.

Thomas