[comp.bugs.4bsd] networking bug?

dave@rosevax.Rosemount.COM (Dave Marquardt) (11/20/86)

I've got a problem.  We have a VAX-11/785 running 4.2BSD, and every two weeks 
or so, telnet and rlogin stop working.  This is a BIG problem, because many of 
our users log in via Bridge terminal servers.  The Bridge box is also no
longer able to talk to the VAX, and disconnects all sessions to that
machine.

In checking out the problem, I tried using telnet from the VAX to another
machine, and got this message:

	Out of buffer space.

This message occurs even after the Bridge box disconnects most of the 
sessions in progress because it can't talk to the VAX any more.
This makes me suspect that some networking code is not freeing up memory 
correctly (or perhaps not at all).  Has anyone else seen this and come up with 
a fix or work-around?  Any help would be appreciated.  Please send mail to
me.

Also, if it helps any, we're using Excelan's EXOS 204 Ethernet controller
and the driver provided by Excelan.  If you need any further information,
send me mail.  Also, if there is interest, I'll post a summary later.

	Thanks in advance,

		Dave Marquardt
-- 
Dave Marquardt			Mail: 	   dave@rosevax.Rosemount.COM
Rosemount, Inc.			Telephone: 612/828-3057

"It's a multipurpose shape -- a box."

sweeney@rust.dec.com (Glenn Sweeney) (11/26/86)

In article <742@rosevax.Rosemount.COM> dave@rosevax.Rosemount.COM (Dave Marquardt) writes:
>I've got a problem.  We have a VAX-11/785 running 4.2BSD, and every two weeks 
>or so, telnet and rlogin stop working.  This is a BIG problem, because many of 
>our users log in via Bridge terminal servers.  The Bridge box is also no
>longer able to talk to the VAX, and disconnects all sessions to that
>machine.
>
>In checking out the problem, I tried using telnet from the VAX to another
>machine, and got this message:
>
>	Out of buffer space.

This problem exists because the system is failing to properly release mbufs
after they are used.

A workaround to this problem is to edit the file /sys/h/mbuf.h and change the
value of NMBCLUSTERS from 256 to 1024, re-sysgen the kernel, and reboot.
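
As a rough sketch of what that change buys (not a copy of /sys/h/mbuf.h):
the helper name and the 1024-byte cluster size below are assumptions made
for illustration, and only the 256 and 1024 figures come from the text
above.  If buffers really are being leaked, a larger limit would presumably
only postpone the failure rather than cure it.

```c
/* Sketch only: illustrates the NMBCLUSTERS change, not the real header. */
#define OLD_NMBCLUSTERS 256    /* stock value named above */
#define NEW_NMBCLUSTERS 1024   /* value suggested by the workaround */
#define ASSUMED_CLBYTES 1024L  /* assumption: bytes per cluster on a VAX */

/*
 * Hypothetical helper: upper bound, in bytes, on the memory the
 * cluster pool can occupy for a given cluster count and cluster size.
 */
long mbuf_pool_limit(long nmbclusters, long clbytes)
{
    return nmbclusters * clbytes;
}
```

Under that assumed cluster size, mbuf_pool_limit(OLD_NMBCLUSTERS,
ASSUMED_CLBYTES) comes to 256 KB and the new setting to 1 MB, so the
ceiling moves by a factor of four.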

Glenn Sweeney
DECwest Engineering
Bellevue, WA., 98007
(206) 865-8738

sweeney%decwet.DEC@decwrl.DEC.COM

chris@mimsy.UUCP (Chris Torek) (11/30/86)

>In article <742@rosevax.Rosemount.COM> dave@rosevax.Rosemount.COM
>(Dave Marquardt) writes:
>>I've got a problem.  We have a VAX-11/785 running 4.2BSD, and every
>>two weeks or so, telnet and rlogin stop working. ... In checking out
>>the problem, I tried using telnet from the VAX to another machine,
>>and got this message: Out of buffer space.

In article <235@rust.dec.com>, sweeney@rust.dec.com (Glenn Sweeney) writes:
>This problem exists because the system is failing to properly release
>mbufs after they are used.

I doubt this:  A stock 4.2BSD system will panic with `panic: exit:
m_getclr' very soon after running out of mbufs.  There is at least
one other place in a 4.2 kernel that generates `ENOBUFS' errors,
and that is in the PSN (nee IMP) code in /sys/netimp.

The workaround (increasing NMBCLUSTERS) may still be useful on a
very busy system, where it is indeed possible to run out of space
(and soon panic).  But if you have a connection to a PSN, see if
perhaps your routing tables have become confused.  `netstat -r'
prints the kernel's routing tables.

(Better yet, convert to 4.3BSD and fix all those lurking bugs at
once.  It seems significant that only a handful of fixes have been
posted for 4.3BSD since its release, whereas 4.2 generated a
veritable flood. . . .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu