[net.unix-wizards] MBUF problems

cck%cucca.columbia.arpa@BRL.ARPA (Charlie C. Kim) (02/17/86)

SYSTEM type: VAX + DEQNA
OPERATING System: 4.2BSD and Ultrix 1.1

GENERAL area: MBUF handling.

PROBLEM: panic: trap type 9 (Protection fault) in ip_output.

DIAGNOSIS: ip_output is called with a bogus mbuf by ip_forward.  

ANALYSIS: This happens because a basic assumption of the dtom macro is
violated.  dtom assumes that mbufs are aligned on 128-byte boundaries
and that all of an mbuf's data lies within those 128 bytes.  For most
mbufs this is true, and it is valid to simply mask off the low-order
bits of a data pointer to find its mbuf.  Unfortunately, for mbufs whose
data is in the page pool, the data is not in the same 128 bytes.  (As a
matter of fact, the data pointer points to some virtual address behind
the mbuf.)
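
As a rough illustration of the assumption described above, here is a
user-space sketch, not the kernel source.  The struct layout is
simplified, the mtod/dtom macros are written from the description in
this post, and mbuf_alloc, mbuf_use_page and dtom_roundtrips are
hypothetical helper names invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MSIZE 128   /* mbuf size and alignment, per the post */

/* Simplified mbuf header (assumption: the real struct has more fields). */
struct mbuf {
    struct mbuf *m_next;
    intptr_t     m_off;   /* offset from the mbuf itself to its data */
    int          m_len;
};

/* mtod: mbuf to data pointer.  dtom: data pointer back to mbuf, by masking. */
#define mtod(m)  ((char *)((uintptr_t)(m) + (uintptr_t)(m)->m_off))
#define dtom(x)  ((struct mbuf *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE - 1)))

/* Hypothetical helper: one MSIZE-aligned mbuf, data right after the header. */
static struct mbuf *mbuf_alloc(void)
{
    struct mbuf *m = aligned_alloc(MSIZE, MSIZE);
    if (m == NULL)
        return NULL;
    assert(((uintptr_t)m & (MSIZE - 1)) == 0);
    m->m_next = NULL;
    m->m_off = (intptr_t)sizeof(struct mbuf);
    m->m_len = 0;
    return m;
}

/* Hypothetical helper: point the mbuf's data at an external page. */
static void mbuf_use_page(struct mbuf *m, char *page)
{
    m->m_off = (intptr_t)((uintptr_t)page - (uintptr_t)m);
}

/* Returns 1 if dtom(mtod(m)) == m, 0 if not, -1 on allocation failure. */
int dtom_roundtrips(int use_page)
{
    static char page[1024];   /* stand-in for one page-pool cluster */
    struct mbuf *m = mbuf_alloc();
    int ok;

    if (m == NULL)
        return -1;
    if (use_page)
        mbuf_use_page(m, page);
    ok = (dtom(mtod(m)) == m);
    free(m);
    return ok;
}
```

With the data inside the mbuf, dtom_roundtrips(0) reports that masking
recovers the mbuf.  With the data in a separate page,
dtom_roundtrips(1) reports that it does not: the masked address falls
in or near the page, not in the mbuf, which is the kind of bogus
pointer ip_output ends up with.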

On a VAX, this would not be seen very often, since pages in the pool
generally don't get used until the data size exceeds CLBYTES
(CLSIZE*NBPG, or 1024 bytes).  Another mitigating factor is that the
private page pool is only used "when copying data from a user process
into the kernel, and when bringing data in at the hardware level".

In particular, the Ultrix 1.1 DEQNA driver uses the private page pool.
(The DEUNA and other devices may also be affected.)


CURE:	I'm not sure there is an easy cure, but I've outlined some
possibilities below, in what I think is best-to-worst order.  Hopefully,
this is fixed in 4.3BSD or Ultrix 1.2.

	1.  Fix dtom.  This may not be easy.  Though we can check
whether the data pointer is in the data pool or in the mbuf space, it
may be difficult to trace back pointers in the data pool.  Basically,
the problem arises from the fact that copies are made by duplicating
the page entries instead of copying the data.  To handle traceback, we
would need some page table which told us which mbuf a page was
associated with; if more than one mbuf could be associated with a data
page, then things would be a real hassle.

	2.  Drop usage of dtom where necessary.  This would require
careful rewrites of portions of the code.  This could be done either by
dropping all uses of dtom or by working out where dtom can still be
used safely.  Some of this is easy, but some nontrivial rewriting would
definitely be necessary.  For example, in ip_input, the IP reassembly
code would have to be reworked.

	3.  Drop usage of the private page pool.  This is undesirable,
though there should be no reason why it wouldn't work.
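
To make option 1 concrete, here is a minimal sketch of a range check
plus a traceback table.  Everything in it is hypothetical: pool,
cl_owner, cl_index, cl_set_owner, dtom_checked and the pool size are
invented for illustration and are not taken from 4.2BSD or Ultrix.  It
also assumes exactly one owning mbuf per cluster, which is the easy
case; shared pages would need more than a single pointer per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSIZE    128    /* small-mbuf size and alignment, per the post */
#define CLBYTES  1024   /* cluster size on a VAX, per the post */
#define NCLUST   8      /* hypothetical pool size, illustration only */

struct mbuf;            /* opaque here; only addresses matter */

static char pool[NCLUST][CLBYTES];      /* stand-in for the page pool */
static struct mbuf *cl_owner[NCLUST];   /* hypothetical: one owner each */

/* Index of the cluster containing p, or -1 if p is outside the pool. */
static int cl_index(const void *p)
{
    uintptr_t a = (uintptr_t)p, base = (uintptr_t)pool;

    if (a < base || a >= base + sizeof pool)
        return -1;
    return (int)((a - base) / CLBYTES);
}

/* Record m as the owner of the cluster containing p. */
void cl_set_owner(const void *p, struct mbuf *m)
{
    int i = cl_index(p);

    assert(i >= 0);     /* p must lie inside the pool */
    cl_owner[i] = m;
}

/* dtom variant: table lookup for pool addresses, masking otherwise. */
struct mbuf *dtom_checked(const void *x)
{
    int i = cl_index(x);

    if (i >= 0)
        return cl_owner[i];
    return (struct mbuf *)((uintptr_t)x & ~(uintptr_t)(MSIZE - 1));
}

/* Self-check: returns 1 if all three lookups behave as intended. */
int dtom_checked_demo(void)
{
    static _Alignas(MSIZE) char store[MSIZE];   /* one aligned small mbuf */
    struct mbuf *m = (struct mbuf *)store;

    cl_set_owner(pool[2], m);
    return dtom_checked(store + 40) == (void *)m &&     /* in-mbuf data */
           dtom_checked(&pool[2][100]) == (void *)m &&  /* cluster data */
           dtom_checked(pool[3]) == NULL;               /* unowned cluster */
}
```

The price is a table lookup on every dtom and bookkeeping whenever a
cluster is attached to or detached from an mbuf.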


Charlie C. Kim
User Services
Center for Computing Activities
Columbia University
New York, NY 10025

chris@umcp-cs.UUCP (Chris Torek) (02/19/86)

In article <994@brl-smoke.ARPA> cck%cucca.columbia.arpa@BRL.ARPA

... describes a problem with the Ultrix 1.1 DEQNA (MicroVAX) driver,
and suggests possible fixes.

>1.  Fix dtom.

dtom() is not broken: it is the DEQNA driver that is broken.  Fix
that.

>2. Drop usage of dtom where necessary.

Again, this is the wrong fix.

>3. Drop usage of the private page pool.

The DEQNA driver should not use private mclusters.  It should use
the standard ones, in the standard ways; see vaxif/if_uba.c.  (The
4.3 version handles multiple transmit and receive buffers.  I would
guess that this is part of the problem in the Ultrix driver.)
There are no problems with dtom() as long as the drivers adhere to
the protocol requirements.

I suspect the Ultrix 1.2 DEQNA driver is already fixed.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

chris@umcp-cs.UUCP (Chris Torek) (02/19/86)

In article <3273@umcp-cs.UUCP> I write:
>... it is the DEQNA driver that is broken.  Fix that.

Sigh.  That is what I get for making assumptions.  It is not the
DEQNA driver, it is the generic mbuf code (in old Ultrix only).
(This is relayed via someone who has seen the code; I have not.)

(But dtom() is still not broken.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu