cck%cucca.columbia.arpa@BRL.ARPA (Charlie C. Kim) (02/17/86)
SYSTEM type: VAX + DEQNA
OPERATING System: 4.2BSD and Ultrix 1.1
GENERAL area: MBUF handling

PROBLEM: panic: trap type 9 (Protection fault) in ip_output.

DIAGNOSIS: ip_output is called with a bogus mbuf by ip_forward.

ANALYSIS: This results because a basic assumption of the dtom macro is violated. dtom assumes that mbufs are aligned on 128 byte boundaries and that all the data is contained in that 128 bytes. For most mbufs this is true, and it is valid to simply mask off the low order bits. Unfortunately, for mbufs whose data is in the page pool, the data is not in the same 128 bytes. (As a matter of fact, the data pointer points to some virtual address behind the mbuf.)

On a VAX, this would not be seen very often, since pages in the pool generally don't get used until the data size exceeds CLBYTES (CLSIZE*NBPG, i.e. 1024). Another mitigating factor is that the private page pool is only used "when copying data from a user process into the kernel, and when bringing data in at the hardware level". In particular, the Ultrix 1.1 DEQNA driver uses the private page pool. (The DEUNA and other devices may also be affected.)

CURE: I'm not sure there is an easy cure, but I've outlined the possible solutions in what I think is best-to-worst order. Hopefully, this is fixed in 4.3BSD or Ultrix 1.2.

1. Fix dtom. This may not be easy. Though we can check whether the data pointer is in the data pool or in the mbuf space, it may be difficult to trace back pointers in the data pool. Basically, the problem arises from the fact that copies are made by duplicating the page entries instead of copying the data. To handle traceback, we would need some page table which told us which mbuf a page was associated with; if more than one mbuf could be associated with a data page, then things would be a real hassle.

2. Drop usage of dtom where necessary. This would require careful rewrites of portions of the code. This could be done by dropping all usages of dtom or by tracing back where dtom could be used. Some of this is easy, but some nontrivial rewriting would definitely be necessary. For example, in in_input, the ip reassembly code would have to be reworked.

3. Drop usage of the private page pool. This is undesirable, though there should be no reason why it wouldn't work.

Charlie C. Kim
User Services
Center for Computing Activities
Columbia University
New York, NY 10025
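
To make the ANALYSIS above concrete, here is a minimal user-space sketch of the masking trick and of the case where it breaks. It is an illustration under stated assumptions, not kernel source: the struct layout, the `_sk` names, and the two static pools are all hypothetical stand-ins, and only the 128-byte mbuf size and the 1024-byte cluster size come from the post. It also shows the range check mentioned under cure 1 (is the data pointer in the mbuf space or in the page pool?).

```c
#include <assert.h>
#include <stdint.h>

#define MSIZE_SK   128    /* mbuf size that dtom assumes, per the post */
#define CLBYTES_SK 1024   /* cluster size quoted in the post for the VAX */

/* Hypothetical, simplified header; NOT the real 4.2BSD struct mbuf layout. */
struct mhdr_sk {
    struct mbuf_sk *next;
    char           *data;   /* where this mbuf's data actually lives */
    int             len;
};

struct mbuf_sk {
    struct mhdr_sk h;
    char           dat[MSIZE_SK - sizeof(struct mhdr_sk)];  /* inline data */
};

_Static_assert(sizeof(struct mbuf_sk) == MSIZE_SK, "mbuf must be MSIZE bytes");

/* The masking idea: round a data pointer down to a 128-byte boundary. */
#define DTOM_SK(x) \
    ((struct mbuf_sk *)((uintptr_t)(x) & ~(uintptr_t)(MSIZE_SK - 1)))

/* Stand-ins for the mbuf space and the separate page (cluster) pool. */
static _Alignas(MSIZE_SK)   struct mbuf_sk mbuf_pool[2];
static _Alignas(CLBYTES_SK) char           cluster_pool[CLBYTES_SK];

/* An ordinary mbuf: the data sits inside the same 128 bytes. */
struct mbuf_sk *make_small_mbuf(void)
{
    struct mbuf_sk *m = &mbuf_pool[0];
    m->h.data = m->dat;
    m->h.len  = 20;
    return m;
}

/* A cluster mbuf: the data pointer leads outside the mbuf entirely. */
struct mbuf_sk *make_cluster_mbuf(void)
{
    struct mbuf_sk *m = &mbuf_pool[1];
    m->h.data = cluster_pool;
    m->h.len  = CLBYTES_SK;
    return m;
}

/* Returns 1 if masking the data pointer recovers the owning mbuf. */
int dtom_roundtrip_ok(const struct mbuf_sk *m)
{
    return DTOM_SK(m->h.data) == m;
}

/* Range check from cure 1: does p point into the mbuf space at all? */
int data_in_mbuf_space(const void *p)
{
    uintptr_t a  = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&mbuf_pool[0];
    return a >= lo && a < lo + sizeof(mbuf_pool);
}
```

For the small mbuf the round trip succeeds; for the cluster mbuf the mask yields an address inside the page pool, which is the "bogus mbuf" of the DIAGNOSIS. The range check can detect that case, but as cure 1 notes, detecting it does not tell you which mbuf owns the page.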
chris@umcp-cs.UUCP (Chris Torek) (02/19/86)
In article <994@brl-smoke.ARPA> cck%cucca.columbia.arpa@BRL.ARPA ... describes a problem with the Ultrix 1.1 DEQNA (MicroVAX) driver, and suggests possible fixes.

>1. Fix dtom.

dtom() is not broken: it is the DEQNA driver that is broken. Fix that.

>2. Drop usage of dtom where necessary.

Again, this is the wrong fix.

>3. Drop usage of the private page pool.

The DEQNA driver should not use private mclusters. It should use the standard ones, in the standard ways; see vaxif/if_uba.c. (The 4.3 version handles multiple transmit and receive buffers. I would guess that this is part of the problem in the Ultrix driver.)

There are no problems with dtom() as long as the drivers adhere to the protocol requirements. I suspect the Ultrix 1.2 DEQNA driver is already fixed.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
chris@umcp-cs.UUCP (Chris Torek) (02/19/86)
In article <3273@umcp-cs.UUCP> I write:

>... it is the DEQNA driver that is broken. Fix that.

Sigh. That is what I get for making assumptions. It is not the DEQNA driver, it is the generic mbuf code (in old Ultrix only). (This is relayed via someone who has seen the code; I have not.)

(But dtom() is still not broken.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu