greg@duke.cs.unlv.edu (Greg Wohletz) (03/24/90)
We have several MicroVAX IIs that we are using as fileservers. They are
running Ultrix 3.1. Periodically (about once every 24 hours) they crash
with ``qe: Non existant memory interrupt''. A peek at if_qe.c reveals the
following comment:

     * 1 Aug 85 -- rjl
     *   Panic on a non-existent memory interrupt and the case where a packet
     *   was chained. The first should never happen because non-existant
     *   memory interrupts cause a bus reset. The second should never happen
     *   because we hang 2k input buffers on the device.

Then in the interrupt routine the code:

    if( csr & QE_RCV_INT )
        qerint( unit );
    if( csr & QE_XMIT_INT )
        qetint( unit );
    if( csr & QE_NEX_MEM_INT )
        panic("qe: Non existant memory interrupt");

So it would appear that this is an error condition from the controller
itself. Has anyone seen this before? Is there a fix? What is a
non-existent memory interrupt?

--Greg
greg@duke.cs.unlv.edu (Greg Wohletz) (03/24/90)
In article <1642@jimi.cs.unlv.edu>, greg@duke.cs.unlv.edu (Greg Wohletz) writes:
|> From: greg@duke.cs.unlv.edu (Greg Wohletz)
|> Subject: qe: Non existant memory interrupt
|> Date: 23 Mar 90 23:27:57 GMT
|> Organization: The Cave
|>
|> We have several MicroVAX IIs that we are using as fileservers. They are
|> running Ultrix 3.1. Periodically (about once every 24 hours) they crash
|> with ``qe: Non existant memory interrupt''. A peek at if_qe.c reveals the
|> following comment:
|>
|>  * 1 Aug 85 -- rjl
|>  *   Panic on a non-existent memory interrupt and the case where a packet
|>  *   was chained. The first should never happen because non-existant
|>  *   memory interrupts cause a bus reset. The second should never happen
|>  *   because we hang 2k input buffers on the device.
|>
|> Then in the interrupt routine the code:
|>
|>     if( csr & QE_RCV_INT )
|>         qerint( unit );
|>     if( csr & QE_XMIT_INT )
|>         qetint( unit );
|>     if( csr & QE_NEX_MEM_INT )
|>         panic("qe: Non existant memory interrupt");
|>
|> So it would appear that this is an error condition from the controller
|> itself. Has anyone seen this before? Is there a fix? What is a
|> non-existent memory interrupt?
|>
|> --Greg

Oh yeah, one more thing: the controller in question is a DELQA.

--Greg
grr@cbmvax.commodore.com (George Robbins) (03/24/90)
In article <1642@jimi.cs.unlv.edu> greg@unlv.edu (Greg Wohletz) writes:
> We have several MicroVAX IIs that we are using as fileservers. They are
> running Ultrix 3.1. Periodically (about once every 24 hours) they crash
> with ``qe: Non existant memory interrupt''. A peek at if_qe.c reveals the
> following comment...
>
> So it would appear that this is an error condition from the controller
> itself. Has anyone seen this before? Is there a fix? What is a
> non-existent memory interrupt?

Well, the first comment is certainly bogus, since (illegally) long packets
on your ethernet will cause a panic due to "chained packets". I wouldn't
be too surprised if there is some network disease that could cause the second.

What is the history of this problem? Is it new with 3.1, or are the machines
new, or is there some new system/software elsewhere on your network that has
triggered these panics?

Which board is actually involved? If all else fails and they're DEQNAs, you
might try upgrading to a newer board - see the VMS-related DEQNA discussion
recently in comp.sys.dec. A while back I had a DEQNA problem that turned out
to be a problem with jumpers on the *memory* card, but that was in a PDP-11
Q-bus environment...

--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)
greg@duke.cs.unlv.edu (Greg Wohletz) (03/27/90)
In article <10333@cbmvax.commodore.com>, grr@cbmvax.commodore.com (George Robbins) writes:
|> In article <1642@jimi.cs.unlv.edu> greg@unlv.edu (Greg Wohletz) writes:
|> > We have several MicroVAX IIs that we are using as fileservers. They are
|> > running Ultrix 3.1. Periodically (about once every 24 hours) they crash
|> > with ``qe: Non existant memory interrupt''. A peek at if_qe.c reveals the
|> > following comment...
|> >
|> > So it would appear that this is an error condition from the controller
|> > itself. Has anyone seen this before? Is there a fix? What is a
|> > non-existent memory interrupt?
|>
|> Well, the first comment is certainly bogus, since (illegally) long packets
|> on your ethernet will cause a panic due to "chained packets". I wouldn't
|> be too surprised if there is some network disease that could cause the second.
|>
|> What is the history of this problem? Is it new with 3.1 or are the machines
|> new or is there some new system/software elsewhere on your network that has
|> triggered these panics?

We've had the MicroVAXes for several years. About 6 months ago we converted
three of them into fileservers. Until then we were running 2.0 on them, but
we discovered severe NFS bugs with 2.0 that caused frequent crashes. We also
discovered problems with our old DEQNA boards, so we upgraded to 3.1 and
installed new DELQA boards. This made things a lot better, but we still get
the ``non-existent memory interrupt'' panics daily...

I've found the following in the DELQA documentation:

    There are three interrupt conditions:

    o Receive Interrupt Request, when a complete packet has been received.
    o Transmit Interrupt Request, when a transmission is completed.
    o Nonexistent Memory, when a Q-bus or memory access error occurs.

This seems to match well with the interrupt code that looks like:

    if( csr & QE_RCV_INT )
        qerint( unit );
    if( csr & QE_XMIT_INT )
        qetint( unit );
    if( csr & QE_NEX_MEM_INT )
        panic("qe: Non existant memory interrupt");

So the question is: what causes this interrupt? Elsewhere in the
documentation it says:

    Nonexistent-Memory timeout, this is set if the DELQA times out while
    trying to access host memory.

So, I've come up with one possible theory: could the interrupt priority of
the DELQA be higher than the processor level set by the kernel when
manipulating the memory management registers?

|> Which board is actually involved? If all else fails and they're DEQNAs you
|> might try upgrading to a newer board - see the VMS-related DEQNA discussion
|> recently in comp.sys.dec. A while back I had a DEQNA problem that turned out
|> to be a problem with jumpers on the *memory* card, but that was in a PDP-11
|> Q-bus environment...

As I said above, the card is a DELQA less than 6 months old. One other
possibility is the following piece of info from the manual:

    The mode switch defines two possible modes of operation for the DELQA.
    The preferred mode is the ``Normal mode'' which indicates that the DELQA
    is operating as a DELQA. All current DIGITAL software for the DEQNA may
    be used with confidence for the DELQA when the DELQA is switched to
    operate in Normal mode. ``DEQNA-lock mode'' should only be required for
    use with some non-DIGITAL software drivers to achieve compatibility with
    DEQNA programming features.

We currently have the boards set up the way they were shipped (normal mode).
Perhaps I'll try putting them into DEQNA-lock mode and see if this clears up
the problem. (What? You thought Ultrix was ``current DIGITAL software''?
Shame on you!)

Anyway, if anyone has any further insight I'd sure appreciate it; otherwise
I'll keep you posted.

--Greg
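
[Editorial sketch, not part of the original post.] The quoted interrupt routine tests each CSR bit with a separate `if` rather than an if/else chain, so a single interrupt can report more than one of the three documented conditions. The C fragment below illustrates that shape only. The mask values, the `qe_counts` structure, and `qe_decode_csr` are assumptions made up for this sketch; the real definitions live in the driver's register header, and the real driver panics on the nonexistent-memory bit instead of counting it.

```c
/* Illustrative masks only: assumed values, not copied from the driver. */
#define QE_NEX_MEM_INT 0x0004u
#define QE_XMIT_INT    0x0080u
#define QE_RCV_INT     0x8000u

/* Hypothetical per-unit event counters, used here in place of the
 * driver's qerint(), qetint() and panic() calls. */
struct qe_counts {
    unsigned rcv;   /* receive-complete events */
    unsigned xmit;  /* transmit-complete events */
    unsigned nxm;   /* nonexistent-memory events */
};

/* Record every condition flagged in csr and return how many were set.
 * The tests are independent, mirroring the quoted interrupt routine. */
int qe_decode_csr(unsigned csr, struct qe_counts *c)
{
    int n = 0;

    if (csr & QE_RCV_INT) {
        c->rcv++;
        n++;
    }
    if (csr & QE_XMIT_INT) {
        c->xmit++;
        n++;
    }
    if (csr & QE_NEX_MEM_INT) {
        c->nxm++;
        n++;
    }
    return n;
}
```

Counting instead of panicking is purely a device for making the sketch testable outside a kernel; it is not being suggested as a fix for the crashes described in this thread.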
mogul@decwrl.dec.com (Jeffrey Mogul) (03/29/90)
In article <10333@cbmvax.commodore.com> grr@cbmvax (George Robbins) writes:
>Well, the first comment is certainly bogus, since (illegally) long packets
>on your ethernet will cause a panic due to "chained packets". I wouldn't
>be too surprised if there is some network disease that could cause the second.

I don't know anything about the non-existent memory problems; I've never
seen them, but we may not have the relevant hardware/software combination.

I do know that a few versions ago (definitely Ultrix 1.2, maybe in
Ultrix 2.x) if a chained packet was received, the if_qe driver would always
panic. (I know this because Dave Boggs was running his Ethernet performance
tests on our net and he sometimes sent humongous packets.)

I also know that this appears to have been fixed in more recent versions of
the code; there is still a panic on chained packets, but that is only for a
"Should NEVER happen" condition on some status flag, and in fact chained
packets (i.e., packets > 2kbytes long) should simply be discarded now.

-Jeff
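
[Editorial sketch, not part of the original post.] The "discard rather than panic" behaviour described above can be illustrated with a toy receive loop. With 2k input buffers, a legal Ethernet frame (at most 1518 bytes including the FCS) always fits in one buffer, so only an oversized packet spills into a second buffer and becomes "chained". Everything below is hypothetical: `rx_desc` is not the real DEQNA/DELQA descriptor format, the names are invented, and the ring is treated as a flat array with no wraparound to keep the sketch short.

```c
#include <stddef.h>

#define RX_BUF_SIZE 2048  /* the "2k input buffers" from the driver comment */

/* Hypothetical receive descriptor, one per 2k buffer. */
struct rx_desc {
    int filled;   /* device has written data into this buffer */
    int last;     /* this buffer holds the end of a packet */
    size_t len;   /* bytes used in this buffer */
};

struct rx_stats {
    unsigned delivered;  /* packets that fit in a single buffer */
    unsigned discarded;  /* chained (multi-buffer) packets dropped */
};

/* Number of 2k buffers a packet of len bytes would occupy. */
size_t rx_bufs_needed(size_t len)
{
    return (len + RX_BUF_SIZE - 1) / RX_BUF_SIZE;
}

/* Process completed packets starting at ring[0]. Single-buffer packets
 * are counted as delivered, chained packets are counted as discarded,
 * and in both cases the buffers are handed back to the device.
 * Returns the number of descriptors consumed. */
size_t rx_process(struct rx_desc *ring, size_t n, struct rx_stats *st)
{
    size_t i = 0;
    size_t k;

    while (i < n && ring[i].filled) {
        size_t start = i;

        /* Advance to the descriptor holding the end of this packet. */
        while (i < n && ring[i].filled && !ring[i].last)
            i++;

        /* Packet not complete yet: leave it for the next interrupt. */
        if (i == n || !ring[i].filled)
            return start;

        i++;  /* consume the final segment */

        if (i - start == 1)
            st->delivered++;
        else
            st->discarded++;  /* chained: drop instead of panicking */

        for (k = start; k < i; k++) {
            ring[k].filled = 0;
            ring[k].last = 0;
            ring[k].len = 0;
        }
    }
    return i;
}
```

A real driver would also hand the delivered data up the protocol stack and would check the device's own status flags; those steps are omitted here because the thread does not describe them.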