[comp.unix.ultrix] qe: Non existant memory interrupt

greg@duke.cs.unlv.edu (Greg Wohletz) (03/24/90)

We have several MicroVAX IIs that we are using as fileservers.  They are
running Ultrix 3.1.  Periodically (about once every 24 hours) they crash
with ``qe: Non existant memory interrupt''.  A peek at if_qe.c reveals the
following comment:

*  1 Aug 85 -- rjl
*      Panic on a non-existent memory interrupt and the case where a packet
*      was chained.  The first should never happen because non-existant
*      memory interrupts cause a bus reset. The second should never happen
*      because we hang 2k input buffers on the device.


Then, in the interrupt routine, there is this code:

        if( csr & QE_RCV_INT )
                qerint( unit );
        if( csr & QE_XMIT_INT )
                qetint( unit );
        if( csr & QE_NEX_MEM_INT )
                panic("qe: Non existant memory interrupt");

So it would appear that this is an error condition from the controller
itself.  Has anyone seen this before?  Is there a fix?  What is a
non-existent memory interrupt?

					--Greg

greg@duke.cs.unlv.edu (Greg Wohletz) (03/24/90)

In article <1642@jimi.cs.unlv.edu>, greg@duke.cs.unlv.edu (Greg Wohletz)
writes:
|> From: greg@duke.cs.unlv.edu (Greg Wohletz)
|> Subject: qe: Non existant memory interrupt
|> Date: 23 Mar 90 23:27:57 GMT
|> Organization: The Cave

Oh yeah, one more thing: the controller in question is a DELQA.

						--Greg

grr@cbmvax.commodore.com (George Robbins) (03/24/90)

In article <1642@jimi.cs.unlv.edu> greg@unlv.edu (Greg Wohletz) writes:
> We have several MicroVAX IIs that we are using as fileservers.  They are
> running Ultrix 3.1.  Periodically (about once every 24 hours) they crash
> with ``qe: Non existant memory interrupt''.  A peek at if_qe.c reveals the
> following comment...
> 
> So it would appear that this is an error condition from the controller
> itself.  Has anyone seen this before?  Is there a fix?  What is a
> non-existent memory interrupt?

Well, the first comment is certainly bogus, since (illegally) long packets
on your ethernet will cause a panic due to "chained packets".  I wouldn't
be too surprised if there is some network disease that could cause the second.

What is the history of this problem?  Is it new with 3.1 or are the machines
new or is there some new system/software elsewhere on your network that has
triggered these panics?

Which board is actually involved?  If all else fails and they're DEQNAs, you
might try upgrading to a newer board; see the VMS-related DEQNA discussion
recently in comp.sys.dec.  A while back I had a DEQNA problem that turned out
to be a problem with jumpers on the *memory* card, but that was in a PDP-11
Q-bus environment...

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

greg@duke.cs.unlv.edu (Greg Wohletz) (03/27/90)

In article <10333@cbmvax.commodore.com>, grr@cbmvax.commodore.com
(George Robbins) writes:
|> In article <1642@jimi.cs.unlv.edu> greg@unlv.edu (Greg Wohletz) writes:
|> > We have several MicroVAX IIs that we are using as fileservers.  They are
|> > running Ultrix 3.1.  Periodically (about once every 24 hours) they crash
|> > with ``qe: Non existant memory interrupt''.  A peek at if_qe.c reveals the
|> > following comment...
|> > 
|> > So it would appear that this is an error condition from the controller
|> > itself.  Has anyone seen this before?  Is there a fix?  What is a
|> > non-existent memory interrupt?
|> 
|> Well, the first comment is certainly bogus, since (illegally) long packets
|> on your ethernet will cause a panic due to "chained packets".  I wouldn't
|> be too surprised if there is some network disease that could cause the second.
|> 
|> What is the history of this problem?  Is it new with 3.1 or are the machines
|> new or is there some new system/software elsewhere on your network that has
|> triggered these panics?

We've had the MicroVAXes for several years.  About 6 months ago we
converted three of them into fileservers.  Until then we were running
2.0 on them, but we discovered severe NFS bugs in 2.0 that caused
frequent crashes, and we also discovered problems with our old DEQNA
boards, so we upgraded to 3.1 and installed new DELQA boards.  This
made things a lot better, but we still get the ``non-existent memory
interrupt'' panics daily.  I've found the following in the DELQA
documentation:

    There are three interrupt conditions:

        o   Receive Interrupt Request, when a complete packet has
            been received.

        o   Transmit Interrupt Request, when a transmission is
            completed.

        o   Nonexistent Memory, when a Q-bus or memory access error
            occurs.

This seems to match well with the interrupt code, which looks like:

        if( csr & QE_RCV_INT )
                qerint( unit );
        if( csr & QE_XMIT_INT )
                qetint( unit );
        if( csr & QE_NEX_MEM_INT )
                panic("qe: Non existant memory interrupt");

So the question is: what causes this interrupt?  Elsewhere in the
documentation it says:

    Nonexistent-Memory timeout: this is set if the DELQA times out
    while trying to access host memory.

So I've come up with one possible theory: could the interrupt priority
of the DELQA be higher than the processor level set by the kernel when
manipulating the memory management registers?


|> Which board is actually involved?  If all else fails and they're DEQNAs, you
|> might try upgrading to a newer board; see the VMS-related DEQNA discussion
|> recently in comp.sys.dec.  A while back I had a DEQNA problem that turned out
|> to be a problem with jumpers on the *memory* card, but that was in a PDP-11
|> Q-bus environment...

As I said above, the card is a DELQA less than 6 months old.  One other
possibility is the following piece of info from the manual:


The mode switch defines two possible modes of operation for the DELQA.
The preferred mode is ``Normal mode'', which indicates that the
DELQA is operating as a DELQA.  All current DIGITAL software for the
DEQNA may be used with confidence for the DELQA when the DELQA is
switched to operate in Normal mode.  ``DEQNA-lock mode'' should only
be required for use with some non-DIGITAL software drivers to achieve
compatibility with DEQNA programming features.


We currently have the boards set up the way they were shipped (Normal
mode).  Perhaps I'll try putting them into DEQNA-lock mode and see if
this clears up the problem.  (What?  You thought Ultrix was ``current
DIGITAL software''?  Shame on you!)

Anyway, if anyone has any further insight I'd sure appreciate it;
otherwise I'll keep you posted.

    	    	    	    	    	    --Greg

mogul@decwrl.dec.com (Jeffrey Mogul) (03/29/90)

In article <10333@cbmvax.commodore.com> grr@cbmvax (George Robbins) writes:
>Well, the first comment is certainly bogus, since (illegally) long packets
>on your ethernet will cause a panic due to "chained packets".  I wouldn't
>be too surprised if there is some network disease that could cause the second.

I don't know anything about the non-existent memory problems; I've
never seen them but we may not have the relevant hardware/software
combination.

I do know that back a few versions ago (definitely Ultrix 1.2, maybe
in Ultrix 2.x) if a chained packet was received, the if_qe driver
would always panic.  (I know this because Dave Boggs was running
his Ethernet performance tests on our net and he sometimes sent
humongous packets.)

I also know that this appears to have been fixed in more recent
versions of the code; there is still a panic on chained packets,
but that is only for a "Should NEVER happen" condition on some status
flag, and in fact chained packets (i.e., packets > 2kbytes long) should
simply be discarded now.

-Jeff