[comp.sys.mac.misc] MEMORY FAULTS - any way to check ram for intermittent faults??

phd11@keele.ac.uk (Zipzoid) (09/12/90)

Hi Netters,

Subject says it all really.

Plain vanilla 1 Mbyte Mac Plus, no INITs etc., System 6.0.4.

A program dies with err=02 one time, but when restarted and run through to
the same position again... no error.

This has happened several times before (for various programs).
I'm wondering if this could be an intermittent RAM fault; remember that I'm
using NO INITs/cdevs whatsoever.

My question is:
Is there a program available that will 'soak test' a Mac, checking memory
(for days if necessary) and looking for intermittent faults?

I used to have one for an Apple ][ I had once (oooh that's going back some!),
but I don't think I've ever seen one for the Mac.

Even something that could be entered (in hex!) through the inbuilt
(ROM based) MacsBug would be a help!

Can anyone enlighten me?
thanks for your time...
--
Tony McDonald (Tones)		
                                JANET:    phd11@uk.ac.kl.seq1
               ~ *              ARPANET:  phd11@seq1.kl.ac.uk
               \_/              BITNET:   phd11%uk.ac.kl.seq1@ukacrl

dorner@pequod.cso.uiuc.edu (Steve Dorner) (09/13/90)

In article <615@keele.keele.ac.uk> phd11@keele.ac.uk (Zipzoid) writes:
>Plain vanilla 1 Mbyte Mac Plus, no INITs etc., System 6.0.4.
>A program dies with err=02 one time, but when restarted and run through to
>the same position again... no error.
>I'm wondering if this could be an intermittent RAM fault

It's probably sort of an "intermittent ram fudge".

02 is an address error: the 68000 tried to access a word or longword at
an odd address, which it can't do.  Chances are the program is using an
invalid pointer.  Chances are, further, that this pointer is 0, and that
your Mac dies because the four bytes starting at location 0 hold an odd
value.

If nobody is purposefully writing to location 0 (and no one should),
location 0 will have random junk in it; this is probably why you have
a problem sometimes and not others.

>Even something that could be entered in (in hex!) through the inbuilt
>(ROM based) MAcsbug would be a help!

Try saying "sm 2 0101", continuing (what's the command for that in the
mini-monitor?) and running the program.  If my guesses are correct, it will
fail every time.
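
The reasoning above can be sketched as a toy model. This is only an
illustration, not 68000 code: memory is a small Python bytearray, and the
names AddressError, read_long, deref_nil and sm_word are made up for the
sketch. It assumes the buggy program reads the longword at location 0 and
then reads a longword through it.

```python
class AddressError(Exception):
    """Stands in for the 68000 address error (system error ID 02)."""


def read_long(mem, addr):
    # The 68000 refuses word and longword accesses at odd addresses.
    if addr & 1:
        raise AddressError(f"longword access at odd address ${addr:08X}")
    base = addr % len(mem)  # toy memory simply wraps around
    return int.from_bytes(mem[base:base + 4], "big")


def deref_nil(mem):
    # A buggy program treats the longword at location 0 as a pointer...
    pointer = read_long(mem, 0)
    # ...and reads through it. This fails only when that value is odd.
    return read_long(mem, pointer)


def sm_word(mem, addr, value):
    # Toy version of "sm addr value" for a two-byte value (big-endian).
    mem[addr:addr + 2] = value.to_bytes(2, "big")


mem = bytearray(64)
mem[0:4] = bytes.fromhex("00000010")  # even junk: the bug goes unnoticed
deref_nil(mem)

sm_word(mem, 2, 0x0101)               # force the longword at 0 to be odd
try:
    deref_nil(mem)
except AddressError as err:
    print("err=02:", err)
```

With even junk at location 0 the stray read goes unnoticed; once the low
word is odd, the same read fails every time.
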
--
Steve Dorner, U of Illinois Computing Services Office
Internet: s-dorner@uiuc.edu  UUCP: uunet!uiucuxc!uiuc.edu!s-dorner

dwal@ellis.uchicago.edu (David Walton) (09/13/90)

In article <1990Sep12.185459.15694@ux1.cso.uiuc.edu> dorner@pequod.cso.uiuc.edu (Steve Dorner) writes:

>02 is an address error; the 68000 tried to access a word or longword at
>an odd address, which it can't do.  Chances are the program is using an
>invalid pointer.  Chances further are that this pointer is 0, and that
>your mac dies because the four bytes starting at location 0 are odd.
>
>If nobody is purposefully writing to location 0 (and no one should),
>location 0 will have random junk in it; this is probably why you have
>a problem sometimes and not others.

...

>Try saying "sm 2 0101", continuing (what's the command for that in the
>mini-monitor?) and running the program.  If my guesses are correct, it will
>fail every time.

Um...I think that's a typo.  You mean sm _0_ 0101, don't you?  You
don't want to put this at memory location 2, you want it at 0.

The proper sequence is

	>sm 0 0101 (or any odd four-byte number)
	>g

">" is the ROM debugger prompt.

--
David Walton            Internet: dwal@midway.uchicago.edu
University of Chicago   {  Any opinions found herein are mine, not  }
Computing Organizations {  those of my employers (or anybody else). }

gix (Brian Gix) (09/14/90)

In article <1990Sep13.154744.2277@midway.uchicago.edu> dwal@ellis.uchicago.edu (David Walton) writes:
>In article <1990Sep12.185459.15694@ux1.cso.uiuc.edu> dorner@pequod.cso.uiuc.edu (Steve Dorner) writes:
>
>>02 is an address error; the 68000 tried to access a word or longword at
>>an odd address, which it can't do.  Chances are the program is using an
>>invalid pointer.  Chances further are that this pointer is 0, and that
>>your mac dies because the four bytes starting at location 0 are odd.
>...
>
>>Try saying "sm 2 0101", continuing (what's the command for that in the
>>mini-monitor?) and running the program.  If my guesses are correct, it will
>>fail every time.
>
>Um...I think that's a typo.  You mean sm _0_ 0101, don't you?  You
>don't want to put this at memory location 2, you want it at 0.
>
>The proper sequence is
>
>	>sm 0 0101 (or any odd four-byte number)
>	>g
>
Truly, you do want to store at memory location 2: the 68000 uses 32-bit
addresses, and the low-order word of the longword at 0 lives at location 2.
In MacsBug, either
  >sb 2 01 01
  *or*
  >sb 0 00 00 01 01

would leave an odd longword at location 0.




-- 

================================================================================
|| Brian Gix                        ||      ...uunet!mdisea!gix               ||
|| Mobile Data International        ||          gix@mdi.com                   ||

Greg@AppleLink.apple.com (Greg Marriott) (09/15/90)

In article <1990Sep13.154744.2277@midway.uchicago.edu> dwal@ellis.uchicago.edu (David Walton) writes:
> >Try saying "sm 2 0101", continuing (what's the command for that in the
> >mini-monitor?) and running the program.  If my guesses are correct, it will
> >fail every time.
> 
> Um...I think that's a typo.  You mean sm _0_ 0101, don't you?  You
> don't want to put this at memory location 2, you want it at 0.
> 
> The proper sequence is
> 
>         >sm 0 0101 (or any odd four-byte number)
>         >g

In this case, sm 2 0101 is correct.   0101 is an odd TWO byte number, 
being stored in the low word of the longword at 0.
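
The byte layout can be checked with a small sketch. It models the first
four bytes of memory as a Python bytearray and assumes "sm" stores exactly
as many bytes as the hex digits given, as the replies in this thread imply.
The helper names sm and long_at_zero and the junk value are made up for
the illustration.

```python
def sm(mem, addr, hex_digits):
    """Toy 'sm': store the bytes spelled by hex_digits starting at addr."""
    data = bytes.fromhex(hex_digits)
    mem[addr:addr + len(data)] = data


def long_at_zero(mem):
    # The 68000 is big-endian: byte 0 is the most significant byte.
    return int.from_bytes(mem[0:4], "big")


junk = bytes.fromhex("DEADBEE0")        # arbitrary even leftover value

a = bytearray(junk)
sm(a, 2, "0101")                        # low word replaced:  $DEAD0101, odd

b = bytearray(junk)
sm(b, 0, "0101")                        # high word replaced: $0101BEE0, even

for name, mem in (("sm 2 0101", a), ("sm 0 0101", b)):
    value = long_at_zero(mem)
    print(f"{name}: ${value:08X} {'odd' if value & 1 else 'even'}")
```

Storing the two bytes at location 0 changes only the high word, so whether
the longword is odd still depends on whatever junk sits in the last byte.
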

If you want to find programs dereferencing location 0, a good value to put
there is $00f0f0f1.  This will cause an address error on 68000 Macs and
a bus error on others.  There is an INIT called Mr. BusError that writes
this value at location 0 and keeps rewriting it from time to time (in
a VBL task, I think), in case an errant program writes to location 0 and
blasts the bus-error-causing value.
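
The idea of periodically re-planting the value can be sketched as follows.
This is not Mr. BusError's actual code, only a toy model: memory is a
Python bytearray, and install_sentinel and vbl_tick are hypothetical names
standing in for the INIT's setup and its periodic task.

```python
SENTINEL = 0x00F0F0F1  # odd, so a 68000 faults when it is used as a pointer


def install_sentinel(mem):
    # Plant the trap value in the longword at location 0 (big-endian).
    mem[0:4] = SENTINEL.to_bytes(4, "big")


def vbl_tick(mem):
    """Hypothetical periodic task: rewrite the value on every tick.

    Returns True if something had overwritten location 0 since last time.
    """
    clobbered = int.from_bytes(mem[0:4], "big") != SENTINEL
    install_sentinel(mem)
    return clobbered


mem = bytearray(16)
install_sentinel(mem)
mem[0:4] = bytes(4)        # an errant program zeroes location 0
print(vbl_tick(mem))       # the next tick notices and restores the value
```

Rewriting unconditionally keeps the task trivial; the return value is only
there so the sketch can show that a stray write was undone.
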

Greg Marriott
Just Some Guy
Apple Computer, Inc.

d88-jwa@dront.nada.kth.se (Jon W{tte) (09/15/90)

>In article <> dorner@pequod.cso.uiuc.edu (Steve Dorner) writes:

>>Try saying "sm 2 0101", continuing (what's the command for that in the
>>mini-monitor?) and running the program.  If my guesses are correct, it will
>>fail every time.


In article <> dwal@ellis.uchicago.edu (David Walton) writes:

>Um...I think that's a typo.  You mean sm _0_ 0101, don't you?  You
>don't want to put this at memory location 2, you want it at 0.

>	>sm 0 0101 (or any odd four-byte number)
>	>g

0101 is a TWO-byte number.

Well, actually 2 IS right, since the last byte of the longword at 0
should be odd, which means that the last byte of the word at 2
should be odd. To avoid confusion, use

sm 0 bbbbbbbb

(Or, in MacsBug, sm 0 NIL! :-)

							h+

	Jon W{tte, Stockholm, Sweden, h+@nada.kth.se

dwal@ellis.uchicago.edu (David Walton) (09/15/90)

In article <10242@goofy.Apple.COM> Greg@AppleLink.apple.com (Greg Marriott) writes:

>dwal@ellis.uchicago.edu (David Walton) writes:
>> 
>> Um...I think that's a typo.  You mean sm _0_ 0101, don't you?  You
>> don't want to put this at memory location 2, you want it at 0.
>> 
>
>In this case, sm 2 0101 is correct.   0101 is an odd TWO byte number, 
>being stored in the low word of the longword at 0.

That's right.  Wasn't thinking.  I usually use MacsBug and do a
	sm 0 'NIL!'

--
David Walton            Internet: dwal@midway.uchicago.edu
University of Chicago   {  Any opinions found herein are mine, not  }
Computing Organizations {  those of my employers (or anybody else). }

dorner@pequod.cso.uiuc.edu (Steve Dorner) (09/18/90)

In article <10242@goofy.Apple.COM> Greg@AppleLink.apple.com (Greg Marriott) writes:
>a VBL task, I think).  It keeps writing the value there in case an errant 
>program writes to location 0, blasting the bus error causing value.

And therein lies an interesting tale.

My mail program, Eudora, puts a longword of 0 at location 0, on the theory
that, should something go wrong and a nil pointer be dereferenced, 0 could
be less harmful than random junk (after all, random junk will give you
an address error on 68000s half the time).  Marginal, I know, but I figured
that it wouldn't hurt anything, and might help somebody sometime. [PLEASE
read on BEFORE billy-uns of you post that I shouldn't do it.]

Friday I received a call from a guy at some software company.

Him: "Does your program write to location 0?"
Me:  "Yep."
Him: "We are working on a commercial application, and this gives us trouble."
Me:  "Too bad."
Him: "But Apple says you shouldn't write to location 0."
Me:  "Well, then you shouldn't be using it either, should you?"
Him: <excuses>
Me:  <extreme annoyance>

Anyway, he never would tell me what the program was, and made several comments
along the lines of:

"Well, we're developing a commercial app, so we have testing cycles, and
can't just release a new version."  (Implying that I didn't test, and could
release a version every time I blew my nose.)

"Well, WE don't leave debugging code in our production apps."

(My response was, neither do I; my *debugging* code puts *odd* values at 0.)

Anyway, I'm going to remove the code from Eudora, because he also pointed
out that it bothers "The Debugger" (Jasik's).  The benefit was really
minimal to nonexistent anyway.

But the guy left me more than a little ticked off; I didn't appreciate
his "we make people pay so we're better than you" attitude.

[By the way, the reason this guy cared in such a big way about Eudora is
that one of his customers was using Eudora and didn't want to give it up.]
--
Steve Dorner, U of Illinois Computing Services Office
Internet: s-dorner@uiuc.edu  UUCP: uunet!uiucuxc!uiuc.edu!s-dorner