[comp.sys.handhelds] Interesting effect

cloos@acsu.buffalo.edu (James H. Cloos) (02/14/91)

I do not want to alarm anyone, as I have not been able to reproduce this,
but Monday morning I was checking out the performance of the SPLINE suite
that was posted here, and ran into an interesting problem.

I gave the routines the data [ 1 5 14 30 55 91 125 153 171 175 161 125 90
58 31 11 ] [ 3 18 ] and hit the key.  After about 5 minutes, I gave up on
the program & tried to break out.  ATTN didn't work, so I went so far as to
try ON-C.  This was also non-functional.

(Last April or May I ran into a similar problem when I had cleared all of
the key assignments & put myself in USER mode; ON-C did not work even
though it is supposed to.  The next time I tried it, ON-C did work, but
that time it did not.  I gave up on reproducing that until now.)

I ended up taking the foot off and defibrillated it.  The screen went blank
so I hit ON, which turned it back on & gave me a Do You Want To Recover Mem
screen.  Of course I hit YES, and soon got that screen back, so I hit YES
again & was returned 8 directories (D.0[1-8]) of which the first contained
all of the other 7 (named D.0[1-7]) and the others were each of the
subdir's I had w/ the exception of the SPLINE dir & its parent (which,
interestingly, contained many of those recovered dirs).  The '' dir came up
as one of the D.... dirs.  I did find it interesting that the 128K RAM card
I had MERGEd was still MERGEd and the BAK in it of HOMEDIR was unchanged.
The clock was also OK.  I was able to recover by RESTOREing the BAK from
PORT0.

What is the moral of this story?  I don't know, but this does make 3 times
now that I had the calc lock up completely, and the 1st and 3rd were while
it was running a strictly userlanguage program.  (The 2nd was so bad even
ON-A-F didn't work, nor did the defibrillator; only removing the batteries
did.)

After the first time, I mentioned in a posting that ON-C could be disabled
by clearing the assignment of the ON key--assuming you are in USER mode.
Someone on the design team (maybe Bill) followed-up that nothing could (in
userlang anyway) block ON-C.  My subsequent trials agreed with that
sentiment, but now it has happened to me again, & I KNOW that I was
pressing ON and C real well many, many times. There MAY be an obscure
problem, or I may just have gotten a thick spurt of some interference
(cosmic rays?) that mucked up RAM just enough to crash.  I was given to
understand that defibrillating would NOT trash mem, so it looks like
trashed mem caused the crash.

I would be curious to learn if anyone else has had unexplainable crashes
(w/o the use of non-userlang stuff).

P.S.	I'm currently leaning toward some kind of RAM-affecting interference.

P.P.S.	FLAMES to /dev/null, bitte; only constructive replies need
	reply or follow-up.

-JimC
--
James H. Cloos, Jr.		Phone:  +1 716 673-1250
cloos@ACSU.Buffalo.EDU		Snail:  PersonalZipCode:  14048-0772, USA
cloos@ub.UUCP			Quote:  <>

frechett@spot.Colorado.EDU (-=Runaway Daemon=-) (02/14/91)

I don't want to quote this whole article, but you said that you were led to
believe that the reset button didn't trash memory.  I don't think it does.  
It sounds to me like the memory was trashed before you hit the button.  If you
consider that the calc was already locked up and nothing else was working, I
would have to think that your memory wasn't in very good shape.  Try hitting
the reset button when the calc is working fine.  It seems to be analogous to the
hp28s's ON-ENTER-BACKSPACE sequence which just turns the calc off.  You will
notice that the reset button always does turn the calc off.  

As for the actual process of locking up the machine, it is easy to do.  I have
crashed my memory many times.  The last time was while beta-testing NRTS0.1
and seemed to have something to do with the input buffer, as everything was 
hunky dory when I left it and a few minutes later it was displaying 
Try to recover Memory?    I have locked it up bad enough to have to hit the
button about 4 times and in about 3 out of 4, the memory loss was total.  
If you think about it, other machines do it too.  I used to freak out Apples 
many years ago.  Ever play with screwed up binaries on a Mac?  Wasn't it
something like "Fatal System error" and it shows a little bomb.... 
I have locked up my share of PCs and the best was when in the time span of
about an hour, I managed to crash 5 DECstation 3100s 3 or 4 times each.  
Our big DEC 5500 and SEQUENT machines freak and die about once a week.  

The main difference between them and the hp48sx is that the hp48sx just doesn't
come back up as easily.  It doesn't have a drive.  I am getting a second 128K
card so I should be able to use this one for all my backups.  Will make
hacking a much safer proposition.  Fun little machine.  

	ian

mcgrant@elaine30.stanford.edu (Michael Grant) (02/15/91)

In article <59716@eerie.acsu.Buffalo.EDU> 
cloos@acsu.buffalo.edu (James H. Cloos) writes:
>There MAY be an obscure
>problem, or I may just have gotten a thick spurt of some interference
>(cosmic rays?) that mucked up RAM just enough to crash.  
>
>P.S.	I'm currently leaning toward some kind of RAM-affecting interference.

It could be a 'soft error', which is simply the interaction of radiation
with the memory, causing a temporary read error.  I've worked in memory
failure analysis before, and soft errors are enough of a problem that they
are used as a measure of the robustness of a new design.  They bombard
the thing with alpha particles, and see if anything goes wrong.  Of course,
this means that they get many more errors than anyone else ever will.

Well, to be perfectly honest, there is no way of knowing if this is 
the reason why the calculator crashed, but, despite all semiconductor
companies' efforts to prevent them, they still crop up--especially since
memory cell size keeps shrinking (the lower a cell's capacitance, the fewer
the number of electrons, the higher the susceptibility).
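
The parenthetical above rests on Q = C * V: the charge a cell holds scales
with its capacitance, so a smaller cell stores fewer electrons and a single
particle strike disturbs a larger fraction of them.  A rough sketch of that
arithmetic follows.  The capacitances and the 5 V figure are illustrative
assumptions only, not values for any actual memory part or for the HP48.

```python
# Rough count of electrons held on a memory cell's storage node: n = C * V / e.
# The capacitance and voltage values below are illustrative assumptions only.

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)


def electrons_stored(capacitance_farads, voltage_volts):
    """Return the approximate number of electrons for charge Q = C * V."""
    return capacitance_farads * voltage_volts / ELEMENTARY_CHARGE


if __name__ == "__main__":
    for femtofarads in (100, 50, 25):  # hypothetical cell capacitances
        n = electrons_stored(femtofarads * 1e-15, 5.0)  # assumed 5 V swing
        print(f"{femtofarads:>4} fF at 5 V -> about {n:,.0f} electrons")
```

Halving the capacitance halves the stored electron count, which is the sense
in which shrinking cells become more susceptible.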

Hell, for all I know, it could be completely irrelevant to this particular
crash--I've never known a bug in my life to be attributable to anything but
software error.  But, your mention of cosmic rays brought this to mind.

Just a wild-eyed suggestion,
Michael C. Grant

ervin@pinbot.enet.dec.com (Joseph James Ervin) (02/15/91)

>It could be a 'soft error', which is simply the interaction of radiation
>with the memory, causing a temporary read error.  I've worked in memory
>failure analysis before, and soft errors are enough of a problem that they
>are used as a measure of the robustness of a new design.  

I believe you can pretty much rule out alpha particles as a source of the
errors you've seen.  Such "soft" errors are a phenomenon of dynamic
memory devices.  The memory in the HP48 is static, so alpha particles
have a much, much, much smaller chance of doing any bit-flipping than
in the case of dynamic memory.

>>>Joe Ervin

mamos@uafhp.uark.edu (Mark _E_ Amos) (02/16/91)

  As long as we're on the subject of RAM problems and lockups, I would like to
share my latest adventure, in the hopes I can learn just what the H*LL is 
happening.  

-Some guys I know in my department discovered they could use Smith-Corona 32K
RAM cards ($23.84, local Wal-Mart) in their 48's.  Out of 3 purchased, 1 didn't
work, and a quick exchange at the store fixed that.  Here comes the fun part:
A friend and I figured we'd cash in on this cheap expansion stuff, and after
discovering the local Wal-Mart was sold out (word gets around, eh?) we went to
one at a neighboring town, finding plenty of new and unopened blister packs of
the little jewels.  I bought 2 and my friend bought 1.  Upon reaching the car,
I got my 48 out and plugged 1 of mine in - it worked great!  I then plugged
the second one in and turned on the power to see random vertical lines, then
blank screen, then a BLACK screen which began bleeding and pulsating from left
to right, with NO key sequence having any effect at all.  The bleeding screen
smacked of overdrive, so I unplugged the second card.  The machine then came
back after a bit of a wait and key pressing.

-Ok, fine, I happened to get a bad card like the other dude I knew...  So, I
tried my friend's card - same exact thing.  This happened with the two "bad"
cards whether I had the "good" one in or not, and didn't matter which slot.
Now what?  Try them on my friend's machine - they ALL 3 worked flawlessly,
including two at once...

-Hmmm.  Well, I had about 28k used on mine, including a library in port 0,
so I copied all my stuff to his and duplicated the above sequence.  Same 
result.  Next I tried checking ROM versions, etc.-identical (D).  I then began
grabbing for straws - wiped my machine (ON-A-F, No) and tried again: same
thing.  OK, he had his old original batteries and I had brand new ones (NOW
we're grabbing for shadows of straws) so I swapped them - same result.
(Incidentally, the ON-D, G sequence does NOT seem to have any indication of 
relative battery strength WHATSOEVER, as I checked this during the process).

-I finally gave up and went back in to exchange one of the "bad" cards, hoping
I would get lucky and get one that would work on mine like the one I had that
would.  No such luck.  This one also worked fine in my friend's but did the
bleeding screen bizness on mine.

-I have been an electronics tech for upwards of 8 years, and an engineering
student for 2 years so far - I consider my methods logical and thorough, yet
I can find NO explanation whatsoever for this "phenomenon".  I will be sending
my machine in to update ROM to E in a few months, but in the mean time, what
in the name of Sam Hill is happening?  I know, I know, it's not an authorized
HP card, etc., etc. - but the fact remains 3 of 4 cards would NOT work on MY
machine, yet 7 of 8 work on 3 other machines - the 8th of which I have yet to
discover the actual symptoms of (ie. screen bleeding, or just no workum?).
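
One way to gauge whether the tallies above (1 of 4 working on one machine,
7 of 8 working on the others) could be plain bad luck is a one-sided
hypergeometric tail, as used in Fisher's exact test.  This is only a sketch:
it treats every card-in-machine trial as independent, which the post does not
establish (the same cards were tried in several machines), and the function
name is made up for illustration.

```python
from math import comb


def prob_at_most(total_trials, total_working, my_trials, my_working):
    """Chance of seeing my_working or fewer successes among my_trials,
    if the total_working successes were spread at random over all trials
    (hypergeometric tail).  Assumes independent, interchangeable trials."""
    total_failing = total_trials - total_working
    ways_all = comb(total_trials, my_trials)
    ways = 0
    for k in range(my_working + 1):
        # k working trials and (my_trials - k) failing trials on "my" machine
        ways += comb(total_working, k) * comb(total_failing, my_trials - k)
    return ways / ways_all


if __name__ == "__main__":
    # 12 trials in all, 8 of them working; 4 trials on one machine, 1 working.
    p = prob_at_most(12, 8, 4, 1)
    print(f"one-sided probability: {p:.3f}")
```

Under those assumptions the tail probability is 33/495, about 0.067: the
pattern is suggestive of a machine-specific difference, but with this few
trials it falls short of conclusive.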

-Conclusion: authorized RAM card or no, there are obviously SOME kind of
differences between machines that are ROM independent.  OK, so let's talk about
the variations in line drivers/buffers, etc. -I know these differences exist,
but what I would like to know most of all is, am I just the fluke or does this
kind of thing happen to anyone else out there?

"Curiouser and curiouser..."

==============================================================================
  Mark _E_ Amos        | University of Arkansas Computer Science Engineering
  mamos@uafhp.uark.edu | 
  mea1@engr.uark.edu   | (emphasise the Computer Engineering please)
------------------------------------------------------------------------------
"Man's mind, when stretched to a new idea, never goes back to its original
 dimension."				              -Oliver Wendell Holmes
==============================================================================

cah@gripl.UUCP (Chris Heitmann) (02/19/91)

Well, I too went out and tried a S.C. ram card from Lechmere ($27.99 :-( ) 
and it did not work.  I observed the same bleeding screen effect that the 
previous poster saw.  I exchanged it for another, and the same thing 
happened again.  I will be trying another soon, but wanted to give it a rest 
for a day (superstitious I guess...).  One word of caution, when I tried the 
unauthorized ram cards, my memory was erased.  Not just the internal memory 
but the 128k ram card also.  The external memory was merged with the 
internal so possibly if it were a backup or something instead, it would not 
have been erased.  In any case I could not find any serial numbers on the 
S.C. ram card to compare so as to figure out any differences.
        I am considering trying the Korg memory cards (128k!) from the local 
music store ($115.00 if memory serves...oops, no pun intended!).

                                Chris
cah@gripl.uucp

frechett@spot.Colorado.EDU (-=Runaway Daemon=-) (02/20/91)

From what I can tell... I think the Smith Corona Cards are a better 
deal and possibly better cards... Unfortunately, I don't have the old
discussion on Epson RAM cards archived, so I don't remember what all the 
differences were.  Something like, the Epson cards are made to run at 5-5.5V 
and the HP cards are of course made to work at 4.5V.  I don't know how this
translates into weird problems described here.  My personal theory is that
it could have something to do with the various versions with the dying 
LCDs.  I know that they extended up into the Ds.  Any ideas anyone?

	ian

--
-=Runaway Daemon=-

TNAN0@CCVAX.IASTATE.EDU (02/20/91)

Chris,

  I purchased a Smith Corona "DataStore 32K" card today, brought it home,
tried it out and it works perfectly.  Its model number (I think that's what
it is) is S 75531.  I have tried it in two HP-48s so far and I haven't yet
noticed any odd screen (or memory) effects.

I have:
HP-48D
(and tried it on an E)
Serial #: 3031A00755

I purchased the card at Wal-Mart for $25.03 (after 5% sales tax).
I've tried it solo and with the EQLIB card, but not with another RAM card--
could this be the problem?

---Xeno

jcohen@lehi3b15.csee.Lehigh.EDU (Josh Cohen [890918]) (03/13/91)

Try the SC card with NEW batteries.. I have heard that a low batt condition
in the HP can cause a crash with the insertion of an SC card.