[comp.databases] How would YOU solve this problem???

r_anderson@quickr.enet.dec.com (Rick Anderson) (03/27/91)

This is for all of you database-oriented debugging gurus out
in netland - I need some creative ideas to solve a problem
I'm having with a production database.

First, a little background: The database is in production
mode (i.e. it's live - not Memorex :-)) and is used by
approximately 100 users - 50 in the U.S. and 50 in Ireland.
This means the database is used 24 hours a day and cannot
be down for long periods of time.  These users are typically
data-entry types (i.e. terminal applications).  There are
also a large number of "batch" jobs that get submitted 
throughout the day.

Here's the problem: A few select addresses in shared memory
are getting corrupted by an unknown process(es).  The addresses
getting corrupted are NOT contiguous, but are usually within
the same 512-byte block.  The same addresses are being corrupted
with the same values every time!
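
To make the symptom concrete, here is a rough sketch of how a snapshot
of the shared region could be checked for this pattern: fixed offsets,
non-contiguous, but landing in one 512-byte block. Everything in it is
assumed for illustration (the function names, the offsets, the byte
values, and the idea that blocks are 512-byte aligned); none of it comes
from the actual database or from VMS.

```python
# Hypothetical sketch: names, offsets and values are illustrative only.
BLOCK_SIZE = 512  # block size from the post; alignment is an assumption


def find_corrupted(snapshot, expected):
    """Return the sorted offsets whose byte differs from the expected byte."""
    return sorted(off for off, val in expected.items() if snapshot[off] != val)


def same_block(offsets, block_size=BLOCK_SIZE):
    """True if every offset falls inside one block_size-aligned block."""
    return len({off // block_size for off in offsets}) <= 1


def signature(snapshot, offsets):
    """(offset, value) pairs, so two incidents can be compared for equality."""
    return tuple((off, snapshot[off]) for off in offsets)


# Made-up data: a clean 8 KB region and a copy with two bytes overwritten.
expected = {1030: 0x00, 1100: 0x00, 1500: 0x00, 4000: 0x00}
good = bytearray(8192)
bad = bytearray(good)
bad[1030] = 0xDE
bad[1500] = 0xAD

hits = find_corrupted(bad, expected)
print("corrupted offsets:", hits)
print("all in one block:", same_block(hits))
print("signature:", signature(bad, hits))
```

Comparing the signature from one incident against the next is one way to
confirm the "same addresses, same values" observation before hunting for
the process responsible.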

Here's the question: How would you go about determining what
process is corrupting the shared memory addresses?  I have
tried adjusting database parameters, but that only moves the
problem to new shared memory addresses.  The shared memory 
cannot be write-protected because it contains the database
schema information.

BTW: The operating system is VMS, but it could just as well
be UNIX.

ALSO: The problem does NOT occur on the development database
(figures...).

I would appreciate any creative debugging ideas!  Either email
them to me (and I'll post if appropriate) or post them here
directly...

Thanks in advance!
Rick

**********************************************************
* Rick Anderson          * Digital Equipment Corporation *
**********************************************************
* UUNET: ...{decwrl|decvax}!nova.enet.dec.com!r_anderson *
* Internet: r_anderson%nova.enet.dec@decwrl.dec.com      *
**********************************************************

evan@plxsun.uucp (Evan Bigall) (03/27/91)

In article <21464@shlump.nac.dec.com> r_anderson@quickr.enet.dec.com (Rick Anderson) writes:
   [Big problem deleted]

   Here's the question: How would you go about determining what
   process is corrupting the shared memory addresses?

I'd stay up late a few nights doing whatever your preferred activity for
softening the brain is (the deities in charge expect penance) and then smack
yourself in the forehead and yell "ahhhhhh Ha!!!"

This may not be the response you were looking for, but I have solved this
sort of problem before, and that's how I always did it.

/Evan

PS: you may want to mention the software involved, and the accessibility of
code/tools. 
--
Evan Bigall, Plexus Software, Santa Clara CA (408)982-4840  ...!sun!plx!evan
"I barely have the authority to speak for myself, certainly not anybody else"