[comp.software-eng] Most likely cause of symptoms

alesha@auto-trol.com (Alec Sharp) (06/06/91)

I'm trying to put together a list of symptoms, and their most likely
causes, to help people who are trying to debug problems.  Things such
as:

Intermittent, non-reproducible problems: often memory or timing 
dependent.  Memory problems often caused by uninitialized variables.

Bugs that can't be reproduced in the debugger: often memory or timing
dependent.

Access violation/exception conditions:  Uninitialized variables,
arrays out of bounds, pointers out of range.

Does anyone have other hints they use when debugging?

Alec Sharp
-- 
------Any resemblance to the views of Auto-trol is purely coincidental-----
Don't Reply - Send mail: alesha%auto-trol@sunpeaks.central.sun.com
Alec Sharp           Auto-trol Technology Corporation
(303) 252-2229       12500 North Washington Street, Denver, CO 80241-2404

mcmahan@netcom.COM (Dave Mc Mahan) (06/09/91)

 In a previous article, alesha@auto-trol.com (Alec Sharp) writes:
>
>I'm trying to put together a list of symptoms, and their most likely
>causes, to help people who are trying to debug problems.  Things such
>as:
>
>Intermittent, non-reproducible problems: often memory or timing 
>dependent.  Memory problems often caused by uninitialized variables.

I have found that this is usually caused by either not enough CPU time or
the fact that I designed the software process without taking into account
one or more behaviours of the real data.  Uninitialized variables rarely
bite me anymore because I set all RAM to known values on powerup (I like to
use 0xFFFFFFF for memory because it causes an IMMEDIATE memory bounds access
error any time it is used as a pointer) and use compilers that will warn me
if I am using a local variable without initialization.  The other thing that
causes such intermittent errors is variations in the manufacturing process of
the product (I do lots of work with embedded analog/digital systems) that
identify weak points in the software.  We recently built 300 units of a
product and final test showed that about 15 or so had tolerance problems in
two resistor pairs that would cause the software to fail certain actions.
Others were marginal.  The software performed exactly as designed; it was just
getting bad data from the real world.
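
A minimal sketch of the fill-on-powerup idea, in C.  It assumes a fill
byte of 0xFF and works on an ordinary buffer so it can run on a hosted
system; on a real target the fill would run from the startup code over
all of RAM.  The names poison_fill and looks_uninitialized are
hypothetical, not taken from any particular toolchain.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed fill byte: every byte of the region is set to 0xFF. */
#define POISON_BYTE 0xFFu

/* Fill a region with the poison byte.  On an embedded target this would
 * be called from startup code before main(); here it takes any buffer. */
void poison_fill(void *region, size_t len)
{
    memset(region, POISON_BYTE, len);
}

/* Return 1 if a pointer-sized value still holds the fill pattern,
 * which suggests it was never assigned after startup. */
int looks_uninitialized(uintptr_t value)
{
    return value == UINTPTR_MAX;
}
```

Whether dereferencing such a value actually traps depends on the memory
map of the target, so the explicit check can be useful in test builds.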

>Bugs that can't be reproduced in the debugger: often memory or timing
>dependent.

The first step in any debug process is being able to repeat a bug so it can
be isolated.  I have found most non-reproducible problems occur because I
haven't used the real data that caused the problem in the first place.  This
data is usually (again) a real-world signal that occurs in conjunction with
other external inputs.  The right combination causes failure; it's just very
improbable that such a combination will be generated.  Such elusive events
are also difficult to detect and use as trigger points for a debugger until
long after they have passed.  Being able to detect any weird data as soon as
possible is a big help.  You have to be very strict on what is let into the
CPU for processing if you think you will have a problem with data.
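
One way to be strict at the input boundary can be sketched in C as
below.  The limits, the 12-bit ADC assumption, and the names
input_gate_t and gate_sample are all hypothetical; real limits would
come from the hardware design and its tolerances.

```c
#include <stdint.h>

/* Assumed limits for a 12-bit ADC channel (illustrative values only). */
#define SAMPLE_MIN 100
#define SAMPLE_MAX 3900
#define MAX_STEP   500   /* largest believable jump between samples */

typedef struct {
    int32_t  last_good;  /* most recent accepted sample */
    uint32_t rejected;   /* count of samples refused at the boundary */
    int      have_last;  /* 0 until the first sample is accepted */
} input_gate_t;

/* Accept or refuse a raw sample.  Returns 1 and records it if it is
 * plausible; returns 0 and counts it otherwise, so odd data is noticed
 * when it arrives rather than long after. */
int gate_sample(input_gate_t *g, int32_t raw)
{
    int32_t step;

    if (raw < SAMPLE_MIN || raw > SAMPLE_MAX) {
        g->rejected++;
        return 0;
    }
    if (g->have_last) {
        step = raw - g->last_good;
        if (step < 0)
            step = -step;
        if (step > MAX_STEP) {
            g->rejected++;
            return 0;
        }
    }
    g->last_good = raw;
    g->have_last = 1;
    return 1;
}
```

The rejected counter also gives a debugger something concrete to
trigger on at the moment the bad data shows up.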

>Access violation/exception conditions:  Uninitialized variables,
>arrays out of bounds, pointers out of range.

Pretty much true, but general causes for these things in my programs tend to
be design error or implementation error on my part.  Things like pointers
will walk out of proper bounds if you don't test end conditions properly.
I usually try to do a pre- and post-test of any pointers used in queues when
I code a set of routines that are used for manipulation.  This checks to make
sure that pointers are in-bounds and fall on proper boundaries of structures
they point to.
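
A rough C sketch of such a pre- and post-test on a queue pointer
follows.  The queue layout and the names item_t, queue_t, ptr_ok and
queue_put are assumptions made for illustration, not the routines
described above.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8

typedef struct { uint16_t id; uint16_t value; } item_t;

typedef struct {
    item_t  buf[QUEUE_LEN];
    item_t *head;    /* next slot to write */
    item_t *tail;    /* next slot to read */
    size_t  count;   /* items currently stored */
} queue_t;

/* Return 1 if p lies inside buf and on an item_t boundary. */
int ptr_ok(const queue_t *q, const item_t *p)
{
    uintptr_t base = (uintptr_t)q->buf;
    uintptr_t addr = (uintptr_t)p;

    if (addr < base || addr >= base + sizeof q->buf)
        return 0;
    return ((addr - base) % sizeof(item_t)) == 0;
}

void queue_init(queue_t *q)
{
    q->head = q->tail = q->buf;
    q->count = 0;
}

/* Returns 1 if stored, 0 if the queue is full, -1 if a pointer check
 * failed before or after the operation. */
int queue_put(queue_t *q, item_t it)
{
    if (!ptr_ok(q, q->head))            /* pre-test */
        return -1;
    if (q->count == QUEUE_LEN)
        return 0;
    *q->head = it;
    q->head++;
    if (q->head == q->buf + QUEUE_LEN)  /* end condition: wrap around */
        q->head = q->buf;
    q->count++;
    if (!ptr_ok(q, q->head))            /* post-test */
        return -1;
    return 1;
}
```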

>Does anyone have other hints they use when debugging?

I find that re-use of software or hardware for a slightly different purpose
can sometimes lead to surprising results.  Developing a set of test routines
early in the debug process (and discarding them for the final release) can save
lots of time.  Prototype hardware is also notorious for developing surprise
connections that screw up software.

A big problem I have faced often is trying to develop interrupt routines that
try to do too much when running.  Manipulation of data passing between 
interrupt routines and normal code has to be rigorous in design and should
be made to be as simple as possible.  Developing a set of techniques for such
data passing and ensuring they work reliably should be done by every code 
designer who uses them.  These techniques should then be followed for future
work.
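
As one example of a simple, fixed technique for this kind of data
passing, here is a C sketch of a single-producer, single-consumer ring
buffer.  It assumes a single-core target where one-byte loads and
stores are atomic, with the interrupt routine as the only writer of
head and the normal code as the only writer of tail.  The names ring_t,
ring_put and ring_get are hypothetical; a multi-core system would need
proper atomics or memory barriers instead.

```c
#include <stdint.h>

#define RING_SIZE 16u   /* must be a power of two */

typedef struct {
    volatile uint8_t head;             /* written only by the interrupt */
    volatile uint8_t tail;             /* written only by normal code */
    volatile uint8_t data[RING_SIZE];
} ring_t;

/* Called from the interrupt routine.  Stores one byte and returns 1,
 * or returns 0 without storing if the buffer is full. */
int ring_put(ring_t *r, uint8_t byte)
{
    uint8_t head = r->head;
    uint8_t next = (uint8_t)((head + 1u) & (RING_SIZE - 1u));

    if (next == r->tail)
        return 0;                      /* full: one slot is kept empty */
    r->data[head] = byte;
    r->head = next;                    /* publish after the data write */
    return 1;
}

/* Called from normal code.  Fetches one byte into *out and returns 1,
 * or returns 0 if the buffer is empty. */
int ring_get(ring_t *r, uint8_t *out)
{
    uint8_t tail = r->tail;

    if (tail == r->head)
        return 0;                      /* empty */
    *out = r->data[tail];
    r->tail = (uint8_t)((tail + 1u) & (RING_SIZE - 1u));
    return 1;
}
```

Because each index has exactly one writer, neither side has to disable
interrupts, and the interrupt routine does nothing beyond one store and
one index update.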

>Alec Sharp           Auto-trol Technology Corporation

   -dave

-- 
Dave McMahan                            mcmahan@netcom.com
					{apple,amdahl,claris}!netcom!mcmahan

kambic@iccgcc.decnet.ab.com (George X. Kambic, Allen-Bradley Inc.) (06/14/91)

In article <1991Jun6.135234.18165@auto-trol.com>, alesha@auto-trol.com (Alec Sharp) writes:
> 
> I'm trying to put together a list of symptoms, and their most likely
> causes, to help people who are trying to debug problems.  Things such
> as:
> 
> Intermittent, non-reproducible problems: often memory or timing 
> dependent.  Memory problems often caused by uninitialized variables.
> 
> Bugs that can't be reproduced in the debugger: often memory or timing
> dependent.
> 
> Access violation/exception conditions:  Uninitialized variables,
> arrays out of bounds, pointers out of range.
> 
> Does anyone have other hints they use when debugging?

Try reading Myers or Beizer

GXKambic
standard disclaimer