[comp.lang.c] The nastiest kind of C bug

eric@snark.UUCP (Eric S. Raymond) (10/08/88)

(Cross-posted to comp.lang.c because the point made has more general relevance)

In <4025@pbhyf.pacbell.com>, rob@pbhyf.PacBell.COM (Rob Bernardo) writes:
> I'm one of the elm 2.2 developers and I've run across a nasty
> bug that has existed in elm since the 1.5 version (at least).
>    [describes stepping through looking for a buffer corruption point]
> What makes this even more bizarre is that different pieces of evidence
> point in different directions for the location of the overrun array:

Overrun static buffers don't tend to produce quite this level of confusion;
the bug search strategy you described would have caught the problem, I think,
if that's what was going on.

An overrun *automatic* buffer, on the other hand, can trash stack frames,
leaving just this kind of apparently contradictory evidence -- because once a
saved return address is clobbered, the pc can land in the middle of some
buffer-altering routine on return from the enclosing subroutine. It's amazing
how often this results in silent corruption and later bugs, instead of a nice
clean core dump.
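
A rough model of the mechanism, for anyone who hasn't watched it happen. This
is a sketch under loud assumptions: struct fake_frame, unchecked_fill() and
frame_corrupted() are made-up names, and the struct only imitates a frame
(a local buffer with saved state sitting right after it). Real frame layouts
depend on the machine and the compiler, and a real overrun is undefined
behavior, so the copy here is deliberately kept inside the struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a local buffer with saved
 * state right after it. Real frame layouts vary by machine and compiler. */
struct fake_frame {
    char buf[8];              /* plays the automatic buffer */
    unsigned long saved_pc;   /* plays the saved return address */
};

/* Copy len bytes to f->buf with no check against sizeof f->buf, the way
 * strcpy() or gets() would. The write goes through a pointer to the whole
 * struct and is clamped to the struct's size, so the demo itself stays
 * well-defined. */
void unchecked_fill(struct fake_frame *f, const char *src, size_t len)
{
    unsigned char *base = (unsigned char *)f;
    size_t off = offsetof(struct fake_frame, buf);
    size_t room = sizeof *f - off;

    if (len > room)
        len = room;
    memcpy(base + off, src, len);
}

/* Nonzero if the modelled return address no longer holds what the
 * "caller" stored there. */
int frame_corrupted(const struct fake_frame *f, unsigned long expected_pc)
{
    return f->saved_pc != expected_pc;
}
```

Filling with sizeof f.buf bytes leaves saved_pc alone; filling with sizeof f
bytes overwrites it, and nothing complains at the moment of the write. The
damage only shows when something later uses the saved value.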

If you have sdb, get to a point where the bug has shown itself, then check
each automatic buffer in every routine above your breakpoint on the execution
stack. Odds are you'll find one full...
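
The same test can be done from code if you would rather plant checks than
poke around in the debugger. A minimal sketch, assuming the buffers are meant
to hold NUL-terminated strings; buffer_looks_full() is a hypothetical helper,
not anything from elm or sdb.

```c
#include <stddef.h>
#include <string.h>

/* Return nonzero if buf has no NUL terminator anywhere in its first
 * size bytes, i.e. the string in it fills (or has run past) the buffer. */
int buffer_looks_full(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) == NULL;
}
```

Call it as buffer_looks_full(line, sizeof line) just before a suspect routine
returns. A nonzero result means whatever filled the buffer had no room left
for the terminator, which makes that buffer the first place to look.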

This is the most insidious kind of C bug there is, bar none. Even malloc()
braindamage isn't as nasty, because you can replace malloc() with a version
that does more checking. I've learned the hard way to check for this whenever
I see a bug in which buffers change by magic inside routines that shouldn't
touch them.
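
For comparison, here is roughly what "a version that does more checking" can
look like on the heap side. This is only a sketch of one common approach
(guard bytes past the end of each block), not the way any particular
debugging malloc does it. The names chk_malloc(), chk_check() and chk_free()
and the guard length and value are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  8
#define GUARD_BYTE 0xA5

/* Header kept in front of each block; the union is there only to keep
 * the caller's pointer suitably aligned on common machines. */
typedef union {
    size_t size;
    long double align_ld;
    void *align_ptr;
} chk_header;

/* Allocate n bytes with a run of guard bytes just past the end. */
void *chk_malloc(size_t n)
{
    chk_header *h;
    unsigned char *user;

    if (n > SIZE_MAX - sizeof *h - GUARD_LEN)
        return NULL;
    h = malloc(sizeof *h + n + GUARD_LEN);
    if (h == NULL)
        return NULL;
    h->size = n;
    user = (unsigned char *)(h + 1);
    memset(user + n, GUARD_BYTE, GUARD_LEN);
    return user;
}

/* Return 1 if the guard bytes after the block are intact, 0 if not. */
int chk_check(const void *p)
{
    const chk_header *h = (const chk_header *)p - 1;
    const unsigned char *guard = (const unsigned char *)p + h->size;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++)
        if (guard[i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Check the guard, release the block, and report what the check found. */
int chk_free(void *p)
{
    int ok;

    if (p == NULL)
        return 1;
    ok = chk_check(p);
    free((chk_header *)p - 1);
    return ok;
}
```

Nothing this cheap exists for automatic buffers, since the compiler lays out
the stack frame and there is no allocator call to intercept.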
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718