[comp.mail.elm] Need help discovering bug source

rob@pbhyf.PacBell.COM (Rob Bernardo) (10/07/88)

I'm one of the elm 2.2 developers and I've run across a nasty
bug that has existed in elm since the 1.5 version (at least).

I say "nasty" because the symptoms are awful and because the
source is obscure. So I need your suggestions on how to proceed
in finding the source.

The symptom is that when you read a message in a mailfile,
the beginning of the message contains garbled parts of other
messages. I've been able to reproduce the symptoms perfectly reliably.
You are likely to see the bug under the following scenario, and perhaps
others, but because stdio buffering seems to be involved, you may not
see it under this scenario if the messages in your mailfile are radically
different from those I was using in investigating this bug.

	1. You are *not* using the builtin pager.

	2. The sort order is "mailbox order" (either by selection
	or because the sort order you chose is coincidentally
	mailbox order in a particular case).

	3. Your sequence of interactions with elm is as follows:
		a. You save the first message in the mailfile.
		(You may read it first if you want.)
		   
		b. You read the second message (you don't need to
		read it all the way through).

		c. You "j" down to the third message and read it.
		The third message will begin with garbled parts
		of the first and second messages!

		d. If you re-read the third message, it will be okay.

At first I thought elm's offsets to the beginnings of each message
had gotten trashed, but I checked this out. The offsets remained
correct and were correctly passed to fseek() prior to the reading
of the selected message.
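
For anyone who wants to repeat this kind of check, here is a minimal
sketch of seeking to a stored message offset and reading the first line
through stdio. The name read_line_at() is hypothetical and is not taken
from the elm sources; it only illustrates the fseek()/fgets() sequence
described above, with an ftell() cross-check on the position.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not elm's actual code): seek to the stored
 * offset of a message and read its first line through stdio.
 * Returns 0 on success, -1 on seek or read failure.
 */
int read_line_at(FILE *fp, long offset, char *line, int size)
{
    assert(fp != NULL && line != NULL && size > 0);

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;              /* the seek itself failed */
    if (ftell(fp) != offset)
        return -1;              /* stdio disagrees about where we are */
    if (fgets(line, size, fp) == NULL)
        return -1;              /* EOF or read error */
    return 0;
}
```

Calling this with the offset recorded for a message should hand back
that message's first line; if it hands back something else while the
offset is right, the fault lies beyond the offset table.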

I checked the mailfile contents at the proper offset with lseek()
and read() just prior to the "faulty" fgets() and found the file
contents had not been corrupted.
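
A rough sketch of that kind of raw check follows. The helper name
raw_peek() is made up for illustration. It reads straight from the file
descriptor with lseek() and read(), bypassing stdio's buffer, and it
puts the descriptor's position back afterwards. Restoring the position
is my own precaution, on the assumption that moving the descriptor
underneath stdio could itself confuse later buffered reads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to n bytes at `offset` straight from
 * the file descriptor, bypassing stdio's buffer. The descriptor's
 * previous position is restored afterwards.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t raw_peek(int fd, off_t offset, char *out, size_t n)
{
    off_t saved;
    ssize_t got;

    assert(out != NULL);

    saved = lseek(fd, (off_t)0, SEEK_CUR);  /* remember current position */
    if (saved == (off_t)-1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    got = read(fd, out, n);                 /* what is really in the file */
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;
    return got;
}
```

It would be called with fileno() of the open mailfile stream and the
message offset, and the bytes it returns compared by eye or with
memcmp() against what fgets() delivers.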

Since elm reads the mailfile with stdio (specifically fgets()), I
deduced that stdio's input buffer was getting corrupted. By using
setbuf(), I was able to gain direct access to the input buffer, and
indeed, at a certain point in reading through the mailfile, stdio's
input buffer was getting corrupted. So what I suspect is that one of
elm's character arrays is undersized for the data copied into it and
the data is trashing stdio's parameters for the open mailfile.
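
The setbuf() trick can be sketched roughly as below. The names
mail_iobuf, watch_stream() and iobuf_checksum() are hypothetical, and
the checksum is just one assumed way of noticing that the buffer
changed between two points where no stdio call on the stream was made.
Where stdio places data inside the buffer is up to the implementation,
so the sketch only detects changes and does not interpret the contents.

```c
#include <assert.h>
#include <stdio.h>

/* Buffer handed to stdio; static so it outlives every use of the stream. */
static char mail_iobuf[BUFSIZ];

/*
 * Hypothetical helper: make stdio use mail_iobuf for `fp`. Must be
 * called right after the stream is opened, before any other
 * operation on it.
 */
void watch_stream(FILE *fp)
{
    assert(fp != NULL);
    setbuf(fp, mail_iobuf);
}

/*
 * Checksum of the watched buffer. Take one after a stdio call and
 * another just before the next stdio call on the same stream; if
 * they differ, something other than stdio wrote into the buffer.
 */
unsigned long iobuf_checksum(void)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < sizeof mail_iobuf; i++)
        sum = sum * 31 + (unsigned char)mail_iobuf[i];
    return sum;
}
```

Sprinkling checksum comparisons through the code that runs between two
reads of the mailfile narrows down where the buffer gets overwritten.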

What makes this even more bizarre is that different pieces of evidence
point in different directions for the location of the overrun array:

	1. Since in my test situation whether or not you save the
	first message determines whether the symptom will be seen,
	I would think that the save() section of code is at fault.

	2. Since the stdio buffer read that occurs during the presentation
	of the second message is okay, the stdio parameters don't seem
	to have been corrupted at this point. But they're apparently
	corrupted during the presentation of the third message. This
	would point to the code that executes between the readings
	of the two messages. But this is *not* the save() code.

	3. Since the symptom does not occur if you use the builtin pager,
	I'd guess the source of the bug is somehow tied to the different
	parts of the code for the builtin vs. external pagers. This is
	also not the save() code. I have already gone over the code for
	interfacing with an external pager and I see nothing suspicious.


HELP!!!!
-- 
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email:     ...![backbone]!pacbell!rob   OR  rob@PacBell.COM
Office:    (415) 823-2417  Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301  R Bar JB, Concord, California