rob@pbhyf.PacBell.COM (Rob Bernardo) (10/07/88)
I'm one of the elm 2.2 developers and I've run across a nasty bug that has existed in elm since the 1.5 version (at least). I say "nasty" because the symptoms are awful and because the source is obscure. I therefore need your suggestions on how to proceed in finding the source.

The symptom is that when you read a message in a mailfile, the beginning of the message contains garbled parts of other messages.

I've been able to reproduce the symptoms perfectly reliably. You are likely to see the bug under the scenario below, and perhaps under others. But because stdio buffering seems to be involved, you may not see it under this scenario if the messages in your mailfile are radically different from those I was using in investigating this bug.

1. You are *not* using the builtin pager.

2. The sort order is "mailbox order" (either by selection or because the sort order you chose is coincidentally mailbox order in a particular case).

3. Your sequence of interactions with elm is as follows:
   a. You save the first message in the mailfile. (You may read it first if you want.)
   b. You read the second message. (You don't need to read it all the way through.)
   c. You "j" down to the third message and read it. The third message will begin with garbled parts of the first and second messages!
   d. If you re-read the third message, it will be okay.

At first I thought elm's offsets to the beginnings of each message had gotten trashed, but I checked this out. The offsets remained correct and were correctly passed to fseek() prior to the reading of the selected message.

I checked the mailfile contents at the proper offset with lseek() and read() just prior to the "faulty" fgets() and found that the file contents had not been corrupted.

Since elm reads the mailfile with stdio (specifically fgets()), I deduced that stdio's input buffer was getting corrupted. By using setbuf(), I was able to gain direct access to the input buffer.
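The cross-check described above (fseek() plus fgets() on the stdio stream, compared against lseek() plus read() on the same file, with a caller-supplied buffer installed via setbuf()) can be sketched roughly as follows. This is not elm's code: the names open_with_visible_buffer(), line_matches_file() and stdio_buf are hypothetical and exist only for illustration, and the sketch assumes a POSIX system and lines shorter than 256 bytes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: the buffer handed to stdio, so that it can be examined
 * in a debugger while the stream is in use. setbuf() requires at least
 * BUFSIZ bytes. */
char stdio_buf[BUFSIZ];

/* Hypothetical helper: open a file for reading and make stdio use
 * stdio_buf. setbuf() must be called before any other I/O on the stream. */
FILE *open_with_visible_buffer(const char *path)
{
    FILE *fp = fopen(path, "r");

    if (fp != NULL)
        setbuf(fp, stdio_buf);
    return fp;
}

/* Hypothetical helper: read one line at `offset` through stdio, then read
 * the same bytes with lseek()/read(), bypassing stdio entirely.
 * Returns 1 if both views agree, 0 if they differ, -1 on error. */
int line_matches_file(FILE *fp, const char *path, long offset)
{
    char via_stdio[256];
    char via_read[256];
    size_t len;
    ssize_t n;
    int fd;

    assert(fp != NULL && path != NULL);

    if (fseek(fp, offset, SEEK_SET) != 0)
        return -1;
    if (fgets(via_stdio, sizeof via_stdio, fp) == NULL)
        return -1;
    len = strlen(via_stdio);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (lseek(fd, (off_t)offset, SEEK_SET) == (off_t)-1) {
        close(fd);
        return -1;
    }
    n = read(fd, via_read, len);
    close(fd);
    if (n != (ssize_t)len)
        return -1;

    /* A mismatch means stdio delivered bytes that are not in the file. */
    return memcmp(via_stdio, via_read, len) == 0;
}
```

Calling line_matches_file() with a message's stored offset just before the message is displayed would return 0 at the moment stdio's view of the file diverges from the file itself, which is the situation described in this post.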
And indeed at a certain point in reading through the mailfile, stdio's input buffer was getting corrupted.

So what I suspect is that one of elm's character arrays is undersized for the data copied into it and the data is trashing stdio's parameters for the open mailfile.

What makes this even more bizarre is that different pieces of evidence point in different directions for the location of the overrun array:

1. Since in my test situation whether or not you save the first message determines whether the symptom will be seen, I would think that the save() section of code is at fault.

2. Since the stdio buffer read that occurs during the presentation of the second message is okay, the stdio parameters don't seem to have been corrupted at this point. But they're apparently corrupted during the presentation of the third message. This would point to the code that executes between the readings of the two messages. But this is *not* the save() code.

3. Since the symptom does not occur if you use the builtin pager, I'd guess the source of the bug is somehow tied to the different parts of the code for the builtin vs. external pagers. This is also not the save() code. I have already gone over the code for interfacing with an external pager and I see nothing suspicious.

HELP!!!!
-- 
Rob Bernardo, Pacific Bell UNIX/C Reusable Code Library
Email:     ...![backbone]!pacbell!rob   OR  rob@PacBell.COM
Office:    (415) 823-2417  Room 4E750A, San Ramon Valley Administrative Center
Residence: (415) 827-4301  R Bar JB, Concord, California