earle@smeagol.UUCP (Greg Earle) (05/14/86)
In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
called article (written by Peter Honeyman, down!honey) that comes in
the `misc' directory. This program will take an article message ID
(like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
of the news history file, and if it finds it, will print out the
first pathname it finds for the article (if it was cross-posted), by
fetching the line containing the article ID from the history file.
Example:
% /usr/lib/news/misc/article 150@hadron.UUCP
/usr/spool/news/net/unix-wizards/XXXX (Your mileage may vary, here)
%
The problem I have found, is that more often than not, the program will
blow up on a Bus Error. This happens on a Sun 2-120 workstation, running
Sun OS 2.0 (4.2BSD based). The cause of this Bus Error is an execution
of lseek(2). A subroutine looks for the article ID in question, and returns
a datum that has a pointer to the entry in the history dbm file.
It then calls lseek(2) with this pointer dereferenced in order to get to
that line in the history file.
Here is a snatch of the code (from ./misc/article.c in the distribution):

    ...
    content = dofetch(*argv);
    if (content.dptr == 0) {
        printf("%s: No such key\n", *argv);
        continue;
    }
    if (lseek(fd, *(long *) content.dptr, 0) < 0)
        continue;

So `content' gets the offset associated with argv, and lseek tries to
position to that spot.
I am finding that it retrieves some articles with no problem, but most of
them are gagging with the accompanying Bus Error.
Anyhow, this begs the basic question :
What are the limits on positioning in an lseek(2) under 4.2BSD?
What conditions are there, that allow successful completion of the call
in some circumstances, and Bus Errors in (most) others?
This came about because my history (and .dbm) file is pretty big;
I made the assumption that the value returned in content.dptr was larger
than some cutoff value (Why should an lseek fail if the file has enough
bytes? Arggghh ...), but I'm not sure ...
All ideas welcome ...
--
Greg Earle UUCP: sdcrdcf!smeagol!earle, attmail!earle
JPL ARPA: elroy!smeagol!earle@csvax.caltech.edu
My HAIRCUT is totally NON-TRADITIONAL!
jpm@quad1.UUCP (John McNamee) (05/15/86)
> Here is a snatch of the code (from ./misc/article.c in the distribution):
>
>     ...
>     content = dofetch(*argv);
>     if (content.dptr == 0) {
>         printf("%s: No such key\n", *argv);
>         continue;
>     }
>     if (lseek(fd, *(long *) content.dptr, 0) < 0)
>         continue;

This looks like a problem I've had before. The dbm library does not align
the data to a word boundary, thus you can't always access it directly. My
solution has been to copy the data returned by dbm into my own structure
or buffer and then use that.
--
John P. McNamee    Quadratron Systems Inc.
UUCP: {sdcrdcf|ttdica|scgvaxd|mc0|bellcore|logico|ihnp4}!psivax!quad1!jpm
ARPA: jpm@BNL.ARPA
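
[A minimal sketch of the copy-first approach described above, written in
present-day C. It assumes the stored value is a native long, as in the
quoted code. The helper name load_long is hypothetical, and memcpy stands
in for the bcopy of the period. Copying byte by byte into an aligned local
avoids dereferencing a possibly odd address as a (long *).]

```c
#include <string.h>

/*
 * Hypothetical helper: fetch a long whose bytes start at p, where p may
 * sit on any byte boundary (for example, inside a dbm page buffer).
 * The bytes are copied into a local long, which the compiler aligns
 * correctly, so no unaligned load is ever issued.
 */
long load_long(const char *p)
{
    long value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

[In the article.c fragment this would be used as
lseek(fd, load_long(content.dptr), 0).]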
fair@styx.UUCP (Erik E. Fair) (05/15/86)
This comes about because the dbm(3) library does not guarantee to
deliver things aligned on the appropriate boundary.

What you need to do is (approximately):

    dp = fetch(key);
    bcopy(offp, dp->d_data, sizeof(offp));
    lseek(fd, *offp, 0);

Declare offp so that it is guaranteed to be aligned, and copy all
pointers you wish to dereference into it before dereff'ing them.

    Erik E. Fair    styx!fair    fair@lll-tis-b.arpa
earle@smeagol.UUCP (Greg Earle) (05/16/86)
In article <717@smeagol.UUCP>, earle@smeagol.UUCP I wrote:
> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
> called article (written by Peter Honeyman, down!honey) that comes in
> the `misc' directory. This program will take an article message ID
> (like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
> of the news history file, and if it finds it, will print out the
> first pathname it finds for the article (if it was cross-posted), by
> fetching the line containing the article ID from the history file.

[ Example omitted ]

> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error. This happens on a Sun 2-120 workstation, running
> Sun OS 2.0 (4.2BSD based). The cause of this Bus Error is an execution
> of lseek(2).

No it's not, idiot ...

> I am finding that it retrieves some articles with no problem, but most of
> them are gagging with the accompanying Bus Error.

Like the guy at the school crossing once said: Look both ways before posting.

I got bitten by the Famous 680x0 long-pointer-must-be-word-aligned problem ...
The program blows up when the pointer value in the datum that is returned by
fetch() is odd; i.e. not word aligned. When interpreted as a (long *) and
dereferenced, gag city ...

Thanks to voder!jeff for pointing this out just as I was realizing it myself.
That's what I get for not being a dbm expert, & forgetting Machine
Dependencies to boot :-(

Hopefully this will get out before I get too much "Geez, what a MAROON" mail.

BTW, the fix: Add a

    long fpos;

declaration to main, and before the lseek insert

    /* The bcopy is NECESSARY to insure alignment on some machines */
    bcopy(content.dptr, (char *)&fpos, sizeof (long));

This is from the file funcs2.c, function findhist(); in the 2.10.3
4.3bsd-beta Netnews distribution.

Again, my apologies for the original posting. Oh well, maybe some of you
got a good laugh out of it, so ...
--
Greg Earle    UUCP: sdcrdcf!smeagol!earle; (new!!) attmail!earle
JPL           ARPA: elroy!smeagol!earle@csvax.caltech.edu
Hello, GORRY-O!! I'm a GENIUS from HARVARD!!
guy@sun.uucp (Guy Harris) (05/17/86)
(Third attempt at posting - the person who decided that the "rlogin" daemon
should send a SIGKILL to all processes in the "rlogin" session when the user
logs out should be shot.)

> This comes about because the dbm(3) library does not guarantee to
> deliver things aligned on the appropriate boundary.

To clarify: some machines, like the VAX on which "article" was probably
written, permit you to reference 2-byte and 4-byte quantities regardless of
what byte boundary they are aligned on. Other machines, like machines using
the 68010 (of which the Sun-2 is one), will fail if you try to reference such
a quantity if it's not properly aligned. The "*(long *) content.dptr"
reference is failing, not the "lseek"; it never even gets to call "lseek"
since it faults while setting up the argument list.

> What you need to do is (approximately):
>
>     dp = fetch(key);
>     bcopy(offp, dp->d_data, sizeof(offp));
>     lseek(fd, *offp, 0);
>
> Declare offp so that it is guaranteed to be aligned, and copy all
> pointers you wish to dereference into it before dereff'ing them.

Not quite. The problem is not that the pointer itself is not properly
aligned, it's that the object the pointer points to is not properly aligned.
The "bcopy" should copy this object to a properly-aligned object, whose value
should then be passed to "lseek" as its second argument. After correcting the
usage of "dofetch", "bcopy", and the datum returned by "bcopy", the code
would read like:

    long offset;
    .
    .
    .
    content = dofetch(*argv);
    if (content.dptr == 0) {
        printf("%s: No such key\n", *argv);
        continue;
    }
    bcopy(content.dptr, (char *)&offset, sizeof offset);
    if (lseek(fd, offset, 0) < 0)
        continue;
--
    Guy Harris
    {ihnp4, decvax, seismo, decwrl, ...}!sun!guy
    guy@sun.arpa
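
[The distinction drawn above, between the pointer variable and the object
it points to, can be illustrated with a small check on the pointed-to
address. This is only a sketch under assumptions: the function name
is_aligned_for_long is hypothetical, _Alignof needs a C11 compiler, and
treating a pointer converted to uintptr_t as a plain byte address is
implementation-defined, although it holds on common flat-memory systems.]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: nonzero if the address held in p is a multiple of
 * the alignment the compiler requires for a long. It is the address of
 * the object that matters here, not where the pointer variable lives.
 */
int is_aligned_for_long(const void *p)
{
    return ((uintptr_t)p % _Alignof(long)) == 0;
}
```

[A pointer for which this returns zero is the case that faults on a 68010
when cast to (long *) and dereferenced; on a VAX the same access succeeds.]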
mouse@mcgill-vision.UUCP (der Mouse) (05/17/86)
In article <717@smeagol.UUCP>, earle@smeagol.UUCP (Greg Earle) writes:
> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a
> program called article (written by Peter Honeyman, down!honey)
> [which] will take an article message ID, look it up in the dbm
> version of the news history file, and [...]
>
> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error. [...] The cause of this Bus Error is an
> execution of lseek(2). [...]
>
> Here is a snatch of the code (from ./misc/article.c in the distribution):
>
>     ...
>     content = dofetch(*argv);
>     if (content.dptr == 0) {
>         printf("%s: No such key\n", *argv);
>         continue;
>     }
>     if (lseek(fd, *(long *) content.dptr, 0) < 0)
>         continue;
>
> I am finding that it retrieves some articles with no problem, but most of
> them are gagging with the accompanying Bus Error.
>
> Anyhow, this begs the basic question :
> What are the limits on positioning in an lseek(2) under 4.2BSD?
> What conditions are there, that allow successful completion of the call
> in some circumstances, and Bus Errors in (most) others?

I suspect that the problem has nothing to do with lseek(). Why are you
sure it does? Merely because article died on that line? I would suspect
that content.dptr contains trash, so of course when it is cast to pointer
to long it will *still* be trash, and we all know what dereferencing trash
does.

According to the man page here, lseek can return the following errors:

    EBADF   if the fd argument is bad
    ESPIPE  trying to seek on a socket
    EINVAL  bad value for whence (the third argument) or the
            resulting position is before the beginning of the file.

I cannot think of any other reason lseek() could fail; and indeed, checking
our source reveals that lseek() cannot generate any error except those
listed above (VAX 4.2BSD, Sun source should be similar).
In fact, the second reason for giving EINVAL is not checked by lseek();
presumably it is permissible for the pointer to go negative provided no
read()/write() operations are attempted while it is negative.

inews: Article rejected - more included text than new text   <  T
inews: Article rejected - more included text than new text      o
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text      k
inews: Article rejected - more included text than new text      e
inews: Article rejected - more included text than new text      e
inews: Article rejected - more included text than new text      p
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text      i
inews: Article rejected - more included text than new text      n
inews: Article rejected - more included text than new text      e
inews: Article rejected - more included text than new text      w
inews: Article rejected - more included text than new text      s
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text      h
inews: Article rejected - more included text than new text      a
inews: Article rejected - more included text than new text      p
inews: Article rejected - more included text than new text      p
inews: Article rejected - more included text than new text   <  y

Now, netnews authors, ask yourselves: Isn't that kind of a counterproductive
thing to have put into inews?
--
    der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     philabs!micomvax!musocs!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
        mcvax!seismo!cmcl2!philabs!micomvax!musocs!mcgill-vision!mouse
ARPAnet: utcsri!mcgill-vision!mouse@uw-beaver.arpa

"Come with me a few minutes, mortal, and we shall talk."
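
[The point made above, that lseek() reports failure through errno rather
than by faulting, can be probed with a small wrapper. This is a sketch
assuming a POSIX system. The name lseek_errno is hypothetical, and which
error a given descriptor produces can differ between systems.]

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: perform the seek and return 0 on success, or the
 * errno value left behind by a failing lseek(). A bad descriptor or an
 * unseekable one comes back as an error code; the call itself does not
 * produce a bus error.
 */
int lseek_errno(int fd, off_t offset, int whence)
{
    errno = 0;
    if (lseek(fd, offset, whence) == (off_t)-1)
        return errno;
    return 0;
}
```

[On a POSIX system, lseek_errno(-1, 0, SEEK_SET) gives EBADF, and seeking
on either end of a pipe gives ESPIPE.]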
gordon@sneaky (05/20/86)
> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
> called article (written by Peter Honeyman, down!honey) that comes in
> the `misc' directory. This program will take an article message ID
> (like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
> of the news history file, and if it finds it, will print out ...
...
> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error. This happens on a Sun 2-120 workstation, running
> Sun OS 2.0 (4.2BSD based). The cause of this Bus Error is an execution
> of lseek(2). A subroutine looks for the article ID in question, and returns
> a datum that has a pointer to the entry in the history dbm file.

I found this same code in news 2.10.2. The data stored by dbm is an lseek()
offset into the history file. The data returned by dbm's fetch() is a pointer
to an *UNALIGNED* lseek() offset. Depending on your processor, you get bus
errors, or the processor quietly uses the long word which contains the byte
your pointer is pointing at, giving bozo results. If the data happens to be
aligned right for your processor, it works.

> It then calls lseek(2) with this pointer dereferenced in order to get to
> that line in the history file.

I doubt it gets a chance to actually call lseek().

>     if (lseek(fd, *(long *) content.dptr, 0) < 0)
>         continue;

Does lint say "Possible pointer alignment problem"?

Try something like:

    {
        long scratchvar;    /* maybe should be off_t instead */
        int i;

        /* can also use bcopy or something similar */
        for (i = 0; i < sizeof(long); i++)
            ((char *) &scratchvar)[i] = content.dptr[i];
        if (lseek(fd, scratchvar, 0) < 0)
            continue;
    }

    Gordon Burditt
    ...!convex!ctvax!trsvax!sneaky!gordon
    ...!ihnp4!sys1!sneaky!gordon