[net.news.b] Limits on 4.2BSD lseek

earle@smeagol.UUCP (Greg Earle) (05/14/86)

In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
called article (written by Peter Honeyman, down!honey) that comes in
the `misc' directory.  This program will take an article message ID
(like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
of the news history file, and if it finds it, will print out the
first pathname it finds for the article (if it was cross-posted), by 
fetching the line containing the article ID from the history file.
Example:
	% /usr/lib/news/misc/article 150@hadron.UUCP
	/usr/spool/news/net/unix-wizards/XXXX (Your mileage may vary, here)
	%

The problem I have found, is that more often than not, the program will
blow up on a Bus Error.  This happens on a Sun 2-120 workstation, running
Sun OS 2.0 (4.2BSD based).  The cause of this Bus Error is an execution
of lseek(2).  A subroutine looks for the article ID in question, and returns
a datum that has a pointer to the entry in the history dbm file.
It then calls lseek(2) with this pointer dereferenced in order to get to 
that line in the history file.

Here is a snatch of the code (from ./misc/article.c in the distribution):

		...
                content = dofetch(*argv);
                if (content.dptr == 0) {
                        printf("%s: No such key\n", *argv);
                        continue;
                }
                if (lseek(fd, *(long *) content.dptr, 0) < 0)
                        continue;

So `content' gets the offset associated with argv, and lseek tries to
position to that spot.

I am finding that it retrieves some articles with no problem, but most of
them are gagging with the accompanying Bus Error.

Anyhow, this begs the basic question :
What are the limits on positioning in an lseek(2) under 4.2BSD?
What conditions are there, that allow successful completion of the call 
in some circumstances, and Bus Errors in (most) others?
This came about because my history (and .dbm) file is pretty big;
I made the assumption that the value returned in content.dptr was larger
than some cutoff value (Why should an lseek fail if the file has enough
bytes?  Arggghh ...), but I'm not sure ...

All ideas welcome ...
--

	Greg Earle		UUCP: sdcrdcf!smeagol!earle, attmail!earle
	JPL			ARPA: elroy!smeagol!earle@csvax.caltech.edu

My HAIRCUT is totally NON-TRADITIONAL!

jpm@quad1.UUCP (John McNamee) (05/15/86)

> Here is a snatch of the code (from ./misc/article.c in the distribution):
> 
> 		...
>                 content = dofetch(*argv);
>                 if (content.dptr == 0) {
>                         printf("%s: No such key\n", *argv);
>                         continue;
>                 }
>                 if (lseek(fd, *(long *) content.dptr, 0) < 0)
>                         continue;

This looks like a problem I've had before.  The dbm library does not
align the data to a word boundary, thus you can't always access it
directly.  My solution has been to copy the data returned by dbm into
my own structure or buffer and then use that.
-- 
John P. McNamee					Quadratron Systems Inc.

UUCP: {sdcrdcf|ttdica|scgvaxd|mc0|bellcore|logico|ihnp4}!psivax!quad1!jpm
ARPA: jpm@BNL.ARPA

fair@styx.UUCP (Erik E. Fair) (05/15/86)

This comes about because the dbm(3) library does not guarantee to
deliver things aligned on the appropriate boundary. What you need
to do is (approximately):

	dp = fetch(key);
	bcopy(offp, dp->d_data, sizeof(offp));
	lseek(fd, *offp, 0);

Declare offp so that it is guaranteed to be aligned, and copy all
pointers you wish to dereference into it before dereff'ing them.

	Erik E. Fair	styx!fair	fair@lll-tis-b.arpa

earle@smeagol.UUCP (Greg Earle) (05/16/86)

In article <717@smeagol.UUCP>, earle@smeagol.UUCP I wrote:
> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
> called article (written by Peter Honeyman, down!honey) that comes in
> the `misc' directory.  This program will take an article message ID
> (like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
> of the news history file, and if it finds it, will print out the
> first pathname it finds for the article (if it was cross-posted), by 
> fetching the line containing the article ID from the history file.

[ Example omitted ]

> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error.  This happens on a Sun 2-120 workstation, running
> Sun OS 2.0 (4.2BSD based).  The cause of this Bus Error is an execution
> of lseek(2).

No it's not, idiot ...

> I am finding that it retrieves some articles with no problem, but most of
> them are gagging with the accompanying Bus Error.

Like the guy at the school crossing once said:
	Look both ways before posting.

I got bitten by the Famous 680x0 long-pointer-must-be-word-aligned
problem ...

The program blows up when the pointer value in the datum that is returned
by fetch() is odd; i.e. not word aligned.  When interpreted as a (long *)
and dereferenced, gag city ...
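
The condition described above can be tested directly.  A minimal sketch in
present-day C, not code from the news sources: it assumes a C11 compiler
(for _Alignof and uintptr_t), and aligned_for_long is a made-up name.  It
reports whether an address meets whatever alignment the compiler requires
for a long, which may be 2, 4, or 8 bytes depending on the machine.

```c
#include <stdint.h>

/*
 * Nonzero if p meets this platform's alignment requirement for a long.
 * Hypothetical helper; the pointer-to-integer conversion is
 * implementation-defined, but on flat-address machines it yields the address.
 */
int aligned_for_long(const void *p)
{
    return (uintptr_t)p % _Alignof(long) == 0;
}
```

A call such as aligned_for_long(content.dptr) before the cast would say
whether "*(long *) content.dptr" is safe for that particular item.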

Thanks to voder!jeff for pointing this out just as I was realizing it myself.

That's what I get for not being a dbm expert, & forgetting Machine Dependencies
to boot :-(

Hopefully this will get out before I get too much "Geez, what a MAROON" mail.

BTW, the fix:

Add a 
	long fpos;
declaration to main, and before the lseek insert

        /* The bcopy is NECESSARY to insure alignment on some machines */
        bcopy(content.dptr, (char *)&fpos, sizeof (long));

then pass fpos to lseek in place of *(long *) content.dptr.

The bcopy lines are from the file funcs2.c, function findhist(), in the
2.10.3 4.3bsd-beta Netnews distribution.

Again, my apologies for the original posting.  Oh well, maybe some of you
got a good laugh out of it, so ...
-- 
	Greg Earle	UUCP: sdcrdcf!smeagol!earle; (new!!) attmail!earle
	JPL		ARPA: elroy!smeagol!earle@csvax.caltech.edu

Hello, GORRY-O!!  I'm a GENIUS from HARVARD!!

guy@sun.uucp (Guy Harris) (05/17/86)

(Third attempt at posting - the person who decided that the "rlogin" daemon
should send a SIGKILL to all processes in the "rlogin" session when the user
logs out should be shot.)

> This comes about because the dbm(3) library does not guarantee to
> deliver things aligned on the appropriate boundary.

To clarify: some machines, like the VAX on which "article" was probably
written, permit you to reference 2-byte and 4-byte quantities regardless of
what byte boundary they are aligned on.  Other machines, like machines using
the 68010 (of which the Sun-2 is one), will fail if you try to reference
such a quantity if it's not properly aligned.  The "*(long *) content.dptr"
reference is failing, not the "lseek"; it never even gets to call "lseek"
since it faults while setting up the argument list.
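
Why "lseek" is never reached: C evaluates every argument expression before
control passes to the called function, so a load that faults inside an
argument stops the program before the call happens.  The sketch below only
records the order of events using two stand-in functions.  The names
load_offset, fake_lseek and run_trace are hypothetical; no real lseek is
involved and nothing here faults.

```c
#include <string.h>

static char trace[64];

/* Stands in for the "*(long *) content.dptr" load in the argument list. */
static long load_offset(const long *p)
{
    strcat(trace, "load ");
    return *p;
}

/* Stands in for lseek; it only notes that it was entered. */
static long fake_lseek(int fd, long offset, int whence)
{
    (void)fd;
    (void)whence;
    strcat(trace, "call ");
    return offset;
}

/* Returns the recorded order of events for one call. */
const char *run_trace(void)
{
    long stored = 42;

    trace[0] = '\0';
    fake_lseek(3, load_offset(&stored), 0);
    return trace;
}
```

The trace shows the load first and the call second, which is why a fault in
the load is reported on the line containing the call.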

> What you need to do is (approximately):
> 
> 	dp = fetch(key);
> 	bcopy(offp, dp->d_data, sizeof(offp));
> 	lseek(fd, *offp, 0);
> 
> Declare offp so that it is guaranteed to be aligned, and copy all
> pointers you wish to dereference into it before dereff'ing them.

Not quite.  The problem is not that the pointer itself is not properly
aligned, it's that the object the pointer points to is not properly aligned.
The "bcopy" should copy this object to a properly-aligned object, whose value
should then be passed to "lseek" as its second argument.  After correcting
the usage of "dofetch", "bcopy", and the datum returned by "dofetch", the code
would read like:

	long offset;
	.
	.
	.
	content = dofetch(*argv);
	if (content.dptr == 0) {
	        printf("%s: No such key\n", *argv);
	        continue;
	}
	bcopy(content.dptr, (char *)&offset, sizeof offset);
	if (lseek(fd, offset, 0) < 0)
	        continue;
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa

mouse@mcgill-vision.UUCP (der Mouse) (05/17/86)

In article <717@smeagol.UUCP>, earle@smeagol.UUCP (Greg Earle) writes:
> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a
> program called article (written by Peter Honeyman, down!honey)
> [which] will take an article message ID, look it up in the dbm
> version of the news history file, and [...]
>
> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error.  [...] The cause of this Bus Error is an
> execution of lseek(2).  [...]
>
> Here is a snatch of the code (from ./misc/article.c in the distribution):
>
> 		...
>                 content = dofetch(*argv);
>                 if (content.dptr == 0) {
>                         printf("%s: No such key\n", *argv);
>                         continue;
>                 }
>                 if (lseek(fd, *(long *) content.dptr, 0) < 0)
>                         continue;
>
> I am finding that it retrieves some articles with no problem, but most of
> them are gagging with the accompanying Bus Error.
> 
> Anyhow, this begs the basic question :
> What are the limits on positioning in an lseek(2) under 4.2BSD?
> What conditions are there, that allow successful completion of the call 
> in some circumstances, and Bus Errors in (most) others?

     I suspect that the problem has nothing to do with lseek().  Why are
you sure it does?  Merely because article died on that line?  I would
suspect that content.dptr contains trash, so of course when it is cast
to pointer to long it will *still* be trash, and we all know what
dereferencing trash does.  According to the man page here, lseek can
return the following errors:

	EBADF	if the fd argument is bad
	ESPIPE	trying to seek on a socket
	EINVAL	bad value for whence (the third argument)
	   or	the resulting position is before the beginning
		  of the file.

     I cannot think of any other reason lseek() could fail; and indeed,
checking our source reveals that lseek() cannot generate any error
except those listed above (VAX 4.2BSD; Sun source should be similar).
In fact, the second reason for giving EINVAL is not checked by lseek();
presumably it is permissible for the pointer to go negative provided no
read()/write() operations are attempted while it is negative.
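
The error cases listed above can be probed on a current POSIX system with a
small wrapper.  This is a sketch, not the 4.2BSD code: lseek_errno and
report_lseek are hypothetical helper names, and present-day kernels can
differ from 4.2BSD in detail (POSIX now requires EINVAL when the resulting
offset of a regular file would be negative).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 0 if lseek() succeeds, otherwise the errno value it set. */
int lseek_errno(int fd, off_t offset, int whence)
{
    errno = 0;
    if (lseek(fd, offset, whence) == (off_t)-1)
        return errno;
    return 0;
}

/* Print what lseek() does for one set of arguments. */
void report_lseek(const char *label, int fd, off_t offset, int whence)
{
    int err = lseek_errno(fd, offset, whence);

    printf("%-24s -> %s\n", label, err == 0 ? "ok" : strerror(err));
}
```

For example, report_lseek("bad descriptor", -1, 0, SEEK_SET) exercises the
EBADF case, and passing the read end of a pipe exercises ESPIPE.  None of
these failures is a signal: lseek reports them through its return value and
errno.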

inews: Article rejected - more included text than new text  <	T
inews: Article rejected - more included text than new text	o
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text	k
inews: Article rejected - more included text than new text	e
inews: Article rejected - more included text than new text	e
inews: Article rejected - more included text than new text	p
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text	i
inews: Article rejected - more included text than new text	n
inews: Article rejected - more included text than new text	e
inews: Article rejected - more included text than new text	w
inews: Article rejected - more included text than new text	s
inews: Article rejected - more included text than new text
inews: Article rejected - more included text than new text	h
inews: Article rejected - more included text than new text	a
inews: Article rejected - more included text than new text	p
inews: Article rejected - more included text than new text	p
inews: Article rejected - more included text than new text  <	y
Now, netnews authors, ask yourselves:  Isn't that kind of a
counterproductive thing to have put into inews?
-- 
					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     philabs!micomvax!musocs!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
        mcvax!seismo!cmcl2!philabs!micomvax!musocs!mcgill-vision!mouse
ARPAnet: utcsri!mcgill-vision!mouse@uw-beaver.arpa

"Come with me a few minutes, mortal, and we shall talk."

gordon@sneaky (05/20/86)

> In the Netnews 2.10.3 4.3bsd-beta 6/6/85 distribution, there is a program
> called article (written by Peter Honeyman, down!honey) that comes in
> the `misc' directory.  This program will take an article message ID
> (like 1052@ellie, 1424@lll-crg, etc.), look it up in the dbm version
> of the news history file, and if it finds it, will print out ...
...
> The problem I have found, is that more often than not, the program will
> blow up on a Bus Error.  This happens on a Sun 2-120 workstation, running
> Sun OS 2.0 (4.2BSD based).  The cause of this Bus Error is an execution
> of lseek(2).  A subroutine looks for the article ID in question, and returns
> a datum that has a pointer to the entry in the history dbm file.

I found this same code in news 2.10.2.  The data stored by dbm is an lseek()
offset into the history file.  The data returned by dbm's fetch() is
a pointer to an *UNALIGNED* lseek() offset.  Depending on your processor,
you get bus errors, or the processor quietly uses the long word which
contains the byte your pointer is pointing at, giving bozo results.
If the data happens to be aligned right for your processor, it works.
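
One way to picture why only some lookups fail: if items are stored back to
back with no padding, where a value lands depends on the length of whatever
precedes it.  The sketch below is a simplified model and NOT dbm's actual
page layout.  pack_item is a hypothetical helper that appends a key and then
a long to a byte buffer and reports where the long ended up; memcpy is used
as the standard-C counterpart of bcopy.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified model only (not the real dbm format): store the key bytes and
 * then the value, with no padding, starting at byte `used` of `page`.
 * Returns the byte position of the value, or (size_t)-1 if it will not fit.
 */
size_t pack_item(char *page, size_t used, size_t cap,
                 const char *key, long value)
{
    size_t klen = strlen(key);

    if (used > cap || klen > cap - used || sizeof value > cap - used - klen)
        return (size_t)-1;
    memcpy(page + used, key, klen);               /* key, no terminator */
    memcpy(page + used + klen, &value, sizeof value);
    return used + klen;
}
```

Packed at the start of an empty buffer, the 15-character key
150@hadron.UUCP puts the value at byte 15, an odd position, while the
10-character key 1052@ellie puts it at byte 10, which is even.  Copying the
bytes out works in both cases; casting the address to (long *) is only safe
in the cases that happen to line up.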

> It then calls lseek(2) with this pointer dereferenced in order to get to 
> that line in the history file.

I doubt it gets a chance to actually call lseek().

>                 if (lseek(fd, *(long *) content.dptr, 0) < 0)
>                         continue;
Does lint say "Possible pointer alignment problem"?
Try something like:
		{
			long	scratchvar; /* maybe should be off_t instead */
			int	i;

			/* can also use bcopy or something similar */
			for (i = 0; i < sizeof(long); i++)
				((char *) &scratchvar)[i] = content.dptr[i];
			if (lseek(fd, scratchvar, 0) < 0)
				continue;
		}
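
Following the off_t remark above, a sketch that also checks the size of the
fetched item before copying it.  Assumptions: struct datum_like is a
stand-in with the same two fields as dbm's datum (dptr and dsize) so that
the example compiles without the dbm header, datum_to_offset is a
hypothetical name, memcpy plays the role of bcopy, and the history offset
was stored as a long.

```c
#include <string.h>
#include <sys/types.h>

/* Stand-in for dbm's datum: a pointer to the bytes plus a byte count. */
struct datum_like {
    char *dptr;
    int   dsize;
};

/*
 * Copy the stored long out of a possibly unaligned item and widen it to
 * off_t.  Returns 0 on success, -1 if the key was not found or the item
 * is not the size of a long.
 */
int datum_to_offset(struct datum_like content, off_t *out)
{
    long stored;

    if (content.dptr == NULL || content.dsize != (int)sizeof stored)
        return -1;
    memcpy(&stored, content.dptr, sizeof stored);  /* byte copy, no alignment needed */
    *out = (off_t)stored;
    return 0;
}
```

The caller would then seek with the returned value, along the lines of
"if (datum_to_offset(content, &off) < 0 || lseek(fd, off, 0) < 0) continue;".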


			Gordon Burditt
			...!convex!ctvax!trsvax!sneaky!gordon
			...!ihnp4!sys1!sneaky!gordon