[comp.os.v] crashing V unix server on 4.3 + nfs

cdash@BOULDER.COLORADO.EDU (Charles Shub) (10/27/88)

our server machine keeps crashing. it is a microvax running 4.3 + nfs

in bringing up the packet filter, we had to change the call to MCLGET
in enet.c because the distributed code (for 4.2) called MCLGET with 2
arguments and the macro definition for 4.3 + nfs has only one argument.
we're not sure this got fixed correctly. If anybody else has added the filter
to a 4.3 + nfs configuration, would they be kind enough to help us with
getting the change correct. it is at about line 580 of vaxif/enet.c 

a snapshot of that area of the fixed code or a diff between enet.c in the V
distribution and enet.c as it is working on your system would be appreciated.

on the other hand, if you know of a DIFFERENT reason why our server might be
crashing, we're all ears.

thanks...

charlie shub  cdash@boulder.Colorado.EDU  -or-  ..!{ncar|nbires}!boulder!cdash
  or even     cdash@colospgs (BITNET)

mogul@decwrl.dec.com (Jeffrey Mogul) (10/29/88)

In article <8810270030.AA07914@boulder.Colorado.EDU> cdash@BOULDER.COLORADO.EDU (Charles Shub) writes:
>our server machine keeps crashing. it is a microvax running 4.3 + nfs
>
>in bringing up the packet filter, we had to change the call to MCLGET
>in enet.c because the distributed code (for 4.2) called MCLGET with 2
>arguments and the macro definition for 4.3 + nfs has only one argument.
>we're not sure this got fixed correctly. If anybody else has added the filter
>to a 4.3 + nfs configuration, would they be kind enough to help us with
>getting the change correct. it is at about line 580 of vaxif/enet.c 

I can't promise that this is right for you, but this is how my version
of the code (which runs in my private copy of Ultrix, NOT a product)
looked when it was hacked over by the folks at Stanford (this is a
reconstruction; I don't actually have one file with this code in it).

	    if (iov->iov_len >= CLBYTES) {	/* big enough to use a page */
		register struct mbuf *p;
#ifdef	SULTRIX
		if (mclget(m) == 0)
#else
		MCLGET(m, p);
		if (p == 0)
#endif	/* SULTRIX */
		    goto nopages;
		len = CLBYTES;
	    }
	    else {
nopages:
		len = MIN(MLEN, iov->iov_len);
	    }

This is NOT GUARANTEED!
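
The fallback logic in that fragment can be tried outside the kernel.  What
follows is only a user-space sketch: sketch_cluster_get() is a hypothetical
stand-in for the cluster allocator (mclget or MCLGET above), and the
SKETCH_* constants are illustrative values, not taken from any kernel
header.  It shows how a failed cluster allocation jumps to the nopages
label and ends up on the same small-buffer path as a short request.

```c
#include <stddef.h>

#define SKETCH_CLBYTES 1024   /* illustrative cluster size (assumption) */
#define SKETCH_MLEN     112   /* illustrative small-buffer payload (assumption) */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for the cluster allocator: nonzero means success. */
static int sketch_cluster_get(int clusters_free)
{
    return clusters_free > 0;
}

/* Returns how many bytes the next buffer would carry for this request. */
size_t sketch_chunk_len(size_t iov_len, int clusters_free)
{
    size_t len;

    if (iov_len >= SKETCH_CLBYTES) {        /* big enough to use a page */
        if (!sketch_cluster_get(clusters_free))
            goto nopages;                   /* no cluster: take the small path */
        len = SKETCH_CLBYTES;
    } else {
nopages:
        len = SKETCH_MIN(SKETCH_MLEN, iov_len);
    }
    return len;
}
```

Whichever form of the allocator call a given kernel wants, the part worth
checking is that the failure test really reaches nopages; if the test looks
at a variable the macro never sets, the code goes on as though it had a
cluster.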

By the way, I can contemplate rather nasty punishments for people
who change the arguments to macros without changing their name.

-Jeff

bart@videovax.Tek.COM (Bart Massey) (11/03/88)

In article <8810270030.AA07914@boulder.Colorado.EDU> cdash@BOULDER.COLORADO.EDU (Charles Shub) writes:
> our server machine keeps crashing. it is a microvax running 4.3 + nfs
> ...
> on the other hand, if you know of a DIFFERENT reason why our server might be
> crashing, we're all ears.

Here at Tektronix TV Systems, we've been using V for product development for
several years.  Our Emmy-winning VM-700, a TV test and measurement
instrument, runs the V kernel internally, and was developed almost entirely
in a V development environment.

I looked at the original versions of our V server sources, and the fixes I'm
about to suggest all seem to apply, but I'm not sure what else might need to
be fixed -- the last guy to work on the server left before I arrived.
Caveat Fixor.

The fixes below were put in largely as a result of porting the server to a
BIG_ENDIAN machine -- the 68020-based Tektronix 4301.  I know this isn't the
original poster's problem, though I think someone did ask about the Sun.  Note
that some of the fixes below *are* more generally applicable...

I strongly suggest that you use the debugging flags to isolate V server
problems to a specific piece of code.  I usually run my test server with -A
in a jove i-process window, so that I can get a permanent record, and can
easily kill the server.

Anyway, here are my fixes in human-readable form -- I'm afraid our server may
be different enough that context diffs wouldn't be too useful...  BTW, I
found that it was a lot easier to work on the source after I folded all the
source directories and the binary directory together into one big mess,
UNIX-style, and rewrote the buildfile so that it would also work as a
Makefile.  Then I could easily build the server under UNIX before porting
any of the rest of the V stuff...

At least in the V server version we have, there was a horrid bug in Vikc.h.
Around line 62 is the macro "SwapIKPacket".  If your copy looks like

> /* swaps a kPacket in place.  Does not affect the appended segment, if any. */
> #define SwapIKPacket(p) \
>    swabSmall( (char *)(p), 2*sizeof(short)); \
>    ByteSwapLongInPlace( (char *)&((kPacket *)(p))->srcPid, \
>			 sizeof(kPacket)-2*sizeof(short))

it needs to look like

> /* swaps a kPacket in place.  Does not affect the appended segment, if any. */
> #define SwapIKPacket(p) \
>    (swabSmall( (char *)(p), 2*sizeof(short)),	\
>    ByteSwapLongInPlace( (char *)&((kPacket *)(p))->srcPid, \
>			 sizeof(kPacket)-2*sizeof(short)))

The reason is obvious, especially when you look at the reference to it in
the server (ether10meg.c, ~line 298).  Note, amusingly, that the right thing
does coincidentally happen on machines where the server is
DifferentIKCByteOrder than everyone else, which is normally the case...
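
For readers without ether10meg.c at hand, here is a stand-alone sketch of
the hazard.  It assumes, without having seen that call site, that the macro
is used as the body of an unbraced "if".  step_a, step_b and the SWAP_*
macros are hypothetical stand-ins for swabSmall, ByteSwapLongInPlace and
SwapIKPacket.

```c
static int calls_a, calls_b;          /* count how often each half runs */

static void step_a(void) { calls_a++; }
static void step_b(void) { calls_b++; }

/* Old form: two statements.  Only the first is governed by an `if`. */
#define SWAP_BROKEN()  step_a(); step_b()

/* New form: one parenthesized comma expression, so both halves stay together. */
#define SWAP_FIXED()   (step_a(), step_b())

/* Returns how many times the second half ran. */
int run_broken(int need_swap)
{
    calls_a = calls_b = 0;
    if (need_swap)
        SWAP_BROKEN();   /* expands to: step_a(); step_b(); -- step_b escapes the if */
    return calls_b;
}

/* Same call shape, but using the comma-expression form of the macro. */
int run_fixed(int need_swap)
{
    calls_a = calls_b = 0;
    if (need_swap)
        SWAP_FIXED();    /* a single statement, fully under the if */
    return calls_b;
}
```

With the old form, run_broken(0) still executes the second half even though
no swap was wanted, while run_broken(1) behaves correctly.  That would be
consistent with the bug staying hidden on a server that always needs to
swap.  Wrapping the body in do { ... } while (0) is another common way to
make such a macro behave as one statement.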

Also, there's an uninitialized error return from a function, which happens
to be zero on our 750 often enough that things almost work.  At the
beginning of the code body for "ForkNewSession" (session.c, ~line 194), add
the line "*error = OK;".  Again, look at the call (server.c, ~line 126) to
understand this.
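
A minimal sketch of the pattern, for illustration only: the function names,
the status values and the session id below are hypothetical, not copied
from session.c or server.c.  The point is that a caller who tests *error
sees whatever was left on its stack unless the callee stores a value on
every path.

```c
#define OK         0     /* hypothetical status values, for illustration */
#define NO_MEMORY  1

/* Hypothetical stand-in for a function with an error out-parameter. */
int fork_session_sketch(int simulate_failure, int *error)
{
    *error = OK;                 /* the fix: define the result up front */
    if (simulate_failure) {
        *error = NO_MEMORY;      /* failure paths overwrite it */
        return -1;
    }
    return 42;                   /* hypothetical session id */
}

/* Caller that trusts *error rather than the return value. */
int start_session_sketch(int simulate_failure)
{
    int error = 0x7fff;          /* mimic stack garbage in the caller */
    int id = fork_session_sketch(simulate_failure, &error);

    return (error == OK) ? id : -1;
}
```

Without the first assignment in fork_session_sketch(), the success path
would leave error at 0x7fff and the caller would throw away a perfectly
good session.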

A bunch of files are missing the critical include of config.h.  Identify
those files and add the include, as this is where LITTLE_ENDIAN gets defined
(or not), and a bunch of the other header files depend on it...
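
Why a missing include goes unnoticed can be shown in a few lines.  In this
sketch SKETCH_LITTLE_ENDIAN is a hypothetical stand-in for the LITTLE_ENDIAN
symbol, and the helper names are made up for illustration.  An #ifdef on an
undefined symbol is not an error; the compiler quietly takes the #else
branch.

```c
/*
 * Sketch only.  Imagine the next line is the include that was forgotten:
 *
 *     #include "config.h"
 *
 * so SKETCH_LITTLE_ENDIAN (hypothetical name) is never defined here.
 */

#ifdef SKETCH_LITTLE_ENDIAN
#define COMPILED_LITTLE 1
#else
#define COMPILED_LITTLE 0    /* also chosen, silently, when the include is missing */
#endif

/* What the preprocessor decided at compile time. */
int compiled_as_little_endian(void)
{
    return COMPILED_LITTLE;
}

/* What the hardware actually does: look at the first byte of a known int. */
int host_is_little_endian(void)
{
    unsigned int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Nonzero when the compile-time choice agrees with the running machine. */
int config_matches_host(void)
{
    return compiled_as_little_endian() == host_is_little_endian();
}
```

On a VAX, which is little-endian, this file would compile the big-endian
branch without any diagnostic and config_matches_host() would return 0.  A
check of this kind at server startup is one cheap way to catch a file that
lost its include.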

Finally, there's a bug in qk.c which was masked on vaxen by the fact that
config.h was not included, and thus LITTLE_ENDIAN was undefined.  If your
qk.c has a switch that looks like

> switch( req->requestCode )

it should be changed to read

> switch( req->groupSelect )

after the include of config.h is inserted.

With all of these fixes installed, if your packet filter works, you may be
able to get your UNIX V Server up, even on a BIG_ENDIAN box.  I also have
some really simple utilities I wrote to test our packet filter port.  If
anybody is interested, I could post them here...

					Bart Massey
					
					Tektronix, Inc.
					TV Systems Engineering
					M.S. 58-639
					P.O. Box 500
					Beaverton, OR 97077
					(503) 627-5320

					UUCP: ..tektronix!videovax!bart
					DOMAIN: bart@videovax.tek.com