[comp.protocols.nfs] bug in sun-3/os4.1 rpc/xdr?

BACON@MTUS5.BITNET (Jeffery Bacon) (08/14/90)

I submit the following problem:

Given the following declaration (taken just about right out of the Sun 4.1
network prog guide) bla.x:

-------------
program BLAPROG { version BLAVERS {int BLA(int) = 1; } = 1; } = 0x20000099;
--------------

This should send an integer and receive an integer.

For this, I've written the following server procedure:

--------------
int *bla_1(oog) int *oog; { int erg = 5; printf("%d\n",*oog);return(&erg); }
--------------

(please forgive my lack of formatting, but this is pretty simple; I wrote it
on the fly)

And for the client side, a simple little program:

-----------------

#include <stdio.h>
#include <rpc/rpc.h>
#include "bla.h"
main() {
	CLIENT *bla;
	int *result, oog;

	bla=clnt_create("ctsee8",BLAPROG,BLAVERS,"tcp");
	oog=44;

	result=bla_1(&oog,bla);

	printf("%d\n",*result);
}

-----------------

Now I compile all of this. I expect the server to print the number 44,
and the client to print the number 5. Simple enough. (I copied the code
out of the book as much as possible.)

Now, when I run the server on my os4.1 sparc, it works fine, whether I run
the client on a sun3 or sun4. But when I run the server on a sun-3...
the server prints out 44 like it should, but the client prints out some
ridiculous number, again whether the client is run on a sun-3 or sun-4.
I've tried this on a 4.0.3 and 4.1 sun-3, and it still breaks.

The reason I got into this is because I am writing an app that returns a lot
of integers, returns a string in one place, and also some structures. Again,
when I run the server on a sun-4, everything is happy, and I get all the right
results (tested on sun-3, sun-4, and a sequent balance running dynix 3.0.17).
But when I run the server on a sun-3, the integer- and string-returning
functions don't return the right value. The odd thing (to me) is that the
struct-returning ones (well, pointer-to-struct) STILL WORK JUST FINE.
I have no real idea why.

Am I doing something wrong? I thought I might have been for a while (I ran
across this while working on my program, which isn't rpcgenned code, so I
thought it was my code), but if even this simple program breaks...or is
this an "official bug" that I don't know about?

Your time is greatly appreciated.

Jeffery Bacon -- Computing Technology Svcs., Michigan Technological University
email- bacon@mtus5.bitnet       voice: (906)487-2110        fax: (906)487-2787
alternate-  uucp: <world>!itivax!anet!bacos  domain: bacos%anet@itivax.iti.org

corbin@nibroc.Eng.Sun.COM (John Corbin) (08/14/90)

In article <90225.174053BACON@MTUS5.BITNET> BACON@MTUS5.BITNET (Jeffery Bacon) writes:
>...
>For this, I've written the following server procedure:
>
>--------------
>int *bla_1(oog) int *oog; { int erg = 5; printf("%d\n",*oog);return(&erg); }
>--------------

You need to declare erg to be a static variable. Your bla_1() routine is
returning the address of an automatic variable.  As to why it works on
some machines and not others, I would guess that for some the area on
the stack gets modified and for others it doesn't.
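
A minimal sketch of that fix in plain C, without the RPC library so it can be
compiled anywhere. The name bla_1_fixed and the ANSI-style prototype are
assumptions made for illustration; they are not from the original post or
from rpcgen output.

```c
#include <stdio.h>

/* Hypothetical stand-in for the server procedure (name is illustrative). */
int *bla_1_fixed(int *oog)
{
    /* Static storage: the object still exists after the function returns,
     * so the caller may safely dereference the returned pointer. */
    static int erg;

    printf("%d\n", *oog);
    erg = 5;
    return &erg;
}
```

Every call returns the address of the same object, so the value has to be
consumed (here, sent back to the client) before the next call overwrites it.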

John Corbin	(jcorbin@Sun.COM)
Sun Microsystems

mh@iti.org (Mike Hoegeman) (08/16/90)

In article <90225.174053BACON@MTUS5.BITNET> BACON@MTUS5.BITNET (Jeffery Bacon) writes:
 >I submit the following problem:
 >Given the following declaration (taken just about right out of the Sun 4.1
 >network prog guide) bla.x:
 >-------------
 >program BLAPROG { version BLAVERS {int BLA(int) = 1; } = 1; } = 0x20000099;
 >--------------
 >This should send an integer and receive an integer.
 >For this, I've written the following server procedure:
 >--------------
 >int *bla_1(oog) int *oog; { int erg = 5; printf("%d\n",*oog);return(&erg); }
 >--------------
 >(please forgive my lack of formatting, but this is pretty simple; I wrote it
 >on the fly)
 >And for the client side, a simple little program:
 >-----------------
 >#include <stdio.h>
 >#include <rpc/rpc.h>
 >#include "bla.h"
 >main() {
 >	CLIENT *bla;
 >	int *result, oog;
 >	bla=clnt_create("ctsee8",BLAPROG,BLAVERS,"tcp");
 >	oog=44;
 >	result=bla_1(&oog,bla);
 >	printf("%d\n",*result);
 >}

(poster goes on to explain that sometimes the server program returns garbage.)

 answer: the problem is in your server procedure

 >int *bla_1(oog) int *oog; { int erg = 5; printf("%d\n",*oog);return(&erg); }

 the statement 
     int erg = 5;

 should be 
     static int erg; 
     erg = 5;

You are returning a pointer to an int that is an automatic variable, and it
goes out of scope by the time the service transport sends that value back.
That is why you sometimes get back garbage. You are just getting 'lucky' in
the cases where it seems to work.

If you find making 'erg' static repugnant, you can do something like this
inside of bla_1:

    int *
    bla_1(oog)
	int *oog;
    {
	extern SVCXPRT *MyXprt;	/* the SVCXPRT * created via something
				   like svcudp_create() or
				   svctcp_create() */
	int erg;

	/* ... processing to produce value for erg here ... */

	/* erg is still live here, so sending its address is safe */
	svc_sendreply(MyXprt, xdr_int, (caddr_t)&erg);

	return ((int *)0); /* the return of a NULL pointer
			  tells the server machinery that you
			  have already sent the reply back yourself. */
    }

The above is off the top of my head; check the man page for
svc_sendreply() to make sure it's right.


-mike h.
--
-------------------------------------------------------------------------------
Mike Hoegeman               email: mike@wlv.imsd.contel.com  tel: (818)706-4145
Contel Federal Systems      31717 La Tienda Dr, Westlake Village CA. 91359

barmar@think.com (Barry Margolin) (08/16/90)

In article <90225.174053BACON@MTUS5.BITNET> BACON@MTUS5.BITNET (Jeffery Bacon) writes:
>int *bla_1(oog) int *oog; { int erg = 5; printf("%d\n",*oog);return(&erg); }

This is invalid C code.  You are returning the address of an automatic
variable outside the extent of the declaration.  By the time the caller of
bla_1 uses the returned pointer the stack frame has probably been
overwritten.  The "erg" variable must either have static duration (either
declare it globally, or give it the "static" storage class specifier), or
it must be allocated in the heap (i.e. with malloc()).  In this case,
malloc() is probably the wrong choice because you will never be able to
free it.
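
A rough sketch of the two alternatives named above, in plain C with no RPC
calls. The names erg_global, reply_global and reply_heap are hypothetical and
exist only for this illustration.

```c
#include <stdlib.h>

/* Option 1: file-scope object, which has static storage duration. */
static int erg_global;

int *reply_global(void)
{
    erg_global = 5;
    return &erg_global;     /* valid for the life of the program */
}

/* Option 2: heap allocation. The pointer stays valid after return, but
 * someone has to free() it; a caller that only sends the value and
 * drops the pointer would leak one int per call. */
int *reply_heap(void)
{
    int *p = malloc(sizeof *p);

    if (p != NULL)
        *p = 5;
    return p;
}
```

In a standalone program the caller can free the heap result itself; the
objection above is that the code calling bla_1 gives the procedure no such
opportunity.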

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

BACON@MTUS5.BITNET (Jeffery Bacon) (08/18/90)

Bleah. Apologies to the net for what is now to me an obvious mistake.
I can't believe I actually missed that...

Jeffery Bacon -- Computing Technology Svcs., Michigan Technological University
email- bacon@mtus5.bitnet       voice: (906)487-2110        fax: (906)487-2787
alternate-  uucp: <world>!itivax!anet!bacos  domain: bacos%anet@itivax.iti.org
I hereby reserve the right to be stupid and to make a fool of myself in public.