[comp.sys.sgi] Troubles with shared libraries!

mike@BRL.MIL (Mike Muuss) (03/02/89)

I have been continuing to have problems moving binaries between different
SGI platforms when they are compiled with shared libraries.  I consider
being able to share binaries to be a highly desirable feature, and I
intend to continue to "flog" this topic until something is decided.

I have modified the Cakefile so that all programs in the BRL-CAD Package
are compiled (linked) with -lgl_s (when using LIBFB), -lm, and -lc_s.
I still have problems when calling the routine ps_open_PostScript().

In this particular test, I compiled the code on a 4D/70GT, and then
ran it on a 120/GTX, with this result:

54 voyage> ./pix-fb /n/spark/m/cad/pix/star.pix
Bus error (core dumped)

55 voyage> dbx ./pix-fb
dbx version 1.31
Copyright 1987 Silicon Graphics Inc.
Copyright 1987 MIPS Computer Systems Inc.
Type 'help' for help.
Reading symbolic information of `./pix-fb' . . .
Process name from core dump: pix-fb
Process died at pc 0xf02db00 of signal : bus error
[using memory image in core]
(dbx) where
>  0 ps_open_PostScript(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) [0xf02dafc]
   1 sgi_dopen(0x90, 0x100030c0, 0x200, 0x200, 0x0, 0x0) ["../libfb/if_4d.c":720, 0x402c90]
   2 fb_open(0x2, 0x200, 0x200, 0x0, 0x0, 0x0) ["../libfb/fb_generic.c":180, 0x400f78]
   3 main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ["../util/pix-fb.c":159, 0x400674]
(dbx)

More startlingly, when I run the same binary on a Personal Iris,
it *leaps* to location 0, and traps, giving a worthless core dump
(the stack frame is no good, DBX is lost).

Compiling the code on the GTX and then running on a GT also dumps core.
Same for GTX code run on a non-GT (eg 4D/70).

A test program to demonstrate looks like this:

main()
{
	int	g_status, f;
	/*
	 *  Now that the mode has been determined,
	 *  ensure that the graphics system is running.
	 */
	if( !(g_status = ps_open_PostScript()) )  {
		char * grcond = "/etc/gl/grcond";
		char * newshome = "/usr/brlcad/etc";		/* XXX */

		f = fork();
		if( f < 0 )  {
			perror("fork");
			return(-1);		/* error */
		}
		if( f == 0 )  {
			/* Child */
			chdir( newshome );
			execl( grcond, grcond, (char *) 0 );	/* argv[0] = path */
			perror( grcond );
			_exit(1);
			/* NOTREACHED */
		}
		/* Parent */
		while( !(g_status = ps_open_PostScript()) )  {
			sleep(1);
		}
	}
}

This code is compiled by:

cc foo.c -lgl_s -lc_s -o foo

Run it on the machine that compiled it, no problem.
Run the exact same binary on any other kind of 4D: Pffft -- core dumped.

THE INTENT.

The intent of this code fragment is to permit applications to produce
graphics display even if nobody is logged in on the console (eg, the
window manager is not running).  This happens often at BRL because
(a) we don't like using the SGI-provided keyboard, and (b) we produce
a lot of our graphics elsewhere on the network, and just wish to display
the result on an SGI.  We don't necessarily want to have to log in
on the SGI just to see pictures. If the window manager is not running,
it has to be started first, with a somewhat restricted set of PostScript
code (to avoid offering menus that might allow somebody to open a shell
window).

I believe that the HotLine told us to do things this way.
It seems to work fine when used on the machine that compiled it, but
not other types of SGI machines.

Did I forget a compiler option?  Not invoke the shared libraries right?
Find a bug?  Or what?

So, several questions arise:

1)  Is calling ps_open_PostScript() the only way to accomplish this test?

2)  Is ps_open_PostScript() the best way?

3)  Where is it documented, anyways?  I can't find any mention online.
    ("pcat *.z|grep ps_open" is a poor substitute for "man -k").

4)  Can anyone suggest a workaround, so that I can make this stuff work, NOW?

Any help you can provide will be most appreciated!
	Thanks,
	 -Mike

msc@ramoth.SGI.COM (Mark Callow) (03/03/89)

In article <8903020325.aa27209@SPARK.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
> I have been continuing to have problems moving binaries between different
> SGI platforms when they are compiled with shared libraries.  I consider
> being able to share binaries to be a highly desirable feature, and I
> intend to continue to "flog" this topic until something is decided.
> 
> I have modified the Cakefile so that all programs in the BRL-CAD Package
> are compiled (linked) with -lgl_s (when using LIBFB), -lm, and -lc_s.
> I still have problems when calling the routine ps_open_PostScript().

You have found a bug that is fixed in release 4D1-3.1 Rev D, which was
released to manufacturing on Monday, February 27th.  Volume shipments should
start within 30 days.

Incidentally, Rev D includes version 1.3 of 4Sight, which has many bug fixes
and is much more solid than version 1.2.

Here are the gory details of Mike's problem.  You can stop reading here
if you don't care about them.

The GL uses libcps to communicate with the window server.  We embedded the
key pieces of libcps in the GL so people with existing GL applications
wouldn't have to change their Makefiles to also link with libcps.  However,
these functions are not exported from the GL.  When you link with a shared
library and reference a function that isn't exported (exporting means the
function is in the shared library's call table), the reference is bound
directly to the address that function has in the version of the shared
library you are linking with.

When your program referenced ps_open_PostScript, it ended up calling the
address that ps_open_PostScript has in the shared GL you linked with.  The
function sits at a different address on other machines, and in other
releases of the shared GL for the same machine, so the call lands in the
wrong place and the program breaks there.

I won't even try to explain the fix here.  It is very long and complicated.
The complexity comes from maintaining binary compatibility with GL applications
from previous releases.  In Rev D, you will have to link programs that
use the GL and do their own PostScript stuff with both -lgl_s and -lcps,
and you will then have binary compatibility across all 3.1 Rev D machines.

--
	-Mark