mike@BRL.MIL (Mike Muuss) (03/02/89)
I have been continuing to have problems moving binaries between different
SGI platforms when they are compiled with shared libraries. I consider
being able to share binaries to be a highly desirable feature, and I
intend to continue to "flog" this topic until something is decided.

I have modified the Cakefile so that all programs in the BRL-CAD Package
are compiled (linked) with -lgl_s (when using LIBFB), -lm, and -lc_s.
I still have problems when calling the routine ps_open_PostScript().

In this particular test, I compiled the code on a 4D/70GT, and then
ran it on a 120/GTX, with this result:

    54 voyage> ./pix-fb /n/spark/m/cad/pix/star.pix
    Bus error (core dumped)
    55 voyage> dbx ./pix-fb
    dbx version 1.31
    Copyright 1987 Silicon Graphics Inc.
    Copyright 1987 MIPS Computer Systems Inc.
    Type 'help' for help.
    Reading symbolic information of `./pix-fb' . . .
    Process name from core dump: pix-fb
    Process died at pc 0xf02db00 of signal : bus error
    [using memory image in core]
    (dbx) where
    >  0 ps_open_PostScript(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) [0xf02dafc]
       1 sgi_dopen(0x90, 0x100030c0, 0x200, 0x200, 0x0, 0x0) ["../libfb/if_4d.c":720, 0x402c90]
       2 fb_open(0x2, 0x200, 0x200, 0x0, 0x0, 0x0) ["../libfb/fb_generic.c":180, 0x400f78]
       3 main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) ["../util/pix-fb.c":159, 0x400674]
    (dbx)

More startlingly, when I run the same binary on a Personal Iris,
it *leaps* to location 0, and traps, giving a worthless core dump
(the stack frame is no good, DBX is lost).

Compiling the code on the GTX and then running on a GT also dumps core.
Same for GTX code run on a non-GT (eg 4D/70).

A test program to demonstrate looks like this:

    main()
    {
        int g_status, f;

        /*
         *  Now that the mode has been determined,
         *  ensure that the graphics system is running.
         */
        if( !(g_status = ps_open_PostScript()) )  {
            char * grcond = "/etc/gl/grcond";
            char * newshome = "/usr/brlcad/etc";    /* XXX */

            f = fork();
            if( f < 0 )  {
                perror("fork");
                return(-1);        /* error */
            }
            if( f == 0 )  {
                /* Child */
                chdir( newshome );
                execl( grcond, (char *) 0 );
                perror( grcond );
                _exit(1);
                /* NOTREACHED */
            }
            /* Parent */
            while( !(g_status = ps_open_PostScript()) )  {
                sleep(1);
            }
        }
    }

This code is compiled by:

    cc foo.c -lgl_s -lc_s -o foo

Run it on the machine that compiled it, no problem.
Run the exact same binary on any other kind of 4D: Pffft -- core dumped.

THE INTENT.

The intent of this code fragment is to permit applications to produce
graphics display even if nobody is logged in on the console (eg, the
window manager is not running). This happens often at BRL because
(a) we don't like using the SGI-provided keyboard, and (b) we produce
a lot of our graphics elsewhere on the network, and just wish to display
the result on an SGI. We don't necessarily want to have to log in on
the SGI just to see pictures.

If the window manager is not running, it has to be started first, with
a somewhat restricted set of PostScript code (to avoid offering menus
that might allow somebody to open a shell window).

I believe that the HotLine told us to do things this way. It seems to
work fine when used on the machine that compiled it, but not other types
of SGI machines. Did I forget a compiler option? Not invoke the shared
libraries right? Find a bug? Or what?

So, several questions arise:

1) Is calling ps_open_PostScript() the only way to accomplish this test?

2) Is ps_open_PostScript() the best way?

3) Where is it documented, anyways? I can't find any mention online.
   ("pcat *.z|grep ps_open" is a poor substitute for "man -k").

4) Can anyone suggest a workaround, so that I can make this stuff work, NOW?

Any help you can provide will be most appreciated!

Thanks,
 -Mike
msc@ramoth.SGI.COM (Mark Callow) (03/03/89)
In article <8903020325.aa27209@SPARK.BRL.MIL>, mike@BRL.MIL (Mike Muuss) writes:
> I have been continuing to have problems moving binaries between different
> SGI platforms when they are compiled with shared libraries. I consider
> being able to share binaries to be a highly desirable feature, and I
> intend to continue to "flog" this topic until something is decided.
>
> I have modified the Cakefile so that all programs in the BRL-CAD Package
> are compiled (linked) with -lgl_s (when using LIBFB), -lm, and -lc_s.
> I still have problems when calling the routine ps_open_PostScript().

You have found a bug which is fixed in release 4D1-3.1 Rev D, which was
released to manufacturing on Monday, February 27th. Volume shipments
should start within 30 days. Incidentally, Rev D includes version 1.3 of
4Sight, which has many bug fixes and is much more solid than version 1.2.

Here are the gory details of Mike's problem. You can stop reading here
if you don't care about them.

The GL uses libcps to communicate with the window server. We embedded
the key pieces of libcps in the GL so people with existing GL applications
wouldn't have to change their Makefiles to also link with libcps. However,
these functions are not exported from the GL.

When you link with a shared library and reference a function that isn't
exported (exporting means the function is in the shared library's call
table), the reference is resolved by calling the address of the function
in the version of the shared library you are linking with.

When your program referenced ps_open_PostScript, it ended up calling the
address of ps_open_PostScript in the shared GL you linked with. The
function will be in a different place on other machines, and in other
releases of the shared GL for the same machine. Hence the binary breaks
on these other machines.

I won't even try to explain the fix here. It is very long and complicated.
The complexity comes from maintaining binary compatibility with GL
applications from previous releases.

In Rev D you will have to link programs that use the GL and do their own
PostScript stuff with -lgl_s and -lcps, and you will have binary
compatibility across all 3.1 Rev D machines.
--
-Mark