[comp.sys.sun] debugging query

silvert@cs.dal.ca (Bill Silvert) (10/08/90)

As a guest user of this Sun 4 I've spent an amazing amount of time
debugging pretty vanilla C and Fortran code that runs on every system but
Sun.  I'm hoping that someone can steer me in a direction that will make
life easier, especially since I am in the market for a small system for my
own division and may end up with a SPARCstation.

Consistently I have programs give me segmentation errors with virtually no
way of finding out what the error is -- even if I run dbx with debug and
trace the best I can do is find out what happened before the crash.  For
example, this code error:

	char *env;
	env = getenv("STRING");
	if(strchr(env, 'a')) ...

crashes if STRING is not defined so that env = NULL (other machines don't
even consider this an error, they act as though *env = '\0').  But I can't
get the debugger to tell me that I have illegally passed a NULL pointer, I
just get the dreaded signal 11.
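
One way to sidestep this crash is to never hand the result of getenv() straight to a string function. The following is only a minimal sketch: the helper names env_or and env_contains are hypothetical and not part of any Sun or standard library, and it assumes that treating an unset variable as the empty string is acceptable for the program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the value of an environment variable, or
 * `fallback` when the variable is unset, so callers never see NULL. */
static const char *env_or(const char *name, const char *fallback)
{
    const char *value;

    assert(fallback != NULL);   /* the fallback itself must be a real string */
    value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Hypothetical helper: 1 if the variable's value contains `c`, else 0.
 * An unset variable is treated as the empty string (an assumption). */
static int env_contains(const char *name, int c)
{
    return strchr(env_or(name, ""), c) != NULL;
}
```

With these in place, the test above could be written as if(env_contains("STRING", 'a')) ..., which behaves the same whether or not STRING is defined.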

I've just spent several hours curing (not finding) a problem in a mixed
Fortran and C program.  It turned out that the C code included a short
(~10-line) function which wasn't used, and deleting it cured the
segmentation error, which occurred while loading.  But the function seems
perfectly correct, and in fact it is actually a Sun function that I used
in an earlier version of the program (scale.c, which converts between
centimeters and inches).  The graphics package of which this is part
hadn't been invoked, and the only reports I could get with the debuggers
dbx and adb were (1) sometimes the program crashed when it encountered a
Fortran PRINT statement with 4 continuation lines, and (2) adb could not
find a data address for one of the COMMON blocks.  I used all kinds of f77
options, such as -Nx..., and all that did was change the COMMON block
which could not be found.  By the way, none of these COMMON blocks was
accessed by the C code.

These headaches raise several questions.  (1) Am I missing a powerful and
effective way to get code running on a Sun?  (2) Are the Sun software
products trustworthy?  (3) Should I be investing in a Sun when my major
application is developing Fortran and C programs?  Our major use is
ecological simulation.

Any advice would be most welcome.  Thanks in advance.  Bill

William Silvert, Habitat Ecology Division, Bedford Inst. of Oceanography
P. O. Box 1006, Dartmouth, Nova Scotia, CANADA B2Y 4A2.  Tel. (902)426-1577
UUCP=..!{uunet|watmath}!dalcs!biomel!bill
BITNET=bill%biomel%dalcs@dalac	InterNet=bill%biomel@cs.dal.ca

venkat@uunet.uu.net (D Venkatrangan) (11/01/90)

In article <1990Oct7.230303.2350@rice.edu> silvert@cs.dal.ca (Bill Silvert) writes:

>Consistently I have programs give me segmentation errors with virtually no
>way of finding out what the error is -- even if I run dbx with debug and
>trace the best I can do is find out what happened before the crash.  For
>example, this code error:
>
>	char *env;
>	env = getenv("STRING");
>	if(strchr(env, 'a')) ...
>
>crashes if STRING is not defined so that env = NULL (other machines don't
>even consider this an error, they act as though *env = '\0').  But I can't
>get the debugger to tell me that I have illegally passed a NULL pointer, I
>just get the dreaded signal 11.

In SS architecture machines, if you cast a non-quad-word-aligned address
to a quad-word type (such as an int), you could cause a program crash
similar to the type you have described.  Perhaps your problem is the same.
The debugger would not show that anything is wrong.  About the only thing
you can do in dbx is step instruction by instruction to the point where
you get the segmentation error and examine the target address of the mov
that causes the crash.  Make sure it is quad-word aligned.
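
The alignment check described above can also be done in the C source before the program ever reaches the debugger. This is a rough sketch only: the names is_aligned and load_int are hypothetical, it assumes "quad-word aligned" here means a multiple of sizeof(int), and it relies on uintptr_t from <stdint.h>, which a compiler of this vintage may not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 if address `p` is a multiple of `align` bytes. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0);         /* a zero alignment makes no sense */
    return ((uintptr_t)p % align) == 0;
}

/* Hypothetical helper: read an int from any address by copying its bytes
 * instead of dereferencing a cast pointer, so alignment does not matter. */
static int load_int(const void *p)
{
    int value;

    memcpy(&value, p, sizeof value);
    return value;
}
```

A cast such as *(int *)p is then only attempted when is_aligned(p, sizeof(int)) holds; otherwise load_int(p) fetches the same bytes without an aligned access.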