[comp.unix.wizards] Implementing NULL trapping

clewis@eci386.uucp (Chris Lewis) (07/06/90)

As faulting on NULL dereference is often an explicit decision on the
part of the O/S's memory management/loader definition people, I was
wondering if anyone knew how to modify COFF to implement NULL trapping
on COFF systems...

Background: I distribute psroff and some of the code has been caught
with different compilers on other systems with the infamous null
dereference (a la p = NULL; i = *p;, or worse, p = NULL; i = p->something).
I've managed to find and swat most of these, but it would be nice if
I could fix my compiler/loader/OS to trap them *here*.

On System V (I'm 386/ix 1.0.6), the memory layout of an executable
program is controlled by a default loader control file ("ifile"),
which lays out where .text/.data/.bss will be relocated and arranged
in the memory image.  Some system implementations of COFF have
various ifiles that describe how the images are to be built for
various memory models (eg: COFF implementations on 286 or shared vs.
non-shared on other systems - corresponding to the magic numbers).  The
386 one uses the "defaults" built into "ld"'s binary, which I haven't
been able to reconstruct from the 386/ix Guide entries for the loader.
Eg: when I create an ifile by hand, I can't build a binary that runs
(and many variants won't even link - the examples seem defective).
What's really interesting is that
using the implicit "ld" layout causes at least *some* of the
COFF headers to precede the run-time startoff at virtual 0 -
in fact, the load module's magic number is at 0 in the executing
image!  (so pointer chasing can get rather nasty and "printf("%s", 0)"
prints a string of garbage)

Anyways, two questions:
	1) Has anybody got a working ifile for a 386 UNIX system
	   that I could try playing with?
	2) Has anybody got a working ifile for 386 UNIX systems
	   that explicitly maps *out* at least the first couple
	   of pages at virtual 0 so that null dereferences fault?
	   Is this possible?  (or do the 386/ix execution model's
	   memory requirements forbid this?)
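For the record, the shape of an ifile that leaves page 0 unmapped would look something like the sketch below.  This is written from memory of the SysV link editor command language, not a tested ifile - the addresses, the GROUP/BIND syntax, and the section list are all illustrative, and whether the 386/ix kernel will honor a text origin above virtual 0 is exactly question 2:

```
SECTIONS
{
	/* start .text one page up, so virtual 0 stays unmapped */
	.text	0x1000 : { }

	/* data and bss grouped at an illustrative higher address */
	GROUP BIND(0x401000) :
	{
		.data	: { }
		.bss	: { }
	}
}
```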

I attacked this from another approach, by putting a filter
in front of the assembler that inserts checking code immediately
prior to all register dereferences.  But, it (at least in my
few minutes of hacking) misses some dereferences, and explodes
the binary size by something near a factor of 2 - I'd rather
it didn't affect the program except for relocation factors.
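For what it's worth, the check such a filter inserts looks roughly like this (AT&T 386 assembler syntax; the register, label name, and page-size constant are illustrative - this is a sketch of the idea, not the actual filter's output):

```
/ original instruction:      movl (%eax), %ebx
/ the filter emits instead:
	cmpl	$0x1000, %eax	/ pointer below the end of page 0?
	jae	Lok		/ no: safe to dereference
	int	$3		/ yes: trap to the debugger
Lok:
	movl	(%eax), %ebx
```

Since every register dereference grows by three instructions, the factor-of-2 size blowup mentioned above follows directly.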

[No, I'm too ashamed to post it...]

To reiterate a remark I made before:

    All programmers should be forced to develop their UNIX software
    on systems that have a thermonuclear device triggered by the
    access bits on page 0....

I'm just trying to get there myself ;-)

[Yes, I know that NULL isn't necessarily 0 from a C language theoretical
point - I'm just trying to implement a testing mechanism for my specific
implementation...]
-- 
Chris Lewis, Elegant Communications Inc, {uunet!attcan,utzoo}!lsuc!eci386!clewis
Ferret mailing list: eci386!ferret-list, psroff mailing list: eci386!psroff-list

jak@sactoh0.UUCP (Jay A. Konigsberg) (07/07/90)

>As faulting on NULL dereference is often an explicit decision on the
>part of the O/S's memory management/loader definition people, I was
>wondering if anyone knew how to modify COFF to implement NULL trapping
>on COFF systems...
>
I have been watching this thread for some time now and to my surprise,
no one out there seems to know about the -z option.

ld(1): -z Do not bind anything to address zero. This option will allow
	  runtime detection of null pointers.

cc(1): the cc command recognizes ... -z ... and passes these options
       directly to the loader [ld(1)].


Now, don't take this to mean I want this thread to stop, I don't. However
a lot of the postings seem to be asking how their machine can catch this
nefarious error/bug (one that I am all too guilty of - that's how I
know about the -z).

-- 
-------------------------------------------------------------
Jay @ SAC-UNIX, Sacramento, Ca.   UUCP=...pacbell!sactoh0!jak
If something is worth doing, it's worth doing correctly.

jc@minya.UUCP (John Chambers) (07/08/90)

[Reader Warning:  portability horror story follows which may be more
than weak hearts can handle; you may wish to hit the 'n' key now. ;-}

In article <1990Jul5.174608.17336@eci386.uucp>, clewis@eci386.uucp (Chris Lewis) writes:
> To reiterate a remark I made before:
>     All programmers should be forced to develop their UNIX software
>     on systems that have a thermonuclear device triggered by the
>     access bits on page 0....
> I'm just trying to get there myself ;-)
> [Yes, I know that NULL isn't necessarily 0 from a C language theoretical
> point - I'm just trying to implement a testing mechanism for my specific
> implementation...]

Part of the reason I find this chain interesting is some fun I had a
few years back porting an application to a long list of Unix systems.
It was fairly well-written, with lots of builtin error detection, a
run-time debug-level flag, and all that.  On one system it compiled
fine and seemed to run, but it totally ignored all its command-line
parameters.  This naturally aroused my curiosity.

It seems that the program had a set of N routines for chewing up
assorted command-line args, essentially a routine for each module.
Each routine stripped out the args it found interesting, and left
the uninteresting ones in argv[] for the next routine.  They were
called by code that looked like:
	args1(&argc,argv);
	args2(&argc,argv);
	...
	argsN(&argc,argv);
and anything left over was treated as a file name, as I recall.
Seems quite reasonable.  These routines, of course, did some
error checking on their params, including making sure they were
nonnull and that argv[0] thru argv[*argc-1] were also nonnull.

On one particular machine (whose name will be omitted to protect 
the guilty turkeys who should all be strung up by their toes, but 
I digress; note that it could be your next system ;-), the folks 
who put the Unix system together decided, in their wisdom, that 
page 0 of the D-space shouldn't be used by applications.  But they 
didn't put it out of bounds; they used it for "system" stuff.  In 
particular, since the execve() call needs to stuff its args into 
the process's data area, they decided to put all the main() args 
at the start of page 0, starting with argc, then the strings for
argv[], then the strings for the environment, then the pointers for
these vectors, and then some other junk that isn't relevant here.

Do I detect looks of horror forming on the faces of some readers?
Yep, you got it right.  Page 0 (at virtual address 0) started with 
argc, so the above calls looked like:
	args1(0,argv);
	args2(0,argv);
	...
	argsN(0,argv);
And since the routines were written by a Good Programmer, they
all tested both args for being NULL, and returned immediately.

I "corrected" the code by #ifdefing out the null test on the
first argument.  I won't repeat the verbal comments that went
along with it.

I've also used this on occasion to illustrate the Old Engineer's
maxim that you can't guarantee the correct functioning of a system
by guaranteeing correctness of all its parts.  If you examine the 
design of any one of the parts, each in itself seems like a very
reasonable way to do that job.  But when you put these particular
parts together, the result is a disaster, and you can't point a
finger at any one culprit as the "cause".  For instance, if the
C compiler had used some value other than zero for null, the code
would have worked, but the C compiler writers are innocent because 
in fact zero is a valid representation of null pointers, and they 
aren't responsible for the OS's memory layout.  The memory-management
module is innocent, because zero is a valid hardware address, and 
there's no inherent reason to make any address illegal; that's the 
job of the memory-allocation module.  Completing the argument for
each part of the system is left as an exercise for the reader...

Just thought I'd share this with y'all.  (Say "Thank you, jc.")

-- 
Typos and silly ideas Copyright (C) 1990 by:
Uucp: ...!{harvard.edu,ima.com,eddie.mit.edu,ora.com}!minya!jc (John Chambers)
Home: 1-617-484-6393
Work: 1-508-952-3274

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/09/90)

In article <423@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>I've also used this on occasion to illustrate the Old Engineer's
>maxim that you can't guarantee the correct functioning of a system
>by guaranteeing correctness of all its parts.

Sure you can, if the design is also correct.

However, in this particular example, the C implementation was
clearly NOT correct, since the implementors assigned an object
an address such that it was not distinguishable from a null
pointer.

There is no way to guarantee that one's application will not
break due to bugs in the implementation of the target system.