clewis@eci386.uucp (Chris Lewis) (07/06/90)
As faulting on NULL dereference is often an explicit decision on the part of the O/S's memory management/loader definition people, I was wondering if anyone knew how to modify COFF to implement NULL trapping on COFF systems...

Background: I distribute psroff, and some of the code has been caught by different compilers on other systems with the infamous null dereference (a la p = NULL; i = *p; or, worse, p = NULL; i = p->something). I've managed to find and swat most of these, but it would be nice if I could fix my compiler/loader/OS to trap them *here*.

On System V (I'm on 386/ix 1.0.6), the memory layout of an executable program is controlled by a default loader control file ("ifile"), which lays out where .text/.data/.bss will be relocated and arranged in the memory image. Some system implementations of COFF supply various ifiles that describe how images are to be built for the various memory models (e.g. COFF implementations on the 286, or shared vs. non-shared on other systems, corresponding to the magic numbers). The 386 one uses the defaults built into ld's binary, which I can't seem to reconstruct from the 386/ix Guide entries for the loader. E.g., by manually creating an ifile I can't build a binary that runs (and many variants won't even link; the examples seem defective).

What's really interesting is that using the implicit ld layout causes at least *some* of the COFF headers to precede the run-time start-off at virtual 0; in fact, the load module's magic number is at 0 in the executing image! (So pointer chasing can get rather nasty, and printf("%s", 0) prints a string of garbage.)

Anyways, two questions:

1) Has anybody got a working ifile for a 386 UNIX system that I could try
   playing with?

2) Has anybody got a working ifile for 386 UNIX systems that explicitly maps
   *out* at least the first couple of pages at virtual 0 so that null
   dereferences fault?  Is this possible?  (Do the 386/ix execution model's
   memory requirements forbid this?)

I attacked this from another approach, by putting a filter in front of the assembler that inserts checking code immediately prior to every register dereference. But (at least in my few minutes of hacking) it misses some dereferences, and it explodes the binary size by something near a factor of 2; I'd rather it didn't affect the program except for relocation factors. [No, I'm too ashamed to post it...]

To reiterate a remark I made before:

	All programmers should be forced to develop their UNIX software
	on systems that have a thermonuclear device triggered by the
	access bits on page 0....

I'm just trying to get there myself ;-)

[Yes, I know that NULL isn't necessarily 0 from a C language theoretical point; I'm just trying to implement a testing mechanism for my specific implementation...]
-- 
Chris Lewis, Elegant Communications Inc, {uunet!attcan,utzoo}!lsuc!eci386!clewis
Ferret mailing list: eci386!ferret-list, psroff mailing list: eci386!psroff-list
jak@sactoh0.UUCP (Jay A. Konigsberg) (07/07/90)
>As faulting on NULL dereference is often an explicit decision on the
>part of the O/S's memory management/loader definition people, I was
>wondering if anyone knew how to modify COFF to implement NULL trapping
>on COFF systems...

I have been watching this thread for some time now and, to my surprise, no one out there seems to know about the -z option.

ld(1):
	-z	Do not bind anything to address zero.  This option
		will allow runtime detection of null pointers.

cc(1):
	The cc command recognizes ... -z ... and passes these options
	directly to the loader [ld(1)].

Now, don't take this to mean I want this thread to stop; I don't. However, a lot of the postings seem to be asking how their machine can catch this nefarious error/bug (one that I am all too guilty of; that's how I know about -z).
-- 
------------------------------------------------------------- 
Jay @ SAC-UNIX, Sacramento, Ca.   UUCP=...pacbell!sactoh0!jak
If something is worth doing, it's worth doing correctly.
jc@minya.UUCP (John Chambers) (07/08/90)
[Reader Warning: portability horror story follows which may be more than weak hearts can handle; you may wish to hit the 'n' key now. ;-}]

In article <1990Jul5.174608.17336@eci386.uucp>, clewis@eci386.uucp (Chris Lewis) writes:
> To reiterate a remark I made before:
> 	All programmers should be forced to develop their UNIX software
> 	on systems that have a thermonuclear device triggered by the
> 	access bits on page 0....
> I'm just trying to get there myself ;-)
> [Yes, I know that NULL isn't necessarily 0 from a C language theoretical
> point - I'm just trying to implement a testing mechanism for my specific
> implementation...]

Part of the reason I find this chain interesting is some fun I had a few years back porting an application to a long list of Unix systems. It was fairly well written, with lots of built-in error detection, a run-time debug-level flag, and all that. On one system it compiled fine and seemed to run, but it totally ignored all its command-line parameters. This naturally aroused my curiosity.

It seems that the program had a set of N routines for chewing up assorted command-line args, essentially one routine per module. Each routine stripped out the args it found interesting and left the uninteresting ones in argv[] for the next routine. They were called by code that looked like:

	args1(&argc, argv);
	args2(&argc, argv);
	...
	argsN(&argc, argv);

and anything left over was treated as a file name, as I recall. Seems quite reasonable. These routines, of course, did some error checking on their params, including making sure they were non-null and that argv[0] thru argv[*argc-1] were also non-null.

On one particular machine (whose name will be omitted to protect the guilty turkeys, who should all be strung up by their toes, but I digress; note that it could be your next system ;-), the folks who put the Unix system together decided, in their wisdom, that page 0 of the D-space shouldn't be used by applications. But they didn't put it out of bounds; they used it for "system" stuff. In particular, since the execve() call needs to stuff its args into the process's data area, they decided to put all of main()'s args at the start of page 0, starting with argc, then the strings for argv[], then the pointers for these vectors, and then some other junk that isn't relevant here.

Do I detect looks of horror forming on the faces of some readers? Yep, you got it right. Page 0 (at virtual address 0) started with argc, so the above calls looked like:

	args1(0, argv);
	args2(0, argv);
	...
	argsN(0, argv);

And since the routines were written by a Good Programmer, they all tested both args for being NULL, and returned immediately. I "corrected" the code by #ifdefing out the null test on the first argument. I won't repeat the verbal comments that went along with it.

I've also used this on occasion to illustrate the Old Engineer's maxim that you can't guarantee the correct functioning of a system by guaranteeing the correctness of all its parts. If you examine the design of any one of the parts, each in itself seems like a very reasonable way to do that job. But when you put these particular parts together, the result is a disaster, and you can't point a finger at any one culprit as the "cause". For instance, if the C compiler had used some value other than zero for null, the code would have worked; but the C compiler writers are innocent, because zero is in fact a valid representation of null pointers, and they aren't responsible for the OS's memory layout. The memory-management module is innocent, because zero is a valid hardware address, and there's no inherent reason to make any address illegal; that's the job of the memory-allocation module. Completing the argument for each part of the system is left as an exercise for the reader...

Just thought I'd share this with y'all.
(Say "Thank you, jc.")
-- 
Typos and silly ideas Copyright (C) 1990 by:
Uucp: ...!{harvard.edu,ima.com,eddie.mit.edu,ora.com}!minya!jc (John Chambers)
Home: 1-617-484-6393   Work: 1-508-952-3274
gwyn@smoke.BRL.MIL (Doug Gwyn) (07/09/90)
In article <423@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>I've also used this on occasion to illustrate the Old Engineer's
>maxim that you can't guarantee the correct functioning of a system
>by guaranteeing correctness of all its parts.

Sure you can, if the design is also correct. However, in this particular example, the C implementation was clearly NOT correct, since the implementors assigned an object an address such that it was not distinguishable from a null pointer. There is no way to guarantee that one's application will not break due to bugs in the implementation of the target system.