[comp.lang.c] Referencing through a null pointer

Paul_L_Schauble@cup.portal.com (04/23/88)

Could someone please tell me which machine it was that started the
unfortunate convention that referencing through the null pointer returns a
zero with no error? I was under the impression it was BSD Unix, but I'm not
sure.

    Thanks,
       Paul

henry@utzoo.uucp (Henry Spencer) (04/24/88)

> Could someone please tell me which machine it was that started the
> unfortunate convention that referencing through the null pointer returns a
> zero with no error? I was under the impression it was BSD Unix, but I'm not
> sure.

Sigh, yet another example of Berkeley being given credit for something they
didn't do.  In this case maybe we should let them have it, mind you... :-)
[pun intentional]

This convention started in split-space programs on the pdp11/45 under V6,
as an implementation accident:  the data space really did begin at 0 in
a split-space program, so an anonymous variable was inserted there to make
sure that no normal variable had address 0, and unfortunately said variable
was initialized to 0.  I'm not sure about V6, but there are V7 programs
that depend on this.  So Bell Labs gets the blame here.  (I hesitate to
blame Dennis in particular; he is probably responsible for the anonymous
variable, but it would have required unusual foresight to predict the
problems and put some strange value there instead of zero.)
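Why a variable at address 0 would be a problem: C code routinely uses a null pointer to mean "no object", as in the lookup routine sketched below. The names are hypothetical and do not come from any V6 or V7 source. If the first variable in a split-space program had really lived at address 0, a pointer to it would compare equal to NULL, and finding that variable would look exactly like finding nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry, for illustration only. */
struct entry {
    const char *name;
    int value;
};

/*
 * Return a pointer to the entry called `name`, or NULL if there is none.
 * "NULL means not found" is only unambiguous because no real object may
 * have an address that compares equal to a null pointer; that is the
 * guarantee the anonymous variable at data address 0 preserved.
 */
struct entry *lookup(struct entry *tab, size_t n, const char *name)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    }
    return NULL;
}
```

With a two-entry table `{ "first", 1 }, { "second", 2 }`, looking up "first" yields a pointer to element 0, and looking up anything else yields NULL.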

Both System V and Berklix inherited the problem, but it *mostly* got cleaned
out of Berklix by early efforts at Sun that were fed back to Berkeley.  It
keeps creeping back, since Berklix (unlike Sunnix) does not set up its page
map to trap accesses to location 0.  Unless it's been fixed quite recently,
System V still has the problem, since AT&T likewise does not map out 0.
(If it comes to that, Sun would probably have preferred not to, but as I
recall it their early hardware gave them no choice.)  Patches to make 4.2
map out zero have been posted in the past; the same could probably be done
for System V, but far more programs would break.
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

chris@mimsy.UUCP (Chris Torek) (04/24/88)

In article <4729@cup.portal.com>, Paul_L_Schauble@cup.portal.com writes:
>Could someone please tell me which machine it was that started the
>unfortunate convention that referencing through the null pointer returns a
>zero with no error? I was under the impression it was BSD Unix, but I'm not
>sure.

PDP-11s with split I&D had a shim at data address zero; I believe the
shim was a zero.  (The shim is necessary to keep the first data object
from having an address that compares equal to NULL.)

I imagine that PDP-11s without split I&D had *(char *)0 == 7 or 8, and
*(short *)0 == 0407 or 0410 (OMAGIC and NMAGIC respectively).

32V Unix had *(int *)0==0.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (04/24/88)

In article <11199@mimsy.UUCP> I wrote
>32V Unix had *(int *)0==0.

Ai!  That should be *(char *)0 == 0.

ark@alice.UUCP (04/25/88)

In article <4729@cup.portal.com>, Paul_L_Schauble@cup.portal.com.UUCP writes:
> Could someone please tell me which machine it was that started the
> unfortunate convention that referencing through the null pointer returns a
> zero with no error?

There is no such convention.  Some machines do behave that way,
but I've never heard anyone claim that a program relying on that
behavior is anything but wrong.
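For a concrete case of what relying on that behavior looked like: on a system where *(char *)0 == 0 (32V, per Chris Torek's correction above), strlen() of a null char * would find a zero byte at once and return 0, so code that never checked for NULL appeared to work. A minimal sketch of the portable alternative; `safe_strlen` and `total_name_length` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Old idiom: strlen(s) with s == NULL "returned" 0 on systems where the
 * byte at location 0 happened to be a readable zero.  In C that is
 * undefined behavior, and it traps wherever page 0 is unmapped.
 * Portable version: test for the null pointer explicitly.
 */
size_t safe_strlen(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/* Example caller: total length of n names, any of which may be NULL. */
size_t total_name_length(const char *const *names, size_t n)
{
    assert(names != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += safe_strlen(names[i]);
    return total;
}
```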

guy@gorodish.Sun.COM (Guy Harris) (04/25/88)

> This convention started in split-space programs on the pdp11/45 under V6,
> as an implementation accident:  the data space really did begin at 0 in
> a split-space program, so an anonymous variable was inserted there to make
> sure that no normal variable had address 0, and unfortunately said variable
> was initialized to 0.

Any split I&D program with a data+BSS segment larger than 8K could be set up to
remove 0 from the address space; just make the bottommost data page a "grow
down from the top" page and stop just before the last click.  Programs with
smaller data segments could have the BSS padded.  The same could be done for
non-split-I&D programs by doing the same to the bottommost page (text or data).
This would have introduced some extra complications; I don't know whether it
would have been worth it or not.  Old binaries would still have to have been
supported, probably.

> (I hesitate to blame Dennis in particular; he is probably responsible for
> the anonymous variable, but it would have required unusual foresight to
> predict the problems and put some strange value there instead of zero.)

Dennis sent me mail a while ago indicating that John Reiser had done a paging
32V and that it did *not* have a location 0 in the address space; he referred
to this as one of the best things about that system.  He also indicated that
this system had other interesting features, such as mapped files.  Alas, this
system never got out the door....

> Both System V and Berklix inherited the problem, but it *mostly* got cleaned
> out of Berklix by early efforts at Sun that were fed back to Berkeley.  It
> keeps creeping back, since Berklix (unlike Sunnix) does not set up its page
> map to trap accesses to location 0.  Unless it's been fixed quite recently,
> System V still has the problem, since AT&T likewise does not map out 0.

That's somewhat hardware dependent; AT&T's releases don't, but people who port
it to their hardware can.  Also, the paging S5R2.2 release had a "-z" flag to
the linker that set up a "no page 0" executable; alas, this flag wasn't the
default, so buggy programs didn't get fixed.  (I seem to remember a claim that
the '286 or '386 port of S5 has no location zero, and the Motorola 68K port may
also have no location zero; alas, none of the fixes made to programs that broke
got folded back into the mainstream S5 releases.)

> (If it comes to that, Sun would probably have preferred not to, but as I
> recall it their early hardware gave them no choice.)

Probably.  At this point, we continue it in order to catch bugs; the Sun-2,
Sun-3, Sun-4, and Sun-386i all have no location zero in user mode.  The Sun-3
and Sun-4, and perhaps the Sun-386i as well, have no location zero in kernel
mode either; this caught a number of null-pointer-dereferencing bugs in the
kernel.

I know that at Computer Consoles our 68K machine had no location 0 accessible
from user mode: there were no separate kernel and user address spaces, the
kernel portion was read- and write-protected in user mode, and the interrupt
vector had to live at the bottom of the address space, where it was considered
part of the kernel.  Sun may well have been in the same position.

> Patches to make 4.2 map out zero have been posted in the past; the same
> could probably be done for System V, but far more programs would break.

You bet.  Many programs *did* break when we mapped out location zero in
SunOS - one reason why I wish the fixes made for the '286 and 68K S5 ports
had made it back into the mainstream....

grimlok@hubcap.UUCP (Mike Percy) (04/26/88)

From article <4729@cup.portal.com>, by Paul_L_Schauble@cup.portal.com:
> Could someone please tell me which machine it was that started the
> unfortunate convention that referencing through the null pointer returns a
> zero with no error? I was under the impression it was BSD Unix, but I'm not
> sure.
> 
>     Thanks,
>        Paul

Well, I don't know about this, but I do know that at least some compilers
try to give some sort of indication that you have read or written
through a null pointer. For instance, when I use TurboC and do this
with a null pointer:

  printf("%s",some_char_star_pointer);

I get
  
  (null)

which is real nice.
That's only good for strings, though. Also useful, if less obvious:
on termination, TC seems to check the values at location NULL, and if
what is there is not what should be there, you get the
message

 (null pointer reference)

(that is, if your program manages to terminate "properly")

Note that I have not gotten elbow deep verifying exactly how TC
implements this, but it seems simple enough to me.
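Both behaviors Mike describes belong to the Turbo C library rather than to the language: ANSI C leaves printf("%s", NULL) undefined, so portable code has to substitute a placeholder itself. The end-of-run check can be imitated with a sentinel pattern that is compared again at exit. A rough sketch, assuming an ordinary array stands in for the low-memory area a real runtime would watch; every name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* ANSI C leaves printf("%s", NULL) undefined, so map NULL explicitly. */
const char *str_or_null(const char *s)
{
    return s != NULL ? s : "(null)";
}

/*
 * Imitation of an end-of-run null-pointer check.  A real runtime would
 * watch the bytes at the bottom of the data area, where stray writes
 * through a null pointer land; here an ordinary array stands in for
 * that region.
 */
#define GUARD_SIZE 16
static unsigned char guard_area[GUARD_SIZE];
static const char guard_pattern[GUARD_SIZE + 1] = "GUARD-PATTERN-01";

void guard_init(void)
{
    memcpy(guard_area, guard_pattern, GUARD_SIZE);
}

bool guard_intact(void)
{
    return memcmp(guard_area, guard_pattern, GUARD_SIZE) == 0;
}

static void guard_check_at_exit(void)
{
    if (!guard_intact())
        fputs("(null pointer reference)\n", stderr);
}

/* Call once at startup: fill the guard and check it again at exit. */
bool guard_install(void)
{
    guard_init();
    return atexit(guard_check_at_exit) == 0;
}
```

Typical use is printf("%s\n", str_or_null(p)) for output, and one call to guard_install() at the top of main().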


volatile --- yeaah!
noalias --- ppppppphbbblllaaatthhh

henry@utzoo.uucp (Henry Spencer) (04/26/88)

> I imagine that PDP-11s without split I&D had *(char *)0 == 7 or 8, and
> *(short *)0 == 0407 or 0410 (OMAGIC and NMAGIC respectively).

Actually, no.  The a.out header was not part of the actual core image, so
the first instruction of the program was first; in practice this was the
"setd" that got the floating-point processor into the right mode (or tipped
the software off that the processor lacked hardware floating point), which
gave *(char *)0 == 011 and *(short *)0 == 0170011.  The programs which
made assumptions about *0 were generally the big ones, which ran split-space
of necessity.
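Henry's numbers follow from the PDP-11's byte order, which is little-endian: the byte at an even address is the low-order byte of the word stored there, so *(char *)0 is the low byte of the first word. A small sketch, assuming a simulated core image held in an ordinary array (`pdp11_store_word` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store a 16-bit word into simulated PDP-11 memory.  The PDP-11 is
 * little-endian: the low-order byte goes at the (even) word address,
 * the high-order byte at the next address.
 */
void pdp11_store_word(unsigned char *mem, unsigned addr, uint16_t word)
{
    assert(addr % 2 == 0);                 /* words live at even addresses */
    mem[addr]     = (unsigned char)(word & 0377);
    mem[addr + 1] = (unsigned char)(word >> 8);
}
```

Storing setd (0170011) at address 0 leaves 011 in byte 0 and 0360 in byte 1; storing 0407 there would leave 7 in byte 0, which is where the "7 or 8" guess came from.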

blarson@skat.usc.edu (Bob Larson) (04/26/88)

[followups redirected to comp.lang.c, this isn't a unix-specific issue.]

In article <2730@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>So long as your memory management hardware is trapping references
>through a null pointer and printing an error message, how about
>allowing the user to set a switch that will cause such illegal trapped
>references to be handled by an emulation routine that will cause a zero
>to be returned and continue execution?

Unfortunately, this would only encourage the development of buggy software,
and slow the fixing of existing buggy software.

I propose adding a flag to executables that tells the system how to
handle null pointer dereferences.  Normally, they should trap to a
fatal error procedure, but an option to trap to a routine that does a
sleep(1), sets the process priority to the lowest possible, and then
returns a 0 of the correct type to the buggy program might be useful
as a porting aid.  (Why should a background process be considered less
important than continuing a known buggy program?)  The reduced speed
should be enough that software vendors can't get away with distributing
broken programs.
--
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%fns1@ecla.usc.edu
			oberon!fns1!info-prime-request

mcdonald@uxe.cso.uiuc.edu (04/27/88)

henry@utzoo.uucp (Henry Spencer) writes (Apr 25, 1988):
>> I imagine that PDP-11s without split I&D had *(char *)0 == 7 or 8, and
>> *(short *)0 == 0407 or 0410 (OMAGIC and NMAGIC respectively).
>
>Actually, no.  The a.out header was not part of the actual core image, so
>the first instruction of the program was first; in practice this was the
>"setd" that got the floating-point processor into the right mode (or tipped
>the software off that the processor lacked hardware floating point), which
>gave *(char *)0 == 011 and *(short *)0 == 0170011.  The programs which
>made assumptions about *0 were generally the big ones, which ran split-space
>of necessity.

Actually, yes. I think Henry is making an unstated assumption: his use
of the phrase "a.out" implies he is thinking of Unix. Most PDP-11s run
RT-11. The memory location 0 in RT-11 is indeed 7. Thus if you're
not using relocation, *(char *)0 is 7. I just tried it.

Doug McDonald

thorinn@diku.dk (Lars Henrik Mathiesen) (05/05/88)

In article <1988Apr25.230435.3434@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>> I imagine that PDP-11s without split I&D had *(char *)0 == 7 or 8, and
>> *(short *)0 == 0407 or 0410 (OMAGIC and NMAGIC respectively).

>Actually, no.  The a.out header was not part of the actual core image, so
>the first instruction of the program was first;

Was this always so? In that case it is a striking coincidence that OMAGIC is
the PDP-11 instruction to branch past the next 7 words -- which would be the
rest of the a.out header. I think that this suggests that the whole a.out file
was loaded in some early version of UNIX, with execution starting at 0.
  The rest of the a.out magic numbers may have been constructed by analogy in
later versions of UNIX.
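Lars's observation is easy to check. On the PDP-11, BR occupies opcodes 000400 through 000777: the low byte is a signed offset in words, added to the PC after the PC has already moved past the branch. So 0407 at address 0 transfers control to byte 16 (octal 020), the first byte after the magic word and the seven words that follow it. A sketch of that arithmetic; the function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BR (unconditional branch) occupies opcodes 000400 through 000777. */
bool pdp11_is_br(uint16_t word)
{
    return (word & 0177400) == 0000400;
}

/*
 * Byte address reached by a BR stored at byte address addr.  The low
 * 8 bits are a signed offset in words, added to the PC after it has
 * already advanced past the branch: target = addr + 2 + 2 * offset.
 */
long pdp11_br_target(long addr, uint16_t word)
{
    assert(pdp11_is_br(word));
    int offset = word & 0377;
    if (offset & 0200)          /* sign-extend the 8-bit offset */
        offset -= 0400;
    return addr + 2 + 2L * offset;
}
```

pdp11_br_target(0, 0407) returns 16, while the setd word 0170011 is not a branch at all.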

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.

henry@utzoo.uucp (Henry Spencer) (05/06/88)

> >Actually, no.  The a.out header was not part of the actual core image, so
> >the first instruction of the program was first;
> 
> Was this always so?

Perhaps not.  It can't be a coincidence that the magic number is a branch
around the rest of the header.  This may have been aimed at things like
standalone diagnostics rather than normal Unix programs, though.  I know
that it wasn't in the core image in V7 or V6, and I'm fairly sure that it
wasn't in V5, but that's as far back as my experience goes.  Dennis?
-- 
NASA is to spaceflight as            |  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.          | {ihnp4,decvax,uunet!mnetor}!utzoo!henry