[comp.unix.wizards] Referencing through a null pointer

guy@gorodish.Sun.COM (Guy Harris) (04/25/88)

> This convention started in split-space programs on the pdp11/45 under V6,
> as an implementation accident:  the data space really did begin at 0 in
> a split-space program, so an anonymous variable was inserted there to make
> sure that no normal variable had address 0, and unfortunately said variable
> was initialized to 0.

Any split I&D program with a data+BSS segment larger than 8K could be set up to
remove 0 from the address space; just make the bottommost data page a "grow
down from the top" page and stop just before the last click.  Programs with
smaller data segments could have the BSS padded.  The same could be done for
non-split-I&D programs by doing the same to the bottommost page (text or data).
This would have introduced some extra complications; I don't know whether it
would have been worth it or not.  Old binaries would still have to have been
supported, probably.
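
As a rough illustration of the "grow down" trick, here is a minimal C sketch
of the length check, assuming the usual description of PDP-11 memory
management: 8K-byte pages, a length field counted in 64-byte clicks, and a
per-page expansion direction.  The struct and function names are made up
for the example; they are not taken from any kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define CLICK_BYTES 64u    /* one "click": the granule of the length field */
#define PAGE_BYTES  8192u  /* one PDP-11 page: 128 clicks */

/* Hypothetical model of one page descriptor (not a real kernel struct). */
struct page_desc {
    bool     expand_down;  /* true: valid region grows down from the top */
    unsigned length_field; /* boundary click number, 0..127 */
};

/* Return true if a byte offset within the page passes the length check. */
bool offset_valid(struct page_desc pd, unsigned offset)
{
    assert(offset < PAGE_BYTES);
    unsigned click = offset / CLICK_BYTES;

    /* Upward pages are valid from click 0 up to the boundary;
       downward pages are valid from the boundary up to click 127. */
    return pd.expand_down ? click >= pd.length_field
                          : click <= pd.length_field;
}
```

With expand_down set and a length field of 1, offsets 0 through 63 fail
the check and everything from 64 to the top of the page passes, so a
reference through a null data pointer would trap.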

> (I hesitate to blame Dennis in particular; he is probably responsible for
> the anonymous variable, but it would have required unusual foresight to
> predict the problems and put some strange value there instead of zero.)

Dennis sent me mail a while ago indicating that John Reiser had done a paging
32V and that it did *not* have a location 0 in the address space; he referred
to this as one of the best things about that system.  He also indicated that
this system had other interesting features, such as mapped files.  Alas, this
system never got out the door....

> Both System V and Berklix inherited the problem, but it *mostly* got cleaned
> out of Berklix by early efforts at Sun that were fed back to Berkeley.  It
> keeps creeping back, since Berklix (unlike Sunnix) does not set up its page
> map to trap accesses to location 0.  Unless it's been fixed quite recently,
> System V still has the problem, since AT&T likewise does not map out 0.

That's somewhat hardware dependent; AT&T's releases don't, but people who port
it to their hardware can.  Also, the paging S5R2.2 release had a "-z" flag to
the linker that set up a "no page 0" executable; alas, this flag wasn't the
default, so buggy programs didn't get fixed.  (I seem to remember a claim that
the '286 or '386 port of S5 has no location zero, and the Motorola 68K port may
also have no location zero; alas, none of the fixes made to programs that broke
got folded back into the mainstream S5 releases.)

> (If it comes to that, Sun would probably have preferred not to, but as I
> recall it their early hardware gave them no choice.)

Probably.  At this point, we continue it in order to catch bugs; the Sun-2,
Sun-3, Sun-4, and Sun-386i all have no location zero in user mode.  The Sun-3
and Sun-4, and perhaps the Sun-386i as well, have no location zero in kernel
mode either; this caught a number of null-pointer-dereferencing bugs in the
kernel.

I know at Computer Consoles, our 68K machine had no location 0 accessible from
user mode, because there were no separate kernel and user address spaces, the
kernel portion was read-and-write protected in user mode, and the interrupt
vector had to live at the bottom of the address space and it was considered
part of the kernel.  The same problem may have occurred at Sun.

> Patches to make 4.2 map out zero have been posted in the past; the same
> could probably be done for System V, but far more programs would break.

You bet.  Many of them *did* break when we added them to SunOS - one reason why
I wish the fixes for the '286 and 68K S5 ports had made it back into the
mainstream....

dhesi@bsu-cs.UUCP (Rahul Dhesi) (04/26/88)

[0 stored at location 0, allowing access through null pointer]

A suggestion to system designers, to decrease the frustration of
customers who want to run software developed on systems that return 0
from a dereferenced null pointer.

So long as your memory management hardware is trapping references
through a null pointer and printing an error message, how about
allowing the user to set a switch that will cause such illegal trapped
references to be handled by an emulation routine that will cause a zero
to be returned and continue execution?

Conceptually this is not much different from emulation of illegal
(nonexistent) floating point instructions, or handling of page faults.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) (04/26/88)

In article <50676@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:

  ??>  Both System V and Berklix inherited the  problem,  but  it
  ??>  *mostly*  got  cleaned  out of Berklix by early efforts at
  ??>  Sun that were fed back  to  Berkeley.  It  keeps  creeping
  ??>  back,  since  Berklix  (unlike Sunnix) does not set up its
  ??>  page map to trap accesses to location 0.  Unless it's been
  ??>  fixed  quite  recently,  System  V  still has the problem,
  ??>  since AT&T likewise does not map out 0.

GH> That's somewhat hardware dependent; AT&T's releases don't, but
GH> people  who  port  it to their hardware can.  Also, the paging
GH> S5R2.2 release had a "-z" flag to the linker that set up a "no
GH> page  0"  executable;  alas,  this flag wasn't the default, so
GH> buggy programs didn't get fixed. (I seem to remember  a  claim
GH> that the '286 or '386 port of S5 has no location zero, and the
GH> Motorola 68K port may also have no location zero;  alas,  none
GH> of  the fixes made to programs that broke got folded back into
GH> the mainstream S5 releases.)

Yes, we were going to make -z the default, and then we discovered
that we couldn't map out page 0 of the processes.  It seems that on
a WE32100 chip the "gate" instruction wants the kernel trap
vectors mapped into the user virtual locations starting at 0.  [The
kernel gate tables reside in page 0.]  That's also why we have
to have our programs load into the top half of virtual memory,
making all pointers negative (try subtracting two pointers!).
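
One way to read the "negative pointers" remark, as a small C sketch.  It
assumes a 32-bit address space and models addresses as plain unsigned
integers; the function names are invented for illustration, and nothing
here is specific to the WE32100.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* View a 32-bit address as a signed 32-bit integer, spelled out so that
   it does not rely on implementation-defined narrowing conversions. */
int32_t as_signed32(uint32_t addr)
{
    if (addr >= 0x80000000u)
        return (int32_t)(addr - 0x80000000u) - INT32_MAX - 1;
    return (int32_t)addr;
}

/* Return true if the exact distance hi - lo fits in a signed 32-bit
   integer, i.e. in the natural pointer-difference type of such a machine. */
bool diff_fits_in_32_bits(uint32_t hi, uint32_t lo)
{
    assert(hi >= lo);
    return (hi - lo) <= (uint32_t)INT32_MAX;
}
```

Any address in the top half, say 0xC0001000, comes out negative from
as_signed32(), so code that treats a pointer as an int and tests its sign
is in for a surprise.  Two addresses in the same half still subtract
cleanly, but the distance from 0x00001000 to 0xC0000000 does not fit.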

On ports to other boxes it is possible to have page 0 mapped out.
And most do.
-- 
=Dennis L. Mumaugh
 Lisle, IL       ...!{ihnp4,cbosgd,lll-crg}!cuuxb!dlm

henry@utzoo.uucp (Henry Spencer) (04/26/88)

> ... at Computer Consoles, our 68K machine had no location 0 accessible from
> user mode, because there were no separate kernel and user address spaces, the
> kernel portion was read-and-write protected in user mode, and the interrupt
> vector had to live at the bottom of the address space and it was considered
> part of the kernel.  The same problem may have occurred at Sun.

I believe exactly the same situation occurred on the Sun 1, in fact, since
the 68000 couldn't relocate its interrupt vectors and Sun's MMU had the user
and the kernel sharing the address space.
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

ron@topaz.rutgers.edu (Ron Natalie) (04/26/88)

I believe Gould calls this feature on their system: "Braindamaged VAX
compatibility mode."

-Ron

blarson@skat.usc.edu (Bob Larson) (04/26/88)

[followups redirected to comp.lang.c, this isn't a unix-specific issue.]

In article <2730@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>So long as your memory management hardware is trapping references
>through a null pointer and printing an error message, how about
>allowing the user to set a switch that will cause such illegal trapped
>references to be handled by an emulation routine that will cause a zero
>to be returned and continue execution?

Unfortunately, this would only encourage the development of buggy software,
and slow the fixing of existing buggy software.

I propose adding a flag to executables that tells the system how to
handle null pointer dereferences.  Normally, they should trap to a
fatal error procedure, but an option to trap to a routine that does a
sleep(1), sets the process priority to the lowest possible, and then
returns a 0 of the correct type to the buggy program might be useful
as a porting aid.  (Why should a background process be considered less
important than continuing a known buggy program?)  The reduced speed
should be enough that software vendors can't get away with distributing
broken programs.
--
Bob Larson	Arpa: Blarson@Ecla.Usc.Edu	blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list:	info-prime-request%fns1@ecla.usc.edu
			oberon!fns1!info-prime-request

dkc@hotlr.ATT (Dave Cornutt) (04/26/88)

In article <2730@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
 > A suggestion to system designers, to decrease the frustration of
 > customers who want to run software developed on systems that return 0
 > from a dereferenced null pointer.
 > 
 > So long as your memory management hardware is trapping references
 > through a null pointer and printing an error message, how about
 > allowing the user to set a switch that will cause such illegal trapped
 > references to be handled by an emulation routine that will cause a zero
 > to be returned and continue execution?

Gould already does this.  They have a utility called "prot" in /etc that
can set this to one of three modes: ignore, warn, or fatal.  In ignore
mode, reads of location 0 are emulated and return 0, and writes silently
go in the bit bucket.  Warning mode is the same except that a message
is printed on the terminal (if there's a controlling tty associated with
the offending process) and in the system error log.  In fatal mode, the
offending process gets socked with SIGSEGV.  They strongly recommend
that you put it in fatal mode in /etc/rc and leave it that way unless
ignore or warning mode is needed for some particular reason.  (Note:
these are not the actual mode names; I can't remember them.)
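
The three modes described above could be modeled roughly as follows.  The
mode names and the outcome struct are hypothetical (the actual Gould names
are not given here), and this is only a decision table in C, not a real
trap handler.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three policies. */
enum null_ref_mode { NULL_REF_IGNORE, NULL_REF_WARN, NULL_REF_FATAL };

/* What the system would do in response to one trapped reference. */
struct null_ref_outcome {
    bool deliver_sigsegv;  /* fatal mode: the process gets SIGSEGV */
    bool log_warning;      /* warn mode: message to tty and error log */
    bool write_discarded;  /* emulated write goes in the bit bucket */
    int  read_value;       /* emulated read always yields 0 */
};

struct null_ref_outcome handle_null_ref(enum null_ref_mode mode, bool is_write)
{
    struct null_ref_outcome out = { false, false, false, 0 };

    assert(mode == NULL_REF_IGNORE || mode == NULL_REF_WARN ||
           mode == NULL_REF_FATAL);

    if (mode == NULL_REF_FATAL) {
        out.deliver_sigsegv = true;   /* no emulation at all */
        return out;
    }
    if (mode == NULL_REF_WARN)
        out.log_warning = true;       /* otherwise identical to ignore */
    out.write_discarded = is_write;
    return out;
}
```

The point of the sketch is that warn mode differs from ignore mode only
in the logging, while fatal mode skips the emulation entirely.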

And, in article <Apr.25.20.40.05.1988.10155@topaz.rutgers.edu>, 
ron@topaz.rutgers.edu (Ron Natalie) writes:

> I believe Gould calls this feature on their system: "Braindamaged VAX
> compatibility mode."

Well, not *officially*... :-)
-- 
Dave Cornutt, AT&T Bell Labs (rm 4A406,x1088), Holmdel, NJ
UUCP:{ihnp4,allegra,cbosgd}!hotly!dkc
"The opinions expressed herein are not necessarily my employer's, not
necessarily mine, and probably not necessary"

andrew@frip.gwd.tek.com (Andrew Klossner) (04/26/88)

[]

	"I believe Gould calls this feature on their system:
	"Braindamaged VAX compatibility mode.""

We can sit here and smirk about how ideologically impure these fools
are who want a 0 at location 0, but the real world is full of hoary old
programs that run fine on a VAX and fail on the class of systems that
don't have a 0 at 0.  If I'm a computer center manager searching for a
replacement for my aging 11/780's, and my several-megabyte
Bread-and-Butter Application works on system X but not on system Y, how
much credence do you think the Y salesperson will get from me when she
explains that my program has no business dereferencing 0?

This is a real go/no-go decision in a significant number of sales.
Tektronix has eschewed ivory tower arguments in favor of pragmatism and
taken careful pains to put a double 0 at location 0 on all our U**x
graphic workstation products.  This is the *default* behavior; you
can't expect the salesperson to remember to specify the "-braindamage"
switch when compiling the potential customer's application.

[You could take the cynical attitude that, by putting 0 at 0 by
default, we encourage development of buggy software that won't port to
other vendors' systems and so lock our customers into our product line.
I only just thought of that; we never consciously considered this.]

	"So long as your memory management hardware is trapping
	references through a null pointer and printing an error
	message, how about allowing the user to set a switch that will
	cause such illegal trapped references to be handled by an
	emulation routine that will cause a zero to be returned and
	continue execution?"

The obvious performance improvement here is simply to make this a
load-time switch which causes 0 to be put at location 0.  There's no
need to bother with trapping references.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

jas@llama.rtech.UUCP (Jim Shankland) (04/28/88)

In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>We can sit here and smirk about how ideologically impure these fools
>are who want a 0 at location 0, but the real world is full of hoary old
>programs that run fine on a VAX and fail on the class of systems that
>don't have a 0 at 0.  If I'm a computer center manager searching for a
>replacement for my aging 11/780's, and my several-megabyte
>Bread-and-Butter Application works on system X but not on system Y, how
>much credence do you think the Y salesperson will get from me when she
>explains that my program has no business dereferencing 0?

Personally, all my code assumes there's a 17 at location 0.  It comes
in really handy sometimes, and it works great on my machine, a
Waxahatchy 9400/X; if my program breaks on other machines that stupidly
put some other value at 0, or that read-protect address 0, then those
other machines are just broken.

Aren't they?

Jim Shankland
  ..!ihnp4!cpsc6a!\
               sun!rtech!jas
 ..!ucbvax!mtxinu!/

	And I will show you something different from either
	Your shadow at morning striding behind you
	Or your shadow at evening rising to meet you;
	I will show you fear in a handful of dust.

			-- T. S. Eliot

bob@cloud9.UUCP (Bob Toxen) (04/28/88)

It's not the VAX that is braindamaged so much as the programs.

We added the same fix/brain-damage to our Stratus UNIX product.
Anything that increases the reliability of the customer's system
should be considered an improvement, pretty or ugly.  At my
previous company we let the customers find the refs thru 0.
-- 

Bob Toxen	{ucbvax!ihnp4,harvard,cloud9!es}!anvil!cavu!bob
Stratus Computer, Marlboro, MA
Pilot to Copilot: What's a mountain goat doing way up here in a cloud bank?

wcs@skep2.ATT.COM (Bill.Stewart.<ho95c>) (04/29/88)

In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
> If I'm a computer center manager searching for a replacement
> for my aging 11/780's, and my several-megabyte Bread-and-Butter
> Application works on system X but not on system Y, how much
> credence do you think the Y salesperson will get from me when she
> explains that my program has no business dereferencing 0?

	Sigh.  It's evil, and managers just don't *understand* :-).
We have a manager here who still doesn't trust HPs because the
floating point formats are different, and doesn't trust 3Bs because
they're big-endian.

You can't just fix it by putting a zero into location zero, either,
since people can dereference structure pointers without setting
them first.  (e.g. struct foo { int a,b,c; } *foo; x=foo->c; )
You might tell them that VAX programs that suffer from this problem
will sometimes produce WRONG OUTPUT SILENTLY on VAXen.
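
To make the structure-pointer point concrete, here is a minimal C sketch.
It assumes a null pointer is represented as address 0 and computes which
address a member access would touch, without ever dereferencing anything;
the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct foo { int a, b, c; };

/* Address that `p->c` would touch if p were a null pointer stored as
   address 0: simply the offset of member c within the struct. */
size_t null_deref_address_of_c(void)
{
    return offsetof(struct foo, c);
}

/* Return nonzero if an access of `len` bytes at byte address `addr` lies
   entirely inside a zeroed region of `zero_bytes` bytes starting at 0. */
int covered_by_zero_at_zero(size_t addr, size_t len, size_t zero_bytes)
{
    assert(len > 0);
    return addr + len <= zero_bytes;
}
```

With a single zeroed int at location 0, the access to member a is covered
but the access to member c is not: it lands at least two ints further up
and reads whatever happens to live there.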
-- 
#				Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# skep2 is a local machine I'm trying to turn into a server.  Please send
# mail to ho95c or ho95e instead.  Thanks.

bp@pixar.uucp (There's too much damn' government) (04/29/88)

In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>We can sit here and smirk about how ideologically impure these fools
>are who want a 0 at location 0, but the real world is full of hoary old
>programs that run fine on a VAX and fail on the class of systems that
>don't have a 0 at 0.

NYIT has a product with an embedded PDP-11 Version-6 UNIX, running tons
of hoary 5 and 10 year-old programs. About two years ago I hacked the
memory management of this system to make the low 64 bytes of data space
invalid. That day about a dozen programs that had been buggy for years
were fixed, as null-pointer references started dumping core. Most of
those programs had uninitialized structure pointers that read or wrote
a few words above the zero in the first word of the address space,
causing all kinds of un-traceable problems. Once the system could tell
us about null-pointer references, it was only a few hours work to fix
them.

I think VAX unix should make the low page of user space invalid so that
these bugs would be trapped as they are on the Sun. The only
complication would be with the PDP-11 emulation on older VAXes, which
insists on running out of the low 64k of a process address space. Talk
about brain-damage - there wasn't a base register for the PDP-11
emulation space.

There's no reason for architects of new systems to reduce overall
system reliability by not trapping address-zero references.

					Bruce Perens

bob@cloud9.UUCP (Bob Toxen) (05/01/88)

In article <1766@pixar.UUCP>, bp@pixar.uucp (There's too much damn' government) writes:
> That day about a dozen programs that had been buggy for years
> were fixed...  it was only a few hours work to fix them.
> 
> 					Bruce Perens

Well, you found all cases in commonly used code paths.  The possibility
of cases in rarely used code paths is the reason we emulate the VAX behavior
in our system ... and we hit cases later.
-- 

Bob Toxen	{ucbvax!ihnp4,harvard,cloud9!es}!anvil!cavu!bob
Stratus Computer, Marlboro, MA
Pilot to Copilot: What's a mountain goat doing way up here in a cloud bank?

mike@turing.UNM.EDU (Michael I. Bushnell) (05/01/88)

You know, people love railing against VAX unix...but there is one
reason that you CAN'T make the bottom of "data" un-readable and thus
fix the problem.  The VAX has a linear address space (or the VM hardware
makes it look that way).  There isn't a distinction between a data 
address and a text address, except that one is less than _etext.  If you
marked the bottom page non-readable, you would have massive problems...
you couldn't read start and whatever else makes it into the beginning
of the text space. 

There is a problem; but there isn't an easy way to avoid it, short
of deciding that a unix process should be loaded starting at address
1024 and make the first page non-readable.


                N u m q u a m   G l o r i a   D e o 

			Michael I. Bushnell
			HASA - "A" division
14308 Skyline Rd NE				Computer Science Dept.
Albuquerque, NM  87123		OR		Farris Engineering Ctr.
	OR					University of New Mexico
mike@turing.unm.edu				Albuquerque, NM  87131
{ucbvax,gatech}!unmvax!turing.unm.edu!mike

guy@gorodish.Sun.COM (Guy Harris) (05/02/88)

> There is a problem; but there isn't an easy way to avoid it, short
> of deciding that a unix process should be loaded starting at address
> 1024 and make the first page non-readable.

Which is *precisely* what:

	1) John Bruner's modifications to 4BSD

and

	2) The "-z" flag in paging VAX S5R2.2

do.  You just need a way to tell the kernel about "page zero" versus "no page
zero" executables (so you don't have to recompile all your old binaries), which
COFF already has and which John Bruner added to 4BSD.

You may end up seeing this in some future 4BSD release.

chris@mimsy.UUCP (Chris Torek) (05/02/88)

In article <1013@unmvax.unm.edu> mike@turing.UNM.EDU (Michael I. Bushnell)
writes:
>...there is one reason that you CAN'T make the bottom of "data"
>un-readable and thus fix the problem.  The VAX has a linear address
>space (or the VM hardware makes it look that way). ... If you
>marked the bottom page non-readable, ... you couldn't read start and
>whatever else makes it into the beginning of the text space. 

This is trivial.  `ld' normally starts the text space at 0.  When
linking with the no-zero-page option, it writes a no-zero-page style
magic number, and starts the text space at CLBYTES.  (It might be
safest to start at, say, 8K rather than 1K, in case someone tries
recompiling the kernel with a CLSIZE of more than 2.  Not that this
works in 4.2 and 4.3 BSD; someone confused CLBYTES and MCLBYTES, among
other things.)
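
A small C sketch of the address arithmetic involved, assuming the VAX's
512-byte hardware page and a CLSIZE of 2 (which gives the 1K figure
mentioned above).  The function names are hypothetical; this is not the
code in `ld' or the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define NBPG 512UL  /* VAX hardware page size in bytes */

/* Where text would start: 0 for old-style executables, or one cluster
   (clsize hardware pages) up for no-zero-page executables. */
unsigned long text_start(bool no_zero_page, unsigned long clsize)
{
    assert(clsize >= 1);
    return no_zero_page ? clsize * NBPG : 0UL;
}

/* A reference to address 0 can only fault if nothing is loaded there,
   assuming the kernel leaves everything below the text start unmapped. */
bool null_deref_faults(unsigned long start)
{
    return start > 0;
}
```

With clsize 2 the text starts at 1024; starting at 8K instead corresponds
to a clsize of 16 in this model, which leaves headroom for larger clusters.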
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chip@ateng.UUCP (Chip Salzenberg) (05/04/88)

In article <50676@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>(I seem to remember a claim that the '286 or '386 port of S5 has no
>location zero, and the Motorola 68K port may also have no location zero;
>alas, none of the fixes made to programs that broke got folded back into
>the mainstream S5 releases.)

The following comments apply to the '286 in protected mode.

In small and medium models (<= 64K of data), it is possible to set up the
data segment so that offset zero doesn't exist.  I don't know of any
implementations that bother to do this.  (Which is unfortunate.)

In compact and large models (data limited only by memory/swap space), a
NULL pointer is (in all implementations I know of) 32 bits of zeros.  This
value for NULL causes a protection trap whenever it is dereferenced, since
the '286 defines segment selector zero to mean "no segment".
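
The large-model case can be sketched in C as below.  It assumes a far
pointer held as a 32-bit value with the selector in the high 16 bits and
the offset in the low 16, and it treats a selector as null when its table
index and table-indicator bits are zero, whatever the low two privilege
bits say.  The function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

uint16_t selector_of(uint32_t far_ptr) { return (uint16_t)(far_ptr >> 16); }
uint16_t offset_of(uint32_t far_ptr)   { return (uint16_t)(far_ptr & 0xFFFFu); }

/* Build a far pointer from its two halves. */
uint32_t make_far(uint16_t sel, uint16_t off)
{
    uint32_t p = ((uint32_t)sel << 16) | off;
    assert(selector_of(p) == sel && offset_of(p) == off);
    return p;
}

/* Null selector: index 0 in the global table (bits 15..2 all zero);
   the low two bits are the requested privilege level and are ignored. */
bool is_null_selector(uint16_t sel)
{
    return (sel & 0xFFFCu) == 0;
}

/* True means the access is certain to trap because the selector is null;
   false only means this particular check passes. */
bool dereference_traps(uint32_t far_ptr)
{
    return is_null_selector(selector_of(far_ptr));
}
```

An all-zeros NULL has selector 0 and so traps no matter what offset is
added to it, which also catches the structure-member case.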

-- 
Chip Salzenberg                "chip@ateng.UU.NET" or "codas!ateng!chip"
A T Engineering                My employer may or may not agree with me.
  "I must create a system or be enslaved by another man's." -- Blake