[comp.arch] catering to bad code

phil@amdcad.UUCP (02/18/87)

In a Unix system I am designing, I am considering catering to bad
code. That is, like the VAX I propose to make location 0 contain a
readable 0. I think that code which gets ported to a Sun machine often
has to have this kind of thing cleaned up. 

What do people think of this? Is it kind of disgusting?

Just how much code has this problem? Did every program you ported to a
Sun have to be fixed, or 10%, or something in between? 

I'd like to do things right, but I'm also lazy. I think doing this
will save me a lot of work. Have I overlooked anything? The processor
has a relocatable vector area so I can easily map the vector table
somewhere else.
-- 
 How can I be Asian when I like milk so much?

 Phil Ngai +1 408 982 7840
 UUCP: {ucbvax,decwrl,hplabs,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.dec.com

rpw3@amdcad.UUCP (02/19/87)

In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
+---------------
| In a Unix system I am designing, I am considering catering to bad
| code. That is, like the VAX I propose to make location 0 contain a
| readable 0. I think that code which gets ported to a Sun machine often
| has to have this kind of thing cleaned up. 
| 
| What do people think of this? Is it kind of disgusting?
+---------------

No, no, Mr. Phil, don't do it!  ;-}

Well, if you think you MUST do it (I understand limited time/money and
concerns about unlimited risk), do it this way:

Make the a.out format explicitly support specifying the text starting offset.
(System-V COFF format does, and it can be added to BSD a.out easily enough,
if necessary, with a new magic number.)

Make the default be to NOT map in the page. However, write yourself a little
hack that takes an a.out, INSERTS the page of zeros, and adjusts the header
to reflect the offset of zero (rather than "+1 page").  You won't need to
re-link, since the absolute virtual addresses didn't move; only the existence
of page #0 changed.
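Here is a rough sketch of what such an "addpage0" utility would do to an in-memory image. The header layout (`struct exec_hdr`), the 4 KB page size, and the name `add_page0` are hypothetical simplifications. They are not the real BSD a.out or COFF formats. The point it shows is that the text bytes are copied unchanged one page further into a segment that now starts at 0, so every absolute address in the code stays valid.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u          /* assumed page size */

struct exec_hdr {                /* hypothetical, simplified executable header */
    uint32_t magic;
    uint32_t text_start;         /* virtual address where the text segment begins */
    uint32_t text_size;          /* bytes of text following the header */
};

/* Prepend one page of zeros to the text image so that page 0 gets mapped.
   Returns a new malloc'd text image (caller frees) and fills *out with the
   adjusted header, or returns NULL if the input isn't in the "+1 page"
   layout or memory runs out. */
unsigned char *add_page0(const struct exec_hdr *in, const unsigned char *text,
                         struct exec_hdr *out)
{
    unsigned char *buf;

    if (in->text_start != PAGE_SIZE)
        return NULL;                              /* only handle text at +1 page */
    buf = malloc((size_t)in->text_size + PAGE_SIZE);
    if (buf == NULL)
        return NULL;
    memset(buf, 0, PAGE_SIZE);                    /* the inserted page of zeros */
    memcpy(buf + PAGE_SIZE, text, in->text_size); /* original text, byte for byte */
    *out = *in;
    out->text_start = 0;                          /* segment now starts at 0 */
    out->text_size = in->text_size + PAGE_SIZE;   /* one page longer */
    return buf;
}
```

Because the conversion only adds bytes and rewrites two header fields, it can be applied to existing binaries after the fact. That is what makes the "don't map page 0 by default" policy cheap to live with.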

Benefits:

1. All new code developed on the machine will have the most common bug caught
   by default.

2. Old, bad, dirty, broken code can be made to work with just an execution
   of the "addpage0" utility on the object (AFTER it has blown up the first
   time and you are sure deref'ing a zero pointer is why).

3. You will be able to SLOWLY clean up the ugly code, as time/money permit.


Rob Warnock
Systems Architecture Consultant

UUCP:	{amdcad,fortune,sun}!redwood!rpw3
DDD:	(415)572-2607
USPS:	627 26th Ave, San Mateo, CA  94403

klein@gravity.UUCP (02/19/87)

In article <14833@amdcad.UUCP>, phil@amdcad.UUCP (Phil Ngai) writes:
> In a Unix system I am designing, I am considering catering to bad
> code. That is, like the VAX I propose to make location 0 contain a
> readable 0. I think that code which gets ported to a Sun machine often
> has to have this kind of thing cleaned up. 
> 
> What do people think of this? Is it kind of disgusting?

It's not just an issue of porting to a Sun or other machine, it's an issue of
relying on code to do something that it does only on a subset of machines.  A
well-written program does not read at location 0 because some potential
platforms do not allow it.  A system that allows location 0 to be read
(or written) only encourages bad programming.  My vote is an emphatic NO!
--
	Mike Klein		klein@sun.{arpa,com}
	Sun Microsystems, Inc.	{ucbvax,hplabs,ihnp4,seismo}!sun!klein
	Mountain View, CA

guy@gorodish.UUCP (02/19/87)

>It's not just an issue of porting to a Sun or other machine, it's an issue of
>relying on code to do something that it does only on a subset of machines.

It's even more than that.  Programs that try to use location 0
because they're trying to dereference null pointers do so because
they have *bugs* in them - the core dump is just a more emphatic way
of pointing this out than various flavors of maybe-correct behavior
are.  VAX/VMS takes location 0 out of the address space by default,
probably for just that reason.

I can understand the desire for expediency, but I very strongly
support Rob Warnock's desire that you *not* make this the default
behavior.  Even though some paging versions of S5, at least, permit
you to define images that will be run without location 0, it's *not*
the default; as a result, I've had to fix a number of bugs that
*they* should have fixed.

ron@brl-sem.UUCP (02/19/87)

In article <14833@amdcad.UUCP>, phil@amdcad.UUCP (Phil Ngai) writes:
> In a Unix system I am designing, I am considering catering to bad
> code. That is, like the VAX I propose to make location 0 contain a
> readable 0. I think that code which gets ported to a Sun machine often
> has to have this kind of thing cleaned up. 
> 
I like the GOULD approach.  The board that traps access to location 0
(among other out-of-bounds memory addresses) can be set to do one of three
things: just ignore the attempt (the user appears to have accessed the
location, but gets nothing back if it is outside his memory limits; 0 is
never in a user address space there), print a message in the log file, or
memory-fault the process (and print a message in the log file).  This allows
you to turn on careful mode or revert to VAX bad-code compatibility mode.

-Ron

firth@sei.cmu.edu.UUCP (02/19/87)

In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
>In a Unix system I am designing, I am considering catering to bad
>code. That is, like the VAX I propose to make location 0 contain a
>readable 0. I think that code which gets ported to a Sun machine often
>has to have this kind of thing cleaned up. 
>
>What do people think of this? Is it kind of disgusting?

On the VAX-11 under VMS, the bottom 512 bytes of the program virtual
address space is mapped out, and trying to access it crashes your
code.

When I ported systems code to the Vax, this was the SINGLE MOST USEFUL
THING ON THE MACHINE.  I simply could not believe how many old and
trusted programs were in fact wrong, and finally broke.  All those
places where the code looked at p->next before checking for p==NULL.
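A minimal C illustration of the pattern Firth describes, assuming an ordinary singly linked list. The names `struct node` and `last_node` are hypothetical. The buggy form appears only in a comment. The function shows the fix, which is to test the pointer itself before touching any field through it.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* The buggy pattern, shown only as a comment:
 *
 *     while (p->next != NULL)     reads p->next even when p == NULL
 *         p = p->next;
 *
 * With page 0 readable, an empty list reads whatever lives at a small
 * address, which may or may not look like NULL; with page 0 unmapped the
 * very first test faults and the bug is found immediately. */

/* Return the last node of a list, or NULL for an empty list. */
struct node *last_node(struct node *p)
{
    if (p == NULL)              /* check the pointer before using p->next */
        return NULL;
    while (p->next != NULL)
        p = p->next;
    return p;
}
```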

Tricky code is one thing.  But, in my opinion, for truly "bad" code,
ie code that explicitly violates the semantics of the language
standard, the best remedy is to crash it disgustingly as hard and as
soon as possible.  Then it gets fixed.

>Just how much code has this problem? Did every program you ported to a
>Sun have to be fixed, or 10%, or something in between? 

Too much (again in my opinion), and partly because too many implementors
are "kind" to bad code.

> Phil Ngai +1 408 982 7840
> UUCP: {ucbvax,decwrl,hplabs,allegra}!amdcad!phil
> ARPA: amdcad!phil@decwrl.dec.com

But please remember that this is just the eccentric and intolerant 
opinion of

Robert Firth

howard@cpocd2.UUCP (02/19/87)

In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
> In a Unix system I am designing, I am considering catering to bad
> code. That is, like the VAX I propose to make location 0 contain a
> readable 0.
> What do people think of this? Is it kind of disgusting?

I agree with Rob and others who have said No (don't do it) and Yes
(it's disgusting).

It's not just location 0.  It's all small integers, positive and
negative.  I once made a typo which left legal code that managed to
pass lint but still dereferenced address 1 (or was it 3?), treating
it as a (char *) and printing the string (i.e. the contents of low
memory) out onto the user's terminal.  The terminal was emulating a
VT100 and low memory just happened to contain several copies of
the "Control String Initiator" character.  The result was that the
terminal would hang waiting for a "Control String Terminator" that
never came.  Explaining this bug to customers who had just lost
minutes/hours of work was not pleasant.  Explaining to management how
it had occurred and gotten by me unnoticed AND GOTTEN THROUGH SIX
WEEKS OF SOFTWARE QA UNNOTICED *AND* *BEEN* *SHIPPED* *TO* *EVERY*
*CUSTOMER* was even less enjoyable.  On a machine without read access
to page 0 this bug would have caused a simple core dump and been
easy to find and fix.

Approximately,
	INTENDED CODE:	printf("%s is %d","name",value);
	ACTUAL CODE:	printf("%s is %d",value);
(Note: this was before "printfck", which finds this error easily.)
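The compile-time check that printfck provided is built into today's GCC and Clang (`-Wformat`). A printf-style wrapper can opt in with the `format` function attribute. The sketch below assumes one of those compilers. The helper name `format_msg` and the `PRINTF_LIKE` macro are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* GCC/Clang extension: have -Wformat check each call's arguments against
   the format string, the way printfck checked printf calls. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical helper: format a message into buf, printf-style. */
int format_msg(char *buf, size_t len, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);   /* bounded, unlike sprintf */
    va_end(ap);
    return n;                           /* length the full message needed */
}
```

With this declaration, `format_msg(buf, sizeof buf, "%s is %d", "name", value)` compiles cleanly. The shipped mistake, `format_msg(buf, sizeof buf, "%s is %d", value)`, draws a warning at compile time instead of dumping low memory onto a terminal.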

It is safer to make page 0 *and* page -1 both inaccessible to catch all
uses of small integers as pointers, no matter which sign.  And perhaps
even all the pages addressable by shorts.  Actually, you'd like it to be
true that you couldn't accidentally use ANY integer in place of a pointer,
but this is not easy on an untagged machine unless you are willing to make
most or all of your address space inaccessible.  ;-)  Perhaps this is an
argument for tagged architectures.

Use of location 0 is a bug.  It is not portable to many machines and
operating systems (e.g. VMS).  And lots of early UNIX programs do it. :^(
-- 

	Howard A. Landman
	...!intelca!mipos3!cpocd2!howard

davis@unc.UUCP (02/20/87)

In article <554@aw.sei.cmu.edu.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP 
(Robert Firth) writes:
>Tricky code is one thing.  But, in my opinion, for truly "bad" code,
>ie code that explicitly violates the semantics of the language
>standard, the best remedy is to crash it disgustingly as hard and as
>soon as possible.  Then it gets fixed.

It is obvious you are only an operating system maker and don't ever
have to use code someone else wrote.  As a part time computer user (as
well as part time architect), I find any tool or operating system that
"crashes disgustingly" to be highly wasteful of my time.  Much of the
software that I use was written by somebody else at another site.
Getting him to fix his questionable practice may not be as easy as you
indicate.  I agree that it should be done right the first time, but
"two wrongs don't make a right."  

The other issue here is that the differences may not show up at compile
time.  You have now introduced a "bug" into the newly installed
software.  Not only will a lot of user time be wasted, but a lot of
system or applications programmer time will be spent chasing down this
bug.  How many guaranteed bugs are you willing to generate as a systems
programmer?

-------------------------------
Mark Davis  (davis@unc.cs.unc.edu)

zs01#@andrew.cmu.edu.UUCP (02/20/87)

I think compilers and operating systems should go out of their way to punish
bad code. Things like unmapping the zero page, and initializing the stack
with something other than 0s are good examples of what I have in mind. Bad
code is due to one of two things, either the programmer is being lazy, or
he/she just didn't know about it. In the latter case, I am sure that the
programmer would prefer to know he/she screwed up instead of letting somebody
else find out about it. Even more important is the fact that a lot of this
"bad code" is flaky and a genuine pain to debug, regardless of the issue of
portability.

As an example, indirecting a NULL function pointer on an IBM RT will
effectively jump to the main() function of your program. Because of this, we
had a lot of fun one day trying to figure out why a certain program was
"infinitely tail-recursive". Indirecting a NULL pointer is likely to be an
unintentional error. Giving the guy a 0 just means that it will take longer
to find the bug.

Another bug I have seen twice in the last week is the following:

struct foo *InitFoo()
{

    register struct foo *tempFoo = (struct foo *) malloc(sizeof(struct foo));

    tempFoo->bar = 1;
    tempFoo->bletch = 2;
    tempFoo->baz = "loselose";
}

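Notice there is no return statement there! Surprisingly, this code will work
on a lot of machines. The compiler wisely decides that it can use the return
register as a temporary (since there is no return value), so the pointer from
malloc() is still sitting in that register when the function falls off its
end, and the caller picks it up as though it had been returned.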
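For comparison, here is a corrected version with the return statement restored. It also checks malloc()'s result, since an unchecked malloc() is itself a common source of null-pointer dereferences. The field types in `struct foo` are guesses, because the post doesn't show the declaration. Current compilers flag the original with a "control reaches end of non-void function" warning (GCC's `-Wreturn-type`, enabled by `-Wall`).

```c
#include <stdlib.h>

struct foo {                 /* field types assumed; not shown in the post */
    int bar;
    int bletch;
    const char *baz;
};

struct foo *InitFoo(void)
{
    struct foo *tempFoo = malloc(sizeof *tempFoo);

    if (tempFoo == NULL)     /* malloc can fail: don't dereference NULL */
        return NULL;
    tempFoo->bar = 1;
    tempFoo->bletch = 2;
    tempFoo->baz = "loselose";
    return tempFoo;          /* the statement the original left out */
}
```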

Sincerely,
Zalman Stern
Information Technology Center
Carnegie Mellon
(ARPA) zs01#@andrew.cmu.edu
(UUCP) try something like
...seismo!rochester!pt.cs.cmu.edu!andrew.cmu.edu!zs01#

john@uw-nsr.UUCP (02/20/87)

In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
>In a Unix system I am designing, I am considering catering to bad
>code. That is, like the VAX I propose to make location 0 contain a
>readable 0. I think that code which gets ported to a Sun machine often
>has to have this kind of thing cleaned up. 

Well, I think I can comment on this.  For the last couple of years
I have been (marooned) on a Data General MV/10000 system.  As many
of the readers of this newsgroup are probably aware this system has
different hardware representations for "byte" pointers and "word"
pointers.  They also have "bit" pointers, but never having had to 
use one I really can't comment on them.  I have seen postings from
Steve Wallach in this newsgroup from time to time.  I hear he knows
something about MV's :-)

At any rate it has been my experience that something like 70 - 80 %
of the C programs I have ported have had pointer problems.  There are 
a few exceptions to the rule, for example XLISP, rn and mkmf.  

Many of the programs that I have ported have tried to dereference
through NULL.  On the MV series this does not work, at all.  However 
a much larger number of C programs are sloppy and don't cast pointer 
types appropriately.  On the MV series this causes a protection violation
and you are left looking at a few lines of traceback information.  

>What do people think of this? Is it kind of disgusting?

No comment.  I could vote either way right about now.

>Just how much code has this problem? Did every program you ported to a
>Sun have to be fixed, or 10%, or something in between? 

Too much code has this problem.  However, there are people working on
correcting the bad programs.  I guess they will eventually die out, 
although not soon enough for me.  However, only a couple of weeks ago 
my boss asked me how much work it would be to get "refer" running on 
our system.  Aack!

>I'd like to do things right, but I'm also lazy. I think doing this
>will save me a lot of work. Have I overlooked anything? The processor
>has a relocatable vector area so I can easily map the vector table
>somewhere else.

It would have saved me a lot of work.

To be fair to Data General, they have developed (probably in self-
defense) a top-notch C compiler that is very good at finding problems
with pointer type mismatches and subscript range errors.  To be fair
to me they still don't have a dbx that can be used for serious
debugging.

-- 
John Sambrook                           Work: (206) 545-7433
University of Washington WD-12          Home: (206) 487-0180
Seattle, Washington  98195              UUCP: uw-beaver!uw-nsr!john

news@cit-vax.UUCP (02/20/87)

Organization: California Institute of Technology
From: jon@oddhack.Caltech.Edu (Jon Leech)

In article <954@uw-nsr.UUCP> john@uw-nsr.UUCP (John Sambrook 5-7433) writes:
>Many of the programs that I have ported have tried to dereference
>through NULL.  On the MV series this does not work, at all.  However 
>a much larger number of C programs are sloppy and don't cast pointer 
>types appropriately.  On the MV series this causes a protection violation
>and you are left looking at a few lines of traceback information.  

	I'm glad they fixed this. When I was using an early version
of DG/UX a few years ago, an incorrectly cast pointer in one program
resulted in a system crash. This made the debugging turnaround cycle 
rather long until the cast was corrected. On the bright side, making 
the program in question (a screen editor) run on the MV architecture 
nailed almost every remaining portability bug. Putting the same program
on an 80286 Xenix got the rest. I recommend this technique for developing 
truly portable C code.

    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
    Caltech Computer Science Graphics Group
    __@/

jans@stalker.UUCP (02/20/87)

In article <436@cpocd2.UUCP> howard@cpocd2.UUCP (Howard A. Landman) writes:

>It's not just location 0.  It's all small integers, positive and
>negative...  On a machine without read access to page 0 this bug
>would have caused a simple core dump...  It is safer to make page 0
>*and* page -1 both inaccessible... perhaps even all the pages
>addressable by shorts...  Use of location 0 is a bug.

Hold on one second, there.  Let's not make those with a legitimate use
of low memory pointers pay for such things!  Such idiocy would mean
that certain machines, such as the NS32000, would lose the speed
advantage of "base page addressing", or whatever you want to call it.
Good National assembly coders typically keep their SB register at 0x40
for best code density and speed in memory indirection.  Such code often
requires C interface code to talk through such pointers.

Another NS32000 application I did was a Z80 simulator.  The Z80 code
was loaded directly in the first 64K so that Z80 pointers could be used
directly.  While the opcode simulator was assembly, BIOS routines were
all written in C, and (as any CP/M programmer knows) location zero must
be accessed.

This is kind of far from the original subject, and I agree that the
bad code should be cleaned up.  C is a wonderful vehicle for accessing
machine resources -- don't cripple it just because some people are
incapable of using those resources correctly!  Find the jokers and make
them use Ada instead!

:::::: Artificial   Intelligence   Machines   ---   Smalltalk   Project ::::::
:::::: Jan Steinman		Box 1000, MS 60-405	(w)503/685-2956 ::::::
:::::: tektronix!tekecs!jans	Wilsonville, OR 97070	(h)503/657-7703 ::::::

klein@gravity.UUCP (02/20/87)

In article <636@brl-sem.ARPA>, ron@brl-sem.ARPA (Ron Natalie <ron>) writes:
> I like the GOULD approach.  The board that traps access to location 0
> among other out of bound memory addresses can be set to just ignore
> the attempt (the user appears to have accessed the location, but doesn't
> get anything if it is outside his memory limits, 0 is never in a user
> address space there), print a message in the logfile, or memory fault
> the process (and print a message in the log file).  This allows you
> to turn on careful mode or revert to VAX bad-code compatibility mode.

Only the last option is acceptable, because:

1.  Ignoring the attempt to access unmapped address: hides a bug that could
	be caught at the disallowed access and may manifest itself in an
	unbelievably obscure manner later.  Or, since the action of the
	program with this option might be the intended one, the bug is
	not detected, not fixed, and pops up sometime later in the
	development cycle (during porting) when it is much more expensive
	to fix.
2.  Printing a message in a log file: too easy to miss; how do you correlate
	the message with the access that caused it?  A core dump tells you
	exactly where the bug is and what the environment was at the time.
3.  Memory fault: that's what the access is, a memory fault, and the system
	should stop right then and there because further execution will only
	mask the bug more.

A point was brought up by a software user where he mentioned that time is
wasted when a program core dumps.  This is true, but the alternative can be
much more costly.  If the program accesses an unmapped address, it has a bug
in it, and is therefore not correct.  Its results are suspect, and it is
better that it inform the user in no uncertain manner that it is doing
something wrong rather than go on and produce wrong answers that might
look OK.
--
	Mike Klein		klein@sun.{arpa,com}
	Sun Microsystems, Inc.	{ucbvax,hplabs,ihnp4,seismo}!sun!klein
	Mountain View, CA

phil@sequent.UUCP (02/20/87)

In my experience, having a zero of at least 4 bytes at zero is extremely
useful.  I have talked to customers who were very concerned about this.
Apparently, they have fought battles trying to port to machines that
did not have this feature.  For our DYNIX software, we settled on:

	1. an entire readonly page of zeros at zero by default
	2. a loader flag that makes page zero invalid

Using number 2 allows you to create more portable code or fix bugs
in existing code before having to port to a machine without a zero at zero.
NOTE: Having 1 be the default does save lots of headaches.
-- 
----
Phil Hochstetler
Sequent Computer Systems
Beaverton, Oregon

guy@gorodish.UUCP (02/21/87)

>As a part time computer user (as well as part time architect), I find
>any tool or operating system that "crashes disgustingly" to be highly
>wasteful of my time.

So do I.  That's why I agree 100% with Firth that programs that
dereference null pointers should "crash disgustingly" on all
implementations - so that they crash in the developer's lap, not
mine.

>Much of the software that I use was written by somebody else at another site.
>Getting him to fix his questionable practice may not be as easy as you
>indicate.

That's entirely beside the point.  There is no *guarantee* that the
code would work correctly even if you *did* permit dereferencing of
null pointers!  If you don't permit dereferencing null pointers, a
program that tries to dereference a null pointer is guaranteed to
fail.  If you do, the program might work, but then again it might fail in
some mysterious way, and the consequences of the failure discovered
much later (when it might be *too* late).

This is *not* just ivory-tower speculation.  See Howard Landman's
article <436@cpocd2.UUCP>.

>The other issue here is that the differences may not show up at compile
>time.  You have now introduced a "bug" into the newly installed
>software.

This resembles an annoying practice called "blaming the victim".  The
compiler/OS authors didn't introduce *any* bug whatsoever!  The bug was
in the original code; the author of that code was just lucky in that
their code ran with no obvious visible symptoms.  

>Not only will a lot of user time be wasted, but a lot of system or
>applications programmer time will be spent chasing down this
>bug.  How many guaranteed bugs are you willing to generate as a systems
>programmer?

What is this "guaranteed bugs" stuff?  If it's guaranteed that
some person will put out code that will fail if you can't dereference
null pointers, then *that person* generated that bug, not the people
who created the implementation that forbids dereferencing null
pointers!  The person in question should be re-educated, using a
salary continuation program as incentive, if necessary.  Go yell at
him or her for the time you spent chasing down his or her bug, not at
the compiler/OS implementor!

guy@gorodish.UUCP (02/22/87)

>In my experience, having a zero of at least 4 bytes at zero is extremely
>useful.

But is not necessarily possible, convenient, or desirable on all
machines.

>I have talked to customers who were very concerned about this.
>Apparently, they have fought battles trying to port to machines that
>did not have this feature.

Who's to say they won't fight battles trying to port to machines that
don't have the same byte order, or the same floating-point format, or
the same address space layout, or the same... as the VAX?

>Using number 2 allows you to create more portable code or fix bugs
>in existing code

Bugs is bugs.  If the code has a bug, it is still a bug even if it *may*
happen to work on a machine with 0 at location 0, and even that is *not*
guaranteed; consider, e.g., that somebody might be treating a null pointer,
or a *non*-zero value pointed to by that pointer, as a termination
condition.

>NOTE: Having 1 be the default does save lots of headaches.

And creates lots more.  Mapping in page zero is the default on the
3B2's linker; this means *we* get to fix a bunch of bugs that are
*not* our fault, but the fault of programmers at AT&T-IS.

mash@mips.UUCP (02/22/87)

In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
>In a Unix system I am designing, I am considering catering to bad
>code. That is, like the VAX I propose to make location 0 contain a
>readable 0. I think that code which gets ported to a Sun machine often
>has to have this kind of thing cleaned up. ....
[followed by many articles, mostly against allowing zero in there,
with a few allowing the option.]

1) At MIPS we left 0 out of the process image, for all the reasons
that others have cited, although the next version has a ld option
to move text to 0, just in case.  However, on at least some machines you might
be able to write a SIGSEGV catcher that stepped past the offending
instruction [which after all, can't be right], or maybe even stuck
a zero in the right register. Doing this implies the need to
semi-interpret the offending instruction and bump the PC past it.
This is anything from trivial to difficult, depending on the machine,
but is at least a possibility for someone who doesn't control the
OS on their machine, or has one whose architecture doesn't have
a zero location in user space at all.
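Mashey's version decodes the faulting instruction and skips it. That part depends entirely on the machine: the saved-PC layout, instruction lengths, and which register to zero all vary. The sketch below does something coarser. It catches SIGSEGV (and SIGBUS, which some systems raise instead) and uses `siglongjmp` to return to a recovery point. That is enough to find out whether the current system lets a program read location 0. The function name `location_zero_faults` is hypothetical. Jumping out of a handler for a hardware fault is common practice, but POSIX leaves some of its details to the implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_point;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(recover_point, 1);    /* abandon the faulting access */
}

/* volatile, so the compiler cannot prove the pointer is null and
   delete the access or replace it with a trap instruction */
static int *volatile probe_ptr = NULL;

/* Returns 1 if reading location 0 faulted, 0 if it was readable
   (in which case *value receives what was read there). */
int location_zero_faults(int *value)
{
    struct sigaction sa, old_segv, old_bus;
    int faulted;

    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGSEGV, &sa, &old_segv);   /* error checks omitted in this sketch */
    sigaction(SIGBUS, &sa, &old_bus);

    /* savemask = 1: siglongjmp restores the mask, unblocking SIGSEGV again */
    if (sigsetjmp(recover_point, 1) == 0) {
        *value = *probe_ptr;              /* the questionable read */
        faulted = 0;
    } else {
        faulted = 1;                      /* arrived here from on_fault */
    }

    sigaction(SIGSEGV, &old_segv, NULL);  /* put the previous handlers back */
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

On systems that leave page 0 unmapped, this returns 1. On a system that maps a page of zeros at 0, it returns 0 and stores 0 in `*value`.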

2) The discussion reminds me of a similar effect that happened somewhere
around 1970.  At Penn State, we were switching from a 360/67 to
a 370/165.  360's required data items to be placed on natural boundaries
[words on words, halfwords on halfwords, etc].  If you disobeyed you
got a specification exception.  370's don't make that restriction.
People at our computer center, as I recall, actually requested a price
from IBM to get our 370 modified to restore the boundary requirement,
because they'd found that in the (large number of) bugs they'd reported to IBM,
SOMETHING LIKE 30-40% HAD FIRST SHOWN UP AS SPECIFICATION ERRORS!
(We didn't get this feature, but people tried.)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

wombat@ccvaxa.UUCP (02/23/87)

/* Written 11:33 pm  Feb 19, 1987 by john@uw-nsr.UUCP in ccvaxa:comp.arch */
At any rate it has been my experience that something like 70 - 80 %
of the C programs I have ported have had pointer problems.
/* End of text from ccvaxa:comp.arch */

Grepping through the Gould UTX 2.0 sources, I found 19 standard 4.3BSD
programs (out of about 420) had been altered so as to survive in the
crash-'em-on-null-pointer-dereferences environment. Or rather, that's
how many had the change commented, and any change to user-level code is
supposed to be commented.

Around here we run some machines with protect-bit on (crash offending
programs) and some with it off, but anything to be released must be
tested on a machine with protect-bit on. The ability to turn it off is
useful if you have binary-only 3rd-party software that misbehaves.


"My words say what you hear them say, but the movements of my mouth
indicate that I am telling a series of humorous stories in Yiddish."
R.A. Lafferty, *The Devil Is Dead*		Wombat
					ihnp4!uiucdcs!ccvaxa!wombat

aglew@ccvaxa.UUCP (02/23/87)

...> Null pointer dereferencing - catering to VAXisms in new machines

Gould's "prot" approach, optionally ignoring, logging, or coredumping
protection violations (low memory) has its advantages for running
binary only 3rd party code.

But, if you use it, make it optional per process, not global
per system.
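Here is a sketch of the per-process version of the Gould-style choice. The three cases follow Ron Natalie's description: ignore the access, log it, or memory-fault the process. The enum values, `struct proc`, and `low_mem_access` are hypothetical and are not Gould's actual interface. The one structural point is that the policy is a field in each process's state rather than a system-wide switch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-process policy for accesses to unmapped low memory. */
enum low_mem_policy { LOWMEM_IGNORE, LOWMEM_LOG, LOWMEM_FAULT };

struct proc {
    int pid;
    enum low_mem_policy policy;    /* chosen per process, not per system */
};

enum access_result { ACCESS_READS_ZERO, ACCESS_SIGSEGV };

/* Decide what happens when process p touches low address addr. */
enum access_result low_mem_access(const struct proc *p, uintptr_t addr,
                                  FILE *log)
{
    switch (p->policy) {
    case LOWMEM_IGNORE:
        return ACCESS_READS_ZERO;                  /* silently "works" */
    case LOWMEM_LOG:
        fprintf(log, "pid %d: read of low address 0x%lx\n",
                p->pid, (unsigned long)addr);
        return ACCESS_READS_ZERO;                  /* logged, still "works" */
    case LOWMEM_FAULT:
    default:
        fprintf(log, "pid %d: read of low address 0x%lx, memory fault\n",
                p->pid, (unsigned long)addr);
        return ACCESS_SIGSEGV;                     /* careful mode */
    }
}
```

Keeping the policy per process lets binary-only third-party code run in a compatibility mode while everything built in-house still faults.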

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa

henry@utzoo.UUCP (Henry Spencer) (02/24/87)

Ultimately, sometimes you just gotta be realistic about these things.  Yes,
it is a Good Thing to make location 0 nonexistent so that code which touches
it will drop dead.  This will point out a great many bugs, which should be
fixed.  However, that is little comfort if you are on a tight schedule and
the buggy code did somehow deliver the right answer.  The "*NULL == '\0'"
assumption is rather widespread, and it is often harmless, in the sense that
it doesn't usually result in wrong answers.  Making *NULL dump core is
the right thing to do in the long term, but it does have a serious short-
term impact on how soon you can ship the bloody system.  Expediency may
dictate providing some short-term workaround, like an optional ld flag
that makes *NULL work but logs the problem somewhere for later attention.
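Code that leans on the "*NULL == '\0'" assumption typically tests the first character of a string that might itself be a null pointer. The hypothetical helper below, `name_or_default`, gives the same "NULL means empty" behaviour explicitly, so it keeps working where location 0 is unmapped.

```c
#include <stddef.h>

/* The location-0-dependent idiom, shown only as a comment:
 *
 *     if (*name == '\0')          name may be NULL here
 *         name = "default";
 *
 * It works only where location 0 is readable and holds 0. */

/* Treat a null pointer and an empty string alike without reading *NULL. */
const char *name_or_default(const char *name, const char *fallback)
{
    if (name == NULL || *name == '\0')
        return fallback;
    return name;
}
```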

Note that you will be living with the problem *forever* unless you make
people go out of their way when they want *NULL, so don't make it the default.
-- 
Legalize			Henry Spencer @ U of Toronto Zoology
freedom!			{allegra,ihnp4,decvax,pyramid}!utzoo!henry

lamaster@pioneer.UUCP (02/25/87)

In article <28200009@ccvaxa> wombat@ccvaxa.UUCP writes:
>
>Grepping through the Gould UTX 2.0 sources, I found 19 standard 4.3BSD
>programs (out of about 420) had been altered so as to survive in the
>crash-'em-on-null-pointer-dereferences environment. Or rather, that's
>how many had the change commented, and any change to user-level code is
>supposed to be commented.
>
>R.A. Lafferty, *The Devil Is Dead*		Wombat
>					ihnp4!uiucdcs!ccvaxa!wombat
Which ones are they?



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg}!ames!pioneer!lamaster 
  NASA Ames Research Center  ARPA lamaster@ames-pioneer.arpa
  Moffett Field, CA 94035    ARPA lamaster@pioneer.arc.nasa.gov
  Phone:  (415)694-6117      ARPA lamaster@ames.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

wunder@hpcea.UUCP (02/27/87)

The HP-UX C compilers have flags that allow you to force traps on null
pointer derefs, or to force accepting a null pointer deref (you get a 0
at that location).  The default behavior depends upon the actual
implementation.

Providing both choices to the user is obviously the best idea.

Walter Underwood
wunder@hplabs

-----------------------------------------

	  -z		 Do not	bind anything to  address  zero.  This
			 option	 will  allow runtime detection of null
			 pointers. See the note	on pointers below .

	  -Z		 Allow dereferencing of	null pointers. See the
			 note on pointers below.

     Pointers

	    Accessing  the  object  of	a  NULL	 (zero)	  pointer   is
	    technically	 illegal, (see Kernighan and Ritchie) but many
	    systems have permitted it in the past.  The	 following  is
	    provided   to  maximize  importability  of	code.  If  the
	    hardware is	able to	return zero for	reads of location zero
	    (when accessing at least 8 and 16 bit quantities), it must
	    do so unless the -z	flag is	present. The -z	flag  requests
	    that SIGSEGV be generated if an access to location zero is
	    attempted.	Writes of location zero	 may  be  detected  as
	    errors  even  if  reads  are  not.	If the hardware	cannot
	    assure that	location zero acts as if it was	initialized to
	    zero  or  is locked	at zero, the hardware should act as if
	    the	-z flag	is always set.

-----------------------------------------
The above man page excerpt is almost certainly copyrighted by HP.

jpa@celerity.UUCP (02/27/87)

In article <436@cpocd2.UUCP> howard@cpocd2.UUCP (Howard A. Landman) writes:
>In article <14833@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
>> In a Unix system I am designing, I am considering catering to bad
>> code. That is, like the VAX I propose to make location 0 contain a
>> readable 0.
>> What do people think of this? Is it kind of disgusting?
>
>I agree with Rob and others who have said No (don't do it) and Yes
>(it's disgusting).
>
>It's not just location 0.  It's all small integers, positive and
>negative.
> ...
>Use of location 0 is a bug.  It is not portable to many machines and
>operating systems (e.g. VMS).  And lots of early UNIX programs do it. :^(
>-- 

When we were designing Celerity's first product, I pushed for trapping on
page 0 references.  I lost out to arguments much like those here in the
news.  We also took no special precaution to see that address 0 has a 0 in
it.  In retrospect, I think we would have been better off making page 0
unreferenceable.

Almost all Celerity a.out's begin with the string 0x10, 0x80, 0x51, 0x0.
Using 0 as a pointer is now affectionately referred to as 'The Q bug',
referring to its visible attributes when printed as a string.  The net
effect was that 'Q bugs' were common, most of which were relatively
harmless, but enough of which were quite nasty.  Because we don't trap, it
is hard to say how many of these bugs exist today.
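Why "Q": 0x51 is ASCII 'Q', while 0x10 is a control character and 0x80 lies outside 7-bit ASCII, so printing a null char pointer as a string shows just a Q. A small check of that, using a copy of the quoted bytes rather than real address 0; page0 and visible_chars() are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

/* The leading bytes of a typical Celerity a.out, as quoted above. With
   page 0 readable, a null char pointer points at these bytes. */
static const char page0[] = { 0x10, (char)0x80, 0x51, 0x00 };

/* Copy the printable characters of the C string s into out (room for
   outlen bytes, outlen >= 1) and return how many were copied. */
size_t visible_chars(const char *s, char *out, size_t outlen)
{
	size_t n = 0;

	for (; *s != '\0' && n + 1 < outlen; s++)
		if (isprint((unsigned char)*s))
			out[n++] = *s;
	out[n] = '\0';
	return n;
}
```

Under the default "C" locale, visible_chars(page0, buf, sizeof buf) copies a single character, 'Q'.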

Due to the way Celerity manages shared text using a context identifier in
the hardware, the kernel can distinguish a fetch-for-execute from a
fetch-for-data.  It can take special action (e.g., signal the process) on
page 0 fetch-for-data references.  To fix a number of hard-to-find bugs I
have had to generate kernels to do exactly that for in-house developer use.
Someone was in my office yesterday asking if I still had that kernel around.
(We never productized this behavior, for the same reasons that we didn't make
page 0 unreferenceable. A site-configurable implementation is possible but
has never been proposed.)
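A sketch of the decision such a kernel could make on a page 0 reference. The page size, the names, and the assumption that the fault path receives an instruction-versus-data flag are all illustrative; the post says only that the hardware context lets the kernel tell the two kinds of fetch apart.

```c
#include <stdint.h>

/* Hypothetical names and page size, for illustration only. */
#define PAGE0_SIZE 4096u

enum access_kind { FETCH_EXECUTE, FETCH_DATA };
enum fault_action { NO_SPECIAL_ACTION, SIGNAL_PROCESS };

/* Only data references to page 0 are singled out; code that lives on
   page 0 (the a.out text starts there) keeps running. */
enum fault_action page0_policy(uintptr_t addr, enum access_kind kind)
{
	if (addr >= PAGE0_SIZE)
		return NO_SPECIAL_ACTION;	/* not page 0 */
	if (kind == FETCH_EXECUTE)
		return NO_SPECIAL_ACTION;	/* instruction fetch from text */
	return SIGNAL_PROCESS;			/* data access via a (near-)null pointer */
}
```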

I still can't accept the argument that says that a machine should be
designed to tolerate a certain class of bad code.  The problem exists due to
the fact that machines tolerated it in the first place.  Would you argue
that a machine should return 0 if a program accesses undefined virtual
memory? (or -1, maybe? ;-)).

john@uw-nsr.UUCP (02/27/87)

In article <28200009@ccvaxa> wombat@ccvaxa.UUCP writes:
>
>/* Written 11:33 pm  Feb 19, 1987 by john@uw-nsr.UUCP in ccvaxa:comp.arch */
>At any rate it has been my experience that something like 70 - 80 %
>of the C programs I have ported have had pointer problems.
>/* End of text from ccvaxa:comp.arch */
>
>Grepping through the Gould UTX 2.0 sources, I found 19 standard 4.3BSD
>programs (out of about 420) had been altered so as to survive in the
>crash-'em-on-null-pointer-dereferences environment. Or rather, that's
>how many had the change commented, and any change to user-level code is
>supposed to be commented.

There are a couple of factors here that I feel need to be taken into
account.  Sorry that I did not make them clear in my first posting.
I would also like to thank "wombat" for pointing out where I was not
clear.
	
First, the code that I was talking about when I said 70 - 80 % had 
pointer problems was not the code from 4.3BSD.  Rather, it was the 
standard assortment of C code posted to mod.sources and net.sources
over the last year or so.  Owing to its relative newness this code 
is naturally less well tested than the code in 4.3BSD.

Second, when I say "pointer problems" I was not restricting my figures
to programs that dereference through null, though certainly a large 
number of programs do this.  An equally serious problem from my 
perspective is that a number of programs do something like this:

	typedef struct _mumble 
	{
		int foo;
		char bar;
	} mumble;

	f()
	{
		mumble	m;
		int	fd;

		if (read(fd, &m, sizeof(m)) == -1)
			.	
			.    
			.
	}

Everyone sees the (hopefully only one) bug in this code, right?
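The post leaves the bug as an exercise. One plausible reading is that fd is used without ever being opened; in addition, the test treats any return other than -1 as success, so a short read goes unnoticed. A minimal sketch of a cleaned-up version follows; read_mumble() and its error convention are hypothetical. Note that it still assumes the file was written by a machine with the same int size, byte order, and structure padding, which is a portability hazard of its own.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef struct _mumble
{
	int foo;
	char bar;
} mumble;

/* Hypothetical helper: fill *m from the start of the file at path.
   Returns 0 on success, -1 on open/read error or if the file ends
   before a whole mumble has been read. */
int read_mumble(const char *path, mumble *m)
{
	char *p = (char *)m;
	size_t got = 0;
	int fd = open(path, O_RDONLY);	/* the original never sets fd */

	if (fd == -1)
		return -1;
	while (got < sizeof *m) {
		ssize_t n = read(fd, p + got, sizeof *m - got);

		if (n == -1 && errno == EINTR)
			continue;		/* interrupted; retry */
		if (n <= 0) {			/* error, or EOF too early */
			close(fd);
			return -1;
		}
		got += (size_t)n;
	}
	close(fd);
	return 0;
}
```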

I suppose that this is getting away from the charter for this newsgroup.
I have directed follow-ups to comp.lang.c, where I suspect the bulk of
the readership is thoroughly bored with this topic.  I know I am :-)

-- 
John Sambrook                           Work: (206) 545-7433
University of Washington WD-12          Home: (206) 487-0180
Seattle, Washington  98195              UUCP: uw-beaver!uw-nsr!john

news@cit-vax.UUCP (02/27/87)

Organization : California Institute of Technology
From: jon@oddhack.Caltech.Edu (Jon Leech)

In article <6620001@hpcea.HP.COM> wunder@hpcea.HP.COM (Walter Underwood) writes:
>The HP-UX C compilers have flags that allow you to force traps on null
>pointer derefs, or to force accepting a null pointer deref (you get a 0
>at that location).  The default behavior depends upon the actual
>implementation.
>
>Providing both choices to the user is obviously the best idea.
	True enough. A pity HP doesn't do it.
	The HP 9000/300 C compiler has these flags; unfortunately, they
are no-ops. NULL is always mapped whether you want it or not. I think
they only work on series 500 machines. 

    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
    Caltech Computer Science Graphics Group
    __@/

wombat@ccvaxa.UUCP (02/28/87)

/*Written  2:23 pm  Feb 25, 1987 by lamaster@pioneer.arpa in ccvaxa:comp.arch*/
Which ones are they?
/* End of text from ccvaxa:comp.arch */

awk, csh, make, su, dump, ifconfig, timed, battlestar, games/hunt, biff,
lastcomm, systat, telnet, w, at, atq, lookbib, refer, uucp, and yacc.
The worst offender is awk, where 8 code changes were made. Most of the
others were changed in only one place.


"My words say what you hear them say, but the movements of my mouth
indicate that I am telling a series of humorous stories in Yiddish."
R.A. Lafferty, *The Devil Is Dead*		Wombat
					ihnp4!uiucdcs!ccvaxa!wombat

phil@amdcad.UUCP (03/01/87)

I'm glad to see some people are mildly interested in this subject. :-)
I didn't make very clear the purpose of the Unix system I'm designing.
It is intended to be a throw-away. We only want to bring it up to show
that Unix can be brought up, and to see how fast it runs.  We don't
intend to sell this software as a product. As such, our only interest
is in doing it fast and to be able to run as much existing software as
possible. 

I think the best thing to do is make accessing null pointers a
segmentation violation by default and be able to allow it as needed.
Perhaps we could discuss the best method of doing this, so as to try
to provide a common mechanism. HP's method, for example, sounds
interesting. Should the choice be made at link time, at load time, at
system configuration time, or something else? Should there be special
magic numbers to ask the kernel to load with null pointers returning 0
on a data read? Etc. 

As an unlikely example, suppose you bought a binary from a vendor who
used a permissive kernel, and yours was strict: the binary would need
some way to ask for the permissive behavior.
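One way to answer the load-time question, sketched under assumptions: the linker records a flag bit in the executable header (a distinct magic number would serve equally well), and the kernel consults it at exec time. A vendor binary built for a permissive kernel would carry the bit and still get its zero page on a strict system. The flag name, the header layout, and map_zero_page() are all hypothetical, not taken from any real a.out or COFF definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical header flag; not from any real a.out or COFF format. */
#define HDR_NULL_READ_OK 0x0001u

struct exec_hdr_sketch {
	uint16_t magic;
	uint16_t flags;
};

/* Decide at exec time whether to map a read-only page of zeros at
   address 0. Strict is the default (page 0 unmapped, so a null
   dereference gets SIGSEGV); a binary can opt in through its header,
   or the administrator can make the whole system permissive. */
bool map_zero_page(const struct exec_hdr_sketch *h, bool system_permissive)
{
	if (system_permissive)
		return true;
	return (h->flags & HDR_NULL_READ_OK) != 0;
}
```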

Someone mentioned that just returning 0 for the first 2 or 4 bytes was
not enough, as there is code that accesses structure members with null
structure pointers. (how perverse can you get?) Is this really a
problem? Have many people seen this? 
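Why zero in just the first 2 or 4 bytes falls short: on the machines discussed here, p->member compiles to a load from p plus the member's offset, so with p null the load lands at that offset rather than at 0. The sketch below uses offsetof to compute that address without dereferencing anything; struct record and zero_region_covers() are hypothetical. A readable zero page would have to span the largest such offset the code actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical structure with a member well past offset 0. */
struct record {
	char name[64];
	int count;
};

/* A read of p->member with p == NULL touches bytes starting at
   offsetof(type, member), not address 0. Report whether a zero-filled
   region of zero_bytes bytes at address 0 would cover that read. */
bool zero_region_covers(size_t zero_bytes, size_t member_offset,
			size_t member_size)
{
	return member_offset + member_size <= zero_bytes;
}
```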

-- 
 I'd rather be compatible than right.

 Phil Ngai +1 408 982 7840
 UUCP: {ucbvax,decwrl,hplabs,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.dec.com

bct@its63b.UUCP (03/04/87)

In article <15014@amdcad.UUCP> phil@amdcad.UUCP (Phil Ngai) writes:
>I'm glad to see some people are mildly interested in this subject. :-)
>I didn't make very clear the purpose of the Unix system I'm designing.
>It is intended to be a throw-away. We only want to bring it up to show
>that Unix can be brought up, and to see how fast it runs.  We don't
>intend to sell this software as a product. As such, our only interest
>is in doing it fast and to be able to run as much existing software as
>possible. 
>
>-- 
> I'd rather be compatible than right.
>

 Did the nice marketing man with a suit and tie tell you the bit about
"..intended to be a throw-away. We only want to bring it up to show that .."?

 They do that to all the systems people all the time. Hardware people get it 
too. "..just put this MC68020 on a board to show that we can do it..", and
 before you know it the company has a new product.

  Where I worked (a large corporation in New England) they always sang me
the "just a throw-away" line. There's no such thing. When it's done someone
will like it and buy it. Then you'll have to maintain it. If not you then
someone else will get the job. You'll get bug reports complaining about bugs
you knew were there all the time. You'll get bug reports about features. 
You'll write accurate documentation explaining how it's all a kludge, and the
editors will expunge those bits. The millstone will be around your neck for
years and years.

 Do it properly the first time. Do it properly every time. Do a job to be
proud of. Make checking segment violations the default behaviour, make it
hard to turn them off. [But put in a way of turning them off, just to save your
ass.] I recommend turning off the segmentation violations/memory violations
or whatever in software, and having them always flagged by hardware.

 Don't you ever wonder how so much of the stuff out there is pure junk? It's
the marketing people telling you ".. it's just this little demo... after the
show we can put it right..". It never gets put right, they put you on a new
project before your feet hit the floor.

  Brian
-- 
> Brian Tompsett. Department of Computer Science, University of Edinburgh    <
>                   JMCB, The King's Buildings, Mayfield Road, Edinburgh,
>                        Scotland, EH9 3JZ.
>E-Mail:     JANET: bct@uk.ac.ed.ecsvax
>           USENET: bct@ecsvax.ed.ac.uk
>             UUCP: seismo!mcvax!ukc!ecsvax.ed.ac.uk!bct
>             ARPA: bct%ecsvax.ed.ac.uk@ucl-cs
>           BITNET: psuvax1.bitnet!ecsvax.ed.ac.uk!bct
>                or bct%uk.ac.ed.ecsvax@earn.rl.ac.uk
>Phone:  +44 31 667 1081 x 3332

aglew@ccvaxa.UUCP (03/13/87)

...> Phil Ngai wondering if he should make low memory dereferencing
...> fault, or tolerate BSD VAX style code.

Well, I wrote a first response to this last week, and then promptly
got bitten by it.

Gould traps read protection violations, but a kernel flag can be set
to ignore, log (to user and console), or send SIGSEGV or SIGSTACK
to the faulting process. The administrative program to do this is
called "prot".

Anyway, like any development shop, we always run with "prot abort".
Except that last week somebody had to run a cross-assembler that
had been written on VAX BSD. It died. But they *really* *had* to
run it, so they turned "prot ignore" on.

Now, "prot ignore" applies system-wide. I was coding that night and
made some silly errors. When I came back the next morning, "prot abort"
had been turned back on, and I was surprised to find that 'formerly working'
code from the night before had mysteriously broken.

Come the weekend and some free time, half an hour of code, an hour of
compiling, two hours of documentation and three hours of testing...
"prot" can now be turned on per-process. When or if this will make it into
a product I don't know, but it made me feel better.

Tidbits that may be useful: put the per-process flag into the proc, not the
u. The syscall that sets or reads it should take a pid as an argument,
mainly so that a utility program can perform

    	prot_syscall( getppid(), PROT_IGNORE )

applied to your shell session, if you are about to use a lot of VAXish code,
and don't want to bother prot'ing each of them. Make it inherited.
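These tidbits can be modeled in user space as a minimal sketch: the mode lives in a per-process table entry (standing in for the proc, which in classic UNIX kernels is resident for every process, whereas the u area belongs to the running process and may be swapped out), it can be set by pid, and fork copies it to the child. The PROT_* names echo the post; the table, proc_create_root(), fork_proc(), prot_set(), and prot_fault() are hypothetical. The sketch also assumes that "log" reports the fault and lets the process continue.

```c
#include <stdio.h>

/* Hypothetical user-space model of the per-process prot mode. */
enum prot_mode { PROT_ABORT, PROT_LOG, PROT_IGNORE };

#define NPROC 16

struct proc {			/* per-process entry, like the kernel's proc */
	int pid;
	int ppid;
	enum prot_mode prot;
};

static struct proc proctab[NPROC];
static int nproc;

static struct proc *pfind(int pid)
{
	for (int i = 0; i < nproc; i++)
		if (proctab[i].pid == pid)
			return &proctab[i];
	return NULL;
}

/* Create the first process with an explicit mode. */
int proc_create_root(int pid, enum prot_mode mode)
{
	if (nproc == NPROC || pfind(pid) != NULL)
		return -1;
	proctab[nproc++] = (struct proc){ pid, 0, mode };
	return 0;
}

/* fork: the child inherits the parent's mode. */
int fork_proc(int parent, int child)
{
	struct proc *pp = pfind(parent);

	if (pp == NULL || nproc == NPROC || pfind(child) != NULL)
		return -1;
	proctab[nproc++] = (struct proc){ child, parent, pp->prot };
	return 0;
}

/* Set the mode of any process by pid, e.g. the caller's parent shell. */
int prot_set(int pid, enum prot_mode mode)
{
	struct proc *p = pfind(pid);

	if (p == NULL)
		return -1;
	p->prot = mode;
	return 0;
}

/* Read-protection fault: return 1 if the process should get SIGSEGV. */
int prot_fault(int pid)
{
	struct proc *p = pfind(pid);

	if (p == NULL || p->prot == PROT_ABORT)
		return 1;
	if (p->prot == PROT_LOG)
		fprintf(stderr, "prot: pid %d: read protection violation\n", pid);
	return 0;
}
```

Setting PROT_IGNORE on the shell before starting VAX-derived programs gives each of them the permissive mode through inheritance, while every other process on the system stays at PROT_ABORT.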

Something that I didn't do, but which might be useful: it might be nice to
attach this flag to executable files, so that buggy VAX code can be
executed without clogging your system logs. Also, it would be nice to
be able to distinguish system programs, like csh and as, for which it
would be nice to keep a log of protection violations, from user programs
whose errors have no place in a system log. Distinguishing
on the basis of uid is easy, but insufficient.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms.arpa