guy@gorodish.Sun.COM (Guy Harris) (04/25/88)
> This convention started in split-space programs on the pdp11/45 under V6,
> as an implementation accident: the data space really did begin at 0 in
> a split-space program, so an anonymous variable was inserted there to make
> sure that no normal variable had address 0, and unfortunately said variable
> was initialized to 0.

Any split I&D program with a data+BSS segment larger than 8K could be set up to remove 0 from the address space; just make the bottommost data page a "grow down from the top" page and stop just before the last click. Programs with smaller data segments could have the BSS padded. The same could be done for non-split-I&D programs by doing the same to the bottommost page (text or data). This would have introduced some extra complications; I don't know whether it would have been worth it or not. Old binaries would still have to have been supported, probably.

> (I hesitate to blame Dennis in particular; he is probably responsible for
> the anonymous variable, but it would have required unusual foresight to
> predict the problems and put some strange value there instead of zero.)

Dennis sent me mail a while ago indicating that John Reiser had done a paging 32V and that it did *not* have a location 0 in the address space; he referred to this as one of the best things about that system. He also indicated that this system had other interesting features, such as mapped files. Alas, this system never got out the door....

> Both System V and Berklix inherited the problem, but it *mostly* got cleaned
> out of Berklix by early efforts at Sun that were fed back to Berkeley. It
> keeps creeping back, since Berklix (unlike Sunnix) does not set up its page
> map to trap accesses to location 0. Unless it's been fixed quite recently,
> System V still has the problem, since AT&T likewise does not map out 0.

That's somewhat hardware dependent; AT&T's releases don't, but people who port it to their hardware can. Also, the paging S5R2.2 release had a "-z" flag to the linker that set up a "no page 0" executable; alas, this flag wasn't the default, so buggy programs didn't get fixed. (I seem to remember a claim that the '286 or '386 port of S5 has no location zero, and the Motorola 68K port may also have no location zero; alas, none of the fixes made to programs that broke got folded back into the mainstream S5 releases.)

> (If it comes to that, Sun would probably have preferred not to, but as I
> recall it their early hardware gave them no choice.)

Probably. At this point, we continue it in order to catch bugs; the Sun-2, Sun-3, Sun-4, and Sun-386i all have no location zero in user mode. The Sun-3 and Sun-4, and perhaps the Sun-386i as well, have no location zero in kernel mode either; this caught a number of null-pointer-dereferencing bugs in the kernel.

I know at Computer Consoles, our 68K machine had no location 0 accessible from user mode, because there were no separate kernel and user address spaces, the kernel portion was read-and-write protected in user mode, and the interrupt vector had to live at the bottom of the address space and it was considered part of the kernel. The same problem may have occurred at Sun.

> Patches to make 4.2 map out zero have been posted in the past; the same
> could probably be done for System V, but far more programs would break.

You bet. Many of them *did* break when we added them to SunOS - one reason why I wish the fixes for the '286 and 68K S5 ports had made it back into the mainstream....
dhesi@bsu-cs.UUCP (Rahul Dhesi) (04/26/88)
[0 stored at location 0, allowing access through null pointer]

A suggestion to system designers, to decrease the frustration of customers who want to run software developed on systems that return 0 from a dereferenced null pointer.

So long as your memory management hardware is trapping references through a null pointer and printing an error message, how about allowing the user to set a switch that will cause such illegal trapped references to be handled by an emulation routine that will cause a zero to be returned and continue execution?

Conceptually this is not much different from emulation of illegal (nonexistent) floating point instructions, or handling of page faults.
--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi
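A minimal user-level sketch of the emulation idea suggested above, assuming a POSIX system. A real implementation would live in the kernel's trap handler and would have to decode the faulting instruction; here a hypothetical helper, `emulated_read`, just wraps a single load in `sigsetjmp`/`siglongjmp`, and a `PROT_NONE` page obtained from the equally hypothetical `make_trap_page` stands in for an unmapped page 0.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANONYMOUS and sigaction in strict -std modes */
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;    /* single global: this sketch is not thread-safe */

/* Fault handler: abandon the faulting load and resume inside emulated_read(). */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Read *p; if the load traps, behave as if the location had contained 0. */
int emulated_read(const volatile int *p)
{
    struct sigaction sa, old_segv, old_bus;
    int value;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);   /* some systems report SIGBUS instead */

    if (sigsetjmp(fault_env, 1) == 0)
        value = *p;                     /* may trap */
    else
        value = 0;                      /* the "emulation routine" */

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return value;
}

/* Stand-in for an unmapped page 0: one page with no access permissions. */
void *make_trap_page(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *page;

    assert(page_size > 0);
    page = mmap(NULL, (size_t)page_size, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(page != MAP_FAILED);
    return page;
}
```

With these helpers, `emulated_read` hands back the stored value for an ordinary `int` and 0 for any address inside the page returned by `make_trap_page`.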
dlm@cuuxb.ATT.COM (Dennis L. Mumaugh) (04/26/88)
In article <50676@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
??> Both System V and Berklix inherited the problem, but it
??> *mostly* got cleaned out of Berklix by early efforts at
??> Sun that were fed back to Berkeley. It keeps creeping
??> back, since Berklix (unlike Sunnix) does not set up its
??> page map to trap accesses to location 0. Unless it's been
??> fixed quite recently, System V still has the problem,
??> since AT&T likewise does not map out 0.
GH> That's somewhat hardware dependent; AT&T's releases don't, but
GH> people who port it to their hardware can. Also, the paging
GH> S5R2.2 release had a "-z" flag to the linker that set up a "no
GH> page 0" executable; alas, this flag wasn't the default, so
GH> buggy programs didn't get fixed. (I seem to remember a claim
GH> that the '286 or '386 port of S5 has no location zero, and the
GH> Motorola 68K port may also have no location zero; alas, none
GH> of the fixes made to programs that broke got folded back into
GH> the mainstream S5 releases.)
Yes, we were going to make -z the default and then we discovered
that we couldn't map out page 0 of the processes. Seems that on
a WE32100 chip the "gate" instruction wants the kernel trap
vectors mapped into the user virtual locations starting at 0. [The
kernel gate tables reside in page 0.] That's also why we have
to have our programs load into the top half of virtual memory
making all pointers negative (try subtracting two pointers!).
On ports to other boxes it is possible to have page 0 mapped out.
And most do.
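
One way to see the hazard of top-half addresses, as a sketch only: the helpers below (all names hypothetical) model a 32-bit address reinterpreted as a signed integer, which is what effectively happens to code that keeps pointers in `int` variables. They are not meant to reproduce the behavior of any particular WE32100 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit address as a signed 32-bit integer (two's complement). */
int32_t as_signed(uint32_t addr)
{
    int64_t wide = (int64_t)addr;

    if (addr >= UINT32_C(0x80000000))
        wide -= INT64_C(4294967296);    /* subtract 2^32 */
    assert(wide >= INT32_MIN && wide <= INT32_MAX);
    return (int32_t)wide;
}

/* The old idiom "a pointer kept in an int is valid if it is positive". */
int looks_valid_signed(uint32_t addr)
{
    return as_signed(addr) > 0;
}

/* Ordering of two addresses when they are compared as signed integers. */
int signed_less(uint32_t a, uint32_t b)
{
    return as_signed(a) < as_signed(b);
}
```

Under this model every address in the top half fails the "positive means valid" test, and an address just above 0x80000000 compares as lower than one just below it, the reverse of the unsigned ordering.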
--
=Dennis L. Mumaugh
Lisle, IL ...!{ihnp4,cbosgd,lll-crg}!cuuxb!dlm
henry@utzoo.uucp (Henry Spencer) (04/26/88)
> ... at Computer Consoles, our 68K machine had no location 0 accessible from
> user mode, because there were no separate kernel and user address spaces, the
> kernel portion was read-and-write protected in user mode, and the interrupt
> vector had to live at the bottom of the address space and it was considered
> part of the kernel. The same problem may have occurred at Sun.

I believe exactly the same situation occurred on the Sun 1, in fact, since the 68000 couldn't relocate its interrupt vectors and Sun's MMU had the user and the kernel sharing the address space.
--
"Noalias must go. This is | Henry Spencer @ U of Toronto Zoology
non-negotiable." --DMR | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
ron@topaz.rutgers.edu (Ron Natalie) (04/26/88)
I believe Gould calls this feature on their system: "Braindamaged VAX compatibility mode."

-Ron
blarson@skat.usc.edu (Bob Larson) (04/26/88)
[followups redirected to comp.lang.c, this isn't a unix-specific issue.]

In article <2730@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>So long as your memory management hardware is trapping references
>through a null pointer and printing an error message, how about
>allowing the user to set a switch that will cause such illegal trapped
>references to be handled by an emulation routine that will cause a zero
>to be returned and continue execution?

Unfortunately, this would only encourage the development of buggy software, and slow the fixing of existing buggy software.

I propose adding a flag to executables that tells the system how to handle null pointer dereferences. Normally, they should trap to a fatal error procedure, but an option to trap to a routine that does a sleep(1), sets the process priority to the lowest possible, and then returns a 0 of the correct type to the buggy program might be useful as a porting aid. (Why should a background process be considered less important than continuing a known buggy program?) The reduced speed should be enough so software vendors can't get away with distributing broken programs.
--
Bob Larson Arpa: Blarson@Ecla.Usc.Edu blarson@skat.usc.edu
Uucp: {sdcrdcf,cit-vax}!oberon!skat!blarson
Prime mailing list: info-prime-request%fns1@ecla.usc.edu
oberon!fns1!info-prime-request
dkc@hotlr.ATT (Dave Cornutt) (04/26/88)
In article <2730@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
> A suggestion to system designers, to decrease the frustration of
> customers who want to run software developed on systems that return 0
> from a dereferenced null pointer.
>
> So long as your memory management hardware is trapping references
> through a null pointer and printing an error message, how about
> allowing the user to set a switch that will cause such illegal trapped
> references to be handled by an emulation routine that will cause a zero
> to be returned and continue execution?

Gould already does this. They have a utility called "prot" in /etc that can set this to one of three modes: ignore, warn, or fatal. In ignore mode, reads of location 0 are emulated and return 0, and writes silently go in the bit bucket. Warning mode is the same except that a message is printed on the terminal (if there's a controlling tty associated with the offending process), and in the system error log. In fatal mode, the offending process gets socked with SIGSEGV. They strongly recommend that you put it in fatal mode in /etc/rc and leave it that way unless ignore or warning mode is needed for some particular reason. (Note: these are not the actual mode names; I can't remember them.)

And, in article <Apr.25.20.40.05.1988.10155@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> I believe Gould calls this feature on their system: "Braindamaged VAX
> compatibility mode."

Well, not *officially*... :-)
--
Dave Cornutt, AT&T Bell Labs (rm 4A406,x1088), Holmdel, NJ
UUCP:{ihnp4,allegra,cbosgd}!hotly!dkc
"The opinions expressed herein are not necessarily my employer's, not necessarily mine, and probably not necessary"
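
The three modes described above can be sketched as a small policy function. Everything here is hypothetical: the names are invented (the post itself notes that "ignore", "warn", and "fatal" are not Gould's actual mode names), and the function only reports what should happen to a trapped read of location 0 rather than touching any real trap machinery.

```c
#include <assert.h>
#include <stdio.h>

enum null_policy  { NULL_IGNORE, NULL_WARN, NULL_FATAL };
enum null_outcome { OUTCOME_CONTINUE, OUTCOME_KILL };

struct null_result {
    enum null_outcome outcome;  /* resume the process or kill it */
    int value;                  /* value handed back to the faulting read */
    int warned;                 /* nonzero if a warning was issued */
};

/* Decide what to do with a trapped read of location 0 under a given policy. */
struct null_result handle_null_read(enum null_policy policy, FILE *log)
{
    struct null_result r = { OUTCOME_CONTINUE, 0, 0 };

    switch (policy) {
    case NULL_IGNORE:           /* emulate the read silently */
        break;
    case NULL_WARN:             /* emulate the read, but say so */
        if (log != NULL)
            fprintf(log, "warning: read through null pointer emulated as 0\n");
        r.warned = 1;
        break;
    case NULL_FATAL:            /* a real system would deliver SIGSEGV here */
        r.outcome = OUTCOME_KILL;
        break;
    default:
        assert(0 && "unknown null-pointer policy");
        r.outcome = OUTCOME_KILL;
        break;
    }
    return r;
}
```

Keeping the decision separate from the trap handler makes the default easy to change in one place, which matches the recommendation to leave the system in fatal mode unless there is a specific reason not to.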
andrew@frip.gwd.tek.com (Andrew Klossner) (04/26/88)
[]

"I believe Gould calls this feature on their system: "Braindamaged VAX compatibility mode.""

We can sit here and smirk about how ideologically impure these fools are who want a 0 at location 0, but the real world is full of hoary old programs that run fine on a VAX and fail on the class of systems that don't have a 0 at 0. If I'm a computer center manager searching for a replacement for my aging 11/780's, and my several-megabyte Bread-and-Butter Application works on system X but not on system Y, how much credence do you think the Y salesperson will get from me when she explains that my program has no business dereferencing 0? This is a real go/no-go decision in a significant number of sales.

Tektronix has eschewed ivory tower arguments in favor of pragmatism and taken careful pains to put a double 0 at location 0 on all our U**x graphic workstation products. This is the *default* behavior; you can't expect the salesperson to remember to specify the "-braindamage" switch when compiling the potential customer's application.

[You could take the cynical attitude that, by putting 0 at 0 by default, we encourage development of buggy software that won't port to other vendors' systems and so lock our customers into our product line. I only just thought of that; we never consciously considered this.]

"So long as your memory management hardware is trapping references through a null pointer and printing an error message, how about allowing the user to set a switch that will cause such illegal trapped references to be handled by an emulation routine that will cause a zero to be returned and continue execution?"

The obvious performance improvement here is simply to make this a load-time switch which causes 0 to be put at location 0. There's no need to bother with trapping references.

-=- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP]
(andrew%tekecs.tek.com@relay.cs.net) [ARPA]
jas@llama.rtech.UUCP (Jim Shankland) (04/28/88)
In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>We can sit here and smirk about how ideologically impure these fools
>are who want a 0 at location 0, but the real world is full of hoary old
>programs that run fine on a VAX and fail on the class of systems that
>don't have a 0 at 0. If I'm a computer center manager searching for a
>replacement for my aging 11/780's, and my several-megabyte
>Bread-and-Butter Application works on system X but not on system Y, how
>much credence do you think the Y salesperson will get from me when she
>explains that my program has no business dereferencing 0?

Personally, all my code assumes there's a 17 at location 0. It comes in really handy sometimes, and it works great on my machine, a Waxahatchy 9400/X; if my program breaks on other machines that stupidly put some other value at 0, or that read-protect address 0, then those other machines are just broken. Aren't they?

Jim Shankland
..!ihnp4!cpsc6a!\
sun!rtech!jas
..!ucbvax!mtxinu!/

And I will show you something different from either
Your shadow at morning striding behind you
Or your shadow at evening rising to meet you;
I will show you fear in a handful of dust.
-- T. S. Eliot
bob@cloud9.UUCP (Bob Toxen) (04/28/88)
It's not the VAX that is braindamaged so much as the programs. We added the same fix/brain-damage to our Stratus UNIX product. Anything that increases the reliability of the customer's system should be considered an improvement, pretty or ugly. At my previous company we let the customers find the refs thru 0.
--
Bob Toxen {ucbvax!ihnp4,harvard,cloud9!es}!anvil!cavu!bob
Stratus Computer, Marlboro, MA
Pilot to Copilot: What's a mountain goat doing way up here in a cloud bank?
wcs@skep2.ATT.COM (Bill.Stewart.<ho95c>) (04/29/88)
In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
> If I'm a computer center manager searching for a replacement
> for my aging 11/780's, and my several-megabyte Bread-and-Butter
> Application works on system X but not on system Y, how much
> credence do you think the Y salesperson will get from me when she
> explains that my program has no business dereferencing 0?

Sigh. It's evil, and managers just don't *understand* :-). We have a manager here who still doesn't trust HPs because the floating point formats are different, and doesn't trust 3Bs because they're big-endian.

You can't just fix it by putting a zero into location zero, either, since people can dereference structure pointers without setting them first. (i.e. struct foo { int a,b,c; } *foo; x=foo->c; )

You might tell them that VAX programs that suffer from this problem will sometimes produce WRONG OUTPUT SILENTLY on VAXen.
--
# Thanks;
# Bill Stewart, AT&T Bell Labs 2G218, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs
# skep2 is a local machine I'm trying to turn into a server. Please send
# mail to ho95c or ho95e instead. Thanks.
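
To make the structure-pointer point concrete, here is a small sketch that computes which address an access to member `c` touches, without actually dereferencing anything. The helper name `address_of_c` is hypothetical, and the pointer is modelled as a plain integer so the arithmetic is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo { int a, b, c; };

/* Address touched by p->c, computed arithmetically instead of by dereference. */
uintptr_t address_of_c(uintptr_t p)
{
    assert(p <= UINTPTR_MAX - offsetof(struct foo, c));    /* no wrap-around */
    return p + offsetof(struct foo, c);
}
```

For a null (zero) base the result is `offsetof(struct foo, c)`, which lies past the first two `int` members, so a single zero word stored at location 0 does not determine what `foo->c` reads.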
bp@pixar.uucp (There's too much damn' government) (04/29/88)
In article <9946@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) writes:
>We can sit here and smirk about how ideologically impure these fools
>are who want a 0 at location 0, but the real world is full of hoary old
>programs that run fine on a VAX and fail on the class of systems that
>don't have a 0 at 0.

NYIT has a product with an embedded PDP-11 Version-6 UNIX, running tons of hoary 5 and 10 year-old programs. About two years ago I hacked the memory management of this system to make the low 64 bytes of data space invalid. That day about a dozen programs that had been buggy for years were fixed, as null-pointer references started dumping core. Most of those programs had uninitialized structure pointers that read or wrote a few words above the zero in the first word of the address space, causing all kinds of un-traceable problems. Once the system could tell us about null-pointer references, it was only a few hours work to fix them.

I think VAX unix should make the low page of user space invalid so that these bugs would be trapped as they are on the Sun. The only complication would be with the PDP-11 emulation on older VAXes, which insists on running out of the low 64k of a process address space. Talk about brain-damage - there wasn't a base register for the PDP-11 emulation space.

There's no reason for architects of new systems to reduce overall system reliability by not trapping address-zero references.

Bruce Perens
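
A sketch of why the size of the invalid region matters for the structure-pointer bugs described above. The two struct layouts and the helper `trapped_by_guard` are hypothetical illustrations; the only figure taken from the post is the 64-byte region.

```c
#include <assert.h>
#include <stddef.h>

struct small { int flags; int count; char *name; };    /* hypothetical layout */
struct big   { char buffer[256]; int length; };         /* hypothetical layout */

/* Does an access `offset` bytes above a null base land inside an invalid
 * region of `guard_bytes` bytes starting at address 0? */
int trapped_by_guard(size_t offset, size_t guard_bytes)
{
    assert(guard_bytes > 0);
    return offset < guard_bytes;
}
```

With a 64-byte region, a null `struct small *` is caught on any member, while `length` in `struct big` sits at offset 256 or beyond and slips past; a larger invalid region would catch that access too.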
bob@cloud9.UUCP (Bob Toxen) (05/01/88)
In article <1766@pixar.UUCP>, bp@pixar.uucp (There's too much damn' government) writes:
> That day about a dozen programs that had been buggy for years
> were fixed... it was only a few hours work to fix them.
>
> Bruce Perens

Well, you found all cases in commonly used code paths. The possibility of cases in rarely used code paths is the reason we emulate the VAX behavior in our system ... and we hit cases later.
--
Bob Toxen {ucbvax!ihnp4,harvard,cloud9!es}!anvil!cavu!bob
Stratus Computer, Marlboro, MA
Pilot to Copilot: What's a mountain goat doing way up here in a cloud bank?
mike@turing.UNM.EDU (Michael I. Bushnell) (05/01/88)
You know, people love railing against VAX unix...but there is one reason that you CAN'T make the bottom of "data" un-readable and thus fix the problem. The VAX has a linear address space (or the VM hardware makes it look that way). There isn't a distinction between a data address and a text address, except that one is less than _etext. If you marked the bottom page non-readable, you would have massive problems... you couldn't read start and whatever else makes it into the beginning of the text space.

There is a problem; but there isn't an easy way to avoid it, short of deciding that a unix process should be loaded starting at address 1024 and make the first page non-readable.

N u m q u a m G l o r i a D e o Michael I. Bushnell HASA - "A" division 14308 Skyline Rd NE Computer Science Dept. Albuquerque, NM 87123 OR Farris Engineering Ctr. OR University of New Mexico mike@turing.unm.edu Albuquerque, NM 87131 {ucbvax,gatech}!unmvax!turing.unm.edu!mike
guy@gorodish.Sun.COM (Guy Harris) (05/02/88)
> There is a problem; but there isn't an easy way to avoid it, short
> of deciding that a unix process should be loaded starting at address
> 1024 and make the first page non-readable.

Which is *precisely* what:

	1) John Bruner's modifications to 4BSD

and

	2) The "-z" flag in paging VAX S5R2.2

do. You just need a way to tell the kernel about "page zero" versus "no page zero" executables (so you don't have to recompile all your old binaries), which COFF already has and which John Bruner added to 4BSD. You may end up seeing this in some future 4BSD release.
chris@mimsy.UUCP (Chris Torek) (05/02/88)
In article <1013@unmvax.unm.edu> mike@turing.UNM.EDU (Michael I. Bushnell) writes:
>...there is one reason that you CAN'T make the bottom of "data"
>un-readable and thus fix the problem. The VAX has a linear address
>space (or the VM hardware makes it look that way). ... If you
>marked the bottom page non-readable, ... you couldn't read start and
>whatever else makes it into the beginning of the text space.

This is trivial. `ld' normally starts the text space at 0. When linking with the no-zero-page option, it writes a no-zero-page style magic number, and starts the text space at CLBYTES. (It might be safest to start at, say, 8K rather than 1K, in case someone tries recompiling the kernel with a CLSIZE of more than 2. Not that this works in 4.2 and 4.3 BSD; someone confused CLBYTES and MCLBYTES, among other things.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
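
The linker decision described above can be sketched as follows. The `exec_kind` values are hypothetical stand-ins for the two magic numbers (the real values are not given in the post), and `NBPG` and `CLSIZE` are set to the VAX figures implied by the "1K" and "CLSIZE of more than 2" remarks; treat both as assumptions rather than a copy of any real header.

```c
#include <assert.h>

#define NBPG    512                 /* assumed VAX page size in bytes */
#define CLSIZE  2                   /* assumed pages per cluster */
#define CLBYTES (CLSIZE * NBPG)     /* 1024 with the values above */

/* Hypothetical stand-ins for the two executable styles ("magic numbers"). */
enum exec_kind { EXEC_PAGE_ZERO, EXEC_NO_PAGE_ZERO };

/* Address at which the text space of an executable of this kind starts. */
unsigned long text_start(enum exec_kind kind)
{
    assert(kind == EXEC_PAGE_ZERO || kind == EXEC_NO_PAGE_ZERO);
    return kind == EXEC_NO_PAGE_ZERO ? (unsigned long)CLBYTES : 0UL;
}

/* A null dereference can trap only if nothing is mapped at address 0. */
int null_dereference_can_trap(enum exec_kind kind)
{
    return text_start(kind) > 0;
}
```

Because the kind travels with the executable, old binaries keep their text at 0 and continue to run, while newly linked ones leave the first cluster unmapped.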
chip@ateng.UUCP (Chip Salzenberg) (05/04/88)
In article <50676@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>(I seem to remember a claim that the '286 or '386 port of S5 has no
>location zero, and the Motorola 68K port may also have no location zero;
>alas, none of the fixes made to programs that broke got folded back into
>the mainstream S5 releases.)

The following comments apply to the '286 in protected mode.

In small and medium models (<= 64K of data), it is possible to set up the data segment so that offset zero doesn't exist. I don't know of any implementations that bother to do this. (Which is unfortunate.)

In compact and large models (data limited only by memory/swap space), a NULL pointer is (in all implementations I know of) 32 bits of zeros. This value for NULL causes a protection trap whenever it is dereferenced, since the '286 defines segment selector zero to mean "no segment".
--
Chip Salzenberg "chip@ateng.UU.NET" or "codas!ateng!chip"
A T Engineering My employer may or may not agree with me.
"I must create a system or be enslaved by another man's." -- Blake
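
A sketch of the large-model case, modelling a far pointer as a 32-bit value with the selector in the high half and the offset in the low half. The helper names are hypothetical. The null-selector test assumes the usual protected-mode selector layout: bits 15..3 are the table index, bit 2 selects the table, and bits 1..0 are the requested privilege level, so index 0 in the global table is null whatever the privilege bits say.

```c
#include <assert.h>
#include <stdint.h>

/* Split a packed far pointer into its selector and offset halves. */
uint16_t far_selector(uint32_t far_ptr) { return (uint16_t)(far_ptr >> 16); }
uint16_t far_offset(uint32_t far_ptr)   { return (uint16_t)(far_ptr & 0xFFFFu); }

/* Pack a selector:offset pair into one 32-bit far pointer. */
uint32_t make_far(uint16_t selector, uint16_t offset)
{
    uint32_t p = ((uint32_t)selector << 16) | offset;

    assert(far_selector(p) == selector && far_offset(p) == offset);
    return p;
}

/* Index 0 in the global table is the null selector; the low two bits
 * (requested privilege level) do not matter. */
int is_null_selector(uint16_t selector)
{
    return (selector & 0xFFFCu) == 0;
}

/* In this model a data access through a null selector traps at any offset. */
int dereference_traps(uint32_t far_ptr)
{
    return is_null_selector(far_selector(far_ptr));
}
```

In this model the trap depends only on the selector, so a member access at a nonzero offset from an all-zero NULL is caught just like a read at offset 0.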