[net.unix-wizards] Bizarre Bourne Shell

hardy@sdccsu3.UUCP (Jeff Hardy) (05/12/84)

The Bizarre Bourne Shell

The Bourne Shell relies on some very obscure quirks of the VAX/PDP-11 and
Unix.  As many of us know, the Bourne Shell has its own memory allocation
scheme.  This scheme does not necessarily do an sbrk when memory is
allocated, but merely marks the memory as allocated.  When an illegal
reference is made to this memory, the memory fault signal is caught,
the sbrk is done and the program continues on its merry way.  This works
like a charm on the VAX and PDP-11.  It cannot work at all on the MC68000,
nor very easily on the MC68010.

The reasons this cannot work on the MC68000 are well known: basically, the
68000 cannot recover from an illegal memory reference.  It cannot work
easily on the MC68010 because the 68010 can only CONTINUE a faulted
instruction, not RESTART it.  Here is why this causes the Bourne Shell
difficulty:  a memory fault occurs, we enter the USER signal handler,
which performs an sbrk, and then we want to continue the instruction
that generated the memory fault.  Unfortunately, we are now in USER state,
and we can only continue the instruction by doing an RTE, which must be
done from SYSTEM state!  There are, of course, solutions.  Software
continuation of the instruction or a special hook to allow a USER
generated RTE might be possibilities.  But I doubt you could convince me
that they are either easy or not kludges.

What most people who have ported the Bourne Shell have done is to catch
all the places where the "non-allocated" memory might be referenced and
check there that the sbrk has been done.  This is not terribly pleasant
either, since these references are sprinkled throughout the Bourne Shell.
Fortunately, most are done through macros.  But this only mildly
soothes my angst and ire.

Unix Guru Question:  what is the value of *p after the second sbrk?

	p = sbrk(0); *p = 1; sbrk(2);

The Unix Programmer's Manual says that all newly allocated memory is
initialized to zero.  However, on Unix Version 6, 7, System III and
System V, *p is one!  Indeed, Unix only zeros memory when a new MMU
segment of memory is allocated.  Also, this code will generate a memory
fault if p happens to point to a new MMU segment.  I contend that this
really is a Unix bug, not a "feature".  Either sbrk should not initialize
any memory, or it should always initialize memory.  The half-assed attempt
it does now can only lead to bizarre usages of this "feature".

You guessed it, the Bizarre Bourne Shell depends upon this.  The
Bourne Shell places a pointer into its available space pool at the
location returned from an sbrk(0).  If sbrk were to act as the
documentation indicates, the next time you did an sbrk, it would
corrupt your free list.  Also note that on the VAX and PDP-11 this trick
is guaranteed to work 100% of the time, since if p points to an invalid
address, the Bourne Shell will field the memory fault and do the sbrk,
which initializes the new segment to zero, before the assignment of the
location.  Only careful coding will prevent this on the 68000/68010.

I really question whether a program as critical as the Bourne Shell
should depend upon not merely an undocumented "feature", but one seemingly
CONTRADICTED by the documentation.

These are just a couple of the many interesting quirks in the Bourne Shell.
I will not even comment on those brain-damaged individuals who insist
upon "improving" the C language with macros like IF, THEN, ELSE, or who
fail to use standard include files like <signal.h>.

Michael Christensen
Unix System V Project Manager
Alcyon Corporation

ka@hou3c.UUCP (Kenneth Almquist) (05/14/84)

I contend that given the code

	p = sbrk(0); *p = 1; sbrk(2);

a memory fault should be generated when the second statement is executed
because it references memory above the break point.  If the program
catches memory faults and increases the break value, then after the
second sbrk the value of *p should be one.  In practice most hardware
will not allow you to set a break value to an arbitrary location so this
code may not generate a memory fault, but the value of *p should
nevertheless be one after the second sbrk.

None of this is intended to excuse the other faults of the Bourne shell.
				Kenneth Almquist

kds@intelca.UUCP (05/16/84)

>The Unix Programmer's Manual says that all newly allocated memory is
>initialized to zero......
>.........Indeed, Unix only zeros memory when a new MMU
>segment of memory is allocated.  Also, this code will generate a memory
>fault if p happens to point to a new MMU segment.  I contend that this

And Eunice does this also.  It also seems that even calls to
get memory using malloc, etc. do not always return zeroed-out memory
areas (as someone suggested).  In addition, unless you set some special
loader flags you may not get contiguous memory allocated, which creates mucho
problems with, for example, nroff! (and, apparently, sh)

But, it's better than RAW VMS...verbum sat sapienti
-- 
Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,ucbvax!amd70,ogcvax!omsvax}!intelca!kds

dan@idis.UUCP (Dan Strick) (05/17/84)

I looked up the V6, V7, and 4.0bsd (~32V) manual pages for the
break system call and found that all mention the fact that the
break address is rounded up and none mention the fact that new
memory is cleared.

I assume the unix support group changed the manual page to eliminate
a machine dependency (the rounding) and to make the documentation
more complete (clearing new memory).  These kinds of changes to
documentation are always good.  Right?  Wrong!
New memory is cleared not so much to save the programmer the trouble
of clearing it explicitly but to avoid apparently randomly malfunctioning
programs and possibly as a slightly paranoid security measure.
It would have made as much sense to set new virtual memory locations
to -1.  Some features are best left undocumented or hidden in a footnote.
(especially obviously implementation dependent features)

The old manual pages give the rules for rounding up break addresses
on each of the various machines that unix ran on at the time that those
old versions of unix were released.  This had to go, but it would
have made sense to mention that the system did not always take requested
break addresses literally.  Some features are best left documented.
(especially when omission is misleading)

My opinions: the unix implementation of the brk() system call is
correct.  Recent documentation (i.e. system 5) may be misleading
(so much for professionally designed user friendly documentation).
The Bourne shell implementation (which motivated the tirade about
the unix brk() implementation) is sick.  It breaks all the rules.
Worse: it uses longjumps.

				Dan Strick
				[decvax|mcnc]!idis!dan

steiny@scc.UUCP (Don Steiny) (05/18/84)

	When I was first trying to figure out the Bourne shell,
I found it useful to do:

	cc -E module.c | cb > more_readable

	The file "more_readable" has had the preprocessor
substitute in the C for the ALGOL 68.
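
For anyone who has not seen the style, here is a small sketch of what the
preprocessor is undoing.  The macro definitions are modeled on the
ALGOL-like ones complained about in this thread (IF, THEN, ELSE, and
friends) and should be read as illustrative rather than as the shell's
exact header; count_nonneg is a made-up function.

```c
/* ALGOL-flavored keywords layered over C (illustrative definitions). */
#define IF    if(
#define THEN  ){
#define ELSE  } else {
#define FI    ;}
#define WHILE while(
#define DO    ){
#define OD    ;}

/* Count the non-negative entries of v[0..n-1], written in that style. */
int count_nonneg(const int *v, int n)
{
    int i = 0, hits = 0;

    WHILE i < n
    DO  IF v[i] >= 0
        THEN hits++;
        ELSE /* skip negatives */;
        FI
        i++;
    OD
    return hits;
}
```

After cc -E the body is ordinary while/if/else C with some stray
semicolons, and cb then re-indents it.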

				Don Steiny
				Personetics
				425-0382

chris@umcp-cs.UUCP (05/20/84)

Now hold on a minute here ...

	From: hardy@sdccsu3.UUCP

	Unix Guru Question:  what is the value of *p after the
	second sbrk?

		p = sbrk(0); *p = 1; sbrk(2);

Undefined.  sbrk(0) simply returns the address of the current break.
If there is room beyond the current break, then *p will be one.  If
not, then you get that memory fault you're griping about, and I don't
really want to know the details....

	The Unix Programmer's Manual says that all newly allocated
	memory is initialized to zero.  However, on Unix Version
	6, 7, System III and System V, *p is one!  Indeed, Unix
	only zeros memory when a new MMU segment of memory is
	allocated.

Where does it say that?  Not in ``man 2 brk'' (where one finds the
sbrk manual).  If you have a partial page, obviously sbrk is going
to be lazy.  Use calloc() if you want zeroed memory.  (I know, your
complaint is that the Bourne shell doesn't - so that makes the
Bourne shell guilty of making hardware assumptions.  But don't
blame the manuals.)

	Also, this code will generate a memory fault if p happens
	to point to a new MMU segment.  I contend that this really
	is a Unix bug, not a "feature".  Either sbrk should not
	initialize any memory, or it should always initialize
	memory.  The half-assed attempt it does now can only lead
	to bizarre usages of this "feature".

It should be obvious that memory has to be initialized to *something*,
or you'll have a huge security hole.  But why initialize it more than
once?  That's what calloc() is for.
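
A short sketch of the calloc() route, for code that really needs zeroes.
zeroed_ints and all_zero are hypothetical helper names; the only library
behavior relied on is that calloc returns zero-filled memory or NULL.

```c
#include <stddef.h>
#include <stdlib.h>

/* Return a zero-filled array of n ints, or NULL on failure.  Caller frees. */
int *zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Return 1 if every element of v[0..n-1] is zero, 0 otherwise. */
int all_zero(const int *v, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (v[i] != 0)
            return 0;
    return 1;
}
```

Nothing here depends on what sbrk happens to leave in a partial page,
which is the point: the guarantee comes from calloc, not from the break.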

	I really question whether a program as critical as the
	Bourne Shell should depend upon not merely an undocumented
	"feature", but one seemingly CONTRADICTED by the documentation.

The Bourne shell should *not* depend on it.  (And I can't stand the fake
ALGOL either.)

(Anybody know if ksh has this kind of code in it? :-) )
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

avr@CS-Arthur.UUCP (05/29/84)

----------------------
From: hardy@sdccsu3.UUCP

I will not even comment on those brain-damaged individuals who insist
upon "improving" the C language with macros like IF, THEN, ELSE . . . .
----------------------

	But I will. It's done in 'adb', too, which I'm doing something with,
and it's a royal pain - looks ugly as Baba Yaga, too. UGH!

				Andrew Royappa
				{ucbvax,decvax,pur-ee,ihnp4}!purdue!avr