[net.emacs] gnu design question

jqj@bullwinkle (12/30/85)

From: jqj@bullwinkle (J Q Johnson)

One of the design goals of C was to produce a language with very simple and 
efficient variable allocation.  The result, a language with no nested procedures
and only local and global variables (no uplevel references, no heap variables
though you do have heap storage, no variable-dimensioned arrays), can be
implemented without a frame pointer.  This is a desirable characteristic of
the language, and it implies that you should not expect every implementation,
or even the typical one, to use a frame pointer.

Unfortunately, since the VAX CALLS/RET discipline uses a frame pointer
(so RET discards whatever alloca() has carved off the stack), the alloca()
routine exists.  And since it exists on the VAX, it is used
heavily by GNU emacs.  However, its use conflicts with one of what I take
to be GNU emacs's design goals -- portability.  I would like to improve
that portability if possible.  My particular domain is porting GNU emacs 
to a Gould PN.  I see several alternatives, and would like advice on which 
to pursue.

Note that emacs uses alloca() in a very stylized way -- almost entirely
to provide dynamically dimensioned arrays.  One never sees code like:
foo() { ...
	while (baz) {
		p->next = alloca(sizeof(*p)+n); p = p->next;
	...
So alloca() would not have been necessary at all in an Algol-like language.
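For comparison, the stylized use described above (a scratch array dimensioned at run time and discarded on return) is what Algol's dynamic arrays provide, and what C itself later gained as C99 variable-length arrays. A minimal sketch, not taken from emacs; the function name reverse_string is hypothetical:

```c
#include <string.h>

/* Reverse s in place using a scratch array whose size is known only at
   run time.  Like an alloca()ed block, the array vanishes when the
   function returns.  Requires C99 variable-length arrays. */
void reverse_string(char *s)
{
    size_t n = strlen(s);
    char tmp[n + 1];                 /* dynamically dimensioned array */
    size_t i;

    for (i = 0; i < n; i++)
        tmp[i] = s[n - 1 - i];
    tmp[n] = '\0';
    memcpy(s, tmp, n + 1);
}
```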

My alternatives seem to be:

    1/	implement a full alloca() for the Gould.  This would not be hard
but would be a gross assembler hack -- alloca()ed variables would actually
be malloc()ed, but alloca() would push the pointer on a private stack and
fudge the caller's return address so that returning would free() the block
and pop the private stack.  setjmp() would be extended to mark the private
stack, and longjmp() would do the free()s needed to unwind it.  The only
visible change to existing code would be redefining jmp_buf by changing
the #include to reference a private setjmp.h.  However, as I remarked
above, this would be a gross assembler hack, and nonportable.  Also, it
would be impossible to preserve the alloca() semantics during interrupts,
and it would limit you to a (small) fixed number of outstanding alloca()s
based on the size of the private stack.

    2/	recode entirely in C to eliminate the need for alloca() as such.
This could be done in several ways.  Perhaps the simplest would be to
replace alloca() with a myalloca() that allocates on the heap, saving the
pointer on a private stack, and have the caller record the private stack
top in a local variable.  Add a call to a "mymark()" routine at the
beginning of each function that uses alloca() and a call to "myrelease()"
before each return:
	old:
foo() { ...
  char *name = alloca(xxx);
  ...
       return baz;
  ...
}

	new:
foo() { ...
  int alloclim = mymark();
  char *name = myalloca(xxx);
  ...
       { myrelease(alloclim); return baz; }
  ...
  myrelease(alloclim);
}
Similarly, setjmp would be protected with a mark/release:
	old:
if (setjmp(buf))
	foo;

	new:
int alloclim; ...
alloclim = mymark();
if (setjmp(buf))
	{ myrelease(alloclim); foo; }
This scheme has the disadvantage of running slower if a true alloca() exists
(unless all the code is conditionalized, which would be a maintenance 
headache).  It also makes the code less clear, which again adds to the maintenance burden.
But it has the advantage of being completely portable.
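A minimal sketch of the three routines named in option 2/, written in portable C. The capacity MAX_OUTSTANDING, the abort-on-failure policy, and the demo functions deep_work and run_with_recovery are assumptions added for illustration; the demo shows the setjmp pattern reclaiming blocks that a longjmp skipped past.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed capacity of the private pointer stack; the post gives no size. */
#define MAX_OUTSTANDING 1024

static void *alloc_stack[MAX_OUTSTANDING];
static int alloc_top = 0;            /* number of live myalloca() blocks */

/* Remember how many blocks are live so myrelease() can unwind to here. */
int mymark(void)
{
    return alloc_top;
}

/* Heap-allocate and push the pointer on the private stack. */
void *myalloca(size_t size)
{
    void *p;

    if (alloc_top >= MAX_OUTSTANDING) {
        fprintf(stderr, "myalloca: private stack full\n");
        abort();
    }
    p = malloc(size);
    if (p == NULL) {
        fprintf(stderr, "myalloca: out of memory\n");
        abort();
    }
    alloc_stack[alloc_top++] = p;
    return p;
}

/* Free everything allocated since the matching mymark(). */
void myrelease(int mark)
{
    while (alloc_top > mark)
        free(alloc_stack[--alloc_top]);
}

/* Hypothetical demo: blocks allocated below the setjmp are reclaimed
   after a longjmp back to it. */
static jmp_buf recover;

static void deep_work(void)
{
    myalloca(100);
    myalloca(200);
    longjmp(recover, 1);
}

int run_with_recovery(void)
{
    int alloclim = mymark();     /* not modified after setjmp, so safe to read */

    if (setjmp(recover)) {
        myrelease(alloclim);
        return alloc_top - alloclim;  /* blocks still live: expect 0 */
    }
    deep_work();
    return -1;                       /* not reached */
}
```

Keeping only pointers on the private stack means its size bounds the number of outstanding blocks, not their total size, which is the trade-off against a fixed-size arena.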

   3/	Recode GNU emacs to do compile-time maximum-size array allocations 
instead of runtime bounds.  This is typical C style but could increase
(dramatically!) the size of the stack.  Except in a few places where reasonable
bounds are known in advance (e.g. dispnew.c), I think this would be a bad idea.

   4/	Hack up the Gould C compiler to use a stack frame.  And it's still
non-portable.

   5/	Do something else entirely.  (e.g. use Hemlock?)


1/, 2/, or 3/ would each take, I estimate, less than a person-week to 
program and debug.  4/ would take several person-months for the average 
hacker.

gwyn@BRL.ARPA (12/30/85)

From: Doug Gwyn (VLD/VMB) <gwyn@BRL.ARPA>

Alloca() is definitely not portable, and Dennis Ritchie even tried
to suppress it altogether in 7th Edition UNIX (obviously not fully
successfully).  Except for the action of longjmp(), alloca() does
nothing that malloc()/free() would not do.  I don't understand why
GNU EMACS wants to use setjmp() and alloca() so heavily; in the
more than 500,000 lines of source code that I maintain, there is not a
single use of alloca(), and only a few places where it could have
been used effectively if available.

macrakis@harvard.UUCP (Stavros Macrakis) (12/31/85)

In article <21@cornell.UUCP>, jqj@bullwinkle writes:
> the alloca() routine ... is used heavily by GNU emacs....
> My alternatives seem to be:
>     1/	implement a full alloca() for the Gould.  ...
>     2/	recode entirely in C to eliminate the need for alloca()...
>    3/	Recode GNU emacs to do compile-time maximum-size array allocations... 
>    4/	Hack up the Gould C compiler to use a stack frame....
>    5/	Do something else entirely.  (e.g. use Hemlock?) ....
> 1/, 2/, or 3/ would each take, I estimate, less than a person-week to 
> program and debug.  4/ would take several person-months....

Why not the very simplest solution to alloca: a parallel stack for
alloca'd objects with Mark/Release?  This stack would have to be as
large as the greatest total of alloca'd storage live at any one time.  In a
virtual memory system with large address space, there should be no
problem at all.  Part of the charm of a Mark/Release scheme (as of
regular alloca) is that it is robust: if a Release is `forgotten' via
some coding or timing error, the stack still gets cleaned up
eventually.

    #define Max_Total_Alloca 10000
    char linear_heap[Max_Total_Alloca];
    int linear_heap_pointer = Max_Total_Alloca;

Every routine that used alloca would have to include the Mark macro
among its declarations (of course, it doesn't hurt to have a Mark even
if there is no alloca within the routine):

    #define Mark int Mark_point = linear_heap_pointer;

...and would release just before returning:

    #define Release linear_heap_pointer = Mark_point;

...for convenience:

    #define Return(x) {Release; return(x);}

Setjmp is slightly more complicated.  Ideally, the setjmp environment
should include the linear_heap_pointer.  If this cannot be
accomplished, then every statement that contains a setjmp must
become:
    { Mark;
      ...  setjmp ...
      Release; }
Ideally, Setjmp would just be
    #define Setjmp(x) (Mark, setjmp(x) + Release) 
but this won't work for two reasons:
  1. Mark declares a variable, and a declaration cannot appear inside
     a C expression; 
  2. the order of evaluation of f()+g() is unspecified, and in
     particular on the Vax is the wrong way round.

The function alloca itself is very simple:

    #define alloca(x) (Mark_point, _alloca(x))
    /* Defined like this to give an error if Mark has not been used. */
    char *_alloca(size)
      int size;
    { if ((linear_heap_pointer -= size) < 0) ...error...;
      return(&(linear_heap[linear_heap_pointer]));
    }

Note that as a standard precaution the reset-world function should
reset the linear_heap_pointer.  So should the top-level loop.

Since there are only 46 alloca's and 6 setjmp's in all of gnumacs (and
several functions have several allocas), I estimate less than one
man-day to install this type of alloca.  Since I'm working on my
thesis, I am not volunteering (not to mention that my machine has
alloca!).

Note that there is no need to conditionalize the Mark/Return's in
implementations with regular alloca's, since you can just define them
to be null:
    #define Mark
    #define Release
    #define Return(x) return(x)
On machines with regular alloca, the alloca macro as above can be
preserved to provide a compile-time error indication in case of
forgetting to use Mark (Mark would then declare a dummy Mark_point
rather than expand to nothing).  Of course, this doesn't guarantee that
Release is used consistently.
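A compilable sketch of the linear-heap scheme above, assuming an ANSI C compiler. Three departures from the post are assumptions added here: the function is renamed linear_alloca (leading-underscore names are reserved), requests are rounded up to a multiple of sizeof(double) with the arena placed in a union so blocks can hold ints, pointers and doubles, and a full arena aborts. sum_below is a hypothetical caller.

```c
#include <stdio.h>
#include <stdlib.h>

#define Max_Total_Alloca 10000   /* a multiple of sizeof(double) keeps blocks aligned */

/* The union gives the byte array double alignment (an added assumption). */
static union {
    double align;
    char bytes[Max_Total_Alloca];
} heap_store;
#define linear_heap (heap_store.bytes)

int linear_heap_pointer = Max_Total_Alloca;   /* grows downward */

#define Mark      int Mark_point = linear_heap_pointer;
#define Release   linear_heap_pointer = Mark_point;
#define Return(x) { Release return (x); }

char *linear_alloca(int size)
{
    int unit = (int)sizeof(double);
    int rounded = (size + unit - 1) / unit * unit;

    if (rounded > linear_heap_pointer) {
        fprintf(stderr, "linear_alloca: linear heap exhausted\n");
        abort();
    }
    linear_heap_pointer -= rounded;
    return &linear_heap[linear_heap_pointer];
}

/* Naming Mark_point makes a forgotten Mark a compile-time error. */
#undef alloca
#define alloca(x) ((void)Mark_point, linear_alloca(x))

/* Hypothetical caller: sums 0..n-1 through a run-time-sized scratch array. */
int sum_below(int n)
{
    Mark
    int *tmp = (int *)alloca(n * (int)sizeof(int));
    int i, total = 0;

    for (i = 0; i < n; i++)
        tmp[i] = i;
    for (i = 0; i < n; i++)
        total += tmp[i];
    Return(total);
}
```

Because Return restores linear_heap_pointer from Mark_point, every call leaves the arena exactly as it found it.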

	-s

shaddock@rti-sel.UUCP (Mike Shaddock) (01/01/86)

In article <21@cornell.UUCP> jqj@bullwinkle writes:
>From: jqj@bullwinkle (J Q Johnson)
>
> ... Here jqj talks about the initial design goals of C, the rise of
>     alloca(), and the problems of porting GNU Emacs to a Gould PN
>     machine.

I too am trying to port GNU Emacs 16.60 to a Gould machine, and have
run into some of the problems mentioned here.  I have stopped working
on it for the time being, but have decided that several unfortunate
design decisions were made:

  (1) Non-portable use of alloca.
      As pointed out, alloca will not work on machines without stack
      and frame pointers.  When I pointed this out on the GNU Emacs
      mailing list, the response was that alloca "was used because it
      was good for GNU, that GNU would have alloca, and that the author
      did not intend to support machines that were deficient in important
      features such as not having stack and frame pointers".
      According to some people very familiar with Gould machines,
      implementing a full alloca would be very, very hard, and
      certainly not worth the time for one program.

  (2) Use of setjmp/longjmp.
      GNU Emacs uses setjmp/longjmp in several places, when it would
      have been more portable (and more correct in a pure theoretical
      sense) to return error codes, etc. instead of just jumping to
      some other place in the program.

  (3) Implementation of Lisp_Objects
      Why wasn't a simple structure, such as

	typedef struct {
	    char *multiple_things;
	    char type;
	} Lisp_Object;

	or

	/*
	 * No flames if this is slightly incorrect, I
	 * avoid using unions and am not up on the syntax
	 */
	typedef struct {
	    union {
		    /* Put all the different things */
		    /* a Lisp_Object can be here */
	    } foo;
	    char type;
	} Lisp_Object;

      used instead of this rather gross way of hacking on an int?  What
      would happen if a pointer needed more than the 24 bits that it
      gets in the int?  This may be handled correctly (I couldn't
      determine that from the code), but it would certainly be a little
      less obscure to use a structure.  Some places in the code seem to
      *depend* on Lisp_Objects being ints, so it is non-trivial to
      change this for a particular machine.

  (4) Unexec
      Use of unexec is a gross hack merely for a little efficiency.
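Point (3) asks what happens when a pointer needs more than the 24 bits it gets in the int. The sketch below uses a hypothetical layout (an 8-bit tag above a 24-bit value field) and hypothetical names, not GNU Emacs's actual macros, to show the silent truncation, next to the tagged structure the post proposes.

```c
/* Hypothetical packed layout: an 8-bit type tag above a 24-bit value. */
#define VALBITS 24
#define VALMASK ((1UL << VALBITS) - 1)

typedef unsigned long packed_obj;

packed_obj make_packed(unsigned type, unsigned long value)
{
    /* Any value bits above bit 23 are silently discarded here. */
    return ((packed_obj)(type & 0xFF) << VALBITS) | (value & VALMASK);
}

unsigned packed_type(packed_obj o)
{
    return (unsigned)((o >> VALBITS) & 0xFF);
}

unsigned long packed_value(packed_obj o)
{
    return o & VALMASK;
}

/* The structure alternative; member names are hypothetical.
   The tag no longer steals bits from the pointer. */
enum obj_type { T_INT, T_STRING, T_CONS };

typedef struct {
    union {
        long integer;
        void *pointer;
    } u;
    char type;
} tagged_obj;

tagged_obj make_tagged_pointer(enum obj_type t, void *p)
{
    tagged_obj o;
    o.u.pointer = p;
    o.type = (char)t;
    return o;
}
```

With the packed form, a 25-bit address such as 0x1234567 comes back as 0x234567; the structure costs extra space per object but keeps the full pointer.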

Given sufficient time I could probably re-write enough of GNU Emacs to
fix these problems, but I have neither the time nor the inclination to
do so.  Some of these complaints may be answered in the newest version
of GNU Emacs, but I have a copy of 17.31 and it doesn't look like it.
GNU Emacs seems to be a very good program from the user's point of
view, and could increase people's productivity, but I'm afraid that the
attitude of some of the people involved with GNU may be its downfall.
Portability is an important issue, and unless GNU, including GNU Emacs,
is highly portable, I'm afraid that it won't be as successful as it
should be.
-- 
Mike Shaddock		{decvax,seismo,ihnp4}!mcnc!rti-sel!shaddock

"You're in a twisty maze of sendmail rules, all obscure."

daveb@shrew.UUCP (Dave Brower) (01/05/86)

[ I expect to be corrected ]

GNU is not, as far as I can see, intended to be really portable to
things like BSD, SV, SV.2, etc., on different machines.  Since GNU
is intended to be a complete *NIX replacement, rms and the team
of hackers can feel free to assume the presence of alloca() on
every machine in the universe.   They are, after all, designing
the universe.

It just happens that GNU emacs is available before the rest of the system,
and if you can use it, great.

I've been contemplating the conditional/macro approach to alloca on
different machines, since you really do want to use it on a machine
that will support it.

-dB

marick@ccvaxa.UUCP (01/06/86)

Setjmp/longjmp work just fine on Gould machines; only alloca() is missing.

I haven't looked at unexec in gnumacs, but I've written a rather large program
that does something similar for much the same reason.  It might be nice for 
unexec to be a library routine, but in my experience those people who use it 
all need to do something slightly different -- you need an example, not a 
library routine.

I can't provide a full example, but I can tell you how to write an unexec-like
thing for ZMAGIC files:

1.  Find the old executable -- you'll need it for the symbol table.
    The Franz Lisp gstab() routine is an example of what you need to do.

2.  Unlink the destination file (so creat() doesn't reuse the modes).

3.  Fetch the old header (out of the old executable or starting at the first
    word of this process's text space).

4.  Fill in the new header -- note that both header.a_data and header.a_nbdata
    must be updated if you've used sbrk().

5.  Write the new header out.  Its size in bytes is ZTXTOFF (from a.out.h).

6.  Write the text out, starting at header.a_txbase + ZTXTOFF and going for
    header.a_text-ZTXTOFF bytes.

7.  Write the data out.  header.a_text is rounded up to a multiple of pages,
    so you don't have to.

8.  Copy the symbol and string tables from the old executable 
    to the new one.  The symbol table starts at a_text + a_data in the old
    executable. 

The only problem I know of is a bug in signal handling.  In (at least some
distributions of) UTX, sigvec breaks when called "again" from a saved process
image.  See sigvec.s -- you will need to do a setsigc system call to get
around the problem.  This bug will be (has been?) fixed in later versions of
UTX.

Hope this helps someone.  Usual disclaimers about opinions apply; further, 
I've been known to make mistakes both writing and reading code, so anything
I wrote above could be wrong.


Brian Marick, Wombat Consort
Gould Computer Systems -- Urbana
...ihnp4!uiucdcs!ccvaxa!marick
ARPA:  Marick@GSWD-VMS

sjm@dayton.UUCP (Steven J. McDowall) (01/15/86)

Just a simple question from a new member of the net: where can
I get GNU emacs?  Also, how is the GNU project coming along,
and where can I find updated information on GNU?

-- 
Steven J. McDowall	
Dayton-Hudson Dept. Store. Co.		UUCP: ihnp4!rosevax!dayton!sjm
700 on the Mall				ATT:  1 612 375 2816
Mpls, Mn. 55408