[comp.emacs] Portability problem with gnu-emacs

lars@myab.se (Lars Pensjö) (09/13/88)

This proposal has probably been mentioned before, but I have not seen it.

Why is gnu-emacs implemented with the self dump feature?  I know that
it speeds up the start-up, but it is extremely unportable.

A suggestion:

Let temacs write the compiled lisp code into a file 'code.c' in the following
format:

char lisp_code[] = {
23, 45, 76, 93, -34, 45,
...
};


And then relink a new emacs with 'code.o'. 'temacs' can be linked with
an empty 'lisp_code' definition.

This should be portable. Byte order should not be any problem, because
you write the file 'code.c' directly from the memory.

Some compilers even have a flag to put data in text area, which is
just what is wanted now.
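
A minimal sketch of the generator step, assuming temacs already has
the bytes to be saved in one contiguous buffer. The function name
write_code_c and its parameters are hypothetical; only the output
format is taken from the proposal above.

```c
#include <stdio.h>
#include <stddef.h>

/* Write `len` bytes starting at `mem` as a C definition of a char
   array called `name`, in the format shown in the proposal.
   Returns 0 on success, -1 on a write error or if len is 0
   (an unsized array needs at least one initializer). */
int write_code_c(FILE *out, const char *name, const char *mem, size_t len)
{
    size_t i;

    if (len == 0)
        return -1;
    if (fprintf(out, "char %s[] = {\n", name) < 0)
        return -1;
    for (i = 0; i < len; i++) {
        /* Eight values per line; no comma after the last value. */
        const char *sep = (i + 1 == len) ? "\n"
                        : (i % 8 == 7)   ? ",\n"
                        : ", ";
        if (fprintf(out, "%d%s", (int)mem[i], sep) < 0)
            return -1;
    }
    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

Because each value is printed from a plain char, bytes above 127 come
out negative on a machine where char is signed, as in the -34 of the
example.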


    Lars Pensjö
    lars@myab.se
-- 
    Lars Pensjö
    {decvax,philabs}!mcvax!enea!chalmers!myab!lars

jr@bbn.com (John Robinson) (09/15/88)

In article <441@myab.se>, lars@myab (Lars Pensjö) writes an excellent idea:
>This proposal has probably been mentioned before, but I have not seen it.
>...
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};
>... etc.
>Some compilers even have a flag to put data in text area, which is
>just what is wanted now.

But the problem may be that not all compilers support this.  Of course
if GCC does, we should all adopt it instantly!  Also, signed chars
(they appear in your example) may be a problem.  But I merely quibble;
it's a great idea.  Needs elisp symbol-table hooking but not much more.
--

idall@augean.OZ (Ian Dall) (09/16/88)

In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
>This proposal has probably been mentioned before, but I have not seen it.
>
>Why is gnu-emacs implemented with the self dump feature?  I know that
>it speeds up the start-up, but it is extremely unportable.
>
>A suggestion:
>
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};

I think there might be a better way. The problem with this is that the compiled
lisp ends up in .data. On BSD machines at least the .data section will not
be shared so with multiple emacs users there will be multiple versions of the
preloaded lisp in memory. (I think SysV avoids this problem by implementing
a copy on write scheme for the .data area but I could be wrong).

Would it be possible to translate the preloaded lisp to C in the format of
the lisp callable C functions already there? E.g.:

	DEFUN ("kill-emacs", Fkill_emacs, Skill_emacs, 0, 1, "P",
	  "Exit the Emacs job and kill it.  ARG means no query.\n\
	If emacs is running noninteractively and ARG is an integer,\n\
	return ARG as the exit program code.")
	  (arg)
	     Lisp_Object arg;
	{
	  Lisp_Object answer;
	  int i;
	.
	.
	.
	}

This is of course a harder problem. Would there be fundamental restrictions
on the allowable lisp code to do this? With the addition of dynamic linking
this would also allow a true compiled E-lisp.

>Some compilers even have a flag to put data in text area, which is
>just what is wanted now.

Does this flag have the side effect of making the .text area unsharable?
If so it is not really the way to go.
  -- 
 Ian Dall           life (n). A sexually transmitted disease which afflicts
                              some people more severely than others.
idall@augean.oz

lars@myab.se (Lars Pensjö) (09/19/88)

In article <29698@bbn.COM> jr@bbn.com (John Robinson) writes:
>In article <441@myab.se>, lars@myab (Lars Pensjö) writes an excellent idea:
>>...
>>Let temacs write the compiled lisp code into a file 'code.c' in the following
>>format:
>>
>>char lisp_code[] = {
>>23, 45, 76, 93, -34, 45,
>>...
>>};
>...
> Also, signed chars
>(they appear in your example) may be a problem.

I do not think signed versus unsigned chars will be a problem.
If you have a machine with only unsigned chars, a compiled program (temacs)
will also only write unsigned numbers to the file 'code.c'.
Automatically portable!

I put the negative number on purpose in the example, to trigger a discussion,
because I am still not sure about the problem with the sign of characters.
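
The sign question can be checked with a small sketch, assuming 8-bit
bytes. The function names are hypothetical. Converting a char to
unsigned char is defined by C as reduction modulo UCHAR_MAX + 1, so
-34 and 222 name the same bit pattern whichever way plain char goes.

```c
#include <limits.h>

/* Return the bit pattern of a char as a value in 0..UCHAR_MAX.
   The conversion is well defined whether plain char is signed or not. */
unsigned char byte_pattern(char c)
{
    return (unsigned char)c;
}

/* Nonzero if plain char is a signed type on this compiler. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The two directions are not quite symmetric. A file written on a
signed-char machine (containing -34) initializes correctly where char
is unsigned, since that conversion is defined. A file written on an
unsigned-char machine (containing 222) relies on implementation-defined
conversion where char is signed, and some compilers will warn about it.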

---
lars@myab.se
-- 
    Lars Pensjö
    {decvax,philabs}!mcvax!enea!chalmers!myab!lars

kjones@talos.UUCP (Kyle Jones) (09/21/88)

In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
>Why is gnu-emacs implemented with the self dump feature?  I know that
>it speeds up the start-up, but it is extremely unportable.

Dumping compiled code also helps keep down GNU Emacs' virtual memory
usage (which in turn speeds startup time a bit more.)  Code common to
all invocations of the editor will be shared among concurrent Emacs
sessions, instead of being duplicated in-core.  Since GNU Emacs is BIG
even for Emacs-style editors, this is a big win.

As for portability, let us not forget that GNU Emacs is targeted for
the (not yet completed) GNU operating system. As such, it is
sufficient that the dump feature work with the BSD executable file
format that the GNU system ultimately will use.  The real task will be
to port the operating system once it is completed.

>A suggestion:
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};
>
>And then relink a new emacs with 'code.o'. 'temacs' can be linked
>with an empty 'lisp_code' definition.

It's not just the lisp code that needs to be dumped but the entire
initialized lisp system, e.g. interned symbols, internal pointers to
lisp objects, symbol -> function definition relationships, initialized
keymaps, and so on.  I can't see how this idea can be made to save
this information, since all the other .o files with which 'code.o' will be
linked already will have all external variables initialized to 0.

In article <394@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>Would it be possible to translate the preloaded lisp to C in the format of
>the lisp callable C functions already there.
>...
>This is of course a harder problem. Would there be fundamental restrictions
>on the allowable lisp code to do this?

I have read that the Kyoto Common Lisp compiler does just that.  GNU
Emacs Lisp is clearly less complex than full Common Lisp so the
situation is workable.

kyle jones

janssen@titan.sw.mcc.com (Bill Janssen) (09/24/88)

In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
>GNU Emacs Lisp is clearly less complex than full Common Lisp...

"Clearly".  uh-huh.

Bill

rlk@think.com (Robert Krawitz) (09/24/88)

In article <1252@titan.SW.MCC.COM>, janssen@titan (Bill Janssen) writes:
]In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
]>GNU Emacs Lisp is clearly less complex than full Common Lisp...
]"Clearly".  uh-huh.

The problem is that it's missing a fair number of useful features, and
there are a few major problems mostly with the reader (lack of reader
macros, only dynamic scoping, and case sEnSiTiViTy).  Other than that,
it's quite powerful indeed, and it doesn't seem a lot "simpler"
conceptually.
-- 
harvard >>>>>>  |		Robert Krawitz <rlk@think.com>
bloom-beacon >  |think!rlk
topaz >>>>>>>>  .		rlk@a.HASA.disorg

meissner@xyzzy.UUCP (Usenet Administration) (09/26/88)

In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
| Why is gnu-emacs implemented with the self dump feature?  I know that
| it speeds up the start-up, but it is extremely unportable.

After spending a bit of time hacking an unexec to work on Data General MV
computers, let me share some observations about GNU:

    1)	The lisp code smashes addresses + lisp type into one 32-bit word, which
	looks like (on a big-endian machine):

	 3 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
	 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1
	+-+-+-------------+----------------------------------------------+
	|M|A|        Type | Bottom 24 bits of address or lisp integer    |
	+-+-+-------------+----------------------------------------------+

	Where the top bit is used as a mark during garbage collection, and
	the second bit seems to be used also for garbage collection for
	arrays.  The lisp type  follows, and the lower 24 bits are dependent
	on the type, but are typically the bottom 24 bits of the address, or
	the integer value.

    2)	Because of the lisp format, you can't really initialize things, since
	C has no way of getting just the bottom 24 bits of an address.

    3)	You can't really relink, because addresses might change due to the
	new module being added.

    4)	For a word oriented machine like the MV, you have to be careful to
	store a byte address when creating Lisp Objects, and expect it to be
	in byte address form, since garbage collection plays funny games with
	pointers.

    5)	Gnu emacs also depends on preserving the value of static variables, and
	some malloc'ed data.

    6)	The normal unexec code depends on having a dumb linker which will not
	reorder data segments from the order encountered in the object modules.
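
The word layout in item 1 can be sketched as follows. All names are
hypothetical, and the 6-bit type width is an assumption chosen so that
the fields (1 mark bit, 1 array-mark bit, type, 24 value bits) add up
to 32; this is not the actual Emacs source.

```c
#include <stdint.h>

#define VALUE_BITS 24
#define VALUE_MASK ((UINT32_C(1) << VALUE_BITS) - 1)
#define TYPE_MASK  UINT32_C(0x3F)          /* assumed 6 type bits */
#define MARK_BIT   (UINT32_C(1) << 31)     /* garbage collection mark */
#define ARRAY_BIT  (UINT32_C(1) << 30)     /* second mark bit, for arrays */

typedef uint32_t lisp_word;

/* Pack a type tag and the low 24 bits of a value into one word.
   Any value bits above bit 23 are silently lost. */
lisp_word make_word(unsigned type, uint32_t value)
{
    return ((uint32_t)(type & TYPE_MASK) << VALUE_BITS) | (value & VALUE_MASK);
}

/* Extract the type tag, ignoring both mark bits. */
unsigned word_type(lisp_word w)
{
    return (unsigned)((w >> VALUE_BITS) & TYPE_MASK);
}

/* Extract the 24-bit address or integer field. */
uint32_t word_value(lisp_word w)
{
    return w & VALUE_MASK;
}

/* Nonzero if the garbage collector's mark bit is set. */
int word_marked(lisp_word w)
{
    return (w & MARK_BIT) != 0;
}
```

make_word needs the numeric value of the address at run time, which is
the difficulty item 2 describes: a static C initializer can hold a
whole address for the linker to fill in, but not a masked and tagged
piece of one.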

PS - for any DG/UX readers out there, I will try to make the DG/UX changes
available in mid October (I won't be back until the start of October).
-- 
Michael Meissner, Data General.

Uucp:	...!mcnc!rti!xyzzy!meissner
Arpa:	meissner@dg-rtp.DG.COM   (or) meissner%dg-rtp.DG.COM@relay.cs.net

mike@ists.yorku.ca (Mike Clarkson) (09/26/88)

In article <28474@think.UUCP>, rlk@think.com (Robert Krawitz) writes:
> In article <1252@titan.SW.MCC.COM>, janssen@titan (Bill Janssen) writes:
> ]In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
> ]>GNU Emacs Lisp is clearly less complex than full Common Lisp...
> ]"Clearly".  uh-huh.
> 
> The problem is that it's missing a fair number of useful features, and
> there are a few major problems mostly with the reader (lack of reader
> macros, only dynamic scoping, and case sEnSiTiViTy).  Other than that,
> it's quite powerful indeed, and it doesn't seem a lot "simpler"
> conceptually.

Emacs Lisp is very MacLisp'ish, as you would expect as RMS was one of the
original MacLispers.  It therefore shares a lot in common with Franz Lisp,
which was built to run a large MacLisp program (MACSYMA).  There are
a lot of similarities in design between Franz and Emacs Lisp, particularly
the C-code kernel, followed by load and dump.  However, Franz has
a real compiler, that even by today's standards is quite fast.

A great fantasy of mine has always been to merge Emacs into Franz.
This would give a much fuller and better performing lisp, that had a
real compiler.  It might also help keep some (PD) development going on 
Franz.  There are disadvantages to large monolithic images supporting
two different functions, but both Franz and Emacs have autoloading,
so the combined system need not be too big.  There would be great gains
in GC speed and speed of compiled code, not to mention things like
floating point numbers, and a foreign function interface.

Sigh... sometime when I have a spare year just for hacking...

Mike Clarkson					mike@ists.UUCP
Institute for Space and Terrestrial Science	mike@ists.yorku.ca
York University, North York, Ontario,		uunet!mnetor!yunexus!ists!mike
CANADA M3J 1P3					+1 (416) 736-5611

throopw@xyzzy.UUCP (Wayne A. Throop) (09/27/88)

> jr@bbn.com (John Robinson)
>> lars@myab (Lars Pensjö)
>>[...GNU emacs could be more portable if it arranged to initialize its
>>    pre-defined lisp routines via source like so: ...]
>>char lisp_code[] = {
>>23, 45, 76, 93, -34, 45,
>>...
>>};
>>... etc.
> But the problem may be that not all compilers support [...putting
> such objects in the text (that is, shared) section...].  [...]
> Also, signed chars
> (they appear in your example) may be a problem.  But I merely quibble;
> it's a great idea.  Needs elisp symbol-table hooking but not much more.

I agree that the method Lars points out is more portable, and a good
deal cleaner, even with the problems with signed chars and the fact
that it may be unshared data on some systems.  The code could be
generated based on a set of switches to control the signedness of the
range of character values, and most every system has some hack or
another to put unchanging variables into text space, even if it is the
old mouldy standby of tromping on the intermediate assembly code.  The
results of these hacks would still be far more aesthetic than the
massive hacks involved in unexec.

BUT, the real problem solved by unexec that is not solved by source
generation is that some of the values that go into the initialized
area are not known until after link time.  The addresses of primitive
routines, for example.  As long as the lisp object code refers to
absolute addresses (and I suspect it must do so for efficiency
reasons), the initializing code cannot be generated for an object yet
to be linked, but only for the current *already* *linked* executable.
Which implies unexec, or some similar subterfuge.  Most LISP systems
have similar problems.

All that said, I think unexec could be made a good deal cleaner, and
the machine dependencies could be isolated in a much more palatable
way.  But going all the way to generating source is probably Right Out.

-- 
A LISP programmer knows the value of everything, but the cost of nothing.
					--- Alan J. Perlis
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

idall@augean.OZ (Ian Dall) (09/29/88)

In article <1231@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
>> jr@bbn.com (John Robinson)
>>> lars@myab (Lars Pensjö)
>>>[...GNU emacs could be more portable if it arranged to initialize its
>>>    pre-defined lisp routines via source like so: ...]
>>>char lisp_code[] = {
>>>23, 45, 76, 93, -34, 45,
>>>...
>>>};
>>>... etc.
>> But the problem may be that not all compilers support [...putting
>> such objects in the text (that is, shared) section...].  [...]
>> Also, signed chars
>> (they appear in your example) may be a problem.  But I merely quibble;
>> it's a great idea.  Needs elisp symbol-table hooking but not much more.
>
>I agree that the method Lars points out is more portable, and a good
>deal cleaner, even with the problems with signed chars and the fact
>that it may be unshared data on some systems.
>
>BUT, the real problem solved by unexec that is not solved by source
>generation is that some of the values that go into the initialized
>area are not known until after link time.  The addresses of primitive
>routines, for example.  As long as the lisp object code refers to
>absolute addresses (and I suspect it must do so for efficiency
>reasons), the initializing code cannot be generated for an object yet
>to be linked, but only for the current *already* *linked* executable.
>Which implies unexec, or some similar subterfuge.  Most LISP systems
>have similar problems.

Well, if the "loaded-lisp.c" is last in the list of things linked it
would be OK on most machines. It still wouldn't be portable to machines
which linked things in funny orders. My earlier suggestion of turning
the lisp into real C instead of just a large initialised array would
not have this problem, but can it be made to work? After all, can't
most "real" lisps produce compiled code?

>All that said, I think unexec could be made a good deal cleaner, and
>the machine dependencies could be isolated in a much more palatable
>way.

Gnu emacs makes several non-portable assumptions. Those that spring to mind
are:

(1) Pointers (to lisp objects) are stored in 24 bits. This means that
    machines which are capable of, AND USE, a virtual address space of
    more than 2^24 won't run Gnu emacs. This is pretty much
    independent of the unexec feature.

(2) ld is assumed to load the concatenated .text sections followed by
    the concatenated .data sections. This allows unexec to work out
    the beginning and end of the sections and also to guarantee that
    the pure data is at the beginning of the .data section.

(3) Various kernels make different assumptions about the alignment of
    the .text and .data in an executable file, presumably to simplify
    the paging process. Emacs must guess what these assumptions are
    when creating the unexeced emacs.

(4) Emacs assumes that C static variables go in .bss if uninitialised
    and in .data if initialised. In fact it uses this as a way of
    forcing which variables end up where. I know of one compiler which
    treats uninitialised static variables as if they were initialised
    with zero (and sticks them in .data).

(5) Emacs needs to be able to read its own .text section. Some systems
    could prevent this if the MMU differentiates between read
    protection and execute protection. Systems with different
    instruction and data spaces would be a problem (not that GNU Emacs
    would run on a PDP-11 anyway).
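
The guess in assumption 3 comes down to arithmetic like the following,
where the alignment is the value that differs between kernels. This is
a minimal sketch; the function name is hypothetical and no particular
system's alignment is implied.

```c
#include <stddef.h>

/* Round `offset` up to the next multiple of `align`.
   `align` must be a nonzero power of two. */
size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}
```

A section that ends at offset 5000 would be followed by padding up to
8192 under a 4096-byte alignment, but only up to 5120 under a
1024-byte one, so a wrong guess puts the next section at the wrong
file offset.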

Assumptions 2 and 4 could disappear if unexec did not attempt to put
data into the .text region. An extra conditional might be useful to
say don't try to adjust the .text/.data boundary when unexecing. This
has a penalty in that the pure lisp will not be shared but at least
the speed up in start up time will still be there. Lars's solution would
also result in non-shared .data sections at least on BSD machines, and
any attempt to fix this drawback would probably have to make the same
sort of assumptions as unexec. Perhaps unix could use a .pdata (pure
data) section type.

The file format problems alluded to in 3 are more than just an Emacs
problem, they are a unix problem. The COFF file format for SysV is a
step in the right direction, but there are, unfortunately, some system
dependent magic numbers defining the alignment of the sections, which
vary from system to system.  If these were defined in some standard
include file things might be more palatable. This information is
needed by any program development tools which create object files. One
way out of this would be for unexec to create a dirty big assembler
file (consisting entirely of allocation directives) and use the
existing assembler and loader to create the executable file. Of course
the assembler is not exactly portable!
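
A minimal sketch of the assembler-file idea, assuming the data to be
saved is available as a byte buffer. The function name is hypothetical,
and the directive spellings (.data, .globl, .byte) are an assumption;
they differ between assemblers, which is exactly the portability catch.

```c
#include <stdio.h>
#include <stddef.h>

/* Emit `len` bytes from `mem` as data allocation directives under the
   global label `label`.  Returns 0 on success, -1 on a write error. */
int write_asm_bytes(FILE *out, const char *label,
                    const unsigned char *mem, size_t len)
{
    size_t i;

    if (fprintf(out, "\t.data\n\t.globl %s\n%s:\n", label, label) < 0)
        return -1;
    for (i = 0; i < len; i++) {
        /* One directive per byte keeps the sketch simple. */
        if (fprintf(out, "\t.byte %u\n", (unsigned)mem[i]) < 0)
            return -1;
    }
    return 0;
}
```

The existing assembler and loader would then take care of section
alignment and headers, at the price of a very large intermediate file.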

I don't think that there is much that could be done about 5. Is it a
problem?

Disclaimer: I haven't delved into this since version 17 but I don't think
            things have changed significantly.
-- 
 Ian Dall           life (n). A sexually transmitted disease which afflicts
                              some people more severely than others.
idall@augean.oz