lars@myab.se (Lars Pensjö) (09/13/88)
This proposal has probably been mentioned before, but I have not seen it.
Why is gnu-emacs implemented with the self dump feature? I know that
it speeds up the start-up, but it is extremely unportable.
A suggestion:
Let temacs write the compiled lisp code into a file 'code.c' in the following
format:
char lisp_code[] = {
23, 45, 76, 93, -34, 45,
...
};
And then relink a new emacs with 'code.o'. 'temacs' can be linked with
an empty 'lisp_code' definition.
This should be portable. Byte order should not be a problem, because
temacs writes the file 'code.c' directly from memory.
Some compilers even have a flag to put data in text area, which is
just what is wanted now.
Lars Pensjö
lars@myab.se
--
Lars Pensjö
{decvax,philabs}!mcvax!enea!chalmers!myab!lars
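A minimal sketch of the generator side of Lars's proposal, in modern C. The helper `dump_as_c_array` and the companion `<name>_size` variable are hypothetical, not part of the Emacs sources. Two departures from the `char lisp_code[]` in the post are deliberate: each byte is printed through `unsigned char`, so the text does not depend on whether plain `char` is signed on the machine running temacs, and the array is declared `const`, which lets many compilers place it in a read-only section rather than in writable .data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `len` bytes starting at `mem` to `out` as a
   C array definition named `name`.  Returns 0 on success, -1 on error. */
int dump_as_c_array(FILE *out, const char *name, const void *mem, size_t len)
{
    static const char ident_chars[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";
    const unsigned char *p = mem;
    size_t i;

    assert(out != NULL && name != NULL);
    /* Refuse names that would not be valid C identifiers. */
    if (name[0] == '\0' || (name[0] >= '0' && name[0] <= '9')
        || strspn(name, ident_chars) != strlen(name))
        return -1;

    /* `const` lets many compilers put the array in a read-only section. */
    if (fprintf(out, "const unsigned char %s[] = {", name) < 0)
        return -1;
    for (i = 0; i < len; i++) {
        /* Print through unsigned char so the output is the same whether or
           not plain char is signed; start a new line every 12 values. */
        if (fprintf(out, "%s%u,", i % 12 == 0 ? "\n  " : " ",
                    (unsigned)p[i]) < 0)
            return -1;
    }
    /* C does not allow an empty initializer list, so a zero-length dump
       gets one placeholder byte; the _size variable still records 0. */
    if (len == 0 && fprintf(out, " 0") < 0)
        return -1;
    if (fprintf(out, "\n};\nconst unsigned long %s_size = %lu;\n",
                name, (unsigned long)len) < 0)
        return -1;
    return 0;
}
```

temacs would call this once with the start and length of the region it wants to keep and write the result to `code.c` for the final link. Note that it copies bytes only: any pointers inside the region are written out as raw numbers.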
jr@bbn.com (John Robinson) (09/15/88)
In article <441@myab.se>, lars@myab (Lars Pensjö) writes an excellent idea:
>This proposal has probably been mentioned before, but I have not seen it.
>...
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};
>... etc.
>Some compilers even have a flag to put data in text area, which is
>just what is wanted now.

But the problem may be that not all compilers support this. Of course
if GCC does, we should all adopt it instantly! Also, signed chars
(they appear in your example) may be a problem. But I merely quibble;
it's a great idea. Needs elisp symbol-table hooking but not much more.
idall@augean.OZ (Ian Dall) (09/16/88)
In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
>This proposal has probably been mentioned before, but I have not seen it.
>
>Why is gnu-emacs implemented with the self dump feature? I know that
>it speeds up the start-up, but it is extremely unportable.
>
>A suggestion:
>
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};

I think there might be a better way. The problem with this is that the
compiled lisp ends up in .data. On BSD machines at least, the .data
section will not be shared, so with multiple emacs users there will be
multiple copies of the preloaded lisp in memory. (I think SysV avoids
this problem by implementing a copy-on-write scheme for the .data area,
but I could be wrong.)

Would it be possible to translate the preloaded lisp to C in the format
of the lisp-callable C functions already there? E.g.:

DEFUN ("kill-emacs", Fkill_emacs, Skill_emacs, 0, 1, "P",
  "Exit the Emacs job and kill it. ARG means no query.\n\
If emacs is running noninteractively and ARG is an integer,\n\
return ARG as the exit program code.")
  (arg)
     Lisp_Object arg;
{
  Lisp_Object answer;
  int i;
  .
  .
  .
}

This is of course a harder problem. Would there be fundamental
restrictions on the allowable lisp code to do this? With the addition
of dynamic linking this would also allow a true compiled E-lisp.

>Some compilers even have a flag to put data in text area, which is
>just what is wanted now.

Does this flag have the side effect of making the .text area unsharable?
If so, it is not really the way to go.
--
Ian Dall     life (n). A sexually transmitted disease which afflicts
             some people more severely than others.
idall@augean.oz
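To make Ian's idea concrete, here is a hypothetical C rendering of a trivial Lisp function, `(defun add-two (n) (+ n 2))`. The object type, the `toy_subr` descriptor and every name below are invented for this sketch; they only loosely mirror the DEFUN fields quoted above (Lisp name, C function, minimum and maximum argument counts) and are not Emacs's real `Lisp_Object` or subr structure. The point is where things land: the function body is machine code and the descriptor is constant data, so on typical systems both can sit in read-only, shareable sections instead of per-process .data.

```c
#include <assert.h>

/* Invented object representation for this sketch only. */
enum toy_type { TOY_NIL, TOY_INT };

typedef struct {
    enum toy_type type;
    long value;
} toy_object;

/* Invented descriptor, loosely shaped like the DEFUN fields above. */
struct toy_subr {
    const char *lisp_name;
    toy_object (*fn)(toy_object);
    int min_args, max_args;
};

static toy_object make_toy(enum toy_type type, long value)
{
    toy_object o;
    o.type = type;
    o.value = value;
    return o;
}

/* What a translator might emit for (defun add-two (n) (+ n 2)).
   A real one would signal wrong-type-argument and handle overflow;
   returning nil here is only a placeholder for that. */
static toy_object Fadd_two(toy_object n)
{
    if (n.type != TOY_INT)
        return make_toy(TOY_NIL, 0);
    return make_toy(TOY_INT, n.value + 2);
}

/* Constant data needing no run-time initialization, so it does not
   have to be dumped. */
const struct toy_subr Sadd_two = { "add-two", Fadd_two, 1, 1 };

/* Call a one-argument subr after checking its declared arity. */
toy_object call1(const struct toy_subr *s, toy_object arg)
{
    assert(s->min_args <= 1 && 1 <= s->max_args);
    return s->fn(arg);
}
```

Everything hard about Ian's question (macros, dynamic scoping, closures over symbols, GC protection of locals) is absent from a function this small, which is why it says little about whether the whole preloaded lisp could be translated.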
lars@myab.se (Lars Pensjö) (09/19/88)
In article <29698@bbn.COM> jr@bbn.com (John Robinson) writes:
>In article <441@myab.se>, lars@myab (Lars Pensjö) writes an excellent idea:
>>...
>>Let temacs write the compiled lisp code into a file 'code.c' in the following
>>format:
>>
>>char lisp_code[] = {
>>23, 45, 76, 93, -34, 45,
>>...
>>};
>...
> Also, signed chars
>(they appear in your example) may be a problem.

I do not think signed versus unsigned chars will be a problem. If you
have a machine with only unsigned chars, a compiled program (temacs)
will also only write unsigned numbers to the file 'code.c'.
Automatically portable!

I put the negative number in the example on purpose, to trigger a
discussion, because I am still not sure about the problem with the sign
of characters.
---
lars@myab.se
--
Lars Pensjö
{decvax,philabs}!mcvax!enea!chalmers!myab!lars
kjones@talos.UUCP (Kyle Jones) (09/21/88)
In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
>Why is gnu-emacs implemented with the self dump feature? I know that
>it speeds up the start-up, but it is extremely unportable.

Dumping compiled code also helps keep down GNU Emacs' virtual memory
usage (which in turn speeds startup time a bit more). Code common to
all invocations of the editor will be shared among concurrent Emacs
sessions, instead of being duplicated in core. Since GNU Emacs is BIG
even for Emacs-style editors, this is a big win.

As for portability, let us not forget that GNU Emacs is targeted for the
(not yet completed) GNU operating system. As such, it is sufficient that
the dump feature work with the BSD executable file format that the GNU
system ultimately will use. The real task will be to port the operating
system once it is completed.

>A suggestion:
>Let temacs write the compiled lisp code into a file 'code.c' in the following
>format:
>
>char lisp_code[] = {
>23, 45, 76, 93, -34, 45,
>...
>};
>
>And then relink a new emacs with 'code.o'. 'temacs' can be linked
>with an empty 'lisp_code' definition.

It's not just the lisp code that needs to be dumped but the entire
initialized lisp system, e.g. interned symbols, internal pointers to
lisp objects, symbol -> function definition relationships, initialized
keymaps, and so on. I can't see how this idea can be made to save this
information, since all the other .o files with which 'code.o' will be
linked will already have all external variables initialized to 0.

In article <394@augean.OZ> idall@augean.OZ (Ian Dall) writes:
>Would it be possible to translate the preloaded lisp to C in the format of
>the lisp callable C functions already there.
>...
>This is of course a harder problem. Would there be fundamental restrictions
>on the allowable lisp code to do this?

I have read that the Kyoto Common Lisp compiler does just that. GNU Emacs
Lisp is clearly less complex than full Common Lisp, so the situation is
workable.

kyle jones
janssen@titan.sw.mcc.com (Bill Janssen) (09/24/88)
In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
>GNU Emacs Lisp is clearly less complex than full Common Lisp...

"Clearly". uh-huh.

Bill
rlk@think.com (Robert Krawitz) (09/24/88)
In article <1252@titan.SW.MCC.COM>, janssen@titan (Bill Janssen) writes:
]In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
]>GNU Emacs Lisp is clearly less complex than full Common Lisp...
]"Clearly". uh-huh.

The problem is that it's missing a fair number of useful features, and
there are a few major problems mostly with the reader (lack of reader
macros, only dynamic scoping, and case sEnSiTiViTy). Other than that,
it's quite powerful indeed, and it doesn't seem a lot "simpler"
conceptually.
--
harvard >>>>>> |            Robert Krawitz <rlk@think.com>
bloom-beacon > |think!rlk
topaz >>>>>>>> .            rlk@a.HASA.disorg
meissner@xyzzy.UUCP (Usenet Administration) (09/26/88)
In article <441@myab.se> lars@myab.se (Lars Pensjö) writes:
| Why is gnu-emacs implemented with the self dump feature? I know that
| it speeds up the start-up, but it is extremely unportable.

After spending a bit of time hacking an unexec to work on Data General
MV computers, let me share some observations about GNU Emacs:

1) The lisp code smashes address + lisp type into one 32-bit word,
   which looks like this (on a big-endian machine):

    3 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
    2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1
   +-+-+-----------+-----------------------------------------------+
   |M|A|   Type    |   Bottom 24 bits of address or lisp integer   |
   +-+-+-----------+-----------------------------------------------+

   The top bit is used as a mark during garbage collection, and the
   second bit seems to be used for garbage collection of arrays as
   well. The lisp type follows, and the lower 24 bits depend on the
   type, but are typically the bottom 24 bits of the address, or the
   integer value.

2) Because of the lisp format, you can't really initialize things,
   since C has no way of getting just the bottom 24 bits of an address.

3) You can't really relink, because addresses might change due to the
   new module being added.

4) For a word-oriented machine like the MV, you have to be careful to
   store a byte address when creating Lisp Objects, and expect it to be
   in byte address form, since garbage collection plays funny games
   with pointers.

5) GNU Emacs also depends on preserving the value of static variables,
   and some malloc'ed data.

6) The normal unexec code depends on having a dumb linker which will
   not reorder data segments from the order encountered in the object
   modules.

PS - for any DG/UX readers out there, I will try to make the DG/UX
changes available in mid October (I won't be back until the start of
October).
--
Michael Meissner, Data General.
Uucp: ...!mcnc!rti!xyzzy!meissner
Arpa: meissner@dg-rtp.DG.COM (or) meissner%dg-rtp.DG.COM@relay.cs.net
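The word layout in point 1 can be sketched with a few masks. This is a minimal illustration assuming exactly the diagram above (one GC mark bit, one array mark bit, a 6-bit type, a 24-bit value); the macro and function names are hypothetical and the code is not copied from Emacs's lisp.h. The closing comment restates point 2 in C terms.

```c
#include <assert.h>
#include <stdint.h>

/* Layout from the diagram: top bit = GC mark, next bit = array mark,
   next 6 bits = type, low 24 bits = value.  Hypothetical names. */
#define VAL_BITS   24
#define VAL_MASK   ((UINT32_C(1) << VAL_BITS) - 1)   /* 0x00FFFFFF */
#define TYPE_MASK  UINT32_C(0x3F)                    /* 6 type bits */
#define MARK_BIT   (UINT32_C(1) << 31)
#define ARRAY_MARK (UINT32_C(1) << 30)

typedef uint32_t lisp_word;

/* Pack a type and the low 24 bits of a value (address or integer). */
static lisp_word make_word(unsigned type, uint32_t value)
{
    assert(type <= TYPE_MASK);
    return ((lisp_word)type << VAL_BITS) | (value & VAL_MASK);
}

/* The mark bits are masked off, so a marked object keeps its type. */
static unsigned word_type(lisp_word w)
{
    return (unsigned)((w >> VAL_BITS) & TYPE_MASK);
}

static uint32_t word_value(lisp_word w)
{
    return w & VAL_MASK;
}

/* Integers are 24-bit two's complement: sign-extend from bit 23. */
static int32_t word_int(lisp_word w)
{
    int32_t v = (int32_t)(w & VAL_MASK);
    if (v & 0x800000)
        v -= 0x1000000;
    return v;
}

/* Point 2 in C terms.  Generated source would need something like
       static int some_object;
       static lisp_word w = (3u << 24) | ((uintptr_t)&some_object & VAL_MASK);
   but an address converted to an integer and then masked is not a
   constant expression in standard C, and common compilers reject it as
   a static initializer.  The packed value exists only once the program
   has been linked and loaded. */
```

A plain pointer (`&some_object`) is a valid static initializer because the linker can relocate it; it is the masking and tagging that C cannot express at compile time.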
mike@ists.yorku.ca (Mike Clarkson) (09/26/88)
In article <28474@think.UUCP>, rlk@think.com (Robert Krawitz) writes:
> In article <1252@titan.SW.MCC.COM>, janssen@titan (Bill Janssen) writes:
> ]In article <305@talos.UUCP>, kjones@talos (Kyle Jones) writes:
> ]>GNU Emacs Lisp is clearly less complex than full Common Lisp...
> ]"Clearly". uh-huh.
>
> The problem is that it's missing a fair number of useful features, and
> there are a few major problems mostly with the reader (lack of reader
> macros, only dynamic scoping, and case sEnSiTiViTy). Other than that,
> it's quite powerful indeed, and it doesn't seem a lot "simpler"
> conceptually.

Emacs Lisp is very MacLisp'ish, as you would expect since RMS was one of
the original MacLispers. It therefore has a lot in common with Franz
Lisp, which was built to run a large MacLisp program (MACSYMA). There
are a lot of similarities in design between Franz and Emacs Lisp,
particularly the C-code kernel, followed by load and dump. However,
Franz has a real compiler that, even by today's standards, is quite
fast.

A great fantasy of mine has always been to merge Emacs into Franz. This
would give a much fuller and better-performing lisp with a real
compiler. It might also help keep some (PD) development going on Franz.
There are disadvantages to large monolithic images supporting two
different functions, but both Franz and Emacs have autoloading, so the
combined system need not be too big. There would be great gains in GC
speed and speed of compiled code, not to mention things like floating
point numbers and a foreign function interface.

Sigh... sometime when I have a spare year just for hacking...

Mike Clarkson                                  mike@ists.UUCP
Institute for Space and Terrestrial Science    mike@ists.yorku.ca
York University, North York, Ontario,          uunet!mnetor!yunexus!ists!mike
CANADA M3J 1P3                                 +1 (416) 736-5611
throopw@xyzzy.UUCP (Wayne A. Throop) (09/27/88)
> jr@bbn.com (John Robinson)
>> lars@myab (Lars Pensjö)
>>[...GNU emacs could be more portable if it arranged to initialize its
>> pre-defined lisp routines via source like so: ...]
>>char lisp_code[] = {
>>23, 45, 76, 93, -34, 45,
>>...
>>};
>>... etc.
> But the problem may be that not all compilers support [...putting
> such objects in the text (that is, shared) section...]. [...]
> Also, signed chars
> (they appear in your example) may be a problem. But I merely quibble;
> it's a great idea. Needs elisp symbol-table hooking but not much more.

I agree that the method Lars points out is more portable, and a good
deal cleaner, even with the problems with signed chars and the fact
that it may be unshared data on some systems. The code could be
generated based on a set of switches to control the signedness of the
range of character values, and almost every system has some hack or
another to put unchanging variables into text space, even if it is the
old mouldy standby of tromping on the intermediate assembly code. The
results of these hacks would still be far more aesthetic than the
massive hacks involved in unexec.

BUT, the real problem solved by unexec that is not solved by source
generation is that some of the values that go into the initialized
area are not known until after link time. The addresses of primitive
routines, for example. As long as the lisp object code refers to
absolute addresses (and I suspect it must do so for efficiency
reasons), the initializing code cannot be generated for an object yet
to be linked, but only for the current *already* *linked* executable.
Which implies unexec, or some similar subterfuge. Most LISP systems
have similar problems.

All that said, I think unexec could be made a good deal cleaner, and
the machine dependencies could be isolated in a much more palatable
way. But going all the way to generating source is probably Right Out.
--
A LISP programmer knows the value of everything, but the cost of nothing.
                                        --- Alan J. Perlis
--
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw
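One way to see Wayne's point about link-time addresses: generated C can always name a function and let the linker fill in its address, but only as a plain pointer in a table, not as an absolute address baked into the generated data itself. The minimal sketch below, with invented primitives and names, has the generated data refer to primitives by index into a linker-initialized table. It is an illustration of the trade-off, not something proposed in the thread or used by Emacs: the extra load per call is the kind of cost Wayne suspects the real object code avoids by using absolute addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Two stand-in primitives, invented for this sketch. */
static long prim_add1(long x) { return x + 1; }
static long prim_double(long x) { return x * 2; }

/* Generated source can name the functions; the linker supplies the
   addresses, whatever they turn out to be after the final link. */
static long (*const primitive_table[])(long) = {
    prim_add1,    /* index 0 */
    prim_double,  /* index 1 */
};

#define N_PRIMITIVES (sizeof primitive_table / sizeof primitive_table[0])

/* Generated "compiled code" refers to primitives by table index only,
   so it can be written out before the executable is linked. */
static const unsigned char call_sequence[] = { 0, 1, 0 };

/* Each call pays one extra load through primitive_table: the price of
   not embedding absolute addresses in the generated data. */
long run_call_sequence(long x)
{
    size_t i;
    for (i = 0; i < sizeof call_sequence; i++) {
        assert(call_sequence[i] < N_PRIMITIVES);
        x = primitive_table[call_sequence[i]](x);
    }
    return x;
}
```

Starting from 3, the sequence applies add1, double, add1 and yields 9.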
idall@augean.OZ (Ian Dall) (09/29/88)
In article <1231@xyzzy.UUCP> throopw@xyzzy.UUCP (Wayne A. Throop) writes:
>> jr@bbn.com (John Robinson)
>>> lars@myab (Lars Pensjö)
>>>[...GNU emacs could be more portable if it arranged to initialize its
>>> pre-defined lisp routines via source like so: ...]
>>>char lisp_code[] = {
>>>23, 45, 76, 93, -34, 45,
>>>...
>>>};
>>>... etc.
>> But the problem may be that not all compilers support [...putting
>> such objects in the text (that is, shared) section...]. [...]
>> Also, signed chars
>> (they appear in your example) may be a problem. But I merely quibble;
>> it's a great idea. Needs elisp symbol-table hooking but not much more.
>
>I agree that the method Lars points out is more portable, and a good
>deal cleaner, even with the problems with signed chars and the fact
>that it may be unshared data on some systems.
>
>BUT, the real problem solved by unexec that is not solved by source
>generation is that some of the values that go into the initialized
>area are not known until after link time. The addresses of primitive
>routines, for example. As long as the lisp object code refers to
>absolute addresses (and I suspect it must do so for efficiency
>reasons), the initializing code cannot be generated for an object yet
>to be linked, but only for the current *already* *linked* executable.
>Which implies unexec, or some similar subterfuge. Most LISP systems
>have similar problems.

Well, if "loaded-lisp.c" is last in the list of things linked, it would
be OK on most machines. It still wouldn't be portable to machines which
link things in funny orders. My earlier suggestion of turning the lisp
into real C instead of just a large initialized array would not have
this problem, but can it be made to work? After all, can't most "real"
lisps produce compiled code?

>All that said, I think unexec could be made a good deal cleaner, and
>the machine dependencies could be isolated in a much more palatable
>way.

Gnu emacs makes several non-portable assumptions.
Those that spring to mind are:

(1) Pointers (to lisp objects) are stored in 24 bits. This means that
    machines which are capable of, AND USE, a virtual address space of
    more than 2^24 bytes won't run Gnu emacs. This is pretty much
    independent of the unexec feature.

(2) ld is assumed to load the concatenated .text sections followed by
    the concatenated .data sections. This allows unexec to work out the
    beginning and end of the sections and also to guarantee that the
    pure data is at the beginning of the .data section.

(3) Various kernels make different assumptions about the alignment of
    the .text and .data in an executable file, presumably to simplify
    the paging process. Emacs must guess what these assumptions are
    when creating the unexeced emacs.

(4) Emacs assumes that C static variables go in .bss if uninitialized
    and in .data if initialized. In fact it uses this as a way of
    forcing which variables end up where. I know of one compiler which
    treats uninitialized static variables as if they were initialized
    with zero (and sticks them in .data).

(5) Emacs needs to be able to read its own .text section. Some systems
    could prevent this if the MMU differentiates between read
    protection and execute protection. Systems with different
    instruction and data spaces would be a problem (not that GNU Emacs
    would run on a PDP-11 anyway).

Assumptions 2 and 4 could disappear if unexec did not attempt to put
data into the .text region. An extra conditional might be useful to say
"don't try to adjust the .text/.data boundary when unexecing". This has
a penalty in that the pure lisp will not be shared, but at least the
speedup in start-up time will still be there. Lars's solution would also
result in non-shared .data sections, at least on BSD machines, and any
attempt to fix this drawback would probably have to make the same sort
of assumptions as unexec. Perhaps Unix could use a .pdata (pure data)
section type.
The file format problems alluded to in 4 are more than just an Emacs
problem; they are a Unix problem. The COFF file format for SysV is a
step in the right direction, but there are, unfortunately, some
system-dependent magic numbers defining the alignment of the sections,
which vary from system to system. If these were defined in some
standard include file, things might be more palatable. This information
is needed by any program development tools which create object files.
One way out of this would be for unexec to create a dirty big assembler
file (consisting entirely of allocation directives) and use the existing
assembler and loader to create the executable file. Of course the
assembler is not exactly portable!

I don't think that there is much that could be done about 5. Is it a
problem?

Disclaimer: I haven't delved into this since version 17, but I don't
think things have changed significantly.
--
Ian Dall     life (n). A sexually transmitted disease which afflicts
             some people more severely than others.
idall@augean.oz
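Two items in Ian's list, the 24-bit pointer limit in (1) and the section alignment guessing in (3), reduce to simple arithmetic. The sketch below assumes section sizes and offsets fit in 32 bits and that the alignment is a power of two; the helper names are hypothetical, and the alignment value any real system uses is system-specific, which is exactly Ian's complaint.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to a multiple of align, which must be a power of two.
   unexec has to reproduce whatever rounding the local kernel expects
   between sections; the alignment value itself varies by system. */
static uint32_t align_up(uint32_t n, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Assumption (1): an object's address must survive being cut down to
   the 24-bit value field, i.e. it must lie below 2^24 (16 MB). */
static int fits_in_24_bits(uintptr_t addr)
{
    return (addr & ~(uintptr_t)0xFFFFFF) == 0;
}
```

For example, with a hypothetical 4 KB alignment, a text section ending at 0x1234 puts the start of data at `align_up(0x1234, 0x1000)`, which is 0x2000. On a system with a different rule the same layout would be wrong, so each port of unexec needs the local value.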