[gnu.emacs.bug] 24 bit pointers

mac@rhea.ardent.com (Mike McNamara) (01/02/90)

	As RMS asked that requests for long term features go here, I
wish to echo requests for support for greater than 16Mbyte addressing.

	Most of the machines I use at work have 128Mbyte of physical
memory.  My main task right now is verifying new chip designs, which we
do via a simulation language and test scripts.  The output of these
scripts can easily get up to 20-25 megabytes.  I have to use vi to
look at these files!!!!  This is an embarrassment.

	I am quite aware of the memory limitations of the current
memory allocation scheme of emacs; I used to use a Sun 3/50. I would
strongly support any effort to remedy the memory behavior of emacs.
However, I eliminated this problem for me personally to a large extent
by porting emacs to the Ardent machine, and using it to run emacs.

	I would be quite willing to suffer making pointers 64 bits,
with 32 bits of tag information. People, even emacs mavens, who have
emacs running for weeks, might have half a million Lisp Objects
running around.  Better to require a million bytes to describe these
objects than to arbitrarily limit the total size of all the objects
described to 16 Mbytes.

	Especially when there are 4 gigabytes of addressable space!!


--
Michael McNamara	(St)ardent, Inc.		mac@ardent.com

lnz@LUCID.COM (Leonard N. Zubkoff) (01/02/90)

The existing GNU Emacs is nowhere near as limited as many people seem to think.
I build versions for Apollo, Sun, Vax, and MIPS and all are built to allow 26
bits of pointer.  Admittedly, 26 bits is only 64mb rather than 16mb, but in
practice that limit is much harder to exceed than the 24 bit/16mb limit.  All
you need to do to build a 26 bit Emacs is to include the following two lines
in config.h:

/* Allow Emacs's larger than 16 megabytes.  */

#define VALBITS 26
#define GCTYPEBITS 5

In order to increase the sizes further, I don't think moving to a 64 bit object
size is optimal.  A better idea would be to allow 32 bits or close to it of
pointer, and to keep the type information in a separate mapping table.
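
[Illustration: a minimal sketch of why the split between tag bits and
value bits caps the address range.  It assumes one 32-bit word holding a
GC mark bit, GCTYPEBITS of type tag, and VALBITS of value.  Only VALBITS
and GCTYPEBITS come from the post above; every other name is
hypothetical and not taken from the Emacs sources.]

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, high to low: 1 mark bit | GCTYPEBITS tag | VALBITS value. */
#define VALBITS    26
#define GCTYPEBITS 5

/* Hypothetical helper masks, derived from the two settings above. */
#define VALMASK  ((UINT32_C(1) << VALBITS) - 1)
#define TYPEMASK ((UINT32_C(1) << GCTYPEBITS) - 1)

typedef uint32_t lisp_word;

/* Pack a type tag and a value (pointer or integer) into one word. */
static lisp_word make_word(uint32_t type, uint32_t value)
{
    assert(type <= TYPEMASK && value <= VALMASK);
    return (type << VALBITS) | value;
}

/* Extract the tag field. */
static uint32_t word_type(lisp_word w)  { return (w >> VALBITS) & TYPEMASK; }

/* Extract the value field. */
static uint32_t word_value(lisp_word w) { return w & VALMASK; }

/* Bytes reachable through the value field: 2^VALBITS. */
static uint32_t addressable_bytes(void) { return UINT32_C(1) << VALBITS; }
```

With VALBITS at 26 the value field spans 2^26 bytes (64 megabytes); each
bit moved from the tag to the value doubles that, which is why the tag
width is the real limit here.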

		Leonard

rlk@think.com (Robert Krawitz) (01/02/90)

Good tip, changing the allocation of tag/value bits.

Actually, as I recall Lucid used 30 bit integers with 2 bits of tag in
2.1 (I don't remember if this applies in 3.0, but I think not).  I agree
with the concept of making integers fast and compact, since they're used
often enough so that they shouldn't require a pointer dereference every
time.  On the other hand, there's no good reason why buffers, lists,
arrays, etc. need this sort of optimization; one extra word of memory
and one pointer dereference isn't so bad.  And if lists are stored
cdr-coded, then the extra memory gets swallowed.

Along these lines, if 2 bits are allocated as tag bits, then 30 bits (1
Gbyte) of address space is left.  On future machines with bigger address
spaces, the number of value bits is always n - 2.

As for what a word should look like: say, if bits 30 and 31 are zero,
the word is a fixnum (this allows signed 30 bit integers, for a range of
+-2^29).  If bit 31 is zero and bit 30 is one, the word is an untyped
pointer (i.e. not a Lisp object).  The other two cases would be Lisp
objects (perhaps one case would be anything but a cons, and the other
case would be a cons, to allow cdr-coding somehow).  For Lisp objects,
the type information might be encoded in the rest of the word, with the
data (or in the case of arrays, size information followed by data)
stored in the following words.
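
[Illustration: one possible reading of the two-tag-bit layout described
above, as a sketch only.  The enum names, the helper functions, and the
choice to sign-extend the fixnum from bit 29 are assumptions made for
this example.]

```c
#include <assert.h>
#include <stdint.h>

/* Top two bits (31 and 30) select the kind of word. */
enum word_kind {
    KIND_FIXNUM      = 0,  /* bits 31,30 = 0,0 */
    KIND_RAW_POINTER = 1,  /* bits 31,30 = 0,1: untyped pointer */
    KIND_LISP_OTHER  = 2,  /* bits 31,30 = 1,0: Lisp object (assumed non-cons) */
    KIND_LISP_CONS   = 3   /* bits 31,30 = 1,1: Lisp object (assumed cons) */
};

#define VALUE_BITS 30
#define VALUE_MASK ((UINT32_C(1) << VALUE_BITS) - 1)

static enum word_kind kind_of(uint32_t w)
{
    return (enum word_kind)(w >> VALUE_BITS);
}

/* Build a tagged word from a kind and a 30-bit payload. */
static uint32_t make_tagged(enum word_kind kind, uint32_t payload)
{
    return ((uint32_t)kind << VALUE_BITS) | (payload & VALUE_MASK);
}

/* Store a signed integer in [-2^29, 2^29) in the low 30 bits. */
static uint32_t make_fixnum(int32_t n)
{
    assert(n >= -(INT32_C(1) << 29) && n < (INT32_C(1) << 29));
    return (uint32_t)n & VALUE_MASK;
}

/* Recover the signed value by sign-extending from bit 29. */
static int32_t fixnum_value(uint32_t w)
{
    uint32_t v = w & VALUE_MASK;
    assert(kind_of(w) == KIND_FIXNUM);
    if (v & (UINT32_C(1) << 29))
        return (int32_t)v - (INT32_C(1) << 30);
    return (int32_t)v;
}
```

Because the fixnum tag is all zeros, a non-negative fixnum is stored as
its own bit pattern; negative values need the sign extension shown.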
-- 
ames >>>>>>>>>  |	Robert Krawitz <rlk@think.com>	245 First St.
bloom-beacon >  |think!rlk	(postmaster)		Cambridge, MA  02142
harvard >>>>>>  .	Thinking Machines Corp.		(617)876-1111

kjones@talos.uu.net (Kyle Jones) (01/03/90)

Robert Krawitz writes:
 > I agree with the concept of making integers fast and compact, since
 > they're used often enough so that they shouldn't require a pointer
 > dereference every time.

I'm not so sure it's worth it.  Lisp variables that are used internally
and whose values are normally integers have their values forwarded to
real C ints internally anyway, so there's no overhead except when the
Lisp variable's value is changed.  As for the interpreted Lisp, the extra
dereference is small potatoes compared to all the type checking and arg
grokking that goes on while interpreting code.

On the other hand, stealing bits from Lisp ints means dates from the
epoch can't be represented in a single integer.  Thus the file times
returned by file-attributes are difficult to use because relational
operators won't work on them.
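
[Illustration: a sketch of the arithmetic behind this complaint.  It
assumes a time value split into two 16-bit halves, high part first; the
struct and function names are hypothetical.  A signed 24-bit integer
tops out below 2^23, roughly 97 days' worth of seconds, so a count of
seconds since 1970 cannot fit.]

```c
#include <stdint.h>

/* Assumed representation: seconds since the epoch as two 16-bit halves. */
struct split_time {
    uint16_t high;  /* upper 16 bits of the seconds count */
    uint16_t low;   /* lower 16 bits of the seconds count */
};

/* Does n fit in a signed two's-complement integer of the given width? */
static int fits_in_signed_bits(int64_t n, int bits)
{
    int64_t limit = INT64_C(1) << (bits - 1);
    return n >= -limit && n < limit;
}

/* Split a 32-bit seconds count into its two halves. */
static struct split_time split_seconds(uint32_t seconds)
{
    struct split_time t;
    t.high = (uint16_t)(seconds >> 16);
    t.low  = (uint16_t)(seconds & 0xFFFFu);
    return t;
}

/* Rejoin the halves into one 32-bit count. */
static uint32_t join_seconds(struct split_time t)
{
    return ((uint32_t)t.high << 16) | t.low;
}

/* Three-way comparison: negative, zero, or positive, like strcmp.
   The high halves decide unless they are equal. */
static int compare_split(struct split_time a, struct split_time b)
{
    if (a.high != b.high)
        return a.high < b.high ? -1 : 1;
    if (a.low != b.low)
        return a.low < b.low ? -1 : 1;
    return 0;
}
```

A plain "less than" on either half alone gives wrong answers, which is
the inconvenience being described; the comparison has to look at both.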

wolfgang@mgm.mit.edu (Wolfgang Rupprecht) (01/03/90)

Perhaps I'm missing something, but from a quick glance at the code
(especially lisp.h) it looks like the tag bits could be moved from the
"lisp object" pointer/tag to the pointed to object with almost no
trouble.  The only real loss appears to be the nifty hack for lisp
integer types.  Here a real struct would have to be created which
would hold a tag and the actual integer.  In all other cases each
lisp-data struct would need a tag byte+padding preceding the current
lisp-data struct.  GC bits could still stay with the tag.

old way:

(32 bit lisp object, basically a tag and 24-bit pointer)
8-bits | 24-bits 
tag      pointer to "real" data

proposed way to get 32 bit address, 32 bit integers:
32 bits
 pointer -> to tag + "real data"

Tags would have to be in the same position in each data struct.  Let's
say, the first byte of every lisp struct.
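
[Illustration: a minimal sketch of the proposal above, with the tag in
the first byte of every object and the Lisp object reduced to a plain
full-width pointer.  All struct, enum, and function names are
hypothetical; this is not how lisp.h is actually written.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum lisp_tag { TAG_INT = 1, TAG_CONS = 2, TAG_STRING = 3 };

/* Common header: tag in the first byte, GC bits kept alongside it. */
struct lisp_header {
    uint8_t tag;
    uint8_t gc_mark;
};

/* Integers become real heap objects holding a full 32-bit value. */
struct lisp_int {
    struct lisp_header h;
    int32_t value;
};

struct lisp_cons {
    struct lisp_header h;
    void *car;
    void *cdr;
};

/* A Lisp object is now an untagged pointer to a header. */
typedef void *lisp_object;

/* Read the tag through the pointer.  This works for every object type
   because the header is always the first member of the struct. */
static enum lisp_tag tag_of(lisp_object obj)
{
    assert(obj != NULL);
    return (enum lisp_tag)((struct lisp_header *)obj)->tag;
}
```

The cost is visible in struct lisp_int: every integer now needs its own
header and a dereference, which is the lost "nifty hack" mentioned
above.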

Most changes to the emacs code would be confined to lisp.h.  Anyone 
want to try it? (Or donate a computer for a while so I can do it?)

-wolfgang

PS. While we are at it, bignums anyone?
Wolfgang Rupprecht	ARPA:  wolfgang@mgm.mit.edu (IP 18.82.0.114)
TEL: (703) 768-2640	UUCP:  mit-eddie!mgm.mit.edu!wolfgang

gumby@Gang-of-Four.Stanford.EDU (David Vinayak Wallace) (01/04/90)

Since gnu emacs runs on conventional hardware why not adopt BIBOP?
It's extremely cheap.  I think someone already suggested this.

g

PS: For those who haven't heard of it, under BIBOP you allocate
    regions of memory to specific datatypes.  You can think of this as
    "overloading" the pointers.  This was especially cute and cheap on
    the PDP-10, but any machine with a barrel shifter can implement it
    cheaply too.
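
[Illustration: a toy sketch of the BIBOP idea described above.  The heap
is cut into fixed-size pages, each page holds objects of one type, and
the type of any pointer is found by shifting its offset down to a page
number and indexing a table.  Page size, heap size, and all names are
assumptions chosen for the example.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assumed 4096-byte pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16

enum page_type { PAGE_FREE = 0, PAGE_CONS, PAGE_STRING, PAGE_VECTOR };

static unsigned char heap[NUM_PAGES * PAGE_SIZE];
static unsigned char page_types[NUM_PAGES];   /* one type entry per page */
static size_t next_free_page;

/* Dedicate the next unused page to one type; NULL when the heap is full. */
static void *alloc_page(enum page_type type)
{
    size_t page;
    if (next_free_page >= NUM_PAGES)
        return NULL;
    page = next_free_page++;
    page_types[page] = (unsigned char)type;
    return heap + page * PAGE_SIZE;
}

/* Type lookup for a pointer into the heap: one subtract, one shift,
   one table index.  No bits of the pointer itself are used as a tag. */
static enum page_type type_of(const void *p)
{
    size_t offset = (size_t)((const unsigned char *)p - heap);
    assert(offset < sizeof heap);
    return (enum page_type)page_types[offset >> PAGE_SHIFT];
}
```

Since the type lives in the table rather than in the pointer, pointers
keep their full width, which is the attraction for the address-space
problem discussed in this thread.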