[comp.arch] Swizzling

wilson@uicbert.eecs.uic.edu (Paul Wilson) (02/14/91)

I don't know if 64-bit addressing is a good idea in general, and I suspect 
it's a *bad* idea for languages like Lisp and Smalltalk, where the standard
implementation technique is to make everything the size of a pointer,
so that you can store either a pointer or a tagged "immediate" value
anywhere.  Going to 64-bit addresses would *double* your memory
requirements.  There are much better ways to spend transistors.

It's possible to implement 64-bit (or 100 bit, or whatever) addresses
efficiently on standard 32-bit hardware by performing address translation
_at_page_fault_time_ -- you "swizzle" (translate) big addresses into 
small ones so that running programs only ever see "normal" hardware-supported
addresses.
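
[ To make this concrete, here is a minimal sketch in C of what the
fault-time translation might look like.  The table layout and all names
are hypothetical, not taken from Wilson's tech report; a real system
would also rewrite the pointers stored in a page as it faults in. ]

    /* Hypothetical sketch: map 64-bit "long" page numbers onto 32-bit
       hardware pages at page-fault time.  Assumes 4KB pages. */
    #include <stdint.h>

    #define PAGE_BITS 12
    #define HW_PAGES  (1u << (32 - PAGE_BITS))   /* 2^20 hardware pages */

    static uint64_t hw_to_long[HW_PAGES]; /* reverse map; a real system
                                             would use a hash table */
    static uint32_t next_free = 1;        /* hw page 0 means "unmapped" */

    uint32_t swizzle_page(uint64_t long_page)
    {
        uint32_t p;
        for (p = 1; p < next_free; p++)   /* linear scan: sketch only */
            if (hw_to_long[p] == long_page)
                return p;                 /* already mapped */
        if (next_free == HW_PAGES)
            return 0;                     /* 32-bit space exhausted:
                                             unswizzle and re-map */
        hw_to_long[next_free] = long_page;
        return next_free++;
    }

    /* A swizzled pointer is then (hw_page << PAGE_BITS) | offset. */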

This scheme *can* support more RAM than the addresses specify, if you
use a kind of page-switched architecture.  (A form of bank switching,
where every RAM page knows whether it is mapped into the address
space, and where.)  Programs (and programmers) don't have to see this 
page-switching.  Pointers are swizzled and pages are switched at 
page-fault time.  (This scheme is a recent development, and wasn't 
in the tech report on swizzling that I sent out to a lot of people 
who requested it.)  [By the way, everybody who asked more than a
week ago -- 75 people! -- should have gotten it by now.]


With swizzling, you run out of address space every now and then
(after touching around a gigabyte or so of pages), and you
have to use faulting to re-map memory.  There's some overhead at page 
fault time, but it's not like a real disk seek.  It's probably not a lot, 
especially if there's any reasonable locality, since then you don't
really do it that often compared to the huge number of instructions you're
executing in between.

This can be seen as the RISC philosophy applied to address translation.
Usually there's enough locality that a 32-bit address can specify
plenty of locations to keep the processor happy.  Every now and then,
you have to adjust the address space seen by the 32-bit processor
(so that it can actually see more memory than that).  You *don't*
do it by just making the processor bigger and fatter, you do it
by handling the unusual case in software.

For Lisp & Smalltalk, even if you have a 64-bit bus, I suspect you'd be 
*much* better off fetching two 32-bit words (e.g., whole cons cells) at 
a time.  As for type tags, using the two low-order bits as a tag works 
fine if you put the rest of the type information in object headers;  
that's how most people do it now.  Big-bag-of-pages schemes are not so 
popular, partly because they make a generational garbage collector a lot 
more complicated.
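
[ A minimal C sketch of the two-low-bit tagging just described; the
particular tag assignments are made up for illustration. ]

    /* Hypothetical two-bit low tags on a byte-addressed 32-bit machine.
       Heap objects are 4-byte aligned, so real pointers always have 00
       in the low two bits; the object header holds the rest of the
       type information. */
    #include <stdint.h>

    typedef uint32_t value;

    #define TAG_MASK   3u
    #define TAG_PTR    0u   /* 00: pointer to a headered heap object */
    #define TAG_FIXNUM 1u   /* 01: 30-bit immediate integer */

    static int is_pointer(value v) { return (v & TAG_MASK) == TAG_PTR; }

    static value box_fixnum(int32_t n) { return ((uint32_t)n << 2) | TAG_FIXNUM; }

    /* assumes arithmetic right shift of negative ints, as on most machines */
    static int32_t unbox_fixnum(value v) { return (int32_t)v >> 2; }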

Even if you can afford 64-bit chips, you may not want them.  A 32-bit
chip can probably be made faster, if only by using faster materials
with a lower level of integration.  The extra overhead for the occasional
swizzling and page-switching is probably well worth it.

I'm not really clear on what the "optimal" hardware address size is,
but I'd bet that for Lisp and Smalltalk, it's *not* 64 bits.

Of course, if you want a bunch of bits to specify a huge number of
addresses across a parallel machine, you may want larger addresses;
but then you're really using a lot of those bits as processor numbers.

   -- Paul


-- 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

lamaster@pioneer.arc.nasa.gov (Hugh LaMaster) (02/14/91)

In article <1991Feb13.170045.16864@uicbert.eecs.uic.edu> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes:
>I don't know if 64-bit addressing is a good idea in general, and I suspect 
:
:
:
>anywhere.  Going to 64-bit addresses would *double* your memory
>requirements.  There are much better ways to spend transistors.

It's OK.  Since you will need more than 32 bits to address the physical memory
in your 1998-era personal computer, you haven't lost anything!     1/2  :-)


>This scheme *can* support more RAM than the addresses specify, if you
>use a kind of page-switched architecture.  (A form of bank switching,

>page-fault time.  (This scheme is a recent development, and wasn't 
>in the tech report on swizzling that I sent out to a lot of people 
>who requested it.)  [By the way, everybody who asked more than a
>week ago -- 75 people! -- should have gotten it by now.]

As far as I can tell from your description, and from the papers you
sent (thanks!), the maximum size of any particular object is, however,
the maximum amount of virtual memory which you can map.  This is fine for
your applications, apparently, which consist of zillions of little objects,
most of which are less than a page in size.  But, it isn't fine for my
user's applications, which require a few large objects, namely, linearly
addressed arrays of 10-20% of the size of whatever physical memory is
available.  So, once physical memories are, say, > 10GB, you artificially
limit the size of the applications I can run.  Of course, there
are techniques for addressing really large objects, but these always run
into an efficiency problem when accessing arrays in multiple directions.
(In other words, useless :-)
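
[ A hypothetical illustration of that efficiency problem: if a huge
array has to be pieced together from 4GB chunks on a 32-bit-addressed
machine, every element access pays an extra chunk-selection step. ]

    /* Hypothetical: a >4GB array of doubles built from 4GB chunks. */
    #define CHUNK_ELEMS (1u << 29)    /* 2^29 doubles = 4GB per chunk */

    double *chunks[64];               /* up to 256GB, in principle */

    double get(unsigned long long i)  /* needs a 64-bit index anyway */
    {
        return chunks[i / CHUNK_ELEMS][i % CHUNK_ELEMS];
    }

    /* Striding through a multi-dimensional slice in the "wrong"
       direction pays this divide on every element. */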


>Even if you can afford 64-bit chips, you may not want them.  A 32-bit
>chip can probably be made faster, if only by using faster materials

MIPSCo claims that it will do 64-bit address generation without penalty.
Whether or not this turns into a real hardware product, there are machines
in existence which require 3 cycles just to generate a *32*-bit address.
The penalty for 64 bits will probably be manageable.

>I'm not really clear on what the "optimal" hardware address size is,
>but I'd bet that for Lisp and Smalltalk, it's *not* 64 bits.

For Smalltalk, it could waste some memory, based on your description of memory 
allocation, compared to the swizzled address scheme.  The question to answer is:
do you expect *only* a large number of small objects, or do you need to handle
the case of a small number of large objects efficiently, too?  The history
of computing is littered with the bones of segmented architectures, which
could handle the first case just fine, but not the second, by the current
standards of their day.  Are *you* going to tell your users that they can 
buy a machine with 128 GBytes of memory, but the biggest array they can use is 
4 GBytes?  (128 GBytes is what I expect to see on a large machine built 
out of 64 MBit chips.)  I wouldn't want to tell my users that!  :-)

>Of course, if you want a bunch of bits to specify a huge number of
>addresses across a parallel machine, you may want larger addresses;

Or, a "huge" number of physical memory addresses in a single machine.  :-)

What seems huge to one person is just a factor of two increase in resolution
in a 3-D model to someone else...

******

P.S.:   - Quote of the Day -

"By 1995, a common desktop machine will have 1,000 MBytes of main memory, 
1,000 SPECmarks of performance, 10,000 MBytes of secondary storage, a
100 MByte network connection, a homogeneous 64-bit address space,
wide-area shared memory, and a vastly richer interface environment."

John Gage and Bill Joy, Unix Today, Feb. 4, 1991 issue.


  Hugh LaMaster, M/S 233-9,  UUCP:                ames!lamaster
  NASA Ames Research Center  Internet:            lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    With Good Mailer:    lamaster@george.arc.nasa.gov 
  Phone:  415/604-6117       

eliot@cs.qmw.ac.uk (Eliot Miranda) (02/15/91)

In article <1991Feb13.170045.16864@uicbert.eecs.uic.edu> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes:
>I don't know if 64-bit addressing is a good idea in general, and I suspect 
>it's a *bad* idea for languages like Lisp and Smalltalk, where the standard
>implementation technique is to make everything the size of a pointer,
>so that you can store either a pointer or a tagged "immediate" value
>anywhere.  Going to 64-bit addresses would *double* your memory
>requirements.  There are much better ways to spend transistors.
>
In a rather pedantic mode (apologies all) this is not quite so!
In typical Smalltalk-80 systems many objects are bit objects
(e.g. strings, symbols, bytecodes, bitmaps).  When Smalltalk-80 implementations
went from 16-bit pointers to 32-bit pointers the standard V2.0 image file grew
from 598016 bytes (seriously folks!) to about 1 megabyte.

I've just measured my image (whilst reading your message), and for a system
with 40854 objects, 899884 bytes were taken up by 32-bit pointers, 702028 bytes
by bit objects, and 163416 bytes by 4-byte object headers (40854*4).

The average length in bytes of bit objects was 58.639 excluding the display
screen's bitmap (62.3648 bytes including the display screen's bitmap).
Each bit object is padded at the end to a 32-bit boundary.  The space wasted
by the padding was 23737 bytes.  If this were a 64-bit system padding to a
64-bit boundary, the wasted space would be 56447 bytes.
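
[ The padding being measured here is just each byte object's length
rounded up to the word boundary; a one-line C version, with the
boundary as a parameter: ]

    /* Bytes wasted padding an object of length len to an align-byte
       boundary (align a power of two: 4 now, 8 on a 64-bit system). */
    unsigned padding(unsigned len, unsigned align)
    {
        return (align - (len & (align - 1))) & (align - 1);
    }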

So my image file should grow from about
	40854 * 8 (8 byte object table entry per object) +
	899884 +
	702028 +
	163416 bytes = 2092160
to
	40854 * 16 +
	899884 * 2 +
	702028 + 56447 - 23737 +
	163416 * 2 bytes = 3515002 bytes

an expansion factor of 1.68 (which is depressingly close to 2).

But think of the new immediate values you could have! With 8-bit tags:
Points (3529),
Floats (291),
<= 7-byte symbols (1067 symbols of size <= 7 vs 4886 of size >= 8 in this image),
<= 7-byte immutable strings (770 vs 1389),
<= 7-byte bytecode arrays (also immutable) (1753 vs 4302).

So that's at least
	3529 * (16 (OTEntry) + 16 (x&y oops) + 8 (header)) +
	291 * (16 (OTEntry) + 8 (value) + 8 (header)) +
	(1067+770+1753) * (16 (OTEntry) + 8 (data) + 8 (header))
fewer bytes = 265352

So now expansion is from 2092160 to (3515002-265352) which is a factor of 1.55.
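
[ To illustrate the short-symbol trick: an 8-bit tag in a 64-bit word
leaves 56 bits, enough for seven bytes of characters.  A hypothetical
C sketch; the tag value and layout are invented. ]

    /* Pack a symbol of <= 7 characters into a 64-bit immediate:
       tag in the low byte, characters in bits 8..63 (little-endian
       layout assumed). */
    #include <stdint.h>
    #include <string.h>

    #define TAG_SHORT_SYMBOL 0x5Au   /* made-up tag value */

    uint64_t pack_symbol(const char *s, size_t len)   /* len <= 7 */
    {
        uint64_t v = 0;
        memcpy(&v, s, len);
        return (v << 8) | TAG_SHORT_SYMBOL;
    }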

That was fun :-)
-- 
Eliot Miranda			email:	eliot@cs.qmw.ac.uk
Dept of Computer Science	Tel:	071 975 5229 (+44 71 975 5229)
Queen Mary Westfield College	ARPA:	eliot%cs.qmw.ac.uk@nsf.ac.uk	
Mile End Road			UUCP:	eliot@qmw-cs.uucp
LONDON E1 4NS

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/15/91)

On 14 Feb 91 21:02:19 GMT, eliot@cs.qmw.ac.uk (Eliot Miranda) said:

eliot> In article <1991Feb13.170045.16864@uicbert.eecs.uic.edu>
eliot> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes:

wilson> I don't know if 64-bit addressing is a good idea in general, and
wilson> I suspect it's a *bad* idea for languages like Lisp and
wilson> Smalltalk, where the standard implementation technique is to
wilson> make everything the size of a pointer, so that you can store
wilson> either a pointer or a tagged "immediate" value anywhere.  Going
wilson> to 64-bit addresses would *double* your memory requirements.

Not quite: the problem is rather with *some* Lisp implementations that
use pair-sized cells for everything. Many Lisps (for example, those
built with copying collectors, which can do this easily) also use
variable-sized allocations with very small overheads, and all Smalltalk
systems use variable-sized segments, as explained below:

eliot> In a rather pedantic mode (apologies all) this is not quite so!
eliot> In typical Smalltalk-80 systems many objects are bit objects
eliot> (e.g. strings, symbols, bytecodes, bitmaps).  When Smalltalk-80
eliot> implementations went from 16-bit pointers to 32-bit pointers the
eliot> standard V2.0 image file grew from 598016 bytes (seriously
eliot> folks!) to about 1 megabyte.

eliot> [ ... assume pointers, headers, etc double in size but not
eliot> variable sized segments, then get data and so calculations ... ]
eliot> an expansion of 1.68. (Which is depressingly close to 2).

eliot> [ ... but actually one could have as immediate values small sized
eliot> segments, so with more data and calculations ... ] now expansion
eliot> is from 2092160 to (3515002-265352) which is a factor of 1.55.

Actually these large expansion factors (not quite 2, but, as you remark,
fairly large) need not apply everywhere. Smalltalk regrettably is (like
virtually every other OO or 'AI' language) reference based. If it
*allowed* object composition by contiguity as well, as most less
advanced languages do, then many fewer pointers would be needed. Some
statistics show that sharing of subobjects is fairly rare in some major
applications, so they could be profitably inlined.
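
[ In C terms, what composition by contiguity buys -- a hypothetical
example: ]

    struct point { int x, y; };

    /* Reference based: two pointers, plus two separately allocated
       Points, each with its own header in Smalltalk terms. */
    struct rect_by_reference { struct point *origin, *corner; };

    /* By contiguity: the Points are inlined; no pointers and no extra
       headers -- but origin and corner can no longer be shared. */
    struct rect_by_contiguity { struct point origin, corner; };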

eliot> That was fun :-)

Not just fun, terribly useful. Both the data and the calculations.
Thanks for both.

--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

sritacco@hpdmd48.boi.hp.com (Steve Ritacco) (02/16/91)

> I don't know if 64-bit addressing is a good idea in general, and I suspect
> it's a *bad* idea for languages like Lisp and Smalltalk, where the standard
> implementation technique is to make everything the size of a pointer,
> so that you can store either a pointer or a tagged "immediate" value
> anywhere.  Going to 64-bit addresses would *double* your memory
> requirements.  There are much better ways to spend transistors.

This doesn't seem very well thought out to me.  64 bits is a nice healthy 
amount of room for a cons cell or an immediate value.  It was pretty common
to steal the top few (or bottom) bits of a word for a tag, but with 64 bits
you can do that without sacrificing precision.  What about applications of
clever cdr-coding schemes which could give access to a huge address space
but efficiently describe common addresses with smaller offsets?

I think 64 bits opens up a whole new world of possibilities, and we must be
careful to re-think implementations, not just re-implement them.

wilson@uicbert.eecs.uic.edu (Paul Wilson) (02/17/91)

sritacco@hpdmd48.boi.hp.com (Steve Ritacco) writes:

>> I don't know if 64-bit addressing is a good idea in general, and I suspect
>> it's a *bad* idea for languages like Lisp and Smalltalk, where the standard
>> implementation technique is to make everything the size of a pointer,
>> so that you can store either a pointer or a tagged "immediate" value
>> anywhere.  Going to 64-bit addresses would *double* your memory
>> requirements.  There are much better ways to spend transistors.

  [ NOTE (added later): I acknowledge Eliot Miranda's criticism of the above, 
    that it doesn't really *double* your memory needs, just increases it by 
    nearly 70%.  That's bad enough for me... ]

>This doesn't seem very well thought out to me.  64 bits is a nice healthy 
>amount of room for a cons cell or an immediate value.

I'm not sure I get this.  It sounds like you're saying you can put an
immediate or a cons cell in 64 bits.  But since a cons cell has two
fields (CAR and CDR) that have to be able to hold tagged pointers (or
immediates), a cons cell MUST be at least twice the size of a pointer.
(Discounting compression schemes like cdr-coding, or my own page-fault-
time compression scheme.)  I agree that 64 bits is a nice size for
a cons cell, but that means it's *not* a nice size for a pointer.  You
want 32-bit addresses so you can grab two of them at once if you have
a 32-bit bus.  And you want pointer swizzling to give you a large
address space anyway.  (And you want page-fault time compression to
reduce the memory and disk requirements even further.)
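
[ Concretely: a cons cell is two tagged words, so whatever the word
size W, the cell is 2W.  A hypothetical C rendering: ]

    #include <stdint.h>

    typedef uint32_t value;   /* tagged pointer-or-immediate */

    /* Two tagged fields: 8 bytes with 32-bit values, 16 bytes if
       value grows to 64 bits.  64-bit pairs and 64-bit pointers are
       therefore mutually exclusive. */
    struct cons { value car, cdr; };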

>                                                      It was pretty common
>to steal the top few (or bottom) bits of a word for a tag, but with 64 bits
>you can do that without sacrificing precision.  What about applications of
>clever cdr-coding schemes which could give access to a huge address space
>but efficiently describe common addresses with smaller offsets?

Two-bit low tagging works great on byte-addressed machines.  Any other
information can always be put in object headers, rather than in pointer
tags.  I don't think it's a big win to have more bits available for
tag use.  Cdr-coding is dead because it's slow, and requires a custom
architecture to support it efficiently, which Isn't Likely.  (Look at
Symbolics... sigh.)  Page-fault time compression is better, but I still
don't think you want 64-bit architectures for Lisp or Smalltalk.  There are
a lot of small (but nontrivial) wins from 32-bit machines instead of
64-bit machines, like larger effective on-chip cache size.  Lisp and
Smalltalk want *big* (and set-associative) caches.

>I think 64 bits opens up a whole new world of possibilities, and we must be
>careful to re-think implementations, not just re-implement them.

I agree.  But we also need to rethink the whole idea of 64-bit architectures.
Pointer swizzling on 32-bit architectures is an idea we should look at
too.  [ I agree with Hugh LaMaster that you have restrictions on object
sizes, i.e., gigabyte arrays cause trouble.  But it's not clear that
there isn't some cleverer way around that than simply doubling the
word size.  And it's a lot nicer than simple segmenting, for most purposes,
because programs and programmers only have to see one kind of pointer, etc.
And you only need a 32-bit hardware bus. ]

   -- Paul

-- 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

wilson@uicbert.eecs.uic.edu (Paul Wilson) (02/18/91)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

>On 14 Feb 91 21:02:19 GMT, eliot@cs.qmw.ac.uk (Eliot Miranda) said:

>eliot> In article <1991Feb13.170045.16864@uicbert.eecs.uic.edu>
>eliot> wilson@uicbert.eecs.uic.edu (Paul Wilson) writes:

>wilson> I don't know if 64-bit addressing is a good idea in general, and
>wilson> I suspect it's a *bad* idea for languages like Lisp and
>wilson> Smalltalk, where the standard implementation technique is to
>wilson> make everything the size of a pointer, so that you can store
>wilson> either a pointer or a tagged "immediate" value anywhere.  Going
>wilson> to 64-bit addresses would *double* your memory requirements.

>Not quite: the problem is rather with *some* Lisp implementations that
>use pair-sized cells for everything. Many Lisps (for example, those
>built with copying collectors, which can do this easily) also use
>variable-sized allocations with very small overheads, and all Smalltalk
>systems use variable-sized segments, as explained below:

That's not what I was talking about.  I *wasn't* talking about only
being able to allocate memory in pair-sized units.  (My own garbage
collector certainly doesn't have this limitation -- see my 1989
OOPSLA paper if you're interested.)

I was just saying that pairs have to be able to hold two pointers,
so that (barring compression) they have to be bigger than the
pointer (address) size.  Twice as big or more, in the general
case.  So you can use 64-bit pairs, or 64-bit pointers, but not both.

Eliot is still right, however, that some objects (like string and bitmap
objects) contain fields that can't hold a pointer.  This is a ubiquitous
efficiency hack.  These objects therefore needn't expand, so we don't
really double memory requirements.

>Actually these large expansion factors (not quite 2, but, as you remark,
>fairly large) need not apply everywhere. Smalltalk regrettably is (like
>                                                   ^^^^^^^^^^^
>virtually every other OO or 'AI' language) reference based.

   Them's fightin' words :-)

>*allowed* object composition by contiguity as well, as most less
>advanced languages do, then many fewer pointers would be needed. Some
>statistics show that sharing of subobjects is fairly rare in some major
>applications, so they could be profitably inlined.

   Right, sharing is comparatively rare, but when it's used, it's *very*
*very* useful.  The nice thing is that you don't have to rewrite your
code if you decide something should be shared -- you don't have to
"de-inline" it.  It's no concidence that the first gc'd language (LISP), and 
the most completely object-oriented languages (Smalltalk and Self) have 
uniform-sized value cells for most purposes, into which you can shove pretty 
much anything.  Even strongly-typed polymorphic languages tend toward this.
It's handy to be able to change sharing patterns when you need to, without
redefining all of your object types to un-inline their components.  You
can write much more general and reusable code.  An object is an object is
an object (and other hype).

  This is not to say I'm against contiguous compound objects;  I'm just
saying they have a downside in terms of generality.

>eliot> That was fun :-)

>Not just fun, terribly useful. Both the data and the calculations.
>Thanks for both.

  Yes, thanks.


  -- Paul
-- 
Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

sxc@se2.dsto.oz (Stephen Crawley) (02/21/91)

pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

[... about how going from 32 to 64 bit pointers hits Smalltalk etc]

>Actually these large expansion factors (not quite 2, but, as you remark,
>fairly large) need not apply everywhere. Smalltalk regrettably is (like
>virtually every other OO or 'AI' language) reference based. If it
>*allowed* object composition by contiguity as well, as most less
>advanced languages do, then many fewer pointers would be needed. Some
>statistics show that sharing of subobjects is fairly rare in some major
>applications, so they could be profitably inlined.

Adding explicit inlining of data structures to a typical reference-based
language (RBL) would destroy many of the properties that make it
so advanced!

If you add constructs that allow the programmer to declare static
(non-reference) variables and fields:

  1) You need an "address-of" operator:

     o if you don't have one, statics are not first-class

     o the address-of operator may be implicit e.g. use
       call-by-reference when passing statics as function args

     o if you have address-of (implicit or explicit) you have
       to get local frames containing statics from the heap.
       If you don't do this, the language will have dangling
       pointer problems (see the C sketch after this list) ... baaaad!

  2) The programmer is encouraged to spend too much effort on
     storage efficiency when this is largely irrelevant in most 
     cases.  [What costs more; memory or programmer time?]

  3) The computation model is more complex:

     o address-of gives an extra source of aliasing; another 
       source of errors => less reliable programs

     o programs are harder to reason about.

  4) The language syntax is larger and messier.
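
[ A concrete version of point 1, in C (which has address-of): letting
the address of a frame-allocated local escape is exactly the
dangling-pointer hazard described above. ]

    int *escape(void)
    {
        int local = 42;   /* static (non-heap) storage in the frame */
        return &local;    /* address-of lets it escape the frame... */
    }                     /* ...which dies here; the pointer dangles */

    /* Any later use of escape()'s result is undefined behaviour;
       avoiding this is why frames containing statics would have to
       come from the heap. */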


The alternative is to have the compiler/linker infer which variables
and fields in an RBL program can be inlined without changing the
program's semantics.  Unfortunately this optimisation would need to be
done globally to get much benefit.  Hence we can't use it in an
incremental or persistent environment.  [That covers most "advanced"
languages ... sigh]  At least this isn't unpleasant for the programmer.

In addition, inlining has hidden performance penalties:

  1) You have to allocate local frames on the heap.  This costs
     CPU on procedure entry / exit, extra GC overheads, extra
     memory.  Clever optimisation can avoid some of this, at the
     cost of compilation time and compiler complexity.  [Of course 
     many RBL's start out with local frames on the heap!]

  2) The garbage collector must understand pointers into the middle
     of objects.  This costs CPU time and maybe memory too.  

  3) You can end up with uncollectable garbage.  E.g. if you pass
     a reference to a static field of an object, the entire object 
     must be retained by the GC.  [OK, so the programmer / optimiser
     shouldn't have inlined that field.  But that's just making
     his / her / its job harder!]
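
[ And a hypothetical C-flavoured sketch of point 3: a live reference
into an inlined field pins the entire enclosing object. ]

    /* An inlined (contiguous) field inside a big object. */
    struct big {
        char bulk[1000000];   /* a megabyte we would like to collect */
        int  field;           /* small inlined subobject */
    };

    int *get_field(struct big *b)
    {
        return &b->field;     /* interior pointer into b */
    }

    /* While this interior pointer is live, the GC must retain all of
       *b -- the megabyte of bulk is uncollectable garbage. */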

-- Steve