[comp.lang.smalltalk] why tagged immediate floats are good

wilson@carcoar.Stanford.EDU (Paul Wilson) (03/11/90)

I've gotten a lot of mail about *how* to do tagged immediate floats, but
also some asking *why* it's important.  A couple of people thought
that a generational gc should *alleviate* the problem of heap-allocated 
floats considerably;  they wanted to know why I thought the opposite.

It turns out that heap-allocated floats can be a BIG problem for a 
generational gc because they can violate the assumptions that the 
generational strategy depends on.

Generational gc's assume that most objects are short-lived, and that few
pointers are created that point from an object in an older generation
to an object in a younger generation.  Heap-allocating float values
can violate these assumptions because a lot of unnecessary objects are
created *and* pointers to those objects are often stored into older
objects.

Consider the case of updating every element of a million-element 
(4-megabyte) array.  (Perhaps you're incrementing each element by 
the corresponding element of another array, or whatever.)

In this case, you allocate several words on the heap for each element
of your four-megabyte array -- probably 8 megabytes or so.  This will
cause your youngest generation to fill and be scavenged several times,
unless it is huge.
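
Schematically, here's roughly what a straightforward implementation ends
up doing for each element (a made-up C sketch, not any particular VM's
code; the struct layout and names are invented):

    #include <stdlib.h>

    /* A boxed float as a simple implementation might allocate it:
       a header word plus the value itself.                        */
    typedef struct BoxedFloat {
        unsigned long header;          /* class/tag word */
        double        value;
    } BoxedFloat;

    /* Stand-in for bump allocation in the youngest generation. */
    static BoxedFloat *alloc_float(double v)
    {
        BoxedFloat *f = malloc(sizeof *f);
        f->header = 0;                 /* would hold the Float class tag */
        f->value  = v;
        return f;
    }

    /* a[i] := a[i] + b[i] over the whole array. */
    void add_arrays(BoxedFloat **a, BoxedFloat **b, long n)
    {
        long i;
        for (i = 0; i < n; i++) {
            /* each iteration creates a fresh heap object ...          */
            BoxedFloat *sum = alloc_float(a[i]->value + b[i]->value);
            /* ... and stores a pointer to it into the old, big array. */
            a[i] = sum;
        }
    }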

Not only will you perform several extra scavenges, but all of those
objects that you created will survive the scavenges, because they
are pointed to from the array in older memory.  That means you not
only allocate them, but also copy them at least once to get
them out of the youngest generation.  So for each float value you
update, you spend a handful of instructions to heap-allocate it
and at least another handful to copy what you allocated.


And things can get worse.  A generational gc must keep track of all
pointers from older generations into younger ones.  (This is
necessary to ensure that all live objects in the younger
generations can be found at scavenge time, and all pointers
to them can be updated.)

If you use the simple strategy of keeping a list ("remembered set") 
of all of the objects you've stored such pointers into, you can
run into trouble.  In the case of updating our million-element array, 
we'll have to scan the whole array at every scavenge that the extra
allocation forces.  That means scanning four megabytes every time
we allocate up through the youngest generation (probably a quarter
or a half megabyte of allocation.)  This not only increases processing
overhead, but is likely to cause thrashing.  
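
To make this concrete, here's roughly the write barrier a remembered-set
scheme imposes on every pointer store.  This is an illustrative sketch
only; the structures and names are invented, and real collectors differ
in the details:

    /* Illustrative object header plus a simple remembered set. */
    typedef struct Object {
        int generation;       /* which generation this object lives in */
        int remembered;       /* already on the remembered set?        */
        /* ... the object's fields ...                                 */
    } Object;

    #define MAX_REMEMBERED 4096        /* overflow handling omitted */
    static Object *remembered_set[MAX_REMEMBERED];
    static int     remembered_count = 0;

    /* Conceptually executed on every store of a pointer into an object. */
    void remember_store(Object *holder, Object **slot, Object *value)
    {
        *slot = value;
        if (value != 0
            && holder->generation > value->generation   /* old -> young */
            && !holder->remembered) {
            holder->remembered = 1;
            remembered_set[remembered_count++] = holder;
            /* At scavenge time the collector scans *all* of holder --
               for our million-element array, all four megabytes.      */
        }
    }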

Tektronix Smalltalk's Large Object Space alleviates this problem
for many programs by managing large objects specially.  (See
Caudill & Wirfs-Brock's OOPSLA 86 paper and Ungar and Jackson's
OOPSLA 88 paper.)  My gc is also less sensitive to this sort
of thing because it keeps dirty bits for small fixed-size areas
of memory ("cards") rather than tracking objects.  (See my OOPSLA
89 paper.)
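
For comparison, a card-marking barrier only records which small chunk of
memory was stored into, and the scavenger scans just the dirty cards.
Again, a made-up sketch (the card size, names, and layout are arbitrary):

    /* One dirty byte per 256-byte "card" of older memory. */
    #define CARD_SHIFT 8                         /* 256-byte cards */
    #define HEAP_SIZE  (16 * 1024 * 1024)
    static char          heap[HEAP_SIZE];
    static unsigned char card_dirty[HEAP_SIZE >> CARD_SHIFT];

    /* The write barrier just marks the card containing the updated slot. */
    void card_store(void **slot, void *value)
    {
        *slot = value;
        card_dirty[((char *) slot - heap) >> CARD_SHIFT] = 1;
    }

    /* At scavenge time, only the dirty cards -- not whole objects --
       are scanned for old->young pointers, then the marks are cleared. */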

But neither of these tricks always works.  The large object space
doesn't work if you have a large data structure made up of small
objects, which gets promoted into an older generation and then
a lot of the objects get updated.  (This is a variant of the
"pig-in-the-snake" problem, to use Jon L White's term.)  And
in a really bad case, card marking may not help either -- your
cards may not be smaller than your objects.  So really bad
locality of writes can be bad news.

Note that a smart compiler, which keeps intermediate floating-point
values in registers or on a special stack (a la Maclisp or S-1 Lisp)
will *not* help a lot with this problem.  The floats that cause this
problem are not the intermediate values in an expression, they're
the results that get stored back into the heap. 


The upshot is that it's preferable to use immediate values when
you can, rather than creating heap objects that may violate the
assumptions of the gc.  If you can update a float value in
place, do it.

This seems to me particularly important because float-intensive
programs often *do* use lots of them in *big* data structures.

One tried-and-true way to punt on this at the language level
is to provide primitive homogeneous arrays (e.g., array-of-double).
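
Schematically, such an array holds raw machine doubles rather than
pointers to boxed objects, so updating an element allocates nothing and
creates no old->young pointer.  A minimal C sketch, with invented names:

    /* A primitive array-of-double: the values live inline, unboxed. */
    typedef struct DoubleArray {
        long   size;
        double elements[1];        /* 'size' doubles follow the header */
    } DoubleArray;

    /* Updating an element is a plain store: no allocation, and nothing
       for the write barrier or remembered set to track.               */
    void increment_element(DoubleArray *a, long i, double delta)
    {
        a->elements[i] += delta;
    }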

But that requires that the user do un-Lispy or un-Smalltalky
things -- writing FORTRAN-like programs.  And it won't work
for arbitrary non-array objects that just happen to have a float
in some slot.  If a user uses a float just because something
is conceptually a real number, and doesn't need really
high precision or fast math, it shouldn't choke the gc.

So my philosophy is to provide short immediate floats for
such "casual" purposes.  Then the issue becomes one of whether
to sacrifice a little precision to make room for a tag, or
to sacrifice a little speed to allow compacting most of them
without loss of precision.  
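
Just to make the first option concrete: one simple way to find room for
a tag in a 32-bit word is to give up the two low-order mantissa bits of
an IEEE single.  This is only an illustration of the precision-for-tag
trade, not necessarily the encoding I'd choose; the tag values and names
are made up:

    #include <string.h>

    #define TAG_MASK  0x3u
    #define FLOAT_TAG 0x3u             /* made-up tag for immediate floats */

    typedef unsigned int word;         /* assumed to be 32 bits wide */

    static word make_immediate_float(float f)
    {
        word bits;
        memcpy(&bits, &f, sizeof f);   /* raw IEEE single-precision bits */
        /* the two low mantissa bits are overwritten by the tag */
        return (bits & ~(word) TAG_MASK) | FLOAT_TAG;
    }

    static float immediate_float_value(word w)
    {
        word  bits = w & ~(word) TAG_MASK;   /* strip the tag; those two  */
        float f;                             /* mantissa bits come back 0 */
        memcpy(&f, &bits, sizeof f);
        return f;
    }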

    -- Paul

Paul R. Wilson                         
Software Systems Laboratory               lab ph.: (312) 996-9216
U. of Illin. at C. EECS Dept. (M/C 154)   wilson@bert.eecs.uic.edu
Box 4348   Chicago,IL 60680 

chao@sunflower.crd.ge.com (William Chao) (03/19/90)

In the paper "Basic Polymorphic typechecking," by Luca Cardelli,
Science of Computer Programming 8 (1987) pp. 147-172. On page 150:
-----------------------------------------------------------------------
Polymorphism in language comes from the interaction of two contrasting
programming language design goals: static typing and reusability.
             :
             :
Polymorphic type systems try to reconcile these two goals
by providing all the safety of statically typed languages,
and most (but not all) the flexibility of untyped languages.
-----------------------------------------------------------------------
My 1st question: Why does he say "but not all"?
2nd question: It seems that the "full" flexibility of untyped languages
              will never be achieved by static typing, and hence static binding.
              Does this imply that we have to have dynamic binding to achieve
              full reusability?
3rd question: People (especially in real-time systems) are against dynamic binding
              because it is too slow.  Some people use the same reason (too
              slow) to argue against object-oriented programming.  Can we do
              something about this?
I would appreciate it if anyone in this newsgroup could give me the answers.

lgm@cbnewsc.ATT.COM (lawrence.g.mayka) (03/19/90)

In article <6143@crdgw1.crd.ge.com> chao@sunflower.crd.ge.com (William Chao) writes:
>In the paper "Basic Polymorphic typechecking," by Luca Cardelli,
>Science of Computer Programming 8 (1987) pp. 147-172. On page 150:
>-----------------------------------------------------------------------
>Polymorphic type systems try to reconcile these two goals
>by providing all the safety of statically typed languages,
>and most (but not all) the flexibility of untyped languages.
>-----------------------------------------------------------------------
>My 1st question: Why does he say "but not all"?

Languages without mandatory compile-time typing, such as Common
Lisp, are able to manipulate information elements of
heterogeneous, unrelated types in a uniform manner.  A Common Lisp
list, for example, can contain elements of completely different
types - an integer, a complex number, a function, another list,
etc. - and a program can operate on them all in a uniform fashion.  All types
are reusable to the programmer of the list, and the list is
reusable to writers of new types in the future.
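
A crude C rendering of where that flexibility comes from (this is just
an illustration; it is not Common Lisp's actual representation): every
value carries its type at run time, so one list can hold anything and
one piece of code can walk it.

    #include <stdio.h>

    typedef enum { T_INT, T_DOUBLE, T_STRING, T_LIST } Type;

    typedef struct Cell {
        Type type;                      /* run-time type of this element */
        union {
            long         i;
            double       d;
            const char  *s;
            struct Cell *list;
        } as;
        struct Cell *next;
    } Cell;

    /* One routine handles every element, whatever its type. */
    void print_list(const Cell *c)
    {
        for (; c != 0; c = c->next) {
            switch (c->type) {
            case T_INT:    printf("%ld ", c->as.i);  break;
            case T_DOUBLE: printf("%g ",  c->as.d);  break;
            case T_STRING: printf("%s ",  c->as.s);  break;
            case T_LIST:   print_list(c->as.list);   break;
            }
        }
    }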

>2nd question: It seems that the "full" flexibility of untyped languages
>              will never be achieved by static typing, and hence static binding.
>              Does this imply that we have to have dynamic binding to achieve
>              full reusability?

In my opinion, *yes*.

>3rd question: People (especially in real-time systems) are against dynamic binding
>              because it is too slow.  Some people use the same reason (too
>              slow) to argue against object-oriented programming.  Can we do
>              something about this?

Several approaches:

a) Show that a dynamically bound system can or does meet the
real-time constraints *of your application* right now.

b) Look for, or agitate for, a better-optimizing language
implementation.  Much optimization is based on essentially the
same principle as demand paging and RAM caches:  make the usual
cases faster, at the expense of the rarer ones, while maintaining
a simple, uniform abstraction for the application programmer.
(A rough sketch of one such technique appears after this list.)

c) Look for, or agitate for, a more powerful processor.  Of
course, greater expense may require competitive justification -
e.g., greater functionality, flexibility, productivity, or
reusability.  But just such arguments have in the past been
successfully employed to justify C over assembly language,
the UNIX System over MS-DOS, the X Window System over native
window systems, etc.

d)  Go ahead and prototype your system on a high-reusability
platform.  You may find that a higher level of abstraction permits
the use of better algorithms, resulting in *better* total
performance than low-reusability implementations.  Even if not,
the computer industry's amazing price/performance increases may,
over time, make your solution feasible.
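
As an example of the principle in (b): inline caching remembers, at each
call site, the class seen last time and the method it answered with, so
the usual case is a compare and an indirect call, and the full dynamic
lookup runs only when the class changes.  A rough C sketch with invented
names (the lookup and class-access routines would come from the run-time
system):

    typedef struct Class Class;
    typedef void (*Method)(void *receiver);

    extern Class  *class_of(void *receiver);             /* run-time class    */
    extern Method  full_lookup(Class *c, int selector);  /* slow general path */

    typedef struct CallSite {
        Class  *cached_class;     /* class seen last time at this call site */
        Method  cached_method;    /* method that class answered with        */
    } CallSite;

    void send(CallSite *site, int selector, void *receiver)
    {
        Class *c = class_of(receiver);
        if (c != site->cached_class) {          /* rare case: cache miss   */
            site->cached_class  = c;
            site->cached_method = full_lookup(c, selector);
        }
        site->cached_method(receiver);          /* usual case: one compare */
    }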


	Lawrence G. Mayka
	AT&T Bell Laboratories
	lgm@ihlpf.att.com

Standard disclaimer.