[comp.lang.icon] tables, efficiency

goer@SOPHIST.UCHICAGO.EDU (Richard Goerwitz) (09/27/89)

Could someone familiar with the implementation of tables explain
how efficient use can be made of storage when the ultimate size
of the table is unknown?  Or does the interpreter indeed check
to see how big the various regions are, and adjust the algorithm
to make maximum use of a given memory space?

Just dunno, but would like t'know.

                                        -Richard L. Goerwitz
                                        goer@sophist.uchicago.edu
                                        rutgers!oddjob!gide!sophist!goer

wgg@JUNE.CS.WASHINGTON.EDU (William Griswold) (09/27/89)

>Date: Tue, 26 Sep 89 12:11:08 CDT
>From: Richard Goerwitz <goer@sophist.uchicago.edu>
>To: icon-group@arizona.edu
>Subject: tables, efficiency
>Errors-To: icon-group-errors@arizona.edu
>
>Could someone familiar with the implementation of tables explain
>how efficient use can be made of storage when the ultimate size
>of the table is unknown?  Or does the interpreter indeed check
>to see how big the various regions are, and adjust the algorithm
>to make maximum use of a given memory space?
>
>Just dunno, but would like t'know.
>
>                                        -Richard L. Goerwitz


The current table implementation uses a simple hashing scheme.  If you are
not familiar with hashing, see your nearest data structures text or the
Knuth volume on sorting and searching.  A brief description follows.

Basically, a hashing data structure computes an array index for an input
element (a constant-time operation) and stores the element at that index.
If multiple elements hash to the same slot, all of them are stored there
on a linked list, kept in sorted order of their hash numbers.  By simple
hashing, I mean that the number of hash slots is fixed.  For small tables,
access to an individual element is usually constant-time.  For tables over
1000 elements, performance begins to degrade measurably, because of the
linear-time cost of searching down a hash chain.
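
To make the scheme concrete, here is a minimal sketch in C of a
fixed-slot chained hash table with chains kept sorted by hash number.
The slot count, hash function, and field names are illustrative
assumptions for this example, not the interpreter's actual code.

    #include <stdlib.h>
    #include <string.h>

    #define NSLOTS 37                 /* fixed slot count -- illustrative */

    struct elem {
        unsigned long hash;           /* cached hash number */
        char *key;
        char *value;
        struct elem *next;            /* chain, sorted by hash number */
    };

    static struct elem *slots[NSLOTS];

    static unsigned long hash_string(const char *s)
    {
        unsigned long h = 0;
        while (*s)
            h = h * 31 + (unsigned char)*s++;
        return h;
    }

    /* Look up key, inserting it with the given value if absent. */
    struct elem *lookup(const char *key, char *value)
    {
        unsigned long h = hash_string(key);
        struct elem **p = &slots[h % NSLOTS];
        struct elem *e;

        /* The chain is sorted by hash number, so we can stop at the
           first element whose hash number is too large. */
        while ((e = *p) != NULL && e->hash < h)
            p = &e->next;
        for (; e != NULL && e->hash == h; p = &e->next, e = *p)
            if (strcmp(e->key, key) == 0)
                return e;             /* found */

        /* Not found: link a new element in at the sorted position. */
        e = malloc(sizeof *e);
        if (e == NULL)
            return NULL;
        e->hash = h;
        e->key = strdup(key);
        e->value = value;
        e->next = *p;
        *p = e;
        return e;
    }

With NSLOTS fixed, the average chain length is the element count divided
by NSLOTS, so lookups stay near constant time only while the table is
small; a table of several thousand elements forces long linear searches
down the chains, which is exactly the degradation described above.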

Hashing does not take memory availability into account when building
tables.  It is hard to say how that could be done effectively, given how
many properties of a program are dynamic.

However, an upcoming version of Icon may have an implementation of tables 
(and sets) employing dynamic hashing.  Dynamic hashing permits increasing 
the number of slots in a hash table as the table grows.  This avoids 
performance degradation when tables become large, because the
collision-resolution chains are kept short.  It uses roughly 10% more
memory than the old hash table implementation.  It still does not take
memory availability into account, unless memory is actually exhausted.
For a more complete description of dynamic hashing, look for the next
issue of the Icon Newsletter. 
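
For illustration, here is one sketch of how a growing table might work.
Dynamic hashing schemes differ in detail -- the Icon implementation may
well split slots incrementally, as the Newsletter will describe -- while
the sketch below simply doubles the slot array and redistributes every
element once the average chain length exceeds a small threshold.  All
names and constants here are assumptions for the example.

    #include <stdlib.h>

    struct elem {
        unsigned long hash;           /* cached hash number */
        struct elem *next;
        /* key and value fields omitted; see the earlier sketch */
    };

    struct dyntab {
        struct elem **slots;          /* initialized to a power-of-two
                                         slot count, e.g. 8 */
        size_t nslots;
        size_t nelems;
    };

    #define MAX_LOAD 2                /* avg chain length triggering growth */

    /* Double the slot array and redistribute every element.  With a
       power-of-two slot count, each element lands either in its old
       slot or in old slot + old nslots.  (Re-sorting the chains by
       hash number, as in the earlier sketch, is omitted for brevity.) */
    static void grow(struct dyntab *t)
    {
        size_t newn = t->nslots * 2;
        struct elem **news = calloc(newn, sizeof *news);
        size_t i;

        if (news == NULL)
            return;                   /* out of memory: keep the old table */
        for (i = 0; i < t->nslots; i++) {
            struct elem *e = t->slots[i], *next;
            for (; e != NULL; e = next) {
                struct elem **s;
                next = e->next;
                s = &news[e->hash % newn];
                e->next = *s;
                *s = e;
            }
        }
        free(t->slots);
        t->slots = news;
        t->nslots = newn;
    }

    /* Insert an element, growing the slot array when chains get long. */
    void insert(struct dyntab *t, struct elem *e)
    {
        struct elem **s;

        if (t->nelems >= MAX_LOAD * t->nslots)
            grow(t);
        s = &t->slots[e->hash % t->nslots];
        e->next = *s;
        *s = e;
        t->nelems++;
    }

Because the slot count doubles each time, the redistribution cost is
amortized constant per insertion, and chains stay short however large
the table grows; the price is the memory for the larger slot array.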

I should note that taking memory characteristics into account does not
make sense on a machine with a paging operating system.  The operating
system on such a machine makes every attempt to make the finiteness of
memory invisible in terms of function *and* performance, and it is
usually successful.  This is not at all the case with PCs running more
primitive operating systems.  With any luck, however, all of those
machines will be gone in a decade. 


					Bill Griswold