goer@SOPHIST.UCHICAGO.EDU (Richard Goerwitz) (09/27/89)
Could someone familiar with the implementation of tables explain how
efficient use can be made of storage when the ultimate size of the
table is unknown?  Or does the interpreter indeed check to see how big
the various regions are, and adjust the algorithm to make maximum use
of a given memory space?

Just dunno, but would like t'know.

  -Richard L. Goerwitz            goer@sophist.uchicago.edu
                                  rutgers!oddjob!gide!sophist!goer
wgg@JUNE.CS.WASHINGTON.EDU (William Griswold) (09/27/89)
>Date: Tue, 26 Sep 89 12:11:08 CDT
>From: Richard Goerwitz <goer@sophist.uchicago.edu>
>To: icon-group@arizona.edu
>Subject: tables, efficiency
>Errors-To: icon-group-errors@arizona.edu
>
>Could someone familiar with the implementation of tables explain
>how efficient use can be made of storage when the ultimate size
>of the table is unknown?  Or does the interpreter indeed check
>to see how big the various regions are, and adjust the algorithm
>to make maximum use of a given memory space?
>
>Just dunno, but would like t'know.
>
>	-Richard L. Goerwitz

The current table implementation uses a simple hashing scheme.  If you
are not familiar with hashing, check your nearest data structures text
or the Knuth volume on searching and sorting.  The briefest description
follows.

Basically, a hashing data structure allows computing an array index for
an input element (a constant-time operation) and storing the element at
that index.  If multiple elements are assigned to the same slot, all of
them are stored there on a linked list.  Elements on a chain are kept in
sorted order of their hash numbers.  By simple hashing, I mean that
there is a fixed number of hash slots.

For small tables, constant-time access to an individual element is
usually achieved.  For tables of over 1000 elements, performance begins
to degrade measurably, due to the linear-time cost of searching down a
hash chain.

Hashing does not take memory availability into account in building
tables.  It is hard to say how this could be done effectively in the
face of so many dynamic properties of programs.  However, an upcoming
version of Icon may have an implementation of tables (and sets)
employing dynamic hashing.  Dynamic hashing permits increasing the
number of slots in a hash table as the table grows.  This avoids
performance degradation when tables become large, because the collision
resolution chains are kept short.  It uses an additional 10% of memory
over the old hash table implementation.
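For readers who want something concrete, the fixed-slot scheme described
above can be sketched in Python.  This is only an illustration of the
technique, not the Icon runtime's actual code; the class and names
(`SimpleHashTable`, `NSLOTS`, etc.) are invented for the example.

```python
# Sketch of a simple (fixed-slot) hash table with chaining, in the
# spirit of the scheme described above.  All names are invented for
# illustration; this is not the Icon runtime's implementation.

NSLOTS = 37  # fixed number of hash slots, chosen once and for all

class SimpleHashTable:
    def __init__(self):
        # Each slot holds a chain: a list of (hashnum, key, value)
        # triples kept sorted by hash number, as described above.
        self.slots = [[] for _ in range(NSLOTS)]

    def _chain(self, key):
        h = hash(key)
        return h, self.slots[h % NSLOTS]

    def insert(self, key, value):
        h, chain = self._chain(key)
        for i, (hn, k, _) in enumerate(chain):
            if hn == h and k == key:
                chain[i] = (h, key, value)        # replace existing entry
                return
            if hn > h:
                chain.insert(i, (h, key, value))  # keep chain sorted
                return
        chain.append((h, key, value))

    def lookup(self, key):
        h, chain = self._chain(key)
        for hn, k, v in chain:
            if hn == h and k == key:
                return v
            if hn > h:       # sorted order lets the search stop early
                break
        return None

t = SimpleHashTable()
for i in range(100):
    t.insert("key%d" % i, i)
print(t.lookup("key42"))    # -> 42
print(t.lookup("missing"))  # -> None
```

With 100 elements spread over 37 slots the chains average under three
entries, so lookups stay near constant time; with thousands of elements
the chains grow linearly, which is the degradation described above.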
It does not take memory availability into account unless memory is
actually exhausted.  For a more complete description of dynamic hashing,
look for your next issue of the Icon Newsletter.

I should note that taking memory characteristics into account on a
machine running a paging operating system does not make sense.  The
operating system on such a machine makes every attempt to make the
finiteness of memory invisible, in terms of both function *and*
performance, and is usually successful.  This is not at all the case
with PCs running more primitive operating systems.  With any luck,
however, all of these machines will be gone in a decade.

Bill Griswold
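The slot-growing idea behind dynamic hashing, as mentioned in Bill's
reply, can be sketched as follows.  This is a simplified variant that
doubles the slot array and rehashes all at once; real dynamic hashing
splits slots incrementally, and the upcoming Icon implementation may
differ in detail.  All names here are invented for illustration.

```python
# Simplified illustration of a growing hash table: when the average
# chain length passes a threshold, double the number of slots and
# rehash.  Real dynamic hashing splits one slot at a time, but the
# effect -- short chains at any table size -- is the same.

class GrowingHashTable:
    MAXLOAD = 2.0  # grow when elements per slot exceeds this

    def __init__(self):
        self.nslots = 8
        self.slots = [[] for _ in range(self.nslots)]
        self.count = 0

    def insert(self, key, value):
        chain = self.slots[hash(key) % self.nslots]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # replace existing entry
                return
        chain.append((key, value))
        self.count += 1
        if self.count / self.nslots > self.MAXLOAD:
            self._grow()

    def _grow(self):
        # Double the slot array and redistribute every entry.
        old = self.slots
        self.nslots *= 2
        self.slots = [[] for _ in range(self.nslots)]
        for chain in old:
            for key, value in chain:
                self.slots[hash(key) % self.nslots].append((key, value))

    def lookup(self, key):
        for k, v in self.slots[hash(key) % self.nslots]:
            if k == key:
                return v
        return None

t = GrowingHashTable()
for i in range(5000):
    t.insert(i, i * i)
print(t.lookup(4999))   # -> 24990001
print(t.nslots)         # grew well past the initial 8 slots
```

Because the load factor is bounded, no chain grows without limit as the
table does, which is why large tables avoid the linear-time degradation
of the fixed-slot scheme.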