[comp.lang.lisp] optimizing array lookups - need help

glassw@athena.mit.edu (William B Glass) (02/27/90)

I am currently working on a project that involves creating a lookup
routine for an online (English) dictionary.  Words are stored in nested
(1D) arrays and lists.

At each level words are indexed by the current letter.  For example, at the
top level, "apple", "banana", and "cherry" would all be in seperate 
substructures.

I would like to optimize this by storing each level as either an array or list
depending on the number of elements.  For example, if there are only three 
words beginning with "smi" it is faster to use  a list than an array to 
look the words up in.  However, the combination "st" is followed by almost
every other possible letter, and thus would be faster if I used a 26 element
array.

Does anyone know where the appropriate cutoff point should change
from lists to arrays for quicker indexing?  My guess is between 5 and
ten, but it would help if someone else had any info an this.

I'm not a regular reader of this list, so I would appreciate email.  
Thanks in advance.

William Glass  <glassw@athena.mit.edu>
Massachusetts Institute of Technology           / Standard Disclaimer
--
Will Glass

forster@cs.umass.edu (03/03/90)

Two points:
1. there was a discussion about a year ago about trade-offs between hash tables
and a-lists (or property lists, I don't remember which), which eventually
arrived at the conclusion that the point at which it was more efficient to use
one scheme or the other was implementation-dependent, but around 30 for a
lispm.  This is the closest thing I can recall to what you're asking about.
2. You're essentially describing tries, and since you don't say how you're
going to access the data in the lists, it's not at all clear that you'd have
any savings at all, let alone save time.  The whole point of using a trie is
that you can use the character as an index, and get the data as quickly as
possible.  (How could you improve on an array reference for speed?)  Space is,
of course, an issue, and you might be able to save space this way, but it
seems like you're going to lose more than you gain.
- David Forster