glassw@athena.mit.edu (William B Glass) (02/27/90)
I am currently working on a project that involves creating a lookup routine for an online (English) dictionary. Words are stored in nested (1D) arrays and lists. At each level words are indexed by the current letter. For example, at the top level, "apple", "banana", and "cherry" would all be in seperate substructures. I would like to optimize this by storing each level as either an array or list depending on the number of elements. For example, if there are only three words beginning with "smi" it is faster to use a list than an array to look the words up in. However, the combination "st" is followed by almost every other possible letter, and thus would be faster if I used a 26 element array. Does anyone know where the appropriate cutoff point should change from lists to arrays for quicker indexing? My guess is between 5 and ten, but it would help if someone else had any info an this. I'm not a regular reader of this list, so I would appreciate email. Thanks in advance. William Glass <glassw@athena.mit.edu> Massachusetts Institute of Technology / Standard Disclaimer -- Will Glass
forster@cs.umass.edu (03/03/90)
Two points: 1. there was a discussion about a year ago about trade-offs between hash tables and a-lists (or property lists, I don't remember which), which eventually arrived at the conclusion that the point at which it was more efficient to use one scheme or the other was implementation-dependent, but around 30 for a lispm. This is the closest thing I can recall to what you're asking about. 2. You're essentially describing tries, and since you don't say how you're going to access the data in the lists, it's not at all clear that you'd have any savings at all, let alone save time. The whole point of using a trie is that you can use the character as an index, and get the data as quickly as possible. (How could you improve on an array reference for speed?) Space is, of course, an issue, and you might be able to save space this way, but it seems like you're going to lose more than you gain. - David Forster