webber@athos.rutgers.edu (03/28/88)
In article <929@ima.ISC.COM>, beres@cadnetix.COM (Tim Beres) writes some stuff to which the moderator appends: > [It's hard to believe that storing a symbol table in a B-tree is worth the > effort; B-trees make sense for disk files since looking at a page is > relatively expensive, but for an in-core symbol table the O(1) lookup time > of hashing should be better. It's also a lot easier to program. -John] Depends on how large your symbol table is. It is easy to imagine B-trees out performing hash tables if the b-tree nodes are near page size. Remember, virtual memory is on a disk too. Given the variety of computer architectures, hash functions, ways of handling hash collisions, identifier usage patterns among languages, and programming goals (i.e., how architecture specific it is possible to make the implementation), I doubt if there is one right structure. Many of the studies have been done on systems where page fetch delays aren't charged to the process making them, making them suspect. Considering the number of times a compiler gets used in comparison to the number of times it gets written, the real question is: is the symbol table a bottleneck worth speeding up? Probably the time would be better spent creating structure editors that hand the compiler a pre-scanned pre-parsed object to fiddle with as appropriate. ------ BOB (webber@athos.rutgers.edu ; rutgers!athos.rutgers.edu!webber) -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
daveb@ucbvax.Berkeley.EDU (Crack? No thanks, I've got a new CD player) (03/31/88)
In article <930@ima.ISC.COM> webber@athos.rutgers.edu writes: >In article <929@ima.ISC.COM>, beres@cadnetix.COM (Tim Beres) writes >some stuff to which the moderator appends: > >> [It's hard to believe that storing a symbol table in a B-tree is worth the >> effort; B-trees make sense for disk files since looking at a page is >> relatively expensive, but for an in-core symbol table the O(1) lookup time >> of hashing should be better. It's also a lot easier to program. -John] > >Depends on how large your symbol table is. It is easy to imagine >B-trees out performing hash tables if the b-tree nodes are near page >size. Remember, virtual memory is on a disk too. Given the variety >of computer architectures, hash functions, ways of handling hash >collisions, identifier usage patterns among languages, and programming >goals... I'd also like to question the presumed superiority of hashing for symbol table functions. Even when in-memory the constant factors can outweigh the O(1) vs O(log N) theoretical advantage. [ Anecdotal ] An application I was working on was using a hash table to hold around 30k entries. With instrumentation and profiling, I found long hash chains *on average*, caused by a not very randomizing hash algorithm. The profile showed most of the time being spent in strcmps down the chains, and in the hash function proper. Changing the hash function for better distribution slowed things down, because the time spent doing a better hash was greater than the cost of doing the strcmps. I replaced the hash tables with splay trees, and found the performance was better, mostly because I was not doing the expensive hash function, and effectively went strcmping right down the chain, whose average length turned out to be about the same as the collision chain hanging off the hash buckets before. As a side effect, it gives you a nicely ordered scan should that be needed. The moral is that good performance in a hash table depends a lot on: * The distribution of the hashing function. * The cost of the hashing function. * The cost of searching a collision chain Few people bother to instrument and/or tune the hash for the best distribution, and this can easily wipe out whatever theoretical advantage may be present. Those interested in the code for a splay tree library suitable for symbol table use will find it in comp.sources.unix in a few weeks, wind and weather permitting. I'll mail it to anyone who is in a rush. -dB {amdahl, cpsc6a, mtxinu, ptsfa, sun, hoptoad}!rtech!daveb daveb@rtech.uucp -- Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.EDU Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | bbn}!ima Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request