armstron@sjuvax.UUCP (L. Armstrong) (10/16/85)
*** YUM YUM - EAT EM' UP *** I have a question that has bugging me for quite some time lately, and I was wondering what some of the opinions of others were on the subject. My question is this: Do you think it would be faster for a compiler to have a more complex scanner that could recognize key-words/reserved identifiers immediately (i.e., simply by keeping a record of each inputted character) and not have to hash and search for those key-words each time one is encountered, or would it be better to keep the scanner simple, recognizing each key-word as simply an identifier, and then search the symbol table to determined wether or not your identifier was reserved. It is not immediately obvious to me which one would be quicker, although I have a tendency to think it may be the former. Lex allows for one to design a scanner of the first type very simply. Thanks, -Len -- NAME Len Armstrong UUCP {astrovax | bpa | burdvax | allegra }!sjuvax!armstron ORGANIZATIONS RCA Advanced Technology Labs St. Joseph's University PHONES (WORK) (609) 866-6647
stevev@tekchips.UUCP (Steve Vegdahl) (10/21/85)
> Do you think it would be faster for a compiler to have a more > complex scanner that could recognize key-words/reserved identifiers > immediately (i.e., simply by keeping a record of each inputted > character) and not have to hash and search for those key-words > each time one is encountered, or would it be better to keep the > scanner simple, recognizing each key-word as simply an identifier, > and then search the symbol table to determined wether or not > your identifier was reserved. > > It is not immediately obvious to me which one would be quicker, > although I have a tendency to think it may be the former. > Lex allows for one to design a scanner of the first type > very simply. If you ignore things like virtual memory, I'd think that the complex scanner would be faster if implemented in a straightforward way. However, this would increase the size of the state-transition table. Instead of having one or two character classes for letters, you'd have something like 26. I prefer to use hashing here because the set of reserved words is known at compiler-compile time. Thus, you can analyze the reserved words in advance and select a *perfect hash function* (or something close to it). The size of your hash table will be small, and you will be guaranteed to find any reserved word in a single probe of the table. Steve Vegdahl Computer Research Lab. Tektronix, Inc. Beaverton, Oregon