mph@inmos.com (Mike Harrison) (08/15/90)
In response to tfd!kent@uunet.UU.NET (Kent Hauser), John wrote: [Given the well-known equivalence between regular expressions and finite state machines (albeit not a one-to-one equivalence) there are many programs that compile regular expressions into FSMs and then run them. ...] In my copy of Aho & Ullman - 'The theory of parsing translation and compiling', regular sets are shown to be equivalent to: - regular expressions, - right linear grammars, - finite automata. Thus, I don't quite understand the "albeit not a one-to-one equivalence" above. Incidentally, this book is an excellent reference for a rigorous treatment of many aspects of language theory. Mike, Michael P. Harrison - Software Group - Inmos Ltd. UK. UK : mph@inmos.co.uk, US : mph@inmos.com [There are many equivalent DFAs that a regular expression can translate to, and vice versa. After all, any regular expression can be written in infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really pedantic, x vs. (x|x) vs. (x|x|x) ... -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
worley@compass.com (Dale Worley) (08/23/90)
[There are many equivalent DFAs that a regular expression can translate to, and vice versa. After all, any regular expression can be written in infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really pedantic, x vs. (x|x) vs. (x|x|x) ... -John] However, all DFAs for a given regular language can be derived from the minimal DFA (the one with the fewest states that accepts that language) by turning single states into sets of equivalent states. Dale Worley Compass, Inc. worley@compass.com -- My favorite was an example in Dijkstra's classic "A Discipline of Programming" where he claimed that he hadn't submitted his program to operational testing since it had been created using a discipline that guaranteed it would be correct. Naturally, there were a couple of bugs in it! -- Doug Gwyn [Dale mentioned in a separate note that Aho and Ullman's books tell this and much more about DFAs and regular expressions. -John] -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.
hankd@dynamo.ecn.purdue.edu (Hank Dietz) (08/25/90)
In article <9008231529.AA26343@sn1987a.compass.com> worley@compass.com (Dale Worley) writes: > > [There are many equivalent DFAs that a regular expression can translate to, > and vice versa. After all, any regular expression can be written in > infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really > pedantic, x vs. (x|x) vs. (x|x|x) ... -John] > >However, all DFAs for a given regular language can be derived from the >minimal DFA (the one with the fewest states that accepts that >language) by turning single states into sets of equivalent states. First, even without state merging, the DFAs generated for (x|x|x) and x are IDENTICAL. Still, state equivalence and merging are important concepts. Which reminds me of a strange and maybe useful thing I did a while back... I'm wondering if others think it useful also.... Basically, if someone gives you a DFA, it is possible to perform merging of equivalent states to try to reduce it to the minimal form. Two states A and B are equivalent if they have the same accept action and, for each transition from A to C labeled L, there is a corrseponding transition from B also labeled L and going to a state D| D is equivalent to C. That's the "standard" definition and, for example, it will correctly reduce the DFA for (x*|x) into the DFA for x*. Nothing tricky... I've implemented it and it works fine. Now, my little tweak was to recognize that, provided error detection is not needed, states can be considered equivalent if they don't have conflicting exit arcs and the accept action is either the same or is the error accept action. I call this "unsafe" state merging. The result is often a much simpler recognizer, but without error detection ability. For example, the lex-like specification: a*b { ACTION1 } aaaac { ACTION2 } would generate a DFA which actually corresponded to: a*b { ACTION1 } a*c { ACTION2 } I figured that this might be useful because it effectively reduces the patterns to the minimal trailing components which distinguish them. That clearly results in fewer states for the DFA... although my experiments show that it doesn't save all that many states for more complex patterns. So, what's the verdict? Is this "unsafe" state merging idea useful? Has it been done before? (If not, is it worth publishing?) -hankd@ecn.purdue.edu -- Send compilers articles to compilers@esegue.segue.boston.ma.us {ima | spdcc | world}!esegue. Meta-mail to compilers-request@esegue.