[comp.compilers] Regular expressions & finite automata.

mph@inmos.com (Mike Harrison) (08/15/90)

In response to tfd!kent@uunet.UU.NET (Kent Hauser), John wrote:

[Given the well-known equivalence between regular expressions and finite
state machines (albeit not a one-to-one equivalence) there are many programs
that compile regular expressions into FSMs and then run them.  
...]

In my copy of Aho & Ullman - 'The theory of parsing translation and
compiling', regular sets are shown to be equivalent to:
  - regular expressions,
  - right linear grammars,
  - finite automata.

Thus, I don't quite understand the "albeit not a one-to-one
equivalence" above.

Incidentally, this book is an excellent reference for a rigorous treatment of
many aspects of language theory.

Mike,

Michael P. Harrison - Software Group - Inmos Ltd. UK.
UK : mph@inmos.co.uk, US : mph@inmos.com
[There are many equivalent DFAs that a regular expression can translate to,
and vice versa.  After all, any regular expression can be written in
infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really
pedantic, x vs. (x|x) vs. (x|x|x) ... -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

worley@compass.com (Dale Worley) (08/23/90)

   [There are many equivalent DFAs that a regular expression can translate to,
   and vice versa.  After all, any regular expression can be written in
   infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really
   pedantic, x vs. (x|x) vs. (x|x|x) ... -John]

However, all DFAs for a given regular language can be derived from the
minimal DFA (the one with the fewest states that accepts that
language) by turning single states into sets of equivalent states.

Dale Worley		Compass, Inc.			worley@compass.com
--
My favorite was an example in Dijkstra's classic "A Discipline of
Programming" where he claimed that he hadn't submitted his program
to operational testing since it had been created using a discipline
that guaranteed it would be correct.  Naturally, there were a couple
of bugs in it! -- Doug Gwyn
[Dale mentioned in a separate note that Aho and Ullman's books tell this
and much more about DFAs and regular expressions. -John]
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.

hankd@dynamo.ecn.purdue.edu (Hank Dietz) (08/25/90)

In article <9008231529.AA26343@sn1987a.compass.com> worley@compass.com (Dale Worley) writes:
>
>  [There are many equivalent DFAs that a regular expression can translate to,
>  and vice versa.  After all, any regular expression can be written in
>  infinitely many equivalent ways, e.g. a(b|c) vs. ab|ac, or to be really
>  pedantic, x vs. (x|x) vs. (x|x|x) ... -John]
>
>However, all DFAs for a given regular language can be derived from the
>minimal DFA (the one with the fewest states that accepts that
>language) by turning single states into sets of equivalent states.

First, even without state merging, the DFAs generated for (x|x|x) and x
are IDENTICAL.  Still, state equivalence and merging are important
concepts.  Which reminds me of a strange and maybe useful thing I did a
while back...  I'm wondering if others think it useful also....

Basically, if someone gives you a DFA, it is possible to perform merging
of equivalent states to try to reduce it to the minimal form.  Two states
A and B are equivalent if they have the same accept action and, for each
transition from A to C labeled L, there is a corrseponding transition from
B also labeled L and going to a state D| D is equivalent to C.  That's the
"standard" definition and, for example, it will correctly reduce the DFA
for (x*|x) into the DFA for x*.  Nothing tricky...  I've implemented it
and it works fine.

Now, my little tweak was to recognize that, provided error detection is
not needed, states can be considered equivalent if they don't have
conflicting exit arcs and the accept action is either the same or is the
error accept action.  I call this "unsafe" state merging.  The result is
often a much simpler recognizer, but without error detection ability.  For
example, the lex-like specification:

a*b	{ ACTION1 }
aaaac	{ ACTION2 }

would generate a DFA which actually corresponded to:

a*b	{ ACTION1 }
a*c	{ ACTION2 }

I figured that this might be useful because it effectively reduces the
patterns to the minimal trailing components which distinguish them.  That
clearly results in fewer states for the DFA...  although my experiments
show that it doesn't save all that many states for more complex patterns.

So, what's the verdict?  Is this "unsafe" state merging idea useful?  Has
it been done before?  (If not, is it worth publishing?)

					-hankd@ecn.purdue.edu
-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{ima | spdcc | world}!esegue.  Meta-mail to compilers-request@esegue.