[comp.lang.c] State Machines

peter@sugar.UUCP (Peter da Silva) (05/07/88)

I submit that there is one case where gotos are no worse and possibly better
than the alternative. And that is state machine type code, as is found in
lexers. It's rather a narrow category, but I feel it's a valid one. I don't
know if you're arguing against having a goto in a language at all, but here's
one reason to leave it in...

Objection #1 I'll cover immediately: "Use Lex".

	You can't always get lex.

	In fact, for most machines and languages there is no lex or anything
like it. Say, Fortran on a Unisys 1100. Most machines *aren't* UNIX.

	Even using Lex on a UNIX box and massaging the tables doesn't help
when the program has to be maintained in the non-UNIX environment. Even when
you have all the UNIX tools, it's not always cool. I once wrote a recursive-
descent parser in Forth for a real-time applications language in something like 
two months, including a line-oriented language-oriented editor. The guy in
charge of the job didn't like this, and went away for 6 months playing with
doing it with Yacc and massaging the tables. Eventually he and the project were
canned.

Objection #2 is something like: "Write a table-driven scanner". It's also
expressed in terms of using a big while loop and a case statement to do the
state machine.

	You still have to maintain the state machine. The table-driven one
has to be generated somehow, and unless you have Lex or its equivalent it's
going to be expressed in terms of states and transitions... just like the
goto-based version. The same can be said of the case statement. All you're
doing here is hiding the gotos. They're still there:

	ENDSTMT:
		IF(TOKEN .EQ. NEWLINE) THEN
			CALL OUTPUT(SCANNEDLINE)
			GOTO NEWSTMT
		ELSE IF(TOKEN .EQ. COMMENT) THEN
			GOTO ENDSTMT
		ELSE
			GOTO SYNERR
		ENDIF

Is no worse than (and is in fact equivalent to):

	ELSE IF(STATE .EQ. ENDSTMT) THEN
		IF(TOKEN .EQ. NEWLINE) THEN
			CALL OUTPUT(SCANNEDLINE)
			STATE = NEWSTMT
		ELSE IF(TOKEN .EQ. COMMENT) THEN
			STATE = ENDSTMT
		ELSE
			STATE = SYNERR
		ENDIF

And is much better than:

	DATA STATABL(1,ENDSTMT) /NEWLINE, ACTIONOUT, NEWSTMT/
	DATA STATABL(2,ENDSTMT) /COMMENT, 0, ENDSTMT/
	DATA STATABL(3,ENDSTMT) /0, 0, SYNERR/

with some special code to decode the actions.
-- 
-- Peter da Silva      `-_-'      ...!hoptoad!academ!uhnix1!sugar!peter
-- "Have you hugged your U wolf today?" ...!bellcore!tness1!sugar!peter
-- Disclaimer: These aren't mere opinions, these are *values*.

mouse@mcgill-vision.UUCP (der Mouse) (05/16/88)

In article <1945@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> I submit that there is one case where gotos are no worse and possibly
> better than the alternative.  And that is state machine type code, as
> is found in lexers.

Ideally, one should use a real lexer language.  If you have C on the
target machine, and you have a UNIX machine available for development,
and you are scanning out of a string (as opposed to a stdio stream), I
have something which may be able to help.  It takes input describing
something like a finite state machine (it isn't quite a vanilla FSM, as
one might encounter in a theory class) and produces C code to implement
it.  It was patterned after the VMS lib$tparse facility, but is
somewhat more powerful.

Mail if interested,

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

kenny@uiucdcsm.cs.uiuc.edu (05/17/88)

/* Written  3:04 am  May 16, 1988 by mouse@mcgill-vision.UUCP in uiucdcsm:comp.lang.c */
In article <1945@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
> I submit that there is one case where gotos are no worse and possibly
> better than the alternative.  And that is state machine type code, as
> is found in lexers.

Ideally, one should use a real lexer language.
[...]
/* End of text from uiucdcsm:comp.lang.c */

     Moreover, with the release of flex to the public, one doesn't
need to be on a Un*x system to use a real lexer language; I've already
heard from a gentleman who's got flex up and running on an IBM PC.

     The advent of Bison does the same for LR parser languages, or
would were it not for the fact that whenever you use Bison, it puts
the FSF copyright on your code.  There are a couple of clients that I
can't use Bison with, because of pre-existing agreements *not* to
release the work I do for them.

Kevin