[comp.lang.misc] YACC/LEX Combo for large textual input

ddickey@aspen04.cray.com (Dan A. Dickey) (01/26/91)

Hi, I'm working on a yacc/lex parser pair that parses a wide variety
of output.  The problem I have is that the output contains a rather
large set of words in it, English-language words.  I'm wondering if
there is an easy way of parsing these words with yacc/lex that I'm
not aware of.  Currently I define tokens for each and every word,
and return these tokens in the lex part.  It is very messy; I have had
to increase the table sizes in my lex input file quite substantially.
The yacc input doesn't even make it through a standard yacc.  I
have had to increase several table sizes in the yacc source code
in order to get it to process my yacc input file.  As I said, this
is very messy.  Currently, I would guess that there are about 300 words
or so that I'm parsing, along with the normal things like integers,
floating point output, strings, etc.

If you think you can help me out, please email.  If you think you
can help, but I have explained myself poorly, email with your questions
and I'll send a sample of what I'm doing.  The address is:
	ddickey@fizban.cray.com

	-Dan

--
---------
Dan A. Dickey      ddickey@fizban.cray.com, or ddickey@aspen.cray.com

new@ee.udel.edu (Darren New) (01/29/91)

In article <144123.10308@timbuk.cray.com> ddickey@aspen04.cray.com (Dan A. Dickey) writes:
>Hi, I'm working on a yacc/lex parser pair that parses a wide variety
>of output.  The problem I have is that the output contains a rather
>large set of words in it, English-language words.  I'm wondering if
>there is an easy way of parsing these words with yacc/lex that I'm
>not aware of.  

If you are not stuck with using yacc and lex, and the input to
your parser is (or can be made) line oriented (e.g., compiler
error messages, or some language like FORTRAN or BASIC rather
than C or Pascal), you might want to look at LOME, which I
recently posted to comp.sys.amiga. It also compiles and runs under
SunOS 4.1 and I'm told compiles and runs fine under System V.
Feel free to contact me for more help if you like.
	  -- Darren

-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
              =+=+=+ Let GROPE be an N-tuple where ... +=+=+=

ddickey@aspen04.cray.com (Dan A. Dickey) (01/29/91)

Well, I now know what I need to know.  I thank everyone who has sent
me email.  You can stop sending now.  I particularly thank:
Larry Jones, Florian Krohm, dilse2@info.win.tue.nl,
Mark Pledger, Rick Kimball, Thomas J Roberts, and Peter Mielke.

The solution was actually quite simple; I'm embarrassed that I didn't
think of it myself, but that's why I asked in the first place.

The solution I'll use is to modify the lex piece so that it recognizes
only generic "identifiers".  When it finds an identifier, it will look it
up in a "reserved word" table.  If the word is found, lex returns the
associated token; if not, it returns "identifier".  The lookup will be
done by a C routine rather than being built into the lex tables as I was doing.
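
A minimal sketch of such a lookup routine, for illustration only.  The
token names, the three sample words, and the function name classify_word
are all assumptions made up for this example; in a real build the token
codes would come from the header yacc generates, and the table would hold
the full word list.  A sorted table searched with bsearch() is used here,
but a hash table or a linear scan would do as well.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; normally these come from yacc's y.tab.h. */
enum { IDENTIFIER = 258, TOK_ERROR, TOK_FILE, TOK_WARNING };

struct keyword {
    const char *word;   /* spelling as it appears in the input */
    int         token;  /* token code handed back to the parser */
};

/* Must stay sorted in strcmp() order, because bsearch() is used below. */
static const struct keyword keywords[] = {
    { "error",   TOK_ERROR   },
    { "file",    TOK_FILE    },
    { "warning", TOK_WARNING },
};

/* bsearch() comparison: the key is a plain string, the entry a table row. */
static int compare_keyword(const void *key, const void *entry)
{
    const struct keyword *kw = entry;
    return strcmp((const char *)key, kw->word);
}

/* Return the reserved word's token, or IDENTIFIER if the word is not listed. */
int classify_word(const char *text)
{
    const struct keyword *hit;

    assert(text != NULL);
    hit = bsearch(text, keywords,
                  sizeof keywords / sizeof keywords[0],
                  sizeof keywords[0], compare_keyword);
    return hit ? hit->token : IDENTIFIER;
}
```

With a routine like this, the lex input needs only one rule for words,
along the lines of  [A-Za-z]+  { return classify_word(yytext); }  so the
lex tables no longer grow with the number of words.  The match is case
sensitive as written.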

I knew there had to be a better and easier way, and this, I believe, is it.

	-Dan

