[comp.unix.questions] lex/yacc question

johnl@ima.UUCP (04/24/87)

I have an assembler that uses lex & yacc, and I would like
to have it generate traditional assembler listings of the form:

    address: data	label: mnemonic instruction and operands

Now, my current technique is to gather the text in the lexical analyzer
and the address and data in the parser.  Production of the address &
data part of a listing line is driven by the parser recognizing an
instruction, while production of the text part is driven by the lexer
seeing a new-line.  The problem is that the execution of lex- and
yacc-generated code is not synchronized, so sometimes I see the text
first, then the data, while other times I see the data first and then
the text.  The problem is exacerbated when an instruction produces more
than one line of data, or multiple lines of text correspond to just one
word of data.  I have been able to deal with the problems, for the most
part, by keeping flags and counters around for special cases, but I'm
not very pleased with the result.

So here's the question.  Is there some arcana in lex & yacc (of which
I am ignorant) that would make this easier?  I would appreciate advice
from anyone who has been in a similar boat.

					Thanks.
					Guy Hillyer

					guy%ksr.uucp@harvard.harvard.edu
[The few times I had to write an assembler, I finessed the problem by not
making source listings, just symbol tables.  But I thought about it some and
didn't come up with anything very clever.  I'd buffer up a source line and
set a flag when I saw a line feed.  Then, as I emit object code, I'd append
the source line to the next listing line I put out and clear the flag so
that subsequent listing lines didn't repeat the same source line.  If I
started to read another source line before dumping the current one, most
likely because it's a comment, I'd put out the saved source line then.  Not
too elegant, but probably effective.  Neither yacc nor lex is likely to give
you much more help.  Readers are encouraged to prove me wrong.  -John]
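The buffering scheme in the note above might look roughly like this in C. This is a hypothetical sketch, not code from the thread: the function names listing_source_line, listing_emit, and listing_flush are invented for illustration, the lexer is assumed to call listing_source_line once per complete source line, and the parser's code generator is assumed to call listing_emit once per word of object code. Output collects in a string so it is easy to inspect; a real assembler would write to a FILE *.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical listing state: the most recent source line, and a flag
   saying it has not been printed yet. */
static char pending_src[256];
static int src_pending = 0;

/* Listing output accumulates here; a real assembler would use a FILE *. */
static char listing[4096];
static size_t listing_len = 0;

static void put_listing(const char *s)
{
    size_t n = strlen(s);
    if (listing_len + n < sizeof listing) {   /* silently drop on overflow */
        memcpy(listing + listing_len, s, n + 1);
        listing_len += n;
    }
}

/* Print a saved source line that never got any object code
   (a comment, a label on a line by itself, a directive...). */
void listing_flush(void)
{
    char line[300];
    if (src_pending) {
        snprintf(line, sizeof line, "%16s%s\n", "", pending_src);
        put_listing(line);
        src_pending = 0;
    }
}

/* Called by the lexer each time it has read a complete source line. */
void listing_source_line(const char *text)
{
    listing_flush();            /* previous line produced no code: print it alone */
    strncpy(pending_src, text, sizeof pending_src - 1);
    pending_src[sizeof pending_src - 1] = '\0';
    src_pending = 1;
}

/* Called by the parser for each word of object code.  Only the first word
   after a source line carries that line's text; later words get none. */
void listing_emit(unsigned addr, unsigned word)
{
    char line[300];
    snprintf(line, sizeof line, "%06o: %06o\t%s\n",
             addr, word, src_pending ? pending_src : "");
    put_listing(line);
    src_pending = 0;
}
```

A real assembler would also call listing_flush at end of input so that a trailing comment line is not lost.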
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward
mail accidentally sent back to compilers.  Meta-mail to ima!compilers-request

edw@ius2.cs.cmu.edu (Eddie Wyatt) (04/29/87)

In article <558@ima.UUCP>, johnl@ima writes:
>[any easy way to make a listing in an assembler written in yacc and lex?]

Try production rules of the form

%token	LEXDEFADDRESS	LEXDEFDATA	COLON	LEXDEFNEWLINE

addrprod 	: 	address COLON data LEXDEFNEWLINE
			{   generate_code ($1,$3); 
			   /* write code at end of production.
			      If you want, make generate_code
			      smart so that it buffers output
			      and writes only when the buffer is
			      full. */ }
		;

address		:	LEXDEFADDRESS
			{ $$ = make_address(); 
			  /* read the chars from yytext[]
			     and convert them into an address */ }
		;

data		:	LEXDEFDATA
			{ $$ = make_data();
			  /* read the chars from yytext[]
			     and convert them into data */ }
		;

I assume that your production is of the form:

addrprod	: 	LEXDEFADDRESS COLON LEXDEFDATA

in which case no semantic action is executed until the token
LEXDEFDATA has been identified, so by then the chars that made
up the LEXDEFADDRESS token are lost to the parser (I know you've
hacked it so the lex code takes care of it, yuk).
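A common remedy, not taken from the post, is to have the lexer copy each token's text into the token's value (yylval) before returning, so the parser no longer depends on yytext still holding it. The sketch below imitates that in plain C: tokbuf stands in for yytext because every token overwrites it, and scan_token hands the parser a private copy. The token codes and the names tokbuf, save_text, and scan_token are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes, numbered the way yacc might number them. */
enum { ADDRESS = 257, COLON, DATA };

/* Stand-in for lex's yytext: one buffer, overwritten by every token. */
static char tokbuf[64];

/* Copy a string into newly allocated storage (a portable strdup). */
static char *save_text(const char *s)
{
    char *p = malloc(strlen(s) + 1);
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Hypothetical scanner step: put the token's text in the shared buffer,
   as lex does, and also give the parser a private copy through *lval,
   which plays the role of yylval. */
static int scan_token(int code, const char *text, char **lval)
{
    strncpy(tokbuf, text, sizeof tokbuf - 1);
    tokbuf[sizeof tokbuf - 1] = '\0';
    *lval = save_text(tokbuf);
    return code;
}
```

In an actual lex specification the same idea is an action such as { yylval.str = save_text(yytext); return LEXDEFADDRESS; }, with char *str declared in the grammar's %union (the member name str is an assumption); the addrprod action can then use $1 directly.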

General comment: lex and yacc are probably more powerful than you need
for an assembler, and you will pay for all the generality of lex and
yacc in how fast your assembler runs.
	Eddie Wyatt, edw@ius2.cs.cmu.edu
[Lex might slow you down, but I have never, ever, seen a compiler where
hand-coding the parser rather than using yacc would make the compiler
noticeably faster.  Yacc parses pretty fast, and parsing isn't that big a
part of compile time anyway.  -John]

fc121102@gwusun.gwu.edu (M. J. Lamoureux) (11/29/89)

	I have an assignment I'm working on in which I'm supposed to
"Develop the list of tokens to be placed in a file 'y.tab.h'"  I have
read the man pages for lex, yacc, flex, and bison, and looked through
as much other assorted documentation as I really care to, but I have
yet to find a word on the format of this file.  Is the only way to find
out to write the yacc code and use the -d option?

Thanks,

Michael Lamoureux
fc121102@gwusun.gwu.edu
lamour@smiley.mitre.org

prs@tcsc3b2.tcsc.com (Paul Stath) (11/30/89)

fc121102@gwusun.gwu.edu (M. J. Lamoureux) writes:

>	I have an assignment I'm working on in which I'm supposed to
>"Develop the list of tokens to be placed in a file 'y.tab.h'"  I have
>read the man pages for lex, yacc, flex, and bison, and looked through
>as much other assorted documentation as I really care to, but I have
>yet to find a word on the format of this file.  Is the only way to find
>out to write the yacc code and use the -d option?

[.sig deleted]

The y.tab.h file is simply a set of #define <token-name> <number> statements
which provide a mapping of the token names to integer constants.

The yacc -d option is the best way to do this!  In fact, I sometimes use
yacc for the express purpose of generating a set of #define directives while
developing a program with a lot of constants.  This lets me quickly
add or change the constants without worrying about how they are numbered.
I just add a %token <token-name> line to my file.y!  If I REALLY care about
the order, I arrange the %token lines accordingly.  Nobody says the yacc code
has to do anything useful.  I usually use a minimal grammar.

(No flames about abusing the tools please!  This works great for me. :-)
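For a grammar whose declarations section contains %token ALPHA BETA GAMMA, the generated header typically looks like the three #define lines below. The exact numbers depend on the yacc in use: classic yacc and Berkeley yacc usually start user-defined tokens at 257, Bison starts higher and writes extra declarations, and a %union normally adds a YYSTYPE definition plus an extern declaration of yylval. The token names and the token_name helper are hypothetical, added only to show the constants being used as ordinary C names.

```c
/* Roughly what "yacc -d" writes to y.tab.h for: %token ALPHA BETA GAMMA
   (numbering assumed to start at 257, as in classic yacc). */
#define ALPHA 257
#define BETA 258
#define GAMMA 259

/* Hypothetical helper: once the header is included, the token names are
   plain integer constants usable anywhere in the program. */
const char *token_name(int tok)
{
    switch (tok) {
    case ALPHA: return "ALPHA";
    case BETA:  return "BETA";
    case GAMMA: return "GAMMA";
    default:    return "unknown";
    }
}
```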
-- 
===============================================================================
Paul R. Stath       The Computer Solution Co., Inc.       Voice: 804-794-3491
------------------------------------------------+------------------------------
INTERNET:	prs@tcsc3b2.tcsc.com		| "There was no diety involved,

jbd0@gte.com (Jeffrey B. DeLeo) (11/30/89)

Once you have identified your lexical primitives you should be all
set.  These will be the nonterminals for your grammar; the grammar
being defined in the yacc source file.

Simply put all of the nonterminals in a .y (yacc source file) using
the %token construct, put in some minimal information so yacc will
run, and run yacc with the "-d" option - the y.tab.h file will be
produced. 

The y.tab.h file will only change if you define new nonterminals; now
you can use these values elsewhere while you write your grammar (yacc
rules). 

		...!bunny!thoth!jbd0

chris@mimsy.umd.edu (Chris Torek) (12/02/89)

In article <21582@adm.BRL.MIL> thoth!jbd0@gte.com (Jeffrey B. DeLeo) writes:
>Once you have identified your lexical primitives you should be all
>set.  These will be the nonterminals for your grammar; the grammar
>being defined in the yacc source file.

Since no one else has said anything yet, I will point out that these
are the terminals.  The nonterminals are the other words in the grammar:

	%token BAR BAZ FOO THE
	%%
	grammar: sentence | grammar sentence;
	sentence: FOO rest '.';
	rest: /* empty */ | THE object;
	object: BAR | BAZ;
	%%

Here there are 7 terminals (BAR, BAZ, FOO, and THE, plus 3 unnamed ones:
$end, $error [a yacc internal thing, not actually used here], and the
period `.') and 4 nonterminals (grammar, sentence, rest, and object).
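One way to see the terminal/nonterminal split is to hand-code a recognizer for the same grammar. The sketch below uses recursive descent, which is not the LALR method yacc itself uses: each terminal is an integer token code such as a lexer would return (the period is just the character code '.', and END stands in for $end), while each nonterminal becomes a C function. The left-recursive rule for grammar turns into a loop. The token values and function names are illustrative assumptions.

```c
/* Terminals: token codes as yacc might number them; '.' is its own code
   and END (0) plays the part of $end. */
enum { BAR = 257, BAZ, FOO, THE, END = 0 };

static const int *tp;   /* current position in the token stream */

/* object: BAR | BAZ */
static int object(void)
{
    if (*tp == BAR || *tp == BAZ) { tp++; return 1; }
    return 0;
}

/* rest: empty | THE object */
static int rest(void)
{
    if (*tp == THE) { tp++; return object(); }
    return 1;           /* empty alternative: consume nothing */
}

/* sentence: FOO rest '.' */
static int sentence(void)
{
    if (*tp != FOO) return 0;
    tp++;
    if (!rest()) return 0;
    if (*tp != '.') return 0;
    tp++;
    return 1;
}

/* grammar: sentence | grammar sentence, i.e. one or more sentences. */
int parse(const int *tokens)
{
    tp = tokens;
    if (!sentence()) return 0;
    while (*tp != END)
        if (!sentence()) return 0;
    return 1;           /* reached END without an error */
}
```

Only the terminals need token numbers that a lexer and parser must agree on, which is why they, and not the nonterminals, end up in y.tab.h.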
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

johnl@esegue.segue.boston.ma.us (John R. Levine) (12/14/89)

In article <1989Nov29.180030.15742@tcsc3b2.tcsc.com> prs@tcsc3b2.tcsc.com (Paul Stath) writes:
>The y.tab.h file is simply a set of #define <token-name> <number> statements
>which provide mapping of the token names to integer constants.
>
>The yacc -d option is the best way to do this!  In fact, I sometimes use
>yacc for the express purpose of generating a set of #define directives while
>developing a program with a lot of constants.

I suppose we should take this as testimony reminding us how flexible all of
the Unix tools are.  With any C compiler written since about 1978, though,
it's a lot easier to write an enumeration type:

enum {
	firstsymbol=256,	/* or wherever you want to start */
	secondsymbol,
	/* as many more as you want */
};

This gives you the same effect, avoids extra trips through yacc, and in many
cases makes the names available in debuggers.
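Applied to a set of token codes, the enumeration approach might look like the sketch below. The names are hypothetical, the first value is set to 257 only to mimic classic yacc numbering, and the trailing TOK_LIMIT member (also hypothetical) gives the count needed to size a name table. Because enumeration constants are real identifiers known to the compiler, rather than text substituted by the preprocessor, debuggers can often display them by name.

```c
/* Token codes as an enumeration instead of #define lines. */
enum token {
    TOK_FIRST = 257,        /* start where yacc would, or anywhere else */
    TOK_IDENT = TOK_FIRST,
    TOK_NUMBER,
    TOK_STRING,
    TOK_LIMIT               /* one past the last token; no comma after the
                               last member, which C89 compilers reject */
};

/* Name table indexed by (code - TOK_FIRST); order must match the enum. */
static const char *const token_names[TOK_LIMIT - TOK_FIRST] = {
    "IDENT", "NUMBER", "STRING"
};

const char *enum_token_name(enum token t)
{
    if (t < TOK_FIRST || t >= TOK_LIMIT)
        return "unknown";
    return token_names[t - TOK_FIRST];
}
```

Adding a token means adding one enum member and one table entry; the numbering takes care of itself, just as with the %token trick.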
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."