johnl@ima.UUCP (04/24/87)
I have an assembler that uses lex & yacc, and I would like to have it generate traditional assembler listings of the form:

    address: data    label: mnemonic instruction and operands

Now, my current technique is to gather the text in the lexical analyzer and the address and data in the parser. Production of the address & data part of a listing line is driven by the parser recognizing an instruction, while production of the text part is driven by the lexer seeing a new-line.

The problem is that the execution of lex- and yacc-generated code is not synchronized, so sometimes I see the text first, then the data, while other times I see the data first and then the text. The problem is exacerbated when an instruction produces more than one line of data, or multiple lines of text correspond to just one word of data.

I have been able to deal with the problems, for the most part, by keeping flags and counters around for special cases, but I'm not very pleased with the result.

So here's the question. Is there some arcanery in lex & yacc (of which I am ignorant) that would make this easier? I would appreciate advice from anyone who has been in a similar boat. Thanks.

Guy Hillyer
guy%ksr.uucp@harvard.harvard.edu

[The few times I had to write an assembler, I finessed the problem by not making source listings, just symbol tables. But I thought about it some and didn't come up with anything very clever. I'd buffer up a source line and set a flag when I saw a line feed. Then, as I emit object code, I'd append the source line to the next listing line I put out and clear the flag so that subsequent listing lines didn't repeat the same source line. If I started to read another source line before dumping the current one, most likely because it's a comment, I'd put out the saved source line then. Not too elegant, but probably effective. Neither yacc nor lex is liable to give you much more help. Readers are encouraged to prove me wrong. -John]
--
Send compilers articles to ima!compilers or, in a pinch, to Levine@YALE.ARPA
Plausible paths are { ihnp4 | decvax | cbosgd | harvard | yale | cca}!ima
Please send responses to the originator of the message -- I cannot forward mail accidentally sent back to compilers. Meta-mail to ima!compilers-request
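
The buffering scheme in the moderator's note could be sketched in C roughly as follows. This is not code from either poster: the names listing_source_line, listing_emit, and listing_finish are hypothetical, as are the 12-column left field and the 4-digit hex formats. It also assumes the instruction rule ends with a newline token, so the lexer has already handed over the complete source line by the time the parser's action emits object code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SRC_MAX 256

static char pending_src[SRC_MAX]; /* last complete source line, not yet listed */
static int  src_pending = 0;      /* set while pending_src waits to be listed */

/* Write one listing line: "address: data" on the left, source on the right. */
static void put_line(FILE *out, const char *left, const char *src)
{
    if (*src)
        fprintf(out, "%-12s%s\n", left, src);
    else
        fprintf(out, "%s\n", left);
}

/* Lexer side: call when the newline ending a source line has been seen. */
void listing_source_line(FILE *out, const char *line)
{
    assert(line != NULL);
    if (src_pending)              /* previous line made no code, e.g. a comment */
        put_line(out, "", pending_src);
    strncpy(pending_src, line, SRC_MAX - 1);
    pending_src[SRC_MAX - 1] = '\0';
    src_pending = 1;
}

/* Parser side: call once per word of object code emitted. */
void listing_emit(FILE *out, unsigned addr, unsigned data)
{
    char left[32];

    snprintf(left, sizeof left, "%04X: %04X", addr, data);
    put_line(out, left, src_pending ? pending_src : "");
    src_pending = 0;              /* later words must not repeat the source text */
}

/* Call at end of input so a trailing comment line is not lost. */
void listing_finish(FILE *out)
{
    if (src_pending)
        put_line(out, "", pending_src);
    src_pending = 0;
}
```

With this arrangement the lex rule for a newline would call listing_source_line, and each yacc action that emits a word would call listing_emit. An instruction producing several words then lists its source text only on the first of them.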
edw@ius2.cs.cmu.edu (Eddie Wyatt) (04/29/87)
In article <558@ima.UUCP>, johnl@ima writes:
>[any easy way to make a listing in an assembler written in yacc and lex?]

Try production rules of the form

    %term LEXDEFADDRESS LEXDEFDATA LEXDEFNEWLINE COLON
    %%
    addrprod : address COLON data LEXDEFNEWLINE
                 {
                   generate_code($1, $3);
                   /* write code at end of production. If you want, make
                      generate_code smart so that it buffers output and
                      writes only when the buffer is full. */
                 }
             ;

    address  : LEXDEFADDRESS
                 {
                   $$ = make_address();
                   /* read the chars from yytext[] and convert them
                      into an address */
                 }
             ;

    data     : LEXDEFDATA
                 {
                   $$ = make_data();
                   /* read the chars from yytext[] and convert them
                      into data */
                 }
             ;

I assume that your production is of the form:

    addrprod : LEXDEFADDRESS COLON LEXDEFDATA

in which case any semantic rules will not be executed until the lex symbol LEXDEFDATA is identified; hence the chars that make up the lex symbol LEXDEFADDRESS are lost to the parser (I know you've hacked it so the lex code takes care of it, yuk).

General comment: lex and yacc are probably too powerful for writing an assembler. You will pay for all the generality of lex and yacc in how fast your assembler runs.

Eddie Wyatt, edw@ius2.cs.cmu.edu

[Lex might slow you down, but I have never, ever, seen a compiler where hand-coding the parser rather than using yacc would make the compiler noticeably faster. Yacc parses pretty fast, and parsing isn't that big a part of compile time anyway. -John]
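
The post leaves make_address and make_data unspecified. A minimal sketch of the conversion step might look like the C below. The function name token_to_number, the choice of hexadecimal, and the 16-bit limit are all assumptions made for illustration. In a real parser the argument would be lex's yytext, read inside the action before the lexer overwrites it, which is the reason for giving each token its own rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Convert the text of one token (what lex leaves in yytext) to a number.
   Assumes hexadecimal digits and a 16-bit range (both hypothetical);
   returns -1 if the text is not a valid number in that range. */
long token_to_number(const char *text)
{
    char *end;
    unsigned long value;

    assert(text != NULL);
    errno = 0;
    value = strtoul(text, &end, 16);
    if (end == text || *end != '\0' || errno == ERANGE || value > 0xFFFFUL)
        return -1;
    return (long)value;
}
```

An action such as `$$ = token_to_number(yytext);` would then play the role of make_address() in the rules above.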
fc121102@gwusun.gwu.edu (M. J. Lamoureux) (11/29/89)
I have an assignment I'm working on in which I'm supposed to "Develop the list of tokens to be placed in a file 'y.tab.h'". I have read the man pages for lex, yacc, flex, and bison, and looked through as much other assorted documentation as I really care to, but I have yet to find a word on the format of this file. Is the only way to find out to write the yacc code and use the -d option?

Thanks,
Michael Lamoureux
fc121102@gwusun.gwu.edu
lamour@smiley.mitre.org
prs@tcsc3b2.tcsc.com (Paul Stath) (11/30/89)
fc121102@gwusun.gwu.edu (M. J. Lamoureux) writes:
> I have an assignment I'm working on in which I'm supposed to
>"Develop the list of tokens to be placed in a file 'y.tab.h'" I have
>read the man page for lex, yacc, flex, and bison. And looked through
>as much other assorted documentation as I really care to, but I have
>yet to find a word on the format of this file. Is the only way to find
>out to write the yacc code and use the -d option?
[.sig deleted]

The y.tab.h file is simply a set of

    #define <token-name> <number>

statements which provide a mapping of the token names to integer constants.

The yacc -d option is the best way to do this! In fact, I sometimes use yacc for the express purpose of generating a set of #define directives while developing a program with a lot of constants. This allows me to quickly add or change the constants without worrying about how they are numbered. I just add a %token <token-name> declaration to my file.y! If I REALLY care about the order, I arrange the %token directives accordingly.

Nobody says the yacc code has to do anything useful. I usually use a minimal grammar.

(No flames about abusing the tools please! This works great for me. :-)
--
Paul R. Stath    The Computer Solution Co., Inc.    Voice: 804-794-3491
INTERNET: prs@tcsc3b2.tcsc.com | "There was no diety involved,
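
To make the format concrete, here is a hand-written stand-in for a small y.tab.h with a helper that uses it. The token names NUMBER, IDENT, and COLON and the helper token_name are hypothetical. The numbers are only illustrative: yacc assigns values above 255 so that named tokens cannot collide with single-character literals, but the exact starting value differs between implementations.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for a generated y.tab.h: one #define per %token declaration.
   Names are hypothetical; real values depend on the yacc in use. */
#define NUMBER 257
#define IDENT  258
#define COLON  259

/* Map a token code back to a printable name, e.g. when debugging a lexer. */
const char *token_name(int tok)
{
    assert(tok >= 0);             /* yylex returns 0 at end of input */
    switch (tok) {
    case NUMBER: return "NUMBER";
    case IDENT:  return "IDENT";
    case COLON:  return "COLON";
    default:     return tok < 256 ? "(literal character)" : "(unknown)";
    }
}
```

A lexer would include the generated header and return these constants from yylex(), so that it and the parser agree on the numbering.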
jbd0@gte.com (Jeffrey B. DeLeo) (11/30/89)
Once you have identified your lexical primitives you should be all set. These will be the nonterminals for your grammar; the grammar being defined in the yacc source file. Simply put all of the nonterminals in a .y (yacc source file) using the %token construct, put in some minimal information so yacc will run, and run yacc with the "-d" option - the y.tab.h file will be produced. The y.tab.h file will only change if you define new nonterminals; now you can use these values elsewhere while you write your grammar (yacc rules). ...!bunny!thoth!jbd0
chris@mimsy.umd.edu (Chris Torek) (12/02/89)
In article <21582@adm.BRL.MIL> thoth!jbd0@gte.com (Jeffrey B. DeLeo) writes:
>Once you have identified your lexical primitives you should be all
>set. These will be the nonterminals for your grammar; the grammar
>being defined in the yacc source file.

Since no one else has said anything yet, I will point out that these are the terminals. The nonterminals are the other words in the grammar:

    %token BAR BAZ FOO THE
    %%
    grammar: sentence | grammar sentence;
    sentence: FOO rest '.';
    rest: /* empty */ | THE object;
    object: BAR | BAZ;
    %%

Here there are 7 terminals (BAR, BAZ, FOO, and THE, and 3 unnamed: $end, $error [a yacc internal thing, not actually used here], and the period `.') and 4 nonterminals (grammar, sentence, rest, and object).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
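
One way to see the distinction is to hand-code a recognizer for the same grammar, as in the C sketch below. This is not what yacc generates (yacc builds a table-driven LALR parser); recursive descent is swapped in here purely for illustration. Each terminal becomes a token code and each nonterminal becomes a function. The token values, the END marker, and the names match and parse are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Terminals: codes a lexer would return. Values are illustrative;
   the period is represented by its own character code. */
enum { END = 0, BAR = 257, BAZ, FOO, THE };

static const int *next;           /* cursor into an END-terminated token array */

/* Consume the next token if it is the expected terminal. */
static int match(int tok)
{
    if (*next != tok)
        return 0;
    next++;
    return 1;
}

/* Nonterminals: one function per grammar rule. */

/* object: BAR | BAZ; */
static int object(void)
{
    return match(BAR) || match(BAZ);
}

/* rest: empty | THE object; */
static int rest(void)
{
    if (match(THE))
        return object();
    return 1;                     /* the empty alternative */
}

/* sentence: FOO rest '.'; */
static int sentence(void)
{
    return match(FOO) && rest() && match('.');
}

/* grammar: sentence | grammar sentence;  (left recursion becomes a loop) */
int parse(const int *tokens)
{
    assert(tokens != NULL);
    next = tokens;
    if (!sentence())
        return 0;
    while (*next != END)
        if (!sentence())
            return 0;
    return 1;
}
```

Here parse returns 1 for a token sequence the grammar accepts, such as FOO THE BAR '.', and 0 otherwise.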
johnl@esegue.segue.boston.ma.us (John R. Levine) (12/14/89)
In article <1989Nov29.180030.15742@tcsc3b2.tcsc.com> prs@tcsc3b2.tcsc.com (Paul Stath) writes:
>The y.tab.h file is simply a set of #define <token-name> <number> statements
>which provide mapping of the token names to integer constants.
>
>The yacc -d option is the best way to do this! In fact, I sometimes use
>yacc for the express purpose of generating a set of #define directives while
>developing a program with a lot of constants.

I suppose we should take this as testimony reminding us how flexible all of the Unix tools are. With any C compiler written since about 1978, though, it's a lot easier to write an enumeration type:

    enum {
        firstsymbol = 256,  /* or wherever you want to start */
        secondsymbol,
        /* as many more as you want */
    };

This gives you the same effect, avoids extra trips through yacc, and in many cases makes the names available in debuggers.
--
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."