zhb2165@dsachg1.UUCP (Ned D Hanks) (12/12/89)
I am in need of a grammar for COBOL. Any format would be OK, but lex and yacc would be best. I would also like a lex and yacc for MS-DOS. Please e-mail responses.

--
Ned D Hanks, DLA Systems Automation Center, (801)392-8623, AV 790-0543
UUCP: ucbvax!tut.cis.ohio-state.edu!dsac!dsachg1!nhanks
INTERNET: nhanks@dsachg1.dsac.dla.mil or 73727.435@compuserve.com
COMPUSERVE: 73727,435
** Opinions are mine, nobody else would want them.
ejp@bohra.cpg.oz (Esmond Pitt) (12/14/89)
In article <706@dsachg1.UUCP> zhb2165@dsachg1.UUCP (Ned D Hanks) writes:
>I am in need of a grammar for COBOL. Any format would be OK, but
>lex and yacc would be best.

This question comes around every year or so. You can do a 'yacc' grammar for Cobol (-85, I assume), but it's very difficult.

1. Cobol-85 is neither (i) LR(k) for any k, (ii) context-free, nor even (iii) regular. Anybody contemplating 'lex'/'yacc' for COBOL who doesn't know what the above means is advised to forget all about it straight away and do it in recursive descent in C with a hand-written scanner. You need a good appreciation of these three issues to understand how to get around them with tools such as 'lex' and 'yacc', which rely on these properties. Cobol-74 is slightly better from this point of view, but not by much.

2. Cobol-85 has 400+ reserved words, and this alone will bust most yaccs unless they are greatly enlarged, which means you need the source or a co-operative vendor.

3. The grammar requires semantic feedback at various points, which means you have to build quite a lot of the compiler even if that's not what you're going to use it for.

4. Lexical problems: Keywords, identifiers, and literals can be continued across line boundaries. There are two distinct rules for continued tokens. There are three distinct context-dependent scanning modes (normal, PICTURE string, comment-entry), and the last of these is not very well specified from the implementor's point of view. There are two distinct definitions of a token, depending on whether you are doing Source Text Manipulation or compiling proper. And so on and so forth.

On the other hand, I've done both a 'yacc' grammar and a 'lex' scanner for Cobol-85. In their present state they're undoubtedly incomprehensible to anybody but me. This work may turn into a product one day, so I'm not about to release it to the world. I also had to speed up my yacc, as originally it took about 15 minutes to produce a parser (on a Pyramid, that is).
Bison was better (a couple of minutes).

--
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz
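
The context-dependent scanning modes mentioned under "Lexical problems" in the reply above can be illustrated with a small hand-written scanner in C. What follows is only a rough sketch, not code from either poster. All names (`struct scanner`, `next_token`, `scan_all`, `TOK_MAX`) are hypothetical. It assumes free-form input and ignores fixed-format columns, continuation lines, literals, and comment-entry mode. The rule used to end a PICTURE string (white space, or a period followed by white space or end of input) is a simplifying assumption rather than the full rule from the standard.

```c
#include <ctype.h>
#include <stddef.h>

#define TOK_MAX 32  /* hypothetical token buffer size */

enum scan_mode { MODE_NORMAL, MODE_PICTURE };

struct scanner {
    const char *p;          /* next unread character */
    enum scan_mode mode;    /* current scanning mode */
};

/* Case-insensitive match of a token against an upper-case keyword. */
static int is_word(const char *tok, const char *kw)
{
    for (; *tok && *kw; tok++, kw++)
        if (toupper((unsigned char)*tok) != (unsigned char)*kw)
            return 0;
    return *tok == '\0' && *kw == '\0';
}

/* True if a '.' at p is a separator period rather than part of a PICTURE. */
static int ends_entry(const char *p)
{
    return p[0] == '.' && (p[1] == '\0' || isspace((unsigned char)p[1]));
}

/* Copy the next token into buf (capacity TOK_MAX); return 0 at end of input. */
static int next_token(struct scanner *s, char *buf)
{
    size_t n = 0;

    while (isspace((unsigned char)*s->p))
        s->p++;
    if (*s->p == '\0')
        return 0;

    if (s->mode == MODE_PICTURE && !ends_entry(s->p)) {
        /* PICTURE mode: take everything up to white space or the final period. */
        while (*s->p && !isspace((unsigned char)*s->p) && !ends_entry(s->p)) {
            if (n + 1 < TOK_MAX)
                buf[n++] = *s->p;
            s->p++;
        }
        buf[n] = '\0';
        /* The optional word IS may sit between PICTURE and the string. */
        if (!is_word(buf, "IS"))
            s->mode = MODE_NORMAL;
        return 1;
    }
    s->mode = MODE_NORMAL;

    if (isalnum((unsigned char)*s->p)) {
        /* Normal mode: a word is letters, digits, and hyphens. */
        while (isalnum((unsigned char)*s->p) || *s->p == '-') {
            if (n + 1 < TOK_MAX)
                buf[n++] = *s->p;
            s->p++;
        }
    } else {
        buf[n++] = *s->p++;  /* any other character is a one-character token */
    }
    buf[n] = '\0';
    if (is_word(buf, "PIC") || is_word(buf, "PICTURE"))
        s->mode = MODE_PICTURE;
    return 1;
}

/* Tokenise src into toks; return the number of tokens stored (at most max). */
static size_t scan_all(const char *src, char toks[][TOK_MAX], size_t max)
{
    struct scanner s = { src, MODE_NORMAL };
    size_t n = 0;

    while (n < max && next_token(&s, toks[n]))
        n++;
    return n;
}
```

With these rules, scanning `05 AMOUNT PIC 9(5).99.` yields five tokens: `05`, `AMOUNT`, `PIC`, `9(5).99`, and `.`. A scanner with only the normal mode would split `9(5).99` at the parentheses and at the embedded period, which is the kind of context dependence that a plain 'lex' specification without start conditions or parser feedback cannot express.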