[comp.unix.questions] Need COBOL Grammar

zhb2165@dsachg1.UUCP (Ned D Hanks) (12/12/89)

I am in need of a grammar for COBOL. Any format would be OK, but
lex and yacc would be best.

I would also like versions of lex and yacc that run on MS-DOS.

Please E-Mail responses.

-- 
Ned D Hanks, DLA Systems Automation Center, (801)392-8623, AV 790-0543
UUCP:       ucbvax!tut.cis.ohio-state.edu!dsac!dsachg1!nhanks
INTERNET:   nhanks@dsachg1.dsac.dla.mil or 73727.435@compuserve.com
COMPUSERVE: 73727,435 ** Opinions are mine, nobody else would want them.

ejp@bohra.cpg.oz (Esmond Pitt) (12/14/89)

In article <706@dsachg1.UUCP> zhb2165@dsachg1.UUCP (Ned D Hanks) writes:
>I am in need of a grammar for COBOL. Any format would be OK, but
>lex and yacc would be best.

This question comes around every year or so. You can do a 'yacc'
grammar for Cobol (-85, I assume), but it's very difficult.

1. Cobol-85's syntax is neither (i) LR(k) for any k nor (ii)
context-free, and its lexical structure is not even (iii) regular.
Anybody contemplating 'lex'/'yacc' for COBOL who doesn't know what the
above means is advised to forget all about it straight away and do it
in recursive descent in C with a hand-written scanner. You need a good
appreciation of these three issues to understand how to get around them
with tools such as 'lex' and 'yacc', which rely on these properties
('lex' on regular tokens, 'yacc' on an LALR(1) context-free grammar).
Cobol-74 is slightly better from this point of view, but not all that
much.

2. Cobol-85 has 400+ reserved words, and this alone will overflow the
tables of most yaccs unless they are greatly enlarged, which means you
need the source or a co-operative vendor.

3. The grammar requires semantic feedback at various points, which
means you have to build quite a lot of the compiler even if that's not
what you're going to use it for.

4. Lexical problems: Keywords, identifiers, and literals can be
continued across line boundaries. There are two distinct rules for
continued tokens. There are three distinct context-dependent scanning
modes (normal, PICTURE string, comment-entry), and the last of these is
not very well specified from the implementor's point of view.  There
are two distinct definitions of a token, depending on whether you are
doing Source Text Manipulation or compiling proper. And so on and so
forth.
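
To make two of the lexical problems above concrete, here is a rough
sketch in C, not taken from any real scanner. It assumes fixed-format
source in which the caller has already stripped the sequence and
indicator columns and has seen the '-' continuation indicator. The
function names (continue_token, scan_picture) and the choice of '"' as
the only quote character are illustrative assumptions. The first
function shows the two continuation rules as I understand them: for a
word or numeric literal the trailing blanks of the continued line are
dropped, whereas for a nonnumeric literal they are kept and the
continuation line must reopen with a quote. The second shows why
PICTURE strings need their own scanning mode: '.' and ',' are ordinary
picture characters unless they are the last character before a space.

```c
#include <stddef.h>
#include <string.h>

/* Append the text area of a continuation line (cont_text) to the partial
 * token text accumulated so far (acc, capacity cap).
 * in_literal != 0 means the token being continued is a nonnumeric literal.
 * Returns 0 on success, -1 on a malformed continuation or overflow.
 * Hypothetical helper: names and interface are assumptions. */
int continue_token(char *acc, size_t cap, const char *cont_text, int in_literal)
{
    size_t n = strlen(acc);
    const char *p = cont_text;

    while (*p == ' ')               /* find first nonblank of continuation */
        p++;

    if (in_literal) {
        /* Rule for nonnumeric literals: blanks at the end of the continued
         * line belong to the literal, and the continuation must start with
         * a quote, which is not itself part of the literal. */
        if (*p != '"')
            return -1;
        p++;
    } else {
        /* Rule for words and numeric literals: trailing blanks of the
         * continued line are not part of the token. */
        while (n > 0 && acc[n - 1] == ' ')
            n--;
    }

    if (n + strlen(p) + 1 > cap)
        return -1;
    strcpy(acc + n, p);
    return 0;
}

/* PICTURE mode: copy the picture character-string that starts at s into
 * out (capacity cap). The string runs to the next space; a '.', ',' or ';'
 * immediately before that space is a separator, not picture text.
 * Returns the length of the picture string, or -1 if it does not fit. */
int scan_picture(const char *s, char *out, size_t cap)
{
    size_t len = 0;

    while (s[len] != '\0' && s[len] != ' ')
        len++;
    if (len > 0 && (s[len - 1] == '.' || s[len - 1] == ',' || s[len - 1] == ';'))
        len--;
    if (len + 1 > cap)
        return -1;
    memcpy(out, s, len);
    out[len] = '\0';
    return (int)len;
}
```

For example, continuing "MOVE CUSTOMER-NA   " with "    ME TO X" under
the first rule gives "MOVE CUSTOMER-NAME TO X", and scan_picture on
"9(5).99. " keeps the embedded period but not the final one. A real
scanner also has to cope with the alternative quote character, comment
lines between the two halves, and the comment-entry mode, none of which
is attempted here.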

On the other hand, I've done both a 'yacc' grammar and a 'lex' scanner
for Cobol-85. In their present state they're undoubtedly
incomprehensible to anybody but me. This work may turn into a product
one day so I'm not about to release it to the world.

I also had to speed up my yacc, as originally it took about 15 minutes
to produce a parser (on a Pyramid, that is). Bison was better (a couple
of minutes).


-- 
Esmond Pitt, Computer Power Group
ejp@bohra.cpg.oz