csnjr@its63b.ed.ac.uk (Nick Rothwell) (05/10/88)
From article <7756@mcdchg.UUCP>, by mauny@inria.UUCP (Michel Mauny): > CAML, a functional programmimg language of the ML family developed > at INRIA, is now available on SUN-3 and VAX11 under Unix BSD 4.2 and 4.3. > CAML (an acronym for Categorical Abstract Machine Language) is close > in functionality to Standard ML (although release 2.5 does not have modules > yet). It differs mostly in its syntax (closer to the syntax of LCF ML), and > in its ability to manipulate concrete syntaxes through an interface with Yacc. Actually, we're working on a parser generator interface for Standard ML as well, complete with antiquotation (concrete syntaxes) and so on. Our system doesn't rely on yacc syntax, and doesn't generate a parser in source form (I don't recall how the CAML parser is generated and loaded). Instead, the grammar is presented as a data structure (in fact, as an environment with the various types for terminals, nonterminals and the associated actions), and the parser generator returns a new environment containing a parsing function. Writing grammars in this way is a little tedious, but the whole idea is that a simple grammar is written to generate a parser which understands grammars. Then, full grammars can be bootstrapped into this system by antiquoting. The strength of the system is that grammars can be quite flexible, since they are data structures which can be transformed into the form that a LR parser generator wants. I am (at this very moment) putting the finishing touches to an interface which lets you specify productions containing regular expressions. Coupled with the antiquoting mechanism (not yet implemented) you can say things rather like: structure Grammar = struct ... val Gram = << StartSymbol -> Expr Expr -> "nil" | Expr "::" Expr | "[" (Expr / ",")* "]" {Sexy, eh?} | ... >> end structure Parser = ParserGen(structure G = Grammar) fun parse() = Parser.parse(lexer); There's a *lot* of polishing still to do, but the basic mechanisms work. Nick. -- Nick Rothwell, Laboratory for Foundations of Computer Science, Edinburgh. nick%lfcs.ed.ac.uk@nss.cs.ucl.ac.uk <Atlantic Ocean>!mcvax!ukc!lfcs!nick ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ...while the builders of the cages sleep with bullets, bars and stone, they do not see your road to freedom that you build with flesh and bone.
mauny@inria.UUCP (Michel Mauny) (05/11/88)
>Actually, we're working on a parser generator interface for Standard ML as >well, complete with antiquotation (concrete syntaxes) and so on. Our system >doesn't rely on yacc syntax, and doesn't generate a parser in source form >(I don't recall how the CAML parser is generated and loaded). Instead, the >grammar is presented as a data structure (in fact, as an environment with >the various types for terminals, nonterminals and the associated actions), >and the parser generator returns a new environment containing a parsing >function. In CAML, the Yacc interface doesn't take care of the parser produced by yacc: it is only interested in parsing tables which are used by the CAML parser itself (and parsing tables of CAML's syntax are produced in this way). We plan also to give a better syntax to a grammar definition and to typecheck (statically when possible) grammar definitions. - What about "lexer" (are they user-definable, hand-written or mechanically produced in lex-style)? - Do you typecheck grammars? - Do you have a complete antiquotation mechanism (with escapes to the meta-language)?
csnjr@its63b.ed.ac.uk (Nick Rothwell) (05/13/88)
From article <689@inria.UUCP>, by mauny@inria.UUCP (Michel Mauny): > In CAML, the Yacc interface doesn't take care of the parser produced by > yacc: it is only interested in parsing tables which are used by > the CAML parser itself (and parsing tables of CAML's syntax are produced in > this way). That's quite a nice solution. I aim to have a single parsing "engine" both for ML and for any concrete quotations. Different instances of this parser wil be used with different tables and lexers. > We plan also to give a better syntax to a grammar definition and to > typecheck (statically when possible) grammar definitions. > > - What about "lexer" (are they user-definable, hand-written or mechanically > produced in lex-style)? I knocked up a lexer generator a couple of days ago. Here (roughly) is the interface: signature LEX_SPEC = sig type LexClass type Terminal type Value val whiteSpace: LexClass val follows: LexClass -> LexClass lib.set val lexClass: string -> LexClass val eof: Terminal val VOID: Value val makeTerminal: string -> Terminal * Value end You specify some LexClass type, give a function from characters to LexClasses, and say which LexClasses can "stick" to others (eg, LexClass = LETTER | DIGIT, "a" is a LETTER, "0" is a DIGIT, and LETTER can be followed by {LETTER,DIGIT}. What you get back is a (side-effecting) lexing function fun lex: (unit -> string) -> (unit -> Terminal * Value) This takes a string "generator" as argument (which may read from a file, or whatever), and returns a terminal (and value) generator. This isn't as powerful as lex, but serves my purposes for the present. > - Do you typecheck grammars? Yes, they're ML data structures, so ML typechecks them. The problem with this is that every production has to have the same type of attribute, so attribute functions range over some "union" datatype, and tend to have rather complex arguments. I have ideas for making this easier. > - Do you have a complete antiquotation mechanism (with escapes to the > meta-language)? We're implementing a "brute-force" approach in the first instance. The ML lexer will be able to associate any token (such as "<<") with a parser (of type instream -> string). When it sees a "<<", it calls the parser on its current input stream, waits to get the text back, and then carries on lexing from this. Hence, the concrete syntax parser must eventually return ML source text. I don't see any reason why a user-generated lexer for the concrete syntax shouldn't be able to do something similar, and call the ML parser to get back to the meta-language. My thoughts are a little unclear on this at the moment. I'll write a report on all of this in a month or two, when my ideas are more settled, and send you a copy, if you're interested. Nick. -- Nick Rothwell, Laboratory for Foundations of Computer Science, Edinburgh. nick%lfcs.ed.ac.uk@nss.cs.ucl.ac.uk <Atlantic Ocean>!mcvax!ukc!lfcs!nick ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ~~ ...while the builders of the cages sleep with bullets, bars and stone, they do not see your road to freedom that you build with flesh and bone.