babraham@Daisy.EE.UND.AC.ZA (Bobby Abraham) (06/13/91)
I am needing some help with the following problem in a simple
assembler I am writing.

I wish to parse expressions such as
    mov #10 20
    add @13 #6
    l: jmp @5

Ideally I would like a lexical analyser to return the following
    mov (immediate 10) 20
    add (indirect 13) (immediate 6)
    (label l) jmp (indirect 5)

I know that # will cause the Lisp reader to dispatch a macro, but this
is no problem as one can (substitute #\! #\# input-line).

I suspect that it may be possible to do this by defining read macros,
although it may be necessary to specify labels in the form :label
rather than label:

Any help would be gratefully received.

Many thanks.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--=-=-=-=
Bobby Abraham
Dept of Computer Science, University of Natal, Pietermaritzburg
babraham@daisy.ee.und.ac.za
miller@FS1.cam.nist.gov (Bruce R. Miller) (06/13/91)
In article <1991Jun13.064841.20364@Daisy.EE.UND.AC.ZA>, Bobby Abraham writes:
> I am needing some help with the following problem in a simple
> assembler I am writing.
>
> I wish to parse expressions such as
>     mov #10 20
>     add @13 #6
>     l: jmp @5
>
> Ideally I would like a lexical analyser to return the following
>     mov (immediate 10) 20
>     add (indirect 13) (immediate 6)
>     (label l) jmp (indirect 5)
>

The first thing is to define your own readtable using copy-readtable
or such -- you could start off by copying the lisp readtable. Then
you'll need to define your own readers to replace the ones that the
lisp readtable does `wrong', such as #\#.

In your case, #\# and #\@ could be defined similar to the #\` reader
macro, something like:
    (list 'immediate (read stream t nil t))

The colon is slightly tricky. First you need to change it in some way
so it no longer tries to do package prefixes. The obvious thing to try
is at least
    (set-syntax-from-char #\: #\A *assembler-readtable*)
to make it alphabetic, though set-syntax-from-char does not copy
constituent traits such as the package-marker role, so this alone may
not be enough.

The catch is that : in your case is a postfix operator. For postfix
and infix operators you need to be able to fetch the `previous' parsed
object (not to mention dealing with binding powers, etc). Rather than
introduce that complexity into the CL standard, the designers decided
to leave it out, with the proposal that you should use the readtable
machinery to `tokenize' the input, and then use a lexical analyzer to
do the remaining steps.

Nevertheless, if this is the worst case you could still do the whole
parse using the readtable. You need to write a function to replace the
`symbol' reader, i.e. the one that gets used for every alphanumeric
char, which, if it discovers a #\: at the end, returns
    (list 'label (intern string-so-far ...))
rather than simply
    (intern string-so-far ...)

Hope this sketch helps some. Have fun.

bruce
miller@cam.nist.gov
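
For anyone following the thread, the readtable approach described above might be sketched roughly as follows. This is only a minimal sketch, not the code either poster had in mind. The name *assembler-readtable* comes from the reply; prefix-reader, *colon-token* and read-assembler-line are hypothetical names made up for illustration. For the colon it does not replace the `symbol' reader. Instead it swaps in a simpler technique: #\: becomes a terminating macro character that returns a marker, and a small post-pass wraps the previously read token as (label ...), which is closer to the `tokenize, then analyze' route mentioned in the reply.

```lisp
;; Start from a copy of the standard readtable, as suggested in the reply.
(defparameter *assembler-readtable* (copy-readtable nil))

;; Hypothetical marker object returned whenever the reader meets a colon.
(defparameter *colon-token* (make-symbol "COLON"))

(defun prefix-reader (tag)
  "Return a reader macro function that wraps the next object as (TAG object)."
  (lambda (input ch)
    (declare (ignore ch))
    (list tag (read input t nil t))))

;; #10 => (IMMEDIATE 10), @13 => (INDIRECT 13)
(set-macro-character #\# (prefix-reader 'immediate) nil *assembler-readtable*)
(set-macro-character #\@ (prefix-reader 'indirect) nil *assembler-readtable*)

;; Make the colon a terminating macro character: it ends the symbol
;; before it and is then read as the marker object on its own.
(set-macro-character #\:
                     (lambda (input ch)
                       (declare (ignore input ch))
                       *colon-token*)
                     nil
                     *assembler-readtable*)

(defun read-assembler-line (line)
  "Read LINE with the assembler readtable and return a list of tokens.
A colon marker turns the token before it into (LABEL token)."
  (let ((*readtable* *assembler-readtable*)
        (eof (list nil))                ; unique end-of-input marker
        (tokens '()))
    (with-input-from-string (input line)
      (loop for token = (read input nil eof)
            until (eq token eof)
            do (cond ((not (eq token *colon-token*))
                      (push token tokens))
                     ((null tokens)
                      (error "Colon with no preceding label in ~S" line))
                     (t
                      (push (list 'label (pop tokens)) tokens)))))
    (nreverse tokens)))
```

With the default reader settings, (read-assembler-line "l: jmp @5") should return ((LABEL L) JMP (INDIRECT 5)), and (read-assembler-line "mov #10 20") should return (MOV (IMMEDIATE 10) 20). The symbols come back upper-cased because the standard reader folds case. Since #\# is rebound directly in the private readtable, the (substitute #\! #\# input-line) workaround from the original question is not needed in this sketch.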