bson@rice-chex.ai.mit.edu (Jan Brittenson) (11/17/90)
Alright. Here is SAD 1.03, which has been in the works for quite a while. It
works fairly well; I've been using it myself for a couple of weeks now, and
can't say I've found any anomalies. But then I'm an unsophisticated user, and
don't need to press all buttons at once. :-)

Still no documentation. Anyway, here is a brief list of changes:

  * Multiple passes:
      - Local symbols are generated, and cross-reference info gathered,
        during pass 1.
      - An intermediate `pass F' collects formatting info, and is repeated
        until no further info can be collected. In grotesquely misused
        cases, this may mean indefinitely.
      - The final pass 2 generates the final output.
  * GNU Emacs mode additions.
  * The infamous br/ret bug and several other bugs are fixed.
  * Cosmetic changes (indentation, code objects, etc.).
  * Partially recoded to improve robustness.

Two anomalies that have _not_ been fixed:

  * .formats MUST contain an explicit statement for address 0. (`0:c' is
    recommended.)
  * xcom has problems with major comments.

SAD 1.03 is available from rice-chex.ai.mit.edu [128.52.38.46] by anonymous
FTP; the file is ~/pub/sad-1.03.tar.Z. As with the announcements of 1.01 and
1.02, I have included the README file.

SAD is distributed in the hope that it will be useful, but with ABSOLUTELY NO
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

--
Jan Brittenson
bson@ai.mit.edu


* SYNOPSIS, SAD 1.03

This is the README file for SAD, the Saturn Disassembler package.

SAD comes with no documentation at this time, other than this file. When the
documentation is completely finished, it will be included in the distribution.
No partial documentation is included, because it would only cause confusion and
complaints. Documentation is under way, and will be available as a GNU Emacs
Info tree. For DOS users, the info tree will be included in TeX format.
SAD is a package currently consisting of sad (the disassembler), xsym (the
symbol extractor), xcom (the comment extractor), sadfmt (the formats tool) and
sad.el (GNU Emacs SAD mode). The purpose of SAD is to let you disassemble
Saturn Machine Language (ML) and RPL code, edit it, and maintain databases of
symbols, comments, formats and macros.

The formats database contains information directing the disassembler to
either ML, RPL, or Data, the latter of which may be complex nested structures.
The macros database contains nibble patterns for various common idioms. You
may, for instance, declare that the sequence

  84e20201424

is to be printed as

  84e20201424 GLOBAL "AB"

by defining the macro

  5,2e48:GLOBAL "%2S"

The formats database lets you declare structures such as

  x5,2(d2,a5)

to instruct the disassembler to treat this as Data consisting of a 5-nibble
hex integer, followed by two 5-character strings, each preceded by a 2-nibble
decimal integer. A formats entry remains active, and is repeated, until a new
formats entry becomes applicable. Its main purpose is to specify
synchronization points and data formatting.

Note: In SAD 1.03, macros are used only during RPL disassembly, and are
restricted to 5-nibble sequences. You may, however, define macros with pattern
tags of any length up to 8 nibbles; those that are not 5 nibbles long will
merely be ignored.

* INSTALLATION

Typing `make' should be sufficient. Since execution speed is crucial, you may
want to turn on all the macho speed optimizations available. To do this, edit
Makefile. DOS users may have to convert it to some other format, or compile
manually.

* REPORTING BUGS

If you find a bug in SAD, you should report it. But first, make sure that it
really is a bug, and that it appears in the latest version of SAD that you
have. Once you have ascertained that a bug really exists, please mail me a bug
report. If you have a fix, please mail that as well! Suggestions and
`philosophical' bugs are equally welcome.
Please include the following:

  * The version number of SAD
  * A description of the bug behaviour
  * A short script or `recipe' which exercises the bug

And mail it to bson@ai.mit.edu.

* USAGE

Note for DOS users: Please read through these instructions first, as several
file names are not compatible with DOS file names. To change the names, edit
sad.h, dump2core.c, and scan2core.

Dump alternative 1:

On your HP48, KGET (or type in) the DUMP program included in Dump.RPL. The
checksum should be # 5149h and the size 72. This program takes two arguments:
start and end addresses. It will dump memory using the PEEK program posted
Mar 16, 1990 by Alonzo Gariepy. Make sure to set the word size to 64 first,
with `64 STWS'.

Direct I/O to WIRE, and make sure your computer is set to capture the dump.
Hook up your HP48, and type in

  #0h #6FFF0h DUMP

DUMP will continually display the current dump address in the top left corner
of the display, which will otherwise remain blank apart from the menu. DUMP
will take a long time. The entire ROM dump is about 450 kilobytes, so try to
use as high a speed as possible.

The utility dump2core will convert your dump to a core file named .core, which
is what the disassembler will be looking for. It reads the dump from standard
input, and overwrites .core if it exists; otherwise a new one is created.

[Note: the dump consists of records of two lines each. The first is the
address, the second the data as returned by PEEK. Dump2core ignores the address
part; it's included only to serve as a reference for you, so that you can
retransmit smaller portions should it prove necessary. You are recommended to
verify that the dump is correct; the following command will list all clobbered
lines, if any, along with their line numbers:

  grep -vn '^# [0-9A-F][0-9A-F]*h$' romdump
]

Dump alternative 2:

Enter memory scanner mode. Hook up your HP48 to your computer, and make sure
the HP48 output is captured in a file.
Use the scanner to continuously dump 00000-6FFFF by first pressing ENTER
followed by /, and then keep pressing SPC until done.

Copy the provided set of standard symbols, formats, and macros to .symbols,
.formats, and .macros respectively:

  cp stdsymbols .symbols
  cp stdformats .formats
  cp stdmacros .macros

Use cp, not rm. Keep the standard set of databases right by the source code,
so that you can easily recover the standard tables should you happen to wreck
things.

Disassembly is done with the `sad' command:

  sad [flags] start end

where

  start, end   Hex addresses of the first and last instructions.
  flags        A set of flags, always bundled up as one argument (e.g.
               -acsdxz):

    a   Assembler format, i.e. PC and opcode fields are suppressed.
    c   Disassembler comments are suppressed.
    s   Symbolic addresses are moved to the comments.
    d   The supplementary definition of symbols that are known and
        referenced, but not otherwise defined in the output, is suppressed.
    f   Keep repeating the F pass until no further formatting information
        can be collected. Write output to formats.out. (See sadfmt -j.)
    1   One pass only. Skip local symbols. Independent of the f flag.
    g   Don't generate local symbols (globals only).
    C   Don't output any code. Useful if all you want is a cross reference
        (see -x below), or to collect formatting information.
    x   A cross reference is added at the end, as comments listing symbols
        and the addresses in the disassembly where they are referenced.
    z   Alonzo mode. PC and opcode fields are printed slightly differently,
        and the initial org instruction is suppressed.

Two of the maintenance tools are xsym and xcom. Both take as arguments a
collection of bundled-up flags (`xsym -sr', for instance):

    s   Supersede contents of database with information extracted from a
        listing on standard input.
    m   Merge contents of database with information extracted from a
        listing on standard input.
    l   Include source line numbers along with any errors or warnings.
    r   Overwrite the database file instead of sending the
        superseded/merged result to standard output.

The third maintenance tool is sadfmt. It takes as its first argument the
optional flag "-r", similar to the -r flag of xsym and xcom; as its second
(or first, when no flags are present) argument an address in hex; and as an
optional last argument a new format. If no new format is supplied, the
previous format is displayed. Sadfmt also accepts these additional command
lines:

  -[r]j [joinfile]   Join `joinfile' with .formats.
  -[r]d addr         Remove format, if any, at `addr'.

Note: Do not use this form, as it will clobber your file:

  xsym >.symbols

Instead, use:

  xsym -r

The same applies to xcom and sadfmt.

* FILES

`.core' consists of binary raw data, where each byte corresponds to one
nibble. The upper half of each byte is reserved for other purposes, but is
currently unused. Address 0 corresponds to offset 0.

`.symbols' consists of lines of the following format:

  <value>:<symbol>
  <value>=<symbol>

Example:

  70579=TOS

The presence of either ":" or "=" reflects whether the symbol was defined with
a "symbol=val" or a "val symbol:" statement. This is currently not used, but
may be in the future, especially in conjunction with formats and macros.
<value> is the value of the symbol, and <symbol> is its name. The file is not
ordered.

`.comments' is similar to .symbols:

  <address>=<comment string>
  <address>:<comment string>

Example:

  5b79=Allocate a string

Several comments may be bound to the same address, in which case they appear
in the specified order. Here "=" and ":" reflect whether the comment is
considered a `major comment' or a mere `minor' one. Major comments are put on
a line of their own, whereas minor comments are appended to the right of the
code. "=" signals a major comment, and ":" a minor one. The semicolon is
implicit, and not included.
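Both database files above share one line shape: a key, a one-character
separator, and free text. The Python sketch below splits such lines and groups
comments by address, since entries for one address need not be adjacent.
Python is used only for illustration. SAD itself is written in C, and
`parse_entry' and `load_comments' are hypothetical names, not SAD functions.
The sketch assumes that a key never contains ":" or "=", and that comment
addresses are hex, as in the 5b79 example.

```python
def parse_entry(line):
    """Split a .symbols or .comments line into (key, separator, text).

    The key is everything before the first ':' or '='; whatever follows
    (which may itself contain ':' or '=') is returned untouched.
    """
    line = line.rstrip("\n")
    seps = [i for i in (line.find(":"), line.find("=")) if i >= 0]
    if not seps or min(seps) == 0:
        raise ValueError("malformed entry: %r" % line)
    cut = min(seps)
    return line[:cut], line[cut], line[cut + 1:]


def load_comments(lines):
    """Group .comments entries by address, keeping file order per address.

    Assumption: addresses are hex, as in the README's example 5b79.
    """
    by_addr = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        addr, sep, text = parse_entry(line)
        kind = "major" if sep == "=" else "minor"
        by_addr.setdefault(int(addr, 16), []).append((kind, text))
    return by_addr


print(parse_entry("70579=TOS"))  # ('70579', '=', 'TOS')
comments = load_comments(["5b79=Allocate a string", "5b79:hypothetical minor note"])
print(comments[0x5b79])
# [('major', 'Allocate a string'), ('minor', 'hypothetical minor note')]
```

Cutting at the first separator, rather than splitting on every one, keeps
comment text such as "a=b" intact.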
During disassembly, at any given address, all major comments are output
first, followed by any symbol definitions, and then code with minor comments
appended to the right. The file is not ordered.

`.formats' contains disassembly formatting information, mostly related to
correctly decoding data and synchronization. The directives can be divided
into three categories: Machine Language (ML), RPL, and Data. The file consists
of entries of the form:

  <address>:<format>

specifying that from <address> onward, <format> is to be active. If during
disassembly <address> is about to be passed, the disassembler will back up to
<address>. This behaviour is called `synchronization,' and is performed even
if an identical format was previously in effect.

For RPL and ML, <format> is `r' or `c' respectively, and may not be nested or
combined with or within Data format specifications. The syntax for Data format
specifications is:

  [<repeat>]<formatchar>[<width>]
  or <format>[,<format>]
  or [<repeat>](<format>)

where <repeat> and <width> are decimal integers. Commas (,) separate formats
that are to be used in sequence. The format character <formatchar> is one of
the following, where R refers to the repeat count and W to the width:

  x  Hex     R words of W nibbles in hexadecimal.
  d  Dec     R words of W nibbles in decimal.
  o  Oct     R words of W nibbles in octal.
  a  Ascii   R sequences of W characters.
  s  String  R sequences of characters whose lengths are determined by a
             W-nibble word preceding the sequence, minus W.
  v  Vector  R sequences of nibbles presented in hex, whose lengths are
             determined by a W-nibble word preceding the sequence, minus W.
  w  Word    R 64-bit words presented in floating point, RPL style.

Examples:

  5b79:c
  2a2b4:r
  2a2b4:x5,w

[Note: if the example above were actually used, the format effective at 2a2b4
would unpredictably be either one of the two conflicting ones.]

`.macros' contains pairs of patterns and macro definitions.
The file consists of entries of the form:

  <length>,<pattern>:<definition>

where <length> is the length of the pattern, <pattern> the pattern data, and
<definition> a string to be expanded. The left-hand side of the colon (:) is
referred to as the `tag.'

The definition is the resulting string, possibly with embedded expansion
directives. These start with a percent sign (%), are optionally followed by
width (W) and adjustment (A) terms, and end in a directive character. The
interpretation, if any, of W and A is directive dependent.

General:

  %[<w>[,<a>]]<d>

Directives:

  x  Hex     W (default 5) nibbles as hex digits, or as a symbol.
  d  Dec     W (default 5) nibbles as a decimal word.
  o  Oct     W (default 5) nibbles as an octal word.
  b  Bin     W (default 5) nibbles as a binary word.
  w  Word    64-bit word as a floating-point word, RPL fashion.
  l  Long    84-bit word as a long floating-point word, RPL fashion.
  a  Ascii   W (default 1) characters.
  s  String  W (default 2) nibble word specifying the string length in
             nibbles, minus A (default 0).
  S  String  W (default 2) nibble word specifying the string length in
             characters, minus A (default 0).
  v  Vector  W (default 2) nibble word specifying the vector length, minus A
             (default 0), presented as hex digits.
  i  Instr.  W (default 5) nibble word, minus 4, minus A, specifies a length
             in nibbles to be disassembled as ML. Expands to the word content
             minus A, in decimal. Returns to the previous format after ML of
             the given length has been disassembled.
  I  Instr   Override current format with ML (format `c').
  z  Skip    Skip (advance) W nibbles.
  +  Begin   Designate beginning of new block.
  -  End     Designate end of block.
  e  End     Same.
  =  Equal   Assert that the following W nibbles are A.

Examples:

  5,2a2c:STRING "%5,5s"
  5,2933:REAL %w
  5,2d9d:PROGRAM%+
  5,312b:END%-
  5,2e48:GLOBAL "%2S"
  5,2dcc:CODE %5,1i
  5,28fc:TYPE%I

The general idea is to put the programs in a bin directory and set up a
separate directory for each disassembly project.
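Here is a worked example of the nibble order behind the GLOBAL macro from the
SYNOPSIS, 5,2e48:GLOBAL "%2S" applied to 84e20201424. Reading the first five
nibbles, 84e20, least significant nibble first gives the tag 2e48 (that is,
02e48). The next two nibbles, 20, give a character count of 2 for %2S. Each
character then takes two nibbles, so 14 is 0x41 `A' and 24 is 0x42 `B'. The
low-nibble-first order is inferred from the README's own example, not from
SAD's source. The Python sketch below hand-codes only this one macro and
does not reproduce SAD's general macro engine. It also reads nibbles from a
.core image by masking off the reserved upper half of each byte.
`core_nibbles', `read_word' and `expand_global' are hypothetical names.

```python
def core_nibbles(core, start, end):
    """Return nibbles start..end-1 (half-open) of a .core image as hex.

    Each byte of .core holds one nibble; the upper half is reserved,
    so it is masked off here.
    """
    return "".join("%x" % (b & 0xF) for b in core[start:end])


def read_word(nibs, pos, width):
    """Read a width-nibble word at nibs[pos], least significant nibble first."""
    value = 0
    for i in range(width):
        value |= int(nibs[pos + i], 16) << (4 * i)
    return value, pos + width


def expand_global(nibs, pos=0):
    """Hand-coded expansion of the macro 5,2e48:GLOBAL "%2S".

    Returns (text, next_pos), or None if the 5-nibble tag does not match.
    """
    tag, pos = read_word(nibs, pos, 5)
    if tag != 0x2E48:
        return None
    count, pos = read_word(nibs, pos, 2)     # %2S: 2-nibble character count
    chars = []
    for _ in range(count):
        code, pos = read_word(nibs, pos, 2)  # one character = two nibbles
        chars.append(chr(code))
    return 'GLOBAL "%s"' % "".join(chars), pos


# A fake 11-byte .core fragment holding the nibbles 84e20201424.
core = bytes(int(c, 16) for c in "84e20201424")
nibs = core_nibbles(core, 0, len(core))
print(nibs)                 # 84e20201424
print(expand_global(nibs))  # ('GLOBAL "AB"', 11)
```

The returned position (11) is where disassembly would continue after the
macro has consumed its nibbles.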
* GNU Emacs AND sad.el

The sad-mode facilitates interactive exploration of a core. First edit the
runfile variables in sad.el to point to sad, xsym, and xcom as appropriate.
(The default is according to the current search path.) Load sad.el and do
M-x sad. Emacs will first prompt for a range before setting up a new buffer
and disassembling. The range format is

  <from>-<to>

where <from> and <to> are addresses in hexadecimal.

While in a SAD buffer, the following key bindings are in effect. C-c is the
conventional "special mode prefix."

  C-c d             Redisassemble.
  C-c r             Set new range and redisassemble.
  C-c C-c           Call on xsym and xcom to extract information, and
                    redisassemble. Any errors or warnings go into the
                    *SAD Output* buffer.
  C-c q             Quit current buffer.
  C-c n             Set up new buffer with new range.
  C-c o             Set up new buffer with new range in a different window.
  C-c v             View format.
  C-c f             Change format.
  C-c C-d           Remove format.
  C-c j             Join (see sadfmt -j) format file.
  C-c m             Edit macros database.
  C-c e -or- C-x `  Move to line of next error in *SAD Output*.
  C-c . -or- M-.    Move to symbol definition. Will currently search the
                    current buffer only.
  C-c , -or- M-,    Move to next definition of same symbol, if any.
  C-c s             View value of symbol.
  M-;               Add comment, or reindent current comment, as
                    appropriate.
  M-LF              Continue comment on next line.

After C-c C-c, an attempt is made at approximately preserving the current
position, so don't be too surprised if the cursor moves a couple of lines. The
window is also recentered around the new point.

The range is indicated in the mode line, and also serves as the default file
name. Should you prefer some other file name, you can change the variable
*sad-default-file-name* in sad.el.

* CODE NOTES

Sad.el, xcom.c, xsym.c, sadfmt.c, dump2core.c, and scan2core are pretty
straightforward, while sad.c does a lot of hairy stuff related to Saturn
disassembly. Sad.c, xcom.c, and xsym.c have some common code in misc.c.
Formats.c contains most formats-related code, while macros.c contains what
pertains to the implementation of macros. The code ain't pretty, but it does
work.

* MS-DOS

I haven't used MS-DOS for, eh, 6 years now. Hopefully someone will make
whatever changes are necessary, and repackage SAD with zip/zoo. This should be
fairly trivial for anyone with an MS-DOS system.

* A FINAL WORD

This is SAD 1.01. Don't expect it to be bug free. It will eventually be
succeeded by 1.02, but nothing prevents you from using 1.01, as no major
changes will occur in the database format. You will be able to reuse your old
data with 1.02. Also, by 1.02 a New Syntax Order may rule; simply
redisassemble using your old database.

This is now 1.02. Several bugs have been fixed, and formats and macros added.
Existing databases are compatible. Sad no longer crashes without a symbols
database. Quoted names work with xsym. The code is generally more robust.
Funny names no longer pose any problems.

This is SAD 1.03. A number of features have been added: formats joining, full
indentation regardless of format, multiple passes, an intermediate F pass, and
local symbol support. Also, the GNU Emacs mode has been improved, and several
bugs fixed. Funny names are still a little awkward; use them sparingly.

NOTICE: the .formats file MUST contain the line `0:c'.

* DISTRIBUTION AND COPYRIGHT

SAD 1.01 and 1.02 are no longer available from rice-chex.ai.mit.edu. SAD 1.03
can be picked up with anonymous FTP from rice-chex.ai.mit.edu [128.52.38.46]
as `~/pub/sad-1.03.tar.Z'.

It is not in the Public Domain, as the author retains all copyrights, but it
is free software covered by the GNU General Public License. The file COPYING
describes this license in great detail.
If you find it to be sheer legalese, don't despair: in short, you can do
whatever you want with SAD except sell it for anything beyond copying costs,
hide the source code, or distribute it or any modifications you've made
without the original copyright notices and the file COPYING.

SAD is distributed in the hope that it will be useful, but with ABSOLUTELY NO
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

Enjoy,

--
Jan Brittenson
bson@ai.mit.edu