dwex@mtgzfs3.att.com (David E Wexelblat) (05/09/91)
I am working on fixing a rather broken disassembler for the 680x0 series
(which is irrelevant to my general problem, but may help find a specific
answer). My problem is trying to disassemble code compiled with GCC,
which puts constant character strings into the text segment. The program
correctly figures out that this stuff is not executable code by tracing
all of the paths through the code, but it cannot tell the difference
between word and byte data. I think this is a general problem with
disassembling any non-split-I/D program.

I was wondering if there are any techniques for determining that a given
piece of data should be interpreted as a character string as opposed to
word data. I would like a general-case answer, but the following
constraints can be applied, if necessary:

	1) 680x0 processor
	2) C compiler
		- AT&T UNIX-PC v3.51 (which doesn't generally do this)
		- gcc
	3) COFF format object files
		- stripped
		- with symbols
		- with relocation
		- with debugging

I had thought about using a 'strings'-type algorithm, but this is prone
to generating garbage, so I'm looking for something better.
--
David Wexelblat                 |  dwex@mtgzz.att.com
AT&T Bell Laboratories          |  ...!att!mtgzz!dwex
200 Laurel Ave - 4B-421         |
Middletown, NJ  07748           |  (201) 957-5871

[In the absence of extensive symbol table info, this sounds like a tough
problem. -John]
--
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
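For reference, the 'strings'-type baseline that the post above mentions can be sketched roughly as follows. This is only an illustration, not the poster's code: the name `find_strings`, the minimum length of 4, and the choice of "printable" bytes are all assumptions, and the `start`/`end` range is assumed to come from the disassembler's own code-tracing pass (the regions it has already decided are not code).

```python
import string

# Assumption: "printable" means ASCII letters, digits, punctuation, space,
# tab, newline and carriage return (vertical tab and form feed excluded).
PRINTABLE = set(string.printable.encode("ascii")) - {0x0B, 0x0C}


def find_strings(data: bytes, start: int, end: int, min_len: int = 4):
    """Return (offset, text) pairs for NUL-terminated printable runs
    found in data[start:end]. Hypothetical helper, for illustration only."""
    results = []
    i = start
    while i < end:
        # Extend j over the run of printable bytes beginning at i.
        j = i
        while j < end and data[j] in PRINTABLE:
            j += 1
        # Accept the run only if it is long enough and ends in a NUL,
        # which is how a C compiler terminates string constants.
        if j - i >= min_len and j < end and data[j] == 0:
            results.append((i, data[i:j].decode("ascii")))
            i = j + 1
        else:
            i = max(j, i + 1)
    return results
```

Restricting the scan to bytes the tracer has already ruled out as code helps, but it does not remove the garbage problem: word data can still happen to look like text, and many 680x0 opcodes are themselves printable (0x4E75, RTS, reads as "Nu"), so any error in the code/data boundary leaks into the result.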
rfg@ncd.com (Ron Guilmette) (05/12/91)
In article <91-05-072@iecc.cambridge.ma.us> dwex@mtgzfs3.att.com (David E Wexelblat) writes:
>... My problem is trying to disassemble code compiled with GCC,
>which puts constant character strings into the text segment...
>... I would like a general-case answer, but the
>following constraints can be applied, if necessary:
> ..
>	3) COFF format object files

The general-case answer is to stop using COFF and use ELF instead.

In ELF, constant data can go into the .rodata or .rodata1 *sections*.
The linker will normally combine all input .rodata sections (from all of
the .o files given to it as inputs) into one hunk of output .rodata, and
it will normally attach that to the output .text *segment*. However, you
can use the MAPFILE option to override this behavior and get all of the
.rodata stuff placed into its own unique (LOADable) output segment. You
could then just ignore that output segment when doing your disassembly.

Actually, you do not even need to use the MAPFILE option (necessarily).
As long as you do not strip the executable, it will include both a
segment header table *and* a section header table. In the section header
table there will be one entry for the collected sum of all of the input
.rodata sections. This header will indicate where (within the
executable) all of the .rodata stuff starts and ends. You could then
just ignore anything in that range.

You may be able to duplicate one or both of these techniques with COFF,
but I'm not sure.
--
// Ron ("Loose Cannon") Guilmette
// Internet: rfg@ncd.com      uucp: ...uunet!lupine!rfg
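The section-header lookup described in the reply above might look something like the sketch below. It is a minimal illustration under several assumptions: the file is 32-bit ELF, the section header table is still present, and the function name `find_section` is made up for this example. It reads the fields directly with `struct` instead of relying on any particular ELF library.

```python
import struct


def find_section(image: bytes, wanted: str = ".rodata"):
    """Return (file_offset, size) of the named section in an ELF32 image,
    or None if it is absent. Hypothetical helper, for illustration only."""
    if image[:4] != b"\x7fELF" or image[4] != 1:   # EI_CLASS 1 = ELFCLASS32
        raise ValueError("not an ELF32 file")
    endian = "<" if image[5] == 1 else ">"         # EI_DATA: 1 = LSB, 2 = MSB

    # ELF32 header: e_shoff at byte 32; e_shentsize, e_shnum and
    # e_shstrndx are three consecutive 16-bit fields starting at byte 46.
    (e_shoff,) = struct.unpack_from(endian + "I", image, 32)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(endian + "HHH", image, 46)
    if e_shoff == 0:
        return None                                # no section header table

    def header(index):
        # An ELF32 section header is ten 32-bit words:
        # name, type, flags, addr, offset, size, link, info, addralign, entsize
        return struct.unpack_from(endian + "10I", image, e_shoff + index * e_shentsize)

    strtab_offset = header(e_shstrndx)[4]          # file offset of section names
    for i in range(e_shnum):
        sh = header(i)
        name_start = strtab_offset + sh[0]
        name_end = image.index(b"\x00", name_start)
        if image[name_start:name_end].decode("ascii") == wanted:
            return sh[4], sh[5]                    # sh_offset, sh_size
    return None
```

A disassembler could call `find_section(image)` once and skip, or dump as data, every byte in the returned range. Note that this only locates the read-only data as a whole; it does not by itself say which bytes inside the range are strings and which are word data.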