prindle@NADC.ARPA (Frank Prindle) (04/19/88)
Here is, as I recall it, the format of a C Power (AKA Power C)
relocatable object (i.e. a ".o" or ".obj") file. Others, please correct
me if this is wrong, though I checked it against the C Power assembler
(ASSM) source code and the C Power reverse assembler (RA) source code.

There are 5 distinct parts to the object file; each part begins with a
2-byte count in standard 6502 low-byte/hi-byte format. The 5 parts
directly follow each other in the following order:

   1. relocatable object code
   2. relocation entries
   3. external definition entries
   4. external reference entries
   5. uninitialized data block entries

I will describe each part in detail:

1. relocatable object code:

   The first 2 bytes are a byte count of the object code to follow.
   What follows is simply the generated object code for the
   corresponding source code file. For those instructions and .byte or
   .word pseudo ops which reference relocatable addresses within a
   function (typically for local jumps), the value in the operand field
   is the offset relative to the first byte of object code. For those
   instructions and .byte or .word pseudo ops which reference externally
   defined addresses, the value of the operand field is irrelevant and
   is typically filled in by the compiler with bytes which duplicate the
   byte which immediately precedes them. This part ends when the number
   of bytes of object code specified in the count has been encountered.

2. relocation entries:

   The first 2 bytes are a count of the number of relocation entries to
   follow. Each relocation entry is exactly 2 bytes long and consists of
   an offset relative to the first byte of object code. This offset
   actually points to the byte before a 2-byte address which is to have
   added to it the absolute address of the first byte of object code;
   that is, for 3-byte instructions, this offset points to the op-code
   preceding an address to be relocated; for 2-byte addresses without an
   op-code (e.g.
   .word pseudo ops), the offset points to the byte before the address
   to be relocated. 1-byte addresses to be relocated (e.g. >addr or
   <addr) are not handled by this relocation mechanism, but rather as
   pseudo extdefs/extrefs (see below). This part ends when the number of
   2-byte relocation entries specified in the count has been
   encountered.

3. external definition entries:

   The first 2 bytes are a count of the number of external definition
   entries to follow. Each extdef entry is a variable number of bytes
   long. First appears the externally defined name, terminated with a
   zero byte. Next is a 1-byte flag; if this flag is 0, the externally
   defined symbol has an absolute value; if this flag is 1, the
   externally defined symbol has a relocatable value relative to the
   first byte of object code. Finally, the last 2 bytes of each entry
   are the absolute value of the (absolute) symbol, or the offset of the
   (relocatable) symbol.

   Whenever the compiler must reference only the low or high byte
   address of a local piece of static data (e.g. a string literal), that
   datum is given a "pseudo" external definition; that is, the compiler
   makes up a name for it consisting of several randomly generated
   special characters and additional identifier characters, then treats
   it as if it were an external definition. This is done so that it may
   be referenced by an external reference entry to follow. This part
   ends when the number of multi-byte extdef entries specified in the
   count has been encountered.

4. external reference entries:

   The first 2 bytes are a count of the number of external reference
   entries to follow. Each extref entry is a variable number of bytes
   long. First appears the externally referenced name, terminated with a
   zero byte.
   Next is a 2-byte word in low/hi format; the low 2 bits of this word
   indicate if this external reference is to a full 2-byte address
   (flag=0), a single byte to contain the high byte of the address
   (flag=1), or a single byte to contain the low byte of the address
   (flag=2). The upper 14 bits of this word are an offset into the
   external object. Finally, the last 2 bytes of each entry are the
   offset of the external referencing instruction (points 1 byte before
   the external address reference itself) relative to the first byte of
   object code. In the case of references to "pseudo" external
   definitions, the reference will be resolved by the matching external
   definition, and the flag will always be either 1 or 2. This part ends
   when the number of multi-byte extref entries specified in the count
   has been encountered.

5. uninitialized data block entries:

   The first 2 bytes are a count of the number of data block entries to
   follow. Each data block entry is a variable number of bytes long.
   First appears the data block name, terminated with a zero byte. Last
   is a 2-byte size, representing the number of data bytes to be
   reserved by the linker (and zeroed by the run-time initialization
   code) for that named data block. These entries are used to represent
   uninitialized static or external data to prevent large object modules
   filled with nothing but zeros.

   Since the data block names are effectively externally defined, dummy
   "pseudo" extdef names are again created when local static data is to
   be allocated as an uninitialized data block. The purpose of these
   randomly generated dummy names is to clue the linker that these are
   not real external definitions, and to prevent external name conflict
   with identically named local data in other object modules. This part
   ends when the number of multi-byte data block entries specified in
   the count has been encountered.

At this point the object file is at end-of-file.
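
The five-part layout described above can be sketched as a small reader.
This is only an illustration in Python, not code from C Power, ASSM, or
RA. The names parse_object and relocate are hypothetical. Symbol names
are decoded as Latin-1 purely so that any byte value survives (an
assumption; the real files presumably use the Commodore character set),
and the 14-bit offset is taken to be the word shifted right by 2, which
is one reading of the description above.

```python
import struct


def _u16(buf, pos):
    """Read a 2-byte low/hi value; return (value, next position)."""
    return struct.unpack_from("<H", buf, pos)[0], pos + 2


def _name(buf, pos):
    """Read a zero-terminated name; return (name, next position)."""
    end = buf.index(0, pos)
    # Latin-1 is an assumption, used only so every byte decodes.
    return buf[pos:end].decode("latin-1"), end + 1


def parse_object(buf):
    """Split an object file image into its 5 parts (hypothetical helper)."""
    pos = 0

    # Part 1: byte count, then the object code itself.
    count, pos = _u16(buf, pos)
    code = buf[pos:pos + count]
    pos += count

    # Part 2: entry count, then one 2-byte offset per entry.
    count, pos = _u16(buf, pos)
    relocs = []
    for _ in range(count):
        offset, pos = _u16(buf, pos)
        relocs.append(offset)

    # Part 3: name, 1-byte flag (1 = relocatable), 2-byte value.
    count, pos = _u16(buf, pos)
    extdefs = []
    for _ in range(count):
        name, pos = _name(buf, pos)
        flag = buf[pos]
        pos += 1
        value, pos = _u16(buf, pos)
        extdefs.append((name, flag == 1, value))

    # Part 4: name, flag/offset word, 2-byte offset of the referencing site.
    count, pos = _u16(buf, pos)
    extrefs = []
    for _ in range(count):
        name, pos = _name(buf, pos)
        word, pos = _u16(buf, pos)
        site, pos = _u16(buf, pos)
        # Low 2 bits: 0 = full address, 1 = high byte, 2 = low byte.
        extrefs.append((name, word & 3, word >> 2, site))

    # Part 5: name, 2-byte size to reserve.
    count, pos = _u16(buf, pos)
    blocks = []
    for _ in range(count):
        name, pos = _name(buf, pos)
        size, pos = _u16(buf, pos)
        blocks.append((name, size))

    parts = {"code": code, "relocs": relocs, "extdefs": extdefs,
             "extrefs": extrefs, "blocks": blocks}
    return parts, pos


def relocate(code, relocs, base):
    """Add the load address to each relocatable 2-byte address.

    Each relocation offset points to the byte BEFORE the address, so the
    address itself lives at offset + 1 and offset + 2.
    """
    out = bytearray(code)
    for offset in relocs:
        addr = struct.unpack_from("<H", out, offset + 1)[0]
        struct.pack_into("<H", out, offset + 1, (addr + base) & 0xFFFF)
    return bytes(out)
```

If the whole file has been consumed, the position returned by
parse_object should equal the length of the file, which makes a cheap
sanity check on the format description.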

I hope the above is a reasonably complete and useful description of C
Power object code. Refer also to the Transactor (March 1988) article
"The link between C and assembly", which is enlightening, though
incomplete.

Also note that ASSM faithfully adheres to the above format so that ASSM
generated object files may directly be linked with those generated by C
Power; however, ASSM uses a different algorithm for the generation of
pseudo extdef names, attempting to make those names more readable
without sacrificing their uniqueness.

Sincerely,
Frank Prindle
Prindle@NADC.arpa