[comp.sys.atari.st] object code file formats

anton@postgres (Jeff Anton) (01/07/88)

Would some kind soul tell me or point me at a definition of the
format of .TOS, .TTS, and .GEM files?  I'm working on porting the gnu
C compiler to generate atari objects.  I'm not attempting to have the
atari run gnu cc.  Please send responses to me by mail.  Also, a pointer
to a way to avoid ever starting gem when booting would be nice.
					Jeff Anton

dag@chinet.UUCP (Daniel A. Glasser) (01/09/88)

In article <81@pasteur.Berkeley.Edu> anton@postgres.berkely.edu writes:
>Would some kind soul tell me or point me at a definition of the
>format of .TOS, .TTS, and .GEM files?  I'm working on porting the gnu
>C compiler to generate atari objects.  I'm not attempting to have the
>atari run gnu cc.  Please send responses to me by mail.  Also, a pointer
>to a way to avoid ever starting gem when booting would be nice.
>					Jeff Anton

I replied to Jeff by mail, though I'm unsure whether the path I used
will get through.  Since I believe this will be of interest to the
Atari ST community on the net, here is the information from the mail
that I sent to Jeff.  I'd suggest filing it and reading it at your
leisure.

[.......................... Cut Here .....................]

Now for the information at hand --

GEMDOS executable file format:

	The GEMDOS executable file consists of a header, text segment,
	data segment, symbol segment and relocation segment.

	Conventions used: USHORT = 16 bits unsigned
			  ULONG  = 32 bits unsigned
			  LONG	 = 32 bits signed

    GEMDOS header:

	struct gemhdr {
		USHORT	gh_magic;	/* always == 0x601A	*/
		ULONG	gh_tsize;	/* size of text segment	*/
		ULONG	gh_dsize;	/* size of data segment	*/
		ULONG	gh_bsize;	/* size of bss segment	*/
		ULONG	gh_ssize;	/* size of symbol segment */
		ULONG	gh_reserved[2];	/* 2 longs, "must" be 0	*/
		USHORT	gh_reserve;	/* Reserved, "      "	*/
	};

	The text and data segments must have even length.
	The magic number resolves into a "BRA .+1C" instruction.
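
	A minimal sketch of reading this header on a host machine,
	assuming the file is already in a byte buffer.  The 68000 stores
	values high byte first and the header occupies 0x1C bytes with
	no padding, so it is probably safer to decode it field by field
	than to overlay the struct on a machine with different alignment
	rules.  The names gemhdr_host and gem_parse_header are invented
	for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEM_MAGIC   0x601Au
#define GEM_HDR_LEN 0x1Cu   /* 2 + 4*4 + 2*4 + 2 = 28 bytes */

/* Host-side copy of the interesting header fields (hypothetical name). */
struct gemhdr_host {
    uint16_t magic;
    uint32_t tsize, dsize, bsize, ssize;
};

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer is too short, the magic
 * number is wrong, or the text or data size is odd. */
int gem_parse_header(const unsigned char *buf, size_t len,
                     struct gemhdr_host *out)
{
    assert(buf != NULL && out != NULL);
    if (len < GEM_HDR_LEN)
        return -1;
    out->magic = be16(buf);
    out->tsize = be32(buf + 2);
    out->dsize = be32(buf + 6);
    out->bsize = be32(buf + 10);
    out->ssize = be32(buf + 14);
    /* Bytes 18..27 are the reserved fields; this sketch ignores them. */
    if (out->magic != GEM_MAGIC)
        return -1;
    if ((out->tsize | out->dsize) & 1u)
        return -1;
    return 0;
}
```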

    The TEXT segment:

	This section of the file starts immediately following the
	header in the file (offset 0x1C).  All relocatable references
	must be long and word-aligned.  The GEMDOS program loader
	adds the address of the beginning of the TEXT segment to
	all relocation references.  [see relocation format, below]
	After the program is loaded, execution begins at the first
	word in this segment.

    The DATA segment:

	This section of the file starts immediately following the
	TEXT segment (offset 0x1C+text_seg_size).  Relocatables
	are TEXT segment based, and relocated in the same manner
	and with the same restrictions as the TEXT segment ones.
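
	The file offsets of the segments follow from the header sizes.
	A small sketch, assuming the segments appear in the file in the
	order listed at the top (text, data, symbol, relocation); the
	names gem_offsets and gem_segment_offsets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GEM_HDR_LEN 0x1Cu

/* File offsets of each stored segment (BSS is not stored in the file). */
struct gem_offsets {
    uint32_t text, data, symbols, reloc;
};

/* Returns 0 on success, -1 if the sizes would overflow 32 bits. */
int gem_segment_offsets(uint32_t tsize, uint32_t dsize, uint32_t ssize,
                        struct gem_offsets *out)
{
    uint32_t room = UINT32_MAX - GEM_HDR_LEN;

    assert(out != NULL);
    if (tsize > room || dsize > room - tsize ||
        ssize > room - tsize - dsize)
        return -1;
    out->text    = GEM_HDR_LEN;          /* right after the header */
    out->data    = out->text + tsize;    /* 0x1C + text size */
    out->symbols = out->data + dsize;
    out->reloc   = out->symbols + ssize;
    return 0;
}
```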

    The BSS segment:

	This is not stored in the file.  It is allocated at runtime
	by GEMDOS (actually, by the runtime startup.)

    The SYMBOL segment:

	This contains symbol table information for DRI linker style
	executables.  For Mark Williams C style executables pre 3.0,
	this segment is empty.  As of 3.0, this segment holds debug
	and symbol table information in Mark Williams format.  Write
	or call Mark Williams for more information on this.

	The DRI symbol table format is:

		struct drisym {
			char	ds_name[8];	/* null padded ident.	*/
			USHORT	ds_type;	/* symbol type flags.	*/
			LONG	ds_value;	/* signed 32 bit value.	*/
		};

	Type flags for the ds_type field:

		DEFINED		0x8000		defined symbol
		EQUATED		0x4000		equated symbol
		GLOBAL		0x2000		global symbol
		EQUATED_REG	0x1000		equated register
		EXTERNAL_REF	0x0800		external reference
		DATA_RELOCATE	0x0400		data based relocatable
		TEXT_RELOCATE	0x0200		text based relocatable
		BSS_RELOCATE	0x0100		bss based relocatable
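
	As an illustration, one way to pull a single entry out of the
	symbol segment on a host machine.  This is a sketch: it assumes
	the entries are packed at 14 bytes each (8 + 2 + 4) and stored
	high byte first, and the DRI_ prefixes, drisym_host and
	dri_read_sym are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DRI_SYM_LEN 14u   /* 8 name bytes + 2 type bytes + 4 value bytes */

enum {
    DRI_DEFINED       = 0x8000,
    DRI_EQUATED       = 0x4000,
    DRI_GLOBAL        = 0x2000,
    DRI_EQUATED_REG   = 0x1000,
    DRI_EXTERNAL_REF  = 0x0800,
    DRI_DATA_RELOCATE = 0x0400,
    DRI_TEXT_RELOCATE = 0x0200,
    DRI_BSS_RELOCATE  = 0x0100
};

struct drisym_host {
    char     name[9];   /* 8 characters plus a terminating NUL */
    uint16_t type;
    int32_t  value;
};

/* Decode entry number `index`; returns 0, or -1 if it is out of range. */
int dri_read_sym(const unsigned char *seg, size_t seg_len, size_t index,
                 struct drisym_host *out)
{
    const unsigned char *p;
    uint32_t v;

    assert(seg != NULL && out != NULL);
    if (index >= seg_len / DRI_SYM_LEN)
        return -1;
    p = seg + index * DRI_SYM_LEN;
    memcpy(out->name, p, 8);
    out->name[8] = '\0';
    out->type = (uint16_t)(((unsigned)p[8] << 8) | p[9]);
    v = ((uint32_t)p[10] << 24) | ((uint32_t)p[11] << 16) |
        ((uint32_t)p[12] << 8)  |  (uint32_t)p[13];
    /* Convert to signed without an implementation-defined cast. */
    out->value = (v & 0x80000000u)
        ? (int32_t)(v - 0x80000000u) - 0x7FFFFFFF - 1
        : (int32_t)v;
    return 0;
}
```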

    RELOCATION (fixup) segment:

	The relocation segment begins with a longword that specifies
	the offset into the text segment of the first relocatable
	reference.  This is followed by a stream of unsigned bytes
	that specify the next fixup.  A zero byte flags the end of the
	relocation table, a value of 1 means add 254 to location counter
	and fetch the next byte, any other even value is added to the
	location counter and the longword at that address is relocated.
	All other odd valued bytes are reserved for future use.
	If the initial longword is 0, there is no relocation information.

Loading and relocating:

	When GEMDOS loads a program file through Pexec() it goes through
	the following steps:  The largest segment of free memory is
	allocated to the process and a prototype basepage is built at
	the beginning of it.  The program file header is read and the
	basepage is filled in.  The text and data segments are then
	read into memory.  The symbol segment is skipped and the first
	longword of relocation information is read into the fixup location
	pointer.  If this value is 0, the system proceeds with the action
	specified by the Pexec() mode.  Otherwise, the address of the
	text segment is added to the fixup location pointer and the longword
	at that address has the address of the text segment base added to it.
	The loader then reads the relocation stream until it gets a 0 byte:
	if a byte is 1, 254 is added to the fixup location pointer and
	the next byte is read from the relocation stream; any other even
	non-zero byte is added to the fixup location pointer and the
	longword at that location has the base address of the text segment
	added to it.  Once a zero byte is encountered in the relocation
	stream, Pexec() proceeds to either set up for execution (loading
	appropriate registers with appropriate values and then jumping
	to the first location in the text segment) or returns the basepage
	address to the caller.
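
	The relocation pass described above can be sketched in C as
	follows.  This is not GEMDOS source, only an illustration under
	stated assumptions: `image` holds the text and data segments
	loaded back to back, `reloc` holds the relocation segment as
	read from the file, `base` is the address the text segment will
	run at, and gem_relocate is an invented name.  Bounds and
	alignment checks are additions a host-side tool would want.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Returns 0 on success, -1 on a malformed or out-of-range fixup. */
int gem_relocate(unsigned char *image, size_t image_len,
                 const unsigned char *reloc, size_t reloc_len,
                 uint32_t base)
{
    size_t pos = 4;
    uint32_t loc;

    assert(image != NULL && reloc != NULL);
    if (reloc_len < 4)
        return -1;
    loc = get_be32(reloc);          /* offset of the first fixup */
    if (loc == 0)
        return 0;                   /* no relocation information */

    for (;;) {
        unsigned char b;

        /* The longword must be word-aligned and inside the image. */
        if ((loc & 1u) || image_len < 4 || loc > image_len - 4)
            return -1;
        put_be32(image + loc, get_be32(image + loc) + base);

        do {
            if (pos >= reloc_len)
                return -1;          /* stream ended without a 0 byte */
            b = reloc[pos++];
            if (b == 0)
                return 0;           /* end of relocation table */
            if (b == 1)
                loc += 254;         /* skip ahead, fetch next byte */
            else if (b & 1u)
                return -1;          /* other odd values are reserved */
            else
                loc += b;           /* distance to the next fixup */
            if (loc > image_len)
                return -1;
        } while (b == 1);
    }
}
```

	For example, the stream 00 00 00 02, 04, 01, 02, 00 relocates
	the longwords at offsets 2, 6 and 262.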

Notes on fixup generation:

    o	All relocatable values must be word aligned longwords.
    o	Relocation references to text segment locations are zero-based.
	Data is loaded directly after the text segment, and relocation
	references to data segment locations are based off of the text
	segment base, thus a relocatable reference to the first data location
	would be the text segment size.  BSS is "loaded" immediately
	following the data segment, thus BSS references are the offset
	into the bss segment added to the combined size of the data and
	text segments.  This makes one-pass linking difficult.
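
	A sketch of the generating side, for a linker or a.out converter.
	It assumes the fixup offsets have already been collected, are
	even, and are sorted in strictly increasing order; the function
	names are hypothetical.  Note that a fixup at offset 0 cannot be
	expressed, since an initial longword of 0 means "no relocation
	information".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value to store in a relocatable longword that refers to the data
 * segment: the offset into data plus the text segment size. */
uint32_t gem_data_ref(uint32_t tsize, uint32_t off)
{
    return tsize + off;
}

/* Same for BSS: offset into BSS plus the text and data sizes. */
uint32_t gem_bss_ref(uint32_t tsize, uint32_t dsize, uint32_t off)
{
    return tsize + dsize + off;
}

/* Encode `n` fixup offsets into `out`.  Returns the number of bytes
 * written, or 0 if the input is invalid or `cap` is too small. */
size_t gem_encode_fixups(const uint32_t *offs, size_t n,
                         unsigned char *out, size_t cap)
{
    size_t pos, i;

    assert(out != NULL && (offs != NULL || n == 0));
    if (cap < 4)
        return 0;
    if (n == 0) {
        out[0] = out[1] = out[2] = out[3] = 0;   /* no fixups */
        return 4;
    }
    if (offs[0] == 0 || (offs[0] & 1u))
        return 0;
    out[0] = (unsigned char)(offs[0] >> 24);
    out[1] = (unsigned char)(offs[0] >> 16);
    out[2] = (unsigned char)(offs[0] >> 8);
    out[3] = (unsigned char)offs[0];
    pos = 4;

    for (i = 1; i < n; i++) {
        uint32_t delta;

        if (offs[i] <= offs[i - 1] || (offs[i] & 1u))
            return 0;
        delta = offs[i] - offs[i - 1];
        while (delta > 254) {        /* a 1 byte advances by 254 */
            if (pos >= cap)
                return 0;
            out[pos++] = 1;
            delta -= 254;
        }
        if (pos >= cap)
            return 0;
        out[pos++] = (unsigned char)delta;   /* even, 2..254 */
    }
    if (pos >= cap)
        return 0;
    out[pos++] = 0;                  /* terminator */
    return pos;
}
```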

Other comments:

    o	All ROM versions of GEMDOS released so far have a bug which
	prevents loading of programs with relocation segments > 32K bytes.
	Atari has confirmed this bug and says it will be fixed in the
	next version of GEMDOS.  The date this was written is 8-January-1988.

    o	You will need references to GEMDOS to understand the basepage
	and other GEMDOS issues.  I recommend the Mark Williams C manual
	for much of this information.

    o	It is up to the runtime startup code, which must begin at the
	first byte of the text segment, to set up the program stack and
	free unneeded memory back to the system.  Look at just about
	any vendor's runtime startup module source for how this works.

    o	Some documents have the header wrong, listing 3 reserved longwords
	at the end.  There are only 2, plus a short.

Disclaimer and plea:

	I work for Mark Williams Company.  I am responsible for much of
	the Mark Williams C compiler package for the Atari ST, and have
	made considerable contributions to the documentation for that
	package.  Despite this, the material contained in this message
	is not a Mark Williams Company product.  All opinions in this
	message are my own and have not been cleared by my employer.
	Therefore, the information contained in this message is presented
	without warranty.  Any errors in content, grammar or spelling are
	my own.  Please don't call Mark Williams Company about this
	information; send me mail at one of the addresses in my signature
	or write me at 6030 N. Kenmore Ave., Apt. 512, Chicago, IL, 60660.

Final words:

	I hope the above answers the questions...
-- 
					Daniel A. Glasser
					...!ihnp4!chinet!dag
					...!ihnp4!mwc!dag
					...!ihnp4!mwc!gorgon!dag
	One of those things that goes "BUMP!!! (ouch!)" in the night.

michael@garfield.UUCP (Mike Rendell) (01/12/88)

In article <81@pasteur.Berkeley.Edu> anton@postgres (Jeff Anton) writes:
>Would some kind soul tell me or point me at a definition of the
>format of .TOS, .TTS, and .GEM files?  I'm working on porting the gnu
>C compiler to generate atari objects.  I'm not attempting to have the
>atari run gnu cc.  Please send responses to me by mail.  Also, a pointer
>to a way to avoid ever starting gem when booting would be nice.
>					Jeff Anton

I tried to reply via mail but someone claims not to know about your site.
Maybe this header will be of some help to you:

] Delivery-date: Mon, 11 Jan 1988 15:44:43 UTC-0330
] Originator:    ucbvax.Berkeley.EDU!MAILER-DAEMON@uunet.uucp
] Send-date:     Mon, 11 Jan 1988 14:12:42 UTC-0330
] From:    <ucbvax.Berkeley.EDU!MAILER-DAEMON@uunet.uucp>
] To:      <garfield!michael.uucp>
] Subject: Returned mail: Service unavailable
] 
]    ----- Transcript of session follows -----
] >>> RCPT To:<postgres!anton@pasteur.berkeley.edu>
] <<< 554 <postgres!anton@pasteur.berkeley.edu>... UUCP host name postgres not recognized at this site
] 554 <pasteur!postgres!anton>... Service unavailable

  Anyway, the reason I am replying is that I have already done what you are
intending to do.  Some minor changes needed to be made to gcc and
gas - these included stuff to tell gcc that ints were 16 bits (there are
problems with passing arguments for bios calls otherwise; 16-bit ints are
also much faster) and some changes to get gas to dump .o files that are
usable on the sun (so I could adb them there...).  Other stuff that was
needed was a program to convert unix a.out executables to gem format, long
multiplication/division/modulo routines for gcc, and of course libc (which
is still under construction).  If you want some/all of the stuff I have
done just send a note (I hacked 4.3bsd (vax) ld/strip/size/nm/ranlib so I
can't just send them to you - maybe diffs if you have a source licence?).
The only thing that is really missing is floating point routines - any
suggestions as to PD versions (in C or assembler) for these would be helpful.


Mike Rendell				Department of Computer Science
michael@garfield.uucp			Memorial University of Newfoundland
uunet!garfield!michael			St. John's, Nfld., Canada
 (709) 737-4550				A1C 5S7

apratt@atari.UUCP (Allan Pratt) (01/12/88)

First, thanks to Daniel Glasser for his posting.  There are one or
two things I want to clarify, though...

in article <2082@chinet.UUCP>, dag@chinet.UUCP (Daniel A. Glasser) says:
> 	... All relocatable references must be long and word-aligned. 

Clarification: all relocatable references must be longwords and must be
word-aligned.  (Another reading of the above sentence is, "They must be
longword-aligned and word-aligned.")

>     The BSS segment:
> 
> 	This is not stored in the file.  It is allocated at runtime
> 	by GEMDOS (actually, by the runtime startup.)

The BSS segment is allocated by GEMDOS.  If you ask for 32K of BSS, your
program will get 32K of BSS (as you can see by checking your basepage). 
What gets set up by the runtime startup is the HEAP, which is the space
between the end of your declared BSS and your initial stack pointer.  It
is the size of the HEAP that you set when you assemble GEMSTART (for
instance). 

>     The SYMBOL segment:
> 
> 	Type flags for the ds_type field:
> 
> 		DEFINED		0x8000		defined symbol
> 		EQUATED		0x4000		equated symbol
> 		GLOBAL		0x2000		global symbol
> 		EQUATED_REG	0x1000		equated register
> 		EXTERNAL_REF	0x0800		external reference
> 		DATA_RELOCATE	0x0400		data based relocatable
> 		TEXT_RELOCATE	0x0200		text based relocatable
> 		BSS_RELOCATE	0x0100		bss based relocatable

There are some more types than this: 0x0080 means "FILE" and is used by
the linker (well, by ALN, at least, and possibly LO68) to show where a
file starts.  (The symbol name is the file name, and the symbol value is
the address of the start of the text segment of that file (even if it
doesn't have anything in the text segment)). 

ALN also uses the next bit, 0x0040, to mean "ARCHIVE" -- this is an
ALN-specific extension, and is only used in conjunction with FILE.
The start of an archive is marked with a symbol of type ARCHIVE FILE,
where the symbol name is the archive name.  The end of the archive
is marked with a symbol of type ARCHIVE FILE with NO name (all nulls).
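
A tool that walks the symbol table could recognize these markers roughly
as follows.  This is only a sketch of the rules just described: the
constant and function names are invented, and it assumes nothing about
which other type bits a linker sets alongside FILE and ARCHIVE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DRI_FILE    0x0080u   /* start of an object file */
#define DRI_ARCHIVE 0x0040u   /* ALN extension, used only with FILE */

enum dri_marker {
    DRI_MARK_NONE,            /* ordinary symbol */
    DRI_MARK_FILE,            /* FILE: name is the file name */
    DRI_MARK_ARCHIVE_START,   /* ARCHIVE FILE with the archive name */
    DRI_MARK_ARCHIVE_END      /* ARCHIVE FILE with an all-null name */
};

/* Classify one symbol from its 8-byte name field and its type flags. */
enum dri_marker dri_classify_marker(const char name[8], uint16_t type)
{
    int i, named = 0;

    assert(name != NULL);
    if (!(type & DRI_FILE))
        return DRI_MARK_NONE;
    if (!(type & DRI_ARCHIVE))
        return DRI_MARK_FILE;
    for (i = 0; i < 8; i++)
        if (name[i] != '\0')
            named = 1;
    return named ? DRI_MARK_ARCHIVE_START : DRI_MARK_ARCHIVE_END;
}
```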

Thanks again to Dan Glasser for this posting.

============================================
Opinions expressed above do not necessarily	-- Allan Pratt, Atari Corp.
reflect those of Atari Corp. or anyone else.	  ...ames!atari!apratt