[comp.sys.hp] 9000/835 loader and assembler problems

andrew@comp.vuw.ac.nz (Andrew Vignaux) (06/01/89)

I've hit a small problem trying to port KCl to our 835.

KCl uses dynamic loading [ld -A] to load its object files into
memory.  However, it has appended some text to the object file which
it loads separately.  All the lds that I have seen before allow extra
rubbish on the end of object files, but the 835 loader says

	/bin/ld: foo.o: Not a valid object file (invalid system id)

if the length of the data is >= 128 bytes.  Interestingly, it works
fine if the length is < 128 bytes.  [The system-id is valid in both
cases!]  Any ideas?

Another problem I am having is what to do with the object file after
I have loaded it.  I read the object module's header to determine how
much space I should allocate in memory.  I allocate the space, and
pass the starting address to "ld -A bar -R %x -o baz".  The object file
that I get back has a number of interesting properties

  - the starting address of the text segment has been rounded up to a
    page boundary -- is there anything in the architecture that
    requires this?

  - the starting address of the data segment has also been rounded up
    to a page boundary.  Again is there any real reason for this?

  - I want to branch to the first routine in the file.  The
    inter-space stub seems to be at TEXT+4 (*TEXT is a break).  Is
    there a better way to find this?

  - the header gives a different size than the one I worked out
    earlier [Surprise, surprise].  Any suggestions on a better size
    predictor (I am currently using size+PAGESIZE).

I guess I should write my own linker :-(

While I've got your attention :-), are there any 9000/800 assembler
gurus out there?

I've got the following C declaration:

	extern struct character character_table[];

which KCl indexes with a character.  Because characters are signed on
some machines, the space for the array is defined in an assembler file

		.globl	_character_table
		.space	1024
	_character_table:
		.space	1024

in the appropriate syntax for the particular machine.  Note: the label is in
the middle of the space.  I've been able to put this in the DATA
subspace but not in the BSS subspace.  Any thoughts on how to get this in
the BSS subspace?

Is it possible to get the assembler to define a symbol that is the
value of another symbol + an offset?

Andrew
-- 
Domain address: andrew@comp.vuw.ac.nz   Path address: ...!uunet!vuwcomp!andrew

shankar@hpclscu.HP.COM (Shankar Unni) (06/03/89)

> I've got the following C declaration:
> 
> 	extern struct character character_table[];
> 
> which KCl indexes with a character.  Because characters are signed on
> some machines, the space for the array is defined in an assembler file
> 
> 		.globl	_character_table
> 		.space	1024
> 	_character_table:
> 		.space	1024
> 
> in the appropriate syntax for the particular machine.  Note: the label is in
> the middle of the space.  I've been able to put this in the DATA
> subspace but not in the BSS subspace.  Any thoughts on how to get this in
> the BSS subspace?

     .space $PRIVATE$       ; this is how you specify spaces  
     .subspa $BSS$          ; this is how you specify subspaces
 _character_table
     .comm  1024
     
Better still, consider the following:

A source file called chartab.c, whose contents are the single line

   char _character_table[1024];

cc -c this file, and you get an object file that has a "common definition"
for your space. Portably.
----
Shankar.

andrew@comp.vuw.ac.nz (Andrew Vignaux) (06/05/89)

In article <1340054@hpclscu.HP.COM> shankar@hpclscu.HP.COM (Shankar Unni) writes:
>In article <14870@comp.vuw.ac.nz> I wrote:
>> 		.globl	_character_table
>> 		.space	1024
>> 	_character_table:
>> 		.space	1024
>> 
>> in the appropriate syntax for the particular machine.  Note: the label is in
>> the middle of the space.  I've been able to put this in the DATA
>> subspace but not in the BSS subspace.  Any thoughts on how to get this in
>> the BSS subspace?
>
>     .space $PRIVATE$       ; this is how you specify spaces  
>     .subspa $BSS$          ; this is how you specify subspaces
> _character_table
>     .comm  1024
>Shankar.

I'm afraid that's not what I meant (I guess my paragraph was ambiguous).
What I need is a "common" definition that is in the MIDDLE of the data, so
the program can use a signed char to access it (pretty weird huh!).  I
can't get this to happen in the BSS subspace.
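
For anyone puzzled by the request, the effect being asked for can be sketched in plain C. This is only an illustration and not KCl's code: the field in `struct character`, and the names `table_storage`, `CHARACTER_TABLE` and `lookup`, are made up here, and 8-bit chars are assumed. The table base is an address constant pointing into the middle of a zero-initialised array, so a negative index from a signed char still lands inside the storage.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for KCl's real struct character. */
struct character {
    int code;
};

/* 256 entries so every value of an 8-bit char, signed or unsigned, maps
   somewhere.  Uninitialised statics like this normally end up in BSS. */
static struct character table_storage[256];

/* Base address in the MIDDLE of the storage: indices -128..127 are valid. */
#define CHARACTER_TABLE (&table_storage[128])

/* Return the entry for c, which may be negative where char is signed. */
struct character *lookup(signed char c)
{
    assert(CHAR_BIT == 8);      /* the sizing above assumes 8-bit chars */
    return &CHARACTER_TABLE[c];
}
```

Since `&table_storage[128]` is an address constant, the offset can be folded at compile/link time, which seems to be the property wanted from the assembler version.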

BTW: the 800 assembler requires a label for the .COMM directive.

On a related issue (yes still KCl), is there any way to use the value of
$global$ in a C routine? (The C compiler doesn't like the $).  I could get
an assembler routine to put the value in a global, but that means a memory
dereference every time a certain macro is used (rather than constant
folding done at compile/link time).  I'll probably just hard code 0x40000000.

Andrew
-- 
Domain address: andrew@comp.vuw.ac.nz   Path address: ...!uunet!vuwcomp!andrew

jmorris@hpsemc.HP.COM (John V. Morris) (06/07/89)

Unlike most lds, the 835 linker loads multiple modules from a single file.
Thus, if you append extra stuff at the end of an object file, the 835
linker thinks there is another module present and attempts to load it.

The workaround is fairly simple.  You have to add a valid object module
header in front of the extra stuff.  I've attached a program that shows
how to do it.

Coincidentally, the header is 128 bytes long.

Concerning the address rounding: I believe the addresses are rounded
in order to support load on demand and shared code.  There ought to be a
way to turn it off, but I don't know how to do it.
Does the -N option help?


John Morris
HP Technology Access Center
(415) 725-3871


---------------------------- dummy_som.c -----------------------------
/*********************************************************************
dummy_som  file   >>object.o

dummy_som creates a Standard Object Module (SOM) containing the given file.
   The dummy SOM can be appended to a conventional object file and will
   be ignored by the linker.

This program is useful for applications that wish to append their own data
   to object files.

Written for the HP 9000 S800 at the HP Technology Access Center.

**********************************************************************/

#include <stdio.h>
#include <filehdr.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>


main(argc, argv)
/********************************************************************
dummy_som creates a null Standard Object Module containing the given file
************************************************************************/
char **argv;
    {
    struct header hdr;
    struct stat file_status;
    char buffer[8192];
    int fd, len;

    /* get the arguments (a usage error has no errno, so don't use perror) */
    if (argc<2)
      {fprintf(stderr, "usage: dummy_som file >> obj.o\n"); exit(1); }

    /* open the file to append to header */
    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
	{perror("dummy_som: Can't open input file"); exit(1);}

    /* get information about the file */
    if (fstat(fd, &file_status) < 0)
	{perror("dummy_som: Can't get status of input file"); exit(1);}

    /* create a dummy header that reserves the extra space */
    memset(&hdr, 0, sizeof(hdr));

    hdr.system_id = HP9000S800_ID;
    hdr.version_id = VERSION_ID;
    hdr.a_magic = RELOC_MAGIC;
    hdr.som_length = sizeof(hdr) + file_status.st_size;

    hdr.checksum = compute_checksum(&hdr);

    /* output the dummy header */
    write(1, &hdr, sizeof(hdr));

    /* append the data file to the header */
    while ((len=read(fd, buffer, sizeof(buffer))) > 0)
	write(1, buffer, len);

    return 0;
    }


compute_checksum(hdr)
/****************************************************************
compute_checksum calculates the checksum of an object module header
******************************************************************/
    struct header *hdr;
    {
    int sum, *ptr, i;

    /* start at beginning of header */
    sum = 0;
    ptr = (int *)hdr;

    /* add up the checksum */
    for (i = 0; i<sizeof(*hdr)/4 - 1; i++)
       sum = sum ^ ptr[i];

    /* done */
    return(sum);
    }
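
The checksum convention used above can be tried out away from an HP-UX box. The sketch below rests on assumptions: it treats the header as 32 words (128 bytes) with the checksum stored in the last word, mirroring the loop in compute_checksum, and the names `HDR_WORDS`, `header_checksum` and `header_is_consistent` are hypothetical, not part of <filehdr.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_WORDS 32   /* 128-byte header viewed as 32-bit words (assumed) */

/* XOR of every word except the last, which is assumed to hold the checksum. */
uint32_t header_checksum(const uint32_t words[HDR_WORDS])
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < HDR_WORDS - 1; i++)
        sum ^= words[i];
    return sum;
}

/* A header is self-consistent when the stored checksum matches. */
int header_is_consistent(const uint32_t words[HDR_WORDS])
{
    return words[HDR_WORDS - 1] == header_checksum(words);
}
```

A consequence of using XOR is that changing any field after the checksum has been computed makes the check fail until the last word is recomputed.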

mar@hpclmar.HP.COM (Michelle Ruscetta) (06/08/89)

> I've hit a small problem trying to port KCl to our 835.
> 
> KCl uses dynamic loading [ld -A] to load its object files into
> memory.  However, it has appended some text to the object file which
> it loads separately.  All the lds that I have seen before, allow extra
> rubbish on the end of object files, but the 835 loader says
> 
> /bin/ld: foo.o: Not a valid object file (invalid system id)

  [ correctly answered in previous response ] 

> 
> Another problem I am having, is what to do with the object file after
> I have loaded it.  I read the object module's header to determine how
> much space I should allocate in memory.  I allocate the space, and
> pass the starting address to "ld -A bar -R %x -o baz".  The object file
> that I get back has a number of interesting properties
> 
> - the starting address of the text segment has been rounded up to a
> page boundary -- is there anything in the architecture that
> requires this?
> - the starting address of the data segment has also been rounded up
> to a page boundary.  Again is there any real reason for this?
> 

    Yes, the HPUX loader requires page alignment of both the text and data
    segments. This is primarily because memory protection is done on
    a page basis. Even though you will essentially be 'loading' your
    own code, this alignment is still performed.

> 
> - I want to branch to the first routine in the file.  The
> inter-space stub seems to be at TEXT+4 (*TEXT is a break).  Is
> there a better way to find this?
> 
    You MUST use the "exec_entry" field in the HPUX auxiliary header (which
    immediately follows the standard file header), or use the entry_offset
    field in the file header.

> - the header gives a different size than the one I worked out
> earlier [Surprise, surprise].  Any suggestions on a better size
> predictor (I am currently using size+PAGESIZE).
> 

    Sorry, no good size predictor -- it is very difficult to determine
    the size of an a.out file, given a relocatable object file, unless
    you know that the a.out doesn't include any code from other objects.

> I guess I should write my own linker :-(
> 

    Good luck! -- The linker for the series 800 is much more complex than
    the linkers I have seen for other CISC architectures -- due to some
    RISCY requirements.

    There are some other things that complicate dynamic linking on the
    s800 architecture:  

        1) HP-UX on the s800 still does not support non-sharable, writable
           text, so dynamically-loaded code must be placed in the data space.
           This means that inter-space "stubs" must be created in order to
           support branching between the code and the data space (This is
           because the standard procedure call and return sequence cannot
           branch across spaces).

        2) The process of "stack unwinding" cannot handle 
           dynamically-loaded code, so getting a stack trace from a
           debugger will be impossible when executing within the dynamically
           loaded code -- this is also why the Pascal try/recover (escape())
           feature will not work.

        3) Address relocation is complicated by the instruction format, which
           is not a typical " add a constant to a full word" type of patching
           (In fact, for fun take a look at the a.out manual page to see what
           the fixup formats look like).

        4) You have to be careful about flushing the instruction/data caches
           (due to #1 above), before executing the code that has been 'loaded'
           into memory.


Below is an example of a program that uses dynamic linking; it might
give you some insight into what's involved with dynamic linking on the
series 800 using the ld -A option.

  The -A option was implemented in the s800, HPUX 3.0 release.

  The -A option is used when you want to dynamically link a file 
  from an existing 'main' program. The link command is called from within
  the main program (using 'system()' or 'exec()'), using the main program
  as the basefile (ld -A basefile ...) so that any symbols defined in the 
  basefile will be used to resolve references from the file which is being 
  dynamically linked (for example if you want to make calls from a dynamically 
  linked function to routines which are defined in the main program). 
  Normally, space is allocated in the main program's data area using malloc(),
  but since you don't know the size of the executable file that you will be
  placing into the data area, the malloc size is just a guess.
  The address returned from malloc must be page-aligned, and then can be used
  in the link (ld -A basefile -R data_address ...) command to inform the linker
  to link the file using that address for code placement. The link command 
  should also specify the -N option to tell the linker to place the data
  immediately following the code, since we want code and data to be contiguous
  when we read it into the main program's data area.
  The executable file resulting from the link can then be read into the space 
  allocated using information from the HPUX auxiliary header record, such as 
  size of text, the file location of the program entry point, and the size of 
  data. The executable file is read into data, and then can be executed
  by dereferencing a function pointer which has been set to the address of
  the entry point (found in the HPUX auxiliary header).
  There are other details to be taken care of as well, such as doing a memset
  for BSS (to initialize all of bss to zero), since the loader (exec()) usually
  does that for you, and we are bypassing the loader. 
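
The page-alignment step described above can be sketched as follows. This is a minimal illustration assuming a modern POSIX system where sysconf(_SC_PAGESIZE) is available and the page size is a power of two; `page_align_up` and `alloc_load_area` are hypothetical helper names, not HP-UX library routines.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Round p up to the next multiple of the system page size.
   The caller must have allocated at least one extra page of slack. */
char *page_align_up(char *p)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t addr = (uintptr_t)p;

    return (char *)((addr + page - 1) & ~(page - 1));
}

/* Allocate room for `want` bytes starting on a page boundary.
   *raw_out receives the pointer that must later be passed to free(). */
char *alloc_load_area(size_t want, char **raw_out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *raw = malloc(want + page);    /* one page of slack for rounding */

    *raw_out = raw;
    return raw ? page_align_up(raw) : NULL;
}
```

The aligned pointer is what would go into the -R argument; the raw pointer is kept only so the block can be freed later.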

  Basic steps: 
  (Note: this is not necessarily a complete nor syntactically correct C
   program but serves for illustration only):

main()
{
     char *x; 
     int (*funcptr)();

     x = malloc(some_large_size);

     /* page align since ld expects page-align value for -R */ 
     x = page_align(x);   /* page_align() stands for any round-up-to-page helper */

     /* get the value of 'x' into the ld command that we are going to call */
     sprintf(cmd_buf, "ld -A basefile -R %x -N dynfunc.o -o dynfunc -e foo",x);
     
     /* call the linker to link the file */
     system(cmd_buf);

     /* now we open the resulting executable for reading */
     fileptr = fopen("dynfunc", "r");

     /* seek to and read the auxiliary header record */
     fseek(fileptr, sizeof(struct header), 0);
     fread(&filhdr, sizeof(filhdr), 1, fileptr);

     /* determine the size of the executable -- and see if we allocated enough
        space
     */
    dynfunc_size = filhdr.exec_dmem + filhdr.exec_dsize + filhdr.exec_bsize
                   - filhdr.exec_tmem;
    if(dynfunc_size > some_large_size) {
           /* do something -- either error, or realloc and relink */
    }

    /* seek to and read in the text area of the dynamically linked file */
    fseek(fileptr, filhdr.exec_tfile, 0);
    fread(filhdr.exec_tmem, filhdr.exec_tsize, 1, fileptr);

    /* seek to and read in the data area of the dynamically linked file */
    fseek(fileptr, filhdr.exec_dfile, 0);
    fread(filhdr.exec_dmem, filhdr.exec_dsize, 1, fileptr);

    /* init the BSS area to zero */
    memset(filhdr.exec_dmem+filhdr.exec_dsize, 0, filhdr.exec_bsize);

    /* set the function ptr to the entry point of the dynamically linked file */
    funcptr = (int (*)()) (filhdr.exec_entry);

    /* flush the data and instruction caches -- note this must be done on the
       series 800 ! -- see the flush_cache assembly routine below */

    flush_cache(filhdr.exec_tmem, filhdr.exec_tsize);

    /* call the dynamically linked function */
    (* funcptr)();
} /* END OF PROGRAM */
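
The copy-and-zero part of the steps above can be exercised on any system with the sketch below. It is an illustration under stated assumptions: `struct seg_info` is a hypothetical stand-in that only mirrors the exec_* fields used above (it is not the real HP-UX auxiliary header), `image_extent` and `load_image` are made-up names, and the load area is assumed to correspond to the text address. Cache flushing and the call through the entry point are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the auxiliary header fields used above.
   tmem/dmem are the addresses the linker assigned; the load area passed
   to load_image() is assumed to sit at address tmem. */
struct seg_info {
    long   tfile;  size_t tsize;  size_t tmem;   /* text: file pos, size, addr */
    long   dfile;  size_t dsize;  size_t dmem;   /* data: file pos, size, addr */
    size_t bsize;                                /* BSS size, follows the data */
};

/* Bytes needed from the start of text to the end of BSS. */
size_t image_extent(const struct seg_info *s)
{
    return s->dmem + s->dsize + s->bsize - s->tmem;
}

/* Copy text and data from fp into area and zero the BSS.
   Returns 0 on success, -1 on seek failure, short read or lack of room. */
int load_image(FILE *fp, const struct seg_info *s, char *area, size_t area_size)
{
    char *data = area + (s->dmem - s->tmem);

    if (image_extent(s) > area_size)
        return -1;
    if (fseek(fp, s->tfile, SEEK_SET) != 0 ||
        fread(area, 1, s->tsize, fp) != s->tsize)
        return -1;
    if (fseek(fp, s->dfile, SEEK_SET) != 0 ||
        fread(data, 1, s->dsize, fp) != s->dsize)
        return -1;
    memset(data + s->dsize, 0, s->bsize);   /* exec() would normally do this */
    return 0;
}
```

Checking image_extent() against the allocation before reading anything gives the caller the chance to reallocate and relink rather than overrun the buffer.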


The following is the routine that can be used to flush the caches courtesy
of Cary Coutant:

;
; Routine to flush and synchronize data and instruction caches
; for dynamic loading
;
; Copyright Hewlett-Packard Co. 1985
;

	.code

; flush_cache(addr, len) - executes FDC and FIC instructions for every cache
; line in the text region given by the starting address in arg0 and
; the length in arg1.  When done, it executes a SYNC instruction and
; the seven NOPs required to assure that the cache has been flushed.
;
; Assumption:  the cache line size must be at least 16 bytes.

	.proc
	.callinfo
	.export	flush_cache,entry
flush_cache
	.enter
	ldsid	(0,%arg0),%r1
	mtsp	%r1,%sr0
	ldo	-1(%arg1),%arg1
	fdc	%arg1(0,%arg0)
loop	fic	%arg1(%sr0,%arg0)
	addib,>,n	-16,%arg1,loop	; decrement by cache line size
	fdc	%arg1(0,%arg0)

	; flush first word at addr, to handle arbitrary cache line boundary
	fdc	0(0,%arg0)
	fic	0(%sr0,%arg0)
	sync
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	.leave
	.procend

	.end

cary@hpcllak.HP.COM (Cary Coutant) (06/08/89)

A few comments to the previous responses:

1.  An easier way to put your own junk at the end of an object
    file is to put 128 bytes of zeroes before the junk.  The
    linker will not attempt to read any more object modules
    from the file if it sees a header full of zeroes.

2.  The flush_cache() routine in the previous response was
    an old version that may not work correctly on some HP-PA
    implementations.  I've included the correct version below.

3.  One way to guarantee that you have to do the ld -A link
    only once is to call sbrk(0) before the link to obtain the
    starting address (for the -R option), then after the link
    use sbrk() to allocate enough space.  This technique
    assumes that you don't do anything that would cause a call
    to malloc() in between the two calls to sbrk().

4.  The linker does indeed round both text and data addresses
    to page boundaries because of loader (i.e., exec())
    requirements.  For -N links, this rounding should probably
    be eliminated, and we may fix this in a future release.
    For this reason, you should always make sure you look in
    the aux header exec_tmem and exec_dmem fields for the
    actual addresses.
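
Point 1 can be sketched in portable C as below. This is only an illustration: `SOM_HEADER_SIZE` and `append_with_zero_header` are hypothetical names, and the 128 is the header size quoted earlier in this thread rather than something read from <filehdr.h>.

```c
#include <stdio.h>

#define SOM_HEADER_SIZE 128   /* header size quoted earlier in the thread */

/* Append SOM_HEADER_SIZE zero bytes and then `len` bytes of payload to the
   file at `path`.  Returns 0 on success, -1 on any I/O error. */
int append_with_zero_header(const char *path, const void *payload, size_t len)
{
    static const char zeros[SOM_HEADER_SIZE];   /* zero-initialised */
    FILE *fp = fopen(path, "ab");
    int ok;

    if (fp == NULL)
        return -1;
    ok = fwrite(zeros, 1, sizeof zeros, fp) == sizeof zeros &&
         fwrite(payload, 1, len, fp) == len;
    if (fclose(fp) != 0)        /* buffered write errors can surface here */
        ok = 0;
    return ok ? 0 : -1;
}
```

A reader of the appended data then has to skip the 128 zero bytes as well as the object module itself.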

Cary Coutant, Hewlett-Packard Computer Language Lab

;
; Routine to flush and synchronize data and instruction caches
; for dynamic loading
;
; Copyright Hewlett-Packard Co. 1985
;

	.code

; flush_cache(addr, len) - executes FDC and FIC instructions for every cache
; line in the text region given by the starting address in arg0 and
; the length in arg1.  When done, it executes a SYNC instruction and
; the seven NOPs required to assure that the cache has been flushed.
;
; Assumption:  the cache line size must be at least 16 bytes.

	.proc
	.callinfo
	.export	flush_cache,entry
flush_cache
	.enter
	ldsid	(0,%arg0),%r1
	mtsp	%r1,%sr0
	ldo	-1(%arg1),%arg1
	copy	%arg0,%arg2
	copy	%arg1,%arg3

	fdc	%arg1(0,%arg0)
loop1	addib,>,n	-16,%arg1,loop1	; decrement by cache line size
	fdc	%arg1(0,%arg0)
	; flush first word at addr, to handle arbitrary cache line boundary
	fdc	0(0,%arg0)
	sync

	fic	%arg3(%sr0,%arg2)
loop2	addib,>,n	-16,%arg3,loop2	; decrement by cache line size
	fic	%arg3(%sr0,%arg2)
	; flush first word at addr, to handle arbitrary cache line boundary
	fic	0(%sr0,%arg2)

	sync
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	.leave
	.procend

	.end

andrew@comp.vuw.ac.nz (Andrew Vignaux) (06/09/89)

This is great -- thanks to everyone who responded.  However, there are
a few comments I would like to make.

In article <1340056@hpclmar.HP.COM> mar@hpclmar.HP.COM (Michelle Ruscetta) writes:
> You MUST use the "exec_entry" field in the HPUX auxiliary header (which
> immediately follows the standard file header), or use the entry_offset
> field in the file header.

At least in the version of the loader I am using (A.01.04??) both of
the fields point at the "main" program's $START$.  I can't use the -e
option because I don't know the name of the initial function.  Is it
unreasonable to get ld to default to the "first" function in the
dynamically loaded file when -A is used?

BTW: My tmem+4 hack doesn't work if the loaded object does any
indirect function calls.  I should probably search around in the
symbol table to find the correct address & 03.

I had not realised that I needed a flush_cache() routine after my
load.  I had read the note after the SYNC instruction, thought "You'll
never catch me writing self-modifying code", and promptly forgot it.

I don't think I would have guessed about ld's multiple module feature.
I was using a wrapper around ld to strip the "data" off while the load
was going on, which was a bit slow and a little dangerous.  Things are
working a lot better (and faster) now.

My latest incremental loading problem is trying to use function
pointers in the main program that have been computed in the
dynamically loaded routine (hereinafter referred to as fred).
Shouldn't the address of a routine be the address of the export stub?
I suspect function pointers from the main program, being used in fred,
would have the same problem -- but, fortunately, I don't think I need to
do that.

Does setjmp/longjmp cope with multiple space programs?

BTW: adb doesn't like my incrementally loaded objects. (segmentation fault)

Does the loader need to generate a different import stub for every
call for the same routine?

Thanks,
  Andrew
-- 
Domain address: andrew@comp.vuw.ac.nz   Path address: ...!uunet!vuwcomp!andrew