[comp.sys.nsc.32k] Gcc 1.39, some bugs fixed & cleaner diffs

jkp@sauna.hut.fi (Jyrki Kuoppala) (03/04/91)

I did some gcc hacking.  The result (a new, cleaner (huh?) patch kit
for gcc 1.39 for the pc532) is in
nic.funet.fi:~ftp/pub/misc/pc532/diffs-1.5/gcc-1.39-DIFFS.Z.

Works with pc532 Minix (GNU or WBC tools) or Mach, all with only one
print_operand_address function (well, the gcc 1.39 original is still
there but not used for the pc532).

Before you ask, of course its calling convention is incompatible with
the previous version I announced.  Alignment is now on
double-word boundaries for pointers and functions and
pcc-struct-return is not used.  But structure alignment is still only
16 bits in the Minix tm file (properly 32 for Mach), which I don't
like, but can't change because my filesystem is done with 16-bit
alignment and I have no decent backup scheme.
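
To make the alignment issue concrete, here is a minimal sketch of how
the same record lays out under 16-bit and natural alignment.  The
structure and its field names are hypothetical (not taken from Minix),
and `#pragma pack` is a compiler extension (GCC, Clang and MSVC accept
it) standing in for what the tm file decides on the real target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, not a real Minix structure. */
#pragma pack(push, 2)            /* cap member alignment at 16 bits */
struct rec_align16 {
    uint16_t mode;
    uint32_t size;
};
#pragma pack(pop)

struct rec_natural {             /* compiler's default alignment */
    uint16_t mode;
    uint32_t size;
};

/* Offset of the 32-bit field under each layout. */
size_t size_offset_align16(void)
{
    return offsetof(struct rec_align16, size);
}

size_t size_offset_natural(void)
{
    return offsetof(struct rec_natural, size);
}

/* Nonzero when the two layouts disagree, i.e. when data written with
   one layout cannot be read back with the other. */
int layouts_differ(void)
{
    assert(offsetof(struct rec_align16, mode) == 0);
    return size_offset_align16() != size_offset_natural()
        || sizeof(struct rec_align16) != sizeof(struct rec_natural);
}
```

On a host where uint32_t is 4-byte aligned, the packed layout puts
`size` at offset 2 and the natural layout puts it at offset 4, which
is why data written with one alignment is unreadable with the other.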

What would you think about using registers for parameter passing?
It would generate faster code.  I don't know whether it makes
debugging more difficult, or even whether it works with the ns32k
target, but it could be worth a try.

Here's the readme file from the patch kit:


Patch your gcc distribution with:

patch -p1 < /usr/diffs/gcc-1.39-DIFFS

Installation for Minix 1.5.10 (or hybrid):


./config.gcc pc532 or pc532-minix for GNU tools (recommended)
./config.gcc pc532-wbc for Bruce's tools
./config.gcc pc532-mach if you are running Mach

Edit Makefile:

- to use gcc as CC if you have a working gcc installed as gcc
- to use gcc as OLDCC if needed (at the end of installation
  gnulib and gnulib2 should be compiled with the new gcc)
- to put the cc1, cpp and others in a suitable place
- to contain -lflt in CLIB if you use the original Minix libc.a
  with no floating point built in
- to build hard-params with -lflt (just add $(CLIB))
  if you use the original Minix libc.a
- change -g to -O if you use Bruce's tools (debugging doesn't
  work with Bruce's tools; I think they don't understand stabs.
  It works fine with GNU tools, however.)
- when using GNU tools and building gcc and gnulib with itself,
  use the newly installed gcc as OLDCC and change AR, OLDAR,
  and RANLIB to be the GNU ones


There's a bug in gas 1.38.1 case statement handling (.word X-Y) which
makes insn-recog.c go into a tight loop if it is compiled with -O.
Compiling it with -g hides the bug: a gcc otherwise compiled with -O,
but with insn-recog.c compiled with -g, works.  However, the bug might
cause other serious harm as well.

The Bruce syntax part has not been extensively tested.  It seems to
work with small programs, at least.


Babbling about addressing modes used for external labels and such:

Bruce's way of generating pc-relative addressing modes does make the
code smaller if the called function happens to be near.  A macro,
PC_RELATIVE, now controls the behaviour: if it is defined, SYMBOL_REFs
and LABEL_REFs are compiled to pc-relative addressing modes.
PC_RELATIVE currently works only with Bruce's assembler - I suspect a
bug in gas is the reason why it doesn't work with gas syntax.

A little bit of testing with the different addressing modes:

With GNU tools, here is the relevant source and what gdb shows it
assembles to in a.out format (yes, there really seem to be at least
four different ways to do a simple subroutine call on the 32532 with
one instruction, and then gcc sometimes even loads the address into a
register and jumps via it!  Well, that might be efficient if the call
is made several times).

#APP
LC1:
	.ascii "Hello World!\12\0"
.globl _main
_main:
	jsr	_foo(pc)	# jsr	0x61 <foo>, 3 bytes, not that bad.
				#	This is what Bruce's gcc
				#	uses and what PC_RELATIVE does
	jsr	_foo(sb)	# jsr	97(sb), always (?) 6 bytes I think, bad.
				# Probably no reason to use this
	jsr	_foo		# jsr	@0x61 <foo>, always 6 bytes, bad
	jsr	@_foo		# jsr	@0x61 <foo>, 6 bytes, same as previous
				# This is what ylo's p_o_a does without
				# PC_RELATIVE
	bsr	_foo(pc)	# bsr	0x61 <foo>, 2 bytes,  this is the best.
	bsr	_foo		# same as previous.
				# This should be used always for ext. labels
				# (saves one byte over jsr),
				# however gcc doesn't seem to be smart
				# enough to do it even though the mechanism
				# for detecting symbol_refs & label_refs
				# exists
.globl _foo
_foo:
	addr @LC1,tos
	addr @___stderr,tos
	jsr @_fprintf
	adjspb -8
	ret	0

Same with Bruce tools:

.globl _main
LC1:
	.ascii "Hello World!\12\0"
_main:
	jsr	_foo(pc)	# jsr	0x6f, 3 bytes
				# This is used when CONSTANT_ADDRESS_P
				# is not true in the call
	jsr	_foo(sb)	# jsr	111(sb), 6 bytes always (?), bad
	jsr	@_foo		# jsr	@0x6f, 6 bytes always, bad
	bsr	_foo		# bsr	0x6f, 2 bytes
				# when CONSTANT_ADDRESS_P is true,
				# this is used - but it's quite rare
.globl _foo
_foo:
	addr LC1(sb),tos
	movd __io_table+8(sb),tos
	bsr _fprintf
	adjspb -8
	ret	0


So, it seems the most efficient thing would be to use bsr whenever
possible.  One might argue that a displacement can't address all of
the memory space, but I don't think gcc can generate code to work on
the higher / middle part of the memory anyway - that would need some
work, and even then I don't think it can be done efficiently.  A
-m-option for this should be added, but I think it's a lot of trouble.

There's code in the md file to use bsr if CONSTANT_ADDRESS_P is true,
but apparently for some reason CONSTANT_ADDRESS_P mostly is not true
for normal library calls etc.  This needs more research.
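
The byte counts quoted in the listings can be reproduced with a small
calculation.  This is a sketch under stated assumptions: the
displacement ranges (7, 14 and 30 bits for 1, 2 and 4 bytes) and the
opcode lengths (1 byte for bsr, 2 for jsr) are my reading of the
Series 32000 encoding, and the function and enum names are made up
for the example.

```c
#include <assert.h>

/* Bytes needed to encode a displacement (assumed ranges, see text). */
int disp_bytes(long d)
{
    if (d >= -64 && d <= 63)
        return 1;               /* 7-bit signed */
    if (d >= -8192 && d <= 8191)
        return 2;               /* 14-bit signed */
    return 4;                   /* 30-bit signed */
}

/* The call forms compared in the listings (names are hypothetical). */
enum call_form { CALL_BSR, CALL_JSR_PC, CALL_JSR_ABS };

/* Total instruction length for a call with the given displacement. */
int call_bytes(enum call_form form, long disp)
{
    switch (form) {
    case CALL_BSR:
        return 1 + disp_bytes(disp);    /* 1 opcode byte */
    case CALL_JSR_PC:
        return 2 + disp_bytes(disp);    /* 2 opcode bytes */
    case CALL_JSR_ABS:
        return 2 + 4;                   /* address kept at full width */
    }
    assert(0 && "unknown call form");
    return -1;
}
```

For a nearby target this gives 2 bytes for bsr, 3 for jsr _foo(pc) and
6 for jsr @_foo, matching the comments in the listings; bsr stays one
byte ahead of pc-relative jsr at every distance.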


Changes in this version from the base gcc 1.39:

- support for new targets pc532{,-minix,-wbc,-mach} in config.gcc
  - tm-pc532.h, tm-pc532-min.h, tm-pc532-wbc.h, xm-pc532-min.h
- support for Minix (tm-pc532-min.h)
  - packing structures tighter incompatibly with the general
    ns32k world, due to compatibility reasons.  This should IMHO
    be changed, but I can't do it because my file system depends
    on the incompatible alignment
- support for Bruce syntax assembler (tm-pc532-wbc.h)
- rewritten print_operand_address, much cleaner (change in out-ns32k.c)
- some macros like IMMEDIATE_PREFIX for easier assembler syntax variations,
  used in tm-ns32k.h
- fix for the and/not commutativity bug in ns32k.md
- some GAS & Bruce assembler support changes in ns32k.md
- somewhat relaxed MEM_REG macro in tm-ns32k.h - this should work,
  changed GET_CODE == SYMBOL_REF to CONSTANT_ADDRESS_P (X)
- support in the new print_operand_address (out-ns32k.c)
  for generating pc-relative references for smaller code for SYMBOL_REF and
  LABEL_REF if PC_RELATIVE defined,
  also BASE_REG_NEEDED macro to be defined for
  assemblers like Bruce's which require a base register on
  all displacements, we'll fill in (sb).
  - relevant changes for the new function in tm-sequent.h and tm-encore.h,
    in fact just removed the #define for PRINT_OPERAND_ADDRESS
    because it's now in tm-ns32k.h
- alignment on double-word (32-bit) boundaries for functions and
  pointers to gain speed on the 32532 (tm-pc532.h; see also
  tm-pc532-min.h for the tighter Minix structure packing)
- symout.c: an added 'extern' needed for the ASM_OUTPUT_LOCAL hook
  for wbc syntax (ASM_OUTPUT_LOCAL used in tm-pc532-wbc.h)
- conditionals for alignment by macro GAS_SYNTAX (tm-ns32k.h),
  so every tm file which uses gas syntax doesn't have to redefine these
- support for DBX_DEBUGGING_INFO (tm-pc532.h)
- pcc-struct-return not used (tm-pc532.h)
- new switch -mnosb0 to avoid presuming sb is zero (tm-ns32k.h)
  This does not really work yet; some support for it could be taken
  from tm-genix.h.
- new config files for tm-icm-*.  They just kind of crept in,
  they're for Ian Dall's machine, and I haven't really tested if they
  even compile now.
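
As an illustration of the PC_RELATIVE and BASE_REG_NEEDED items in the
list above, here is a minimal sketch of how a symbol operand might be
spelled in each case.  It is not the code from the patch kit: the
function name is hypothetical, the two macros are modelled as run-time
flags so all cases can be tried in one program, and the precedence
between them is an assumption.  The real print_operand_address handles
many more address forms.

```c
#include <assert.h>
#include <stdio.h>

/* Write the assembler spelling of a symbol operand into buf.
   pc_relative and base_reg_needed are stand-ins for the PC_RELATIVE
   and BASE_REG_NEEDED macros (assumed precedence: pc first).
   Returns the length of the text, as snprintf does. */
int format_symbol_operand(char *buf, size_t len, const char *sym,
                          int pc_relative, int base_reg_needed)
{
    assert(buf != NULL && sym != NULL);
    if (pc_relative)
        return snprintf(buf, len, "%s(pc)", sym);  /* e.g. _foo(pc) */
    if (base_reg_needed)
        return snprintf(buf, len, "%s(sb)", sym);  /* e.g. _foo(sb) */
    return snprintf(buf, len, "@%s", sym);         /* e.g. @_foo */
}
```

With sym set to "_foo", the three cases produce the operand spellings
that appear in the jsr lines of the listings.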

Todo list:

- finish the -mnosb0 switch
- make sure tm-encore.h, tm-sequent.h, tm-genix.h and tm-icm*.h work,
  I don't suppose they work now.
- still cleanups needed, esp. in ns32k.md the GAS_SYNTAX and BSR_HACK mess
- perhaps add a -mno-disp or something switch to make memory references
  not use displacements.  This would probably require major changes
  and not make efficient code.
- perhaps use register parameters for efficiency in tm-pc532.h
- get PC_RELATIVE working with gas
- (not ns32k-specific): add a feature to save call-saved registers
  in a function which calls setjmp, so gcc works with original Minix
  setjmp (not really very important)
- proper profiling macros for Minix and Mach if the one in tm-ns32k.h
  doesn't work
- work on gcc 2 ns32k backend
- after doing all this, get these changes to official gcc 2 and next gcc 1.XX
  distribution

This print_operand_address stuff for ns32k is rather hilarious.  In
the gcc 1.39 distribution, there are two of them to begin with - one in
the tm-ns32k.h file (which is not used in any tm files distributed
with gcc) and one in out-ns32k.c.  The one originally in out-ns32k.c
calls itself passing arguments in static variables ;-).  And I don't
think it works properly with gas.  So, Ian Dall patched it to make it
work with the ICM assembler (and I think also with gas).  Also, Bruce
rewrote it for his assembler.  Then, ylo@hut.fi rewrote it for gas for
the Mach port.

I'm using ylo's version, which I modified to support Bruce's assembler
(only a few lines added, really).  The only difference I noticed in a
quick test is that external label + offset uses (sb) instead of (pc),
which Bruce's version uses.

If you use this on the pc532 Minix, it's a good idea to recompile all
libraries, the kernel and at least ps (which fails otherwise), because
the alignment differs from the previous version and pcc-struct-return
is no longer used.

//Jyrki

culberts@hplwbc.hpl.hp.com (Bruce Culbertson) (03/06/91)

Hurray for Jyrki and all the work he has done with the GNU stuff!
I haven't tried his code yet because my pc532 is down at the
moment, thanks to my recent move, but I have heard good reports from
others who have tried it.

> ...But structure alignment is still only
> 16 bits in the Minix tm file (properly 32 for Mach), which I don't
> like, but can't change because my filesystem is done with 16-bit
> alignment and I have no decent backup scheme.

The Minix FS code reads and writes arrays of inode structures to the
inode blocks in the file system.  This kind of code is inherently
non-portable since different compilers pack structures differently.  It
would be nice if the Minix code were made portable, although it would
result in some performance penalty.  I better not criticize too loudly
because someone might point out that my assembler and linker have the
same kind of problem -- they read and write arrays of symbol table
structures.  I need to fix that -- I want the a.out format to stay
the same, regardless of what compiler you use.
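
A minimal sketch of the portable alternative: instead of writing the
in-memory structure directly, each field is copied to a fixed byte
position, so the on-disk layout no longer depends on how the compiler
packs structures.  The record and its field names are hypothetical
(not the Minix inode), and the little-endian byte order is a choice
made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record; not a real Minix structure. */
struct disk_rec {
    uint16_t mode;
    uint32_t size;
};

#define DISK_REC_BYTES 6        /* 2 + 4, no padding on disk */

/* Store the record at fixed offsets, least significant byte first. */
void rec_encode(const struct disk_rec *r, unsigned char *out)
{
    assert(r != NULL && out != NULL);
    out[0] = (unsigned char)(r->mode & 0xff);
    out[1] = (unsigned char)((r->mode >> 8) & 0xff);
    out[2] = (unsigned char)(r->size & 0xff);
    out[3] = (unsigned char)((r->size >> 8) & 0xff);
    out[4] = (unsigned char)((r->size >> 16) & 0xff);
    out[5] = (unsigned char)((r->size >> 24) & 0xff);
}

/* Rebuild the record from the same fixed offsets. */
void rec_decode(struct disk_rec *r, const unsigned char *in)
{
    assert(r != NULL && in != NULL);
    r->mode = (uint16_t)(in[0] | (in[1] << 8));
    r->size = (uint32_t)in[2]
            | ((uint32_t)in[3] << 8)
            | ((uint32_t)in[4] << 16)
            | ((uint32_t)in[5] << 24);
}
```

The per-field copying is where the performance penalty comes from: an
array of records can no longer be moved with a single read or write
into the structure array.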

> Bruce's way of generating pc-relative addressing modes does make the
> code smaller if the called function happens to be near - a macro
> PC_RELATIVE now controls the behaviour - if defined, SYMBOL_REFS and
> LABEL_REFS are compiled to pc-relative addressing modes.

I developed an algorithm which allows my assembler to crunch
displacements (which are variable length on the 32000) down to the
smallest usable size (with some rare exceptions), even for forward
references.  My assembler crunches all displacements it can, even in
addressing modes other than PC-relative.  While this may not quite be a
miracle of programming, you can pay a fair amount of money for
assemblers which cannot do this, e.g. Microsoft's MASM and National's
GENIX assembler (at least in versions I have tried).  Unfortunately,
since this is done at assemble time, not link time, displacements to
external symbols and symbols in BSS and DATA cannot be crunched.
Crunching displacements does make the code a little smaller, which takes
better advantage of the small 32532 I-cache.
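
One common way to size variable-length displacements with forward
references is to start every displacement at its smallest size and
grow any that turn out too small, repeating until nothing changes.
The sketch below shows that generic technique; it is not a
description of the algorithm in Bruce's assembler, and the structure,
the names and the assumed displacement ranges (7, 14 and 30 bits) are
illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSNS 64

struct insn {
    int opcode_bytes;   /* fixed part of the instruction */
    int target;         /* index of the instruction referred to, or -1 */
    int disp_bytes;     /* chosen displacement size: 0, 1, 2 or 4 */
};

/* Smallest displacement size that can hold d (assumed ranges). */
static int bytes_for(long d)
{
    if (d >= -64 && d <= 63)
        return 1;
    if (d >= -8192 && d <= 8191)
        return 2;
    return 4;
}

/* Choose displacement sizes for n instructions; returns total size. */
long relax(struct insn *code, size_t n)
{
    long addr[MAX_INSNS + 1];
    int changed;
    size_t i;

    assert(n <= MAX_INSNS);
    for (i = 0; i < n; i++) {
        assert(code[i].target < (int)n);
        code[i].disp_bytes = code[i].target >= 0 ? 1 : 0;
    }
    do {
        changed = 0;
        /* Lay out the code with the current size guesses. */
        addr[0] = 0;
        for (i = 0; i < n; i++)
            addr[i + 1] = addr[i] + code[i].opcode_bytes
                        + code[i].disp_bytes;
        /* Grow any displacement that no longer fits. */
        for (i = 0; i < n; i++) {
            int need;
            if (code[i].target < 0)
                continue;
            need = bytes_for(addr[code[i].target] - addr[i]);
            if (need > code[i].disp_bytes) {
                code[i].disp_bytes = need;
                changed = 1;
            }
        }
    } while (changed);
    return addr[n];
}
```

Sizes only ever grow and are bounded by 4 bytes, so the loop
terminates.  A forward call over 100 bytes of other code ends up with
a 2-byte displacement, while one over 10 bytes keeps a 1-byte
displacement.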

> So, it seems the most efficient thing would be to use bsr whenever
> possible.

The 1.35 gcc, which I distributed with pc532-minix, uses bsr most of
the time.  1.37 seems to use jsr for some reason and, I gather, 1.39
does also.  Those later versions seem to have a problem recognizing
constant addresses.

By the way, I used PC-relative addressing as much as possible in an
effort to make the code as position-independent as possible.  Because
C lets you do things like initialize pointers, and because I like
tools which do such initializations at compile-time, it is hard to
make the code completely position-independent.

Keep up the good work, Jyrki!