[net.lang] Optimization technique wanted...

jss@sjuvax.UUCP (Jonathan Shapiro) (11/06/84)

[Aren't you hungry...?]

	I have a questino about optimizing assembler source... Given that
one is generating relocatable code on a microprocessor which has
offsets whose length in bytes is dependent on the offset amount - i.e.
jumping 63 bytes can be accomplished with a one byte relative address
value, while jumping 64 bytes needs a two byte relative address - how does
one handle the case of:

	B: <code>
	...
	JUMP C
	JUMP A
	JUMP B
	...
	A: <code>
	...
	C: <code>

Any thoughts???

johnl@godot.UUCP (11/10/84)

Long vs. short branch optimization is an interesting problem.  It turns out
that generating perfectly optimal branches is NP-complete, i.e. there's no
known method that's better than enumerating the possibilities, although it's
possible that there is one that we haven't found one yet.  (If you find one,
please let me know.  You could even call collect.)

Fortunately, there are practical nearly-optimal schemes for real programs.
One was written up in CACM a few years ago.  In general, you start by assuming
that all of your branches might have to be long, although you pretend
that they're all short and assign tentative addresses that way.  Then you
interate over a table of all of the branch instructions that you have to make
a decision about, marking those that can definitely be short, even if all of
the undecided branches in their span turned out to be long.  When you find
that you can't mark any more, the unmarked ones are long and the marked ones
are short.

The Vax Unix assembler does this, and it usually turns out that 2 or 3 passes
over the table of branches do the trick.  This approach sometimes misses a
marginal case, but that's better than overoptimizing and generating bad
code.  I recommend the CACM article if you want to understand it better.

John Levine, ima!johnl

eich@uiucdcsb.UUCP (11/11/84)

See "Assembling Code for Machines with Span-Dependent Instructions",
Thomas G. Szymanski, CACM April 1978 Volume 21 No. 4 p 300-308.

Brendan Eich
uiucdcs!eich

ted@scc.UUCP (Ted Goldstein) (11/27/84)

> Long vs. short branch optimization is an interesting problem.  It turns out
> that generating perfectly optimal branches is NP-complete, i.e. there's no
> known method that's better than enumerating the possibilities, although it's
> possible that there is one that we haven't found one yet.  (If you find one,
> please let me know.  You could even call collect.)
> 
> Fortunately, there are practical nearly-optimal schemes for real programs.
> One was written up in CACM a few years ago.  In general, you start by assuming
> that all of your branches might have to be long, although you pretend
> that they're all short and assign tentative addresses that way.  Then you
> interate over a table of all of the branch instructions that you have to make
> a decision about, marking those that can definitely be short, even if all of
> the undecided branches in their span turned out to be long.  When you find
> that you can't mark any more, the unmarked ones are long and the marked ones
> are short.
> 
> The Vax Unix assembler does this, and it usually turns out that 2 or 3 passes
> over the table of branches do the trick.  This approach sometimes misses a
> marginal case, but that's better than overoptimizing and generating bad
> code.  I recommend the CACM article if you want to understand it better.
> 
> John Levine, ima!johnl


Well, it depends on the architecture. On the 8086 and family, where jumps
are relative  8 or 16 bit branch to the Program Counter register, we
only have to keep the previous 256 bytes worth of instructions in a ring
buffer.
A jump is first considered to be a long jump. If its target appears before
the jump leaves the ring buffer, it collapses to a short jump.
The test for collapsing it occurs only on entry and on exit to and from
the ring buffer. The list of jump-target symbols is kept in the symbol
table, where it is easily accessed.

I am not a mathmatician, but I belive that this algorithm is order O(n).

Its been a while since I've looked at the VAX instruction set. 
Would this algorithm apply there?

Sincerely yours,

Ted Goldstein
Computer Innovations Inc.




Any comments?

crandell@ut-sally.UUCP (Jim Crandell) (11/28/84)

> A jump is first considered to be a long jump. If its target appears before
> the jump leaves the ring buffer, it collapses to a short jump.
> 
> I am not a mathmatician, but I belive that this algorithm is order O(n).
> 
> Any comments?

Well, since you asked, yes.  It is O(n).  It's also imperfect.  Since
you effectively make only one pass over the jump list, you get only the
first-order effects; i.e., you initially assume (correctly) that all
jumps will be long, so as the hope of shortening each one wanes, you
give up on it without allowing yourself to take advantage of the
abbreviation of its postrange caused by the shortening of jumps whose
destinations haven't appeared yet at the time of the decision.

Funny thing about O(p(n)) algorithms for NP-complete problems.  They
usually don't work quite right.
-- 

    Jim Crandell, C. S. Dept., The University of Texas at Austin
               {ihnp4,seismo,ctvax}!ut-sally!crandell

henry@utzoo.UUCP (Henry Spencer) (11/28/84)

> Well, it depends on the architecture. On the 8086 and family, where jumps
> are relative  8 or 16 bit branch to the Program Counter register, we
> only have to keep the previous 256 bytes worth of instructions in a ring
> buffer.

You could do this on most any machine with span-dependent instructions.
But note that this algorithm leaves some jumps long unnecessarily, and
in fact isn't as good as the CACM algorithm, because of problems with what
one might call "feedback loops" when one jump is jumping over another.
I believe the CACM paper demonstrated formally that no algorithm which
initially assumes jumps are long, and then tries to shorten them, can
give optimal results.  You have to work the other way, lengthening jumps
only as necessary.

The "ring buffer" (actually it's a FIFO) for output code does get *most*
of the shortenable jumps, though, and it's nice and simple.
-- 
	"Make mine machine-readable!!"

				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

john@dido.UUCP (John Collins) (11/29/84)

I once worked on a crazy British machine where you had a choice between

	-	Signed 8-bit displacements for conditional branches.

	-	Signed 9-bit displacements for unconditional branches.

	-	Unsigned 7-bit displacements to an indirect word lower
		down in the code (which had to be inserted after some
		unconditional branch) if the above 2 proved too little.

This meant that some code looked like (mnemonics different, won't bore
you with those this time round).

		bz	one
		bindir	two
	one:	..... code .....
		................
		bmi	three	|  This could fall over with > 126 indirects
		bindir	four
	two:	.word	five
	four:	.word	six
	three:

The assembler gave no help whatsoever, so you wrote your program, and then
produced an assembly listing with its 100 or so displacement errors and
whittled them down by hand.  I wrote 2 compilers for that machine.  My
eventual approach was to do almost everything by indirects with the compiler
breaking code to jump around a block of indirect words every so often.

Would love to hear comments on that lot!

-- 
		John Collins

Please note that I am visiting Sweden.  Address all replies to

			ist!inset!jmc

Phone:		+44 727 57267
Snail mail:	47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England.

crandell@ut-sally.UUCP (Jim Crandell) (12/04/84)

> The assembler gave no help whatsoever, so you wrote your program, and then
> produced an assembly listing with its 100 or so displacement errors and
> whittled them down by hand.  I wrote 2 compilers for that machine.  My
> eventual approach was to do almost everything by indirects with the compiler
> breaking code to jump around a block of indirect words every so often.

And all this time I thought the Nova was developed in the U.S.
-- 

    Jim Crandell, C. S. Dept., The University of Texas at Austin
               {ihnp4,seismo,ctvax}!ut-sally!crandell