[comp.lang.c] near / far branches

dg@wrs.UUCP (David Goodenough) (10/16/87)

In article <117@nusdhub.UUCP> rwhite@nusdhub.UUCP (Robert C. White Jr.) writes:
>	First, a linker is a "linking loader" and its only REAL
>purpose is to resolve references and set up dynamic relocation
>information.  ALL "short" jumps [add/subtract value from the
>instruction pointer directly] are completely generated and closed
>in the assembler.  If there were to be an alteration in the size
>of a block of code-text, the linker would have to "reassemble" the
>code block to make up for the altered "short" jumps and such.  In
>order for the linker to do this it would either need the source
>code, or it would have to take a stab at disassembling the object
>code and hope not to get it wrong.

Not true - every (useful) assembler out there can generate information
that a given word needs to be relocated in some way, so why not just
have two types of relocation: far (for unresolved, long distance) and
near (for resolved, short distance)?
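A minimal C sketch of the two relocation kinds proposed above. The enum and struct names, the record layout, and the choice of a one-byte PC-relative field for "near" and a two-byte little-endian absolute field for "far" are assumptions for illustration, loosely modeled on x86 short jumps; they do not describe any real object format.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical relocation kinds, named after the proposal above. */
enum reloc_kind {
    RELOC_NEAR, /* resolved, short distance: 8-bit PC-relative displacement */
    RELOC_FAR   /* unresolved, long distance: 16-bit absolute address */
};

/* Hypothetical relocation record. */
struct reloc {
    enum reloc_kind kind;
    size_t offset;     /* position of the field within the code buffer */
    uint16_t target;   /* final address of the referenced label */
};

/* Patch one field of `code`, which is loaded at address `base`.
   Returns 0 on success, -1 if a NEAR displacement does not fit a signed byte. */
int apply_reloc(uint8_t *code, uint16_t base, const struct reloc *r)
{
    if (r->kind == RELOC_NEAR) {
        /* Displacement is measured from the end of the 1-byte field. */
        long disp = (long)r->target - ((long)base + (long)r->offset + 1);
        if (disp < -128 || disp > 127)
            return -1;
        code[r->offset] = (uint8_t)disp;   /* two's-complement byte */
    } else {
        /* Little-endian 16-bit absolute address, as on the x86. */
        code[r->offset] = (uint8_t)(r->target & 0xFF);
        code[r->offset + 1] = (uint8_t)(r->target >> 8);
    }
    return 0;
}
```

A NEAR record that fails is exactly the case where the branch would have to be widened, which is the cost the rest of the thread argues about.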

>	The assembler is designed to AUTOMATICALLY determine the
>"near"ness or "far"ness of a reference.  This works beautifully as
>long as you act like a Pascal programmer and only use backward
>references.  For the forward references you must use a keyword like
>"near" or "far" or else take what you get, and hope what you get
>is close enough to work.

Wrong again - I have seen many assemblers in my time, and only one was
a single-pass animal (and it was a real sharp piece of software: it had to
maintain two symbol tables, one for defined labels and another for
unresolved references). Think about it for half a second: you have to do
a second pass to resolve all the forward references anyway, so all you
do is add a link field to your symbol table, linking the symbols together
in address order; then whenever you munge a far / near branch, just run
up the chain adjusting all the references (this actually requires a third
pass, or O(N^2) time in the number of symbols).

>	It should be obvious that the assembler's job is to assemble
>and the linker's job is to link.

Open to discussion: I added the -X flag to my assembler to do an
assemble and link all in one go on a source. It saves some time, as I
get away with three passes (two for the assembly and one for linkage)
as opposed to four (two each for the separate assembly and linkage).
--
		dg@wrs.UUCP - David Goodenough

		..... !rtech!--\
				>--!wrs!dg
		  ..... !sun!--/

					+---+
					| +-+-+
					+-+-+ |
					  +---+

rwhite@nusdhub.UUCP (Robert C. White Jr.) (10/18/87)

As for your comments on my comments on the assembler, I must defer to
your greater knowledge.

My only exception to your statements is:

	When I used "SHORT" you started talking about "NEAR" and "FAR".
Under the Intel [grab-bag] there is an instruction which simply adds/
subtracts a signed byte from the instruction pointer.  In fact ALL
the conditional branching in the x86 family is done with these.
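To make the signed-byte limit concrete, here is a sketch of how an assembler might emit an 8086 conditional jump, falling back to the common trick of inverting the condition and hopping over a near JMP when the target is out of range. It relies on the short Jcc opcodes 0x70-0x7F coming in pairs that differ only in the low bit, and on JMP rel16 being opcode 0xE9; the function name `emit_jcc` is hypothetical. (The 80386 later added conditional jumps with longer displacements, so this applies to the 8086 through 80286.)

```c
#include <stdint.h>
#include <stddef.h>

/* Emit a conditional jump located at `at` that goes to `target`.
   `jcc` is a short-form 8086 opcode in 0x70..0x7F.
   Returns the number of bytes written to `out` (2 or 5). */
size_t emit_jcc(uint8_t *out, uint8_t jcc, uint16_t at, uint16_t target)
{
    /* Displacement is relative to the end of the 2-byte instruction. */
    long disp = (long)target - ((long)at + 2);
    if (disp >= -128 && disp <= 127) {
        out[0] = jcc;
        out[1] = (uint8_t)disp;          /* two's-complement signed byte */
        return 2;
    }

    /* Out of range: invert the condition (flip the low opcode bit) and
       skip over a 3-byte near JMP (0xE9 rel16) that reaches the target. */
    uint16_t rel = (uint16_t)((long)target - ((long)at + 5));
    out[0] = (uint8_t)(jcc ^ 1);
    out[1] = 3;                          /* jump over the JMP */
    out[2] = 0xE9;
    out[3] = (uint8_t)(rel & 0xFF);      /* little-endian displacement */
    out[4] = (uint8_t)(rel >> 8);
    return 5;
}
```

The 2-byte versus 5-byte difference is why a change in one branch can shift every address after it.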

The symbol table for load-time fixups, not runtime relocation, would
be horrendous, since we were talking about shrinking and growing the
operations between word and dword parameters for every FAR reference
which became NEAR during linking [i.e. "Why, that's right over here,
I don't need a segment on this reference."].  The problems become numerous:

1) All the conditional branches will have to have their inline constants
checked for validity, and each "short" label, which the assembler
now disposes of, will have to be placed in the external symbol table.

2) The linker will have to look at the entire body of code, and determine
if any of the internal or external references to a procedure [i.e.
uses of the "call" op] are actually far.  If none are, then the
linker can go ahead and change all the call ops, and the return ops,
to no longer contain a reference to a segment.  It must then:
	a) scan the body of the called procedure for any segment overrides
	referring to the code segment, and turn them into NEAR ops.
	b) scan the body of the text, and remove one of the
	negative-of-frame pops, because the frame has shrunk
	by one word.  This scanning would have to include iterated loops
	and simple addition of constants to the stack pointer or
	base pointer, or any register or memory location which assumes
	the value of the above during the functioning of the call, or
	any of its branches or children.
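The "shrunk by one word" in 2b can be made concrete for 16-bit code that uses the conventional `push bp; mov bp, sp` prologue: a far CALL pushes CS and IP, a near CALL pushes only IP, so the first word argument sits at [bp+6] in a far procedure and at [bp+4] in a near one. The sketch below uses hypothetical names and handles only the easy case of direct [bp+disp] operands; finding accesses made through copies of SP or BP, as described above, is the hard part it does not attempt.

```c
#include <stdint.h>

enum call_model { CALL_NEAR, CALL_FAR };

/* Bytes a CALL pushes for its return address on the 8086:
   IP only for a near call, CS and IP for a far call. */
static int return_addr_size(enum call_model m)
{
    return m == CALL_FAR ? 4 : 2;
}

/* BP-relative offset of word argument `n` (0-based), assuming the
   prologue `push bp; mov bp, sp`. */
int arg_offset(enum call_model m, int n)
{
    return 2 /* saved BP */ + return_addr_size(m) + 2 * n;
}

/* Adjust a BP-relative displacement when a far procedure becomes near:
   argument accesses move down one word; locals (negative) are untouched. */
int far_to_near_disp(int disp)
{
    int first_arg = arg_offset(CALL_FAR, 0);
    return disp >= first_arg ? disp - 2 : disp;
}

/* Return opcodes: RET (near) is 0xC3, RETF (far) is 0xCB. */
uint8_t ret_opcode(enum call_model m)
{
    return m == CALL_FAR ? 0xCB : 0xC3;
}
```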

<Jumps aren't this bad, but they have their own stickiness
	to complicate these "fixups">

.....  [stuff and issues ad nauseam deleted]   .....

My point was that any assembler worth its salt, in a similarly
valued programmer's hands, is MUCH better served by the judicious
use of "NEAR" and "FAR" than by screaming about how easy it
would be to make a linker which would "take a far call, turn it
into a near call, and then delete the no-ops to tighten the code".

Yes, assemblers send all the pertinent information to the linker,
and yes, the linker can do amazing things with it, and yes, assemblers
can do all sorts of optimizing things to assembly code.  That was part
of my point.

But when someone tells me that they want to have a "linker" go through
the output of their various compilers and assemblers and have it
re-optimize their code, right down to removing all those pesky extra
no-ops and over-referenced calls, I SCOFF.  Isn't that what I bought
my compiler and assembler for? <sort of>

If there is such a product, a "disassembler-optimizer-assembler-linker"
which can work with "any object output, no matter what the source"
and optimize it to such a persnickety level, there won't have to be any
programmers, because the language it was written in will take instructions
like "make me an inventory cost-accounting database package customized to
my needs.<CR>" as a fully functional programming directive.

I say: Write one for me, and I'll see how well it works.

Rob.

daveb@geac.UUCP (10/19/87)

  A very reasonable (and reasoned) discussion between David
Goodenough and Robert C. White Jr., et al., prompts me to paraphrase
Christopher Fraser thusly:

  It is noticeable that assemblers do a lot of address resolution,
passing resolution on to a linker in cases where the information in
the assembler file is insufficient.
  The linker, in turn, does a lot of machine dependent loading.

  It is a good idea to teach the assembler to be just a single-pass
translator, passing all resolution off to the linker.  It is
admitted that this means identifying internal vs. external symbols
to the linker so it does not mis-link an internal label "i" to an
external one or vice versa.
  It is now apparent that the linking algorithm is a merge-sort,
given a properly ordered set of linkable objects.  It is admitted
that this ordering may require a topological sort to properly order
the commands in the linkage control-file.
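The ordering step can be sketched as a Kahn-style topological sort over the objects named in the control file. Here a nonzero `before[i][j]` is an assumed encoding of "object i must be linked before object j"; what actually produces those constraints is not specified above, and `link_order` and `MAXOBJ` are hypothetical names. If the constraints form a cycle, no valid order exists.

```c
#include <stddef.h>

#define MAXOBJ 16  /* assumed fixed limit on objects, to keep the sketch small */

/* before[i][j] != 0 means object i must be linked before object j.
   On success writes a valid order into `order` and returns 0;
   returns -1 if the constraints contain a cycle. Requires n <= MAXOBJ. */
int link_order(size_t n, int before[MAXOBJ][MAXOBJ], size_t order[MAXOBJ])
{
    int indeg[MAXOBJ] = {0};   /* unsatisfied predecessors per object */
    int done[MAXOBJ] = {0};

    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (before[i][j])
                indeg[j]++;

    for (size_t k = 0; k < n; k++) {
        size_t pick = n;
        /* Take the lowest-numbered object with no pending predecessors. */
        for (size_t i = 0; i < n; i++)
            if (!done[i] && indeg[i] == 0) {
                pick = i;
                break;
            }
        if (pick == n)
            return -1;             /* everything left waits on something: cycle */
        done[pick] = 1;
        order[k] = pick;
        for (size_t j = 0; j < n; j++)
            if (before[pick][j])
                indeg[j]--;
    }
    return 0;
}
```

Picking the lowest-numbered ready object keeps the result deterministic, so objects with no constraints between them link in the order they were listed.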
  It is finally apparent that the algorithm is portable to many (but
not all) machines, with the machine-dependent portion placed in a
separate loader.

  This is not to say that the classical form of linking is wrong,
merely that it is historical...

 --dave
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.