[gnu.gcc] shared source question

allbery@ncoast.org (Brandon S. Allbery) (06/07/89)

In your message of Tue, 06 Jun 89 20:31:49 PDT, you write:
+---------------
| No doubt GNU LD could be modified to handle compound executables, or a separate
| utility could be used to produce them from individual architecture-specific
| executables.  The problem, however, is that the kernel must be modified to
| understand how to select and run the correct executable from a compound
| executable.  So when there is a GNU kernel, adding this support could be
| done rather easily I expect; without source to the kernel, there's no way
| to do this.
+---------------

I know about the kernel needing to know what's going on.  My thought was that
it might be possible to arrange for the kernel to use a different entry
point depending on the processor; the COFF header structure seems to have some
hooks for this, although they aren't quite good enough for the real world
(which is probably why AT&T's now pushing this new ELF format).  And in
these days of heterogeneous networks, maybe someone should push for such a
capability in BSD4.4.

Of course, I could probably figure out for myself whether this would work or
not if I knew what BSD uses for an executable header on various machines....

I suggested it because I seem to remember hearing something about a
technique Sun was using to get SPARC and Sun3 executables for SunOS4.0 into
the same executable file; this is exactly what we're looking for.  Of course,
if Sun put special hooks into the SunOS4 kernel to do this, it's no help;
but I was betting on Sun retaining as much BSD compatibility as possible.

++Brandon

rms@AI.MIT.EDU (06/08/89)

I will think about compound executables for the GNU system, but I
think it is premature to spend much time on the question now.

It's probably best to postpone this discussion until we know enough
about the GNU kernel to discuss implementation intelligently; and then
conduct it on a smaller list.  I am sure there are people on these
lists who would prefer to keep the volume down.

allbery@ncoast.org (Brandon S. Allbery) (06/08/89)

[Someone should let me know if this belongs somewhere other than info-gcc.
I'm not sure what all the Gnu-flavored lists are.

I've already received a few responses to my original letter; enough to show
me that I should have spoken in a bit more detail.  This is the response to
one of those letters.  ++bsa]

I'll explain my idea in a little more depth.  I am aware of such issues as
the kernel having to be able to identify the executable as such, the
inability to share symbol tables (I don't see this as any different from
being unable to share text segments), etc.  Here is one simple-minded
implementation, designed solely as an example of the basic idea I'm thinking
of.

My thought is to use a trick like encapsulated COFF (as I understand it,
which may be incorrect) to encapsulate multiple files, for different
machines, in a sort of archive:

	executable header		[points to proper executable]
	table of contents		[used by ld/archive maintainer]
	VAX executable
	SunOS4/SPARC executable
	SunOS4/Sun-3 executable
	Sequent executable
	...

To use the file for a particular architecture, point the text, data, bss,
symbol table, entry point, etc. pointers in the executable header to the correct
location for a particular machine's executable; these can be stored in the
table of contents, or the original header can be retained on each separate
executable.  (You might also have to copy the correct magic number to the
real header; as I mentioned in my last letter, I know more about COFF than
about BSD format.)
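
As a rough illustration of the layout above, here is a minimal sketch in
Python of a header, a table of contents, and the per-machine images packed
into one file.  Everything about the format is an assumption made up for the
example: the magic number `CMPX`, the field widths, the big-endian byte
order, and the function names.  It is not COFF, a.out, or anything a real
kernel or ld understands.

```python
import struct

MAGIC = b"CMPX"                       # hypothetical magic number
HEADER = struct.Struct(">4sI")        # magic, number of members
TOC_ENTRY = struct.Struct(">16sIII")  # arch name, offset, size, build time


def pack_compound(members):
    """Build a compound file from {arch: (image_bytes, build_time)}."""
    names = sorted(members)
    # The first image starts right after the header and the table of contents.
    offset = HEADER.size + TOC_ENTRY.size * len(names)
    toc, body = [], []
    for name in names:
        image, build_time = members[name]
        toc.append(TOC_ENTRY.pack(name.encode("ascii"), offset,
                                  len(image), build_time))
        body.append(image)
        offset += len(image)
    return HEADER.pack(MAGIC, len(names)) + b"".join(toc) + b"".join(body)


def read_toc(blob):
    """Return {arch: (offset, size, build_time)} from the table of contents."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a compound executable")
    entries = {}
    for i in range(count):
        raw, offset, size, build_time = TOC_ENTRY.unpack_from(
            blob, HEADER.size + i * TOC_ENTRY.size)
        entries[raw.rstrip(b"\0").decode("ascii")] = (offset, size, build_time)
    return entries


def extract(blob, arch):
    """Pull out the image for one machine, as an 'ld -extract' might."""
    offset, size, _ = read_toc(blob)[arch]
    return blob[offset:offset + size]
```

A loader (or a "stamp for this machine" utility) would look up its own
architecture in the table of contents and use only that member; the other
images are dead weight on that machine.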

Ld should treat this file as a kind of archive:  if the file exists when it
wants to write to it, ld splits the executable into its components, builds
the new object file for the target machine, then recombines it into the
encapsulated file.  Similarly, when it reads an object file it should act as
if only the section for the current target exists.  Also, /bin/as has to be
converted to write to such a file as if it were an archive, in the same way.
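
The split, rebuild, recombine cycle described for ld can be sketched as
follows.  This is only a model of the bookkeeping, not of linking: the
on-disk layout (a count, then name/size entries, then the images) and the
names `split`, `combine`, and `relink` are hypothetical, and the "new object
file" is just passed in as bytes.

```python
import struct

ENTRY = struct.Struct(">16sI")  # hypothetical entry: target name, image size


def split(blob):
    """Split a compound file into {target: image}."""
    (count,) = struct.unpack_from(">I", blob, 0)
    pos = 4
    sizes = []
    for _ in range(count):
        raw, size = ENTRY.unpack_from(blob, pos)
        sizes.append((raw.rstrip(b"\0").decode("ascii"), size))
        pos += ENTRY.size
    images = {}
    for name, size in sizes:
        images[name] = blob[pos:pos + size]
        pos += size
    return images


def combine(images):
    """Recombine {target: image} into a single compound file."""
    names = sorted(images)
    toc = b"".join(ENTRY.pack(n.encode("ascii"), len(images[n]))
                   for n in names)
    return struct.pack(">I", len(names)) + toc + b"".join(
        images[n] for n in names)


def relink(blob, target, new_image):
    """Replace (or add) one target's image, leaving the others untouched."""
    images = split(blob) if blob else {}
    images[target] = new_image
    return combine(images)
```

Reading works the other way around: a tool that only cares about the current
target would call `split` and ignore every member but its own.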

The main problem with this approach is that the date on the .o no longer
accurately reflects the age of the object for a particular target relative
to the source, if it was recompiled for another target.  Gnu make could know
about such executable archives and check a timestamp in the table of
contents instead of stat()'ing the file, to solve this problem.
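
To make the timestamp problem concrete, here is a small sketch of the check
make would have to do.  It assumes the table of contents records one build
time per target; the dictionary and the function name `needs_rebuild` are
invented for the example and are not part of GNU make.

```python
def needs_rebuild(toc_times, target, source_mtime):
    """True if the target's member is missing or older than the source.

    toc_times maps target name -> build time taken from the table of
    contents (hypothetical), rather than from stat() on the whole file.
    """
    built = toc_times.get(target)
    return built is None or built < source_mtime


# The source changed at time 200.  The SPARC member was rebuilt at 300,
# so the file's own modification time is 300 and stat() alone would call
# the whole file up to date, even though the VAX member dates from 100.
toc_times = {"vax": 100, "sparc": 300}
file_mtime = max(toc_times.values())
stale_by_stat = file_mtime < 200                     # False: looks current
stale_by_toc = needs_rebuild(toc_times, "vax", 200)  # True: VAX is stale
```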

Once this has been set up, just do an "rsh make" on each machine to update
the executable, then copy it to its final resting place on each machine and
stamp it for that machine -- or, with kernel source, teach the kernel how to
use the symbol table and just use the one copy.  Or have an "ld -extract"
function to pull the executable for a particular machine out of the file.

The advantage of this is that it's completely transparent as far as make'ing
the file for multiple machines.  With conditional symlinks, you need to know
to use the symlink for the .o file if it doesn't exist already... if you
don't set the thing up properly, it could blow up nicely.  I don't call that
transparent, even if the only change is explicitly coding the symlink info
into the Makefile.

It does, of course, have disadvantages... for example, the resulting files
are going to be HUGE.  Imagine an xemacs file containing the dumped Emacses
for all of the systems listed above, assuming 600K per system (conservative?)
-- then add in symbol tables for each; you can't share one symbol table for
all of them, obviously.  I can see the resulting "xemacs" reaching 10MB....

Still, for an NFS network, it might well be worth it.  And I *did* say that
it was a simple-minded approach; I expect that people who are used to
dealing with the mechanics of building executables can come up with better
schemes without thinking about it.

(My own solution to the original problem would be to symlink everything
except config.h out of a single common directory, then create and customize
local copies of config.h and possibly Makefile for each machine/architecture.
Remember KISS!)

++Brandon