[net.unix-wizards] magic numbers?

rich@rexago1.UUCP (K. Richard Magill) (11/25/85)

Two questions. I presume they are related. Answers or pointers to doc would
be appreciated.

	1)  How does the shell (exec?) know whether the command I just typed
		is a shell script or one of several possible types of
		executable?

	2)  Presuming the answer to #1 above has something to do with
		magic numbers, who issues them?  is there a common
		(definitive) base of them or does each
		manufacturer/environment make up their own set?

K. Richard Magill
(someplace between an advanced user & a guru)

pdg@ihdev.UUCP (P. D. Guthrie) (11/26/85)

In article <124@rexago1.UUCP> rich@rexago1.UUCP (K. Richard Magill) writes:
>Two questions. I presume they are related. Answers or pointers to doc would
>be appreciated.
>
>	1)  How does the shell (exec?) know whether the command I just typed
>		is a shell script or one of several possible types of
>		executable?
>

The shell doesn't know.  The shell merely tells the kernel to exec the
file, after doing a fork.  The kernel determines if a file is a binary
executable by the magic number, which is obtained by reading an a.out.h
structure (4.1,4.2) or filehdr.h (sys 5) and comparing it against
hardcoded numbers in the kernel. In 4.1, for instance, only 0407, 0410 and
0413 are legal.  The magic number also tells the kernel the specific type of
executable, and in some cases can set emulation modes. The kernel also
recognizes 
#! /your/shellname
at the beginning of a file and execs off the appropriate shell instead.
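
A rough sketch of the check described above, not the actual kernel code.
The names classify_header and exec_kind are made up for illustration, and
the sketch assumes the magic number sits in the first two bytes of the file
in little-endian order (as on the PDP-11 and VAX); real headers differ by
system.

```c
#include <stddef.h>

/* Octal a.out magic numbers named in the post (4.1BSD). */
#define OMAGIC 0407
#define NMAGIC 0410
#define ZMAGIC 0413

enum exec_kind { EXEC_UNKNOWN, EXEC_AOUT, EXEC_INTERP };

/* Classify a file from its first bytes.
 * Assumption: the magic number is a little-endian word at offset 0. */
enum exec_kind classify_header(const unsigned char *buf, size_t len)
{
    unsigned int magic;

    if (len < 2)
        return EXEC_UNKNOWN;
    if (buf[0] == '#' && buf[1] == '!')
        return EXEC_INTERP;            /* "#! /your/shellname" line */
    magic = buf[0] | ((unsigned int)buf[1] << 8);
    if (magic == OMAGIC || magic == NMAGIC || magic == ZMAGIC)
        return EXEC_AOUT;              /* one of the hardcoded numbers */
    return EXEC_UNKNOWN;               /* exec would fail on this file */
}
```

Anything that falls through to EXEC_UNKNOWN is what the kernel refuses to
run.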

>	2)  Presuming the answer to #1 above has something to do with
>		magic numbers, who issues them?  is there a common
>		(definitive) base of them or does each
>		manufacturer/environment make up their own set?

The magic number is issued by the linker/loader.  Pretty much the magic
number is decided by the manufacturer, but from what I have seen, it is
kept constant over machines.  Forgive me if this is wrong, as I have no
means of checking, but I believe the magic numbers for, say, a plain
executable on 4.x Vax and a plain executable on SysV.x Vax are the same,
while SysV.x Vax and SysV.x 3B20 are different.  Could someone confirm this?

>
>K. Richard Magill
>(someplace between an advanced user & a guru)

					Paul Guthrie

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (11/27/85)

There is no central authority who issues "official" magic numbers
for UNIX.  System V keeps a list in /etc/magic (perhaps
/usr/lib/magic or /usr/5lib/magic, depending on your system) and
the "file" command uses this (along with some heuristic tests) to
identify types of files.  a.out types are hard-wired into the
kernel (they're usually described in /usr/include/a.out.h).  The
same type of a.out is supposed to have the same magic number
across UNIX implementations, but this scheme has broken down
over the years and it is likely that different CPUs and even
different UNIX vendors use different numbers for the same type.

geoff@ISM780B.UUCP (11/27/85)

> Two questions. I presume they are related.
>      1)  How does the shell (exec?) know whether the command I just typed
>      is a shell script or one of several possible types of executable?
Magic numbers are indeed the answer.  If the file mode allows execution, the
first few bytes are read (probably sizeof (struct filehdr)), and if they
contain a recognized magic number, the file is exec'd appropriately; otherwise
a shell script is assumed.  One drawback of this approach is that if an
executable has an unknown magic number (say you got it off a different
system), the shell will try to interpret it as a script, causing various
syntax errors (the most common is "unexpected `('" ).
>        2)  Presuming the answer to #1 above has something to do with
>        magic numbers, who issues them?  is there a common (definitive)
>        base of them or does each manufacturer/environment make up their
>        own set?
On Sys5 (at least), there is an ASCII file called /etc/magic, which contains
magic numbers and information on what they mean (the `file' command uses
this).  I _think_ the numbers are made up by the implementors of the
particular a.out, but are standardized as much as possible by enlightened
self-interest.  (And by AT&T: one would have to be crazy not to use
whatever they use to describe, say, a 5.0 binary.)
For further info, see file(1), /etc/magic (on any sys5), the UNIX* System
5 Support Tools Guide (various sections), and all the shell documentation;
don't assume this is a complete list of references. Hope this helps.

			   Geoffrey Kimbrough
	  INTERACTIVE Systems Corporation, Santa Monica California.
	     {decvax!vortex || ihnp4!allegra!ima}!ism780!geoff

	  Don't hold your nose up so high, it blocks the light.
Standard Disclaimers apply.
* UNIX is a trademark of AT&T.
* Shell is a trademark of Shell Oil Company. (8^)

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/28/85)

In article <124@rexago1.UUCP> rich@rexago1.UUCP (K. Richard Magill) writes:
>	1)  How does the shell (exec?) know whether the command I just typed
>		is a shell script or one of several possible types of
>		executable?

Rarely is magic number testing done in the shell.  Usually it is
done in the kernel, at exec time.  Exec() will test the first word
for magicity (?), and directly execute it or not.  Under 4BSD, the
kernel also checks for "#!", and executes the program on the rest
of the line with that file as input if it finds this "magic number."
If the kernel doesn't execute a file, but the file is executable,
the shell will try to execute a sub-shell with the file as input.
Whether it does this first or goes down the PATH list looking for
another executable file is shell-dependent -- I prefer the latter,
personally.  Note that in executing sub-shells, the C shell on
non-4BSD(-ish) systems will emulate the 4BSD kernel behaviour for
files starting with "#!".
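
The fallback described above might be sketched as follows.  This is an
illustration only, not code from any particular shell: the helper names
build_shell_argv and exec_or_shell, the MAX_ARGS limit, and the choice of
/bin/sh as the sub-shell are all assumptions.  It relies on execv() failing
with ENOEXEC when the kernel does not recognize the file's format.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_ARGS 64   /* arbitrary limit for this sketch */

/* Fill out[] with "sh", path, then the original arguments after argv[0],
 * followed by a NULL terminator.  Returns the new argument count, or -1
 * if out[] (capacity outcap) is too small. */
int build_shell_argv(const char *path, char *const argv[],
                     const char *out[], int outcap)
{
    int n = 0, i;

    if (outcap < 3)
        return -1;
    out[n++] = "sh";
    out[n++] = path;
    for (i = 1; argv[0] != NULL && argv[i] != NULL; i++) {
        if (n + 1 >= outcap)
            return -1;
        out[n++] = argv[i];
    }
    out[n] = NULL;
    return n;
}

/* Try to exec the file directly; if the kernel rejects its format,
 * hand the file to a sub-shell as a script.  Returns only on failure. */
int exec_or_shell(const char *path, char *const argv[])
{
    const char *sh_argv[MAX_ARGS];

    execv(path, argv);
    if (errno != ENOEXEC)
        return -1;                      /* some other error: give up */
    if (build_shell_argv(path, argv, sh_argv, MAX_ARGS) < 0)
        return -1;
    execv("/bin/sh", (char *const *)sh_argv);
    return -1;
}
```

A shell that prefers to keep searching the PATH would note the ENOEXEC and
continue instead of calling the sub-shell straight away.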

>	2)  Presuming the answer to #1 above has something to do with
>		magic numbers, who issues them?  is there a common
>		(definitive) base of them or does each
>		manufacturer/environment make up their own set?

Yes.  Both.

Both AT&T and Microsoft, and I believe others such as Intel and ISC (?)
have issued definitive statements on what the common executable files
should look like and what the magic numbers are.  All, of course,
are different.  Look in /usr/include/a.out.h et al for information,
and don't believe it until you've verified it, especially if you
are unfortunate enough to have a binary-only system.  On SysV,
look in /etc/magic [I think] -- notice that several different
numbers have the same labels; I changed them to read "old ...",
"new ...", "COFF ...".  If you have 'file.c', and the numbers are
hard-coded, that is still a good guide.  Provided your file.c
compiles into the same as your /bin/file ...

The very first original magic number (the old man said, stroking
his long white beard) was 0407.  This is equivalent to a 'br .+16'
instruction in PDP-11 machine language -- useful if you want to
execute a file stand-alone, since headers were 16 bytes long.
Then came 410, 411, 413, 405, etc. -- many of which were not
available on ordinary machines, but were lusted after by the more
acquisitive members of the community, and in particular by the
recipient of the coveted Golden Chicken award ... but I digress.
;-)
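
The 'br .+16' arithmetic can be checked with a small decoder.  This is a
sketch with made-up function names; it assumes the standard PDP-11 BR
encoding (octal 000400 plus a signed 8-bit word offset, applied after the
PC has already advanced past the 2-byte instruction).

```c
#include <stdint.h>

/* PDP-11 BR: high byte 0x01 (octal 000400), low byte a signed word offset. */
int is_pdp11_br(uint16_t insn)
{
    return (insn & 0xFF00) == 0x0100;
}

/* Byte distance from the instruction's own address to the branch target.
 * The PC is already 2 bytes past the instruction when the offset,
 * counted in 2-byte words, is added. */
int br_distance(uint16_t insn)
{
    int8_t off = (int8_t)(insn & 0xFF);

    return 2 + 2 * off;
}
```

For 0407 the offset field is 7 words, so the distance is 2 + 14 = 16 bytes:
exactly enough to hop over a 16-byte header.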
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

guy@sun.uucp (Guy Harris) (11/29/85)

> >	1)  How does the shell (exec?) know whether the command I just typed
> >		is a shell script or one of several possible types of
> >		executable?
> 
> The shell doesn't know.  The shell merely tells the kernel to exec the
> file, after doing a fork.  The kernel determines if a file is a binary
> executable by the magic number, which is obtained by reading an a.out.h
> structure (4.1,4.2)

And V7 and System III, and maybe Version 8.

> or filehdr.h (sys 5) and comparing it against hardcoded numbers in the
> kernel.  The kernel also recognizes 
> #! /your/shellname
> at the beginning of a file and execs off the appropriate shell instead.

The *4.1 or 4.2 or Version 8* kernel recognizes "#! shellname"; this isn't
in V7 or S3/S5 (although it could be added).

In the case of other systems, or shell scripts which *don't* have a "#!"
line at the beginning, the kernel sees that the file isn't an executable,
and returns to the (forked) shell with an error indicating this.  The shell
then sees that the file had execute permission but wasn't an executable, and
runs it as a script.  In the C shell, and the 4.1/4.2 Bourne shells, there
is a convention that if the first character of the script is a "#" it's
a C shell script and that if it's a ":" it's a Bourne shell script (this is
because ":" used to be the only form of comment in the V6 and Bourne shells,
and "#" was a comment character in the C shell; however, the 4.1/4.2 and
S3/S5 Bourne shells accept "#" as a comment, so this convention is now a
crock).  They would "exec" the other shell if the first character of the
script so indicated; otherwise, they'd interpret it themselves.
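
A minimal sketch of that first-character convention.  The enum and function
names are hypothetical; the real shells' code is not shown in this thread.

```c
#include <stddef.h>

enum script_kind { SCRIPT_CSH, SCRIPT_BOURNE, SCRIPT_SELF };

/* Decide which shell should run a script, going only by its first
 * character: "#" means C shell, ":" means Bourne shell, and anything
 * else is left to the shell that is doing the looking. */
enum script_kind script_kind_of(const unsigned char *buf, size_t len)
{
    if (len > 0 && buf[0] == '#')
        return SCRIPT_CSH;
    if (len > 0 && buf[0] == ':')
        return SCRIPT_BOURNE;
    return SCRIPT_SELF;
}
```

A Bourne shell following the convention would "exec" the C shell on
SCRIPT_CSH and interpret the file itself in the other two cases; the C
shell would do the mirror image.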

> >	2)  Presuming the answer to #1 above has something to do with
> >		magic numbers, who issues them?  is there a common
> >		(definitive) base of them or does each
> >		manufacturer/environment make up their own set?
> 
> The magic number is issued by the linker/loader.

I think that by "issued" he meant "who decides what the magic number is"; as
you point out, this is done by the manufacturer.  Some of them are specified
for System V as part of the Common Object File Format (does anybody know why
there seem to be four(!) different magic numbers for 68000-family
executables?).  I don't think AT&T acts as a clearinghouse for them, though,
so if you've just ported UNIX to your new 27-bit machine I guess you get to
choose your own.

	Guy Harris

rb@istbt.UUCP (Bob Bishop) (11/29/85)

In article <416@ihdev.UUCP>, pdg@ihdev.UUCP (P. D. Guthrie)
writes:

>       Forgive me if this is
>       wrong, but I do not have any method of checking, but
>       the magic numbers for say plain executable 4.x Vax and
>       plain executable SysV.x Vax are the same, but SysV.x
>       Vax and SysV.x 3B20 are different.  Could someone
>       confirm this?

This is true, BUT it doesn't mean you can execute BSD binaries
on a SysV VAX (nor, presumably, vice versa). The file formats
are actually different. Running a BSD binary on a SysV VAX
results in behavior almost, but not entirely, unlike what the
programmer intended.

-- 
Bob Bishop

"What's so unpleasant about being drunk? You ask a glass of water!"

spw2562@ritcv.UUCP (Fishhook) (12/04/85)

In article <3044@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>                                                       does anybody know why
>there seem to be four(!) different magic numbers for 68000-family
>executables?
>
>	Guy Harris
There are four(!8-) distinct forms of executables for the M68000 family
of processors.  I took a course last year in systems programming, and one
of the projects was a primitive object module editor.  We had to be able
to recognize the four formats by the magic number, and determine where
the text and data areas were.  It's been a while, so I don't exactly remember
the formats, but I can look 'em up if you're really interested.


==============================================================================
        Steve Wall @ Rochester Institute of Technology
        USnail: 6675 Crosby Rd, Lockport, NY 14094, USA
        Usenet: ..!rochester!ritcv!spw2562 (Fishhook)   Unix 4.2 BSD
        BITNET: SPW2562@RITVAXC (Snoopy)                VAX/VMS 4.2
        Voice:  Yell "Hey Steve!"

    Disclaimer:  What I just said may or may not have anything to do
                 with what I was actually thinking...

guy@sun.uucp (Guy Harris) (12/07/85)

> > does anybody know why there seem to be four(!) different magic numbers
> > for 68000-family executables?

> There are four(!8-) distinct forms of executables for the M68000 family
> of processors.

Well, I looked on an S5 machine we have here, and there are now seven(!)
different magic numbers in its "filehdr.h" - eight, if you count the fact
that it claims that the UNIX PC 7300 uses the same magic number as iAPX286
large-model code(!!!).  I don't know what the "four distinct forms of
executables for the M68000 family" are, but I suspect there's no correlation
between those forms and the various magic numbers.  The formats listed are:

	five formats with "MC68K" in their #define names - one with no
	comment whatsoever on the #define line, one for "writable
	text segment" (like shared text, only with writable text?)
	one with "TV" in its name ("Transfer Vector", I presume), one for
	read-only text ("410" executables, I presume), and one for
	read-only demand paged text ("413" 4.xBSD executables, I presume).

	two formats with "M68" in their names.

Why an "M68" is different from an "MC68K" is beyond my simple mind; maybe
somebody from the group that did this can explain it to us mere mortals.
Why the PC 7300 would use an iAPX286 magic number is totally beyond my
comprehension.  Why one simple chip family would need all these magic
numbers (by the way, I thought the magic number in the "UNIX header" - as in
"aouthdr.h" - was supposed to indicate whether the file was shared text,
split I&D, paged shared text, and all that) is also a bit
incomprehensible.  A Common Object File Format may be nice, but this is
slipping past baroque into rococo... (we won't even discuss the fact that
another header uses "#if mc68000" - it's nice that they agree with the
#define we use, but somebody from Motorola assured me that "m68k" was the
proper predefined constant, not "mc68000")

	Guy Harris

mats@fortune.UUCP (Mats Wichmann) (12/10/85)

68000 magic numbers? Well...

This is what appears in the header file from Motorola (and AT&T) these days:

#define	MC68MAGIC	0520
#define	MC68TVMAGIC	0521
#define	M68MAGIC	0210
#define	M68TVMAGIC	0211

The M68 stuff and all the TV (transfer vector) things have to
do with AT&T-internal development - the multiprocessor
switches and those things, if I am not mistaken. So there is
really only one magic number for the 68000 family that gets used.
(MC68MAGIC).
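
For illustration, a table-driven lookup over the four numbers above, in the
spirit of what the "file" command does with /etc/magic.  The struct, table
and function names are invented for this sketch; only the #define values
come from the header quoted above.

```c
#include <stddef.h>

#define MC68MAGIC   0520
#define MC68TVMAGIC 0521
#define M68MAGIC    0210
#define M68TVMAGIC  0211

struct magic_entry {
    unsigned short magic;
    const char *name;
};

static const struct magic_entry magic_table[] = {
    { MC68MAGIC,   "MC68MAGIC"   },
    { MC68TVMAGIC, "MC68TVMAGIC" },   /* TV = transfer vector */
    { M68MAGIC,    "M68MAGIC"    },
    { M68TVMAGIC,  "M68TVMAGIC"  },
};

/* Return the symbolic name for a magic number, or NULL if unknown. */
const char *magic_name(unsigned short magic)
{
    size_t i;

    for (i = 0; i < sizeof magic_table / sizeof magic_table[0]; i++)
        if (magic_table[i].magic == magic)
            return magic_table[i].name;
    return NULL;
}
```

A vendor adding its own number would just append another row, which is
roughly how the list keeps growing.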

As far as #defines go, the "party line" is:

m68k is for the family
M68000 M68010 M68020 M68881 and such identify a particular chip.

Variations:
as many as you can think of.
For example, Unisoft at one time used mc68000, but later 
switched to m68000 (and may now be using m68k).
And So On. I think MIT used mc68000, so people who
started with their code as a base probably used
mc68000 at least for a while....

    Mats Wichmann
    Fortune Systems
    {ihnp4,hplabs,dual}!fortune!mats

  "Quality. Comfort. Style.
  And at prices jou can afford!"
    - Izzy Moreno

dave@ecrcvax.UUCP (David Morton) (12/22/85)

References: <124@rexago1.UUCP> <416@ihdev.UUCP> <3044@sun.uucp> <9107@ritcv.UUCP> <sun.3059>
Reply-To: dave@ecrcvax.UUCP (David Morton)
Distribution: net
Organization: European Computer-Industry Research Centre, Munchen, W. Germany

In article <sun.3059> guy@sun.uucp (Guy Harris) writes:
>Well, I looked on an S5 machine we have here, and there are now seven(!)
>different magic numbers in its "filehdr.h" - eight, if you count the fact
>that it claims that the UNIX PC 7300 uses the same magic number as iAPX286
>large-model code(!!!).  I don't know what the "four distinct forms of
>executables for the M68000 family" are, but I suspect there's no correlation
>between those forms and the various magic numbers.

        There are probably x (x > 8) different magic numbers on
        unix machines by now. What's to stop a
        manufacturer making the kernel recognise yet another
        one (perhaps because he's developed his own mmu for some
        purpose or other), then hacking the assembler, loader & the
        includes? I know of one company here in Germany that did this.
        So much for binary compatibility.
>
>Why an "M68" is different from an "MC68K" is beyond my simple mind; maybe
>somebody from the group that did this can explain it to us mere mortals.

Yes please! This was really confusing. Apart from that, the Motorola 5.0
SGS was nice to work with.
-- 

Dave Morton
Tel. + (49) 89 - 92699 - 139

CSNET: dave%ecrcvax.uucp@germany.csnet
UUCP: seismo!mcvax!unido!ecrcvax!dave