magic lines for compress

guy@auspex.UUCP (Guy Harris) (11/18/88)

>Not only is this better,

This assertion has now been made twice.  I have yet to see any evidence
for its truth.  The strings in "/etc/magic" correspond, in at least the
two cases I cited (compressed files and packed files), to C-language
strings; could somebody please explain why expressing them as numbers in
a standard byte order, rather than as the strings they are, is somehow
"better"?

dik@cwi.nl (Dik T. Winter) (11/18/88)

In article <470@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
 > >Not only is this better,
(Specifying native byte order.)
 > 
 > This assertion has now been made twice.  I have yet to see any evidence
 > for its truth.  The strings in "/etc/magic" correspond, in at least the
 > two cases I cited (compressed files and packed files), to C-language
 > strings; could somebody please explain why expressing them as numbers in
 > a standard byte order, rather than as the strings they are, is somehow
 > "better"?

Ok, ok, I did not look at the source before posting, and indeed, compress
uses a string (I still have not looked at pack).
Specifying the byte order is better when you have executables for different
systems (with different byte orders, that is).
Why should I have to specify for a Sun executable:
2	short	05401
on a Vax, or for a Vax on the sun:
0	short	05401
?
Why not something like (possibly I have AR16W and AR16WR wrong, and AR16WR
does not exist, I believe):
2	AR16WR	0413	Sun executable
0	AR16W	0413	Vax executable
which (with a proper implementation of file) will work on both systems?
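
The arithmetic behind the 05401 entries can be checked with a short sketch.
It is written in Python and is not taken from any version of file. The header
bytes are built by hand, assuming only what the entries above imply: a Sun
a.out keeps a 16-bit big-endian magic at offset 2, and a Vax a.out keeps a
little-endian magic at offset 0. The name read_short is made up for this
example.

```python
import struct

ZMAGIC = 0o413  # demand-paged magic number from a.out.h

# Hypothetical 4-byte header prefixes, built by hand for illustration only.
sun_header = b"\x00\x00" + struct.pack(">H", ZMAGIC)  # magic at offset 2, big-endian
vax_header = struct.pack("<H", ZMAGIC) + b"\x00\x00"  # magic at offset 0, little-endian

def read_short(data, offset, byte_order):
    """Read an unsigned 16-bit value; byte_order is '<' (Vax) or '>' (Sun)."""
    return struct.unpack_from(byte_order + "H", data, offset)[0]

# Reading with the other machine's byte order yields the swapped constant.
sun_seen_on_vax = read_short(sun_header, 2, "<")
vax_seen_on_sun = read_short(vax_header, 0, ">")

# Reading with the file's own byte order yields 0413 on any host.
sun_explicit = read_short(sun_header, 2, ">")
vax_explicit = read_short(vax_header, 0, "<")

print(oct(sun_seen_on_vax), oct(vax_seen_on_sun), oct(sun_explicit), oct(vax_explicit))
```

The first two values come out as 05401 and the last two as 0413, which is the
point of naming the byte order in the entry instead of in the constant.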

The public domain implementation contained a file for Vax executables,
an excerpt follows:
0	long		00700200000	VAX executable
0	long		01000200000	VAX pure executable
0	long		01300200000	VAX demand-paged pure executable
0	long		01100200000	PDP-11 executable

Do you see the relationship with 0407, 0410, 0411, 0413 from a.out.h?
And will it work on a Vax?
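
One way to see the relationship is to reverse the four bytes of each constant.
A small Python sketch follows; the helper name swap32 is invented here and
does not come from the public domain implementation.

```python
import struct

def swap32(value):
    """Reverse the byte order of an unsigned 32-bit value."""
    return struct.unpack("<L", struct.pack(">L", value))[0]

# Constants copied from the excerpt above.
entries = [
    (0o00700200000, "VAX executable"),
    (0o01000200000, "VAX pure executable"),
    (0o01300200000, "VAX demand-paged pure executable"),
    (0o01100200000, "PDP-11 executable"),
]

for constant, label in entries:
    print(f"{constant:011o} -> {swap32(constant):04o}  {label}")
```

The swapped values are 0407, 0410, 0413 and 0411. In other words, each constant
is the a.out.h magic number as a big-endian host would read a little-endian
long, so the entries only match when file runs on a machine with the opposite
byte order from the Vax.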

The CDC Cyber 180 series has a layered implementation of Unix (VX/VE) that
gets this quite right; from /etc/magic:
0       vlong           0100554         apl workspace
0       vshort          0407            Vax 11-750/780  - executable

I hope I convinced you of the suitability of specifying byte order,
but I doubt it.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

dupuy@douglass.columbia.edu (Alexander Dupuy) (11/19/88)

In article <470@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
> The strings in "/etc/magic" correspond, in at least the
> two cases I cited (compressed files and packed files), to C-language
> strings; could somebody please explain why expressing them as numbers in
> a standard byte order, rather than as the strings they are, is somehow
> "better"?

Let me clarify the assertion that I made - that network byte order numbers are
better than strings.  I agree that if a magic `number' is actually recognized
as a character string by the applications which understand the format, it is
better to make the entries in /etc/magic strings, since a string is byte-order
independent, and clearer to someone looking at /etc/magic and trying to
understand it.  In the case of compressed or packed files, this does work well.

But there are other file formats (such as executables) which I would like to
recognize on machines with the opposite byte order.  This is very useful if you
have Vaxen and Suns sharing filesystems with NFS.  These are most naturally
represented as numbers, and in fact, I can't represent them as strings (since
some have 0-valued bytes, which cause problems in C strings).
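
The zero-byte problem can be illustrated with a short Python sketch. It
assumes a 32-bit little-endian magic of 0413, as on a Vax, and uses ctypes
only to mimic how C treats a NUL-terminated string; none of this comes from
file itself.

```python
import ctypes
import struct

# Bytes of a 32-bit little-endian magic number 0413, as a Vax would store it.
magic_bytes = struct.pack("<L", 0o413)

# A C string ends at the first NUL byte, so only this prefix would survive
# if the magic were written as a plain string.
as_c_string = ctypes.create_string_buffer(magic_bytes).value

print(magic_bytes, as_c_string)
```

Only the first two of the four bytes remain once the value is treated as a C
string, so a string entry could not tell this magic number apart from any
other file that happens to start with the same two bytes.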

By providing a network byte order implementation of file, I can use one version
of /etc/magic for all our machines, without having to worry about which entries
to byteswap when moving from one machine to another.  This is a useful
property, which is independent of the representational convenience of strings.

If I had to choose between using a file with strings with octal escapes or
using a network byte order version of file, I would choose the latter, since
the convenience in having one version outweighs the representational
convenience of strings in magic entries.  That was what I meant by better.
Someone working in a more homogeneous environment might well make the opposite
choice, so `better' is a relative term.

Of course, the right solution is to support both.

@alex
-- 
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy

guy@auspex.UUCP (Guy Harris) (11/22/88)

>Of course, the right solution is to support both.

Agreed.  The problem is that the original assertions about the relative
merits of something saying "big-endian long"/"little-endian long" vs. 
"string" were made with little context, so it was not at all obvious for
which situations it was being considered better; the original topic of
discussion was entries for compressed files, and for those strings are
clearly the way to go.

At the time the S5R2 "file" was put into SunOS, the immediate need for
extensions of that sort was for changes to handle compressed files and
"archive random librar(ies)", which called for extensions to type
"string" rather than for standard-byte-order integral types.  Since (at
that time) all machines from Sun had the same byte order, and since most
other machines were unlikely to have adopted that version of "file", no
support for "standard byte order" integral quantities in files was
added.  (The goal wasn't to make the "perfect" version of "file", the
goal was to add support for compressed files in such a form as to be at
least marginally useful for other things.)

I don't see the need for anything more than types such as "long-be",
"long-le", "short-be", and "short-le", which refer to "long"s stored in
"big-endian" form, etc. (the names seem a bit clumsy; although they're a
bit more obvious than COFFisms such as "AR32W", the "-be", etc. don't
sound quite right).  Basically, "file" would grab 2 or 4 bytes and
convert them to a number in the appropriate format; this would then be
compared against constants, masked, printed, etc. just like "native"
byte-order items such as "long" and "short".  There seems to be no need
to print "standard byte order" quantities any differently from "native
byte order" quantities, or to play games with the byte ordering of masks
or constants to compare those quantities with.
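
The scheme described above might look roughly like the following Python
sketch. The type names are the ones proposed in this post; everything else
(the FORMATS table, the fetch, matches and identify helpers, the two sample
entries, and the choice to treat values as unsigned) is an assumption made for
illustration and not how any real "file" is written.

```python
import struct

# Type names follow the post; the struct format codes are this sketch's choice.
# Values are treated as unsigned here, which is an assumption.
FORMATS = {
    "short-be": ">H",
    "short-le": "<H",
    "long-be": ">L",
    "long-le": "<L",
}

def fetch(data, offset, type_name):
    """Grab 2 or 4 bytes at offset and convert them to an ordinary integer."""
    fmt = FORMATS[type_name]
    if offset + struct.calcsize(fmt) > len(data):
        return None  # file too short for this entry
    return struct.unpack_from(fmt, data, offset)[0]

def matches(data, offset, type_name, constant, mask=None):
    """Compare the fetched value with a constant, optionally masking first."""
    value = fetch(data, offset, type_name)
    if value is None:
        return False
    if mask is not None:
        value &= mask  # the mask is an ordinary number, never byte-swapped
    return value == constant

# Hypothetical entries: (offset, type, constant, description).
MAGIC = [
    (2, "short-be", 0o413, "Sun executable"),
    (0, "long-le", 0o413, "Vax executable"),
]

def identify(data):
    """Return the description of the first matching entry, else 'data'."""
    for offset, type_name, constant, description in MAGIC:
        if matches(data, offset, type_name, constant):
            return description
    return "data"
```

Once fetch has produced a plain integer, the constants and masks are written
the same way for every type, and the same table gives the same answers on a
big-endian and a little-endian host.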

Of course, once you've done that, you can start worrying about 64-bit
machines, and machines with non-8-bit bytes, and the like (at least one
of the former, and at least one of the latter, run UNIX).