guy@auspex.UUCP (Guy Harris) (11/18/88)
>Not only is this better,
This assertion has now been made twice. I have yet to see any evidence
for its truth. The strings in "/etc/magic" correspond, in at least the
two cases I cited (compressed files and packed files), to C-language
strings; could somebody please explain why expressing them as numbers in
a standard byte order, rather than as the strings they are, is somehow
"better"?
dik@cwi.nl (Dik T. Winter) (11/18/88)
In article <470@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
> >Not only is this better,
(Specifying native byte order.)
>
> This assertion has now been made twice. I have yet to see any evidence
> for its truth. The strings in "/etc/magic" correspond, in at least the
> two cases I cited (compressed files and packed files), to C-language
> strings; could somebody please explain why expressing them as numbers in
> a standard byte order, rather than as the strings they are, is somehow
> "better"?

Ok, ok, I did not look at the source before posting, and indeed,
compress uses a string (I still have not looked at pack). The reason it
is better shows up when you have executables for different systems (with
different byte orders, that is). Why should I have to specify, for a Sun
executable on a Vax:

2 short 05401

or, for a Vax executable on the Sun:

0 short 05401

? Why not something like this (possibly I have AR16W and AR16WR wrong,
and I believe AR16WR does not exist):

2 AR16WR 0413 Sun executable
0 AR16W 0413 Vax executable

which (with a proper implementation of file) will work on both systems?
The public domain implementation contained a file for Vax executables;
an excerpt follows:

0 long 00700200000 VAX executable
0 long 01000200000 VAX pure executable
0 long 01300200000 VAX demand-paged pure executable
0 long 01100200000 PDP-11 executable

Do you see the relationship with 0407, 0410, 0411 and 0413 from a.out.h?
And will it work on a Vax?

The CDC Cyber 180 series has a layered implementation of Unix (VX/VE)
that gets this right; from its /etc/magic:

0 vlong 0100554 apl workspace
0 vshort 0407 Vax 11-750/780 - executable

I hope I have convinced you of the suitability of specifying byte order,
but I doubt it.
--
dik t. winter, cwi, amsterdam, nederland
INTERNET : dik@cwi.nl
BITNET/EARN: dik@mcvax
dupuy@douglass.columbia.edu (Alexander Dupuy) (11/19/88)
In article <470@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
> The strings in "/etc/magic" correspond, in at least the
> two cases I cited (compressed files and packed files), to C-language
> strings; could somebody please explain why expressing them as numbers in
> a standard byte order, rather than as the strings they are, is somehow
> "better"?

Let me clarify the assertion I made: that network byte order numbers are
better than strings.

I agree that if a magic `number' is actually recognized as a character
string by the applications that understand the format, it is better to
make the entries in /etc/magic strings, since strings are byte-order
independent and clearer to someone looking at /etc/magic and trying to
understand it. In the case of compressed or packed files, this works
well.

But there are other file formats (such as executables) that I would like
to recognize on machines with the opposite byte order. This is very
useful if you have Vaxen and Suns sharing filesystems over NFS. These are
most naturally represented as numbers, and in fact I can't represent them
as strings, since some contain 0-valued bytes, which terminate C strings.

By providing a network byte order implementation of file, I can use one
version of /etc/magic for all our machines, without having to worry about
which entries to byteswap when moving from one machine to another. This
is a useful property, independent of the representational convenience of
strings. If I had to choose between a magic file with octal-escaped
strings and a network byte order version of file, I would choose the
latter, since the convenience of having one version outweighs the
representational convenience of strings in magic entries.

That was what I meant by better. Someone working in a more homogeneous
environment might well make the opposite choice, so `better' is a
relative term. Of course, the right solution is to support both.
@alex
--
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy
guy@auspex.UUCP (Guy Harris) (11/22/88)
>Of course, the right solution is to support both.
Agreed. The problem is that the original assertions about the relative
merits of something saying "big-endian long"/"little-endian long" vs.
"string" were made with little context, so it was not at all obvious for
which situations it was being considered better; the original topic of
discussion was entries for compressed files, and for those strings are
clearly the way to go.
At the time the S5R2 "file" was put into SunOS, the immediate need for
extensions of that sort was for changes to handle compressed files and
"archive random librar(ies)", which called for extensions to type
"string" rather than for standard-byte-order integral types. Since (at
that time) all machines from Sun had the same byte order, and since most
other machines were unlikely to have adopted that version of "file", no
support for "standard byte order" integral quantities in files was
added. (The goal wasn't to make the "perfect" version of "file", the
goal was to add support for compressed files in such a form as to be at
least marginally useful for other things.)
I don't see the need for anything more than types such as "long-be",
"long-le", "short-be", and "short-le", which refer to "long"s stored in
"big-endian" form, etc. (the names seem a bit clumsy; although they're a
bit more obvious than COFFisms such as "AR32W", the "-be", etc. don't
sound quite right). Basically, "file" would grab 2 or 4 bytes and
convert them to a number in the appropriate format; this would then be
compared against constants, masked, printed, etc. just like "native"
byte-order items such as "long" and "short". There seems to be no need
to print "standard byte order" quantities any differently from "native
byte order" quantities, or to play games with the byte ordering of masks
or constants to compare those quantities with.
Of course, once you've done that, you can start worrying about 64-bit
machines, and machines with non-8-bit bytes, and the like (at least one
of the former, and at least one of the latter, run UNIX).