[comp.sources.bugs] v05i028: /etc/magic lines for compress

news@investor.UUCP (Bob Peirce) (11/15/88)

In article <2643@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>Posting-number: Volume 5, Issue 28
>Submitted-by: "Steve Nuchia" <steve@nuchat.UUCP>
>Archive-name: compress.magic
>
>If you have a file(1) compatible with the one in sysVr3
>adding the following lines to /etc/magic will make it
>recognize compressed files.
>
>0	short		40223		compressed data
>>2	byte		<0200		- %d bits
>>2	byte		0214		- 12 bits
>>2	byte		0215		- 13 bits
>>2	byte		0220		- 16 bits
>
We have SysVr2.2 and the 40223 needed to be changed to 8093 (1F9D hex).
Use "od -x" on a compressed file to see if your first word is different.

We also found that a space placed in front of the '-' is included in the
output, giving "compressed data - 16 bits" instead of "compressed
data- 16 bits".

Much thanks for the idea.


-- 
Bob Peirce, Pittsburgh, PA				 412-471-5320
uucp: ...!{allegra, bellcore, cadre, idis, psuvax1}!pitt!investor!rbp
	    NOTE:  Mail must be < 30K  bytes/message

guy@auspex.UUCP (Guy Harris) (11/16/88)

>>If you have a file(1) compatible with the one in sysVr3
>>adding the following lines to /etc/magic will make it
>>recognize compressed files.
>>
>>0	short		40223		compressed data
>>
>We have SysVr2.2 and the 40223 needed to be changed to 8093 (1F9D hex).

Person "A" had a little-endian machine (40223 is 9D1F hex) and person
"B" had a big-endian machine; the S5 release had nothing to do with it. 

Unfortunately, "strings" in "/etc/magic" cannot, in the standard S5
version, contain C-language escapes, so you can't do this:

	0	string		\037\235	compressed data

which the SunOS version of the S5 "file" supports; this obviates the need
for byte-order-dependent versions of "/etc/magic".
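
A minimal C sketch of why the two sites needed different numbers for a
"short" entry: the same two leading bytes, 037 0235, give 40223 when
composed low byte first and 8093 when composed high byte first.  The
function names are made up for illustration and are not taken from any
file(1) source.

```c
#include <stdint.h>

/* First two bytes of a compress(1) output file: 037 0235 (1F 9D hex). */
static const unsigned char compress_magic[2] = { 037, 0235 };

/* Value a little-endian CPU sees when it loads these bytes as a short. */
uint16_t short_as_little_endian(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));      /* 0x9D1F = 40223 */
}

/* Value a big-endian CPU sees when it loads the same bytes as a short. */
uint16_t short_as_big_endian(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);      /* 0x1F9D = 8093 */
}
```

A "string" entry compares the bytes in file order, so it never goes
through either of these conversions.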

The SunOS version also supports

	>2	byte&0x80	>0		block compressed
	>2	byte&0x1f	x		%d bits

with the "&<mask>" stuff obviating the need for individual entries for
different numbers of bits.
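
As a rough illustration of what the two masked entries test, here is a
small C sketch that decodes the third byte of a compressed file in the
same way: bit 0x80 marks block compression, and the low five bits give
the number of bits.  The struct and function names are hypothetical.

```c
/* Decoded form of the third header byte; names are illustrative only. */
struct compress_flags {
    int block_mode;   /* nonzero if the 0x80 bit is set */
    int max_bits;     /* value of the low five bits     */
};

struct compress_flags decode_compress_flags(unsigned char b)
{
    struct compress_flags f;

    f.block_mode = (b & 0x80) != 0;   /* ">2 byte&0x80 >0" */
    f.max_bits   = b & 0x1f;          /* ">2 byte&0x1f x"  */
    return f;
}
```

With this decoding, the bytes 0214, 0215 and 0220 from the original
posting come out as block compressed with 12, 13 and 16 bits.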

I hope this gets into S5R4 (especially since "compress" presumably will
get into S5R4), but I have no idea if it will.

dik@cwi.nl (Dik T. Winter) (11/16/88)

In article <454@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
 > >>If you have a file(1) compatible with the one in sysVr3
 > >>adding the following lines to /etc/magic will make it
 > >>recognize compressed files.
 > >>
 > >>0	short		40223		compressed data
 > >>
 > >We have SysVr2.2 and the 40223 needed to be changed to 8093 (1F9D hex).
 > 
 > Person "A" had a little-endian machine (40223 is 9D1F hex) and person
 > "B" had a big-endian machine; the S5 release had nothing to do with it. 
 > 
 > Unfortunately, "strings" in "/etc/magic" cannot, in the standard S5
 > version, contain C-language escapes, so you can't do this:
 > 
 > 	0	string		\037\235	compressed data
 > 
 > which the SunOS version of the S5 "file" supports; this obviates the need
 > for byte-order-dependent versions of "/etc/magic".

It is clear that byte order dependencies are a pain; on the other hand,
if for some system some format is specified as 0177767, it appears to be
backward to have to specify it in the form of a string.
Wouldn't it be better to allow specification like AR16W, AR32W and
AR32WR as happens in so many places in SysV (COFF comes to mind)?
Then each system can specify in its favourite order its own /etc/magic
part.  Of course file(1) would have to compose the words and longs
itself, but that is only minor.  (Note: this could also make a truly
portable file(1)).
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

guy@auspex.UUCP (Guy Harris) (11/17/88)

>It is clear that byte order dependencies are a pain; on the other hand,
>if for some system some format is specified as 0177767, it appears to be
>backward to have to specify it in the form of a string.

So who says you *have* to specify it in the form of a string?  From the
same SunOS "/etc/magic":

	0	short		0177545		old archive

We didn't *take away* any capability, we just *added* some.

In the case of compressed files, the format *is* properly specified as a
string - in "compress.c", the header is

	char_type magic_header[] = { "\037\235" };	/* 1F 9D */

which sure looks like a string to me....

>Wouldn't it be better to allow specification like AR16W, AR32W and
>AR32WR as happens in so many places in SysV (COFF comes to mind)?

In this particular example, no, it wouldn't be better; since the magic
number for compressed files *is* a string, the right way to specify it is
as a string.

The same applies for packed data:

	0	string		\037\036	packed data

Although PACKED is #defined as 017436 in "pack.c", that #define is not,
in fact, used; the code to generate the magic number is

	outbuff[0] = 037;
	outbuff[1] = 036;

which is, again, a string "\037\036".

The right tool for the right job; telling "file" about a machine's byte
order is, in this case, entirely the wrong tool. 
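
A hedged sketch of what matching these two magic numbers as strings
amounts to: a byte-for-byte comparison against the start of the file,
with no byte-order question arising.  The table layout and the function
name describe() are assumptions made for illustration; they are not
taken from any file(1) implementation.

```c
#include <stddef.h>
#include <string.h>

/* One "string"-type entry: the bytes to match and what to print. */
struct magic_string {
    const char *bytes;
    size_t      len;
    const char *description;
};

static const struct magic_string string_magic[] = {
    { "\037\235", 2, "compressed data" },   /* magic_header[] in compress.c */
    { "\037\036", 2, "packed data"     },   /* outbuff[0], outbuff[1] in pack.c */
};

/* Return the description of the first matching entry, or NULL if none. */
const char *describe(const unsigned char *buf, size_t buflen)
{
    size_t i;

    for (i = 0; i < sizeof string_magic / sizeof string_magic[0]; i++) {
        if (buflen >= string_magic[i].len &&
            memcmp(buf, string_magic[i].bytes, string_magic[i].len) == 0)
            return string_magic[i].description;
    }
    return NULL;
}
```

The comparison uses memcmp() with an explicit length rather than
strcmp(), so it looks at exactly the bytes listed in the table.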

dupuy@douglass.columbia.edu (Alexander Dupuy) (11/17/88)

In article <454@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
> Unfortunately, "strings" in "/etc/magic" cannot, in the standard S5
> version, contain C-language escapes, so you can't do this:
> 
> 	0	string		\037\235	compressed data
> 
> which the SunOS version of the S5 'file" supports; this obviates the need
> for byte-order-dependent versions of "/etc/magic".

In article <7714@boring.cwi.nl> dik@cwi.nl (Dik T. Winter) replies:
> Wouldn't it be better to allow specification like AR16W, AR32W and
> AR32WR as happens in so many places in SysV (COFF comes to mind)?
> Then each system can specify in its favourite order its own /etc/magic
> part.

Not only is this better, but the "strings" in even the SunOS version of magic
can't contain null bytes (well actually, they can, but it doesn't do what you
want), and can't be masked in the way which SunOS allows you to mask numbers.

While I find AR32WR pretty incomprehensible (is it network order, or reversed?),
it should be possible to write a file(1) which converts the numbers and masks
specified in /etc/magic into network order before doing any comparisons, and
converts values read from the file into local byte order before performing %d
substitutions on the descriptions in the /etc/magic file.  If needed, special
printf macros could be used to indicate substitutions for which the bytes
needed to be swapped before the ntohl conversion was done.  This should provide
a portable file(1).
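
One way the comparison step might look, as a sketch only: the value and
mask from the magic entry are taken to be in network (big-endian) order,
and the bytes from the file are composed in that same order, so the host
byte order never enters the comparison.  Composing the value byte by byte
gives the same result as applying ntohl to a raw load.  The struct
long_magic and both function names are hypothetical.

```c
#include <stdint.h>

/* Compose four bytes, taken in file order, into a network-order value. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical in-memory form of a masked "long" magic entry. */
struct long_magic {
    uint32_t mask;    /* bits that take part in the comparison */
    uint32_t value;   /* expected value, in network order      */
};

/* Nonzero if the four bytes at p match the entry under its mask. */
int long_magic_matches(const struct long_magic *m, const unsigned char *p)
{
    return (read_be32(p) & m->mask) == (m->value & m->mask);
}
```

For instance, an entry with mask 0xffff0000 and value 0x1f9d0000 would
match any file that starts with the bytes 1F 9D, on either kind of
machine.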

@alex
-- 
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy

guy@auspex.UUCP (Guy Harris) (11/18/88)

>Not only is this better,

This assertion has now been made twice.  I have yet to see any evidence
for its truth.  The strings in "/etc/magic" correspond, in at least the
two cases I cited (compressed files and packed files), to C-language
strings; could somebody please explain why expressing them as numbers in
a standard byte order, rather than as the strings they are, is somehow
"better"?

dupuy@douglass.columbia.edu (Alexander Dupuy) (11/18/88)

Just to follow up on the problems with /etc/magic formats: it is currently
possible to crash file(1) on a SPARC (or other strict-alignment machine)
with magic entries of type "short" or "long" at misaligned offsets.  If you
have a Sun-2 or Sun-4, you can reproduce this by placing the following entry
into /etc/magic:

1	long		0		doesn't matter if it matches or not.

This should crash file with a Bus Error when you apply it to any file for
which it doesn't have a built-in rule, such as an ordinary ASCII text file.
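
The source of file(1) is not shown in this thread, so the following is
only a guess at the usual cause and a common remedy: dereferencing
something like *(long *)(buf + 1) performs a misaligned load, while
copying the bytes into a properly aligned variable does not.  The
function name read_u32_at is made up for this sketch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fetch a 32-bit value from any byte offset in buf.  memcpy() moves the
 * bytes one way or another into v, which the compiler has aligned, so
 * no misaligned word load is ever issued.  The result is in host byte
 * order, exactly as a direct load would have been.
 */
uint32_t read_u32_at(const unsigned char *buf, size_t offset)
{
    uint32_t v;

    memcpy(&v, buf + offset, sizeof v);
    return v;
}
```

The caller still has to make sure that offset + 4 bytes lie within the
buffer that was read from the file.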

@alex

-- 
inet: dupuy@columbia.edu
uucp: ...!rutgers!columbia!dupuy