[net.unix-wizards] TAR DOES NOT SWAP BYTES

guy@sun.uucp (Guy Harris) (09/24/85)

> All I know is the tar program swaps bytes when writing a tape so
> that a VAX running 4.2 must use dd to swab things before un-tar-ing them.

"tar" does no such thing.  The control information on a "tar" tape is in
printable ASCII form, so that it's independent of byte order (and, with any
luck, other greasy architectural details).  "tar" tapes written on purely
big-endian machines (3[67]0, M68K, etc.), purely little-endian machines
(VAX, etc.), and mixed-up machines (PDP-11), can be read on machines of any
other byte sex.  Unless the files in question are text files, however, the
data might not be directly usable on the target machine, but that's not just
a problem of byte order.
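
As a rough illustration of the point about printable control information, the sketch below builds a one-file archive in memory and pulls the name and size straight out of the first 512-byte header block. It uses Python's modern "tarfile" module rather than any 1985 "tar", and it assumes the classic header layout (name in bytes 0-99, size as octal ASCII digits in bytes 124-135); the file name and payload are made up.

```python
import io
import tarfile

# Build a one-member tar archive entirely in memory.
payload = b"hello, tape\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="greeting.txt")   # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

header = buf.getvalue()[:512]      # the first header block

# Assumed classic layout: name at offset 0 (100 bytes),
# size at offset 124 (12 bytes of octal ASCII digits).
name_field = header[0:100]
size_field = header[124:136]

name = name_field.rstrip(b"\0").decode("ascii")
size = int(size_field.rstrip(b"\0 ").decode("ascii"), 8)

print(name, size_field, size)
```

Nothing here calls for a byte-order decision: the size is read as text digits, one byte at a time, so the same parse works on any machine.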

"cpio" has a rather stupid byte-swapping option which swaps the data but
*not* the control information.  Since most data does not consist of a huge
uniform stream of "short"s or "long"s, an option to swap the data is
useless.  The control information, by default, consists of a bunch of
"short"s (yes, even the file size and modification/access time are stored as
pairs of "short"s), which should be swapped if the order of bytes in a
"short" is different on the source and target machines, and a bunch of
"char"s making up the file name which should not be swapped under any
circumstances.  This means, BTW, that

	dd if=/dev/rmt0 conv=swab bs=<whatever> | cpio -ib 

doesn't work, since it swaps the bytes in the names of all created files.
What they *should* have done was detect that the source and target machines
had different byte orders by checking whether the "magic number" was 070707
or a byte-swapped 070707, and automatically byte-swap the header "short"s
but not the path names or the data.
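
The detection scheme suggested above can be sketched in Python. This is an illustration only, not code from any "cpio": the names MAGIC, read_header and make_header are invented, and the header is assumed to be 13 "short"s (magic, dev, ino, mode, uid, gid, nlink, rdev, two for mtime, namesize, two for filesize, high-order half first) followed by the path name.

```python
import struct

MAGIC = 0o070707          # magic number named in the post
HEADER_BYTES = 26         # assumed: 13 shorts of 2 bytes each

def read_header(block):
    """Return (fields, name), picking the byte order from the magic number."""
    for order in ("<", ">"):
        fields = struct.unpack(order + "13H", block[:HEADER_BYTES])
        if fields[0] == MAGIC:
            break
    else:
        raise ValueError("not a binary cpio header")
    namesize = fields[10]
    # The path name is a run of chars and is never swapped.
    name = block[HEADER_BYTES:HEADER_BYTES + namesize].rstrip(b"\0")
    return fields, name

def make_header(order, name, filesize):
    """Build a header as a writer of the given byte order would (test helper)."""
    raw = name + b"\0"
    fields = (MAGIC, 0, 0, 0o100644, 0, 0, 1, 0, 0, 0,
              len(raw), filesize >> 16, filesize & 0xFFFF)
    return struct.pack(order + "13H", *fields) + raw

for order in ("<", ">"):
    fields, name = read_header(make_header(order, b"src/main.c", 70000))
    size = (fields[11] << 16) | fields[12]   # reassemble the pair of shorts
    print(order, name, size)
```

Both headers decode to the same name and size, which is the behaviour the post argues "cpio" should have had by default.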

However, there is a "-c" option to "cpio" which tells it to write the
control information in - you guessed it - printable ASCII!  I believe it had
bugs in its System III incarnation, but you can read "cpio -c" tapes made on
a machine with different byte order.  The S5 "find" has an undocumented
"-ncpio" option which works like the "-cpio" option, only it writes "cpio
-c" instead of "cpio" tapes.  If you must use "cpio", use "cpio -c";
however, "tar" is more universal - it's in V7, 4.xBSD, and Systems III and V.
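
For comparison, here is a sketch of reading a character-format header. It assumes the layout that was later standardized for the portable ASCII cpio format (eight 6-digit octal fields, an 11-digit mtime, a 6-digit name size, an 11-digit file size, then the name); whether every "cpio -c" of the period matched that exactly is not something this thread establishes. FIELDS, parse_ascii_header and make_ascii_header are hypothetical names.

```python
# Assumed field widths, in order, all octal ASCII digits.
FIELDS = (("magic", 6), ("dev", 6), ("ino", 6), ("mode", 6),
          ("uid", 6), ("gid", 6), ("nlink", 6), ("rdev", 6),
          ("mtime", 11), ("namesize", 6), ("filesize", 11))

def parse_ascii_header(block):
    """Decode the octal text fields, then slice out the path name."""
    out, pos = {}, 0
    for field, width in FIELDS:
        out[field] = int(block[pos:pos + width].decode("ascii"), 8)
        pos += width
    name = block[pos:pos + out["namesize"]].rstrip(b"\0")
    return out, name

def make_ascii_header(name, filesize):
    """Build a sample header for the demonstration below."""
    raw = name + b"\0"
    text = "%06o" * 8 % (0o070707, 1, 2, 0o100644, 0, 0, 1, 0)
    text += "%011o%06o%011o" % (0, len(raw), filesize)
    return text.encode("ascii") + raw

sample = make_ascii_header(b"Makefile", 70000)
fields, name = parse_ascii_header(sample)
print(name, fields["filesize"], oct(fields["magic"]))
```

As with "tar", every number is a string of digits, so the reader never has to know the writer's byte order.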

There are known cases of brain-damaged *hardware* swapping bytes.  The case
I know of is a big-endian Multibus machine with an extremely stupidly
designed tape controller.  If you write a tape on this machine, and want to
read it in on a sane machine, you have to stick "dd" in front of the "tar"
(or "cpio" or whatever).

The rule for correctness of byte order in a tape controller is simple.  If
you have the string "Now is the time for all good parties to come to the aid
of man" in memory, and tell the tape controller to write this to a tape, the
first byte in the block should be a capital "N", followed by a lower-case
"o", followed by a lower-case "w", followed by a blank, etc.  Violate this
and you'll force everybody who didn't violate this to swap bytes when
reading your tapes.
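
To make the rule concrete, the sketch below shows what a pairwise byte swap (the operation "dd conv=swab" performs) does to the start of that string. The function name swab is borrowed from the dd option; leaving an odd trailing byte untouched is an assumption about the edge case.

```python
def swab(data):
    """Exchange each adjacent pair of bytes; an odd trailing byte is left alone."""
    out = bytearray(data)
    even = len(out) - (len(out) % 2)
    # Swap the even-indexed and odd-indexed bytes within the even-length prefix.
    out[0:even:2], out[1:even:2] = out[1:even:2], out[0:even:2]
    return bytes(out)

text = b"Now is the time"
swapped = swab(text)
print(swapped)             # b'oN wsit eht mie'
print(swab(swapped))       # b'Now is the time'
```

A controller that writes the first form forces every well-behaved reader to apply the second swap to get the text back.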

	Guy Harris

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (09/27/85)

> There are known cases of brain-damaged *hardware* swapping bytes.  The case
> I know of is a big-endian Multibus machine with an extremely stupidly
> designed tape controller.  If you write a tape on this machine, and want to
> read it in on a sane machine, you have to stick "dd" in front of the "tar"
> (or "cpio" or whatever).
> 
> The rule for correctness of byte order in a tape controller is simple.  If
> you have the string "Now is the time for all good parties to come to the aid
> of man" in memory, and tell the tape controller to write this to a tape, the
> first byte in the block should be a capital "N", followed by a lower-case
> "o", followed by a lower-case "w", followed by a blank, etc.  Violate this
> and you'll force everybody who didn't violate this to swap bytes when
> reading your tapes.

Yup, I believe IBM started this byte-swapping magtape foolishness
because of some bogus idea about big-endian byte order being "more
natural" on some 16-bit machine they had.  Some magtape controllers/
interfaces have jumpers to allow them to be operated in either normal
or swabbed mode.

ed@mtxinu.UUCP (Ed Gould) (10/01/85)

In article <2818@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>                              If you must use "cpio", use "cpio -c";
>however, "tar" is more universal - it's in V7, 4.xBSD, and Systems III and V.

Agreed about cpio, but it's not clear that SysV has a real tar in all
cases.  When I visited a site of ATTIS a couple of years ago, they
insisted that I bring a cpio tape, since they weren't sure that their
3B20 tar could read tapes made on a VAX.  It turned out that I had left
something off the cpio tape that was on my standard tar tape, so we
carried the tar tape to a VAX, extracted the appropriate files, and
wrote them out again with cpio to move them to the 3B.

-- 
Ed Gould                    mt Xinu, 2910 Seventh St., Berkeley, CA  94710  USA
{ucbvax,decvax}!mtxinu!ed   +1 415 644 0146

"A man of quality is not threatened by a woman of equality."

guy@sun.uucp (Guy Harris) (10/05/85)

> Agreed about cpio, but it's not clear that SysV has a real tar in all
> cases.  When I visited a site of ATTIS a couple of years ago, they
> insisted that I bring a cpio tape, since they weren't sure that their
> 3B20 tar could read tapes made on a VAX.  It turned out that I had left
> something off the cpio tape that was on my standard tar tape, so we
> carried the tar tape to a VAX, extracted the appropriate files, and
> wrote them out again with cpio to move them to the 3B.

I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.
Some tape controller (which I think they've junked in favor of a sane one)
imposes a rather small maximum block size; "tar cb 20 ..." exceeds this
limit.  This screws "cpio" over, too; this from CPIO(1) in the S5 User's
Manual:

	BUGS
	     ...The -B option does not work with certain magnetic
	     tape drivers (see UN32(7) in the UNIX System V Administrator
	     Reference Manual).

For some reason, the "tar" manual page doesn't contain a similar warning.
Perhaps the people who wrote it thought "tar" was a thing of the past, soon
to be completely replaced by "cpio".

	Guy Harris

jimb@amdcad.UUCP (Jim Budler) (10/07/85)

If you create a tar 'file' (not tape), and transfer it between systems
the file will require byte swapping between big-endian and little endian
machines.  


-- 
 Jim Budler
 Advanced Micro Devices, Inc.
 (408) 749-5806
 UUCPnet: {ucbvax,decwrl,ihnp4,allegra,intelca}!amdcad!jimb
 Compuserve:	72415,1200

"... Don't sue me, I'm just the piano player!...."

guy@sun.uucp (Guy Harris) (10/07/85)

> If you create a tar 'file' (not tape), and transfer it between systems
> the file will require byte swapping between big-endian and little endian
> machines.  

Gee, I'd appreciate it if you'd tell that to the VAX-11/780 (little-endian,
and running 4.2BSD and a vanilla-except-for-bug-fixes-having-nothing-to-do-
with-byte-swapping "tar") and the CCI Power 5/20s (68000-based, thus big-
endian, and running System III and a vanilla-except-for-bug-fixes-having-
nothing-to-do-with-byte-swapping "tar") at CCI.  We exchanged tar files (not
tapes, since the only transfer medium was RS-232 or Ethernet wires) with NO
byte swapping whatsoever, and were able to read each system's "tar" file on
the other system.

Either 1) the "tar" running on one of the systems you tried it on does its
own byte swapping or 2) you never actually tried it and are just guessing
that it "obviously" must require byte swapping.  In either case, it's not
relevant to the discussion.

	Guy Harris

jsq@im4u.UUCP (John Quarterman) (10/07/85)

>I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.
>Some tape controller (which I think they've junked in favor of a sane one)
>imposes a rather small maximum block size; "tar cb 20 ..." exceeds this
>limit.

The limit is 10, and you also have to specify the blocksize manually
when extracting a tape, as well as when writing one.
-- 
John Quarterman,   UUCP:  {ihnp4,seismo,harvard,gatech}!ut-sally!jsq
ARPA Internet and CSNET:  jsq@sally.UTEXAS.EDU, formerly jsq@ut-sally.ARPA

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/08/85)

> If you create a tar 'file' (not tape), and transfer it between systems
> the file will require byte swapping between big-endian and little endian
> machines.  

Say what?  You mean inter-system transfer of byte stream files
causes the bytes to get jumbled?  Oh, wow.

gj@bubba.UUCP (10/08/85)

> 
> I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.
> Some tape controller (which I think they've junked in favor of a sane one)
> imposes a rather small maximum block size; "tar cb 20 ..." exceeds this
> limit.  This screws "cpio" over, too; this from CPIO(1) in the S5 User's
> Manual:
> 
> 
> For some reason, the "tar" manual page doesn't contain a similar warning.
> Perhaps the people who wrote it thought "tar" was a thing of the past, soon
> to be completely replaced by "cpio".
> 
> 	Guy Harris

I discovered that the maximum blocksize on a 3B5 I used was 8K.  So
much for reading tar tapes written on a VAX (with default block size).
This limitation was documented, nice of them, huh?  When I called
AT&T support and told them that this was highly undesirable, they
politely informed me that they didn't care.


-- 

George Jenkins, COSI Texas, Inc., 4412 Spicewood Springs #801, Austin TX
78759 USA

uucp: {ihnp4,seismo,ctvax}!ut-sally!cositex!bubba!gj
at&t: (512) 345-2780

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (10/08/85)

> The limit is 10, and you also have to specify the blocksize manually
> when extracting a tape, as well as when writing one.

I finally got around to fixing the SVR2 "tar".  Anyone who has
received the BRL UNIX System V emulation directly from me BEFORE
05-Aug-1985 and who wants the fixed source, send me a message.

mark@tove.UUCP (Mark Weiser) (10/08/85)

> If you create a tar 'file' (not tape), and transfer it between systems
> the file will require byte swapping between big-endian and little endian
> machines.  

I just happened to have a ready-made experiment, having just
tared, moved, and then untared a directory from a vax (little-endian) to a
pyramid (big-endian).  File was moved with rcp.  I tried untaring
a file at the vax end, cmp'd the results with a file untared at the pyramid
end and then rcp'd back, and the files were identical.  Of course
rcp could have swapped the bytes back during the xfer before the cmp,
but how could it have known I was going to do a cmp?

In other words, TAR DOES NOT SWAP BYTES (nor should it, nor should
anyone else need them to be swapped when xfering files).
	-mark


-- 
Spoken: Mark Weiser 	ARPA:	mark@maryland	Phone: +1-301-454-7817
CSNet:	mark@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!mark
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

arnold@gatech.CSNET (Arnold Robbins) (10/11/85)

In article <578@im4u.UUCP>, jsq@im4u.UUCP (John Quarterman) writes:
> >I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.
> >Some tape controller (which I think they've junked in favor of a sane one)
> >imposes a rather small maximum block size; "tar cb 20 ..." exceeds this
> >limit.
> 
> The limit is 10, and you also have to specify the blocksize manually
> when extracting a tape, as well as when writing one.
> -- 
> John Quarterman

The actual limit for blocks that can be written to and read from the tape
drive on the 3B20 is 6K (using the raw device).  I found this out when porting
a really neat backup program to our two 3B20s.  So jsq's comment is correct:
a tar block size of 10 is 5K bytes (tar's "block" is 512 bytes).
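
The arithmetic can be checked with a small sketch. TAR_BLOCK, record_bytes and fits are made-up names; the 6K figure is the raw-device limit quoted above, and the sketch says nothing about what a particular drive will accept beyond that arithmetic.

```python
TAR_BLOCK = 512                     # bytes in one tar "block"

def record_bytes(blocking_factor):
    """Size in bytes of one tape record for a given tar blocking factor."""
    return blocking_factor * TAR_BLOCK

def fits(blocking_factor, limit_bytes):
    """True if a record of that blocking factor is within the limit."""
    return record_bytes(blocking_factor) <= limit_bytes

LIMIT = 6 * 1024                    # the 6K limit reported for the 3B20

for factor in (10, 20):
    print(factor, record_bytes(factor), fits(factor, LIMIT))
```

A blocking factor of 10 gives 5120-byte records, which fit; a factor of 20 gives 10240-byte records, which do not.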

This limit really sucks.  Whoever made that decision at AT&T must not have
been someone who actually had to move tapes around.  When is AT&T going
to take their heads out of the sand? The 3B20s are hardware to run UNIX
on, not the other way around!!!
-- 
Arnold Robbins
CSNET:	arnold@gatech	ARPA:	arnold%gatech.csnet@csnet-relay.arpa
UUCP:	{ akgua, allegra, hplabs, ihnp4, seismo, ut-sally }!gatech!arnold

Hello. You have reached the Coalition to Eliminate Answering Machines.
Unfortunately, no one can come to the phone right now....

dpb@laidbak.UUCP (Darryl Baker) (10/16/85)

In article <1473@gatech.CSNET> arnold@gatech.CSNET (Arnold Robbins) writes:
>In article <578@im4u.UUCP>, jsq@im4u.UUCP (John Quarterman) writes:
>> >I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.
	My old job was in AT&T UN*X Support and the brain damage is in
	the un52 tape controller.
			......
>The actual limit for blocks that can be written to and read from the tape
>drive on the 3B20 is 6K (using the raw device).
			......
>This limit really sucks.  Whoever made that decision at AT&T must not have
>been someone who actually had to move tapes around.  When is AT&T going
>to take their heads out of the sand? The 3B20s are hardware to run UNIX
>on, not the other way around!!!
>-- 
	Yes, they really have problems with tapes. The first 3B20s had
	tape drives that could only handle 2K blocks. The second cut at
	tape drives had the 6K limit. The third cut (latest as of my 
	leaving AT&T) did handle 10K at least but took a special IOP
	to handle it. This was because of the I/O bandwidth of the
	dual serial channel. Too much other traffic and the streamer
	tape drive couldn't stream and was slower than the non-streamer.
	If they wanted the 3B20 to run well they would chuck the dual
	serial channel.

-- 
				from the sleepy terminal of
				Darryl Baker
				[ihnp4!]laidbak!dpb

mikel@codas.UUCP (Mikel Manitius) (10/19/85)

> I discovered that the maximum blocksize on a 3B5 I used was 8K.  So
> much for reading tar tapes written on a VAX (with default block size).
> This limitation was documented, nice of them, huh?  When I called
> AT&T support and told them that this was highly undesirable, they
> politely informed me that they didn't care.
> 
> George Jenkins, COSI Texas, Inc., 4412 Spicewood Springs #801, Austin TX

Well, I *do* care, since I'm stuck with 8 tapes of unix source that
I had written on a Vax 11/750 with a TU80 Tape drive, and I can't read
the damn things on our 3B5 (actually, I can, but I have to do a lot
of kludging, like playing with dd).

"Someone, somewhere, must care, because they are in the same pile of sh*t."
-- 
	Mikel Manitius  - ...{ihnp4|akguc|attmail|indra!koura}!codas!mikel

jsq@im4u.UUCP (John Quarterman) (10/20/85)

In article <268@laidbak.UUCP> dpb@laidbak.UUCP (Darryl Baker) writes:
>In article <1473@gatech.CSNET> arnold@gatech.CSNET (Arnold Robbins) writes:
>>In article <578@im4u.UUCP>, jsq@im4u.UUCP (John Quarterman) writes:
>>> >I think the problem is with the 3B20 tape drive, not with "tar" on the 3B20.

Ahem!  While I do not necessarily take exception to any of the text you
quoted, none of it was by me.  My comment was the one saying that the
actual limit is 10 (512 byte) blocks, and that you must specify that
when reading a tape with tar, as well as when writing one.

Please keep your attributions straight.
-- 
John Quarterman,   UUCP:  {ihnp4,seismo,harvard,gatech}!ut-sally!jsq
ARPA Internet and CSNET:  jsq@sally.UTEXAS.EDU, formerly jsq@ut-sally.ARPA