earle@smeagol.UUCP (Greg Earle) (09/04/86)
Recently I attempted to read the /usr tape from the System V Release 2.0
distribution for VAX processors. It comes from AT&T in cpio(1)
format, and Sun distributes a version of cpio from SysV.2 with OS 3.0.
I found an interesting problem:

If one tried to read the tape directly, lo and behold it was byteswapped
so cpio complained. Fair enough. In the manual page for cpio, it
explicitly warns of byteswapped cpio tapes, and also warns that the `-s'
option will not help because it only swaps data bytes, and not those in the
header. The cure, as prescribed, is to dd(1) the contents first with the
`conv=swab' option to swap all the bytes, including the header, before
feeding to cpio (with the `-s' option set). As I was only interested in
a table of contents, I merely tried to get one via the `-t' and `-v' options
to produce an `ls -l'-like output. In doing so, I discovered that swapping
all the bytes made cpio happy, yet somehow the filenames were still coming
out byteswapped!!

Example:

% dd if=/dev/rmt0 ibs=10b conv=swab | cpio -istvBm | head -15
 40775 sys       0 Oct 15 18:55:51 1983
 40775 uucp      0 Nov  4 12:32:17 1983 da
 40775 uucp      0 Apr 26 07:35:16 1982 da/mcatc
 40775 uucp      0 Jan 28 11:19:44 1982 da/mcatcn/ti
 40775 uucp      0 Jan 28 11:19:44 1982 da/mcatcf/siac
 40775 uucp      0 Jan 28 11:19:44 1982 da/mcatcs/mua
100600 root      0 Dec 31 21:00:00 1969 da/musol
100664 sys       0 Jun 10 07:05:33 1982 da/mrefrli
100664 uucp      0 Dec 31 21:00:00 1969 da/mapcc
 40775 uucp      0 Jun 10 09:21:13 1980 da/masc
 40775 sys       0 Nov  7 07:50:23 1983 ib
100775 sys   10148 Nov  5 19:16:31 1983 ib/ncpta
100775 sys   10760 Nov  5 17:14:27 1983 ib/napkc
100775 sys   10148 Nov  5 19:16:31 1983 ib/nnuapkc
100775 sys     964 Nov  5 19:32:05 1983 ib/nuuotk

I assumed that byteswapping everything would take care of the filenames
as well, but apparently they are in the `correct' order (for Suns &
680x0 architecture, at any rate) before the byteswap.

How this might have arisen???
Is it a bug in the way it (the tape) was written originally, or a bug
in cpio(1)? Or in the way a VAX writes char arrays?

I have implemented a `fix', based on this source version:

>#ifndef lint
>static char sccsid[] = "@(#)cpio.c 1.1 86/02/03 SMI"; /* from S5R2 1.17 */
>#endif

--------------
diff -l -cb /usr/src/sun/usr.bin/cpio.c /tmp/cpio.c
*** /usr/src/sun/usr.bin/cpio.c Mon Feb 3 23:58:42 1986
--- /tmp/cpio.c Wed Sep 3 20:10:23 1986
***************
*** 602,609
      }
      if(Cflag)
          readhdr(Hdr.h_name, Hdr.h_namesize);
!     else
!         bread(Hdr.h_name, Hdr.h_namesize);
      if(EQ(Hdr.h_name, "TRAILER!!!"))
          return 0;
      ftype = Hdr.h_mode & Filetype;
--- 602,611 -----
      }
      if(Cflag)
          readhdr(Hdr.h_name, Hdr.h_namesize);
!     else {
!         bread(Name, Hdr.h_namesize);
!         swab(Name, Hdr.h_name, (Hdr.h_namesize + 1) & ~001);
!     }
      if(EQ(Hdr.h_name, "TRAILER!!!"))
          return 0;
      ftype = Hdr.h_mode & Filetype;
--------------

The results of this fix:

% dd if=/dev/rmt0 ibs=10b conv=swab | cpio.fixed -istvBm | head -16
 40775 sys       0 Oct 15 18:55:51 1983 .
 40775 uucp      0 Nov  4 12:32:17 1983 adm
 40775 uucp      0 Apr 26 07:35:16 1982 adm/acct
 40775 uucp      0 Jan 28 11:19:44 1982 adm/acct/nite
 40775 uucp      0 Jan 28 11:19:44 1982 adm/acct/fiscal
 40775 uucp      0 Jan 28 11:19:44 1982 adm/acct/sum
100600 root      0 Dec 31 21:00:00 1969 adm/sulog
100664 sys       0 Jun 10 07:05:33 1982 adm/errfile
100664 uucp      0 Dec 31 21:00:00 1969 adm/pacct
 40775 uucp      0 Jun 10 09:21:13 1980 adm/sa
 40775 sys       0 Nov  7 07:50:23 1983 bin
100775 sys   10148 Nov  5 19:16:31 1983 bin/pcat
100775 sys   10760 Nov  5 17:14:27 1983 bin/pack
100775 sys   10148 Nov  5 19:16:31 1983 bin/unpack
100775 sys     964 Nov  5 19:32:05 1983 bin/uuto
100775 sys     357 Nov  5 17:32:42 1983 bin/scc

This looks a little more reasonable; but I don't know if it is a `fix'
or a `kludge to counteract a certain non-uniform condition'.
Any clarification would be appreciated.
--
Greg Earle    UUCP: sdcrdcf!smeagol!earle; attmail!earle
JPL           ARPA: elroy!smeagol!earle@csvax.caltech.edu
AT&T: +1 818 354 0876
I'm continually AMAZED at th'breathtaking effects of WIND EROSION!!
ggs@ulysses.UUCP (Griff Smith) (09/04/86)
> Recently I attempted to read the /usr tape from the System V Release 2.0
> distribution for VAX processors. It comes from AT&T in cpio(1)
> format, and Sun distributes a version of cpio from SysV.2 with OS 3.0.
> I found an interesting problem:
> If one tried to read the tape directly, lo and behold it was byteswapped
> so cpio complained. Fair enough. In the manual page for cpio, it
> explicitly warns of byteswapped cpio tapes, and also warns that the `-s'
> option will not help because it only swaps data bytes, and not those in the
> header. The cure, as prescribed, is to dd(1) the contents first with the
> `conv=swab' option to swap all the bytes, including the header, before
> feeding to cpio (with the `-s' option set)....
...
> % dd if=/dev/rmt0 ibs=10b conv=swab | cpio -istvBm | head -15
>  40775 sys       0 Oct 15 18:55:51 1983
>  40775 uucp      0 Nov  4 12:32:17 1983 da
...
> How this might have arisen??? Is it a bug in the way
> it (the tape) was written originally, or a bug in cpio(1)?
> Or in the way a VAX writes char arrays?

There are two kinds of byte swapping you might encounter: swapping
caused by ill-conceived tape controllers and swapping of binary fields
in cpio headers. You are trying to correct for the first, but you are
being bitten by the second. Based on your cpio command, the tape was
written without the "c" option, which means the headers are written in
machine-dependent binary instead of ascii. Character fields are in
normal order, however. The dd byte-swap trick is swapping the
correctly-written ascii data, which includes the file name.

As to why the tape wasn't written with -ocB: ask AT&T! My guess is
that the policy is to write a tape that is compatible with the machine
it is licensed to be read on.
--
Griff Smith    AT&T (Bell Laboratories), Murray Hill
Phone:         (201) 582-7736
UUCP:          {allegra|ihnp4}!ulysses!ggs
Internet:      ggs@ulysses.uucp
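The difference between binary header fields and character fields can be seen on a toy record. This is only a sketch under stated assumptions: the two-field layout and all the names below are hypothetical and far smaller than a real cpio header; the one real value is cpio's magic number, 070707 octal.

```c
#include <assert.h>
#include <stddef.h>

/* Assemble a 16-bit value from two tape bytes, most significant first,
 * as a big-endian reader such as a 680x0 would. */
static unsigned read_be16(const unsigned char *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Assemble the same two bytes least significant first, as a VAX would. */
static unsigned read_le16(const unsigned char *p)
{
    return ((unsigned)p[1] << 8) | p[0];
}

/* Swap adjacent byte pairs across a whole buffer, like dd conv=swab. */
static void swab_all(unsigned char *buf, size_t nbytes)
{
    size_t i;

    assert(nbytes % 2 == 0);    /* pairs only */
    for (i = 0; i + 1 < nbytes; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}

/* Hypothetical toy record as a little-endian writer would lay it out:
 * a 16-bit magic (070707 == 0x71C7), then the name "bin" with its NUL. */
static const unsigned char toy_record[6] = {
    0xC7, 0x71, 'b', 'i', 'n', '\0'
};
```

Read as written, `read_be16` sees 0xC771 rather than the magic. After `swab_all` the magic reads correctly, but the name bytes are now `i b \0 n`, which prints as "ib", the same damage visible in the listing quoted above.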
earle@smeagol.UUCP (Greg Earle) (09/05/86)
In article <760@smeagol.UUCP>, earle@smeagol.UUCP I wrote:
> I have implemented a `fix', based on this source version:
> >#ifndef lint
> >static char sccsid[] = "@(#)cpio.c 1.1 86/02/03 SMI"; /* from S5R2 1.17 */
> >#endif
>
> --------------
> diff -l -cb /usr/src/sun/usr.bin/cpio.c /tmp/cpio.c
> *** /usr/src/sun/usr.bin/cpio.c Mon Feb 3 23:58:42 1986
> --- /tmp/cpio.c Wed Sep 3 20:10:23 1986
> ***************
> *** 602,609
>       }
>       if(Cflag)
>           readhdr(Hdr.h_name, Hdr.h_namesize);
> !     else
> !         bread(Hdr.h_name, Hdr.h_namesize);
>       if(EQ(Hdr.h_name, "TRAILER!!!"))
>           return 0;
>       ftype = Hdr.h_mode & Filetype;
>
> --- 602,611 -----
>       }
>       if(Cflag)
>           readhdr(Hdr.h_name, Hdr.h_namesize);
> !     else {
> !         bread(Name, Hdr.h_namesize);
> !         swab(Name, Hdr.h_name, (Hdr.h_namesize + 1) & ~001);
> !     }
>       if(EQ(Hdr.h_name, "TRAILER!!!"))
>           return 0;
>       ftype = Hdr.h_mode & Filetype;
> --------------

This should probably read:

diff -l -cb /usr/src/sun/usr.bin/cpio.c /tmp/cpio.c
*** /usr/src/sun/usr.bin/cpio.c Mon Feb 3 23:58:42 1986
--- /tmp/cpio.c Wed Sep 3 20:10:23 1986
***************
*** 602,609
      }
      if(Cflag)
          readhdr(Hdr.h_name, Hdr.h_namesize);
!     else
!         bread(Hdr.h_name, Hdr.h_namesize);
      if(EQ(Hdr.h_name, "TRAILER!!!"))
          return 0;
      ftype = Hdr.h_mode & Filetype;
--- 602,612 -----
      }
      if(Cflag)
          readhdr(Hdr.h_name, Hdr.h_namesize);
!     else {
!         bread(Name, Hdr.h_namesize);
!         if (Swap)
!             swap(Hdr.h_name, (Hdr.h_namesize + 1) & ~001);
!     }
      if(EQ(Hdr.h_name, "TRAILER!!!"))
          return 0;
      ftype = Hdr.h_mode & Filetype;
----------------

since you don't want to swap unless you are converting from a machine that
needs the data bytes swapped (`-s' option).

Further question: There are two undocumented options, `-S' and `-b', that
are supposed to be for "swap half words" and "swap both words". I'm not
sure under what circumstance these switches would be used (swap halfwords
only when 1/2 word != byte? Swap both when in `pass' mode?).
Any explanation?
--
Greg Earle    UUCP: sdcrdcf!smeagol!earle; attmail!earle
JPL           ARPA: elroy!smeagol!earle@csvax.caltech.edu
AT&T: +1 818 354 0876
Here I am in the POSTERIOR OLFACTORY LOBULE but I don't see CARL SAGAN anywhere!!
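For the kinds of swapping named in the question above, here is a sketch of what each would do to a single 4-byte word, taking "halfword" to mean 2 bytes. It only illustrates the terms. It makes no claim about what the undocumented `-S' and `-b' flags actually do in any particular cpio, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Swap the two bytes inside each 2-byte halfword:
 * b0 b1 b2 b3 -> b1 b0 b3 b2 */
static void swap_bytes(unsigned char w[4])
{
    unsigned char t;

    assert(w != NULL);
    t = w[0]; w[0] = w[1]; w[1] = t;
    t = w[2]; w[2] = w[3]; w[3] = t;
}

/* Swap the two halfwords inside the word:
 * b0 b1 b2 b3 -> b2 b3 b0 b1 */
static void swap_halfwords(unsigned char w[4])
{
    unsigned char t;

    assert(w != NULL);
    t = w[0]; w[0] = w[2]; w[2] = t;
    t = w[1]; w[1] = w[3]; w[3] = t;
}

/* Do both, which reverses the whole word:
 * b0 b1 b2 b3 -> b3 b2 b1 b0 */
static void swap_both(unsigned char w[4])
{
    swap_bytes(w);
    swap_halfwords(w);
}
```

Reversing all four bytes is the conversion a 32-bit integer needs between a purely little-endian and a purely big-endian layout; the other two are partial steps toward it.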
guy@sun.UUCP (09/06/86)
> The cure, as prescribed, is to dd(1) the contents first with the
> `conv=swab' option to swap all the bytes, including the header, before
> feeding to cpio (with the `-s' option set). As I was only interested in
> a table of contents, I merely tried to get one via the `-t' and `-v' options
> to produce an `ls -l'-like output. In doing so, I discovered that swapping
> all the bytes made cpio happy, yet somehow the filenames were still coming
> out byteswapped!!
> ...
> I assumed that byteswapping everything would take care of the filenames
> as well, but apparently they are in the `correct' order (for Suns &
> 680x0 architecture, at any rate) before the byteswap.
>
> How this might have arisen??? Is it a bug in the way
> it (the tape) was written originally, or a bug in cpio(1)?
> Or in the way a VAX writes char arrays?

The tape was written correctly. VAXes write "char" arrays the way any sane
machine does: if character N of an array goes onto frame M of the tape,
character N+1 goes onto frame M+1, etc. Any sane machine will also read
"char" arrays in the same way, so the character array "Kilroy was here"
will, if written to a tape by a sane little-endian machine, produce "Kilroy
was here" when read from that tape by any sane machine, whether big-endian
or little-endian. Thus, the filenames are in the right order, except on
insane machines that swap bytes on character strings when they write them
to tape (there are such machines out there, alas).

(BTW, "for Suns & 680x0 architecture" is redundant in this case; the 680x0
is big-endian regardless of whether it appears in a Sun or anything else.
The only machine I know of where its "endianness" is settable is the
WE32000 chip, and maybe the later chips in that family; the endianness of
that chip is settable from one of the pins on the chip, but I don't think
there are many of them running as little-endian machines, if any at all.)

The problem is that a "cpio" tape consists of three kinds of data:

	1) Headers. All the data in a header (unless the tape was written
	   with the "-c" option) are in the form of "short"s, and must be
	   byte-swapped if they are read on a machine with a different byte
	   order for "short"s.

	2) Pathnames. This is a "char" array, and must not have its byte
	   order changed.

	3) File contents. In general, this is either: text, which is, in
	   effect, a gigantic "char" array and must not have its byte order
	   changed, or binary data, which could require an arbitrarily
	   complex transformation, so simply changing the byte order is
	   unlikely to be useful.

"dd" will change the byte order of *all* the data on the tape; thus, the
headers will be read OK but everything else will be garbled. The System
III "cpio"'s "-b" option would swap the pathnames and the file contents,
leaving the headers alone; thus, you first run the tape through "dd" and
then through "cpio -b" to read it correctly. Obviously, the person who
implemented this realized that swapping most of the data on the tape twice
was far more efficient than swapping a small amount of it once.

Some bear of equally little brain decided to "fix" this for System V; they
realized that almost all files written to "cpio" tapes consist solely of
characters, "short"s, or "long"s, so there should be options to swap bytes,
halfwords, or both, and those options should apply *only* to the data.
Thus, there is now no way to swap *just* the headers by some combination
of "cpio" and "dd".

The correct fix - available in our next release, because it bit *me* when
trying to read in a "cpio" tape made on a VAX - is to check the "magic
number" in the header. If it is equal to a byte-swapped version of the
"cpio" magic number, then the tape is almost certainly a "cpio" tape
written on a machine of the opposite byte sex; "cpio" should then byte-swap
the header *and nothing else*.
This way, you don't have to worry about the byte sex of the machine on
which the tape was written (unless you're trying to transport binary data,
but in that case it's not a simple matter of byte sex anyway); "cpio" will
figure it out for you.

Ideally, the "-c" option should be used; that writes the header in a
printable ASCII format, just as "tar" did N years before the "cpio"
maintainers figured it out. Unfortunately, there is a bug in the System
III "cpio" that means that the "-c" format doesn't work right. Equally
unfortunately, the S5R2 distribution tape wasn't written with "-c" ("gee,
why should it be, if it's a VAX distribution tape people are going to read
it in on their VAX, right?").
--
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)
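The fix described in the post above, testing for a byte-swapped magic number and then swapping only the header, might look roughly like this. It is a sketch, not the code from any Sun release: the names are hypothetical, and the header is assumed to be the 13 16-bit fields of the old binary cpio header, handled here as raw bytes. The magic value 070707 (octal) is cpio's.

```c
#include <assert.h>
#include <stddef.h>

#define CPIO_MAGIC      070707u   /* binary header magic, octal */
#define CPIO_MAGIC_SWAP 0143561u  /* the same two bytes in the other order */
#define HDR_SHORTS      13        /* assumed: 13 16-bit fields before the name */

enum hdr_order { HDR_NATIVE, HDR_SWAPPED, HDR_NOT_CPIO };

/* Classify a header by its first 16-bit field, already loaded in the
 * reading machine's own byte order. */
static enum hdr_order classify_magic(unsigned magic)
{
    magic &= 0xFFFFu;
    if (magic == CPIO_MAGIC)
        return HDR_NATIVE;
    if (magic == CPIO_MAGIC_SWAP)
        return HDR_SWAPPED;
    return HDR_NOT_CPIO;
}

/* Swap the bytes of each 16-bit header field in place and nothing else.
 * The pathname and file data that follow the header are not touched. */
static void swap_header_only(unsigned char *hdr)
{
    size_t i;

    assert(hdr != NULL);
    for (i = 0; i < HDR_SHORTS * 2; i += 2) {
        unsigned char tmp = hdr[i];
        hdr[i] = hdr[i + 1];
        hdr[i + 1] = tmp;
    }
}
```

A reader would load the first field, call `classify_magic`, and on `HDR_SWAPPED` run `swap_header_only` over the 26 header bytes before interpreting them. Pathnames then come through exactly as they were written, with no `dd conv=swab` pass needed.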
gwyn@brl-smoke.ARPA (Doug Gwyn ) (09/06/86)
In article <760@smeagol.UUCP> earle@smeagol.UUCP (Greg Earle) writes:
>Any clarification would be appreciated.

There are actually two "cpio" modes. The "old original" one works
with archives that are machine-specific ("binary headers"). As you
discovered, it is an oversimplification to analyze the machine
architecture dependency in terms of "byte swapping".

The "new" cpio mode works with a machine-independent "ASCII header"
format. AT&T ships add-on software such as DWB in the portable
format, but older distributions were in machine-specific CPIO format.

Unfortunately the "cpio" default is binary header (for efficiency in
the typical use of "cpio" to copy directories, I suppose, as well as
for backward-compatibility reasons). One should be careful to specify
the "-c" option when writing archives for export to other sites.
("find" has a "-ncpio" option that makes ASCII-header CPIO archives,
by the way.)

Our version of "cpio" adds to the "out of phase" error a suggestion
that perhaps the "-c" option should be used, since this is often the
cause of that error.
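To show why the ASCII header is machine-independent, here is a sketch of reading one. It assumes the layout of the portable character header as commonly documented: eight 6-digit octal fields (magic, dev, ino, mode, uid, gid, nlink, rdev), an 11-digit mtime, a 6-digit name size and an 11-digit file size, 76 characters in all. The struct and function names are hypothetical, and only some of the fields are extracted.

```c
#include <assert.h>
#include <stddef.h>

#define ASCII_HDR_LEN 76  /* assumed: 8*6 + 11 + 6 + 11 octal digits */

struct ascii_hdr {        /* hypothetical; a subset of the fields */
    long long mode;
    long long mtime;
    long long namesize;
    long long filesize;
};

/* Parse `width` octal digits starting at p; return -1 on a bad digit. */
static long long parse_octal(const char *p, size_t width)
{
    long long v = 0;
    size_t i;

    for (i = 0; i < width; i++) {
        if (p[i] < '0' || p[i] > '7')
            return -1;
        v = v * 8 + (p[i] - '0');
    }
    return v;
}

/* Fill *h from the 76 header characters at buf.
 * Returns 0 on success, -1 if the magic or any parsed field is bad. */
static int parse_ascii_hdr(const char *buf, struct ascii_hdr *h)
{
    assert(buf != NULL && h != NULL);

    if (parse_octal(buf, 6) != 070707)
        return -1;
    h->mode     = parse_octal(buf + 18, 6);
    h->mtime    = parse_octal(buf + 48, 11);
    h->namesize = parse_octal(buf + 59, 6);
    h->filesize = parse_octal(buf + 65, 11);
    if (h->mode < 0 || h->mtime < 0 || h->namesize < 0 || h->filesize < 0)
        return -1;
    return 0;
}
```

Every field is a string of digits read one character at a time, so the result does not depend on how the reading machine orders the bytes of a "short" or "long".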