martin@adpplz.UUCP (Martin Golding) (06/14/91)
We're using unix V3.2, more or less: Motorola's release 2 on 88k's.

We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type system (at least, the tar headers match my sun manuals and tar from the Motorola doesn't work). The only tape drive we have is on one of the Motorola boxes, so what we figured was, we'd cat it to a file on the Motorola box and then rcp or rsh tar the puppy from the Sun.

When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks like we don't get all the data from the tape. We get the same effect if we rsh cat <the device>.

My hypothesis is, we're using the wrong driver (either it doesn't buffer internally, or the buffers it has are too small) or all the drivers have too small internal buffers.

What should we do? All suggestions appreciated, interesting experiments cheerfully undertaken.

Please help, the tape is from some furiners, and it could take us weeks to get another.

Martin Golding | sync, sync, sync, sank ... sunk:
Dod #0236 | He who steals my code steals trash.
A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp
boyd@prl.dec.com (Boyd Roberts) (06/14/91)
In article <803@adpplz.UUCP>, martin@adpplz.UUCP (Martin Golding) writes:
> We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type
> system (at least, the tar headers match my sun manuals and tar from
> the Motorola doesn't work). The only tape drive we have is on one of
> the Motorola boxes, so what we figured was, we'd cat it to a file
> on the Motorola box and then rcp or rsh tar the puppy from the Sun.
>
> When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks
> like we don't get all the data from the tape. We get the same effect
> if we rsh cat <the device>.

No, never do that. With 9 track tapes you must do I/O that will ensure that the _whole_ tape block will be read. Odds on that cat(1)'s blocksize is much less than 10k, and consequently each tape read returns part of the tape block. The tape will be positioned to read the next _tape block_ and the stuff you didn't read is lost.

Now, from what I've seen, streamers don't behave like this. But _all_ 9 track UNIX tape drives do. If you find one that doesn't -- it's broken.

So you want I/O's that are the same size as the tape block. Use tar directly or:

    dd if=/dev/r50t bs=10k of=/dir/bigfile

Boyd Roberts boyd@prl.dec.com
``When the going gets weird, the weird turn pro...''
cpcahil@virtech.uucp (Conor P. Cahill) (06/14/91)
martin@adpplz.UUCP (Martin Golding) writes:
>We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type
>system (at least, the tar headers match my sun manuals and tar from
>...
>When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks
>like we don't get all the data from the tape. We get the same effect
>if we rsh cat <the device>.

The problem is the blocking factor. You must read data from a 9-track drive in blocks that are at least as big as the block that was used to write the tape.

So to get the data off your tape use:

    dd if=/dev/r50t of=wherever bs=10k

--
Conor P. Cahill (703)430-9247 Virtual Technologies, Inc.
uunet!virtech!cpcahil 46030 Manekin Plaza, Suite 160
Sterling, VA 22170
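[Editor's sketch] The data loss described in the two replies above can be reproduced without a tape drive. The following is a toy model only: `SimulatedTape` and `copy_off` are hypothetical names, not part of any real driver or library, and the 1024-byte "cat-sized" buffer is an assumption chosen for illustration. It assumes, as the replies state, that each read consumes one whole tape block and that any part of the block that does not fit in the caller's buffer is discarded.

```python
class SimulatedTape:
    """Toy model of a raw 9-track tape: a list of records, one per read()."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0

    def read(self, nbytes):
        """Return up to nbytes of the next record; the rest is discarded."""
        if self.pos >= len(self.records):
            return b""              # end of tape
        record = self.records[self.pos]
        self.pos += 1               # the tape always advances a whole record
        return record[:nbytes]


def copy_off(tape, bufsize):
    """Read the tape to the end with a fixed buffer size, like cat or dd."""
    chunks = []
    while True:
        chunk = tape.read(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


RECORD = 10 * 1024
records = [bytes([i]) * RECORD for i in range(5)]    # five 10k records

small = copy_off(SimulatedTape(records), 1024)       # buffer smaller than a record
full = copy_off(SimulatedTape(records), RECORD)      # buffer matches the record size

print(len(small), "bytes recovered with 1k reads")
print(len(full), "bytes recovered with 10k reads")
```

In this model the small-buffer copy keeps only the first 1024 bytes of every record, which matches the symptom in the original question: the file is created without any error, but most of the data is missing.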
martin@adpplz.UUCP (Martin Golding) (06/15/91)
I said:
>We got a 9 track tape in tar format, with 10k blocks...
>When we cat it to a file it looks like we don't get all the data.
> What should we do?

And already I have the correct answer (2 copies, no flames, thank you all)

"cat" has too small a buffer, use "dd" to get the data off the tape.

Thanks, and I will now go back and re-read everything I have about unix files and tape handling. (_Training_? We didn't have time for _training_, we had to start _coding_.)

Thanks again, I'm off to do the makes..

Martin Golding | sync, sync, sync, sank ... sunk:
Dod #0236 | He who steals my code steals trash.
A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp
torek@elf.ee.lbl.gov (Chris Torek) (06/19/91)
In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
>No, never do that. With 9 track tapes you must do I/O that will
>ensure that the _whole_ tape block will be read. ...

It seems to me that the tape driver should return an error if you ask for 1K and the tape drive reads 10K. Unfortunately, there is no obvious errno for this (ENOMEM? EINVAL? E2BIG? EFBIG? EMSGSIZE? ENOBUFS?).
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA Domain: torek@ee.lbl.gov
boyd@prl.dec.com (Boyd Roberts) (06/19/91)
In article <14433@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes:
> It seems to me that the tape driver should return an error if you
> ask for 1K and the tape drive reads 10K. Unfortunately, there is
> no obvious errno for this (ENOMEM? EINVAL? E2BIG? EFBIG? EMSGSIZE?
> ENOBUFS?).

I saw one driver hacked to return the amount not read. No, not one of my hacks.

I'm not sure whether it was such a good idea though. Programs that blindly believe write() > 0 is ok just won't work. Smart archivers could benefit from it, but I think the cost of broken programs would be too high.

Boyd Roberts boyd@prl.dec.com
``When the going gets weird, the weird turn pro...''
martin@adpplz.UUCP (Martin Golding) (06/21/91)
Here I am again, and thanks to all who replied; we used dd and it works just like you said.

>In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
>(Boyd Roberts) writes:
>>No, never do that. With 9 track tapes you must do I/O that will
>>ensure that the _whole_ tape block will be read. ...

In <14433@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>It seems to me that the tape driver should return an error if you
>ask for 1K and the tape drive reads 10K.

Ummmm. Please confine flames to email. (Takes deep breath. Prepares to meet ancestors.)

How about having the tape driver return the data?

You know, just like the disk driver, and the terminal driver, and the ethernet driver, and the printer driver. I _never_ told the system what blocksize my files are, and when I read them I get _every single byte_.

When we went to our unix introduction class (not _training_. We don't have time for _training_.) the teacher gave us the religious incantation that "everything's just a byte stream, and they're all just the same". Which is why we screwed up in the first place.

Streamers need fixed _buffering_ independent of block size, that ought to be a function of the driver or the controller. If you don't want to have permanent large buffers in the tape driver or controller, you could use the P*ck idiom for setting up the tape drive before a read or write process (effectively an ioctl that defines block size, buffering, density and any other exciting features) instead of the unix naming convention.

Vast perverse heresy: If you built a streams tape driver, you could handle multiple volumes and arbitrary kinds of labeling, independently of your process! just like the hype says.

Like I said, flames to email. I promise to grovel with satisfactory abject humility.

Martin Golding | sync, sync, sync, sank ... sunk:
Dod #0236 | He who steals my code steals trash.
A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp
torek@elf.ee.lbl.gov (Chris Torek) (06/22/91)
>>In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
>>(Boyd Roberts) writes:
>>>No, never do that. With 9 track tapes you must do I/O that will
>>>ensure that the _whole_ tape block will be read. ...

>In <14433@dog.ee.lbl.gov> I suggested:
>>It seems to me that the tape driver should return an error if you
>>ask for 1K and the tape drive reads 10K.

In article <829@adpplz.UUCP> martin@adpplz.UUCP (Martin Golding) writes:
>How about having the tape driver return the data?

This is a great idea ... but it just will not work, not in conventional Unix contexts.

>You know, just like the disk driver, and the terminal driver, and the
>ethernet driver, and the printer driver.

The problem is somewhat different from disks, not generally applicable to terminals, and entirely applicable to Ethernets (for which raw device read() system calls generally do not exist). Printers generally do not return data to the system and are rather irrelevant.

>I _never_ told the system what blocksize my files are,

Files? Tape blocks are not files (nor are disk blocks); you do not mount the tape as a file system and open, close, read, write files on the `tape file system'. (There *are* some tape devices that can support this; indeed, 9 track tapes, when extended gaps are used, are to some extent `block addressable'. Most 9 track tapes are not written with extended gaps.)

>Streamers need fixed _buffering_ independent of block size,

Streamers? Who said anything about streamers?

>Vast perverse heresy: If you built a streams tape driver, you could
>handle multiple volumes and arbitrary kinds of labeling, independently of
>your process! just like the hype says.

And indeed, if you did this you could exchange tapes with your Unix buddies, and so forth. But then the day comes when someone hands you a `foreign' tape. (ominous background music)

Seriously: The interface we are using here is the `raw' device interface.
If you talk to a raw disk, the driver forces you to use the disk's block size: reading or writing one byte from /dev/rdk3c will fail. (On some Unix boxes, it fails by destroying most of the sector, rather than returning an error: not pretty.)

Nine track tapes have `records'. The records show through on the raw device, because it *is* the raw device. The records have variable sizes, and in fact do change size. In order to copy a 9 track tape you must retain not only the data, but also the block sizes. Foreign machines actually *use* this stuff, for some reason.

The Unix raw device semantics, inasmuch as there are any defined semantics at all, are that each read() or write() system call translates to a single device operation. Hence, when you write() 4096 bytes to a raw 9 track tape, the tape driver tells the tape formatter to write one 4096-byte record. Likewise, when you read() 4096 bytes from a raw 9 track tape, the driver tells the formatter to read one 4096-byte record. If the record under the tape drive's read head just happens to be 10240 bytes, rather than 4096 bytes, the formatter will THROW AWAY the `extra' 6144 bytes. It is gone; the driver never sees it. Typically, all the driver sees is a flag bit in the transfer status, `record length short': `I threw away some of your data. Sorry.'

Disk drivers do not have this problem, because disk sectors have a fixed size that is known in advance.%

[%Ignore those IBM drives behind the curtain!]

Of course, the driver could backspace the tape and reissue the read, asking for more data. There are two problems:

 a) the driver does not know how *much* more data to read;
 b) the driver does not have a place to put the extra data anyway.

You are using the raw interface, not a buffering interface; there is nowhere to stash the leftover data.

You can use the block device, and go through the block device buffer system. However, it generally has some particular size it expects, or some particular range of sizes.
Typically this is 512 bytes or some multiple thereof, usually up to 8192 bytes, sometimes 16384 bytes; on a few systems, the block device buffers will even handle 65536 bytes. 9 track tape records are typically 10240 or 32768 bytes long, and hence often will not fit anyway.

The problem could be solved by adding a whole new abstraction (a `tape' interface with large buffers that, on read, may be only partially filled), but Unix systems generally get away without this.

Why are tty interfaces different? Well, first, you are not using the raw device (not even in `raw' mode). Ttys are regular enough, and well-enough understood, to slap an abstraction over top of them and ignore the gritty details of which bits are mark and which are space. This *does* sometimes cause problems; there are people who need particular timing sequences of marking and spacing, and there are interfaces that can do it, with Unix boxes that cannot. But it is not often a problem (unlike 9-track tape exchange, where little sanity reigns).

(Note that POSIX spent time wrangling over the tty interface, even though they started with the System III stuff, which was clearly a better control abstraction than the V7 stuff found in 4.[123]BSD. Even the well-defined ttys are not well-enough defined for some.)

How about Ethernets? Well, not many Unix systems let you open /dev/en0 and read() from it. If you could, and if you asked for ten bytes, and 1536 bytes showed up, the driver would have to save them somewhere, because there is no going back. Fortunately, in this case, there is an easy maximum (1536) and the software abstraction involves protocol demultiplexing already, so already the software must read into private buffers, and can make whatever arrangements it likes. If there were a raw Ethernet interface, though, it might well be best if it required 1536-bytes-or-more on each read() system call. Certainly it should be able to tell you whether you lost something.
As it is, the only way a tape driver can do this now is to return an error. Most do not even bother: and when you copy your tape with

    dd if=/dev/rmt8 of=/dev/rmt9 bs=10k

but the record size was 32k, you never even know that your copy is useless.

Basically, then, you have two choices:

 a) Throw a lot of code into the kernel to add `cooked tape devices',
    somewhat like cooked ttys. You will probably have to leave raw
    tape devices in anyway, for tape exchange purposes.

 b) Leave the ugly semantics of 9-track tapes exposed through the raw
    interface, and let those programs that deal with tapes, also deal
    with the Outside World.

For some reason, most people seem to go for choice (b).
--
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA Domain: torek@ee.lbl.gov
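[Editor's sketch] The silent failure in the dd example above (10k reads against 32k records) can be illustrated with a toy model. Everything here is hypothetical: `SimulatedTape`, `read_records`, `suspect_reads` and the 64k `MAX_RECORD` ceiling are assumptions made for the sketch, not properties of any real driver. The model follows the semantics described in the post: one read() is one record, and whatever does not fit in the buffer is thrown away without an error.

```python
class SimulatedTape:
    """Toy raw tape: each read() consumes exactly one record."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0

    def read(self, nbytes):
        if self.pos >= len(self.records):
            return b""                  # end of tape
        record = self.records[self.pos]
        self.pos += 1
        return record[:nbytes]          # excess bytes are silently lost


def read_records(tape, bufsize):
    """Return one chunk per read(), preserving the record boundaries seen."""
    chunks = []
    while True:
        chunk = tape.read(bufsize)
        if not chunk:
            return chunks
        chunks.append(chunk)


def suspect_reads(chunks, bufsize):
    """Indexes of reads that filled the buffer completely.

    A completely full buffer cannot be told apart from a truncated record.
    """
    return [i for i, chunk in enumerate(chunks) if len(chunk) == bufsize]


MAX_RECORD = 64 * 1024      # assumption: no record on this tape is larger

tape_records = [b"A" * 32768, b"B" * 32768, b"C" * 512]

too_small = read_records(SimulatedTape(tape_records), 10 * 1024)
safe = read_records(SimulatedTape(tape_records), MAX_RECORD)

print([len(c) for c in too_small], suspect_reads(too_small, 10 * 1024))
print([len(c) for c in safe], suspect_reads(safe, MAX_RECORD))
```

Reading with a buffer larger than any record that could plausibly be on the tape recovers both the data and the record lengths, which is what a faithful tape copy needs to keep. With the 10k buffer, every long record comes back as an apparently normal full read, so the only hint of trouble is that the buffer was filled exactly.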
gwyn@smoke.brl.mil (Doug Gwyn) (06/22/91)
In article <14585@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
-Basically, then, you have two choices:
- a) Throw a lot of code into the kernel to add `cooked tape devices',
- somewhat like cooked ttys. You will probably have to leave raw
- tape devices in anyway, for tape exchange purposes.
- b) Leave the ugly semantics of 9-track tapes exposed through the raw
- interface, and let those programs that deal with tapes, also deal
- with the Outside World.
-For some reason, most people seem to go for choice (b).
I've used UNIX systems that implemented both. The "cooked" tape device
was virtually never used.
I agree with the assessment that raw devices are not mere byte streams
and that record boundaries do matter for raw devices.
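
[Editor's sketch] Choice (a), a `cooked' tape device, amounts to a buffering layer between record-at-a-time reads and byte-stream reads. The following user-level model is an illustration only: `SimulatedTape` and `CookedTape` are hypothetical names, and `max_record` is an assumed upper bound on the record size, which a real implementation would have to obtain somehow.

```python
class SimulatedTape:
    """Toy raw tape: each read() consumes exactly one record."""

    def __init__(self, records):
        self.records = list(records)
        self.pos = 0

    def read(self, nbytes):
        if self.pos >= len(self.records):
            return b""                  # end of tape
        record = self.records[self.pos]
        self.pos += 1
        return record[:nbytes]          # excess bytes are silently lost


class CookedTape:
    """Byte-stream view layered over record-at-a-time reads."""

    def __init__(self, raw, max_record):
        self.raw = raw
        self.max_record = max_record    # assumed bound on any record's size
        self.leftover = b""             # bytes read from tape, not yet returned

    def read(self, nbytes):
        # Pull whole records until the request can be satisfied or the tape ends.
        while len(self.leftover) < nbytes:
            record = self.raw.read(self.max_record)
            if not record:
                break
            self.leftover += record
        data = self.leftover[:nbytes]
        self.leftover = self.leftover[nbytes:]
        return data


records = [bytes([i]) * 10240 for i in range(3)]
cooked = CookedTape(SimulatedTape(records), max_record=64 * 1024)

received = b""
while True:
    piece = cooked.read(1000)           # deliberately smaller than a record
    if not piece:
        break
    received += piece

print(len(received), "bytes received through 1000-byte reads")
```

The cooked layer delivers every byte regardless of the caller's read size, but the record boundaries disappear in the process. That is consistent with the point made in the thread: a raw interface still has to exist for programs that must preserve record sizes when exchanging tapes.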