[comp.unix.questions] Can't cat tape- big blocks?

martin@adpplz.UUCP (Martin Golding) (06/14/91)

We're using unix V3.2, more or less: Motorola's release 2 on 88k's.

We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type 
system (at least, the tar headers match my sun manuals and tar from
the Motorola doesn't work). The only tape drive we have is on one of
the Motorola boxes, so what we figured was, we'd cat it to a file
on the Motorola box and then rcp or rsh tar the puppy from the Sun.

When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks
like we don't get all the data from the tape. We get the same effect
if we rsh cat <the device>.

My hypothesis is, we're using the wrong driver (either it doesn't buffer
internally, or the buffers it has are too small) or all the drivers have
too small internal buffers. 


What should we do? All suggestions appreciated, interesting experiments
cheerfully undertaken.

Please help, the tape is from some furiners, and it could take us weeks
to get another.


Martin Golding    | sync, sync, sync, sank ... sunk:
Dod #0236         |  He who steals my code steals trash.
A poor old decrepit Pick programmer. Sympathize at:
{mcspdx,pdxgate}!adpplz!martin or martin@adpplz.uucp

boyd@prl.dec.com (Boyd Roberts) (06/14/91)

In article <803@adpplz.UUCP>, martin@adpplz.UUCP (Martin Golding) writes:
> We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type 
> system (at least, the tar headers match my sun manuals and tar from
> the Motorola doesn't work). The only tape drive we have is on one of
> the Motorola boxes, so what we figured was, we'd cat it to a file
> on the Motorola box and then rcp or rsh tar the puppy from the Sun.
> 
> When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks
> like we don't get all the data from the tape. We get the same effect
> if we rsh cat <the device>.

No, never do that.  With 9 track tapes you must do I/O that will
ensure that the _whole_ tape block will be read.  Odds on that cat(1)'s
blocksize is much less than 10k, and consequently each tape read
returns part of the tape block.  The tape will be positioned to
read the next _tape block_ and the stuff you didn't read is lost.

Now, from what I've seen, streamers don't behave like this.  But _all_
9 track UNIX tape drives do.  If you find one that doesn't -- it's broken.

So you want I/O's that are the same size as the tape block.  Use tar directly or:

   dd if=/dev/r50t bs=10k of=/dir/bigfile
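
To see why small reads lose data, here is a toy model in Python. It is only
a sketch: `FakeTape` and `copy_out` are made-up names, the 1k request size
stands in for whatever cat(1) really uses, and a real driver has more going
on (tape marks, errors). It assumes the behaviour described above: each read
consumes one whole tape record and returns at most the requested byte count.

```python
# Toy model of a raw 9-track device: every read() consumes one whole tape
# record and hands back at most `size` bytes; the remainder is discarded.
# FakeTape is a hypothetical class for illustration, not a real driver API.

class FakeTape:
    def __init__(self, records):
        self.records = list(records)
        self.pos = 0

    def read(self, size):
        if self.pos >= len(self.records):
            return b""                      # end of data
        record = self.records[self.pos]
        self.pos += 1                       # tape moves past the whole record
        return record[:size]                # anything beyond `size` is lost


def copy_out(tape, bufsize):
    """Read until end of data with a fixed request size."""
    chunks = []
    while True:
        chunk = tape.read(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


RECORD = 10 * 1024
records = [bytes([i]) * RECORD for i in range(3)]   # three 10k records

small = copy_out(FakeTape(records), 1024)    # small reads (1k is an assumption)
full = copy_out(FakeTape(records), RECORD)   # reads sized like dd bs=10k

print(len(small), len(full))                 # prints "3072 30720"
```

Under these assumptions the 1k reader keeps only the first 1k of each
record, while the 10k reader recovers every byte.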

Boyd Roberts			boyd@prl.dec.com

``When the going gets weird, the weird turn pro...''

cpcahil@virtech.uucp (Conor P. Cahill) (06/14/91)

martin@adpplz.UUCP (Martin Golding) writes:

>We got a 9 track tape in tar format, with 10k blocks. It's from a BSD type 
>system (at least, the tar headers match my sun manuals and tar from
>...
>When we cat it to a file (cat /dev/r50t >/dir/bigfile), it looks
>like we don't get all the data from the tape. We get the same effect
>if we rsh cat <the device>.

The problem is the blocking factor.  You must read data from a 9-track
drive in blocks that are at least as big as the block that was used to
write the tape.  So to get the data off your tape use:

	dd if=/dev/r50t of=wherever bs=10k

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170

martin@adpplz.UUCP (Martin Golding) (06/15/91)

I said:

>We got a 9 track tape in tar format, with 10k blocks...

>When we cat it to a file it looks like we don't get all the data.
> What should we do? 

And already I have the correct answer (2 copies, no flames, thank you
all): "cat" has too small a buffer, use "dd" to get the data off the tape.

Thanks, and I will now go back and re-read everything I have about
unix files and tape handling.	
(_Training_? We didn't have time for _training_, we had to start _coding_.)

Thanks again, I'm off to do the makes..


torek@elf.ee.lbl.gov (Chris Torek) (06/19/91)

In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
(Boyd Roberts) writes:
>No, never do that.  With 9 track tapes you must do I/O that will
>ensure that the _whole_ tape block will be read. ...

It seems to me that the tape driver should return an error if you
ask for 1K and the tape drive reads 10K.  Unfortunately, there is
no obvious errno for this (ENOMEM? EINVAL? E2BIG? EFBIG? EMSGSIZE?
ENOBUFS?).
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

boyd@prl.dec.com (Boyd Roberts) (06/19/91)

In article <14433@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes:
> It seems to me that the tape driver should return an error if you
> ask for 1K and the tape drive reads 10K.  Unfortunately, there is
> no obvious errno for this (ENOMEM? EINVAL? E2BIG? EFBIG? EMSGSIZE?
> ENOBUFS?).

I saw one driver hacked to return the amount not read.  No, not one
of my hacks.  I'm not sure whether it was such a good idea though.

Programs that blindly believe write() > 0 is ok just won't work.
Smart archivers could benefit from it, but I think the cost
of broken programs would be too high.

martin@adpplz.UUCP (Martin Golding) (06/21/91)

Here I am again, and thanks to all who replied; we used dd and it works
just like you said.

>In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
>(Boyd Roberts) writes:
>>No, never do that.  With 9 track tapes you must do I/O that will
>>ensure that the _whole_ tape block will be read. ...

In <14433@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

>It seems to me that the tape driver should return an error if you
>ask for 1K and the tape drive reads 10K.

Ummmm. Please confine flames to email. (Takes deep breath. Prepares
to meet ancestors.)

How about having the tape driver return the data? You know, just like
the disk driver, and the terminal driver, and the ethernet driver,
and the printer driver. I _never_ told the system what blocksize my
files are, and when I read them I get _every single byte_.  When we
went to our unix introduction class (not _training_. We don't have
time for _training_.) the teacher gave us the religious incantation
that "everything's just a byte stream, and they're all just the same".
Which is why we screwed up in the first place.

Streamers need fixed _buffering_ independent of block size, that ought
to be a function of the driver or the controller. If you don't want to
have permanent large buffers in the tape driver or controller, you could
use the P*ck idiom for setting up the tape drive before a read or write
process (effectively an ioctl that defines block size, buffering, density
and any other exciting features) instead of the unix naming convention.

Vast perverse heresy: If you built a streams tape driver, you could
handle multiple volumes and arbitrary kinds of labeling, independently of
your process! just like the hype says.

Like I said, flames to email. I promise to grovel with satisfactory
abject humility.

torek@elf.ee.lbl.gov (Chris Torek) (06/22/91)

>>In article <1991Jun14.094822.7029@prl.dec.com> boyd@prl.dec.com
>>(Boyd Roberts) writes:
>>>No, never do that.  With 9 track tapes you must do I/O that will
>>>ensure that the _whole_ tape block will be read. ...

>In <14433@dog.ee.lbl.gov> I suggested:
>>It seems to me that the tape driver should return an error if you
>>ask for 1K and the tape drive reads 10K.

In article <829@adpplz.UUCP> martin@adpplz.UUCP (Martin Golding) writes:
>How about having the tape driver return the data?

This is a great idea ... but it just will not work, not in conventional
Unix contexts.

>You know, just like the disk driver, and the terminal driver, and the
>ethernet driver, and the printer driver.

The problem is somewhat different from disks, not generally applicable
to terminals, and entirely applicable to Ethernets (for which raw device
read() system calls generally do not exist).  Printers generally do not
return data to the system and are rather irrelevant.

>I _never_ told the system what blocksize my files are,

Files?  Tape blocks are not files (nor are disk blocks); you do not
mount the tape as a file system and open, close, read, write files on
the `tape file system'.  (There *are* some tape devices that can
support this; indeed, 9 track tapes, when extended gaps are used, are
to some extent `block addressable'.  Most 9 track tapes are not written
with extended gaps.)

>Streamers need fixed _buffering_ independent of block size,

Streamers?  Who said anything about streamers?

>Vast perverse heresy: If you built a streams tape driver, you could
>handle multiple volumes and arbitrary kinds of labeling, independently of
>your process! just like the hype says.

And indeed, if you did this you could exchange tapes with your Unix
buddies, and so forth.

But then the day comes when someone hands you a `foreign' tape.
(ominous background music)

Seriously:  The interface we are using here is the `raw' device
interface.  If you talk to a raw disk, the driver forces you to use the
disk's block size: reading or writing one byte from /dev/rdk3c will
fail.  (On some Unix boxes, it fails by destroying most of the sector,
rather than returning an error: not pretty.)

Nine track tapes have `records'.  The records show through on the
raw device, because it *is* the raw device.  The records have variable
sizes, and in fact do change size.  In order to copy a 9 track tape
you must retain not only the data, but also the block sizes.  Foreign
machines actually *use* this stuff, for some reason.
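
A copy that keeps the record sizes can be sketched like this in Python. The
reader and writer classes are hypothetical in-memory stand-ins, not a real
tape API, and the sketch assumes a single tape file (no tape marks) whose
records are all smaller than `max_record`, itself an arbitrary choice.

```python
# Hypothetical in-memory stand-ins for raw tape devices (not a real API).
class FakeTapeReader:
    def __init__(self, records):
        self._records = list(records)

    def read(self, size):
        if not self._records:
            return b""                        # end of data
        return self._records.pop(0)[:size]    # one record per read()


class FakeTapeWriter:
    def __init__(self):
        self.records = []

    def write(self, data):
        self.records.append(bytes(data))      # one record per write()
        return len(data)


def copy_tape(src, dst, max_record=65536):
    """Copy record by record; return the list of record sizes seen."""
    sizes = []
    while True:
        block = src.read(max_record)          # ask for more than any record
        if not block:
            break
        dst.write(block)                      # rewrite with the same length
        sizes.append(len(block))
    return sizes


original = [b"L" * 80, b"a" * 10240, b"b" * 10240, b"c" * 4096]
dst = FakeTapeWriter()
sizes = copy_tape(FakeTapeReader(original), dst)
print(sizes)                                  # prints [80, 10240, 10240, 4096]
```

Because each read asks for more than any record holds, the length returned
is the record length, and issuing one write per read reproduces it.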

The Unix raw device semantics, inasmuch as there are any defined
semantics at all, are that each read() or write() system call
translates to a single device operation.  Hence, when you write() 4096
bytes to a raw 9 track tape, the tape driver tells the tape formatter
to write one 4096-byte record.  Likewise, when you read() 4096 bytes
from a raw 9 track tape, the driver tells the formatter to read one
4096-byte record.  If the record under the tape drive's read head just
happens to be 10240 bytes, rather than 4096 bytes, the formatter will
THROW AWAY the `extra' 6144 bytes.  It is gone; the driver never sees
it.  Typically, all the driver sees is a flag bit in the transfer
status, `record length short': `I threw away some of your data.
Sorry.' Disk drivers do not have this problem, because disk sectors
have a fixed size that is known in advance.%  [%Ignore those IBM drives
behind the curtain!]
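
The arithmetic in that example, as a small Python sketch. The helper
`raw_read_outcome` is hypothetical; it only models the one-record-per-read
behaviour described here, not any real driver interface.

```python
def raw_read_outcome(record_len, request_len):
    """Return (delivered, discarded) byte counts for one raw read().

    Assumes the model described above: the formatter reads one whole
    record and drops whatever does not fit in the request.
    """
    delivered = min(record_len, request_len)
    discarded = record_len - delivered
    return delivered, discarded


print(raw_read_outcome(10240, 4096))    # prints (4096, 6144)
print(raw_read_outcome(10240, 10240))   # prints (10240, 0)
print(raw_read_outcome(4096, 10240))    # prints (4096, 0)
```

The last case is why a request larger than the record is harmless in this
model: the read just comes back short.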

Of course, the driver could backspace the tape and reissue the read,
asking for more data.  There are two problems:

 a) the driver does not know how *much* more data to read;
 b) the driver does not have a place to put the extra data anyway.

You are using the raw interface, not a buffering interface; there is
nowhere to stash the leftover data.

You can use the block device, and go through the block device buffer
system.  However, it generally has some particular size it expects, or
some particular range of sizes.  Typically this is 512 bytes or some
multiple thereof, usually up to 8192 bytes, sometimes 16384 bytes; on a
few systems, the block device buffers will even handle 65536 bytes.
Nine-track tape records are typically 10240 or 32768 bytes, and hence
often will not fit anyway.  The problem could be solved by
adding a whole new abstraction (a `tape' interface with large buffers
that, on read, may be only partially filled), but Unix systems
generally get away without this.

Why are tty interfaces different?  Well, first, you are not using the
raw device (not even in `raw' mode).  Ttys are regular enough, and
well-enough understood, to slap an abstraction over top of them and
ignore the gritty details of which bits are mark and which are space.
This *does* sometimes cause problems; there are people who need
particular timing sequences of marking and spacing, and there are
interfaces that can do it, with Unix boxes that cannot.  But it is not
often a problem (unlike 9-track tape exchange, where little sanity
reigns).  (Note that POSIX spent time wrangling over the tty interface,
even though they started with the System III stuff, which was clearly a
better control abstraction than the V7 stuff found in 4.[123]BSD.  Even
the well-defined ttys are not well-enough defined for some.)

How about Ethernets?  Well, not many Unix systems let you open /dev/en0
and read() from it.  If you could, and if you asked for ten bytes, and
1536 bytes showed up, the driver would have to save them somewhere,
because there is no going back.  Fortunately, in this case, there is an
easy maximum (1536) and the software abstraction involves protocol
demultiplexing already, so already the software must read into private
buffers, and can make whatever arrangements it likes.

If there were a raw Ethernet interface, though, it might well be best
if it required 1536-bytes-or-more on each read() system call.
Certainly it should be able to tell you whether you lost something.
As it is, the only way a tape driver can do this now is to return an
error.  Most do not even bother, and when you copy your tape with

	dd if=/dev/rmt8 of=/dev/rmt9 bs=10k

but the record size was 32k, you never even know that your copy is
useless.
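
One user-level heuristic, which is an assumption of this sketch and not
something the driver offers: treat any read that exactly fills the buffer
as suspect, since the record may have been longer. The Python below models
only the lengths a raw-style device would return; both function names are
hypothetical.

```python
def read_lengths(record_sizes, bufsize):
    """Lengths returned by one-record-per-read() semantics (a model only)."""
    return [min(size, bufsize) for size in record_sizes]


def possibly_truncated(lengths, bufsize):
    """Indexes of reads that filled the buffer exactly, hence suspect."""
    return [i for i, n in enumerate(lengths) if n == bufsize]


records = [32768] * 4                      # tape written with 32k records

bs10k = read_lengths(records, 10240)       # like bs=10k
bs64k = read_lengths(records, 65536)       # a buffer larger than any record

print(possibly_truncated(bs10k, 10240))    # prints [0, 1, 2, 3]
print(possibly_truncated(bs64k, 65536))    # prints []
```

The check gives a false alarm when a record is exactly the buffer size, so
it works best with a buffer chosen larger than any record expected.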

Basically, then, you have two choices:

 a) Throw a lot of code into the kernel to add `cooked tape devices',
    somewhat like cooked ttys.  You will probably have to leave raw
    tape devices in anyway, for tape exchange purposes.

 b) Leave the ugly semantics of 9-track tapes exposed through the raw
    interface, and let those programs that deal with tapes, also deal
    with the Outside World.

For some reason, most people seem to go for choice (b).

gwyn@smoke.brl.mil (Doug Gwyn) (06/22/91)

In article <14585@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
-Basically, then, you have two choices:
- a) Throw a lot of code into the kernel to add `cooked tape devices',
-    somewhat like cooked ttys.  You will probably have to leave raw
-    tape devices in anyway, for tape exchange purposes.
- b) Leave the ugly semantics of 9-track tapes exposed through the raw
-    interface, and let those programs that deal with tapes, also deal
-    with the Outside World.
-For some reason, most people seem to go for choice (b).

I've used UNIX systems that implemented both.  The "cooked" tape device
was virtually never used.

I agree with the assessment that raw devices are not mere byte streams
and that record boundaries do matter for raw devices.