ggs@ulysses.homer.nj.att.com (Griff Smith) (02/24/88)
In article <2338@umd5.umd.edu>, chris@trantor.umd.edu.UUCP writes:
> >In article <2323@umd5.umd.edu> I asked:
> >>Is the 4.3BSD [TU78] driver wrong once or twice?
>
> It appears I insulted the driver unnecessarily.  I have yet to find
> a bug here, but...
>
> In article <10102@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com
> (Griff Smith) writes:
> >The driver, as released in 4.3BSD, does have a few bugs - not in error
> >recovery that I know of.
>
> ... I spotted a nice bug in mtustart...
> --
> In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163

If you had looked in your archives of comp.bugs.4bsd you would have found the report of this "nice bug"; I sent it to Berkeley on August 11, 1987 and also filed to netnews. That bug was introduced by someone at Berkeley sometime between 4.3 beta and 4.3 official. I also reported a race in the "open" code - I will take responsibility for not having noticed it when I overhauled the driver.

I don't want to start a war, but I am a bit miffed about this exchange of notes about the TU78 driver. I tried to do a careful job of quality testing, added comments to the code to explain my assumptions, went through a year's grief getting approval from my management to release the code, and then followed up with more bug reports when further problems (some beyond my control) appeared after the official release. I also left a mail address in the source code in case problems were discovered.

It would have been an act of simple courtesy to have asked me in private communication first. Not only would it have saved net bandwidth, but the problem would probably have been diagnosed faster and both of us would have avoided the embarrassment of a public shouting match. I particularly resent having my sentence "I posted fixes to comp.bugs.4bsd for the bugs that I found" removed from the above followup to my followup. This twisted the reply to say "your driver does too have bugs - look at this juicy one". I don't think I should have felt obliged to re-post the bug reports just to preempt this kind of jab.

I will try to continue to follow a policy of using private mail when questioning network articles. After an embarrassing exchange with Chris a few years ago, I think I have learned my lesson about public posting. It can be frustrating, however. A recent private exchange with Chris about difficulties taking his advice to port 4.3BSD "dump" to Sun work stations broke off with a comment that could be paraphrased as "this is left as a trivial exercise to the reader".

Chris, I respect your ability. You have made valuable contributions to the UNIX System software environment. How about giving the rest of us mortals some credit for intelligence?
--
Griff Smith
AT&T (Bell Laboratories), Murray Hill
Phone: 1-201-582-7736
UUCP: {allegra|ihnp4}!ulysses!ggs
Internet: ggs@ulysses.att.com
chris@trantor.umd.edu (Chris Torek) (02/24/88)
In article <10110@ulysses.homer.nj.att.com> ggs@ulysses.homer.nj.att.com (Griff Smith) writes:
>If [I] had looked in your archives of comp.bugs.4bsd you would have
>found the report of this "nice bug"; I sent it to Berkeley on August
>11, 1987 and also filed to netnews.

Unfortunately, my archives (such as they are) are on those tapes we have been having trouble reading. You did indeed post such a fix; I found it elsewhere since. And as you mention, that bug was introduced at Berkeley by A. Nonymous anyway.

[much deleted; see the previous article]

>It would have been an act of simple courtesy to have asked me in
>private communication first.

(That presupposes I would know where to ask.)

>... I particularly resent having my sentence "I posted
>fixes to comp.bugs.4bsd for the bugs that I found" removed from the
>above followup to my followup.  This twisted the reply to say "your
>driver does too have bugs - look at this juicy one".

I did not mean to do that. (For that matter, I know of no one who uses cooked /dev/mt devices anyway. Without a way to set the block size, and given the repositioning error on 9 track tapes, what good *are* block tape devices? They make terrible disk drives. Hence a bug in the block code is hardly juicy.)

At any rate, to get to the point (yes, there is one here), I had actually intended my previous followup as an apology. I just like having the last word :-) and was not careful about the wording of said words. Consider this another attempt.

By the way, we have concluded that the problem is in hardware. A handy nearby 6250 bpi drive is now successfully reading those tapes, and we have a call in to DEC to get the TU78 fixed.
--
In-Real-Life: Chris Torek, Univ of MD Computer Science, +1 301 454 7163
(hiding out on trantor.umd.edu until mimsy is reassembled in its new home)
Domain: chris@mimsy.umd.edu Path: not easily reachable
nessus@athena.mit.edu (Doug Alan) (02/27/88)
In article <2346@umd5.umd.edu> chris@trantor.umd.edu (Chris Torek) writes:
> (For that matter, I know of no one who uses cooked /dev/mt devices
> anyway.  Without a way to set the block size, and given the
> repositioning error on 9 track tapes, what good *are* block tape
> devices?  They make terrible disk drives.  Hence a bug in the block
> code is hardly juicy.)

I've used block tape devices a lot. We have many DEC TK50 streaming tape drives here (one came with every one of a couple hundred VS2's we received). The TK50 performs very very very slow and unreliably if it doesn't get to stream. The block device is double buffered, while the raw device is not. If the raw device is used with the TK50 drive, the tape drive doesn't stream. If the block device is used with the TK50 drive, the tape drive does stream, and is much much happier.

|>oug /\lan
jbs@eddie.MIT.EDU (Jeff Siegal) (02/28/88)
In article <3261@bloom-beacon.MIT.EDU> nessus@athena.mit.edu (Doug Alan) writes:
>The TK50 performs very very very slow and unreliably if it
>doesn't get to stream.  The block device is double buffered, while the
>raw device is not.  If the raw device is used [...], the
>tape drive doesn't stream.  If the block device is used [...],
>the tape drive does stream, [...].

In addition to the buffering, the block device forces an abysmally small block size (as Chris pointed out). This is a conventional way to make streaming tape drives stream (by reducing the tape data rate and density), and also a great way to cripple a tape subsystem.

A much better way to drive such devices is with raw, asynchronous I/O. Oh, Unix doesn't do that? Hmm, I thought there was this other operating system for VAX's, but I can't seem to remember the name right now...

Jeff Siegal
mangler@cit-vax.Caltech.Edu (Don Speck) (03/10/88)
In article <3261@bloom-beacon.MIT.EDU>, nessus@athena.mit.edu (Doug Alan) writes:
> If the block device is used with the TK50
> drive, the tape drive does stream, and is much much happier.

Somebody here made a similar observation about TU80's, so he did all his dumps to the block device. Sometime later he needed to do a restore, and all his tapes gave a premature EOF. Dump had been calculating tape capacity based on 10240 byte blocks, but the block device was writing 2048 byte blocks and it wouldn't all fit. The tape driver returned error, but because block-device writes are asynchronous, the completion status doesn't get returned to anybody, so he had no indication that writes were not getting done (except perhaps the long pause at end of tape).

The block device is for mounting filesystems. Read only. Which you'd probably only want to do if your tape drive is actually a WORM. Didn't work correctly in 4.2bsd, though.

Don Speck   speck@vlsi.caltech.edu   {amdahl,ames!elroy}!cit-vax!speck
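
[Editorial illustration, not part of the original post.] The capacity mismatch described above can be checked with rough arithmetic: every block on a 9-track tape is followed by an inter-record gap, so smaller blocks spend proportionally more tape on gaps. The sketch below is only an approximation. The 2400-foot reel length, 1600 bpi density, and 0.6-inch gap are assumptions chosen for illustration and are not stated in the post, and `tape_capacity_bytes` is a hypothetical helper, not anything taken from dump.

```python
def tape_capacity_bytes(block_bytes, tape_feet=2400, density_bpi=1600, gap_inches=0.6):
    """Approximate bytes that fit on a reel when each block is followed by a gap.

    All default values are illustrative assumptions, not measured figures.
    """
    inches_per_block = block_bytes / density_bpi + gap_inches
    whole_blocks = int(tape_feet * 12 / inches_per_block)
    return whole_blocks * block_bytes


# What dump planned for (10240-byte blocks) versus what the block
# device actually wrote (2048-byte blocks), under the assumptions above.
planned = tape_capacity_bytes(10240)
actual = tape_capacity_bytes(2048)
print("planned:", planned, "actual:", actual, "ratio:", round(actual / planned, 2))
```

Under these assumed figures the reel holds noticeably less data at 2048 bytes per block than dump's estimate, which is consistent with the tape running out before dump expected it to.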
nessus@athena.mit.edu (Doug Alan) (03/11/88)
In article <5719@cit-vax.Caltech.Edu> mangler@cit-vax.Caltech.Edu (Don Speck) writes:
>> [Doug Alan:] If the block device is used with the TK50 drive, the
>> tape drive does stream, and is much much happier.

> Somebody here made a similar observation about TU80's, so he did all
> his dumps to the block device.  [...]  Dump [calculated] tape
> capacity based on 10240 byte blocks, but the block device was
> writing 2048 byte blocks and it wouldn't all fit.

Yup, that will happen if you don't know what you are doing.

> [He didn't notice this until later, however, when] he needed to do a
> restore, and all his tapes gave a premature EOF.  [...]  The tape
> driver [had] returned error [when writing the tape], but because
> block-device writes are asynchronous, the completion status doesn't
> get returned to anybody, so he had no indication that writes were
> not getting done (except perhaps the long pause at end of tape).

Well, I don't know what kind of system you are using, but on our 4.3BSD systems, there is no such problem. 'Dump' receives errors such as these even when writing to the tape asynchronously using the block device. I know this for a fact because this very thing happened to me last night when I made a typo and used the block device when I had meant to use the raw device on a TU78. A while later, 'dump' stopped, complaining that there was a write error 1200 feet into the tape. The 2400 foot tape on the tape drive, however, was at the end.

There *are* also a few problems using 'restore' on the block device, but they can also be worked around. If 'restore' gets an error while reading the block device, it can't recover from the error and it just gives up. What you have to do is use 'dd' to read the tape, telling it not to stop on errors and to pad any incomplete blocks. You then pipe the output from 'dd' into 'restore'.

> The block device is for mounting filesystems.  Read only.  Which you'd
> probably only want to do if your tape drive is actually a WORM.  Didn't
> work correctly in 4.2bsd, though.

So you're saying that instead of using the block device to do dumps on the TK50 and getting dumps that worked, I should have used the raw device and gotten dumps that didn't work? (Using the raw device with the TK50 results in an order of magnitude increase in time and several orders of magnitude increase in error-rate.) Please explain the logic in that.

|>oug /\lan
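
[Editorial illustration, not part of the original post.] The 'dd' workaround described above amounts to reading fixed-size records, carrying on past read errors, and padding short records to full length (with dd itself this corresponds to the `noerror` and `sync` conversions). A minimal sketch of that idea follows. The names `read_padded` and `read_block` are hypothetical, the `max_errors` cutoff is an added assumption to avoid looping forever on a dead device, and this is not dd's actual implementation.

```python
import io


def read_padded(read_block, block_size, max_errors=100):
    """Yield fixed-size blocks, padding short reads and zero-filling failed reads.

    read_block(n) is a hypothetical callback: it returns up to n bytes,
    returns b"" at end of input, and raises OSError on a read error.
    """
    errors = 0
    while True:
        try:
            data = read_block(block_size)
        except OSError:
            errors += 1
            if errors > max_errors:
                raise  # give up rather than loop forever on a dead device
            yield bytes(block_size)  # stand in a block of NULs and keep going
            continue
        if not data:
            return  # end of input
        yield data.ljust(block_size, b"\0")  # pad an incomplete block with NULs


# Usage with an in-memory stand-in for the tape: 16 bytes read in 10-byte blocks.
source = io.BytesIO(b"A" * 10 + b"B" * 6)
blocks = list(read_padded(source.read, 10))
```

The point of padding is that the consumer ('restore' in the post) keeps seeing records of the size it expects, so one bad or short record costs only that record instead of aborting the whole read.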