kathy@gsg.UUCP (Kathryn Smith) (10/24/85)
I've been running into some tape drive problems that I'm hoping some Unix wizard out there will be able to help me with. We are running 4.2BSD on a Vax 11/750 with a TU77 tape drive unit. We've been experiencing two problems which may or may not be related.

The first problem is that both the nightly and the level 0 dumps seem unable to cope with writing a full tape of information. Dump aborts on a tape write error consistently somewhere in the last 400 feet of a 2400 foot mag tape. This is apparently a software error, since no hardware diagnostics are showing up on the console. We have just recently upgraded to 4.2, but were experiencing the same thing on 4.1.

I tried instrumenting a copy of dump to find out what is going on, and found that the error is coming from the unix write primitive. The error code returned is 'I/O error' (enlightening). Right now we are functioning by running dump with the size option specifying a tape size of 2000 feet, but don't want to keep doing this for obvious reasons.

The other(?) problem is apparently a hardware problem. DEC Field Service was here about two weeks ago to do preventive maintenance on the Vax, and had problems with their tape diagnostics. We had three field service engineers here for two working days (during which we got absolutely nothing done because the system was down) trying to locate the problem. Their diagnostic consists of writing bit patterns on the tape, then reading them back and verifying them. The net result was that the diagnostic failed a few times when run on a very high quality tape which they use for calibrating and adjusting drives, significantly more often on a standard DEC mag tape, and slightly more often again on one of our own tapes (a BASF taken new out of the box that morning). They spent two full days taking the drive apart and replacing various parts, probing it with an oscilloscope, and then rerunning their diagnostics.

At the end of that time they concluded that the problem was probably in the controller rather than the tape drive itself, since they had tried replacing most of the parts in the tape drive, but that they didn't know exactly what it was, and would have to come back the next day with a controller kit to look into that. At that point, we decided that we would limp along with the drive as is, because time constraints on some projects meant we couldn't afford to be down any longer. Except for the dumps, the drive is only failing about 10% of the time, mostly on read operations.

I don't know if there is any connection between the failed diagnostics and the dump problems, but I am inclined to suspect so because the dumps began failing on 4.1 before there were any changes to the software, so the only variables involved should be the hardware and the tapes. I discount the tapes because the problem appears both with old tapes which had been used for dumps without difficulty, and with new tapes fresh out of the box.

I didn't get any response from DEC Field Service about the software problem. In fact, I'm not at all sure they even paid attention when I was describing it to them.

If there is anyone out there who has run into a similar problem, or even has educated guesses at what our problem might be, I would much appreciate hearing from you. If I get anything useful in response to this, I will summarize for the net.

Kathryn Smith
General Systems Group, Inc
Salem, NH
( ... decvax!gsg!kathy)
thomas@utah-gr.UUCP (Spencer W. Thomas) (10/27/85)
In article <117@gsg.UUCP> kathy@gsg.UUCP (Kathryn Smith) writes:
> The first problem is that both the nightly and the level 0 dumps seem
>unable to cope with writing a full tape of information. Dump aborts on a
>tape write error consistently somewhere in the last 400 feet of a 2400 foot
>mag tape. This is apparently a software error, since no hardware diagnostics
>are showing up on the console.
>
> I tried instrumenting a copy of dump to find out what is going on, and
>found that the error is coming from the unix write primitive. The error code
>returned is 'I/O error' (enlightening). Right now we are functioning by
>running dump with the size option specifying a tape size of 2000 feet, but
>don't want to keep doing this for obvious reasons.

Ah yes... Good old dump and its "tape estimating" feature. Dump thinks it knows how many blocks will fill a 2400 foot tape, but depending on your tape drive, it can be pretty far off. Looks like your drive writes larger inter-record gaps than dump thinks it "should". So, only "2000 feet" of data will fit on your tape. The "I/O error" means that the drive has seen the end-of-tape marker, and is not really an error at all, except that dump can't handle it intelligently.

You will have to keep on using a "tape size" of 2000 feet, because you're really using the whole tape. Watch it sometime, and see.

--
=Spencer ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
"When wrath runs rampage in your heart you must hold still that rambunctions tongue!" - Sappho
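The inter-record gap argument above can be made concrete with a little arithmetic. What follows is only a rough sketch, not dump's actual estimator: the function names are made up for illustration, and the figures used (1600 bpi recording, 10240-byte records, a nominal 0.6 inch gap) are assumptions that do not come from this thread. Each record occupies its data length plus one gap, so a larger gap means fewer records per reel.

```python
import math

def blocks_per_tape(length_feet, density_bpi, block_bytes, gap_inches):
    """Estimate how many fixed-size records fit on a tape of the given length.

    Every record is assumed to occupy block_bytes / density_bpi inches of
    data followed by one inter-record gap of gap_inches.
    """
    record_inches = block_bytes / density_bpi + gap_inches
    return math.floor(length_feet * 12 / record_inches)

def implied_gap_inches(length_feet, density_bpi, block_bytes, blocks_written):
    """Back out the average gap if blocks_written records filled the tape."""
    return length_feet * 12 / blocks_written - block_bytes / density_bpi

if __name__ == "__main__":
    # Assumed figures, not taken from the thread.
    DENSITY = 1600       # bits per inch per track, i.e. bytes per inch
    BLOCK = 10240        # bytes per tape record
    NOMINAL_GAP = 0.6    # inches

    nominal = blocks_per_tape(2400, DENSITY, BLOCK, NOMINAL_GAP)
    budget = blocks_per_tape(2000, DENSITY, BLOCK, NOMINAL_GAP)
    print("records the estimate allows for 2400 feet:", nominal)
    print("records the estimate allows for 2000 feet:", budget)

    # If the 2000-foot budget really does use up the whole 2400-foot reel,
    # this is the average gap the drive would have to be writing.
    gap = implied_gap_inches(2400, DENSITY, BLOCK, budget)
    print("implied average gap in inches:", round(gap, 2))
```

Under these assumed figures, the gap implied by a "2000 foot" dump filling a 2400 foot reel comes out several times the nominal value, which is one way to gauge whether the gap explanation is plausible for a particular drive.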
speck%cit-vlsi@CIT-VAX.ARPA (Don Speck) (10/27/85)
Half a year ago, our TU77 went through the same symptoms: errors on about every tenth tape. Cleaning the heads thoroughly would get it going only for another ten tapes.

Finally one night while cleaning the heads, I got out a mirror and flashlight to have a good look at the heads. They were pretty badly scored. And no wonder: in 4.5 years of daily dumps, some 15,000 miles of tape had ground off most of a millimeter. (Magtape is kinda like a rouge polishing belt.)

I told DEC that the heads were shot, and would they please replace them. Instead, they went through all the adjustments, replaced most of the boards, and had the VAX (our ARPAnet gateway) down half the day running diagnostics. Finally, they admitted defeat and ordered new heads.

The drive has worked fine ever since, and more importantly, the Field Service guy and his boss learned to listen to me.

Don Speck
speck@cit-vax.arpa