[net.unix-wizards] Tape drive out to lunch

eichelbe@nadc.ARPA (10/09/85)

	---
	Has anyone on a VAX 11/780 under 4.1 BSD or 4.2 BSD UNIX ever had
a problem where the system all of a sudden acted like your tape drive no
longer existed?  I was running a tape job and everything was going along
fine.  Then all of a sudden, my job bombed.  The tape was not rewound.
Any "mt" commands met with:
	/dev/rmt12: No such device or address

It did not matter if I directed the "mt" commands at another /dev/rmtxx file,
either.  I got the same thing, but for that device file.

If I try to copy to the tape drive (cp .login /dev/rmt8) I get:
	cp: cannot create /dev/rmt8 

The files in /dev look fine.  I am the system administrator/manager, so no
one is playing with things unless there is a security hole.

I am under 4.1 BSD on a VAX 11/780.  Any ideas?

Thanks.
	Jon Eichelberger
	eichelbe@NADC
P.S. The last time this happened a reboot fixed it.  One day both the line
     printer and the tape drive went out to lunch the same way.  A reboot
     fixed that, too.

chris@umcp-cs.UUCP (Chris Torek) (10/11/85)

> ... Any "mt" commands met with:
>	/dev/rmt12: No such device or address

This generally happens when someone else has the drive open.  It
also occasionally happens when no one else has the drive open, but
the kernel believes otherwise.

I prefer kernels fixed so that "mt" says:

	/dev/rmt12: Mount device busy

While not perfect, this is unarguably better than "No such device
or address".

The required change is trivial.  Here is the one for the 4.2/4.3
TS11 driver.  Your line numbers will vary.

*** /tmp/,RCSt1005062	Thu Oct 10 23:46:25 1985
--- ts.c	Thu Oct  3 23:15:34 1985
***************
*** 180,185
  
  	tsunit = TSUNIT(dev);
! 	if (tsunit>=NTS || (sc = &ts_softc[tsunit])->sc_openf ||
! 	    (ui = tsdinfo[tsunit]) == 0 || ui->ui_alive == 0)
  		return (ENXIO);
  	if (tsinit(tsunit))

--- 196,200 -----
  
  	tsunit = TSUNIT(dev);
! 	if (tsunit >= NTS || (ui = tsdinfo[tsunit]) == 0 || ui->ui_alive == 0)
  		return (ENXIO);
  	sc = &ts_softc[tsunit];
***************
*** 183,186
  	    (ui = tsdinfo[tsunit]) == 0 || ui->ui_alive == 0)
  		return (ENXIO);
  	if (tsinit(tsunit))
  		return (ENXIO);

--- 198,204 -----
  	if (tsunit >= NTS || (ui = tsdinfo[tsunit]) == 0 || ui->ui_alive == 0)
  		return (ENXIO);
+ 	sc = &ts_softc[tsunit];
+ 	if (sc->sc_openf)
+ 		return (EBUSY);
  	if (tsinit(tsunit))
  		return (ENXIO);
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu
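
For readers who want to see the effect of the change above without a
kernel at hand, here is a minimal user-space sketch of the check order
in the patched open routine.  It is not driver code: struct unit,
NUNITS, model_open and model_close are hypothetical names invented for
this illustration, and the tsinit() step of the real driver is left
out.  The only point is that a missing or unconfigured unit still
yields ENXIO, while a unit that exists but is already open now yields
EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUNITS 2	/* hypothetical unit count; stands in for NTS */

struct unit {		/* hypothetical; loosely mirrors ui_alive and sc_openf */
	int alive;	/* nonzero if autoconfiguration found the drive */
	int openf;	/* nonzero while some process holds the drive open */
};

/*
 * Model of the patched open: returns 0 on success or an errno value.
 * The "does it exist" test comes first, the "is it busy" test second.
 */
int
model_open(struct unit *units, int n)
{
	assert(units != NULL);
	if (n < 0 || n >= NUNITS || !units[n].alive)
		return (ENXIO);		/* no such unit, or never configured */
	if (units[n].openf)
		return (EBUSY);		/* unit exists but is already open */
	units[n].openf = 1;		/* mark the unit as held open */
	return (0);
}

/* Model of close: releases the exclusive-open flag. */
void
model_close(struct unit *units, int n)
{
	assert(units != NULL);
	if (n >= 0 && n < NUNITS)
		units[n].openf = 0;
}
```

The block deliberately has no main(); calling model_open() twice on the
same live unit without an intervening model_close() returns 0 and then
EBUSY, which is the case the original driver reported as ENXIO.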

irwin@uiucdcs.CS.UIUC.EDU (10/11/85)

It sounds like a hardware problem, but I cannot make much of an
evaluation without knowing what is in your Unibus slots, and in
which order.

If you can figure out the offset and examine the tape controller
status register, it may tell you something.  Also, you don't say
what handles the lp (an lp-11 or something else); I need to know
more about your hardware.

If you try to examine the tape controller registers, and find
that it is not out there, you will have learned an important fact.

rsk@pucc-k (Wombat) (10/11/85)

In article <2018@brl-tgr.ARPA> eichelbe@nadc.ARPA writes:
>	Has anyone on a VAX 11/780 under 4.1 BSD or 4.2 BSD UNIX ever had
>a problem where the system all of a sudden acted like your tape drive no
>longer existed?  I was running a tape job and everything was going along
>fine.  Then all of a sudden, my job bombed.  The tape was not rewound.
>Any "mt" commands met with:
>	/dev/rmt12: No such device or address

A guess:

The tape drive did not get auto-config'd in on your last reboot; it was
probably disconnected from the bus or something when the machine came up,
and so the probes never found it.  You should be able to verify this by
comparing your console listing for this boot with a listing for a boot
where the tape drive was indeed found.  Reconnect the drive and reboot,
and everything should be fine.
-- 
Rich Kulawiec	rsk@pur-ee.uucp rsk@purdue.uucp rsk@purdue-asc.arpa

root%bostonu.csnet@CSNET-RELAY.ARPA (BostonU SysMgr) (10/15/85)

>(eichelbe@nadc)
>I was running a tape job and everything was going along
>fine.  Then all of a sudden, my job bombed.  The tape was not rewound.
>Any "mt" commands met with:
>	/dev/rmt12: No such device or address

>>(rsk@pucc-k)
>>A guess:
>>The tape drive did not get auto-config'd in on your last reboot;

Bad guess; try reading the article first.  If it hadn't gotten
auto-config'd, how would the 'tape job and everything (sic?) was going
along fine' have happened?  I believe Chris Torek answered this one, so
I won't belabor it further.

	-Barry Shein, Boston University

chris@pixutl.UUCP (chris) (10/16/85)

> I prefer kernels fixed so that "mt" says:
>
>	/dev/rmt12: Mount device busy

The precision of u.u_error is pretty poor. It should have 'Device busy',
'Write protect' and many more to accommodate drivers and to eliminate
the 'uprintf("No write ring\n");' and other kludges found in some kernels.

Chris
-- 

 Chris Bertin            :         (617) 933-7735 x2336
 Pixel Systems Inc.      :	   (800) 325-3342
 300 Wildwood street     :  {allegra|ihnp4|cbosgd|ima|genrad|amd|harvard}\
 Woburn, Ma 01801        :     !wjh12!pixel!pixutl!chris

chris@umcp-cs.UUCP (Chris Torek) (10/18/85)

Actually, under 4.3, the error message for EBUSY has been changed
to `Device busy' (from `Mount device busy'), which I like.  But
yes, the range of error codes is limited, though I am not certain
that this is a problem.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

mikel@codas.UUCP (Mikel Manitius) (10/19/85)

> 	/dev/rmt12: No such device or address

I had the identical problem a couple of years ago on a VAX 11/780
with 4.1bsd.  The problem is not in permissions anywhere: you will
notice that if you are using the tape drive and try to access it from
another terminal, you will get the same message.  When the tape drive
gets an I/O error for some reason, it will back up and try to read
the same data again, and it will repeat this several times if needed.
For some obscure reason, the driver may "hang" on this, and you cannot
kill the process; it just sits there (even if you try bringing the
system down, you will get a message "warning, some processes wouldn't
die...").  Since the process still has the device open, no other
process can access it, and the kernel just says that the device does
not exist.  I'm not sure if there is a fix for it, but rebooting the
VAX will free up the device again.
-- 
	Mikel Manitius  - ...{ihnp4|akguc|attmail|indra!koura}!codas!mikel

guy@sun.uucp (Guy Harris) (10/22/85)

> The precision of u.u_error is pretty poor. It should have 'Device busy',
> 'Write protect' and many more to accommodate drivers and to eliminate
> the 'uprintf("No write ring\n");' and other kludges found in some kernels.

True.  The lack of precision in error codes and the lack of enthusiasm by
UNIX programs (and programmers) for actually checking for errors and
printing reasonable messages (i.e., "perror"-style messages, at least) go
hand in hand.  The latter is probably as much a reason for those kludges as
the former (although Multics, if I remember correctly, had better error
codes and a fancier message printer, and its tape driver was amazingly
verbose, spitting out several lines of friendly greetings to let you know it
had opened a tape).

	Guy Harris
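
As an illustration of the "perror"-style checking mentioned above, the
following is a small sketch of a program-side helper that checks the
result of open() and reports the failing path together with the
system's text for the error code.  The names format_error and
open_checked are hypothetical, chosen only for this example; the text
printed for a given code (for instance EBUSY) comes from the system's
own message table and differs between releases, so none is shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "path: message" for the errno value err into buf,
 * in the style of perror().  Returns buf.
 */
char *
format_error(char *buf, size_t len, const char *path, int err)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s: %s", path, strerror(err));
	return (buf);
}

/*
 * Open path read-only.  On failure, print a perror-style message
 * to stderr and return -1 with errno left as open() set it.
 */
int
open_checked(const char *path)
{
	char msg[256];
	int fd, saved;

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		saved = errno;	/* save it before stdio can change it */
		fprintf(stderr, "%s\n",
		    format_error(msg, sizeof msg, path, saved));
		errno = saved;
		return (-1);
	}
	return (fd);
}
```

A caller would use open_checked("/dev/rmt12") in place of a bare open()
and treat -1 as "already reported"; how specific the message is still
depends entirely on which error code the driver chose to return.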