[comp.periphs.scsi] Warm swap

guineau@star.enet.dec.com (W. John Guineau) (04/22/91)

Open question with no personal opinions:

Is anyone using SCSI devices in a WARM SWAP mode? 

By swap, I mean physically removing a device from the bus and 
replacing it with another.

Some definitions for reference:

COLD SWAP: everything powered off, peripherals swapped, everything powered up.
WARM SWAP: everything powered on, NO activity on bus, peripherals swapped.
HOT  SWAP: everything powered on, activity on bus, peripherals swapped

What are the risks associated with this?

Electrical experts? SCSI experts? ANSI?

--                			
W. John Guineau                         grep meaning life | more
VMS Development
Digital Equipment Corporation		guineau@star.enet.dec.com

kaufman@neon.Stanford.EDU (Marc T. Kaufman) (04/22/91)

In article <22249@shlump.nac.dec.com> guineau@star.enet.dec.com (W. John Guineau) writes:

>Open question with no personal opinions:

>Is anyone using SCSI devices in a WARM SWAP mode? 
>WARM SWAP: everything powered on, NO activity on bus, peripherals swapped.

>What are the risks associated with this?

In my experience, the most common problem is mating the SCSI cable to the
connector.  Often, people angle the connectors to get them to mate at one
end first before seating them.  This can cause adjacent lines to short
together.  In the case of signal lines, the host or a peripheral may see an
unwanted command (such as bus reset).  In the case of the TERMPWR line, you
may blow the fuse (as happens on Macs).

Marc Kaufman (kaufman@Neon.stanford.edu)

scion@cs.utk.edu (Sam C. Nicholson II) (04/23/91)

In article <22249@shlump.nac.dec.com> guineau@star.enet.dec.com (W. John Guineau) writes:
>
>
>Is anyone using SCSI devices in a WARM SWAP mode? 
>
>COLD SWAP: everything powered off, peripherals swapped, everything powered up.
>WARM SWAP: everything powered on, NO activity on bus, peripherals swapped.
>HOT  SWAP: everything powered on, activity on bus, peripherals swapped
>
>What are the risks associated with this?
>

I was recently writing drivers for WORM drives and would frequently
swap drives around. Can't say as I recommend it as good for the health
of your running system.  The reconnection is going to generate a SCSI
reset.  Some OSs just will not tolerate them.*

I think that I  have treated devices on a SCSI every bit as cavalierly
as some C programmers treat pointers and integers and as hardware
hackers have treated  devices on a UNIBUS.  I have also panic'ed and
re-booted often.  As a powercycle time saver I do feel that I saved
more time than I lost.  Just remember where your terminators are.

It is difficult to say whether there is activity on the bus or not.  I
certainly would not trust the absence of LEDs or the sound of swishing
heads as a certain indicator of a quiescent bus.  If I had an emergency
swap, (e.g. Tape drive failed and I just had to get a crash dump or a
backup without a reboot ) I would halt the processor ( L1-A on my Sun,
^P on my VAX; your mileage may differ) and feel that that would halt
most activity on the bus, do the swap, and continue; hoping for the
best.

I don't believe that I have read anything about the SCSI bus that
would give me any confidence in the correctness of dis- and
re-connecting devices with regard to the electrical connections.  I
believe that the standards folk would not assume that activity as normal.

For production use, I would *strongly* recommend removable media
devices.  Sony, LMS, Bernoulli, and Syquest come to mind for disks;
Archive, Wangtek and Tandberg likewise for cartridge tapes.

GAWD, I ramble on...
-sam

--------------
* They feel that THEY (being the bus master) have an exclusive domain
over the reset line.  They are wrong, but they are in use.

ben@epmooch.UUCP (Rev. Ben A. Mesander) (04/23/91)

>In article <hamilton.672457277@kickapoo.cs.iastate.edu> hamilton@kickapoo.cs.iastate.edu (Jon Hamilton) writes:
>I'm amazed that you people would think of plugging / unplugging drives with
>the power on.  I assumed the first post was a joke, but I see that I was in
>error.  Do y'all plug/unplug boards with the power on too?  Are you _really_
>too lazy to restart your machines?  Are you _that_ willing to risk your data
>or even your hardware?  Weren't you ever taught that you don't do stuff to the
>inside of a puter with it on?!

There are legitimate reasons for such tomfoolery. I used to write
firmware for Imprimis. I would take my Wren 5's and plug and unplug them
from the bus. Of course I'd also pick them up while running and drop them...

I got to know the hardware pretty well, as in "tap that capacitor hard and
the drive will generate a servo error." It was a real handy way to test my
firmware under adverse conditions. At one time, I was writing a lot of
firmware to ensure drive survivability under such conditions.


--
| ben@epmooch.UUCP   (Ben Mesander)       | "Cash is more important than |
| ben%servalan.UUCP@uokmax.ecn.uoknor.edu |  your mother." - Al Shugart, |
| !chinet!uokmax!servalan!epmooch!ben     |  CEO, Seagate Technologies   |

david@talgras.UUCP (David Hoopes) (04/23/91)

In article <22249@shlump.nac.dec.com> guineau@star.enet.dec.com (W. John Guineau) writes:
>
>Is anyone using SCSI devices in a WARM SWAP mode? 

>WARM SWAP: everything powered on, NO activity on bus, peripherals swapped.

Some of the people around here do this on a regular basis.  Every so often
they have to replace fuses.  You get some really interesting errors when
the fuses are blown.  If you are not willing to replace fuses fairly 
frequently then DON'T DO IT.




-- 
---------------------------------------------------------------------
David Hoopes                              Tallgrass Technologies Inc. 
uunet!talgras!david                       11100 W 82nd St.          
Voice: (913) 492-6002 x323                Lenexa, Ks  66214        

acoolidg@wpi.WPI.EDU (Aaron P Coolidge) (04/24/91)

	Hi. I just tried a warm swap on my PC (386sx, 4MB, WD7000 FASST2 SCSI,
Quantum 105S). I tried adding my spare drive (a Miniscribe *ugh* 20M) while
the machine was up and running. Plugged the sucker into the external port,
tried a dir of the Quantum, it came up OK, tried a dir of the Miniscribe,
got an "invalid drive spec" error. Fine. I <ctrl> <alt> <del> 'd it, and
everything came up OK (I could read both drives). Then I unplugged the
Miniscribe, and tried a dir on the Quantum. Nothing- the machine just locked
up. Fine. I warm booted it again, and got an "int 19h boot failure".
So I reset it- another "int 19h boot failure". Terrific, I thought, I just
toasted the WD7000! Power off, wait 1 minute, power on, another "int 19h
boot failure". Wait a minute- maybe the boot block's corrupted! Boot the
machine off a floppy. Try a "dir c:". Get: "invalid drive spec". Lovely!
Try FDISK. What?! No partitions defined?! YES, IT'S TRUE!!! Ugh! Spend
an hour reformatting and reloading from tape.
	
	Can anyone shed any light on what may have happened here? All the 
partition info had been wiped off the Quantum (the Miniscribe was fine),
with the result that I couldn't get my data off! No big deal, but a pain none
the less. Was this due to the WD7000, or should I just power off before I
plug in/ remove SCSI devices? I guess I should!
	
	PS. I plug in and unplug SCSIs all the time with my Amiga, with it
on, and have had no problems yet.


-- 
Aaron Coolidge
               acoolidg@wpi.wpi.edu    bitnet:sorry, use a gateway.
        "I'm always in control of my car.  Well, at least 70% of the time." 

hamilton@kickapoo.cs.iastate.edu (Jon Hamilton) (04/24/91)

I'm amazed that you people would think of plugging / unplugging drives with
the power on.  I assumed the first post was a joke, but I see that I was in
error.  Do y'all plug/unplug boards with the power on too?  Are you _really_
too lazy to restart your machines?  Are you _that_ willing to risk your data
or even your hardware?  Weren't you ever taught that you don't do stuff to the
inside of a puter with it on?!


--
Jon Hamilton
hamilton@kickapoo.cs.iastate.edu
 " I feel a lot more like I do now that I did before I got here "
   - can't remember who

kaufman@neon.Stanford.EDU (Marc T. Kaufman) (04/24/91)

In article <hamilton.672457277@kickapoo.cs.iastate.edu> hamilton@kickapoo.cs.iastate.edu (Jon Hamilton) writes:

>                        Weren't you ever taught that you don't do stuff to the
>inside of a puter with it on?!

Depends on the computer.  All telephone stuff and some fault-tolerant systems
(like Tandem) are designed to have stuff plugged and unplugged with the power
on.  It's not overly much work to design PC compatible stuff that won't be
injured in a power-on plug-in =provided= you don't short the connector
traces.  It's too bad that most manufacturers would rather save $0.10 than
do it, though.

Things like SCSI busses, if implemented per the standard, are relatively
immune from gross hardware catastrophe because the drivers are current
limited.  Again, it's too bad that some folks who program the device firmware
don't do sanity checking if the bus accidentally wiggles.

Maybe the SCSI-3 standard should specify a minimum performance standard with
respect to bus errors.

Marc Kaufman (kaufman@Neon.stanford.edu)

dtb@adpplz.UUCP (Tom Beach) (04/25/91)

Another point which hasn't been mentioned is that many systems do
an autoconfig when powered up and any device that isn't on the bus
at power on isn't EVER on the bus, whether you add it physically or not.

In my testing I frequently swap devices with the SCSI powered. My
devices under test are external to the host with separate power
supplies and on a second SCSI bus. I power them down, swap devices,
power them back up. Do a dummy access to clear the 

ERROR: Device has been reset!

messages, and continue the test. 
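
The "dummy access" step can be sketched in code. This is only an
illustration, assuming the reset notice behaves like a condition that is
reported once on the first command after a reset and then clears. The
SimulatedDevice class, its methods, and DeviceResetError are hypothetical
names made up for the sketch, not any real driver interface.

```python
class DeviceResetError(Exception):
    """Raised once on the first access after a (simulated) device reset."""


class SimulatedDevice:
    """Hypothetical stand-in for a target on the bus; not a real driver API."""

    def __init__(self):
        self.reset_pending = False

    def power_cycle(self):
        # A power cycle leaves a pending "has been reset" condition behind.
        self.reset_pending = True

    def test_unit_ready(self):
        # The first command after a reset reports the condition and clears it.
        if self.reset_pending:
            self.reset_pending = False
            raise DeviceResetError("ERROR: Device has been reset!")
        return True


def clear_reset_condition(device, max_attempts=3):
    """Issue dummy accesses until one succeeds; return how many were needed."""
    for attempt in range(1, max_attempts + 1):
        try:
            device.test_unit_ready()
            return attempt
        except DeviceResetError:
            # Expected once after a swap; the next access should succeed.
            continue
    raise RuntimeError("device still reporting reset after retries")
```

A test harness would call clear_reset_condition() right after powering the
swapped device back up, so the real test never sees the one-time error.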

Except for the rare bus power fuse failure, this works great but on many
systems 

You Can't:

1) Add new devices!

2) Change device categories, e.g. replace a disk with a tape!
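
The two restrictions above can be expressed as a small check. This is a
sketch only, assuming the host keeps a table of what it found at power-on;
BootEntry, swap_allowed, and the "disk"/"tape" category strings are
hypothetical names used for illustration, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootEntry:
    """One row of a hypothetical boot-time autoconfig table."""
    scsi_id: int
    device_type: str  # e.g. "disk" or "tape"


def swap_allowed(boot_table, scsi_id, new_device_type):
    """Return (ok, reason) for swapping a device in at scsi_id."""
    by_id = {entry.scsi_id: entry for entry in boot_table}
    if scsi_id not in by_id:
        # Restriction 1: nothing was configured at this ID at power-on.
        return False, "ID was not present at power-on, so the host never sees it"
    if by_id[scsi_id].device_type != new_device_type:
        # Restriction 2: the host still expects the original category.
        return False, "device category differs from what was configured at boot"
    return True, "same ID and category as at boot"
```

For example, with a disk at ID 0 and a tape at ID 4 found at boot, swapping
another disk in at ID 0 passes, while a tape at ID 0 or anything at ID 5
is refused.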

As I said, just my two cents!

Tom

 ------------------------------------------------------------------------
|  Tom Beach : Sr Project Engineer : Mass Storage Technology             |
|  phone : (503) 294-1541                                                |
|  email : uunet : dtb@adpplz.uucp                                       |
|  ADP Dealer Services, ADP Plaza, 2525 S.W. 1st Ave, Portland OR, 97201 |
 ------------------------------------------------------------------------

chugins@hpcupt1.cup.hp.com (Chris Hugins) (04/27/91)

Some mid-range computers do allow peripherals to be powered on and off, and
even replaced without interrupting other users from performing their
tasks.  One example is the tape drive which may be seldom used.  Another
is that of a disk containing a private data set which may be moved from
machine to machine.  Not to mention the replacement/repair of a bad
peripheral.

Some database and manufacturing shops do not allow any (ANY) downtime.
Turning off the computer is not an option.

SCSI is not just for desk-tops anymore.  Unfortunately, the design of
the Small Computer System Interface (even "2") has not fully become
cognizant of this fact.

The ability to power-on and off devices without corruption of data 
across a common bus with other devices is important.  It is not 
completely clear if SCSI allows this, due to "noise" generated on
the bus at power-on (and OFF!) of scsi peripherals.  Some are "quieter"
than others, dependent upon the method of filtering at the line-drivers.

To remove/replace peripherals/cables on SCSI (even on a quiesced bus) 
is extremely risky.  Basically, it's "Do you feel lucky, punk?"

Maybe with SCSI-3....

Chris T. Hugins
chugins@hpisoa2.hp.com

ritchie@hpdmd48.boi.hp.com (David Ritchie) (04/29/91)

  Auspex has special gismos that do this in their file servers, so it
can be done.....

-- Dave Ritchie
ritchie@hpdmd48.boi.hp.com

Rob_Steven_Kramarz@cup.portal.com (05/05/91)

Experimentally, I have found that warm swap works consistently.  I have
never tried a hot swap and do not intend to since theoretically it is
disastrous.  If you do plan to do a warm swap, make sure that the bus
truly is inactive (unmount all file systems, or use a driver which
allows the on-off status of the bus to be toggled at the driver level).
My expertise in this area derives from our work at 1776, Inc. on
disk mirroring and disk array device drivers, where this question
is very germane to our customers.
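
The cold/warm/hot definitions from the start of the thread, combined with
the "make sure the bus is inactive" advice, can be sketched as a pre-swap
check. This is an illustration under stated assumptions: the counts of
mounted file systems and commands in flight are hypothetical inputs that
a driver or operator would have to supply, and zero of each is only a
proxy for a quiet bus, not proof of one.

```python
from enum import Enum


class SwapMode(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


def classify_swap(powered_on, mounted_filesystems, commands_in_flight):
    """Classify a planned swap using the thread's three definitions."""
    if not powered_on:
        return SwapMode.COLD
    if mounted_filesystems == 0 and commands_in_flight == 0:
        # Powered on, but nothing should be generating bus activity.
        return SwapMode.WARM
    return SwapMode.HOT


def safe_to_swap(powered_on, mounted_filesystems, commands_in_flight):
    """Permit cold and warm swaps only; refuse anything that would be hot."""
    mode = classify_swap(powered_on, mounted_filesystems, commands_in_flight)
    return mode is not SwapMode.HOT
```

With the system powered on and two file systems still mounted, the check
classifies the swap as hot and refuses it; after unmounting everything it
becomes a warm swap and is allowed.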