[comp.unix.i386] Disk Mirroring

dtynan@altos86.Altos.COM (Dermot Tynan) (08/31/90)

In article <1990Aug27.183821.13518@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
> > Even "reliable" disks eventually die.
> 
> True.  So do reliable controllers.

I don't know what your hardware background is, but let me assure you that the
following statement is Law:

	MTBF(controllers) >> MTBF(disks)	..........................(i)

No-one can claim to produce a completely fault-free system.  Most of the
rhetoric is exactly that.  "Fault Tolerant", "Fault Resilient", etc.  No
matter what you do, as long as there is a probability (no matter how small),
of something failing, your system is not fault-free.  The whole idea behind
disk mirroring, is not to replace disk backups (which can also be faulty),
but to reduce the fault probability by a considerable margin.  In general
terms, if you want to make a system more resilient to failure, the first
place to look is in any non-solid-state system.  Ie, anything with moving
parts.  In the average system, this means the disk drives.  While mirroring
won't eradicate the probability of failure, it will reduce it considerably.
At least from the user's point of view.

> What I want to get at--and it's something I didn't say at all in my previous
> posting--is that if you're looking for a certain level of reliability, it's
> a lot harder than just tossing on extra disks and mirroring.

See above.  Nobody is trying to produce a fault-free system.  We are just
trying to reduce the likelihood of having to restore a filesystem.  Believe
me.  Disk mirroring will slow down disk writes (which aren't the bulk of
disk operations, anyway), but it will double your disk reliability.

>   - Is there another way to get comparable recovery capability?
> To the second question, I'll suggest "journaling" as providing a lot of
> what you need, possibly at much less cost.  I'm more interested in the
> first question.

Certainly "journaling" is another approach.  However, it puts the onus on
the person writing the application, rather than hiding it in the OS, and
furthermore, it is as valid to label "journaling" as a marketing bullet
item, as it is disk mirroring.  It is a question of what the user community
wants.  Altos, like most companies, is a slave to its user community.  Most
product development is based on what our customers want.  They want mirroring.
We implemented it.  It has nothing to do with bullet items.  It has to do
with what the market wants.

> I had pointed out that it takes extra I/O bandwidth to handle mirroring;
> someone responded that if you have the right sort of controller, it will
> write both disks at once for you.  OK, fine, now you've made the controller
> a single-point-of-failure.

	MTBF(controller) >> MTBF(disks)		Get it?

> I've seen as many motherboard and controller
> failures as disk failures.  I don't pretend my experience is typical, but
> suppose that it might be.  The disks are not the only failure points in the
> system.

I suggest that you have some serious design flaws here.  See Law (i).
Furthermore, even if the controller *does* die, you can snap on a new
controller, and continue, a lot faster than you can replace a disk, and
restore from backups.  Assuming, of course, that your backups were done
*right* before the disk died, or that you log all transactions to tape.

> If you're essentially running on one disk and just writing the
> other as a backup mirror, you're not getting the ongoing check that you
> really need for reliability.

Again, the reliability gained from even the simplest of mirroring schemes
far exceeds not doing *any* mirroring.  If, indeed, reliability is a concern.
If this isn't enough, there are other things you can do.  This sort of
falls into the standard Cache argument, which goes like this...
"With a 256K cache, you can get a 95% hit rate.  So why bother only using
 a 64K cache?".
The correct answer, of course, is that the 64K cache may only give you
an 80% hit rate (arbitrary figure), but it's still a lot better than 0%.
And it's one quarter the cost!

> In this case, I'm not arguing that
> mirroring is worthless, but I do argue that it's inordinately expensive
> and only addresses one small part of the overall reliability problem.  A
> single system with mirrored disks on one controller has only one element of
> redundancy.

A third time:
	MTBF(controller) >> MTBF(disks)

What exactly do you mean when you say "expensive"?  Altos doesn't charge
anything for disk mirroring, and since it is, for the most part, developed in
conjunction with disk striping (which is worth its weight in gold), it doesn't
require any noticeable NRE.  As for its performance expense, this is *only* borne by those
who enable it (SCO and C2 could learn something here :), therefore, there is
*no* expense to those people (the majority, probably) who don't use it.  For
those who do, you've failed to convince me that the performance expense is not
worth the gain.
						- Der
-- 
	Dermot Tynan,  Altos Computer Systems,  San Jose, CA   95134
	dtynan@altos86.Altos.COM		(408) 432-6200 x4237

	"Five to one, baby, one in five.  No-one here gets out alive."

truesdel@sun418.nas.nasa.gov (David A. Truesdell) (08/31/90)

dtynan@altos86.Altos.COM (Dermot Tynan) writes:

[ Quite a bit about mirrored filesystems, which I won't repeat here. ]

Disk mirroring IS a relatively inexpensive method of "hardening" modest amounts
of data.  However, when you want to protect more than just a few disks worth,
the costs of buying duplicate drives can quickly get out of hand.

A less expensive approach, if you have a LOT of data, is to use a RAID
(Redundant Array of Inexpensive Disks) style system, which can use a single
spare disk to protect the data on several others.  When you are talking about
100's of gigabytes of data, that's a lot of disk drives you won't have to buy.
(Purists may note that mirroring is considered a simple form of RAID.)
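
The post does not say which RAID scheme is meant.  One common way for a
single extra disk to protect several others is XOR parity, and the
following is a minimal sketch under that assumption.  The block contents
and the name xor_blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The single extra disk stores the XOR of the data blocks.
parity_disk = xor_blocks(data_disks)

# Suppose disk 1 dies: rebuild its block from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
```

With N data disks this costs N+1 drives instead of the 2N that mirroring
needs, but the group only survives one drive failure at a time.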

>In article <1990Aug27.183821.13518@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
>> I had pointed out that it takes extra I/O bandwidth to handle mirroring;
>> someone responded that if you have the right sort of controller, it will
>> write both disks at once for you.  OK, fine, now you've made the controller
>> a single-point-of-failure.

>	MTBF(controller) >> MTBF(disks)		Get it?

>> I've seen as many motherboard and controller
>> failures as disk failures.  I don't pretend my experience is typical, but
>> suppose that it might be.  The disks are not the only failure points in the
>> system.

>I suggest that you have some serious design flaws here.

Another design flaw would be to use a single controller to run both disks.
Separate controllers, running separate disks, could allow the system
to continue running in spite of the failure of a controller or a disk.  If you
get the software right, you would only have to come down long enough to replace
a controller.  (If you get the hardware right, you wouldn't have to do that!)

--

T.T.F.N.,
dave truesdell (truesdel@prandtl.nas.nasa.gov)

meissner@osf.org (Michael Meissner) (08/31/90)

In article <3895@altos86.Altos.COM> dtynan@altos86.Altos.COM (Dermot
Tynan) writes:

| See above.  Nobody is trying to produce a fault-free system.  We are just
| trying to reduce the likelihood of having to restore a filesystem.  Believe
| me.  Disk mirroring will slow down disk writes (which aren't the bulk of
| disk operations, anyway), but it will double your disk reliability.

If both mirrors are operational, it can speed up reads, since the
system will get the data from whichever disk's read head is closer
(assuming a smart OS and/or controller).
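
A rough sketch of that read optimization, modelling seek distance only.
The function name pick_mirror and the cylinder numbers are hypothetical;
a real driver or controller would presumably also weigh rotational
position and queued requests.

```python
def pick_mirror(head_positions, target_cylinder):
    """Return the index of the mirror whose head is nearest the target cylinder."""
    return min(range(len(head_positions)),
               key=lambda i: abs(head_positions[i] - target_cylinder))

heads = [120, 800]                 # hypothetical current cylinder of each mirror's head
choice = pick_mirror(heads, 700)   # mirror 1 is only 100 cylinders away
heads[choice] = 700                # the chosen drive's head moves to the target
```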

Another win with disk mirroring is the trick they used internally on
at least one machine at Data General.  The main OS machine had a disk
farm that was getting to the point that backups could no longer be
done in a reasonable time period.  What they did was mirror some/all
of their critical drives.  Then they would break the mirror, and start
backups on one side of the mirror (they could break the mirror without
any disruption or taking the disk offline).  Meanwhile, the users
would be busily writing to the other (now non-mirrored) disk.  This
way backups did not have data changing underneath, they could use the
much faster raw disk backup procedure (dump instead of tar in
UNIX-speak), and the system did not have to be taken down.  When the
backups finished, they regrafted the mirrored disks back together, and
the system would resync the disks during the idle loop.
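
The break/backup/regraft cycle can be sketched as a toy model.  The post
does not say how the Data General system tracked what changed while the
mirror was broken; the dirty-block set below is an assumption, and the
class and method names are invented for illustration.

```python
class MirroredVolume:
    """Toy model of a two-way mirror that can be split for backups."""

    def __init__(self, nblocks):
        self.primary = [b""] * nblocks
        self.secondary = [b""] * nblocks
        self.split = False
        self.dirty = set()   # blocks written while the mirror is broken

    def write(self, block, data):
        self.primary[block] = data
        if self.split:
            self.dirty.add(block)        # remember what the frozen side missed
        else:
            self.secondary[block] = data

    def break_mirror(self):
        self.split = True

    def backup(self):
        # The frozen side cannot change underneath the backup.
        return list(self.secondary)

    def rejoin(self):
        # Resync only the blocks that changed during the split.
        for block in sorted(self.dirty):
            self.secondary[block] = self.primary[block]
        copied = len(self.dirty)
        self.dirty.clear()
        self.split = False
        return copied


vol = MirroredVolume(4)
vol.write(0, b"old")
vol.break_mirror()
vol.write(0, b"new")       # users keep writing during the backup
snapshot = vol.backup()    # the backup still sees the pre-split data
resynced = vol.rejoin()    # copies the one changed block back
```

Note that while the mirror is broken the live disk has no redundancy, so
the window should be kept as short as the backup allows.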

The downside of any mirroring scheme of course, is that you have to
buy twice as many disk drives as you did previously (and I never was
in a group that could afford it :-).
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

frank@rsoft.bc.ca (Frank I. Reiter) (09/01/90)

In article <3895@altos86.Altos.COM> dtynan@altos86.Altos.COM (Dermot Tynan) writes:
>
>See above.  Nobody is trying to produce a fault-free system.  We are just
>trying to reduce the likelihood of having to restore a filesystem.  Believe
>me.  Disk mirroring will slow down disk writes (which aren't the bulk of
>disk operations, anyway), but it will double your disk reliability.

Maybe I've missed something, but it seems to me that the results should be
orders of magnitude better than that.  Let's say that the odds of a particular
drive failing on a particular day are 1 in 1000.  The odds of both drives
failing on that day are then 1 in 1,000,000, are they not?  Does not mirroring
mean that both drives must fail simultaneously in order for there to be loss
of data?
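
The arithmetic above, written out as a small sketch.  It assumes the two
drives fail independently, which the post implies but does not state;
the variable names are illustrative.

```python
from fractions import Fraction

p_drive = Fraction(1, 1000)   # the post's example daily failure odds for one drive

# Assumption: the two drives fail independently of each other.
p_both_same_day = p_drive * p_drive

# How many times less likely a double failure is than a single one.
improvement = p_drive / p_both_same_day
```

The independence assumption is the weak point: a shared power supply,
controller, or environment can make failures correlated, and the real
exposure window is the time until a failed drive is replaced and
resynced, which may be longer than a day.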
-- 
_____________________________________________________________________________
Frank I. Reiter              UUCP:  {uunet,ubc-cs}!van-bc!rsoft!frank
Reiter Software Inc.                frank@rsoft.bc.ca,  a2@mindlink.UUCP
Surrey, British Columbia      BBS:  Mind Link @ (604)576-1214, login as Guest

rcd@ico.isc.com (Dick Dunn) (09/01/90)

dtynan@altos86.Altos.COM (Dermot Tynan) writes, starting from the
following:

> > > Even "reliable" disks eventually die.
> > True.  So do reliable controllers.

> I don't know what your hardware background is...

Hmmm...you probably don't want a bio right now, but I did spend some fair
time working in a disk test engineering group.  I won't make any great
claim based on that, only that my experience with disk failures is more
than casual and anecdotal.  Whatever...

>...but let me assure you that the
> following statement is Law:
> 
> 	MTBF(controllers) >> MTBF(disks)	..........................(i)

Now, see, here's how flame-fests get started...you assert something as a
"Law" when I "know" it's not so.  In the past (ten years or so, let's say)
you were close enough to right.  It's really no longer true.  Depending on
a handful of factors, either
	MTBF(controllers) > MTBF(disks)
		or
	MTBF(controllers) ~ MTBF(disks)
		
> No-one can claim to produce a completely fault-free system.  Most of the
> rhetoric is exactly that.  "Fault Tolerant", "Fault Resilient", etc...

We agree there, and so we move on (as you suggest) to trying to find the
hot spots for failures.

> ...In general
> terms, if you want to make a system more resilient to failure, the first
> place to look is in any non-solid-state system.  Ie, anything with moving
> parts.  In the average system, this means the disk drives...

This is a good place to start.  It's conventional wisdom and common sense.
(I'll add that the second place to look is wherever you've got true analog
circuits--which is *also* in the disk subsystem, though it may be split
between controller and drive.)

But now consider:  *Every*body knows that the disks are potentially a
serious weak point--not only are they mechanical, but they hold your
"permanent" data.  Even the disk manufacturers know it, and they don't like
being the fall guys for every system failure.  So they find ways to make
their disks more reliable.  Now, it's not exactly news that the disk boys
are in the hot seat, but in the past it was relatively harder to make
reliable disks at a decent price, so we accepted higher failure rates and
did other things to mitigate them.

Disk manufacturers are doing a much better job these days.  It's not
cheap--the price of disk is one of the larger chunks of the total price of
most systems.  What's really happened is that the disk manufacturers and
system architects have agreed that disk reliability is important enough
that they are spending enough money there to bring the reliability of the
disk subsystem in line with the reliability of the rest of the system.
That's just good engineering--it doesn't make sense to have one part of a
system (particularly a critical part) far less reliable than the rest of
the system...you go spend money on the unreliable part until it's good
enough or until it's not wise to spend any more on it.  The change in
recent years is that it's possible to buy good enough reliability without
screwing up the overall system cost.

The true MTBF of small disks has probably increased by almost a factor of
10 in the last decade.

> ...Disk mirroring will slow down disk writes (which aren't the bulk of
> disk operations, anyway), but it will double your disk reliability.

1.  Yes, writes aren't the bulk of the operations.  However, they can
commonly vary from about 1/3 (two reads for every write) to 1/10 of the
total load.  Your point is good, but you have to be a little careful about
how much weight you give it.

2.  Disk mirroring will double the reliability of the disks themselves,
but that doesn't translate into a doubling of the reliability of even the
disk subsystem, let alone the whole machine.
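
A minimal sketch of point 2, assuming independent failures, a single
controller in series with the disks, and no repair during the period
considered.  The probabilities are made up purely for illustration, and
system_failure_prob is a hypothetical helper.

```python
def system_failure_prob(p_controller, p_disk, mirrored):
    """Chance that the disk subsystem fails in some fixed period.

    Assumes independent failures, a single controller, and no repair
    during the period. All inputs are hypothetical.
    """
    p_disks = p_disk * p_disk if mirrored else p_disk
    # The subsystem works only if the controller AND the disk set both work.
    return 1 - (1 - p_controller) * (1 - p_disks)

# Hypothetical per-year failure probabilities, controller about as reliable as a disk.
p_ctrl, p_disk = 0.05, 0.05

plain = system_failure_prob(p_ctrl, p_disk, mirrored=False)
mirror = system_failure_prob(p_ctrl, p_disk, mirrored=True)
```

With these invented numbers the mirrored pair alone fails 20 times less
often than a single disk, yet the subsystem as a whole improves by less
than a factor of two, because the unmirrored controller now dominates.
If the controller really were far more reliable than the disks, the
gain would approach the disk-only figure.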

>...Certainly "journaling" is another approach.  However, it puts the onus on
> the person writing the application, rather than hiding it in the OS...

Not necessarily.  For an application writer, you might do that if the
system doesn't support it.  But you folks are system designers; you'd put
it in the system.  (Nothing novel about that...after all, you've modified
the file system for mirroring, right?  You could just as easily have
implemented journaling.)

> ...Altos, like most companies, is a slave to its user community...
> ...They want mirroring.  We implemented it...

All understood...design by customer is uncomfortable.  But I'm more
interested in looking at the real technical aspects of mirroring.

> 	MTBF(controller) >> MTBF(disks)		Get it?

Now, now, don't get too pushy...:-)

I still say "get better disks."  MTBF of good modern disks is many years of
power-on time.  You will get card failures in that amount of time based on
connector oxidation, if nothing else.

> > I've seen as many motherboard and controller
> > failures as disk failures.  I don't pretend my experience is typical...
>...I suggest that you have some serious design flaws here.  See Law (i).

I don't design hardware, and Law (i) isn't a law.  But while we're talking
about MTBF, let's note that
	MTBF(hardware) >> MTBF(software)
for most systems.  That's another reason I suggested journaling; it gives
a second version of your data created by different code than the first.

> Furthermore, even if the controller *does* die, you can snap on a new
> controller, and continue, a lot faster than you can replace a disk, and
> restore from backups...

*After* you figure out that you've got a bad controller.  Depending on the
failure mode, you might have done some real damage in the meantime.

> > In this case, I'm not arguing that
> > mirroring is worthless, but I do argue that it's inordinately expensive
> > and only addresses one small part of the overall reliability problem...

> A third time:
> 	MTBF(controller) >> MTBF(disks)

while (strcmp(grab_input(), "MTBF(controller) >> MTBF(disks)") == 0)
	puts("buy better disks!");

> What exactly do you mean when you say "expensive"...

I mean that the cost of disk mirroring is a doubling of the cost of disk
drives in the system...and they're already a major part of the cost of the
system.
-- 
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...I'm not cynical - just experienced.

dtynan@altos86.Altos.COM (Dermot Tynan) (09/01/90)

In article <89@rsoft.bc.ca>, frank@rsoft.bc.ca (Frank I. Reiter) writes:
> In article <3895@altos86.Altos.COM> dtynan@altos86.Altos.COM (Dermot Tynan) writes:
> >but [disk mirroring] will double your disk reliability.
> 
> Maybe I've missed something, but it seems to me that the results should be
> orders of magnitude better than that.
> Frank I. Reiter              UUCP:  {uunet,ubc-cs}!van-bc!rsoft!frank

This is true.  However, I didn't want to be accused of using overrated
figures.  You can certainly guarantee doubling reliability, but on this
network, you'd get flamed from on high if you took that too far...
						- Der
-- 
	Dermot Tynan,  Altos Computer Systems,  San Jose, CA   95134
	dtynan@altos86.Altos.COM		(408) 432-6200 x4237

	"Five to one, baby, one in five.  No-one here gets out alive."

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/KT) (09/02/90)

As quoted from <3895@altos86.Altos.COM> by dtynan@altos86.Altos.COM (Dermot Tynan):
+---------------
| In article <1990Aug27.183821.13518@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
| > I've seen as many motherboard and controller
| > failures as disk failures.  I don't pretend my experience is typical, but
| > suppose that it might be.  The disks are not the only failure points in the
| > system.
| 
| I suggest that you have some serious design flaws here.  See Law (i).
+---------------

ISC doesn't make hardware.  That's the key to this discussion; you're
discussing apples and oranges.  I've seen many a Taiwanese clone motherboard
and disk controller die in my time.  I've also seen some Altos CPU boards and
file processors die --- but only about once a year (over some fifty systems
that I am de-facto system administrator for) and generally on older equipment.
Not that the integrated approach automatically makes such problems rare --- I
saw quite a few hardware failures on the Plexus equipment I used to manage ---
but when done right, the integrated approach minimizes such problems.  Altos
has had its problems, certainly; the best hardware and software won't help
when it's not appropriate for the market, which has been one of the biggest
problems I've seen with Altos, but the 5000 series looks like it can/will
address many of those problems.

This much I will say about Telotech, Inc. and Altos:  we're picky.  In
particular, *I'm* picky; if Altos hardware and software weren't up to snuff,
I'd recommend dropping it, with a pretty good probability that it would be
done.  But I haven't, and we haven't, because it works.  (I'm technical, not
sales; I couldn't care less about hype, all I care about is if it works.)

+---------------
| > In this case, I'm not arguing that
| > mirroring is worthless, but I do argue that it's inordinately expensive
| > and only addresses one small part of the overall reliability problem.  A
| > single system with mirrored disks on one controller has only one element of
| > redundancy.
| 
| A third time:
| 	MTBF(controller) >> MTBF(disks)
| 
| What exactly do you mean when you say "expensive"?  Altos doesn't charge
| anything for disk mirroring, and since it is, for the most part, developed in
| conjunction with disk striping (which is worth its weight in gold), it doesn't
| require any noticeable NRE.  As for its performance expense, this is *only* borne by those
| who enable it (SCO and C2 could learn something here :), therefore, there is
| *no* expense to those people (the majority, probably) who don't use it.  For
| those who do, you've failed to convince me that the performance expense is not
| worth the gain.
+---------------

I've said enough on SCO and C2 security, so I'll let that one pass.

Granted, most people won't care about disk mirroring.  None of Telotech's
customers, with perhaps one exception (and that only in the long term), will
care about it.  But Ti Kan mentioned airline ticket systems and ATM systems.
In the one, disk mirroring prevents major frustration to employees and users
(think about that next time you're waiting for a plane ticket...) and in the
other, I would consider it essential.

And mirroring is truly *optional*:  it costs NOTHING if you don't enable it.
I've been evaluating an AMS-5000 at work; I'm happy with it, modulo the stuff
Altos has no control over (C2...).  If I weren't happy with it, I'd not be
complaining about C2 security --- I'd be installing another computer in its
place.  Again, I care nothing about hype or "brand loyalty", I care about
machines that do what they're designed to do.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR/KT on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Delphi: ALLBERY
uunet!usenet.ins.cwru.edu!ncoast!allbery    America OnLine: KB8JRR

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/KT) (09/02/90)

As quoted from <truesdel.652083402@sun418> by truesdel@sun418.nas.nasa.gov (David A. Truesdell):
+---------------
| dtynan@altos86.Altos.COM (Dermot Tynan) writes:
| >In article <1990Aug27.183821.13518@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
| >> I've seen as many motherboard and controller
| >> failures as disk failures.  I don't pretend my experience is typical, but
| >> suppose that it might be.  The disks are not the only failure points in the
| >> system.
| 
| >I suggest that you have some serious design flaws here.
| 
| Another design flaw would be to use a single controller to run both disks.
| Separate controllers, running separate disks, could allow the system
| to continue running in spite of the failure of a controller or a disk.  If you
| get the software right, you would only have to come down long enough to replace
| a controller.  (If you get the hardware right, you wouldn't have to do that!)
+---------------

Worst case uses the standard controller; one disk goes, the controller
switches to the other disk.  No down-time.  If the controller goes, you're
screwed.

But if you need disk mirroring that badly, you are using one or two HPFP
boards as well as the standard controller.  A controller goes, the mirror disk
on another controller takes over.  No downtime.  And since disk striping and
disk mirroring are the same mechanism in recent Altos OS'es (including the
current OS for the Series 1000), you can also configure for RAID.  And since
all three controllers are capable of independent operation, you lose very
little time doing the mirroring.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR/KT on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Delphi: ALLBERY
uunet!usenet.ins.cwru.edu!ncoast!allbery    America OnLine: KB8JRR