dtynan@altos86.Altos.COM (Dermot Tynan) (08/31/90)
In article <1990Aug27.183821.13518@ico.isc.com>, rcd@ico.isc.com (Dick Dunn) writes:
> > Even "reliable" disks eventually die.
>
> True.  So do reliable controllers.

I don't know what your hardware background is, but let me assure you
that the following statement is Law:

	MTBF(controllers) >> MTBF(disks) ..........................(i)

No-one can claim to produce a completely fault-free system.  Most of the
rhetoric is exactly that: "Fault Tolerant", "Fault Resilient", etc.  No
matter what you do, as long as there is a probability (no matter how
small) of something failing, your system is not fault-free.  The whole
idea behind disk mirroring is not to replace disk backups (which can
also be faulty), but to reduce the fault probability by a considerable
margin.  In general terms, if you want to make a system more resilient
to failure, the first place to look is at any non-solid-state component,
i.e., anything with moving parts.  In the average system, this means the
disk drives.  While mirroring won't eradicate the probability of
failure, it will reduce it considerably, at least from the user's point
of view.

> What I want to get at--and it's something I didn't say at all in my previous
> posting--is that if you're looking for a certain level of reliability, it's
> a lot harder than just tossing on extra disks and mirroring.

See above.  Nobody is trying to produce a fault-free system.  We are
just trying to reduce the likelihood of having to restore a filesystem.
Believe me.  Disk mirroring will slow down disk writes (which aren't the
bulk of disk operations, anyway), but it will double your disk
reliability.

> - Is there another way to get comparable recovery capability?
>
> To the second question, I'll suggest "journaling" as providing a lot of
> what you need, possibly at much less cost.  I'm more interested in the
> first question.

Certainly "journaling" is another approach.
However, it puts the onus on the person writing the application, rather
than hiding it in the OS.  Furthermore, it is just as valid to label
"journaling" a marketing bullet item as it is disk mirroring.  It is a
question of what the user community wants.  Altos, like most companies,
is a slave to its user community.  Most product development is based on
what our customers want.  They want mirroring.  We implemented it.  It
has nothing to do with bullet items.  It has to do with what the market
wants.

> I had pointed out that it takes extra I/O bandwidth to handle mirroring;
> someone responded that if you have the right sort of controller, it will
> write both disks at once for you.  OK, fine, now you've made the controller
> a single-point-of-failure.

	MTBF(controller) >> MTBF(disks)

Get it?

> I've seen as many motherboard and controller
> failures as disk failures.  I don't pretend my experience is typical, but
> suppose that it might be.  The disks are not the only failure points in the
> system.

I suggest that you have some serious design flaws here.  See Law (i).
Furthermore, even if the controller *does* die, you can snap on a new
controller and continue a lot faster than you can replace a disk and
restore from backups.  That assumes, of course, that your backups were
done *right* before the disk died, or that you log all transactions to
tape.

> If you're essentially running on one disk and just writing the
> other as a backup mirror, you're not getting the ongoing check that you
> really need for reliability.

Again, the reliability gained from even the simplest of mirroring
schemes far exceeds what you get without *any* mirroring, if indeed
reliability is a concern.  If this isn't enough, there are other things
you can do.  This sort of falls into the standard Cache argument, which
goes like this... "With a 256K cache, you can get a 95% hit rate.  So
why bother only using a 64K cache?"
The correct answer, of course, is that the 64K cache may only give you
an 80% hit rate (arbitrary figure), but it's still a lot better than 0%.
And it's one quarter the cost!

> In this case, I'm not arguing that
> mirroring is worthless, but I do argue that it's inordinately expensive
> and only addresses one small part of the overall reliability problem.  A
> single system with mirrored disks on one controller has only one element of
> redundancy.

A third time:

	MTBF(controller) >> MTBF(disks)

What exactly do you mean when you say "expensive"?  Altos doesn't charge
anything for disk mirroring and, since it is for the most part developed
in conjunction with disk striping (which is worth its weight in gold),
it doesn't require any noticeable NRE.  As for its performance expense,
this is *only* borne by those who enable it (SCO and C2 could learn
something here :).  There is therefore *no* expense to those people (the
majority, probably) who don't use it.  For those who do, you've failed
to convince me that the performance expense is not worth the gain.
						- Der
--
Dermot Tynan, Altos Computer Systems, San Jose, CA 95134
dtynan@altos86.Altos.COM (408) 432-6200 x4237
"Five to one, baby, one in five.  No-one here gets out alive."
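
To put rough numbers on the MTBF argument in this post, here is a minimal sketch in Python.  Everything in it is an assumption made for illustration: the exponential failure model, the common MTTDL approximation for a mirrored pair (MTBF^2 / (2 * MTTR)), and the MTBF and repair-time figures, which are hypothetical and not taken from any Altos hardware.  It treats a controller failure as an outage rather than data loss, in line with the "snap on a new controller" point.

```python
import math

HOURS_PER_YEAR = 24 * 365


def failure_probability(mtbf_hours, window_hours):
    """Chance of at least one failure in the window (exponential model)."""
    return 1.0 - math.exp(-window_hours / mtbf_hours)


def mirrored_pair_loss_probability(disk_mtbf_hours, repair_hours, window_hours):
    """Approximate chance that a mirrored pair loses data in the window.

    Data is lost only if the second disk dies while the first is still
    being replaced.  Uses the usual approximation
    MTTDL = MTBF^2 / (2 * MTTR), assuming independent disk failures.
    """
    mttdl = disk_mtbf_hours ** 2 / (2.0 * repair_hours)
    return failure_probability(mttdl, window_hours)


# Hypothetical figures, chosen only so that MTBF(controller) >> MTBF(disk).
DISK_MTBF = 30_000          # hours
CONTROLLER_MTBF = 300_000   # hours
REPAIR_HOURS = 48           # time to swap and resync a failed disk

single_disk = failure_probability(DISK_MTBF, HOURS_PER_YEAR)
mirrored = mirrored_pair_loss_probability(DISK_MTBF, REPAIR_HOURS, HOURS_PER_YEAR)
controller = failure_probability(CONTROLLER_MTBF, HOURS_PER_YEAR)

print(f"restore needed, single disk   : {single_disk:.4%} per year")
print(f"restore needed, mirrored pair : {mirrored:.4%} per year")
print(f"controller outage (no restore): {controller:.4%} per year")
```

Under these assumed figures the mirrored pair's yearly chance of needing a restore comes out more than a hundred times lower than the single disk's, and the controller remains a less likely failure point than an unmirrored disk.  Different MTBF or repair-time assumptions will move the numbers, and correlated failures (shared power supply, same production batch) would narrow the gap.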