[comp.os.research] RAID

gibson@rosemary.Berkeley.EDU (Garth Gibson) (07/14/89)

> From: eho@cognito.princeton.edu (Eric Ho)
> Date: 13 Jul 89 03:47:07 GMT
> Subject: RAID backups ??
> Organization: Cognitive Science Lab.  Princeton University.
> 
> If one of the disks in a RAID fails, some data will be lost.  What will the
> recovery procedure (from tapes ?) be under Sprite ?
> --
> 
> Eric Ho
> Cognitive Science Lab.,		Princeton University
> voice = 609-987-2819 (x2987)	email = eho@confidence.princeton.edu
> 					eho@bogey.princeton.edu
> regards.
> 
> -eric-

RAID uses redundant storage of data to ensure that a single disk failure
will not cause the loss of any data.  In fact, in a RAID of many disks
(many parity groups) multiple disks can be lost without any loss of data.
There are combinations of disk loss that cause data loss, but they
occur so infrequently that it is unlikely that backups will be needed
every day.  I doubt that backups can be eliminated completely (because
the world has humans who mess up), but they can be drastically reduced.
One exception to this is systems with extremely valuable data (banks).
These users will do daily backups just to protect against disasters
and terrorism.  In this case, they want a complete copy of all data
(or at least all changes) in a physically remote location.  RAID does
not directly address this need.  On the other hand, if you have fast
enough tapes or networks, RAIDs do give you plenty of bandwidth to
minimize the backup duration.
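
As a rough illustration of the redundancy argument above, here is a minimal
sketch that assumes each parity group stores one XOR parity block alongside
its data blocks (one block per disk).  The post does not say which RAID
organization is meant, so the single-parity scheme and all names below
(xor_blocks, make_parity_group, reconstruct, survives) are assumptions made
for the example, not a description of the Berkeley implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity_group(data_blocks):
    """Return the data blocks plus one parity block (one block per disk)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(group, failed_index):
    """Rebuild the block on the failed disk by XOR-ing the surviving blocks."""
    survivors = [blk for i, blk in enumerate(group) if i != failed_index]
    return xor_blocks(survivors)


def survives(failed_disks, group_of):
    """True if no parity group has lost more than one disk.

    group_of maps a disk number to the (hypothetical) parity group it is in.
    """
    counts = {}
    for disk in failed_disks:
        g = group_of[disk]
        counts[g] = counts.get(g, 0) + 1
    return all(c <= 1 for c in counts.values())
```

Under this assumption, any single block in a group can be rebuilt from the
others, and several disks can fail at once without data loss as long as no
two of them belong to the same group.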

> Yeah, that was what I thought.  But then what about people who mistakenly delete
> some files -- surely, one must be able to retrieve them from daily backups ?

Human mistakes are certainly not something that a RAID per se can fix.
What we are looking at is a no-overwrite filesystem.  In this scenario
it becomes a question of time travel - show me my file as it looked 5
minutes ago.  Of course, with high capacity utilization old data will be
recycled fairly soon, but with a unix-like protected 10% free space,
a high percentage of those "I didn't mean to do that" mistakes will be caught.
Still, you are right in that if a user deleted a file 8 months ago
and decides that he didn't mean to delete it - then things are tough.
The database people are trying to get us to log all changes to an archive
(possibly encoded densely) so that time travel is more generally available,
albeit at high cost.  Our general feeling is: once the largest fraction
of those accidental deletions is dealt with, what is the value of all
the extra work, bandwidth and cost?
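
To make the "time travel" idea concrete, here is a toy in-memory sketch of
a no-overwrite store.  It is only an illustration under stated assumptions:
the class and method names (NoOverwriteStore, read_as_of, recycle) are
hypothetical, timestamps are plain seconds supplied by the caller in
non-decreasing order, and nothing here reflects how the Sprite file system
is actually built.

```python
import bisect


class NoOverwriteStore:
    """Toy model: every write or delete appends a version; nothing is overwritten."""

    def __init__(self):
        # name -> list of (timestamp, data), with data None for a deletion.
        self._versions = {}

    def write(self, name, data, now):
        self._versions.setdefault(name, []).append((now, data))

    def delete(self, name, now):
        # A delete is recorded as a tombstone, so older data stays readable.
        self._versions.setdefault(name, []).append((now, None))

    def read_as_of(self, name, when):
        """Return the contents as they looked at time `when`, or None if absent."""
        history = self._versions.get(name, [])
        times = [t for t, _ in history]
        i = bisect.bisect_right(times, when)
        return history[i - 1][1] if i else None

    def recycle(self, older_than):
        """Reclaim space: drop versions that were superseded before `older_than`."""
        for name, history in self._versions.items():
            times = [t for t, _ in history]
            keep_from = max(bisect.bisect_right(times, older_than) - 1, 0)
            self._versions[name] = history[keep_from:]
```

For example, "show me my file as it looked 5 minutes ago" becomes
store.read_as_of(name, now - 300).  Once recycle() has reclaimed the old
versions, the same request for a time before the cutoff returns nothing,
which is the 8-months-ago case described above.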

Yes, Sprite is working on a file system specifically for RAID, but your
reason is not correct.  There is no reason why the physical device has
to have the geometry that unix thinks it has.  SCSI devices already
defy unix.  The geometry information is generally a hint for layout
and this could conceivably be ignored.  But we aren't suggesting that
it be ignored.

garth