[comp.unix.wizards] transparent archiving

randy@umn-cs.cs.umn.edu (Randy Orrison) (05/20/88)

How would you implement transparent archive retrievals in UNIX*?

I'm thinking of a scheme where files will be stored on tape without
user intervention, and will still appear in the directory where they
were, but when they are accessed, they are moved back from tape to
disk without the user knowing.

My thoughts so far are that a bit (or something) will have to be
added to the inode and the kernel processing will have to be changed,
similar to the way that symbolic links are processed.

Any ideas on what will be needed in the inode?  Am I crazy?

What are good reference books for this sort of thing?  UNIX* (or BSD)
processing of file names (namei?), open(), etc.

Any and all help much appreciated!

	-randy
-- 
Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
(Anyone got a Unix I can borrow?)   {ihnp4, seismo!rutgers, sun}!umn-cs!randy
Get forgiveness now -- tomorrow you may no longer feel guilty.

chris@mimsy.UUCP (Chris Torek) (05/20/88)

In article <5482@umn-cs.cs.umn.edu> randy@umn-cs.cs.umn.edu
(Randy Orrison) writes:
>I'm thinking of a scheme where files will be stored on tape without
>user intervention, and will still appear in the directory where they
>were, but when they are accessed, they are moved back from tape to
>disk without the user knowing.

Actually, I imagine the user will notice:

	% cat old-file
 {*yawn*  Gosh the machine is slow today...}

The obvious way to do this is with portals.  Too bad they never got
written.  You can try U of Wisconsin's `watchdogs' (see winter 1988
Usenix proceedings), which are more or less the same idea.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chet@mandrill.CWRU.Edu (Chet Ramey) (05/21/88)

In article <11590@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:

[discussion of a scheme to store files on tape until accessed]

>The obvious way to do this is with portals.  Too bad they never got
>written.  You can try U of Wisconsin's `watchdogs' (see winter 1988
>Usenix proceedings), which are more or less the same idea.

Those are the University of Washington's `watchdogs'.  

The authors are Brian Bershad and C. Brian Pinkerton, both of the 
U. of Washington Computer Science Department.  (The article is 
pretty interesting, by the way; I like the concept.)

Their addresses are given as {brian,bp}@june.cs.washington.edu; I 
suppose requests for info should be directed there.

Chet Ramey

(Boy, it's not often that one catches Chris in error, even one this 
minor :-))



-- 
| Chet Ramey    chet@mandrill.ces.CWRU.Edu  {cbosgd,decvax,sun}!mandrill!chet |
|									      |
| "Oh, but I was so much older then...I'm younger than that now"	      |

chris@mimsy.UUCP (05/21/88)

>In article <11590@mimsy.UUCP> I wrote:
>>U of Wisconsin's `watchdogs' ...

In article <2483@mandrill.CWRU.Edu> chet@mandrill.CWRU.Edu (Chet Ramey) writes:
>Those are the University of Washington's `watchdogs'.  

Oops.  Apologies to both universities.  (One might wonder why both would
want an apology :-) )

>(Boy, it's not often that one catches Chris in error, even one this 
>minor :-))

(In my own defense, I must mention that I missed that Usenix and was
relying on someone else's description.  I am fairly sure he said
`Wisconsin'.  Well, at least it was a `U of W' :-) .)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tcs@usna.mil (Terry Slattery) (05/27/88)

> How would you implement transparent archive retrievals in UNIX*?

Funny you should ask...  And no, you are not crazy.

Mike Muuss and Don Merritt at BRL and I at USNA are cooperating
on creating just this facility.  Here are some notes on what we are
doing.  [This note is ~90 lines in length.]

First, the criteria.  Just having migrated files (archived files) is
reasonably simple.  You need a new file type and a place to store a file
handle.  The file type and handle could be a variation on the Berkeley
symlink idea or use an additional field in the inode.  When the file
is migrated, the file type is set to FMIG and the handle is stored in
(or with) the inode.  When the kernel attempts to open this file, it
recognizes the FMIG type, blocks the process and informs a user level
daemon of the file handle to reload (plus other info about where to
reload it, the process to wake up on completion, etc).  We've also made
the action taken on an attempt to open a migrated file user settable:
return an error code, transparently reload the file, return an error
and also reload the file, etc.  You'll
also need hooks in stat() so that ls can tell you about migrated files.
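
As a rough illustration of the open-time decision described above, here
is a small user-space model in Python.  It is only a sketch, not kernel
code: the Policy names, the Inode fields, the reload_queue list and the
choice of EAGAIN as the error code are all assumptions made for the
example.  Only the FMIG type and the three behaviours come from the
description above.

```python
import errno
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class FileType(Enum):
    REG = "regular"
    FMIG = "migrated"          # the file type named in the post


class Policy(Enum):            # hypothetical names for the user-settable actions
    ERROR = 1                  # return an error code
    RELOAD = 2                 # transparently reload the file
    ERROR_AND_RELOAD = 3       # return an error and also reload the file


@dataclass
class Inode:
    ftype: FileType
    handle: Optional[str] = None   # file handle stored in (or with) the inode


def open_file(inode: Inode, policy: Policy, reload_queue: List[str]) -> int:
    """Return 0 on success or a negative errno, modelling open() on one inode."""
    if inode.ftype is not FileType.FMIG:
        return 0                               # ordinary file, nothing special

    if policy in (Policy.RELOAD, Policy.ERROR_AND_RELOAD):
        reload_queue.append(inode.handle)      # tell the daemon which handle to reload

    if policy is Policy.RELOAD:
        # A real kernel would block the process here until the daemon wakes it.
        # This model simply pretends the reload has already completed.
        inode.ftype = FileType.REG
        inode.handle = None
        return 0

    return -errno.EAGAIN                       # assumed error code, for illustration
```

With Policy.ERROR the queue is left untouched and the caller just sees
the error; with Policy.ERROR_AND_RELOAD the caller sees the error but a
later open can succeed once the daemon has finished.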

Something we call 'premigrated' files are also desirable.  These files
have been migrated but also continue to live on the disk.  If the
system runs low on disk space, these files may be deleted, freeing the
disk space for active users.  However, if a user accesses one of these
files, it re-appears very quickly (since a copy still lives on disk).
To do this, one needs to keep the file's block pointers intact and
have somewhere in the inode to store the file handle.  In Berkeley
systems, there are some spare fields in which to do this.  Not so in
System V.  Our solution is to migrate the file to an on-disk directory
within the same file system, make the original file's inode type FMIG
and zero the block pointer area.  The disk block pointers may now be
used to store the file handle.  To do this the kernel performs a
mknod() in a staging directory on the same file system as the file to
migrate, copies the source file inode information to the new inode in
the staging area, sets the source inode to FMIG and adds the file
handle.  Recovering a file from the staging area performs the reverse
operation.
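
The staging-area swap just described can be modelled in a few lines of
Python.  This is a sketch under stated assumptions, not the real kernel
change: file systems and the staging directory are plain dictionaries,
the staged inode is keyed by the file handle, and the function names
premigrate, recover and reclaim are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inode:
    ftype: str                                        # "REG" or "FMIG"
    blocks: List[int] = field(default_factory=list)   # disk block pointers
    handle: Optional[str] = None                      # lives in the zeroed pointer area


def premigrate(fs: Dict[str, Inode], staging: Dict[str, Inode],
               name: str, handle: str) -> None:
    """Move the file's blocks to a staging inode and mark the original FMIG."""
    src = fs[name]
    # Models the mknod() in the staging directory plus the copy of inode info.
    staging[handle] = Inode("REG", list(src.blocks))
    src.ftype = "FMIG"
    src.blocks = []            # zero the block pointer area ...
    src.handle = handle        # ... so it can hold the file handle instead


def recover(fs: Dict[str, Inode], staging: Dict[str, Inode], name: str) -> None:
    """Reverse operation: fast path when the staged copy is still on disk."""
    src = fs[name]
    staged = staging.pop(src.handle)
    src.ftype, src.blocks, src.handle = "REG", staged.blocks, None


def reclaim(staging: Dict[str, Inode], handle: str) -> List[int]:
    """Disk is low: drop the staged copy and return the block numbers freed."""
    return staging.pop(handle).blocks
```

After reclaim() the original inode is still FMIG with its handle, so the
file can only come back from backing media, which is the slow path.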

In all, we modified about six or so kernel routines to provide this
functionality.  The user mode code is much more extensive.

One needs a database.  The file handle is used to find the file's
copies (yes, you may want multiple copies) on backing media.  One may
also want to stage between various backing media (jukeboxes, 3480
tape, operator mounted tapes, optical disk, etc).  This flexibility is
obtained by having the file handle reference multiple entries in a
database.  Each entry is for a particular file/volume tuple (a file
may also span multiple volumes).  Our database consists of ascii
newline terminated records (so it is machine independent and you can
read it!).  A media volume database is also needed to manage the
storage media (A file is on volume 1528. Now where is that volume? Is
it CD-ROM or tape?).
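
To make the two databases concrete, here is a sketch of how such
newline terminated ASCII records might be read.  The record layouts
below (handle, copy number, segment number, volume; and volume, medium,
location) are invented for illustration, as are the sample values.  The
actual database format is not given here and is still subject to
change.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    handle: str
    copy: int       # which copy of the file (multiple copies are allowed)
    segment: int    # position when one copy spans several volumes
    volume: str


# Made-up sample data: copy 0 spans two tapes, copy 1 sits on one optical disk.
FILE_DB = "h1 0 0 1528\nh1 0 1 1529\nh1 1 0 2001\n"
VOLUME_DB = "1528 tape shelf-A\n1529 tape shelf-A\n2001 optical jukebox-1\n"


def load_file_db(text: str) -> Dict[str, List[Entry]]:
    """Map each file handle to all of its file/volume entries."""
    db: Dict[str, List[Entry]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        handle, copy, segment, volume = line.split()
        db[handle].append(Entry(handle, int(copy), int(segment), volume))
    return dict(db)


def load_volume_db(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each volume to (medium, location)."""
    volumes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        volume, medium, location = line.split()
        volumes[volume] = (medium, location)
    return volumes


def locate(handle: str, files: Dict[str, List[Entry]],
           volumes: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Answer: which volumes hold this file, and where are those volumes?"""
    return [(e.volume, *volumes[e.volume]) for e in files.get(handle, [])]
```

Because the records are plain text, the same lookup could be done by eye
or with grep, which is the point of keeping the format readable.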

Next, one needs tools to archive the on-disk copy to other media.  We
are working on these tools now.  There will be a migarch tool to copy
files between media, a migin tool to cause files to be reloaded, and a
media management tool (to make sure that the correct media is mounted on
the correct drive and to reserve the drive for exclusive use).  While find(1)
can be used to create a list of files to migrate, there will probably
be some other tools created to assist the system administrator and/or
the user in selecting files to migrate.
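
As an example of what a selection helper might look like, the sketch
below lists regular files that have not been accessed for a given
number of days, roughly what `find dir -type f -atime +N -print` would
report.  The function name and the idle-time criterion are assumptions
for illustration; no such tool is described above.

```python
import os
import stat
import time
from typing import List, Optional


def migration_candidates(root: str, min_idle_days: float,
                         now: Optional[float] = None) -> List[str]:
    """Return regular files under root whose last access is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400          # 86400 seconds per day
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)                   # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                found.append(path)
    return sorted(found)
```

A real tool would likely weigh size and ownership as well as idle time,
since migrating many tiny files frees little disk space.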

Several programs need to be modified (df, fsck, ls, du among others).
Other issues, like how NFS clients select one of the user visible
behaviors and what kinds of errors they should see, have not yet been
tackled.

Ah! Just what you've been looking for, you say.  How do I get a copy
of this software?

Our initial work has been on Sun workstations and we have a demo
system running which only performs migration to the on-disk staging
area and back.  Ls has been modified to show the new file type with the
-l and -F flags.  The file handle database has been designed but is
still subject to change.  We expect to have a prototype system running
soon.  A production release will probably happen in late summer after
we've had time to pound on it a while and get some of the other tools
built.  When we do make a release, it will be public domain.  Vendors
are encouraged to add it to their products.

I hope this answers some of your questions (and probably raises many
more).  Please be patient and something will be out soon.  Remember,
answering questions takes time from the project.

BTW, why do you ask?

	-tcs

rbj@icst-cmr.arpa (Root Boy Jim) (05/28/88)

   From: Chris Torek <chris@mimsy.uucp>

   In article <5482@umn-cs.cs.umn.edu> randy@umn-cs.cs.umn.edu
   (Randy Orrison) writes:
   >I'm thinking of a scheme where files will be stored on tape without
   >user intervention, and will still appear in the directory where they
   >were, but when they are accessed, they are moved back from tape to
   >disk without the user knowing.

   Actually, I imagine the user will notice:

	   % cat old-file
    {*yawn*  Gosh the machine is slow today...}

They certainly will. EXEC 8 does exactly that, rolling out old files
to tape and rolling them back in when referenced. One of the first
things I learned to do was assign all my files with the nowait option
whenever I logged on. Gee, Chris, think of all the fun you missed :-)

I know of at least one company that makes an NFS file server with real
disks staged and destaged to optical disk. In that case, while the I/O
is certainly slower, at least it's faster than waiting for an operator
to find and mount the tape.

   In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
   Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?