[comp.unix.admin] Ideas/software for multi-tiered disk system?

bernhold@red8.qtp.ufl.edu (David E. Bernholdt) (05/02/91)

We have an installation consisting of a number of Sun file servers
with a total of 10 GB of disk serving a group of about 60 people.  We
frequently run large, long jobs on these machines and others.  When
run locally, the jobs themselves may consume large amounts of disk
space for time periods ranging from days to weeks, possibly longer in
principle, though no one's done it yet.  And whether the jobs are run
locally or on remote machines, they produce large output files which
must be kept around for a while to be analyzed.

To implement this with reasonable efficiency, we presently divide our
file system into two parts:  /home partitions, which contain the
user's primary files, and /scr partitions which are intended for the
storage of large impermanent files such as the outputs and job
intermediate files mentioned above.  /home partitions are backed up
routinely by the system, while /scr is never backed up except by
individual users.

We are now running into problems where the /scr partitions are filling
up because people aren't cleaning up after themselves.  As a first
step, we've started purging files which have not been accessed for
more than six months from the /scr partitions.  We want to take this one step
further and turn parts of /scr into partitions which are regularly
purged of files untouched for 24 hours (order of magnitude) which we
intend to be used for running jobs, while the remainder (purged of
six-month-old files) would be for outputs, etc.
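
The sort of cron-driven purge I have in mind could be sketched with
find(1) roughly as follows -- the partition names and thresholds here
are just placeholders, not our actual layout:

```shell
#!/bin/sh
# Hypothetical nightly purge.  /scr/keep and /scr/run are made-up
# names for the long-term and job-scratch areas respectively.

# Long-term area: remove plain files not accessed in ~6 months.
find /scr/keep -type f -atime +180 -exec rm -f {} \;

# Job area: remove plain files not accessed in the last day.
# (Note classic find rounds -atime to 24-hour units, so +1 means
# "strictly more than one day ago".)
find /scr/run -type f -atime +1 -exec rm -f {} \;
```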

Now comes the sticky part:  Jobs may run for more than 24 hours, and
could quite possibly leave crucial files untouched for more than 24
hours at a time.  We don't want to delete _any_ files which are needed
by a running job, even if it has been more than 24 hours.  Of course
we want to do this with minimal intervention (or possibility of abuse)
by the users.
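
One partial safeguard I can imagine (only partial, because it protects
just the files a job actually holds open, not files it will come back
to later, and it assumes fuser(1) behaves as on our systems) would be
to have the purge skip any file some process currently has open:

```shell
#!/bin/sh
# Sketch: purge candidates in the job area, skipping files that some
# process holds open.  /scr/run and the 24-hour threshold are
# illustrative only.
find /scr/run -type f -atime +1 -print |
while read f; do
    if fuser "$f" >/dev/null 2>&1; then
        :   # some process has it open: a running job, keep it
    else
        rm -f "$f"
    fi
done
```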

So, I am seeking the experience of anyone who has similar problems to
hear how they solve things and the software used to implement the
policies.  I'm interested in hearing ideas about the problem as a
whole -- perhaps there are other ways to solve it we haven't thought
of -- as well as the more specific ideas of how to implement the 24
hour purge without harming running jobs.

A couple of ideas I had:

1) In each directory, base the deletion of all files on the age of the
most recently accessed one.  While there may be some files untouched
for long periods, it is very unlikely that _every_ file belonging to a
running job would go untouched.  This method would have to work only
with plain files and ignore subdirectories.
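
A rough sketch of scheme (1) -- the path is a placeholder, and I'm
using GNU find's -maxdepth here purely for brevity (on a stock find
you'd use -prune instead):

```shell
#!/bin/sh
# Scheme (1): purge a directory's plain files only if even the most
# recently accessed one is older than the threshold (24h here).
SCRDIR=/scr/run    # hypothetical job-scratch area
for dir in "$SCRDIR"/*/; do
    # Was any plain file in this directory accessed within a day?
    recent=$(find "$dir" -maxdepth 1 -type f -atime -1 -print | head -1)
    if [ -z "$recent" ]; then
        # Nothing touched in 24h: the whole directory is fair game.
        find "$dir" -maxdepth 1 -type f -exec rm -f {} \;
    fi
done
```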

2) Use the directory's last access time to determine the fate of the
files it contains.  From reading the man pages, I think this is
equivalent to the above scheme, but it is not completely clear to me.
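
One way to check scheme (2) would be a quick experiment -- my
understanding is that a directory's atime is updated when the
directory itself is read (e.g. by ls or readdir), not when a file
inside it is opened by name, which would make (2) weaker than (1):

```shell
#!/bin/sh
# Throwaway experiment: does reading a file update the parent
# directory's access time?
mkdir -p /tmp/atime-test
touch /tmp/atime-test/f
ls -lud /tmp/atime-test           # -d: stat the dir, don't read it
sleep 1
cat /tmp/atime-test/f >/dev/null  # open a file inside by name
ls -lud /tmp/atime-test           # directory atime is unchanged
rm -rf /tmp/atime-test
```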

Please reply by email and I will summarize for these groups.  Thanks
in advance.
-- 
David Bernholdt			bernhold@qtp.ufl.edu
Quantum Theory Project		bernhold@ufpine.bitnet
University of Florida
Gainesville, FL  32611		904/392 6365