bernhold@red8.qtp.ufl.edu (David E. Bernholdt) (05/02/91)
We have an installation consisting of a number of Sun file servers with a total of 10 GB of disk, serving a group of about 60 people. We frequently run large, long jobs on these machines and others. When run locally, the jobs themselves may consume large amounts of disk space for periods ranging from days to weeks (possibly longer in principle, though no one has done it yet). And whether the jobs are run locally or on remote machines, they produce large output files which must be kept around for a while to be analyzed.

To handle this with reasonable efficiency, we presently divide our file system into two parts: /home partitions, which contain the users' primary files, and /scr partitions, which are intended for the storage of large impermanent files such as the outputs and job intermediate files mentioned above. /home partitions are backed up routinely by the system, while /scr is never backed up except by individual users.

We are now running into problems where the /scr partitions are filling up because people aren't cleaning up after themselves. As a first step, we've started purging files which have not been accessed for more than six months from the /scr partitions. We want to take this one step further and turn parts of /scr into partitions which are regularly purged of files untouched for 24 hours (order of magnitude), which we intend to be used for running jobs, while the remainder (purged of six-month-old files) would be for outputs, etc.

Now comes the sticky part: jobs may run for more than 24 hours, and could quite possibly leave crucial files untouched for more than 24 hours at a time. We don't want to delete _any_ files which are needed by a running job, even if they have been untouched for more than 24 hours. Of course we want to do this with minimal intervention (or possibility of abuse) by the users.

So, I am seeking the experience of anyone who has similar problems, to hear how they solve things and what software they use to implement the policies.
I'm interested in hearing ideas about the problem as a whole -- perhaps there are other ways to solve it that we haven't thought of -- as well as more specific ideas on how to implement the 24-hour purge without harming running jobs.

A couple of ideas I had:

1) In each directory, base the deletion of all files on the age of the most recently accessed file. While some files may go untouched for long periods, it is very unlikely that _all_ of a running job's files would go untouched. This method would have to work only with plain files and ignore subdirectories.

2) Use the directory's last access time to determine the fate of the files it contains. From reading the man pages, I think this is equivalent to the above scheme, but it is not completely clear to me.

Please reply by email and I will summarize for these groups. Thanks in advance.

--
David Bernholdt                 bernhold@qtp.ufl.edu
Quantum Theory Project          bernhold@ufpine.bitnet
University of Florida
Gainesville, FL 32611           904/392 6365
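
To make idea 1 above concrete, here is a minimal sketch in Python of the per-directory rule: a directory's plain files become purge candidates only when the most recently accessed plain file in that directory is older than the threshold. The names `newest_atime`, `purge_candidates`, and `DAY` are hypothetical and chosen for illustration, and the 24-hour threshold is just the order-of-magnitude figure from the post. The sketch reads each file's own `st_atime` rather than the directory's, since a directory's access time normally reflects reads of the directory itself (such as a listing) and not reads of the files inside it. It only reports candidates and deletes nothing.

```python
import os
import time

DAY = 24 * 60 * 60  # assumed purge threshold in seconds (the post says "order of magnitude")


def newest_atime(directory):
    """Return the latest access time among plain files directly in `directory`, or None."""
    newest = None
    with os.scandir(directory) as entries:
        for entry in entries:
            # Plain files only; subdirectories and symlinks are ignored, as in idea 1.
            if entry.is_file(follow_symlinks=False):
                atime = entry.stat(follow_symlinks=False).st_atime
                if newest is None or atime > newest:
                    newest = atime
    return newest


def purge_candidates(root, max_age=DAY, now=None):
    """Yield plain files in directories whose newest plain file is older than `max_age`."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        newest = newest_atime(dirpath)
        # Skip directories with no plain files, or with at least one recently used file.
        if newest is None or now - newest <= max_age:
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path
```

A purge script would iterate over `purge_candidates("/scr")` and log or remove each path. Each directory is judged on its own files only, so a job that keeps its files in nested subdirectories would need at least one recently accessed plain file in each of them.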