bill@inls1.ucsd.edu (Bill Reynolds) (04/23/91)
Greetings,

We are a computational physics group running a network of Sun and SGI workstations. We often have long-running jobs on many of our machines. This causes problems when a machine has to be taken down while it is running a job that is in the third day of a five-day run.

What we would like is a routine that checkpoints a job to a disk file so that it can later be reloaded into memory. I've looked at undump, but it isn't adequate: we need to restart the job where it was interrupted. I've also looked at Condor, but it seems to be a swat-a-fly-with-a-sledgehammer type of solution.

I'm wondering whether there are any simple Unix/Sun/SGI utilities that do checkpointing. (I know that such facilities exist for Crays.)

Thank you very much,

--
Bill Reynolds | bill@inls1.ucsd.edu