hafner@mysost.cs.wisc.edu (Brian J. Hafner) (03/31/91)
Archive-name: unix/batch/condor/1991-03-31
Archive-directory: shorty.cs.wisc.edu:/condor/ [128.105.2.8]
Original-posting-by: hafner@mysost.cs.wisc.edu (Brian J. Hafner)
Original-subject: Re: Monitoring processes and machines (program itself and central)
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

In article <3545@inews.intel.com> khougland@sedona.intel.com writes:
>
>I'm intrested in being able to keep tabs on our whole domain. That way, when
>people log off for the day; it's usable CPU time! The unfortune problem is
>that sometimes the programs crash and burn by themselves and sometimes ye old
>operator does a kill -9 one them.

You may be interested in "condor" from the Univ. of Wisconsin. A portion of
the condor_intro man page:

     Condor is a facility for executing UNIX jobs on a pool of
     cooperating workstations. Jobs are queued and executed remotely
     on workstations at times when those workstations would otherwise
     be idle. A transparent checkpointing mechanism is provided, and
     jobs migrate from workstation to workstation without user
     intervention. When the jobs complete, users are notified by mail.

Condor may be obtained via anon-ftp from shorty.cs.wisc.edu

Brian J. Hafner
Computer Sciences Department
University of Wisconsin - Madison
hafner@cs.wisc.edu