[comp.archives] [unix-wizards...] Re: Monitoring processes and machines

hafner@mysost.cs.wisc.edu (Brian J. Hafner) (03/31/91)

Archive-name: unix/batch/condor/1991-03-31
Archive-directory: shorty.cs.wisc.edu:/condor/ [128.105.2.8]
Original-posting-by: hafner@mysost.cs.wisc.edu (Brian J. Hafner)
Original-subject: Re: Monitoring processes and machines (program itself and central)
Reposted-by: emv@msen.com (Edward Vielmetti, MSEN)

In article <3545@inews.intel.com> khougland@sedona.intel.com writes:
>
>I'm interested in being able to keep tabs on our whole domain.  That way, when
>people log off for the day, it's usable CPU time!  The unfortunate problem is
>that sometimes the programs crash and burn by themselves and sometimes ye old
>operator does a kill -9 on them.

You may be interested in "condor" from the Univ. of Wisconsin.
A portion of the condor_intro man page:

     Condor is a facility for executing UNIX jobs on a pool of
     cooperating workstations.  Jobs are queued and executed
     remotely on workstations at times when those workstations
     would otherwise be idle.  A transparent checkpointing
     mechanism is provided, and jobs migrate from workstation to
     workstation without user intervention.  When the jobs
     complete, users are notified by mail.

Condor may be obtained via anonymous FTP from shorty.cs.wisc.edu.
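
On the narrower problem in the quoted question, telling a job that crashed
by itself from one the operator hit with kill -9: a parent process can read
this off the child's exit status. The following is only a minimal sketch and
is not taken from Condor. It assumes a UNIX system and Python's standard
subprocess module, and the names describe_exit and run_and_report are
hypothetical helpers made up for illustration.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Classify a return code as reported by subprocess on UNIX."""
    if returncode == 0:
        return "completed normally"
    if returncode < 0:
        # On UNIX, subprocess reports death-by-signal as a negative
        # value: -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number with no symbolic name on this platform.
            name = "signal %d" % -returncode
        return "killed by " + name
    # Positive value: the program itself called exit() with an error.
    return "exited with error status %d" % returncode


def run_and_report(argv):
    """Run a job to completion and describe how it ended."""
    proc = subprocess.run(argv)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Hypothetical usage: pass the job's command line as arguments.
    print(run_and_report(sys.argv[1:] or [sys.executable, "-c", "pass"]))
```

A job that dumps core on its own typically shows up as killed by SIGSEGV or
SIGBUS, while an operator's kill -9 shows up as SIGKILL, so the two cases can
be logged separately. This only covers jobs started by the monitoring process
itself; it says nothing about how Condor tracks its jobs.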

Brian J. Hafner
Computer Sciences Department
University of Wisconsin - Madison
hafner@cs.wisc.edu