khouglan@autoc3.intel.COM (Kriss Hougland~) (03/29/91)
I'm interested in being able to keep tabs on our whole domain. That way, when people log off for the day, it's usable CPU time! The unfortunate problem is that sometimes the programs crash and burn by themselves, and sometimes ye olde operator does a kill -9 on them.

What I am wondering is:

1) Where can I find, say, the source code for a "ps" function, so I don't have to shell out to get the info?

2) I'm trying to find "ofiles" on a comp.sources.unix machine (so far no luck).

3) I'm trying to figure out if there is a way to totally swap out the program (context or whatever) so I can resume execution later. Or, at worst, have a central program (a daemon) that will kill it remotely. For example, when someone comes in in the morning and logs on, I want to either kill the process via a central program on another machine (I'm trying to use sockets now), or swap out the program so people don't gripe and get the operator to do a #9 on it.

Currently, I don't have source for the number-crunching programs.

Please post any comments or suggestions. I hope I have not screwed up my point, but I have a feeling that other people might be interested in distributed computing the chunky way, other than using "at".

All rights given, All wrongs deserved!
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-
Addresses:               !Disclaimer: All information is my own and is not that
khouglan@hopi.intel.com  ! my employer. "Opportunity came knocking, but I was
askah@acvax.inre.asu.edu ! in the bathroom." (ME)
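One way to approach question 3 without source for the number-crunching programs is to use the job-control signals, on systems that provide them. The following is only a minimal sketch, not a full solution: it assumes a POSIX-style system with SIGSTOP and SIGCONT, and the helper names (start_job, suspend, resume, terminate) are made up for illustration. Note that stopping a process does not swap it out or checkpoint it. The process simply stops receiving CPU time and keeps its memory until the kernel decides to page it out, so it can only be resumed on the same machine and will not survive a reboot.

```python
import os
import signal
import subprocess


def start_job(argv):
    """Start a background job (hypothetical helper) and return its Popen handle."""
    return subprocess.Popen(argv)


def suspend(pid):
    """Stop the process. SIGSTOP cannot be caught or ignored by the target."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Let a stopped process continue from where it left off."""
    os.kill(pid, signal.SIGCONT)


def terminate(pid):
    """Ask the process to exit. SIGTERM is gentler than kill -9 (SIGKILL)."""
    os.kill(pid, signal.SIGTERM)
```

A central daemon listening on a socket could call suspend(pid) when someone logs on to the machine and resume(pid) when they log off, falling back to terminate(pid) only when the job really has to go. The sender needs to be the owner of the process or root.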
hafner@mysost.cs.wisc.edu (Brian J. Hafner) (03/31/91)
In article <3545@inews.intel.com> khougland@sedona.intel.com writes:
>
>I'm interested in being able to keep tabs on our whole domain. That way, when
>people log off for the day, it's usable CPU time! The unfortunate problem is
>that sometimes the programs crash and burn by themselves and sometimes ye olde
>operator does a kill -9 on them.

You may be interested in "condor" from the Univ. of Wisconsin.

A portion of the condor_intro man page:

    Condor is a facility for executing UNIX jobs on a pool of
    cooperating workstations. Jobs are queued and executed remotely
    on workstations at times when those workstations would otherwise
    be idle. A transparent checkpointing mechanism is provided, and
    jobs migrate from workstation to workstation without user
    intervention. When the jobs complete, users are notified by mail.

Condor may be obtained via anon-ftp from shorty.cs.wisc.edu

Brian J. Hafner
Computer Sciences Department
University of Wisconsin - Madison
hafner@cs.wisc.edu