[comp.sys.sgi] how do you run your batch jobs?

jdh@bu-pub.bu.edu (Jason Heirtzler) (03/16/90)

At BU we split the jobs on our SGIs into two categories: long-running
(non-interactive) batch jobs and interactive jobs, typically run by
whoever is sitting at the console.

What we'd like to do is have the batch jobs affect the interactive
jobs as little as possible.  One possible way to do this seems to be
to start cron with `runon 3 cron' and maybe change the queuedefs file
to further reduce the batch processes' priority.  This would,
unfortunately, nullify any parallelism in the batch task.
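
As a rough illustration of the trade-off described above, here is a sketch of pinning a process to one CPU. It is an analogy only: it uses Python's os.sched_setaffinity, which exists on Linux, and says nothing about how IRIX's runon or sysmp(2) behave. The function names pin_to_one_cpu and child_affinity are made up for this example.

```python
import os
import subprocess
import sys


def pin_to_one_cpu(pid=0):
    """Restrict pid (0 = the calling process) to one CPU it may already use.

    Assumption: Linux, where os.sched_setaffinity is available.
    """
    allowed = sorted(os.sched_getaffinity(pid))
    cpu = allowed[-1]  # highest-numbered allowed CPU, loosely echoing "CPU 3"
    os.sched_setaffinity(pid, {cpu})
    return cpu


def child_affinity():
    """Spawn a child and return the CPU list it inherited, as printed text."""
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import os; print(sorted(os.sched_getaffinity(0)))"],
        text=True,
    )
    return out.strip()


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    cpu = pin_to_one_cpu()
    print("parent pinned to CPU", cpu)
    print("child inherited", child_affinity())
    os.sched_setaffinity(0, original)  # undo the pin
```

On Linux a child started after the pin inherits the one-CPU mask, so every process in the job shares that CPU. That is the loss of parallelism mentioned above.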

My questions are:

        `npri -n 15 -p pid' doesn't seem to have much effect; is
        this a bug in IRIX 3.2.1?

        Will `runon 3 cron' force batch(1) jobs to run on CPU 3?

        The MP_MUSTRUN parameter of sysmp(2) only affects the calling
        process.  Will there be a way to affect an arbitrary process?

If you have a system for handling batch jobs that you think is useful,
I'd be interested in hearing from you.


-------------------------------------------------------------------------
Jason Heirtzler           (617) 353-2780       jdh@bu-pub.bu.edu
Information Technology    Boston University    ..!harvard!bu.edu!bu-pub!jdh


srp@babar.mmwb.ucsf.edu (Scott R. Presnell) (03/16/90)

jdh@bu-pub.bu.edu (Jason Heirtzler) writes:

>At BU we split the jobs on our SGIs into two categories: long-running
>(non-interactive) batch jobs and interactive jobs, typically run by
>whoever is sitting at the console.

>What we'd like to do is have the batch jobs affect the interactive
>jobs as little as possible.  One possible way to do this seems to be
>to start cron with `runon 3 cron' and maybe change the queuedefs file
>to further reduce the batch processes' priority.  This would,
>unfortunately, nullify any parallelism in the batch task.

>If you have a system for handling batch jobs that you think is useful,
>I'd be interested in hearing from you.

I have written a batch job handler that seems to work for us.  Some of the
things it can do:
	1) Start and stop at various times of the day.
	2) Start and stop at varying loads.
	3) Start and stop based on whether or not someone has logged
		into the console (interactive tasks have a definite
		priority at our site).

	(Starting and stopping is done by sending SIGCONT and SIGSTOP to
		entire process groups.)

	4) Run jobs with different nice values *and* different npri settings.

	5) Log messages and do accounting as well.
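
The start/stop behaviour in the list above can be sketched roughly as follows. This is not the batchd code, which is not shown in the thread. The policy function should_run, its thresholds (start_hour, stop_hour, max_load), and the helpers start_job and set_running are all hypothetical names chosen for illustration.

```python
import os
import signal
import subprocess
import sys


def should_run(hour, load, console_busy,
               start_hour=19, stop_hour=7, max_load=1.5):
    """Hypothetical policy: run only off-hours, at low load, console free."""
    off_hours = hour >= start_hour or hour < stop_hour
    return off_hours and load <= max_load and not console_busy


def start_job(argv):
    """Launch argv as the leader of its own process group.

    start_new_session=True makes the child call setsid(), which also puts
    it in a new process group whose id equals the child's pid.
    """
    return subprocess.Popen(argv, start_new_session=True)


def set_running(proc, run):
    """Send SIGCONT (run=True) or SIGSTOP (run=False) to the whole group."""
    sig = signal.SIGCONT if run else signal.SIGSTOP
    os.killpg(os.getpgid(proc.pid), sig)


if __name__ == "__main__":
    job = start_job([sys.executable, "-c", "import time; time.sleep(2)"])
    # Mid-afternoon with someone at the console: the job gets stopped.
    set_running(job, should_run(hour=14, load=0.3, console_busy=True))
    # Late evening, idle machine, free console: the job is continued.
    set_running(job, should_run(hour=23, load=0.3, console_busy=False))
    job.wait()
```

Signalling the process group rather than a single pid means any children the job forks are stopped and continued along with it.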

The interface looks something like BSD lpr: there is an /etc/batchcap, and
the programs are:
	batch, baq, barm, bac (and batchd for the daemon).

I haven't "beta" tested it with anyone, and I have no idea what it will do
on a multiprocessor machine as we don't have one (this was developed on a
PI).  To be honest, I'm still fixing the occasional bug.  But it's used
almost constantly, and the daemon basically never goes down.  I'm still
adding features and making it more robust.

If you're interested, drop me a line.

	- Scott
--
Scott Presnell				        +1 (415) 476-9890
Pharm. Chem., S-926				Internet: srp@cgl.ucsf.edu
University of California			UUCP: ...ucbvax!ucsfcgl!srp
San Francisco, CA. 94143-0446			Bitnet: srp@ucsfcgl.bitnet

pj@giraffe.asd.sgi.com (Paul Jackson) (03/16/90)

In article <54023@bu.edu.bu.edu>, jdh@bu-pub.bu.edu (Jason Heirtzler) writes:
> 
> At BU we split the jobs on our SGIs into two categories: long-running
> (non-interactive) batch jobs and interactive jobs, typically run by
> whoever is sitting at the console.
> 
> What we'd like to do is have the batch jobs affect the interactive
> jobs as little as possible.  One possible way to do this seems to be
> to start cron with `runon 3 cron' and maybe ...
> 
>         `npri -n 15 -p pid' doesn't seem to have much effect; is
>         this a bug in IRIX 3.2.1?

The npri -n option takes absolute nice values, not relative ones; the nice
command takes relative values.  The absolute nice of a process is visible
under the NI column of ps -l.  It is typically 20 for interactive
processes.  To slow a process down, try -n 30 or (slowest) -n 39.  The
request for -n 15 will actually speed a process up a little bit.
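
The arithmetic behind this can be sketched as below. The typical value of 20 and the slowest value of 39 come from the explanation above; the lower bound of 0 and the clamping are assumptions, and the function names are invented for the example. Nothing here calls npri or nice; it only models the numbers.

```python
# From the thread: 20 is typical for interactive processes, 39 is slowest.
# Assumption (not stated in the thread): valid values run from 0 to 39.
TYPICAL_NICE = 20
NICE_MIN, NICE_MAX = 0, 39


def clamp(value):
    """Keep a nice value inside the assumed 0..39 range."""
    return max(NICE_MIN, min(NICE_MAX, value))


def after_relative(increment, current=TYPICAL_NICE):
    """nice-style request: the increment is added to the current value."""
    return clamp(current + increment)


def after_absolute(requested):
    """npri -n style request: the requested number is the new value."""
    return clamp(requested)


def describe(new, old=TYPICAL_NICE):
    """A larger nice value means the process gets less CPU."""
    if new > old:
        return "slower"
    if new < old:
        return "faster"
    return "unchanged"


if __name__ == "__main__":
    for label, new in [("relative +15", after_relative(15)),
                       ("absolute 15", after_absolute(15)),
                       ("absolute 39", after_absolute(39))]:
        print(label, "->", new, describe(new))
```

Read as a relative increment, 15 would move a typical process from 20 to 35. Read as an absolute value, it moves the process from 20 down to 15, which is why the original command did not slow anything down.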

>
> ...
> 
> If you have a system for handling batch jobs that you think is useful,
> I'd be interested in hearing from you.

The next release will support a new option - a port of NQS to IRIX.
This is a more elaborate batch system that is available on several
big-iron number crunchers, such as Cray, Convex, and (soon) SGI.

				Thanks, take care ...
				Paul Jackson (pj@asd.sgi.com), x1373