[comp.unix.admin] Network Queuing System

buck@nrl-cmf.UUCP (Loren Buchanan) (06/08/91)

We are in the process of planning an upgrade of our Cray to Unicos
(I know, it's about time), and were wondering about using NQS to
submit jobs to the Cray from SGI, Sun, VAX, IBM, Stardent, and other
computers.  

Are there reasons to not use NQS?  

Is there something else we should be using?  

Are there particular headaches that can be avoided by knowing of past mistakes?
Are there public domain versions of the software available, or should we tell
management to come up with the money to buy it?

Are there third party vendors, and if so who are they?

Any parting shots...er...comments?

Thanks & B Cing U

Buck

P.S. Email responses will be collected, munged, and posted.

-- 
Loren Buchanan (buck@caligula.nrl.navy.mil) | #include <standard.disclaimer>
NRL Code 5842, 4555 Overlook Ave.           | #include <computer.graphics>
Washington, DC 20375         (202) 767-3884 | #include <electronic.music>
Phone tag, America's fastest growing business sport.

buck@nrl-cmf.UUCP (Loren Buchanan) (06/15/91)

This is the response document to the questions I posed about NQS
last week.  I have filtered out most of the noise (and in one case
most of the meat).  It appears as though we will start with the
code from COSMIC, 382 East Broad St., Athens GA 30602, or if you 
want to call, John A. Gibson, Director, (404) 542-3265.  Does
anyone have any experience with the Sterling or General Atomics
versions they would care to share with the rest of us?

From: bernhold@qtp.ufl.edu

>Are there reasons to not use NQS?  

Most of the vendors you listed don't (to my knowledge) offer NQS with
their systems.  You'll have to get it elsewhere.  It is not PD.  It
was developed on contract from NASA with public funds.  It is sold to
try to recover costs via NASA's COSMIC distribution center.  The cost
of the original version of NQS (_not_ what you'll get from Cray!) last
time I checked was $6000.  Don't know if you'd get some kind of deal
for being "family".

The current commercial versions of NQS are very nice -- much advanced
over the one to be had from COSMIC (that may have changed by now -- see
below), but either should probably be workable.  The one thing you'll
probably need which isn't in the older version is the ability to
specify a remote username to run under (verified with .rhosts, etc.).
Otherwise, there is no facility (in the original NQS) for one userid
to submit the job to run under another userid on the remote machine.
Given a knowledge of the communication protocol between NQS daemons,
this shouldn't be hard to implement in the old code (I say that
without having looked at the old code!).  Caveat:  We are running the
original NQS only and haven't yet tried to speak to a machine running
a more current commercial version -- who knows what may have changed in
the protocols!
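
To make the ".rhosts" idea concrete, here is a minimal sketch of the kind of check a daemon could make before running a job under a different userid. It is not taken from any NQS source. It assumes the simple rsh-style file format (one "host [user]" entry per line) and ignores wildcards and netgroups. The function names, host names, and user names are all made up for illustration.

```python
def parse_rhosts(text):
    """Parse .rhosts-style text into a set of (host, user) pairs.

    Each non-blank, non-comment line is "host [user]". A line with no
    user field is stored with user None.
    """
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        host = fields[0].lower()
        user = fields[1] if len(fields) > 1 else None
        entries.add((host, user))
    return entries


def may_run_as(rhosts_text, target_user, remote_host, remote_user):
    """Decide whether remote_user@remote_host may run as target_user.

    rhosts_text is the content of target_user's .rhosts file.
    Assumption: a host-only line trusts only the same-named user
    coming from that host, as in rsh-style checking.
    """
    entries = parse_rhosts(rhosts_text)
    host = remote_host.lower()
    if (host, remote_user) in entries:
        return True
    return (host, None) in entries and remote_user == target_user


if __name__ == "__main__":
    # Hypothetical .rhosts content for the target account.
    rhosts = "sun1.example.org buck\nvax2.example.org\n"
    print(may_run_as(rhosts, "lbuchanan", "sun1.example.org", "buck"))
    print(may_run_as(rhosts, "lbuchanan", "sun1.example.org", "mallory"))
```

A real implementation would also have to check the ownership and permissions of the file and decide how far to trust the host name the remote daemon reports. None of that is shown here.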

About the different versions:  The original is definitely available
from COSMIC.  With some work, we got it to run on our Suns and FPS.
When I asked for information on NQS a while ago, I was told that a)
the original is being upgraded -- bugs fixed, perhaps _some_ enhanced
capabilities and this may be at COSMIC by now; b) there is a brand new
development, NQS II beginning, which is to rewrite the whole thing
from scratch to address needs which didn't exist when NQS (I) was
designed -- mostly distributed computing, I think.  Since NQS is going
to be a POSIX standard too, I imagine, but don't know for sure, that
NQS II will become POSIX-compliant.  I think NQS II is expected to be
available from COSMIC also, but I don't know the time frame.

I don't know the legality of it, but there used to be a copy of the
original NQS available from the Convex User Group archive on
permac.space.swri.edu.  I checked on it a while after its existence
had been widely announced on the net, and it was still there -- so
either no one who cares heard about it or no one cares or someone is
being stubborn in not removing it.  Take it as you will.

I would like to hear any more up-to-date information -- particularly
on (a) vendors planning to support NQS and (b) updated versions of NQS
and where to obtain them.

From: jones@hermes.chpc.utexas.edu

You should run NQS on the Cray.

You can get a version of NQS from COSMIC (at a one-time price) that you
can run on your SGI, Sun, VAX, and Stardent.  You may have to do some
porting.  It's not hard once you understand the source: I ported it to
AIX in about two days, but it will take at least a month of work to get
to the point where you can do this.  The nice thing about the COSMIC
version is that you can do what you want with it so long as you don't
give it to foreigners.  (You will also have to modify it to understand
Cray's tape conventions.)

STERLING SOFTWARE also sells NQS.  They sell it per CPU, and they also
do maintenance on NQS.  I don't know yet if they support the Cray tape
conventions.  They have ported it to AIX.

You can also check out RQS from Cray.  It allows you to submit jobs to the
Cray NQS and get the output files back.

Bill Jones



From: nash@ucselx.sdsu.edu (Ron Nash)

Here in San Diego, the Cray runs EZBATCH.  Here is the manual:

[[[with large chunks of the manual deleted]]]

                                   EZBATCH



          Scope          EZBATCH discusses the basics of using the Net-
                         work Queuing System (NQS), the UNICOS batch
                         facility.

          Last Revision  May 30, 1991

          Documentation  To view this document at your terminal, use the
                         interactive SDSC utility doc:

                           doc view ezbatch

                         For a list of other doc options, including
                         printing your documents, enter

                           doc

                         and respond to the prompts, or see the doc man
                         page.

          Consulting     For questions about or problems with any SDSC
                         hardware, software, or facilities, please call the
                         SDSC consultants at

                           (619)534-5100

                         between 0800 and 1700 Pacific time. To send your
                         questions online, enter the following and respond
                         to the prompts:

                           mailx consult

                         or use your local mail utility to send your
                         question via Internet mail to the following
                         Internet address:

                           consult@y1.sdsc.edu


                         (c) 1991 General Atomics.
                         General Atomics gives authorized users
                         of the San Diego Supercomputer Center
                         (SDSC) permission to make copies of this
                         document.  Authorized users include
                         academic, industrial, and government
                         researchers with SDSC accounts as well
                         as officials of the National Science
                         Foundation and the University of
                         California.  This material may not be
                         used for commercial purposes.
                         Permission for any other use of this
                         material and by any other party must be
                         obtained from General Atomics.


                                  Table of Contents



          Documentation Conventions
          Introduction
               NQS Requests
               NQS Output
          NQS Queues
               Batch Queues
                    Standard Queues
                    Queues for Large Disk Requirements
                    Test Queue
                    Queues for High or Low Priority
                    Table of Batch Queues
               Pipe Queues
                    Table of Pipe Queues
          Choosing a Queue
               Choosing a Priority
               Determining Your Job's Memory Requirements
               Determining Your Job's Local Disk Requirements
          NQS Commands
               The qsub command
                    Submitting Scripts with Command Options
                    Useful qsub Options
                    Example qsub Command Line
                    Specifying qsub Options in the Shell Script
                    Submitting Shells Interactively
                    Message after Successful Submission
                    Submission Example
                    Useful Shell Flags
               The qsmart Utility
                    The qsmart Command Line
                    Example qsmart Command Line with Options
                    Interactive qsmart Example
               The qstat Command
                    The qstat Command Line
                    Default qstat Display
                    Using qstat to Examine Your Jobs
                    Using qstat to Examine the Queue Complexes
               The qdel Command
               The qlimit Command
               The qmsg Command
               The qrank Command
                    Ranking by Time Submitted and Priority
                    The qrank Command Line
                    Default qrank Display
                    Displaying a Single Request
                    Displaying Queues and Primary Complexes
          Revision History



                           INTRODUCTION



You can run jobs under UNICOS on the Cray Y-MP in three
different ways:  interactively in the foreground, interactively
in the background, and in batch.  The Network Queueing System
(NQS) is the UNICOS batch facility, which will help you make the
best use of SDSC system resources.  By submitting your jobs to
the batch queue, you allow NQS to schedule your job according to
the resources requested and to run it when those resources are
available.  By redistributing the load on the system over a 24-
hour period, this scheduling of jobs balances the load during the
day and prevents the machine from idling late at night when the
number of interactive jobs reaches a minimum. NQS also lets you

       o    Stretch your allocation.  When you run jobs
            interactively in the foreground or background, you are
            charged two times the amount of CPU time you use.  By
            running in batch, you can reduce the amount you are
            charged for each job.

       o    Checkpoint your program.  Jobs run in NQS are
            automatically checkpointed.  After a system shutdown (or
            crash), checkpointed jobs continue to run from the last
            checkpoint rather than from the beginning, which can
            save you from excessive charges and time delays caused
            by rerunning your entire job.

       o    Run jobs that are too large or too small to be run
            interactively.  Interactive jobs are limited to 6
            Mwords of memory, 20 CPU minutes, and 60 Mwords of disk
            space.  By using NQS, you can run jobs that require up
            to 6000 CPU minutes, 32 Mwords of memory, and 1000
            Mwords of disk space.

       o    Continue running your jobs after you log out.
            Interactive jobs, including those run in the background,
            terminate when you log out (unless you specify nohup on
            the command line).
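
The limits and the charging rule quoted above can be turned into a quick check of where a job is allowed to run. This is only a sketch using the numbers from this excerpt. The function and key names are invented here, and the batch charging rate is not given in the excerpt, so only the interactive charge is computed.

```python
# Limits quoted in the EZBATCH introduction (CPU minutes, Mwords).
INTERACTIVE_LIMITS = {"cpu_minutes": 20, "memory_mwords": 6, "disk_mwords": 60}
BATCH_LIMITS = {"cpu_minutes": 6000, "memory_mwords": 32, "disk_mwords": 1000}

# Interactive work is charged two times the CPU time used.
INTERACTIVE_CHARGE_FACTOR = 2


def fits(job, limits):
    """Return True if every requirement in job is within limits."""
    return all(job[key] <= limit for key, limit in limits.items())


def allowed_modes(job):
    """List the modes ("interactive", "batch") whose limits the job fits."""
    modes = []
    if fits(job, INTERACTIVE_LIMITS):
        modes.append("interactive")
    if fits(job, BATCH_LIMITS):
        modes.append("batch")
    return modes


def interactive_charge(cpu_minutes):
    """CPU minutes charged if the job runs interactively."""
    return INTERACTIVE_CHARGE_FACTOR * cpu_minutes


if __name__ == "__main__":
    # Hypothetical job: too much CPU and memory for an interactive run.
    job = {"cpu_minutes": 300, "memory_mwords": 16, "disk_mwords": 200}
    print(allowed_modes(job))
```

A job that fits both sets of limits can still be worth submitting to batch, since the excerpt says batch charges are lower than the doubled interactive rate.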

Thus endeth the summary of responses (thanks to all who responded, even
if none of your message ended up in this one).

B Cing U

Buck


sean@ms.uky.edu (Sean Casey) (06/17/91)

I wonder how it compares with MDQS?


-- 
** Sean Casey  <sean@s.ms.uky.edu>

benseb@grumpy.sdsc.edu (Booker Bense) (06/17/91)

>|> From: nash@ucselx.sdsu.edu (Ron Nash)
>|> 
>|> Here in San Diego, the Cray runs EZBATCH.  Here is the manual:
>|> 


- Umm, I don't mean to be picky, but we actually run the NQS
available from Cray Research supplemented with local bug fixes and
modifications. These mods are available for public use, as are
our accounting, queued file transport, and other Unicos modifications.
A qrank utility is now also completed and will be available shortly.

- You can get qsmart as well, but I wouldn't recommend it. It's a
very ugly csh script I wrote in a fit of pique about NQS's lack of
graceful shutdown options (then, not now). I would be astonished if it
ran anywhere else. BTW, ezbatch is the name of the document. We have
other ones like

ezmath
ezc 
ezdebug ....

-The convention is that an ez-doc contains enough to get you started. 
More complete documentation is also available. 

- Booker C. Bense                    
preferred: benseb@grumpy.sdsc.edu	"I think it's GOOD that everyone 
NeXT Mail: benseb@next.sdsc.edu 	   becomes food " - Hobbes