[net.wanted] need a 4.2 BSD batch job queuer

jiml@uwslh.UUCP (a) (12/12/85)

Does anyone have, or know of, a batch queuing program for 4.2 BSD?
It should behave as a combination of At and Lpr, maintaining a list of
waiting jobs, allowing removals, etc., and running them N at a time.

James E. Leinweber   ...!{seismo,harvard,topaz,ihnp4}!uwvax!uwslh!jiml
Wisconsin State Hygiene Lab          (608) 262-8096
University of Wisconsin; 465 Henry Mall; Madison, WI  53706

sean@ukma.UUCP (Sean Casey) (12/16/85)

In article <132@uwslh.UUCP> jiml@uwslh.UUCP (a) writes:
>Does anyone have, or know of, a batch queuing program for 4.2 BSD?
>It should behave as a combination of At and Lpr, maintaining a list of
>waiting jobs, allowing removals, etc., and running them N at a time.
>
>James E. Leinweber   ...!{seismo,harvard,topaz,ihnp4}!uwvax!uwslh!jiml
>Wisconsin State Hygiene Lab          (608) 262-8096
>University of Wisconsin; 465 Henry Mall; Madison, WI  53706

You should inquire into mdqs, a very flexible queueing system by BRL.  It's
too big to mail, but you can get it free from BRL if you have a 4.2 license.

Here's the usenix intro...

#! /bin/sh
: This is a shell archive, meaning:
: 1. Remove everything above the '#! /bin/sh' line.
: 2. Save the resulting text in a file.
: 3. Execute the file with /bin/sh '(not csh)' to create the files:
:	'/usr/src/brl/mdqs/doc/usenix'
: This archive created: 'Sun Dec 15 16:17:55 1985
'
: By:	'Sean Casey (The White Tower @ The Univ. of KY)'
export PATH; PATH=/bin:$PATH
echo shar: extracting "'usenix'" '(14656 characters)'
if test -f 'usenix'
then
	echo shar: will not over-write existing file "'usenix'"
else
cat  >'usenix' <<'SHAR_EOF'
.TL
The Multiple Device Queueing System
.AU
Douglas P. Kingston III
.AI
Vulnerability/Lethality Division
Ballistics Research Laboratory
U.S. Army
.AU
Michael J. Muuss
.AI
Ballistics Modelling Division
Ballistics Research Laboratory
.AB
.PP
The Multiple Device Queueing System (MDQS) is designed to
provide UNIX with a full function, modular, and consistent queueing
system. The MDQS system has been designed with portability,
expandability, robustness, and data integrity as key goals. MDQS is
designed around a central queue which is managed by a single privileged
daemon.  Requests, delayed or immediate, are queued by non-privileged
programs.  Once queued, requests can be listed, modified or deleted. 
When the requested device or job stream becomes available, the daemon
executes an appropriate server process to handle the request. Once
activated, the request can still be canceled or restarted if needed. 
MDQS can serve as a delayed-execution/batch subsystem and, at a minimum,
replaces internally the functions of lpr(1) and at(1). MDQS provides
the system manager with a number of tools for managing the queueing
system.  Queues can be created, modified, or deleted without the loss of
requests.  MDQS recognizes and supports both multiple devices per queue
and multiple queues per device by mapping input for a logical device to
an appropriate physical output device.  Anticipating the inevitable,
MDQS also provides for crash recovery.
.PP
The MDQS system has been developed at the U.S. Army, Ballistics
Research Laboratory by Doug Kingston and Michael Muuss to support the
work of the laboratory and is available to other UNIX sites upon request.
The MDQS system is designed to be compilable
on a standard V7 UNIX system.
.AE
.SH
System Overview
.PP
Traditionally UNIX
.FS
Unix is a trademark of Bell Laboratories.
.FE
has been a small system aimed at experienced computer
users.
The early UNIX queuing systems followed the basic tools concept
found throughout UNIX and provided only basic queuing services
without any frills.
Since then, UNIX has become a widely accepted system
servicing a wide range of users and applications.
With wider use has come greater demands on the software.
The queuing software has been asked to provide the
same kinds of services found on "larger" systems.
In general, these requests could not be satisfied
with the currently released UNIX queuers.
The authors, both researchers at the Ballistics Research
Laboratory (BRL), have undertaken to develop a
general purpose queuing system for UNIX as part of
the continuing UNIX development project at BRL.
.PP
The Multiple Device Queueing System (MDQS) was designed to fill
in the void in the released versions of UNIX
caused by the lack of an adequate queuing system.
The MDQS is a collection of programs that share a common queue
management system.
There are "enqueuers" that place items in the queue.
These are normally programs like \fBlpr\fR(1) and \fBat\fR(1), but
could also be network servers accepting requests from another machine.
The "dequeuers" are normally the programs that actually handle output
onto the device, but if the device is not resident on the machine,
the dequeuer is a network transfer program which will connect
to an appropriate server process on a remote machine.
The dequeuers are analogous to the UNIX programs \fBlpd\fR(8),
and \fBatrun\fR(8).
Queue management programs
include the daemon, a status program, and a queue modification
program.
.SH
Features
.PP
Lack of a "full function" queuing system
for UNIX was a problem:
we wanted a number of features that are common to any queued request
to be handled by the same mechanism in all cases.
These include specification of start time, notice of completion,
prioritization, and output limits.
We needed the capability for users to list, modify, and
delete the queue entries
if necessary.
The organization of the queues
needed to allow for flexible queue administration.
It should be possible to have more than one device servicing a single
queue and likewise more than one queue should be able to feed
a single device.
The pairing of devices and queues in MDQS is table driven so that
it can be reconfigured by simply editing a file
which MDQS reads at runtime (not at compile time!).
.PP
The basic features of all queue requests include
several which have been missing from other queuing systems.
All requests are assigned a start time.
If the start time is in the future, that request is delayed until
that time for enabling.
Every request is also assigned a priority.
If two requests require the same resource, the one with the higher
priority is run first, or if both have the same priority,
the request with the earlier start time is run first.
A request can be flagged indicating that the initiator of
the request wants notification when the request has
been processed.
This notification is designed to work across network connections.
The request has associated with it a forms requirement
in recognition that devices do not always have the materials
we expect in them.
Requests can stipulate limits on their output to
avoid runaway outputs.
Lastly, one can request multiple copies of the output without
submitting the source file multiple times.
.PP
The heart of the queuing system is the daemon.
The MDQS daemon periodically searches the queue for additions,
deletions, or modifications.
The daemon keeps several internal queues of requests.
A delayed request queue holds all requests whose start time
lies sometime in the future.
The ready request queues contain all requests that are ready
to be processed but which are awaiting a free device.
There is one ready request queue for each \fIlogical\fR
device that is supported.
This logical device queue may be serviced by more than
one physical device on a heavily loaded system.
On another system, a "printer/plotter" type device
may be servicing both a plot queue and a lineprinter queue.
.PP
Since users often make mistakes, the queuing system
had to have provisions for monitoring and modification by
the user community as well as control by the operations
staff.
The queue status program allows the user to get
detailed information about his own requests and general
information about other jobs in the queues.
A queue modification program allows the user to change
the priority, start time, limits and flags of his
queued request, or he can delete it altogether.
.PP
Every job queued gets a "sequence" number which is
unique for his UID and can be used to reference that job
for modification, or deletion.
In the global sense, every job is identified by the UID and
"sequence" number combination, but for the individual user,
the sequence number will usually be a small integer.
Increasing sequences of small integers on a per-user basis
are both easier to think about and map more
directly to the order of actions in the user's mind.
.PP
Finally, the queuing system is designed to recover from
crashes or program failures by verifying each step as it
is taken.
For instance, active queue files are not deleted
until the request has been completed
by the MDQS to the best of its ability.
After a crash, the daemon will try and restart the job that
was running at the time of the crash.
Where possible, duplication of output is eliminated by
noting which portions of a job have been completed.
.SH
Configurability
.PP
There are several factors behind the design of the MDQS.
The primary design criterion is that the system should be modular.
We have the need to queue output to a large number of different devices
on different machines.  The MDQS will allow requests to be
queued on a machine regardless of
whether the actual device resides there.
We wanted it
to be possible to add and delete active queues as the status
of hardware and system configurations changed.
Lastly, we didn't want to have a different queuing system to maintain
for each device.
This means less work for the systems staff,
and one common interface for the user community to learn.
All configuration dependent information is read in at runtime
by those MDQS programs that need to know, enabling us to have
transportable binaries among homogeneous machines which greatly
simplifies queue management.
.SH
Functional Example
.PP
An example of the use of the MDQS will be helpful in showing
some of the capabilities of the system.
A typical user might start the day by submitting 3 requests to the
print queue to be printed on 8x11 paper.  He later finds out that
the computer room is out of 8x11 paper.  One of his printouts is needed
for a meeting, so he decides to print it on 15x11 instead.  He does
a queue status command to find his print request ids.
Using the queue modify command, he changes the forms on his second request
to 15x11 and asks that he be notified when the request is completed.
Twenty minutes later he receives an electronic message from
the queue daemon announcing that his second print request
has completed.
.PP
Later that day, 
the user queues 2 "batch" jobs to be run after midnight
and again does a queue status command.
The two 8x11 prints (id=1,3) are still there, and the new
batch jobs (id=4,5) are now also shown.
Finally, before he leaves for the night,
he decides to start running the first batch job with
the hope that it will be done by the time he is finished
with dinner.
Using the qmodify command he changes the start time of the job
to the current time.
That evening he logs on to get his mail and finds his first
job has completed successfully.
.PP
The operators for the system could also issue the commands
demonstrated above but if they wanted to access other than their
own requests, they would have to specify the user as well as
the request id.
Operators also need to be able to inform MDQS of changes in the
queuing system (e.g. changes in paper type, enabling of a second printer).
To make these changes, the operator just edits the MDQS configuration file.
The queue daemon detects the changes and modifies its internal tables
accordingly.
.SH
Structure of the Queue
.PP
The queuing system uses a single set of queuing directories.
Figure 1 shows the directory hierarchy that is currently used.
The sequence of events requires that the queuing be done in two
steps by two processes.
.PP
Another area of strong concern is the security and integrity of
the data and accounting information.
In order to preserve the integrity of the user's data, all
MDQS processes run with the UID and GID of the user when
accessing his files, including those files in the queue directory.
The queue directories are generally writable to allow "enqueuers"
running as a user to create files,
but the queue directories are
protected by a "lock" directory which is only entered by programs
which are setuid to the MDQS UID.  Those programs fall under the
category of "trusted programs" and are maintained by
the systems staff.
.PP
One of the common places for security problems on all computer
systems
is in the area of "queuing systems", and we have had our
own share of headaches with the old lineprinter spooler.
The approach taken with MDQS was to always access data unprivileged.
In addition, since the dequeuing process will be running with
the user's real uid and gid, it will be possible to print
protected files without copying them.
.PP
The enqueuing process currently consists of two steps
which are normally separated into two processes.
The first process runs unprivileged and creates uniquely named files
in the \fIqtmp\fR directory.
One of these, the control file,
will contain the names of any other queued files.
When all the files have been created, the first process calls a
program that is setuid to the owner of the queue directories.
This program is passed the name of the control file as an argument.
The process is able to traverse
the \fIqlock\fR directory, and after chdir'ing
to the \fIqhome\fR directory, resets its effective UID and GID to those of the
user making the request so that it can modify the control file.
All the data files are linked
into the \fIqdata\fR directory, and finally the
control file is linked into the \fIqcontrol\fR directory.
Throughout this enqueuing,
all the files remain readable and writable only
to the user making the queue request.
.SH
Resource Accounting
.PP
The BRL/JHU
accounting system relies on the ability of a parent process to
inherit the accounting information of a child process.
Orphaned children (processes whose parents die before they do) are
inherited by INIT (process id 1).  INIT collects the accounting
information and updates the accounting records.
In order to properly account for the costs of processing a queue
request, we could not have the daemon inheriting the dead "dequeuer"
processes.  This would have charged all queue processing to the
MDQS account.
.PP
The need to doubly detach the servicing process causes a problem for
the daemon since it is now more difficult
to monitor the status of that program.
The strategy to handle the "dequeuing" process was as follows.
A request is chosen from the list of ready requests.
The daemon forks and the child, referred to as the filecontrol process, sets
up the input file descriptor from the control file, and the output
file descriptor to the device.
The filecontrol process sets real and effective user id's to the
person who made the request.
Finally, a pipe is created which will allow the filecontrol process to
get the return code without doing a wait().
The filecontrol process then forks and
executes the appropriate servicing process.
When the servicing process is done,
he writes an exit code onto the pipe
to the filecontrol process and exits.
The filecontrol process reads this result and then exits
with this return code \fIwithout\fR
doing a wait().  The result is that the daemon gets
the return code but without inheriting the accounting.
The service process is now inherited by INIT who logs the time and
materials used.
.SH
Current Implementation Status
.PP
As of 22 June,
the current implementation of the MDQS consists of the daemon,
a qstatus program, a lineprinter queuer and a lineprinter
dequeuer.
In development are a network server for the lineprinter queue,
a "batch job" queuer and dequeuer, a typesetting queuer
(local and network) and
dequeuer,
and the qmodify command.
.PP
MDQS does use locking code to help preserve the integrity
of the queue and prevent collisions of competing servers.
We have implemented the RAND exclusive open code and have
been quite pleased (it's upward compatible!).
Since others may not have this available,
or may have a better mechanism, the locking
code has been isolated to one module which can
be replaced as needed.
SHAR_EOF
if test 14656 -ne "`wc -c < 'usenix'`"
then
	echo shar: error transmitting "'usenix'" '(should have been 14656 characters)'
fi
fi # end of overwriting check
:	End of shell archive
exit 0
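
The "Resource Accounting" part of the intro is the trickiest bit to picture,
so here is a rough sketch of the doubly detached dequeuer. It is not MDQS
code, only my reading of the description. The name run_detached and the idea
of passing the job as a callable are assumptions made for illustration; the
real daemon execs a server program and also switches to the requesting user's
uid/gid and sets up the control file and device descriptors, all of which is
left out here. It needs a system with fork().

```python
import os


def run_detached(job):
    """Run job() in a grandchild process and return its exit code (0-255).

    Hypothetical helper: the name and the callable-based interface are
    assumptions for illustration, not part of MDQS.
    """
    filecontrol = os.fork()
    if filecontrol == 0:
        # Child: plays the role of the "filecontrol" process.
        code = 255  # reported if no exit code ever arrives on the pipe
        try:
            read_end, write_end = os.pipe()
            if os.fork() == 0:
                # Grandchild: plays the role of the servicing process.
                status = 255
                try:
                    os.close(read_end)
                    status = job() & 0xFF
                    # Report the exit code over the pipe, then exit.
                    os.write(write_end, bytes([status]))
                finally:
                    os._exit(status)
            os.close(write_end)
            # Blocks until the grandchild writes its code or closes the pipe.
            data = os.read(read_end, 1)
            if data:
                code = data[0]
        finally:
            # Exit WITHOUT wait(): the grandchild is orphaned, so its
            # accounting is not inherited by this process or by the daemon.
            os._exit(code)
    # Parent: plays the role of the daemon; it waits only for filecontrol.
    _, wait_status = os.waitpid(filecontrol, 0)
    return os.WEXITSTATUS(wait_status) if os.WIFEXITED(wait_status) else 255
```

For example, run_detached(lambda: 3) returns 3 to the caller, even though the
caller never waits on the process that actually ran the job. If the job dies
before writing anything, the read sees end-of-file and the sketch reports 255.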
-- 
-------------------------------------------------------------------------------
Sean Casey                             UUCP:  sean@ukma.UUCP   or
915 Patterson Office Tower                    {cbosgd,anlams,hasmed}!ukma!sean
University of Kentucky                 ARPA:  ukma!sean@ANL-MCS.ARPA
Lexington, Ky. 40506-0027            BITNET:  sean@UKMA.BITNET