[comp.std.unix] First cut at Batch PAR

isaak@decvax.dec.com (Jim Isaak) (01/05/90)

From: isaak@decvax.dec.com (Jim Isaak)

[ Here is a preliminary draft Project Authorization Request (PAR)
for a Batch Processing subcommittee of IEEE 1003.  It is accompanied
by a preliminary paragraph supplied later by Jim Isaak.  I will be
posting similar draft and actual PARs and related procedural material
as they reach me.  -mod ]

	I would suggest that you add some intro information for the
uninitiated.  Both "fact" and "status" -- for example, the Batch PAR is
the first proposal we have seen, and it is likely to change before
we agree to "sponsor" work in that area (so suggestions for change are
now very appropriate!) --- I would expect a more mature version just
before the SEC approval meeting (very little time for comment, but still
not "approved") -- then after SEC approval it is time to let the world
know we are soliciting participation and input in that area (no longer
time for comment on PAR contents in general)
			jim

PAR Proposal for Batch Processing			TCOS SEC N117
Karen Sheaffer						Jan. 4, 1990
						
Overview

Supercomputing applications, by definition, have massive resource
requirements.  It is not unusual for applications to require all
available memory and gigabytes of disk space, and still take many hours,
days, or weeks to complete.  A batch processing system that can allocate
and manage system resources among dozens of jobs to allow the efficient
execution of such jobs is essential.

The preparation of supercomputing jobs for submission is often a
complicated task carried out on network nodes other than the supercomputer,
e.g. workstations, front end processors, and minicomputers.  A batch
processing system must permit supercomputer job submission from 
these network nodes and the spooling of output to the network.

UNIX systems have primitive batch capabilities (at, cron), but these are
not adequate for production supercomputing environments.  These facilities
may suffice in a simple environment, but they make no provision for
overall management of a workload running under UNIX.  It is easy to create
a situation in which a number of processes compete for limited resources,
substantially increasing system overhead.

The IEEE 1003.10 Supercomputing Working Group has been developing
a proposed standard for a batch processing system based on NQS, the Network
Queuing System originally developed at NASA Ames.


Scope

The standard will define the system interfaces, utilities, system
administration interfaces, and an application level protocol required by a
network batch processing system in a POSIX environment. This standard will 
provide portability for applications, users, and system administrators.

Purpose

The purpose of this standard is to extend POSIX to provide a network batch
processing system.  These extensions include the following:

	system interfaces 
		checkpoint/recovery-
			the capability of a user session or process to
			automatically checkpoint itself periodically and
			to restart at the latest checkpoint following a
			machine crash or shutdown.  The objective of 
			checkpoint/recovery is to avoid the expense of 
			rerunning work requests that may have been executing 
			several hours or days prior to a machine crash. 

		resource control-
			the ability to control the allotment of the resources
			of the machine (such as CPU time, memory, disk space,
			tapes, etc.) to a process/session.

	utilities for the submission and management of the requests

	system administration interface for the creation and authorization
	of the network batch processing system

	network application level protocol 
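
The checkpoint/recovery behavior described above can be illustrated with a
small sketch.  This is only an illustration under stated assumptions, not
part of the PAR, the 1003.10 draft, or NQS: the function names
(save_checkpoint, load_checkpoint, run_job), the JSON file format, and the
crash_at parameter used to simulate a machine crash are all hypothetical.
The job saves its state every few steps and, when rerun after a crash,
resumes from the latest saved state instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the checkpoint.

    The rename is atomic on POSIX, so a crash during the write leaves
    the previous checkpoint intact rather than a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the latest saved state, or a fresh state if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_step": 0, "total": 0}


def run_job(path, n_steps, interval, crash_at=None):
    """Sum the integers 0..n_steps-1, checkpointing every `interval` steps.

    crash_at (hypothetical, for demonstration) raises an error before the
    given step to stand in for a machine crash or shutdown.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated machine crash")
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % interval == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

In this sketch, a job that crashes at step 57 with an interval of 10 loses
only the work done since step 50; calling run_job again with the same
checkpoint path repeats steps 50 onward rather than the whole request.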

Name of Group which will write the Standard:

	POSIX 1003.10 Supercomputing Working Group


		TCOS-SEC Checklist for New PAR Activity Proposals

I. Administration

	Karen Sheaffer 	Sandia National Laboratories	Chair
	Stuart McKaig	Convex				Vice-Chair
	Jim Tanner	Boeing Computer Services	Technical Editor
	John Caywood 	Unisys				Secretary
(Note: with the exception of Stuart McKaig, all of the above hold the same
 positions in the 1003.10 Working Group.)

II. Working Group
	# of active (have attended 3/4 of meetings) participants: 15
	# of correspondent members identified: 50
	Breakdown of active participants:  Producer: 5
					   User    : 10
					   Other   :
	# of companies/interests represented: 14
	What international participation has been identified ?

III. Deliverable Document
	Standard
	Expected Size: 200 pages
	Projected time frame:
	First Draft: July 1989			Start Balloting: Fall 1990
	What candidates exist for a "base document"?
		The 1003.10 Supercomputing Working Group Draft Batch Document
		Network Queuing System (NQS) public domain software and
			documentation
IV. Scope
	See above

V. Overlap/Dependencies on other work
	Which TCOS standards assumed: 1003.1 and 1003.2
	
	What functions are required by other groups: Protocol Independent
			Network Service for Portable Applications
	
	What other groups are doing work here:  

Volume-Number: Volume 18, Number 20