isaak@decvax.dec.com (Jim Isaak) (01/05/90)
From: isaak@decvax.dec.com (Jim Isaak)

[ Here is a preliminary draft Project Authorization Request (PAR) for a Batch Processing subcommittee of IEEE 1003. It is accompanied by a preliminary paragraph supplied later by Jim Isaak. I will be posting similar draft and actual PARs and related procedural material as they reach me. -mod ]

I would suggest that you add some intro information for the uninitiated. Both "fact" and "status" -- for example, the Batch PAR is the first proposal we have seen, and it is likely to change before we agree to "sponsor" work in that area (so suggestions for change are now very appropriate!) -- I would expect a more mature version just before the SEC approval meeting (very little time for comment, but still not "approved") -- then after SEC approval it is time to let the world know we are soliciting participation and input in that area (no longer time for comment on PAR contents in general)

jim

PAR Proposal for Batch Processing
TCOS SEC N117
Karen Sheaffer
Jan. 4, 1990

Overview

Supercomputing applications, by definition, have massive resource requirements. It is not unusual for applications to require all available memory, gigabytes of disk space, and still take many hours, days, or weeks to complete. A batch processing system that can allocate and manage system resources among dozens of jobs to allow the efficient execution of such jobs is essential.

The preparation of supercomputing jobs for submission is often a complicated task carried out on network nodes other than the supercomputer, e.g. workstations, front end processors, and minicomputers. A batch processing system must permit supercomputer job submission from these network nodes and the spooling of output to the network.

UNIX systems have primitive batch capabilities (at, cron), but these are not adequate for production supercomputing environments.
These facilities may suffice in a simple environment, but they make no provision for overall management of a workload running under UNIX. It is easy to create a situation in which a number of processes compete for limited resources, substantially increasing system overhead.

The IEEE 1003.10 Supercomputing Working Group has been developing a proposed standard for a batch processing system based on NQS, the Network Queuing System originally developed at NASA Ames.

Scope

The standard will define the system interfaces, utilities, system administration interfaces, and an application level protocol required by a network batch processing system in a POSIX environment. This standard will provide portability for applications, users, and system administrators.

Purpose

The purpose of this standard is to extend POSIX to provide a network batch processing system. These extensions include the following:

    system interfaces

        checkpoint/recovery - the capability of a user session or process to automatically checkpoint itself periodically and to restart at the latest checkpoint following a machine crash or shutdown. The objective of checkpoint/recovery is to avoid the expense of rerunning work requests that may have been executing several hours or days prior to a machine crash.

        resource control - the ability to control the allotment of the resources of the machine (such as cpu time, memory, disk space, tapes etc.) to a process/session.

    utilities for the submission and management of the requests

    system administration interface for the creation and authorization of the network batch processing system

    network application level protocol

Name of Group which will write the Standard: POSIX 1003.10 Supercomputing Working Group

TCOS-SEC Checklist for New PAR Activity Proposals

I. Administration

    Karen Sheaffer    Sandia National Laboratories    Chair
    Stuart McKaig     Convex                          Vice-Chair
    Jim Tanner        Boeing Computer Services        Technical Editor
    John Caywood      Unisys                          Secretary

    (Note: with the exception of Stuart McKaig, all of the above have the same positions in the 1003.10 Working Group.)

II. Working Group

    # of active (have attended 3/4 of meetings) participants: 15
    # of correspondent members identified: 50
    Breakdown of active participants:
        Producer: 5
        User:     10
        Other:
    # of companies/interests represented: 14
    What international participation has been identified?

III. Deliverable

    Document: Standard
    Expected Size: 200 pages
    Projected time frame:
        First Draft: July 1989
        Start Balloting: Fall 1990
    What candidates exist for a "base document"?
        The 1003.10 Supercomputing Working Group Draft Batch Document
        Network Queuing System (NQS) public domain software and documentation

IV. Scope

    See above

V. Overlap/Dependencies on other work

    Which TCOS standards assumed: 1003.1 and 1003.2
    What functions are required by other groups: Protocol Independent Network Service for Portable Applications
    What other groups are doing work here:

Volume-Number: Volume 18, Number 20