muller@sdcc3.UUCP (Keith Muller) (02/10/85)
This is the first shar file of 8 for the load control system as described
in net.unix-wizards and net.sources.
NOTE: This shar file MUST be unpacked berfore any of the others. It creates
several subdirectories which the seven other shar files require. Make
a subdirectory for the load control system source and unpack all eight
shar files in it.
Keith Muller
ucbvax!sdcsvax!muller
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:40:15 PST 1985
# Contents: client/ control/ h/ scripts/ server/ man/ README NOTICE Makefile
# man/Makefile man/ldc.8 man/ldd.8 man/ldq.1 man/ldrm.1
echo x - README
sed 's/^@//' > "README" <<'@//E*O*F README//'
TO INSTALL: (you MUST be root) (January 24, 1985 version)
1) Select a group id for load control system to use. No user should be in this
group. Add this group to /etc/groups and call it lddgrp.
** By default the group id 25 is used. **
2) Look at the file h/common.h. Make sure that LDDGID is defined to be the
same group id as you selected in step 1.
3) cd to the scripts directory. Inspect the paths used in the file makedirs.
The script makedirs creates the required directories with the proper modes
groups and owners. The .code directories are where the real executable
files are hidden, protected by group access (the directory is protected
from all "other" access). Each directory which contains programs that you
want load controlled must have a .code subdirectory.
NOTE: You really do not have to change makedirs at all except to ADD
any additional directories you want controlled. It is perfectly safe to
just run this system on any 4.2 system without ANY path changes (this
includes sun, vax and pyramid versions).
4) If you alter or add any pathnames in makedirs, you might have to adjust
the makefiles. For each subdirectory (client, server, control) adjust
or add the paths in the Makefiles.
5) If you alter any pathname in makedirs you will have to check all the h
files in the directory h. Change any paths as required.
6) run makedirs (if you have an older release of ldd: You should shut down
the ldd server and remove the old status and errlog file. Then run
makedirs.) Makedirs can be run any number of times without harm. It will
reset the owners and groups of all directories to the correct state.
7) In the top level directory (The same directory as this README file is in),
run make. then make install. All the binaries are now in place.
8) Start the ldd server:
/etc/ldd [-T cycle] [-L load]
The server will detach itself and wait for requests. You should get no
messages from the server. The two flags are optional. The -L flag
specifies the number of seconds between each load average check. The
-L flag specifies the load average queueing starts. If neither are
specified the defaults are used. (see the manual page for ldd). You
can change the defaults by editing h/server.h. ALRMTIME is the cycle
time, and MAXLOAD is the load average.
The following are good values to start with:
machine cycle load
----------------------------------------------------------
pyramid 90x 25 10.0
pyramid 90mx 15 15.0
vax 780 50 9.0
vax 750 60 7.5
vax 730 60 6.0
sun 2 60 6.5
9) add the following lines to /etc/rc.local (change path and add any ldd
arguements as selected from the above table). See the man page on ldd
for more info.
if [ -f /etc/ldd ]; then
/bin/rm -f /usr/spool/ldd/sr/errors
/etc/ldd & echo -n ' ldd' >/dev/console
fi
10) for each directory to be controlled select those programs you want under
the load control system. The programs you select should be jobs that
usually do not require user interaction, though nasty systems like macsyma
might be load controlled anyway. Never load control things that have time
response requirements. The jobs you select will determine the overall
usefullness of the load control system. For the load control system to
be completely effective, all the programs that cause any significant load
on the system should be placed under load control. For example the cc
command is a very typical of a program that should be load controlled.
When run, cc uses large amount of resources which increases as the size
of the program being compiled increases. When there are many cc's running
simultaneously the machine gets quite overloaded and your system thrashes.
A poor choice would be a command like cat. Sure cat can do a lot of i/o,
but even ten cat's reading very large files do not impact the system
very much. Troff is a very good command to load control. It is not very
interactive, and a lot of them running would bring even slow a cray.
Watching your system when it is overoaded with ps au should tell you which
programs on your system need to be load controlled.
The following is a list of programs I have under load control:
/bin/cc /bin/make /bin/passwd /usr/bin/pc /usr/bin/pix /usr/bin/liszt
/usr/bin/lisp /usr/bin/vgrind /usr/ucb/f77 /usr/ucb/lint /usr/ucb/nroff
/usr/ucb/spell /usr/ucb/troff /usr/ucb/yacc
The following is the list of places to look for other candidates for load
control:
a) /bin
b) /usr/bin
c) /usr/ucb
d) /usr/new
e) /usr/local
f) /usr/games
i) some programs use argv[0] to pass data (so far only the ucb pi
does this when called by pix). These programs must be treated
differently (since they mangle argv[0], it cannot be used to
determine which binary to execute). A special client called
.NAMEclient where NAME is the actual name of the program must be
created. These special programs must be specified in the
client/Makefile. See the sample for $(SPEC1) which is for a program
called test in /tmp. Run the script onetime/saddldd for these programs.
ii) run the script scripts/addldd with each program to be load controlled
that requires a STATUS MESSAGE ("Queued waiting to run.") as an
arguement (i.e. addldd /bin cc make)
iii)run the script scripts/qaddldd with each program to be load controlled
that DOES NOT require a STATUS MESSAGE as an arguement
(i.e. qaddldd /usr/bin nroff)
addldd/qaddldd/saddldd moves the real binary into the .code file and
replaces it with a "symbolic link" to either .client (for addldd and
qaddldd) or a .NAMEclient (for saddldd) So the command:
addldd /bin cc
moves cc to /bin/.code/cc and creates the symbolic link /bin/cc
to /bin/.client.
11) any changes to any file in the load control system from now on
will be correctly handled by a make install from the top level directory.
12) the script script/rmldd can be used to remove programs from the ldd system.
13) Compilers like cc and pc should have all the intermediate passes protected.
Each pass must be in group lddgrp and have the others access turned off
For example:
chmod 0750 /lib/c2
chgrp lddgrp /lib/c2
14) When the system is running you might have to adjust the operating
parameters of ldd for the job mix and the capacity of your machine.
Use ldc to adjust these parameters while the load control system is
running and watch what happens. The .h files as supplied use values that
will safely work on any machine, but might not be best values for your
specific needs. In the vast majority of cases, only the load point
and cycle time need to be changed and these can be set with arguements to
ldd when it is first invoked. Be careful as radical changes to
the defaults might make defeat the purpose of ldd. If things ever get
really screwed up, you can just kill -9 the server (or from ldc: abort
server) and things will run just like the load control doesn't exsist.
(Note the pid of the currently running ldd is always stored in the lock
file "spool/ldd/sr/lock"). (See the man page on ldd for more).
15) If load control does not stop the system load to no more than the load
limit + 2.5 then there are programs that are loading down the machine
which are not under load control. Find out what they are and load control
them.
16) To increase the response of the system you can lower the load threshold.
Of course if the threshold gets too low the system can end up with long
wait times for running. Long wait times are usually around 3000 seconds
for super loaded vaxes. On the very fast pyramids, 500 seconds (48 users
and as many large cc as the students can get running) seems the longest
delay I have seen. You can also play with the times between checks. This
has some effect on vaxes but 50 - 60 seconds seems optimal. On pyramids
it is quite different. Since the throughput is so very much greater
than vaxes (four times greater at the very least), the load needs to be
checked at least every 25 seconds. If this check time is too long you
risk having the machine go idle for a number of seconds. Since the whole
point is to squeeze out every last cpu cycle out of the machine, idle
time must be avoided. Watching the machine with vmstat or the mon program
is useful for this. Try to keep the user percentage of the cpu as high
as possible. Try to have enough jobs runnable so the machine doesn't
go idle do to a lack of jobs (yes this can happen with lots of disk io).
17) If you want/need more info on the inner workings of the ldd system, you
can read the comments in the .h files and the source files. If you have
problems drop me a line. I will be happy to answer any questions.
Keith Muller
University of California, San Diego
Mail Code C-010
La Jolla, CA 92093
ucbvax!sdcsvax!muller
(619) 452-6090
@//E*O*F README//
chmod u=r,g=r,o=r README
echo x - NOTICE
sed 's/^@//' > "NOTICE" <<'@//E*O*F NOTICE//'
DISCLAIMER
"Although each program has been tested by its author, no warranty,
express or implied, is made by the author as to the accuracy and
functioning of the program and related program material, nor shall
the fact of distribution constitute any such warranty, and no
responsibility is assumed by the author in connection herewith."
This program cannot be sold, distributed or copied for profit, without
prior permission from the author. You are free to use it as long the
author is properly credited with it's design and implementation.
Keith Muller
Janaury 15, 1985
San Diego, CA
@//E*O*F NOTICE//
chmod u=r,g=r,o=r NOTICE
echo x - Makefile
sed 's/^@//' > "Makefile" <<'@//E*O*F Makefile//'
#
# Makefile for ldd server and client
#
#
all:
cd server; make ${MFLAGS}
cd client; make ${MFLAGS}
cd control; make ${MFLAGS}
lint:
cd server; make ${MFLAGS} lint
cd client; make ${MFLAGS} lint
cd control; make ${MFLAGS} lint
install:
cd server; make ${MFLAGS} install
cd client; make ${MFLAGS} install
cd control; make ${MFLAGS} install
cd man; make ${MFLAGS} install
clean:
cd server; make ${MFLAGS} clean
cd client; make ${MFLAGS} clean
cd control; make ${MFLAGS} clean
@//E*O*F Makefile//
chmod u=r,g=r,o=r Makefile
echo mkdir - client
mkdir client
chmod u=rwx,g=rx,o=rx client
echo mkdir - control
mkdir control
chmod u=rwx,g=rx,o=rx control
echo mkdir - h
mkdir h
chmod u=rwx,g=rx,o=rx h
echo mkdir - scripts
mkdir scripts
chmod u=rwx,g=rx,o=rx scripts
echo mkdir - server
mkdir server
chmod u=rwx,g=rx,o=rx server
echo mkdir - man
mkdir man
chmod u=rwx,g=rx,o=rx man
echo x - man/Makefile
sed 's/^@//' > "man/Makefile" <<'@//E*O*F man/Makefile//'
#
# Makefile for ldd manual pages
#
DEST= /usr/man
TARG= $(DEST)/man8/ldd.8 $(DEST)/man8/ldc.8 $(DEST)/man1/ldrm.1 \
$(DEST)/man1/ldq.1
all:
install: $(TARG)
$(DEST)/man8/ldd.8: ldd.8
install -c -o root ldd.8 $(DEST)/man8
$(DEST)/man8/ldc.8: ldc.8
install -c -o root ldc.8 $(DEST)/man8
$(DEST)/man1/ldrm.1: ldrm.1
install -c -o root ldrm.1 $(DEST)/man1
$(DEST)/man1/ldq.1: ldq.1
install -c -o root ldq.1 $(DEST)/man1
clean:
@//E*O*F man/Makefile//
chmod u=r,g=r,o=r man/Makefile
echo x - man/ldc.8
sed 's/^@//' > "man/ldc.8" <<'@//E*O*F man/ldc.8//'
@.TH LDC 8 "24 January 1985"
@.UC 4
@.ad
@.SH NAME
ldc \- load system control program
@.SH SYNOPSIS
@.B /etc/ldc
[ command [ argument ... ] ]
@.SH DESCRIPTION
@.I Ldc
is used by the system administrator to control the
operation of the load control system, by sending commands to
@.I ldd
(the load control server daemon).
@.I Ldc
may be used to:
@.IP \(bu
list all the queued jobs owned by a single user,
@.IP \(bu
list all the jobs in the queue,
@.IP \(bu
list the current settings of changeable load control server parameters,
@.IP \(bu
abort the load control server,
@.IP \(bu
delete a job from the queue (specified by pid or by user name),
@.IP \(bu
purge the queue of all jobs,
@.IP \(bu
rearrange the order of queued jobs,
@.IP \(bu
run a job regardless of the system load (specified by pid or user name),
@.IP \(bu
change the load average at which jobs will be queued,
@.IP \(bu
change the limit on the number of jobs in queue,
@.IP \(bu
change the number of seconds between each check on the load average,
@.IP \(bu
print the contents of the servers error logging file,
@.IP \(bu
change the maximum time limit that a job can be queued.
@.PP
Without any arguments,
@.I ldc
will prompt for commands from the standard input.
If arguments are supplied,
@.IR ldc
interprets the first argument as a command and the remaining
arguments as parameters to the command. The standard input
may be redirected causing
@.I ldc
to read commands from a file.
Commands may be abbreviated, as any unique prefix of a command will be
accepted.
The following is the list of recognized commands.
@.TP
? [ command ... ]
@.TP
help [ command ... ]
@.br
Print a short description of each command specified in the argument list,
or, if no arguments are given, a list of the recognized commands.
@.TP
abort server
@.br
Terminate the load control server.
This does
@.I not
terminate currently queued jobs, which will run when they
next poll the server (usually every 10 minutes).
If the server is restarted these jobs will be inserted into the queue ordered
by the time at which the job was started.
Jobs will
@.I not
be lost by aborting the server.
Both words "abort server" must by typed (or a unique prefix) as a safety
measure.
Only root can execute this command.
@.TP
delete [\f2pids\f1] [-u \f2users\f1]
@.br
This command has two modes. It will delete jobs listed by pid, or with the
@.B \-u
option delete all the jobs owned by the listed users.
Job that are removed from the queue will exit returning status 1 (they
do not run).
Users can only delete jobs they own from the queue, while root can delete any
job.
@.TP
errors
@.br
Print the contents of the load control server error logging file.
@.TP
list [\f2user\f1]
@.br
This will list the contents of the queue, showing each jobs rank, pid,
owner, time in queue, and a abbreviated line of the command to be executed
for the specified user. If no user is specifies, it defaults to be the
user running the command. (Same as the ldq command).
@.TP
loadlimit \f2value\f1
@.br
Changes the load average at which the load control system begins
to queue jobs to \f2value\f1.
Only root can execute this command.
@.TP
longlist
@.br
Same as list except prints ALL the jobs in the queue. This is expensive to
execute. (Same as the ldq -a command).
@.TP
move \f2pid rank\f1
@.br
Moves the process specified by process id
@.I pid
to position
@.I rank
in the queue.
Only root can execute this command.
@.TP
purge all
@.br
Removes ALL the jobs from the queue. Removed jobs terminate returning a
status of 1.
As a safety measure both the words "purge all" (or a prefix of) must be typed.
Only root can execute this command.
@.TP
quit
@.br
Exit from ldc.
@.TP
run [\f2pids\f1] [-u \f2users\f1]
@.br
Forces the jobs with the listed
@.I pids
to be run
@.I regardless
of the system load.
The
@.B \-u
option forces all jobs owned by the listed users to be run regardless
of the system load.
Only root can execute this command.
@.TP
sizeset \f2size\f1
@.br
Sets the limit on the number of jobs that can be in the queue to be
@.I size.
This prevents the unix system process table from running out of slots if
the system is extremely overloaded. All job requests that are made while
the queue is at the limit are rejected and told to try again later.
The default value is 150 jobs.
Only root can execute this command.
@.TP
status
@.br
Prints the current settings of internal load control server variables.
This includes the number of jobs in queue, the load average above which
jobs are queued, the limit on the size of the queue, the time in seconds between
load average checks by the server, the maximum time in seconds a job can be
queued, and the number of recoverable errors detected by the server.
@.TP
timerset \f2time\f1
@.br
Sets the number of seconds that the server waits between system load average
checks to
@.I time.
(Every
@.I time
seconds the server reads the current load average and if it below the load
average limit (see
@.I loadlimit
) the jobs are removed from the front of the queue and told to run).
Only root can execute this command.
@.TP
waitset \f2time\f1
@.br
Sets the maximum number of seconds that a job can be queued regardless
of the system load to
@.I time
seconds.
This will prevent the load control system from backing up with jobs that never
run do to some kind of degenerate condition.
@.SH EXAMPLES
To list the jobs owned by user joe:
@.sp
list joe
@.sp
To move process 45 to position 6 in the queue:
@.sp
move 45 6
@.sp
To delete all the jobs owned by users sam and joe:
@.sp
delete -u sam joe
@.sp
To run jobs with pids 1121, 1177, and 43:
@.sp
run 1121 1177 43
@.SH FILES
@.nf
/usr/spool/ldd/* spool directory where sockets are bound
@.fi
@.SH "SEE ALSO"
ldd(8),
ldrm(1),
ldq(1)
@.SH DIAGNOSTICS
@.nf
@.ta \w'?Ambiguous command 'u
?Ambiguous command abbreviation matches more than one command
?Invalid command no match was found
?Privileged command command can be executed by only by root
@.fi
@//E*O*F man/ldc.8//
chmod u=r,g=r,o=r man/ldc.8
echo x - man/ldd.8
sed 's/^@//' > "man/ldd.8" <<'@//E*O*F man/ldd.8//'
@.TH LDD 8 "24 January 1985"
@.UC 4
@.ad
@.SH NAME
ldd \- load system server (daemon)
@.SH SYNOPSIS
@.B /etc/ldd
[
@.B \-L
@.I load
] [
@.B \-T
@.I alarm
]
@.SH DESCRIPTION
@.TP
@.B \-L
changes the load average threshold to
@.I load
instead of the default (usually 10).
@.TP
@.B \-T
changes the time (in seconds)
between load average checks to
@.I alarm
seconds instead of the default (usually 60 seconds).
@.PP
@.I Ldd
is the load control server (daemon) and is normally invoked
at boot time from the
@.IR rc.local (8)
file.
The
@.I ldd
server attempts to maintain the system load average
below a preset value so interactive programs like
@.IR vi (1)
remain responsive.
@.I Ldd
works by preventing the system from thrashing
(i.e. excessive paging and high rates of context switches) and decreasing the
systems throughput by limiting the number runnable processes in the system
at a given moment.
When the system load average
is above the threshold,
@.I ldd
will block specific cpu intensive processes from running and place
them in a queue.
These blocked jobs are not runnable and therefore do not
contribute to the system load. When the load average drops below the threshold,
@.I ldd
will remove jobs from the queue and allow them to continue execution.
The system administration determines which programs are
considered cpu intensive and places control of their execution under the
@.I ldd
server.
The system load average is the number of runnable processes,
and is measured by the 1 minute
@.IR uptime (1)
statistics.
@.PP
A front end client program replaces each program controlled by the
@.I ldd
server.
Each time a user requests execution of a controlled program, the
client enters the request state,
sends a "request to run" datagram to the server and waits for a response. The
waiting client is blocked, waiting for the response from the
@.I ldd
server.
If the client does not receive an answer to a request after a certain
period of time has elapsed (usually 90 seconds), the request is resent.
If the request is resent a number of times (usually 3)
without response from the server, the requested program is executed.
This prevents the process from being blocked forever if the
@.I ldd
server fails.
@.PP
The
@.I ldd
server can send one of five different messages to the client.
A "queued message" indicates that the client has
been entered into the queue and should wait.
A "poll message" indicates that the server did not receive a message,
so the client should resend the message.
A "terminate message" indicates that the request cannot be honored
and the client should exit abnormally.
A "run message" indicates the requested program should be run.
A "full message" indicates that the ldd queue is full and this request cannot
be accepted. This limit is to prevent the Unix kernel process table from
running out of slots, since queued processes
still use system process slots.
@.PP
When the server receives a "request to run",
it determines whether the job should run immediately, be rejected,
or be queued.
If the queue is full, the job is rejected and the client exits.
If the queue is not empty, the request is added to the queue,
and the client is sent a "queued message"
The client then enters the queued state
and waits for another command from the server.
If no further commands are received from the server after a preset time
has elapsed (usually 10 minutes),
the client re-enters the request state and resends the request
to the server to ensure that the server has not terminated or
failed since the time the client was queued.
@.PP
If the queue is empty, the server checks the current load average, and
if it is below the threshold, the client is sent a "run message".
Otherwise the server queues the request, sends the client a "queued message",
and starts the interval timer.
The interval timer is bound to a handler that checks the system load every
few seconds (usually 60 seconds).
If the handler finds the current load average is below the threshold,
jobs are removed from the head of the queue and sent a "run message".
The number of jobs sent "run messages" depends on how much the current
load average has dropped below the limit.
If the load average is above the threshold, the handler checks
how long the oldest process has been waiting to run.
If that time is greater than a preset limit (usually 4 hours), the job is
removed from the queue and allowed to run regardless of the load.
This prevents jobs from being blocked forever due to load averages that
remain above the threshold for long periods of time.
If the queue becomes empty, the handler will shut off the interval timer.
@.PP
The
@.I ldd
server logs all recoverable and unrecoverable errors in a logfile. Advisory
locks are used to prevent more than one executing server at a time.
When the
@.I ldd
server first begins execution, it scans the spool directory for clients that
might have been queued from a previous
@.I ldd
server and sends them a "poll request".
Waiting clients will resend their "request to run" message to the new
server, and re-enter the request state.
The
@.I ldd
server will rebuild the queue of waiting tasks
ordered by the time each client began execution.
This allows the
@.I ldd
server to be terminated and be re-started without
loss or blockage of any waiting clients.
@.PP
The environment variable LOAD can be set to "quiet", which will
surpress the output to stderr of the status strings "queued"
and "running" for commands which have been set up to display status.
@.PP
Commands can be sent to the server with the
@.IR ldc (8)
control program. These commands can manipulate the queue and change the
values of the various preset limits used by the server.
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/sr/msgsock 'u
/usr/spool/ldd ldd spool directory
/usr/spool/ldd/sr/msgsock name of server datagram socket
/usr/spool/ldd/sr/cnsock name do server socket or control messages
/usr/spool/ldd/sr/list list of queued jobs (not always up to date)
/usr/spool/ldd/sr/lock lock file (contains pid of server)
/usr/spool/ldd/sr/errors log file of server errors
@.fi
@.SH "SEE ALSO"
ldc(8),
ldq(1),
ldrm(1).
@//E*O*F man/ldd.8//
chmod u=r,g=r,o=r man/ldd.8
echo x - man/ldq.1
sed 's/^@//' > "man/ldq.1" <<'@//E*O*F man/ldq.1//'
@.TH LDQ 1 "24 January 1985"
@.UC 4
@.SH NAME
ldq \- load system queue listing program
@.SH SYNOPSIS
@.B ldq
[
@.I user
] [
@.B \-a
]
@.SH DESCRIPTION
@.I Ldq
is used to print the contents of the queue maintained by the
@.IR ldd (8)
server.
For each job selected by
@.I ldq
to be printed, the rank (position) in the queue, the process id, the owner of
the job, the number of seconds the job has been waiting to run, and the
command line of the job (truncated in length to the first 16 characters)
is printed.
@.PP
With no arguments,
@.I ldq
will print out the status of the jobs in the queue owned by the user running
@.I ldq.
Another users jobs can be printed if that user is specified as an argument
to
@.I ldq.
The
@.B \-a
option will print all the jobs in the queue.
Of course the
@.B \-a
option is much more expensive to run.
@.PP
Users can delete any job they own by using either the
@.IR ldrm (1)
or
@.IR ldc (8)
commands.
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/cl/* 'u
/usr/spool/ldd/cl/* the spool area where sockets are bound
@.fi
@.SH "SEE ALSO"
ldrm(1),
ldc(8),
ldd(8)
@.SH DIAGNOSTICS
This command will fail if the
@.I ldd
server is not executing.
@//E*O*F man/ldq.1//
chmod u=r,g=r,o=r man/ldq.1
echo x - man/ldrm.1
sed 's/^@//' > "man/ldrm.1" <<'@//E*O*F man/ldrm.1//'
@.TH LDRM 1 "24 January 1985"
@.UC 4
@.SH NAME
ldrm \- remove jobs from the load system queue
@.SH SYNOPSIS
@.B ldrm
[
@.I pids
] [
@.B \-u
@.I users
]
@.SH DESCRIPTION
@.I Ldrm
will remove a job, or jobs, from the load control queue.
Since the server is protected, this and
@.IR ldc (8)
are the only ways users can remove jobs from the load control spool (other
than killing the waiting process directly).
When a job is removed, it will terminate returning status 1.
This method is preferred over sending a kill -KILL to the process as the
job will be removed from the queue, and will no longer appear in
lists produced by
@.IR ldq (1)
or
@.IR ldc (8).
@.PP
@.I Ldrm
can remove jobs specified either by pid or by user name.
With the
@.B \-u
flag,
@.I ldrm
expects a list of users who will have all their jobs removed from the
load control queue.
When given a list of pid's,
@.I ldrm
will remove those jobs from the queue.
A user can only remove jobs they own, while root can remove any job.
@.SH EXAMPLES
To remove the two jobs with pids 8144 and 47:
@.sp
ldrm 8144 47
@.sp
To remove all the jobs owned by the users joe and sam:
@.sp
ldrm -u joe sam
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/cl/* 'u
/usr/spool/ldd/cl/* directory where sockets are bound
@.fi
@.SH "SEE ALSO"
ldq(1),
ldc(8),
ldd(8)
@.SH DIAGNOSTICS
``Permission denied" if the user tries to remove files other than his
own.
@//E*O*F man/ldrm.1//
chmod u=r,g=r,o=r man/ldrm.1
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
182 1518 9101 README
14 96 613 NOTICE
25 76 502 Makefile
27 52 439 Makefile
215 1075 5877 ldc.8
168 1045 6106 ldd.8
55 221 1145 ldq.1
59 261 1362 ldrm.1
745 4344 25145 total
!!!
wc README NOTICE Makefile man/Makefile man/ldc.8 man/ldd.8 man/ldq.1 man/ldrm.1 | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0muller@sdcc3.UUCP (Keith Muller) (02/12/85)
This is part 6 of the the load control system. Part 1 must be unpacked before
any other part.
Keith Muller
ucbvax!sdcsvax!muller
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:56:47 PST 1985
# Contents: server/Makefile server/data.c server/globals.c server/main.c
echo x - server/Makefile
sed 's/^@//' > "server/Makefile" <<'@//E*O*F server/Makefile//'
#
# Makefile for batch server
#
CFLAGS= -O
BGID= lddgrp
DEST= /etc
HDR= ../h/common.h ../h/server.h
SRC= main.c data.c globals.c setup.c commands.c
OBJ= main.o data.o globals.o setup.o commands.o
all: ldd
ldd: $(OBJ)
cc -o ldd $(OBJ)
$(OBJ): $(HDR)
install: $(DEST)/ldd
$(DEST)/ldd: ldd
install -c -m 700 -o root -g $(BGID) ldd $(DEST)
clean:
rm -f $(OBJ) core ldd
lint:
lint -abchx $(SRC)
@//E*O*F server/Makefile//
chmod u=r,g=r,o=r server/Makefile
echo x - server/data.c
sed 's/^@//' > "server/data.c" <<'@//E*O*F server/data.c//'
/*-------------------------------------------------------------------------
* data.c - server
*
* routines that deal with the data structures maintained by the server.
* the server uses a double linked list with qhead pointing at the head
* and qtail pointing at the tail. if the queue is not empty then
* qhead->back is always QNIL and qtail->fow is always QNIL. Insertions
* also require that the time field increase (older to younger) from qhead
* to qtail.
*
* NOTE: that when nodes are added to the free list only the fow
* link is altered so procedures that search through the list with the
* intention of calling rmqueue must search from qtail to qhead because
* rmqueue will destroy the nodes fow link.
*-------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/server.h"
extern struct qnode *qhead;
extern struct qnode *qtail;
extern struct qnode *freequeue;
extern int qcount;
extern int newlist;
extern int newstatus;
/*------------------------------------------------------------------------
* rmqueue
*
* remove the node pointed at by work from the double linked list.
*------------------------------------------------------------------------
*/
rmqueue(work)
struct qnode *work;
{
/*
* set flags to indicate the list and status files are out of date
*/
newlist = 1;
newstatus = 1;
qcount--;
/*
* splice the job out of the queue
*/
if (work->back == QNIL)
qhead = work->fow;
if (work->fow == QNIL)
qtail = work->back;
if (work->fow != QNIL)
(work->fow)->back = work->back;
if (work->back != QNIL)
(work->back)->fow = work->fow;
work->fow = freequeue;
freequeue = work;
}
/*-------------------------------------------------------------------------
* addqueue
*
* add a node to the queue if it is not already in it.
* note that when clients poll the server to see if it is still alive they
* send another "queue" command. This is why addqueue must
* check if the job is still queued.
*-------------------------------------------------------------------------
*/
addqueue(work)
struct request *work;
{
register struct qnode *spot;
register struct qnode *spot2;
register struct qnode *ptr;
extern int full;
extern char *malloc();
extern char *strcpy();
/*
* find the place in the queue for this request. The
* time field is used for this oldest requests belong closer
* to the head of the queue.
*/
for (spot = qtail; spot != QNIL; spot = spot->back){
/*
* it might be already in the queue as a client
* is just polling the server to see if the server is
* still alive
*/
if (spot->pid == work->pid)
return(1);
/*
* check to see if this job is older
*/
if (work->time > spot->time)
break;
}
/*
* At this point, job is not in the queue at the correct point.
* either is a new job or a client checking to see if server is
* alive. If this is a check, look for job higher up in the queue.
*/
if (work->type != POLLCMD){
/*
* at this point the node is a new one, reject if the
* queue is full.
*/
if (qcount >= full)
return(-2);
}else if (spot != QNIL){
/*
* this job is just checking up to see if it is still
* queued.
*/
for (spot2 = spot->back; spot2 != QNIL; spot2 = spot2->back){
/*
* job must have been moved
*/
if (spot2->pid == work->pid)
return(1);
}
/*
* at this point the job is missing. it should have
* been in the queue. so put it back.
*/
}
/*
* allocate space for qnode, check freelist first
*/
if (freequeue == QNIL)
ptr = (struct qnode *)malloc(sizeof(struct qnode));
else{
ptr = freequeue;
freequeue = ptr->fow;
}
if (ptr == QNIL){
errlog("no space for a qnode");
return(-1);
}
/*
* copy in the data from the datagram
*/
ptr->pid = work->pid;
ptr->uid = work->uid;
ptr->time = work->time;
(void)strcpy(ptr->com, work->com);
/*
* special case if queue was empty
*/
if (qcount == 0){
if ((qhead != QNIL) || (qtail != QNIL)){
errlog("Addqueue: qcount should not be 0");
cleanup();
}
qhead = qtail = ptr;
ptr->fow = ptr->back = QNIL;
newlist = 1;
newstatus = 1;
qcount = 1;
return(0);
}
/*
* do two integrity checks, yes we are paranoid
*/
if (qhead == QNIL){
errlog("Addqueue: qhead should not be QNIL");
cleanup();
}
if (qtail == QNIL){
errlog("Addqueue: qtail should not be QNIL");
cleanup();
}
/*
* if spot == qhead, belongs at very beginning of queue
*/
if (spot == QNIL){
qhead->back = ptr;
ptr->fow = qhead;
ptr->back = QNIL;
qhead = ptr;
}else{
/*
* insert into the queue
*/
ptr->fow = spot->fow;
ptr->back = spot;
if (spot->fow != QNIL)
(spot->fow)->back = ptr;
else
qtail = ptr;
spot->fow = ptr;
}
/*
* change newlist to show queue has changed
*/
newlist = 1;
newstatus = 1;
qcount++;
return(1);
}
/*-------------------------------------------------------------------------
* movequeue
*
* move the job pid to posistion pos in the queue. Note to maintain
* insertion date requirements, the time field in the moved job is
* altered.
*-------------------------------------------------------------------------
*/
movequeue(pos,pid)
u_long pos;
u_long pid;
{
register struct qnode *ptr;
register struct qnode *work;
extern int qcount;
work = QNIL;
for (ptr = qhead; ptr != QNIL; ptr = ptr->fow){
/*
* look for the requested node, set work to point
*/
if (ptr->pid == pid){
work = ptr;
break;
}
}
/*
* if not found return -1 as no such pid, or return 0
* if only one job queued
*/
if (work == QNIL)
return(-1);
if (qcount == 1)
return(0);
/*
* set ptr to point a position to move work to
* note: first position in queue is 1 (not 0).
*/
for (ptr = qhead; ((ptr != QNIL) && (pos > 1)); ptr = ptr->fow){
if (ptr != work)
/*
* must be moving the job to a lower position
* in the queue. So cannot count self.
*/
pos--;
}
/*
* if it is already at the requested position, or the pos is
* after the last node and the pid IS the last node, return
*/
if ((ptr == work) || ((ptr == QNIL) && (qtail == work)))
return(0);
newlist = 1;
/*
* splice the node out of the queue
*/
if (work->fow != QNIL)
(work->fow)->back = work->back;
if (work->back != QNIL)
(work->back)->fow = work->fow;
if (qtail == work)
qtail = work->back;
if (qhead == work)
qhead = work->fow;
/*
* splice the node into the new position.
*/
if (ptr == QNIL){
/*
* put at the end of the queue
*/
work->back = qtail;
work->fow = QNIL;
work->time = qtail->time + 1;
qtail->fow = work;
qtail = work;
}else{
/*
* belongs in the queue as ptr points at a node
*/
work->fow = ptr;
work->back = ptr->back;
/*
* see if the pid is being put at the head of the list
*/
if (ptr->back != QNIL){
(ptr->back)->fow = work;
work->time = ptr->time-((ptr->time-(ptr->back)->time)/2);
}else{
qhead = work;
work->time = ptr->time - 1;
}
ptr->back = work;
}
return(0);
}
@//E*O*F server/data.c//
chmod u=r,g=r,o=r server/data.c
echo x - server/globals.c
sed 's/^@//' > "server/globals.c" <<'@//E*O*F server/globals.c//'
/*-------------------------------------------------------------------------
* globals.c - server
*
* allocation of the variables that are global to the server.
*-------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/server.h"
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/time.h>
#include <stdio.h>
int kmem = -1; /* file desc for kmem to get load */
int cntrlsock = -1; /* socket desc for control messages*/
int msgsock = -1; /* socket for queue requests */
int qcount = 0; /* count job in the queue */
int newlist = 1; /* 1 when queue is new than last list*/
int newstatus = 1; /* 1 when status variable are changed*/
int errorcount = 0; /* count of number of recovered error*/
int timerstop = 1; /* when when timer stopped, 0 runs */
u_long mqtime = MAXQTIME; /* max time a job can be in queue */
int descsize = 0; /* desc table size for select */
long loadaddr = 0; /* address of load aver in kmem */
int alrmmask = 0; /* mask for blocking SIGALRM */
int full = MAXINQUEUE; /* max number of jobs waiting to run */
FILE *errfile; /* file where errors are logged */
struct qnode *qhead = QNIL; /* points at queue head */
struct qnode *qtail = QNIL; /* points at queue tail */
struct qnode *freequeue = QNIL; /* pointer to local freelist of qnode*/
struct itimerval startalrm = {{ALRMTIME,0},{ALRMTIME,0}}; /* alrm time */
struct itimerval stopalrm = {{0,0},{0,0}}; /* value used to stop timer */
struct timeval polltime = {WAITTIME,0}; /* wait time during poll */
#ifdef sun
long loadlevel = (long)(MAXLOAD*256); /* load at which queueing starts */
#else
double loadlevel = MAXLOAD; /* load at which queueing starts */
#endif
@//E*O*F server/globals.c//
chmod u=r,g=r,o=r server/globals.c
echo x - server/main.c
sed 's/^@//' > "server/main.c" <<'@//E*O*F server/main.c//'
/*-------------------------------------------------------------------------
* main.c - server
*
* The server takes requests from client processes and the control
* program, and performs various operations. The servers major task is
* to attempt to maintain the systems load average close to a set limit
* loadlevel. Client processes are kept in a queue and are waiting for a
* command from the server (to run or abort). The server reads /dev/kmem
* every ALRMTIME seconds checking to see if the load level has dropped
* below the required loadlevel. If the queue is empty the timer is turned
* off. While the timer is off, the server will only read /dev/kmem at the
* receipt of a request to run from a client program.
*
* The server was designed to be as fault tolerant as possible and maintains
* an errorfile of detectable errors. The server can safely be aborted and
* restarted without deadlocking the clients. The server when restarted
* will rebuild the queue of waiting processes to the state that exsisted
* before the prvious server exited. The entire system was designed to allow
* execution of user programs (even those under load control) even if the
* server is not functioning properly! (user jobs will ALWAYS run, the system
* will never hang).
*
* The effectiveness of the system depends on what fraction of the programs
* that are causing the system overload are maintained under this system.
* Processes can only remain in queue a maximium of "mqtime" seconds
* REGARDLESS of the loadlevel setting. This was done in case the programs
* that are keeping the systems loadlevel above the threshold are not
* controlled by the server! So eventually all jobs will run.
*
* The control program allows users to remove their jobs from the queue and
* allows root to adjust the operating parameters of the server while the
* server is running.
*
* All the programs and routines are commented and warnings about certain
* sections of code are given when the code might be vague.
*
* This system has ONLY BEEN RUN ON 4.2 UNIX (sun, vax and pyramid) and uses
* datagrams in the AF_UNIX domain. (which seems to be extremely reliable).
*
* Author: Keith Muller
* University of California, San Diego
* Academic Computer Center C - 010
* La Jolla, Ca 92093
* (ucbvax!sdcsvax!sdcc3!muller)
* (619) 452-6090
*-------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/server.h"
#include <sys/time.h>
#include <sys/file.h>
#include <stdio.h>
#include <errno.h>
/*--------------------------------------------------------------------------
* main
*
*--------------------------------------------------------------------------
*/
main(argc, argv)
int argc;
char **argv;
{
register int msgmask;
register int cntrlmask;
int numfds;
int readfds;
int readmask;
extern int msgsock;
extern int cntrlsock;
extern int descsize;
extern int errno;
/*
* check the command line args
*/
doargs(argc, argv);
/*
* setup the server
*/
setup();
/*
* create all the sockets
*/
crsock();
/*
* scan the spool for waiting clients and send them a POLLCMD
*/
scanspool();
/*
* create the bit mask used by select to determine which descriptors
* are checked for available input ( datagrams).
*/
msgmask = 1 << msgsock;
cntrlmask = 1 << cntrlsock;
readmask = msgmask | cntrlmask;
/*
* do this forever
*/
for(;;){
readfds = readmask;
/*
* wait for a datagram to arrive
*/
numfds = select(descsize,&readfds,(int *)0,(int *)0,(struct timeval *)0);
if ((numfds < 0) && (errno != EINTR)){
errlog("select error");
cleanup();
}
/*
* if the interval timer interrupted us, go back to the select
*/
if (numfds <= 0)
continue;
/*
* WARNING! note that BOTH SOCKETS are always checked
* when the select indicates at least one datagram is waiting.
* This was done to prevent a situation where one socket
* "locks" out the other if it is subject to high traffic!
*/
/*
* first check to see if there is a control message
*/
if (readfds & cntrlmask)
cntrldis();
/*
* now see if there is a queue message
*/
if (readfds & msgmask)
msgdis();
}
}
/*--------------------------------------------------------------------------
* onalrm
*
* handler for the SIGALRM sent by the interval timer. This routine checks
* the queue to see if there is any jobs that can be run. The two conditions
* for running a job is that the load on the machine is below loadlimit or
* the oldest job in the queue has exceed the maximium queue time and should
* be run regardless of the load.
*--------------------------------------------------------------------------
*/
onalrm()
{
register int count;
struct timezone zone;
struct timeval now;
struct itimerval oldalrm;
extern struct itimerval stopalrm;
extern struct qnode *qhead;
extern u_long mqtime;
extern int qcount;
extern int timerstop;
extern int newstatus;
/*
* if the load average is below the limit run as many jobs as
* possable to bring the load up to the loadlimit.
* this could cause an overshoot of the loadlimit, but in most
* cases this overshoot will be small. This prevents excessive
* waiting of jobs due to momentary load peaks.
*/
if ((count = getrun()) != 0){
while ((count > 0) && (qcount > 0)){
/*
* only decrement count if there was really
* a waiting client (the client could be dead)
*/
if (outmsg(qhead->pid, RUNCMD) == 0)
count--;
rmqueue(qhead);
}
}else if (qcount > 0){
/*
* load is too high to run a job, check if oldest can be run
*/
if (gettimeofday(&now, &zone) < 0){
errlog("onalrm cannot get time");
return;
}
while ((qcount>0)&&(((u_long)now.tv_sec - qhead->time)>mqtime)){
/*
* determined oldest job can run. if job is
* dead try next one
*/
if (outmsg(qhead->pid, RUNCMD) == 0){
rmqueue(qhead);
break;
}else
rmqueue(qhead);
}
}
/*
* if the queue is not empty or the interval timer is stopped
* then return
*/
if ((qcount != 0) || (timerstop == 1))
return;
/*
* otherwise stop the timer
*/
if (setitimer(ITIMER_REAL,&stopalrm, &oldalrm) < 0)
errlog("stop timer error");
else{
timerstop = 1;
newstatus = 1;
}
}
/*-------------------------------------------------------------------------
* getrun
*
* determines how many jobs can be run after obtaining current 1 minute
* load average. since the load obtained from kmeme is an average, this
* should provide some hysteresis so the server doesn't thrash around
*-------------------------------------------------------------------------
*/
getrun()
{
extern int qcount;
extern int kmem;
extern long loadaddr;
#ifdef sun
long load;
long run;
extern long loadlevel;
#else
double load;
double run;
extern double loadlevel;
#endif sun
extern long lseek();
/*
* seek out into kmem (yuck!!!)
*/
if (lseek(kmem, loadaddr, L_SET) == -1){
errlog("lseek error");
cleanup();
}
/*
* read the load
*/
if (read(kmem, (char *)&load, sizeof(load)) < 0){
errlog("kmem read error");
cleanup();
}
/*
* calculate the number of jobs that can run
* (will always overshoot by the fraction)
*/
if ((run = loadlevel - load) > 0){
#ifdef sun
/*
* sun encodes the load average in a long. It is the
* load average * 256
*/
return(1 + (int)(run >> 8));
#else
return(1 + (int)run);
#endif
}else
return(0);
}
/*------------------------------------------------------------------------
* errlog
*
* log the erros into a log. should be small number (hopefully zero!!)
*------------------------------------------------------------------------
*/
errlog (mess)
char *mess;
{
struct timeval now;
struct timezone zone;
extern char *ctime();
extern int errorcount;
extern int errno;
extern int sys_nerr;
extern char *sys_errlist[];
extern FILE *errfile;
/*
* increase the errorcount
*/
errorcount = errorcount + 1;
/*
* if called with an arg, print it first
*/
if (mess != (char *)0)
fprintf(errfile,"%s: ", mess);
/*
* if a valid error print the human message
*/
if ((errno > 0) && (errno < sys_nerr))
fprintf(errfile," %s ", sys_errlist[errno]);
/*
* stamp the time of occurance
*/
if (gettimeofday(&now, &zone) < 0)
fprintf(errfile,"errlog cannot get time of day\n");
else
fprintf(errfile,"%s", ctime(&(now.tv_sec)));
(void)fflush(errfile);
}
/*-------------------------------------------------------------------------
* cleanup
*
* the whole system fell apart. close down the sockets log the server
* termination and exit.
*-------------------------------------------------------------------------
*/
cleanup()
{
extern int msgsock;
extern int cntrlsock;
extern int errno;
extern FILE *errfile;
(void)close(msgsock);
(void)close(cntrlsock);
(void)unlink(MSGPATH);
(void)unlink(CNTRLPATH);
errno = 0;
errlog("Server aborting at");
(void)fclose(errfile);
exit(1);
}
@//E*O*F server/main.c//
chmod u=r,g=r,o=r server/main.c
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
33 62 411 Makefile
311 1144 7097 data.c
44 288 1782 globals.c
355 1341 9080 main.c
743 2835 18370 total
!!!
wc server/Makefile server/data.c server/globals.c server/main.c | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0muller@sdcc3.UUCP (Keith Muller) (02/12/85)
This is part 7 of the load control system. Part 1 must be unpacked before any
other part.
Keith Muller
ucbvax!sdcsvax!muller
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:58:16 PST 1985
# Contents: server/commands.c
echo x - server/commands.c
sed 's/^@//' > "server/commands.c" <<'@//E*O*F server/commands.c//'
/*------------------------------------------------------------------------
* commands.c - server
*
* Commands that can be executed by the server in response to client
* datagrams
*------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/server.h"
#include <sys/file.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/time.h>
#include <stdio.h>
#include <signal.h>
#include <errno.h>
/*-----------------------------------------------------------------------
* cntrldis
*
* cntrldis reads the datagram on control socket port. Then call the
* appropriate routine as encoded in the datagrams type field.
* NOTE:
* The control program that sent the datagram ALWAYS waits for an
* indication that the datagram was processed. Each routine is
* required to send an indication to the control program that the
* request was processed.
*-----------------------------------------------------------------------
*/
cntrldis()
{
struct request work; /* datagram space */
int oldmask; /* old value of signal mask */
int fromlen = 0;
extern int cntrlsock;
extern int alrmmask;
extern int newstatus;
#ifdef sun
extern long loadlevel;
#else
extern double loadlevel;
#endif sun
extern int full;
extern int errno;
extern u_long mqtime;
extern struct qnode *qhead;
/*
* BLOCK OFF SIGALRM as the called routines modify the
* internal data structures and cannot be interrupted
* by the interval timer. That would corrupt the linked
* lists.
*/
oldmask = sigblock(alrmmask);
if (recvfrom(cntrlsock,&work,sizeof(struct request),0,(struct sockaddr *)0,&fromlen) <= 0){
if (errno != EINTR)
errlog("error in cntrldis recv");
(void)sigsetmask(oldmask);
return;
}
/*
* dispatch on type of request
*/
switch(work.type){
case RJOBCMD:
/*
* run a job by pid in the queue
* (privledged command)
*/
runjob(&work);
break;
case RUSRCMD:
/*
* remove all a users jobs in the queue
* (privledged command)
*/
runusr(&work);
break;
case PJOBCMD:
/*
* a request to remove a job from the queue
* (also can be called from msgdis())
*/
prjob(&work, cntrlsock);
break;
case PUSRCMD:
/*
* remove all a users jobs in the queue
*/
prusr(&work);
break;
case PALLCMD:
/*
* purge the ENTIRE queue
* (privledged command)
*/
prall(&work);
break;
case ABORTCMD:
/*
* make sure socket is owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work.pid, 0) == 0){
(void)outcntrl(work.pid, STOPCMD);
return;
}
/*
* tell the server to terminate
* (privledged command)
*/
(void)outcntrl(work.pid, RUNCMD);
cleanup();
break;
case MOVECMD:
/*
* make sure socket is owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work.pid, 0) == 0){
(void)outcntrl(work.pid, STOPCMD);
return;
}
/*
* move a process in the queue
* (privledged command)
*/
if (movequeue(work.time, work.uid) == 0)
(void)outcntrl(work.pid, RUNCMD);
else
(void)outcntrl(work.pid, STOPCMD);
break;
case LOADLIMCMD:
/*
* make sure socket is owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work.pid, 0) == 0){
(void)outcntrl(work.pid, STOPCMD);
return;
}
/*
* change the load level queueing starts
* (privledged command)
*/
#ifdef sun
loadlevel = (long)work.time;
#else
loadlevel = ((double)work.time)/256.0;
#endif sun
newstatus = 1;
(void)outcntrl(work.pid, RUNCMD);
break;
case STATUSCMD:
/*
* update the status file if necessary
*/
status(&work);
break;
case LISTCMD:
/*
* update the queue list file if necessary
*/
list(&work);
break;
case MQTIMECMD:
/*
* make sure socket is owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work.pid, 0) == 0){
(void)outcntrl(work.pid, STOPCMD);
return;
}
/*
* change the maximium time a job can wait
* (privledged command)
*/
mqtime = work.time;
newstatus = 1;
(void)outcntrl(work.pid, RUNCMD);
break;
case QUEUESIZE:
/*
* make sure socket is owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work.pid, 0) == 0){
(void)outcntrl(work.pid, STOPCMD);
return;
}
/*
* change the maximium size limit on the
* queue of waiting jobs
* (privledged command)
*/
full = (int)work.time;
newstatus = 1;
(void)outcntrl(work.pid, RUNCMD);
break;
case CHTIMER:
/*
* change interval when load level checked
* (privledged command)
*/
chtimer(&work);
break;
default:
errno = 0;
errlog("cntrldis bad command");
(void)outcntrl(work.pid, STOPCMD);
break;
}
/*
* UNBLOCK SIGALRM so interval timer can check load
* to dispatch a job.
*/
(void)sigsetmask(oldmask);
}
/*-----------------------------------------------------------------------
* msgdis
*
* msgdis reads the datagram on msg socket port. Then calls the
* appropriate routine as encoded in the datagrams type field.
* NOTE:
* The client that sent the datagram ALWAYS waits for an indication
* that the datagram was processed. Each routine is required to send
* an indication to the client that the request has been processed.
*-----------------------------------------------------------------------
*/
msgdis()
{
struct request work; /* datagram space */
int oldmask; /* old value of signal mask */
int fromlen = 0;
extern int msgsock;
extern int alrmmask;
extern int errno;
/*
* BLOCK OFF SIGALRM as the called routines modify the
* internal data structures and cannot be interrupted
* by the iterval timer. That would corrupt the linked
* lists.
*/
oldmask = sigblock(alrmmask);
if (recvfrom(msgsock,&work,sizeof(struct request),0,(struct sockaddr *)0,&fromlen) <= 0){
if (errno != EINTR)
errlog("error in msgdis recv");
(void)sigsetmask(oldmask);
return;
}
/*
* dispatch on type of request
*/
if (work.type == POLLCMD){
/*
* a client making sure he is in the queue
* same as a QCMD, but addjob handles them differently.
*/
addjob(&work);
}else if (work.type == QCMD){
/*
* a request to queue a process
*/
addjob(&work);
}else if (work.type == PJOBCMD){
/*
* a request to remove a job from the queue
* should only be from a terminating client
* (also called from cntrldis())
*/
prjob(&work, msgsock);
}else{
errno = 0;
errlog("msgdis bad command");
(void)outmsg(work.pid, STOPCMD);
}
/*
* UNBLOCK SIGALRM so interval timer can check load
* to dispatch a job.
*/
(void)sigsetmask(oldmask);
}
/*-------------------------------------------------------------------------
* addjob
*
* check a job request to be queued. the request is in datagram work.
* jobs are only added if the load is above the set loadlimit threshold,
* otherwise they are told to run.
* If the queue is full, then the job is rejected.
*-------------------------------------------------------------------------
*/
addjob(work)
struct request *work;
{
struct itimerval oldalrm;
extern int full;
extern int qcount;
extern struct itimerval startalrm;
extern int addqueue();
extern int timerstop;
extern struct qnode *qhead;
/*
* if the queue is empty and the load is below the
* limit, just run the job.
*/
if ((qcount == 0) && (getrun() > 0)){
(void)outmsg(work->pid, RUNCMD);
return;
}
switch (addqueue(work)){
case 0:
/*
* queue was empty, turn the timer back on
*/
if (setitimer(ITIMER_REAL,&startalrm, &oldalrm)<0){
errlog("start timer error");
exit(1);
}
timerstop = 0;
/*
* fall through to case 1 below, and send queued
* message
*/
case 1:
/*
* job is in queue, all is ok
*/
(void)outmsg(work->pid, QCMD);
break;
case -1:
/*
* addqueue failed see if we can free up a space by
* telling oldest job to run.
*/
if (qcount > 0){
(void)outmsg(qhead->pid, RUNCMD);
(void)rmqueue(qhead);
(void)addqueue(work);
(void)outmsg(work->pid, QCMD);
}else{
(void)outmsg(work->pid, RUNCMD);
}
break;
case -2:
/*
* this is a new job and the queue is full.
* Reject the job.
*/
(void)outmsg(work->pid, FULLQUEUE);
break;
default:
/*
* bad return from addqueue()
*/
errlog("addqueue returned bad value");
exit(1);
break;
}
}
/*-------------------------------------------------------------------------
* chksock
*
* make sure that the bound socket is owned by the proper person. This is only
* checked for control messages (which are very infrequent).
*-------------------------------------------------------------------------
*/
chksock(prefix, jpid, juid)
char *prefix;
u_long jpid;
int juid;
{
char namebuf[64];
struct stat statbuf;
extern char *sprintf();
(void)sprintf(namebuf, "%s%u", prefix, jpid);
if (stat(namebuf, &statbuf) != 0)
return(0);
if ((unsigned int)statbuf.st_uid != juid)
return(0);
return(1);
}
/*-------------------------------------------------------------------------
* chtimer
*
* change the interval timer. The interval timer is used to force the server
* to check the queue every n seconds to see if the load is low enough to
* let some jobs run.
*-------------------------------------------------------------------------
*/
chtimer(work)
struct request *work;
{
struct itimerval oldalrm;
extern struct itimerval startalrm;
extern int timerstop;
extern int newstatus;
extern int outmsg();
/*
* make sure that this is from a socket owned by root, otherwise
* reject it
*/
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
startalrm.it_interval.tv_sec = work->time;
startalrm.it_value.tv_sec = work->time;
newstatus = 1;
/*
* if the timer is already stopped, just leave.
*/
if (timerstop == 1){
(void)outcntrl(work->pid, RUNCMD);
return;
}
/*
* restart the timer with the new interval
*/
if (setitimer(ITIMER_REAL,&startalrm, &oldalrm) < 0){
errlog("start timer error");
(void)outcntrl(work->pid, STOPCMD);
cleanup();
}
/*
* tell the client the command was sucessful
*/
(void)outcntrl(work->pid, RUNCMD);
}
/*------------------------------------------------------------------------
* list
*
* if necessary update the list file with the current queue status
* then tell the client that the list file is up to date.
* The data is stored in a file to avoid any chance the server could block
* writing to a stopped control program.
*
* NOTE:
* The users uids are NOT looked up it the passwd file. That must be done
* by the programs that read the list file. Looking things up in the passwd
* file are real expensive (even with dbm hashing) and this cannot be
* afforded.
*------------------------------------------------------------------------
*/
list(work)
struct request *work;
{
register struct qnode *ptr; /* pointer to walk through queue */
FILE *out; /* file where list will be written */
extern int newlist;
extern int qcount;
extern struct qnode *qhead;
extern FILE *fopen();
/*
* if the queue is the same as the last time it was listed in
* the file, just tell the client to read the file.
*/
if (newlist == 0){
(void)outcntrl(work->pid, RUNCMD);
return;
}
if ((out = fopen(LISTFILE, "w")) == NULL){
errlog("list cannot open LISTFILE");
(void)outcntrl(work->pid, STOPCMD);
return;
}
/*
* write out the number of waiting clients
*/
fprintf(out, "%d\n", qcount);
/*
* write each queue entry
*/
for (ptr = qhead; ptr != QNIL; ptr = ptr->fow)
fprintf(out,"%u %u %u %s\n",ptr->uid,ptr->pid,ptr->time,ptr->com);
(void)fclose(out);
/*
* set the flag to indicate that the list was updated.
*/
newlist = 0;
/*
* tell the client to read the file
*/
(void)outcntrl(work->pid, RUNCMD);
}
/*-----------------------------------------------------------------------
* outcntrl
*
* send the indicated message to the waiting control program who pid is
* "pid". control sockets are always the CNTRLPRE followed by the pid.
*-----------------------------------------------------------------------
*/
outcntrl(pid, cmd)
u_long pid;
char cmd;
{
int len; /* the size of the datagram header */
struct sockaddr_un name; /* datagram recipient */
extern int cntrlsock;
extern char *sprintf();
extern int errno;
/*
* set up the address of the target of the message
*/
name.sun_family = AF_UNIX;
(void)sprintf(name.sun_path, "%s%u", CNTRLPRE, pid);
len = strlen(name.sun_path) + sizeof(name.sun_family) + 1;
if (sendto(cntrlsock, &cmd, sizeof(cmd), 0, &name, len) >= 0)
return(0);
/*
* If this point is reached:
*
* The sendto FAILED, either control died and left the old socket
* entry in the filesystem (so remove it) or terminated and
* cleaned up the old socket entry.
*/
if ((errno == ENOTSOCK) || (errno == ECONNREFUSED) || (errno == ENOENT)
|| (errno == EPROTOTYPE))
(void)unlink(name.sun_path);
else
errlog("outcntrl sendto failed");
return(1);
}
/*-----------------------------------------------------------------------
* outmsg
*
* send the indicated message to the waiting client who pid is "pid".
* clients sockets are always the CLIENTPRE followed by the clients pid.
*-----------------------------------------------------------------------
*/
outmsg(pid, cmd)
u_long pid;
char cmd;
{
int len; /* the size of the datagram header */
struct sockaddr_un name; /* datagram recipient */
extern int msgsock;
extern char *sprintf();
extern int errno;
/*
* set up the address of the target of the message
*/
name.sun_family = AF_UNIX;
(void)sprintf(name.sun_path, "%s%u",CLIENTPRE,pid);
len = strlen(name.sun_path) + sizeof(name.sun_family) + 1;
if (sendto(msgsock, &cmd, sizeof(cmd), 0, &name, len) >= 0)
return(0);
/*
* If this point is reached:
*
* The sendto FAILED, either client died and left the old socket
* entry in the filesystem (so remove it) or terminated and
* cleaned up the old socket entry.
*/
if ((errno == ENOTSOCK) || (errno == ECONNREFUSED) || (errno == ENOENT)
|| (errno == EPROTOTYPE))
(void)unlink(name.sun_path);
else
errlog("outmsg sendto failed");
return(1);
}
/*------------------------------------------------------------------------
* prall
*
* remove ALL the waiting tasks in the queue. The jobs are told to
* terminate.
*------------------------------------------------------------------------
*/
prall(work)
struct request *work;
{
register struct qnode *ptr;
extern struct qnode *qtail;
/*
* make sure control socket is owned by root
* otherwise reject it
*/
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
for (ptr = qtail; ptr != QNIL; ptr= ptr->back){
(void)outmsg(ptr->pid, STOPCMD);
rmqueue(ptr);
}
/*
* tell the client the control program the queue is purged
*/
(void)outcntrl(work->pid, RUNCMD);
}
/*--------------------------------------------------------------------------
* prjob
*
* remove a job (specified by its pid) from the queue. The job is told to
* terminate. If the job is not found, tell the requesting client.
*--------------------------------------------------------------------------
*/
prjob(work, port)
register struct request *work;
int port;
{
register struct qnode *ptr;
extern struct qnode *qtail;
extern int cntrlsock;
extern int msgsock;
/*
* check to see if this is a control program request or
* a client request. check the validity of the bound
* socket in either case
*/
if (port == cntrlsock){
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
}else if (chksock(CLIENTPRE, work->pid, (int)work->uid) == 0){
(void)outmsg(work->pid, STOPCMD);
return;
}
for (ptr = qtail; ptr != QNIL; ptr = ptr->back){
if (ptr->pid != work->time)
continue;
/*
* found the job, ONLY remove if the requester owns
* the job, if the requester is root, or this is a
* client that is terminating from a signal and is
* sending it's "last breath".
*/
if (work->pid == work->time){
/*
* clients "last breath". just remove from
* queue as by now the client is dead.
*/
rmqueue(ptr);
return;
}
if ((work->uid == 0) || (ptr->uid == work->uid)){
(void)outmsg(ptr->pid, STOPCMD);
rmqueue(ptr);
if (port == cntrlsock)
(void)outcntrl(work->pid, RUNCMD);
else
(void)outmsg(work->pid, RUNCMD);
return;
}else
break;
}
/*
* command failed, tell the process that sent the datagram
* only if this is not a "last breath" message (should really
* never happen!)
*/
if (port == cntrlsock)
(void)outcntrl(work->pid, STOPCMD);
else if (work->pid != work->time)
(void)outmsg(work->pid, STOPCMD);
}
/*-------------------------------------------------------------------------
* prusr
*
* remove all the jobs queued that belong to a specified user. Only root
* or the user can request his jobs to be removed.
* (check for user field must be done in the control program).
*-------------------------------------------------------------------------
*/
prusr(work)
register struct request *work;
{
register struct qnode *ptr;
int found = 0;
extern struct qnode *qtail;
/*
* check to see if this is a valid control program request
*/
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
for (ptr = qtail; ptr != QNIL; ptr = ptr->back){
/*
* found a job owned by that user.
*/
if (ptr->uid == work->uid){
(void)outmsg(ptr->pid, STOPCMD);
rmqueue(ptr);
found = 1;
}
}
if (found == 1)
(void)outcntrl(work->pid, RUNCMD);
else
(void)outcntrl(work->pid, STOPCMD);
}
/*-------------------------------------------------------------------------
* runjob
*
* run a specified job (by pid) REGARDLESS of the load.
*-------------------------------------------------------------------------
*/
runjob(work)
register struct request *work;
{
register struct qnode *ptr;
extern struct qnode *qtail;
/*
* check to see if this is a control program request
*/
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
for (ptr = qtail; ptr != QNIL; ptr = ptr->back){
if (ptr->pid == work->time){
/*
* found the job
*/
(void)outmsg(ptr->pid, RUNCMD);
rmqueue(ptr);
(void)outcntrl(work->pid, RUNCMD);
return;
}
}
(void)outcntrl(work->pid, STOPCMD);
}
/*-------------------------------------------------------------------------
* runusr
*
* run all jobs owned by a user REGARDLES of the load
*-------------------------------------------------------------------------
*/
runusr(work)
register struct request *work;
{
register struct qnode *ptr;
int found = 0;
extern struct qnode *qtail;
/*
* check to see if this is a control program request
*/
if (chksock(CNTRLPRE, work->pid, 0) == 0){
(void)outcntrl(work->pid, STOPCMD);
return;
}
for (ptr = qtail; ptr != QNIL; ptr = ptr->back){
if (ptr->uid == work->uid){
/*
* found a job owned by that user
*/
(void)outmsg(ptr->pid, RUNCMD);
rmqueue(ptr);
found = 1;
}
}
if (found == 1)
(void)outcntrl(work->pid, RUNCMD);
else
(void)outcntrl(work->pid, STOPCMD);
}
/*-------------------------------------------------------------------------
* status
*
* update the status file. the status file contains the current setting
* of server paramters which can be changed by the control program.
*-------------------------------------------------------------------------
*/
status(work)
struct request *work;
{
FILE *out;
extern int errorcount;
extern int newstatus;
extern int qcount;
extern int full;
extern int timerstop;
#ifdef sun
extern long loadlevel;
#else
extern double loadlevel;
#endif sun
extern u_long mqtime;
extern struct itimerval startalrm;
extern FILE *fopen();
/*
* status is the same since the last request.
*/
if (newstatus == 0){
(void)outcntrl(work->pid, RUNCMD);
return;
}
if ((out = fopen(STATUSFILE, "w")) == NULL){
errlog("status cannot open STATUSFILE");
(void)outcntrl(work->pid, STOPCMD);
return;
}
#ifdef sun
fprintf(out,"%d %d %d %ld %u %d %ld\n",qcount,full,timerstop,
#else
fprintf(out,"%d %d %d %ld %u %d %lf\n",qcount,full,timerstop,
#endif
startalrm.it_value.tv_sec,mqtime,errorcount,loadlevel);
(void)fclose(out);
newstatus = 0;
(void)outcntrl(work->pid, RUNCMD);
}
@//E*O*F server/commands.c//
chmod u=r,g=r,o=r server/commands.c
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
878 2737 20872 commands.c
!!!
wc server/commands.c | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0muller@sdcc3.UUCP (Keith Muller) (02/12/85)
This is part 8 (last one!) of the load control system. Part 1 must be unpacked
before any other part.
Keith Muller
ucbvax!sdcsvax!muller
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:58:45 PST 1985
# Contents: server/setup.c
echo x - server/setup.c
sed 's/^@//' > "server/setup.c" <<'@//E*O*F server/setup.c//'
/*-------------------------------------------------------------------------
* setup.c - server
*
* routines needed to start up the server.
*-------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/server.h"
#include <stdio.h>
#include <sys/time.h>
#include <sys/file.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/dir.h>
#include <sys/resource.h>
#include <nlist.h>
#include <signal.h>
#include <errno.h>
/*-------------------------------------------------------------------------
* doargs
*
* check the command line arguement list and set up the global parameters.
*
* Note that both -X value and -Xvalue format for any flag X is accepted
*-------------------------------------------------------------------------
*/
doargs(argc, argv)
int argc;
char **argv;
{
register int i;
register char *ptr;
int lasti;
int badarg;
#ifdef sun
extern long loadlevel;
#else
extern double loadlevel;
#endif sun
extern u_long mqtime;
extern struct itimerval startalrm;
extern long atol();
extern int atoi();
extern double atof();
badarg = 0;
for (i = 1; i < argc; i++){
if (argv[i][0] != '-'){
fprintf(stderr,"bad arg: %s\n", argv[i]);
badarg = 1;
break;
}
lasti = i;
/*
* set ptr to point at start of flags VALUE.
* if strlen > 2 must be -Xvalue format
* otherwise set ptr to point at next argv
*/
if (strlen(argv[i]) > 2)
ptr = &(argv[i][2]);
else if ((i+1) < argc)
ptr = argv[++i];
else{
fprintf(stderr,"bad arg: %s\n", argv[i]);
badarg = 1;
break;
}
switch(argv[lasti][1]){
case 'L':
/*
* load level to queue at
*/
#ifdef sun
if ((loadlevel = (long)(atof(ptr)*256)) <= 0){
fprintf(stderr,"bad loadlevel: %ld\n",atof(ptr));
#else
if ((loadlevel = atof(ptr)) <= 0){
fprintf(stderr,"bad loadlevel: %lf\n",loadlevel);
#endif
badarg = 1;
}
break;
case 'T':
/*
* timer cycle time for load checks
*/
if ((startalrm.it_value.tv_sec = atol(ptr))<1){
fprintf(stderr,"bad alarmtime: %ld\n",atol(ptr));
badarg = 1;
}
break;
default:
fprintf(stderr,"unknown arg: %s\n",argv[lasti]);
badarg = 1;
break;
}
if (badarg == 1)
break;
}
if (badarg == 1){
fprintf(stderr,"Useage: %s [-L load] [-T alarm]\n",argv[0]);
exit(1);
}
}
/*--------------------------------------------------------------------------
* setup
*
* a collection of code need at startup to set up the server. such as
* checking to make sure only one server runs, detatch server from control
* terminal etc
*--------------------------------------------------------------------------
*/
setup()
{
register int i;
int lockfile;
int temp;
char line[20];
static struct nlist avenrun[] = { {"_avenrun"}, {""}};
extern int alrmmask;
extern long loadaddr;
extern int descsize;
extern int kmem;
extern int errno;
extern int errorcount;
extern FILE *errfile;
extern FILE *fopen();
extern int getpid();
extern int onalrm();
extern int getpid();
extern int setpriority();
extern char *sprintf();
if (getuid() != 0){
fprintf(stderr, "must run as root\n");
exit(1);
}
/*
* see if the spool dir where the client sockets are bound exsists
*/
if (access(SPOOLDIR, F_OK) == -1){
fprintf(stderr,"No directory: %s\n", SPOOLDIR);
exit(1);
}
/*
* see if the spool dir where the server sockets are bound exsists
*/
if (access(SERVERDIR, F_OK) == -1){
fprintf(stderr,"No directory: %s\n", SERVERDIR);
exit(1);
}
/*
* detach from foreground
*/
if (fork() != 0)
exit(0);
/*
* close down all open descriptors
* so the server is no longer attached to any tty
*/
descsize = getdtablesize();
for (i = 0; i < descsize; i++)
(void)close(i);
(void)open("/dev/null",O_RDONLY);
(void)open("/dev/null", O_WRONLY);
(void)open("/dev/null", O_WRONLY);
/*
* do an ioctl to /dev/tty to detach server from ttys
*/
if ((i = open("/dev/tty", O_RDWR)) > 0){
(void)ioctl(i, TIOCNOTTY, (char *)0);
(void)close(i);
}
/*
* set umask to remove all others permissions
*/
(void)umask(027);
/*
* open the error logging file
*/
errfile = fopen(ERRORPATH,"a+");
if (errfile < 0)
exit(1);
/*
* check lockfile for other servers already running uses advisory
* locking
*/
lockfile = open (LOCK, O_WRONLY|O_CREAT, 0640);
if (lockfile < 0){
errlog("cannot create lockfile");
exit(1);
}
if (flock(lockfile, LOCK_EX|LOCK_NB) < 0){
if (errno == EWOULDBLOCK)
exit(0);
errlog("cannot lock lockfile");
exit(1);
}
/*
* write the pid of this server in the lock file in case you
* need to blow the server away. (not currently used).
*/
i = getpid();
(void)ftruncate(lockfile, 0);
(void)sprintf(line, "%d\n", i);
temp = strlen(line);
if (write(lockfile, line, temp) != temp)
errlog("cannot write server pid");
/*
* mark the logfile that a new server is starting
*/
(void)fprintf(errfile,"server pid: %d ",i);
errno = 0;
errlog("started at");
errorcount = 0;
/*
* lower the server priority so that under heavy load the server
* can get the machine cycles when it needs them. The server
* uses very small amounts of cpu, so this is not going to impact
* the system.
*/
if (setpriority(0, i, PRIO) < 0 )
errlog("cannot lower priority");
/*
* open kmem where the load average will be read from
*/
if ((kmem = open("/dev/kmem", O_RDONLY)) < 0){
errlog("cannot open kmem");
cleanup();
}
/*
* get the address in this vmunix where the load average is
* loacated in the kernel data space
*/
nlist("/vmunix", avenrun);
if (avenrun[0].n_value == 0){
errlog("cannot find _avenrun");
cleanup();
}
loadaddr = (long)avenrun[0].n_value;
/*
* bind the signal handlers now
*/
(void)signal(SIGALRM, onalrm);
/*
* mask used to block off sigalrm when the data structures
* are being changed and must not service a timer interrupt
* to check the load average
*/
alrmmask = 1 << (SIGALRM - 1);
}
/*---------------------------------------------------------------------------
* scanspool
*
* when the server is restarted it could be right after an older server
* just terminated and left a lot of jobs queued up. since the queue is
* kept in memory for speed, no record exsists anymore about the queued
* clients. Since all the client sockets are bound in the same spool
* directory simply search the directory for a client socket and send it
* a POLLCMD. the client will respond to the POLLCMD by resubmitting its
* work request datagram. The addqueue routine rebuilds the queue by time
* so the queue will be in the proper order. Any dead cleints whose bound
* sockets are still in the spool will be removed (the bound sockets).
*---------------------------------------------------------------------------
*/
scanspool()
{
register int i;
int numfds;
int readfds;
u_long pid;
int cnprlen;
int clprlen;
int tag;
int msgmask;
struct direct *dp;
struct stat statbuf;
DIR *dirp;
extern struct timeval polltime;
extern int msgsock;
extern int descsize;
extern long atol();
/*
* open a directory for scanning
*/
if ((dirp = opendir(SPOOLDIR)) == NULL){
errlog("cannot open spool directory");
cleanup();
}
/*
* cd to the directory, this allows short names for binding and
* has a place to look for core dumps if they might occur
*/
if (chdir(SPOOLDIR) == -1){
errlog("cannot cd to spool");
cleanup();
}
/*
* clprlen is length of the prefix of client socket.
* cnprlen is length of prefix of control program socket.
* needed to extract the uid
*/
clprlen = strlen(CLIENTPRE);
cnprlen = strlen(CNTRLPRE);
msgmask = 1 << msgsock;
for (dp = readdir(dirp); dp != NULL; dp = readdir(dirp)){
/*
* if not a possable client, go to next entry
*/
if (dp->d_ino == 0)
continue;
if (strncmp(CLIENTPRE, dp->d_name, clprlen) == 0)
tag = 1;
else if (strncmp(CNTRLPRE, dp->d_name, cnprlen) == 0)
tag = 0;
else
continue;
if (stat(dp->d_name, &statbuf) != 0){
errlog("stat on spool file failed");
continue;
}
/*
* file has client or cntrol like name but is not a socket,
* remove as could cause problems later
*/
if ((statbuf.st_mode & S_IFSOCK) == 0){
(void)unlink(dp->d_name);
continue;
}
if (tag == 0){
/*
* send a message to waiting control program. outcntrl
* will remove if this is an old socket.
*/
pid = (u_long)atol(dp->d_name + cnprlen);
(void)outcntrl(pid, POLLCMD);
continue;
}
/*
* is a client socket must force a resubmit of the job. If it
* is an old socket outmsg will remove.
*/
pid = (u_long)atol(dp->d_name + clprlen);
/*
* throw a couple of POLLCMDS at the client to see if
* he is still alive. If the system is loaded could take
* a while to swap back in so give him time.
*/
for (i = 0; i < MAXPOLLS; i++){
if (outmsg(pid, POLLCMD) != 0)
break;
readfds = msgmask;
numfds = select(descsize,&readfds,(int*)0,(int*)0,&polltime);
if ((numfds < 0) && (errno != EINTR)){
errlog("select error in scanspool");
cleanup();
}
/*
* time in select expired and no answer from client
* try again
*/
if (numfds <= 0){
continue;
}
/*
* got a datagram, figure it out
*/
msgdis();
}
}
}
/*-------------------------------------------------------------------------
* crsock
*
* create all the sockets used by the server
*-------------------------------------------------------------------------
*/
crsock()
{
int len;
struct sockaddr_un name;
extern int msgsock;
extern int cntrlsock;
extern char *strcpy();
/*
* create the msgsocket where queue requests appear
*/
name.sun_family = AF_UNIX;
msgsock = socket(AF_UNIX, SOCK_DGRAM, 0);
if (msgsock < 0){
errlog("cannot create msgsock");
cleanup();
}
/*
* remove any entry in the file system for this name else the bind
* will fail. We are sure from the locking that this is ok to do.
*/
(void)unlink(MSGPATH);
(void)strcpy(name.sun_path, MSGPATH);
len = strlen(name.sun_path) + sizeof(name.sun_family) + 1;
if (bind(msgsock, &name, len) < 0){
errlog("cannot bind msgsock");
cleanup();
}
/*
* create the control socket for high priority control commands
*/
cntrlsock = socket(AF_UNIX, SOCK_DGRAM, 0);
if (cntrlsock < 0){
errlog("cannot create cntrlsock");
cleanup();
}
(void)unlink(CNTRLPATH);
(void)strcpy(name.sun_path, CNTRLPATH);
len = strlen(name.sun_path) + sizeof(name.sun_family) + 1;
if (bind(cntrlsock, &name, len) < 0){
errlog("cannot bind cntrlsock");
cleanup();
}
}
@//E*O*F server/setup.c//
chmod u=r,g=r,o=r server/setup.c
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
461 1512 10759 setup.c
!!!
wc server/setup.c | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0muller@sdcc3.UUCP (Keith Muller) (02/21/85)
Unpack part 1 before this part.
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:44:50 PST 1985
# Contents: client/Makefile client/main.c scripts/addldd scripts/makedirs
# scripts/qaddldd scripts/rmldd scripts/saddldd
echo x - client/Makefile
sed 's/^@//' > "client/Makefile" <<'@//E*O*F client/Makefile//'
#
# Makefile for batch client
#
CFLAGS= -O
HDR= ../h/common.h ../h/client.h
SRC= main.c
DEST1= /bin
TARG1= binclient
TARG1Q= qbinclient
DEST2= /usr/bin
TARG2= usrbinclient
TARG2Q= qusrbinclient
DEST3= /usr/local
TARG3= usrlocclient
TARG3Q= qusrlocclient
DEST4= /usr/ucb
TARG4= usrucbclient
TARG4Q= qusrucbclient
DEST5= /usr/games
TARG5= gamesclient
TARG5Q= qgamesclient
DEST6= /usr/new
TARG6= usrnewclient
TARG6Q= qusrnewclient
#
#the spec macros have the name of the program
#SPEC1= test
#DESTSPEC1= /tmp
all: $(TARG1) $(TARG1Q) $(TARG2) $(TARG2Q) $(TARG3) $(TARG3Q) $(TARG4) \
$(TARG4Q) $(TARG5) $(TARG5Q) $(TARG6) $(TARG6Q)
clean:
rm -f core *client
lint:
lint -abchx main.c
install: $(DEST1)/.client $(DEST1)/.qclient \
$(DEST2)/.client $(DEST2)/.qclient \
$(DEST3)/.client $(DEST3)/.qclient \
$(DEST4)/.client $(DEST4)/.qclient \
$(DEST5)/.client $(DEST5)/.qclient
# $(DEST5)/.client $(DEST5)/.qclient \
# $(DEST6)/.client $(DEST6)/.qclient
####################################################################
# Have the two commented lines replace the last line of the
# dependency list if your machine has /usr/new.
#
# The following line is a sample for $(SPEC1) which would
# have to be added into the install dependency for each
# SPEC defined.
# $(DESTSPEC1)/.$(SPEC1)client
####################################################################
####################################################################
# $(SPEC1) is a sample of a special client for controlling
# a single binary (like pi)
# NOARGV should be the full path name of where the real binary is
# is store (usually a .code directory)
####################################################################
$(SPEC1)client: main.c $(HDR)
cc $(CFLAGS) -DNOARGV=\"$(DESTSPEC1)/.code/$(SPEC1)\" main.c -o $(SPEC1)client
$(DESTSPEC1)/.$(SPEC1)client: $(SPEC1)client
install -c -m 4711 -o root $(SPEC1)client $(DESTSPEC1)/.$(SPEC1)client
####################################################################
$(TARG1): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST1)/.code/\" main.c -o $(TARG1)
$(TARG1Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST1)/.code/\" -DQUIET main.c -o $(TARG1Q)
$(TARG2): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST2)/.code/\" main.c -o $(TARG2)
$(TARG2Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST2)/.code/\" -DQUIET main.c -o $(TARG2Q)
$(TARG3): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST3)/.code/\" main.c -o $(TARG3)
$(TARG3Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST3)/.code/\" -DQUIET main.c -o $(TARG3Q)
$(TARG4): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST4)/.code/\" main.c -o $(TARG4)
$(TARG4Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST4)/.code/\" -DQUIET main.c -o $(TARG4Q)
$(TARG5): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST5)/.code/\" main.c -o $(TARG5)
$(TARG5Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST5)/.code/\" -DQUIET main.c -o $(TARG5Q)
$(TARG6): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST6)/.code/\" main.c -o $(TARG6)
$(TARG6Q): main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"$(DEST6)/.code/\" -DQUIET main.c -o $(TARG6Q)
$(DEST1)/.client: $(TARG1)
install -c -m 4711 -o root $(TARG1) $(DEST1)/.client
$(DEST1)/.qclient: $(TARG1Q)
install -c -m 4711 -o root $(TARG1Q) $(DEST1)/.qclient
$(DEST2)/.client: $(TARG2)
install -c -m 4711 -o root $(TARG2) $(DEST2)/.client
$(DEST2)/.qclient: $(TARG2Q)
install -c -m 4711 -o root $(TARG2Q) $(DEST2)/.qclient
$(DEST3)/.client: $(TARG3)
install -c -m 4711 -o root $(TARG3) $(DEST3)/.client
$(DEST3)/.qclient: $(TARG3Q)
install -c -m 4711 -o root $(TARG3Q) $(DEST3)/.qclient
$(DEST4)/.client: $(TARG4)
install -c -m 4711 -o root $(TARG4) $(DEST4)/.client
$(DEST4)/.qclient: $(TARG4Q)
install -c -m 4711 -o root $(TARG4Q) $(DEST4)/.qclient
$(DEST5)/.client: $(TARG5)
install -c -m 4711 -o root $(TARG5) $(DEST5)/.client
$(DEST5)/.qclient: $(TARG5Q)
install -c -m 4711 -o root $(TARG5Q) $(DEST5)/.qclient
$(DEST6)/.client: $(TARG6)
install -c -m 4711 -o root $(TARG6) $(DEST6)/.client
$(DEST6)/.qclient: $(TARG6Q)
install -c -m 4711 -o root $(TARG6Q) $(DEST6)/.qclient
####################################################################
# tmpclient is used for testing the ldc system with a /tmp/.code
# directory. This is not normally used.
####################################################################
tmpclient: main.c $(HDR)
cc $(CFLAGS) -DCODEPATH=\"/tmp/.code/\" main.c -o tmpclient
install -c -m 4711 -o root tmpclient /tmp/.client
@//E*O*F client/Makefile//
chmod u=r,g=r,o=r client/Makefile
echo x - client/main.c
sed 's/^@//' > "client/main.c" <<'@//E*O*F client/main.c//'
/*--------------------------------------------------------------------------
* main.c - client
*
* Front end program that communicates with the ldd server. This front
* end replaces the program to be controlled. The controlled binary is
* hidden in a directory that is only accessable through group privledges.
* Only one client executable is needed for each protected binary directory.
* The real name of the program to be executed is extracted from argv[0]
* unless NOARGV is defined. When defined, NOARGV has the name of the program
* to be exec'ed wired in. The NOARGV option is necessary for programs like
* pi and px which use argv[0] to pass data to them (YUCK!!!) when they are
* called from pix. Usually all the front ends can just be links (hard or soft)
* to the same code file.
* The QUIET option allows suppression of the client status messages (this
* is good for nroff). The ONETIME option exempts all child processes from
* being queued once the parent process has passed through load control
* once. (Good for queueing individual passes of a compiler, make, etc).
* If for any reason the server is dead or not responding, this program will
* simply exec the proper code file. This allows the load control system to
* be quickly disabled by killing off the ldd server program.
* The front end checks every QUEUETIME seconds to see if the server is
* still running and has this process queued up. If this poll fails the
* control program is exec'ed. This protects against the system locking up due
* to server death. The system WILL NOT be overloaded from a rash of executing
* jobs as each job will expire relative to the time it was queued (which will
* be spread out over time).
*
* Author : Keith Muller
* University of California, San Diego
* Academic Computer Center C - 010
* La Jolla Ca 92093
* ucbvax!sdcsvax!sdcc3!muller
* (619) 452-6090
*---------------------------------------------------------------------------
*/
/* $Log$ */
#include "../h/common.h"
#include "../h/client.h"
#include <sys/uio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/time.h>
#include <stdio.h>
#include <signal.h>
#include <errno.h>
int queued = 0; /* 1 if queued */
int msgsock = -1; /* desciptor of socket */
int len = 0; /* used to store address len */
char *ptr; /* ptr for pulling apart argv[0] */
char clientpath[255]; /* buffer for socket name */
char binary[255]; /* buffer for real binary's path */
struct request job; /* datagram for the server */
struct sockaddr_un name; /* socket address of server */
struct timeval polltime = {WAITTIME, 0}; /* waitime to check server */
extern int onint(); /* interrupt handler */
/*-----------------------------------------------------------------------
* main
*
*-----------------------------------------------------------------------
*/
main(argc, argv, envp)
int argc;
char **argv, **envp;
{
register int i; /* general counter */
int msgmask; /* mask for select */
int readfds; /* mask for desc to select on */
int numfds; /* number of desc select ret */
int egid; /* effective group id */
int rgid; /* real group id */
int uid; /* real user id */
int pollcount; /* number of polls to server */
int descsize; /* size of desc table */
int sigmask; /* signal mask before block */
char msg; /* answer from server */
struct timeval now; /* time (secs) value */
struct timezone zone; /* timezone value (unused) */
int fromlen = 0;
#ifndef QUIET
int announce; /* limits "queued" messages */
char *eptr;
extern char *getenv();
extern int strcmp();
#endif QUIET
extern char *strcpy();
extern char *strncpy();
extern char *strcat();
extern int getpid();
extern int getegid();
extern int getgid();
extern int getuid();
extern int sigblock();
extern int sigsetmask();
extern int errno;
extern char *sprintf();
extern char *rindex();
/*
* the client front end runs ONLY setuid to root. so get real user
* and both effective and real gids.
*/
egid = getegid();
rgid = getgid();
uid = getuid();
/*
* set the users real and effective uid (no limits on root). also set
* the group id to LDDGID so a socket can be bound in the spool
* directory and a datagram can be sent to the server. (the spool
* directory MUST BE in group LDDGID and mode 0730 only!
* NO OTHER PRIVLEDGES AT ALL!!!!!)
*/
(void)setregid(rgid, LDDGID);
(void)setreuid(uid, uid);
/*
* If NOARGV is defined, then this is a special client which
* will only exec a SINGLE program. This is to get around things
* like pi which can use argv[0] to pass data. Otherwise we must
* find the base name of the requested program. Since argv[0]
* can be a long ugly path name, ugly looking code is needed
* to strip off the path.
*/
#ifdef NOARGV
/*
* NOARGV is set in the makefile to have the FULL path of where the
* real binary lives: for example /usr/bin/.code/yuck
*/
(void)strcpy(binary, NOARGV);
if ((ptr = rindex(binary, '/')) == (char *)0)
ptr = binary;
else
ptr++;
#else
/*
* must pull the path out of the argv[0]
*/
if ((ptr = rindex(argv[0], '/')) == (char *)0)
ptr = argv[0];
else
ptr++;
(void)sprintf(binary, "%s%s", CODEPATH, ptr);
#endif
/*
* If ONETIME is defined, then all child processes of this job are
* EXEMPT from being queued. This is useful for things like pi which
* can be called both by a user and from pix.
* This works because if the effective gid of the process
* is the group LDDGID this process must be a decendent of a process
* that has already passed through the load control system. This
* mechanism will work only if this program is setuid and IS NOT
* setgid.
*
* root is always exempt!
*
* NOTE: ptr will be used later to build up the command line buffer
* in the datagram request packet sent to the server.
*/
#ifdef ONETIME
if ((egid == LDDGID) || (uid == 0))
run(argv, envp);
#else
if (uid == 0)
run(argv, envp);
#endif ONETIME
/*
* create the socket and the datagram. if anything fails
* just run. cannot afford to have this process HANG!
*/
msgsock = socket(AF_UNIX, SOCK_DGRAM, 0);
if (msgsock < 0)
run(argv, envp);
/*
* bind the handler to clean up
*/
(void)signal(SIGINT, onint);
(void)signal(SIGHUP, onint);
(void)signal(SIGQUIT, onint);
(void)signal(SIGTERM, onint);
/*
* make the datagram up
*/
job.pid = (u_long)getpid();
(void)sprintf(clientpath,"%s/%s%u",SPOOLDIR,CLIENTPRE,job.pid);
(void)strcpy(name.sun_path, clientpath);
name.sun_family = AF_UNIX;
len = strlen(name.sun_path) + 1 + sizeof(name.sun_family);
/*
* block off interrupt and control z until we get datagram
* sent
*/
sigmask = sigblock((1<<(SIGINT-1)) | (1<<(SIGTSTP-1)) | (1<<(SIGHUP-1)) |
(1<<(SIGQUIT-1)) | (1<<(SIGTERM-1)));
/*
* bind the socket, if it fails just run
*/
(void)unlink(name.sun_path);
if (bind(msgsock, &name, len) < 0){
(void)sigsetmask(sigmask);
run(argv, envp);
}
/*
* build up the command line that will be displayed in the
* when the user interrogates the queue. This helps in identifying
* which job is which.
*/
(void)strncpy(job.com, ptr, COMLEN - 1);
i = 1;
len = strlen(job.com) + 1;
while((i < argc) && ((len = len + strlen(argv[i]) + 1) <= COMLEN)){
(void)strcat(job.com, " ");
(void)strcat(job.com, argv[i]);
i++;
}
/*
* put the path name of the servers datagram in the sockaddr struct
*/
(void)strcpy(name.sun_path, MSGPATH);
len = strlen(name.sun_path) + 1 + sizeof(name.sun_family);
/*
* time stamp the datagram, and place my pid in it (identfies the name
* of this clients bound socket
*/
if (gettimeofday(&now, &zone) < 0)
run(argv, envp);
job.time = now.tv_sec;
job.type = QCMD;
job.uid = (u_long)uid;
/*
* send the request to the server
*/
if (sendto(msgsock, &job, sizeof(struct request), 0, &name, len) < 0){
(void)sigsetmask(sigmask);
run(argv, envp);
}
descsize = getdtablesize();
msgmask = 1 << msgsock;
pollcount = 0;
/*
* unblock the signals
*/
(void)sigsetmask(sigmask);
#ifndef QUIET
/*
* set announce to 0 to get at least one status message
* if user has ENVNAME = quiet in his environment, no messages!
*/
if (((eptr=getenv(ENVNAME))==(char *)0)||(strcmp(eptr,"quiet")!=0))
announce = 0;
else
announce = -1;
#endif QUIET
/*
* wait for the word from the server!
*/
for(;;){
readfds = msgmask;
numfds = select(descsize,&readfds,(int *)0,(int *)0,&polltime);
/*
* if there is a screwup here just run
*/
if (((numfds<0)&&(errno!=EINTR))||((numfds<=0)&&(pollcount>MAXPOLLS))){
#ifndef QUIET
if (announce == 1)
fprintf(stderr,"running\n");
#endif QUIET
run(argv, envp);
}
/*
* we waitied polltime seconds and no word from the server
* so send the datagram again in case the system lost it
* OR else we got a garbage answer!
*/
if ((numfds<=0)||(recvfrom(msgsock,&msg,sizeof(msg),0,(struct sockaddr *)0,&fromlen)<=0)){
pollcount++;
/*
* oh server where are you?
*/
if (sendto(msgsock,&job,sizeof(struct request),0,&name,len)<0){
#ifndef QUIET
if (announce == 1)
fprintf(stderr,"running\n");
#endif QUIET
run(argv, envp);
}else{
/*
* the datagram was sent so switch to WAITTIME
* for an answer. WAITTIME is much shorter
* than QUEUETIME as we want to be queued.
*/
polltime.tv_sec = WAITTIME;
continue;
}
}
/*
* we got the word see what to do
*/
switch(msg){
case RUNCMD:
/*
* we can run
*/
#ifndef QUIET
if (announce == 1)
fprintf(stderr,"running\n");
#endif QUIET
run(argv, envp);
break;
case STOPCMD:
/*
* bye bye
*/
#ifndef QUIET
if (announce == 1)
fprintf(stderr,"stopped\n");
#endif QUIET
(void)close(msgsock);
(void)unlink(clientpath);
exit(1);
case QCMD:
/*
* we have been queued
* switch to QUEUETIME so to check later
* that the server is still around (so we
* don't wait forever!
* Switch to POLLCMD so that the server knows
* we have been in the queue at least once.
*/
polltime.tv_sec = QUEUETIME;
queued = 1;
pollcount = 0;
job.type = POLLCMD;
#ifndef QUIET
/*
* tell user he is being queued
*/
if (announce == 0){
fprintf(stderr,"Queued, waiting to run....");
announce = 1;
}
#endif QUIET
break;
case FULLQUEUE:
/*
* The queue is full, this job cannot be
* accepted. This prevents the system
* from running out of slots in the
* process table.
*/
fprintf(stderr,"Cannot run, the system is overloaded. Try again later.\n");
(void)close(msgsock);
(void)unlink(clientpath);
exit(1);
case POLLCMD:
/*
* server wants the data again
* The only way we can get this is from
* a new server during startup.
* So reset the datagram to a QCMD.
* (fall through to default below),
*/
job.type = QCMD;
default:
/*
* or got garbage
*/
if (sendto(msgsock,&job,sizeof(struct request),0,&name,len)<0){
#ifndef QUIET
if (announce == 1)
fprintf(stderr,"running\n");
#endif QUIET
run(argv, envp);
}
/*
* switch back to WAITTIME to be a pest until
* we get queued
*/
polltime.tv_sec = WAITTIME;
queued = 0;
pollcount = 0;
break;
}
}
}
/*-----------------------------------------------------------------------------
* onint
*
* what to do when the user wants out
*-----------------------------------------------------------------------------
*/
onint()
{
/*
* if we are already queued say goodbye to the server
*/
if (queued == 1){
/*
* Send a message to the server we are quitting so the server
* can remove us from the queue.
*/
job.type = PJOBCMD;
job.time = job.pid;
(void)sendto(msgsock,&job,sizeof(struct request),0,&name,len);
}
(void)close(msgsock);
(void)unlink(clientpath);
exit(0);
}
/*-----------------------------------------------------------------------------
* run
*
* routine that execs the real program after getting the ok
*-----------------------------------------------------------------------------
*/
run(argv, envp)
char **argv, **envp;
{
extern int msgsock;
extern char binary[];
extern char clientpath[];
/*
* shut down the socket and remove it from the spool
*/
if (msgsock != -1)
(void)close(msgsock);
(void)unlink(clientpath);
/*
* all is set try to run!
* this works because the directory where the REAL code file is
* is mode 0730 and the group MUST BE LDDGID!
* and we are now running with an effective gid of LDDGID
*/
(void)execve(binary, argv, envp);
/*
* from now on something screwed up! print the error and
* hope the user reports it!
*/
perror(binary);
exit(0);
}
@//E*O*F client/main.c//
chmod u=r,g=r,o=r client/main.c
echo x - scripts/addldd
sed 's/^@//' > "scripts/addldd" <<'@//E*O*F scripts/addldd//'
#! /bin/csh -f
#
# script to add file1 through filen located in directory to the processes
# controlled by the ldd system.
#
# THIS IS FOR COMMANDS THAT SHOULD HAVE STATUS ANNOUNCEMENTS
umask 022
if ($#argv < 2) then
echo "usage: addldd directory file1 file2 .. filen"
else
echo "cd $1"
cd $1
shift
while($#argv)
if (-e .client == 0) then
echo "there is no .client front end. Do a make install."
break
else if (-e .code/$1) then
echo "$1 is already load controlled"
else if (-e $1) then
echo "putting $1 under load control"
echo "mv $1 .code/$1"
/bin/mv $1 .code/$1
echo "ln -s .client $1"
/bin/ln -s .client $1
else
echo "$1 does not exsist"
endif
shift
end
endif
@//E*O*F scripts/addldd//
chmod u=rx,g=rx,o=rx scripts/addldd
echo x - scripts/makedirs
sed 's/^@//' > "scripts/makedirs" <<'@//E*O*F scripts/makedirs//'
#! /bin/csh -f
# make all the directories needed by the load control system
# NOTE: lddgrp must be defined as the same group id as in h/client.h
#
echo "making the directories for load control"
foreach i (/bin /usr/bin /usr/local /usr/ucb /usr/new /usr/games)
if (-e $i && -e $i/.code == 0) then
echo "mkdir $i/.code"
/bin/mkdir $i/.code
endif
if (-e $i/.code) then
echo "chown root $i/.code"
/etc/chown root $i/.code
echo "chgrp lddgrp $i/.code"
/bin/chgrp lddgrp $i/.code
echo "chmod 0710 $i/.code"
/bin/chmod 0710 $i/.code
endif
end
#
if (-e /usr/spool/ldd == 0) then
echo "mkdir /usr/spool/ldd"
/bin/mkdir /usr/spool/ldd
endif
if (-e /usr/spool/ldd/sr == 0) then
echo "mkdir /usr/spool/ldd/sr"
/bin/mkdir /usr/spool/ldd/sr
endif
if (-e /usr/spool/ldd/cl == 0) then
echo "mkdir /usr/spool/ldd/cl"
/bin/mkdir /usr/spool/ldd/cl
endif
#
# spool/ldd/cl MUST BE GROUP writeable
# all others should NOT be GROUP writeable
#
echo "chown root /usr/spool/ldd"
/etc/chown root /usr/spool/ldd
echo "chgrp lddgrp /usr/spool/ldd"
/bin/chgrp lddgrp /usr/spool/ldd
echo "chmod 0710 /usr/spool/ldd"
/bin/chmod 0710 /usr/spool/ldd
echo "chown root /usr/spool/ldd/cl"
/etc/chown root /usr/spool/ldd/cl
echo "chgrp lddgrp /usr/spool/ldd/cl"
/bin/chgrp lddgrp /usr/spool/ldd/cl
echo "chmod 0730 /usr/spool/ldd/cl"
/bin/chmod 0730 /usr/spool/ldd/cl
echo "chown root /usr/spool/ldd/sr"
/etc/chown root /usr/spool/ldd/sr
echo "chgrp lddgrp /usr/spool/ldd/sr"
/bin/chgrp lddgrp /usr/spool/ldd/sr
echo "chmod 0710 /usr/spool/ldd/sr"
/bin/chmod 0710 /usr/spool/ldd/sr
@//E*O*F scripts/makedirs//
chmod u=rx,g=rx,o=rx scripts/makedirs
echo x - scripts/qaddldd
sed 's/^@//' > "scripts/qaddldd" <<'@//E*O*F scripts/qaddldd//'
#! /bin/csh -f
#
# script to add file1 through filen located in directory to the processes
# controlled by the ldd system.
#
# THIS IS FOR COMMANDS THAT *DO* *NOT* HAVE STATUS ANNOUNCEMENTS
umask 022
if ($#argv < 2) then
echo "usage: qaddldd directory file1 file2 .. filen"
else
echo "cd $1"
cd $1
shift
while($#argv)
if (-e .qclient == 0) then
echo "there is no .qclient front end. Do a make install."
break
else if (-e .code/$1) then
echo "$1 is already load controlled"
else if (-e $1) then
echo "putting $1 under load control (quiet mode)"
echo "mv $1 .code/$1"
/bin/mv $1 .code/$1
echo "ln -s .qclient $1"
/bin/ln -s .qclient $1
else
echo "$1 does not exsist"
endif
shift
end
endif
@//E*O*F scripts/qaddldd//
chmod u=rx,g=rx,o=rx scripts/qaddldd
echo x - scripts/rmldd
sed 's/^@//' > "scripts/rmldd" <<'@//E*O*F scripts/rmldd//'
#! /bin/csh -f
#
# script to remove process file1 through filen in directory from the
# load control system.
umask 022
if ($#argv < 2) then
echo "usage: rmldd directory file1 file2 .. filen"
else
echo "cd $1"
cd $1
shift
while($#argv)
if (-e .code/$1) then
echo "removing $1 from load control"
echo "rm $1"
/bin/rm $1
echo "mv .code/$1 $1"
/bin/mv .code/$1 $1
else
echo "$1 is not load controlled"
endif
shift
end
endif
@//E*O*F scripts/rmldd//
chmod u=rx,g=rx,o=rx scripts/rmldd
echo x - scripts/saddldd
sed 's/^@//' > "scripts/saddldd" <<'@//E*O*F scripts/saddldd//'
#! /bin/csh -f
#
# script to add file file1 as a SPECIAL client in directory directory.
#
# THIS IS FOR COMMANDS THAT REQUIRE SPECIAL PRIVATE CLIENTS
umask 022
if ($#argv != 2) then
echo "usage: saddldd directory file"
else
echo "cd $1"
cd $1
shift
if (-e .$1client == 0) then
echo "there is no .$1client front end. Do a make install."
break
else if (-e .code/$1) then
echo "$1 is already load controlled"
else if (-e $1) then
echo "putting $1 under load control (special client)"
echo "mv $1 .code/$1"
/bin/mv $1 .code/$1
echo "ln -s .$1client $1"
/bin/ln -s .$1client $1
else
echo "$1 does not exsist"
endif
endif
@//E*O*F scripts/saddldd//
chmod u=rx,g=rx,o=rx scripts/saddldd
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
154 491 4545 Makefile
471 1929 12916 main.c
32 123 711 addldd
63 202 1580 makedirs
32 126 733 qaddldd
25 76 454 rmldd
28 113 644 saddldd
805 3060 21583 total
!!!
wc client/Makefile client/main.c scripts/addldd scripts/makedirs scripts/qaddldd scripts/rmldd scripts/saddldd | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0muller@sdcc3.UUCP (Keith Muller) (02/21/85)
This is part 1 of the load control system. This part MUST be unpacked
BEFORE any other part.
# This is a shell archive. Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by sdcc3!muller on Sat Feb 9 13:40:15 PST 1985
# Contents: client/ control/ h/ scripts/ server/ man/ README NOTICE Makefile
# man/Makefile man/ldc.8 man/ldd.8 man/ldq.1 man/ldrm.1
echo x - README
sed 's/^@//' > "README" <<'@//E*O*F README//'
TO INSTALL: (you MUST be root) (January 24, 1985 version)
1) Select a group id for load control system to use. No user should be in this
group. Add this group to /etc/groups and call it lddgrp.
** By default the group id 25 is used. **
2) Look at the file h/common.h. Make sure that LDDGID is defined to be the
same group id as you selected in step 1.
3) cd to the scripts directory. Inspect the paths used in the file makedirs.
The script makedirs creates the required directories with the proper modes
groups and owners. The .code directories are where the real executable
files are hidden, protected by group access (the directory is protected
from all "other" access). Each directory which contains programs that you
want load controlled must have a .code subdirectory.
NOTE: You really do not have to change makedirs at all except to ADD
any additional directories you want controlled. It is perfectly safe to
just run this system on any 4.2 system without ANY path changes (this
includes sun, vax and pyramid versions).
4) If you alter or add any pathnames in makedirs, you might have to adjust
the makefiles. For each subdirectory (client, server, control) adjust
or add the paths in the Makefiles.
5) If you alter any pathname in makedirs you will have to check all the h
files in the directory h. Change any paths as required.
6) run makedirs (if you have an older release of ldd: You should shut down
the ldd server and remove the old status and errlog file. Then run
makedirs.) Makedirs can be run any number of times without harm. It will
reset the owners and groups of all directories to the correct state.
7) In the top level directory (The same directory as this README file is in),
run make. then make install. All the binaries are now in place.
8) Start the ldd server:
/etc/ldd [-T cycle] [-L load]
The server will detach itself and wait for requests. You should get no
messages from the server. The two flags are optional. The -L flag
specifies the number of seconds between each load average check. The
-L flag specifies the load average queueing starts. If neither are
specified the defaults are used. (see the manual page for ldd). You
can change the defaults by editing h/server.h. ALRMTIME is the cycle
time, and MAXLOAD is the load average.
The following are good values to start with:
machine cycle load
----------------------------------------------------------
pyramid 90x 25 10.0
pyramid 90mx 15 15.0
vax 780 50 9.0
vax 750 60 7.5
vax 730 60 6.0
sun 2 60 6.5
9) add the following lines to /etc/rc.local (change path and add any ldd
arguements as selected from the above table). See the man page on ldd
for more info.
if [ -f /etc/ldd ]; then
/bin/rm -f /usr/spool/ldd/sr/errors
/etc/ldd & echo -n ' ldd' >/dev/console
fi
10) for each directory to be controlled select those programs you want under
the load control system. The programs you select should be jobs that
usually do not require user interaction, though nasty systems like macsyma
might be load controlled anyway. Never load control things that have time
response requirements. The jobs you select will determine the overall
usefullness of the load control system. For the load control system to
be completely effective, all the programs that cause any significant load
on the system should be placed under load control. For example the cc
command is a very typical of a program that should be load controlled.
When run, cc uses large amount of resources which increases as the size
of the program being compiled increases. When there are many cc's running
simultaneously the machine gets quite overloaded and your system thrashes.
A poor choice would be a command like cat. Sure cat can do a lot of i/o,
but even ten cat's reading very large files do not impact the system
very much. Troff is a very good command to load control. It is not very
interactive, and a lot of them running would bring even slow a cray.
Watching your system when it is overoaded with ps au should tell you which
programs on your system need to be load controlled.
The following is a list of programs I have under load control:
/bin/cc /bin/make /bin/passwd /usr/bin/pc /usr/bin/pix /usr/bin/liszt
/usr/bin/lisp /usr/bin/vgrind /usr/ucb/f77 /usr/ucb/lint /usr/ucb/nroff
/usr/ucb/spell /usr/ucb/troff /usr/ucb/yacc
The following is the list of places to look for other candidates for load
control:
a) /bin
b) /usr/bin
c) /usr/ucb
d) /usr/new
e) /usr/local
f) /usr/games
i) some programs use argv[0] to pass data (so far only the ucb pi
does this when called by pix). These programs must be treated
differently (since they mangle argv[0], it cannot be used to
determine which binary to execute). A special client called
.NAMEclient where NAME is the actual name of the program must be
created. These special programs must be specified in the
client/Makefile. See the sample for $(SPEC1) which is for a program
called test in /tmp. Run the script onetime/saddldd for these programs.
ii) run the script scripts/addldd with each program to be load controlled
that requires a STATUS MESSAGE ("Queued waiting to run.") as an
arguement (i.e. addldd /bin cc make)
iii)run the script scripts/qaddldd with each program to be load controlled
that DOES NOT require a STATUS MESSAGE as an arguement
(i.e. qaddldd /usr/bin nroff)
addldd/qaddldd/saddldd moves the real binary into the .code file and
replaces it with a "symbolic link" to either .client (for addldd and
qaddldd) or a .NAMEclient (for saddldd) So the command:
addldd /bin cc
moves cc to /bin/.code/cc and creates the symbolic link /bin/cc
to /bin/.client.
11) any changes to any file in the load control system from now on
will be correctly handled by a make install from the top level directory.
12) the script script/rmldd can be used to remove programs from the ldd system.
13) Compilers like cc and pc should have all the intermediate passes protected.
Each pass must be in group lddgrp and have the others access turned off
For example:
chmod 0750 /lib/c2
chgrp lddgrp /lib/c2
14) When the system is running you might have to adjust the operating
parameters of ldd for the job mix and the capacity of your machine.
Use ldc to adjust these parameters while the load control system is
running and watch what happens. The .h files as supplied use values that
will safely work on any machine, but might not be best values for your
specific needs. In the vast majority of cases, only the load point
and cycle time need to be changed and these can be set with arguements to
ldd when it is first invoked. Be careful as radical changes to
the defaults might make defeat the purpose of ldd. If things ever get
really screwed up, you can just kill -9 the server (or from ldc: abort
server) and things will run just like the load control doesn't exsist.
(Note the pid of the currently running ldd is always stored in the lock
file "spool/ldd/sr/lock"). (See the man page on ldd for more).
15) If load control does not stop the system load to no more than the load
limit + 2.5 then there are programs that are loading down the machine
which are not under load control. Find out what they are and load control
them.
16) To increase the response of the system you can lower the load threshold.
Of course if the threshold gets too low the system can end up with long
wait times for running. Long wait times are usually around 3000 seconds
for super loaded vaxes. On the very fast pyramids, 500 seconds (48 users
and as many large cc as the students can get running) seems the longest
delay I have seen. You can also play with the times between checks. This
has some effect on vaxes but 50 - 60 seconds seems optimal. On pyramids
it is quite different. Since the throughput is so very much greater
than vaxes (four times greater at the very least), the load needs to be
checked at least every 25 seconds. If this check time is too long you
risk having the machine go idle for a number of seconds. Since the whole
point is to squeeze out every last cpu cycle out of the machine, idle
time must be avoided. Watching the machine with vmstat or the mon program
is useful for this. Try to keep the user percentage of the cpu as high
as possible. Try to have enough jobs runnable so the machine doesn't
go idle do to a lack of jobs (yes this can happen with lots of disk io).
17) If you want/need more info on the inner workings of the ldd system, you
can read the comments in the .h files and the source files. If you have
problems drop me a line. I will be happy to answer any questions.
Keith Muller
University of California, San Diego
Mail Code C-010
La Jolla, CA 92093
ucbvax!sdcsvax!muller
(619) 452-6090
@//E*O*F README//
chmod u=r,g=r,o=r README
echo x - NOTICE
sed 's/^@//' > "NOTICE" <<'@//E*O*F NOTICE//'
DISCLAIMER
"Although each program has been tested by its author, no warranty,
express or implied, is made by the author as to the accuracy and
functioning of the program and related program material, nor shall
the fact of distribution constitute any such warranty, and no
responsibility is assumed by the author in connection herewith."
This program cannot be sold, distributed or copied for profit, without
prior permission from the author. You are free to use it as long the
author is properly credited with it's design and implementation.
Keith Muller
Janaury 15, 1985
San Diego, CA
@//E*O*F NOTICE//
chmod u=r,g=r,o=r NOTICE
echo x - Makefile
sed 's/^@//' > "Makefile" <<'@//E*O*F Makefile//'
#
# Makefile for ldd server and client
#
#
all:
cd server; make ${MFLAGS}
cd client; make ${MFLAGS}
cd control; make ${MFLAGS}
lint:
cd server; make ${MFLAGS} lint
cd client; make ${MFLAGS} lint
cd control; make ${MFLAGS} lint
install:
cd server; make ${MFLAGS} install
cd client; make ${MFLAGS} install
cd control; make ${MFLAGS} install
cd man; make ${MFLAGS} install
clean:
cd server; make ${MFLAGS} clean
cd client; make ${MFLAGS} clean
cd control; make ${MFLAGS} clean
@//E*O*F Makefile//
chmod u=r,g=r,o=r Makefile
echo mkdir - client
mkdir client
chmod u=rwx,g=rx,o=rx client
echo mkdir - control
mkdir control
chmod u=rwx,g=rx,o=rx control
echo mkdir - h
mkdir h
chmod u=rwx,g=rx,o=rx h
echo mkdir - scripts
mkdir scripts
chmod u=rwx,g=rx,o=rx scripts
echo mkdir - server
mkdir server
chmod u=rwx,g=rx,o=rx server
echo mkdir - man
mkdir man
chmod u=rwx,g=rx,o=rx man
echo x - man/Makefile
sed 's/^@//' > "man/Makefile" <<'@//E*O*F man/Makefile//'
#
# Makefile for ldd manual pages
#
DEST= /usr/man
TARG= $(DEST)/man8/ldd.8 $(DEST)/man8/ldc.8 $(DEST)/man1/ldrm.1 \
$(DEST)/man1/ldq.1
all:
install: $(TARG)
$(DEST)/man8/ldd.8: ldd.8
install -c -o root ldd.8 $(DEST)/man8
$(DEST)/man8/ldc.8: ldc.8
install -c -o root ldc.8 $(DEST)/man8
$(DEST)/man1/ldrm.1: ldrm.1
install -c -o root ldrm.1 $(DEST)/man1
$(DEST)/man1/ldq.1: ldq.1
install -c -o root ldq.1 $(DEST)/man1
clean:
@//E*O*F man/Makefile//
chmod u=r,g=r,o=r man/Makefile
echo x - man/ldc.8
sed 's/^@//' > "man/ldc.8" <<'@//E*O*F man/ldc.8//'
@.TH LDC 8 "24 January 1985"
@.UC 4
@.ad
@.SH NAME
ldc \- load system control program
@.SH SYNOPSIS
@.B /etc/ldc
[ command [ argument ... ] ]
@.SH DESCRIPTION
@.I Ldc
is used by the system administrator to control the
operation of the load control system, by sending commands to
@.I ldd
(the load control server daemon).
@.I Ldc
may be used to:
@.IP \(bu
list all the queued jobs owned by a single user,
@.IP \(bu
list all the jobs in the queue,
@.IP \(bu
list the current settings of changeable load control server parameters,
@.IP \(bu
abort the load control server,
@.IP \(bu
delete a job from the queue (specified by pid or by user name),
@.IP \(bu
purge the queue of all jobs,
@.IP \(bu
rearrange the order of queued jobs,
@.IP \(bu
run a job regardless of the system load (specified by pid or user name),
@.IP \(bu
change the load average at which jobs will be queued,
@.IP \(bu
change the limit on the number of jobs in queue,
@.IP \(bu
change the number of seconds between each check on the load average,
@.IP \(bu
print the contents of the servers error logging file,
@.IP \(bu
change the maximum time limit that a job can be queued.
@.PP
Without any arguments,
@.I ldc
will prompt for commands from the standard input.
If arguments are supplied,
@.IR ldc
interprets the first argument as a command and the remaining
arguments as parameters to the command. The standard input
may be redirected causing
@.I ldc
to read commands from a file.
Commands may be abbreviated, as any unique prefix of a command will be
accepted.
The following is the list of recognized commands.
@.TP
? [ command ... ]
@.TP
help [ command ... ]
@.br
Print a short description of each command specified in the argument list,
or, if no arguments are given, a list of the recognized commands.
@.TP
abort server
@.br
Terminate the load control server.
This does
@.I not
terminate currently queued jobs, which will run when they
next poll the server (usually every 10 minutes).
If the server is restarted these jobs will be inserted into the queue ordered
by the time at which the job was started.
Jobs will
@.I not
be lost by aborting the server.
Both words "abort server" must by typed (or a unique prefix) as a safety
measure.
Only root can execute this command.
@.TP
delete [\f2pids\f1] [-u \f2users\f1]
@.br
This command has two modes. It will delete jobs listed by pid, or with the
@.B \-u
option delete all the jobs owned by the listed users.
Job that are removed from the queue will exit returning status 1 (they
do not run).
Users can only delete jobs they own from the queue, while root can delete any
job.
@.TP
errors
@.br
Print the contents of the load control server error logging file.
@.TP
list [\f2user\f1]
@.br
This will list the contents of the queue, showing each jobs rank, pid,
owner, time in queue, and a abbreviated line of the command to be executed
for the specified user. If no user is specifies, it defaults to be the
user running the command. (Same as the ldq command).
@.TP
loadlimit \f2value\f1
@.br
Changes the load average at which the load control system begins
to queue jobs to \f2value\f1.
Only root can execute this command.
@.TP
longlist
@.br
Same as list except prints ALL the jobs in the queue. This is expensive to
execute. (Same as the ldq -a command).
@.TP
move \f2pid rank\f1
@.br
Moves the process specified by process id
@.I pid
to position
@.I rank
in the queue.
Only root can execute this command.
@.TP
purge all
@.br
Removes ALL the jobs from the queue. Removed jobs terminate returning a
status of 1.
As a safety measure both the words "purge all" (or a prefix of) must be typed.
Only root can execute this command.
@.TP
quit
@.br
Exit from ldc.
@.TP
run [\f2pids\f1] [-u \f2users\f1]
@.br
Forces the jobs with the listed
@.I pids
to be run
@.I regardless
of the system load.
The
@.B \-u
option forces all jobs owned by the listed users to be run regardless
of the system load.
Only root can execute this command.
@.TP
sizeset \f2size\f1
@.br
Sets the limit on the number of jobs that can be in the queue to be
@.I size.
This prevents the unix system process table from running out of slots if
the system is extremely overloaded. All job requests that are made while
the queue is at the limit are rejected and told to try again later.
The default value is 150 jobs.
Only root can execute this command.
@.TP
status
@.br
Prints the current settings of internal load control server variables.
This includes the number of jobs in queue, the load average above which
jobs are queued, the limit on the size of the queue, the time in seconds between
load average checks by the server, the maximum time in seconds a job can be
queued, and the number of recoverable errors detected by the server.
@.TP
timerset \f2time\f1
@.br
Sets the number of seconds that the server waits between system load average
checks to
@.I time.
(Every
@.I time
seconds the server reads the current load average and if it below the load
average limit (see
@.I loadlimit
) the jobs are removed from the front of the queue and told to run).
Only root can execute this command.
@.TP
waitset \f2time\f1
@.br
Sets the maximum number of seconds that a job can be queued regardless
of the system load to
@.I time
seconds.
This will prevent the load control system from backing up with jobs that never
run do to some kind of degenerate condition.
@.SH EXAMPLES
To list the jobs owned by user joe:
@.sp
list joe
@.sp
To move process 45 to position 6 in the queue:
@.sp
move 45 6
@.sp
To delete all the jobs owned by users sam and joe:
@.sp
delete -u sam joe
@.sp
To run jobs with pids 1121, 1177, and 43:
@.sp
run 1121 1177 43
@.SH FILES
@.nf
/usr/spool/ldd/* spool directory where sockets are bound
@.fi
@.SH "SEE ALSO"
ldd(8),
ldrm(1),
ldq(1)
@.SH DIAGNOSTICS
@.nf
@.ta \w'?Ambiguous command 'u
?Ambiguous command abbreviation matches more than one command
?Invalid command no match was found
?Privileged command command can be executed by only by root
@.fi
@//E*O*F man/ldc.8//
chmod u=r,g=r,o=r man/ldc.8
echo x - man/ldd.8
sed 's/^@//' > "man/ldd.8" <<'@//E*O*F man/ldd.8//'
@.TH LDD 8 "24 January 1985"
@.UC 4
@.ad
@.SH NAME
ldd \- load system server (daemon)
@.SH SYNOPSIS
@.B /etc/ldd
[
@.B \-L
@.I load
] [
@.B \-T
@.I alarm
]
@.SH DESCRIPTION
@.TP
@.B \-L
changes the load average threshold to
@.I load
instead of the default (usually 10).
@.TP
@.B \-T
changes the time (in seconds)
between load average checks to
@.I alarm
seconds instead of the default (usually 60 seconds).
@.PP
@.I Ldd
is the load control server (daemon) and is normally invoked
at boot time from the
@.IR rc.local (8)
file.
The
@.I ldd
server attempts to maintain the system load average
below a preset value so interactive programs like
@.IR vi (1)
remain responsive.
@.I Ldd
works by preventing the system from thrashing
(i.e. excessive paging and high rates of context switches) and decreasing the
systems throughput by limiting the number runnable processes in the system
at a given moment.
When the system load average
is above the threshold,
@.I ldd
will block specific cpu intensive processes from running and place
them in a queue.
These blocked jobs are not runnable and therefore do not
contribute to the system load. When the load average drops below the threshold,
@.I ldd
will remove jobs from the queue and allow them to continue execution.
The system administration determines which programs are
considered cpu intensive and places control of their execution under the
@.I ldd
server.
The system load average is the number of runnable processes,
and is measured by the 1 minute
@.IR uptime (1)
statistics.
@.PP
A front end client program replaces each program controlled by the
@.I ldd
server.
Each time a user requests execution of a controlled program, the
client enters the request state,
sends a "request to run" datagram to the server and waits for a response. The
waiting client is blocked, waiting for the response from the
@.I ldd
server.
If the client does not receive an answer to a request after a certain
period of time has elapsed (usually 90 seconds), the request is resent.
If the request is resent a number of times (usually 3)
without response from the server, the requested program is executed.
This prevents the process from being blocked forever if the
@.I ldd
server fails.
@.PP
The
@.I ldd
server can send one of five different messages to the client.
A "queued message" indicates that the client has
been entered into the queue and should wait.
A "poll message" indicates that the server did not receive a message,
so the client should resend the message.
A "terminate message" indicates that the request cannot be honored
and the client should exit abnormally.
A "run message" indicates the requested program should be run.
A "full message" indicates that the ldd queue is full and this request cannot
be accepted. This limit is to prevent the Unix kernel process table from
running out of slots, since queued processes
still use system process slots.
@.PP
When the server receives a "request to run",
it determines whether the job should run immediately, be rejected,
or be queued.
If the queue is full, the job is rejected and the client exits.
If the queue is not empty, the request is added to the queue,
and the client is sent a "queued message"
The client then enters the queued state
and waits for another command from the server.
If no further commands are received from the server after a preset time
has elapsed (usually 10 minutes),
the client re-enters the request state and resends the request
to the server to ensure that the server has not terminated or
failed since the time the client was queued.
@.PP
If the queue is empty, the server checks the current load average, and
if it is below the threshold, the client is sent a "run message".
Otherwise the server queues the request, sends the client a "queued message",
and starts the interval timer.
The interval timer is bound to a handler that checks the system load every
few seconds (usually 60 seconds).
If the handler finds the current load average is below the threshold,
jobs are removed from the head of the queue and sent a "run message".
The number of jobs sent "run messages" depends on how much the current
load average has dropped below the limit.
If the load average is above the threshold, the handler checks
how long the oldest process has been waiting to run.
If that time is greater than a preset limit (usually 4 hours), the job is
removed from the queue and allowed to run regardless of the load.
This prevents jobs from being blocked forever due to load averages that
remain above the threshold for long periods of time.
If the queue becomes empty, the handler will shut off the interval timer.
@.PP
The
@.I ldd
server logs all recoverable and unrecoverable errors in a logfile. Advisory
locks are used to prevent more than one executing server at a time.
When the
@.I ldd
server first begins execution, it scans the spool directory for clients that
might have been queued from a previous
@.I ldd
server and sends them a "poll request".
Waiting clients will resend their "request to run" message to the new
server, and re-enter the request state.
The
@.I ldd
server will rebuild the queue of waiting tasks
ordered by the time each client began execution.
This allows the
@.I ldd
server to be terminated and be re-started without
loss or blockage of any waiting clients.
@.PP
The environment variable LOAD can be set to "quiet", which will
surpress the output to stderr of the status strings "queued"
and "running" for commands which have been set up to display status.
@.PP
Commands can be sent to the server with the
@.IR ldc (8)
control program. These commands can manipulate the queue and change the
values of the various preset limits used by the server.
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/sr/msgsock 'u
/usr/spool/ldd ldd spool directory
/usr/spool/ldd/sr/msgsock name of server datagram socket
/usr/spool/ldd/sr/cnsock name do server socket or control messages
/usr/spool/ldd/sr/list list of queued jobs (not always up to date)
/usr/spool/ldd/sr/lock lock file (contains pid of server)
/usr/spool/ldd/sr/errors log file of server errors
@.fi
@.SH "SEE ALSO"
ldc(8),
ldq(1),
ldrm(1).
@//E*O*F man/ldd.8//
chmod u=r,g=r,o=r man/ldd.8
echo x - man/ldq.1
sed 's/^@//' > "man/ldq.1" <<'@//E*O*F man/ldq.1//'
@.TH LDQ 1 "24 January 1985"
@.UC 4
@.SH NAME
ldq \- load system queue listing program
@.SH SYNOPSIS
@.B ldq
[
@.I user
] [
@.B \-a
]
@.SH DESCRIPTION
@.I Ldq
is used to print the contents of the queue maintained by the
@.IR ldd (8)
server.
For each job selected by
@.I ldq
to be printed, the rank (position) in the queue, the process id, the owner of
the job, the number of seconds the job has been waiting to run, and the
command line of the job (truncated in length to the first 16 characters)
is printed.
@.PP
With no arguments,
@.I ldq
will print out the status of the jobs in the queue owned by the user running
@.I ldq.
Another users jobs can be printed if that user is specified as an argument
to
@.I ldq.
The
@.B \-a
option will print all the jobs in the queue.
Of course the
@.B \-a
option is much more expensive to run.
@.PP
Users can delete any job they own by using either the
@.IR ldrm (1)
or
@.IR ldc (8)
commands.
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/cl/* 'u
/usr/spool/ldd/cl/* the spool area where sockets are bound
@.fi
@.SH "SEE ALSO"
ldrm(1),
ldc(8),
ldd(8)
@.SH DIAGNOSTICS
This command will fail if the
@.I ldd
server is not executing.
@//E*O*F man/ldq.1//
chmod u=r,g=r,o=r man/ldq.1
echo x - man/ldrm.1
sed 's/^@//' > "man/ldrm.1" <<'@//E*O*F man/ldrm.1//'
@.TH LDRM 1 "24 January 1985"
@.UC 4
@.SH NAME
ldrm \- remove jobs from the load system queue
@.SH SYNOPSIS
@.B ldrm
[
@.I pids
] [
@.B \-u
@.I users
]
@.SH DESCRIPTION
@.I Ldrm
will remove a job, or jobs, from the load control queue.
Since the server is protected, this and
@.IR ldc (8)
are the only ways users can remove jobs from the load control spool (other
than killing the waiting process directly).
When a job is removed, it will terminate returning status 1.
This method is preferred over sending a kill -KILL to the process as the
job will be removed from the queue, and will no longer appear in
lists produced by
@.IR ldq (1)
or
@.IR ldc (8).
@.PP
@.I Ldrm
can remove jobs specified either by pid or by user name.
With the
@.B \-u
flag,
@.I ldrm
expects a list of users who will have all their jobs removed from the
load control queue.
When given a list of pid's,
@.I ldrm
will remove those jobs from the queue.
A user can only remove jobs they own, while root can remove any job.
@.SH EXAMPLES
To remove the two jobs with pids 8144 and 47:
@.sp
ldrm 8144 47
@.sp
To remove all the jobs owned by the users joe and sam:
@.sp
ldrm -u joe sam
@.SH FILES
@.nf
@.ta \w'/usr/spool/ldd/cl/* 'u
/usr/spool/ldd/cl/* directory where sockets are bound
@.fi
@.SH "SEE ALSO"
ldq(1),
ldc(8),
ldd(8)
@.SH DIAGNOSTICS
``Permission denied" if the user tries to remove files other than his
own.
@//E*O*F man/ldrm.1//
chmod u=r,g=r,o=r man/ldrm.1
echo Inspecting for damage in transit...
temp=/tmp/shar$$; dtemp=/tmp/.shar$$
trap "rm -f $temp $dtemp; exit" 0 1 2 3 15
cat > $temp <<\!!!
182 1518 9101 README
14 96 613 NOTICE
25 76 502 Makefile
27 52 439 Makefile
215 1075 5877 ldc.8
168 1045 6106 ldd.8
55 221 1145 ldq.1
59 261 1362 ldrm.1
745 4344 25145 total
!!!
wc README NOTICE Makefile man/Makefile man/ldc.8 man/ldd.8 man/ldq.1 man/ldrm.1 | sed 's=[^ ]*/==' | diff -b $temp - >$dtemp
if [ -s $dtemp ]
then echo "Ouch [diff of wc output]:" ; cat $dtemp
else echo "No problems found."
fi
exit 0