[comp.sys.alliant] job classes

xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/10/90)

I'm running an Intel hypercube simulator and am having trouble getting the
forked processes it generates to run on individual ce's (they always
want to run on the complex).  I've tried running the simulator using the
'execute -ce' command, but any generated processes (it generates a process for
every node of the hypercube that you simulate) still run on the complex. (?)

Anyone have any ideas?  The scheduler is set: 

setsched IP 11 -t  40  4  3 -t  20  3  4 
setsched CL  0 -td 30  3  2  1 -t  3  2  1 
setsched CE  0 -t  30  2  3 -t  30  3  2 
setsched CE  1 -t  30  2  3 -t  30  3  2 
(etc..)

setcomplex CL 0 -d1 -c1

-rich


--
-----------------------------------------------------------------------------
Rich Rinehart                  |     phone: 216-433-5211
NASA Lewis Research Center     |     email: xxrich@alliant1.lerc.nasa.gov
-----------------------------------------------------------------------------

xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/13/90)

In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>
>I'm running an intel hypercube simulator and am having trouble getting the
>forked processes it generates to run on individual ce's (they always
>want to run on the complex).  I've tried running the simulator using the
>'execute -ce` command, but any generated processes (it generates a process for
>every node of the hypercube that you simulate) still run on the complex. (?)
>
>Anyone have any ideas?  The scheduler is set: 
>
>setsched IP 11 -t  40  4  3 -t  20  3  4 
>setsched CL  0 -td 30  3  2  1 -t 30 3  2  1 
>setsched CE  0 -t  30  2  3 -t  30  3  2 
>setsched CE  1 -t  30  2  3 -t  30  3  2 
>(etc..)
>
>setcomplex CL 0 -d1 -c1
>

Thanks for all the email responses to my posting.  Patrick Wolfe suggested
using the -nc option on the link, which did the trick.  What I can't understand,
though, is that I had specified -Ogv on the link, thinking that it would notice
that I did not specify concurrency and NOT run me on the complex!  Wouldn't
this make sense?  If I don't specify concurrency, why default me to the complex?


cantrell@Alliant.COM (Paul Cantrell) (08/13/90)

In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>I'm running an intel hypercube simulator and am having trouble getting the
>forked processes it generates to run on individual ce's (they always
>want to run on the complex).  I've tried running the simulator using the
>'execute -ce` command, but any generated processes (it generates a process for
>every node of the hypercube that you simulate) still run on the complex. (?)
>

How about a little more information on how it generates the processes? Does
it simply fork, or does it fork/exec, or does it do a 'system()' call? If
it simply forks, things should be fine. However, if it does an exec of another
executable, that process will then be set to run on the complex if it has
been compiled that way. The same would be true of the unix 'system()' call,
since this actually exec()'s a shell, and then the target.

The easiest thing is probably to just compile the program(s) with -Ogv
so that it generates vector code, but not concurrency code. Then it
should want to run on CE's, instead of complexes.

					PC

xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/14/90)

In article <4059@alliant.Alliant.COM> cantrell@alliant.Alliant.COM (Paul Cantrell) writes:
>In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>>I'm running an intel hypercube simulator and am having trouble getting the
>>forked processes it generates to run on individual ce's (they always
>>want to run on the complex).  I've tried running the simulator using the
>>'execute -ce` command, but any generated processes (it generates a process for
>>every node of the hypercube that you simulate) still run on the complex. (?)
>>
>
>How about a little more information on how it generates the processes. Does
>it simply fork, or does it fork/exec, or does it do a 'system()' call? If

At a quick glance, it looks like it just does a 'fork()'.

>it simply forks, things should be fine. However, if you do an exec of another
>executable, that process will then be set to run on the complex if it has
>been compiled that way. The same would be true of the unix 'system()' call,
>since this actually exec()'s a shell, and then the target.
>
>The easiest thing is probably to just compile the program(s) with -Ogv
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Glad to hear you say that, as that is what I thought too.  -Ogv is not enough
on the link though, as -nc is needed.  (Not intuitively obvious.)

ex:
via makefiles.....

fortran -c node.f
fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
-wants to run on the complex

fortran -c node.f
fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
-makes it run on individual ce's, which is what I wanted.


dereks@aggie.sgi.com (Derek Spears) (08/15/90)

	Just specifying -Ogv will generate vector code and no concurrency code.
However, the scheduler you posted still had classes 2 and 3 on the
complex. Therefore, when the complex switches on during its time slice,
it will see that it has a class 2 job (vector) and try to run it. I
agree that it is not the most intuitive approach, but that is how the
Alliant scheduler does things...

	Derek Spears			| dereks@aggie.sgi.com
	Silicon Graphics, Inc.		|
	(415) 335-7211			| Yes, aggie as in Texas Aggie

cantrell@Alliant.COM (Paul Cantrell) (08/16/90)

In article <1990Aug14.115752.23746@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>via makefiles.....
>
>fortran -c node.f
>fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
>-wants to run on the complex
>
>fortran -c node.f
>fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
>-makes it run on individual ce's, which is what i wanted.

Right. But what I would have suggested would be:

fortran -Ogv -c node.f
fortran -o node node.o

which would avoid generating any concurrency instructions at all.
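Recast as a hypothetical Makefile fragment (filenames and the library path are taken from the thread; the key point is that -Ogv belongs on the compile line, where code generation happens, not on the link line, where it is ignored):

```make
# Pass -Ogv at COMPILE time so node.o contains vector but no
# concurrency instructions; the link line then needs no -nc,
# because there are no concurrency instructions to mask.
FFLAGS = -Ogv

node: node.o
	fortran -o node node.o /usr/local/intel/bsimlib.a

node.o: node.f
	fortran $(FFLAGS) -c node.f
```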

First thing to realize here is how the "fortran" or "cc" commands work
under unix. They try to do any compilations necessary, and then invoke
the linker for you. Many other operating systems require you to invoke
the linker by yourself. This feature of unix annoyed me the first few
weeks I used it until I got used to it.

What this means is that the line:

	fortran -c node.f

actually invokes the compiler to take node.f and compile it into a
relocatable binary. Since you didn't specify any level of optimization,
it assumed -Ogvc (the most aggressive). It generated node.o which contains
possibly both vector and concurrency instructions.

Then you issued the command:

	fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a

Since there are no .f files to be compiled in this line, the fortran
compiler actually never gets invoked, and the command just causes the
linker to run. The -Ogv command gets ignored by the linker, which then
proceeds to build "n" out of "node.o" and the library. Since "node.o"
contains concurrency instructions, the linker sets the "n" executable
file to require concurrency hardware, thus the OS will try to run it on
a complex if one exists.

So the basic problem you had was not specifying the -Ogv switch at the
time that the program got compiled. The reason that the -nc switch caused
the behavior that you desired was that this is indeed a linker switch
which caused the executable header to be bashed, indicating no concurrency
instructions in the file, when indeed there really are some in the program.

When the OS loads the program to run on a CE, the CE's concurrency hardware
will be turned off so that the embedded concurrency instructions will act
as nops. There may be a very minor performance penalty since you are executing
extra concurrency instructions. Just how much impact this has on performance
depends a lot on the exact structure of the loops. In general, I would expect
it to be fairly minor.

					Paul Cantrell

cantrell@Alliant.COM (Paul Cantrell) (08/16/90)

Disclaimer: The following is a non-official reply, don't take any of it as
	official Alliant policy or position or my boss will yell at me ;-)

In article <1990Aug14.194125.9771@odin.corp.sgi.com> dereks@aggie.sgi.com (Derek Spears) writes:
>	Just specifying -Ogv wil generate vector code and no cocurrency code.

Correct.

>However, the scheduler you posted still had classes 2 and 3 on the
>complex. Therefore, when the complex switches on during its time slice,
>it will see that it has a class 2 job (vector) and try to run it. I
>agree that it is not the most intuitive apporach, but that is how the
>Alliant scheduler does things...

Correct again, but only if there are no concurrent (class 1) jobs to run,
since they would get priority over the class 2 and 3 jobs. The idea here
is that if the complex got put together to run concurrent jobs, runs them
all to completion and has nothing else to do, it may as well run class 2
or class 3 jobs until the end of the resource timeslice.

Note that when it does this, only one of the members of the complex actually
runs user code, and the other members just hang out idle. So it is acting
exactly like a CE in this case.

You might ask yourself, why is the complex getting put together if there are
no complex jobs to run?

The answer is that the -tc switch in the scheduling vector says that if there
are any concurrent jobs at all, put the complex together. It also says that if
there are no concurrent jobs and no non-concurrent jobs (i.e. no jobs at all)
go ahead and put the complex together anyway. If non-concurrent jobs then
become runnable, they will run on the complex until the end of the timeslice
at which point it would probably explode.

The reason for putting it together if there are no jobs at all is a
response issue. It's fairly time consuming to put the complex together,
and easy to take the complex apart, so this switch just anticipates
that complex jobs may become ready to run in the near future.

If you run a mix of concurrent and non-concurrent jobs, you might want to
try something like:

	setcomplex cl 0 -c1 -d8
	setsched cl 0 -td 35 1 2 3 -t 35 1 3 2

the result of which is that during the first (-td) timeslice, the complex
will get put together if there are any concurrent jobs, as long as there
are less than 8 non-concurrent jobs.

During the second (-t) timeslice, if there are more concurrent jobs than
non-concurrent, the complex will be put together, otherwise it will be
exploded to run as detached CEs.

This way, concurrent jobs get at least 50% of the system up until there are
more than 8 outstanding non-concurrent jobs, at which point the complex
jobs will get ignored until the backlog of non-concurrent jobs is decreased.
However, the complex will never get put together when there are no concurrent
jobs to run.

One final note: many people are confused by the complex schedule. The
classes in each complex timeslice only get used if the complex is put
together for that timeslice. If the complex is exploded, each individual
CE's schedule controls what gets run, and the classes in the complex
scheduling vector are ignored.

Hope you find this somewhat helpful.

					Paul Cantrell

xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/16/90)

In article <4067@alliant.Alliant.COM> cantrell@alliant.Alliant.COM (Paul Cantrell) writes:
>In article <1990Aug14.115752.23746@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>>via makefiles.....
>>
>>fortran -c node.f
>>fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
>>-wants to run on the complex
>>
>>fortran -c node.f
>>fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
>>-makes it run on individual ce's, which is what i wanted.
>
>Right. But what I would have suggested would be:
>
>fortran -Ogv -c node.f
>fortran -o node node.o


[stuff deleted]

>So the basic problem you had was not specifying the -Ogv switch at the
>time that the program got compiled. The reason that the -nc switch caused
>the behavior that you desired was that this is indeed a linker switch

Thanks Paul for your tutorial; however, I believe that if someone puts -Ogv in
the link phase it ought to be recognized as -nc.  It is certainly less
cumbersome for a user to carry around one set of options than two, and it is
more intuitive.  Getting it wrong can also make your machine perform and look
VERY bad.  I also don't expect a general user to understand everything that
is happening at this level, or to know to use two different sets of options.
Additionally, the fact that the linker doesn't complain about -Ogv gives a
false sense of security.


[stuff deleted]

pmontgom@euphemia.math.ucla.edu (Peter Montgomery) (08/17/90)

C		Even unoptimized programs can use concurrency 
C		instructions.  For example, this program generates
C		calls to library function _vsqrt_fortran, 
C		which tries to do the square roots in parallel.
C		At UCLA, it runs 3.2 times as fast using 
C		a cluster of 6 processors as when running detached.
C		(0.5 vs. 1.6 seconds on an FX/80 with 8 ACEs).

	program test
	implicit none
	integer VECLNG, i, j
	parameter (VECLNG = 5000)
	real vec(VECLNG), tarray(2), ETIME, tbeg, tend
	common vec
	intrinsic SQRT

	tbeg = ETIME(tarray)
	do i = 1, VECLNG
	    vec(i) = i
	end do

	do j = 1, 100
	    vec = SQRT(vec)		! Square root of vector
	end do

	tend = ETIME(tarray)
	print *, 'Execution time = ', tend - tbeg
	end
--
        Peter L. Montgomery 
        pmontgom@MATH.UCLA.EDU 
        Department of Mathematics, UCLA, Los Angeles, CA 90024-1555
If I spent as much time on my dissertation as I do reading news, I'd graduate.

tj@Alliant.COM (Tom Jaskiewicz) (08/23/90)

In article <273@kaos.MATH.UCLA.EDU> pmontgom@euphemia.math.ucla.edu (Peter Montgomery) writes:
>C		Even unoptimized programs can use concurrency 
>C		instructions.  For example, this program generates
> . . .

Yes, this is the correct answer.  Any FORTRAN program will use the standard
fortran library, and any part of this library can use concurrency.  For
example, most FORTRAN READ and WRITE statements invoke library routines
that use concurrency.
-- 
##########################################################################
# The doctrine of nonresistance against arbitrary power, and oppression, is
# absurd, slavish, and destructive of the good and happiness of mankind.
#   -- Article 10, Part First, Constitution of New Hampshire