xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/10/90)
I'm running an Intel hypercube simulator and am having trouble getting the
forked processes it generates to run on individual CEs (they always
want to run on the complex). I've tried running the simulator using the
'execute -ce' command, but any generated processes (it generates a process for
every node of the hypercube that you simulate) still run on the complex. (?)

Anyone have any ideas? The scheduler is set:

setsched IP 11 -t 40 4 3 -t 20 3 4
setsched CL 0 -td 30 3 2 1 -t 30 3 2 1
setsched CE 0 -t 30 2 3 -t 30 3 2
setsched CE 1 -t 30 2 3 -t 30 3 2
(etc..)

setcomplex CL 0 -d1 -c1

-rich
--
-----------------------------------------------------------------------------
Rich Rinehart                 | phone: 216-433-5211
NASA Lewis Research Center    | email: xxrich@alliant1.lerc.nasa.gov
-----------------------------------------------------------------------------
xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/13/90)
In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>
>I'm running an intel hypercube simulator and am having trouble getting the
>forked processes it generates to run on individual ce's (they always
>want to run on the complex). I've tried running the simulator using the
>'execute -ce' command, but any generated processes (it generates a process for
>every node of the hypercube that you simulate) still run on the complex. (?)
>
>Anyone have any ideas? The scheduler is set:
>
>setsched IP 11 -t 40 4 3 -t 20 3 4
>setsched CL 0 -td 30 3 2 1 -t 30 3 2 1
>setsched CE 0 -t 30 2 3 -t 30 3 2
>setsched CE 1 -t 30 2 3 -t 30 3 2
>(etc..)
>
>setcomplex CL 0 -d1 -c1
>

Thanks for all the email responses to my posting. Patrick Wolfe suggested
using the -nc option on the link, which did the trick.

What I can't understand, though, is that I had specified -Ogv on the link,
thinking that it would notice that I did not specify concurrency and NOT
run me on the complex!! Wouldn't this make sense?? If I don't specify
concurrency, why default me to the complex?
cantrell@Alliant.COM (Paul Cantrell) (08/13/90)
In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>I'm running an intel hypercube simulator and am having trouble getting the
>forked processes it generates to run on individual ce's (they always
>want to run on the complex). I've tried running the simulator using the
>'execute -ce' command, but any generated processes (it generates a process for
>every node of the hypercube that you simulate) still run on the complex. (?)
>

How about a little more information on how it generates the processes. Does
it simply fork, or does it fork/exec, or does it do a 'system()' call? If
it simply forks, things should be fine. However, if you do an exec of another
executable, that process will then be set to run on the complex if it has
been compiled that way. The same would be true of the unix 'system()' call,
since this actually exec()'s a shell, and then the target.

The easiest thing is probably to just compile the program(s) with -Ogv
so that it generates vector code, but not concurrency code. Then it should
want to run on CE's, instead of complexes.

                                        PC
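
To picture the fork versus fork/exec distinction described above, here is a
toy Python model. It is only a sketch of the behavior as this post describes
it, not of the real Alliant OS: the Executable and Process classes, the
wants_complex flag, and the placement field are hypothetical names made up
for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Executable:
    """A program file; the flag stands in for what its header records."""
    name: str
    wants_complex: bool  # assumption: one boolean "built with concurrency"


@dataclass(frozen=True)
class Process:
    image: Executable
    placement: str  # "ce" or "complex" (hypothetical labels)

    def fork(self) -> "Process":
        # fork() duplicates the caller: same image, same placement.
        return Process(self.image, self.placement)

    def exec(self, new_image: Executable) -> "Process":
        # exec() replaces the image, so placement is re-derived from
        # the new file's header (as the post describes).
        placement = "complex" if new_image.wants_complex else "ce"
        return Process(new_image, placement)

    def system(self, shell: Executable, target: Executable) -> "Process":
        # system() forks and execs a shell, which then runs the target.
        return self.fork().exec(shell).fork().exec(target)


simulator = Executable("simulator", wants_complex=True)
node = Executable("node", wants_complex=True)   # built with concurrency
shell = Executable("sh", wants_complex=False)

# Parent forced onto a single CE, e.g. by something like 'execute -ce'.
parent = Process(simulator, placement="ce")

print(parent.fork().placement)               # ce
print(parent.fork().exec(node).placement)    # complex
print(parent.system(shell, node).placement)  # complex
```

In this model a plain fork keeps whatever placement the parent had, while
any path that passes through exec ends up wherever the target's header says.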
xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/14/90)
In article <4059@alliant.Alliant.COM> cantrell@alliant.Alliant.COM (Paul Cantrell) writes:
>In article <1990Aug10.112530.720@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>>I'm running an intel hypercube simulator and am having trouble getting the
>>forked processes it generates to run on individual ce's (they always
>>want to run on the complex). I've tried running the simulator using the
>>'execute -ce' command, but any generated processes (it generates a process for
>>every node of the hypercube that you simulate) still run on the complex. (?)
>>
>
>How about a little more information on how it generates the processes. Does
>it simply fork, or does it fork/exec, or does it do a 'system()' call? If

At a quick glance, it looks like it just does a 'fork()'.

>it simply forks, things should be fine. However, if you do an exec of another
>executable, that process will then be set to run on the complex if it has
>been compiled that way. The same would be true of the unix 'system()' call,
>since this actually exec()'s a shell, and then the target.
>
>The easiest thing is probably to just compile the program(s) with -Ogv
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Glad to hear you say that, as that is what I thought too. -Ogv is not enough
on the link, though, as -nc is needed (not intuitively obvious).

ex: via makefiles.....

fortran -c node.f
fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
-wants to run on the complex

fortran -c node.f
fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
-makes it run on individual ce's, which is what I wanted.
dereks@aggie.sgi.com (Derek Spears) (08/15/90)
        Just specifying -Ogv will generate vector code and no concurrency code.
However, the scheduler you posted still had classes 2 and 3 on the
complex. Therefore, when the complex switches on during its time slice,
it will see that it has a class 2 job (vector) and try to run it. I
agree that it is not the most intuitive approach, but that is how the
Alliant scheduler does things...

Derek Spears            | dereks@aggie.sgi.com
Silicon Graphics, Inc.  | (415) 335-7211
                        | Yes, aggie as in Texas Aggie
cantrell@Alliant.COM (Paul Cantrell) (08/16/90)
In article <1990Aug14.115752.23746@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>via makefiles.....
>
>fortran -c node.f
>fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
>-wants to run on the complex
>
>fortran -c node.f
>fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
>-makes it run on individual ce's, which is what I wanted.

Right. But what I would have suggested would be:

fortran -Ogv -c node.f
fortran -o node node.o

which would avoid generating any concurrency instructions at all.

First thing to realize here is how the "fortran" or "cc" commands work
under unix. They try to do any compilations necessary, and then invoke
the linker for you. Many other operating systems require you to invoke
the linker by yourself. This feature of unix annoyed me the first few
weeks I used it until I got used to it.

What this means is that the line:

fortran -c node.f

actually invokes the compiler to take node.f and compile it into a
relocatable binary. Since you didn't specify any level of optimization,
it assumed -Ogvc (the most aggressive). It generated node.o, which may
contain both vector and concurrency instructions.

Then you issued the command:

fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a

Since there are no .f files to be compiled in this line, the fortran
compiler actually never gets invoked, and the command just causes the
linker to run. The -Ogv switch gets ignored by the linker, which then
proceeds to build "n" out of "node.o" and the library. Since "node.o"
contains concurrency instructions, the linker sets the "n" executable
file to require concurrency hardware, thus the OS will try to run it
on a complex if one exists.

So the basic problem you had was not specifying the -Ogv switch at the
time that the program got compiled.
The reason that the -nc switch caused
the behavior that you desired was that this is indeed a linker switch
which caused the executable header to be bashed, indicating no
concurrency instructions in the file, when indeed there really are some
in the program. When the OS loads the program to run on a CE, the CE's
concurrency hardware will be turned off so that the embedded concurrency
instructions will act as nops.

There may be a very minor performance penalty since you are executing
extra concurrency instructions. Just how much impact this has on
performance depends a lot on the exact structure of the loops. In
general, I would expect it to be fairly minor.

                                        Paul Cantrell
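
The compile-then-link behavior explained above can be sketched as a toy
Python model. This is an illustration of the reasoning in this post only,
not the real Alliant driver or linker: compile_source, link, ObjectFile and
the has_concurrency field are hypothetical names, the rule "a 'c' in the -O
flag means concurrency code" is a simplifying assumption, and libraries such
as bsimlib.a are ignored.

```python
from dataclasses import dataclass

DEFAULT_OPT = "-Ogvc"  # per the post: assumed when no -O level is given


@dataclass(frozen=True)
class ObjectFile:
    name: str
    has_concurrency: bool  # hypothetical stand-in for the object's contents


def compile_source(source: str, opt: str = DEFAULT_OPT) -> ObjectFile:
    """Model of the compile step (only runs for .f inputs)."""
    # Toy rule (assumption): a "c" after "-O" enables concurrency code.
    return ObjectFile(source[:-2] + ".o", has_concurrency="c" in opt[2:])


def link(args: list, objects: dict) -> bool:
    """Return True if the executable is marked as needing concurrency."""
    # -O flags are ignored here; only the objects and -nc matter.
    needs = any(objects[a].has_concurrency for a in args if a.endswith(".o"))
    return needs and "-nc" not in args


# fortran -c node.f ; fortran -Ogv -o n node.o
objs = {"node.o": compile_source("node.f")}
print(link(["-Ogv", "-o", "n", "node.o"], objs))            # True

# fortran -c node.f ; fortran -Ogv -nc -o node node.o
print(link(["-Ogv", "-nc", "-o", "node", "node.o"], objs))  # False

# fortran -Ogv -c node.f ; fortran -o node node.o
objs = {"node.o": compile_source("node.f", "-Ogv")}
print(link(["-o", "node", "node.o"], objs))                 # False
```

The last two cases both end up unmarked, but for different reasons: -nc
overrides the header even though the object still holds concurrency
instructions, while compiling with -Ogv never generates them.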
cantrell@Alliant.COM (Paul Cantrell) (08/16/90)
Disclaimer: The following is a non-official reply, don't take any of it as
official Alliant policy or position or my boss will yell at me ;-)

In article <1990Aug14.194125.9771@odin.corp.sgi.com> dereks@aggie.sgi.com (Derek Spears) writes:
>       Just specifying -Ogv will generate vector code and no concurrency code.

Correct.

>However, the scheduler you posted still had classes 2 and 3 on the
>complex. Therefore, when the complex switches on during its time slice,
>it will see that it has a class 2 job (vector) and try to run it. I
>agree that it is not the most intuitive approach, but that is how the
>Alliant scheduler does things...

Correct again, but only if there are no concurrent (class 1) jobs to run,
since they would get priority over the class 2 and 3 jobs. The idea here
is that if the complex got put together to run concurrent jobs, runs them
all to completion and has nothing else to do, it may as well run class 2
or class 3 jobs until the end of the resource timeslice. Note that when it
does this, only one of the members of the complex actually runs user code,
and the other members just hang out idle. So it is acting exactly like a
CE in this case.

You might ask yourself, why is the complex getting put together if there
are no complex jobs to run? The answer is that the -tc switch in the
scheduling vector says that if there are any concurrent jobs at all, put
the complex together. It also says that if there are no concurrent jobs
and no non-concurrent jobs (i.e. no jobs at all) go ahead and put the
complex together anyway. If non-concurrent jobs then become runnable, they
will run on the complex until the end of the timeslice, at which point it
would probably explode.

The reason for putting it together if there are no jobs at all is a
response issue. It's fairly time consuming to put the complex together,
and easy to take the complex apart, so this switch just anticipates that
complex jobs may become ready to run in the near future.

If you run a mix of concurrent and non-concurrent jobs, you might want to
try something like:

setcomplex cl 0 -c1 -d8
setsched cl 0 -td 35 1 2 3 -t 35 1 3 2

the result of which is that during the first (-td) timeslice, the complex
will get put together if there are any concurrent jobs, as long as there
are fewer than 8 non-concurrent jobs. During the second (-t) timeslice, if
there are more concurrent jobs than non-concurrent, the complex will be
put together, otherwise it will be exploded to run as detached CEs.

This way, concurrent jobs get at least 50% of the system up until there
are more than 8 outstanding non-concurrent jobs, at which point the
complex jobs will get ignored until the backlog of non-concurrent jobs is
decreased. However, the complex will never get put together when there are
no concurrent jobs to run.

One final note: many people are confused by the complex schedule. The
classes in each complex timeslice only get used if the complex is put
together for that timeslice. If the complex is exploded, each individual
CE's schedule controls what gets run, and the classes in the complex
scheduling vector are ignored.

Hope you find this somewhat helpful.

                                        Paul Cantrell
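
The complex-formation rules described in this post can be written down as a
small Python sketch. It encodes only what the prose above says about the
-tc, -td and -t timeslice types; it is not the real Alliant scheduler. The
function name complex_formed and the parameters min_concurrent and
max_nonconcurrent are hypothetical, and mapping them to -c1 and -d8 is one
reading of the example, not something taken from the setcomplex
documentation.

```python
def complex_formed(kind: str, concurrent: int, nonconcurrent: int,
                   min_concurrent: int = 1,
                   max_nonconcurrent: int = 8) -> bool:
    """Decide whether the complex is put together for one timeslice.

    kind is "tc", "td" or "t"; the two counts are runnable jobs.
    """
    if kind == "tc":
        # Any concurrent job at all, or a completely idle machine.
        return concurrent > 0 or nonconcurrent == 0
    if kind == "td":
        # Some concurrent work, and the non-concurrent backlog is small.
        return (concurrent >= min_concurrent
                and nonconcurrent < max_nonconcurrent)
    if kind == "t":
        # Concurrent jobs must outnumber non-concurrent ones.
        return concurrent > nonconcurrent
    raise ValueError(f"unknown timeslice kind: {kind}")


# One concurrent job and a small backlog: formed in the -td slice only.
print(complex_formed("td", concurrent=1, nonconcurrent=3))  # True
print(complex_formed("t", concurrent=1, nonconcurrent=3))   # False

# No concurrent jobs: never formed under -td or -t, but -tc forms
# the complex on an idle machine.
print(complex_formed("td", concurrent=0, nonconcurrent=0))  # False
print(complex_formed("tc", concurrent=0, nonconcurrent=0))  # True
```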
xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) (08/16/90)
In article <4067@alliant.Alliant.COM> cantrell@alliant.Alliant.COM (Paul Cantrell) writes:
>In article <1990Aug14.115752.23746@eagle.lerc.nasa.gov> xxrich@alliant1.lerc.nasa.gov (Rich Rinehart) writes:
>>via makefiles.....
>>
>>fortran -c node.f
>>fortran -Ogv -o n node.o /usr/local/intel/bsimlib.a
>>-wants to run on the complex
>>
>>fortran -c node.f
>>fortran -Ogv -nc -o node node.o /usr/local/intel/bsimlib.a
>>-makes it run on individual ce's, which is what I wanted.
>
>Right. But what I would have suggested would be:
>
>fortran -Ogv -c node.f
>fortran -o node node.o

[stuff deleted]

>So the basic problem you had was not specifying the -Ogv switch at the
>time that the program got compiled. The reason that the -nc switch caused
>the behavior that you desired was that this is indeed a linker switch

Thanks Paul for your tutorial, however I believe that if someone puts
-Ogv in the link phase it ought to be recognized as -nc. It is certainly
less cumbersome for a user to carry around 1 set of options than 2, and it
is more intuitive. Getting it wrong can also make your machine perform
and look VERY bad. I also don't expect a general user to understand
everything that is happening at this level to know to use 2 different sets
of options. Additionally, the fact that the linker doesn't complain about
-Ogv gives a false sense of security.

[stuff deleted]
pmontgom@euphemia.math.ucla.edu (Peter Montgomery) (08/17/90)
C     Even unoptimized programs can use concurrency
C     instructions.  For example, this program generates
C     calls to library function _vsqrt_fortran,
C     which tries to do the square roots in parallel.
C     At UCLA, it runs 3.2 times as fast using
C     a cluster of 6 processors as when running detached.
C     (0.5 vs. 1.6 seconds on an FX/80 with 8 ACEs).

      program test
      implicit none
      integer VECLNG, i, j
      parameter (VECLNG = 5000)
      real vec(VECLNG), tarray(2), ETIME, tbeg, tend
      common vec
      intrinsic SQRT

      tbeg = ETIME(tarray)
      do i = 1, VECLNG
          vec(i) = i
      end do
      do j = 1, 100
          vec = SQRT(vec)           ! Square root of vector
      end do
      tend = ETIME(tarray)
      print *, 'Execution time = ', tend - tbeg
      end
--
        Peter L. Montgomery
        pmontgom@MATH.UCLA.EDU
        Department of Mathematics, UCLA, Los Angeles, CA 90024-1555
If I spent as much time on my dissertation as I do reading news, I'd graduate.
tj@Alliant.COM (Tom Jaskiewicz) (08/23/90)
In article <273@kaos.MATH.UCLA.EDU> pmontgom@euphemia.math.ucla.edu (Peter Montgomery) writes:
>C     Even unoptimized programs can use concurrency
>C     instructions.  For example, this program generates
>   . . .

Yes, this is the correct answer. Any FORTRAN program will use the standard
FORTRAN library, and any part of this library can use concurrency. For
example, most FORTRAN READ and WRITE statements invoke library routines
that use concurrency.
--
##########################################################################
# The doctrine of nonresistance against arbitrary power, and oppression is
# absurd, slavish, and destructive of the good and happiness of mankind.
#   -- Article 10, Part First, Constitution of New Hampshire