gustav@arp.anu.edu.au (Zdzislaw Meglicki) (02/21/91)
I have encountered a problem with pmake in ISIS 2.1, which I have traced down (after being coerced into it by Ken) to what looks like a haphazard use of pointers. In the file .../isisv2.1/demos/pmk/pmkio.c you'll find the following (line 503):

    if ((int) (stream=fopen(p_fname,"w"))< 0) {
        printf("%s: ", p_fname);
        perror("pmkio open error 2");
        return;
    }

and further on (line 661):

    if ((int)(stream=fopen(p_fname,"w"))< 0) {
        printf("%s: ", p_fname);
        perror("pmkio open error 3");
        return;
    }

Under 4.3BSD and SunOS fopen returns a NULL pointer on failure. This means that a genuine failure will not be detected by the test above, and if a correctly returned pointer happens to yield a negative integer when cast to int, a failure will be "detected" where there is none. This is exactly what was happening when I attempted to run pmake on my systems. The files were created, but they were left empty, because the functions write_graph and show_run were returning prematurely.

Another problem was caused by the fact that printf on stdout is mixed with perror, which prints and flushes on stderr. This resulted in the error messages being garbled.

The fix to the above is (line 503):

    if (!(stream=fopen(p_fname,"w"))) {
        fprintf(stderr, "%s: cannot open for writing\n", p_fname);
        fflush(stderr);
        perror("pmkio open error 2");
        fprintf(stderr, "uid = %d, euid = %d, gid = %d, egid = %d\n",
                getuid(), geteuid(), getgid(), getegid());
        fflush(stderr);
        return;
    } else {
        fprintf(stderr, "%s: successfully opened for writing\n", p_fname);
        fflush(stderr);
    }

And likewise for line 661. Note that, being a bit of a voyeur, I also added messages for a successful opening, which you should probably comment out or remove after making sure that things work properly.

At this stage pmake began to open graph files and write on them correctly, but now another problem emerged.
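The same check can be sketched as a small reusable helper. This is only an illustration under assumptions: open_for_write is a hypothetical name, not something in pmkio.c. It tests fopen()'s result against NULL only (never casting the pointer to int), writes all diagnostics to stderr, and saves errno before calling fprintf(), since library calls made between the failure and a later perror() may overwrite errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of pmake): open fname for writing.
 * fopen() signals failure by returning NULL, so that is the only test;
 * the FILE pointer is never cast to int. Diagnostics go to stderr only,
 * so they cannot interleave with buffered output on stdout. */
static FILE *open_for_write(const char *fname, const char *tag)
{
    assert(fname != NULL && tag != NULL);

    FILE *stream = fopen(fname, "w");
    if (stream == NULL) {
        int saved = errno;  /* fprintf() below may overwrite errno */
        fprintf(stderr, "%s: %s: cannot open for writing: %s\n",
                tag, fname, strerror(saved));
        errno = saved;      /* leave the real cause for the caller */
    }
    return stream;
}
```

In write_graph the call site would then reduce to something like `if ((stream = open_for_write(p_fname, "pmkio open error 2")) == NULL) return;`, and likewise at line 661.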
In the file pmkexec.c a temporary file name is constructed (line 668):

    strcpy(fname,(char *)tempnam(PMK_SCR,"state"));

Under SunOS (I am not sure about 4.3BSD) "/usr/tmp" is automatically prepended to "stateAAA????" unless the user has TMPDIR defined in his/her environment, pointing elsewhere. This made pmkexec write those graph files in /usr/tmp. But /usr/tmp is local to each CPU. When pmkexec passed this file name to other CPUs, they couldn't find the files to open because the files weren't in their /usr/tmp. Only after I had defined TMPDIR to point to an area mounted on all the CPUs involved, and accessed by the same name on each of them, did things begin to work. I finally got pmake to execute various compilation steps in parallel on the CPUs specified in /usr/spool/isis/sites.

Now about pmake itself: it's a great demo, and a useful one at that. I, for one, intend to use it in my work. I would like to suggest various improvements, though. In the first place, pmkexec should have some kind of configuration file. You may have several different types of CPUs on the network. Even identical CPUs may have different versions of the OS, or different versions of gcc or Fortran. The user should be able to tell pmkexec which particular CPUs, of all those mentioned in /usr/spool/isis/sites, should be used for the compilation. Alternatively, that information should be accessible through /usr/spool/isis/sites itself. Once the proper CPUs are selected, the final decision as to which of them to use should depend not so much on the number of users on a given CPU as on its load. By the way, the number of users is not 0, as described in the manual, but 4; see line 1522 of pmkexec.c. The comment beneath that line indeed states:

    [...] We should either check for non-idle users, or wait until a
    proper rexec service which knows about load factors.
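For the temporary-file problem, here is a minimal sketch of one way to avoid a silent fallback to a host-local directory: take the directory from TMPDIR and fail outright if it is unset. It swaps in POSIX mkstemp() for tempnam(), which also creates the file atomically instead of only generating a name. make_shared_temp is a hypothetical name, and the sketch relies on the same assumption that made pmake work here: TMPDIR names an area mounted under the same path on every CPU involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* close(), unlink() for callers of the descriptor */

/* Hypothetical helper (not part of pmake): create a temporary file whose
 * name can be handed to other CPUs. The directory comes from TMPDIR and is
 * ASSUMED to be mounted under the same name on every participating host.
 * Unlike tempnam(), there is no fallback to a local /usr/tmp: if TMPDIR is
 * unset or empty, the call fails. On success the path is stored in buf and
 * an open read/write descriptor is returned; on failure -1 is returned. */
static int make_shared_temp(char *buf, size_t len, const char *prefix)
{
    assert(buf != NULL && prefix != NULL);

    const char *dir = getenv("TMPDIR");
    if (dir == NULL || *dir == '\0')
        return -1;                 /* refuse a host-local default */

    /* mkstemp() requires the template to end in exactly "XXXXXX". */
    int n = snprintf(buf, len, "%s/%sXXXXXX", dir, prefix);
    if (n < 0 || (size_t)n >= len)
        return -1;                 /* path would not fit in buf */

    /* Replaces the X's with a unique suffix and creates the file. */
    return mkstemp(buf);
}
```

pmkexec would then pass the resulting path, rather than a tempnam() result, to the other CPUs.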
In the meantime you can tinker with the user-count check at line 1522 of pmkexec.c by hand and make it check for specific hostnames, or change the number of users (I immediately raised it to 10).

Hope this helps
Gustav
-- 
Gustav Meglicki, gustav@arp.anu.edu.au,
Automated Reasoning Project, RSSS, and Plasma Theory Group, RSPhysS,
The Australian National University, G.P.O. Box 4, Canberra, A.C.T., 2601, Australia,
fax: (Australia)-6-249-0747, tel: (Australia)-6-249-0158
ken@gvax.cs.cornell.edu (Ken Birman) (02/22/91)
In article <1991Feb21.101910@arp.anu.edu.au> gustav@arp.anu.edu.au (Zdzislaw Meglicki) writes:
> ...
>I, for that matter, intend to use it in my work. I would like to suggest
>various improvements though. In the first place, pmkexec should have
>some kind of a configuration file. You may have several different types
>of CPUs on the network. Even identical CPUs may have different versions
>of OS, or different versions of gcc or Fortran. The user should be able
>to tell pmkexec which particular CPUs of all mentioned in /usr/spool/isis/sites
>should be used for the compilation. Alternatively that information should
>be accessible through /usr/spool/isis/sites itself. Once the proper CPUs
>are selected, the final decision as to which should be used should depend
>not so much on the number of users on the given CPU but on the load. By the
>way, the number of users is not 0 as described in the manual but 4, see line
>1522 of pmkexec.c. The note in the comment beneath that line indeed states:
>
>  [...] We should either
>  check for non-idle users, or wait until a proper rexec
>  service which knows about load factors.
>
>In the meantime you can tinker with this particular part of the code by hand
>and make it check for specific hostnames or change numbers of users (I put
>it up immediately to 10).

These are good points. Pmake was developed by a Master's student who did his work and left Cornell to return to HP two years ago; since then, nobody has worked on the program at all. I hope that you or other ISIS/pmake users will consider adapting the program to use the V3.0 network resource manager (i.e. to select machines to run on), and perhaps extending its load-balancing policies as you suggest. We will be happy to help out if you run into more problems. The network resource manager is a "proper rexec service which knows about load factors", although it might need some extensions if pmake needs access to those factors.
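To make the selection policy discussed in this thread concrete, here is a minimal sketch of load-based host choice restricted to a user-supplied list of acceptable CPUs. It assumes the load figures are already available from some load-aware service (for instance a resource manager); struct host_info, host_allowed, pick_host and the max_load cutoff are all hypothetical names, not part of pmake or ISIS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: one entry per CPU listed in /usr/spool/isis/sites, with a
 * load figure ASSUMED to come from some load-aware service. */
struct host_info {
    const char *name;
    double load;                 /* e.g. a load average */
};

/* Return 1 if name appears in the allow-list (the hypothetical per-user
 * configuration of acceptable CPUs), 0 otherwise. */
static int host_allowed(const char *name,
                        const char *const *allowed, size_t n_allowed)
{
    for (size_t i = 0; i < n_allowed; i++)
        if (strcmp(name, allowed[i]) == 0)
            return 1;
    return 0;
}

/* Pick the least-loaded allowed host whose load is below max_load.
 * Returns its index in hosts, or -1 if no host qualifies. */
static int pick_host(const struct host_info *hosts, size_t n_hosts,
                     const char *const *allowed, size_t n_allowed,
                     double max_load)
{
    assert(hosts != NULL || n_hosts == 0);

    int best = -1;
    for (size_t i = 0; i < n_hosts; i++) {
        if (!host_allowed(hosts[i].name, allowed, n_allowed))
            continue;            /* wrong OS, compiler version, etc. */
        if (hosts[i].load >= max_load)
            continue;            /* too busy to take another job */
        if (best < 0 || hosts[i].load < hosts[best].load)
            best = (int)i;
    }
    return best;
}
```

Filtering by the allow-list first and only then comparing loads keeps the two concerns separate: the configuration decides which CPUs are eligible at all, and the load decides among them.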
Extensions of that kind are just the sort of thing we might be willing to add, as long as the architecture of the resulting system is clean and simple... and the extended pmake code becomes available to other ISIS users, of course.

Ken