beaumont@CompSci.Bristol.AC.UK (Tony Beaumont) (02/08/90)
We have a prototype OR-parallel Prolog system, written in C and running on a Sequent Symmetry with 12 processors and a microsecond clock. We want to get accurate timings of the run-times of Prolog programs when we use 1, 2, ... 11 processors in order to accurately measure the speed-ups. Our Sequent is also running X Windows, and there are 4 or 5 X terminals attached.
THE PROBLEM...
1. How can we be sure that a program using (say) 6 DYNIX processes will be run on 6 processors without interruption by other processes (i.e. operating system processes, X Windows processes, etc.)? Is there a way to get a processor to exclusively run a process?
2. Our current solution is to ask other users to log off when we make timing runs, ensuring the load on the machine is as low as possible. However, operating system processes are still running and although we always leave at least 1 processor idle, how can we be sure that the operating system processes do not interfere with (slow down) the processes we are timing? Also this approach is rather unfortunate in that we require all other users to log off which in effect means that we have to make our timings during the night.
Please email replies directly to me; if there is any interest I'll post a summary of the responses.
-Tony Beaumont
email (JANET): beaumont@uk.ac.bristol.compsci
Post: Department of Computer Science, University of Bristol, Bristol BS8 1TR, UK
cudep@warwick.ac.uk (Ian Dickinson) (02/12/90)
In article <1324@csisles.Bristol.AC.UK> beaumont@CompSci.Bristol.AC.UK (Tony Beaumont) writes:
>Is there a way to get a processor to exclusively run a process?
Yes, if you are running the code setuid. There is a system call called tmp_affinity or similar (try 'man -k affinity' and you should get something in section 2). This call will make a process have exclusive use of a single processor. If you make all the parallel processes call this, then spin until they are synchronised, you should be able to conduct reasonable tests without another process intervening. It might still be worthwhile running the tests during light-load periods so that bus congestion, paging, etc. are less of a problem.
Hokay!
--
\/ato. vato@uk.ac.warwick. *NIX gives good head. Support the FSF. Plinth.
"When it's a smoking charred stump, that's too much. Mine hasn't charred yet." - entropy@alembic.acs.com
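Below is a minimal sketch of the "spin until they are synchronised" step. It assumes a POSIX system with fork() and anonymous shared memory, not DYNIX itself. The tmp_affinity call is left out because the post does not give its signature. Each worker process records its arrival in a shared counter, then busy-waits until every worker has arrived, so the timed work starts at about the same moment in all of them. The names start_barrier, worker and run_synchronised are illustrative and do not come from any Sequent API.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One-shot start barrier kept in memory shared by all worker processes. */
struct start_barrier {
    atomic_int arrived;   /* number of workers that have reached the barrier */
    atomic_int aborted;   /* set by the parent if it could not fork everyone */
};

/* Announce arrival, then spin until all nprocs workers have arrived.
   On DYNIX the worker would bind itself to a processor before this point. */
static int worker(struct start_barrier *b, int nprocs)
{
    atomic_fetch_add(&b->arrived, 1);
    while (atomic_load(&b->arrived) < nprocs) {
        if (atomic_load(&b->aborted))
            return 1;                 /* give up: not everyone was started */
    }
    /* ... the timed work would begin here ... */
    return 0;
}

/* Forks nprocs workers and returns how many passed the barrier cleanly
   (or -1 if the shared memory could not be mapped). */
int run_synchronised(int nprocs)
{
    struct start_barrier *b = mmap(NULL, sizeof *b, PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (b == MAP_FAILED)
        return -1;
    atomic_init(&b->arrived, 0);
    atomic_init(&b->aborted, 0);

    int started = 0;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(worker(b, nprocs)); /* child: never returns */
        if (pid < 0) {
            atomic_store(&b->aborted, 1);
            break;
        }
        started++;
    }

    int ok = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    munmap(b, sizeof *b);
    return ok;
}
```

Calling run_synchronised(6) would model the six-process case from the question. The workers spin instead of sleeping, so the release is as tight as possible. The cost is that each one occupies a processor while it waits.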
jim@cs.strath.ac.uk (Jim Reid) (02/12/90)
In article <1324@csisles.Bristol.AC.UK> beaumont@CompSci.Bristol.AC.UK (Tony Beaumont) writes:
>We want to get accurate timings of the run-times of Prolog programs when we
>use 1, 2, ... 11 processors in order to accurately measure the speed-ups.
> Our current solution is to ask other users to log off when we make
>timing runs, ensuring the load on the machine is as low as possible.
>However, operating system processes are still running and although we
>always leave at least 1 processor idle, how can we be sure that the
>operating system processes do not interfere with (slow down) the processes
>we are timing? Also this approach is rather unfortunate in that we
>require all other users to log off which in effect means that we have to
>make our timings during the night.
If you are hell-bent on timing *only* your application, you have little choice but to dedicate your machine to that task. That means kicking off all the users and booting the machine with a minimal number of processes active - essentially in single-user mode. You wouldn't want the results to be affected by a tty interrupt or a packet arriving from the Ethernet or whatever. This will give you accurate results but is somewhat extreme.
Note that there will always be some kernel latency. [This is because of the way the UNIX kernel works.] Your application process may have to switch into kernel mode to do some interrupt handling, so unless you can disable the interrupts, your results will be affected by the demands of interrupt servicing.
Do you *really* need that accuracy? Most people treat benchmarks with deserved scepticism, so your super-accurate results are only likely to be interpreted as a rough rule of thumb. If that's all you're looking for, you might as well run the timings while the machine is in use.
Personally speaking, I'd be happier if I knew that timings included the noise of other system activity. I don't run applications on an empty machine, so why should I run benchmarks that way?
To me, the absolute numbers are not important - it's more useful to know what to expect when the system's doing real work... Jim
luis@octopus.tds.kth.se (Luis Barriga) (02/15/90)
>We want to get accurate timings of the run-times of Prolog programs when we
>use 1, 2, ... 11 processors in order to accurately measure the speed-ups.
I can think of two more ways of solving this problem.
1) You can use the "at" facility to schedule a job at any time of any day when, hopefully, nobody is logged into your Sequent (maybe Friday or Saturday night). This works if your application does not take days to finish and there is no interaction with the user. If the input is the same each time, you can prepare it in a file and redirect standard input from it.
2) I have read about a system call "getrusage" that gives info about resource utilization: system, user time consumed, and other stuff. Is there any problem using it?
--
Luis Barriga, Dep. Computer Systems (TDS)
The Royal Institute of Technology, S-100 44 Stockholm, SWEDEN
e-address: luis@tds.kth.se
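Below is a minimal sketch of the getrusage approach. It takes one snapshot before the code of interest and one after, and reports the difference in user time (ru_utime) and system time (ru_stime). The call uses RUSAGE_SELF, so it measures only the calling process. With several DYNIX-style worker processes, each worker would have to measure itself. Alternatively, the parent could read RUSAGE_CHILDREN after it has waited for them. The names cpu_time_of and busy_loop are hypothetical, and busy_loop only stands in for a Prolog run.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Illustrative workload: n additions the compiler cannot optimise away. */
void busy_loop(void *arg)
{
    volatile unsigned long acc = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; i++)
        acc += i;
}

/* Run fn(arg) once and report the user and system CPU time it consumed,
   as the difference between two getrusage(RUSAGE_SELF) snapshots.
   Returns 0 on success, -1 if getrusage fails. */
int cpu_time_of(void (*fn)(void *), void *arg, double *user_s, double *sys_s)
{
    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0)
        return -1;
    fn(arg);
    if (getrusage(RUSAGE_SELF, &after) != 0)
        return -1;
    *user_s = tv_seconds(after.ru_utime) - tv_seconds(before.ru_utime);
    *sys_s  = tv_seconds(after.ru_stime) - tv_seconds(before.ru_stime);
    return 0;
}
```

Taking both snapshots inside the program keeps process start-up out of the figures.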
jim@cs.strath.ac.uk (Jim Reid) (02/15/90)
In article <LUIS.90Feb15104438@molly.octopus.tds.kth.se> luis@octopus.tds.kth.se (Luis Barriga) writes:
>2) I have read about a system call "getrusage" that gives info about
>resource utilization: system, user time consumed, and other stuff. Is
>there any problem using it?
No, but the numbers it gives may not be reproducible. Paging statistics may be influenced by the amount of free memory when the process is run. The system time will probably include time spent processing interrupts *on behalf of other processes*. Likewise, the I/O statistics may include counts for I/O done for another process (i.e. starting the next disk transfer request in the queue after servicing an interrupt from the controller).
If you plan on using getrusage for real, run your program several times to average out these potential inconsistencies.
Jim
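Jim suggests running the program several times. The sketch below follows that advice: it repeats a measurement and reports the mean together with the minimum and maximum. Showing the spread makes the noise from paging and interrupt accounting visible instead of hiding it in a single number. It uses user plus system time from getrusage(RUSAGE_SELF). The names repeat_cpu_time, run_stats and busy_loop are illustrative. Reporting the min/max spread avoids linking against the maths library for a standard deviation.

```c
#include <sys/resource.h>
#include <sys/time.h>

struct run_stats {
    double mean, min, max;  /* CPU seconds (user + system) per run */
    int runs;
};

/* Current user + system CPU time of this process, in seconds.
   getrusage(RUSAGE_SELF, ...) cannot fail with a valid pointer. */
static double cpu_seconds_now(void)
{
    struct rusage ru = {0};
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/* Illustrative workload standing in for the program being timed. */
void busy_loop(void *arg)
{
    volatile unsigned long acc = 0;
    unsigned long n = *(unsigned long *)arg;
    for (unsigned long i = 0; i < n; i++)
        acc += i;
}

/* Time fn(arg) `runs` times and summarise the per-run CPU time. */
struct run_stats repeat_cpu_time(void (*fn)(void *), void *arg, int runs)
{
    struct run_stats s = { 0.0, 0.0, 0.0, 0 };
    double total = 0.0;
    for (int i = 0; i < runs; i++) {
        double t0 = cpu_seconds_now();
        fn(arg);
        double t = cpu_seconds_now() - t0;
        if (i == 0 || t < s.min) s.min = t;
        if (i == 0 || t > s.max) s.max = t;
        total += t;
        s.runs++;
    }
    if (s.runs > 0)
        s.mean = total / s.runs;
    return s;
}
```

A large gap between min and max suggests that other system activity is leaking into the figures. In that case the minimum is often the more reproducible number.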
peralta@pinocchio.Encore.COM (Rick Peralta) (02/16/90)
In article luis@octopus.tds.kth.se (Luis Barriga) writes:
>
>2) I have read about a system call "getrusage" that gives info about
>resource utilization: system, user time consumed, and other stuff. Is
>there any problem using it?
It should be fine, as long as you peek at it at the beginning and end of your timing runs. Otherwise a little startup cost will creep into the overall measurements. BTW: you'll need to sync the starts.
If you are only interested in general trends, just use the wall-clock time and test 0, 1 and many iterations of the code path. That way you can estimate fairly accurately how long things take, without going to tremendous effort coding out the startup costs.
- Rick
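Rick's trick of timing 0, 1 and many iterations amounts to fitting T(k) ≈ startup + k * per_iter. The sketch below uses POSIX clock_gettime(CLOCK_MONOTONIC) as the wall clock. T(0) estimates the startup cost and (T(n) - T(0)) / n estimates the per-iteration cost. T(1) is kept as a sanity check: it should land near the sum of the two. The setup/body split and the sample_* functions are hypothetical stand-ins for real code.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Wall-clock time in seconds from a monotonic clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Illustrative stand-ins: fixed start-up work and one pass of the code path. */
static volatile unsigned long sink;
void sample_setup(void) { for (int i = 0; i < 1000; i++) sink += i; }
void sample_body(void)  { for (int i = 0; i < 100; i++) sink += i; }

/* Wall-clock time for the setup plus k passes of the body. */
static double time_k(void (*setup)(void), void (*body)(void), long k)
{
    double t0 = now_seconds();
    setup();
    for (long i = 0; i < k; i++)
        body();
    return now_seconds() - t0;
}

struct estimate {
    double startup;   /* T(0) */
    double per_iter;  /* (T(n) - T(0)) / n */
    double t1;        /* T(1), for checking against startup + per_iter */
};

/* Assumes T(k) is roughly linear in k. n should be large (at least 1). */
struct estimate estimate_cost(void (*setup)(void), void (*body)(void), long n)
{
    if (n < 1)
        n = 1;
    struct estimate e;
    double t0 = time_k(setup, body, 0);
    e.t1 = time_k(setup, body, 1);
    double tn = time_k(setup, body, n);
    e.startup = t0;
    e.per_iter = (tn - t0) / (double)n;
    return e;
}
```

If t1 is far from startup + per_iter, the linear model does not hold for that code path (caching or paging effects, for example). The per-iteration figure should then be treated with suspicion.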
david@torsqnt.UUCP (David Haynes) (02/16/90)
peralta@pinocchio.Encore.COM (Rick Peralta) writes:
>In article luis@octopus.tds.kth.se (Luis Barriga) writes:
>>
>>2) I have read about a system call "getrusage" that gives info about
>>resource utilization: system, user time consumed, and other stuff. Is
>>there any problem using it?
>It should be fine, just as long as you peek at it at the beginning and
>end of your timing runs. Otherwise you will get a little startup cost
>into the overall measurements. BTW: you'll need to sync the starts.
There was some work done at the University of Western Ontario using a program called "gun", which provided synchronized starts. The basic strategy, as it was explained to me, involved having the processes watch a memory location for a change of state.
-david-
--
David Haynes
Sequent Computer Systems (Canada) Ltd.
"...and this is true, which is unusual for marketing." -- RG May 1989
...!{utgpu, yunexus, utai}!torsqnt!david -or- david@torsqnt.UUCP
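Below is a sketch of the "gun" idea as David describes it. Workers announce that they are ready and then watch a shared memory location. A controller waits until every worker is watching, then changes that location's state once, so all of them start together. The sketch uses POSIX threads and C11 atomics rather than DYNIX processes, and it is not the Western Ontario program. The names fire_gun, struct gun and the per-worker work loop are illustrative. Compile with -pthread.

```c
#include <pthread.h>
#include <stdatomic.h>

/* The shared "gun": workers watch `fired`; the controller flips it once. */
struct gun {
    atomic_int ready;   /* workers currently watching */
    atomic_int fired;   /* 0 = hold, 1 = go */
};

struct worker_arg {
    struct gun *g;
    long work;          /* illustrative amount of work per worker */
    long done;          /* set by the worker when it finishes */
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    atomic_fetch_add(&a->g->ready, 1);
    while (!atomic_load(&a->g->fired))
        ;                               /* watch the memory location */
    volatile long acc = 0;              /* the timed work starts here */
    for (long i = 0; i < a->work; i++)
        acc += i;
    a->done = a->work;
    return NULL;
}

/* Start nthreads workers, wait until all are watching, then fire.
   Returns nthreads if every worker completed, otherwise -1. */
int fire_gun(int nthreads, long work_each)
{
    enum { MAX_WORKERS = 64 };
    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    pthread_t tid[MAX_WORKERS];
    struct worker_arg args[MAX_WORKERS];
    struct gun g;
    atomic_init(&g.ready, 0);
    atomic_init(&g.fired, 0);

    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i] = (struct worker_arg){ &g, work_each, 0 };
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }
    while (atomic_load(&g.ready) < started)
        ;                               /* wait until everyone is watching */
    /* A start timestamp would be taken here, just before firing. */
    atomic_store(&g.fired, 1);

    int completed = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (args[i].done == work_each)
            completed++;
    }
    return completed == nthreads ? completed : -1;
}
```

Separating "ready" from "fired" is what distinguishes the gun from a plain barrier. It lets the controller read the clock at the single moment the workers are released.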
eugene@eos.UUCP (Eugene Miya) (02/16/90)
In article <11171@encore.Encore.COM> peralta@multimax.encore.com (Rick Peralta) writes:
>BTW: you'll need to sync the starts.
Very much so. I wrote a paper on this for the 1988 USENIX Supercomputing Workshop.
--eugene
carl@aerospace.aero.org (Carl Kesselman) (02/24/90)
No, no, no. The responses being sent are wrong. How about RTFM? There is a facility called tmp_affinity. You can use this to ensure that a process runs on a specific processor. If you have the appropriate kernel configuration, this can be done without root permission.
You might also want to disable swapping and PFF reduction of the working set using the vm_ctl system call.
This does not completely free you from interaction with other programs, but it is pretty good. In particular, you still must share the system bus and disk controllers with other processes. In addition, you should always leave a processor or two free to handle interrupts, network traffic and the like.
NOTE: there is about a 20 microsecond overhead if you use the function call interface described in the manual page. There is also a set of macros defined that can allow you to access the clock in about 6 usecs.
Carl
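Carl's figures of 20 versus 6 microseconds show that reading the clock is not free. The sketch below measures that cost directly and subtracts it from a measured interval. It assumes POSIX clock_gettime(CLOCK_MONOTONIC) instead of the Sequent microsecond-clock interface, whose function and macro names the post does not give. Subtracting the cost of one clock read per interval is an approximation. The names clock_read_overhead_ns and corrected_interval_ns are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Current monotonic time in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one clock read, in nanoseconds, estimated from
   `reads` back-to-back calls. */
double clock_read_overhead_ns(long reads)
{
    if (reads < 1)
        return 0.0;
    long long first = now_ns(), last = first;
    for (long i = 0; i < reads; i++)
        last = now_ns();
    return (double)(last - first) / (double)reads;
}

/* Interval between two clock readings with roughly one read's cost
   removed; clamped at zero for intervals shorter than the overhead. */
double corrected_interval_ns(long long start_ns, long long end_ns,
                             double overhead_ns)
{
    double corrected = (double)(end_ns - start_ns) - overhead_ns;
    return corrected > 0.0 ? corrected : 0.0;
}
```

The correction only matters when the timed region is short compared with the read cost. For whole Prolog runs it is negligible.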
jim@cs.strath.ac.uk (Jim Reid) (02/26/90)
In article <67411@aerospace.AERO.ORG> carl@altair.UUCP (Carl Kesselman) writes:
>No, no no. The responses being sent are wrong. How about RTFM. There
>is a facility called tmp_affinity. You can use this to ensure that a
>process runs on a specific processor.
Yes, but that is not much help to the person who posed the initial question. They were looking for a way to *exclusively* dedicate a processor (or bunch of processors) to a particular process. The tmp_affinity() system call does not provide this facility. It can bind a process to a particular processor, but it does not prevent another process from also being bound to that processor, or the processor from being given over to interrupt servicing and related kernel processing. If the processor had to switch away from the process being timed, the benchmark results would be inconsistent and possibly non-reproducible.
I understand that Encore provide this capability on their boxes, so maybe DYNIX will get it some day.
Jim
bakken@cs.arizona.edu (Dave Bakken) (02/27/90)
In article <2151@baird.cs.strath.ac.uk> jim@cs.strath.ac.uk writes:
>Yes, but that is not much help to the person who posed the initial
>question. They were looking for a way to *exclusively* dedicate a
>processor (or bunch of processors) to a particular process. The
>tmp_affinity() system call does not provide this facility. It can bind
>a process to a particular processor, but does not prevent another
>process from also being bound to that processor or for the processor to
>be given over to interrupt servicing and related kernel processing. If
>the processor had to switch from the process that was being timed, the
>benchmark results would be inconsistent and possibly non-reproducible.
>I understand that Encore provide this capability on their boxes, so maybe
>DYNIX will get it some day.
I hope so, but I won't hold my breath, since Sequent seems to be about 2-3 years behind every other multiprocessor vendor in terms of their OS.
I think that, in general, it would be nice to have a (privileged) system call that allowed you to dedicate a certain number of processors to a certain job without rebooting or anything. Even more generally, I think it would be useful to a lot of people if you could create separate pools of processors, where processors in the same pool pull processes off the same ready queue, and where you can configure which jobs go to which pool in a flexible way, maybe based on some sort of grouping (e.g., faculty, grad_student, undergrad). This grouping could probably be both static and dynamic.
--
Dave Bakken
721 Gould-Simpson Bldg
Dept of Computer Science; U of Arizona
Tucson, AZ 85721 USA
Internet: bakken@cs.arizona.edu
UUCP: uunet!arizona!bakken
Phone: +1 602 621 8372 (w)
FAX: +1 602 621 4246
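Below is a data-structure sketch of the proposed processor pools. Each pool owns one ready queue that its processors pull from. A configurable table maps each group (the faculty / grad_student / undergrad example) to a pool. Reassigning an entry of pool_of_group at run time would give the dynamic grouping. The sketch has no locking at all, so it is only a single-threaded model; a real scheduler would have to protect each queue. All names are hypothetical, and job ids are assumed to be non-negative because -1 means "no work".

```c
#include <stddef.h>

enum { NPOOLS = 3, QCAP = 16 };

/* Example groups taken from the post. */
enum group { FACULTY, GRAD_STUDENT, UNDERGRAD, NGROUPS };

struct ready_queue {            /* fixed-capacity FIFO of job ids */
    int jobs[QCAP];
    size_t head, count;
};

struct pool {
    int first_cpu, ncpus;       /* processors that pull from this queue */
    struct ready_queue q;
};

struct scheduler {
    struct pool pools[NPOOLS];
    int pool_of_group[NGROUPS]; /* configurable group -> pool mapping */
};

/* Put a job on the ready queue of the pool its group maps to.
   Returns 0 on success, -1 if that queue is full. */
int enqueue_job(struct scheduler *s, enum group g, int job)
{
    struct ready_queue *q = &s->pools[s->pool_of_group[g]].q;
    if (q->count == QCAP)
        return -1;
    q->jobs[(q->head + q->count) % QCAP] = job;
    q->count++;
    return 0;
}

/* Called by an idle processor in pool p: next job id, or -1 if none. */
int next_job(struct scheduler *s, int p)
{
    if (p < 0 || p >= NPOOLS)
        return -1;
    struct ready_queue *q = &s->pools[p].q;
    if (q->count == 0)
        return -1;
    int job = q->jobs[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return job;
}
```

For example, mapping GRAD_STUDENT and UNDERGRAD to the same pool makes their jobs compete for one set of processors. FACULTY jobs keep a pool to themselves.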
chowe@bbn.com (Carl Howe) (02/28/90)
In article <2151@baird.cs.strath.ac.uk> jim@cs.strath.ac.uk writes:
>In article <67411@aerospace.AERO.ORG> carl@altair.UUCP (Carl Kesselman) writes:
>>No, no no. The responses being sent are wrong. How about RTFM. There
>>is a facility called tmp_affinity. You can use this to ensure that a
>>process runs on a specific processor.
>
>Yes, but that is not much help to the person who posed the initial
>question. They were looking for a way to *exclusively* dedicate a
>processor (or bunch of processors) to a particular process. The
>tmp_affinity() system call does not provide this facility. It can bind
>a process to a particular processor, but does not prevent another
>process from also being bound to that processor or for the processor to
>be given over to interrupt servicing and related kernel processing. If
>the processor had to switch from the process that was being timed, the
>benchmark results would be inconsistent and possibly non-reproducible.
>
>I understand that Encore provide this capability on their boxes, so maybe
>DYNIX will get it some day.
>
I apologize in advance if this is too far off the topic of Sequent computers, but since you already brought up one other vendor, I thought I'd just mention an implementation of both capabilities. The facility that you both are describing exists in nX, the UNIX OS for the BBN Advanced Computers' multiprocessors. Affiliating a process with a processor is done via a "fork_and_bind" system call. Dedication of processors to processes is done via a facility called "clusters", which are dynamically created and destroyed. When users originally log in, they work in a public cluster of processors, but if they want a dedicated set of processors to run a program, they can use the command
    cluster <#processors> <command> <args>
to get such a cluster. The processors for this cluster are allocated out of a free pool and will be returned to the pool after the execution of the command.
We use this type of facility all the time for benchmarking and characterizing multiprocessor programs without disrupting the work of other users. No special privileges are required. There are a few descriptions of this facility in the literature if you are interested in more details. I'm sorry I can't cite them here, but my office is currently packed in boxes; email me if you would like them once I have unpacked. If people would like to discuss or post regarding this further, we should probably move to comp.parallel.
Carl
chowe@bbn.com
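Below is a sketch of the allocation policy described for nX clusters. Processors are taken from a free pool when a cluster is created and returned when the command finishes. A 64-bit mask stands in for the free pool, one bit per processor, so the sketch handles at most 64 processors. Allocation is all-or-nothing. This is not the nX implementation or its API; proc_pool, cluster_alloc and cluster_free are hypothetical names.

```c
#include <stdint.h>

/* Free pool of up to 64 processors: bit i set means processor i is free. */
struct proc_pool {
    uint64_t free_mask;
};

/* Take n (>= 1) free processors for a dedicated cluster. Returns the
   cluster's processor mask, or 0 if fewer than n are free (in which case
   nothing is taken from the pool). */
uint64_t cluster_alloc(struct proc_pool *p, int n)
{
    uint64_t got = 0;
    int taken = 0;
    for (int cpu = 0; cpu < 64 && taken < n; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;
        if (p->free_mask & bit) {
            got |= bit;
            taken++;
        }
    }
    if (n < 1 || taken < n)
        return 0;
    p->free_mask &= ~got;       /* processors now belong to the cluster */
    return got;
}

/* Return a cluster's processors to the free pool when its command exits. */
void cluster_free(struct proc_pool *p, uint64_t cluster)
{
    p->free_mask |= cluster;
}
```

Take the 12-processor Symmetry from the original question. After a 6-processor cluster is allocated, a request for 7 more fails cleanly and leaves the remaining 6 free. Freeing the first cluster restores all 12.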