andrew@alice.att.com (Andrew Hume) (12/24/90)
I am running some benchmarks on a variety of machines and in particular, on an SGI 4D/380, a multiprocessor with 8 33MHz R3000 cpus. my benchmark reads about 1.1MB of text into an internal buffer and then runs cpu bound for about 40s. total memory usage is <2MB; the machine's memory is 256MB. The benchmarks are run with the machine in single user (Unix) mode with the normally mounted NFS filesystems unmounted. No other processes (except the paging daemon etc) are running.
my problem is that I see quite large variations over multiple runs of the same benchmark, sometimes as much as 1.26%. Now, the resolution of the timer is .01s, so i should see an accuracy of about .01/40 or .025%. I am a factor of 50 off this. does anyone know how i can run these benchmarks so as to get reproducible timings? (i note as an aside that just running the benchmarks on the cray in multi-user mode yields variations of the order of .15%, which is satisfactory).
andrew hume
andrew@research.att.com
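The arithmetic in the post above can be checked with a short sketch. The function names and the sample run times are hypothetical, and the post does not say how "variation" was computed; the sketch assumes it means (max - min) / mean over repeated runs.

```python
def timer_floor(resolution_s, run_time_s):
    """Best-case relative error from timer resolution alone, in percent."""
    return 100.0 * resolution_s / run_time_s


def relative_spread(times_s):
    """(max - min) / mean of a list of run times, in percent.

    Assumption: this is one plausible definition of "variation";
    the original post does not say which one was used.
    """
    mean = sum(times_s) / len(times_s)
    return 100.0 * (max(times_s) - min(times_s)) / mean


# Figures taken from the post: 0.01s timer resolution, 40s run time.
floor_pct = timer_floor(0.01, 40.0)      # about 0.025 percent
observed_pct = 1.26                      # worst variation reported
ratio = observed_pct / floor_pct         # about 50, the "factor of 50"

# Hypothetical run times (not measured data), to show relative_spread in use.
sample_runs = [40.00, 40.12, 40.50, 40.03]
sample_spread = relative_spread(sample_runs)
```

With these definitions, a spread well above `timer_floor(...)` points at something other than clock granularity.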
tve@sprite.berkeley.edu (Thorsten von Eicken) (12/24/90)
... quick 2 cents worth of guesses: you haven't said whether you're running your program on all 8 processors or on only one of them. if you're running on only one, could it be that the other seven interfere? What happens if you run a "for(;;);" program on seven processors while running the benchmark on the eighth? also, is there a cache-flush system call you can call before starting the timer? TvE
andrew@alice.att.com (Andrew Hume) (12/25/90)
In article <9932@pasteur.Berkeley.EDU>, tve@sprite.berkeley.edu (Thorsten von Eicken) writes:
~ ... quick 2 cents worth of guesses:
~ you haven't said whether you're running your program on all 8 processors
~ or on only one of them. if you're running on only one, could it be that
~ the other seven interfere? What happens if you run a "for(;;);" program
~ on seven processors while running the benchmark on the eighth?
~ also, is there a cache-flush system call you can call before starting the
~ timer?
the program runs on just one cpu. the other processors are presumably
idle (or running some idle process). does cache-flush refer to the file system?
if so, i don't see the need; my benchmark generates 200 bytes every run
(5 bytes/sec) and i'm sure one of the other 7 spare cpus could handle
sending that one block off.
still puzzled,
andrew
p.s. how the hell do the specmark people do this stuff?
raytrace@cutmcvax.cs.curtin.edu.au (Phil Dench) (12/27/90)
andrew@alice.att.com (Andrew Hume) writes:
> I am running some benchmarks on a variety of machines
>and in particular, on a SGI 4D/380, a multiprocessor with 8
>33MHz R3000 cpus. my benchmark reads in about 1.1MB of text
>into an internal buffer and then runs cpu bound for about
>40s. total memory usage is <2MB; the machine's memory is 256MB.
>The benchmarks are run with the machine in single user (Unix)
>mode with normally mounted NFS filesystems unmounted. No other
>processes (except paging daemon etc) are running.
> my problem is that I see quite large variations over
>multiple runs of the same benchmark, sometimes as much as
>1.26%. Now, the resolution of the timer is .01s and i should see
>an accuracy of about .01/40 or .025%. I am a factor of 50 off this.
>does anyone know how i can run these benchmarks so as to get reproducible
>timings? (i note as an aside that just running the benchmarks on the cray
>in multi-user mode yields variations of the order of .15% which is
>satisfactory).
> andrew hume
> andrew@research.att.com
You LUCKY BASTARD! I dream of a 256MB 8 processor SGI plus access to a Cray.
There's no pleasing some people :?)
--
Phil Dench
Andrew Marriott.
School of Computer Science, Curtin University of Technology,
Kent Street, Bentley, Western Australia, 6102
ACSNet: raytrace@cutmcvax.cs.curtin.edu.au
UUCP: ...!uunet!munnari!cutmcvax!raytrace
ARPA: raytrace@cutmcvax.cs.curtin.edu.au
cprice@mips.COM (Charlie Price) (12/29/90)
In article <11737@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
>
> I am running some benchmarks on a variety of machines
>and in particular, on a SGI 4D/380, a multiprocessor with 8
>33MHz R3000 cpus.
...
> my problem is that I see quite large variations over
>multiple runs of the same benchmark, sometimes as much as
>1.26%. Now, the resolution of the timer is .01s and i should see
>an accuracy of about .01/40 or .025%. I am a factor of 50 off this.
>does anyone know how i can run these benchmarks so as to get reproducible
>timings? (i note as an aside that just running the benchmarks on the cray
>in multi-user mode yields variations of the order of .15% which is
>satisfactory).
>
> andrew hume
> andrew@research.att.com
One source of variability in benchmark times that nobody else
has mentioned (so I will) is cache conflicts.
Identical executions of a benchmark use the same *virtual* locations
in the same pattern, but these virtual locations get mapped
to physical locations, and in particular cache locations,
in some manner determined by the OS, previous activity on the machine,
the phase of the moon...
If successive executions of the program get different patterns
of cache conflict, then you can easily see a difference of several
percent in the execution time.
This isn't just speculation.
In the early days at MIPS some maddening variability in execution
times was finally traced to variability in page allocation.
The execution variability mostly went away when the OS did
page coloring (matching the physical and virtual address of a page
in certain ways) to remove the cache-use variability.
I suspect that if the OS isn't giving you reproducible use of the caches,
you won't ever be able to get reproducible benchmark times.
--
Charlie Price cprice@mips.mips.com (408) 720-1700
MIPS Computer Systems / 928 Arques Ave. / Sunnyvale, CA 94086-23650
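The cache-conflict effect described above can be illustrated with a toy model. Everything in it is an assumption made for illustration: a direct-mapped, physically indexed cache of 64KB, 4KB pages, and made-up frame numbers. It is not a description of the 4D/380's caches or of how any MIPS OS actually implemented page coloring.

```python
from collections import Counter

PAGE_SIZE = 4096                       # assumed page size in bytes
CACHE_SIZE = 64 * 1024                 # assumed direct-mapped, physically indexed cache
NUM_COLORS = CACHE_SIZE // PAGE_SIZE   # pages that fit in the cache side by side


def color(page_number):
    """Cache 'color' of a page: which page-sized slice of the cache it maps to."""
    return page_number % NUM_COLORS


def conflicting_pages(frames):
    """Count physical frames that share a cache color with another frame."""
    counts = Counter(color(f) for f in frames)
    return sum(n for n in counts.values() if n > 1)


def colored_allocation(virtual_pages, free_frames):
    """Toy page coloring: give each virtual page a free frame of the same color.

    Raises KeyError or IndexError if no frame of the needed color is free.
    """
    by_color = {}
    for f in free_frames:
        by_color.setdefault(color(f), []).append(f)
    return [by_color[color(v)].pop() for v in virtual_pages]


virtual_pages = list(range(8))                 # same virtual layout every run

run_a = [100, 101, 102, 103, 104, 105, 106, 107]   # hypothetical frames, run A
run_b = [16, 32, 48, 5, 6, 7, 8, 9]                # hypothetical frames, run B

conflicts_a = conflicting_pages(run_a)         # all colors distinct
conflicts_b = conflicting_pages(run_b)         # frames 16, 32, 48 collide

colored = colored_allocation(virtual_pages, range(64, 96))
conflicts_colored = conflicting_pages(colored)
```

In this model the two runs touch identical virtual pages, yet run B has three frames competing for one slice of the cache while run A has none. With the colored allocator, the conflict pattern depends only on the virtual layout, so it repeats from run to run.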