joel@pandora.pa.dec.com (Joel McCormack) (12/08/89)
When running x11perf, a program to measure X11 graphics and windowing performance, I sometimes get measurements that vary by 10% or more. x11perf first runs each benchmark once to calibrate how many times total it should be repeated, chooses a count that will result in a total time of about 5 seconds, then runs that benchmark 5 times.

The calibration run gets everything into the caches, and the 5 actual runs then usually match up within a few percent of each other. However, I have noticed that the order in which I run benchmarks can affect timings, and that running another program between two runs of the same benchmark can change the timings as well. These variations often exceed 10%, which seems rather large.

Does anyone have an explanation for this behavior? I could understand cache conflicts that might occur within the X11 server if I relinked the server, but these differences show up using the same binary for both the server and x11perf. Pages from the code files might get loaded into different locations on different runs, but since the caches are much larger than a page, I would expect this to have no effect.

Any ideas, however harebrained, would be appreciated.

- Joel McCormack (decwrl!joel, joel@decwrl.dec.com)
mash@mips.COM (John Mashey) (12/29/89)
In article <2247@bacchus.dec.com> joel@pandora.pa.dec.com (Joel McCormack) writes:
>When running x11perf, a program to measure X11 graphics and windowing
>performance, I sometimes get measurements that can vary by 10% or more.
>x11perf first runs each benchmark to calibrate how many times total it
>should run the benchmark, chooses a number that will result in total
>time of about 5 seconds, then run that benchmark 5 times.
>
>The calibration run gets everything into caches, and then the 5 actual
>runs usually match up within a few percent of each other. However, I
>have noticed that the order in which I run benchmarks can affect
>timings, or if I run another program between two runs of the same
>benchmark the timings may be different. These variations often exceed
>10%, which seems rather large.
>
>Does anyone have an explanation for this behavior? I could understand
>cache conflicts that might occur within the X11 server if I relink the
>server, but these differences show up using the same binary for both the
>server and x11perf. Pages from the code files might get loaded into
>different locations on different runs, but since the caches are much
>larger than a page size, I would expect this to have no effect.

This sounds like an issue we dealt with a while back, when we added partial page coloring to the kernel to lessen the variance in benchmark times. Specifically, with physical, direct-mapped caches, random allocation of virtual pages to physical pages can sometimes cause more variation in run times than you might expect, either because of I-cache clashes for big codes, or D-cache clashes for big data with linear algorithms. This especially bothered the old M/500s, which had 16KB of I-cache and 8KB of D-cache.

This led us to do "statistical" page coloring, that is, trying to make the virtual pages of a program keep the same relationship in a physical cache across different runs, at least mostly, and most of the time.
(Note that appropriate algorithms can spread the usage around in the cache: what's important is that if 2 pages are mapped to the same page of the cache once, they usually do it every time. We actually ran into 1 program that could vary by a factor of 2X (!) without this.)

Anyway, it can always be an issue with certain kinds of programs, but this approach lessens the variance quite reasonably, at low cost. Was this effect found on Ultrix? I'm not sure whether it does this trick or not.
--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD: 408-991-0253 or 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086