[comp.arch] SPEC, SPECthruput

Publius@dg.dg.com (Publius) (05/03/90)

SPECthruput is a performance measurement defined by SPEC.  Compared to
the well-known SPECmark, it goes one step further: it recognizes
the importance of multiprocessor systems in the marketplace.
In many real-world environments, throughput matters more than
elapsed time, and thus SPECthruput is a better performance measurement
than SPECmark.

However, in its present form, SPECthruput has a few shortcomings.  These
shortcomings should be fixed before SPECthruput can live up to its
good and noble intention.

The first problem I can see in the current SPECthruput methodology is that
it does not specify the maximum time slice allowed.  As we all understand,
the larger the time slice is, the less the context switch overhead will be.
The context switch overhead here includes not only saving and restoring
the process context, but also the warming up of the caches and the
address translation table.  Leaving the maximum time slice unspecified
allows vendors to inflate SPECthruput numbers by setting a maximum time
slice larger than a real-world application environment can accept.
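
To make the effect concrete, here is a minimal back-of-the-envelope sketch
in Python.  It assumes a deliberately simple model that is not part of the
SPEC methodology: every job uses its whole time slice, and each switch
costs a fixed amount of time that lumps together the context save/restore
and the cache and translation-table warm-up.  The function name and the
1 ms switch cost are hypothetical, chosen only for illustration.

```python
def useful_fraction(time_slice_ms, switch_cost_ms):
    """Fraction of CPU time spent on useful work when every time slice
    of length time_slice_ms is followed by one context switch costing
    switch_cost_ms (save/restore plus cache and TLB warm-up)."""
    return time_slice_ms / (time_slice_ms + switch_cost_ms)


# Hypothetical figure: 1 ms of effective cost per switch.
SWITCH_COST_MS = 1.0

for time_slice_ms in (10, 100, 1000):
    frac = useful_fraction(time_slice_ms, SWITCH_COST_MS)
    print(f"slice {time_slice_ms:5d} ms -> {frac:.1%} useful work")
```

Under this model the useful fraction keeps rising toward 100% as the slice
grows, which is why an unbounded time slice flatters a throughput number.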
 
The second problem is that what the current SPECthruput methodology
measures is the BATCH THROUGHPUT, not the throughput in a time-shared
environment.  This has at least two implications.  One is about
job scheduling.  The other is about cache utilization.

Concerning job scheduling, there is a fundamental difference between
a batch processing environment and a time-shared environment, especially
on a multiprocessor system.  In a batch processing environment, especially
if all the jobs take about the same processing time (as in the
SPECthruput methodology), the OS can assign each job a "preferred processor"
and run the job on that processor only, which improves throughput
by reducing the burden of keeping caches coherent.
In a time-shared environment, processes come and go in a random manner,
and the "preferred processor" scheme mentioned above won't work as well.
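
The scheduling difference can be sketched with a toy simulation in Python.
This is an illustration under stated assumptions, not a model of any real
scheduler: the function name, the round-robin structure, and the
random-placement policy are all hypothetical, and the sketch does not model
processes arriving and leaving.  It only counts how often a job resumes on
a processor other than the one it last ran on; each such migration
typically means the job's cached data has to be refetched or moved.

```python
import random


def count_migrations(num_jobs, num_cpus, num_slices, sticky, seed=0):
    """Count how often a job resumes on a different CPU than last time.

    sticky=True  : each job always runs on its preferred CPU (job % num_cpus).
    sticky=False : each job is placed on a random CPU for every slice.
    """
    rng = random.Random(seed)
    last_cpu = {}          # job id -> CPU it ran on most recently
    migrations = 0
    for _ in range(num_slices):
        for job in range(num_jobs):
            cpu = job % num_cpus if sticky else rng.randrange(num_cpus)
            if job in last_cpu and last_cpu[job] != cpu:
                migrations += 1
            last_cpu[job] = cpu
    return migrations


print("preferred processor:", count_migrations(8, 4, 100, sticky=True))
print("random placement:   ", count_migrations(8, 4, 100, sticky=False))
```

With a preferred processor the migration count is zero by construction; the
less predictable the placement, the more often a job lands on a cold cache.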

Concerning cache utilization, there is also a difference in characteristics.
In a heavily loaded time-shared system, the physical memory tends to get
fragmented.  It is well known that fragmentation of physical memory
can result in inefficient utilization of a direct-mapped cache when the
cache size is larger than the page size (or more accurately, the cluster size
in memory management).
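
The cache effect can be illustrated with a small Python sketch.  It assumes
a physically indexed direct-mapped cache, so that a physical page frame
always lands in the same page-sized region of the cache; the 64 KB cache,
the 4 KB page, the frame numbers, and the function name are hypothetical
values picked for the example.  The sketch counts how many distinct cache
regions a process's pages occupy.

```python
import random


def colors_used(frames, cache_bytes, page_bytes):
    """Number of distinct page-sized cache regions that the given
    physical frame numbers map to in a physically indexed,
    direct-mapped cache."""
    num_colors = cache_bytes // page_bytes
    return len({frame % num_colors for frame in frames})


CACHE_BYTES = 64 * 1024                  # hypothetical cache size
PAGE_BYTES = 4 * 1024                    # hypothetical page size
NUM_COLORS = CACHE_BYTES // PAGE_BYTES   # 16 page-sized regions

# Contiguous allocation: consecutive frames cover every region once.
contiguous = list(range(100, 100 + NUM_COLORS))

# Fragmented allocation: frames scattered over physical memory.
rng = random.Random(1)
fragmented = rng.sample(range(4096), NUM_COLORS)

print("contiguous:", colors_used(contiguous, CACHE_BYTES, PAGE_BYTES))
print("fragmented:", colors_used(fragmented, CACHE_BYTES, PAGE_BYTES))
```

Contiguous frames use all 16 regions, while scattered frames usually
collide in some regions and leave others idle, so part of the cache is
wasted even though the working set would fit.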

SPEC has done remarkable things.  Let us keep improving.

-- 
Disclaimer: I speak (and write) only for myself, not my employer.

Publius     "Old federalists never die, they simply change their names."
publius@dg-pag.webo.dg.com

jmoore@stan.Solbourne.COM (Jim Moore) (05/09/90)

>From: Publius@dg.dg.com (Publius)
>Message-ID: <428@dg.dg.com>
>
>The first problem I can see in the current SPECthruput methodology is that
>it does not specify the maximum time slice allowed.  As we all understand,
>the larger the time slice is, the less the context switch overhead will be.
>The context switch overhead here includes not only saving and restoring
>the process context, but also the warming up of the caches and the
>address translation table.  Leaving the maximum time slice unspecified
>allows vendors to inflate SPECthruput numbers by setting a maximum time
>slice larger than a real-world application environment can accept.
 
Granted, there are problems with SPECthruput and with SPEC in general;
hopefully these will be reduced over time and versions. Your complaint
here is too specific. All the techniques that you mention above are valid
things for vendors to do to try to optimize the performance of their
systems. System vendors do try to reduce context switches and the
resulting overhead. I take it you are trying to say that there should
also be a metric to measure interactive response and fairness to go
along with the SPECthruput rating. Users don't care about time slices
and other such nits.

>The second problem is that what the current SPECthruput methodology
>measures is the BATCH THROUGHPUT, not the throughput in a time-shared
>environment.  This has at least two implications.  One is about
>job scheduling.  The other is about cache utilization.
>
>Concerning job scheduling, there is a fundamental difference between
>a batch processing environment and a time-shared environment, especially
>on a multiprocessor system.  In a batch processing environment, especially
>if all the jobs take about the same processing time (as in the
>SPECthruput methodology), the OS can assign each job a "preferred processor"
>and run the job on that processor only, which improves throughput
>by reducing the burden of keeping caches coherent.
>In a time-shared environment, processes come and go in a random manner,
>and the "preferred processor" scheme mentioned above won't work as well.

Again, perfectly valid techniques. Refer back to recent articles about
processor affinity (preferred processor) to get some testimonials on
how effective this technique can be. Real life system loads rarely
are pure batch or pure time share. Most vendors try to come up with
a system that will deal well with a reasonable blend of both. They
will typically include some tuning knobs for users that have unusual
load mixes and the expertise to optimize.

>Concerning cache utilization, there is also a difference in characteristics.
>In a heavily loaded time-shared system, the physical memory tends to get
>fragmented.  It is well known that fragmentation of physical memory
>can result in inefficient utilization of a direct-mapped cache when the
>cache size is larger than the page size (or more accurately, the cluster size
>in memory management).

Just more challenges for the OS types. These problems are not new, but
the input parameters are always changing (CPU speed, memory size,
memory/disk speed ratios, etc.). I still take the gist of your
criticism to be that the SPECthruput measurement allows vendors to
cheat by optimizing for SPECthruput at the expense of producing a real,
usable system. There is always this potential in benchmarking (just
look at the fixed length string copy in early Dhrystones) and it is
valid to point out where such cheating might occur. I will pass your
article along to our SPEC representative.

>SPEC has done remarkable things.  Let us keep improving.

Hear, hear!

Jim Moore
Solbourne Computer, Inc.
1900 Pike Road
Longmont Colorado 80501
jmoore@solbourne.com