[comp.arch] Information on the PIXIE Instruction Simulator

saghir@eecg.toronto.edu (Mazen Saghir) (01/16/91)

In a previous post I asked about information on the dixie simulator, and as
it turns out, I meant the pixie (with a "p") instruction simulator. Does 
anyone know where I could get some information on this simulator? Thanks in
advance.

Mazen Saghir

-- 
Computer Group                       | e-mail: saghir@eecg.toronto.edu 
Department of Electrical Engineering | 
University of Toronto                |          ** CCE '89 **    
Toronto, Ontario  M5S 1A4   CANADA   |    

mash@mips.COM (John Mashey) (01/18/91)

In article <1991Jan15.162526.2139@jarvis.csri.toronto.edu> saghir@eecg.toronto.edu (Mazen Saghir) writes:
>
>In a previous post I asked about information on the dixie simulator, and as
>it turns out, I meant the pixie (with a "p") instruction simulator. Does 
>anyone know where I could get some information on this simulator? Thanks in
>advance.

"pixie" is a program shipped as part of a standard binary distribution for
MIPS systems; I believe DEC ships it standard on DEC{server,station} as
well.

It is not a simulator; here's what it does and how you use it:

1)  assume you have an executable X, and that running it with a given input
takes Y seconds.

2) type pixie X, which creates a new executable X.pixie that contains
code to count the number of times each basic block is executed
and, if you invoke the right options, to generate complete address traces.
This takes a few seconds.

3) run X.pixie, which creates a counts file that records the numbers of
executions.

4) say prof X, which gives you:
	number of instruction cycles spent by subroutine, sorted by frequency
	number of calls per subroutine
	number of cycles spent per line of source code, showing the top N
	lines of code
These numbers do NOT give you complete times, in that they do not show
cache & memory system overhead.  However, by comparing the cycle count
to Y*clock rate, you can get an idea of the percentage of time occupied
by instruction cycles versus memory cycle lossage. Typical would be
60-80% going to instructions.

This information gets used by programmers, as it gives you accurate
profiling without profiling libraries, special compiler flags, etc.
(Note that many vendors do not ship profiled versions of system libraries,
making it difficult to see the time going there.)

5) say pixstats X, which gives you numerous gory details of instruction
usage, number of registers saved per subroutine call, # loads,
instruction concentration (i.e., what percentage of instruction cycles
would fit in a perfect associative cache of 1 word, 2 words, 4 words, etc.)
Most of this is of interest only to computer architects.

6) There is an additional cache simulator that comes as part of a separate
software package (System Programmer's Package). It takes the address traces
described above, plus parameters to specify refill size, write-buffering,
memory latency, etc., and computes the number of cycles lost to
memory latency, write-buffer stalls, etc.; adding those cycles to the
instruction cycles should give a pretty good approximation of the actual
time.

This tends to be of use to people designing hardware systems, who want to
see the tradeoffs between memory speed, cache size, refill-size,
performance, and cost.
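
The cycle arithmetic in steps 4 and 6 can be sketched in a few lines of
Python. This is only an illustration: the clock rate, run time, and cycle
counts below are made-up numbers, not output from pixie, prof, or the cache
simulator, and the function names are hypothetical.

```python
def instruction_fraction(instr_cycles, elapsed_seconds, clock_hz):
    """Step 4: share of all elapsed machine cycles that were instruction cycles."""
    total_cycles = elapsed_seconds * clock_hz  # Y * clock rate
    return instr_cycles / total_cycles


def estimated_seconds(instr_cycles, memory_stall_cycles, clock_hz):
    """Step 6: instruction cycles plus simulated memory stalls, as seconds."""
    return (instr_cycles + memory_stall_cycles) / clock_hz


# Hypothetical inputs (assumptions for illustration only).
CLOCK_HZ = 25_000_000         # assumed 25 MHz clock
Y_SECONDS = 10.0              # measured run time of X
INSTR_CYCLES = 175_000_000    # cycle total as a profile might report it
STALL_CYCLES = 70_000_000     # stall total as a cache simulation might report it

frac = instruction_fraction(INSTR_CYCLES, Y_SECONDS, CLOCK_HZ)
est = estimated_seconds(INSTR_CYCLES, STALL_CYCLES, CLOCK_HZ)

print(f"instruction cycles: {frac:.0%} of elapsed cycles")
print(f"estimated run time: {est:.2f} s (measured: {Y_SECONDS:.2f} s)")
```

With these assumed inputs the instruction share lands at 70%, inside the
60-80% range quoted above, and the estimate falls a little short of the
measured time, which is what you would expect if some overhead is not
modelled.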

SO, if you have a MIPS system, or (usually) a MIPS-based system,
you should be able to use pixie, pixstats, and prof.  There are manual
pages on these things on the systems.
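
The "instruction concentration" figure from step 5 can be illustrated with a
small Python sketch. It rests on one reading of the description: a perfect
cache of N words always holds the N most frequently executed instruction
words. The execution counts are invented, and this is not how pixstats itself
is implemented.

```python
def instruction_concentration(exec_counts, cache_words):
    """Percent of executed instructions that come from the `cache_words`
    most frequently executed instruction words."""
    total = sum(exec_counts)
    if total == 0:
        return 0.0
    hottest = sorted(exec_counts, reverse=True)[:cache_words]
    return 100.0 * sum(hottest) / total


# Hypothetical execution count for each instruction word in a tiny program.
counts = [500, 300, 100, 50, 30, 10, 5, 5]

for size in (1, 2, 4, 8):
    pct = instruction_concentration(counts, size)
    print(f"{size} word(s): {pct:.1f}% of executed instructions")
```

A program whose percentages climb quickly at small sizes spends most of its
time in a few tight loops.
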
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

tom@ssd.csd.harris.com (Tom Horsley) (01/18/91)

The 88k based Night Hawk 4000 series machines that Harris Computer Systems
make come with a tool similar in many ways to the PIXIE tool that comes with
MIPS systems. This is the analyze88 tool. It can read an existing executable
and generate a new executable patched to generate basic block counts and
dump the information to a file when the program exits. The report88 program
can then be used to generate various reports from the basic block
statistics.
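
As a rough illustration of how basic block counts turn into a per-routine
report, here is a Python sketch. The data layout is hypothetical (it is not
the file format used by analyze88, report88, or pixie), and it assumes one
cycle per instruction with no stalls.

```python
from collections import defaultdict


def cycles_by_routine(blocks, counts):
    """Sum (instructions in block) * (times block executed) for each routine.

    blocks: {block_id: (routine_name, instructions_in_block)}
    counts: {block_id: number_of_executions}
    Returns (routine, cycles) pairs, sorted with the most expensive first.
    """
    totals = defaultdict(int)
    for block_id, (routine, n_instr) in blocks.items():
        totals[routine] += n_instr * counts.get(block_id, 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical basic blocks and the counts dumped when the program exits.
blocks = {0: ("main", 4), 1: ("main", 6), 2: ("sort", 5), 3: ("sort", 3)}
counts = {0: 1, 1: 10, 2: 1000, 3: 990}

report = cycles_by_routine(blocks, counts)
for routine, cycles in report:
    print(f"{routine:8s} {cycles:8d} cycles")
```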

In addition to the profiling capability, analyze88 also serves as a
post-linker code optimizer (reducing many of the 2 instruction sequences
compilers generate for memory references to one instruction, by putting the
most commonly referenced high 16 bit address values in reserved registers
that act as program wide common sub-expressions).
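
The register selection behind that optimization might look something like
the Python sketch below. It is a simplification under stated assumptions, not
analyze88's actual algorithm: addresses are split into a plain high 16 bits
and low 16 bits, every reference is assumed to cost two instructions before
patching, and the function names and addresses are made up.

```python
from collections import Counter


def pick_base_values(addresses, n_registers):
    """Return the n most commonly referenced high-16-bit address values."""
    highs = Counter(addr >> 16 for addr in addresses)
    return [high for high, _ in highs.most_common(n_registers)]


def instructions_saved(addresses, bases):
    """Each reference whose high half sits in a reserved register
    shrinks from two instructions to one, saving one instruction."""
    base_set = set(bases)
    return sum(1 for addr in addresses if (addr >> 16) in base_set)


# Hypothetical addresses of statically referenced memory locations.
refs = [0x00401000, 0x00401008, 0x00401F00,
        0x00520010, 0x00520020, 0x7FFF0004]

bases = pick_base_values(refs, n_registers=2)
saved = instructions_saved(refs, bases)

print("reserved register values:", [hex(b) for b in bases])
print("instructions saved:", saved)
```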

It can also generate annotated dis-assembly listings showing the static
instruction timing within each basic block, where instructions are blocked
due to resources that are not available yet, etc.

Like PIXIE, it has the limitation that it cannot tell you anything about the
cache; the timings it generates assume the 88k will never have to wait on
memory.

P.S. I gratefully acknowledge that many of the ideas for analyze88 came from
things I heard about PIXIE. They are all great ideas, and the tool has been
enormously beneficial to us in-house at Harris in helping to make our 88k
compilers generate unsurpassed quality code for the 88k (shameless plug :-).
--
======================================================================
domain: tahorsley@csd.harris.com       USMail: Tom Horsley
  uucp: ...!uunet!hcx1!tahorsley               511 Kingbird Circle
                                               Delray Beach, FL  33444
+==== Censorship is the only form of Obscenity ======================+
|     (Wait, I forgot government tobacco subsidies...)               |
+====================================================================+