[comp.sys.amiga] profiler

rokicki@navajo.STANFORD.EDU (Tomas Rokicki) (12/23/86)

---cut here---
                        Profiler for Manx 3.30e, v1.0
                        (C) 1986  Radical Eye Software

Introduction and Overview

    These two programs, p1 and p2, comprise a profiler for programs compiled
with Manx 3.30e or later.  With this tool, those sections of code executed the
most can be identified, so the program can be sped up with a minimum of
effort.

   The goal was to have a profiler with the following characteristics:

      -  Does not slow the program down excessively
      -  Does not require a recompilation of the program
      -  Does not take over system resources the program might need
      -  Does not require excessive disk storage
      -  Invocation counts should be kept
      -  Time for self should be kept
      -  Time for self plus children kept

   These goals have been attained, and though the results of the profiler
should be treated with some caution, it does perform a valuable service
currently lacking on the Amiga.

   The first program, p1, takes an executable and a link symbol file
created with the `-t' option of the Manx linker, and creates a new
executable with traps substituted for the link and unlink instructions,
and a data file for the second phase.  The second program, p2, resides in
memory as the modified executable is running, servicing the traps and
collecting usage information.  To collect routine timings, the second
phase also runs a fast real-time clock driven by the vertical beam
position and the vertical blanking interrupt.

Files

   To create these programs, you will need the following files:

   readme
   makefile
   p1.c
   p2.c
   p2a.asm

   Run the makefile over the three source files, and you should get two
executables, p1 and p2, which can be moved into your c: directory.  The
sizes of these two programs should be approximately 9K and 12K
respectively.

Usage

   These programs do very dangerous things, so they should be treated
carefully.  If the instructions below are not followed, system crashes may
result.

   The first step is to make the executable of the program you want to
profile, and its symbol file.  To do this, simply execute your normal link
step, but give the `-t' option to the compiler as well.  If your program
is named `foo', this should create the files `foo' and `foo.sym'.  If you
are developing a piece of code and feel you might profile it occasionally,
you might add the `-t' option to your makefile, as the resulting executable
will be no different, and the `.sym' file is small.

   Next, you will need to create the modified executable for tracing.  This
new executable will contain trap instructions at each routine's entrance and
exit.  To do this, simply type `p1 foo'.  The program p1 will read `foo' and
`foo.sym', and create `foo.exe', the new executable, and `foo.pdt', a profile
data file for the second phase.  You should not run `foo.exe' by itself.
If your `foo.sym' file is not up to date with your `foo' executable, `p1'
will complain, in which case you should relink `foo' with the `-t' option.

   Finally, you should perform the actual profiling step.  First, invoke
p2 in the background by typing `run p2 foo'.  Wait for p2 to open its window
before proceeding.  Note that you must supply the name `foo', and it must
match the program you expect to profile.  The program p2 will read in
`foo.pdt', and start the real time clock.

   Once p2 has opened its window, you are ready to start profiling.  Simply
run your program as you normally would, only invoke `foo.exe' instead of
`foo'.  The new executable will perform in every respect the same as the
old one, only more slowly because of the tracing.  Run `foo.exe' as many
times as you want; the profiling information will continue to accumulate
as long as `p2' is running.  It is recommended that you profile at least
twenty minutes of the programs execution to allow for sampling inaccuracies.
If you want to start profiling fresh, simply select the `initialize' gadget
on p2's window.

   After you are finished, and `foo.exe' has exited, you may select the
`finish' gadget on p2's window.  This will print a summary of the results
to a file called `foo.mon', and exit.  Do not select this gadget when
`foo.exe' is still running; the system will crash!

   The file `foo.mon' may now be examined for the results of the tracing.

Theory of Operation

   Under Manx C, each C routine has exactly one link instruction, at the
beginning of the code, and one unlink instruction, at its exit point.
(Multiple returns are handled by branches to the single exit.)  These links
and unlinks are replaced by `trap #3's, followed by an integer representing
the number of the procedure.  (These numbers are assigned by p1.)  Thus, as
the program is executing, these traps occur, and call the trap handling
routine set up by p2.

   To locate the procedures, and to assign names, the `foo.sym' file is
read in.  This file contains a list of each routine and its address in the
executable.  A link instruction should exist at the exact beginning of each
routine, and an unlink followed by a rts sometime within the routine.  These
are scanned for and converted.

   To calculate the run time of each procedure, a fast free-running clock
is also generated by p2.  The vertical beam position counter is used as the
eight least significant bits.  A vertical blanking interrupt routine is set up
to supply the rest of the bits; this gives a 32-bit counter with resolution
of approximately 65 microseconds.

Limitations

   Currently this program only works under Manx C, version 3.30e or later.
It can be modified for other compilers quite easily, however; p1 should be
the only routine which needs changes.  A list of the routine names and
their locations is needed by p1, and (hopefully) a single link and unlink
instruction will reside in each subroutine.

   The executable part of the program must reside in a single hunk; this
is an arbitrary limitation which simply exists in the code at the moment.
(I don't have any programs which I am compiling for scatter loading or
overlays.)  Again, simple changes to p1 should eliminate this problem.

   Recursion is handled, albeit not correctly.  The time for the self
is correct, but the time for self plus children is too large.  For instance,
if A calls itself recursively once, then the second invocation of A will
be counted twice in the self plus children category.  The necessary data
structures are set up to handle this problem, but the code is not there.
It is simple to add.

   Non-local jumps (setjmp/longjmp) are not handled correctly.  Avoid using
the profiler on code which has these.  The fix is to trap these specially,
and to do as many `virtual unlinks' as necessary to keep the profiling on
course.  Again, I don't need this.

   The program must exit by calling _exit() eventually.  I flag this as a
special case, and use it to `virtually unlink' all currently active routines.
By `virtual unlinks', I mean pretend the routine has exited and calculate the
statistics for it.  If the program does not exit by calling _exit(), then all
statistics for the current invocation of currently active routines when the
program does exit will be lost.  In addition, the stack maintained by the
profiler will start to creep; stack checking is done, so the program should
not crash, but if the stack hits bottom, the data will be invalid.  The fix
for this is to trap other possible exit routes and treat them the same way I
treat _exit.

   Static routines are not traced (they are invisible; their names never
get to the link stage.)  Lower level routines and routines without link
and unlink instructions are not traced.

   The time values are real time.  Thus, any time waiting for user interaction
is charged to the procedure as if it were running real code.  If this is a
problem, either localize user interaction to a single routine (so that its
value can be ignored) or supply interaction very quickly.  Or, if anyone out
there is clever, they can figure out a way to stop or fudge the clock while
another task is running.

   Related to the above, the profiling time is not subtracted.  Thus,
routines which don't do much and return immediately will have slightly
higher time values than they should.  I've tried to keep this factor to a
minimum, but since it will differ on different machines (68010, 68020,
fast memory, etc.), I've decided not to factor it out.

Extensions

   If anyone makes any extensions or modifications to this program, or you
just want to send kudos or flames, I can be reached at 326-5312 (home),
Box 2081, Stanford, CA  94305 (mail), rokicki@sushi.stanford.edu (arpanet),
or ...!decwrl!navajo.stanford.edu!rokicki (usenet).  Please contact me
before distributing extensions; I'll try and coordinate these.  Enjoy!

                                                            -Tom Rokicki
---cut here---

rokicki@navajo.STANFORD.EDU (Tomas Rokicki) (12/23/86)

---cut here---
all:	p1 p2

p1:	p1.o
	ln -t p1.o -lc

p2:	p2.o p2a.o
	ln -t p2.o p2a.o -lm -lc
---cut here---

rokicki@navajo.STANFORD.EDU (Tomas Rokicki) (12/23/86)

---cut here---
/*
 *   (C) 1986 Radical Eye Software.  This code may be used and distributed
 *   freely, so long as this notice stays in it and the banner messages
 *   are unchanged except for the version notices.  Written by Tomas Rokicki
 *   on December 19 through December 21 1986.
 *
 *   This is the first part of the profiler.  It takes an executable
 *   and a (ln -t) symbol file, and changes all of the links and unlinks
 *   to traps.  It also writes a data file for the second part of the
 *   profiler containing sizes of the data segments for the links, the
 *   names of the routines, and the numbers assigned to the routines.
 *
 *   Usage:  p1 filename
 *
 *   Reads  filename       (the executable)
 *          filename.sym   (the symbol file)
 *
 *   Writes filename.exe   (the executable for profiling)
 *          filename.pdt   (the profiler data file)
 */
#include "stdio.h"
/*
 *   Globals.  (Yuck!)
 */
char filename[100] ;
FILE *exein, *exeout, *datin, *datout ;
long hunk_pos ;
long hunk_length ;
long proc_address ;
char proc_name[255] ;
int __exit_seen = 0;
int proc_num, assigned = 1 ;
char cur_name[100] ;
/*
 *   String declarations.
 */
char *strcat(), *strcpy() ;
/*
 *   Main routine.
 */
main(argc, argv)
int argc ;
char *argv[] ;
{
   printf("This is p1, v1.0, (C) 1986 Radical Eye Software\n") ;
   if (argc != 2) {
      printf("Usage: p1 filename\n") ;
      exit(1) ;
   }
   if ((exein = fopen(argv[1], "r"))==NULL) {
      printf("Couldn't open %s\n", argv[1]) ;
      exit(1) ;
   }
   if ((datin = fopen(strcat(strcpy(filename, argv[1]), ".sym"), "r"))==NULL) {
      printf("Couldn't open %s; use ln -t to create\n", filename) ;
      exit(1) ;
   }
   if ((exeout = fopen(strcat(strcpy(filename, argv[1]), ".exe"), "w"))==NULL) {
      printf("Couldn't open %s for writing\n", filename) ;
      exit(1) ;
   }
   if ((datout = fopen(strcat(strcpy(filename, argv[1]), ".pdt"), "w"))==NULL) {
      printf("Couldn't open %s for writing\n", filename) ;
      exit(1) ;
   }
   find_code_hunk() ;
   process_hunk() ;
   finish_up() ;
   exit(0) ;
}
/*
 *   The following routines are used to read from the input executable and
 *   write to the output.  The output functions return their value.
 */
int getword() {
   register int i = getc(exein) ;
   hunk_pos += 2 ;
   return (i << 8) + (getc(exein) & 0xff) ;
}
long getlong() {
   register long i = getword() ;
   return (i << 16) + (getword() & 0xffffL) ;
}
int putword(i)
int i ;
{
   putc(i >> 8, exeout) ;
   putc(i & 255, exeout) ;
   return(i) ;
}
long putlong(i)
long i ;
{
   putword((int)(i >> 16)) ;
   putword((int)(i & 0xffffL)) ;
   return(i) ;
}
/*
 *   find_code_hunk() skips over the header information in the executable,
 *   copying it to the output.  Note that at the moment we only support
 *   executables which are in one contiguous chunk.
 */
find_code_hunk() {
   register long i, j ;
   int dmy1 ;

   if (putlong(getlong()) != 0x3f3)
      error("! expected hunk_header longword") ;
/*
 *   Skip over names.
 */
   while ((i=putlong(getlong()))!=0)
      for (j=0; j<i; j++)
         putlong(getlong()) ;
/*
 *   Skip over lengths.
 */
   putlong(getlong()) ;
   i = putlong(getlong()) ;
   i = putlong(getlong()) - i + 1 ;
   for (j=0; j<i; j++)
      putlong(getlong()) ;
/*
 *   Now we should see a hunk_code hunk (3e8); this will be the piece
 *   of code we will relocate.
 */
   if (putlong(getlong()) != 0x3e9)
      error("! expected hunk_code longword") ;
   hunk_length = putlong(getlong()) << 2 ;
   hunk_pos = 0 ;
/*
 *   We are now ready to begin.  We skip the first line, and look
 *   for an address and procedure name.
 */
   if (fscanf(datin, "Segment %d: Hunk %d", &dmy1, &dmy1) != 2)
      exit("! format error in symbol table file") ;
   get_name_and_address() ;
}
/*
 *   This routine reads a name and an address from the symbol file.
 */
get_name_and_address() {
   if (proc_address != 0xfffffe) {
      if (fscanf(datin, " %lx %s", &proc_address, proc_name) != 2) {
         printf("Oops!  Went off the end of the .sym file!\n") ;
         proc_address = 0xfffffe ;
         strcpy(proc_name, "<ERROR>") ;
      }
   }
}
/*
 *   error() prints an error message and exits if it was fatal.
 */
error(s)
char *s ;
{
   printf("%s\n", s) ;
   if (*s=='!')
      exit(1) ;
}
/*
 *   Process_hunk() actually does the work for a particular hunk.
 */
process_hunk() {
   while (hunk_pos < hunk_length) {
      if (! search_for_link())
         break ;
      if (get_procedure_name()) {
         write_data() ;
         scan_for_unlink() ;
      } else {
         printf("Warning:  Found a link with no procedure.\n") ;
         putword(0x4e55) ;
      }
   }
}
/*
 *   search_for_link() searches for a link a5 instruction.
 */
search_for_link() {
   int i ;

   while ((i=getword()) != 0x4e55) {
      putword(i) ;
      if (hunk_pos >= hunk_length)
         return(0) ;
   }
   return(1) ;
}
/*
 *   Get_procedure_name() looks for a procedure which has
 *   the address given.
 */
get_procedure_name() {
   while (proc_address < hunk_pos - 2)
      get_name_and_address() ;
   if (proc_address != hunk_pos - 2)
      printf("Procedure %ld hunk pos %ld\n", proc_address, hunk_pos - 2) ;
   return (proc_address == hunk_pos - 2) ;
}
/*
 *   We have found a procedure with a link.  We write out the
 *   appropriate data to the dataout file, and change the link
 *   instruction to a trap #3.
 */
write_data() {
   int data_size = getword() ;

   if (strcmp(proc_name, "__exit")==0) {
      __exit_seen = 1 ;
      proc_num = 0 ;
   } else
      proc_num = assigned ++ ;
   fprintf(datout, "%d %s %d\n", proc_num, proc_name + 1, data_size) ;
   putword(0x4e43) ;
   putword(proc_num) ;
   strcpy(cur_name, proc_name) ;
   get_name_and_address() ;
}
/*
 *   Scan_for_unlink() looks through the rest of the procedure
 *   for an unlink a5 followed by an rts.  If none are found, it
 *   complains.  If more than one is found, the subsequent ones
 *   are ignored (hidden static procedures!).
 */
scan_for_unlink() {
   int count = 0 ;
   int i, j ;

   while (hunk_pos < proc_address - 2) {
      i = getword() ;
loop:
      if (i == 0x4e5d)
         if ((j=getword()) == 0x4e75) {
            count++ ;
            putword(0x4e43) ;
            putword(0x8000 + proc_num) ;
            break ;
         } else {
            putword(i) ;
            i = j ;
            goto loop ;
         }
      else
         putword(i) ;
   }
   while (hunk_pos < proc_address - 2)
      putword(getword()) ;
   if (count != 1)
      printf("Warning:  %d unlinks found in procedure %s\n", count, cur_name) ;
}
/*
 *   Finish_up() finishes copying the executable file, with no
 *   changes.
 */
finish_up() {
   register int i ;

   while (proc_address < hunk_length)
      get_name_and_address() ;
   if (proc_address != hunk_length)
      printf("The .sym (%ld) and executable (%ld) don't seem to match!\n",
          proc_address, hunk_length) ;
   while ((i = getc(exein))!=EOF)
      putc(i, exeout) ;
   if (! __exit_seen)
      error("didn't see a call to __exit()") ;
}
---cut here---

rokicki@navajo.STANFORD.EDU (Tomas Rokicki) (12/23/86)

---cut here---
/*
 *   (C) 1986 Radical Eye Software.  This code may be used and distributed
 *   freely, so long as this notice stays in it and the banner messages
 *   are unchanged except for the version notices.  Written by Tomas Rokicki
 *   on December 19 through December 21 1986.
 *
 *   This is the second part of the profiler.  It resides in memory,
 *   and handles the traps from the execution of the .exe file.
 *
 *   ***This is a dangerous program!!!***  Do not run the patched
 *   executable while this program is not running.  Do not exit this
 *   program while the patched executable is running.  Do not choose
 *   the `initialize' gadget when this program is running.
 *
 *   Currently, this program does not work for programs which are invoked
 *   from the workbench, because the glue routine for Exit() does not
 *   start with a link, so it's not trapped.  This will require a patch
 *   in p1.  Setjmp/Longjmp also will throw it off.  But it should be
 *   useful anyway.
 *
 *   Usage:  p2 filename [memsize]
 *
 *   Reads  filename.pdt   (the profiler data file)
 *
 *   Writes filename.mon   (the monitor output data)
 */
#include "stdio.h"
#include "exec/memory.h"
#include "intuition/intuition.h"
/*
 *   Globals (yuck).
 */
long memsize = 20000 ;
FILE *f ;
char filename[255] ;
long old_trap_vector ;
long tick_val = 0 ;
int max_rout ;
char *filearg ;
struct IntuitionBase *IntuitionBase ;
struct GfxBase *GfxBase ;
/*
 *   We have to disable ^C interrupts.
 */
extern int Enable_Abort ;
/*
 *   Note that the following two structures are also used in
 *   p2a.asm; when changing them, also change the ones
 *   there.  Or better yet, put them into an include file.
 *   (I didn't because you need two include files, one for C and
 *   one for assembly, and I'm too lazy.)
 */
struct rout_stat {
   short recursion_count ;
   short data_offset ;
   long invoke_count ;
   long self ;
   long children ;
} ;
struct tstruct {
   short running ;
   char *tick_add ;
   char *dat_add ;
   char *stack ;
   char *stack_top ;
   char *stack_bot ;
   short error ;
   char *debug_ptr ;
} trap_glob ;
struct rout_stat *big_arr ;
extern int_rout(), trap_rout() ;
/*
 *   External routines.
 */
char *strcpy(), *strcat() ;
char *AllocMem() ;
char *OpenLibrary() ;
struct Window *OpenWindow() ;
long atol() ;
struct IntuiMessage *GetMsg() ;
char *malloc() ;
/*
 *   Startup installs the interrupt and the trap code.
 */
startup() {
/*
 *   Find the address of the vector.
 */
   register long *t = (long *)((*(long *)4)+0x94) ;
   register int *dest ;
/*
 *   Make sure we can allocate the trap.
 */
   if (AllocTrap(3L) != 3)
      error("! couldn't allocate trap #3") ;
/*
 *   Set up the global pointers.
 */
   trap_glob.tick_add = (char *)&tick_val ;
   trap_glob.dat_add = (char *)big_arr ;
/*
 *   We subtract 4 because we use the zeros at the end as
 *   a guard zone.
 */
   trap_glob.stack = (char *)big_arr + memsize - 4 ;
   trap_glob.stack_top = (char *)big_arr + memsize - 4 ;
   trap_glob.stack_bot = (char *)(big_arr + max_rout + 1) ;
/*
 *   Debug; take this out!
 *
   {
      long i ;
      trap_glob.debug_ptr = (char *)big_arr + memsize ;
      for (i=0; i<memsize; i++)
         *((char *)big_arr + memsize + i) = -1 ;
      printf("Trace is at %lx\n", trap_glob.debug_ptr) ;
   }
 */
   dest = (int *)trap_rout + 3 ;
   *(struct tstruct **)dest = &trap_glob ;
/*
 *   Store the address to increment here.
 */
   dest = (int *)int_rout + 3 ;
   *(long **)dest = &tick_val ;
/*
 *   Store the default jump routine at the end.
 */
   dest = (int *)int_rout + 6 ;
   *(long *)dest = *t ;
/*
 *   Start her up.
 */
   *t = (long)&int_rout ;
/*
 *   Now the trap.  Save the old value:
 */
   old_trap_vector = *(long *)0x8c ;
/*
 *   And install.
 */
   *(long *)0x8c = (long)&trap_rout ;
}
cleanup() {
/*
 *   Find the address of the vector.
 */
   register long *t = (long *)((*(long *)4)+0x94) ;
   register int *dest ;
/*
 *   Store the original address there.
 */
   dest = (int *)int_rout + 6 ;
   *t = *(long *)dest ;
/*
 *   Don't forget the trap!
 */
   *(long *)0x8c = old_trap_vector ;
   FreeTrap(3L) ;
}
/*
 *   The main routine is very simple.  For now, we don't even try to
 *   run the code, we just check out the gadgets.
 */
main(argc, argv)
int argc ;
char *argv[] ;
{
   Enable_Abort = 0 ;
   printf("This is p2, v1.0, (C) 1986 Radical Eye Software\n") ;
   if (argc < 2 || argc > 3)
      error("! usage: p2 filename [memsize]") ;
   filearg = argv[1] ;
   if ((f = fopen(strcat(strcpy(filename,filearg),".pdt"), "r"))==NULL)
      error("! couldn't open .pdt file") ;
   if (argc > 2)
      memsize = atol(argv[2]) ;
   if ((memsize & 3) != 0 || memsize < 1000 || memsize > 500000)
      error("! let's be reasonable in our memory size") ;
   get_data() ;
   startup() ;
   gadgets() ;
   cleanup() ;
   write_summary() ;
/*
 *   Free the memory.
 */
   if (big_arr != NULL)
      FreeMem(big_arr, memsize) ;
   exit(0) ;
}
/*
 *   We need a set of routines to deal with the names.  They add the names
 *   to the dynamic list, and free them on exit.  Note that this wastes
 *   on the average 19 bytes per name.
 */
struct namestruct {
   struct namestruct *next ;
   int number ;
   char name[1] ;
} *globnames = NULL ;
/*
 *   Addname() adds a name to our global list.
 */
addname(s, n)
char *s ;
int n ;
{
   register struct namestruct *p = (struct namestruct *)
         malloc(sizeof(struct namestruct)+strlen(s)) ;

   if (p==NULL)
      error("! couldn't allocate name struct\n") ;
   p->next = globnames ;
   globnames = p ;
   p->number = n ;
   strcpy(p->name, s) ;
}
/*
 *   Name() looks up a name in our global list.
 */
char *name(n)
int n ;
{
   register struct namestruct *p ;

   for (p=globnames; p!=NULL; p = p->next)
      if (p->number == n)
         return(p->name) ;
   return("<ERROR>") ;
}
/*
 *   Get_data() reads in the data from the .pdt file and initializes
 *   the arrays.
 */
get_data() {
   int num, offset ;
   char name[255] ;

/*
 *   The (*2) is there for debugging.
 */
   big_arr = (struct rout_stat *)AllocMem(memsize/* * 2*/, MEMF_CLEAR) ;
   if (big_arr == NULL)
      error("! couldn't allocate memory") ;
   while (fscanf(f, "%d %s %d", &num, name, &offset)==3) {
      if (num > max_rout) {
         max_rout = num ;
         if (max_rout * (long) sizeof(struct rout_stat) > memsize)
            error("! increase mem_size") ;
      }
      addname(name, num) ;
      big_arr[num].data_offset = offset ;
   }
   fclose(f) ;
}
/*
 *   Here are the window and gadget structure definitions.
 */
short borders[] = { 0, 0,
                    100, 0,
                    100, 11,
                    0, 11,
                    0, 0 } ;
struct Border b1 = {
   0, 0,
   1, 0, JAM1,
   5,
   borders,
   NULL
} ;
struct IntuiText it1 = {
      1, 0,
      JAM1,
      10, 2,
      NULL,
      (UBYTE *)"Initialize",
      NULL,
} ;
struct Gadget g1 = {
   NULL,
   60, 16,
   100, 11,
   GADGHCOMP,
   RELVERIFY,
   BOOLGADGET,
   (APTR)&b1,
   NULL,
   &it1,
   0,
   NULL,
   'i',
   NULL
} ;
struct IntuiText it2 = {
      1, 0,
      JAM1,
      26, 2,
      NULL,
      (UBYTE *)"Finish",
      NULL,
} ;
struct Gadget g2 = {
   &g1,
   60, 32,
   100, 11,
   GADGHCOMP,
   RELVERIFY,
   BOOLGADGET,
   (APTR)&b1,
   NULL,
   &it2,
   0,
   NULL,
   'f',
   NULL
} ;
struct NewWindow p2window =
{
   400, 20,
   220, 50,
   -1, -1,
   GADGETUP,
   WINDOWDEPTH | WINDOWDRAG | SMART_REFRESH,
   &g2,
   NULL,
   (UBYTE *)"Radical Eye Software",
   NULL,
   NULL,
   0, 0,
   0, 0,
   WBENCHSCREEN
} ;
/*
 *   Gadgets does the fun stuff.  It opens a small window with two
 *   gadgets, a `initialize' and a `write_data' gadget.
 */
gadgets() {
   struct Window *mywindow = NULL ;
   struct IntuiMessage *im ;
   char gadg = ' ' ;
   int i ;

   if ((IntuitionBase = (struct IntuitionBase *)
         OpenLibrary("intuition.library", 0L))==NULL)
      error("! couldn't open intuition") ;
   if ((GfxBase = (struct GfxBase *)
         OpenLibrary("graphics.library", 0L))==NULL)
      error("! couldn't open graphics") ;
   if ((mywindow = OpenWindow(&p2window))==NULL)
      error("! couldn't open window") ;
   while (gadg != 'f') {
      Wait(1L << mywindow->UserPort->mp_SigBit) ;
      while (im = GetMsg(mywindow->UserPort)) {
         if (im->Class == GADGETUP)
            gadg = ((struct Gadget *)(im->IAddress))->GadgetID ;
         else
            gadg = ' ' ;
         ReplyMsg(im) ;
      }
/*
 *   For now, we turn this off, because _exit() calls close().
 *   We are just going to have to trust the user.
 *
      if (i=trap_glob.running) {
         DisplayBeep(mywindow->WScreen) ;
         printf("Last routine called: %d\n", i) ;
         gadg = ' ' ;
      } else */ { 
         if (gadg == 'i') {
            initialize() ;
         } else if (gadg == 'f') {
            break ;
         }
      }
   }
   CloseWindow(mywindow) ;
}
/*
 *   The initialize() routine sets all of the counts back to zero.
 *   The program had better not be running at this point!
 */
initialize() {
   register int i ;
   register struct rout_stat *p ;

   for (i=0, p=big_arr; i<=max_rout; i++, p++) {
      p->recursion_count = 0 ;
      p->invoke_count = 0 ;
      p->self = 0 ;
      p->children = 0 ;
   }
   trap_glob.error = 0 ;
}
/*
 *   The write_summary() simply writes the data in the structure
 *   to a file; it is recovered by p3.
 */
#define fix(i) (p->invoke_count==0?0.0:i*0.065169336/p->invoke_count)
write_summary() {
   register int i, j ;
   register struct rout_stat *p ;
   int *sort_ptrs ;
   int t ;
   long s ;

   if ((f = fopen(strcat(strcpy(filename,filearg),".mon"), "w"))==NULL)
      error("! couldn't open .mon file") ;
   if ((sort_ptrs = (int *)AllocMem(((long)max_rout+1)*2, 0L))==NULL)
      error("! couldn't allocate pointers file") ;
/*
 *   We use a bubble sort to sort the array.
 */
   s = 0 ;
   for (i=0; i<=max_rout; i++) {
      sort_ptrs[i] = i ;
      s += big_arr[i].self ;
   }
   if (s != 0) {
      for (i=max_rout; i>0; i--)
         for (j=0; j<i; j++)
            if (big_arr[sort_ptrs[j]].self < big_arr[sort_ptrs[i]].self) {
               t = sort_ptrs[i] ;
               sort_ptrs[i] = sort_ptrs[j] ;
               sort_ptrs[j] = t ;
            }
      fprintf(f, "               Self        Self + Children\n") ;
      fprintf(f, "# Calls   ms/call   %%time   ms/call   %%time  name\n") ;
      fprintf(f,
      "-------  ---------- -----  ---------- -----  ---------------------\n") ;
      for (i=0; i<=max_rout; i++) {
         p = big_arr + sort_ptrs[i] ;
         fprintf(f, "%7ld  %10.2f %5.2f  %10.2f %5.2f  %s\n",
               p->invoke_count, fix(p->self), (100.0*p->self)/s,
               fix(p->children), (100.0*p->children)/s, name(sort_ptrs[i])) ;
      }
   }
   if (trap_glob.error != 0)
      printf("Error no %d in trace.\n", trap_glob.error) ;
/*
 *   This is debug code!  Remove me!
 *
   {
      int *t ;
      for (i=0, t=(int *)((char *)big_arr + memsize); *t != -1; i++, t++)
         fprintf(f, "%04x\n", *t) ;
   }
 */
   FreeMem(sort_ptrs, ((long)max_rout+1)*2) ;
}
/*
 *   error() prints an error message and exits if it was fatal.
 */
error(s)
char *s ;
{
   printf("%s\n", s) ;
   if (*s=='!') {
      if (big_arr != NULL)
         FreeMem(big_arr, memsize) ;
      exit(1) ;
   }
}
---cut here---

rokicki@navajo.STANFORD.EDU (Tomas Rokicki) (12/23/86)

---cut here---
;:ts=8
;
;   (C) 1986 Radical Eye Software.  This code may be used and distributed
;   freely, so long as this notice stays in it and the banner messages
;   are unchanged except for the version notices.  Written by Tomas Rokicki
;   on December 19 through December 21 1986.
;
;   This is the assembly language source for the second phase of the
;   profiler.  It contains a vertical retrace interrupt routine and a
;   trap routine.
;
	cseg
;
;   First, the interrupt routine which maintains the tick count.
;   All `$123456's are stuffed with absolute addresses within the
;   C routine.
;
	public	_int_rout
_int_rout
	add.l	#$100,$123456
	jmp	$fc108c
;
;   Now we have the trap routine; this can get entertaining.  First,
;   we save a bunch of registers, and then load the global vector.
;   It's format (from p2.c) is:
;
;          struct tstruct {
;             short running ;     0
;             char *tick_add ;    2
;             char *dat_add ;     6
;             char *stack ;       10
;             char *stack_top ;   14
;             char *stack_bot ;   18
;             short error ;       22
;             char *debug_ptr ;   24
;          } trap_glob ;          28
;
;   Error bits:
;      0   Stack underflow (increase memory!)
;      1   Unlink from different routine than linked
;
;   The dat_add array is organized as follows:
;          struct rout_stat {
;             short recursion_count ;  0
;             short data_offset ;      2
;             long invoke_count ;      4
;             long self ;              8
;             long children ;         12
;          } ;                        16
;
;   The stack we create will look like this:
;          struct stack_entry {
;             short routine_number ;   0
;             long start_time ;        2
;          } ;                         6
;
;   On every push (routine link), we push a new start_time and
;   routine_number, checking for recursion.  On every pop (routine
;   unlink), we check that the routine we are unlinking is the
;   proper one.  If it isn't, we complain for now.
;   Then, we calculate the elapsed time, add it to the 'child'
;   field and the 'self' field.  We then pop the stack, and subtract
;   the elapsed time from the exposed entry (if any.)  We have to
;   check the stack bounds at each step, and set the error flag if
;   something goes amiss.
;
	public	_trap_rout
_trap_rout
	movem.l	regs,-(sp)
	move.l	#$123456,a0
	move.l	2(a0),a1
;
;   Get and mask out the vpos
;
	move.l	$dff004,d1
	and.l	#$1ff00,d1
;
;   We loop around until we are sure the vpos didn't change
;   while we were busy.  This is necessary in case an interrupt
;   occurs during this piece of code; we can't mask the
;   interrupts since we are in user mode.
;
loop1
	move.l	(a1),d0
	move.l	$dff004,d2
	and.l	#$1ff00,d2
	cmp.l	d1,d2
	beq	okay1
	move.l	d2,d1
	bra	loop1
;
;   Now we check it against 5.  The values we use are
;   0..1 -> 254..255; 9..262 -> 0..253.  No other values
;   should be possible.
;
okay1
	lsr.l	#8,d1
	sub.w	#2,d1
	bcs	over3
over2
	sub.w	#7,d1
over3
	move.b	d1,d0
;
;   Now the time is in d0.  For now, we ignore it.  We
;   get the parameter following the trap instruction.
;
	move.l	2+regsize(sp),a1
	move.w	(a1)+,d1
;
;   Debug code; turn this off!
;
;	move.l	24(a0),a2
;	move.w	d1,(a2)+
;	move.l	a2,24(a0)
;	tst.w	d1
;
;   We now return you . . .
;
	bmi	unlink_rout
	bne	link_rout
;
;   The exit routine simply sets the done flag and then jumps into
;   the normal link routine.  What it should do, eventually, is
;   virtually unlink everything.  Especially since it will never
;   exit.  Yeah, that's what we do.  It gets a charge of zero.
;
exit_rout
	move.l	a1,2+regsize(sp)
	move.w	d1,(a0)
	lsl.w	#4,d1
	movem.l	d1/d4,-(sp)
	move.l	6(a0),a1
	move.l	10(a0),a2
	move.w	(a2),d1
;
;   Now we pop things, calculating the elapsed time.
;
unstack
	move.l	d0,d4
	sub.l	2(a2),d4
;
;   We add the elapsed time to both variables of the current procedure,
;   and subtract it from the self variable for the above procedure.
;
	add.l	d4,8(a1,d1.w)
	add.l	d4,12(a1,d1.w)
	move.w	6(a2),d1
;
;   This works because only if we've run off the end do we
;   have a zero here.
;
	beq	unstkd
	sub.l	d4,8(a1,d1.w)
	lea	6(a2),a2
	bra	unstack
unstkd
	lea	6(a2),a2
	move.l	a2,10(a0)
	movem.l	(sp)+,d1/d4
;
;   Finally, we do a link-style exit.
;
	move.w	2(a1,d1.w),d3
	movem.l	(sp)+,regs
	move.w	(sp)+,d2
	move.l	(sp)+,a1
	move	d2,sr
	move.l	a5,-(sp)
	move.l	a7,a5
	lea	(a7,d3.w),a7
	jmp	(a1)
;
;   This is the routine used for a link instruction.  We increment a
;   counter at the moment; that's all.
;
link_rout
	move.l	a1,2+regsize(sp)
	move.w	d1,(a0)
link_2
	lsl.w	#4,d1
	move.l	6(a0),a1
	add.l	#1,4(a1,d1.w)
;
;   Now we push a new entry onto the stack.
;
	move.l	10(a0),a2
	lea	-6(a2),a2
	cmp.l	18(a0),a2
	bcc	okay4
;
;   If there was an error, we simply overwrite
;   current entry.
;
	lea	6(a2),a2
	or.w	#1,22(a0)
okay4
	move.l	d0,2(a2)
	move.w	d1,(a2)
	move.l	a2,10(a0)
;
;   This is the exit code.
;
	move.w	2(a1,d1.w),d3
	movem.l	(sp)+,regs
	move.w	(sp)+,d2
	move.l	(sp)+,a1
	move	d2,sr
	move.l	a5,-(sp)
	move.l	a7,a5
	lea	(a7,d3.w),a7
	jmp	(a1)
;
;   An unlink puts us here.  For now, we simply exit; nothing fancy.
;   We skip over the return address from the interrupt and just return
;   directly to the calling routine.
;
unlink_rout
	move.l	a1,2+regsize(sp)
;
;   We get the stack pointer, and make sure we are unlinking from
;   the same routine we linked.
;
	lsl.w	#4,d1
	move.l	6(a0),a1
	move.l	10(a0),a2
	cmp.w	(a2),d1
	beq	okay5
;
;   If it's not okay, we pretend it is for now, but only after
;   setting the appropriate error bit.
;
	or.w	#2,22(a0)
okay5
;
;   Now we pop things, calculating the elapsed time.
;
	sub.l	2(a2),d0
	lea	6(a2),a2
	move.l	a2,10(a0)
;
;   We add the elapsed time to both variables of the current procedure,
;   and subtract it from the self variable for the above procedure.
;
	add.l	d0,8(a1,d1.w)
	add.l	d0,12(a1,d1.w)
	move.w	(a2),d1
;
;   If we are at the end, we don't do this.
;
	beq	over7
	sub.l	d0,8(a1,d1.w)
;
;   Exit code for unlink.
;
over7
	movem.l	(sp)+,regs
	move.w	(sp)+,d3
	move.l	(sp)+,a1
	move	d3,sr
	unlk	a5
	rts
;
;   Registers used.
;
regs	reg	a0-a2/d0-d1
regsize	equ	20
	end
---cut here---