[comp.arch] debugging on a VLIW machine

bukys@cs.rochester.edu (Liudvikas Bukys) (04/21/88)

I'm curious... to run a debugger on a Multiflow Trace, and make any sense
out of what you see, I suppose that you have to compile it with trace
scheduling and other hairy optimizations off?  Is this true?

(I suppose one could automatically intersperse debugger code between
every line of source code, and have the trace scheduler intertwine it
so that a source level debugger could still be made to work.  Is this done?)

root@mfci.UUCP (SuperUser) (04/26/88)

In article <8860@sol.ARPA> bukys@cs.rochester.edu (Liudvikas Bukys) writes:
>I'm curious... to run a debugger on a Multiflow Trace, and make any sense
>out of what you see, I suppose that you have to compile it with trace
>scheduling and other hairy optimizations off?  Is this true?
>
>(I suppose one could automatically intersperse debugger code between
>every line of source code, and have the trace scheduler intertwine it
>so that a source level debugger could still be made to work.  Is this done?)

Short answer: Trace scheduling doesn't make the problem any worse.

Long answer:

During the debugging process users reason about their program at the
source level.  Any optimization which breaks the correspondence
between source code and object code is going to make the code
difficult to reason about while debugging.  It is the job of the
optimizers to determine which features specified by the source are
essential for correctness and which features are incidental.
However, when compiling for a source level debugger, additional
constraints are placed on the compiler.  Features which were incidental
become essential.

One of the important properties users depend on while debugging is the
order of execution specified by the source.  If the compilation
process does not produce code which executes in the order specified by
the source, the user may find it confusing to debug.  Consider the
optimization of loop invariant motion.

	DO 10 I=1,10
		A = B(I) + F/Z
10 	CONTINUE

Say F/Z is loop invariant.  The object code will correspond more closely to this:

	T = F/Z
	DO 10 I=1,10
		A = B(I) + T
10 	CONTINUE

Now say the user sets a breakpoint at the entrance to the DO loop.
If Z is zero, a fault will be generated before the breakpoint is ever
reached.  Reasoning about the problem using the source code will lead
to confusion: "Why did the divide-by-zero fault occur on a line inside
the DO loop before the loop was entered?"

There are other properties besides order of execution which the user depends
on.  Consider copy propagation and dead code removal in the following:

	A = 1
	CALL S(A)

After these optimizations, the object code would correspond more closely to:

	CALL S(1)

By examining the source code, the user would expect to be able to set a
breakpoint at the call and change the value of A before the call is made.
But A is gone, so the user is confused again.

There's an interesting way to think about these optimizations.  The
optimizers must determine if certain criteria are met before the
optimization can be performed.  Copy propagation can take place only
if there is one reaching def at the call site.  Besides the reaching
def from "A=1" there is a reaching def from the user with the
debugger.  The same applies for loop invariant motion.  F/Z can be
moved out of the loop only if it is invariant.  If the program is
being compiled for debugging, then the optimizer can no longer assume
F and Z are invariant; the user may change them with the debugger.
In a sense the user with the debugger is supplying
reaching defs for all user variables to everywhere in the program.
[The user can be thought of as a def. :-)]
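
To make the reaching-def criterion concrete, here is a small C fragment
(my own illustration, not output from the Trace compiler):

	extern void s(int);

	void example(int flag)
	{
		int a;

		a = 1;          /* def #1 reaches the call site */
		if (flag)
			a = 2;  /* def #2 also reaches the call site */
		s(a);           /* two reaching defs, so the compiler
		                   may not rewrite this as s(1) */
	}

A user stopped at a breakpoint can store into "a" just as the "if" can,
so compiling for the debugger adds one more reaching def to every use.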

So the classical optimizations are shut off when compiling for the
debugger because the debugger violates the criteria of these optimizations.
If the optimizations were performed anyway, the code would be difficult
to debug.

Notice that so far none of this has to do with trace scheduling.
Trace scheduling must determine which sequencing information presented
in the source program is incidental, and which is essential.  By
eliminating incidental sequencing information, the trace scheduler is
able to maximize parallelism.  But trace scheduling has exactly the
same problem.  Once the decision has been made to compile for the 
debugger, a great deal of sequencing information which was incidental
is now essential.
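
For instance (a made-up C fragment of my own, not Multiflow output),
take two statements with no data dependence between them:

	int a, b, x, y, p, q;

	void pair(void)
	{
		a = x + y;      /* neither statement reads what the
		                   other writes, so their source order
		                   is incidental; the trace scheduler
		                   may pack both into one wide
		                   instruction */
		b = p * q;
	}

Compiled for the debugger, that ordering becomes essential: a user
stopped between the two lines expects to find "a" already written and
"b" not yet written.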

Making the decision to compile for the debugger, and accepting a loss
of performance in doing so, is standard procedure for users.

Of course all of this has to do with source level debugging.  This is
what happens when using a debugger such as dbx under Unix.  If one
is willing to accept that the compiler will make transformations on
the source which break the source-to-object correspondence, then an
object level debugger such as adb can be used.

The OS group at Multiflow uses adb on optimized code with little
trouble.  The optimizers and the trace scheduler scramble the code
thoroughly.  Reasoning about an arbitrary point in a program using
the source code is difficult.  However, at the entry to each function
things are a little easier.  Often the state of global variables
and function arguments is enough of a clue to determine what's going on
in the kernel.  In addition, those who debug the kernel understand
the machine language and are sometimes willing to wade into a function.

----------------------------------------------------------------------
Chris Genly, genly@multiflow.com

peter@athena.mit.edu (Peter J Desnoyers) (04/26/88)

In article <365@m3.mfci.UUCP> genly@multiflow.com (Chris Hind Genly) writes:
>In article <8860@sol.ARPA> bukys@cs.rochester.edu (Liudvikas Bukys) writes:
>>I'm curious... to run a debugger on a Multiflow Trace, and make any sense
>>out of what you see, I suppose that you have to compile it with trace
>>scheduling and other hairy optimizations off?  Is this true?
>
>Short answer: Trace scheduling doesn't make the problem any worse.
>
I've gotten confused enough by debugging 68000 code at the source
level. Under some conditions, the Apollo C compiler will merge code
for parts of separate cases of a switch statement. Thus the debugger
thinks that you jumped by some magic from the middle of case 10 to
case 1. Debugging Multiflow code can't be much worse, as long as it
doesn't inline user subroutines. The only problem I can see is that
the current line might be off by one or two (although it should always
be able to tell whether the next instruction should call a subroutine)
and that steps might cover quite a few lines of code.
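
The sort of thing I mean, as a hypothetical C fragment (not actual
Apollo compiler output):

	int f(int), g(int);
	void cleanup(void);
	int x, y;

	void demo(int n)
	{
		switch (n) {
		case 1:
			x = f(n);
			cleanup();      /* identical tail code...          */
			break;
		case 10:
			y = g(n);
			cleanup();      /* ...so the compiler may emit one
			                   copy and jump both cases to it  */
			break;
		}
	}

Step through case 10 and the line table can put the PC on the
cleanup() line of case 1.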

				Peter Desnoyers
				peter@athena.mit.edu

gwu@clyde.ATT.COM (George Wu) (04/27/88)

In article <8860@sol.ARPA> bukys@cs.rochester.edu (Liudvikas Bukys) writes:
>I'm curious... to run a debugger on a Multiflow Trace, and make any sense
>out of what you see, I suppose that you have to compile it with trace
>scheduling and other hairy optimizations off?  Is this true?
>
>(I suppose one could automatically intersperse debugger code between
>every line of source code, and have the trace scheduler intertwine it
>so that a source level debugger could still be made to work.  Is this done?)


     I'm an expert on neither the Multiflow machine nor debuggers, so this
is really more conjecture than fact, but anyway: I'd be surprised to see
debugging code stirred into user code by a compiler. My guess: for each
line of source code, the compiler stores into a table the address of the
first and last instruction it generates (or maybe just the first, since
that's all it really needs). The debugger can then use this to map the PC
to a line of source code.
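
     Such a table might look something like this in C (my names, purely
a sketch; real Unix compilers emit something similar in their symbol
tables):

	struct line_entry {
		unsigned long first_pc;  /* first instruction for the line */
		int line;                /* source line number */
	};

	/* Map a PC back to a source line: take the last entry whose
	   first_pc is <= pc.  Assumes the table is sorted by address. */
	int pc_to_line(struct line_entry *tab, int n, unsigned long pc)
	{
		int i, best = -1;

		for (i = 0; i < n; i++)
			if (tab[i].first_pc <= pc)
				best = tab[i].line;
		return best;
	}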

     In order to set breakpoints, the debugger writes, into the appropriate
location in the instruction space, an instruction which will cause a trap
to the debugger. (NOTE: I've actually seen this done on a single-user
MC68000 board. BUT, it won't work quite so easily on a multiuser system
where multiple processes may share the same instruction segment. Copy
on write? Nah, can't be. Maybe your original idea is right: one trap
inserted by the compiler into the object/assembly for each source-to-object
map entry. I dunno.) The overwritten instruction will, of course,
eventually have to be restored.
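
     On Unix the mechanics might look roughly like this, assuming a
ptrace(2)-style interface (request names and the trap opcode vary by
system; BPT_OPCODE below is a stand-in, e.g. TRAP #15 on an MC68000):

	#include <sys/ptrace.h>

	#define BPT_OPCODE 0x4e4f  /* hypothetical: MC68000 TRAP #15 */

	/* Plant a breakpoint: save the original word, then overwrite
	   it with a trap instruction in the child's text segment. */
	long set_breakpoint(int pid, char *addr)
	{
		long saved;

		saved = ptrace(PTRACE_PEEKTEXT, pid, addr, 0);
		ptrace(PTRACE_POKETEXT, pid, addr, BPT_OPCODE);
		return saved;  /* caller keeps this to restore later */
	}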

     As for source level execution traces, before each line of source
code is executed, determine what the address of the next line to be
executed will be and write a trap there. Gets kinda tricky for loops,
conditionals, and such. It might be easier to write a trap to all locations
where execution might go next, rather than trying to predict exactly
where it will go. And of course, if the compiler does indeed insert its own
traps into the object code, traces are simple.
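
     Continuing the same sketch (hypothetical helpers, building on the
set_breakpoint above), single-stepping a source line then amounts to
trapping every address control might reach next:

	#include <sys/wait.h>

	#define MAX_SUCCS 8  /* assumed bound on successor count */

	/* Step one source line: plant traps at all possible next
	   lines, resume the child, then restore the saved words
	   once it stops on whichever trap it actually hit. */
	void step_line(int pid, char *succs[], int nsucc)
	{
		long saved[MAX_SUCCS];
		int i;

		for (i = 0; i < nsucc; i++)
			saved[i] = set_breakpoint(pid, succs[i]);
		ptrace(PTRACE_CONT, pid, 0, 0);  /* run to a trap */
		wait((int *)0);
		for (i = 0; i < nsucc; i++)
			ptrace(PTRACE_POKETEXT, pid, succs[i], saved[i]);
	}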

     One needs to distinguish between the two methods here; call them
"static" and "dynamic" debuggers. A static debugger is one where the
compiler intersperses its own code, which implies that for this object,
you are *always* debugging. (Unless you can mask this particular interrupt,
or use a null debugging routine and have the debugger load a new non-null
routine, i.e. itself.) The dynamic method means you have to load your
debugger and object.

     This seems more like an OS question to me, but I find it interesting,
and would like to see followups by some gurus. Any takers, or perhaps I
should move to comp.unix.wizards?

-- 
					George J Wu

UUCP: {ihnp4,ulysses,cbosgd,allegra}!clyde!gwu
ARPA: gwu%clyde.att.com@rutgers.edu or gwu@faraday.ece.cmu.edu