[comp.sys.hp] tracing/setting breakpoints on shared binaries and across fork

jinx@zurich.ai.mit.edu (Guillermo J. Rozas) (10/13/90)

Here's a question and a suggestion for HP-UX kernel and debugger gurus:

I help develop and maintain a Lisp system (MIT C Scheme).  The people
in our group are often running the binary for the latest version, and
occasionally come up with some strange behavior that cannot be
examined with its internal debugger, but instead requires the use of a
lower-level tool, like cdb, xdb, adb, or gdb.

The problem is that typically the binary is being run simultaneously
by multiple users (our set-up is a 30 client cluster served by an 850)
and it seems that no debugger allows me to set breakpoints after
attaching it to the running process.

Restarting the process is not a viable solution because the process
often has been running for hours or days and it would take that long
to reconstruct the current (wedged) state even if the appropriate
input could be typed from scratch.  The problems are not necessarily
reproducible either, so being able to debug the existing process is
crucial.

A related problem is attempting to debug a program that forks.  Again,
it seems that setting breakpoints across calls to fork produces
strange results at best (unexpected trace/bpt traps from the
children), and at worst does not work at all.  I run into this when
attempting to debug non-inetd-based network daemons.

Is there any way of bypassing this shared binary problem?

If there isn't, I think it is an important problem to resolve.
Besides debugging long-running shared programs, I suspect that
applications are slowly drifting away from the single process model
and towards the cooperating processes model and this problem is
definitely getting in the way.

Is there a serious technical reason why this cannot be solved easily?
I'm not being facetious; I would like to understand the issue.

Along the same lines, here's a suggestion:

Add a ptrace(2) option that tells the OS that any child processes
of the process being debugged should immediately receive the SIGSTOP
signal after the call to fork(2).  In this way, a new incarnation of
the debugger could be attached to the child in order to debug it.
Spurious children could be continued easily by manually sending
SIGCONT.

This would make debugging parallel programs (whether executing on
multiple CPUs or time-slicing on one) much easier.

jlol@REMUS.EE.BYU.EDU (Jay Lawlor) (10/13/90)

>>>>> On 13 Oct 90 05:56:06 GMT, jinx@zurich.ai.mit.edu (Guillermo J. Rozas) said:

GJR> Here's a question and a suggestion for HP-UX kernel and debugger gurus:

GJR> I help develop and maintain a Lisp system (MIT C Scheme).  The people
GJR> in our group are often running the binary for the latest version, and
GJR> occasionally come up with some strange behavior that cannot be
GJR> examined with its internal debugger, but instead requires the use of a
GJR> lower-level tool, like cdb, xdb, adb, or gdb.

GJR> The problem is that typically the binary is being run simultaneously
GJR> by multiple users (our set-up is a 30 client cluster served by an 850)
GJR> and it seems that no debugger allows me to set breakpoints after
GJR> attaching it to the running process.

GJR> Restarting the process is not a viable solution because the process
GJR> often has been running for hours or days and it would take that long
GJR> to reconstruct the current (wedged) state even if the appropriate
GJR> input could be typed from scratch.  The problems are not necessarily
GJR> reproducible either, so being able to debug the existing process is
GJR> crucial.

This seems to work (at least on the 300 series):

Let 'progA' be the program to be debugged, which might be in use by
multiple users.

% cp progA progB
% cdb -P <pid of progA> progB