[comp.unix.wizards] Debugging the kernel: proper methods?

randy@uw-june.UUCP (William Randy Day) (06/21/87)

Is there a proper method for debugging kernel routines? Suppose, for
instance, that I would like to watch the operation of the getf() routine
in the kernel. What's the best way to proceed? I suppose I could make
a kernel with printf's dumping what I want to see, but that
would be infeasible with a commonly used function like getf().

Thanks,
Randy Day.
Internet (ARPA): randy@dbnet.cs.washington.edu
CSNET: randy%washington@relay.cs.net
UUCP: {decvax|ihnp4}!uw-beaver!uw-june!randy

djl@mips.UUCP (Dan Levin) (06/22/87)

In article <2713@uw-june.UUCP>, randy@uw-june.UUCP (William Randy Day) writes:
> Is there a proper method for debugging kernel routines?

Well, it all depends on what facilities you have available.

When I want to debug my kernels here at MIPS, I crank up our version of
dbx that understands how to debug running kernels. Then I put a breakpoint
in the interesting routine, and use standard symbolic tools to analyze
whatever I am interested in.

In past lives, I have spent a fair amount of time hunched over a hex symbol
table running a PROM level debugger under a kernel.  You need a
disassembly of the routine you want to examine, but most machines provide
some type of PROM debugger that lets you step through code. Just put a
breakpoint in the routine in question, and away you go.  Sometimes you
need to put a special instruction (like a brk or some kind of intr) in
to drop you into the debugger.  You can use adb to patch it into the
routine in question.

A final, and very painful, technique, is to use printf()'s.  It helps if
you have a method for controlling which printf()'s are enabled at any given
time, otherwise you get reams of stuff off the console.  Note that printf()'s
are sometimes the only way to get context on a problem...

-- 
			***dan

decwrl!mips!djl                  mips!djl@decwrl.dec.com

jfh@killer.UUCP (John Haugh) (06/24/87)

In article <479@winchester.UUCP>, djl@mips.UUCP (Dan Levin) writes:
> In article <2713@uw-june.UUCP>, randy@uw-june.UUCP (William Randy Day) writes:
> > Is there a proper method for debugging kernel routines?
> 
> A final, and very painful, technique, is to use printf()'s.  It helps if
> you have a method for controlling which printf()'s are enabled at any given
> time, otherwise you get reams of stuff off the console.  Note that printf()'s
> are sometimes the only way to get context on a problem...

I didn't invent the sprintf idea - John Bremsteller at Pinnacle was the
first I saw to use it.  When he had a problem, he would add these things
he called 'snaps' (like snapshot) to the code.  It was basically a sprintf
that also gave the stacked PS and PC registers.  All this went into a big
ring buffer (64K or so). There were tools for turning snaps on and off as
well as several ways of viewing them.

This way nothing gets on the screen and you can look quite a ways back in
time to find the problem.
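
For anyone who wants to try the idea, a snap facility along these lines
might look roughly like the sketch below.  It is not the Pinnacle code:
the names snap(), snap_dump(), snap_on and snapbuf are made up, the
buffer is sized at 64K only to match the description above, and the
stacked PS and PC are left out because capturing them is machine
dependent.  It is written as ordinary user-level C with stdarg so that
it can be compiled and tried outside a kernel.

```c
#include <stdarg.h>
#include <stdio.h>

#define SNAPSIZE (64 * 1024)    /* "64K or so", per the description */

static char snapbuf[SNAPSIZE];  /* the ring itself */
static unsigned long snaphead;  /* total bytes ever written to the ring */
int snap_on = 1;                /* hypothetical on/off switch */

/* Format a message and append it to the ring, overwriting the oldest data. */
void snap(const char *fmt, ...)
{
    char line[256];
    va_list ap;
    int n, i;

    if (!snap_on)
        return;
    va_start(ap, fmt);
    n = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if (n >= (int)sizeof line)
        n = sizeof line - 1;    /* message was truncated to fit line[] */
    for (i = 0; i < n; i++)
        snapbuf[snaphead++ % SNAPSIZE] = line[i];
}

/* Copy the most recent bytes, oldest first, into out; returns the count. */
size_t snap_dump(char *out, size_t outsize)
{
    unsigned long avail = snaphead < SNAPSIZE ? snaphead : SNAPSIZE;
    unsigned long start, i;

    if (outsize == 0)
        return 0;
    if (avail > outsize - 1)
        avail = outsize - 1;    /* leave room for the terminating NUL */
    start = snaphead - avail;
    for (i = 0; i < avail; i++)
        out[i] = snapbuf[(start + i) % SNAPSIZE];
    out[avail] = '\0';
    return avail;
}
```

A call such as snap("getf fd=%d\n", fd) costs one format and a copy, so
it can stay in a hot routine, and clearing snap_on silences every snap
at once without rebuilding anything.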

- John.

Disclaimer:
	No disclaimer.  Whatcha gonna do, sue me?

stevesu@copper.TEK.COM (Steve Summit) (06/26/87)

Nobody has mentioned this yet, so I'll toss it in, although I
don't know the full details: someone once built adb into the
kernel, so it could "debug itself," so to speak.  You typed some
magic command, and the console terminal started talking adb.
You could set breakpoints and everything.

Building adb into a program is actually easier than it sounds.
The only real problem is uniqueifying all of its global variables
so they don't clash with those in your program.  You also need a
special version of ptrace that can examine and modify locations
in your own process instead of another one.

I built a copy of adb into a window manager I was working on --
invoking a special window manager command would cause a new
window to open up with adb "running" in it.  This was handy
because using conventional adb on a full-screen, interactive
process like a window manager is tricky (and requires two
terminals), and because most of the problems this window manager
had were hard to reproduce, but with adb "already there," you
could track down a bug when it appeared, rather than having to
recapture it in a later run under adb.

(Actually, I lied: it's only easy to build adb in if you leave
out breakpoints, which I did, because all I really wanted to do
was examine data structures.  Getting breakpoints to work would
require a writable text segment and a _real_ clever SIGTRAP handler.)

					Steve Summit
					stevesu@copper.tek.com

P.S.  Here, for your amusement, is the "special version of ptrace
that can examine and modify locations in your own process."  As
you can see, it's not rippingly difficult to write.  (Of course,
it ignores the requests having to do with running the other
process.  It also doesn't handle the u-area stuff.)

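	ptrace(request, pid, addr, data)
	int request;
	int pid;
	int *addr;
	int data;
	{
	switch(request)
		{
		case 0:			/* trace me: nothing to do */
			return(0);

		case 1:			/* read I space */
		case 2:			/* read D space; sorry, no split I&D */
			return(*addr);

		case 3:			/* read u-area */
			return(*addr);		/* ??? */

		case 4:			/* write I space */
		case 5:			/* write D space; sorry, no split I&D */
			*addr = data;
			return(0);

		case 6:			/* write u-area */
			*addr = data;		/* ??? */
			return(0);

		default:		/* requests that run the other process: ignored */
			return(0);
		}
	}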
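
To see the idea at work with a current compiler, the same logic can be
restated with a prototype and exercised on a local variable, as in the
sketch below.  The names self_ptrace, peek_poke_demo and the REQ_
constants are made up for illustration; the request numbers simply
follow the cases in the function above.  Unlike the original, which
silently ignores the remaining requests, this version reports them
with -1 so that the difference is visible.

```c
/* Request numbers follow the cases in the function above. */
#define REQ_PEEKTEXT 1
#define REQ_PEEKDATA 2
#define REQ_POKETEXT 4
#define REQ_POKEDATA 5

/* Same logic as above, renamed so it cannot clash with the system's ptrace. */
int self_ptrace(int request, int pid, int *addr, int data)
{
    (void)pid;                  /* always our own process */
    switch (request) {
    case REQ_PEEKTEXT:
    case REQ_PEEKDATA:
        return *addr;           /* read a word of our own memory */
    case REQ_POKETEXT:
    case REQ_POKEDATA:
        *addr = data;           /* write a word of our own memory */
        return 0;
    default:
        return -1;              /* anything else is unsupported here */
    }
}

/* Peek at a variable, poke a new value into it, and return what reads back. */
int peek_poke_demo(void)
{
    int nopen = 17;             /* stands in for some interesting data */

    if (self_ptrace(REQ_PEEKDATA, 0, &nopen, 0) != 17)
        return -1;
    self_ptrace(REQ_POKEDATA, 0, &nopen, 42);
    return self_ptrace(REQ_PEEKDATA, 0, &nopen, 0);
}
```

A debugger built into the program would make the same calls it normally
makes against a child process; only the meaning of "addr" changes, from
an address in the child to an address in the caller.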

chris@mimsy.UUCP (Chris Torek) (06/27/87)

In article <479@winchester.UUCP>, djl@mips.UUCP (Dan Levin) writes:
>... I crank up our version of dbx that understands how to debug
>running kernels. ...
 
>In past lives, I have spent a fair amount of time hunched over a hex symbol
>table running a PROM level debugger under a kernel. ...

>A final, and very painful, technique, is to use printf()'s.

Funny, I usually use the old-fashioned technique:  I look at the
code and see why it did what it did, and recode it to do what it
should have done. :-)

Actually, running-kernel debuggers are handy.  There is one in use
internally at Berkeley.  Look for it someday in a BSD distribution.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

meissner@xyzzy.UUCP (Michael Meissner) (06/30/87)

In article <1168@copper.TEK.COM> stevesu@copper.TEK.COM (Steve Summit) writes:
> Nobody has mentioned this yet, so I'll toss it in, although I
> don't know the full details: someone once built adb into the
> kernel, so it could "debug itself," so to speak.  You typed some
> magic command, and the console terminal started talking adb.
> You could set breakpoints and everything.

When I came to Data General 8 years ago, this was ancient history (binding
a real debugger into the kernel).  I have pity for those who try to
debug an OS with print statements.  Our UNIX group now has a source level
debugger available (the source level debugger runs on a host computer and
talks to the assembler level debugger linked in the kernel over async lines).
-- 
Michael Meissner, Data General.		Uucp: ...!mcnc!rti!xyzzy!meissner

zemon@felix.UUCP (Art Zemon) (07/02/87)

In article <7224@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>
>Funny, I usually use the old-fashioned technique:  I look at the
>code and see why it did what it did, and recode it to do what it
>should have done. :-)

Actually this isn't so far fetched.  We are all going to
spend X amount of time writing a program.  Now we can either
spend a little time writing it and a bunch of time debugging
it or we can spend all of the time writing it correctly the
first time and no time debugging it.  Since the choice is
ours, we obviously prefer debugging to writing!

Of course *I* only spend time debugging so the rest of you
mortals won't feel inferior....  :-)
--
	-- Art Zemon
	   FileNet Corporation
	   Costa Mesa, California
	   ...!hplabs!felix!zemon

haynes@ucscc.UCSC.EDU.ucsc.edu (99700000) (07/05/87)

Believe I mentioned this already some time ago, but here again is an
idea stolen from the venerable Burroughs 5500.

Have a word or two in the kernel, called the option word(s).
Have a system call to read and write the option word.  Have
programs using this system call to display the option word
and to allow setting or clearing any bit.  (OK, you can just
use adb and not bother with the system call and the program,
but Burroughs made it convenient to use.)

#define a handy name for each bit of the option word you want to
use.

Now in the kernel you can put debugging printfs in the code that
are conditional on bits in the option word, so you can turn them
on or off while the system is running.  Of course you have to be
smart to choose what you are going to printf.  Some of the existing
printfs could well be put under option word control (e.g. the one
for file system full) so that once you have noted the condition you
can get the mess off the console terminal while you deal with the
situation.

Point is that having the option word encourages you to put in all
the printfs you think you might want, because if they turn out to
be uninteresting you can just turn them off without having to
recompile the kernel and reboot.

Of course you can use bits in the option word for things other
than printf-s.  Such as to enable or disable some experimental
piece of code that is supposed to fix a problem or improve performance.
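
A rough sketch of the scheme in C follows.  All of the names in it are
invented for illustration: optword, the OPT_ bits, DPRINTF, and the
optread()/optset() routines, which stand in for the system call that
reads and writes the option word.  It is user-level code, so printf
here is the ordinary library printf rather than the kernel's.

```c
#include <stdio.h>

/* One bit per option; the names are invented for this sketch. */
#define OPT_GETF    0x0001      /* trace descriptor lookups */
#define OPT_FSFULL  0x0002      /* report "file system full" */
#define OPT_EXPER   0x0004      /* enable an experimental code path */

unsigned int optword = OPT_FSFULL;      /* the option word */

/* Stand-ins for the system call that reads and writes the option word. */
unsigned int optread(void)
{
    return optword;
}

void optset(unsigned int bit, int on)
{
    if (on)
        optword |= bit;
    else
        optword &= ~bit;
}

/* Print only when the given option bit is set.  The printf arguments are
 * passed in their own parentheses so the macro works without varargs. */
#define DPRINTF(bit, args) \
    do { if (optword & (bit)) printf args; } while (0)

/* Example call site: returns 1 if the trace was enabled, 0 if not. */
int trace_getf(int fd)
{
    DPRINTF(OPT_GETF, ("getf: fd=%d\n", fd));
    return (optword & OPT_GETF) != 0;
}
```

With this in place, optset(OPT_GETF, 1) turns the getf trace on while
the system is running and optset(OPT_GETF, 0) turns it off again; the
test on optword is the only cost when the bit is clear.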

Jim Haynes
haynes@ucscc.ucsc.edu
haynes@ucscc.bitnet
...ucbvax!ucscc!haynes

mash@mips.UUCP (John Mashey) (07/07/87)

Sigh.  Once again, it sounds like the shoemaker's children have no shoes
[with the exception, so far, of Michael Meissner's note on DG debug environment.]
Despite all the talk of CASE, productivity tools, etc, it's sad to
find that people are still using adb, printfs, etc.  In particular,
when you're debugging things that UNIX has now gotten to, this is
nontrivial.  We did something like what Michael described, only a little
more maybe:

Before we'd even gotten silicon, we had the following:
1) a program that turned MIPS object code into VAX object code,
so it would run at a reasonable rate of speed.
2) a dbx variant that would deal with the results of 1), so you could
think you were actually dealing with a MIPS machine.

3) an instruction-level simulator, for debugging bootproms, kernels, etc.
4) a dbx variant that talks to 3).

after we got hardware, we ended up with:

5) a dbx variant that can download a kernel or standalone across
Ethernet onto a testbed, along with a debug monitor that talks
to the dbx across a serial link.

After you get used to this, there is simply no going back.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086