[net.unix-wizards] How do *you* debug device drivers?

jack@vu44.UUCP (Jack Jansen) (12/28/84)

[A crash a day keeps the users away]

 After reading the articles about VM/370, I was wondering how the
average unix-wizard debugs his/her device driver (or other kernel mods).

 What I usually do is the most obvious thing: kicking everyone off
the system, loading my new unix, (usually) watching it crash,
and examining either the remains of it, or the real thing in action.

 Since this usually involves awful things like printf's on the console
and lots of booting, I wondered whether there might be anyone out
there who developed a more reasonable way of debugging kernel mods.

Waiting for the great and simple solutions,
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack
If *this* is my opinion, I wasn't sober at the time.

howard@cyb-eng.UUCP (Howard Johnson) (01/01/85)

>  After reading the articles about VM/370, I was wondering how the
> average unix-wizard debugs his/her device driver (or other kernel mods).

In my experience with debugging device drivers, the next step up from
printf's and intuition is an in-circuit emulator.  (Well, it works for
68000's; I don't know about Vaxen.)  Sometimes a fancy stand-alone software
monitor will help.

> Waiting for the great and simple solutions,

When *I* hear about those great and simple solutions, I'll be trying to jump
on the bandwagon before my job disappears.
-- 
	Howard Johnson		Cyb Systems, Austin, TX
..!{gatech,harvard,ihnp4,nbires,noao,seismo}!ut-sally!cyb-eng!howard

bux@dual.UUCP (Dave Buxbaum) (01/01/85)

I agree that printf's are a rather crude method for debugging drivers.
One way to make them more useful is to add an "event trigger". The
idea is to include a global variable which controls the printf. Initially
set to zero, this variable is bumped upon reaching the desired "event".
It is useful to have access to the trigger from ADB.

An example:

 - in the code looking at status:
		if (event_happened) {
		#ifdef DEBUG
			trigger++;	/* arm the debugging printf's */
		#endif /* DEBUG */
			/* Normal action here */
		}

- in the code where more info is needed:
		#ifdef DEBUG
		if (trigger)
			printf(" SOMETHING HERE");
		#endif /* DEBUG */

This idea can be expanded to include different levels of debugging on a
"per bit" flavor. Also, you can flip the trigger using the debugger.  This
can be very useful.
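
A minimal sketch of the "per bit" variant, for illustration only. The
bit names and the helpers dbg_arm() and dbg_msg() are hypothetical, and
the ordinary stdio printf stands in here for the kernel's printf:

```c
#include <stdio.h>

/* Hypothetical per-bit debug classes; the names are illustrative only. */
#define DBG_INTR   0x01   /* interrupt-path messages */
#define DBG_QUEUE  0x02   /* queue-manipulation messages */
#define DBG_ERROR  0x04   /* error-path messages */

/* Global trigger word: zero means silent.  Can be patched from a debugger. */
unsigned int trigger = 0;

/* Called at the point where the interesting "event" is detected. */
void dbg_arm(unsigned int bits)
{
    trigger |= bits;
}

/* Prints msg only if its class bit is set in the trigger word.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
int dbg_msg(unsigned int bit, const char *msg)
{
    if ((trigger & bit) == 0)
        return 0;
    printf("DEBUG: %s\n", msg);
    return 1;
}
```

With this arrangement, dbg_arm(DBG_QUEUE) turns on only the queue
messages, and clearing the word again (from code or from the debugger)
silences everything.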

Another thing to try is using ADB on the running kernel.  This is always
a source of fun and excitement, usually leading to some spectacular crashes.
The idea is, of course, to look at queues and status bytes and other related
structures to try to get an idea of what is happening.

I think debugging drivers is really the most fun a programmer can hope to have
while at work ... What a comforting thought!!!



	David Buxbaum

	dual!bux@BERKELEY.ARPA
	{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!bux
	Dual Systems Corporation, Berkeley, California

guy@rlgvax.UUCP (Guy Harris) (01/02/85)

If you look at the source to the 4.2BSD "adb" (and, I believe, the S3 and
S5 "adb") for the VAX, there is an EDDT #ifdef that seems to be part of
somebody's effort to make a version of "adb" that can be linked or loaded
with a kernel and used on it.  I've thought about seeing how much of that
can be made to work (especially with the help of the 4.2BSD or S5 "standalone"
libraries, which could enable "adb" to read the kernel's symbol table from
/<whatever your kernel is called>).  You might want to play with that.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

chris@umcp-cs.UUCP (Chris Torek) (01/02/85)

Well, the ideal way is not to write any bugs.

Of course, we sometimes tend to fall short of the ideal.  I like to see
where the machine crashed or what the erroneous behaviour was, think a
while, and come up with something that explains the problem exactly,
without "explaining" things that didn't happen.  Then it's time to look
at the code and see whether that explanation is correct.

If all goes well, this takes only a few minutes, after which we get to
observe the *next* crash . . . :-)
-- 
(This line accidentally left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/02/85)

> Well, the ideal way is not to write any bugs.
> 
> Of course, we sometimes tend to fall short of the ideal.  I like to see
> where the machine crashed or what the erroneous behaviour was, think a
> while, and come up with something that explains the problem exactly,
> without "explaining" things that didn't happen.  Then it's time to look
> at the code and see whether that explanation is correct.

Right on!  If you cannot legitimately EXPECT your code to work right
the first time, then you do not have it under control.  Better to think
it out then do it right, rather than to tediously hack away hoping to
get that "last known bug" out eventually.

henry@utzoo.UUCP (Henry Spencer) (01/02/85)

Short of virtual-machine systems, which (alas) are hard to do on many
machines, there isn't an entirely satisfactory answer.  However, note
that Rob Warnock gave a paper on this very subject at Salt Lake, and
there is at least a summary of it in the proceedings.  His conclusion
was that you can do a lot of debugging in user mode if you really try.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

karl@osu-eddie.UUCP (Karl Kleinpaste) (01/04/85)

----------
>If you look at the source to the 4.2BSD "adb" (and, I believe, the S3 and
>S5 "adb") for the VAX, there is an EDDT #ifdef that seems to be part of
>somebody's effort to make a version of "adb" that can be linked or loaded
>with a kernel and used on it. ---Guy Harris of Computer Consoles, Inc
----------
Yes, and CCI's own PERPOS operating system (derivative of Unix OS [System
?3?5? don't know which any more]) has this rather neat facility called
"ebug" which is a primitive form of adb compiled right into things.  It is
without a doubt one of the better tools available for debugging that OS.  I
was really glad I had it available to me when I was working there.

Now, it is quite primitive, allowing only absolute addressing, thus making a
sorted namelist of the OS an essential item to have on hand when debugging
the kernel; but nonetheless you can't argue with a darn good idea.
Breakpointing the kernel is fun, anyway!
-- 
From the badly beaten keyboards of him who speaks     +-best address
in textured Technicolor *TyPe* f-O-n-T-s...           |
						      |
Karl Kleinpaste @ Bell Labs, Columbus   614/860-5107  +---> cbrma!kk
                @ Ohio State University 614/422-0915  osu-eddie!karl

rpw3@redwood.UUCP (Rob Warnock) (01/05/85)

+---------------
|                                                     ... However, note
| that Rob Warnock gave a paper on this very subject at Salt Lake, and
| there is at least a summary of it in the proceedings.  His conclusion
| was that you can do a lot of debugging in user mode if you really try.
|	Henry Spencer |	{allegra,ihnp4,linus,decvax}!utzoo!henry
+---------------

As I already sent a copy to the original requestor, I will not post it
here (~200 lines) unless the demand warrants. Henry states my general
position well: You can do a lot, actually nearly all, from user mode.

CAVEAT: The devices you're trying to talk to must not permanently lose
data if you don't service them speedily. What this means is that debugging
disks, Ethernets, streamer cartridge tapes, and other block-at-a-time devices
is easy; debugging synchronous comm lines is o.k.  if you put the packet
frame level (NOT the protocol, just the frame) in the kernel and leave
the protocol in user mode; debugging speech processors is a bit harder
(but I'm doing it anyway these days). Debugging byte-at-a-time disk
controllers (like an Apple "Woz chip" controller) is asking for your
disk to be erased the first time "cron" wakes up while you have the
write gate open! ;-}

What lets user-mode debugging work in general is that IN MOST CASES
the time-critical part of the driver ("catch the interrupt") is not
the algorithmically complex part of the driver ("what do I do with
THIS frame?"), so you can separate them. Put just a stub in the kernel
to "break the interrupt latency" and work on the hard stuff in user mode.
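
A rough sketch of that split, simulated entirely in one user process.
Everything here is an assumption made for illustration: the ring size,
the frame layout, the error bit 0x80, and the names stub_intr() and
drain().  A real stub would also need whatever locking or interrupt
masking the machine requires; this sketch ignores concurrency.

```c
#define RING_SIZE 8              /* hypothetical; must be a power of two */
#define ST_ERROR  0x80           /* hypothetical "frame in error" status bit */

struct frame {
    unsigned char status;        /* device status captured at interrupt time */
    unsigned char data;          /* one byte of payload, for illustration */
};

struct ring {
    struct frame slot[RING_SIZE];
    unsigned int head;           /* count of frames the stub has stored */
    unsigned int tail;           /* count of frames the user code has consumed */
};

/* The "stub": the only part that would have to live in the kernel.
 * It just records what the device said and returns at once.
 * Returns 1 if the frame was stored, 0 if the ring was full. */
int stub_intr(struct ring *r, unsigned char status, unsigned char data)
{
    if (r->head - r->tail == RING_SIZE)
        return 0;                /* ring full: this frame is dropped */
    r->slot[r->head % RING_SIZE].status = status;
    r->slot[r->head % RING_SIZE].data = data;
    r->head++;
    return 1;
}

/* The "hard stuff", run at leisure in user mode: decide what to do with
 * each frame.  Here it merely sums the payload of error-free frames.
 * Returns the number of frames taken off the ring. */
int drain(struct ring *r, unsigned int *sum)
{
    int n = 0;

    while (r->tail != r->head) {
        struct frame *f = &r->slot[r->tail % RING_SIZE];

        if ((f->status & ST_ERROR) == 0)
            *sum += f->data;
        r->tail++;
        n++;
    }
    return n;
}
```

The point of the arrangement is that stub_intr() is short enough to
get right by inspection, while drain() can be edited, recompiled and
run under an ordinary debugger without rebooting anything.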


Rob Warnock
Systems Architecture Consultant

UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404

bsa@ncoast.UUCP (Brandon Allbery (the tame hacker on the North Coast)) (01/08/85)

System V may be on its way to a solution.  It seems to me that someone
could write a 'fake kernel' to run as a Unix user process, and use that
to debug device drivers, provided that a user address space can be >256K
(of course, 512K might also do, since it wouldn't have to be a full multiuser
kernel, just a device driver debugging aid).

--bsa
-- 
  Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
	6504 Chestnut Road, Independence, Ohio 44131   (216) 524-1416
    Who said you had to be (a) a poor programmer or (b) a security hazard
			     to be a hacker?

robert@cheviot.UUCP (Robert Stroud) (01/09/85)

<This line is a figment of your imagination>

I have found lots of printf's very effective especially since the
Unix workstation I debug my kernels on has a habit of stopping dead
(or crashing unrecoverably into an incomprehensible microcode debugger)
whenever there is a problem. I seriously think that this sort of total
failure mode is a useful debugging aid as well (well nearly seriously :-)!

Of course, printf's are very unscientific, but there are a couple of useful
techniques you can use to improve things. Using a macro it is very easy
to introduce the idea of a debugging level - the information is only
printed if the level is higher than the argument to the macro. For example,

# define dbgprint(x,y)  if (level > x) printf(y)

where y could be a list of arguments (unfortunately this means that you need
versions of the macro for 0, 1, 2 etc. arguments), and level is either a global
variable known to the device driver or, if you're being really selective,
local to each minor device (i.e. a component of your minor device control
structure). You can set level with an ioctl, adb or kmem - I prefer the first.
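
For comparison, a sketch of the same macro written with a C99 variadic
macro, which removes the need for one version per argument count.  This
assumes a C99 (or later) compiler, which is not what the compilers of
the day provide, and uses the stdio printf as a stand-in for the
kernel's:

```c
#include <stdio.h>

/* Debugging level: a global here; it could equally be a per-device field. */
int level = 0;

/* Print only when the current level is higher than x.  The do/while(0)
 * wrapper makes the macro behave as a single statement, so it is safe
 * inside an unbraced if/else.  The printf arguments are not evaluated
 * at all when the message is suppressed. */
#define dbgprint(x, ...) \
    do { if (level > (x)) printf(__VA_ARGS__); } while (0)
```

A call such as dbgprint(1, "unit %d status %o\n", unit, status) then
prints only when level is 2 or more.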

Similarly, you can add another ioctl which either dumps a load of tables there
and then inside the kernel, or else returns some interesting structures to a
debugging program which does with them as it wishes, (cf some of the ideas
for replacing kmem floating around). Again you could do this sort of thing
with kmem or adb directly, but not all of us have "adb -k" and kmem is too
painful for my liking. 

The advantage of using ioctl is that you can then write arbitrarily complex
C programs which set the debugging level or print out selected information
using ioctl, but no doubt you could do the same using a shell-script, adb
and your favourite combination of awk, sed, grep etc.
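
A sketch of what the driver side of such an ioctl might look like,
written as an ordinary function so it can be tried in user mode.  The
command codes DBGSETLVL and DBGGETLVL, the structure minor_ctl, the
entry point xx_ioctl() and its argument list are all hypothetical; a
real driver's ioctl routine takes whatever arguments the local kernel
passes it:

```c
#include <errno.h>

#define NMINOR     4             /* hypothetical number of minor devices */
#define DBGSETLVL  1             /* hypothetical: set the debugging level */
#define DBGGETLVL  2             /* hypothetical: read the debugging level */

/* Per-minor-device control structure; only the debug level is shown. */
struct minor_ctl {
    int level;
};

static struct minor_ctl minors[NMINOR];

/* Returns 0 on success or an errno value on failure. */
int xx_ioctl(int minor, int cmd, int *arg)
{
    if (minor < 0 || minor >= NMINOR || arg == 0)
        return EINVAL;

    switch (cmd) {
    case DBGSETLVL:              /* caller supplies the new level */
        minors[minor].level = *arg;
        return 0;
    case DBGGETLVL:              /* caller receives the current level */
        *arg = minors[minor].level;
        return 0;
    default:
        return ENOTTY;           /* not a command this driver knows */
    }
}
```

A small C program can then open the device and issue the set command
to raise or lower the level on one unit while leaving the others quiet.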

These ideas are hardly original but perhaps they will be new to some people.

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne

rpw3@redwood.UUCP (Rob Warnock) (01/12/85)

+---------------
| [Me:] As I already sent a copy to the original requestor, I will not post it
| here (~200 lines) unless the demand warrants...
+---------------

Nine requests in six days... o.k., here it is (attached below). This was only
an extended abstract, and in any case did not advocate a specific "system",
but was more of a general approach to the problem.

Notes:	1. This material is Copyright 1984 by USENIX Association,
	   as it appeared in the Proceedings, posted with permission.
	   Any use other than for personal education must be authorized
	   in writing by USENIX.

	2. Use "nroff -ms".
	
	3. The work described was done for Fortune Systems, a previous employer.
	   Requests for further details or code should be addressed to them.


Rob Warnock
Systems Architecture Consultant

UUCP:	{ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD:	(415)572-2607
USPS:	510 Trinidad Lane, Foster City, CA  94404


-----cut here---------cut here---------cut here------------cut here-----
.RT
.nr LL 7i
.ll 7i
.nr PO 0.75i
.po 0.75i
.DA 20 May 1984
.TL
User-Mode Development
Of Hardware and Kernel Software
.AU
Robert P. Warnock, III
.AI
Fortune Systems Corporation
Redwood City, California 94061
.AB
As a general rule, the development of new hardware devices,
operating systems drivers for those devices,
and other new operating systems functions
is considerably more difficult than the development of
user-mode functions of similar complexity.
Several factors contribute to this:
hardware often doesn't work as initially expected (despite documentation);
testing drivers and other kernel functions requires a very scarce resource \(em
standalone time on the system;
errors often leave the entire system hung or halted with no history trace,
making crash analysis a challenge at best;
the edit-compile-load cycle tends to be longer and more complex;
and a logic analyzer is seldom the most convenient diagnostic tool.

A set of techniques or "tricks" is presented,
with examples of their application.
While each one may be "obvious" by itself,
and not particularly related to the others,
together they illustrate a common principle and general method.
The principle is that of separation of concerns,
together with addressing those concerns in the proper order.
"First make it work correctly;
then make it work well while remaining correct."
The general method is to do the development in user-mode software,
using minimal "hooks" to make this possible.
Then, after the functionality has been demonstrated and the critical
algorithms debugged,
the software is "ported" to kernel mode as necessary to
attain the required performance goals.

Other authors [Holt] [Wulf]
have suggested, in fact, that the
"kernel" of an operating system should be quite tiny
(a few hundred lines of assembler),
and that ALL of what one normally thinks of as the "operating system"
should be run in user-mode, including device drivers, file systems,
and schedulers.
Unfortunately, most of us do not have the freedom to make major
modifications to our operating system environment (typically
.UX
of some flavor or other).
The examples given demonstrate that, at least during initial development,
it is possible to obtain the benefits of the "user-mode style"
even though the production version may be completely traditional in structure.

The development projects used as examples took place at Fortune Systems
between Summer 1982 and Summer 1984, and include:
.IP 1.
A byte-parallel file-transfer link was implemented between a DEC VAX-11/780
and a Fortune Systems 32:16.
The VAX driver was developed in user mode using /dev/kUmem
to access the hardware.
The 32:16 driver was developed in user mode using the "sysphys" feature
(UNIX Edition 7 "phys(2)" call) to map the user addresses to the hardware.
After the file-transfer application was completely functional, the
VAX driver was moved to the kernel, with a 25-fold improvement in performance.
(The 32:16 driver was left in user-mode permanently.)
.IP 2.
A communications co-processor for the 32:16 was debugged using user-mode
software (again using "sysphys").
When the UNIX driver was being debugged, host-resident user-mode code
was used to mimic the co-processor application on the one hand,
while making calls to the driver and comparing the results on the other.
A similar procedure was used in developing a bit-mapped
graphics controller and a parallel-I/O co-processor.
.IP 3.
A set of library subroutines was written to allow user-mode emulation
of (proposed) new operating system calls.
When the "system call" was invoked, instead of entering the normal
(kernel-mode) system call handler,
a call-request packet was passed through a "pty" to a daemon program which
emulated the call and passed a "return value" packet back through the pty.
Packet types were provided to allow the daemon to read and write
the client process's address space (as the kernel would have been able to do).

This facility was used to develop a network "socket" mechanism
(similar to 4.2bsd sockets).
A "network line discipline" was implemented
using ordinary terminal ports as network devices.
After the internet router and network line discipline were completely
functional running in user mode as system-call emulation daemons
(including actually transmitting packets over a multi-host net),
they were "ported" straightforwardly into the kernel.
.IP 4.
In the previous hardware examples, the physical device had its interrupts
disabled when driven by the user-mode driver, so as not to crash the
unmodified naive kernel with unexpected interrupts.
(The user-mode drivers used either busywait-polling or sleep-polling
for synchronization.)
Similarly, DMA operation was not possible.

In developing a local-area network interface, it was necessary to
utilize both of those features.
A slight kernel modification was made to reserve a block of physical
memory which the kernel would not use.
User-mode library routines were provided that
(1)\ allowed allocation of that memory area to DMA operations
(the results of which were then examined with "/dev/mem" or "sysphys"),
and (2) allowed run-time installation of minimal interrupt-service routines
(using "pre-compiled" templates) which merely stored the device status
in a mailbox and cleared the interrupt
(the user-mode driver polled the mailbox, rather than the hardware).

Again, the device driver was not "ported" to kernel mode until
the hardware had been completely checked out,
the device driver algorithms were debugged,
and the sample application programs had demonstrated end-to-end functionality.
.LP
Several examples have been given of developing what is normally considered
"kernel mode" software in user mode.
While these examples are not likely to apply directly to other environments,
it is hoped that implementors will be encouraged to consider the
"user-mode style" when planning future kernel-mode software
development projects.
.AE
.FS
.sp 1
[Holt]
R. C. Holt,
.I
Concurrent Euclid, The UNIX System, and Tunis,
.R
Addison-Wesley, 1983
.FE
.FS
.sp 1
[Wulf]
William A. Wulf, Roy Levin, and Samuel P. Harbison,
.I
HYDRA/C.mmp,
.R
McGraw-Hill, 1981
.FE

mark@rtech.ARPA (Mark Wittenberg) (01/19/85)

Thanks, Rob; that was a useful set of suggestions.

When I was at Zehntel we had an additional solution to the problem of crashes
while testing kernel software (we had plenty of "single-user" time).
We were running SUN 68000 boards, and since we had to rewrite the boot proms
anyway we put a small kernel debugger into the proms.  Then when the system
crashed we didn't have to reboot: we just activated the prom debugger and could
then look at a kernel stack trace, the proc table, random memory ... very
useful.

BTW, one of the nasty problems we had wouldn't have been very well addressed
by Rob's techniques; the hardware in question worked ok EXCEPT for interrupts.
Another one worked only when run from a standalone kernel (because it fit in
64k!).

Mark Wittenberg
Relational Technology, Inc.

ucbvax!mtxinu!rtech!mark
zehntel!rtech!mark