jack@vu44.UUCP (Jack Jansen) (12/28/84)
[A crash a day keeps the users away]

After reading the articles about VM/370, I was wondering how the average unix-wizard debugs his/her device driver (or other kernel mods). What I usually do is the most obvious thing: kicking everyone off the system, loading my new unix, (usually) watching it crash, and examining either the remains of it, or the real thing in action. Since this usually involves awful things like printf's on the console and lots of booting, I wondered whether there might be anyone out there who has developed a more reasonable way of debugging kernel mods.

Waiting for the great and simple solutions,
--
Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack or ...!vu44!htsa!jack
If *this* is my opinion, I wasn't sober at the time.
howard@cyb-eng.UUCP (Howard Johnson) (01/01/85)
> After reading the articles about VM/370, I was wondering how the average
> unix-wizard debugs his/her device driver(or other kernel mods).

In my experience with debugging device drivers, the next step up from printf's and intuition is an in-circuit emulator. (Well, it works for 68000's; I don't know about Vaxen.) Sometimes a fancy stand-alone software monitor will help.

> Waiting for the great and simple solutions,

When *I* hear about those great and simple solutions, I'll be trying to jump on the bandwagon before my job disappears.
--
Howard Johnson
Cyb Systems, Austin, TX
..!{gatech,harvard,ihnp4,nbires,noao,seismo}!ut-sally!cyb-eng!howard
bux@dual.UUCP (Dave Buxbaum) (01/01/85)
I agree that printf's are a rather crude method for debugging drivers. One way to make them more useful is to add an "event trigger". The idea is to include a global variable which controls the printf. Initially set to zero, this variable is bumped upon reaching the desired "event". It is useful to have access to the trigger from ADB. An example:

- in the code looking at status:

        if (event happened) {
    #ifdef DEBUG
            trigger++;
    #endif /* DEBUG */
            /* Normal action here */
        }

- in the code where more info is needed:

    #ifdef DEBUG
        if (trigger)
            printf(" SOMETHING HERE");
    #endif /* DEBUG */

This idea can be expanded to include different levels of debugging on a "per bit" flavor. Also, you can flip the trigger using the debugger. This can be very useful.

Another thing to try is using ADB on the running kernel. This is always a source of fun and excitement, usually leading to some spectacular crashes. The idea is, of course, to look at queues and status bytes and other related structures to try to get an idea of what is happening.

I think debugging drivers is really the most fun a programmer can hope to have while at work ... What a comforting thought!!!

David Buxbaum
dual!bux@BERKELEY.ARPA
{ihnp4,ucbvax,hplabs,decwrl,cbosgd,sun,nsc,apple,pyramid}!dual!bux
Dual Systems Corporation, Berkeley, California
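The "per bit" expansion of the event trigger mentioned above might look roughly like the following. This is only a sketch under assumptions: the class names (DBG_INTR, DBG_QUEUE, DBG_ERROR) and the helpers dbg_arm, dbg_disarm, and dbg_printf are made up for illustration, and the code uses the user-mode stdio printf as a stand-in for the kernel's printf.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug classes, one bit each; the names are illustrative. */
#define DBG_INTR   0x01u   /* interrupt path */
#define DBG_QUEUE  0x02u   /* request queue handling */
#define DBG_ERROR  0x04u   /* error recovery */

/* Global trigger word. Zero means silent. Because it is an ordinary
 * global, it can also be patched by hand from a debugger. */
volatile unsigned int dbg_trigger = 0;

/* Arm one or more classes when the interesting event is seen. */
void dbg_arm(unsigned int bits)
{
    dbg_trigger |= bits;
}

/* Disarm classes again once enough output has been collected. */
void dbg_disarm(unsigned int bits)
{
    dbg_trigger &= ~bits;
}

/* Print only if at least one of the requested classes is armed.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
int dbg_printf(unsigned int bits, const char *fmt, ...)
{
    va_list ap;

    if ((dbg_trigger & bits) == 0)
        return 0;
    va_start(ap, fmt);
    vprintf(fmt, ap);
    va_end(ap);
    return 1;
}
```

In a driver, the status-checking code would call something like dbg_arm(DBG_QUEUE) when the event occurs, and the places that need more detail would call dbg_printf(DBG_QUEUE, ...). Keeping one bit per subsystem lets a single word select any combination of message classes.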
guy@rlgvax.UUCP (Guy Harris) (01/02/85)
If you look at the source to the 4.2BSD "adb" (and, I believe, the S3 and S5 "adb") for the VAX, there is an EDDT #ifdef that seems to be part of somebody's effort to make a version of "adb" that can be linked or loaded with a kernel and used on it. I've thought about seeing how much of that can be made to work (especially with the help of the 4.2BSD or S5 "standalone" libraries, which could enable "adb" to read the kernel's symbol table from /<whatever your kernel is called>). You might want to play with that. Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
chris@umcp-cs.UUCP (Chris Torek) (01/02/85)
Well, the ideal way is not to write any bugs. Of course, we sometimes tend to fall short of the ideal. I like to see where the machine crashed or what the erroneous behaviour was, think a while, and come up with something that explains the problem exactly, without "explaining" things that didn't happen. Then it's time to look at the code and see whether that explanation is correct. If all goes well, this takes only a few minutes, after which we get to observe the *next* crash . . . :-) -- (This line accidently left nonblank.) In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690 UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@maryland
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (01/02/85)
> Well, the ideal way is not to write any bugs.
>
> Of course, we sometimes tend to fall short of the ideal. I like to see
> where the machine crashed or what the erroneous behaviour was, think a
> while, and come up with something that explains the problem exactly,
> without "explaining" things that didn't happen. Then it's time to look
> at the code and see whether that explanation is correct.

Right on! If you cannot legitimately EXPECT your code to work right the first time, then you do not have it under control. Better to think it out and then do it right than to tediously hack away hoping to get that "last known bug" out eventually.
henry@utzoo.UUCP (Henry Spencer) (01/02/85)
Short of virtual-machine systems, which (alas) are hard to do on many machines, there isn't an entirely satisfactory answer. However, note that Rob Warnock gave a paper on this very subject at Salt Lake, and there is at least a summary of it in the proceedings. His conclusion was that you can do a lot of debugging in user mode if you really try. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
karl@osu-eddie.UUCP (Karl Kleinpaste) (01/04/85)
----------
>If you look at the source to the 4.2BSD "adb" (and, I believe, the S3 and
>S5 "adb") for the VAX, there is an EDDT #ifdef that seems to be part of
>somebody's effort to make a version of "adb" that can be linked or loaded
>with a kernel and used on it.
---Guy Harris of Computer Consoles, Inc
----------

Yes, and CCI's own PERPOS operating system (derivative of Unix OS [System ?3?5? don't know which any more]) has this rather neat facility called "ebug" which is a primitive form of adb compiled right into things. It is without a doubt one of the better tools available for debugging that OS. I was really glad I had it available to me when I was working there.

Now, it is quite primitive, allowing only absolute addressing, thus making a sorted namelist of the OS an essential item to have on hand when debugging the kernel; but nonetheless you can't argue with a darn good idea. Breakpointing the kernel is fun, anyway!
--
From the badly beaten keyboards of him who speaks +-best address in textured Technicolor *TyPe* f-O-n-T-s... | | Karl Kleinpaste @ Bell Labs, Columbus 614/860-5107 +---> cbrma!kk @ Ohio State University 614/422-0915 osu-eddie!karl
rpw3@redwood.UUCP (Rob Warnock) (01/05/85)
+---------------
| ... However, note
| that Rob Warnock gave a paper on this very subject at Salt Lake, and
| there is at least a summary of it in the proceedings. His conclusion
| was that you can do a lot of debugging in user mode if you really try.
| Henry Spencer
| {allegra,ihnp4,linus,decvax}!utzoo!henry
+---------------

As I already sent a copy to the original requestor, I will not post it here (~200 lines) unless the demand warrants. Henry states my general position well: You can do a lot, actually nearly all, from user mode.

CAVEAT: The devices you're trying to talk to must not permanently lose data if you don't service them speedily. What this means is that debugging disks, Ethernets, streamer cartridge tapes, and other block-at-a-time devices is easy; debugging synchronous comm lines is o.k. if you put the packet frame level (NOT the protocol, just the frame) in the kernel and leave the protocol in user mode; debugging speech processors is a bit harder (but I'm doing it anyway these days). Debugging byte-at-a-time disk controllers (like an Apple "Woz chip" controller) is asking for your disk to be erased the first time "cron" wakes up while you have the write gate open! ;-}

What lets user-mode debugging work in general is that IN MOST CASES the time-critical part of the driver ("catch the interrupt") is not the algorithmically complex part of the driver ("what do I do with THIS frame?"), so you can separate them. Put just a stub in the kernel to "break the interrupt latency" and work on the hard stuff in user mode.

Rob Warnock
Systems Architecture Consultant
UUCP: {ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD: (415)572-2607
USPS: 510 Trinidad Lane, Foster City, CA 94404
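The split described above, a tiny time-critical stub plus the complicated logic elsewhere, can be sketched as a fixed-size ring of frames. Everything here is an assumption made for illustration: the names stub_intr and next_frame, the ring and frame sizes, and the single-producer/single-consumer arrangement are not taken from any real driver, and real hardware would also need attention to interrupt masking and memory ordering.

```c
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8    /* hypothetical ring size; one slot is kept empty */
#define FRAME_MAX  64   /* hypothetical maximum frame length in bytes */

struct frame {
    size_t len;
    unsigned char data[FRAME_MAX];
};

struct ring {
    volatile unsigned int head;  /* advanced only by the stub */
    volatile unsigned int tail;  /* advanced only by the consumer */
    unsigned int dropped;        /* frames lost because the ring was full */
    struct frame slot[RING_SLOTS];
};

/* The "catch the interrupt" half: copy the frame and return at once.
 * It never blocks and never interprets the data.
 * Returns 0 on success, -1 if the frame had to be dropped. */
int stub_intr(struct ring *r, const unsigned char *buf, size_t len)
{
    unsigned int next = (r->head + 1) % RING_SLOTS;

    if (next == r->tail || len > FRAME_MAX) {
        r->dropped++;
        return -1;
    }
    memcpy(r->slot[r->head].data, buf, len);
    r->slot[r->head].len = len;
    r->head = next;
    return 0;
}

/* The "what do I do with THIS frame?" half, meant to run in user mode:
 * fetch the oldest queued frame. Returns 1 if one was copied into *out,
 * 0 if the ring was empty. */
int next_frame(struct ring *r, struct frame *out)
{
    if (r->tail == r->head)
        return 0;
    *out = r->slot[r->tail];
    r->tail = (r->tail + 1) % RING_SLOTS;
    return 1;
}
```

The protocol code that calls next_frame can then be edited, recompiled, and run under an ordinary debugger, while the stub stays small enough to check by inspection. The dropped counter gives a cheap indication of whether the user-mode side is keeping up.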
bsa@ncoast.UUCP (Brandon Allbery (the tame hacker on the North Coast)) (01/08/85)
System V may be on its way to a solution. It seems to me that someone could write a 'fake kernel' to run as a Unix user process, and use that to debug device drivers, provided that a user address space can be >256K. (Of course, 512K might also do, since it wouldn't have to be a full multiuser kernel, just a device driver debugging aid.)

--bsa
--
Brandon Allbery @ decvax!cwruecmp!ncoast!bsa (..ncoast!tdi1!bsa business)
6504 Chestnut Road, Independence, Ohio 44131 (216) 524-1416
Who said you had to be (a) a poor programmer or (b) a security hazard to be a hacker?
robert@cheviot.UUCP (Robert Stroud) (01/09/85)
<This line is a figment of your imagination>

I have found lots of printf's very effective, especially since the Unix workstation I debug my kernels on has a habit of stopping dead (or crashing unrecoverably into an incomprehensible microcode debugger) whenever there is a problem. I seriously think that this sort of total failure mode is a useful debugging aid as well (well nearly seriously :-)!

Of course, printf's are very unscientific, but there are a couple of useful techniques you can use to improve things. Using a macro it is very easy to introduce the idea of a debugging level: the information is only printed if the level is higher than the argument to the macro. For example,

    # define dbgprint(x,y) if (level > x) printf(y)

where y could be a list of arguments (unfortunately this means that you need versions of the macro for 0, 1, 2, etc. arguments), and level is either a global variable known to the device driver or, if you're being really selective, local to each minor device (i.e. a component of your minor device control structure). You can set level with an ioctl, adb or kmem; I prefer the first.

Similarly, you can add another ioctl which either dumps a load of tables there and then inside the kernel, or else returns some interesting structures to a debugging program which does with them as it wishes (cf. some of the ideas for replacing kmem floating around). Again, you could do this sort of thing with kmem or adb directly, but not all of us have "adb -k", and kmem is too painful for my liking.

The advantage of using ioctl is that you can then write arbitrarily complex C programs which set the debugging level or print out selected information using ioctl, but no doubt you could do the same using a shell-script, adb and your favourite combination of awk, sed, grep etc.

These ideas are hardly original but perhaps they will be new to some people.

Robert Stroud,
Computing Laboratory,
University of Newcastle upon Tyne
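One common way around needing a separate macro per argument count is to pass the whole printf argument list as a single parenthesised macro argument. The sketch below illustrates that idiom; it is not the poster's code. The names set_debug_level, demo_open, and dbg_shown are hypothetical: set_debug_level stands in for whatever ioctl would change the level, and dbg_shown exists only so the behaviour can be checked.

```c
#include <stdio.h>

int level = 0;       /* debugging level; higher means more output */
int dbg_shown = 0;   /* count of messages printed (for checking only) */

/* y is a complete, parenthesised printf argument list, so one macro
 * serves any number of arguments. The do/while (0) wrapper makes the
 * macro behave as a single statement, which keeps it safe inside an
 * unbraced if/else. */
#define dbgprint(x, y) \
    do { if (level > (x)) { printf y; dbg_shown++; } } while (0)

/* Hypothetical stand-in for the ioctl that sets the level.
 * Returns the previous level. */
int set_debug_level(int new_level)
{
    int old = level;

    level = new_level;
    return old;
}

/* Example call sites, as they might appear in a driver open routine. */
void demo_open(int minor)
{
    dbgprint(0, ("open: minor %d\n", minor));
    dbgprint(2, ("open: minor %d, flags %#x, verbose\n", minor, 0x10));
}
```

Note the doubled parentheses at the call sites: the inner pair groups the format string and its arguments into the single macro argument y. With level at 0 nothing is printed, at 1 only the first message appears, and at 3 both do.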
rpw3@redwood.UUCP (Rob Warnock) (01/12/85)
+---------------
| [Me:] As I already sent a copy to the original requestor, I will not post it
| here (~200 lines) unless the demand warrants...
+---------------

Nine requests in six days... o.k., here it is (attached below). This was only an extended abstract, and in any case did not advocate a specific "system", but was more of a general approach to the problem.

Notes:
1. This material is Copyright 1984 by USENIX Association, as it appeared in the Proceedings, posted with permission. Any use other than for personal education must be authorized in writing by USENIX.
2. Use "nroff -ms".
3. The work described was done for Fortune Systems, a previous employer. Requests for further details or code should be addressed to them.

Rob Warnock
Systems Architecture Consultant
UUCP: {ihnp4,ucbvax!dual}!fortune!redwood!rpw3
DDD: (415)572-2607
USPS: 510 Trinidad Lane, Foster City, CA 94404

-----cut here---------cut here---------cut here------------cut here-----
.RT
.nr LL 7i
.ll 7i
.nr PO 0.75i
.po 0.75i
.DA 20 May 1984
.TL
User-Mode Development Of Hardware and Kernel Software
.AU
Robert P. Warnock, III
.AI
Fortune Systems Corporation
Redwood City, California 94061
.AB
As a general rule, the development of new hardware devices, operating systems drivers for those devices, and other new operating systems functions is considerably more difficult than the development of user-mode functions of similar complexity. Several factors contribute to this: hardware often doesn't work as initially expected (despite documentation); testing drivers and other kernel functions requires a very scarce resource \(em standalone time on the system; errors often leave the entire system hung or halted with no history trace, making crash analysis a challenge at best; the edit-compile-load cycle tends to be longer and more complex; and a logic analyzer is seldom the most convenient diagnostic tool. A set of techniques or "tricks" is presented, with examples of their application.
While each one may be "obvious" by itself, and not particularly related to the others, together they illustrate a common principle and general method. The principle is that of separation of concerns, together with addressing those concerns in the proper order. "First make it work correctly; then make it work well while remaining correct." The general method is to do the development in user-mode software, using minimal "hooks" to make this possible. Then, after the functionality has been demonstrated and the critical algorithms debugged, the software is "ported" to kernel mode as necessary to attain the required performance goals. Other authors [Holt] [Wulf] have suggested, in fact, that the "kernel" of an operating system should be quite tiny (a few hundred lines of assembler), and that ALL of what one normally thinks of as the "operating system" should be run in user-mode, including device drivers, file systems, and schedulers. Unfortunately, most of us do not have the freedom to make major modifications to our operating system environment (typically
.UX
of some flavor or other). The examples given demonstrate that, at least during initial development, it is possible to obtain the benefits of the "user-mode style" even though the production version may be completely traditional in structure. The development projects used as examples took place at Fortune Systems between Summer 1982 and Summer 1984, and include:
.IP 1.
A byte-parallel file-transfer link was implemented between a DEC VAX-11/780 and a Fortune Systems 32:16. The VAX driver was developed in user mode using /dev/kUmem to access the hardware. The 32:16 driver was developed in user mode using the "sysphys" feature (UNIX Edition 7 "phys(2)" call) to map the user addresses to the hardware. After the file-transfer application was completely functional, the VAX driver was moved to the kernel, with a 25-fold improvement in performance. (The 32:16 driver was left in user-mode permanently.)
.IP 2.
A communications co-processor for the 32:16 was debugged using user-mode software (again using "sysphys"). When the UNIX driver was being debugged, host-resident user-mode code was used to mimic the co-processor application on the one hand, while making calls to the driver and comparing the results on the other. A similar procedure was used in developing a bit-mapped graphics controller and a parallel-I/O co-processor.
.IP 3.
A set of library subroutines was written to allow user-mode emulation of (proposed) new operating system calls. When the "system call" was invoked, instead of entering the normal (kernel-mode) system call handler, a call-request packet was passed through a "pty" to a daemon program which emulated the call and passed a "return value" packet back through the pty. Packet types were provided to allow the daemon to read and write the client process's address space (as the kernel would have been able to do). This facility was used to develop a network "socket" mechanism (similar to 4.2bsd sockets). A "network line discipline" was implemented using ordinary terminal ports as network devices. After the internet router and network line discipline were completely functional running in user mode as system-call emulation daemons (including actually transmitting packets over a multi-host net), they were "ported" straightforwardly into the kernel.
.IP 4.
In the previous hardware examples, the physical device had its interrupts disabled when driven by the user-mode driver, so as not to crash the unmodified naive kernel with unexpected interrupts. (The user-mode drivers used either busywait-polling or sleep-polling for synchronization.) Similarly, DMA operation was not possible. In developing a local-area network interface, it was necessary to utilize both of those features. A slight kernel modification was made to reserve a block of physical memory which the kernel would not use.
User-mode library routines were provided that (1)\ allowed allocation of that memory area to DMA operations (the results of which were then examined with "/dev/mem" or "sysphys"), and (2) allowed run-time installation of minimal interrupt-service routines (using "pre-compiled" templates) which merely stored the device status in a mailbox and cleared the interrupt (the user-mode driver polled the mailbox, rather than the hardware). Again, the device driver was not "ported" to kernel mode until the hardware had been completely checked out, the device driver algorithms were debugged, and the sample application programs had demonstrated end-to-end functionality.
.LP
Several examples have been given of developing what is normally considered "kernel mode" software in user mode. While these examples are not likely to apply directly to other environments, it is hoped that implementors will be encouraged to consider the "user-mode style" when planning future kernel-mode software development projects.
.AE
.FS
.sp 1
[Holt] R. C. Holt,
.I
Concurrent Euclid, The UNIX System, and Tunis,
.R
Addison-Wesley, 1983
.FE
.FS
.sp 1
[Wulf] William A. Wulf, Roy Levin, and Samuel P. Harbison,
.I
HYDRA/C.mmp,
.R
McGraw-Hill, 1981
.FE
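The system-call emulation scheme in item 3 of the abstract (a call-request packet sent to a daemon, a "return value" packet sent back) can be sketched roughly as below. This is an illustration under stated assumptions, not the Fortune Systems code: the packet layouts, the call numbers EMU_ADD and EMU_NEG, and the functions emu_send, emu_recv, and emu_serve_one are all invented, and any connected pair of descriptors (for example from socketpair) is used in place of the pty. The packet types for reading and writing the client's address space are left out.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical "proposed system calls" handled by the daemon. */
enum { EMU_ADD = 1, EMU_NEG = 2 };

struct call_request { int callno; long arg[2]; };
struct call_reply   { long retval; int err; };

/* Client stub, first half: send the call-request packet.
 * Returns 0 on success, -1 on I/O failure. */
int emu_send(int fd, int callno, long a0, long a1)
{
    struct call_request rq;

    memset(&rq, 0, sizeof rq);
    rq.callno = callno;
    rq.arg[0] = a0;
    rq.arg[1] = a1;
    return write(fd, &rq, sizeof rq) == (ssize_t)sizeof rq ? 0 : -1;
}

/* Client stub, second half: wait for the "return value" packet.
 * Mimics system-call convention by setting errno on failure. */
long emu_recv(int fd)
{
    struct call_reply rp;

    if (read(fd, &rp, sizeof rp) != (ssize_t)sizeof rp) {
        errno = EIO;
        return -1;
    }
    if (rp.err != 0)
        errno = rp.err;
    return rp.retval;
}

/* Daemon side: read one request, emulate the call, send the reply.
 * Returns 0 on success, -1 on I/O failure. */
int emu_serve_one(int fd)
{
    struct call_request rq;
    struct call_reply rp;

    if (read(fd, &rq, sizeof rq) != (ssize_t)sizeof rq)
        return -1;
    memset(&rp, 0, sizeof rp);
    switch (rq.callno) {
    case EMU_ADD: rp.retval = rq.arg[0] + rq.arg[1]; break;
    case EMU_NEG: rp.retval = -rq.arg[0]; break;
    default:      rp.retval = -1; rp.err = ENOSYS; break;
    }
    return write(fd, &rp, sizeof rp) == (ssize_t)sizeof rp ? 0 : -1;
}
```

In use, the client library would wrap emu_send followed by emu_recv in a function with the same signature as the proposed system call, while a separate daemon process loops on emu_serve_one. Once the emulated call behaves correctly, the body of the switch is the part that would be moved into the kernel.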
mark@rtech.ARPA (Mark Wittenberg) (01/19/85)
Thanks, Rob; that was a useful set of suggestions. When I was at Zehntel we had an additional solution to the problem of crashes while testing kernel software (we had plenty of "single-user" time). We were running SUN 68000 boards, and since we had to rewrite the boot proms anyway we put a small kernel debugger into the proms. Then when the system crashed we didn't have to reboot: we just activated the prom debugger and could then look at a kernel stack trace, the proc table, random memory ... very useful.

BTW, one of the nasty problems we had wouldn't have been very well addressed by Rob's techniques; the hardware in question worked ok EXCEPT for interrupts. Another one worked only when run from a standalone kernel (because it fit in 64k!).

Mark Wittenberg
Relational Technology, Inc.
ucbvax!mtxinu!rtech!mark
zehntel!rtech!mark