anderson@spanky.sps.mot.com (08/11/89)
Release 10.1, dn3550: Once aqdev has successfully acquired a device, traceback no longer works. Any program that aborts while the device is acquired shows nothing when tb is invoked. This makes it a little difficult to debug programs that use the device. At least this is the situation on my node. Any ideas regarding how to get a tb after a device is acquired? (I sure hate to go back to inserting print statements...)

Howard C. Anderson
...sun!sunburn!dover!anderson
krowitz@RICHTER.MIT.EDU (David Krowitz) (08/11/89)
Hmmm ... my understanding of /com/aqdev under SR9.7 is that the program acquires the device (i.e. reads the ddf file, loads and initializes the device libraries) and then uses pgm$invoke to start a shell, which then executes your commands. Under SR9.7 the shell seems to be invoked in-process (i.e. no new processes show up on the system when /com/aqdev is run). If I quit out of a program which is using the previously acquired device, I can get a traceback just fine.

SR10 is a different matter ... when you use /com/aqdev to acquire a device under SR10, you start a new process which runs the aqdev program. When you then run your application, yet another process gets started to handle that. When the application dies (or you quit out of it), the process goes away just like real Unix processes do ... but the system does keep a process dump file which may contain enough info for you to get a traceback. You will need to use the "tb" command with one of the following options (I don't know which will work in your case):

"-last" will trace back the last process in the dump file (which could be another process which died just after yours),
"-command" will look for the last dump which came from a process running a particular command line,
"-proc" will look for a dump from a particular Unix PID, an Aegis UID, or a process name.

One of these options should be able to find the dump of your application and give you the traceback.

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
krowitz@RICHTER.MIT.EDU (David Krowitz) (08/12/89)
Hmm ... sounds like /com/aqdev may have its own fault handler set up, in which case the system fault handler might never see the fault and would not write the process dump. Without seeing the source code, I can only guess at what is going on. Can anyone at Chelmsford pass this on to the Apollo corporate mailing list for comment?

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
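The mechanism guessed at above can be illustrated in portable C. This is only an analogy built on the standard signal() interface, not Domain/OS's own fault-handling calls, and nothing here is taken from the aqdev source. The names swallow_fault, install_fault_handler and simulate_fault are invented for the sketch. Once a program installs its own handler, the default action for that signal (which on most Unix systems is what terminates the process and produces a dump) is no longer taken:

```c
#include <assert.h>
#include <signal.h>

/* Set by the handler so the caller can tell that a fault was intercepted. */
static volatile sig_atomic_t fault_seen = 0;

/* Hypothetical handler: records the signal number and returns.
 * Because it returns normally, the default action for the signal
 * never runs for this delivery. */
static void swallow_fault(int signo)
{
    fault_seen = signo;
}

/* Install the handler for SIGFPE; returns 0 on success, -1 on failure. */
int install_fault_handler(void)
{
    return signal(SIGFPE, swallow_fault) == SIG_ERR ? -1 : 0;
}

/* Deliver SIGFPE to this process with raise() and report which
 * signal number the handler recorded (0 if it was never called). */
int simulate_fault(void)
{
    int rc;

    fault_seen = 0;
    rc = raise(SIGFPE);
    assert(rc == 0);   /* raise() itself should not fail */
    return (int)fault_seen;
}
```

Calling install_fault_handler() and then simulate_fault() leaves the process alive with the signal recorded, which is the kind of behavior that would keep a system-level handler from ever writing a dump, if aqdev does something similar.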
anderson@spanky.sps.mot.com (08/17/89)
Refinement of information:
Under release 10.1...
While a device is acquired, any program failing in any pad fails
to produce traceback data in
/sys/node_data/system_logs/proc_dump. NOTHING is ever written
to the proc_dump file while a device is acquired no matter what
programs fail on the machine. In fact, if you delete the
proc_dump file while aqdev is in effect, you will see that no
attempt is ever made to recreate it, no matter what program
signals are generated. (If the device is released and a program
then fails, proc_dump is recreated immediately and contains
proper information concerning the failure.)
You can test this yourself. Run "ld -a /dev" and look for "ddf"
files. Here are some from our network:
char ddf 2 2048 P prwx- dt2821
char ddf 2 2048 P prwx- dt2821F
char ddf 2 2048 P prwx- dt2821G
char ddf 2 2048 P prwx- dt2827
char ddf 2 2048 P prwx- dt2828
char ddf 2 2048 P prwx- pio_ddf
char ddf 2 2048 P prwx- sio2_ddf
char ddf 2 2048 P prwx- sio3_ddf
char ddf 2 2048 P prwx- exatape0
Use aqdev to acquire one of the "ddf" files, e.g.:
$ aqdev /dev/sio2_ddf
Device 4 acquired.
While the device is acquired, traceback and any other debugging
tools that depend upon /sys/node_data/system_logs/proc_dump are
rendered useless.
Typing ctrl-Z (EOF) will release the device.
$ *** EOF ***
Device 4 released.
Specific information regarding my node is:
Domain/OS kernel(7), revision 10.1.1.2, April 4, 1989 5:22:10 pm
NODE CONFIGURATION
Node Type: DN3500
Display type: 1024 x 800 color display
68882 Floating Point Unit present.
Peripheral configuration:
Disks: winchester
Networks: Ring
Peripheral bus: AT-bus
Tapes: none
Disk types: MSD-380M -FA
I have a Data Translation DT-2823 digitizer board and software
that I wrote that never failed on release 9.7 but is failing
occasionally under release 10.1. The board works fine. I get
A/D and D/A conversion as before but something is failing when I
take Fourier transforms of the data collected. The error may be
due to the new Unix-like libraries, which are more sensitive than
the 9.7 Apollo libraries. (For example, log(0) under 9.7 yields
zero; under 10.1 log(0) yields "log: SING error" and, if you get
enough of them, eventually aborts your program??) Anyway, the
dt2821 device must be acquired in order to run my program, and
the program aborts, presumably because of the new Unix-like
libraries, but it is difficult to debug since nothing is ever
written to /sys/node_data/system_logs/proc_dump while the device
is acquired.
Any ideas regarding how to get traceback data while a device is
acquired will be greatly appreciated.
Howard C. Anderson anderson@spanky.sps.mot.com