anderson@spanky.sps.mot.com (08/11/89)
Release 10.1, dn3550: When aqdev is used to acquire a device and the device is successfully acquired, traceback no longer works. Any program that aborts while the device is acquired shows nothing when tb is invoked. This makes it a little difficult to debug programs that use the device. At least this is the situation on my node.

Any ideas regarding how to get a tb after a device is acquired?? (I sure hate to go back to inserting print statements...)

Howard C. Anderson
...sun!sunburn!dover!anderson
krowitz@RICHTER.MIT.EDU (David Krowitz) (08/11/89)
Hmmm ... my understanding of /com/aqdev under SR9.7 is that the program acquires the device (i.e. reads the ddf file, loads and initializes the device libraries) and then uses pgm$invoke to start a shell which then executes your commands. Under SR9.7 the shell seems to be invoked in-process (i.e. no new processes show up on the system when /com/aqdev is run). If I quit out of a program which is using the previously acquired device, I can get a traceback just fine.

SR10 is a different matter ... when you use /com/aqdev to acquire a device under SR10, you start a new process which runs the aqdev program. When you then run your application, yet another process gets started to handle that. When the application dies (or you quit out of it), the process goes away just like real Unix processes do ... but the system does keep a process dump file which may contain enough info for you to get a traceback.

You will need to use the "tb" command with one of the following options (I don't know which will work in your case):

  "-last" will trace back the last process in the dump file (which could be another process which died just after yours),
  "-command" will look for the last dump which came from a process running a particular command line,
  "-proc" will look for a dump from a particular Unix PID, an Aegis UID, or a process name.

One of these options should be able to find the dump of your application and give you the traceback.

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
krowitz@RICHTER.MIT.EDU (David Krowitz) (08/12/89)
Hmm ... sounds like /com/aqdev may have its own fault handler set up, in which case the system fault handler might not ever see the fault and would not write the process dump. Without seeing the source code, I can only guess at what is going on. Can anyone at Chelmsford pass this on to the Apollo corporate mailing list for comment?

-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
anderson@spanky.sps.mot.com (08/17/89)
Refinement of information: Under release 10.1, while a device is acquired, any program failing in any pad fails to produce traceback data in /sys/node_data/system_logs/proc_dump. NOTHING is ever written to the proc_dump file while a device is acquired, no matter what programs fail on the machine. In fact, if you delete the proc_dump file while aqdev is in effect, you will see that no attempt is ever made to recreate it, no matter what program signals are generated. (If the device is released and a program then fails, proc_dump is recreated immediately and contains proper information concerning the failure.)

You can test this yourself. Do "ld -a /dev" and look for "ddf" files. Here are some from our network:

    char ddf 2 2048 P prwx- dt2821
    char ddf 2 2048 P prwx- dt2821F
    char ddf 2 2048 P prwx- dt2821G
    char ddf 2 2048 P prwx- dt2827
    char ddf 2 2048 P prwx- dt2828
    char ddf 2 2048 P prwx- pio_ddf
    char ddf 2 2048 P prwx- sio2_ddf
    char ddf 2 2048 P prwx- sio3_ddf
    char ddf 2 2048 P prwx- exatape0

Use aqdev to acquire one of the "ddf" files, e.g.:

    $ aqdev /dev/sio2_ddf
    Device 4 acquired.

While the device is acquired, traceback and other debugging tools that depend upon /sys/node_data/system_logs/proc_dump are rendered useless. A ctrl z will release the device:

    $ *** EOF ***
    Device 4 released.

Specific information regarding my node is:

    Domain/OS kernel(7), revision 10.1.1.2, April 4, 1989 5:22:10 pm

    NODE CONFIGURATION
    Node Type: DN3500
    Display type: 1024 x 800 color display
    68882 Floating Point Unit present.
    Peripheral configuration:
        Disks: winchester
        Networks: Ring
        Peripheral bus: AT-bus
        Tapes: none
        Disk types: MSD-380M -FA

I have a Data Translation DT-2823 digitizer board and software that I wrote that never failed on release 9.7 but is failing occasionally under release 10.1. The board works fine. I get A/D and D/A conversion as before, but something is failing when I take Fourier transforms of the data collected.
The error may be caused by the new Unix-like libraries, which are stricter than the 9.7 Apollo libraries. (For example, log(0) under 9.7 yields zero; under 10.1, log(0) yields "log: SING error" and, if you get enough of them, eventually aborts your program??) Anyway, the dt2821 device must be acquired in order to run my program, and the program aborts, presumably due to the new Unix-like libraries, but it is difficult to debug since nothing is ever written to /sys/node_data/system_logs/proc_dump while the device is acquired.

Any ideas regarding how to get traceback data while a device is acquired will be greatly appreciated.

Howard C. Anderson
anderson@spanky.sps.mot.com
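
One way to narrow down a log(0) failure like the one described above without a traceback is to check the argument before calling the library routine and report where the bad value came from. The following is only a minimal sketch in Python, not the original program's language (which the post does not state). It assumes, purely for illustration, that the logarithm is taken of the power in each Fourier bin; the names checked_log and log_power_spectrum are hypothetical and do not come from the thread or from any Apollo library.

```python
import math


def checked_log(x, label="value"):
    """Return log(x), but name the offending value if x is not positive.

    Hypothetical helper: it replaces a bare library error with a message
    that says which quantity was zero or negative.
    """
    if x <= 0.0:
        raise ValueError("log of non-positive %s: %r" % (label, x))
    return math.log(x)


def log_power_spectrum(re, im, floor=None):
    """Return log(re[k]**2 + im[k]**2) for each bin k.

    Assumption for illustration only: the program takes the log of the
    power in each Fourier bin. With floor=None a zero-power bin raises an
    error naming the bin index; otherwise the power is clamped to floor.
    """
    out = []
    for k, (a, b) in enumerate(zip(re, im)):
        power = a * a + b * b
        if floor is not None:
            power = max(power, floor)
        out.append(checked_log(power, label="power in bin %d" % k))
    return out
```

Calling log_power_spectrum with floor=None while debugging points at the first bin whose power is exactly zero. Passing a small positive floor instead keeps the run going, which is a different choice from the 9.7 behavior of returning zero for log(0) and should be made deliberately.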