[comp.sys.apollo] Failure of tb after aqdev

anderson@spanky.sps.mot.com (08/11/89)

Release 10.1, dn3550:

When aqdev is used to acquire a device and the device 
is successfully acquired, traceback no longer works.
Any program that aborts while the device is acquired
shows nothing when tb is invoked.  This makes it a 
little difficult to debug programs that use the device.
At least this is the situation on my node.  Any ideas 
regarding how to get a tb after a device is acquired??
(I sure hate to go back to inserting print statements...)

Howard C. Anderson   ...sun!sunburn!dover!anderson

krowitz@RICHTER.MIT.EDU (David Krowitz) (08/11/89)

Hmmm ... my understanding of /com/aqdev under SR9.7 is that
the program acquires the device (i.e. reads the ddf file,
loads and initializes the device libraries) and then uses
pgm_$invoke to start a shell which then executes your commands.
Under SR9.7 the shell seems to be invoked in-process (i.e. no
new processes show up on the system when /com/aqdev is run).
If I quit out of a program which is using the previously
acquired device I can get a traceback just fine.

SR10 is a different matter ... when you use /com/aqdev to
acquire a device under SR10, you start a new process which
runs the aqdev program. When you then run your application,
yet another process gets started to handle that. When the
application dies (or you quit out of it), the process goes
away just like real Unix processes do ... but, the system
does keep a process dump file which may contain enough info
for you to get a traceback. You will need to use the "tb"
command with one of the following options (I don't know
which will work in your case): "-last" will traceback the
last process in the dump file (which could be another
process which died just after yours), "-command" will look
for the last dump which came from a process running a
particular command line, "-proc" will look for a dump 
from a particular Unix PID, an Aegis UID, or a process name.
One of these options should be able to find the dump
of your application and give you the traceback.


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

krowitz@RICHTER.MIT.EDU (David Krowitz) (08/12/89)

Hmm ... sounds like /com/aqdev may have its own fault handler
set up, in which case the system fault handler might not
ever see the fault and would not write the process dump.
Without seeing the source code, I can only guess at what
is going on. Can anyone at Chelmsford pass this on to
the Apollo corporate mailing list for comment?
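
The hypothesis above can be illustrated with a POSIX analogy. This
is only a minimal sketch, assuming standard sigaction() semantics
rather than whatever Aegis fault-handling calls aqdev really uses;
the names private_handler, install_private_handler and
provoke_fault are hypothetical. The point is that once a process
installs its own handler for a fault, the default action (terminate
and dump) never runs, so a traceback tool has nothing to read.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler so the caller can see the fault was intercepted. */
static volatile sig_atomic_t fault_seen = 0;

/* Hypothetical private handler: records the signal and returns,
   so the default action (terminate and dump) is never taken. */
void private_handler(int signo)
{
    fault_seen = signo;
}

/* Install the private handler for SIGFPE.
   Returns 0 on success, -1 on error (as sigaction does). */
int install_private_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = private_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}

/* Send a synthetic SIGFPE to this process.
   Returns the signal number the handler recorded, or 0 if the
   handler never ran. */
int provoke_fault(void)
{
    fault_seen = 0;
    raise(SIGFPE);
    return (int)fault_seen;
}
```

After install_private_handler() succeeds, provoke_fault() returns
SIGFPE and the process keeps running; without the handler the same
raise() would terminate the process with the default action.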


 -- David Krowitz


anderson@spanky.sps.mot.com (08/17/89)

Refinement of information:

Under release 10.1... 

While a device is acquired, any program failing in any pad fails
to produce traceback data in /sys/node_data/system_logs/proc_dump.
NOTHING is ever written to the proc_dump file while a device is
acquired, no matter what programs fail on the machine. In fact,
if you delete the proc_dump file while aqdev is in effect, you
will see that no attempt is ever made to recreate it, no matter
what program signals are generated. (If the device is released
and a program then fails, proc_dump is recreated immediately and
contains proper information concerning the failure.)
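
The delete-and-watch experiment above could also be done from a
small program. The following is a minimal sketch, assuming the
node offers a POSIX-style stat() call; dump_file_status and
PROC_DUMP_PATH are hypothetical names, and the path is simply the
one quoted above.

```c
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/* Path as quoted in the post; adjust if your node differs. */
#define PROC_DUMP_PATH "/sys/node_data/system_logs/proc_dump"

/* Report whether a file exists and when it was last modified.
   Returns  1 and stores the modification time in *mtime if it exists,
            0 if the file (or a directory on the way to it) is missing,
           -1 on any other error. */
int dump_file_status(const char *path, time_t *mtime)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        if (errno == ENOENT || errno == ENOTDIR)
            return 0;
        return -1;
    }
    if (mtime != NULL)
        *mtime = st.st_mtime;
    return 1;
}
```

Calling dump_file_status(PROC_DUMP_PATH, &t) before and after
provoking a failure shows whether a dump was written: a result
that stays 0, or an unchanged modification time, means it was not.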

You can test this yourself. Do "ld -a /dev" and look for "ddf"
files. Here are some from our network:

 char  ddf            2      2048  P    prwx-        dt2821
 char  ddf            2      2048  P    prwx-        dt2821F
 char  ddf            2      2048  P    prwx-        dt2821G
 char  ddf            2      2048  P    prwx-        dt2827
 char  ddf            2      2048  P    prwx-        dt2828
 char  ddf            2      2048  P    prwx-        pio_ddf
 char  ddf            2      2048  P    prwx-        sio2_ddf
 char  ddf            2      2048  P    prwx-        sio3_ddf
 char  ddf            2      2048  P    prwx-        exatape0

Use aqdev to acquire one of the "ddf" files, i.e.:

 $ aqdev /dev/sio2_ddf
 Device 4 acquired.  

While the device is acquired, you will see traceback and other
debugging tools that depend upon
/sys/node_data/system_logs/proc_dump rendered useless.

Pressing Ctrl-Z releases the device:

 $ *** EOF *** 
 Device 4 released. 

Specific information regarding my node is:

Domain/OS kernel(7), revision 10.1.1.2, April 4, 1989 5:22:10 pm

  NODE CONFIGURATION
    Node Type:  DN3500
    Display type:  1024 x 800 color display
    68882 Floating Point Unit present. 
    Peripheral configuration:
        Disks:  winchester
        Networks: Ring
        Peripheral bus:  AT-bus
        Tapes:  none
    Disk types:  MSD-380M  -FA


I have a Data Translation DT-2823 digitizer board and software
that I wrote that never failed on release 9.7 but is failing
occasionally under release 10.1. The board works fine. I get
A/D and D/A conversion as before, but something is failing when I
take Fourier transforms of the data collected. The error is
perhaps because the new unix-like libraries are more sensitive
than the 9.7 Apollo libraries. (For example, log(0) under 9.7
yields zero; under 10.1 log(0) yields "log: SING error" and, if
you get enough of them, eventually aborts your program??) Anyway,
the dt2821 device must be acquired in order to run my program,
and the program aborts, presumably due to the new unix-like
libraries, but it is difficult to debug since nothing is ever
written to /sys/node_data/system_logs/proc_dump while the device
is acquired.
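
If log(0) really is the trigger, one stopgap that does not depend
on proc_dump is to check each argument before it reaches log()
and report the call site. This is a rough sketch only; the names
log_arg_ok_at, LOG_ARG_OK and bad_log_args are hypothetical, and
it assumes a C compiler that provides __FILE__ and __LINE__.

```c
#include <stdio.h>

/* Running count of rejected arguments, so the caller can decide
   when to give up. */
int bad_log_args = 0;

/* Return 1 if x is a valid (strictly positive) argument for log().
   Otherwise count it, report the call site on stderr, and return 0.
   NaN also fails the x > 0.0 test and is rejected. */
int log_arg_ok_at(double x, const char *file, int line)
{
    if (x > 0.0)
        return 1;
    bad_log_args++;
    fprintf(stderr, "%s:%d: log() argument %g is not positive\n",
            file, line, x);
    return 0;
}

/* Convenience macro that records where the check was made. */
#define LOG_ARG_OK(x) log_arg_ok_at((x), __FILE__, __LINE__)
```

A call such as "y = LOG_ARG_OK(p) ? log(p) : 0.0;" would reproduce
the 9.7 result of zero described above while printing the file
and line of each offending call, which gives roughly the
information a traceback would have supplied for this one class
of failure.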

Any ideas regarding how to get traceback data while a device is
acquired will be greatly appreciated.

Howard C. Anderson   anderson@spanky.sps.mot.com