thoth@springs.cis.ufl.edu (Gilligan) (03/12/90)
While we're all discussing defunct and exiting processes, how about that SunOS 4.x that sometimes puts processes in permanent disk wait? Processes show up with a D in the STAT field of a ps -gux. These processes are unkillable and can only be removed with a halt and a reboot. They tend to collect other processes as well. If you ever put an emacs into this totally-hosed-state it is quite likely that any others you start up after it will follow it into D-space.

I have on occasion been using X windows and watched the xload skyrocket to 15 as all my shells and compilations go to hell. The L1-a is the only solution.

It hasn't happened as much since we got 8 megs of RAM for all of our computers.

Comments? Explanations?
--
( My name's not really Gilligan, It's Robert Forsman, without an `e' )
antony@lbl-csam.arpa (Antony A. Courtney) (03/12/90)
In article <THOTH.90Mar11213935@springs.cis.ufl.edu> thoth@springs.cis.ufl.edu (Gilligan) writes:
>
> While we're all discussing defunct and exiting processes, how about
>that SunOS 4.x that sometimes puts processes in permanent disk wait?
>Processes show up with a D in the STAT field of a ps -gux. These
>processes are unkillable and can only be removed with a halt and a
>reboot. They tend to collect other processes as well. If you ever
>put an emacs into this totally-hosed-state it is quite likely that any
>others you start up after it will follow it into D-space.
> I have on occasion been using X windows and watched the xload
>skyrocket to 15 as all my shells and compilations go to hell. The
>L1-a is the only solution.
> It hasn't happened as much since we got 8 megs of ram for all of our
>computers.
>
> Comments? explanations?
>--
>( My name's not really Gilligan, It's Robert Forsman, without an `e' )

We noticed this a lot, too. The best explanation that I could come up with was the following:

We frequently noticed that when this did happen, we were occasionally getting messages in syslog about how the system "lost interrupt from controller".

My theory on what was going on is this: your application requests access to some file. The inode of the file or a block of the file is allocated and is locked by the kernel on behalf of your process. Then the kernel starts the disk controller on the read(), and puts your process to sleep on the block pending the DMA transfer from the disk controller. IF THE SYSTEM LOSES THE INTERRUPT TELLING IT THE DMA TRANSFER HAS COMPLETED, OR IF THE DMA TRANSFER NEVER OCCURS, YOUR PROCESS SLEEPS ON THIS BLOCK INDEFINITELY.

Furthermore, any other processes which attempt to access this file will find the particular block locked and will sleep pending a brelse() of this block. Since your first process is never woken up, it never releases the block and those subsequent processes also sleep indefinitely.
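As a rough illustration of the theory above, here is a toy model of one locked block and the processes that pile up behind it. It is only a sketch and not SunOS kernel code: the names Buffer, request_read, disk_interrupt and simulate are all made up for this example, and real wakeups would have the waiters contend for the block again rather than simply run.

```python
# Toy model of the lost-interrupt theory; NOT actual SunOS kernel code.
# Every name here (Buffer, request_read, disk_interrupt, simulate) is
# hypothetical and exists only for illustration.


class Buffer:
    def __init__(self):
        self.busy = False    # block is locked on behalf of some process
        self.owner = None    # process asleep waiting for the DMA transfer
        self.waiters = []    # processes asleep until the block is released


def request_read(buf, pid, state):
    """A process asks for the block; either way it goes to sleep ("D")."""
    if buf.busy:
        buf.waiters.append(pid)   # block already locked: wait for release
    else:
        buf.busy = True           # lock the block and start the transfer
        buf.owner = pid
    state[pid] = "D"


def disk_interrupt(buf, state):
    """Completion interrupt: wake the owner, which releases the block."""
    state[buf.owner] = "R"
    buf.busy = False
    buf.owner = None
    for pid in buf.waiters:       # the release wakes everyone queued behind it
        state[pid] = "R"
    buf.waiters.clear()


def simulate(interrupt_lost):
    """Three processes touch the same block; return their final states."""
    buf, state = Buffer(), {}
    for pid in ("emacs-1", "emacs-2", "csh"):
        request_read(buf, pid, state)
    if not interrupt_lost:
        disk_interrupt(buf, state)
    return state


if __name__ == "__main__":
    print("interrupt delivered:", simulate(interrupt_lost=False))
    print("interrupt lost:     ", simulate(interrupt_lost=True))
```

In this model, once the completion interrupt is dropped nothing ever calls disk_interrupt, so the first process and everything queued behind it stay in "D", which matches the pile-up described in the original post.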
We have not had this happen for quite a while. (We have also been running SunOS 4.1Beta since X-mas.) We may have also replaced our disk controller, I'm not sure.

Has anyone else experienced this problem? Is my 'theory' a valid explanation?

antony
--
*******************************************************************************
Antony A. Courtney                              antony@lbl.gov
Advanced Development Group                      ucbvax!lbl-csam.arpa!antony
Lawrence Berkeley Laboratory                    AACourtney@lbl.gov