hagan@scotty.dccs.upenn.edu (John Dotts Hagan) (07/26/90)
OK - I could use some help from you, the people, who use DECstations.

I have submitted a problem to Digital about trouble starting jobs from /bin/csh, specifically gnu emacs. Unfortunately, Digital seems highly skeptical that I am really having a problem:

    "I think I mentioned that we have hundreds of people here using emacs
    and the reported seg fault problem has never been seen here."
        -- unnamed Digital Ultrix Engineer, latest response to the bug report

If other people are having the problem, please let me know (and perhaps Digital). I will also let them know that I am not alone with the problem (or maybe I am!).

Here is the problem:

You are running in /bin/csh like normal, and try to start an application (like gnu emacs, of which Digital distributes an unsupported subset):

% emacs
Segmentation fault
%

In fact, if you keep trying to run emacs, it just keeps faulting:

% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
%

However, if you run another task:

% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% emacs
Segmentation fault
% ls
fish
% emacs
<emacs starts up OK>

The Segmentation fault occurs almost instantly, and core rarely dumps. The few times core has dumped, it is always a core dump of /bin/csh, not emacs. Since the dumps are of the csh, and the fault happens so fast, I believe the csh is having trouble starting the emacs task, rather than emacs causing the problem. I have built two newer versions of gnu emacs and it still happens.

I should also mention that these faults happen quite rarely to some users (I may see it once a month), while other users see it a few times a week or even daily! But no one at my site can make it happen at will. We have looked for a pattern: whether it happens when you first log on, or after you have been logged on for days, or only on vt100s, or only with X Windowing emacs, etc. No pattern is obvious to us.
Also, we have looked at everyone's .cshrc and .login and they are all quite different, but everyone sees this problem to some extent. Also, we have diskless systems (with local swap disk), "dataless" systems (with a local swap and root partition, but NFS mounting /usr), and "diskfull" systems (all disk locally attached) that all experience the problem.

We have been suffering with this problem since the first release of UWS for RISC - and only our RISC DECstation 3100s show this bug. It has never happened on our VAX/Ultrix systems. We have installed UWS 2.1, UWS 2.2, and Ultrix 3.1D/2.2D on each DECstation, and no change.

Once, we believe another task failed the same way. It was the expire program that runs nightly on one DECstation to clean up network news.

Also possibly related is a strange observation that occasionally things do not seem to get started from crontab. Some of our scripts that run out of crontab append to a log file as their first step - the day after a job did not run, we look at the log file and it has not been touched! This also seems to happen rarely, but once in a while the same nightly task will not run for 2 or 3 days, then it will run correctly every night for months. The expire program that failed to start was in a /bin/csh script started by cron, and all of these other scripts that failed are /bin/csh scripts.

Again, if you think you are experiencing these problems, please let me know so I can let Digital know it is not a problem unique to my site. Also, if you know the problem, PLEASE LET ME KNOW - it's driving us crazy!!!!!!!!!

--Kid.
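One way to take csh out of the picture when chasing a fault like the one described above is to start the program from a tiny launcher and look at the wait status directly. What follows is only a minimal sketch, assuming a POSIX-style fork/execvp/waitpid interface; the helper name run_and_report and its return convention are hypothetical and do not come from the thread or from Ultrix. It reports whether the child exited normally or was killed by a signal such as SIGSEGV. It cannot tell a fault in the child before the exec from a fault in the new program right after it.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): run argv[0] with the given
 * argument vector and report how the child ended.
 *
 * Returns:
 *   0..255   the child's exit status, if it exited normally
 *   -N       if the child was killed by signal N (e.g. -SIGSEGV)
 *   -1000    if fork() or waitpid() failed, or the status was unexpected
 */
int run_and_report(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1000;
    }
    if (pid == 0) {
        /* Child: replace this process image with the requested program. */
        execvp(argv[0], argv);
        perror(argv[0]);      /* reached only if the exec itself failed */
        _exit(127);
    }

    /* Parent: wait for the child and classify how it ended. */
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1000;
    }
    if (WIFSIGNALED(status)) {
        int sig = WTERMSIG(status);
        fprintf(stderr, "%s: killed by signal %d%s\n", argv[0], sig,
                sig == SIGSEGV ? " (SIGSEGV)" : "");
        return -sig;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1000;
}
```

Calling run_and_report with an argument vector such as { "emacs", NULL } from a small main() gives a launcher that does not involve csh at all. If the fault still shows up there, that would point away from the shell, though it would not by itself say where the problem lies.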
diamond@tkou02.enet.dec.com (diamond@tkovoa) (07/27/90)
In article <27522@netnews.upenn.edu> hagan@scotty.dccs.upenn.edu (John Dotts Hagan) writes:

>I have submitted a problem to Digital about trouble starting jobs from
>/bin/csh, specifically gnu emacs. Unfortunately, Digital seems highly
>skeptical that I am really having a problem:
> "I think I mentioned that we have hundreds of
> people here using emacs and the reported seg fault problem has never been
> seen here."

Well, the response was almost suitable. The problem is not in emacs. You say so yourself:

>The Segmentation fault occurs almost instantly, and core rarely dumps. The
>few times core has dumped, it is always a core dump of /bin/csh, not emacs.
>Since the dumps are of the csh, and it fault happens so fast, I believe the
>csh is having trouble starting the emacs task, rather than emacs causing the
>problem.
>... The expire program that failed to start
>was in a /bin/csh script started bu cron, and all of these other scripts that
>failed are /bin/csh scripts.

I had similar problems on a MIPS box at a previous employer. The failures were often in csh but also often in other programs such as the ld step of a cc command. I haven't seen it on a DECstation (yet) but it looks like this narrows it down to code that was inherited from MIPS.

Sometimes I had the same problem while debugging a program that I wrote myself: If the program aborted, and I tried running it under dbx, it would get a segmentation fault as soon as I said "run" -- repeatedly.

Perhaps certain environments set up segments of some particular lengths, or by some other unknown means they manifest an intermittent bug in address mapping/paging/swapping/who knows. Or perhaps the kernel forgets to delete a signal that has been delivered, so it gets delivered again when a new process bears a certain characteristic.

I hope these guesses might help locate the problem.

--
Norman Diamond, Nihon DEC     diamond@tkou02.enet.dec.com
This is me speaking. If you want to hear the company speak, you need DECtalk.
lat@creatures.cs.vt.edu (Laurie Zirkle) (07/27/90)
In article <27522@netnews.upenn.edu> hagan@scotty.dccs.upenn.edu (John Dotts Hagan) writes:
>
>OK - I could use some help from you, the people, who use DECstations.
>
>You are running in /bin/csh like normal, and try and start an application (like
>the gnu emacs Digital distributes an unsupported subset):
>
>% emacs
>Segmentation fault
>%

I have seen the same problem here on a DECstation 3100 running 3.1/2.1, 3.1/2.2, and 3.1d/2.2d. I have emacs-18.55 compiled instead of the unsupported emacs that DEC supplies, and it's compiled with the X-windowing support. It doesn't always happen, but it happens enough to annoy the owner/user of the 3100 (especially since she is a heavy emacs user).

Laurie Zirkle                    lat@vtopus.cs.vt.edu
Computer Systems Engineer        lat@vtcs1.bitnet
VA Tech Computer Science Dept
Blacksburg, VA 24060
703-231-6370
grue@nirvana.cool.engin.umich.edu (Paul Howell) (07/28/90)
In article <27522@netnews.upenn.edu>, hagan@scotty.dccs.upenn.edu (John Dotts Hagan) writes:
|>
|>
|> You are running in /bin/csh like normal, and try and start an application (like
|> the gnu emacs Digital distributes an unsupported subset):
|> ...
|> % emacs
|> Segmentation fault
|> %
|>
|>
|> However, if you run another task:
|>
|> % emacs
|> Segmentation fault
|> % emacs
|> Segmentation fault
|> % emacs
|> Segmentation fault
|> % emacs
|> Segmentation fault
|> % ls
|> fish
|> % emacs
|> <emacs starts up OK>
|> ...
|> The Segmentation fault occurs almost instantly, and core rarely dumps. The
|> few times core has dumped, it is always a core dump of /bin/csh, not emacs.
|> Since the dumps are of the csh, and it fault happens so fast, I believe the
|> csh is having trouble starting the emacs task, rather than emacs causing the
|> problem.
|> ...
|>
|> Again, if you are think you are experiencing these problems, please let me
|> know so I can let Digital know it is not a problem unique to my site. Also,
|> if you know the problem, PLEASE LET ME KNOW - it's driving us crazy!!!!!!!!!
|>
|> --Kid.

We had the same problem here with emacs giving a segmentation fault. Only after exec'ing something else would emacs work. Our thinking was that it was/is a problem with csh; sh didn't have the problem.

We're now running Ultrix 3.1d and haven't seen the problem again. That is not to say it's fixed: I'm not an emacs user, so I can only go by what others say.

---
Paul Howell
grue@caen.engin.umich.edu
cole@dip.cs.wisc.edu (Bruce Cole) (07/29/90)
In article <27522@netnews.upenn.edu> hagan@scotty.dccs.upenn.edu (John Dotts Hagan) writes:
> "I think I mentioned that we have hundreds of
> people here using emacs and the reported seg fault problem has never been
> seen here."

It is very distressing to hear DEC say this, since I QAR'ed this problem to DEC some time ago, and gave them the fixes to the Ultrix kernel problem which causes this to occur. I posted this description to info-gnu-emacs:

From: cole (Bruce Cole)
To: gordon!jpd
Cc: cole, info-gnu-emacs@prep.ai.mit.edu
Subject: RISC Ultrix- Emacs problems
Date: Sat, 7 Jul 90 12:11:25 -0500

> We're running emacs 18.54 on an Ultrix Risc Decsystem 5400. Three times
> we've had the machine hang with the following message:
>
> panic: tblmod on invalid pte
>
> Ultrix support tells us this is caused by emacs. Has anyone experienced
> this? DEC says it only happens on RISC boxes.

This is due to a MIPS specific Ultrix kernel bug. I sent DEC a description of the bug with a bug fix. The Kernel bug manifests itself with emacs since emacs uses a non-standard data start address on Ultrix MIPS machines.

I haven't often seen emacs cause MIPS machines to panic. Usually you just see one of the following errors when you try to start up emacs:

segmentation fault (core dumped)
emacs: Bad address
Out of memory
data size rlimit exceeded, pid 6523, process tcsh (for example)

Until DEC fixes their kernel, you can avoid the bug by changing the data start address used by emacs. Change m-pmax.h to define these values:

[...]

Here are diffs to emacs 18.55:

*** m-pmax.h	Thu Jun  8 11:53:55 1989
--- m-pmax.h.new	Mon Jul  9 10:21:21 1990
***************
*** 1,3 ****
--- 1,7 ----
  #include "m-mips.h"
  #undef LIBS_MACHINE
  #undef BIG_ENDIAN
+ #undef LD_SWITCH_MACHINE
+ #undef DATA_START
+ #define DATA_START 0x10000000
+ #define DATA_SEG_BITS 0x10000000

>I have built two newer versions of gnu emacs and it still happens. I should
>also mention that these faults happen quite rarely to some users (I may see
>it once a month), while other users see it a few times a week or even daily!

The problem only occurs when a MIPS machine is doing a lot of paging. Users who don't cause their workstation to page will not see this problem.

--
Bruce Cole
Computer Sciences Dept.
U. of Wisconsin - Madison
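Two of the details in the post above can be inspected from inside a running program: where its initialized data actually lands, and what the soft data size limit is (the limit named in the "data size rlimit exceeded" message). The following is a minimal sketch, assuming a system that provides the POSIX getrlimit() call with RLIMIT_DATA; the helper name report_data_layout is hypothetical. It does not reproduce or detect the kernel bug. It only prints values that can be compared against the DATA_START setting and used to rule out a limit that really is too low.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* An initialized global, so it is placed in the program's data segment. */
static int initialized_datum = 1;

/*
 * Hypothetical helper (not from the thread): print the address of an
 * initialized global and the soft data size limit to `out`.
 * Returns 0 on success, -1 if getrlimit() fails.
 */
int report_data_layout(FILE *out)
{
    struct rlimit rl;

    assert(out != NULL);

    if (getrlimit(RLIMIT_DATA, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }

    /* Where the linker and loader placed initialized data in this process. */
    fprintf(out, "address of an initialized global: %p\n",
            (void *)&initialized_datum);

    /* Soft limit on the size of the data segment. */
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "soft data size limit: unlimited\n");
    else
        fprintf(out, "soft data size limit: %llu bytes\n",
                (unsigned long long)rl.rlim_cur);

    return 0;
}
```

Calling report_data_layout(stdout) from main() prints both values. On a current system with address randomization the printed address will vary from run to run, so treat it as an illustration of what to look at rather than a fixed number.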
gk5g+@andrew.cmu.edu (Gary Keim) (08/08/90)
Excerpts from netnews.comp.unix.ultrix: 28-Jul-90 Re: emacs or csh problem? Bruce Cole@dip.cs.wisc.e (2336)

> > "I think I mentioned that we have hundreds of
> > people here using emacs and the reported seg fault problem has never been
> > seen here."

> It is very distressing to hear DEC say this, since I QAR'ed this problem to
> DEC some time ago, and gave them the fixes to the Ultrix kernel problem which
> causes this to occur.

Andrew Toolkit applications suffer from this problem. Can someone tell me whether, and in which version of Ultrix, this paging bug has been fixed? Might it be fixed in 4.0?

Gary Keim
ATK Group