[comp.unix.questions] ksh dumping core

fail@fozzy.UUCP (Dennis Fail) (10/19/89)

Help,

   We recently obtained the ksh-88 source from the toolchest and installed
it on our network of Sun-3s and Sun-4s running SunOS 4.0.3.  The problem
we have been having is that a window will suddenly die because its
parent process, ksh, has dumped core.  As best I can determine, it happens
when someone is working at their Sun workstation with ksh as their
login shell and then either rlogins to, or physically walks over to, another
workstation, logs in, and does some work there.  When they go back to
their own workstation and try to use the windows they were working in,
those shells dump core.  This doesn't happen every time, but it happens
often enough to be a problem.

   I've isolated the problem to the history code in ksh, as the
following dbx dialog shows, but I am at a loss as to what to do about it.
Is it a ksh bug, or is something wrong in our configuration?  All the users'
HOME directories are NFS-mounted on every machine, and they all use
.sh_history as the history file.  I suspect that sharing this file
has something to do with it, but that's just an uneducated guess.

As a side note, we had problems with the previous version of ksh
corrupting the history file after an rlogin, but it never dumped core.
When we tried to recall a line after an rlogin, the shell would just beep
at us, and the history command would display a list of numbers something
like this:  200     252      201     ...
To fix this we would do an 'exec ksh' and everything would be fine.

Here is the dbx dialog:

dbx /bin/ksh core  
Reading symbolic information...
Read 14645 symbols
(dbx) where
hist_eof(), line 430 in "isda/fail/sun-src/ksh-i/src/sh/history.c"
exfile(), line 377 in "isda/fail/sun-src/ksh-i/src/sh/main.c"
main(c = 1, v = 0xefffc40, 0xefffc48), line 280 in "isda/fail/sun-src/ksh-i/src/sh/main.c"
(dbx) list
  430   	register off_t count = fp->fixcnt;
  431   	int skip = 0;
  432   	io_seek(fp->fixfd,count,SEEK_SET);
  433   #ifdef INT16
  434   	while((c=io_getc(fp->fixfd))!=EOF)
  435   	{
  436   #else
  437   	/* could use io_getc() but this is faster */
  438   	while(1)
  439   	{
(dbx) print fp
`history`hist_eof`fp = (nil)
(dbx) 

I guess the nil pointer is causing the core dump, but I don't know why it is nil.

Any clues, anybody?

Thanks

Dennis Fail
Rockwell Int.
{uunet | attctc}!fozzy!fail

seth@ctr.columbia.edu (Seth Robertson) (10/19/89)

In article <834@fozzy.UUCP> fail@fozzy.UUCP (Dennis Fail) writes:
>goes to another
>workstation, logs in, and does some work and then goes back to his
>workstation and try to use his windows that he was working in, they will
>dump core. This doesn't happen all the time, but with enough frequency
>to be a problem.
>
>   I've isolated the problem to be in the history function of ksh

>As a side note, we had had problems with the previous version of ksh
>corrupting the history file after an rlogin, but it would never dump core.

The way I solved the corrupted history file problem was to use a different
history file for each pty.  I expect that this would solve your problem too.

From my .kshrc (the file that is read every time a ksh starts; if you
		 do not have this feature, you could put it in .profile):
------------------------------------------------------------
##
## Source the people's startup file.
. /public/etc/kshsetup
##
## On exit, run ~/.kshout to clean up the history file and, for a login
## shell, print the contents of ~/.face
if test "$0" = "su" -o "$0" = "-su"
 then
# WatchOut gets set if there is already another history file existing
# with the same name (i.e. don't delete it)
  if test "$WatchOut"
   then
#   Keep the history file and don't print a closing screen
#   (Because if you su, you don't want your history file to disappear :-)
    trap 'trap 0; exec ~/.kshout save no; kill -9 0; exit; exit' 0
   else
#   Delete the history file and don't print a closing screen
    trap 'trap 0; exec ~/.kshout kill no; kill -9 0; exit; exit' 0
  fi
 else 
  if test "$0" = "-ksh"
   then
#   Delete the history file and print a closing screen
    trap 'trap 0; exec ~/.kshout kill yes; kill -9 0; exit; exit' 0
   else
#   Delete the history file but don't print a closing screen
    trap 'trap 0; exec ~/.kshout kill no; kill -9 0; exit; exit' 0
   fi
 fi
------------------------------------------------------------

From my .kshout
------------------------------------------------------------
: ${tty:=`tty`}
: ${pty:=`basename $tty`}
: ${host:=`hostname`}
# If argv[1] is `kill' then we are supposed to get rid of the
# ksh history file
if test "$1" = "kill"
 then
  if test "$pty" = "console"
   then
    rm -f ~/.ksh.$host.* ~/core
   else
    : ${HISTFILE:="$HOME/.ksh.$host.$pty.$USER"}
    rm -f "$HISTFILE" ~/core
   fi
 fi
# If argv[2] is `yes' then we are supposed to print a closing screen
if test "$2" = "yes"
 then
  clear
  cat ~/.face
 fi
------------------------------------------------------------

From /public/etc/kshsetup:
------------------------------------------------------------
: ${tty:=`tty`}
: ${pty:=`basename $tty`}
: ${host:=`hostname`}
if test -f "$HOME/.ksh.$host.$pty.$USER"
 then
  WatchOut="true"
 fi
HISTFILE="$HOME/.ksh.$host.$pty.$USER"
. /public/etc/kshenv  # Set up some special features
------------------------------------------------------------

That should solve the problem by giving each host/terminal combination its
own history file.  I'll include /public/etc/kshenv for those who are
interested:
------------------------------------------------------------
: ${tty:=`tty`}
: ${pty:=`basename $tty`}

if test "$pty" = "console"
 then
# SunOS 3.4 machines (no /var) have suntools but no sunview command
  if test -d /var
   then
    :
   else
    alias sunview=suntools
   fi
 else
  alias sunview="echo 'This is not allowed unless you are on the console.'"
  alias suntools="echo 'This is not allowed unless you are on the console.'"
 fi  

alias logout="exit"
# Plus some more CTR specific stuff
------------------------------------------------------------

Hope this helps,

                                        -Seth Robertson
                                         seth@ctr.columbia.edu

amos@taux01.UUCP (Amos Shapir) (10/21/89)

It seems all your sessions use the same history file; when one of them
adds to it, the other sessions' current pointers into it become invalid.

It is easy to fix: just define HISTFILE=.hist$HOST$TTY in your .profile
(make sure $TTY and $HOST are defined first, of course, and that $TTY holds
just the terminal name, e.g. ttyp3 rather than /dev/ttyp3).  I do not use
it, since it's sometimes useful to keep history around from other sessions;
just pressing RETURN every now and then makes sure a session re-reads
the history file, so it does not fall too far behind.
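The one-line HISTFILE setting above can be fleshed out roughly as follows. This is a sketch under stated assumptions, not the exact setup described here: it assumes `tty` prints a full device path (so only the last path component is kept, keeping slashes out of the file name), uses the made-up fallback name `notty` when the shell is not on a terminal, and uses `uname -n` in place of whatever sets `$HOST` on your system.

```sh
# Hypothetical .profile fragment: one ksh history file per host and terminal.
# Use $HOST if the login environment already set it, else ask uname.
HOST=${HOST:-$(uname -n)}

# `tty` prints a device path such as /dev/ttyp3; keep only the last
# component so the history file name contains no slashes.  "notty" is a
# made-up fallback for shells that are not attached to a terminal.
if tty_path=$(tty 2>/dev/null); then
  TTY=${tty_path##*/}
else
  TTY=notty
fi

# Absolute path, so the name does not depend on the current directory.
HISTFILE="$HOME/.hist.$HOST.$TTY"
export HISTFILE
```

The dots between the parts are only for readability; the original name has none. Because every ksh on a given host and terminal computes the same name, a shell started later on the same pty picks up the previous file rather than a fresh one.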

-- 
	Amos Shapir		amos@taux01.nsc.com or amos@nsc.nsc.com
National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52 522261  TWX: 33691, fax: +972-52-558322
34 48 E / 32 10 N			(My other cpu is a NS32532)