jjd@BBN.COM (James J Dempsey) (07/31/89)
I have a lot more information about the runaway bashes I have been
seeing.
I had another runaway bash and this time I was able to get a
core file to look at by using the "gcore" command. Here is a
backtrace from gdb:
$ gdb-3.2 /usr/local/gnu/bin/bash ~/core.22004
GDB 3.2, Copyright (C) 1988 Free Software Foundation, Inc.
There is ABSOLUTELY NO WARRANTY for GDB; type "info warranty" for details.
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "info copying" to see the conditions.
Reading symbol data from /usr/local/gnu/bin/bash...done.
Type "help" for a list of commands.
(gdb) cd /usr/local/gnu/bash/bash-1.02
Working directory /usr/local/gnu/bash/bash-1.02.
(gdb) bt
#0 0x24ecc in strcmp (313, 1313428048)
#1 0x7530 in find_variable (...) (...)
#2 0x7553 in get_string_value (...) (...)
#3 0xdf2 in save_history (...) (...)
#4 0xd35 in termination_unwind_protect (...) (...)
#5 0x7fffe47b in ?? (11, 1313428048, 2147474260, 3372)
#6 0x7fffe470 in ?? ()
#7 0x7b4 in main (...) (...)
#8 0x6394 in execute_simple_command (...) (...)
#9 0x575b in execute_command_internal (...) (...)
#10 0x5498 in execute_command (...) (...)
#11 0xa21 in reader_loop (...) (...)
#12 0x7b4 in main (...) (...)
#13 0x6394 in execute_simple_command (...) (...)
#14 0x575b in execute_command_internal (...) (...)
#15 0x5498 in execute_command (...) (...)
#16 0xa21 in reader_loop (...) (...)
#17 0x7b4 in main (...) (...)
(gdb)
Next, I took a look at "list" in find_variable and it looks like list
has been mangled so that list->next points to never never land:
(gdb) print *list
$1 = {next = 0x45474150, name = 0x4e495250 <Address 0x4e495250 out of bounds>, value = 0x3d524554 <Address 0x3d524554 out of bounds>, attributes = 912207922, context = 1426092912, prev_context = 0x555555}
(gdb) print *(list->next)
Cannot read memory: address 0x45474150 out of bounds.
(gdb)
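The bogus values in *list look more like text than like pointers. As a
rough check, the sketch below packs the six fields back into bytes and
prints them. It assumes 32-bit words stored little-endian, which is a
guess about this machine and not something I have confirmed; the names
FIELDS and words_to_bytes are made up for the illustration.

```python
import struct

# Field values copied from the "print *list" output above, in struct order:
# next, name, value, attributes, context, prev_context.
FIELDS = [0x45474150, 0x4e495250, 0x3d524554, 912207922, 1426092912, 0x555555]


def words_to_bytes(words):
    """Pack 32-bit words as little-endian bytes (byte order is an assumption)."""
    return struct.pack("<%dI" % len(words), *words)


raw = words_to_bytes(FIELDS)
print(raw)  # b'PAGEPRINTER=20_6ps\x00UUUU\x00'
```

Under that assumption the node reads as a string along the lines of
"PAGEPRINTER=20_6ps", which would fit a string having been copied over a
variable node, though this is only an inference. Note also that the second
argument to strcmp in frame #0 (1313428048) is the same number as the
name field (0x4e495250).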
I also had reason to believe that this runaway shell was a result of
a shell script I was running, so I ran some more tests. Sometimes this
script would work, sometimes it would just hang and sometimes it would
cause bash to try to send a bug report.
When it hung and I eventually interrupted it, that is when the
runaway process was created -- presumably it was the shell
that was running the script. I have duplicated this behaviour
several times this morning. I have looked at cores from several
instances and they all look identical to the above.
It is important to note that the script in question does not begin
with a #! line. My login shell is /usr/local/gnu/bin/bash. If I add
the line:
#!/bin/sh
to the top of the script or if I add the line:
#!/usr/local/gnu/bin/bash
to the top of the script it works fine every time -- no hanging, no
core dumping, no runaway processes. The only time it causes problems
is when there is no #! line at all!
When it tries to send a bug report it looks like this:
$ bad stuff
free: Called with already freed block argument
Tell jjd@bbn.com to fix this someday.
Mailing a bug report...done.
Stopping myself...Illegal instruction (core dumped)
$
It is interesting to note that even though it says "Stopping myself"
it continues and gives me a prompt and seems to function normally as a
bash. The core file is generated and a backtrace from it looks like this:
$ gdb-3.2 /usr/local/gnu/bin/bash core
GDB 3.2, Copyright (C) 1988 Free Software Foundation, Inc.
There is ABSOLUTELY NO WARRANTY for GDB; type "info warranty" for details.
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "info copying" to see the conditions.
Reading symbol data from /usr/local/gnu/bin/bash...done.
Type "help" for a list of commands.
(gdb) dir /usr/local/gnu/bash/bash-1.02
Source directories searched: /u1/jjd:/usr/local/gnu/bash/bash-1.02
(gdb) bt
#0 0x7fffe474 in ?? ()
#1 0x47b0 in programming_error (...) (...)
#2 0x1721b in free (...) (...)
#3 0x631d in execute_simple_command (...) (...)
#4 0x575b in execute_command_internal (...) (...)
#5 0x5498 in execute_command (...) (...)
#6 0xa21 in reader_loop (...) (...)
#7 0x7b4 in main (...) (...)
#8 0x6394 in execute_simple_command (...) (...)
#9 0x575b in execute_command_internal (...) (...)
#10 0x5498 in execute_command (...) (...)
#11 0xa21 in reader_loop (...) (...)
#12 0x7b4 in main (...) (...)
(gdb)
(gdb) frame 3
Reading in symbols for execute_cmd.c...done.
#3 0x631d in execute_simple_command (simple_command=(struct simple_com *) 0x2fc0c, pipe_in=-1, pipe_out=-1, async=0) (execute_cmd.c line 767)
767 free ((char *)jobs);
(gdb) print *(simple_command)
$1 = {words = 0x2fbec, redirects = 0x0}
(gdb) print *(simple_command->words)
$2 = {next = 0x2fb4c, word = 0x2fc6c}
(gdb) print *(simple_command->words->word)
$3 = {word = 0x2fbac "/nd/bin/phone", dollar_present = 0, quoted = 0, assignment = 0}
(gdb) print *(simple_command->words->next)
$4 = {next = 0x0, word = 0x2fb8c}
(gdb) print *(simple_command->words->next->word)
$5 = {word = 0x2fc4c "$*", dollar_present = 1, quoted = 0, assignment = 0}
(gdb)
(gdb) frame 2
Reading in symbols for ./alloc-files/malloc.c...done.
#2 0x1721b in free (mem=(char *) 0x3bc0c "") (./alloc-files/malloc.c line 556)
556 botch ("free: Called with already freed block argument\n");
(gdb) frame 1
Reading in symbols for make_cmd.c...done.
#1 0x47b0 in programming_error (reason=(char *) 0x1714e "free: Called with already freed block argument\n", arg1=0, arg2=-159802982494542) (make_cmd.c line 436)
436 abort ();
(gdb)
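For anyone unfamiliar with this kind of check, here is a toy model of an
allocator that remembers whether each block is allocated or free and
complains on a second free. It is only an illustration of the idea; I am
not claiming this is how alloc-files/malloc.c is written, and ToyHeap and
DoubleFree are invented names.

```python
class DoubleFree(Exception):
    """Raised at the point where a real allocator would report the error."""


class ToyHeap:
    """Tracks a state per block so a repeated free can be detected."""

    def __init__(self):
        self._next_addr = 0x1000   # arbitrary starting "address"
        self._state = {}           # address -> "alloc" or "free"

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += max(size, 1)
        self._state[addr] = "alloc"
        return addr

    def free(self, addr):
        if addr not in self._state:
            raise ValueError("free: block was never allocated")
        if self._state[addr] == "free":
            raise DoubleFree("free: Called with already freed block argument")
        self._state[addr] = "free"


heap = ToyHeap()
block = heap.malloc(16)
heap.free(block)          # first free is fine
try:
    heap.free(block)      # second free of the same block is caught
except DoubleFree as err:
    print(err)
```

In the trace above, the second free would correspond to the call at
execute_cmd.c line 767; where the first free of that block happened is
what the core file does not show.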
The script in question looks like this:
uncompress -c $HOME/Text/phones | grep -i $*
/nd/bin/phone $*
I'm not sure what to do from here. I still have the core files lying
around if you want me to look for something specific. I hope this helps.
--Jim--