[gnu.bash.bug] More on runaway bash

jjd@BBN.COM (James J Dempsey) (07/31/89)

I have a lot more information about the runaway bashes I have been
seeing. 

I had another runaway bash and this time I was able to get a
core file to look at by using the "gcore" command.  Here is a
backtrace from gdb:

$ gdb-3.2 /usr/local/gnu/bin/bash ~/core.22004
GDB 3.2, Copyright (C) 1988 Free Software Foundation, Inc.
There is ABSOLUTELY NO WARRANTY for GDB; type "info warranty" for details.
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "info copying" to see the conditions.
Reading symbol data from /usr/local/gnu/bin/bash...done.
Type "help" for a list of commands.
(gdb) cd /usr/local/gnu/bash/bash-1.02
Working directory /usr/local/gnu/bash/bash-1.02.
(gdb) bt
#0  0x24ecc in strcmp (313, 1313428048)
#1  0x7530 in find_variable (...) (...)
#2  0x7553 in get_string_value (...) (...)
#3  0xdf2 in save_history (...) (...)
#4  0xd35 in termination_unwind_protect (...) (...)
#5  0x7fffe47b in ?? (11, 1313428048, 2147474260, 3372)
#6  0x7fffe470 in ?? ()
#7  0x7b4 in main (...) (...)
#8  0x6394 in execute_simple_command (...) (...)
#9  0x575b in execute_command_internal (...) (...)
#10 0x5498 in execute_command (...) (...)
#11 0xa21 in reader_loop (...) (...)
#12 0x7b4 in main (...) (...)
#13 0x6394 in execute_simple_command (...) (...)
#14 0x575b in execute_command_internal (...) (...)
#15 0x5498 in execute_command (...) (...)
#16 0xa21 in reader_loop (...) (...)
#17 0x7b4 in main (...) (...)
(gdb)

Next, I took a look at "list" in find_variable and it looks like list
has been mangled so that list->next points to never never land:

(gdb) print *list
$1 = {next = 0x45474150, name = 0x4e495250 <Address 0x4e495250 out of bounds>, value = 0x3d524554 <Address 0x3d524554 out of bounds>, attributes = 912207922, context = 1426092912, prev_context = 0x555555}
(gdb) print *(list->next)
Cannot read memory: address 0x45474150 out of bounds.
(gdb)

I also had reason to believe that this runaway shell was a result of
a shell script I was running, so I ran some more tests.  Sometimes this
script would work, sometimes it would just hang, and sometimes it would
cause bash to try to send a bug report.

When it would hang and I would eventually interrupt it, that is when
the runaway process was created -- presumably it was the shell
which was running the script.  I have duplicated this behaviour
several times this morning.  I have looked at cores from several
instances and they all look identical to the above.

It is important to note that the script in question does not begin
with a #! line. My login shell is /usr/local/gnu/bin/bash.  If I add
the line:

#!/bin/sh

to the top of the script or if I add the line:

#!/usr/local/gnu/bin/bash

to the top of the script it works fine every time -- no hanging, no
core dumping, no runaway processes.  The only time it causes problems
is when there is no #! line at all!

When it tries to send a bug report it looks like this:

$ bad stuff
free: Called with already freed block argument

Tell jjd@bbn.com to fix this someday.
Mailing a bug report...done.
Stopping myself...Illegal instruction (core dumped)
$

It is interesting to note that even though it says "Stopping myself"
it continues and gives me a prompt and seems to function normally as a
bash.  The core file is generated and a backtrace from it looks like this:

$ gdb-3.2 /usr/local/gnu/bin/bash core
GDB 3.2, Copyright (C) 1988 Free Software Foundation, Inc.
There is ABSOLUTELY NO WARRANTY for GDB; type "info warranty" for details.
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "info copying" to see the conditions.
Reading symbol data from /usr/local/gnu/bin/bash...done.
Type "help" for a list of commands.
(gdb) dir /usr/local/gnu/bash/bash-1.02
Source directories searched: /u1/jjd:/usr/local/gnu/bash/bash-1.02
(gdb) bt
#0  0x7fffe474 in ?? ()
#1  0x47b0 in programming_error (...) (...)
#2  0x1721b in free (...) (...)
#3  0x631d in execute_simple_command (...) (...)
#4  0x575b in execute_command_internal (...) (...)
#5  0x5498 in execute_command (...) (...)
#6  0xa21 in reader_loop (...) (...)
#7  0x7b4 in main (...) (...)
#8  0x6394 in execute_simple_command (...) (...)
#9  0x575b in execute_command_internal (...) (...)
#10 0x5498 in execute_command (...) (...)
#11 0xa21 in reader_loop (...) (...)
#12 0x7b4 in main (...) (...)
(gdb)
(gdb) frame 3
Reading in symbols for execute_cmd.c...done.
#3  0x631d in execute_simple_command (simple_command=(struct simple_com *) 0x2fc0c, pipe_in=-1, pipe_out=-1, async=0) (execute_cmd.c line 767)
767                         free ((char *)jobs);
(gdb) print *(simple_command)
$1 = {words = 0x2fbec, redirects = 0x0}
(gdb) print *(simple_command->words)
$2 = {next = 0x2fb4c, word = 0x2fc6c}
(gdb) print *(simple_command->words->word)
$3 = {word = 0x2fbac "/nd/bin/phone", dollar_present = 0, quoted = 0, assignment = 0}
(gdb) print *(simple_command->words->next)
$4 = {next = 0x0, word = 0x2fb8c}
(gdb) print *(simple_command->words->next->word)
$5 = {word = 0x2fc4c "$*", dollar_present = 1, quoted = 0, assignment = 0}
(gdb)
(gdb) frame 2
Reading in symbols for ./alloc-files/malloc.c...done.
#2  0x1721b in free (mem=(char *) 0x3bc0c "") (./alloc-files/malloc.c line 556)
556               botch ("free: Called with already freed block argument\n");
(gdb) frame 1
Reading in symbols for make_cmd.c...done.
#1  0x47b0 in programming_error (reason=(char *) 0x1714e "free: Called with already freed block argument\n", arg1=0, arg2=-159802982494542) (make_cmd.c line 436)
436       abort ();
(gdb)

The script in question looks like this:

uncompress -c $HOME/Text/phones | grep -i $*
/nd/bin/phone $*

I'm not sure what to do from here.  I still have the core files lying
around if you want me to look for something specific.  I hope this helps.

		--Jim--