jjd@BBN.COM (James J Dempsey) (07/31/89)
I have a lot more information about the runaway bashes I have been seeing. I had another runaway bash, and this time I was able to get a core file to look at by using the "gcore" command. Here is a backtrace from gdb:

    $ gdb-3.2 /usr/local/gnu/bin/bash ~/core.22004
    GDB 3.2, Copyright (C) 1988 Free Software Foundation, Inc.
    There is ABSOLUTELY NO WARRANTY for GDB; type "info warranty" for details.
    GDB is free software and you are welcome to distribute copies of it
     under certain conditions; type "info copying" to see the conditions.
    Reading symbol data from /usr/local/gnu/bin/bash...done.
    Type "help" for a list of commands.
    (gdb) cd /usr/local/gnu/bash/bash-1.02
    Working directory /usr/local/gnu/bash/bash-1.02.
    (gdb) bt
    #0  0x24ecc in strcmp (313, 1313428048)
    #1  0x7530 in find_variable (...) (...)
    #2  0x7553 in get_string_value (...) (...)
    #3  0xdf2 in save_history (...) (...)
    #4  0xd35 in termination_unwind_protect (...) (...)
    #5  0x7fffe47b in ?? (11, 1313428048, 2147474260, 3372)
    #6  0x7fffe470 in ?? ()
    #7  0x7b4 in main (...) (...)
    #8  0x6394 in execute_simple_command (...) (...)
    #9  0x575b in execute_command_internal (...) (...)
    #10 0x5498 in execute_command (...) (...)
    #11 0xa21 in reader_loop (...) (...)
    #12 0x7b4 in main (...) (...)
    #13 0x6394 in execute_simple_command (...) (...)
    #14 0x575b in execute_command_internal (...) (...)
    #15 0x5498 in execute_command (...) (...)
    #16 0xa21 in reader_loop (...) (...)
    #17 0x7b4 in main (...) (...)
    (gdb)

Next, I took a look at "list" in find_variable, and it looks like list has been mangled so that list->next points to never-never land:

    (gdb) print *list
    $1 = {next = 0x45474150, name = 0x4e495250 <Address 0x4e495250 out of bounds>,
      value = 0x3d524554 <Address 0x3d524554 out of bounds>, attributes = 912207922,
      context = 1426092912, prev_context = 0x555555}
    (gdb) print *(list->next)
    Cannot read memory: address 0x45474150 out of bounds.
I also had reason to believe that this runaway shell was the result of a shell script I was running, so I ran some more tests. Sometimes the script would work, sometimes it would just hang, and sometimes it would cause bash to try to send a bug report. When it hung and I eventually interrupted it, that is when the runaway process was created; presumably it was the shell that was running the script. I have duplicated this behaviour several times this morning. I have looked at cores from several instances, and they all look identical to the one above.

It is important to note that the script in question does not begin with a #! line. My login shell is /usr/local/gnu/bin/bash. If I add the line

    #!/bin/sh

to the top of the script, or if I add the line

    #!/usr/local/gnu/bin/bash

to the top of the script, it works fine every time: no hanging, no core dumps, no runaway processes. The only time it causes problems is when there is no #! line at all!

When it tries to send a bug report, it looks like this:

    $ bad stuff
    free: Called with already freed block argument
    Tell jjd@bbn.com to fix this someday.
    Mailing a bug report...done.
    Stopping myself...Illegal instruction (core dumped)
    $

It is interesting to note that even though it says "Stopping myself", it continues, gives me a prompt, and seems to function normally as a bash. A core file is generated, and a backtrace from it looks like this:

    $ gdb-3.2 /usr/local/gnu/bin/bash core
    Reading symbol data from /usr/local/gnu/bin/bash...done.
    (gdb) dir /usr/local/gnu/bash/bash-1.02
    Source directories searched: /u1/jjd:/usr/local/gnu/bash/bash-1.02
    (gdb) bt
    #0  0x7fffe474 in ?? ()
    #1  0x47b0 in programming_error (...) (...)
    #2  0x1721b in free (...) (...)
    #3  0x631d in execute_simple_command (...) (...)
    #4  0x575b in execute_command_internal (...) (...)
    #5  0x5498 in execute_command (...) (...)
    #6  0xa21 in reader_loop (...) (...)
    #7  0x7b4 in main (...) (...)
    #8  0x6394 in execute_simple_command (...) (...)
    #9  0x575b in execute_command_internal (...) (...)
    #10 0x5498 in execute_command (...) (...)
    #11 0xa21 in reader_loop (...) (...)
    #12 0x7b4 in main (...) (...)
    (gdb)
    (gdb) frame 3
    Reading in symbols for execute_cmd.c...done.
    #3  0x631d in execute_simple_command (simple_command=(struct simple_com *) 0x2fc0c,
        pipe_in=-1, pipe_out=-1, async=0) (execute_cmd.c line 767)
    767             free ((char *)jobs);
    (gdb) print *(simple_command)
    $1 = {words = 0x2fbec, redirects = 0x0}
    (gdb) print *(simple_command->words)
    $2 = {next = 0x2fb4c, word = 0x2fc6c}
    (gdb) print *(simple_command->words->word)
    $3 = {word = 0x2fbac "/nd/bin/phone", dollar_present = 0, quoted = 0, assignment = 0}
    (gdb) print *(simple_command->words->next)
    $4 = {next = 0x0, word = 0x2fb8c}
    (gdb) print *(simple_command->words->next->word)
    $5 = {word = 0x2fc4c "$*", dollar_present = 1, quoted = 0, assignment = 0}
    (gdb)
    (gdb) frame 2
    Reading in symbols for ./alloc-files/malloc.c...done.
    #2  0x1721b in free (mem=(char *) 0x3bc0c "") (./alloc-files/malloc.c line 556)
    556             botch ("free: Called with already freed block argument\n");
    (gdb) frame 1
    Reading in symbols for make_cmd.c...done.
    #1  0x47b0 in programming_error (reason=(char *) 0x1714e "free: Called with
        already freed block argument\n", arg1=0, arg2=-159802982494542) (make_cmd.c line 436)
    436             abort ();
    (gdb)

The script in question looks like this:

    uncompress -c $HOME/Text/phones | grep -i $*
    /nd/bin/phone $*

I'm not sure what to do from here. I still have the core files lying around if you want me to look for something specific.

I hope this helps.

--Jim--