tim@VAX1.CC.UAKRON.EDU (Timothy H Smith) (03/22/89)
hello.. Im looking for a very specific reference of how the builtin hash function for csh works. I need specific info on how that hash funtion works in that is does not have to search the (path) for the location of a command (executable +x) file. I am able to creat a new executable file and csh will find it, but if a move a file to a different location csh cannot find it. I then have to do a rehash. I believe that csh looks at the location of where it thinks the file is in it's hash table, and when it's not there it complanes. Is this true? How does it know the difference between a newly created file and one not in its hash table. Does csh check that path if the named executable is not in its table? thanks for any help tim@vax1.cc.uakron.edu
chris@mimsy.UUCP (Chris Torek) (03/22/89)
In article <111@VAX1.CC.UAKRON.EDU> tim@VAX1.CC.UAKRON.EDU (Timothy H Smith) writes: > Im looking for a very specific reference of how the builtin >hash function for csh works. ... I am able to creat a new executable >file and csh will find it, but if a move a file to a different location >csh cannot find it. I then have to do a rehash. No and yes. [Now I will see if seanf is reading this.] >I believe that csh looks at the location of where it thinks the >file is in it's hash table, and when it's not there it complanes. >Is this true? How does it know the difference between a newly >created file and one not in its hash table. Does csh check that path >if the named executable is not in its table? Forget everything you think you know about csh's hashing. The C shell uses a technique whose name I either never learned or have forgotten. It works like this: Choose ten or fifteen hash functions, each of which map a pair ({"/bin", "ls"}, {"/bin", "cat"}, {"/usr/ucb", "rlogin"}) to a number in $0..k-1$. Now make an array of $k$ bits, and do the following: foreach directory in $path foreach file in $directory array[hash_0($directory, $file)] <- 1 array[hash_1($directory, $file)] <- 1 array[hash_2($directory, $file)] <- 1 ... array[hash_9($directory, $file)] <- 1 rof rof Now, if given a command name without a `/', you can decide whether it *cannot* be in some directory. So instead of trying everywhere, you instead say: foreach directory in $path if array[hash_0($directory, $command)] = 1 and array[hash_1($directory, $command)] = 1 and array[hash_2($directory, $command)] = 1 and ... array[hash_9($directory, $command)] = 1 then if execute($directory/$command) fails then if errno = ENOENT then continue else complain about not-executable fi fi fi rof complain about not-found What this does is eliminate some (but not all) attempts to execute a nonexistent file. The smaller the constant $k$, and the fewer hash functions $h_i$ you apply, the more likely you are to find `false hits', where a string like `uItrfZgnby' happens to have all $i$ array[$hash_i$()] bits set. The C shell was written to run on small PDP-11s, so it uses a small $k$ (8192) and a single hash function (!). (It saves time and code space. The function hash_0() operates on the *index* of the path component and a true hash of the trailing command part; the real hash need only be computed once, and hash_0() is then a macro. To make things better, the path index is used to select which of 8 bits is set, so that bit collisions cannot happen between path components, if $#path <= 8. This keeps csh from trying too many wrong directories.) My own path (which has more than 8 components, so I am not saved by that last trick) produces over 1400 bits that would have to be set; assuming a no-collision distribution, there is then a 1400/8192 chance that a random string would appear to be runnable via some directory. From my own experience, I would guess that csh's hash function produces many collisions for my path and the strings [a-z]+, so that while fewer than 1400 bits are set, the chances of a random string producing at least one exec() call is better than the 17% suggested by the above. [Warning: there is a four-letter word in the following story. You can stop reading now if such things offend you. There is a control-L on the next line to stop pagers here.] Incidentally, csh's treatment of errors above is also not the best. We used to turn off access to /usr/games from 0900 to 1700 (or something like that), and if you logged in before cron turned them off, csh would set bits for the various games. If you then tried to run one later (rogue, for instance), it would attempt an exec(/usr/games/rogue) and get EACCES, and print `/usr/games/rogue: Permission denied.' All well and good, until you tried to run `blorfloogle' or something else nonexistent, and csh thought it might be in /usr/games. It would attempt an execl(/usr/games/blorfloogle) and get EACCESS, and assume that this meant the file existed, rather than that the directory was off-limits. This really confused someone who had gotten mad about something and gave the command `fuck off'. It just happened that he had gotten the appropriate bit set, and csh printed: `/usr/games/fuck: Permission denied.' He was rather disappointed to find out that there really was no such game after all. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris