steve@nuchat.UUCP (Steve Nuchia) (06/28/87)
In article <2249@bunker.UUCP>, zink@bunker.UUCP (David Zink) writes:
> > In article <2211@bunker.UUCP>, zink@bunker.UUCP (David Zink) writes:
> > > What's wrong about more than eight symbolic [links]? ...
> I am continually appalled by the number of hard-coded constants:
> # of file descriptors, # of signals, # of characters in a filename or path.
> like 8 into the kernel? ...
> I guess that my major complaint is that ELOOP purports to mean LOOP but
> really means EIGHT.

For reference, if anyone is in a position to fix this, there is a very
good loop-detection algorithm that takes 1.5X the time and 2X the space
of the basic linear search it protects; that is, the protected algorithm
as a whole has that complexity.

The algorithm is to run two pointers down your list, the trailing
pointer moving one element for every two scanned by the leader.  The two
pointers should be compared after each step of the leader.  Proof that
all loops are detected is left as an exercise.  Hint: if I remember
correctly, the loop must be detected within half the loop length after
the trailing pointer enters the loop.

For application to symlinks, this state machine could operate only on
the symlink decodes, although it would have to decode any hard links
between the symlinks as part of the basic cycle.  Thus the extra work of
moving the trailing pointer would never come into play in cases with no
more than one symlink in the (expanded) path.

The space consideration is trivial: it essentially doubles the local
variables of nami().  Time-wise the algorithm is a bit of a loss,
because of the potential for reading blocks into the buffer cache more
than once in a single nami() _without_loops_.

Note that it would not be correct to implement the algorithm on all
pathname components, else foo/../bar would get an ELOOP when ./foo and
./bar are both legal.
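The two-pointer scheme described above is what is now commonly called
Floyd's tortoise-and-hare cycle detection.  A minimal userland sketch on
a plain singly linked list, rather than on symlink decodes (the names
`struct node` and `has_loop` are invented for illustration; this is not
kernel code):

```c
#include <stddef.h>

struct node {
    struct node *next;
};

/*
 * Detect a cycle with two cursors: the leader advances one node per
 * step, the trailer one node per two leader steps.  They are compared
 * after every leader step; if they ever meet, the list loops.
 * Returns 1 if a loop exists, 0 if the list terminates at NULL.
 */
int
has_loop(struct node *head)
{
    struct node *trail = head;
    struct node *lead = head;
    int step = 0;

    while (lead != NULL) {
        lead = lead->next;
        if (++step % 2 == 0)
            trail = trail->next;        /* half speed */
        if (lead != NULL && lead == trail)
            return 1;                   /* cursors met: loop */
    }
    return 0;                           /* reached end: no loop */
}
```

Note how the cost claims fall out: the two cursors together make about
1.5 pointer moves per element scanned, and the only extra state is the
second cursor, so the local variables roughly double.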
I'm pretty sure I saw this algorithm in one of Dijkstra's papers, though
I doubt it is original with him: probably a crufty old trick from the
dawn of time...

Steve Nuchia    (713) 334 6720 voice, 334 1204 2400N81
"trouble" for anon mail to me.
{housun,{academ,soma}!uhnix1}!nuchat!steve
chris@mimsy.UUCP (Chris Torek) (07/04/87)
>In article <2249@bunker.UUCP> zink@bunker.UUCP (David Zink) writes:
>>I guess that my major complaint is that ELOOP purports to mean LOOP but
>>really means EIGHT.

(Would you prefer ETWELVE? :-) )

In article <244@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>For reference, if anyone is in a position to fix this, there is a very
>good loop detection algorithm that takes 1.5 X the time and 2 X the space
>of the basic linear search it protects. ...

Of course, the present `loop detection' takes 1X the time and 1 (no
multiplier) space unit.  Since namei accounts for most of the time spent
in the kernel, slowing it even a little bit may be too much.

Moreover, eight symlinks is really not all that small a number.  We had
five levels of symlinks in one translation on a Sun, and it was
noticeably slower than other pathname translations.  If you have enough
levels of symlinks that you run into ELOOP errors, you probably have too
complex a nest of symlinks.  (That said, I do think eight is too small;
but I also think namei is too slow.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain: chris@mimsy.umd.edu   Path: seismo!mimsy!chris
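The `present' check Chris refers to is nothing more than a per-lookup
counter against a fixed limit.  A userland caricature of the idea,
assuming the BSD-style constant name MAXSYMLINKS (the function name is
invented; nothing here is actual namei source):

```c
#include <errno.h>

#define MAXSYMLINKS 8   /* the "EIGHT" being complained about */

/*
 * Caricature of the existing check: no loop detection at all, just one
 * counter bumped per symlink expansion -- constant extra space and one
 * compare per expansion, i.e. the 1X time / 1 space unit above.
 */
int
symlink_seen(int *nlinks)
{
    if (++*nlinks > MAXSYMLINKS)
        return ELOOP;   /* "loop" here really means "ninth symlink" */
    return 0;
}
```

The counter is reset once per lookup, so a path may legally cross the
same symlink several times as long as the total stays within the limit.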
kre@munnari.oz (Robert Elz) (07/06/87)
The question is in the Summary header ... the answer is here: it's not
easy!  And this is quite apart from any efficiency considerations (such
as slowing namei from a walk to a crawl).  I'm not going to say it's
impossible, but it's not nearly as easy as has been implied by some of
the news items on this subject.  You can't simply read the literature
and translate the mathematics into code.  Not for the first time, it
turns out that the theoretical results are simply impractical (see also
Tanenbaum's MINIX book, and the discussion of deadlock avoidance
algorithms).

There are several reasons why it's not easy.  One is that symlinks
simply don't operate like nodes in a graph; another is that programming
inside the kernel imposes a few limitations that theoretical algorithms
rarely allow for.  There's a third, which is the real killer, but I'll
leave that for a bit lower, to keep you reading...

First, the nature of symlinks.  It's not a loop for a symlink to be
encountered in a path 2, 3, 4, ... times; that might be entirely valid.
Most loop detection algorithms assume that there is just one path out of
a node, so when you encounter it again, you have a loop.  The
interesting question is supposed to be how to remember what you have
encountered in minimum space and time.  Consider:

	ls -F /foo
	x/  y/  z/
	ln -s /foo /a
	ln -s /a/x/xx /a/y/yy
	ln -s /a/y/yy /a/z/zz
	cp /dev/null /foo/x/xx

Now consider the number of times that the symlink "a" is examined when
referencing /a/z/zz.

Now there are ways around this: you can look for loops at each "level"
of interpretation, but here the kernel programming restrictions start
making life difficult.  (Here I assume that we're talking about unix,
any unix, not a redesigned kernel; if you want to start from scratch you
can implement whatever you like.)  Programming inside the kernel means
no recursion.  You can sometimes get away with one or two recursive
calls, but when you write the code you have to guarantee that that's all
there will be.
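kre's example can be made concrete with a toy userland model: a table of
symlink locations and targets, resolved iteratively (no recursion) one
component at a time.  Resolving /a/z/zz expands the single link /a three
times, with no loop anywhere.  All names and helpers here are invented
for illustration; this is not kernel code, and it handles only the
absolute-target case the example needs:

```c
#include <stdio.h>
#include <string.h>

/* The filesystem from the example, each symlink keyed by the path at
 * which the link inode actually lives: */
static struct {
    const char *where;
    const char *target;
} links[] = {
    { "/a",        "/foo" },     /* ln -s /foo /a */
    { "/foo/y/yy", "/a/x/xx" },  /* ln -s /a/x/xx /a/y/yy */
    { "/foo/z/zz", "/a/y/yy" },  /* ln -s /a/y/yy /a/z/zz */
};

static const char *
readlink_toy(const char *path)
{
    for (size_t i = 0; i < sizeof links / sizeof links[0]; i++)
        if (strcmp(links[i].where, path) == 0)
            return links[i].target;
    return NULL;                 /* not a symlink */
}

/* Resolve an absolute path one component at a time, splicing in each
 * symlink target as it is met and restarting.  Counts expansions
 * overall and expansions of "/a" in particular. */
static int
resolve(const char *path, char *out, size_t outsz, int *a_count)
{
    char work[256], prefix[256], comp[64], cand[256], next[256];
    int expansions = 0;

    snprintf(work, sizeof work, "%s", path);
    prefix[0] = '\0';
    *a_count = 0;

    char *rest = work + 1;       /* skip the leading '/' */
    while (*rest != '\0') {
        char *slash = strchr(rest, '/');
        size_t len = slash ? (size_t)(slash - rest) : strlen(rest);
        memcpy(comp, rest, len);
        comp[len] = '\0';

        snprintf(cand, sizeof cand, "%s/%s", prefix, comp);
        const char *tgt = readlink_toy(cand);
        if (tgt != NULL) {
            expansions++;
            if (strcmp(cand, "/a") == 0)
                (*a_count)++;
            /* restart from the target plus the unconsumed remainder */
            snprintf(next, sizeof next, "%s%s", tgt, slash ? slash : "");
            snprintf(work, sizeof work, "%s", next);
            prefix[0] = '\0';
            rest = work + 1;
        } else {
            snprintf(prefix, sizeof prefix, "%s", cand);
            rest = slash ? slash + 1 : rest + len;
        }
    }
    snprintf(out, outsz, "%s", prefix);
    return expansions;
}
```

Tracing /a/z/zz through this table gives five symlink expansions in all
(/a three times, /a/z/zz once, /a/y/yy once) before landing on
/foo/x/xx, which is exactly why "seen this link before" is not by itself
a loop test.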
You can't write anything that recurses a variable number of times based
on user input data (such as a path containing symbolic links).  The
reason for this is that the kernel has a small fixed-size stack, and
when it's exceeded the kernel crashes (or in some implementations,
random memory is trashed).

Now for the third problem ... all of the algorithms rely on remembering
something.  In the context of detecting symlink loops, the question is:
what is it that you want to remember?  You could try remembering the
string of names, but I don't think that's going to get you very far.
You could build pwd into the kernel, and have it remember absolute paths
to each symlink encountered, but I don't think you'll get many customers
for that system.  The obvious thing to remember is inodes, right?
Wrong: the kernel is not allowed to remember inodes in general.  That
either causes incorrect results, or deadlocks, and neither of those is
to be desired.  Anyone with 4.3 source should look in the directory
cache code and see the torture that was necessary to make that work
correctly.  There it can be done, since it's just a cache, and whenever
things look difficult it's perfectly OK to simply forget it, and go back
to the old way.  In a loop detection algorithm you couldn't afford to do
that, or from time to time someone will tie the kernel up in an infinite
loop, because a loop wasn't noticed.

As I said at the start, I would welcome an implementation that proves me
wrong, but please go and do it; don't just talk about it!

Finally, as usual, I agree with Chris Torek in article <7326@mimsy.UUCP>:

> Moreover, eight symlinks is really not all that small a number.
> We had five levels of symlinks in one translation on a Sun, and it
> was noticeably slower than other path name translations.  If you
> have enough levels of symlinks that you run into ELOOP errors, you
> probably have too complex a nest of symlinks.
> (That said, I do think eight is too small; but I also think namei is
> too slow.)

Given that no-one has yet actually implemented a loop detection
algorithm, and we still have some constant limit on symlinks, we have a
trade-off.  System administrators will usually prefer a small value, the
smaller the better, as it lowers the overheads in filesystem lookups.
(N.B.: you never *need* more than a single symlink; any symlink chain
can *always* be reduced to a single symlink, albeit sometimes with
considerably more administrative headaches.)  Users would typically like
a large limit, as then they don't have to think about what they are
doing, and can simply install a symlink to anything that appears
anywhere.

This is something that should be a system config option, so that the
local site can decide for itself what the right limit is (i.e., how much
cpu time they're willing to devote to pathname lookups to increase user
convenience).  It shouldn't be a compiled-in number.  Nothing should be
a compiled-in number, unless that number is fixed by some external and
immutable object that is really never going to change (the number of
bits in a byte is one such number: changing that is going to mean
changing the hardware, and that's certainly going to require recompiling
anyway).

Even if someone does implement a loop detection algorithm, I would still
want an absolute limit on the number of symlinks to keep my machine
running at a reasonable rate!

kre