[comp.unix.wizards] symbolic links poster child

steve@nuchat.UUCP (Steve Nuchia) (06/28/87)

In article <2249@bunker.UUCP>, zink@bunker.UUCP (David Zink) writes:
> >   In article <2211@bunker.UUCP>, zink@bunker.UUCP (David Zink) writes:
> >   > What's wrong about more than eight symbolic [links]?
...
> I am continually appalled by the number of hard-coded constants:
> # of file descriptors, # of signals, # of characters in a filename or path.
> like 8 into the kernel?
... 
> I guess that my major complaint is that ELOOP purports to mean LOOP but
> really means EIGHT.

For reference, if anyone is in a position to fix this, there is a very
good loop detection algorithm that takes 1.5 X the time and 2 X the space
of the basic linear search it protects.  That is, the protected algorithm
as a whole has that complexity.

The algorithm is to run two pointers down your list, the trailing pointer
moving one element for every two scanned by the leader.  The two pointers
should be compared after each step of the leader.  Proof that all loops
are detected is left as an exercise.  Hint: if I remember correctly, the
loop must be detected within 1/2 the loop length after the trailing pointer
enters the loop.

For application to symlinks this state machine could operate only on the
symlink decodes, although it would have to decode any hard links between
the symlinks as part of the basic cycle.  Thus the extra work of moving
the trailing pointer would never come into play in cases with no more than
one symlink in the (expanded) path.  The space consideration is trivial:
it doubles (essentially) the local variables of namei().  Time-wise the
algorithm is a bit of a loss, because of the potential for reading blocks
into the buffer cache more than once in a single namei() _without_loops_.

Note that it would not be correct to implement the algorithm on all pathname
components, else foo/../bar would get an ELOOP when ./foo and ./bar are
both legal.

I'm pretty sure I saw this algorithm in one of Dijkstra's papers, though
I doubt it is original with him: probably a crufty old trick from the dawn
of time...

	Steve Nuchia
	(713) 334 6720 voice  334 1204  2400N81  "trouble" for anon mail to me.
	{housun,{academ,soma}!uhnix1}!nuchat!steve

chris@mimsy.UUCP (Chris Torek) (07/04/87)

>In article <2249@bunker.UUCP> zink@bunker.UUCP (David Zink) writes:
>>I guess that my major complaint is that ELOOP purports to mean LOOP but
>>really means EIGHT.

(Would you prefer ETWELVE? :-) )

In article <244@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>For reference, if anyone is in a position to fix this, there is a very
>good loop detection algorithm that takes 1.5 X the time and 2 X the space
>of the basic linear search it protects. ...

Of course, the present `loop detection' takes 1 X the time and
1 (no multiplier) space unit.  Since namei accounts for most of
the time spent in the kernel, slowing it even a little bit may be
too much.

Moreover, eight symlinks is really not all that small a number.
We had five levels of symlinks in one translation on a Sun, and it
was noticeably slower than other path name translations.  If you
have enough levels of symlinks that you run into ELOOP errors, you
probably have too complex a nest of symlinks.  (That said, I do
think eight is too small; but I also think namei is too slow.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris

kre@munnari.oz (Robert Elz) (07/06/87)

The question is in the Summary header .. the answer is here...

It's not easy!  And this is quite apart from any efficiency
considerations (such as slowing namei from a walk to a crawl).

I'm not going to say it's impossible, but it's not nearly as easy as
has been implied by some of the news items on this subject.  You can't
simply read the literature and translate the mathematics into code.

Not for the first time, it turns out that the theoretical results
are simply impractical (see also Tanenbaum's MINIX book, and the
discussion of deadlock avoidance algorithms).

There are several reasons why it's not easy.  One is that symlinks
simply don't operate like nodes in a graph, another is that programming
inside the kernel imposes a few limitations that theoretical algorithms
rarely allow for.  There's a third, which is the real killer, but
I'll leave that for a bit lower, to keep you reading...

First, the nature of symlinks .. it's not a loop for a symlink to be
encountered in a path 2, 3, 4, ... times, that might be entirely
valid.  Most loop detection algorithms assume that there is just one
path out of a node, so when you encounter it again, you have a loop.
The interesting question is supposed to be how to remember what you
have encountered in minimum space and time.  Consider

	ls -F /foo
	x/ y/ z/

	ln -s /foo /a
	ln -s /a/x/xx /a/y/yy
	ln -s /a/y/yy /a/z/zz
	cp /dev/null /foo/x/xx

Now consider the number of times that the symlink "a" is examined
when referencing /a/z/zz ..

Now there are ways around this, you can look for loops at each "level"
of interpretation, but here the kernel programming restrictions start
making life difficult.

Here I assume that we're talking about unix, any unix, not a redesigned
kernel .. if you want to start from scratch you can implement whatever
you like.

Programming inside the kernel means no recursion.  You can sometimes
get away with one or two recursive calls, but when you write the code
you have to guarantee that that's all there will be.  You can't write
anything that recurses a variable number of times based on user input
data (such as a path containing symbolic links).  The reason for this
is that the kernel has a small fixed-size stack, and when it's exceeded
the kernel crashes (or in some implementations, random memory is trashed).

Now for the third problem ... all of the algorithms rely on remembering
something.  In the context of detecting symlink loops, the question
is what is it that you want to remember?  You could try remembering the
string of names, but I don't think that's going to get you very far.
You could build pwd into the kernel, and have it remember absolute paths
to each symlink encountered, but I don't think you'll get many customers
for that system.  The obvious thing to remember is inodes, right?

Wrong.. the kernel is not allowed to remember inodes in general.  That
either causes incorrect results, or deadlocks, and neither of those is
to be desired.  Anyone with 4.3 source should look in the directory
cache code and see the torture that was necessary to make that work
correctly.  There it can be done, since it's just a cache, and whenever
things look difficult its perfectly OK to simply forget it, and go
back to the old way.  In a loop detection algorithm you couldn't afford
to do that, or from time to time someone will tie the kernel up in an
infinite loop, because a loop wasn't noticed.

As I said at the start, I would welcome an implementation that proves
me wrong, but please go and do it .. don't just talk about it!

Finally, as usual, I agree with Chris Torek in article <7326@mimsy.UUCP>:

> Moreover, eight symlinks is really not all that small a number.
> We had five levels of symlinks in one translation on a Sun, and it
> was noticeably slower than other path name translations.  If you
> have enough levels of symlinks that you run into ELOOP errors, you
> probably have too complex a nest of symlinks.  (That said, I do
> think eight is too small; but I also think namei is too slow.)

Given that no-one has yet actually implemented a loop detection
algorithm, and we still have some constant limit on symlinks, we
have a trade off.  System administrators will usually prefer a small
value, the smaller the better, as it lowers the overheads in filesystem
lookups.  Nb: you never *need* more than a single symlink, any symlink
chain can *always* be reduced to a single symlink, albeit sometimes
with considerably more administrative headaches.

Users would typically like a large limit, as then they don't have
to think about what they are doing, and can simply install a symlink
to anything that appears anywhere.

This is something that should be a system config option, so that the
local site can decide for itself what the right limit is (ie: how much
cpu time they're willing to devote to pathname lookups to increase
user convenience).  It shouldn't be a compiled in number.  Nothing
should be a compiled in number, unless that number is fixed by some
external and immutable object, that is really never going to change
(the number of bits in a byte is one such number .. changing that is
going to mean changing the hardware, and that's certainly going to
require recompiling anyway).

Even if someone does implement a loop detection algorithm, I would
still want an absolute limit on the number of symlinks to keep my
machine running at a reasonable rate!

kre