ken@rochester.ARPA (Comfy chair) (09/27/86)
First let me make clear the context of my question, ask the question, then explain why I ask. I am asking about hard links, not symbolic links. The question is: If you were designing a new operating system, say a successor to U**x, would you implement links in the same way, enhance them, or put more restrictions on them?

Now my thoughts leading up to the question. When I first read how links worked in U**x, I liked the idea. Mv was just a link() followed by unlink(), rm was unlink(), and you could share files and have Jekyll and Hyde programs, &c, &c.

Over the years I noticed that the limitations were such that people couldn't use the full generality of the concept. You couldn't link across filesystems or link directories, so symbolic links were invented. To find out where the other links were, you would have to search the whole filesystem, in the general case. Utilities like tar and rdist have to remember files with links and might run out of memory to store the filenames, in theory at least.

One restriction that would ease the search problem is to restrict links to the current directory. Most uses of links are to alias things like "vi" and "ex", "tip" and "cu", and these things usually live close together. This means rename would have to be made a primitive. But mv already has to do a copy when it can't do a link, because link() doesn't generalize across filesystems.

There still is obviously a need for some kind of indirection mechanism. I don't like symbolic links, there are some warts, like having to check for looping, but I can't think of anything better.

Do you have any ideas on this? I'd like to hear them. Please mail. I don't want to start an OS war.

Ken
--
UUCP: ..!{allegra,decvax,seismo}!rochester!ken ARPA: ken@rochester.arpa
Snail: CS Dept., U. of Roch., NY 14627. Voice: Ken!

"It is absurd to divide people into good or bad. People are either charming or tedious." -- Oscar Wilde
rlk@mit-trillian.MIT.EDU (Robert L Krawitz) (09/28/86)
First of all, mv does not use link(2) followed by unlink(2). It uses rename(2), which is a system call.

Secondly, "all hard links are equal." Each directory entry is just a name and a pointer to an inode, and the inode holds the link count. Link() and unlink() increment and decrement the link count of the inode. When this reaches zero, the inode is deallocated and the storage returned to the free list. This mechanism allows the use of links to protect the existence (although not the contents) of precious files -- just keep a link to the file around somewhere.

It should be obvious why hard links between file systems are impossible. A directory entry refers to a certain inode, not to any filesystem (there is no guarantee that any other filesystem will be mounted). The device is implicitly the device that the directory is in.

Links to directories are not impossible, just forbidden to non-superusers to reduce the chance of filesystem corruption in the form of a closed loop inaccessible to the rest of the file system (fsck does detect this, by the way). Actually, mkdir(2) does create links to directories -- after all, . and .. in each directory are nothing more than links. On unix systems without the mkdir(2) system call, a privileged program (mkdir(1)) calls mknod(2) followed by two calls to link(). This is a controlled, safe exception, since a system call rmdir(2) is needed to unlink a directory, which takes care of all cleanup.

Restricting all hard links to the same (NOT the "current", which has a specific meaning) directory would cause far more problems than it could possibly solve. First of all, all calls to link() would have to check that other links to a file were in the same directory, which would require a search of the whole filesystem. Secondly, rename() would have to check similar conditions. Also, it would weaken the power of links.

It is true that to find all links to a file you have to search the entire filesystem. That's one of the problems with the simple link concept of unix. However, all powerful tools have some drawbacks.

The problems for tar and rdist aren't as bad as you suggest. All that they have to do is remember which inodes from what filesystems have already been found. This could be a bit vector. Usually the actual disk partitions are not readable, but in a pinch df -i can be used to get the number of inodes in each file system (a fixed quantity).

Symbolic links are completely different. They are pointers to arbitrary pathnames as opposed to pointers to inodes. As far as the filesystem is concerned, they are just a slightly special type of file. The only thing special about them is that most system calls automatically indirect through them (to a certain level to prevent looping).

Translation: restricting links would be pointless, difficult to implement, etc. The homogeneity of the unix filesystem is one of its strengths.

-- Robert^Z
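The link-count behaviour and the "remember which inodes have already been found" idea described above can be sketched in a few lines. This is only an illustration, not how tar or rdist are actually written: it uses a dictionary keyed on (device, inode) rather than a bit vector, and the names find_hard_link_groups and demo are made up for the example.

```python
import os
import tempfile


def find_hard_link_groups(root):
    """Group paths under root that are hard links to the same inode.

    Hypothetical helper: only inodes with a link count above one are
    remembered, keyed on (st_dev, st_ino).
    """
    seen = {}  # (st_dev, st_ino) -> list of paths naming that inode
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if st.st_nlink > 1:
                seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    return [sorted(paths) for paths in seen.values() if len(paths) > 1]


def demo():
    """Create two names for one file and watch the link count change."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "vi")
        os.mkdir(os.path.join(root, "bin"))
        second = os.path.join(root, "bin", "ex")
        with open(first, "w") as f:
            f.write("editor\n")
        os.link(first, second)            # second name, same inode
        nlink_two = os.stat(first).st_nlink
        groups = find_hard_link_groups(root)
        os.unlink(first)                  # the file survives via the other name
        nlink_one = os.stat(second).st_nlink
        with open(second) as f:
            contents = f.read()
        rel = [[os.path.relpath(p, root) for p in g] for g in groups]
        return nlink_two, nlink_one, contents, rel


if __name__ == "__main__":
    print(demo())
```

Note that the walk stores one key per multiply-linked inode, not one per file, which is why the memory cost is smaller than remembering every filename.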
simon@its63b.ed.ac.uk (ECSC68 S Brown CS) (10/02/86)
In article <21127@rochester.ARPA> ken@rochester.UUCP (Comfy chair) writes:
>
>There still is obviously a need for some kind of indirection mechanism.
>I don't like symbolic links, there are some warts, like having to check
>for looping, but I can't think of anything better.

The "check for looping" could be fixed for symbolic links by defining some primitive that converts a filename into the filename that it "really is" -- ie, it does the work that it does internally in order to do things like open(), execle(), etc... on a symbolic link.

lstat() is fine but it only does one level of translation.

--
Simon Brown
Computer Science Dept.
University of Edinburgh.
thk@uxrd1.UUCP (Tom Kiermaier ) (10/06/86)
The mv command on SysV does indeed implement renames as link() followed by unlink(). The rename() system call doesn't exist on SysV.
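A rename built from link() followed by unlink() might look roughly like the sketch below. It is an illustration under stated assumptions, not the SysV mv source: the name mv_via_link is invented, and unlike rename(2) the two steps are not atomic and will not replace an existing destination.

```python
import os
import tempfile


def mv_via_link(src, dst):
    """Rename src to dst on one filesystem with link() then unlink().

    Hypothetical helper. os.link raises OSError if dst already exists
    or if src and dst are on different filesystems (EXDEV), in which
    case a real mv would fall back to copying.
    """
    os.link(src, dst)   # dst now names the same inode; link count goes up
    os.unlink(src)      # drop the old name; link count goes back down


def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old")
        new = os.path.join(d, "new")
        with open(old, "w") as f:
            f.write("payload")
        ino_before = os.stat(old).st_ino
        mv_via_link(old, new)
        st = os.stat(new)
        # (old name still present?, same inode?, final link count)
        return os.path.exists(old), st.st_ino == ino_before, st.st_nlink


if __name__ == "__main__":
    print(demo())
```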
guy@sun.UUCP (10/07/86)
> The "check for looping" could be fixed for symbolic links by defining
> some primitive that converts a filename into the filename that it
> "really is" -- ie, it does the work that it does internally in order
> to do things like open(), execle(), etc... on a symbolic link.

Huh? The "check for looping" is there to prevent calls like "open" from looping:

    ln -s a b
    ln -s b a
    cat a

I presume the primitive you're referring to would be something like:

    int evaluatelink(const char *path, char *buf, int buflen);

which would take the path pointed to by "path" and return the path name of the file it ultimately refers to in the buffer whose first character is pointed to by "buf", transferring at most "buflen" characters.

How would this help? If that primitive does the work that the kernel does internally for things like "open", it would have the same problems as those calls, and would have to do the same check for looping.

> lstat() is fine but it only does one level of translation.

Huh? "lstat" does *no* translation; that's what it's there for. It finds the file referred to by the path argument, assuming *no* symbolic-link translation, and returns its file status. "stat" does symbolic-link translation.

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
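The two-link loop and the stat/lstat distinction described above can be reproduced in a scratch directory. This is a small sketch; the function name probe_symlink_loop is made up, and it assumes a system that reports ELOOP when name translation gives up, as current Unix-like systems do.

```python
import errno
import os
import stat
import tempfile


def probe_symlink_loop():
    """Build the a <-> b loop and see how lstat, stat and open react."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        os.symlink("a", b)   # ln -s a b
        os.symlink("b", a)   # ln -s b a

        # lstat does no translation: it describes the link itself.
        is_link = stat.S_ISLNK(os.lstat(a).st_mode)
        target = os.readlink(a)

        # stat and open both translate symbolic links, so both hit the loop.
        try:
            os.stat(a)
            stat_errno = None
        except OSError as e:
            stat_errno = e.errno
        try:
            open(a).close()
            open_errno = None
        except OSError as e:
            open_errno = e.errno
        return is_link, target, stat_errno, open_errno


if __name__ == "__main__":
    print(probe_symlink_loop())
```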
chris@umcp-cs.UUCP (Chris Torek) (10/08/86)
>In article <21127@rochester.ARPA> ken@rochester.UUCP (Comfy chair) writes:
>>There still is obviously a need for some kind of indirection mechanism.
>>I don't like symbolic links, there are some warts, like having to check
>>for looping, but I can't think of anything better.

In article <65@its63b.ed.ac.uk> simon@its63b.ed.ac.uk (Simon Brown) writes:
>The "check for looping" could be fixed for symbolic links by defining
>some primitive that converts a filename into the filename that it
>"really is" ....

All you have done is to move the check from namei() into this new primitive.

If you are willing to expend large amounts of space, the symlink loop checks can be made rigorous, e.g., by remembering each symlink inode and requiring that no one appear twice. The eight-links limit seems to work well in practice, though, particularly since symlinks slow name translation markedly.

--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@mimsy.umd.edu
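The "remember each symlink inode" check can be sketched in user space. This is an illustration only, not namei(): the names follow_links and SymlinkLoop are invented, and for brevity it only chases symbolic links in the final path component, leaving intermediate components to the kernel.

```python
import os
import stat
import tempfile


class SymlinkLoop(Exception):
    """Raised when the same symlink inode is visited twice."""


def follow_links(path):
    """Follow symlinks at the final component, visiting no symlink twice."""
    seen = set()  # (st_dev, st_ino) of every symlink followed so far
    while True:
        try:
            st = os.lstat(path)
        except FileNotFoundError:
            return path          # dangling link: report where it points
        if not stat.S_ISLNK(st.st_mode):
            return path          # reached something that is not a link
        key = (st.st_dev, st.st_ino)
        if key in seen:
            raise SymlinkLoop(path)
        seen.add(key)
        # Relative targets are interpreted relative to the link's directory;
        # os.path.join discards the directory if the target is absolute.
        path = os.path.join(os.path.dirname(path), os.readlink(path))


def demo():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "real"), "w") as f:
            f.write("x")
        os.symlink("real", os.path.join(d, "l1"))   # l1 -> real
        os.symlink("l1", os.path.join(d, "l2"))     # l2 -> l1 -> real
        os.symlink("y", os.path.join(d, "x"))       # x -> y
        os.symlink("x", os.path.join(d, "y"))       # y -> x (loop)
        resolved = os.path.basename(follow_links(os.path.join(d, "l2")))
        try:
            follow_links(os.path.join(d, "x"))
            loop_detected = False
        except SymlinkLoop:
            loop_detected = True
        return resolved, loop_detected


if __name__ == "__main__":
    print(demo())
```

The set grows with the length of the chain, which is the space cost mentioned above; a fixed hop limit needs only a counter.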
mangler@cit-vax.Caltech.Edu (System Mangler) (10/20/86)
In article <21127@rochester.ARPA> ken@rochester.UUCP (Comfy chair) writes:
> I don't like symbolic links, there are some warts, like having to check
> for looping, but I can't think of anything better.

Warts... you can't chmod, chgrp, utime, or link them. The access time never means much, because doing an "ls -l" to see it has the side effect of changing it.

Symbolic links are too expensive to use freely. They take up an inode and 1K of disk space, just to hold a few characters. They carry all the baggage of a regular inode (atime, mtime, links, owner, group, mode) but you can't make proper use of any of it.

Since Berkeley was making directory entries variable length anyway, why didn't they just make symbolic links a variant type of directory entry, containing a string instead of an inode number? They might be twice the size of a normal directory entry, but the time saved in not having to read another inode would be a big win.

Don Speck speck@vlsi.caltech.edu {seismo,rutgers}!cit-vax!speck
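The point that a symbolic link occupies an inode of its own, complete with the usual inode fields, can be observed with lstat. A rough sketch follows; the function name symlink_inode_info is made up, and the size reported for a link is whatever the local filesystem returns (normally the length of the stored pathname).

```python
import os
import stat
import tempfile


def symlink_inode_info():
    """Compare the inode behind a symlink with the inode of its target."""
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "target")
        link = os.path.join(d, "link")
        with open(target, "w") as f:
            f.write("data")
        os.symlink("target", link)

        link_st = os.lstat(link)      # the link's own inode
        target_st = os.stat(target)   # the file it points at
        return {
            "is_symlink": stat.S_ISLNK(link_st.st_mode),
            "own_inode": link_st.st_ino != target_st.st_ino,
            "stored_path": os.readlink(link),
            "link_size": link_st.st_size,   # usually len(stored_path)
            "link_count": link_st.st_nlink,
        }


if __name__ == "__main__":
    print(symlink_inode_info())
```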
guy@sun.UUCP (10/20/86)
> Since Berkeley was making directory entries variable length
> anyway, why didn't they just make symbolic links a variant
> type of directory entry, containing a string instead of an
> inode number? They might be twice the size of a normal
> directory entry, but the time saved in not having to read
> another inode would be a big win.

Because that would have required non-trivial changes to programs that read directory entries, in order that they understand this new type of directory entry. The 4.2BSD file system changed the format of directory entries, but didn't really change their meaning; as far as an application reading the directory is concerned, they are still <inumber, name> pairs. Converting a program to use the directory library is a mechanical, albeit not automated, operation. If this new "indirect" directory entry were introduced, the conversion process would no longer be mechanical.

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
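The "<inumber, name> pairs" view is still what a directory-reading program sees today. As a small sketch (the helper names list_entries and demo are invented, and Python's os.scandir stands in for the readdir library mentioned above), a symbolic link shows up as just another name with an inode number, so the reading code needs no special case for it:

```python
import os
import tempfile


def list_entries(directory):
    """Return the directory's contents as a {name: inode number} mapping."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


def demo():
    with tempfile.TemporaryDirectory() as d:
        data = os.path.join(d, "data")
        with open(data, "w") as f:
            f.write("x")
        os.link(data, os.path.join(d, "alias"))   # hard link: same inode
        os.symlink("data", os.path.join(d, "sym"))  # symlink: its own inode
        return list_entries(d)


if __name__ == "__main__":
    for name, ino in sorted(demo().items()):
        print(ino, name)
```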
mangler@cit-vax.Caltech.Edu (System Mangler) (10/26/86)
In article <8313@sun.uucp>, guy@sun.UUCP writes:
> > why didn't they [Berkeley] just make symbolic links a variant
> > type of directory entry, containing a string instead of an
> > inode number?
>
> Because that would have required non-trivial changes to programs that read
> directory entries,

Many directory-reading programs (ls, tar, find) had to gain explicit knowledge of symbolic links anyway, the changes were even user-visible.

Don Speck speck@vlsi.caltech.edu {seismo,rutgers}!cit-vax!speck
guy@sun.UUCP (10/27/86)
> Many directory-reading programs (ls, tar, find) had to gain explicit
> knowledge of symbolic links anyway, the changes were even user-visible.

And many didn't. If a new type of directory entry were added, *every* directory-reading program would have to gain explicit knowledge of symbolic links, and the change would be more complicated (with symbolic links as they are, the *directory-reading code* didn't have to change, other than mechanically replacing explicit "read"/"fread"/whatever calls with "readdir" calls, etc.).

Furthermore, "fsck" would have to be taught about these new kinds of directory entries (not just about new kinds of inodes, as was the case with the current symbolic link implementation), as would a bunch of other utilities that know about file system formats. I presume they just decided the added benefits weren't worth the hassle.

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)