jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) (01/30/91)
I've been having a few ideas about changes to the unix filesystem that may or may not be useful. I'd like comments, but not flames unless you're feeling really motivated.
When I refer to "the" unix filesystem, I'm talking about a BSD FFS, since the old "ILike14Charact" filesystem of System V R<4 seems to have faded out; however, it is simpler to explain changes to, so I may use it for examples.
I have 3 main ideas for change:
1 - a flink(char *path, int fd) system call/operation.
It seems odd to me that you can open a file, unlink it from the filesystem, and then not be able to put it back as a file unless you actually copy it out. What I was thinking about was a system call that lets you make a new directory entry that refers to the inode of an open file. The syscall would allow the link to take place if the user can create a directory entry at the specified path which can point to the inode of the open file. The only security problem I can think of is this: it would be possible to link a file back into the filesystem into a publicly accessible directory after some time, even if the path to the original file becomes closed. If this were a real problem, you'd have to use a utility like fuser to see what processes have what files open in the closed off area. However, this situation is only marginally different from just copying out the file, which would have fewer side effects anyway (like not incrementing the link count).
2 - insertion/deletion in the middle of a file without copying
Inserting and deleting chunks from the middle of a file seems like a pretty common operation, yet it is algorithmically quite inefficient as a result of the way the filesystem is designed. What I was thinking about is having the logical size of each block in the indirect blocks, as well as their location. When I say "block" I'm referring to the smallest singly writeable unit onto some disk-like device - basically a SysV FS block as opposed to BSD's myriad of sectors/blocks/clusters etc.
When the file is being used normally (new data being appended to the end) then all blocks but the last will have valid data in them. However, when data is added into the middle of the file, a new block is inserted into the blocklist. If the insertion is in the middle of a currently existing block, then the block's logical size is truncated to the offset of the insertion into the block. The remainder is copied into the newly allocated block. The logical size of the new block is set to the remainder's size, and the filepointer is set to the end. If the file is read, then it appears exactly the same, until new data is written. On a write, instead of overwriting existing data, the data is written to fill the remainder of the new block, thus increasing its logical size. When the logical size matches the physical size another block is inserted into the file.
Rather than having separate "write with insert" operations (as I implied above), I think the best way of allowing program support would be an "insert" system call that inserts a certain amount of empty space into an open file at the current position. Naturally, the blocks are only inserted into the file, but are not actually allocated on disk. If a negative amount is specified then the space is closed up. If the file becomes too fragmented, then it can be just rewritten contiguously, which would fill up all gaps.
This mechanism saves having to copy any of the actual file larger than a physical block size, but it does mean that there is quite a bit of shuffling about of the indirect blocks, which could make the operation hard to guarantee atomic. It might also be worth making insertion an attribute of a file when it's created so that only files that need it have the overhead of logical block sizes in the indirect blocks.
3 - limited sized files
This idea is essentially quite similar to the above - basically I've been sick of simple log files that grow and grow without bound, often making serious holes in a file system.
The idea is simply this - create a file that has a certain maximum size. If there is a write to the end of the file that would normally grow the file, then rather than ignoring it, blocks from the front of the file are reallocated and reordered to hold the new data. I suppose the file size would best be in units of filesystem blocks; however, if implemented in conjunction with insertion/deletion, then this need not be the case.
These are ideas that may be implemented in a filesystem that's currently being designed. I would quite like comments and ideas from fellow experienced Unix users/hackers.
-- Jeremy Fitzhardinge: jeremy@ultima.socs.uts.edu.au jeremy@utscsd.csd.uts.edu.au
Irregular adjective: I have a moral standpoint / You are assertive / He is aggressive
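[Editorial sketch] Idea 2 in the post above can be modelled in memory to see how little data actually moves. The following is only a sketch under assumptions: the names (struct ext, struct bfile, bf_insert, bf_read), the tiny 8-byte block size, and the choice to let inserted data fill the free space left in the truncated block before allocating further blocks are all illustrative and not taken from the post. The block map plays the role of the indirect blocks, with a logical length stored next to each block number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK    8    /* physical block size (tiny, for illustration) */
#define MAXBLK 16   /* capacity of the block map and the block pool */

struct ext { int phys; size_t len; };    /* block number + logical size */
struct bfile {
    char pool[MAXBLK][BLK];              /* "disk" blocks */
    struct ext map[MAXBLK];              /* file order, like an indirect block */
    size_t n;                            /* blocks in use */
};

/* Insert an empty map entry at index `at`; only map entries are shuffled. */
static int bf_newblk(struct bfile *f, size_t at)
{
    assert(at <= f->n);
    if (f->n == MAXBLK) return -1;
    memmove(&f->map[at + 1], &f->map[at], (f->n - at) * sizeof f->map[0]);
    f->map[at].phys = (int)f->n;         /* blocks are never freed here */
    f->map[at].len = 0;
    f->n++;
    return 0;
}

/* Insert n bytes at logical offset off. At most one block's tail is copied.
 * On failure the file may hold a partial insert (no atomicity attempted). */
static int bf_insert(struct bfile *f, size_t off, const char *s, size_t n)
{
    size_t i = 0;
    if (f->n == 0 && bf_newblk(f, 0) != 0) return -1;
    while (off > f->map[i].len) {        /* walk logical sizes to find block */
        off -= f->map[i].len;
        if (++i == f->n) return -1;      /* offset is past end of file */
    }
    if (off < f->map[i].len) {           /* split: move the tail to a new block */
        if (bf_newblk(f, i + 1) != 0) return -1;
        struct ext *cur = &f->map[i], *tail = &f->map[i + 1];
        tail->len = cur->len - off;
        memcpy(f->pool[tail->phys], f->pool[cur->phys] + off, tail->len);
        cur->len = off;                  /* truncate logical size */
    }
    while (n > 0) {
        if (f->map[i].len == BLK) {      /* block full: link in another one */
            if (bf_newblk(f, i + 1) != 0) return -1;
            i++;
        }
        size_t k = BLK - f->map[i].len;
        if (k > n) k = n;
        memcpy(f->pool[f->map[i].phys] + f->map[i].len, s, k);
        f->map[i].len += k; s += k; n -= k;
    }
    return 0;
}

/* Copy the logical contents to out (at least MAXBLK*BLK bytes); return length. */
static size_t bf_read(const struct bfile *f, char *out)
{
    size_t total = 0;
    for (size_t i = 0; i < f->n; i++) {
        memcpy(out + total, f->pool[f->map[i].phys], f->map[i].len);
        total += f->map[i].len;
    }
    return total;
}
```

Writing "HELLOWORLD" into an empty file and then inserting "___" at offset 5 leaves three map entries; only the three bytes "WOR" are copied, and everything after them stays where it was. The cost shows up where the post says it would: in shuffling the map.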
mjr@hussar.dco.dec.com (Marcus J. Ranum) (02/04/91)
jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: >1 - a flink(char *path, int fd) system call/operation. Part of the problem with such a system call is that it would break a lot of fairly clean and elegant interfaces. Presently, you can (for example) write code that ignores whether it is writing to a tty, a socket, or a disk file. An flink() system call would break that because you'd have to generate an error if someone tried to flink() a socket to a filename. Sure, it'd be do-able, but there would be lots of grotty special cases to deal with. The question then becomes "why?" - actual cases where someone would want to do such a thing are fairly rare, I believe - not worth the cost that would be incurred. You'd also have the same problem that you'd get an error if you tried to flink() across a device. In the cross-device case, copying it back is far more portable. >2 - insertion/deletion in the middle of a file without copying This is another fairly special case. I don't have any hard statistics, but I suspect most file activity is sequential or random, and a random sequential :) file wouldn't be used a whole lot. There are a lot of cases where this would be very nice, and typically such functionality is fairly easily added to an application via a set of library routines that manipulates blocks in some form of linked list. This is probably a good way to do it, since it won't make the inodes bigger (which means that EVERY file will waste extra space) - it's also just a simple issue of application support. If I write my application with a library to handle file management, I don't have to worry that it won't run on Joe Bob's UNIX which hasn't got kernel support for chunked files. That counts for a lot. Generally, it's better to put stuff in the application layer unless it *HAS* to go into the kernel, or unless it will somehow dramatically help all the applications running on that kernel - without breaking portability. 
For example, implementing Ousterhout's log-based file system and getting a 10% write speedup would be a bigger win for 95% of the applications on the system than getting a 95% speedup for 10% of the applications.
>3 - limited sized files
There are a lot of things that UNIX doesn't do that it might be nice if it did - but a lot of those are because it'd be unnecessarily complex or expensive to do them, and the return on investment is fairly low. Fortunately, kernel hackers have been one of the last bastions against the "let's just add this feature because it'd look neat" crowd - otherwise UNIX would look like X-window or GNUemacs.
mjr.
-- Lutraphiles unite!
lm@slovax.Eng.Sun.COM (Larry McVoy) (02/04/91)
In article <1991Jan30.143326.16676@socs.uts.edu.au> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes:
>1 - a flink(char *path, int fd) system call/operation.
Perfectly reasonable. Almost everything that takes a pathname as an arg is already written as

    func(char *p, args...)
    {
        struct vnode *vp;

        lookuppn(p, &vp, dirvp...);
        cfunc(vp, args...);
    }

    ffunc(int fd, args...)
    {
        struct file *fp;

        fp = GETF(fd);
        cfunc((struct vnode *)fp->f_data, args...);
    }

>2 - insertion/deletion in the middle of a file without copying
>
>result of the way the filesystem is designed. What I was thinking about
>is having the logical size of each block in the indirect blocks, as well
>as their location.
This is normally known as an extent, i.e., a <bn, length> tuple.
>[much stuff about insertion alg deleted]
I'm not very interested in this idea. While I agree that it is nice to be able to say "vi 100MBfile", insert some junk, and write it out, and have it all happen quickly, I question that this is a common enough operation that you really want to cram this sort of complexity into the file system. If you really need it, build it into the application using multiple files. You can also mitigate the copy stuff (it may be that some editors do this already) by rewriting the data from the change on down.
>3 - limited sized files
>
>This idea is essentially quite similar to the above - basically I've
>been sick of simple log files that grow and grow without bound, often
>making serious holes in a file system.
There is a per process file size limit. Find the offending processes and crank down their limit. That's what it is there for. Better yet, write a crontab entry that goes in, deletes all but the last N lines/bytes/whatever of data. This is an administration issue, not a file system issue.
>These are ideas that may be implemented in a filesystem that's currently
>being designed. I would quite like comments and ideas from fellow
>experienced Unix users/hackers.
You got 'em.
I'd like to know who/where/why this file system is being designed and when/where/how it will be released.
--- Larry McVoy, Sun Microsystems (415) 336-7627 ...!sun!lm or lm@sun.com
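[Editorial sketch] The "delete all but the last N bytes" job suggested above might look roughly like this in C. It is a sketch under assumptions: keep_tail is a hypothetical name, the file is assumed to be opened for update, and the tail is assumed to fit in memory. It does nothing about a writer that appends while the trim is running, so log entries can be lost in that window.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Keep only the last `keep` bytes of the file open on fp ("r+" or "w+").
 * Returns 0 on success (including when the file is already small enough). */
int keep_tail(FILE *fp, long keep)
{
    assert(keep >= 0);
    if (fseek(fp, 0, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    if (size < 0) return -1;
    if (size <= keep) return 0;          /* nothing to trim */

    char *buf = malloc((size_t)keep + 1);
    if (buf == NULL) return -1;

    int rc = -1;
    if (fseek(fp, size - keep, SEEK_SET) == 0 &&            /* read the tail  */
        fread(buf, 1, (size_t)keep, fp) == (size_t)keep &&
        fseek(fp, 0, SEEK_SET) == 0 &&                      /* move to front  */
        fwrite(buf, 1, (size_t)keep, fp) == (size_t)keep &&
        fflush(fp) == 0 &&
        ftruncate(fileno(fp), (off_t)keep) == 0)            /* drop the rest  */
        rc = 0;
    free(buf);
    return rc;
}
```

Trimming by bytes can cut the first surviving record in half; a line-oriented variant would scan forward to the next newline before copying.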
bzs@world.std.com (Barry Shein) (02/04/91)
Actually, I'm going to take exception with all these old fuddy-duddies who seem to be defending the status quo and say that the idea of manipulating blocks within a file is perfectly rational and would be useful. I don't know why everyone seems to think it's a wild idea. Think of fixed length record files and inserting into them, it would be nice to be able to just copy/munge the block numbers rather than the data. You'd need operators for inserting and deleting, perhaps one function could do both (who cares, two functions, or one with flags, easy enough to use flags.) Moving blocks around (e.g. a sort) would be handy also. Of course, you could do most of this virtually (although really freeing space is a problem) by just writing an application library which goes through a block table. I suppose the obvious suggestion would be to try writing and putting such a library into common use and seeing if it gets used. Personally, I'd be more interested in a meta-file format where you can create files which point into other files, a file-type made up of an (offset, length) tuple list. The hard part is reference counts. But it is the file system equivalent of a database "view". I could think of many uses for that, and its operators (e.g. hypertext.) -- -Barry Shein Software Tool & Die | bzs@world.std.com | uunet!world!bzs Purveyors to the Trade | Voice: 617-739-0202 | Login: 617-739-WRLD
chip@tct.uucp (Chip Salzenberg) (02/05/91)
According to sef@kithrup.COM (Sean Eric Fagan): >And, yeah, there have been times when I would have liked to have seen >a "round" file (i.e., wrapping around the end). So create a circular file subroutine library that looks at the top of the file for "maxsize,curpos\n". We use one here; it's very handy, and it's usable on all UNIX implementations with file/record locks. -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip> "Most of my code is written by myself. That is why so little gets done." -- Herman "HLLs will never fly" Rubin
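[Editorial sketch] A circular-file library of the kind described above could be quite small. The post only says that the file starts with "maxsize,curpos\n"; everything else here is an assumption for illustration: the names circ_init and circ_append, the fixed-width header (chosen so it can be rewritten in place), and the absence of the file/record locking that a real multi-writer version would need.

```c
#include <assert.h>
#include <stdio.h>

#define HDR_LEN 22L   /* "%010lu,%010lu\n" is always 22 bytes */

/* Write a fresh header: maxsize data bytes, write position 0. */
int circ_init(FILE *fp, unsigned long maxsize)
{
    assert(maxsize > 0);
    rewind(fp);
    if (fprintf(fp, "%010lu,%010lu\n", maxsize, 0UL) != HDR_LEN) return -1;
    return fflush(fp) == 0 ? 0 : -1;
}

/* Append n bytes, wrapping to the start of the data area when full. */
int circ_append(FILE *fp, const char *buf, size_t n)
{
    unsigned long max, pos;

    rewind(fp);
    if (fscanf(fp, "%lu,%lu", &max, &pos) != 2 || max == 0 || pos >= max)
        return -1;                       /* missing or corrupt header */

    while (n > 0) {
        size_t k = max - pos;            /* room before the wrap point */
        if (k > n) k = n;
        if (fseek(fp, HDR_LEN + (long)pos, SEEK_SET) != 0) return -1;
        if (fwrite(buf, 1, k, fp) != k) return -1;
        buf += k; n -= k;
        pos = (pos + k) % max;
    }

    rewind(fp);                          /* record the new write position */
    if (fprintf(fp, "%010lu,%010lu\n", max, pos) != HDR_LEN) return -1;
    return fflush(fp) == 0 ? 0 : -1;
}
```

A reader starts at curpos and wraps once to see the data oldest-first. As noted elsewhere in the thread, with variable-length records the oldest surviving record will usually be cut in the middle.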
jeff@crash.cts.com (Jeff Makey) (02/05/91)
In article <1991Jan30.143326.16676@socs.uts.edu.au> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: >2 - insertion/deletion in the middle of a file without copying [...] >3 - limited sized files Both of these wishes could be granted by combining a variant of ftruncate() that deletes bytes from arbitrary sections of a file with a new kernel call that efficiently creates empty space in the middle of a file. :: Jeff Makey Department of Tautological Pleonasms and Superfluous Redundancies Department Posting from my temporary home at ... Domain: jeff@crash.cts.com UUCP: nosc!crash!jeff
igb@fulcrum.bt.co.uk (Ian G Batten) (02/05/91)
In article <BZS.91Feb4003139@world.std.com> bzs@world.std.com (Barry Shein) writes: > Think of fixed length record files and inserting into them, it would > be nice to be able to just copy/munge the block numbers rather than > the data. What's needed is a version of streams for filesystems. With Multics, the One True Operating System, you could attach modules (== push modules) such as vfile_ to provide additional functionality over and above that which you got from initiate_segment_ and its friends. What would be nice with Unix would be ISAM, record mode, whatever modules you could push on top of the mmap interface. Once you can map files into your address space most things can be done on top of that. ian
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/05/91)
In article <1991Feb04.004933.17253@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: > In article <1991Jan30.143326.16676@socs.uts.edu.au> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: > >1 - a flink(char *path, int fd) system call/operation. > This, while not necessarily a bad idea, is not necessarily a *good* idea. > You are not going to be able to do it for any arbitrary path and > file descriptor (since you have problems with mount points still, just like > normal links), and some of the objects don't make a whole lot of sense as > files. You're describing exactly the limitations on link(). What's wrong with that? Here's one use of flink(): You run ``rmprotect foo bar'', where foo and bar are important files that you want to make sure you never delete. rmprotect periodically checks the number of links on foo and bar; if they ever disappear, it puts them back and sends you mail. The only way to do this without flink() is to waste some directory space elsewhere for extra links, and then you don't get the same reliability. ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/05/91)
In article <1991Feb04.045330.779@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: > In article <1991Jan30.143326.16676@socs.uts.edu.au> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: > >1 - a flink(char *path, int fd) system call/operation. > Currently, a setuid program can open a file, turn off its setuid-ness, and > exec some other program which can use the open file but not link, chmod, etc. > This is pretty pointless for disk files, but can sometimes be useful if the > file is a device or network file. Being able to invent a name for any open > file passed to you introduces possible protection holes. Ugh. Yeah, but if you give flink() the obvious link() semantics then there's no security problem. > I'd be > interested to hear what real problems this is intended to fix. Well, you might create temporary files with fdtemp(), manipulate them outside the directory tree, and then flink() them in their final state. (fdtemp() would have to take a directory argument so that it could assign a filesystem to the file.) This prevents several potential security holes and makes it easier to synchronize separate applications. ---Dan
tchrist@convex.COM (Tom Christiansen) (02/06/91)
From the keyboard of igb@fulcrum.bt.co.uk (Ian G Batten):
:In article <BZS.91Feb4003139@world.std.com> bzs@world.std.com (Barry Shein) writes:
:> Think of fixed length record files and inserting into them, it would
:> be nice to be able to just copy/munge the block numbers rather than
:> the data.
:
:What's needed is a version of streams for filesystems. With Multics,
:the One True Operating System, you could attach modules (== push
:modules) such as vfile_ to provide additional functionality over and
:above that which you got from initiate_segment_ and its friends. What
:would be nice with Unix would be ISAM, record mode, whatever modules you
:could push on top of the mmap interface. Once you can map files into
:your address space most things can be done on top of that.
I think a good watchdog (file/inode daemon) implementation would allow
that. See the paper in the proceedings from the next-to-the-last USENIX
in Dallas (about 3 years ago) for a description of the idea and one
implementation.
--tom
--
"Still waiting to read alt.fan.dan-bernstein using DBWM, Dan's own AI
window manager, which argues with you 10 weeks before resizing your window."
### And now for the question of the month: How do you spell relief? Answer:
U=brnstnd@kramden.acf.nyu.edu; echo "/From: $U/h:j" >>~/News/KILL; expire -f $U
thorinn@diku.dk (Lars Henrik Mathiesen) (02/06/91)
richard@locus.com (Richard M. Mathews) writes: >jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: >>1 - a flink(char *path, int fd) system call/operation. >>The only security problem I can think of is >>this: it would be possible to link a file back into the filesystem into >>a publically accessable directory after some time, even if the path to >>the original file becomes closed. If this were a real problem, you'd >>have to use a utility like fuser to see what processes have what files >>open in the closed off area. As it is now, you have to use a utility like ncheck to see what hard links there are lying around. >Say there is a >file protected by directory permissions which some setuid/setgid program >lets you look at under controlled circumstances. Your program can now >create a name for the file not protected by the directory and reopen the >file with more flexibility. This may be far fetched, but it should be >considered. As it is now, if you get an open file descriptor for a file, you can copy it anywhere you like; if you want to track changes, fstat every second and re-copy as needed. This would not be a new hole, it would just be easier to use. >Combined with problems like only being able to link it to a name in the >correct file system, I think this idea needs some work. With the obvious implementation of flink, the sequence fd = open("foo", mode); flink(fd, "bar"); will have _exactly_ the same effect as, and fail in the same cases as, fd = open("foo", mode); link("foo", "bar"); This includes making a hard link to /dev/tty?? or a FIFO inode if that's what fd was opened on. (Under Sun licensed NFS, and 4.3 BSD.) If I could think of something to use it for, I'd add it to the kernel tonight. (Except we also have Suns, so that'd be one more incompatibility. Or maybe they'd let us buy source if we said we wanted to ``experiment with kernel extensions''?) 
-- Lars Mathiesen, DIKU, U of Copenhagen, Denmark [uunet!]mcsun!diku!thorinn Institute of Datalogy -- we're scientists, not engineers. thorinn@diku.dk
rbj@uunet.UU.NET (Root Boy Jim) (02/06/91)
In article <1991Jan30.143326.16676@socs.uts.edu.au> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes:
>I have 3 main ideas for change:
>
>1 - a flink(char *path, int fd) system call/operation.
Many people have complained about "security problems". I don't see any. If you have an fd, you have the data, so you can copy it to your own file anyway. An flink is just faster.
>2 - insertion/deletion in the middle of a file without copying
I don't like this any better than anyone else, with one exception. I can see extending f?truncate to trim the beginning. The kernel would keep a beginning pointer for its own internal use. Well, the implementation gets a bit tricky, but it could work.
>3 - limited sized files
Hey, pipes already do this! They treat the ten direct block pointers as a ring buffer. Now the question becomes, what sizes will be supported, and how do you know where to start scanning when the ring wraps. Almost certainly you will be in the middle of a "record" if using variable ones.
Cron jobs to trim log files can lose log entries. You have to rename the file, then send a signal to any process that keeps the file open so it can open the new log. Or lock the file before renaming it.
All of these have been discussed before. I think the consensus is that each has its appeal, for about five minutes. However, ideas stimulate us to look for better ways of doing things. For example, instead of using a log file, a unix domain socket could be written to. It could write flat files, circular files, filter entries, update a database, send to a secure machine, whatever.
-- Root Boy Jim Cottrell <rbj@uunet.uu.net> I got a head full of ideas They're driving me insane
richard@locus.com (Richard M. Mathews) (02/07/91)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >sef@kithrup.COM (Sean Eric Fagan) writes: >> jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: >> >1 - a flink(char *path, int fd) system call/operation. >> This, while not necessarily a bad idea, is not necessarily a *good* idea. >> You are not going to be able to do it for any arbitrary path and >> file descriptor (since you have problems with mount points still, just like >> normal links), and some of the objects don't make a whole lot of sense as >> files. >You're describing exactly the limitations on link(). What's wrong with >that? With link() you have a pathname to the file, so you have some idea where you can put the new link(). Presumably with flink(), you are using this call because you DON'T have a pathname. Since you don't know where the file came from, you have to do more work to figure out where it can go. Now I'll rebut my own statements above. Perhaps the real reason you are using flink() is that the file has ZERO links. You know where it WAS, so you know where you can put it back as well as you do with link(). Flink() has some real use here. I agree that flink could be useful, but as I've pointed out elsewhere, I am slightly worried about its possible use to violate security. On the other hand, given the weak security of most Unix systems, this small chance of opening a hole is nothing. Richard M. Mathews Freedom for Lithuania richard@locus.com Laisve! lcc!richard@seas.ucla.edu ...!{uunet|ucla-se|turnkey}!lcc!richard
richard@locus.com (Richard M. Mathews) (02/07/91)
rbj@uunet.UU.NET (Root Boy Jim) writes: >Many people have complained about "security problems". >I don't see any. If you have an fd, you have the data, so you >can copy it to your own file anyway. An flink is just faster. The question isn't whether you can write your own copy; it is whether you can write to the "system's" copy. Say the "system" has a file with mode 666 which is protected only by directory permissions. Certain setuid or setgid programs are supplied which provide controlled access to the file. A user supplied program can be invoked with the file open for read. Only "system" supplied programs can access the file for write. With flink(), the user could create a name for the file, reopen it for write, and screw up the whole world. ("system" here refers not necessarily to the Unix system, but to whomever or whatever is in charge of some application package) Richard M. Mathews D efend richard@locus.com E stonian-Latvian-Lithuanian lcc!richard@seas.ucla.edu I ndependence ...!{uunet|ucla-se|turnkey}!lcc!richard
kandall@sgitokyo.nsg.sgi.com (Michael Kandall) (02/07/91)
In article <G2%&-2#@uzi-9mm.fulcrum.bt.co.uk> igb@fulcrum.bt.co.uk (Ian G Batten) writes: >In article <BZS.91Feb4003139@world.std.com> bzs@world.std.com (Barry Shein) writes: >> Think of fixed length record files and inserting into them, it would >> be nice to be able to just copy/munge the block numbers rather than >> the data. > >What's needed is a version of streams for filesystems. With Multics, >the One True Operating System, you could attach modules (== push >modules) such as vfile_ to provide additional functionality over and >above that which you got from initiate_segment_ and its friends. What >would be nice with Unix would be ISAM, record mode, whatever modules you >could push on top of the mmap interface. Once you can map files into >your address space most things can be done on top of that. > >ian I believe SVR4 has this. In SVR4's enhanced STREAMS, I believe you can push STREAMS onto arbitrary file descriptors. -- ---- Michael Kandall Independent Consultant Nihon Silicon Graphics
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/07/91)
In article <richard.665896876@fafnir.la.locus.com> richard@locus.com (Richard M. Mathews) writes: [ foo is mode 700 root, foo/bar is mode 666 root, some setuid program ] [ opens foo/bar for reading and passes the descriptor to user code ] > With flink(), the user could create a name for the file, reopen it for > write, and screw up the whole world. Nah. flink() would only work if you have the file open for writing. End of security problems. You say this is a limitation? Well--- (The *right* way to do this is to have an entirely separate bit: O_LINK, perhaps. The privileged program here would just make sure to leave O_LINK out of the open. See the O_NONE discussion that crops up now and then: people have proposed good uses for a few other bits.) ---it did occur to you that under the current system, you'd need either read or write access to open the descriptor for flink() in the first place. Didn't it? Until there's something like O_NONE to open files for operations without I/O, this part of the system will never be perfectly clean. The simplest solution is to make O_LINK synonymous with O_WRONLY. ---Dan
chip@tct.uucp (Chip Salzenberg) (02/08/91)
According to thorinn@diku.dk (Lars Henrik Mathiesen): >jeremy@socs.uts.edu.au (Jeremy Fitzhardinge) writes: >>1 - a flink(char *path, int fd) system call/operation. > >If I could think of something to use it for, I'd add it to the kernel >tonight. It's a convenient way to create lock files -- if, that is, the kernel also supports fdcreat(), which creates a plain file with no links. Also, the obvious companion fdunlink(int fd, char *path) is something I've always wanted. It unlinks the given path if and only if it is a name for fd. With fdunlink(), the UUCP style of lock files can be used safely and reliably, since the normal race condition -- "how do I know that the lock I'm removing is the stale one" -- disappears. -- Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip> "Most of my code is written by myself. That is why so little gets done." -- Herman "HLLs will never fly" Rubin
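[Editorial sketch] Without kernel support, the closest user-level approximation of the fdunlink() described above compares device and inode numbers and then unlinks. The sketch below assumes POSIX fstat/lstat/unlink; lock_take and fdunlink_approx are hypothetical names. It deliberately shows the weakness: the check and the unlink are two separate steps, so the race the post wants to eliminate is still present, which is the argument for doing it in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* UUCP-style lock: succeeds only if the lock file did not exist yet. */
int lock_take(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

/* Unlink path only if it currently names the file open on fd.
 * Returns 0 on success, -1 otherwise. NOT atomic; see the comment below. */
int fdunlink_approx(int fd, const char *path)
{
    struct stat by_fd, by_name;

    assert(path != NULL);
    if (fstat(fd, &by_fd) != 0 || lstat(path, &by_name) != 0)
        return -1;
    if (by_fd.st_dev != by_name.st_dev || by_fd.st_ino != by_name.st_ino) {
        errno = ENOENT;                  /* path is not a name for fd */
        return -1;
    }
    /* RACE: another process may replace `path` between the lstat() above
     * and the unlink() below; a kernel fdunlink() would close this gap. */
    return unlink(path);
}
```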
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (02/08/91)
In <20190:Feb712:13:4391@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: >Nah. flink() would only work if you have the file open for writing. Well, writing but not O_APPEND. -- Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com> UUCP: oliveb!cirrusl!dhesi
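[Editorial sketch] The rule proposed in the last two posts (flink() allowed only on a descriptor open for writing, and not in append mode) can be expressed as a check on the descriptor's status flags. This sketch assumes POSIX fcntl(F_GETFL) and O_ACCMODE; may_flink is a hypothetical name, and no real flink() or O_LINK flag is implied.

```c
#include <assert.h>
#include <fcntl.h>

/* Return 1 if a hypothetical flink() should be permitted on fd under the
 * "open for writing, but not O_APPEND" rule, else 0. */
int may_flink(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return 0;                        /* not an open descriptor */

    int acc = flags & O_ACCMODE;
    if (acc != O_WRONLY && acc != O_RDWR)
        return 0;                        /* read-only: refuse */

    return (flags & O_APPEND) == 0;      /* append-only writers: refuse */
}
```

Because the access mode is fixed when the file is opened, a privileged program that hands down a read-only descriptor would, under this rule, also be withholding the right to create new names for the file.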
barmar@think.com (Barry Margolin) (02/08/91)
In article <richard.665896415@fafnir.la.locus.com> richard@locus.com (Richard M. Mathews) writes:
>With link() you have a pathname to the file, so you have some idea where
>you can put the new link(). Presumably with flink(), you are using this
>call because you DON'T have a pathname. Since you don't know where the
>file came from, you have to do more work to figure out where it can go.
What do pathnames have to do with "where you can put the new link"? You can put the new link anywhere on the same file system as the file. Due to symbolic links, it is not possible to determine whether two files are on the same file system simply by looking at the pathnames.
Presumably one of the first things that the link() system call does is translate the old pathname to a device and inode (or vnode or whatever is appropriate for the file system). It then does the same thing for the directory portion of the new pathname, and compares the device portion. The device/inode information is presumably stored in the file table entry that the file descriptor references, so the differences are trivial. The kernel code would presumably be structured something like:

    link(const char *old_path, const char *new_path)
    {
        file_info = namei(old_path);
        file_link(file_info, new_path);
    }

    flink(unsigned int fd, const char *new_path)
    {
        file_info = lookup_fd_info(fd);
        if (file_info.device_type != file)
            link_wrong_type_attempt();  /* can't flink a pipe, etc. */
        else
            file_link(file_info, new_path);
    }

    file_link(FILE_INFO fi, const char *new_path)
    {
        device = get_pathname_device(new_path);
        if (device != fi.device)
            cross_device_link_attempt();
        else
            create_file(fi, new_path);
    }

-- Barry Margolin, Thinking Machines Corp. barmar@think.com {uunet,harvard}!think!barmar
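[Editorial sketch] The structure outlined above can be tried out on a toy in-memory model. Nothing below is kernel code: the tables, the toy_* names, and the rule that paths under /mnt/ live on a second device are all assumptions made for illustration, and the file-type check for pipes and the like is omitted. The point is only that link() and flink() differ in how they find the inode and then share the same tail, including the cross-device check.

```c
#include <assert.h>
#include <string.h>

#define N 8                               /* size of every toy table */

struct inode { int used, dev, nlink, nopen; };
struct dent  { int used, ino; char name[32]; };

static struct inode itab[N];              /* inode table */
static struct dent  dtab[N];              /* one flat namespace of paths */
static int fdtab[N];                      /* fd -> inode + 1; 0 means closed */

/* Toy rule: anything under /mnt/ is on device 1, everything else on 0. */
static int path_dev(const char *p) { return strncmp(p, "/mnt/", 5) == 0; }

static int lookup(const char *path)       /* path -> inode number, or -1 */
{
    for (int i = 0; i < N; i++)
        if (dtab[i].used && strcmp(dtab[i].name, path) == 0)
            return dtab[i].ino;
    return -1;
}

/* Shared tail of link() and flink(): enter (name, inode) into the namespace. */
static int file_link(int ino, const char *path)
{
    assert(ino >= 0 && ino < N);
    if (itab[ino].dev != path_dev(path)) return -1;   /* cross-device link */
    if (lookup(path) >= 0) return -1;                 /* name already exists */
    if (strlen(path) >= sizeof dtab[0].name) return -1;
    for (int i = 0; i < N; i++) {
        if (!dtab[i].used) {
            dtab[i].used = 1;
            dtab[i].ino = ino;
            strcpy(dtab[i].name, path);
            itab[ino].nlink++;
            return 0;
        }
    }
    return -1;                                        /* namespace full */
}

int toy_creat(const char *path)           /* create + open; returns fd or -1 */
{
    for (int ino = 0; ino < N; ino++) {
        if (itab[ino].used) continue;
        for (int fd = 0; fd < N; fd++) {
            if (fdtab[fd]) continue;
            itab[ino] = (struct inode){ 1, path_dev(path), 0, 1 };
            if (file_link(ino, path) != 0) { itab[ino].used = 0; return -1; }
            fdtab[fd] = ino + 1;
            return fd;
        }
        return -1;                        /* no free descriptor */
    }
    return -1;                            /* no free inode */
}

int toy_unlink(const char *path)
{
    for (int i = 0; i < N; i++) {
        if (dtab[i].used && strcmp(dtab[i].name, path) == 0) {
            struct inode *ip = &itab[dtab[i].ino];
            dtab[i].used = 0;
            if (--ip->nlink == 0 && ip->nopen == 0)
                ip->used = 0;             /* freed only once nobody has it open */
            return 0;
        }
    }
    return -1;
}

int toy_link(const char *old_path, const char *new_path)
{
    int ino = lookup(old_path);           /* name -> inode */
    return ino < 0 ? -1 : file_link(ino, new_path);
}

int toy_flink(int fd, const char *new_path)
{
    if (fd < 0 || fd >= N || fdtab[fd] == 0) return -1;
    return file_link(fdtab[fd] - 1, new_path);   /* fd -> inode */
}
```

In this model, a file that has been created, kept open, and unlinked can no longer be reached by toy_link() because the name is gone, while toy_flink() on the open descriptor still succeeds on the same device and fails across devices, exactly as link() would.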
bzs@world.std.com (Barry Shein) (02/08/91)
>The question isn't whether you can write your own copy; it is whether you >can write to the "system's" copy. Say the "system" has a file with mode >666 which is protected only by directory permissions. Certain setuid >or setgid programs are supplied which provide controlled access to the >file. A user supplied program can be invoked with the file open for >read. Only "system" supplied programs can access the file for write. >With flink(), the user could create a name for the file, reopen it for >write, and screw up the whole world. Since all flink() would do is enter a string/i-num pair into a directory I can't see how any of this applies. I was trying to think of some trick along the lines of a setuid program which opens a protected file and then execs a non-priv process handing down only the open fd, some software does this sort of thing. Inetd is analogous to this, as an example, since it takes privilege to bind() a low-numbered port for accepts() but the processes it execs need not be priv'd in any way (I realize these are sockets, not plain files, but just in case anyone thought this sort of thing I am describing is unlikely...) But if it can be flink()'d at all then we assume you could seek to zero and copy all the data out of the file to your own file anyhow, so that's not a new opportunity. And whether you can read or write is dictated by the setting of the inode and how the original fd was opened which is independent of flink() entirely. ---------- Hmm, it would also increase the link count of the file. I suppose that could be a weak security problem. It also would change the change date in the inode, even if the file and/or directory were otherwise inaccessible for any modification by other means. So I suppose someone could use this on a read-only fd handed down from a priv'd process to maliciously force the file to appear to need a back-up. 
-- -Barry Shein Software Tool & Die | bzs@world.std.com | uunet!world!bzs Purveyors to the Trade | Voice: 617-739-0202 | Login: 617-739-WRLD
lupienj@hpwadac.hp.com (John Lupien) (02/09/91)
In article <422@bria> uunet!bria!mike (Michael Stefanik) writes: >In an article, socs.uts.edu.au!jeremy (Jeremy Fitzhardinge) writes: >>3 - limited sized files >Although this has some merit, I would much prefer to have cron fire >up a script that simply trims down my growing log files, rather than >burden the kernel with the job. I quite agree that this does not need to be done in the kernel. The original posting was talking about a kind of "fifo" file with the data falling off a cliff when it gets to the end. This could be done quite nicely at the user level with a file-based ring buffer. --- John R. Lupien lupienj@hpwarq.hp.com
richard@locus.com (Richard M. Mathews) (02/09/91)
bzs@world.std.com (Barry Shein) writes:
>Since all flink() would do is enter a string/i-num pair into a
>directory I can't see how any of this applies.
>....
>But if it can be flink()'d at all then we assume you could seek to
>zero and copy all the data out of the file to your own file anyhow, so
>that's not a new opportunity. And whether you can read or write is
>dictated by the setting of the inode and how the original fd was
>opened which is independent of flink() entirely.
Whether you can read or write is dictated by the mode setting of the inode and the effective uid/gid AND THE MODES OF THE DIRECTORIES YOU MUST PASS THROUGH TO GET TO THE FILE. All of this is determined AT THE TIME OF AN OPEN, and never again. By allowing creation of a link, you create an opportunity to do an open call which would otherwise have been prevented by the directory permissions.
For example, say directory /user/joe/foo is mode 700 and file /user/joe/foo/bar is mode 666. Despite the mode of the file, I can't open it. If, however, a setuid-joe program lets me run a program I wrote while it has /user/joe/foo/bar open for read, then I can flink the file to /user/richard/gotcha which I can then open for write.
Adding a restriction that you can do flink only if the file is open for write is an interesting idea.
Richard M. Mathews Freedom for Lithuania richard@locus.com Laisve! lcc!richard@seas.ucla.edu ...!{uunet|ucla-se|turnkey}!lcc!richard
richard@locus.com (Richard M. Mathews) (02/09/91)
barmar@think.com (Barry Margolin) writes:
>In article <richard.665896415@fafnir.la.locus.com> richard@locus.com (Richard M. Mathews) writes:
>>With link() you have a pathname to the file, so you have some idea where
>>you can put the new link(). Presumably with flink(), you are using this
>>call because you DON'T have a pathname. Since you don't know where the
>>file came from, you have to do more work to figure out where it can go.
>What do pathnames have to do with "where you can put the new link"? You
>can put the new link anywhere on the same file system as the file. Due to
>symbolic links, it is not possible to determine whether two files are on
>the same file system simply by looking at the pathnames.
>Presumably one of the first things that the link() system call does is

I get the feeling that you thought I meant that the flink system call wouldn't be able to figure out where the link is allowed to go. That isn't what I meant at all. The "you" in my quote above refers to application programs which try to make general use of flink on a random file descriptor.

I meant that a possible application of flink would NOT include a general purpose program which would flink its stdin to some file to allow it to reopen the file in a different mode (e.g., "more" wants to read stderr (at least it did at one time), so if it gets a stderr which is open for write-only, it might want an opportunity to reopen the file (terminal) for read-write). This would not be a practical use of flink because of the single file system restriction of link and flink.

On the other hand, a program which knows where a file is because it created it (and thus it also knows that the original path name was not through a symbolic link) can make good use of flink to put back a file even after the link count goes to zero.

Sorry, if I wasn't clear before.

Richard M. Mathews                      D efend
richard@locus.com                       E stonian-Latvian-Lithuanian
lcc!richard@seas.ucla.edu               I ndependence
...!{uunet|ucla-se|turnkey}!lcc!richard
xtdn@levels.sait.edu.au (02/09/91)
bzs@world.std.com (Barry Shein) writes:
> But if it can be flink()'d at all then we assume you could seek to
> zero and copy all the data out of the file to your own file anyhow, so
> that's not a new opportunity. And whether you can read or write is
> dictated by the setting of the inode and how the original fd was
> opened which is independent of flink() entirely.

Copying a file now gives access to the current contents later. Flinking a file now could give access to the future contents of that file. This may not be desirable. It is, however, unlikely to be a major problem to the aware programmer: an appropriate file mode solves all!

I do like the idea of flink(). The suggestion (I forget whose it was) of fdunlink() seems appropriate, too; it sounds quite balanced.

David Newall, who no longer works       Phone:  +61 8 344 2008
for SA Institute of Technology          E-mail: xtdn@lux.sait.edu.au
"Life is uncertain: Eat dessert first"
bzs@world.std.com (Barry Shein) (02/10/91)
From: xtdn@levels.sait.edu.au
>bzs@world.std.com (Barry Shein) writes:
>> But if it can be flink()'d at all then we assume you could seek to
>> zero and copy all the data out of the file to your own file anyhow, so
>> that's not a new opportunity. And whether you can read or write is
>> dictated by the setting of the inode and how the original fd was
>> opened which is independent of flink() entirely.
>
>Copying a file now gives access to the current contents later. Flinking
>a file now could give access to the future contents of that file. This
>may not be desirable. It is, however, unlikely to be a major problem to
>the aware programmer: an appropriate file mode solves all!

Ok, granted, if the path to the original file was not accessible and was opened by a priv'd program and the fd handed down (e.g. thru an open(), setuid() and then an exec()), then an flink() would bypass the original directory protection.

Thus, an accessible file in an inaccessible directory would become accessible. If it were a changing file it could then be opened any time later, even long after the program exited, w/o need for the original priv.

Whew. So you're right, that *is* a potential problem. And just subtle enough that it might bite someone in a big way. Just thought I'd lay that out before we got a hundred "huh?" messages.

I'd say that pretty much condemns flink() as an idea unless someone can think of a way around that. I can't think of anything that's not awful.

--
        -Barry Shein

Software Tool & Die    | bzs@world.std.com    | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202  | Login: 617-739-WRLD
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/10/91)
In article <27B18AD8.2F15@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
[ on flink() ]
> It's a convenient way to create lock files -- if, that is, the kernel
> also supports fdcreat(), which creates a plain file with no links.

A few months ago I briefly discussed with Keith Bostic the top three calls on my BSD-extensions list: fdlink(), fdtemp(), and fdunlink(). The first was the same as flink(); the second was the same as fdcreat(); the third was the same as unlink(), but returned a file descriptor pointing to the removed file. He didn't believe that my examples of race conditions were problems in practice.

> Also, the obvious companion fdunlink(int fd, char *path) is something
> I've always wanted. It unlinks the given path if and only if it is a
> name for fd.

Hmmm. I already have fdunldilink() listed; it only removes a file if it has a specified number of hard links, device, and inode, with 0 for no restriction. I think fdunldilink(0,st.st_dev,st.st_ino,path) would do the trick after an fstat(fd,&st).

---Dan
jim@segue.segue.com (Jim Balter) (02/11/91)
In article <BZS.91Feb7163251@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Since all flink() would do is enter a string/i-num pair into a
>directory I can't see how any of this applies.

You mean, since all flink would do is let you create an accessible path to a file that you didn't previously have an accessible path to, you can't see how any of this applies? Strange.

"Since all setuid(0) does is clear a few bits somewhere, I can't see how a discussion of the consequences of letting anyone do so applies."
guy@auspex.auspex.com (Guy Harris) (02/12/91)
In article <1991Feb7.064348.1873@sgitokyo.nsg.sgi.com> kandall@sgitokyo.nsg.sgi.com (Michael Kandall) writes:
>In article <G2%&-2#@uzi-9mm.fulcrum.bt.co.uk> igb@fulcrum.bt.co.uk (Ian G Batten) writes:
>>In article <BZS.91Feb4003139@world.std.com> bzs@world.std.com (Barry Shein) writes:
>>> Think of fixed length record files and inserting into them, it would
>>> be nice to be able to just copy/munge the block numbers rather than
>>> the data.
>>
>>What's needed is a version of streams for filesystems. With Multics,
>>the One True Operating System, you could attach modules (== push
>>modules) such as vfile_ to provide additional functionality over and
>>above that which you got from initiate_segment_ and its friends. What
>>would be nice with Unix would be ISAM, record mode, whatever modules you
>>could push on top of the mmap interface. Once you can map files into
>>your address space most things can be done on top of that.
>>
>>ian
>
>I believe SVR4 has this. In SVR4's enhanced STREAMS, I believe you can
>push STREAMS onto arbitrary file descriptors.

You believe incorrectly. What S5R4 *does* have is the ability to attach a STREAMS-device file descriptor to a "node in the file system name space", using "fattach()". This does *NOT* magically turn a regular file into a STREAMS device; it turns it into a name for a STREAMS device. I.e., anybody who opens the file after you've "fattach()"ed something to it will *NOT* get a file descriptor that reads from or writes to the underlying file; the underlying file merely provides a *name* for the stream.

What Ian was describing sounds more like the stuff Apollo did - with the name "Extensible Streams", but where "Streams" has nothing to do with "streams" in the Research UNIX sense or "STREAMS" in the S5 sense. The low-level means of accessing a file is by mapping into a process's address space; atop that is built a mechanism for more "conventional" file access, with each file having an "object type UID". The "object type UID" indicates what code acts as a "type manager" for the file; that "type manager" code implements operations such as "open", "read", "write", etc. Dunno if they were "stackable" like (streams|STREAMS) modules. (Then again, I don't remember whether modules were stackable in any of Multics's I/O subsystems, either "ios_" or "iox_".) I think the type managers all lived in user-mode code.

That doesn't necessarily give you the stuff Barry was referring to; the "containers" provided by the (probably kernel-level) file system are arrays of pages, similar to UNIX files. Unless there was an interface to that file system that let you insert pages into the middle of a container, you wouldn't be able to do an insert like that.
richard@locus.com (Richard M. Mathews) (02/12/91)
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
>brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>>Nah. flink() would only work if you have the file open for writing.
>Well, writing but not O_APPEND.

I don't think an O_APPEND check would be necessary. Since fcntl() can be used to change the O_APPEND flag, anything which depends on it for security would already be broken (unless you have a system which has O_APPEND but doesn't have fcntl(F_SETFL)).

Richard M. Mathews                      Freedom for Lithuania
richard@locus.com                       Laisve!
lcc!richard@seas.ucla.edu
...!{uunet|ucla-se|turnkey}!lcc!richard
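The fcntl() point can be shown in a few lines. This is a sketch under the assumption of a system where F_SETFL is allowed to change O_APPEND (the usual case, though a file marked append-only at the filesystem level would refuse it); the helper names `clear_append` and `demo` are made up for the example.

```python
import fcntl
import os
import tempfile


def clear_append(fd):
    """Turn off O_APPEND on an already-open descriptor; return the new flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_APPEND)
    return fcntl.fcntl(fd, fcntl.F_GETFL)


def demo():
    handle, path = tempfile.mkstemp()
    os.close(handle)
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, b"AAAA")                 # appended at end of file
        flags = clear_append(fd)
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, b"zz")                   # now lands at offset 0
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()
    os.remove(path)
    return flags, contents
```

Whoever holds the descriptor can drop the append-only behaviour, so a check of O_APPEND at flink() time would not add any protection.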
chip@tct.uucp (Chip Salzenberg) (02/14/91)
According to bzs@world.std.com (Barry Shein):
>Ok, granted, if the path to the original file was not accessible and
>was opened by a priv'd program and the fd handed down (e.g. thru an
>open(), setuid() and then an exec()), then an flink() would bypass the
>original directory protection.

What if flink() were permitted only on file descriptors open for O_RDWR without O_APPEND? After all, if you have a file descriptor meeting that description, there's almost nothing bad you can do with the file that you couldn't do with the file descriptor, slower.

--
Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
"I want to mention that my opinions whether real or not are MY opinions."
    -- the inevitable William "Billy" Steinmetz
chip@tct.uucp (Chip Salzenberg) (02/14/91)
According to brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
>I already have fdunldilink() listed; it only removes a file if it
>has a specified number of hard links, device, and inode, with 0 for
>no restriction. I think fdunldilink(0,st.st_dev,st.st_ino,path)
>would do the trick after an fstat(fd,&st).

Yes; fdunldilink() [what a name!] can simulate my fdunlink(), but not vice versa; so fdunldilink() is the better choice. I would suggest -1 for the "don't care" value, though, since st_dev could easily be zero.

It's too bad that Keith doesn't see the need for these operations. But then, adding features to BSD never made my life any easier. :-)

--
Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
"I want to mention that my opinions whether real or not are MY opinions."
    -- the inevitable William "Billy" Steinmetz
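A rough user-level approximation of the two proposed calls, using -1 as the "don't care" value as suggested. Both fdunldilink() and fdunlink() are proposals from this thread, not existing system calls; the names `unlink_if_matches`, `fdunlink` and `ANY` below are illustrative, and the argument order is an assumption. Unlike a kernel implementation, this version is not atomic.

```python
import os

ANY = -1  # "don't care", since 0 can be a legitimate st_dev value


def unlink_if_matches(path, nlink=ANY, dev=ANY, ino=ANY):
    """Remove path only if its link count, device and inode match.

    NOT atomic: the name can be re-pointed between lstat() and unlink(),
    which is the race a kernel-level call would close.
    """
    st = os.lstat(path)
    for wanted, actual in ((nlink, st.st_nlink), (dev, st.st_dev), (ino, st.st_ino)):
        if wanted != ANY and wanted != actual:
            return False
    os.unlink(path)
    return True


def fdunlink(fd, path):
    """Unlink path only if it currently names the file open on fd."""
    st = os.fstat(fd)
    return unlink_if_matches(path, ANY, st.st_dev, st.st_ino)
```

`fdunlink()` here is just `unlink_if_matches()` with the link count left unrestricted, which mirrors the observation that the more general call can simulate the simpler one but not the other way round.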
gsteckel@vergil.East.Sun.COM (Geoff Steckel - Sun BOS Hardware CONTRACTOR) (02/14/91)
In article <BZS.91Feb7163251@world.std.com> bzs@world.std.com (Barry Shein) writes:
>Since all flink() would do is enter a string/i-num pair into a
>directory I can't see how any of this applies.

Ummm... pipes USED to be implemented as `nameless' files on PIPEDEV, with some strange semantics to make the read/write pointers wrap around at 10 blocks (or whatever size pipes were). You wanna link one of THOSE into the file system? (:-)

regards,
geoff steckel (gwes@wjh12.harvard.EDU)
             (...!husc6!wjh12!omnivore!gws)
Disclaimer: I am not affiliated with Sun Microsystems, despite the From: line. This posting is entirely the author's responsibility.
bzs@world.std.com (Barry Shein) (02/14/91)
>What if flink() were permitted only on file descriptors open for
>O_RDWR without O_APPEND? After all, if you have a file descriptor
>meeting that description, there's almost nothing bad you can do with
>the file that you couldn't do with the file descriptor, slower.

Except now you can come back later, say a week later, and re-open the file (assuming the file protexns were ok), without the setuid program.

So there would be no point at which an applications writer or admin could be sure that no one could open that file (w/o checking links and searching the file system.)

As a more concrete example, let's say the program which let you in used a list of valid users, and you were removed from that list. If you flink()'d it, you could still get at it.

Another way of putting it is, if we allow flink(), how do we ever have a file which cannot be flink()'d, you'd need to invent a new bit for open() I guess. I suppose if you had such a bit one could argue that it's up to the application writer to set it (or unset it) if s/he cares. Perhaps it should be off (disallowed) by default.

--
        -Barry Shein

Software Tool & Die    | bzs@world.std.com    | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202  | Login: 617-739-WRLD
xtdn@levels.sait.edu.au (02/15/91)
chip@tct.uucp (Chip Salzenberg) writes:
> What if flink() were permitted only on file descriptors open for
> O_RDWR without O_APPEND? After all, if you have a file descriptor
> meeting that description, there's almost nothing bad you can do with
> the file that you couldn't do with the file descriptor, slower.

I think we've already discussed that, in time, the file contents could be changed: Just because we're allowed to read the file now doesn't mean that we should be allowed to read the future contents.

Nevertheless I like the idea of flink() and I don't see much benefit in discussions along the lines of "only allow flink() on fd's open for O_xxx". Sorry to stifle this thread, people, but I believe that we can say little more than: flink() does have some security implications that one must be aware of, but being aware of them probably goes most of the way to circumventing those problems.

Enough said?

David Newall, who no longer works       Phone:  +61 8 344 2008
for SA Institute of Technology          E-mail: xtdn@lux.sait.edu.au
"Life is uncertain: Eat dessert first"
chip@tct.uucp (Chip Salzenberg) (02/16/91)
According to bzs@world.std.com (Barry Shein):
>>What if flink() were permitted only on file descriptors open for
>>O_RDWR without O_APPEND?
>
>Except now you can come back later, say a week later, and re-open the
>file (assuming the file protexns were ok), without the setuid program.

Well, okay.

Idea #2: Allow flink() only if you are the owner of the file or the superuser.

--
Chip Salzenberg at Teltronics/TCT <chip@tct.uucp>, <uunet!pdn!tct!chip>
"I want to mention that my opinions whether real or not are MY opinions."
    -- the inevitable William "Billy" Steinmetz
xtdn@levels.sait.edu.au (02/16/91)
bzs@world.std.com (Barry Shein) writes:
> Except now you can come back later, say a week later, and re-open the
> file (assuming the file protexns were ok), without the setuid program.

But Barry, you have put your finger on a very salient point, which is that one can always protect the file, thus preventing it from being opened later. I don't see that flink() would cause any major security problems.

Just to put this into context: as things stand one could leave the recipient process running for a week and then read the file. Really, excepting that processes can be killed and that machines do sometimes go down, flink() would not allow any access that one cannot now obtain.

David Newall, who no longer works       Phone:  +61 8 344 2008
for SA Institute of Technology          E-mail: xtdn@lux.sait.edu.au
"Life is uncertain: Eat dessert first"
bzs@world.std.com (Barry Shein) (02/17/91)
>>Except now you can come back later, say a week later, and re-open the
>>file (assuming the file protexns were ok), without the setuid program.
>
>Well, okay.
>
>Idea #2: Allow flink() only if you are the owner of the file or the
>superuser.

That still bypasses directory protections. I suppose one would be hard-pressed to come up with an example of why there would be a file which was owned by you but otherwise not accessible, but I don't like to consider that sort of argument, it only belies the limits of imagination.

However, if it were "only the superuser" it would be possible to pass fd's around to setuid programs and let them flink() them. Then it would be up to the setuid application writer to figure out what rules should be imposed, which sounds about right.

The potential security pitfalls could be explained in a short paragraph in the manual page, at least in the abstract.

The only problem is that the pitfalls are very hard to work around (e.g. does this fd point at a file in an otherwise inaccessible path???), so the result would probably be to not do it on behalf of a non-priv'd program.

But it might be of some direct use to priv'd programs. Particularly in combination with that BSD feature of passing fd's around on sockets. (oh boy, now 100 people are gonna say, huh??? look into the access rights stuff in send(2)/recv(2) on a BSD system.)

--
        -Barry Shein

Software Tool & Die    | bzs@world.std.com    | uunet!world!bzs
Purveyors to the Trade | Voice: 617-739-0202  | Login: 617-739-WRLD
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/17/91)
In article <BZS.91Feb16112615@world.std.com> bzs@world.std.com (Barry Shein) writes:
> Particularly in
> combination with that BSD feature of passing fd's around on sockets.
> (oh boy, now 100 people are gonna say, huh??? look into the access
> rights stuff in send(2)/recv(2) on a BSD system.)

More precisely, that BSD 4.2-and-later-but-generally-buggy-before-4.3 feature of file descriptor passing on UNIX-domain sockets, usable with sendmsg(2) and recvmsg(2). Working example: pty's reconnect feature.

---Dan
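For readers who have not met descriptor passing, here is a small sketch of the mechanism as it is exposed on present-day systems, where the access rights travel as SCM_RIGHTS ancillary data on a UNIX-domain socket. The 4.2/4.3BSD interface of the time differed in detail (it used the msg_accrights fields of struct msghdr), so treat this as an illustration of the idea rather than of the 1991 API; the helper names are made up.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Pass one open descriptor as SCM_RIGHTS ancillary data."""
    rights = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)])


def recv_fd(sock):
    """Receive one descriptor; the kernel installs it under a new number."""
    rights = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(rights.itemsize))
    level, kind, payload = ancdata[0]
    assert level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS
    rights.frombytes(payload[:rights.itemsize])
    return rights[0]


def demo():
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f:       # a file with no name to reopen
        f.write(b"hello")
        f.flush()
        send_fd(left, f.fileno())
        received = recv_fd(right)
    # The sender's copy is closed now; the received one is still usable.
    data = os.pread(received, 5, 0)
    os.close(received)
    left.close()
    right.close()
    return data
```

In the scenario discussed above, the two ends of the socket would belong to different processes, for example an unprivileged client handing a descriptor to a privileged server that decides whether to link it.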
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/17/91)
In article <27BC2E07.673D@tct.uucp> chip@tct.uucp (Chip Salzenberg) writes:
> Idea #2: Allow flink() only if you are the owner of the file or the
> superuser.

Idea #2': Allow link() only if you are the owner of the file or the superuser.

Of course, these are both subsumed by a link protection bit (and O_LINK, O_EXEC, O_NONE, etc. bits on open()).

---Dan
jr@oglvee.UUCP (Jim Rosenberg) (02/21/91)
In <BZS.91Feb16112615@world.std.com> bzs@world.std.com (Barry Shein) writes:
>I suppose one would be
>hard-pressed to come up with an example of why there would be a file
>which was owned by you but otherwise not accessible, but I don't like
>to consider that sort of argument, it only belies the limits of
>imagination.

If I remember it correctly, the BRL spooler, MDQS, *routinely* sets up files for which the owner has no access by virtue of lack of a directory permission. MDQS protects its spool directory by a "lock" directory. You have to have "spooler permissions" to traverse this directory. But having done that, its actual spool files have the uid and gid of the submitting user.

[Aside: MDQS is a *nice* spooler! It's amazing to me that it hasn't joined elm and smail et al among the cast of characters of PD software packages that replace the respective "standard" package that comes with the operating system. MDQS surely beats the socks off the usual System V spooler.]

--
Jim Rosenberg #include <disclaimer.h> --cgh!amanue!oglvee!jr
Oglevee Computer Systems / /
151 Oglevee Lane, Connellsville, PA 15425 pitt! ditka!
INTERNET: cgh!amanue!oglvee!jr@dsi.com / /