mcgregor@hemlock.Atherton.COM (Scott McGregor) (08/11/90)
I am curious as to how accesses to multiple concurrent types of file systems are implemented under SysV and BSD. I've read the Bach book and the BSD book, as well as books on device drivers. But what I am more interested in is the switchable file system layer. Explanations of what goes on in the following example are welcome; recommendations of books that cover this stuff are even more heartily desired.

I know that mount takes a file system type. I know how the file system tree is traced using inodes, and I have heard of vnodes (which is what I think I want to know more about) but I haven't found anything written on vnodes. I am aware that several vendors support multiple varieties of file systems (Bell, BSD, NFS, MS-DOS), all accessible on the same system, and the files on them can be accessed using standard o/s calls and stdio library routines.

I guess what I am interested in is if I have a non-unix file system and I want to allow this file system to be mounted as a unix file system, and accessed using open, creat, read, write, close, et al, what interface translators would I have to create between my own file system and the unix system calls, and how and where would I typically add these translators.

I presume that a new type would have to be tested for in mount, and that the open, etc. commands would have to know what type of file system they were operating on, and that a case statement switch would have to be supported by them for each new type of file system to be supported. Is this correct, or is there some other way that the switching between file systems is handled?

Mail can be directed to: mcgregor@atherton.com or ...!sun!atherton!mcgregor

Scott McGregor
Atherton Technology
chris@mimsy.umd.edu (Chris Torek) (08/13/90)
In article <28595@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM (Scott McGregor) writes:
>I am curious as to how accesses to multiple concurrent types of file
>systems are implemented under SysV and BSD. ... I have heard of vnodes
>(which is what I think I want to know more about) but I haven't found
>anything written on vnodes.

Very little *is* written about them. The first thing to know is that everyone does it differently.

SysV (through R3 at least) uses the File System Switch. Each `file' (represented as an inode? I have never seen the code) has a file system ID attached, and instead of doing something like

	error = inode_read(struct inode *ip, parameters);

one does something like

	error = (*fssw[ip->i_fstype].fs_read)(ip, parameters);

Ultrix: uses gnodes. Gnodes are like vnodes or inodes, only different. (Never having dealt with them I cannot say how so, other than in name.)

SunOS: uses vnodes. Vnodes are different in just about every release of SunOS (they keep getting bigger). The important part is that vnodes have a pointer to an operations vector, so instead of

	error = vnode_read(struct vnode *vp, parameters);

one writes

	error = (*vp->v_ops->vn_read)(vp, parameters);

These are encapsulated as macros to save typing:

	error = VOP_READ(vp, parameters);

4.3BSD-Reno: uses vnodes that are very similar to SunOS vnodes, but not identical. Again vnodes contain pointers to operation vectors, and again one uses macros to invoke them. In this case, however, operations are permitted to store state; callers must call VOP_ABORTOP when `backing out' of something. In other words, one does something like

	- Look up the following name with intent to delete.
	- Oops, never mind, something went wrong; I will not delete the file after all.

Or

	- Look up the following name with intent to delete.
	- Delete the name you found.

(SunOS does not have the `abort' step, thus every operation must be self-contained.)
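The two dispatch styles described above can be sketched side by side in a few lines of C. This is only an illustration, not code from any of the kernels mentioned: the structure layouts, the `char *buf, size_t len` parameter list (standing in for `parameters`), and the names `fsw_entry`, `ufs_iread`, `foofs_iread`, `foofs_vread`, and `fss_read` are all assumptions made up for the example. Only `fssw`, `i_fstype`, `fs_read`, `v_ops`, `vn_read`, and `VOP_READ` come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ---- Style 1: File System Switch, one global table indexed by type ---- */
struct inode {
    int i_fstype;                       /* index into fssw[] */
};

struct fsw_entry {                      /* hypothetical name for a table row */
    int (*fs_read)(struct inode *ip, char *buf, size_t len);
};

/* Two stand-in "file systems": each fills the buffer with its own letter. */
static int ufs_iread(struct inode *ip, char *buf, size_t len)
{
    (void)ip;
    memset(buf, 'u', len);
    return 0;
}

static int foofs_iread(struct inode *ip, char *buf, size_t len)
{
    (void)ip;
    memset(buf, 'f', len);
    return 0;
}

static const struct fsw_entry fssw[] = {
    { ufs_iread },                      /* type 0 */
    { foofs_iread },                    /* type 1 */
};

/* Generic read: no case statement, just an indexed indirect call. */
int fss_read(struct inode *ip, char *buf, size_t len)
{
    assert(ip->i_fstype >= 0 &&
           (size_t)ip->i_fstype < sizeof fssw / sizeof fssw[0]);
    return (*fssw[ip->i_fstype].fs_read)(ip, buf, len);
}

/* ---- Style 2: vnodes, each object points at its own operations vector ---- */
struct vnode;

struct vnodeops {
    int (*vn_read)(struct vnode *vp, char *buf, size_t len);
};

struct vnode {
    const struct vnodeops *v_ops;       /* set when the vnode is created */
};

#define VOP_READ(vp, buf, len) ((*(vp)->v_ops->vn_read)((vp), (buf), (len)))

static int foofs_vread(struct vnode *vp, char *buf, size_t len)
{
    (void)vp;
    memset(buf, 'f', len);
    return 0;
}

static const struct vnodeops foofs_vnodeops = { foofs_vread };
```

In both styles the generic code never switches on the file system type. With the switch, a new type means one more row in `fssw[]`; with vnodes, the file system hands out objects that already carry the right vector, so the caller never sees a type number at all.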
>I guess what I am interested in is if I have a non-unix file system
>and I want to allow this file system to be mounted as a unix
>file system, and accessed using open, creat, read, write, close, et al,
>what interface translators would I have to create between my own
>file system and the unix system calls, and how and where would I
>typically add these translators.

For 4.3BSD-Reno, you would:

1. modify <sys/mount.h> to add the parameters needed for a new mount, along with a mount type:

	#define MOUNT_FOOFS	5
	#define MOUNT_MAXTYPE	5	/* nb: not 6 (go argue with kirk...) */
	struct foofs_args { ... whatever ... };

2. write VFS operations to implement the following. `mp' is always a `struct mount *', and is used to name the particular foofs file system (except during mount, where you have to figure out which foofs file system is meant from the args, and store it for later operations).

foofs_mount(mp, char *path, caddr_t args, struct nameidata *ndp)
	mount the foofs file system at `path' using the given arguments.

foofs_start(mp, int flags)
	start operation on the given foofs file system (after mount completes).

foofs_unmount(mp, int forcibly)
	unmount the given foofs file system, possibly even if something is going on on it.

foofs_root(mp, struct vnode **vpp)
	set *vpp to the vnode that represents the root of the given foofs file system.

foofs_quotactl(mp, int cmd, uid_t uid, caddr_t arg)
	implement quotas, or (more simply) return EOPNOTSUPP.

foofs_statfs(mp, struct statfs *sbp)
	fill in a `statfs' structure.

foofs_sync(mp, int waitfor)
	write all data to permanent storage, optionally waiting until done before returning.

foofs_fhtovp(mp, struct fid *fidp, struct vnode **vpp)
	convert a `file identifier' (private data) to a vnode, storing the resulting vnode in *vpp.

foofs_vptofh(struct vnode *vp, struct fid *fidp)
	convert a vnode (vp) to a file identifier that can later be used to get vp again.

foofs_init()
	set up any private data structures at boot time (e.g., inode hash chains for a ufs file system).

3. write vnode operations. vp is always a `struct vnode *'; vpp is always a `struct vnode **' into which the resulting vnode is stored; ndp is always a `struct nameidata *'; vap is always a `struct vattr *' containing the vnode attributes to apply or to be filled in; and cred is always a `struct ucred *' containing the (Unix-level) credentials of the person or program doing the operation (uid and gids). (Often the credentials are in the nameidata instead.)

foofs_lookup(vp, ndp)
	look up a path name segment (it contains no slashes). ndp has all the semantics embedded, except that the lookup takes place in the directory given by vp.

foofs_create(ndp, vap)
	create a file (after a lookup with CREATE set); store the new vnode in ndp->ni_vp.

foofs_mknod(ndp, vap, cred)
	create a `node' (for Unix-specific things like devices).

foofs_open(vp, int flags, cred)
	open a file; flags is from the open() syscall, e.g., no-delay.

foofs_close(vp, int flags, cred)
	close a file.

foofs_access(vp, int flags, cred)
	test to see if the given user (cred) has the given access mode (flags&VREAD, flags&VWRITE, flags&VEXEC).

foofs_getattr(vp, vap, cred)
	get file attributes (store in *vap).

foofs_setattr(vp, vap, cred)
	set file attributes (do not change those marked VNOVAL).

foofs_read(vp, struct uio *uio, int ioflags, cred)
	read from file; valid flags are
	  IO_NDELAY: do not block.

foofs_write(vp, struct uio *uio, int ioflags, cred)
	write to file; valid flags are
	  IO_UNIT: write as atomic unit (if write fails, undo it);
	  IO_APPEND: append to file;
	  IO_SYNC: do synchronous writes;
	  IO_NDELAY: do not block.

foofs_ioctl(vp, int cmd, caddr_t data, int flags, cred)
	do ioctl operation.

foofs_select(vp, int which, int flag, cred)
	select for read/write/exception (which==FREAD/FWRITE/0 resp).

foofs_mmap(vp, not-defined-yet)
	reserved for 4.4BSD. probably should be replaced with name and unname page operations a la SunOS.

foofs_fsync(vp, int fflags, cred, int waitfor)
	push all blocks for the file to permanent storage. fflags is the file table entry flags. optionally, wait until done.

foofs_seek(vp, oldoff, newoff, cred)
	I dunno what this is doing in the ops vector. Seeks are all taken care of above the vnode layer.

foofs_remove(ndp)
	remove a file (after a lookup with DELETE).

foofs_link(vp, ndp)
	create a link to the file vp (after a lookup with CREATE).

foofs_rename(struct nameidata *old, struct nameidata *new)
	rename the old name to the new one (after a lookup with RENAME).

foofs_mkdir(ndp, vap)
	create a directory (after a lookup with CREATE).

foofs_rmdir(ndp)
	remove a directory (after a lookup with DELETE).

foofs_symlink(ndp, vap, char *target)
	create a symbolic link (after a lookup with CREATE).

foofs_readdir(vp, struct uio *uio, cred, int *eofflag)
	read directory contents (store in `struct dirent' format). eofflag is not used for anything.

foofs_readlink(vp, struct uio *uio, cred)
	read symlink contents.

foofs_abortop(ndp)
	changed mind after operation with intent to create/remove/rename.

foofs_inactive(vp)
	last close on file, do whatever is appropriate (e.g., write inode back).

foofs_reclaim(vp)
	vnode vp being reused; disassociate from any cache.

foofs_lock(vp)
	lock underlying object (if possible).

foofs_unlock(vp)
	unlock underlying object. (These are not `file locking' locks; POSIX locking is not yet implemented.)

foofs_bmap(vp, daddr_t bn, vpp, daddr_t *mapbn)
	map logical block number to physical block number, for old VM code. this should go away.

foofs_strategy(struct buf *bp)
	map logical block to physical and do I/O. (Probably also should go away; read/write should handle this. I do not trust the current buffer cache code....)

foofs_print(vp)
	print contents of vnode, for debugging.

foofs_islocked(vp)
	return true if underlying object is locked.
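To make the `lookup with intent, then act or abort' protocol concrete, here is a rough user-space sketch of three of the operations listed above (lookup, remove, abortop) gathered into an operations vector. Everything here is an assumption for illustration: `struct nameidata` is cut down to three fields, `struct foofs_dir` and its `pending_slot` field are hypothetical, the macros only approximate how the real ones find the vector, and a real implementation would hold locks and buffers rather than an array index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum intent { LOOKUP, CREATE, DELETE, RENAME };

struct vnode;

struct nameidata {                      /* heavily reduced, hypothetical layout */
    const char   *ni_name;              /* one path component, no slashes */
    enum intent   ni_intent;            /* what the caller plans to do next */
    struct vnode *ni_dvp;               /* directory the lookup ran in */
};

struct vnodeops {
    int (*vn_lookup)(struct vnode *dvp, struct nameidata *ndp);
    int (*vn_remove)(struct nameidata *ndp);
    int (*vn_abortop)(struct nameidata *ndp);
};

struct vnode {
    const struct vnodeops *v_ops;
    void *v_data;                       /* file-system private data */
};

#define VOP_LOOKUP(dvp, ndp) ((*(dvp)->v_ops->vn_lookup)((dvp), (ndp)))
#define VOP_REMOVE(ndp)      ((*(ndp)->ni_dvp->v_ops->vn_remove)(ndp))
#define VOP_ABORTOP(ndp)     ((*(ndp)->ni_dvp->v_ops->vn_abortop)(ndp))

/* Toy in-memory directory standing in for the foreign file system. */
#define FOOFS_NFILES 4
struct foofs_dir {
    const char *names[FOOFS_NFILES];    /* NULL marks an empty slot */
    int pending_slot;                   /* slot held by a DELETE lookup, or -1 */
};

static int foofs_lookup(struct vnode *dvp, struct nameidata *ndp)
{
    struct foofs_dir *dir = dvp->v_data;

    ndp->ni_dvp = dvp;
    for (int i = 0; i < FOOFS_NFILES; i++) {
        if (dir->names[i] != NULL && strcmp(dir->names[i], ndp->ni_name) == 0) {
            if (ndp->ni_intent == DELETE)
                dir->pending_slot = i;  /* state kept for the follow-up call */
            return 0;
        }
    }
    return ENOENT;
}

static int foofs_remove(struct nameidata *ndp)
{
    struct foofs_dir *dir = ndp->ni_dvp->v_data;

    if (dir->pending_slot < 0)
        return EINVAL;                  /* no preceding lookup with DELETE */
    dir->names[dir->pending_slot] = NULL;
    dir->pending_slot = -1;
    return 0;
}

static int foofs_abortop(struct nameidata *ndp)
{
    struct foofs_dir *dir = ndp->ni_dvp->v_data;

    dir->pending_slot = -1;             /* drop the saved state, change nothing */
    return 0;
}

static const struct vnodeops foofs_vnodeops = {
    foofs_lookup, foofs_remove, foofs_abortop
};
```

A caller would run `VOP_LOOKUP` with `ni_intent` set to `DELETE` and then finish with exactly one of `VOP_REMOVE` or `VOP_ABORTOP`. Since the lookup leaves state behind, skipping the abort on an error path would leave that state dangling, which is the cost of the stateful design compared with the self-contained SunOS operations.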
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu	Path: uunet!mimsy!chris
	(New campus phone system, active sometime soon: +1 301 405 2750)
guy@auspex.auspex.com (Guy Harris) (08/13/90)
>SysV (through R3 at least) uses the File System Switch.
SysV R4: uses vnodes that are very similar to SunOS 4.x vnodes, but not
identical. They're probably much closer to SunOS 4.x vnodes than are
4.3-Reno vnodes, though.
richard@aiai.ed.ac.uk (Richard Tobin) (08/14/90)
In article <28595@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM (Scott McGregor) writes:
>I guess what I am interested in is if I have a non-unix file system
>and I want to allow this file system to be mounted as a unix
>file system, and accessed using open, creat, read, write, close, et al,

If your system already has NFS, you can do this without kernel modifications, by writing an NFS server for your device. NFS mounting works by passing the kernel the address of a socket through which it can send and receive messages from the filesystem. I recently hacked up such a thing so that I can mount Minix floppies on a Sun.

I can send you the code if you're interested.

-- Richard
-- 
Richard Tobin,                    JANET: R.Tobin@uk.ac.ed
AI Applications Institute,        ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.             UUCP:  ...!ukc!ed.ac.uk!R.Tobin
del@thrush.mlb.semi.harris.com (Don Lewis) (08/15/90)
In article <3199@skye.ed.ac.uk> richard@aiai.UUCP (Richard Tobin) writes:
>In article <28595@athertn.Atherton.COM> mcgregor@hemlock.Atherton.COM (Scott McGregor) writes:
>>I guess what I am interested in is if I have a non-unix file system
>>and I want to allow this file system to be mounted as a unix
>>file system, and accessed using open, creat, read, write, close, et al,
>
>If your system already has NFS, you can do this without kernel
>modifications, by writing an NFS server for your device. NFS mounting
>works by passing the kernel the address of a socket through which it
>can send and receive messages from the filesystem. I recently hacked
>up such a thing so that I can mount Minix floppies on a Sun.
>
>I can send you the code if you're interested.

I did this for an automatic version of /usr/hosts. It periodically reads the hosts YP map and emulates a directory of symbolic links to /usr/ucb/rsh for the map entries.

I have another application in mind where I would like to build my own filesystem type. It would not be a complete filesystem implementation. The reason that I can't do it with an NFS server is that I need to know what syscall a process is executing when the process is doing a lookup in my filesystem.

-- 
Don "Truck" Lewis                     Harris Semiconductor
Internet: del@mlb.semi.harris.com     PO Box 883   MS 62A-028
Phone:    (407) 729-5205              Melbourne, FL  32901
andrew@alice.UUCP (Andrew Hume) (08/15/90)
i agree with richard tobin; a user level file server is the right way to go, even if it is NFS. it is easier to write, easier to debug and easier on the other users on your system. as an example of how easy it can be, given the appropriate libraries, implementing a 10th edition netb server requires about 350-400 lines of code if the underlying base looks something like a unix filesystem.

once you understand what is going on, and if you need the speed etc., then you should plug the code into your variant of the [gv...]node stuff.