merritt@BRL.MIL (Don Merritt) (02/15/89)
>I remember seeing a note from someone that said they had hacked NFS
>so that client nfs requests followed through mounted file systems.
>[ie: just have to mount the root on your file server to have all
>the filesystems available].
>
>Can anyone tell me how...
>
>JASON

That work was done here at BRL by Doug Kingston.  Here is Doug's
description of how to do it.

=============================================================================
Date:     Fri, 5 Dec 86 22:12:03 EST
From:     NFS Functionality Enhancement Committee <dpk@brl.arpa>
To:       Sun-Spots@rice.edu, unix-wizards@brl.arpa
Subject:  Updated NFS Change to merge filesystems

(This is an updated version of my previous letter.  We think it's finished.)

We are just beginning to use NFS around BRL and I have been amazed at
how little thought seems to have been put into using NFS in a large
collection of large hosts.  Many of our machines have 8 to 16 disk
segments mounted, and almost as many physical disks, so there is little
that can be done to lower the number of mounted partitions.  We wish to
make every file system available from every system (or a close
approximation of this).  If we were to use the normal SUN NFS
implementation, we would have mount tables with 100 to 200 mounted
filesystems.  This is a nightmare.

I like to sleep, so I have made the following change to nfs/nfs_server.c
and nfs/nfs_vnodeops.c, both part of the NFS-related kernel source.  The
effect of this change is to make a tree of mounted local file systems
appear as a single homogeneous file system to remote systems that mount
the root of such a tree.  Mount points are invisibly followed as long as
they go to a file system of the same type (which in this case is local).
The restriction on the same type of file system is necessary to prevent
file system loops.  When/If more local file system types are supported,
the "if" below would have to be made smarter.
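
The mount-point following described above can be pictured with a small
user-space model.  This is only a sketch under stated assumptions, not
the kernel code: toy_vfs, toy_vnode, cross_down and cross_up are
hypothetical names, an integer fstype stands in for the comparison of
vfs_op pointers, and locking against a concurrent unmount is left out.

```c
#include <stddef.h>

struct toy_vnode;

/* Toy stand-in for a mounted file system (hypothetical, not struct vfs). */
struct toy_vfs {
    int fstype;                 /* stands in for comparing vfs_op pointers */
    struct toy_vnode *root;     /* root vnode of this file system */
    struct toy_vnode *covered;  /* vnode this file system is mounted on */
};

/* Toy stand-in for a vnode (hypothetical, not struct vnode). */
struct toy_vnode {
    struct toy_vfs *vfsp;          /* file system this vnode lives in */
    struct toy_vfs *mounted_here;  /* file system mounted here, or NULL */
    int is_root;                   /* nonzero for the root of its file system */
};

/*
 * After a lookup: while something of the same type is mounted on the
 * result, step down to the root of the mounted file system.  A mount
 * of a different type is not followed, so the mount point itself is
 * returned.
 */
static struct toy_vnode *cross_down(struct toy_vnode *vp)
{
    while (vp->mounted_here != NULL &&
           vp->mounted_here->fstype == vp->vfsp->fstype)
        vp = vp->mounted_here->root;
    return vp;
}

/*
 * Before looking up "..": while the directory is the root of a mounted
 * file system, step up to the vnode it covers.  The NULL check is this
 * sketch's own guard for the top-level root.
 */
static struct toy_vnode *cross_up(struct toy_vnode *dvp)
{
    while (dvp->is_root && dvp->vfsp->covered != NULL)
        dvp = dvp->vfsp->covered;
    return dvp;
}
```

In this model, a lookup that lands on a directory with a same-type file
system mounted on it yields that file system's root, while a directory
covered by a different type of file system is returned unchanged.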

The statfs operation is somewhat meaningless with this change since it
will only return the stats for the file system you mounted and not any
file systems under it.

The change to nfs_vnodeops.c is to improve the information content of
the faked-up dev entry in a stat structure of a remote file.  The key
problem is that the dev entry is still a short, making it very hard to
make useful dev entries for remote files.  My ad hoc scheme allows for
up to 31 remote mounts (hosts) until things fall apart.  st_dev should
really be at least a long.  Ideally it would be an object containing an
fsid and a machineid.  Maybe on the next version...

The end result of all this is that you can now make all the file systems
on a server system available by simply mounting the root file system
(actually directory, e.g. mount -t nfs -o bg,soft host:/ /n/host).  We
have chosen to create a directory /n and to make a directory in it for
each system we wish to make available.  We then mount the root of each
system as /n/hostA, /n/hostB, ...

It is quite possible some of you may be able to suggest some
improvements to this implementation, such as ways to make it conditional
or to better handle the statfs data.  For us, this change alone is a big
step forward in making NFS usable in a large cluster of independent
super-mini computers (Vaxen, Goulds, Alliants) as well as workstations
(Iris's, Suns).  Comments welcome.

				-Doug-

Encl.	Diff of /sys/nfs/nfs_server.c and nfs_vnodeops.c.
	Line numbers are from the Gould version of the SUN 3.0 sources;
	your numbers may vary.

*** /tmp/,RCSt1000202	Mon Jan 26 23:30:03 1987
--- nfs_server.c	Mon Jan 26 23:03:44 1987
***************
*** 282,288 ****
--- 282,306 ----
  		return;
  	}
  
+ #ifdef BRL
  	/*
+ 	 * Handle ".." special case.
+ 	 * If this vnode is the root of a mounted
+ 	 * file system, then replace it with the
+ 	 * vnode which was mounted on so we take the
+ 	 * .. in the other file system.
+ 	 */
+ 	if (da->da_name[0]=='.' && da->da_name[1]=='.' && da->da_name[2]==0) {
+ 		while (dvp->v_flag & VROOT) {
+ 			vp = dvp->v_vfsp->vfs_vnodecovered;
+ 			VN_HOLD(vp);
+ 			VN_RELE(dvp);
+ 			dvp = vp;
+ 		}
+ 	}
+ #endif BRL
+ 
+ 	/*
  	 * do lookup.
  	 */
  	error = VOP_LOOKUP(dvp, da->da_name, &vp, u.u_cred);
***************
*** 289,294 ****
--- 307,345 ----
  	if (error) {
  		vp = (struct vnode *)0;
  	} else {
+ #ifdef BRL
+ 		register struct vfs *vfsp;
+ 		struct vnode *tvp;
+ 
+ 		/*
+ 		 * The following allows the exporting of contiguous
+ 		 * collections of local file systems.  -DPK-
+ 		 *
+ 		 * If this vnode is mounted on, and the mounted VFS
+ 		 * is the same as the current one (local), then we
+ 		 * transparently indirect to the vnode which
+ 		 * is the root of the mounted file system.
+ 		 * Before we do this we must check that an unmount is not
+ 		 * in progress on this vnode.  This maintains the fs status
+ 		 * quo while a possibly lengthy unmount is going on.
+ 		 */
+ 	mloop:
+ 		while ((vfsp = vp->v_vfsmountedhere) &&
+ 		    vfsp->vfs_op == vp->v_vfsp->vfs_op) {
+ 			while (vfsp->vfs_flag & VFS_MLOCK) {
+ 				vfsp->vfs_flag |= VFS_MWAIT;
+ 				sleep((caddr_t)vfsp, PVFS);
+ 				goto mloop;
+ 			}
+ 			error = VFS_ROOT(vp->v_vfsmountedhere, &tvp);
+ 			VN_RELE(vp);
+ 			if (error) {
+ 				vp = (struct vnode *)0;
+ 				goto bad;
+ 			}
+ 			vp = tvp;
+ 		}
+ #endif BRL
  		error = VOP_GETATTR(vp, &va, u.u_cred);
  		if (!error) {
  			vattr_to_nattr(&va, &dr->dr_attr);
***************
*** 295,305 ****
  			error = makefh(&dr->dr_fhandle, vp);
  		}
  	}
  	dr->dr_status = puterrno(error);
! 	if (vp) {
  		VN_RELE(vp);
! 	}
! 	VN_RELE(dvp);
  #ifdef NFSDEBUG
  	dprint(nfsdebug, 5, "rfs_lookup: returning %d\n", error);
  #endif
--- 346,357 ----
  			error = makefh(&dr->dr_fhandle, vp);
  		}
  	}
+ bad:
  	dr->dr_status = puterrno(error);
! 	if (vp)
  		VN_RELE(vp);
! 	if (dvp)
! 		VN_RELE(dvp);
  #ifdef NFSDEBUG
  	dprint(nfsdebug, 5, "rfs_lookup: returning %d\n", error);
  #endif
*** /tmp/,RCSt1000210	Mon Jan 26 23:30:18 1987
--- nfs_vnodeops.c	Fri Jan  2 17:51:54 1987
***************
*** 579,585 ****
--- 578,590 ----
  	 */
  	rp = vtor(vp);
  	nattr_to_vattr(&rp->r_nfsattr, vap);
+ #ifdef BRL
+ 	/* a better better kludge ??? */
+ 	vap->va_fsid &= 0x7ff;
+ 	vap->va_fsid |= ((vtomi(vp)->mi_mntno+1)<<11);
+ #else
  	vap->va_fsid = 0xff00 | vtomi(vp)->mi_mntno;
+ #endif BRL
  	if (rp->r_size < vap->va_size) {
  		rp->r_size = vap->va_size;
  	} else if (vap->va_size < rp->r_size) {
***************
*** 600,606 ****
--- 605,617 ----
  	 * an dev from the mount number and an arbitrary major
  	 * number 255.
  	 */
+ #ifdef BRL
+ 	/* a better better kludge ??? */
+ 	vap->va_fsid &= 0x7ff;
+ 	vap->va_fsid |= ((vtomi(vp)->mi_mntno+1)<<11);
+ #else
  	vap->va_fsid = 0xff00 | vtomi(vp)->mi_mntno;
+ #endif BRL
  	if (rp->r_size < vap->va_size) {
  		rp->r_size = vap->va_size;
  	} else if (vap->va_size < rp->r_size) {
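
The va_fsid change in nfs_vnodeops.c can be read as bit packing: the low
11 bits keep the fsid reported by the server, and the top 5 bits of the
dev hold the client mount number plus one, which is where the limit of
31 remote mounts comes from.  Below is a minimal user-space sketch of
that arithmetic, assuming the dev is a 16-bit quantity as the letter
describes.  The function names and sample values are hypothetical and
do not appear in the kernel source.

```c
/*
 * Hypothetical helper mirroring the BRL branch of the diff: keep the
 * low 11 bits of the server-side fsid and place (mntno + 1) in the top
 * 5 bits.  The result is truncated to 16 bits, modelling a short dev.
 */
static unsigned short pack_fsid(unsigned short server_fsid, int mntno)
{
    unsigned int fsid = server_fsid & 0x7ffu;

    fsid |= (unsigned int)(mntno + 1) << 11;
    return (unsigned short)(fsid & 0xffffu);
}

/* Hypothetical helper for the non-BRL branch: major 255, minor = mntno. */
static unsigned short stock_fsid(int mntno)
{
    return (unsigned short)(0xff00u | (unsigned int)mntno);
}

/* Recover the (mntno + 1) field from a packed dev. */
static int mount_slot(unsigned short packed)
{
    return packed >> 11;
}
```

With a sample server fsid of 0x0102, mount number 2 packs to 0x1902 and
mount number 30 packs to 0xf902.  Mount number 31 would need the value
32 in a 5-bit field, so the mount bits are lost and the result falls
back to 0x0102, which is the point at which the scheme breaks down.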