CALT@SLACVM.SLAC.STANFORD.EDU (06/18/91)
On my machines, the 530s, a number of filesystems are exported for mounting by a number of other machines here. From experience, I have learned that mounting NFS filesystems with the hard/foreground options may cause the machines to hang, while the soft/background options seem to work OK. Here is the problem: if some machine DOES mount my exported filesystems with the hard/foreground options, it may cause my machines to hang. Is there any way to configure my exported filesystems so that only the machines which use the soft/background options are allowed to use them? Is that possible? Or is there some other way to solve the problem? Thanks in advance for any advice!

Ching Shih
shih@cithex.bitnet
shih@cithe1.cithep.caltech.edu
marc@ekhomeni.austin.ibm.com (Marc Wiz) (06/19/91)
In article <91169.000329CALT@SLACVM.SLAC.STANFORD.EDU>, CALT@SLACVM.SLAC.STANFORD.EDU writes:
> From experience, I have learned that mounting NFS filesystems with the
> hard/foreground options may cause the machines to hang, while the
> soft/background options seem to work OK.

To put it mildly, this is not a good thing to do. Remember that if you mount the filesystem soft, the client process will get an error after three retries. If your application can handle this, fine, but I have to wonder how many applications cannot. If you care about your data, I recommend hard mounts. At least when the server/network comes back up, the data will be written/read to/from the server.

> Here is the problem: if some machine DOES mount my exported filesystems
> with the hard/foreground options, it may cause my machines to hang. Is
> there any way to configure my exported filesystems so that only the
> machines which use the soft/background options are allowed to use them?

The mount options are controlled from the client. The server has no control over this. What are you trying to accomplish?

Marc Wiz                        MaBell (512)823-4780
NFS/NIS Change Team
Yes, that really is my last name. The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
mrl@uai.com (Mark R. Ludwig) (06/19/91)
In article <8567@awdprime.UUCP>, marc@ekhomeni (Marc Wiz) writes:
>In article <91169.000329CALT@SLACVM.SLAC.STANFORD.EDU>,
>CALT@SLACVM.SLAC.STANFORD.EDU writes:
>> From experience, I have learned that mounting NFS filesystems with the
>> hard/foreground options may cause the machines to hang, while the
>> soft/background options seem to work OK.
>
>To put it mildly, this is not a good thing to do. Remember that if you
>mount the filesystem soft, the client process will get an error after
>three retries. If your application can handle this, fine, but I have to
>wonder how many applications cannot. If you care about your data, I
>recommend hard mounts. At least when the server/network comes back up,
>the data will be written/read to/from the server.

I agree fully. At least one of the Sun administration manuals states it bluntly: if you are mounting the NFS filesystem read/write, you should mount it hard. To do otherwise is to risk corrupted files. If you have *very* intelligent applications manipulating the files you may be able to disregard this warning, but I dare say the average Unix utility is not in that category. Furthermore, why would you want to? Since your application probably really wants to write the file it was trying to write when the server went silent, it has to keep retrying until the server responds. With ``hard'', the system does that for you.

The second part I want to address is foreground versus background. We use the ``bg'' option routinely, because our NFS partitions are not required for the system to operate; this lets the system finish multi-user startup without mounting all the NFS partitions, which are only needed for certain applications to run. If a partition is required for the system itself, you probably must mount it in the foreground.

>> Here is the problem: if some machine DOES mount my exported filesystems
>> with the hard/foreground options, it may cause my machines to hang. Is
>> there any way to configure my exported filesystems so that only the
>> machines which use the soft/background options are allowed to use them?

Come again? You're saying that the *server* is hanging because the *client* mounts the NFS filesystem hard? I've never seen that happen.

>What are you trying to accomplish?

Right. This is the first question we have to ask. It helps to get answers when you explain what you really want to do, and the circumstances which wedged you into the corner. Maybe then we can get you centered in the room.
--
INET: mrl@uai.com   UUCP: uunet!uaisun4!mrl   PSTN: +1 213 822 4422
USPS: 7740 West Manchester Boulevard, Suite 208, Playa del Rey, CA 90293
WANT: Succinct, insightful statement to occupy this space. Inquire within.
jona@iscp.Bellcore.COM (Jon Alperin) (06/19/91)
In article <1991Jun19.154830.17276@uai.com>, mrl@uai.com (Mark R. Ludwig) writes:
|> In article <8567@awdprime.UUCP>, marc@ekhomeni (Marc Wiz) writes:
|> >To put it mildly, this is not a good thing to do. Remember that if you
|> >mount the filesystem soft, the client process will get an error after
|> >three retries. If your application can handle this, fine, but I have to
|> >wonder how many applications cannot. If you care about your data, I
|> >recommend hard mounts. At least when the server/network comes back up,
|> >the data will be written/read to/from the server.
|>
|> I agree fully. At least one of the Sun administration manuals states
|> it bluntly: if you are mounting the NFS filesystem read/write, you
|> should mount it hard. To do otherwise is to risk corrupted files.
|> Since your application probably really wants to write the file it was
|> trying to write when the server went silent, it has to keep retrying
|> until the server responds. With ``hard'', the system does that for you.

Hey... maybe this explains why, when I save a file under vi that is kept on an NFS partition, vi tells me that it was able to save the file, but because the real physical disk was full I end up with a zero-length file (and lose all my work).
--
Jon Alperin
Bell Communications Research
---> Internet: jona@iscp.bellcore.com
---> Voicenet: (908) 699-8674
---> UUNET: uunet!bcr!jona
* All opinions and stupid questions are my own *
jackv@turnkey.tcc.com (Jack F. Vogel) (06/20/91)
In article <1991Jun19.162331.25505@bellcore.bellcore.com> jona@iscp.Bellcore.COM (Jon Alperin) writes:
>In article <1991Jun19.154830.17276@uai.com>, mrl@uai.com (Mark R. Ludwig) writes:
[ stuff about using the 'hard' mount for data integrity deleted... ]
>
> Hey... maybe this explains why, when I save a file under vi that is
>kept on an NFS partition, vi tells me that it was able to save the file,
>but because the real physical disk was full I end up with a zero-length
>file (and lose all my work).

No, I don't believe that mounting the filesystem 'hard' will prevent this from happening. The reason it can happen is that the NFS client is doing a bawrite() (an asynchronous write), so it doesn't get an immediate error; the inode is just marked in error. If you wrote enough data that multiple calls to bawrite() were necessary, the error would be noticed and vi would tell you.

In the NFS in 4.3BSD-Reno there is a mount option, 'synchronous', that solves this by forcing the client to use bwrite(), so you are guaranteed to see the error; of course, you are going to take somewhat of a performance hit by using it. I don't know whether the current SunOS has such an option. I also don't know what level of NFS the 6000 is based on, so it could have such an option for all I know; check the man page for mount.

Disclaimer: I'm a kernel hacker, not a company spokesweenie :-}.
--
Jack F. Vogel                   jackv@locus.com
AIX370 Technical Support           - or -
Locus Computing Corp.           jackv@turnkey.TCC.COM
marc@ekhomeni.austin.ibm.com (Marc Wiz) (06/20/91)
In article <1991Jun19.172354.9964@turnkey.tcc.com>, jackv@turnkey.tcc.com (Jack F. Vogel) writes:
> > Hey... maybe this explains why, when I save a file under vi that is
> >kept on an NFS partition, vi tells me that it was able to save the file,
> >but because the real physical disk was full I end up with a zero-length
> >file (and lose all my work).
>
> No, I don't believe that mounting the filesystem 'hard' will prevent this
> from happening. The reason it can happen is that the NFS client is doing
> a bawrite() (an asynchronous write), so it doesn't get an immediate error;
> the inode is just marked in error. If you wrote enough data that multiple
> calls to bawrite() were necessary, the error would be noticed and vi
> would tell you.
>
> In the NFS in 4.3BSD-Reno there is a mount option, 'synchronous', that
> solves this by forcing the client to use bwrite(), so you are guaranteed
> to see the error; of course, you are going to take somewhat of a
> performance hit by using it. I don't know whether the current SunOS has
> such an option. I also don't know what level of NFS the 6000 is based
> on, so it could have such an option for all I know; check the man page
> for mount.

Mounting the file system hard will not prevent the problem. An application will not "see" the error until an fsync or close is done on the file. Probably what is happening with vi is that it is not checking the return from close. I can't remember ever seeing someone check the return from close in all the code that I have looked at. (I'm sure there are folks out there who have looked at a lot more code than I have. :-)

NFS for the 6000 does not have a sync option for mount. There is a way to force synchronous operations on a file, though. First of all, opening the file with O_SYNC does NOT do it. (I don't know whether this is a problem or not.) The way to do it is to open the file and then obtain a lock on a piece of the file. When NFS sees that a lock has been obtained, it turns off caching for that file. One thing to point out is that you would take one heck of a performance hit if you were running synchronously.

One could do an fsync after every write, but that also means a performance hit, not to mention that you have to change your application. I'm sure there are other ways to handle this without impacting performance too much. One way (IMHO) is to have your application hold on to its buffers between fsync calls. If you get an error from fsync, correct the problem and then reissue the fsync for that batch of writes. This means the application will need some way of knowing when to resume (i.e. when the problem, a.k.a. file system full, has been corrected).

As an aside, does anyone out there check the return from close in their programs?

Marc Wiz                        MaBell (512)823-4780
NFS/NIS Change team
Yes, that really is my last name. The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
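A minimal C sketch of the locking approach described above, for illustration only: the pathname and data are made up, and whether taking the lock really disables client-side caching depends on the NFS client implementation (the claim here is specific to the AIX NFS client as described in the post). The point is simply to take a lock on part of the file before writing, then fsync and close with every return value checked.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
        const char *path = "/nfs/data/example.out";  /* hypothetical NFS path */
        char buf[] = "important data\n";
        struct flock fl;
        int fd;

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Lock the first byte of the file; per the discussion above, the
         * AIX NFS client stops caching a file once a lock is held on it. */
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 1;
        if (fcntl(fd, F_SETLKW, &fl) < 0) {
            perror("fcntl(F_SETLKW)");
            return 1;
        }

        if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
            perror("write");
            return 1;
        }

        /* fsync pushes the data to the server now, so a "file system full"
         * condition shows up here instead of being silently lost. */
        if (fsync(fd) < 0) {
            perror("fsync");
            return 1;
        }

        if (close(fd) < 0) {        /* yes, close can fail over NFS */
            perror("close");
            return 1;
        }
        return 0;
    }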
teexand@ioe.lon.ac.uk (Andrew Dawson) (06/20/91)
In <1991Jun19.162331.25505@bellcore.bellcore.com> jona@iscp.Bellcore.COM (Jon Alperin) writes:

> Hey... maybe this explains why, when I save a file under vi that is
>kept on an NFS partition, vi tells me that it was able to save the file,
>but because the real physical disk was full I end up with a zero-length
>file (and lose all my work).

This sounds like something we have been discussing with IBM recently. I think essentially the client is caching requests, so although write() returns successfully, the data hasn't been written to disk. Your application may pick up an error if fsync is called, and the close should also fail (not that many applications check this). However, I'm not sure even this much worked until we'd applied an APAR fix.
--
#include <std_disclaimer.h> /* My brain was swiss-cheesed when I wrote this */
JANET: andrew@uk.ac.ucl.sm.uxm    UUCP/EARN/BITNET: andrew@uxm.sm.ucl.ac.uk
INTERNET: andrew%uxm.sm.ucl.ac.uk@nsfnet-relay.ac.uk
"Leapers do it with assistance from neurological holograms"
jackv@turnkey.tcc.com (Jack F. Vogel) (06/20/91)
In article <8623@awdprime.UUCP> marc@aixwiz.austin.ibm.com writes:
>In article <1991Jun19.172354.9964@turnkey.tcc.com>,
>jackv@turnkey.tcc.com (Jack F. Vogel) writes:
[ stuff about the 'hard' nfs mount option not preventing data loss deleted... ]
|Mounting the file system hard will not prevent the problem. An
|application will not "see" the error until an fsync or close is done on
|the file. Probably what is happening with vi is that it is not checking
|the return from close. I can't remember ever seeing someone check the
|return from close in all the code that I have looked at.

Right, and it's not just an issue of user-written applications not doing this; it's that every application "command" in the system, like vi, doesn't do it either. Marc, you and I both know where this issue is coming from (like a certain customer complaint :-), and this is BOGUS. The user is free to write code to check the return from close should he like, but rewriting vi?!? No, the right answer as far as I am concerned is what was done in 4.3BSD-Reno: provide a synchronous option to mount!!

>As an aside, does anyone out there check the return from close in their
>programs?

We could probably generate a month-long thread of arguments in comp.unix.wizards on the appropriateness of this :-} :-}!

Disclaimer: I hack the kernel, I don't speak for the company.
--
Jack F. Vogel                   jackv@locus.com
AIX370 Technical Support           - or -
Locus Computing Corp.           jackv@turnkey.TCC.COM
chip@tct.com (Chip Salzenberg) (06/20/91)
According to marc@aixwiz.austin.ibm.com:
>As an aside, does anyone out there check the return from close in their
>programs?

Yes. In writing the mail delivery program "Deliver", I was (and still am) paranoid about checking all system calls for which I have some reasonable action in case of failure.
--
Chip Salzenberg at Teltronics/TCT     <chip@tct.com>, <uunet!pdn!tct!chip>
"You can call Usenet a democracy if you want to.  You can call it a
totalitarian dictatorship run by space aliens and the ghost of Elvis.
It doesn't matter either way." -- Dave Mack
jona@iscp.Bellcore.COM (Jon Alperin) (06/20/91)
In article <8623@awdprime.UUCP>, marc@ekhomeni.austin.ibm.com (Marc Wiz) writes:
|> In article <1991Jun19.172354.9964@turnkey.tcc.com>,
|> jackv@turnkey.tcc.com (Jack F. Vogel) writes:

<from my original post>
|> > > Hey... maybe this explains why, when I save a file under vi that is
|> > >kept on an NFS partition, vi tells me that it was able to save the
|> > >file, but because the real physical disk was full I end up with a
|> > >zero-length file (and lose all my work).

<Marc responds....>
|> This means the application will need some way of knowing when to
|> resume (i.e. when the problem, a.k.a. file system full, has been
|> corrected).

Yes, but has the "file system full" bug been fixed? This is becoming an increasing problem in large NFS server/workstation environments, since the editing is done on the workstation but the file gets trashed on the server. Furthermore, the server copy is completely trashed (size = 0) rather than just losing the changes made in that editing session.
--
Jon Alperin
Bell Communications Research
---> Internet: jona@iscp.bellcore.com
---> Voicenet: (908) 699-8674
---> UUNET: uunet!bcr!jona
* All opinions and stupid questions are my own *
jona@iscp.Bellcore.COM (Jon Alperin) (06/20/91)
In article <1991Jun20.090136.14351@ioe.lon.ac.uk>, teexand@ioe.lon.ac.uk (Andrew Dawson) writes:
|> This sounds like something we have been discussing with IBM recently. I
|> think essentially the client is caching requests, so although write()
|> returns successfully, the data hasn't been written to disk. Your
|> application may pick up an error if fsync is called, and the close
|> should also fail (not that many applications check this). However, I'm
|> not sure even this much worked until we'd applied an APAR fix.

So... what's the APAR #?
--
Jon Alperin
Bell Communications Research
---> Internet: jona@iscp.bellcore.com
---> Voicenet: (908) 699-8674
---> UUNET: uunet!bcr!jona
* All opinions and stupid questions are my own *
marc@ekhomeni.austin.ibm.com (Marc Wiz) (06/20/91)
> Yes, but has the "file system full" bug been fixed? This is becoming
> an increasing problem in large NFS server/workstation environments,
> since the editing is done on the workstation but the file gets trashed
> on the server. Furthermore, the server copy is completely trashed
> (size = 0) rather than just losing the changes made in that editing
> session.

The problem is being addressed, and someone is aware that the server copy of the file is being trashed.

Marc Wiz                        MaBell (512)823-4780
Yes, that really is my last name. The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
jpe@egr.duke.edu (John P. Eisenmenger) (06/21/91)
From article <91169.000329CALT@SLACVM.SLAC.STANFORD.EDU>, by CALT@SLACVM.SLAC.STANFORD.EDU:
> From experience, I have learned that mounting NFS filesystems with the
> hard/foreground options may cause the machines to hang, while the
> soft/background options seem to work OK.

Hmm... I usually mount hard,bg,intr and have had no problems. In general I use hard mounts for read-write filesystems and soft mounts for read-only ones. Putting the mount in the background will prevent the client from waiting indefinitely for the mount to complete...

-John
marc@ekhomeni.austin.ibm.com (Marc Wiz) (06/21/91)
In article <1991Jun20.141138.7555@bellcore.bellcore.com>, jona@iscp.Bellcore.COM (Jon Alperin) writes:
> So... what's the APAR #?

And the winning numbers are: ix18846 and ix20007.

Marc Wiz                        MaBell (512)823-4780
Yes, that really is my last name. The views expressed are my own.
marc@aixwiz.austin.ibm.com
or uunet!cs.utexas.edu!ibmchs!auschs!ekhomeni.austin.ibm.com!marc
jfh@rpp386.cactus.org (John F Haugh II) (06/24/91)
In article <1991Jun20.090136.14351@ioe.lon.ac.uk>, teexand@ioe.lon.ac.uk (Andrew Dawson) writes:
> This sounds like something we have been discussing with IBM recently. I
> think essentially the client is caching requests, so although write()
> returns successfully, the data hasn't been written to disk. Your
> application may pick up an error if fsync is called, and the close
> should also fail (not that many applications check this). However, I'm
> not sure even this much worked until we'd applied an APAR fix.

There have been problems with "vi" and other applications which do not check the return status of "close" when using NFS on certain file systems. For example, I worked on an APAR where "vi" of a file on an NFS-mounted MVS file system failed if the file was extended. The solution was, as I recall, to have fsync() called and to check its return status.

I'd suggest that anyone who finds a problem where a program thinks it has exited successfully, but the data wasn't written, open an APAR. The cause will probably be very similar.
--
John F. Haugh II        | Distribution to    | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 255-8251 | GEnie PROHIBITED :-)| Domain: jfh@rpp386.cactus.org
"UNIX signals are not interrupts.  Worse, SIGCHLD/SIGCLD is not even a UNIX
signal, it's an abomination." -- Doug Gwyn
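For what it's worth, here is a rough C sketch of the kind of "careful save" the thread keeps coming back to (the pathname and helper name are made up for illustration): write the data, then check fsync() and close(), and report a failed save if either one fails, since over NFS the error may not surface until the data is actually flushed to the server.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Hypothetical save routine: an editor doing this would notice a
     * file-system-full error on the NFS server instead of claiming the
     * file was written. */
    static int
    save_file(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }

        /* Over NFS the error may be deferred, so check fsync()... */
        if (fsync(fd) < 0) {
            close(fd);
            return -1;
        }

        /* ...and check close() too; it can also report the failure. */
        if (close(fd) < 0)
            return -1;

        return 0;
    }

    int
    main(void)
    {
        const char *text = "hello, world\n";

        if (save_file("/nfs/home/example.txt", text, strlen(text)) < 0) {
            perror("save failed");
            return 1;
        }
        return 0;
    }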