[comp.emacs] GNUemacs on SUNS: full partitions and NFS timeouts

murthy@arvak (Chet Murthy) (02/08/88)

We run GNUemacs version 18.41 here, and we have been experiencing problems with
writing files to full partitions, or to NFS servers which can timeout.

SYNOPSIS:
	When I run on a SUN-3, if the partition is full, or the NFS write
	times out, or perhaps the create does, then GNUemacs will not
	report this error all the time.  This results in truncated (to zero
	length) or nonexistent files, and on a heavily loaded Ethernet,
	it can result in all versions of the file being lost.  Has
	anybody else out there seen this happen?


	--chet--
In Real Life:		Chet Murthy
ARPA:			murthy@svax.cs.cornell.edu
SnailMail:		Chet Murthy
			North Woods Apts #20-2A
			700 Warren Road
			Ithaca, NY 14850
Office:			4162 Upson (607) 255-2219
MaBellNet:		(607)-257-2542

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (02/08/88)

murthy@arvak writes:
   When I run on a SUN-3, if the partition is full, or the NFS
   write times out, or perhaps the create does, then GNUemacs
   will not report this error all the time.  This results in
   truncated (to zero length) or nonexistent files, and on a
   heavily loaded Ethernet, it can result in all versions of
   the file being lost.  Has anybody else out there seen this
   happen?

Try mounting your filesystems hard.  (The option goes in the mount
options, /etc/fstab's 4th field.)
With "hard" mounting, your workstation will wait forever for the
server to come back to life before returning from the write() or
close().  It sounds like you're using soft mounts, which let you
keep doing things in the face of a server's failure, but lose the
guarantee that writes are done as the program requested.
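As a rough aid for acting on this, here is a hedged C sketch that lists the NFS entries in an fstab-format file whose mount options include "soft". It assumes the <mntent.h> interface found on glibc systems (setmntent, getmntent, hasmntopt, endmntent); the function name report_soft_nfs is hypothetical. It only reports; switching an entry to "hard" is still done by editing the file.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes <mntent.h> under strict -std */
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every NFS entry in an fstab-format file that
   carries the "soft" option.  Returns the number found, or -1 if the file
   cannot be opened. */
int report_soft_nfs(const char *fstab_path)
{
    FILE *fp = setmntent(fstab_path, "r");
    if (fp == NULL) {
        perror(fstab_path);
        return -1;
    }

    int soft = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* Match "nfs" and variants such as "nfs4"; skip local filesystems. */
        if (strncmp(m->mnt_type, "nfs", 3) != 0)
            continue;
        /* hasmntopt() matches whole comma-separated options only. */
        if (hasmntopt(m, "soft") != NULL) {
            printf("%s on %s is mounted soft (options: %s)\n",
                   m->mnt_fsname, m->mnt_dir, m->mnt_opts);
            soft++;
        }
    }
    endmntent(fp);
    return soft;
}
```

For example, calling report_soft_nfs("/etc/fstab") from a small main() prints each soft-mounted NFS entry and returns the count. Entries that name neither "soft" nor "hard" take the system default and are not reported.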
-=-
Karl

hrp@hall.cray.com (Hal Peterson) (02/09/88)

In article <13252@cornell.UUCP>, murthy@arvak (Chet Murthy) writes:
> 
> We run GNUemacs version 18.41 here, and we have been experiencing problems with
> writing files to full partitions, or to NFS servers which can timeout.

We ran into this, too.  I think it's either a bug in SunOS or a problem
in the mapping from NFS to UNIX(R) system calls.  What should happen is
this: the write(2) call to save the buffer returns -1 with ENOSPC.  Instead,
the write call says that everything is OK, and the fsync(2) call gives
the ENOSPC error.  This is especially interesting since, according to the
man page, ENOSPC isn't one of the choices for fsync.  Anyway, the GNU
Emacs code checks the return value of the write, but not of the fsync,
resulting in the observed symptom.

Long term fix:  fix SunOS.  In the meantime, though, it wouldn't hurt to
check the return value from fsync.
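The interim fix of checking fsync() as well as write() might look like the C sketch below. It is a hedged illustration, not GNU Emacs's actual save code: save_buffer is a hypothetical name, and the sketch assumes a POSIX-style system providing open, write, fsync and close. It also checks close(), because on NFS an error from an earlier write may be reported only when the file is closed.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes fsync() under strict -std */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical save routine: write LEN bytes of BUF to PATH.
   Returns 0 on success, or -1 with errno describing the failure.
   Every call that can report an error is checked, because with NFS
   write-behind an ENOSPC or EIO may surface only at fsync() or close(). */
int save_buffer(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* write() may transfer fewer bytes than requested (for example when a
       local partition fills up part-way), so loop until everything is out. */
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        buf += n;
        len -= (size_t)n;
    }

    /* The check that, per this thread, GNU Emacs omits: a deferred
       NFS write error can be reported here even though write() succeeded. */
    if (fsync(fd) < 0)
        goto fail;

    /* close() can also report a deferred error; don't ignore it. */
    if (close(fd) < 0)
        return -1;
    return 0;

fail:
    {
        int saved = errno; /* keep close() from clobbering the real error */
        close(fd);
        errno = saved;
    }
    return -1;
}
```

A caller would report failure with something like `if (save_buffer(path, text, strlen(text)) < 0) perror(path);`. Note that O_TRUNC empties the old file before anything is written, so a failure part-way through still leaves a truncated file; the sketch only makes that failure visible instead of silent.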
-- 
Hal Peterson / Cray Research / 1345 Northland Dr. / Mendota Hts, MN  55120
hrp%hall.CRAY.COM@umn-rei-uc.ARPA	ihnp4!cray!hrp	    (612) 681-5884

schwartz@gondor.cs.psu.edu (Scott E. Schwartz) (02/11/88)

In article <3920@hall.cray.com> hrp@hall.cray.com (Hal Peterson) writes:
>We ran into this, too.  I think it's either a bug in SunOS or a problem
>in the mapping from NFS to UNIX(R) system calls.  What should happen is
>this: the write(2) call to save the buffer returns -1 with ENOSPC.  Instead,
>the write call says that everything is OK, and the fsync(2) call gives
>the ENOSPC error. 

I hope someone has written this up and sent a bug report to hotline@sun.com.

-- Scott Schwartz            schwartz@gondor.cs.psu.edu

egisin@watmath.waterloo.edu (Eric Gisin) (02/12/88)

In article <3920@hall.cray.com>, hrp@hall.cray.com (Hal Peterson) writes:
> We ran into this, too.  I think it's either a bug in SunOS or a problem
> in the mapping from NFS to UNIX(R) system calls.  What should happen is
> this: the write(2) call to save the buffer returns -1 with ENOSPC.  Instead,
> the write call says that everything is OK, and the fsync(2) call gives
> the ENOSPC error.  This is especially interesting since, according to the
> man page, ENOSPC isn't one of the choices for fsync.  Anyway, the GNU
> Emacs code checks the return value of the write, but not of the fsync,
> resulting in the observed symptom.

Emacs is broken if it doesn't check the status of fsync();
fsync() can return EIO even on a non-NFS system.

write() doesn't return ENOSPC on an NFS file system because of NFS's write-behind:
write() doesn't wait for the RPC reply that would contain the error.
On a local file system, the check for ENOSPC is made when write() allocates a block.

Correctly checking for all write errors on an NFS or non-NFS system
probably requires calling fsync() before close() and checking its return value.
> 
> Long term fix:  fix SunOS.  In the meantime, though, it wouldn't hurt to
> check the return value from fsync.