[comp.mail.mh] inc loses mail when file system full

steved@longs.LANCE.ColoState.Edu (Steve Dempsey) (05/11/89)

The first time this happened, I was completely aghast when inc said:

     3 empty
     4 empty
     5 empty
     6 empty

After a few other commands, I finally got a `write failed: file system full'.
I had not noticed that my partition filled up, and inc did not bother to tell
me either.  Not only does inc give no error message, but the new mail is
deleted from the mail drop and lost forever.  This is not right!

Before I go mucking around in the code (MH6.6), I wonder if anyone has already
fixed this.

        Steve Dempsey,  Center for Computer Assisted Engineering
  Colorado State University, Fort Collins, CO  80523    +1 303 491 0630
INET: steved@longs.LANCE.ColoState.Edu, dempsey@handel.CS.ColoState.Edu
UUCP: boulder!ccncsu!longs.LANCE.ColoState.Edu!steved, ...!ncar!handel!dempsey

khera@juliet.cs.duke.edu (Vick Khera) (05/11/89)

In article <1875@ccncsu.ColoState.EDU> steved@longs.LANCE.ColoState.Edu (Steve Dempsey) writes:
>The first time this happened, I was completely aghast when inc said:
> ...
>After a few other commands, I finally got a `write failed: file system full'.
> ...
>Before I go mucking around in the code (MH6.6), I wonder if anyone has already
>fixed this.
>
>        Steve Dempsey,  Center for Computer Assisted Engineering

This was a problem back in mh 6.3, but under mh 6.6 inc did indeed warn me
and did not zero out my mail drop when the file system was full.  I am
running under SunOS 3.2 and 4.0 with the following mh config file.

bin	/usr/local/public/bin/mh
debug	off
bboards	off
etc	/usr/local/public/lib/mh
mail	/usr/spool/mail
mandir	/usr/local/public/man
manuals	local
mts	sendmail/smtp
pop	off
options BSD42
options MHE NETWORK
options BERK MHRC WHATNOW OVERHEAD
options BIND RPATHS
ldoptions -O
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
ARPA:	khera@cs.duke.edu		Department of Computer Science
CSNET:	khera@duke			Duke University
UUCP:	{mcnc,decvax}!duke!khera	Durham, NC 27706

steved@longs.LANCE.ColoState.Edu (Steve Dempsey) (05/12/89)

[about inc losing mail when file system is full]


I got a couple of replies saying the problem should have been fixed
after MH6.3, so I looked a bit further and found it is more complicated
than I thought.  I usually use a workstation where my home is remote
mounted with NFS.  This is when I have problems.  But if I use the
machine where the file system physically resides, inc does recognize the
error and aborts, leaving the mail drop intact. Using the other machine
is not an option because it is only a gateway & file server, and most
of our 2000+ users are not permitted to login to the servers.

My MH options are:

version: MH 6.6 #1[UCI] (ics) of Tue May 24 15:51:53 PDT 1988
options: [BSD42] [BSD43] [BERK] [TTYD] [DUMB] [MHE] [NETWORK] [BIND]
         [RPATHS] [ATZ] [SBACKUP='"#"'] [DPOP] [MHRC] [RPOP] [OVERHEAD]
         [WHATNOW] [SENDMTS] [SMTP] [POP] [BPOP]

This is under Ultrix2.2 and MORE/BSD (Mt. Xinu).  So who is at fault,
MH or NFS?


smarks%trantor@Sun.COM (Stuart Marks) (05/13/89)

In article <1884@ccncsu.ColoState.EDU> steved@longs.LANCE.ColoState.Edu (Steve Dempsey) writes:
> I usually use a workstation where my home is remote
> mounted with NFS.  This is when I have problems.  But if I use the
> machine where the file system physically resides, inc does recognize the
> error and aborts, leaving the mail drop intact.

This is just a guess, but make sure that the filesystem on which your
home directory resides is mounted "hard".  That is, the line in your
/etc/fstab should look something like:

	host:/usr/host /usr/host nfs rw,hard,bg,intr 0 0

and not:

	host:/usr/host /usr/host nfs rw,soft,bg,intr 0 0

If the fileserver goes down or there's a network glitch, the hard mount
will wait until the request is satisfied, while the soft mount will
eventually time out and return an error.  The upshot is that soft-mounted
filesystems will return errors much more frequently than hard-mounted ones.
This tends to trip up a lot of programs.  Who knows, maybe "inc" isn't
as bulletproof as we've been led to believe.

s'marks
Stuart Marks			ARPA: smarks@sun.com
Window Systems Group		UUCP: {decwrl,ucbvax}!sun!smarks
Sun Microsystems, Inc.
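
The point about soft mounts returning errors more often comes down to
whether a program checks every write.  Below is a minimal sketch in C, not
taken from the MH sources, of a write loop that treats any failed write(2)
as fatal instead of ignoring it.  The name write_all() is hypothetical, and
treating a zero-byte write as EIO is a choice made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd, retrying short writes and EINTR.
 * Returns 0 on success, -1 on failure with errno describing the error.
 */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;          /* e.g. ENOSPC, EIO, or a soft-mount timeout */
        }
        if (n == 0) {
            errno = EIO;        /* no progress; give up rather than spin */
            return -1;
        }
        p += n;                 /* short write: advance and try the rest */
        len -= (size_t)n;
    }
    return 0;
}
```

A caller that moves mail out of a drop would only truncate the drop after
write_all() has returned 0 for every message and the file has been closed
without error.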

steved@longs.LANCE.ColoState.Edu (Steve Dempsey) (05/28/89)

I posted some weeks ago about my problem with inc on a full file system.
I've since determined that the real problem is with a less than perfect
implementation of NFS.  It seems that a full file system is not caught
until a buffer is flushed or the file is closed.  Here is my solution.

        

-----------------------------------------------------------------------------

MH-Version: 6.6

mh-6.6/conf/MH:
    bin	/usr/new/mh
    debug	off
    bboards	off
    etc	/usr/new/lib/mh
    mail	/usr/spool/mail
    mandir	/usr/man
    manuals	new
    chown	/etc/chown
    mts	sendmail/smtp
    options	BSD42 BSD43 BERK TTYD DUMB
    options	MHE NETWORK BIND RPATHS ATZ
    options	SBACKUP='"\\043"'
    options DPOP MHRC RPOP OVERHEAD WHATNOW SENDMTS SMTP POP BPOP
    ldoptions -ns -O
    ccoptions -O -20
    mf	off
    pop	on
    popbboards	off

System: VAX780,VAX730,MtXinu 4.3BSD;  VAX3600,VS2000,Ultrix2.2,Ultrix3.0
        (at least :-)

Index: mh-6.6/uip/inc.c

Description:
	Inc may fail to detect a full file system, proceed to create empty
	messages, and delete the messages from the mail drop.  This occurs
	primarily on NFS mounted folders.

Repeat-By:
	Place home directory, or MH-Path on an NFS mounted file system.
	Fill the file system to capacity.  Run inc.

Fix:
	Check for error on file close.  Also check the file size before
	and after the close.


*** inc.c.orig	Sat May 27 12:59:12 1989
--- inc.c	Sat May 27 13:03:48 1989
***************
*** 562,568
  			    noisy);
  	    }
  	    else {
! 		(void) fclose (pf);
  		free (cp);
  	    }
  

--- 562,586 -----
  			    noisy);
  	    }
  	    else {
! 
! 		int beforesize;
! 		struct stat pfsb;
! 
! 		if (fstat(fileno(pf),&pfsb))
! 		    adios(cp,"pre_fstat on");
! 
! 		beforesize=pfsb.st_size;
! 
! 		if (fclose (pf) == EOF) {
! 		    adios (cp, "close error on");
! 		}
! 
! 		if (stat(cp,&pfsb))
! 		    adios(cp,"post_stat on");
! 
! 		if (beforesize != pfsb.st_size)
! 		    adios(cp,"file size changed after close on");
! 
  		free (cp);
  	    }
  

leres@ace.ee.lbl.gov (Craig Leres) (06/01/89)

Humm... Steve Dempsey's fix to inc seems overly complex. My fix for
6.5's inc (which should work with the 6.6 version) is to check the
status of ferror() and fclose(); appended is a context diff.

		Craig
------
diff -c -r1.1 inc.c
*** /tmp/,RCSt1a01830   Wed May 31 22:00:18 1989
--- inc.c       Wed May 31 21:49:13 1989
***************
*** 580,586 ****
		(void) map_write (file, pd, 0, start, stop, pos, size, noisy);
	    }
	    else {
!               (void) fclose (pf);
		free (cp);
	    }
  
--- 580,587 ----
		(void) map_write (file, pd, 0, start, stop, pos, size, noisy);
	    }
	    else {
!               if (ferror(pf) || fclose (pf))
!                   adios (file, "write error on");
		free (cp);
	    }

steved@longs.LANCE.ColoState.Edu (Steve Dempsey) (06/02/89)

> Humm... Steve Dempsey's fix to inc seems overly complex. My fix for
> 6.5's inc (which should work with the 6.6 version) is to check the
> status of ferror() and fclose(); appended is a context diff.
> 
> 		Craig

[diff deleted]

I stand corrected.  This is much simpler and works just as well.  It
seems to me, however, that any such errors should be detected on read
or write long before closing the file.  And why was this fix not included
in 6.6?
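
One reason the error can show up so late, at least at the stdio layer: a
short fputs() or fwrite() normally just copies bytes into the stream's
buffer, so it can succeed even when the file system has no room.  The
failure then surfaces at fflush() or fclose().  The sketch below is an
illustration only, not code from MH, and where_write_fails() is a made-up
name.  An NFS client may defer the error further still, past write(2)
itself, which this sketch does not model.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one short line through stdio and report which step failed.
 * Returns  0 if every step succeeded,
 *         -1 if the file could not be opened,
 *          1 if fputs() failed,
 *          2 if fflush() failed,
 *          3 if fclose() failed.
 */
int where_write_fails(const char *path)
{
    FILE *fp;

    assert(path != NULL);

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* Usually lands in the stdio buffer; no disk I/O yet. */
    if (fputs("one short message\n", fp) == EOF) {
        (void) fclose(fp);
        return 1;
    }

    /* Forces the buffered bytes out; a full disk can be reported here. */
    if (fflush(fp) == EOF) {
        (void) fclose(fp);
        return 2;
    }

    /* fclose() flushes too, so its result must be checked as well. */
    if (fclose(fp) == EOF)
        return 3;

    return 0;
}
```

On a system that provides /dev/full (Linux does), calling the function on
that path is a convenient way to see a non-zero result without filling a
real partition.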

        

leres@ace.ee.lbl.gov (Craig Leres) (06/06/89)

Steve Dempsey writes:
> I stand corrected.  This is much simpler and works just as well.  It
> seems to me, however, that any such errors should be detected on read
> or write long before closing the file.  And why was this fix not included
> in 6.6?

It's certainly a good practice to check the return status of each and
every system call. Meanwhile, the mechanism used by ferror() is pretty
good; errors detected by fwrite() or fflush() set a bit that ferror()
can be used to check. And I really don't care when the error is
detected so long as my mail drop doesn't get clobbered.
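
The ferror() mechanism described above can be sketched as a small helper.
This is a variation for illustration, not the patch itself: the name
finish_stream() is hypothetical, and unlike the one-line fix it always
calls fclose(), even when the error bit is already set.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Finish writing a stdio stream.  The stream is always closed.
 * Returns 0 only if no earlier write set the error indicator and
 * fclose() (which performs the final flush) succeeded; -1 otherwise.
 */
int finish_stream(FILE *fp)
{
    int failed;

    assert(fp != NULL);

    /* Sticky bit set by any earlier failed fwrite(), fputs() or fflush(). */
    failed = ferror(fp);

    /* The last buffered bytes are written here, so this can fail too. */
    if (fclose(fp) == EOF)
        failed = 1;

    return failed ? -1 : 0;
}
```

The caller would remove messages from the mail drop only after
finish_stream() returns 0 for the file it just wrote.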

I suppose one reason my fix wasn't included in mh 6.6 is that I didn't
implement it until a few weeks ago.

		Craig