[comp.unix.questions] File system problems

KFL@AI.AI.MIT.EDU (Keith F. Lynch) (07/25/87)

We have had a lot of problems with the Sun 3/260 (3.3) (4.2BSD) we have
had for two months.  The file system has gotten totalled several times.

Our Sun representative has told us that if either disk partition becomes
more than 90% full, it is normal for all files on both partitions to be
trashed without warning.  Is this right?  If it is, is there a way to
prevent more than 90% of a partition from being used?

He also said it could be trashed if a program tries to use too much
memory, for instance with large arrays of real numbers.  Is this true?
If so, how can we prevent this?

He has also said that after doing a restore of a zero level dump,
it is necessary to immediately do another zero level dump or the file
system will get hosed again.  Is this really needed?  If so, can it be
done overnight, to /dev/null?

Please reply to me.  I am not on both of these lists.
								...Keith

barry@mind.UUCP (Barry Lustig) (07/26/87)

In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
>We have had a lot of problems with the Sun 3/260 (3.3) (4.2BSD) we have
>had for two months.  The file system has gotten totaled several times.
>
>Our Sun representative has told us that if either disk partition becomes
>more than 90% full, it is normal for all files on both partitions to be
>trashed without warning.  Is this right?  If it is, is there a way to
>prevent more than 90% of a partition from being used?

That has got to be one of the most pathetic explanations I've ever
heard.  There is no reason for any file to get trashed because the file
system is 90% full.  If it were true, 75% of the filesystems in UNIX
land would be trashed.  90% is an interesting figure though.  In the
Berkeley fast filesystem, only root can allocate the last 10% of a
filesystem (changeable with tunefs(8)).
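The reserve can be observed from user space.  A minimal sketch,
assuming a Unix system where Python's os.statvfs is available; the
names reserved_fraction and report are made up for this illustration.
It compares the free blocks (f_bfree) with the free blocks available
to unprivileged users (f_bavail).

```python
import os


def reserved_fraction(blocks, bfree, bavail):
    """Fraction of the file system that is free but withheld from ordinary users."""
    if blocks == 0:
        # Some pseudo file systems report no blocks at all.
        return 0.0
    return (bfree - bavail) / blocks


def report(path="/"):
    """Return the reserved fraction for the file system holding `path`."""
    st = os.statvfs(path)
    return reserved_fraction(st.f_blocks, st.f_bfree, st.f_bavail)
```

Calling report() on a file system with the default reserve should
give a value near 0.10, although the exact figure depends on how the
file system was created and tuned.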

>He also said it could be trashed if a program tries to use too much
>memory, for instance with large arrays of real numbers.  Is this true?
>If so, how can we prevent this?

More garbage from your Sun representative.

>He has also said that after doing a restore of a zero level dump,
>it is necessary to immediately do another zero level dump or the file
>system will get hosed again.  Is this really needed?  If so, can it be
>done overnight, to /dev/null?

And even more garbage.

Do you by any chance have either a Xylogics 451 controller or a Fuji
SuperEagle?   If so, that is where your problem probably is.  Under very
heavy loads with 2 drives hanging off of it, the 451 has been known to
write data with bits shifted.  The Fuji SuperEagles have also been known
to have problems.

I recommend that you call 1-800-USA-4SUN (Sun's technical support
number) and demand some competent help with your problem.

Barry Lustig
Cognitive Science Lab
Princeton University

jpn@teddy.UUCP (John P. Nelson) (07/27/87)

>In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
>>He has also said that after doing a restore of a zero level dump,
>>it is necessary to immediately do another zero level dump

In article <1052@mind.UUCP> barry@mind.UUCP (Barry Lustig) writes:
>And even more garbage.

Most of what the "Sun representative" is supposed to have said was just
that:  garbage.  Interestingly, this part is NOT garbage.  Oh, not doing
another level 0 dump will not trash the filesystem, but it COULD render
all subsequent incremental backups useless.  To quote from the "restore"
manual page:

     A level zero  dump  must  be  done  after  a  full  restore.
     Because  restore  runs  in user mode, it has no control over
     inode allocation; this means that  restore  repositions  the
     files,  although it does not change their contents.  Thus, a
     full dump must be done to  get  a  new  set  of  directories
     reflecting the new file positions, so that later incremental
     dumps will be correct.

john@xanth.UUCP (John Owens) (07/28/87)

In article <1052@mind.UUCP>, barry@mind.UUCP (Barry Lustig) writes:
> In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
> >He has also said that after doing a restore of a zero level dump,
> >it is necessary to immediately do another zero level dump or the file
> >system will get hosed again.  Is this really needed?  If so, can it be
> >done overnight, to /dev/null?
> 
> And even more garbage.

Your other comments are good, but in this case, the Sun person was
somewhat correct, even if he didn't really know what he was talking
about.  The filesystem certainly won't get "hosed" if you don't dump
it, but future incremental dumps will.  If you do a complete
filesystem restoration (level 0 and any incrementals), it's good
practice to do a fresh level 0 dump.  You *must* do this before any
future incrementals on that filesystem.  If you know the next
scheduled backup of that filesystem is a level 0 dump, it's safe not
to worry about it.
-- 
John Owens		Old Dominion University - Norfolk, Virginia, USA
john@ODU.EDU		old arpa: john%odu.edu@RELAY.CS.NET
+1 804 440 4529		old uucp: {decuac,harvard,hoptoad,mcnc}!xanth!john

gordon@sneaky.UUCP (08/01/87)

> /* Written  6:53 pm  Jul 25, 1987 by mind.UUCP!barry in sneaky:comp.unix.ques */
> In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
...
> >He has also said that after doing a restore of a zero level dump,
> >it is necessary to immediately do another zero level dump or the file
> >system will get hosed again.  Is this really needed?  If so, can it be
> >done overnight, to /dev/null?
> 
> And even more garbage.
...

I'm not so sure this is pure garbage.  The 4.2/4.3 BSD restore program restores
files by going through the file system, not the disk device, and the inodes
of the files restored do not necessarily have the same numbers as they had
on the dump.  

If you don't do a full level zero dump, and later do an incremental dump,
and your file system gets trashed again (not because of not doing a level
zero dump, but because of Murphy's Law), and you try to restore the OLD
level zero dump and the NEW incremental on top of it, you will probably get 
garbage.  The first dump of the restored file system should be level 0.
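The failure described above can be shown with a deliberately
simplified toy model.  This is not the real dump tape format: it
assumes a full dump is a set of directories (path to inode number)
plus file data keyed by inode number, and that an incremental carries
only the data of changed files, again keyed by inode number.  All
function names are hypothetical.

```python
def level0_dump(fs):
    """fs maps path -> (inode, content).  Return (directories, data)."""
    directories = {path: ino for path, (ino, _) in fs.items()}
    data = {ino: content for _, (ino, content) in fs.items()}
    return directories, data


def incremental_dump(fs, changed_paths):
    """Toy incremental: data of the changed files only, keyed by inode."""
    return {fs[p][0]: fs[p][1] for p in changed_paths}


def restore(directories, data, incremental=None):
    """Rebuild path -> content from a level 0 plus an optional incremental."""
    merged = dict(data)
    merged.update(incremental or {})
    return {path: merged[ino] for path, ino in directories.items()}


def demo():
    original = {"/a": (2, "a v1"), "/b": (3, "b v1")}
    old_dirs, old_data = level0_dump(original)

    # Full restore through the file system: same names and contents,
    # but the kernel picks the inode numbers.  Assume they come out swapped.
    restored = {"/a": (3, "a v1"), "/b": (2, "b v1")}
    new_dirs, new_data = level0_dump(restored)  # the recommended fresh level 0

    # /a is edited afterwards and an incremental is taken.
    restored["/a"] = (3, "a v2")
    incr = incremental_dump(restored, ["/a"])

    bad = restore(old_dirs, old_data, incr)    # OLD level 0 + NEW incremental
    good = restore(new_dirs, new_data, incr)   # fresh level 0 + incremental
    return bad, good
```

In this model the old level 0 plus the new incremental puts the edited
contents of /a into /b, while the fresh level 0 plus the same
incremental restores both files correctly.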

				Gordon L. Burditt
				...!ihnp4!sys1!sneaky!gordon
				...!convex!infoswx!hal6000!sneaky!gordon

mangler@cit-vax.Caltech.Edu (System Mangler) (08/01/87)

In article <1052@mind.UUCP>, barry@mind.UUCP (Barry Lustig) writes:
> Under very
> heavy loads with 2 drives hanging off of it, the 451 has been known to
> write data with bits shifted.

Our old Xylogics 450 did the same thing, and this was without
overlapped seeks, so I don't understand how it can matter how
many drives are on the controller.  (Yes, we had two on ours).

I guess this makes the 451 "bug-for-bug" compatible with the 450?
Do they use the same microcode?

In our case this particular problem went away after a controller swap.

Don Speck   speck@vlsi.caltech.edu  {ll-xn,rutgers,amdahl}!cit-vax!speck

mangler@cit-vax.UUCP (08/01/87)

>In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
>>He has also said that after doing a restore of a zero level dump,
>>it is necessary to immediately do another zero level dump

In article <4221@teddy.UUCP>, jpn@teddy.UUCP (John P. Nelson) writes:
> not doing another level 0 dump [...] COULD render
> all subsequent incremental backups useless.  To quote from the "restore"
> manual page:
>      [...]  Thus, a
>      full dump must be done to  get  a  new  set  of  directories
>      reflecting the new file positions, so that later incremental
>      dumps will be correct.

[BSD-specific discussion]

Could someone explain to me why this should be true?  After a full
restore, the st_ctime of every file/directory has been updated, so the
next dump, no matter what level, should dump every single file, right?
How is running restore on a fresh filesystem different from cleaning
off the current filesystem with "rm -rf" and creating a bunch of files?

Admittedly, since the incremental will be just as large as a full, you
might as well do a full, but it seems like something is basically wrong
if an incremental doesn't work just because you changed *everything*.
[Yes, I know that restore bombs if you changed *nothing*, but so what].

Perhaps the comments in the man page are a holdover from 4.1bsd dump,
which did, in fact, need this restriction because restor [sic] wrote
the raw disk and thus did not update st_ctime?

Don Speck   speck@vlsi.caltech.edu  {ll-xn,rutgers,amdahl}!cit-vax!speck

jerry@oliveb.UUCP (Jerry F Aguirre) (08/29/87)

In article <8467@brl-adm.ARPA> KFL@AI.AI.MIT.EDU (Keith F. Lynch) writes:
>He has also said that after doing a restore of a zero level dump,
>it is necessary to immediately do another zero level dump or the file
>system will get hosed again.  Is this really needed?  If so, can it be
>done overnight, to /dev/null?

Half true.  Under 4.1BSD and before, the restor (sic) program worked on the raw
file system.  This meant that the restored files had the same inode
number and ctime.  (The only thing different would be the actual data
block addresses and that is desirable.)

Starting with 4.2BSD, "restore" worked on the mounted file system and
did creat, write, link, etc. calls to put the files on disk.  Because of
this the inode numbers would almost certainly be different.  Also, while
restore could reset the atime and mtime using utimes(2), there is no
system call to reset the ctime.
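The asymmetry between mtime and ctime is easy to see from user space.
A small sketch, assuming a Unix system and using Python's os.utime as
a stand-in for utimes(2); restore_times and demo are hypothetical
names, and the 1985 timestamp is arbitrary.

```python
import os
import tempfile


def restore_times(path, atime, mtime):
    """Put back the access and modification times, as a restore program would."""
    os.utime(path, (atime, mtime))
    return os.stat(path)


def demo():
    old = 500000000.0  # an arbitrary timestamp in late 1985
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"restored data\n")
        path = f.name
    try:
        st = restore_times(path, old, old)
        # atime and mtime now read as 1985; ctime is set by the kernel
        # and reflects the moment the inode was last changed.
        return st.st_atime, st.st_mtime, st.st_ctime
    finally:
        os.unlink(path)
```

The atime and mtime come back as the old value, while the ctime stays
at the time the file was actually written, which is what makes a
later dump treat the file as changed.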

So, the inodes are different and the ctime is the time of restore.  The
updated ctimes will force the next dump, of whatever level, to dump
every file that was restored.  If you are planning a small level 9 dump
and the entire file system gets dumped this can cause confusion.  The
best way to avoid confusion is to do another level 0 dump.  The only
rush is to do it before, or in place of, the next regular dump of that
file system.

People have suggested playing with the system date or editing dumpdates.
This will not fix anything because the inode numbers have changed.
That is the kind of thing that can cause corrupted files if you have to
do another restore later.

The important thing to remember is that all the files really have been
changed (as far as dump/restore is concerned) and will get dumped.

					Jerry Aguirre